Lecture Notes in Computer Science 6729

Commenced Publication in 1973


Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Ying Tan Yuhui Shi Yi Chai
Guoyin Wang (Eds.)

Advances
in Swarm Intelligence

Second International Conference, ICSI 2011


Chongqing, China, June 12-15, 2011
Proceedings, Part II

Volume Editors

Ying Tan
Peking University
Key Laboratory of Machine Perception (MOE)
Department of Machine Intelligence
Beijing, 100871, China
E-mail: [email protected]

Yuhui Shi
Xi’an Jiaotong-Liverpool University
Department of Electrical and Electronic Engineering
Suzhou, 215123, China
E-mail: [email protected]

Yi Chai
Chongqing University
Automation College
Chongqing 400030, China
E-mail: [email protected]

Guoyin Wang
Chongqing University of Posts and Telecommunications
College of Computer Science and Technology
Chongqing, 400065, China
E-mail: [email protected]

ISSN 0302-9743 e-ISSN 1611-3349


ISBN 978-3-642-21523-0 e-ISBN 978-3-642-21524-7
DOI 10.1007/978-3-642-21524-7
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2011928465

CR Subject Classification (1998): F.1, H.3, I.2, H.4, H.2.8, I.4-5

LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

© Springer-Verlag Berlin Heidelberg 2011


This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface

This book and its companion volume, LNCS vols. 6728 and 6729, constitute
the proceedings of the Second International Conference on Swarm Intelligence
(ICSI 2011) held during June 12–15, 2011 in Chongqing, well known as the
Mountain City, the southwestern commercial capital of China. ICSI 2011 was
the second gathering in the world for researchers working on all aspects of swarm
intelligence, following the successful and fruitful Beijing ICSI event in 2010,
which provided a high-level international academic forum for the participants to
disseminate their new research findings and discuss emerging areas of research.
It also created a stimulating environment for the participants to interact and
exchange information on future challenges and opportunities in the field of swarm
intelligence research.
ICSI 2011 received 298 submissions from about 602 authors in 38 countries
and regions (Algeria, American Samoa, Argentina, Australia, Austria, Belize,
Bhutan, Brazil, Canada, Chile, China, Germany, Hong Kong, Hungary, India,
Islamic Republic of Iran, Japan, Republic of Korea, Kuwait, Macau, Madagas-
car, Malaysia, Mexico, New Zealand, Pakistan, Romania, Saudi Arabia, Singa-
pore, South Africa, Spain, Sweden, Chinese Taiwan, Thailand, Tunisia, Ukraine,
UK, USA, Vietnam) across six continents (Asia, Europe, North America, South
America, Africa, and Oceania). Each submission was reviewed by at least 2
reviewers, and on average 2.8 reviewers. Based on rigorous reviews by the Pro-
gram Committee members and reviewers, 143 high-quality papers were selected
for publication in the proceedings with an acceptance rate of 47.9%. The pa-
pers are organized in 23 cohesive sections covering all major topics of swarm
intelligence research and development.
In addition to the contributed papers, the ICSI 2011 technical program in-
cluded four plenary speeches by Russell C. Eberhart (Indiana University Pur-
due University Indianapolis (IUPUI), USA), K. C. Tan (National University of
Singapore, Singapore, the Editor-in-Chief of IEEE Computational Intelligence
Magazine (CIM)), Juan Luis Fernández Martínez (University of Oviedo, Spain), and
Fernando Buarque (University of Pernambuco, Brazil). Besides the regular oral
sessions, ICSI 2011 had two special sessions on ‘Data Fusion and Swarm Intelli-
gence’ and ‘Fish School Search Foundations and Application’ as well as several
poster sessions focusing on wide areas.
As organizers of ICSI 2011, we would like to express sincere thanks to
Chongqing University, Peking University, Chongqing University of Posts and
Telecommunications, and Xi’an Jiaotong-Liverpool University for their spon-
sorship, to the IEEE Computational Intelligence Society, World Federation on
Soft Computing, International Neural Network Society, and Chinese Association
for Artificial Intelligence for their technical co-sponsorship. We appreciate the
Natural Science Foundation of China for its financial and logistic support.

We would also like to thank the members of the Advisory Committee for their
guidance, the members of the International Program Committee and additional
reviewers for reviewing the papers, and members of the Publications Committee
for checking the accepted papers in a short period of time. Particularly, we are
grateful to the proceedings publisher Springer for publishing the proceedings in
the prestigious series of Lecture Notes in Computer Science. Moreover, we wish
to express our heartfelt appreciation to the plenary speakers, session chairs,
and student helpers. There are still many more colleagues, associates, friends,
and supporters who helped us in immeasurable ways; we express our sincere
gratitude to them all. Last but not least, we would like to thank all the
speakers, authors, and participants for their great contributions that made
ICSI 2011 successful and all the hard work worthwhile.

June 2011 Ying Tan


Yuhui Shi
Yi Chai
Guoyin Wang
Organization

General Chairs
Russell C. Eberhart Indiana University - Purdue University, USA
Dan Yang Chongqing University, China
Ying Tan Peking University, China

Advisory Committee Chairs


Xingui He Peking University, China
Qidi Wu Tongji University, China
Gary G. Yen Oklahoma State University, USA

Program Committee Chairs


Yuhui Shi Xi’an Jiaotong-Liverpool University, China
Guoyin Wang Chongqing University of Posts and
Telecommunications, China

Technical Committee Chairs


Yi Chai Chongqing University, China
Andries Engelbrecht University of Pretoria, South Africa
Nikola Kasabov Auckland University of Technology, New Zealand
Kay Chen Tan National University of Singapore, Singapore
Peng-yeng Yin National Chi Nan University, Taiwan, China
Martin Middendorf University of Leipzig, Germany

Plenary Sessions Chairs


Xiaohui Cui Oak Ridge National Laboratory, USA
James Tin-Yau Kwok The Hong Kong University of Science and
Technology, China

Special Sessions Chairs


Majid Ahmadi University of Windsor, Canada
Hongwei Mo Harbin Engineering University, China
Yi Zhang Sichuan University, China

Publications Chairs
Rajkumar Roy Cranfield University, UK
Radu-Emil Precup Politehnica University of Timisoara, Romania
Yue Sun Chongqing University, China

Publicity Chairs
Xiaodong Li RMIT University, Australia
Haibo He University of Rhode Island Kingston, USA
Lei Wang Tongji University, China
Weiren Shi Chongqing University, China
Jin Wang Chongqing University of Posts and
Telecommunications, China

Finance Chairs
Chao Deng Peking University, China
Andreas Janecek University of Vienna, Austria

Local Arrangements Chairs


Dihua Sun Chongqing University, China
Qun Liu Chongqing University of Posts and Telecommu-
nications, China

Program Committee Members


Payman Arabshahi University of Washington, USA
Carmelo Bastos University of Pernambuco, Brazil
Christian Blum Universitat Politecnica de Catalunya, Spain
Leandro dos Santos Coelho Pontifícia Universidade Católica do Paraná,
Brazil
Carlos Coello Coello CINVESTAV-IPN, Mexico
Oscar Cordon European Centre for Soft Computing, Spain
Jose Alfredo Ferreira Costa UFRN Universidade Federal do Rio Grande
do Norte, Brazil
Iain Couzin Princeton University, USA
Xiaohui Cui Oak Ridge National Laboratory, USA
Swagatam Das Jadavpur University, India
Prithviraj Dasgupta University of Nebraska, USA
Kusum Deep Indian Institute of Technology Roorkee, India
Mingcong Deng Okayama University, Japan
Haibin Duan Beijing University of Aeronautics and
Astronautics, China

Mark Embrechts RPI, USA


Andries Engelbrecht University of Pretoria, South Africa
Wai-Keung Fung University of Manitoba, Canada
Beatriz Aurora Garro
Licon CIC-IPN, Mexico
Dunwei Gong China University of Mining and Technology,
China
Ping Guo Beijing Normal University, China
Walter Gutjahr University of Vienna, Austria
Qing-Long Han Central Queensland University, Australia
Haibo He University of Rhode Island, USA
Lu Hongtao Shanghai Jiao Tong University, China
Mo Hongwei Harbin Engineering University, China
Zeng-Guang Hou Institute of Automation, Chinese Academy of
Sciences, China
Huosheng Hu University of Essex, UK
Guang-Bin Huang Nanyang Technological University, Singapore
Yuancheng Huang Wuhan University, China
Hisao Ishibuchi Osaka Prefecture University, Japan
Andreas Janecek University of Vienna, Austria
Zhen Ji Shenzhen University, China
Changan Jiang Kagawa University, Japan
Licheng Jiao Xidian University, China
Colin Johnson University of Kent, UK
Farrukh Aslam Khan FAST-National University of Computer and
Emerging Sciences, Pakistan
Arun Khosla National Institute of Tech. Jalandhar, India
Franziska Klügl Örebro University, Sweden
James Kwok Hong Kong University of Science and Technology,
China
Xiaodong Li RMIT University, Australia
Yangmin Li University of Macau, China
Fernando Buarque
De Lima Neto Polytechnic School of Pernambuco, Brazil
Guoping Liu University of Glamorgan, UK
Ju Liu Shandong University, China
Qun Liu Chongqing University of Posts and
Telecommunications, China
Wenlian Lu Fudan University, China
Juan Luis Fernandez
Martinez University of Oviedo, Spain
Wenjian Luo University of Science and Technology of China,
China
Jinwen Ma Peking University, China
Bernd Meyer Monash University, Australia

Martin Middendorf University of Leipzig, Germany


Mahamed G. H. Omran Gulf University for Science and Technology,
Kuwait
Jeng-Shyang Pan National Kaohsiung University of
Applied Sciences, Taiwan, China
Shaoning Pang Auckland University of Technology, New Zealand
Bijaya Ketan Panigrahi IIT Delhi, India
Thomas Potok ORNL, USA
Radu-Emil Precup Politehnica University of Timisoara, Romania
Guenter Rudolph TU Dortmund University, Germany
Gerald Schaefer Loughborough University, UK
Yuhui Shi Xi’an Jiaotong-Liverpool University, China
Michael Small Hong Kong Polytechnic University, China
Jim Smith University of the West of England, UK
Ponnuthurai Suganthan Nanyang Technological University, Singapore
Norikazu Takahashi Kyushu University, Japan
Kay-Chen Tan National University of Singapore, Singapore
Ying Tan Peking University, China
Ke Tang University of Science and Technology of China,
China
Peter Tino University of Birmingham, UK
Christos Tjortjis The University of Manchester, UK
Frans Van Den Bergh CSIR, South Africa
Ba-Ngu Vo The University of Western Australia, Australia
Bing Wang University of Hull, UK
Guoyin Wang Chongqing University of Posts and
Telecommunications, China
Hongbo Wang Yanshan University, China
Jiahai Wang Sun Yat-sen University, China
Jin Wang Chongqing University of Posts and
Telecommunications, China
Lei Wang Tongji University, China
Ling Wang Tsinghua University, China
Lipo Wang Nanyang Technological University, Singapore
Benlian Xu Changshu Institute of Technology, China
Pingkun Yan Philips Research North America, USA
Yingjie Yang De Montfort University, UK
Hongpeng Yin Chongqing University, China
Peng-Yeng Yin National Chi Nan University, Taiwan, China
Dingli Yu Liverpool John Moores University, UK
Jie Zhang Newcastle University, UK
Jun Zhang Waseda University, Japan
Lifeng Zhang Renmin University of China, China
Qieshi Zhang Waseda University, Japan
Qingfu Zhang University of Essex, UK

Dongbin Zhao Institute of Automation, Chinese Academy of


Science, China
Zhi-Hua Zhou Nanjing University, China

Additional Reviewers
Bi, Chongke
Cheng, Chi Tai
Damas, Sergio
Ding, Ke
Dong, Yongsheng
Duong, Tung
Fang, Chonglun
Guo, Jun
Henmi, Tomohiro
Hu, Zhaohui
Huang, Sheng-Jun
Kalra, Gaurav
Lam, Franklin
Lau, Meng Cheng
Leung, Carson K.
Lu, Qiang
Nakamura, Yukinori
Osunleke, Ajiboye
Qing, Li
Quirin, Arnaud
Saleem, Muhammad
Samad, Rosdiyana
Sambo, Francesco
Singh, Satvir
Sun, Fuming
Sun, Yang
Tang, Yong
Tong, Can
Vázquez, Roberto A.
Wang, Hongyan
Wang, Lin
Yanou, Akira
Zhang, Dawei
Zhang, X.M.
Zhang, Yong
Zhu, Yanqiao
Table of Contents – Part II

Multi-Objective Optimization Algorithms


Multi-Objective Optimization for Dynamic Single-Machine
Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Li Nie, Liang Gao, Peigen Li, and Xiaojuan Wang

Research of Pareto-Based Multi-Objective Optimization for


Multi-vehicle Assignment Problem Based on MOPSO . . . . . . . . . . . . . . . . . 10
Ai Di-Ming, Zhang Zhe, Zhang Rui, and Pan Feng

Correlative Particle Swarm Optimization for Multi-objective


Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Yuanxia Shen, Guoyin Wang, and Qun Liu

A PSO-Based Hybrid Multi-Objective Algorithm for Multi-Objective


Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Xianpeng Wang and Lixin Tang

The Properties of Birandom Multiobjective Programming Problems . . . . 34


Yongguo Zhang, Yayi Xu, Mingfa Zheng, and Liu Ningning

A Modified Multi-objective Binary Particle Swarm Optimization


Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Ling Wang, Wei Ye, Xiping Fu, and Muhammad Ilyas Menhas

Improved Multiobjective Particle Swarm Optimization for


Environmental/Economic Dispatch Problem in Power System . . . . . . . . . 49
Yali Wu, Liqing Xu, and Jingqian Xue

A New Multi-Objective Particle Swarm Optimization Algorithm for


Strategic Planning of Equipment Maintenance . . . . . . . . . . . . . . . . . . . . . . . 57
Haifeng Ling, Yujun Zheng, Ziqiu Zhang, and Xianzhong Zhou

Multiobjective Optimization for Nurse Scheduling . . . . . . . . . . . . . . . . . . . . 66


Peng-Yeng Yin, Chih-Chiang Chao, and Ya-Tzu Chiang

A Multi-objective Binary Harmony Search Algorithm . . . . . . . . . . . . . . . . . 74


Ling Wang, Yunfei Mao, Qun Niu, and Minrui Fei

Multi-robot, Swarm-robot, and Multi-agent Systems


A Self-organized Approach to Collaborative Handling of Multi-robot
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Tian-yun Huang, Xue-bo Chen, Wang-bao Xu, and Wei Wang

An Enhanced Formation of Multi-robot Based on A* Algorithm for


Data Relay Transmission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Zhiguang Xu, Kyung-Sik Choi, Yoon-Gu Kim, Jinung An, and
Suk-Gyu Lee

WPAN Communication Distance Expansion Method Based on


Multi-robot Cooperation Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Yoon-Gu Kim, Jinung An, Kyoung-Dong Kim, Zhi-Guang Xu, and
Suk-Gyu Lee

Relative State Modeling Based Distributed Receding Horizon


Formation Control of Multiple Robot Systems . . . . . . . . . . . . . . . . . . . . . . . 108
Wang Zheng, He Yuqing, and Han Jianda

Simulation and Experiments of the Simultaneous Self-assembly for


Modular Swarm Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Hongxing Wei, Yizhou Huang, Haiyuan Li, and Jindong Tan

Impulsive Consensus in Networks of Multi-agent Systems with Any


Communication Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Quanjun Wu, Li Xu, Hua Zhang, and Jin Zhou

Data Mining Methods


FDClust: A New Bio-inspired Divisive Clustering Algorithm . . . . . . . . . . . 136
Besma Khereddine and Mariem Gzara

Mining Class Association Rules from Dynamic Class Coupling Data to


Measure Class Reusability Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Anshu Parashar and Jitender Kumar Chhabra

An Algorithm of Constraint Frequent Neighboring Class Sets Mining


Based on Separating Support Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Gang Fang, Jiang Xiong, Hong Ying, and Yong-jian Zhao

A Multi-period Stochastic Production Planning and Sourcing Problem


with Discrete Demand Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Weili Chen, Yankui Liu, and Xiaoli Wu

Exploration of Rough Sets Analysis in Real-World Examination


Timetabling Problem Instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
J. Joshua Thomas, Ahamad Tajudin Khader, Bahari Belaton, and
Amy Leow

Community Detection in Sample Networks Generated from Gaussian


Mixture Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Ling Zhao, Tingzhan Liu, and Jian Liu

Efficient Reduction of the Number of Associations Rules Using Fuzzy


Clustering on the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Amel Grissa Touzi, Aicha Thabet, and Minyar Sassi

A Localization Algorithm in Wireless Sensor Networks Based on


PSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Hui Li, Shengwu Xiong, Yi Liu, Jialiang Kou, and Pengfei Duan

Game Theoretic Approach in Routing Protocol for Cooperative


Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Qun Liu, Xingping Xian, and Tao Wu

Machine Learning Methods


A New Collaborative Filtering Recommendation Approach Based On
Naive Bayesian Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Kebin Wang and Ying Tan

Statistical Approach for Calculating the Energy Consumption by Cell


Phones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
Shanchen Pang and Zhonglei Yu

Comparison of Ensemble Classifiers in Extracting Synonymous Chinese


Transliteration Pairs from Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
Chien-Hsing Chen and Chung-Chian Hsu

Combining Classifiers by Particle Swarms with Local Search . . . . . . . . . . . 244


Liying Yang

An Expert System Based on Analytical Hierarchy Process for Diabetes


Risk Assessment (DIABRA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Mohammad Reza Amin-Naseri and Najmeh Neshat

Practice of Crowd Evacuating Process Model with Cellular Automata


Based on Safety Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Shi Xi Tang and Ke Ming Tang

Feature Selection Algorithms


Feature Selection for Unlabeled Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Chien-Hsing Chen

Feature Selection Algorithm Based on Least Squares Support Vector


Machine and Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Song Chuyi, Jiang Jingqing, Wu Chunguo, and Liang Yanchun

Unsupervised Local and Global Weighting for Feature Selection . . . . . . . . 283


Nadia Mesghouni, Khaled Ghedira, and Moncef Temani

Graph-Based Feature Recognition of Line-Like Topographic Map


Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Rudolf Szendrei, István Elek, and Mátyás Márton

Automatic Recognition of Topographic Map Symbols Based on Their


Textures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Rudolf Szendrei, István Elek, and István Fekete

Using Population Based Algorithms for Initializing Nonnegative Matrix


Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Andreas Janecek and Ying Tan

A Kind of Object Level Measuring Method Based on Image


Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
Xiaoying Wang and Yingge Chen

Pattern Recognition Methods


Fast Human Detection Using a Cascade of United HOGs . . . . . . . . . . . . . . . 327
Wenhui Li, Yifeng Lin, and Bo Fu

The Analysis of Parameters t and k of LPP on Several Famous Face


Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
Sujing Wang, Na Zhang, Mingfang Sun, and Chunguang Zhou

Local Block Representation for Face Recognition . . . . . . . . . . . . . . . . . . . . . 340


Liyuan Jia, Li Huang, and Lei Li

Feature Level Fusion of Fingerprint and Finger Vein Biometrics . . . . . . . . 348


Kunming Lin, Fengling Han, Yongming Yang, and Zulong Zhang

A Research of Reduction Algorithm for Support Vector Machine . . . . . . . 356


Susu Liu and Limin Sun

Fast Support Vector Regression Based on Cut . . . . . . . . . . . . . . . . . . . . . . . 363


Wenyong Zhou, Yan Xiong, Chang-an Wu, and Hongbing Liu

Intelligent Control
Using Genetic Algorithm for Parameter Tuning on ILC Controller
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Alireza Rezaee and Mohammad Jafarpour Jalali

Controller Design for a Heat Exchanger in Waste Heat Utilizing


Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Jianhua Zhang, Wenfang Zhang, Ying Li, and Guolian Hou

Test Research on Radiated Susceptibility of Automobile Electronic


Control System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Shenghui Yang, Xiangkai Liu, Xiaoyun Yang, and Yu Xiao

Forgeability Attack of Two DLP-Base Proxy Blind Signature


Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Jianhong Zhang, Fenhong Guo, Zhibin Sun, and Jilin Wang

Other Optimization Algorithms and Applications


Key Cutting Algorithm and Its Variants for Unconstrained
Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Uthen Leeton and Thanatchai Kulworawanichpong

Transmitter-Receiver Collaborative-Relay Beamforming by Simulated


Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
Dong Zheng, Ju Liu, Lei Chen, Yuxi Liu, and Weidong Guo

Calculation of Quantities of Spare Parts and the Estimation of


Availability in the Repaired as Old Models . . . . . . . . . . . . . . . . . . . . . . . . . . 419
Zhe Yin, Feng Lin, Yun-fei Guo, and Mao-sheng Lai

The Design of the Algorithm of Creating Sudoku Puzzle . . . . . . . . . . . . . . 427


Jixian Meng and Xinzhong Lu

Research and Validation of the Smart Power Two-Way Interactive


System Based on Unified Communication Technology . . . . . . . . . . . . . . . . . 434
Jianming Liu, Jiye Wang, Ning Li, and Zhenmin Chen

A Micro Wireless Video Transmission System . . . . . . . . . . . . . . . . . . . . . . . . 441


Yong-ming Yang, Xue-jun Chen, Wei He, and Yu-xing Mao

Inclusion Principle for Dynamic Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449


Xin-yu Ouyang and Xue-bo Chen

Lie Triple Derivations for the Parabolic Subalgebras of gl(n, R) . . . . . . . . 457


Jing Zhao, Hailing Li, and Lijing Fang

Non-contact Icing Detection on Helicopter and Experiments


Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
Jie Zhang, Lingyan Li, Wei Chen, and Hong Zhang

Research on Decision-Making Simulation of “Gambler’s Fallacy” and


“Hot Hand” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
Jianbiao Li, Chaoyang Li, Sai Xu, and Xue Ren

An Integration Process Model of Enterprise Information System


Families Based on System of Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
Yingbo Wu, Xu Wang, and Yun Lin

Special Session on Data Fusion and Swarm


Intelligence
A Linear Multisensor PHD Filter Using the Measurement Dimension
Extension Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
Weifeng Liu and Chenglin Wen
An Improved Particle Swarm Optimization for Uncertain Information
Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
Peiyi Zhu, Benlian Xu, and Baoguo Xu
Three-Primary-Color Pheromone for Track Initiation . . . . . . . . . . . . . . . . . 502
Benlian Xu, Qinglan Chen, and Jihong Zhu
Visual Tracking of Multiple Targets by Multi-Bernoulli Filtering of
Background Subtracted Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
Reza Hoseinnezhad, Ba-Ngu Vo, and Truong Nguyen Vu
Mobile Robotics in a Random Finite Set Framework . . . . . . . . . . . . . . . . . . 519
John Mullane, Ba-Ngu Vo, Martin Adams, and Ba-Tuong Vo
IMM Algorithm for a 3D High Maneuvering Target Tracking . . . . . . . . . . 529
Dong-liang Peng and Yu Gu
A New Method Based on Ant Colony Optimization for the Probability
Hypothesis Density Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
Jihong Zhu, Benlian Xu, Fei Wang, and Qiquan Wang

Special Session on Fish School Search - Foundations


and Application
A Hybrid Algorithm Based on Fish School Search and Particle Swarm
Optimization for Dynamic Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
George M. Cavalcanti-Júnior, Carmelo J.A. Bastos-Filho,
Fernando B. Lima-Neto, and Rodrigo M.C.S. Castro
Feeding the Fish – Weight Update Strategies for the Fish School Search
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
Andreas Janecek and Ying Tan
Density as the Segregation Mechanism in Fish School Search for
Multimodal Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
Salomão Sampaio Madeiro, Fernando Buarque de Lima-Neto,
Carmelo José Albanez Bastos-Filho, and
Elliackin Messias do Nascimento Figueiredo
Mining Coherent Biclusters with Fish School Search . . . . . . . . . . . . . . . . . . 573
Lara Menezes and André L.V. Coelho

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583


Table of Contents – Part I

Theoretical Analysis of Swarm Intelligence


Algorithms
Particle Swarm Optimization: A Powerful Family of Stochastic
Optimizers. Analysis, Design and Application to Inverse Modelling . . . . . 1
Juan Luis Fernández-Martínez, Esperanza García-Gonzalo,
Saras Saraswathi, Robert Jernigan, and Andrzej Kloczkowski

Building Computational Models of Swarms from Simulated Positional


Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Graciano Dieck Kattas and Michael Small

Robustness and Stagnation of a Swarm in a Cooperative Object


Recognition Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
David King and Philip Breedon

Enforced Mutation to Enhancing the Capability of Particle Swarm


Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
PenChen Chou and JenLian Chen

Normalized Population Diversity in Particle Swarm Optimization . . . . . . 38


Shi Cheng and Yuhui Shi

Particle Swarm Optimization with Disagreements . . . . . . . . . . . . . . . . . . . . 46


Andrei Lihu and Ştefan Holban

PSOslope: A Stand-Alone Windows Application for Graphical Analysis


of Slope Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Walter Chen and Powen Chen

A Review of the Application of Swarm Intelligence Algorithms to 2D


Cutting and Packing Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Yanxin Xu, Gen Ke Yang, Jie Bai, and Changchun Pan

Particle Swarm Optimization


Inertia Weight Adaption in Particle Swarm Optimization Algorithm . . . . 71
Zheng Zhou and Yuhui Shi

Nonlinear Inertia Weight Variation for Dynamic Adaptation in Particle


Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Wudai Liao, Junyan Wang, and Jiangfeng Wang

An Adaptive Tribe-Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . 86


Yong Duan Song, Lu Zhang, and Peng Han

A Novel Hybrid Binary PSO Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93


Muhammad Ilyas Menhas, MinRui Fei, Ling Wang, and Xiping Fu

PSO Algorithm with Chaos and Gene Density Mutation for Solving
Nonlinear Zero-One Integer Programming Problems . . . . . . . . . . . . . . . . . . 101
Yuelin Gao, Fanfan Lei, Huirong Li, and Jimin Li

A New Binary PSO with Velocity Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 111


Laura Lanzarini, Javier López, Juan Andrés Maulini, and
Armando De Giusti

Adaptive Particle Swarm Optimization Algorithm for Dynamic


Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Iman Rezazadeh, Mohammad Reza Meybodi, and Ahmad Naebi

An Improved Particle Swarm Optimization with an Adaptive Updating


Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Jie Qi and Yongsheng Ding

Mortal Particles: Particle Swarm Optimization with Life Span . . . . . . . . . 138


Yong-wei Zhang, Lei Wang, and Qi-di Wu

Applications of PSO Algorithms


PSO Based Pseudo Dynamic Method for Automated Test Case
Generation Using Interpreter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Surender Singh Dahiya, Jitender Kumar Chhabra, and Shakti Kumar

Reactive Power Optimization Based on Particle Swarm Optimization


Algorithm in 10kV Distribution Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Chao Wang, Gang Yao, Xin Wang, Yihui Zheng, Lidan Zhou,
Qingshan Xu, and Xinyuan Liang

Clustering-Based Particle Swarm Optimization for Electrical Impedance


Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Gang Hu, Min-you Chen, Wei He, and Jin-qian Zhai

A PSO- Based Robust Optimization Approach for Supply Chain


Collaboration with Demand Uncertain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Yutian Jia, Xingquan Zuo, and Jianping Wu

A Multi-valued Discrete Particle Swarm Optimization for the


Evacuation Vehicle Routing Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Marina Yusoff, Junaidah Ariffin, and Azlinah Mohamed

A NichePSO Algorithm Based Method for Process Window Selection . . . 194


Wenqi Li, Yiming Qiu, Lei Wang, and Qidi Wu

Efficient WiFi-Based Indoor Localization Using Particle Swarm


Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Girma S. Tewolde and Jaerock Kwon

Using PSO Algorithm for Simple LSB Substitution Based


Steganography Scheme in DCT Transformation Domain . . . . . . . . . . . . . . 212
Feno Heriniaina Rabevohitra and Jun Sang

Numerical Integration Method Based on Particle Swarm


Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Leila Djerou, Naceur Khelil, and Mohamed Batouche

Identification of VSD System Parameters with Particle Swarm


Optimization Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Yiming Qiu, Wenqi Li, Dongsheng Yang, Lei Wang, and Qidi Wu

PSO-Based Emergency Evacuation Simulation . . . . . . . . . . . . . . . . . . . . . . . 234


Jialiang Kou, Shengwu Xiong, Hongbing Liu, Xinlu Zong,
Shuzhen Wan, Yi Liu, Hui Li, and Pengfei Duan

Training Spiking Neurons by Means of Particle Swarm Optimization . . . 242


Roberto A. Vázquez and Beatriz A. Garro

Ant Colony Optimization Algorithms


Clustering Aggregation for Improving Ant Based Clustering . . . . . . . . . . . 250
Akil Elkamel, Mariem Gzara, and Hanêne Ben-Abdallah

Multi-cellular-ant Algorithm for Large Scale Capacity Vehicle Route


Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Jie Li, Yi Chai, Penghua Li, and Hongpeng Yin

Ant Colony Optimization for Global White Matter Fiber Tracking . . . . . 267
Yuanjing Feng and Zhejin Wang

Bee Colony Algorithms


An Efficient Bee Behavior-Based Multi-function Routing Algorithm for
Network-on-Chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
Junhui Wang, Huaxi Gu, Yintang Yang, and Zhi Deng

Artificial Bee Colony Based Mapping for Application Specific


Network-on-Chip Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Zhi Deng, Huaxi Gu, Haizhou Feng, and Baojian Shu

Using Artificial Bee Colony to Solve Stochastic Resource Constrained


Project Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
Amin Tahooneh and Koorush Ziarati

Novel Swarm-Based Optimization Algorithms


Brain Storm Optimization Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Yuhui Shi
Human Group Optimizer with Local Search . . . . . . . . . . . . . . . . . . . . . . . . . 310
Chaohua Dai, Weirong Chen, Lili Ran, Yi Zhang, and Yu Du
Average-Inertia Weighted Cat Swarm Optimization . . . . . . . . . . . . . . . . . . 321
Maysam Orouskhani, Mohammad Mansouri, and
Mohammad Teshnehlab
Standby Redundancy Optimization with Type-2 Fuzzy Lifetimes . . . . . . . 329
Yanju Chen and Ying Liu
Oriented Search Algorithm for Function Optimization . . . . . . . . . . . . . . . . 338
Xuexia Zhang and Weirong Chen
Evolution of Cooperation under Social Norms in Non-Structured
Populations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Qi Xiaowei, Ren Guang, Yue Gin, and Zhang Aiping
Collaborative Optimization under a Control Framework for ATSP . . . . . . 355
Jie Bai, Jun Zhu, Gen-Ke Yang, and Chang-Chun Pan
Bio-Inspired Dynamic Composition and Reconfiguration of
Service-Oriented Internetware Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
Huan Zhou, Zili Zhang, Yuheng Wu, and Tao Qian
A Novel Search Interval Forecasting Optimization Algorithm . . . . . . . . . . 374
Yang Lou, Junli Li, Yuhui Shi, and Linpeng Jin

Artificial Immune System


A Danger Theory Inspired Learning Model and Its Application to
Spam Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
Yuanchun Zhu and Ying Tan
Research of Hybrid Biogeography Based Optimization and Clonal
Selection Algorithm for Numerical Optimization . . . . . . . . . . . . . . . . . . . . . 390
Zheng Qu and Hongwei Mo
The Hybrid Algorithm of Biogeography Based Optimization and Clone
Selection for Sensors Selection of Aircraft . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
Lifang Xu, Shouda Jiang, and Hongwei Mo

A Modified Artificial Immune Network for Feature Extracting . . . . . . . . . 408


Hong Ge and XueMing Yan

Differential Evolution
Novel Binary Encoding Differential Evolution Algorithm . . . . . . . . . . . . . . 416
Changshou Deng, Bingyan Zhao, Yanling Yang, Hu Peng, and
Qiming Wei
Adaptive Learning Differential Evolution for Numeric Optimization . . . . 424
Yi Liu, Shengwu Xiong, Hui Li, and Shuzhen Wan
Differential Evolution with Improved Mutation Strategy . . . . . . . . . . . . . . 431
Shuzhen Wan, Shengwu Xiong, Jialiang Kou, and Yi Liu
Gaussian Particle Swarm Optimization with Differential Evolution
Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
Chunqiu Wan, Jun Wang, Geng Yang, and Xing Zhang

Neural Networks
Evolving Neural Networks: A Comparison between Differential
Evolution and Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 447
Beatriz A. Garro, Humberto Sossa, and Roberto A. Vázquez
Identification of Hindmarsh-Rose Neuron Networks Using GEO
metaheuristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
Lihe Wang, Genke Yang, and Lam Fat Yeung
Delay-Dependent Stability Criterion for Neural Networks of
Neutral-Type with Interval Time-Varying Delays and Nonlinear
Perturbations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Guoquan Liu, Simon X. Yang, and Wei Fu
Application of Generalized Chebyshev Neural Network in Air Quality
Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
Fengjun Li
Financial Time Series Forecast Using Neural Network Ensembles . . . . . . . 480
Anupam Tarsauliya, Rahul Kala, Ritu Tiwari, and Anupam Shukla
Selection of Software Reliability Model Based on BP Neural Network . . . 489
Yingbo Wu and Xu Wang

Genetic Algorithms
Atavistic Strategy for Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
Dongmei Lin, Xiaodong Li, and Dong Wang

An Improved Co-evolution Genetic Algorithm for Combinatorial


Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
Nan Li and Yi Luo
Recursive Structure Element Decomposition Using Migration Fitness
Scaling Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
Yudong Zhang and Lenan Wu
A Shadow Price Guided Genetic Algorithm for Energy Aware Task
Scheduling on Cloud Computers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
Gang Shen and Yan-Qing Zhang
A Solution to Bipartite Drawing Problem Using Genetic Algorithm . . . . 530
Salabat Khan, Mohsin Bilal, Muhammad Sharif, and
Farrukh Aslam Khan

Evolutionary Computation
Evaluation of Two-Stage Ensemble Evolutionary Algorithm for
Numerical Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
Yu Wang, Bin Li, Kaibo Zhang, and Zhen He
A Novel Genetic Programming Algorithm For Designing Morphological
Image Analysis Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
Jun Wang and Ying Tan

Fuzzy Methods
Optimizing Single-Source Capacitated FLP in Fuzzy Decision
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
Liwei Zhang, Yankui Liu, and Xiaoqing Wang
New Results on a Fuzzy Granular Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
Xu-Qing Tang and Kun Zhang
Fuzzy Integral Based Data Fusion for Protein Function Prediction . . . . . 578
Yinan Lu, Yan Zhao, Xiaoni Liu, and Yong Quan

Hybrid Algorithms
Gene Clustering Using Particle Swarm Optimizer Based Memetic
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587
Zhen Ji, Wenmin Liu, and Zexuan Zhu
Hybrid Particle Swarm Optimization with Biased Mutation Applied to
Load Flow Computation in Electrical Power Systems . . . . . . . . . . . . . . . . . 595
Camila Paes Salomon, Maurilio Pereira Coutinho,
Germano Lambert-Torres, and Cláudio Ferreira

Simulation of Routing in Nano-Manipulation for Creating Pattern with


Atomic Force Microscopy Using Hybrid GA and PSO-AS Algorithms . . . 606
Ahmad Naebi, Moharam Habibnejad Korayem, Farhoud Hoseinpour,
Sureswaran Ramadass, and Mojtaba Hoseinzadeh

Neural Fuzzy Forecasting of the China Yuan to US Dollar Exchange


Rate—A Swarm Intelligence Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
Chunshien Li, Chuan Wei Lin, and Hongming Huang

A Hybrid Model for Credit Evaluation Problem . . . . . . . . . . . . . . . . . . . . . . 626


Hui Fu and Xiaoyong Liu

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635


Multi-Objective Optimization for Dynamic
Single-Machine Scheduling

Li Nie, Liang Gao*, Peigen Li, and Xiaojuan Wang

The State Key Laboratory of Digital Manufacturing Equipment and Technology,
Huazhong University of Science and Technology,
Wuhan, Hubei 430074, People’s Republic of China
[email protected]

Abstract. In this paper, a multi-objective evolutionary algorithm based on gene
expression programming (MOGEP) is proposed to construct scheduling rules
(SRs) for the dynamic single-machine scheduling problem (DSMSP) with job release
dates. In MOGEP, a fitness assignment scheme, a diversity maintaining strategy and
an elitist strategy are incorporated on the basis of the original GEP. Results of simulation
experiments show that MOGEP can construct effective SRs which contribute
to optimizing multiple scheduling measures simultaneously.

Keywords: multi-objective optimization; gene expression programming; dynamic
scheduling; single-machine.

1 Introduction
Production scheduling is one of the most important tasks carried out in manufacturing
systems and has received considerable attention in the operations research literature.
In this area, it is usually assumed that all the jobs to be processed are available at
the beginning of the planning horizon. However, in many real situations, jobs
may arrive over time, for example due to transportation.
Many approaches have been proposed to solve production scheduling problems,
such as branch and bound [1], genetic algorithms [2] and tabu search [3].
However, these methods usually offer good-quality solutions at the cost of a huge
amount of computational time. Furthermore, these techniques are not applicable in
dynamic or uncertain conditions, because the original schedules have to be modified
frequently in response to changes of the system status.
Scheduling with scheduling rules (SRs), which define only the next state of the system,
is highly effective in such dynamic environments [4]. Due to the inherent complexity and
variability of scheduling problems, considerable effort is needed to develop suitable
SRs for the problem at hand. Many researchers have investigated the use of genetic
programming (GP) to create problem-specific SRs [4][5][6]. In our previous work, we
applied gene expression programming (GEP), a new evolutionary algorithm, to the
dynamic single-machine scheduling problem (DSMSP) with job release dates and
demonstrated that GEP is more promising than GP for creating efficient SRs [7].

* Corresponding author.


All the work mentioned above has concentrated only on single-objective optimization.
Several objectives usually must be considered simultaneously in real-world production
situations, and these objectives often conflict with each other. It is not possible to have a
single solution which simultaneously optimizes all objectives, so a trade-off between
these objectives is necessary, which makes multi-objective optimization problems
(MOPs) more difficult than single-objective optimization problems. Many
multi-objective evolutionary algorithms (MOEAs) have been proposed
[8][9][10][11][12][13]. However, they cannot be employed to construct SRs for
the DSMSP.
In this paper, we propose a multi-objective evolutionary algorithm based on gene
expression programming (MOGEP) and apply it to optimizing several objectives
simultaneously for the DSMSP. In MOGEP, (1) a fitness assignment scheme which combines
the Pareto dominance relation and density information is proposed to guide the
search towards the Pareto-optimal solutions; (2) a diversity maintaining strategy is
used to adjust the non-dominated set of each generation in order to keep it diverse;
(3) an elitist strategy is used to guarantee the convergence of
the search. The remainder of the paper is organized as follows. In Section 2, the DSMSP with
job release dates is described. In Section 3, the fundamental concepts of multi-objective
optimization are stated briefly. In Section 4, MOGEP and its application to the DSMSP
are elaborated. In Section 5, the experiments are presented. The final conclusions are
given in Section 6.

2 Problem Description
The DSMSP with job release dates is described as follows. The shop floor consists
of one machine and n jobs, which are released over time and are each processed once on the
machine without preemption. The attributes of a job, such as its processing time, release
date and due date, are unknown in advance until the job becomes available at the machine
or is about to arrive in the immediate future. It is assumed that the machine is available all the
time and cannot process more than one job simultaneously. The task of scheduling is to
determine a sequence of jobs on the machine in order to minimize several optimization
criteria simultaneously; in our case, makespan, mean flow time, maximum lateness and
mean tardiness. The four performance criteria are defined below.
$F_1 = C_{\max} = \max(c_i,\ i = 1, \ldots, n)$ .  (1)

$F_2 = \bar{F} = \frac{1}{n}\sum_{i=1}^{n}(c_i - r_i)$ .  (2)

$F_3 = L_{\max} = \max(c_i - d_i,\ i = 1, \ldots, n)$ .  (3)

$F_4 = \bar{T} = \frac{1}{n}\sum_{i=1}^{n}\max(c_i - d_i,\ 0)$ .  (4)

where $c_i$, $r_i$ and $d_i$ denote the completion time, release date and due date of job $i$,
respectively, and $n$ denotes the number of jobs. $C_{\max}$, $\bar{F}$, $L_{\max}$ and $\bar{T}$ denote the makespan,
mean flow time, maximum lateness and mean tardiness, respectively.
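
For concreteness, the following minimal Python sketch (not from the paper; the function name evaluate_schedule and its argument layout are assumptions) computes F1-F4 for a given processing sequence, assuming each job starts as soon as the machine is free and the job has been released:

```python
def evaluate_schedule(sequence, p, r, d):
    """sequence: job indices in processing order; p, r, d: processing times,
    release dates and due dates indexed by job. Returns (F1, F2, F3, F4)."""
    t = 0.0
    c = {}                                   # completion times c_i
    for i in sequence:
        t = max(t, r[i]) + p[i]              # wait for the release date, then process
        c[i] = t
    n = len(sequence)
    F1 = max(c.values())                                     # makespan C_max
    F2 = sum(c[i] - r[i] for i in sequence) / n              # mean flow time
    F3 = max(c[i] - d[i] for i in sequence)                  # maximum lateness L_max
    F4 = sum(max(c[i] - d[i], 0.0) for i in sequence) / n    # mean tardiness
    return F1, F2, F3, F4

print(evaluate_schedule([0, 1], p=[5, 3], r=[0, 4], d=[6, 9]))  # (8.0, 4.5, -1.0, 0.0)
```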

3 Basic Concepts of Multi-Objective Optimization


In this section, we briefly describe several basic concepts of multi-objective optimization
and Pareto-optimality that are used intensively in the literature [14].
The multi-objective optimization problem is generally formulated as follows:

$\min_{X \in \Omega} F(X) = (F_1(X), F_2(X), \ldots, F_L(X))$ .  (5)

where $X$ is a possible solution, $\Omega$ is the feasible solution space, $F(\cdot)$ is the objective
function and $F_r(\cdot)$ is the $r$-th objective function (for $1 \le r \le L$).
A solution a dominates a solution b (or b is dominated by a), if the following condi-
tions are satisfied:
$F_i(a) \le F_i(b), \quad \forall i \in \{1, 2, \ldots, L\}$ .  (6)

$F_i(a) < F_i(b), \quad \exists i \in \{1, 2, \ldots, L\}$ .  (7)


A solution a is indifferent to a solution b if a does not dominate b and b does not dominate
a. A solution is called non-dominated if it is not dominated by any other
solution. The Pareto-optimal set consists of the non-dominated solutions, and the Pareto-optimal
frontier is the set of points in the objective space corresponding to the Pareto-optimal set.
The goal of multi-objective optimization is to find or approximate the Pareto-optimal
set. Since it is usually not possible to have a single solution which simultaneously optimizes
all objectives, an algorithm that gives a large number of alternative solutions
lying on or near the Pareto-optimal front is of great practical value.
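
The dominance relation of Eqs. (6)-(7) translates directly into code; the sketch below (illustrative Python, function names assumed) also extracts the non-dominated set from a list of objective vectors that are all to be minimized:

```python
def dominates(a, b):
    """a, b: objective vectors to be minimized. True if a dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

front = non_dominated([(3, 5), (4, 4), (5, 5), (2, 7)])   # -> [(3, 5), (4, 4), (2, 7)]
```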

4 MOGEP for DSMSP


GEP is a new technique for creating computer programs based on the principle of evolution,
first proposed by Ferreira [15]. GEP has been applied in different fields, e.g.
function discovery [16], classification rule discovery [17], time series prediction
[18] and digital elevation model generation [19], and it shows a powerful ability to solve
complex problems. However, the original GEP can only optimize one objective; if several
objectives are to be optimized concurrently, some extra steps have to be
designed specially. Based on the original GEP, MOGEP is equipped with a fitness assignment
scheme, a diversity maintaining strategy and an elitist strategy.

4.1 Flow of MOGEP


MOGEP is executed in the following steps:
Step 1: Set the iteration counter iter = 0. An initial population Pt consisting of N individuals
is randomly generated, and an empty external archive NDSet, whose size is M
(M < N), is created.
Step 2: Each individual in Pt is assessed and assigned a fitness value according to the
fitness assignment scheme (Section 4.2).
Step 3: The external archive NDSet is updated with the non-dominated solutions of the
current population Pt. If the number of non-dominated solutions is less than the archive
size M, the archive NDSet is filled up with dominated solutions from the population Pt;
otherwise, if the number of non-dominated solutions exceeds the archive size M, some
members of the archive NDSet are removed according to the diversity maintaining
scheme (Section 4.3).
Step 4: If iter exceeds the maximal number of iterations, the algorithm ends and
NDSet is output; otherwise, go to Step 5.
Step 5: According to the elitist strategy (Section 4.4), the individuals in the external
archive NDSet are copied directly to the next population Pt+1.
Step 6: Genetic operators (Section 4.6) are applied to population Pt and the offspring
individuals are saved into the population Pt+1, whose size is maintained at N.
Then increment the iteration counter, iter = iter + 1, and go to Step 2.

4.2 Fitness Assignment Scheme

For MOPs, the fitness assignment scheme is very important, and an effective fitness
assignment scheme makes sure that the search is directed towards the Pareto-optimal
solutions. In this paper, a fitness assignment scheme which combines the Pareto
dominance relation and density information is proposed. Each individual at each
generation is evaluated according to the following steps: (1) the rank of the individual is
determined; (2) the density of the individual is estimated; (3) the fitness of the individual is
determined by incorporating its density information into its rank.
The non-dominated sorting algorithm [9] is used to define a rank for each individual.
According to the Pareto dominance relation, the population is split into different non-dominated
fronts PF1, PF2, ..., PFG, where G is the number of non-dominated fronts.
Each individual in PFj+1 is dominated by at least one individual in PFj (j = 1, ..., G-1),
and the individuals within each non-dominated front PFj (j = 1, ..., G) are indifferent to
each other. The rank of each individual i in PFj is assigned as:

$R(i) = j - 1, \quad i \in PF_j$ .  (8)

Since the individuals in each non-dominated front do not dominate each other and
have identical rank, additional density information has to be incorporated
to discriminate between them. The density estimation technique [8] is used to define an
order among the individuals in PFj (j = 1, ..., G). Specifically, for each individual i in PFj,
the distances to all individuals in PFj are calculated and stored in increasing order. The
k-th element is denoted as $d_i^k$, where k is set to the square root of the front size. The density
D(i) of individual i is defined by:

$D(i) = \exp(-2 \cdot d_i^k / d_{\max})$ .  (9)

where

$d_{\max} = \max\{d_i^k,\ i \in PF_j\}$ .  (10)

Finally, the fitness f(i) of an individual i is formulated as

$f(i) = G - R(i) - D(i)$ .  (11)

Under this fitness assignment scheme, the fitness of the individuals in PF1
lies in the interval [G-1, G), the fitness of those in PF2 lies in [G-2, G-1),
and the fitness of those in PFG lies in [0, 1). Note that fitness is to
be maximized here; in other words, a better individual is assigned a higher fitness so
that it may transfer its fine genes to offspring with a higher probability.
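
Putting the three steps of Section 4.2 together, the following self-contained Python sketch assigns fitness by non-dominated sorting plus the density penalty of Eqs. (8)-(11). It is illustrative rather than the authors' code; the handling of single-member fronts and of a zero maximum distance are assumptions the paper does not specify:

```python
import math

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def assign_fitness(objs):
    """objs: list of objective vectors (all minimized).
    Returns a list of fitness values, higher is better, as in Eq. (11)."""
    remaining = set(range(len(objs)))
    fronts = []                                    # PF_1, PF_2, ..., PF_G
    while remaining:                               # non-dominated sorting [9]
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining -= set(front)
    G = len(fronts)
    fitness = [0.0] * len(objs)
    for rank, front in enumerate(fronts):          # rank R(i) = j - 1
        k = max(1, int(round(math.sqrt(len(front)))))
        dk = {}                                    # distance to the k-th neighbour, d_i^k
        for i in front:
            dists = sorted(math.dist(objs[i], objs[j]) for j in front if j != i)
            dk[i] = dists[min(k, len(dists)) - 1] if dists else 0.0
        dmax = max(dk.values()) or 1.0             # guard for degenerate fronts (assumption)
        for i in front:
            density = math.exp(-2.0 * dk[i] / dmax)        # Eq. (9)
            fitness[i] = G - rank - density                # Eq. (11)
    return fitness

# e.g. assign_fitness([(1, 2), (2, 1), (3, 3)]) -> roughly [1.86, 1.86, 0.0]
```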

4.3 Diversity Maintaining Scheme

Apart from the population, an external archive of fixed size is used to save the
non-dominated individuals of the population. If the number of non-dominated individuals
exceeds the predefined archive size, some individuals need to be deleted
from the archive. In order to maintain the diversity of the population, the individuals
with higher density, i.e., the individuals with lower fitness, should be
deleted from the archive.

4.4 Elitist Strategy

Although a number of Pareto-based multi-objective optimization algorithms, such as
MOGA [10], NPGA [11] and NSGA [12], have demonstrated the capability to approximate
the set of optimal trade-offs in a single optimization run, they are all non-elitist
approaches. Zitzler et al. [13] have shown that elitism helps in achieving better
convergence in multi-objective evolutionary algorithms. In this paper, each member of
the archive is regarded as an elitist and replicated directly to the next population. Being
directly copied to the next population is the only way an individual can survive
several generations, apart from pure reproduction, which may occur by chance. This
technique is incorporated in order not to lose certain portions of the current non-dominated
front due to random effects.

4.5 Chromosome Representation Scheme

The function set (FS) and terminal set (TS) used to construct SRs are defined as fol-
lows. FS = {+, -, *, /}. TS = { p, r, d, sl, st, wt}, where p denotes job’s processing time;
r denotes job’s release date; d denotes job’s due date; sl denotes job’s positive slack, sl
= max {d − p − max{t, r}, 0}, where t denotes the idle time of the machine; st denotes
job’s stay time, st = max {t – r, 0}, where t is defined as above; wt denotes job’s wait
time, wt = max {r – t, 0}, where t is defined as above.
Chromosomes are encoded according to the following stipulations: (1) the head may contain
symbols from both FS and TS, whereas the tail consists only of symbols from
TS; (2) the lengths of head and tail must satisfy the equation tl = hl * (arg − 1) +
1, where hl and tl are the lengths of the head and tail, respectively, and arg is the maximum
number of arguments over all operations in FS. An example chromosome expressed
with the elements of FS and TS defined above is illustrated in Fig. 1(a), where
underlining is used to indicate the tail.
Decoding is the process of transferring chromosomes into SRs. The example
chromosome shown in Fig. 1(a) is mapped into an expression tree (ET) in a
depth-first fashion (Fig. 1(b)). The ET is then interpreted as an SR in mathematical
form, as shown in Fig. 1(c).

Fig. 1. Encoding and decoding scheme of GEP: (a) the chromosome
-.*.+.r.d./.sl.p.wt.st.wt.p.sl (the tail is underlined in the original figure); (b) the
corresponding expression tree; (c) the resulting SR (r+d)*sl/p-wt
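As an illustration of this decoding, the following Python sketch parses a chromosome in depth-first order into an expression tree and evaluates it on a job's attributes. The protected division and the sample attribute values are illustrative assumptions, not part of the original method.

FS = {'+', '-', '*', '/'}

def decode(gene):
    """Consume symbols in depth-first order and build a nested expression tree."""
    pos = [0]
    def build():
        sym = gene[pos[0]]
        pos[0] += 1
        if sym in FS:                       # functions take two arguments here
            return (sym, build(), build())
        return sym                          # terminal: a job attribute name
    return build()

def evaluate(tree, attrs):
    """Evaluate the expression tree on a dict of job attributes."""
    if isinstance(tree, str):
        return attrs[tree]
    op, l, r = tree
    a, b = evaluate(l, attrs), evaluate(r, attrs)
    if op == '+': return a + b
    if op == '-': return a - b
    if op == '*': return a * b
    return a / b if b != 0 else a           # protected division (assumption)

gene = ['-', '*', '+', 'r', 'd', '/', 'sl', 'p', 'wt', 'st', 'wt', 'p', 'sl']
tree = decode(gene)        # only the first nine symbols are consumed; the rest is unused tail
job = {'p': 5.0, 'r': 2.0, 'd': 20.0, 'sl': 3.0, 'st': 1.0, 'wt': 0.0}
print(evaluate(tree, job)) # (r+d)*sl/p - wt = 13.2

Evaluating the decoded rule for each candidate job would then yield the priority values used by the dispatching rule at each scheduling decision point.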

4.6 Genetic Operators

The genetic operators carried out on the population are listed below [15]; a sketch of
the mutation operator is given after this list.
The selection operator creates a mating pool of individuals selected from the
current population according to fitness by roulette-wheel sampling. The roulette is
spun N−M times in order to keep the population size unchanged (note that M
individuals are copied directly from NDSet).
The mutation operator randomly changes symbols in a chromosome. In order to main-
tain the structural organization of chromosomes, any symbol in the head can change
into any other function or terminal, while symbols in the tail can only change into
terminals.
The transposition operator has two forms: (1) IS transposition, i.e., randomly choose a
fragment beginning with a function or terminal (called an IS element) and transpose it
into the head of the gene, anywhere except the root; (2) RIS transposition, i.e., randomly
choose a fragment beginning with a function (called an RIS element) and transpose it to
the root of the gene. In order not to affect the tail of the gene, symbols are removed
from the end of the head to make room for the inserted string.
The recombination operator also has two forms: (1) one-point recombination, i.e., split
the two randomly chosen parent chromosomes into two parts at the same point and swap
the corresponding sections; (2) two-point recombination, i.e., split the chromosomes into
three portions and swap the middle one.
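A minimal sketch of the head/tail-aware mutation operator described above (Python; the per-symbol mutation rate and the example call are illustrative assumptions):

import random

def mutate(chromosome, head_len, FS, TS, pm=0.1):
    """Head positions may become any function or terminal; tail positions only terminals."""
    out = list(chromosome)
    for i in range(len(out)):
        if random.random() < pm:
            pool = list(FS) + list(TS) if i < head_len else list(TS)
            out[i] = random.choice(pool)
    return out

FS, TS = ['+', '-', '*', '/'], ['p', 'r', 'd', 'sl', 'st', 'wt']
chrom = ['-', '*', '+', 'r', 'd', '/', 'sl', 'p', 'wt', 'st', 'wt', 'p', 'sl']  # Fig. 1(a), head length 6
print(mutate(chrom, head_len=6, FS=FS, TS=TS))

The same head/tail constraint is what the transposition operators respect by trimming the end of the head after insertion.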

5 Experiments and Results


In this section, simulation experiments are conducted on training and validating sets,
each comprising several problem instances, to create SRs and evaluate their perform-
ance. The problem instances are randomly generated by the following method.
The number of jobs is set to 100. The processing times of jobs are drawn from
[1, 100]. Release dates of jobs are chosen randomly from [0, c*TP], where TP denotes
the sum of the processing times of all jobs and c is assigned the value 0.1. Due dates of jobs
are drawn from [r+(TP−r)*(1−T−R/2), r+(TP−r)*(1−T+R/2)], where r denotes the release
date of the job, TP denotes the sum of the processing times of all jobs, and T and R are both
assigned the value 0.5. A training set consisting of 5 problem instances is gener-
ated and used to train MOGEP to create SRs. Another 5 instances are generated with the
same parameter settings and compose a validating set. A sketch of this instance generator
is given below.
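The following Python sketch mirrors this instance-generation scheme; sampling release and due dates uniformly from the stated intervals is an assumption where the text does not name the distribution.

import random

def make_instance(n_jobs=100, c=0.1, T=0.5, R=0.5):
    """One DSMSP instance: a (processing time, release date, due date) triple per job."""
    p = [random.randint(1, 100) for _ in range(n_jobs)]           # processing times from [1, 100]
    TP = sum(p)
    r = [random.uniform(0.0, c * TP) for _ in range(n_jobs)]      # release dates from [0, c*TP]
    d = [random.uniform(rj + (TP - rj) * (1 - T - R / 2),         # due dates from the stated interval
                        rj + (TP - rj) * (1 - T + R / 2)) for rj in r]
    return list(zip(p, r, d))

training_set = [make_instance() for _ in range(5)]
validating_set = [make_instance() for _ in range(5)]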

In the experiments, the MOGEP parameter settings are as follows. The population size
is 200. The length of the head is 10, and thereby the total length of a chromosome is 21.
The mutation probability is 0.3. The IS and RIS transposition probabilities are 0.3 and 0.1,
respectively. The one-point and two-point recombination probabilities are 0.2 and 0.5,
respectively. The algorithm stops after 100 iterations.
The SRs created by MOGEP on the training sets are listed below:

R1: 2p+wt(r+st) (12)


R2: p+d+sl*wt (13)
R3: p+(d+wt)(r+st)-sl (14)
R4: p+(sl+wt)(r+st)+sl (15)
To illustrate the efficiency of the SRs created by MOGEP, we compare their results
on the validating set with those of the SRs created by GEP [7], which aims to optimize
a single objective. The results are listed in Table 1. R-Fi denotes the SR obtained by
GEP for the single objective Fi (i = 1,…,4); Ri denotes the SRs obtained
by MOGEP for the objectives Fi (i = 1,…,4) simultaneously.
Take the result on instance 1-1 as an example. The solutions generated on instance
1-1 with the MOGEP-created SRs R1, R2, R3 and R4 are X1={5167, 1557, 3218, 385},
X2={5167, 2397, 1338, 368}, X3={5167, 2428, 1330, 385}, and X4={5167, 2292,
2601, 234}, respectively. The solution generated on instance 1-1 with the GEP-
created SR R-F1 is X-F1={5167, 2447, 3422, 675}. It is optimal with respect to the
objective of makespan in comparison with the solutions generated by the other GEP-
created SRs on instance 1-1, whereas it is dominated by Xi (i=1, 2, 3, 4). The solution
generated with R-F2 is X-F2={5168, 1569, 3219, 380}. It is optimal with respect to the
objective of flowtime in comparison with the other solutions generated by GEP-
created SRs. However, X1 is better than X-F2 with respect to the objective of flowtime.
Although X2, X3 and X4 are worse than X-F2 with respect to the objective of flowtime,
they are distinctly better than X-F2 with regard to the other objectives. The
solution generated with R-F3 is X-F3={5167, 2434, 1330, 385}. It is optimal with
respect to the objective of lateness. However, it is dominated by X3. Although X1, X2
and X4 are worse than X-F3 with respect to the objective of lateness, they outperform
X-F3 with regard to the other objectives. The solution generated with R-
F4 is X-F4={5215, 2270, 2649, 235}. It is optimal with respect to the objective of
tardiness in comparison with the other solutions generated by GEP-created SRs.
However, X4 is better than X-F4 with respect to the objective of tardiness. Although
X1, X2 and X3 are worse than X-F4 with respect to the objective of tardiness, they out-
perform X-F4 with regard to at least two of the other objectives. The results
in Table 1 demonstrate that the SRs created by GEP can generate optimal solutions on
a majority of instances with respect to a single objective, whereas the other objectives
deteriorate meanwhile. The SRs created by MOGEP can generate solutions which
trade off well between multiple objectives.

Table 1. Comparison between MOGEP-created SRs and GEP-created SRs

GEP MOGEP
Ins. Obj.
R-F1 R-F2 R-F3 R-F4 R1 R2 R3 R4
F1 5167 5168 5167 5215 5167 5167 5167 5167
F2 2447 1569 2434 2270 1557 2397 2428 2292
1-1
F3 3422 3219 1330 2649 3218 1338 1330 2601
F4 675 380 385 235 385 368 385 234
F1 4691 4696 4691 4691 4691 4691 4697 4697
F2 2106 1385 2091 1982 1386 2055 2083 2032
1-2
F3 2381 2691 1083 1620 2686 1083 1089 1549
F4 597 339 225 146 338 215 218 149
F1 4896 4907 4896 4966 4896 4896 4905 4905
F2 2269 1503 2289 2139 1515 2257 2264 2149
1-3
F3 3162 2768 1217 2290 2757 1217 1226 2202
F4 665 398 343 209 407 331 330 200
F1 5230 5230 5230 5274 5230 5230 5230 5230
F2 2337 1544 2313 2205 1544 2267 2303 2218
1-4
F3 2947 3313 1280 2183 3313 1280 1280 1815
F4 639 340 285 185 340 266 285 178
F1 4604 4606 4604 4604 4604 4604 4606 4606
F2 2017 1294 2081 1912 1292 2043 2075 1967
1-5
F3 2578 2928 1141 1901 2926 1141 1143 1903
F4 568 325 305 168 325 290 303 178

6 Conclusions
Considering the fact that jobs usually arrive over time and several optimization objec-
tives must be considered simultaneously in many real scheduling situations, we pro-
posed MOGEP and applied it to the construction of SRs for DSMSP. MOGEP was
equipped with a fitness assignment scheme, a diversity maintaining strategy and an elitist
strategy on the basis of the original GEP. Simulation experiment results demonstrate that
MOGEP creates effective SRs which can generate good Pareto optimal solutions for
DSMSP. These findings encourage further improvement of MOGEP and its applica-
tion to more complex scheduling problems.

Acknowledgments. This research is supported by the Natural Science Foundation of
China (NSFC) under Grant Nos. 60973086 and 51005088, and by the Program for New Century
Excellent Talents in University under Grant No. NCET-08-0232.

References
1. Balas, E.: Machine scheduling via disjunctive graphs: an implicit enumeration algorithm.
Oper. Res. 17, 941–957 (1969)
2. Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addi-
son-Wesley, Reading (1989)

3. Laguna, M., Barnes, J., Glover, F.: Tabu search methods for a single machine scheduling
problem. J. Intell. Manuf. 2, 63–74 (1991)
4. Jakobović, D., Budin, L.: Dynamic Scheduling with Genetic Programming. In: Collet, P.,
Tomassini, M., Ebner, M., Gustafson, S., Ekárt, A. (eds.) EuroGP 2006. LNCS, vol. 3905,
pp. 73–84. Springer, Heidelberg (2006)
5. Atlan, L., Bonnet, J., Naillon, M.: Learning Distributed Reactive Strategies by Genetic
Programming for the General Job Shop Problem. In: 7th Annual Florida Artificial Intelli-
gence Research Symposium. IEEE Press, Florida (1994)
6. Miyashita, K.: Job-shop Scheduling with Genetic Programming. In: Genetic and Evolu-
tionary Computation Conference, pp. 505–512. Morgan Kaufmann, San Francisco (2000)
7. Nie, L., Shao, X.Y., Gao, L., Li, W.D.: Evolving Scheduling Rules with Gene Expression
Programming for Dynamic Single-machine Scheduling Problems. Int. J. Adv. Manuf.
Tech. 50, 729–747 (2010)
8. Zitzler, E., Thiele, L.: Multiobjective Evolutionary Algorithms: A Comparative Case
Study and the Strength Pareto Approach. IEEE T. Evolut. Comput. 3(4), 257–271 (1999)
9. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A Fast Elitist Nondominated Sorting Ge-
netic Algorithm for Multi-objective Optimization: NSGA-II. In: Schoenauer, M., Deb,
K., Rudolph, G., Yao, X., Lutton, E., Merelo, J.J., Schwefel, H.-P. (eds.) Parallel Problem
Solving from Nature – PPSN VI, pp. 849–858. Springer, Berlin (2000)
10. Fonseca, C.M., Fleming, P.J.: Genetic Algorithms for Multiobjective Optimization: For-
mulation, Discussion and Generalization. In: 5th International Conference on Genetic Al-
gorithms, pp. 416–423. Morgan Kaufmann, California (1993)
11. Horn, J., Nafpliotis, N., Goldberg, D.E.: A Niched Pareto Genetic Algorithm for Multiob-
jective Optimization. In: 1st IEEE Conference on Evolutionary Computation, IEEE World
Congress on Computational Computation, pp. 82–87. IEEE Press, New Jersey (1994)
12. Srinivas, N., Deb, K.: Multiobjective Optimization Using Nondominated Sorting in Ge-
netic Algorithms. Evol. Comput. 2(3), 221–248 (1994)
13. Zitzler, E., Deb, K., Thiele, L.: Comparison of Multiobjective Evolutionary Algorithms:
Empirical Results. Evol. Comput. 8(2), 173–195 (2000)
14. Kacem, I., Hammadi, S., Borne, P.: Pareto-optimality Approach for Flexible Job-shop
Scheduling Problems: Hybridization of Evolutionary Algorithms and Fuzzy Logic. Math.
Comput. Simulat. 60, 245–276 (2002)
15. Ferreira, C.: Gene Expression Programming: A New Adaptive Algorithm for Solving
Problems. Complex System 13(2), 87–129 (2001)
16. Ferreira, C.: Discovery of the Boolean Functions to the Best Density-Classification Rules
Using Gene Expression Programming. In: Foster, J.A., Lutton, E., Miller, J., Ryan, C.,
Tettamanzi, A.G.B. (eds.) EuroGP 2002. LNCS, vol. 2278, pp. 50–60. Springer, Heidel-
berg (2002)
17. Zou, C., Nelson, P.C., Xiao, W., Tirpak, T.M.: Discovery of Classification Rules by Using
Gene Expression Programming. In: International Conference on Artificial Intelligence, Las
Vegas, pp. 1355–1361 (2002)
18. Zuo, J., Tang, C., Li, C., Yuan, C., Chen, A.: Time Series Prediction Based on Gene Ex-
pression Programming. In: Li, Q., Wang, G., Feng, L. (eds.) WAIM 2004. LNCS,
vol. 3129, pp. 55–64. Springer, Heidelberg (2004)
19. Chen, Y., Tang, C., Zhu, J.: Clustering without Prior Knowledge Based on Gene Expres-
sion Programming. In: 3rd International Conference on Natural Computation, pp. 451–455
(2007)
Research of Pareto-Based Multi-Objective Optimization
for Multi-Vehicle Assignment Problem Based on MOPSO

Ai Di-Ming, Zhang Zhe, Zhang Rui, and Pan Feng

Beijing Special Vehicles Research Institute


School of Automation, Beijing Institute of Technology (BIT), 5 South Zhongguancun Street,
Haidian District, Beijing, 100081, P.R. China
[email protected]

Abstract. The purpose of a multi-vehicle assignment problem is to allocate ve-
hicles to complete various missions at different destinations while satisfying all
constraints and optimizing overall criteria. Combined with the MOPSO algorithm,
a Pareto-based multi-objective model is proposed, which includes not only the
time-cost tradeoff, but also a "Constraint-First-Objective-Next" strategy which
handles constraints as an additional objective. Numerical experimental results
illustrate that it can efficiently obtain the Pareto front and demonstrate its
effectiveness.

Keywords: Multi-objective vehicle assignment problem, Pareto, multi-objective
particle swarm optimizer (MOPSO).

1 Introduction
Nowadays, the importance and complexity of vehicle assignment systems for on-
demand transportation are growing, and research approaches are evolving.
Vehicle assignment and job allocation problems focus on how to allocate and
schedule vehicles to perform missions at each destination and to maximize the effective-
ness of the overall mission, involving goal assignment, trajectory optimization, and
time or job requirements, etc. [1].
Some models of typical assignment and scheduling problems have been referred to
and adapted, including Mixed-Integer Linear Programming (MILP) [2], Binary Linear
Programming (BLP) [3], the Linear Ordering Problem (LOP) [4], the Traveling Salesman
Problem (TSP) [5], models based on computational intelligence algorithms [6,7], and
so on.
Generally, only the time consumption is considered in the abovementioned models.
Yet this is a multi-objective optimization problem with complex constraints, includ-
ing cost, time, distance and so on. The solution of a multi-objective optimization
problem is a set of optimal solutions. In this paper, a Pareto-based multi-objective
optimization strategy is utilized; moreover, all constraints are treated as an additional
objective in the following section. In Section 3, a multi-objective particle
swarm optimizer (MOPSO) is combined with the proposed model to handle the resulting
three-objective optimization problem.


2 Multi-Objective Model of Vehicle Assignment Problem


According to various requirements, assignment schemes may differ from each
other. In this paper, the first objective is to assign vehicles so as to minimize the total mis-
sion time cost, and the second is to maximize the total profit for performing tasks
on all targets. The third objective is to minimize the constraint violations.

2.1 Minimize Total Time Cost

A vehicle assignment problem consists of M vehicles, N destinations and K tasks,


subject to all constraints. The time cost function can be defined as follows:

J1 = min J(|C_N,M(Π_K,N, O_N^K)|)    (1)

The objective defined in Equation (1) is to minimize the maximum cumulative


completion time, where J(|CN,M(·)|) is the maximum value of the cumulative time
matrix CN,M. The variables are explained in Table 1.

Table 1. Nomenclature

Items    Explanation
T_ij     = {t_ij}(N+M)×N, time cost matrix; t_ij is the flight time from node i to node j
C_NM     = {c_n,m}N×M, cumulative time matrix; c_n,m stands for the cumulative time of both the n-th target and the m-th vehicle
O_NK     = {o_ik}1×N·K, target sequence array; o_ik means the execution of the k-th task at the i-th target
Π_KN     = {π_k,n}K×N, UAV assignment matrix; π_k,n is the number of the vehicle that performs the k-th task at the n-th target

a. Cumulative Time Matrix C_NM

The matrix C_NM (Equation (2)) contains a cumulative accounting of the mission
time. Each row of C_NM corresponds to one destination, and each column
corresponds to one vehicle. When a job of the n-th destination is completed by the m-
th vehicle, the cumulative completion time c_n,m is calculated. Thus the largest value
appearing in any cell of matrix C_N,M is the maximum elapsed time for the mission,
which is the maximum time taken by any of the vehicles.

          ⎡ c_1,1  …  c_1,M ⎤
C_N,M  =  ⎢  …    c_n,m  …  ⎥                 (2)
          ⎣ c_N,1  …  c_N,M ⎦ N×M

b. Assignment Matrix Π_KN

The assignment matrix Π_KN is a K by N matrix (Equation (3)). Each row stands for
one task and each column stands for one destination. π_k,n is the element in the k-th
row and the n-th column of Π_KN and stands for the number of the vehicle which performs
the k-th task of the n-th target.

          ⎡ π_1,1  …  π_1,N ⎤
Π_K,N  =  ⎢  …    π_k,n  …  ⎥                 (3)
          ⎣ π_K,1  …  π_K,N ⎦ K×N

2.2 Maximize Total Profit

Owing to the differences in importance of the tasks at the various destinations and in the
implementation capacities of the vehicles, the total profit (see Equation (4)) reflects the
effectiveness of each task completion under a given assignment scheme. V1, V2, …, Vm are
defined as the preferred values of the targets, and Pij represents the ability of the
vehicles to perform the tasks, where i indexes the vehicle sequence array and j indexes the
task sequence array. The total profit for all tasks can be calculated as follows:

max J2 = Σ_{i∈M, j∈N} Vi · Pij    (4)

2.3 Handling Constraints

The constraints are described as follows:

a. A vehicle leaves a destination at most once.
b. Only one vehicle can be assigned to perform 'Task-2' at a destination, and it can-
not subsequently be assigned to perform any other tasks at any destination.
c. A vehicle coming from outside can visit a destination at most once.
d. A vehicle can perform a task only if the preceding task at the node has been
completed.
Each constraint function is defined as an objective (see Equation (5)); this is the "Con-
straint-First-Objective-Next" technique. Obviously, the optimal solution of the
vehicle assignment problem not only has the minimum time cost to complete all tasks,
but also satisfies all constraints, which means Γi = 0, i = 1, 2, …, R, that is, J3 = 0.

J3 = min Σ_i Γi(Π_K,N, O_N^K)    (5)

3 MOPSO Application
Based on the classic Particle Swarm Optimizer (PSO), whose parameters are set as in
Table 2, some modifications are introduced to handle the multi-objective optimization
problem; the improved PSO is called MOPSO.

Table 2. PSO parameter settings

Parameters Settings Parameters Settings


Population size 20 Inertia weight 0.79
Neighborhood size 2 c1, c2 2.05
Pareto pool of pbest 10 Pareto pool of gbest 20

3.1 Design of Fitness Function

The aforementioned multi-objective assignment problem can be defined as follows:

J* = {min J1(·), max J2(·), min J3(·)}    (6)

The purpose of the MOP is to find the Pareto optimal set and the Pareto front, so as to
provide a series of potential solutions to decision makers, who can reach the final plan
by synthesizing other information and requirements. The MOPSO includes
external archive maintenance, pbest and gbest information updates, and so on.

3.2 Encoding

The position vector X of a particle in MOPSO is a K·(N+M)-dimensional vector with real
values (see Equation (7)), which consists of two parts: the first K·N elements corre-
spond to the target sequence array, and the last K·M variables stand for the assignment
matrix. The sub-vector X^O_KN is sorted first, and then the moduli of the sorted serial
numbers are calculated to acquire the target sequence array, as explained in Table 3. The
operation on X^Π_KM is similar to that on X^O_KN.

X = [x1, …, xn, …, xK·N, xK·N+1, …, xv, …, xK·N+K·M] = [X^O_KN, X^Π_KM]    (7)

Table 3. Target Sequence Array

X^O_KN        X^O_KN(1)   X^O_KN(2)   X^O_KN(3)   X^O_KN(4)
Real value    12.315      10.343      55.819      19.556
O_N^K         3           4           1           2
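The ranking step shown in Table 3 can be reproduced with the short NumPy sketch below; ranking the real values in descending order matches the table, and the subsequent modulo step that folds the ranks into actual target indices is omitted here.

import numpy as np

x = np.array([12.315, 10.343, 55.819, 19.556])   # the X^O_KN part of a particle's position
order = np.argsort(-x)                            # positions from largest to smallest value
onk = np.empty(len(x), dtype=int)
onk[order] = np.arange(1, len(x) + 1)             # rank of each position
print(onk)                                        # [3 4 1 2], as in Table 3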

3.3 The Maintenance of pBest and gBest

The maintenance strategy of the Pareto pool, for both pBest and gBest, is as follows (a
minimal sketch of this maintenance step is given after the list):
a. If the current particle dominates some of the solutions in the Pareto pool, delete the
dominated solutions and add the current particle to the Pareto pool.
b. A particle that is dominated by the Pareto pool particles is directly discarded.
c. If the current particle and the Pareto pool particles do not dominate each other
and the Pareto pool has not reached its maximum size, the particle joins the Pareto
pool directly. Otherwise, calculate the particle's distance to the other particles and
remove the particle with the minimum density distance.
The pBest for the next iteration is selected from the Pareto pool by roulette-wheel
selection. All existing gBest candidates in the set are ranked by density, and a gBest in
a less dense region has a higher probability of being selected as the gBest for the next
generation.
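A minimal sketch of this maintenance rule (Python): dominance is evaluated here with all objectives converted to minimization, and density is a placeholder for the density-distance measure mentioned above.

def dominates(a, b):
    """Pareto dominance between objective vectors a and b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_pool(pool, candidate, max_size, density):
    """Rules (a)-(c): insert a candidate into the Pareto pool while keeping it bounded."""
    if any(dominates(p, candidate) for p in pool):
        return pool                                           # rule (b): candidate is dominated
    pool = [p for p in pool if not dominates(candidate, p)]   # rule (a): drop dominated members
    pool.append(candidate)
    if len(pool) > max_size:                                  # rule (c): drop the most crowded member
        pool.remove(min(pool, key=lambda p: density(p, pool)))
    return pool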

3.4 The Details of MOPSO Algorithm

The details of the MOPSO algorithm are as follows:

a. Initialize P[i] randomly (P is the population of particles), the velocity of each parti-
cle V[i], the maximum number of iterations T and the parameters as in Table 2; create
the external archive A, which stores non-dominated solutions, as empty.
b. At t = 0, store the non-dominated vectors found in P into A.
c. The values of the objective functions based on the positions of P[i] are calculated in
P and A.
d. Update pBest.
e. Insert every new non-dominated solution in P into A if it is not dominated by
any of the stored solutions. All solutions in A dominated by the new solution are
removed from A. If A is full, the solution to be replaced is determined by the
crowding-distance values of the non-dominated solutions in A.
f. Update gBest.
g. Calculate the new position and the new velocity.
h. The procedure from c to g is repeated until the number of iterations reaches the
specified number T. Then the search is stopped and the set A at that time is
regarded as the set of optimal solutions.

4 Numerical Experiments and Analysis

Two scenarios with different numbers of destinations and vehicles, and two task requirements,
are studied. The format of a 'Scenario' is [Destination, Vehicle], which gives the
numbers of destinations and vehicles separately.
The total mission time cost includes the flight time between nodes and the task
execution time. Tij, an N×(N+M) matrix, represents the flight time between nodes,
while Tsk, a K×N×M matrix, represents the time cost of task execution.

4.1 Experiment 1: Scenario [2, 3]

For Scenario [2, 3], with two destinations, three vehicles and two tasks, the pre-
ferred values of the targets are shown in Table 4, and the profits the vehicles receive for
performing the different tasks are listed in Table 5. A set of Pareto solutions can be obtained
by MOPSO, and the Pareto-front fitness values are shown in Table 6.

Table 4. Preferred values of the targets in Scenario [2,3]

Destination Preferred value V


Destination1 5.0
Destination2 8.0

Table 5. The value received by each UAV for performing the different tasks

Vehicles Received value P


Task-1 Task-2
Vehicle1 4 7
Vehicle2 9 5
Vehicle3 6 9

Table 6. The fitness of Scenario [2,3]

Objectives Fitness
J1 11.5131 16.88552 126.6441 213.18341
J2 185 195 209 210
J3 0 0 0 0

4.2 Experiment 2: Scenario [3, 4]

For Scenario [3, 4], with three destinations, four vehicles and two tasks, the pre-
ferred values and profits are shown in Table 7 and Table 8. The Pareto-front fitness
values are shown in Table 9.

Table 7. Preferred values of the targets in Scenario [3,4]

Destination Preferred value V


Destination1 5.0
Destination2 15.0
Destination3 10.0

Table 8. The value received by each UAV for performing the different tasks

Vehicles Received value P


Task-1 Task-2
Vehicle1 18 10
Vehicle2 10 20
Vehicle3 5 12
Vehicle4 10 4

Table 9. The fitness of Scenario [3,4]

Objectives Fitness
J1 10 11.51671 12.98741 25.84051 219.42261
J2 475 525 645 670 710
J3 0 0 0 0 0

5 Conclusion
Many experiments with different scenarios have been tested; for each scenario, a
set of Pareto optimal solutions can be achieved. On the other hand, the Pareto fronts are not
continuous, owing to the constraint requirements captured by the third fitness J3,
which limit the feasible solutions.
In summary, the proposed multi-objective vehicle assignment model can reduce
the dimension of the solution space and be easily handled by MOPSO algorithms.
Furthermore, the constraint treatment strategy, which considers the violations as an
objective, is an effective method. Future work will refine the model for
more complicated scenarios and improve the algorithm's flexibility, stability and distribu-
tion uniformity for more tasks.

Acknowledgments

The authors gratefully acknowledge the support of NNSF (Grant No. 60903005).
Special thanks go to Dr. Russ Eberhart, Xiaohui Hu and Yaobin Chen at Indiana
University-Purdue University Indianapolis for their assistance and collaboration.

References
1. Chandler, P., Pachter, M., Swaroop, D., Fowler, J.: Complexity in UAV cooperative con-
trol. In: American Control Conference, ACC, pp. 1831–1836 (2002)
2. Schumacher, C., Chandler, P.R., Pachter, M., Pachter, L.S.: Optimization of Air Vehicle
Operations Using Mixed-Integer Linear Programming. Air Force Research Lab
(AFRL/VACA) Wright-Patterson AFB, OH Control Theory Optimization Branch (2006)
3. Guo, W., Nygard, K.E., Kamel, A.: Combinatorial Trading Mechanism for Task Alloca-
tion. In: Proceedings of the 14th International Conference on Computer Applications in In-
dustry and Engineering, Las Vegas, Nevada, USA (2001)
4. Arulselvan, A., Commander, C.W., Pardalos, P.M.: A hybrid genetic algorithm for the tar-
get visitation problem. Naval Research Logistics (2007)
5. Vijay, K.S., Moises, S., Rakesh, N.: Priority-based assignment and routing of a fleet of
unmanned combat aerial vehicles. Elsevier Science Ltd. 35, 1813–1828 (2008)
6. Pan, F., Hu, X., Eberhart, R., Chen, Y.: A New UAV Assignment Model Based on PSO.
In: IEEE Swarm Intelligence Symposium (SIS 2008), St. Louis, USA (2008)
7. Pan, F., Chen, J., Tu, X.-Y., Cai, T.: A multiobjective-based vehicle assignment model for
constraints handling in computational intelligence algorithms. In: International Conference
Humanized Systems 2008, Beijing, P.R.China (2008)
Correlative Particle Swarm Optimization for
Multi-objective Problems

Yuanxia Shen1,2, Guoyin Wang1, and Qun Liu1


1
Institute of Computer Science and Technology, Chongqing University of Posts and
Telecommunications, Chongqing 400065, China
2
Anhui University of Technology, Maanshan 243002, China
[email protected], [email protected]

Abstract. Particle swarm optimization (PSO) has been applied to multi-
objective problems. However, PSO may easily get trapped in local optima
when solving complex problems. In order to improve the convergence and diversity
of solutions, a correlative particle swarm optimization (CPSO) with a disturbance
operation, named MO-CPSO, is proposed for dealing with multi-objective
problems. MO-CPSO adopts the correlative processing strategy to maintain
population diversity, and introduces a disturbance operation on the non-
dominated particles to improve the convergence accuracy of solutions.
Experiments were conducted on multi-objective benchmark problems. The
experimental results show that MO-CPSO performs better in terms of the convergence
metric and the diversity metric than three other related algorithms.

Keywords: Multi-objective problems, Correlative particle swarm optimization,


Population diversity.

1 Introduction

Optimization plays a major role in the modern-day design and decision-making


activities. Multi-objective optimization (MOO) is a challenging problem due to
the inherently conflicting nature of the objectives to be optimized. As evolutionary
algorithms can deal simultaneously with a set of possible solutions in a single run, they are
especially suitable for solving MOO problems.
Since Schaffer proposed the Vector Evaluated Genetic Algorithm (VEGA) in 1984 [1],
many evolutionary MOO algorithms have been developed in the past decades. Most
of the published studies concern multi-objective genetic algorithms (MOGA) and multi-
objective evolutionary algorithms (MOEA) in different fashions, such as SPEA2 [2] and
NSGA-II [3].
Particle swarm optimization (PSO) is a class of stochastic optimization techniques
inspired by the behavior of bird flocks [4]. As PSO has a fast convergence
speed, Coello Coello employed PSO to solve MOO problems and proposed MOPSO
in 2004 [5]. The main difference between the original single-objective PSO and the


multi-objective PSO (MOPSO) is that the selection of the local guide (gbest) must be
redefined in order to obtain a set of non-dominated solutions (Pareto front).
Maintaining population diversity is also important for solving MOO, and several
techniques [5-7] have therefore been introduced into PSO, e.g., an adaptive-grid mechanism
and an adaptive mutation operation. However, the resulting diversity and convergence are
still not close enough to the Pareto front. In order to improve the convergence and diversity of
solutions, a correlative particle swarm optimization (CPSO) with a disturbance
operation, named MO-CPSO, is proposed. MO-CPSO adopts the correlative
processing strategy to maintain population diversity, and introduces the disturbance
operation on the best particles found so far to improve the convergence accuracy
of solutions.
The remainder of this paper is organized as follows. Basic concepts of MOO are
described in section 2. CPSO is introduced in section 3. MO-CPSO is presented in
section 4. Experimental results on some benchmark optimization problems are
discussed in section 5. Conclusions are drawn in section 6.

2 Basic Concepts of MOO

In general, many real-world applications involve complex optimization problems with
various competing specifications and constraints. Without loss of generality, we
consider a minimization problem with decision space Y, which is a subset of the real
numbers. For the minimization problem, the aim is to find a parameter vector y such that

Min_{y∈Y} F(y),  y ∈ R^D    (1)

where y = [y1, y2, . . . , yD] is a vector with D decision variables and F = [f1, f2, . . . , fM]
are M objectives to be minimized.
In the absence of any preference information, a set of solutions is obtained, where
each solution is equally significant. Pareto dominance and Pareto optimality are
defined as follows:

Definition 1 (Pareto dominance). A solution y = [y1, y2, …, yD] is said to dominate
another solution z = [z1, z2, …, zD] if both statements below are satisfied:

1. The solution y is no worse than z in all objectives, i.e., fi(y) ≤ fi(z) for all i ∈ {1, 2, …,
M}.
2. The solution y is strictly better than z in at least one objective, i.e., fi(y) < fi(z) for at
least one i ∈ {1, 2, …, M}.

Definition 2 (Pareto optimality). For a general MO problem, a given solution y ∈ F
(where F is the feasible solution space) is Pareto optimal if, and only if, there is
no z ∈ F that dominates y.

Definition 3 (Pareto front). The front obtained by mapping the Pareto optimal set (OS)
into the objective space is called the POF.

POF = { F = (f1(x), …, fM(x)) | x ∈ OS }    (2)

The determination of the complete POF is a very difficult task, owing to the
presence of a large number of suboptimal Pareto fronts. Considering the existing
memory constraints, the determination of the complete Pareto front becomes
infeasible, and the solutions are instead required to be diverse, covering the maximum
possible region of the front.

3 Correlative Particle Swarm Optimization (CPSO)

3.1 Standard PSO (SPSO)

A swarm in PSO consists of a number of particles. Each particle represents a potential


solution of the optimization task. Each particle adjusts its velocity according to the
past best position pbest and the global best position gbest in such a way that it
accelerates towards positions that have had high objective (fitness) values in previous
iterations. The position vector and velocity vector of the particle i in N-dimensional
space can be indicated as xi =(xi1,…, xin , …, xiN) and vi=(vi1,…, vin , … ,viN)
respectively. The updating velocity and position of the particles are calculated using
the following two equations:
Vi,j(t+1) = w·Vi,j(t) + c1·r1i,j·(pbesti,j(t) − Xi,j(t)) + c2·r2i,j·(gbestj(t) − Xi,j(t)) .    (3)

Xi,j(t+1) = Xi,j(t) + Vi,j(t+1) .    (4)

where, w is the inertia weight; c1 and c2 are positive constants known as acceleration
coefficients; Random factors r1 and r2 are independent uniform random numbers in
the range [0,1]. The value of velocity vector vi can be restricted to the range [-vmax,
vmax] to prevent particles from moving out of the search range. vmax represents the
maximal magnitude of the element of velocity vector vi,j.

3.2 Correlative Particle Swarm Optimization (CPSO)

In the SPSO model, a strategy with independent random coefficients is used to process
gbest and pbest. This strategy makes no distinction in how gbest and pbest are exploited, and
lets the cognitive and the social components of the whole swarm contribute randomly to
the position of each particle in the next iteration. In CPSO, correlative factors, i.e.,
correlated random factors, are used to process gbest and pbest and create a relationship
between gbest and pbest.

In [8], Shen and Wang pointed out that positive correlation between the random
factors can maintain population diversity. In order to improve the diversity of
solutions, the random factors are positively correlated in this paper. The Gaussian
copula is used to describe the correlated random factors. The Gaussian copula is a
member of the elliptical copula family and is by far the most popular copula used
in the framework of intensity or structural models because it is easy to
simulate. The updating velocity of the particle is calculated using the following
equation:

Vi,j(t+1) = w·Vi,j(t) + c1·r1i,j(t)·(pbesti,j(t) − Xi,j(t)) + c2·r2i,j(t)·(gbestj(t) − Xi,j(t))
H(r1i,j(t), r2i,j(t)) = Φρ(Φ⁻¹(r1i,j(t)), Φ⁻¹(r2i,j(t))),  ρ > 0    (5)

where H is the joint distribution function of the correlative factors, Φρ denotes the joint
distribution function of a standard 2-dimensional normal random vector with
correlation determined by ρ, Φ is the univariate standard normal distribution function,
and Φ⁻¹ is its inverse. ρ denotes the correlation coefficient between the correlated
random factors r1 and r2, where 0 < ρ < 1.
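For illustration, the correlated factors r1 and r2 of Eq. (5) can be simulated from the Gaussian copula as in the Python sketch below (using SciPy); it follows the standard inverse-transform and Cholesky construction, which is the same procedure listed later in Fig. 1.

import numpy as np
from scipy.stats import norm

def correlated_factors(rho, size):
    """Two U(0,1) random factors coupled by a Gaussian copula with correlation rho."""
    t1, t2 = np.random.rand(size), np.random.rand(size)   # independent uniforms
    k1, k2 = norm.ppf(t1), norm.ppf(t2)                   # map to standard normal variates
    k2 = rho * k1 + np.sqrt(1.0 - rho ** 2) * k2          # Cholesky step imposes the correlation
    return norm.cdf(k1), norm.cdf(k2)                     # map back to the uniform scale

r1, r2 = correlated_factors(0.95, 100000)
print(np.corrcoef(r1, r2)[0, 1])                          # empirical correlation close to 0.95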

4 MO-CPSO

In single-objective problems, the term gbest represents the best solution obtained by
the whole swarm. In MO problems, more than one conflicting objective must be
optimized simultaneously, and the number of non-dominated solutions located on or near
the Pareto front will be more than one. To resolve this problem, the concept of non-
dominance is used and an archive of non-dominated solutions is maintained, from
which a solution is picked as the gbest in MO-CPSO. The historical archive stores
non-dominated solutions to prevent the loss of good particles. The archive is updated
at each cycle: if a candidate solution is not dominated by any member of the
archive, it is added to the archive; likewise, any archive members dominated by
this solution are removed from the archive.
In MO problems, there are many non-dominated solutions located on
the Pareto front. This paper introduces a disturbance operation on the non-dominated
solutions in the archive to try to find better solutions or other non-dominated
solutions. The disturbance operation randomly selects m non-dominated solutions
from the archive and adds noise to their positions, as shown in (6):

Xi,j(t) = Xi,j(t) + b · η · Xi,j(t)    (6)

where b is a positive constant and η is Gaussian noise with mean 0 and variance 1.
MO-CPSO is described in Fig. 1.

/ Ns: size of the swarm; MaxIter: maximum number of iterations; d: the dimensions of the search space /
(1) t = 0, randomly initialize S0; / St: swarm at iteration t /
    - initialize xi,j, i ∈ {1,…,Ns} and j ∈ {1,…,d} / xi,j: the j-th coordinate of the i-th particle /
    - initialize vi,j, i ∈ {1,…,Ns} and j ∈ {1,…,d} / vi,j: the velocity of the i-th particle in the j-th dimension /
    - pbesti ← xi, i ∈ {1,…,Ns} / pbesti: the coordinate of the personal best of the i-th particle /
(2) Evaluate each of the particles in S0.
(3) A0 ← non_dominated(S0) / returns the non-dominated solutions from the swarm; At: archive at iteration t /
(4) for t = 1 to MaxIter:
    - for i = 1 to Ns / update the swarm /
      / generating correlative factors /
      Set the correlation coefficient ρ of the correlative factors and generate correlative factors that are uniform in the range [0,1]:
      (a) Draw two independent random variables t1, t2 from U(0,1)
      (b) Calculate k1 = Φ⁻¹(t1), k2 = Φ⁻¹(t2), where Φ(·) is the standard normal distribution function;
      (c) Perform the Cholesky transformation: k1 = k1, k2 = ρ·k1 + (1 − ρ²)^0.5·k2;
      (d) Calculate r1 = Φ(k1), r2 = Φ(k2); r1 and r2 are thus simulated from the elliptical copula with correlation coefficient ρ
      / updating the velocity of each particle /
      - vi = w·vi + c1·r1·(pbesti − xi) + c2·r2·(Regb − xi)
        / Regb is the non-dominated solution that is randomly taken from the archive /
      / updating coordinates /
      - xi = xi + vi
(5) Evaluate each of the particles in St.
(6) / updating the archive /
    At ← non_dominated(St).
(7) / disturbance operation on the randomly selected solutions in At /
    At ← Selected(non_dominated(St))·(1 + b·η);
(8) END
Return At

Fig. 1. Pseudo code of MO-CPSO



5 Experimental Results

5.1 Test Functions and Parameters Setting


In the context of MOO, the benchmark problems must pose sufficient difficulty to
impede searching for the Pareto optimal solutions.
In this paper, four benchmark problems are selected to test the performance of the
proposed MO-CPSO. The definition of these test functions is summarized in
Table 1.
In this experiment, the maximum fitness evaluation (FE) is set at 10000. The
population size is set at 100 for all problems. The correlation coefficient ρ is set at
0.95. The parameter b in the disturbance operation is set at 0.05.

Table 1. Test functions

Test functions    Definition

Schaffer's study (SCH): Minimize F = (f1(x), f2(x)), where f1(x) = x², f2(x) = (x − 2)², x ∈ [−10³, 10³].

Fonseca and Fleming's study (FON): Minimize F = (f1(x), f2(x)), where
f1(x) = 1 − exp(−Σ_{i=1..3} (xi − 1/√3)²), f2(x) = 1 − exp(−Σ_{i=1..3} (xi + 1/√3)²), xi ∈ [−4, 4], i = 1, 2, 3.

Poloni's study (POL): Minimize F = (f1(x), f2(x)), where
f1(x) = 1 + (A1 − B1)² + (A2 − B2)², f2(x) = (x1 + 3)² + (x2 + 1)²,
A1 = 0.5 sin 1 − 2 cos 1 + sin 2 − 1.5 cos 2, A2 = 1.5 sin 1 − cos 1 + 2 sin 2 − 0.5 cos 2,
B1 = 0.5 sin x1 − 2 cos x1 + sin x2 − 1.5 cos x2, B2 = 1.5 sin x1 − cos x1 + 2 sin x2 − 0.5 cos x2,
xi ∈ [−π, π], i = 1, 2.

Kursawe's study (KUR): Minimize F = (f1(x), f2(x)), where
f1(x) = Σ_{i=1..2} [−10 exp(−0.2 √(xi² + x_{i+1}²))], f2(x) = Σ_{i=1..3} [|xi|^0.8 + 5 sin(xi³)], xi ∈ [−5, 5], i = 1, 2, 3.

5.2 Performance Metrics

Knowledge of the Pareto front of a problem provides an alternative for selection from
a list of efficient solutions. It thus helps in decision making, and the knowledge
gained can also be used in situations where the requirements are continually changing. In
order to provide a quantitative assessment of the performance of an MO optimizer, two
issues are taken into consideration, i.e., the convergence to the Pareto-optimal set and
the maintenance of diversity in the solutions of the Pareto-optimal set. In this paper,
the convergence metric γ [7] and the diversity metric δ [7] are used as quantitative measures.
The convergence metric measures the extent of convergence of the obtained set
of solutions: the smaller the value of γ, the better the convergence toward the
POF. The diversity metric measures the spread of solutions lying on the POF:
for the most widely and uniformly spread-out set of non-dominated solutions, the
diversity metric δ is very small.

5.3 Experimental Results and Discussions

Results for the convergence metric and the diversity metric obtained using
MO-CPSO are given in Tables 2 and 3, where the results of NSGA-II, MOPSO and
MOIPSO come from Ref. [7]. From these results, it is evident that MO-CPSO
converges better than the other three algorithms. In order to clearly visualize the
quality of the solutions obtained, figures have been plotted for the obtained Pareto fronts
together with the POF. As can be seen from Fig. 2, the front obtained by MO-CPSO has a
high extent of coverage and uniform diversity for all test problems. In a word, the
performance of MO-CPSO is better in both the convergence metric and the diversity metric.
It must be noted that MOPSO adopts an adaptive mutation operator and an adaptive-grid
division strategy to improve its search potential, while MOIPSO adopts search
methods including an adaptive-grid mechanism, a self-adaptive mutation operator,
and a novel decision-making strategy to enhance the balance between the exploration and
exploitation capabilities. MO-CPSO only adopts the disturbance operation to solve MOO
problems, and no other parameters are introduced.

Table 2. Results of the convergence metric for test problems

Test function    γ    NSGA-II    MOPSO    MOIPSO    MO-CPSO


Mean 3.382e-3 3.242e-3 2.638e-3 2.067e-3
SCH
Std 1.100e-4 4.900e-4 4.410e-4 8.102e-5
Mean 1.914e-3 1.806e-3 1.517e-3 1.448e-3
FON
Std 1.906e-3 1.100e-3 3.000e-4 3.437e-4
Mean 1.734e-2 1.694e-2 1.253e-2 1.241e-2
POL
Std 0 2.300e-6 1.400e-6 2.858e-6
Mean 2.954e-2 2.647e-2 3.128e-2 2.464e-2
KUR
Std 2.300e-4 2.700e-4 4.500e-4 2.381e-4

Table 3. Results of the diversity metric for test problems

Test function    δ    NSGA-II    MOPSO    MOIPSO    MO-CPSO


Mean 4.692e-1 4.524e-1 4.388e-1 3.931e-1
SCH
Std 3.486e-3 3.570e-3 3.430e-3 3.121e-3
Mean 3.804e-1 3.729e-1 3.162e-1 2.711e-1
FON
Std 8.090e-4 8.500e-3 1.140e-4 1.021e-4
Mean 3.672e-1 3.726e-1 3.140e-1 3.055e-1
POL
Std 8.640e-4 2.435e-3 1.980e-4 2.721e-4
Mean 4.279e-1 4.106e-1 4.541e-1 3.955e-1
KUR
Std 1.180e-3 8.470e-4 1.200e-3 1.105e-3

Fig. 2. Pareto solutions of MOPSO and MO-CPSO: (a) SCH, (b) FON, (c) POL, (d) KUR

6 Conclusion

In this paper, we propose MO-CPSO for multi-objective problems, where the


correlative processing strategy is used to maintain population diversity, and the
disturbance operation is adopted to improve convergence accuracy of solutions.
Experimental results show that the proposed algorithm can find solutions with good
diversity and convergence, and is an efficient approach for complex multi-objective
optimization problems.

Acknowledgments. This paper is supported by Chongqing Key Lab of Computer


Network and Communication Technology No.CY-CNCL-2009-03.

References
1. Schaffer, J.D.: Multiple objective optimization with vector evaluated genetic algorithms.
PhD thesis, Vanderbilt University (1984)
2. Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: A comparative case study
and the strength Pareto approach. Transactions on Evolutionary Computation 3(4), 257–
271 (2000)
3. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic
algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197
(2002)
4. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceeding of International
Conference on Neural Networks, pp. 1942–1948. IEEE Press, Perth (1995)
5. Coello, C.A.C., Pulido, G.T., Lechuga, M.S.: Handling multiple objectives with particle
swarm optimization. IEEE Transactions on Evolutionary Computation 3(3), 256–280
(2004)
6. Liu, D.S., Tan, K.C., Goh, C.K., Ho, W.K.: A multi-objective memetic algorithm based on
particle swarm optimization. IEEE Transaction on Systems, Man and Cybernetics, Part b:
Cybernetics 37(1), 42–61 (2007)
7. Agrawal, S., Dashora, Y., Tiwari, M.K., Son, Y.J.: Interactive particle swarm: a pareto-
adaptive metaheuristic to multiobjective optimization. IEEE Transaction on Systems, Man
and Cybernetics, Part a: Systems and Humans 38(2), 258–278 (2008)
8. Shen, Y.X., Wang, G.Y., Tao, C.M.: Particle swarm optimization with novel processing
strategy and its application. International Journal of Computational Intelligence
Systems 4(1), 100–111 (2011)
A PSO-Based Hybrid Multi-Objective Algorithm for
Multi-Objective Optimization Problems

Xianpeng Wang and Lixin Tang

Liaoning Key Laboratory of Manufacturing System and Logistics, The Logistics Institute,
Northeastern University, Shenyang, 110004, China
[email protected], [email protected]

Abstract. This paper proposes a PSO-based hybrid multi-objective algorithm


(HMOPSO) with the following three main features. First, the HMOPSO takes
the crossover operator of the genetic algorithm as the particle updating strategy.
Second, a propagating mechanism is adopted to propagate the non-dominated
archive. Third, a local search heuristic based on scatter search is applied to
improve the non-dominated solutions. Computational study shows that the
HMOPSO is competitive with previous multi-objective algorithms in literature.

Keywords: Multi-objective optimization, hybrid particle swarm optimization.

1 Introduction

Many kinds of MO evolutionary algorithms (MOEAs) have been proposed and


widely used. These MOEAs range from traditional EAs such as genetic algorithm
(GA), evolution strategy, and genetic programming, to newly developed techniques
such as the NSGA-II [1], the PAES [2], the SPEA2 [3], the MO scatter search [4], and
so on. In recent years, many particle swarm optimization (PSO) algorithms have been
proposed for solving MO problems, and the computational results show that PSO is
very suitable for MO problems. Previous researches on MOPSO can be classified
into the following four categories.
The main mechanism of PSO is that each particle flies with the guidance of pbest
and gbest, so the selection and update of pbest and gbest are very critical for MOPSO
algorithms. Hu and Eberhart [5] proposed a dynamic neighborhood PSO, in which in
each iteration each particle first finds its new neighbors by calculating the distances to
every other particle and then takes the local best one in its new neighbors as gbest. A
sigma method was proposed by Mostaghim and Teich [6] to find the appropriate gbest.
Coello et al. [7] proposed to adopt an external archive, which stores the best non-
dominated particles found so far, to guide the flight of particles.
Unlike general MOPSO algorithms that use a single swarm, the second category
divides the single swarm into sub-swarms so as to improve the diversity of MOPSO.
Chow and Tsui [8] presented a modified MOPSO by considering each objective
function as a species swarm. Yen and Leong [9] proposed a dynamic multiple swarm
MOPSO, in which the number of sub-swarms is adaptively adjusted throughout the
search process via the dynamic swarm strategy.


The third category combines or incorporates the advantages of other EA algorithms


or local search algorithms into MOPSO to improve its exploration and exploitation
abilities. Li [10] incorporated the main mechanisms of the NSGA-II into PSO, and
developed a hybrid MOPSO. Srinivasan and Seow [11] developed the particle swarm
inspired evolutionary algorithm (PS-EA) that is a hybrid between PSO and EA. Liu
et al. [12] adopted the local search and proposed a memetic algorithm based on PSO,
in which a new particle updating strategy is adopted based on the concept of fuzzy
global-best to avoid premature convergence and maintain diversity.
The fourth category focuses on the influence of variations of parameters on the
performance of MOPSO and adopts adaptive parameter control strategy. Tripathi et
al. [13] proposed the time variant MOPSO, in which the vital parameters (i.e., w, c1,
c2) change adaptively with iterations so as to help the algorithm to explore the search
space more efficiently.
In this paper, we propose a new multi-objective optimization algorithm based on
PSO and denote it as HMOPSO.
The rest of this paper is organized as follows. Section 2 presents the details of the
proposed HMOPSO. Section 3 reports and analyzes the computational results on
benchmark problems. Finally, the paper is concluded in the last section.

2 Proposed HMOPSO Algorithm

2.1 Algorithm Overview

The implementation of the HMOPSO can be presented in Fig. 1. In the following of


the paper, we use EXA to represent the external archive and PBA[i] to represent the
personal best archive of each particle i.

2.2 EXA Propagating Mechanism

As mentioned in the literature review of MOPSO, much previous research focused
on the selection of gbest (or guiding solutions) from the EXA, but few studies took into
account the quality of the EXA (i.e., the number and diversity of non-dominated
solutions stored in it). In our experiments, we found that in the early iterations
of canonical MOPSO algorithms the EXA generally has very few non-dominated
solutions stored in it. Since there are only very few non-dominated solutions to be
selected as gbest, which has a significant influence on the flight of particles, particles
in the swarm tend to be attracted by the same non-dominated solution or very close
non-dominated solutions; thus the swarm may converge quickly and get stuck at a local
optimal front (especially for MO problems with many local optimal fronts). Therefore,
in this paper we propose a propagating mechanism that propagates the EXA whenever
the number of non-dominated solutions in it is too small, so as to avoid premature
convergence and improve the exploration ability of MOPSO.
Let nEXA denote the maximum size of the EXA and |EXA| denote the current size of
the EXA, and then the propagating mechanism can be described as follows.

Begin:
Initialization:
1. Set the termination criterion, and initialize the values of parameters such as the size of
the population, the size of EXA, the size of PBA[i] (note that all particles have the same
size of PBA[i]), the mutation probability.
2. Set EXA and PBA[i] to be empty.
3. Randomly initialize all the particles in the swarm.
4. Evaluate each particle in the swarm, and store each particle i in PBA[i].
5. Store the non-dominated particles in the swarm in EXA.
while (the termination criterion is not reached) do
1. EXA-propagating-mechanism () % extend EXA when necessary %
2. Particle-flight-mechanism () % particle flight using crossover %
3. Particle-mutation ()
4. Evaluate each particle in the swarm.
5. for each particle i in the swarm
PBA[i]-update-strategy ()
End for
6. for each non-dominated particle i in the swarm
EXA-update-strategy () using the non-dominated particle i
End for
7. EXA-improvement () % local search on EXA %
End while
Report the obtained non-dominated solutions in EXA.
End

Fig. 1. The main procedure of HMOPSO

Step 1. If |EXA| = 1, go to Step 2; otherwise, go to Step 3.


Step 2. Perturb the single solution in EXA for nEXA times to generate other nEXA new
solutions.
Step 3. Randomly select two solutions in EXA, and use the simulated binary
crossover (SBX) operator to generate two offspring solutions. Select the
best non-dominated one as the new solution. Repeat this procedure until
nEXA new solutions are obtained.
Step 4. Use the EXA-update-strategy described in the following section 3.7 to
update the EXA with the nEXA new solutions.

2.3 Particle Flight Mechanism

The canonical MOPSO algorithm updates particles using the flight equations, which
involve three parameters, i.e., w, c1, and c2. When combined with other algorithms,
a hybrid MOPSO will have even more parameters, which causes great difficulty in
parameter tuning. Therefore, in this paper we do not follow the canonical flight
equations, but prefer to adopt a new strategy based on the SBX operator of GA. Since
there have been many research reports on the parameters of the SBX
operator, it is reasonable to follow the suggested settings, and thus there are no
parameters to be tuned in the adopted update mechanism.
strategy has two simple steps: (1) randomly select a pbest from PBA[i], use the
crossover operator to generate two offspring solutions from the particle i and its
selected pbest, and then randomly select a gbest from the EXA; and (2) use the SBX
crossover operator to generate two offspring solutions from the selected pbest and gbest,
and then select the best one based on Pareto dominance as the new particle. A sketch
of the SBX operator follows.
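A compact sketch of the SBX operator as commonly implemented (Python); the distribution index eta, the per-variable crossover rate, and the clipping of offspring to the bounds are illustrative defaults rather than the exact settings used in this paper.

import numpy as np

def sbx(p1, p2, eta=20.0, low=0.0, high=1.0, per_var=0.5):
    """Simulated binary crossover on real-coded parent vectors p1 and p2."""
    c1, c2 = np.array(p1, dtype=float), np.array(p2, dtype=float)
    for j in range(len(p1)):
        if np.random.rand() > per_var:                    # crossover applied per variable
            continue
        u = np.random.rand()
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        c1[j] = 0.5 * ((1 + beta) * p1[j] + (1 - beta) * p2[j])
        c2[j] = 0.5 * ((1 - beta) * p1[j] + (1 + beta) * p2[j])
    return np.clip(c1, low, high), np.clip(c2, low, high)

The spread factor beta concentrates offspring near the parents for large eta, which is one reason no additional step-size parameters need to be tuned in this update mechanism.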

2.4 Particle Mutation

To improve the search diversity, our HMOPSO algorithm also uses the mutation
operation. For each dimension of each particle in the swarm, we first generate a
random number rnd in [0, 1]. If rnd < pm (the mutation probability), then the
polynomial mutation operator in [2] is used to mutate this dimension.

2.5 PBA[i] Update Strategy

The HMOPSO algorithm adopts a personal best archive PBA[i] for each particle i so as
to keep a good memory of the search history of particle i and at the same time
improve the search diversity. For simplicity, the maximum size of each PBA[i] is set
to the same value, nPBA. When particle i obtains a new position, it is directly stored in
PBA[i] if the current size of PBA[i] is smaller than nPBA; otherwise, we first
store it in PBA[i] and then randomly remove one solution from PBA[i].

2.6 EXA Update Strategy

One major requirement of an MO algorithm is that the solutions in the obtained EXA
should be uniformly distributed along the Pareto front in the objective space.
Therefore, the crowding-distance mechanism of NSGA-II in [2] is adopted to
maintain the diversity of the EXA (a sketch of the crowding-distance computation is
given after the steps below).
Since the EXA has been initialized by the non-dominated particles in the swarm at
the first iteration, for a given non-dominated particle (e.g., particle i) in the current
swarm at iteration t, the update procedure of the EXA can be described as follows.
swarm at iteration t, the update procedure of the EXA can be described as follows.
Step 1. If particle i is dominated by one solution in the EXA, then discard particle i.
Step 2. If particle i is not dominated by any solution in the EXA, store it in the EXA
and then remove all solutions that are dominated by it from the EXA.
Step 3. If |EXA| > nEXA (the maximum size of the EXA), calculate the crowding
distance of all solutions in the EXA, and then remove the most crowded
solution (i.e., the solution with the least crowding distance) from the EXA.
Repeat this step until |EXA| = nEXA.
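As referenced above, the crowding distance can be computed as in the Python sketch below; F is an n x m array of the archive members' objective values, and this is the standard NSGA-II formulation rather than code from the paper.

import numpy as np

def crowding_distance(F):
    """Crowding distance of each of n solutions with m objectives (F is n x m)."""
    F = np.asarray(F, dtype=float)
    n, m = F.shape
    dist = np.zeros(n)
    for j in range(m):
        order = np.argsort(F[:, j])
        dist[order[0]] = dist[order[-1]] = np.inf         # boundary solutions are never most crowded
        span = F[order[-1], j] - F[order[0], j]
        if span == 0.0:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (F[order[k + 1], j] - F[order[k - 1], j]) / span
    return dist

The member with the smallest crowding distance is the most crowded one and is the first to be removed in Step 3.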

2.7 EXA Improvement

Since the selection of gbest from the EXA has significant influence on the performance
of MOPSO, it is then clear that the improvement on the EXA can improve the
MOPSO because this will help to provide better candidate solutions to be selected as
gbest. Motivated by this idea, we develop a local search heuristic named the EXA-
improvement to further improve the quality of the EXA, i.e., the distance of the EXA to
the true Pareto front and the diversity of solutions in the EXA. This local heuristic can
be viewed as a simplified version of scatter search (SS) because it adopts the concept
of reference set (denoted as REF) of SS in [14].

To give the EXA-improvement method, we first define the distance between solutions x and
y of a MOP as d(x, y) = Σ_{i=1..k} |fi(x) − fi(y)| / (fi^max − fi^min), in which fi^max
and fi^min are the maximum and minimum values of the i-th objective function in the
EXA, respectively (a small sketch of this distance is given after the step list below).
Let nREF denote the maximum size of the REF; then the EXA-improvement
method can be described as follows.
Step 1. Construct REF
Step 1.1 If |EXA| = nREF, store all solutions in EXA in the REF.
Step 1.2 If |EXA| < nREF, store all solutions in EXA to REF, and then use the
non-dominated sorting method to classify the particles in the swarm
into different levels. Starting from front 1, randomly select a
particle and store it in REF, until |REF| = nREF.
Step 1.3 If |EXA| > nREF, then perform the following procedures.
Step 1.3.1 Calculate the crowding distance of each solution in the EXA and
then store them in the non-ascending order of their crowding
distances in a list L.
Step 1.3.2 Select the first nREF / 2 solutions in L and add them to REF, and
then delete them from L.
Step 1.3.3 Select the solution p with the maximum value of the minimum
distance to REF from L, add it to REF and then delete it from L.
Repeat this step until another nREF/2 solutions are added to REF.
Step 2. Generate new solutions from REF. For each pair of solutions in REF, apply
           the SBX operator to generate two offspring solutions and keep the better
           one as a new solution. In total, nREF(nREF − 1)/2 new solutions are generated.
Step 3. Update the EXA. Use the obtained nREF(nREF − 1)/2 new solutions to update
           the EXA based on the EXA update strategy of Section 2.6.
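A Python sketch of the REF construction of Step 1.3 is shown below, using the distance d(x, y) defined above; the helper names, the objs attribute and the small constant added to the denominator are assumptions of ours, not the paper's code.

    def ref_distance(x, y, f_min, f_max):
        # d(x, y): objective-wise differences normalized by the EXA ranges.
        return sum(abs(fx - fy) / (fmax - fmin + 1e-12)
                   for fx, fy, fmax, fmin in zip(x.objs, y.objs, f_max, f_min))

    def build_ref(exa, crowding_distance, n_ref, f_min, f_max):
        # Step 1.3.1: sort the EXA by crowding distance (largest, i.e. least crowded, first).
        dist = crowding_distance(exa)
        L = [s for _, s in sorted(zip(dist, exa), key=lambda t: -t[0])]
        # Step 1.3.2: the nREF/2 least crowded solutions enter REF directly.
        ref = [L.pop(0) for _ in range(n_ref // 2)]
        # Step 1.3.3: repeatedly add the solution farthest from REF (max-min distance).
        while len(ref) < n_ref and L:
            p = max(L, key=lambda s: min(ref_distance(s, r, f_min, f_max) for r in ref))
            ref.append(p)
            L.remove(p)
        return ref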

3 Computational Experiments
We adopt the Generational Distance (GD), the Spacing (SP), and the Maximum Spread
(MS) to evaluate the algorithm's performance. Based on preliminary experimental results, the
following parameter setting is adopted: npop = 100, nEXA = 100, nREF = 10, nPBA = 5, and
nprop = 0.3.
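For reference, a Python sketch of two of these metrics is shown below, using one common formulation of GD and Schott's spacing; the exact definitions used in the paper may differ slightly, and points are assumed to be tuples of objective values.

    import math

    def generational_distance(front, true_front):
        # GD: average Euclidean distance from each obtained point to the true front.
        total = sum(min(math.dist(p, q) for q in true_front) for p in front)
        return total / len(front)

    def spacing(front):
        # SP: standard deviation of each point's (Manhattan) distance to its nearest neighbour.
        d = [min(sum(abs(a - b) for a, b in zip(p, q))
                 for q in front if q is not p)
             for p in front]
        mean = sum(d) / len(d)
        return math.sqrt(sum((x - mean) ** 2 for x in d) / (len(d) - 1))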
HMOPSO is compared with powerful, state-of-the-art algorithms: NSGA-II [2],
the MOPSO of [7] (denoted cMOPSO), and the MOPSO with crowding distance [15]
(denoted MOPSO-CD). These three algorithms are selected because they have proven
to be very effective and are often used by other researchers. In this experiment, the
maximum runtime is used as the stopping criterion because all the algorithms are
written in C++ and run on the same computer. In addition, 30 independent runs were
carried out for each problem and the best one is reported.
The computational results for each test problem are given in Figures 2-5. Based on
these results, it is clear that the proposed HMOPSO outperforms the other algorithms.
In addition, the proposed HMOPSO can reach the true Pareto fronts of all test MOPs
and shows very robust performance. Among the rival algorithms, NSGA-II can also
reach the true Pareto fronts of all test MOPs, but the distribution of its non-dominated
solutions is not as good as that obtained by the proposed HMOPSO. The cMOPSO and
MOPSO-CD cannot reach the true Pareto fronts of the ZDT series of problems; in
particular, they get trapped in local optima on ZDT4. For the other test MOPs, they
show poor performance on the distribution of the obtained non-dominated solutions.
With the incorporation of the crowding distance of NSGA-II, MOPSO-CD obtains
significant improvements over cMOPSO.

Fig. 2. Pareto fronts produced by different algorithms for ZDT1 and ZDT2


Fig. 3. Pareto fronts produced by different algorithms for ZDT3 and ZDT4


Fig. 4. Pareto fronts produced by different algorithms for ZDT6 and KUR


Fig. 5. Pareto fronts produced by different algorithms for Deb2 and KITA

4 Conclusion
In this paper, we investigated improvements to the canonical MOPSO algorithm
and proposed three main strategies. First, the traditional update equations for
particles' positions are replaced by a new particle flight mechanism based on the
crossover operator of GA. Second, motivated by the observation that there are few
non-dominated solutions for some problems at the start of MOPSO, we proposed a
propagating mechanism to improve the quality and diversity of the external archive.
Third, a modified version of scatter search was adopted as the local search to improve
the external archive. In addition, we adopted a design-of-experiments (DOE) method to
analyze the influence of each parameter and their interactions on the performance of
our HMOPSO algorithm. In the comparative study, HMOPSO was compared against
existing state-of-the-art multi-objective algorithms on benchmark test problems. The
results indicate that our HMOPSO algorithm is competitive with or superior to NSGA-II,
and much better than the two MOPSO algorithms from the literature on all
benchmark problems.

Acknowledgements
This research is supported by Key Program of National Natural Science Foundation
of China (71032004), National Natural Science Foundation of China (70902065),
National Science Foundation for Post-doctoral Scientists of China (20100481197),
and the Fundamental Research Funds for the Central Universities (N090404018).

References
1. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic
algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197
(2002)
2. Knowles, J.D., Corne, D.W.: Approximating the nondominated front using the Pareto
archived evolution strategy. Evolutionary Computation 8, 149–172 (2000)

3. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto evolutionary
algorithm. Computer Engineering Networks Lab (TIK), Swiss Federal Institute of
Technology (ETH), Zurich, Switzerland, Technical Report, 103 (2001)
4. Nebro, A.J., Luna, F., Alba, E., Dorronsoro, B., Durillo, J.J., Beham, A.: AbYSS -
Adapting scatter search to multiobjective optimization. IEEE Transactions on Evolutionary
Computation 12(4), 439–457 (2008)
5. Hu, X., Eberhart, R.C.: Multiobjective optimization dynamic neighborhood particle swarm
optimization. In: Proceedings of Congress on Evolutionary Computation, pp. 1677–1681
(2002)
6. Mostaghim, S., Teich, J.: Strategies for finding local guides in multi-objective particle
swarm optimization (MOPSO). In: Proceedings of IEEE Swarm Intelligence Symposium,
pp. 26–33 (2003)
7. Coello, C.A.C., Pulido, G.T., Lechuga, M.S.: Handling multiple objectives with particle
swarm optimization. IEEE Transactions on Evolutionary Computation 8(3), 256–279
(2004)
8. Chow, C.K., Tsui, H.T.: Autonomous agent response learning by a multi-species particle
swarm optimization. In: Proceedings of Congress on Evolutionary Computation, pp. 778–
785 (2004)
9. Yen, G.G., Leong, W.F.: Dynamic multiple swarms in multiobjective particle swarm
optimization. IEEE Transactions on Systems, Man, and Cybernetics – Part A 39(4), 890–
911 (2009)
10. Goh, C.K., Tan, K.C., Liu, D.S., Chiam, S.C.: A competitive and cooperative co-
evolutionary approach to multi-objective particle swarm optimization algorithm design.
European Journal of Operational Research 202(1), 42–54 (2010)
11. Li, X.D.: A non-dominated sorting particle swarm optimizer for multiobjective
optimization. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O’Reilly, U.-M.,
Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter,
M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO
2003. LNCS, vol. 2723, pp. 37–48. Springer, Heidelberg (2003)
12. Srinivasan, D., Seow, T.H.: Particle swarm inspired evolutionary algorithm (PS-EA) for
multiobjective optimization problem. In: Proceedings of Congress on Evolutionary
Computation, pp. 2292–2297 (2003)
13. Tripathi, P.K., Bandyopadhyay, S., Pal, S.K.: Multi-Objective Particle Swarm
Optimization with time variant inertia and acceleration coefficients. Information
Sciences 177(22), 5033–5049 (2007)
14. Martí, R., Laguna, M., Glover, F.: Principles of scatter search. European Journal of
Operational Research 169(2), 359–372 (2006)
15. Raquel, C.R., Naval Jr., P.C.: An effective use of crowding distance in multiobjective
particle swarm optimization. In: Proceedings of Conference on Genetic Evolutionary
Computation, pp. 257–264 (2005)
The Properties of Birandom Multiobjective
Programming Problems

Yongguo Zhang1, Yayi Xu2, Mingfa Zheng2, and Liu Ningning1

1 College of Elementary Education, Xingtai University, Xingtai, Hebei, 054001, China
2 College of Science, Air Force Engineering University, Xi'an, Shaanxi, 710051, China
{yongguo924,mingfa103}@163.com, {mingfazheng,lnn}@126.com

Abstract. This paper is devoted to the multiobjective programming
problem based on birandom theory. We first propose the birandom
multiobjective programming (BRMOP) problem and its expected value
model. Then we present the concepts of non-inferior solutions, called
expected-value efficient solutions and expected-value weak efficient
solutions, and discuss their properties. The results obtained in this
paper can provide a theoretical basis for designing algorithms to solve
the BRMOP problem.

Keywords: birandom variable, multiobjective programming, expected-value
efficient solution.

1 Introduction
Multiobjective programming problems have been studied by many researchers,
e.g., [2], [7], [8]. For a given multiobjective problem, an absolute optimal solution
that optimizes every objective function simultaneously usually does not exist, so
we consider non-inferior solutions in some sense, of which Pareto optimal
solutions are the ones in common use.
There are various types of uncertainty in real-world problems. As is well
known, random phenomena form one class of uncertain phenomena that has been
well studied. Based on probability theory, stochastic multiobjective programming
problems have been presented, e.g., in [1], [10].
In a practical decision-making process, we often face a hybrid uncertain envi-
ronment in which different kinds of uncertainty coexist. For examples of such two-
fold uncertainty, we may refer to [9], Liu [3], [4], Liu [5], Liu and Liu [6], Yazenin
[11]. To deal with this twofold uncertainty, it is necessary to employ biran-
dom theory [9]. Multiobjective programming in a birandom environment has
not been well developed; therefore, following the idea of stochastic multiobjec-
tive programming, this paper is devoted to the birandom multiobjective programming
(BRMOP) problem based on birandom theory. For the birandom param-
eters, we consider their expectations, which convert the BRMOP problem into
the expected-value model of birandom multiobjective programming (EVBRMOP), which
is a deterministic multiobjective problem. From this deterministic problem we
can obtain the expected-value efficient solutions or expected-value weak effi-
cient solutions to the BRMOP problem; their relations are also discussed.
These results can provide a theoretical basis for designing algorithms to solve the
proposed problem.
This paper is organized as follows. The next section provides a brief review of
the related concepts and results in birandom theory. Section 3 presents the BR-
MOP problem and its expected value model; based on this model, the
expected-value efficient solution and expected-value weak efficient solution to
the BRMOP problem are proposed, and their properties are discussed.
Finally, Section 4 provides a summary of the main results of this paper.

2 Preliminaries
Let ξ be a random variable defined on the probability space (Ω, Σ, Pr), where
Ω is a universe, Σ is a σ-algebra of subsets of Ω, and Pr is a probability
measure defined on (Ω, Σ).

Definition 2.1 [9]. A birandom variable is a function ξ from the probability space
(Ω, Σ, Pr) to the set of random variables such that Pr{ξω ∈ B} is a measurable
function of ω for any Borel set B of ℝ.

Proposition 2.1 [9]. Assume that ξ is a birandom variable defined on the proba-
bility space (Ω, Σ, Pr). Then, for ω ∈ Ω, we have
(1) Pr{ξω ∈ B} is a random variable for any B ∈ B(ℝ);
(2) E[ξω] is a random variable provided that E[ξω] is finite for each fixed ω ∈ Ω.
Definition 2.2 [9]. Let ξ be a birandom variable defined on the probability space
(Ω, Σ, Pr). The expected value E[ξ] is defined by

E[ξ] = E[E[ξω]] = ∫_0^∞ Pr{E[ξω] ≥ r} dr − ∫_{−∞}^0 Pr{E[ξω] ≤ r} dr    (1)

provided that at least one of the two integrals is finite.


From Eq. (1), we obtain the following expression for the expectation of a birandom
variable:

E[ξ] = Eω[Eω′[ξω(ω′)]], where (ω, ω′) ∈ Ω × Ω′,    (2)

with Ω′ denoting the space on which the random variables ξω are defined.
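As a concrete illustration of Eq. (2), the nested expectation can be approximated by two-level Monte Carlo sampling. The Python sketch below uses an example birandom variable of our own choosing (a normal variable whose mean is itself uniformly distributed); it is not taken from the paper.

    import random

    def birandom_expectation(sample_omega, sample_inner, n_outer=2000, n_inner=200):
        # Estimate E[xi] = E_omega[ E_omega'[ xi_omega(omega') ] ] by nested sampling.
        total = 0.0
        for _ in range(n_outer):
            omega = sample_omega()                                  # outer draw
            inner_mean = sum(sample_inner(omega) for _ in range(n_inner)) / n_inner
            total += inner_mean                                     # approximates E[xi_omega]
        return total / n_outer

    # Example: xi_omega ~ Normal(mu, 1), where the outer draw mu ~ Uniform(0, 1).
    estimate = birandom_expectation(
        sample_omega=lambda: random.uniform(0.0, 1.0),
        sample_inner=lambda mu: random.gauss(mu, 1.0))
    # The estimate should be close to the true expected value 0.5.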

3 Birandom Multiobjective Programming


3.1 Expected Value Model of Birandom Multiobjective
Programming
If y = (y1, y2, ..., yn)T, z = (z1, z2, ..., zn)T ∈ Rn, then we define:
y = z ⇐⇒ yi = zi, i = 1, 2, ..., n;
y > z ⇐⇒ yi > zi, i = 1, 2, ..., n;
y >= z ⇐⇒ yi >= zi, i = 1, 2, ..., n;
y ≥ z ⇐⇒ yi >= zi, i = 1, 2, ..., n, and there exists at least one j0, 1 <= j0 <= n,
such that yj0 > zj0, i.e., y ≠ z.

Similarly, we can define y < z, y <= z, y ≤ z.


Consider the following birandom multiobjective programming (BRMOP) problem:

(BRMOP)   min_{x ∈ Rn}  f(x, ξ) = (f1(x, ξ), f2(x, ξ), ..., fp(x, ξ))T
          s.t.  g(x, ξ) = (g1(x, ξ), g2(x, ξ), ..., gm(x, ξ))T <= 0          (3)
                h(x, ξ) = (h1(x, ξ), h2(x, ξ), ..., hl(x, ξ))T = 0,

where the decision variable x ∈ Rn and ξ is a continuous birandom variable.
For the BRMOP problem, we assume that fj(x, ξω(ω′)), j = 1, 2, ..., p, is a Borel
measurable function on the measure space (Ω, Σ, Pr); then, by the definition of a
birandom variable, Eω′[fj(x, ξω(ω′))] is a random variable (as a function of ω ∈ Ω)
for any given x ∈ Rn.
To solve the BRMOP problem, based on birandom theory, we present the
expected value model of birandom multiobjective programming (EVBRMOP),
which is a deterministic multiobjective programming problem:
min_{x ∈ D} E[f(x, ξ)] = (E[f1(x, ξ)], E[f2(x, ξ)], ..., E[fp(x, ξ)])T,    (4)

where D = {x ∈ Rn | E[g(x, ξ)] = (E[g1(x, ξ)], E[g2(x, ξ)], ..., E[gm(x, ξ)])T <= 0,
E[h(x, ξ)] = (E[h1(x, ξ)], E[h2(x, ξ)], ..., E[hl(x, ξ)])T = 0}.

Theorem 3.1.1. Let ξ be a birandom variable, let f(x, t) and g(x, t) be convex vector
functions in x for any given t, and let h(x, t) be a linear vector function in x. Then
the EVBRMOP problem is a convex programming problem.

Proof. To prove the theorem, it suffices to show that E[f(x, ξ)] is a convex vector
function and that the feasible region D is a convex set. By the assumed conditions,
for any given t we obtain

f(λx1 + (1 − λ)x2, t) ≤ λf(x1, t) + (1 − λ)f(x2, t),

for any λ ∈ [0, 1] and x1, x2 ∈ Rn.


It is evident that the inequality

f(λx1 + (1 − λ)x2, ξ) ≤ λf(x1, ξ) + (1 − λ)f(x2, ξ)    (5)

holds for (ω, ω′) ∈ Ω × Ω′. Taking the expectation with respect to ω′ in inequality
(5) and using the linearity of the expectation operator, we obtain

Eω′[f(λx1 + (1 − λ)x2, ξω(ω′))] ≤ λEω′[f(x1, ξω(ω′))] + (1 − λ)Eω′[f(x2, ξω(ω′))].

Since Eω′[f(x, ξω(ω′))] is a random variable in ω, taking the expectation with respect
to ω and using linearity again gives

Eω[Eω′[f(λx1 + (1 − λ)x2, ξω(ω′))]] ≤ λEω[Eω′[f(x1, ξω(ω′))]] + (1 − λ)Eω[Eω′[f(x2, ξω(ω′))]],

namely,

E[f(λx1 + (1 − λ)x2, ξ)] ≤ λE[f(x1, ξ)] + (1 − λ)E[f(x2, ξ)],

which shows that E[f(x, ξ)] is a convex vector function.
Next we show that the feasible region D is a convex set. If x1, x2 ∈ D,
it follows from the convexity of the vector function g that

g(λx1 + (1 − λ)x2, ξ) ≤ λg(x1, ξ) + (1 − λ)g(x2, ξ),

for any λ ∈ [0, 1].


Similarly, by the linearity of the expectation operator, we obtain

Eω′[g(λx1 + (1 − λ)x2, ξω(ω′))] ≤ λEω′[g(x1, ξω(ω′))] + (1 − λ)Eω′[g(x2, ξω(ω′))].    (6)

It follows from linearity that

Eω[Eω′[g(λx1 + (1 − λ)x2, ξω(ω′))]] ≤ λEω[Eω′[g(x1, ξω(ω′))]] + (1 − λ)Eω[Eω′[g(x2, ξω(ω′))]],

namely, since x1, x2 ∈ D,

E[g(λx1 + (1 − λ)x2, ξ)] ≤ λE[g(x1, ξ)] + (1 − λ)E[g(x2, ξ)] <= 0.    (7)

On the other hand, because h(x, t) is a linear vector function, we obtain

h(λx1 + (1 − λ)x2, ξ) = λh(x1, ξ) + (1 − λ)h(x2, ξ).

Similarly, it follows from the linear properties that

E[h(λx1 + (1 − λ)x2 , ξ)] = λE[h(x1 , ξ)] + (1 − λ)E[h(x2 , ξ)]. (8)

Obviously, by Eq. (7) and Eq. (8), the feasible region D is a convex set.
Hence, the EVBRMOP problem is a convex programming problem. The proof is complete.

3.2 The Expected-Value Non-inferior Solutions and Their Relations


Definition 3.2.1. For the EVBRMOP problem, if x∗ ∈ D, we say that x∗ is an
expected-value absolute optimal solution to the BRMOP problem, whose solution
set is denoted Dab, if it satisfies the following condition: for all x ∈ D,

E[f(x∗, ξ)] <= E[f(x, ξ)],

namely,
E[fj(x∗, ξ)] <= E[fj(x, ξ)], for all j = 1, 2, ..., p.
Definition 3.2.2. For the EVBRMOP problem, if x∗ ∈ D, we say that x∗ is an
expected-value efficient solution to the BRMOP problem, whose solution set is
denoted Dpa, if it satisfies the following condition: there does not exist x ∈ D such
that
E[f(x, ξ)] ≤ E[f(x∗, ξ)],
namely, such that

E[fj(x, ξ)] <= E[fj(x∗, ξ)], for all j = 1, 2, ..., p,

and there exists at least one j0 such that

E[fj0(x, ξ)] < E[fj0(x∗, ξ)].

Definition 3.2.3. For the EVBRMOP problem, if x∗ ∈ D, we say that x∗ is an
expected-value weak efficient solution to the BRMOP problem, whose solution
set is denoted Dwpa, if it satisfies the following condition: there does not exist
x ∈ D such that
E[f(x, ξ)] < E[f(x∗, ξ)].
Theorem 3.2.1. Dab ⊂ Dpa ⊂ Dwpa ⊂ D.
Proof. We first prove that Dab ⊂ Dpa. If Dab = φ, then the result is immediate.
If not, suppose that x∗ ∈ Dab and x∗ ∉ Dpa. Then, by the definition of the
expected-value efficient solution, there must exist x ∈ D such that

E[f(x, ξ)] ≤ E[f(x∗, ξ)],

namely,
E[fj(x, ξ)] <= E[fj(x∗, ξ)]
for all j = 1, 2, ..., p, and there exists at least one j0, 1 <= j0 <= p, such that

E[fj0(x, ξ)] < E[fj0(x∗, ξ)],

which contradicts x∗ ∈ Dab. Hence, Dab ⊂ Dpa.


Then we prove that Dpa ⊂ Dwpa. If x∗ ∈ Dpa and x∗ ∉ Dwpa, then, by the
definition of the expected-value weak efficient solution, there must exist x ∈ D such
that E[f(x, ξ)] < E[f(x∗, ξ)], namely,

E[fj(x, ξ)] < E[fj(x∗, ξ)]

holds for all j = 1, 2, ..., p. Thus, we obtain

E[f(x, ξ)] ≤ E[f(x∗, ξ)],

which implies x∗ ∉ Dpa. This contradicts the assumption that x∗ ∈ Dpa. Hence,
Dpa ⊂ Dwpa. It follows from the definition of the expected-value weak efficient
solution that Dwpa ⊂ D, which proves the desired theorem.
Theorem 3.2.2. (1) If Dab ≠ φ, then Dab = Dpa.
(2) If h(x, ξ) is a linear vector function and f(x, t) and g(x, t) are strictly convex
vector functions in x, then

Dpa = Dwpa.

Proof. (1) It follows from Theorem 3.2.1 that we need only prove Dab ⊃ Dpa. Suppose
x∗ ∈ Dpa and x∗ ∉ Dab. Since Dab ≠ φ, there must exist x ∈ Dab; by the defi-
nition of the expected-value absolute optimal solution, we obtain E[f(x, ξ)] <=
E[f(x∗, ξ)]. Since x∗ ∉ Dab, we have E[f(x, ξ)] ≠ E[f(x∗, ξ)]. It follows from

the inequality above that E[f(x, ξ)] ≤ E[f(x∗, ξ)], which contradicts x∗ ∈ Dpa.
Hence, Dab ⊃ Dpa, which implies the required conclusion.
(2) It follows from Theorem 3.2.1 that we need only prove Dwpa ⊂ Dpa. Suppose
x∗ ∈ Dwpa and x∗ ∉ Dpa; then there must exist x ∈ D with x ≠ x∗
such that E[f(x, ξ)] ≤ E[f(x∗, ξ)]. By the assumed conditions and Theorem 3.1.1,
D is a convex set; hence αx + (1 − α)x∗ ∈ D for any given α ∈ (0, 1). Since
f(x, ξ) is a strictly convex vector function on D, and f(x, ξ) is also comonotonic,
by the inequality just given it is easy to see that

E[f(αx + (1 − α)x∗, ξ)] < αE[f(x, ξ)] + (1 − α)E[f(x∗, ξ)] <= E[f(x∗, ξ)],

which contradicts x∗ ∈ Dwpa. Thus, Dwpa ⊂ Dpa, which proves the required theorem.

4 Conclusions
Based on birandom theory, the BRMOP problem and its expected value model
have been introduced in this paper. Since non-inferior solutions play an important
role in multiobjective problems, the expected-value efficient solutions and
expected-value weak efficient solutions of the BRMOP problem are presented
and their relations are studied. The results in this paper can serve as a
theoretical tool for designing algorithms to solve the BRMOP problem.

Acknowledgments

The authors Mingfa Zheng and Yayi Xu were supported by National Natural
Science Foundation of China under Grant 70571021, and the Shanxi Province
Science Foundation under Grant SJ08A02.

References
1. Benabdelaziz, F., Lang, P., Nadeau, R.: Pointwise efficiency in multiobjective
stochastic linear programming. Journal of the Operational Research Society 45, 11–18
(2000)
2. Hu, Y.D.: The efficient theory of multiobjective programming. Shanghai Science and
Technology Press, China (1994)
3. Liu, B.: Fuzzy random dependent-chance programming. IEEE Trans. Fuzzy Syst. 9,
721–726 (2001)
4. Liu, B.: Uncertain programming. Wiley, New York (1999)
5. Liu, B.: Random fuzzy dependent-chance programming and its hybrid intelligent
algorithm. Information Sciences 141, 259–271 (2002)
6. Liu, Y.K., Liu, B.: Expected value operator of random fuzzy variable operator.
International Journal of Uncertainty, Fuzziness, Knowledge-Based Systems 11,
195–215 (2003)
7. Lin, C.Y., Dong, J.L.: The efficient theory and method of multiobjective program-
ming. Jilin Educational Press, China (2006)
8. Ma, B.J.: The efficient rate of efficient solution to linear multiobjective program-
ming. Journal of Systems Engineering and Electronic Technology 2, 98–106 (2000)
9. Peng, J., Liu, B.: Birandom variables and birandom programming. Technical Report (2003)
10. Stancu-Minasian, I.M.: Stochastic programming with multiple objective functions.
Bucharest (1984)
11. Yager, R.R.: A foundation for a theory of possibility. Journal of Cybernetics 10,
177–204 (1980)
A Modified Multi-objective Binary Particle Swarm
Optimization Algorithm

Ling Wang, Wei Ye, Xiping Fu, and Muhammad Ilyas Menhas

Shanghai Key Laboratory of Power Station Automation Technology,


School of Mechatronics Engineering and Automation, Shanghai University,
Shanghai 200072, China
[email protected]

Abstract. In recent years a number of works have extended Particle
Swarm Optimization (PSO) to solve multi-objective optimization problems, but
few of them can be used to tackle binary-coded problems. In this paper, a
novel modified multi-objective binary PSO (MMBPSO) algorithm is proposed
for better multi-objective optimization performance. A modified updating
strategy is developed which is simpler and easier to implement than that of the
standard discrete binary PSO. A mutation operator and a dissipation operator
are introduced to improve the search ability and keep the diversity of the algorithm.
The experimental results on a set of multi-objective benchmark functions dem-
onstrate that the proposed MMBPSO is a competitive multi-objective optimizer
and outperforms the standard binary PSO algorithm in terms of convergence
and diversity.

Keywords: Binary PSO, Multi-objective optimization, Pareto.

1 Introduction
Multi-objective optimization problems (MOPs), which have more than one objective
function, are ubiquitous in science and engineering fields such as astronomy,
electronic engineering, automation, and artificial intelligence. In MOPs, a unique optimal
solution is hard to find due to the contradictory objectives. Instead, the 'trade-
off' solutions, in other words the non-dominated solutions, are preferred. Several
approaches have been proposed to deal with multi-objective optimization problems,
such as reducing the problem dimension by combining all objectives into a single objec-
tive [1] or optimizing one objective while the rest are treated as constraints [2]. However,
these methods rely on a priori knowledge of the appropriate weights or constraint
values. Furthermore, they are only capable of finding a single point on the
trade-off curve at a time. As a result, Pareto-based multi-objective
methods, which optimize all objectives simultaneously and eliminate the need for
determining appropriate weights or formulating constraints, have become a current re-
search hotspot. Pareto-based multi-objective methods operate on the concept of 'Pare-
to domination' [3], and the solutions on the curve of the Pareto front represent the best
possible compromises among the objectives [4]. So, one of the crucial goals in multi-
objective optimization is to find a set of optimal solutions that distribute well along
the Pareto front.


Particle Swarm Optimization (PSO) was first developed by Kennedy and Eberhart
in 1995 [6]. It originated from imitating the behavior of a swarm of birds trying to search
for food in an unknown area [5]. Owing to its simple arithmetic structure, high conver-
gence speed and excellent global optimization ability, PSO has been researched and
improved to solve various multi-objective optimization problems. However, standard
PSO and most of its improved versions work in continuous space, which means they
cannot tackle binary-coded problems directly. To make up for this, Kennedy extended
PSO and proposed a discrete binary PSO (DBPSO) [6]. Based on DBPSO,
researchers have introduced binary PSO to solve multi-objective problems. Abdul Latiff
et al. [8] proposed a multi-objective DBPSO, called BMPSO, to select cluster heads for
lengthening the network lifetime and preventing network connectivity degradation. Peng
and Xu [7] proposed a modified multi-objective binary PSO combining DBPSO with an
immune system to optimize the placement of phasor measurement units. These works
prove that DBPSO-based multi-objective optimizers are efficient in solving MOPs.
Nevertheless, previous works on single-objective optimization problems show that
the optimization ability of DBPSO is not ideal [9], [10]. So we propose a novel mod-
ified multi-objective binary PSO (MMBPSO) in this paper to achieve better multi-
objective search ability and a simpler implementation.
The rest of the paper is organized as follows. In Section 2, a brief introduction to
DBPSO and a modified binary PSO algorithm is given first, and then the proposed
MMBPSO algorithm is described in detail. Section 3 validates MMBPSO on
several benchmark problems, where the optimization performance and comparisons are
also illustrated. Finally, some concluding remarks are given in Section 4.

2 Modified Multi-objective Binary Particle Swarm Optimization


2.1 Standard Modified Binary Particle Swarm Optimization
Shen and Jiang [11] developed a modified binary PSO (MBPSO) algorithm for fea-
ture selection. In MBPSO, the position updating rules are given by Eqs. (1)-(3), where
Vid denotes the velocity value of the d-th bit of particle i, Pid its personal best bit,
and Pgd the global best bit:

Xid(t+1) = Xid(t),   if 0 < Vid ≤ α            (1)
Xid(t+1) = Pid,      if α < Vid ≤ (1 + α)/2    (2)
Xid(t+1) = Pgd,      if (1 + α)/2 < Vid ≤ 1    (3)

The parameter α, called the static probability, should be set properly. A small value of α
can improve the convergence speed of the algorithm but makes MBPSO easily become
trapped in local optima, while MBPSO with a large α may be ineffective, as it cannot
make good use of the knowledge gained before [9].
Although the update formulas of MBPSO and DBPSO are different, the updating
strategy is still the same. In MBPSO, each particle still flies through the search space
according to its own past optimal experience and the global optimal information of the
group. Eq. (1) is an expression of inertia, which represents the information that a
particle inherits from its previous generation. Eq. (2) represents the particle's cogni-
tive capability, which draws the particle to its own best position. Eq. (3) is the
particle's social capability, which leads the particle to move to the best position found
by the swarm [12].

2.2 Modified Multi-objective Binary Particle Swarm Optimization

Although MBPSO has been successfully adopted to solve various problems such as
numerical optimization, feature selection and the multidimensional knapsack
problem, it is obvious that standard MBPSO cannot tackle Pareto-based multi-
objective optimization problems. So we extend MBPSO and propose a novel mod-
ified multi-objective binary PSO.

2.2.1 Updating Operator


To achieve good convergence and diversity performance simultaneously on multi-
objective problems, one should carefully balance the global and local search
capabilities. In the original MBPSO, the control parameter α cannot adjust
the global search ability, which spoils the performance of the algorithm on MOPs. To
make up for this drawback, we modify the updating operator as in Eqs. (4)-(6):

Xid(t+1) = Xid(t),   if 0 < Vid ≤ α    (4)
Xid(t+1) = Pid,      if α < Vid ≤ β    (5)
Xid(t+1) = Pgd,      if β < Vid ≤ 1    (6)
Here the parameter β adjusts the probability of tracking the two different best
solutions.
According to Eqs. (4)-(6), MMBPSO can easily get stuck in a local optimum. For in-
stance, if Xid, Pid and Pgd are all equal to "1", Xid will remain "1" forever, and vice versa. So
the dissipation operator and the mutation operator are introduced to keep the diversity
and enhance the local search ability.

2.2.2 Dissipation Operator


The dissipation operator, as in Eq. (7), is defined as randomly re-initializing a particle
with some probability, which brings new particles into the swarm and retains
diversity effectively. Due to its randomness, the probability of the dissipation
operation pd should not be a large value, which might destroy the basic updating
mechanism of the algorithm. Usually pd is set between 0.05 and 0.15 to prevent the loss
of optimal information.

Xi = Reini(Xi),   if rand < pd    (7)

2.2.3 Mutation Operator


The dissipation operator can greatly improve the diversity, but it may destroy
useful information at the same time, as it operates at the level of the individual. To further
enhance the optimization ability of the algorithm, a mutation operation is also intro-
duced, defined as Eq. (8). Different from the dissipation operator, the muta-
tion operator works at the level of a single bit with probability pm. So mutation can
improve the local search ability as well as keep the diversity of the algorithm.

Xid = { 1, if Xid = 0;  0, if Xid = 1 },   if rand < pm    (8)
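A compact Python sketch of the bit update of Eqs. (4)-(6) combined with the dissipation operator (7) and the mutation operator (8) is shown below. Following MBPSO, the 'velocity' v is treated here as a fresh uniform random number in (0, 1] for each bit; this treatment, the function names and the default parameter values (taken from Table 1) are our assumptions, not the paper's code.

    import random

    def update_bit(x, p_best, g_best, alpha, beta, pm):
        v = random.random()
        if v <= alpha:              # Eq. (4): keep the inherited bit
            new_x = x
        elif v <= beta:             # Eq. (5): copy the personal best bit
            new_x = p_best
        else:                       # Eq. (6): copy the global best bit
            new_x = g_best
        if random.random() < pm:    # Eq. (8): bit-flip mutation
            new_x = 1 - new_x
        return new_x

    def update_particle(x, p_best, g_best, alpha=0.55, beta=0.775, pd=0.1, pm=0.001):
        # Eq. (7): dissipation operator, re-initialize the whole particle.
        if random.random() < pd:
            return [random.randint(0, 1) for _ in x]
        return [update_bit(xi, pi, gi, alpha, beta, pm)
                for xi, pi, gi in zip(x, p_best, g_best)]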

2.2.4 The Updating of Personal Best and Global Best


Each particle is guided by its two best individuals, i.e., the personal best solution (Pi)
and the global best solution of the swarm (Pg), to perform the search. So the updating of Pi
and Pg is very important for the optimization performance of the algorithm. In Pareto-
based MOPs, the goal of the algorithm is to find diverse non-dominated solutions lying
on the Pareto front, which means that attention should be paid to diversity as well
as convergence when we design the algorithm. To realize this goal, the niche count
[13] is adopted as a density measure in MMBPSO to select Pg.
The niche count is defined as the number of particles in the niche. For example,
σshare is the niche radius, which indicates the radius of the neighborhood in Figure 1. From
Figure 1, we can find that particle B has 4 neighbors while particle A has 8 neighbors,
that is, particle B has a less crowded niche than particle A. So particle B is superior to
particle A in terms of diversity. In this work, σshare is calculated as Eq. (9):

σshare = ((f1max − f1min) + (f2max − f2min)) / (N − 1)    (9)

where fkmax and fkmin are the maximum and minimum values of the two objective
functions, and N is the population size.

Fig. 1. An example of niche count

During each iteration, the non-dominated solution set is sorted according to
the niche count. Pg for each generation is randomly chosen among the top 10% "less
crowded" non-dominated particles in the set. To encourage MMBPSO to search the
whole space and find more non-dominated solutions, Pi is replaced by the current
particle whenever the current particle is non-dominated.
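A Python sketch of this niche-count-based Pg selection is shown below; solutions are assumed to be tuples of two objective values, the Euclidean distance in objective space is used for neighborhood counting, and the helper names are ours.

    import math
    import random

    def sigma_share(front, n):
        # Eq. (9): niche radius from the objective ranges of the current front.
        f1 = [p[0] for p in front]
        f2 = [p[1] for p in front]
        return ((max(f1) - min(f1)) + (max(f2) - min(f2))) / (n - 1)

    def select_gbest(front, n):
        # Count neighbours within the niche radius for every non-dominated solution,
        # then pick Pg at random from the 10% least crowded ones.
        sigma = sigma_share(front, n)
        counts = [(sum(1 for q in front if q is not p and math.dist(p, q) < sigma), p)
                  for p in front]
        counts.sort(key=lambda t: t[0])                 # fewer neighbours first
        top = max(1, len(counts) // 10)
        return random.choice([p for _, p in counts[:top]])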

3 Experiments
3.1 Benchmark Functions and Performance Metrics
To test the performance of the proposed MMBPSO, five well-known benchmark
functions, i.e., ZDT1, ZDT2, ZDT3, ZDT4 and ZDT6 [15], are adopted in this paper.
All problems have two objective functions and no constraints. A multi-objective
optimizer is designed to achieve two goals: 1) convergence to the Pareto-optimal set
and 2) maintenance of diversity in the solutions of the Pareto-optimal set.
These two tasks cannot be measured adequately with one performance metric. So the

convergence metric Υ proposed in [15] and the diversity metric S proposed in [14] are
adopted to evaluate the performance of MMBPSO.

3.2 Experimental Results


MMBPSO was applied to optimize the 5 benchmark functions, and each function was
run 30 times independently. For comparison, BMPSO [8] and DBPSO with the
same Pi and Pg updating strategy (denoted MDBPSO) were also used to solve these
benchmarks. The population size and the maximum number of generations of all
algorithms are 200 and 250, respectively, and each decision variable is coded with 30 bits.
The other parameter settings of the 3 algorithms are shown in Table 1. The optimization
results are listed in Tables 2-3 and drawn in Figs. 2-3 as well.
The experimental results for the convergence metric in Fig. 2 demonstrate that the
proposed MMBPSO finds solutions that are closer to the true Pareto front. For
functions ZDT1, ZDT2, ZDT3 and ZDT6, MMBPSO has no difficulty reaching the
true Pareto front; but for function ZDT4, the performance of MMBPSO is not ideal
due to the 21^9 different local Pareto-optimal fronts in the search space. However,
MMBPSO has much better convergence values than BMPSO and MDBPSO on all 5
benchmark functions. The diversity metrics drawn in Fig. 3 also show that
MMBPSO is better than the other two algorithms on all functions.
To evaluate the performances of the 3 algorithms more exactly and clearly, Fig. 4
plots the Pareto fronts found by MMBPSO, MDBPSO and BMPSO on all functions,
which shows that MMBPSO outperforms MDBPSO and BMPSO.

Table 1. Parameter settings of MMBPSO, MDBPSO and BMPSO

Algorithm       Parameters
MMBPSO          α = 0.55, β = 0.775, pd = 0.1, pm = 0.001
MDBPSO          c1 = 2.0, c2 = 2.0, ω = 0.8, v ∈ [−5, 5]
BMPSO [10]      c1 = 2.0, c2 = 2.0, ω = 0.8, v ∈ [−5, 5]


Fig. 2. Box plots of the convergence metric obtained by MMBPSO, MDBPSO and BMPSO

Fig. 3. Box plots of the distance metric obtained by MMBPSO, MDBPSO and BMPSO

Fig. 4. The founded Pareto front of MMBPSO, MDBPSO and BMPSO on ZDT series functions

Table 2. The results of convergence metric Υ

Problem           Algorithm:  MMBPSO     MDBPSO     BMPSO
ZDT1    Mean                  1.41E-02   1.88E+00   1.77E+00
        Variance              3.68E-05   1.60E-01   5.12E-02
ZDT2    Mean                  1.74E-02   2.94E+00   2.45E+00
        Variance              2.12E-04   1.93E-01   4.23E-02
ZDT3    Mean                  1.23E-02   8.26E-01   9.23E-01
        Variance              2.43E-05   5.91E-02   2.19E-02
ZDT4    Mean                  8.39E+00   8.30E+01   1.08E+02
        Variance              1.44E+01   2.34E+02   1.49E+02
ZDT6    Mean                  4.90E-03   6.08E+00   6.40E+00
        Variance              3.80E-05   1.37E-01   4.98E-02

Table 3. The results of diversity metric S

Problem           Algorithm:  MMBPSO     MDBPSO     BMPSO
ZDT1    Mean                  2.46E-02   1.82E-01   1.59E-01
        Variance              4.70E-03   1.20E-02   3.80E-03
ZDT2    Mean                  3.48E-02   2.03E-01   2.26E-01
        Variance              6.80E-03   2.20E-02   2.31E-02
ZDT3    Mean                  5.05E-02   2.08E-01   1.57E-01
        Variance              4.87E-04   1.33E-02   3.30E-03
ZDT4    Mean                  3.92E-01   9.23E+00   8.35E+00
        Variance              1.59E+00   3.72E+01   2.83E+01
ZDT6    Mean                  2.74E-02   2.71E-01   2.47E-01
        Variance              4.10E-03   4.17E-02   2.25E-02

4 Conclusion
In this paper, a novel modified multi-objective binary particle swarm optimization
algorithm is proposed. Compared with DBPSO, the proposed MMBPSO adopts an
improved updating strategy which is simpler and easier to implement. The mutation
operator and dissipation operator are introduced to improve its search ability and keep
the diversity of the algorithm. The modified global best and personal best updating
strategy helps MMBPSO converge to the Pareto front better. Five well-known bench-
mark functions were adopted for testing the proposed algorithm. The experimental
results proved that the proposed MMBPSO can find better solutions than MDBPSO
and BMPSO. In particular, the superiority of MMBPSO over MDBPSO demonstrates
the advantages of the developed updating strategy in terms of convergence and diversity.

Acknowledgments. This work is supported by Research Fund for the Doctoral Pro-
gram of Higher Education of China (20103108120008), the Projects of Shanghai
Science and Technology Community (10ZR1411800 & 08160512100), Mechatronics
Engineering Innovation Group project from Shanghai Education Commission, Shang-
hai University “11th Five-Year Plan” 211 Construction Project and the Graduate
Innovation Fund of Shanghai University (SHUCX102218).

References
1. Xiang, Y., Sykes, J.F., Thomson, N.R.: Alternative formulations for ptimal groundwater
remediation design. J. Water Resource Plan Manage 121(2), 171–181 (1995)
2. Das, D., Datta, B.: Development of multi-objective management models for coastal aqui-
fers. J. Water Resource Plan Manage 125(2), 76–87 (1999)
3. Erickson, M., Mayer, A., Horn, J.: Multi-objective optimal design of groundwater remed-
iation systems: application of the niched Pareto genetic algorithm (NPGA). Advances in
Water Resources 25(1), 51–65 (2002)
4. Sharaf, A.M., El-Gammal, A.: A novel discrete multi-objective Particle Swarm Optimiza-
tion (MOPSO) of optimal shunt power filter. In: Power Systems Conference and Exposi-
tion, pp. 1–7 (2009)

5. Clerc, M., Kennedy, J.: The particle swarm—explosion, stabilityand convergence in a mul-
tidimensional complex space. IEEE Trans. Evol. Comput. 6(1), 58–73 (2002)
6. Kennedy, J., Eberhart, R.C.: A discrete binary version of the particle swarm algorithm.
In: IEEE International Conference on Systems, Man, and Cybernetics, Computational
Cybernetics and Simulation, vol. 5, pp. 4104–4108 (1997)
7. Peng, C., Xu, X.: A hybrid algorithm based on immune BPSO and N-1 principle for PMU
multi-objective optimization placement. In: Third International Conference on Electric
Utility Deregulation and Restructuring and Power Technologies, pp. 610–614 (2008)
8. Abdul Latiff, N.M., Tsimenidis, C.C., Sharif, B.S., Ladha, C.: Dynamic clustering using
binary multi-objective Particle Swarm Optimization for wireless sensor networks. In: IEEE
19th International Symposium on Personal, Indoor and Mobile Radio Communications,
pp. 1–5 (2008)
9. Wang, L., Wang, X.T., Fei, M.R.: An adaptive mutation-dissipation binary particle swarm
optimisation for multidimensional knapsack problem. International Journal of Modelling,
Identification and Control 8(4), 259–269 (2009)
10. Wang, L., Wang, X.T., Fu, J.Q., Zhen, L.L.: A Novel Probability Binary Particle Swarm
Optimization Algorithm and Its Application. Journal of Software 9(3), 28–35 (2008)
11. Shen, Q., Jiang, J.H., Jiao, C.X., Shen, G.L., Yu, R.Q.: Modified particle swarm optimization
algorithm for variable selection in MLR and PLS modeling: QSAR studies of antagonism
of angiotensin II antagonists. European Journal of Pharmaceutical Sciences 22(2-3), 145–
152 (2004)
12. Jahanbani Ardakani, A., Fattahi Ardakani, F., Hosseinian, S.H.: A novel approach for op-
timal chiller loading using particle swarm optimization. Energy and Buildings 40, 2177–
2187 (2008)
13. Li, X.: A non-dominated sorting particle swarm optimizer for multiobjective optimization.
In: The Genetic and Evolutionary Computation Conference, pp. 37–48 (2003)
14. Gong, M., Liu, C., Cheng, G.: Hybrid immune algorithm with Lamarckian local search for
multi-objective optimization. Memetic Computing 2(1), 47–67 (2010)
15. Deb, K., Jain, S.: Running performance metrics for evolutionary multi-objective optimiza-
tion. Technical Report, no. 2002004 (2002)
Improved Multiobjective Particle Swarm Optimization
for Environmental/Economic Dispatch Problem in
Power System*

Yali Wu, Liqing Xu, and Jingqian Xue

School of Automation and Information Engineering, Xi'an University of Technology, Shaanxi, China
[email protected], [email protected], [email protected]

Abstract. An improved particle swarm optimization based on a cultural algo-
rithm is proposed to solve the environmental/economic dispatch (EED) problem in
power systems. The population space evolves with the improved particle swarm
optimization strategy. Three kinds of knowledge in the belief space, named situ-
ational, normative and history knowledge, are redefined to accord with the
solution of multi-objective problems. The results on standard test systems
demonstrate the superiority of the proposed algorithm in terms of the
diversity and uniformity of the Pareto-optimal solutions obtained.

Keywords: Environmental/economic dispatch, Cultural algorithm, Particle
swarm optimization, Multi-objective optimization.

1 Introduction
With the increasing concern about environmental pollution, operating at absolute mini-
mum cost can no longer be the only criterion for economic dispatch of electric power
generation. Environmental/economic dispatch (EED) is becoming more and more
desirable, as it not only results in great economic benefit but also reduces
pollutant emissions [1]. However, minimizing the total fuel cost and the total emission
are conflicting in nature and they cannot be minimized simultaneously. Hence, the
EED problem is a large-scale, highly constrained, nonlinear multi-objective optimiza-
tion problem.
Over the past decade, meta-heuristic optimization methods have been signifi-
cantly used in EED, primarily due to their nice feature of population-based search [2].
Many multi-objective evolutionary algorithms such as the niched Pareto genetic algo-
rithm (NPGA) [3], the non-dominated sorting genetic algorithm (NSGA) [4], the strength
Pareto evolutionary algorithm (SPEA) [5] and NSGA-II [6, 7] have been introduced
to solve the EED problem with impressive success.
*
Manuscript received January 2, 2011. This work was supported by Natural Science Founda-
tion of Shaanxi Province (Grant No.2010JQ8006) and Science Research Programs of Educa-
tion Department of Shaanxi Province (Grant No.2010JK711).


As a new population-based algorithm, particle swarm optimization (PSO) has sev-
eral key advantages over other existing optimization techniques in terms of sim-
plicity, convergence speed and robustness. Several PSO-based approaches [8-11]
have been proposed to solve the EED problem, but these improved algorithms do not
consider the effective utilization of evolution knowledge.
In this paper, an improved particle swarm optimization based on a cultural algorithm
(CA-IMOPSO) is proposed to solve the EED problem. A circular crowded sorting approach
is used to generate a set of well-distributed Pareto-optimal solutions, and the global
best individual in the multi-objective optimization domain is redefined through a new
multi-objective fitness roulette technique. The evolutionary process of MOPSO in
the population space is controlled by an adaptive adjustment policy.

2 Problem Statement
The typical EED problem can be formulated as a bi-criteria optimization model. The
two conflicting objectives, i.e., fuel cost and pollutants emission, should be mini-
mized simultaneously while fulfilling certain system constraints. This problem is
formulated as follows.

2.1 Problem Objectives

Objective 1: Minimization of fuel cost. The total fuel cost F(PG) can be represented
as follows:

F(PG) = ∑_{i=1}^{M} Fi(PGi),   Fi(PGi) = ai + bi PGi + ci PGi²    (1)

where M is the number of generators committed to the operating system, ai, bi, ci are the
fuel cost coefficients of the i-th generator, and PGi is the real power output of the i-th
generator.

Objective 2: Minimization of pollutant emission. The total emission E(PG) can be modeled
through a combination of polynomial and exponential terms [12]:

E(PG) = ∑_{i=1}^{M} Ei(PGi),   Ei(PGi) = αi + βi PGi + γi PGi² + ξi exp(λi PGi)    (2)

where αi, βi, γi, ξi, λi are the coefficients of the i-th generator's emission characteristics.
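The two objectives of Eqs. (1) and (2) translate directly into code; the Python sketch below assumes the coefficients are supplied as equal-length lists, one entry per generator (the function names are ours).

    import math

    def fuel_cost(p, a, b, c):
        # Eq. (1): total fuel cost ($/hr) as the sum of quadratic generator cost curves.
        return sum(ai + bi * pi + ci * pi ** 2
                   for ai, bi, ci, pi in zip(a, b, c, p))

    def total_emission(p, alpha, beta, gamma, xi, lam):
        # Eq. (2): total emission (ton/hr), polynomial plus exponential terms.
        return sum(al + be * pi + ga * pi ** 2 + x * math.exp(l * pi)
                   for al, be, ga, x, l, pi in zip(alpha, beta, gamma, xi, lam, p))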

2.2 Problem Constraints

Constraint 1: Generation capacity constraint. For normal system operation, the real
power output of each generator is restricted by lower and upper bounds as follows:

PGimin ≤ PGi ≤ PGimax    (3)

where PGimin and PGimax are the minimum and maximum power generated by generator i,
respectively.

Constraint 2: Power balance constraint. The total power generation must cover the
total demand PD and the real power loss in transmission lines PLOSS:

PD + PLOSS − ∑_{i=1}^{NG} PGi = 0    (4)

PLOSS = ∑_{i=1}^{NG} ∑_{j=1}^{NG} PGi Bij PGj + ∑_{i=1}^{NG} B0i PGi + B00    (5)

where Bij is the transmission loss coefficient, B0i is the i-th element of the loss coeffi-
cient vector, and B00 is the loss coefficient constant.
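A Python sketch of the loss formula (5) and the power-balance residual (4) follows; B is assumed to be a square matrix (list of lists) and B0 a list, both indexed by generator, and the function names are ours.

    def transmission_loss(p, B, B0, B00):
        # Eq. (5): B-coefficient (Kron) transmission loss formula.
        n = len(p)
        quadratic = sum(p[i] * B[i][j] * p[j] for i in range(n) for j in range(n))
        linear = sum(B0[i] * p[i] for i in range(n))
        return quadratic + linear + B00

    def power_balance_violation(p, p_demand, B, B0, B00):
        # Eq. (4): residual of the power balance constraint (zero when satisfied).
        return p_demand + transmission_loss(p, B, B0, B00) - sum(p)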

3 Improved Algorithm (CA-IMOPSO)


The structure of the CA-IMOPSO algorithm is shown in Fig. 1 [13], in which the
population space and the belief space are linked through an acceptance function and an
influence function.
Fig. 1. Spaces of a cultural algorithm

3.1 Particle Swarm Optimization (PSO) in the Population Space

A particle's status in the population space is characterized by two factors: its position
and velocity, which are updated by the following equations [14]:
vid (t + 1) = wvid (t ) + c1r1d (φid (t ) − xid (t )) + c2 r2 d (φgd (t ) − xid (t )) (6)

xid (t + 1) = xid (t ) + vid (t + 1) (7)


where vid represents the d -th dimensional velocity of particle i ; xid represents the
d -th dimensional position values of particle i ; φid represents the best previous posi-
tion of particle i ; φgd represents the best position among all particles in the popula-
tion. r1d and r2 d are two independently uniformly distributed random variables with
range [0, 1]; c1 and c2 are acceleration coefficients; w is the inertia weight.

3.2 The Knowledge of the Belief Space

Three knowledge sources, named situational knowledge, normative knowledge and


history knowledge, are considered in belief space.

Situational Knowledge. Situational knowledge is a set of exemplary individuals
useful for the experience of all individuals. The initial situational knowledge is
chosen from the non-dominated set of the population space according to its
diversity and uniformity.
The variation operator of differential evolution is used to update the
situational knowledge, that is:

xi′ = xi + F · (pi,r1 − pi,r2)    (8)

where xi is the i-th individual in the situational knowledge, pi,r1 and pi,r2 are
different particles in the non-dominated set, and F ∈ [0, 1] is the scaling factor of
differential evolution.
If xi′ dominates xi, then xi′ replaces xi. If neither of them dominates the other,
select the new individual at random.
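A Python sketch of this situational-knowledge update is given below, assuming the standard differential-evolution difference form of Eq. (8) and an externally supplied dominates helper; the names are illustrative, not the authors' implementation.

    import random

    def vary_situational(x, nondominated_set, F=0.5):
        # Eq. (8): DE-style variation using two distinct members of the non-dominated set.
        p_r1, p_r2 = random.sample(nondominated_set, 2)
        return [xi + F * (a - b) for xi, a, b in zip(x, p_r1, p_r2)]

    def update_situational(x, nondominated_set, dominates, F=0.5):
        x_new = vary_situational(x, nondominated_set, F)
        if dominates(x_new, x):           # the child replaces the parent
            return x_new
        if dominates(x, x_new):           # the parent is kept
            return x
        return random.choice([x, x_new])  # mutually non-dominated: pick at random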

Normative Knowledge. Normative knowledge consists of a set of promising ranges.
It provides a standard guiding principle within which individual adjustments can be
made. Normative knowledge is updated in two ways:
1) For the i-th dimension of each non-dominated individual, if the fitness is located in
the interval, we generate a child around it; otherwise, we generate a child randomly with a
uniform distribution within the interval;
2) Replace the i-th dimension of a particle with its minimum and maximum velocities
to generate two children. If one child dominates both the particle and the other child, then
adopt it as the new non-dominated individual; otherwise, if both children dominate
the particle but do not dominate each other, then choose one of them as the new non-
dominated individual.

History Knowledge. History knowledge keeps track of the history of the search
process and records key events in the search. A key event might be either a considerable
move in the search space or the discovery of a landscape change. Individuals use the
history knowledge for guidance in selecting a moving direction [11].
The history knowledge is used later to adapt the distribution of the individuals
after the Pareto front has been found.

3.3 Communication Protocol

Acceptance Function. The globally worst individual of the belief space is replaced by the
globally best individual of the population space every Acc generations:

Acc = Bnum + t / Tmax × Dnum    (9)

where Bnum and Dnum are two constants. The global best of the population space is the
individual with the smallest number, and the global worst of the belief space is the individ-
ual with the shortest crowding distance in the Pareto front.

Influence Function. Every Inf generations, the globally worst individual of the
population space is replaced by the globally best individual of the belief space:

Inf = Bnum + (Tmax − t) / Tmax × Dnum    (10)

The global best individual of the belief space is the one with the longest crowding
distance in the Pareto front, and the global worst individual of the population space is the
one with the largest number.

3.4 Archiving Mechanism

The non-dominated solutions in the archive come from two sources: some are new
non-dominated solutions from the population space, and the others are new non-
dominated solutions from the belief space. A circular crowding sorting algorithm is adopted
in this paper to improve the uniformity of the Pareto-optimal front.

4 Implement of CA-IMOPSO for EED Problem


In this section, the proposed algorithm is applied to the standard IEEE 30-bus six-
generator test system [15]. This power system is connected through 41 transmission
lines, and the total system demand amounts to 2.834 p.u.

4.1 Encoding of Particles

The first step is the encoding of the decision variables. The power output of each
generator is selected as a gene, and the genes together comprise a particle which repre-
sents a candidate solution of the EED problem. That is, every particle j consists of an
M-dimensional real-coded string xj = {PG1,j, PG2,j, ..., PGM,j}, where PGi,j, i = 1, 2, ..., M,
is the power output of the i-th generator with respect to the j-th particle.

4.2 Parameter Setting of the Proposed Algorithm

The inertia weight ω and the acceleration coefficients c1 and c2 are defined as follows:

ω = ωmax − ((ωmax − ωmin) / Tmax) t    (11)

c1 = (c1f − c1i) t / Tmax + c1i    (12)

c2 = (c2f − c2i) t / Tmax + c2i    (13)

where t is the current iteration number and Tmax is the maximum iteration number.
The other parameters are set as follows. The sizes of the swarm and the archive set
are both fixed at 50; ωmin = 0.4, ωmax = 0.9, c1i = 2.5, c1f = 0.5, c2i = 0.5, c2f = 2.5,
F = 0.5, Tmax = 7000.
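Eqs. (11)-(13) define simple linear schedules over the iterations; a Python sketch using the parameter values listed above is shown below (the function name is ours).

    def pso_schedules(t, t_max=7000, w_min=0.4, w_max=0.9,
                      c1i=2.5, c1f=0.5, c2i=0.5, c2f=2.5):
        # Eq. (11): inertia weight decreases linearly from w_max to w_min.
        w = w_max - (w_max - w_min) / t_max * t
        # Eq. (12): cognitive coefficient decreases from c1i to c1f.
        c1 = (c1f - c1i) * t / t_max + c1i
        # Eq. (13): social coefficient increases from c2i to c2f.
        c2 = (c2f - c2i) * t / t_max + c2i
        return w, c1, c2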

5 Simulation Results and Discussions


Two cases have been considered. In Case 1, the generation capacity and power
balance constraints are considered while neglecting Ploss; in Case 2, the generation
capacity and power balance constraints are considered including Ploss.

5.1 Multiobjective Optimization Using CA-IMOPSO

The Pareto-optimal sets are shown in Fig. 2(a) for Case 1 and in Fig. 2(b)
for Case 2. It can be seen that the CA-IMOPSO technique preserves the diversity and
uniformity of the Pareto-optimal front and solves the problem effectively in both cases
considered.
The non-dominated solutions obtained with CA-IMOPSO for Case 1 and Case 2 are
compared to those reported in the literature [10], [4], [3], [5]. The best (minimum-cost)
non-dominated solutions obtained with the proposed approach and those reported in the
literature for Case 1 and Case 2 are given in Tables 1 and 2, respectively.

(a) Pareto-optimal front for Case 1    (b) Pareto-optimal front for Case 2

Fig. 2. The Pareto-optimal fronts of the EED problem

Table 1. The result of minimum cost in case 1 with different algorithms

FCPSO [10] NSGA [4] NPGA [3] SPEA [5] MOPSO CA-IMOPSO

P1 0.1070 0.1567 0.1080 0.1099 0.1091 0.1059

P2 0.2897 0.2870 0.3284 0.3186 0.3120 0.2983

P3 0.525 0.4671 0.5386 0.5400 0.5259 0.5264

P4 1.015 1.0467 1.0067 0.9903 1.0041 1.0197

P5 0.5300 0.5037 0.4949 0.5336 0.5224 0.5276

P6 0.3673 0.33729 0.3574 0.36507 0.3603 0.3557

Cost 600.1315 600.572 600.259 600.22 600.138 600.115

Emission 0.22226 0.22282 0.22116 0.2206 0.2211 0.2226



Table 2. The result of minimum cost in case 2 with different algorithms

FCPSO [10] NSGA [4] NPGA [3] SPEA [5] MOPSO CA-IMOPSO

P1 0.1130 0.1168 0.1245 0.1279 0.1139 0.1209

P2 0.3145 0.3165 0.2792 0.3163 0.3252 0.2894

P3 0.5826 0.5441 0.6284 0.5803 0.6161 0.5814

P4 0.9860 0.9447 1.0264 0.9580 0.9689 0.9894

P5 0.5264 0.5498 0.4693 0.5258 0.4998 0.5220

P6 0.3450 0.3964 0.3993 0.3589 0.3464 0.3557

Cost 607.7862 608.245 608.147 607.86 608.790 605.881

Emission 0.2201 0.2166 0.2236 0.2176 0.2191 0.2203

From the tables we can conclude that the proposed CA-IMOPSO technique is supe-
rior to all the reported techniques, which demonstrates the potential and effectiveness of
the proposed technique for solving the EED problem.

6 Conclusion
In this paper, a novel multiobjective particle swarm optimization technique based on
a cultural algorithm has been proposed and applied to the environmental/economic dispatch
optimization problem. The results on the EED problem show the potential and effi-
ciency of the proposed algorithm. In addition, the simulation results also reveal the
superiority of the proposed algorithm in terms of the diversity and quality of the ob-
tained Pareto-optimal solutions.

References
1. Talaq, J.H., El-Hawary, F., El-Hawary, M.E.: A summary of environmental/economic dis-
patch algorithms. J. IEEE Trans. Power Syst. 9(3), 1508–1516 (1994)
2. Lingfeng, W., Chanan, S.: Environmental/economic power dispatch using a fuzzified
multi-objective particle swarm optimization algorithm. J. Electr. Power Syst. Research 77,
1654–1664 (2007)
3. Abido, M.A.: A niched Pareto genetic algorithm for multiobjective environ-
mental/economic dispatch. J. Electr. Power Energy Syst. 25(2), 97–105 (2003)
4. Abido, M.A.: A novel multiobjective evolutionary algorithm for environmental/ economic
power dispatch. J. Electr. Power Syst. Research 65, 71–91 (2003)
5. Abido, M.A.: Multiobjective evolutionary algorithms for electric power dispatch problem.
J. IEEE Trans. Evolut. Comput. 10(3), 315–329 (2006)
6. King, R.T.F., Rughooputh, H.C.S., Deb, K.: Evolutionary multi-objective environ-
mental/Economic dispatch: Stochastic versus deterministic approaches. In: Coello Coello,
C.A., Hernández Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 677–691.
Springer, Heidelberg (2005)

7. Basu, M.: Dynamic economic emission dispatch using nondominated sorting genetic algo-
rithm-II. J. Electr. Power. Energy Syst. 30(2), 140–210 (2008)
8. Wang, L.F., Singh, C.: Environmental/economic power dispatch using a fuzzified multi-
objective particle swarm optimization algorithm. J. Electr. Power Syst. Res. 77(12), 1654–
1664 (2007)
9. Cai, J.J., Ma, X.Q., Li, Q., Li, L.X., Peng, H.P.: A multi-objective chaotic particle swarm
optimization for environmental/economic dispatch. J. Energy Convers Manage. 50(5),
1318–1325 (2009)
10. Agrawal, S., Panigrahi, B.K., Tiwari, M.K.: Multiobjective particle swarm algorithm with
fuzzy clustering for electrical power dispatch. J. IEEE Trans. Evolut. Comput. 12(5),
529–541 (2008)
11. Daneshyari, W., Yen, G.G.: Cultural MOPSO: A cultural framework to adapt parameters
of multiobjective particle swarm optimization. In: C. IEEE Congress. on Evolut. Comput.,
pp. 1325–1332 (2009)
12. Farag, A., Al-Baiyat, S., Cheng, T.C.: Economic load dispatch multiobjective optimization
procedures using linear programming techniques. J. IEEE Trans. Power Syst. 10(2),
731–738 (1995)
13. Landa, B., Carlos, A., Coello, C.: Cultured differential evolution for constrained optimization. J. Comput. Methods in Applied Mechanics and Engineering 195, 4303–4322 (2006)
14. Yunhe, H., Lijuan, L., Yaowu, W.: Enhanced particle swarm optimization algorithm and
its application on economic dispatch of power systems. J. Proc. of CSEE 24(7), 95–100
(2004)
15. Hemamalini, S., Simon, S.P.: Emission Constrained Economic Dispatch with Valve-Point
Effect using Particle Swarm Optimization. In: C. IEEE Region. 10 Confer., pp. 1–6 (2008)
A New Multi-Objective Particle Swarm
Optimization Algorithm for Strategic Planning
of Equipment Maintenance

Haifeng Ling¹,², Yujun Zheng³, Ziqiu Zhang⁴, and Xianzhong Zhou¹

¹ School of Management & Engineering, Nanjing University, Nanjing 210093, China
² PLA University of Science & Technology, Nanjing 210007, China
³ College of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310014, China
⁴ Armament Demonstration & Research Center, Beijing 100034, China
ling [email protected], [email protected], [email protected], [email protected]

Abstract. Maintenance planning plays a key role in equipment operational management, and strategic equipment maintenance planning (SEMP) is an integrated and complicated optimization problem consisting of multiple objectives and constraints. In this paper we present a new multi-objective particle swarm optimization (PSO) algorithm for effectively solving the SEMP problem model, whose objectives include minimizing maintenance cost and maximizing the expected mission capability of military equipment systems. Our algorithm employs an objective leverage function for global best selection, and preserves the diversity of non-dominated solutions based on the measurement of minimum pairwise distance. Experimental results show that our approach can achieve good solution quality with low computational costs to support effective decision-making.

1 Introduction

Maintenance planning plays a key role in equipment operational management.


In general, strategic equipment maintenance planning (SEMP) involves vari-
ous kinds of equipment, different maintenance policies and quality evaluation
measures, and complex sets of constraints, and thus is typically modeled as a
complicated multi-objective optimization problem.
Despite this, most research on equipment maintenance planning (e.g., [4,7,15]) still focuses on the narrow use of single approaches such as expert conversation, mathematical programming, proration, case-based reasoning, etc., which have serious limitations in expressing relationships and tradeoffs between the objectives. In recent years, nature-inspired heuristic methods such as genetic algorithms, simulated annealing, and ant colony optimization have been applied

This work was supported in part by grants from National Natural Science Foundation
(No. 60773054, 61020106009, 90718036) of China.


in the domain of equipment maintenance and have shown their advantages both
in problem-solving effectiveness and solution quality. For example, Kleeman and
Lemont [9] designed a multi-objective genetic algorithm to solve the aircraft en-
gine maintenance scheduling problem, which is a combination of a modified job
shop problem and a flow shop problem. Verma and Ramesh [14] viewed the initial
scheduling of preventive maintenance as a constrained non linear multi-objective
decision making problem, and proposed a genetic algorithm that simultaneously
optimizes the objectives of reliability, cost and newly introduced criteria, non-
concurrence of maintenance periods and maintenance start time factor. Yang
and Huang [16] also proposed a genetic algorithm for multi-objective equipment
maintenance planning, but the model used a simplified function that evaluates
equipment capability based on equipment cost and thus limited its practicality.
Ai and Wu [1] used a hybrid approach based on simulated annealing and genetic algorithm for communication equipment maintenance planning, but they did not consider multiple objectives. Recently the authors presented in [19] an efficient multi-objective tabu search algorithm, which was capable of solving large problems with more than 45000 pieces of equipment of 500 kinds.
Particle swarm optimization (PSO) [8] is a population-based global optimiza-
tion technique that enables a number of individual solutions, called particles, to
move through a hyper dimensional search space in a methodical way to search for
optimal solution(s). Each particle represents a feasible solution which has a position vector x and a velocity vector v; these are adjusted at each iteration by learning from a local best pbest found by the particle itself and a current global best gbest found by the whole swarm. PSO is conceptually simple and easy to implement,
and has demonstrated its efficiency in a wide range of continuous and combina-
torial optimization problems [2]. Since 2002, multi-objective PSO (MOPSO) has
attracted much attention among researchers and has shown promising results
for solving multi-objective optimization problems (e.g., [3,13,11,6]).
In this paper we define a multi-objective integer programming model for SEMP whose objectives include minimizing maintenance costs (including costs of maintenance materiel and workers) and maximizing the expected mission capability of equipment systems (via layered quadratic functions). We
then propose a MOPSO algorithm for the problem model, which uses an ob-
jective leverage function for global best selection and preserves the diversity of
non-dominated solutions based on the measurement of minimum pairwise dis-
tance. Experimental results show that our approach can achieve good solution
quality with low computational costs to support effective decision-making.

2 Problem Model
2.1 Problem Description and Hypothesis
SEMP needs to determine the numbers of different kinds of equipment to be
maintained at different levels according to the overall mission requirements and
the current conditions of all equipment [18]. There are two key aspects to assess

an SEMP solution: the overall maintenance cost and the overall mission capa-
bility after maintenance. Thus SEMP is typically a multi-objective optimization
problem, for which the improvement of one objective may cause the degradation
of another.
One of the basic principles of equipment maintenance is to assign each piece of equipment to an appropriate maintenance level according to its quality. In this paper, we roughly suppose there are three quality levels of equipment, namely A, B, and C, and two maintenance levels, namely I and II. Typically, equipment of quality level A does not need to be maintained, and equipment of quality levels C and B should be maintained at levels I and II respectively¹.

2.2 Cost Evaluation


Suppose there are m kinds of equipment to be maintained and n kinds of materiel to be consumed for maintenance; then we can define two materiel-consuming matrices, namely $Z' = (z'_{ij})_{m \times n}$ and $Z'' = (z''_{ij})_{m \times n}$, where $z'_{ij}$ and $z''_{ij}$ denote the materiel j consumed for equipment i at maintenance levels I and II respectively. Let $c_j$ be the price of materiel j, $x'_i$ be the number of equipment i to be maintained at level I and $x''_i$ be that at level II; then the total cost of maintenance materiel can be calculated as follows:

$$C_E = \sum_{i=1}^{m}\sum_{j=1}^{n} c_j (z'_{ij} x'_i + z''_{ij} x''_i) \qquad (1)$$

Moreover, let $t'_i$ be the number of (average) maintenance hours of equipment i to be maintained at level I, $s'_i$ be the relevant cost (in RMB Yuan) per hour, and $t''_i$ and $s''_i$ be those at level II; then the total working hours and working cost can be respectively calculated by (2) and (3):

$$M_T = \sum_{i=1}^{m} (t'_i x'_i + t''_i x''_i) \qquad (2)$$

$$C_T = \sum_{i=1}^{m} (s'_i t'_i x'_i + s''_i t''_i x''_i) \qquad (3)$$

And thus the overall maintenance cost is $C = C_E + C_T$.
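As a minimal illustration of Eqs. (1)–(3) (not part of the original paper; the array names are our own assumptions), the following Python sketch evaluates C_E, M_T and C_T from the materiel-consuming matrices and the maintenance decision vectors:

import numpy as np

def maintenance_cost(Z1, Z2, c, x1, x2, t1, t2, s1, s2):
    """Evaluate Eqs. (1)-(3): materiel cost C_E, working hours M_T, working cost C_T.

    Z1, Z2 : (m, n) materiel consumed per unit of equipment i at levels I and II
    c      : (n,)   price of each materiel
    x1, x2 : (m,)   numbers of equipment i maintained at levels I and II
    t1, t2 : (m,)   average maintenance hours per unit at levels I and II
    s1, s2 : (m,)   hourly costs at levels I and II
    """
    C_E = float(np.sum((Z1 * x1[:, None] + Z2 * x2[:, None]) @ c))  # Eq. (1)
    M_T = float(np.sum(t1 * x1 + t2 * x2))                          # Eq. (2)
    C_T = float(np.sum(s1 * t1 * x1 + s2 * t2 * x2))                # Eq. (3)
    return C_E, M_T, C_T, C_E + C_T                                 # C = C_E + C_T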

2.3 Mission Capability Evaluation


Let $x^A_i$, $x^B_i$, and $x^C_i$ be the current numbers of equipment i at the quality levels A, B and C respectively, and $\alpha_i$, $\beta_i$, and $\gamma_i$ be the wearing coefficients during a given period². Given the maintenance numbers $x'_i$ and $x''_i$, after the given period the expected numbers of equipment i at the quality levels A, B, and C are respectively calculated as follows:

$$\tilde{x}^A_i = (1 - \alpha_i - \gamma_i)(x^A_i + x'_i + x''_i) \qquad (4)$$

$$\tilde{x}^B_i = \alpha_i (x^A_i + x'_i + x''_i) + (1 - \beta_i)\, x^B_i \qquad (5)$$

$$\tilde{x}^C_i = x^C_i + \beta_i x^B_i + \gamma_i (x^A_i + x'_i + x''_i) \qquad (6)$$

Now for equipment i, its mission capability can be evaluated based on the weighted sum $\tilde{x}_i$ of the numbers of equipment at the different quality levels as follows (for most equipment the weight $w^A_i$ can be set to 1 and the weight $w^C_i$ is very small):

$$\tilde{x}_i = w^A_i \tilde{x}^A_i + w^B_i \tilde{x}^B_i + w^C_i \tilde{x}^C_i \qquad (7)$$

And the mission capability I of the whole equipment system can be evaluated using the following quadratic function:

$$I = \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij}\, \tilde{x}_i \tilde{x}_j + \sum_{i=1}^{m} b_i \tilde{x}_i + c \qquad (8)$$

where $a_{ij}$ is the correlation coefficient, $b_i$ is the covariance coefficient, and c is the constant coefficient. In real-world applications we usually have $a_{ij} = 0$ for most i and j, and thus the number of coefficients is typically far less than $m^2$.

¹ The model based on this hypothesis can be easily extended to include more quality and maintenance levels.
² In detail, $\alpha_i$ is the probability of degradation from quality A to B, $\beta_i$ is that from B to C, and $\gamma_i$ is that from A to C. All coefficients are in the range (0, 1).
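The quality transition and capability evaluation of Eqs. (4)–(8) can be sketched as follows (a hedged illustration rather than the authors' code; variable names are ours, and the correlation matrix a is treated as dense for brevity):

import numpy as np

def mission_capability(xA, xB, xC, x1, x2, alpha, beta, gamma, wA, wB, wC, a, b, c):
    """Evaluate the system mission capability I of Eq. (8).

    xA, xB, xC : (m,) current numbers of equipment at quality levels A, B, C
    x1, x2     : (m,) numbers maintained at levels I and II
    alpha, beta, gamma : (m,) wearing (degradation) coefficients
    wA, wB, wC : (m,) capability weights; a : (m, m), b : (m,), c : scalar
    """
    pool = xA + x1 + x2                                    # equipment at level A after maintenance
    xA_new = (1.0 - alpha - gamma) * pool                  # Eq. (4)
    xB_new = alpha * pool + (1.0 - beta) * xB              # Eq. (5)
    xC_new = xC + beta * xB + gamma * pool                 # Eq. (6)
    x_tilde = wA * xA_new + wB * xB_new + wC * xC_new      # Eq. (7)
    return float(x_tilde @ a @ x_tilde + b @ x_tilde + c)  # Eq. (8)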

2.4 The Multi-Objective Optimization Problem Model


Based on the above analysis, we get the following multi-objective optimization problem model for SEMP:

$$\max \; I = \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij}\, \tilde{x}_i \tilde{x}_j + \sum_{i=1}^{m} b_i \tilde{x}_i + c \qquad (9)$$

$$\min \; C = \sum_{i=1}^{m}\sum_{j=1}^{n} c_j (z'_{ij} x'_i + z''_{ij} x''_i) + \sum_{i=1}^{m} (s'_i t'_i x'_i + s''_i t''_i x''_i) \qquad (10)$$

$$\text{s.t.} \quad I \ge \underline{I} \qquad (11)$$
$$C \le \overline{C} \qquad (12)$$
$$M_T \le \overline{M}_T \qquad (13)$$
$$\sum_{i=1}^{m} x'_i \le X' \qquad (14)$$
$$\sum_{i=1}^{m} x''_i \le X'' \qquad (15)$$

where $\underline{I}$ is the lower limit of the overall mission capability, $\overline{C}$ is the upper limit of the overall cost, $\overline{M}_T$ is the upper limit of the total working hours, and $X'$ and $X''$ are the upper limits of the numbers of equipment that can be maintained at levels I and II respectively.
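A compact sketch of checking a candidate plan against constraints (11)–(15) is given below (our own illustration; the bound names and the reuse of the two functions sketched above are assumptions):

def is_feasible(x1, x2, bounds, capability_fn, cost_fn):
    """Check constraints (11)-(15) for a candidate maintenance plan (x1, x2).

    bounds: dict with keys 'I_min', 'C_max', 'MT_max', 'X1_max', 'X2_max'.
    capability_fn(x1, x2) -> I ;  cost_fn(x1, x2) -> (C_E, M_T, C_T, C).
    """
    I = capability_fn(x1, x2)
    _, M_T, _, C = cost_fn(x1, x2)
    return (I >= bounds['I_min']               # Eq. (11)
            and C <= bounds['C_max']           # Eq. (12)
            and M_T <= bounds['MT_max']        # Eq. (13)
            and x1.sum() <= bounds['X1_max']   # Eq. (14)
            and x2.sum() <= bounds['X2_max'])  # Eq. (15)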

3 MOPSO Algorithm
3.1 The Algorithm Framework
The SEMP model described above is a multi-objective integer programming
problem. Although the standard PSO algorithm works on continuous variables,
the truncation of real values to integers will not significantly affect the performance of the method when the range of decision variables is large [10]. The
following presents our MOPSO algorithm for the SEMP that searches for the
Pareto-optimal front rather than a single optimal solution:
1. Set the basic algorithm parameters, and randomly generate a swarm P of p feasible solutions.
2. For each particle η in the swarm, initialize its velocity $v^\eta = 0$, and set $p^\eta_{best}$ to its initial position $x^\eta$.
3. Select all non-dominated solutions from P and save them in the archive NP.
4. Choose a solution $g_{best}$ from NP such that:
$$g_{best} = \arg\max_{\theta \in NP} \{\, w_1 I(\theta) - w_2 C(\theta) \,\} \qquad (16)$$
where $w_1$ and $w_2$ are two preference weights satisfying $w_1, w_2 \in (0, 1)$ and $w_1 + w_2 = 1$.
5. Update the velocity and position of each η in P according to the following movement equations:
$$v^\eta = w v^\eta + c_1 r_1 (p^\eta_{best} - x^\eta) + c_2 r_2 (g_{best} - x^\eta) \qquad (17)$$
$$x^\eta = x^\eta + v^\eta \qquad (18)$$
where w is the inertia weight, $c_1$ and $c_2$ are learning factors, and $r_1$ and $r_2$ are random values between (0, 1).
6. If the position $x^\eta$ violates the problem constraints (11)–(15), reset $x^\eta = p^\eta_{best}$ and reset $v^\eta = 0$.
7. Update each local best solution $p^\eta_{best}$.
8. Compute $S_I = \sum_{\theta \in NP} I(\theta)$ and $S_C = \sum_{\theta \in NP} C(\theta)$.
9. Update the non-dominated solution set NP based on the new swarm, and then compute $\Delta S_I = \sum_{\theta \in NP} I(\theta) - S_I$ and $\Delta S_C = S_C - \sum_{\theta \in NP} C(\theta)$.
10. If the termination condition is satisfied, then the algorithm stops; else update the inertia weight according to the following equations and then go to step 4:
$$w = w_{max} - \frac{k}{k_{max}} (w_{max} - w_{min}) \qquad (19)$$
$$w_1 = \begin{cases} \min(w_1 + 0.1,\; w_1^{max}) & \text{if } \Delta S_I < \Delta S_C \\ \max(w_1 - 0.1,\; w_1^{min}) & \text{else} \end{cases} \qquad (20)$$
$$w_2 = 1 - w_1 \qquad (21)$$
where k is the current iteration number, $k_{max}$ is the maximum iteration number of the algorithm, $w_{max}$ and $w_{min}$ are the maximum and minimum inertia weights respectively, and $w_1^{max}$ and $w_1^{min}$ are the maximum and minimum first-preference weights respectively.

In the algorithm we use an "objective leverage function" $L(\theta) = w_1 I(\theta) - w_2 C(\theta)$ for global best selection, and a solution θ whose value of L(θ) is the maximum among all solutions in NP is selected as $g_{best}$. Typically, $w_1$ and $w_2$ can both be initialized to 0.5, or be manually set based on the preference of the decision-maker. After each iteration, we compute $\Delta S_I$, the increase of the summed objective value I over NP, and $\Delta S_C$, the decrease of the summed objective value C over NP. If $\Delta S_I$ is less than $\Delta S_C$, we increase the preference weight $w_1$ and decrease $w_2$, and vice versa. This strategy significantly decreases the computational cost of global best selection since at each iteration it uses only one non-dominated solution as the $g_{best}$ for all particles; on the other hand, changing the values of $w_1$ and $w_2$ according to $\Delta S_I$ and $\Delta S_C$ helps to simultaneously evolve the two objectives and thus to improve the diversity of the solutions. However, to ensure the strategy works effectively, we should scale the coefficients in (9) and (10) such that I and C are of the same order of magnitude.
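A minimal sketch of the leverage-based global best selection and the weight adaptation of Eqs. (19)–(21) (our own illustration under the parameter names of Table 1; I_fn and C_fn evaluate Eqs. (9) and (10)):

def select_gbest(archive, w1, w2, I_fn, C_fn):
    """Pick the archive member maximizing the leverage L = w1*I - w2*C (Eq. (16))."""
    return max(archive, key=lambda theta: w1 * I_fn(theta) - w2 * C_fn(theta))

def update_weights(w1, k, k_max, dSI, dSC,
                   w_max=0.9, w_min=0.1, w1_max=0.9, w1_min=0.1):
    """Inertia and preference weight update of Eqs. (19)-(21)."""
    w = w_max - (k / k_max) * (w_max - w_min)   # Eq. (19)
    if dSI < dSC:
        w1 = min(w1 + 0.1, w1_max)              # Eq. (20): favor the capability objective
    else:
        w1 = max(w1 - 0.1, w1_min)
    return w, w1, 1.0 - w1                      # Eq. (21): w2 = 1 - w1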

3.2 Size and Diversity of the Solutions Set


During the search procedure, the size of the approximate non-dominated solution set NP may increase rapidly and the search performance will decrease significantly. Therefore, a reasonable approach is to limit the size of the solution set,
which will cause us to decide whether to insert a new non-dominated solution η
when the size of N P reaches the limit and, if so, which archive solution θ should
be removed.
In our MOPSO algorithm, we use the diversity of solutions in NP as the criterion, i.e., a θ ∈ NP should be replaced with η if the diversity can be (potentially) improved by doing so, since preserving and improving diversity is crucial not only to avoid losing potentially efficient solutions but also to avoid premature convergence. Here we employ a simple approach based on the minimum pairwise distance [5], which has low computational costs. In detail, whenever NP contains more than one member, it records the two solutions $\theta_a$ and $\theta_b$ whose Euclidean distance is the minimum among all pairs in NP:

$$dis(\theta_a, \theta_b) = \min_{x, y \in NP,\; x \neq y} dis(x, y) \qquad (22)$$

When the size of NP reaches the size limit $|NP|_{max}$, the following procedure (sketched in code after the list) is applied for possible inclusion of a new solution η:
1. If η is dominated by any θ ∈ NP, then η is discarded.
2. Else if η dominates some θ ∈ NP, then remove those θ and insert η.
3. Else if $\min_{\theta \in NP,\, \theta \neq \theta_a} dis(\eta, \theta) > dis(\theta_a, \theta_b)$, then remove $\theta_a$ and insert η.
4. Else if $\min_{\theta \in NP,\, \theta \neq \theta_b} dis(\eta, \theta) > dis(\theta_a, \theta_b)$, then remove $\theta_b$ and insert η.
5. Else choose the closest z ∈ NP to η; if $\min_{\theta \in NP,\, \theta \neq z} dis(\eta, \theta) > dis(\eta, z)$, then remove z and insert η.
6. Else discard η.
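A hedged Python sketch of this archive-update procedure (our own reading of the six rules; dominates() is an assumed helper and distances are taken over solution vectors as in Eq. (22)):

import math

def dis(a, b):
    """Euclidean distance between two solution vectors."""
    return math.dist(a, b)

def try_insert(NP, eta, dominates, max_size):
    """Archive update based on the minimum pairwise distance (rules 1-6)."""
    if any(dominates(theta, eta) for theta in NP):                    # rule 1
        return NP
    dominated = [theta for theta in NP if dominates(eta, theta)]
    if dominated:                                                     # rule 2
        return [theta for theta in NP if theta not in dominated] + [eta]
    if len(NP) < max_size:
        return NP + [eta]
    # the closest pair (theta_a, theta_b) currently stored in the archive
    theta_a, theta_b = min(((x, y) for i, x in enumerate(NP) for y in NP[i + 1:]),
                           key=lambda pair: dis(*pair))
    d_ab = dis(theta_a, theta_b)
    if min(dis(eta, t) for t in NP if t is not theta_a) > d_ab:       # rule 3
        return [t for t in NP if t is not theta_a] + [eta]
    if min(dis(eta, t) for t in NP if t is not theta_b) > d_ab:       # rule 4
        return [t for t in NP if t is not theta_b] + [eta]
    z = min(NP, key=lambda t: dis(eta, t))                            # rule 5
    if min(dis(eta, t) for t in NP if t is not z) > dis(eta, z):
        return [t for t in NP if t is not z] + [eta]
    return NP                                                         # rule 6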

Table 1. Parameter settings in the algorithms, where $M = \sum_{i=1}^{m} (x^A_i + x^B_i + x^C_i)$ is the total number of equipment

Algorithm  |NP|max         kmax  p     c1   c2   wmax  wmin  w1max  w1min
MOPSO-A    max(M/10, 50)   M/2   m/10  0.8  1.0  0.9   0.1   0.9    0.1
MOPSO-B    max(M/10, 50)   M/2   m/10  0.8  1.0  0.9   0.1
MOTS       max(M/10, 50)   M

Table 2. Computational experiments conducted on the test problem instances

                   MOPSO-A                    MOPSO-B                    MOTS
m     M       t       |NP|  I*    C*     t       |NP|  I*    C*     t       |NP|  I*    C*
50    300     0.1     5     30.6  11.4   0.1     5     30.6  11.4   0.1     5     30.6  11.4
50    900     0.2     9     75.2  35.7   0.2     9     75.2  35.7   0.1     9     75.2  35.7
100   1800    0.7     13    24.3  89.0   0.7     13    24.3  89.0   0.6     13    24.3  89.0
100   9750    2.4     28    45.9  12.6   2.7     26    40.1  12.6   2.2     30    45.9  14.3
200   3800    2.6     18    14.9  12.1   2.7     17    14.6  12.1   2.6     18    14.9  12.1
200   17200   21.8    36    92.5  42.2   26.1    39    79.0  42.2   42.2    40    73.6  42.2
300   4900    22.2    20    80.4  89.6   26.6    23    72.1  93.2   40.5    23    72.1  87.5
300   23600   63.6    39    54.2  44.7   60.5    33    52.3  43.8   181.7   37    53.9  46.7
500   9500    72.1    27    36.7  25.0   74.1    29    36.3  25.2   208.0   30    36.9  28.1
500   45800   292.4   46    50.0  12.5   329.2   50    48.7  12.8   1480.9  48    49.9  12.7
800   13650   283.3   36    34.9  22.2   280.7   36    34.9  22.2   1675.8  35    29.6  22.0
800   64000   2791.2  49    75.3  48.8   3039.4  50    70.1  48.8
1000  15000   2846.6  38    29.4  14.2   3022.3  38    27.3  14.2
1000  82000   6558.9  50    69.3  60.0

4 Computational Experiments
The presented MOPSO algorithm (denoted by MOPSO-A) has been tested on
a set of SEMP problem instances and compared with two other algorithms:

– A basic MOPSO algorithm (denoted by MOPSO-B) where the g best is ran-


domly selected from N P .
– A multi-objective tabu search algorithm (denoted by MOTS) proposed in
[19].

The experiments are conducted on a computer of 2 × 2.66GHz AMD Athlon64


X2 processor and 8GB memory. The basic parameter values are given in
Table 1. The performance measures include the CPU time t (in seconds), the
number of non-dominated solutions |N P |, the result maximum mission capa-
bility I ∗ and minimum maintenance cost C ∗ . In order to improve the clarity
of comparison, all the values of I ∗ and C ∗ are scaled into the range (0, 100).
The summary of experimental results is presented in Table 2 (the maximum running time on every instance is 2 hours, and a blank entry denotes that the algorithm failed to stop within that time).
As we can see from the computational results, for small-size problem instances where m ≤ 100 and M ≤ 1800, all three algorithms reach the same Pareto-optimal front; but with increasing instance size, the two PSO algorithms exhibit a significant performance advantage over the tabu search algorithm, and for large-size problems MOPSO-A also exhibits a certain performance advantage over MOPSO-B. On the other hand, the result I* obtained by MOPSO-A is always no less than that obtained by MOPSO-B, and C* obtained by MOPSO-A is always no more than that obtained by MOPSO-B except for one case (italicized in Table 2). This demonstrates that our strategy for global best selection plays an important role in improving the quality of the resulting solutions.

5 Conclusion

The paper presents an effective multi-objective particle swarm optimization (PSO) algorithm for solving the SEMP problem model. Our algorithm employs an objective leverage function for global best selection and preserves the diversity of non-dominated solutions based on the measurement of minimum pairwise distance, and thus decreases the computational cost and improves the quality of the resulting solution set. As demonstrated by the experimental results, the proposed algorithm is quite efficient even for large-size problem instances. We are now extending the algorithm by introducing the non-dominated sorting method [11], which will increase the computational cost but can evolve the swarm closer to the true Pareto front, and thus is more appropriate for medium-size problem instances. Further research will also include the fuzziness of maintenance costs and mission capability to decrease the sensitivity of the model and improve the adaptivity of the algorithm.

References
1. Ai, B., Wu, C.: Genetic and simulated annealing algorithm and its application to equipment maintenance resource optimization. Fire Control & Command Control 35(1), 144–145 (2010)
2. Clerc, M.: Particle Swarm Optimization. ISTE, London (2006)
3. Coello, C.A.C., Lechuga, M.S.: MOPSO: A proposal for multiple objective particle
swarm optimization. In: Proceedings of Congress on Evolutionary Computation,
vol. 2, pp. 1051–1056. IEEE Press, Los Alamitos (2002)

4. Fletcher, J.D., Johnston, R.: Effectiveness and cost benefits of computer-based


decision aids for equipment maintenance. Comput. Human Behav. 18, 717–728
(2002)
5. Hajek, J., Szollos, A., Sistek, J.: A new mechanism for maintaining diversity of
Pareto archive in multi-objective optimization. Adv. Eng. Softw. 41, 1031–1057
(2010)
6. Ho, S.-J., Ku, W.-Y., Jou, J.-W., Hung, M.-H., Ho, S.-Y.: Intelligent particle swarm
optimization in multi-objective problems. In: Ng, W.-K., Kitsuregawa, M., Li, J.,
Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 790–800. Springer,
Heidelberg (2006)
7. Jayakumar, A., Asgarpoor, S.: Maintenance optimization of equipment by linear
programming. Prob. Engineer. Inform. Sci. 20, 183–193 (2006)
8. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE
International Conference on Neural Networks, Perth WA, Australia, pp. 1942–1948
(1995)
9. Kleeman, M.P., Lamont, G.B.: Solving the aircraft engine maintenance scheduling
problem using a multi-objective evolutionary algorithm. In: Coello, C.C., Aguirre,
A.H., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 782–796. Springer, Hei-
delberg (2005)
10. Laskari, E.C., Parsopoulos, K.E., Vrahatis, M.N.: Particle swarm optimization for
integer programming. In: Proceedings of Congress on Evolutionary Computing, pp.
1582–1587. IEEE Press, Los Alamitos (2002)
11. Li, X.: A non-dominated sorting particle swarm optimizer for multiobjective
optimization. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R.,
O’Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener,
J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller,
J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2723, pp. 37–48. Springer,
Heidelberg (2003)
12. Liu, D., Tan, K., Goh, C., Ho, W.: A multiobjective memetic algorithm based on
particle swarm optimization. IEEE Trans. Syst. Man. Cybern. B 37, 42–50 (2007)
13. Parsopoulos, K.E., Vrahatis, M.N.: Particle swarm optimization method in multiobjective problems. In: Proceedings of the 2002 ACM Symposium on Applied
Computing, pp. 603–607. ACM Press, New York (2002)
14. Verma, A.K., Ramesh, P.G.: Multi-objective initial preventive maintenance
scheduling for large engineering plants. Int. J. Reliability Quality & Safety En-
gineering 14, 241–250 (2007)
15. Xu, L., Han, J., Xiao, J.: A combinational forecasting model for aircraft equipment
maintenance cost. Fire Control & Command Control 33, 102–105 (2008)
16. Yang, Y., Huang, X.: Genetic algorithms based the optimizing theory and ap-
proaches to the distribution of the maintenance cost of weapon system. Math.
Prac. Theory 24, 74–84 (2002)
17. Yu, G., Li, P., He, Z., Sun, Y.: Advanced evolutionary algorithm used in multi-
objective constrained optimization problem. Comput. Integ. Manufact. Sys. 15,
1172–1178 (2009)
18. Zhang, Z., Wang, J., Duan, X., et al.: Introduction to Equipment Technical Sup-
port. Military Science Press, Beijing (2001)
19. Zheng, Y., Zhang, Z.: Multi-objective optimization model and algorithm for equipment maintenance planning. Comput. Inter. Manufact. Sys. 16, 2174–2180 (2010)
Multiobjective Optimization for Nurse Scheduling

Peng-Yeng Yin*, Chih-Chiang Chao, and Ya-Tzu Chiang

Department of Information Management, National Chi-Nan University


Nantou 54561, Taiwan
[email protected]

Abstract. It is laborious to determine nurse schedules in a human-involved manner while accounting for administrative operations, business benefits, and
nurse requests. To solve this problem, a mathematical formulation is proposed
where the hospital administrators can set multiple objectives and stipulate a set
of scheduling constraints. We then present a multiobjective optimization
method based on the cyber swarm algorithm (CSA) to solve the nurse schedul-
ing problem. The proposed method incorporates salient features from particle
swarm optimization, adaptive memory programming, and scatter search to cre-
ate benefit from synergy. Two simulation problems are used to evaluate the per-
formance of the proposed method. The experimental results manifest that the
proposed method outperforms NSGA II and MOPSO in terms of convergence
and diversity performance measures of the produced results.

Keywords: cyber swarm algorithm, adaptive memory programming, scatter


search, multiobjective optimization, nurse scheduling.

1 Introduction
Nurse scheduling, which is among many other types of staff scheduling, intends to
automatically allot working shifts to available nurses in order to maximize hospital
value/benefit subject to relevant constraints including governmental regulations, nurse
skill requirement, minimal on-duty hours, etc. There are several solution methods
proposed in the last decade for dealing with the nurse scheduling problem. These
methods can be divided into three categories: mathematical programming, heuristics,
and metaheuristics. Most of the methods aimed to solve a single-objective formula-
tion, only few of them [1-4] addressed a more complete description of real-world
hospital administration and attempted multiobjective formulation of nurse scheduling.
Nevertheless, due to the high complexity of multiobjective context, the authors of
[1-3] converted the multiobjective formulation into a single-objective program by the
weighting-sum technique. The weighting-sum technique fails to identify optimal
solutions if the Pareto front is non-convex and the value of the weights used to com-
bine multiple objectives is hard to determine.
This paper proposes a cyber swarm algorithm (CSA) for the Multi-Objective Nurse
Scheduling Problem (MONSP). The CSA is a new metaheuristic approach which
marries the major features of particle swarm optimization (PSO) and scatter search.

* Corresponding author.


The CSA has been shown to be more effective than several state-of-the-art methods
for single-objective optimization [5]. The contribution of this paper includes the
following. (1) We devise a multiobjective version for the CSA. The proposed method,
named MOCSA, is general and can be employed to solve many classes of problems
with multiobjective context; (2) we show the effectiveness of MOCSA in tackling the
generic multiobjective nurse scheduling problem. The non-dominated solutions ob-
tained by MOCSA are superior to those produced by other competing methods in
terms of the dominance strength and the diversity measure on the solution front; and
(3) the multi-dimensional asymptotic Pareto front is shown in the objective space to
illustrate the comparative performances of competing methods.
The remainder of this paper is organized as follows. Section 2 presents a literature
review of existing methods for the nurse scheduling problem and introduces the cen-
tral concepts of multiobjective optimization. Section 3 describes the problem formally
and articulates the proposed method. Section 4 presents experimental results together
with an analysis of their implications. Finally, concluding remarks and discussions are
given in Section 5.

2 Related Works
To assist various operations performed in a hospital, a work day is normally divided
into two to four shifts (for example, a three-shift day may include day, night, and late
shift). Each nurse is allocated to a number of shifts during the scheduling period with
a set of constraints. A shift is fulfilled by a specified number of nurses with different
medical skills depending on the operations to be performed in the shift. The adherent
constraints with nurse scheduling are necessary hospital regulations when taking into
account the wage cost, execution of operations, nurses’ requests, etc. The constraints
can be classified as hard constraints and soft constraints. Hard constraints should be
strictly satisfied and a schedule violating any hard constraints will not be acceptable.
Soft constraints are desired to be satisfied as much as possible and a schedule violat-
ing soft constraints is still considered feasible.
The objective could involve the reduction of the human resource cost, satisfaction
of nurses’ request, or minimization of violations to any soft constraints. Most existing
works seek to optimize one objective, only few consider multiple objectives when
search for solutions. Berrada et al. [1] proposed the first attempt to find a nurse
schedule optimizing several soft constraints simultaneously. The lexico-dominance
technique is applied where the priority order of the soft constraints is pre-specified
and is used to determine the quality of solutions. Burke et al. [3] applied the weight-
ing sum technique but the weight values are determined by the priority order of objec-
tives obtained after close consultation with hospitals. Burke et al. [4] proposed a
simulated annealing multiobjective method which generates the non-dominated solu-
tions to obtain an approximate Pareto front.
A widely accepted notion in decision science field for multiobjective optimization
is to search the Pareto-optimal solutions which are not dominated by any other solu-
tions. A solution x dominates another solution y, denoted x ≻ y, if x is strictly better
than y in at least one objective and x is no worse than y in the others. The plots of
objective values for all Pareto-optimal solutions form a Pareto front in the objective
space. It is usually hard to find the true Pareto front due to the high complexity of the
problem nature. Alternatively, an approximate Pareto front is searched for. The

quality of this front is evaluated by two measures: (1) The convergence measure indi-
cates how close the approximate front is converging to the true front, and (2) the di-
versity measure favors the approximate front whose plots are evenly spread on the
front. Classical multiobjective optimization methods include lexico-dominance,
weighting sum, and goal programming. However, multiple runs of the applied method
are needed to obtain a set of non-dominated solutions. Recently, metaheuristic algo-
rithms have been introduced as a viable technique for multiobjective optimization.
Notable applications have been proposed using Strength Pareto Evolutionary Algo-
rithm (SPEA II) [6], Non-dominated Sorting Genetic Algorithm (NSGA II) [7], and
Multi-Objective Particle Swarm Optimization (MOPSO) [8].

3 Proposed Method
This paper deals with the MONSP on a shift-by-shift basis. Each working day is divided into three shifts (day, night, and late shift), and the total shifts in a scheduling
period are numbered from 1 to S (1 indicates the day-shift of the first day, 2 indicates
the night-shift of the first day, etc). Assume that there are M types of nurse skills, and
skill type m is owned by Tm nurses. The aim of the MONSP is to optimize multiple
objectives simultaneously by allotting appropriate nurses to the shifts subject to a set
of hard constraints. By using the notations introduced in Table 1, we present the
mathematical formulation of the addressed MONSP as follows.

Table 1. Notations used in the addressed MONSP formulation

Lmj   Min. number of nurses having skill m required to fulfill shift j
Umj   Max. number of nurses having skill m required to fulfill shift j
Wm    Min. number of shifts a nurse having skill m should serve in a scheduling period
Rm    Max. number of consecutive working days that a nurse having skill m can serve
Cmj   Cost incurred by allotting a nurse having skill m to shift j
Pmij  Pmij = 1 if nurse i having skill m is satisfied with the shift j assignment;
      Pmij = −1 if unsatisfied; and Pmij = 0 if no special preference
xmij  xmij = 1 if nurse i having skill m is allotted to shift j; otherwise, xmij = 0
$$\text{Minimize } f_1 = \sum_{m=1}^{M} \sum_{i=1}^{T_m} \sum_{j=1}^{S} x_{mij} C_{mj} \qquad (1)$$

$$\text{Minimize } f_2 = \sum_{m=1}^{M} \sum_{j=1}^{S} \left( \sum_{i=1}^{T_m} x_{mij} - L_{mj} \right) \qquad (2)$$

$$\text{Minimize } f_3 = \sum_{m=1}^{M} \sum_{i=1}^{T_m} \sum_{j=1}^{S} x_{mij} (1 - P_{mij}) \qquad (3)$$

Subject to

$$\sum_{j=1}^{S} x_{mij} \ge W_m \quad \forall m, i \qquad (4)$$

$$\sum_{i=1}^{T_m} x_{mij} \ge L_{mj} \quad \forall m, j \qquad (5)$$

$$\sum_{i=1}^{T_m} x_{mij} \le U_{mj} \quad \forall m, j \qquad (6)$$

$$\sum_{j=r}^{r+2} x_{mij} \le 1 \quad r = 1, 4, 7, \ldots, S-2, \;\; \forall m, i \qquad (7)$$

$$\sum_{j=r}^{r+3(R_m+1)-1} x_{mij} \le R_m \quad r = 1, 4, 7, \ldots, S-2, \;\; \forall m, i \qquad (8)$$

$$x_{mij} \in \{0, 1\} \quad \forall m, i, j \qquad (9)$$


The first objective (Eq. (1)) intends to minimize the cost incurred by performing
the nurse schedule. The second objective (Eq. (2)) tries to minimize the deviation
between the minimum number of required nurses for a shift and the number of nurses
really allotted to that shift. The third objective originally intends to maximize the total


nurses' preference Pmij about the schedule; it is converted to a minimization objective by using (1 − Pmij) (Eq. (3)). The first constraint (Eq. (4)) stipulates that the number of
shifts fulfilled by a nurse having skill m should be greater than or equal to a minimum
threshold Wm. Eq. (5) and Eq. (6) describe that the number of nurses having skill m
which are allotted to shift j should be a value between Lmj and Umj. The fourth con-
straint (Eq. (7)) indicates any nurse can only work for at most one shift during any
working day. Finally, the fifth constraint (Eq. (8)) requests that the nurse having skill
m can serve for at most Rm consecutive working days.
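As a rough illustration (not the authors' code), the three objectives of Eqs. (1)–(3) can be evaluated from the binary assignment arrays as follows; the fourth fitness value used later as a constraint penalty would be accumulated analogously from Eqs. (4)–(8):

import numpy as np

def objectives(x, C, L, P):
    """Evaluate f1, f2, f3 of Eqs. (1)-(3).

    x : list of (T_m, S) binary arrays, one per skill m (x[m][i, j] = 1 if
        nurse i with skill m works shift j)
    C : list of (S,) cost vectors C_mj ;  L : list of (S,) minimum staffing L_mj
    P : list of (T_m, S) preference arrays with values in {-1, 0, 1}
    """
    f1 = sum(float((x_m * C_m).sum()) for x_m, C_m in zip(x, C))              # Eq. (1)
    f2 = sum(float((x_m.sum(axis=0) - L_m).sum()) for x_m, L_m in zip(x, L))  # Eq. (2)
    f3 = sum(float((x_m * (1 - P_m)).sum()) for x_m, P_m in zip(x, P))        # Eq. (3)
    return f1, f2, f3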

Fig. 1. The conception diagram of the MOCSA

One of the notable PSO variants is the Cyber Swarm Algorithm (CSA) [5], which incorporates the reference set, a notion from scatter search [9], to keep the most influential solutions. To seek the approximate Pareto-optimal solutions for the MONSP problem, we propose a multiobjective version of the CSA, named MOCSA. Fig. 1 shows
the conception diagram of the MOCSA which consists of four memory components.
The swarm memory component is the working memory where a population of swarm
particles evolve to improve their solution quality. The individual memory reserves a
separate space for each particle and stores the pseudo non-dominated solutions by
reference to all the solutions found by this designated particle only. Note that the

pseudo non-dominated solutions could be dominated by the solutions found by other


particles, but we propose to store the pseudo non-dominated solutions because our
preliminary results show that these solutions contain important diversity information
along the individual search trajectory and they assist in finding influential solutions
that are overlooked by just using global non-dominated solutions. The global memory
tallies the non-dominated solutions that are not dominated by any other solutions
found by all the particles. The solutions stored in the global memory will be output as
the approximate Pareto optimal solutions as the program terminates. Finally, the ref-
erence memory taking the notion of reference set from scatter search [9] selects the
most influential solutions based on objective values and diversity measures. The
MOCSA exploits the guiding information by the manipulations on different types of
adaptive memory. The details of the features of MOCSA are presented as follows.
Particle Representation and Fitness Evaluation. Given S working shifts to be fulfilled, there are at most $2^S$ possible allocations (without considering scheduling constraints) for assigning a nurse to the available shifts. Hence, a nurse schedule can be encoded as a value in $[0, 2^S - 1]$. Assume a population of U particles is used, where a particle $P_i = \{p_{ij}\}$ indicates the schedule for all the nurses. The fitness of the i-th particle is a four-value vector $(f_1, f_2, f_3, f_4)$. The objective values evaluated using Eqs. (1)-(3) are referred to as the first three fitness values $(f_1, f_2, f_3)$. The fourth fitness value $f_4$ serves as a penalty which computes the amount of total violations incurred by any constraint (Eqs. (4)-(8)). We assume that a feasible solution always dominates any infeasible solution.
Exploiting guiding information. The CSA extends the learning form using pbest
and gbest by additionally including another solution guide which is systematically
selected from the reference set, storing a small number of reference solutions, denoted
RefSol[m], m = 1, 2, …, RS, observed by all particles by reference to fitness values
and solution diversity. For implementing the MOCSA, the selecting of solution guides
is more complex because multiple non-dominated solutions can play the role of pbest,
gbest and RefSol[m]. Once the three solution guides were selected, particle Pi updates
its positional vector in the swarm memory by the guided moving using Eqs. (10) and
(11) as follows.
$$v^m_{ij} \leftarrow K \left( v_{ij} + (\varphi_1 + \varphi_2 + \varphi_3) \left( \frac{\omega_1 \varphi_1\, pbest_{ij} + \omega_2 \varphi_2\, gbest_j + \omega_3 \varphi_3\, RefSol[m]_j}{\omega_1 \varphi_1 + \omega_2 \varphi_2 + \omega_3 \varphi_3} - p_{ij} \right) \right), \quad 1 \le m \le RS \qquad (10)$$

$$P_i \leftarrow \text{non-dominated} \left\{ \left( f_k(P_i + v^m_i),\; 1 \le k \le 4 \right) \;\middle|\; m \in [1, RS] \right\} \qquad (11)$$

where K is the constriction factor, ω and ϕ are the weighting value and the cognition
coefficient for the three solution guides pbest, gbest and RefSol[m]. As RefSol[m],
1≤m≤RS is selected in turn from the reference set, the process will generate RS can-
didate particles for replacing Pi. We choose the non-dominated solution from the RS
candidate particles. If there exists more than one non-dominated solution, the tie is
broken at random. Nevertheless, all the non-dominated solutions found in the guided
moving are used for experience memory update as noted in the following.
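A hedged sketch of the guided move of Eqs. (10)–(11) for one particle (our own illustration; the weights, the dominance test and the fitness function are assumed to be supplied by the caller):

import numpy as np

def guided_move(p, v, pbest, gbest, ref_set, K, omega, phi, fitness, dominates):
    """Generate RS candidates via Eq. (10) and return a non-dominated one, Eq. (11).

    p, v, pbest, gbest : (D,) arrays ;  ref_set : list of (D,) reference solutions
    omega, phi : length-3 weight / cognition coefficients for pbest, gbest, RefSol[m]
    fitness(x) -> (f1, f2, f3, f4) ;  dominates(a, b) -> bool on fitness tuples
    """
    candidates = []
    for ref in ref_set:
        num = omega[0] * phi[0] * pbest + omega[1] * phi[1] * gbest + omega[2] * phi[2] * ref
        den = omega[0] * phi[0] + omega[1] * phi[1] + omega[2] * phi[2]
        v_m = K * (v + sum(phi) * (num / den - p))           # Eq. (10)
        candidates.append(p + v_m)
    fits = [fitness(c) for c in candidates]                  # Eq. (11): keep a non-dominated candidate
    keep = [c for c, f in zip(candidates, fits)
            if not any(dominates(g, f) for g in fits if g is not f)]
    return keep[np.random.randint(len(keep))] if keep else p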
Experience memory update. As shown in Fig. 1, experience memory consists of
individual memory, global memory and reference memory, where the rewarded

experience pbest, gbest and RefSol[m] are stored and updated. The individual memory
tallies the personal rewarded experience pbest for each individual particle. Because
there may exist more than one non-dominated solution in the search course of a parti-
cle (here, the non-dominance only refers to all the solutions found by this particle),
we save all these solutions in the individual memory. Any solutions in the individual
memory can serve as pbest in the guided moving, and we’ll present the Diversity
strategy [10] for selecting pbest from the individual memory. By contrast to individ-
ual memory, the global memory stores all the non-dominated solutions found by the
entire swarm. Hence, the content of the global memory is used for the final output of
the approximate Pareto-optimal solutions. During the evolution, the solutions in the
global memory are also helpful in assisting the guided moving of particles by serving
as gbest. The Sigma strategy [11] is employed in our method for selecting gbest from
the global memory. The reference memory stores a small number of reference solu-
tions selected from individual and global memory. According to the original scatter
search template [9], we facilitate the 2-tier reference memory update by reference to
the fitness values and diversity of the solutions.
Selecting solution guides. First, the Diversity strategy for selecting pbest is em-
ployed where each particle selects from its individual memory a non-dominated solu-
tion as pbest that is the farthest away from the other particles in the objective space.
Thus, the particle is likely to produce a plot of objective values equally-distanced to
those of other particles, improving the diversity property of the solution front. Second,
we apply the Sigma strategy for selecting gbest from the global memory. For a given
particle, the Sigma strategy selects from the global memory a non-dominated solution
as gbest which is the closest to the line connecting the plot of the particle’s objective
values to the origin in the objective space, improving the convergence property of the
solution front. Finally, the third solution guide, RefSol[m], m = 1, 2, …, RS, is sys-
tematically selected from the reference memory. These reference solutions have good
properties of convergence and diversity, so their features should be fully explored in
the guided moving for a particle.
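A minimal sketch of the Sigma-style gbest selection described above (our own reading: pick the global-memory member whose objective vector lies closest to the line through the origin and the particle's objective vector; not the authors' implementation):

import numpy as np

def select_gbest_sigma(particle_fit, global_archive):
    """global_archive is a list of (solution, objective_vector) pairs."""
    d = np.asarray(particle_fit, dtype=float)
    d = d / (np.linalg.norm(d) + 1e-12)              # direction defined by the particle
    def distance_to_line(fit):
        f = np.asarray(fit, dtype=float)
        return np.linalg.norm(f - (f @ d) * d)       # distance to the line through the origin
    return min(global_archive, key=lambda sf: distance_to_line(sf[1]))[0]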

4 Result and Discussion


We have intensively consulted administrators and senior staffs at the Puli Christian
Hospital (http://www.pch.org.tw/english/e_index.html). A dataset consisting of two
problem instances was thus created for assessing the objective values of the nurse
schedules produced by various algorithms. The first problem instance (Problem I)
requires to determine the optimal scheduling of 10 nurses with two levels of skills in a
planning period of one week, while the second problem instance (Problem II) consists
of 25 nurses with three different skills to be scheduled in a period of four weeks.
Among others, NSGA II and MOPSO are two notable methods and are broadly
used as performance benchmarks. We thus choose these two methods for performance
comparison. All the algorithms were coded using C# language, and the following
experiments were conducted on a 2.4GHz PC with 1.25GB RAM. The quality of the
final solution front is evaluated in two aspects: the convergence of the produced front
to the true Pareto front and the diversity of the produced front manifesting the plots of
objective values that are evenly spaced on the front. The convergence measure named
Hypervolume calculates the size of the fitness space covered by the produced front.

To prevent a bias toward fronts with fewer efficient points, the Hypervolume is normalized by the final number of solutions produced. Solutions with a smaller Hypervolume value are more desirable because they are closer to the true Pareto front. The diversity measure, named Spacing, estimates the variance of the distance between adjacent fitness plots. Solutions with a smaller Spacing value are more desirable because they exhibit a better representation of the front.
As all the competing algorithms are stochastic, we report the average performance
index values over 10 independent runs. Each run of a given algorithm is allowed a budget of 80,000 fitness evaluations. Table 2 lists the values of the per-
formance indexes for the solution fronts produced by the competing algorithms. For
Problem I, the MOCSA gives the smallest Hypervolume value indicating the
produced solution front converges closer to the true Pareto front than the other two
algorithms. The Spacing value for the MOCSA is also the smallest among all which
discloses that the non-dominated solutions produced by MOCSA spread more evenly
on the front. On the other hand, the NSGA II produces the greatest values (worst
performance) for both Hypervolume and Spacing, while the MOPSO generates the
intermediate values. The experimental outcome for Problem II is slightly different
with the previous case. The NSGA II gives the smallest Hypervolume value (best
performance) although its spacing value indicates that the produced solutions are not
well distributed on the front. The MOCSA produces the second smallest Hyper-
volume value and the smallest Spacing value among all competitors, supporting the
claim that the MOCSA is superior to the other two algorithms. The MOPSO generates
the worst Hypervolume value and a median Spacing value.
Fig. 2(a) shows the plots of the multiobjective values of all the solutions for Prob-
lem I obtained by different algorithms. It is seen that the front produced by MOCSA
is closer to the origin. We can also observe that the spread of the solutions are better

Table 2. The values of performance indexes obtained by competing algorithms

           Problem I                 Problem II
           Hypervolume  Spacing      Hypervolume  Spacing
MOCSA      2.42E+07     1.41         9.86E+07     3.40
NSGA II    9.45E+07     2.37         8.37E+07     7.82
MOPSO      6.17E+07     2.01         1.35E+08     4.24

Fig. 2. The multiobjective-valued fronts for the simulation problems: (a) Problem I; (b) Problem II



distributed on the front than those produced by the other two methods. The front gen-
erated by the MOPSO is next to that produced by the MOCSA by reference to the
visual distance to the origin. The front generated by the NSGA II is the farthest to the
origin and the obtained solutions are not evenly distributed on the front. For Problem
II as shown in Fig. 2(b), we can see the front produced by the NSGA II is the closest
to the origin although the obtained solutions are still not evenly distributed on the
front. The MOCSA produces the front next to that of NSGA II, but better spacing is
observed. Finally, MOPSO front is the furthest to the origin where the distribution of
the obtained solutions on the front is also better than that produced by the NSGA II.

5 Conclusions
In this paper, we have presented a multiobjective cyber swarm algorithm (MOCSA)
for solving the nurse scheduling problem. Based on a literature survey, we propose a
mathematical formulation containing three objectives and five hard constraints. In
contrast to most existing methods which transform multiple objectives into an inte-
grated one, the proposed MOCSA method tackles the generic multiobjective setting
and is able to produce approximate Pareto front. The experimental results on two
simulation problems manifest that the MOCSA outperforms NSGA II and MOPSO in
terms of convergence and diversity measures of the produced fronts.

References
1. Berrada, I., Ferland, J., Michelon, P.: A multi-objective approach to nurse scheduling with
both hard and soft constraints. Socio-Economic Planning Sciences 30, 183–193 (1996)
2. Azaiez, M.N., Al Sharif, S.S.: A 0-1 goal programming model for nurse scheduling. Com-
puters & Operations Research 32, 491–507 (2005)
3. Burke, E.K., Li, J., Qu, R.: A Hybrid Model of Integer Programming and Variable
Neighbourhood Search for Highly-Constrained Nurse Rostering Problems. European Jour-
nal of Operational Research 203, 484–493 (2010)
4. Burke, E.K., Li, J., Qu, R.: A Pareto-based search methodology for multi-objective nurse
scheduling. Annals of Operations Research (2010)
5. Yin, P.Y., Glover, F., Laguna, M., Zhu, J.X.: Cyber swarm algorithms – improving parti-
cle swarm optimization using adaptive memory strategies. European Journal of Opera-
tional Research 201, 377–389 (2010)
6. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength pareto evolutionary
algorithm. Technical Report 103, ETH, Switzerland (2001)
7. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic
algorithm: NSGA-II. IEEE Transaction on Evolutionary Computation 6, 42–50 (2002)
8. Coello Coello, A.C., Pulido, G.T., Lechuga, M.S.: Handling multiple objectives with parti-
cle swarm optimization. IEEE Trans. on Evolutionary Computation 8, 256–279 (2004)
9. Laguna, M., Marti, R.: Scatter Search: Methodology and Implementation in C. Kluwer
Academic Publishers, London (2003)
10. Branke, J., Mostaghim, S.: About selecting the personal best in multi-objective particle
swarm optimization. In: Runarsson, T.P., Beyer, H.-G., Burke, E.K., Merelo-Guervós, J.J.,
Whitley, L.D., Yao, X. (eds.) PPSN 2006. LNCS, vol. 4193, pp. 523–532. Springer, Heidel-
berg (2006)
11. Mostaghim, S., Teich, J.: Strategies for finding local guides in multi-objective particle
swarm optimization (MOPSO). In: Proceedings of the IEEE Swarm Intelligence Sympo-
sium 2003 (SIS 2003), Indianapolis, Indiana, USA, pp. 26–33 (2003)
A Multi-Objective Binary Harmony Search Algorithm

Ling Wang, Yunfei Mao, Qun Niu, and Minrui Fei

Shanghai Key Laboratory of Power Station Automation Technology,


School of Mechatronics and Automation, Shanghai University,
Shanghai, 200072
[email protected]

Abstract. Harmony Search (HS) is an emerging meta-heuristic optimization


method and has been used to tackle various optimization problems successfully.
However, research on multi-objective HS has just begun and no work on binary multi-objective HS has been reported. This paper presents a multi-objective binary harmony search algorithm (MBHS) for tackling binary-coded multi-objective optimization problems. A modified pitch adjustment operator is used to improve the search ability of MBHS. In addition, non-dominated sorting based on crowding distance is adopted to evaluate the solutions and update the harmony memory so as to maintain the diversity of the algorithm. Finally the performance of the proposed MBHS was compared with NSGA-II on multi-objective benchmark functions. The experimental results show that MBHS outperforms NSGA-II in terms of the convergence metric and the diversity metric.

Keywords: binary harmony search, multi-objective optimization, harmony


search.

1 Introduction
Harmony Search (HS) is an emerging global optimization algorithm developed by
Geem in 2001 [1]. Owing to its excellent characteristics, HS has drawn more and
more attention and dozens of variants have been proposed to improve the
optimization ability. On the one hand, the control parameters of HS were investigated
and several adaptive strategies were proposed to achieve better performance. Pan et al
[2] proposed a self-adaptive global best harmony search algorithm in which the
harmony memory consideration rate and pitch adjustment rate were dynamically
adapted by the learning mechanisms. Wang and Huang [3] presented a self-adaptive
harmony search algorithm which used the consciousness to automatically adjust
parameter values. On the other hand, various hybrid harmony search algorithms were
proposed where additional information extracted by other algorithms was combined
with HS to improve the optimization performance. For instances, Li and Li
[4]combined HS with the real valued Genetic Algorithm to enhance the exploitation
capability. Several hybrid HS algorithms combined with Particle Swarm Optimization
(PSO) were developed to optimize the numerical problem [5], pin connected
structures [6] and water network design [7]. Other related works include the fusion of
HS with Simplex Algorithm [8] or Clonal Selection Algorithm [9].


Now HSs have been successfully applied in a wide range of optimization problems
in the scientific and engineering fields. However, most of these works focused on the
single-objective optimization problems in the continuous or discrete space; and so far
just several researches are concerned with the binary-coded problems or multi-
objective optimization problems. On binary-coded optimization problems, Geem [10]
firstly used HS to solve the water pump switching problem where the candidate value
for each decision variable is “0” or “1”. Then Greblicki and Kotowski [11] analyzed
the properties of HS on the one-dimensional binary knapsack problem and found the optimization performance of HS to be unsatisfactory. Afterwards, Wang et al. [12] pointed out that the pitch adjustment rule of HS cannot perform its function for binary-coded problems, which is the root of the poor performance. To make up for it, Wang
proposed a binary HS algorithm in which a new pitch adjustment operator was
developed to ameliorate the optimization ability. On the multi-objective optimization
problems, Geem and Hwangbo [13] studied the satellite heat pipe design problem
which needs to simultaneously consider two objectives, i.e., the thermal conductance and
the heat pipe mass. However, the authors transformed this multi-objective problem
into a single objective function by minimizing the sum of the individual errors between the current function value and the optimal value. Geem [14] later used HS to tackle the
multi-objective time-cost optimization problem for scheduling a project. In this work,
the dominance-based comparison for selection was adopted to achieve the trade-off of
the time and cost. As far as we know, there is no work reported on the multi-objective
binary HS (MBHS). To extend HS to tackle the multi-objective binary-coded
problems, a new Pareto-based multi-objective binary HS is proposed in this work.
This paper is organized as follows. Section 2 briefly introduces the standard HS
algorithm. Then the proposed MBHS is described in Section 3 in details. Section 4
presents the experimental results of MBHS on the benchmark functions and the
comparisons with NSGA-II are also given. Finally, some conclusions are drawn in
Section 5.

2 Harmony Search Algorithm


Harmony Search Algorithm is inspired by the improvising process of musicians. HS
mimics this process by keeping a matrix of the best solution vectors named the
Harmony Memory (HM). The number of vectors that can be simultaneously remembered in the HM is called the Harmony Memory Size (HMS). These memory
vectors are initialized with HMS solutions randomly generated for each decision
variable. The search procedure after initialization is called improvisation which
includes three operators, i.e., harmony memory considering operator, pitch adjusting
operator and random selection operator.
The harmony memory considering rate (HMCR), which is between 0 and 1,
controls the balance between exploration and exploitation during improvisation. A
random number is generated and compared with HMCR during search process for
each decision variable. If it is smaller than HMCR, the memory vector in HM is taken
into consideration for generating the new value; otherwise a value is randomly
selected from the possible ranges of the decision variable. Each decision variable of
the new solution vector obtained from the HM is examined to determine whether it
should be pitch adjusted. The pitch adjusting rate (PAR) decides the ratio of

pitch-adjustment. Another random number between 0 and 1 is generated and the pitch adjustment operation of Eq. (1) is executed if it is not bigger than PAR.

$$x_i^{new} = \begin{cases} x_i^{new} + rand() \cdot BW & \text{in continuous space} \\ x_n^{new} & \text{in discrete space} \end{cases} \qquad (1)$$

Here $x_i^{new}$ is the i-th element of the new harmony solution vector; rand() is the random number; BW is an arbitrary distance bandwidth; and $x_n^{new}$ is a neighboring value of $x_i^{new}$.
If the new harmony vector is better than the worst solution vector in HM in terms
of the fitness value, it will be included in the HM and the existing worst harmony
solution vector is excluded from HM. This process runs iteratively until the termination criteria are satisfied.
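For concreteness, a hedged sketch of one improvisation step of the standard HS in a continuous space (our own illustration of the three operators; the bounds, HMCR, PAR and BW are assumed to be supplied):

import random

def improvise(HM, lower, upper, HMCR, PAR, BW):
    """Generate one new harmony from the harmony memory HM (a list of vectors)."""
    N = len(HM[0])
    new = []
    for i in range(N):
        if random.random() < HMCR:                        # harmony memory consideration
            value = random.choice(HM)[i]
            if random.random() < PAR:                     # pitch adjustment, Eq. (1)
                value += random.random() * BW
        else:                                             # random selection
            value = random.uniform(lower[i], upper[i])
        new.append(min(max(value, lower[i]), upper[i]))   # keep within bounds
    return new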

3 Multi-Objective Binary Harmony Search Algorithm


The standard multi-objective HS can be used to deal with binary-coded multi-
objective optimization problems, but the disfunction of the pitch adjustment operator
in binary space will spoil the performance greatly. So the multi-objective binary
harmony search algorithm (MBHS) is proposed in this paper to achieve the
satisfactory optimization ability. In MBHS, the harmony vector is formed by the
binary-string. For a N-dimension problem, the HM with the size of HMS can be
represented as Eq. (2) and initialized randomly,

⎡ x1,1 x1,2 ... x1, N −1 x1, N ⎤


⎢ ⎥
⎢ x2,1 x2,2 ... x2, N −1 x2, N ⎥
⎢... ... ... ... ... ⎥
x HM =⎢ ⎥ (2)
⎢... ... ... ... ... ⎥
⎢x xHMS −1,2 ... xHMS −1, N −1 xHMS −1, N ⎥
⎢ HMS −1,1 ⎥
⎢⎣ xHMS ,1 xHMS ,2 ... xHMS , N −1 xHMS , N ⎥⎦

where x_{i,j} ∈ {0,1} is the j-th element of the i-th harmony memory vector. Like the
standard HS, MBHS uses three updating operators, that is, the harmony memory
consideration operator, the pitch adjustment operator and random selection, to generate
the new solutions.
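
As a small illustration of this representation, the HM of Eq. (2) can be initialized as a random HMS-by-N binary matrix; the sketch below is an assumption about one possible implementation, not the authors' code.

import random

def init_hm(hms, n):
    """Randomly initialize the binary harmony memory of Eq. (2)."""
    return [[random.randint(0, 1) for _ in range(n)] for _ in range(hms)]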

3.1 Harmony Memory Consideration Operator and Random Selection Operator

In MBHS, the harmony memory consideration operator (HMCO) and the random selection
operator (RSO) are used to perform the global search. MBHS performs HMCO with
the probability of HMCR, i.e., picking a value in the HM, while it runs RSO with the rate
of (1-HMCR), i.e., choosing a feasible value not limited to the HM, which means that the
bit is re-initialized stochastically to be “0” or “1”. The process of HMCO and RSO
can be described as Eqs. (3)–(4):

\[
x_j =
\begin{cases}
x_{k,j}, \; k \in \{1,2,\ldots,HMS\} & r_1 \le HMCR \\
RSO & \text{otherwise}
\end{cases}
\tag{3}
\]
\[
RSO =
\begin{cases}
1 & r_2 \le 0.5 \\
0 & \text{otherwise}
\end{cases}
\tag{4}
\]

where x_j is the j-th bit of the new harmony solution vector; r_1 and r_2 are two
independent random numbers between 0 and 1.

3.2 Pitch Adjustment Operator

If an element of the new harmony comes from the HM, it needs to be adjusted by the
pitch adjustment operator (PAO) with probability PAR. However, in binary space,
the value of each element in the HM is bound to be “0” or “1”, so the standard
definition of the PAO in HS degenerates into a mutation operation [12]. If we simply
abandoned the PAO, the algorithm would lack an operator to perform local search. To
remedy this, the pitch adjustment operator of Eq. (5) is used in MBHS.

\[
x_j =
\begin{cases}
B_j & r \le PAR \\
x_j & \text{otherwise}
\end{cases}
\tag{5}
\]

where r is a random number and B_j is the j-th element of the best harmony solution
vector in the HM. The PAO executes a local search based on the current solution and the
best solution, which helps MBHS find the global optimum effectively and
efficiently.
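
The three operators can be combined into a single per-bit improvisation step. The sketch below is an illustrative reading of Eqs. (3)–(5), assuming "best" is the first vector of the HM after non-dominated sorting; it is not the authors' implementation.

import random

def mbhs_improvise(HM, best, HMCR=0.9, PAR=0.03):
    """Generate one binary harmony using HMCO/RSO (Eqs. 3-4) and the PAO (Eq. 5)."""
    new = []
    for j in range(len(best)):
        if random.random() <= HMCR:        # HMCO: copy bit j from a randomly chosen HM vector
            bit = random.choice(HM)[j]
            if random.random() <= PAR:     # PAO: pull the bit towards the best harmony
                bit = best[j]
        else:                              # RSO: re-initialize the bit as 0 or 1 at random
            bit = 1 if random.random() <= 0.5 else 0
        new.append(bit)
    return new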

3.3 Updating of HM

The newly generated harmony vector is added to the HM. Then all the solutions in the
HM are sorted according to their fitness values and the solution with the worst fitness is
removed from the HM. In multi-objective optimization problems, the two major goals
of a Pareto-based optimizer are to converge to the Pareto-optimal set and to
maintain diversity. To achieve this, non-dominated sorting based
on crowding distance is adopted to sort the HM vectors, as sketched below.
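
A minimal sketch of the two ingredients of this ranking follows; it assumes minimization and plain Python lists of objective tuples, and is an illustration rather than the authors' code. HM members would be sorted first by non-dominated front and then by decreasing crowding distance before discarding the last vector.

def dominates(a, b):
    """Pareto dominance for minimization: a dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def crowding_distance(front):
    """Crowding distance of each solution in one front (list of objective tuples)."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        span = front[order[-1]][k] - front[order[0]][k] or 1.0
        dist[order[0]] = dist[order[-1]] = float("inf")
        for r in range(1, n - 1):
            dist[order[r]] += (front[order[r + 1]][k] - front[order[r - 1]][k]) / span
    return dist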

4 Results and Discussion


Following the previous work, five multi-objective optimization functions, i.e.,
SCH*[15], FON[16] and Deb*[17], are chosen as benchmark problems.

4.1 Performance Measures

In this work, the convergence metric γ and the diversity metric Δ proposed in [18]
are adopted to evaluate the performance.

(1) Convergence metric γ. The convergence metric γ measures the closeness of the
solutions in the obtained Pareto-optimal set to the true Pareto-optimal set and is
calculated as Eqs. (6)–(7):
\[
d_i = \min_{j=1}^{|p^*|} \sum_{m=1}^{k} \left( \frac{f_m(h_i) - f_m(p_j)}{f_m^{max} - f_m^{min}} \right)
\tag{6}
\]
\[
\gamma = \frac{\sum_{i=1}^{|H|} d_i}{|H|}
\tag{7}
\]

where p* = (p_1, p_2, …, p_{|p*|}) is the true Pareto-optimal set, H = (h_1, h_2, …, h_{|H|}) is the
obtained Pareto-optimal set, f_m^{max} is the maximum of the m-th objective function
and f_m^{min} is the minimum of the m-th objective function. In this work, a set of
|p*| = 400 uniformly distributed Pareto-optimal solutions is used to calculate the
convergence metric γ.

(2) Diversity metric Δ. The diversity metric is computed as Eq. (8):
\[
\Delta = \frac{d_f + d_l + \sum_{i=1}^{HMS-1} \left| d_i - \bar{d} \right|}{d_f + d_l + (HMS-1)\,\bar{d}}
\tag{8}
\]

where d_i is the distance between two successive solutions in the obtained Pareto-
optimal set, \bar{d} is the mean value of all the d_i, and d_f and d_l are the two Euclidean
distances between the extreme solutions and the boundary solutions of the obtained
non-dominated set.
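
For reference, the two metrics can be computed as in the sketch below. It is an illustrative reconstruction of Eqs. (6)–(8): the inner distance of Eq. (6) is taken here as a normalized Euclidean distance, a common choice for this metric, and the fronts are assumed to be lists of objective tuples with the obtained front sorted along the Pareto curve.

import math

def convergence_metric(H, P, f_min, f_max):
    """Gamma of Eqs. (6)-(7): mean normalized distance from the obtained front H to the true front P."""
    def d(h, p):
        return math.sqrt(sum(((hm - pm) / (hi - lo)) ** 2
                             for hm, pm, lo, hi in zip(h, p, f_min, f_max)))
    return sum(min(d(h, p) for p in P) for h in H) / len(H)

def diversity_metric(front, d_f, d_l):
    """Delta of Eq. (8); front is sorted, d_f and d_l are the boundary distances."""
    gaps = [math.dist(front[i], front[i + 1]) for i in range(len(front) - 1)]
    mean = sum(gaps) / len(gaps)
    return (d_f + d_l + sum(abs(g - mean) for g in gaps)) / (d_f + d_l + len(gaps) * mean)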

4.2 Results and Discussion

For MBHS, a reasonable set of parameter values is adopted, i.e., HMCR=0.9 and
PAR=0.03; each decision variable is coded with 30 bits. For comparison, NSGA-II
[18] with its default parameters is used to solve these problems as well. MBHS and
NSGA-II were both run for 50000 function evaluations. Tables 1–2 list the optimization
results of MBHS and NSGA-II, and box plots of γ and Δ are given in Fig. 1
and Fig. 2.
According to the results in Tables 1–2, it is reasonable to claim that the proposed
MBHS is superior to NSGA-II. Fig. 1 indicates that MBHS generally achieved
solutions of higher quality than NSGA-II in terms of the convergence
metric. In Fig. 2, the comparison of the diversity metric indicates that MBHS is
able to find a better spread of solutions and clearly outperforms NSGA-II on all
problems.

[Figure 1: box plots of the convergence metric γ for MBHS and NSGA-II on the FON, DEB1, DEB2, SCH1 and SCH2 problems]

Fig. 1. Box plot of the convergence metric γ obtained by MBHS and NSGA-II

[Figure 2: box plots of the diversity metric Δ for MBHS and NSGA-II on the FON, DEB1, DEB2, SCH1 and SCH2 problems]

Fig. 2. Box plot of the diversity metric Δ obtained by MBHS and NSGA-II

Table 1. Mean and Variance of the convergence metric γ

MBHS NSGA-II
Mean Variance Mean Variance
FON 1.9534481E-003 2.5725898E-003 1.9009196E-003 1.9787263E-004
SCH1 9.7508949E-004 5.9029912E-005 9.7769396E-004 6.9622480E-005
SCH2 7.3687049E-004 5.6615711E-005 7.4402367E-004 5.2879053E-005
DEB1 1.0286786E-003 5.8010990E-005 1.0697121E-003 6.6791139E-005
DEB2 8.3743810E-003 1.5211841E-002 9.8603419E-002 1.0217030E-001

Table 2. Mean and Variance of the diversity metric Δ

MBHS NSGA-II
Mean Variance Mean Variance
FON 9.6845154E-002 6.2345711E-002 7.8416829E-001 2.9294262E-002
SCH1 1.1668542E-001 1.0841259E-002 4.2701519E-001 3.5264364E-002
SCH2 9.4714113E-001 1.6775193E-003 1.0347253E+000 2.7413411E-002
DEB1 4.7516338E-001 5.5063477E-003 6.3378683E-001 1.9689019E-002
DEB2 6.6037039E-001 1.8871529E-001 6.8131960E-001 1.1085579E-001

5 Conclusion

This paper presented a new multi-objective binary harmony search algorithm for
tackling multi-objective optimization problems in binary space. A modified pitch
adjustment operator is used to perform a local search and improve the search ability
of the algorithm. In addition, non-dominated sorting based on crowding distance is
adopted to evaluate the solutions and update the HM, which ensures a better diversity
performance as well as convergence of MBHS. Finally, the performance of the
proposed MBHS was compared with NSGA-II on five well-known multi-objective
benchmark functions. The experimental results show that MBHS outperforms NSGA-
II in terms of both the convergence metric and the diversity metric.

Acknowledgments

This work is supported by Research Fund for the Doctoral Program of Higher Education
of China (20103108120008), the Projects of Shanghai Science and Technology
Community (10ZR1411800 & 08160512100), ChenGuang Plan (2008CG48),
Mechatronics Engineering Innovation Group project from Shanghai Education
Commission, Shanghai University “11th Five-Year Plan” 211 Construction Project and
the Graduate Innovation Fund of Shanghai University.

References
1. Geem, Z., Kim, J., Loganathan, J.: A new heuristic optimization algorithm: harmony
search. J. Simulations 76, 60–68 (2001)
2. Pan, Q., Suganthan, P., Tasgetiren, M., Liang, J.: A self-adaptive global best harmony
search algorithm for continuous optimization problems. Applied Mathematics and
Computation 216, 830–848 (2010)
3. Wang, C., Huang, Y.: Self-adaptive harmony search algorithm for optimization. Expert
Systems with Applications 37, 2826–2837 (2010)
4. Li, H., Li, L.: A novel hybrid real-valued genetic algorithm for optimization problems. In:
International Conference on Computational Intelligence and Security, pp. 91–95 (2008)
5. Omran, M., Mahdavi, M.: Global-best harmony search. Applied Mathematics and
Computation 198, 643–656 (2008)
6. Li, L., Huang, Z., Liu, F., Wu, Q.: A heuristic particle swarm optimizer for optimization of
pin connected structures. Computers & Structures 85, 340–349 (2007)
7. Geem, Z.: Particle-swarm harmony search for water network design. Engineering
Optimization 41, 297–311 (2009)
8. Jang, W., Kang, H., Lee, B.: Hybrid simplex-harmony search method for optimization
problems. In: IEEE Congress on Evolutionary Computation, pp. 4157–4164 (2008)
9. Wang, X., Gao, X.Z., Ovaska, S.J.: A hybrid optimization method for fuzzy classification
systems. In: 8th International Conference on Hybrid Intelligent Systems, pp. 264–271
(2008)
10. Geem, Z.: Harmony search in water pump switching problem. In: Wang, L., Chen, K., S.
Ong, Y. (eds.) ICNC 2005. LNCS, vol. 3612, pp. 751–760. Springer, Heidelberg (2005)
11. Greblicki, J., Kotowski, J.: Analysis of the Properties of the Harmony Search Algorithm
Carried Out on the One Dimensional Binary Knapsack Problem. In: Moreno-Díaz, R.,
Pichler, F., Quesada-Arencibia, A. (eds.) EUROCAST 2009. LNCS, vol. 5717, pp. 697–
704. Springer, Heidelberg (2009)
12. Wang, L., Xu, Y., Mao, Y., Fei, M.: A Discrete Harmony Search Algorithm.
Communications in Computer and Information Science 98, 37–43 (2010)
13. Geem, Z., Hwangbo, H.: Application of harmony search to multi-objective optimization
for satellite heat pipe design. Citeseer, pp. 1–3 (2006)
14. Geem, Z.: Multiobjective Optimization of Time Cost Trade off Using Harmony Search.
Journal of Construction Engineering and Management 136, 711–716 (2010)
15. Schaffer, J.: Multiple objective optimization with vector evaluated genetic algorithms. In:
Proceedings of the 1st International Conference on Genetic Algorithms, pp. 93–100 (1985)
16. Fonseca, C., Fleming, P.: Multiobjective optimization and multiple constraint handling
with evolutionary algorithms. II. Application example. IEEE Transactions on Systems,
Man and Cybernetics, Part A: Systems and Humans 28, 38–47 (2002)
17. Deb, K.: Multi-objective genetic algorithms: Problem difficulties and construction of test
problems. Evolutionary Computation 7, 205–230 (1999)
18. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multi-objective genetic
algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation 6, 182–197 (2002)
A Self-organized Approach to Collaborative Handling of
Multi-robot Systems

Tian-yun Huang1,2, Xue-bo Chen2, Wang-bao Xu1,2, and Wei Wang1


1
Faculty of Electronic Information and Electrical Engineering,
Dalian University of Technology,
116024 Liaoning, China
2
School of Electronics and Information Engineering,
Liaoning University of Science and Technology,
114051 Liaoning, China
[email protected], [email protected], [email protected],
[email protected]

Abstract. The purpose of this paper is to develop a general self-organized
approach to the multi-robot collaborative handling problem. Firstly, an autonomous
motion planning graph (AMP-graph) is described for representing individual
movement. An individual autonomous motion rule (IAM-rule) based on
“free-loose” and “well-distributed load-bearing” preferences is presented. By
establishing this simple and effective individual rule model, an ideal handling
formation can be formed by each robot moving autonomously under its respective
preferences. Finally, the simulations show that both the AMP-graph and the IAM-rule
are valid and feasible. On this basis, self-organized approaches to collaborative
hunting and handling with obstacle avoidance in multi-robot systems can be
further analyzed effectively.

Keywords: Self-organized, Collaborative handling, Formation control.

1 Introduction
Collaborative handling, as one of the tasks of multi-robot systems, plays an important
role in the research on collaborative control of complex systems. It began with the
research on ‘two industrial robots handling a single object’ by Zheng, Y.F. and J.Y.S.
Luh [1], continued in the work of Y. Kume [2] on ‘multiple robots’, and has recently
matured, for example in a motion-planning method for multiple mobile
robots in a three-dimensional environment (see, for example, [3]). In the early stage
of research, most of the classic approaches to collaborative handling were centralized
control, which may be effective only when the number of controllers is limited
to a certain range [4][5][6]. Decentralized control is an effective method by
which each robot is controlled by its own controller without explicit communication
among robots; the method usually employs a leader-following relational mode
by assigning a leader that obtains the motion information of the object [2][7].
However, it may not be the best choice because of the explicit relational mode and the
communication and computing bottleneck at the leader robot. A self-organized approach is
a good train of thought for collaborative handling of multi-robot systems, even swarm
a good train of thought for collaborative handling of multi-robot systems, even swarm
systems [8][9][10]. The main objective of this paper is to initiate a study on self-
organized approach to multi-robot’s collaborative handling problem. For individual
movement representations, an autonomous motion planning graph (AMP-graph) is
described. An individual autonomous motion rule (IAM-rule) including two kind of
“free-loose” and “well-distributed load-bearing” preferences is presented. By establish-
ing the simple and effective individual rule model, an ideal handling formation can be
formed by each robot moving autonomously under their respective preferences. The
simulations show that both the AMP-graph and the IAM-rule are valid and feasible.
Considering many uncertain factors in the handling process, before continuing any
further we will make three necessary assumptions: First, the handling process happen
in the ideal plane. Second, the rim of object exist any solid handling points which
hardly produce deformation. Lastly, handling robots with strong bearing capacity
don’t sideslip and deflect in the handling process. Based on these assumptions, a self-
organized approach will be design.

2 Autonomous Motion Planning Model


Definition 1. If, based on local sensing, each robot can complete the collaborative handling
task only through its own simple rules, we call this multi-robot self-organized han-
dling (MSH).
Based on Definition 1, we make three assumptions [11], which are the basis of the
autonomous motion planning model of the handling robots:
Assumption 1. Each handling robot can obtain the location information of the object.
Assumption 2. Each handling robot has a local sensor, by which the robot can ob-
tain the position information of a finite number of its neighboring robots.
Assumption 3. There are always some simple rules by which each robot can
autonomously move under its respective preferences to form an ideal handling for-
mation.
Next, the details of autonomous motion planning model will be described.
Definition 2. In the absolute coordinates XaOaYa, the robot Ri can obtain four pieces of
location information, denoted by T0=(xt0, yt0), Ri=(xi, yi), Rpi=(xpi, ypi), Rqi=(xqi, yqi),
which belong to any target point T0 within the object, the robot Ri, and the two neighboring
robots Rpi, Rqi of the robot Ri. The target constraint line clti is the position vector from the
robot Ri to the target T0, denoted by clti=(xt0-xi)+i(yt0-yi). The target constraint angle θti is
the angle from the X-axis to clti, denoted by θti=arctan((yt0-yi)/(xt0-xi)). The two interconnected
constraint lines clpi and clqi are the position vectors from the robot Ri to its neighboring
robots Rpi and Rqi, denoted by clpi=(xpi-xi)+i(ypi-yi) and clqi=(xqi-xi)+i(yqi-yi). The two
interconnected constraint angles θpi and θqi are the deflection angles from the X-axis to clpi
and clqi, denoted by θpi=arctan((ypi-yi)/(xpi-xi)) and θqi=arctan((yqi-yi)/(xqi-xi)). The two
interconnected deflection angles θpti and θqti are the angles from clti to clpi and clqi,
denoted by θpti=θpi-θti and θqti=θqi-θti. The desired linear velocity vei is decomposed into a
vertical component vtdi in the direction of clti and a horizontal component vtpi in the
direction perpendicular to clti. The desired deflection angle θeti is the angle from vtdi to
vei, denoted by θeti=θti-θei.

[Figure 1: the AMP-graph, showing the target T0, robot Ri, neighbors Rpi and Rqi, the constraint lines clti, clpi, clqi, the desired velocity vei with components vtdi and vtpi, and the angles θti, θpi, θqi, θei]

Fig. 1. The AMP-graph

Then, an autonomous motion planning graph (AMP-graph) is formed from the location
information of any target point T0 and the two neighboring robots Rpi, Rqi, as shown in
Fig. 1.

3 Individual Autonomous Motion Rule Model


From the above discussion, we note that the key to the MSH problem is how to design
simple rules by which each robot can autonomously determine its direction of
motion at every moment, and by which all the handling robots can be distributed evenly
to various points around the edge of the target within a finite time. There are two parts
in the moving process: (1) collision avoidance when the robots move from
their initial points to the target; (2) well-distributed load-bearing when all the robots
reach the edge of the target. Considering the parameters in Definition 2 and the two different
motion processes, an individual autonomous motion rule (IAM-rule) with the “free-
loose” and “well-distributed load-bearing” preferences is designed.

3.1 The IAM-Rule Based on the “Free-Loose” Preference

As given in Definition 2, the desired linear velocity vei is the vector sum of vtdi in the direc-
tion of clti and vtpi in the direction perpendicular to clti. For the sake of simplicity, only
the target T0 and the two neighbors Rpi, Rqi of the i-th robot Ri are taken into account in the
IAM-rule based on the “free-loose” preference. The rule ensures that vtpi points to the “free-
loose” space while vtdi always points in the direction of clti, and that the two constraint condi-
tions vtdi=fd(|clti|,|clpi|,|clqi|) and vtpi=fp(|clpi|,|clqi|) are satisfied, where fd, fp are
the vertical and horizontal potential functions. In the process of moving to the target,
the rule makes all the robots coordinate into the ideal formation: the robots tend to
scatter from each other and to gather relative to the target; therefore we
call it the IAM-rule based on the “free-loose” preference.
The “free-loose” space modeling. Let us consider the robot Ri in the relative co-
ordinates in which the Y-axis always points to the target T0. The first and fourth quadrants
are defined as the positive quadrants since θpti, θqti are positive within them, and the
second and third quadrants are defined as the negative quadrants since θpti, θqti are
negative within them. Then the “free-loose” space can be described as follows:



The direction of the “free-loose” space points to
1) the opposite direction of the space the two neighbors belong to, when the two neighbors
lie together in the positive or the negative quadrants;
2) the direction of the space that the interconnected constraint line with the greater X-axis
component belongs to, when the two neighbors lie in the positive and the negative
quadrants respectively.
Thus, the description can be expressed mathematically as follows:

\[
\begin{cases}
Cl_i = cl_{pi} \sin\theta_{pti} + cl_{qi} \sin\theta_{qti} & |cl_{ti}| \ne 0 \\
Cl_i = cl_{pi} \operatorname{sgn}(\theta_{pti}) + cl_{qi} \operatorname{sgn}(\theta_{qti}) & |cl_{ti}| = 0
\end{cases}
\tag{3.1}
\]
\[
\begin{cases}
\theta^{l}_{tpi} = \theta_{ti} & |Cl_i| \le \varepsilon \\
\theta^{l}_{tpi} = \theta_{ti} + (-1)^{\frac{\operatorname{sgn}(\theta_{pti}) + \operatorname{sgn}(\theta_{qti})}{2}} \cdot \operatorname{sgn}(Cl_i) \cdot \dfrac{\pi}{2} & |Cl_i| > \varepsilon
\end{cases}
\tag{3.2}
\]
where ε is a permissible error. Because Cl_i covers all of the information needed to determine
the autonomous motion of Ri, we call it the interconnected characteristics parameter with the
“free-loose” feature. Cl_i denotes the vector sum of the X-axis components of the two
interconnected direction lines clpi and clqi if the robot Ri has not reached the edge of the
target, or the vector sum of clpi and clqi if the robot Ri has reached it. Similarly, because
θ^l_tpi covers all the possible directions of the “free-loose” space of Ri, we call it the
autonomous motion direction angle with the “free-loose” feature. Specially, the desired
linear velocity vei points in the direction of θti if the “free-loose” space does not exist, that
is, for a given ε, θ^l_tpi = θti if |Cl_i| ≤ ε.

We know that the arrow of the Y-axis represents the direction in which all the robots
tend to gather relative to the target, and the arrow of the X-axis represents the
direction in which all the robots tend to scatter from each other on the edge of the target.
Therefore, the desired angle θei at every moment of autonomous motion with the
IAM-rule based on the “free-loose” preference can be obtained as follows:

\[
\begin{cases}
\theta_{ei} = \theta_{ti} & |Cl_i| \le \varepsilon \\
\theta_{ei} = \theta_{ti} + \arctan(v_{tpi}/v_{tdi}) & |Cl_i| > \varepsilon \text{ and } |cl_{ti}| \ne 0 \\
\theta_{ei} = \theta^{l}_{tpi} & |Cl_i| > \varepsilon \text{ and } |cl_{ti}| = 0 \\
\theta^{*}_{ei} = \theta_{ti} & |Cl_i| \le \varepsilon \text{ and } |cl_{ti}| = 0
\end{cases}
\tag{3.3}
\]
Eq. (3.3) describes every stage of multi-robot self-organized handling. According
to Definition 2, the desired angle θei is the deflection angle between vei and the x_a axis if the two
interconnected constraint lines exist and the robot has not reached the edge of the
target, that is, |clti| ≠ 0 and |Cl_i| > ε. Specially, when the two interconnected constraint
lines do not exist, that is |Cl_i| = 0, the desired angle θei coincides with the target con-
straint angle θti. When the robot reaches the edge of the target and the interconnected
constraint lines exist, that is, |clti| = 0 and |Cl_i| > ε, the desired angle θei coincides
with θ^l_tpi. When the robot reaches the edge of the target and the interconnection can be
neglected, that is, |clti| = 0 and |Cl_i| ≤ ε, the robot obtains a stable desired
angle θ^*_ei coinciding with θti.
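
To make the rule concrete, the sketch below computes a desired heading for one robot from Eqs. (3.1)–(3.3). It is a simplified, illustrative reading: the potential functions f_d, f_p are replaced by a fixed ratio, and the sign bookkeeping of Eq. (3.2) is collapsed into a single copysign, so it is not the authors' implementation.

import math

def free_loose_heading(robot, target, nb_p, nb_q, eps=0.3, v_ratio=0.5):
    """Desired heading of one robot under the "free-loose" preference (illustrative)."""
    def angle(src, dst):
        return math.atan2(dst[1] - src[1], dst[0] - src[0])

    theta_t = angle(robot, target)                    # target constraint angle
    th_p = angle(robot, nb_p) - theta_t               # interconnected deflection angles
    th_q = angle(robot, nb_q) - theta_t
    d_p, d_q = math.dist(robot, nb_p), math.dist(robot, nb_q)

    if math.dist(robot, target) > 0:                  # not yet on the edge of the object
        cl = d_p * math.sin(th_p) + d_q * math.sin(th_q)          # Eq. (3.1), first case
        if abs(cl) <= eps:
            return theta_t                            # no "free-loose" space: head to the target
        # deflect away from the crowded side; v_ratio stands in for v_tpi / v_tdi
        return theta_t - math.copysign(math.atan(v_ratio), cl)
    cl = d_p * math.copysign(1.0, th_p) + d_q * math.copysign(1.0, th_q)  # Eq. (3.1), second case
    if abs(cl) <= eps:
        return theta_t                                # stable desired angle on the edge
    return theta_t - math.copysign(math.pi / 2, cl)   # slide along the edge (Eqs. 3.2-3.3)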
Now, we turn to the second motion process of multi-robot self-organized handling.

3.2 The IAM-Rule Based on the “Well-Distributed Load-Bearing” Preference

After a uniformly dispersed formation is formed by autonomous motion with the IAM-
rule based on the “free-loose” preference, that is, |clti| = 0 and |Cl_i| ≤ ε, all the han-
dling robots smoothly lift the object together to measure the load-bearing data,
which are used as the parameters of the IAM-rule based on the “well-distributed load-
bearing” preference. Similar to the “free-loose” preference, only the load-bearings of
the two nearest neighbors on the left and right sides of Ri are taken into account.
By ensuring that Ri always moves along the edge of the object and in the direction of the
neighbor with the larger load-bearing, the IAM-rule makes the load-bearing of all the
robots tend towards the average; therefore we call it the IAM-rule based on the “well-
distributed load-bearing” preference.
The “well-distributed load-bearing” space modeling. Similar to the “free-loose”
preference, consider the robot Ri in the relative coordinates in which the Y-axis always points
to the target T0; the first and fourth quadrants are defined as the positive quadrants
since θpti, θqti are positive in them and the second and third quadrants are defined as
the negative quadrants since θpti, θqti are negative in them. Then the “well-distributed
load-bearing” space can be described as follows:
The direction of the “well-distributed load-bearing” space points to the direction of
the space that the neighbor with the larger load-bearing belongs to.
Corresponding to the “free-loose” preference model, the description can be expressed
mathematically as follows:
\[
Cb_i = G_{pi} \operatorname{sgn}(\theta_{pdi}) + G_{qi} \operatorname{sgn}(\theta_{qdi})
\tag{3.4}
\]
\[
\begin{cases}
\theta_{ei} = \theta^{b}_{tpi} = \theta_{ti} & |Cb_i| \le \varepsilon \\
\theta_{ei} = \theta^{b}_{tpi} = \theta_{ti} + \operatorname{sgn}(Cb_i) \cdot \dfrac{\pi}{2} & |Cb_i| > \varepsilon \\
G^{*}_{ei} = G_0 / n & \text{all } |Cb_i| \le \varepsilon
\end{cases}
\tag{3.5}
\]

where G_pi and G_qi are the load-bearings of the two nearest neighbors on the left and
right sides of Ri. Because Cb_i covers all of the information needed to determine the direction
of the autonomous motion of Ri, we call it the interconnected characteristics parameter with
the “well-distributed load-bearing” feature. Similarly, because θ^b_tpi covers all the possible
directions of the “well-distributed load-bearing” space of Ri, we call it the autono-
mous motion direction angle with the “well-distributed load-bearing” feature. Specially,
if |Cb_i| ≤ ε, i=1,2,…,n, then all the robots bear the weight equally, denoted by
G*_i=G0/n.
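
The following small sketch illustrates Eqs. (3.4)–(3.5): the robot slides along the edge of the object towards the more heavily loaded neighbor until the imbalance falls below ε. The half-plane signs are assumed to be supplied by the caller, and the function name is illustrative only.

import math

def load_bearing_heading(theta_t, g_p, g_q, side_p, side_q, eps=0.3):
    """Heading under the "well-distributed load-bearing" preference (illustrative).
    side_p and side_q are +1 or -1 for the half-plane each neighbor occupies."""
    cb = g_p * side_p + g_q * side_q                 # Eq. (3.4)
    if abs(cb) <= eps:                               # load already balanced: keep facing the target
        return theta_t
    return theta_t + math.copysign(math.pi / 2, cb)  # Eq. (3.5), second case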

3.3 General Remarks on Multi-robot Self-organized Handling

Remark 1. The effective sensing range, denoted by Rs, is the maximum range within which the omni-
directional sensor of each handling robot can detect a target. If the
minimum distance between the robot and the object is beyond the effective sensing range
Rs, the robot follows any given point T0=(xt0, yt0) within the object; otherwise the robot follows
the point T0i=(xt0i, yt0i) located nearest to it on the edge of the object.
Remark 2. By setting the parameters of the potential field function, the IAM-rule can
maintain collision avoidance between any two robots. When the distance between the
robots becomes smaller, the potential field function makes the interconnected deflection angle
increase rapidly to produce a greater repulsive interaction. Specially, when no “free-
loose” space exists in any direction, the robot is forced to remain stationary and
wait for a chance for autonomous motion.
Remark 3. The effective interconnected radius δ is the maximum distance within which the
interaction between any two robots Rpi, Rqi exists, that is, for a given δi, |clpq| is kept if
|clpq| ≤ δi, or |clpq| = 0 if |clpq| > δi, p ≠ q ∈ {1,2,…,n}.

4 Simulations and Analysis


In order to test the validity and feasibility of the IAM-rule based on the “free-loose”
preference, two simulations are carried out in which 8 handling robots are present. The
group of robots is required to start from a disordered state and then form a relatively
well-distributed formation around the edge of the object, described as an ellipse
with parametric equations x=3cos(t)-3, y=1.5sin(t)+1. Each robot
uses the same motion parameters: the robot radius r, the effective
sensing radius Rs, the effective interconnected radius δ, the minimum distance differ-
ence ε between two robots, any given point T0 within the object and the step fac-
tor λ. The parameter values of the trajectory control are shown in Table 1 and the initial
position information of all the handling robots is shown in Table 2.

Table 1. The parameter values of the trajectory control

Parameter r ¤ Rs G O H
Value 0.2 0.1 8 4 0.3 0.5

Table 2. The initial position information of all the handling robots

R1 R2 R3 R4 R5 R6 R7 R8 T0
X 0 -2.2 1.1 -0.2 3.9 2.6 -0.3 -4.8 -4.0
Y -0.8 -4.6 -2.8 -1.3 0.2 -1.6 -7.2 -4.9 1.0

Fig. 2. The moving process of 8 handling robots with IAM-rule (36 steps)

From Fig. 2, we observe that after 36 steps all the handling robots are distributed uni-
formly around the edge of the target, so the IAM-rule based on the “free-loose” pref-
erence can effectively make a multi-robot system form the ideal handling formation,
corresponding to formation control [12][13][14]. In the initial period R7 follows the
known point T0 within the object, since the object, from which the initial position of
R7 is farther away, cannot be perceived; this coincides with Remark 1. Due to the small
distance between R1 and R4 in the initial period, R1 and R4 obtain two larger desired
deflection angles θet1 and θet4, which coincides with Remark 2. In addition, although the
robots R2, R7 and R8 are neighbors of each other, the interactions between them are negli-
gible in the initial period because of the greater distances between them; the later
re-establishment of the interaction deflects their trajectories during the
autonomous motion, which coincides with Remark 3. It is to be noted that, because each
robot always prefers moving in the direction of the “free-loose” space, the robots on the
periphery of the group possess more dispersity and pull the ones within the group,
through the “free-loose” space, to spread to the periphery gradually; thus the rela-
tively dispersed characteristics of the group are formed finally. If each robot satisfies the
local collision avoidance conditions under the special circumstances of Remark 2, we
might as well call it “strict collision avoidance”.

5 Conclusion and Future Work


The self-organized approach with IAM-rules has the following advantages over other
methods.
Firstly, a simple and effective individual autonomous motion rule (IAM-rule) model
is established, by which an ideal handling formation can be formed by each robot
moving autonomously under its respective preferences. Compared with the central-
ized control used for multi-robot collaborative handling, the self-organized approach
with the IAM-rule is simple and has fewer bottlenecks in communication and computing
caused by centralized data processing and leader guidance. For the robot itself, once
the information of the target and two neighbors is obtained by local perception, it
can determine its own desired speed; thus the reduced information processing is benefi-
cial for making rapid judgments.
Secondly, the self-organized approach with the IAM-rule has good characteristics
for strict collision avoidance, which provides a solution for the coordination problem of
swarm systems. The IAM-rule can be applied to the explanation and resolution of group
behaviors, since the “free-loose” preference coincides with individual behavior in real
systems.
Thirdly, it may provide a novel train of thought for emergence control modeling,
which is verified by the simulation showing that the system can be controlled to produce cer-
tain characteristics and functions of emergence by constructing simple individual rules.
This paper is the basis of research on the emergence of multi-robot collective be-
havior. Future work includes: 1. On this basis, the self-organized approach to multi-robot
systems' collaborative hunting and handling with obstacle avoidance can be further
analyzed effectively. 2. More rules with certain preferences can be designed to jointly
complete more complex functions of swarm systems. 3. Based on the IAM-rule, leader
emergence can be further discussed.

Acknowledgments. Supported by the National Natural Science Foundation of China


(Grant No. 60874017).

References
1. Kim, K.I., Zheng, Y.F.: Two Strategies of Position and Force Control for Two Industrial
Robots Handling a Single Object. Robotics and Autonomous Systems 5, 395–403 (1989)
2. Kosuge, K., Oosumi, T.: Decentralized Control of Multiple Robots Handling an Object. In:
IEEE/ RJS Int.Conf. on Intelligent Robots and Systems, vol. 1, pp. 318–323 (1996)
3. Yamashita, A., Arai, T., et al.: Motion Planning of Multiple Mobile Robots for Coopera-
tive Manipulation and Transportation. IEEE Transactions on Robotics and Automa-
tion 19(2) (2003)
4. Koga, M., Kosuge, K., Furuta, K., Nosaki, K.: Coordinated Motion Control of Robot Arms
Based on the Virtual International Model. IEEE Transactions on Robotics and Autono-
mous Systems 8 (1992)
5. Wang, Z., Nakano, E., Matsukawa, T.: Cooperating Multiple Behavior-Based Robots for
Object Manipulation. In: IEEE /RSJ/GI International Conference on Intelligent Robots and
Systems IROS 1994, vol. 3, pp. 1524–1531 (1994)

6. Huang, T.-y., Wang, X.-n., Chen, X.-b.: Multirobot Time-optimal Handling Method Based
on Formation Control. Journal of System Simulation 22, 1442–1465 (2010)
7. Kosuge, K., Taguchi, D., Fukuda, T., Sakai, M., Kanitani, K.: Decentralized Coordinated
Motion Control of Manipulators with Vision and Force Sensors. In: Proc. of 1995 IEEE
Int. Conf. on Robotics and Automation, vol. 3, pp. 2456–24162 (1995)
8. Jadbabaie, A., Lin, J., Morse, A.S.: Coordination of Groups of Mobile Autonomous
Agents Using Nearest Neighbor Rules. IEEE Transactions on Automatic Control 48,
988–1001 (2003)
9. Turgut, A.E., Çelikkanat, H., Gökçe, F., Şahin, E.: Self-organized Flocking in Mobile
Robot Swarms. Swarm Intelligence 2, 97–120 (2008)
10. Gregoire, G., Tu, H.C.Y.: Moving and Staying Together Without a Leader. Physica D 181,
157–170 (2003)
11. Xu, W.B., Chen, X.B.: Artificial Moment Method for Swarm Robot Formation Control.
Science in China Series F: Information Sciences 51(10), 1521–1531 (2008)
12. Balch, T., Arkin, R.C.: Behavior-based Formation Control for Multi-robot Teams. IEEE
Transactions on Robotics and Automation 14, 926–939 (1998)
13. Lawton, J.R., Beard, R.W., Young, B.J.: A Decentralized Approach to Formation Maneu-
vers. IEEE Transactions on Robotics and Automation 19, 933–941 (2003)
14. Das, A.K., Fierro, R., et al.: A vision-based formation control framework. IEEE Transac-
tions on Robotics and Automation 18, 813–825 (2002)
An Enhanced Formation of Multi-robot Based on A*
Algorithm for Data Relay Transmission

Zhiguang Xu1, Kyung-Sik Choi1, Yoon-Gu Kim2, Jinung An2, and Suk-Gyu Lee1
1
Department of Electrical Eng. Yeugnam Univ., Gyongsan, Gyongbuk, Korea
2
Daegu Gyeongbuk Institute of Science & Technology, Daegu, Korea
[email protected], {robotics,sglee}@ynu.ac.kr,
{ryankim9,robot}@dgist.ac.kr

Abstract. This paper presents a formation control method for multiple robots based
on the A* algorithm for data relay transmission. In our system, we choose a
Nanotron sensor and a compass sensor to execute the tasks of distance
measurement, communication and obtaining the moving direction. Since the
distance data from the Nanotron sensor is disturbed when there is an obstacle between
two robots, we embed a path planning algorithm in the formation control.
The leader robot (LR) knows the whole information of the environment, and sends
its moving information and corner information as nodes to the follower robots (FRs). The FRs
regard the node information received from the LR as temporary targets to
increase the efficiency of the multi-robot formation via an optimal path. The
simulations and experiments show the desirable results of our method.

Keywords: multi-robot, formation, path planning, data relay transmission.

1 Introduction
In mobile robotics, robots execute their own tasks in unknown environments by
navigation, path planning, communication, etc. Recently, researchers have focused on
navigation in multi-robot systems to deal with cooperation [1], efficient path planning
[2][3], stability of navigation [4], and collision avoidance [5]. They have
obtained respectable results through simulations and some experiments.
Path planning algorithms such as the Genetic Algorithm, the Ant Colony System, the A*
algorithm, and Neural Networks [6]-[9] are favored by researchers. A neural
network algorithm [9] implements path planning for multiple mobile robots that
coordinate with each other while avoiding moving obstacles. The A* algorithm, as a graph search
algorithm, provides the fastest search for the shortest path under the same heuristic. In [3],
the A* algorithm utilizes its evaluation function to accelerate searching and reduce computational time.
In a multi-robot system, the robots are not only required to avoid obstacles, but also
need to avoid collisions with each other. To solve this problem, [11] adopted a
reactive multi-agent solution with decision agents and obstacle agents on a linear
configuration. The avoidance decision strategy is acquired from timely
observations of the decision agents' organization and from calculating the trajectories that
interact with other decision agents and obstacle agents. [5] developed a step-
forward approach for collision avoidance in multiple robot systems. They built
techniques from omni-directional vision systems, automatic control, and dynamic
programming. This strategy avoids static obstacles and dynamic objects by re-
establishing the positions of each robot.
In our system, we assume the LR knows the whole information of the environment. FRs
follow the LR or the FRs ahead of them. The FRs use a Nanotron sensor to obtain
distance information. The robots keep a certain distance from each other to avoid
collisions between robots. The FRs follow the robot in front within a given distance range and
plan the path based on the knowledge of nodes received from the LR. To obtain the
shortest trajectory, the FRs also apply the A* algorithm.
The paper is organized as follows. Section 2 derives the mathematical descriptions
for embedding the path planning algorithm in our system. In Section 3, simulation results
coded in Matlab show the good performance of our proposed method. In Section 4, the
experiment results validate the proposed method.

2 Related Works
In multi-robot systems, there are three main control approaches: the leader-follower
based approach [12], the behavior-based approach [13] and the virtual structure approach
[14]. Several control algorithms, such as the EKF [15], I/O linearization [16], and the sliding
mode control method [17], are commonly used to control each robot. In practice, the
computer would be heavily loaded if we used the control approaches and control
methods above for multi-robot formation. Moreover, the MCU of our system is an
AVR, so it is difficult to commercialize and industrialize our system with them. In our
robot system, each robot just utilizes on-board sensors for localization, but redundant
sensor data would place a great burden on the controller. Consequently, we adopt a
more realistic practical application to achieve our control goal, which implants the path
planning algorithm into each robot so as to reduce the computational burden and satisfy
our control system requirements.

2.1 System Structure

The system structure of the homogeneous robot is shown in Fig. 1. There are two
robots in the real experiment: one leader robot (LR) and one follower robot (FR). There
are two missions for the FR. One is to maintain a given distance to the LR; the other is to
determine the ideal temporary target based on the A* algorithm when the LR changes its
moving direction. Generally, a mobile robot measures distance by motor encoders
or some kind of distance measurement sensor while the robots explore freely in an
experimental environment.
We consider a new approach to team robot navigation based on a wireless RF
module. The wireless RF module used is a Nanotron sensor node, Ubi-nanoLOC, which
is developed by HANBACK Electronics© [18]. The WPAN module is based on the IEEE
802.15.4a protocol for high aggregate throughput communications with a precision
ranging capability. Since the distance measured by the wireless module
may also include considerable error depending on the ambient environment, the system
adopts a Kalman filter to reduce the localization error. The LR knows the whole information
of the experiment environment, and realizes communication between the two robots by an ad-
hoc routing application among multiple wireless communication modules.

Fig. 1. System Structure of the Homogeneous Robot

We utilize an MCU, the ATmega128, an 8-bit controller developed by Atmel© [19], to
control robot navigation. We also obtain the moving direction from the compass sensor
XG1010 developed by Microinfinity© [20].

3 Algorithm Description
3.1 State Function and System Flow Chart
The motion of each robot is described in terms of P = (x, y, θ)^T, where x, y and θ are the
x coordinate, the y coordinate and the bearing, respectively. The trajectory of each robot
has the form (x, y) with velocity v and angular velocity ω. The model of
robot Ri takes the form:

\[
\dot{x}(t) = v(t)\cos\theta, \qquad
\dot{y}(t) = v(t)\sin\theta, \qquad
\dot{\theta}(t) = \omega(t)
\tag{1}
\]
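
The unicycle model of Eq. (1) can be advanced numerically with a simple Euler step, as in the minimal sketch below; the step size and function name are illustrative assumptions, not part of the paper.

import math

def step(pose, v, omega, dt=0.1):
    """One Euler integration step of the unicycle model of Eq. (1); pose = (x, y, theta)."""
    x, y, theta = pose
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)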

Fig. 2 shows the flow chart of the LR process; the LR knows the whole information of the
environment. If the LR has not reached its destination, it sends its moving information (MI),
such as moving distance, heading angle and node information, to the rear robots at
each time step. When the LR arrives at a corner, it turns 90 degrees and regards its next
step position as a node.
Fig. 3 describes the flow chart of the FRs maintaining a given distance range. To reduce the
steps from the start point to the goal point and to maintain communication with the LR, the FRs
use the A* algorithm to plan the path, where the FRs make use of the information nodes
received from the LR.

Fig. 2. Flow chart of LR



Fig. 3. Flow chart of FRs for maintaining a given distance with LR

For the FRs, each node received from the LR is a target. When the LR moves in
the environment, there is more than just one node, so the FRs must reach every node as a
target. However, if two nodes are very close, to increase the efficiency of navigation,
the FRs use the A* algorithm to obtain a shortest path and eliminate useless nodes.

3.2 Path Planning

The A* algorithm is a widely used graph searching algorithm that includes a heuristic
function and an evaluation function to sort the nodes. The evaluation function f(n)
consists of two functions: g(n) is defined as the cost to go, and h(n) is
the cost remaining after having gone, chosen as the Euclidean distance from
the other end of the new candidate edge to the destination. The searching process of
our system is as follows: (1) mark the initial node and expand the unmarked
subsequent nodes or child nodes; (2) calculate the evaluation function value for
each subsequent node, sort by the evaluation function, and then identify and
mark the node with the minimum evaluation function value; (3) iterate the above steps,
recording the shortest path, until the current node is the same as the goal node [2][8].
f(n) = g(n) + h(n). (2)
Fig. 4 shows the pseudo code of A* used in the simulation. In the pseudo code, EXP stores
the horizontal and vertical positions of the nodes together with the evaluation, cost, and
heuristic function values. The OPEN and CLOSED sets store available
path information and disabled path information, respectively. To find the evaluation
function with the minimum value, there is a comparison step between EXP and OPEN.

1. If Have a new node(goal)


2. A*(current position, goal)
3. Closed set = the empty set
4. Open set = includes current position(CP)
5. Store node(start) in OPEN
6. While Open set != empty & path movable
7. Do calculate N’s G, H, F and save in EXP
8. Compare EXP and OPEN
9. IF F is minimum then flag=1
10. Else flag=0 and Add node(n) in OPEN
11. Update path cost and CLOSE
12. End while

Fig. 4. Pseudo code of A* algorithm
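
For readers who prefer runnable code, the sketch below is one conventional A* implementation on a 4-connected occupancy grid with a Manhattan heuristic, matching Eq. (2). It is an illustration of the algorithm, not the code used in the paper; the grid encoding (0 = free, 1 = blocked) and function names are assumptions.

import heapq

def a_star(grid, start, goal):
    """A* over a grid of 0/1 cells; returns the cell path from start to goal or None."""
    def h(c):                                   # heuristic h(n) of Eq. (2)
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    g_cost = {start: 0}                         # g(n): cost from the start node
    came_from = {}
    open_set = [(h(start), start)]              # priority queue ordered by f(n) = g(n) + h(n)
    while open_set:
        _, node = heapq.heappop(open_set)
        if node == goal:                        # rebuild the path by walking the parents
            path = [node]
            while path[-1] in came_from:
                path.append(came_from[path[-1]])
            return path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nb[0] < len(grid) and 0 <= nb[1] < len(grid[0])
                    and grid[nb[0]][nb[1]] == 0):
                g_new = g_cost[node] + 1
                if g_new < g_cost.get(nb, float("inf")):
                    g_cost[nb] = g_new
                    came_from[nb] = node
                    heapq.heappush(open_set, (g_new + h(nb), nb))
    return None

With a 1 m grid, a call such as a_star(grid, (0, 0), (5, 7)) would return the sequence of cells an FR visits on its way to a temporary target node.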

4 Simulation and Experiment

4.1 Simulations

We use Matlab 2008b to simulate the multi-robot formation. The
simulation describes the moving trajectories of one LR and several FRs. We give the
whole map information and the path to the LR. The FRs follow the LR and plan the path to
keep the formation at the same time. The distance range is constrained from 1 m to 3 m.
The environment is divided into cells; each cell is 1 m by 1 m.
Fig. 5(a) shows the trajectories of one LR and three FRs without the A*
algorithm. The three FRs follow the leader robot along the same trajectory as the leader robot.
In order to obtain accurate distance measurement data, the LR has to wait for the rear
FRs until the distance between the two robots reaches its minimum value when the LR
arrives at a corner, which may also reduce the efficiency of the formation. In Fig. 5(b),
the black circles denote the information nodes sent by the leader robot. When
the distance measurement data is disturbed, especially when the robots go through a corner,
the LR sends the node information to the robots moving behind it. Then the FRs
use the A* algorithm to plan the shortest path to reach their nodes (as targets) within a
minimum number of steps. This method not only increases the efficiency of the formation but
also protects its effectiveness from the data disturbance of the Nanotron sensor.
Fig. 6 shows the step comparison histogram of one leader robot with different
numbers of follower robots from the start point to the goal point, with and without the A*
algorithm. If the leader robot has just one follower robot, the follower robot using the A*
algorithm needs 39 steps, which is 12 steps fewer than the follower robot without the A*
algorithm. If the leader robot has two follower robots using the A* algorithm, the follower
robots need 48 steps, whereas without the A* algorithm they need 57 steps. In the
three-robot case, the three followers use 48 steps from the start point to the goal point with
the A* algorithm and 57 steps without it.
From this comparison, we conclude that the number of steps using the A* algorithm is much
smaller than the number of steps of the follower robots without the A* algorithm. The follower
robots reach their goal points more efficiently using the A* algorithm.

Fig. 5. Trajectories of one LR and three FRs, (a) without A* algorithm, (b) using A* algorithm

Fig. 6. Step comparison histogram of one leader robot with different numbers of follower robots,
with and without the A* algorithm

4.2 Experiments
In the experiment, we embed the whole map information, such as the distance to the
target and the corner information, in the LR. The FR navigates autonomously and follows the LR
within a given distance to keep the required formation. When the LR arrives at a corner, it
sends the corner information to the FR for executing the A* algorithm.
Each robot in the team performs localization using the motor encoders and the Nanotron
sensor data based on a Kalman filter. In some papers, the researchers obtain θ
from the relationship between the motor encoders and the distance between the robot's two
wheels. However, the heading angle data from the compass sensor is more accurate than the
data calculated from the encoders, so we obtain the θ value from the XG1010 and let the robot
go straight using the XG1010. The robots are in an indoor environment with just one corner.
The initial distance between the LR and the FR is 3 meters. When the LR moves 3 meters, it will

[Figure 7: measured trajectories of the leader robot and the follower robot; x-axis and y-axis in cm, from -100 to 700]

Fig. 7. Experiment result of the LR and FR moving trajectories

turn left and send the node information to the rear FR via the Nanotron sensor. At this time,
the FR plans an optimal path to the temporary target based on the A* algorithm to keep the
required formation with the LR. We record each robot's position and orientation values at
each step. When the robots go straight, we measure from their real trajectories that the error
in the x axis and the y axis is less than 1 centimeter. We then use Matlab to draw the
trajectories of each robot, as shown in Fig. 7.

5 Conclusion
In a multiple mobile robot system, it is important to share the moving information of
every robot to increase the efficiency of cooperation. The FRs move to their nodes (as
targets) with the A* path planning algorithm, using the information nodes received
from the LR. The proposed method obtains the respectable results we expected. The
number of steps of the FRs using the A* path planning algorithm is much smaller than that
of the FRs without the A* algorithm. The simulation and experiment results show that the
robots embedding the A* algorithm obtain better performance in terms of efficiency.
For future research, we are going to realize this multi-robot formation control
among a larger number of robots. We will also consider more complex environments, for
example with some obstacles.

Acknowledgment
This research was carried out under the General R/D Program of the Daegu
Gyeongbuk Institute of Science and Technology (DGIST), funded by the Ministry of
Education, Science and Technology (MEST) of the Republic of Korea.

References
1. Farinelli, A., Iocchi, L., Nardi, D.: Multi-robot Systems: A Classification Focused on
Coordination. IEEE Transactions on Systems, Man, and Cybernetics, Part-B:
Cybernetics 34(5), 2015–2028 (2004)
2. Wang, K.H.C., Botea, A.: Tractable Multi-Agent Path Planning on Grid Maps. In: Int.
Joint Conf. on Artificial Intelligence, pp. 1870–1875 (2009)
3. Seo, W.J., Ok, W.J., Ahn, J.H., Kang, S., Moom, B.: An Efficient Hardware Architeture of
the A-star Algorithm for the Shortest Path Search Engine. In: Fifth Int. Joint Conf. INC,
IMS and IDC, pp. 1499–1502 (2009)
4. Scrapper, C., Madhavan, R., Balakirsky, S.: Stable Navigation Solutions for Robots in
Complex Environments. In: Proc. IEEE Int. Workshop on Safety, Security and Rescue
Robotics (2007)
5. Cai, C., Yang, C., Zhu, Q., Liang, Y.: Collision Avoidance in Multi-Robot Systems. In:
Proc. IEEE Int. Conf. on Mechatronics and Automation, pp. 2795–2800 (2007)
6. Castillo, O., Trujillo, L., Melin, P.: Multiple objective optimization genetic algorithms for
path planning in autonomous mobile robots. Int. Journal of Computers, Systems and
Signals 6(1), 48–63 (2005)
7. Li, W., Zhang, W.: Path Planning of UAVs Swarm using Ant Colony System. In: Fifth Int.
Conf. on Natural Computation, vol. 5, pp. 288–292 (2009)
8. Yao, J., Lin, C., Xie, X., Wang, A.J., Hung, C.C.: Path planning for virtual human motion
using improved a star algorithm. In: Seventh Int. Conf. on Information Technology, pp.
1154–1158 (2010)
9. Li, H., Yang, S.X., Biletskiy, Y.: Neural Network Based Path Planning for A Multi-Robot
System with Moving Obstacles. In: Fourth IEEE Conf. on Automation Science and
Engineering (2008)
10. Otte, M.W., Richardson, S.G., Mulligan, J., Grudic, G.: Local Path Planning in Image
Space for Autonomous Robot Navigation in Unstructured Environments. Technical Report
CU-CS-1030-07, University of Colorado at Boulder (2007)
11. Sibo, Y., Gechter, F., Koukam, A.: Application of Reactive Multi-agent System to Vehicle
Collision Avoidance. In: Twentieth IEEE Int. Conf. on Tools with Artificial Intelligence,
pp. 197–204 (2008)
12. Consolini, L., Morbidi, F., Prattichizzo, D., Tosques, D.: A Geometric Characterization of
Leader-Follower Formation Control. In: IEEE International Conf. on Robotics and
Automation, pp. 2397–2402 (2007)
13. Balch, T., Arkin, R.C.: Behavior-based Formation Control for Multi-robot Teams. IEEE
Trans. on Robotics and Automation 14, 926–939 (1998)
14. Lalish, E., Morgansen, K.A., Tsukamaki, T.: Formation Tracking Control using Virtual
Structures and Deconfliction. In: Proc. of the 2006 IEEE Conf. on Decision and Control
(2006)
15. Schneider, F.E., Wildermuth, D.: Using an Extended Kalman Filter for Relative
Localisation in a Moving Robot Formation. In: Fourth Int. Workshop on Robot Motion
and Control, pp. 85–90 (2004)
16. Desai, J.P., Ostrowski, J., Kumar, R.V.: Modeling formation of multiple mobile robots. In:
Proc. of the 1998 IEEE Int. Conf. on Robotics and Automation, Leuven, Belgium (1998)
17. Sánchez, J., Fierro, R.: Sliding Mode Control for Robot Formations. In: Proc. of the 2003
IEEE Int. Symposium on Intelligent Control, Houston, Texas (2003)
18. Hanback Electronics, http://www.hanback.co.kr/
19. Atmel Corporation, http://www.atmel.com/
20. MicroInfinity, http://www.minfinity.com/
WPAN Communication Distance Expansion Method
Based on Multi-robot Cooperation Navigation

Yoon-Gu Kim1, Jinung An1, Kyoung-Dong Kim2, Zhi-Guang Xu2,


and Suk-Gyu Lee2
1
Daegu Gyeongbuk Institute of Science and Technology,
50-1, Sang-ri, Hyeonpung-myeon, Dalseong-gun, Daegu, Republic of Korea
2
Department of Electrical Engineering, Yeungnam University,
214-1, Dae-dong, Gyongsan, Gyongbuk, Republic of Korea
{ryankim9,robot}@dgist.ac.kr, [email protected],
[email protected], [email protected]

Abstract. Over the past decade, an increasing number of research and development
efforts for personal or professional service robots have attracted considerable
attention and interest in industry and academia. Furthermore, the development
of intelligent robots is strongly promoted as a strategic industry. To date, most
of the practical and commercial service robots are controlled remotely. The
most important technical issue of remote control is wireless communication, especially
in indoor and unstructured environments where the communication infrastructure
may be hampered. Therefore, we propose a multi-robot cooperation
navigation method for securing the communication distance extension of the
remote control based on wireless personal area networks (WPANs). The
concept and implementation of following navigation are introduced, and performance
verification is carried out through navigation experiments in real or
test-bed environments.

Keywords: WPAN, Communication distance expansion, Multi-robots, Remote


control.

1 Introduction
In fire-fighting and disaster rescue situations, fire fighters always face unpredictable
situations. The probability of unexpected accidents increases when they cannot effectively
cope with such events, owing to which they experience mental and physical
strain. In contrast, a robot can be put into dangerous environments because it can be
controlled or navigated autonomously in a global environment. The use of robots to
accomplish fire-fighting missions can reduce much of the strain experienced by the
fire fighters. This is the reason for the development and employment of professional
robots for fire fighting and disaster prevention. Incidentally, fire sites are considered
in either the local or the global environment. If robots are placed in a global setting,
they have to secure reliable communication among themselves and with the central control
system. Therefore, we approach the robot application from the point of view of fire
fighting and disaster prevention, which require reliable communication and highly
accurate distance measurement information.


The Kalman filter, a well-known algorithm widely applied in the robotics field, is
based on linear mean square error filtering for state estimation. The set of mathe-
matical equations in the Kalman filter acts as a compensator
and an optimal estimator for some types of noise. Therefore, it has been used for
the stochastic estimation of measurements from noisy sensors. This filter can minimize
the estimated error covariance when the robot is placed under the presumed conditions.
For given spectral characteristics of an additive combination of signal and noise,
the linear operation on these inputs yields the best estimate of the signal, with minimum
square error, from the noise. The distinctive feature of the Kalman filter, de-
scribed in its mathematical formulation in terms of a state space analysis, is that its
solution is computed recursively.
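
As a concrete illustration of how such a filter can smooth the noisy range readings discussed later, the scalar sketch below performs one predict/update cycle on a single distance measurement. The noise variances and function name are assumptions for illustration and are not taken from the paper.

def kalman_update(x, p, z, q=0.01, r=0.25):
    """One scalar Kalman predict/update cycle for a noisy range reading.
    x, p: previous estimate and its variance; z: new measurement;
    q, r: process and measurement noise variances (illustrative values)."""
    p = p + q                   # predict: state assumed roughly constant, uncertainty grows
    k = p / (p + r)             # Kalman gain
    x = x + k * (z - x)         # correct the estimate with the innovation
    p = (1 - k) * p             # updated estimate variance
    return x, p

Feeding successive distance readings through kalman_update would give a smoothed estimate whose variance reflects both the sensor noise and the process noise assumed above.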
Park [1] approached the recognition of the position and orientation of a mobile robot
using encoders and ubiquitous sensor networks (USNs). For this, the USN consists
of four fixed nodes and a mobile node. The robot is controlled by a fuzzy algorithm
using information from the encoder and the USNs. Incidentally, this proposal
has errors in the recognition of a USN when considering the exploration of each robot
without fixed nodes. In addition, the noise caused by the friction between the road
surface and the wheels and by the control error of the motor affects the localization esti-
mate acquired from the encoders, and the measurement errors accumulate while
a robot navigates. In order to solve these problems, we propose a localization and
navigation system which is based on the IEEE 802.15.4a protocol to measure the
distance between the robots and a compass sensor to obtain the heading angle of each
robot. The IEEE 802.15.4a protocol allows for high aggregate throughput
communication with a precision ranging capability. Nanotron Technologies developed
their first Chirp spread spectrum (CSS) smart RF module, the smart nanoLOC RF, with
ranging capabilities. The proposed method is based on a modified Kalman filter,
which is adapted in our system to improve the measurement quality of the wireless
communication module, and the compass sensor, which reduces the error in the localiza-
tion and navigation process.
This paper is organized as follows. Section 2 introduces related works and dis-
cusses localization approaches and the application of the IEEE 802.15.4a protocol to
our system. Section 3 presents the proposed multi-robot-based localization and navi-
gation. Section 4 explains and analyzes the experimental results. Finally, Section 5
presents the conclusion of this research and discusses future research directions.

2 Related Works

2.1 Localization Approaches

In general, localization is divided into relative localization and absolute localization.


Relative localization is the process of estimating a mobile robot’s state or pose (loca-
tion and orientation) relative to its initial one in the environment; it is also called dead
reckoning (DR). Generally, an encoder, a gyroscope, and an inertial measurement unit
(IMU) are used for localization by DR. It is easy and economically efficient to im-
plement DR localization; however, DR has a critical drawback in that it is easily af-
fected by external noises, resulting in error accumulation.

Absolute localization is based on telemetric or distance sensors and may avoid the
error accumulation of relative localization. Absolute localization is a global localiza-
tion using which it is possible to estimate the current pose of the mobile robot even if
the conditions of the initial pose are unknown and the robot is kidnapped and tele-
ported to a different location [2]. Absolute localization rests on probabilistic methods, namely the robot's belief and Bayes' rule: the former is a probability density function over the possible poses, and the latter updates the belief according to new sensor information. Depending on how the belief is approximated, we
can classify localization into Gaussian filter-based localization and non-parametric
filter-based localization. The extended Kalman filter (EKF) [4] and the unscented
Kalman filter (UKF) [3] are included in the former. Markov localization [5] and
Monte Carlo localization [2] are included in the latter.
EKF localization represents the state or pose of the robot as Gaussian density to es-
timate the pose using EKF. UKF localization addresses the approximation issues of
the EKF. The basic difference between the EKF and the UKF stems from the manner
in which Gaussian random variables (GRV) are represented for propagating through
system dynamics [3]. In the EKF, state distribution is approximated by GRV, which is
then propagated analytically through the first-order linearization of a nonlinear sys-
tem. This can introduce large errors in the true posterior mean and the covariance of
the transformed GRV, which may lead to sub-optimal performance and sometimes
divergence of the filter. The UKF addresses this problem by using a deterministic
sampling approach. The state distribution is also approximated by GRV. In contrast, it
is now represented using a minimal set of carefully chosen sample points. These sam-
ple points completely capture the true mean and covariance of the GRV, which are
propagated through the true nonlinear system. The EKF achieves only first-order
accuracy. Neither the explicit Jacobian nor the Hessian calculation is necessary for the
UKF. Remarkably, the computational complexity of the UKF is of the same order as
that of the EKF [3].
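To make the difference concrete, the following is a minimal sketch of how the unscented transform propagates a Gaussian through a nonlinear function using deterministically chosen sigma points, in contrast to the EKF's first-order linearization. It is a generic illustration under standard UKF parameter choices, not code from the cited works.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f via sigma points."""
    n = len(mean)
    lam = alpha**2 * (n + kappa) - n
    # Sigma points: the mean plus/minus scaled columns of a matrix square root of cov.
    sqrt_cov = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + sqrt_cov.T, mean - sqrt_cov.T])   # (2n+1, n)
    # Weights for reconstructing the mean and covariance.
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    # Propagate every sigma point through the true nonlinear function.
    y = np.array([f(s) for s in sigma])
    y_mean = wm @ y
    diff = y - y_mean
    y_cov = (wc[:, None] * diff).T @ diff
    return y_mean, y_cov
```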
Markov localization approximates the posterior pose of a robot using a histogram
filter over a grid decomposition of the pose space. Hence, it is called grid localization.
Monte Carlo localization approximates the posterior pose of a robot using a particle
filter that represents the pose of the robot by a set of particles with importance weights. Such non-parametric filter-based localization can resolve the global localization and kidnapping problems because it can represent multi-modal distributions.
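A minimal sketch of one Monte Carlo localization update, assuming a generic motion model and measurement likelihood; all function and variable names here are illustrative, not from the cited works.

```python
import numpy as np

def mcl_step(particles, weights, control, measurement, motion_model, likelihood):
    """One predict-weight-resample cycle of Monte Carlo localization.

    particles : (N, 3) array of [x, y, theta] pose hypotheses
    weights   : (N,) importance weights
    """
    n = len(particles)
    # Predict: propagate every particle through the (noisy) motion model.
    particles = np.array([motion_model(p, control) for p in particles])
    # Weight: evaluate the measurement likelihood for every particle.
    weights = weights * np.array([likelihood(measurement, p) for p in particles])
    weights = weights / np.sum(weights)
    # Resample: draw particles proportionally to their weights (multinomial resampling).
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```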

2.2 IEEE 802.15.4a

IEEE 802.15 is the family of wireless personal area network (WPAN) standards developed by several task groups (TGs) within the IEEE. In particular, IEEE 802.15.4 targets low-power devices, low deployment cost, and operation in the license-free industrial, scientific, and medical (ISM) band. In addition, IEEE 802.15.4a provides enhanced ranging information among nodes on top of the wireless communication, which is why we adopted this protocol for sensor networking. IEEE 802.15.4a was standardized in August 2007 with the goals of low complexity, low cost, and low energy in a WPAN environment together with the capability to perform communication and distance measurement simultaneously. IEEE 802.15.4a
chooses two PHY techniques, namely, the ultra-wide band (UWB) method and the

chirp spread spectrum (CSS) method, driven mainly by Samsung and Nanotron [6, 7]. UWB is a technique for short-range communication that transmits very narrow pulses in the baseband without a carrier. Owing to the extremely short pulse width, the occupied frequency bandwidth is very wide, so the signal appears as little more than background noise in channels at such low output power and does not interfere with other wireless devices. However, long-distance communication is difficult because the transmission is carrierless and the output power is low. Its frequency range is 3.4 GHz–10 GHz.
CSS was developed in the 1940s and is often compared to the way dolphins and bats communicate. It has typically been used in radar because of advantages such as strong interference resistance and suitability for long-distance communication. After the 1960s it spread into industrial use, where linear frequency sweeps were grafted onto the chirp signal to carry information. CSS uses its entire allocated bandwidth to broadcast a signal, making it robust to channel noise, and it is little affected by multi-path fading even at low transmit power. The CSS method operates in the 2.4 GHz ISM band.

3 System Architecture

Figure 1 shows the proposed WPAN communication expansion scenario, which is based on multi-robot cooperative navigation. There are two robots in this system: the leader robot (LR) and the follower robot (FR). Figure 2 shows the system architecture of the proposed scenario. In the proposed system, an operator controls the navigation of the LR with a remote controller, and the FR navigates autonomously while following the LR within a certain distance, limited to 2–3 m. The ultimate mission of the FR is to secure the reliability of wireless communication among multiple communication nodes. Therefore, when the communication distance between the FR and the remote controller exceeds the valid communication distance, or when communication becomes unstable, the FR stops following the LR and its mission changes to that of a communication relay node. Mobile-robot localization and navigation research normally relies on distances measured by the motor encoders while the robots explore freely in an experimental environment. However, progressive error accumulation cannot be ignored and leads to serious, unanticipated navigation errors, especially in fire-fighting or disaster conditions.
Therefore, we considered a new approach to navigation in rough terrain, namely wireless RF-module-based navigation. The wireless RF module used is a Nanotron sensor node, the Ubi-nanoLOC, developed by Hanback Electronics [8]. The WPAN module is based on the IEEE 802.15.4a protocol for high aggregate throughput communication with a precision ranging capability. The distance measured by the wireless module may also include considerable error in an ambient environment; for this reason, the system adopts the Kalman filter to reduce the measurement error. The LR can communicate with the remote controller through an ad-hoc routing application among multiple wireless communication modules, and the FR undertakes the task of securing the reliability of communication.
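A minimal sketch of the follower robot's mode-switching logic described above, assuming hypothetical helper functions for ranging and link-quality measurement; none of these names or threshold values come from the paper.

```python
FOLLOW_MIN, FOLLOW_MAX = 2.0, 3.0      # following band in metres, from the scenario above
MAX_LINK_RANGE = 20.0                  # assumed valid communication distance, illustrative only

def follower_step(robot):
    """One control cycle of the follower robot (FR)."""
    d_leader = robot.range_to_leader()          # IEEE 802.15.4a ranging to the LR
    d_controller = robot.range_to_controller()  # ranging to the remote controller
    link_ok = robot.link_quality_ok()           # communication stability check

    if d_controller > MAX_LINK_RANGE or not link_ok:
        # Communication is about to break: stop following and act as a relay node.
        robot.stop()
        robot.enable_relay_mode()
    elif d_leader > FOLLOW_MAX:
        robot.drive_towards_leader()            # close the gap to the leader
    elif d_leader < FOLLOW_MIN:
        robot.stop()                            # keep a safe separation
    else:
        robot.match_leader_heading()            # stay inside the 2-3 m band
```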

We utilized the ATmega128, an 8-bit microcontroller developed by Atmel, for controlling robot navigation. The driving performance of the DC motors was enhanced by PID control, and the accuracy of the compass was secured by fusing two sensors: an AMI302 compass sensor developed by Aichi Steel [9] and an XG1010 gyro sensor developed by Micro-Infinity [10].
Fig. 3 shows the operation flow of the LR and FR systems. The leader robot is operated by remote control: when it receives navigation commands such as forward, backward, turn left, and turn right, it moves according to the commands, and it also supports acceleration and deceleration based on the PID control. The operation flow of the FR system maintains a valid distance between the robots through the ranging information from the LR. As mentioned above, the distance measured over the WPAN has various error sources because both robots are always moving; the applied Kalman filter compensates for these errors.

Fig. 1. Wireless communication expansion scenario based on a multi-robot system

Fig. 2. System architecture



Fig. 3. Operation flowcharts of leader robot and follower robot

4 Experimental Results
For this experiment, we placed two robots at several fixed distances in a straight corridor. Figure 4 shows the measured distance errors while each distance interval between the FR and the LR is maintained. The experiment shows that the error in maintaining a specific interval decreases when the Kalman filter is applied to the distance measurement. The Kalman filter estimates a more accurate distance by fusing the predicted encoder distance information with the measured WPAN distance information, as summarized in equations (1)–(7). We also simulated how well the FR follows the LR using the leader-following operation, which is based on the WPAN distance measurement. Figure 5 shows the simulation results of the leader-following navigation of a follower robot: the FR follows the LR while navigating in a 10 m × 10 m area. The RF sensor data and compass sensor data contain uncertain error factors, so the objective of the proposed system is to achieve accuracy in the WPAN sensor network by using the Kalman filter. However, the Kalman filter requires a considerable amount of data for its estimate, and the system cannot move perfectly when the measurement data are widely dispersed. To mitigate this, the dispersed data have to be ignored, so some residual error is unavoidable. Figure 6 shows an experiment of the multi-robot cooperative navigation for valid wireless communication distance expansion.

$$\hat{x}^-_{k+1} = \hat{x}_k + u_k + w_k, \qquad (1)$$

$$d_{k+1} = d_k - \sqrt{(\Delta t\, v_k \cos\theta_k)^2 + (\Delta t\, v_k \sin\theta_k)^2}, \qquad (2)$$

$$\theta_{k+1} = \theta_k + \frac{\Delta t\, v_k \tan\phi_k}{L}, \qquad (3)$$

$$P^-_{k+1} = P_k + \sigma^2_{w_k}, \qquad (4)$$

$$K = \frac{P^-_{k+1}}{P^-_{k+1} + \sigma^2_{RF_{k+1}}}, \qquad (5)$$

$$\hat{x}_{k+1} = \hat{x}^-_{k+1} + K\,(z_{k+1} - \hat{x}^-_{k+1}), \qquad (6)$$

$$P_{k+1} = P^-_{k+1}\,(1 - K). \qquad (7)$$
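A minimal scalar sketch of the distance filter defined by equations (1)–(7), assuming the process noise and RF ranging noise variances are known constants; the helper names and the numeric defaults are illustrative, not values from the paper.

```python
from math import sqrt, cos, sin

def kalman_distance_update(x_hat, P, u, z_rf, sigma_w2=0.01, sigma_rf2=0.04):
    """One predict/correct cycle of the scalar Kalman filter in Eqs. (1)-(7).

    x_hat : previous distance estimate        u : predicted change from the encoders
    P     : previous error covariance         z_rf : WPAN (IEEE 802.15.4a) range reading
    """
    # Prediction step, Eqs. (1) and (4).
    x_pred = x_hat + u
    P_pred = P + sigma_w2
    # Kalman gain, Eq. (5).
    K = P_pred / (P_pred + sigma_rf2)
    # Correction with the RF measurement, Eqs. (6) and (7).
    x_new = x_pred + K * (z_rf - x_pred)
    P_new = P_pred * (1.0 - K)
    return x_new, P_new

def encoder_prediction(d, v, theta, dt):
    """Predicted distance change per Eq. (2): the robot covers dt*v along heading theta."""
    return d - sqrt((dt * v * cos(theta))**2 + (dt * v * sin(theta))**2)
```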

Fig. 4. Measured distance errors while maintaining each interval between the FR and the LR

Fig. 5. Simulation of the leader robot following navigation by a follower robot



Fig. 6. Multi-robot cooperation navigation for wireless communication distance expansion

5 Conclusion
We proposed a multi-robot cooperative navigation method for extending the valid communication distance of a WPAN-based remote control. The concept and implementation of the leader-following navigation were introduced, and the performance was verified through navigation experiments in real test-bed environments. The results confirm that the proposed method can secure reliable wireless communication and expand the valid communication distance for indoor, special-purpose service robots.

Acknowledgments. This research was carried out under the General R/D Program
sponsored by the Ministry of Education, Science and Technology(MEST) of the
Republic of Korea and the partial financial support by the Ministry of Knowledge
Economy(MKE), Korea Institute for Advancement of Technology(KIAT) and Daegu-
Gyeongbuk Leading Industry Office through the Leading Industry Development for
Economic Region.

References
1. Jong-Jin, P.: Position Estimation of a Mobile Robot Based on USN and Encoder and De-
velopment of Tele-operation System using the Internet. The Institute of Webcasting Inter-
net and Telecommunication (2009)
2. Sebastian, T., Dieter, F., Wolfram, B., Frank, D.: Robust Monte Carlo Localization for
Mobile Robots. Artificial Intelligence 128, 99–141 (2001)
3. Wan, E.A., van der Merwe, R.: Kalman Filtering and Neural Networks. In: The Unscented
Kalman Filter, ch. 7. Wiley, Chichester (2001)
4. Greg, W., Gary, B.: An Introduction to the Kalman Filter. Technical Report: TR 95-041,
University of North Carolina at Chapel Hill (July 2006)
5. Dieter, F., Wolfram, B., Sebastian, T.: Active Markov Localization for Mobile Robots in
Dynamic Environments. Journal of Artificial Intelligence Research 11(128), 391–427
(1999)
6. Jeon, H.S., Woo, S.H.: Adaptive Indoor Location Tracking System based on IEEE
802.15.4a. Korea Information and Communications Society 31, 526–536 (2006)

7. Lee, J.Y., Scholtz, R.A.: Ranging in a Dense Multipath Environment using an UWB Radio
Link. IEEE Journal on Selected Areas in Comm. 20(9) (2002)
8. http://www.hanback.co.kr/
9. http://www.aichi-steel.co.jp/
10. http://www.minfinity.com/
Relative State Modeling Based Distributed Receding
Horizon Formation Control of Multiple Robot Systems*

Wang Zheng1,2, He Yuqing2, and Han Jianda2


1 Graduate School of Chinese Academy of Sciences, Beijing, 100049, P.R. China
2 State Key Laboratory of Robotics, Shenyang Institute of Automation, Shenyang, 110016, P.R. China
{wzheng,heyuqing,jdhan}@sia.cn

Abstract. Receding horizon control has been shown to be a good method for the multiple robot formation control problem. However, there are still two disadvantages in almost all receding horizon formation control (RHFC) algorithms. One is the huge computational burden due to the complicated nonlinear dynamical optimization, and the other is that most RHFC algorithms use the absolute states directly, while relative states between two robots are more accurate and easier to measure in many applications. Thus, in this paper, a new relative state modeling based distributed RHFC algorithm is designed to solve the two problems referred to above. Firstly, a simple strategy for modeling the dynamical process of the relative states is given; subsequently, the distributed RHFC algorithm is introduced and convergence is ensured by some extra constraints; finally, a formation control simulation with three ground robots is conducted, and the results show that the new algorithm improves real-time capability and reduces sensitivity to measurement noise.

Keywords: multiple robot system, formation control, distributed receding horizon control, relative state model.

1 Introduction
Formation control, in which multiple robot systems work together in a fixed geometric configuration, has been widely researched in the past decades. A great number of strategies have been introduced and have demonstrated their validity in both theory and practice, such as leader-following [1], behavior-based [2], and virtual structure [3] approaches.
Receding horizon control (RHC), also called model predictive control (MPC), with its ability to handle constraints and optimization, has received more and more attention in the field of formation control in recent years. One of the main disadvantages of almost all existing receding horizon formation control (RHFC) schemes is the huge computational burden due to the required online optimization. To address this problem, distributed RHFC (DRHFC) appears to be a good approach, and several works have been published [4-9].
*
This work is supported by the Chinese National Natural Science Foundation: 61005078 and
61035005.


However, most practical applications expose some problems in DRHFC algorithms: 1) the absolute states of each individual robot are difficult for other robots to obtain, since inter-robot communication lacks reliability in poor environments; 2) most DRHFC algorithms use the absolute states directly, while relative states between two robots are more accurate and easier to measure in many applications [16].
The relative state model, which describes the relative motion law between two robot systems while taking each individual model into account, is a concept that originated from multiple-satellite formation control [10]. Both relative kinematics models [11] and relative dynamics models [12] describe this kind of relative motion, and these relative state models have recently been applied to many distributed formation problems.
In this paper, a new DRHFC strategy is proposed that introduces a relative state model to deal with the above disadvantages. The remainder of this paper is organized as follows. In section 2, the relative state model between two robot systems and the whole formation model are derived. In section 3, the formation strategy and the distributed control law are presented. In section 4, simulation results are presented to verify the validity of the proposed algorithm. Finally, conclusions are given in section 5.

2 System Modeling

2.1 Relative Model

We consider the formation control problem of N (N≥2) robot systems, and each indi-
vidual robot’s dynamical model can be denoted as follows,
$$\dot{x}_i^0 = f_i^0(x_i^0, u_i) \qquad (1)$$

where $x_i^0 \in \mathbb{R}^n$ (i = 1, 2, ..., N) and $u_i \in \mathbb{R}^m$ are the state vector and control input vector of the i-th robot, respectively, and $f_i^0(\cdot)$ are nonlinear smooth functions with a predefined structure.
Generally, Eq. (1) describes the motion of the robot system in the global coordinate frame fixed to the earth [14-15]. Thus, $x_i^0$ is often called the absolute state.
Actually, for most member robots in a formation, only relative state information is necessary to keep a highly precise formation, so it is useful to obtain the dynamical equation of the relative states between two robot systems of interest. In this paper, we denote the relative model of robot i and robot j as follows,

$$\dot{x}_j^i = f_j^i(x_j^i, u_i, u_j) \qquad (2)$$

where $x_j^i \in \mathbb{R}^n$ is the relative state vector with the same dimension as the individual states $x_i$ and $x_j$, and $u_i, u_j \in \mathbb{R}^m$ are the control inputs of robots i and j, respectively. Methods for modeling relative state equations can be found in [11] and [12].

2.2 Formation Model

In a formation control problem, suppose that every robot i has $n_i$ neighbor robots (the neighbors of the i-th robot are the robots that can exchange information with robot i), and that all the neighbors of robot i form a set $N_i$.
There are two roles in our formation architecture: $N_a$ ($N_a \le N$) leaders and $N - N_a$ followers. Leaders are robots that know their own desired state profile, whereas followers have no a priori knowledge about their own desired state profile and can only follow their neighbor robots to keep the formation. Thus, the leader robot can be modeled using the absolute state equation, and the follower robot can be modeled by several relative state equations with its neighbor robots. Each robot's state equation combined with its neighbors can therefore be denoted as follows,
$$\begin{bmatrix} \dot{x}_i^0 \\ \vdots \\ \dot{x}_j^i \\ \vdots \end{bmatrix} = \begin{bmatrix} f_i^0(x_i^0, u_i) \\ \vdots \\ f_j^i(x_j^i, u_i, u_j) \\ \vdots \end{bmatrix} \qquad (3.a)$$

$$\begin{bmatrix} \vdots \\ \dot{x}_j^i \\ \vdots \end{bmatrix} = \begin{bmatrix} \vdots \\ f_j^i(x_j^i, u_i, u_j) \\ \vdots \end{bmatrix} \qquad (3.b)$$

where the vectors $x_i = [x_i^0 \; \cdots \; x_j^i \; \cdots]^T$ and $x_i = [\cdots \; x_j^i \; \cdots]^T$ denote the leader and follower states, respectively. For simplicity, Eq. (3.a) and Eq. (3.b) can be written uniformly as

$$\dot{x}_i = f_i(x_i, u_i, u_{-i}) \qquad (4)$$

where $u_{-i} = [\cdots \; u_j \; \cdots]^T$ collects all the neighbors' control inputs. Combining all the system states and models, the whole formation model can be expressed as

$$\dot{x} = f(x, u) \qquad (5)$$

where $x = [x_1, \ldots, x_N]^T$ is the stacked state of all robots, $u = [u_1, \ldots, u_N]^T$ the stacked control input, and $f(x, u) = [\cdots \; f_i(x_i, u_i, u_{-i}) \; \cdots]^T$ the collection of all the individual robot models (4).

3 Distributed Receding Horizon Formation Control

3.1 Cost Function

Before introducing the distributed receding horizon formation control algorithm, we first give some notation used in the following sections. For any vector $x \in \mathbb{R}^n$, $\|x\|$ denotes the vector norm and $\|x\|_P^2 = x^T P x$ is the P-weighted squared 2-norm of $x$, where $P$ is an arbitrary positive-definite real symmetric matrix. Also, $\lambda_{\max}(P)$ and $\lambda_{\min}(P)$ denote the largest and smallest eigenvalues of $P$, respectively. $x_j^{ic}$, $x_i^{0c}$, $x_i^c$ and $x^c = [x_1^c, \ldots, x_N^c]^T$ are the desired states.
In general, the following cost function is used in RHFC algorithms,

$$L(x, u) = \sum_{i=1}^{N} L_i(x_i, u_i) = \sum_{i=1}^{N} \Big\{ \gamma \,\|x_i^0 - x_i^{0c}\|_{Q_i^0}^2 + \tfrac{1}{2}(1-\gamma) \sum_{j \in N_i} \|x_j^i - x_j^{ic}\|_{Q_j^i}^2 + \|u_i\|_{R_i}^2 \Big\} \qquad (6)$$

where

$$\gamma = \begin{cases} 1 & \text{for } i \in \{1, \ldots, N_a\} \text{ (robot } i \text{ is a leader)} \\ 0 & \text{for } i \in \{N_a + 1, \ldots, N\} \text{ (robot } i \text{ is a follower)} \end{cases}$$

is an indicator distinguishing leaders from followers. The weighting matrices $Q_i^0$, $Q_j^i$ and $R_i$ are all positive definite, and $Q_j^i = Q_i^j$.
Let $Q = \mathrm{diag}(\cdots Q_i^0 \cdots Q_j^i \cdots)$ and $R = \mathrm{diag}(\cdots R_i \cdots)$; then the integrated cost function can be equivalently rewritten as

$$L(x, u) = \|x - x^c\|_Q^2 + \|u\|_R^2 \qquad (7)$$

Splitting the cost function (7) into the following distributed cost function for each individual robot,

$$L_i(x_i, u_i) = \|x_i - x_i^c\|_{Q_i}^2 + \|u_i\|_{R_i}^2 = \gamma \,\|x_i^0 - x_i^{0c}\|_{Q_i^0}^2 + \tfrac{1}{2}(1-\gamma) \sum_{j \in N_i} \|x_j^i - x_j^{ic}\|_{Q_j^i}^2 + \|u_i\|_{R_i}^2 \qquad (8)$$

the distributed formation control problem can be described as follows: design distributed controllers $u_i = k_i(x_i)$ by solving an optimal control problem with respect to the distributed cost function (8) for each individual robot i, so as to make the formation system (5) converge to the desired formation state $x^c$.
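A minimal sketch of how the distributed cost (8) could be evaluated numerically for one robot, assuming the weighting matrices and desired states are given as NumPy arrays; all names are illustrative only.

```python
import numpy as np

def weighted_sq_norm(v, P):
    """P-weighted squared 2-norm ||v||_P^2 = v^T P v."""
    return float(v @ P @ v)

def distributed_cost(is_leader, x_abs, x_abs_des, rel_states, rel_des, u,
                     Q0, Qrel, R):
    """Per-robot stage cost L_i of Eq. (8).

    rel_states / rel_des : dicts mapping neighbor id j -> x_j^i and its desired value
    Qrel                 : dict mapping neighbor id j -> weighting matrix Q_j^i
    """
    cost = weighted_sq_norm(u, R)
    if is_leader:
        cost += weighted_sq_norm(x_abs - x_abs_des, Q0)       # gamma = 1 term
    else:
        for j, x_rel in rel_states.items():                   # gamma = 0 term
            cost += 0.5 * weighted_sq_norm(x_rel - rel_des[j], Qrel[j])
    return cost
```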

3.2 Algorithm

Since the cost $L_i(x_i, u_i)$ depends on the relative states $x_j^i$, which are subject to the dynamics model (2), robot i must predict the relative trajectories according to $u_i$ and $u_{-i}$ over each prediction horizon. That means, during each update, robot i receives assumed control trajectories $\hat{u}_{-i}(\cdot; t_k)$ from its neighbors [9]. Then, by solving the optimal control problem using model (2), the assumed relative state trajectories can be computed. Likewise, robot i transmits its own assumed control to all neighbors for their behavior optimization. Thus, the optimal control problem for each individual robot can be stated as

Problem 1. For every robot $i \in \{1, \ldots, N\}$ and at any update time $t_k$, given the initial condition $x_i(t_k)$ and the assumed controls $\hat{u}_{-i}(\cdot; t_k)$ for all $s \in [t_k, t_k + T]$, find

$$J_i^*(x_i(t_k)) = \min_{u_i(\cdot)} J_i(x_i(t_k), u_i(\cdot; t_k)) \qquad (9)$$

where

$$J_i(x_i(t_k), u_i(\cdot; t_k)) = \int_{t_k}^{t_k+T} L_i(x_i(s; t_k), u_i(s; t_k))\, ds + M_i(x_i(t_k + T; t_k))$$

subject to the dynamics constraint

$$\dot{x}_i(s; t_k) = f_i(x_i(s; t_k), u_i(s; t_k), u_{-i}(s; t_k)),$$

the input constraint

$$u_i(s; t_k) \in U,$$

the terminal constraint

$$x_i(t_k + T; t_k) \in \Omega_i(\varepsilon_i), \qquad (10)$$

and the compatibility input constraint

$$\|u_i(s; t_k) - \hat{u}_i(s; t_k)\| \le \delta^2 \kappa \qquad (11)$$

where the terminal set is defined as $\Omega_i(\varepsilon_i) = \{ x_i \;|\; \|x_i - x_i^c\| \le \varepsilon_i \}$, with given constants $\kappa, \varepsilon_i \in (0, \infty)$.
Constraint (11) is used to reduce the prediction error due to the difference between what a robot plans to do and what its neighbors believe it will do. Details on defining constraint (11) can be found in [9]. The terminal function $M_i(\cdot)$ should be chosen to drive the terminal state into the terminal set (10) so that closed-loop stability can be guaranteed. By solving Problem 1 we obtain the optimal control profile $u_i^*(\tau; t_k)$, $\tau \in [t_k, t_k + T]$. The closed-loop system whose stability is to be guaranteed is

$$\dot{x}(\tau) = f(x(\tau), u^*(\tau)), \quad \tau \ge t_0 \qquad (12)$$

with the applied distributed receding horizon control law

$$u^*(\tau; t_k) = (u_1^*(\tau; t_k), \ldots, u_N^*(\tau; t_k))$$

for $\tau \in [t_k, t_{k+1})$; the receding horizon control law is updated whenever the new initial state $x(t_k) \leftarrow x(t_{k+1})$ becomes available. Following the succinct presentation in [9], we state the control algorithm.
Algorithm 1. At time $t_0$ with initial state $x_i(t_0)$, the distributed receding horizon controller for any robot $i \in \{1, \ldots, N\}$ is as follows.

Data: $x_i(t_0)$, $T \in (0, \infty)$, $\delta \in (0, T]$. Initialization: at time $t_0$, solve Problem 1 for robot i, setting $\hat{u}_i(\tau; t_0) = 0$ and $\hat{u}_{-i}(\tau; t_0) = 0$ for all $\tau \in [t_0, t_0 + T]$ and removing constraint (11). At every update interval:

(1) Over any interval $[t_k, t_{k+1})$:
a) Apply $u_i^*(\tau; t_k)$, $\tau \in [t_k, t_{k+1})$;
b) Compute the assumed control $\hat{u}_i(\tau; t_{k+1})$ as

$$\hat{u}_i(\tau; t_{k+1}) = \begin{cases} u_i^*(\tau; t_k), & \tau \in [t_{k+1}, t_k + T) \\ 0, & \tau \in [t_k + T, t_{k+1} + T] \end{cases}$$

c) Transmit $\hat{u}_i(\tau; t_{k+1})$ to the neighbors and receive $\hat{u}_{-i}(\tau; t_{k+1})$ from the neighbors.

(2) At any update time $t_k$:
a) Measure the current state $x_i(t_k)$;
b) Solve Problem 1 for robot i, yielding $u_i^*(\tau; t_k)$, $\tau \in [t_k, t_k + T]$.
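A minimal sketch of the per-robot update cycle of Algorithm 1, assuming a generic solve_problem_1 routine that wraps the nonlinear optimization and hypothetical communication helpers; all names are illustrative, not from the paper.

```python
import numpy as np

def drhfc_update(robot, t_k, delta, horizon_T, solve_problem_1):
    """One receding-horizon update interval [t_k, t_k + delta) for robot i."""
    x_k = robot.measure_state()
    u_hat_neighbors = robot.received_assumed_controls()       # assumed u_{-i}(.; t_k)

    # Solve the local optimal control problem over [t_k, t_k + T].
    u_opt = solve_problem_1(x_k, u_hat_neighbors, t_k, horizon_T)

    # Apply the first portion of the optimal profile over [t_k, t_k + delta).
    robot.apply_control(u_opt, t_k, t_k + delta)

    # Build the assumed control for the next update: reuse the tail of u_opt
    # and pad the final delta seconds with zeros, as in step (1b).
    def u_hat_next(tau):
        return u_opt(tau) if tau < t_k + horizon_T else np.zeros_like(u_opt(t_k))

    robot.transmit_assumed_control(u_hat_next)                 # step (1c)
    return u_hat_next
```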

3.3 Stability Analysis

In this section, the stability analysis of algorithm 1 is given and the main result is
somewhat similar to the work in reference [9]. So, the primary lemmas and theorems
will be given with a simple explanation.
Lemma 1. For a given fixed horizon time $T > 0$ and for the positive constant $\xi$ defined by

$$\xi = 2\rho_{\max}\, \lambda_{\max}(Q)\, A\, N\, T\, \kappa,$$

the function $J^*(\cdot)$ satisfies

$$J^*(x(t_{k+1})) - J^*(x(t_k)) \le -\sum_{i=1}^{N} \int_{t_k}^{t_{k+1}} L_i(x_i^*(s; t_k), u_i^*(s; t_k))\, ds + \xi \delta^2 \qquad (13)$$

for any $\delta \in (0, T]$.

In (13), $\rho_{\max} \in (0, \infty)$ is a positive constant restricting the state boundary, such that $\|x_i(s; t_k) - x_i^c\| \le \rho_{\max}$ for all $s \in [t_k, t_k + T]$. The constant $A \in (0, \infty)$, restricting the effect of the uncontrollable input, satisfies $\|x_{j1}^i - x_{j2}^i\| \le A \|u_{j1} - u_{j2}\|$ for fixed $u_i$ subject to the relative model (2). Lemma 1 shows that $J^*(\cdot)$, the optimal value function, decreases from one update to the next along the actual closed-loop trajectories if the update interval $\delta$ is chosen properly. That is, a sufficiently small $\delta$ ensures the monotonically decreasing behavior of the objective function $J^*(\cdot)$, i.e.,

$$J^*(x(\tau)) - J^*(x(t_k)) \le -\|x(t_k) - x^c\|_Q^2 \qquad (14)$$

Theorem 1. For a given fixed horizon time $T > 0$ and for any initial state $x(t_0) \in X$, if there exists a proper update time $\delta$ satisfying (14), then the formation converges to $x^c$ asymptotically.

A small fixed upper bound on δ is provided that guarantees all robots have
reached their terminal constraint sets via the distributed receding horizon control.
After applying the previous lemmas, J*(.) is shown to be a Lyapunov function for the
closed-loop system and the remainder of the proof follows closely along the lines of
the proof of Theorem 1 in [13].

4 Simulation
In this section, we conduct some simulations to verify the proposed algorithm. Consider the two-dimensional bicycle-style robot system shown in Fig. 1; its absolute and relative state models are stated as
$$\begin{bmatrix} \dot{x}_i \\ \dot{y}_i \\ \dot{\theta}_i \\ \dot{\upsilon}_i \end{bmatrix} = \begin{bmatrix} \upsilon_i \cos\theta_i \\ \upsilon_i \sin\theta_i \\ u_{i1} \\ u_{i2} \end{bmatrix} \qquad (15.a)$$

$$\begin{bmatrix} \dot{x}_j^i \\ \dot{y}_j^i \\ \dot{\theta}_j^i \\ \dot{\upsilon}_i \end{bmatrix} = \begin{bmatrix} \upsilon_j \cos\theta_j^i - \upsilon_i + y_j^i u_{i1} \\ \upsilon_j \sin\theta_j^i - x_j^i u_{i1} \\ -u_{i1} + u_{j1} \\ u_{i2} \end{bmatrix} \qquad (15.b)$$
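A minimal numerical sketch of the absolute model (15.a) and the relative model (15.b), intended only to illustrate how the two state-propagation functions differ; the variable ordering follows the equations above, and the Euler integration step is an illustrative choice rather than the solver used in the paper.

```python
import numpy as np

def absolute_dynamics(state, u):
    """Absolute bicycle model (15.a): state = [x_i, y_i, theta_i, v_i], u = [u_i1, u_i2]."""
    x, y, theta, v = state
    return np.array([v * np.cos(theta), v * np.sin(theta), u[0], u[1]])

def relative_dynamics(rel_state, v_j, u_i, u_j):
    """Relative model (15.b): rel_state = [x_j^i, y_j^i, theta_j^i, v_i]."""
    xr, yr, theta_r, v_i = rel_state
    return np.array([
        v_j * np.cos(theta_r) - v_i + yr * u_i[0],   # relative x rate
        v_j * np.sin(theta_r) - xr * u_i[0],         # relative y rate
        -u_i[0] + u_j[0],                            # relative heading rate
        u_i[1],                                      # own acceleration input
    ])

def euler_step(f, state, dt, *args):
    """One explicit Euler integration step, for illustration only."""
    return state + dt * f(state, *args)
```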

A simulation of a three-robot formation is presented. Robot-1 is the leader robot. As the two followers, robot-2 follows robot-1 using the measured relative states, and robot-3 simultaneously follows robot-1 and robot-2. The update interval is set to δ = 0.2 s and the prediction horizon to T = 1 s. At the initial time, the three robots are located at (2, 2), (1, 3) and (3, 1) in the global coordinate frame, respectively, and the desired formation is the one held at the initial time instant. For comparison, we conduct the simulations using both the absolute state based DRHFC method (DRHFC-A) and the algorithm proposed in this paper (DRHFC-B). The simulations use the Matlab Optimization Toolbox solver on a PC (Intel(R) Core(TM) i5 M450 @ 2.40 GHz).

Fig. 1. Absolute and relative modeling of robots (left: absolute coordinates $(x_i, y_i, \theta_i, \upsilon_i)$ and $(x_j, y_j, \theta_j, \upsilon_j)$; right: relative coordinates $(x_j^i, y_j^i, \theta_j^i)$)



Simulation 1: Computing time

The three robots, keeping the initial formation geometry, move along the x axis with a velocity of 1 m/s during the first 5 seconds. At time 5 s, an abrupt change of the leader robot's desired position in the y direction occurs, i.e., the desired trajectory of the leader robot is

$$\begin{cases} [2 + t,\; 2], & t \in [0, 5] \\ [2 + t,\; 3], & t \in [5, 10] \end{cases}$$
The whole simulation takes 10 seconds, and the trajectories of the robots are shown in Fig. 2, where the five dashed circles of each individual robot denote its five predicted states in every time interval.
The relative position between robot-1 and robot-2 is shown in Fig. 3, where the dashed line denotes the simulation results of DRHFC-A (the algorithm of reference [9]) and the solid line the results of DRHFC-B (the algorithm proposed in this paper). From Fig. 3 it can be seen that the precision of the two algorithms is similar.
Fig. 2. Trajectories of the three-robot formation at 5.4 s and 9.6 s, respectively (X(m) vs. Y(m); legend: Leader, Follower1, Follower2)

Since DRHFC-B uses one relative model instead of two absolute models when solving the optimal problem at every interval, the computing time is naturally reduced. The computing time of the two algorithms is shown in Fig. 4, with average cost times Time(DRHFC-A) = 3.18 s and Time(DRHFC-B) = 1.81 s. That means DRHFC-B is more efficient than DRHFC-A. Comparisons were also conducted in different simulation environments, as shown in Table 1, and similar results were obtained.
Fig. 3. Relative positions of robots 1 and 2 (x12 and y12 in metres vs. time)    Fig. 4. Computing time at every update interval (average Time(A) = 3.1848 s, Time(B) = 1.8053 s)

Table 1. Computing time (in seconds) in different environments

Hardware                                  Solve method    DRHFC-A   DRHFC-B   Saving time
Intel(R) Core(TM)2 Duo [email protected]        Line-Search     2.82      1.63      42.20%
                                          L-M method      5.34      4.07      23.78%
Intel(R) Core(TM) i5 M450 @ 2.40GHz       Line-Search     3.18      1.81      41.90%
                                          L-M method      5.90      4.47      24.24%
Intel(R) Core(TM)2 Duo 6300 @ 1.86GHz     Line-Search     6.32      3.93      37.82%
                                          L-M method      7.42      4.69      36.79%
AMD Athlon(TM)64×2 Dual Core 4000+        Line-Search     6.21      3.61      41.87%
                                          L-M method      7.48      4.28      42.78%

Simulation 2: Insensitivity to measurement noise

The three robots keep the initial formation geometry stationary for 10 seconds; this time the desired trajectory of the leader robot is

$$[2, 2], \quad t \in [0, 10].$$

Since there is no filter in the controllers, and Gaussian white noise with variance σ² = 0.01 m² is contained in every measured absolute and relative state, the robot formation is disturbed dramatically, as shown in Fig. 5. We use the objective function J*(·) > 0 to measure the stationary noise disturbance, with a larger J*(·) representing a stronger disturbance. Fig. 6 displays the compared cost functions, with averages J*(A) = 0.01196 and J*(B) = 0.00351. That means DRHFC-B is less disturbed by noise than DRHFC-A.
Fig. 5. Relative positions of robots 1 and 2 (x12 and y12 in metres vs. time)    Fig. 6. Effect of noise disturbance (cost vs. time; average J(A) = 0.011959, J(B) = 0.0035076)

5 Conclusion
In this paper, a new decentralized receding horizon formation control based on a relative state model was proposed. The newly designed algorithm has the following advantages: 1) the relative states, instead of the absolute states, are used, since the relative states are all that most member robots in a formation require and are easier to measure; 2) the computational burden and the influence of measurement noise are reduced. However, as a classical leader-follower scheme, the proposed algorithm still shares some disadvantages common to most DRHFC algorithms, such as how to select proper parameters for the receding horizon time T and the update period δ.

References
1. Das, A.K., Fierro, R., Kumar, V.: A vision-based formation control framework. J. IEEE
Transactions on Robotics and Automation 18(5), 813–825 (2002)
2. Balch, T., Arkin, R.C.: Behavior-based formation control for multi-robot teams. J. IEEE
Transactions on Robotics and Automation 14(6), 926–939 (1998)
3. Lewis, M.A., Tan, K.H.: High precision formation control of mobile robots using virtual
structures. J. Autonomous Robots 4(4), 387–403 (1997)
4. Camponogara, E., Jia, D., Krogh, B.H., Talukdar, S.: Distributed model predictive control.
J. IEEE Control Systems Magazine 22(1), 44–52 (2002)
5. Motee, N., Sayyar-Rodsari, B.: Optimal partitioning in distributed model predictive con-
trol. In: Proceedings of the American Control Conference, pp. 5300–5305 (2003)
6. Jia, D., Krogh, B.H.: Min-max feedback model predictive control for distributed control
with communication. In: Proceedings of the American Control Conference, pp. 4507–4512
(2002)
7. Richards, A., How, J.: A decentralized algorithm for robust constrained model predictive
control. In: Proceedings of the American Control Conference, pp. 4261–4266 (2004)
8. Keviczy, T., Borrelli, F., Balas, G.J.: Decentralized receding horizon control for large scale
dynamically decoupled systems. J. Automatica 42(12), 2105–2115 (2006)
9. Dunbar, W.B., Murray, R.M.: Distributed receding horizon control for multi-vehicle for-
mation stabilization. J. Automatica 42(4), 549–558 (2006)
10. Inalhan, G., Tillerson, M., How, J.P.: Relative dynamics and control of spacecraft forma-
tions in eccentric orbits. J. Guidance, Control, and Dynamics 25(1), 48–59 (2002)
11. Chen, X.P., Serrani, A., Ozbay, H.: Control of leader-follower formations of terrestrial
UAVs. In: Proceedings of Decision and Control, pp. 498–503 (2003)
12. Wang, Z., He, Y.Q., Han, J.D.: Multi-unmanned helicopter formation control on relative
dynamics. In: IEEE International Conference on Mechatronics and Automation, pp. 4381–
4386 (2009)
13. Chen, H., Allgower, F.: Quasi-infinite horizon nonlinear model predictive control scheme
with guaranteed stability. J. Automatica 34(10), 1205–1217 (1998)
14. Fukao, T., Nakagawa, H., Adachi, N.: Adaptive tracking control of a nonholonomic mo-
bile robot. J. IEEE Transactions on Robotics and Automation 16(5), 609–615 (2002)
15. Béjar, M., Ollero, A., Cuesta, F.: Modeling and control of autonomous helicopters. J. Ad-
vances in Control Theory and Applications 353, 1–29 (2007)
16. Leitner, J.: Formation flying system design for a planet-finding telescope-occulter system.
In: Proceedings of SPIE the International Society for Optical Engineering, pp. 66871D-10
(2007)
Simulation and Experiments of the Simultaneous
Self-assembly for Modular Swarm Robots

Hongxing Wei1, Yizhou Huang1, Haiyuan Li1, and Jindong Tan2


1 School of Mechanical Engineering and Automation, Beijing University of Aeronautics and Astronautics, 100191, Beijing, China
[email protected]
2 Electrical Engineering Department, Michigan Technological University, 49931, Houghton, USA
[email protected]

Abstract. In our previous work, we proposed a distributed self-assembly method based on the Sambot platform, but infrared sensor interference occurs between multiple Sambots. In this paper, two interference problems with multiple DSAs are solved and a novel simultaneous self-assembly method is proposed to enhance the efficiency of the self-assembly of modular swarm robots. Meanwhile, a simulation platform is established, simulation experiments for various configurations are performed, and the results are analyzed to find evidence for further improvement. The simulation and physical experiment results verify the effectiveness and scalability of the simultaneous self-assembly algorithm, which effectively shortens the assembly time.

Keywords: swarm, self-assembly, modular robot.

1 Introduction
Self-assembly has received special attention in the modular robot field and has made remarkable progress. Self-assembly enables the autonomous construction of configurations, which refers to organizing a group of robot modules into a target robotic configuration without human intervention [1]. Because the basic modules in the modular swarm robotics field usually cannot move on their own, or have only very limited autonomous locomotion, their initial configuration is generally assembled manually. However, once the robotic configuration is established, the number of modules is fixed, which makes it difficult to add new modules without external direction [2].
Self-assembly provides an efficient way of autonomous construction for modular swarm robots [3]. A group of modules or individual robots with the same function are connected through self-assembly into robotic structures that have higher capabilities of locomotion, perception, and operation. Bojinov [4], Klavins [5], O'Grady [6] et al. proposed self-assembly control methods in different ways.
We have designed a new robotic module named Sambot, which is an autonomous mobile robot with the characteristics of both chain-type and mobile self-reconfigurable robots. Each Sambot has one active docking interface and four passive


docking interfaces. It can move fully autonomously and dock with another Sambot
from four directions. Through docking with each other, multiple Sambots can organ-
ize into a collective robot [7].
The self-assembly algorithm is complex, and because hardware experiments are costly, a simulation platform for the Sambot robot is required. Using Microsoft Robotics Studio (MSRS), we designed a simulation platform that matches the physical Sambot system, and simulation experiments of autonomous construction for various configurations were conducted.
In our previous work [7], [8], we proposed a distributed self-assembly method based on the Sambot platform. There are three types of Sambots: Docking Sambots (DSAs), the SEED, and Connected Sambots (CSAs). Single-DSA experiments for some configurations have been conducted, but because of infrared sensor interference between multiple Sambots, simultaneous self-assembly had not been realized. In this paper, two interference problems in the Wandering and Locking phases are identified and solved. A simultaneous self-assembly method is designed to enhance the efficiency of the self-assembly of modular swarm robots; in particular, simultaneous docking of multiple Sambots in the Locking phase is realized. The simulation and physical experiment results show that the simultaneous self-assembly control method is more effective for the autonomous construction of swarm robots.
The paper is organized as follows. In section 2, the overall structure of the Sambot robot is described and the Sambot simulation platform is introduced. In section 3, the two interference problems in the Wandering and Locking phases are analyzed and a simultaneous self-assembly control method is proposed. In section 4, based on the Sambot simulation platform, simulation experiments are presented to verify that the self-assembly algorithm is suitable for the autonomous construction of various configurations, and the simulation results are analyzed. In section 5, physical experiments are carried out and the results are discussed. Finally, conclusions are given and ongoing work is pointed out.

2 The Sambot Robot and Simulation Platform

2.1 Overall Structure of Sambot

The Sambot is an autonomous mobile, self-assembling modular robot that includes a power supply, microprocessors, sensors, actuators, and a wireless communication unit; it is composed of a cubic main body and an active docking interface, as shown in Fig. 1 (a).
The control system of each Sambot consists of a main microcontroller and four slave microprocessors. The Sambot has two types of communication: ZigBee wireless communication and CAN bus communication. The former can be used to achieve global wireless communication among multiple Sambots but is not used here. The latter takes effect only after two or more Sambots finish docking. In the autonomous construction phase, the CAN bus is adopted to exchange information and commands between Sambots and to share parameters. The bus can support at most 128 nodes, which is large enough for most engineering applications.


Fig. 1. The structure of Sambot. (a) a Sambot robot ; (b) simulated Sambot module; (c) simu-
lated cross quadruped configuration; (d) simulated parallel quadruped configuration.

2.2 Simulation Platform

To support ongoing research, we use Microsoft Robotics Studio (MSRS) to build our simulation platform for more complex structures and larger swarms. The simulation model is shown in Fig. 1(b). To realize physics-based simulation, we design a class that contains an inspection module, a control module, and an execution module (as shown in Fig. 2). The inspection module contains the gyroscope, infrared sensors, and bumper sensors. The control module works through ports in the simulation environment: it receives messages from the inspection module and makes decisions according to the information, and the robot then acts on these decisions through the execution module; a sketch of this structure is given below. Fig. 1 (c) and (d) show the simulated cross quadruped configuration and the simulated parallel quadruped configuration.
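The following is a minimal sketch of the inspection/control/execution split described above, written in Python purely for illustration (the actual MSRS services are C#-based); all class and method names are invented for this illustration.

```python
class SimulatedSambot:
    """Illustrative decomposition of a simulated Sambot into three modules."""

    def __init__(self, physics_entity):
        self.entity = physics_entity
        self.state = "wander"            # wander / navigate / dock

    # --- inspection module: read the simulated sensors ---------------------
    def inspect(self):
        return {
            "heading": self.entity.read_gyroscope(),
            "ir": self.entity.read_infrared(),      # docking-direction signals
            "bumper": self.entity.read_bumper(),
        }

    # --- control module: decide the next behavior from sensor messages -----
    def decide(self, sensors):
        if sensors["bumper"]:
            return "avoid"
        if sensors["ir"]["docking_signal"]:
            return "navigate" if self.state == "wander" else "dock"
        return "wander"

    # --- execution module: carry out the chosen behavior -------------------
    def execute(self, behavior):
        self.state = behavior if behavior != "avoid" else self.state
        self.entity.run_behavior(behavior)

    def step(self):
        self.execute(self.decide(self.inspect()))
```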

Fig. 2. Frame of the simulation platform: the inspection module (gyroscope, infrared and bumper sensors) feeds the control module, which drives the execution module (wander, navigation, dock) and the analysis of configuration on top of the simulation engine (AGEIA PhysX, XNA render)

3 The Simultaneous Self-assembly Algorithms


This section presents the three roles that Sambot robots play in the self-assembly control model and a newly improved simultaneous self-assembly algorithm.
In our previous work [8], a control model consisting of the SEED, CSAs, DSAs, and the CCST (configuration connection state table) was proposed. However, in the self-assembly phase, only a single docking Sambot could enter the platform and finish docking at a time, which is obviously inefficient for large swarms. Here, a simultaneous self-assembly algorithm for various configurations is designed to improve the assembly. The experiments are carried out on a bounded experimental platform.

3.1 Interferences of Self-assembly with Multiple DSAs


In our previous work [8], in order to avoid collisions during simultaneous docking, the DSA Sambots were added onto the experimental platform one by one. A DSA is an "isolated" wanderer and has no information about the target configuration or global coordinates. Its controller works through a series of DSA behaviors, including Wandering, Navigation, Docking, and Locking. Obviously, if simultaneous docking were available, the efficiency of self-assembly would be improved.
Self-assembly with multiple DSAs has two interference problems to solve: one appears in the Wandering phase and the other in the Locking phase.

Fig. 3. Two interference situations. (a) A DSA's detecting infrared sensors are interfered with by another DSA. (b) Information conflict during simultaneous docking of multiple DSAs with the SEED.

1. In the Wandering phase, when there is only one DSA to dock with the current configuration, the DSA searches for the Docking_Direction (infrared emitters) without interference from another DSA. However, if multiple DSAs wander simultaneously, interference can come from other Sambots' infrared emitters. In such cases, a DSA might mistake another DSA for the current configuration and then miss the target. As shown in Fig. 3 (a), while searching for the SEED or a CSA, the detecting sensors of DSA (2) detect DSA (1) before finding the SEED, and DSA (1) is mistaken for the current configuration; DSA (2) then navigates around DSA (1). Although DSA (2) can still get away from DSA (1) once DSA (1) leaves its perception range, this process is wasteful, so it is necessary to distinguish the current configuration from a DSA.
2. In the Locking phase, when multiple Sambots dock simultaneously, information transmission conflicts can cause deadlock. Because of the CAN bus characteristics and the sensors' limitations, the bus is shared simultaneously by two or more Docking Sambots. When two docking interfaces of the current configuration are docked with Sambots A and B at the same time, Sambot A waits for the recording of Sambot B to end while Sambot B waits for the recording of Sambot A to end. For example, in Fig. 3 (b), DSA (1) and DSA (2) dock simultaneously with the SEED, and the SEED needs to communicate with both of them. In the previous self-assembly algorithm, the docking time difference is used to recognize which interface has been docked with and to define the DSA's node number in the connection state table; this is unavailable here and needs to be improved.

3.2 Solution of Interference and Simultaneous Self-assembly Algorithm


To achieve simultaneous self-assembly with multiple DSAs, we propose an improved algorithm that solves the interference problems and obtains better self-assembly efficiency.

Fig. 4. Operation scenario of a DSA detecting the current configuration (here only the SEED), in four steps (a)-(d)

Fig. 5. Operation scenario of a DSA detecting another DSA, in four steps (a)-(d)

1. To avoid infrared sensor interference in the Wandering phase, note that the DSAs are wandering while the current configuration always remains static. Therefore, when the detecting infrared sensors of a DSA receive signals, the detected object may be either the current configuration or another DSA. At that moment, the DSA moves forward a short step and then rotates by a certain angle. If the signal then disappears, the object in front must be another DSA; otherwise, it is probably the current configuration. A possible exception is that two or more DSAs might interfere with each other simultaneously, which may lead to wrong judgments and leave a DSA in an endless deadlock; a function is designed to monitor this situation periodically and terminate the deadlock.
The Wandering algorithm for multiple DSAs is improved as in the following example. If the object detected by the DSA is the current configuration, the operation scenario is as shown in Fig. 4. First, the detecting infrared sensors of the DSA receive a signal reflected by the SEED (a). The DSA rotates to the right by a certain angle (b) and then moves a certain distance forward (c). Then the DSA turns left by the same fixed angle and detects the SEED again (d). However, as shown in Fig. 5, if the detected object is another DSA, that object will have moved away from its original place during the DSA's maneuver and is finally no longer within the DSA's perception (Fig. 5 (d)). Therefore, this method can distinguish the current configuration from a DSA and solve the sensor interference problem.
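A minimal sketch of the static-versus-moving test just described, assuming hypothetical motion and sensing primitives on the DSA; none of the names or numeric values below come from the Sambot firmware.

```python
import time

TURN_ANGLE = 30.0       # degrees; illustrative value for the fixed rotation
STEP_DISTANCE = 0.05    # metres; illustrative value for the short forward step

def detected_object_is_configuration(dsa):
    """Return True if the detected IR source is the static current configuration.

    The DSA sidesteps (rotate, advance, rotate back) and re-checks the signal:
    a static SEED/CSA is seen again, while a moving DSA has wandered away.
    """
    if not dsa.ir_signal_present():
        return False
    dsa.rotate(TURN_ANGLE)           # step (b): turn away by a fixed angle
    dsa.move_forward(STEP_DISTANCE)  # step (c): short forward step
    dsa.rotate(-TURN_ANGLE)          # step (d): turn back toward the source
    time.sleep(0.1)                  # allow the sensors to settle
    return dsa.ir_signal_present()   # still visible -> static configuration
```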
2. Referring to Havender's ordered resource allocation policy for deadlock prevention, a solution to avoid the information conflict is designed by introducing an ordered communication process. Here, the four interfaces of the same Sambot belong to one group, and the smaller the Sambot's node number in the connection state table, the smaller the group number. Within a group, the four interfaces are numbered 1, 2, 3, and 4 in the order front, left, back, and right (shown in Fig. 6). Fig. 3 (b) gives a possible deadlock. Once a deadlock happens, the interface with the lower number (here the back interface) is delayed until the information of the higher-numbered interface has been transmitted and the deadlock is removed; that is, communication runs as an ordered allocation.
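A minimal sketch of this ordered-allocation rule on a shared bus, assuming a hypothetical priority key built from the (group, interface) numbering above; serving lower group numbers first is an assumption for illustration, and this is not the Sambot CAN protocol.

```python
import heapq

class OrderedBus:
    """Grant the shared bus to pending requests in (group, interface) order."""

    def __init__(self):
        self._pending = []   # min-heap of (group_no, -interface_no, message)

    def request(self, group_no, interface_no, message):
        # Assumed ordering: lower group number first; within a group the
        # higher-numbered interface transmits first, so negate interface_no.
        heapq.heappush(self._pending, (group_no, -interface_no, message))

    def grant_next(self):
        """Release exactly one message; lower-priority requests stay queued."""
        if not self._pending:
            return None
        _, _, message = heapq.heappop(self._pending)
        return message
```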
These two improved algorithms for the corresponding interference problems are added to the self-assembly control method. Multiple DSAs are then able to simultaneously self-assemble into the target configuration according to the design requirements. Obviously, this shortens the assembly time, which is analyzed in the next sections through simulation and physical experiments.

Fig. 6. Solution to avoid the information conflict using the ordered resource allocation policy (two groups of interfaces, each ordered left, front, right, back, share the communication bus)

4 Simulation and Analysis

4.1 Simulation of Snake-Like and Quadruped Configuration

On the simulation platform, we construct a snake-like configuration and a cross quadruped configuration, taking experiments with 5, 6, and 7 robots as examples to show the distribution of completion time.
Fig. 7 shows the process of the self-assembly experiments for the snake-like and cross quadruped configurations, and Fig. 8 shows the distribution of completion time. As shown in these graphs, the completion time grows quickly as the number of robots increases. To show the trend of the completion time, we expand the number of robots to 11. As shown in Fig. 9, the completion-time curve is almost quadratic, with a growing slope. When the number of robots reaches a certain value, the completion time becomes unacceptable, whereas the slope of the desired curve should stay the same or even decrease.
To explain this phenomenon, we can focus on a single robot. For each robot, the time taken to dock with another robot stays the same, so we should pay attention to the wandering and navigating states. In the wandering state, as the number of robots grows, the probability of interference from other robots increases and it becomes more difficult to find the SEED or a CSA. In the navigating state, there are two main reasons: first, the distance a robot has to navigate increases as the configuration grows; second, interference from other robots can push a navigating robot back into the wandering state. Therefore, to reduce the completion time, we should improve the wandering and navigating algorithms.

(a) snake-like configuration (b) cross quadruped configuration

Fig. 7. The self-assembly experiments of the snake-like and cross quadruped configuration on
simulation platform

(a) Snake-like configuration (b) Cross quadruped configuration

Fig. 8. Completion time of the snake-like and cross quadruped configurations on the simulation platform

Fig. 9. Average time of snake-like and cross quadruped configuration

4.2 The Simulation of Complex Configuration

Fig. 10 shows the process of the self-assembly experiments of the H-form and parallel quadruped configurations on the simulation platform, and Fig. 11 shows the distribution of completion time.

Fig. 10. The self-assembly experiments on the H-form and quadruped configuration on simula-
tion platform

Fig. 11. Distribution of completion time of the H-form and quadruped configurations on simu-
lation platform

5 Physical Experiments


Based on Sambot modules, on a platform of 1000 mm × 1000 mm, we conduct the simultaneous self-assembly experiments with multiple DSAs for both the snake-like and the quadruped configurations. The SEED is located at the platform center, and the DSAs are placed randomly at the four corners.
1. The simultaneous self-assembly of the snake-like configuration with multiple DSAs is shown in Fig. 12. For this linear configuration, no simultaneous docking conflict exists during the simultaneous self-assembly process, but a DSA's sensors can still be interfered with by another DSA.
2. The simultaneous self-assembly of the quadruped configuration with multiple DSAs is shown in Fig. 13. As indicated by the red arrows, all four lateral interfaces of the SEED are Docking-Directions, which remarkably enhances the experimental efficiency. Both the information transmission conflict leading to deadlock and sensor interference can happen here; however, the simultaneous self-assembly algorithm deals with these problems, and the experimental results verify its effectiveness.

Fig. 12. The self-assembly experiment of the snake-like configuration with multiple DSAs

Fig. 13. The self-assembly experiment of the quadruped configuration with multiple DSAs

6 Conclusions and Future Work


This paper proposed a simultaneous self-assembly control algorithm based on our novel self-assembly modular robot, Sambot, which can be used to realize reconfiguration by autonomous construction. Each Sambot module is a fully self-contained mobile robot that has the characteristics of both chain-type and mobile swarm robots. In the distributed state, each DSA is an autonomous mobile robot, and the control model has distributed characteristics. The simultaneous self-assembly algorithm enhances the docking efficiency by solving the information transmission conflict and the sensor interference. On the simulation platform, we performed simultaneous self-assembly experiments for various configurations and analyzed the efficiency. We succeeded in autonomously constructing the snake-like and quadruped configurations with five Sambots on the physical platform, which verifies the simultaneous self-assembly control algorithm in comparison with previous work.

Some ongoing research still deserves study. The wandering and navigating algorithms need further improvement, for example using evolutionary algorithms. Moreover, it is necessary to establish an autonomous control system for the self-assembly of given configurations, the movement of the whole configuration, the evolutionary reconfiguration into another arbitrary robotic structure, and so on.

Acknowledgments
This work was supported by the 863 Program of China (Grant No. 2009AA043901
and 2009AA043903), National Natural Science Foundation of China (Grant No.
60525314), Beijing technological new star project (Grant No. 2008A018).

References
1. Whitesides, G.M., Grzybowski, B.: Self-Assembly at All Scales. J. Science 295, 2418–
2421 (2002)
2. Christensen, A.L., Grady, R.O., Dorigo, M.: Morphology Control in a Multirobot System.
J. IEEE Robotics & Automation Magzine 14, 18–25 (2007)
3. Anderson, C., Theraulaz, G., Deneubourg, J.L.: Self-assemblages in Insect Societies. J.
Insectes Sociaux 49, 99–110 (2002)
4. Bojinov, H., Casal, A., Hogg, T.: Multiagent Control of Self-reconfigurable Robots. J.
Artificial Intelligence 142, 99–120 (2002)
5. Klavins, E.: Programmable Self-assembly. J. IEEE Control Systems Magazine 27, 43–56
(2007)
6. Christensen, A.L., O’Grady, R., Dorigo, M.: Morphology Control in a Multirobot System.
J. IEEE Robotics & Automation Magazine 14(4), 18–25 (2007)
7. Hongxing, W., Yingpeng, C., Haiyuan, L., Tianmiao, W.: Sambot: a Self-assembly Modu-
lar Robot for Swarm Robot. In: The 2010 IEEE Conference on Robotics and Automation,
pp. 66–71. IEEE Press, Anchorage (2010)
8. Hongxing, W., Dezhong, L., Jiandong, T., Tianmiao, W.: The Distributed Control and Ex-
periments of Directional Self-assembly for Modular Swarm Robot. In: The 2010 IEEE/RSJ
International Conference on Intelligent Robots and Systems, pp. 4169–4174. IEEE Press,
Taipei (2010)
Impulsive Consensus in Networks of Multi-agent
Systems with Any Communication Delays

Quanjun Wu1, , Li Xu1 , Hua Zhang2 , and Jin Zhou2


1 Department of Mathematics and Physics, Shanghai University of Electric Power, Shanghai, 200090, China
[email protected]
2 Shanghai Institute of Applied Mathematics and Mechanics and Shanghai Key Laboratory of Mechanics in Energy Engineering, Shanghai University, Shanghai, 200072, China

Abstract. This paper considers the consensus problem in directed networks of dynamic agents having communication delays. Based on impulsive control theory for delayed dynamical systems, a simple impulsive consensus protocol for such networks is proposed, and a generic criterion for solving the average consensus problem is analytically derived. Compared with some existing works, a distinctive feature of this work is that it addresses the average consensus problem for networks with arbitrary communication delays. It is shown that the impulsive gain matrix in the proposed protocol plays a key role in achieving average consensus. Simulations are presented that are consistent with our theoretical results.

Keywords: average consensus; impulsive consensus; directed delayed


networked multi-agent system; fixed topology; time-delay.

1 Introduction

Recently, the distributed coordination in dynamic networks of multi-agents has


emerged as a challenging new research area. The applications of multi-agent
systems are diverse, ranging from cooperative control of unmanned air vehicles,
formation control of mobile robots, control of communication networks, design
of sensor-network, to flocking of social insects, swarm-based computing, etc.,
[1,2,3,4]. Agreement and consensus protocol design is one of the important prob-
lems encountered in decentralized control of communicating-agent systems.
To achieve cooperative consensus, a series of works have been performed re-
cently [1,2,3,4,5,6]. Jadbabaie et al. provided a theoretical explanation for the
consensus behavior of the Vicsek model using graph theory [1]. Fax et al. empha-
sized the role of information flow and graph Laplacians and derived a Nyquist-like
criterion for stabilizing vehicle formations [2]. Olfati-Saber et al. investigated a
systematic framework for the consensus problem in networks of agents. Three con-
sensus problems were discussed: directed networks with fixed topology, directed

networks with switching topology, as well as undirected networks with commu-


nication time-delays and fixed topology [3]. Moreau used a set-valued Lyapunov
approach to study consensus problems with unidirectional time-dependent com-
munication links [4]. Ren et al. extended the results to unidirectional commu-
nication and relaxed the connectivity assumption to the assumption that the
union of the communication graphs has a spanning tree [5].
Time-delays often occur in such systems as transportation and communica-
tion systems, chemical and metallurgical processes, environmental models and
power networks [7,8,9]. In many scenarios, networked systems can possess a dy-
namic topology that is time-varying due to node and link failures/creations,
packet-loss, asynchronous consensus, state-dependence, formation reconfigura-
tion, evolution, and flocking. There has been increasing interest in the study of
consensus problem in dynamic networks of multi-agents with time-delays in the
last several years [3,7,8,9]. It has been noticed that the existing studies on the consensus
problem predominantly give consensus protocols for networks of dynamic agents having
communication delays under various network topologies. However, these consensus
protocols are only valid for specific small communication delays [3,7,8,9]. For example,
Olfati-Saber et al. discussed average
consensus problems in undirected networks having a common constant commu-
nication delay with fixed topology and switching topology. They presented the
following main result (See Theorem 10 in [3]): A sufficient and necessary con-
dition for seeking average consensus in an undirected connected network is that
the communication delays are less than a positive threshold. Therefore, this mo-
tivates the present investigation of average consensus problems in networks of
dynamic agents for any communication delays particularly regarding practical
engineering applications.
The present paper considers the consensus problem in directed networks of dynamic
agents with fixed topology for any communication delays; the results can be generalized
to the case of switching topology. The primary contribution of this work is
to propose a novel yet simple impulsive consensus protocol for such networks,
which is the generalization of corresponding results existing in the literature. A
generic criterion for solving the average consensus problem is derived based on
impulsive control theory on delayed dynamical systems. It is demonstrated that
average consensus in the networks is heavily dependent on impulsive gain matrix
in the proposed consensus protocol. Finally, simulations are presented that are
consistent with our theoretical results.
The paper is organized as follows. A simple impulsive consensus protocol is
proposed in Section 2. In Section 3, we focus on the average consensus prob-
lem in directed delayed networks of dynamic agents with fixed topology. Some
simulation results are provided in Section 4. Finally, we summarize the main
conclusions in Section 5.

2 Consensus Algorithms
Let $\mathbb{R} = (-\infty, +\infty)$ be the set of real numbers, $\mathbb{R}^{+} = [0, +\infty)$ be the set of nonnegative real numbers, and $\mathbb{Z}^{+} = \{1, 2, \cdots\}$ be the set of positive integers. For the vector $x = [x_1, \cdots, x_n]^{\top} \in \mathbb{R}^n$, $x^{\top}$ denotes its transpose. $\mathbb{R}^{n \times n}$ stands for the set of $n \times n$ real matrices; for the matrix $A = [a_{ij}]_{n \times n} \in \mathbb{R}^{n \times n}$, $A^{\top}$ denotes its transpose and $A^{s} = (A + A^{\top})/2$ stands for the symmetric part of $A$. The spectral norm of $A$ is defined as $\|A\| = [\lambda_{\max}(A A^{\top})]^{1/2}$. $E$ is the identity matrix of order $n$.
In this paper, we are interested in discussing average consensus problem in
directed delayed networks of dynamic agents with fixed topology, where the
information (from vj to vi ) passes through edge (vi , vj ) with the coupling time-
delays 0 < τ (t) ≤ τ . Here we assume that the communication topology of G
is balanced and has a spanning tree. Moreover, each agent updates its current
state based upon the information received from its neighbors.
As $L$ is a balanced matrix, an average consensus is asymptotically reached and $\alpha = (\sum_{i} x_i(0))/n = \mathrm{Ave}(x)$. The invariance of $\mathrm{Ave}(x)$ allows decomposition of $x$ according to the following equation:
$$x = \alpha \mathbf{1} + \eta, \qquad (1)$$
where $\eta = (\eta_1, \cdots, \eta_n)^{\top} \in \mathbb{R}^n$ satisfies $\mathbf{1}^{\top} \eta = 0$. Here, we refer to $\eta$ as the (group) disagreement vector. The vector $\eta$ is orthogonal to $\mathbf{1}$ and belongs to an $(n-1)$-dimensional subspace.
Let xi be the state of the ith agent. Suppose each node of a graph is a dynamic
integrator agent with dynamics:
ẋi (t) = ui (t), i = 1, 2, · · · , n. (2)
where ui (t) is the control input (or protocol) at time t.
In [3], Olfati-Saber and Murray presented the following linear time-delayed
consensus protocol:
$$u_i(t) = \sum_{v_j \in N_i} a_{ij}\big[x_j(t-\tau) - x_i(t-\tau)\big]. \qquad (3)$$
They presented the following main result [3]:


Proposition 1. Assume the network topology G is fixed, undirected, and con-
nected. Then, the protocol (3) globally asymptotically solves the average-consensus
problem if and only if the following condition is satisfied:
(i) τ ∈ (0, τ ∗ ) with τ ∗ = π/2λn , λn = λmax (L).
Obviously, the consensus protocol (3) is invalid for any τ ≥ τ ∗ . The main objec-
tive of this section is to design and implement an appropriate protocol such that
(2) uniformly asymptotically solves the average consensus problem for any com-
munication delays. That is to say, $\lim_{t\to+\infty} \|x_i(t) - x_j(t)\| = 0$ for $\tau \in (0, +\infty)$
and all $i, j \in \mathbb{Z}^{+}$.
Based on impulsive control theory on delayed dynamical systems, we propose
the following impulsive consensus protocol:
$$u_i(t) = \sum_{v_j \in N_i} a_{ij}\big[x_j(t-\tau(t)) - x_i(t-\tau(t))\big] + \sum_{m=1}^{+\infty} \sum_{v_j \in N_i} b_{ij}\big(x_j(t) - x_i(t)\big)\,\delta(t - t_m), \qquad (4)$$

where $b_{ij} \ge 0$ are constants called the control gains, and $\delta(t)$ is the Dirac delta function [9,10].
Remark 1. If $b_{ij} = 0$ for all $i, j$, then the protocol (4) reduces to the linear con-
sensus protocol (3) corresponding to the neighbors of node vi . Clearly, consensus
protocol (4) is the generalization of corresponding results existing in the litera-
ture [3,7,8,9]. It should be noted that the latter part of the impulsive consensus
protocol (4) has two aims. On one hand, if τ (t) < τ ∗ , we can utilize it to accel-
erate the average consensus of such systems. On the other hand, if τ (t) ≥ τ ∗ , it
can solve average consensus for any communication time-delays. This point will
be further illustrated through the numerical simulations.
Under the consensus protocol (4), the system (2) takes the following form:
$$\begin{cases} \dot{x}(t) = -L x(t-\tau(t)), & t \ne t_m,\ t \ge t_0, \\ \Delta x(t) = x(t) - x(t^{-}) = -M x(t), & t = t_m,\ m \in \mathbb{Z}^{+}, \end{cases} \qquad (5)$$
where $M = (m_{ij})_{n \times n}$ is a Laplacian defined by $m_{ij} = -b_{ij}$ for $j \ne i$ and $m_{ii} = \sum_{k=1, k \ne i}^{n} b_{ik}$. The eigenvalues of the matrix $M^{s}$ can be ordered as $0 = \lambda_1(M^{s}) < \lambda_2(M^{s}) \le \cdots \le \lambda_n(M^{s})$. Moreover, $\eta$ evolves according to the (group) disagreement dynamics given by
$$\begin{cases} \dot{\eta}(t) = -L \eta(t-\tau(t)), & t \ne t_m,\ t \ge t_0, \\ (E + M)\eta(t) = \eta(t^{-}), & t = t_m,\ m \in \mathbb{Z}^{+}. \end{cases} \qquad (6)$$

In what follows, we will consider the average consensus problem of (5) with
fixed topology. We will prove that under appropriate conditions the system
achieves average consensus uniformly asymptotically.

3 Main Results
Based on stability theory on impulsive delayed differential equations, the follow-
ing sufficient condition for average consensus of the system (5) is established.
Theorem 1. Consider the delayed dynamical network (5). Assume there exist
positive constants α, β > 0, such that for all m ∈ N , the following conditions
are satisfied:
$(A_1)\ \big[\,2 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)\,\big] \cdot \|L\| \le \alpha;$
$(A_2)\ \ln\big[1 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)\big] - \alpha(t_m - t_{m-1}) \ge \beta > 0.$
Then the delayed dynamical network (5) achieves average consensus uniformly
asymptotically.
Proof. Since the graph G has a spanning tree, by Lemma 3.3 in [5] its Laplacian $M$ has exactly one zero eigenvalue and the remaining $n-1$ eigenvalues all have positive real parts. Furthermore, $M^{s}$ is a symmetric matrix with zero row sums. Thus, the eigenvalues of the matrices $M^{s}$ and $M^{\top}M$ can be ordered as
$$0 = \lambda_1(M^{s}) < \lambda_2(M^{s}) \le \cdots \le \lambda_n(M^{s}),$$
and
$$0 = \lambda_1(M^{\top}M) < \lambda_2(M^{\top}M) \le \cdots \le \lambda_n(M^{\top}M).$$
On the other hand, since $M^{s}$ and $M^{\top}M$ are symmetric, by the basic theory of linear algebra we know
$$\eta^{\top}(t) M^{s} \eta(t) \ge \lambda_2(M^{s})\,\eta^{\top}(t)\eta(t), \quad \mathbf{1}^{\top}\eta = 0, \qquad (7)$$
$$\eta^{\top}(t) M^{\top}M \eta(t) \ge \lambda_2(M^{\top}M)\,\eta^{\top}(t)\eta(t), \quad \mathbf{1}^{\top}\eta = 0. \qquad (8)$$


Let us construct a Lyapunov function of the form
$$V(t, \eta(t)) = \tfrac{1}{2}\,\eta^{\top}(t)\eta(t). \qquad (9)$$
When $t = t_m$, for all $\eta(t) \in S(\rho_1)$, $0 < \rho_1 \le \rho$, we have
$$\eta^{\top}(t_m)(E + M^{\top})(E + M)\eta(t_m) = \eta^{\top}(t_m^{-})\eta(t_m^{-}).$$
By (7) and (8), we get
$$\big[1 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)\big]\,\eta^{\top}(t_m)\eta(t_m) \le \eta^{\top}(t_m^{-})\eta(t_m^{-}),$$
that is,
$$V(t_m, \eta(t_m)) \le \frac{1}{1 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)}\,V(t_m^{-}, \eta(t_m^{-})). \qquad (10)$$
Let $\psi(t) = \frac{1}{1 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)}\,t$; then $\psi(t)$ is strictly increasing, $\psi(0) = 0$, and $\psi(t) < t$ for all $t > 0$. Hence, condition (ii) of Theorem 1 in [10] is satisfied.
For any solution of Eqs. (6), if
$$V(t - \tau(t), \eta(t - \tau(t))) \le \psi^{-1}(V(t, \eta(t))), \qquad (11)$$
then, calculating the upper Dini derivative of $V(t)$ along the solutions of Eqs. (6) and using the inequality $x^{\top}y + y^{\top}x \le \varepsilon x^{\top}x + \varepsilon^{-1} y^{\top}y$, we get
$$D^{+}V(t) = -\eta^{\top}L\eta(t-\tau(t)) \le \|L\| \cdot \Big[ V(t, \eta(t)) + \sup_{t-\tau \le s \le t} V(s, \eta(s)) \Big] \le \big[2 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)\big] \cdot \|L\|\, V(t, \eta(t)) \le \alpha V(t, \eta(t)).$$
Let $g(t) \equiv 1$ and $H(t) = \alpha t$. Thus, condition (iii) of Theorem 1 in [10] is satisfied. Condition $(A_2)$ of Theorem 1 implies that
$$\int_{\psi(\mu)}^{\mu} \frac{ds}{H(s)} - \int_{t_{m-1}}^{t_m} g(s)\, ds = \frac{1}{\alpha}\Big( \ln \mu - \ln\Big[ \tfrac{1}{1 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)}\,\mu \Big] \Big) - (t_m - t_{m-1}) = \frac{\ln\big[1 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)\big]}{\alpha} - (t_m - t_{m-1}) \ge \frac{\beta}{\alpha} > 0.$$

Hence condition (iv) of Theorem 1 in [10] is satisfied. Let $w_1(|x|) = w_2(|x|) = |x|^{2}/2$, so condition (i) of Theorem 1 in [10] is satisfied. Therefore, all the conditions of Theorem 1 in [10] are satisfied. This completes the proof of Theorem 1.
Remark 2. Theorem 1 shows that average consensus of the delayed dynamical network (5) not only depends on the topology of the entire network, but is also heavily determined by the impulsive gain matrix M and the impulsive interval $t_m - t_{m-1}$. In addition, the conditions of Theorem 1 are sufficient but not necessary; that is, the dynamical network may still achieve average consensus uniformly asymptotically even if one of the conditions of Theorem 1 fails.
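The conditions of Theorem 1 can be checked numerically. The following is a minimal, hedged sketch (not part of the paper; the function name `check_theorem1` and the choice of taking α equal to the left-hand side of (A1) are our own), using the 100-agent ring Laplacian and the first impulsive gain matrix of Section 4; the resulting α and β need not coincide with the particular values quoted there.

```python
import numpy as np

def check_theorem1(L, M, dt):
    """Evaluate conditions (A1) and (A2) for an equidistant impulse interval dt.

    Returns (alpha_min, beta): alpha_min is the left-hand side of (A1), i.e. the
    smallest admissible alpha; beta is the margin in (A2) when alpha = alpha_min.
    """
    Ms = (M + M.T) / 2
    lam2_Ms = np.sort(np.linalg.eigvalsh(Ms))[1]              # second smallest eigenvalue
    lam2_MtM = np.sort(np.linalg.eigvalsh(M.T @ M))[1]
    norm_L = np.linalg.norm(L, 2)                             # spectral norm of L

    alpha_min = (2 + 2 * lam2_Ms + lam2_MtM) * norm_L         # (A1): alpha >= alpha_min
    beta = np.log(1 + 2 * lam2_Ms + lam2_MtM) - alpha_min * dt  # (A2): need beta > 0
    return alpha_min, beta

# Example: the 100-agent ring Laplacian of Fig. 1 and the first impulsive gain
# of Section 4 (m_ij = -0.015 for i != j, m_ii = 1.485), with Delta t = 0.02.
n = 100
L = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)
M = 1.5 * np.eye(n) - 0.015 * np.ones((n, n))
a_min, beta = check_theorem1(L, M, dt=0.02)
print(f"alpha must satisfy alpha >= {a_min:.2f}; beta margin = {beta:.3f}")
```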

4 Simulations
As an application of the above theoretical results, the average consensus problem for
delayed dynamical networks is worked out in this section. Meanwhile, simulations
with various impulsive gain matrices are given to verify the effectiveness of the
proposed impulsive consensus protocol, and also to visualize the impulsive gain
effects on average consensus problem of the delayed dynamical networks.
Here we consider a directed network with fixed topology G having 100 agents
as in Fig. 1. It is easy to see that G has a spanning tree. Matrix L is given by
$$L = \begin{pmatrix} 2 & -1 & 0 & \cdots & -1 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & 0 & 0 & \cdots & 2 \end{pmatrix}_{100 \times 100}.$$

Fig. 1. A directed network with fixed topology having 100 agents

For simplicity, we consider the equidistant impulsive interval $t_m - t_{m-1} \equiv \Delta t$. It is easy to verify that if the following conditions hold,
$$\big[2 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)\big] \times 4 \le \alpha$$
and
$$\ln\big[1 + 2\lambda_2(M^{s}) + \lambda_2(M^{\top}M)\big] - \alpha(t_m - t_{m-1}) \ge \beta > 0,$$

Fig. 2. The change process of the state variables of the delayed dynamical network (5)
without impulsive gain in case τ (t) = τ ∗ = π/8


Fig. 3. Average consensus process of the agents state of the delayed dynamical network
(5) with different impulsive gains matrices in case τ (t) = 1.0

then all the conditions of Theorem 1 are satisfied, which means the delayed dynamical network (5) achieves average consensus uniformly asymptotically.
Let the equidistant impulsive interval be taken as Δt = 0.02. Fig. 2 shows the simulation result for the change process of the state variables of the delayed dynamical network (5) with communication delay τ(t) = τ* = π/2λn = π/8 and impulsive gain matrix M = 0 over the time interval [0, 20]. It clearly shows that average consensus is not asymptotically reached, which is consistent with the result of Proposition 1. Fig. 3 demonstrates the change process of the state variables of the delayed dynamical network (5) with communication delay τ(t) = 1 over the time interval [0, 2] for two different impulsive gain matrices: mij = −0.015 (i ≠ j), mii = 1.485, with α = 30, β = 2.7322; and mij = −0.018 (i ≠ j), mii = 1.782, with α = 36, β = 2.9169. Both satisfy the conditions of Theorem 1. It can be seen that impulsive average consensus is finally achieved, and that the impulsive gain matrix heavily affects consensus of the delayed dynamical network.
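For readers who wish to reproduce the qualitative behaviour of Figs. 2 and 3, the following forward-Euler sketch of network (5) may help (this is not the authors' simulation code; the step size, the constant pre-history x(t) = x(t0) for t ≤ t0, the random initial states and the helper name `simulate` are our own assumptions).

```python
import numpy as np

def simulate(L, M, tau, dt_impulse, t_end, h=1e-3, seed=1):
    """Forward-Euler simulation of dx/dt = -L x(t - tau) between impulses,
    with the impulsive update x -> (I - M) x applied every dt_impulse."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    steps = int(t_end / h)
    delay = int(round(tau / h))
    x = np.empty((steps + 1, n))
    x[0] = rng.uniform(0, 10, n)                  # random initial states
    impulse_every = max(1, int(round(dt_impulse / h)))

    for k in range(steps):
        x_delayed = x[max(0, k - delay)]          # constant history before t0
        x[k + 1] = x[k] - h * (L @ x_delayed)     # continuous delayed dynamics
        if (k + 1) % impulse_every == 0:          # impulse instant t_m
            x[k + 1] = x[k + 1] - M @ x[k + 1]
    return x

n = 100
L = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)
M = 1.5 * np.eye(n) - 0.015 * np.ones((n, n))     # first gain matrix of Section 4
traj = simulate(L, M, tau=1.0, dt_impulse=0.02, t_end=2.0)
print("spread of final states:", np.ptp(traj[-1]))
print("average preserved:", traj[0].mean(), "->", traj[-1].mean())
```

With these parameters the spread of the states should shrink towards zero while their average is preserved, which is the behaviour reported for Fig. 3.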

5 Conclusions

This paper has developed a distributed algorithm for average consensus in di-
rected delayed networks of dynamic agents. We have proposed a simple impulsive
consensus protocol for such networks for any communication delays, and some
generic sufficient conditions under which all the nodes in the network achieve

average consensus uniformly asymptotically have been established. It has been


indicated that average consensus in the networks is heavily dependent on com-
munication topology of the networks and impulsive gain. Finally, numerical re-
sults have been used to show the robustness and effectiveness of the proposed
impulsive consensus protocol.

Acknowledgment
This work was supported by the National Science Foundation of China (Grant
Nos. 10972129 and 10832006), the Specialized Research Foundation for the Doc-
toral Program of Higher Education (Grant No. 200802800015), the Innovation
Program of Shanghai Municipal Education Commission (Grant No. 10ZZ61), the
Shanghai Leading Academic Discipline Project (Project No. S30106), and the
Scientific Research Foundation of Tongren College (Nos. TS10016 and TR051).

References
1. Jadbabaie, A., Lin, J., Morse, A.S.: Coordination of Groups of Mobile Autonomous
Agents Using Nearest Neighbor Rules. IEEE Trans. Autom. Contr. 48, 988–1001
(2003)
2. Fax, J.A., Murray, R.M.: Information Flow and Cooperative Control of Vehicle
Formations. IEEE Trans. Autom. Contr. 49, 1465–1476 (2004)
3. Olfati-Saber, R., Murray, R.M.: Consensus Problems in Networks of Agents with
Switching Topology and Time-Delays. IEEE Trans. Autom. Contr. 49, 1520–1533
(2004)
4. Moreau, L.: Stability of Multiagent Systems with Time-Dependent Communication
Links. IEEE Trans. Autom. Contr. 50, 169–182 (2005)
5. Ren, W., Beard, R.W.: Consensus Seeking in Multiagent Systems Under Dynam-
ically Changing Interaction Topologies. IEEE Trans. Autom. Contr. 50, 655–661
(2005)
6. Hong, Y.G., Hu, J.P., Gao, L.X.: Tracking Control for Multi-Agent Consensus with
an Active Leader and Variable Topology. Automatica 42, 1177–1182 (2006)
7. Sun, Y.G., Wang, L., Xie, G.M.: Average Consensus in Networks of Dynamic
Agents with Switching Topologies and Multiple Time-Varying Delays. Syst. Contr.
Lett. 57, 175–183 (2008)
8. Lin, P., Jia, Y.M.: Average Consensus in Networks of Multi-Agents with both
Switching Topology and Coupling Time-Delay. Physica A 387, 303–313 (2008)
9. Wu, Q.J., Zhou, J., Xiang, L.: Impulsive Consensus Seeking in Directed Networks
of Multi-Agent Systems with Communication Time-Delays. International Journal
of Systems Science (2011) (in press), doi:10.1080/00207721.2010.547630
10. Yan, J., Shen, J.H.: Impulsive Stabilization of Functional Differential Equations by
Lyapunov-Razumikhin Functions. Nonlinear Anal. 37, 245–255 (1999)
FDClust: A New Bio-inspired Divisive Clustering
Algorithm

Besma Khereddine1,2 and Mariem Gzara1,2


1
Multimedia Information Systems and Advanced Computing Laboratory (MIRACL)
Sfax, Tunisia
2
Institut supérieur d’informatique et de mathématique de Monastir
[email protected],
[email protected]

Abstract. Clustering with bio-inspired algorithms is emerging as an alternative
to more conventional clustering techniques. In this paper, we propose a new
bio-inspired divisive clustering algorithm, FDClust (artificial Fish based Divisive
Clustering algorithm). FDClust takes inspiration from the social organization and
the encounters of fish shoals. In this algorithm, each artificial fish (agent) is
identified with one object to be clustered. Agents move randomly on the clustering
environment and interact with neighboring agents in order to adjust their movement
directions. Two groups of similar objects appear through the movement of agents
in the same direction. The algorithm is tested and evaluated on several real
benchmark databases. The obtained results are very interesting in comparison
with the Kmeans, Slink, Alink, Clink and Diana algorithms.

Keywords: Clustering, data mining, hierarchical clustering, divisive clustering,


swarm intelligence, fish shoals.

1 Introduction
Clustering is an important data mining technique that has a wide range of applications
in many areas like biology, medicine, market research and image analysis among
others. It is the process of partitioning a set of objects into different subsets. The goal
is that the objects within a group be similar (or related) to one another and different
from (or unrelated to) the objects in other groups.
Many clustering algorithms exist in the literature. At a high level, we can divide
these algorithms into two classes: partitioning algorithms and hierarchical algorithms.
Given a database of n objects or data tuples, a partitioning method constructs k parti-
tions of the data, where each partition represents a cluster. Whereas, hierarchical
clustering presents data in the form of a hierarchy over the entity set. In hierarchical
clustering methods, the number of clusters does not have to be specified a priori, and there
are no initializations to be done. Hierarchical clustering is static: data assigned to a
given cluster in the early stages cannot be moved between clusters. There are two
approaches to build a cluster hierarchy: (i) agglomerative clustering that builds a
hierarchy in the bottom up fashion by starting from smaller clusters and sequentially


merging them into parental nodes (ii) divisive clustering that builds a top-down hie-
rarchy by splitting greater clusters into smaller ones starting from the entire data set.
Researchers seek to invent new approaches to enhance the resolution of the cluster-
ing problem and to achieve better results. Recently, research on and with the
bio-inspired clustering algorithms has reached a very promising state. The basic moti-
vation of these approaches stems from the incredible ability of social animals and
other organisms (ants, bees, termites, birds, fish, etc) to solve complex problems col-
lectively.
These algorithms use a set of similar and rather simple artificial agents (ant, bee,
individual, etc) to solve the clustering problem. These algorithms can be divided into
three main categories according to data representation [1]: (i) an agent represents a
potential solution to the clustering problem to be optimized such as genetic [2,3] and
particle swarm optimization clustering algorithms [4,5], (ii) data points which are
objects in the universe, are moved by agents in order to form clusters. Examples of
such approaches are ant-based clustering algorithms [6] [7], (iii) each artificial agent
represents one data object. These agents move on the universe to form groups of similar
entities, for example Antree [8] and AntClust [9].
In this work, we propose a new bio-inspired divisive clustering algorithm: artificial
Fish based Divisive Clustering algorithm (FDClust). This algorithm takes inspiration
from the social organization and the encounters of fish shoals phenomena. Several
studies have shown that fish shoals are assorted according to several characteristics
[10][11]. During fish shoals encounters, an individual fish decides to join or to leave a
group according to its common characteristics with the already existing group mem-
bers [12][13]. Shoals encounters may result in the fission of the group into two
homogeneous shoals. Thus real fish are able to solve the sorting problem. These phe-
nomena can be easily adapted to solve the clustering problem. In FDClust, an artifi-
cial fish represents an object to be clustered. The encounters of two artificial shoals
results in the fission of the group into two clusters of similar objects. FDClust builds a
binary tree of clusters. It applies recursively this process to split each node into two
homogenous clusters.
The remainder of the paper is organized as follows. Section 2 first describes the so-
cial organization of fish species and then the encounter phenomenon of fish shoals. In
section 3 we present the FDClust algorithm in details. Experimental results are pre-
sented and discussed in section 4. Section 5 concludes the paper and gives sugges-
tions for future work.

2 Social Organization and Encounters of Fish Shoals


Fish are strikingly social organisms [14]. Several biological studies have observed
and developed theoretical models to understand the fish shoals structures. In [13] the
authors have stated that fish shoals are not random aggregations of individuals but
they are instead assorted with respect to several factors including species and size.
Croft et al [14] provided evidence that juveniles display assortative shoaling based on
color pattern. Shoaling by color has also been reported in mollies [12]. Shoaling by
species and body length was observed in several species [13][11]. The homogeneity
of group composition has associated benefits such as anti-predator defense and forag-
ing efficiency.

Fig. 1. Diagram showing the two forms of fission events that were recorded a) a rear fission
event, b) a lateral fission event [14]

Shoals membership is not necessarily stable over time. Individuals are exchanged
between groups [14]. Fish shoals are thus open groups (groups where individuals are
free to leave and join). Theoretical models of open groups assert that social animals
make adaptive decisions about joining groups on the basis of a number of different
phenotypic traits of existing group members. Hence, individuals prefer to associate
with similar conspecifics, those of similar body length and those free of parasite [13].
Active choice of shoal mates has been documented for many fish species. During
shoal encounters, individuals may actively choose neighboring fish that are of a similar
phenotype. Fish have limited vision and thus cannot interact with all group
members but only with perceived ones. Thus, shoal encounters provide an individual
based mechanism for shoal assortment. Since individuals can make decisions based
on the composition of available shoals, other group members are a source of informa-
tion about the most adaptive decisions [15]. Group living is likely to be based on a
continuous decision-making process, with individuals constantly evaluating the prof-
itability of joining, leaving or staying with others, in each encounter with other
groups. The encounters of fish shoals result in shoal fission or fusion.
Fission (but not fusion) events are shown to be an important mechanism in generat-
ing phenotypic assortment [14]. Shoal fission events are divided into two categories
(figure 1): (i) rear fission events where the two resulting shoals maintained the same
direction of travel and fission occur due to differential swimming speeds, (ii) lateral
fission events where the two resulting shoals are separated due to different directions
of travel [14].
The social organization of fish shoals is based on the phenotypic similarity. The
continuous decision-making process is based on the maintenance of social organiza-
tion with neighboring group members. The behavior of real fish during shoals en-
counters makes them able to solve collectively the sorting problem. Our study of
these phenomena (particularly the fission events) from a clustering perspective results
in the development of a clustering model for solving the divisive clustering problem.
The core task in such a problem is to split a candidate cluster into two distant parts. In
our model, this task is achieved by the simulation of the encounters of two groups of
artificial fish. The model is described in the next section.

3 FDClust: A New Bio-inspired Divisive Clustering Algorithm


The FDClust algorithm constructs a binary tree of clusters by starting from all
objects in the same cluster and splitting recursively each candidate cluster into two

sub-clusters until each object forms one cluster. At each step, the cluster with the highest
diameter among those not yet split is partitioned into two sub-clusters. To achieve the
partitioning of a group of objects into two homogenous groups, FDClust applies a bi-
partitioning procedure that takes inspiration from the shoals encounters phenomenon.
During shoals encounters, real fish are able to evaluate dynamically the profitability of
joining, leaving or staying with neighboring agents. This decision making process is
based on the maintenance of social organization of the entire group. Fish shoals are
phenotypically assorted by color, size and species. Shoals encounters may result in the
fission of the group into two well-organized groups (assorted groups). In lateral fission,
groups are separated due to two different directions of swimming. To achieve the divi-
sion of the candidate cluster into two sub-clusters, we use two artificial fish shoals. The
encounter of these two groups of artificial fish results in a lateral fission of the group
into two homogenous groups. Artificial fish (agents) are initially randomly scattered on
the clustering environment. Each agent is an object to be clustered. Each agent is ran-
domly associated a direction left or right. Since real fish have only local vision, artificial
agents interact only with neighboring agents to make adaptive decisions about joining or
leaving a group. Each agent has to make a binary decision whether to move to the left or
to the right. Agents take the same direction as most similar agents in their neighborhood.
Artificial fish join finally their appropriate group composed with similar agents. The
initial group is then separated into two sub-groups of similar objects due to the two
directions of travel left and right. Two groups of agents are formed those having the left
direction and those having the right direction.

3.1 Clustering Environment and Agents Vision


The clustering environment is a rectangular 2D grid G. Its width is $w = \lceil \sqrt{n}\, \rceil$ and
its length is $L = \lceil \sqrt{n} + 2A \rceil$, where A is a positive parameter and n is the number of
objects. Objects are initially scattered randomly in the central square of the grid of
size w×w (figure 2). Two objects cannot occupy initially the same cell of the grid.
Each agent has initially a random direction left (←) or right (→). Artificial agents
have a limited vision. An agent can perceive only s×s neighboring cells (figure 2).
Agents are allowed to move to a cell already occupied by other agents.

Fig. 2. FDClust: a) Clustering environment b) Agents vision

Let $n_i$ be the number of agents in the neighborhood $v(p_i)$ of the agent $p_i$. If $n_i \le |v(p_i)| = s \times s$, then the agent $p_i$ interacts with all its neighbors; otherwise it interacts only with $n_v = s \times s$ neighbors chosen randomly among those situated in its neighborhood. We denote by $pv(p_i)$ the set of agents with which the agent $p_i$ can interact.

3.2 Agents Movements

Each agent has an initial preferable direction left (←) or right (→). This direction is
initially fixed randomly. Agents move with identical speed. In one step, an agent can
move to one of its neighboring cells, either the left one or the right one. It actively chooses its travel direction through the interactions with its neighboring agents. An agent interacts with at most $n_v$ nearest neighbors among those situated in its local neighborhood. In fact, agents can occupy the same cell as other agents. To take the decision about the next travel direction, the agent $p_i$ evaluates its similarity with the agents from $pv(p_i)$ that have the right direction (→) (respectively the left direction (←)). These two similarities are calculated as follows:
$$sim(p_i, \rightarrow) = 1 - \frac{\sum_{p_j \in pv(p_i)\,/\,dir(p_j) = \rightarrow} d^{2}(p_i, p_j)}{m \cdot \big|\{\, p_j \in pv(p_i)\,/\,dir(p_j) = \rightarrow \,\}\big|} \qquad (1)$$
$$sim(p_i, \leftarrow) = 1 - \frac{\sum_{p_j \in pv(p_i)\,/\,dir(p_j) = \leftarrow} d^{2}(p_i, p_j)}{m \cdot \big|\{\, p_j \in pv(p_i)\,/\,dir(p_j) = \leftarrow \,\}\big|} \qquad (2)$$

where m is the number of attributes considered to characterize the data. An agent has the tendency to take the same direction of travel as its most similar neighbors. If the agents in $pv(p_i)$ that have the left direction are more similar to $p_i$ than those having the right direction, then $p_i$ will move to the cell at its left, and vice versa. An agent applies the following rules:
• If $|pv(p_i)| = 0$ then the agent stands by.
• If $|pv(p_i)| \ne 0$ and $sim(p_i, \rightarrow) = sim(p_i, \leftarrow)$ then the agent stands by.
• If $|pv(p_i)| \ne 0$ and $sim(p_i, \rightarrow) > sim(p_i, \leftarrow)$ then the agent moves to the right.
• If $|pv(p_i)| \ne 0$ and $sim(p_i, \rightarrow) < sim(p_i, \leftarrow)$ then the agent moves to the left.
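As a minimal illustration (not code from the paper; the function name `decide_direction`, the use of numpy, and the handling of the case where no perceived neighbor travels in a given direction are our own assumptions), the decision rule can be sketched as follows, assuming attribute values normalized to [0, 1] so that d²(p_i, p_j) ≤ m and both similarities lie in [0, 1]:

```python
import numpy as np

def decide_direction(agent, neighbors, directions, m):
    """Return '->', '<-' or 'stay' for one agent.

    agent      : 1-D array of m normalized attribute values
    neighbors  : list of 1-D arrays (perceived agents pv(p_i))
    directions : list of '->' / '<-', one entry per neighbor
    m          : number of attributes (normalizing constant)
    """
    if not neighbors:                      # |pv(p_i)| = 0 -> stand by
        return 'stay'

    def sim(direction):
        # squared Euclidean distances to neighbors travelling in `direction`
        d2 = [np.sum((agent - nb) ** 2)
              for nb, d in zip(neighbors, directions) if d == direction]
        if not d2:                         # no neighbor with this direction (assumed case)
            return None
        return 1.0 - sum(d2) / (m * len(d2))   # Eq. (1) / Eq. (2)

    sim_right, sim_left = sim('->'), sim('<-')
    if sim_right is None and sim_left is None:
        return 'stay'
    if sim_left is None or (sim_right is not None and sim_right > sim_left):
        return '->'
    if sim_right is None or sim_left > sim_right:
        return '<-'
    return 'stay'                          # equal similarities -> stand by
```

At each iteration of the bi-partitioning procedure, every agent would call such a routine on the agents it currently perceives and then shift one cell in the returned direction.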

3.3 The Algorithm FDClust

FDClust starts with all objects gathered in the same cluster. At each step it applies the
bi-partitioning algorithm to the cluster to be split, until each object constitutes one
cluster. It is a hierarchical divisive clustering algorithm (figure 3).

1. Initially, the universal cluster C containing all objects is to be split.
2. Bi-partitioning(C).
3. Eliminate C from the list of clusters to be split, and add Cr and Cl to this list.
4. Select the cluster C with the highest diameter among those not yet split.
5. If |C| = 1 stop, else go to step 2.

Fig. 3. The algorithm FDClust

Input: number of objects N, the size of the perception zone s×s, the movement step p and the number of iterations T.
Output: Cl and Cr
1. Scatter the objects of cluster C in the central square of the grid
2. Associate a random direction (→ or ←) to each object
3. For t = 1 to T do
4.   For i = 1 to N do
5.     If |pv(p_i)| = 0 then stand by, else
6.       compute sim(p_i, →) and sim(p_i, ←)
7.       If sim(p_i, →) > sim(p_i, ←) then direction(p_i) = → and move to the right
8.       If sim(p_i, →) < sim(p_i, ←) then direction(p_i) = ← and move to the left
9.       else stand by
10. For i = 1 to N do
11.   If direction(p_i) = → then p_i ∈ Cr
12.   Else p_i ∈ Cl
13. end
14. end
15. Return Cl and Cr

Fig. 4. Bi-partitioning algorithm

The bi-partitioning algorithm (figure 4) receives as a parameter a cluster C com-


posed of n objects, the size of the perception zone s×s and the number of iterations T.
The output of the algorithm is two clusters Cl and Cr. It assigns randomly to each
object its corresponding coordinate on the grid and its initial direction left or right.
The algorithm consists of T iterations. At each iteration, each agent evaluates its simi-
larity with neighboring ones having the left direction (respectively the right direction),
takes the decision on its following direction and computes its new coordinates on the
grid. After T iterations, two clusters Cl and Cr are formed, where Cl (respectively Cr) is the set of objects having the left direction (respectively the right direction). The computational complexity of the bi-partitioning procedure is $O(T \cdot n_v \cdot n)$, with T the number of iterations, $n_v$ the maximum number of neighboring agents and n the number of objects in the candidate cluster.
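To connect the pieces of figure 3, the following hedged sketch (our own helper names; not the authors' implementation) runs the divisive loop, always splitting the not-yet-split cluster with the largest diameter; the `bi_partition` placeholder here uses a simple median cut purely so the example runs, standing in for the artificial-fish procedure of figure 4:

```python
import numpy as np

def diameter(X):
    """Largest pairwise Euclidean distance inside a cluster (0 for singletons)."""
    if len(X) < 2:
        return 0.0
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return d.max()

def bi_partition(idx, X):
    """Placeholder split: median cut on the most spread attribute.
    In FDClust this step is performed by the artificial-fish procedure (Fig. 4)."""
    col = X[idx].var(axis=0).argmax()
    med = np.median(X[idx, col])
    left = [i for i in idx if X[i, col] <= med]
    right = [i for i in idx if X[i, col] > med]
    if not left or not right:              # guard against degenerate splits
        return idx[:1], idx[1:]
    return left, right

def divisive_clustering(X):
    """Build the binary tree of splits, largest-diameter cluster first."""
    to_split = [list(range(len(X)))]
    tree = []
    while to_split:
        # pick the candidate cluster with the highest diameter
        c = max(to_split, key=lambda idx: diameter(X[idx]))
        to_split.remove(c)
        if len(c) == 1:
            continue
        cl, cr = bi_partition(c, X)
        tree.append((c, cl, cr))
        to_split.extend([cl, cr])
    return tree

# tiny usage example on random data
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random((10, 4))
    for parent, cl, cr in divisive_clustering(data):
        print(len(parent), "->", len(cl), "+", len(cr))
```

In the actual algorithm the split itself is produced by the encounter of the two artificial shoals rather than by the median cut used here.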

4 Tests and Results


To evaluate our algorithm, we have used real databases issued from the machine
learning repository [16].

Table 1. Real data bases

Data base N M K
Iris 150 4 3
Glass 214 9 6
Thyroid 215 5 3
Soybean 47 35 4
Wine 178 13 3
Yeast 1484 8 10

The main features of the databases are summarized in Table 1. In each case the num-
ber of attributes (M), the number of classes (K) and the total number of objects (N)
are specified.
To evaluate our algorithm we have used the following measures:
• The intra-cluster inertia: used to determine how homogeneous the objects in the clusters are with each other (where $G_i$ is the center of cluster $C_i$ and d is the Euclidean distance):
$$I = \frac{1}{K} \sum_{i=1}^{K} \sum_{x \in C_i} d(x, G_i)^{2} \qquad (3)$$
• The recall, the precision and the F-measure: these are based on the idea of comparing a resulting partition with a real or reference partition. The relative recall (respectively precision and F-measure) of the reference class Ci to the resulting class Cj are defined as follows:
$$recall(i, j) = \frac{n_{ij}}{N_i}, \qquad precision(i, j) = \frac{n_{ij}}{N_j}, \qquad F(i, j) = 2\,\frac{precision(i, j) \cdot recall(i, j)}{precision(i, j) + recall(i, j)}$$
where $n_{ij}$ is the number of objects or individuals present in the reference class Ci and in the resulting class Cj, and $N_i$ and $N_j$ represent respectively the total number of objects in the classes Ci and Cj.
To evaluate the entire class Ci, we just take the maximum of the values obtained for Ci:
$$recall(i) = \max_{j}(recall(i, j)), \qquad precision(i) = \max_{j}(precision(i, j)), \qquad F(i) = \max_{j}(F(i, j))$$
The global values of the recall (r), the precision (p) and the F-measure (F) over all classes are respectively ($p_i$ is the weight of class Ci):
$$r = \sum_{i} p_i \times recall(i), \quad p = \sum_{i} p_i \times precision(i), \quad F = \sum_{i} p_i \times F(i), \quad \text{where } p_i = \frac{N_i}{\sum_{k} N_k} \qquad (4)$$
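As a hedged illustration (the function name `partition_scores` is ours, not the paper's), these measures can be computed from the contingency counts $n_{ij}$ between reference classes and resulting clusters:

```python
import numpy as np

def partition_scores(reference, result):
    """Global recall, precision and F-measure of Eq. (4).

    reference, result: equal-length sequences of integer labels
    (reference classes vs. resulting clusters).
    """
    reference = np.asarray(reference)
    result = np.asarray(result)
    ref_labels = np.unique(reference)
    res_labels = np.unique(result)

    # contingency counts n_ij
    n = np.array([[np.sum((reference == ci) & (result == cj))
                   for cj in res_labels] for ci in ref_labels], dtype=float)
    Ni = n.sum(axis=1)                 # objects per reference class
    Nj = n.sum(axis=0)                 # objects per resulting cluster

    recall = n / Ni[:, None]
    precision = n / Nj[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        F = np.where(precision + recall > 0,
                     2 * precision * recall / (precision + recall), 0.0)

    weights = Ni / Ni.sum()            # p_i = N_i / sum_k N_k
    return (float(weights @ recall.max(axis=1)),
            float(weights @ precision.max(axis=1)),
            float(weights @ F.max(axis=1)))

# example: (r, p, F) for a toy labelling
print(partition_scores([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2]))
```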

In table 2, we present the obtained results for FDClust, kmeans, Alink, Clink, Slink
and Diana algorithms. Since FDClust and kmeans are stochastic, we give the min, the
max, the mean and the standard deviation over 100 runs.

Table 2. FDClust: experimental results

Iris FDClust Kmeans Slink Alink Clink Diana


I min 0 .047 max 0.051 min 0.05 max 0.052 0.07 0.047 0.047 0.046
mean 0.05 sd 0.01 mean 0.051 sd 0.01
R min 0.86 max 0.92 min 0.89 max 0.92 0.99 0.88 0.88 0.88
mean 0.88 sd 0.01 mean 0.92 sd 0.009
P min 0.85 max 0.92 min 0.66 max 0.89 0.66 0.9 0.91 0.89
mean 0.89 sd 0.01 mean 0.86 sd 0.08
F min 0.85 max 0.92 min 0.7 max 0.88 0,77 0,87 0,88 0,88
mean 0.88 sd 0.01 mean 0.85 sd 0.06
Glass FDClust Kmeans Slink Alink Clink Diana
I min 0,09 max 0.14 min 0,08 max 0.1 0,2 0,091 0,1 0,082
mean 0.1 sd 0.03 mean 0.83 sd 0.04
r min 0,32 max 0.75 min 0,52 max 0.85 0,97 0,86 0,87 0,84
mean 0.49 sd 0.05 mean 0.63 sd 0.06
p min 0.41 max 0.7 min 0.52 max 0.78 0.53 0.56 0.56 0.56
mean 0.57 sd 0.07 mean 0.63 sd 0.05
F min 0.37 max 0.61 min 0.53 max 0.74 0.62 0.62 0.62 0.62
mean 0.48 sd 0.05 mean 0.6 sd 0.04
Thyroid FDClust Kmeans Slink Alink Clink Diana
I min 0.02 max 0.11 min 0.03 max 0.056 0.09 0.054 0.054 0.049
mean 0.034 sd 0.01 mean 0.049 sd 0.03
r min 0.54 max 0.72 min 0.86 max 0.78 0.97 0.92 0.96 0.88
mean 0.66 sd 0.03 mean 0.78 sd 0.01
p min 0.59 max 0.74 min 0.71 max 0.9 0.8 0.87 0.67 0.9
mean 0.71 sd 0.01 mean 0.9 sd 0.02
F min 0.49 max 0.68 min 0.77 max 0.87 0.87 0.88 0.75 0.87
mean 0.64 sd 0.03 mean 0.87 sd 0.01
Soybean FDClust Kmeans Slink Alink Clink Diana
I min 1.42 max 1.57 min 1.57 max 1.64 1.57 1.39 1.57 1.39
mean 1.43 sd 0.02 mean 1.61 sd 0.02
r min 0.62 max 0.8 min 0.53 max 1 0.95 1 0.95 1
mean 0.79 sd 0.03 mean 0.93 sd 0.07
p min 0.71 max 1 min 0.51 max 1 0.88 1 0.88 1
mean 0.94 sd 0.04 mean 0.9 sd 0.11
F min 0.75 max 0.97 min 0.41 max 1 0.9 1 0.9 1
mean 0.93 sd 0.04 mean 0.89 sd 0.11
Wine FDClust Kmeans Slink Alink Clink Diana
I min 0.32 max 0.34 min 0.27 max 0.32 0.52 0.29 0.52 0.32
mean 0.32 sd 0.03 mean 0.28 sd 0.05
r min 0.86 max 0.98 min 0.8 max 0.96 0.98 0.93 0.98 0.87
mean 0.94 sd 0.05 mean 0.95 sd 0.01
p min 0.68 max 0.93 min 0.56 max 0.96 0.58 0.93 0.58 0.82
mean 0.87 sd 0.05 mean 0.95 sd 0.03
F min 0.73 max 0.93 min 0.65 max 0.96 0.67 0.93 0.67 0.81
mean 0.87 sd 0.04 mean 0.94 sd 0.02
Yeast FDClust Kmeans Slink Alink Clink Diana
I min 0.011 max 0.012 min 0.032 max 0.035 0.07 0.07 0.042 0.035
mean 0.011 sd 0.001 mean 0.033 sd 0.0001
r min 0.21 max 0.64 min 0.32 max 0.45 0.98 0.98 0.5 0.47
mean 0.37 sd 0.09 mean 0.39 sd 0.02
p min 0.29 max 0.47 min 0.57 max 0.82 0.67 0.67 0.72 0.54
mean 0.34 sd 0.03 mean 0.65 sd 0.03
F min 0.22 max 0.52 min 0.35 max 0.53 0.72 0.72 0.54 0.53
mean 0.34 sd 0.05 mean 0.48 sd 0.02

For the Iris database, our algorithm generates the best results according to all considered measures in comparison with the other algorithms. For the Glass and Thyroid databases, FDClust encounters some difficulty in determining the real cluster structure, but the obtained clusters are homogeneous. For the Soybean database, all algorithms generate good partitions and the results are close. For the Wine database, FDClust generates a partition of good quality in terms of inertia, recall, precision and F-measure in comparison with those obtained by the other algorithms. For the Yeast database, FDClust generates the best partition in terms of intra-cluster inertia but, like Kmeans, it has difficulty in detecting the real cluster structure.
Compared with the other algorithms, we note that for all databases FDClust has recorded good performance. Moreover, FDClust has the advantage of lower complexity than the other hierarchical algorithms.

5 Conclusion
Bio-inspired clustering algorithms are an appropriate alternative to traditional cluster-
ing algorithms. Research on bio-inspired clustering algorithms is still ongoing. In this paper we have presented a new approach for divisive clustering with artificial fish, based on the shoal-encounter and social-organization phenomena of fish shoals. The obtained results are encouraging.
As future work, we plan to extend our algorithm by considering more than two directions of travel, so that a candidate cluster may be divided into more than two sub-clusters.

References
1. Bock, H., Gaul, W., Vichi, M.: Studies in Classification, Data Analysis, and Knowledge
Organization (2005)
2. Falkenauer, E.: A new representation and operators for genetic algorithms applied to
grouping problems. Evolutionary Computation 2(2), 123–144 (1994)
3. Maulik, U., Bandyopadhyay, S.: Genetic algorithm-based clustering technique. Pattern
Recognition 33, 1455–1465 (2000)
4. Sandra Cohen, C.M., Leandro de Castro, N.: Data Clustering with Particle Swarms. In:
IEEE Congress on Evolutionary Computations 2006 (2006)
5. Chen, C.-Y., Ye, F.: Particle swarm optimization algorithm and its application to clustering
analysis. In: Proceedings of IEEE International Conference on Networking, Sensing and
Control, pp. 789–794 (2004)
6. Lumer, E., Faieta, B.: Diversity and adaptation in populations of clustering ants. In: Cliff,
D., Husbands, P., Meyer, J., W., S. (eds.) Proceedings of the Third International Confe-
rence on Simulation of Adaptive Behavior, pp. 501–508. MIT Press, Cambridge (1994)
7. Gzara, M., Jamoussi, S., Elkamel, A., Ben Abdallah, H.: L'algorithme CAC: des fourmis
artificielles pour la classification automatique. Accepted for publication in Revue
d'Intelligence Artificielle (2011)
8. Azzag, H., Guinot, C., Oliver, A., Venturini, G.: A hierarchical ant based clustering algo-
rithm and its use in three real-world applications. In: Dullaert, W., Marc Sevaux, K.S.,
Springael, J. (eds.) European Journal of Operational Research (EJOR). Special Issue on
Applications of Metaheuristics (2005)

9. Labroche, N., Monmarché, N., Venturini, G.: A new clustering algorithm based on the
chemical recognition system of ants. In: van Harmelen, F. (ed.) Proceedings of the 15th
European Conference on Artificial Intelligence, pp. 345–349 (2002)
10. Krause, J., Butlin, R.K., Peuhkuru, N., Prichard, V.: The social organization of fish shoals:
a test of the predictive power of laboratory experiments for the field. Biol. Rev. 75, 477–
501 (2000a)
11. McCann, L.I., Koehn, D.J., Kline, N.J.: The effects of body size and body markings on
nonpolarized schooling behaviour of zebra fish (Brachydanio rerio). J. Psychol. 79, 71–75
(1971)
12. Krause, J., Godin, J.G.: Shoal choice in the banded killifish (Fundulus diaphanus, Teleostei,
Cyprinodontidae) – Effects of predation risk, fish size, species composition and size of
shoals. Ethology 98, 128–136 (1994)
13. Crook, A.C.: Quantitative evidence for assortative schooling in a coral reef. Mar. Ecol.
Prog. Ser. 179, 17–23 (1999)
14. Theodorakis, C.W.: Size segragation and effects of oddity on predation risk in minnow
schools. Anim. Behav. 38, 496–502 (1989)
15. Croft, D.P., Arrowsmith, B.J., Bielby, J., Skinner, K., White, E., Couzin, I.D., Margurran,
I., Ramnarine, I., Krausse, J.: Mechanisms underlying shoal composition in Trinidadian
guppy (Poecilia). Oikos 100, 429–438 (2003)
16. Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998)
Mining Class Association Rules from Dynamic Class
Coupling Data to Measure Class Reusability Pattern

Anshu Parashar1 and Jitender Kumar Chhabra2


1
Haryana College of Technology & Management, Kaithal, 136027, India
2
Department of Computer Engineering, National Institute of Technology, Kurukshetra,
Kurukshetra 136 119, India
[email protected]

Abstract. The increasing use of reusable components during the process of soft-
ware development in the recent times has motivated the researchers to pay more
attention to the measurement of reusability. There is a tremendous scope of using
various data mining techniques in identifying set of software components having
more dependency amongst each other, making each of them less reusable in isola-
tion. For object-oriented development paradigm, class coupling has been already
identified as the most important parameter affecting reusability. In this paper an
attempt has been made to identify the group of classes having dependency
amongst each other and also being independent from rest of the classes existing in
the same repository. The concepts of data mining have been used to discover pat-
terns of reusable classes in a particular application. The paper proposes a three
step approach to discover class associations rules for Java applications to identify
set of classes that should be reused in combination. Firstly dynamic analysis of the
Java application under consideration is performed using UML diagrams to com-
pute class import coupling measure. Then in the second step, for each class these
collected measures are represented as Class_Set & binary Class_Vector. Finally
the third step uses apriori (association rule mining) algorithm to generate Class
Associations Rules (CAR’s) between classes. The proposed approach has been
applied on sample Java programs and our study indicates that these CAR’s can as-
sist the developers in the proper identification of reusable classes by discovering
frequent class association patterns.

Keywords: Coupling, Data Mining, Software Reusability.

1 Introduction
Object oriented development has become widely acceptable in the software industry.
It provides many advantages over the traditional development approaches [17] and is
intended to enhance software reusability through encapsulation and inheritance [28].
In object-oriented concept, classes are basic building blocks and coupling between
classes is well-recognized structural attribute in OO software engineering. Software
Reuse is defined as the process of building or assembling software applications from
previously developed software [20]. Concept of reuse has been widely used by the
software industry in recent times. The present scenario of development is to reuse


some of the already existing quality components and development of new highly re-
usable component. The reuse of software components in software development leads
to increased productivity, quality, maintainability etc [3,23]. The success of reusabil-
ity is highly dependent on proper identification of whether a particular component is
really reusable or not. These measures help to develop, store and identify reusable
components [21]. Reuse of Class code has been frequent in practice. It is essential &
tricky to identify a set of needed classes to reuse together or alone. Hence it is always
desirable to find out the classes along with their associated classes [17]. Class cou-
pling plays a vital role to measure reusability and selecting classes for reuse in combi-
nation because the highly coupled classes are required to be reused as a group [7].
One can define a class Ca as related to a class Cb if Ca must use Cb in all future reuse. So
a group of dependent classes should be reused together to ensure the proper functioning
of the application [22]. Software metrics, and reusability metrics in particular, form an
active research area in the field of software measurement.
Software metric is a quantitative indicator of an attribute of a software product or
process. There are some reuse related metric models like cost productivity, return on
investment, maturity assessment, failure modes and reusability assessment etc [20].
For a developer who wants to reuse components, reusability is one of the important
characteristics. It is necessary to measure the reusability of components in order to
recognize the reuse of components effectively. So classes must be developed to be
reusable in order to reuse them effectively later. Developers should be trained or
supported in using reusable components such as classes, because it is hard to understand
the structure of classes developed by others [24]. If developers do not have any prior
knowledge about the coupling of the classes they want to reuse, then they need to spend
some time understanding the association pattern of the classes. So there is a need to
develop some mechanism that helps to know what combination of classes to reuse. By
viewing class association rules and patterns, a developer can predict the required set of classes and can
avoid unnecessary, partial class reuse. So for reuse, issues like maintaining class code
repository, deciding what group of classes should be incorporated into repository &
their association patterns and identifying exact set of classes to reuse, need to be ad-
dressed. It will reduce some reuse efforts. To discover the class association rules data
mining can be used. By using data mining technology, one can find frequently used
classes and their coupling pattern in a particular java application.

1.1 Data Mining and Its Usage in Reusability

Data mining is the process of extracting new and useful knowledge from large amount
of data. Mining is widely used to solve many business problems such as customer
profiling, customer behavior modeling, product recommendation, fraud detection etc
[25]. Data mining techniques can be used to analyze software engineering data to bet-
ter understand the software and assist software engineering tasks. It also helps in pro-
gramming, defect detection, testing, debugging, maintenance etc. In component reuse,
mining helps in numerous ways such as to decide which components we should reuse,
what is the right way to reuse, which components may often be reused in combina-
tions etc [25]. The general approach of mining software engineering data consists of
following steps:

a) Identify the software engineering problem to be solved by mining


b) Identify and study data source
c) Extract & preprocess data
d) Mine the data e.g discover association rules
Due to the popularity of the open source concept, a large amount of class source code
is available on the Internet in software repositories. Some also exists in large software
companies, where developers in one group may reuse classes written by other groups.
For this reason, it is desirable to have mining tools that tell explicitly the class asso-
ciation patterns. Finding associations provides a distinct advantage in highly reusable
environment. By searching for class patterns with high probability of repetitions we
can correlate one set of classes with other set of classes. Also class associations will
help developer to know which classes are likely to be reused together. The process of
selecting the required set of classes to reuse is complicated and requires some fundamental
knowledge about the class structure and its relationships or interactions with other classes.
For this, the software developer either learns the reuse pattern of classes through continuous
experience or by reading documentation/manuals, or by browsing such mined class
association rules; the latter is the practically feasible option.
In this paper, we explore market basket analysis technique to mine class associa-
tion rules (CAR’s) from vast collection of class coupling data for particular pro-
ject/program. This can be helpful in reusing classes by capturing association rules
between classes. By querying or browsing such association rules a developer can dis-
cover patterns for reusing classes. For this purpose firstly dynamic analysis of java
application is done using UML diagrams to collect class import coupling data. Then
in second step, these collected data are represented as Class Set & Binary Class Vec-
tor. Then finally in the third step market basket analysis (apriori) technique is applied
on Class Set representation to find frequently used classes and association rules be-
tween them. Further Class Vector representation is used to measure cosine similarity
between classes. The measured values are analyzed to compare import coupling pat-
tern between classes.
The rest of the paper is organized as follows. Section 2 discusses the related works.
Section 3 describes the proposed methodology to mine class association rules and
class coupling behavior. Section 4 shows example case study to illustrate our ap-
proach .Section 5 presents results and discussion. Finally, Section 6 concludes this
paper.

2 Related Works
For the object-oriented development paradigm, class coupling has been used as an important
parameter affecting reusability. Li et al. [19], Yacoub et al. [18] and Arisholm et al. [2]
proposed some measures for coupling. Efforts have been made by the re-
searchers to measure reusability through coupling and cohesion of components [5].
Gui et al [6,7] and Choi et al [4] provided some reusability measures based on cou-
pling and cohesion. ISA [8] methodology has been proposed to identify data cohesive
subsystems. Gui et al [10] proposed a new static measure of coupling to assess and
rank the reusability of java components. Arisholm et. al.[2] have provided a method
for identifying import coupled classes with each class at design time using UML

diagrams. Data mining is focused on developing efficient techniques to extract rele-


vant information from very large volumes of data that may be exploited, for example,
in decision making, to improve software reliability and productivity [9]. Association
rule discovery from large databases is an important data mining task. Several algorithms
have been proposed for mining association rules, such as market basket analysis [1],
apriori [11], Ranked Multilabel Rule (RMR) [12], CAR [13], CMAR [14] and
ARMC [13]. Michail [28] considered the problem of discovering association rules
that identify library components that are often reused in combination in the ET++
application framework . Yin et al[15] proposed a Classification approach CPAR
based on Predictive Association Rules, which combines the advantages of both asso-
ciative classification and traditional rule-based classification. Cosine similarity (be-
tween two vectors) and Jaccard similarity coefficient are often used to compare
documents in text mining [30]. We find the association mining approach proposed by
Agrawal et al. [1,11,3] and the cosine similarity measure simple and well suited to our
idea. So, to predict the class reusability pattern of a particular Java application, we use
the cosine similarity measure and the association mining approach.

3 Proposed Methodology

The concepts of data mining have been used to discover patterns of reusable classes in
a particular application. These patterns are further helpful in reusing the classes. As-
sociation rules between classes and class coupling behaviour are used to identify the
class reusability patterns. For this purpose, association mining algorithm [1, 11] is
used to mine class association rules (CAR) from class import coupling data. To know
the class coupling behaviour the cosine similarity measure can be applied on class
import coupling data. Our approach to mine class association rules and class coupling
behavior consists of three steps:
1. Collection of Class import coupling data through UML.
2. Representation of Collected Data.
3. Mining of Class Association Rules (CAR) & Prediction of class import cou-
pling behavior.
The steps are described in section 3.1 to 3.3.

3.1 Collection of Class Import Coupling Data through UML

Dynamic analysis of a program is a precondition for finding the association rules be-
tween classes. Dynamic analysis of programs can be done through UML diagrams
[27]. Significant advantages of using UML are its language independence and compu-
tation of dynamic metrics based on early design artifacts. Erik Arisholm[2] referred
UML models to describe dynamic coupling measures as a way to collect for each
class its import coupled classes. They used following formula for calculating class
import coupling IC_OC (Ci).

$$IC\_OC(c_1) = \big|\{\, (m_1, c_1, c_2) \mid (\exists (o_1, c_1) \in R_{OC})\,(\exists (o_2, c_2) \in R_{OC}),\ c_1 \ne c_2 \,\wedge\, ((o_1, m_1), (o_2, m_2)) \in ME \,\}\big|$$

IC_OC (Ci) counts the number of distinct classes that a method in a given object
uses. This formula can be used to measure the dependency of one class on other classes in
terms of its import coupling.
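As a hedged illustration only (not the authors' tooling), the import-coupling information of this step can be derived from a trace of runtime method calls. In the sketch below, `calls` is an assumed list of (caller object, caller class, callee object, callee class) tuples obtained from the dynamic analysis, and the function name `import_coupled_classes` is ours; it collects, for each class, the set of distinct classes it uses, which is exactly what feeds the Class_Set representation of the next step:

```python
from collections import defaultdict

def import_coupled_classes(calls):
    """Map each class to the set of distinct classes it uses at runtime.

    calls: iterable of (caller_obj, caller_cls, callee_obj, callee_cls)
    tuples recorded during dynamic analysis (e.g. from sequence diagrams).
    """
    coupled = defaultdict(set)
    for _caller_obj, caller_cls, _callee_obj, callee_cls in calls:
        if caller_cls != callee_cls:           # only distinct classes count
            coupled[caller_cls].add(callee_cls)
    return dict(coupled)

# toy trace (assumed data): objects a:C1, b:C3, c:C4 interacting
trace = [("a", "C1", "b", "C3"),
         ("a", "C1", "c", "C4"),
         ("b", "C3", "a", "C1")]
print(import_coupled_classes(trace))   # {'C1': {'C3', 'C4'}, 'C3': {'C1'}}
```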

3.2 Representation of Collected Data


The data collected in step one should be represented in a suitable intermediate representation, so that the mining algorithm can easily be applied to find the class associations. We propose to represent the data in two forms:

3.2.1 Class Set Representation


Class import coupling data of each class can be represented by the class set representation. IC_Class_Set represents the classes coupled (import) with a class. For example, let C = {C1, C2, C3, C4, C5} be the set of classes of an application; IC_Class_Set(C1) = {C1, C3, C4} means that class C1 is coupled (import) with classes C3 and C4. Class C1 itself is included in its import coupled class set so that the set of classes used is complete. For an application, the collection of the IC_Class_Set of all classes is called IC_SET (application).

3.2.2 Class Vector Representation


The class vector of a class also represents the classes import coupled with a given class, but in vector form and with respect to all classes of the application. Suppose C is the ordered set of classes of a particular application; then for a class Ci the class vector is represented as, e.g., C_V(Ci) = [1, 0, 1, 1, 0]. Here a 1 at place j indicates that class Ci is coupled (import) with class Cj, and a 0 at place k indicates no (import) coupling of class Ci with class Ck. Of these two representations, IC_SET (application) is used to mine class association rules through the apriori approach, and C_V(Ci) is used for measuring class coupling behaviour by the cosine similarity measure.
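To make the two representations concrete, the small sketch below builds an IC_SET and derives a class vector from it; the classes and couplings are hypothetical illustrations, not data from this paper.

# Illustrative sketch of the two representations of Section 3.2 (hypothetical data).
classes = ["C1", "C2", "C3", "C4", "C5"]          # ordered set of classes

# Class set representation: IC_Class_Set of each class; a class is included in
# its own import coupled class set, as in the paper.
ic_set = {
    "C1": {"C1", "C3", "C4"},
    "C2": {"C2", "C5"},
    "C3": {"C3", "C4"},
    "C4": {"C4"},
    "C5": {"C2", "C5"},
}

def class_vector(ci, ic_set, classes):
    # Class vector representation C_V(Ci): 1 if Ci is import coupled with Cj, else 0.
    return [1 if cj in ic_set[ci] else 0 for cj in classes]

print(class_vector("C1", ic_set, classes))        # -> [1, 0, 1, 1, 0]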

3.3 Mining of Class Association Rules and Prediction of Class Import Coupling
Behavior

3.3.1 Mining Class Association Rules


To mine class association rules for Java programs, the basic apriori approach proposed by Agrawal et al. [1,11,3] has been used. They used the apriori approach (market basket analysis) to analyze marketing transaction data and determine which products customers purchase together. The concepts of support and confidence are used to find association rules for the purchased products. The support of an itemset is defined as the proportion of transactions in the transaction data set which contain the itemset, and the confidence of a rule X→Y is defined as an estimate of the probability P(Y|X), i.e., the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS [29]. Based on that, we consider the collection of class import coupling data of an application, i.e., IC_SET (application), to find the set of class association rules (CAR). Association rules are required to satisfy a user-specified minimum support and minimum confidence at the same time. Association rule generation is usually split into two separate phases. In the first phase, the minimum support min_sup is applied to mine all frequent class sets, called the Frequent Class Combination Set (FCCS). The process to find FCCS is as follows:

1: i = 1
2: Create candidate class set CSi containing all classes and their support. (The support of a class Ci is the frequency of occurrence of that class in IC_SET (application).)
3: Create large class set Li by eliminating from CSi the class sets having support sup < min_sup.
4: Repeat
5:   i = i + 1
6:   Create candidate class set CSi as the Cartesian product of the sets in Li-1 and calculate their support from IC_SET (application).
7:   Create large class set Li by eliminating from CSi the class sets having sup < min_sup.
8: Until (no larger class set can be built)
The class sets in CSi give the Frequent Class Combination Set (FCCS). After this, in the second phase, the FCCS and the minimum confidence constraint min_conf are used to form the CAR. The support and confidence values for each pair of classes in FCCS are calculated using formulas (1) and (2) below [1,11,3]:
support(Ci → Cj) = (number of tuples containing both Ci and Cj) / (total number of tuples)    (1)

confidence(Ci → Cj) = (number of tuples containing both Ci and Cj) / (number of tuples containing Ci)    (2)

A rule Ci → Cj holds in IC_SET with confidence cf% if cf% of the tuples of IC_SET that contain Ci also contain Cj. The rule Ci → Cj has support sp% if sp% of the tuples of IC_SET contain Ci ∪ Cj. So the association rule Ci → Cj on IC_SET is worth considering if the set of tuples in IC_SET holds this rule with sup(Ci → Cj) > min_sup and conf(Ci → Cj) > min_conf.
As a result, a distinction can be made between classes that are often used together and classes that are not. These rules suggest which classes can be reused as a group, and a repository designer can then use these rules to put frequently used classes in the repository.
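For illustration, a compact apriori-style sketch of the two phases is given below. It is a minimal re-implementation under assumed data structures (IC_SET as a dict mapping each class to its import coupled class set), not the authors' tool; min_sup and min_conf are parameters.

# Minimal sketch of Section 3.3.1: phase 1 mines the frequent class combination
# set (FCCS), phase 2 forms class association rules (CAR) from it.
from itertools import combinations

def mine_fccs(ic_set, min_sup):
    transactions = list(ic_set.values())          # one 'transaction' per class
    n = len(transactions)
    support = lambda cs: sum(cs <= t for t in transactions) / n
    items = {frozenset([c]) for t in transactions for c in t}
    large = {cs for cs in items if support(cs) >= min_sup}
    fccs, k = set(large), 1
    while large:
        k += 1
        candidates = {a | b for a in large for b in large if len(a | b) == k}
        large = {cs for cs in candidates if support(cs) >= min_sup}
        fccs |= large
    return fccs, support

def mine_car(fccs, support, min_conf):
    rules = []
    for cs in (s for s in fccs if len(s) > 1):
        for r in range(1, len(cs)):
            for lhs in map(frozenset, combinations(cs, r)):
                rhs = cs - lhs
                conf = support(cs) / support(lhs)
                if conf >= min_conf:
                    rules.append((set(lhs), set(rhs), support(cs), conf))
    return rules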

3.3.2 Measuring Class Coupling Behavior


The class vector representation C_V(C) of the classes is used to compute the cosine similarity between classes on a scale of [0, 1]. The cosine similarity [16] of two class vectors C_V(Ci) and C_V(Cj) is defined as:

Cos_Sim(Ci, Cj) = (C_V(Ci) · C_V(Cj)) / (||C_V(Ci)|| ||C_V(Cj)||)

The value 1 means that the coupling patterns of classes Ci and Cj are identical, and 0 means they are completely different [16,26]. So, using cosine similarity, one can analyze which classes have similar, nearly similar or completely different coupling patterns. In the next section, we demonstrate our approach of mining class association rules and measuring class coupling behaviour on a sample application.
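A direct implementation of this measure on class vectors could look as follows; this is a small illustrative sketch, and the example vectors are those of the MYSHAPES case study (Table 2).

import math

def cos_sim(u, v):
    # Cosine similarity of two class vectors; for 0/1 vectors it lies in [0, 1].
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

print(round(cos_sim([0, 1, 1, 1], [0, 1, 1, 1]), 2))   # circle vs. shape   -> 1.0
print(round(cos_sim([1, 1, 1, 1], [0, 0, 1, 1]), 2))   # myshape vs. square -> 0.71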

4 Example Case Study


We use a small example to illustrate our approach for mining class association rules and measuring class coupling behaviour. We consider the example Java application MYSHAPES, which has the class set C(MYSHAPES) = {myshape, circle, square, shape}.
In the first step, import coupling data are collected for MYSHAPES using the UML approach. In the second step, these collected values are represented as IC_SET(MYSHAPES). Sections 4.1 and 4.2 show the third step of our approach. We assume min_sup > 25%.

4.1 Mining Class Association Rules

As the first part of the third step of the methodology, the method given in Section 3.3.1 is applied to IC_SET(MYSHAPES) to find the Frequent Class Combination Set (FCCS), as shown in figure 1. The output FCCS is then used to form Class Association Rules (CAR) having min_conf ≥ 90%. Table 1 lists the CAR with confidence of more than 90% for the application MYSHAPES.

4.2 Measuring Class Coupling Behavior

To find out the behavior of each class in terms of class coupling pattern we use class
vector representation (C_V) of MYSHAPES (table 2) and compute Cosine similarity
measure between classes as mentioned in section 3.3.2. Following table 3 shows the
computed Cos_Sim(Class1,Class2).

5 Results and Discussion

We can measure the reusability pattern of classes by analyzing their association rules and import coupling patterns. The CARs of the application MYSHAPES (figure 2) suggest that whenever a class on the left hand side of a rule is to be reused, there is a strong probability, with 100% confidence, that the classes on the right side of the rule will also be reused. From figure 2 it is observed that whenever class square is to be reused, class shape will also be reused. From figure 3 it is observed that the cosine similarity between classes circle and shape is 1 and between myshape and square is .71. This suggests that the import coupling behaviours of classes circle and shape are exactly similar, i.e., they are always used together, while classes myshape and square are sometimes import coupled to some common classes.
Our study shows that the FCCS, the CARs and Cos_Sim between classes can be helpful for a repository designer/user to predict which classes are required to be reused in combination and what the coupling pattern of the classes is. The effectiveness of the class association rules depends on the type of coupling attributes used to determine import coupling between classes, the way the coupling data are represented, and the accuracy of the association mining algorithm applied to them.

IC_SET (MYSHAPES):
  myshapes : {myshape, circle, square, shape}
  circle   : {circle, square, shape}
  square   : {square, shape}
  shape    : {circle, square, shape}

CS1 (candidate class sets with support): myshapes 01, circle 03, square 04, shape 04
L1 (large class sets): circle 03, square 04, shape 04
CS2: {circle, square} 03, {circle, shape} 03, {square, shape} 03
L2: {circle, square} 03, {circle, shape} 03, {square, shape} 03
CS3: {circle, square, shape} 03
Frequent Class Combination Set (FCCS): {circle, square, shape}

Fig. 1. Frequent Class Combination Mining Steps



Table 1. List of CAR & FCCS

Application: MYSHAPES
Frequent Class Combination Set (FCCS): {circle, square, shape}
CAR:
  1. circle → square (sup = 75%, conf = 100%)
  2. circle → shape (sup = 75%, conf = 100%)
  3. shape → square (sup = 100%, conf = 100%)
  4. square → shape (sup = 100%, conf = 100%)
  5. circle → shape, square (sup = 75%, conf = 100%)

Table 2. Class vector of MYSHAPES

                 myshape  circle  square  shape
  C_V(myshape)      1        1       1      1
  C_V(circle)       0        1       1      1
  C_V(square)       0        0       1      1
  C_V(shape)        0        1       1      1

Table 3. Cosine similarity between classes of MYSHAPES

  Cos_Sim(Class1, Class2)     Scale
  Cos_Sim(myshape, circle)     .87
  Cos_Sim(myshape, square)     .71
  Cos_Sim(myshape, shape)      .87
  Cos_Sim(circle, square)      .81
  Cos_Sim(circle, shape)       1
  Cos_Sim(square, shape)       .87

Fig. 2. CAR and their Support (bar chart of the rules of Table 1 against their support values, 0%–100%)

Fig. 3. Cosine Similarities between Classes (bar chart of the Cos_Sim values of Table 3, on a 0–1 scale)

6 Conclusions
In this paper, an attempt has been made to determine class reusability patterns from dynamically collected class import coupling data of a Java application. Our initial study indicates that the basic technique of market basket analysis (apriori) and the cosine similarity measure can be helpful for finding class association rules (CARs) and class import coupling behaviour. Currently, we have deduced CARs for a sample Java application. However, the approach can also be applied to larger Java applications. Moreover, other association mining and clustering algorithms can be explored for application to class coupling data to find class reusability patterns.

References
1. Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between Sets of Items
in Large Databases. In: ACM, SIGMOD, pp. 207–216 (1993)
2. Arisholm, E.: Dynamic Coupling Measurement for Object-Oriented Software. IEEE
Transactions on Software Engineering 30(8), 491–506 (2004)
3. Negandhi, G.: Apriori Algorithm Review for Finals, http://www.cs.sjsu.edu
4. Choi, M., Lee, J.: A Dynamic Coupling for Reusable and Efficient Software System. In:
5th IEEE International Conference on Software Engineering Research, Management and
Applications, pp. 720–726 (2007)
5. Mitchell, A., Power, F.: Using Object Level Run Time Metrics to Study Coupling Between
Objects. In: ACM Symposium on Applied Computing, pp. 1456–1462 (2005)
6. Gui, G., Scott, P.D.: Coupling and Cohesion Measures for Evaluation of Component Reus-
ability. In: ACM International Workshop on Mining Software Repository, pp. 18–21
(2006)
7. Taha, W., Crosby, S., Swadi, K.: A New Approach to Data Mining for Software Design.
In: 3rd International Conference on Computer Science, Software Engineering, Information
Technology, e-Business, and Applications (2004)
8. Montes, C., Carver, D.L.: Identification of Data Cohesive Subsystems Using Data Mining
Techniques. In: IEEE International Conference on Software Maintenance, pp. 16–23
(1998)
9. Xie, T., Acharya, M., Thummalapenta, S., Taneja, K.: Improving Software Reliability and
Productivity via Mining Program Source Code. In: IEEE International Symposium on Par-
allel and Distributed Processing, pp. 1–5 (2008)
10. Gui, G., Scott, P.D.: Ranking reusability of software components using coupling metrics.
Elsevier Journal of Systems and Software 80, 1450–1459 (2007)
11. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: 20th Interna-
tional Conference on Very Large Data Bases, pp. 487–499 (1994)
12. Thabtah, F.A., Cowling, P.I.: A greedy classification algorithm based on association rule.
Elsevier journal of Applied Soft Computing 07, 1102–1111 (2007)
13. Zemirline, A., Lecornu, L., Solaiman, B., Ech-Cherif, A.: An Efficient Association Rule
Mining Algorithm for Classification. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A.,
Zurada, J.M. (eds.) ICAISC 2008. LNCS (LNAI), vol. 5097, pp. 717–728. Springer, Hei-
delberg (2008)
14. Li, W., Han, J., Pei, J.: CMAR: Accurate and Efficient Classification Based on Multiple
Class-Association Rules. In: International Conference on Data Mining, pp. 369–376
(2001)

15. Yin, X., Han, J.: CPAR: Classification based on Predictive Association Rules
16. Cosine Similarity Measure, http://www.appliedsoftwaredesign.com/cosineSimilarityCalculator.php
17. Lee, Y., Chang, K.H.: Reusability and Maintainability Metrics for Object-Oriented Software. In: ACM 38th Annual Southeast Regional Conference, pp. 88–94 (2000)
18. Yacoub, S., Ammar, H., Robinson, T.: Dynamic Metrics for Object-Oriented Designs. In: IEEE 6th International Symposium on Software Metrics, pp. 50–61 (1999)
19. Li, W., Henry, S.: Object Oriented Metrics that Predict Maintainability. Technical Report, Virginia Polytechnic Institute and State University (1993)
20. Shiva, S.J., Shala, L.A.: Software Reuse: Research and Practice. In: Proceedings of the
IEEE International Conference on Information Technology, pp. 603–609 (2007)
21. Bhatia, P.K., Mann, R.: An Approach to Measure Software Reusability of OO Design. In:
Proceedings of the 2nd National Conference on Challenges & Opportunities in Information
Technology, pp. 26–30 (2008)
22. Eickhoff, F., Ellis, J., Demurjian, S., Needham, D.: A Reuse Definition, Assessment, and
Analysis Framework for UML. In: International Conference on Software Engineering
(2003),
http://www.engr.uconn.edu/~steve/Cse298300/eickhofficse2003submit.pdf
23. Caldiera, G., Basili, V.R.: Identifying and Qualifying Reusable Software Components.
IEEE Journal of Computer 24(2), 61–70 (1991)
24. Henry, S., Lattanzi, M.: Measurement of Software Maintainability and Reusability in the
Object Oriented Paradigm. In: ACM Technical Report (1994)
25. Xie, T., Pei, J.: Data mining for Software Engineering,
http://ase.csc.ncsu.edu/dmse/dmse.pdf
26. Cosine Similarity, http://en.wikipedia.org/wiki/Cosine_similarity
27. Gupta, V., Chhabra, J.K.: Measurement of Dynamic Metrics Using Dynamic Analysis of
Programs. In: Proceedings of the Applied Computing Conference, pp. 81–86 (2008)
28. Michail, A.: Data Mining Library Reuse Patterns in User-Selected Applications. In: 14th
IEEE International Conference on Automated Software Engineering, pp. 24–33 (1999)
29. Association Rules, http://en.wikipedia.org/wiki/Association_rule_learning
30. Jaccard Index, http://en.wikipedia.org/wiki/Jaccard_index
An Algorithm of Constraint Frequent Neighboring
Class Sets Mining Based on Separating Support Items

Gang Fang, Jiang Xiong, Hong Ying, and Yong-jian Zhao

College of Mathematics and Computer Science, Chongqing Three Gorges University


Chongqing 404000, P.R. China
[email protected], [email protected],
[email protected], [email protected]

Abstract. Existing algorithms for mining constraint frequent neighboring class sets need to generate candidate frequent neighboring class sets and involve a lot of repeated computation. This paper therefore proposes an algorithm for mining constraint frequent neighboring class sets based on separating support items, which is suitable for mining frequent neighboring class sets with a constraint class set in a large spatial database. The algorithm uses the method of separating support items to obtain the support of neighboring class sets, and uses up search to extract the frequent neighboring class sets with the constraint class set. In the course of mining, the algorithm only needs to scan the database once, and it need not generate candidate frequent neighboring class sets with the constraint class set. By these means, the algorithm avoids much repeated computation and improves the mining efficiency. Experimental results indicate that the algorithm is faster and more efficient than existing mining algorithms when extracting frequent neighboring class sets with a constraint class set in a large spatial database.

Keywords: neighboring class set; constraint class set; separating support items;
up search; spatial data mining.

1 Introduction
Geographic information databases are an important and typical kind of spatial database. Mining spatial association rules from geographic information databases is an important part of spatial data mining and knowledge discovery, also known as spatial co-location pattern mining [1]. Spatial co-location patterns are implicit rules expressing the structure and association of spatial objects in geographic information databases, and also expressing the hierarchy and correlation of different subsets of spatial associations or spatial data [2]. At present, in spatial data mining, there are mainly three kinds of methods for mining spatial association rules [3]: layer covering based on clustering [3], mining based on spatial transactions [2, 4, 5, 6] and mining based on non-spatial transactions [3]. The first two methods have been used to extract frequent neighboring class sets [4, 5, 6], but AMFNCS [4] and TDA [5] are not able to efficiently extract frequent neighboring class sets with a constraint class set, and MFNCSWCC [6] needs to generate many candidates and involves a lot of repeated computation when it uses iterative search to generate frequent neighboring class sets with a constraint class set. Hence, this paper proposes an algorithm for mining constraint frequent neighboring class sets based on separating support items, denoted CMBSSI, which need not generate candidates when mining frequent neighboring class sets with a constraint class set.

2 Definition and Problem Description


A spatial data set is made up of the spatial objects in a spatial domain. We use the data structure <Object Identification, Class Identification, Spatial Location> to save every object. Here, Class Identification distinguishes the different classes in the spatial data set, Object Identification distinguishes the different spatial objects of the same class, and Spatial Location expresses the location coordinates of a spatial object. Each object is regarded as an instance of the corresponding class, and so the spatial data set is made up of these instances of the spatial Class Identifications. The set of Class Identifications is regarded as a spatial class set C = {C1, C2…Cm}, meaning there are m different classes.
Definition 1. Neighboring Class Set, it is a subset of spatial class set in spatial data
set, which is expressed as {Ct1, Ct2…Ctk} (tk ≤ m) denoted by NCS. Let I = {it1, it2…
itk} be an instance of neighboring class set as NCS = {Ct1, Ct2…Ctk}, here, itj is an
instance of Ctj (j ∈ 1, 2…k).
Example, let {V, W, Y} be a NCS, and I = {V3, W1, Y2} is an instance of NCS.
Definition 2. Neighboring Class Set Length, its value is the number of class in
neighboring class set. If the length of NCS is k, it is expressed as k-NCS.
Definition 3. Right Instance of a Neighboring Class Set: let I = {it1, it2… itk} be an instance of NCS; if for all ip and iq (ip, iq ∈ I) we have distance(ip, iq) ≤ d, then I is regarded as a right instance of NCS. Here, d is the minimal distance used to decide whether two spatial objects are close to each other, and distance(ip, iq) denotes the Euclidean distance.
Definition 4. Constraint Class Set, it is a proper subset of neighboring class set, which
is made up of class given by user.

Definition 5. Class Weight: it is the integer 2^(No_k − 1), where No_k is the sequence number of the class in the spatial class set C = {C1, C2…Cm}.

Definition 6. Neighboring Class Set Vector: it is a vector whose components are the Class Weights 2^(No_k − 1) of the classes in the neighboring class set.
Example: let C = {C1, C2… Cm} be a spatial class set and {Ct1, Ct2… Ctk} a neighboring class set; then its vector is expressed as NCSV = (2^(t1−1), 2^(t2−1), …, 2^(tk−1)).

Definition 7. Neighboring Class Set Identification: it is the integer whose value is the sum of all components of NCSV, denoted by Σ_{j=1}^{L} 2^(No_j − 1), where L is the length of the NCS. For the example above, the Neighboring Class Set Identification is Σ_{j=1}^{k} 2^(t_j − 1).

Definition 8. Neighboring Class Set Support, it is the number of right instance of


neighboring class set, which is denoted by support (NCS).
Definition 9. Frequent Neighboring Class Set, its support is not less than the minimal
support given by user.
Based on the above definitions, mining frequent neighboring class sets with a constraint class set is specified as follows:
Input:
(1) Spatial class set as C = {C1, C2… Cm}, instance set as I = {i1, i2… in}, each ik
(ik ∈ I) is expressed as above defined data structure.
(2) Minimal distance as d.
(3) Minimal support as s.
(4) Constraint class set.
Output: Frequent neighboring class set with constraint class set.

3 Mining Constraint Frequent Neighboring Class Sets

3.1 The Method of Separating Support Items

In order not to generate candidates when mining frequent neighboring class sets, the algorithm introduces the method of separating support items. The method separates all items or itemsets from a spatial transaction to compute support, namely, it extracts all itemsets supported by a spatial transaction and counts them towards the support. Let C = {C1, C2…Cm} be a spatial class set and regard NCS = {Ct1, Ct2…Ctk} as a transaction; the method proceeds as follows:
Step 1: according to Definitions 5 and 6, compute the Neighboring Class Set Vector NCSV = (2^(t1−1), 2^(t2−1), …, 2^(tk−1)).
Step 2: let every itemset supported by the NCS be a new neighboring class set, and compute the index interval [1, 2^k − 1]; this interval is used to generate the Neighboring Class Set Identification NCSI_x of these new neighboring class sets.
Step 3: compute NCSI_x = B_x · NCSV^T, x ∈ [1, 2^k − 1], where the components of the vector B_x are the k bits of the integer x.
Example: C = {U, V, W, X, Y, Z} is a spatial class set and NCS = {V, X, Y}. We use the method of separating support items to extract all itemsets supported by the NCS.
Step 1: compute the Neighboring Class Set Vector NCSV = (2^(2−1), 2^(4−1), 2^(5−1)) = (2, 8, 16).
Step 2: compute the index interval [1, 2^3 − 1], namely [1, 7].
Step 3: extract all itemsets supported by the NCS as follows:
NCSI_1 = B_1 · NCSV^T = (0, 0, 1) · (2, 8, 16)^T = 16, corresponding NCS_1 = {Y}.
NCSI_2 = B_2 · NCSV^T = (0, 1, 0) · (2, 8, 16)^T = 8, corresponding NCS_2 = {X}.
NCSI_3 = B_3 · NCSV^T = (0, 1, 1) · (2, 8, 16)^T = 24, corresponding NCS_3 = {X, Y}.
NCSI_4 = B_4 · NCSV^T = (1, 0, 0) · (2, 8, 16)^T = 2, corresponding NCS_4 = {V}.
NCSI_5 = B_5 · NCSV^T = (1, 0, 1) · (2, 8, 16)^T = 18, corresponding NCS_5 = {V, Y}.
NCSI_6 = B_6 · NCSV^T = (1, 1, 0) · (2, 8, 16)^T = 10, corresponding NCS_6 = {V, X}.
NCSI_7 = B_7 · NCSV^T = (1, 1, 1) · (2, 8, 16)^T = 26, corresponding NCS_7 = {V, X, Y}.
Obviously, the method yields every NCS_k supported by NCS = {V, X, Y}, so the support of all itemsets supported by NCS = {V, X, Y} is computed in a single pass.
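A small Python sketch of this subset enumeration is given below. It is an assumed re-implementation, and its bit-to-class mapping order may differ from the worked example above, although the enumerated itemsets and identifications are the same.

# Hypothetical sketch of the separating-support-items step (Section 3.1):
# enumerate every non-empty itemset supported by a neighboring class set,
# together with its Neighboring Class Set Identification.
C = ["U", "V", "W", "X", "Y", "Z"]              # ordered spatial class set
weight = {c: 2 ** i for i, c in enumerate(C)}   # class weight 2^(No_k - 1)

def separate_support_items(ncs):
    ncsv = [weight[c] for c in ncs]             # neighboring class set vector
    k = len(ncs)
    for x in range(1, 2 ** k):                  # index interval [1, 2^k - 1]
        bits = [(x >> j) & 1 for j in range(k)]
        ident = sum(b * w for b, w in zip(bits, ncsv))
        subset = [c for b, c in zip(bits, ncs) if b]
        yield ident, subset

for ident, subset in separate_support_items(["V", "X", "Y"]):
    print(ident, subset)    # e.g. 2 ['V'], 8 ['X'], ..., 26 ['V', 'X', 'Y']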

3.2 The Process of Mining Constraint Frequent Neighboring Class Sets


Input:
(1) Spatial class set as C = {C1, C2… Cm}, instance set as I = {i1, i2… in}, each ik
(ik ∈ I) is expressed as above defined data structure.
(2) Minimal distance as d.
(3) Minimal support as s.
(4) Constraint class set.
Output: Frequent neighboring class set with constraint class set.
The algorithm uses the following data structures:
Structure NeighboringClassSet {
  Char ncs;    // saves the neighboring class set NCS
  Int count;   // saves the number of right instances belonging to NCS
} NCS;
Array[2^m − 1];      // saves the number of similar neighboring class sets, namely, the support of each neighboring class set
F[2^(m−l) − 1];      // saves the frequent neighboring class sets; l is the length of the constraint class set
Step 1: compute all the right instances I' from the instance set I using the minimal distance d.
Step 2: obtain the neighboring class sets NCS by scanning the right instance set I' once.
Step 3: scan NCS[j] (starting with j = 0); according to NCS[j].ncs, compute the Neighboring Class Set Length Length_j and the Neighboring Class Set Vector NCSV_j.
Step 4: via the method of separating support items of Section 3.1, compute the Neighboring Class Set Identification NCSI_x of every NCS_x separated from NCS[j].ncs, and execute:
  Array[NCSI_x − 1] = Array[NCSI_x − 1] + NCS[j].count.
Step 5: j = j + 1, and repeat steps 3 and 4.
Step 6: scan Array[p − 1] by up search, i.e., with p ascending; if Array[p − 1] > s and p ∧ c = c (where c is the Neighboring Class Set Identification of the constraint class set and ∧ is the bitwise "and" operation), write p to F after deleting every q ∈ F with p ∧ q = q.
Step 7: output the frequent neighboring class sets with the constraint class set from F according to Definitions 5 and 7.
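As an illustration of steps 4–6 (support accumulation via separated support items, followed by the constraint-filtered up search), a compact sketch is given below. It is an assumed re-implementation: grouped_ncs stands for the (identification, count) pairs obtained in steps 1–3, m for the number of spatial classes, and c for the identification of the constraint class set.

# Assumed sketch of the accumulation and up-search steps of Section 3.2.
def mine_constraint_fncs(grouped_ncs, m, min_sup, c):
    array = [0] * (2 ** m - 1)                      # support per NCS identification
    for ident, count in grouped_ncs:
        k = bin(ident).count("1")                   # length of this NCS
        bits = [b for b in range(m) if ident >> b & 1]
        for x in range(1, 2 ** k):                  # separating support items
            sub = sum(1 << bits[j] for j in range(k) if x >> j & 1)
            array[sub - 1] += count
    frequent = []                                   # F: frequent NCS containing the constraint
    for p in range(1, 2 ** m):                      # up search, p ascending
        if array[p - 1] > min_sup and p & c == c:
            frequent = [q for q in frequent if p & q != q]   # delete q with p AND q == q
            frequent.append(p)
    return frequent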

4 The Analysis and Comparing of Capability


At present, there are few studies on frequent neighboring class set mining. AMFNCS [4] uses a numerical variable to generate candidates, which is simple, and uses logical operations to compute support, which is also very simple. In the mining process the algorithm produces no superfluous NCS, but it is not able to extract frequent neighboring class sets with a constraint class set.
TDA [5] adopts a top-down strategy to generate candidate frequent neighboring class sets, consisting of three stages: firstly, computing the single m-candidate frequent neighboring class set which contains all classes; then generating the (m−1)-candidate frequent neighboring class sets; and, letting (m−1) be k, generating all (k−1)-frequent neighboring class sets (k > 3) by iteration. But it is also not able to extract frequent neighboring class sets with a constraint class set.
MFNCSWCC [6] efficiently extracts short frequent neighboring class sets with a constraint class set via iterative search, but it needs to generate many candidates and involves a lot of repeated computation, which restricts its efficiency.
CMBSSI, presented in this paper, need not generate candidates to extract frequent neighboring class sets with a constraint class set, and is suitable for mining short frequent neighboring class sets with a constraint class set. It avoids many candidate frequent neighboring class sets and much repeated computation, and so improves the efficiency.

4.1 The Analysis of Capability

Let C = {C1, C2…Cm} be a spatial class set, let I = {i1, i2…in} be an instance set, let n_k (with n = Σ n_k) be the number of instances of C_k, and let l be the length of the constraint class set.

Time complexity. The computation of CMBSSI mainly includes three parts: computing the right instances, separating the support items of the NCS, and searching the frequent NCS. The time complexity is

(2^(m−l) − 1) [ n² C_m² / m² + 2^(m−l−1) − 1 ].

Space complexity. The space complexity of CMBSSI is O(α · 2^m), where α is a parameter depending on the support and the length of the constraint class set. If the right instances in the spatial data set are not uniformly distributed, the space utilization ratio of CMBSSI is low.

4.2 The Comparing of Experimental Result

Now we use experimental results to verify the above analysis. The two algorithms MFNCSWCC and CMBSSI are used to generate frequent neighboring class sets with a constraint class set from 12267 right instances, whose Neighboring Class Set Identifications range from 3 to 8191; a neighboring class set does not include any single class, i.e., it has at least two classes; the number of spatial classes is 13 and the Neighboring Class Set Identification of the constraint class set is 9. The numbers of right instances of these neighboring class sets follow the pattern below:
Neighboring Class Set Identification 8191 has one right instance.
Neighboring Class Set Identification 8190 has two right instances.
Neighboring Class Set Identification 8189 has one right instance.
Neighboring Class Set Identification 8188 has two right instances.
...
Our experimental environment is as follows: Intel(R) Celeron(R) M CPU 420 @ 1.60 GHz, 1.24 GB RAM, the program is written in Visual C# 2005.NET, and the OS is Windows XP Professional.

Fig. 1. Comparing the runtime (ms) of the two algorithms (MFNCSWCC and CMBSSI) as the support (%) changes

Fig. 2. Comparing the runtime (ms) of the two algorithms (MFNCSWCC and CMBSSI) as the length changes

The runtime of the two algorithms as the support of the neighboring class set changes is shown in Figure 1, and the runtime of the two algorithms as the length of the neighboring class set changes is shown in Figure 2.
According to these two figures, CMBSSI is faster and more efficient than MFNCSWCC when mining frequent neighboring class sets with a constraint class set in a large spatial database.

5 Conclusion
This paper proposes an algorithm for mining constraint frequent neighboring class sets based on separating support items, which is suitable for mining frequent neighboring class sets with a constraint class set in a large spatial database. In the future, we will further discuss how to improve its space utilization ratio.

Acknowledgments. This work was fully supported by science and technology re-
search projects of Chongqing Education Commission (Project No. KJ091108), and it
was also supported by science and technology research projects of Wanzhou District
Science and Technology Committee (Project No. 2010-23-01) and Chongqing Three
Gorges University (Project No. 10QN-22, 24 and 30).

References
1. Ma, R.H., Pu, Y.X., Ma, X.D.: GIS Spatial Association Pattern Mining. Science Press, Beijing (2007)
2. Ma, R.H., He, Z.Y.: Mining Complete and Correct Frequent Neighboring Class Sets from
Spatial Databases. Journal of Geomatics and Information Science of Wuhan Univer-
sity 32(2), 112–114 (2007)
3. Zhang, X.W., Su, F.Z., Shi, Y.S., Zhang, D.D.: Research on Progress of Spatial Association
Rule Mining. Journal of Progress in Geography 26(6), 119–128 (2007)
4. Fang, G.: An algorithm of alternately mining frequent neighboring class set. In: Tan, Y.,
Shi, Y., Tan, K.C. (eds.) ICSI 2010. LNCS, vol. 6146, pp. 588–593. Springer, Heidelberg
(2010)
5. Fang, G., Tu, C.S., Xiong, J., et al.: The Application of a Top-Down Algorithm in Neighbor-
ing Class Set Mining. In: International Conference on Intelligent Systems and Knowledge
Engineering, pp. 234–237. IEEE press, Los Alamitos (2010)
6. Fang, G., Xiong, J., Chen, X.F.: Frequent Neighboring Class Set Mining with Constraint
Condition. In: International Conference on Progress in Informatics and Computing, pp. 242–
245. IEEE press, Los Alamitos (2010)
A Multi-period Stochastic Production Planning
and Sourcing Problem with Discrete Demand
Distribution

Weili Chen, Yankui Liu , and Xiaoli Wu

College of Mathematics & Computer Science, Hebei University


Baoding 071002, Hebei, China
[email protected], [email protected], [email protected]

Abstract. This paper studies a new class of multi-period stochastic


production planning and sourcing problem with minimum risk criteria,
in which a manufacturer has a number of plants or subcontractors and
has to meet the product demands according to the service levels set by
its customers. In the proposed problem, demands are characterized by
stochastic variables with known probability distributions. The objective
of the problem is to minimize the probability that the total cost exceeds a
predetermined maximum allowable cost, where the total cost includes the
sum of the inventory holding, setup and production costs in the planning
horizon. For general demand distributions, the proposed problem is very
complex, so we cannot solve it by conventional optimization methods. To
avoid this difficulty, we assume the demands have finite discrete distribu-
tions, and derive the crisp equivalent forms of both probability objective
function and the probability level constraints. As a consequence, we turn
the original stochastic production planning problem into its equivalent
integer programming one so that the branch-and-bound method can be
used to solve it. Finally, to demonstrate the developed modeling idea, we
perform some numerical experiments via one 3-product source, 8-period
production planning problem.

Keywords: Stochastic production planning; Minimum risk criteria; Probability service level; Integer programming.

1 Introduction

The production planning and sourcing problem concerns the manufacturer's decisions of how much to produce, when to produce, where to produce, and how much inventory to carry in each period. In the literature, the deterministic production planning problem has received much attention [1,2,3,4,5]. However, due to the uncertainty usually present in a complex decision system, stochastic production planning problems have also been studied widely in the field of production planning management. For example, Bitran and Yanasse [6]


dealt with a stochastic production planning problem with a service level require-
ment, and provided non-sequential and deterministic equivalent formulations of
the model; Zäpfel [7] claimed that MRP II systems can be inadequate for the
solution of production planning problems with uncertain demand because of
the insufficiently supported aggregation process, and proposed a procedure to
generate an aggregate plan and a consistent disaggregate plan for the master
production schedule, and Kelly et al. [8] considered randomness in demand for a
single-product, single-machine line with setups in the process industry, and pro-
posed a model that incorporates mean and standard deviation of demand in the
planning horizon time periods to set production runs. Though only one product
was being made, start-ups after periods of idleness required significant setups.
On the basis of fuzzy theory, the production planning problems have also been
studied in fuzzy community. In this respect, the interested reader may refer to
Lan et al. [9,10], and Sun et al. [11,12].
The purpose of this paper is to study a realistic production planning model.
We consider production, setup, and inventory carrying costs and minimum ser-
vice level constraints at each time period, in which the demands are stochastic
with known probability distributions. Most of the stochastic production plan-
ning models in the literature may formulate the model to minimize the expected
sum of all costs [6,8,13]. In the current development, we minimize the prob-
ability that the total cost exceeds a predetermined maximum allowable cost,
where the total cost includes the sum of the inventory holding, setup and pro-
duction costs in the planning horizon. For general demand distributions, the
proposed problem is very complex, so we cannot solve it by conventional opti-
mization methods. To avoid this difficulty, we assume the demands have finite
discrete distributions, and derive the crisp equivalent forms of both probabilistic
objective function and the probability level constraints. As a consequence, the
proposed production planning problem is turned into its equivalent integer pro-
gramming problem. Since there is no “one-size-fits-all” solution that is effective
for all integer programming problems, we adopt the branch and bound method
to solve our equivalent integer production planning problem.
The rest of this paper is organized as follows. In Section 2, we formulate a
new class of stochastic production planning models with probability objective
subject to service level constraints. In Section 3, we assume the demands have
finite discrete probability distributions, and deal with the equivalent formulation
of original stochastic production planning problem. Section 4 is devoted to the
discussion of the branch and bound solution method for the equivalent integer
production planning problem. Section 5 performs some numerical experiments
via one 3-product source, 8-period production planning problem to demonstrate
the developed modeling idea. Finally, we draw our conclusions in Section 6.

2 Formulation of Problem
In this section, we will develop a new class of stochastic minimum risk pro-
gramming models for a multi-period production planning and sourcing problem.
Assume that there is a single product and N types of production sources (plants

and subcontractors). The demand for this specific product in each period is
characterized by a random variable with known probability distribution.
The costs in the objective function consist of the production cost, the inventory holding cost and the setup cost. The objective of the problem is to minimize the probability that the total cost exceeds a predetermined maximum allowable cost.
Constraints on the performance (related to backorders) of the system are
imposed by requiring service levels which forces the probability of having no
stock out to be greater than or equal to a service level requirement in each
period.
In addition, we adopt the following notation to model our production planning
problem: i is the index of sources, i = 1, 2, . . . , N ; t is the index of periods,
t = 1, 2, . . . , T ; cit is the unit cost of production at source i in period t; ht is the
unit cost of inventories in period t; I0 is the initial inventory; It is the inventory
level at the end of period t; sit is the fixed cost of setup at source i in period t;
yit is 1 if a setup is performed at source i in period t, and 0 otherwise; Mit is
the capacity limitation of source i at time period t; dt is the stochastic demand
in period t; αt is the service level requirement in period t; ϕ is the maximum
allowable cost, and xit is the production quantities at source i in period t.
Using the notation above, a minimum-risk stochastic production planning
model with probability service levels is formally built as
min  Pr{ Σ_{t=1}^{T} ( h_t (I_t)^+ + Σ_{i=1}^{N} (s_{it} y_{it} + c_{it} x_{it}) ) > ϕ }
s.t.: Pr{ I_t ≥ 0 } ≥ α_t,  t = 1, 2, …, T                                         (1)
      x_{it} ≤ M_{it} y_{it},  i = 1, 2, …, N,  t = 1, 2, …, T
      x_{it} ∈ Z_+,  y_{it} ∈ {0, 1},  i = 1, 2, …, N,  t = 1, 2, …, T,

where (It )+ = max{0, It }, t = 1, 2, . . . , T , are the real inventory levels. For each
period t the inventory balance is


I_t = I_{t−1} + Σ_{i=1}^{N} x_{it} − d_t = I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x_{iτ} − Σ_{τ=1}^{t} d_τ,    (2)

where the set of demand quantities {dt , t = 1, 2, . . . , T } are assumed to be mu-


tually independent random variables.
If demands dt , t = 1, 2, . . . , T , have general probability distributions, then
stochastic integer production planning problem (1) is very complex, so we cannot
solve it by conventional optimization methods. To find efficient solution method
for problem (1), we assume that the demands have finite discrete distributions,
and turn the original problem (1) into its equivalent integer programming prob-
lem. This issue will be addressed in the next section.

3 Handing Probability Objective and Level Constraints


In this section, we discuss the equivalent deterministic production planning
model of original problem (1). For this purpose, we assume that the discrete
distribution demand d is characterized by

d = (d_1, …, d_T)^T takes the realization d̂^k = (d̂^k_1, d̂^k_2, …, d̂^k_T) with probability p_k, k = 1, 2, …, K,    (3)

where p_k > 0 for all k and Σ_{k=1}^{K} p_k = 1.
In this case, the tth probability level constraint

Pr{ I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x_{iτ} − Σ_{τ=1}^{t} d_τ ≥ 0 } ≥ α_t    (4)

can be turned into the following equivalent deterministic form:

I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x_{iτ} ≥ Q⁻_{Σ_{τ=1}^{t} d_τ}(α_t),    (5)

where Q⁻_{Σ_{τ=1}^{t} d_τ}(α_t) is the left end-point of the closed interval of α_t-quantiles of the probability distribution of the random demand Σ_{τ=1}^{t} d_τ.
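For the finite discrete demands assumed here, this quantile can be computed directly from the scenario data before the model is built. The sketch below is one assumed way to do this; the names d_hat and p are placeholders for the K scenario realizations and their probabilities.

# Sketch (assumed helper, not from the paper) of the left alpha_t-quantile
# Q^-(alpha_t) of the cumulative demand d_1 + ... + d_t.
def left_quantile(values, probs, alpha):
    # smallest value v with P(X <= v) >= alpha for a finite discrete X
    cum = 0.0
    for v, pr in sorted(zip(values, probs)):
        cum += pr
        if cum >= alpha:
            return v
    return max(values)

def cumulative_demand_quantile(d_hat, p, t, alpha_t):
    # d_hat: list of K scenarios, each a length-T list of period demands
    sums = [sum(scenario[:t]) for scenario in d_hat]
    return left_quantile(sums, p, alpha_t)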
Furthermore, we define a binary vector z whose components zk , k ∈ K, take
1 if the corresponding set of constraints has to be satisfied and 0 otherwise. In
particular, for each scenario k, we may introduce a number M large enough so
that the following inequality holds

Σ_{t=1}^{T} h_t ( I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x_{iτ} − Σ_{τ=1}^{t} d̂^k_τ )^+ − M z_k ≤ ϕ − Σ_{i=1}^{N} Σ_{t=1}^{T} (s_{it} y_{it} + c_{it} x_{it}).    (6)

As a consequence, the original stochastic integer production planning problem (1) is equivalent to the following nonlinear integer programming problem:

min  Σ_{k=1}^{K} p_k z_k
s.t.  I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x_{iτ} ≥ Q⁻_{Σ_{τ=1}^{t} d_τ}(α_t),  t = 1, 2, …, T
      x_{it} ≤ M_{it} y_{it},  i = 1, 2, …, N,  t = 1, 2, …, T
      Σ_{t=1}^{T} h_t ( I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x_{iτ} − Σ_{τ=1}^{t} d̂^k_τ )^+ − M z_k                    (7)
          ≤ ϕ − Σ_{i=1}^{N} Σ_{t=1}^{T} (s_{it} y_{it} + c_{it} x_{it}),  k = 1, 2, …, K
      z_k ∈ {0, 1},  k = 1, …, K
      x_{it} ∈ Z_+,  y_{it} ∈ {0, 1},  i = 1, 2, …, N,  t = 1, 2, …, T.
By introducing auxiliary variables l_t^k, k = 1, 2, …, K, t = 1, 2, …, T, we can turn problem (7) into the following integer linear programming problem:

min  Σ_{k=1}^{K} p_k z_k
s.t.: I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x_{iτ} ≥ Q⁻_{Σ_{τ=1}^{t} d_τ}(α_t),  t = 1, 2, …, T
      x_{it} ≤ M_{it} y_{it},  i = 1, 2, …, N,  t = 1, 2, …, T
      Σ_{t=1}^{T} h_t l_t^k − M z_k ≤ ϕ − Σ_{i=1}^{N} Σ_{t=1}^{T} (s_{it} y_{it} + c_{it} x_{it}),  k = 1, …, K        (8)
      I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x_{iτ} − Σ_{τ=1}^{t} d̂^k_τ ≤ l_t^k,  t = 1, 2, …, T,  k = 1, …, K
      l_t^k ≥ 0,  t = 1, 2, …, T,  k = 1, 2, …, K
      z_k ∈ {0, 1},  k = 1, 2, …, K
      x_{it} ∈ Z_+,  y_{it} ∈ {0, 1},  i = 1, 2, …, N,  t = 1, 2, …, T.

Furthermore, we have the following result:


Theorem 1. Problem (7) and problem (8) are equivalent.

Proof. If (x̄_{it}, ȳ_{it}, z̄_k, i = 1, …, N, t = 1, …, T, k = 1, …, K) is a feasible solution to (7), then, for all t and k, we take

l_t^k = ( I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x̄_{iτ} − Σ_{τ=1}^{t} d̂^k_τ )^+.    (9)

Hence, (x̄_{it}, ȳ_{it}, z̄_k, l̄_t^k, i = 1, …, N, t = 1, …, T, k = 1, …, K) is a feasible solution to (8) with equal objective value.
Conversely, let (x̂_{it}, ŷ_{it}, ẑ_k, l̂_t^k, i = 1, …, N, t = 1, …, T, k = 1, …, K) be a feasible solution to (8). Then, for each k, the following inequality holds:

Σ_{t=1}^{T} h_t ( I_0 + Σ_{τ=1}^{t} Σ_{i=1}^{N} x̂_{iτ} − Σ_{τ=1}^{t} d̂^k_τ )^+ − M ẑ_k ≤ Σ_{t=1}^{T} h_t l̂_t^k − M ẑ_k.    (10)

Consequently, (x̂_{it}, ŷ_{it}, ẑ_k, i = 1, …, N, t = 1, …, T, k = 1, …, K) is a feasible solution to (7), and the corresponding objective value in (7) equals the one in (8). The proof of the theorem is complete.

From the reformulation (7) of the production planning problem, we can see that even for a small size of the random vector, the number K of scenarios can be very large. In addition, problem (8) contains integer and binary decision variables. Thus, problem (8) belongs to the class of NP-hard problems. In the next section, we discuss the solution of (8) by a general purpose optimization software.

4 Solution Method

The equivalent production planning problem (8) is an integer programming one


that may be solved by a pure enumeration scheme. However, the scheme would
not allow to compute large size of the random vector. Standard branch and bound
scheme uses enumeration ingeniously and it is considered the classical method
to solve the purely integer and the mixed-integer programming problem, and it
is one of the successful methods to solve this kind of programming problems at
present.
All commercially available integer programming software packages employ a
linear programming based branch and bound scheme. In order to use integer
programming software packages effectively, it is required to understand the use
of lower and upper bounds on the optimal objective value in an linear program-
ming based branch-and-bound algorithm. For a comprehensive exposition on
integer-programming algorithms we may refer to Nemhauser and Wolsey [14],
and Wolsey [15].

There is no “one-size-fits-all” solution method that is effective for all inte-


ger programming problems. Therefore, to handle situations in which the default
settings do not achieve the desired performance, integer-programming systems
allow users to change the parameter settings, and thus the behavior and performance of the optimizer. Using the Lingo software, we employ the standard branch-and-bound algorithm to solve the equivalent production planning problem (8).
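Since the equivalent problem (8) is an ordinary mixed-integer linear programme, any MILP solver can be used in place of Lingo. The following sketch states model (8) with the open-source PuLP modeller; this is an assumed restatement, not the authors' Lingo implementation, and the data containers (s, M, c as N×T tables, h, the quantiles Q of Section 3, the scenarios d_hat with probabilities p, and the constants phi, bigM, I0) are placeholders to be filled, e.g., from Table 1.

# Assumed sketch of model (8) in PuLP (not the authors' Lingo model).
import pulp

def build_model(N, T, K, s, M, c, h, Q, d_hat, p, phi, bigM, I0=0):
    prob = pulp.LpProblem("min_risk_production_planning", pulp.LpMinimize)
    idx = [(i, t) for i in range(N) for t in range(T)]
    x = pulp.LpVariable.dicts("x", idx, lowBound=0, cat="Integer")   # production quantities
    y = pulp.LpVariable.dicts("y", idx, cat="Binary")                # setup indicators
    z = pulp.LpVariable.dicts("z", range(K), cat="Binary")           # scenario exceeds phi
    l = pulp.LpVariable.dicts("l", [(t, k) for t in range(T) for k in range(K)],
                              lowBound=0)                            # (I_t)^+ per scenario

    prob += pulp.lpSum(p[k] * z[k] for k in range(K))                # objective of (8)

    for t in range(T):                                               # quantile service levels
        cum_prod = pulp.lpSum(x[i, tau] for i in range(N) for tau in range(t + 1))
        prob += I0 + cum_prod >= Q[t]
    for i, t in idx:                                                 # capacity / setup link
        prob += x[i, t] <= M[i][t] * y[i, t]

    total_cost = pulp.lpSum(s[i][t] * y[i, t] + c[i][t] * x[i, t] for i, t in idx)
    for k in range(K):
        # scenario cost constraint of (8), with setup/production cost moved to the left
        prob += pulp.lpSum(h[t] * l[t, k] for t in range(T)) + total_cost \
                - bigM * z[k] <= phi
        for t in range(T):                                           # linearize (I_t)^+
            cum_prod = pulp.lpSum(x[i, tau] for i in range(N) for tau in range(t + 1))
            prob += I0 + cum_prod - sum(d_hat[k][:t + 1]) <= l[t, k]
    return prob

Calling prob.solve() on the returned model then invokes PuLP's bundled CBC solver, which performs the linear-programming-based branch and bound discussed above.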

5 Numerical Experiments
In this section, we perform some numerical experiments via the following ex-
ample. A manufacturer supplies his products to a retailer, suppose that the
manufacturer has three product sources, N = 3, and eight production periods,
T = 8. Each plant and subcontractor has different setup cost, product capacity
and unit production cost. Suppose sit , Mit , cit , ht , αt , ϕ are all predetermined by
the actual situation. The manufacturer has to meet the demands for different
products according to the service level requirements set by its customers. Let

Table 1. The Data Set of the Production Planning Problem

s_it (setup cost), periods 1–8 by source:
  source 1: 1500 1450 2000 1600 1200 1250 2200 1800
  source 2: 1200 1280 1300 1850 1600 1650 1480 2000
  source 3: 2500 2000 1880 1600 1980 1500 1660 1750

M_it (capacity), periods 1–8 by source:
  source 1: 5000 4000 4500 4500 4500 4800 5000 5000
  source 2: 6000 5500 5500 4500 4800 3800 4000 4000
  source 3: 6500 6500 5500 4000 4000 3800 3800 3500

c_it (unit production cost), periods 1–8 by source:
  source 1: 2   3   2.5 2.5 3.5 2.5 2.5 2.5
  source 2: 2.5 3   3   4   4.5 1.6 3   1.8
  source 3: 3   3.5 2   2.5 2.2 2.8 5   3.5

d_t (demand realizations with probabilities), periods 1–8:
  realization 1: 3800 3760 4800 4500 4890 3200 3450 3990
  probability p: 0.4  0.3  0.5  0.45 0.35 0.6  0.55 0.2
  realization 2: 3290 4300 5200 5000 6100 5740 4880 4100
  probability p: 0.6  0.7  0.5  0.55 0.65 0.4  0.45 0.8

h_t (unit inventory cost): 4 5 5.5 4 4.5 3 3.5 6
α_t (service level):       0.95 0.8 0.9 0.92 0.88 0.9 0.92 0.95

Table 2. The Optimal Solution of Production Planning

xit 1 2 3 4 5 6 7 8
1 3800 0 0 4500 2100 1830 4590 0
2 0 4100 0 0 0 3800 0 4000
3 0 0 5500 0 4000 0 0 0

us assume that the demand d_t has a finite integer discrete distribution, which is meaningful when the products are indivisible. We assume, for the sake of simplicity, that the initial inventory level is 0, i.e., I_0 = 0, and the data used for this test are collected in Table 1.
Since T = 8 and each period demand has two realizations, one has K = 256. Let ϕ = 1.8 × 10^5 and M = 10^6. We employ Lingo 8.0 to solve the equivalent production planning problem (8). The obtained optimal solution of the production planning problem is reported in Table 2, and the corresponding optimal value is 0.1412070.
From Table 2 we get the production quantities at each source in each period. The production quantity is nonzero exactly when the binary variable y_it = 1. From the numerical experiment, we can see that even for a small size of the random vector, the number K of scenarios can be very large, and because of the auxiliary variables, the scale of this numerical example is also rather large. Furthermore, more numerical experiments for this example have been performed with different values of the parameter ϕ. Figure 1 shows how the optimal objective value varies with the predetermined maximum allowable cost ϕ. Lower values of ϕ allow a bigger probability that the total cost exceeds the maximum allowable cost. Nevertheless, the choice of ϕ is up to the capability of the decision maker. In real life, a manufacturer who has lower acceptable costs may suffer higher risk than one who has higher acceptable costs. So the manufacturer should make a decision according to the relationship between an acceptable maximum cost and the suffered risk.

6 Conclusions
When optimal production decisions must be reached in a stochastic environ-
ment, in order to give to the optimization problem its appropriate form, the
formulation of the decision model requires a deeper probing of the aspirations
criteria. In addition, the computational obstacles should be overcome to find
optimal production decisions. In these two aspects, the major new contributions
of the current development are as follows.
(i) On the basis of minimum risk criteria, we have presented a new class of
stochastic production planning problem with probability objective subject
to service level constraints, in which product demands are characterized by
random variables. In addition, a manufacturer has a number of plants and
subcontractors and has to meet the product demands according to various
service levels prescribed by its customers.

Fig. 1. Trade-off between Maximum Allowable Cost and Risk (risk on the vertical axis against the maximum allowable cost, ×10^5, on the horizontal axis)

(ii) For general demand distributions, the developed stochastic production plan-
ning problem (1) is very complex, so we cannot solve it by conventional
optimization methods. So, we assumed the demands have finite discrete dis-
tributions, and derived the crisp equivalent forms of both the probability objective function and the probabilistic level constraints. As a consequence, we turned the original production planning problem (1) into its equivalent integer programming model (7) so that the branch-and-bound method can be used to solve it. The equivalent alternative formulation (8) of the integer production planning problem (7) has also been discussed (see Theorem 1).
(iii) To demonstrate the developed modeling idea, a number of numerical exper-
iments has been performed via one numerical example with three product
sources and eight production periods. By changing the value of parameter
ϕ, we get the trade-off between an acceptable maximum cost and the suf-
fered risk (see Figure 1). This relationship is considered as a guidance for
investment that is meaningful in reality production processing.

Acknowledgments
This work was supported by the National Natural Science Foundation of China
under Grant No.60974134, the Natural Science Foundation of Hebei Province
under Grant No.A2011201007, and the Education Department of Hebei Province
under Grant No.2010109.

References
1. Candea, D., Hax, A.C.: Production and Inventory Management. Prentice-Hall,
New Jersey (1984)
2. Das, S.K., Subhash, C.S.: Integrated Approach to Solving the Master Aggregate
Scheduling Problem. Int. J. Prod. Econ. 32(2), 167–178 (1994)
3. Dzielinski, B.P., Gomory, R.E.: Optimal Programming of Lot Sizes, Inventory and
Labor Allocations. Manag. Sci. 11, 874–890 (1965)
4. Florian, M., Klein, M.: Deterministic Production Planning with Concave Costs and
Capacity Constraints. Manag. Sci. 18, 12–20 (1971)
5. Lasdon, L.S., Terjung, R.C.: An Efficient Algorithm for Multi-Echelon Scheduling. Oper. Res. 19, 946–969 (1971)
6. Bitran, G.R., Yanasse, H.H.: Deterministic Approximations to Stochastic Production Problems. Oper. Res. 32(5), 999–1018 (1984)
7. Zäpfel, G.: Production Planning in the Case of Uncertain Individual Demand: Extension for an MRP II Concept. Int. J. Prod. Econ. 119, 153–164 (1996)
8. Kelly, P., Clendenen, G., Dardeau, P.: Economic Lot Scheduling Heuristic for Ran-
dom Demand. Int. J. Prod. Econ. 35(1-3), 337–342 (1994)
9. Lan, Y., Liu, Y., Sun, G.: Modeling Fuzzy Multi-Period Production Planning and
Sourcing Problem with Credibility Service Levels. J. Comput. Appl. Math. 231(1),
208–221 (2009)
10. Lan, Y., Liu, Y., Sun, G.: An Approximation-Based Approach for Fuzzy Multi-
Period Production Planning Problem with Credibility Objective. Appl. Math.
Model. 34(11), 3202–3215 (2010)
11. Sun, G., Liu, Y., Lan, Y.: Optimizing Material Procurement Planning Problem by
Two-Stage Fuzzy Programming. Comput. Ind. Eng. 58(1), 97–107 (2010)
12. Sun, G., Liu, Y., Lan, Y.: Fuzzy Two-Stage Material Procurement Planning Prob-
lem. J. Intell. Manuf. 22(2), 319–331 (2011)
13. Yıldırım, I., Tan, B., Karaesmen, F.: A Multiperiod Stochastic Production Plan-
ning and Sourcing Problem with Service Level Constraints. OR Spectrum 27(2-3),
471–489 (2005)
14. Nemhauser, G.L., Wolsey, L.A.: Integer and Combinatorial Optimization. John
Wiley & Sons, New York (1988)
15. Wolsey, L.A.: Integer Programming. John Wiley & Sons, New York (1998)
Exploration of Rough Sets Analysis in Real-World
Examination Timetabling Problem Instances

J. Joshua Thomas, Ahamad Tajudin Khader, Bahari Belaton, and Amy Leow

School of Computer Sciences, University Sains Malaysia & KDU College Penang
[email protected], {tajudin,bahari}@cs.usm.my

Abstract. The examination timetabling problem is widely studied and is a major activity for academic institutions. In real-world cases, increasing student enrolments and the variety of courses add to the growing challenge of the research, with a wider range of constraints. Many optimization problems are concerned with finding the best feasible solution with minimum execution time of the algorithms. The aim of this paper is to propose rough sets methods to investigate the Carter datasets. Two rough sets (RS) approaches are used for the data analysis. Firstly, the discretization process (DP) returns a partition of the value sets into intervals. Secondly, rough sets Boolean reasoning (RSBR) obtains the best decision table on the large data instances. The datasets classified with rough sets are experimented with an examination scheduler. Improvements of the solutions on the Car-s-91 and Car-f-91 datasets are reported.

Keywords: Examination Timetabling, Rough sets, discretization.

1 Introduction

Examination timetabling is a problem of allocating a timeslot for all exams in the


problem instances within a limited number of permitted timeslots, in such a way that
none of the specified hard constraints are violated. In most cases, the problem is high-
ly constrained and, moreover, the set of constraints which are required to be satisfied
is different from one institution to another as reported by Burke et al. [1]. In general,
the most common hard constraint is to avoid any student being scheduled for two
different exams at the same time. In practice, each institution usually has a different
way of evaluating the quality of the developed timetable. In many cases, the quality is
calculated based on a penalty function which represents the degree to which the con-
straints are satisfied.
Over the years, numerous approaches have been investigated and developed for
exam timetabling. Such approaches include constraint programming, graph colouring,
and various metaheuristic approaches including genetic algorithms, tabu search, simu-
lated annealing, the great deluge algorithm, and hybridized methods which draw on
two or more of these techniques. Some recent important papers which reflect this
broad range of activity are [2, 3, 4, 5]. The authors' earlier work focused on interaction with the scheduling data, and the present study is a continuation of that research on data analysis for the same problem. Those approaches can be found in [6].


Approaches in which the exams are ordered prior to assignment to a timeslot have been discussed by several authors, including Boizumault et al. [8], Brailsford et al. [9], Burke et al. [10], Burke and Newall [11], Burke and Petrovic [12] and Carter et al. [16]. Carter et al. [16] report the use of four ordering criteria to rank the exams in decreasing order of an estimate of how difficult each exam is to schedule. Each of these techniques has its own properties and features, including its ability to find important rules and information that could be useful for the examination timetabling domain. However, no literature discusses a rough-sets-based methodology for addressing these problem instances.
Rough set theory [17,18,19] is a comparatively new intelligent technique that has been applied to real-world cases; it is used for the discovery of data dependencies, discovers patterns in data, and seeks minimal subsets of values. One advantage of rough sets is the creation of readable if-then rules. Such rules have the potential to reveal new patterns in the data. More advanced and intelligent techniques have been used in data analysis, such as neural networks, Bayesian classifiers, genetic algorithms, decision trees, fuzzy theory, and rough sets; however, rough set methods are not yet popular for scheduling datasets. Rough sets offer a problem-solving tool between the precision of classical mathematics and the natural vagueness of the real world. Other approaches like case-based reasoning and decision trees [20,21] are also widely used to solve data analysis problems. The objective of this investigation has been to develop intervals that can rank the dataset on the basis of the rough set discretization process and to make decisions. The newly created rough-sets-based dataset is then injected into the examination evaluator to generate a quality feasible solution.
The structure of the paper is as follows. Section 2 discusses the rough sets data
analysis method in detail. The characteristics of the benchmark dataset are presented
in section 3. The modeling process based on rough sets is briefly described in section
4. Experimental analysis and Results are in Section 5. Finally, the conclusion is pre-
sented in Section 6.

2 Rough Sets Methods


The rough sets methods are used to analyze the datasets of Carter, Laporte and Lee
[15, 16], a set of 12 real-world exam timetabling problems from 3 Canadian high
schools, 5 Canadian, 1 American, 1 UK and 1 mid-east universities. The methods used
in this study consist of two main stages: the discretization process (DP) and rough sets
Boolean reasoning (RSBR) data processing. The preprocessing stage (DP) includes
discretization. Data processing (RSBR) includes the generation of preliminary
knowledge, such as the computation of object ranking from data, and the classification
processes. The final goal is to generate rules from the information or decision system
for the benchmark datasets. Figure 1 shows the overall steps in the proposed rough sets
data analysis method.

Fig. 1. Rough Set Data Analysis method

3 Dataset
Many researchers use the benchmark Carter dataset [16] to apply their methods and
test the results for quality feasible solutions. There are two standard datasets used by
the examination timetabling community, the Carter dataset and the ITC (International
Timetabling Competition) dataset [14], which the scientific community uses to test any
proposed algorithm. The Carter dataset was introduced in 1996 by Carter, Laporte and
Lee in a paper published in the Journal of the Operational Research Society. One of the
major drawbacks of most articles in the timetabling literature is that testing is limited
to randomly generated problems and perhaps to one practical example.
The formulation of the Carter dataset does not take into account the following:
─ The room capacities for the examination rooms (which is why it is considered
an uncapacitated problem).
─ The fact that two consecutive examinations on different days are better than two
consecutive examinations on the same day.
Both of these scenarios would give the same penalty cost using the usual objective
function used with the Carter dataset, even though the student would have the evening
(indeed, all night) to revise, as opposed to no time at all if the examinations were truly
consecutive. Indeed, each instance in the dataset just has a number of timeslots. There
is no concept of different days.

Table 1. Carter Examination Timetabling problem instances

The recent examination timetabling review paper [13] has explained the two versions
of the datasets and the modifications. However, the contributions of those works
concern not the data values but the problem instances. Few works have modified the
instances with respect to the real-world scenarios provided by the institutions. Table 1
shows the Carter dataset with the problem instances.

4 Pre-processing

In real-world examination timetabling, many decisions need to take into account
several factors simultaneously under various sets of constraints (soft constraints).
Usually it is not known which parameter(s) need to be emphasized more in order to
generate a better solution or decision. In many cases, a tradeoff has to be made between
the various potential conflicts in the assignment of exams to timeslots. In rough sets,
a dataset is usually represented as a table, where each row represents an object and
every column represents an attribute, an observation that can be evaluated for each
object. This table is called an information system. The following ordering criteria
were considered when selecting which exam should be scheduled first:
─ Number of conflict exams, largest degree (LD)
─ Number of student enrolled, largest enrollment (LE)
─ Number of available slot, saturation degree (SD)
In each case, two out of the three criteria above were selected as input variables.
More formally, an information system is a pair (U, A), where U is a non-empty finite
set of objects called the universe and A is a non-empty finite set of attributes such that
a : U → Va for every a ∈ A. The set Va is called the value set of a.

Table 2. Sample Information system

Table 3. Decision Table on the car-s-91 data instances

An example of a simple information system is shown in Table 2. There are 12 cases
or objects, and two condition attributes (Course and Enrollment). The objects x4 and
x5, as well as x10 and x12, have exactly the same values of the conditions.
A decision system is any information system of the form (U, A ∪ {d}), where d ∉ A
is the decision attribute (the decision condition or criterion).
A small example decision table can be found in Table 3. The table has the same 12
cases or objects as in the previous example, but one decision attribute, the number of
students enrolled (LE), with three possible outcomes, has been added. The reader may
again notice that cases x4 and x5, as well as x10 and x12, still have exactly the same
values of the conditions, but the second pair has a different outcome. The rules to be
synthesized from the decision table are of the form:
IF Course = 0004 and Enrollment = 73 then LE = Medium
It is assumed that a decision table expresses all the knowledge about the model. The
same objects may be represented several times, or some attributes may be superfluous.
The notion of equivalence must be considered first. A binary relation R ⊆ X × X which
is reflexive (a value is in relation with itself, x R x), symmetric (if x R y then y R x) and
transitive (if x R y and y R z then x R z) is called an equivalence relation.
Let (U, A) be an information system; then with any B ⊆ A there is associated an
equivalence relation

    IND(B) = {(x, x′) ∈ U × U | ∀a ∈ B, a(x) = a(x′)}                    (1)

IND(B) is called the B-indiscernibility relation. For instance, Table 2 defines an
indiscernibility relation. The subsets of the conditional attributes are [Course] and
[Enrollment]. If, for instance, we consider [Number of students enrolled (LE)] only,
objects x4 and x5 belong to the same equivalence class and are indiscernible. The
relation defines the three families of equivalence classes identified below:

{{x1}, {x2}, {x3}, {x4}, {x5}, {x6}, {x7}, {x8}, {x9}, {x10}, {x11}, {x12}}
{{x4, x5}, {x10, x12}}
{{x1, x2}, {x3, x6, x7, x10, x11, x12}, {x4, x5}, {x8, x9}}
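As an illustration of how such equivalence classes can be computed, the following Python sketch groups objects that share identical values on a chosen attribute subset; the table fragment and its attribute values are hypothetical and only mirror the indiscernible pairs x4/x5 and x10/x12 mentioned above.

from collections import defaultdict

def indiscernibility_classes(table, attributes):
    # Group objects that agree on every attribute in the chosen subset B.
    classes = defaultdict(list)
    for obj, values in table.items():
        key = tuple(values[a] for a in attributes)
        classes[key].append(obj)
    return list(classes.values())

# Hypothetical fragment of an information system (values are illustrative only).
students = {
    "x4":  {"Course": "0004", "Enrollment": 73},
    "x5":  {"Course": "0004", "Enrollment": 73},
    "x10": {"Course": "0010", "Enrollment": 120},
    "x12": {"Course": "0010", "Enrollment": 120},
}
print(indiscernibility_classes(students, ["Course", "Enrollment"]))
# [['x4', 'x5'], ['x10', 'x12']]  -- x4/x5 and x10/x12 are indiscernible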

4.1 Data Completion and Discretization of Data Values

The rough set approach requires only indiscernibility; it is not necessary to define an
order or a distance when values of different kinds are combined (e.g. courses,
enrollment). The discretization step determines how roughly the data are to be
processed. We call this "pre-processing". For instance, cut-off points have to be
established for the course or enrollment values. The intervals might be refined based
on good domain knowledge. Setting the cut-off points is computationally expensive for
large datasets, and otherwise requires a domain expert to prepare the discretization
manually.
Let (U, A) be an information system with n objects. The discernibility matrix of (U, A)
is a symmetric n × n matrix with entries cij as given below. Each entry thus consists of
the set of attributes upon which objects xi and xj differ.

    cij = {a ∈ A | a(xi) ≠ a(xj)}   for i, j = 1, …, n                    (2)

The discernibility function f for an information system is a Boolean function of m
Boolean variables a1*, …, am* (corresponding to the attributes a1, …, am), defined by

    f(a1*, …, am*) = ⋀ { ⋁ cij* | 1 ≤ j < i ≤ n, cij ≠ ∅ },

where cij* = {a* | a ∈ cij} and "," stands for disjunction in the Boolean expression.
After simplification, the discernibility function written out for Table 2 involves only
the variables e and r (where e denotes the enrollment and r denotes the rank of the
data values).
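The discernibility matrix of equation (2) and the clauses of the discernibility function can be computed with a short sketch of the following kind; the toy table and its attribute values are hypothetical and only reuse the e/r naming introduced above.

from itertools import combinations

def discernibility_matrix(table, attributes):
    # c_ij = set of attributes on which objects x_i and x_j differ (equation (2)).
    matrix = {}
    for xi, xj in combinations(list(table), 2):
        matrix[(xi, xj)] = {a for a in attributes if table[xi][a] != table[xj][a]}
    return matrix

def discernibility_clauses(matrix):
    # One disjunctive clause per non-empty entry; their conjunction is the discernibility function.
    return [sorted(entry) for entry in matrix.values() if entry]

# Hypothetical toy table (attribute values are illustrative only).
table = {"x1": {"e": "High", "r": 3}, "x2": {"e": "High", "r": 1}, "x3": {"e": "Low", "r": 1}}
m = discernibility_matrix(table, ["e", "r"])
print(m)                          # e.g. {('x1', 'x2'): {'r'}, ('x1', 'x3'): {'e', 'r'}, ('x2', 'x3'): {'e'}}
print(discernibility_clauses(m))  # [['r'], ['e', 'r'], ['e']]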

4.2 Data Processing

The processing stage includes generating knowledge, such as the computation of
objects from data, splitting intervals, ranking and classification. These stages lead
towards the final goal of generating rules from the information or decision system of
the Carter dataset.

Let (U, A ∪ {d}) be given. The cardinality of the image d(U) = {k | d(x) = k, x ∈ U}
is called the rank of d and is denoted by r(d). Assume the set of values of the decision
d is equal to {v1, …, vr(d)}. Quite often the rank is two, with Boolean values (e.g. Y, N),
but it can be an arbitrary number; in the above example we could have four ranks if
the decision had values in the set {rank 3, rank 2, rank 1, rank 0}. The decision d
determines a partition {X1, …, Xr(d)} of U, where

    Xk = {x ∈ U | d(x) = vk}   for 1 ≤ k ≤ r(d)                    (3)

The partition {X1, …, Xr(d)} is called the classification of objects determined by the
decision d. The set Xi is called the i-th decision class. Fig. 2 explains the RSBR
discretization algorithm applied on the dataset. Table 4 shows the intervals and
ranking of the dataset.

Input: Information table (T) created from the dataset value columns, and n, the number of
intervals for each column value.
Output: Information table (DT) with discretized real value columns.
1. For each real value column v do
2.   Define Boolean variables B = {b1, …, bn}
3. End For, where b1, …, bn correspond to a set of partitions defined on the values of
column v.
4. Create a new information table (DT) by using the set of partitions.
5. Find the objects that discern in the decision class.
Fig. 2. Rough sets Boolean Reasoning discretization

Table 4. Interval & Ranking of Carter dataset

For instance, Table 4 shows the intervals and cut-off points used in the Carter dataset
problem instances. The count column explains the large, average, medium and low
intervals set on the standard dataset with respect to the number of students enrolled,
largest enrollment (LE). Searching for the reducts of a decision table is NP-complete.
Fortunately, the Carter dataset has no reducts, and the work proceeds by setting
intervals and ranking with classification on the dataset.
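As a simple illustration of how an exam's LE value is mapped to one of these intervals, consider the sketch below; the cut-off points shown are placeholders only, since the actual cut-offs are those produced by the RSBR discretization and listed in Table 4.

def rank_enrollment(le_count, cuts=(200, 100, 50)):
    # Map a largest-enrollment (LE) count to an interval label.
    # The cut-off values here are hypothetical placeholders, not those of Table 4.
    large, average, medium = cuts
    if le_count >= large:
        return "Large"
    if le_count >= average:
        return "Average"
    if le_count >= medium:
        return "Medium"
    return "Low"

print([rank_enrollment(n) for n in (350, 150, 73, 12)])
# ['Large', 'Average', 'Medium', 'Low']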

5 Experiment and Result


The algorithm was developed in Java using object-oriented programming. The
experiments were run on a PC with a 2.8 GHz Core 2 Duo and 2 GB of RAM.
Carter's (1996) publicly available exam timetabling datasets were used in the
experiments, as shown in Table 1. In this work, we evaluate the performance of our
approach on twelve instances. In order to test our modification to the sequential
construction method previously developed by Carter et al. [16], the algorithm was
initially run with the rough set discretization implemented. The experiments work on
the exams in the problem instances with a single interval criterion (e.g. LE).
From Table 5 it can be seen that the initial rough sets methods produced results
comparable to the single interval criterion. Slightly modified problem instances of
Car-f-91 and Car-f-92 were tested with the algorithm. The algorithm produced
similar but better results for the Car-f-91 and Car-f-92 datasets, while hec-s-92,
kfu-s-93 and the other datasets were compared with the standard results.

Table 5. Experimental results for the rough sets discretization approach that were implemented

6 Conclusion
In this paper, we have presented an intelligent data analysis approach based on rough
sets theory for generating classification rules from a set of 12 observed real-world
problem instances used as a benchmark dataset by the examination timetabling
community. The main objective is to investigate the problem instances/dataset and,
with minor modifications, to obtain better timetables. To improve the classification
process, the rough sets with Boolean reasoning (RSBR) discretization algorithm is
used to discretize the data. Further work will be done to minimize the experiment
duration in order to get better results with the rough set data analysis.

References
[1] Burke, E.K., Elliman, D.G., Ford, P.H., Weare, R.F.: Examination timetabling in British
Universities – a survey. In: Burke, E., Ross, P. (eds.) PATAT 1995. LNCS, vol. 1153, pp.
76–90. Springer, Heidelberg (1996)
[2] Burke, E.K., Elliman, D.G., Weare, R.F.: A hybrid genetic algorithm for highly con-
strained timetabling problems. In: Proceedings of the 6th International Conference on Ge-
netic Algorithms (ICGA 1995), Pittsburgh, USA, July 15-19, pp. 605–610. Morgan
Kaufmann, San Francisco (1995)
[3] Burke, E.K., Bykov, Y., Newall, J., Petrovic, S.: A time-predefined local search approach
to exam timetabling problems. IIE Transactions on Operations Engineering, 509–528
(2004)
[4] Caramia, M., Dell’Olmo, P., Italiano, G.F.: New algorithms for examination timetabling.
In: Näher, S., Wagner, D. (eds.) WAE 2000. LNCS, vol. 1982, pp. 230–241. Springer,
Heidelberg (2001)
[5] Carter, M.W., Laporte, G., Lee, S.Y.: Examination timetabling: Algorithmic strategies
and applications. Journal of the Operational Research Society, 373–383 (1996)
[6] Joshua, J., et al.: The Perception of Interaction on the University Examination Timetabl-
ing Problem. In: McCollum, B., Burke, E., George, W. (ed.) Practice and Theory of Au-
tomated Timetabling, ISBN 08-538-9973-3
[7] Al-Betar, M., et al.: A Combination of Metaheuristic Components based on Harmony
Search for The Uncapacitated Examination Timetabling. In: McCollum, B., Burke, E.,
George, W. (eds.): Practice and Theory of Automated Timetabling, ISBN 08-538-9973-3
(PATAT 2010, Ireland, Aug, selected papers) for Annals of operational research
[8] Boizumault, P., Delon, Y., Peridy, L.: Constraint logic programming for examination
timetabling. The Journal of Logic Programming 26(2), 217–233 (1996)
[9] Brailsford, S.C., Potts, C.N., Smith, B.M.: Constraint satisfaction problems: Algorithms
and applications. European Journal of Operational Research 119, 557–581 (1999)
[10] Burke, E.K., de Werra, D., Kingston, J.: Applications in timetabling. In: Yellen, J., Gross,
J.L. (eds.) Handbook of Graph Theory, pp. 445–474. Chapman Hall, CRC Press (2003)
[11] Burke, E.K., Newall, J.P.: Solving examination timetabling problems through adaption of
heuristic orderings. Annals of Operations Research 129, 107–134 (2004)
[12] Burke, E.K., Petrovic, S.: Recent research directions in automated timetabling. European
Journal of Operational Research 140, 266–280 (2002)
[13] Qu, R., Burke, E.K., McCollum, B., Merlot, L.T.G., Lee, S.Y.: A Survey of Search Me-
thodologies and Automated System Development for Examination Timetabling. Journal
of Scheduling 12(1), 55–89 (2009), online publication (October 2008), doi:
10.1007/s10951-008-0077-5.pdf
[14] McCollum, B., Schaerf, A., Paechter, B., McMullan, P., Lewis, R., Parkes, A., Di Gaspe-
ro, L., Qu, R., Burke, E.: Setting The Research Agenda in Automated Timetabling: The
Second International Timetabling Competition. INFORMS Journal on Computing 22(1),
120–130 (2010)
[15] Carter, M.W.: A survey of practical applications of examination timetabling algorithms.
Operation Research 34(2), 193–202 (1986)
[16] Carter, M.W., Laporte, G., Lee, S.Y.: Examination timetabling: Algorithmic strategies
and applications. Journal of the Operational Research Society 47, 373–383 (1996)
[17] Pawlak, Z.: Rough sets. International Journal of Computer and Information Science 11,
341–356 (1982)

[18] Pawlak, Z.: Rough Sets Theoretical Aspect of Reasoning about Data. Kluwer Academic,
Boston (1991)
[19] Pawlak, Z., Grzymala-Busse, J., Slowinski, R., Ziarko, W.: Rough sets. Communications
of the ACM 38(11), 89–95 (1995)
[20] Ślęzak, D.: Various approaches to reasoning with frequency-based decision reducts: a
survey. In: Polkowski, L., Tsumoto, S., Lin, T.Y. (eds.) Rough Sets in Soft Computing
and Knowledge Discovery: New Developments. Physica Verlag, Heidelberg (2000)
[21] Pal, S.K., Polkowski, L., Skowron, A.: Rough-Neuro Computing: Techniques for Compu-
ting with Words. Springer, Berlin (2004)
Community Detection in Sample Networks
Generated from Gaussian Mixture Model

Ling Zhao¹, Tingzhan Liu², and Jian Liu³

¹ Beijing University of Posts and Telecommunications, Beijing 100876, P.R. China
² School of Sciences, Communication University of China, Beijing 100024, P.R. China
³ LMAM and School of Mathematical Sciences, Peking University, Beijing 100871, P.R. China
[email protected]

Abstract. Detecting communities in complex networks is of great im-


portance in sociology, biology and computer science, disciplines where
systems are often represented as networks. In this paper, we use the
coarse-grained-diffusion-distance based agglomerative algorithm to un-
cover the community structure exhibited by sample networks generated
from Gaussian mixture model, in which the connectivity of the network is
induced by a metric. The present algorithm can identify the community
structure in a high degree of efficiency and accuracy. An appropriate
number of communities can be automatically determined without any
prior knowledge about the community structure. The computational re-
sults on three artificial networks confirm the capability of the algorithm.

Keywords: Community detection, Gaussian mixture model, Coarse-


grained diffusion distance, Agglomerative algorithm, k-means.

1 Introduction
The modern science of networks has brought significant advances to our under-
standing of complex systems [1,2,3]. One of the most relevant features of graphs
representing real systems is community structure, i.e. the organization of vertices
in clusters, with many edges joining vertices of the same cluster and compara-
tively few edges joining vertices of different clusters. Such communities can be
considered as fairly independent compartments of a network, playing a similar
role like the tissues or the organs in the human body [4,5]. Detecting communi-
ties is of great importance, which is very hard and not yet satisfactorily solved,
despite the huge effort of a large interdisciplinary community of scientists work-
ing on it over the past few years [6,7,8,9,10,11,12,13]. On a related but different
front, recent advances in computer vision and data mining have also relied heav-
ily on the idea of viewing a data set or an image as a graph or a network, in
order to extract information about the important features of the images [14].
In our previous work [12], we extend the measure of diffusion distance between
nodes in a network to a generalized form on the coarse-grained network with data


parameterization via eigenmaps. This notion of proximity of meta-nodes in the


coarse-grained networks reflects the intrinsic geometry of the partition in terms
of connectivity of the communities in a diffusion process. Nodes are then grouped
into communities through an agglomerative hierarchical clustering technique [15]
under this measure and the modularity function [7,8] is used to select the best
partition of the resulting dendrogram.
A widely used simulated example is the sample network generated from a
Gaussian mixture model [10,11]. This model is related to the concept of "random
geometric graph" proposed by Penrose [16], except that we take a Gaussian mixture
here instead of a uniform distribution. First we generate n sample points {x_i} in
two dimensional Euclidean space subject to a K-Gaussian mixture distribution
Σ_{k=1}^{K} q_k G(μ_k, Σ_k), where {q_k} are weights that satisfy 0 < q_k < 1,
Σ_{k=1}^{K} q_k = 1, while μ_k and Σ_k are the mean positions and covariance
matrices of each component, respectively. Here we pick the node set T_k in group k,
and with this choice, approximately q_k = |T_k|/n. Next, we generate the network
with a thresholding
strategy. That is, if |xi − xj | ≤ dist, we assign an edge between the i-th and j-th
node; otherwise they are not connected. With this strategy, the connectivity of
the network is induced by a metric. We are interested in the connection between
our network clustering and the traditional clustering in the metric space. To
evaluate our result obtained by the algorithm proposed above, we can compare
the clustering result with the original partition {Tk }. Notice that {Tk } is in-
dependent of the topology of the network, which can be only considered as a
reasonable reference value but not an exact object. Another choice is to compare
our result with those obtained from k-means algorithm [15] since the metric is
known in this case.
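A minimal Python/NumPy sketch of this construction is given below; the function name, the equal group sizes and the isotropic covariance are assumptions of the sketch, and the parameters shown are those of the first network in Table 1.

import numpy as np

def sample_network(n, means, sigma, dist, seed=0):
    # Draw n points from a K-component Gaussian mixture (equal weights, covariance sigma^2 * I)
    # and connect two points by an edge iff their Euclidean distance is at most dist.
    rng = np.random.default_rng(seed)
    K = len(means)
    points, labels = [], []
    for k, mu in enumerate(means):
        pts = rng.normal(loc=mu, scale=sigma, size=(n // K, 2))
        points.append(pts)
        labels += [k] * (n // K)
    X = np.vstack(points)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (D <= dist).astype(int)
    np.fill_diagonal(A, 0)          # no self-loops
    return X, np.array(labels), A

# Parameters of the first sample network in Table 1.
X, T, A = sample_network(150, [(0.0, 3.0), (1.5, 4.5), (-0.5, 5.0)], sigma=0.15, dist=0.9)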
The rest of the paper is organized as follows. In Section 2, we briefly introduce
the coarse-grained random walk and coarse-grained diffusion distance [12]. After
reviewing the definition of modularity, we describe the algorithm in detail. In
Section 3, we apply the algorithm to three artificial examples mentioned before.
Finally we make the conclusion in Section 4.

2 Framework of Coarse-Grained-Diffusion-Distance
Based Agglomerative Algorithm
2.1 Construction of Coarse-Grained Diffusion Distance
We will start with a brief review of the basic idea in [12]. Let G(S, E) be a
network with n nodes and m edges, where S is the nodes set, E = {e(x, y)}x,y∈S
is the weight matrix and e(x, y) is the weight for the edge connecting the nodes x
and y. We can relate this network to a discrete-time Markov chain with stochastic
matrix P whose entries p_1(x, y) are given by p_1(x, y) = e(x, y)/d(x), with
d(x) = Σ_{z∈S} e(x, z), where d(x) is the degree of the node x [3]. The process is
driven by P^t = {p_t(x, y)}_{x,y∈S}, where p_t(x, y) represents the probability of
going from node x to node y through a random walk in t time steps. This Markov
chain has stationary distribution μ(x) = d(x) / Σ_{z∈S} d(z) and it satisfies the
detailed balance

condition μ(x) p_1(x, y) = μ(y) p_1(y, x). The diffusion distance D_t(x, y) between x
and y is defined as the weighted L² distance

    D_t²(x, y) = Σ_{z∈S} [p_t(x, z) − p_t(y, z)]² / μ(z),                    (1)

where the weight μ(z)^{−1} penalizes discrepancies on domains of low density more
than those of high density. As is well known, the transition matrix P has a set
of left and right eigenvectors and a set of eigenvalues 1 = λ_0 ≥ |λ_1| ≥ · · · ≥
|λ_{n−1}| ≥ 0, with P φ_i = λ_i φ_i, ψ_i^T P = λ_i ψ_i^T, i = 0, 1, · · · , n − 1. Note
that ψ_0 = μ, φ_0 ≡ 1 and ψ_i^T φ_j = δ_ij. The left and right eigenvectors are related
according to ψ_i(x) = φ_i(x) μ(x). The spectral decomposition of P^t is given by

    p_t(x, y) = Σ_{i=0}^{n−1} λ_i^t φ_i(x) ψ_i(y) = Σ_{i=0}^{n−1} λ_i^t φ_i(x) φ_i(y) μ(y),                    (2)

then the diffusion distance (1) can be reduced to

    D_t²(x, y) = Σ_{i=0}^{n−1} λ_i^{2t} (φ_i(x) − φ_i(y))².                    (3)
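For illustration, D_t can be evaluated directly from the t-step transition matrix, as in the following NumPy sketch; it assumes a network with no isolated nodes (all degrees positive), and the function name is our own.

import numpy as np

def diffusion_distance(A, t):
    # Pairwise diffusion distances D_t(x, y) of equation (1) for adjacency/weight matrix A.
    d = A.sum(axis=1).astype(float)
    P = A / d[:, None]                   # p_1(x, y) = e(x, y) / d(x)
    mu = d / d.sum()                     # stationary distribution
    Pt = np.linalg.matrix_power(P, t)    # p_t(x, y)
    n = A.shape[0]
    D = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            D[x, y] = np.sqrt(np.sum((Pt[x] - Pt[y]) ** 2 / mu))
    return D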

We take a partition of S as S = ∪_{k=1}^{N} S_k with S_k ∩ S_l = Ø if k ≠ l, and
regard each set S_k in the state space S = {S_1, · · · , S_N} as corresponding to a node
of an N-node network Ĝ(S, E_t), where E_t = {ê_t(S_k, S_l)}_{S_k,S_l∈S}, and the
weight ê_t(S_k, S_l) on the edge that connects S_k and S_l is defined as
ê_t(S_k, S_l) = Σ_{x∈S_k, y∈S_l} μ(x) p_t(x, y), where the sum involves all the
transition probabilities between x ∈ S_k and y ∈ S_l. From the detailed balance
condition, it can be verified that ê_t(S_k, S_l) = ê_t(S_l, S_k). By setting
μ̂(S_k) = Σ_{z∈S_k} μ(z), one can define a coarse-grained Markov chain on Ĝ(S, E_t)
with stationary distribution μ̂ and transition probabilities

    p̂_t(S_k, S_l) = ê_t(S_k, S_l) / Σ_{m=1}^{N} ê_t(S_k, S_m) = (1/μ̂(S_k)) Σ_{x∈S_k, y∈S_l} μ(x) p_t(x, y).                    (4)

It can be easily shown that p̂_t is a stochastic matrix on the state space S and
satisfies a detailed balance condition with respect to μ̂. More generally, we define
coarse-grained versions of ψ_i in a similar way by summing over the nodes in a
partition, ψ̂_i(S_k) = Σ_{z∈S_k} ψ_i(z), and, as above, coarse-grained versions of φ_i
according to the duality condition ψ̂_i(S_k) = φ̂_i(S_k) μ̂(S_k), i.e.
φ̂_i(S_k) = ψ̂_i(S_k)/μ̂(S_k) = (1/μ̂(S_k)) Σ_{z∈S_k} φ_i(z) μ(z). Then the
coarse-grained probability p̂_t can be written in a spectral decomposition form
similar to (2) as follows:

    p̂_t(S_k, S_l) = Σ_{i=0}^{n−1} λ_i^t φ̂_i(S_k) φ̂_i(S_l) μ̂(S_l).                    (5)

This can be considered as an extension of (2). This leads to the diffusion
distance between communities S_k and S_l given by

    D̂_t²(S_k, S_l) = Σ_{i,j=0}^{n−1} λ_i^t λ_j^t (φ̂_i(S_k) − φ̂_i(S_l)) (φ̂_j(S_k) − φ̂_j(S_l)) Σ_{m=1}^{N} ψ̂_i(S_m) φ̂_j(S_m).                    (6)
This notion of proximity of communities in the coarse-grained networks reflects
the intrinsic geometry of the set S in terms of connectivity of the meta-nodes
in a diffusion process. This metric is thus a key quantity in the design of the
following algorithm that is based on the preponderance of evidences for a given
hypothesis.
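A sketch of the coarse-graining step is given below: it computes ê_t, μ̂ and the coarse-grained transition matrix of equation (4) for a given partition; the coarse-grained eigenvectors and the distance of equation (6) would then be derived from these quantities, which we omit here. The function name and the representation of a partition as a list of node-index lists are assumptions of the sketch.

import numpy as np

def coarse_grain(A, partition, t):
    # Coarse-grained stationary distribution mu_hat and transition matrix p_hat of equation (4).
    d = A.sum(axis=1).astype(float)
    mu = d / d.sum()
    Pt = np.linalg.matrix_power(A / d[:, None], t)
    N = len(partition)
    e_hat = np.zeros((N, N))
    for k, Sk in enumerate(partition):
        for l, Sl in enumerate(partition):
            e_hat[k, l] = sum(mu[x] * Pt[x, y] for x in Sk for y in Sl)
    mu_hat = np.array([mu[list(Sk)].sum() for Sk in partition])
    p_hat = e_hat / mu_hat[:, None]      # row k of e_hat sums to mu_hat(S_k), so p_hat is stochastic
    return mu_hat, p_hat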

2.2 Modularity Maximization and Its Main Limits


In recent years, a concept of modularity proposed by Newman [7,8] has been
widely used as a measure of goodness for community structure. A good divi-
sion of a network into communities is not merely one in which the number of
edges running between groups is small. Rather, it is one in which the number
of edges between groups is smaller than expected. These considerations lead
to the modularity Q defined by Q = (number of edges within communities) −
(expected number of such edges). It is a function of the particular partition of
the network into groups, with larger values indicating stronger community struc-
ture. Let p_E(x, y) be the expected number of edges between x and y, and then
for a given partition {S_k}_{k=1}^{N}, the modularity can be written as

    Q = (1/(2m_e)) Σ_{k=1}^{N} Σ_{x,y∈S_k} [e(x, y) − p_E(x, y)],   p_E(x, y) = d(x) d(y)/(2m_e),                    (7)

and m_e is the total weight of edges, given by m_e = Σ_{x,y∈S} e(x, y)/2. Some existing
methods are presented to find good partitions of a network into communities
by optimizing the modularity over possible divisions, which has proven highly
effective in practice [7,8,11,12].
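For reference, equation (7) can be evaluated with the short sketch below; the function name and the list-of-node-index-lists representation of a partition are our own conventions.

import numpy as np

def modularity(A, partition):
    # Modularity Q of equation (7) for a (weighted) adjacency matrix A.
    d = A.sum(axis=1).astype(float)
    two_me = d.sum()                     # 2 * m_e
    Q = 0.0
    for Sk in partition:
        idx = np.asarray(Sk)
        Q += A[np.ix_(idx, idx)].sum() - d[idx].sum() ** 2 / two_me
    return Q / two_me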

2.3 The Algorithm


Agglomerative clustering algorithms begin with every observation representing
a singleton cluster. At each of the n − 1 steps the closest two (least dissimilar)
clusters are merged into a single cluster, producing one less cluster at the next
higher level [15]. Here we make use of the process of agglomerative hierarchical
clustering technique for network partition and choose the coarse-grained diffusion
distance as the measure of dissimilarity between communities at each step. The
maximal value of modularity determines the optimal partition of the network.
Given a distance measure between points, the user has many choices for how
to define intergroup similarity in traditional clustering literature [15]. However,
different choices have different benefits and shortages. The advantage of our al-
gorithm is that the proposed measure of coarse-grained diffusion distance over-
comes the weaknesses of the traditional linkage techniques based on node-to-node

Table 1. The parameters for construction of the three sample networks generated from
the Gaussian mixture model

Networks n K µT1 µT2 µT3 µT4 σ dist


1 150 3 (0.0,3.0) (1.5,4.5) (-0.5,5.0) 0.15 0.9
2 300 3 (1.0,4.0) (3.0,6.0) (0.5,6.5) 0.25 0.8
3 320 4 (1.0,5.0) (3.0,5.5) (1.0,7.0) (3.0,7.5) 0.15 0.8

Table 2. The computational results obtained by our method. Here CR1 and CR2 are
the correct rates compared with the original partition {Tk } and those obtained from
k-means algorithm, respectively.

Networks n N t Q CR1 CR2


1 150 3 1 0.6344 0.9867 1
2 300 3 3 0.6547 0.9867 1
3 320 4 3 0.7301 0.9969 1

dissimilarity mentioned above, since it takes into account all the information re-
lating the two clusters. The only parameter in our computation is the time step
t; increasing t corresponds to propagating the local influence of each node
with its neighbors.
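The overall agglomerative scheme can be outlined as in the sketch below; distance_fn and modularity_fn are placeholders for a routine returning the matrix of coarse-grained diffusion distances D̂_t between the current communities and for the modularity of equation (7), respectively. This is an illustrative outline under those assumptions, not the authors' implementation.

import numpy as np

def agglomerate(A, t, distance_fn, modularity_fn):
    # Start from singleton communities, repeatedly merge the closest pair under the
    # coarse-grained diffusion distance, and keep the partition of maximal modularity.
    partition = [[i] for i in range(A.shape[0])]
    best_Q = modularity_fn(A, partition)
    best_partition = [list(p) for p in partition]
    while len(partition) > 1:
        D = distance_fn(A, partition, t)       # N x N matrix of coarse-grained distances
        np.fill_diagonal(D, np.inf)
        k, l = np.unravel_index(np.argmin(D), D.shape)
        partition[k] += partition[l]
        del partition[l]
        Q = modularity_fn(A, partition)
        if Q > best_Q:
            best_Q, best_partition = Q, [list(p) for p in partition]
    return best_partition, best_Q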


Fig. 1. (a)150 sample points generated from the given 3-Gaussian mixture distribu-
tion. The star symbols represent the centers of each Gaussian component. The circle,
square and diamond shaped symbols represent the position of sample points in each
component, respectively. (b)The network generated from the sample points in Figure
1(a) with the parameter dist = 0.9.

3 Experimental Results
As mentioned in Section 1, we generate n sample points {x_i} in two dimensional
Euclidean space subject to a K-Gaussian mixture distribution Σ_{k=1}^{K} q_k G(μ_k, Σ_k).
Here we pick nodes n(k − 1)/K + 1 : nk/K in group T_k for simplicity, and with
this choice, approximately q_k = 1/K, k = 1, · · · , K. The covariance matrices
are set in diagonal form Σ_k = σI. The other parameters for construction of the
three sample networks generated from the Gaussian mixture model are listed in
Table 1. The computational results obtained by our method are shown in Table 2.
Here CR1 and CR2 are the correct rates compared with the original partition
{T_k} and those obtained from the k-means algorithm, respectively. We can see that
the number of communities is in accordance with the number of components in
the corresponding Gaussian mixture models and the two kinds of correct rates

[Figure 2 appears here: panel (a) plots the modularity against the number of communities for t = 1, 1000, 1200, 1500 and 2000, with maximum Q = 0.6344; panels (b) and (c) are described in the caption below.]

Fig. 2. The computational results for the sample network with 150 nodes detected by
our method. (a)The modularity changing with number of communities in each iteration
for different time parameter t. (b)The community structure identified by setting t = 1,
corresponds to 3 communities represented by the colors. (c)The dendrogram of hierar-
chical structures and the optimal partition with a maximal modularity Q = 0.6344 is
denoted by a vertical dash line.

[Figure 3 appears here: panel (b) plots the modularity against the number of communities for t = 1, 2, 3, 10 and 20; panels (a) and (c) are described in the caption below.]

Fig. 3. (a)300 sample points generated from the given 3-Gaussian mixture distribution.
(b)The modularity changing with number of communities in each iteration for different
time parameter t. (c)The community structure identified by setting t = 3, corresponds
to 3 communities represented by the colors.

[Figure 4 appears here: panel (b) plots the modularity against the number of communities for t = 1, 2, 3, 10 and 100; panels (a) and (c) are described in the caption below.]

Fig. 4. (a)320 sample points generated from the given 4-Gaussian mixture distribution.
(b)The modularity changing with number of communities in each iteration for different
time parameter t. (c)The community structure identified by setting t = 3, corresponds
to 4 communities represented by the colors.

indicate that our method can produce accurate results when the time parameter
t is properly chosen. The visualization of the partitioning result and the dendro-
gram of hierarchical structures are shown in Figure 2. The same goes for the other
two sample networks and the clustering results can be seen in Figure 3 and Figure
4, respectively.

4 Conclusions
In this paper, we use the coarse-grained-diffusion-distance based agglomerative
algorithm to uncover the community structure exhibited by sample networks
generated from Gaussian mixture model. The present algorithm can identify the
community structure in a high degree of efficiency and accuracy. An appropri-
ate number of communities can be automatically determined without any prior

knowledge about the community structure. The computational results on three


artificial networks generated from Gaussian mixture model confirm the capabil-
ity of the algorithm.

Acknowledgements
This work is supported by the Project of the Social Science Foundation of Beijing
University of Posts and Telecommunications under Grant 2010BS06.

References
1. Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Rev. Mod.
Phys. 74(1), 47–97 (2002)
2. Newman, M.: The structure and function of complex networks. SIAM Review 45(2),
167–256 (2003)
3. Newman, M., Barabási, A.L., Watts, D.J.: The structure and dynamics of networks.
Princeton University Press, Princeton (2005)
4. Barabási, A., Jeong, H., Neda, Z., Ravasz, E., Schubert, A., Vicsek, T.: Evolution
of the social network of scientific collaborations. Physica A 311, 590–614 (2002)
5. Ravasz, E., Somera, A., Mongru, D., Oltvai, Z., Barabási, A.: Hierarchical organi-
zation of modularity in metabolic networks. Science 297(5586), 1551–1555 (2002)
6. Girvan, M., Newman, M.: Community structure in social and biological networks.
Proc. Natl. Acad. Sci. USA 99(12), 7821–7826 (2002)
7. Newman, M., Girvan, M.: Finding and evaluating community structure in net-
works. Phys. Rev. E 69(2), 026113 (2004)
8. Newman, M.: Modularity and community structure in networks. Proc. Natl. Acad.
Sci. USA 103(23), 8577–8582 (2006)
9. E, W., Li, T., Vanden-Eijnden, E.: Optimal partition and effective dynamics of
complex networks. Proc. Natl. Acad. Sci. USA 105(23), 7907–7912 (2008)
10. Li, T., Liu, J., E, W.: Probabilistic Framework for Network Partition. Phys. Rev.
E 80, 26106 (2009)
11. Liu, J., Liu, T.: Detecting community structure in complex networks using simu-
lated annealing with k-means algorithms. Physica A 389(11), 2300–2309 (2010)
12. Liu, J., Liu, T.: Coarse-grained diffusion distance for community structure detec-
tion in complex networks. J. Stat. Mech. 12, P12030 (2010)
13. Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010)
14. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern
Anal. Mach. Intel. 22(8), 888–905 (2000)
15. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Springer, New York (2001)
16. Penrose, M.: Random Geometric Graphs. Oxford University Press, Oxford (2003)
Efficient Reduction of the Number of Associations Rules
Using Fuzzy Clustering on the Data

Amel Grissa Touzi, Aicha Thabet, and Minyar Sassi

Université de Tunis El Manar


Ecole Nationale d’Ingénieurs de Tunis
Bp. 37, Le Belvédère 1002 Tunis, Tunisia
{amel.touzi,minyar.sassi}@enit.rnu.tn,
{[email protected]}

Abstract. In this paper, we are interested in knowledge discovery methods. The major
drawbacks of these methods are: i) the generation of a large number of association
rules that are not easily assimilated by the human brain, and ii) the memory space and
execution time necessary for the management of their data structures. To cure this
problem, we propose to build rules (meta-rules) between groups (or clusters) resulting
from a preliminary fuzzy clustering of the data. We prove that we can easily deduce
knowledge about the initial data set if more detail is wanted. This solution considerably
reduces the number of generated rules, offers a better interpretation of the data and
optimizes both the memory space and the execution time. The approach is extensible;
the user is able to choose the fuzzy clustering or rule extraction algorithm according
to the domain of the data and his needs.

1 Introduction
Nowadays, we notice a growing interest in Knowledge Discovery in Databases
(KDD) methods. One of the important reasons for this is the increasing volume of
data accumulated by organizations, which remains largely under-exploited. Several
solutions have been proposed, based on neural networks, trees, concept lattices,
association rules, etc. [1].
Several algorithms for mining association rules have been proposed in the literature.
The existing generation methods are combinatorial and generate a large number of
rules (even when starting from sets of reasonable size) that are not easily exploitable
[2], [3]. Several approaches for reducing this large number of rules have been proposed,
such as the use of quality measures, syntactic filtering by constraints, and compression
by representative or Generic Bases [4]. These bases constitute reduced sets of rules
that preserve the most relevant rules without any loss of information. In our opinion,
the large number of generated rules is due to the fact that these approaches try to
determine rules starting from the enormous data set.
In this paper, we propose to extract knowledge by introducing another degree of
granularity into the process of knowledge extraction. We propose to define rules
(meta-rules) between classes resulting from a preliminary fuzzy clustering of the
data. We call the knowledge extracted in this way "Meta-Knowledge". Indeed, while
classifying data, we construct homogeneous groups of data having the same properties, so

defining rules between clusters implies that all the data elements belonging to those
clusters will necessarily satisfy these same rules. Thus, the number of generated rules
is smaller, since the knowledge extraction is performed on the clusters, whose number
is relatively low compared to the number of initial data elements. We prove that we
can easily deduce knowledge about the initial data set if more detail is wanted.
The rest of the paper is organized as follows: Section 2 presents the basic concepts
of discovering association rules. Section 3 presents the problems and limits of the
existing knowledge discovery approaches. Section 4 defines the theoretical foundation
of the approach. Section 5 contains the principles of the new approach that we propose.
Section 6 enumerates the advantages of the proposed approach. Section 7 validates the
proposed approach and gives an example of experimentation. We finish this paper with
a conclusion and a presentation of some future works.

2 Basic Concepts
In this section, we present the basic concepts of discovering association rules.

2.1 Discovering Association Rules

Association rule mining was developed in order to analyze basket data in a marketing
environment. The input data are composed of transactions: each transaction consists
of the items purchased by a consumer during a single visit. The output data are
composed of rules. An example of an association rule is "90% of transactions that
involve the purchase of bread and butter also include milk" [5]. Even if this method
was introduced in the context of Market Business Analysis, it has many applications in
other fields, like web mining or text mining. It can also be used to search for frequent
co-occurrences in any large data set.
The first efficient algorithm to mine association rules is APriori [6]. Other algorithms
were proposed to decrease the number of reads of the database and to improve
computational efficiency. Among them, we mention CLOSED [7], CHARM [8],
TITANIC [9],[10], GENALL [11] and PRINCE [12].
Several varieties of lattice have been introduced with these algorithms, like Iceberg
Concept Lattices [10], where the nodes are frequent closed itemsets ordered by the
inclusion relation, and the Minimal Generators Lattice [12], where the nodes are the
minimal generators (called key itemsets) ordered by the inclusion relation. In these
cases the FCA is not constructed on the data but on the found itemsets. For more
detail the reader can see [12].

2.2 Classification and Knowledge Extraction

The only notable work that used data classification as a prior step to the generation of
association rules, applied in industry, is the one of Plasse et al. [13]. The proposed
technique was to carry out a preliminary classification of the variables in order to
obtain homogeneous groups of attributes, and then to seek the association rules inside
each one of these groups. They obtained smaller and more homogeneous groups of
variables. Besides, the rules obtained are fewer and simpler.

To validate their approach, they carried out a search for association rules considering
several partitions obtained from either proc VARCLUS or proc CLUSTER from SAS.
The various tests performed showed that searching for rules inside classes allows their
number to be decreased. We notice that even with this solution the number of rules is
not negligible; the user finds it difficult to assimilate such a large number.

3 Problems and Motivations


Several algorithms build a decision tree, the FCA lattice or one of its extensions to
extract the association rules. In this case, researchers always focus on giving an
optimum set of rules that faithfully models the starting data set, after a data cleansing
step and the elimination of invalid-value elements.
From our point of view, the limit of these approaches is that they extract the set of rules
starting from the data or from a data variety like the frequent itemsets or the frequent
closed itemsets, which may be huge. Thus we note the following limits: 1) these
approaches require a large memory space and a long execution time for data modeling
because of the data structures required by these algorithms, such as trees, graphs or
lattices; 2) the rules generated from these data are generally redundant; 3) these
algorithms generate a very large number of rules, often thousands, that the human
brain cannot assimilate; 4) some previous works demonstrated that the behaviour of
these association rule extraction algorithms varies strongly according to the features
of the data set used [14]; the number of generated association rules in general varies
from several tens of thousands to several millions [3], [15], and the execution times
obtained vary strongly according to the algorithm used [16]; 5) the rules generated by
these algorithms take into account neither the data semantics nor the importance of one
attribute relative to another in the data description, according to the specific domain of
the data set; and 6) generally, the goal of extracting a set of rules is to help the user give
semantics to the data and to optimize information retrieval; this fundamental constraint
is not taken into account by these approaches.
To cure all these problems, we propose a new approach for knowledge extraction
using a preliminary fuzzy clustering of the data. We start by presenting the theoretical
foundations of the proposed approach.

4 Theoretical Foundation of the Knowledge Extraction


In this part, we present the theoretical foundations of the proposed approach, based on
the following properties and theorems. We start with an example.
Example
Consider a set of students S1, S2, S3, S4, S5, S6, S7, S8, S9 and S10, characterized
according to their preferences for the following modules: Databases (DB), Programming
Languages (PL), NeTworks (NT), LItterature (LI) and Another Topic (AT).
Table 1 puts in correspondence each student and his marks in these different modules.
The fuzzy clustering operation (FCM algorithm) applied to this example generates 3
fuzzy partitions. The result of the fuzzy clustering algorithm is the membership matrix
described in Table 2.

We define the cut, denoted α-Coupe, on the fuzzy context as the inverse of the number
of clusters obtained.
We can consider two possible strategies for the application of the α-Coupe: a binary
strategy (resp. a fuzzy strategy), which defines a binary membership (resp. a fuzzy
membership) of the objects to the different clusters. We propose to start from the fuzzy
formal context, apply the α-Coupe to the set of membership degrees, replace these
degrees by the values 1 and 0, and deduce the binary reduced formal context.

Table 1. Example of students' marks

      DB  PL  NT  LI  AT
S1    15  14  12  14  10
S2    14  15   9   8  10
S3    16  13  12  12   7
S4     7  10  14  12   8
S5    11   5  18  15  14
S6    12  11  10  10  10
S7    17   6  14  15  14
S8     9  10  12  11  10
S9     5   6  10   6  10
S10   13   7  12  14  13

Table 2. Result of clustering

       C1     C2     C3
S1    0.092  0.804  0.104
S2    0.091  0.708  0.201
S3    0.041  0.899  0.060
S4    0.071  0.100  0.829
S5    0.823  0.070  0.107
S6    0.090  0.548  0.362
S7    0.810  0.108  0.082
S8    0.036  0.066  0.898
S9    0.157  0.179  0.664
S10   0.231  0.388  0.381

In our example, α-Coupe = 1/3. Table 3 presents the binary reduced formal context
after application of the α-Coupe to the fuzzy formal context presented in Table 2.
Table 4 represents the fuzzy reduced formal context after application of the α-Coupe
to the same fuzzy formal context.
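A minimal sketch of this binarization step is given below; the function name is our own, and the default cut 1/(number of clusters) follows the definition of the α-Coupe given above.

def binary_reduced_context(membership, alpha=None):
    # Turn a fuzzy membership matrix (objects x clusters) into the binary reduced
    # formal context: 1 if the membership degree reaches the alpha-cut, else 0.
    n_clusters = len(next(iter(membership.values())))
    if alpha is None:
        alpha = 1.0 / n_clusters
    return {obj: [1 if m >= alpha else 0 for m in degrees]
            for obj, degrees in membership.items()}

# Membership degrees of Table 2 (only rows S1 and S6 shown).
memberships = {"S1": [0.092, 0.804, 0.104], "S6": [0.090, 0.548, 0.362]}
print(binary_reduced_context(memberships))
# {'S1': [0, 1, 0], 'S6': [0, 1, 1]}  -- matches Table 3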
Generally, we can consider that the attributes of a formal concept, better known as
the concept intension, are the description of the concept. Thus, the relationship
between an object and the concept should be the intersection of the relationships
between the object and the attributes of the concept. Since each relationship between
an object and an attribute is represented as a set of membership values in the fuzzy
formal context, the intersection of these membership values should be their minimum,
according to fuzzy theory. Thus, we define the fuzzy formal concept from the fuzzy
formal context.

Properties
− The number of clusters generated by a clustering algorithm is always lower than
the number of starting objects to which one applies the clustering algorithm
− All objects belonging to one same cluster have the same characteristics. These
characteristics can be deduced easily knowing the center and the distance from the
cluster.

Table 3. Reduced binary formal context

      C1  C2  C3
S1     0   1   0
S2     0   1   0
S3     0   1   0
S4     0   0   1
S5     1   0   0
S6     0   1   1
S7     1   0   0
S8     0   0   1
S9     0   0   1
S10    0   1   1

Table 4. Reduced fuzzy formal context

       C1     C2     C3
S1      -    0.804    -
S2      -    0.708    -
S3      -    0.899    -
S4      -      -    0.829
S5    0.823    -      -
S6      -    0.548  0.362
S7    0.810    -      -
S8      -      -    0.898
S9      -      -    0.664
S10     -    0.388  0.381

Notation
Let C1 and C2 be two clusters generated by a fuzzy clustering algorithm.
The rule C1 ⇒ C2 with a coefficient CR will be noted C1 ⇒ C2 (CR).
If the coefficient CR is equal to 1, then the rule is called an exact rule.

Theorem 1
Let C1, C2 be two clusters generated by a fuzzy clustering algorithm and verifying
the properties p1 and p2 respectively. Then the following properties are equivalent:
C1 ⇒ C2 (CR) ⇔
- ∀ object O1 ∈ C1 ⇒ O1 ∈ C2 (CR)
- ∀ object O1 ∈ C1, O1 satisfies the property p1 of C1 and the property p2 of C2 (CR)

Theorem 2
Let C1, C2 and C3 be three clusters generated by a fuzzy clustering algorithm and
verifying the properties p1, p2 and p3 respectively. Then the following properties are
equivalent:
C1 and C2 ⇒ C3 (CR)
- ∀ object O1 ∈ C1 ∩ C2 ⇒ O1 ∈ C3 (CR)
- ∀ object O1 ∈ C1 ∩ C2, O1 satisfies the properties p1, p2 and p3 with (CR)

The proof of the two theorems follows from the fact that all objects which belong to
the same cluster necessarily satisfy the same property as their cluster.

Classification of data
We regroup in one class the customers that satisfy the same property (only one property).
Using this type of fuzzy clustering, we have the following properties:
- The number of clusters in this case will be equal to the number of attributes.
- Class i will contain all the objects which satisfy one and the same property.
For example, in basket data in a marketing environment, we regroup all the customers
who bought the same product x.


Fig. 1. Example of overlapping between two clusters

From this matrix, we can generate rules giving associations between the different
clusters. Figure 1 shows an example of a classification result with an overlap between
the two clusters C1 and C2. We notice that the intersection of the two clusters gives
the customers who bought both bread and chocolate.

Generation of the knowledge from the Meta knowledge

Definition
Each class generated by a fuzzy clustering algorithm will be modeled by a predicate
having two arguments: the first is the object; the second carries the name of the
corresponding clustering criterion.
Example
The cluster C1 will be modeled by the predicate buys(x, Bread).
In this case, we deduce that if we have the rule C1 ⇒ C2 (CR), it will be transformed
into the following rule: buys(x, Bread) ⇒ buys(x, Chocolate) (CR).
We can simplify this notation by only specifying:
Bread ⇒ Chocolate (CR)
Thus we can generate the different rules on the data from the rules generated on the
clusters.

5 New Approach for the Knowledge Discovery


The principle of our approach is based on the following ideas:
− By applying a fuzzy clustering algorithm to the elements of a data source to
divide them into clusters, we obtain the membership matrix giving the degree of
membership of each of these elements to each cluster,
− We consider the obtained matrix as a formal context where objects are elements
to classify and attributes are clusters. Then, we deduce the reduced binary formal
context of the obtained matrix.
− From this formal context, we apply an algorithm of generation of association
rules on clusters to generate the set of Meta knowledge.
− We deduce the set of knowledge from this Meta knowledge
This principle for the extraction of knowledge from the data proceeds in two phases:
1. Phase of clustering, to organize the data into groups, using the algorithms
of fuzzy clustering,
2. Phase of extraction of knowledge, using any algorithm of generation of
association rules.
This process can be summarized in the following steps:

Begin
Step 1: Introduce a data set (any type of data)
Step 2: Apply a fuzzy clustering algorithm to organize the data into the different
groups (or clusters).
Step 3: Determine the fuzzy formal context (Object/Cluster) of the matrix obtained
in the step 2
Step 4: Deduce the reduced binary formal context of the matrix obtained in the
step 3
Step 5: Apply an algorithm of generation of association rules on clusters to generate
the set of Meta knowledge in the form of association rules between clusters
Step 6: Generate knowledge of the data set in the form of association rules
End
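As a rough illustration of Step 5, the sketch below derives meta-rules Ci ⇒ Cj from the binary reduced formal context using plain support/confidence measures; this is not the PRINCE algorithm used in Section 7, and the confidence threshold is illustrative only.

from itertools import permutations

def cluster_rules(binary_context, n_clusters, minconf=0.4):
    # Derive meta-rules Ci => Cj (CR), where CR is the confidence of the rule.
    rules = []
    for i, j in permutations(range(n_clusters), 2):
        support_i = sum(row[i] for row in binary_context.values())
        both = sum(row[i] and row[j] for row in binary_context.values())
        if support_i and both / support_i >= minconf:
            rules.append((f"C{i+1}", f"C{j+1}", round(both / support_i, 2)))
    return rules

# Binary reduced formal context of Table 3 (S1..S10).
context = {"S1": [0, 1, 0], "S2": [0, 1, 0], "S3": [0, 1, 0], "S4": [0, 0, 1],
           "S5": [1, 0, 0], "S6": [0, 1, 1], "S7": [1, 0, 0], "S8": [0, 0, 1],
           "S9": [0, 0, 1], "S10": [0, 1, 1]}
print(cluster_rules(context, 3))
# [('C2', 'C3', 0.4), ('C3', 'C2', 0.4)]  -- both meta-rules hold with CR = 0.4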

Fig. 2. Process of knowledge extraction

6 Advantages of the New Approach


The proposed approach offers several advantages:
1) Extensibility of the proposed approach: 1) Our approach can be applied with any
fuzzy clustering algorithm to classify the initial data. 2) The generation stage can
be applied with any rule generation algorithm. In the literature, studies have shown
that one algorithm may be better than another depending on the domain of the data
used; this means that we can apply the method that is optimal for the domain of the
data set. 3) We can generate the maximum of knowledge on our initial data set simply
by modifying the choice of the clustering criteria. These criteria can be chosen by the
user as input parameters according to the importance of the attributes in the
application domain. 4) We can classify our data according to different criteria and
obtain different clusters, which generate different sets of Meta knowledge.
2) The definition of the Meta knowledge concept: This definition is in our opinion
very important, since the number of rules generated is smaller. Besides, the concept of
Meta knowledge is very important for obtaining a global view of a data set that is very
voluminous. It models a certain abstraction of the data, which is fundamental in the
case of an enormous amount of data.
In this case, we define the set of association rules between the classes, from which the
association rules between the data can be generated automatically if more details are
wanted.

7 Validation of the Proposed Approach


To validate the proposed approach, we chose: 1) the FCM algorithm for the fuzzy
clustering of the data set, and 2) the PRINCE algorithm, presented in [12], which
permits extraction of the generic bases. This algorithm is based on the FCA. It takes
as input an extraction context K, the minimum threshold of support minsup and the
minimum threshold of confidence minconf. It outputs the list of the frequent closed
itemsets and their associated minimal generators, as well as generic bases of
association rules. Thus, PRINCE operates in three successive steps: (i) minimal
generator determination, (ii) partial order construction, (iii) extraction of generic rule
bases [12]. A free version of this algorithm exists on the internet1.
The choice of the FCA method and Generic Bases is justified by the fact that these
bases constitute reduced sets of informative and generic rules, permitting the most
relevant rules to be preserved without loss of information. Thus, the major problems
of the other methods, such as memory capacity and execution time, become much less
severe, because we perform the knowledge extraction on clusters whose number is
negligible compared to the initial data.
The results we obtained are encouraging. We are now testing our approach on different
types of data. A platform for fuzzy clustering and knowledge extraction from data has
been implemented in our laboratory. It offers in particular the possibility to: 1) compare
the rules generated by traditional methods with those generated by our new approach,
following a fuzzy clustering operation on the data; 2) visualize the different generated
lattices; and 3) model the different generated clusters. This platform is extensible; it
offers different fuzzy clustering algorithms such as C-means, FCM, etc., and different
association rule generation algorithms such as PRINCE, CLOSE, etc.

8 Conclusion
In this paper, we presented a new approach that permits knowledge to be extracted on
the basis of a preliminary fuzzy clustering of the data. Generally, all the methods in this
field are applied to the data (or to a data variety), which is huge. Consequently, they
generate a large number of association rules that are not easily assimilated by the human
brain, and the memory space and execution time necessary for the management of these
lattices are considerable.
To resolve this problem, we propose to build rules (meta-rules) between groups (or
clusters) resulting from a preliminary fuzzy clustering of the data. This approach is
based on the following main idea: while classifying data, we construct homogeneous
groups or clusters of data, each having the same properties. Consequently, defining
rules between clusters implies that all the data belonging to those clusters will
necessarily satisfy these same generated rules.
To validate this approach, we have chosen the FCM (Fuzzy C-Means) algorithm,
which performs a fuzzy clustering to generate the clusters. We have chosen the
PRINCE algorithm, which permits extraction of the Generic Bases modeling the Meta
Knowledge from the initial data, from which we deduce the knowledge of the data set.
We have implemented a fuzzy clustering and data knowledge extraction platform. It is
extensible; it offers different fuzzy clustering algorithms and different association rule
generation algorithms.

1 www.cck.rnu.tn/sbenyahia/software_release.htm

In the future, we plan to encode the obtained rules in an expert system and to give the
user the possibility of interacting with this system to satisfy his needs.

References
1. Goebel, M., Gruenwald, L.: A Survey of Data Mining and Knowledge Discovery Software
Tools. SIGKDD, ACM SIGKDD 1(1), 20–33 (1999)
2. Zaki, M.: Mining Non-Redundant Association Rules. In: Data Mining and Knowledge Dis-
covery, vol. (9), pp. 223–248 (2004)
3. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Intelligent structuring and re-
ducing of association rules with formal concept analysis. In: Baader, F., Brewka, G., Eiter,
T. (eds.) KI 2001. LNCS (LNAI), vol. 2174, pp. 335–350. Springer, Heidelberg (2001)
4. Pasquier, N.: Data Mining: Algorithmes d’Extraction et de Réduction des Règles
d’Association dans les Bases de Données. Thèse, Département d’Informatique et Statis-
tique, Faculté des Sciences Economiques et de Gestion, Lyon (2000)
5. Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between sets of items in
large Databases. In: Proceedings of the ACM SIGMOD Intl. Conference on Management
of Data, Washington, USA, pp. 207–216 (June 1993)
6. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the
20th Int’l Conference on Very Large Databases, pp. 478–499 (June 1994)
7. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient Mining of Association Rules Us-
ing Closed Itemset Lattices. Information Systems Journal 24(1), 25–46 (1999)
8. Zaki, M.J., Hsiao, C.J.: CHARM: An Efficient Algorithm for Closed Itemset Mining. In:
Proceedings of the 2nd SIAM International Conference on Data Mining, Arlington, pp. 34–
43 (April 2002)
9. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Fast Computation of Concept
Lattices Using Data Mining Techniques. In: Bouzeghoub, M., Klusch, M., Nutt, W., Sattler,
U. (eds.) Proceedings of 7th Intl. Workshop on Knowledge Representation Meets Data-
bases (KRDB 2000), Berlin, Germany, pp. 129–139 (2000)
10. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing Iceberg Concept
Lattices with TITANIC. J. on Knowledge and Data Engineering (KDE) 2(42), 189–222
(2002)
11. Ben Tekaya, S., Ben Yahia, S., Slimani, Y.: Algorithme de construction d'un treillis des
concepts formels et de détermination des générateurs minimaux. ARIMA Journal, 171–193
(Novembre 2005); Numéro spécial CARI 2004
12. Hamrouni, T., Ben Yahia, S., Slimani, Y.: Prince: Extraction optimisée des bases généri-
ques de règles sans calcul de fermetures. In: Proceedings of the Intl. INFORSID Confer-
ence, Editions Inforsid, Grenoble, France, May 24-27, pp. 353–368 (2005)
13. Plasse, M., Niang, N., Saporta, G., Villeminot, A., Leblond, L.: Combined use of associa-
tion rules mining and clustering methods to find relevant links between binary rare attrib-
utes in a large data set. Computational Statistics & Data Analysis 52(1), 596–613 (2007)
14. Pasquier, N., Bastide, Y., Touil, R., Lakhal, L.: Pruning closed itemset lattices for associa-
tion rules. In: Proceedings of 14th International Conference Bases de Données Avancées,
Hammamet, Tunisia, October 26-30, pp. 177–196 (1998)
15. Zaki, M.J.: Generating Non-Redundant Association Rules. In: Proceedings of the 6th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston,
MA, pp. 34–43 (August 2000)
16. Bastide, Y., Taouil, R., Pasquier, N., Stumme, G., Lakhal, L.: Mining frequent patterns with
counting inference. SIGKDD Explorations 2(2), 66–75 (2000)
A Localization Algorithm in Wireless Sensor Networks
Based on PSO

Hui Li, Shengwu Xiong, Yi Liu, Jialiang Kou, and Pengfei Duan

School of Computer Science and Technology,


Wuhan University of Technology, Wuhan 430070, China

Abstract. Node localization is a fundamental and important technology in wireless
sensor networks. In this paper, a localization algorithm for wireless sensor networks
based on PSO is proposed. Unlike most existing localization algorithms, the proposed
algorithm first determines the rectangular estimation range of each unknown node with
the bounding box algorithm and takes a random value inside it as the estimated
coordinates of that node; the estimate is then optimized by PSO, yielding a more precise
location of the unknown node. Simulation results show that the optimized algorithm
outperforms the traditional bounding box algorithm in positioning accuracy and
localization error.

Keywords: Node localization, wireless sensor networks, PSO, bounding box, localization error.

1 Introduction

Wireless sensor networks integrate several disciplines, including microelectronics,
wireless communication and wireless networking, and are widely used in industrial
control, military and medical assistance applications [1-3]. In most applications,
determining the physical positions of the sensor nodes is a basic requirement.
Localization algorithms fall into two categories: range-based and range-free [4, 5].
Because of the hardware limitations of WSN devices, range-free localization is pursued
as a cost-effective alternative to the more expensive range-based approaches. In this
paper, we first describe particle swarm optimization [6], then propose a localization
algorithm for wireless sensor networks based on PSO; finally, analysis and simulation
show that the proposed algorithm improves the accuracy of the estimated locations.

2 A Localization Algorithm in WSNs Based on PSO


In this section, a range-based localization algorithm for wireless sensor networks based
on PSO is described. A bounding-box procedure is used to calculate a rectangular
estimation area for each unknown node; coordinates inside this area are then generated
randomly, regarded as the initial population and optimized by PSO, which yields more
precise coordinates for the unknown nodes.


2.1 Communication Model

In this paper, RSS (Received Signal Strength) [7] is used to measure the distance between
two nodes. In this model the radio range is a circle, and the distance between nodes is
measured according to the attenuation of the signal broadcast in the medium. The
mathematical model of the wireless communication channel is as follows:

PL(d) = PL(d_0) − 10 n lg(d / d_0) − X_σ    (1)
where d denotes the distance between transmitter and receiver; d_0 denotes the reference
distance; n denotes the channel attenuation index, whose value typically ranges from 2 to 4;
X_σ denotes a Gaussian random noise variable; PL(d_0) denotes the signal strength
measured at distance d_0 from the transmitter, obtained from experience or from the
hardware specification; and PL(d) denotes the signal strength measured at distance d.
From this formula, the distance d is calculated from the measured signal strength PL(d).

2.2 Assumptions

For convenience of analysis, the following assumptions are made:


(1) Each sensor node has a unique ID, sensor nodes are deployed irregularly;
(2) Signal transmission model of sensor nodes is the ideal space of sphere;
(3) All ordinary sensor nodes are homogeneous and the power and computing ca-
pacity are limited;
(4) Anchor nodes can get two-dimensional coordinates information by means of
GPS [8, 9] devices or other devices;
(5) All sensor nodes and the anchor nodes are time-synchronized.

2.3 Algorithm Description

In general, most localization algorithms adopt a communication model in which the
communication region of a node in two-dimensional space is a circle. The bounding box
algorithm, however, uses squares instead of circles to bound the possible positions of a
node; an example is depicted in Fig. 1.
For each anchor node i, a bounding box is defined as a square centered at the position
(x_i, y_i) of this node, with sides of length 2h (the square that bounds the communication
circle mentioned above) and with corner coordinates (x_i − h, y_i − h), (x_i − h, y_i + h),
(x_i + h, y_i − h), (x_i + h, y_i + h). The intersection of all bounding boxes can easily be
computed, without any need for floating point operations, by taking the maximum of the
low coordinates and the minimum of the high coordinates of all bounding boxes, as
expressed by formula (2) and indicated by the shaded rectangle in Fig. 1.

x_left = max(x_i − h);  x_right = min(x_i + h);
y_floor = max(y_i − h);  y_ceiling = min(y_i + h)    (2)

Fig. 1. Sketch map of bounding box algorithm

So the rectangular estimation range of an unknown node is given by formula (3):

EstimateScope_i = [x_left, x_right, y_floor, y_ceiling];  i = 1, 2, ..., N    (3)

Within this EstimateScope, x_random is drawn between x_left and x_right and y_random
between y_floor and y_ceiling, which gives

EstimateCoordinate_i = [x_random, y_random];  i = 1, 2, ..., N    (4)

Therefore, each unknown node has an EstimateCoordinate value. If there are N unknown
nodes, one random draw produces a group of N EstimateCoordinate values; repeating the
draw M times produces M groups. Each group is regarded as a particle, and the M
particles are optimized by PSO to obtain more precise coordinate values for the unknown
nodes. Whatever the final error of this algorithm, computing the rectangular scopes uses
fewer processor resources than computing the intersection of circles.
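To make the whole procedure concrete, the following Python sketch (our own illustration, not the authors' code; the anchor positions, measured ranges and PSO coefficients are assumed values) computes the intersection of the anchors' bounding boxes with formula (2), draws a random initial estimate inside it for each particle, and refines the estimate with a basic PSO whose fitness is the mismatch between measured and estimated anchor distances:

import math, random

def bounding_box(anchors, h):
    # Intersection of the anchors' square bounding boxes, formula (2).
    x_left  = max(x - h for x, y in anchors); x_right   = min(x + h for x, y in anchors)
    y_floor = max(y - h for x, y in anchors); y_ceiling = min(y + h for x, y in anchors)
    return x_left, x_right, y_floor, y_ceiling

def fitness(pos, anchors, dists):
    # Sum of squared differences between estimated and measured anchor distances.
    return sum((math.dist(pos, a) - d) ** 2 for a, d in zip(anchors, dists))

def pso_localize(anchors, dists, h, particles=20, iters=100):
    x1, x2, y1, y2 = bounding_box(anchors, h)
    swarm = [[random.uniform(x1, x2), random.uniform(y1, y2)] for _ in range(particles)]
    vel = [[0.0, 0.0] for _ in range(particles)]
    pbest = [p[:] for p in swarm]
    gbest = min(pbest, key=lambda p: fitness(p, anchors, dists))[:]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed PSO coefficients
    for _ in range(iters):
        for i, p in enumerate(swarm):
            for d in (0, 1):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - p[d])
                             + c2 * random.random() * (gbest[d] - p[d]))
                p[d] += vel[i][d]
            if fitness(p, anchors, dists) < fitness(pbest[i], anchors, dists):
                pbest[i] = p[:]
                if fitness(p, anchors, dists) < fitness(gbest, anchors, dists):
                    gbest = p[:]
    return gbest

# Assumed toy scenario: three anchors hear an unknown node located at (45, 55).
anchors = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
dists = [math.dist((45.0, 55.0), a) for a in anchors]
print(pso_localize(anchors, dists, h=100.0))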

3 Simulation and Analysis

3.1 Simulation Environment and Parameters

In this set of experiments, we deployed 300 sensor nodes, including anchor nodes and
unknown nodes, randomly distributed in a two-dimensional rectangular area of
1000 × 1000 m2 (shown in Figure 2). The transmission range is fixed at 200 m for both
unknown nodes and anchor nodes, and the channel attenuation index n is set to 4 in the
simulation of the positioning algorithm. Figure 3 shows the connectivity relation of the
sensor nodes, which gives an average connectivity of 31.8933 and an average number of
adjacent anchor nodes of 6.5533. In the graph, points represent nodes and edges represent
connections between neighbors that can hear each other.

Fig. 2. Node distribution

Fig. 3. Connectivity relation

3.2 Performance Comparison

In this section, the traditional bounding box algorithm and the optimized bounding box
localization algorithm are simulated in the same parameter environment (original node
coordinates, ratio of anchor nodes, node density and communication radius), and their
positioning accuracy and localization error are compared.
Figure 4 shows the localization error of the different nodes before and after optimization.
All of the unknown nodes have first been estimated by the bounding box algorithm.

Fig. 4. Localization error of different nodes

Fig. 5. Comparison between non-optimization and optimization

The green bars represent the localization error of the different nodes before optimization
and the blue bars the localization error after optimization. Figure 4 shows clearly that the
localization error of almost all nodes decreases after optimization. This is because in the
plain bounding box algorithm the final position of an unknown node is computed as the
center of the intersection of all rectangular estimation ranges, whereas optimizing the
estimated coordinates with PSO yields more precise coordinate values for the unknown nodes.

Fig. 6. Original position and optimized position of Unknown nodes

Figure 5 compares the average location accuracy of the traditional bounding box
algorithm with that of the optimized algorithm and shows that the latter is preferable to
the former.
Figure 6 shows the original and optimized positions of the unknown nodes. The circles
represent the true locations of the nodes, and the squares represent the estimated locations
after optimization by PSO. The longer the connecting line, the larger the error. The graph
shows that the estimated locations are closer to the true locations after optimization.

4 Conclusion

Localization is an important issue for WSNs. To reduce the localization error and
improve the accuracy of the estimated locations, a localization algorithm for WSNs based
on PSO is proposed.
In this paper, after the rectangular estimation range of an unknown node is calculated by
the bounding box algorithm, the final position of the node is not taken as the center of the
intersection of all rectangular estimation ranges; instead, a position is drawn randomly
within the rectangular estimation range and then optimized by PSO. Analysis indicates
that this scheme requires only a small amount of computation, and simulation results
show that the optimized algorithm is superior to the traditional bounding box algorithm
in positioning accuracy and localization error.

Acknowledgements. This work was supported in part by the National Science


Foundation of China under grant no. 40971233.

References

1. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A Survey on Sensor Networks.
IEEE Commun. Magn. 40, 102–114 (2002)
2. Tubaishat, M., Madria, S.: Sensor Networks: An Overview. IEEE Potentials 22(2), 20–23
(2003)
3. Basagni, S., Carosi, A., Melachrinoudis, E., Petrioli, C., Wang, Z.M.: Protocols and model
for sink mobility in wireless sensor networks. ACM SIGMOBILE Mobile Computing and
Communications Review 10, 28–30 (2006)
4. He, T., Huang, C., Blum, B.M., Stankovic, J.A., Abdelzher, T.: Range-Free Localization
Schemes for Large Scale Sensor Networks. In: 9th Annual International Conference on
Mobile Computing and Networking, pp. 81–95. IEEE Press, San Diego (2003)
5. You, Z., Meng, M.Q.-H., Liang, H., et al.: A Localization Algorithm in Wireless Sensor
Networks Using a Mobile Beacon Node. In: International Conference on Information
Acquisition, pp. 420–426. IEEE Press, Jeju City (2007)
6. Eberhart, R.C., Kennedy, J.: A New Optimizer Using Particle Swarm Theory. In: 6th Inter-
national Symposium on Micromachine and Human Science, pp. 39–43. IEEE Press, Pis-
cataway (1995)
7. Chen, H., Ping, D., Xu, Y., Li, X.: A Novel Localization Scheme Based on RSS Data for
Wireless Sensor Networks. In: Advanced Web and Network Technologies, and Applications,
pp. 315–320. IEEE Press, Harbin (2008)
8. Bulusu, N., Heidemann, J., Estrin, D.: GPS-less low-cost outdoor localization for very small
devices. IEEE Personal Communications 7(5), 28–34 (2000)
9. Capkun, S., Hamdi, M., Hubaux, J.P.: GPS-Free Positioning in Mobile Ad-Hoc Networks.
In: 34th Annual Hawaii International Conference on System Sciences, pp. 255–258. IEEE
Press, Maui (2001)
Game Theoretic Approach in Routing Protocol for
Cooperative Wireless Sensor Networks

Qun Liu, Xingping Xian, and Tao Wu

College of Computer Science, Chongqing University of Posts and Telecommunications,


Chongqing, China
[email protected], [email protected], [email protected]

Abstract. In this paper, a game theoretic method, the first price sealed auction game, is
introduced to control routing overhead in wireless sensor networks. The players of the
game are the wireless nodes, each with the strategy set (forward or not). The game is
played whenever an arbitrary node in the network forwards packets. To make the game
operational, a multi-stage pricing game model is established; it governs the probability
that a wireless node forwards the packets it receives, and the payoff of every node can be
optimized by choosing the best neighbour node. Simulations in NS2 show that the pricing
routing game model improves performance, decreasing the energy consumption as well
as prolonging the network lifetime. Finally, a numerical analysis of the nodes' payoffs is
given using Matlab.

Keywords: Game Theory, Packet Forwarding, Routing, Auction, Incentive Mechanism.

1 Introduction
Wireless sensor networks (WSNs) have received significant attention in recent years.
Their main characteristic is that of low-cost nodes, with limited computational power and
battery, whose purpose is to sense the environment. To decrease energy consumption,
numerous routing protocols have been introduced for wireless sensor networks. The
approach in this paper selectively balances the forwarding overhead among nodes by
applying game theory. Game theory is a mathematical method that captures and analyzes
behavior in strategic situations, in which an individual's success depends on the choices
of others; it helps ensure that the desired global objective is achieved in the presence of
selfish agents. Game theory is not new to the area of telecommunications and wireless
networks. It has been used to model the interaction among users and to solve routing and
resource allocation problems in competitive environments, and it provides incentive
mechanisms for high-energy nodes to cooperate with other nodes in transferring
information. Related work on these problems is surveyed in [1-4].
In this paper, a routing model based on a pricing game is presented. To use the first price
sealed auction game model for relay node selection, we first organize a node and its
neighbour nodes into an incomplete-information auction game, where the node is the buyer


and its neighbour nodes are the sellers. Then, we extend the auction game into a
multi-stage game model. Finally, a reliable routing path is obtained through relay node
selection in each stage game. In each stage game, the Sink offers the buyer a payment
that compensates for its energy consumption and rewards its service (Sink-pay). The
buyer purchases service from the sellers and selects the relay node by evaluating each
seller; the selected seller receives a payment for the forwarding service (relay-reward).
The stability of the routing path is very important, so the choice of the relay node is a key
problem. From a game theory perspective, such a stable configuration corresponds to a
Nash equilibrium [5]. In our algorithm, we first choose the best node to forward packets
in each stage, and all the selected nodes then form a stable path through the multi-stage
game. In this way our algorithm decreases the energy consumption and prolongs the
network lifetime.
The rest of this paper is organized as follows. Section 2 introduces the system model and
the problem description. Section 3 presents the pricing routing game model. Section 4
describes the algorithmic flow of the proposed algorithm. Section 5 evaluates the
algorithm with numerical analysis and simulation. Finally, Section 6 concludes the paper.

2 System Model

2.1 Network Model

Consider a multi-hop network consisting of one buyer node and several seller nodes. We
model the WSN as an undirected graph G = <V, E>, as illustrated in Fig. 1, where V
denotes the node set and E represents the link set. A link between two nodes indicates
that the corresponding buyer-seller pair can communicate with each other; otherwise,
there is no link between them. e_i denotes the residual energy of node v_i, h_vi denotes
the hop count from node v_i to the Sink node, and h(v_s, v_3) denotes the minimum hop
count from v_s to the Sink node through node v_3. Each node stores local network
information, including its minimum hop count to the Sink node, its residual energy, and
its neighbour nodes' minimum hop counts to the Sink node and residual energies. The
information matrix of node v_s is presented in Table 1.

Fig. 1. Network topology



Table 1. The node vs information matrix

Notation: vs (the node vs's own information)
  (h(vs, v1), es) = (3, es): the minimum hop count of vs to the sink node through v1 and the residual energy of vs are (3, es).
  (h(vs, v3), es) = (3, es): the minimum hop count of vs to the sink node through v3 and the residual energy of vs are (3, es).
Notation: Nvs(v1) (the neighbour node v1's information stored at vs)
  (h(v1, v6), e1) = (3, e1): the minimum hop count of neighbour node v1 to the sink node through v6 and the residual energy of v1 are (3, e1).
  (h(v1, v2), e1) = (2, e1): the minimum hop count of neighbour node v1 to the sink node through v2 and the residual energy of v1 are (2, e1).
  (h(v1, v4), e1) = (2, e1): the minimum hop count of neighbour node v1 to the sink node through v4 and the residual energy of v1 are (2, e1).
Notation: Nvs(v3) (the neighbour node v3's information stored at vs)
  (h(v3, v2), e3) = (2, e3): as above, through v2.
  (h(v3, v5), e3) = (2, e3): as above, through v5.

2.2 Problem Description

In most cases, the key to achieving both energy efficiency and reliability is to design an
efficient incentive that encourages cooperation. Most nodes choose the cooperation
strategy in order to improve their payoff, but some nodes are likely to break their
agreement and drop packets after they have received the payoff. Existing mechanisms do
not solve this problem well and can consume many network resources because of the
frequent transmission of overhead packets [5]. In WSNs most nodes serve as routers, so
the whole forwarding process is depicted as a multi-stage game; each stage consists of
one buyer and several sellers, where the buyer is the sending node and the sellers are the
receiving nodes. We model the network with a sealed auction game model, and the whole
auction game process in the network is shown in Fig. 2. In the first stage, the Sink needs
the packets of the source node; the source node acts as the buyer and pays a certain price
to buy the forwarding service from its neighbour nodes. After the selected neighbour
node receives the packets and obtains the profit, the auction game enters the second stage
with a new state, and the sellers of stage one become buyers. The auction game is a
typical strategic game with incomplete information, because the buyer knows the
valuations of all the sellers, but each seller knows only its own valuation.

Fig. 2. Auction game model



3 Pricing Routing Game Model


We will state the first price sealed auction game model at this section, In each stage
game, the players of the game are composed of one node and its neighbor nodes, the
node is buyer and its neighbor nodes are sellers. The Sink will pay a certain price to
the buyer who will send packets to the Sink, and the buyer will pay certain price to
sellers to compensate for the forwarding service. Multi-sellers compete against each
other to forward packets and improve their own profit.
Definition 1. Define the link quality as r(e_i(t), h_{i,j}) = h_{i,j} / e_i(t), where e_i(t)
denotes the residual energy of node i at time t and h_{i,j} is the hop count of node i to the
Sink node through node j. The smaller the value of r(e_i(t), h_{i,j}), the higher the
reliability of the link.
Definition 2. Define the forwarding success rate of node i at time t as
α_i(t) = P_si(t) / P_ri(t), where P_si(t) is the number of forwarded packets and P_ri(t) is
the total number of packets received in a time unit before time t.
In the cooperative multiuser forwarding game model, an important question must be
considered: which node can forward packets reliably. Each player receives a reward of
value b for sending packets. In an auction game, the buyer and the sellers give their prices
and quote at the same time. Let N_i be the neighbor node set of node i, and define the
bidding function of node j in N_i as β_j(h_{j,k}(t), r(e_j(t), h_{j,k})), given by

β_j(r(e_j(t), h_{j,k})) = b · r(e_j(t), h_{j,k})    (1)

where h_{j,k} is the hop count of node j to the sink node through neighbor node k, b is the
payoff for forwarding packets, and r(e_j(t), h_{j,k}) is the link quality of node j.
Because the buyer knows the residual energy and the hop count of his neighbor nodes, he
gives an average price according to the valuation information of all the neighbor nodes.
The bidding function of buyer node i can be expressed as

ρ_i(r(e_i(t), h_{i,j})) = b · Σ_{j∈N_i} r(e_i(t), h_{i,j}) / Φ(N_i)    (2)

where Σ_{j∈N_i} r(e_i(t), h_{i,j}) is the total evaluation of all the neighbor nodes' link
quality, Φ(N_i) is the number of i's neighbor nodes, and
Σ_{j∈N_i} r(e_i(t), h_{i,j}) / Φ(N_i) is the average link quality of the neighbor nodes.
After the buyer and the sellers quote their prices according to their link quality, the deal
price φ(i, j) of buyer i and seller j is given by

φ(i, j) = β_j(h_{j,k}, r(e_j(t), h_{j,k}))  if ρ_i(r(e_i(t), h_{i,j})) > β_j(h_{j,k}, r(e_j(t), h_{j,k})),
φ(i, j) = 0  otherwise    (3)
⎩0 others

where φ(i, j) is the deal price of buyer i and seller j. According to (3), when the quoted
price of buyer i is larger than that of seller j, buyer i buys the forwarding service at j's
price; otherwise the deal price is 0, which means that i does not choose j as relay node.
When there are multiple sellers, the seller with the lowest price is chosen by i, which is
depicted as

φ(i, j) = min_{j∈N_i} β_j(h_{j,k}, r(e_j(t), h_{j,k}))    (4)

3.1 The Payoff Function of Source Node as Buyer at First Stage Auction

The Sink gives a price to the source node that sends packets to it; this payment
compensates the source node for its energy consumption and rewards its service. We
define the payoff function of source node s at time t as

u_s(t) = α_s(t)[(h_{s,j})^2 · b − φ(s, j) − e_ss(t)/e_s(t)]    (5)

where e_ss(t) is the energy consumed by source node s in sending packets and e_s(t) is
the residual energy of source node s. (h_{s,j})^2 · b is the payment given to source node s
by the Sink; it drives the source node (buyer) to pay the sellers as little as possible so as to
obtain the highest possible profit. We use (h_{s,j})^2 to scale the payment, since we must
guarantee that the payoff obtained by the source node is greater than zero and that each
selected seller receives a fair payoff.
Since a sensor node is rational and selfish, it may refuse to send all or part of the packets
to the next seller after it has obtained the payment, gaining extra profit and reducing its
energy consumption. To control this situation, the forwarding success rate is introduced
into the node's payoff: if the node wants to maximize its own payoff, it should forward as
many packets as possible. If it refuses to pay its neighbor node, the neighbor node will
drop all the packets it sends, so the number of forwarded packets falls and so does the profit.
According to the definition, we require that the utility of source node v_s satisfy
u_s(t) ≥ 0. This implies

0 < (φ(s, j) + e_ss(t)/e_s(t)) / (h_{s,j}(t))^2 < b    (6)

The aim of the source node is to maximize its own utility, which can be expressed as

max(u_s(t)) = max{α_s(t) · [(h_{s,j}(t))^2 · b − φ(s, j) − e_ss(t)/e_s(t)]}    (7)

According to (5), in order to maximize its utility, the source node should increase the
forwarding success rate and choose the lowest deal price φ(s, j). This implies

max(u_s(t)) = max(α_s(t)) · [(h_{s,j})^2 · b − min(φ(s, j) + e_ss(t)/e_s(t))]    (8)

3.2 The Payoff Function of Seller at First Stage Auction

Assume that the energy consumed in receiving packets is negligible and that the buyer
(source node) and the seller (relay node) agree to transact at the price φ(s, j). The payoff
function of the relay node j is then

u_j(t) = φ(s, j)    (9)

where u_j(t) is j's payoff at time t.

3.3 The Payoff Function of Buyer at Second Stage Auction

In this stage, the source node withdraws from the game; relay node (seller) j of the first
stage acts as the buyer, its neighbor node set N_j acts as the sellers, and node j needs to
buy the forwarding service from its neighbor nodes N_j. Node j and each neighbor node
in N_j bid at the same time, following the same process as in the first stage game. The
final deal price is the minimum price among the neighbor nodes. Assuming k is the
selected neighbor node, the payoff function of node j at time t is u_j(t), given by

u_j(t) = α_j(t)[(h_{j,k})^2 · b + φ(s, j) − φ(j, k) − e_js(t)/e_j(t)]    (10)

Herein, (h_{j,k})^2 · b is the payment given to node j by the Sink, φ(s, j) is the deal price
of the source node and node j in the first stage, φ(j, k) is the deal price of node j and its
neighbor node k in the second stage, e_js(t) is the energy consumed by node j in sending
packets, and e_j(t) is the residual energy of node j.
We can see from Equation (10) that the aim of node j is to maximize its own utility,
which is expressed as

max u_j(t) = max{α_j(t)[(h_{j,k})^2 · b + φ(s, j) − φ(j, k) − e_js(t)/e_j(t)]}    (11)

From the above equations, each node prefers to choose the minimum-price node as its
relay node in order to maximize its payoff. A node's forwarding success rate is affected
by its neighbour node, and a node's payoff is constrained by its forwarding success rate.
Therefore, to maximize its payoff, a node pays its minimum-price neighbor node so as to
secure its forwarding success rate.
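For clarity, the Python sketch below (our own illustration under assumed values, not the authors' implementation; for simplicity the buyer's hop count to the Sink is passed in as a single constant h_s) plays one stage of the first price sealed auction: each neighbour bids with formula (1), the buyer quotes the average price of formula (2), the deal is settled according to formulas (3)-(4), and the buyer's payoff is evaluated with formula (5):

def link_quality(residual_energy, hop_count):
    # Definition 1: r(e, h) = h / e; smaller values mean a more reliable link.
    return hop_count / residual_energy

def auction_stage(b, neighbours, alpha, h_s, e_send, e_s):
    # neighbours: {node_id: (residual energy of the neighbour,
    #                        its hop count to the Sink via its best next hop)}
    # Sellers' bids, formula (1): beta_j = b * r(e_j, h_{j,k}).
    bids = {j: b * link_quality(e, h) for j, (e, h) in neighbours.items()}
    # Buyer's quote, formula (2): b times the average neighbour link quality.
    rho = b * sum(link_quality(e, h) for e, h in neighbours.values()) / len(neighbours)
    # Formulas (3)-(4): pick the cheapest seller; the deal holds only if rho exceeds its bid.
    relay, price = min(bids.items(), key=lambda kv: kv[1])
    if rho <= price:
        return None, 0.0, 0.0
    # Formula (5): buyer's payoff with forwarding success rate alpha.
    payoff = alpha * ((h_s ** 2) * b - price - e_send / e_s)
    return relay, price, payoff

# Assumed toy values: reward b = 1, three neighbours, forwarding success rate 0.8.
neighbours = {"v1": (8.0, 3), "v3": (6.0, 2), "v5": (9.0, 2)}
print(auction_stage(b=1.0, neighbours=neighbours, alpha=0.8,
                    h_s=3, e_send=0.1, e_s=8.0))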

4 Algorithmic Flow
Our goal is to find a reliable and stable path from the source to the destination; if such a
path exists, then according to our model each node on this path obtains a non-negative
payoff. Before a node can quote a price to its neighbours, it must know both its own
evaluation information and its neighbour nodes' evaluation information. Therefore, we
propose Algorithm 1 and Algorithm 2 to build the neighbour node table for each node.

Algorithm 1. Build Up Hop Information

The Sink initializes hop = 1 and sends a BSHello message (containing the node id and its
hop count to the Sink) to each node.
Each node Vj builds an empty table for the information of each neighbour node Vi ∈ Nj,
its hop count and its residual energy.
Node Vj ∈ N receives the BSHello(hop1, Vi) message.
  Find(Vi); /* find Vj's neighbour node Vi in the table */
  If (Vi exists) then
    Compare(Vi.hop, hop1); /* Vi.hop is in the neighbour node table, hop1 is in the received BSHello message */
    If (Vi.hop > hop1) then
      Update(hop1); /* store the value of hop1 in Vi.hop */
    End if
  Else
    Insert(Vi, hop1);
  End if
  hop2 = hop1 + 1;
  Send(Vj, hop2, BSHello); /* relay the message to the other nodes */

Algorithm 2. Build Up and Update Energy Information

1. Before each cluster head selection, each node Vi ∈ N sends a Hello message containing
   its node id and residual energy to its neighbours: Send(Hello, Vi, energy(ei)).
2. Each node Vi ∈ N receives a Hello message from a neighbour: recv(Hello, Vj ∈ Ni, ej).
3. Find(Vj);
   Update(Vj, ej); /* update the stored energy of Vj */
4. When the current cluster selection period is over, jump to step 1.

The neighbour node table is built up at the cluster head selection stage. When sending
packets, each node chooses the proper relay node according to the neighbour node table:
the buyer node and the seller nodes launch an auction for the forwarding service, and the
seller nodes compete with each other for the profit of forwarding the buyer node's
packets. The pricing routing game algorithm is given in Algorithm 3.

5 Numeric and Simulation Results

5.1 Numerical Analysis

We use Matlab to analyse how a node's payoff is affected by the forwarding success rate
and the link quality, using the game between a source node and its neighbour nodes to
calculate the source node's payoff.
We set the number of nodes to 50, the number of BS nodes to 1, the nodes to be static,
the reward b to 1, the hop count from the source node to the Sink to 10, and the energy
consumed in sending packets to 0.1. Fig. 3 shows the payoff of the source node for
different forwarding success rates α. The payoff at α = 0.8 is higher than the payoffs at
α = 0.4 and α = 0.1. The experiment results

show that the larger the forwarding success rate, the higher the payoff, and also that the
source node's payoff increases as its neighbour node's residual energy increases.
Therefore, each node prefers to choose the neighbour node with the maximum energy as
its relay node for sending packets to the destination.

Algorithm 3. Price Routing Game

1. Each source node Vk that needs to select a relay node Vj ∈ Nk sends a request to its
   neighbour nodes.
2. The source node estimates its price ρ_k(r(e_k(t), h_{k,j})), and each neighbour
   Vj ∈ Nk estimates its price β_j(r(e_j(t), h_{j,l})).
3. The neighbour nodes send their price information to the source node; the source node
   determines the deal by checking ρ_k(r(e_k(t), h_{k,j})) ≥ β_j(r(e_j(t), h_{j,l})),
   selects the proper node as relay node, and gives the deal price φ(k, j) to the selected node.
4. The source node sends data to Vj and calculates its own payoff u_k(t).
5. For each node Vj, if (Vj is the selected node) then
     buy the link quality from its neighbour nodes and give the deal price to the node it selects;
     send data to Vl;
     calculate the payoff of Vj: u_j(t) = α_j(t)[(h_{j,k})^2 · b + φ(k, j) − φ(j, l) − e_js(t)/e_j(t)];
   Else
     the payoff u_j(t) = 0;
   End if
6. Any node not involved in the above steps goes to sleep.

Fig. 3. The payoff of source node at different forwarding success rate
Fig. 4. The payoff of different hops of neighbour nodes and node's energy

The source node's payoff for different hop counts and different residual energies of the
neighbour nodes is given in Fig. 4. We assume the forwarding success rate is 1 (α = 1),
the hop count of the source node is 10, the energy consumed in sending packets is 0.1,
and the residual energy of the source node is 8. Fig. 4 indicates that the smaller the hop
count and the larger the residual energy of the neighbour nodes

are, the higher the payoff of the source node is. Conversely, when the hop count of the
neighbour nodes is large and their residual energy is small, the payoff of the source node
decreases rapidly and approaches 0. To increase its payoff, the source node therefore
chooses the neighbour node with the minimum hop count and the largest residual energy
as its relay node.

5.2 Simulation Results

In this section, we perform extensive simulation experiments in NS-2 to analyze the
performance of our algorithm. All the nodes are randomly distributed in the network
scene, the BS is located at the center of the network field, and the simulation parameters
are summarized in Table 2.
Fig. 5 shows the residual energy over time of a randomly selected node in the WSN under
the LEACH protocol and under our proposed algorithm. The simulation results show that
the randomly selected node consumes its energy very quickly when the network uses the
LEACH protocol, and that our algorithm saves considerably more energy than LEACH.
This is because of LEACH's single-hop communication strategy: every node sends
packets to its cluster head, and the cluster head communicates directly with the remote
base station. Because the cluster head is far from the base station, the transmission power
is large and leads to excessive energy consumption [18]. The energy consumption of a
single-hop network is worse than that of a multi-hop network. We transmit packets
through multi-hop communication to save the nodes' energy, so the residual energy of the
nodes under our algorithm is much greater than under the LEACH protocol.
The simulation results in Fig. 6 show the sum of all active nodes' residual energy over a
period of time. The sum of the residual energy decays more rapidly under the LEACH
protocol than under our algorithm, and its variation is more stable under our algorithm;
the LEACH protocol consumes a larger amount of energy than our algorithm.
To further illustrate the stability of energy consumption in our algorithm, we randomly
select nine nodes and plot their residual energy over a short period of time, as depicted in
Fig. 7. The figure shows that the nine nodes' residual energy under the LEACH protocol
is extremely unstable. Cluster head selection in LEACH is random, so the number and
location of the cluster heads are random; moreover, the cluster head communicates
directly with the remote base station, the transmission power is large, and the resulting
excessive energy consumption overburdens a small number of sensor nodes. In our
algorithm, however, packets are transmitted through multi-hop communication and the
game model selects the best relay node, so the residual energy in the network remains
steady. Therefore, the lifetime of the sensor network under the LEACH protocol is
shorter than under our pricing routing game algorithm.

Table 2. Simulation parameters

Parameter name            Value
Initial energy            10 J
Transmit power (Pt)       0.281838 W
Sensing power             0.015 J
Processing power          0.024 J
Simulation time           200 s
BS energy                 100 J
Network area size         100 × 100 m2
Control packet length     200 bits

Fig. 5. The residual energy profit of a randomly selected node
Fig. 6. The profit of the sum of all active nodes' residual energy


Fig. 7. The randomly selected node’s residual energy

6 Conclusion
In this paper, we considered an energy-constrained cooperative wireless sensor network
and proposed a pricing routing game model based on the first price sealed auction game.
Through the pricing routing game model, the relay nodes are encouraged to forward
packets, and each node aims at maximizing its payoff by choosing the optimal relay node.
Compared to the LEACH protocol, our algorithm enhances the network lifetime
effectively. In future work, we will discuss the network performance under the influence
of dishonest nodes and cooperative nodes respectively.

Acknowledgements. This work is supported by the National Natural Science Foundation
of China under Grant No. 60903213 and the Natural Science Foundation of Chongqing
under Grant No. CSTC2007BB2386.

References
1. Machado, R., Tekinay, S.: A survey of game-theoretic approaches in wireless sensor net-
works. Computer Networks 52, 3047–3061 (2008)
2. Liu, Q., Liao, X.F., et al.: Dynamics of an inertial two-neuron system with time delay.
Nonlinear Dynamics 58(3), 573–609 (2009)
3. Komathy, K., Narayanasamy, P.: Best neighbor strategy to enforce cooperation among selfish
nodes in wireless ad hoc network. Computer Communications 30(18), 3721–3735 (2007)
4. Jun, C., Xiong, N.X., Yang, L.T., He, Y.: A joint selfish routing and channel assignment
game in wireless mesh networks. Computer Communications 31, 1447–1459 (2008)
5. Liu, H., Krishnamachari, B.: A price-based reliable routing game in wireless networks. In:
Proceedings of the First Workshop on Game Theory for Networks, GAMENETS 2006
(2006)
6. Heinzelman, W.B., Chandrakasan, A.P., Balakrishnan, H.: An application-specific proto-
col architecture for wireless microsensor networks. IEEE Transactions on Wireless Com-
munications 1, 660–670 (2002)
7. Zhong, S., Chen, J., Yang, Y.R.: A simple, cheat-proof, Credit-based System for Mobile
Ad hoc Networks. In: Proceeding of IEEE INFOCOM, pp. 1987-1997 (2003)
8. Marti, S., Giuli, T.J., Lai, K., Baker, M.: Mitigating Routing Misbehaviour in Mobile Ad
Hoc Networks. In: Proceedings of the Sixth Annual International Conference on Mobile
Computing and Networking, MobiCom 2000 (2000)
9. Lu, Y., Shi, J., Xie, L.: Repeated-Game Modeling of Cooperation Enforcement in Wireless
Ad Hoc Network. Journal of Software 19, 755–776 (2008)
10. Altman, E., Kherani, A.A., Michiardi, P., Molva, R.: Non-cooperative Forwarding in Ad-
hoc Networks, Technical Report INRIA Report No.RR-5116 (2004)
11. Wang, B., Han, Z., Liu, R.: Stackelberg game for distributed resource allocation over mul-
tiuser cooperative communication networks. IEEE Trans. Mobile Computing 8(7), 975–
990 (2009)
12. Shastry, N., Adve, R.S.: Stimulating cooperative diversity in wireless ad hoc networks
through pricing. In: Proc. IEEE Intl. Conf. Commun. (June 2006)
13. Zhong, S., Li, L., Liu, Y., Yang, Y.R.: On designing incentive-compatible routing and for-
warding protocols in wireless ad-hoc networks an integrated approach using game theoretical
and cryptographic techniques, Tech. Rep. YALEU/DCS/TR-1286, Yale University (2004)
14. Huang, J., Berry, R., Honig, M.: Auction-based spectrum sharing. ACM/Springer J. Mo-
bile Networks and Applications 11(3), 405–418 (2006)
15. Chen, J., Lian, S.G., Fu, C., Du, R.Y.: A hybrid game model based on reputation for spec-
trum allocation in wireless networks. Computer Communications 33, 1623–1631 (2010)
16. Huang, J., Han, Z., Chiang, M., Poor, H.V.: Distributed power control and relay selection
for cooperative transmission using auction theory. IEEE J. Sel. Areas Commun. 26(7),
1226–1237 (2008)
17. Chen, L., Szymanski, B., Branch, W.: Auction-Based Congestion Management for Target
Tracking in Wireless Sensor Networks. In: Proceedings of the 2009 IEEE International
Conference on Pervasive Computing and Communications (PERCOM 2009), Galveston,
TX, USA, 9-13, pp. 1–10 (2009)
A New Collaborative Filtering Recommendation
Approach Based on Naive Bayesian Method

Kebin Wang and Ying Tan

Key Laboratory of Machine Perception (MOE), Peking University


Department of Machine Intelligence, School of Electronics Engineering
and Computer Science, Peking University, Beijing, 100871, China
[email protected], [email protected]

Abstract. Recommendation is a popular and active problem in e-commerce.
Recommendation systems are realized in many ways, such as content-based
recommendation, collaborative filtering recommendation, and hybrid-approach
recommendation. In this article, a new collaborative filtering recommendation algorithm
based on the naive Bayesian method is proposed. Unlike the original naive Bayesian
method, the new algorithm can be applied to instances where the conditional
independence assumption is not strictly obeyed. According to our experiment, the new
recommendation algorithm performs better than many existing algorithms, including the
popular k-NN algorithm used by Amazon.com, especially for long recommendation lists.

Keywords: recommender system, collaborative filtering, naive Bayesian method, probability.

1 Introduction
Recommendation systems are widely used by e-commerce web sites. They are a kind of
information retrieval, but unlike search engines or databases they provide users with
things they have never heard of before. That is, recommendation systems are able to
predict users' unknown interests from their known interests [8], [10]. There are thousands
of movies that are liked by millions of people, and recommendation systems can tell you
which of all these good movies suits your taste. Though recommendation systems are
very useful, current systems still require further improvement: they often provide either
only the most popular items or strange items that are not to the user's taste at all. Good
recommendation systems offer more accurate prediction and lower computational
complexity; our work is mainly on the improvement of accuracy.
The naive Bayesian method is a famous classification algorithm [6], and it can also be
used in the recommendation field. When the factors affecting the classification results
are conditionally independent, the naive Bayesian method is proved to be the solution
with the best performance. In the recommendation field, the naive Bayesian method
directly calculates the probability of a user's possible interests and no definition of
similarity or distance is required, while in


other algorithms such as k-NN there are usually many parameters and definitions to be
determined manually, and it is always fairly difficult to tell whether a definition is
suitable or a parameter is optimal. Vapnik's principle says that when trying to solve some
problem, one should not solve a more difficult problem as an intermediate step. On the
other hand, although Bayesian networks [7] perform well on this problem, their
computational complexity is high.
In this article, we design a new collaborative filtering algorithm based on the naive
Bayesian method. The new algorithm has a complexity similar to that of the naive
Bayesian method, but it adjusts for the lack of independence, which makes it applicable
to instances where the conditional independence assumption is not strictly obeyed. The
new algorithm thus provides a simple alternative to Bayesian networks for handling the
lack of independence, and its good performance provides users with more accurate
recommendations.

2 Related Work
2.1 Recommendation Systems
As shown in Table 1, recommendation systems are implemented in many ways.
They attempt to provide items which are likely of interest to the user accord-
ing to characteristics extracted from the user’s profile. Some characteristics are
from content of the items, and the corresponding method is called content-based
approach. In the same way, some are from the user’s social environment which
is called collaborative filtering approach[12].
The content-based approach reads the content of each item, and the similarity between
items is calculated from characteristics extracted from the content. The advantages of this
approach are that the algorithm is able to handle brand-new items and that the reason for
each recommendation is easy to explain. However, not all kinds of items can be read in
this way: content-based systems mainly focus on items containing textual information
[13], [14], [15]. When it comes to movies, the content-based approach does not work, so
for this problem we chose the collaborative filtering approach.
Compared to content-based approach, collaborative filtering approach does
not care what the items are. It focuses on the relationship between users and
items. That is, in this method, items in which similar users are interested are
considered similar[1],[2].
Here we mainly talk about collaborative filtering approach.

Table 1. Various recommendation systems

recommendation systems
  - content-based
  - collaborative filtering
      - model-based
      - memory-based

2.2 Collaborative Filtering

Collaborative filtering systems try to predict the interest of items for a partic-
ular user based on the items of other users’ interest. There have been many
collaborative systems developed in both academia and industry [1]. Algorithms for
collaborative filtering can be grouped into two general classes, memory-based and
model-based [4], [11].
Memory-based algorithms essentially are heuristics that make predictions based on the
entire database. The value that decides whether to recommend an item is calculated as an
aggregate of the other users' records for the same item [1].
In contrast to memory-based methods, model-based algorithms first built
a model according to the database and then made predictions based on the
model[5]. The main difference between model-based algorithms and memory-
based methods is that model-based algorithms do not use heuristic rules. Instead,
models learned from the database provide the recommendations.
The improved naive Bayesian method belongs to the model-based algorithms
while the k-NN algorithm which appears as a comparison later belongs to the
memory-based algorithms.

2.3 k-NN Recommendation


k-NN recommendation is a very successful recommendation algorithm used by
many e-commerce web sites including Amazon.com[2], [9].
k-NN recommendation comes in two variants, item-based k-NN and user-based k-NN.
Here we mainly discuss item-based k-NN, popularized by Amazon.com.
First, an item-to-item similarity matrix is built using the cosine measure: for each pair of
items, the similarity is defined as the cosine of the angle between the two item vectors,
whose M dimensions correspond to the M users and whose entries are one if the user is
interested in the item and zero otherwise.
The next step is to infer each user's unknown interests using the matrix and his known
interests: the items most similar to his known interests are recommended according to the
matrix.
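As a rough illustration of this scheme (our own sketch with made-up interest data, not Amazon's implementation), the following Python code builds the item-to-item cosine similarity matrix from the binary item vectors and recommends the items most similar to a user's known interests:

import math
from collections import defaultdict

def item_similarity(user_interests):
    # Cosine similarity between binary item vectors whose dimensions are the users.
    item_users = defaultdict(set)
    for user, items in user_interests.items():
        for item in items:
            item_users[item].add(user)
    sim = {}
    for a in item_users:
        for b in item_users:
            if a != b:
                common = len(item_users[a] & item_users[b])
                sim[a, b] = common / math.sqrt(len(item_users[a]) * len(item_users[b]))
    return sim

def knn_recommend(known, sim, length=3):
    # Score each unseen item by its summed similarity to the user's known interests.
    scores = defaultdict(float)
    for (a, b), s in sim.items():
        if a in known and b not in known:
            scores[b] += s
    return sorted(scores, key=scores.get, reverse=True)[:length]

# Made-up log: three users and their known interests.
logs = {"u1": {"m1", "m2", "m3"}, "u2": {"m2", "m3", "m4"}, "u3": {"m1", "m4"}}
print(knn_recommend({"m1", "m2"}, item_similarity(logs)))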

3 Improved Naive Bayesian Method


3.1 Original Naive Bayesian Method
For each user, we are supposed to predict his unknown interests according to his known
interests. The user's unknown interest is expressed as

p(m_x|m_u1, m_u2, · · ·)    (1)

When considering the user's interest in item m_x, we have m_u1, m_u2, · · · as known
interests; of course, m_x is not among the user's known interests. The

conditional probability is the probability that item m_x is an interest of the user whose
known interests are m_u1, m_u2, etc. In our algorithm, the items with higher conditional
probability have higher priority to be recommended, and our job is to compute this
conditional probability of each item for each user.

p(m_x|m_u1, m_u2, · · ·) = p(m_x) · p(m_u1, m_u2, · · · |m_x) / p(m_u1, m_u2, · · ·)    (2)

We have the conditional independence assumption that

p(mu1 , mu2 , · · · |mx ) = p(mu1 |mx ) · p(mu2 |mx ) · · · · (3)

In practice, comparisons occur only among the conditional probabilities for the same
user, where the denominator of equation (2), p(m_u1, m_u2, · · ·), is the same for every
item and has no influence on the final result. Its calculation is therefore simplified as (4).

p(mu1 , mu2 , · · ·) = p(mu1 ) · p(mu2 ) · · · · (4)

So the conditional probability can be calculated in this way.

p(mx |mu1 , mu2 , · · ·) = p(mx ) · q, (5)

where

q = p(m_u1, m_u2, · · · |m_x) / p(m_u1, m_u2, · · ·) = (p(m_u1|m_x)/p(m_u1)) · (p(m_u2|m_x)/p(m_u2)) · · · ·    (6)

3.2 Improved Naive Bayesian Method


In fact, the conditional independence assumption does not hold in this problem, because
the relevance between items is the very foundation of our algorithm. p(m_x) in (5)
reflects whether the item itself is attractive, and q reflects whether the item is suitable for
this particular user. Our experiments reveal that, because of the lack of independence, the
latter has more influence than it deserves. To adjust for this bias we use

p(m_x|m_u1, m_u2, · · ·) = p(m_x) · q^(c_n/n)    (7)

where n is the number of the user's known interests and c_n is a constant between 1 and
n. The transformation makes the influence of the entire set of n known interests
equivalent to that of c_n interests, which greatly decreases the influence of the user's
known interests. In effect, c_n represents how independent the items are. The value of
c_n is determined experimentally, and for most values of n it is around 3.
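As a toy numerical illustration of the adjustment (the numbers below are made up, not taken from the experiment): with n = 10 known interests and c_n = 3, every factor of q enters with exponent 0.3 instead of 1, so the combined influence of the known interests matches that of only three independent ones.

# Toy illustration of formula (7): each ratio p(m_x|m_u)/p(m_x) is damped by c_n/n.
p_x = 0.02                      # assumed prior attractiveness of item m_x
ratios = [2.0] * 10             # assume each known interest doubles the evidence for m_x
n, c_n = len(ratios), 3.0
q = 1.0
for r in ratios:
    q *= r
print(p_x * q)                  # plain naive Bayesian score: 0.02 * 2**10 = 20.48
print(p_x * q ** (c_n / n))     # adjusted score, formula (7): 0.02 * 2**3 = 0.16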

3.3 Implementation of Improved Naive Bayesian Method

Calculation of prior probability. First we calculate the prior probability p(m_i). The prior
probability is the fraction of all users who are interested in item m_i. Algorithm 1 shows
how the calculation is done.

foreach item i in database do


foreach user that interested in the item do
ti = ti + 1;
end
p(mi ) = ti / TheNumberOfAllUsers;
end
Algorithm 1. Calculation of prior probability

Calculation of conditional probability matrix. To calculate the conditional probability,
first the joint probability is calculated and then the joint probability is turned into a
conditional probability. Algorithm 2 shows how the calculation is done.
foreach user in database do
foreach item a in the user’s known interests do
foreach item b in the user’s known interests do
if a is not equal to b then
ta,b = ta,b + 1;
end
end
end
end
foreach item pair (a,b) do
p(ma , mb ) = ta,b / TheNumberOfAllUsers;
p(ma |mb ) = p(ma , mb )/p(mb );
end
Algorithm 2. Calculation of conditional probability matrix

Making recommendation. Now we have the prior probability for each item and the
conditional probability for each pair of items. Algorithm 3 shows how the
recommendations are made.

How to compute c_n. As mentioned before, c_n is determined experimentally: the
database is divided into groups according to the size of the users' known interest sets,
and for each group we run the steps above with many candidate values of c_n and keep
the one that gives the best result.
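A condensed Python sketch of the whole procedure follows (our own illustration; the toy log is made up, and c_n is fixed at 3 here instead of being fitted per group as described above):

from collections import defaultdict

def train(logs):
    # logs: {user: set of items of interest}. Returns prior p(m) and joint p(a, b).
    n_users = len(logs)
    prior, joint = defaultdict(float), defaultdict(float)
    for items in logs.values():
        for a in items:
            prior[a] += 1.0 / n_users
            for b in items:
                if a != b:
                    joint[a, b] += 1.0 / n_users
    return prior, joint

def recommend(known, prior, joint, c_n=3.0, length=5):
    # Improved naive Bayesian score: p(m_x) * prod_i (p(m_x|m_ui)/p(m_x)) ** (c_n/n).
    n = len(known)
    scores = {}
    for x, p_x in prior.items():
        if x in known:
            continue
        score = p_x
        for u in known:
            cond = joint.get((x, u), 0.0) / prior[u]   # p(m_x | m_u)
            score *= (cond / p_x) ** (c_n / n)
        scores[x] = score
    return sorted(scores, key=scores.get, reverse=True)[:length]

# Made-up usage with a tiny interest log.
logs = {"u1": {"m1", "m2", "m3"}, "u2": {"m2", "m3"}, "u3": {"m1", "m3", "m4"}}
prior, joint = train(logs)
print(recommend({"m1", "m2"}, prior, joint))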

3.4 Computational Complexity

The offline computation, in which prior probability and conditional probability


matrices are calculated, has a complexity of O(LM), where L is the length of log

foreach user that needs recommendation do


foreach item x do
r(mx ) = p(mx );
foreach item ui in user’s known interests do
r(mx ) = r(mx ) × (p(mx |mui ) / p(mx ))^(cn /n);
end
p(mx |mu1 , mu2 , · · ·) = r(mx );
end
end

Algorithm 3. Making recommendation

in which each line represents an interest record of a user, and M is the number of items.
The online computation, which produces the recommendations for all users, also has a
complexity of O(LM). Therefore the total complexity is only O(LM).

4 Experiment
Many recommendation algorithms are in use nowadays. We compare our improved naive
Bayesian method with the non-personalized recommendation and the k-NN
recommendation described before.

4.1 Non-Personalized Recommendation


Non-personalized recommendation is also called top recommendation: it presents the
most popular items to all users. If there were no relation between a user and his interests,
non-personalized recommendation would be the best solution.

4.2 Data Set


The movie log from Douban.com is used in the experiment; it has been a non-public
dataset up to now. The log includes 7,163,548 records of 714 items from 375,195 users.
It is divided into a training part, used to build the matrices, and a testing part. The known
interests of each user in the testing part are divided into two groups: one is treated as
known and is used to infer the other, which is treated as unknown. The Bayesian method
ran for 264 seconds and the k-NN for 278 seconds; both experiments are implemented in
Python.

4.3 Evaluation
We use the F-measure as our evaluation methodology. The F-measure is the harmonic
mean of precision and recall [3]: precision is the number of correct recommendations
divided by the number of all returned recommendations, and recall is the number of
correct recommendations divided by the number of all the held-out interests that are
supposed to be discovered. A recommendation is considered correct if it is included in the
group of interests that was set aside as unknown. Note that the values reported later are
doubled F-measure values.
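For reference, the evaluation itself is a few lines of Python (our own minimal sketch; the recommendation list and held-out interests below are made up):

def f_measure(recommended, held_out):
    # Harmonic mean of precision and recall; a hit is a recommendation found
    # in the group of interests that was set aside as unknown.
    hits = len(set(recommended) & set(held_out))
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(held_out)
    return 2 * precision * recall / (precision + recall)

# 2 of 4 recommendations appear among the 5 held-out interests:
# precision = 0.5, recall = 0.4, F-measure = 2*0.5*0.4/0.9 ≈ 0.444.
print(f_measure(["m1", "m2", "m3", "m4"], ["m1", "m2", "m5", "m6", "m7"]))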

4.4 Comparison with Original Naive Bayesian Method


As shown in Figure 1, the improvement on the naive Bayesian method has a substantial effect. Before the improvement it is even worse than the non-personalized recommendation; after the improvement, the naive Bayesian method performs clearly better than the non-personalized recommendation at any recommendation length.

Fig. 1. Comparison with original naive Bayesian method

4.5 Comparison with k-NN


As shown in Figure 2, before the peak, k-NN and the improved naive Bayesian method have almost the same performance. But when more recommendations are made, k-NN's performance declines rapidly. At lengths larger than 45, k-NN is even worse than the non-personalized recommendation, while the improved naive Bayesian method still performs reasonably.

4.6 Analysis and Discussion


Although there are great differences between the algorithms, the performance of each of them turns out to have a peak. Moreover, the F-measure increases rapidly before the peak and decreases slowly after it. The rapid increase occurs because recall rises while precision is almost stable; the slow decrease occurs because precision drops while recall hardly increases.

Fig. 2. Comparison with k-NN

According to our comparison between the ordinary and the improved naive Bayesian method, the improvement has an excellent effect. The result of the ordinary naive Bayesian method is even worse than that of the non-personalized recommendation; after the improvement, however, the performance is clearly better than the non-personalized recommendation. We conclude that there is a strong relevance between a user's known and unknown interests. The performance of the non-personalized recommendation shows that popular items are also very important to recommendation. When the two aspects are combined properly, as in the improved naive Bayesian method, the performance of the algorithm is satisfactory. When the combination is not proper, it may lead to poor performance, as shown by the ordinary naive Bayesian method.
The comparison of the improved naive Bayesian method and k-NN shows that the improved naive Bayesian method performs better than the popular k-NN recommendation, especially for long recommendation lists. It is worth noting that the performances of the two algorithms are fairly close for short recommendation lists, which leads to the conjecture that the best possible performance may have been approached, though this calls for more evidence. Unlike the short-list case, the performance of the k-NN recommendation declines rapidly after the peak and is even worse than the non-personalized recommendation at lengths larger than 45. We conclude that the Bayesian method's good performance is due to its solid theoretical foundation and better

obedience of Vapnik's principle, while k-NN's similarity definition may not be suitable for all situations, which leads to its poor performance for long recommendation lists.

5 Conclusion

In this article, we provide a new and simple solution to the recommendation problem. According to our experiment, the improved naive Bayesian method can be applied to instances where the conditional independence assumption is not strictly obeyed. Our improvement on the naive Bayesian method greatly increases the performance of the algorithm, which is especially evident for long recommendation lists. On the other hand, we still do not know what the best possible performance of a recommendation system is or whether it has been approached in our experiment. The calculation of cn is also not yet satisfactory; there may be a more acceptable way to obtain cn than by experiments. All of these questions are left for future work.

Acknowledgments. This work was supported by National Natural Science


Foundation of China (NSFC), under Grant No. 60875080 and 60673020, and
partially supported by the National High Technology Research and Develop-
ment Program of China (863 Program), with Grant No. 2007AA01Z453. The
authors would like to thank Douban.com for providing the experimental data,
and Shoukun Wang for his stimulating discussions and helpful comments.

References
1. Adomavicius, G., Tuzhilin, A.: The next generation of recommender systems: A sur-
vey of the state-of-the-art and possible extensions. IEEE Transactions on Knowl-
edge and Data Engineering (2005)
2. Linden, G., Smith, B., York, J.: Amazon.com recommendations: Item-to-item col-
laborative filtering. IEEE Internet Computing (2003)
3. Makhoul, J., Kubala, F., Schwartz, R., Weischedel, R.: Performance measures for
information extraction. In: Proceedings of Broadcast News Workshop 1999 (1999)
4. Breese, J.S., Heckerman, D., Kadie, C.: Empirical Analysis of Predictive Algo-
rithms for Collaborative Filtering. In: Proc. 14th Conf. Uncertainty in Artificial
Intelligence (July 1998)
5. Hofmann, T.: Collaborative Filtering via Gaussian Probabilistic Latent Semantic
Analysis. In: Proc. 26th Ann. Int’l ACM SIGIR Conf. (2003)
6. Kotsiantis, S.B., Zaharakis, I.D., Pintelas, P.E.: Machine learning: a review of clas-
sification and combining techniques. Artificial Intelligence Review (2006)
7. Yuxia, H., Ling, B.: A Bayesian network and analytic hierarchy process based
personalized recommendations for tourist attractions over the Internet. Expert
System With Applications (2009)
8. Resnick, P., Varian, H.R.: Recommender systems. Communications of the ACM
(March 1997)

9. Koren, Y.: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. ACM, New York (2008)
10. Schafer, J.B., Konstan, J.A., Reidl, J.: E-Commerce Recommendation Applica-
tions. In: Data Mining and Knowledge Discovery. Kluwer Academic, Dordrecht
(2001)
11. Pernkopf, F.: Bayesian network classifiers versus selective k-NN classifier. Pattern
Recognition (January 2005)
12. Balabanovic, M., Shoham, Y.: Fab: Content-Based, Collaborative Recommenda-
tion. Comm. ACM (1997)
13. Rocchio, J.J.: Relevance Feedback in Information Retrieval. In: Salton, G. (ed.)
SMART Retrieval System-Experiments in Automatic Document Processing, ch.
14. Prentice Hall, Englewood Cliffs (1979)
14. Pazzani, M., Billsus, D.: Learning and Revising User Profiles: The Identification
of Interesting Web Sites. Machine Learning 27, 313–331 (1997)
15. Littlestone, N., Warmuth, M.: The Weighted Majority Algorithm. Information and
Computation 108(2), 212–261 (1994)
Statistical Approach for Calculating the Energy
Consumption by Cell Phones

Shanchen Pang and Zhonglei Yu

College of Information Science and Engineering,


Shandong University of Science and Technology, Qingdao 266510, China
[email protected]

Abstract. Energy consumption by cell phones has a great effect on the energy crisis. Calculating and optimizing the method of phone service provided by cell phones is essential. In our solution, we build three main models. The transition model reflects the relationship between the change of energy consumption and time, and we give the function of energy consumption in the steady state. The optimization approach constructs the energy consumption function together with a convenience degree to emphasize the convenience of cell phones. Using the waste model we obtain the waste functions under different situations and get the total wasted energy.

Keywords: energy consumption, transition function, steady state, optimization


approach, waste model.

1 Introduction
Recently, the use of mobile computing devices for computation and communication has increased. With the development of cell phones, landline telephones are eventually being given up. We have noticed that people's chargers stay warm even when they are not charging a phone. All of this drains electricity; it not only wastes money but also adds to the pollution created by burning fossil fuels [1]. According to one investigation, only 5% of the power drawn by cell phone chargers is actually used to charge phones; the other 95% is wasted when the charger is left plugged into the wall but not into the phone [2]. There is no doubt that calculating the energy consumption of cell phones and optimizing the method of phone service provided by landlines and cell phones is significant for coping with the energy crisis. Although increases in the perceived likelihood of an energy shortage had no effect, increases in the perceived noxiousness or severity of the energy crisis strengthened intentions to reduce energy consumption [3].
Over the last decades, in order to reduce the energy waste of communication equipment, many academics and policymakers have put forward algorithms, methods, models and arguments concerning energy consumption. In [4], the authors compare the power consumption of an SMT DSP with a CMP DSP under different architectural assumptions and find that the SMT DSP uses up to 40% less power than the CMP DSP in their target environment. To reduce idle power, Shih et al. introduce a technique to increase the battery lifetime of a PDA-based phone by reducing its idle power, the power a device consumes in a


"standby" state. Using this technique, we can increase the battery lifetime by up to
115%. In the paper, they describe the design of "wake-on-wireless" energy-saving
strategy and the prototype device they implemented [5].
In this paper, we use available data to build up a transition model and interpret the
steady state to study consequences of the change of electricity utilization after landlines
are replaced by cell phones. Then we consider an optimal way of providing phone service by discussing three different cases and taking into account the convenience of using cell phones instead of landlines. Besides, we use a population growth model and an economic growth model to predict the energy consumption by cell phones in combination with the future population and economy.
This paper is organized as follows. In section two, we design models of energy
consumption by cell phones. Besides, we analyze and discuss the relationship between
the energy consumption and the habit of people using cell phones. Section three is an
application of the models for “Pseudo US”. We find that Americans waste plenty of oil
because of their bad habits. In section four, we make a conclusion.

2 Design of Model
With the development of technology, cell phone usage is mushrooming, and many
people are using cell phones and giving up their landline telephones [6], [12]. Our
model just involves the “energy” consequences of the cell phone revolution. Here, we
make an assumption that every cell phone comes with a battery and a charger. We
design the models with the current US in mind, a country of about 300 million people.

2.1 Relative Definitions

In this paper, we develop a model to analyze how the cell phone revolution impacts
electricity consumption at the national level. The basic component of our model is the
household. A household can exist in one of three disjoint states at a time. The three
states are as follows: (1) Initial State: a household only uses landline telephones. (2)
Acquisition State: when a household acquires its first cell phone. (3) Transition State:
all household members have their own cell phones but the landline is retained.

Definition 1. In order to describe the quality of countries’ phone system, we introduce


the concept of “Convenience”. The phone’s convenience degree has a relationship with
the mobile phone ownership. The larger the country’s mobile phone ownership, the
more convenient the country’s phone system is.

Definition 2. We define "waste" in a simple way: it is the misuse of energy with no utilization. We divide the "waste" of electricity into three different cases: charging the cell phone while it is turned on, continuing to charge the cell phone after it has been fully charged, and leaving the charger plugged in without charging the device.

Definition 3. Compared with "waste", we define "consumption" as the common use of energy; no matter whether the utilization is high or low, no energy is wasted.

2.2 Transition Model

If all the landlines are replaced by cell phones, there will be a change in electricity utilization. Here, we assume that each family has only one landline phone and each member has just one cell phone [7], and that the energy consumption of the average cell phone remains constant. We assume that those who do not have cell phones belong to families that own a landline phone, and that if someone loses a cell phone, a new one is bought immediately. The energy consumed by cell phones is calculated through the following formula:

W (t ) = ( H (t )mP1 − H (t ) P2 ) × t . (1)

where H (t ) is the number of landline-using households at time t (it also denotes the total number of families at time t), m is the average number of family members in the United States, P1 is the average power of cell phones in the U.S. market, and P2 is the average power of a single landline.
As W (t ) changes with time t, it is possible that the energy consumption reaches a steady state. Here, the "steady state" means that the growth of energy consumption remains unchanged over time. Mathematically, we calculate the derivative of function (1), expressed as W ′(t ). Since

H (t ) = H (t0 )e ρ ×t . (2)

where H (t0 ) denotes the total number of families at time t0 and can be considered a constant, and ρ is the growth rate of mobile phone users, we can derive W ′(t ) from functions (1) and (2). Generally, the system reaches the steady state when W ′(t ) equals zero. Since H (t0 ) ≠ 0 and m ≥ 1, W ′(t ) cannot equal zero. Consequently, the steady state can only be reached when all landline users have become mobile phone users.
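The transition model of formulas (1) and (2) can be sketched in Python as below; the parameter values are placeholders for illustration and are not data from the paper.

import math

def energy_consumption(t, H0=1.0e8, m=2.6, P1=5.0, P2=3.0, rho=0.02):
    """W(t) = (H(t)*m*P1 - H(t)*P2) * t with H(t) = H(t0) * exp(rho * t).
    H0, m, P1, P2 (watts) and rho are illustrative placeholder values."""
    H_t = H0 * math.exp(rho * t)
    return (H_t * m * P1 - H_t * P2) * t

# As argued above, W'(t) cannot be zero while H(t0) != 0 and m >= 1, so the
# growth only stabilizes once every landline user has switched to cell phones.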

2.3 The Optimal Way of Providing Phone Service

Consider a second “Pseudo US” [8], a country with about the same economic status as
the current US. However, this emerging country has neither landlines nor cell phones.
We need to find an optimal way of providing them with phone service.

2.3.1 Excluding the Social Consequences


We discuss the broad and hidden consequences of having only landlines, only
cell phones, or a mixture of the two from an energy perspective. Firstly, we
don’t consider the convenience of the cell phone. We can get the total energy
consumption:

W = ω0 × ( P3 β0 + P4 β1 ) × T + (ω1 / m) × P5 × T . (3)
Here ω0 is the population of America who own cell phones and ω1 is the population who do not, so the sum of ω0 and ω1 is the total population. T is the time from the beginning of charging through to the next complete depletion of the phone's battery. P3 is the power of a cell phone while maintaining a call, P4 is the power of a cell phone when it is not used, and P5 is the average power of a single landline. β0 is the percentage of T spent maintaining a call and β1 is the percentage of T during which the cell phone is not used. There are three different conditions as ω0 and ω1 take different values:
(1) When ω0 = 0 and ω1 ≠ 0, all of the people use landlines. At this time, W = (ω1 / m) × P5 × T .
(2) When ω0 ≠ 0 and ω1 ≠ 0, some people use landlines while others use cell phones. The whole energy consumption is expressed by function (3).
(3) When ω0 ≠ 0 and ω1 = 0, all of the people use cell phones. At this time, W = ω0 × ( P3 β0 + P4 β1 ) .

2.3.2 Including the Social Consequences


Since the cell phone has many social consequences that landline phones do not allow, we take the convenience of the cell phone into account. Here is the function:
C = c0 × ω0 + c1 × (ω1 / m) , c0 + c1 = 1 (4)
where c0 is the convenience degree of mobile phones and c1 is the convenience degree of landlines. We consider that min(W ) / max(C ) gives the optimal way of providing phone service.
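The three cases of Section 2.3.1 and the convenience function (4) can be compared with a short Python sketch such as the one below; the population split, power figures and convenience degrees are made-up placeholders.

def total_energy(omega0, omega1, m, P3, P4, P5, beta0, beta1, T):
    """Formula (3): cell-phone users plus landline households."""
    return omega0 * (P3 * beta0 + P4 * beta1) * T + (omega1 / m) * P5 * T

def convenience(omega0, omega1, m, c0, c1):
    """Formula (4): weighted convenience, with c0 + c1 = 1."""
    return c0 * omega0 + c1 * omega1 / m

pop = 300e6
for omega0 in (0.0, 0.6 * pop, pop):      # all landlines, a mixture, all cell phones
    omega1 = pop - omega0
    W = total_energy(omega0, omega1, m=2.6, P3=2.0, P4=0.3, P5=3.0,
                     beta0=0.05, beta1=0.95, T=24.0)
    C = convenience(omega0, omega1, m=2.6, c0=0.7, c1=0.3)
    print(omega0, W, C)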

2.4 Modeling for Energy Wasted by Cell Phones

Considering that people waste electricity in many ways, we divide the waste into three basic situations. For each situation, we derive a waste function, so we can calculate the wasted energy accurately. Here are the details:

2.4.1 Charge the Cell Phone When It Turns on


We take it for granted that charging the cell phone while it is turned on wastes electrical energy. From that, we have the following waste function.

w = p3 × t1 × N (t ) × γ1 . (5)

where p3 is the rated power of cell phones, t1 is the phone standby time, N (t ) is the total population of the United States at time t, and γ1 is the proportion of Americans who charge the cell phone while it is turned on. In order to find the functional relationship between w and t, the relationship between the population N (t ) and t has to be determined first. Here, we can estimate the United States population N (t ) using a logistic model.

2.4.2 Charge the Cell Phone after It Is Fully Charged


There is a proportion of Americans who continue to charge cell phones after they are fully charged [9]. We need to calculate the energy wasted by these people. We know that two-thirds of the electrical energy is wasted by the charger, so we have the second waste function:
w = (2/3) × p4 × t2 × N (t ) × γ2 . (6)
Here p4 is the power drawn by a cell phone that continues to charge after it is fully charged, t2 is the time per day that fully charged cell phones continue to charge, N (t ) is the same as above, and γ2 is the proportion of Americans who continue to charge the cell phone after it is full.

2.4.3 Charger Plugged in without Device


Some people prefer to leave the charger plugged in whether the phone needs to be recharged or not. In this case, we construct the model for wasted energy as follows:

w = p5 × t3 × N (t ) × γ3 . (7)

where p5 is the power drawn while the charger is plugged in but not charging the device, t3 is the time per day that the charger is left plugged in without charging the device, and γ3 is the proportion of people who leave the charger plugged in without charging the device. We simply take the power of the charger as the main power drawn while continuing to charge after the cell phone is fully charged.
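A small Python sketch of the three waste functions (5)–(7) follows; the power figures, charging times and proportions below are placeholders, not the values used later in Section 3.

def waste_charging_while_on(p3, t1, N, gamma1):
    """Formula (5): charging the phone while it is turned on."""
    return p3 * t1 * N * gamma1

def waste_overcharging(p4, t2, N, gamma2):
    """Formula (6): two-thirds of the energy drawn after a full charge is wasted."""
    return (2.0 / 3.0) * p4 * t2 * N * gamma2

def waste_idle_charger(p5, t3, N, gamma3):
    """Formula (7): charger left plugged in with no phone attached."""
    return p5 * t3 * N * gamma3

N = 300e6                                  # illustrative population
total_waste = (waste_charging_while_on(3.1, 2.0, N, 0.10)
               + waste_overcharging(3.145, 5.0, N, 0.05)
               + waste_idle_charger(0.5, 12.0, N, 0.20))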

3 Application of Model
We assume that the growth rate of mobile phone users is the same as the economic growth rate. The above discussion concerns the current situation; now we consider population and economic growth over the next 50 years.

3.1 Prediction Model

For each decade of the next 50 years, we predict the energy needed for providing phone service based upon the analysis in the first three parts, again assuming electricity is provided from oil, and interpret the predictions in terms of barrels of oil. We use the population and economic growth models for this prediction.
The Solow neoclassical model of economic growth adds labor-quality and capital-quality elements to the Cobb-Douglas production function [10], from which we get the model:

α ( λ ) = α (1 + λ ) . (8)

Using α (λ ) and the recursive formula, we obtain the data shown in Fig. 1.

Fig. 1. Energy Consumption

Fig.1 shows the energy needs over the next 50 years, and signifies the energy con-
sumption in terms of oil.

3.2 An Application of the Models for “Pseudo US”

We consider a second "Pseudo US", a country of about 300 million people with about the same economic status as the current US. Cell phones periodically need to be recharged; however, many people always keep their chargers plugged in, and many charge their phones every night whether they need to be recharged or not. This causes a large quantity of energy consumption. Assume that the Pseudo US supplies electricity from oil. Taking the situation in which people continue to charge their cell phones after they are fully charged as an example, we can calculate the wasted energy

according to formula (6). For a particular mobile phone, the battery capacity is C = 850 mAh [11], [12] and the battery voltage is V = 3.7 V; thus p4 = (C × V ) / 1000 = 3.145 W. Taking t = 2009, t2 = 5 and γ2 = 5% [13], we get the result w = 1.2721 × 10^8 J. Converted to oil, American people waste B = w / w4 = 12.085 barrels per day in this way. Similarly, for the other two situations, the figures are 7.411 and 20.804 barrels respectively. Thus, Americans waste 40.3 barrels of oil per day.

4 Conclusions
From the models we build, when landlines are replaced by cell phones there is a change in electricity utilization. We build a transition model to estimate the consumption of energy and obtain the steady-state condition under which the growth of energy consumption remains unchanged. We realize that the energy consumption of phones is very large. At present, the energy crisis is becoming more and more serious, so we have to make the best use of energy and save it.
However, our models still have weaknesses. Our model does not examine all household member dynamics, i.e., members being born, growing old enough to need cell phones, moving out, starting households of their own, etc. Another weakness is that it ignores infrastructure: we do not examine the energy cost of cellular infrastructure (towers, base stations, servers, etc.) as compared to the energy cost of landline infrastructure (i.e., telephone lines and switchboards).

References
1. Robert, L.H.: Mitigation of Maximum World Oil Production: Shortage scenarios. Energy
Policy 36, 881–889 (2008)
2. Mayo, R.N., Ranganathan, P.: Energy Consumption in Mobile Devices: Why Future Sys-
tems Need Requirements–Aware Energy Scale-Down. In: Falsafi, B., VijayKumar, T.N.
(eds.) PACS 2003. LNCS, vol. 3164, pp. 26–40. Springer, Heidelberg (2005)
3. Hass, J.W., Bagley, G.S., Rogers, R.W.: Coping with the Energy Crisis: Effects of Fear
Appeals upon Attitudes toward Energy Consumption. Journal of Applied Psychology 60,
754–756 (1975)
4. Stefanos, K., Girija, N., Alan, D.B., Zhigang, H.: Comparing Power Consumption of an
SMT and a CMP DSP for Mobile Phone Workloads. In: The 2001 International Conference
on Compilers, Architecture, and Synthesis for Embedded Systems (2001)
5. Eugene, S., Paramvir, B., Michael, J.S.: Wake on Wireless: An Event Driven Energy Saving
Strategy for Battery Operated Devices. In: 8th Annual International Conference on Mobile
Computing and Networking, pp. 160–171 (2002)
6. Singhal, P.: Integrated Product Policy Pilot Project. Nokia Corporation (2005)
7. Paolo, B., Andrea, R., Anibal, A.: Energy Efficiency in Household Appliances and Lighting.
Springer, New York (2001)
8. Tobler, W.R.: Pseudo-Cartograms. The Am. Cartographer 13, 40–43 (1986)

9. Sabate, J.A., Kustera, D., Sridhar, S.: Cell-phone Battery Charger Miniaturization. In: In-
dustry Applications Conference, pp. 3036–3043 (2000)
10. Meeusen, W., Broeck, J.: Efficiency Estimation from Cobb-Douglas Production Functions
with Composed Error. International Economic Review 9, 435–444 (1977)
11. Toh, C.: Maximum Battery Life Routing to Support Ubiquitous Mobile Computing in
Wireless ad hoc Networks. IEEE Communications, 138–147 (2001)
Comparison of Ensemble Classifiers in
Extracting Synonymous Chinese Transliteration
Pairs from Web

Chien-Hsing Chen1 and Chung-Chian Hsu2


1 Department of Information Management, Hwa Hsia Institute of Technology,
111 Gong Jhuan Rd., Chung Ho, Taipei, Taiwan
2 Department of Information Management, National Yunlin University of
Science and Technology, 123 University Road, sec. 3, Douliou, Yunlin, Taiwan
[email protected], [email protected]

Abstract. There is no transliteration standard across all Chinese language re-


gions, including China, Hong Kong, and Taiwan, and variations in Chinese
transliteration have thus arisen in the process of transliterating foreign languages
(English, for instance) into the Chinese language. In this paper, we compare several ensemble classifiers in confirming whether a pair, that is, a transliteration and another term, is synonymous. We construct a new confirmation framework to confirm whether a pair consisting of a Chinese transliteration and another Chinese term is synonymous. The presented framework is applied to extract synonymous transliteration pairs from a real-world Web corpus; this is valuable for building a new database of synonymous transliterations and for supporting search engines so that they can return more complete documents as Web search results. Experiments show that our integrated confirma-
tion framework is effective and robust in confirming and extracting pairs of
Chinese transliteration following the collection of synonymous transliterations
from the Web corpus.

Keywords: Chinese transliteration variation, pronunciation-based approach,


ensemble scheme, boosting, and bagging.

1 Introduction
There is no transliteration standard across all Chinese language regions; thus, many

different Chinese transliterations can arise. As an example, the Australian city "Sydney" produces the different transliterations 悉尼 (xi ni), 雪梨 (xue li) and 雪黎 (xue li). Someone who uses the Chinese language may never know all these different Chi-
nese synonymous transliterations; hence, this level of Chinese transliteration variation
leads readers to mistake transliterated results or to retrieve incomplete results when
searching the Web for documents or pages if a trivial transliteration is submitted as the
search keyword in a search engine such as Google or Yahoo. Moreover, while varia-
tions in Chinese transliteration have already emerged in all Chinese language regions,
including China, Hong Kong and Taiwan, we still lack effective methods to address
this variation. Most research focuses on machine transliteration across two different

Y. Tan et al. (Eds.): ICSI 2011, Part II, LNCS 6729, pp. 236–243, 2011.
© Springer-Verlag Berlin Heidelberg 2011

languages; in contrast, fewer efforts in the literature have focused on confirming a pair
comprised of a Chinese transliteration term and a Chinese term (or another Chinese
transliteration) as to whether it is synonymous.
In this paper, we compare several ensemble classifiers in confirming whether a pair is "synonymous" or "not synonymous". We first construct an integrated confirmation framework (ICF) that uses a majority voting scheme and a boosting scheme [1] together to confirm pairs robustly, since majority voting and boosting have been used to reduce noise and overfitting when training classifiers. Then, the well-known ensemble classifiers boosting [1] and bagging [2] are applied to this classification problem. The contribution of this research lies in that the results of
the confirmation framework can be applied to construct a new database of synonymous
transliterations, which can then be used to increase the size of the transliterated voca-
bulary, making it useful to expand an input query in search engines such as Google and
Yahoo. This could alleviate the problem of incomplete search results stemming from
the existence of different transliterations of a single foreign word.

2 Decision-Making

2.1 Similarity Evaluation among Sequences Based on Pronunciation Approaches

Two major steps are included in the framework for the sake of confirming whether a
pair is synonymous. First, we study two Romanization transcription systems, including
the National Phonetic System of Taiwan (BPMF system) and the Pinyin system, to
transcribe Chinese characters into sound alphabets. The BPMF system is used to
transcribe a Chinese character into a phonetic sequence for the use of CSC [3] and LC
[4]; the Pinyin system is used for ALINE [5], FSP [6] and PLCS [7].
Measuring similarity for two sets of sound alphabet sequences produces a similarity
score between two transliterations. Assume that we have two Chinese transliterations A
={a1, …, an, …, aN} and B ={b1, …, bm, …, bM}, where an is the nth character of A and
bm is the mth character of B. N may not be equal to M. The characters an and bm are
formed into sound alphabet sequences an ={ an,1, …, a n,i, …, a n,I} and bm ={ bm,1, …, b
m,j, …, b n,J}, respectively. The alphabets an,i and bm,j are generated by either the BPMF
system or the Pinyin system.
Second, we use a dynamic programming-based approach to obtain the similarity
score for a given Chinese pair; that is, a Chinese transliteration versus another Chinese
term. To acquire the maximum similarity score between two sets of sound alphabet
sequence (formed from A and B, respectively), which is represented as score(A,B), a
dynamic programming-based approach can be used to acquire the largest distortion
between A and B by adjusting the warp on the axis of T(n,m) of sim(an,bm), which
represents the similarity between an and bm. The recursive formula (1) is defined as
follows.
T (n, m) = max{ T (n−1, m−1) + sim(an , bm ), T (n−1, m), T (n, m−1) } (1)

where the base conditions are defined as T (n, 0) = 0 and T (0, m) = 0. To avoid longer transliterations appearing more similar simply because they acquire a higher T (N, M ), the similarity score must be normalized, taking into account the average length of the transliterations, as defined below.

score(A, B) = T (N, M ) / ((N + M ) / 2) (2)

where the formula respects the similarity range [0,1]; accordingly, the two normalized
scores in the above examples are 0.87 and 0.71, respectively.
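A Python sketch of the alignment defined by formulas (1) and (2) is given below, assuming a character-level similarity function sim(an, bm) over the sound-alphabet sequences is already available from one of the pronunciation-based approaches.

def transliteration_similarity(A, B, sim):
    """A, B: lists of sound-alphabet sequences, one per Chinese character.
    sim(a, b): similarity of two characters' sequences, in [0, 1].
    Returns score(A, B) = T(N, M) / ((N + M) / 2) as in formula (2)."""
    N, M = len(A), len(B)
    T = [[0.0] * (M + 1) for _ in range(N + 1)]   # base conditions T(n,0)=T(0,m)=0
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            T[n][m] = max(T[n - 1][m - 1] + sim(A[n - 1], B[m - 1]),
                          T[n - 1][m],
                          T[n][m - 1])
    return T[N][M] / ((N + M) / 2.0)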

2.2 Definition and Decision-Making Using a Similarity Entity

Let X be a dataset containing a set of n data pairs, and let xj be a pair consisting of a transliteration and another Chinese term, which corresponds to a class label yj ∈ Y representing a synonymous pair or a non-synonymous pair. Let M = {m1, …, mI} be a set of pronunciation-based approaches, where mi is the ith approach in M. For a pair xj, let scorej = {scorej,1, …, scorej,I} be a set of similarity scores, where scorej,i is measured by mi (using formula (2)) for xj, and let vj = {vj,1, …, vj,I} be a set of decisions, where vj,i is a decision (i.e., a vote) taken from scorej,i. In particular, a pair xj has three entities, namely, yj, vj and scorej.
The similarity entity scorej drives the decision entity vj. Most studies in the literature take a vote, represented as vj,i, that is accepted when scorej,i ≥ θi and rejected when scorej,i < θi. The parameter θi is a threshold. A higher value of θi often brings higher precision but lower recall, whereas a lower value of θi often brings lower precision but higher recall. Nevertheless, the determination of the appropriate parameters θi is usually empirical in many applications of information retrieval.
Instead of requiring the parameters θi, we use the K-nearest neighbor algorithm to obtain vj,i with the help of scorej, because it provides a rule that xj can be classified according to its K nearest neighbor pairs; by the same token, the vote vj,i is assigned by a majority vote over the votes vj→k,i of the K nearest neighbors with respect to scorej,i, where "j → k" represents the kth nearest neighbor training pair of xj. Initially, we set vr,i = yr in advance if xr is a training pair.
Since a majority-voting scheme is a well-known integrated voting approach for generating a final decision, it is applied to obtain a class label. The class label yj is determined using a majority-voting scheme on vj. In particular, the voting function h(xj) determines a predicted class label via a majority vote of vj and is written as

h(xj ) = argmaxy Σi δ(vj,i , y) (3)

where the function δ returns a Boolean value.
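The decision step can be sketched as follows: for each approach mi, the vote for a pair is the majority label of the K nearest training pairs under that approach's similarity score, and the pair's class label is then a majority vote over the per-approach votes, as in formula (3). The flat data layout is an assumption for illustration.

from collections import Counter

def knn_vote(score_ji, train_scores, train_labels, K=5):
    """Vote of one approach for pair x_j from its K nearest training pairs."""
    nearest = sorted(range(len(train_scores)),
                     key=lambda r: abs(train_scores[r] - score_ji))[:K]
    return Counter(train_labels[r] for r in nearest).most_common(1)[0][0]

def majority_vote(votes):
    """Formula (3): final class label h(x_j) from the per-approach votes v_j."""
    return Counter(votes).most_common(1)[0][0]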

2.3 Hypotheses Combination on the Confirmation of Chinese Transliteration


Pairs

The ensemble framework proposed in this paper considers the use of multiple learning approaches M = {m1, …, mI} and multiple data fractions X1, X2, …, XT. Let M be

a set of learning approaches based on the pronunciation model, and let {X1, …, XT} be a set of training datasets generated by a boosting scheme [8] that fits variant data distributions of a participative training dataset for accurate confirmation performance. Following the generation of variant data distributions, Xt is evolved from Xt-1 using a bootstrapping strategy in the training process. It is worth mentioning that a pair must be learned more frequently when it is not easy to confirm. In other words, a pair xj will appear with much higher probability in Xt if it acquired the wrong predicted class label in Xt-1. In contrast, if xj received the correct class label in Xt-1, it may not appear in Xt.
The final voting function Hfin, which integrates the T votes into a final vote, is performed over T rounds. Thus, a T-dimensional voting vector is made for each xj and given via h1(xj), …, hT(xj). Additionally, a learning round whose accuracy rate is lower than the random-guess accuracy (1 − 1/|y|) will not contribute to the final vote. The function Hfin for xj is written as

Hfin (xj ) = argmaxy Σt wt · δ(ht (xj ), y) (4)

where ht represents the integrated vote for xj at the tth round, and the function δ returns a Boolean value. We extend h(.), which was given in formula (3), to a weighted majority-voting function ht(.) related to the various contributions of the set of approaches at the tth round. In addition, the extended formula must take the parameter t into account. The extended equation is written as

ht (xj ) = argmaxy Σi wi,t · δ(vj,i , y) (5)

Providing different voting confidences for a repeatable learning procedure is indeed necessary. In other words, it is quite understandable that the rounds have different weights with respect to their capabilities in their corresponding learning spaces; in addition, the comparison approaches have different capabilities. Two weighted entities wt and wi,t are learned from the learning property for round t. We write

(6)

where εt is the probability of training error at the tth round. In addition, we also write

(7)

where εi,t is the probability of training error of the comparison approach mi at the tth round.
These entities are good candidates for driving the data distribution for Xt. A pair xj that obtains the correct vote at round t will receive a lower probability value Dj(t+1) and will be less likely to be drawn at round t+1. Dj(t+1) is expressed as

Dj (t+1) = Dj (t) if ht (xj ) ≠ yj ; otherwise Dj (t+1) = Dj (t) × εt / (1 − εt ) (8)
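A sketch of the resampling driven by formula (8): correctly confirmed pairs are down-weighted by εt/(1 − εt), the distribution is renormalized, and the next training set Xt+1 is drawn from it; the helper names are assumptions, not the authors' code.

import random

def update_distribution(D, predictions, labels, eps_t):
    """Formula (8): keep the weight of misclassified pairs, shrink the rest."""
    new_D = [d if p != y else d * eps_t / (1.0 - eps_t)
             for d, p, y in zip(D, predictions, labels)]
    total = sum(new_D)
    return [d / total for d in new_D]

def resample(pairs, D):
    """Draw the next training set X_{t+1} according to the distribution D."""
    return random.choices(pairs, weights=D, k=len(pairs))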

3 Experiments

3.1 Preparation of the Training Dataset

The data source is selected from the study in [3] in which the dataset contains a total of
188 transliterations collected from Web news sources. These transliterations are proper
names, including geographic, entertainment, sport, political and some personal names.
They are built as a set of pairs, some of which are synonymous and others of which are
not synonymous pairs. In other words, the class label of each pair is known in advance.
The pairs are constructed as a training dataset and are used for decision-making.
In particular, a total of 17,578 unique pairs (C(188, 2)) is obtained. However, we only allow the length difference within a pair to be at most one, because the length difference between a Chinese transliteration and its actual synonym is often at most one [3]. From this point of view, many pairs can be ignored; thus, we retain a total of 12,006 pairs, which include 436 actual synonymous pairs and 11,570 pseudo-synonymous pairs (i.e., pairs that are not synonymous).
In order to reduce the likelihood of participative training data driving confirmation
performance as well as to ignore the influences of an imbalanced training dataset, we
perform a validation task involving ten different datasets selected from the training data
by sampling without replacement and thus ensure the number of positive pairs is the
same as the number of negative ones. Therefore, ten training datasets, each of which
includes 436 positive pairs and 436 negative ones, are used for the experiments.

3.2 Description of the Input Transliterations

Two datasets, D50 and D97, are used for the experiments in [9] and contain translite-
rations. The second dataset, referred to as D97, is from the 2008 TIME 100 list of the
world's most influential people. There are a total of 104 names in the list, since four
entries include two names. Ninety-seven names are retained for the experiment. Seven
names are ignored, namely, Ying-Jeou Ma, Jintao Hu, Jeff Han, Jiwei Lou, Dalai Lama,
Takahashi Murakami, and Radiohead. The first five have Chinese last names that have
standard Chinese translations. The sixth term is a Japanese name for which translation
is usually not done using transliteration. The last name is that of a music band; its
translation to Chinese is not according to its pronunciation, but its meaning.

3.3 Constructing Pairs from the Web

In this experiment, we input the transliterations in D50 and D97 to collect their syn-
onyms from a real-world Web corpus using the integrated confirmation framework
proposed in this paper. For each transliteration, we collected Web snippets by submit-
ting a search keyword to the Google search engine. The search keyword is used to
retrieve Web snippets; however, it does not contribute information to the confirmation
framework, which determines whether a pair is synonymous.
To construct a pair, we use the original term of the given transliteration as a search
keyword, because the original term is able to retrieve appropriate Web documents in
which the transliteration’s synonyms appear. Let a transliteration (abbreviated as TL)

be an entry. The TL’s original term (abbreviated as ORI), which is treated as the search
keyword for the search engine, is represented as QOri and is submitted to retrieve search
result Web snippets, represented as DOri. The set DOri is limited to Chinese-dominant
Web snippets. The procedure of returning a pair by collecting Web snippets from the
Google search engine is as follows.
A. For each TL in D50 and D97, we use QORI to download Web snippets DORI. In
particular, we set |DORI | to 20 for each TL because the snippets appearing at the
head of the returned snippets are often more relevant to the research keyword. The
size of the downloaded DORI for D50 is 1,000, whereas the size of the downloaded
DORI for D97 is 1,940.
B. We delete known vocabulary terms with the help of a Chinese dictionary for DORI
and apply an N-gram algorithm to segment Chinese n-gram terms for the remain-
ing fractional sentences in DORI. Furthermore, most synonymous transliterations
(TLs with their STs) have the same length, but some of them have different lengths
of at most one [3]. Therefore, we retain the Chinese terms from DORI while con-
trolling for length. Each Chinese term of length N is retained, with N=|TL|-1 to
N=|TL|+1 and N ≥ 2.The number of remaining pairs for D50 is 9,439, whereas that
for D97 is 19,263, where the pair consists of a given TL and a remaining Chinese
n-gram term.
C. However, some pairs have similarities that are not high enough and thus are never
considered synonymous pairs. We set a similarity threshold to ignore those pairs.
According to the findings in [3], a lower similarity threshold can be set to 0.5 by
using the CSC approach to cover effectively all examples of synonymous transli-
terations. After discarding the pairs with similarities lower than 0.5, a total of 2,132
and 5,324 pairs are retained for D50 and D97, respectively. These pairs are con-
firmed by the use of the framework proposed in this paper and will be discussed in
next section.

3.4 Confirmation of Synonymous Transliterations and Performance Analysis


The experiments demonstrate whether the proposed framework is effective in ex-
tracting synonymous transliterations from the Web. The following nine approaches are
employed for comparison in the experiments.

- The integrated confirmation framework (ICF): This is the ensemble framework proposed in this paper.
- The majority-voting approach (MV): This is a simple ensemble approach. We use equation (3) to perform this approach.
- The individual approaches: There are five approaches, CSC, LC, ALINE, FSP and PLCS, each of which is performed individually in the experiment.
A feature vector with five dimensions generated by these five approaches can be used for the experiments; hence, a classification-learning algorithm can be applied to predict the class label for each 5-tuple pair. The following two approaches are popular in the literature and are employed for comparison in this paper.

- Bagging [2]: This combines multiple classifiers to predict the class label for a pair by integrating their corresponding votes. The base classification algorithm we used is KNN with k=5 due to its simplicity.
- Boosting [1, 8]: This requires a weak learning algorithm. We use KNN with k=5 in this study.
ICF, bagging and boosting are the same in that they determine a parameter T, the
number of iterations. One study [8] set the parameter T to 10 to use the boosting
scheme. We follow the same setting for our experiments.
A total of ten results are obtained for the testing data in the experiment, since we
have ten training datasets involved in the validation process. The evaluator used for the
experiment is the accuracy measure, which is common in a classification task. More-
over, we use a box-plot analysis to graphically compare a total of nine approaches,
including ICF, boosting, bagging, MV, and five individual approaches (CSC, LC,
ALINE, FSP and PLCS). The results are shown in Figure 1.


Fig. 1. Box-plot analysis for nine approaches in the testing datasets (a) D50 and (b) D97

In Figure 1, the experimental results show that the average accuracy in the confirmation of Chinese transliteration pairs for the three ensemble approaches (namely, ICF, boosting, and bagging) is higher than that of the individual approaches. This is because the three ensemble approaches allow repeated learning over variant data distributions, whereas the individual approaches perform the experiments only once, driven by the participative training datasets. In addition, ICF achieves an average accuracy of 0.93 on D50 and 0.89 on D97 and is the best among the nine approaches, because it considers several individual approaches together when evaluating variant data distributions. By contrast, CSC achieves an average accuracy of 0.88 on D50 and 0.85 on D97 and is the best of the five individual approaches. Moreover, a shorter distance between the top and the bottom of a box in the box-plot analysis demonstrates that ICF produces a much more stable performance than the others do; in contrast, bagging produces the most unstable performance among the ensemble approaches. This is because ICF best achieves the learning objectives under variant data distributions. Furthermore, all five individual approaches produce a less stable performance than the ensemble approaches do, because they are seriously affected by the training datasets.

4 Conclusions
In this paper, we propose a new ensemble framework for confirming Chinese transli-
teration pairs. Our framework confirms and extracts pairs of synonymous transliterations from a real-world Web corpus, which helps search engines such as Google and Yahoo retrieve more complete search results. Our framework uses the majority-voting scheme and the boosting scheme together. The experimental results were evaluated according to the proposed
framework in this paper, comparing boosting, bagging, general majority voting, and
five individual approaches. The experimental results demonstrate that the proposed
framework is robust for improving performance in terms of classification accuracy and
stability.

References
1. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proceedings of
the 13th International Conference on Machine Learning, pp. 148–156 (1996)
2. Breiman, L.: Bagging Predictors. Machine Learning 24, 123–140 (1996)
3. Hsu, C.C., Chen, C.H., Shih, T.T., Chen, C.K.: Measuring similarity between transliterations
against noise data. ACM Transactions on Asian Language Information Processing 6, 1–20
(2007)
4. Lin, W.H., Chen, H.H.: Similarity measure in backward transliteration between different
character sets and its applications to CLIR. In: Proceedings of Research on Computational
Linguistics Conference XIII, Taipei, Taiwan, pp. 97–113 (2000)
5. Kondrak, G.: Phonetic alignment and similarity. Computers and the Humanities 37, 273–291
(2003)
6. Connolly, J.H.: Quantifying target-realization differences. Clinical Linguistics & Phonetics,
267–298 (1997)
7. Gao, W., Wong, K.-F., Lam, W.: Phoneme-based transliteration of foreign names for OOV
problem. In: Su, K.-Y., Tsujii, J., Lee, J.-H., Kwong, O.Y. (eds.) IJCNLP 2004. LNCS
(LNAI), vol. 3248, pp. 110–119. Springer, Heidelberg (2005)
8. Sun, Y., Wang, Y., Wong, A.K.C.: Boosting an associative classifier. IEEE Transactions on
Knowledge and Data Engineering 18, 988–992 (2006)
9. Hsu, C.C., Chen, C.H.: Mining Synonymous Transliterations from the World Wide Web.
ACM Transactions on Asian Language Information Processing 9(1), 1–28 (2010)
Combining Classifiers by Particle Swarms with Local
Search

Liying Yang

School of Computer Science and Technology, Xidian University, Xi’an, 710071, China
[email protected]

Abstract. A weighted combination model with an appropriate weight vector is very effective in multiple classifier systems. In our previous work we presented a method, called PSO-WCM, for determining the weight vector by particle swarm optimization. A weighted combination model, PSO-LS-WCM, is proposed in this paper to improve the classification performance further; it obtains the weight vector by particle swarm optimization with local search. We describe the PSO-LS-WCM algorithm in detail. Seven real-world problems from the UCI Machine Learning Repository are used in experiments to justify the validity of the approach. It is shown that PSO-LS-WCM is better than PSO-WCM and six other combination methods from the literature.

Keywords: Multiple Classifier Systems, Combination Method, Particle Swarm


Optimization, Local Search.

1 Introduction
Combining classifiers is one of the most prominent techniques currently used to
augment the accuracy of learning algorithms. Instead of evaluating a set of different algorithms against a representative validation set and selecting the best one, multiple classifier systems (MCS) integrate several models for the same problem. MCS
came alive in the 90’s of last century, and almost immediately produced promising
results [1][2]. From this beginning, research in this domain has increased and grown
tremendously, partly as a result of the coincident advances in the technology itself.
These technological developments include the production of very fast and low cost
computers that have made many complex pattern recognition algorithms practicable
[3]. A large number of combination schemes have been proposed in the literature [4].
Majority vote is the simplest combination method and has been a much-studied sub-
ject among mathematicians and social scientists. In majority vote, each individual has
the same importance. A natural extension of majority vote is to assign weights to different individuals; thus the weighted combination algorithm is obtained. Since under most circumstances there are differences between individuals, the weighted combination algorithm provides a more appropriate solution. The key to the weighted combination algorithm is the weights. Two weighted combination models based on particle swarm optimization were proposed in our previous work [5][6]. In order to avoid the local optima of PSO-WCM, a new weighted combination model is proposed in this paper, which couples PSO with local search to combine multiple classifiers.


2 Particle Swarm Optimization with Local Search

2.1 Particle Swarm Optimization

Eberhart and Kennedy introduced Particle Swarm Optimization (PSO) in 1995, in


which candidate solutions are denoted by particles [7][8]. Each particle is a point in
the search space and has two attribute values: fitness determined by the problem and
velocity to decide the flying. Particles adjust their flying toward a promising area
according to their own experience and the social information in the swarm. Thus they
will at last reach the destination through continuous adjustment in the iteration. Given
a D-dimension search space, N particles constitute the swarm. The i-th particle is
denoted by xi = ( xi1 , xi 2 ,..., xiD ), i = 1, 2,..., N . Taking xi into the objective function, the fitness for the i-th particle can be worked out, which tells the quality of the current particle, i.e., the current solution. The current velocity and the best previous
solution for the i-th particle are represented by vi = (vi1 , vi 2 ,..., viD ) and
pi = ( pi1 , pi 2 ,..., piD ) . The best solution achieved by the whole swarm so far is
denoted by pg = ( pg1 , p g 2 ,..., pgD ) . In Eberhart and Kennedy's original version,
particles are manipulated according to the following equations:

vid (t +1) = vid (t ) + c1r1 ( pid − xid (t ) ) + c2 r2 ( pgd − xid (t ) ) . (1)

xid (t +1) = xid (t ) + vid ( t +1) . (2)

where t is the loop counter; i = 1,..., m ; d = 1,..., D ; c1 and c2 are two positive
constants called cognitive learning rate and social learning rate respectively; r1 and
r2 are random numbers in the range [0,1]. The velocity vid is limited in
[−vmax , vmax ] with vmax a constant determined by the specific problem. The original version of PSO lacks a velocity control mechanism, so it has a poor ability to search at a fine grain [9]. Many researchers have devoted themselves to overcoming this disadvantage. Shi and
Eberhart introduced a time decreasing inertia factor to equation (1) [10]:

vid (t +1) = μ vid ( t ) + c1r1 ( pid − xid (t ) ) + c2 r2 ( pgd − xid ( t ) ) . (3)

where μ is the inertia factor, which balances the global wide-range exploration and the local nearby exploitation abilities of the swarm. Clerc introduced a constriction factor a into equation (2) to constrain and control the velocity magnitude [11]:

xid (t +1) = xid (t ) + avid (t +1) . (4)

Equations (3) and (4) above are called classical PSO, which is more efficient and precise than the original version because the global variables are adjusted adaptively.
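A minimal Python sketch of one classical PSO update according to equations (3) and (4); the parameter values and the velocity clamp are illustrative assumptions.

import random

def pso_step(x, v, p_best, g_best, mu=0.7, c1=2.0, c2=2.0, a=1.0, v_max=1.0):
    """Update one particle's velocity (equation (3)) and position (equation (4))."""
    new_x, new_v = [], []
    for xd, vd, pd, gd in zip(x, v, p_best, g_best):
        r1, r2 = random.random(), random.random()
        vd = mu * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
        vd = max(-v_max, min(v_max, vd))        # keep |v| <= v_max
        new_v.append(vd)
        new_x.append(xd + a * vd)
    return new_x, new_v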

2.2 Hill-Climbing Used as Local Search Algorithm in PSO

Hill-climbing is a typical local search algorithm used in many fields, partly due to its easy implementation and flexible transformation of particles. Aiming to avoid some drawbacks of classical PSO, such as falling into local extrema and low convergence precision in the late evolutionary stage, we adopt a hybrid of particle swarm optimization and the hill-climbing algorithm, called PSO-LS in [12]. In PSO-LS, each particle has a chance of self-improvement by applying the hill-climbing algorithm before it exchanges information with other particles in the swarm. Hill-climbing used as the local search algorithm in our work is executed as follows.

Procedure of hill-climbing algorithm for local search


Step 1. Initialization: select an initial solution w0
and specify the neighborhood function as follows.

wnew = wcurrent + r (1 − 2rand ()) . (5)

where r represents the changing range of the original
solution and rand() returns a random value between 0 and 1.
Step 2. Set the maximum number of loops T1 to a large
enough number or according to circumstances, set the
loop counter t1=1, and set wcurrent=w0;
Step 3. While t1<=T1
Step 3.1 Generate wnew according to formula (5) ;
Step 3.2 Calculate the change in objective func-
tion Δ =C(wnew)-C(wcurrent);
Step 3.3 If Δ < 0 , accept solution wnew and set
wcurrent=wnew;
Step 3.4 t1= t1+1;
Step 4. End

3 Weighted Combination Model Based on PSO-LS

3.1 Weighted Combination Model

Weighted Combination Model (WCM) is an extension of voting method, in which

∑ w = W ,K is
K
there are two types of constraint [13]. One is Sum-W constraint: i
i =1

the number of classifiers. The other is non-negativity constraint: wi ≥ 0 for all


i=1, …, K. Consider a pattern recognition problem with M classes (C1 , C2 ,..., CM )
and K classifiers ( R1 , R2 ,..., RK ) . For a given sample x, Ri (i = 1,..., K ) outputs
ER i = (ei (1),..., ei ( M )) , where ei ( j )( j = 1,..., M ) denotes the probability that x

is from class j according to Ri . The weight vector for classifier ensemble is


represented as w = ( w1 ,..., wK ) with ∑k=1..K wk = 1 . Let E = ( ER1 ,..., ERK ) . The
sample x is classified into the class with maximum posteriori probability and the deci-
sion rule is:
x → Cj , if ∑i=1..K wi ei ( j ) = maxk=1..M ( ∑i=1..K wi ei (k ) ) . (6)
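A short Python sketch of decision rule (6), assuming each classifier returns a list of M per-class probabilities e_i(j) and that the weights satisfy the constraints of Section 3.1.

def weighted_combination(outputs, weights):
    """outputs: K lists of M class probabilities; weights: K weights summing to 1.
    Returns the index j of the class with the largest weighted sum, as in rule (6)."""
    M = len(outputs[0])
    fused = [sum(w * e[j] for w, e in zip(weights, outputs)) for j in range(M)]
    return max(range(M), key=fused.__getitem__)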

There are two methods for acquiring the weights in WCM. One sets fixed weights for each classifier according to experience or other prior knowledge; the other obtains the weights by training. In our previous work, we proposed a combination algorithm
which determined the weights based on PSO (PSO-WCM) [5].

3.2 PSO-LS-WCM Algorithm

To improve the performance of PSO-WCM, a weighted combination model based on


PSO-LS, that is, PSO-LS-WCM is presented in this work. Optimal weights are
achieved by searching in K-dimension space. A solution is a particle in PSO-LS and
coded into one K-dimension vector w = ( w1 ,..., wK ) . Fitness function is computed
as combination model’s error rate on validation set using the weights.

Pseudo code of PSO-LS-WCM algorithm

Begin PSO-LS-WCM
Step 1.Initialize the parameters: swarm size N, max
loop times in local search T1, max iteration of
PSO T2;
Step 2. Randomly generating N particles;
Step 3. Calculate the fitness for each of the N
particles;
Step 4. Calculate Pi(i=1…N) and Pg, set t=1;
Step 5. While t<=T2
5.1 For every pi(i=1…N), do local Search as shown in
Section 2.2;
5.2 Update Pi and Pg;
5.3 Update the velocity according to formula (3);
5.4 Update the position according to formula (4);
5.5 Evaluate the fitness for each particle in current
iteration;
5.6 update Pi and Pg;
5.7 t=t+1;
Step 6. End while
End PSO-LS-WCM

4 Experiments and Discussion

4.1 Individual Classifier

Five classifiers used in this work are: (1) LDC, Linear Discriminant Classifier; (2)
QDC, Quadratic Discriminant Classifier; (3) KNNC, K-Nearest Neighbor Classifier
with K=3; (4) TREEC, a decision tree classifier; (5) BPXNC, a neural network classi-
fier based on MathWorks' trainbpx with 1 hidden layer and 5 neurons in the hidden
layer.

4.2 Data Sets

PSO-LS-WCM was applied to seven real world problems from the UCI repository:
Pima, Vehicle, Glass, Waveform, Satimage, Iris and Wine [14]. For each dataset, 2/3
examples were used as training data, 1/6 validation data and 1/6 test data. In other com-
bination rules or individual classifiers, 2/3 examples were used as training data and 1/3
test data. All experiments were repeated for 10 runs and averages were computed as the
final results. Note that all subsets were kept the same class probabilities distribution as
original data sets. The characteristics of these data sets are shown in Table 1.

Table 1. Characteristics of data sets

#Samples #Attributes #Classes


Pima 768 8 2
Glass 214 9 7
Iris 150 4 3
Satimage 4435 36 6
Vehicle 846 18 4
Waveform 5000 21 3
Wine 178 13 3

4.3 Configurations

Hill-climbing. The number of loops in local search is T1 = 5; r was initially set to 0.1 times the search space and linearly decreased to 0.005 times as the iterations increased.

PSO. Since there are 5 classifiers, the number of weights is 5. A particle in PSO was coded as one 4-dimensional vector w = (w1, w2, w3, w4); the fifth weight w5 was computed according to ∑_{k=1}^{5} wk = 1. Classical PSO was adopted. The parameters were set as follows: swarm size N = 10; inertia factor μ linearly decreased from 0.9 to 0.4; c1 = c2 = 2; constriction factor a = 1; for the i-th particle, each dimension of the position vector xi and the velocity vector vi was initialized as a random number in the range [0,1] and [-1,1], respectively; max iteration T2 = 500.

4.4 Results and Discussion

The performance of the individual classifiers is listed in Table 2. It shows that different classifiers achieve different performance on the same task, and no single classifier is superior on all problems. For the purpose of comparison, the individual classifiers were combined by the majority vote rule, max rule, min rule, mean rule, median rule, product rule, PSO-WCM and PSO-LS-WCM [15]. The ensemble learning performance is given in Table 3.
It shows that PSO-LS-WCM outperforms all the compared combination rules and the best individual classifier on the data sets Pima, Satimage, Vehicle and Waveform. These data sets have a common characteristic: the sample size is large. Therefore, the optimal weights obtained on the validation set are also representative on the test set. The same is not true on the smaller data sets (such as Glass, Iris, and Wine), for the obvious reason that overfitting tends to occur. Optimal weights may already appear during initialization, so the succeeding optimization adds little. However, on these small data sets PSO-LS-WCM performs as well as the other methods or obtains a median result, which avoids selecting the worst classifier (the worst-case motivation for multiple classifier systems).
From Table 3 we can also see that PSO-LS-WCM is better than PSO-WCM. The error rates of the two combination methods are plotted in Fig. 1 to give an intuitive comparison.

Table 2. Error rate of individual classifiers

Data sets  Pima    Glass   Iris    Satimage  Vehicle  Waveform  Wine
LDC        0.2252  0.3529  0.0083  0.1591    0.2114   0.1423    0
QDC        0.2441  0.6176  0.0167  0.1457    0.1600   0.1510    0.0071
KNNC       0.2535  0.3824  0.0333  0.1104    0.3557   0.1418    0.2571
TREEC      0.3354  0.2941  0.0667  0.1797    0.2857   0.2995    0.0929
BPXNC      0.2268  0.3824  0.0167  0.3268    0.1957   0.1380    0.1000

Table 3. Error rate comparison of combination algorithms

Data sets      Pima    Glass   Iris    Satimage  Vehicle  Waveform  Wine
Majority Vote  0.2205  0.3235  0.0083  0.1179    0.1771   0.1406    0.0071
Max Rule       0.2362  0.5294  0.0167  0.2088    0.1614   0.1550    0.0071
Min Rule       0.2362  0.5588  0.0250  0.3018    0.1743   0.1798    0.0071
Mean Rule      0.2173  0.3529  0.0083  0.1336    0.1671   0.1401    0.0071
Median Rule    0.2205  0.3529  0.0083  0.1146    0.1771   0.1397    0.0071
Product Rule   0.2236  0.5682  0.0083  0.2904    0.1611   0.1412    0.0071
PSO-WCM        0.2063  0.3529  0.0083  0.1080    0.1600   0.1369    0.0071
PSO-LS-WCM     0.1863  0.3529  0.0083  0.0984    0.1571   0.1258    0.0071

[Figure: bar chart of error rates (0.00–0.35) for PSO-WCM and PSO-LS-WCM on the seven data sets]

Fig. 1. Error rates of PSO-WCM and PSO-LS-WCM on seven data sets

5 Conclusion
An evolutionary-computation-based weighted combination model is a natural approach to linear combiners in ensemble learning. It trains base learners and combines them with specific weights rather than identical ones. In this paper we present a weighted combination method based on particle swarm optimization with local search, namely PSO-LS-WCM, which can avoid the local optima of the PSO-WCM proposed in our previous work. Experiments were carried out on seven data sets from the UCI repository. They show that PSO-LS-WCM performs better than the individual classifiers, the majority voting rule, max rule, min rule, mean rule, median rule, product rule, and PSO-WCM.

Acknowledgments. This work was supported by the Science and Technology


Research Development Program in Shaanxi province of China (No.2009K01-56) and
the Fundamental Research Funds for the Central Universities (No.K50510030007).

References
1. Hansen, L., Salamon, P.: Neural Network Ensembles. IEEE Transactions on Pattern
Analysis and Machine Intelligence 12, 993–1001 (1990)
2. Brown, G.: Ensemble Learning. In: Encyclopedia of Machine Learning. Springer Press,
Heidelberg (2010)
3. Suen, C.Y., Lam, L.: Multiple classifier combination methodologies for different output
levels. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 52–66. Springer,
Heidelberg (2000)
4. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. John Wiley and
Sons, Inc., Chichester (2004)

5. Yang, L.-y., Qin, Z.: Combining Classifiers with Particle Swarms. In: Wang, L., Chen, K., Ong, Y.S. (eds.) ICNC 2005. LNCS, vol. 3611, pp. 756–763. Springer, Heidelberg (2005)
6. Yang, L., Zhang, J., Wang, W.: Selecting and Combining Classifiers Simultaneously with
Particle Swarm Optimization. Information Technology Journal 8(2), 241–245 (2009)
7. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: IEEE International Conference
on Neural Networks, Perth, Australia, vol. 4, pp. 1942–1948 (1995)
8. Eberhart, R., Kennedy, J.: A New Optimizer Using Particle Swarm Theory. In: Proceeding
of the Sixth International Symposium on Micro Machine and Human Science, Nagoya,
Japan, pp. 39–43 (1995)
9. Angeline, P.J.: Evolutionary optimization versus particle swarm optimization: philosophy and performance differences. In: Evolutionary Programming VII: Proceedings of the Seventh Annual Conference on Evolutionary Programming (1998)
10. Shi, Y., Eberhart, R.: A modified particle swarm optimizer. In: IEEE World Congress on
Computational Intelligence, pp. 69–73 (1998)
11. Clerc, M.: The Swarm and the Queen: Towards a Deterministic and Adaptive Particle
Swarm Optimization. In: Proceeding of the Congress of Evolutionary Computation, vol. 3,
pp. 1951–1957 (1999)
12. Chen, J., Qin, Z., Liu, Y., Lu, J.: Particle Swarm Optimization with Local Search. In: Proceedings of the 2005 International Conference on Neural Networks and Brain, ICNNB 2005, pp. 481–484 (2005)
13. Tomas, A.: Constraints in Weighted Averaging. In: Benediktsson, J.A., Kittler, J., Roli, F.
(eds.) MCS 2009. LNCS, vol. 5519, pp. 354–363. Springer, Heidelberg (2009)
14. Blake, C., Keogh, E., Merz, C.J.: UCI Repository of Machine Learning Databases (1998),
http://www.ics.uci.edu/~mlearn/MLRepository.html
15. Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(3), 226–239 (1998)
An Expert System Based on Analytical Hierarchy
Process for Diabetes Risk Assessment (DIABRA)

Mohammad Reza Amin-Naseri and Najmeh Neshat

Department of Industrial Engineering, Tarbiat Modarress University, Tehran, Iran


[email protected]

Abstract. DIABRA (DIABetes Risk Assessment) is a knowledge-based expert system developed to aid individuals in assessing their risk of developing Type 2 diabetes. The system core is a quantitative model, implemented with the Analytical Hierarchy Process (AHP) mechanism, to evaluate the developed scenarios. The acquired knowledge, in the form of scenarios, is scored by the AHP mechanism and represented in DIABRA. The validation results show that the expert system gives a highly satisfactory performance when compared with human experts. In addition, the computerized system offers further advantages and can be used as a helpful tool to reduce the chance of developing Type 2 diabetes.

Keywords: Expert System; Artificial Intelligence; Analytical Hierarchy Process; Diabetes Risk Assessment.

1 Background and Significance


According to the World Health Organization, in 2000 more than 171 million people worldwide (approximately 2.8% of the population) suffered from diabetes [1]. Diabetes mellitus is a metabolic disease in which a person has high blood sugar. Type 2 diabetes, the most frequent type of diabetes, results from insulin resistance of cells, sometimes combined with an absolute insulin deficiency [2].
The prevalence of Type 2 diabetes mellitus is increasing rapidly, and it is estimated that by 2030 its prevalence will almost double [1]. Since the trend of urbanization and lifestyle changes, most importantly a "Western-style" diet, is the key factor for the increase in the incidence of diabetes, the disease is becoming common throughout the world, particularly in the more developed countries.
According to the American Diabetes Association, 8.6 million (approximately 18.3%) of Americans aged 60 and older have diabetes [3]. The probability of diabetes mellitus increases with age, and the number of older persons with diabetes is expected to grow as the elderly population increases.
Risk assessment in diabetes plays a key role in achieving the long-term goal of a predictive strategy starting in adolescence and must be carried out accurately. There are many risk factors with different effects and complications, which makes the risk assessment for diabetes difficult. In this context, decision-aid tools and technologies can be useful for assessing the risk of diabetes. Expert systems (ES), as a branch of applied artificial intelligence (AI), provide powerful and flexible means for solving a variety of problems. Expert systems in the healthcare domain are widely studied, where the accuracy and agility of the systems need particular attention [4, 5].


A major difficulty in reaching a correct assessment is the complexity of the risk factors, as well as the vast amount of information (including psychosocial issues, age, gender, information quality, and so on) that the expert must take into account; hence the process becomes complicated and difficult to model.
The authors' objective is to develop an expert system for risk assessment in diabetes that personalizes its decision based on the incoming case. The proposed expert system utilizes the results of the Analytical Hierarchy Process (AHP) approach to improve the risk assessment for Type 2 diabetes.
The rest of this paper is organized as follows: Section 2 reviews the literature. Section 3 explains the research methodology, and the AHP results are presented in Section 4. Expert system development and validation are provided in Sections 5 and 6, respectively. In the last section (Section 7) we summarize our findings and present final remarks.

2 Review of Literature

There are many successful examples of expert systems developed in the healthcare domain. Most of them have been applied to the treatment and diagnosis of various diseases, such as multiple sclerosis diagnosis [6], anorexia diagnosis [7], bone tumor diagnosis [8], and the diagnosis and treatment of dangerous infections [9]. In this setting, other branches of AI, such as artificial neural networks and case-based reasoning approaches, have also been applied in expert systems [10, 11]. A decision support system based on a causal probabilistic network was designed that was able to detect insulin effectiveness on a daily basis [12]. Mateo et al. (2007) proposed a healthcare expert system (HSE) in which the services are customized by human experts, who choose the appropriate objects for the services. In this cooperative model, the communication of the components is based on a multi-agent approach for effective coordination of tasks [13]. Šušteršič et al. (2009) developed a hierarchical multi-attribute decision approach based on Henderson's model of basic living activities (BLA) to increase the practical efficiency of the BLA model. The model was developed to evaluate a patient's health status and was tested in clinical practice by 17 nurses in two health centers in Slovenia [14].
Moreover, several studies in the literature have focused on improving expert system performance by using Multi-Criteria Decision Making (MCDM) mechanisms. Bohanec et al. developed an approach to the application of qualitative hierarchical decision models based on an expert system shell (DEX). The proposed model was used for four applications in the healthcare domain: assessment of breast cancer risk, assessment of basic living activities in community nursing, risk assessment in diabetic foot care, and technical analysis of radiogram errors [15]. In addition, an expert system integrated with structured decision-support methodologies (Multi-Criteria Decision Analysis, MCDA) has been developed to diagnose psychological disorders. In that study, the strength of the relation between symptoms and causes is established in order to create a hierarchy for setting the rules of the expert system [16]. Consequently, in this work an MCDM method is used to prioritize and assign the importance weightings of the risk factors, i.e., the physical index, the individual psychological status, and the pattern type of blood sugar changes.

3 Research Methodology
Regarding the aim of the study, the identification of risk factors for Type 2 diabetes plays a key role in developing the intended expert system. In order to provide a comprehensive set of risk factors, the related secondary sources were investigated and selected [17-22]. Some of the identified risk factors, grouped as Physical Risk Factors (PRF), were gathered from previous studies addressing the correlation of these factors with the risk value for diabetes [18-22]. Hence, the PRF score was calculated using the AHP approach (explained in Section 4) with regard to the findings of these studies. Next, several scenarios were developed in terms of different levels of all identified risk factors (Table 1), and finally the scenarios were evaluated by human experts (knowledge acquisition).

Table 1. Detailed information of identified risk factors and their levels

Risk Factor                      Type         Value          Assigned level   Comment
Fasting Blood Sugar (FBS) Index  Numerical    0              L-               Number of times that FBS is more than
                                              1              L                110 mg/dl in 5 sequential assessments.
                                              >= 2           L+
Physical Risk Factors (PRF):     Numerical    PRF < 2        L-               Body Mass Index (BMI) is a measure of body
Obesity (BMI), Diet, Age,                     2 <= PRF < 4   L                weight relative to height (kg/m2). A fatty diet
Blood Pressure (BP)                           4 <= PRF       L+               is valued as 1 and a light diet as 0. For more
                                                                              details, refer to Table 3.
Gender                           Categorical  Female         L-
                                              Male           L+
Family History                   Categorical  No             L-               Having a parent, brother, or sister with
                                              Yes            L+               diabetes.
Smoking status                   Categorical  No             L-
                                              Yes            L+
Having Breakfast                 Categorical  Yes            L-
                                              No             L+
Alcohol Drinking                 Categorical  No             L-
                                              Yes            L+
Feeling of stress                Categorical  No             L-
                                              Yes            L+

Furthermore, the methodology employed to derive the expert system can be summarized in four main phases:
Phase 1. Identifying the risk factors for Type 2 diabetes, including the Physical Risk Factors (PRF) and the risk factors related to lifestyle, medical and family history, and so on, all of which are presented in Table 1.
Phase 2. Knowledge acquisition: with the help of several human experts and information acquired from the literature, several scenarios over an extended range are developed, and the chance of getting diabetes in each scenario is evaluated by conducting AHP.
Phase 3. Knowledge representation: the FOOPES shell [23] is utilized to compile the scenarios that capture the human experts' knowledge into a rule-based knowledge base.
Phase 4. Validation of the system, based on the comparative performance of the human expert and the expert system on some test samples.
In this work, the risk for Type 2 diabetes is categorized into "low", "medium", and "high" and assessed for each scenario designed by the human experts.

4 AHP Results

In Phase 1 of the implementation (explained in Section 3), the identified risk factors were categorized into two groups: physical risk factors (PRF) and risk factors related to medical and family history and the like, owing to differences in their knowledge acquisition processes (Table 1).
Knowledge acquisition of the PRF and their importance was carried out by conducting the AHP framework on scenarios developed by the human experts of the Yazd Research Center for Diabetes; the rest of the information was acquired from the literature [17-22]. The steps of conducting the AHP framework on the PRF are as follows:
Step 1: List the PRF.
Step 2: Elicit pairwise comparisons between the risk factors as a preference/importance matrix and normalize it (Table 2).
Step 3: Develop the decision matrix.
Step 4: Multiply the preference/importance matrix by the normalized decision matrix in order to obtain the weighted score of each scenario (Table 3).
Step 5: Determine the level of each scenario for PRF using Table 1.
Step 6: Scenarios are developed at various levels of all determined risk factors, and the diabetes risk of each scenario is assessed by the experts (knowledge acquisition).
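Steps 2–4 can be pictured with the classical approximate-eigenvector AHP computation: normalize each column of the pairwise comparison matrix and average across the rows. The Python sketch below uses a made-up pairwise matrix, so the resulting weights only illustrate the mechanics, not the values of Table 2.

import numpy as np

def ahp_priorities(pairwise):
    # Normalize each column of the pairwise comparison matrix, then average each
    # row: the classical approximation of the principal-eigenvector AHP weights.
    normalized = pairwise / pairwise.sum(axis=0)
    return normalized.mean(axis=1)

# Hypothetical pairwise judgments for (Age, Diet, BMI, BP); not the authors' data.
A = np.array([[1.0, 3.0, 1/4, 1/3],
              [1/3, 1.0, 1/5, 1/3],
              [4.0, 5.0, 1.0, 2.0],
              [3.0, 3.0, 1/2, 1.0]])
print(ahp_priorities(A))   # one weight per factor; the weights sum to 1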

Table 2. Priorities among the factors, obtained by making a series of judgments based on pairwise comparisons of the factors

Factor  Age    Diet      BMI       BP        Overall priority
Age     0.12   0.25      0.128205  0.090909  0.147279
Diet    0.04   0.083333  0.102564  0.090909  0.079202
BMI     0.48   0.416667  0.512821  0.545455  0.488735
BP      0.36   0.25      0.25641   0.272727  0.284784
Sum     1      1         1         1         1

Table 3. Weighted score of PRF for scenarios, obtained by synthesizing the priorities and the normalized scores of the risk factors

Scenario   Age       Diet      BMI       BP        Weighted score (PRF)
S1 0.87 0.854167 0.893162 0.901515 3.5188442
S2 0.545 0.53125 0.561966 0.564394 2.2026098
S3 0.42 0.40625 0.384615 0.386364 1.597229
S4 0.94 0.947917 0.935897 0.931818 3.7556323
S5 0.67 0.65625 0.619658 0.621212 2.5671202
S6 0.44 0.447917 0.465812 0.462121 1.8158498
S7 0.525 0.489583 0.480769 0.488636 1.9839889
S8 0.73 0.708333 0.74359 0.75 2.9319231
S9 0.9 0.864583 0.833333 0.840909 3.4388258
S10 0.855 0.833333 0.861111 0.867424 3.4168687
S11 0.545 0.53125 0.561966 0.564394 2.2026098
S12 0.65 0.614583 0.598291 0.606061 2.4689345
S13 0.56 0.625 0.594017 0.575758 2.3547747
S14 0.69 0.697917 0.641026 0.636364 2.6653059
S15 0.565 0.5 0.523504 0.541667 2.1301709
S16 0.46 0.416667 0.42735 0.439394 1.743411
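How the decision-matrix entries are scaled before Step 4 is not spelled out in this excerpt; under the assumption that the weighted PRF score of a scenario is simply the priority-weighted sum of its factor scores, a sketch would look as follows (the scenario values are hypothetical, not taken from Table 3).

import numpy as np

# Priorities from Table 2, in the order (Age, Diet, BMI, BP).
priorities = np.array([0.147279, 0.079202, 0.488735, 0.284784])

def weighted_prf(factor_scores, priorities):
    # Assumed reading of Step 4: priority-weighted sum of a scenario's factor scores.
    return float(np.dot(priorities, factor_scores))

# Hypothetical scenario scores (illustrative only).
scenario = np.array([3.5, 3.4, 3.6, 3.6])
print(weighted_prf(scenario, priorities))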

5 DIABRA Development
In order to represent the acquired knowledge, an expert system shell, namely FOOPES (Fuzzy Object Oriented Programming Expert System), was used. The input data were defined in the "Scenarios and input variables data" window, and the scenarios were represented using the "Ruled by Table" window, as shown in Figure 1.
A client requesting a consultation needs to run the DIABRA (DIABetes Risk Assessment) file (by pressing F5). Input data are entered into DIABRA by selecting the nearest intended item within the limited range. The results of the Type 2 diabetes risk assessment by DIABRA are reported using the "Report" window, as shown in Figure 2.

Fig. 1. Sample screens of DIABRA

Fig. 2. Sample screens of DIABRA (Report window)

6 DIABRA Validation
The developed expert system (DIABRA) should be tested to ensure that a satisfactory performance is achieved. Hence, five unseen test samples were presented to DIABRA and its results were compared with those of the human expert. The results of testing and evaluation demonstrated good performance of the system when compared with the human experts. According to the results in Table 4, all assessments agree.

Table 4. The results of DIABRA validation on 5 test samples

#   FBS  PRF  Gender  Family History  Smoking  Having Break.  Drinking  Stress Score  Human expert result  DIABRA result
S1  3    3    Male    NO              Yes      NO             NO        -             High                 High
S2  2    2    Female  NO              NO       Yes            NO        Yes           Medium               Medium
S3  1    2    Female  Yes             NO       Yes            NO        -             Low                  Low
S4  1    3    Male    NO              Yes      NO             NO        NO            Low                  Low
S5  3    3    Male    Yes             Yes      Yes            NO        Yes           High                 High

7 Conclusion and Final Remarks


An expert system model, namely DIABRA, was developed to assess the risk of Type 2 diabetes. To develop the system, four phases were carried out: identification of the risk factors, knowledge acquisition based on the AHP mechanism, knowledge representation, and validation of the system. The results show that the expert system gives a highly satisfactory performance when compared with human experts. In addition, the computerized system offers further advantages and can be used as a helpful tool to reduce the chance of developing Type 2 diabetes. It should be noted that the scenario-based knowledge acquisition of the risk factors is the limitation of this study.

References
1. Wild, S., Roglic, G., Green, A., Sicree, R., King, H.: Prevalence of Diabetes: Estimates for
2000 and Projections for 2030. Diabetes Care 27(5), 1047–1053 (2004)
2. Rother, K.I.: Diabetes Treatment-Bridging the Divide. The New England Journal of Medi-
cine 356(15), 1499–1501 (2007)
3. American Diabetes Association, Total prevalence of diabetes & pre-diabetes. Archived
from the original on February 08 (2006),
http://web.archive.org/web/20060208032127,
http://www.diabetes.org/diabetesstatistics/prevalence.jsp
(retrieved March 17, 2006)
4. Çinar, M., Engin, M.E., Engin, Z., Atesçi, Y.Z.: Early Prostate Cancer Diagnosis by Using
Artificial Neural Networks and Support Vector Machines. Expert Systems with Applica-
tions 36, 6357–6361 (2009)
5. Ezziane, Z.: Applications of Artificial Intelligence in Bioinformatics: A Review. Expert
System with Applications 30(1), 2–10 (2006)
6. Gaspari, M., Roveda, G., Scandellari, C., Stecchi, S.: An expert system for the evaluation of EDSS in multiple sclerosis. Artificial Intelligence in Medicine 25, 187–210 (2002)

7. Pérez-Carretero, C., Laita, L.M., Roanes-Lozano, E., Lázaro, L., González-Cajal, J., Laita, L.: A Logic and Computer Algebra-Based Expert System for Diagnosis of Anorexia. Mathematics and Computers in Simulation 58, 183–202 (2002)
8. Lejbkowicz, I., Wiener, F., Nachtigal, A., Militiannu, D., Kleinhaus, U., Applbaum, Y.H.:
Bone Browser a Decision-Aid for the Radiological Diagnosis of Bone Tumors. Computer
Methods and Programs in Biomedicine 67, 137–154 (2002)
9. Lamma, E., Mello, P., Nanetti, A., Riguzzi, F., Storari, S., Valastro, G.: Artificial Intelli-
gence Techniques for Monitoring Dangerous Infections. IEEE Transactions on Information
Technology in Biomedicine 10(1), 143–155 (2006)
10. HyukIm, K., Sang Park, C.: Case-based Reasoning and Neural Network Based Expert Sys-
tem for Personalization. Expert Systems with Applications 32, 77–85 (2007)
11. Pandey, B., Mishra, R.B.: Knowledge and Intelligent Computing System in Medicine.
Computers in Biology and Medicine 39, 215–230 (2009)
12. Hernando, M.E., Gomez, E.J., Corcoy, R., del Pozo, F.: Evaluation of DIABNET, A Deci-
sion Support System for Therapy Planning in Gestational Diabetes. Computer Methods
and Programs in Biomedicine 62, 235–248 (2000)
13. Mark, A., Mateo, R., Gerardo, B.D., Lee, J.: Health Care Expert System Based on the
Group Cooperation Model. In: International Conference on Intelligent Pervasive Comput-
ing, Jeju Island, Korea, pp. 285–288 (October 2007)
14. Šušteršič, O., Rajkovič, U., Dinevski, D., Jereb, E., Rajkovič, V.: Evaluating Patients’
Health Using a Hierarchical Multi-Attribute Decision Model. Journal of International
Medical Research 37(5), 1646–1654 (2009)
15. Bohanec, M., Zupan, B., Rajkovic, V.: Applications of Qualitative Multi-Attribute Deci-
sion Models in Healthcare. International Journal of Medical Informatics 58-59, 191–205
(2000)
16. Luciano, C.N., Plácido, R.P., Tarcísio, C.P.: An Expert System Applied to the Diagnosis
of Psychological Disorders. In: International Conference on Intelligent Computing and In-
telligent Systems, ICIS, pp. 363–367. IEEE, Los Alamitos (2009)
17. Rimmi, E.B., Manson, J.E., Stampfer, M.J., Colditz, G.A., Willett, W.C., Rosner, B.: Oral
Contraceptive Use and the Risk of Type 2 diabetes Mellitus in a Large Prospective Study
of Women. Diabetologia 35, 967–972 (1992)
18. Sugimori, H., Miyakawa, M., Yoshida, K., Izuno, T., Takahashi, E., Tanaka, C.,
Nakamura, K., Hinohara, S.: Health Risk Assessment for Diabetes Mellitus Based on Lon-
gitudinal Analysis of MHTS Database. Journal of Medical Systems 22(1), 121–138 (1998)
19. Griffin, M.E., Coffey, M., Johnson, H., Scanlon, P., Foleyt, M., Stronget, N.M.: Universal
v.s. Risk Factor-Based Screening for Gestational Diabetes Mellitus: Detection Rates, Ges-
tation at Diagnosis and Outcome. British Diabetic Association Medicine 17, 26–32 (2000)
20. Park, J., Edington, D.W.: A Sequential Neural Network Model for Diabetes Prediction. Ar-
tificial Intelligence in Medicine 23, 277–293 (2001)
21. Anonymous: Am I at Risk for Diabetes?. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, NIH Publication No. 04–4805 (2003)
22. Leontos, C., Gallivan, J.: Small Steps, Big Rewards: Your Game Plan for Preventing Type 2 Diabetes. Journal of the American Dietetic Association 12(1), 143–156 (2008)
23. FOOPES: A Fuzzy Object Oriented Programming Expert System. Developed by Roozbehani and Amin-Naseri, Tarbiat Modares University (2004)
Practice of Crowd Evacuating Process Model with
Cellular Automata Based on Safety Training

Shi Xi Tang and Ke Ming Tang

YanCheng Teachers University Information Science & Technology College, YanCheng,


JiangSu, China
[email protected], [email protected]

Abstract. To address the problem that crowd evacuation process models based on cellular automata differ considerably from real crowd evacuation processes, a crowd evacuation process model with cellular automata based on safety training is presented. The crowd evacuation process under safety training is simulated and predicted, and the result is very close to reality. Placing the shelves vertically achieves both a higher escape rate and a larger shelf area, with a total area of up to 216 m2 and an average death toll of 4.2 under safety training when the fire level is 2.

Keywords: Cellular Automata, Fire, Crowd Evacuation, Safety Training.

1 Introduction
In recent years, many types of natural disasters and man-made events have occurred, from the 9/11 attacks in the United States to the Wenchuan earthquake in China and the Miyagi earthquake in Japan [1]; they have strengthened the study of crowd emergency evacuation processes. The Japanese researcher Kikuji Togawa proposed an evacuation time formula in 1955. J. Fruin derived relation curves between the average crowd speed and the crowd density using statistical methods. Henderson gave a probability distribution formula for the forward velocity of a crowd using Maxwell-Boltzmann thermodynamics. Researchers at Wuhan University and City University of Hong Kong established a network evacuation model, which divides a building into a network reflecting each person's specific location in geometric space and analyzes the person's moving speed within the building using a Lagrangian method.
The key to emergency evacuation modeling is crowd evacuation modeling; its essence is the implementation of a pedestrian flow model in a specific environment. A model that describes the evacuation accurately is needed to simulate hundreds of thousands of human activities on a computer. Cellular automata, as a mathematical modeling framework in which time, space and states are discrete, have a strong ability to simulate various physical systems and natural phenomena by constructing a dynamically evolving system through the interaction between elements. A series of results on pedestrian evacuation simulation based on cellular automata has been obtained, including the floor field model proposed by Burstedde and Kirchner [2-3], the discrete social force model proposed by L.Z. Yang [4], and the dynamic parameters model

proposed by Hao Yue[5]. These cellular automata models can simulate the macro
evacuation characteristics, which are jamming and clogging, faster-is-slower [1-7].
All evacuation models have a common problem that we try to build complex real-
life evacuation model by using cellular automata assumed to be random, without any
training, and having the crowd with a variety of complex psychological. All these are
the most complex phenomenon of life which is possible to achieve by using the fourth
cellular automata model at the edge of chaos. But the various results are studied based
on elementary cellular automaton mode, which are the theoretical reasons that can not
completely describe the objects' portraits. The crowd evacuating process model with
cellular automata based on safety training (EPMCAST) is raised in this paper to
study the evacuating process by using elementary cellular automata, and the
simulating and predicting results are more realistic than other models.

2 EPMCAST Model
A cellular automaton is a dynamical system that is discrete in time and space. Each cell, distributed on a regular grid, takes one of a finite set of discrete states and follows the same rule, updating synchronously according to local rules. A large number of cells constitute an evolving dynamic system through simple interactions. A cellular automaton consists of the basic cell, the cellular space, the neighborhood and the rules. A cellular automaton can be considered as a cellular space together with a transformation function on that space, which can be expressed by a four-tuple [8]:

A = (d, S, N, f)

where d is the dimension of the cellular automaton, S is the finite, discrete set of cell states, N is the combination of cells forming the neighborhood, and f is the transition rule, i.e., the conversion function. Unlike general dynamic models, cellular automata are not strictly defined by physical equations or functions, but by a series of rules used to construct the model. Any model satisfying these rules can be counted as a cellular automaton model. Therefore, the cellular automata model is a general term for this kind of model, or a methodological framework characterized by discrete time, space and states. Each variable takes only a finite number of states, and the rules changing the states are local in time and space. We need to construct the cellular automaton according to the actual research question, as there is no fixed mathematical formula.
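To make the four-tuple A = (d, S, N, f) concrete, the following Python sketch defines a tiny one-dimensional cellular automaton whose state set, neighborhood and local rule are supplied explicitly and applied synchronously; it is purely illustrative and is not the EPMCAST rule set.

from typing import Callable, Sequence

def step(cells: Sequence[int], neighborhood: Sequence[int],
         rule: Callable[[Sequence[int]], int]) -> list:
    # One synchronous update of a 1-D cellular automaton A = (d=1, S, N, f):
    # every cell reads its neighborhood states and applies the local rule f.
    n = len(cells)
    return [rule([cells[(i + off) % n] for off in neighborhood]) for i in range(n)]

# Illustrative choice: S = {0, 1}, N = {-1, 0, 1}, f = majority vote of the neighborhood.
majority = lambda neigh: int(sum(neigh) >= 2)
print(step([0, 1, 1, 0, 1, 0, 0, 1], [-1, 0, 1], majority))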

2.1 Selecting Cell for EPMCAST

The grid of a cellular automaton is characterized by the number of cells, the cell size and the boundary conditions. The number and size of the grid cells are based on the needs of the simulation. The two-dimensional grid structure is usually triangular, square or hexagonal [9]. The adjacent space of a cell is called its neighborhood; two-dimensional neighborhood types include the Moore type, the extended Moore type and the Margolus type. EPMCAST considers not only the spread of fire but also the evacuation. The impact of safety training includes the commanding role of trained staff in assuring a reasonable direction, ensuring an orderly evacuation, and ensuring protection from fire and smoke during the evacuation process, so as to minimize the number of victims. We use the

quadrilateral grid-type structure and the eight direction neighborhood of Moore-type


and extended Moore-type as the cellular model in EPMCAST as shown in Figure 1.
The eight direction neighborhood of Moore-type is used to determine the direction of
the direction of fire spread and evacuation, and the extended Moore-type is used to
select the reasonable direction for evacuation.

   
   
   
   
   

Fig. 1. Automata Selection of EPMCAST

2.2 Replacement Model for Fire Spreading

The fire spreading state of cell Aij takes one of the values 0 (unable to burn), 1 (not yet burning), 2 (just ignited), 3 (burning) and 4 (burnt out). The main factors affecting fire spread include the effect of thermal radiation, the characteristics of the buildings, the role of large fire-retardant elements, and the environmental impact. The probability of fire occurring in cell (i, j) is

Qij = Wij · Aij · Lij · Hij

where Wij is the wind load effect, Aij is the impact indicator of the building structure, Lij is the impact indicator of the fire load, and Hij is the collapse factor coefficient obtained by considering the fire performance requirements and the risk factor coefficients of the building; for their values see [10, 11]. The fire-spreading replacement is carried out by taking the adjacent cell with the larger Qij.
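A direct transcription of the ignition probability Qij = Wij · Aij · Lij · Hij over a grid might look like the sketch below; the factor values are placeholders, since the paper takes them from [10, 11].

import numpy as np

def ignition_probability(W, A, L, H):
    # Element-wise Q_ij = W_ij * A_ij * L_ij * H_ij for every grid cell.
    return W * A * L * H

# Placeholder factor grids for a 3 x 3 area (values are illustrative only).
shape = (3, 3)
W = np.full(shape, 0.9)    # wind load effect
A = np.full(shape, 0.8)    # building structure indicator
L = np.full(shape, 0.7)    # fire load indicator
H = np.full(shape, 0.95)   # fire-performance / risk coefficient
print(ignition_probability(W, A, L, H))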

2.3 The Solution of Preference Matrix for EPMCAST

The probability of movement of the crowd in different directions in the cellular space (i.e., the building) is represented by the preference matrix M, as shown in Figure 2. The values of the preference matrix elements are determined by the velocity v and the direction standard deviation

M-1,-1 M-1,0 M-1,1

M0,-1 M0,0 M0, 1

M1,-1 M1,0 M1, 1

Fig. 2. The preference matrix M in different direction in cellular space



σ [12]. The two-dimensional element Mij is composed of two one-dimensional components, representing the horizontal and vertical movements respectively. The horizontal one-dimensional neighborhood I is expressed as {-1, 0, 1}; the probability that the cell moves toward horizontal neighborhood position j is pj (p-1, p0, p1), and the corresponding probability in the vertical direction is qi (q-1, q0, q1). Their values are determined by the cell's current moving velocity v and the direction standard deviation σ:

Mij = qi × pj

with

qi:  q-1 = σt²/2,  q0 = 1 − σt²,  q1 = σt²/2,  where σt is the standard deviation in the vertical direction, and

pj:  p-1 = (σh² + v² − v)/2,  p0 = 1 − (σh² + v²),  p1 = (σh² + v² + v)/2,  where σh is the standard deviation in the horizontal direction.

The direction standard deviation σ is determined by the current moving velocity of the cell; its effective interval is [σ1, σk], computed as

σ1² = [|v| − 1/2]²/4,  σk² = 1 − v².

Each cell chooses its target location for the next time step according to the preference matrix, and the cell states are updated in parallel. If the target location is occupied at the current time, the cell does not move during the next time step. If the target cell is not occupied, several cells may still choose the same target location according to the preference matrix, which may result in a conflict; this is resolved by calculating the relative movement probability of each cell to determine which cell moves and which does not.
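The preference matrix of this section is the outer product of the vertical probabilities qi and the horizontal probabilities pj; the sketch below computes it directly from a velocity v and the two standard deviations (the numerical values are chosen only for illustration).

import numpy as np

def preference_matrix(v, sigma_t, sigma_h):
    # M_ij = q_i * p_j with q and p as defined above (rows and columns ordered -1, 0, 1).
    q = np.array([sigma_t**2 / 2, 1 - sigma_t**2, sigma_t**2 / 2])
    p = np.array([(sigma_h**2 + v**2 - v) / 2,
                  1 - (sigma_h**2 + v**2),
                  (sigma_h**2 + v**2 + v) / 2])
    return np.outer(q, p)

M = preference_matrix(v=0.5, sigma_t=0.5, sigma_h=0.5)
print(M, M.sum())   # the nine entries are non-negative and sum to 1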

2.4 Obtaining the Relative Movement Probability of Cellular of EPMCAST

When more than one cell competes for the same grid position at the same time, only one cell is placed there; the others continue to look for the best location in accordance with their currently obtained moving direction. The probability of arriving at position (0, 0) from a surrounding cell (i, j) is

p(i, j) = kij · Mij / (M1,-1 + M1,0 + M1,1 + M0,-1 + M0,1 + M-1,-1 + M-1,0 + M-1,1)

where kij is the training factor of the moving cell (i, j): thanks to staff training, the evacuating crowd does not panic and does not crowd together, knowing the locations of the fire exits and the spreading trend of the fire and smoke. The training factor is

kij = dsmoke − m · min_l ( √((ip − il)² + (jp − jl)²) · √((im − il)² + (jm − jl)²) )

where (ip, jp) is the current cell location, (il, jl) is a possible exit position, (im, jm) is the position of the last person currently heading for that exit, and dsmoke is the degree of fire and smoke at position (im, jm). The position of each person in the grid is re-determined after every time step of the crowd is updated. This procedure continues until all the crowd has been evacuated out of the building or has been burned.
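Reading the denominator as the sum of the eight neighboring preference entries, the relative movement probability can be sketched as below; the matrices M and k are illustrative values, not output of the model.

import numpy as np

def relative_move_probability(M, k):
    # p(i, j) = k_ij * M_ij divided by the sum of the eight neighboring preference
    # values; the centre entry M[1, 1] is excluded from the normalizing sum.
    neighbor_sum = M.sum() - M[1, 1]
    return k * M / neighbor_sum

# Illustrative 3 x 3 preference matrix and training factors (not from the paper).
M = np.array([[0.000, 0.0625, 0.0625],
              [0.000, 0.3750, 0.3750],
              [0.000, 0.0625, 0.0625]])
k = np.ones((3, 3))
print(relative_move_probability(M, k))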

3 Implementation of EPMCAST

3.1 The Real Background for Simulating

We take the fire evacuation of the Times Mall of Yancheng City, Jiangsu Province as the actual background, and derive the best way to place the shelves so as to obtain the maximum shelf space with minimum fire casualties. Intermediate-level fires often occur in Yancheng. The power supply room is the place most vulnerable to fire in the Times Mall. According to state regulations, the number of safety exits must be no less than two. There are two safety exits in the Times Mall: one is the elevator and the other is the stairs; the former cannot be used when a fire breaks out, so only the latter is considered. We assume that the shelves and the cashier desk are made of metal and are non-combustible. The fire-fighting equipment fails because of the power outage when the fire occurs in the power supply room. The fire is put out by the fire brigade after all the crowd has escaped. We designed six different shelf placing styles by considering the various situations that may occur during a fire escape and the profit purpose of the shopping mall, according to architectural principles and fire escape practice in public places, as shown in Table 1.

Table 1. Different shelf placing styles

Shelf No.  Placing style  Total area (m2)
1          horizontal     14 * 16 = 224
2          horizontal     14 * 14 = 196
3          vertical       24 * 8 = 192
4          vertical       24 * 9 = 216
5          vertical       26 * 9 = 234
6          dispersed      18 * 7 = 126

3.2 Parameters Design of EPMCAST

We divide the evacuation plane into a uniform grid; in the model space each grid cell is occupied by an obstacle or a person, or is empty. Each person occupies one cell of 0.5 m × 0.5 m and can move in any of the directions up, down, left, right, upper left, lower left, upper right and lower right within a unit time, corresponding to k = 1, 2, ..., 8; k = 0 indicates that the cell stays still. The walking speed of the crowd is 1 m/s under normal circumstances, so each time step is 0.5 m / (1 m/s) = 0.5 s.

3.3 The Simulating Process

A two-dimensional global object array is constructed. Each object is assigned a status obtained by scanning the array: status 0 means empty space, status 1 means a person, status 2 means fire, and status 3 means a barrier. The larger the fire-level number, the lighter the fire; a serious fire occurs when level 1 is selected. Likewise, the larger the crowd-level number, the smaller the crowd; the maximum crowd size is reached when level 1 is selected. The shelf placing style is input as well. A red * marks the fire source in the power supply room of the supermarket, a blue ⊙ marks an escaping person, ∩ marks an emergency exit, further symbols mark the cashier desk and the fence at the checkout area, and the remaining white cells are shelves. The crowd evacuation process with cellular automata based on safety training is shown in Figure 3.

Fig. 3. The crowd evacuating process with cellular automata based on safety training

Each escaping person chooses the best path to escape, based on the knowledge gained from escape training and on the actual situation, as the fire continues to spread. Crowding appears when there are too many escaping people in one region and the overall escape velocity slows down, so the escaping person selects a path by judging the situation at all exits. The evacuation is completed in 86 seconds; the total number of people before the fire is 422, the number of escaped people is 421, and the number of people killed in the fire is 1.

4 Analyzing the Simulation Results of EPMCAST


The escape rate differs with the shelf placing style; the simulation results of placing the shelves in the same way five times are shown in Tables 2-4. The number of escaping people is set to the upper limit, and the crowd in the mall is randomly distributed.
The climate of Yancheng City is a temperate maritime climate, and fires are usually of intermediate level [13]. Twelve people in the cities of Jiangsu were killed during the first quarter of 2010 [14], so the average death toll is 4, which is very close to the average death toll of 4.2 in the simulation results for fire A. We recommend the fourth shelf placing style for the Yancheng Times Mall, which is not prone to fire, in order to obtain a higher escape rate and a larger shelf area when the number of escaping people reaches the upper limit. For dry-climate areas, we recommend the sixth shelf placing style to obtain the highest escape rate and to achieve zero casualties for fires below level C.

Table 2. The evacuation simulation result of Yancheng Times Mall with cellular automata based on safety training for fire A

Shelf No.  Total area (m2)  Deaths, run 1  Deaths, run 2  Deaths, run 3  Deaths, run 4  Deaths, run 5  Average deaths
1          224              81             55             67             51             78             66.4
2          196              67             73             77             84             71             74.4
3          192              60             71             75             70             65             68.2
4          216              74             64             71             55             72             67.2
5          234              72             69             64             82             61             69.6
6          126              48             44             55             48             47             48.4

Table 3. The evacuation simulation result of Yancheng Times Mall with cellular automata based on safety training for fire B

Shelf No.  Total area (m2)  Deaths, run 1  Deaths, run 2  Deaths, run 3  Deaths, run 4  Deaths, run 5  Average deaths
1          224              9              3              7              8              9              7.2
2          196              11             9              11             11             15             11.4
3          192              6              2              3              7              4              4.4
4          216              2              3              6              5              5              4.2
5          234              7              3              5              5              5              5
6          126              1              1              3              1              2              1.6

Table 4. The evacuation simulation result of Yancheng Times Mall with cellular automata based on safety training for fire C

Shelf No.  Total area (m2)  Deaths, run 1  Deaths, run 2  Deaths, run 3  Deaths, run 4  Deaths, run 5  Average deaths
1          224              1              2              2              2              3              2
2          196              2              3              2              3              2              2.4
3          192              1              2              3              2              1              1.8
4          216              2              1              2              1              2              1.6
5          234              2              3              1              2              2              2
6          126              1              1              0              1              0              0.6

5 Conclusion
We have designed a crowd evacuation process model with cellular automata based on safety training, using evacuation strategies derived from safety training. The simulation results show that the model can simulate the emergency evacuation behavior of a supermarket shopping center, and that the simulation results are very close to reality. This simulation method is intuitive, flexible and scalable, and provides good ideas for emergency management research. We will extend this method to study more complex evacuation situations in the future.

References
1. Real-time information earthquake situation in Japan (March 27, 2011),
http://news.sina.com.cn/z/japanearthquake0311/index.shtml
(March 11, 2011)
2. Burstedde, C., Klauck, K., Schadschneider, A., Zittartz, J.: Simulation of pedestrian dynamics using a two-dimensional cellular automaton. Physica A (S0378-4371) 295(3), 507–525 (2001)
3. Kirchner, A., Schadschneider, A.: Simulation of evacuation processes using a bionics-inspired cellular automaton model for pedestrian dynamics. Physica A (S0378-4371) 312(1), 260–276 (2002)
4. Zhao, D.L., Yang, L.Z., Li, J.: Occupants’ behavior of going with the crowd based on
cellular automata occupant evacuation model. Physica A 387, 3708–3718 (2008)
5. Hao, Y., Fu, S., Zhisheng, Y.: Based on cellular automata simulation of pedestrian
evacuation flow. Physics 7, 4523–4530 (2009)
6. Wang, D., Kwok, N.M., Jia, X., Li, F.: A cellular automata based crowd behavior model.
In: Proceedings of the 2010 International Conference on Artificial Intelligence and
Computational Intelligence: Part II, Sanya, China, October 23-24 (2010)

7. Peng, Y.-C., Chou, C.: Simulation of pedestrian flow through a ”t” intersection: A multi-
floor field cellular automata approach. Computer Physics Communications, 205–208
(January 2011)
8. Wolfman, D.: Cellular automata for traffic simulations. Physica A 263(1-4), 438–445 (1999)
9. Talia, D.: Parallel Cellular Environments to Enable Scientists to Solve Complex Problems
(1999), http://www.cscfac.uk/euresco99/presentations/Talia.ppt
10. Ohgai, A., Gohnai, Y., Watanabe, K.: Cellular automata modeling of fire spread in built-up areas – A tool to aid community-based planning for disaster mitigation. Computers, Environment and Urban Systems 31(4), 441–460 (2007)
11. Xiaojing, M., Yang, L.-z., Jian, L.: Based on cellular automata model for urban areas the
probability of fire spread. China Safety Science Journal 18(2), 28–33 (2008)
12. Zhao, S.-Y., Su, G.-J., He, Y., Xu, X.-H.: Research of Emergency Evacuation System
Simulation Based on Cellular Automata. Journal of Chinese Computer Systems 28(12),
2220–2224 (2007)
13. The new standards of fire level (March 27, 2011),
http://www.dys.gov.cn/Public/2007/0707/10019152.html (July 07,
2007)
14. Fire statistics in (March 2010) (March 27, 2011),
http://www.js119.com/news/folder15/2010/0415/2010-04-
1575374.html (April 15, 2010)
Feature Selection for Unlabeled Data

Chien-Hsing Chen

Department of Information Management, Hwa Hsia Institute of Technology,


111 Gong Jhuan Rd., Chung Ho, Taipei, Taiwan
[email protected]

Abstract. Feature selection has been explored extensively for several real-world applications. In this paper, we address a new approach to selecting a subset of the original features for unlabeled data. Our feature selection method is based on a basic characteristic of clustering: a data instance usually belongs to the same cluster as its geometrically nearest neighbors and to different clusters from its geometrically farthest neighbors. In particular, our method uses instance-based learning to quantify features in the context of the nearest and the farthest neighbors of every instance, such that using the salient features strengthens this characteristic. Experiments on several datasets demonstrated the effectiveness of the presented feature selection method.

Keywords: feature selection, nearest neighbors, farthest neighbors, approximate


clusters, and Least-Mean-Squares.

1 Introduction

Feature selection has been explored extensively for several real-world applications
such as text processing [1], image representation [2] and time-series prediction [3].
Recently, many extensive studies have been proposed for feature selection in unsu-
pervised learning, and the selected salient feature subset was found to be helpful for
cluster analysis.
In this paper, we present a new feature selection method, feature selection from the nearest and the farthest neighbors (NF), which uses instance-based learning to quantify features in the context of the nearest and the farthest neighbors of every instance. Unlike previous studies on selecting salient features (e.g., for clustering), our method does not need a prespecified clustering algorithm for training the features and tends to ignore noisy features, which strengthens the process of delivering salient features. The quantification is motivated by one of the most well-known characteristics of clustering: an instance usually belongs to the same cluster as its geometrically nearest neighbors and to different clusters from its geometrically farthest ones. The purpose of our feature selection method is to quantify the features such that using the salient features (i.e., those with a higher quantity) strengthens this well-known characteristic.


2 The Proposed Method


2.1 Quantifying Features Under the Approximate Clusters
Our method uses instance-based learning for quantifying features in the context of the
nearest and farthest neighbors of every instance and determines a salient feature, if this
feature better satisfies the feature compactness to the geometrically nearest neighbors
and the feature separability to the geometrically farthest neighbors for every instance in
the original data space. The instances are driven by the need to find the feature salience
vector, each of which indicates its salience for clustering. When the best vector is
obtained, the features can be ranked by their salience, and only a subset of features is
applied for clustering.
Assume that we have a dataset X of n data instances (i.e., X = {x1, …, xn}, where xi =
[x1,i, …, xj,i, …, xd,i]T means the ith instance in X with d dimensions and a non-zero
feature salience vector w(t) = [w(t)1, …, w(t)j, …, w(t)d]T), where the element w(t)j is a
real-valued quantity at the tth iteration. In addition, we define a nearest set (NS) and a
farthest set (FS) for an instance xi under a given w(t). The sets are defined as follows
(1 and 2).

NS_{w(t)}(xi) = { x^{NS}_{1,i}, …, x^{NS}_{k,i}, …, x^{NS}_{K,i} }     (1)

FS_{w(t)}(xi) = { x^{FS}_{1,i}, …, x^{FS}_{l,i}, …, x^{FS}_{L,i} }     (2)

where NS_{w(t)}(xi) and FS_{w(t)}(xi) include K and L instances of X, respectively. The symbol x^{NS}_{k,i} represents the kth nearest neighbor of xi under w(t), and x^{FS}_{l,i} represents the lth farthest neighbor of xi under w(t). We take the different saliences of the features into consideration when obtaining the nearest and the farthest sets. To simplify, we use a weighted Euclidean metric to calculate the distance between two instances xi and xr under w(t). This metric is shown as follows (3):

dis(xi, xr | w(t)) = √( ∑_{j=1}^{d} w(t)j (xj,i − xj,r)² )     (3)
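Under the weighted Euclidean metric of (3), the nearest set NS and the farthest set FS of an instance can be obtained by sorting the weighted distances; the sketch below assumes the metric weights each squared feature difference by w(t)j, which is the usual reading of a weighted Euclidean distance, and uses toy data.

import numpy as np

def weighted_dist(xi, xr, w):
    # Weighted Euclidean distance of Eq. (3): sqrt(sum_j w_j * (x_ji - x_jr)^2).
    return np.sqrt(np.sum(w * (xi - xr) ** 2))

def nearest_farthest(X, i, w, K, L):
    # Return the indices of the K nearest and L farthest neighbors of X[i] under w.
    d = np.array([weighted_dist(X[i], X[r], w) for r in range(len(X))])
    order = np.argsort(d)
    order = order[order != i]          # exclude the instance itself
    return order[:K], order[-L:]

X = np.random.default_rng(0).random((10, 4))   # 10 instances, 4 features (toy data)
ns, fs = nearest_farthest(X, i=0, w=np.ones(4), K=3, L=3)
print(ns, fs)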

We then follow the idea of approximate clustering [4, 5] and treat the fractional instances, comprising xi and the instances in NS_{w(t)}(xi) and FS_{w(t)}(xi), as at least two approximate clusters: the first cluster contains xi and the instances in NS_{w(t)}(xi), and another cluster does not simultaneously contain xi and any instance of FS_{w(t)}(xi). The reason for forming these approximate clusters is that an instance usually belongs to the same cluster as its geometrically nearest neighbors and to different clusters from its geometrically farthest ones.
We focus on those approximate clusters and further use an evaluation function to evaluate the goodness of a feature, i.e., whether this feature is informative for those approximate clusters or not. The evaluation function used in this paper is the Silhouette Width Criterion (SWC) [6], a well-known relative clustering validation measure that has been widely used to evaluate how well a dataset can be partitioned into subsets (clusters), rewarding cluster compactness for the instances within a cluster and cluster separability for the instances between clusters. The functions measuring feature compactness (the feature width from xi to its nearest neighbors) and feature separability (the feature width from xi to its farthest neighbors) are respectively defined as

a_i = f(NS_{w(t)}(xi), xi)     (4)

b_i = f(FS_{w(t)}(xi), xi)     (5)

We evaluate these intrinsic characteristics for every individual feature. The function f(a, b) represents, for every feature, the average distance from b to all instances in a, and is applied to return a_i = [a_{1,i}, …, a_{j,i}, …, a_{d,i}] and b_i = [b_{1,i}, …, b_{j,i}, …, b_{d,i}]. The aggregate function combining a_i and b_i is given in (6); it yields u_i = [u_{1,i}, …, u_{j,i}, …, u_{d,i}], the feature width vector of instance xi under w(t), where the remaining item in (6) is an adaptive parameter. A larger value of u_{j,i} indicates that the jth feature has a larger width for clustering, because the jth feature can better raise the feature compactness and feature separability of those approximate clusters.

2.2 Searching the Best Feature Salience Vector


In this section, we attempt to search for the best feature salience vector w, whose goal is to make the feature width vectors consistent across the particular instances. Recall that w(t) is used to find NS_{w(t)}(xi) and FS_{w(t)}(xi), which are applied to evaluate a set of feature width vectors u_i, which are in turn applied to update w(t). The criterion for determining the best w is indeed an NP problem.
We reduce the search problem to the minimization of a sum-of-squared error between the feature width vectors and the salience vector, controlled by an adaptive parameter. The goal of the learning task is to search for the best w such that this error is minimized. In this paper, we utilized a gradient-descent-based approach, Least-Mean-Squares (LMS) [7], with some modifications, and used cooperative and competitive iterative strategies to find W = [w(1), …, w(t), …, w(T)]T. In each iteration t, only one instance xt participates in learning. In cooperation, all features of the instance xt are used for the evaluation under w(t). In competition, the evaluation is used to perceive which features are more salient for clustering and is thus reflected in w(t+1). Furthermore, a(t) is a monotonically decreasing learning coefficient, so the update function for w(t+1) is written adaptively as in (7).

3 Experiment

3.1 Parameter Setup

We denote the basic method as NF. Assume that we have a dataset containing a total of N instances. We first set the initial learning rate a(1) to 0.8 and decrease the learning rate as a(t) = a(1) × [(T − t)/T] at the tth iteration. The weights of all elements in the initial feature salience vector w(1) are set to the same value. The number of iterations T should be large, so that most instances can be randomly selected for training; we thus set T to 10N, where N is the size of the training dataset. Empirically, the two adaptive parameters are set to 0.01 and 1, respectively. The parameters K and L depend on the training instances and are empirically set to √3.
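A picture of the training loop of Section 2.2: the learning-rate schedule a(t) = a(1)·(T − t)/T and T = 10N come from this section, while the LMS-style move of w toward the feature width vector is only an assumed stand-in for the update rule (7), which is not legible in this excerpt; feature_width_of is a hypothetical callable supplied by the caller.

import numpy as np

def nf_train(X, feature_width_of, a1=0.8, rng=None):
    # Sketch of the NF iterations: one random instance per iteration, a decreasing
    # learning rate, and an assumed LMS-style move of the salience vector w toward
    # that instance's feature width vector u.
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    T = 10 * n
    w = np.full(d, 1.0 / d)                  # equal initial saliences
    for t in range(T):
        a = a1 * (T - t) / T                 # a(t) = a(1) * (T - t) / T
        xt = X[rng.integers(n)]
        u = feature_width_of(xt, w)          # feature width vector for this instance
        w = w + a * (u - w)                  # assumed update; Eq. (7) is not legible here
    return w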

3.2 An Application for Feature Selection on Image Recognition


We considered the NF method for selecting a subset of salient features from several images, because image processing is demanding and not all features are equally useful for the distance metric in image representation or recognition. In this test application, we observed whether our NF method could achieve feature selection for discriminating these images. A variance-based method for feature selection is also discussed and compared with NF. In this experiment, we performed a very interesting application from the viewpoint of discriminant analysis: we did not employ rich and elaborate technologies for capturing image features. Given a set of RGB images, our aim in this application was to observe whether the NF method could extract the gray-level values (pixels) that are well discriminated between images.
The OT dataset [8, 9] consists of 2,688 RGB images from eight categories, including 360 coast, 328 forest, 374 mountain, 410 open country, 260 highway, 308 inside city, 356 tall building, and 292 street images. The size of each image is 256 × 256 pixels. We collected one image from each of the categories as the dataset for the analysis. These eight images are shown in Figure 1. The process of discriminating and displaying the salient and non-salient features included three major steps.
First, each 24-bit RGB image was converted into a gray-scale image by displaying an 8-bit intensity image. Each gray-scale image had a gray-level range from 0 to 255. Following the study [10], we represented each gray-scale image as a gray-level histogram with 2^8 = 256 dimensional features, where the first feature represents the frequency of gray-level 0, the second feature represents the frequency of gray-level 1, and so on. Hence, we had a total of eight instances with 256 features as a new dataset. Normalization was applied to limit the feature values to the range [0, 1].
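Building the 2^8 = 256-bin gray-level histogram of this first step can be done with NumPy alone; the RGB-to-gray weights below are the common luminance coefficients, an assumption rather than the paper's exact conversion, and the input image is a random stand-in.

import numpy as np

def gray_histogram(rgb):
    # Map a 24-bit RGB image to an 8-bit gray image, count the 256 gray levels,
    # and rescale the counts to the range [0, 1].
    gray = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]).astype(np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    return hist / hist.max() if hist.max() > 0 else hist

# Toy 256 x 256 random image standing in for one of the OT images.
img = np.random.default_rng(2).integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
print(gray_histogram(img).shape)   # (256,)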
Second, this dataset was input into the NF method, which outputs the feature salience vector w(T). All features could then be ranked based on the vector w(T). We defined a parameter Rtr, which represents the proportion of top-ranked features chosen as salient; the features that were not chosen were treated as non-salient. Within the context of this discussion, Rtr was set to 0.5.
Finally, the information about the salient and non-salient features was used to produce the processed gray-scale images. We displayed the gray-scale images with only the salient features retained, while the non-salient features were all ignored. In particular, the pixels of the non-salient gray levels were all marked as white pixels,
and the pixels of the salient gray-level were retained as the original gray-level pixels.
The processed gray-scale images using the features selected by the NF method are shown in Figure 2. Moreover, we also used a variance metric to select features for displaying the gray-scale images; with a variance metric, a feature (or random variable) with a higher variance is treated as more salient. The resulting gray-scale images using the features selected by the variance metric are shown in Figure 3.
Comparing the results in Figures 2 and 3, we see that the NF method better achieved feature selection for discriminating these images. Using the NF method can depict a summary of the original images, which implies that the extracted salient features can discriminate the differences among images. However, the method using a variance metric often failed to achieve this goal. For example, for the third image, "open country", the image is still clear with NF (see the third image in Figure 2), whereas the variance metric yields a very unclear image (see the third image in Figure 3). Considering another image leads to the same conclusion (compare the last images in Figures 2 and 3).

Fig. 1. The dataset consisted of eight RGB images

Fig. 2. The gray-scale images with 128 dimensions using the NF method

Fig. 3. The gray-scale images with 128 dimensions using a variance metric

In addition, when the number of instances in a dataset is small, there is less
shared context for the distance metric. It is therefore interesting that, in this example,
our NF method could still handle the task of feature selection for distinguishing images.

4 Conclusions
This paper presents an approach to using only a subset of features for unlabeled data.
The presented method differs from filter-based methods, which often fail to select
feature subsets for clustering because the number of clusters or the cluster structure
cannot be effectively predicted in advance. Instead, our method uses instance-based
learning to quantify features in the context of the nearest and the farthest neighbors of
every instance, so that clustering with the salient features increases cluster compactness
for the instances within a cluster and cluster separability for the instances between
clusters. Experiments on several datasets demonstrated that the presented method can
effectively select a feature subset that is suitable for cluster analysis and achieves better
performance than a variance metric for feature selection.

References
1. Lee, C., Lee, G.G.: Information gain and divergence-based feature selection for machine
learning-based text categorization. Information Processing and Management 42, 155–165
(2006)
2. Wang, H., Li, P., Zhang, T.: Histogram features-based Fisher linear discriminant for face
detection. In: Asian Conference on Computer Vision, pp. 521–530 (2006)
3. Crone, S.F., Kourentzes, N.: Feature selection for time series prediction - A combined filter
and wrapper approach for neural networks. Neurocomputing 73, 1923–1936 (2010)
4. Hathaway, R.J., Bezdek, J.C., Huband, J.M., Leckie, C., Kotagiri, R.: Approximate clus-
tering in very large relational data, in review. Journal of Intelligent Systems (2005)
5. Feder, T., Greene, D.: Optimal algorithms for approximate clustering. In: Proceedings of the
20th Annual ACM Symposium on the Theory of Computing, pp. 434–444 (1988)
6. Kaufman, L., Rousseeuw, P.: Finding groups in data. Wiley, Chichester (1990)
7. Haykin, S.S., Widrow, B.: Least-mean-square adaptive filters. Wiley, Chichester (2003)
8. Boutemedjet, S., Bouguila, N., Ziou, D.: A hybrid feature extraction selection approach for
high-dimensional non-Gaussian data clustering. IEEE Transactions on Pattern Analysis and
Machine Intelligence 31(8), 1429–1443 (2009)
9. Oliva, A., Torralba, A.: Modeling the shape of the scene: A holistic representation of the
spatial envelope. International Journal of Computer Vision 42, 145–175 (2001)
10. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on
Systems, Man, and Cybernetics 9(1), 62–66 (1979)
Feature Selection Algorithm Based on Least Squares
Support Vector Machine and Particle Swarm
Optimization

Song Chuyi1, Jiang Jingqing2, Wu Chunguo3, and Liang Yanchun3


1 College of Mathematics, Inner Mongolia University for Nationalities, Tongliao 028043, China
2 College of Computer Science and Technology, Inner Mongolia University for Nationalities, Tongliao 028043, China
3 College of Computer Science and Technology, Jilin University, Changchun 130012, China
[email protected]

Abstract. A hybrid feature selection algorithm based on the least squares support
vector machine (LSSVM) and discrete particle swarm optimization is proposed
in this paper. The proposed algorithm takes advantage of the ease of training
LSSVM, adopts LSSVM to construct the classifier, and uses classification
accuracy as the main part of the fitness function in the particle swarm
optimization process. The simulation results show that the proposed algorithm
can identify the features that contribute most to the classifier. The dimension of
the data is therefore decreased and the efficiency of the classifier is improved.

Keywords: Feature Selection, Least Squares Support Vector Machine, Particle
Swarm Optimization.

1 Introduction
Classification is one of the major problems in the field of pattern recognition. The
accuracy of a classifier is related to the choice of classifier, the number of samples and
the dimension of the samples. Feature selection is a key problem for classification;
selecting a handful of the most informative genes is a necessary step for classifying
gene expression data. With the development of science and technology, the dimension
of the samples obtained in some fields is becoming larger and larger, so more and more
researchers pay attention to feature selection and focus on its study.
Recently, evolutionary algorithms based on biological intelligence have developed
rapidly, and feature selection algorithms based on such intelligent algorithms and their
hybrids have appeared. Shoorehdeli [1] proposed a feature subset selection algorithm
based on a genetic algorithm and particle swarm optimization for face detection.
Huang [2] gave a feature selection algorithm based on double parallel feed-forward
neural networks and particle swarm optimization. Yu [3] gave a feature gene selection
algorithm based on discrete particle swarm optimization and support vector machines.
Qiao [4] presented a feature subset selection algorithm based on particle swarm
optimization and support vector machines. Dai [5] gave a fast feature selection
algorithm based on support vector machines.


Because of the good classification and generalization ability of support vector
machines, and because the least squares support vector machine converts the inequality
constraints into equality constraints and thereby decreases the difficulty of training,
a hybrid feature selection algorithm based on particle swarm optimization and least
squares support vector machines is proposed in this paper.

2 Basic Algorithm

2.1 Particle Swarm Optimization (PSO)

Particle Swarm Optimization was introduced by Kennedy and Eberhart [6], who were
inspired by research on artificial life. It is an evolutionary computational model based
on swarm intelligence.
In PSO, a possible solution of the optimization problem is represented as a point in
the D-dimensional search space, called a particle. The particles fly through the search
space with certain velocities, which are adjusted dynamically according to the flying
experience of the particle itself and of its companions. Every particle is evaluated by a
fitness calculated from the objective function. Every particle records the best position
it has experienced, denoted by pbest, and the best position experienced by the colony
is called the global best, denoted by gbest.
Suppose that the search space is D-dimensional and that m particles form the colony [7].
The ith particle is represented by a D-dimensional vector Xi = (xi1, xi2, …, xiD)
(i = 1, 2, …, m), which gives its location in the search space. The position of each
particle is a potential solution; we calculate a particle's fitness by putting its position
into a designated objective function, and a higher fitness means a "better" Xi. The ith
particle's "flying" velocity is also a D-dimensional vector, denoted Vi = (vi1, vi2, …, viD)
(i = 1, 2, …, m). Denote the best position of the ith particle as Pi = (pi1, pi2, …, piD)
and the best position of the colony as Pg = (pg1, pg2, …, pgD), respectively. The PSO
algorithm is performed using the following equations

Vi(k+1) = w Vi(k) + c1 r1 (Pi − Xi(k)) / Δt + c2 r2 (Pg − Xi(k)) / Δt    (1)

Xi(k+1) = Xi(k) + Vi(k+1) Δt    (2)

where i = 1, 2, …, m, k is the iteration number, w is the inertia weight, c1 and c2 are
learning rates, r1 and r2 are random numbers between 0 and 1, and Δt is the time step
value. Vi ∈ [Vmin, Vmax], where Vmin and Vmax are designated vectors. The
termination criterion for the iterations is that either the maximum number of
generations or a designated value of the fitness of Pg is reached.
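A minimal NumPy sketch of the update rules in Eqs. (1) and (2) with Δt = 1 (the parameter values w = 0.7, c1 = c2 = 2.0 and the velocity bounds are illustrative assumptions, not values from the paper):

import numpy as np

def pso_step(X, V, P, Pg, w=0.7, c1=2.0, c2=2.0, v_min=-10.0, v_max=10.0):
    # X: (m, D) positions, V: (m, D) velocities, P: (m, D) personal bests, Pg: (D,) global best
    m, D = X.shape
    r1, r2 = np.random.rand(m, D), np.random.rand(m, D)
    V = w * V + c1 * r1 * (P - X) + c2 * r2 * (Pg - X)   # Eq. (1), dt = 1
    V = np.clip(V, v_min, v_max)                          # keep Vi in [Vmin, Vmax]
    X = X + V                                             # Eq. (2)
    return X, V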

2.2 Discrete Binary Particle Swarm Optimization

PSO was initially used for the optimization of continuous functions. In 1997,
Kennedy and Eberhart [8] proposed binary particle swarm optimization (BPSO).
BPSO constrains each dimension of Xi and Pi to 1 or 0, but the velocity is not
constrained. This generalization makes PSO applicable to combinatorial optimization.

2.3 Least Squares Support Vector Machines (LS-SVM)

The least squares support vector classification algorithm is introduced briefly as
follows [9-10]:
Let us consider a given training set of N samples $\{x_i, y_i\}_{i=1}^{N}$ with the ith input
datum $x_i \in R^n$ and the ith output datum $y_i \in \{-1, +1\}$. The aim of the support vector
machine model is to construct a classifier of the form:

$f(x, w) = \mathrm{sign}[w^T \varphi(x) + b]$    (3)

where the nonlinear mapping $\varphi(\cdot)$ maps the input data into a higher dimensional
feature space. In least squares support vector machines the following optimization
problem is formulated

$\min_{w,e} J(w, e) = \frac{1}{2} w^T w + \gamma \sum_{i=1}^{N} e_i^2$    (4)

subject to the equality constraints

$y_i [w^T \varphi(x_i) + b] = 1 - e_i, \quad i = 1, \ldots, N$    (5)

This corresponds to a form of ridge regression. The Lagrangian is given by

$L(w, b, e, \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \{ y_i [w^T \varphi(x_i) + b] - 1 + e_i \}$    (6)

with Lagrange multipliers $\alpha_i$. The conditions for optimality are

$\frac{\partial L}{\partial w} = 0 \;\rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i)$
$\frac{\partial L}{\partial b} = 0 \;\rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0$
$\frac{\partial L}{\partial e_i} = 0 \;\rightarrow\; \alpha_i = \gamma e_i$    (7)
$\frac{\partial L}{\partial \alpha_i} = 0 \;\rightarrow\; y_i [w^T \varphi(x_i) + b] - 1 + e_i = 0$

for i = 1, …, N. After elimination of $e_i$ and $w$, the solution is given by the following set
of linear equations

$\begin{bmatrix} 0 & -y^T \\ y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \vec{1} \end{bmatrix}$    (8)

where $y = [y_1, \ldots, y_N]^T$, $\vec{1} = [1, \ldots, 1]^T$, $\alpha = [\alpha_1, \ldots, \alpha_N]^T$, and the Mercer condition

$\Omega_{kl} = y_k y_l \varphi(x_k)^T \varphi(x_l) = y_k y_l \psi(x_k, x_l), \quad k, l = 1, \ldots, N$    (9)

has been applied.


Set $A = \Omega + \gamma^{-1} I$. Since $A$ is a symmetric positive-definite matrix, $A^{-1}$ exists.
Solving the set of linear Eqs. (8), the solution is obtained as

$b = \frac{y^T A^{-1} \vec{1}}{y^T A^{-1} y}, \qquad \alpha = A^{-1}(\vec{1} - y b)$    (10)

Substituting $w$ in Eq. (3) with the first equation of Eqs. (7) and using Eq. (9), we have

$f(x) = \mathrm{sign}\left[ \sum_{i=1}^{N} \alpha_i y_i \psi(x, x_i) + b \right]$    (11)

where $\alpha_i$ and $b$ are the solution of Eqs. (8). The kernel function $\psi(\cdot)$ can be chosen as
a linear function $\psi(x, x_i) = x_i^T x$, a polynomial function $\psi(x, x_i) = (x_i^T x + 1)^d$, or a
radial basis function $\psi(x, x_i) = \exp\{ -\|x - x_i\|_2^2 / \sigma^2 \}$. In the proposed algorithm the
kernel function is chosen as a linear function, and $f(x)$ is the obtained classifier.
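The linear system of Eq. (8) with a linear kernel can be solved directly, for example as in the following sketch (a minimal illustration, not the authors' implementation; the value of γ and the helper names are assumptions):

import numpy as np

def lssvm_train(X, y, gamma=1.0):
    # Linear kernel psi(x_k, x_l) = x_k^T x_l and Mercer condition of Eq. (9)
    N = len(y)
    Omega = np.outer(y, y) * (X @ X.T)
    A = Omega + np.eye(N) / gamma                 # A = Omega + gamma^{-1} I
    A_inv_1 = np.linalg.solve(A, np.ones(N))
    A_inv_y = np.linalg.solve(A, y.astype(float))
    b = (y @ A_inv_1) / (y @ A_inv_y)             # Eq. (10)
    alpha = np.linalg.solve(A, np.ones(N) - y * b)
    return alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_test):
    # Classifier of Eq. (11) with the linear kernel
    return np.sign(X_test @ X_train.T @ (alpha * y_train) + b)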

3 Feature Selection Based on PSO and LSSVM

3.1 Representation of Particle

In PSO, a particle represents a point in feature space, which corresponds to a subset of
features. Each particle is a binary string of length D, where D is the number of features
in the feature space. Each bit in the binary string represents a feature: a value of 1
means that the corresponding feature is selected into the feature subset, and a value of
0 means that it is not selected.

3.2 Initialization of Colony

The initialization of the colony generates a set of particles randomly. If every bit were
set at random, however, the numbers of 1s and 0s in each particle would be almost the
same, meaning that the number of features in each particle would be almost the same.
To obtain different numbers of selected features, the algorithm [4] used in this paper
first generates the number of 1s in a particle randomly and then distributes the 1s over
the particle randomly. This initialization reflects the variety of feature subsets.
The initial velocity is generated by the formula [11]:

Vi(0) = Vmin + rand() × (Vmax − Vmin)    (12)



where rand() is a random number between 0 and 1, and Vmax and Vmin are the
maximum and minimum velocities, respectively.

3.3 Updating Particle Velocity

The velocity of a particle is represented by a vector of D real numbers, and the
updating formula is Eq. (1) with Δt = 1 in this paper. In Eq. (1), (Pi − Xi(k)) represents
the distance between position Xi and the best position of the particle, and (Pg − Xi(k))
represents the distance between position Xi and the best position of the colony. Since
the position is a binary string, the distance between two positions can be measured by
the number of differing bits in the two binary strings.
In order to constrain the velocity to a proper range, the velocity is processed after
updating as follows:

$V_i(k+1) = \begin{cases} V_{\max}, & V_i(k+1) > V_{\max} \\ V_i(k+1), & V_{\min} \le V_i(k+1) \le V_{\max} \\ V_{\min}, & V_i(k+1) < V_{\min} \end{cases}$

3.4 Updating Position

The position is updated following Eq. (2), which adds the velocity vector to the
position vector. The result is then rounded and finally taken modulo 2, so that each
component is mapped to 0 or 1.
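A small sketch of Sections 3.3 and 3.4 (velocity clamping followed by the rounded, modulo-2 position update); the default clamping bounds mirror the values used later in Section 4 and are otherwise an assumption:

import numpy as np

def update_binary_position(X, V, v_min=0.1, v_max=10.0):
    # Clamp the velocities to [Vmin, Vmax], then apply Eq. (2):
    # add, round, and take modulo 2 to obtain a binary position vector.
    V = np.clip(V, v_min, v_max)
    X_new = np.mod(np.rint(X + V), 2).astype(int)
    return X_new, V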

3.5 Fitness of Particle

The purpose of feature selection is to find the feature subset with the strongest
classification ability. The fitness is the measure used to evaluate the feature subset
denoted by a particle, and it is composed of two parts: (a) the testing accuracy and
(b) the number of selected features. For each particle h, the fitness is

fit(h) = 10^4 × (1 − acc(h)) + k × ones(h)    (13)

where acc(h) is the classification accuracy of the classifier constructed from the
features selected according to h, and ones(h) is the number of 1s in h. PSO searches
for the global minimum, so a higher accuracy means a lower fitness. k is a parameter
that balances the accuracy against the number of features; a larger k means that the
number of features is more important.
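A sketch of this fitness computation (illustrative only; train_and_test_accuracy is a hypothetical helper that builds the LS-SVM classifier on the selected columns and returns the test accuracy):

import numpy as np

def fitness(h, X_train, y_train, X_test, y_test, k=0.45):
    # Eq. (13): lower is better; 10^4 penalizes misclassification,
    # k penalizes the number of selected features.
    selected = np.flatnonzero(h)
    acc = train_and_test_accuracy(X_train[:, selected], y_train,
                                  X_test[:, selected], y_test)
    return 1e4 * (1.0 - acc) + k * selected.size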

3.6 Classifier Selection

For good classification and generalization ability, we choose the support vector
machine as the classifier. The least squares support vector machine converts the
inequality constraints into equality constraints and is therefore easy to solve, so
LS-SVM is selected as the classifier in this paper.

3.7 Algorithm Description

The steps of the proposed algorithm are as follows (a sketch of the main loop is given
after the list):

Step 1. Set the size m of the colony, the maximum velocity Vmax and the minimum
velocity Vmin.
Step 2. Initialize the colony: for each particle, generate the number of 1s randomly and
scatter the 1s over the position vector; generate the initial velocity according to Eq. (12).
Step 3. For each particle, construct the LS-SVM classifier from the selected features,
calculate the accuracy, and then calculate the fitness.
Step 4. Compare each particle's fitness value with that of the best position Pi it has
experienced; if it is better than the old one, it becomes the new best position Pi.
Step 5. Compare each particle's fitness with that of the present global best position Pg;
if it is better than the current Pg, the index of Pg is reset.
Step 6. Update the velocity and position of each particle by Eq. (1) and Eq. (2).
Step 7. If the maximum number of iterations is reached or the designated fitness is
achieved, the process stops; otherwise go to Step 3.
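A compact skeleton of Steps 1-7 (a sketch under the same assumptions as above; fitness and update_binary_position are the hypothetical helpers sketched earlier, and the inertia weight 0.7 and learning rates 2.0 are illustrative, not from the paper):

import numpy as np

def bpso_lssvm_feature_selection(X_tr, y_tr, X_te, y_te, m=100, iters=50,
                                 v_min=0.1, v_max=10.0, k=0.45):
    D = X_tr.shape[1]
    # Step 2: draw the number of 1s per particle, then scatter them at random
    X = np.zeros((m, D), dtype=int)
    for i in range(m):
        ones = np.random.randint(1, D + 1)
        X[i, np.random.choice(D, ones, replace=False)] = 1
    V = v_min + np.random.rand(m, D) * (v_max - v_min)                           # Eq. (12)
    P, P_fit, g = X.copy(), np.full(m, np.inf), 0
    for _ in range(iters):
        fit_now = np.array([fitness(h, X_tr, y_tr, X_te, y_te, k) for h in X])   # Step 3
        better = fit_now < P_fit                                                  # Step 4
        P[better], P_fit[better] = X[better], fit_now[better]
        g = int(P_fit.argmin())                                                   # Step 5
        r1, r2 = np.random.rand(m, D), np.random.rand(m, D)
        V = 0.7 * V + 2.0 * r1 * (P - X) + 2.0 * r2 * (P[g] - X)                  # Step 6, Eq. (1)
        X, V = update_binary_position(X, V, v_min, v_max)                         # Eq. (2)
    return P[g]                                           # best feature subset found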

4 Numerical Experience
We use two datasets to demonstrate the performance of the proposed algorithm. The
datasets were obtained from http://sdmc.lit.org.sg/GEDatasets/Datasets.html, and
Table 1 shows their details. The experiments were run on a Lenovo personal computer
with a 3.0 GHz Pentium IV processor and 1 GB of memory, running the Microsoft
Windows XP operating system. All programs were written in C++ using Microsoft's
Visual C++ 6.0 compiler. We used the original datasets without normalization.
The parameters that must be predetermined are as follows. The kernel ψ(⋅) for
LS-SVM is chosen as the linear function ψ(x, xi) = φ(x)^T φ(xi) = xi^T x, k = 0.45 in
Eq. (13), the colony size m is 100, the value of Vmax in each dimension is 10, and the
value of Vmin in each dimension is 0.1. The performance of the proposed algorithm
is summarized in Table 2.

Table 1. Information of datasets

Datasets            Number of genes   Train samples           Test samples
ALLAML Leukemia     7129              38 (27 ALL, 11 AML)     34 (20 ALL, 14 AML)
Lung Cancer         12533             32 (16 MPM, 16 ADCA)    149 (15 MPM, 134 ADCA)

Table 2. Performance of the proposed algorithm

Datasets            Number of selected genes   Testing error (%)
ALLAML Leukemia     157                        6.7628
Lung Cancer         135                        4.2974

It can be seen from Table 2 that 2.20% of the genes are selected from the ALLAML
Leukemia dataset (157 of 7129 genes) and 1.07% of the genes are selected from the
Lung Cancer dataset (135 of 12533 genes).

5 Conclusion
A feature selection algorithm based on PSO and LS-SVM is proposed in this paper.
LS-SVM performs well on classification problems, and PSO is easy to implement and
robust. The proposed algorithm combines the advantages of both: PSO is used to select
the features and LS-SVM is used to construct the classifier, with classification accuracy
as the main part of the fitness function. Numerical experiments show that this algorithm
decreases the dimension of the samples and improves the efficiency of classification.
Several further improvements of the proposed algorithm are possible, for example
in the distance between two positions, the method of adding the position vector and
the velocity vector, and the initialization of the colony. Such improvements should
lead to better performance.

Acknowledgment
This work was supported by the funds from National Natural Science Foundation of
China (NSFC) (61073075, 60803052 and 10872077), the National High-Tech R&D
Program of China (2009AA02Z307), Jilin University ("985" and "211" project,
Scientific Frontier and interdisciplinary subject project (200903173)), Inner Mongolia
Autonomous Region Research Project of Higher Education (NJ10118 and NJ10112).

References
1. Shoorehdeli, M.A., Teshnehlab, M., Moghaddam, H.A.: Feature Subset Selection for Face
Detection Using Genetic Algorithms and Particle Swarm Optimization. In: Proceedings of
the 2006 IEEE International Conference on Networking, Sensing and Control, pp. 686–690
(2006)
2. Huang, R., He, M.Y.: Feature Selection Using Double Parallel Feedforward Neural
Networks and Particle Swarm Optimization. In: IEEE Congress on Evolutionary
Computation, pp. 692–696 (2007)
3. Yu, H.L., Gu, G.C., Zhu, C.M.: Feature Gene Selection by Combining an Improved Discrete
PSO and SVM. Journal of Harbin Engineering University 30(13), 1399–1403 (2009)

4. Qiang, L.Y., Peng, X.Y., Peng, Y.: BPSO-SVM Wrapper for Feature Subset Selection.
Acta Electronica Sinica 34(3), 496–498 (2006)
5. Dai, P., Li, N.: A Fast SVM-based Feature Selection Method. Journal of Shandong
University (Engineering Science) 40(5), 60–65 (2010)
6. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: IEEE International
Conference on Neural Networks, pp. 1942–1948. IEEE Service Center, Piscataway (1995)
7. Shi, X.H., Wan, L.M., Lee, H.P., et al.: An Improved Genetic Algorithm with Variable
Population-size and a PSO-GA Based Hybrid Evolutionary Algorithm. In: Second
International Conference on Machine Learning and Cybernetics, pp. 1735–1740 (2003)
8. Eberhart, R.C., Kennedy, J.: A Discrete Binary Version of the Particle Swarm Algorithm.
In: IEEE Conference on Systems, Man, and Cybernetics, vol. 5, pp. 4104–4109. IEEE
Press, Orlando (1997)
9. Suykens, J.A.K., Vandewalle, J.: Least Squares Support Vector Machine Classifiers.
Neural Processing Letter 9, 293–300 (1999)
10. Chua, K.S.: Efficient Computations for Large Least Square Support Vector Machine
Classifiers. Pattern Recognition Letters 24, 75–80 (2003)
11. Ma, H.M., Ye, C.M., Zhang, S.: Binary Improved Particle Swarm Optimization Algorithm
for Knapsack Problem. Journal of University of Shanghai for Science and Technology 28(1),
31–34 (2006)
Unsupervised Local and Global Weighting for Feature
Selection

Nadia Mesghouni, Khaled Ghedira, and Moncef Temani

University of Tunis, LI3 Laboratory, ISG TUNIS,


92, Avenue 9 avril 1938, Tunis – 1007, Tunisia
[email protected],
[email protected],
[email protected]

Abstract. In this paper we describe a process for selecting relevant features in
unsupervised learning paradigms using new weighted approaches: local weighting
of observations, "Obs-SOM", and global weighting of observations, "GObs-SOM".
These new methods are based on the self-organizing map (SOM) model and
feature weighting. The learning algorithms provide cluster characterization by
determining the feature weights within each cluster. We describe extensive
testing using a statistical method for unsupervised feature selection. Our
approach demonstrates the efficiency and effectiveness of this method in
dealing with high dimensional data for simultaneous clustering and weighting.
The models are tested on a wide variety of datasets, showing better performance
for the new algorithms than for the classical SOM algorithm. We also show
that, through different means of visualization, the Obs-SOM and GObs-SOM
algorithms provide various pieces of information that can be used in practical
applications.

Keywords: Self-Organizing Map, unsupervised learning, local weighting of
observations, global weighting of observations.

1 Introduction
Feature selection for clustering, or unsupervised feature selection, aims at identifying
feature subsets such that a model describing the clusters accurately can be obtained
from unsupervised learning. This improves the interpretability of the induced model,
as only relevant features are involved in it, without degrading its descriptive accuracy.
Additionally, the identification of relevant and irrelevant variables with SOM [1, 2]
learning provides valuable insight into the nature of the group structure. Feature
(variable) selection for clustering is difficult because, unlike supervised learning [3],
there are no class labels for the dataset and no obvious criteria to guide the search.
The important issue of feature selection in clustering is to provide the variables which
give the "best" homogeneous clustering [4]. Therefore, we use the weight and prototype
vectors π[j] and w[j] provided by our proposed weighting approaches to cluster the map
and to characterize each cluster with its relevant variables. For map clustering we use
traditional hierarchical clustering coupled with the Davies-Bouldin index [5] to choose
the optimal partition. To select variables, we use an original method based on a
statistical criterion named the Scree Test to choose the most important variables for
each cell and each cluster of cells [6]. In the following, we present local and global
variable weighting using SOM.
First, we propose adaptive weighting approaches inspired by the w-k-means and
ω-SOM algorithms. We then minimize the SOM objective function by an analytical
method, which yields the batch version of SOM, and present our proposed feature
weighting approaches for the adaptive SOM version: distance weighting and
observation weighting. Finally, we test our models on a wide variety of datasets,
showing better performance for the Obs-SOM and GObs-SOM algorithms than for the
classical SOM algorithm.

2 Adaptive Weighting SOM


We propose to use the principle of the wLVQ2 weighting technique and to adapt it to
Self-Organizing Maps using the stochastic version of the SOM algorithm. The
minimization of the objective function is done using gradient descent, looking for a
local minimum. This type of approach is more efficient than analytical weighting
because we use the adaptive weights for cluster characterization. We propose four
types of adaptive approaches: local weighting of observations, local weighting of
distances, global weighting of observations, and global weighting of distances.

3 Weighting Observations
Weighting the observations during the learning process allows more importance to be
given to the relevant features of the weighted observations. Consider a dataset
X = {x1, x2, x3, x4} and suppose that observation x2 is the most relevant in X. In this
case the weighting approach must be able to assign a higher weight value to it than to
the other three observations. For this type of approach we propose both local and
global weighting, described in the next sections.

3.1 Local Weighting Observations: OBS-SOM

Our method is based on earlier work describing the supervised model wLVQ2 [7].
This approach adapts weights to filter the observations during the learning process.
Using this model, we weight each observation x by a weight vector π before computing
the distance, so the weight matrix acts as a filter on the observations. The objective
function is rewritten as follows:

$R_{\mathrm{Obs\text{-}SOM}}(\chi, W, \Pi) = \sum_{i=1}^{N} \sum_{j=1}^{|C|} K_{j,\chi(x_i)} \, \|\pi_j x_i - w_j\|^2$    (1)

Minimization of R_Obs-SOM(χ, W, Π) is performed by iterating the following three
steps until stabilization. The initialization step determines the prototype set W and the
set of associated weights Π. At each training step (t+1), an observation xi is randomly
chosen from the input dataset and the following operations are repeated:
Minimize R_Obs-SOM(χ, W, Π) with respect to χ by fixing W and Π. Each weighted
observation (πj, xi) is assigned to the closest prototype wj using the assignment
function, defined as follows:

χ(xi) = arg min_j (||πj xi − wj||²)    (2)

Minimize R_Obs-SOM(χ, W, Π) with respect to W by fixing χ and Π. The prototype
vectors are updated using the stochastic gradient expression:

wj(t+1) = wj(t) + ε(t) K_{j,χ(xi)} (πj xi − wj(t))    (3)

Minimize R_Obs-SOM(χ, W, Π) with respect to Π by fixing χ and W. The update rule
for the feature weight vector πj(t+1) is:

πj(t+1) = πj(t) + ε(t) K_{j,χ(xi)} (πj xi − wj(t))    (4)


As in the traditional stochastic learning algorithm of Kohonen, we denote the learning
rate at time t by ε(t). The training is usually performed in two phases. In the first phase,
a high initial learning rate ε(0) and a large neighborhood radius Tmax are used; in the
second phase, a low learning rate and a small neighborhood radius are used from the
beginning. Thus, a matrix of weights is associated with the map and trained during the
learning algorithm. To obtain a global weighting of observations algorithm
(GObs-SOM), this matrix is replaced by a single vector. In this case we do not take
into account the importance of each variable for each individual cell: the relevance
vector does not depend on the cell, but on the whole map C: π = (π1, ..., πd). The
adaptive local process is shown in Algorithm 1.

Algorithm 1. The Obs-SOM learning algorithm


Input: Data set X; Iter - number of iterations
Initialization Phase:
Randomly initialize the prototype matrix W;
Randomly initialize the weight matrix Π;
For t= 1 to Iter do
Learning Phase: Present a learning example x and find the BMU by computing the
weighted Euclidean distance (Eq. (2));
Updating Phase: Compute the new prototypes w using expression (3);
Compute the weights π using expression (4).
End for
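A minimal sketch of one stochastic Obs-SOM update (Eqs. (2)-(4)); the Gaussian neighborhood kernel on the map grid and the learning-rate handling are assumptions, not taken from the paper:

import numpy as np

def obs_som_step(x, W, Pi, grid, eps, sigma):
    # Eq. (2): best matching unit for the weighted observation (pi_j * x, w_j)
    bmu = np.argmin(np.sum((Pi * x - W) ** 2, axis=1))
    # Assumed Gaussian neighborhood kernel K_{j, chi(x)} over the map grid coordinates
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))[:, None]
    delta = Pi * x - W                 # uses w_j(t), as in Eqs. (3)-(4)
    W = W + eps * K * delta            # Eq. (3): prototype update
    Pi = Pi + eps * K * delta          # Eq. (4), as written in the paper
    return W, Pi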

3.2 Global Weighting Observations: GObs-SOM


Weighting the clustering algorithm in a global way allows the entire map to be
weighted with the same vector of weights. This is useful when we do not seek the
relevant features of each cluster but want to detect them for the entire dataset or for
the obtained map. The objective function is the same as for the adaptive learning of
Obs-SOM, changing only the weight matrix Π into a vector of weights π:

$R_{\mathrm{GObs\text{-}SOM}}(\chi, W, \pi) = \sum_{i=1}^{N} \sum_{j=1}^{|C|} K_{j,\chi(x_i)} \, \|\pi x_i - w_j\|^2$    (5)

For each feature we have a corresponding numerical weight, updated by:

π(t+1) = π(t) + ε(t) K_{j,χ(xi)} (π xi − wj(t))    (6)

4 Experimental Results for Cluster Characterization (Using the Adaptive Approaches G/Obs-SOM)

We performed several experiments on five known problems from the UCI Repository
of machine learning databases: waveform, spambase, madelon, isolet and the Wisconsin
cancer database [8]. To evaluate the quality of clustering, we compared the results to a
"ground truth", using clustering accuracy to assess the clustering results. In general,
clustering results are assessed on the basis of some external knowledge about how
clusters should be structured, and the only way to assess the usefulness of a clustering
result is indirect validation, whereby clusters are applied to the solution of a problem
and the correctness is evaluated against objective external knowledge. This procedure
is defined by [9] as "validating clustering by extrinsic classification" and has been used
in many other studies. To use this approach we therefore need labeled datasets, where
the external (extrinsic) knowledge is the class information provided by the labels.
Thus, the identification of significant clusters in the data by Obs-SOM is reflected by
the distribution of classes, and a purity score can be expressed as the percentage of
elements in a cluster that have been assigned a particular class. We also validated our
approaches in the supervised learning paradigm. We used the K-fold cross-validation
technique, repeated s times with s = 5 and K = 3, to estimate the performance of
G/Obs-SOM. For each run, the dataset was split into three disjoint subsets of equal size
(15 runs for each dataset); we used two subsets for training and then tested the model
on the remaining subset using all features and the selected features (selected on the
cells or on the clusters). The labels generated were compared to the real labels of the
test set for each run. We used the purity index to evaluate the quality of the map
segmentation. This index measures the correspondence between the data classes and
the cluster labels, computed using the majority vote rule. A high value indicates highly
homogeneous clustering: a purity index close to 0 indicates poor clustering, whereas a
value close to 1 indicates a good clustering result.
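The purity index described above can be computed as in the following sketch (labels are assumed to be integer-coded; function and variable names are illustrative):

import numpy as np

def purity(cluster_labels, class_labels):
    # Majority-vote rule: each cluster is labeled with its most frequent class;
    # purity is the fraction of samples that agree with their cluster's label.
    correct = 0
    for c in np.unique(cluster_labels):
        classes_in_c = class_labels[cluster_labels == c]
        correct += np.bincount(classes_in_c).max()
    return correct / len(class_labels)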

4.1 Results on the Waveform Dataset for G/OBS-SOM

We used this dataset to show a good level of performance for both algorithms
(GObs-SOM and Obs-SOM) for simultaneous clustering and feature weighting. All
observations were used to generate a map of 26×14 cells. Both learning
algorithms provided two vectors for each cell: the referent vector wj = (w1j, w2j, ..., wdj)
and the weight vector πj = (π1j, π2j, ..., πdj), where d = 40. Preparing data for clustering
requires some preprocessing, such as normalization or standardization. In the first
experimentation step, we normalized the initial dataset (Figure 1(a)) to obtain more
homogenous data (Figure 1 (b)). We used variance normalization, representing a
linear transformation that scales the values such that their variance is equal to 1.
We created 3D representations of the referent vector and weight vector provided by
classical SOM and by our methods (G/Obs-SOM). The axes X and Y indicate the
features and the referent indexes, respectively. The amplitude indicates the mean
value of each component. Examination of the two graphs (3(c), 4(b)) shows that the
noise represented by features 19 to 40 is clearly detected with low amplitudes. This
visual analysis of the results clearly shows that the Obs-SOM algorithm provides the
best results: both the graphs of the weights Π and of the prototypes W show that the
features associated with noise are irrelevant, with low amplitude. Visual analysis of
the weight vectors (Figure 2(d)) showed that the weight vectors obtained with
Obs-SOM give a more accurate picture. The Obs-SOM algorithm provides good
results because the weight vectors work as a filter for the observations and the
referents are estimated from this filtering. We applied the selection task to all
parameters of the map before and after
our algorithms. This task involves detecting major changes for each input vector
represented as a signal graph. We used hierarchical classification [10] for clustering
the map. After Obs-SOM map clustering, we obtained three clusters with a purity
index equal to 0.7076. This demonstrates that, when no cluster (label) information is
available, feature weighting can be used to find and characterize homogeneous
clusters. The importance of this index is that it gives information about each cluster
visually; the plot on the left part of the figure shows the wrongly labeled observations.
In the case of the global weighting algorithms, some noise features have high values,
and even for Obs-SOM the first features (1-20) do not describe the waves well. This
disadvantage compared with the local weighting approaches arises because the global
weighting technique uses only one vector of weights for all the data, so each sample
vector is weighted with the same vector of weights. After Obs-SOM map clustering
with the referents W, which are already weighted, we obtain three clusters. The
characterization of the clusters with the "Scree Test" algorithm is provided in Table 1;
for each algorithm, we present the features selected for each cluster. Both techniques
(Obs-SOM, GObs-SOM) provided
three clusters characterized by different features. By contrast, segmentation of the
map using classical SOM provided six clusters with a purity index value of 0.662.
Map segmentation was performed using hierarchical clustering with all the features.
The features selected using Obs-SOM for clusters cl1, cl2 and cl3 are given in Table 1.
We found that the Obs-SOM algorithm identified relevant and informative features,
giving more accurate results than classical SOM. The new and classical methods were
also compared after segmentation of the map: we investigated the effect of the selected
features before and after segmentation, or without segmentation, by testing this
selection process in the supervised paradigm and computing the accuracy index for
each method. In the case of the global weighting approach (GObs-SOM) we are not
able to characterize each cluster, because the weight vector is the same for all the
prototypes, but we can detect the relevant features for the whole map (dataset). We can
see that the set of features selected by the global weighting algorithm (Table 1)
represents the union of the relevant features obtained with the local weighting approach
over all the clusters.

Table 1. Comparison of the selected variables using traditional and our approaches (G/Obs-
SOM). [i − j] indicates the set of selected variables.

Database     Real clusters   GObs-SOM   Obs-SOM
Waveform     3               [3-20]     Cluster 1: [3-8; 11-16]
                                        Cluster 2: [8-11; 14-19]
                                        Cluster 3: [3-20]

In order to evaluate the relevance of the selected variables, we computed the purity
score by running 3-fold cross-validation five times. Figure 3 shows the box plots of
the purity scores calculated for each run on the learning dataset, using SOM,
GObs-SOM and Obs-SOM; we also show the result after clustering the corresponding
map using hierarchical clustering. We observe that the Obs-SOM method has a
significantly better score than traditional SOM. The score is degraded after clustering
the map, but Obs-SOM is still significantly better than traditional SOM. We then
evaluated the selected variables by assigning the test part of the dataset during the
cross-validation task. Figure 3 shows the purity scores using traditional SOM and
Obs-SOM; the classification task was tested using all variables, the variables selected
by cell, and the variables selected after clustering the map.

Fig. 1. Waveform dataset: (a) original Waveform data, (b) normalized Waveform data, (c) W provided by SOM



Fig. 2. 3D visualization of the referent vectors and weight vectors of the 26×14 map (364 cells):
(a) W provided by GObs-SOM, (b) W provided by Obs-SOM, (c) Π provided by GObs-SOM,
(d) Π provided by Obs-SOM. The X and Y axes indicate the features and the referent index
values, respectively; the amplitude indicates the mean value of each component.

Fig. 3. Comparison of purity score (classification accuracy with learning dataset) using SOM,
GObs-SOM and Obs-SOM before and after clustering map

5 Conclusion
In this paper, we have described a process for selecting relevant features in unsupervised
learning paradigms using new weighted approaches. These methods are based on the
SOM model and feature weighting. Both learning algorithms, Obs-SOM and GObs-SOM,
provide cluster characterization by determining the feature weights within each cluster.
We described extensive testing using a statistical method for unsupervised feature
selection. Our approaches demonstrated the efficiency and effectiveness of this method
in dealing with high dimensional data for simultaneous clustering and weighting. The
models proposed in this paper were tested on a wide variety of datasets, showing better
performance for the Obs-SOM and GObs-SOM algorithms than for the classical SOM
algorithm. We also showed that, through different means of visualization, the Obs-SOM
and GObs-SOM algorithms provide various pieces of information that can be used in
practical applications. The global weighting approach is used when analyzing the entire
clustering result rather than each cluster separately.

References
[1] Kohonen, T.: Self-organizing Maps. Springer, Berlin (1995)
[2] Vesanto, J., Alhoniemi, E.: Clustering of the self-organizing map. IEEE Transactions on Neural Networks 11(3), 586–600 (2000)
[3] Kohonen, T.: Self-organizing Maps. Springer, Berlin (2001)
[4] Frigui, H., Nasraoui, O.: Unsupervised learning of prototypes and attribute weights.
Pattern Recognition 37(3), 567–581 (2004)
[5] Yacoub, M., Bennani, Y.: Features selection and architecture optimization in
connectionist systems. IJNS 10(5) (2000)
[6] Cattell, R.: The scree test for the number of factors. Multivariate Behavioral Research 1,
245–276 (1966)
[7] Yacoub, M., Bennani, Y.: Features selection and architecture optimization in
connectionist systems. IJNS 10(5) (2000)
[8] Asuncion, A., Newman, D.J.: Uci machine learning repository (2007)
[9] Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Computing
Surveys 31(3), 264–323 (1999)
[10] Vesanto, J., Alhoniemi, E.: Clustering of the self-organizing map. IEEE Transactions on
Neural Networks 11(3), 586–600 (2000)
[11] Yacoub, M., Bennani, Y.: Une mesure de pertinence pour la sélection de variables dans
les perceptrons multicouches. RIA, Apprentissage Connexionniste, pp. 393–410 (2001)
Graph-Based Feature Recognition of Line-Like
Topographic Map Symbols

Rudolf Szendrei, István Elek, and Mátyás Márton

ELTE University, Faculty of Informatics, Budapest


[email protected], {elek,matyi}@map.elte.hu
http://team.elte.hu/

Abstract. Paper-based raster maps are primarily for human consumption.
Today's computer services in geoinformatics usually require vectorized
topographic maps, while the usual method of conversion has been an
error-prone, manual process.
The system under development separates the recognition of point-like,
line-like and surface-like objects, and the most successful approach appears
to be to recognize these objects in the reverse order with respect to their
printing. During the recognition of surfaces, homogeneous and textured
surfaces must be distinguished. The most diverse and complicated group is
constituted by the line-like objects.
In this article, a possible method of conversion is discussed for line-like
topographic map objects. The results described here are partially implemented
in the IRIS project, but further work remains. The approach emphasizes the
tools of digital image processing and a knowledge-based approach.

Keywords: Geoinformatics, topographic maps, raster-vector conversion.

1 Introduction

Paper-based raster maps are primarily appropriate for human usage. They al-
ways require a certain level of intelligent interpretation. In GIS applications
vectorized maps are preferred. Especially, government, local authorities and ser-
vice providers tend to use topographic maps in vectorized form. It is a serious
challenge in every country to vectorize maps that are available in raster format.
This task has been accomplished in most countries — often with the use of un-
comfortable, “manual” tools, taking several years. However, it is worth dealing
with the topic of raster-vector conversion. On one hand, some results of vector-
ization need improvement or modification. On the other hand, new maps are
created that need vectorization.
The theoretical background of an intelligent raster-vector conversion system
has been studied in the IRIS project [2]. Several components of a prototype sys-
tem has been elaborated. It became clear very early that the computer support
of conversion steps can be achieved at quite different levels. For example, a map
symbol can be identified by a human interpreter, but the recognition can be

attempted with software, using the tools of image processing [3]. A computer
system can be fairly valuable and usable even if every important decision of
interpretation is made by the expert user. However, the system designed and
developed by the authors aims to automate the raster-vector conversion of
line-like symbols as much as possible. This aim gives an emphasis to a
knowledge-based approach.
This paper deals with a part of raster-vector conversion applied in cartog-
raphy, with knowledge-based approach [1]. The line-like map symbols used in
topographical maps will be introduced, together with the algorithms used to
recognize them. The organization of expertise into knowledge base will also be
presented.
The following must be considered in connection with good quality and auto-
mated vectorization. Raster maps can be adequately understood only by a human
expert. After the vectorization, the relationships used for interpretation are no
more contained in the vectorized map — it consists only of numerical and de-
scriptive data. Automatic interpretation of image contents requires sophisticated
image processing tools, which are not comparable to human perception in the
majority of cases. Therefore, the level of automatic recognition must also be
appropriately determined.

2 Line-Like Map Symbols

The topic of this article is how to interpret printed variant of line-like map sym-
bols and how to represent them in computer systems. This process is considered
basically as the result of interpretation and processing of map symbols. To ac-
complish this task it is very important to understand maps, and specifically,
map symbols. To gain a comprehensive survey, refer to [5]. Although human
cognition can not be completely understood, it is necessary to know to a cer-
tain extent how the human expert interprets graphical information. Regarding
human perception, primarily points, lines (see Fig. 1) and textured surfaces are
sought and distinguished. It must be realized that human perception may reveal
finer or hidden information, for example how roads crossing at different levels
hide each other. Human mind is also capable of abstraction, for example when it
disregards the actual texture of surface, and investigates only its shape. Human
eye can make some corrections, for example in the determination of shades of
color layers printed over each other.
Map interpretation process and the complexity of knowledge based object
recognition can be visualized via any example of the four different object types
— that is, point, line, surface and inscription.
Line-like elements can cover larger area on map than their real size. For in-
stance in the case of a highway, a zero-width center line can represent the the-
oretical position of the road in the database. Beyond the graphical properties
of lines the database may contain real physical parameters, such as road width,
carrying capacity, coating (concrete, asphalt) etc. Hiding is a very inherent phe-
nomenon in maps when line-like objects, landmarks (typically roads, railways
and wires) located at different elevations intersect. This results in discontinuity


of objects in map visualization. However, in map interpretation continuity must
be assumed.

3 Recognition of Line-Like Symbols


Line-like symbols are usually the trace of a road-like object, or the edge of a polygon
with a given texture/attribute.


Fig. 1. Examples for line-like symbols: a) railway b) railway network at a railway


station, c) highway under construction d) highway with emergency phone. e) road
with a specified width f) bridge above a canal

Recognition of line-like symbols is one of the most difficult tasks of raster-


vector conversion. These symbols are often complex, and it is permitted for two
symbols to differ only in their size, to cross each other or to join to form a
single object (see Fig. 1a, b, respectively). Difficulties are posed by parallel lines
belonging to the same object (see Fig. 1d, e) versus lines running in parallel which
belong to separate objects. Further difficulties are the discontinuous symbols (see
Fig. 1c, e, f).
It is beyond the aim of the current article to solve all the difficulties mentioned
above, so for the purpose of this paper we assume that line-like symbols
1. do not cross each other,
2. do not join to form a single object, and
3. are continuous.
A classic way of line-like symbol vectorization is introduced in [4], where cadas-
tral maps in binary raster image format are vectorized. The additional features
of color topographic maps, like road width, capacity, coating etc. can not be
recognized in the classical way [6]. Each of these features is represented by a
corresponding graphic, color and structure. The following method is able to
recognize the trace of the line-like symbols of a topographic map.
1. Do image segmentation and classify each pixel.
2. Create a binary map, where each black pixel belongs to, e.g., a road.
3. Apply thinning and morphological thinning to the binary map.
4. Vectorize the one-pixel-thin skeletons.
The first step is the segmentation, which works as follows. Define an object color
set O and a surface color set S. The amount of colors in each color set is approx.

Fig. 2. a), b) structuring elements of morphological thinning based skeletonization,


c), d) structuring elements of morphological fork detection on binary images. Values
of the elements can be: 0 - background, 1 - foreground or undefined.

5-7 in the case of topographic maps. We assume that on a printed map the color of
each pixel can be modeled as a linear combination of a surface color and an object
color. In the optimal case this can be written as $c = \alpha\, c_o + (1 - \alpha)\, c_s$, where $c$ is the
value of the current pixel and $c_o$, $c_s$ are the respective object and surface colors, so
the segmentation can be done by solving the minimization task
$\min_{o \in O,\, s \in S} \; | c - (\alpha\, c_o + (1 - \alpha)\, c_s) |$ for each pixel.
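A rough sketch of this per-pixel classification (a minimal illustration; the color sets, the absolute-difference error and the brute-force search over a discretized α are assumptions):

import numpy as np

def classify_pixels(img, object_colors, surface_colors, alphas=np.linspace(0, 1, 21)):
    # For each pixel, find the object color whose linear combination with some
    # surface color (and mixing factor alpha) best explains the pixel value.
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)
    best_err = np.full(len(pixels), np.inf)
    best_obj = np.zeros(len(pixels), dtype=int)
    for o, co in enumerate(object_colors):
        for cs in surface_colors:
            for a in alphas:
                mix = a * np.asarray(co, float) + (1 - a) * np.asarray(cs, float)
                err = np.abs(pixels - mix).sum(axis=1)
                improved = err < best_err
                best_err[improved] = err[improved]
                best_obj[improved] = o
    return best_obj.reshape(h, w)      # index of the best-matching object color per pixel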
As the second step is a simple selection on the segmented pixels, it can be
done easily. The third step consists of two different thinning methods. A general
thinning method is used first to avoid creating unneeded short lines by morpho-
logical thinning. The general thinning iteratively deletes pixels inside the shape to
shrink it without shortening it or breaking it apart.
Because the result of the general thinning algorithm may contain small pixel
groups, a morphological thinning should be performed. This morphological thin-
ning can be done by using the structuring elements shown in Fig. 2. At each
iteration, the image is first thinned by the left hand structuring element (see
Fig. 2 a) and b) ), and then by the right hand one, and then with the remaining
six 90◦ rotations of the two elements. The process is repeated in cyclic fashion
until none of the thinnings produces any further change. As usual, the origin of
the structuring element is at the center.
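The two-stage thinning can be sketched with scikit-image as follows (a minimal illustration; it does not reproduce the exact structuring elements of Fig. 2):

from skimage.morphology import skeletonize, thin

def to_skeleton(binary_map):
    # General thinning first, then iterative morphological thinning,
    # yielding a one-pixel-wide skeleton of the foreground (e.g. road) pixels.
    skeleton = skeletonize(binary_map.astype(bool))
    return thin(skeleton)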
The skeletonized binary image can be vectorized in the following way. Mark all
object pixels black and surface pixels white. Mark those black pixels red where
N(P1) > 2, and then mark the remaining black fork points blue by using structuring
elements c) and d) of Figure 2 in the same way as structuring elements are used in
morphological thinning. The red fork points connect lines, while blue fork points
connect other fork points. Mark each black pixel green if at most one of its neighbours
is black (end point of a line segment). A priority is thus defined over the colors:
white < black < green < red < blue. The following steps vectorize the object pixels:
1. Select a green point, mark white and create a new line segment list, which
contains that point.
2. Select a black neighbour if it exists and if the current point is also black;
otherwise select a higher priority point. Mark the point white and add it to
the end of the list.
3. Go to Step 2, while a corresponding neighbour exists.
4. Go back to the place of the first element of the list and go to Step 2. Be
careful that new points should be added now to the front of the list. (This
step processes points in the opposite direction.)
5. Go to Step 1, while a green point exists.


6. Select a black point, mark white, and create a new line segment list, which
contains that point.
7. Select a black neighbour of the current point, mark white, and put it at the
end of the list.
8. Go to Step 7, while a black neighbour exists.
9. Select a red point p, mark it white and create a new line segment list, which
contains that point. Let NeighbourSelect = RedSelect = Counter = 0,
BlueFirst = false, where = back, q = p.
10. Let PrevPoint = q.
11. If the NeighbourSelect-th neighbour r of q exists, let q = r, let
BlueFirst = (Steps = 0 and where = back), let n = q, and increment
NeighbourSelect by 1. Put q into the list at where and go to Step 13.
12. If the RedSelect-th neighbour r of q exists,
(a) If q and n are neighbours and where = front, then let q = PrevPoint
and increment RedSelect by 1. Go to Step 10.
(b) Put q into the list at where, mark q white, let NeighbourSelect = 0 and
increment Counter by 1. Go to Step 10.
13. If where = back, then let where = front, q = p and go to Step 10.
14. Go to Step 9, while a red point exists.

Although the algorithm above vectorizes all the objects, it merges the different
object types and colors. Hence, pixels of a given object color are copied onto a
separate binary image before they are vectorized.
We introduce an approach, which is able to recognize the features of line-like
objects, so the corresponding attributes can be assigned to them. This assumes
that the path of each object exists in the corresponding vector layer. In order to
recognize a specific feature, its properties should be defined for identification.
Two properties of vectorised symbols are recognized: forks (F ∼ F ork), and
end-points (E ∼ End). Both are well known in fingerprint recognition where
they are called minutiae. In the case of fingerprints, a fork means an end-point
in the complement-pattern, so only one of them is used for identification. In our
case, we can not define a complement-pattern, so both forks and end-points are
used.
Representation of line-like symbols is based on weighted, undirected graphs.
An EF -graph is an undirected graph with the following properties:
– Nodes are either of type E or F . The color of a node is determined by the
corresponding vector layer.
– Two nodes are connected if the line-segment sequence connecting the nodes
in the corresponding vector layer does not contain additional nodes. Edges
running between nodes of different colors can be defined by the user (in case
of multicolor objects). The weight of the edge is equal to the length of the
road connecting the two nodes, and it has the color of the corresponding
symbol part.
– There are special nodes, denoted by an index P , which occur on the trace
of a line object. These will be used to produce the final vector model.
Fig. 3. The EF graph and the elementary EF graph of a double railway line. Distances
are relative values and refer to the scale of the map. The dashed line in the elementary
EF graph represents its cyclical property.

An EF -graph can also be assigned to the vectorised map, not only to the vec-
torised symbols, where line-like symbols are not separated to their kernels. For
recognition we use the smallest units of the symbol, called the kernel. The small-
est unit is defined as the one which can be used to produce the entire symbol by
iteration. In the EF -graph there is usually only two nodes participating in the
iteration; these are type F with only a single edge, so become the entry and exit
points to the graph. In the few cases where the entry and exit points of the
smallest unit cannot be identified, the kernel of the line-like object is the object
itself. A smallest unit cannot be defined for the whole vectorised map.
Figure 3 shows how a symbol is built up from its smallest units by itera-
tion. Weights represent proportions and depend on the scale of the map. Beside
weights, we can assign another attribute to edges, their color. In the figure almost
all edges are coloured black.
The recognition of line-like objects is reduced to an extended subgraph iso-
morphism problem: we try to identify all the occurrences of the EF graph of the
symbol (subgraph) in the EF graph of the entire map. The weights of the EF
graphs are normalized with respect to the scale of the map, and the collection
is sorted in decreasing order of node degrees. Call this collection of sorted EF
graphs S. Since the EF graphs created for maps do not contain the edges connecting
nodes of different colors, this case must be handled separately. In this article, the
potential edges are identified by searching for the corresponding neighbour on its
own layer within a given distance of the node. The validity of a found potential edge
is verified by comparing the color of the edge with the color of the segmented image
pixels lying under the edge.
Subject to the conditions above it is possible to design an algorithm for the
recognition of subgraphs. While processing the map, recognized objects are re-
moved, by recoloring the corresponding subgraph. Two colors, say blue and
red, can be used to keep track of the progress of the algorithm and to ensure
termination.

The following algorithm stops when there are no more red nodes left in the
graph.
1. Choose an arbitrary red node U with the highest degree from the EF graph
of the map.
2. Attempt to match the subgraph at node U against S, that is, the sorted
collection of EF graphs, until the first successful match, in the following
way:
(a) Perform a “parallel” breadth-first search on the EF graph of the map
and the EF graph of the kernel of the current symbol with origin U . This is
called successful if the degrees of all nodes match and the weights are
approximately the same.
(b) In the case of success, all matching nodes become blue, otherwise they
remain red.
Upon successful matching the EF -graph of the symbol is deleted from the EF -
graph of the map. Entry and exit points must not be deleted unless they are
marked as E, and the degree of remaining F nodes must be decreased accord-
ingly. The equality of edge weights can only be approximate, due to the curvature
of symbols. The algorithm above can be implemented so that it keeps track of the given
object by using the line segments as paths in the vector data. The other difficulty
is that edges between differently colored nodes are not directly defined by the vec-
tor layers. In practice, we have created a spatial database containing
the vectorized line segments and their color attribute. The potential edges were
determined by a query looking for the existence of a neighbour with the right color
within a given distance from the corresponding node.
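The matching step in (a) can be illustrated with a short sketch. The data layout below (adjacency dictionaries holding a (weight, color) pair per edge, plus a dictionary of node types) and the weight tolerance are illustrative assumptions, not the IRIS data structures; entry/exit handling and recoloring are omitted.

from collections import deque

# Hedged sketch: graphs as {node: {neighbour: (weight, color)}}, node types as {node: 'E' or 'F'}.
def match_kernel(map_graph, map_types, kernel_graph, kernel_types,
                 start_map, start_kernel, tol=0.15):
    """Parallel breadth-first search; returns a kernel-to-map node mapping or None."""
    queue = deque([(start_map, start_kernel)])
    mapping = {start_kernel: start_map}
    while queue:
        m, k = queue.popleft()
        if map_types[m] != kernel_types[k]:           # node types must agree
            return None
        if len(map_graph[m]) < len(kernel_graph[k]):  # degrees must be matchable
            return None
        for kn, (kw, kc) in kernel_graph[k].items():
            if kn in mapping:
                continue
            # look for an unmatched map neighbour with the same edge color
            # and an approximately equal (normalized) weight
            candidate = None
            for mn, (mw, mc) in map_graph[m].items():
                if mn in mapping.values():
                    continue
                if mc == kc and abs(mw - kw) <= tol * kw:
                    candidate = mn
                    break
            if candidate is None:
                return None
            mapping[kn] = candidate
            queue.append((candidate, kn))
    return mapping

On success the returned mapping lists the map nodes of one kernel occurrence; these are the nodes that would be recolored blue and, apart from the entry and exit points, deleted as described above.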

4 Results
In this article a feature extraction method is introduced for line-like symbol
vectorization in the IRIS project. The project aims to automate and support
the recognition of raster images of topographic maps, with the combination of
digital image processing and a knowledge-based approach.
The interpretation of line-like symbols is the most difficult issue in topographic
map vectorization. An EF graph representation is developed, which is used for
the recognition of curved, line-like objects with regular patterns.

(Legend: fork-free line, recognized pattern, recognized circle, unrecognized pattern)

Fig. 4. Recognition results on the line networks of a Hungarian topographic map at map scale
1:10 000 and a scanning resolution of 300 dpi

The method was tested on a 6 km × 4 km section (6990 × 4680 pixels) of
a large scale 1:10 000 topographic map (see Fig. 4). In our experience some
spatial filters, like Kuwahara and conservative smoothing, improved the quality
of segmentation. During a large number of tests, symbols appearing completely
on the maps were identified at a high rate (> 90%), while symbols partially missing
(e.g. at junctions and discontinuities) remained mostly unidentified. In
the latter case, the level of identification can be enhanced with some heuristics
using the neighbouring segments.

Acknowledgement
The work was supported by the European Union and co-financed by the Eu-
ropean Social Fund (grant agreement no. TAMOP 4.2.1./B-09/1/KMR-2010-
0003).

Automatic Recognition of Topographic Map
Symbols Based on Their Textures

Rudolf Szendrei, István Elek, and István Fekete

ELTE University, Faculty of Informatics, Budapest


{swap,elek,fekete}@inf.elte.hu
http://team.elte.hu/

Abstract. The authors’ research goal is to automatize the raster-vector
conversion of topographic maps. To accomplish this goal, a software sys-
tem is currently under development. It separates the recognition of point-
like, line-like and surface-like objects. The first of these three topics is
discussed in detail in this paper. It is assumed that a topographic map
and its vectorized form (possibly in rough form) are given.
In this paper a method is introduced that is able to recognize the point-
like symbols of the map and to assign them as attributes to the correspond-
ing polygons of the vectorized map. This means that point-like symbols should
not appear as polygons in the vectorized data model. Instead, symbols
appear as polygon attributes. The method presented here is also able to
“clean” unnecessary polygons of symbols from the vectorized map.
The method is implemented by optimized pattern matching on the
raster image source of the map, where the symbols are handled as special
textures. The method will be improved by using a raw vector model and
a set of kernel symbols.

Keywords: Geoinformatics, map symbol recognition, image processing,


pattern matching.

1 Introduction
This paper describes a method that recognizes symbols within the raster-vector
conversion of maps [4]. Maps that contain topographic symbols are made from
vector data models, because photos and remote sensing images contain map
symbols only in raster form.
If a map symbol is identified, then two transformation steps can be made
automatically [1, 3]. First, the vectorized polygon of the map symbol will be
removed from the vectorized map if it was mistakenly recognized as a line or
polygon object. Next, the meaning of the removed symbol will be assigned as an
attribute to the polygon of the corresponding real object i.e. surface in the vector
data model. For instance, after removing the symbol “vineyard”, this attribute
will be added to the boundary polygon of the “real” vineyard (see Fig. 1a ). In
practice, the attributes of the polygons are stored in a GIS database.
1 This research was supported by the project TÁMOP-4.2.1/B-09/1/KMR-2010-003 of Eötvös Loránd University.


Fig. 1. a) Recognizing the symbol of vineyard, b) result of Prewitt filter

2 Main Steps of Raster-Vector Conversion


The raster-vector conversion of maps consists of three main steps.
In the first step, the original colors appearing in the image will be transformed
into a palette with reduced number of colors, which belong to the interpretation
categories. This process can be set up as a series of image filters. These filters
reduce the errors of the image, emphasize dominant color values or increase the
distance between color categories. After these filters are applied, the intensity
values of the pixels are classified into color categories by clustering methods. During
these steps the palette is evolved in such a way that the probability of false
pixel classification is minimal.
The second step determines all edge vectors in the map with reduced number
of colors. Edge filters and edge detectors, like Canny, Laplace or Prewitt meth-
ods, are frequently used to solve this problem. Using these filters local edges can
be found and their direction can be determined. If a pixel is not considered as a
local edge, it can be dropped or represented by a null vector.
The last step is the processing of vectors. This means the extraction of all
possible structural information and storing them with the vectorized map in
the database. This step will build polygons or polylines based on the vectors
determined for each pixel in the previous step.
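As an illustration of the edge detection in the second step, the following is a small sketch of a Prewitt gradient filter applied to a 2-D grayscale array; the kernels are the standard Prewitt masks, while the threshold value and the function name are illustrative.

import numpy as np

def prewitt_edges(gray, threshold=50.0):
    """Return the gradient magnitude and a boolean edge mask for a 2-D grayscale image."""
    kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], float)  # horizontal Prewitt mask
    ky = kx.T                                                   # vertical Prewitt mask
    padded = np.pad(gray.astype(float), 1, mode="edge")
    gx = np.zeros(gray.shape, dtype=float)
    gy = np.zeros(gray.shape, dtype=float)
    for i in range(3):                                          # correlate with the 3x3 masks
        for j in range(3):
            window = padded[i:i + gray.shape[0], j:j + gray.shape[1]]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    magnitude = np.hypot(gx, gy)                                # local edge strength
    return magnitude, magnitude > threshold

The local edge direction needed for the vector model can be obtained from the same gradients, e.g. with np.arctan2(gy, gx).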
Experience shows that the most difficult part of raster-vector conversion is the
third step. As an illustration, let us consider the case of roads on a map. The
width of their polylines can be different according to their types. In this case,
most software interpret them as polygons which have edges on both sides of roads
because of their width. The width of a line on the map does not correspond to the
width of the road that is represented by the line. This kind of false classification
is well known, and even recent applications do not yield a complete solution
to this problem.

3 Symbol Recognition
It is important to recognize those objects of the map that represent a symbol
even if they look like lines or polygons. The texture based pattern matching
algorithm developed by the authors will directly recognize these symbols. This
algorithm also determines the positions of symbols on the map. The position
is needed in order to query its corresponding polygon from the vector model.
This polygon will be removed from the vector model and its attribute property
(e.g. “vineyard”) will be assigned to the underlying polygon. A second query is
required to determine the line or polygon that belongs to the symbol [2].

The attribute represented by the symbol has to be assigned to the correspond-


ing object (a polyline or a surface) of the vectorized map. In order to do this,
polylines and polygons should be handled differently. All segments of the polyline
should inherit the attribute of the polyline symbol. The assignment to polygons
is more sophisticated, because both the border and the interior of a polygon
have to receive the attribute. The decision whether the attribute information is
stored implicitly or explicitly is user dependent. The explicit and implicit storage
means the assignment of attribute information only to the polygon or to all the
polyline segments of the polygon border, respectively.
Character recognition is a special case of symbol recognition [5]. It is assumed
that maps have a legend of symbols on the map sheet or the map interpreter
identifies the map symbols (see Fig. 2).
A map can be represented as an m × n matrix, where each pixel is described by
a k-tuple of color components. It is assumed that a part of the map represents
the symbol as a u × v matrix. It is possible that symbols are not rectangular.
This difficulty can be handled by using another u × v matrix that represents a
bitmask. This matrix determines which pixels of the symbol will be used during
pattern matching. The following two sections will show a simple and an improved
pattern matching method.

4 A Simple Pattern Matching Method


The basic method applies brute force pattern matching as it tries to match
the matrix of the symbol to each u × v matrix of the map. This is an inefficient
solution, because it determines for each pixel of the map whether the pixel is a
part of a symbol or not. Each map pixel can be covered by a u × v matrix in
u ∗ v different ways. This leads to u ∗ v pattern matchings, where each
costs u ∗ v pixel comparisons. Thus, the runtime in pixel comparisons will be
Tbf (m, n, u, v, k) = Θ((m ∗ n) ∗ (u ∗ v)² ∗ k).
In addition, this method works only if the symbols on the map have the same
orientation as in the symbol matrix. Unfortunately, polylines mostly have trans-
formed symbols in order to follow the curves of a polyline. Symbols on a map
can be transformed in several ways, which makes the matching more difficult. In
the least difficult case an affine transformation was applied to the symbol, e.g. it was
rotated. However, it can be much more difficult to recognize the non-located
symbols (e.g. railroads which continuously follow the curves of the track). In this
project only the problem of rotated symbols was treated. Without additional
concrete or contextual information the rotated symbols can be identified if the
matching symbol is rotated too. If there is no knowledge of the orientations
of symbols, a number of directions has to be defined as possible positions for
rotated pattern matching. Refining the rotations makes the recognition more
accurate. A correct pattern matching algorithm without any knowledge has to
test at least 20-30 directions. If the symbol is asymmetric, it may be necessary
to do the pattern match with the mirrored symbol too (e.g. country borders).

As the maps are often distorted or defective, statistical methods should be
applied instead of regular pattern matching methods. Several tests are known
for statistical pattern matching depending on the problem class, and they mainly
use the mean and variance of a matrix. This paper uses a simple statistical
comparison called the similarity function. It takes two u × v matrices as parameters
and calculates the variance of their difference matrix. The pattern matching
algorithm uses the variance as a measure of similarity. In practice, the user
defines or the software calculates a threshold value which will be used for pattern
matching decisions. Each map pixel covered by the u × v matrix of the symbol
is part of the symbol when the value of the similarity function is less than the
threshold.
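A small sketch of this variance-based test and of the brute force search is given below, assuming single-channel (grayscale) image and symbol matrices stored as NumPy arrays; the optional bitmask, the threshold handling and the names are illustrative.

import numpy as np

def similarity(region, symbol, mask=None):
    """Variance of the difference of two u x v matrices; lower means more similar."""
    diff = region.astype(float) - symbol.astype(float)
    if mask is not None:                      # optional u x v bitmask selecting symbol pixels
        diff = diff[mask.astype(bool)]
    return diff.var()

def brute_force_match(image, symbol, threshold, mask=None):
    """Slide the symbol over every position of the image and collect matches."""
    m, n = image.shape
    u, v = symbol.shape
    hits = []
    for i in range(m - u + 1):
        for j in range(n - v + 1):
            if similarity(image[i:i + u, j:j + v], symbol, mask) < threshold:
                hits.append((i, j))
    return hits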

5 Efficient Pattern Matching


Some commercial software supports the raster-vector conversion process. The
embedded algorithms are well known, and most of them are filters (e.g. edge
and corner detectors, edge filters).
Despite the large number of filters, the Gauss and Laplace filters are used
most often in digital image processing as edge filters, while the Canny and Prewitt
methods (see Fig. 1b) are used as edge detectors.
Our task is to enhance the efficiency of symbol recognition. As a starting
point, the vector data model is needed in an uninterpreted raw format, which
naturally contains redundant vectors. The goal is to create a model which is
as similar to the raster image as possible. From this model, those data are
required which describe the presence of a vector and the direction of the vector
(when it exists) at a given point. If a vector exists at a pixel of the map, then
the pixel belongs to an edge, which is represented by a vector with direction d.
If a vector does not exist at a point, no pattern matching is required there. In
other words, no symbol is recognized at this point. The pattern matching is much
more efficient if only those map pixels and symbol pixels are matched which
lie on a line. Namely, these points have a vector in the vector data model.
It is assumed that the total length of edges in the map is l ≤ m ∗ n, and the
number of edge pixels in the symbol is ls ≤ u ∗ v. The cost of pattern matching
in a fixed position remains unchanged (u ∗ v pixel comparisons). The estimated
runtime of the improved matching process is then
Teff (m, n, u, v, k) = Θ(l ∗ (u ∗ v) ∗ ls ∗ k).
The total length of the edges may be u ∗ v in the worst case. In this case the runtime
can asymptotically reach the Tbf runtime of the brute force algorithm.
The effective runtime of this algorithm is certainly significantly less because,
in practice, the total length of the symbol edges is a linear function of the diameter
of symbols.
It is proven in [4] that the “speed up factor” of the improved method is
Teff / Tbf = O( (m ∗ n ∗ ls² ∗ k) / (m ∗ n ∗ (u ∗ v)² ∗ k) ) = O( ls² / (u ∗ v)² ).
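The following sketch restricts the matching to edge pixels, as described above. It assumes that a boolean edge mask has been derived from the raw vector model and that a fixed edge pixel of the symbol is chosen as an anchor; the variance test is repeated inline so the example stays self-contained, and all names are illustrative.

import numpy as np

def edge_restricted_match(image, edge_mask, symbol, symbol_anchor, threshold):
    """Evaluate the symbol only at map edge pixels, aligning a fixed symbol edge pixel."""
    u, v = symbol.shape
    ai, aj = symbol_anchor                        # a chosen edge pixel inside the symbol
    sym = symbol.astype(float)
    hits = []
    for i, j in zip(*np.nonzero(edge_mask)):      # iterate over map edge pixels only
        top, left = i - ai, j - aj
        if top < 0 or left < 0 or top + u > image.shape[0] or left + v > image.shape[1]:
            continue
        region = image[top:top + u, left:left + v].astype(float)
        if (region - sym).var() < threshold:      # same variance-based similarity as before
            hits.append((top, left))
    return hits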

6 Finding the Kernel of the Pattern


Certain symbols are used as a tile in maps and this tile is called the kernel. This often
happens when the user selects a part of the map that is larger than the symbol.
This part includes at least one occurrence of the symbol and may also contain the
symbol partially. In this case the pattern matching is less efficient. The optimized
algorithm uses the smallest tile (see Fig. 2). If a kernel K is a uK × vK matrix
and S is a u × v symbol matrix, then
|S(i, j) − K(i mod uK , j mod vK )| < T / (uK ∗ vK ),
where 0 ≤ i < u, 0 ≤ j < v. Threshold T is used by the pattern matching
algorithm applied on the original symbol.
The kernel can be determined, for example, by a brute force algorithm that performs
a self pattern matching with all the submatrices of the symbol matrix. Instead
of using a brute force method of exponential runtime, the algorithm works with
the vector data model of the symbol in the same way as it is used by the pattern
matching algorithm.
Experience shows that the number of edge pixels in the vector data model
is almost negligible in comparison with u ∗ v. It is assumed that all tiles of the
symbol matrix have the same direction in the selected area.
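A sketch of the tiling test implied by the inequality above: every pixel of the symbol matrix S is compared with the periodically repeated candidate kernel K. The function name and array layout are illustrative.

import numpy as np

def is_kernel(symbol, kernel, T):
    """Check whether kernel K tiles symbol S within the bound T / (uK * vK)."""
    u, v = symbol.shape
    uk, vk = kernel.shape
    bound = T / (uk * vk)
    tiled = np.tile(kernel, (u // uk + 1, v // vk + 1))[:u, :v]   # K(i mod uK, j mod vK)
    return bool(np.all(np.abs(symbol.astype(float) - tiled) < bound))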

Fig. 2. Determining the kernel of the sample

Using vector data, the kernel of the sample can be determined by a motion
vector searching algorithm. The details are not discussed here, because this al-
gorithm is well known from image sequence processing, where it is used to increase the compression
ratio. (For example, the MPEG standard and its variants use motion vec-
tor compensation and estimation to remove the redundant image information
between image frames.)

7 Linearizing the Number of Pattern Matching


To apply the method of pattern matching, the previously determined kernel will
be used. Let u denote the horizontal and v the vertical dimension of the kernel. A
useful property of the kernel, which is the smallest symbol, is that it can be used
as a tile to cover the selected symbol. The kernel is never overlapped by itself. At
this stage, the algorithm freely selects an edge pixel of the kernel. It is assumed
that the kernel can be matched in one orientation. The other pixels of the map
region which is covered by the kernel do not need to be evaluated. In the best case,
the u ∗ v pixels of the map have to be used only once, that is, all the pixels of
the map are processed only once. Taking into account the number of rotations of
the symbol, the runtime in the optimal case is

[Workflow diagram with the following stages: raster image, raw vectors, vector model, rectangle of the sample texture, kernel search, pattern matching on raster image, removal of symbol polygons, attribute assignment, filtered vector model, GIS database]
Fig. 3. The complete workflow

Recognition results of symbols on the whole map (one column per symbol type; symbol images omitted):
Correct      5   535   82   6   11
False Pos.   0     5    4   0    0
False Neg.   0    13    4   0    2

Fig. 4. Recognition results of different symbols

Teff (m, n, u, v, k, r) = Θ(l ∗ (u ∗ v) ∗ ls ∗ k ∗ r),


where k is the number of color components and r is the number of tested rota-
tions.
The vector which belongs to a pixel may have two directions. Therefore, in
each selected part r = 2. The runtime that includes the cases of rotated symbols
will be
Teff (m, n, u, v, k) = Θ(l ∗ (u ∗ v) ∗ ls ∗ k ∗ 2) = Θ(l ∗ (u ∗ v) ∗ ls ∗ k).
When a map pixel does not represent the symbol, two cases are possible:
1. the pixel is not a part of an edge, or
2. the pixel is a part of an edge, but it is not identified as a part of the symbol
in the given direction.
In the first case, no further pattern matching is needed. In the second case, an
edge pixel of the symbol will be fixed, which is a part of an edge, and the pattern
matching algorithm will start to work with rotation. The angle of rotation α can
be calculated as
 
α(dm , ds ) = R( (dm − ds ) / |dm − ds | ),
where ds is the vector that belongs to the fixed edge pixel of the symbol, dm
is the current starting map pixel of the pattern matching, and the function R
returns the angle of the given vector with respect to i = (1, 0). The worst case gives
the following runtime:

Teff, worst (m, n, u, v, k) = Θ(l ∗ (u ∗ v) ∗ k).

Using the estimation


l ≈ ls ∗ (m ∗ n) / (u ∗ v),
the runtime is

Teff, worst (m, n, u, v, k) = Θ(m ∗ n ∗ ls ∗ k) = Θ(m ∗ n).

In practice, k is a constant value (e.g. k = 3 for RGB images) and the value ls has
an upper boundary, which is not influenced by the size of the map. Therefore,
the pattern matching algorithm works in linear runtime.

8 Conclusion

A texture based pattern matching algorithm recognizing the symbols of a map


was introduced. The algorithm takes both the raster and the raw vector data
model of the map as input. This method makes it possible to assign the at-
tribute of the symbol to the corresponding vectorized objects. The result is an
interpreted vector data model of the map, where the vector data describing the
symbols themselves are excluded. The process begins on an appropriate part of
the map representing a symbol, selected by the user or the software. After this
step, the algorithm automatically determines all the positions where the sym-
bol appears in the map, using pattern matching. The complete workflow can be
seen in Fig. 3. The quality of the recognition is heavily influenced by the filter
algorithms used before the pattern matching.
The method was applied on a 6km × 4km section (6990 × 4680 pixels) of a
large scale 1:10 000 topographic map. Two pieces of this map and the result of
the recognition of some symbols are shown in Fig. 4.
Our method is compared to a rotation-invariant pattern matching method [6]
based upon color ring-projection. That algorithm gives a high recognition rate
(about 95%) on high resolution images. The authors wrote that “Computation
time is 6 s on a Pentium 300 MHz computer for an arbitrarily rotated image of
size 256 x 256 pixels and a circular window of radius 25 pixels.” Nowadays, the
algorithm may run approx. 35 times faster, but we have maps with a size of 100
megapixels. This yields a runtime of 261 seconds in the case of multi-threaded
implementation. Because topographic symbols are simple graphic elements, run-
ning our parallelized algorithm takes approx. 2-3 sec. on the same machine. In

practice, point-like symbols of large scale topographic maps can be vectorized


manually in approx. 1 hour. It is worthwhile to note that topographic maps often
contain areas that are very similar to point-like symbols. This situation can lead
to some false positive matches. False positives can be easily removed manually
after vectorization, as we did.

References
[1] Ablameyko, S., et al.: Automatic/interactive interpretation of color map images.
Pattern Recognition 3, 69–72 (2002)
[2] Bhattacharjee, S., Monagan, G.: Recognition of cartographic symbols. In: MVA
1994 IAPR Workshop on Machine Vision Applications, Kawasaki (1994)
[3] Levachkine, S., Polchkov, E.: Integrated technique for automated digitization of
raster maps. Revista Digital Universitaria 1(1) (2000),
http://www.revista.unam.mx/vol.1/art4/
[4] Szendrei, R., Elek, I., Márton, M.: A knowledge-based approach to raster-vector
conversion of large scale topographic maps (abstract). CSCS, Szeged, Hungary
(June 2010), full paper accepted by Acta Cybernetica (in Press)
[5] Trier, O.D., et al.: Feature extraction methods for character recognition - a survey.
Pattern Recognition 29(4), 641–662 (1996)
[6] Tsai, D., Tsai, Y.: Rotation-invariant pattern matching with color ring-projection.
Pattern Recognition 35(1), 131–141 (2002)
Using Population Based Algorithms for
Initializing Nonnegative Matrix Factorization

Andreas Janecek and Ying Tan

Key Laboratory of Machine Perception (MOE), Peking University


Department of Machine Intelligence, School of Electronics Engineering and
Computer Science, Peking University, Beijing, 100871, China
[email protected], [email protected]

Abstract. The nonnegative matrix factorization (NMF) is a bound-


constrained low-rank approximation technique for nonnegative multi-
variate data. NMF has been studied extensively over the last years, but
an important aspect which has received only little attention so far is a
proper initialization of the NMF factors in order to achieve a faster error
reduction. Since the NMF objective function is usually non-differentiable,
discontinuous, and may possess many local minima, heuristic search al-
gorithms are a promising choice as initialization enhancers for NMF.
In this paper we investigate the application of five population based
algorithms (genetic algorithms, particle swarm optimization, fish school
search, differential evolution, and fireworks algorithm) as new initializa-
tion variants for NMF. Experimental evaluation shows that some of them
are well suited as initialization enhancers and can reduce the number of
NMF iterations needed to achieve a given accuracy. Moreover, we com-
pare the general applicability of these five optimization algorithms for
continuous optimization problems, such as the NMF objective function.

1 Introduction
The nonnegative matrix factorization (NMF, [1]) leads to a low-rank approxi-
mation which satisfies nonnegativity constraints. Contrary to other low-rank ap-
proximations such as SVD, these constraints may improve the sparseness of the
factors and due to the “additive parts-based” representation also improve inter-
pretability [1, 2]. NMF consists of reduced rank nonnegative factors W ∈ Rm×k
and H ∈ Rk×n with k  min{m, n} that approximate matrix A ∈ Rm×n . NMF
requires that all entries in A, W and H are zero or positive. The nonlinear
optimization problem underlying NMF can generally be stated as
min_{W,H} f (W, H) = min_{W,H} ½ ||A − W H||²F .   (1)

Initialization. Algorithms for computing NMF are iterative and require initial-
ization of the factors W and H. NMF unavoidably converges to local minima,
probably different ones for different initialization (cf. [3]). Hence, random initial-
ization makes the experiments unrepeatable since the solution to Equ. (1) is not


unique in this case. A proper non-random initialization can lead to faster error
reduction and better overall error at convergence. Moreover, it makes the exper-
iments repeatable. Although the benefits of good NMF initialization techniques
are well known in the literature, most studies use random initialization (cf. [3]).
The goal of this paper is to utilize population based algorithms (abbreviated
as “PBAs”) as initialization boosters for NMF. The PBAs are used to initialize
the factors W and H in order to minimize the NMF objective function prior to
the factorization. The goal is to find a solution with smaller overall error at con-
vergence, and/or to speed up convergence of NMF (i.e., smaller approximation
error for a given number of NMF iterations). Instead of initializing the com-
plete factors W and H at once, we sequentially optimize single rows of W and
single columns of H, respectively. This allows for parallel/distributed computa-
tion by splitting up the initialization into several partly independent sub-tasks.
Mathematically, we consider the problem of finding a “good” (ideally the global)
solution of an optimization problem with bound constraints in the form:
min_{x∈Ω} f (x),   (2)

where f : RN → R is a nonlinear function, and Ω is the feasible region. In the


context of this paper, f refers to the optimization (i.e., minimization) of the error
of a single row or column, respectively, of the NMF approximation A ≈ W H.
Hence, f is usually not convex and may possess many local minima. Since NMF
allows only positive or zero values the search space Ω is limited to nonnegative
values. In this paper we consider the following optimization algorithms: Genetic
algorithms (GA), particle swarm optimization (PSO), fish school search (FSS),
differential evolution (DE), and the fireworks algorithm (FWA).
Related work. So far, only few algorithms for non random NMF initializa-
tion have been published. [4] used spherical k-means clustering to group col-
umn vectors of A as input for W . A similar technique was used in [5]. Another
clustering-based method of structured initialization designed to find spatially
localized basis images can be found in [6]. [3] used an initialization technique
based on two SVD processes called nonnegative double singular value decomposi-
tion (NNDSVD). Experiments indicate that this method has advantages over the
centroid initialization in [4] in terms of faster convergence. In a recent study [7]
we have successfully applied feature selection methods for initializing the basis
vectors in W . Compared to the methods mentioned before our approach has com-
putational advantages but can only be applied if the class variables of all data
objects are available. Summarizing, so far no generally preferable initialization
method for NMF exists which motivates for more research in this area.
Only two studies can be found that combine NMF and PBAs, both of them are
based on GAs. [8] have investigated the application of GAs on sparse NMF for
microarray analysis, while [9] have applied GAs for boolean matrix factorization,
a variant of NMF for binary data based on Boolean algebra. The results in these
two papers are promising but barely connected to the initialization techniques
introduced in this paper. To the best of our knowledge, there are no studies that
investigate the application of PBAs as initialization enhancers for NMF.

2 Methodology
2.1 The NMF Algorithm
The general structure of NMF algorithms is given in Alg. 1. Usually, W and H are
initialized randomly and the whole algorithm is repeated several times (maxrep-
etition). In each repetition, NMF update steps are processed until a maximum
number of iterations is reached (maxiter ). These update steps are algorithm spe-
cific and differ from one NMF variant to the other. If the approximation error
drops below a pre-defined threshold, or if the shift between two iterations is very
small, the algorithm might stop before all iterations are processed.
Given matrix A ∈ Rm×n and k  min{m, n};
for rep = 1 to maxrepetition do
W = rand(m, k);
(H = rand(k, n));
for i = 1 to maxiter do
perform algorithm specific NMF update steps;
check termination criterion;
end
end
Algorithm 1. General Structure of NMF Algorithms

Algorithmic variants. Several algorithmic variants for computing NMF have


been developed. Early algorithms comprise multiplicative update (MU) and
alternating least squares (ALS) [1], as well as projected gradient (PG) [10].
Over time, other algorithms were derived, such as a combination of ALS and
PG (ALSPGRAD) [10], quasi Newton-type NMF [6], as well as fastNMF and
bayesNMF [11].
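As an illustration of this general structure, the following is a minimal NumPy sketch of one repetition of Alg. 1 with the classic multiplicative update rules of Lee and Seung [1] as the algorithm-specific step; the stopping tolerance, the epsilon guard and the function name are illustrative.

import numpy as np

def nmf_mu(A, k, maxiter=200, tol=1e-4, seed=0):
    """Multiplicative-update NMF: A (m x n, nonnegative) is approximated by W (m x k) H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))                       # random initialization, as in Alg. 1
    H = rng.random((k, n))
    eps = 1e-9                                   # guard against division by zero
    prev = np.linalg.norm(A - W @ H, "fro")
    for _ in range(maxiter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)     # algorithm specific NMF update steps
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        err = np.linalg.norm(A - W @ H, "fro")
        if abs(prev - err) < tol:                # termination criterion (small shift)
            break
        prev = err
    return W, H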

2.2 Population Based Optimization Algorithms


Genetic Algorithms (GA, [12]) are global search heuristics that operate on a
population of solutions using techniques encouraged from evolutionary processes
such as mutation, crossover, and selection.
In Particle Swarm Optimization (PSO, [13]) each particle in the swarm
adjusts its position in the search space based on the best position it has found
so far as well as the position of the known best fit particle of the entire swarm.
In Differential Evolution (DE, [14]) a particle is moved around in the search-
space using a simple mathematical formulation; if the new position is an improve-
ment the particle’s position is updated, otherwise the new position is discarded.
Fish School Search (FSS, [15, 16]) is based on the behavior of fish schools.
The main operators are feeding (fish can gain/lose weight, depending on the
region they swim in) and swimming (mimics the collective movement of all fish).
The Fireworks Algorithm (FWA, [17]) is a recently developed swarm intel-
ligence algorithm that simulates the explosion process of fireworks. Two types
of sparks are generated, based on uniform and Gaussian distribution, respectively.

3 NMF Initialization Using Population Based Algorithms


Before describing new initialization methods using population based algorithms,
we discuss some properties of the Frobenius norm (cf. [18]), which is used as
objective function to measure the quality of the NMF approximation (Equ. (1)).
The Frobenius norm of a matrix D ∈ Rm×n is defined as
||D||F = ( Σ_{i=1}^{min(m,n)} σi² )^{1/2} = ( Σ_{i=1}^{m} Σ_{j=1}^{n} |dij|² )^{1/2} ,   (3)

where σi are the singular values of D, and dij is the element in the ith row and
j th column of D. The Frobenius norm can also be computed row wise or column
wise. The row wise calculation is
||D||F^RW = ( Σ_{i=1}^{m} |dri|² )^{1/2} ,   (4)
where |dri| is the norm¹ of the ith row vector of D, i.e., |dri| = ( Σ_{j=1}^{n} |rji|² )^{1/2} ,
and rji is the jth element in row i. The column wise calculation is
||D||F^CW = ( Σ_{j=1}^{n} |dcj|² )^{1/2} ,   (5)
with |dcj| being the norm of the jth column vector of D, i.e., |dcj| = ( Σ_{i=1}^{m} |cji|² )^{1/2} ,
and cji being the ith element in column j. Obviously, a reduction of the Frobenius
norm of any row or any column of D leads to a reduction of the total Frobenius
norm ||D||F . In the following, D refers to the distance matrix of the original
data and the approximation, D = A − W H.
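These properties are easy to verify numerically; a short NumPy check (the matrix size is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
D = rng.random((6, 4))

full = np.linalg.norm(D, "fro")
row_wise = np.sqrt(np.sum(np.linalg.norm(D, axis=1) ** 2))   # Equ. (4)
col_wise = np.sqrt(np.sum(np.linalg.norm(D, axis=0) ** 2))   # Equ. (5)

assert np.allclose(full, row_wise) and np.allclose(full, col_wise)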

Initialization procedure. We exploit these properties of the Frobenius norm


to initialize the basis vectors in W row wise and the coefficient matrix H column
wise. The goal is to find heuristically optimal starting points for single rows of
W and single columns of H, which can be computed with all PBAs mentioned in
Section 2.2. Alg. 2 shows the pseudo code for the initialization procedure. In the
beginning, H0 needs to be initialized randomly using a non-negative lower bound
for the initialization. In the first loop, W is initialized row wise (cf. Equ. 4), i.e.,
row wri is optimized in order to minimize the Frobenius norm of the ith row dri
of D, which is defined as dri = ari − wri H0. In the second loop, the columns of
H are initialized based on the previously computed rows of W . H is initialized
column wise (cf. Equ. 5), i.e., column hcj is optimized in order to minimize the
Frobenius norm of the j th column dcj of D, which is defined as dcj = acj − W hcj .
1 For vectors, the Frobenius norm is equal to the Euclidean norm.

Given matrix A ∈ Rm×n and k  min{m, n};


H0 = rand(k, n);
for i = 1 to m do
Use PBAs to find wri that minimizes ||ari − wri H0||F , cf. Equ. 4;
W (i, :) = wri ;
end
for j = 1 to n do
Use PBAs to find hcj that minimizes ||acj − W hcj ||F , cf. Equ. 5;
H(:, j) = hcj ;
end
Algorithm 2. Pseudo Code for NMF Initialization using PBAs

In line 4, input parameters for the PBAs are ari (the ith row of A) and H0, the
output is the initialized row vector wri , the ith row of W . In line 8, input parame-
ters are acj (the j th column of A) and the already optimized factor W , the output
is the initialized column vector hcj , the j th column of H. Global parameters used
for all PBAs are upper/lower bound of the search space and the initialization
(the starting values of the PBAs), number of particles (chromosomes, fish, ...),
and maximum number of fitness evaluations. The dimension of the optimization
problem is identical to the rank k of the NMF.

Parallelism. All iterations within the first for -loop and within the second for -
loop in Algorithm 2 are independent from each other, i.e., the initialization
of any row of W does not influence the initialization of any other row of W
(identical for columns of H). This allows for a parallel implementation of the
proposed initialization method. In the first step, all rows of W can be initialized
concurrently. In the second step, the columns of H can be computed in parallel.
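A sequential NumPy sketch of Alg. 2 is given below. The optimizer is a simple bounded random search standing in for any of the PBAs listed in Section 2.2; the bounds, step size, evaluation budget and function names are illustrative assumptions.

import numpy as np

def random_search(fitness, dim, lower, upper, evals=500, rng=None):
    """Minimal bounded random search used as a stand-in for a population based algorithm."""
    rng = rng if rng is not None else np.random.default_rng()
    best_x = rng.uniform(lower, upper, dim)
    best_f = fitness(best_x)
    for _ in range(evals - 1):
        x = np.clip(best_x + rng.normal(0.0, 0.1 * (upper - lower), dim), lower, upper)
        f = fitness(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x

def init_wh(A, k, evals=500, lower=0.0, upper=1.0, seed=0):
    """Row wise initialization of W and column wise initialization of H, as in Alg. 2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    H0 = rng.uniform(lower, upper, (k, n))        # random nonnegative H0
    W = np.zeros((m, k))
    for i in range(m):                            # minimize ||ari - wri H0|| for every row of W
        W[i] = random_search(lambda w: np.linalg.norm(A[i] - w @ H0), k, lower, upper, evals, rng)
    H = np.zeros((k, n))
    for j in range(n):                            # minimize ||acj - W hcj|| for every column of H
        H[:, j] = random_search(lambda h: np.linalg.norm(A[:, j] - W @ h), k, lower, upper, evals, rng)
    return W, H

Because every row of W (and, in the second loop, every column of H) is optimized independently, the two for-loops can be distributed over workers exactly as described above.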

4 Experimental Evaluation

For PSO and DE we used the Matlab implementations from [19] and adapted
them for our needs. For PSO we used the constricted Gbest topology using the
parameters suggested in [20], for DE the crossover probability parameter was set
to 0.5. For GA we adapted the Matlab implementation of the continuous genetic
algorithm available in the appendix of [21] using a mutation rate of 0.2 and a
selection rate of 0.5. For FWA we used the same implementation and parameter
settings as in the introductory paper [17], and FSS was self-implemented following
the pseudo algorithm and the parameter settings provided in [15]. All results are
based on a randomly created, dense 100×100 matrix.

4.1 Initialization Results

At first we evaluate the initial error of the approximation after initializing W


and H (i. e., before running an NMF algorithm). Figures 1 and 2 show the
average approximation error (i. e., fitness of the PBAs) per row and per column,
respectively, for a varying number of fitness function evaluations.

Fig. 1. Left side: average appr. error per row (after initializing rows of W ). Right side:
average appr. error per column (after initializing columns of H) – rank k =5.

Fig. 2. Left side: average appr. error per row (after initializing rows of W ). Right side:
average appr. error per column (after initializing columns of H) – rank k =30.

The figures on the left side show the average (mean) approximation error per
row after initializing the rows of W (first loop in Alg. 2). The figures on the right
side show the average (mean) approximation error per column after initializing
the columns of H (second loop in Alg. 2). The legends are ordered according to
the average approximation error achieved after the maximum number of function
evaluations for each figure (top = worst, bottom = best).
Results for k=5. Fig. 1 shows the results achieved for a small NMF rank k
set to 5 (k is identical to the problem dimension of the PBAs). In Fig. 1 (A),
only 500 evaluations are used to initialize the rows of W based on the randomly
initialized matrix H0 (see Alg. 2). In Fig. 1 (B) the previously initialized rows
of W are used to initialize the columns of H – again using only 500 function
evaluations. As can be seen, (to a small amount) GA, DE and especially FWA
are sensitive to the small rank k and the small number of function evaluations.
PSO and FSS achieve the best approximation results, FSS is the fastest in terms
of accuracy per function evaluations. The lower part (C, D) of Fig. 1 shows the
results when increasing the number of function evaluations for all PBAs from
500 to 2 500. The first 500 evaluations in (C) are identical to (A), but the results
in (D) are different from (B) since they rely on the initialization of the rows of W
(the initialization results after the maximum number of function evaluations in
Fig. 1 (A) and (C) are different). With more function evaluations, all algorithms
except FWA achieve almost identical results.
Results for k=30. With increasing complexity (i. e., increasing rank k) FWA
clearly improves its results, as shown in Fig. 2. Together with PSO, FWA clearly
outperforms the other algorithms when using only 500 function evaluations, see
Fig. 2 (A, B). With increasing number of function evaluations, all PBAs achieve
identical results when initializing the rows of W (see Fig. 2 (C)). Note that
GA needs more than 2 000 evaluations to achieve a low approximation error.
When initializing the columns of H (see Fig. 2 (D)), PSO suffers from its high
approximation error during the first iterations. The reason for this phenomenon
is the relatively sparse factor matrix W computed by PSO. Although PSO is able
to reduce the approximation error significantly during the first 500 iterations, the
other algorithms achieve slightly better results after 2 500 function evaluations.
FSS and GA achieve the best approximation accuracy. The NMF approximation
results in Section 4.2 are based on factor matrices W and H initialized with the
same parameters as Fig. 2 (C, D): k=30, 2 500 function evaluations.

Parallel implementation. We implemented all population based algorithms


in Matlab using Matlab's Parallel Computing Toolbox, which allows running eight
workers (threads) concurrently. Compared to sequential execution we achieved a
speedup of 7.47, which leads to an efficiency of 0.94. Our current implementation
is computationally slightly more demanding than the NNDSVD initialization (cf.
Section 1 and 4.2). However, we are currently working on an implementation that
allows to use up to 32 Matlab workers (using Matlab’s Distributed Computing
Server). Since we expect the efficiency to remain stable with increasing number
of workers, this implementation should be significantly faster than NNDSVD.

Fig. 3. Approximation error achieved by different NMF algorithms using different


initialization variants (k=30, after 2500 fitness evaluations for PBA initialization)

4.2 NMF Approximation Results

In this subsection we report approximation results achieved by NMF using the


factors W and H initialized by the PBAs. We compare our results to random
initialization as well as to NNDSVD (cf. Section 1), which is the best (in terms
of runtime per accuracy) available initialization in the literature for unclassified
data. In order to provide reproducible results we used only publicly available
Matlab implementations of NMF algorithms. We used the following implementa-
tions: Multiplicative Update (MU, implemented in Matlab’s Statistics Toolbox),
ALS using Projected Gradient (ALSPG, [10]), BayesNMF and FastNMF (both
[11]). Matlab code for NNDSVD initialization is also publicly available (cf. [3]).
Fig. 3 shows the approximation error on the y-axis (log scale) after a given
number of NMF iterations for four NMF algorithms using different initializa-
tion methods. The initialization methods in the legend of Fig. 3 are ordered
according to the approximation error achieved after the maximum number of
iterations plotted for each figure (top = worst, bottom = best). The classic MU
algorithm (A) presented in the first NMF publication [1] has low cost per iter-
ation but converges slowly. Hence, for this algorithm the first 100 iterations are
shown. For MU, all initialization variants achieve a smaller approximation error

than random initialization. NNDSVD shows slightly better results than PSO
and FWA, but GA, DE and especially FSS are able to achieve a smaller error
per iteration than NNDSVD. Since algorithms (B) - (D) in Fig. 3 have faster con-
vergence per iteration than MU but also have higher cost per iteration, only
the first 25 iterations are shown. For ALSPG (B), all new initialization variants
based on PBAs are clearly better than random initialization and also achieve a
better approximation error than NNDSVD. The performance of the five PBAs
is very similar for this algorithm. FastNMF (C) and BayesNMF (D) are two
recently developed NMF algorithms which were developed after the NNDSVD
initialization. Surprisingly, when using FastNMF, NNDSVD achieves a lower
approximation accuracy than random initialization, but all initializations based on PBAs
are slightly better than random initialization. The approximation error achieved
with BayesNMF strongly depends on the initialization of W and H (similar to
ALSPG). The PSO initialization shows a slightly higher approximation error
than NNDSVD, but all other PBAs are able to achieve a smaller approximation
error than the state-of-the-art initialization, NNDSVD.

5 Conclusion
In this paper we introduced new initialization variants for nonnegative matrix
factorization (NMF) using five different population based algorithms (PBAs),
particle swarm optimization (PSO), genetic algorithms (GA), fish school search
(FSS), differential evolution (DE), and fireworks algorithm (FWA). These algo-
rithms were used to initialize the rows of the NMF factor W , and the columns
of the other factor H, in order to achieve a smaller approximation error for a
given number of iterations. The proposed method allows for parallel implemen-
tation in order to reduce the computational cost for the initialization. Overall,
the new initialization variants achieve better approximation results than random
initialization and state-of-the-art methods. Especially FSS is able to significantly
reduce the approximation error of NMF (for all NMF algorithms used), but other
heuristics such as DE and GA also achieve very competitive results.
Another contribution of this paper is the comparison of the general applicabil-
ity of population based algorithms for continuous optimization problems, such as
the NMF objective function. Experiments show that all algorithms except PSO
are sensitive to the number of fitness evaluations and/or to the complexity of
the problem (the problem dimension is defined by the rank of NMF). Moreover,
the material provided in Section 4 is the first study that compares the recently
developed PBAs, fireworks algorithm and fish school search. Current work in-
cludes high performance/distributed initialization, and a detailed comparative
study of the proposed methods. A future goal is to improve NMF algorithms by
utilizing heuristic search methods to avoid NMF getting stuck in local minima.

Acknowledgments. This work was supported by National Natural Science


Foundation of China (NSFC), under Grant No. 60875080. Andreas also wants to
thank the Erasmus Mundus External Cooperation Window, Lot 14 (agreement
no. 2009-1650/001-001-ECW).

References
[1] Lee, D.D., Seung, H.S.: Learning parts of objects by non-negative matrix factor-
ization. Nature 401(6755), 788–791 (1999)
[2] Berry, M.W., Browne, M., Langville, A.N., Pauca, P.V., Plemmons, R.J.: Algo-
rithms and applications for approximate nonnegative matrix factorization. Com-
putational Statistics & Data Analysis 52(1), 155–173 (2007)
[3] Boutsidis, C., Gallopoulos, E.: SVD based initialization: A head start for nonneg-
ative matrix factorization. Pattern Recogn. 41(4), 1350–1362 (2008)
[4] Wild, S.M., Curry, J.H., Dougherty, A.: Improving non-negative matrix factoriza-
tions through structured initialization. Patt. Recog. 37(11), 2217–2232 (2004)
[5] Xue, Y., Tong, C.S., Chen, Y., Chen, W.: Clustering-based initialization for non-
negative matrix factorization. Appl. Math. & Comput. 205(2), 525–536 (2008)
[6] Kim, H., Park, H.: Nonnegative matrix factorization based on alternating non-
negativity constrained least squares and active set method. SIAM J. Matrix Anal.
Appl. 30, 713–730 (2008)
[7] Janecek, A.G., Gansterer, W.N.: Utilizing nonnegative matrix factorization for
e-mail classification problems. In: Berry, M.W., Kogan, J. (eds.) Survey of Text
Mining III: Application and Theory. John Wiley & Sons, Inc., Chichester (2010)
[8] Stadlthanner, K., Lutter, D., Theis, F., et al.: Sparse nonnegative matrix factoriza-
tion with genetic algorithms for microarray analysis. In: IJCNN 2007: Proceedings
of the International Joint Conference on Neural Networks, pp. 294–299 (2007)
[9] Snásel, V., Platos, J., Krömer, P.: Developing genetic algorithms for boolean ma-
trix factorization. In: DATESO 2008 (2008)
[10] Lin, C.J.: Projected gradient methods for nonnegative matrix factorization. Neural
Comput. 19(10), 2756–2779 (2007)
[11] Schmidt, M.N., Laurberg, H.: Non-negative matrix factorization with Gaussian
process priors. Comp. Intelligence and Neuroscience (1), 1–10 (2008)
[12] Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learn-
ing, 1st edn. Addison-Wesley Longman, Amsterdam (1989)
[13] Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE
International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995)
[14] Price, K.V., Storn, R.M., Lampinen, J.A.: Differential Evolution A Practical Ap-
proach to Global Optimization. Springer, Heidelberg (2005)
[15] Filho, C.J.A.B., de Lima Neto, F.B., Lins, A.J.C.C., Nascimento, A.I.S., Lima,
M.P.: Fish school search. In: Chiong, R. (ed.) Nature-Inspired Algorithms for
Optimisation. SCI, vol. 193, pp. 261–277. Springer, Heidelberg (2009)
[16] Janecek, A.G., Tan, Y.: Feeding the fish – weight update strategies for the fish
school search algorithm. To appear in Proceedings of ICSI 2011: 2nd International
Conference on Swarm Intelligence (2011)
[17] Tan, Y., Zhu, Y.: Fireworks algorithm for optimization. In: Tan, Y., Shi, Y., Tan,
K.C. (eds.) ICSI 2010. LNCS, vol. 6145, pp. 355–364. Springer, Heidelberg (2010)
[18] Berry, M.W., Drmac, Z., Jessup, E.R.: Matrices, vector spaces, and information
retrieval. SIAM Review 41(2), 335–362 (1999)
[19] Pedersen, M.E.H.: SwarmOps - numerical & heuristic optimization for matlab
(2010), http://www.hvass-labs.org/projects/swarmops/matlab
[20] Bratton, D., Kennedy, J.: Defining a standard for particle swarm optimization.
In: Swarm Intelligence Symposium, SIS 2007, pp. 120–127. IEEE, Los Alamitos
(2007)
[21] Haupt, R.L., Haupt, S.E.: Practical Genetic Algorithms, 2nd edn. John Wiley &
Sons, Inc., Chichester (2005)
A Kind of Object Level Measuring Method Based on
Image Processing*

Xiaoying Wang and Yingge Chen

Department of Computer Science, Changshu Institute of Technology, Changshu, China


{xiaoying_wang,yingge_chen}@cslg.edu.cn

Abstract. In order to accurately measure the level from an object to the image acquisition
device, this paper puts forward a new non-contact level measuring
method based on image processing, together with its prototype equipment. Through a series
of image preprocessing steps applied to the captured image, such as difference, grayscale conversion, binari-
zation and thinning, the original image becomes easier to measure. The re-
lation between image pixel values and the tilt angle is acquired via mathematical
derivation, and the distance formula is obtained through function fitting. A
large amount of data is gathered in the experiment and an error analysis of the
results is also offered, which testifies that the measuring method for object
distance achieves the expected effect.

Keywords: image processing, single camera, angle detection, distance


measurement.

1 Introduction
Traditional level measurements, such as those of molten steel, mainly involve eddy current
probes, float measurement, isotope measurement and radar measurement. Those instru-
ments are easily damaged in harsh environments such as high temperature, dust and
liquid environments. What’s more, they are very expensive. This paper designs a kind
of non-contact level measuring method based on image processing, using only one digital
camera or other visual sensor to capture a single image for measurement, which offers
better maintainability and low cost. This method can avoid the small field of view and
spatial matching problems of three-dimensional approaches thanks to its simple structure and
operation.
Currently available measuring techniques based on image processing include measurement
based on blur extent [1], virtual stereo vision measurement [2], measurement
based on image magnification times [3], and so on. Measurement based on blur extent
applies only to situations where the lens is close to the target and is unsuitable for
long distances. The principle of virtual stereo vision measurement is similar to binocu-
lar measurement in that two sets of mirrors form a single virtual camera,
which requires that the tilt angles of the two mirror sets are symmetrical. In addition, it has

* This work is supported by National Natural Science Foundation of China (No. 60804068),
Natural Science Foundation of Jiangsu Province (No. BK2010261), and Cooperation Innova-
tion of Industry, Education and Academy of Jiangsu Province (No. BY2010126).


the assumption that the axis is perpendicular to the object, which is the same as in the method
based on image magnification times.
The fuzzy method in reference [1] uses the Gaussian formula from optics,
that is, the image blurs when the distance from the lens to the object changes, and the
object distance is calculated from the blurring extent. A wavelet algorithm is used to
detect image edges. Although the author mentioned that selecting an appropriate
threshold can determine the blurring band width, he did not give a detailed
theoretical analysis and large errors exist.
Reference [4] also uses a single light source to project a concentric circle on a screen
and calculates the object distance from the picture. However, the image processing there is
not done meticulously; according to the image given after preprocessing in the literature,
it is hard to measure. Reference [4] also uses some lenses in front of the light source for
color filtering, which not only increases the complexity of the equipment, but also brings
noise or distortion to the collected images.
For the above reasons, this paper proposes an object level measuring system based on
image processing, using only one image acquisition device for image data acquisition and
standard projection equipment as an auxiliary light source, plus the necessary
image pretreatment program. Besides the calculation of the object level, the system can
first judge whether the object’s axis is perpendicular to the image acquisition
equipment and either do fine adjustment automatically or raise an alarm for manual adjustment, so it can
prevent most errors caused by tilt.

2 Implementation

2.1 Equipment

The image collecting device used in this article is shown in Figure 1, where 1 is the
support installed on top of the object, 2 is the image acquisition equipment such as a
camera, 3 is the standard video projection equipment and 4 is the test object. As the standard
video projection equipment, a laser transmitter is used to project parallel light, and the
projected graphics can be concentric circles, equidistant parallel lines or some
equidistant points on a straight line. This paper takes projected concentric circles as an
example.

Fig. 1. Image Collecting Equipment



Fig. 2. Original Image Fig. 3. Image after several processing steps

2.2 Image Preprocess

Preprocessing the original image to get an appropriate picture is very important, as it
determines whether we can measure and calculate smoothly later. After obtaining the
original image in this system, the major processing steps are completed as follows (a sketch is given after the list).
1) Image subtraction processing [5], which can effectively remove the background
image and leave the useful information.
2) Image grayscale processing, which converts the color image to grayscale.
3) Image binarization processing, which calculates the histogram of the grayscale
picture and takes its valley (bottom) point as the threshold.
4) Thinning of the binary image, from which the final skeleton image is
obtained. Fig. 2 and Fig. 3 show images before and after processing, respectively.
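The sketch below strings the four steps together, assuming RGB frames as NumPy arrays and the scikit-image library; Otsu's method is used here as a stand-in for the histogram valley (bottom point) threshold of step 3, and the function name is illustrative.

import numpy as np
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize

def preprocess(frame, background):
    """Difference -> grayscale -> binarization -> thinning, following steps 1)-4) above."""
    diff = np.abs(frame.astype(float) - background.astype(float))   # 1) remove background
    gray = rgb2gray(diff / 255.0)                                   # 2) convert to grayscale
    binary = gray > threshold_otsu(gray)                            # 3) threshold (Otsu stands in
                                                                    #    for the histogram valley)
    return skeletonize(binary)                                      # 4) thinning / skeleton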

2.3 Angle Detection

From Fig. 4, it can be seen that the deformation of the projected image becomes
larger depending on the angle between the test object and the camera surface. We need
to ensure that the test object is perpendicular to the camera surface as much as possible, so
that we can get an accurate measurement.

(a) vertical (b) anticlockwise tilt 20° (c) anticlockwise tilt 35°

Fig. 4. Images with different angle




Fig. 5. Angle Project Relation

This paper takes an anticlockwise tilt in the horizontal direction as an example. In Fig. 5,
B is the focus of the image capture device, f is the focal length, and H is the object
distance (the focal length is usually much smaller than the object distance and can be
ignored; Fig. 5 only indicates the imaging relationship). The image on the image acquisition
device is EF, corresponding to the concentric circles projected on the test object AC'. D' is the center of
the concentric circles, and it is the midpoint of AC'. Because there is a tilt angle θ
between them, the concentric circles are deformed. The corresponding image point of
D' is no longer the midpoint of EF. The values of r1 and r2, which correspond to R1 and R2
in Fig. 4(c), can be calculated by the related program. Because AC // EF, the following can be
obtained from the similarity properties of triangles:
AD / r2 = DC / r1 = H / f   (1)

During the angle detection process, we fix the lens at a certain distance of m meters from the test object,
so H and the angle α are known, and AB, BC can also be calculated from them. We
denote this length by L for convenience.
As shown in Fig. 5, we establish a coordinate system in which A is the origin and the AB
direction is the x axis, and we may deduce the following.
The linear equation of AC' is:
y = tan(α + θ ) ⋅ x (2)
The linear equation of BC' is:
y = tan 2α ⋅ ( x − L ) (3)
The linear equation of AC is:
y = tan α ⋅ x (4)
A Kind of Object Level Measuring Method Based on Image Processing 321

By (2) and (3), the coordinate of C’ can be obtained:


xC' = L ⋅ tan 2α / [tan 2α − tan(α + θ)],  yC' = L ⋅ tan 2α ⋅ tan(α + θ) / [tan 2α − tan(α + θ)]   (5)

D’ is the midpoint of AC’, so the coordinate of D’ is:


xD' = L ⋅ tan 2α / (2[tan 2α − tan(α + θ)]),  yD' = L ⋅ tan 2α ⋅ tan(α + θ) / (2[tan 2α − tan(α + θ)])   (6)

The line of BD’ goes through the point of D’ and B, so its equation can be
obtained:
y = [tan 2α ⋅ tan(α + θ) / (2 tan(α + θ) − tan 2α)] ⋅ (x − L)   (7)

Combining equations (4) and (7), the coordinates of D can be obtained:

$$x_D = \frac{\tan 2\alpha\,\tan(\alpha+\theta)\,L}{\tan 2\alpha\,\tan(\alpha+\theta) - 2\tan\alpha\,\tan(\alpha+\theta) + \tan\alpha\,\tan 2\alpha}, \qquad y_D = \frac{\tan\alpha\,\tan 2\alpha\,\tan(\alpha+\theta)\,L}{\tan 2\alpha\,\tan(\alpha+\theta) - 2\tan\alpha\,\tan(\alpha+\theta) + \tan\alpha\,\tan 2\alpha} \qquad (8)$$

Combining equations (3) and (4), the coordinates of C can be obtained:

$$x_C = \frac{\tan 2\alpha\,L}{\tan 2\alpha - \tan\alpha}, \qquad y_C = \frac{\tan\alpha\,\tan 2\alpha\,L}{\tan 2\alpha - \tan\alpha} \qquad (9)$$

The ratio of AD to DC can then be calculated:

$$\frac{AD}{DC} = \frac{\tan 2\alpha\,\tan(\alpha+\theta) - \tan\alpha\,\tan(\alpha+\theta)}{\tan\alpha\,\tan 2\alpha - \tan\alpha\,\tan(\alpha+\theta)} \qquad (10)$$

Simplifying (10), θ expressed in terms of AD and DC can be obtained:

$$\theta = \arctan\!\left(\frac{AD - DC}{AD + DC}\cdot\tan\alpha\right) \qquad (11)$$

Combining equations (1) and (11), θ can be expressed as:

$$\theta = \arctan\!\left(\frac{r_2 - r_1}{r_2 + r_1}\cdot\tan\alpha\right) \qquad (12)$$

where r1 and r2 can be calculated from the acquired image, and tan α is also determined once the equipment is fixed.
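As an illustration of formula (12), the following Python sketch computes the tilt angle from the two measured radii; the calibration value tan α = 5.7671 and the sample radii are taken from the experiment reported in Section 3.

import math

def tilt_angle_deg(r1, r2, tan_alpha):
    # theta = arctan((r2 - r1)/(r2 + r1) * tan(alpha)), formula (12)
    return math.degrees(math.atan((r2 - r1) / (r2 + r1) * tan_alpha))

# e.g. with tan(alpha) = 5.7671 and r1 = 181.85, r2 = 184.70 (Table 1),
# the result is about 2.56 degrees
print(tilt_angle_deg(181.85, 184.70, 5.7671))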

2.4 Distance Detection

Figure 6 (a), (b), (c) show images acquired at different test distances; the image is clearly larger when the object is closer to the image acquisition equipment.

(a) H=100cm (b) H=85cm (c) H=70cm

Fig. 6. Images with different distance

As shown in Fig. 7, point O is the focus of the image acquisition device, the image of object AB is CD, and the image of object A'B', which has been moved back by some distance, is C'D'. f is the focal length of the image acquisition device, and H and H1 are the distances from the test object to the image acquisition device, which need to be evaluated. CD and C'D' are the diameters of the projected image and can be calculated by the program; AB and A'B' are merely different positions of the same object. According to Fig. 7 and the similarity properties of triangles, we obtain:

$$H = f\times\frac{AB}{CD} \qquad (13)$$

Fig. 7. Object Position Project Relation



3 Accuracy Analysis
In this experiment, the value of tan α is 5.7671. The tilt angle θ of the object can be calculated according to formula (12). The range of θ is confined to 0–40 degrees in the following experiment due to the limitation of the actual equipment.
The experiment and the computational process are as follows:
1) As shown in Fig. 1, project the concentric circles onto the test object using the standard video projection equipment and capture the object image with the image acquisition device, which is placed m meters away from the object. We set m to 1 meter in this experiment for convenience.
2) Rotate the object by a certain angle, 2 degrees per rotation in this experiment.
3) Compute the sizes of r1 and r2 (in pixels) in the acquired image according to our algorithm.
Steps 2) and 3) are repeated to obtain various data sets. To eliminate the influence of the environment and other factors, we collect several groups of data at each angle and take their arithmetic mean; part of the data is shown in Table 1. The computational angle and the errors are then obtained through formula (12). Table 1 and Fig. 8 indicate that the errors mostly fluctuate within ±1 degree.

Table 1. Result and Analysis of Angle Detection

Actual Angle    r1        r2        Computational Angle    Errors
0 180.45 180.75 0.27 0.27
2 181.85 184.70 2.56 0.56
4 183.95 189.62 5.00 1.00
6 183.69 191.21 6.60 0.6
8 185.48 195.48 8.61 0.61
10 186.95 199.67 10.74 0.74
12 190.19 205.24 12.38 0.38
14 192.76 210.48 14.22 0.22
16 195.71 216.86 16.47 0.47
18 198.40 222.20 18.08 0.08
20 201.48 228.38 19.83 -0.17
22 204.90 235.86 22.06 0.06
24 208.86 245.52 24.97 0.97
26 212.62 251.76 25.94 -0.06
28 218.43 263.05 28.14 0.14
30 223.33 272.90 29.96 -0.04
32 228.00 283.48 32.04 0.04
34 234.76 296.48 33.84 -0.16
36 243.24 311.71 35.45 -0.55
38 254.22 329.89 36.76 -1.24
40 263.67 354.67 40.34 0.34

Fig. 8. Angle Errors Distribution

With a process similar to the angle detection, we made several experimental measurements at different object positions and obtained the largest diameters of the various sets of concentric circles, which are partly shown in Table 2. The fitting results obtained in Matlab are shown in Fig. 9. The computational distance values and errors are then obtained through the fitting function and the largest diameter; a short sketch of such a calibration fit is given below. Fig. 10 indicates that most errors are confined within ±0.2 cm.
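Since formula (13) implies that the distance is roughly proportional to the reciprocal of the measured diameter, the fitting of Fig. 9 can be reproduced, for example, by a first-order fit of H against 1/D. The following NumPy sketch (not the authors' Matlab code) uses a few calibration pairs from Table 2.

import numpy as np

D = np.array([580, 519, 454, 402, 338, 288, 244], dtype=float)  # largest diameter (pixels)
H = np.array([50.1, 55.7, 63.9, 72.1, 85.8, 100.9, 119.1])      # actual distance (cm)

a, b = np.polyfit(1.0 / D, H, 1)   # fit H ~ a/D + b, cf. formula (13)
H_est = a / D + b
print(np.abs(H_est - H).max())     # residuals on the calibration points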

Table 2. Result and Analysis of Distance Detection

Largest Diameter D (pixel)    Computational Distance/cm    Actual Distance/cm    Errors/cm
580 49.94 50.1 -0.16
569 50.85 50.9 -0.05
547 52.90 53.1 -0.20
538 53.81 53.7 0.11
519 55.77 55.7 0.07
482 60.11 60.0 0.11
454 63.85 63.9 -0.05
439 65.95 65.9 0.05
402 72.10 72.1 0.00
373 77.64 77.8 -0.16
338 85.79 85.8 -0.01
313 92.72 92.7 0.02
288 100.86 100.9 -0.04
268 108.25 108.1 0.15
257 112.89 112.9 -0.01
244 119.11 119.1 0.01

Fig. 9. Fitting Curve with Distance

Fig. 10. Distance Errors Distribution

There are several sources of error, such as the ordinary low-resolution webcam used in this experiment. Manual focusing also introduces some visual deviation, and moving the object to the experimental angle introduces further errors, since the measurement tools are not absolutely precise.

4 Conclusions
Having discussed the shortcomings of existing research, this paper puts forward a new kind of non-contact level measuring method based on image processing, together with its prototype equipment. Using only one image acquisition device and one video projection device as an auxiliary light source, the system can detect the tilt angle and object distance automatically. The paper also compares the actual and computational distances in detail. The resulting errors are acceptable, so the expected measuring performance can be achieved.

References
1. Faquan, Z., Liping, L., Mande, S., et al.: Measurement Method to Object Distances by Mo-
nocular Vision. Acta Photonica Sinica 38(2), 453–456 (2009)
2. Jigui, Z., Yanjun, L., Shenghua, Y., et al.: Study on Single Camera Simulating Stereo Vi-
sion Measurement Technology. Acta Photonica Sinica 25(7), 943–948 (2005)
3. Chunjin, Z., Shujua, J., Xiaoning, F.: Study on Distance Measurement Based on Monocular
Vision Technique. Journal of Shandong University of Science and Technology 26(4),
65–68 (2007)
4. Hsu, K.-S., Chen, K.-C., Li, T.-H., et al.: Development and Application of the Single-
Camera Vision Measuring System. Journal of Applied Sciences 8(13), 2357–2368 (2008)
5. Shuying, Y.: VC++ Image Processing Program Design (The Second Version). Northern
Jiaotong University Press (2005)
Fast Human Detection Using a Cascade of United Hogs

Wenhui Li, Yifeng Lin, and Bo Fu

College of Computer Science and Technology, Jilin University, 130012 Changchun, China
[email protected], [email protected], [email protected]

Abstract. Accurate and efficient human detection has become an important research area in computer vision. In order to solve problems of past human detection algorithms, such as features with fixed sizes, fixed positions and a fixed number, we propose a human detection algorithm based on united Hogs. Through intersection tests and feature integration, the algorithm can dynamically generate features closer to human body contours. While basically maintaining the detection speed, our algorithm improves the detection accuracy.

Keywords: human detection, hog, adaboost, cascade classifier.

1 Introduction
In recent years, with the development of image recognition, object detection in video sequences and 2D images has achieved a series of successes. For example, in the study of human face detection, Viola and Jones [1] proposed the algorithm of rectangular features with cascade boosting, which made face detection faster and more accurate. After the great success of face detection technology, human detection has become a hot issue in computer vision [2]. Useful information for human detection is obtained mainly from body shapes and body parts. The relevant human detection algorithms are mainly divided into two categories: methods based on various parts of the body and methods based on a single detection window. Literature [3] gives a detailed description of them.
For single detection window methods, Gavrila and Philomin [4] compared the edge image of the target image with those of the images in a sample database using chamfer (distance-transform) matching in 1999. After this, Gavrila [5] organized the edge images of pedestrians in the database into a layered structure of similar types, which speeded up the detection when matching against the database. This method was successfully applied in a real-time human detection system [6]. In 2000, Papageorgiou and Poggio [7] proposed a human detection algorithm based on Haar wavelet features and SVM training. Inspired by the rectangle feature filter, which performs well in human face detection [1], Viola and Jones [8] combined Haar wavelets with the spatial and temporal characteristics of the moving human body in 2003. Dalal and Triggs [9] studied feature types for object recognition in depth, finding that the local appearance of objects is often captured by the distribution of local gradient intensities and local edges. Inspired by this, in 2005 they proposed a human detection algorithm using Histograms of Oriented Gradients (Hog), and demonstrated by experiments that locally normalized Hog was much better than previously existing human detection features. Soon after, Dalal and Triggs [10] improved the


method above with pixel-level optical flow information in 2006, which made human detection more accurate. In the same year, Zhu et al. [2] built a fast and accurate human detection algorithm using a cascade of Hogs.
However, human detection algorithms using Hog, whether proposed by Dalal and Triggs or by Zhu et al., are static in the feature extraction stage, and the number of features obtained is greatly limited by the size of the training samples. This paper improves Hog feature extraction and reduces the number of useless features by dynamically generating new features from useful ones, which reduces the subsequent overhead of training the weak classifiers, strong classifiers and cascade classifier, and also increases the accuracy of human detection. Experiments in the second half of this paper demonstrate these improvements.

2 The Human Detection with United Hogs Framework


Gradient feature extraction is the most important step in Hog-based human detection and directly affects the speed and accuracy of detection. One obvious shortcoming of the Dalal and Triggs [9] algorithm is the fixed block size. The uniformly small size (mainly 16*16 pixels) means that large or global areas are not described. If the block size is fixed at 16*16 pixels, a total of 7*15 blocks are available in a detection window of 64*128 pixels, which obviously cannot describe a human body accurately. Besides, small blocks cannot be used to quickly reject non-person images in the early stages of the cascade classifier. Thus, Zhu et al. [2] extracted features on blocks with different sizes, locations and aspect ratios, which increases the number of features. When the block sizes range from 12*12 pixels to 64*128 pixels and the aspect ratios are 1:1, 1:2 and 2:1, a total of 5031 blocks are available in a detection window of 64*128 pixels. Experiments show that (1) more semantic features of human bodies are described by large-scale blocks, and (2) large-scale features reject the majority of non-person images in the early levels of the cascade classifier, while small-scale features act in the later ones.
However, like other algorithms based on a single detection window, fast human detection using a cascade of Hogs is greatly affected by the size of the training images. When the training image scale is reduced, the number of Hog features decreases sharply. For small-scale images, such as the 16*32-pixel images used in the experiments of this paper, fewer than 1,000 blocks can be obtained even with the variable-size blocks and 2-pixel step used in literature [2] to scan the entire image. The fewer the features, the less accurate human detection will be. This is because the previous human detection algorithms are static in feature extraction: Eigen values are calculated from features with pre-described shapes, sizes and locations, so for images of a fixed size the total set of features is fixed. To solve this problem, we combine a number of good features obtained from an image into new, better features.
In the traditional Hog-based human detection algorithm, the weak classifiers are trained directly after feature extraction, and the sets of features and weak classifiers do not change after the image has been scanned once. We combine every two good features whose regular rectangles intersect into one feature, which is thus based on an irregular rectangle and is then trained into a new weak classifier. Continuing, every two new good weak classifiers undergo a further intersection test and are trained into a new weak classifier in the third layer, and so on. This process is shown in Figure 1. In order to balance training speed and detection accuracy, we only combine features within the same layer; that is, only the new weak classifiers produced in the immediately preceding step undergo the intersection test and training. Note that the new weak classifiers may have the same irregular rectangles as ones generated before, so checks for identical irregular rectangles are needed. The more layers there are, the more useful features and weak classifiers there are, and after adding some or all of them to training, the higher the detection accuracy will be. The number of layers generated by the combination of weak classifiers depends on the size of the initial blocks, the number of weak classifiers combined in each layer and their performance. To keep the features distinctive and the calculations simple, there is no need to combine them all the way to the end; the algorithm can stop after a few layers (usually 5 to 10 layers). A sketch of this layer-wise combination is given below.
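The layer-wise combination can be sketched as follows. The data structures and the two callbacks select_useful and train_weak are hypothetical placeholders (the paper does not specify an implementation); a united feature is represented here simply by the set of initial block rectangles it covers.

from itertools import combinations

def rects_intersect(a, b):
    # axis-aligned rectangles given as (x, y, w, h)
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def regions_intersect(r1, r2):
    return any(rects_intersect(a, b) for a in r1 for b in r2)

def build_united_layers(initial_blocks, select_useful, train_weak, max_layers=10):
    # initial_blocks: list of (x, y, w, h); each united feature is a frozenset of blocks
    layers = [[frozenset([r]) for r in initial_blocks]]
    for _ in range(max_layers - 1):
        useful = select_useful(layers[-1])        # keep high detection rate / low false rate
        new_layer, seen = [], set()
        for f1, f2 in combinations(useful, 2):
            if not regions_intersect(f1, f2):     # intersection test
                continue
            united = f1 | f2                      # irregular united region
            if united in seen:                    # skip duplicate regions
                continue
            seen.add(united)
            train_weak(united)                    # train a new weak classifier on the united Hog
            new_layer.append(united)
        if not new_layer:
            break
        layers.append(new_layer)
    return layers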

Fig. 1. The Generation of United Hogs

The number of weak classifiers increases with more layers. Taking the number of weak classifiers, detection speed and detection precision into account, the weak classifiers that are ready to be combined in each layer are defined as useful weak classifiers, namely those with a higher detection rate and a lower false rate selected from the current layer. Since good weak classifiers are retained in each layer, which ensures that the new weak classifiers generated from them are highly useful, the number of blocks in the first layer can be smaller and the scale and step can vary less. For high-resolution images (64*128 pixels and above), there is no need for hundreds of thousands of initial blocks and weak classifiers, which greatly reduces the number of useless features. For low-resolution images (16*32 pixels and below), the algorithm starts from a few fixed-size blocks in the first layer, and a large number of useful features can be generated by combination, which solves the problem that traditional feature extraction relies heavily on the image scale. Therefore, 1/10 of the smaller of the training image's width and height is used as the initial block side length (with an aspect ratio of 1:1). That is, for 64*128-pixel and 16*32-pixel images the initial block sizes are 6*6 pixels and 2*2 pixels, respectively, and the step is half of the block side length. Dynamic selection among the useful weak classifiers significantly improves the detection rate and speed.
After combination, the detection rate and false rate of the weak classifiers are improved. The human body shape is preferentially captured by these improved weak classifiers, which will therefore be selected in the early stages of the cascade classifier. As long as it satisfies the requirements of a certain stage of the cascade classifier, each of them can act as a strong classifier, which shows that a combined weak classifier is, in a sense, actually a strong classifier. More effective features are obtained by repeatedly combining weak classifiers. The feature integration method in our paper is entirely different from the AdaBoost algorithm and is a more thorough application of machine learning to feature extraction.

3 Experiments and Results


As mentioned earlier, for training images with 64*128 pixels we used 6*6 pixels as the initial block size and a 3-pixel step. There was therefore a total of 20*41 = 820 initial blocks in the first layer, which were then combined to generate new features over 9 layers. The useful weak classifiers were trained into effective strong AdaBoost classifiers, which constituted the cascade classifier with fmax = 0.7 and dmin = 0.998 in each stage. The cascade classifier was trained with the algorithm in literature [2] for about 9 days on the same PC and with the same training set as above.
Figure 2 shows the classification accuracy of blocks in the algorithm of Zhu et al. [2] and in our algorithm at the same false rate. The blocks here are only trained into weak linear SVM classifiers and do not constitute strong classifiers or a cascade classifier. It can be seen that the features from our algorithm contain more information and give higher human detection accuracy.

(a) (b)
Fig. 2. Classification Accuracy of (a)The Fast Human Detection Algorithm by Zhu et al and
(b)The Human Detection with United Hogs Algorithm in our paper

Fig. 3. The Best of Rectangular Filter, Hog with Variable-size Blocks Filter and United Hogs Filter

(a) (b)

(c)

Fig. 4. Stability of (a)The Best Haar, (b)The Best Hog based on Variable-size Blocks and (c)The
Best United Hogs

Figure 3 shows the best Haar feature, the best Hog based on variable-size blocks, and the best united Hogs feature in our paper, where "best" means the weak classifier with the highest average ratio of detection rate to false positive rate. The weak classifier in the third image of Figure 3 was generated in the 10th layer of our algorithm; it consists of 38 initial blocks, which mostly appear at the edges of important human parts such as the arms, legs and head. The differences between the traditional rectangular Hogs and the united Hogs can be seen from the figure: the former selects the entire edge of the human body, including some non-human edges, while the latter accurately selects the edge of the body without any non-human ones, which improves the detection accuracy. Haar does not cover the whole human body and only selects a few representative regions, so its accuracy is the worst.
The differences between these three algorithms can also be seen from a stability comparison. First, the average Eigen values of the three best features are calculated on the test set. Then the correlation values between the Eigen values and their respective means are calculated. The results are shown in Figure 4, where the correlation values are related to the peaks of the curves: the peak and variance of the best Haar are 0.5 and 0.3; the peak and variance of the best Hog based on variable-size blocks are 0.85 and 0.1; and the peak and variance of the best united Hogs are 0.88 and 0.08, showing that the united Hogs feature in our paper is more stable and more suitable for human detection.

4 Summary
We propose a human detection algorithm based on united Hogs. Through intersection tests and feature integration, the algorithm can dynamically generate features closer to the human body contours. While basically maintaining the detection speed, the detection rate of our algorithm is 2.75% and 4.03% higher than that of the fast human detection algorithm based on Hog with variable-size blocks. Since the rectangles generated by the combination are irregular, it is difficult to use integral images to speed up the calculations, so the detection speed is severely affected; this will be the subject of further research in our next paper.

References
1. Viola, P., Jones, M.J.: Robust Real-Time Face Detection. J. International Journal of Com-
puter Vision 52(2), 137–154 (2004)
2. Zhu, Q., Avidan, S., Yeh, M.C., Cheng, K.T.: Fast Human Detection Using a Cascade of
Histograms of Oriented Gradients. In: Proc. IEEE International Conference on Computer
Vision and Pattern Recognition (2006)
3. Gavrila, D.M.: The Visual Analysis of Human Movement: A survey. J. Journal of Computer
Vision and Image Understanding 73(1), 82–98 (1999)
4. Gavrila, D.M., Philomin, V.: Real-time Object Detection for Smart Vehicles. In: Proc. IEEE
International Conference on Computer Vision and Pattern Recognition (1999)
5. Gavrila, D.M.: Pedestrian detection from a moving vehicle. In: Vernon, D. (ed.) ECCV
2000. LNCS, vol. 1843, pp. 37–49. Springer, Heidelberg (2002)
6. Gavrila, D.M., Giebel, J., Munder, S.: Vision-Based Pedestrian Detection: The Projector
System. In: Proc. IEEE Intelligent Vehicles Symposium (2004)
7. Papageorgiou, C., Poggio, T.: A Trainable System for Object Detection. J. International
Journal of Computer Vision 38(1), 15–33 (2000)
8. Viola, P., Jones, M., Snow, D.: Detecting Pedestrians using Patterns of Motion and Ap-
pearance. In: International Conference on Computer Vision (2003)
9. Dalal, N., Triggs, B.: Histograms of Oriented Gradients for Human Detection. In: Confer-
ence on Computer Vision and Pattern Recognition, pp. 886–893 (2005)
10. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and
appearance. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952,
pp. 428–441. Springer, Heidelberg (2006)
The Analysis of Parameters t and k of LPP on
Several Famous Face Databases

Sujing Wang, Na Zhang, Mingfang Sun, and Chunguang Zhou

College of Computer Science and Technology,


Jilin University, Changchun 130012, China
{wangsj08,nazhang08}@mails.jlu.edu.cn, [email protected]

Abstract. The subspace transformation plays an important role in the


face recognition. LPP, which is so-called the Laplacianfaces, is a very
popular manifold subspace transformation for face recognition, and it
aims to preserve the local structure of the samples. Recently, many vari-
ants of LPP are proposed. LPP is a baseline in their experiments. LPP
uses the adjacent graph to preserve the local structure of the samples.
In the original version of LPP, the local structure is determined by the parameters t (the heat kernel) and k (k-nearest neighbors) and directly influences the performance of LPP. To the best of our knowledge,
there is no report on the relation between the performance and these
two parameters. The objective of this paper is to reveal this relation on
several famous face databases, i.e. ORL, Yale and YaleB.

Keywords: locality preserving projection; the adjacent graph; the near-


est neighbors; heat kernel; parameters set.

1 Introduction
As one of the most important biometric techniques, face recognition has gained much attention in the pattern recognition and machine learning areas. The subspace transformation plays an important role in face recognition. Feature
extraction is one of the central issues for face recognition. Subspace transfor-
mation (ST) is often used as a feature extraction method. The idea of ST is
to project the feature from the original high dimensional space to a low dimen-
sional subspace, which is called projective subspace. In the projective subspace,
the transformed feature is easier to be distinguished than the original one.
Principal Component Analysis (PCA)[12] is a widely used subspace transfor-
mation. It attempts to find the projective directions to maximize variance of the
samples. To improve classification performance, LDA[1] encodes discriminant in-
formation by maximizing the ratio between the between-class and within-class
scatters. LDA can be thought of as an extension with discriminant information
of PCA. Both PCA and LDA focus on preserving the global structure of the sam-
ples. However, Seung[10] assumed that the high dimensional visual image infor-
mation in the real world lies on or is close to a smooth low dimensional manifold.

Corresponding author.


Inspired by this idea, multiple manifold dimensionality reduction methods that


preserve the local structure of samples have been proposed, such as ISOMAP[11],
LLE[9], Laplacian Eigenmaps[2] etc. Locality Preserving Projections (LPP)[5] is
a linear Laplacian Eigenmaps. Its performance is better than those of PCA and
LDA for face recognition[6]. Recently, many variants[15][3][13][16][7][14] of LPP
are proposed. LPP is a baseline in their experiments.
However, the performance of LPP depends mainly on its underlying adjacency graph, whose construction suffers from the following points: (1) such an adjacency graph is artificially constructed; (2) it is generally not easy to assign appropriate values to the neighborhood size k and the heat kernel parameter t involved in the graph construction. To the best of our knowledge, there is no report on the relation between the performance and these two parameters k and t. The objective of this paper is to reveal this relation on several famous face databases.

2 Locality Preserving Projections

Given a set of N samples $X = \{x_1, x_2, \ldots, x_N\}$, $x_i \in \mathbb{R}^D$, we attempt to find a transformation matrix W of size $D \times d$ for the mapping $y_i = W^T x_i$, $y_i \in \mathbb{R}^d$, such that the $y_i$ are easier to distinguish in the projective subspace.
Locality Preserving Projections (LPP) [5] attempts to preserve the local structure of the samples in the low-dimensional projected subspace as much as possible. The local structure of the samples is measured by constructing the adjacency graph G. There are two ways to construct G: $\varepsilon$-neighborhoods and k nearest neighbors. The similarity matrix S is defined in one of the following two ways:

1. 0-1 way:
$$S_{ij} = \begin{cases} 1 & \text{nodes } i \text{ and } j \text{ are connected in } G \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

2. Heat kernel:
$$S_{ij} = \begin{cases} \exp\!\left(-\|x_i - x_j\|^2 / 2t^2\right) & \text{nodes } i \text{ and } j \text{ are connected in } G \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where t is a parameter that can be determined empirically. When t is large enough, $\exp(-\|x_i - x_j\|^2 / t) \approx 1$ and the heat kernel reduces to the 0-1 way; obviously, the 0-1 way is a special case of the heat kernel. In order not to introduce any discriminant information, we do not use any label information to construct the similarity matrix S. The criterion function of LPP is as follows:

$$\min_W \sum_{i,j} (y_i - y_j)^2 S_{ij} \qquad (3)$$

The criterion function incurs a heavy penalty if neighboring points $x_i$ and $x_j$ are mapped far apart. Therefore, minimizing it is an attempt to ensure that if $x_i$ and $x_j$ are close, then $y_i$ and $y_j$ are close as well. Finally, the transformation matrix consists of the eigenvectors associated with the smallest eigenvalues of the following generalized eigenvalue problem:
$$XLX^T w = \lambda XDX^T w \qquad (4)$$
where D is a diagonal matrix whose entries $D_{ii} = \sum_j S_{ij}$ measure the local density around $x_i$, and $L = D - S$ is the Laplacian matrix.
We define $S_L = XLX^T$ and $S_D = XDX^T$, and rewrite Eq. (4) as follows:
$$S_L w = \lambda S_D w \qquad (5)$$
Theorem 1. Let D and N be the dimension of the samples and the number of the samples, respectively. If D > N, then the rank of $S_L$ is at most N − 1 and the rank of $S_D$ is at most N.

Proof. By the definition of the Laplacian matrix and the fact that the similarity matrix is symmetric,
$$|L| = \begin{vmatrix} \sum_j S_{1j} - S_{11} & -S_{12} & \cdots & -S_{1N} \\ -S_{12} & \sum_j S_{2j} - S_{22} & \cdots & -S_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ -S_{1N} & -S_{2N} & \cdots & \sum_j S_{Nj} - S_{NN} \end{vmatrix} \qquad (6)$$
Adding the 2nd, 3rd, ..., Nth rows to the 1st row gives $|L| = 0$, so the rank of L is at most N − 1. It is known that the rank of the product of two matrices is at most the smaller of the ranks of the two matrices. Hence $\mathrm{rank}(S_L) = \mathrm{rank}(XLX^T) \le N - 1$. Similarly, we have $\mathrm{rank}(S_D) \le N$.
From Theorem 1, LPP, like LDA, also suffers from the small sample size (SSS) problem. Another problem is how to measure the local structure of the samples. LPP uses the similarity matrix S; if all of its entries are the same, the local structure of the samples is not preserved. Without loss of generality, let each entry of S be $1/N^2$, i.e., $L = \frac{1}{N}I - \frac{1}{N^2}ee^T$, where e is a vector whose entries are all 1. The matrix $S_L$ is then equivalent to the covariance matrix in PCA [6], and in this case LPP degenerates into PCA. Obviously, the performance of LPP depends on how the similarity matrix S is constructed. In the next section, the performance of LPP with respect to the neighborhood size k and the heat kernel parameter t on several famous face databases is reported.
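For concreteness, the following NumPy/SciPy sketch (not the authors' code) builds the k-nearest-neighbor graph with heat-kernel weights and solves the generalized eigenvalue problem (5); a small ridge term is added to S_D for numerical stability.

import numpy as np
from scipy.linalg import eigh

def lpp(X, k=5, t=1.0, d=2):
    # X: D x N data matrix (columns are samples); returns the D x d projection W
    D_dim, N = X.shape
    sq = np.sum(X**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X
    idx = np.argsort(dist2, axis=1)[:, 1:k + 1]          # k nearest neighbors
    S = np.zeros((N, N))
    for i in range(N):
        S[i, idx[i]] = np.exp(-dist2[i, idx[i]] / (2.0 * t**2))  # heat kernel, Eq. (2)
    S = np.maximum(S, S.T)                               # symmetrize
    Dg = np.diag(S.sum(axis=1))                          # degree matrix
    L = Dg - S                                           # graph Laplacian
    SL, SD = X @ L @ X.T, X @ Dg @ X.T
    vals, vecs = eigh(SL, SD + 1e-6 * np.eye(D_dim))     # smallest eigenvalues of Eq. (5)
    return vecs[:, :d]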

3 Experiment
3.1 Database and Experimental Set
Three well-known face databases, ORL¹, Yale² and the Extended Yale Face Database B [4] (denoted by YaleB hereafter), were used in our experiments.
¹ http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
² http://cvc.yale.edu/projects/yalefaces/yalefaces.html

Fig. 1. Sample images of one individual from the ORL database

Fig. 2. Sample images of one individual in the YALE database

The ORL database collects images from 40 individuals, and 10 different images
are captured for each individual. For each individual, the images with different
facial expressions and details are obtained at different times. The face in the
images may be rotated, scaled and tilted to some degree. The sample images
of one individual from the ORL database are shown in Figure 1.
There are a total of 165 gray-scale images of 15 individuals in the Yale face database, where each individual has 11 images. The images show variations in lighting condition and facial expression (normal, happy, sad, sleepy, surprised, and wink). The sample images of one individual from the Yale database are shown in Figure 2.
The YaleB database contains 21888 images of 38 individuals under 9 poses and 64 illumination conditions. A subset containing 2414 frontal-pose images of the 38 individuals under different illuminations is extracted. The sample images of one individual from the YaleB database are shown in Figure 3.

3.2 The Analysis of the Performance with Respect to t and k


In our experiments, the similarity matrix S is governed by two parameters: the neighborhood size k and the heat kernel parameter t. k is searched from {2, 3, . . . , N − 1}. Each image vector is normalized before calculating the similarity matrix S. We randomly split the image samples so that p (for ORL and Yale, p = 2, 3, 4, 5, 6, 7, 8; for YaleB, p = 5, 10, 20, 30, 40, 50) images for each individual are used as the training set and the rest are used as the testing set.
On the Yale database, p is set to 2, i.e., 30 samples in the training set, and t is searched from {1, 1.1, 1.2, . . . , 1.9, 2, 3, . . . , 9, 10, 20, . . . , 90, 100}. This process is repeated 50 times, and three of the results are plotted in Fig. 4. From the figure, the variation of t has little influence on the performance; due to the normalization of the image vectors, $\exp(-\|x_i - x_j\|^2/t)$ approximates 1 when t > 2. From the figure we can also see that, for the same number of training samples, the top performance does not always occur at the same neighborhood size.
The same experiments are conducted on the ORL, Yale and YaleB face databases, with t searched from {1, 1.1, 1.2, . . . , 1.9, 2, 3, 4, 5}. Several of the results are plotted in Fig. 5. From the figure, the top performance occurs when the neighborhood size k is greater than half of the number of samples. We can also see that the performance is sensitive to the parameter k.

Fig. 3. Sample images of one individual from the YaleB database

Fig. 4. The performance of LPP vs. the two parameters k and t on Yale face database

This sensitivity may stem from the essential manifold structure of the samples. An alternative interpretation is that facial images lie on multiple manifolds instead of a single manifold; research efforts on multi-manifold face recognition have recently been reported [8]. In order to verify the observations that the performance is insensitive to the heat kernel parameter t and that the top performance occurs when the neighborhood size k is greater than half of the number of samples, 50 cross-validation runs are performed on the Yale database. The results are illustrated in Fig. 6.
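A sketch of the grid search itself is given below; it reuses the lpp() helper sketched in Section 2 and a simple 1-nearest-neighbor classifier (both are our assumptions, not the authors' code).

import numpy as np

def one_nn_accuracy(W, X_train, y_train, X_test, y_test):
    P_tr, P_te = W.T @ X_train, W.T @ X_test
    d2 = (np.sum(P_te**2, axis=0)[:, None] + np.sum(P_tr**2, axis=0)[None, :]
          - 2.0 * P_te.T @ P_tr)
    pred = np.asarray(y_train)[np.argmin(d2, axis=1)]
    return float(np.mean(pred == np.asarray(y_test)))

def grid_search(X_train, y_train, X_test, y_test, ks, ts, d=30):
    best = (0.0, None, None)
    for k in ks:                       # neighborhood size
        for t in ts:                   # heat kernel parameter
            W = lpp(X_train, k=k, t=t, d=d)
            acc = one_nn_accuracy(W, X_train, y_train, X_test, y_test)
            if acc > best[0]:
                best = (acc, k, t)
    return best                        # (accuracy, k, t)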
(a) 320 samples on ORL    (b) 190 samples on YaleB

Fig. 5. The performance of LPP vs. the two parameters k and t

Fig. 6. The grid-search parameter results on the Yale database: recognition accuracy (%) vs. the number of training samples for BASE, PCA and LPP

4 Conclusion

LPP is a very popular subspace transformation method for face recognition, and many of its variants have been proposed recently. However, their performance depends mainly on how the adjacency graph, which artificially encodes the local structure, is constructed. To the best of our knowledge, there has been no report on the relation between the performance of LPP and the nearest-neighbor size k and heat kernel parameter t. This issue is discussed in this paper. We find that the performance is insensitive to the heat kernel parameter t and that the top performance occurs when the neighborhood size k is greater than half of the number of samples.
Our future research will focus on the performance of the variants of LPP with respect to the two parameters t and k, and on multi-manifold face recognition.

References
1. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. fisherfaces: recog-
nition using class specific linear projection. IEEE Transactions on Pattern Analysis
and Machine Intelligence 19(7), 711–720 (1997)
2. Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding
and clustering. Advances in Neural Information Processing Systems 1, 585–592
(2002)
3. Chen, S.B., Zhao, H.F., Kong, M., Luo, B.: 2D-LPP: a two-dimensional extension
of locality preserving projections. Neurocomputing 70(4-6), 912–921 (2007)
4. Georghiades, A., Belhumeur, P., Kriegman, D.: From few to many: illumination
cone models for face recognition under variable lighting and pose. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence 23(6), 643–660 (2001)
5. He, X.F., Niyogi, P.: Locality preserving projections. In: Advances in Neural In-
formation Processing Systems, vol. 16, pp. 153–160. The MIT Press, Cambridge
(2004)
6. He, X.F., Yan, S.C., Hu, Y.X., Niyogi, P., Zhang, H.J.: Face recognition using lapla-
cianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(3),
328–340 (2005)
7. Liu, Y., Liu, Y., Chan, K.C.C.: Tensor distance based multilinear locality-preserved
maximum information embedding. IEEE Transactions on Neural Networks 21(11),
1848–1854 (2010)
8. Park, S., Savvides, M.: An extension of multifactor analysis for face recognition
based on submanifold learning. In: 2010 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pp. 2645–2652. IEEE, Los Alamitos (2010)
9. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear em-
bedding. Science 290(5500), 2323 (2000)
10. Seung, H.S., Lee, D.D.: The manifold ways of perception. Science 290(5500), 2268–
2269 (2000)
11. Tenenbaum, J.B., Silva, V., Langford, J.C.: A global geometric framework for non-
linear dimensionality reduction. Science 290(5500), 2319 (2000)
12. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuro-
science 3(1), 71–86 (1991)
13. Wan, M.H., Lai, Z.H., Shao, J., Jin, Z.: Two-dimensional local graph embedding
discriminant analysis (2DLGEDA) with its application to face and palm biometrics.
Neurocomputing 73(1-2), 197–203 (2009)
14. Xu, Y., Zhong, A., Yang, J., Zhang, D.: LPP solution schemes for use with face
recognition. Pattern Recognition (2010)
15. Yu, W.W., Teng, X.L., Liu, C.Q.: Face recognition using discriminant locality
preserving projections. Image and Vision Computing 24(3), 239–248 (2006)
16. Yu, W.: Two-dimensional discriminant locality preserving projections for face
recognition. Pattern Recognition Letters 30(15), 1378–1383 (2009)
Local Block Representation for Face Recognition

Liyuan Jia1, Li Huang2, and Lei Li3


1
Department of Computer Science, Hunan City University, Yiyang, China
2
Department of Science and Technology, Hunan City University, Yiyang, China
3
Department of Computer and Information Technology,
Henan Xinyang Normal College, Xinyang, China
[email protected]

Abstract. Facial expression analysis and recognition play an important role in human emotion perception and social interaction and have therefore attracted much attention in recent years. Semi-supervised manifold learning has been successfully applied to facial expression recognition by modeling different expressions as a smooth manifold embedded in a high-dimensional space. However, the best classification accuracy is not necessarily guaranteed, as the underlying manifold assumption is still arguable. In this paper, we study a family of semi-supervised learning algorithms for aligning different data sets that are characterized by the same underlying manifold. A generalized framework for modeling and recognizing facial expressions on multiple manifolds is presented. First, we introduce the assumption of one expression, one manifold for facial expression recognition. Second, we propose a feasible algorithm for multiple-manifold-based facial expression recognition. Extensive experiments show the effectiveness of the proposed approach.

Keywords: face recognition, manifold learning, locality preserving, semi-


supervised learning.

1 Introduction
Learning from high-dimensional data sets is a challenging contemporary problem in the machine learning and pattern recognition fields, and it becomes increasingly important as large, high-dimensional data collections need to be analyzed in different application domains. Suppose that a source R produces high-dimensional data that we wish to analyze: each data point could be a frame of a movie produced by a digital camera, the pixels of a high-resolution image, or the large vector-space representation of a text document, as abound in multimodal data sets. When dealing with this type of high-dimensional data, the high dimensionality is an obstacle to any efficient processing of the data [1]. Indeed, many classical data processing algorithms have a computational complexity that grows exponentially with the dimension. On the other hand, the source R may only enjoy a limited number of degrees of freedom. This means that most of the variables that describe each data point are highly correlated, at least locally, or equivalently, that the data set has a low intrinsic dimensionality. In this case, the high-dimensional representation of the data is an unfortunate (but often unavoidable) artifact of the choice


of sensors or the acquisition device. Therefore, it should be possible to obtain low-


dimensional representations of the samples. Note that since the correlation between
variables might only be local, classical global dimension reduction methods like Prin-
cipal Component Analysis and Multidimensional Scaling do not provide, in general,
an efficient dimension reduction[2].
First introduced in the context of manifold learning, eigenmaps techniques [3] are
becoming increasingly popular as they overcome this problem. Indeed, they allow one
to perform a nonlinear reduction of the dimension by providing a parametrization of
the data set that preserves neighborhoods[4]. However, the new representation that
one obtains is highly sensitive to the way the data points were originally sampled.
One important issue is that of manifold alignment [5]. This question, which arises when one needs to find common structure between two or more data sets resulting from the same fundamental source and to establish a correspondence between them, lies at the heart of analogical reasoning. For instance, consider the problem of matching pixels of a stereo image pair. One can form a graph for each image, where the pixels constitute the nodes and the edges are weighted according to local features in the image [6]. The problem then boils down to matching nodes between two manifolds. Note that this situation is an instance of the multisensor integration problem, in which one needs to find the correspondence between data captured by different sensors. In some applications, such as fraud detection, synchronizing data sets is used for detecting discrepancies rather than similarities between data sets [7].
In this paper, inspired by the idea of LTSA [8], we propose a novel automatic way
to align the hidden manifold, called manifold alignment via local block representa-
tion. It uses the tangent space in the neighborhood of a data point to represent the
local geometry, and then aligns those local tangent spaces in the low-dimensional
space which is linearly mapped from the raw high-dimensional space. The method
can be viewed as a linear approximation of the nonlinear local tangent space align-
ment [9] algorithm and the technique of linearization is similar to the fashion of LPP
[10]. Since images of faces, represented as high dimensional pixel arrays, often be-
long to a manifold of intrinsically low dimension [11], we develop LLTSA algorithm
for effective face manifold learning and recognition. Comprehensive comparisons and
extensive experiments show that LLTSA achieves much higher recognition rates than
a few competing methods.
The rest of the paper is organized as follows. The local block approximation error is analyzed in Section 2. In Section 3, we first introduce local tangent space alignment (Section 3.1) and then explain how to align different manifold data sets by local block approximation (Section 3.2). Section 4 demonstrates the application of our approach to aligning the pose manifolds of images of different objects. Finally, the utility and future directions of the approach are discussed in Section 5.

2 Local Block Error Analysis


Local approximation organizes the low-dimensional data with the geometric knowledge of the high-dimensional data. It can be classified into two types: the approximation of a point and the approximation of a block. Suppose we have a $D \times n$ matrix $X = [x_1, \ldots, x_n]$ consisting of n data vectors of dimensionality D, and we need to transform it into a new $d \times n$ matrix $Y = [y_1, \ldots, y_n]$ (usually $d \ll D$) while preserving some local property of X.
Let $\Gamma_i$ be a vector of indices of the points in the (k−1)-neighborhood of $x_i$, and let $\bar{\Gamma}_i = [\,i;\ \Gamma_i\,]$ be the vector including i and $\Gamma_i$. $S_i$ is a 0-1 selection matrix satisfying $XS_i = X_{\bar{\Gamma}_i}$; similarly, in the low-dimensional space we have $YS_i = Y_{\bar{\Gamma}_i}$. Let e be the vector of all 1's and $I_k$ the identity matrix of rank k. Then $J = I - \frac{1}{k}ee^T$ is the mean-removal operator. We obtain the local coordinates of the high-dimensional data, $\bar{X}_{\Gamma_i} = X_{\bar{\Gamma}_i}J$, and their counterparts in the low-dimensional space, $\bar{Y}_{\Gamma_i} = Y_{\bar{\Gamma}_i}J$. The approximation of a block is defined as $\bar{Y}_{\Gamma_i}W_i \rightarrow \bar{Y}_{\Gamma_i}$, where $W_i$ is the local approximation extracted from the points around $x_i$.
The block approximation error around $y_i$ is defined as
$$err_{b_i} = \|\bar{Y}_{\Gamma_i} - \bar{Y}_{\Gamma_i}W_i\|_F^2 = \|YS_iJ(I_k - W_i)\|_F^2 \qquad (1)$$
The summation of the approximation errors of all local blocks is
$$err_b = \sum_{i=1}^{n}\|YS_iJ(I_k - W_i)\|_F^2 = \|YS_bB_b\|_F^2 \qquad (2)$$
where $S_b = [S_1, \ldots, S_n]$ and $B_b = \mathrm{diag}\{J(I_k - W_1), \ldots, J(I_k - W_n)\}$.

3 Manifold Alignment Based on Local Block Representation


In this section we present our contribution in using the idea of local tangent space alignment for the alignment of two or more manifolds. Many different high-dimensional data sets are characterized by the same underlying modes of variability. When these modes of variability are continuous and few in number, they can be used to parameterize a low-dimensional manifold representation of the data, and this parameterization can be used to map correspondences between examples in the high-dimensional data sets. We show how this low-dimensional representation can be used to generate the appropriate high-dimensional correspondences.
3.1 Local Tangent Space Alignment


We first outline the basic steps of LTSA. The basic idea of LTSA is to construct local linear approximations of the manifold in the form of a collection of overlapping approximate tangent spaces at each sample point, and then to align those tangent spaces to obtain a global parametrization of the manifold. Details and the derivation of the algorithm can be found in [10]. Given a data set $X = [x_1, \ldots, x_N]$ with $x_i \in \mathbb{R}^m$, sampled (possibly with noise) from a d-dimensional manifold ($d \ll m$), $x_i = f(\tau_i) + \varepsilon_i$, where $f: \Omega \subset \mathbb{R}^d \rightarrow \mathbb{R}^m$, $\Omega$ is an open connected subset, and $\varepsilon_i$ represents noise. LTSA assumes that d is known and proceeds in the following steps.

(1) Local neighborhood construction. For each $x_i$, $i = 1, \ldots, N$, determine a set $X_i = [x_{i_1}, \ldots, x_{i_k}]$ of its neighbors (k nearest neighbors, for example).
(2) Local linear fitting. Compute the optimal rank-d approximation to the centered matrix $X_i - \bar{x}_ie^T$, where $\bar{x}_i = \frac{1}{k}\sum_{j=1}^{k}x_{i_j}$ and e is a k-dimensional vector of all 1's. From the SVD of $X_i - \bar{x}_ie^T$ we obtain the orthonormal basis $Q_i$ of the d-dimensional tangent space of the manifold at $x_i$, and the orthogonal projection of each $x_{i_j}$ in the neighborhood onto the computed tangent space, $\theta_j^{(i)} = Q_i^T(x_{i_j} - \bar{x}_i)$.
(3) Local coordinate alignment. Align the N local projections $\Theta_i = [\theta_1^{(i)}, \ldots, \theta_k^{(i)}]$, $i = 1, \ldots, N$, to obtain the global coordinates $\tau_1, \ldots, \tau_N$. Denote $T = [\tau_1, \ldots, \tau_N]$ and $T_i = [\tau_{i_1}, \ldots, \tau_{i_k}]$, which consists of the subset of the columns of T with the index set $\{i_1, \ldots, i_k\}$ determined by the neighbors of each $x_i$. Let $E_i = T_i - c_ie^T - L_i\Theta_i$ be the local reconstruction error matrix, where $c_i = \frac{1}{k}T_ie$ and $L_i = T_i(I - \frac{1}{k}ee^T)\Theta_i^+ = \bar{T}_i\Theta_i^+$, with $\Theta_i^+$ the Moore–Penrose generalized inverse of $\Theta_i$ and e a vector of all ones. The alignment of LTSA is then achieved by minimizing the following global reconstruction error:
$$E(T) = \sum_i\|E_i\|^2 \equiv \sum_i\min_{c_i,L_i}\|T_i - c_ie^T - L_i\Theta_i\|^2 = \|TSW\|^2 \qquad (3)$$
where $S = [S_1, \ldots, S_N]$ and $W = \mathrm{diag}(W_1, \ldots, W_N)$, with
$$W_i = \left(I - \frac{1}{k}ee^T\right)\left(I - \Theta_i^+\Theta_i\right) \qquad (4)$$
To uniquely determine T, we impose the constraint $TT^T = I_d$. It turns out that the vector e of all ones is an eigenvector, corresponding to a zero eigenvalue, of the matrix
$$B = SWW^TS^T \qquad (5)$$
Therefore, the optimal T is given by the d eigenvectors of the matrix B corresponding to the 2nd through (d+1)-st smallest eigenvalues of B.
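The three steps can be condensed into a short NumPy sketch (ours, not the authors' code); it uses the equivalent form $W_i = I - G_iG_i^T$ with $G_i = [e/\sqrt{k},\, V_d]$, where $V_d$ contains the top d right singular vectors of the centered local block.

import numpy as np

def ltsa(X, k=8, d=2):
    # X: m x N data matrix (columns are samples); returns the d x N global coordinates T
    m, N = X.shape
    sq = np.sum(X**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X
    nbrs = np.argsort(dist2, axis=1)[:, :k]        # each point together with its k-1 nearest neighbors
    B = np.zeros((N, N))
    for i in range(N):
        idx = nbrs[i]
        Xc = X[:, idx] - X[:, idx].mean(axis=1, keepdims=True)   # centered local block, step (2)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        G = np.hstack([np.ones((k, 1)) / np.sqrt(k), Vt[:d].T])
        B[np.ix_(idx, idx)] += np.eye(k) - G @ G.T               # accumulate the alignment matrix
    vals, vecs = np.linalg.eigh(B)
    return vecs[:, 1:d + 1].T          # skip the constant eigenvector of the zero eigenvalue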

3.2 Global Manifold Alignment via Local Block Approximation

Manifold alignment [15,16] maps several data sets into a global space via the matching points in each data set, which is essential for data fusion and multi-cue data matching [13]. Suppose there are m data sets $\mathcal{X} = \{X_1, \ldots, X_m\}$, and they need to be transformed into a uniform data set $\tilde{Y}$ with dimensionality $\tilde{d}$. The corresponding matching points in $\mathcal{X}$ should be mapped to the same point in $\tilde{Y}$. Let $\tilde{Y}_j$ be $X_j$'s counterpart in $\tilde{Y}$, and let $\tilde{S}_j$ be a 0-1 selection matrix satisfying
$$\tilde{Y}_j = \tilde{Y}\tilde{S}_j \qquad (6)$$

According to (2) and (6), the summation of the block approximation errors over all data sets is
$$ERR_b = \sum_{j=1}^{m} err_{bj} = \sum_{j=1}^{m}\|\tilde{Y}_jS_{bj}B_{bj}\|_F^2 = \|\tilde{Y}\tilde{S}_b\tilde{B}_b\|_F^2 \qquad (7)$$
Here $\tilde{M}_b = (\tilde{S}_b\tilde{B}_b)(\tilde{S}_b\tilde{B}_b)^T$, $S_{bj}B_{bj}$ is the block approximation matrix for $\tilde{Y}_j$, $\tilde{S}_b = [\tilde{S}_1S_{b1}, \ldots, \tilde{S}_mS_{bm}]$, and $\tilde{B}_b = \mathrm{diag}\{B_{b1}, \ldots, B_{bm}\}$. By imposing the additional constraint $\tilde{Y}\tilde{Y}^T = I_{\tilde{d}}$, the optimal $\tilde{Y}^*$ is given by the $\tilde{d}$ eigenvectors of the matrix $\tilde{M}_b$ corresponding to the 2nd through $(\tilde{d}+1)$-st smallest eigenvalues of $\tilde{M}_b$.
Learning with label information can be regarded as the problem of approximating a multivariate function from labeled data points. The function can be real-valued, as in regression, or binary-valued, as in classification. Learning with label information can also be regarded as a special case of dimension reduction that maps all the data points into the label space. The label error of $y_i$ is defined as $err_{li} = s_i\|y_i - f_i\|^2$, where $F = [f_1, \ldots, f_n]$ contains the label values and $s_i$ is the flag identifying the labeled points, with $s_i = 1$ if $i \in L$ and $s_i = 0$ otherwise, where L is the index set of the labeled points. By a weighted combination of the point approximation error and the label error, we get
$$Err_p = \sum_{i=1}^{n}\left((1-a_i)^2err_{pi} + a_i^2err_{li}\right) = \|YB_p(I_n - A)\|_F^2 + \|(Y-F)A\|_F^2 \qquad (8)$$
whose optimal solution is
$$Y^* = FAA^T\left(M_p + AA^T\right)^{-1} \qquad (9)$$

where $M_p = B_p(I_n - A)(I_n - A)^TB_p^T$, $a_i = \left(\frac{l}{n}(1 - a_0) + a_0\right)s_i$ is the weight coefficient at $y_i$, l is the number of labeled points, $a_0$ is the minimal weight coefficient set by the user, and $A = \mathrm{diag}(a_i)$. The weight coefficients are set on the following intuition: if the proportion of labeled points is very small, we have to reduce our dependence on the knowledge retained only from the labeled points; if all the points are labeled, we can totally discard the geometric information of the point cloud, since in that case the label information is more reliable. As a result, the coefficient $a_i$ has to be adjusted with the proportion of labeled points.
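Given F, A and M_p assembled as above (F the label matrix, A the diagonal weight matrix, M_p the point-approximation matrix; these shapes are our assumptions), the closed-form solution (9) is a single matrix expression, as the following NumPy sketch makes explicit.

import numpy as np

def solve_embedding(F, A, M_p):
    # Y* = F A A^T (M_p + A A^T)^{-1}, Eq. (9)
    AAt = A @ A.T
    return F @ AAt @ np.linalg.inv(M_p + AAt)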
Similarly to the point approximation, the total error defined for the block approximation is
$$\xi = \sum_i\left((1-a_i)^2err_{bi} + a_i^2err_{li}\right) = \|YS_bB_b(I_K - A_b)\|_F^2 + \|(Y-F)A\|_F^2 \qquad (10)$$

whose optimal solution is
$$Y^* = FAA^T\left(M_b + AA^T\right)^{-1} \qquad (11)$$
Here, $M_b = S_bB_b(I_K - A_b)(I_K - A_b)^TB_b^TS_b^T$, $K = k \times n$, and $A_b = \mathrm{diag}\{a_1I_k, \ldots, a_nI_k\}$ is a sparse


1
weight matrix. We take y 0 = ∑ yi as the decision threshold for classification.
n − l i∉L

4 Experiments and Discussion


A face recognition task can be viewed as a multi-class classification problem. First,
we carry out our algorithm on the training face images and learn the transformation
matrix. Second, each test face image is mapped into a low-dimensional subspace via
the transformation matrix. Finally, we classify the test images by the nearest neighbor
classifier. We validate the effectiveness of our algorithms by solving the problem of
image sequence alignment. The matching points for aligning the manifold of different
image sequences are marked as large dots by different colors. We can see that our
algorithms can align the embedded manifolds properly with the knowledge extracted
from the matching points and the relationship among points of each image sequence.

(a)obj1 (b)obj2
Fig. 1. Images of two objects in FacePix

(a)obj1 (b)obj2

Fig. 2. Embedded manifolds before aligned



(a)obj1 (b)obj2

Fig. 3. Embedded manifolds after aligned

We select two image sequences for experiments from the FacePix database[13,14].
The image sequences corresponding to the profile view of the object taken with the
camera placed at 90 degrees from the frontal are shown in Figure 1. The embedded
manifolds before aligned are shown in Figure 2, and the embedded manifolds after
aligned are shown in Figure 3.

5 Conclusion and Future Work


This paper has presented an efficient approach to manifold alignment, which has been successfully applied to facial expression recognition. The approach uses local tangent space alignment to model different expressions as smooth manifolds embedded in a high-dimensional space, and facial expressions are recognized on multiple manifolds. Our experiments have demonstrated the feasibility of the proposed method.

Acknowledgment
This work was supported by the Research Foundation of Science & Technology
office of Hunan Province under Grant (No. 2010FJ4213); Project supported by Scien-
tific Research Fund of Hunan Provincial Education Department under Grant (No.
10C0498).

References
1. Zhou, D., Huang, J., Schoelkopf, B.: Learning with hypergraphs: Clustering, classification,
and embedding. In: Advances in Neural Information Processing Systems (NIPS), vol. 19
(2006)
2. Yan, S.C., Xu, D., Zhang, B., Zhang, H.J.: Graph embedding and extensions: A general
framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence 29(1), 40–51 (2007)

3. He, X., Cai, D., Yan, S., Zhang, H.J.: Neighborhood preserving embedding. In: Tenth
IEEE International Conference on Computer Vision, Beijing, vol. 2, pp. 1208–1213 (2005)
4. Belkin, M., Niyogi, P.: Semi-Supervised Learning on Riemannian Manifolds. Machine
Learning 56, 209–239 (2004)
5. Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: A geometric framework
for learning from examples (Technical Report TR-2004-06). University of Chicago (2004)
6. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding.
Science 290, 2323–2326 (2000)
7. Zhang, Z.Y., Zha, H.Y.: Principal Manifolds and Nonlinear Dimension Reduction via Lo-
cal Tangent Space Alignment. CSE-02-019, Technical Report, CSE, Penn State Univ.
(2002)
8. Poggio, T., Girosi, F.A.: Theory of networks for approximation and learning. Technical
Report A. I. Memo 1140. MIT, Massachusetts (1989)
9. Ham, J., Lee, D., Saul, L.: Semisupervised Alignment of Manifolds. In: Proc. 10th Int’l
Workshop Artificial Intelligence and Statistics, pp. 120–127 (2005)
10. Lafon, S., Keller, Y., Coifman, R.R.: Data Fusion and Multicue Data Matching by Diffu-
sion Maps. IEEE Trans. on PAMI 28(11), 1784–1797 (2006)
11. Yang, G., Xu, X., Yang, G., Zhang, J.: Semi-supervised Classification by Local Coordina-
tion. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds.) ICONIP 2010. LNCS,
vol. 6444, pp. 517–524. Springer, Heidelberg (2010)
12. Yang, G., Xu, X., Yang, G., Zhang, J.: Research of Local Approximation in Semi-
Supervised Manifold Learning. Journal of Information & Computational Science 7(13),
2681–2688 (2010)
13. He, X., Niyogi, P.: Locality preserving projections. In: Advances in Neural Information
Processing Systems, vol. 16, p. 37. The MIT Press, Cambridge (2006)
14. Zhang, T., Yang, J., Zhao, D., Ge, X.: Linear local tangent space alignment and application
to face recognition. Neurocomputing 70, 1533–1547 (2007)
15. He, X.: Incremental semi-supervised subspace learning for image retrieval. In: Proceedings
of the ACM Conference on Multimedia, New York, pp. 10–16 (October 2004)
16. Yang, X., Fu, H., Zha, H., Barlow, J.L.: Semisupervised nonlinear dimensionality reduc-
tion. In: ICML 2006, Pittsburgh, PA, pp. 1065-1072 (2006)
Feature Level Fusion of Fingerprint and
Finger Vein Biometrics

Kunming Lin1, Fengling Han2, Yongming Yang1, and Zulong Zhang1


1
State Key Laboratory of Power Transmission Equipment & System Security and New
Technology, Chongqing University, 400030 Chongqing, China
2
School of Computer Science and IT, RMIT University, 3001 Melbourne, Australia
[email protected]

Abstract. The aim of this paper is to study fusion at the feature extraction level for fingerprint and finger vein biometrics. A novel dynamic-weighting matching algorithm based on quality evaluation of the features of interest is proposed. First, the fingerprint and finger vein images are preprocessed by filtering, enhancement, gray-scale normalization, etc., and effective feature point-sets are extracted from the two modalities. To handle the curse of dimensionality, neighborhood elimination and retention of points belonging to specific regions are performed before and after the feature point-set fusion. Then, according to the results of the feature evaluation, a dynamic weighting strategy is introduced for the fused biometrics. Finally, the fused feature point-sets of the database and query images are matched using point pattern matching and the proposed weighted matching algorithm. Experimental results based on FVC2000 and a self-constructed finger vein database show that our scheme can improve verification performance and security significantly.

Keywords: Fingerprint; Finger vein; Feature level fusion; Dynamic weighting; Multimodal biometrics.

1 Introduction
Whether in passports, credit cards, laptops or mobile phones, automated methods of identifying people through their anatomical features or behavioral traits are an increasingly common feature of modern life [1-3]. Uni-biometric systems have to contend with a variety of
problems such as noisy data, intra-class variations, restricted degrees of freedom, non-
universality, spoof attacks, and unacceptable error rates [2-3]. Some of these limita-
tions can be handled by deploying multi-biometric systems that integrate the evidence
presented by multiple sources of information. Ross and Jain [4] have presented an
overview of multimodal biometrics with various levels of fusion, namely, sensor
level, feature level, matching score level and decision level.
Despite the abundance of research papers related to multimodal biometrics [4-9],
fusion at feature level is a relatively understudied problem. Since the feature set con-
tains much richer information about the source data than the matching score or the output decision of a matcher [4], fusion at the feature level is expected to provide better recognition performance. The existing literature is mostly based on fingerprint, face,


speech and palmprint. Even though fingerprint is the most widely used biometric trait
[2, 9-11] and vein has high security [12], no methods for feature level fusion of the
two modalities have been proposed in the reported literature.
Thus, a novel approach to fuse fingerprint and finger vein biometrics at feature ex-
traction level is proposed in this paper. Despite the abundance of research papers related to image quality evaluation [13-15], it is difficult to give an effective criterion for quality evaluation of specific features. To address low-quality features, an effective quality evaluation is given for fingerprint and finger vein features from the viewpoint of feature point-sets. Five evaluation factors are used to classify the features as excellent or poor quality so as to set the respective fusion weights. On this basis, a
weight matching algorithm based on quality evaluation of interest features is pro-
posed. Experimental results based on fingerprint and finger vein databases show that
the fusion scheme can greatly reduce the uncertainty and improve verification per-
formance and security.

2 Image Preprocessing and Feature Extraction


Since it is difficult to extract feature points from low-quality images, or many false feature points are produced, fingerprint and finger vein images must first be preprocessed by filtering, enhancement, gray-scale normalization, etc. FVC2000 and
self-constructed finger vein databases are used for experiments. Fingerprint and finger
vein images after preprocessing are 128×128 pixels and 80×120 pixels (extension size
is 128×128 pixels) respectively.
The fingerprint recognition module has been developed using a minutiae-based technique [10] in which the fingerprint image is normalized, preprocessed using Gabor filters, binarized and thinned, and then subjected to minutiae extraction. The input to the system is a fingerprint image and the output is the extracted minutiae set $\{M_k = (x_m^k, y_m^k, \theta_m^k) \mid 1 \le k \le K\}$ (Set_fingerprint), where $K$ is the number of minutiae, and $(x_m^k, y_m^k)$ and $\theta_m^k$ represent the spatial location and the local orientation of the $k$-th minutia.

Fig. 1. Extraction of finger vein features

The effectiveness of finger vein feature extraction directly determines the recognition accuracy and stability of the system. There are many image segmentation algorithms, such as the classical thresholding algorithm, the region growing algorithm, the edge detection algorithm, and combinations of various segmentation algorithms. An enhanced method for extracting finger vein features is used in this paper. The basic principle is to extract features by detecting concavity in a gray-scale image, as detailed in [12]. The extracted features are filtered, thinned and deburred, and then subjected to minutiae extraction, as

shown in Fig. 1. The input to this system is a finger vein image and the output is the
extracted minutiae set $\{V_k = (x_v^k, y_v^k, \theta_v^k) \mid 1 \le k \le K\}$ (Set_vein), where $K$ is the number of minutiae, and $(x_v^k, y_v^k)$ and $\theta_v^k$ represent the spatial location and the local orientation.

3 Feature Matching Algorithms


The feature level fusion is realized by simply concatenating the feature points ob-
tained from different sources of information. The concatenated feature point-set has
better discrimination power than the individual feature vectors. The concatenated fea-
ture point-set is expressed as Set_fusion={M1, M2,…Mm,…,V1, V2,…,Vn,…}.

3.1 Feature Reduction Techniques

Neighborhood Elimination. This technique is applied on the feature point-sets of fin-


gerprint and finger vein (Set_fingerprint and Set_vein) individually. That is, for each
point of the fingerprint and finger vein sets, those points that lie within a neighborhood of a certain radius r (10 and 8 pixels for fingerprint and finger vein respectively, chosen experimentally) are removed. For a given feature point $(x_i, y_i)$, those points $(x_j, y_j)$ which
satisfy (1) will be eliminated, and then we can get the reduced fingerprint and finger
vein point-sets. Effects of the neighborhood elimination are shown in Fig. 2.

$$S_d = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \le r \qquad (1)$$
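As a concrete illustration of this reduction step, the following is a minimal Python sketch of neighborhood elimination per Eq. (1); the function name, the greedy keep-first scan order, and the toy minutiae values are illustrative assumptions rather than the authors' implementation.

import math

def neighborhood_elimination(points, radius):
    """Greedily remove points that lie within `radius` of an already kept point.

    `points` is a list of (x, y, theta) minutiae; returns the reduced point-set.
    """
    kept = []
    for (x, y, theta) in points:
        too_close = any(
            math.hypot(x - kx, y - ky) <= radius  # spatial distance S_d of Eq. (1)
            for (kx, ky, _) in kept
        )
        if not too_close:
            kept.append((x, y, theta))
    return kept

# Example: radius 10 for fingerprint minutiae (8 would be used for finger vein)
fingerprint_reduced = neighborhood_elimination(
    [(10, 12, 30), (14, 15, 45), (60, 80, 90)], radius=10)
print(fingerprint_reduced)  # the second point is removed: it lies within 10 px of the first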

Fig. 2. Effects of the neighborhood elimination: (a) fingerprint, (b) finger vein

Fig. 3. Selection of specific regions: (a) fingerprint, (b) finger vein



Points belonging to specific regions. The core point in the fingerprint is located using a reference point location algorithm discussed in [11]. A radius of 40 pixels around the core point was set for fingerprint feature point selection, as shown in Fig. 3. The region around the core point is used because it is the region least affected by skin elasticity and the non-linear distortion caused by varying pressure applied during image acquisition. In order to remove the pseudo minutiae on the edges of finger vein images, only the feature points within a certain region (of width d_vein, set to 70 pixels) are retained as the reduced point-set, as shown in Fig. 3.

3.2 Quality Evaluation of Interest Features

The interest features considered here are specific features used in matching, such as minutiae, texture structure and filter features [11]. The minutiae feature is used in this paper. Fingerprint and finger vein features can each be divided into two categories, excellent and poor, by the proposed quality evaluation of minutiae sets. The method of quality evaluation can be briefly described as follows.

The five Evaluation Factors. (a) The proportion of the effective area (the
corresponding evaluation factor is λ1); (b) Whether the number of minutiae extracted
from an input sample is within a certain range (λ2); (c) The change of minutiae
extracted from an input sample before and after reduction of the features (λ3); (d) The
ratio of the number of the registered template and input sample minutiae (λ4); (e) The
degree of the center deviation (λ5). Each λi (i = 1, 2, …, 5) takes the value 1 or 0 according to the constraints (2)-(6): if λi meets the corresponding constraint, its value is 1; otherwise, it is 0. Five factors λi (i = 1, 2, …, 5) are used for fingerprint and three factors λi (i = 2, 3, 4) are used for finger vein.
$$P_{\min} \le P_{area} = S_D / S \le 1 \qquad (2)$$
$$\mathrm{min} \le K \le \mathrm{max} \qquad (3)$$
$$\Delta_{1\min} \le \Delta_1 = K'_{M/V} / K_{M/V} \le 1 \qquad (4)$$
$$\Delta_{2\min} \le \Delta_2 = K_Z / K_X \le \Delta_{2\max} \qquad (5)$$
$$\sqrt{(x_{core} - x_z)^2 + (y_{core} - y_z)^2} \le Td_{core\_z} \qquad (6)$$

where $P_{area}$ is the proportion of the effective area, $K$ is the number of minutiae, $\Delta_1$ is the ratio of the number of minutiae before and after feature reduction, $\Delta_2$ is the ratio of the number of minutiae of the registered template and the input sample, and $(x_{core}, y_{core})$ and $(x_z, y_z)$ are the center coordinate and the barycentric coordinate, respectively. The parameters of the five constraint conditions are shown in Table 1.

Quality Evaluation. If the evaluation factors of the fingerprint and finger vein features satisfy (7) and (8) respectively, the quality of the features is excellent. Otherwise, the quality is poor.

Table 1. Parameters of five constraint conditions

Evaluation Factor   Threshold Parameter   Fingerprint   Finger vein
λ1                  Pmin                  0.5           —
λ2                  min                   15            7
                    max                   55            28
λ3                  Δ1min                 0.5           0.5
λ4                  Δ2min                 0.5           0.5
                    Δ2max                 2.0           2.0
λ5                  Tdcore_z              32            —

$$\sum_i \lambda_i \ge 3 \quad (i = 1, 2, \ldots, 5) \qquad (7)$$
$$\sum_i \lambda_i \ge 2 \quad (i = 2, 3, 4) \qquad (8)$$
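To illustrate how the factors λ1–λ5, the thresholds of Table 1 and the decision rules (7)–(8) fit together, here is a hedged Python sketch; the function names and the way the summary statistics (effective-area ratio, minutiae counts, center deviation) are passed in are assumptions made for illustration only, not the authors' code.

def fingerprint_quality(p_area, k, k_reduced, k_template, center_dev):
    """Return True if the fingerprint minutiae set is 'excellent' by rule (7)."""
    lam1 = 1 if 0.5 <= p_area <= 1.0 else 0                    # effective-area ratio, Eq. (2)
    lam2 = 1 if 15 <= k <= 55 else 0                           # minutiae count, Eq. (3)
    lam3 = 1 if k > 0 and 0.5 <= k_reduced / k <= 1.0 else 0   # change after reduction, Eq. (4)
    lam4 = 1 if k > 0 and 0.5 <= k_template / k <= 2.0 else 0  # template/sample ratio, Eq. (5)
    lam5 = 1 if center_dev <= 32 else 0                        # center deviation, Eq. (6)
    return (lam1 + lam2 + lam3 + lam4 + lam5) >= 3             # rule (7)

def vein_quality(k, k_reduced, k_template):
    """Return True if the finger vein minutiae set is 'excellent' by rule (8)."""
    lam2 = 1 if 7 <= k <= 28 else 0
    lam3 = 1 if k > 0 and 0.5 <= k_reduced / k <= 1.0 else 0
    lam4 = 1 if k > 0 and 0.5 <= k_template / k <= 2.0 else 0
    return (lam2 + lam3 + lam4) >= 2                           # rule (8)

print(fingerprint_quality(p_area=0.7, k=30, k_reduced=22, k_template=28, center_dev=20))  # True
print(vein_quality(k=5, k_reduced=2, k_template=15))  # False: too few minutiae, poor ratios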

Several typical low-quality images are shown in Fig. 4. (a) Effective minutiae cannot be extracted successfully: λ1+λ2+λ4+λ5=0. (b) There are scars and many false minutiae: λ2+λ3+λ4=0. (c) The center deviation is large and the effective area is small: λ1+λ2+λ4+λ5=0. (d) The center deviation is large: λ3+λ4+λ5=0.

Fig. 4. Low quality images (a)-(d)

3.3 Dynamic Weighting Matching Algorithm

The proposed approach is based on the fusion of the two traits by extracting inde-
pendent feature point-sets from the two modalities, and making the two point-sets
compatible for concatenation. The overall idea of the algorithm is weighted matching based on predictive quality evaluation of interest features.
Step 1: The feature point-sets of fingerprint and finger vein are extracted and expressed as Set_fingerprint and Set_vein, where each feature point consists of the spatial location (x, y) and the local orientation θ.
Step 2: Apply the feature reduction techniques to Set_fingerprint and Set_vein to obtain the reduced feature point-sets Set_fingerprint′ and Set_vein′, recording the numbers of minutiae K and K′ and the center coordinate (x_core, y_core).

Step 3: Evaluate the feature point-sets and obtain the evaluation factors. According to the quality evaluation factors of Set_fingerprint′ and Set_vein′, the features are classified into two groups of excellent and poor quality.
Step 4: Set the fusion weights (λM and λV) of the fingerprint and finger vein features as follows:
$$(\lambda_M, \lambda_V) = \begin{cases} (1, 2) & \text{initial values;} \\ (2, 2) & \text{if } M \text{ is excellent and } V \text{ is poor;} \\ (1, 4) & \text{if } M \text{ is poor and } V \text{ is excellent;} \\ (1, 2) & \text{if } M \text{ and } V \text{ are both excellent or both poor,} \end{cases}$$
where M and V represent the fingerprint and finger vein features respectively. Since the numbers of minutiae of the finger vein are relatively low, the initial values are set to 1 (λM) and 2 (λV).
Step 5: Combine the feature point-sets and perform fusion matching. The fused feature point-sets of the database and the query fingerprint and finger vein images are matched using point pattern matching and the proposed weighted matching algorithm.
a) Check the matched point-pairs. Two points are considered paired only if the spatial distance (9) and the direction distance (10) are both within pre-determined thresholds, set to 4 pixels and 5° for r0 and θ0 on the basis of experiments.

$$sd = \sqrt{(x'_j - x_i)^2 + (y'_j - y_i)^2} \le r_0 \qquad (9)$$
$$dd = \min(|\theta'_j - \theta_i|,\ 360° - |\theta'_j - \theta_i|) \le \theta_0 \qquad (10)$$

where the points $i$ and $j$ are represented by $(x_i, y_i, \theta_i)$ and $(x'_j, y'_j, \theta'_j)$ of the concatenated database and query point-sets Set_fusion and Set_fusion′, $sd$ is the spatial distance and $dd$ is the direction distance.
b) Compute the matching score. The final matching score is calculated using (11) and
(12) based on the number of matched pairs found between the database and query sets.
$$score = c \times \frac{\lambda_M \cdot N_M + \lambda_V \cdot N_V}{\lambda_M \cdot N_{\max\_finger} + \lambda_V \cdot N_{\max\_vein}} \qquad (11)$$
$$N_{\max\_finger} = \max(N_{Set\_finger}, N_{Set\_finger'}), \quad N_{\max\_vein} = \max(N_{Set\_vein}, N_{Set\_vein'}) \qquad (12)$$
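The following Python sketch ties Steps 4 and 5 together: the dynamic weight assignment, a simple pair check using the thresholds of (9)-(10), and the weighted score of (11)-(12). The greedy one-to-one pairing is only an illustrative stand-in for the paper's point pattern matching, and the constant c and all sample values are assumptions.

import math

def fusion_weights(fp_excellent, vein_excellent):
    """Step 4: dynamic fusion weights (lambda_M, lambda_V)."""
    if fp_excellent and not vein_excellent:
        return 2, 2
    if vein_excellent and not fp_excellent:
        return 1, 4
    return 1, 2  # initial values; also used when both are excellent or both are poor

def count_matched_pairs(db_pts, query_pts, r0=4.0, theta0=5.0):
    """Greedy count of point pairs satisfying the spatial (9) and direction (10) tests."""
    used, matched = set(), 0
    for (x, y, t) in db_pts:
        for j, (xq, yq, tq) in enumerate(query_pts):
            if j in used:
                continue
            sd = math.hypot(xq - x, yq - y)
            dd = min(abs(tq - t), 360.0 - abs(tq - t))
            if sd <= r0 and dd <= theta0:
                used.add(j)
                matched += 1
                break
    return matched

def fusion_score(n_m, n_v, lam_m, lam_v, n_max_finger, n_max_vein, c=100.0):
    """Step 5b: weighted matching score of Eqs. (11)-(12)."""
    return c * (lam_m * n_m + lam_v * n_v) / (lam_m * n_max_finger + lam_v * n_max_vein)

print(count_matched_pairs([(10, 12, 30), (40, 40, 90)], [(11, 13, 33), (80, 80, 10)]))  # 1 pair
lam_m, lam_v = fusion_weights(fp_excellent=True, vein_excellent=False)
print(fusion_score(n_m=18, n_v=9, lam_m=lam_m, lam_v=lam_v,
                   n_max_finger=25, n_max_vein=15))  # 67.5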

4 Experimental Results
The proposed approach has been tested on public-domain databases FVC
2000_DB1_B-DB4_B and self-constructed finger vein databases. Fingerprint data-
bases consist of 40 individuals composed of 8 images for each individual. The finger
vein images were acquired using an infrared sensor. Finger vein databases
consist of 40 individuals composed of 8 images under different illumination or time

for each individual. All fingers are oriented vertically upward, and the acquired vein images have 256 gray levels and a size of 135×235 pixels.
The experiments were conducted in several sessions, recording the False Acceptance Rate (FAR), the False Rejection Rate (FRR) and the Accuracy (computed at the threshold where performance is maximal).
The fingerprint and finger vein recognition systems were tested before and after
reduction of the dimension (RD) respectively. The matching score is computed using
point pattern matching independently for fingerprint and finger vein. The individual
system performance was recorded and the results were computed for each modality as
shown in Table 2. From the experimental results, the reduction of the dimension in-
creased the recognition accuracy by 1.48% for fingerprint and 0.36% for finger vein.

Table 2. Point pattern matching performance of uni-biometric

Algorithm   Modality      FRR (%)   FAR (%)   Accuracy (%)
Un-RD       Fingerprint   6.07      9.56      90.83
            Finger vein   5.36      11.33     88.93
RD          Fingerprint   4.29      7.95      92.31
            Finger vein   5.38      9.60      89.29

Table 3. Performance of the multimodal fusion

Algorithm FRR (%) FAR (%) Accuracy (%)


Matching level fusion 6.92 5.33 93.50
Unweighted feature level fusion 6.78 2.52 95.81
Dynamic weighting fusion 1.85 0.97 98.93

In the second session, multimodal fusion was tested on the multimodal databases
acquired by the authors with fingerprint and finger vein. The complex databases con-
sist of 40 pairs of images composed of a fingerprint sample and a finger vein sample
for each pair. For comparison, the results are computed for multimodal fusion at
matching score and feature extraction level as shown in Table 3. The fusion of
weighted average was used at matching score level. The results show that our scheme
of Dynamic weighting fusion achieved 98.9% recognition accuracy, compared with
fingerprint and finger vein modalities increased by 6.6% and 9.6% respectively,
compared with fusion recognition at matching level increased by 5.4%. Moreover,
compared with the Unweighted feature level fusion, the Dynamic weighting fusion
enhanced the recognition accuracy by 3.1%.

5 Summary
A multimodal biometric system based on the integration of fingerprint and finger vein
traits at the feature extraction level was proposed. Multimodal biometric systems have several advantages, including ease of use, robustness to noise, low cost and high security. From the viewpoint of feature point-sets, an effective quality evaluation

is given for fingerprint and finger vein features. On this basis, a dynamic weighting
matching algorithm based on predictive quality evaluation of interest features was
proposed. Experimental results show that the scheme achieved a 98.9% recognition rate, an increase of 6.6% and 9.6% over the fingerprint and finger vein modalities alone, respectively. The scheme can improve verification performance and security significantly. Future work will derive a mathematical model of the optimal weights for fusion at the feature extraction level.

Acknowledgments. We thank the editor and the reviewers for their helpful comments. The
work is financially supported by (a) the Funds for Visiting Scholars of State Key
Laboratory (Project No. 2007DA10512709403) and (b) the Fundamental Research
Funds for the Central Universities (Project No. CDJXS11150014).

References
1. Jain, A.K.: Biometric recognition: Q&A. Nature 449(6), 38–40 (2007)
2. Maltoni, D., Maio, D., Jain, A.K., et al.: Handbook of Fingerprint Recognition, 2nd edn.
Springer, London (2009)
3. Jain, A.K., Ross, A.: Multibiometric systems. Communications of the ACM 47(1), 34–40
(2004)
4. Ross, A., Jain, A.K.: Information fusion in biometrics. Pattern Recognition Letters 24,
2115–2125 (2003)
5. Darwish, A.A., Zaki, W.M., Saad, O.M., et al.: Human authentication using face and fin-
gerprint biometrics. In: 2010 Second International Conference on Computational Intelli-
gence, Communication Systems and Networks, Liverpool, United Kingdom, pp. 274–278
(2010)
6. Tong, Y., Wheeler, F.W., Liu, X.M.: Improving biometric identification through quality-
based face and fingerprint biometric fusion. In: 2010 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, pp. 53–60. CVPRW, San Francisco (2010)
7. Hong, L., Jain, A.K.: Integrating faces and fingerprints for personal identification. IEEE
Transactions on Pattern Analysis and Machine Intelligence 20(12), 1295–1307 (1998)
8. Sun, A.B., Zhang, D.X., Zhang, M.: Multiple features based intelligent biometrics verifica-
tion model. Computer Science 37(2), 221–224 (2010)
9. Rattani, A., Kisku, D.R., Bicego, M., et al.: Feature Level Fusion of Face and Fingerprint
Biometrics. In: First IEEE International Conference on Biometrics: Theory, Applications,
and Systems, pp. 1–6. BTAS, Crystal City (2007)
10. Luo, X.P., Tian, J.: Image Enhancement and Minutia Matching Algorithms in Automated
Fingerprint Identification System. Journal of Software 13(5), 946–956 (2002)
11. Jain, A.K., Prabhakar, S., Hong, L., et al.: Filterbank-Based Fingerprint Matching. IEEE
Transactions On Image Processing 9(5), 846–859 (2000)
12. Yu, C.B., Qin, H.F.: Research on extracting human finger vein pattern characteristics.
Computer Engineering and Applications 44(24), 175–177 (2008)
13. Alonso-Fernandez, F., Fierrez, J., Ortega-Garcia, J., et al.: A comparative study of fingerprint image-
quality estimation methods. IEEE Trans. on Information Forensics and Security 2(4),
734–743 (2007)
14. Liu, L.H., Tan, T.Z.: Research on fingerprint image quality automatic measures. Computer
Engineering and Applications 45(9), 164–167 (2009)
15. Hu, M., Li, D.C.: A Method for Fingerprint Image Quality Estimation. Computer Technol-
ogy And Development 20(2), 125–128 (2010)
A Research of Reduction Algorithm for Support Vector
Machine

Susu Liu and Limin Sun*

School of Computer Science, Yantai University, Yantai, China


[email protected],
[email protected]

Abstract. Support vector machine is a relatively new field of machine learning. Generalization accuracy and response time are two important criteria for support vector machines used in practical applications. It is desirable to minimize the number of training data and support vectors and to simplify the algorithm implementation while preserving classification accuracy. Based on the above considerations, a reduction algorithm combining SVM with the KNN algorithm is presented. The experimental results show that the algorithm can reduce the number of training data and support vectors while preserving the classification accuracy obtained with the original training dataset.

Keywords: support vector machine, generalization accuracy, response time, reduction algorithm, KNN algorithm.

1 Introduction
The support vector machine has been widely used in pattern recognition [1], bioinformatics, and text classification. It has its own advantages compared with other statistical learning methods. However, as an emerging technology, the support vector machine still has many open problems. Current research on support vector machines covers two aspects: first, studying the properties of support vector machines; second, exploring new fields of application.
Generalization and response time are two important criteria for support vector machines. The factors that influence the response time of a support vector machine are the number of training data and the number of support vectors. We want to reduce both while keeping the classification accuracy. Liu Xiangdong and Chen Zhaoqian [2] presented a fast support vector machine classification algorithm that uses a small number of support vectors instead of all the support vectors, but this method requires solving complex optimization problems. Li Honglian [3] put forward an SVM learning strategy for large-scale training datasets; the classification accuracy and the number of support vectors obtained with this method are better, but the method needs a threshold to be selected. This paper presents a reduction algorithm for support vector machines. First, the KNN algorithm is used to remove the noise data and the borderline data. Then the KNN algorithm is used to extract support vectors from the

*
Corresponding author.


training dataset to form a new dataset. Finally, the classifier is trained on the new training dataset. The experimental studies show that the presented algorithm can reduce the number of training data and support vectors while keeping the classification accuracy obtained with the original training dataset.

2 Support Vector Machine


Given the training set $T = \{(x_1, y_1), \ldots, (x_l, y_l)\} \in (X \times Y)^l$, $x_i \in X = R^n$, $y_i \in Y = \{1, -1\}$, $i = 1, 2, \ldots, l$, if there exists $(w, b)$ satisfying the following inequalities
$$w \cdot x_i + b \ge 1 \ \text{if } y_i = 1, \qquad w \cdot x_i + b \le -1 \ \text{if } y_i = -1, \qquad (1)$$
then the dataset T is linearly separable, and the optimal hyperplane is
$$w \cdot x + b = 0 \qquad (2)$$
The margin of the classifier is $\frac{2}{\|w\|}$, and maximizing the margin leads to the following optimization problem:
$$\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \ i = 1, \ldots, l \qquad (3)$$
When noise is present it may not be possible to find a linear separator, so we introduce slack variables $\xi_i$ for each $x_i$. The optimization problem is transformed into:
$$\min_{w, b} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ i = 1, 2, \ldots, l \qquad (4)$$
$C > 0$ is referred to as the penalty cost; it trades off model complexity against accuracy on the training dataset. By introducing Lagrange multipliers to solve the above problem, the decision function is:
$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i \, x_i \cdot x + b \right) \qquad (5)$$

However, for some datasets it is difficult to find a good linear support vector machine in the original input space. To solve this problem, the support vector machine maps the original data $(x_i, y_i)$ to a high-dimensional Hilbert space $(\Phi(x_i), y_i)$ through a nonlinear function $\Phi$ [4], so that the training dataset can be linearly separable in this high-dimensional space. For $x_i, x_j \in X$, $X \subseteq R^n$, if there exists a mapping $\Phi$ from $R^n$ to $R^m$, $n \ll m$, then
$$k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle \qquad (6)$$

Here $\langle \cdot, \cdot \rangle$ is the inner product of the space $R^m$ and $k(x_i, x_j)$ is a kernel function. The optimization problem is transformed into:
$$\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot \Phi(x_i) + b) \ge 1, \ i = 1, \ldots, l \qquad (7)$$
By introducing the Lagrange multipliers $a_i$, the decision function is:
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} a_i^* y_i \, k(x_i, x) + b \right) \qquad (8)$$

3 Simplify the Support Vector Machine


For a support vector machine, a large number of training data and support vectors are required, which consumes more time. Therefore, we want to reduce the number of training data and support vectors while keeping the classification accuracy. From the solution of the optimization problem (7), the Lagrange multiplier $a_i$ falls into one of three cases:
(1) $a_i = 0$: $x_i$ is classified correctly and has no effect on the optimal hyperplane.
(2) $0 < a_i < C$: $x_i$ is a standard support vector.
(3) $a_i = C$: $x_i$ is a boundary support vector, which may also be a misclassified point.

Fig. 1. The ideal training dataset

Because points with $a_i = 0$ have no effect on the hyperplane, we can first remove them to reduce the training dataset. The literature [5] proposed the Boundary Nearest Support Vector Machine, whose idea is to find the support vectors in order to compress the training dataset. The training dataset is divided into two subsets according to the class label. For each point, we find the k nearest points of the opposite class; these heterogeneous points then form a new training dataset. Figure 1 shows that, when there are no misclassified data, the method can extract all the support vectors and keep the

classification accuracy. Using the Boundary Nearest Support Vector Machine method on the dataset shown in Figure 1, the classification hyperplane and support vectors are shown in Figure 2. However, when there are many misclassified points in the dataset, these misclassified points become support vectors; this is one reason for an excessive number of support vectors. For the dataset shown in Figure 3, the classification hyperplane and the support vectors obtained with the Boundary Nearest Support Vector Machine are shown in Figure 4. In this case, the classification result on the test dataset is very poor.
As shown in Figure 5, the positive data are mainly distributed in the following forms.

Fig. 2. The hyperplane and support vectors of the ideal training dataset

Fig. 3. The training dataset with noisy examples

Fig. 4. The hyperplane and support vectors of the training dataset with noisy examples

Fig. 5. The training dataset

(1) Noise points, for instance the positive points in the bottom left corner.
(2) Borderline points, which are close to the boundary between the positive and negative regions.
(3) Points that have no effect on the hyperplane, for example the positive points in the upper right corner.
(4) Safe points, which are important to the hyperplane.
Deleting the borderline points when the data of the two classes are severely mixed avoids an overly complex hyperplane and poor generalization ability [6]. By analyzing the distribution of the dataset, we take the following steps to compress the sample dataset: remove the noise data, part of the borderline points, and the points that have no effect on the hyperplane, and retain the safe points. Based on the above analysis, a reduction algorithm for support vector machines is proposed. First, using the KNN algorithm we find the noise points and part of the borderline points in the training dataset and remove them. Then we look for support vectors to form a new training dataset. For convenience, in the following sections the above algorithm is called the KNSVM algorithm.
The distance between two points in the feature space is calculated as follows:
$$d(x_i, x_j) = \|\Phi(x_i) - \Phi(x_j)\| = \sqrt{k(x_i, x_i) - 2k(x_i, x_j) + k(x_j, x_j)} \qquad (9)$$

Given the training set $T = \{(x_1, y_1), \ldots, (x_l, y_l)\} \in (X \times Y)^l$, $x_i \in X = R^n$, $y_i \in Y = \{1, -1\}$, $i = 1, 2, \ldots, l$, the specific steps of the KNSVM algorithm are as follows (a sketch is given after the list):
(1) Calculate the distance of each point to all other points, recording its distance to itself as ∞.
(2) For each point $x_i$ of the dataset, find its k nearest points; if all the class labels of the k nearest points are identical and different from the label of $x_i$, delete the point $x_i$.
(3) Update the distance matrix. Calculate the k nearest heterogeneous points for each remaining point; these heterogeneous points form a new training dataset.
(4) Train a classifier on the new training dataset.
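The sketch below illustrates these steps in Python with an RBF kernel and the kernel-space distance (9); the synthetic data, the parameter values and the final scikit-learn SVC call are illustrative assumptions rather than the authors' implementation.

import numpy as np
from sklearn.svm import SVC

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def knsvm_reduce(X, y, k=3, gamma=1.0):
    """Steps (1)-(3): prune noise/borderline points, then keep k-nearest heterogeneous points."""
    K = rbf_kernel(X, gamma)
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K   # squared distance of Eq. (9)
    np.fill_diagonal(d2, np.inf)                              # distance to itself recorded as infinity

    # Step (2): delete x_i if all of its k nearest neighbours carry the opposite label.
    order = np.argsort(d2, axis=1)[:, :k]
    noisy = np.array([np.all(y[order[i]] != y[i]) for i in range(len(y))])
    X, y, d2 = X[~noisy], y[~noisy], d2[np.ix_(~noisy, ~noisy)]

    # Step (3): keep, for every point, its k nearest points of the opposite class.
    keep = set()
    for i in range(len(y)):
        hetero = np.where(y != y[i])[0]
        nearest = hetero[np.argsort(d2[i, hetero])[:k]]
        keep.update(nearest.tolist())
    idx = sorted(keep)
    return X[idx], y[idx]

# Step (4): train a classifier on the reduced training set.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
Xr, yr = knsvm_reduce(X, y, k=3, gamma=0.5)
clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(Xr, yr)
print(len(Xr), clf.score(X, y))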

The algorithm first removes the noise points and a part of the borderline points, then looks for support vectors to form a new dataset, and finally trains a classifier on the new dataset. The reduction algorithm calculates the distance matrix only once, and the implementation process is simple.

4 Simulation Research
4.1 Experiment Environment
In order to verify the extent of the reduction of support vectors and training data achieved by the KNSVM algorithm, we select datasets with different numbers of training examples for study [7]. The experimental datasets are shown in the following table.

Table 1. Information of the dataset

dataset         number   features   classes   training dataset
sonar           208      24         2         164
australian      690      14         2         579
breast-cancer   683      10         2         580

4.2 Experiment Results

We use RBF kernel function for support vector machine in the experiment. The KNN
algorithm is used twice in the KNSVM algorithm. We use the standard support vector
machine and KNSVM in the experiment. The result of the standard SVM is shown in
table 2 while the result of the KNSVM is shown in table 3.

Table 2. The classification performance table of standard SVM

dataset training dataset nsv nbsv accuracy


sonar 164 71 1 77.27
australian 579 222 190 80.59
breast-cancer 580 80 76 100

Table 3. The classification performance table of KNSVM

dataset training dataset nsv nbsv accuracy


sonar 68 43 0 79.55
australian 171 125 108 80.59
breast-cancer 80 37 35 100

From the comparison of Table 2 and Table 3, we can see that the numbers of training data and support vectors are reduced significantly by the KNSVM algorithm compared with the standard SVM. However, when the training dataset is severely unbalanced, the algorithm has a drawback: it may delete points of the minority class, and the classification result on the test dataset may be no better than that of the standard SVM. It is better to consider the characteristics of the dataset when choosing the appropriate model.

5 Conclusion
This paper presents a reduction algorithm for support vector machines that combines the support vector machine with the KNN algorithm. First, we use the KNN algorithm to prune the training dataset. Then the KNN algorithm is used to select the k nearest heterogeneous points for each point. Finally, the classifier is trained on the new dataset. Experimental results show that, using the KNSVM algorithm, the numbers of training data and support vectors are greatly reduced while keeping the accuracy on the test dataset.

References
1. Lin, Y.Y., Liu, T.L., Fuh, C.S.: Local ensemble kernel learning for object category recog-
nition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,
pp. 1–8. IEEE Press, Washington D.C (2007)
2. Liu, X.D., Chen, Z.Q.: A Fast Classification Algorithm of Support Vector Machines. Journal
of Computer Research and Development 41(8), 1327–1332 (2004)
3. Li, H.L., Wang, C.H., Yuan, B.Z., et al.: A Learning Strategy of SVM Used to Large Training Set. Chinese Journal of Computers 27(5), 716–719 (2004)
4. Nguyen, C.H., Ho, T.B.: An Efficient Kernel Matrix Evaluation Measure. Pattern Recogni-
tion 41, 3366–3372 (2008)
5. Feng, G.H., Li, Y.J., Zhu, S.M.: Boundary Nearest Support Vector Machines. Application
Research of Computers 23(4), 11–12 (2006)
6. Ke, H.X., Zhang, X.G.: Edit support vector machines. In: Proceedings of International Joint
Conference on Neural Networks, Washington, USA, pp. 1464–1467 (2001)
7. A library for Support Vector Machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm
Fast Support Vector Regression Based on Cut

Wenyong Zhou, Yan Xiong, Chang-an Wu, and Hongbing Liu

School of Computer Science and Information Technology, Xinyang Normal University,


Xinyang 464000, P.R. China
{Xynuzwy,liuhbing}@126.com

Abstract. In general, similar input data have similar output target values. A novel Fast Support Vector Regression (FSVR) trained on a reduced training set is proposed. Firstly, the improved learning machine divides the training data into blocks by using traditional clustering methods, such as the K-means and FCM clustering techniques. Secondly, a membership function on each block is defined by the corresponding target values of the training data; all the training data receive membership degrees falling into the interval [0, 1], which vary the penalty coefficient by multiplying C. Thirdly, the reduced training set, which consists of the data whose membership degrees are greater than or equal to a suitably selected parameter λ, is used to train the FSVR. The experimental results on traditional machine learning data sets show that the FSVR can not only achieve better or acceptable performance but also reduce the number of training data and speed up training.

Keywords: Support vector machines, Support vector regression, Fast support vector regression, Membership function, K-means clustering, Cut.

1 Introduction

Support Vector Machines (SVMs), based on statistical learning theory, have been an elegant and powerful tool for classification and regression over the past decade as a modern machine learning approach [1]. SVMs have generated great interest in the machine learning community due to their excellent generalization performance in a wide range of learning problems, such as handwritten digit recognition [1], disease diagnosis [2] and face detection [3]. In parallel, applications of SVR, such as financial market forecasting [4] and highway traffic flow prediction [5], have been developed. However, training SVMs is still expensive, especially for large-scale learning problems, i.e., O(N^3), where N is the total size of the training data. Since the size of the training set determines the computational burden of SVMs, how to carry out the training process efficiently is a key focus for SVMs. The first strategy is to improve the speed of the quadratic programming. Many fast algorithms, such as Chunking [6], SMO [7], SVMTorch [8], LIBSVM [9], and the modified finite Newton method [10], have been presented to reduce the training set and speed up training. In the Chunking method, training an SVM is treated as solving a linearly constrained quadratic programming (QP) problem in a number of variables equal to the number of data points. SMO


is an extreme version of this approach which updates only two variables at each itera-
tion. This is the minimal number of variables that can be optimized while fulfilling the linear equality constraint. With a working set of two variables, the key point is that the optimization sub-problem can be solved analytically without explicitly invoking a quadratic optimizer. SVMTorch is a decomposition algorithm intended to efficiently solve large-scale regression problems using SVMs. LIBSVM is related to Nystrom sampling [11] with active selection of support vectors and estimation in the primal space.
The second strategy is to reduce the scale of the training set to alleviate the compu-
tational burden for SVM training algorithms. The reduced set strategy has been
successfully applied to solve SVM pattern recognition problems[12] and regression
problems[13]. By using a heuristic method for accelerating SVM regression training,
all the training data are first divided into several groups using some clustering meth-
ods, and then for each group, some training data are discarded based on the measure-
ment of similarity among examples. The two processes including the clustering
method and the similarity computation are related to the dimension of the input data.
In an interesting estimated ε -tube based pattern selection method, the probability of a
training point falling in the ε -tube is computed according to the difference between
true target value and the outputs of some regressions, each of which is trained on a
smaller bootstrap sample of the original training set.
In this paper, a reduced training set is used for the regression problem. Two processes are applied to the training data before training the SVR. The first step is the partition of the original training set: clustering methods on the inputs of the training data, such as K-means and FCM clustering, are used to divide the training set into different clusters (blocks). The second step is to discard the redundant data: the membership function, or importance of the data to the SVR, is defined by the corresponding target values in each cluster, and a datum is selected to train the SVR when its membership degree is greater than or equal to the selected parameter λ.
The paper is organized as follows. Section 2 explains the idea of the paper in detail. In Section 3, we compare the improved fast training method with the traditional algorithm on benchmark data sets. Conclusions are given in Section 4.

2 Fast Support Vector Regression Based on Cut


As mentioned above, the improved FSVR is formed on a reduced training set. The following steps are used to form the proposed SVR. Firstly, the inputs of the training data are partitioned into different clusters by using traditional clustering methods. Secondly, for each cluster, we define a membership function under which the data near the estimating function receive large values; its aim is to assign a large penalty to the data with potential errors. Thirdly, the data without potential errors are less important for training the SVR and are discarded, so the reduced training set is constructed by the above method. Finally, the proposed FSVR is formed on the reduced training set, and an acceptable or better SVR can be obtained with a proper parameter.

2.1 Extraction of the Reduced Training Set

In space R d , the traditional clustering methods, such as the K-means and FCM clus-
tering, are used to partition the training set into blocks.
K-means clustering uses a two-phase iterative algorithm to minimize the sum of
point-to-centroid distances, summed over all k clusters: The first phase uses what the
literature often describes as "batch" updates, where each iteration consists of reassign-
ing points to their nearest cluster centroid, all at once, followed by recalculation of
cluster centroids. You can think of this phase as providing a fast but potentially only
approximate solution as a starting point for the second phase. The second phase uses
what the literature often describes as "on-line" updates, where points are individually
reassigned if doing so will reduce the sum of distances, and cluster centroids are re-
computed after each reassignment. Each iteration during this second phase consists of
one pass through all the points.
FCM clustering was originally introduced by Jim Bezdek in 1981 as an improvement
on earlier clustering methods. It is a data clustering technique wherein each data point
belongs to a cluster to some degree that is specified by a membership grade. It provides
a method that shows how to group data points that populate some multidimensional
space into a specific number of different clusters. FCM clustering starts with an initial
guess for the cluster centers, which intends to mark the mean location of each cluster.
The initial guess for these cluster centers is most likely incorrect. Additionally, FCM
assigns every data point a membership grade for each cluster. By iteratively updating
the cluster centers and the membership grades for each data point, FCM iteratively
moves the cluster centers to the "right" location within a data set. This iteration is based
on minimizing an objective function that represents the distance from any given data
point to a cluster center weighted by that data point's membership grade.
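As a minimal illustration of this block-partition step, the following Python sketch clusters one-dimensional inputs into 10 blocks with scikit-learn's KMeans; the synthetic sinc-style data and all parameter values are assumptions used only to make the example runnable (the paper's experiments use MATLAB implementations).

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=(5000, 1))                             # inputs of the training data
y = np.sinc(x / np.pi).ravel() + rng.uniform(-0.2, 0.2, size=5000)   # noisy sin(x)/x targets

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(x)
blocks = kmeans.labels_        # block index of every training datum
print(np.bincount(blocks))     # sizes of the 10 blocks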
For each block, the distance between the target value of each datum in the block and the mean target value of the block is calculated by formula (1):
$$distance(i) = \left| y_i - \operatorname*{mean}_{1 \le j \le n_l}(y_j) \right| \qquad (1)$$
where $n_l$ denotes the number of data in the block. The membership function of the block is
$$membership(i) = \frac{\max_{1 \le j \le n_l}(distance(j)) - distance(i)}{\max_{1 \le j \le n_l}(distance(j)) - \min_{1 \le j \le n_l}(distance(j))} \qquad (2)$$

The membership functions make the training set to the fuzzy training set by adding
the membership values of every training data. In original training data, the samples
are the pair ( xi , yi ) , in the fuzzy training set, the samples are triplets
( xi , yi , membership (i )) . The data, whose target values are near to the mean of all the
target values, have the large membership values. Conversely, the data have the small
membership values. The entire membership values (degrees) focusing on the interval
[0,1] are convenient to select the suitable parameter λ of cut. The suitable parameter
λ is used to cut off the training data which can not be misclassified and support vec-
tors. As we known, the generalization ability is related to the support vectors focusing
into the ε -tube which is the subset of the whole training set, namely the reduced
training set and the suitable parameter λ is existential for SVR and FSVR.
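Continuing in the same spirit, the sketch below implements formulas (1) and (2) and the λ-cut that forms the reduced training set; the data generation repeats the clustering step for self-containedness, and the final scikit-learn SVR call with per-sample weights is only an illustrative substitute for the LS-SVMlab models and the membership-scaled penalty C used in the experiments. All parameter values are assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=(5000, 1))
y = np.sinc(x / np.pi).ravel() + rng.uniform(-0.2, 0.2, size=5000)
blocks = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(x)

def block_membership(y, blocks):
    """Membership per formula (2): data near the block mean of the targets get values near 1."""
    membership = np.zeros_like(y, dtype=float)
    for b in np.unique(blocks):
        idx = np.where(blocks == b)[0]
        dist = np.abs(y[idx] - y[idx].mean())      # formula (1)
        span = dist.max() - dist.min()
        membership[idx] = 1.0 if span == 0 else (dist.max() - dist) / span
    return membership

membership = block_membership(y, blocks)
lam = 0.9                                          # cut parameter, selected in descending order
keep = membership >= lam                           # the reduced ("irregular tube") training set
print(keep.sum(), "of", len(y), "training data kept")

fsvr = SVR(kernel="rbf", C=100.0, gamma=5.0, epsilon=0.01)
fsvr.fit(x[keep], y[keep], sample_weight=membership[keep])  # membership scales the penalty C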

Fig. 1. Extraction of the reduced training set

The training data (http://www.ntu.edu.sg/home/egbhuang/), consisting of one input and one output, are created where the $x_i$'s are uniformly randomly distributed on the interval (-10, 10). In order to make the regression problem 'real', large uniform noise distributed in [-0.2, 0.2] has been added to all the training samples, while the testing data remain noise-free. Because x follows a uniform distribution, the interval (-10, 10) can be divided into 10 sub-intervals by using K-means clustering. In each sub-interval, the membership function is determined by formula (2). Figure 1 shows that all the training data and their targets are mapped into rectangular regions; the height represents the corresponding membership degree, which is mapped into the interval [0, 1]. When the cut parameter λ is 0.5, the data falling in the smaller rectangle, generated by pressing in 50% from the top and bottom simultaneously, are selected for the next steps, and the other data are discarded. The training data marked with bold dots are selected as members of the reduced training set and are used to train the FSVR; the remaining data are discarded. If we reduce the parameter value, an irregular tube containing more training data is selected to train the new FSVR.

2.2 Forming Fast Support Vector Regression

The reduced training set is obtained from the original training set. Compared with the traditional SVR, the FSVR obtains the ε-tube and the estimating function on the reduced training set, namely the irregular tube mentioned above, while the traditional SVR obtains them on the entire training set. The total number of training data is thus reduced compared with the original training set, which lowers the training time in terms of the complexity of SVR. The proposed algorithm can be described as follows.
Algorithm. FSVR based on a training set with n elements.
Step 1. Divide the inputs of the training data into m blocks by using the clustering methods.
Step 2. Define the membership function on each block by using formula (2).

Fig. 2. The selections of the reduced training set with different parameters of the cut (λ = 0.8, 0.7, 0.6, 0.5)

Step 3. Select the cut parameter λ and form the reduced training set.
Step 4. Train the FSVR.
Step 5. Verify the performance of the FSVR; if it is acceptable, output the support vectors and the values of the corresponding Lagrange multipliers for testing; otherwise, go to Step 3.

In Step 3, the parameter λ belongs to the interval [0, 1] and is selected in descending order, such as λ = 1.0, 0.9, 0.8, 0.7, .... By definition (2), the membership degrees of the data points nearest to the estimate function are maximal, so the data points with larger membership degrees lie in the neighborhood of the estimate function. Whether a sample is an element of the reduced training set is determined by its membership degree, so a larger parameter λ means a reduced training set with fewer elements. For example, the data points falling into the interval (-10, -8) in Figure 1 are extracted to illustrate the cut process. There are 503 such data points; their inputs lie in the interval (-10, -8), their target values belong to (-0.26, 0.32), and the membership degrees defined by the outputs fall into the interval [0, 1]. Of these 503 training data, 94, 149, 209 and 260 elements form the corresponding reduced training sets for the parameters λ = 0.8, 0.7, 0.6 and 0.5 respectively. Figure 2 shows the selection process: both the red points and the blue ones are training points, the red points are the selected ones, and the curve represents the estimate function on the interval (-10, -8).

3 Comparison of FSVR and SVR


In order to verify the superiority of the proposed method, three benchmark regression problems are used to test the performance of FSVR. One is the artificial case of approximating the sinc function with noise; the others are abalone^1 and KIN40K^2. We performed our experiments using LS-SVMlab1.5^3 on the same operating system, Windows 2000 Professional, with 256 MB memory and a Pentium IV CPU. We use K-means clustering to perform the preprocessing, and assign K to 50, 50 and 200 for the data sets sinc, abalone and KIN40K respectively.
Both FSVR and traditional SVR are used to approximate the sinc function

$$f(x) = \begin{cases} 1, & x = 0 \\ \sin(x)/x, & x \ne 0 \end{cases}$$

A training set $(x_i, y_i)$ and a testing set^4, each with 5000 data, are created where the $x_i$'s are uniformly randomly distributed on the interval (-10, 10). In order to make the regression problem 'real', large uniform noise distributed in [-0.2, 0.2] has been added to all the training data, while the testing data remain noise-free. Table 1 reports the results with 50 clusters, C = 100, an RBF kernel with width 0.3, and different cut parameters λ: the number of training data (Ntr), the training and testing time, and the training and testing accuracy. It can be seen from Table 1 that the FSVR spends 1.141 seconds of CPU time to obtain an acceptable testing root mean square error (RMSE) of 0.0052671, whereas it takes 156.22 seconds of CPU time for the traditional SVR to reach an RMSE of 0.00632. The FSVR runs about 137 times faster than the traditional SVR. Fig. 3 shows the true and the approximated function of the FSVR when λ = 0.9; the right part of Fig. 3 shows the true and the approximated function of the traditional SVR. In the figure, the dashed line represents the real curve of sinc, and the solid line represents the curve of the SVR or FSVR.
The performance of FSVR and SVR is also compared on large real-world benchmark datasets. For the abalone dataset with 4177 data, each datum consists of 7 attributes and 1 target value. Before training, 3000 training data and 1177 testing data are randomly generated from the whole dataset. The simulation results using the linear kernel are shown in Table 1. From the table, we can see that the FSVR reduces the training time while keeping a comparable testing error. KIN40K, including 40000 samples, was generated with maximum nonlinearity and little noise, giving a very difficult regression task. We select 30000 samples as training data, and the remaining 10000 samples are regarded as testing data. Since it takes a long time for SVR to perform the training process on KIN40K, we only select the FSVR whose performance approaches that of the traditional one. From Table 1, we can see that the running time of FSVR is reduced markedly, and the performance is acceptable when the parameter λ is set to 0.9.

1 ftp://ftp.ics.uci.edu/pub/machine-learning-databases
2 http://ida.first.fraunhofer.de/~anton/data.html
3 http://www.support-vector.net/
4 http://www.ntu.edu.sg/home/egbhuang/

Table 1. Performance comparison for learning by using FSVR and SVR

Data      Regression   λ     Ntr     Training time (s)   Testing time (s)   Training RMSE   Testing RMSE
sinc      FSVR         0.9   602     1.4063              1.2032             0.013446        2.7742e-005
          SVR          -     5000    99.391              9.6875             0.013424        3.9942e-005
abalone   FSVR         1     690     0.312               0.172              2.3245          2.2827
          SVR          -     3000    7.578               0.687              2.2192          2.2463
kin40k    FSVR         1     200     0.141               1.562              -               0.26463
          FSVR         0.9   6454    99.343              35.984             -               0.20526
          SVR          -     30000   5370.3              149.53             -               0.19732

Fig. 3. Outputs of the FSVR on the reduced training set when λ = 0.9 and SVR on testing
data

4 Conclusions
In this paper, we propose methods to reduce the training data for support vector regression estimation based on the integration of the inputs and the outputs of the training data; the data near the estimate function are extracted to construct the reduced training set and form the FSVR. In order to improve the performance of FSVR, the product of the membership degree and the traditional penalty coefficient C is adopted to increase the penalty on the training data with large errors. The experimental results on machine learning benchmark data sets demonstrate the superiority of FSVR, which can not only reduce the training complexity but also achieve better or acceptable generalization ability. Unfortunately, the preprocessing before training and the tuning of suitable parameters during the training process increase the time consumption. Whether the proposed methods are appropriate is determined by the users, who may need learning machines with high speed and acceptable performance, or with maximal performance.

Acknowledgement. This work was in part supported by Natural Science Foundation


of Henan Province (102300410178, 102102210241, 2009B520025).

References
[1] Vapnik, V.N.: Statistical Learning Theory. John Wiley & Sons, New York (1998)
[2] Wee, J.W., Lee, C.H.: Concurrent Support Vector Machine Processor for Disease Diag-
nosis. In: Pal, N.R., Kasabov, N., Mudi, R.K., Pal, S., Parui, S.K. (eds.) ICONIP 2004.
LNCS, vol. 3316, pp. 1129–1134. Springer, Heidelberg (2004)
[3] Buciu, L., Kotropoulos, C., Pitas, I.: Combining Support Vector Machines for Accurate
Face Detection. In: Proceeding of International Conference on Image Processing, Thessa-
loniki, Greece, pp. 1054–1057 (2001)
[4] Yang, H., Chan, L., King, I.: Support Vector Machine Regression for Volatile Stock
Market Prediction. In: Jünger, M., Naddef, D. (eds.) Computational Combinatorial Opti-
mization. LNCS, vol. 2241, pp. 391–396. Springer, Heidelberg (2001)
[5] Ding, A., Zhao, X., Jiao, L.: Traffic Flow Time Series Prediction Based on Statistics
Learning Theory. In: Proceedings of International Conference on Intelligent Transporta-
tion System, Singapore, pp. 727–730 (2002)
[6] Osuna, E., Freund, R., Girosi, F.: An Improved Training Algorithm for Support Vector
Machines. In: Proceedings of Workshop on Neural Networks for Signal Processing, Ame-
lea Island, pp. 276–285 (1997)
[7] Platt, J.C.: Fast Training of Support Vector Machines Using Sequential Minimal Optimi-
zation. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208 (1999)
[8] Collobert, R., Bengio, S.: SVMTorch: Support vector machines for large-scale regression
problems. Journal of Machine Learning 1(2), 143–160 (2001)
[9] Chang, C.C., Lin, C.J.: LIBSVM: A Library for Support Vector Machines (2001),
http://www.csie.ntu.edu.tw/~cjlin
[10] Keerthi, S.S., DeCoste, D.M.: A Modified Finite Newton Method for Fast Solution of
Large Scale Linear SVMs. Journal of Machine Learning Research 6, 341–361 (2005)
[11] Girolami, M.: Orthogonal Series Density Estimation and the Kernel Eigenvalue Problem.
Neural Computation 14(3), 669–688 (2002)
[12] Shin, H.J., Cho, S.: Invariance of Neighborhood Relation Under Input Space to Feature
Space Mapping. Pattern Recognition Letter 26(6), 701–718 (2004)
[13] Wang, W.J., Xu, Z.B.: A Heuristic Training for Support Vector Regression. Neurocom-
puting 61, 259–275 (2005)
Using Genetic Algorithm for Parameter Tuning on ILC
Controller Design

Alireza Rezaee1 and Mohammad Jafarpour Jalali2


1 Islamic Azad University, Hashtgerd Branch, Alborz, Iran
2 Islamic Azad University, Buinzahra Branch, Qazvin, Iran
{arrezaee,mjp.jalali}@yahoo.com

Abstract. In this work we use the ILC control method to manipulate the arms of a robot with two degrees of freedom. First we implement the dynamic equations of the robot according to Schilling's robotics book. The implementation was done in the MATLAB Simulink environment. A genetic algorithm was used for tuning the coefficients of the PD controllers (proportional and derivative gains). We also use a multi-objective genetic algorithm to obtain the coefficients of the ILC PD controllers.

Keywords: genetic, controller, tuning, robot.

1 Introduction
In robotics and precision machining, high positioning accuracy and low tracking errors (position and velocity errors) are very important performance indices. On the other hand, for the sake of practical feasibility of implementation, the use of classical linear controllers such as PD in robotic systems has attracted great attention from industry for many decades. As a matter of fact, the majority of industrial robots are controlled with the popular PD laws/algorithms.
We begin with a brief introduction to the ILC control method in Section 2. After that we discuss our simulated model [2] in MATLAB, and then the use of the MATLAB genetic algorithm toolbox is described. We also attach pictures of our MATLAB model diagram and the results achieved from running the model, which are presented as graphs obtained from the simulated model.

2 ILC Control Method


The ILC method is a relatively new approach in control theory. It is a technique for machines, equipment, processes and systems which repeat one particular motion or action. ILC stands for Iterative Learning Control, and ILC controllers are added to compensate for the shortcomings of the other controllers in the simulated model. We use an ILC controller and tune its parameters so that, over a sequence of iterations, it learns to correct the error in the behavior of the robot manipulator when following a particular trajectory. There are some problems with systems that use ILC [1], [3], [4]; these are classified into two general groups: stability and performance.


The problem of system performance is to drive the output response of a dynamic system along a particular path with the minimum of predefined error. Stability means that the error keeps decreasing as the number of iterations increases. We do not get good stability if a good fitness is not defined. On the other hand, the parameters that the GA yields cause the model to reach an error of 0.057 from an initial error value of 7.684 in 10 iterations, but if we continue running the model the error does not stay constant without oscillation. Consequently, to overcome the stability problem we ran the model 10 times to evaluate the fitness of each chromosome. The diagrams below are arranged according to references [1], [3], [4], where
lem of stability we ran the model 10 times to evaluate the fitness of each chromo-
some. The bellow diagrams attached according to the [1], [3], [4] references, where,
u t the input signal during the trial applied to the system and producing the
output trajectory and r(t) denotes our desired trajectory, which the system
should track. These signals are stored in the memory until the trial is over, at which
time they are processed offline by the ILC algorithm. The learning controller is noth-
ing but the ILC algorithm compares and r(t) and adds an update term to u t to
produce , the refined input signal given to the system for the k 1 trail.
Consider the basic block diagram of ILC is illustrated in Fig. 1.

Fig. 1. Basic block diagram of ILC
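To make the iteration explicit, here is a hedged Python sketch of a PD-type ILC update on a toy first-order plant; the plant model, the learning gains and the update law u_{k+1}(t) = u_k(t) + Phi*e_k(t) + Gamma*de_k(t) are standard textbook choices used only for illustration and are not taken from the paper's Simulink model.

import numpy as np

def run_trial(u, a=0.9, b=0.5):
    """Toy first-order plant y[t+1] = a*y[t] + b*u[t], started from rest."""
    y = np.zeros_like(u)
    for t in range(len(u) - 1):
        y[t + 1] = a * y[t] + b * u[t]
    return y

T = 100
t = np.linspace(0, 1, T)
r = np.sin(2 * np.pi * t)          # desired trajectory r(t)
u = np.zeros(T)                    # initial input of the first trial
phi, gamma = 0.8, 0.2              # PD-type learning gains (Phi, Gamma)

for k in range(10):                # iterations (trials)
    y = run_trial(u)
    e = r - y                      # tracking error of trial k
    de = np.gradient(e)            # discrete derivative of the error
    u = u + phi * e + gamma * de   # ILC update: u_{k+1} = u_k + Phi*e_k + Gamma*de_k
    print(f"trial {k + 1}: max |error| = {np.max(np.abs(e)):.4f}")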

Introduction to MOEA. The Multi-objective Evolutionary Algorithm (MOEA) is a stochastic search technique inspired by the principles of natural selection and natural

chastic search technique inspired by the principles of natural selection and natural
genetics. It has attracted significant attention from researchers and technologists in
various fields because of its ability to search for a set of Pareto optimal solutions for
multi-objective optimization. The MOEA first begins with a population of possible
solutions, called strings. Each string is fed into a model. This model is usually a com-
puter program representation of the problem. The model returns the answer in the
form of a cost function. Based on these cost functions, strings are selected for evolu-
tion to create the next generation of strings.
Multi-objective simply means that more than one objective is involved. For
each string, each objective returns a separate cost. The manner in which a string is
deemed superior or inferior to other strings is not discussed here. The selected

strings undergo the evolutionary process where the traits of the selected strings
(which may or may not be good) are selected and combined to form new strings for
the next generation. In theory, with each generation, the strings in the population
should return better and better cost functions. In practice, there is a limit to the cost
functions that strings can achieve, depending on the objective functions and the limits
imposed on the model parameters. Further, there is the possibility that the best traits
are not found [5]. We use a toolbox named MOEA, written at the National
University of Singapore.
Fig. 2 shows our model in the Simulink environment of MATLAB:

Fig. 2. The collocated PD with iterative learning control structure

3 Parameters and Objectives


To control a nonlinear system, a linear controller, such as a PD controller, can be de-
signed based on linearization of the system about an operating point. In its simplest
form, a PD control law can be expressed as:

u(t) = K_p e(t) + K_d ė(t)    (1)

where K_p and K_d are diagonal proportional and derivative gain matrices, respectively,
and e(t) and ė(t) denote the position and velocity errors. Also, K_p and K_d are positive
definite matrices.
Fig. 3 shows the PD-type learning control scheme. The performance of the PD-
type learning control depends upon the proportional and derivative gains. Stability,
settling time, maximum overshoot and many other system performance indicators
depend upon the values of these gains. The proposed strategy utilizes the GA as an
optimization and search tool to determine optimum values for these gains.

Fig. 3. PD-type learning algorithm

The performance index, or cost function, chosen is the error taken by the system to
reach and stay within a range specified by an absolute percentage of the final value.
Hence, the role of the GA is to find optimum values of the proportional and derivative
gains. In this case, the integral of absolute error (IAE) is used for minimizing the error
and generating the controller parameters:

IAE = Σ_{t=1}^{N} |Error(t)|    (2)

where Error = r(t) - y(t), N = size of sample, r(t) = reference input and y(t) = meas-
ured variable. Thus, the function in Eq. (2) can be minimized by applying a suitable
tuning algorithm.
Genetic algorithm is used here as a tuning algorithm. Genetic algorithms constitute
stochastic search methods that have been used as optimization tools in a wide spec-
trum of applications such as parameter identification and control structure design.
GAs have also found widespread use in controller optimization particularly in the
fields of fuzzy logic and neural networks. The GA used here initializes a random
population of the two gain variables. The algorithm evaluates all members of
the population based on the specified performance index. The algorithm then applies
the GA operators such as reproduction, crossover and mutation to generate a new
population based on the performance of the members of the current population. The
best member, or gene, of the population is chosen and saved for the next generation.
The algorithm again applies all operations and selects the best gene of the new
population. The best gene of the new population is compared with the best gene of the
previous population, and the best among all is selected to represent the gains.
We choose the proportional and derivative coefficients (P, D) of the ILC controllers
and α1, α2 as the genes of one chromosome. These genes form the GA parameters
that are adjusted according to the objectives. Thus we have six genes, or parameters,
forming a chromosome, and two objectives for each such chromosome. The structure
of one chromosome is as follows:

P1 | D1 | P2 | D2 | α1 | α2

In this implementation, seven (7) bits were assigned to each gene. Two objectives,
F1 and F2, were defined as follows:

F1=Error1+Error2;
F2= (Error1>Error p1) + (Error2>Error p2);

Error1 denotes the total error of the current iteration for the first arm joint, and Errorp1
denotes the total error of the previous iteration for the first arm. Error2 and Errorp2 are
the corresponding quantities for the second degree of freedom.

F1 means that the total error of the first and second joints must be minimized.
F2 consists of two inequalities:
Each inequality contributes one (1) when the total error of the current iteration for the
corresponding arm is greater than the total error of the previous iteration, and zero (0)
otherwise, i.e., when Error1 <= Errorp1 (the second inequality is analogous).
Therefore the minimum of F2, which is zero, occurs when (Error1 <= Errorp1) and
(Error2 <= Errorp2).
The maximum value of F2 occurs when both errors grow simultaneously. For each
chromosome the model was run 10 times; thus F2_total equals the sum of the resulting
F2 values and F1_total equals the sum of the F1 values.
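For illustration only, the sketch below (Python, not part of the original MATLAB/Simulink implementation) shows how F1_total and F2_total can be accumulated from a list of per-iteration joint errors; the list itself is assumed to come from runs of the simulated model.

def evaluate_objectives(errors):
    """errors: list of (err1, err2) tuples, the total joint errors of successive iterations.
    Returns (F1_total, F2_total) as defined above."""
    f1_total, f2_total = 0.0, 0
    for i, (e1, e2) in enumerate(errors):
        f1_total += e1 + e2                          # F1 = Error1 + Error2
        if i > 0:                                    # F2 needs a previous iteration
            p1, p2 = errors[i - 1]
            f2_total += int(e1 > p1) + int(e2 > p2)  # adds 1 whenever an error grew
    return f1_total, f2_total

A chromosome with a small F1_total tracks well on average, while a small F2_total indicates that the error rarely grew from one iteration to the next.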

4 GAOT (Another MATLAB Toolbox)


GAOT is a genetic algorithm toolbox in MATLAB that is used to optimize complex
nonlinear functions. In general, genetic algorithms are good tools for searching the
space of problem solutions. This toolbox optimizes a single fitness function, so it
cannot be used directly to optimize more than one fitness function. However, we can
combine several fitness functions into one well-defined fitness function and then use
it for optimization; in this way multi-objective problems can be handled. The
combination of these fitness functions can be defined as a weighted average of the
goals that must be optimized.
We also used GAOT to optimize the parameters, but in this method our parameter
set consisted only of the (P, D) gains of the controllers; α1 and α2 were assumed
fixed at 0.95 and 0.9, respectively.
The values of these two parameters were obtained manually by trial and error.
Empirical experiments also show that these parameters must be close to 1 but not
exactly 1.
The outcome of GAOT for tuning the ILC parameters is as good as that of MOEA,
but the GAOT results seem to retain their performance over more iterations than the
results of MOEA,

because they give an average error smaller than that of MOEA for iteration numbers
beyond 10.
We compared the results of MOEA with those of GAOT; both perform well for 10
iterations, but the GAOT outcomes give a better average error than MOEA. Applying a
constraint such as Error(k+1) <= Error(k) to the objectives is not possible in MOEA,
whereas it can be imposed on the goals in GAOT.
The fitness that we use in this toolbox is:
Val= (1/(Error1+Error2))*(1/count)+(1/Error)

5 The Results
Figs. 4 and 5 show the total error of the 1st and 2nd joints versus the number of iterations:

Fig. 4. Total error of the first joint vs. iteration number    Fig. 5. Total error of the second joint vs. iteration number

Figs. 6 and 7 show the desired and actual values of the 1st and 2nd joint angles:

Fig. 6. Desired and actual angle of the first joint    Fig. 7. Desired and actual angle of the second joint

These results illustrate the decrease in total error during 10 iterations after the tuning
phase.
Iteration: 1 Total Error 7.6864
Iteration: 2 Total Error 1.0291
Iteration: 3 Total Error 0.46992
Iteration: 4 Total Error 0.19265
Iteration: 5 Total Error 0.09609
Iteration: 6 Total Error 0.07436
Iteration: 7 Total Error 0.064329
Iteration: 8 Total Error 0.066136
Iteration: 9 Total Error 0.061094
Iteration: 10 Total Error 0.055988
The tool-space trajectories are shown below. These pictures are plotted with the
XY Graph block in the MATLAB Simulink environment.

Fig. 8. Tool space trajectory in iteration NO.1 Fig. 9. Tool space trajectory in iteration NO.3

Fig. 10. Tool space trajectory in iteration NO.5 Fig. 11. Tool space trajectory in iteration NO.7

Fig. 12. Tool space trajectory in iteration NO.9 Fig. 13. Tool space trajectory in iteration
NO.10

6 Conclusions and Recommendations


In this section we discuss the conclusions and some recommendations for future
work. Genetic algorithms with a single objective [6] seem to yield better results than
generating multi-objective ones [5]. However, we believe that preference-based
multi-objective genetic algorithms can yield better results than both single-objective
and generating multi-objective algorithms with respect to objectives such as a
measure of performance, a measure of stability, the total error, and the number of
times that a run achieves a smaller error than the previous run. These objectives can
be given distinct weights and then accumulated into a single objective. This approach
is called the weighted sum method and is a very powerful type of preference-based
multi-objective method. The weights can be fixed or variable according to the type of
algorithm used. Generating multi-objective algorithms based on the Pareto approach
offer no such guarantee, because they will not yield a solution without sacrificing
some objective. Future work can be done with swarm intelligence and other machine
learning methods such as reinforcement learning, and a good variety of work can also
be done on the performance and stability aspects.

References
1. Sciavicco, L., Siciliano, B.: Modeling and Control of Robot Manipulators. McGraw-Hill,
New York (1996)
2. Schilling, R.J.: Fundamental of Robotics. Prentice Hall, Englewood Cliffs (1990)
3. Bien, Z., Xu, J.-X.: Iterative learning control. In: Analysis, Design, Integration and Applica-
tion. Kluwer Academic Publisher, Dordrecht (1998)
4. Boming, S.: On Iterative Learning, PhD thesis, National University of Singapore (1996)
5. Fonseca, C.M., Fleming, P.J.: An overview of evolutionary algorithms in multiobjective
optimization. Evolutionary Computation 3(1), 1–16 (1998)
6. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison
Wesley, Reading (1989)
Controller Design for a Heat Exchanger in Waste Heat
Utilizing Systems

Jianhua Zhang, Wenfang Zhang, Ying Li, and Guolian Hou

North China Electric Power University, Beijing, 102206 China


[email protected]

Abstract. This paper presents a method of designing self-tuning PID controller


based on genetic algorithm for an evaporator in an Organic Rankine Cycle sys-
tem for waste heat recovery. Compared with Ziegler-Nichols PID controller, the
self-tuning PID controller can achieve better control performance.

Keywords: PID controller, genetic algorithm, heat exchanger.

1 Introduction

Among different ways of recycling waste heat, Organic Rankine Cycle (ORC) system
is preferred because of high reliability, flexibility and low requirement for mainte-
nance. The key components of ORC system are evaporator and condenser. Usually,
moving boundary method and discrete method are used to build evaporator model.
Compared with discrete model, moving boundary model for two phase flow in
evaporator is less complex, as it is characterized by smaller order and higher compu-
tational speed [1]. The moving boundary model has been verified to be effective for
describing the dynamic characteristics of evaporator [2], [3].
A properly controlled evaporator plays a key role in achieving high performance in
the ORC system. The most popular controller used in industrial processes is the PID controller.
Gruhle and Isermann designed a PI controller to keep the evaporator superheat at a fixed
point [4], but such a controller behaves optimally only for the operating point for which it is
designed. A practical ORC system always operates over a wide range of operating
conditions, so it is necessary to design self-tuning PID controller. Genetic algorithm
has been recognized as an effective and efficient technique to solve optimization
problem of PID controllers [5], [6]. The self-tuning PID controller based on genetic
algorithm is employed to control the superheated temperature in waste heat
utilizing systems.
This paper is organized as follows: Section 2 describes the dynamic characteristics
of the evaporator. Section 3 solves the self-tuning PID parameters based on genetic
algorithm. Section 4 presents the simulation studies on the evaporator and compares the
self-tuning controller with the Ziegler-Nichols PID controller. Section 5 concludes
the paper.


2 Modeling of Heat Exchanger


In an ORC system, the organic working fluid enters the heat exchanger in sub-cooled
liquid phase and exits as superheated vapor, so the heat exchanger shown as figure 1
can be described by three zones characterized by sub-cooled liquid, saturated mixture
and superheated vapor.

Fig. 1. Schematic of the heat exchanger

Nomenclature
A — area (m²); P — pressure (Pa); L — length (m); h — specific enthalpy (J/kg);
D — diameter (m); ρ — density (kg/m³); T — temperature (°C); Z — length coordinate (m);
v — velocity (m/s); ṁ — mass flow rate (kg/s); c_p — heat capacity (J/(kg·°C));
α — heat transfer coefficient (W/(m²·°C)).
Subscripts: r — working fluid; w — wall; a — gas; i — inlet or inner; o — outlet or outer;
1 — sub-cooled; 2 — two-phase; 3 — superheated; s — steady state.
The moving boundary method is one way to investigate the dynamic characteristics
of the evaporator. The evaporator is divided into zones and separated by boundaries.
Several assumptions must be made in order to simplify the model.
(1) The evaporator is a long, thin, horizontal tube;
(2) The working fluid is mixed adequately and the working fluid flowing through the
evaporator tube can be modeled as a one-dimensional fluid flow;

(3) The pressure drop along the evaporator tube, caused by momentum change in
working fluid and viscous friction, is negligible;
(4) Axial heat conduction in the working fluid as well as in the pipe wall is
negligible.
The pressure loss is assumed negligible, so the momentum balance is superfluous.
The governing equations, derived from the mass and energy conservation principles,
are represented by [3]:
∂(Aρ)/∂t + ∂ṁ/∂z = 0    (1)

∂(ρAh − AP)/∂t + ∂(ṁh)/∂z = πD_i α_i (T_w − T_r)    (2)

(c_{p,w} ρ_w A_w) ∂T_w/∂t = πD_i α_i (T_r − T_w) + πD_o α_o (T_a − T_w)    (3)
Equations (1)-(3) are integrated along the axial coordinate for each of the three re-
gions to generate moving boundary equations.
Selection of independent state variables is one of the key issues of thermodynamic
modeling design. The integrated equations can describe the dynamics of an evaporator
in terms of seven state variables: the interface positions L1, L2, the working fluid
pressure P, the outlet enthalpy ho, and the mean wall temperatures Tw1, Tw2, Tw3.
Define the vector of state variables as x = [L1, L2, P, ho, Tw1, Tw2, Tw3]^T and the
vector of control inputs as u = [ṁi, hi, ṁo, va]^T. The compact state space form [3] is
as follows:

ẋ = D⁻¹ f(x, u)    (4)

This model has been reduced to a compact, lumped-parameter form; even the linear
model around an operating point can also be obtained as follows:

δẋ = A δx + B δu    (5)

where A = D⁻¹ ∂f(x,u)/∂x |_(xs, us) and B = D⁻¹ ∂f(x,u)/∂u |_(xs, us).
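As a rough, illustrative sketch (not the authors' code), the Jacobians in Eq. (5) can be approximated numerically by central finite differences around the steady state; here the function f and the matrix D are placeholders standing in for the moving boundary model, which is assumed to be available as a function returning f(x, u) as a NumPy array.

import numpy as np

def linearize(f, D, x_s, u_s, eps=1e-6):
    """Approximate A = D^-1 * df/dx and B = D^-1 * df/du of eq. (5) at (x_s, u_s)."""
    n, m = len(x_s), len(u_s)
    dfdx, dfdu = np.zeros((n, n)), np.zeros((n, m))
    for j in range(n):                     # perturb each state variable
        dx = np.zeros(n); dx[j] = eps
        dfdx[:, j] = (f(x_s + dx, u_s) - f(x_s - dx, u_s)) / (2 * eps)
    for j in range(m):                     # perturb each control input
        du = np.zeros(m); du[j] = eps
        dfdu[:, j] = (f(x_s, u_s + du) - f(x_s, u_s - du)) / (2 * eps)
    D_inv = np.linalg.inv(D)
    return D_inv @ dfdx, D_inv @ dfdu      # A and B of the linear model (5)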
It is significant to maintain an appropriate superheated temperature at the outlet of
the evaporator. The efficiency of the evaporator becomes lower with higher super-
heated temperature, since shorter section is used for evaporation and the energy is
wasted. However, if the superheated temperature is too small, there may be some liquid
entering the turbine.

3 Genetic Algorithms for PID Tuning

3.1 Discrete PID Controller

The discrete PID control is usually implemented as follows:



u(k) = K_p e(k) + K_i Σ_{j=0}^{k} e(j) + K_d [e(k) − e(k−1)]    (6)

where K_p, K_i and K_d are the proportional, integral and derivative coefficients,
respectively, e(k) is the deviation between the set point and the controlled temperature,
and u(k) is the manipulated variable.
The goal of PID controller design is to determine the gains K p , K i and K d to im-
prove the transient response by reducing the overshoot and shortening the settling time
of the system.
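A minimal, illustrative sketch of the control law in Eq. (6) is given below (not the authors' implementation); it keeps the running error sum and the previous error between calls and omits practical details such as anti-windup and output limits.

class DiscretePID:
    """Discrete PID law of eq. (6): u(k) = Kp*e(k) + Ki*sum_j e(j) + Kd*(e(k) - e(k-1))."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e_sum = 0.0      # accumulated error (integral term)
        self.e_prev = 0.0     # previous error (difference term)

    def step(self, setpoint, measurement):
        e = setpoint - measurement
        self.e_sum += e
        u = self.kp * e + self.ki * self.e_sum + self.kd * (e - self.e_prev)
        self.e_prev = e
        return u              # manipulated variable u(k)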

3.2 Self-tuning PID Controller


The practical evaporator usually operates in a wide range of operating conditions. In
addition, it is difficult to build an exact, simplified and control-oriented model for
operating evaporator. Therefore, the self-tuning PID controller is needed to ensure
optimal performance of the controlled process.
Genetic algorithm is a kind of optimal search algorithm and can be applied to tune the
parameters of PID controllers. The structure of the control system is shown in figure 2. ξ1
and ξ 2 are process disturbances and measurement disturbances respectively.

Fig. 2. Structure of PID controller based on genetic algorithm

The steps of the genetic algorithm are stated as follows [7]:


(1) Set the range and coding length of the PID parameters according to the precision
request and experiences, then code these parameters.
(2) Generate initial population within boundaries randomly, which is comprehensive
so local minima could be avoided.
(3) Select the fitness function. To obtain satisfactory dynamic characteristics of the
transition process, integral of absolute error, rise time and controller outputs are se-
lected as the performance criteria. The objective function of the optimization is:

J = ∫₀^∞ ( w₁|e(t)| + w₂ u²(t) ) dt + w₃ t_u    (7)

where tu is rise time, w1 , w2 and w3 are weighting coefficients, u (t ) is the output of


controller and e(t ) is the system error.

To avoid a big overshoot, we revise the objective function as follows:

J = ∫₀^∞ ( w₁|e(t)| + w₂ u²(t) + w₄|δy(t)| ) dt + w₃ t_u    (8)

where w₄ is a weighting coefficient with w₄ >> w₁, and δy(t) = y(t) − y(t−1). The fitness
function is f = 1/J.
The optimization process finds the individual with the largest fitness value subject to
the constraint conditions.
(4) Operate on the current population using reproduction, crossover and mutation in
order to generate a new population.
i) Reproduction is based on the principle of survival of the fittest. The selection
probability of each individual depends on its fitness value: the bigger the fitness value
is, the more offspring the individual reproduces.
ii) Crossover plays a critical role in genetic algorithms; it exchanges information
between chromosomes in the population and produces new chromosomes. Two
chromosomes are randomly selected from the population and, governed by the
crossover rate Pc, subsections of these two chromosomes are swapped from a randomly
chosen crossover point.
iii) Mutation changes the structure of the string by changing the value of a randomly
chosen bit. This operator can prevent individuals from falling into a local optimum.
(5) Repeat steps (3) and (4) until the prescribed requirement is satisfied (a simplified
sketch of this tuning loop follows).
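The sketch below is a simplified, illustrative version of this tuning loop (Python rather than the authors' implementation). It uses real-valued genes and elementary crossover/mutation operators for brevity; the function simulate(gains), assumed here, runs the closed loop for a candidate [Kp, Ki, Kd] and returns the error, control and output trajectories together with the sampling interval and rise time needed to evaluate Eq. (8).

import random

def cost_J(e, u, y, dt, t_rise, w=(1.0, 1.0, 1.0, 100.0)):
    """Discrete approximation of the objective function of eq. (8)."""
    w1, w2, w3, w4 = w
    J = w3 * t_rise
    for k in range(1, len(e)):
        J += (w1 * abs(e[k]) + w2 * u[k] ** 2 + w4 * abs(y[k] - y[k - 1])) * dt
    return J

def ga_tune_pid(simulate, pop_size=30, generations=100, pc=0.9, pm=0.01):
    """Real-coded GA sketch implementing steps (1)-(5); fitness f = 1/J."""
    pop = [[random.random() for _ in range(3)] for _ in range(pop_size)]   # [Kp, Ki, Kd] in [0, 1]
    for _ in range(generations):
        fitness = [1.0 / cost_J(*simulate(ind)) for ind in pop]
        ranked = sorted(zip(fitness, pop), key=lambda t: t[0], reverse=True)
        new_pop = [list(ranked[0][1])]                                     # keep the best individual
        while len(new_pop) < pop_size:
            pa, pb = random.choices(pop, weights=fitness, k=2)             # fitness-proportional selection
            child = [(x + y) / 2.0 for x, y in zip(pa, pb)] if random.random() < pc else list(pa)
            child = [min(1.0, max(0.0, g + random.gauss(0, 0.05)))         # bounded Gaussian mutation
                     if random.random() < pm else g for g in child]
            new_pop.append(child)
        pop = new_pop
    fitness = [1.0 / cost_J(*simulate(ind)) for ind in pop]
    return max(zip(fitness, pop), key=lambda t: t[0])[1]                   # tuned [Kp, Ki, Kd]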

4 An Illustrative Example
In the simulation, the size of the population is 30, the number of generations is 100,
the crossover probability is Pc = 0.9 and the mutation probability is Pm < 0.01. The
weights in equation (8) are w₁ = 1, w₂ = 1, w₃ = 1 and w₄ = 100. The searching ranges
of the PID parameters are set to K_p ∈ [0,1], K_i ∈ [0,1] and K_d ∈ [0,1].
R245fa is adopted as the organic working fluid. The initial steady-state condition is
considered: evaporation pressure P = 2 MPa, working fluid mass flow ṁ = 3.72 kg/s,
evaporator outlet temperature Tout = 137.6 °C, and velocity of the low-quality exhausts
va = 4.03 m/s. The set point of the outlet temperature of the evaporator is Tset = 140 °C.
Figure 3 shows the objective function J in equation (8) is convergent when PID pa-
rameters are optimized using genetic algorithm. It can be shown from figure 4 that the
self-tuning PID controller is better than Ziegler-Nichols PID controller ( K p = 0.24 ,
K i = 0.015 ,K d
= 0.06 ), the step response with self-tuning PID controller has smaller
overshoot and settling time. The self-tuning PID controller is more complex compared
with the Ziegler-Nichols PID controller, but the proposed algorithm can regulate pa-
rameters online. Figure 5 demonstrates the profile of the manipulated variable corre-
sponding to self-tuning PID controller, the velocity of the exhausts changes with small
chatter due to the disturbances existed in the closed loop control system.

Fig. 3. The objective function J

Fig. 4. Step response of evaporator outlet temperature

Fig. 5. The velocity of the exhausts




Fig. 6. Response of outlet temperature with disturbance

The disturbance is imposed by varying the velocity of the low-quality exhausts by
Δva = +1 m/s at t = 30 s, lasting 0.5 s. The simulation result shown in figure 6
demonstrates that the outlet temperature can track the set point quickly.

5 Conclusions
In this paper, the self-tuning PID controller based on genetic algorithm has been em-
ployed to control the outlet temperature of the evaporator in a waste heat utilizing
system. Simulation results demonstrate that the self-tuning controller outperforms
Ziegler-Nichols PID controller. Therefore genetic algorithm is a reasonable and effec-
tive method to optimize the parameters of PID controllers.

Acknowledgement
This work was supported by the China National Science Foundation under Grant
(60974029) and National Basic Research Program of China under Grant (973 Program,
2011CB710706). These are gratefully acknowledged.

References
1. Wei, D.H., Lu, X.S., Lu, Z., Gu, J.M.: Dynamic Modeling and Simulation of an Organic
Rankine Cycle (ORC) System for Waste Heat Recovery. J. Applied Thermal Engineering 8,
1216–1224 (2008)
2. Jensen, J.M., Tummescheit, H.: Moving Boundary Models for Dynamic Simulation of
Two-Phase Flows. In: 2nd International Modelica Conference, Oberpfaffenhofen, pp.
235–344 (2002)
3. He, X.D.: Dynamic Modeling and Multivariable Control of Vapor Compression Cycles in
Air Conditioning System. Ph.D. Thesis, Massachusetts Institute of Technology, Department of
Mechanical Engineering (1996)

4. Gruhle, W.D., Isermann, R.: Modeling and Control of a Refrigerant Evaporator. In:
American Control Conference, Darmstadt, pp. 234–240 (1985)
5. Lin, G.H., Liu, G.F.: Tuning PID Controller Using Adaptive Genetic Algorithms. In: 5th
International Conference on Computer Science & Education, pp. 519–523. IEEE Press,
Hefei (2010)
6. Singh, R., Sen, I.: Tuning of PID Controller Based AGC System Using Genetic Algorithms.
In: 2004 IEEE Region 10 Conference, TENCON 2004, pp. 531–534. IEEE Press, Bangalore
(2004)
7. Liu, J.K.: Advanced PID Control MATLAB Simulation, 2nd edn. Electronic Industry Press,
Beijing (2006)
Test Research on Radiated Susceptibility of Automobile
Electronic Control System

Shenghui Yang, Xiangkai Liu, Xiaoyun Yang, and Yu Xiao

Academy of Military Transportation, Tianjin, China

Abstract. Research into the influences of electromagnetic radiation on automo-


bile electronic control system (ECS) is of great significance to improve and en-
hance its performance under adverse electromagnetic environment. Therefore,
an analog device of an ECS of a certain typed automotive engine is devised, a
radiated susceptibility (RS) test is conducted, and the results are then analyzed.
The test results indicate that the sensitive frequencies of this ECS lie in the HF (high
frequency) and VHF (very high frequency) bands, and when the field strength of elec-
tromagnetic radiation within the sensitive bandwidth grows to a certain degree,
there will be signal derangement, signal loss and even damage to the electronic
components of the ECS.

Keywords: Automobile, Electronic Control System, Electromagnetic Radia-


tion, Radiated Susceptibility (RS).

1 Introduction
With the development of the electronic technology, the frequency of many circuits
and electronic equipment keeps increasing while the operation voltage keeps decreas-
ing. Consequently, the susceptibility and vulnerability to EMP (electromagnetic
pulse) tend to be increasing as well [1]. What’s more, the electronic system has a wide
application of integrated circuits which is relatively sensitive to EMP. Therefore, a
high strength EMP will lead to codes error and memory loss of LSI (Large Scale
Integrated) circuit, and will even result in invalidation or burnout of electronic de-
vices. The facts mentioned above constitute a great threat to the normal operation of
modern automobile which is characterized by the mass application of ECS. Hence it
has become extremely necessary and exigent to conduct the research into the influ-
ence of electromagnetic radiation on automobile ECS [2].
In order to assure the electromagnetic compatibility and susceptibility of military
materiel, a series of military standards have come into being, such as GJB151, GJB152,
MIL-STD-461 and MIL-STD-464. Take GJB151A [3] for example: it provides nine-
teen electromagnetic emission and susceptibility requirement items, including three
RSs (RS101, RS103 and RS105). And MIL-STD-461F [4] provides eighteen emis-
sion and susceptibility requirement items, also including three RSs (RS101, RS103
and RS105). According to these standards, electromagnetic compatibility, interference
and susceptibility design and testing are carried out before the batch production of a
specific automobile. In fact, such tests are subject to limits both in field strength and in
frequency range. There is much evidence indicating that,


the main threat to the automobile in a modern battlefield may be high strength EMP.
Because of the lack of tests on the behavior of an automobile in a high strength EMP
environment, we do not know whether the ECS of an engine can withstand high
strength EMP or not; and if an influence exists, how and to what extent the ECS will
be affected.
In this paper, a special RS test has been designed and carried out, and the test results
have been analyzed. Through the test, the influences of high strength EMP on the
automobile ECS have been identified, the susceptibility and vulnerability to EMP have
been researched, and the damage characteristics and laws under the EMP environment
have been probed. This can provide not only technical support for decreasing the
vulnerability of the automobile ECS, but also design rules for battlefield damage
assessment and repair.

2 Test Design and Result

2.1 Designing of Test Device

A subsystem of a typical automobile engine ECS was selected for the RS test.
Because of the limits of the test devices and space, an analog device was designed
that can simulate the actual operation of this subsystem of the ECS. The device
consists of an ECU (electronic control unit), a mass air flow sensor, a temperature
sensor, two electromagnetic injectors, an ignition controller, a storage battery, a
signal generator and a blower (Fig. 1).

Fig. 1. The test device



Each element of the device is from the original automobile except the signal generator
and the blower, and the elements are connected by the control wiring harness from the
original automobile. The blower is adopted to simulate the inlet condition of the engine,
and the signal generator is applied to simulate the speed signal of the engine; thus the
electronic control system can operate independently.

2.2 Designing of Test System and Parameters

According to GJB151A and GJB152A [5], and with reference to MIL-STD-461F and
MIL-STD-464C [6], a full frequency range electromagnetic radiation test from 10 kHz
to 18 GHz was designed. The test is divided into two categories: a GTEM cell radiation
test covering 10 kHz-1 GHz and a shielded enclosure radiation test covering
1 GHz-18 GHz. The band classification and scanning rates are given in Table 1; a short
sketch of how these stepped scans translate into discrete test frequencies follows the table.
The test setup is shown in Fig. 2 and Fig. 3.

Table 1. Test frequency scanning range and step size

Frequency range Stepped scans, step size Residence time


10kHz-80MHz 0.0025f0 1s
80MHz-1GHz 0.0025f0 1s
1GHz-2.5GHz 0.001f0 1s
2.5GHz-7.5GHz 0.001f0 1s
7.5GHz-18GHz 0.0005f0 1s

Fig. 2. Test setup for 10 kHz-1000 MHz (signal generator, power amplifier, coupler and power
meter feeding the GTEM cell; the EUT and its power supplies sit inside the cell, with a field
intensity meter and monitoring device linked to a portable computer over optical fiber)



Fig. 3. Test setup for 1-18 GHz (signal generator and power amplifier driving an antenna inside
a shielded enclosure; the EUT, its power supply, a field intensity meter and a monitoring device
are inside the test setup boundary, linked to a portable computer over optical fiber)

2.3 Test Results

a) Fig. 4 describes the normal waveform of the system.


b) Over the whole band of 10 kHz-18 GHz, the ECS is sensitive to 4 MHz-260 MHz,
i.e., the HF (high frequency) and VHF (very high frequency) ranges, which is related
to the fact that the CPU crystal frequency in the ECU of the electronic control system
is 16 MHz.
c) When the testing frequency is 80 MHz and the field strength is 20 V/m, derangement
of the fuel injection signal emerges in the ECS (see Fig. 5a); when the field strength is
raised to 32 V/m, there is derangement of the fuel injection signal and an apparent
decrease in the amplitude of the ignition signal (see Fig. 5b).

Fig. 4. EUT signal features under normal conditions (injection, rotary and ignition signals)



a. Field strength 20V/m b. Field strength 32V/m

Fig. 5. EUT Signal Feature (80MHz)

a. Field strength 50V/m b. Field strength 67V/m

Fig. 6. EUT Signal Feature (120MHz)

d) When the testing frequency is 120 MHz and the field strength is 50 V/m, there is
derangement of the fuel injection signal, a decrease in the amplitude of the ignition
signal and the phenomenon of misfire (see Fig. 6a); when the field strength is increased
to 67 V/m, the ignition and fuel injection signals disappear (see Fig. 6b); and when the
field strength is then gradually reduced to 55 V/m, the ignition and fuel injection signals
are restored to the interfered state.
e) When the testing frequency is 32 MHz and the field strength reaches 53 V/m, the
ignition and fuel injection signals disappear, and when the field strength exceeds
200 V/m, damage to ECU devices occurs.

3 Analysis of Test Results

3.1 Cause Analysis of Derangement of Fuel Injection Signal

When penetrating electrical devices, the energy of an EMP imposes adverse effects
on the equipment, which usually appear in two aspects: one is functional damage to
the electronic installations, and the other is malfunction of the ECS [7].
System malfunction refers to a state of temporary operating interruption of the
devices under the impact of EMP. Generally, system malfunction is divided into two
cases, one of which is that transient interference generated by the EMP impact
appears on one input point of the circuit while the other input points remain at their
original levels and the output is temporarily changed. Under this circumstance, the
interfering signal produced during the electromagnetic transient passes through the
amplifying circuit and is then mistakenly regarded as a control signal, which in turn
leads to the derangement of the ignition and injection signals. Since the input wire,
DC power cord and grounding line are usually made of common conductors without
shielding, the system is extremely sensitive to such interference and is vulnerable to
operational malfunction [6].

3.2 Cause Analysis of Signal Disappearance

The main reason for signal disappearance is the repeated reset or crash of the
microcontroller.
The microcontroller is the core component of the electronic control system.
Therefore, a watchdog circuit is usually set up to avoid abnormal behavior of the
microcontroller program under electromagnetic interference and to restore the
program to its normal condition as soon as possible. But when the amplitude and
frequency of the interference reach a certain degree, repeated resets or a crash of the
microcontroller are likely to occur, and their external feature is the disappearance of
the signals. When the amplitude of the interference signal decreases, the system
returns to the interfered state that preceded the signal disappearance.
After a great number of repeated experiments and an intensive analysis of the
operating principle of the microcontroller, we have come to the following conclusion
[7]: under EMP irradiation, the microcontroller may be reset unintentionally, for
which one reason is that an interference signal on the RST pin is mistaken for a reset
signal, and the other is that an interference signal on the reset signal line of the CPU
directly resets the microcontroller. For a guaranteed reset of the microcontroller,
there must be a high level lasting no less than two machine cycles on the RST pin,
and that high level should last at least 2 μs when the crystal frequency is 12 MHz [8].
Although the width of the positive or negative pulses of the interference signal is
usually less than 2 μs, which seems insufficient for a reset, that condition is only the
requirement for a guaranteed reset. In other words, the reset circuit inside the CPU
samples the state of RST at S5P2 of each machine cycle; therefore, if the sampled
RST level is high in two successive samplings, the CPU can be reset as well.
Furthermore, since the duration of the interference signal on the RST pin is close to
2 μs, the possibility of sampling a high level on the RST pin in two successive
samplings still exists.

The crash problem refers to a state in which an "endless loop" program is running in
the computer system, and only by pressing the reset key can the endless loop be
escaped.
To be specific, a crash is caused by an abnormal jump of the computer program,
whose jumping point is random. When the program jumps to a non-initial byte of an
instruction, the latter part of that instruction may be combined with the former part of
the following instruction and executed as a single instruction. Consequently, such a
series of instructions as a whole becomes a new, unrecognizable program which may
evolve into an endless loop [9]. Alternatively, an abnormal jump of the program will
not necessarily give rise to a crash, because there is little chance of a crash when the
program jumps to the initial byte of an instruction. It is the alteration of the PC
(program counter) content inside the CPU that leads to the abnormal jump of the
program. EMP can easily be coupled onto the data bus and into the CPU by means of
front-door or back-door coupling; thus the PC value is changed and the program
crashes.

3.3 Cause Analysis of Component Damage

The test research indicates that various hazardous electromagnetic sources exert their
influences mainly by means of conductive or radiative coupling of the energy. The
EMP damage mechanisms of the electrical and electronic control systems can be
summarized in the following three aspects:
a) Thermal Effect
The EMP thermal effect is an adiabatic process that generally takes place within
nanoseconds or microseconds. This effect gives rise to overheating of microelectronic
devices and electromagnetically sensitive circuits, leads to burnout of the metal bars in
input protection resistances and of CMOS (Complementary Metal Oxide Semiconductor)
structures, and ultimately results in the functional degradation or invalidation of the
circuit [10].
b) Interference and Surge Effect
RFI generated by EMP creates electrical noise, EMI (electromagnetic interference),
malfunction or functional invalidation in the electronic circuit. Besides, the transient
overvoltage or surge effect produced by EMP also brings forth hard damage, mainly
reflected in phenomena such as short circuits, open circuits, PN junction breakdown
and oxide breakdown of semiconductor devices.
c) High Electric Field Effect
A high electric field can not only bring about dielectric breakdown of MOS gate oxides
or between wires, and hence circuit invalidation, but can also adversely affect the
operating reliability of sensitive devices.

4 Conclusions
It has been found through the RS test within band of 10 kHz-18GHz that the elec-
tronic control system is sensitive to the band of 4MHz-260MHz, the reason for which
is the fact that the CPU crystal frequency in ECU of electronic control system is
16MHz. And when the field strength of electromagnetic radiation within sensitive
bandwidth rises to a certain degree, there is signal derangement, signal loss and even
damage to electronic components of the ECS.

Finally, because the test is not fully complete and thorough, and the measurement of
the threshold values and the analysis of the phenomena mentioned above are not yet
exact and perfect, deeper and further research remains to be carried out.

References
1. Shuzhong, W., Zhenxin, Z.: The Effects of EMP on Electronic Circuits and its Protection.
Electronics Quality (8), 75–77 (2008)
2. Shenghui, Y.: The Measurements of Automobile Equipment Support in the Complex Elec-
tromagnetic Environment. Journal of Academy of Military Transportation 11(4), 32–35
(2009)
3. GJB151A. Electromagnetic Emission and Susceptibility Requirements for Military
Equipment and Subsystems, pp. 4–5. Publishing House of Defense Industry, Beijing
(1997)
4. MIL-STD-461F. Requirements for the Control of Electromagnetic Interference Character-
istics of Subsystems and Equipment, pp. 25–26. U.S. Government Printing, Washington
(2007)
5. GJB152A. Measurement of Electromagnetic Emission and Susceptibility for Military
Equipment and Subsystems, pp. 1–87. Publishing House of Defense Industry, Beijing
(1997)
6. MIL-STD-464C. Electromagnetic Environmental Effects Requirements for Systems, pp.
1–156. U.S. Government Printing, Washington (2010)
7. Shenghui, Y.: Study on Irradiation Effects of Nuclear Electromagnetic Pulse to Vehicle
Equipment Electronic Control System. Journal of Academy of Military Transporta-
tion 12(4), 46–51 (2010)
8. Guangdi, L.: Fundamentals of Single-chip Microcomputer, pp. 20–24. Press of Beijing
University of Aeronautics and Astronautics, Beijing (2006)
9. Huanxiang, L.C.: Software Anti-interference Design of Monolithic Application System.
Wuhan Uni. Of Sci. & Tech. 23(2), 193–195 (2000)
10. Shanghe, L.: Electromagnetic Environment Effect and its Development Trends. National
Defense Science and Technology 29(1), 4–6 (2008)
Forgeability Attack of Two DLP-Based Proxy
Blind Signature Schemes

Jianhong Zhang1 , Fenhong Guo1 , Zhibin Sun1 , and Jilin Wang2


1
College of Sciences, North China University of Technology,
Beijing 100041, P.R.China
[email protected]
2
College of Information, Zhejiang University of Finance and Economics,
Hangzhou 31008, China

Abstract. Proxy blind signature is a cryptographic technique that allows a proxy
signer to produce a proxy blind signature on behalf of an original signer, in such a
way that the authority learns nothing about the message being signed. Unforgeability
is an important property of digital signatures. In this work, we analyze the security of
two DLP-based proxy blind signature schemes and show that both schemes are
insecure: they are universally forgeable, in other words, anyone is able to forge a
proxy blind signature on an arbitrary message. We also analyze the reason why such
an attack is possible. Finally, the corresponding attack is given.

Keywords: proxy blind signature, unforgeability, attack, discrete loga-


rithm problem.

1 Introduction
In traditional digital signature schemes, the binding between a user and his public
key needs to be ensured. The usual way to provide this assurance is by certificates
signed by a trusted third party, namely public key certificates. As a consequence, the
system requires large storage and computing time to store and verify each user's
public key and the corresponding certificate. In 1984, Shamir [2] introduced the
concept of identity-based public key cryptosystems to simplify key management
procedures of the certificate-based public key setting. In an ID-based mechanism, the
user's public key is simply his identity (such as an email or IP address). Since then,
various ID-based encryption and signature schemes have been proposed. At present,
many ID-based encryption and signature schemes have been proposed based on
bilinear pairings over elliptic or hyper-elliptic curves. The signature size is in general
short in these schemes.
The notion of blind signature was introduced by D. Chaum [4]; it provides
anonymity for the signed message. Since it was introduced, blind signature schemes
[4,5,6,7,8,9,10] have been used in numerous applications, most prominently in
anonymous voting and anonymous e-cash. At the same time, to meet practical
demands, many variants of blind signatures have appeared, such as partial blind
signatures and group blind signatures.
Informally, a blind signature allows a user to obtain signatures from an authority
on any document in such a way that the authority learns nothing about the message
being signed. The most important property of blind signatures is unforgeability,
which requires that it is impossible for any malicious user that engages in k runs of
the protocol with the signer to obtain strictly more than k valid message-signature
pairs. The basic idea of most existing blind signatures is that the requester randomly
chooses some random factors and embeds them into the message to be signed. The
random factors are kept secret so the signer cannot recover the message. Upon
receiving the blinded signature returned by the signer, the requester can remove the
random factors to obtain a valid signature. Up to now, two ID-based blind signature
schemes based on bilinear pairings have been proposed: the first was proposed by
Zhang and Kim [12] at Asiacrypt 2002, and the other was proposed at ACISP 2003.
The notion of a proxy signature scheme was introduced by Mambo et al. in 1996
[15]. A proxy signature scheme allows an entity, called the original signer, to
delegate his signing capability to one or more entities, called proxy signers. Since
they were proposed, proxy signature schemes have been suggested for use in many
applications [16,17,18], particularly in distributed computing where delegation of
rights is quite common. Examples discussed in the literature include distributed
systems, Grid computing, mobile agent applications, distributed shared object
systems, global distribution networks, and mobile communications. To adapt to
different situations, many proxy signature variants have been produced, such as
one-time proxy signatures, proxy blind signatures, multi-proxy signatures, and so on.
Since the proxy signature appeared, it has attracted great attention from researchers.
Proxy signatures and blind signatures have their respective advantages. In some
real situations we must apply both concurrently, for example in anonymous proxy
electronic voting. The first proxy blind signature was proposed by Lin and Jan [3] in
2000. Later, Tan et al. [5] proposed a proxy blind signature scheme. However, in
2003, Lal et al. [8] pointed out that Tan et al.'s scheme was insecure and proposed a
new proxy blind signature scheme based on Mambo et al.'s scheme [6]. In 2004,
Wang et al. [10] demonstrated that Tan's scheme was insecure and proposed two
effective attacks. In 2005, Sun et al. [11] showed that Tan et al.'s schemes did not
satisfy the unforgeability and unlinkability properties, and they also pointed out that
Lal's scheme [8] did not possess the unlinkability property either. In 2004, Xue and
Cao showed that there exists a weakness in Tan et al.'s scheme [4] and Lal et al.'s
scheme [8], since the proxy signer can get the link between the blind message and the
signature or plaintext with great probability. In 2007, Li et al. [11] proposed a proxy
blind signature scheme using a verifiable self-certified public key and compared its
efficiency with Tan et al. [5]. Recently, Yang et al. proposed a new scheme [12] and
showed that their scheme is more efficient than Li et al.'s [11]. Based on Yang et al.'s
proxy blind signature, Kar et al. and Nway Oo et al. proposed novel proxy blind
signatures in [1] and [2], respectively. In this paper, we analyze Kar et al.'s scheme
and Nway Oo et al.'s scheme and show that the two schemes are insecure against
unforgeability attacks: they are universally forgeable, in other words, anyone is able
to forge a proxy blind signature on an arbitrary message. We also analyze the reason
why such an attack is possible. Finally, the corresponding attack is given.
The rest of the paper is organized as follows: Section 2 gives some preliminary
knowledge related to the paper; Section 3 reviews Kar et al.'s proxy blind signature
scheme; Section 4 shows the flaw of Kar et al.'s scheme; Section 5 analyzes the
security of Nway Oo et al.'s scheme; finally, Section 6 concludes the paper.

2 Preliminaries
In this section, we will review security requirements of proxy blind signature.
– Distinguishability: The proxy blind signature must be distinguishable from
a normal signature.
– Non-repudiation: Neither the original signer nor the proxy signer can sign on
behalf of the other party. This means that they cannot deny their signatures
against anyone.
– Verifiability: The verifier should be able to verify the proxy signature in a
similar way to the verification of the original signature.
– Unforgeability: Only the designated proxy signer can create a valid proxy
signature for the original signer (even the original signer cannot do it).
– Identifiability: Anyone can determine the identity of the corresponding proxy
signer from a proxy signature.
– Prevention of misuse: It should be confident that proxy key pair should
be used only for creating proxy signature, which conforms to delegation
information. In case of any misuse of proxy key pair, the responsibility of
proxy signer should be determined explicitly.
– Unlinkability: After proxy blind signature is created, the proxy signer knows
neither the message nor the signature associated with the signature scheme.
Definition 1 (Blindness). Let S' be a probabilistic polynomial time algorithm, and let
U0 and U1 be two honest users. U1 and U0 engage in the signature issuing protocol
with S' on messages mb and m1−b, and output signatures δb and δ1−b, respectively,
where b is randomly chosen from {0, 1}. Send (m0, m1, δb, δ1−b) to S', and then S'
outputs b' ∈ {0, 1}. For all such S', U0 and U1, for any constant c, and for sufficiently
large n,

|Pr[b = b'] − 1/2| < n^(−c)

3 Review of Kar et al.'s Proxy Blind Signature Scheme

In the following, we briefly review Kar et al.'s proxy blind signature scheme. The
interested reader is referred to [1] for details.

3.1 System Setup


The following system parameters are described as follows:
– p, q : two large prime numbers such that q|p − 1.
– g : an element of Zp∗ whose order is q.
– m : the signed message.
– mw : the designated proxy warrant which contains the identities information
of the original signer and the proxy signer, message type to be signed by the
proxy signer, the delegation limits of authority, valid periods of delegation,
etc.
– xA ∈ Zp : the original signer A's private key.
– xB ∈ Zp : the proxy signer B's private key.
– yA = g^xA mod p : the original signer's public key.
– yB = g^xB mod p : the proxy signer's public key.
– H(·), h(·) : public cryptographically strong hash functions.
– || : concatenation of strings.

3.2 Proxy Designation Phase


To delegate his signing right to the proxy signer, the following steps are carried out.
1. First, the original signer A randomly selects an integer k̂ ∈R Zp.
2. Then the original signer A computes K = g^k̂ mod p and ŝ = xA + k̂·H(mw||K)
mod q.
3. A sends (K, ŝ) along with the warrant mw to the proxy signer B via a secure
channel.
4. The proxy signer B then verifies the equation g^ŝ = yA·K^{H(mw||K)} mod p.
5. If the above verification holds, B accepts the proxy task and further
computes s = ŝ + xB mod q as his proxy blind signature secret key.

3.3 Blind Signing Phase


To make the proxy signer produce a blind signature, the user R and the proxy
signer B execute the following interactive procedure:
1. The proxy signer B randomly chooses a number k ∈R Zp and computes
t = g^{k+xB} mod p.
2. Then B sends (K, t) to the user R.
3. The user R selects two random integers a, b ∈ Zp.
4. The user R computes the following:
r = t^a (yA yB K^{H(mw||K)})^{ab} mod p
e = h(m||r) mod q
e' = a^{-1} e + b mod q
5. If r = 0, R needs to select a new tuple (a, b). Otherwise, R sends e' to B.
6. After receiving e', the proxy signer B computes s' = e'·s + k + xB as the
blinded signature and sends it to the user R.

3.4 Signature Extraction


In this phase, the following steps are executed:

1. After receiving s' from B, the user R computes s = g^{s'·a} mod p.
2. The resultant proxy blind signature on message m is (m, mw, s, e, K).

3.5 Signature Verification


In this phase, given a blind signature (m, mw, s, e, K), the verifier can check the
validity of the proxy blind signature with the following equation:

e = h(m||s·(yA yB K^{H(mw||K)})^{−e}) mod q

4 The Flaw of Kar et al.'s Proxy Blind Signature Scheme

Recently, Kar et al. presented a proxy blind signature scheme based on the DLP and
claimed that their scheme satisfies the important unforgeability property of blind
signatures. Unfortunately, by analyzing the security of the scheme we show that it
does not satisfy unforgeability: anyone can forge a proxy blind signature on an
arbitrary message m.

4.1 Forgeability
Here we show that the scheme is insecure against a forgery attack; namely, anyone
can produce a forged proxy blind signature on behalf of the original signer. The
corresponding attack is as follows.
– Let m' be a forged message.
– To produce a forgery, an adversary randomly chooses k ∈ Zq and computes
e' = h(m'||g^k mod p) mod q.
– Then the adversary sets s' = g^k (yA yB K^{H(mw||K)})^{e'} mod p.
– Finally, the resultant proxy blind signature on message m' is δ' =
(m', mw, s', e', K).
In the following, we show that the generated forged proxy blind signature is valid
and passes the verifier's check. This is true because

h(m'||s'(yA yB K^{H(mw||K)})^{−e'}) = h(m'||(g^k (yA yB K^{H(mw||K)})^{e'})(yA yB K^{H(mw||K)})^{−e'})
                                    = h(m'||g^k)
                                    = e'

Obviously, the forged proxy blind signature δ' = (m', mw, s', e', K) passes the
verification equation. Thus, our attack is valid.
The reason such an attack succeeds is that the form of s in the proxy blind signature
is not fixed, which allows an adversary to choose a suitable form to construct a
forgery.
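To make the attack concrete, the sketch below (Python, with toy group parameters and SHA-256 standing in for the hash functions H and h; both choices are assumptions made purely for illustration) forges a signature using only the public values yA, yB, K and mw, and checks it against the verification equation of Section 3.5.

import hashlib, random

# Toy DLP parameters for illustration only: q | p-1 and g has order q.
p, q = 607, 101
g = pow(2, (p - 1) // q, p)

def H(data: str) -> int:
    """Hash to Z_q (illustrative stand-in for the scheme's hash functions)."""
    return int(hashlib.sha256(data.encode()).hexdigest(), 16) % q

# Key material of the original signer A and proxy signer B, plus A's delegation value K.
x_A, x_B = random.randrange(1, q), random.randrange(1, q)
y_A, y_B = pow(g, x_A, p), pow(g, x_B, p)
m_w = "warrant: B may sign on behalf of A"
K = pow(g, random.randrange(1, q), p)

# Universal forgery: only the public values y_A, y_B, K and m_w are used.
m_forged = "arbitrary message chosen by the adversary"
Y = (y_A * y_B * pow(K, H(m_w + str(K)), p)) % p       # y_A * y_B * K^H(m_w||K)
k = random.randrange(1, q)
e_forged = H(m_forged + str(pow(g, k, p)))             # e' = h(m'||g^k)
s_forged = (pow(g, k, p) * pow(Y, e_forged, p)) % p    # s' = g^k * Y^e'

# Verification as in Section 3.5: recompute h(m'||s' * Y^{-e'}) and compare with e'.
r = (s_forged * pow(Y, (q - e_forged) % q, p)) % p     # Y^{-e'} computed as Y^{q-e'} (order-q subgroup)
assert e_forged == H(m_forged + str(r))
print("forged proxy blind signature verifies:", (s_forged, e_forged))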

5 Security Analysis of Nway Oo et al.'s Scheme

5.1 Nway Oo et al.'s Scheme
In this section, we briefly recall Nway Oo et al.'s proxy blind signature scheme. The
System Setup phase and the Proxy Delegation phase of Nway Oo et al.'s scheme are
the same as those of Kar et al.'s scheme, so in the following we only consider the
Proxy Blind Signing phase and the Verification phase. The interested reader is
referred to [2] for details.

[Signing phase]
Suppose that m is the message to be signed. Then, the execution of the proxy
signer and the user is as follows:
– The proxy signer B selects a random number k ∈R Zq and computes

r = g^k mod p

and sends (K, r) to the user C.
– To obtain the blind signature, the user C randomly chooses two numbers
u, v ∈ Zq* to compute

r* = r·g^u·(yA yB)^{−v} mod p    (1)
e* = h(r*||m) mod q    (2)
e = e* − v mod q    (3)

– If r* = 0, then the user C has to select a new tuple (u, v). Otherwise, the
user sends e to the proxy signer.
– After receiving e, the proxy signer B computes

s* = k + e·s mod q    (4)

and sends s* to the user.

[Extraction Phase]
Upon receiving s*, the user C computes

s = g^{s*+u} · K^{−v·h(mw||K)} mod p

Finally, the resultant proxy blind signature on message m is (m, mw, s, e*, K).



[Verification]. After receiving the proxy blind signature (m, mw, s, e*, K), a verifier
checks:

e* = h(s·(yA yB K^{h(mw||K)})^{−e*}||m) mod q    (5)

If it holds, the verifier accepts it as a valid proxy blind signature.

5.2 Forgeability Attack on Nway Oo et al.'s Scheme

Nway Oo et al. claimed that their scheme satisfies unforgeability; in other words, an
adversary cannot produce a forged proxy blind signature on behalf of the original
signer in the name of the proxy signer. Unfortunately, in the following we show that
Nway Oo et al.'s proxy blind signature scheme is insecure against forgery: it is
universally forgeable, i.e., anyone can forge a proxy blind signature on an arbitrary
message m.
– Let m' be a forged message.
– To produce a forgery, an adversary randomly chooses k ∈ Zq and computes
e* = h(g^k mod p||m') mod q.
– Then the adversary sets s' = g^k (yA yB K^{h(mw||K)})^{e*} mod p.
– Finally, the resultant proxy blind signature on message m' is δ' = (m', mw, s',
e*, K).
In the following, we show that the generated forged proxy blind signature is valid
and passes verification. This is true since

h(s'(yA yB K^{h(mw||K)})^{−e*}||m') = h((g^k (yA yB K^{h(mw||K)})^{e*})(yA yB K^{h(mw||K)})^{−e*}||m')
                                    = h(g^k||m') = e*

Obviously, the forged proxy blind signature δ' = (m', mw, s', e*, K) passes the
verification equation. Thus, our attack is valid.
In Nway Oo et al.'s scheme, the reason such an attack succeeds is again that the
form of s in the proxy blind signature is not fixed, which allows an adversary to
choose a suitable form to construct a forgery.

6 Conclusion
As an important cryptographic technique, the proxy blind signature plays an important
role in secure e-commerce, such as e-cash and e-voting, where unforgeability is an
essential property of a proxy blind signature scheme. In this paper, we give a security
analysis of two DLP-based proxy blind signature schemes [1,2] and show that both
schemes are insecure: they are universally forgeable, in other words, anyone is able to
forge a proxy blind signature on an arbitrary message. How to design a secure proxy
blind signature scheme remains an open problem.

Acknowledgements. I thank the anonymous referees for their very valuable comments on this paper. This work is supported by the New Star Plan Project of Beijing Science and Technology (No. 2007B-001), the Beijing Natural Science Foundation Program, and the Scientific Research Key Program of Beijing Municipal Commission of Education (No. KZ2008 10009005).

References
1. Kar, B., Sahoo, P.P., Das, A.K.: A Secure Proxy Blind Signature Scheme Based
on DLP. In: MINES 2010, pp. 477–480 (2010)
2. Oo, A.N., Thein, N.: DLP based Proxy Blind Signature Scheme with Low-
Computation. In: 2009 The Fifth International Joint Conference on INC, IMS,
and IDC, pp. 285–288 (2009)
3. Lin, W.D., Jan, J.K.: A security personal learning tools using a proxy blind sig-
nature scheme. In: Proc. of Int. Conference on Chinese Language Computing, pp.
273–277 (2000)
4. Chaum, D.: Blind signature for untraceable payment. In: Advances in Cryptology-
Crypto 1982, pp. 199–203. Springer, Heidelberg (1983)
5. Tan, Z.W., Liu, Z.J., Tang, C.M.: A proxy blind signature scheme based on DLP. Journal of Software 14(11), 1931–1935 (2003)
6. Kim, J.-H., Kim, K., Lee, C.S.: An Efficient and Provably Secure Threshold Blind
Signature. In: Kim, K.-c. (ed.) ICISC 2001. LNCS, vol. 2288, pp. 318–327. Springer,
Heidelberg (2002)
7. Wang, S., Bao, F., Deng, R.H.: Cryptanalysis of a Forward Secure Blind Signature
Scheme with Provable Security. In: Qing, S., Mao, W., López, J., Wang, G. (eds.)
ICICS 2005. LNCS, vol. 3783, pp. 53–60. Springer, Heidelberg (2005)
8. Wang, S.H., Wang, G.L., Bao, F., Wang, J.: Cryptanalysis of a proxy blind signa-
ture scheme based on DLP. Journal of Software 16(5), 911–915 (2005)
9. Li, J.G., Wang, S.H.: New Efficient Proxy Blind Signature Scheme Using Verifiable
Self-certified Public Key. International Journal of Network Security 4(2), 193–200
10. Okamoto, T., Inomata, A., Okamoto, E.: A proposal of short proxy signature us-
ing pairing. In: The Proceedings of the International Conference on Information
Technology: Coding and Computing, pp. 631–635 (2005)
11. Pointcheval, D.: Security Arguments for Digital Signatures and Blind Signatures.
Journal of Cryptology 13(3), 361–396
12. Zhang, F., Kim, K.: ID-Based Blind Signature and Ring Signature from Pairings.
In: Zheng, Y. (ed.) ASIACRYPT 2002. LNCS, vol. 2501, pp. 533–547. Springer,
Heidelberg (2002)
13. Zhang, F., Kim, K.: Efficient ID-based Blind Signature and Proxy signature from
Bilinear Pairings. In: Safavi-Naini, R., Seberry, J. (eds.) ACISP 2003. LNCS,
vol. 2727, pp. 312–323. Springer, Heidelberg (2003)
14. Wu, Q., Susilo, W., Mu, Y., Zhang, F.: Efficient Partially Blind Signatures with
Provable Security. In: Gavrilova, M.L., Gervasi, O., Kumar, V., Tan, C.J.K.,
Taniar, D., Laganá, A., Mun, Y., Choo, H. (eds.) ICCSA 2006. LNCS, vol. 3982,
pp. 345–354. Springer, Heidelberg (2006)
15. Mambo, M., Usuda, K., Okamot, E.: Proxy signature: delegation of the power to
sign messages. IEICE Trans. Fundamentals E79-A(9), 1338–1353 (1996)
16. Xu, J., Zhang, Z., Feng, D.: ID-Based Proxy Signature Using Bilinear Pairings.
In: Auer, P., Meir, R. (eds.) COLT 2005. LNCS (LNAI), vol. 3559, pp. 359–367.
Springer, Heidelberg (2005)
17. Zhang, F., Kim, K.: Efficient ID-based blind signature and proxy signature from
pairings. In: Safavi-Naini, R., Seberry, J. (eds.) ACISP 2003. LNCS, vol. 2727, pp.
312–323. Springer, Heidelberg (2003)
18. Shim, K.-A.: An Identity-Based Proxy Signature Scheme from Pairings. In: Ning,
P., Qing, S., Li, N. (eds.) ICICS 2006. LNCS, vol. 4307, pp. 60–71. Springer,
Heidelberg (2006)
Key Cutting Algorithm and Its Variants for
Unconstrained Optimization Problems

Uthen Leeton and Thanatchai Kulworawanichpong

Power System Research Unit, School of Electrical Engineering, Suranaree University of


Technology, Nakhon Ratchasima, Thailand
{uthen.leeton,thanatchai}@gmail.com

Abstract. This paper presents the key cutting algorithm and its variants. This
algorithm emulates the work of locksmiths to defeat the lock. The best key that
matches a given lock is pretended to be an optimal solution of a relevant opti-
mization problem. The basic structure of the key cutting algorithm is as simple
as that of genetic algorithms in which a string of binary numbers is employed as
a key to open the lock. In this paper, four variants of the predecessor are pro-
posed. The modification is mainly in the key cutting selection. Various criteria
of the key cutting probability are added in order to improve the searching speed
and the solution convergence. To evaluate their use, four standard test functions
are challenged, and the best solutions obtained from the key cutting variants are compared with those obtained from genetic algorithms.
The results confirm the effectiveness of the key cutting and its variants to solve
the unconstrained optimization problems.

Keywords: Key cutting algorithm, Lock smithing, Genetic algorithms, Uncon-


strained optimization.

1 Introduction
Locksmithing is the science and art of making and defeating a lock. The lock is a clas-
sical mechanism to secure building, rooms, cabinets, storage facilities, etc. A key is a
tool used to open the lock. A “smith” of any kind is one who shapes metal pieces. The
locksmithing as its name implies is the assembly and designing of locks and their re-
spective keys [1]. Although locksmiths actually make entire locks to maintain the security of homes, businesses, automobiles and so on, people often encounter the daily situation of losing their keys, and locksmiths can help them open their locks. Lock picking [2] is an essential skill of locksmiths for opening a lock without the correct key while not damaging the lock. There are various techniques to pick different types of locks. The simplest one starts with a blank key and uses the following method to obtain a functioning key that opens the lock. A hook pick is inserted into the lock, and thereby the number and exact locations of the key teeth become known. Tooth adjustment of the initial key blank is then applied until a key that can open the lock is found.
In November 2009, Jing Qin introduced the key cutting algorithm, which emulates the lock picking work of locksmiths to open a lock [3]. The algorithm is simple to understand and to implement. In that paper, a 9-number puzzle and a quadratic


function of a single variable were used for test. The results were satisfactory but li-
mited. In our further work, this key cutting algorithm always fails when the number of
control variables is equal or greater than two. With respect to the original key cutting
algorithm, some modifications are made in order to improve the performance of the
algorithm suitable for unconstrained optimization problems. In this paper, Section 2
will give a useful description of the original key cutting algorithm while its variants
are illustrated in Section 3. Section 4 shows test results and discussion. The last sec-
tion, Section 5 is the conclusion.

2 Key Cutting Algorithm


Jing Qin [3] described an algorithm to emulate the locksmith's work of key cutting, giving nine definitions and a seven-step procedure to find an optimal solution, as follows.

2.1 Definitions

Definition 1: Lock
The lock is defined as an objective function of unconstrained optimization problem. It
requires a solution that is called a “key” to open the lock.
Definition 2: Key
A key is one possible solution to a given objective function.
Definition 3: Key Tooth
A set of key teeth is a binary string representing an encoded key as in Fig. 1.
Definition 4: Key Set
A key set is a collection of possible keys to open the lock, like a collection of keys on a key ring.
Definition 5: Key Fitness
The key fitness represents a degree of the key and lock matching. The key with a
higher fitness is more suitable to fit the lock.
Definition 6: Similarity
The degree of similarities among all keys in a key set can reveal a correct tooth loca-
tion of the respective key to the lock.
Definition 7: Key Cutting
Key cutting is a step to adjust one tooth on a key or to change one bit of a string.
Definition 8: Key Cutting Probability
Key cutting probability is the probability to control the variation of one tooth of a bit
string. It can be calculated based on the similarity of the key set.
Definition 9: Key Selection (Key Picking)
Choose a subset from a key set to create a new key set in the next iteration.

Fig. 1. Key encoding [3]



2.2 Procedure of Basic Algorithms

Assume a key as k = [Sn, Sn-1,…,S1]


Step 1: Randomly generate an initial key set of 2M keys.
Step 2: Encode all keys in the key set into binary strings.
Step 3: Evaluate the fitness of all keys in the generated key set.
Step 4: Select a half of higher fitness keys in the key set to create a new key set.
Step 5: Calculate key cutting probability of all keys in all key sets as shown below.
K′ = [ S_{1n}      S_{1(n−1)}      ...  S_{12}      S_{11}
       S_{2n}      S_{2(n−1)}      ...  S_{22}      S_{21}
       ...         ...             ...  ...         ...
       S_{(m−1)n}  S_{(m−1)(n−1)}  ...  S_{(m−1)2}  S_{(m−1)1}
       S_{mn}      S_{m(n−1)}      ...  S_{m2}      S_{m1} ]

Where n is the total number of key teeth,


m is the total number of created keys.
Step 6: Based on the key cutting probability for each key tooth, perform tooth
adjustment to generate new keys.
Step 7: Repeat 3 – 6 until one of the following termination criteria is met.
Termination criteria:
7.1 Reach the maximum iteration.
7.2 Reach the pre-defined fitness value.
7.3 All keys in a key set are the same.
The above steps can be summarized as shown in the flow diagram in Fig. 2.
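As a concrete illustration of Steps 1–7, the following is a minimal Python sketch of the basic key cutting loop. The fitness function, the decoding of a key into a real value, and the way the key cutting probability is derived from key similarity are illustrative assumptions, since [3] leaves these details open; only the overall loop structure follows the procedure above.

import random

def decode(key, lo=-10.0, hi=10.0):
    # Map a binary key (list of 0/1 teeth) to a real value in [lo, hi].
    return lo + (hi - lo) * int("".join(map(str, key)), 2) / (2 ** len(key) - 1)

def fitness(key):
    # Illustrative "lock": maximize -(x - 3)^2, optimum at x = 3.
    x = decode(key)
    return -(x - 3.0) ** 2

def key_cutting(n_teeth=16, n_keys=20, max_iter=100):
    key_set = [[random.randint(0, 1) for _ in range(n_teeth)] for _ in range(2 * n_keys)]
    for _ in range(max_iter):
        # Steps 3-4: evaluate fitness and keep the better half of the key set.
        key_set.sort(key=fitness, reverse=True)
        key_set = key_set[:n_keys]
        if all(k == key_set[0] for k in key_set):        # termination criterion 7.3
            break
        # Step 5: key cutting probability per tooth, taken here as the share of
        # keys that disagree with the majority bit (a similarity-based measure).
        prob = []
        for t in range(n_teeth):
            ones = sum(k[t] for k in key_set)
            prob.append(min(ones, n_keys - ones) / n_keys)
        # Step 6: tooth adjustment driven by the key cutting probability.
        new_keys = []
        for k in key_set:
            child = [1 - bit if random.random() < prob[t] else bit
                     for t, bit in enumerate(k)]
            new_keys.append(child)
        key_set += new_keys
    best = max(key_set, key=fitness)
    return decode(best), fitness(best)

print(key_cutting())   # approaches (3.0, 0.0) for the illustrative lock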

3 Modifications of the Key Cutting Algorithm


The original key cutting algorithm converges to a solution very fast. However, in multivariate problems it usually fails to find the optimal solution. A key set with key cutting probabilities is used, and key picking is repeatedly performed iteration by iteration until all keys are the same. The major disadvantage of this algorithm appears when the majority of keys in a key set are not good enough to open the lock: the similarity acquired among those keys can trap the search and force all keys in the key set to become identical. To avoid such a trap, additional features must be introduced to create more varied candidates for opening the lock. In this paper, four strategies are tried, as follows.
Modification 1: Tooth adjustment is performed at only the one tooth with the highest key cutting probability, and this highest probability must be greater than 0.5; otherwise the adjustment is skipped. If more than one tooth has the same highest probability, only one of them is selected randomly, for simplicity.
Modification 2: Similar to Modification 1. To increase the chance of finding a good key candidate, the tooth adjustment is performed on all teeth sharing the highest probability. Only the one adjustment that gives the best fitness is selected to create a new key, instead of the random selection used in Modification 1.

Fig. 2. Flow diagram of the original key cutting algorithm [3]

Modification 3: Tooth adjustment is applied to all teeth having a probability higher than 0.5. Only the one adjustment that gives the best fitness is selected to create a new key (see the sketch below).
Modification 4: Similar to Modification 3. However, if the key cutting probability of the new key set is equal to the key cutting probability of the key set from the previous iteration, a new key set is generated and the process is restarted.
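The selection rule shared by Modifications 2 and 3, i.e. trying several candidate tooth adjustments and keeping only the best one, can be sketched as follows; the fitness and probability helpers are the same illustrative ones assumed in the earlier sketch.

def adjust_best_tooth(key, prob, fitness, threshold=0.5):
    # Modification 3 (illustrative sketch): flip each tooth whose key cutting
    # probability exceeds the threshold, and keep only the best resulting key.
    candidates = []
    for t, p in enumerate(prob):
        if p > threshold:
            child = key.copy()
            child[t] = 1 - child[t]
            candidates.append(child)
    if not candidates:          # no tooth qualifies: skip the adjustment
        return key
    return max(candidates, key=fitness)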

4 Test Results and Discussion


In this paper, results obtained from the key cutting algorithms are compared with
those obtained by genetic algorithms [4-5]. Four standard test functions for small-scale unconstrained optimization problems [6] are used, as shown in (1)–(4).

f(x1, x2) = (sin(x1) × sin(x2)) / (x1·x2)                                      (1)

f(x1, x2) = 100·(x2 − x1²)² + (1 − x1)²                                        (2)

f(x1, x2, x3) = 30 + Σ_{i=1}^{3} [ (1/10)·(xi − 20)² − 9·cos(2π·xi/5) ]        (3)

f(x1, x2) = 0.5 + ( sin²(√(x1² + x2²)) − 0.5 ) / ( 1 + 0.001·(x1² + x2²) )²    (4)
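For reference, the four test functions can be written directly in Python. The exact forms of (1) and (4) are reconstructed from the garbled source (a sinc-like product and a Schaffer-type function), and no search bounds are assumed since none are stated in the excerpt above.

import math

def f1(x1, x2):
    # Eq. (1): product of sinc-like terms.
    return math.sin(x1) * math.sin(x2) / (x1 * x2)

def f2(x1, x2):
    # Eq. (2): Rosenbrock function; minimum 0 at (1, 1).
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def f3(x1, x2, x3):
    # Eq. (3): shifted Rastrigin-like function of three variables.
    return 30.0 + sum((xi - 20.0) ** 2 / 10.0 - 9.0 * math.cos(2.0 * math.pi * xi / 5.0)
                      for xi in (x1, x2, x3))

def f4(x1, x2):
    # Eq. (4): Schaffer-type function; minimum 0 at the origin.
    r2 = x1 ** 2 + x2 ** 2
    return 0.5 + (math.sin(math.sqrt(r2)) ** 2 - 0.5) / (1.0 + 0.001 * r2) ** 2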

4.1 Test Function 1

The test of this function is carried out by applying the same parameter setting to all
the key cutting algorithms and genetic algorithms as follows.
• Population size is 80
• Maximum iteration is 50
• No stalled generation is applied
• 16-bit resolution is used for each variable

Fig. 3. Convergences of the test function 1



After 30 trials of solutions, the selected convergence from each method is shown in
Fig. 3. KCA2 and KCA4 are the two best methods for finding the best objective func-
tion, while GA is the fastest.

4.2 Test Function 2

The test of this function is carried out by applying the same parameter setting to all
the key cutting algorithms and genetic algorithms as follows.
• Population size is 80
• Maximum iteration is 50
• No stalled generation is applied
• 16-bit resolution is used for each variable
After 30 trials of solutions, the selected convergence from each method is shown in
Fig. 4. KCA4 is the best method for finding the best objective function, while GA is
the fastest.

Fig. 4. Convergences of the test function 2

4.3 Test Function 3

The test of this function is carried out by applying the same parameter setting to all
the key cutting algorithms and genetic algorithms as follows.
• Population size is 80
• Maximum iteration is 100
• No stalled generation is applied
• 20-bit resolution is used for each variable
After 30 trials of solutions, the selected convergence from each method is shown in
Fig. 5. KCA4 and GA are the two best methods for finding the best objective func-
tion, while only GA is the fastest.

Fig. 5. Convergences of the test function 3

Fig. 6. Convergences of the test function 4

4.4 Test Function 4

The test of this function is carried out by applying the same parameter setting to all
the key cutting algorithms and genetic algorithms as follows.
• Population size is 30
• Maximum iteration is 50
• No stalled generation is applied
• 20-bit resolution is used for each variable

After 30 trials of solutions, the selected convergence from each method is shown in
Fig. 6. KCA4 is the best method for finding the best objective function, while GA is
the fastest.

5 Conclusion
This paper presents the key cutting algorithm and its variants to solve multivariate
optimization problems. The proposed algorithms are challenged with four standard
test functions and their results are also compared with those obtained by genetic algo-
rithms. As a result, the key cutting algorithm with modification 4 (KCA4) shows the
best performance of finding the best solution among them. However, from the tests
the key cutting algorithms are slower than genetic algorithms. This is because no stalled-iteration criterion was applied to any of the methods. As all the key cutting algorithms have a fast convergence characteristic, if an appropriate stalled-iteration criterion were used, the key cutting algorithms would be expected to run faster than genetic algorithms.

References
1. Phillips, B.: The Complete Book of Locks and Locksmithing. McGraw-Hill, Chicago
(2005)
2. McCloud, M.: Lock Picking Basics. Standard Publication Inc. (2004)
3. Qin, J.: A New Optimization Algorithm and Its Application – Key Cutting Algorithm. In:
2009 IEEE International Conference on Grey Systems and Intelligent Services, pp. 1537–
1541. IEEE Press, New York (2009)
4. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addi-
son-Wesley, Reading (1989)
5. Charuwat, T., Kulworawanichpong, T.: Genetic Based Distribution Service Restoration
with Minimum Average Energy Not Supplied. In: Beliczynski, B., Dzielinski, A., Iwa-
nowski, M., Ribeiro, B. (eds.) ICANNGA 2007. LNCS (LNAI), vol. 4431, pp. 230–239.
Springer, Heidelberg (2007)
6. He, S., Wu, Q.H., Saunders, J.R.: Group Search Optimizer: An Optimization Algorithm In-
spired by Animal Searching Behavior. IEEE Transactions Evolutionary Computation 13,
973–990 (2009)
Transmitter-Receiver Collaborative-Relay
Beamforming by Simulated Annealing

Dong Zheng1 , Ju Liu1,2, , Lei Chen1 , Yuxi Liu1 , and Weidong Guo1
1
Shandong University, Jinan, 250100, China
2
Southeast University, Nanjing, 210096, China

Abstract. This paper considers the collaborative-relay beamforming


(CRBF) design for a three-hop multi-relay network with one transmitter,
one receiver and two clusters of relay nodes. It is assumed that, all the
relay nodes work synchronously with perfect channel state information
(CSI). Optimization on the relay weights is carried out to improve the
signal-to-noise ratio (SNR) at the receiver under aggregate power con-
straints of each cluster. Two different design approaches are proposed
in this study. In the first approach, a simulated annealing (SA) based
CRBF method is presented, and a stochastic global optimum is obtained.
However, the SA algorithm is quite computationally demanding. In order
to speed up the heuristic searching process, a suboptimal but efficient
closed-form solution is provided in the second approach, which helps to
generate the initial state of the SA algorithm. Simulation results show
that both approaches outperform the fixed power allocation strategy.

1 Introduction
The insistent demand for developing more spectral efficient technologies makes
the multiple-input multiple-output (MIMO) system attract much attention re-
cently. Space diversity can be fully exploited by using multiple antennas equipped
both at the transmitter and the receiver. However, the limited space, complexity
and non-regenerative power of the mobile terminals challenge the implementa-
tion of multiple antennas and make the potential benefits be difficult to utilize.
Nowadays, another type of diversity (named cooperative diversity), by which
users can relay each other’s information and form a virtual multi-antenna system,
has opened a new research avenue [1]- [4]. Amplify-and-forward, decode-and-
forward, and compress-and-forward are three common fixed relaying schemes.
Among them, amplify-and-forward (AF), which just amplifies the received noisy
signal and forwards it to other relay nodes or destination, is arguably the most
attractive strategy due to its simplicity. These schemes have been well studied
under different assumption of CSI [4]- [6].

This work was supported by National Natural Science Foundation of China
(60872024), the Cultivation Fund of the Key Scientific and Technical Innovation
Project (708059), Open Research Fund of National Mobile Communications Re-
search Laboratory (2010D10), and Independent Innovation Foundation of Shandong
University (2010JC007). Corresponding author: Ju Liu ([email protected]).


As far as we know, most of the aforementioned works investigated a dual-hop


relay system with only one cluster of relay nodes. In this research, we consider
a three-hop multi-relay wireless network which consists of one transmitter, one
receiver and two clusters of relay nodes. With the help of perfect channel state
information, the two clusters of relay nodes can form a virtual MIMO beamform-
ing system by using AF protocol. We aim to maximize the SNR at the receiver
by optimizing the relay weights subject to the aggregate power constraints of
each cluster. We find the objective function is neither linear nor convex for the
relay weights, thus it is intractable to design the weight coefficients jointly by
standard numerical methods. Fortunately, heuristic optimization methods, such
as simulated annealing, genetic algorithms, tabu search and neural networks,
have developed very fast in the last decade and become useful alternatives to
the conventional optimization methods [7]. Among them, the SA algorithm is a
promising method, which is able to provide a high quality stochastic approxima-
tion to the global optimum, even when the traditional methods failed [8], [9]. In
the first approach, a novel method, which employs the SA algorithm for searching
the optimal collaborative relay beamforming coefficients, is proposed. Besides,
in the second approach, a suboptimal but efficient closed-form method is pre-
sented to accelerate the searching process, by providing the proper initial states.
Simulation results show that, our proposed approaches yield great performance
gain against the fixed power allocation strategy.
The remainder of this paper is organized as follows. In section 2, system model
and problem formulation are described. Then, the SA based CRBF approach
is illustrated in section 3, while section 4 presents the closed-form suboptimal
approach. Section 5 and 6 give the simulation results and the conclusions, re-
spectively.

2 Model Description

2.1 System Model

We consider a three-hop relay network with a source S, a destination D and two clusters of relay nodes, namely cluster one {T_m}_{m=1}^{M} with M relay nodes and cluster two {R_k}_{k=1}^{K} with K relay nodes, as shown in Fig. 1. We assume that

Fig. 1. System model: source S, first relay cluster {T_m} with weights w_m, second relay cluster {R_k} with weights v_k, destination D, and channels f_m, h_{k,m}, g_k



only the links between S and {T_m}_{m=1}^{M}, between {T_m}_{m=1}^{M} and {R_k}_{k=1}^{K}, and between {R_k}_{k=1}^{K} and D exist, due to the poor quality of the other channels. Consider a rich scattering environment
and all the channels are subject to independent Rayleigh flat fading. We denote
the links from S to Tm , Tm to Rk and Rk to D as fm , hk,m and gk , respectively.
In the first stage, the received signal at Tm is given by

 
y_m = √(P_S)·f_m·s + n_{T,m}                                          (1)

where s is the information symbol with E[|s|²] = 1, P_S is the transmit power of source S, and n_{T,m} is a complex Gaussian noise with zero mean and variance σ_T². During the second stage, the received signal y_m is weighted by a power normalization factor l_m = 1/√(|f_m|²·P_S + σ_T²) and a beamforming weight w_m.
Therefore the received signal at Rk is expressed as

u_k = Σ_{m=1}^{M} h_{k,m}·w_m·l_m·(√(P_S)·f_m·s + n_{T,m}) + n_{R,k}                    (2)

in which n_{R,k} is a complex Gaussian noise with zero mean and variance σ_R². Then R_k retransmits the received noisy signal multiplied by a similar power normalization factor d_k = 1/√( Σ_{m=1}^{M} |h_{k,m}·w_m·l_m·(√(P_S)·f_m·s + n_{T,m})|² + σ_R² )
and the beamforming weight vk , hence the received signal at D is

r=
 (g v d u ) + n
K
k k k k D

 g v d  
k=1
(3)
K
M
= k k k [hk,m wm lm (fm Ps S + nT,m )] + nR,k + nD
i=1
k=1
2
where nD is a complex Gaussian noise with zero mean and variance σD . The
Eq. (3) can be represented using matrix form as

r = √(P_S)·v^H·DGHLF·w·s + v^H·DGHLW·n_T + v^H·DG·n_R + n_D                    (4)

where the first term is the desired signal and the remaining terms are noise, v = [v_1, v_2, ..., v_K]^H, W = diag(w_m), ∀m, F = diag(f_m), ∀m, G = diag(g_k), ∀k, [H]_{k,m} = h_{k,m}, ∀k, m, L = diag(l_m), ∀m, D = diag(d_k), ∀k. As a result, the instantaneous SNR at D is given by

Γ = P_S·|v^H·DGHLF·w|² / ( σ_T²·|v^H·DGHL·w|² + σ_R²·|v^H·D·g|² + σ_D² )         (5)

where w = [w1 , w2 , · · · , wM ]T . Mathematically, the optimization problem is

max_{w,v} Γ    s.t.    w^H·w ≤ P_T,    v^H·v ≤ P_R                    (6)

where PT and PR are the total power constraints of the first relay cluster and
the second relay cluster, respectively.

3 SNR Maximization by Simulated Annealing

3.1 The Metropolis Criteria

Simulated annealing, analogous to the annealing process of metal cooling, has


been widely used for solving combinatorial problems [10], [11], and is regarded
as a stochastic global optimization method. Although it is a local search method, unlike conventional gradient descent techniques it can avoid being trapped in a local minimum by accepting bad moves with a probability given by the Metropolis criterion [8].
Metropolis Criteria [8]. The optimization process is carried out on the cost func-
tion, which simulates the energy of physical annealing process. Every solution
in the solution space is called a state. The heuristic search starts from a initial
state, and new states are generated in the neighborhood of the former ones. If
the new states yield a decrement on the cost function, they will be automatically
accepted as the new current states. Alternatively, the new states will be accepted
with a probability [11]

P_a = Pr{ min(1, exp(−ΔE/(k·t))) ≥ rand[0, 1] }                    (7)

where Pr {A} denotes the probability of an event A, Pa is the probability of the


new states to be accepted, ΔE is the difference between the cost function of the
new state and that of the former state, k denotes the Boltzmann constant and
t is the control parameter simulating the temperature of the system, rand[0, 1]
is an uniformly distributed random variable. As the temperature drops down,
the probability of accepting bad moves decreases, until no worse move could be
accepted.

3.2 The Proposed Algorithm

In this subsection, we summarize the principle of the SA based CRBF method.


More detailed descriptions about SA, see [8], [9] and references therein.
In the simulated annealing process, the cost function is used to evaluate the
quality of the solution and the value of the cost function should reduce as the
optimization process goes on. In this research, we choose the negative of SNR
at the receiver as the cost function. The SNR at the receiver is a function of the
relay weights of the two clusters. In fact, once coefficient w has been determined,
the optimal value of v can be efficiently obtained as a closed-form function of w; for a detailed description see Eqs. (11) and (12) in the following section. Thus it is not necessary to generate new states for the two coefficients individually, and we pick w to represent the state of the system here. Then the cost function is

f(w) = − (v^H·R·v) / (v^H·Q·v)                    (8)

where R = JJ^H, Q = σ_T²·KK^H + σ_R²·Dgg^H·D^H + σ_D²·P_R^{−1}·I, J = DGHLF·w, K = DGHL·w, and v = √(P_R)·P(Q^{−1}R). Note that √(P_R) is used to meet the power constraint of the second cluster and P(·) represents the principal eigenvector of
a matrix. A neighborhood searching function is used for generating new states in the neighborhood of the former states; this can be done by adding a perturbation to the former states. The pseudo-code for neighborhood searching is given as:

w_{n+1} = √(P_T)·(w_n + δ_n)/||w_n + δ_n||                    (9)

where the perturbation δ_n ~ C(σ²I, 0) (Cauchy distribution), and P_T is the power constraint of the first cluster.

Algorithm 1. CRBF by SA
Input: w_0, t_0
Initialization: w_p = w_0, t_i = t_0
1:  while i < C_out do
2:    while j < C_in do
3:      w_c = Generate(w_p)
4:      if min{1, exp(−(f(w_c) − f(w_p))/(k·t_i))} ≥ rand[0, 1] then
5:        w_p := w_c
6:      end if
7:      j := j + 1
8:    end while
9:    t_{i+1} = α·t_i   {α is set to 0.95 here}
10:   i := i + 1
11: end while
Output: w = w_p, v = √(P_R)·P(Q^{−1}R), f(w)

The proposed scheme is described in Algorithm 1, where Generate(·) is the neighborhood searching function given in Eq. (9), and w_0, t_0 are the initial state and initial temperature, respectively. Note that we fix the iteration counts (C_out, C_in) to ensure that the algorithm terminates in a finite number of steps. Alternatively, we can stop the process and output the solution when the difference between successive cost function values stays below a predefined threshold for a sufficient number of iterations.
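A compact Python sketch of the loop in Algorithm 1 is given below. It assumes that a cost function f(w) such as Eq. (8) and the power constraint P_T are supplied by the caller; the Boltzmann constant is absorbed into the temperature, and the Cauchy perturbation scale is an illustrative choice rather than a value from the paper.

import numpy as np

def generate(w_p, P_T, scale=0.1, rng=None):
    # Neighborhood search of Eq. (9): add a Cauchy perturbation, then rescale
    # the candidate so that it satisfies the power constraint ||w||^2 = P_T.
    rng = rng or np.random.default_rng()
    delta = scale * (rng.standard_cauchy(w_p.shape) + 1j * rng.standard_cauchy(w_p.shape))
    w_new = w_p + delta
    return np.sqrt(P_T) * w_new / np.linalg.norm(w_new)

def sa_crbf(f, w0, P_T, t0=1.0, alpha=0.95, c_out=100, c_in=20, rng=None):
    # Simulated annealing on the relay weights; f(w) is the cost of Eq. (8),
    # i.e. the negative received SNR, so lower is better.
    rng = rng or np.random.default_rng()
    w_p, t = w0.copy(), t0
    for _ in range(c_out):
        for _ in range(c_in):
            w_c = generate(w_p, P_T, rng=rng)
            d_e = f(w_c) - f(w_p)
            # Metropolis criterion of Eq. (7): accept improvements outright,
            # accept deteriorations with probability exp(-dE / t).
            if d_e <= 0 or np.exp(-d_e / t) >= rng.uniform():
                w_p = w_c
        t *= alpha          # geometric cooling schedule
    return w_p, f(w_p)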

4 Closed-Form Suboptimal Solution for Generating the


Initial State of SA

Although we can obtain a global optimization using the simulated annealing


algorithm, a closed-form solution is still desired due to efficiency and simplicity.
Moreover, this could serve as the initial state of the proposed SA approach to
reduce the execution counts. Before proceeding on, we represent (6) as:

max_{ŵ,v̂}  P_S·P_T·P_R·|v̂^H·DGHLF·ŵ|² / ( σ_T²·P_T·P_R·|v̂^H·DGHL·ŵ|² + σ_R²·P_R·|v̂^H·D·g|² + σ_D² )          (10)
s.t.  ŵ^H·ŵ ≤ 1,  v̂^H·v̂ ≤ 1

where w = √(P_T)·ŵ, v = √(P_R)·v̂, and ŵ, v̂ are the normalized unit vectors of w and v, respectively. Let the equivalent channel matrix A = GHLF have a singular value decomposition (SVD) A = UΛZ, where U and Z are unitary matrices and Λ is the diagonal matrix of singular values of A. Then we can choose ŵ as the column vector of Z corresponding to the largest singular value of A. When ŵ has been determined, the objective function in (10) takes the form

max_{v̂}  P_S·P_T·P_R·v̂^H·JJ^H·v̂ / ( σ_T²·P_T·P_R·v̂^H·KK^H·v̂ + σ_R²·P_R·v̂^H·Dgg^H·D^H·v̂ + σ_D²·v̂^H·v̂ )          (11)

where J = DGHLF·ŵ and K = DGHL·ŵ, that is,

SNR(v̂) = (v̂^H·R·v̂) / (v̂^H·Q·v̂)                    (12)

where R = P_S·P_T·P_R·JJ^H and Q = σ_T²·P_T·P_R·KK^H + σ_R²·P_R·Dgg^H·D^H + σ_D²·I. This becomes a generalized Rayleigh quotient problem, and it is well known [4] that the objective function is maximized when v̂ is chosen as the principal generalized eigenvector of the matrix pair (R, Q), or equivalently v̂ = P(Q^{−1}R). Up to now the suboptimal solutions of ŵ and v̂ have both been obtained. This method also explains why we can use ŵ alone to represent the state of the system. Note that in this approach we add a constraint on the value of ŵ, so the design is suboptimal.
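A minimal numerical sketch of this closed-form initialization is shown below, using NumPy only. The channel matrices and noise variances are placeholders supplied by the caller; the principal generalized eigenvector is obtained here by an ordinary eigen-decomposition of Q^{-1}R, which matches v̂ = P(Q^{-1}R) above.

import numpy as np

def closed_form_init(D, G, H, L, F, g, P_S, P_T, P_R, s2_T, s2_R, s2_D):
    # Step 1: choose w_hat as the right singular vector of A = GHLF that
    # corresponds to the largest singular value.
    A = G @ H @ L @ F
    _, _, Vh = np.linalg.svd(A)
    w_hat = Vh.conj().T[:, 0]

    # Step 2: build R and Q of Eq. (12) and take the principal generalized
    # eigenvector of (R, Q), i.e. the dominant eigenvector of Q^{-1} R.
    J = D @ G @ H @ L @ F @ w_hat
    K = D @ G @ H @ L @ w_hat
    R = P_S * P_T * P_R * np.outer(J, J.conj())
    Q = (s2_T * P_T * P_R * np.outer(K, K.conj())
         + s2_R * P_R * (D @ np.outer(g, g.conj()) @ D.conj().T)
         + s2_D * np.eye(D.shape[0]))
    vals, vecs = np.linalg.eig(np.linalg.inv(Q) @ R)
    v_hat = vecs[:, np.argmax(vals.real)]
    v_hat = v_hat / np.linalg.norm(v_hat)

    # Scale back to the actual power constraints: w = sqrt(P_T)*w_hat, v = sqrt(P_R)*v_hat.
    return np.sqrt(P_T) * w_hat, np.sqrt(P_R) * v_hat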

5 Simulation Results
In this section, the performance of the proposed distributed beamforming solu-
tion in the CRBF systems will be presented. Consider a network with M relay
nodes in first relay cluster and K relay nodes in the second relay cluster. The co-
operative network experiences the independent Rayleigh flat fading. Assume that
E{|f_m|²} = 1 for m = 1, 2, ..., M/2 and E{|f_m|²} = 2 for m = M/2 + 1, ..., M; E{|g_k|²} = 1 for k = 1, 2, ..., K/2 and E{|g_k|²} = 2 for k = K/2 + 1, ..., K; E{|h_{k,m}|²} = 1, ∀k, m. The source SNR is defined as P_S/N, and all the nodes in the proposed network have the same power level, set to 1 without loss of generality.
Throughout our simulation, the source SNR (transmit power) is assumed to be
10dB, the total transmit power increases from -5dB to 20dB.
Fig. 2 shows the average received SNR versus the maximum allowable power
of relay cluster one with M = 6, 10, K = 10 and P_R = 10 dB, while Fig. 3 plots the
solution against the maximum allowable power of the second relay cluster with
M = 10, K = 6, 10 and P_T = 10 dB. Fig. 2 illustrates that as the power of cluster one increases, the output SNR is accordingly improved. The received SNR is high
even if P1 is low, because noises introduced in the first two hops can be suppressed
by effectively exploiting the channel (especially the channel between two clusters)
spatial diversity through the proper adjustment of w and v. Subsequently, the
output SNR saturates as it is constrained by noises introduced in the last hop
σ_D². In contrast, in Fig. 3, we see that the received SNR rises almost linearly

Fig. 2. The received SNR at D versus the total relay SNR P_T/δ² (curves: SA based solution, suboptimal solution and fixed power allocation, for [M,K] = [10,10] and [6,10])

Fig. 3. The received SNR at D versus the total relay SNR P_R/δ² (curves: SA based solution, suboptimal solution and fixed power allocation, for [M,K] = [10,10] and [10,6])

with the increasing of the total relay power PR , since the increased power of the
2
second cluster PR helps to resist the constraint of noise σD .
Both figures indicate that the SNR at the receiver improves as the relay
number of either cluster increases, because there are better chances to select
more suitable relays to forward the signal to the destination. Furthermore, we can
observe that the SA approach is about 3dB better than the suboptimal method.
On the other hand, simulation results show that both approaches outperform the fixed power allocation strategy.

6 Conclusion
In this paper, we proposed a SA based approach to improve the SNR at the re-
ceiver for the three-hop multiple-relay network and a stochastic global

optimization is obtained. Though the suboptimal scheme is intended for generat-


ing the initial state of the SA based approach, it also can be used as an indepen-
dent solution. Simulation results show that both the proposed approaches yield
significant performance gain against the fixed power allocation strategy. The ra-
tionality and feasibility of dealing with such problems by SA based methods are still debated; however, our contribution can be considered a beneficial attempt at such problems.

References
1. Laneman, J.N., Wornell, G.W.: Cooperative diversity in wireless networks: Efficient
protocols and outage behavior. IEEE Trans. Info. Theory 50, 3062–3080 (2004)
2. Sendonaris, A., Erkip, E., Aazhang, B.: User cooperation diversity - Part I. System
description. IEEE Trans. Commun. 51, 1927–1938 (2003)
3. Sendonaris, A., Erkip, E., Aazhang, B.: User cooperation diversity - Part II. Implementation aspects and performance analysis. IEEE Trans. Commun. 51, 1939–1948 (2003)
4. Havary-Nassab, V., Shahbazpanahi, S., Grami, A., Luo, Z.-Q.: Distributed beam-
forming for relay networks based on second-order statistics of the channel state
information. IEEE Trans. Signal Process 56(9), 4306–4316 (2008)
5. Zheng, G., Wong, K.-K., Paulraj, A., Ottersten, B.: Collaborative-relay beam-
forming with perfect CSI: optimum and distributed implementation. IEEE Signal
Processing Letters 16(4) (April 2009)
6. Jing, Y., Jafarkhani, H.: Network beamforming using relays with perfect channel
information. IEEE Trans. Info. Theory 55, 2499–2517 (2009)
7. Manfred, G., Peter, W.: A review of heuristic optimization methods in economet-
rics. In: Working papers, Swiss Finance Institute Research Paper Series, vol. (8-12),
pp. 8–12 (2008)
8. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing.
Science New Series 220(4598), 671–680 (1983)
9. Eglese, R.W.: Simulated annealing: a tool for operational research. European Jour-
nal of Operational Research 46(3), 271–281 (1990)
10. Trucco, A., Murino, V.: Stochastic optimization of linear sparse arrays. IEEE Jour-
nal of Oceanic Engineering 24(3), 291–299 (1999)
11. Cardone, G., Cincotti, G., Pappalardo, M.: Design of wide-band arrays for low side-lobe level beam patterns by simulated annealing. IEEE Trans. Ultrasonics, Ferroelectrics, and Frequency Control 49(8), 1050–1059 (2002)
Calculation of Quantities of Spare Parts and the
Estimation of Availability in the Repaired as Old Models

Zhe Yin1,2,*, Feng Lin1, Yun-fei Guo1, and Mao-sheng Lai2

1
Key Laboratory of Natural Resources of Changbai Mountain & Functional Molecules
(Yanbian University), Ministry of Education, 133002 Yanji, China
2
Department of Information Management, Peking University, 100871 Beijing, China
[email protected]

Abstract. In this paper, based on the repaired as old model under the same storage condition, the quantity of spare parts M for N identical systems is derived under the requirement that, at any time during storage, at least N systems are normal with probability not less than P_0; a special example is given to prove the feasibility. Besides, an availability function expression is provided, and taking a system which follows the Weibull distribution as the case, the validity of the system is shown through calculation.

Keywords: availability, spare parts, storage model, updating process.

1 Introduction
Many pieces of equipment and many systems are repairable, and two cases arise: one is maintenance during operation, and the other is maintenance during storage. This article gives a statistical analysis of the second kind, i.e., of the time required for maintenance during storage. In practice, however, what concerns people most is the storage quantity of spare parts and how to improve the availability of equipment. Allen and D'Esopo [1] proposed the idea that spare parts should be classified before the 1960s. Cohen [2] divided needs into urgent needs and ordinary ones. Moore [3] performed the classification according to the functions of spare parts. Because of the influence of spare parts on manufacturing and the economy, many scholars have studied the quantity of spare parts needed. Flint [4] advised developing partnerships and resource sharing to reduce the cycle time. Besides, Foote [5] studied stock prediction, and Luxhoj and Rizzo [6] obtained a method for the quantity of spare parts needed for the same set based on a set model. Kamath [7] used a Bayesian method to predict the quantity of spare parts needed. Yu [8] put forward and discussed the repaired as old model and offered a calculation formula for the quantity of spare parts M which could meet the needs of the equipment, and Yan [9] studied the calculation of availability for equipment composed of only one part. Based on this, this paper considers a repaired as old model, which reduces the required quantity of spare parts, and the validity of the system is shown through calculation.

*
Corresponding author. Head of department of mathematics.


2 Repaired as Old Models and Calculation of the Quantities of Spare Parts
Model description. Assume that a new system starts to be stored at time zero and is inspected every a time units (a > 0). If it is normal, it keeps being stored; otherwise it is maintained, that is, after a maintenance time b (0 < b < a) the system is restored to normal and continues to be stored. If the maintenance is equivalent to a continuation of storage, i.e., the repaired system is statistically the same as an un-renewed (aged) one, the system is said to follow the repaired as old model. For convenience of narration, define the state random variable X_t:

X_t = 1 if the system is normal at time t, and X_t = 0 if it is abnormal at time t.

Denote the distribution function of the first failure time Z of the system by F(t). Under the model assumption, when b = 0, the availability at time t is A(t) = P(X_t = 1) = 1 − F(t); when b > 0, write a_k = k·a (k = 0, 1, ...), b_k = k·a + b (k = 1, 2, ...). From the model assumption we know:

When a_0 < t ≤ a_1,  A(t) = 1 − F(t),          (2.1)

When a_k < t ≤ b_k (k ≥ 1),

A(t) = P(X_t = 1 | X_{a_k} = 1)·P(X_{a_k} = 1) = [(1 − F(t))/(1 − F(a_k))]·A(a_k)          (2.2)

When b_k < t ≤ a_{k+1} (k ≥ 1),

A(t) = P(X_t = 1, X_{a_k} = 0) + P(X_t = 1, X_{a_k} = 1)
     = P(X_t = 1 | X_{a_k} = 0)·P(X_{a_k} = 0) + P(X_t = 1 | X_{a_k} = 1)·P(X_{a_k} = 1)
     = [(1 − F(t))/(1 − F(b_k))]·(1 − A(a_k)) + [(1 − F(t))/(1 − F(a_k))]·A(a_k)          (2.3)

We can see from (2.1)–(2.3) that we can obtain the availability function A(t) of the system at any time t provided that we calculate all A(a_k), k = 1, 2, ..., ⌊t/a⌋. In formula (2.3), let t = a_{k+1}; then for k ≥ 1 we have

A(a_{k+1}) = (1 − F(a_{k+1}))/(1 − F(b_k)) + [ (1 − F(a_{k+1}))/(1 − F(a_k)) − (1 − F(a_{k+1}))/(1 − F(b_k)) ]·A(a_k)

We can therefore calculate A(a_k), k = 1, 2, ..., ⌊t/a⌋, recursively, starting from A(a_1) = 1 − F(a_1); furthermore, the expression of the availability function A(t) at any time t can then be obtained.
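A short Python sketch of this recursion is given below. The Weibull first-failure distribution is only an illustrative assumption (the paper mentions Weibull-distributed systems); any CDF F(t) could be substituted.

import math

def weibull_cdf(t, shape=1.5, scale=100.0):
    # Illustrative first-failure distribution F(t); replace with the real one.
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0

def availability(t, a, b, F=weibull_cdf):
    # Availability A(t) of the repaired as old model, Eqs. (2.1)-(2.3).
    k = int(t // a)
    A_ak = 1.0 - F(a)                       # A(a_1) = 1 - F(a_1)
    for j in range(1, k):                   # recursion A(a_j) -> A(a_{j+1})
        a_next, b_j = (j + 1) * a, j * a + b
        A_ak = ((1 - F(a_next)) / (1 - F(b_j))
                + ((1 - F(a_next)) / (1 - F(j * a))
                   - (1 - F(a_next)) / (1 - F(b_j))) * A_ak)
    if k == 0:                              # Eq. (2.1): before the first inspection
        return 1.0 - F(t)
    a_k, b_k = k * a, k * a + b
    if t <= b_k:                            # Eq. (2.2)
        return (1 - F(t)) / (1 - F(a_k)) * A_ak
    return ((1 - F(t)) / (1 - F(b_k)) * (1 - A_ak)
            + (1 - F(t)) / (1 - F(a_k)) * A_ak)   # Eq. (2.3)

print(availability(250.0, a=30.0, b=5.0))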

Calculation of the quantities of spare parts


Assume that there are N + M (M is the quantity of spare parts) systems stored under the same condition from time 0, that the states of these N + M systems are independent, and that they all follow the repaired as old model. Then, for any time during the storage time T_0, the minimum M which meets the condition that the probability of there being at least N normal systems is not less than P_0 is

M = min{ m : Σ_{j=N}^{N+m} C(N+m, j)·A(t)^j·(1 − A(t))^{N+m−j} ≥ P_0, 0 ≤ t ≤ T_0 }          (2.4)

where C(N+m, j) denotes the binomial coefficient.
Empirical analysis. There is a circulating pump in a certain company, and 10 rubber bearings are its spare parts; the failure rate is λ = 0.2 and the repair rate is μ = 0.6. We discuss the quantity of spare parts in the cases where P_0 is 0.6, 0.7, 0.8 and 0.9, respectively. Here, we take the steady-state availability A = lim_{t→∞} A(t) = μ/(λ + μ) = 3/4 in place of A(t). So that a binomial distribution table can be consulted, the following rearrangement of (2.4) is used:

Σ_{j=N}^{N+m} C(N+m, j)·A(t)^j·(1 − A(t))^{N+m−j}
  = Σ_{j=0}^{N+m} C(N+m, j)·A(t)^j·(1 − A(t))^{N+m−j} − Σ_{j=0}^{N−1} C(N+m, j)·A(t)^j·(1 − A(t))^{N+m−j}          (2.5)

When P_0 = 0.6, we can obtain M = 3 through formula (2.5) and the binomial distribution table. In like manner, we obtain M = 4, 5 and 9 when P_0 is 0.7, 0.8 and 0.9, respectively.
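The search for the minimal M in (2.4)/(2.5) is easy to automate; the sketch below uses the steady-state availability A = 0.75 as in the example. N, A and P_0 are parameters of the helper functions, so no figures beyond those stated above are assumed.

from math import comb

def prob_at_least_N_normal(N, m, A):
    # P(at least N of the N+m independent systems are normal), availability A each.
    n = N + m
    return sum(comb(n, j) * A ** j * (1 - A) ** (n - j) for j in range(N, n + 1))

def min_spares(N, A, P0, m_max=1000):
    # Smallest m such that the above probability is at least P0, cf. Eq. (2.4).
    for m in range(m_max + 1):
        if prob_at_least_N_normal(N, m, A) >= P0:
            return m
    raise ValueError("no m <= m_max satisfies the requirement")

print(min_spares(N=10, A=0.75, P0=0.8))   # smallest number of spares meeting the target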
If one rubber bearing costs 3000 yuan, the corresponding uses of funds are 9000 yuan, 12000 yuan, 15000 yuan and 27000 yuan, as shown in Table 1.

Table 1. Empirical analysis

P0     Quantity of spare parts    Actual P0    Use of funds (yuan)
0.6    3                          0.668        9000
0.7    4                          0.7588       12000
0.8    5                          0.821        15000
0.9    9                          0.9436       27000

From the analysis of the results, we can see that more spare parts bring higher reliability but also require more funds. When maintenance funds are scarce, how to balance reliability and cost is the key consideration.

3 Model and Notations

z_1: the storage life span of the equipment from t = 0, z_1 ~ F(t)
z_j: the storage life span of the equipment after the (j−1)th updating, z_j ~ F(t)
z_j^(i): the storage life span of part i after the (j−1)th updating, i = 1, 2; j = 2, 3, ...
z_1^(i): the storage life span of part i from t = 0, z_1^(i) ~ F_i(t), i = 1, 2
T_1^(i): the time interval of part i from the beginning of storage to the first updating, i = 1, 2
T_j^(i): the time interval of part i from the (j−1)th updating to the jth updating, i = 1, 2; j = 2, 3, ...
S_j^(i): the jth updating time of part i, i = 1, 2; j = 2, 3, ...
N^(i)(t): the number of updatings of part i during [0, t], i = 1, 2

Besides, the distribution of T_1^(i) is G^(i) (i = 1, 2, ..., n) and the distribution of T_2^(i) is H^(i) (i = 1, 2, ..., n).
Let a_k = k·a (k = 1, 2, ..., n), b_k = k·a + b (k = 1, 2, ..., n), c_k = k·a + c (k = 1, 2, ..., n), G^(i) = {g_j^(i); j = 1, 2, ..., n}, H^(i) = {h_j^(i); j = 1, 2, ..., n}.

g_j^(1) = P(T_1^(1) = b_j) = P((j−1)a < Z_1^(1) ≤ ja) = F_1(ja) − F_1((j−1)a) = R_1((j−1)a) − R_1(ja),  j = 1, 2, ...

h_j^(1) = P(T_2^(1) = ja) = P((j−1)a − b < Z_2^(1) ≤ ja − b) = F_1(ja − b) − F_1((j−1)a − b) = R_1((j−1)a − b) − R_1(ja − b),  j = 1, 2, ...

g_j^(2) = P(T_1^(2) = c_j) = P((j−1)a < Z_1^(2) ≤ ja) = F_2(ja) − F_2((j−1)a) = R_2((j−1)a) − R_2(ja),  j = 1, 2, ...

h_j^(2) = P(T_2^(2) = ja) = P((j−1)a − c < Z_2^(2) ≤ ja − c) = F_2(ja − c) − F_2((j−1)a − c) = R_2((j−1)a − c) − R_2(ja − c),  j = 1, 2, ...

...

g_j^(n) = P(T_1^(n) = n_j) = P((j−1)a < Z_1^(n) ≤ ja) = F_n(ja) − F_n((j−1)a) = R_n((j−1)a) − R_n(ja),  j = 1, 2, ...

h_j^(n) = P(T_2^(n) = ja) = P((j−1)a − n < Z_2^(n) ≤ ja − n) = F_n(ja − n) − F_n((j−1)a − n) = R_n((j−1)a − n) − R_n(ja − n),  j = 1, 2, ...

where R_i(·) = 1 − F_i(·) is the reliability function of part i (i = 1, 2, 3, ..., n).



4 Availability Function

Because the availability at time t during the updating process depends on the last updating before time t, we need to study the distribution of S_{N^(i)(t)}^(i) of part i (i = 1, 2, ..., n). The generating functions of G^(i) and H^(i) are

g^(i)(s) = Σ_{j=1}^{∞} g_j^(i)·s^{b_j}   and   h^(i)(s) = Σ_{j=1}^{∞} h_j^(i)·s^{ja},   i = 1, 2, ..., n.

So the generating function of s_{k+1}^(i) = Σ_{j=1}^{k+1} T_j^(i) (i = 1, 2, ..., n) is g^(i)(s)·(h^(i)(s))^k. The distribution of s_{k+1}^(i) is G^(i) ∗ H_k^(i) (i = 1, 2, ..., n), where H_k^(i) denotes the k-fold convolution of H^(i) and ∗ denotes convolution.
So the relation between distribution and generating function is as follows:

⎧v (ji )( 0) = g (ji ) , i = 1, 2, ⋅⋅⋅, n. j = 1, 2, ⋅⋅⋅



⎪ ( i )(1) j −1 ( i )( 0) ( i )
⎪v j = ∑ vk h j − k , i = 1, 2, ⋅⋅⋅, n. j = 2,3, ⋅⋅⋅
⎪ k =1

⎪⋅

⎪⋅
⎪⋅

⎪ ( i )( n ) j −1 ( i )( m −1) ( i )
⎪v j = ∑ vk h j − k , i = 1, 2, ⋅⋅⋅, n. j = n + 1, n + 2, ⋅⋅⋅
⎩ k =n

So far we can obtain the analytical availability function of the equipment; the specific calculation is as follows:

(i) when 0 ≤ t < a_1,
A(t) = P(X_t = 1) = P(Z_1 > t) = R(t)

(ii) when a_1 ≤ t < a_2,
A(t) = P(X_t = 1) = P(X_t = 1, X_{a_1} = 1) + P(X_t = 1, X_{a_1} = 0) = R(t) + P(X_t = 1, X_{a_1} = 0)

(iii) when a_m ≤ t < a_{m+1} (m ≥ 2), there will be n kinds of cases:

Case 1: only part 1 is changed.

When b_m ≤ t < a_{m+1} (m ≥ 2),

P(X_t = 1, S_{N^(1)(t)}^(1) = b_m, S_{N^(2)(t)}^(2) ≠ c_m)
  = P(X_t = 1, S_{N^(1)(t)}^(1) = b_m)·P(X_t = 1, S_{N^(2)(t)}^(2) ≠ c_m)
  = P(X_t = 1 | S_{N^(1)(t)}^(1) = b_m)·P(S_{N^(1)(t)}^(1) = b_m)·( P(X_t = 1, S_{N^(2)(t)}^(2) = 0) + Σ_{i=1}^{m−1} P(X_t = 1, S_{N^(2)(t)}^(2) = c_i) )

So we have

P(X_t = 1, S_{N^(1)(t)}^(1) = b_m, S_{N^(2)(t)}^(2) ≠ c_m, ..., n_m)
  = R²(t − b_m)·Σ_{j=1}^{m} v_m^{(1)(j−1)}·( 1 + Σ_{l=1}^{m−1} Σ_{j=1}^{l} v_l^{(2)(j−1)} )

Case 2: only part 2 is changed.

P(X_t = 1, S_{N^(2)(t)}^(2) = c_m, S_{N^(1)(t)}^(1) ≠ b_m, ..., n_m)
  = R²(t − c_m)·Σ_{j=1}^{m} v_m^{(2)(j−1)}·( 1 + Σ_{l=1}^{m−1} Σ_{j=1}^{l} v_l^{(1)(j−1)} )

...

Case k: only part k is changed.

P(X_t = 1, S_{N^(k)(t)}^(k) = n_m, S_{N^(i)(t)}^(i) ≠ b_m, c_m, ..., for all i ≠ k)
  = R²(t − n_m)·Σ_{j=1}^{m} v_m^{(k)(j−1)}·( 1 + Σ_{i=1}^{m−1} Σ_{j=1}^{i} v_i^{(i≠k)(j−1)} )
Besides, we can obtain that, for any two parts i and j (i, j = 1, 2, ..., n, j ≠ i),

F(t) = P(Z ≤ t) = P( {Z^(i) ≤ t} ∪ {Z^(j) ≤ t} )
     = P(Z^(i) ≤ t) + P(Z^(j) ≤ t) − P(Z^(i) ≤ t)·P(Z^(j) ≤ t)
     = F_i(t) + F_j(t) − F_i(t)·F_j(t)

5 Conclusion

This paper derives the quantity of spare parts M for N identical systems based on the repaired as old model, under the requirement that at least N systems are normal with probability not less than P_0, and a special example is given to prove the feasibility. Besides, an availability function expression is provided, and taking a system which follows the Weibull distribution as the case, the validity of the system is shown through calculation.

References

[1] Allen, S.G., D’esopo, D.A.: An ordering policy for repairable stock items. Operations
Research 16(3), 82–489 (1968)
[2] Cohen, M.A., Kleindorfer, P.R., Lee, H., et al.: Multi-item service constrained(s, S) policy
for spare parts logistics system. Naval Research Logistics (39), 561–577 (1992)
[3] Moore, R.: Establishing an inventory management program. Plant Engineering 50(3),
113–116 (1996)
[4] Flint, P.: Too much of a good thing: Better inventory management could save the industry millions while improving reliability. Air Transport World (32), 103–106 (1995)
[5] Foote, B.: On the implementation of a control-based forecasting system for air-craft spare
parts procurement. IIE Transactions 27(2), 210–216 (1995)
[6] Luxhoj, J.T., Rizzo, T.P.: Probabilistic spaces provisioning for repairable population
models. Journal of Business Logistics 9(1), 95–117 (1988)
[7] Rajashree, K.K., Pakkala, T.P.M.: A Bayesian approach to a dynamic inventory model
under an unknown demand distribution. Computers & Operations Research 29, 403–422
(2002)
[8] Dan, Y., Xia, Y., Guoying, L.: Fiducial inference for repaired as old Weibull distributed
systems. Chinese Journal of Applied Probability and Statistics 20(2), 197–204 (2004)
[9] Xia, Y., Dan, Y., Guoying, L.: Fiducial inference for a kind of repairable equipment.
Journal of Systems Science and Mathematics Sciences 24(1), 17–27 (2004)
[10] Aronis, K.P., Magou, I., Dekker, R., et al.: Inventory control of spare parts using a Bayesian approach: a case study. European Journal of Operational Research 154, 730–739 (2004)
The Design of the Algorithm of Creating Sudoku Puzzle

Jixian Meng and Xinzhong Lu*

College of Mathematics, Physics and Information Engineering of Zhejiang Normal University,


Zhejiang, Jinhua, P.R. China, 321004
[email protected]

Abstract. Sudoku puzzle is a well-known and logical-based game. To generate


some puzzles of varying difficulty with “unique solution” is not so easy. We
make a standard of difficulty based on the player’s position, that is, difficulty of
solving methods. Then we develop an algorithm to generate puzzles satisfied the
requirement. For the complexity of our algorithm, we divide it into two parts.
One is the complexity of the algorithm to generate the complete grid. We
discover the randomness of generating complete grid increases when the
complexity increases, that is, the randomness higher and the complexity greater.
We have developed an algorithm which guarantees the most important premise
“unique solution” and ensures the complexity is low enough.

Keywords: Sudoku puzzles, complexity, algorithm.

1 Introduction

Sudoku is a well-known and time-honored game. Original Sudoku puzzle enjoys a tight
relationship with Latin Square. It firstly appeared as a logic-based placement puzzle in
“Dell Pencil Puzzles and Word Games” in 1979. In 1984 Nobuhiko Kanamoto
introduced it to Japan. The modern Sudoku was invented in Indianapolis in 1979 by
Howard Garns. He picked up a Japanese Sudoku magazine and became so enamored of
the puzzle that he spent six years writing a program named “Pappocom” which could
automatically generate puzzles of varying difficulty levels.
The aim of the Sudoku puzzle is to put in a numerical digit from 1 through 9 in each
cell of a 9×9 grid made up of 3×3 sub-grids (called "block”), starting with various digits
given in some cells (the "givens") with the others empty; each row, column, and block
must contain only one instance of each numeral. Now a large number of mathematicians
and computer engineers are researching the Sudoku puzzles problem [1-5]. In this paper,
we consider the Sudoku puzzles as the classical Sudoku with 9×9 cells.
As we know, developing an algorithm to generate Sudoku puzzle is harder than to
solve Sudoku. The difficult and key aspects are how to make the standard of difficulty
level and how to guarantee a unique solution of the Sudoku puzzle generated by our
algorithm.

*
Corresponding author.


2 Methods to Solve Sudoku


If a cell has only one candidate, then the value of that cell is this single candidate. If a digit has only one possible position in a given row, column or block, but it is hidden among other candidates, then this hidden candidate can be confirmed. If two cells in a group (row, column or block) contain an identical pair of candidates and only those two candidates, then no other cell in that group can take those values, so these two candidates can be excluded from the other cells in the group; for example, in Figure 1, cells D7 and D8 in row D have only the two candidates 1 and 7, and then the value of D9 is 4. A similar case occurs when three cells in a group contain no candidates other than the same three candidates; the cells do not have to contain every candidate of the triple, and if these candidates are found in other cells of the group they can be excluded. For example, in Figure 1, in the middle top block the candidates of B4, B5 and B6 are among 2, 7 and 9, so the candidates of the other cells in this block cannot be 2, 7 or 9; thus C6 gets the value 4.

Fig. 1.
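The candidate sets that these solving methods operate on can be computed directly from the grid; the following is a small Python sketch (0 denotes an empty cell), with a helper that detects the "single candidate" case used by the Level 1 logic later on.

def candidates(grid, r, c):
    # Digits that can legally be placed in the empty cell (r, c) of a 9x9 grid.
    if grid[r][c] != 0:
        return set()
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
    return set(range(1, 10)) - used

def single_candidates(grid):
    # Cells whose value is forced because exactly one candidate remains.
    forced = {}
    for r in range(9):
        for c in range(9):
            cand = candidates(grid, r, c)
            if len(cand) == 1:
                forced[(r, c)] = cand.pop()
    return forced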

3 Our Algorithm

3.1 Standard of Difficulty Level

We make 5 levels according to the solving methods.

Level 1. If the number of candidates of some one cell is only 1.

Level 2. If there is only one candidate for a given row, column or box but it is hidden
among other candidates.

Level 3. If two cells in a group (row, column or block) contain an identical pair of
candidates and only those two candidates, then no other cells in that group could be
those values.

Level 4. If three cells in a group contain no candidates other than the same three candidates. The cells do not have to contain every candidate of the triple. If these candidates are found in other cells in the group, they can be excluded.

Level 5. The rest cases those we do not mentioned above.

3.2 Generating Algorithm

Our main idea is: first randomly generate a complete Sudoku grid without any blank cell, and then empty cells step by step according to the difficulty level of the final puzzle we need. During this process, which we call "digging holes", we guarantee at each step that the puzzle still has a unique solution.

3.2.1 Generate a Complete Grid


Firstly, we randomly place the nine digits 1, 2, ..., 9 in the first row of the 9×9 grid (in the following, every grid mentioned has 9×9 cells). Clearly, such a partial grid has many different completions. Then, for the remaining 72 cells, we use depth-first search and induction to obtain a complete grid satisfying the rules of Sudoku.
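A minimal Python sketch of this step is shown below: the first row is a random permutation, and a depth-first search fills the remaining 72 cells. It reuses the candidates() helper sketched earlier and is only one possible realization of the procedure described above.

import random

def generate_complete_grid():
    grid = [[0] * 9 for _ in range(9)]
    grid[0] = random.sample(range(1, 10), 9)   # random first row
    return grid if fill_from(grid, 9) else None

def fill_from(grid, pos):
    # Depth-first search over the remaining cells, in row-major order.
    if pos == 81:
        return True
    r, c = divmod(pos, 9)
    cand = list(candidates(grid, r, c))
    random.shuffle(cand)                       # randomize to vary the result
    for digit in cand:
        grid[r][c] = digit
        if fill_from(grid, pos + 1):
            return True
        grid[r][c] = 0                         # backtrack
    return False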

3.2.2 Dig Holes


Starting from a complete grid, we next dig some holes, i.e., empty some cells, to generate puzzles of varying difficulty. The difficulty of the final puzzle depends on the digging method, i.e., on how we dig the holes. Generally speaking, the digging methods are the inverse processes of the solving methods.

3.2.2.1 Generate Puzzles of Level 1


Step 1. Randomly choose a filled cell which has not been chosen previously and empty it.
Step 2. Check whether the number of candidates of this cell is 1. If not, the cell cannot be emptied, so restore it and do Step 1 again; otherwise go to Step 1 directly.

If none of the remaining filled cells can be emptied, digging ends. The derived grid is a Sudoku puzzle we define as Level 1. For example, in Figure 4, cell B8 cannot be emptied, because its candidates are 6 and 7 after emptying it; if we empty cell G9, it has only the candidate 7, so G9 can be emptied.
The algorithm flow chart of Level 1 is as follows.

Fig. 2. Flow chart of the hole-digging algorithm for Level 1 puzzles
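A compact Python sketch of the Level 1 digging loop (the inverse of the "single candidate" solving step) is given below; it reuses the candidates() helper from above, and the fixed trial budget is an illustrative stand-in for the counter used in the flow chart.

import random

def dig_level1(grid, trials=1000):
    # Empty cells one at a time, keeping a cell empty only if its value is
    # still forced as a single candidate, so the puzzle stays solvable by
    # Level 1 reasoning and keeps a unique solution at every step.
    puzzle = [row[:] for row in grid]
    for _ in range(trials):
        r, c = random.randrange(9), random.randrange(9)
        if puzzle[r][c] == 0:
            continue
        saved = puzzle[r][c]
        puzzle[r][c] = 0
        if len(candidates(puzzle, r, c)) != 1:
            puzzle[r][c] = saved        # restore: this cell cannot be emptied
    return puzzle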

3.2.2.2 Generate Puzzles of Level 2


First we execute the process of 3.2.2.1 several times, and then do the following steps.
Step 1. Empty a cell in the same way as in Level 1 (3.2.2.1) and remember its previous value. Without loss of generality, assume this value is x.
Step 2. Check whether the digit x is the unique one appearing among the candidates and givens of all the cells belonging to some group (row, column or block) containing this cell, and whether no other candidate of this cell satisfies the same condition. If not, restore the cell and go to Step 1; otherwise go on to Step 1 directly.

If none of the remaining filled cells can be emptied, digging ends. The derived grid is a Sudoku puzzle we define as Level 2. The algorithm flow chart of Level 2 is as follows.

Fig. 3. Flow chart of the hole-digging algorithm for Level 2 puzzles

For example, in Figure 4, there is no 7 among the candidates and givens of the cells in row C except cell C5, so C5 can only be filled with 7.

Fig. 4.

3.2.2.3 Generate Puzzles of Level 3


Step 1. Follow the procedure below.
First, without loss of generality, empty two cells D6 and F6 that belong to the same column and block.
Second, empty the cell of value 4 in row D and the cell of value 5 in row F.
Third, empty a cell in column 6 randomly, for example H6.
Last, empty all the cells with value 4 or 5 in row H and in the middle bottom block.
Step 2. Execute the process of 3.2.2.1 or 3.2.2.2 several times while making sure the candidates of D6 and F6 always remain 4 and 5, until no hole can be dug. The derived grid is a Sudoku puzzle we define as Level 3, as in Figure 5.

Fig. 5.

3.2.2.4 Generate Puzzles of Level 4 and 5


The processes used to generate puzzles of Levels 4 and 5 are very similar to that of 3.2.2.3.

3.2.3 Complexity of Our Algorithm


In order to generate puzzles of varying difficulty, we use different methods to dig
holes. Each digging method corresponds to a different solving method, and we therefore
define the digging strategies as the inverse processes of the solving methods. At the same
time, we guarantee that the two inverse processes form a one-to-one mapping.
For the complexity of our algorithm, we divide it into two parts. One is the complexity
of the algorithm that generates the complete grid. We observe that the randomness of the
generated complete grid increases as the complexity increases; that is, higher randomness
means greater complexity. In order to ensure a certain degree of randomness, we do not intend
to reduce the present complexity. Therefore we consider the algorithm for generating a random
complete grid to be logical and feasible. For the different digging algorithms, we have made
our best efforts to minimize their complexity. We have developed an algorithm which
guarantees the most important premise, a unique solution, and ensures that the complexity is
low enough.

Acknowledgments

This work is supported by the National Natural Science Foundation of China


(10971198) and (ZSDZZZZXK03).

References

1. Felgenhauer, B.: Mathematics of Sudoku I. Mathematical Spectrum 39, 15–22 (2006)


2. Russell, E., Jarvis, A.F.: Mathematics of Sudoku II. Mathematical Spectrum 39, 54–58
(2006)
3. Suchard, E., Yatom, R., Shapir, E.: Sudoku & Graph Theory. Dr. Dobb’s Journal 1 (2006)
4. Abbott, P.: In and Out: In-Flight Puzzle. Mathematica J. 9, 528–531 (2005)
5. Lee, W.M.: Programming Sudoku. Springer, New York (2006)
Research and Validation of the Smart Power Two-Way
Interactive System Based on Unified Communication
Technology

Jianming Liu*, Jiye Wang, Ning Li, and Zhenmin Chen

State Grid Information & Telecommunication Co., LTD, 28th Floor, Times Fortune
Building, No. 1 Hang Feng Road, 100070, Fengtai, District, Beijing, China
[email protected]

Abstract. Smart power is an important part of smart grid construction, and it relates
directly to the user. The construction of the smart power two-way interactive system is
related to the realization of intelligent electricity consumption. In this paper, the
business needs of the intelligent electricity network are analysed first, and then the
existing communication technologies and networking architectures are compared and
researched. Following unified communication technology and selecting the appropriate
communication technologies, a fused, flexible and reliable communication network
architecture program based on the OPLC, including power line communication, wireless
communication and other communications, has been formed. The program is the basis and
guarantee of the system. The system advances the ability to exchange information between
the user and the grid and promotes the development of the intelligent grid.

Keywords: Smart Power; Two-way Interactive System; Optical Fiber Compos-


ite Low-voltage Cable; Power Line Communication; Wireless Communication.

1 Introduction
With economic development, people in the 21st century will face energy shortages.
The sustainable development of energy has become a focus of the world. Energy saving,
emission reduction and the low-carbon economy have become an international trend.
There is a lack of interaction between the traditional grid and its users, so users cannot
know real-time information about their power consumption, let alone accomplish the
remote control of home appliances. As a result, energy waste appears everywhere.
Therefore, the State Grid Corporation is actively changing the development mode of
power, and the smart grid, which makes up for the deficiencies of the traditional grid, has
come into being. It can realize real-time interactive response between user and grid, en-
hance the comprehensive service capability of the network, and meet the demand for interac-
tive marketing. The construction of the user-side communication network is the

* Jianming Liu (1955–), male, from Rongcheng, Shandong Province, is a senior engineer
(professor-level) and doctoral supervisor, researching power system information and
communication technologies and smart grid research, application and promotion.


foundation of intelligent electricity consumption, and is the premise and guarantee for
achieving informatization, automation and interactivity of power use.
At this stage, in the field of smart grid construction, developed countries mainly
concentrate on household energy management and renewable-energy access; in China,
intelligent information gathering, the intelligent home, electricity value-added services
and so on have been carried out during a period of early exploration and pilot construction.
With the development of the intelligent electricity business, the need for business
information integration and interaction will increase. A safe, reliable, high-bandwidth and
practical intelligent power communication network system urgently needs to be developed and
constructed to achieve good interaction between user and grid.

2 Smart Consumption Services Network Needs Analysis and Key


Communication Technology
2.1 Needs Analysis

With the development of "Triple Play", more and more value-added services can be used
in the smart consumption interaction system, and they will be further developed in a few
years, while the power information collection service and smart home appliances have
already been used in it. According to the differences between these services, their needs
on the network in terms of security, stability and real time also differ from each other.
Table 1 shows the needs of intelligent electric power use on the communication network.
As shown there, the two-way interactive communication system for intelligent power use is
characterized by a wide range of node types, diverse types of business, a concentrated trend
of data flow, and different data bandwidth requirements. In order to meet the demands of the
business, the system construction should be combined with the business characteristics,
adopt appropriate means of communication, and complete a communication network system that
accords with the characteristics of the power system.

Table 1. The network needs in different services

Power information collection — bandwidth: single-phase meter 573 bps, three-phase meter 684 bps; security: high; stability: middle; real time: low.
Smart home appliance — bandwidth: about 10 kbps; security: high; stability: middle; real time: low.
"Triple Play" and referred value-added services — bandwidth: phone 100 bps, IPTV (HD) 8 Mbps, Internet according to the customer's needs; security: according to the content supplier; stability: high; real time: high.

2.2 Key Communication Technology


Due to the diversity of communication modes, their respective advantages and disadvantages,
and their different scopes of application, various communication technologies will be melded
together to meet the requirements of the communication network of the system.
The communication technologies mainly involved are as follows:
(1) Optical Fiber Communication
Optical fiber communication is a kind of communication technology that uses
optical waves as the carrier of information and optical fiber as the transmission medium.
It has the advantages of high communication capacity, long transmission distance, high
transmission quality, good security, strong anti-interference ability and stable, real-
time data transmission, which can meet the service requirements of the smart elec-
tricity two-way interactive system. EPON will be applied to construct the network in the
system.
Optical Fiber Composite Low-voltage Cable (OPLC), which integrates optical fiber into
low-voltage cable, has the functions of both power cable and optical fiber communication;
hence it is also called electric power optical fiber. Its structure is shown in Fig. 1.
With the continuous development of communication network technology, FTTx is
the necessary choice to meet the requirements of high capacity and wide bandwidth.
At the same time, the advantages of avoiding secondary wiring, as well as saving con-
struction costs, bring great opportunity to the development of FTTx. Hence, OPLC
will be applied to construct the network in the system.

Fig. 1. Composite Structure of OPLC

(2) Power Line Communication


Power Line Communication (abbreviated as PLC), which transmits data and voice
signals through power lines, is the communication mode unique to the power system. PLC
is divided into wideband and narrowband technologies according to the differences in
bandwidth and modulation band. Narrowband technology is used in the system. The scope of
its application is greatly limited because of its low transmission rate, short transmission
distance, and stability that is noticeably affected by the power network.
However, for communication tasks that require only modest reliability and data
bandwidth, narrowband technology can be applied because it is simple to construct
and avoids wiring. It is mainly applied in the functional modules of electrical in-
formation collection and the smart home system.

(3) Wireless Communication


Wireless communication has the advantages of low cost and flexibility in constructing a net-
work, but its coverage and penetrating ability are limited. Wireless communi-
cation is mainly applied in the functional modules of the smart home system, the wireless
sensing security system, and the collection of water, electricity and gas meter readings.
1) ZigBee
ZigBee is a kind of wireless communication technology suitable for short-dis-
tance (10–100 meters), low-rate (lower than 25 kbps) communication among electronic
devices. It is applied to remote control in the smart home and to home security protec-
tion.
2) RF 433
Radio frequency (abbreviated as RF) refers to high-frequency alternating-current
electromagnetic waves. RF433 is a wireless communication system working at
433 MHz, which belongs to the ISM band. It can be widely applied in various fields of
short-distance wireless communication and industrial control, such as the collection of
water, electricity and gas meter readings.
3) WiFi
WiFi, the alternative name of IEEE 802.11b, is a kind of wireless network connec-
tion technology and can be used to construct wireless local area networks. It has the
advantages of high transmission rate and wide coverage, and can be widely
applied in the construction of wireless local area networks.
4) Infrared communication
Infrared communication can accomplish secure short-distance communication
and information retransmission between two points through infrared technology. It
has the advantages of high security, large information capacity and simple construction, but,
affected by the characteristics of infrared rays themselves, it is easily absorbed by dust
and rain. It can be applied to short-distance indoor wireless communication, such as
home security protection.

3 Network Scheme of Intelligent Power Two-Way Interactive


System’s Communication

3.1 Overall Network Architecture

Through the analysis of the network demands and of these communication technologies,
and according to the different nodes of the communication network, the appropriate commu-
nication mode is selected. The overall network architecture of the intelligent power two-
way interactive system is shown in Figure 2.

Fig. 2. Overall Network Architecture based on Optical Fiber Composite Low-Voltage Cable

From the power company to the users' side, the demands on channel band-
width, real-time performance, safety and reliability are high. Using fiber commu-
nication technology can meet the intelligent power two-way interactive function and
support the construction of the smart power grid. Besides, it can meet the demand of
intelligent power use and carry triple-play services.
The OPLC (Optical Fiber Composite Low-voltage Cable) is laid in the communi-
cation lines of 10 kV and below, that is, from the 10 kV substation to the power
distribution boxes in buildings, then to the meter box on each floor, and finally to the
power distribution box in the user's house. Along with the laying of power lines, optical
fiber access can follow, which reduces wiring construction and material costs.
The OLT, deployed in the 10 kV or 110 kV substation, provides network (Inter-
net, radio and television network, telecommunication network) aggregation and ac-
cess, and completes the optical/electrical conversion, the bandwidth allocation,
the control of channel connections, and the functions of real-time monitor-
ing, management and maintenance. The ONU, deployed in the users' houses and at the meter
box on each floor, realizes the transparent transfer of user data, the service of
voice and video, and the upload of meter data. The optical interface in the fiber power meter,
which integrates the optical network unit (ONU) function, can communicate directly with
the OLT to realize the acquisition and control of meter data.

3.2 Indoor Network Communications

In the user's house, the network diagram of the network devices, such as the control
center, intelligent interactive terminals, intelligent interactive set-top boxes, smart
sockets and handheld terminals, is shown in Figure 3. The control center is con-
nected to the interactive terminal equipment via Ethernet or WiFi to realize terminal
operations. Due to the lower bandwidth demand, the power information gathering
business adopts power line communication technology to avoid re-wiring
construction. Within the small indoor range, intelligent appliance control, home
security, and water and gas meter collection should be realized; because of the complexity
of device placement and wiring, wireless communication technology can be used. Through the
connection of the home access network to the public network, the 95598 Web site and the
Internet can be reached.

Fig. 3. Indoor network diagram

4 The Pilot Project Verification


The smart two-way interactive service system has been applied in many pilot projects,
such as the Lian Xiangyuan community, Fu Cheng Road 95 and the Guang Huaxuan
community in Beijing, the YuefuHaoting community in Shanghai, Haiyan in Zhejiang
Province, the pilot project in Chongqing, and so on. In these pilot
projects, through the communication network, the system organically links supply companies,
the community and families. It realizes power information collection,
intelligent electricity use, home control, community services and many other
data services, and it has achieved good results. At the same time, the system will be
used in the construction of intelligent communities by the power companies of northern
China, Beijing, Liaoning, Jiangsu, Henan and others.

5 Conclusions
This article has researched the characteristics of the electric power communication
network and analysed the business requirements of intelligent power on the user
side. Then the communication network structure of the smart two-way interactive service
system has been proposed. By selecting the best networking solutions according to the
applicable scope of the communication technologies, a flexible and reliable commu-
nication network system, based on the integration of OPLC, EPON and micro-power
wireless communication networks, has been carried out; it has been verified in the
application of the pilot projects and has obtained good results. This smart two-way
interactive service system based on multiple communication technologies is significant
for the development of the smart grid: it integrates several subsystems, realizes intelli-
gent power use and supports the construction of the smart grid.

References
1. Chen, L.: The Design of Intelligent Village and Intelligent Building System. China archi-
tecture & building press, Beijing (2000)
2. Zhang, F., Zhang, C.: Study on the short range wireless communication technique and its
merging developing trends. Electrical Measurement & Instrumentation 10, 48–52 (2007)
3. Qi, M., Qi, C., Huang, T.: Home automation system based on power line carrier communi-
cation technology. Electric Power Automation Equipment 25(3), 72–75 (2005)
4. Pu, L.: Design of Wireless Communication Protocol for Home LAN. Intelligent Ubiquitous
Computing and Education, 374–377 (2009)
5. Liu, J., Zhao, B., Li, X.: The report of transmission power line carrier-current communica-
tion test in State Grid Information & Telecommunication Co., LTD, 5 (2009)
6. Akyildiz, I., Su, W., Sankarasubramaniam, Y., et al.: Wireless Sensor Networks: A Sur-
vey. Computer Networks 38(4), 393–422 (2002)
A Micro Wireless Video Transmission System

Yong-ming Yang1, Xue-jun Chen1,2,*, Wei He1, and Yu-xing Mao1


1
State Key Laboratory of Power Transmission Equipment & System Security and
New Technology, Chongqing University, No.174 Shazhengjie Chongqing, 400044, China
2
Department of Electronic Engineering, Putian University, Putian, 351100, China
[email protected], [email protected], [email protected],
[email protected]

Abstract. The paper introduces a micro embedded wireless video transmission
system based on WiFi. The system communication is based on a C/S structure.
The server uses TI DaVinci technology to process video and uses H.264 for
video compression before transmission. The realization of the four modules of the
server is introduced. All the hardware was designed and made into a micro
wireless video transmission server. Furthermore, the server and client
software were realized. Finally, an experiment was done to test the system. The results show
that the system can support real-time wireless video transmission well.

Keywords: Wireless transmission, Video, Network, Encoding, Surveillance.

1 Introduction
With the development of multimedia technology, video surveillance has been widely
applied in factory workshops, road traffic, banks, mines, malls, airport security, hos-
pitals and so on [1-3]. However, there are some important locations, such as unmanned
substations and mobile cars or ships, that wired surveillance cannot reach or has not
reached. And for unattended patients, old people, robots and so on, wired lines are not
suitable. Wireless video transmission would be the best way.
With the development of wireless communication and networking systems and video
coding technology, wireless video surveillance has become feasible. In short-range wire-
less communication, Bluetooth was used [4], but it is not practical for the high band-
width needed for video surveillance. At the end of the last century, the second generation (2G)
brought us digital mobile communication, including the GSM, GPRS and CDMA networks.
Reference [5] introduced these three networks as used in wire-
less video. The maximum data transmission rate was no more than 200 kbit/s [6-8].
At the beginning of this century, the third generation (3G) is characterized by its abil-
ity to carry data at much higher rates. It enables multimedia communication with
bit rates ranging from 144 to 384 kbps outdoors and up to 2 Mbps indoors [4].
Although 2G and 3G radio access networks have become common ways for
users to access data, it is usual for users to pay more money to access more data [5]. Now,
the pricing trend is progressing toward the unlimited Internet model. WLANs are
of different types, such as 802.11, 802.11b, 802.11g, 802.11a, and 802.11n networks,
where the maximum data transmission rates are 1, 2, 11, 54, and 600 Mbps, respectively
[9]. IEEE 802.11n is a new high-speed transmission scheme used in WLANs. It focuses
on general data transmission and wireless Internet services.

* Corresponding author.
In order to enable the transmission of more video channels or higher quality video
representations within existing digital transmission capacities, in the last decade,
video compression technologies have evolved in the series of MPEG-1,MPEG-2,
MPEG-4, and H.264[10]. H.264/AVC is the newest international video coding stan-
dard [11]. References [10, 12-14] show that H.264 has achieved substantial superior-
ity of video quality over that of H.263, MPEG-2, and MPEG-4. It has achieved up to
50% in bit rate saving compared with H.263 or MPEG-4 coding schemes. This means
that H.264 offers significantly higher coding quality with the same bit rates [13].
Therefore, the paper takes H.264 to compress the video before wireless transmission.
On the other hand, the bulks for the most of the wired video surveillance devices
are big, so they are not used in small locations or portable surveillance. The paper
takes the TI DaVinci technology to process video, and applies H.264 in video com-
pression. Then the video compression pockets would be transformed to real time
transport protocol (RTP) format [15-16], they will be transmitted to PC client through
the WIFI communication module. All the hardware were designed and made into a
micro wireless video transmission server. VC++ was taken to design the video sur-
veillance interface at the PC client.

2 System Hardware Design


The micro wireless video transmission server is made up of a video acquisition mod-
ule, an ARM&DSP dual-core processor module, a WiFi communication module and a pow-
er module. The camera of the video acquisition module captures the scene in real time and
transmits it to the DSP processing center through the video encoder. The DSP process-
ing center processes the received video, including denoising, storing, compressing
and output. After the video is compressed and transformed into RTP format, it is
transmitted to the wireless network through the WiFi communication mod-
ule. A wireless router is used as an access point (AP). It connects with the PC. If
there is no router in the LAN or WLAN, a USB router can be plugged directly
into the PC as an AP. The video signals received through the wireless network are
decompressed by the PC. Users can monitor the acquired video and send control
signals to the micro wireless video transmission server with the interface software. The
wireless video transmission system diagram is shown in Fig. 1.
Fig. 1. Wireless video transmission system diagram
Fig. 2. System hardware configuration

2.1 Processor Module


The TMS320DM6446 is used as the central processor of the micro wireless video transmis-
sion server. The TMS320DM6446 has a dual-core architecture which provides the benefits of
both DSP and reduced instruction set computer (RISC) technologies, incorporating
a high-performance TMS320C64x+ DSP core and an ARM926EJ-S core. The system
hardware configuration is shown in Fig. 2.
Peripheral chips connect with the DSP through dedicated interfaces or the general bus. An
embedded operating system was ported to the system. A 16M flash was added and
used to store configuration data and the boot code file. When the system boots, the processor
reads the configuration information from flash and initializes the system. A start
code is read into the DDR2 and run at startup in order to read the second-stage
boot program from external program memory or download it from the JTAG.
Finally, the configuration and initialization of the peripheral chips are done by the
user's program, and the system runs.
After the system is running properly, it processes, stores, compresses and transmits
the video acquired from the cameras. On the other hand, it executes the control
commands received from the PC client through the wireless network.

2.2 Video Acquisition Module


This system is designed with two cameras, and two further channels are reserved. The TVP5146
can transform NTSC and PAL video into digital colour-difference signals (YUV 4:2:2),
supporting two composite video (CVBS) inputs or one S-Video (Y/C) input. The user-
programmable video output format is ITU-R BT.656 4:2:2 YCbCr. It has a small
package (32-pin TQFP) and low power consumption. Two TVP5146PBS chips are
used to encode the two captured videos; each uses only one composite video channel.
The TVP5146 samples the video captured by the camera, encodes it and transforms
it into 4:2:2 YUV. Finally, the encoded video is sent to DDR2.

2.3 WIFI Communication Module


The system uses the WG7201 as its WiFi communication module. The WG7201 is a single-
package wireless local area network (WLAN) module that integrates several compo-
nents: a single-chip MAC, baseband, and direct-conversion radio, together with an
IEEE 802.11 b/g radio power amplifier (PA), low-noise amplifier (LNA),
and transmit/receive (T/R) switch. The WG7210 interface provides a configurable secure
digital input and output (SDIO) host interface for the WLAN radio frequency (RF) signal.
To obtain better SDIO timing access, we reserve resistors and capacitors on the SDIO
traces for timing fine-tuning between the DSP and the WiFi module.

2.4 Power Module


In order to make the wireless video transmission server miniaturized and wireless,
a battery is used to supply power. The DC input voltage is 3.7 V. The working voltages of the
processor module are 3.3 V, 1.8 V and 1.2 V: 3.3 V supplies the DSP I/O, NOR
flash and other chips; 1.8 V is used for DDR, NAND flash and others; and the working
voltage of the DSP core is 1.2 V. The two cameras of the system work with DC 12 V. The
power module sets aside a 5 V supply for an LCD. A boost DC/DC converter is used to obtain
DC 12 V and 5 V in order to save power. The power to the cameras and the LCD is turned off
when they are not needed. A battery-monitoring ADC is used to monitor the battery voltage
in order to prevent the system from being turned off by a sudden power-down. At the same time,
to ensure the lithium battery's life, when the battery voltage drops to 3.4 V, the user is
prompted to replace the battery; if the battery voltage drops to 3.3 V, the system auto-
matically shuts down. The power module diagram is shown in Fig. 3.

Fig. 3. Power module diagram
Fig. 4. The circuit board of the wireless video transmission server

2.5 Hardware Board Design


The circuit board of the wireless video transmission server was divided into two
boards to miniaturize it. The power module and the video acquisition module were
integrated into one board as the bottom-layer plate, shown in Fig. 4. The processor module,
WiFi module and debug interface were made into a top-layer plate. The bottom-layer plate
and top-layer plate are connected by an HPI interface slot. The debug interface is used
for debugging and downloading programs. The two plates were made with an 8-layer circuit
layout, and their thickness is 1.65 mm. Hot-air levelling, through-hole solder resist,
green solder mask and lead-free technologies were used for circuit processing. The
sizes of the two plates are about equal to an IC card.

3 System Software
A Client/Server (C/S) structure was used to design the system software. The software design
is divided into server-side software and client software, i.e., software for the wireless video
transmission server and for the PC client. The server-side software consists of four
modules: video capture, video compression, video transmission and control proc-
essing. The client-side software is designed with three functional modules:
video receiving, video decoding and video displaying.

3.1 Server-Side Software

There are several operating systems available for the current embedded environment,
and embedded Linux is one of them. It is promoted as inexpensive, high-quality, reliable,
widely available and well supported. So the server uses the embedded Linux operating
system to realize multi-threaded operation.

The server-side software flow chart is shown in Fig. 5. After the Linux system boots
and the hardware of the processor module is initialized, the system uses "Serv-
er_initWifi" to initialize the hardware and configure the network parameters of the WiFi module.
Then "Server_initVideoEncode" is used to initialize video encoding; at the same time
it creates a video coding thread to capture video and initializes the video encoding queue
used to store encoded video. Thirdly, "Server_initUserinfo" and "Serv-
er_initCheckClientHeart" are used to initialize user information and create the
client-heartbeat checking thread, respectively. Finally, "Server_initNet" is used to initial-
ize the network, create the listen thread and wait for the PC client to visit.

3.2 Design of Wireless Video Transmission


The system uses H.264 to compress video to improve transmission efficiency. In order
to ensure real-time transmission, RTP is used for the video stream transfer. Therefore,
the H.264 video streams are transmitted by RTP; that is, the H.264 video data need to be encap-
sulated into packets meeting the RTP specification, as sketched below.
The wireless video transmission process is shown in Fig. 6. The thread created by "Serv-
er_initVideoEncode" is responsible for video encoding. When "YuvBuffer_Full" equals
one, a new video frame has been captured successfully. The thread then sends
it to the DSP to be compressed in the H.264 standard format. After that, the compressed data
are converted into RTP stream format. The created listen thread is responsible for listen-
ing on a specified port. When a user visits the server for the first time from the PC client,
the first step is to visit the port to be authenticated. Only after passing the certification
can the PC client receive video data from the server or send commands to the server; that is,
the client-server data transmission link is built up. Similarly, when the client "socket"
receives a video packet, it decompresses it and displays it on the monitor.
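As a rough sketch of that encapsulation step (not the actual server code), the following Python fragment packs one H.264 NAL unit into a single-NAL-unit RTP packet: a 12-byte RTP header (version 2) followed by the NAL unit. The payload type, SSRC and timestamp clock are illustrative assumptions.

import struct

def rtp_packet(nal_unit, seq, timestamp, ssrc=0x12345678, payload_type=96, marker=0):
    """RTP header fields: V=2, P=0, X=0, CC=0 | M, PT | sequence | timestamp | SSRC."""
    byte0 = 0x80                                   # version 2, no padding/extension/CSRC
    byte1 = (marker << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + nal_unit                       # single NAL unit per packet

# For a 25 fps stream with a 90 kHz RTP clock, the timestamp advances by
# 90000 // 25 = 3600 per frame, and the marker bit is set on the last packet
# of each frame before it is written to the UDP socket.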

Fig. 5. Server-side software flow chart
Fig. 6. Wireless video transmission process

When the user wants to close the client, interrupting the link between client and
server is an interactive process. First, the client sends an exit command through the con-
trol socket thread. When the server receives the command, it returns the appropriate
information accepting the exit command. After that, it releases the video
thread and then exits the control thread.

Fig. 7. Client software framework
Fig. 8. Client software interface

3.3 Client-Side Software

Client software is designed in three layers, such as the hardware layer, soft control
and decoding layer and soft interface layer. They are shown in Fig.7. Hardware layer
would be designed to receive the video packets and control packets through calling
windows API program. Soft control and decoding layer is responsible for packing and
sending the control commands from soft interface layer, and receives video data
packets and decodes them to display in soft interface layer in time.
Soft control and decoding layer consists of video decoding module and control
module. They are different threads. Control threads will get the control command
from “Control Button” to change the video bit rate, frame rate, encoding mode and so
on. Each control thread has a specific socket, Client and server is to communicate
through the socket. In order to make the client achieve simultaneously multiple moni-
toring, and to exclude the interference of different wireless networks, the client login
and connecting with server should be certified before they can implement link estab-
lished. Listening port of authentication is the port of socket bound. For each passing
authenticated user, there is a control thread will be created. When users log in and are
certified, they can accept video data pockets sent from the server.

4 Experiment and Result

In the experiment, the client and server form a wireless LAN, and they are set on the same IP
network segment. The service set identifier (SSID) of the server is CQUvideo, so the SSID of
the wireless router is also set to CQUvideo. The wireless transmission rate is set to
25 captured video frames per second. The server startup can be monitored by the client
through a hyper terminal.
When the client software is running, it connects to the server to log in and be
authenticated. The total number of sampled frames is set to 250, and the size of each captured
frame is 352×288. It takes about 4 s to capture, transmit and display all the frames.
The client software interface is shown in Fig. 8. Judging from the surveillance video, the
wireless transmission is normal and reliable. Since RTP is used to transmit the video, the
video is almost synchronous with the scene; there is only about 120 ms delay, which is almost
negligible. The system can support real-time wireless video transmission.

5 Conclusions
The paper has realized a micro wireless video transmission system. Compared with a con-
ventional wired surveillance system, the system uses WiFi to transmit video, which brings
four significant advantages. Firstly, the server is small and miniaturized,
which allows it to be used in small locations or as robot eyes. Secondly, it
is convenient for users to build the network on an existing LAN with WiFi wire-
less communication. Thirdly, the battery-powered realization makes it applicable
where no power supply is available. Fourthly, the server is completely wireless, so it needs
no wiring, which saves a lot of work.
In future work, the system will be applied to industrial production control, robot
vision, medical video transmission and other fields, and it will be improved and
upgraded in the specific applications.

Acknowledgments. This work is supported by State Key Laboratory of Power


Transmission Equipment&System Security and New Technology of China (Project –
2007DA10512709206), and the Natural Science Foundation Project of Chongqing of
China (2007BB2152).

References
1. Foresti, G.L.: Object Recognition and Tracking for Remote Video Surveillance. IEEE
Transactions on Circuits and Systems for Video Technology 9, 1045–1062 (1999)
2. Haering, N., Venetianer, P.L., Lipton, A.: The evolution of video surveillance: an
overview. Machine Vision and Applications 19, 279–290 (2008)
3. Remagnino, P., Velastin, S.A., Foresti, G.L., Trivedi, M.: Novel concepts and challenges
for the next generation of video surveillance systems. Machine Vision and Applica-
tions 18, 135–137 (2007)
4. Budagavi, M., Heinzelman, W.R., Webb, J., Talluri, R.: Wireless MPEG-4 video commu-
nication on DSP chips. IEEE Signal Processing Magazine 17, 36–53 (2000)
5. Etoh, M., Yoshimura, T.: Wireless video applications in 3G and beyond. IEEE Wireless
Communications 12, 66–72 (2005)
6. Lehtoranta, O., Suhonen, J., Hännikäinen, M., Lappalainen, V., Hännikäinen, T.D.: Com-
parison of video protection methods for wireless networks. Signal Processing: Image
Communication 18, 861–877 (2003)
7. Erdmann, C., Vary, P., Fischer, K., Xu, W., Marke, M., Fingscheidt, T., Varga, I., Kaindl,
M., Quinquis, C., Kövesi, B., Massaloux, D.: A candidate proposal for a 3GPP adaptive
multi-rate wideband speech codec. IEEE International Conference on Acoustics 2, 757–
760 (2001)
8. Rahnema, M.: Overview of the GSM System and Protocol Architecture. IEEE Communi-
cations Magazine 31, 92–100 (1993)
9. Lin, C.-F., Hung, S.-I., Chiang, I.-H.: An 802.11n wireless local area network transmission
scheme for wireless telemedicine applications. Proceedings of the Institution of Mechani-
cal Engineers, Part H: Journal of Engineering in Medicine 224, 1201–1208 (2010)
10. Wiegand, T., Sullivan, G.J., Bjontegaard, G., Luthra, A.: Overview of the H.264 /AVC
Video Coding Standard. IEEE Transactions On Circuits and Systems for Video Technol-
ogy 13, 560–576 (2003)

11. Joint Video Team of ITU-T and ISO/IEC JTC 1: Draft ITU-T Recommendation and Final
Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC
14496-10 AVC). Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, JVT-
G050 (2003)
12. Schwarz, H., Marpe, D., Wiegand, T.: Overview of the scalable video coding extension of
the H.264-AVC standard. IEEE Transactions on Circuits and Systems for Video Technol-
ogy 17, 1103–1120 (2007)
13. Yu, H., Lin, Z., Pan, F.: Applications and Improvement of H.264 in Medical Video Com-
pression. IEEE Transactions on Circuits and Systems-I:Regular Papers 52, 2707–2716
(2005)
14. Saponara, S., Blanch, C., Denolf, K., Bormans, J.: The JVT advanced video coding stan-
dard: complexity and performance analysis on a tool-by-tool basis. In: Proc. 13th Int.
Packetvideo Workshop, Nantes, France (2003)
15. Basso, A., Cash, G.L., Civanlar, M.R.: Real-time MPEG-2 delivery based on RTP: Im-
plementation issues. Signal Processing: Image Communication 15, 165–178 (1999)
16. Busse, I., Deffner, B., Schulzrinne, H.: Dynamic QoS control of multimedia applications
based on RTP. Computer Communications 19, 49–58 (1996)
Inclusion Principle for Dynamic Graphs

Xin-yu Ouyang1,2 and Xue-bo Chen2

1
School of Control Science and Engineering, Dalian University of Technology,
Dalian Liaoning 116024, China
[email protected]
2
School of Electronics and Information Engineering, Liaoning University of Science and
Technology, Anshan Liaoning 114051, China
[email protected]

Abstract. From the point of view of graphs, complex systems can be described by
using dynamic graphs. Thus, the relevant theory of dynamic graphs is
introduced, and an inclusion principle for dynamic graphs is provided. Based on the
inclusion principle and a permuted transformation, a decomposition method for
dynamic graphs is proposed. By using this approach, a graph can be decomposed
into a series of pair-wise subgraphs with the desired recurrent reverse order in the
expanded space of graphs. These decoupled pair-wise subgraphs may be designed
to have their own controllers or coordinators. This provides a theoretical framework
for the decomposition of complex systems, and is also convenient for the decentralized
control or coordination of complex systems.

Keywords: Inclusion Principle, Dynamic Graphs, System Decomposition,


Complex System, Graph Decomposition.

1 Introduction
Research on complex systems has currently become a focus in the world. A complex
system may be interpreted as being composed of multiple small systems, and these subsystems
are generally easier to study, whether analytically, experimentally, or
computationally. However, there is as yet no uniform definition of complex systems
in the literature. For the purpose of this paper, we call a system complex if it
consists of multiple homogeneous subsystems and has many dynamic
interconnections between subsystems, such as multi-agent systems, electric power
systems, multi-vehicle systems, etc. [1-2]. In general, the dynamic interconnections are
time-variant. Since complex systems have the characteristics of high dimension and
variable topological structure constraints, traditional centralized control methods are
obviously not suitable for them, and decentralized control has become a method in common use.
Thus, how to decompose the system is worth researching. In addition, decomposition is
also the premise of control or coordination.
From the point of view of graphs, complex systems may be described by dynamic
graphs, where the vertices and edges denote the subsystems and dynamic
interconnections respectively. Thus, when complex systems are discussed, they may be
abstracted as dynamic graphs. Based on the above, we will provide a
decomposition method for dynamic graphs, by which we can decompose
complex systems and complex networks.


2 Dynamic Graphs
Consider a weighted directed graph D = (V, E), which is an ordered pair with N
vertices in the set V and edges in the set E. The vertices {v1, v2, …, vN} are connected
by edges (vi, vj), each edge being oriented from vj to vi, where i, j ∈ {1, 2, …, N}. We
assign a weight eij to each edge if the edge (vj, vi) ∈ D, while eij = 0 if (vj, vi) ∉ D.
First let’s define a space Ω of graphs with a fixed number N of vertices, as a linear
space over the field R of real numbers. For any D1 , D2 ∈ Ω , there is a unique graph
( D1 + D2 ) ∈ Ω called the sum of D1 and D2 , and for any D ∈ Ω and any α ∈ R , there
is a unique graph α D ∈ Ω called the multiplication of the graph D by a scalar α .
Consider the space Ω of graphs and a family of mapping Φ (t , D ) , which to any
graph D ∈ Ω and any time t ∈ R assigns a graph Φ ∈ Ω .According to the literature
[3], the definition of dynamic graphs will be given in the following.
Definition 1. A dynamic graph D is a one-parameter mapping Φ : R × Ω → Ω of the
space Ω into itself satisfying: (i) Φ (t0 , D0 ) = D0 , ∀t0 ∈ R, ∀D0 ∈ Ω ; (ii) Φ (t , D ) is
continuous, ∀t ∈ R, D ∈ Ω ; (iii) Φ (t2 , Φ (t1 , D)) = Φ (t1 + t2 , D ), ∀t1 , t2 ∈ R, ∀D ∈ Ω .

Definition 2. The number of other vertices connected to the vertex vi is called its
degree and will be denoted by dvi. The graph D′ = (V′, E′) for which V′ ⊆ V and E′ ⊆ E is
called a subgraph of D.
With dynamic graph D we associate the isomorphic concept of N × N adjacency
matrix E = (eij ) , it can be stated [3]:

Definition 3. A dynamic adjacency matrix E is a one-parameter mapping Ψ :


R × R N × N → R N × N of the space R N × N into itself satisfying: (i) Ψ (t0 , E0 ) = E0 ,
∀t0 ∈ R, ∀E0 ∈ R N × N ; (ii) Ψ (t , E ) is continuous, ∀t ∈ R, E ∈ R N × N ; (iii)
Ψ (t2 , Ψ (t1 , E )) = Ψ (t1 + t2 , E ), ∀t1 , t2 ∈ R, ∀E ∈ R N × N .
Based on definition 1 and 3, dynamic graph can be also described by using an
augmented adjacency matrix

$$
D^{(N+1)} = \begin{bmatrix} 0 & V_{(N)} \\ V_{(N)}^{T} & E_{N\times N} \end{bmatrix}
= \begin{bmatrix}
0 & v_1 & v_2 & \cdots & v_{N-1} & v_N \\
v_1 & 0 & e_{12} & \cdots & e_{1,N-1} & e_{1N} \\
v_2 & e_{21} & 0 & \cdots & e_{2,N-1} & e_{2N} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
v_{N-1} & e_{N-1,1} & e_{N-1,2} & \cdots & 0 & e_{N-1,N} \\
v_N & e_{N1} & e_{N2} & \cdots & e_{N,N-1} & 0
\end{bmatrix} \qquad (1)
$$

where $V_{(N)} = (v_1, v_2, \dots, v_N)$ is a vector consisting of the vertices $v_i \in V$, and $T$ denotes
transpose. Thus, the augmented adjacency matrix $D^{(N+1)}$ includes the information of
all vertices and edges of a dynamic graph.
Since $D^{(N+1)}$ represents a dynamic graph, it can be expanded from the $N$-dimensional space
of graphs to an $\bar{N}$-dimensional one, described by
$$\bar{D}^{(\bar{N}+1)} = \begin{bmatrix} 0 & \bar{V}_{(\bar{N})} \\ \bar{V}_{(\bar{N})}^{T} & \bar{E}_{\bar{N}\times\bar{N}} \end{bmatrix} \qquad (2)$$

Definition 4. The graph $\bar{D}^{(\bar{N}+1)} = (\bar{V}, \bar{E})$ is an expanded dynamic graph of
$D^{(N+1)} = (V, E)$ if $N < \bar{N}$, $V \subset \bar{V}$, $E \subset \bar{E}$, and their relations satisfy
$$\bar{V} - V = \{v_{N+1}, v_{N+2}, \dots, v_{\bar{N}}\},$$
$$\bar{E} - E = \{e_{ij}, e_{ji} \mid i = 1, \dots, N;\ j = N+1, \dots, \bar{N}\} \cup \{e_{ij}, e_{ji} \mid i, j = N+1, \dots, \bar{N}\} \qquad (3)$$

3 General Descriptions of Complex Dynamic Interconnected Systems
Consider an $n$-th order complex system with dynamic interconnections $S = \{S_i\}$ with
$N$ ($N \ge 3$) subsystems, described by
$$S_i:\ \dot{x}_i = f_i(x_i, u_i, t) + \sum_{j=1,\, j\neq i}^{N} e_{ij}\, h_j(x_j, u_j, t);\quad y_i = g_i(x_i, t),\quad i = 1, 2, \dots, N \qquad (4)$$

where $x_i \in R^{n_i}$, $u_i \in R^{m_i}$ and $y_i \in R^{l_i}$ are the state, input and output vectors of the
$i$-th subsystem, respectively; $f_i(x_i, u_i, t)$, $h_j(x_j, u_j, t)$ and $g_i(x_i, t)$ are proper
functions, which may be linear or nonlinear; $e_{ij}$ is the dynamic interconnection coefficient
between subsystems $i$ and $j$, which is a function of time $t$ and/or the state
$x$; $e_{ij} = 0$ at $i = j$, i.e., a subsystem has no self-connection. The
variables satisfy
$$n = \sum_{i=1}^{N} n_i,\quad m = \sum_{i=1}^{N} m_i,\quad l = \sum_{i=1}^{N} l_i, \qquad (5)$$
$$x = [x_1^T, x_2^T, \dots, x_N^T]^T,\quad u = [u_1^T, u_2^T, \dots, u_N^T]^T,\quad y = [y_1^T, y_2^T, \dots, y_N^T]^T$$

where $x \in R^{n}$, $u \in R^{m}$ and $y \in R^{l}$ are the state, input and output vectors of the overall
system, respectively. The matrix form of the system (4) can be described by
$$S:\ \dot{x} = f(x, u, t) + E \cdot h(x, u, t);\quad y = g(x, t),\quad (N \ge 3) \qquad (6)$$
where $E = (e_{ij})$ is the dynamic adjacency matrix. We have
$$f(x, u, t) = [f_1, \dots, f_N]^T,\quad h(x, u, t) = [h_1, \dots, h_N]^T,\quad g(x, u, t) = [g_1, \dots, g_N]^T. \qquad (7)$$
According to the concept of dynamic graphs, if we use vertices to denote subsystems and
$e_{ij}$ to denote edges, the complex dynamic system can be represented by a
dynamic graph. In order to decompose the system, we will next propose an inclusion principle
for dynamic graphs, by which complex systems can be decomposed into a
series of pair-wise subsystems.

4 Inclusion Principle of Dynamic Graphs


In fact, a dynamic graph D ( N +1) consists of many subgraphs. If these subgraphs are
pair-wise subgraphs Dij (2 +1) (or denoted by Dij in the context) composed of vertices vi
and v j , we have the following definition [4]:

Definition 5. In its expanded space of graphs, the dynamic graph D ( N +1) is said to be of
N ( N − 1) / 2 pair-wise subgraphs Dij if only the degree d vi of each vertex vi is N ,
that is, at least eij ≠ 0 and/or e ji ≠ 0 . The subscript of Dij is called as a pair sequence.
Definition 5 implies that D ( N +1) can be taken as a multi-overlapping graph of Dij .
This idea of multi-overlapping provides a decomposition method for dynamic graphs.
According to the pair sequence of Dij in Definition 5, we give a special sequence of
the pair-wise subgraphs with recurrent reverse order subscripts.

Definition 6. In its expanded space of graphs, the dynamic graph $D^{(N+1)}$ is said to consist of
$N(N-1)/2$ pair-wise subgraphs with recurrent reverse order subscripts if its
pair-wise subgraphs $D_{ij}$ are arranged as
$$\{D_{12}, D_{23}, D_{13}, D_{34}, D_{24}, D_{14}, \dots, D_{ij}, \dots, D_{2N}, D_{1N}\},\quad i = j - k;\ j = 2, \dots, N;\ k = 1, \dots, j-1. \qquad (8)$$

This special sequence of $D_{ij}$ allows a rearrangement of the pair-wise subgraphs when
the last few vertices are disconnected from the dynamic graph $D^{(N+1)}$. Furthermore, it
also benefits reconstruction when new vertices are added to the dynamic graph $D^{(N+1)}$:
the dynamic graph can be expanded by adding vertices.
In order to obtain pair-wise subgraphs from dynamic graphs, let us introduce the
inclusion principle for dynamic graphs and the recurrent reverse order transform.
Based on the related theory in the literature [4-10] and system (4), the inclusion
principle for dynamic graphs is given in the following:

Theorem 1. The expanded dynamic graph $\bar{D}^{(\bar{N}+1)} = (\bar{V}, \bar{E})$ includes the dynamic graph
$D^{(N+1)} = (V, E)$, or $\bar{D}^{(\bar{N}+1)} \supset D^{(N+1)}$, if there exists a pair of full rank matrices
$\{R_{\bar{N}\times N}, S_{N\times\bar{N}}\}$ satisfying $SR = I_N$, such that for any $D_0^{(N+1)} = (V_0, E_0)$, the
condition $\bar{E}_0 = R E_0 S$ implies $E(t, E_0) = S\,\bar{E}(t, \bar{E}_0)\,R$ and $V_{(N)}^{T} = S\,\bar{V}_{(\bar{N})}^{T}$
for all $t \ge t_0$; here the degree $d_{v_i}$ of vertex $v_i$ is $N - 1$.
The relations between $\bar{D}^{(\bar{N}+1)}$ and $D^{(N+1)}$ can be represented by
$$\bar{E} = R\,E\,S + M_E,\qquad \bar{V}_{(\bar{N})}^{T} = R\,V_{(N)}^{T} \qquad (9)$$

where $M_E$ is a complementary or coordination matrix of proper dimension, which
satisfies $M_E R = 0$ [10]; $R$ is a full rank expanding transformation matrix and $S$ is a
full rank contracting transformation matrix. A typical choice of the full rank matrices
$\{R, S\}$ is the following:
$$R = \operatorname{blockdiag}\Big(\underbrace{\begin{bmatrix} I_{n_1} & I_{n_1} & \cdots & I_{n_1}\end{bmatrix}}_{N-1},\ \underbrace{\begin{bmatrix} I_{n_2} & \cdots & I_{n_2}\end{bmatrix}}_{N-1},\ \dots,\ \underbrace{\begin{bmatrix} I_{n_N} & \cdots & I_{n_N}\end{bmatrix}}_{N-1}\Big)^{T},$$
$$S = \frac{1}{N-1}\operatorname{blockdiag}\Big(\underbrace{\begin{bmatrix} I_{n_1} & \cdots & I_{n_1}\end{bmatrix}}_{N-1},\ \dots,\ \underbrace{\begin{bmatrix} I_{n_N} & \cdots & I_{n_N}\end{bmatrix}}_{N-1}\Big), \qquad (10)$$
with $N$ diagonal blocks in each blockdiag.
The expanded dynamic graph $\bar{D}^{(\bar{N}+1)}$ can be described by
$$\bar{D}^{(\bar{N}+1)} = \begin{bmatrix} 0 & \bar{V}_{(\bar{N})} \\ \bar{V}_{(\bar{N})}^{T} & \bar{E}_{\bar{N}\times\bar{N}} \end{bmatrix} \qquad (11)$$

where
$$\bar{V}_{(\bar{N})} = \big(\underbrace{v_1\ v_1\ \cdots\ v_1}_{N-1},\ \underbrace{v_2\ v_2\ \cdots\ v_2}_{N-1},\ \dots,\ \underbrace{v_N\ v_N\ \cdots\ v_N}_{N-1}\big),\qquad \bar{E} = \big(\bar{E}_{ij}\big)_{N\times N},\quad i, j = 1, 2, \dots, N, \qquad (12)$$
and the $\bar{E}_{ij}$ are block matrices of dimension $(N-1)n_i \times (N-1)n_j$.


The corresponding complementary matrix M E is uniquely determined by,

M E = ( M ijE ) , i, j=1,2,…, N. (13)


N×N

where, M ijE are also block matrices with a dimension of (N–1) I ni × (N–1) I n j .
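The expansion/contraction pair can be illustrated numerically. The NumPy sketch below (our own toy example, with arbitrary subsystem orders n_i and the complementary matrix M_E taken as zero) builds R and S as in (10), checks SR = I, and verifies that contraction recovers E from Ē = RES:

import numpy as np

def block_diag(*blocks):
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

N, dims = 3, [2, 1, 2]                                 # N subsystems of orders n_i
R = block_diag(*[np.vstack([np.eye(ni)] * (N - 1)) for ni in dims])
S = block_diag(*[np.hstack([np.eye(ni)] * (N - 1)) for ni in dims]) / (N - 1)
print(np.allclose(S @ R, np.eye(sum(dims))))           # SR = I

E = np.random.default_rng(1).standard_normal((sum(dims), sum(dims)))
E_bar = R @ E @ S                                      # expansion (9) with M_E = 0
print(np.allclose(S @ E_bar @ R, E))                   # contraction recovers E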

5 The Permuted Inclusion Principle of Dynamic Graphs

In order to establish a recurrent reverse order of the pair-wise subgraphs $D_{ij}$ in the dynamic
graph $D^{(N+1)}$, we associate the inclusion principle of dynamic graphs with permutation.
First of all, let us introduce the permuted transform before giving the permuted inclusion
principle [4,7,10].
Definition 7. By partitioning an identity matrix $I_{n\times n}$ into $M$ sub-identity matrices
$I_1, \dots, I_k, \dots, I_M$ with proper dimensions, we call
$$p_{k(k+1)} = \operatorname{blockdiag}\Big(I_1, \dots, I_{k-1}, \begin{bmatrix} 0 & I_k \\ I_{k+1} & 0 \end{bmatrix}, I_{k+2}, \dots, I_M\Big),$$
$$p_{k(k+1)}^{-1} = \operatorname{blockdiag}\Big(I_1, \dots, I_{k-1}, \begin{bmatrix} 0 & I_{k+1} \\ I_k & 0 \end{bmatrix}, I_{k+2}, \dots, I_M\Big) \qquad (14)$$
the basic column exchange matrices and basic row exchange matrices respectively, and
$$P = p_{i(i+1)}\, p_{(i+1)(i+2)} \cdots p_{(j-1)j} = \prod_{k=i}^{j-1} p_{k(k+1)},$$
$$P^{-1} = p_{(j-1)j}^{-1} \cdots p_{(i+1)(i+2)}^{-1}\, p_{i(i+1)}^{-1} = \prod_{k=i}^{j-1} p_{k(k+1)}^{-1},\qquad (i \ge 1,\ j \le M) \qquad (15)$$
the column group permutation matrices and row group permutation matrices, respectively.
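A small NumPy illustration of Definition 7 (an assumption-laden toy, not taken from the paper) builds the basic column exchange matrix p_{k(k+1)} for given block sizes and shows that right-multiplication exchanges two adjacent column blocks:

import numpy as np

def p_exchange(block_sizes, k):
    """Column exchange matrix p_{k(k+1)} of (14): identity blocks I_1..I_M,
    with the k-th and (k+1)-th diagonal blocks replaced by the off-diagonal swap."""
    starts = np.cumsum([0] + list(block_sizes))
    P = np.zeros((int(starts[-1]),) * 2)
    order = list(range(len(block_sizes)))
    order[k - 1], order[k] = order[k], order[k - 1]    # blocks k and k+1 trade places
    col = 0
    for b in order:
        size = block_sizes[b]
        P[starts[b]:starts[b] + size, col:col + size] = np.eye(size)
        col += size
    return P

A = np.arange(16).reshape(4, 4)
print(A @ p_exchange([1, 2, 1], 1))    # column block 2 now precedes column block 1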
If we want to obtain the special sequence of $D_{ij}$ in Definition 6, the following
transforms can be used:
P = Π iN=1− 2 Π Nj =−1i −1Π kN=(1N+−i (ij )−−1)i ( j +1) pk ( k +1) ,
(16)
P −1 = Π iN=1− 2 Π Nj =−1i −1Π kN=(1N+−i ( ij −) −1)i ( j +1) pkT( k +1)

Here the permutation matrices $P$ and $P^{-1}$ represent a nonsingular column transformation and a
nonsingular row transformation, respectively.
According to the above, we give the following theorem:

Theorem 2. The expanded dynamic graph $\bar{D}_P^{(\bar{N}+1)} = (\bar{V}_P, \bar{E}_P)$, permuted from
$\bar{D}^{(\bar{N}+1)} = (\bar{V}, \bar{E})$, includes the dynamic graph $D^{(N+1)} = (V, E)$, or
$\bar{D}_P^{(\bar{N}+1)} \supset D^{(N+1)}$, if there exists a pair of full rank matrices $\{R_P, S_P\}$ satisfying
$S_P R_P = I_N$, such that for any $D_0^{(N+1)} = (V_0, E_0)$, the condition $\bar{E}_{P0} = R_P E_0 S_P$ implies
$E(t, E_0) = S_P\,\bar{E}_P(t, \bar{E}_{P0})\,R_P$ and $V_{(N)}^{T} = S_P\,\bar{V}_{(\bar{N})P}^{T}$ for all $t \ge t_0$;
here the degree $d_{v_i}$ of vertex $v_i$ is $N - 1$.
The expanded and permuted matrices of $\bar{D}_P^{(\bar{N}+1)}$ can be obtained by
$$\bar{E}_P = R_P\, E\, S_P + M_{E_P},\qquad \bar{V}_P = R_P\, V,\qquad R_P = P^{-1} R,\quad S_P = S P,\quad M_{E_P} = P^{-1} M_E P \qquad (17)$$
After pair-wise decomposition, each pair-wise subgraph has four basic communication modes:
full connection, two half connections and disconnection.
Definition 8. Taking the augmented adjacency matrix of a pair-wise subgraph $D_{ij}$, there
are four basic communication modes, described as follows:

(a) If eij ≠ 0 , e ji ≠ 0 , it represents a bidirectional connection, denoted by Dij , and


both vertices vi and v j act on each other;
(b) If eij ≠ 0 , e ji = 0 , it represents a unidirectional connection, denoted by Dij , and
the vertex v j acts on vi ;
(c) If eij = 0 , e ji ≠ 0 , it also represents a unidirectional connection, denoted by
Dij , and the vertex vi acts on v j ;
(d) If eij = 0 , e ji = 0 , it represents a disconnection, and there is no relationship
between the vertex v j and vi .
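The four modes of Definition 8 amount to a simple classification of the pair (e_ij, e_ji); a minimal Python sketch of this classification (ours, for illustration only) is:

def connection_mode(e_ij, e_ji):
    """Classify the pair-wise subgraph D_ij according to Definition 8."""
    if e_ij != 0 and e_ji != 0:
        return "bidirectional: v_i and v_j act on each other"        # case (a)
    if e_ij != 0:
        return "unidirectional: v_j acts on v_i"                     # case (b)
    if e_ji != 0:
        return "unidirectional: v_i acts on v_j"                     # case (c)
    return "disconnected"                                            # case (d)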
By using the permuted inclusion principle of dynamic graphs, a dynamic graph $D^{(N+1)}$
can be decomposed into a series of pair-wise decoupled subgraphs in its expanded graph
space, which can be described as follows:
$$\bar{D}^{(\bar{N}+1)} = \{D_{ij}\} \cup \{D_{ij}, D_{ij}\},\quad i = j - k;\ j = 2, 3, \dots, N;\ k = 1, \dots, j-1 \qquad (18)$$
It implies that there are $N(N-1)/2$ pair-wise bidirectional connection subgraphs
with recurrent reverse order subscripts and $N(N-1)$ pair-wise unidirectional
connection subgraphs in the expanded space of graphs. If we extract the bidirectional
connection subgraphs and ignore all other subgraphs, we can design a decentralized
control scheme, which provides us with a method to research complex systems.

6 Conclusions
The paper has proposed a decomposition method for dynamic graphs based on the permuted
inclusion principle, which can be used to decompose complex systems. It provides a
theoretical framework for researching decentralized control or coordination of complex
systems and complex networks.

Acknowledgment
This research reported herein was supported by the NSF of China under grant No.
60874017.

References
[1] Zhang, Z.D., Jia, L.M., Chai, Y.Y.: On General Control Methodology for Complex
Systems. In: Proceedings of the 27th Chinese Control Conference, Kunming,Yunnan,
China, pp. 504–508 (2008)
[2] Ouyang, X.Y., Chen, X.B., Wang, W.: Modeling and decomposition of complex dynamic
interconnected systems. In: The 13th IFAC Symposium on Information Control Problems
in Manufacturing, Moscow, Russia, pp. 1006–1011 (2009)

[3] Šiljak, D.D.: Dynamic graphs. Nonlinear Analysis: Hybrid Systems 2, 544–567 (2008)
[4] Chen, X.B., Stankovic, S.S.: Decomposition and decentralized control of systems with
multi-overlapping structure. Automatica 41, 1765–1772 (2005)
[5] Chen, X.B., Stankovic, S.S.: Dual inclusion principle for overlapping
interconnected systems. I. J. Control 77(13), 1212–1222 (2004)
[6] Ikeda, M., Šiljak, D.D., White, D.E.: Decentralized control with overlapping information
sets. J. Optimization Theory and Applications. 34(2), 279–310 (1981)
[7] Chen, X.-B., Stankovic, S.S.: Overlapping decentralized approach to automatic
generation control of multi-area power systems. I.J. Control 80(3), 386–402 (2007)
[8] Chen, X., Stankovic, S.S.: Inclusion principle of stochastic discrete-time systems. Acta
Automatica Sinica 23(1), 94–98 (1997)
[9] Ikeda, M., Šiljak, D.D.: Lotka-Volterra Equations: Decomposition, Stability, and
Structure. Journal of Mathematical Biology 9(1), 65–83 (1980)
[10] Chen, X.B., Xu, W.B., Huang, T.Y., Ouyang, X.Y., Stankovic, S.S.: Pair-wise
decomposition for coordinated control of complex systems. Submitted to Information
Sciences (2010)
Lie Triple Derivations for the Parabolic
Subalgebras of gl(n, R)

Jing Zhao1, , Hailing Li2 , and Lijing Fang1


1
College of Mathematics and Computer Science,
Guangxi University for Nationalities, Nanning, 530006, Guangxi, China
2
School of Science, Shandong University of Technology,
Zibo, 255049, Shandong, China
{zhaojing2009,lihailing550}@yahoo.cn, [email protected]

Abstract. Let R be a commutative ring with identity, gl(n, R) the gen-


eral linear Lie algebra over R, P a parabolic subalgebra of gl(n, R). In
this paper, we give an explicit description of Lie triple derivations for the
parabolic subalgebra P of gl(n, R).

Keywords: Parabolic subalgebra, Lie triple derivation, Commutative ring.

1 Introduction

Let R be a commutative ring with identity, gl(n, R) the general linear Lie algebra
consisting of all n × n matrices over R and with the bracket operation [x, y] =
xy − yx. We denote by E the identity matrix in gl(n, R), Ei,j the matrix in
gl(n, R) whose sole nonzero entry 1 is in the (i, j) position, t the subset of
gl(n, R) consisting of all n × n upper triangular matrices over R, and d the
subset of gl(n, R) consisting of all n × n diagonal matrices over R.
Recently, significant work has been done in studying the derivations of general
linear Lie algebra and its subalgebras, such as the subalgebra consisting of all
n × n upper triangular matrices (see [1,2,3]), the parabolic subalgebra (see [4])
and Lie triple derivations of nest algebras and TUHF algebras (see [5,6,7]). In
[8], Li and Wang described the generalized Lie triple derivations for the maximal
nilpotent subalgebras of classical complex simple Lie algebras.
The purpose of this paper is to describe any Lie triple derivation for the
parabolic subalgebra P of gl(n, R). The main result can be summarized as
follows.

Theorem 1. Let $R$ be a commutative ring with identity, $P = t + \sum_{1\le i<j\le n} A_{j,i}E_{j,i}$
a parabolic subalgebra of gl(n, R) with $\Phi = \{A_{j,i} \in I(R)\mid 1 \le i < j \le n\}$ a flag of
ideals of $R$. Then every Lie triple derivation $\phi$ of $P$ can be uniquely written as
follows.

This work is supported by a grant-in-aid for Innovation Fund from Guangxi Univer-
sity for Nationalities, China.


(1) n ≥ 3, φ = adx + ηχ + ρπ , where adx, ηχ , ρπ are inner, central and extreme


triple derivation, respectively.
(2) n = 2, φ = adx + ηχ + φ(σ1 , σ2 ), where adx, ηχ , φ(σ1 , σ2 ) are inner, central
and permutation triple derivation, respectively.

2 The Standard Triple Derivations of P



Let P = t + Aj,i Ej,i be a fixed parabolic subalgebra of gl(n, R) with
1≤i<j≤n
Φ = {Aj,i ∈ I(R)|1 ≤ i < j ≤ n} a flag of ideals of R. The standard triple
derivations of P is given as follows.
1. Inner triple derivations
Let x ∈ P , then adx : P → P, y 
→ [x, y] is a derivation of P , obviously, it is
a triple derivation of P .
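The following short check (added here for clarity; it is not part of the original text) verifies the triple-derivation property of adx by applying the Jacobi identity twice:

\begin{align*}
\mathrm{ad}x([[y,z],w]) &= [x,[[y,z],w]] \\
 &= [[x,[y,z]],w] + [[y,z],[x,w]] \\
 &= [[[x,y],z],w] + [[y,[x,z]],w] + [[y,z],[x,w]] \\
 &= [[\mathrm{ad}x(y),z],w] + [[y,\mathrm{ad}x(z)],w] + [[y,z],\mathrm{ad}x(w)].
\end{align*}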
2. Central triple derivations
For 1 ≤ i < j ≤ n, let Bj,i be the annihilator of Aj,i in R, Bj,i = {r ∈
R|rAj,i = 0}.

Definition 1. Let χ : d → R be a homomorphism of R-modules. We call χ suitable for central triple derivations, if

χ(E_{i,i} − E_{j,j}) ∈ B_{j,i}, for all 1 ≤ i < j ≤ n.    (1)

Lemma 1. Define η_χ : P → P by

η_χ(x) = χ(D_x)E + Σ_{1≤i,j≤n} (j − i + 1 (mod 2)) r a_{i,j}E_{i,j},

where D_x denotes the projection of x to d (D_x = Σ_{i=1}^{n} a_{i,i}E_{i,i} when x = Σ_{1≤i,j≤n} a_{i,j}E_{i,j}) and 2r = 0, r ∈ R. Then η_χ is a Lie triple derivation, provided that χ is suitable for central triple derivations.

Proof. Let

x = Σ_{1≤i,j≤n} a_{i,j}E_{i,j},  y = Σ_{1≤i,j≤n} b_{i,j}E_{i,j},  z = Σ_{1≤i,j≤n} c_{i,j}E_{i,j},    (2)

where all a_{i,j}, b_{i,j}, c_{i,j} lie in R and a_{q,p}, b_{q,p}, c_{q,p} ∈ A_{q,p} for 1 ≤ p < q ≤ n. Note that

[[x, y], z] = Σ_{1≤i,j≤n} d_{i,j}E_{i,j},    (3)

where

d_{i,j} = Σ_{1≤r,k≤n} (a_{i,r}b_{r,k}c_{k,j} − b_{i,r}a_{r,k}c_{k,j} − c_{i,k}a_{k,r}b_{r,j} + c_{i,k}b_{k,r}a_{r,j}).    (4)

For fixed r, k, denote by d^{(r,k)}_{i,j} the corresponding summand in (4). We have

d^{(r,k)}_{i,i} = 0 for k = i,   d^{(r,k)}_{i,i} + d^{(r,i)}_{k,k} = 0 for k ≠ i.    (5)

Since χ is suitable for central triple derivations and Φ = {A_{j,i} ∈ I(R) | 1 ≤ i < j ≤ n} is a flag of ideals of R, by (5) we have that

d^{(r,k)}_{i,i} χ(E_{i,i}) + d^{(r,i)}_{k,k} χ(E_{k,k}) = d^{(r,k)}_{i,i} χ(E_{i,i} − E_{k,k}) = 0.    (6)

Let B = (b_{i,j}) be the n × n matrix whose (i, j) entry is b_{i,j} = d^{(r,j)}_{i,i}; using (5), we have that b_{i,i} = 0 for i = 1, 2, . . . , n and b_{i,j} + b_{j,i} = 0 for 1 ≤ i < j ≤ n. Then

Σ_{i=1}^{n} d_{i,i} χ(E_{i,i}) = Σ_{1≤r≤n} Σ_{1≤i<j≤n} b_{i,j} χ(E_{i,i} − E_{j,j}) = 0    (7)

and

η_χ([[x, y], z]) = Σ_{1≤i,j≤n} (j − i + 1 (mod 2)) r d_{i,j}E_{i,j},    (8)

which equals [[η_χ(x), y], z] + [[x, η_χ(y)], z] + [[x, y], η_χ(z)]; thus η_χ is a Lie triple derivation.


3. Extreme triple derivations

Definition 2. Let n ≥ 3 and let π : A_{n,1} → R be a homomorphism of R-modules. We call π suitable for extreme triple derivations, if the following conditions are satisfied:

(a) π(A_{n,1}) ⊆ (∩_{i=1}^{n−1} B_{n,i}) ∩ (∩_{i=2}^{n} B_{i,1}),
(b) π(A_{n,j}A_{j,1}) = 0 for j = 2, 3, . . . , n − 1.

Lemma 2. Define ρ_π : P → P by

ρ_π(Σ_{1≤i,j≤n} a_{i,j}E_{i,j}) = π(a_{n,1})E_{1,n}.

Then ρ_π is a Lie triple derivation, provided that π is suitable for extreme triple derivations.

Proof. Let

x = Σ_{1≤i,j≤n} a_{i,j}E_{i,j},  y = Σ_{1≤i,j≤n} b_{i,j}E_{i,j},  z = Σ_{1≤i,j≤n} c_{i,j}E_{i,j},    (9)

where all a_{i,j}, b_{i,j}, c_{i,j} lie in R and a_{q,p}, b_{q,p}, c_{q,p} ∈ A_{q,p} for 1 ≤ p < q ≤ n. Note that

[[x, y], z] = Σ_{1≤i,j≤n} d_{i,j}E_{i,j},    (10)

where

d_{i,j} = Σ_{1≤r,k≤n} (a_{i,r}b_{r,k}c_{k,j} − b_{i,r}a_{r,k}c_{k,j} − c_{i,k}a_{k,r}b_{r,j} + c_{i,k}b_{k,r}a_{r,j}).    (11)

It is not difficult to check that ρ_π is a Lie triple derivation.



4. Permutation triple derivations
Definition 3. Let n = 2, σ1 : R → A2,1 , σ2 : A2,1 → R be two homo-
morphisms of R−modules. We call (σ1 , σ2 ) suitable for permutation triple
derivations if 2σ1 (R) = 2A2,1 σ2 (A2,1 ) = 0.
Lemma 3. Define φ(σ1, σ2) : P → P by

( a  b ; c  d ) ↦ ( 0  σ2(c) ; σ1(b)  0 ),

where the semicolon separates the rows of a 2 × 2 matrix. Then φ(σ1, σ2) is a Lie triple derivation, provided that (σ1, σ2) is suitable for permutation triple derivations.
Proof. Let

x = ( a1  b1 ; c1  d1 ),  y = ( a2  b2 ; c2  d2 ),  z = ( a3  b3 ; c3  d3 ) ∈ P.    (12)

Note that

[[x, y], z] = ( a4  b4 ; c4  d4 ),    (13)
where
a4 = a1 b2 c3 + b1 d2 c3 − a2 b1 c3 − b2 d1 c3 − b3 c1 a2 − b3 d1 c2 + b3 c2 a1 + b3 d2 c1 ,
b4 = b1 c2 b3 − b2 c1 b3 + a1 b2 d3 + b1 d2 d3 − a2 b1 d3 − b2 d1 d3 − a3 a1 b2 − a3 b1 d2
+a3 a2 b1 + a3 b2 d1 − b3 c1 b2 + b3 c2 b1 ,
c4 = c1 a2 a3 + d1 c2 a3 − c2 a1 a3 − d2 c1 a3 + c1 b2 c3 − c2 b1 c3 − c3 b1 c2 + c3 b2 c1
−d3 c1 a2 − d3 d1 c2 + d3 c2 a1 + d3 d2 c1 ,
d4 = c1 a2 b3 + d1 c2 b3 − c2 a1 b3 − d2 c1 b3 − c3 a1 b2 − c3 b1 d2 + c3 a2 b1 + c3 b2 d1 .
It is easy to show that
φ(σ1 , σ2 )([[x, y], z]) = [[φ(σ1 , σ2 )(x), y], z]+[[x, φ(σ1 , σ2 )(y)], z]+[[x, y], φ(σ1 , σ2 )(z)].

Hence φ(σ1 , σ2 ) is a Lie triple derivation.




3 Proof of Theorem 1
In this section, we will give the proof of our main result.
Proof. If n = 1, then it is easy to determine the Lie triple derivations of P . From
now on, we suppose that n > 1. Let φ be any Lie triple derivation of P .
Firstly, we show that there exists some x0 ∈ P such that (φ − adx0)(d) ⊆ d.
Suppose that φ(E_{i,i}) = Σ_{1≤p,q≤n} α^{(i)}_{p,q}E_{p,q} with all α^{(i)}_{p,q} ∈ R and α^{(i)}_{l,k} ∈ A_{l,k}, 1 ≤ k < l ≤ n. By applying φ on the two sides of

0 = [[E_{1,1}, E_{j,j}], E_{k,k}],    (14)

we can obtain that α^{(1)}_{k,j} = α^{(1)}_{j,k} = 0 for k ≠ j, k ≠ 1 and j ≠ 1. Choose x1 = Σ_{i=2}^{n} (α^{(1)}_{i,1}E_{i,1} − α^{(1)}_{1,i}E_{1,i}); then φ − adx1 sends E_{1,1} to d. Write (φ − adx1)(E_{i,i}) in the form Σ_{1≤p,q≤n} β^{(i)}_{p,q}E_{p,q} with all β^{(i)}_{p,q} ∈ R, β^{(i)}_{l,k} ∈ A_{l,k}, 1 ≤ k < l ≤ n. By applying φ − adx1 on the two sides of

0 = [E_{1,1}, [E_{1,1}, E_{2,2}]],   0 = [[E_{2,2}, E_{j,j}], E_{k,k}],    (15)

we can obtain that

β^{(2)}_{1,2} = β^{(2)}_{2,1} = 0,   β^{(2)}_{k,j} = β^{(2)}_{j,k} = 0 for k ≠ j, k ≠ 2 and j ≠ 2.    (16)

Choose x2 = Σ_{i=3}^{n} (β^{(2)}_{i,2}E_{i,2} − β^{(2)}_{2,i}E_{2,i}); then φ − adx1 − adx2 sends E_{2,2} to d (and also sends E_{1,1} to d).

Generally, suppose that φ − Σ_{i=1}^{k−1} adx_i sends E_{1,1}, E_{2,2}, . . . , E_{k−1,k−1} to d, respectively, and suppose that

(φ − Σ_{i=1}^{k−1} adx_i)(E_{i,i}) = Σ_{1≤p,q≤n} γ^{(i)}_{p,q}E_{p,q} with all γ^{(i)}_{p,q} ∈ R, γ^{(i)}_{l,j} ∈ A_{l,j}, 1 ≤ j < l ≤ n.    (17)

By applying φ − Σ_{i=1}^{k−1} adx_i on the two sides of

0 = [E_{l,l}, [E_{l,l}, E_{k,k}]], l = 1, 2, . . . , k − 1,   0 = [[E_{k,k}, E_{i,i}], E_{j,j}],    (18)

we obtain that γ^{(k)}_{l,k} = γ^{(k)}_{k,l} = 0 and γ^{(k)}_{i,j} = γ^{(k)}_{j,i} = 0 for i ≠ j, i ≠ k and j ≠ k, respectively. Choose x_k = Σ_{i=k+1}^{n} (γ^{(k)}_{i,k}E_{i,k} − γ^{(k)}_{k,i}E_{k,i}); then φ − Σ_{i=1}^{k} adx_i sends E_{k,k} to d (and also sends E_{1,1}, E_{2,2}, . . . , E_{k−1,k−1} to d, respectively).

By induction, we can choose x1, x2, . . . , x_{n−1} such that φ − Σ_{i=1}^{n−1} adx_i sends all E_{1,1}, E_{2,2}, . . . , E_{n,n} to d. Denote Σ_{i=1}^{n−1} x_i by x0; then φ − adx0 sends d to d. Denote φ − adx0 by φ1.

Secondly, we prove that for any 1 ≤ i < j ≤ n, Aj,i Ej,i + REi,j is stable under
the action of φ1 .
Fix i < j and a_{j,i} ∈ A_{j,i}, and write φ1(a_{j,i}E_{j,i}) = Σ_{1≤p,q≤n} b_{p,q}E_{p,q}. By applying φ1 on the two sides of

a_{j,i}E_{j,i} = [E_{j,j}, [E_{j,j}, a_{j,i}E_{j,i}]],   a_{j,i}E_{j,i} = [E_{i,i}, [E_{i,i}, a_{j,i}E_{j,i}]],    (19)

we get that b_{p,q} = 0 for p ≠ j and q ≠ j, and b_{p,q} = 0 for p ≠ i and q ≠ i, respectively. Hence

φ1(a_{j,i}E_{j,i}) ⊆ A_{j,i}E_{j,i} + RE_{i,j}.    (20)
For the same reason, we can obtain that

φ1 (Ei,j ) ⊆ Aj,i Ej,i + REi,j . (21)

For any 1 ≤ i ≤ n − 1, suppose that

φ1 (Ei,i+1 ) = ri Ei,i+1 + si Ei+1,i (22)

with ri ∈ R, si ∈ Ai+1,i .
Let

y0 = diag(0, r1, r1 + r2, . . . , Σ_{i=1}^{n−1} r_i)    (23)

and denote φ1 + ady0 by φ2 , then

φ2 (Ei,i+1 ) = si Ei+1,i ∈ Ai+1,i Ei+1,i (24)

and φ2 (d) ⊆ d. Now, we discuss φ2 in the following cases.


Case 1. n ≥ 3
Firstly, we consider the action of φ2 on d. By applying φ2 on the two sides of

0 = [[E_{1,2}, E_{1,3}], E_{3,3}],   0 = [[E_{i,i+1}, E_{i−1,i+1}], E_{i,i+1}], 2 ≤ i ≤ n − 1,    (25)

we obtain that s_i = 0, 1 ≤ i ≤ n − 1. Then φ2(E_{i,i+1}) = 0, i = 1, 2, . . . , n − 1. It is easy to see that φ2(E_{i,j}) = 0 if 1 ≤ i < j ≤ n and j − i is odd.

n
(i)
φ2 (Ei,i ) = δp,p Ep,p . By applying φ2 on the two sides of
p=1

0 = [Ei,i , [Ej,j , Ej,k ]], 1 ≤ j < k ≤ n, j = i, k = i, (26)

Ei,i+1 = [Ei,i , [Ei,i , Ei,i+1 ]], 1 ≤ i ≤ n − 1, (27)


En−1,n = [En,n , [En,n , En−1,n ]], (28)
(i) (i) (i) (i) (n) (n)
we obtain that δj,j = δk,k , 2(δi,i − δi+1,i+1 ) = 0 and 2(δn−1,n−1 − δn,n ) = 0,
(i) (i)
respectively. Put δj,j = ui for j = i and δi,i − ui = vi , we have that φ2 (Ei,i ) =
ui E + vi Ei,i with 2vi = 0. For 1 ≤ i < j ≤ n, by applying φ2 on the two sides of

Ei,j = [[Ei,i , Ei,j ], Ej,j ], (29)


then we obtain that vi = vj . Put vi = vj = v. Thus φ2 (Ei,i ) = ui E + vEi,i .
Secondly, we consider the action of φ2 on Aj,i Ej,i + REi,j for 1 ≤ i < j ≤ n.
By applying φ2 on the two sides of

Ei,j = [Ei,i , [Ei,j−1 , Ej−1,j ]], 1 ≤ i < j − 1 ≤ n − 1, (30)

we obtain that φ2 (Ei,j ) = (j − i + 1(mod2))vEi,j .


The same methods can be used to get that

φ2(E_{i,j}) = (j − i + 1 (mod 2)) v E_{i,j},    (31)

φ2(a_{j,i}E_{j,i}) = (j − i + 1 (mod 2)) v a_{j,i}E_{j,i}  for (j, i) ≠ (n, 1),    (32)

φ2(a_{n,1}E_{n,1}) ⊆ (n (mod 2)) v a_{n,1}E_{n,1} + RE_{1,n},    (33)

for 1 ≤ i < j ≤ n.
Let χ : d → R be a homomorphism of R-modules defined by χ(E_{i,i}) = u_i, i = 1, 2, . . . , n. Then χ is suitable for central triple derivations. For x = Σ_{1≤i,j≤n} a_{i,j}E_{i,j}, take

η_χ(x) = χ(D_x)E + Σ_{1≤i,j≤n} (j − i + 1 (mod 2)) v a_{i,j}E_{i,j},

where D_x denotes the projection of x to d (D_x = Σ_{i=1}^{n} a_{i,i}E_{i,i}). Denote φ2 − η_χ by φ3. Then

φ3(t) = 0,   φ3(P) = φ3(A_{n,1}E_{n,1}) ⊆ RE_{1,n}.    (34)
Define π : A_{n,1} → R by φ3(a_{n,1}E_{n,1}) = π(a_{n,1})E_{1,n}. It is obvious that π is a homomorphism of R-modules, and methods similar to those in Lemma 2 show that π is suitable for extreme triple derivations. Using π we define an extreme triple derivation ρ_π of P by

ρ_π(Σ_{1≤i,j≤n} a_{i,j}E_{i,j}) = π(a_{n,1})E_{1,n}.

Thus φ3 = ρ_π is an extreme triple derivation and φ = adx0 − ady0 + η_χ + ρ_π. It is now easily verified that this decomposition is unique, i.e., χ1 = χ2 and v1 = v2.
Case 2. n = 2
As before, we know that

φ2(E_{i,i}) = Σ_{p=1}^{2} δ^{(i)}_{p,p}E_{p,p} ∈ d,   φ2(E_{1,2}) = s1 E_{2,1}.    (35)

By applying φ2 on the two sides of

E1,2 = [E1,1 , [E1,1 , E1,2 ]], E1,2 = [E2,2 , [E2,2 , E1,2 ]], E1,2 = [[E1,1 , E1,2 ], E2,2 ],
(36)

we obtain that

2(δ^{(1)}_{1,1} − δ^{(1)}_{2,2}) = 0,  2(δ^{(2)}_{1,1} − δ^{(2)}_{2,2}) = 0,  δ^{(1)}_{1,1} − δ^{(1)}_{2,2} = δ^{(2)}_{1,1} − δ^{(2)}_{2,2},    (37)

respectively. Put

δ^{(1)}_{1,1} − δ^{(1)}_{2,2} = δ^{(2)}_{1,1} − δ^{(2)}_{2,2} = v,    (38)

then φ2(E_{i,i}) = δ^{(i)}_{i,i}E + vE_{i,i}. By applying φ2 on the two sides of
a2,1 (E1,1 − E2,2 ) = [[E1,1 , E1,2 ], a2,1 E2,1 ], (39)
we get that
a_{2,1}(δ^{(1)}_{1,1} − δ^{(2)}_{2,2}) = 0,   φ2(a_{2,1}E_{2,1}) ⊆ RE_{1,2}.    (40)

Let χ : d → R be a homomorphism of R-modules defined by χ(E_{i,i}) = δ^{(i)}_{i,i}, i = 1, 2. Then χ is suitable for central triple derivations. For
x = Σ_{1≤i,j≤2} a_{i,j}E_{i,j}, assume that

η_χ(x) = χ(D_x)E + Σ_{1≤i,j≤2} (j − i + 1 (mod 2)) v a_{i,j}E_{i,j},    (41)

where D_x denotes the projection of x to d (D_x = Σ_{i=1}^{2} a_{i,i}E_{i,i}). Denote φ2 − η_χ by
φ3 . Then
φ3 (d) = 0, φ3 (E1,2 ) ⊆ A2,1 E2,1 , φ3 (A2,1 E2,1 ) ⊆ RE1,2 . (42)
Define two homomorphisms of R−modules σ1 : R → A2,1 , σ2 : A2,1 → R such
that
φ3 (a1,2 E1,2 ) = σ1 (a1,2 )E2,1 , φ3 (a2,1 E2,1 ) = σ2 (a2,1 )E1,2 .
Similar to case 1, we can prove that (σ1 , σ2 ) is suitable for permutation triple
derivation and φ = adx0 − ady0 + ηχ + φ(σ1 , σ2 ).


References
1. Cao, Y.: Automorphisms of certain Lie algebras of upper triangular matrices over a commutative ring. J. Algebra 189, 506–513 (1997)
2. Jondrup, S.: Automorphisms and derivations of upper triangular matrix rings. Linear Algebra Appl. 221, 205–218 (1995)
3. Ou, S., Wang, D., Yao, R.: Derivations of the Lie algebra of strictly upper triangular
matrices over a commutative ring. Linear Algebra Appl. 424, 378–383 (2007)
4. Wang, D., Yu, Q.: Derivations of the parabolic subalgebras of the general linear Lie
algebra over a commutative ring. Linear Algebra Appl. 418, 763–774 (2006)
5. Ji, P., Wang, L.: Lie triple derivations of TUHF algebras. Linear Algebra Appl. 403,
399–408 (2005)
6. Zhang, J., Wu, B., Cao, H.: Lie triple derivations of nest algebras. Linear Algebra
Appl. 416, 559–567 (2006)
7. Wang, H., Li, Q.: Lie triple derivations of the Lie algebra of strictly upper triangular
matrices over a commutative ring. Linear Algebra Appl. 430, 66–77 (2009)
8. Li, H., Wang, Y.: Generalized Lie triple derivations. Linear and Multilinear Algebra 59, 237–247 (2011)
Non-contact Icing Detection on Helicopter and
Experiments Research

Jie Zhang, Lingyan Li, Wei Chen, and Hong Zhang

Huazhong University of Science & Technology, Wuhan, Hubei, China

Abstract. This paper puts forward a new non-contact icing detection approach. An infrared laser radiates directly onto the monitored surface; because the absorption rates of ice and of the detected surface differ greatly, the energy reflected to the photoelectric detector is markedly different in the two cases. The received energy is processed by the signal processing circuit, so that the icing information along the chord of the rotor can be obtained. The method is validated by experiments using an infrared laser with a wavelength of 1450 nm. The influence of the flapping and torque movement of the rotor on the signal amplitude is discussed, and corresponding measures to reduce this influence are put forward for the icing detection system according to the changing rule of the signal amplitude.

Keywords: icing detection, helicopter, wave, bending moment.

1 Introduction

Helicopter icing refers to the phenomenon that the helicopter body surface accumulates ice when flying in the atmosphere. Body-surface icing has a serious impact on flight safety: it lowers the lift coefficient, increases the drag coefficient, increases fuel consumption, obstructs the instrument indications of the flight hydrostatic system, and can even cause vibration of the helicopter, seriously affect its stability and handling, reduce flight safety and, in the worst case, lead to destructive accidents [1]. To ensure flight safety in icing conditions, icing detection on the key parts of a helicopter (especially the rotor system) is badly needed; working together with the de-icing devices on the helicopter, it can reduce the possibility of a crash in icy weather conditions.
The helicopter rotor is a moving part, so traditional contact icing detection methods would damage its structure; moreover, there are normally no spare electrical slip-ring contacts available for an icing detection system, so it is difficult to transfer the signal. In view of the characteristics of the helicopter rotor and the actual needs, this paper puts forward a new non-contact icing detection method: it does not require changing the original structure of the rotor surface, it is easy to install, and it can solve the problem of real-time icing detection on the helicopter rotor.


2 System Scheme Design

2.1 Principle of the Infrared Non-contact Icing Detection

The principle of the infrared non-contact icing detection is as follows. The infrared laser is modulated by a fixed-frequency square wave and emits energy onto the lower surface of the rotor; apart from the energy absorbed by the rotor, the rest is reflected, and part of the reflected energy reaches the infrared detector through the optical fiber system. At the chosen wavelength the infrared absorption rate of ice is greater than that of the rotor, so if the rotor surface is iced, the reflected energy received by the detector is obviously reduced; this energy is converted, processed and analyzed by the signal processing circuit to judge whether the rotor is iced or not. The icing distribution on the rotor surface can be calculated through successive detections.

2.2 System Design

Because the probe's light-receiving device consists of 16 optical fibers whose diameter is only 1 mm, the received light intensity is weak, and the electrical signal converted by the photoelectric detector is also weak; the signal processing circuit therefore adopts a synchronous integrator to improve the signal-to-noise ratio. According to the dynamic features of the helicopter rotor and the propagation characteristics of infrared radiation, the system adopts a double-threshold comparator to determine the crossing times of the iced and non-iced areas of the rotor; with the pulse-widening circuit, the DSP can analyze and calculate the icing distribution on the rotor surface. The system frame of the infrared-laser non-contact icing detection is shown in Fig. 1. The whole detection system consists of the optical system, the signal processing circuit and the DSP microprocessor.

Fig. 1. Non-contact icing detection system diagram

In Fig. 1, the square wave generator produces a 100 kHz square wave, which on the one hand drives the infrared laser and on the other hand serves as the reference signal of the synchronous integrator. The infrared laser emits a beam onto the rotor; the reflected light is transmitted to the photoelectric detector through the optical fibers and converted into a current signal, the preamplifier converts the current signal into a corresponding voltage signal, the programmable gain amplifier and the amplifier further amplify the signal, and the synchronous integrator improves the signal-to-noise ratio so that the signal is strengthened. The output signal of the synchronous integrator is then processed by the double gate voltage comparator to judge whether there is icing on the rotor surface. Because of the high modulation frequency, it is difficult to trigger a DSP interrupt directly with each pulse, and there is not enough time to process the data within one pulse width; we therefore adopt a dual-channel pulse-widening circuit to solve this problem. When an iced area of the rotor crosses above the detector, the output of the high gate voltage comparator's widening circuit stays at a low level while the low gate voltage comparator's widening circuit stays at a high level; when a non-iced area crosses above the detector, both channels of the widening circuit stay at a high level. The two output signals are used as DSP interrupt inputs; in this way the system can judge the icing condition, realize the icing alarm, and drive the de-icing control system.
There are several key parts in the design of the signal processing circuit: the preamplifier, the synchronous integrator, the threshold comparator, and the pulse-widening circuit.
(1) Preamplifier: The selection and design of the preamplifier circuit is significant, since its performance directly affects all subsequent signal processing. The preamplifier requires ultra-low drift, high gain, wide bandwidth and high speed; this system adopts the AD8065 amplifier. To raise the signal amplitude as much as possible the feedback resistor should be large, but too large a feedback resistance narrows the bandwidth and distorts the preamplifier output signal; the feedback resistor in this system is 15 MΩ. A 5 pF feedback capacitor is added for phase compensation to eliminate the oscillation caused by the large feedback resistance.
(2) Synchronous integrator: Under the control of PP1, a pulse synchronous with the reference signal, the high-speed analog electronic switch DG419 alternately charges C158 and C159 with integration time constants RW107·C158 and RW107·C159. After a number of cycles of synchronous alternating charging, the signal at the reference frequency is lifted out of the strong noise by repeated accumulation and averaging. The synchronous integrator uses cross-correlation detection: let the input signal be f1(t) = s1(t) + n(t) and the reference signal be f2(t) = s2(t); then the cross-correlation function is

R12(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} f1(t) f2(t − τ) dt
       = lim_{T→∞} (1/2T) [ ∫_{−T}^{T} s1(t) s2(t − τ) dt + ∫_{−T}^{T} n(t) s2(t − τ) dt ]
       = R_{s1 s2}(τ) + R_{n s2}(τ) = R_{s1 s2}(τ).


In the above formula R_{n s2} = 0, because the noise and the reference signal are uncorrelated random processes. Hence, after the cross-correlation operation the noise term vanishes while the correlation between the signal and the reference signal is retained; thus the synchronous integrator can improve the signal-to-noise ratio efficiently.
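As an illustration only (our own sketch, not part of the original system), the synchronous cross-correlation detection described above can be simulated numerically as follows; the sampling rate, integration length and all variable names are assumptions.

#include <cmath>
#include <cstdio>
#include <cstdlib>

// Minimal sketch of synchronous (cross-correlation) detection:
// a weak square-wave-modulated signal buried in strong uncorrelated noise
// is multiplied by the reference square wave and averaged.
int main() {
    const double fs = 2.0e6;      // assumed sampling rate [Hz]
    const double fref = 100.0e3;  // modulation / reference frequency [Hz]
    const int n = 200000;         // number of samples averaged (integration time)

    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        double t = i / fs;
        double ref = (std::fmod(t * fref, 1.0) < 0.5) ? 1.0 : -1.0;  // reference square wave
        double amplitude = 0.01;                                     // weak received signal
        double noise = 0.5 * (2.0 * std::rand() / RAND_MAX - 1.0);   // strong uncorrelated noise
        double input = amplitude * ref + noise;                      // f1(t) = s1(t) + n(t)
        acc += input * ref;                                          // correlate with f2(t)
    }
    double recovered = acc / n;   // approximates the signal amplitude; the noise averages out
    std::printf("recovered amplitude ~ %f\n", recovered);
    return 0;
}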

Fig. 2. Circuit of the synchronous integrator

The design of the synchronous integrator requires both a high SNR and a fast response, and the choice of the integration time constant greatly influences the whole system. If the time constant is too long, strong noise is suppressed very well, but at the cost of a longer measurement time; moreover, too large an integration time constant smooths fast signals and degrades the measurement accuracy of the system. Therefore an appropriate integration time constant was obtained by circuit simulation and experiments, so that it both improves the signal-to-noise ratio and meets the dynamic requirement of the detected rotor. In this system the largest sampling integration time constant is t = 2·RW107·C158 = 0.02 s.
(3) Dual-gate comparator: The system adopts a dual comparator LM119. Suppose the output amplitude of the synchronous integrator is V1 when the infrared laser spot falls on neither the rotor nor ice, V2 when it falls on an iced area, and V3 when it falls on the bare rotor. The low threshold voltage VL is set so that V2 > VL > V1, and the high threshold voltage VH so that V3 > VH > V2. This ensures that when the rotor is not iced both comparators output pulse signals, whereas when the rotor is iced only the low gate voltage comparator outputs pulses and the high one stays at a low level. When there is a single iced area on the rotor, the output waveforms of the double-gate comparator in the iced and non-iced states are as shown in Fig. 3, where t1 is the time during which the laser is on the rotor, t2 the time during which it is on neither the rotor nor the ice area, and t3 the time during which it is on the iced area. With the two thresholds set reasonably, the system can detect the position and extent of the icing area on any rotor.

Fig. 3. The output of the threshold comparator
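A minimal decision sketch (our assumption, not the authors' firmware) of this double-threshold classification, with VL and VH as the two gate voltages, might look like this:

#include <cstdio>

// Hypothetical illustration of the double-threshold decision described above:
// classify one synchronous-integrator sample against the two gate voltages.
enum class Surface { Background, Ice, Rotor };

Surface classify(double v, double vLow, double vHigh) {
    if (v > vHigh) return Surface::Rotor;   // V3 > VH: bare rotor under the spot
    if (v > vLow)  return Surface::Ice;     // VH > V2 > VL: iced area under the spot
    return Surface::Background;             // V1 < VL: neither rotor nor ice
}

int main() {
    const double vLow = 0.8, vHigh = 1.5;   // assumed thresholds [V]
    double samples[] = {0.3, 1.1, 2.2, 1.0};
    for (double v : samples)
        std::printf("%.1f V -> class %d\n", v, static_cast<int>(classify(v, vLow, vHigh)));
    return 0;
}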
(4) Pulse-widening circuit: The system adopts a dual retriggerable-resettable monostable multivibrator HEF4528BP to realize the pulse-widening function. The circuit outputs a high level while pulse signals are present, and a low level otherwise.
(5) The DSP digital signal processing circuit determines the icing extent and position by comparing the output signals of the HEF4528BP, which correspond to the two threshold comparators.

3 Experimental Design and Verification

3.1 Experimental Design and Devices

In our designed measurement system, the amplitude of the synchronous integrator


output signal is the most direct expression of the received light energy, so it is
considered to be a parameter to measure whether there is ice existing or not.
The experiments have two purposes. First, since ice has different absorption rates at different infrared wavelengths, an appropriate laser source whose wavelength is strongly absorbed must be chosen as the detection source, through comparative experiments at the two wavelengths of 940 nm and 1450 nm. Second, the helicopter rotor flaps under the influence of airflow during flight, and changes of helicopter attitude are controlled through the rotor's bending moment; flapping and bending moment both change the angle and distance between the infrared laser and the rotor, so the signal amplitude changes, which affects the icing judgment. Flapping and bending moment are the main disturbance factors in this measurement system. We simulate the flapping and bending processes in the experiments to find the changing rule and range of the signal amplitude; setting the two thresholds reasonably improves the system's dynamic range and enhances its ability to overcome the influence of flapping and bending moment.
The experiment bench and test program are shown in Fig. 4. Adjust the position and angle of the detector to make the narrow rectangular laser spot, whose width is 2 mm and length

Fig. 4. Experiment bench and test program

is 10mm, parallel to the rotor’s span direction. In order to study the influence of
different thickness and types of ice to the signal amplitude, the spray console can set
each spray time to control the thickness of ice and frost; the frozen well can set the
temperature to simulate clear ice and rime ice. The lifting platform and the angle frame
are used to realize the changes of the distance and the angles between the laser source
and the rotor.

3.2 Contrast Test between Two Infrared Lasers with Different Wavelength

The two lasers, at 940 nm and 1450 nm, are tested separately at the same height and incident angle. The temperature of the frozen well is set to −15 °C and the spray time is controlled; we record the output amplitude of the synchronous integrator, and the amplitudes are normalized and shown in Fig. 5.

Fig. 5. Two infrared wavelengths freezing test curve



Fig. 5 demonstrates that the absorption of infrared energy at the wavelength of 1450 nm is far greater than that at 940 nm, so the output amplitude of the synchronous integrator differs significantly between the iced and non-iced rotor, which is helpful for icing recognition and measurement. In reference [4], the absorption rate of single-crystal ice at 1450 nm is much greater than at 940 nm, but this only considers the internal absorption characteristics of the ice crystal; in this system the output amplitude of the synchronous integrator is determined not only by the absorption rate but also by the surface characteristics of the ice and the rotor. Therefore the system adopts the laser source with the wavelength of 1450 nm.

3.3 Experiment for Waving of the Rotor

The rotor’s waving will change the height and angle along the span direction, the height
and angle are changed using special equipments to examine the influence of the change
to the signal amplitude.

3.3.1 Influence of the Waving Height Changes to the Signal Amplitude


The lifting platform can precisely control the distance between the laser source and the rotor. We measure the output amplitude of the synchronous integrator, with and without icing, every 10 mm in the height range between 1650 mm and 1800 mm; the influence of the flapping height change on the signal amplitude is shown in Fig. 6.

Fig. 6. Influence of the waving altitude change to the signal amplitude

From Fig. 6 we can see that, as the distance between the laser source and the rotor increases from 1650 mm to 1800 mm, the output amplitude of the synchronous integrator decreases; the maximum change of the output amplitude is 0.4 V without icing and 0.35 V with icing. For this measurement system, the amplitude decrease caused by the flapping height is not large enough to cause a misjudgment of the icing state, so no compensation for this amplitude decrease is necessary.

3.3.2 Influence of the Waving Angle Changes to the Signal Amplitude


The angle frame is used to change the angle along the span direction of the rotor from 3 degrees and 5 degrees to 8 degrees; the output amplitudes of the synchronous integrator with and without ice are recorded and shown in Fig. 7.

Fig. 7. Influence of the waving angle changes to the signal amplitude

From Fig. 7 we can see that, as the flapping angle increases from 0 to 8 degrees, the output amplitude of the synchronous integrator decreases: without icing the signal amplitude decreases by 1.1 V, and with icing by 0.5 V. Since icing itself also lowers the signal amplitude, the system may not be able to distinguish the real reason for the decrease, so rotor flapping could be mistaken for icing information; the high gate voltage should therefore be set between 1.3 V and 1.8 V to improve the accuracy of the system.

3.4 Experiment for Bending Moment of the Rotor

The bending moment of the rotor causes angle changes along the chord direction of the rotor; the output amplitude of the synchronous integrator as a function of bending angle is shown in Fig. 8.

Fig. 8. Influence of the bending angle change to the signal amplitude

From Fig. 8 we can see that, as the bending angle increases from 0 to 8 degrees, the output amplitude of the synchronous integrator decreases: the signal amplitude decreases by 0.6 V without icing and by 1 V with icing, which is likely to cause an icing misjudgment. The high gate voltage should be set between 1.3 V and 1.8 V to improve the accuracy of the system.

3.5 Compensative Measures for Waving and Bending Moment

It’s necessary to make compensative measures to avoid misjudgment caused by


changes of waving angle and bending angle, and improve the ability of
anti-interference of the system. DSP make a real-time amplitude sampling of the output
of the synchronous integrator, and according to the actual conditions, DSP control the
Programmable Gain Amplifier to adjust the gain to a specific scope of the signal
disposal circuit, that is: there is a great difference between the signal amplitude at the
icing area and the rotor area, and make the signal amplitude at the icing area is less than
that at the waving and bending states. The high threshold of the double threshold
comparator is set to be greater than the signal amplitude when the rotor has icing, but
smaller than the waving and bending signal amplitude, In this method ,the system can
reduce the chance to misjudge greatly.

4 Conclusion
The experimental studies have shown that ice absorbs the 1450 nm laser wavelength much more strongly than the 940 nm wavelength, so the 1450 nm wavelength is more suitable for non-contact ice detection, and the proposed non-contact infrared ice detection system is feasible. When the helicopter is in flight, the flapping height has little effect on the detection, while the flapping angle and the bending (torque) angle have a great effect on the system, so compensation measures are needed.

References
[1] Qida, W., Tongguang, W.: Icy Detection Technology Progress on Helicopter Rotor. Aeronautical Manufacturing Technology 3, 62–64 (2009)
[2] Ligang, W., Tiancheng, J.: Circuit noise analysis and circuit design based on the
photoelectric diode detection. Journal of Daqing Petroleum Institute 2(133), 88–91
[3] Qingyong, Z.: Weak Signal Detection, pp. 66–75. Zhejiang University Press (2003)
[4] Hobbs, P.V.: Ice physics, pp. 355–455 (1998)
Research on Decision-Making Simulation of
"Gambler's Fallacy" and "Hot Hand"

Jianbiao Li, Chaoyang Li, Sai Xu, and Xue Ren

Research Center for Corporate Governance/Business School, Nankai University,


A901 MBA Building, Business School of Nankai University, 94# Weijin Road,
Nankai District, Tianjin, 300071, P.R. China
[email protected]

Abstract. The "gambler's fallacy" and the "hot hand" are considered as typical
examples of misunderstanding random chronological events. People have two
different expectations on the same historical information: gambler's fallacy and
hot hand. This paper analyzes the occurring numbers of the four effects which are
"gambler’s fallacy", "hot hand", "hot outcome" and "stock of luck" and their
revenues in a series of random chronological events by describing the
decision-making preferences of heterogeneous individuals with the method of
computer simulation. We can conclude from the simulation process of coin flips
that there are no differences among the occurring numbers and the revenues of
these four effects mentioned above. However, they are different when a single
period is focused on, which conforms to the beliefs and behavior in the real
decision-making situation.

Keywords: Gambler's Fallacy, Hot Hand, Simulation.

1 Introduction
Almost every decision made in reality involves uncertainty. People intend to maximize
their profits by choosing an optimum option. Researches on decision-making under
uncertainty reveal that our beliefs toward the probabilities of future events usually
deviate from Bayes rule. With response bias, many people are caught in gambler’s
fallacy or others, and hope that there will be systemic reversal in the outcome of the
random events. Extensive studies have been achieved on behavioral economics,
psychology and neurons economics (e.g. Camerer, Loewenstein, & Prelec, 2005;
Gilovich, Vallone, & Tversky, 1985; Kahneman, 2002; Rabin & Vayanos, 2010).
Further research on searching for regularities in these random sequences and on comparing the revenues of the gambler's fallacy and the hot hand deserves to be carried out. Real experiments are difficult to realize because of their high cost and complicated procedures. Experimental economics and computer simulation provide strong support
for testing these effects in large sample experiments. This paper attempts to test the
revenue of "gambler's fallacy", "hot outcome", "hot hand" and "stock of luck" existing
in decision-making by using computer simulation and explore which is the optimal
decision-making mode.


2 Literature Review
Research on the gambler's fallacy and the hot hand primarily involves the following four aspects. The first is confirming that the gambler's fallacy and other recency effects exist in real decision-making; the second asks why the gambler's fallacy and the hot hand exist; the third seeks the mechanism that generates the gambler's fallacy and the hot hand and interprets the rationality of such false beliefs; the fourth type of study combines the false beliefs of the gambler's fallacy with people's behavior, such as the correlation between the false beliefs and the confidence level, abnormal reactions in the stock market, etc.
The gambler's fallacy is one's subjective judgment about the probability of objective
events, which is considered as a belief in negative autocorrelation of a non-autocorrelated
random sequence of outcomes like coin flips (Peter Ayton, Ilan Fischer, 2004). The hot
outcome is the opposite of the gambler's fallacy, which is a belief in positive
autocorrelation of a non-autocorrelated random sequence of outcomes (outcomes of
objective events). In contrast to the gambler's fallacy, the hot hand is one's subjective
judgment about the probability of subjective events, and is a belief in positive
autocorrelation of a non-autocorrelated random sequence of outcomes (outcomes of
subjective events) (Peter Ayton, Ilan Fischer, 2004). The hot hand also has its opposite bias, the "stock of luck", which suggests that an individual's luck is a fixed stock: when the luck is used up, the probability of winning is reduced.

3 Design of the Simulation Experiments


A sequence of coin flips was simulated by a computer. The occurring numbers and
revenues of the gambler's fallacy, hot outcome, hot hand, and stock of luck were tested
in the simulation. The developing platform was Visual C++6.0.

3.1 Process of Coin Flips and Outcome Guess

We define the random numbers 0 and 1 as representations of the back and front sides of the coin, respectively, each occurring with probability 50%. The outcomes are stored in the array a[i], whose size is the total number of coin flips; thus a[i] equals the outcome of the coin flip at time i: a[i]=1 represents the front side and a[i]=0 the back side. The guess is also represented by the random numbers 0 and 1, each guessed with probability 50%, which equals the expected probability of a rational individual. The figure guessed at time i is stored in b[i], and if a[i]=b[i], the outcomes of the guess and the coin flip are identical.
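The paper does not reproduce its source code; a minimal sketch of this setup (the array names a and b follow the text, everything else is our assumption) could look like the following.

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

int main() {
    const int N = 1000;                 // coin flips per period, as in the experiments
    std::srand(static_cast<unsigned>(std::time(nullptr)));

    std::vector<int> a(N), b(N);        // a[i]: coin outcome, b[i]: individual's guess
    int hits = 0;
    for (int i = 0; i < N; ++i) {
        a[i] = std::rand() % 2;         // 1 = front side, 0 = back side, each with p = 0.5
        b[i] = std::rand() % 2;         // guess, also with p = 0.5
        if (a[i] == b[i]) ++hits;       // guess and flip identical
    }
    std::printf("identical guesses: %d of %d\n", hits, N);
    return 0;
}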

3.2 Simulation and Revenue Design


Simulation procedure of the gambler's fallacy. When the coin shows the same side three times in a row (a[i]=a[i+1]=a[i+2]), the individual, because of the gambler's fallacy, expects the next outcome to be the opposite, namely that a[i+2] and a[i+3] are not equal. After the fourth flip, if a[i+2] and a[i+3] are indeed not equal, the guess under the decision-making mode of the gambler's fallacy is correct and the revenue increases by 2; in contrast, if a[i+2]=a[i+3], the guess under the gambler's fallacy mode is wrong and the revenue decreases by 2.
Simulation procedure of the hot outcome. On the contrary, under the decision-making mode of the hot outcome the individual expects the fourth coin flip to be the same as the third. If a[i+2]=a[i+3] after the fourth flip, the decision under the hot outcome mode is right and the revenue increases by 2; if a[i+2] and a[i+3] are not equal, the revenue decreases by 2.
Simulation procedure of the hot hand. Two cases are distinguished here. First, when the guess matches the coin flip three times in a row, namely the three conditions b[i]=a[i], b[i+1]=a[i+1], b[i+2]=a[i+2] hold simultaneously, the individual expects his next guess to be correct as well. If b[i+3]=a[i+3], 3 is added to the revenue; otherwise 3 is subtracted (the individual raises his bet to 3 because he has won three times under the hot hand). Secondly, if the guesses have been wrong three times in a row, the individual under the hot hand lowers his bet; in this case the revenue increases by 1 after a correct fourth guess and decreases by 1 otherwise.
Simulation procedure of the stock of luck. There are also two cases for the stock of luck. First, in contrast to the hot hand, when the guess matches the coin flip three times in a row as above, the individual under the stock-of-luck mode expects his next guess to be wrong; that is, he expects b[i+3] and a[i+3] to differ under the conditions b[i]=a[i], b[i+1]=a[i+1], b[i+2]=a[i+2]. In this case the individual reduces his bet to 1, so he gains 1 if he wins the fourth guess and loses 1 otherwise. Secondly, if the guesses have been wrong three times in a row, the individual under the stock of luck raises his bet to 3 because of the expected "reversal of fortune"; the revenue then increases by 3 after a correct guess and decreases by 3 otherwise.
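As a compact illustration (our sketch, not the authors' Visual C++ code), the gambler's-fallacy revenue rule over one period could be written as follows; the function and variable names are ours.

#include <cstdio>
#include <cstdlib>
#include <vector>

// Sketch of the gambler's-fallacy revenue rule described above:
// after three identical outcomes the individual bets 2 on a reversal.
int gamblersFallacyRevenue(const std::vector<int>& a) {
    int revenue = 0;
    for (std::size_t i = 0; i + 3 < a.size(); ++i) {
        if (a[i] == a[i + 1] && a[i + 1] == a[i + 2]) {    // three identical flips
            revenue += (a[i + 3] != a[i + 2]) ? 2 : -2;    // win or lose the bet of 2
        }
    }
    return revenue;
}

int main() {
    std::vector<int> flips(1000);
    for (int& f : flips) f = std::rand() % 2;
    std::printf("GF revenue over one period: %d\n", gamblersFallacyRevenue(flips));
    return 0;
}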

Fig. 1. Revenue of every effect in experiment (legend: GF, HC, HH)

In the case of completely random decision-making, which involves no effect such as the gambler's fallacy, the bet is 2. At the end of the simulation, the occurring numbers and the revenues of the decisions under the four effects are counted, and the data are converted into an Excel file to facilitate our analysis.

Fig. 2. Occurring numbers of the four effects in experiment (legend: GF, HC)

4 Result Analysis
The simulation involved 50 periods, and the number of coin flips was 1000 in every
period. At the end of every period, the occurring number and the revenue of every
effect was counted and recorded. From the comparative analysis above we could
conclude that there was no difference among the revenues and the occurring numbers
of the four effects. That is to say, the revenues of the four effects were homogenous.
In a sense, that is one of the reasons for individual differences in decision-making. A decision based on objective knowledge of random events shows no advantage over the false beliefs, and that may be the cause of the wide existence of the false beliefs (Andreas Wilke & H. Clark Barrett, 2009).
As shown in the tables above, the occurring frequency of every effect is 1/8, which conforms to our rational expectation. The distribution of the revenues of every effect is a normal distribution with mean 0. Compared with the revenues under the completely random decision-making mode, there is likewise no difference. However, they are different when a single simulation is focused on.

Table 1. Analysis about the variances of the revenues of every effect

Source           SS           df    MS           F      Prob > F
Between groups   1082.696     4     270.674      0.06   0.9932
Within groups    1096589.16   245   4475.87412
Total            1097671.86   249   4408.32071

5 Conclusion

This paper reviews the definition of concepts such as the gambler's fallacy, hot hand
and their performance in reality, and then summarizes the generation causes of the four
effects. On this basis, we analyze the occurring numbers and the revenues of the four
effects by describing the decision-making preferences of heterogeneous individuals
with the method of computer simulation. We can conclude that the occurring numbers

and the revenues of the four effects show no significant differences according to the analysis in Stata 10.0. Nevertheless, some differences exist in the beliefs and behavior of real decision-making.
People's behavior is influenced by their own preferences and beliefs. Preference is
the driver of behavior, and the belief is the understanding of the relation between the
behavior and the result, which also can be considered as an expectation of future events.
People in different countries form their own unique behavior preferences because of the
distinct national culture and institutional environment. Different decision-making
beliefs will be generated in different situations, and the beliefs of these recency effects
such as the gambler's fallacy, hot hand can be interchangeable under certain conditions.
People in different countries, with different preferences, prefer a certain kind of effect,
resulting in "collective effect" of accumulation actions. Thus, decision-making analysis
in cross-cultural perspective is a valid way to study on the collection of the false beliefs.

Acknowledgment. The National Natural Science Foundation of China (70972086,


70672029, Director: Professor Li Jianbiao), Project supported by Humanities and social
sciences key research base from the Ministry of Education of China (05JJD630023,
Director: Professor Xing Xiaolin, Professor Li Jianbiao), Major Program of the National
Natural Science Foundation of China (70532001, Director: Professor Li Weian).

References

1. Sundali, J., Croson, R.: Biases in casino betting: The hot hand and the gambler’s fallacy.
Judgment and Decision Making 1(1), 1–12 (2006)
2. Sun, Y., Wang, H.: Gambler’s fallacy, hot hand belief, and the time of patterns. Judgment
and Decision Making 5(2), 124–132 (2010)
3. Ayton, P., Fischer, I.: The gambler’s fallacy and the hot-hand fallacy: Two faces of
subjective randomness. Memory and Cognition (32), 1369–1378 (2004)
4. Guryan, J., Kearney, M.S.: Gambling at Lucky Stores: Empirical Evidence from State
Lottery Sales. American Economic Review 98(1), 458–473 (2008)
5. Rabin, M.: The Gambler’s and Hot-Hand Fallacies: Theory and Applications. Review of
Economic Studies 77, 730–778 (2010)
An Integration Process Model of Enterprise Information
System Families Based on System of Systems

Yingbo Wu1, Xu Wang2, and Yun Lin2

1 School of Software Engineering,
2 School of Mechanical Engineering,
Chongqing University, Chongqing, China
[email protected], {wx921,linyun313}@163.com

Abstract. Based on the theory of system of systems (SoS), an integration


process model of an enterprise information system family is discussed. The
model is stratified into two levels, the top-level sub-processes of SoS and the
bottom-level sub-processes of component systems, and the mapping relations
between activities in sub-processes are classified as horizontal and vertical,
which all contribute to better internal consistency. To support the dynamic
integration environment of an enterprise information system family, the model
is also designed to be an iterative process. Finally, based on the proposed
model, we present an example of an enterprise information system family
integration process in an auto industry chain.

Keywords: Integration Process, Enterprise Information System Family, System


of Systems.

1 Introduction
System of systems (SoS) is a set of theories and methods proposed and developed to solve complex system-family issues. Before integration, the constituent systems are spatially distributed and complementary in terms of capabilities. By effectively integrating the constituent systems, a SoS enhances problem-solving capability and responsiveness to challenging opportunities without changing the existing working environment of each system. A SoS is highly adaptable to dynamic and unstable external environments [1]. Andrew P. Sage et al. pointed out that many of today's systems are no longer engineered as stand-alone systems by an individual institution, but as part of an integrated system of systems, or a federation of systems or a systems family [2].
With the increasing demand for integrated supply chains and enterprise collaborative management, an enterprise information system family needs to be integrated as a SoS among the enterprises along an industry chain and within an enterprise itself, in order to facilitate collaboration between and within enterprises. Studying the integration process of an enterprise information system family using SoS theories and methods will cast light on how to integrate an enterprise information SoS, and how to systematize, regulate and guide the integration of an enterprise information system family.


At present, research on the integration of an enterprise information system family mainly focuses on integration technologies, frameworks, standards and so on. Although some scholars have made useful explorations in applying SoS to enterprise information system engineering [3-5], little has been done on the integration process of an enterprise information system family based on SoS.

2 SOS and Enterprise Information System Family


The concept of SoS derives from early research on and understanding of issues of
multi-system integration, which was later expounded by scholars in different research
areas from different perspectives. Maier [6] first described SoS in terms of its general
attributes, which were later reduced to the following five main features by Sage &
Cuppan [7]:
1) Operational Independence of the Individual System. A SoS is composed of
systems that are independent and useful in their own right.
2) Managerial Independence of the Individual System. The component systems in a
system of systems not only can operate independently; they generally do operate
independently to achieve an intended purpose.
3) Geographical Distribution. Geographic dispersion of component systems is
often large.
4) Emergent Behavior. The SoS performs functions and carries out purposes that
are not necessarily associated with any component system, leading to behaviors that
are emergent properties of the entire SoS and not the behavior of any component
system.
5) Evolutionary Development. Development of a SoS is generally evolutionary
over time. Components of structure, function, and purpose are added, removed, and
modified as experience with the system grows and evolves over time.
In Boardman & Sauser's thesis [8], these five main features were summarized as Autonomy, Belonging, Connectivity, Diversity and Emergence. Based on these studies, we believe that a SoS is an aggregation of interacting component systems in a dynamic environment; the component systems within that aggregation should have a certain degree of autonomy in order to have better adaptive and collaborative capabilities, the interactions between component systems are loosely coupled, and a SoS should possess functional features not necessarily associated with any of its component systems. The emergence properties of a SoS are important and necessary, because they distinguish a SoS from a mere system family. A SoS must demonstrate unique functional features unavailable in its component systems, as opposed to a system family formed by simply combining independent systems.
An enterprise information system family is an information system set established
to serve corporate business management. Compared with a SoS, the component
systems within an enterprise information system family also possess features of
autonomy, belonging, connectivity and diversity. But in order to possess emergence
properties unavailable in an existing individual enterprise information system, an
enterprise information system family has to be effectively integrated as a SoS. The
emergence properties are realized as information, functions, and interactiveness
unavailable in component systems at a micro-level, better collaboration, adaptability

and flexibility at a macro-level. Therefore, the integration of an enterprise information


system family should be viewed as constructing a SoS. Only in so doing, can we
achieve greater comprehensive integration benefits and corporate business and
management integration objectives.

3 Integration Process Model


According to the traditional system engineering perspective, an integration process of
an enterprise system family can be divided into four phases, namely objectives
planning, needs definition, design & development, and implementation. Each of the
four phases includes a number of specific integration activities. In contrast, according
to SoS, a SoS integrated from an enterprise information system family should be
viewed as a whole in terms of its objectives and the emergence properties. And the
objectives and emergence properties should be reflected in the integration activities.
In addition, because the integration of an enterprise information system family often
occurs in a dynamic environment where new systems are added, and the old systems
removed, replaced or modified into a new system; it should be viewed as a dynamic
process. And this process favors the evolution of SoS. The integration process
proposed in this paper is an iterative process in which each integration is seen as a
component system of SoS, which in turn, can be integrated into a larger SoS. This
whole iterative process makes the continuous dynamic evolution of SoS a reality.
A SoS can be logically divided into a top-level SoS and a bottom level of component systems [1]; accordingly, we propose an integration process model that is divided into top-level integration sub-processes of the SoS and bottom-level integration sub-processes of the component systems. Each sub-process is further divided into four stages. Each time an integration process is completed, it waits in line for the next round according to the need for SoS evolution. The overall process framework is shown in Fig. 1.

Fig. 1. The integration process model of an enterprise information system family based on SoS

3.1 Top-Level Integration Sub-processes of SoS

The top-level SoS provides an integration overview of an enterprise information


system family. Integration sub-processes for a SoS involve four stages. In the first two
stages, the integration objectives and requirements are planned and defined, and
emergence function feature set is clearly defined. And in the next stage, the structure

of a SoS is designed based on integration objectives and requirements, and


corresponding integration specifications and standards of a SoS are devised. The final
stage—system implementation involves developing integration implementation plan,
and preparing the necessary basic integration conditions and environments. The sub-
process model is shown in Fig. 2.

Fig. 2. Top-level Integration Sub-processes of SoS

3.2 Bottom-Level Integration Sub-processes of Component Systems

The sub-processes which is shown in Fig. 3 also involve four stages. In the first two
stages, component system integration objectives and requirements are planned and
defined based on SoS integration sub-processes. In the next two stages, component
systems combination & connection schemes are developed, and component systems
integration is implemented by devising specific implementation schemes of
engineering organization, management and technologies.

Fig. 3. Bottom-level Integration Sub-processes of Component Systems

3.3 Iterative Processes That Favor Dynamic Evolution of Integration

As enterprise information system family changes with its integration, the SoS thus
integrated evolves dynamically. In the process model, each time after the completion

of integration process, it waits next in line based on the need for a SoS evolution. And
the activities in the process are based on a decomposition principle, for example a SoS
objectives can be decomposed into sub-SOS objectives activities, thus each iterative
integration can be viewed as a construction process of a next SoS. An episode of the
iterative processes is shown in Fig. 4.

Fig. 4. An episode of the iterative processes that favor dynamic evolution of integration
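Purely as an illustrative sketch (our own, not part of the paper), the two-level iterative structure of Sections 3.1-3.3, in which SoS objectives are decomposed onto component systems and an integrated SoS can later serve as a component of a larger SoS, can be pictured as a small recursive data model; every name below is an assumption.

#include <memory>
#include <string>
#include <vector>

// Hypothetical data model of the two-level, iterative integration process:
// a SoS has top-level objectives and a set of component systems, and an
// integrated SoS from an earlier iteration can be reused inside a larger SoS.
struct ComponentSystem {
    std::string name;
    std::vector<std::string> objectives;            // decomposed from the SoS objectives
};

struct SoS {
    std::vector<std::string> objectives;            // top-level sub-process: objectives planning
    std::vector<ComponentSystem> components;        // bottom-level sub-processes
    std::vector<std::shared_ptr<SoS>> earlierSoS;   // earlier iterations treated as components
};

// One iteration step: decompose SoS objectives onto the component systems (vertical mapping).
void decomposeObjectives(SoS& sos) {
    for (auto& c : sos.components)
        for (const auto& o : sos.objectives)
            c.objectives.push_back(o + " / " + c.name);   // placeholder decomposition rule
}

int main() {
    SoS chain;
    chain.objectives = {"information sharing", "business collaboration"};
    chain.components = {{"ERP", {}}, {"SCM", {}}, {"logistics", {}}};
    decomposeObjectives(chain);
    return 0;
}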

3.4 Mapping Relations between Sub-process Activities

In the process model, the activities in integration sub-processes at the two different-
levels form mapping relations, which can be divided into horizontal and vertical
relations based on mapping directions. Mapping relations of sub-process activities are
shown in Fig. 5.

Fig. 5. Mapping relations between sub-process activities



In terms of horizontal mappings of sub-process activities, the SoS function and performance feature sets can be obtained by decomposing the SoS objectives, features and performance, and the SoS functions and performance can be defined more clearly through the SoS objective and function definitions and performance design. Horizontal mapping likewise applies to the component-system integration sub-process activities.

4 An Integration Process Example


A sampled auto industry chain centers on a vehicle manufacturer, where strong cross-
enterprise business collaboration is formed among upstream parts suppliers, vehicle
manufacturers, and downstream sales and third-party logistics agents. The enterprise
information system family of this auto industry chain is integrated to address issues of
information sharing and business collaboration among enterprises.
Based on the process model proposed in this paper, we first conducted the SoS needs analysis and function definitions from the SoS objective of the collaborative industry chain, and completed the collaborative performance design between procurement and production, between production and sales, and between procurement, sales and logistics. In the component-system integration process, we clearly defined the system integration objectives and functions based on the SoS collaborative objectives, functions and performance, and devised the integration tasks and implementation schemes based on data middleware and a service-oriented framework. The integration process model proposed in this paper proved to be an effective tool for developing an integration process catering to an enterprise information system family.

5 Conclusion
The integration process of an enterprise information system family, which differs
from constructing an individual system, is concerned with a complex comprehensive
SoS. The concept and method of SoS will help us solve complex multi-system
integration issues. Based on SoS, this paper made useful exploration by proposing a
process model that guides and regulates enterprise information system family
integration. There is still much work to be done in integration process of an enterprise
information system family based on SoS like process refinement, assessment, and
optimization.

References
1. Songbao, Z., Weiming, Z., Zhong, L., et al.: Research of Structure Analyzing and
Modeling for Complex System of Systems. Journal of National University of Defense
Technology 28(1), 62–67 (2006)
2. Sage, A.P., Biemer, S.M.: Process for System Family Architecting, Design, and
Integration. IEEE System Journal 1(1), 5–16 (2007)
3. Carlock, P.G., Fenton, R.E.: System of Systems (SoS) Enterprise Systems Engineering for
Information-intensive Organizations. System Engineering 4(4), 242–261 (2001)
4. Morganwalp, J., Sage, A.P.: A System of Systems Focused Enterprise Architecture
Framework. Information, Knowledge System Management 3(2), 87–105 (2003)

5. Stephenson, S.V., Sage, A.P.: Architecting for Enterprise Resource Planning. Information,
Knowledge, System Management 6(1-2), 81–121 (2007)
6. Maier, M.: Architecting Principles for Systems-of-systems. System Engineering 1(4), 267–
284 (1998)
7. Sage, A.P., Cuppan, C.D.: On the Systems Engineering and Management of Systems of
Systems and federation of systems. Information Knowledge System Management 2(4),
325–345 (2001)
8. Boardman, J., Sauser, B.: System of Systems - the Meaning of. In: Proceedings of the 2006
IEEE/SMC International Conference on System of Systems Engineering, pp. 118–123.
IEEE Press, New York (2006)
A Linear Multisensor PHD Filter Using the
Measurement Dimension Extension Approach

Weifeng Liu and Chenglin Wen

Hangzhou Dianzi University, Hangzhou Zhejiang 310018, China


{liuwf,wencl}@hdu.edu.cn

Abstract. The common probability hypothesis density (PHD) filter is
derived under the single-sensor condition. The multisensor PHD (MPHD)
filter is remarkably complex and thus impractical to use. Mahler proposed
an MPHD filter under the assumption that all sensors are independent.
This paper studies the linear multisensor-multitarget system and proposes
a linear multisensor probability hypothesis density (LMPHD) filter. By
combining the measurement dimension extension (MDE) approach, we
account for linear correlation among the sensors. A simulation is finally
presented to verify the effectiveness of the LMPHD filter.

1 Introduction
Since it was proposed by Mahler [1], the probability hypothesis density (PHD)
filter has been widely studied, especially in target tracking. This research
can be classified into two classes: PHD-based algorithms and cardinalized PHD
(CPHD)-based algorithms. In the PHD-based class, since the PHD filter involves
nonlinear functions, the particle-PHD filter was first given in [2,3,4]. The
particle-PHD filter can deal with nonlinear tracking systems, but it needs a
heavier computational load. A further important work is the Gaussian mixture
PHD (GM-PHD) filter due to Vo et al. [5]. The GM-PHD filter can estimate the
target states without state clustering algorithms. To improve the state estimation,
Nandakumaran et al. proposed the PHD smoother [8]. Combining the interacting
multiple model, Kirubarajan et al. proposed the multiple-model PHD filter for
tracking maneuvering targets [6]. In the CPHD-based class, since the estimated
expected number of targets is very unstable in the presence of missed detections
and/or significantly large false alarm densities in the PHD filter propagation
process [9,10], Mahler further proposed the CPHD filter [10]. The CPHD filter
propagates not only the PHD but also the entire probability distribution of the
number of targets. Analytic solutions of the CPHD filter were proposed by Vo et al. [11].
Nevertheless, most PHD- and CPHD-based algorithms are based on single-sensor
observations. Mahler investigated the multisensor PHD filter in [1] and pointed
out that the resulting PHD formula is impractical in the multisensor case due to
its complexity. He proposed an approximate multisensor PHD algorithm based on
the product of the individual sensor PHDs. In this algorithm, the sensors are
assumed to be independent in their respective observation spaces. This
assumption is impractical in many cases. For example, several sensors may
observe the same targets and are thus dependent. Besides, in sensor network
environments, individual sensors exchange information, so the observations
coming from the sensors are also dependent. To reduce the difficulty, in this
paper we confine our problem to multisensor systems with a linear correlation.
We propose the linear multisensor PHD (LMPHD) filter by adopting the
measurement dimension extension (MDE) approach. This approach not only
effectively reduces the difficulty but also handles the sensor correlation.
This paper is organized as follows. Section 2 describes the background and the
problem. Section 3 presents the proposed LMPHD filter. Section 4 gives a
three-target tracking simulation. Section 5 concludes the paper.

2 Background and Problem Description


2.1 Review on the PHD Filter

In the single-sensor case, the PHD filter proceeds iteratively through the following
time-update (prediction) and measurement-update steps. Prediction step:

D_{k+1|k}(x) = \gamma_{k+1|k}(x) + \int \big[ P_S(u) f_{k+1|k}(x|u) + \beta_{k+1|k}(x|u) \big] D_k(u)\, du    (1)

Update step:

D_{k+1}(x|Z_{k+1}) = F_{k+1}(Z_{k+1}|x)\, D_{k+1|k}(x)    (2)

F_{k+1}(Z_{k+1}|x) = 1 - P_D(x) + \sum_{z \in Z_{k+1}} \frac{P_D(x)\, g_{k+1}(z|x)}{\lambda c_{k+1}(z) + \int P_D(x)\, g_{k+1}(z|x)\, D_{k+1|k}(x)\, dx}    (3)

where \gamma_{k+1|k}(x) is the target birth intensity at time step k+1, P_D(x) is the detection
probability, \beta_{k+1|k}(x|u) is the spawning intensity for a target that has state u at time
step k and state x at time step k+1, P_S(u) denotes the target survival probability,
\lambda is the average clutter intensity, c_{k+1}(z) is the clutter density
for each clutter point, and g_{k+1}(z|x) is the measurement likelihood function.
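For concreteness, the following Python sketch (our own illustration, not code from the paper) evaluates the measurement-update factor of Eqs. (2)-(3) on a weighted-particle approximation of D_{k+1|k}; the function names, the state-independent detection probability, and the placeholder likelihood and clutter-density callables are assumptions.

import numpy as np

def phd_update_weights(particles, weights, Z, p_d, lam, clutter_density, likelihood):
    """Particle approximation of the PHD measurement update, Eqs. (2)-(3).

    particles : (N, d) array of particles approximating D_{k+1|k}
    weights   : (N,) particle weights (their sum approximates the expected target number)
    Z         : list of measurements received at time k+1
    p_d       : detection probability (assumed state independent here)
    lam       : average clutter intensity
    clutter_density : function z -> c_{k+1}(z)
    likelihood      : function (z, x) -> g_{k+1}(z | x)
    """
    new_w = (1.0 - p_d) * weights                      # missed-detection term of Eq. (3)
    for z in Z:
        g = np.array([likelihood(z, x) for x in particles])
        denom = lam * clutter_density(z) + np.sum(p_d * g * weights)
        new_w += p_d * g * weights / denom             # one summand of Eq. (3)
    return new_w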
Mahler proposed an approximation for the multisensor PHD filter as follows:

D_{k+1}(x) \approx F^{[1]}_{k+1}(Z^{[1]}_{k+1}|x) \cdots F^{[s]}_{k+1}(Z^{[s]}_{k+1}|x)\, D_{k+1|k}(x)    (4)

where s is the number of sensors. It can be seen from (4) that the PHD D_{k+1}(x)
is an (s+1)-fold application of the Poisson approximation [1]. This update process
cannot be used when the sensors are correlated.

2.2 Multisensor Multitarget System Description

The linear multisensor-multitarget system is defined by

x_{k+1} = F_k x_k + G_k w_k    (5)

y^{[j]}_{k+1} = H^{[j]}_{k+1} x_{k+1} + v^{[j]}_{k+1}    (6)

z_{k+1} = A_{k+1} y_{k+1} + b_{k+1},   y_{k+1} = [y^{[1]}_{k+1}, \ldots, y^{[s]}_{k+1}]^T    (7)

where z^{[j]}_{k+1} denotes the measurement of the jth sensor. Obviously, the individual
observations {z^{[j]}_{k+1}} are dependent. Equation (6) is the measurement model of each
sensor, and Eq. (7) is the proposed linear fusion model of the sensors. We aim at deriving
the PHD intensity conditioned on the measurement set Z_{k+1}, i.e., D(x_{k+1}|Z_{k+1}), where
Z_{k+1} = \{Z^{[1]}_{k+1}, \ldots, Z^{[s]}_{k+1}\}^T and Z^{[j]}_{k+1} = \{z^{[j]}_{k+1,1}, \ldots, z^{[j]}_{k+1,n_j}\}.
By augmenting the measurements, we rewrite the measurement function (7) as

z_{k+1} = A_{k+1} H_{k+1} x_{k+1} + A_{k+1} v_{k+1} + b_{k+1}    (8)

with

z_{k+1} = [z^{[1]}_{k+1}, \ldots, z^{[s]}_{k+1}]^T    (9)

H_{k+1} = [H^{[1]}_{k+1}, \ldots, H^{[s]}_{k+1}]^T,   v_{k+1} = [v^{[1]}_{k+1}, \ldots, v^{[s]}_{k+1}]^T    (10)

Given this, we summarize the linear multisensor-multitarget system as

x_{k+1} = F_k x_k + G_k w_k    (11)

z_{k+1} = H^A_{k+1} x_{k+1} + v^A_{k+1}    (12)

where H^A_{k+1} = A_{k+1} H_{k+1} and v^A_{k+1} = A_{k+1} v_{k+1} + b_{k+1}. Our next main task is
to obtain the corresponding LMPHD filter after the MDE.
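As an illustration of the MDE construction of Eqs. (6)-(12), the short Python sketch below stacks the per-sensor measurement matrices and forms one extended measurement; it is our own example (the variable names and the Gaussian noise sampling are assumptions, not the paper's code).

import numpy as np

def build_extended_model(H_list, A):
    # Stack the per-sensor measurement matrices (Eq. (10)) and absorb the
    # linear fusion matrix, giving H^A_{k+1} = A_{k+1} H_{k+1} of Eq. (12).
    H = np.vstack(H_list)
    return A @ H

def simulate_extended_measurement(x, H_list, A, b, R_list, rng):
    # Sample per-sensor measurements (Eq. (6)) and fuse them linearly (Eq. (7)).
    y = np.concatenate([H @ x + rng.multivariate_normal(np.zeros(R.shape[0]), R)
                        for H, R in zip(H_list, R_list)])
    return A @ y + b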

3 The Linear Multisensor PHD Filter


The prediction step of the proposed LMPHD filter is the same as that of the original PHD
filter. Therefore, we focus on the Bayesian update of the PHD in the measurement-update
step.

3.1 The LMPHD Bayesian Update


We first consider the LMPHD filter using the common approach. Suppose that the
individual sensor RFSs are {\Sigma^{[1]}_k, \ldots, \Sigma^{[s]}_k}; in the Bayesian update step, the
proposed LMPHD filter is derived from the probability generating functional (PGFl)
F[g_1, \ldots, g_s, h] [1]:
D_{k+1}(x|Z_{k+1}) = \frac{1}{f_{k+1}(Z_{k+1}|Z_k)} \frac{\delta^{m_s} \cdots \delta^{m_1}\, \delta F}{\delta^{m_s} Z^{[s]} \cdots \delta^{m_1} Z^{[1]}\, \delta x}[0, \ldots, 0, 1]    (13)

F[g_1, \ldots, g_s, h] = \int g_1^{Z^{[1]}} \cdots g_s^{Z^{[s]}}\, h^X\, f_{k+1}(Z^{[1]}|X) \cdots f_{k+1}(Z^{[s]}|X)\, f_{k+1|k}(X|Z_k)\, \delta Z^{[1]} \cdots \delta Z^{[s]}\, \delta X    (14)

\frac{\delta^{m_s} F}{\delta^{m_s} Z^{[s]}} = \frac{\delta^{m_s} F}{\delta z_{m_s} \cdots \delta z_1}    (15)
Obviously, it is intractable to obtain the updated PHD D_{k+1}(x|Z_{k+1}) in this way. We
therefore adopt the MDE approach of Eq. (9) for z_{k+1}. The above multisensor PHD then
reduces to the single-sensor PHD ([1], p. 1173, Eqs. (110), (111)), but here the extended
measurements consist of all combinations of the individual sensor measurements. That is,

D_{k+1}(x|Z_{k+1}) = \frac{1}{f_{k+1}(Z_{k+1}|Z_k)} \frac{\delta^{L_s+1} F}{\delta z_{L_s} \cdots \delta z_1\, \delta x}[0, 1]    (16)

where L_s is the total number of combinations, L_s = \prod_{l=1}^{s} m_l, and m_l is the
number of measurements of the lth sensor.

F_{k+1}(Z_{k+1}|x) = 1 - P_D(x) + \sum_{[z^{[1]}, \ldots, z^{[s]}]^T \in Z_{k+1}} \frac{P_D(x)\, g_{k+1}([z^{[1]}, \ldots, z^{[s]}]^T|x)}{\lambda_s c_{k+1}([z^{[1]}, \ldots, z^{[s]}]^T) + \int P_D(x)\, g_{k+1}(z|x)\, D_{k+1|k}(x)\, dx}    (17)

Equation (17) has the same form as the single-sensor update. Nevertheless, it
describes a linear multisensor system and accounts for the correlation among
the sensors. We can therefore derive the LMPHD filter in the same way as the
single-sensor PHD update. However, some parameters that differ from those of the
single-sensor PHD filter must be calculated.

3.2 Derivation of Some Parameters


In the proposed LMPHD update equation, several parameters need to be calculated,
including the clutter density c_k(\cdot), the clutter intensity \lambda, and the extended
measurement covariance R^A_{k+1}. We investigate here how to obtain these parameters.

Clutter Density and Clutter Intensity. Assume that the observation space of the lth
sensor is a surface S_l. The multisensor observation space can then be described as a
super-cylinder C consisting of these sensor spaces, i.e., C \triangleq S_1 \times \cdots \times S_s.
The clutter intensity is therefore

\lambda = \lambda_1 + \cdots + \lambda_s    (18)

where \lambda_1, \ldots, \lambda_s are the clutter intensities of the individual sensors.
The clutter density of each clutter point is proposed as

c_k([z^{[1]}, \ldots, z^{[s]}]^T) = \frac{\lambda}{V(C)}    (19)

where V(C) = V(S_1 \times \cdots \times S_s) denotes the volume of the super-cylinder.

The Extended Measurement Covariance. Assume that the process noise w_k and the
measurement noises {v^{[j]}_k} are Gaussian with mean 0 and covariances Q_k and R^{[j]}_k,
respectively. The measurement error covariance R^A_{k+1} = cov(v^A_{k+1}, v^A_{k+1}) of the
measurement function (12) is

R^A_{k+1} = A_{k+1}\, diag[R^{[1]}_{k+1}, \ldots, R^{[s]}_{k+1}]\, A^T_{k+1}    (20)

\bar{v}_{k+1} = E(v^A_{k+1}) = b_{k+1}    (21)

Given these three parameters, the LMPHD filter can proceed like the single-sensor PHD
filter. Similarly, the GM-PHD implementation can also be used for the LMPHD
filter.
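The following Python sketch gathers the parameters of Eqs. (18)-(21); it is our own illustration under the assumption that the joint observation region has a known volume V(C) (passed in as "volume") and that the per-sensor clutter rates and covariances are supplied as plain arrays.

import numpy as np
from scipy.linalg import block_diag

def extended_noise_parameters(A, R_list, b, clutter_rates, volume):
    """Parameters of the extended measurement model, Eqs. (18)-(21) (a sketch)."""
    R_A = A @ block_diag(*R_list) @ A.T        # Eq. (20): extended measurement covariance
    v_bar = b                                  # Eq. (21): mean of the extended noise
    lam = float(np.sum(clutter_rates))         # Eq. (18): total clutter intensity
    clutter_density = lam / volume             # Eq. (19): uniform clutter density over C
    return R_A, v_bar, lam, clutter_density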

4 Simulation
In this section, three targets with constant-velocity (CV) motion are simulated in the
x-y coordinate plane. We suppose two sensors observing the same region [−1000, 1000] ×
[−1000, 1000] m². The two sensors are placed in different positions and differ in
precision. The system parameters are as follows: the initial states are
[250 m, 5 m/s, 250 m, −12 m/s], [−250 m, 12 m/s, −250 m, −5 m/s] and
[−250 m, 12 m/s, −250 m, −5 m/s] for targets 1, 2 and 3, respectively. The detection
probability is P_D = 0.9. The process covariance is Q_k = diag(25, 25) m².
The measurement covariances are R_1 = diag(25, 25) m² and R_2 = diag(50, 50) m² for
sensors 1 and 2, respectively; that is, sensor 1 performs better than sensor 2. The
measurement functions are position observations. We assume in the simulation that the
two sensors share the same reference coordinate frame.
Accordingly, we fuse the two sensors as follows:

y_k = H_k x_k + v_k
z_k = A_k y_k + b_k

H_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad
A_k = \begin{bmatrix} 0.8 & 0 & 0.2 & 0 \\ 0 & 0.8 & 0 & 0.2 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.6 & 0 & 0.4 \end{bmatrix}, \quad b_k = 0

The GM-PHD implementation is used to track the targets. The Gaussian terms
whose weights are greater than the threshold 0.5 are selected as the estimates.
Fig. 1(a) and Fig. 1(b) suggest that the proposed algorithm is effective in target
tracking. We further compare the proposed algorithm with the PHD filter in Figs. 2
and 3. It can be seen from Fig. 2 that the number of targets estimated by the proposed
algorithm is more stable than that of the PHD filter. Fig. 3(a) shows the Wasserstein
distances of the two algorithms; the PHD filter exhibits some fluctuation due to its
estimate of the target number. The same can be observed in Fig. 3(b), where the OSPA
distance is adopted. However, the proposed algorithm needs more computing time than
the PHD filter.
Fig. 1. Target tracking trajectories: (a) the proposed linear MPHD filter, true tracks and estimated tracks in the x-y plane; (b) tracks in the x and y coordinates against time

Fig. 2. The number of targets against time (true number vs. the estimates of the proposed algorithm and the original PHD filter)

Fig. 3. Tracking performance comparison: (a) the Wasserstein distances against time; (b) the OSPA distances against time

The reason is that the proposed algorithm must deal with more extended measurements
and higher-dimensional vectors. Here, the number of measurements is M_k = m_1 × ··· × m_s
and the dimension is d = d_1 + ··· + d_s, where d_1, ..., d_s are the measurement
dimensions of the individual sensors.

5 Conclusion
We proposed an LMPHD filter based on the MDE approach. Though the new filter has the
same form as the original PHD filter, it can deal with the case where the sensors are
linearly correlated. We also derived the parameters required for the new extended
measurements in the LMPHD filter. Future work may focus on two aspects: computational
complexity and nonlinear correlation. First, the proposed filter must process the
product of the numbers of sensor measurements, compared with the sum of the measurements
in the single-sensor PHD filter. Secondly, radar plays an important role in target
tracking, and radar networks are a growing trend in future applications; nevertheless,
radar is a nonlinear sensor system, and how to handle nonlinear multisensor systems
will be an important topic.

References
1. Mahler, R.P.S.: Multitarget Bayes Filtering via First-Order Multitarget Mo-
ments. IEEE Transactions on Aerospace and Electronic systems 39(4), 1152–1178
(2003)
2. Vo, B., Singh, S., Doucet, A.: Sequential Monte Carlo implementation of the PHD
filter for multi-target tracking. In: Proceedings of the International Conference on
Information Fusion, Cairns, Australia, pp. 792–799 (2003)
3. Sidenbladh, H.: Multi-target particle filtering for the probability hypothesis density.
In: Proceedings of the International Conference on Information Fusion, Cairns,
Australia, pp. 800–806 (2003)
4. Zajic, T., Mahler, R.: A particle-systems implementation of the PHD multitarget
tracking filter. In: Signal Processing, Sensor Fusion, and Target Recognition XII,
pp. 291–299 (2003)
5. Vo, B.-N., Ma, W.-K.: The Gaussian Mixture Probability Hypothesis Density Fil-
ter. IEEE Transactions on signal processing 54(11), 4091–4104 (2006)
6. Punithakumar, K., Kirubarajan, T., Sinha, A.: Multiple-model probability hy-
pothesis density filter for tracking maneuvering targets. IEEE Transactions on
Aerospace and Electronic Systems 44(1), 87–88 (2008)
7. Vo, B.N., Pasha, A., Tuan, H.D.: A Gaussian mixture PHD filter for nonlinear
jump Markov models. In: Proceedings of the 45th IEEE Conference on Decision
and Control, pp. 3162–3166. IEEE, San Diego (2006)
8. Nandakumaran, N., Punithakumar, K., Kirubarajan, T.: Improved multi-target
tracking using probability hypothesis density smoothing. In: Drummond, O.E. (ed.)
Proc. Signal and Data Processing of Small Targets, vol. 6699 (August 2007)

9. Erdinc, O., Willett, P., Bar-Shalom, Y.: A Physical-Space Approach for the Probability
Hypothesis Density and Cardinalized Probability Hypothesis Density Filters. In: Signal
and Data Processing of Small Targets, Proc. of SPIE, vol. 6236, pp. 1–12 (2006)
10. Mahler, R.: PHD filters of higher order in target number. IEEE Trans. Aerosp.
Electron. Syst. 43(3), 1523–1543 (2007)
11. Vo, B.-T., Vo, B.-N., Cantoni, A.: Analytic Implementations of the Cardinal-
ized Probability Hypothesis Density Filter. IEEE Transactions on Signal Process-
ing 55(7), 3553–3567 (2007)
An Improved Particle Swarm Optimization for Uncertain
Information Fusion

Peiyi Zhu1,2, Benlian Xu2, and Baoguo Xu1


1
College of IoT Engineering,
Jiangnan University,
Lihu Road, Wuxi City, Jiangsu, China
2
School of Electrical and Automation Engineering,
Changshu Institute of Technology,
Nansanhuan Road, Changshu, Jiangsu, China
[email protected]

Abstract. Multi-sensor information fusion synthesizes multi-source information to
make decisions more accurate and credible. However, uncertainties influence the
safety/failure of the system and the warranty costs. A new method for uncertain
information fusion based on an improved Dempster-Shafer (D-S) evidence theory is
proposed. We set up the concept of the weight of the sensor evidence itself and an
evidence distance based on a quantification of the similarity between sets to
acquire the reliability weight of the relationship between pieces of evidence. An
improved particle swarm optimization (PSO) is then used to compute the sensor weights
that modify D-S evidence theory. Finally, numerical experiments demonstrate its
effectiveness.

Keywords: D-S theory, evidence distance, uncertain, particle swarm


optimization.

1 Introduction
The integration of uncertain multi-sensor information is commonly dealt with by information
fusion algorithms [1-2]. Information often contains uncertainties, which are usually
related to physical constraints, detection algorithms, and the transmitting channels of the
sensors. While intuitive approaches such as Dempster-Shafer fusion, Dezert-
Smarandache fusion, and Smets' Transferable Belief Model [3-5] aggregate all
available information, they do not always guarantee optimum results.
Acknowledging that these measurement techniques have associated measurement costs, the
essence is to derive a fusion process that minimizes global uncertainties.
Nowadays, systems increasingly rely on information fusion techniques to
automate processes and make decisions. An informed decision maker, meanwhile,
often relies on various forms of data fusion models to assess the
current situation. D-S evidence theory is an excellent method of information
fusion: it adopts belief functions rather than probabilities, which is a general advantage,
and it does not require binary mutual-exclusion assumptions about uncertain events and thus has
greater generality in handling uncertainty. Because sensors differ in credibility or
reliability, the importance of the evidence they provide also differs. It is therefore
essential to weight the evidence, i.e., to adopt a weighted evidence combination
technique. A variety of weighting methods have already been put forward, but how to
obtain better weight values is rarely studied in weighted D-S evidence theory. Current
methods select the weight values from fusion experts or from covariance results, etc. [6],
and it is hard to obtain good weight values in this way. Considering the disadvantages
of weighted D-S theory, we present a method to find optimal weight values and explain
the idea behind it, and then use an improved PSO algorithm to work out the weight
values in the established optimization model. Compared with some other methods [7],
the resulting evidence theory proves more effective and advanced, as shown by simulation.

2 An Improved Particle Swarm Optimization


Since the original version of PSO was proposed in 1995 [8], much work has been done to
develop it. In the standard PSO (SPSO), the inertia weight ω is a fixed parameter, so the
algorithm easily falls into premature convergence. This paper therefore presents an
improved strategy based on a dynamical weight (DW): the particles adaptively change the
diversity of the group to adjust their speed, and thus effectively jump out of local
optima to avoid premature convergence. In the standard PSO algorithm, the particles have
their own thinking, share information, and cooperate with each other in the process of
continuous evolution, which can be characterized by two quantities: the evolution degree
of particle speed and the aggregation degree of particles.
Definition 1: Let x_i(t), i ∈ {1, 2, ..., N}, denote the ith particle in the tth-generation
population, with fitness f_i. Its individual best value recorded at present is denoted
f_{ibest}(t), and its individual best value at time t−1 is f_{ibest}(t−1). The evolution
degree of particle speed E(x) is defined as

E(x) = f_{ibest}(t) / f_{ibest}(t−1)    (1)

Definition 2: Let x_i(t), i ∈ {1, 2, ..., N}, denote the ith particle in the tth-generation
population, with fitness f_i and individual best value f_{ibest}(t), and let f_{avg}(t)
denote the average fitness of all particles at present. The aggregation degree of
particles A(x) is defined as

A(x) = f_{ibest}(t) / f_{avg}(t)    (2)

According to Definitions 1 and 2, we can clearly observe the particle swarm optimization
process. If the weight is adjusted according to the particle velocity evolution degree
and the particle aggregation degree, the weight is coupled to the optimization process,
and the population diversity can be adjusted by adjusting the weight. The weight thus
changes along with the particle velocity evolution and aggregation degrees, which
alleviates the premature-convergence problem. When E(x) is large, the velocity evolution
is fast and the algorithm may continue to search in a large space, i.e., the swarm
optimizes over a wide range. When E(x) is small, ω may be reduced so that the swarm
searches in a small range and finds the optimum quickly. When A(x) is small, the swarm
is quite scattered and not prone to falling into a local optimum; as A(x) increases, the
swarm more easily falls into a local optimum, and at this moment we increase ω to enlarge
the swarm's search space and improve its global search capability. In conclusion, ω
increases as the particle velocity evolution E(x) decreases, and increases as the
particle aggregation degree A(x) increases. Therefore ω is decided by E(x) and A(x), and
the functional relation may be expressed as

ω = f(E(x), A(x)) = ω_0 − 0.55·E(x) + 0.15·A(x)    (3)

where ω_0 is the initial value of ω, typically ω_0 = 0.9. By the definitions,
0 < E(x) ≤ 1 and 0 < A(x) ≤ 1, so ω_0 − 0.55 < ω < ω_0 + 0.15, which keeps ω bounded
near 1 as required for convergence.
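To make Eq. (3) concrete, the following Python sketch (our own, not from the paper) computes the dynamical inertia weight for one particle from its best-fitness history; the variable names are assumptions.

def dynamic_inertia_weight(f_ibest_t, f_ibest_prev, f_avg_t, w0=0.9):
    """Dynamical weight of Eq. (3) for one particle.

    f_ibest_t    : particle's individual best fitness at iteration t
    f_ibest_prev : particle's individual best fitness at iteration t-1
    f_avg_t      : average fitness of the swarm at iteration t
    """
    E = f_ibest_t / f_ibest_prev   # evolution degree of particle speed, Eq. (1)
    A = f_ibest_t / f_avg_t        # aggregation degree of particles, Eq. (2)
    return w0 - 0.55 * E + 0.15 * A

In a standard velocity update this weight would multiply the previous velocity, e.g. v = w*v + c1*r1*(pbest − x) + c2*r2*(gbest − x), with c1 = c2 = 2.0 as used in Section 4.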

3 An Improved d-s Theory for Uncertain Information Fusion


3.1 Basic of D-S Theory

D-S theory was developed as an attempt to generalize probability theory by introducing
a rule for combining distinct bodies of evidence [9]. In D-S theory, a finite nonempty
set of mutually exclusive alternatives is called the frame of discernment, denoted by
Θ. The power set 2^Θ is the set of all subsets of Θ, including Θ itself. The basic
probability assignment (BPA) reflects a degree of belief in a hypothesis, or the degree
to which the evidence supports the hypothesis.
D-S evidence theory has been widely used in uncertain information fusion and
provides a strong theoretical basis for dealing with uncertain information. However,
D-S evidence theory can arrive at conclusions contrary to common sense when the
evidence is highly conflicting. Facing such high conflict, many scholars have put forward
amendments to this disadvantage, such as Yager's rule [10], the Dubois-Prade
rule [11], the average distribution rule, and the weighted rule. However, these
combination rules have disadvantages in robustness and effectiveness. For the above
amended combination rules, from the practical application point of view, Haenni
pointed out problems such as: the designated weight distribution of evidence does not
satisfy the commutative and associative laws of combination, and the computation is
complex in practical applications.
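For reference, a minimal Python sketch of Dempster's rule of combination for two BPAs is given below; it is our own illustration (the paper only uses the rule), with BPAs represented as dictionaries mapping frozensets of hypotheses to masses, and the example data taken from source S1 and S2 of Data 1 in Section 4.

from itertools import product

def dempster_combine(m1, m2):
    """Combine two BPAs with Dempster's rule of combination.

    m1, m2 : dict mapping frozenset (focal element) -> mass, each summing to 1.
    Returns the combined BPA; raises if the evidence is totally conflicting.
    """
    combined, conflict = {}, 0.0
    for (A, mA), (B, mB) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mA * mB
        else:
            conflict += mA * mB               # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    k = 1.0 - conflict                        # normalization constant
    return {A: m / k for A, m in combined.items()}

# Example: frame {A, B, C}; sources S1 and S2 of Data 1 in Section 4
m1 = {frozenset('A'): 0.7, frozenset('B'): 0.1, frozenset('C'): 0.1, frozenset('AC'): 0.1}
m2 = {frozenset('A'): 0.1, frozenset('B'): 0.8, frozenset('C'): 0.05, frozenset('AC'): 0.05}
print(dempster_combine(m1, m2))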

3.2 The Distance between Weighted Evidences

Since the aim of this paper is to define a meaningful metric distance for BPAs, let
w_1, w_2 be the corresponding weights of the evidence m_1, m_2 in information fusion,
where m_1, m_2 are two BPAs on the frame of discernment Θ containing n mutually exclusive
and exhaustive hypotheses [12]. The distance between m_1 and m_2 is

d_{BPA}(w_1 m_1, w_2 m_2) = \left[ \frac{1}{2} \left( \| w_1 \vec{m}_1 \|^2 + \| w_2 \vec{m}_2 \|^2 - 2 \langle w_1 \vec{m}_1, w_2 \vec{m}_2 \rangle \right) \right]^{1/2}    (6)

where the exponent 1/2 normalizes d_{BPA} and guarantees 0 ≤ d_{BPA} ≤ 1, and
\langle \vec{m}_1, \vec{m}_2 \rangle is the scalar product defined by Eq. (7):

\langle \vec{m}_1, \vec{m}_2 \rangle = \sum_{i=1}^{2^n} \sum_{j=1}^{2^n} m_1(A_i)\, m_2(A_j)\, \frac{|A_i \cap A_j|}{|A_i \cup A_j|}    (7)
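A small Python sketch of the weighted distance of Eqs. (6)-(7) follows; it is our own example (the dictionary representation of BPAs and the subset enumeration are assumptions).

import numpy as np

def jousselme_distance(m1, m2, frame, w1=1.0, w2=1.0):
    """Weighted evidence distance of Eqs. (6)-(7), a sketch.

    m1, m2 : dicts mapping frozenset focal elements -> mass
    frame  : iterable of singleton hypotheses, e.g. ['A', 'B', 'C']
    w1, w2 : evidence weights
    """
    elems = list(frame)
    # Enumerate all non-empty subsets of the frame as a common basis.
    subsets = [frozenset(e for k, e in enumerate(elems) if code >> k & 1)
               for code in range(1, 2 ** len(elems))]
    v1 = w1 * np.array([m1.get(A, 0.0) for A in subsets])
    v2 = w2 * np.array([m2.get(A, 0.0) for A in subsets])
    # Similarity matrix |A_i & A_j| / |A_i | A_j| of Eq. (7)
    D = np.array([[len(A & B) / len(A | B) for B in subsets] for A in subsets])
    diff = v1 - v2
    return float(np.sqrt(0.5 * diff @ D @ diff))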

3.3 The New Uncertain Information Fusion


In order to make the amended pieces of evidence closer to each other overall and the
final decision more reasonable, a weighted evidence-distance function is established;
this function is a function of the evidence weights, and in this paper we take it as the
objective to be optimized. According to optimization theory, the optimization model can
be written as

F(w) = \min\left( d_{BPA}^2 \right) = \min \sum_{i=1}^{n} \sum_{j=1}^{n} d_{BPA}^2(\vec{m}_i, \vec{m}_j)    (8)

subject to \sum_i w_i = 1; the weights thus express the relative importance of the
collected evidence. Traditional optimization methods can be used to solve Eq. (8).
Considering the real-time demand of information fusion and the accuracy and validity of
the fusion results under conflicting evidence, a better solution based on the improved
PSO should be used to meet these requirements. The new uncertain information fusion
procedure is summarized in Table 1.

Table 1. The procedure of the new uncertain information fusion algorithm

The weight w_i of each piece of evidence's own reliability can be acquired by the above
analysis, so we can revise the evidence by w_i. Considering that the evidence sources
have different importance, the revised evidence is not a simple average; the new evidence
source probability assignment is defined as

mae(m) = \sum_{i=1}^{n} w_i m_i    (9)

The new probability assignment is thus acquired, and we can then combine it using the
D-S combination rule.
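For illustration, the sketch below forms the weighted average evidence of Eq. (9) and then combines it with Dempster's rule using the dempster_combine helper from the earlier sketch; combining the averaged evidence with itself (n−1) times is a common convention for weighted-average schemes, and whether the paper combines in exactly this way is our assumption.

def weighted_average_evidence(bpas, weights):
    """Weighted average evidence of Eq. (9): mae(m) = sum_i w_i m_i (a sketch)."""
    mae = {}
    for m, w in zip(bpas, weights):
        for A, mass in m.items():
            mae[A] = mae.get(A, 0.0) + w * mass
    return mae

def fuse(bpas, weights, n_rounds=None):
    # Average the weighted evidence, then combine it with itself (n-1) times
    # with Dempster's rule (an assumed convention for the final combination step).
    mae = weighted_average_evidence(bpas, weights)
    combined = mae
    for _ in range((n_rounds or len(bpas)) - 1):
        combined = dempster_combine(combined, mae)
    return combined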

4 Numerical Experiments
The new method has been analyzed above; this section gives two numerical examples to
further illustrate it.
A. Initial experiments
The experimental arrangement uses two groups of common data to acquire the basic
probability assignments. The problem is solved using two different analysis types,
namely case 1 using the D-S rule and case 2 using Yager's rule, to combine the evidence
at the sub-system level for each type. We used the modified PSO algorithm, together with
the D-S combination program, in a Matlab implementation to solve this optimization
problem. The PSO parameters are: population size = 15, t_max = 500, c_1 = c_2 = 2.0,
ω_0 = 0.9, and a stopping convergence criterion (in terms of change in the objective
function value) of 10^{-8} over 200 consecutive iterations.
Assume a discernment frame Θ = {A, B, C} and sources S1, S2, S3, S4 such that
Data 1:
S1 : m( A) = 0.7 m( B) = 0.1 m(C ) = 0.1 m( A, C ) = 0.1
S2 : m( A) = 0.1 m( B) = 0.8 m(C ) = 0.05 m( A, C ) = 0.05
S3 : m( A) = 0.4 m( B ) = 0.3 m(C ) = 0.2 m( A, C ) = 0.1
S4 : m( A) = 0.4 m( B) = 0.2 m(C ) = 0.1 m( A, C ) = 0.3
Data 2:
S1 : m( A) = 0.001 m( B ) = 0.199 m(C ) = 0.8
S2 : m( A) = 0.9 m( B ) = 0.05 m(C ) = 0.05
S3 : m( A) = 0.3 m( B ) = 0.6 m(C ) = 0.1
S4 : m( A) = 0.4 m( B) = 0.4 m(C ) = 0.2

Table 2. Experimental result of Data 1

Table 3. Experimental result of Data 2



The combination results for Data 1 under the three different rules are shown in Table 2.
We can see that when the conflict between the pieces of evidence is relatively small,
this paper's method is slightly better than the D-S and Yager rules, which reflects the
basic status of D-S evidence theory, so there is not much difference in effectiveness
among the linearly combined sensor fusion methods. However, if the identification
object is a multi-element set, this paper's method is more reasonable and simpler to
compute.
The numbers in Table 3 confirm the following conclusion. When the conflict between the
pieces of evidence is relatively large, the experimental results are much better than
those of the Yager rule, and even better than those of the D-S rule. Because of the
contradiction among the evidence from different sources, the Yager rule treats that
contradiction as coming from ignorance; if additional knowledge is available, the
contradiction might be resolved, so the Yager rule is more conservative than the D-S
rule. We therefore have good reason to believe that this paper's method yields the best
combination result among the three algorithms.
B. Evidence conflict and robustness
In this section, we consider the combination of two evidence sources by the classic D-S
evidence theory and by the improved algorithm. Assume a discernment frame
Θ = {A, B, C} and sources S1 and S2 such that
Data 3:
m1: m1(A) = 0.99, m1(B) = 0.01, m1(C) = 0
m2: m2(A) = 0, m2(B) = 0.01, m2(C) = 0.99
m1': m1'(A) = 0.98, m1'(B) = 0.01, m1'(C) = 0.01
The results of combination are shown in Table 4 by two different rules.

Table 4. Experimental result of Data 3

We can see that the evidence in this test is highly conflicting. In these two pieces of
evidence, the focal elements A and C intuitively obtain a higher support degree,
approximately 50% each, and should receive more support after combination. However,
combination based on the D-S rule gives almost certain support to the focal element B,
which is clearly unreasonable. The proposed algorithm, by modifying the conflicting
evidence, obtains more reasonable results; therefore, the improved algorithm can
effectively avoid the defects of the traditional D-S combination rule when the evidence
is conflicting.
If the evidence m1 is replaced by the perturbed evidence m1', the combination results are shown in Table 5.

Table 5. Experimental result of Data 3

Contrasting Table 4 and Table 5, the combination results based on the D-S rule change
greatly when m1 is replaced by m1'. This shows that the D-S rule is sensitive to changes
in the focal-element probability distribution, i.e., its robustness is poor. The
combination results based on the improved algorithm change only slightly, so its
robustness is better.

5 Conclusion
To address the problems of traditional D-S evidence theory under high conflict between
pieces of evidence, this paper proposed a new method. Firstly, we deal with the evidence
using a weighted D-S theory. Then an optimization model for obtaining the sensor weights
is set up. Finally, we use the improved PSO to acquire the reliability weights of the
relationships between pieces of evidence and thereby modify D-S theory. Numerical
experiments show that the new method is more effective.

Acknowledgements. This research is supported by a grant from the National Natural


Science Foundation of China No. 60804068 & No. 30971689, the Natural Science
Foundation of Jiangsu Province No. BK2010261, the Key Technology Support
Program of Jiangsu province No. BE2010627, the Natural Science Fundamental
Research Project of Colleges and Universities in Jiangsu province No. 10KJD510001.

References
[1] Wan, S.: Fusion Method for Uncertain Multi-sensor Information. In: International
Conference on Intelligent Computation Technology and Automation, vol. 1, pp. 1056–
1060 (2008)
[2] Chen, L., Huang, J.: Research of Uncertainty. Journal of Circuit and System 9(3),
105–111 (2004)
[3] Shafer, G.: A mathematical theory of evidence, pp. 19–63. Princeton University Press,
Princeton (1976)
[4] Dezert, J., Smarandache, F.: DSmT: A New Paradigm Shift for Information Fusion. In:
COGnitive systems with Interactive Sensors International Conference, Paris, March 2006,
pp. 1–11 (2006)
[5] Ristic, B., Smets, P.: Target Classification Approach Based On the Belief Function
Theory. IEEE Transactions on Aerospace and Electronics Systems 41(2), 1097–1103
(2005)
[6] Capelle, A.S., Fernandez-Maloigne, C., Colot, O.: Introduction of Spatial Information
within the Context of Evidence Theory. In: IEEE International Conference On Acoustics,
Speech, and Signal Processing, vol. 2, pp. 785–788 (2003)

[7] Wu, Z., Wu, G.: A new improvement of evidence combination. Computer and
Modern 12, 116–117 (2007)
[8] Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In:
Proceedings of the sixth International Symposium on Micro and Human Science, Nagoya,
Japan, pp. 39–43 (1995)
[9] Sentz, K., Ferson, S.: Combination of evidence in Dempster Shafer theory. TR 0835,
Sandia National Laboratories, Albuquerque, New Mexico (2002)
[10] Yager, R.: On the Dempster-Shafer framework and new combination rules. Information
Sciences 41, 93–137 (1987)
[11] Dubois, D., Prade, H.: Representation and combination of uncertainty with belief
functions and possibility measures. Computational Intelligence 4, 244–264 (1988)
[12] Jousselme, A.-L., Grenier, D., Bossé, É.: A new distance between two bodies of evidence.
Information Fusion 2, 91–101 (2001)
Three-Primary-Color Pheromone for Track Initiation

Benlian Xu1, Qinglan Chen2, and Jihong Zhu1


1
School of Electrical & Automatic Engineering, Changshu Institute of Technology, 215500
Changshu, China
2
School of Mechanical Engineering, Changshu Institute of Technology, 215500 Changshu,
China
{xu_benlian,chen_qinglan,djyzhjh}@yahoo.com.cn

Abstract. We propose a novel ant system with a “subtractive” color mixing
model of three primary colors for jointly estimating the number of tracks to be
initiated and the individual tracks in a multi-sensor multi-target system. In our
algorithm, each ant deposits cyan, magenta, or yellow pheromone at all times, and an
ant's decision depends on comparing the colored-pheromone similarity with the
candidates to be visited. On this basis, a mixture optimization function over a
three-dimensional parameter space is proposed to exploit the best solutions found by the
following ants. Simulation results are presented to support the favorable
performance of our algorithm.

Keywords: Track initiation, Ant colony optimization, Three-primary-color.

1 Introduction
Multi-target tracking (MTT) has received considerable interest over the last decade,
with applications in civil and military areas [1,2]. In general, MTT includes the
phases of track initiation, data association, and state estimation, among which track
initiation determines the number of targets as well as the initial state estimates for
the state estimator; a poor track initiation may result in target loss or
increased computational burden. So far, four popular track initiation techniques are
generally used in radar tracking, namely the rule-based method, the logic-based
method, the Hough transform, and the modified Hough transform method [3]. In this
work, however, we focus on the bearings-only track initiation problem in sonar-based
tracking of submarines. Since the number of measurement candidates grows exponentially
with the number of sensors (m) and polynomially with the number of targets, many
attempts have been made, including various evolutionary algorithms, among which the
Ant Colony Optimization (ACO) approach is recognized as a competitive one [4,5].
However, these algorithms need to be improved for practical tracking scenarios because
they assume that the number of tracks is known a priori.
Biologists report that there are nearly 20,000 species of ants that vary in size, color,
and way of life. Most are a dull, drab color such as brown, rust, or black, but some
are yellow, green, blue, or purple. Inspired by these colored ants, we propose an ant
system with three primary colors to jointly identify the number of tracks to be
initiated and their individual tracks.


2 Background

In the generic ACO algorithm, ants communicate with each other indirectly by
stigmergy; this behavior can be replaced by a more direct communication means,
called the color similarity of pheromone. Color similarity comparison is generally
conducted in two steps: color space conversion and color difference computation. Since
there are many techniques for color space conversion, the adopted conversion strategy is
generally conditioned on the application. In this work, to increase the color
discrimination ability of each ant in the “subtractive” color mixing model, the
following conversion steps are employed:

Step 1): From CMY to standard RGB space. Since standard RGB component
values vary between 0 and 1, a cheap and simple transform from CMY space to
standard RGB space is adopted, namely R = 1 − C, G = 1 − M, and B = 1 − Y,
respectively, where each component of CMY also lies in the range [0, 1].

Step 2): From standard RGB to CIE XYZ. According to the human vision tristimulus,
the conversion law from standard RGB to CIE XYZ is introduced as [6]

\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = 100 \begin{bmatrix} 0.412453 & 0.357580 & 0.180432 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}    (1)

Step 3): From CIE XYZ to CIE LAB. Because the CIE LAB space is more perceptually
uniform than CIE XYZ, i.e. a change of the same amount in a color value
produces a change of about the same visual importance, the color
difference comparison is preferably conducted in the CIE LAB space, which is
computed as

L^* = 116\, f(Y/Y_n) − 16
a^* = 500\, [\, f(X/X_n) − f(Y/Y_n)\, ]
b^* = 200\, [\, f(Y/Y_n) − f(Z/Z_n)\, ]
where f(t) = t^{1/3} if t > (6/29)^3, and f(t) = \frac{1}{3}(29/6)^2 t + 4/29 otherwise    (2)

where the typical ranges of the three values are L^* ∈ [0, 100], a^* ∈ [−100, 100], and
b^* ∈ [−100, 100], respectively; (X_n, Y_n, Z_n) denotes the tristimulus value of the
reference white point in the CIE XYZ space, given by (95.017, 100.0, 108.813).

Step 4): Color difference computation. Consequently, given two colors
w_1 = (L^*_1, a^*_1, b^*_1) and w_2 = (L^*_2, a^*_2, b^*_2) obtained from Eq. (2) in the CIE LAB space, the
corresponding color difference in terms of the Euclidean distance metric is

\Delta E(w_1, w_2) = \| w_1 − w_2 \|_2 = \sqrt{ (L^*_1 − L^*_2)^2 + (a^*_1 − a^*_2)^2 + (b^*_1 − b^*_2)^2 }    (3)
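The conversion chain of Steps 1)-4) can be sketched in a few lines of Python; this is our own illustration (function names and the scalar helper are assumptions), using the matrix of Eq. (1), the white point (95.017, 100.0, 108.813), and the color difference of Eq. (3).

import numpy as np

M_RGB_TO_XYZ = 100 * np.array([[0.412453, 0.357580, 0.180432],
                               [0.212671, 0.715160, 0.072169],
                               [0.019334, 0.119193, 0.950227]])
WHITE = np.array([95.017, 100.0, 108.813])   # reference white (X_n, Y_n, Z_n)

def cmy_to_lab(cmy):
    """Steps 1)-3): CMY -> standard RGB -> CIE XYZ -> CIE LAB (a sketch)."""
    rgb = 1.0 - np.asarray(cmy, dtype=float)          # Step 1
    X, Y, Z = M_RGB_TO_XYZ @ rgb                      # Step 2, Eq. (1)
    def f(t):
        return t ** (1/3) if t > (6/29) ** 3 else (29/6) ** 2 * t / 3 + 4/29
    fx, fy, fz = f(X / WHITE[0]), f(Y / WHITE[1]), f(Z / WHITE[2])
    return np.array([116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)])  # Eq. (2)

def delta_e(w1, w2):
    """Step 4), Eq. (3): Euclidean color difference in CIE LAB."""
    return float(np.linalg.norm(np.asarray(w1) - np.asarray(w2)))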



3 Three-Primary-Color Pheromone for Track Initiation

3.1 Track Construction

For a multi-sensor multi-target bearings-only tracking system, the sampling data from
the first four scans are generally utilized to initiate tracks; we thus obtain four
spaces, denoted by Ω_1, Ω_2, Ω_3 and Ω_4, each formed by intersecting the bearing
measurements (lines of sight) of the same scan, as shown in Fig. 1. Since our algorithm
is based on the three primary colors, we accordingly consider three groups with equal
numbers of ants. Initially, the three groups of ants are mixed together and placed
randomly on the position candidates in the first search space Ω_1. Afterwards, each
ant visits the position candidates in the next search space probabilistically.
Suppose that an ant with pheromone of color s¹ is now located at position i in Ω_k
(k = 1, 2, 3); then the ant will visit position j in the next search space by applying
the following probabilistic formula:
P^{(s)}_{i,j} = \frac{ e^{-\alpha \cdot \Delta E(w_s, w_j)}\, \eta_{i,j} }{ \sum_{l \in \Omega_{k+1}} e^{-\alpha \cdot \Delta E(w_s, w_l)}\, \eta_{i,l} }    (4)

where e^{-\alpha \cdot \Delta E(w_s, w_j)} denotes the pheromone color similarity between the current ant and
the path from i to j; \eta_{i,j} denotes the problem-dependent heuristic function; and
\alpha is an adjustable positive parameter whose value determines the degree of
pheromone color similarity among candidates. Note that the smaller the
pheromone color similarity e^{-\alpha \cdot \Delta E(w_s, w_j)}, the bigger the color difference. To eliminate
the effect of outliers and simplify the computation of each ant, \eta_{i,j} is defined as

\eta_{i,j} = \begin{cases} 1 & \text{if } r_1 \le D_{i,j} \le r_2,\ j \in \gamma \\ \kappa & \text{otherwise} \end{cases}    (5)

where \kappa is a constant between 0 and 1, D_{i,j} is the distance between
i and j, and \gamma denotes an annular gate region whose inner and outer radii are
determined by r_1 = \| v_{\min} \| \cdot T and r_2 = \| v_{\max} \| \cdot T with sampling interval T.
While walking from i to j, the ant deposits its corresponding color pheromone with
a given amount \tau_0^s:

\tau_{i,j} \leftarrow \tau_{i,j} + \tau_0^s    (6)

where \tau_0^s is the locally added pheromone amount of color s, and \tau_{i,j} is the resulting
pheromone obtained by mixing the three primary colors with their individual amounts.
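A short Python sketch of the candidate selection rule of Eqs. (4)-(5) follows; it is our own illustration, reusing the delta_e helper from the earlier color-conversion sketch, and the argument names are assumptions.

import numpy as np

def choose_next_position(ant_color_lab, candidate_colors, distances, alpha,
                         r1, r2, kappa, rng):
    """Probabilistic candidate selection of Eqs. (4)-(5), a sketch.

    ant_color_lab    : CIE LAB color of the ant's pheromone (w_s)
    candidate_colors : CIE LAB colors of the candidate positions (w_j)
    distances        : distances D_{i,j} from the current position to each candidate
    """
    eta = np.array([1.0 if r1 <= d <= r2 else kappa for d in distances])   # Eq. (5)
    sim = np.array([np.exp(-alpha * delta_e(ant_color_lab, w))             # color similarity
                    for w in candidate_colors])
    p = sim * eta
    p /= p.sum()                                                           # Eq. (4)
    return rng.choice(len(candidate_colors), p=p)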
Once all ants of a given iteration have finished their individual tours, the
pheromone amount on some established tracks is updated globally, which directly
results in color changes on these tracks.

1
Without loss of generality, s = 1, 2,3 represents cyan, magenta, and yellow, respectively.
Fig. 1. The obtained search spaces

Fig. 2. Objective function in the ρ-θ-Δ space

We consider three groups of solutions, one per color, to update the corresponding
pheromone trails. Moreover, each group is equally composed of the L best-so-far
solutions found by the ants. Given a track p in the sth group, the pheromone update is
conducted according to the rule

\tau^p_{i,j} \leftarrow \tau^p_{i,j} + \Delta\tau^{s,p}_{i,j}    (7)

where \Delta\tau^{s,p}_{i,j} is the amount of pheromone of color s added on the segment ij of
track p. In our ant system, \Delta\tau^{s,p}_{i,j} is defined as

\Delta\tau^{s,p}_{i,j} = Q_0^s\, J_p    (8)

where Q_0^s is an adjustment constant related to the sth of the three primary colors, and
J_p is the objective function discussed below.

3.2 Objective Function in a Three Parameter Space

The Hough transform (H-T) has been recognized as a robust technique for line or curve
detection in image processing. It is in essence a transform from a point
(x, y) in a Cartesian coordinate system to a curve in the ρ-θ parameter space, obeying
the basic law [7]

ρ = x cos θ + y sin θ    (9)

where ρ is the distance from the line through (x, y) to the coordinate origin, and θ
is the angle of the normal with the x axis.
In our proposed ant system, the objective function is designed in a three-dimensional
parameter space denoted by ρ-θ-Δ, where the notation "Δ" represents the color
difference. The resulting objective function for track p takes the combined form

J_p = ξ J_d^p + (1 − ξ) J_Δ^p    (10)

where J_d^p denotes the distance-difference sub-objective function, J_Δ^p is the color-
difference sub-objective function for track p, and the parameter ξ (0 ≤ ξ ≤ 1) balances
the importance of the two sub-objective functions. The clustering idea is used to
calculate J_d^p in the ρ-θ-Δ space as

J_d^p = \sum_{k=1}^{C_k} \| x^{(k,p)} − V^p \|^2    (11)

where x^{(k,p)} is the kth point corresponding to some segment on track p, denoted by
1, 2, or 3 in Figs. 2(a)-2(b); C_k is the number of intersections; and V^p is the cluster
prototype with the value V^p = \sum_{k=1}^{C_k} x^{(k,p)} / C_k, as shown in Fig. 2(b). J_Δ^p is
computed as

J_Δ^p = \sum_{1 \le i \ne j \le 3} \Delta E_p(w_j, w_i)    (12)

As shown in Fig. 2(c), any two segments on track p are selected to compute the
corresponding color difference, so in total three such terms are required.
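A compact Python sketch of the objective of Eqs. (10)-(12) follows; it is our own example (argument names are assumptions), again reusing the delta_e helper from the earlier sketch and treating Eq. (12) as a sum over the three unordered segment pairs.

import numpy as np
from itertools import combinations

def track_objective(points_rho_theta, segment_colors_lab, xi):
    """Combined objective J_p of Eqs. (10)-(12), a sketch.

    points_rho_theta   : (C_k, 2) array of (rho, theta) points of the track segments, Eq. (9)
    segment_colors_lab : three CIE LAB pheromone colors, one per segment
    xi                 : balance parameter in [0, 1]
    """
    pts = np.asarray(points_rho_theta, dtype=float)
    prototype = pts.mean(axis=0)                                        # cluster prototype V^p
    J_d = float(np.sum(np.linalg.norm(pts - prototype, axis=1) ** 2))   # Eq. (11)
    J_delta = sum(delta_e(wi, wj)                                       # Eq. (12), three pairs
                  for wi, wj in combinations(segment_colors_lab, 2))
    return xi * J_d + (1 - xi) * J_delta                                # Eq. (10)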

4 Numerical Simulation
Numerous simulations with different cases were conducted on a DELL 6GHz processor
with 1.99 GB RAM; we only present the case of three-track initiation owing to space
restrictions. Figs. 3(a) and 4(a) present the obtained colored boards in the clutter-
free and cluttered environments, respectively. Since our goal is to select all potential
tracks that are nearly cyan, magenta, or yellow, by using both the color and distance
difference thresholds (ε_c = 60, ε_d = 400) the tracks are extracted and plotted in
Figs. 3(b) and 4(b), and each is nearly cyan, magenta, or yellow. Besides, the
simulation results indicate that our proposed algorithm enjoys robust track initiation
performance in both clutter-free and cluttered environments.
Fig. 3. Track extraction results in the clutter-free environment with ε_c = 60, ε_d = 400: (a) the obtained color "board"; (b) the extracted tracks

Fig. 4. Track extraction results in the clutter environment with ε_c = 60, ε_d = 400: (a) the obtained color "board"; (b) the extracted tracks

Acknowledgments. This work is supported by National Natural Science Foundation


of China (No. 60804068), Natural Science Foundation of Jiangsu Province
(No.BK2010261), and Cooperation Innovation of Industry, Education and Academy
of Jiangsu Province (No. BY2010126).

References
1. Cheng, H.-Y., Hwang, J.-N.: Adaptive particle sampling and adaptive appearance for
multiple video object tracking. Signal Processing 89(9), 1844–1849 (2009)
2. Vo, B.-N., Singh, S., Doucet, A.: Sequential Monte Carlo methods for multi-target filtering
with random finite sets. IEEE Trans. On Aerospace & Electronic Systems 41(4), 1224–
1245 (2005)
3. Hu, Z., Leung, H., Blanchette, M.: Statistical performance analysis of track initiation
techniques. IEEE Transactions on Signal Processing 45(2), 445–456 (1997)

4. Xu, B., Chen, Q., Wang, Z.: Ants for Track Initiation of Bearings-Only Tracking.
Simulation Modelling Practice and Theory 16(6), 626–638 (2008)
5. Xu, B., Wang, Z.: A Multi-objective-ACO-Based Data Association Method for Bearings-
Only Multi-Target Tracking. Communications in Nonlinear Science and Numerical
Simulation 12(8), 1360–1369 (2007)
6. Albers, J.: Interaction of Color. Revised and Expanded edn. Yale University Press, New
Haven (2006)
7. Bhattacharya, P., Rosenfeld, A., Weiss, I.: Point-to-line mappings as Hough transforms.
Pattern Recognition Letters 23, 1705–1710 (2002)
Visual Tracking of Multiple Targets by
Multi-Bernoulli Filtering of Background
Subtracted Image Data

Reza Hoseinnezhad1 , Ba-Ngu Vo2 , and Truong Nguyen Vu3


1
RMIT University, Victoria, Australia
[email protected]
2
The University of Western Australia, WA, Australia
[email protected]
3
Vietnam Academy of Science and Technology, Ho Chi Minh City, Vietnam

Abstract. Most visual multi-target tracking techniques in the literature


employ a detection routine to map the image data to point measurements
that are usually further processed by a filter. In this paper, we present
a visual tracking technique based on a multi-target filtering algorithm
that operates directly on the image observations and does not require
any detection or training patterns. Instead, we use the recent history of
image data for non-parametric background subtraction and apply an effi-
cient multi-target filtering technique, known as the multi-Bernoulli filter,
on the resulting grey scale image data. In our experiments, we applied
our method to track multiple people in three video sequences from the
CAVIAR dataset. The results show that our method can automatically
track multiple interacting targets and quickly finds targets entering or
leaving the scene.

Keywords: visual tracking, Bayesian estimation, multi-target tracking,


random finite sets, multi-Bernoulli.

1 Introduction
Single-view visual tracking techniques invariably consist of detection followed by
filtering. A detection module generates point measurements from the images in
the video sequence which are then utilised as inputs by a filtering module, which
estimates the number of targets and their states (properties such as location
and size). Detection is an integral part of single-view visual tracking techniques.
There is a large body of literature on models and techniques for detecting tar-
gets based on various background and foreground models. One of the most pop-
ular approaches is the detection of targets based on matching colour histograms
of rectangular blobs [1,2]. Other recent methods include a game-theoretic ap-
proach [3], using human shape models [4,5], multi-modal representations [6],
sample-based detection [7], range segmentation [8] and a multi-step detection
scheme including median filtering, thresholding, binary morphology and con-
nected components analysis [9].


Detection compresses the information on the image into a finite set of points
measurements, and is efficient in terms of memory as well as computational re-
quirements. However, this approach may not be adequate when the information
loss incurred in the detection process becomes significant. Another problem with
using detection is the selection of a suitable measurement model for the filtering
algorithm. Modelling the detection process in a computationally tractable man-
ner is a difficult problem. In practice, the selection of the measurement model is
done on an ad-hoc basis and requires the manual tuning of model parameters.
Using random finite set (RFS) theory, a tractable framework for tracking multiple
targets from video data without detection was recently introduced in [10].
This work led to a novel method for tracking multiple targets in video and has
been successfully demonstrated on tracking sports players [11]. However, this
method requires prior information about the visual appearance of the targets
to be tracked, and is most useful in cases where a visual target model is available
either a priori or from training data. In many applications, such as people
surveillance, there is no prior information about the visual appearance of the
targets and a new algorithm is needed.
This paper presents a novel algorithm that tracks multiple moving targets
directly from the image without any training data. Our proposed algorithm
gradually learns and updates a probabilistic background model based on kernel
density estimation. The resulting background model is then subtracted to
generate a grey scale foreground image from which the multi-target posterior
distribution can be computed analytically using the multi-Bernoulli update of
[10]. A sequential Monte Carlo implementation of the multi-Bernoulli filter is
detailed and demonstrated through case studies involving people tracking in video
sequences.

2 Background

In the context of jointly estimating the number of states and their values, the
collection of states, referred to as the multi-target state, is naturally represented
as a finite set. The rationale behind this representation traces back to a funda-
mental consideration in estimation theory–estimation error, see for example [10].
Since the state and measurement are treated as realisations of random variables
in the Bayesian estimation paradigm, the finite-set-valued (multi-target) state X
is modelled as a random finite set (RFS). Mahler’s Finite Set Statistics (FISST)
provides powerful yet practical mathematical tools for dealing with RFSs [12],
[13], based on a notion of integration and density that is consistent with the well-
established point process theory [14]. FISST has attracted substantial interest
from academia as well as the commercial sector with the development of the
Probability Hypothesis Density (PHD) and Cardinalized PHD filters [12], [14],
[15], [16], [17].
Let us denote the frame image observation by y = [y1 . . . ym ]. Then, using the
FISST notion of integration and density, we can compute the posterior probability
density π(·|y) of the multi-target state from the prior density via Bayes rule:
\pi(X|y) = \frac{g(y|X)\, \pi(X)}{\int g(y|X)\, \pi(X)\, \delta X}    (1)

where g(y|X) is the probability density (likelihood) of the observation y given the
multi-target state X, and the integral over the space of finite sets is defined as

\int f(X)\, \delta X \triangleq \sum_{i=0}^{\infty} \frac{1}{i!} \int f(\{x_1, \ldots, x_i\})\, dx_1 \cdots dx_i.    (2)

In this paper, the finite set of targets, X, is modelled by a multi-Bernoulli
RFS, defined as the union of M independent RFSs X^{(i)}, where M is the
maximum number of targets:

X = \bigcup_{i=1}^{M} X^{(i)}.

In this representation, each X^{(i)} is either empty or a singleton, with probabilities
1 − r^{(i)} and r^{(i)}, respectively. In the case where X^{(i)} is a singleton, its only
element is distributed according to a probability density p^{(i)}(\cdot). Thus, a complete
representation of the multi-target state is given by \{(r^{(i)}, p^{(i)})\}_{i=1}^{M}.
The Bayes update (1) is computationally intractable in general. Fortunately,
it was shown in [10] that if the likelihood function has the following separable
form:

g(y|X) = f(y) \prod_{x \in X} g(x, y)    (3)

and the multi-target RFS has a multi-Bernoulli prior distribution \{(r^{(i)}, p^{(i)})\}_{i=1}^{M},
then the posterior distribution of X, given by Bayes rule (1), is also multi-
Bernoulli with the parameters \{(r^{(i)}_{updated}, p^{(i)}_{updated})\}_{i=1}^{M}, where

r^{(i)}_{updated} = \frac{ r^{(i)} \langle p^{(i)}(\cdot), g(\cdot, y) \rangle }{ 1 − r^{(i)} + r^{(i)} \langle p^{(i)}(\cdot), g(\cdot, y) \rangle }    (4)

p^{(i)}_{updated}(\cdot) = \frac{ p^{(i)}(\cdot)\, g(\cdot, y) }{ \langle p^{(i)}(\cdot), g(\cdot, y) \rangle }    (5)

and \langle f_1, f_2 \rangle denotes the standard inner product \int f_1(x) f_2(x)\, dx.
In the next section, we show that the likelihood function for the image after
background subtraction satisfies the above separable form. Using the update re-
sults (4) and (5), we detail a recursive filtering scheme that takes the background
subtracted images as input to directly track multiple targets.
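For concreteness, the following Python sketch applies the update of Eqs. (4)-(5) to a particle representation of each Bernoulli component; it is our own illustration (the component data structure and the per-target factor callable are assumptions), not the paper's implementation.

import numpy as np

def multi_bernoulli_update(components, g_of_x, y):
    """Multi-Bernoulli Bayes update of Eqs. (4)-(5), a particle-based sketch.

    components : list of (r, particles, weights) triples, one per Bernoulli component
    g_of_x     : function (x, y) -> g(x, y), the per-target factor of Eq. (3)
    y          : the (background-subtracted) image observation
    """
    updated = []
    for r, particles, weights in components:
        g = np.array([g_of_x(x, y) for x in particles])
        inner = float(np.sum(weights * g))            # <p^(i)(.), g(., y)>
        r_new = r * inner / (1.0 - r + r * inner)     # Eq. (4)
        w_new = weights * g
        w_new /= w_new.sum()                          # Eq. (5) as particle re-weighting
        updated.append((r_new, particles, w_new))
    return updated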

3 Visual Likelihood
Using background subtraction, each frame image is transformed into a grey scale
image in which each pixel value is the probability density of the pixel belonging to
the background. The background subtraction method used in this work is based on kernel
density estimation, which has been quite popular in visual tracking [18,19,20]. The
resulting grey scale image is then used as input to the multi-target filter. For
simplicity of notation, we will use the symbols y and y_i for the background-subtracted
grey scale image and its pixel values (which are indeed the probability densities of
the corresponding pixels of the colour image belonging to the background). We also
assume that the y_i values are normalised to the interval [0, 1].

3.1 Background Subtraction


It is assumed that the pixel i in the k-th colour image frame of the video has an
RGB colour denoted by [Ri (k) Gi (k) Bi (k)] . We first convert the RGB colour
to chromaticity (rgI) colours by:

ri (k) = Ri (k)/ (Ri (k) + Gi (k) + Bi (k)) (6)

gi (k) = Gi (k)/ (Ri (k) + Gi (k) + Bi (k)) (7)

Ii (k) = (Ri (k) + Gi (k) + Bi (k)) /256 (8)

where the denominator 256 applies to 8-bit colour quantisation. It is observed


in [19] that chromaticity colour is more robust to ambient light variations and
shadows. Note that the above colour components all vary within the interval
[0, 1].
To compute the kernel density estimate of the probability that the i-th pixel belongs to the background, we keep a stack of N0 image frames (each in the form of a 3-D array including all rgI colours of the pixels) and update the contents of the stack regularly, every K0 frames. The interpretation of the parameter K0 can be explained via an example: if the frame rate of the video is 25 and we are looking for moving targets that are not stationary for more than 5 seconds, we can choose K0 = 5 × 25 = 125.
The stack of images will initially contain all pixel values recorded at the sampling times 0, K0, 2K0, . . ., (N0 − 1)K0. This stack will then be updated (first at time k = N0 K0 and then every K0 frames) by removing the first image from the bottom of the stack and appending the most recently recorded image (e.g. at the sampling time N0 K0). More precisely, at time k (for k ≥ N0 K0), the stack will contain the rgI values of all pixels at the times K0⌊k/K0⌋, K0(⌊k/K0⌋ − 1), . . ., K0(⌊k/K0⌋ − N0 + 1). The kernel density estimate of the likelihood of the event that the i-th pixel belongs to the background is then given by:
p_i(k) = (1/N0) Σ_{ℓ=0}^{N0−1} ∏_{d=r,g,I} N( d_i(k); d_i(K0(⌊k/K0⌋ − ℓ)), σ_d² )    (9)

where N(x; x0, σ) ≜ (1/(√(2π) σ)) exp( −(x − x0)² / (2σ²) ) and σ_r, σ_g and σ_I are the bandwidths of the Gaussian kernels for the rgI colours and are user-defined parameters chosen
between 0 and 1. To normalise the p_i(k) values to vary within [0, 1], the density normalisation factors 1/(√((2π)³) σ_r σ_g σ_I) are removed, which results in the following normalised y_i values:

y_i(k) = (1/N0) Σ_{ℓ=0}^{N0−1} ∏_{d=r,g,I} exp( −[d_i(k) − d_i(K0(⌊k/K0⌋ − ℓ))]² / (2σ_d²) ).    (10)
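For illustration only, the following hypothetical NumPy sketch evaluates (10); it assumes the stack is held as an array of shape (N0, H, W, 3) of rgI values sampled every K0 frames, and the bandwidth values are made-up defaults rather than parameters from the paper:

import numpy as np

def background_probability(rgI, stack, sigmas=(0.05, 0.05, 0.1)):
    """Normalised background likelihood y_i(k) of equation (10).

    rgI    : (H, W, 3) current frame in chromaticity (r, g, I) coordinates.
    stack  : (N0, H, W, 3) stack of past frames sampled every K0 frames.
    sigmas : kernel bandwidths (sigma_r, sigma_g, sigma_I), each chosen in (0, 1].
    Returns an (H, W) image with values in [0, 1]; values near 1 indicate background.
    """
    sig = np.asarray(sigmas).reshape(1, 1, 1, 3)
    diff = rgI[None] - stack                       # (N0, H, W, 3) differences d_i(k) - d_i(.)
    kernels = np.exp(-(diff ** 2) / (2.0 * sig ** 2))
    per_frame = np.prod(kernels, axis=-1)          # product over d = r, g, I
    return per_frame.mean(axis=0)                  # average over the N0 stacked frames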

3.2 Likelihood Model


Having obtained a grey scale image with pixel values y_i, we need to compute its likelihood for a given multi-target state set X = {x_j | j = 1, . . . , n}. Each target region is denoted by T(x_j), within which the average (or a weighted average) of all pixel values can be computed:

ȳ_j = ( Σ_{i∈T(x_j)} y_i ) / m_j    (11)

where m_j = |T(x_j)| is the number of pixels within the region T(x_j) defined by the state x_j. The likelihood of the region T(x_j) to include a target is expressed as a function of ȳ_j denoted by g_F(ȳ_j). This function should be a strictly decreasing function in [0,1]. An appropriate choice of such a function is g_F(ȳ_j) = ζ_F exp(−ȳ_j/δ_F), where δ_F is a control parameter to tune the sensitivity to large average pixel values, and ζ_F is a normalising constant – see Fig. 1(a). Based on independence assumptions, the likelihood of all elements of the state set X to include target regions in the background-subtracted image is given by ∏_{j=1}^{n} g_F(ȳ_j).
The rest of the pixels in the image, which do not belong to any of the regions T(x_j) (j = 1, . . . , n), are highly likely to belong to the background. This is an important condition, as otherwise there might be more than n targets in the scene, which would violate the premise that there are n targets. Let us denote the rest of the image by:

y_{−X} ≜ S(y) − ⋃_{j=1}^{n} { y_i | i ∈ T(x_j) }    (12)

where S(y) denotes the result of mapping the matrix y to a set containing all the pixel values. We also construct a new image by filling up all the target regions with background pixels (all y_i values equal to 1), and denote the set of its pixel values by S(y; X), which can also be expressed as below:

S(y; X) = [ ⋃_{j=1}^{n} {1, · · · , 1} (m_j times) ] ∪ y_{−X}.    (13)

The average (or weighted average) of the pixels belonging to S(y; X) is given by:

ȳ_B = (1/m) ( Σ_{i=1}^{m} y_i + Σ_{j=1}^{n} Σ_{i∈T(x_j)} (1 − y_i) ).    (14)

Fig. 1. (a) Foreground likelihood model (b) Background likelihood model

This average is within [0,1] and expected to be very close to 1. Indeed, if there are
any targets existing in the image but not included in the hypothesised state X,
the low values of the pixels belonging to that target region will decrease yB . If the
average target size is small relative to the whole image, this decreasing effect can
be small. Therefore, the likelihood of y B to represent background region should
be large only for y B values that are very close to 1. It is important to note that
scattered noise (e.g. salt and pepper noise) in the background-subtracted image
may cause reduction in yB , similar to the effect of small size targets. To prevent
this, we remove such tiny noise and other areas of the image containing small-valued pixels by morphologically closing the image (erosion followed by dilation of the image using a small structural element).
We denote the likelihood of y B to represent an all background region by
gB (y B ) which is expected to be an increasing function of y B in [0,1]. We choose
the exponential function gB (y B ) = ζB exp(y B /δB ) where δB is a control param-
eter to tune the sensitivity to deviations of the average pixel value from 1, and
ζB is a normalising constant – see Fig. 1(b). As we will see later, the exponen-
tial form is a necessity here to provide a separable form for the total likelihood.
Replacing ȳ_B from equation (14), we derive:

g_B(ȳ_B) = ζ_B exp( [ Σ_{i=1}^{m} y_i + Σ_{j=1}^{n} Σ_{i∈T(x_j)} (1 − y_i) ] / (m δ_B) )    (15)
         = ζ_B exp( Σ_{i=1}^{m} y_i / (m δ_B) ) exp( Σ_{j=1}^{n} Σ_{i∈T(x_j)} (1 − y_i) / (m δ_B) )    (16)
         = ζ_B exp( Σ_{i=1}^{m} y_i / (m δ_B) ) ∏_{j=1}^{n} exp( [ m_j − Σ_{i∈T(x_j)} y_i ] / (m δ_B) )    (17)
         = ζ_B exp( Σ_{i=1}^{m} y_i / (m δ_B) ) ∏_{j=1}^{n} exp( m_j (1 − ȳ_j) / (m δ_B) ).    (18)

Finally, the total likelihood of the image y for the given set of states X is given by:

g(y|X) = g_B(ȳ_B) ∏_{j=1}^{n} g_F(ȳ_j).    (19)
By substituting the foreground and background likelihood functions, we derive the following separable form:

g(y|X) = ζ_B exp( Σ_{i=1}^{m} y_i / (m δ_B) ) ∏_{j=1}^{n} [ exp( m_j (1 − ȳ_j) / (m δ_B) ) g_F(ȳ_j) ]    (20)

where the first factor is the f(y) of (3) and each bracketed factor is the corresponding single-target factor g_y(x_j).
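To make the factorisation concrete, here is a hypothetical NumPy sketch (not the authors' code) of the per-target factor g_y(x_j) = exp(m_j(1 − ȳ_j)/(m δ_B)) g_F(ȳ_j) for a rectangular blob state; the common factor f(y) does not appear in the update equations (4)-(5) and is therefore omitted, and the convention that (x, y) is the top-left corner of the blob is an assumption of the sketch:

import numpy as np

def g_y(image, state, delta_B, delta_F, zeta_F=1.0):
    """Per-target likelihood factor g_y(x_j) of equation (20).

    image : (H, W) background-subtracted image with values in [0, 1].
    state : (x, y, w, h) rectangular blob; (x, y) assumed to be the top-left corner.
    """
    x, y, w, h = (int(round(v)) for v in state)
    region = image[y:y + h, x:x + w]
    m_j = region.size                            # number of pixels in T(x_j)
    m = image.size                               # total number of pixels in the image
    y_bar = region.mean() if m_j > 0 else 1.0    # average pixel value over the region
    g_F = zeta_F * np.exp(-y_bar / delta_F)      # foreground likelihood, Fig. 1(a)
    return np.exp(m_j * (1.0 - y_bar) / (m * delta_B)) * g_F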
4 Monte Carlo Implementation


Our implementation is based on the method presented in [10], adapted to the likelihood function defined in (20) for multi-target visual tracking. Suppose that at time k − 1, the posterior density {(r_{k−1}^(i), p_{k−1}^(i))}_{i=1}^{M_{k−1}} is given and each p_{k−1}^(i) is represented by a set of weighted samples (particles) {(w_{k−1}^(i,j), x_{k−1}^(i,j))}_{j=1}^{L_{k−1}^(i)}. More precisely,

p_{k−1}^(i)(x) = Σ_{j=1}^{L_{k−1}^(i)} w_{k−1}^(i,j) δ_{x_{k−1}^(i,j)}(x).    (21)
We assume a constant survival probability P_S, and consider a predefined model for birth particles denoted by known parameters {(r_Γ^(i), p_{Γ,k}^(i))}_{i=1}^{M_Γ}, where the density p_{Γ,k}^(i) is represented by the particles {(w_{Γ,k}^(i,j), x_{Γ,k}^(i,j))}_{j=1}^{L_Γ}. In our experiments, we assume that with a constant probability of 0.02 one target appears in each of the four quarters of the image plane, with the location of the target being uniformly distributed within the quarter. Thus, M_Γ = 4, r_Γ^(1) = · · · = r_Γ^(4) = 0.02, and the birth particles are sampled with uniform distribution and weights.
Similar to many other particle filtering schemes, in each iteration the particles are first predicted and then updated. In the prediction step, the birth particles are generated according to the birth model parameters. The multi-Bernoulli parameters from the previous iteration, {(r_{k−1}^(i), w_{k−1}^(i,j), x_{k−1}^(i,j))}, are propagated forward:

x_{k|k−1}^(i,j) ∼ f_{k|k−1}(· | x_{k−1}^(i,j));   r_{k|k−1}^(i) = P_S r_{k−1}^(i);   w_{k|k−1}^(i,j) = w_{k−1}^(i,j).    (22)

The proposal density equals the state transition density f_{k|k−1}(· | x_{k−1}). In our experiments, the targets are modelled by rectangular blobs and the target state is a 4-tuple vector comprising the x and y location, the width and the height. The target dynamics are modelled by x(k+1) = x(k) + e(k), where e(k) is a 4-dimensional Gaussian variable with zero mean and covariance Σ = diag(σ_x², σ_y², σ_h², σ_w²). Thus, f_{k|k−1}(x | x_{k−1}) = N(x; x_{k−1}, Σ).
In the update step, the predicted multi-Bernoulli parameters are updated using the likelihood function (20) and the update formulas (4) and (5), which translate to:

r_k^(i) = r_{k|k−1}^(i) ϱ_k^(i) / ( 1 − r_{k|k−1}^(i) + r_{k|k−1}^(i) ϱ_k^(i) )    (23)

w_k^(i,j) = w_{k|k−1}^(i,j) g_{y_k}(x_{k|k−1}^(i,j)) / ϱ_k^(i)    (24)

where ϱ_k^(i) = Σ_{j=1}^{L_{k|k−1}^(i)} w_{k|k−1}^(i,j) g_{y_k}(x_{k|k−1}^(i,j)) [10].
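The prediction and update steps can be summarised in the following hypothetical NumPy sketch (a simplified illustration under the stated models, not the authors' implementation); it reuses the BernoulliComponent structure and the g_y factor sketched earlier and omits birth, resampling, pruning and merging:

import numpy as np

def predict_update(components, image, Sigma, P_S, delta_B, delta_F):
    """One multi-Bernoulli prediction/update cycle, equations (22)-(24)."""
    updated = []
    for c in components:
        # Prediction (22): random-walk transition, survival-scaled existence.
        noise = np.random.multivariate_normal(np.zeros(4), Sigma, len(c.particles))
        particles = c.particles + noise
        r_pred = P_S * c.r
        # Update (23)-(24) with the separable likelihood factor g_y of (20).
        g = np.array([g_y(image, x, delta_B, delta_F) for x in particles])
        rho = np.sum(c.weights * g) + 1e-300      # guard against numerical underflow
        r_new = r_pred * rho / (1.0 - r_pred + r_pred * rho)
        w_new = c.weights * g / rho
        updated.append(BernoulliComponent(r_new, particles, w_new))
    return updated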
Similar to the MeMBer filter [21], the updated particles are resampled, with the number of particles reallocated in proportion to the probability of existence and restricted between a minimum Lmin and a maximum Lmax. To reduce the growing number of multi-Bernoulli parameters, those with probabilities of existence less than a small threshold (set at 0.01) are removed. In addition, targets with substantial overlap are merged. Finally, the number of targets and their states are estimated by finding the multi-Bernoulli parameters with existence probabilities larger than a threshold (set at 0.5 in our experiments). Each target state estimate is then given by the weighted average of the particles of the corresponding density.

5 Tracking Experiments
We demonstrate our method for tracking moving people in three video sequences
from the CAVIAR dataset1 which is a benchmark for visual tracking experi-
ments. The tracking results are available to download and view from our home
page.2 The first video shows two persons each entering the lobby of a lab in
INRIA and leaving the environment. The second video shows people walking in
a shopping centre and occasionally visiting a shop that is in the front view of
the camera. The third video shows four people entering the same place as in the
first video, walking together and leaving the lobby. Except for a small number
of frames, the four people are relatively accurately detected and tracked at all
times. In this video, we also show the background subtracted (grey scale) images
to give an indication of how our tracking method uses the results of background
subtraction.
Figure 2 shows snapshots of the third video. It demonstrates that, in general, our method can accurately track multiple targets in the video. The tracking results in the frames shown in Fig. 2 also demonstrate the ability of our tracking technique to detect the arrival of new targets into the scene, to track them while they move and interact with other targets, and to detect their departure from the scene.

6 Conclusions
A novel algorithm for tracking multiple targets directly from image observa-
tions has been presented. Using kernel density estimation, the proposed algo-
rithm gradually learns and updates a probabilistic background model which is
then used to generate a grey scale foreground image. A separable likelihood
function has been derived for the grey scale foreground image, which enabled
an efficient multi-target filtering technique called multi-Bernoulli filtering to be
applied. The method has been evaluated in three tracking scenarios from the
CAVIAR datasets, showing that multiple persons can be tracked accurately.
1
http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/
2
Video 1: www.dlsweb.rmit.edu.au/eng1/Mechatronics/Case01.mpg
Video 2: www.dlsweb.rmit.edu.au/eng1/Mechatronics/Case02.mpg
Video 3: www.dlsweb.rmit.edu.au/eng1/Mechatronics/Case03.mpg
[Fig. 2 panels: frames 83, 152, 241, 327 and 338 (of 491), each shown alongside its background-subtraction image.]

Fig. 2. Tracking of up to four people in a video sequence from CAVIAR dataset. The
selected frames show that the method is capable of detecting and tracking multiple
moving objects as they enter the scene, interact and leave the scene.

Acknowledgement

This work was supported by ARC Discovery Project grant DP0880553. The authors thank Dr Branko Ristic from DSTO, Australia, for his contribution.

References
1. Okuma, K., Taleghani, A., De Freitas, N., Little, J., Lowe, D.: A boosted particle
filter: Multitarget detection and tracking. In: Pajdla, T., Matas, J(G.) (eds.) ECCV
2004. LNCS, vol. 3021, pp. 28–39. Springer, Heidelberg (2004)
2. Kristan, M., Per, J., Pere, M., Kovacic, S.: Closed-world tracking of multiple in-
teracting targets for indoor-sports applications. Computer Vision and Image Un-
derstanding 113(5), 598–611 (2009)
3. Yang, M., Yu, T., Wu, Y.: Game-theoretic multiple target tracking. In: ICCV 2007,
Rio de Janeiro, Brazil (2007), http://dx.doi.org/10.1109/ICCV.2007.4408942
4. Wu, B., Nevatia, R.: Detection and tracking of multiple, partially occluded humans
by Bayesian combination of edgelet based part detectors. IJCV 75(2), 247–266
(2007)
5. Zhao, T., Nevatia, R., Wu, B.: Segmentation and tracking of multiple humans in
crowded environments. PAMI 30(7), 1198–1211 (2008)
6. Apewokin, S., Valentine, B., Bales, R., Wills, L., Wills, S.: Tracking multiple pedes-
trians in real-time using kinematics. In: CVPR 2008 Workshops, Anchorage, AK,
United States (2008), http://dx.doi.org/10.1109/CVPRW.2008.4563149
7. Zhu, L., Zhou, J., Song, J.: Tracking multiple objects through occlusion with online
sampling and position estimation. Pattern Recognition 41(8), 2447–2460 (2008)
8. Parvizi, E., Wu, Q.J.: Multiple object tracking based on adaptive depth segmen-
tation. In: Canadian Conference on Computer and Robot Vision – CRV 2008,
Windsor, ON, Canada, pp. 273–277 (2008)
9. Abbott, R., Williams, L.: Multiple target tracking with lazy background subtrac-
tion and connected components analysis. Machine Vision and Applications 20(2),
93–101 (2009)
10. Vo, B.N., Vo, B.T., Pham, N.T., Suter, D.: Bayesian multi-object estimation from
image observations. In: Fusion 2009, Seattle, Washington, pp. 890–898 (2009)
11. Hoseinnezhad, R., Vo, B.N., Suter, D., Vo, B.T.: Multi-object filtering from image
sequence without detection. In: ICASSP, Dallas, TX, pp. 1154–1157 (2010)
12. Mahler, R.: Multi-target bayes filtering via first-order multi-target moments. IEEE
Trans. Aerospace & Electronic Systems 39(4), 1152–1178 (2003)
13. Mahler, R.: Statistical multisource-multitarget information fusion. Artech House,
Boston (2007)
14. Vo, B.N., Singh, S., Doucet, A.: Sequential Monte Carlo methods for multi-target
filtering with random finite sets. IEEE Tran. AES 41(4), 1224–1245 (2005)
15. Vo, B.N., Ma, W.K.: The Gaussian mixture probability hypothesis density filter.
IEEE Trans. Signal Proc. 54(11), 4091–4104 (2006)
16. Mahler, R.: PHD filters of higher order in target number. IEEE Trans. Aerospace
& Electronic Systems 43(4), 1523–1543 (2007)
17. Vo, B.T., Vo, B.N., Cantoni, A.: Analytic implementations of the Cardinalized
Probability Hypothesis Density filter. IEEE Trans. Signal Processing 55(7), 3553–
3567 (2007)
18. Tyagi, A., Keck, M., Davis, J.W., Potamianos, G.: Kernel-based 3D tracking. In:
CVPR 2007, Minneapolis, Minnesota, USA (2007)
19. Elgammal, A., Duraiswami, R., Harwood, D., Davis, L.S.: Background and fore-
ground modeling using nonparametric kernel density estimation for visual surveil-
lance. Proceedings of the IEEE 90(7), 1151–1162 (2002)
20. Han, B., Comaniciu, D., Zhu, Y., Davis, L.S.: Sequential kernel density approx-
imation and its application to real-time visual tracking. PAMI 30(7), 1186–1197
(2008)
21. Vo, B.T., Vo, B.N., Cantoni, A.: The cardinality balanced multi-target multi-
Bernoulli filter and its implementations. IEEE Transactions on Signal Process-
ing 57(2), 409–423 (2009)
Mobile Robotics in a Random Finite Set
Framework

John Mullane 1, Ba-Ngu Vo 2, Martin Adams 3, and Ba-Tuong Vo 2

1 Nanyang Technological University, Singapore
  [email protected]
2 University of Western Australia, Perth, Australia
  {ba-ngu.vo,ba-tuong.vo}@uwa.edu.sg
3 University of Chile, Santiago, Chile
  [email protected]

Abstract. This paper describes the Random Finite Set approach to


Bayesian mobile robotics, which is based on a natural multi-object fil-
tering framework, making it well suited to both single and swarm-based
mobile robotic applications. By modeling the measurements and feature map as random finite sets (RFSs), joint estimates of the number and location of the objects (features) in the map can be generated. In addition,
it is shown how the path of each robot can be estimated if required. The
framework differs dramatically from existing approaches since both data
association and feature management routines are integrated into a single
recursion. This makes the framework well suited to multi-robot scenarios
due to the ease of fusing multiple map estimates from swarm members, as
well as mapping robustness in the presence of other mobile robots which
may induce false map measurements. An overview of developments thus
far is presented, with implementations demonstrating the merits of the
framework on simulated and experimental datasets.

Keywords: mobile robotics, Bayesian estimation, random finite sets,


Probability Hypothesis Density.

1 Introduction

Mobile robotics is becoming an increasingly popular field of research, especially


as embedded computing technology matures and gets ever more sophisticated.
The goal of a mobile robot (or a swarm of mobile robots) is typically to travel
through an unknown environment autonomously, while being continuously aware
of both its surroundings and its position in relation to those surroundings. To
measure the area of operation, robots are equipped with a suite of sensors such as lasers, radar, sonar or cameras, which are inherently prone to sensing and data association errors. In addition, the control inputs applied to a robot's actuators introduce further uncertainty into the robot's position due to noise and kinematic modeling errors of the robot. Due to these multiple sources of uncertainty,
Bayesian approaches have become widely popular [1], [2], [3], [4], which adopt


probabilistic robot and sensor models and attempt to extract optimal estimates of the map and the robot locations.
By far the most common approach to the problem is to use a random vector
framework, in which the map and robot paths are modeled as random vec-
tors containing positional information about the features and robot locations
respectively [1]. While this model is the basis for the majority of existing mobile
robotics algorithms, it requires independent data association and map manage-
ment routines to respectively assign measurements to features and to estimate
the number of features in the map [5], [6]. Recently, a new framework has been
developed using Random Finite Set (RFS) models [7], [8], [9], which alleviates
the need for independent routines and unifies the stochastic mobile robotics
framework into a single Bayesian recursion. This new approach admits numerous
benefits such as removal of data association, increased robustness to measure-
ment error, integration of map management, straightforward fusion of multiple
robot map estimates, expected map estimation and can be readily applied to
single or multiple robot scenarios.
This paper advocates a fully integrated Bayesian framework for mobile robotics under DA uncertainty and unknown feature number. The key to this formulation is the representation of the map as a finite set of features. From an estimation viewpoint, it is argued below that the map is indeed a finite set and not a vector. Using Random Finite Set (RFS) theory, mobile robotics is
then posed as a Bayesian filtering problem in which the posterior distribution of
the set-valued map is propagated forward in time as measurements arrive. In the
case of an unknown robot path, the joint density including the robot trajectory
can be propagated. A tractable solution which propagates the first order mo-
ment of the map, its Probability Hypothesis Density (PHD), is presented. The
PHD construct can also be interpreted in terms of occupancy maps [9], [10]. In
this paper, both the map estimation from a known robot path and joint map/
trajectory estimation from an unknown robot path are examined separately. In
particular, mapping robustness to multiple robots, which may interfere with the
map building process, is demonstrated.

2 Background

Map estimation is closely related to the multi-target filtering problem, where the
aim is to jointly estimate the time-varying number of targets (features) and their
states from sensor measurements in the presence of data association uncertainty,
detection uncertainty, clutter and noise. The first systematic treatment of this
problem using random set theory was conceived by Mahler in 1994 [11], which
later developed into Finite Set Statistics and the Probability Hypothesis Density
(PHD) filter in 2003 [12]. A detailed treatment can be found in [13]. The mobile
robotics problem was first formulated in an RFS framework in [7], with mapping
and localisation algorithms presented in [8]. The approach modeled the joint ve-
hicle trajectory and map as a single RFS, and recursively propagates its first
order moment. Stemming from the popular FastSLAM algorithm [6], a factored
approach to RFS-SLAM was proposed in [14]; however, vector-based approximations of RFS integrals were used, which invalidate the approach. A theoretically correct factored approach was presented in [15], [9], which propagates the posterior PHDs of multiple trajectory-conditioned maps and the posterior distribution of the vehicle trajectory, with implementations on marine-based radar data also appearing in [16].

3 The Robotics RFS Filtering Framework


This section details both the RFS map estimation filter and the RFS SLAM
filter.

3.1 The Robotic Mapping Problem


As was first proposed in [7] and [8], let M be the RFS representing the entire
unknown environment comprising static map features and multiple mobile robots
and let Mk−1 be the RFS representing the subset of the map that has passed
through the field-of-view (FoV) of the on-board sensor with trajectory X0:k−1 =
[X0 , X1 , . . . , Xk−1 ] at time k − 1, i.e.

Mk−1 = M ∩ F oV (X0:k−1 ). (1)

Given this representation, M_{k−1} evolves in time according to

M_k = M_{k−1} ∪ ( FoV(X_k) ∩ M̄_{k−1} )    (2)

where M̄_{k−1} = M − M_{k−1}, i.e. the set of features that are not in M_{k−1}. If f_{k|k−1}(M_k | M_{k−1}, X_{k−1}) represents the RFS feature map state transition density, the generalised Bayesian RFS robotic mapping recursion can be written [17]

p_{k|k−1}(M_k | Z_{0:k−1}, X_{0:k}) = ∫ f_{k|k−1}(M_k | M_{k−1}, X_k) p_{k−1}(M_{k−1} | Z_{0:k−1}, X_{k−1}) δM_{k−1}    (3)

p_k(M_k | Z_{0:k}, X_{0:k}) = g_k(Z_k | X_k, M_k) p_{k|k−1}(M_k | Z_{0:k−1}, X_{0:k}) / ∫ g_k(Z_k | X_k, M_k) p_{k|k−1}(M_k | Z_{0:k−1}, X_{0:k}) δM_k    (4)

where g_k(Z_k | ·) denotes the likelihood of the RFS measurement and δ denotes a set integral. Integration over the map requires integration over all possible feature maps (all possible locations and numbers of features).

3.2 The Joint Robotic Mapping and Localisation Problem


To jointly estimate the map and the robot trajectory, denoted by the random
vector X1:k , the posterior density of (4) can be modified to get,
p_k(M_k, X_{1:k} | Z_{0:k}, U_{0:k−1}, X_0) = p_k(X_{1:k} | Z_{0:k}, U_{0:k−1}, X_0) p_k(M_k | Z_{0:k}, X_{0:k})    (5)

where U_{0:k−1} denotes the random vector of robot control inputs. Note that the second term is exactly equivalent to the posterior of (4). The first term can be calculated via

p_k(X_{1:k} | Z_{0:k}, U_{0:k−1}, X_0) = g_k(Z_k | M_k, X_k) [ p_{k|k−1}(M_k | Z_{0:k−1}, X_{0:k}) / p_k(M_k | Z_{0:k}, X_{0:k}) ] × [ p_{k|k−1}(X_{1:k} | Z_{0:k−1}, U_{0:k−1}, X_0) / g_k(Z_k | Z_{0:k−1}) ]    (6)

Further details can be seen in [9], [10], [15], [16].

3.3 First Order Moment Approximation


As with the classical approaches, the previous Bayesian recursions are numer-
ically intractable and sensible approximations are required. In this work, the
predicted and posterior RFS maps of (3) and (4) are approximated by Pois-
son RFSs with PHDs vk|k−1 (m|Z0:k−1 , X0:k ) and vk (m|Z0:k , X0:k ). In essence,
this approximation assumes that features are iid and the number of features is
Poisson distributed, i.e.,

p_{k|k−1}(M_k | Z_{0:k−1}, X_{0:k}) ≈ ∏_{m∈M_k} v_{k|k−1}(m | X_{0:k}) / exp( ∫ v_{k|k−1}(m | X_{0:k}) dm )    (7)

p_k(M_k | Z_{0:k}, X_{0:k}) ≈ ∏_{m∈M_k} v_k(m | X_{0:k}) / exp( ∫ v_k(m | X_{0:k}) dm ).    (8)
This PHD approximation has been proven to be effective in multi-target tracking
[13]. As shown in [12], [17], the recursion of (3) and (4) can equally be reduced
to a predictor corrector form. The PHD predictor equation is,

vk|k−1 (m|X0:k ) = vk−1 (m|X0:k−1 ) + b(m|Xk ) (9)

where b(m|Xk ) is the PHD of the new feature RFS, B(Xk ). The PHD corrector
equation is then,

v_k(m | X_{0:k}) = v_{k|k−1}(m | X_{0:k}) [ 1 − P_D(m | X_k) + Σ_{z∈Z_k} Λ(m | X_k) / ( c_k(z | X_k) + ∫ Λ(ζ | X_k) v_{k|k−1}(ζ | X_{0:k}) dζ ) ]    (10)

where P_D(m | X_k) is the probability of detecting a feature at m, c_k(z | X_k) is the PHD of the clutter RFS and Λ(m | X_k) = P_D(m | X_k) g_k(z | m, X_k). The PHD
recursion is far more numerically tractable than propagating the RFS map den-
sities of (4). In addition, the recursion can be readily extended to incorporate
Fig. 1. An example of a map PHD superimposed on the true map represented by black
dots. The peaks of the PHD represent locations with highest concentration of expected
number of features. The PHD on the left is at time k − 1, and that on the right is at
time k.

multiple sensors / swarms of robots by sequentially updating the map PHD


with the measurement from each robot. In addition, using the same Poisson
approximations, (6) can be readily evaluated. A graphical depiction of a PHD
approximated by a Gaussian Mixture before and after an update is shown in
figure 1.
The PHD construct is also beneficial to the multi-robot submapping problem [18], where the global map may be recovered by simply adding (and averaging the overlap of) the PHD sub-maps.
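A hypothetical minimal sketch of such sub-map fusion, assuming each sub-map is held as a Gaussian mixture (eta, mu, P) and that an optional per-component scale is supplied to average overlapping regions, could look as follows (illustrative only, not the authors' code):

import numpy as np

def fuse_submaps(submaps, overlap_scale=None):
    """Fuse Gaussian-mixture sub-map PHDs by concatenating their components.

    submaps       : list of (eta, mu, P) triples, one per robot.
    overlap_scale : optional per-component factors (e.g. 1/n for regions seen by
                    n robots) so overlapping intensity is averaged, not double-counted.
    """
    etas = np.concatenate([s[0] for s in submaps])
    mus = np.concatenate([s[1] for s in submaps])
    Ps = np.concatenate([s[2] for s in submaps])
    if overlap_scale is not None:
        etas = etas * np.asarray(overlap_scale)
    return etas, mus, Ps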

4 Filter Implementations
This section outlines a Gaussian Mixture implementation of the proposed filters.
For the Robotic Mapping filter of Section 3.1, let the map PHD at time k − 1 be

v_{k−1}(m | X_{k−1}) = Σ_{j=1}^{J_{k−1}} η_{k−1}^(j) N( m; μ_{k−1}^(j), P_{k−1}^(j) )    (11)

which is a mixture of J_{k−1} Gaussians, with η_{k−1}^(j), μ_{k−1}^(j) and P_{k−1}^(j) being the corresponding weights, means and covariances for the j-th Gaussian component of the map PHD. Let the new feature intensity, b(m | Z_{k−1}, X_k), also be a Gaussian mixture of the form

b(m | Z_{k−1}, X_k) = Σ_{j=1}^{J_{b,k}} η_{b,k}^(j) N( m; μ_{b,k}^(j), P_{b,k}^(j) )    (12)

where J_{b,k} defines the number of Gaussians in the new feature intensity at time k and η_{b,k}^(j), μ_{b,k}^(j) and P_{b,k}^(j) are the corresponding components. The predicted intensity is therefore also a Gaussian mixture,

v_{k|k−1}(m | X_k) = Σ_{j=1}^{J_{k|k−1}} η_{k|k−1}^(j) N( m; μ_{k|k−1}^(j), P_{k|k−1}^(j) )    (13)
which consists of J_{k|k−1} = J_{k−1} + J_{b,k} Gaussians representing the union of the prior map intensity, v_{k−1}(m | X_{k−1}), and the proposed new feature intensity, according to (9). Since the measurement likelihood is also of Gaussian form, it follows from (10) that the posterior map PHD, v_k(m | X_k), is then also a Gaussian mixture given by

v_k(m | X_k) = v_{k|k−1}(m | X_k) [ 1 − P_D(m | X_k) ] + Σ_{z∈Z_k} Σ_{j=1}^{J_{k|k−1}} v_{G,k}^(j)(z, m | X_k).

The components of the above equation are given by

v_{G,k}^(j)(z, m | X_k) = η_k^(j)(z | X_k) N( m; μ_{k|k}^(j), P_{k|k}^(j) )    (14)

η_k^(j)(z | X_k) = P_D(m | X_k) η_{k|k−1}^(j) q^(j)(z, X_k) / ( c(z) + Σ_{ℓ=1}^{J_{k|k−1}} P_D(m | X_k) η_{k|k−1}^(ℓ) q^(ℓ)(z, X_k) )    (15)

where q^(j)(z, X_k) = N( z; H_k μ_{k|k−1}^(j), S_k ). The terms μ_{k|k}^(j), P_{k|k}^(j) and S_k can be obtained using any standard filtering technique such as the EKF or UKF. In this paper, the EKF updates are adopted. The clutter RFS, C_k, is assumed Poisson distributed [2] in number and uniformly spaced over the mapping region, and Gaussian management methods are carried out as in [19].
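For illustration, a hypothetical NumPy/SciPy sketch of this Gaussian mixture corrector is given below; it assumes a linear measurement model z = H m + noise with constant detection probability p_D and constant clutter intensity c(z), whereas the paper adopts EKF updates for the nonlinear sensor model:

import numpy as np
from scipy.stats import multivariate_normal

def gm_phd_update(eta, mu, P, Z, H, R, p_D, clutter):
    """GM-PHD corrector following equations (14)-(15) with Kalman updates.

    eta, mu, P : predicted weights (J,), means (J, d), covariances (J, d, d).
    Z          : measurements at time k, shape (|Z|, dz).
    Returns the posterior weights, means and covariances.
    """
    eta_out = [(1.0 - p_D) * e for e in eta]            # missed-detection terms
    mu_out = [m.copy() for m in mu]
    P_out = [p.copy() for p in P]
    for z in Z:
        q, mus, Ps = [], [], []
        for e, m, p in zip(eta, mu, P):
            S = H @ p @ H.T + R                         # innovation covariance S_k
            K = p @ H.T @ np.linalg.inv(S)              # Kalman gain
            q.append(p_D * e * multivariate_normal.pdf(z, H @ m, S))
            mus.append(m + K @ (z - H @ m))
            Ps.append((np.eye(len(m)) - K @ H) @ p)
        denom = clutter + sum(q)                        # denominator of (15)
        eta_out += [qi / denom for qi in q]
        mu_out += mus
        P_out += Ps
    return np.array(eta_out), np.array(mu_out), np.array(P_out)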
For the joint mapping and localisation filter of Section 3.2, the location density of (6) can be propagated via particle filtering techniques [6], [15]. If the vehicle transition density is chosen as the proposal distribution, the weighting for the i-th particle becomes

w_k^(i) = g_k(Z_k | Z_{0:k−1}, X_{0:k}^(i)) w_{k−1}^(i).    (16)

Then, for each hypothesised robot trajectory particle, an independent robotic mapping filter of the form described above is executed.
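A hypothetical structural sketch of this Rao-Blackwellised recursion is given below; the motion sampler, the conditional map update and the trajectory likelihood of (16) are passed in as user-supplied functions, since their closed forms depend on the chosen models (see [15]):

import numpy as np

def rb_phd_slam_step(particles, Z, U, sample_motion, update_map, trajectory_likelihood):
    """One Rao-Blackwellised cycle per the factorisation (5)-(6): each particle
    carries a hypothesised pose trajectory and its own conditional map PHD.

    sample_motion(pose, U)                  : draws X_k from the vehicle transition density.
    update_map(map_phd, pose, Z)            : conditional map PHD corrector (e.g. the
                                              GM-PHD update sketched above).
    trajectory_likelihood(Z, map_phd, pose) : g_k(Z_k | Z_{0:k-1}, X_{0:k}) of (16).
    """
    weights = []
    for p in particles:
        p['pose'] = sample_motion(p['pose'], U)
        p['map'] = update_map(p['map'], p['pose'], Z)
        p['weight'] *= trajectory_likelihood(Z, p['map'], p['pose'])   # eq. (16)
        weights.append(p['weight'])
    w = np.asarray(weights)
    w = w / w.sum()                                                    # normalise weights
    for p, wi in zip(particles, w):
        p['weight'] = wi
    return particles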

5 Results and Analysis


The benchmark algorithms used in the analysis are the EKF-based mapping fil-
ters [4] and the FastSLAM localisation [6] algorithm with maximum likelihood
data association, using a mutual exclusion constraint and a 95% χ2 confidence
gate. An arbitrary simulated feature map and vehicle trajectory are used as
shown in figure 2. Existing approaches to mobile robotics typically deal with
interfering measurements through ‘feature management’ routines, primarily via
the landmark’s (feature’s) quality (LQ) [4] or a binary Bayes filter [6]. These
operations are typically independent of the main filtering update, whereas the
proposed approach unifies feature management, data association and state fil-
tering into a single Bayesian update giving it a more robust performance in the
presence of multiple mobile robots.
Fig. 2. Left: The simulated environment showing point features (green circles). A sam-
ple measurement history plotted from the robot trajectory (green line) is shown. Right:
Comparison of mapping error vs. measurement noise for the proposed filters and clas-
sical vector EKF solutions.

Figure 2 also shows the filter performance in increasing measurement noise.


Figure 3 shows a comparison of the map estimate for each filter at differing noise
inflation values, γ. In a mobile robot swarm, mapping robots should exclude the
moving robots from the static feature map. As such, figure 3 also depicts the
suitability of the proposed approach to swarm based mapping, as the approach
is robust to an increasing number of other robots. The proposed mapping frame-
work performs well in the presence of increased measurement uncertainty and
other mobile robots.

Fig. 3. Left: Feature mapping error vs. clutter density for vector based NN-EKF and
JCBB-EKF approaches and the proposed PHD framework, with the PHD approach
seen to perform well in high clutter. Right: Comparison of the map estimation error in
the presence of increasing densities per square meter of mobile robots.

By assuming an unknown vehicle trajectory, and applying random control in-


puts, this section analyses the proposed joint mapping and localisation filter in
comparison to the popular FastSLAM algorithm [6]. Figure 5 demonstrates the
ability of the trajectory estimation filter to estimate the location of the robot at
each time step, showing less error than conventional methods. In terms of estimat-
ing the map from an unknown path, Figure 4 shows how the RFS framework and
proposed filters maintain accurate estimates of the number of features in the map
and their locations, even in the presence of false measurements from other robots
Fig. 4. Left: The average estimated number of features in the map vs. ground truth
for each approach. The feature number estimate from the proposed approach can be
seen to closely track that of the ground truth. Right: A comparative plot of the mean
and standard deviation of the map estimation error vs. time. Note that the ‘ideal’ error
converges to zero, an important property for robotic mapping filters.

[Fig. 5 panels: left, average positional RMSE (std) vs. time index (0-4500) for GPS, RB-PHD-SLAM, NN-FastSLAM-FE and NN-FastSLAM-LQ; right, the robot's sensor suite: FMCW MMW radar, two LMS 200 lasers, wheel and steering encoders, motor controllers, wireless modem and industrial PC.]

Fig. 5. Left: The mean and standard deviation of the expected trajectory estimates of
the proposed RFS approach versus that of FastSLAM over 50 MC runs. LQ refers to
an implementation with the ‘landmark quality’ method of [4]. Right: The Autonomous
Robot used in experimental trials.

Fig. 6. A: Raw radar measurements and noisy vehicle path. B: The scan map plotted
from the GPS path. C: Posterior Estimate from FastSLAM, D: Posterior Estimate from
PHD-SLAM.
and mapping noise. Experimental results based on a 77 GHz millimetre wave radar mounted on an autonomous robot, shown in Fig. 5, are presented in Fig. 6.

6 Conclusion

This paper has shown, from a fundamental estimation viewpoint, that a feature-based map is a finite set, and has subsequently presented a Bayesian filtering formulation as well as a tractable solution for the feature-based mobile robotics problem. The framework outlined here presents a new direction of research for the multiple mobile robot community, and naturally encapsulates the inherent system uncertainty. Both a mapping-only filter and a joint robot trajectory / map filter were introduced and analysed. In contrast to existing frameworks, the RFS approach to mobile robotics jointly estimates the number of features in the map as well as their individual locations in the presence of data association uncertainty and clutter. It was also shown that this Bayesian formulation admits a number of optimal Bayes estimators for mobile robotics problems. Analysis was carried out both in a simulated environment through Monte Carlo trials and on experimental data, demonstrating the robustness of the proposed filter, particularly in the presence of large data association uncertainty and clutter, and illustrating the merits of adopting an RFS approach for swarm-based robotics applications.

Acknowledgements

This research is funded in part by the Singapore National Research Foundation


through the Singapore-MIT Alliance for Research and Technology CENSAM.
The second author is supported in part by discovery grant DP0880553 awarded
by the Australian Research Council.

References
1. Smith, R., Self, M., Cheeseman, P.: Estimating uncertain spatial relationships in
robotics. In: Autonomous Robot Vehicles, pp. 167–193 (1990)
2. Makarsov, D., Durrant-Whyte, H.: Mobile vehicle navigation in unknown envi-
ronments: a multiple hypothesis approach. In: IEE Proceedings of Contr. Theory
Applict., vol. 142 (July 1995)
3. Thrun, S.: Particle filter in robotics. In: Uncertainty in AI (UAI) (2002)
4. Dissanayake, G., Newman, P., Durrant-Whyte, H., Clark, S., Csorba, M.: A solu-
tion to the simultaneous localization and map building (SLAM) problem. IEEE
Transactions on Robotic and Automation 17(3), 229–241 (2001)
5. Guivant, J., Nebot, E., Baiker, S.: Autonomous navigation and map building using
laser range sensors in outdoor applications. Journal of Robotic Systems 17(10),
565–583 (2000)
6. Montemerlo, M., Thrun, S., Siciliano, B.: FastSLAM: A Scalable Method for the
Simultaneous Localization and Mapping Problem in Robotics. Springer, Heidelberg
(2007)
7. Mullane, J., Vo, B., Adams, M., Wijesoma, W.: A random set formulation for
bayesian SLAM. In: Proceedings of the IEEE/RSJ International Conference on
Intelligent Robots and Systems, France (September 2008)
8. Mullane, J., Vo, B., Adams, M., Wijesoma, W.: A random set approach to SLAM.
In: Proceedings of the IEEE International Conference on Robotics and Automation
(ICRA) workshop on Visual Mapping and Navigation in Outdoor Environments,
Japan (May 2009)
9. Mullane, J., Vo, B., Adams, M., Vo, B.: A random finite set approach to bayesian
SLAM. IEEE Transactions on Robotics 27(2), 268–283 (2011)
10. Mullane, J., Vo, B., Adams, M., Vo, B.: Random Finite Sets for Robot Mapping
& SLAM. Springer Tracts in Advanced Robotics (to appear)
11. Mahler, R.: Global integrated data fusion. In: Proc. 7th Nat. Symp. on Sensor
Fusion, vol. 1, pp. 187–199 (1994)
12. Mahler, R.: Multi-target bayes filtering via first-order multi-target moments. IEEE
Transactions on AES 4(39), 1152–1178 (2003)
13. Mahler, R.: Statistical Multisource Multitarget Information Fusion. Artech House
(2007)
14. Kaylan, B., Lee, K., Wijesoma, W.: FISST-SLAM: Finite set statistical approach
to simultaneous localization and mapping. International Journal of Robotics Re-
search 29(10), 1251–1262 (2010), Published online first in October 2009
15. Mullane, J., Vo, B., Adams, M.: Rao-blackwellised PHD SLAM. In: Proceedings of
the IEEE International Conference on Robotics and Automation (ICRA), Alaska,
USA (May 2010)
16. Mullane, J., Keller, S., Rao, A., Adams, M., Yeo, A., Hover, F., Patrikalakis, N.:
X-band radar based SLAM in singapore’s off-shore environment. In: Proceedings
of the 11th IEEE ICARCV, Singapore (December 2010)
17. Vo, B., Singh, S., Doucet, A.: Sequential monte carlo methods for multi-target
filtering with random finite sets. IEEE Transactions on Aerospace and Electronic
Systems 41(4), 1224–1245 (2005)
18. Shoudong, H., Zhan, W., Dissanayake, G.: Sparse local submap joining filter for
building large-scale maps. IEEE Transactions on Robotics 24(5), 1121–1130 (2008)
19. Vo, B., Ma, W.: The gaussian mixture probability hypothesis density filter. IEEE
Transactions on Signal Processing 54(11), 4091–4104 (2006)
IMM Algorithm for a 3D High Maneuvering Target
Tracking

Dong-liang Peng and Yu Gu

Institute of Information and Control, Hangzhou Dianzi University,


Hangzhou, 310018, China
{dlpeng,guyu}@hdu.edu.cn

Abstract. A major challenge posed by target tracking problems is a target flying at high speed and performing "high-g" turns in 3D space. In this situation, horizontal or decoupled models may lead to unacceptable accuracy. To address this problem, an IMM algorithm is proposed that includes a 3D constant velocity model (CV), a 3D "current" statistical model (CSM), and a 3D constant speed coordinated turn model (3DCSCT) with the kinematic constraint for constant speed targets. The tracking performance of the proposed IMM algorithm is compared with that of IMM algorithms utilizing CV, 3DCSCT, and the constant acceleration model (CA) or the Singer model. Simulation results demonstrate that the algorithm is feasible and practical for 3D high maneuvering target tracking.

Keywords: 3D constant speed coordinate turn model, high maneuvering target


tracking, IMM.

1 Introduction
The problem of tracking maneuvering targets has been studied extensively since the mid 1960s. However, it is still a challenge to track targets that fly at high speeds, particularly those performing "high-g" turns in 3D space. When targets maneuver in a horizontal plane with nearly constant speed and turn rate and have little or limited vertical maneuver (such as civilian aircraft in an ATC system), many 2D horizontal models and algorithms can lead to acceptable accuracy [1]. In practice, targets (such as military aircraft) often perform arbitrary trajectories in 3D space with high speed or coordinated turns. In this situation, horizontal or decoupled models may lead to unacceptable accuracy. So it is necessary to investigate an algorithm for 3D maneuvering target tracking.
The multiple model approach has been observed to be a successful method for
maneuvering target tracking [2]. The results of previous investigations also indicate
that the IMM algorithm is the superior technique for tracking maneuvering targets
when the computational requirements of the technique are considered [1, 2, 3]. The
IMM algorithm uses model (Markov chain state) probabilities to weigh the inputs and
outputs of a bank of parallel Kalman filters (maybe other filter) at each time instant.
The key matter of the IMM algorithm is how to select models to match the real
motion mode. In other words, the combination of models in the IMM algorithm plays
an important role for the final tracking accuracy.

If the target flies at high speed and performs coordinated turns in 3D space, an algorithm using a single model (such as the constant acceleration model, the Singer model, the "current" statistical model and so on) can exhibit considerable model error [4]. The 3D constant speed coordinated turn model (3DCSCT) with the kinematic constraint for constant speed targets is considered in the IMM algorithm in order to match the coordinated-turn mode. The turning rate, as a parameter of the 3DCSCT model, is computed as the magnitude of the target acceleration divided by its speed, so the precision of this model partly depends on the estimation of acceleration and velocity. Three model combinations, CA-CV-3DCSCT, Singer-CV-3DCSCT, and CSM-CV-3DCSCT, are given to track the 3D high maneuvering target. Simulation results show that the IMM algorithm with CSM-CV-3DCSCT is more effective than the other two combinations for a maneuvering target with high acceleration.
This paper is structured as follows. Four 3D motion models are reviewed in
Section 2. In Section 3, the IMM estimator for target tracking with r modes is
outlined. In Section 4, the performance of the proposed IMM algorithm is illustrated
and the analysis of the simulation results is given. Finally in Section 5, we make some
concluding remarks.

2 3D Motion Models

2.1 Constant Acceleration Model (CA)

The CA model assumes that the acceleration derivative is white noise, i.e.

ȧ(t) = w(t)    (2.1)

The corresponding 3D discrete-time model is

x(k) = diag[F(k), F(k), F(k)] x(k−1) + w(k)    (2.2)

with state x = [x, ẋ, ẍ, y, ẏ, ÿ, z, ż, z̈]^T, and where

F(k) = [ 1, T, T²/2 ; 0, 1, T ; 0, 0, 1 ]    (2.3)

2.2 Singer Model (Singer)

The Singer model was explored by Singer in 1970 [4]. It assumes that the target acceleration a(t) is a zero-mean first-order stationary Markov process with autocorrelation [5]

R_a(τ) = E{ a(t) a(t+τ) } = σ_a² e^(−α|τ|)    (α ≥ 0)    (2.4)

where α is the reciprocal of the maneuver time constant and depends on how long the maneuver lasts, and σ_a² is the "instantaneous variance" of the acceleration.
Such a process a(t) is the state process of a linear time-invariant system

ȧ(t) = −α a(t) + w(t)    (α > 0)    (2.5)

where w(t) is zero-mean white noise with power spectral density 2α σ_a².
The corresponding 3D discrete-time model is

x(k) = diag[F(k), F(k), F(k)] x(k−1) + w(k)    (2.6)

with state x = [x, ẋ, ẍ, y, ẏ, ÿ, z, ż, z̈]^T, and where

F(k) = [ 1, T, (αT − 1 + e^(−αT))/α² ; 0, 1, (1 − e^(−αT))/α ; 0, 0, e^(−αT) ]    (2.7)

2.3 “Current” Statistical Model (CSM)

In practice, if a target maneuvers with a specific acceleration at time k, the acceleration at time k+1 should be correlated with the "current" acceleration. Considering this fact, Zhou H.R. proposed the "current" statistical model [6]. It assumes the acceleration has a non-zero mean; such a non-zero-mean acceleration satisfies

ȧ(t) = −α a(t) + α ā(t) + w(t)    (2.8)

where ā(t) is the mean of the acceleration, assumed to be constant over each sampling interval. So the estimate â(k) of a(k) is taken to be the "current" value of the mean ā(k+1), and this is available online information.
The corresponding 3D discrete-time equivalent can be represented as

x(k) = diag[F(k), F(k), F(k)] x(k−1) + diag[U(k), U(k), U(k)] ā + w(k)    (2.9)

with the state x = [x, ẋ, ẍ, y, ẏ, ÿ, z, ż, z̈]^T, and where

F(k) = [ 1, T, (αT − 1 + e^(−αT))/α² ; 0, 1, (1 − e^(−αT))/α ; 0, 0, e^(−αT) ],
U(k) = [ (1/α)( −T + αT²/2 + (1 − e^(−αT))/α ) ; T − (1 − e^(−αT))/α ; 1 − e^(−αT) ]    (2.10)
2.4 Constant Speed Coordinate Turn (CSCT)

The constant speed coordinated turn model [7, 8] assumes that the target moves in a circle at constant turn rate in a plane (for a constant speed motion, the acceleration vector is orthogonal to the velocity vector). For an arbitrary plane of maneuver, the acceleration can be described as

a = Ω × v    (2.11)

where Ω is the (constant) turn rate vector with Ω̇ = 0, and v is the velocity vector. Taking the derivative of (2.11) leads to the following equivalent

ȧ = (Ω · v) Ω − (Ω · Ω) v    (2.12)

Using the fact that v is orthogonal to Ω, that is, Ω ⊥ v, (2.12) can be reformulated as

ȧ = −ω² v    (2.13)

where ω is defined as

ω ≜ |Ω| = |a| / |v|    (2.14)

If the acceleration perturbations are modeled as white noise w, (2.13) can be expressed as

ȧ = −ω² v + w    (2.15)

The corresponding 3D discrete-time model is

x(k) = diag[F(ω), F(ω), F(ω)] x(k−1) + w(k)    (2.16)

where

F(ω) = [ 1, sin(ωT)/ω, (1 − cos(ωT))/ω² ; 0, cos(ωT), sin(ωT)/ω ; 0, −ω sin(ωT), cos(ωT) ]    (2.17)

and T is the sampling period.
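For reference, the per-axis transition blocks of the CA, Singer/CSM and CSCT models can be assembled directly from (2.3), (2.7)/(2.10) and (2.17); the following NumPy sketch (illustrative only, not from the paper) builds them and the full 9 x 9 block-diagonal matrix. Note that F_csct requires a nonzero turn rate omega:

import numpy as np

def F_ca(T):
    """Per-axis CA block, equation (2.3)."""
    return np.array([[1.0, T, T**2 / 2.0],
                     [0.0, 1.0, T],
                     [0.0, 0.0, 1.0]])

def F_singer(T, alpha):
    """Per-axis Singer / CSM block, equations (2.7) and (2.10)."""
    e = np.exp(-alpha * T)
    return np.array([[1.0, T, (alpha * T - 1.0 + e) / alpha**2],
                     [0.0, 1.0, (1.0 - e) / alpha],
                     [0.0, 0.0, e]])

def F_csct(T, omega):
    """Per-axis constant-speed coordinated-turn block, equation (2.17); omega != 0."""
    s, c = np.sin(omega * T), np.cos(omega * T)
    return np.array([[1.0, s / omega, (1.0 - c) / omega**2],
                     [0.0, c, s / omega],
                     [0.0, -omega * s, c]])

def block3(F):
    """Assemble the 9x9 transition matrix diag[F, F, F] used in (2.2), (2.6), (2.16)."""
    return np.kron(np.eye(3), F)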

3 IMM Estimator
Here we consider a typical linear dynamic system, it can be represented as
X (k ) = F (k ) X (k − 1) + W (k ) (3.1)

Z (k ) = H (k ) X (k ) + V (k ) (3.2)

where X(k), denoted by X = [x, ẋ, ẍ, y, ẏ, ÿ, z, ż, z̈]^T, is the system state and Z(k) is the output measurement. F(k) is the state transition matrix and H(k) is the measurement matrix. W(k) ∼ N(0, Q(k)) and V(k) ∼ N(0, R(k)) are the Gaussian noises that describe the system disturbance and the measurement noise, respectively.
The IMM algorithm uses model (Markov chain state) probabilities to weigh the
inputs and outputs of a bank of parallel Kalman filters (maybe other filters) at each
time instant.
The main steps of the IMM estimator [3, 9, 10, 11] are as follows:
Step 1- Model Interaction or Mixing
The mode-conditioned state estimate and the associated covariances from the
previous iteration are mixed to obtain the initial condition for the mode-matched
filters. The initial condition in cycle k for the Kalman filter matched to the j-th mode
is computed using

X̂_j^0(k−1) ≜ Σ_{i=1}^{r} X̂_i(k−1) μ_{i|j}(k−1)    (3.3)

and

P_j^0(k−1) = Σ_{i=1}^{r} μ_{i|j}(k−1) { P_i(k−1) + [X̂_i(k−1) − X̂_j^0(k−1)][X̂_i(k−1) − X̂_j^0(k−1)]′ }    (3.4)

where r is the number of model-matched filters used. The state estimates and their
covariance matrix at time k-1 conditioned on the i-th model are denoted by Xˆi (k−1)
and Pi (k −1) , respectively; μi| j (k − 1) are the mixing probabilities and can be
described as
μ_{i|j}(k−1) ≜ P{ m(k−1) = i | m(k) = j, Z^{k−1} } = p_{ij} μ_i(k−1) / Σ_{l=1}^{r} p_{lj} μ_l(k−1)    (i, j = 1, 2, ..., r)    (3.5)

where m(k) is the index of the model in effect in the interval (k-1, k]. μi ( k ) is the
probability that the model i (i=1, 2, 3…r) is in effect in the above interval and can be
expressed as

μ_i(k) ≜ P{ m(k) = i | Z^k }    (3.6)

The cumulative set of measurements up to and including scan k is denoted by Zk. pij is
the model transition probability and is defined as
p_{ij} ≜ P{ m(k) = j | m(k−1) = i }    (3.7)

The definitions of m(k−1), μ_i(k−1) and Z^{k−1} are similar to the definitions of m(k), μ_i(k) and Z^k.
Step 2- Model-conditioned Filtering
According to the outline of the Kalman filter, the mixed state X̂_j^0(k−1) and the associated covariance matrix P_j^0(k−1) are matched to each model to yield the model-conditioned state estimate X̂_j(k) and its covariance matrix P_j(k) at time k. In addition, the likelihood function Λ_j(k) of each model at time k can be computed using
Λ_j(k) ≜ p{ Z(k) | m(k) = j, Z^{k−1} } = N[ υ_j(k); 0, S_j(k) ]    (3.8)

where υ j (k ) and S j (k ) are the measurement residual and its covariance.


N [υ j (k );0, S j (k )] denotes the normal pdf with argument υ j (k ) , mean zero and
covariance matrix S j (k ) .

Step 3- Model Probability Update


The model probabilities are updated based on the likelihood function of each model
using
μ_j(k) = Λ_j(k) Σ_{l=1}^{r} p_{lj} μ_l(k−1) / c    (3.9)

where c is a normalization constant and can be computed using


c = Σ_{i=1}^{r} Σ_{l=1}^{r} Λ_i(k) p_{li} μ_l(k−1)    (3.10)

Step 4- Estimate and Covariance Combination


The model-conditioned estimates and covariances are combined to obtain the overall estimate X̂(k) and its covariance matrix P(k) as follows:

X̂(k) = Σ_{j=1}^{r} X̂_j(k) μ_j(k)    (3.11)

P(k) = Σ_{j=1}^{r} μ_j(k) { P_j(k) + [X̂_j(k) − X̂(k)][X̂_j(k) − X̂(k)]′ }    (3.12)
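One complete IMM cycle can be condensed into the following hypothetical NumPy sketch (an illustration, not the authors' code); the per-model filters are passed in as callables that return the mode-conditioned estimate, covariance and likelihood Λ_j(k):

import numpy as np

def imm_step(x, P, mu, Pi, filters, z):
    """One IMM cycle: mixing (3.3)-(3.5), mode-matched filtering, probability
    update (3.9)-(3.10), and combination (3.11)-(3.12).

    x, P    : lists of r mode-conditioned estimates and covariances from time k-1.
    mu      : (r,) model probabilities; Pi: (r, r) transition matrix p_ij.
    filters : list of r callables kf_step(x0, P0, z) -> (x_j, P_j, likelihood).
    """
    r = len(filters)
    c_pred = Pi.T @ mu                                     # predicted model probabilities
    mix = (Pi * mu[:, None]) / c_pred[None, :]             # mixing probabilities mu_{i|j}
    x0 = [sum(mix[i, j] * x[i] for i in range(r)) for j in range(r)]
    P0 = [sum(mix[i, j] * (P[i] + np.outer(x[i] - x0[j], x[i] - x0[j]))
              for i in range(r)) for j in range(r)]
    out = [filters[j](x0[j], P0[j], z) for j in range(r)]  # mode-matched filtering
    lik = np.array([o[2] for o in out])
    mu_new = lik * c_pred
    mu_new = mu_new / mu_new.sum()                         # (3.9)-(3.10)
    x_hat = sum(mu_new[j] * out[j][0] for j in range(r))   # (3.11)
    P_hat = sum(mu_new[j] * (out[j][1] + np.outer(out[j][0] - x_hat, out[j][0] - x_hat))
                for j in range(r))                         # (3.12)
    return [o[0] for o in out], [o[1] for o in out], mu_new, x_hat, P_hat

With the models of Section 2, the callables would be Kalman filters built on the F_ca, F_singer and F_csct blocks sketched earlier.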

4 Implementation and Simulation

4.1 Description of Simulation Scenarios

The simulation scenario is following:


(1) A 3D maneuvering target trajectory is considered. The target is located at A (30 km, 30 km, 30 km) at time t = 0 s and moves with constant velocity v equal to (300 m/s, 300 m/s, 300 m/s) for time 0~50 s. The target executes a clockwise coordinated turn B with an initial acceleration (10 m/s², −12 m/s², 2 m/s²) during 50~100 s; this acceleration vector is chosen orthogonal to the velocity vector in order to maintain the orthogonality property (a · v = 0) of the 3DCSCT model. During 200~250 s, the target performs an anti-clockwise coordinated turn C with an arbitrary initial acceleration (16 m/s², 28 m/s², −14 m/s²). The target moves with constant velocity during 250~300 s.
(2) The X-Y-Z positions of the target are measured and the three measurement standard deviations are all 400 m. The process noise is assumed to be white noise.
We denote by CV-CA-3DCSCT an IMM algorithm that includes a CV model, a CA model, and a 3DCSCT model; the same convention is used for CV-Singer-3DCSCT and CV-CSM-3DCSCT. All these algorithms have the same initial model probabilities μ0 = [0.8 0.1 0.1] and the same model switching probabilities:
p_ij = [ 0.97, 0.03, 0 ; 0, 0.75, 0.25 ; 0.05, 0, 0.95 ]    (4.1)

According to [8], R_u is (800(0.92)^k + 20) m²/s⁴ when the kinematic constraint is used for the 3DCSCT model.

4.2 Simulation Results and Analysis

When the target performs turns B and C, the RMSEs of the three algorithms in X, Y and Z respectively are shown in Fig. 1. Fig. 2 shows the detail of the RMSE of the three algorithms between 200~250 s, i.e., Fig. 2 is a magnified version of Fig. 1 over 200~250 s.
It is clearly shown that when the target is non-maneuvering the performance of the three algorithms is almost the same. At turn B, CV-Singer-3DCSCT and CV-CSM-3DCSCT have almost the same RMSE and are slightly better than CV-CA-3DCSCT in tracking accuracy. When the assumption of the 3DCSCT model is slightly violated, as at turn C, CV-Singer-3DCSCT and CV-CSM-3DCSCT again have almost the same RMSE; however, they are much better than CV-CA-3DCSCT.

Fig. 1. RMSE of three IMM algorithms for X-Y-Z
Fig. 2. RMSE of three IMM algorithms at turn C

5 Conclusions
The benefits of using the CSM and 3DCSCT models in an IMM algorithm to track a 3D high maneuvering target have been clearly demonstrated in this paper. When the target performs "high-g" turns in 3D space, the IMM algorithm utilizing the CSM outperforms the other two IMM algorithms, which use the Singer and CA models instead. However, how to choose the parameters of the models and filters is an important issue to be addressed in future work.

References
1. Li, X.R., Jilkov, V.P.: Survey of maneuvering target tracking. In: Part V: multiple-models.
SPIE, vol. 4048, pp. 212–236 (2000)
2. Blom, H.A., Bar-Shalom, Y.: The interacting multiple model algorithm for systems with
markovian switching coefficient. IEEE Transactions on Automatic Control 33(8), 780–783
(1988)
3. Watson, G.A., Blair, W.D.: IMM algorithm for tracking targets that maneuver through
coordinated turn. In: SPIE, vol. 1698, pp. 236–247 (1992)
4. Nabaa, N., Bishop, R.H.: Validation and comparison of coordinated turn aircraft maneuver
models. IEEE Transactions on Aerospace and Electronic Systems 36(1), 250–259
5. Singer, R.A.: Estimating optimal tracking filter performance for manned maneuvering
targets. IEEE Transactions on Aerospace and Electronic Systems 6(4), 473–483 (1970)
6. Zhou, H.R., Jin, Z.L., Wang, P.D.: Maneuvering target tracking, pp. 135–145. National
Defence Industry Press, Beijing (1991)
7. Tahk, M., Speyer, J.L.: Target tracking problems subject to kinematic constraints. IEEE
Transactions on Automatic Control 35(3), 324–326 (1990)
8. Alouani, A.T., Blair, W.D.: Use of a kinematic constraint in tracking constant speed,
maneuvering targets. IEEE Transactions on Automatic Control 38(7), 1107–1111 (1993)
9. Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with applications to tracking and
navigation: theory, algorithms, and software, pp. 453–457. Wiley, New York (2001)
10. Li, X.R., Jilkov, V.P.: Survey of maneuvering target tracking. Part V: multiple-model
methods. IEEE Transactions on Aerospace and Electronic Systems 41(4), 1255–1321
(2005)
11. Kadirkamanathan, V., Li, P., Kirubarajan, T.: Sequential Monte Carlo filtering vs. the
IMM estimator for fault detection and isolation in nonlinear systems. In: SPIE, vol. 4389,
pp. 263–274 (2001)
A New Method Based on Ant Colony Optimization for
the Probability Hypothesis Density Filter*

Jihong Zhu1, Benlian Xu2, Fei Wang2, and Qiquan Wang1


1
School of Automation, NanJing University of Science & Technology,
NanJing, 210094, China
2
School of Electric and Automatic Engineering, ChangShu Institute of Technology,
ChangShu, 215500, China
{djyzhjh,xu_benlian,wangleea,wangzqwhz}@yahoo.com.cn

Abstract. A new approximate estimation method based on the ant colony optimization algorithm for the probability hypothesis density (PHD) filter is investigated and applied to estimate the time-varying number of targets and their states in a cluttered environment. Four key process phases are included: generation of candidates, initiation, extremum search and state extraction. Numerical simulations show that the performance of the proposed method is close to that of the sequential Monte Carlo PHD method.

Keywords: Multi-target tracking, Probability hypothesis density, Ant colony


optimization, extremum search.

1 Introduction
Multi-target tracking (MTT) is regarded as a classic but intractable problem in a wide variety of contexts. According to the recent literature [1-5], data association (DA) problems form the main stream in MTT. Due to its combinatorial nature, the DA problem makes up the bulk of the computational load in the MTT field. The random finite set (RFS) formulation, which avoids explicit associations between measurements and tracks, has become an alternative in the recent decade. In particular, the probability hypothesis density (PHD) filter [6], a novel RFS-based filter, and its implementations have generated substantial interest.
The PHD filter operates on the single-target state space and avoids the combinatorial problem that arises from the DA problem. This salient feature renders the PHD filter extremely attractive. However, the PHD recursion involves multiple integrals that have no closed form solutions in general. Fortunately, two methods have been successfully developed for approximating the PHD filter so that it can be implemented [7-8], i.e., the sequential Monte Carlo PHD method (SMCPHD) [7] and the Gaussian mixture PHD method (GMPHD) [8]. Hundreds of papers based on the methods in [7-8] have been proposed in the recent decade, but most of them either apply the two methods directly in different fields or modify them with traditional DA algorithms.

*
This work is supported by national natural science foundation of China (No.60804068) and by
national science foundation of Jiangsu province (No.BK2010261) and by cooperation
innovation of industry, education and academy of Jiangsu province (No.BY2010126).

So far, there are few reports on ant-based applications to parameter estimation or multi-target tracking, except [9-11]. In this work, a novel approximating method based on ant colony optimization (ACO) for the PHD filter is proposed. The remainder of this paper is organized as follows. Section 2 presents the background on the PHD filter. Section 3 describes the principle of the proposed method for the PHD filter. Numerical simulations are conducted and corresponding results are analyzed in Section 4. Finally, conclusions are drawn in Section 5.

2 The Probability Hypothesis Density Filter


More details on the PHD filter are given in [6]; here, only the main formulas are presented.
For an RFS X on χ with probability distribution P, its first-order moment or intensity is a function v : χ → [0, ∞) such that for each region S ⊆ χ

∫_S v(x) dx = ∫ |X ∩ S| P(dX)    (1)

where | X | denotes the cardinality of a set X . In other words, the integral of v over
any region S gives the expected number of elements of X that are in S . This
intensity is commonly known in the tracking literature as PHD.
Let $v_k$ and $v_{k|k-1}$ denote the respective intensities associated with the multi-target
posterior density $p_k$ and the multi-target predicted density $p_{k|k-1}$. It can be shown
that the posterior intensity is propagated in time via the PHD recursion (2) and (3).

$v_{k|k-1}(x) = \int p_{s,k}(\varsigma)\, f_{k|k-1}(x\,|\,\varsigma)\, v_{k-1}(\varsigma)\, d\varsigma + \int \beta_{k|k-1}(x\,|\,\varsigma)\, v_{k-1}(\varsigma)\, d\varsigma + \gamma_k(x)$   (2)

$v_k(x) = \left[1 - p_{D,k}(x)\right] v_{k|k-1}(x) + \sum_{z \in Z_k} \dfrac{p_{D,k}(x)\, g_k(z\,|\,x)\, v_{k|k-1}(x)}{\kappa_k(z) + \int p_{D,k}(\xi)\, g_k(z\,|\,\xi)\, v_{k|k-1}(\xi)\, d\xi}$   (3)

where $\gamma_k(\cdot)$ denotes the intensity of the birth RFS $\Gamma_k$ at time k, $\beta_{k|k-1}(\cdot\,|\,\varsigma)$ denotes
the intensity of the RFS $B_{k|k-1}(\varsigma)$ spawned at time k by a target with previous state
$\varsigma$, $p_{s,k}(\varsigma)$ denotes the probability that a target still exists at time k given that its
previous state is $\varsigma$, $f_{k|k-1}(\cdot\,|\,\varsigma)$ denotes the transition probability density of individual
targets, $p_{D,k}(x)$ denotes the probability of detection given a state x at time k,
$g_k(z\,|\,\cdot)$ denotes the likelihood of individual targets, and $\kappa_k(\cdot)$ denotes the intensity of
the clutter RFS $K_k$ at time k.

3 Approximating Method Based on ACO for PHD Filter

As mentioned in (1), the expected number of elements of X, $\hat{N} = \int v(x)\,dx$, can be
used as an estimate of the number of targets. The local maxima of the intensity are
points in χ with the highest local concentration of the expected number of targets and
hence can be used to generate estimates for the elements of X. This suggests
extracting the peaks of the intensity function directly, i.e., the estimation problem is
transformed into finding all extreme points of the intensity function.
Ant colony optimization (ACO) [12] provides an alternative way to solve
this task, owing to its successful application to many combinatorial optimization
problems and continuous-space optimization problems.
The approximating method based on ACO for the PHD filter, which includes four
phases, i.e., generation of candidates, initiation, extremum search and state extraction,
builds on [13], but with major changes with respect to that work.
In the first phase, the states of the candidates are generated in the same way as in the particle
filter [7]. Without loss of generality, let $x_{t-1}^{(i)}$ denote the state of candidate i at time
t − 1, represented by the position $(x_{t-1}^{(i)}, y_{t-1}^{(i)})$ and the velocity $(\dot{x}_{t-1}^{(i)}, \dot{y}_{t-1}^{(i)})$ as
$x_{t-1}^{(i)} = [x_{t-1}^{(i)}, y_{t-1}^{(i)}, \dot{x}_{t-1}^{(i)}, \dot{y}_{t-1}^{(i)}]^T$.
In the second phase, the value of the intensity function and some parameters are
initialized. The number of ants $N_{ant}$ is set equal to the number of candidates, and the
pheromone of candidate i is set to $\tau^{(i)} = 1$. Given the importance
densities $p_k(\cdot\,|\,x_{k-1})$, $r_k(\cdot\,|\,x_{k-1})$ and $q_k(\cdot\,|\,x_{k-1}, Z_k)$, Eq. (2) can be rewritten as

$v_{k|k-1}(x) = \int p_{s,k}(\varsigma)\, \dfrac{f_{k|k-1}(x\,|\,\varsigma)}{p_k(x\,|\,x_{k-1})}\, p_k(x\,|\,x_{k-1})\, v_{k-1}(\varsigma)\, d\varsigma + \int \dfrac{\beta_{k|k-1}(x\,|\,\varsigma)}{r_k(x\,|\,x_{k-1})}\, r_k(x\,|\,x_{k-1})\, v_{k-1}(\varsigma)\, d\varsigma + \dfrac{\gamma_k(x)}{q_k(x\,|\,x_{k-1})}\, q_k(x\,|\,x_{k-1})$   (4)
The local maxima of the intensity are points with the highest local concentration
of the expected number of targets; in other words, the true targets are always distributed
around these local maximum points. The value of the intensity function of each candidate
is computed by formula (5):

$v_{k|k-1}(x) \approx p_k(x\,|\,x_{k-1})\, r_k(x\,|\,x_{k-1}) + q_k(x\,|\,x_{k-1})$   (5)

In the third phase, the extremum search process is executed. Suppose the value of
candidate i is denoted by $v_{k|k-1}^{(i)}$ and the values of its neighbors are denoted by $[v_{k|k-1}^{(i-1)}, v_{k|k-1}^{(i+1)}]$.
If ant $a_m$ is located on candidate i, it will move to its left or right neighbor;
four moving behaviors are designed as follows:
• If $v_{k|k-1}^{(i-1)} < v_{k|k-1}^{(i)}$ and $v_{k|k-1}^{(i)} < v_{k|k-1}^{(i+1)}$ hold, ant $a_m$ will move to candidate i + 1.
• If $v_{k|k-1}^{(i-1)} > v_{k|k-1}^{(i)}$ and $v_{k|k-1}^{(i)} > v_{k|k-1}^{(i+1)}$ hold, ant $a_m$ will move to candidate i − 1.
• If $v_{k|k-1}^{(i-1)} \le v_{k|k-1}^{(i)}$ and $v_{k|k-1}^{(i)} \ge v_{k|k-1}^{(i+1)}$ hold, ant $a_m$ will select candidate i + 1 or i − 1 with a given probability threshold $P_0$.
• If $v_{k|k-1}^{(i-1)} > v_{k|k-1}^{(i)}$ and $v_{k|k-1}^{(i)} < v_{k|k-1}^{(i+1)}$ hold, ant $a_m$ will select candidate i + 1 or i − 1 with probability P, which is given by

$P_{ij}^{(m)} = \dfrac{\tau_{ij}^{(m)}\, e^{-\eta_{ij}/C_1}}{\sum_{n \in \{i-1,\, i+1\}} \tau_{in}^{(m)}\, e^{-\eta_{in}/C_1}}$   (6)

where $C_1$ is a given positive constant and $\eta_{ij}$ is the heuristic value,
i.e., $\eta_{ij} = v_{k|k-1}^{(i)} - v_{k|k-1}^{(j)}$, $j \in \{i-1, i+1\}$, which implies that the ant tends to move to the
neighbor whose function value is larger than its own.
When all ants finish their tour, the pheromone update process is executed. Suppose
ant m moves onto candidate i; the ant will release pheromone on candidate i, and the
pheromone amount is denoted by $\Delta\tau_i^m(t)$:

$\Delta\tau_i^m(t) = C_2\,(v_{k|k-1}^{(m)} - v_{k|k-1}^{(i)})$   (7)

where $C_2$ is a given positive constant. If the number of ants that have moved to
candidate i at iteration t is l, the pheromone on candidate i is updated as follows:

$\tau_i(t) = (1-\rho)\,\tau_i(t) + \sum_{m=1}^{l} \Delta\tau_i^m(t)$   (8)
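
The evaporation-plus-deposit update of Eqn. (8) can be sketched as below; the per-ant deposits of Eqn. (7) are passed in already accumulated per candidate, since only their sum enters Eqn. (8). The default value of ρ follows Section 4; everything else is an assumption of this sketch.

    import numpy as np

    def update_pheromone(tau, deposits, rho=0.2):
        """Eqn. (8): tau_i <- (1 - rho) * tau_i + sum_m delta_tau_i^m.
        deposits[i] must already contain the sum of the C2-scaled deposits of
        Eqn. (7) released on candidate i during the current iteration."""
        return (1.0 - rho) * np.asarray(tau) + np.asarray(deposits)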

Meanwhile, all ants will stay on points with locally maximal intensity function
values. But not all of these points originate from true targets, so in the final phase,
the state extraction of targets is executed based on the measurements at each time
step. Given the importance density $g'_k(z\,|\,x)$, formula (3) can be approximated by

$v_k(x) \approx \left[1 - p_{D,k}(x)\right] v_{k|k-1}(x) + \sum_{z \in Z_k} \dfrac{p_{D,k}(x)\, g'_k(z\,|\,x)\, v_{k|k-1}(x)}{\kappa_k(z) + C_3\, p_{D,k}(x)\, g'_k(z\,|\,x)\, v_{k|k-1}(x)}$   (9)

where $C_3$ is a given positive parameter. For each candidate on which ants stay, the value
of formula (9) is computed; if $v_k(x_i)$ of candidate i is smaller than a given parameter ε,
all ants staying on candidate i die, and the candidates with surviving ants are regarded
as states originating from true targets. These candidates are then used to start the
process at the next time step.
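
A hedged sketch of this state-extraction rule: candidates occupied by ants are kept only if their value under formula (9) reaches the threshold ε, and the survivors seed the next time step. Variable names and the function signature are assumptions.

    def extract_states(occupied_candidates, v_k, eps=1e-20):
        """occupied_candidates: states on which ants currently stay;
        v_k: their values computed with formula (9)."""
        survivors = [x for x, val in zip(occupied_candidates, v_k) if val >= eps]
        return survivors   # target-state estimates, reused to start the next time step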

4 Numerical Simulations
For illustration purposes, a two-dimensional scenario with an unknown and time-varying
number of targets observed in clutter over the surveillance region
$[1\,\mathrm{km}, 3\,\mathrm{km}] \times [14\,\mathrm{km}, 16\,\mathrm{km}]$ is considered. The Poisson birth RFS $\Gamma_k$ has intensity
$\gamma_k(x) = 0.1\, N(x; m_r, P_r)$, with $m_r = [2000, 50, 14816, -50]^T$, $P_r = \mathrm{diag}([100, 10, 100, 10]^T)$, and
the other parameters are set the same as in [10]. The importance densities used
are $p_k = f_{k|k-1}$, $q_k = N(\cdot\,; \bar{x}, Q)$ and $g'_k = g_k(z\,|\,x)$. Additionally, the ant-related
parameters are set as follows: $N_{iteration} = 500$, $C_1 = 0.5$, $C_2 = 100$, $C_3 = 1.0$,
$\rho = 0.2$, $\varepsilon = 1\mathrm{e}{-20}$.

Fig. 1. True target tracks and measurements


Fig. 2. Position estimates of proposed method and SMCPHD method


Fig. 3. Target number estimate and OSPA distance of proposed method and SMCPHD

Figure 1 shows the true target tracks in the cluttered environment. Figure 2 shows
the position estimates obtained by the proposed method and by SMCPHD (500 particles).
Figure 3 shows the target number estimates and the OSPA distance of the proposed method
and SMCPHD. From Figures 2 and 3, it can be observed that the performance of the proposed
method is close to that of SMCPHD in this kind of scenario. Moreover, our method is simpler
than the SMCPHD method, owing to the approximate representation.

5 Conclusions
A new approximate estimation method based on the ACO algorithm for the PHD filter is
proposed. The method is composed of four process phases, and its key idea is that an
extremum search based on ACO operates on the approximate recursive intensity function.
Simulations show that the proposed method is close to SMCPHD according to the OSPA
distance metric, while being simpler than SMCPHD. Future work will focus on the estimation
accuracy of the proposed method and on extending the method to maneuvering target tracking.

References
1. Lee, M.S., Kim, Y.H.: New Data Association Method for Automotive Radar Tracking. IEE
Proc.-Radar Sonar Navig. 148(5), 297–301 (2001)
2. Li, X.R., Bar-Shalom, Y.: Tracking in Clutter with Nearest Neighbor Filters: Analysis and
Performance. IEEE Trans. On Aerospace and Electronic Systems 32(3), 995–1010 (1996)
3. Li, X.R.: Tracking in Clutter with Strongest Neighbor Measurements –Part I: Theoretical
Analysis. IEEE Trans. On Automatic Control 43(11), 1560–1578 (1998)
4. Fortmann, T., Bar-Shalom, Y., Scheffe, M.: Sonar Tracking of Multiple Targets Using Joint
Probabilistic Data Association. IEEE Journal of Oceanic Engineering, OE 8, 173–183 (1983)
5. Blackman, S.S.: Multiple Hypothesis Tracking for Multiple Target Tracking. IEEE A&E
Systems Magazine 19(1), 5–18 (2004)
6. Mahler, R.: Multi-target Bayes Filtering via First-order Multi-target Moments. IEEE
Trans. AES 39(4), 1152–1178 (2003)
7. Vo, B., Singh, S., Doucet, A.: Sequential Monte Carlo Implementation of the PHD Filter
for Multi-target Tracking. In: Proc. Int’l Conf. on Information Fusion, Cairns, Australia,
pp. 792–799 (2003)
8. Vo, B., Ma, W.K.: The Gaussian Mixture Probability Hypothesis Density Filter. IEEE
Trans. Signal Processing 54(11), 4091–4104 (2006)
9. Nolle, L.: On a Novel ACO-Estimator and its Application to the Target Motion Analysis
problem. Knowledge-Based Systems 21(3), 225–231 (2008)
10. Xu, B.L., Vo, B.: Ant Clustering PHD Filter for Multiple Target Tracking. Applied Soft
Computing 11(1), 1074–1086 (2011)
11. Xu, B.L., Chen, Q.L., Zhu, J.H., Wang, Z.Q.: Ant Estimator with Application to Target
Tracking. Signal Processing 90(5), 1496–1509 (2010)
12. Dorigo, M., Maniezzo, V., Colorni, A.: Positive Feedback as a Search Strategy. Technical
Report 91-016, Dipartimento di Elettronica, Politecnico di MILANO, Milan, Italy (1991)
13. Pang, C.Y., Li, X.: Applying Ant Colony Optimization to Search All Extreme Points of
Function. In: 5th IEEE Conf. on industrial Electronics and Applications, pp. 1517–1521
(2009)
A Hybrid Algorithm Based on Fish School
Search and Particle Swarm Optimization for
Dynamic Problems

George M. Cavalcanti-Júnior, Carmelo J.A. Bastos-Filho,


Fernando B. Lima-Neto, and Rodrigo M.C.S. Castro

Polytechnic School of Pernambuco, University of Pernambuco, Recife, Brazil


{gmcj,cjabf,fbln,rmcsc}@ecomp.poli.br

Abstract. Swarm Intelligence algorithms have been extensively applied


to solve optimization problems. However, some of them, such as Particle
Swarm Optimization, may not present the ability to generate diversity
after environmental changes. In this paper we propose a hybrid algo-
rithm to overcome this problem by applying a very interesting feature
of the Fish School Search algorithm to the Particle Swarm Optimiza-
tion algorithm, the collective volitive operator. We demonstrated that
our proposal presents a better performance when compared to the FSS
algorithm and some PSO variations in dynamic environments.

1 Introduction
The optimal solutions for many real-world problems may vary over time. For
example, the optimal routes for a computer network can change dynamically due
to node failures or due to unavailable links. Therefore, optimization algorithms
to solve real-world problems should present the capability to deal with dynamic
environments, in which the optimal solutions can change over time.
Many bio-inspired optimization algorithms have been proposed in the last
two decades. Among them, there are the swarm intelligence algorithms, which
were conceived based on some collective behaviors. In general, swarm algorithms
are inspired by groups of animals, such as flocks of birds, schools of fish, hives
of bees, colonies of ants, etc. Although a lot of swarm-based algorithms have
already been proposed, just a few were designed to tackle dynamic problems.
One of the most used swarm intelligence algorithms is Particle Swarm
Optimization (PSO). Despite its fast convergence capability, the vanilla version
of the PSO cannot tackle dynamic optimization problems. This occurs because the
entire swarm often increases the exploitation around a good region of the search
space, reducing the overall diversity of the population. However, some variations
of the PSO have been created in order to increase the capacity to escape from
regions in the search space where the optimum is not located anymore [1,2,3].
On the other hand, another swarm intelligence algorithm proposed in 2008, the
Fish School Search algorithm (FSS) [4,5,6], presents a very interesting feature that
can be very useful for dynamic environments. FSS presents an operator, called


volitive operator, which is capable of auto-regulating the exploration-exploitation
trade-off during the algorithm execution.
Since the PSO algorithm converges faster than FSS but cannot auto-adapt
the granularity of the search, we believe the FSS volitive operator can be applied
to the PSO in order to mitigate this PSO weakness and improve the performance
of the PSO for dynamic optimization problems. Based on this, we propose in
this paper a hybrid algorithm, called Volitive PSO.
This paper is organized as follows. Section 2 provides the background on PSO
and FSS, also including a brief explanation of a well known PSO variation to
tackle dynamic problems, called Charged PSO. Section 3 describes our proposal,
which is a FSS-PSO hybrid algorithm. Section 4 presents the simulation setup.
Section 5 is divided in two sub-sections and depicts some results. The former
presents a parametrical analysis of our proposal and the latter shows a compar-
ison between our proposal and some other approaches. In Section 6 we give our
conclusions and we present some ideas for future works.

2 Background
2.1 PSO (Particle Swarm Optimization)
Particle Swarm Optimization is a population-based optimization algorithm in-
spired by the behavior of flocks of birds. It was firstly introduced by Kennedy
and Eberhart [7] and it has been largely applied to solve optimization problems.
The standard approach is composed of a swarm of particles, where each one
has a position $\vec{x}_i$ within the search space and each position represents a solution
for the problem. The particles fly through the search space of the problem
searching for the best solution, according to the current velocity $\vec{v}_i$, the best
position found by the particle itself ($\vec{P}_{best_i}$) and the best position found by the
entire swarm during the search so far ($\vec{G}_{best}$).
According to the approach proposed by Shi and Eberhart [8] (this approach is
also called inertia PSO), the velocity of particle i is evaluated at each iteration
of the algorithm by using the following equation:

$\vec{v}_i(t+1) = w\,\vec{v}_i(t) + r_1 c_1 [\vec{P}_{best_i} - \vec{x}_i(t)] + r_2 c_2 [\vec{G}_{best} - \vec{x}_i(t)]$,   (1)
where r1 and r2 are numbers randomly generated in the interval [0, 1]. The
inertia weight (w) controls the influence of the previous velocity and balances
the exploration-exploitation behavior along the process. It generally decreases
from 0.9 to 0.4 during the algorithm execution. c1 and c2 are called the cognitive
and social acceleration constants, respectively, and weight the influence of the
memory of the particle and the information acquired from the neighborhood.
The position of each particle is updated based on the velocity of the particle,
according to the following equation:

$\vec{x}_i(t+1) = \vec{x}_i(t) + \vec{v}_i(t+1)$.   (2)
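
A minimal sketch of the inertia-PSO update of Eqns. (1)-(2) is given below; the array shapes and the absence of bound handling are simplifications of this sketch, not details taken from the paper.

    import numpy as np

    def pso_update(x, v, pbest, gbest, w, c1=1.494, c2=1.494, rng=np.random):
        """x, v, pbest: arrays of shape (n_particles, n_dims); gbest: shape (n_dims,)."""
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        v_new = w * v + r1 * c1 * (pbest - x) + r2 * c2 * (gbest - x)   # Eqn. (1)
        x_new = x + v_new                                               # Eqn. (2)
        return x_new, v_new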

The communication topology defines the neighborhood of the particles and,


as a consequence, the flow of information through the particles. There are two
basic topologies: global and local. In the former, each particle shares and acquires
information directly from all other particles, i.e. all particles use the same social
memory, called Gbest. In the local topology, each particle only shares information
with two neighbors and the social memory is not the same within the whole
swarm. This approach, called Lbest , helps to avoid a premature attraction of all
particles to a single spot point in the search space.

2.2 Charged PSO

Since the standard PSO cannot tackle dynamic problems due to the low
capacity to increase the diversity after the entire swarm has converged to a
single region of the search space, many efforts to overcome this weakness have
been made. The simplest idea is to restart the particles every time the search
space changes. However, all the previous information obtained from the problem
during the search process is lost in this case.
An interesting approach introduced by Blackwell and Bentley [1] is the Charged
PSO, which uses the idea of electrostatic charges. Some particles are charged (they
repel each other) and some others are neutral. In general, the neutral particles
tend to exploit towards a single sub-region of the search space, whereas the charged
particles never converge to a unique spot. Instead, the charged particles are
constantly exploring in order to maintain diversity.
In order to consider the effect of the charged particles, the velocity equation
receives a fourth term, as shown in equation (3). This term is defined as the
acceleration of particle i ($\vec{a}_i$) and can be seen in equation (4).

$\vec{v}_i(t+1) = w\,\vec{v}_i(t) + r_1 c_1 [\vec{P}_{best_i} - \vec{x}_i(t)] + r_2 c_2 [\vec{G}_{best} - \vec{x}_i(t)] + \vec{a}_i(t)$.   (3)

$\vec{a}_i(t) = \begin{cases} \sum_{j \ne i} \dfrac{Q_i Q_j}{\|\vec{r}_{ij}(t)\|^3}\, \vec{r}_{ij}(t), & \text{if } R_c \le \|\vec{r}_{ij}(t)\| \le R_p, \\ 0, & \text{otherwise}, \end{cases}$   (4)

where $\vec{r}_{ij}(t) = \vec{x}_i(t) - \vec{x}_j(t)$, $Q_i$ is the charge magnitude of particle i, $R_c$
is the core radius and $R_p$ is the perception limit of the particle. Neutral particles
have a charge value equal to zero, i.e. $Q_i = 0$.
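
For completeness, the Coulomb-like acceleration of Eqn. (4) can be sketched as follows; only pairs whose distance lies in [Rc, Rp] contribute, and the default values of Rc and Rp used here are assumptions rather than values from [1].

    import numpy as np

    def charged_acceleration(i, x, Q, Rc=1.0, Rp=30.0):
        """x: positions, shape (n_particles, n_dims); Q: charge magnitudes (zero for neutral)."""
        a = np.zeros(x.shape[1])
        if Q[i] == 0.0:
            return a                      # neutral particles receive no acceleration term
        for j in range(len(x)):
            if j == i:
                continue
            r_ij = x[i] - x[j]
            d = np.linalg.norm(r_ij)
            if Rc <= d <= Rp:
                a += Q[i] * Q[j] * r_ij / d**3
        return a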

2.3 FSS (Fish School Search)

The Fish School Search (FSS) is an optimization algorithm based on the gre-
garious behavior of oceanic fish. It was firstly proposed by Bastos-Filho et al in
2008 [4]. In the FSS, each fish represents a solution for the problem. The success
of a fish during the search process is indicated by its weight. The FSS has four
operators, which are executed for each fish of the school at each iteration: (i)
individual movement, which is responsible for local search stepind ; (ii) feeding,
which updates the fish weights indicating the degree of success or failure during
546 G.M. Cavalcanti-Júnior et al.

the search process so far; (iii) collective-instinctive movement, which makes all
fish moves toward a resultant direction; and (iv) collective-volitive movement,
which controls the granularity of the search. In this paper, as we are dealing
with dynamic environments, only the feeding and collective-volitive movement
operators are used to build the proposed hybrid algorithm.

Feeding operator
The feeding operator determines the variation of the fish weight at each iteration.
One should notice that a fish can increase or decrease its weight depending,
respectively, on the success or failure during the search process. The weight of
the fish is evaluated according to the following equation:

$W_i(t+1) = W_i(t) + \dfrac{\Delta f_i}{\max(|\Delta f|)}$,   (5)

where $W_i(t)$ is the weight of fish i, $\Delta f_i$ is the variation of the fitness function
between the new position and the current position of the fish, and $\max(|\Delta f|)$ is
the absolute value of the greatest fitness variation among all fish. There is a
parameter $w_{scale}$ that limits the maximum weight of the fish. The weight of
each fish can vary between 1 and $w_{scale}$ and has an initial value equal to $w_{scale}/2$.

Collective-volitive movement operator


This operator controls the granularity of the search executed by the fish school.
When the whole school is achieving better results, the operator approximates the
fish aiming to accelerate the convergence toward a good region. On the contrary,
the operator spreads the fish away from the barycenter of the school and the fish
have more chances to escape from a local minimum. The fish school expansion or
contraction is applied as a small drift to every fish position regarding the school
barycenter, which can be evaluated as shown below:

$\vec{B}(t) = \dfrac{\sum_{i=1}^{N} \vec{x}_i(t)\, W_i(t)}{\sum_{i=1}^{N} W_i(t)}$.   (6)

We use equation (7) to perform the fish school expansion (use sign +) or
contraction (use sign −):

$\vec{x}_i(t+1) = \vec{x}_i(t) \pm step_{vol}\, r_1\, \dfrac{\vec{x}_i(t) - \vec{B}(t)}{d(\vec{x}_i(t), \vec{B}(t))}$,   (7)


where $r_1$ is a number randomly generated in the interval [0, 1] and $d(\vec{x}_i, \vec{B})$ evaluates
the Euclidean distance between particle i and the barycenter. $step_{vol}$ is called the
volitive step and controls the step size of the fish. The $step_{vol}$ is bounded by two
parameters ($step_{vol\,min}$ and $step_{vol\,max}$) and decreases linearly from $step_{vol\,max}$
to $step_{vol\,min}$ along the algorithm iterations. It helps the algorithm to initialize
with an exploration behavior and change dynamically to an exploitation behavior.
behavior.

3 Volitive PSO

This section presents the proposed algorithm, called Volitive PSO, which is a
hybridization of the FSS and the PSO algorithms. Our proposal is to include two
FSS operators in the Inertia PSO, the feeding and the collective-volitive move-
ment. In the Volitive PSO, each particle becomes a weighted particle, where the
weight is used in the collective-volitive movement, resulting in expansion
or contraction of the school. In our proposal, the stepvol does not decrease
linearly; it decreases according to equation (8). The parameter volitive step decay
percentage (decayvol) must be in the interval [0, 100]:

$step_{vol}(t+1) = step_{vol}(t)\, \dfrac{100 - decay_{vol}}{100}$.   (8)
The stepvol is reinitialized to stepvol max when a change in the environment is
detected. We use a sentry particle [9] to detect these changes. The fitness of the
sentry particle is evaluated in the end of each iteration and in the beginning of
the next iteration. The Algorithm 1.1 shows the Volitive PSO pseudocode.

Algorithm 1.1: Volitive PSO pseudocode


Initialize parameters and particles;
while the stop condition is not reached do
    foreach particle of the swarm do
        Evaluate the fitness of the particle;
        Evaluate Pbest and Lbest;
    end
    if an environment change is detected then
        Initialize stepvol;
    end
    foreach particle of the swarm do
        Update the velocity and the position of the particle;
        Evaluate the fitness of the particle;
    end
    Execute feeding operator;
    Execute collective-volitive movement operator;
    foreach particle of the swarm do
        Evaluate Pbest and Lbest;
    end
    Update stepvol and w;
end
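
The volitive-step schedule of Eqn. (8), together with the sentry-based reset described above, can be sketched as follows; the function signature and the way the environment change is signalled are assumptions of this sketch.

    def update_step_vol(step_vol, decay_vol, step_vol_max, env_changed):
        """Eqn. (8) plus reset: reinitialize to step_vol_max when the sentry particle
        detects an environment change, otherwise apply the percentage decay."""
        if env_changed:
            return step_vol_max
        return step_vol * (100.0 - decay_vol) / 100.0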

4 Simulation Setup
In this section we present the benchmark function, the metric to measure the qual-
ity of the algorithms and the values for the parameters used in the simulations.

4.1 Benchmark Function

We used the DF1 benchmark function proposed by Morrison and De Jong [10] in our
simulations. DF1 is composed of a set of random peaks with different heights
and slopes. The number of peaks, their heights, slopes, and positions within the
search space are adjustable. The function for an N-dimensional space is defined
according to equation (9):

$f(\vec{x}) = \max_{i=1,2,\dots,P} [H_i - S_i\, \|\vec{x} - \vec{x}_i\|^2]$,   (9)

where P is the number of peaks (peak i is centered at position $\vec{x}_i$), $H_i$ is the
peak height and $S_i$ is the peak slope. The values for $x_{id}$, $H_i$ and $S_i$ are bounded.
The dynamic components of the environment are updated using discrete
steps. The DF1 uses a logistic function to control the generation of different
step sizes. The parameter used to calculate the steps is adjusted according to
equation (10):

$e_i = r\, e_{i-1}\,[1 - e_{i-1}]$,   (10)

where r is a constant in the interval [1, 4]. As r increases, more simultaneous
results for e are achieved. As r gets closer to 4, the behavior becomes chaotic.
The dynamics of the environment is specified using the following parameters:
$N_{peaks}$ is the number of peaks in motion; $r_h$ is the r value for height dynamics;
$r_s$ is the r value for slope dynamics; $r_{x_d}$ is the r value for position dynamics in
dimension d. It is necessary to have a scaling factor for each r value.
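
A hedged sketch of the DF1 landscape of Eqn. (9) and of the logistic step generator of Eqn. (10) follows; the peak-data layout (arrays H, S and centers) and the use of the squared distance exactly as printed in Eqn. (9) are assumptions of this sketch.

    import numpy as np

    def df1(x, H, S, centers):
        """Eqn. (9): fitness at x is the value of the highest peak, where H[i], S[i]
        and centers[i] are the height, slope and center of peak i."""
        d2 = np.sum((centers - x) ** 2, axis=1)
        return np.max(H - S * d2)

    def logistic_step(e_prev, r):
        """Eqn. (10): generates the step sizes; becomes chaotic as r approaches 4."""
        return r * e_prev * (1.0 - e_prev)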

4.2 Performance Metric

The mean fitness metric was introduced by Morrison [11]. He argued that a
representative performance metric to measure the quality of an algorithm in a
dynamic environment should reflect the performance of the algorithm across the
entire range of environment dynamics. The mean fitness is the average over all
previous fitness values, as defined below:
T
t=1 Fbest (t)
Fmean (T ) = , (11)
T
where T is the total number of iterations and Fbest is the fitness of the best
particle after iteration t. The advantage of the mean fitness is that it represents
the entire algorithm performance history.
We also used the collective mean fitness [11], that is simply the average value
of the mean fitness at the last iteration over a predefined number of trials.

4.3 Parameters Settings

All results presented in this paper are the average values after 30 trials. We
used 10,000 iterations for all algorithms. We performed the experiments in two
situations: (i) 10 dimensions and 10 peaks and (ii) 30 dimensions and 30 peaks.

In this paper, only the peak positions are varied along the iterations. The heights
and slopes of the peaks were initialized randomly within the predefined interval.
The parameters used for the DF1 function are Hbase = 40, Hrange = 20, Hscale =
0.5, rh = 3.2, Sbase = 1, Srange = 7, Sscale = 0.5, rs = 1.2, xbase id = −10,
xrange id = 20, xscale id = 0.7, rxd = 3.2.
For all PSO algorithms, we used 50 particles, local topology, c1 and c2 equal
to 1.494 [12] and w decreasing linearly from 0.9 to 0.4 along 100 iterations. We
set up w = 0.9 every time an environment change is detected. We chose the
local topology since it helps to avoid premature convergence to a local optimum,
which is good for optimization in dynamic environments. The Charged PSO was
tested empirically with 30%, 50% and 70% of charged particles, and for Q = 4,
Q = 8, Q = 12 and Q = 16. In both scenarios, the best results were achieved
for 30% of charged particles and Q = 12. Hence, these values were used. For
the FSS, we used 50 fish, Wscale = 500, initial and final individual step equal to
2% and 0.01%, and initial and final volitive step equal to 40% and 0.1%. stepind
and stepvol decreases linearly along 100 iterations and are reinitialized when a
change in environment occurs. For the Volitive PSO, we used wscale = 500, and
stepvol min = 0.01%.

5 Results
5.1 Analysis of the Parameters
This section presents an analysis of the influence of the parameters decayvol
and stepvol max in the performance of the Volitive PSO. As preliminary re-
sults showed that the algorithm is more sensible to the decayvol parameter and
high values for decayvol do not present good performance, we tested the follow-
ing decayvol values: 0%, 10% and 25%. For each decayvol value, we varied the
stepvol max value and the box plots of the mean fitness at the last iteration are
shown in the Figure 1.
For the case 1 (10 dimensions and 10 peaks), the average mean fitness for dif-
ferent stepvol max are not so different (as shown in Figures 1(a), 1(c) and 1(e)).
However, slightly better results can be observed for decayvol = 10%. Neverthe-
less, for the case 2 (30 dimensions and 30 peaks), the best results were achieved
for decayvol equal to 0%. It indicates that it is better not to diminish the stepvol for
spaces with higher dimensionality. The best results for the case 2 were achieved
when stepvol max = 40% and decayvol = 0%. Hence, we used these values for the
comparison presented in the next sub-section.

5.2 Comparison with Other Approaches


In this section we present a brief performance comparison among the Volitive
PSO, Inertia PSO, Restart PSO (simply reinitialize the particles when a change
in the environment is detected), Charged PSO and FSS. Figure 2 depicts the
average values of fitness for each algorithm. As can be seen, the Volitive PSO
achieved better results in average than the other algorithms in both cases.

(a) decayvol = 0%, 10d and 10 peaks. (b) decayvol = 0%, 30d and 30 peaks.

(c) decayvol = 10%, 10d and 10 peaks. (d) decayvol = 10%, 30d and 30 peaks.

(e) decayvol = 25%, 10d and 10 peaks. (f) decayvol = 25%, 30d and 30 peaks.

Fig. 1. Analysis of the parameters decayvol and stepvol max of the Volitive PSO
algorithm

(a) 10 dimensions and 10 peaks. (b) 30 dimensions and 30 peaks.

Fig. 2. Comparative evolution of the algorithms on the DF1 function

Table 1. Collective Mean Fitness - Average (standard deviation) after 10, 000
iterations

(a) 10 dimensions and 10 peaks. (b) 30 dimensions and 30 peaks.


PSO 39.207 (6.533) PSO 24.827 (5.486)
Restart PSO 46.528 (5.590) Restart PSO 32.493 (6.088)
Charged PSO 42.249 (4.542) Charged PSO 22.039 (6.965)
FSS 31.032 (9.742) FSS 20.192 (7.340)
Volitive PSO 47.168 (4.517) Volitive PSO 41.854 (4.521)

Table 1 shows the collective mean fitness (and standard deviation in parenthe-
ses) after 10,000 iterations. One can observe that the Volitive PSO also achieved
the lowest standard deviation in both cases.

6 Conclusion
In this paper we proposed a hybrid FSS-PSO algorithm for dynamic optimiza-
tion. We showed that the collective-volitive movement operator applied to the
PSO can help to maintain diversity when the search space is varying over
time, without reducing the exploitation capability. Some preliminary results
showed that the volitive step must not decay quickly. It indicates the impor-
tant role of the FSS operator in generating diversity after environmental changes.
Further research includes a deeper analysis of the Volitive PSO and more tests
varying the peaks' heights and slopes. Also, we intend to analyze the dynamics of
the swarm within the search space.

Acknowledgments
The authors acknowledge the financial support from CAPES, CNPq and Uni-
versity of Pernambuco for scholarships, support and travel grants.

References
1. Blackwell, T.M., Bentley, P.J.: Dynamic Search with Charged Swarms. In: Proceed-
ings of the Genetic and Evolutionary Computation Conference, pp. 19–26 (2002)
2. Rakitianskaia, A., Engelbrecht, A.P.: Cooperative charged particle swarm opti-
miser. In: Congress on Evolutionary Computation, CEC 2008, pp. 933–939 (June
2008)
3. Nickabadi, A., Ebadzadeh, M.M., Safabakhsh, R.: Evaluating the performance of
DNPSO in dynamic environments. In: IEEE International Conference on Systems,
Man and Cybernetics, pp. 2640–2645 (October 2008)
4. Bastos-Filho, C.J.A., Neto, F.B.L., Lins, A.J.C.C., Nascimento, A.I.S., Lima, M.P.:
A novel search algorithm based on fish school behavior. In: IEEE International
Conference on Systems, Man and Cybernetics, pp. 2646–2651. IEEE, Los Alamitos
(October 2009)
5. Bastos-Filho, C.J.A., Neto, F.B.L., Sousa, M.F.C., Pontes, M.R.: On the Influence
of the Swimming Operators in the Fish School Search Algorithm. In: SMC, pp.
5012–5017 (October 2009)
6. Bastos-Filho, C.J.A., de Lima Neto, F.B., Lins, A.J.C.C., Nascimento, A.I.S., Lima,
M.P.: Fish school search. In: Chiong, R. (ed.) Nature-Inspired Algorithms for Op-
timisation. SCI, vol. 193, pp. 261–277. Springer, Heidelberg (2009)
7. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of IEEE
international conference on neural networks, vol. 4, pp. 1942–1948 (1995)
8. Shi, Y., Eberhart, R.: A modified particle swarm optimizer. In: The 1998 IEEE
International Conference on Evolutionary Computation Proceedings, IEEE World
Congress on Computational Intelligence, pp. 69–73 (1998)
9. Carlisle, A., Dozier, G.: Applying the particle swarm optimizer to non-stationary
environments. Phd thesis, Auburn University, Auburn, AL (2002)
10. Morrison, R.W., Jong, K.A.D.: A test problem generator for non-stationary envi-
ronments. In: Proc. of the 1999 Congr. on Evol. Comput., pp. 2047–2053 (1999)
11. Morrison, R.W.: Performance Measurement in Dynamic Environments. In:
GECCO Workshop on Evolutionary Algorithms for Dynamic Optimization Prob-
lems, pp. 5–8 (2003)
12. Eberhart, R.C., Shi, Y.: Particle Swarm Optimization: Developments, Applications
and Resources. In: Proceedings of the IEEE Congress on Evolutionary Computa-
tion, CEC 2001 (2001)
Feeding the Fish – Weight Update Strategies
for the Fish School Search Algorithm

Andreas Janecek and Ying Tan

Key Laboratory of Machine Perception (MOE), Peking University


Department of Machine Intelligence, School of Electronics Engineering and
Computer Science, Peking University, Beijing, 100871, China
[email protected], [email protected]

Abstract. Choosing optimal parameter settings and update strategies


is a key issue for almost all population based optimization algorithms
based on swarm intelligence. For state-of-the-art optimization algorithms
the optimal parameter settings and update strategies for different prob-
lem sizes are well known.
In this paper we investigate and compare different newly developed
weight update strategies for the recently developed Fish School Search
(FSS) algorithm. For this algorithm the optimal update strategies have
not been investigated so far. We introduce a new dilation multiplier as
well as different weight update steps where fish in poor regions lose
weight more quickly than fish in regions with a lot of food. Moreover,
we show how a simple non-linear decrease of the individual and volitive
step parameters is able to significantly speed up the convergence of FSS.

1 Introduction
The Fish School Search (FSS) algorithm [1, 2, 3] is a recently developed swarm
intelligence algorithm based on the social behavior of schools of fish. By living
in swarms, the fish improve survivability of the whole group due to mutual pro-
tection against enemies. Moreover, the fish perform collective tasks in order to
achieve synergy (e.g. finding locations with lots of food). Comparable to real
fish that swim in the aquarium in order to find food, the artificial fish search
(swim) the search space (aquarium) for the best candidate solutions (locations
with most food ). The location of each fish represents a possible solution to the
problem – comparable to locations of particles in Particle Swarm Optimization
(PSO, [4]). The individual success of a fish is measured by its weight – conse-
quently, promising areas can be inferred from regions where bigger ensembles of
fish are located. As for other heuristic search algorithms we consider the prob-
lem of finding a “good” (ideally the global) solution of an optimization problem
with bound constraints of the form $\min_{x \in \Omega} f(x)$, where $f : \mathbb{R}^N \to \mathbb{R}$ is a non-
linear objective function and Ω is the feasible region. Since we do not assume
that f is convex, f may possess many local minima. Solving such tasks for high
dimensional real world problems may be expensive in terms of runtime if exact
algorithms were used. Various nature inspired algorithms have shown to be able


to perform well despite these difficulties. Even though these algorithms are only
meta-heuristics, i.e. there is no proof that they reach the global optimum of the
solution, these techniques often achieve a reasonably good solution for the given
task at hand in a reasonable amount of time.
Related work. The FSS algorithm was introduced to the scientific community
in 2008 [1]. This paper was extended to a book chapter [2] where FSS has been
evaluated and compared to different variants of PSO. Results indicate that FSS is
able to achieve better results than PSO on several benchmark functions, especially
on multimodal functions with several local minima. In another study [3] the same
authors analyzed the importance of the swimming operators of FSS and showed
that all operators have strong influences on the results. Although for some bench-
marks the individual operator alone sometimes produced better results than all
operators together, the results using only the individual operator are highly sen-
sitive to the initial and also final values of stepind and stepvol . Moreover, it was
shown that a rather large initial value for stepind (stepind initial = 10%) generally
achieved the best results. In a very recent study FSS has been used successfully
to initialize the factors of the non-negative matrix factorization (NMF) [5].
In this work we aim at investigating the influence of newly developed weight
update strategies for FSS as well as the influence of a non-linear decrease of
the step-size parameters stepind and stepvol . We introduce and compare weight
update strategies based on a linear decrease of weights, as well as a fitness based
weight decrease strategy. Moreover, we introduce a combination of (i) this fit-
ness based weight decrease strategy, (ii) the non-linear decrease of the step-size
parameters, and (iii) a newly introduced dilation multiplier which breaks the
symmetry between contraction and dilation but can be useful in some situa-
tions to escape from local minima. Experimental evaluation performed on five
benchmark functions shows that especially the non-linear decrease of the step-
size parameters is an effective and efficient way to significantly speed up the
convergence of FSS and also to achieve better fitness per iteration results.

2 The Fish School Search Algorithm


FSS is based on four operators which can be grouped into two classes: feeding
and swimming. Feeding represents updating the weight of the fish based on the
successfulness of the current movement. The swimming operators (individual
movement, collective instinctive movement, and collective volitive movement)
move the fish according to the feeding operator. FSS is closely related to PSO and
other population based algorithms such as Genetic Algorithms [6], Differential
Evolution [7], and the Firework Algorithm [8]. The main difference compared to
PSO is that no global variables need to be logged in FSS. Some similarities and
differences of FSS to other population based algorithms are given in [2].
FSS operators. In the following we briefly review the basic operators of the
Fish School Search algorithm as presented in [3]. A pseudo code of the FSS
algorithm can also be found in [3]. The algorithm starts with all fish initialized
at random positions and equal weight wi (0) set to 1.

A. Individual movement: In each iteration, each fish randomly chooses a


new position which is determined by adding to each dimension j of the current
position a random number multiplied by a predetermined step (stepind ).

nj (t) = xj (t) + randu(−1, 1) ∗ stepind , (1)

where randu(−1, 1) is a random number from a uniform distribution in the


interval [−1, 1]. The movement only occurs if the new position n has a better
fitness than the current position x, and if n lies within the aquarium boundaries.
Fitness difference (Δf ) and displacement (Δx) are evaluated according to

Δf = f (n) − f (x), (2)

Δx = n − x. (3)
If no individual movement occurs Δf = 0 and Δx = 0. The parameter stepind
decreases linearly during the iterations
$step_{ind}(t+1) = step_{ind}(t) - \dfrac{step_{ind\,initial} - step_{ind\,final}}{\text{number of iterations}}$.   (4)

B. Feeding: Fish can increase their weight depending on the success of the
individual movement according to

wi (t + 1) = wi (t) + Δf (i)/max(Δf ), (5)

where wi (t) is the weight of fish i, Δf (i) is the difference of the fitness at current
and new location, and max(Δf ) is the maximum Δf of all fish. An additional
parameter wscale limits the weight of a fish (1 <= wi <= wscale ).
C. Collective instinctive movement: After all fish have moved individually, a
weighted average of individual movements based on the instantaneous success of
all fish is computed. All fish that successfully performed individual movements
influence the resulting direction of the school movement (i.e. only fish whose
Δx != 0 influence the direction). The resulting direction m(t) is evaluated by
$\vec{m}(t) = \dfrac{\sum_{i=1}^{N} \Delta x_i\, \Delta f_i}{\sum_{i=1}^{N} \Delta f_i}$.   (6)

Then, all fish of the school update their positions according to m(t)

xi (t + 1) = xi (t) + m(t). (7)

D. Collective volitive movement: This movement is either a contraction of


the swarm towards the barycenter of all fish, or a dilation of the swarm away from
the barycenter, depending on the overall success rate of the whole school of fish.
If the overall weight increased after the individual movement step, the radius of
the fish school is contracted in order to increase the exploitation ability, else the

radius of the fish school is dilated in order to cover a bigger area of the search
space. First, the barycenter b (center of mass/gravity) needs to be calculated
$\vec{b}(t) = \dfrac{\sum_{i=1}^{N} x_i\, w_i(t)}{\sum_{i=1}^{N} w_i(t)}$.   (8)

When the total weight of the school increased in the current iteration, all fish
must update their location according to

$x(t+1) = x(t) - step_{vol}\, randu(0,1)\, \dfrac{x(t) - b(t)}{distance(x(t), b(t))}$,   (9)

when the total weight decreased in the current iteration the update is

$x(t+1) = x(t) + step_{vol}\, randu(0,1)\, \dfrac{x(t) - b(t)}{distance(x(t), b(t))}$,   (10)

where distance() is a function which returns the Euclidean distance between


x and b, and stepvol is a predetermined step used to control the displacement
from/to the barycenter. As suggested in [3] we set stepvol = 2 ∗ stepind .
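
To make the interplay of the four operators concrete, a compact sketch of one FSS iteration (for a minimization problem) is given below; the bound handling is omitted, and the epsilon guards and the convention that Δf is taken as the positive improvement are assumptions of this sketch rather than details fixed in [3].

    import numpy as np

    def fss_iteration(f, X, W, step_ind, step_vol, wscale, rng=np.random):
        N, D = X.shape
        # A. individual movement, Eqn. (1): accept only improving moves
        candidates = X + rng.uniform(-1.0, 1.0, (N, D)) * step_ind
        f_old = np.array([f(x) for x in X])
        f_new = np.array([f(x) for x in candidates])
        improved = f_new < f_old
        delta_f = np.where(improved, f_old - f_new, 0.0)          # positive improvement
        delta_x = np.where(improved[:, None], candidates - X, 0.0)
        X = np.where(improved[:, None], candidates, X)
        # B. feeding, Eqn. (5)
        total_weight_before = W.sum()
        W = np.clip(W + delta_f / (delta_f.max() + 1e-12), 1.0, wscale)
        # C. collective instinctive movement, Eqns. (6)-(7)
        if delta_f.sum() > 0:
            m = (delta_x * delta_f[:, None]).sum(axis=0) / delta_f.sum()
            X = X + m
        # D. collective volitive movement, Eqns. (8)-(10)
        b = (X * W[:, None]).sum(axis=0) / W.sum()
        diff = X - b
        direction = diff / (np.linalg.norm(diff, axis=1, keepdims=True) + 1e-12)
        sign = -1.0 if W.sum() > total_weight_before else +1.0    # contract on success
        X = X + sign * step_vol * rng.uniform(0.0, 1.0, (N, 1)) * direction
        return X, W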

3 New Update Strategies


On the one hand, we apply new weight update strategies that aim at adjusting
the weight of each fish in each iteration (S1, S2 ). On the other hand, we intro-
duce a non-linear update to the step-size parameters stepind and stepvol (S3 ).
Finally, S4 is a combination of S2, S3, and an additional parameter.


Fig. 1. Linear and non-linear decrease of step ind and step vol

• S1 (weight update) - linear decrease of weights: Here, the weights of all fish
are decreased linearly in each iteration by a pre-defined factor Δ lin such
that after the weight update in Eqn. (5) the weight of all fish is reduced by
wi = wi − Δ lin , and all weights smaller than 1 are set to 1.
• S2 (weight update) - fitness based decrease of weights: Here, not all fish will
have their weights diminished by the same factor; instead, fish in poor regions
will lose weight more quickly. If f(x) is a vector containing the fitness values
of all fish at their current location, the weight of the fish will be decreased by
$\Delta f_{fit\,based} = normalize(f(x))$, where normalize() is a function that scales
f(x) into the range [0, 1]. Experiments showed that in order to get good results
$\Delta f_{fit\,based}$ needs to be scaled by a constant $c_{fit}$ (between 3 and 5), which is
done by $\Delta f_{fit\,based} = (\Delta f_{fit\,based}^{\,2})/c_{fit}$. Finally the weights are updated
by (11), and weights smaller than 1 are set to 1 (see also the sketch after this list).
$w_i = w_i - \Delta f_{fit\,based}$   (11)
• S3 (step size update) - non-linear decrease of stepind and stepvol : S3 im-
plements a non-linear decrease of the step size parameters which is based
on the shape of an ellipse (see Fig. 1). The motivation for this non-linear
decrease is that the algorithm is forced to converge earlier to the (ideally
global) minimum and has more iterations to search the area around the op-
timum in more detail. The bold curve in Fig. 1 shows the new non-linear
step parameter stepind nonlin , while the dotted line (“c”) shows the behavior
when stepind is decreased linearly. Remember that stepvol = 2 ∗ stepind .
Let a be the number of iterations, let b be the distance between stepind initial
and stepind f inal , and let t be the number of the current iteration. In each
iteration stepind nonlin (t) is calculated by
 
$step_{ind\,nonlin}(t) = step_{ind\,initial} - \sqrt{(1 - t^2/a^2)\cdot b^2}$,   (12)
which is derived from the canonical ellipse equation $x^2/a^2 + y^2/b^2 = 1$ (where
x is replaced with t, and y is replaced with $step_{ind\,nonlin}(t)$).
• S4 - combination of S2, S3 and a dilation multiplier: This strategy com-
bines S2 and S3 with a newly introduced dilation multiplier cdil that allows
to cover a bigger area of the search space when a dilation occurs in the col-
lective volitive movement (i.e. when the total weight of the school decreased
in the current iteration). The general idea behind the dilation multiplier is
to help the algorithm to jump out of local minima. S2 and S3 are applied
in every iteration, and, moreover, in case of a dilation all weights are reset
to their initial weight (w(t) = 1). A pseudo code of S4 follows
while stop criterion is not met do
    apply (in this order) Eqns. (1) (2) (3) (11) (12) (6) (7) (8);
    if (w(t) > w(t − 1)) then
        Eqn. (9);
    else
        w(t) = 1;
        x(t + 1) = x(t) + cdil ∗ stepvol ∗ randu(0, 1) ∗ (x(t) − b(t)) / distance(x(t), b(t))
    end
end
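
A hedged sketch of the weight updates S1 and S2 and of the dilation step used in S4 is given below (S3 simply replaces the linear decay of stepind by the schedule of Eqn. (12)); the normalization used for S2, the default parameter values and the assumption that fitness is minimized are choices made for this sketch, not details taken from the text.

    import numpy as np

    def s1_linear_weight_decay(W, delta_lin=0.05):
        """S1: every fish loses the same amount of weight, floored at 1."""
        return np.maximum(W - delta_lin, 1.0)

    def s2_fitness_based_weight_decay(W, fitness, c_fit=4.0):
        """S2: fish with poorer (larger) fitness values lose weight more quickly."""
        span = fitness.max() - fitness.min() + 1e-12
        norm = (fitness - fitness.min()) / span                      # scale to [0, 1]
        return np.maximum(W - (norm ** 2) / c_fit, 1.0)

    def s4_dilation(X, W, b, step_vol, c_dil=5.0, rng=np.random):
        """S4: when the school's total weight decreased, dilate by c_dil * step_vol
        away from the barycenter b and reset all weights to 1."""
        diff = X - b
        direction = diff / (np.linalg.norm(diff, axis=1, keepdims=True) + 1e-12)
        X = X + c_dil * step_vol * rng.uniform(0.0, 1.0, (len(X), 1)) * direction
        return X, np.ones_like(W)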

4 Experimental Setup
Table 1 shows the benchmark functions used for minimization in this paper, as
well as the search space and the optimum point for each function. The initializa-
tion subspace was chosen to be in the interval [up/2, up], where up is the upper

Table 1. Benchmark test functions and function parameters

FAckley(x) = $-20\exp\!\big(-0.2\sqrt{\tfrac{1}{D}\sum_{i=1}^{D} x_i^2}\big) - \exp\!\big(\tfrac{1}{D}\sum_{i=1}^{D} \cos(2\pi x_i)\big) + 20 + e$;
    search space: $-32 \le x_i \le 32$; optimum: $0.0^D$
FGriewank(x) = $\sum_{i=1}^{D} \tfrac{x_i^2}{4000} - \prod_{i=1}^{D} \cos\!\big(\tfrac{x_i}{\sqrt{i}}\big) + 1$;
    search space: $-600 \le x_i \le 600$; optimum: $0.0^D$
FRastrigin(x) = $10D + \sum_{i=1}^{D} \big(x_i^2 - 10\cos(2\pi x_i)\big)$;
    search space: $-5.12 \le x_i \le 5.12$; optimum: $0.0^D$
FRosenbrock(x) = $\sum_{i=1}^{D-1} \big(100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2\big)$;
    search space: $-30 \le x_i \le 30$; optimum: $1.0^D$
FSphere(x) = $\sum_{i=1}^{D} x_i^2$;
    search space: $-100 \le x_i \le 100$; optimum: $0.0^D$

limit of the search space for each function (similar to [3]). We used the same
settings as in [3]: 5 000 iterations, 30 dimensions, and 30 fish, leading to 300 000
function evaluations. We performed 15 trials per function. For all experiments
stepind initial was set to 10% of up, and stepind final was set to 0.001% of up.

5 Evaluation
In this section we evaluate the update strategies introduced in Section 3. First,
we discuss each strategy separately and focus especially on fitness per iteration
aspects, i.e. how many iterations are needed in order to achieve a given fitness.
Later we compare the best results achieved by all update strategies to each other
and discuss the increase of computational cost caused by the update strategies.
In all figures basic FSS refers to the basic FSS algorithm as presented in [3].
S1 - linear decrease of weights. The results for strategy S1 are shown in
Fig. 2. The results are shown for four different values of Δlin (abbreviated as
"Δ lin 0.0XXX" in the figure) ranging from 0.0125 to 0.075. Subplots (B) to
(F) show the fitness per iteration for the five benchmark functions, and Subplot
(A) shows the average (mean) weight of all fish per iteration, which decreases
with increasing Δlin. Obviously, in most cases S1 is not able to improve the
results, but for the Rastrigin function the final result after 5 000 iterations can
be clearly improved when Δlin is set to 0.075 (and partly also for 0.05). However,
S2 - fitness based decrease of weights. The results for strategy S2 are
shown in Fig. 3. Subplot (A) shows again the average (mean) weight of all fish
per iteration – the average weight is very similar to Subplot (A) of Fig. 2. The
parameter cf it that scales the decrease of the weights is abbreviated as “cf it X”.
The results for the Rastrigin function are even better than for the strategy S1

Fig. 2. S1 - linear decrease of weights. Panels: (A) average weight of all fish per iteration; (B)–(F) fitness per iteration on the Ackley, Griewank, Rastrigin, Rosenbrock and Sphere functions for basic FSS and Δlin ∈ {0.025, 0.05, 0.075}.


Fig. 3. S2 - fitness based decrease of weights. Panels: (A) average weight of all fish per iteration; (B)–(F) fitness per iteration on the Ackley, Griewank, Rastrigin, Rosenbrock and Sphere functions for basic FSS and cfit ∈ {3, 4, 5}.



Fig. 4. S3 - non-linear decrease of stepind and stepvol. Panels: (A) behavior of stepind; (B)–(F) fitness per iteration on the Ackley, Griewank, Rastrigin, Rosenbrock and Sphere functions for basic FSS, the interpolated variant and the non-linear variant.

Fig. 5. S4 - combination of S2, S3 and dilation multiplier. Panels: (A) average weight of all fish per iteration; (B)–(F) fitness per iteration on the Ackley, Griewank, Rastrigin, Rosenbrock and Sphere functions for basic FSS, the non-linear variant of S3, cdil = 4 and cdil = 5.



Table 2. Comparison of mean value and standard deviation (in small font under the mean
value) for 15 trials after 5 000 iterations for the five benchmark functions. The best
results are highlighted in bold. Last row: computational cost.

Function basic FSS S1 S2 S3 S4


FAckley (x) 0.0100 0.0100 0.1270 0.0007 0.0007
0.0019 0.0023 0.0043 6.7e-05 5.3e-05

FGriewank (x) 0.0233 0.0172 0.7501 0.0058 3.2e-05


0.0098 0.0061 0.1393 0.0048 5.7e-06

FRastrigin (x) 70.443 36.879 30.745 67.126 48.156


19.465 8.0181 11.801 15.834 13.780

FRosenbrock (x) 27.574 28.498 26.277 22.775 23.718


1.2501 1.3876 2.3244 2.5801 2.5353

FSphere (x) 0.0699 0.0649 0.1165 3.4e-04 3.4e-04


0.0183 0.0237 0.0615 7.26e-05 6.59e-05

Runtime 1.0 × 1.0017 × 1.0042 × 1.0142 × 1.0238 ×

and also the results for the Rosenbrock function could be improved slightly. As
Table 2 indicates, this strategy achieves the best final result for the Rastrigin
function of all strategies after 5 000 iterations. For the other benchmark functions
this strategy performs equally well as or worse than basic FSS.
S3 - non-linear decrease of stepind and stepvol. S3 results are shown in Fig. 4,
where stepnonlinear (t) is abbreviated as “non-linear”. “Interpolated” shows the
results using an interpolation of “basic FSS” and “non-linear”, i.e. stepinterpol (t)
= stepind (t) - [stepind (t) − stepnonlinear (t)] /2. Subplot (A) shows the behavior
of stepind and should be compared to Fig. 1. The results indicate that this non-
linear decrease of the step-size parameters significantly improves the fitness per
iteration for all five benchmark functions. Generally, “non-linear” achieves the
best results, followed by “interpolated”. For some functions, such as (D) or (E),
this strategy needs only about half as many iterations as basic FSS to achieve
almost the same results as basic FSS after 5 000 iterations.
S4 - combination of S2, S3 and dilation multiplier. The results for stra-
tegy S4 are shown in Fig. 5 and are compared to basic FSS and “non-linear”
from strategy S3 . The dilation multiplier cdil is abbreviated as “cdil X”. Since
the weight of all fish is reset to 1 if a dilation occurs, the average (mean) weight
per iteration is relatively low (see Subplot (A)). Generally, this strategy achieves
similar results as strategy S3 , but clearly improves the results for the Rastrigin
function and achieves a better final result after 5 000 iterations and also better
fitness per iteration for “cdil 5”.
Comparison of final results. Table 2 shows a comparison of the mean values
and the standard deviations after 5 000 iterations. As can be seen, the results for
all five benchmark functions could be improved. Overall, strategy S4 achieves
the best results, followed by S3. S1 and S2 are better than or equal to basic FSS
for 4 out of 5 benchmark functions.

Computational cost. The last row of Table 2 shows the increase in computa-
tional cost caused by the additional computations of the update steps. Example:
the runtime for S1 is 1.0017 times as long as the runtime for basic FSS. This
indicates that the increase in runtime is only marginal and further motivates the
utilization of the presented update steps.

6 Conclusion
In this paper we presented new update strategies for the Fish School Search algo-
rithm. We investigated the influence of newly developed weight update strategies
as well as the influence of a non-linear decrease of the step-size parameters stepind
and stepvol . Results indicate that strategies S3 and S4 are able to significantly
improve the fitness per iteration for all benchmark functions and also achieve
better final results after 5 000 iterations when compared to the basic implemen-
tation of FSS. The results motivate for further research on update strategies for
FSS and also for adapting the non-linear decrease of the step size parameters
for other search heuristics.

Acknowledgments. This work was supported by National Natural Science


Foundation of China (NSFC), Grant No. 60875080. Andreas wants to thank the
Erasmus Mundus External Coop. Window, Lot 14 (2009-1650/001-001-ECW).

Density as the Segregation Mechanism in Fish School
Search for Multimodal Optimization Problems

Salomão Sampaio Madeiro, Fernando Buarque de Lima-Neto,
Carmelo José Albanez Bastos-Filho, and Elliackin Messias do Nascimento Figueiredo

Polytechnic School of Pernambuco, University of Pernambuco,
52720-001, Recife-PE, Brazil
{ssm,fbln*,cjabf,emnf}@ecomp.upe.br

Abstract. Methods to deal with Multimodal Optimization Problems (MMOP)
can be classified into three main approaches, regarding the number and the type
of desired solutions. In general, methods can be applied to find: (1) only one
global solution; (2) all global solutions; and (3) all local solutions of a given
MMOP. The simultaneous capture of several solutions of MMOPs without
parameter adjustment is still an open question in optimization problems. In this
article, we discuss a density segregation mechanism for Fish School Search to
enable the simultaneous capture of multiple optimal solutions of MMOPs with one
single parameter. The new proposal is based on the vanilla version of the Fish School
Search (FSS) algorithm, which is inspired by actual fish school behavior. The
performance of the new algorithm is evaluated and compared to the performance
of other methods such as NichePSO and Glowworm Swarm Optimization (GSO)
for seven well-known benchmark functions of two dimensions. According to
the obtained results, presented in this article, the new approach outperforms the
algorithms NichePSO and GSO for all benchmark functions.

1 Introduction
Multimodal Optimization Problems (MMOP) occur in various fields including
geophysics [1], electromagnetism [2], climatology [3] and logistics [4, 5], among
others. Finding more than one optimal solution of MMOPs can be useful for
two main reasons [6, p. 88]: (1) to provide insights into the function landscape; and (2) to
allow the selection of alternative solutions, e.g. when the dynamic nature of constraints in
the search space makes a previously optimal solution infeasible.
Several methods based on computational models inspired by natural processes have
been proposed to deal with MMOPs. For example, Particle Swarm Optimization (PSO)
[7] is an effective optimization method [8] for which several approaches have been
proposed in order to make it able to capture multiple optimal solutions of MMOPs
[9, 10, 11, 12, 13, 14]. Moreover, new methods to MMOPs have been proposed based
on, for example, a swarm of glowworms [6].
Although several swarm based methods have been proposed to deal with MMOPs,
there are still two important issues to be addressed. The first concerns the
fact that the performance of most of the proposed methods depends on manual parameter
adjustment. The second concerns the reduction of performance when the


dimensionality of the problem increases. In other words, many current methods present low performance when applied to MMOPs with, e.g., more than five dimensions.
In this article, we discuss a density segregation mechanism, built on top of the original
Fish School Search (FSS) algorithm [15], that enables the simultaneous capture of
multiple optimal solutions of MMOPs. The new proposal allows each fish to
find different food sources (i.e. different optimal solutions). So, at each iteration
of the algorithm, the school can be divided into subgroups. Each created subgroup
corresponds to a possible solution to the multimodal problem. At the end of the
execution, the set of all captured solutions is provided.
The paper is organized as follows: in Section 2, we give a detailed description of
the algorithm FSS. In Section 3, we give a detailed description of the modified FSS
algorithm, based on density. In Section 4, we compare the performance of the density
FSS with the performance of other algorithms, namely, NichePSO and GSO, for a set
of well-known multimodal benchmark functions. The conclusion is provided and
commented upon in Section 5.

2 Fish School Search


The FSS algorithm has four operators, which can be grouped into two classes: feeding
and swimming. In the next subsections, we present the four FSS operators in detail.

2.1 Individual Movement Operator


The individual movement operator is applied in each iteration for each fish in the school.
Each fish randomly chooses a new position and evaluates it by using the fitness function.
The fish will only move to that calculated position if it is more advantageous than the
current one; otherwise, it stays at the same position. The next candidate position is
determined as shown in (1):

xi (t + 1) = xi (t) + rand(−1, 1) stepind (t), (1)

where xi (t) is the current position of the fish in dimension i, xi (t+1) is the new candidate
position of the fish for dimension i, and rand() is a function that returns numbers
uniformly distributed in a given interval. stepind is calculated as a percentage
of xmax for every dimension i. stepind decreases linearly along the iterations according to
stepind (t + 1) = stepind (t) − (stepind initial − stepind final )/iterations in order to improve
the exploitation ability in later iterations, where iterations is the number of iterations
used in the simulation. stepind initial and stepind final are the initial and the final
individual movement steps, respectively. Note that stepind initial must be higher than
stepind final in order to allow the gradual shift from exploration to exploitation
along the iterations.
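A minimal Python sketch of this operator and of the linear step decrease is given below. It assumes a minimization problem with box bounds [x_min, x_max]; all function and variable names (fitness, n_iterations, etc.) are illustrative and not taken from the paper.

```python
import numpy as np

def individual_movement(x, fitness, step_ind, x_min, x_max):
    """Propose a random move for one fish and accept it only if the fitness improves."""
    candidate = x + np.random.uniform(-1.0, 1.0, size=x.shape) * step_ind
    candidate = np.clip(candidate, x_min, x_max)       # keep the fish inside the aquarium
    delta_f = fitness(x) - fitness(candidate)          # > 0 means improvement (minimization)
    if delta_f > 0:
        return candidate, delta_f, candidate - x       # new position, food gain, displacement
    return x, 0.0, np.zeros_like(x)                    # unsuccessful move: the fish stays put

def linear_step_decrease(step_ind, step_initial, step_final, n_iterations):
    """Linear decrease of step_ind over the run, as described above."""
    return step_ind - (step_initial - step_final) / n_iterations
```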

2.2 The Feeding Operator


The weight is a metaphor to quantify the success of the search process for individual
fish. The heavier a fish is, the higher is the probability for it to be in a good region

of the search space (i.e. aquarium). The amount of food that a fish eats depends on
the improvement in its fitness and the largest improvement in the fitness of the entire
school. The weight of a fish is updated according to Wi (t + 1) = Wi (t) + Δ fi /max(Δ f ),
where Wi (t) is the weight of the fish i, (Δ fi ) is the difference of the fitness at current and
new position for the fish i, max(Δ f ) is a function that returns the maximum difference
of the fitness values among all the fish. One should remember that Δ fi = 0 for a fish
that does not perform the individual movement at the current iteration.
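A hedged sketch of this feeding rule in Python follows (weights and delta_f are NumPy vectors holding Wi and the Δfi of the current iteration, with zeros for fish that did not move; names are illustrative):

```python
import numpy as np

def feeding(weights, delta_f):
    """Increase each fish's weight by its improvement normalized by the best improvement."""
    max_gain = np.max(delta_f)
    if max_gain > 0:                     # if no fish improved, all weights stay unchanged
        weights = weights + delta_f / max_gain
    return weights
```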

2.3 Collective Instinctive Movement Operator


Only fish that successfully performed individual movements, i.e. Δxi ≠ 0, influence the
resulting direction of the school movement. The resulting direction (I) is evaluated by
using (2). After that, all fish of the school must update their positions according to (3).
The collective instinctive movement operator, at each iteration of the algorithm, tends
to guide the whole school in the direction of movement taken by the fish that found the
largest portion of food in its individual movement (i.e. toward regions of the search space
in which large amounts of food were discovered).

I(t) = ∑Ni=1 Δxi Δfi / ∑Ni=1 Δfi , (2)

xi (t + 1) = xi (t) + I(t). (3)
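Equations (2) and (3) can be sketched in Python as below, where displacements stacks the Δxi vectors and delta_f holds the Δfi values of the current iteration (an illustration with assumed names, not the authors' code):

```python
import numpy as np

def collective_instinctive_movement(positions, displacements, delta_f):
    """Drift the whole school along the improvement-weighted average displacement (Eqs. 2-3)."""
    total_gain = np.sum(delta_f)
    if total_gain == 0:                               # nobody improved: no collective drift
        return positions
    I = np.sum(displacements * delta_f[:, None], axis=0) / total_gain
    return positions + I
```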

2.4 Collective Volitive Movement Operator


The collective volitive movement operator is important to: (1) balance the trade-off
between exploration and exploitation; and (2) prevent the algorithm from being trapped in local
optima. The school contraction is applied as a step (an inward drift) to every fish position
with regard to the school barycenter. Conversely, the school dilatation is applied as a
step outwards. The fish-school barycenter is obtained by considering all fish positions
and their weights, as shown in (4). All fish must update their positions according to (5)
when the total weight of the school increases at the current iteration. On the other hand,
if the total weight of the school remains constant or reduces at the current iteration, all
fish must update their positions according to (6).

B(t) = ∑Ni=1 xi(t) Wi(t) / ∑Ni=1 Wi(t) , (4)

x(t + 1) = x(t) − stepvol rand(0, 1) (x(t) − B(t)) / distance(x(t), B(t)) , (5)

x(t + 1) = x(t) + stepvol rand(0, 1) (x(t) − B(t)) / distance(x(t), B(t)) , (6)
where distance() is a function which returns the Euclidean distance between the
barycenter and the fish current position, stepvol is a predetermined step used to control
the displacement from/to the barycenter.

The stepvol must be of the same order of magnitude as the step used in the individual
movement. As stepvol is multiplied by a factor drawn from the uniform distribution in
the interval [0,1], with expected value equal to 0.5, it is usually set to twice the stepind value.
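A possible Python sketch of the barycenter and of the volitive drift of Eqs. (4)-(6) is shown below; positions is an (n_fish, n_dims) NumPy array, prev_total_weight is the school weight of the previous iteration, and all names are illustrative.

```python
import numpy as np

def collective_volitive_movement(positions, weights, prev_total_weight, step_vol):
    """Contract toward the barycenter if the school gained weight, expand otherwise (Eqs. 4-6)."""
    barycenter = np.sum(positions * weights[:, None], axis=0) / np.sum(weights)
    direction = positions - barycenter
    norm = np.linalg.norm(direction, axis=1, keepdims=True)
    norm[norm == 0] = 1.0                                    # avoid dividing by zero at the barycenter
    drift = step_vol * np.random.uniform(0, 1, (len(positions), 1)) * direction / norm
    if np.sum(weights) > prev_total_weight:
        return positions - drift                             # contraction, Eq. (5)
    return positions + drift                                 # dilation, Eq. (6)
```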

3 Density as the Segregation Mechanism in FSS


The modifications applied to FSS are presented as follows.

3.1 Feeding Operator


In the new proposal, as opposed to what occurs in vanilla FSS, Δ fi is shared among
all fish. The objective of that sharing process is to locate areas for which fish cooperate
(i.e. share food) and then, gather together all cooperating fish in the same subgroup. The
sharing mechanism of Δfi among the other fish j is performed according to (7). It depends
on two factors: (1) the normalized distance dRij = dij / min_k dik , k ≠ i, where k ∈ {1, 2, . . . , N}
(i.e. dRij ≥ 1), and (2) the number qij of fish k for which dik < dij (i.e. the density of
fish around fish i), including fish i.

Δfi = Pi ∑Nj=1 1/(dRij)^qij , i.e. Pi = Δfi / [ ∑Nj=1 1/(dRij)^qij ] , (7)

In (7), Pi must be evaluated for each fish i and represents the amount of food fish i
will receive after sharing Δfi. Each other fish j will receive the fraction 1/(dRij)^qij of Pi, as given
in (8). In (7), when i = j, we have dRij = 0 and qij = 0, resulting in 0^0; for this case,
computationally, we consider 0^0 = 1. Note that if, for a given fish i, we have min dik = 0,
we decided to consider min dik = 4.9E−324, as this is the lowest positive value for the
numerical precision of the double data type in our computational setup.
C(i, j) = Pi / (dRij)^qij = Δfi / [ (dRij)^qij ∑Nk=1 1/(dRik)^qik ] . (8)

According to (7) and (8), each fish j will receive an amount of food that decreases exponentially
according to (dRij)^−qij. This equation is based on the one used by Martinetz and
Schulten [16, p. 520] in step (iv) of their proposed algorithm for creating Topology
Representing Networks (TRN). For density FSS, the expression e^(−ki/λ) of [16, p. 520]
was adapted to (dRij)^−qij in order to quantify the amount of food a fish j will receive
because of the successful foraging behavior of another ”colleague” fish i. The greater the
value of qij (i.e. the greater the density of fish around fish i), the smaller will be the
amount of food available for fish j. That means that crowded areas exert little influence
over other fish.
At the end of each iteration, each fish i receives an amount of food given as C(j, i)
from every other fish j that successfully found food. The sum of the amounts C(j, i) over all
other fish j corresponds to the total amount of food fish i received in a given iteration.
Then, the weight Wi(t) of fish i at the t-th iteration is updated according to (9).

Wi(t + 1) = Wi(t) + ∑Qj=1 C(j, i) , (9)

where Q is the number of fish that successfully found food at the t-th iteration. In this
new proposal, differently from real fish in nature, we assume that the weights of the
artificial fish do not decrease along the iterations.
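The sharing scheme of Eqs. (7)-(9) can be sketched in Python as below. This is only an illustration under the stated conventions (0^0 = 1 for the fish itself, and a tiny positive guard standing in for the 4.9E−324 value mentioned above); positions is an (n_fish, n_dims) NumPy array and all names are assumptions.

```python
import numpy as np

def share_food(positions, delta_f):
    """Return the matrix C where C[i, j] is the food fish j receives from fish i (Eqs. 7-8)."""
    n = len(positions)
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    C = np.zeros((n, n))
    for i in range(n):
        if delta_f[i] <= 0:                         # only fish that found food share anything
            continue
        others = [k for k in range(n) if k != i]
        min_d = max(np.min(d[i, others]), np.finfo(float).tiny)
        d_R = d[i] / min_d                          # normalized distances d_Rij (>= 1 for j != i)
        q = np.array([np.sum(d[i] < d[i, j]) for j in range(n)])   # density counts q_ij
        share = np.where(np.arange(n) == i, 1.0, 1.0 / d_R ** q)   # 0^0 treated as 1 when j = i
        P_i = delta_f[i] / np.sum(share)            # Eq. (7)
        C[i] = P_i * share                          # Eq. (8)
    return C

def update_weights(weights, C):
    """Eq. (9): each fish gains the total food it received from the successful fish."""
    return weights + C.sum(axis=0)
```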

3.2 Memory Operator

In the algorithm derived here, each fish i has a memory Mi = {Mi1 , Mi2 , . . . , MiN }, where
N is the number of fish in the school, and Mi j quantifies the influence of one fish j over
the fish i. Mi j depends on the total amount of food the fish i received because of the
foraging behavior of fish j (i.e. C( j, i)) along the entire execution of the algorithm.
The bigger C(j, i) is, the greater the influence of fish j over fish i will be. This exerted
influence manifests itself in terms of how synchronized the behavior of a fish i will be
with the foraging behavior of another fish j.
After the Feeding Operator is computed, the Memory Operator updates Mi j as shown
in (10). In (10), 0 ≤ ρ ≤ 1 is a parameter that controls the influence of one fish over
every other. For example, if the value of ρ is close to 1, in general, just after a relatively
small number of iterations (e.g. 10 iterations), the memory of each fish may be greatly
changed. That is, fish here learn and forget rather quickly.

Mij(t + 1) = (1 − ρ)Mij(t) + C(j, i) . (10)
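In Python, the memory update of Eq. (10) is a one-line exponential forgetting rule; the sketch below assumes C is the sharing matrix from the previous subsection and is an illustration rather than the authors' code.

```python
import numpy as np

def update_memory(M, C, rho):
    """Eq. (10): mix the old influence of fish j on fish i with the food i just received from j."""
    return (1.0 - rho) * M + C.T          # C[j, i] = C.T[i, j] is the food i received from j
```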

3.3 Collective Instinctive Movement Operator

For density based FSS, the resultant behavior Ii for each fish i is evaluated as shown
in (11). In (11), Ii is a sum of the directions taken by each fish j during the Individual
Movement Operator, weighted by Mij. Note that, contrary to FSS, even if a fish
does not locate food (i.e. Δxj = 0), it still influences the resultant behavior Ii of the other
fish i. In other words, even if fish j does not locate food, fish i will mimic its behavior
(i.e. remain stationary) according to the memorized value Mi j .

∑Nj=1 Δ x j Mi j
Ii (t) = . (11)
∑Nk=1 Mik

3.4 Operator for Partitioning the Main Fish School

In our new approach, at each iteration of the algorithm, the main school is partitioned
into subgroups. One fish i will be in the same subgroup as another fish j if and only if:

Mij = max_{k=1,2,...,N} Mik  ∨  Mji = max_{k=1,2,...,N} Mjk , (12)

where N is the number of fish in the main school. Therefore, fish i is in the same
subgroup as fish j if and only if fish j is the fish that exerts the largest influence over
fish i, or fish i is the fish that exerts the largest influence over fish j.
The algorithm for the partition of the main school into subgroups is illustrated in
Algorithm 1. Through this procedure, a fish i is chosen randomly in the main school.
After that, all other fish j of the main school that are in the same subgroup as fish
i, according to the definition in (12), are removed from the main school and put into the
subgroup of fish i. Then, for each such fish j, all other fish k that are in the same
subgroup as fish j are also removed from the main school. This process of selecting fish
that belong to the subgroup of fish i is repeated until all the fish in that subgroup
have been removed from the main school. Then, another fish i is chosen randomly
from the main school and the procedure is repeated until all the fish have been removed
from the main school.

Algorithm 1. Pseudo code of Partition Operator


while There is fish in the main school do
Choose a fish i randomly in the main school
Create a new subgroup Si
Put fish i in subgroup Si
Remove fish i from the main school
Find other fish j in the main school that satisfies (12)
while there exists fish j in the main school do
Put fish j in subgroup Si
Remove fish j from the main school
Set i = j
Find other fish j in the main school that satisfies (12)
end while
end while
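A possible Python rendering of Algorithm 1, using the memory matrix M and condition (12), is given below (a sketch under the stated interpretation, with illustrative names):

```python
import random

def same_subgroup(M, i, j):
    """Condition (12): i and j belong together if one exerts the largest influence over the other."""
    n = len(M)
    return (M[i][j] == max(M[i][k] for k in range(n)) or
            M[j][i] == max(M[j][k] for k in range(n)))

def partition_school(M):
    """Split the school into subgroups following the procedure of Algorithm 1."""
    remaining = set(range(len(M)))
    subgroups = []
    while remaining:
        i = random.choice(tuple(remaining))          # pick a seed fish for a new subgroup
        remaining.discard(i)
        group, frontier = {i}, [i]
        while frontier:                              # grow the subgroup transitively
            current = frontier.pop()
            linked = [j for j in list(remaining) if same_subgroup(M, current, j)]
            for j in linked:
                remaining.discard(j)
                group.add(j)
                frontier.append(j)
        subgroups.append(group)
    return subgroups
```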

3.5 Individual Movement Operator


In density FSS, the new mechanism to update the length of the step of each fish is
given in (13):

stepindi(t + 1) = decayi stepindi(t) , (13)

decayi = decaymin − [ (Ri(t) − min(Rj(t))) / (max(Rj(t)) − min(Rj(t))) ] (decaymin − decaymax(t)) , (14)

decaymax(t) = decaymaxini (decaymaxend / decaymaxini)^(t/Tmax) , (15)

Ri(t) = ∑Qj=1 C(j, i) , (16)

where stepindi(0) = stepinit, 0 ≤ decaymin ≤ 1, 0 ≤ decaymaxini ≤ 1, and 0 ≤ decaymaxend ≤ 1
are parameters of the algorithm, with decaymaxend < decaymaxini < decaymin.
decaymin is the decay applied to the step of the fish that received the
smallest amount of food in a subgroup. decaymax decays exponentially as given in (15)
and is used to reduce the step size of the fish that received the largest amount of
food in a subgroup. For the other fish in a given subgroup, the value of decayi, where
decaymax < decayi < decaymin, is evaluated as given in (14).
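The schedules of Eqs. (13)-(15) can be sketched as below; R is assumed to be a list holding, for each fish of the subgroup, the amount of food it received in the current iteration (as in Eq. 16), and all names are illustrative.

```python
def decay_max(t, t_max, decay_max_ini, decay_max_end):
    """Exponential schedule of Eq. (15) for the decay applied to the best-fed fish."""
    return decay_max_ini * (decay_max_end / decay_max_ini) ** (t / t_max)

def update_steps(step_ind, R, decay_min, d_max):
    """Eqs. (13)-(14): the best-fed fish in the subgroup gets the smallest decay factor (d_max)."""
    r_min, r_max = min(R), max(R)
    spread = (r_max - r_min) or 1.0          # guard when all fish received the same amount of food
    return [step_ind[i] * (decay_min - (R[i] - r_min) / spread * (decay_min - d_max))
            for i in range(len(R))]
```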

3.6 Collective Volitive Movement Operator


In density based FSS, the Collective Volitive Movement Operator is performed
independently for each created subgroup. The barycentre is calculated for each
subgroup based on the weight of fish as given in (4). Then, aiming to allow a progressive
convergence of each subgroup around a potential solution of one MMOP, each fish
“swims” in the direction of the barycentre of its subgroup according to (17). To avoid
premature convergence of subgroups around regions for which there is no solution, the
size of the step to be performed for a fish in the barycentre direction of its subgroup
varies according to the value of decaymax (t).

x(t + 1) = x(t) + (1 − decaymax(t))(B(t) − x(t)). (17)

4 Experiments
In this section the performance of our new approach is compared to the performance of
NichePSO [10] and GSO [6] for benchmark functions with two and more dimensions
and with a finite number of optimal solutions.

4.1 Methodology
For the experiments in this section, we used the set of multimodal functions shown in
Table 1.
For all benchmark functions, we consider that: (1) the entities are initially distributed uniformly
over the search space; to ensure a uniform distribution of the entities in the search
space, we used Faure sequences [18] to generate a uniform sequence of pseudo-random
numbers; (2) the number of iterations for density FSS is halved with respect to the number
of iterations of the algorithms NichePSO and GSO in all experiments, since in density
FSS each fish performs two calls to the objective function at each iteration; (3) if the
normalized distance between two captured optima i and j, for all optima, is less than
0.01, we assume those optima as being the same optimal solution; in this case, only
the fittest optimum will be taken, and the other one will be discarded. The normalized
distance dN(i, j) between two points i and j is given as dN(i, j) = √[(xNi − xNj)·(xNi − xNj)] / D,
where xN = (x1/x1max, x2/x2max, . . . , xD/xDmax), D is the number of dimensions of one MMOP, and
xkmax is the upper bound of dimension k; (4) for all selected optima, we considered
that density FSS and NichePSO have captured an optimal solution k if the normalized
distance between k and the optima closest to k is less than 0.005; in GSO [6, p. 99], an
optimal solution k is captured when at least three glowworms are located at a distance
less than ε = 0.05 from k. In this paper, the value of ε was modified to 0.005, following
the procedure used for the algorithms NichePSO and density FSS, as described earlier.

Table 1. Multimodal benchmark functions. The domain of the search space and the corresponding
number of peaks for that domain are given in the second and the third columns, respectively.

Function | Domain | No. of peaks | Source
F1(X) = ∑mi=1 cos^2(X(i)), X ∈ R^m | [−π, π]^m | 3^m | [6]
F2(x, y) = cos^2(x) + sin^2(y) | [−5, 5]^2 | 12 | [11, 6]
F3(x) = 1 + ∑ni=1 xi^2/4000 − ∏ni=1 cos(xi/√i) | [−29, 29]^2 | 124 | [17]
F4(x, y) = 200 − (x^2 + y^2 − 11)^2 − (x + y^2 − 7)^2 | [−6, 6]^2 | 4 | [10, 6]
F5(x, y) = 3(1 − x)^2 e^−[x^2+(y+1)^2] − 10(x/5 − x^3 − y^5) e^−[x^2+y^2] − (1/3) e^−[(x+1)^2+y^2] | [−3, 3]^2 | 3 | [6]
F6(x, y) = ∑Qi=1 ai e^−bi((x−xi)^2+(y−yi)^2), where Q = 10, ai = 1 + 2ϑ, bi = 2 + ϑ, xi = −5 + 10ϑ, yi = −5 + 10ϑ, and ϑ is a random value uniformly distributed in [0, 1] | [−5, 5]^2 | 10 | [6]
F7(X) = 10m + ∑mi=1 [xi^2 − 10 cos(2πxi)], X ∈ R^m | [−5, 5]^2, [−3, 3]^m | 100, 6^m | [6]

The parameter configuration for NichePSO was the same used in [14, p. 2300]. In
[14, p. 2300], c1 = c2 = 1.2, w linearly decreases from 0.7 to 0.2, δ = 10^−4, μ = 10^−2
and ε = 0.1. Those values are used for all experiments performed in this paper. For
GSO, we used the same configuration as described in [6, p. 99]: ρ = 0.4, γ = 0.6,
β = 0.08, nt = 5, s = 0.03, l0 = 5. The value of the parameter rs = 2 was chosen
for all experiments based on the results presented for GSO in [6, pp. 109–110]. For
density FSS, the parameter values were: ρ = 0.3, stepinit = 0.05, decaymin = 0.999,
decaymaxini = 0.99, and decaymaxend = 0.95. All those values were determined based on
tedious numerical experiments performed earlier.

4.2 Performance Comparison among NichePSO, GSO and Density FSS


In this section, for each function, we vary both the number t of iterations of the
algorithms and the number n of entities in the swarm. In the experiments in this
section, we set t = 50, 100, 150, . . . , 500. We choose tmax = 500 iterations because,
for some numerical experiments, the performance of the algorithms does not improve
significantly after 500 iterations. For the functions F1 , F2 , F4 , F5 and F6 , we used n =
5, 10, 15, . . . , 200, 210, 220, . . . , 350. Therefore, for those functions, there are 550
(10*55) combinations between t and n. For the functions F3 and F7 , we used n =
5, 10, 25, 50, 100, 150, . . . , 1300, 1400 and n = 5, 10, 25, 50, 100, 150, . . . , 1000,
respectively. Then, for F3 , there are 300 (10*30) combinations and for F7 there are
230 (10*23) combinations. For each combination, we collect the mean value and the
standard deviation of the number of optimal solutions captured by the algorithms in 30
trials.

Table 2. Comparison of the performance of the algorithms NichePSO, GSO and density FSS
regarding the percentage of the total number of combinations for which the algorithms captured
on average more than 95% of the number of optimal solutions of one MMOP

          F1      F2      F3      F4      F5      F6     F7
NichePSO  68.18%  34.55%  0.667%  80.09%  82.73%  0.00%  0.00%
GSO       72.75%  52.00%  9.33%   73.09%  78.18%  0.00%  0.00%
dFSS      74.00%  64.55%  34.67%  86.91%  83.45%  0.00%  23.48%

In order to compare the performance of the algorithms NichePSO, GSO and density
FSS, we chose the following metric: the percentage of the total number of combinations for which
the algorithms captured on average more than 95% of the number of optimal solutions
(i.e. the number of peaks) of one MMOP. Table 2 summarizes the performance of the
algorithms. In general, as one can note in Table 2, density FSS outperformed the
algorithms NichePSO and GSO for all functions used in this section with regard to this
metric. For the function F7 , for example, density FSS captured on average more than 95%
of the optimal solutions for 23.48% of the total number of combinations, whereas NichePSO
and GSO failed to do so for all combinations.

5 Conclusions

In this paper, the algorithm FSS was adapted to locate simultaneously multiple optima
of MMOP. We were able to produce a new mechanism (and algorithm), based on
the principle of density, that affords the segregation needed for splitting the fish school and,
consequently, is able to locate multiple optima. Two new operators are proposed for
the partition of the fish school into subswarms, such that each created subswarm
corresponds to one potential optimal solution of a given MMOP. At the end of the
execution of the algorithm, a set of captured optima of a given MMOP is clearly
produced.
The experimental results demonstrate that FSS based on density is a far better
approach to MMOP than NichePSO and GSO. The reason for that is the evident
ability of density FSS to simultaneously capture multiple optima without additional
heavy parameterization costs. The highlights of the current proposal are: (1) it
outperforms NichePSO and GSO for all benchmark functions; (2) it has the ability
to tackle MMOPs of more than two dimensions without the need for manual parameter
adjustments.

Acknowledgments

The authors acknowledge the financial support from CAPES, CNPq and University of
Pernambuco for scholarships, support and travel grants.

References
[1] Koper, K., Wysession, M., Wiens, D.: Multimodal function optimization with a niching
genetic algorithm: A seismological example. Bulletin of the Seismological Society of
America 89(4), 978–988 (1999)
[2] Dilettoso, E., Salerno, N.: A self-adaptive niching genetic algorithm for multimodal
optimization of electromagnetic devices. IEEE Transactions on Magnetics 42(4), 1203–
1206 (2006)
[3] El Imrani, A., Zine El Abidine, H., Limouri, M., Essaid, A.: Multimodal optimization of
thermal histories. Comptes Rendus de l’Academie de Sciences - Serie IIa: Sciences de la
Terre et des Planetes 329(8), 573–577 (1999)
[4] Luh, G.-C., Chueh, C.-H.: Job shop scheduling optimization using multi-modal immune
algorithm. In: Okuno, H.G., Ali, M. (eds.) IEA/AIE 2007. LNCS (LNAI), vol. 4570, pp.
1127–1137. Springer, Heidelberg (2007)
[5] Naraharisetti, P., Karimi, I., Srinivasan, R.: Supply chain redesigns - multimodal
optimization using a hybrid evolutionary algorithm. Industrial and Engineering Chemistry
Research 48(24), 11094–11107 (2009)
[6] Krishnanand, K., Ghose, D.: Glowworm swarm optimization for simultaneous capture of
multiple local optima of multimodal functions. Swarm Intelligence 3(2), 87–124 (2009)
[7] Eberhart, R., Kennedy, J.: A new optimizer using particle swarm theory. Micro Machine
and Human Science, 39–43 (1995)
[8] Shi, Y., Eberhart, R.C.: An empirical study of particle swarm optimization. In: IEEE
Congress on Evolutionary Computation, pp. 1945–1960 (1999)
[9] Parsopoulos, K., Vrahatis, M.N.: Modification of the particle swarm optimizer for locating
all the global minima. In: Karny (ed.) Artificial Neural Networks and Genetic Algorithms,
pp. 324–327 (2001)
[10] Brits, R., Engelbrecht, A.P., van den Bergh, F.: A niching particle swarm optimizer. In:
Proceedings of the 4th Asia-Pacific conference on simulated evolution and learning, pp.
692–696 (2002)
[11] Parsopoulos, K., Vrahatis, M.N.: On the computation of all global minimizers through
particle swarm optimization. IEEE Transactions on Evolutionary Computation 8(3), 211–
224 (2004)
[12] Brits, R., Engelbrecht, A., van den Bergh, F.: Locating multiple optima using particle swarm
optimization. Applied Mathematics and Computation 189(2), 1859–1883 (2007)
[13] Ozcan, E., Yilmaz, M.: Particle swarms for multimodal optimization. In: Beliczynski, B.,
Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds.) ICANNGA 2007. LNCS, vol. 4431, pp.
366–375. Springer, Heidelberg (2007)
[14] Engelbrecht, A., Van Loggerenberg, L.: Enhancing the nichepso. In: IEEE Congress on
Evolutionary Computation, CEC, pp. 2297–2302 (2007)
[15] Bastos-Filho, C., de Lima Neto, F., Lins, A., Nascimento, A., Lima, M.: Fish school search.
In: Chiong, R. (ed.) Nature-Inspired Algorithms for Optimisation. SCI, vol. 193, pp. 261–
277. Springer, Heidelberg (2009)
[16] Martinetz, T.M., Schulten, K.J.: Topology representing networks. Neural Networks 7(3),
507–522 (1994)
[17] Griewank, A.: Generalized descent for global optimization. Journal of Optimization Theory
and Applications 34, 11–39 (1981)
[18] Thiemard, E.: Economic generation of low-discrepancy sequences with a b-ary gray
code. Technical report, Departement de Mathematiques, Ecole Polytechnique Federale de
Lausanne, CH-1015 Lausanne, Switzerland (1998)
Mining Coherent Biclusters with Fish School Search

Lara Menezes and André L.V. Coelho

Graduate Program in Applied Informatics, University of Fortaleza, Fortaleza-CE, Brazil


[email protected], [email protected]

Abstract. Fish School Search (FSS) is a recently-proposed metaheuristic


inspired by the collective behavior of fish schools. In this paper, we provide a
preliminary assessment of FSS while coping with the task of mining coherent
and sizeable biclusters from gene expression and collaborative filtering data.
For this purpose, experiments were conducted on two real-world datasets
whereby the performance of FSS was compared with that exhibited by two
other population-based metaheuristics, namely, Genetic Algorithm (GA) and
Particle Swarm Optimization (PSO). The results achieved demonstrate the
usefulness of FSS while tackling the biclustering problem.

Keywords: Biclustering, Fish School Search, Genetic Algorithms, Particle


Swarm Optimization, Bioinformatics, Collaborative Filtering.

1 Introduction
Fish schools are one of the best examples of collective animal behavior [17]. Schools
are groups composed of many fish, usually of the same species, acting as a single unit
and moving in more or less harmonious patterns throughout the oceans. These groups
show a streamlined structure and uniform behavior aiming at avoiding predators and
finding food. Fish join schools for selfish reasons; therefore, in order for schooling to
improve fitness, schools must offer benefits greater than the costs of increased
visibility to predators, increased competition, and energetic instability [12].
Recently, a novel swarm intelligence metaheuristic, named Fish School Search
(FSS), was introduced by Bastos Filho et al. [2]. In a nutshell, FSS is inspired by the
collective behavior displayed by real fish schools and thus is composed of operators
that mimic their feeding and swimming activities. Together these operators afford
salient computational properties such as [2][14]: (i) high-dimensional search abilities;
(ii) on-the-‘swim’ selection between exploration and exploitation; and (iii) self-
adaptable guidance towards sought solutions (which can be multimodal).
In FSS, the school “swims” (searches) for “food” (candidate solutions) in the
“aquarium” (search space). The weight of each fish acts as a sort of memory of its
individual success, and both individual and collective movements are performed so as
to locate and explore promising areas of the aquarium. So far, this algorithm has been
adopted with success to solve continuous optimization problems [2][14]. In this paper,
we follow a different perspective by providing a preliminary assessment of the
potentials of FSS while tackling a non-trivial data mining task known as biclustering.
The main idea behind biclustering is to simultaneously cluster both rows and
columns of a data matrix, allowing the extraction of contextual information from it [13].


This notion can be traced back to the 1960s, though it has become better known
since the beginning of the last decade, when it was reintroduced by Cheng and Church
[3] in the domain of gene expression data analysis. Biclustering techniques have been
applied in different contexts, such as in bioinformatics, time series expression data, text
mining, and collaborative filtering [5]-[9][13]. Some of their advantages over
conventional clustering algorithms are [3][9][13][18]: (i) they can properly deal with
missing data and corrupted measurements by automatically selecting rows and columns
with more coherent values and dropping those corrupted with noise; (ii) they group
items based on a similarity measure that depends on a context, i.e. a subset of the
attributes, describing not only the grouping, but the context as well; and (iii) they allow
that rows and columns be simultaneously included in multiple biclusters.
Many different biclustering algorithms can be found in the literature [13][16][18].
In particular, due to the highly combinatorial nature of this problem, bio-inspired
metaheuristics have been successfully adopted to tackle it, such as genetic algorithms
(GA), particle swarm optimization (PSO), artificial immune systems (AIS), and ant
colony optimization (ACO) [7]-[9][15][19]. In this study, to assess the performance of
FSS in mining coherent and sizeable biclusters, a simple modification of the
algorithm was performed in order to allow for the representation of binary solutions.
Two datasets, one related to bioinformatics [4] and the other to collaborative filtering
[6], were considered, and the assessment is done here having as yardstick the levels of
performance exhibited by GA and PSO. Overall, the results achieved so far suggest
that the FSS algorithm is competitive in terms of locating coherent biclusters and
prevails in terms of computational efficiency.
The rest of the paper is organized as follows: Section 2 provides a brief account on
the biclustering problem. Section 3 describes the FSS algorithm while Section 4
reviews the main steps of the GA and PSO algorithms. In Section 5, the empirical
results achieved so far are discussed, while, in Section 6, we provide final remarks.

2 The Biclustering Problem


In general, the goal of a biclustering algorithm is to efficiently locate sizeable and
coherent biclusters hidden in a given data matrix [3][9][16]. Madeira and Oliveira
[13] devised four dimensions to classify these algorithms: the type of biclusters
found; the structure of the biclusters; the method used to identify biclusters; and the
context in which they are applied.
The type of a bicluster refers to the similarity between the elements in the bicluster.
So, a bicluster may have constant values, constant values only on rows or columns,
coherent values or coherent evolutions. In turn, the structures that may be identified
comprise: single biclusters; exclusive row biclusters; exclusive column biclusters;
exclusive row and column biclusters; checkerboard structures; non-overlapping
biclusters with tree structure; non-overlapping non-exclusive biclusters; overlapping
biclusters with hierarchical structure; and arbitrarily positioned, overlapping biclusters.
Finally, the method (approach) used to identify biclusters may be classified in terms of
the number of biclusters (one or more) to be discovered per run. Some of these
approaches include: iterative row and column clustering combination; divide-and-
conquer; greedy iterative search; exhaustive bicluster enumeration; and via distribution
parameter identification [13].

Recently, there has been an increasing number of studies investigating the
application of bio-inspired algorithms to biclustering [7]-[9][15][19]. One crucial
issue in this context is to use a good objective function (aka fitness function) to
measure the bicluster quality. As in [19], the following fitness function f(I, J) is
adopted here to measure the quality of the biclusters produced by the FSS, GA and PSO
algorithms. This function makes use of the mean squared residue score (MSR),
proposed by Cheng and Church to measure the bicluster coherence [3]:

f(I, J) = |I| |J|, if MSR(I, J) ≤ δ; δ / MSR(I, J), otherwise, (1)

MSR(I, J) = (1 / (|I| |J|)) ∑i∈I, j∈J (aij − aiJ − aIj + aIJ)^2 . (2)

In (1) and (2), I denotes the set of rows, J denotes the set of columns, aij is an element
in the submatrix, aiJ stands for the average of the ith row, aIj indicates the average of
the jth column, aIJ is the average of the whole submatrix, and δ is a threshold to be
pre-defined by the user.
Each individual in the chosen algorithms (GA, PSO and FSS) denotes a candidate
bicluster and is formed by two strings of sizes n and m, where n (m) denotes the total
number of rows (columns) of the data matrix [19]. The solution encoding adopted is
binary, with the bit ‘1’ (‘0’) meaning that the corresponding row or column belongs
(does not belong) to the bicluster. So, a matrix element needs to have both its row and
column set to ‘1’ to effectively belong to the bicluster.
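A hedged Python sketch of Eqs. (1)-(2) for a binary-encoded bicluster is shown below; rows and cols are 0/1 NumPy vectors, the penalty used for incoherent biclusters follows the reconstruction of Eq. (1) above and may differ from the exact form in [19], and all names are illustrative.

```python
import numpy as np

def msr(data, rows, cols):
    """Mean squared residue (Eq. 2) of the bicluster selected by binary row/column masks."""
    sub = data[np.ix_(rows.astype(bool), cols.astype(bool))]
    row_mean = sub.mean(axis=1, keepdims=True)
    col_mean = sub.mean(axis=0, keepdims=True)
    residue = sub - row_mean - col_mean + sub.mean()
    return float(np.mean(residue ** 2))

def fitness(data, rows, cols, delta):
    """Eq. (1): reward the bicluster volume while its MSR stays below the threshold delta."""
    if rows.sum() == 0 or cols.sum() == 0:
        return 0.0                                   # empty bicluster: worst possible score
    score = msr(data, rows, cols)
    volume = float(rows.sum() * cols.sum())
    return volume if score <= delta else delta / score
```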

3 Fish School Search


As mentioned before, the FSS algorithm is inspired by the social behavior exhibited
by several oceanic fish species. The main behavioral characteristics derived from real
fish schools and incorporated into FSS can be grouped into two categories [2][14]: (i)
Feeding, inspired by the natural instinct of fish to consume food in order to grow
strong and reproduce; and (ii) Swimming, aimed at mimicking the coordinated
movement produced by all the fish in the school to locate food. Such behaviors are,
respectively, metaphors for the evaluation of candidate solutions in the search process
and the search process itself.
The swimming operators are in charge of guiding the search effort globally toward
regions of the aquarium that are collectively sensed as more promising with regard to
the fitness function. These operators are divided into three classes [2]: individual
movement, collective-instinctive movement, and collective-volitive movement.
Concerning the individual movement, it occurs only if the candidate destination
point lies within the aquarium boundaries and the food density there seems to be
better than at the current location. The swim direction for this movement is randomly
chosen, and a parameter, stepind, is defined to determine the fish displacement in the
aquarium. To include randomness in the search process, stepind is multiplied
by a random number taken from a uniform distribution. Only after this movement
takes place does the fish feeding effectively happen [2][14].
All fish contain an innate memory of their success – their weights. As proposed in
[2], fish’s weight variation is proportional to the normalized difference between the

evaluation of the fitness function of the previous and current fish position, aiming at
modeling the difference of food concentration on these sites. If the fish did not find a
better position, it is assumed that its weight remains constant, according to [14].
After feeding, the collective-instinctive movement takes place by calculating a
weighted average of the individual movements based on the immediate success of all
fish in the school [2][14]. Only those fish that had successful individual movements
influence the resulting direction of this collective movement. When the overall
direction is computed, each fish is repositioned.
Finally, the third swimming operator, referred to as collective-volitive movement
[2][14], is devised as an overall success/failure evaluation based on the incremental
weight variation of the whole fish school. If the school is accumulating weight, the
radius of the school should contract; otherwise, it should enlarge. This amplification
or contraction is applied as a small step drift to every fish position taking as reference
the school’s barycenter. The barycenter is calculated by considering all fish positions
and their respective weights. Also in this movement, a control parameter, stepvol, is
adopted to determine the effective fish displacement in the aquarium [2][14]. The
value of this parameter can be set as a function of stepind [14].
In this paper, in order to cope with the biclustering problem, a binary encoding of
the candidate solutions has been adopted. Since the standard FSS algorithm was
originally conceived to deal with continuous optimization problems [2][14], we have
resorted to the same trick adopted in the context of PSO to convert the representation
of the solutions from real to binary (see Subsection 4.2). By this means, a vector with
binary positions effectively encodes the solution (bicluster) associated with each
fish after the individual movement operator.
The main steps of the FSS algorithm adopted here are described below [2][14]:
Randomly initialize the first population of fish
Evaluate the fitness value of each fish
While termination conditions are not satisfied
  For each fish i
    Update its position applying the individual movement operator
      yi(t + 1) = xi(t) + rand(−1, 1) stepind(t)
      xij(t + 1) = 1, if rand(0, 1) < sig(yij(t + 1)); 0, otherwise,
      where sig(yij(t + 1)) = 1 / (1 + e^−yij(t+1))
    Evaluate the fitness value of fish i
    Apply feeding operator
      Wi(t + 1) = Wi(t) + Δfi / max(Δf)
  end for
  Calculate weighted average of individual movements
    I(t) = ∑i Δxi Δfi / ∑i Δfi
  For each fish i
    Apply collective-instinctive movement operator
      xi(t + 1) = xi(t) + I(t)
  end for
  Calculate the school’s barycenter
    B(t) = ∑i xi(t) Wi(t) / ∑i Wi(t)
  For each fish i
    Apply collective-volitive movement operator, depending whether the overall weight
    of the school has increased or remained constant
      xi(t + 1) = xi(t) − stepvol rand(0, 1) (xi(t) − B(t)) / distance(xi(t), B(t)), if overall weight increased
      xi(t + 1) = xi(t) + stepvol rand(0, 1) (xi(t) − B(t)) / distance(xi(t), B(t)), otherwise
  end for
  Update individual and volitive steps
    stepind(t + 1) = stepind(t) − (stepind initial − stepind final) / #iterations
    stepvol(t + 1) = 2 stepind(t + 1)
end while
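The individual movement with the sigmoid binarization can be sketched in Python as follows; here x_bits stands for the concatenated row/column bit string of a candidate bicluster and fitness for the function of Eq. (1). The greedy acceptance test and the direction of the threshold comparison follow the usual binary-swarm convention and are assumptions, not statements about the authors' exact implementation.

```python
import numpy as np

def binary_individual_movement(x_bits, fitness, step_ind):
    """One binary individual movement: real-valued proposal, then sigmoid thresholding."""
    y = x_bits + np.random.uniform(-1.0, 1.0, size=x_bits.shape) * step_ind
    prob = 1.0 / (1.0 + np.exp(-y))                       # logistic squashing of the proposal
    candidate = (np.random.uniform(0, 1, size=x_bits.shape) < prob).astype(int)
    delta_f = fitness(candidate) - fitness(x_bits)        # maximization of the bicluster fitness
    if delta_f > 0:
        return candidate, delta_f
    return x_bits, 0.0                                    # keep the current bicluster otherwise
```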

4 Contestant Algorithms
In the sequel, we briefly outline the main steps behind the GA and PSO algorithms as
investigated here for biclustering purposes.

4.1 Genetic Algorithms

Genetic Algorithms are general-purpose search algorithms that use the vocabulary
and principles borrowed from natural genetics [10]. In a nutshell, a GA instance
moves from one population of individuals (solutions), referred to as chromosomes, to
a new population, using selection (to reproduce and to survive) together with
genetics-inspired operators, such as crossover and mutation. By this means,
individuals with better genetic features are more likely to survive and produce
offspring increasingly fit in the next generations, while less fit individuals tend to
disappear. The most known form of a GA (referred to as standard or simple GA [10])
employs a binary encoding of the solutions whereby a genotype-phenotype mapping
is used for interpretation and evaluation of the individuals. The pseudocode of the GA
instance used in this paper is presented next:
Randomly initialize the first population of chromosomes
Evaluate the fitness value of each chromosome
While termination conditions are not satisfied
Select parents to generate new chromosomes
Apply genetic operators (crossover and mutation) with a given probability
Evaluate the new individuals
Choose individuals to form the new generation
end while

4.2 Particle Swarm Optimization

The PSO algorithm maintains a swarm of particles where each particle represents a
solution to the problem [11]. During its flight, all particles perform three basic
operations, namely, they evaluate themselves, compare the quality of their solutions
with that of their neighbors, and try to mimic that neighbor with the best performance
so far. By this means, the position of each particle is adjusted each iteration based on
its own previous experience and the experience of its social neighbors.
Originally, two PSO variants were developed, which differ in the type of the
particles’ neighborhoods: (i) Global Best PSO, aka gbest PSO, in which the
neighborhood of each particle is the entire swarm; and (ii) Local Best PSO, aka lbest
PSO, which creates a neighborhood for each particle comprising a number of local
neighbors, possibly including the particle itself [11].
Each particle i has a set of attributes, namely, its current position, its current
velocity, the best position discovered by the particle so far (pbest) and the best
position discovered so far by its associated neighborhood (gbest or lbest). All
particles start with randomly initialized velocities and positions. The position of a
particle is changed by adding a velocity to the current position. It is the velocity
vector that drives the search process, and reflects both the experiential knowledge of
the particle and socially exchanged information from the particle’s neighborhood.
In the update of the particle’s velocity, three control parameters assume important
roles, namely: the inertia weight, w, which is a sort of momentum factor, controlling
the influence of the previous velocity; and c1 and c2, which are acceleration
coefficients (known as cognitive and social factors), delimiting how strongly the
particle is attracted by the regions containing pbest and gbest, respectively. The
impact of the latter parameters is modulated by random variables, which are
responsible for the stochastic nature of the algorithm [11][19].
In this study, a discrete version of the gbest PSO was used to search for biclusters
as particles. In this case, for a particle flying over a binary space, the values of its
position and velocity vectors must lie in the range [0, 1]. A straightforward trick is to
use a logistic function over the velocity to transform it from the real to the binary space [19]:
sig(v) = 1 / (1 + e^−v).
By this means, the dimensions of the velocity vector of each particle are represented as
probability thresholds, and the components of the novel position vector of the particle
are calculated by randomly choosing a number in 0,1 and then verifying whether
this number is higher than the respective threshold. The pseudocode of the PSO
instance used in this paper is presented next:

Randomly initialize the first population of particles
While termination conditions are not satisfied
  For each particle i
    Calculate its fitness value
    If the fitness value is better than the best fitness value in its history
      Set current position as the new pbest
    end if
  end for
  Choose the best pbest value among all particles as the gbest of all particles
  For each particle i
    Calculate the particle’s velocity
      vi(t + 1) = w vi(t) + c1 rand(0, 1) (pbesti − xi(t)) + c2 rand(0, 1) (gbest − xi(t))
    Update the particle’s position
      xij(t + 1) = 1, if rand(0, 1) < sig(vij(t + 1)); 0, otherwise,
      where sig(vij(t + 1)) = 1 / (1 + e^−vij(t+1))
  end for
end while
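For comparison, one iteration step of this discrete gbest PSO can be sketched as below (an illustration only; the comparison direction of the sigmoid thresholding follows the common binary-PSO convention, and all names are assumptions):

```python
import numpy as np

def binary_pso_step(x, v, pbest, gbest, w=1.0, c1=1.0, c2=1.0):
    """One velocity and position update of the discrete gbest PSO described above."""
    r1 = np.random.uniform(0, 1, size=x.shape)
    r2 = np.random.uniform(0, 1, size=x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    prob = 1.0 / (1.0 + np.exp(-v))                   # velocities become bit probabilities
    x = (np.random.uniform(0, 1, size=x.shape) < prob).astype(int)
    return x, v
```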

5 Experiments and Results


To assess the applicability of FSS in mining coherent and sizeable biclusters,
experiments were conducted on two real-world datasets. The first, known as Yeast [4]
and with size 2,884 × 17, relates to the bioinformatics domain, in particular, to the
task of identifying sets of genes of Saccharomyces cerevisiae sharing compatible
expression patterns across subsets of samples. The other dataset, known as MovieLens
[6] and with size 943 × 1,682, relates to the task of collaborative filtering, whose
general purpose is to perform automated suggestions for a given user based on the
opinions of other users with similar interests. With the biclusters elicited, it is possible
to predict how a given user would rate a certain movie. The value for the coherence
threshold δ (see Section 2) adopted for the two datasets was 300 and 2, respectively.
The three contestant metaheuristics were implemented in Java and integrated into
the BicAT toolbox [1]. After preliminary experiments, the algorithms were finally
configured as follows for both datasets:
• The population size was 30 individuals (either fish, particles or chromosomes);
• The termination condition was to run through 1,000 iterations in total;
• The GA was equipped with uniform crossover, simple bit-flip mutation,
tournament selection of size 2 for reproduction, and fitness-proportionate selection
with linear ranking for population replacement applied only on the offspring and
the 20% best individuals of the current population. Crossover and mutation rates
were set as 0.75 and 0.05, respectively;
• The PSO algorithm was configured with the social and cognitive acceleration
coefficients c1 and c2 both set to 1.0, the same value used for the inertia weight w; and
• The values of the GA/PSO control parameters comply with those adopted by Xie
et al. [19]. For FSS, tests with different values of stepind and stepvol were
realized. Tables 1 and 2 bring the average ± std. dev. results achieved by nine
configurations of the initial and final values of stepind, taken as percentages of
the actual search space [2] (first column), in terms of MSR and size (aka volume)
of the best bicluster found – recall that here stepvol = 2 stepind. In these
tables, the best configuration is highlighted and the mean values of the total search

runtime (2nd. column), the rate of overlapping among the biclusters of the last
generation (5th. column), the iteration where the best bicluster was found for the
first time (6th. column), and the rate of success (in 30 trials) in locating coherent
and sizeable biclusters (7th column) are also given. One can notice that FSS is
indeed sensitive to the calibration of stepind and stepvol, showing better search
behavior for biclusters with higher initial values for these control parameters.

Table 1. Results for different parameter configurations of FSS – Yeast dataset

(stepind ini., fin.)   Time (s)   MSR   Volume   Overlap   Iteration   Success (%)
(10, 1) 440.964 ± 3.808 256.822 ± 34.674 2,953.5 ± 55.516 0.258 ± 0.074 22 ± 12.749 100
(10, 0.1) 440.046 ± 4.201 257.057 ± 36.914 3,008.2 ± 269.947 0.303 ± 0.098 21.85 ± 10.302 95
(10, 0.01) 439.052 ± 1.342 256.457 ± 60.529 3,067.95 ± 413.851 0.297 ± 0.06 19.25 ± 9.419 90
(10, 0.001) 439.157 ± 1.351 253.615 ± 52.875 3,036.3 ± 301.966 0.303 ± 0.094 25.35 ± 10.189 95
(1, 0.1) 439.273 ± 1.358 315.87 ± 183.119 3,915.55 ± 2,115.46 0.281 ± 0.063 17.5 ± 12.352 60
(1, 0.01) 444.821 ± 13.441 332.083 ± 164.295 3,467 ± 1,184.546 0.246 ± 0.039 14.4 ± 8.958 45
(1, 0.001) 442.096 ± 5.814 333.791 ± 152.209 3,750.3 ± 1,786.454 0.247 ± 0.064 15.65 ± 8.235 40
(0.1, 0.01) 440.338 ± 1.829 457.138 ± 196.66 4,768.8 ± 2,156.878 0.228 ± 0.016 6.45 ± 5.125 20
(0.1, 0.001) 439.444 ± 1.331 527.752 ± 115.233 6,498.9 ± 2,215.774 0.227 ± 0.009 8.2 ± 4.372 5

Table 2. Results for different parameter configurations of FSS – MovieLens dataset

(stepind ini., fin.)   Time (s)   MSR   Volume   Overlap   Iteration   Success (%)
(10, 1) 284.666 ± 17.539 0.658 ± 0.026 504,929.4 ± 14,549.076 0.402 ± 0.011 32.95 ± 12.275 100
(10, 0.1) 290.29 ± 17.574 0.662 ± 0.02 509,949.95 ± 15,747.593 0.413 ± 0.018 38.6 ± 14.873 100
(10, 0.01) 286.56 ± 18.938 0.662 ± 0.03 509,269.45 ± 17,080.4 0.397 ± 0.025 35.1 ± 16.635 100
(10, 0.001) 289.01 ± 14.671 0.671 ± 0.025 512,090.05 ± 14,601.992 0.409 ± 0.02 39.1 ± 12.957 100
(1, 0.1) 261.52 ± 11.832 0.671 ± 0.025 459,960.25 ± 13,354.922 0.349 ± 0.036 14.45 ± 9.801 100
(1, 0.01) 266.843 ± 13.277 0.664 ± 0.022 464,786.9 ± 12,737.374 0.343 ± 0.04 20.85 ± 12.758 100
(1, 0.001) 263.142 ± 13.73 0.662 ± 0.017 458,808.75 ± 14,839.044 0.348 ± 0.044 17.2 ± 14.062 100
(0.1, 0.01) 251.749 ± 5.104 0.667 ± 0.023 445,906.55 ± 6,306.942 0.269 ± 0.002 7.6 ± 6.116 100
(0.1, 0.001) 253.378 ± 3.617 0.675 ± 0.026 447,246.6 ± 8,891.807 0.269 ± 0.002 7.1 ± 4.09 100

Tables 3 and 4 provide a comparison of the average performance achieved in 30


runs by the three algorithms, considering the same metrics as defined before (the best
FSS configuration for each dataset was used in this assessment). The results confirm
that FSS was indeed a good choice for mining coherent biclusters (i.e., with MSR
values less than the predefined δ) in both datasets, significantly outperforming GA
with regard to average MSR, search runtime, and success rate for the Yeast dataset.
Overall, FSS has prevailed over the others in terms of the criteria of search runtime,
time to locate the best solution and rate of overlapping, meaning that this algorithm is
computationally faster, has a good convergence rate, and is capable of preserving
diversity among the biclusters represented by the individuals (fish). However, in what

Table 3. Comparative results for Yeast dataset

Algorithm FSS GA PSO


Time (s) 440.964 ± 3.808 981.896 ± 24.266 449.398 ± 1.608
MSR 256.822 ± 34.674 297.543 ± 132.254 233.148 ± 38.171
Volume 2,953.5 ± 55.516 3,607.45 ± 1,329.772 4,844.8 ± 24.541
Overlap 0.258 ± 0.074 1±0 0.993 ± 0.001
Iteration 22 ± 12.749 52.85 ± 5.752 996.05 ± 3.471
Success (%) 100 60 100

Table 4. Comparative results for MovieLens dataset

Algorithm FSS GA PSO


Time (s) 289.01 ± 14.671 1,834.881 ± 59.999 2,705.011 ± 35.687
MSR 0.671 ± 0.025 0.675 ± 0.02 0.668 ± 0.012
Volume 512,090.05 ± 14,601.992 570,444.05 ± 24,607.453 1,244,640.05 ± 18,316.482
Overlap 0.409 ± 0.02 1±0 0.994 ± 0.001
Iteration 39.1 ± 12.957 50.3 ± 6.292 996.85 ± 3.329
Success (%) 100 100 100

concerns the sizes of the biclusters elicited, the FSS method has not accomplished the
same level of performance as demonstrated by GA and especially by PSO, with the latter
clearly prevailing in this measure.

6 Final Remarks
In this paper, we presented a first-round empirical evaluation of the performance of a
new bio-inspired metaheuristic, Fish School Search [2], while tackling the non-trivial
biclustering task. When compared to GA and PSO, the results achieved for two real-
world datasets suggest that FSS is indeed very competitive in terms of quickly locating
coherent biclusters. As future work, we shall conduct further experiments with other
datasets from bioinformatics [5][7] and text mining [9], and investigate the use of
alternative fitness functions to help FSS locate more sizeable biclusters.

Acknowledgment
This work was financially supported by CNPq (via Grant # 312934/2009-2) and
CAPES/PROSUP (via a master degree scholarship).

References
1. Barkow, S., Bleuler, S., Prelić, A., Zimmermann, P., Zitzler, E.: BicAT: A Biclustering
Analysis Toolbox. Bioinformatics 22, 1282–1283 (2006)
2. Filho, C.J.A.B., de Lima Neto, F.B., Lins, A.J.C.C., Nascimento, A.I.S., Lima, M.P.: Fish
School Search. In: Chiong, R. (ed.) Nature-Inspired Algorithms for Optimisation. SCI,
vol. 193, pp. 261–277. Springer, Heidelberg (2009)

3. Cheng, Y., Church, G.M.: Biclustering of Expression Data. In: International Conference on
Intelligent Systems for Molecular Biology, pp. 93–103 (2000)
4. Cho, R., Campbell, M., Winzeler, E., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg,
T., Gabrielian, A., Landsman, D., Lockhart, D., Davis, R.: A Genome-Wide
Transcriptional Analysis of the Mitotic Cell Cycle. Mol. Cell 2, 65–73 (1998)
5. Coelho, G.P., de França, F.O., Von Zuben, F.J.: Multi-Objective Biclustering: When Non-
dominated Solutions are not Enough. J. Math. Model Algor. 8, 175–202 (2009)
6. de Castro, P.A.D., de França, F.O., Ferreira, H.M., Von Zuben, F.J.: Applying Biclustering
to Perform Collaborative Filtering. In: International Conference on Intelligent System
Design and Applications, pp. 421–426 (2007)
7. de Castro, P.A.D., de França, F.O., Ferreira, H.M., Von Zuben, F.J.: Applying Biclustering
to Text Mining: An Immune-Inspired Approach. In: de Castro, L.N., Von Zuben, F.J.,
Knidel, H. (eds.) ICARIS 2007. LNCS, vol. 4628, pp. 83–94. Springer, Heidelberg (2007)
8. de França, F.O., Coelho, G.P., Von Zuben, F.J.: bicACO: An Ant Colony Inspired
Biclustering Algorithm. In: Dorigo, M., Birattari, M., Blum, C., Clerc, M., Stützle, T.,
Winfield, A.F.T. (eds.) ANTS 2008. LNCS, vol. 5217, pp. 401–402. Springer, Heidelberg
(2008)
9. Divina, F., Aguilar-Ruiz, J.S.: Biclustering of Expression Data with Evolutionary
Computation. IEEE Trans. Knowl. Data Eng. 18, 590–602 (2006)
10. Eiben, A.E., Smith, J.: Introduction to Evolutionary Computing, 2nd edn. Springer,
Heidelberg (2007)
11. Engelbrecht, A.P.: Fundamentals of Computational Swarm Intelligence. Wiley, Chichester
(2007)
12. Hamilton, W.D.: Geometry for the Selfish Herd. J. Theor. Biol. 31, 295–311 (1970)
13. Madeira, S.C., Oliveira, A.L.: Biclustering Algorithms for Biological Data Analysis: A
Survey. IEEE/ACM Trans. Comput. Biol. Bioinform. 1, 24–45 (2004)
14. Madeiro, S.S.: Multimodal Search by Density-Based Fish Schools (in Portuguese). Master
Dissertation, University of Pernambuco (2010)
15. Mitra, S., Banka, H.: Multi-objective Evolutionary Biclustering of Gene Expression Data.
Pattern Recogn. 39, 2464–2477 (2006)
16. Prelić, A., Bleuler, S., Zimmermann, P., Wille, A., Bühlmann, P., Gruissem, W., Hennig,
L., Thiele, L., Zitzler, E.: A Systematic Comparison and Evaluation of Biclustering
Methods for Gene Expression Data. Bioinformatics 22, 1122–1129 (2006)
17. Sumpter, D.: Collective Animal Behavior. Princeton Univ. Press, Princeton (2010)
18. Tanay, A., Sharan, R., Shamir, R.: Biclustering Algorithms: A Survey. In: Srinivas, A.
(ed.) Handbook of Computational Molecular Biology, Chapman & Hall/CRC (2005)
19. Xie, B., Chen, S., Liu, F.: Biclustering of Gene Expression Data Using PSO-GA Hybrid.
In: International Conference Bioinformatics and Biomedical Engineering, pp. 302–305
(2007)
