Modeling and Simulation of Complex Communication Networks
This publication is copyright under the Berne Convention and the Universal Copyright
Convention. All rights reserved. Apart from any fair dealing for the purposes of research
or private study, or criticism or review, as permitted under the Copyright, Designs and
Patents Act 1988, this publication may be reproduced, stored or transmitted, in any
form or by any means, only with the prior permission in writing of the publishers, or in
the case of reprographic reproduction in accordance with the terms of licences issued
by the Copyright Licensing Agency. Enquiries concerning reproduction outside those
terms should be sent to the publisher at the undermentioned address:
www.theiet.org
While the authors and publisher believe that the information and guidance given in this
work are correct, all parties must rely upon their own skill and judgement when making
use of them. Neither the authors nor publisher assumes any liability to anyone for any
loss or damage caused by any error or omission in the work, whether such an error or
omission is the result of negligence or any other cause. Any and all such liability
is disclaimed.
The moral rights of the authors to be identified as authors of this work have been
asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Preface
Thank you for choosing “Modeling and Simulation of Complex Communication Networks.” This book
offers a unique set of chapters and case studies that apply a variety of
techniques to the modeling and simulation of complex communication networks.
Rather than focusing on simplistic models and simple numerical simulations, the book
focuses on tools and techniques that can be used for the realistic modeling of
large-scale and complex communication networks, collectively termed Complex
Adaptive COmmunicatiOn Networks and environmentS (CACOONS).
The book is divided into three parts. The first part focuses on
the importance of modeling and simulation and gives two varied examples of
unconventional but powerful tools that can be useful for modeling and
simulation in general, and for modeling and simulation of CACOONS in particular.
The second part presents three critical reviews and
surveys in the domain of modeling and simulation. The third part of the book focuses
on practical case studies of modeling and simulation using different techniques. Of
interest here is a focus on the Cognitive Agent-based Computing (CABC¹)
framework. The CABC framework can be used to model any type of complex physical
system or complex adaptive system (CAS). As such, it can be a very useful approach for
modeling and simulating any type of CACOONS. The third part presents several practical
case studies that apply the CABC framework in various areas of CACOONS.
Next, I give an overview of the various chapters in a bit more detail.
The first part starts with the essence and importance of “Modeling and Simula-
tion,” presented by Ören et al. The chapter not only gives an overview of modeling
and simulation but also presents taxonomies. The chapter first gives an overview of
why simulation could be needed instead of an actual system. Then it moves on to
taxonomies and ontologies for use in modeling and simulation.
The next chapter in part I presents a detailed overview of using Simio, a modern,
sophisticated object-oriented tool for developing simulations of complex real-world
systems. The chapter first starts out with an overview of the Simio object framework.
This is followed by a description of Simio object classes. Next, concepts related
to modeling movement are presented. This can be transformed to model not only
messages in the Internet of Things but also people, mobile devices, and more. A
description of modeling physical components is also presented. This is followed by
techniques for modeling processes. The chapter concludes by giving an overview of
process tables, API, experimentation, and useful applications in scheduling.
¹ CABC is pronounced as Ka-bek.
The third chapter in Part I, by Salva et al., presents a simulation environment for
cybersecurity attack analysis of network traffic. The chapter gives an overview of
simulation, emulation, and virtualization. After this, a case study focusing
on network anomaly detection is presented.
In the second part, there are three critical reviews, each analyzing key literature
related to the modeling and simulation of CACOONS of various
kinds. The first survey, by Akram and Niazi, covers demand response management
in the domain of smart grid. The chapter first gives an overview of the smart grid and how
this particular domain poses unique challenges for development and understanding by means
of modeling and simulation. It next presents an overview of the problem domain of
demand–response management in large-scale CACOONS in the domain of smart
grid. In terms of approaches, the chapter first presents learning-based approaches.
Subsequently, it focuses on the complexity inherent to the smart grid domain. Before
concluding, the chapter moves on to open research problems and directions in
the domain.
The second chapter in the second part, also by Akram and Niazi, focuses on the
use of agent-based computing, multiagent systems, and agent-based modeling in
the domain of smart grid. It starts with an overview of the concepts and moves on to
applications such as learning, among others. It concludes with an overview of
key open problems and issues in the domain.
The final chapter in the second part is by Ventrella et al. This chapter focuses on
scale-free network topologies, giving a detailed overview as well as a literature review.
The chapter starts with an overview of concepts pertaining to mapping
the Internet. It then introduces the use of traceroute for mapping. This is followed
by IP options, subnet discovery, and router-level mapping. The chapter then presents
Internet models with a focus on graph-theoretic approaches. It gives an overview
of relevant concepts ranging from the basics to topological concepts such as scale-free
and power-law properties, among others. An overview of network topology generation is also
presented. Afterwards, the chapter moves on to the key topic of interest, namely
shortest path models.
In the final part of the book, six modeling and simulation case studies are presented.
The studies have been selected on the criterion that they will be
of interest not only to modelers and simulation experts but also to researchers and
practitioners in the domain of complex communication and social networks.
The first chapter in the part is focused on the important topic of accurate modeling
of VoIP traffic in modern communication networks by Toral-Cruz et al. The chapter
starts by giving an overview of the importance and complexity in VoIP traffic in
large-scale networks. It next presents the concepts of why modern networks have
evolved from simple packet networks to multiservice networks. Subsequently, the
chapter moves on to the importance of QoS in VoIP networks besides presenting VoIP
frameworks such as H.323 and SIP. It then formally describes and models concepts
related to QoS, one-way delay, jitter, self-similar processes, and more.
The second chapter in the last part presents an implementation of two framework
levels from the CABC framework in the domain of the Internet of Things (IoT). The chapter
starts with concepts related to the CABC framework and the simulator of choice.
It then presents research questions in the domain of 5G and the IoT, followed by
detailed results and discussion.
The third chapter in the part is by Attaullah et al. and focuses on the use of
DescRiptivE Agent-based Modeling (DREAM) from the CABC framework for the
modeling and simulation of the Chord peer-to-peer (P2P) protocol. The chapter first
introduces the Chord protocol, describing its inherent complexities, which require the use
of more advanced modeling and simulation techniques. After the description of the Chord
protocol, the chapter presents the DREAM model of the protocol, allowing for a quantitative
description using complex network centralities. The chapter also presents detailed
results from both PeerSim and NetLogo-based simulations, and compares
DREAM with the previous approach, the so-called ODD approach, which originates from
the domain of ecology and has traditionally been used to describe agent-based
and individual-based models.
This is followed by another chapter employing the use of DREAM and ODD to
model a P2P protocol commonly known as the Kademlia protocol. The chapter first
gives an overview of the protocol and the challenges associated with the complexity
of P2P protocols. This is followed by ODD and DREAM models, results, discussion,
and a detailed comparison.
BitTorrent is a very commonly used P2P protocol in the real world. The next
chapter in the part presents the use of the DREAM modeling level of the CABC
framework for the modeling and simulation of the BitTorrent protocol. After presenting
the background and an overview of the BitTorrent protocol, the chapter presents a
BitTorrent case study for use in the simulation model. Next, ODD and DREAM models are
presented, followed by a detailed discussion of the utility of the CABC framework
in the modeling and simulation of CACOONS.
The final chapter in the book is by Khan and Niazi and presents the application of
CABC level 1, the complex network modeling level, using complex citation networks
to analyze the domain of “Social networks.” The chapter starts by introducing
related concepts focused on measuring impact, citations, and scientometrics. It then
presents the dataset retrieved for developing the complex citation networks. This is
followed by a detailed network analysis demonstrating how this approach can be used
to model, simulate, transform, and analyze various types of complex network data.
The book presents first steps toward consolidating material specifically
focused on the modeling and simulation of complex communication networks
(CACOONS). It presents a selection of key case studies as well as concepts, with a
primary focus on making the concepts accessible to a wide audience. However, as with
any text in such a large and vibrant domain, we were only
able to present a sampling of key case studies and modeling paradigms in the domain.
Readers are encouraged to follow the Springer-Nature CASs Modeling journal
for access to more case studies and applications in the domain of modeling
and simulation of CACOONS.
While we have tried our level best to minimize errors, it is impossible to eliminate
all of them. If the book looks nice, it is all due to the efforts put in by the IET staff. And
if there are any mistakes, I humbly accept them to be mine. Readers are kindly requested to
keep sending their valuable feedback and comments to the book
editor at [email protected].
Part I
Modeling and simulation
Chapter 1
Modeling and simulation: the essence and
increasing importance
Tuncer Ören¹, Saurabh Mittal², and Umut Durak³
The technical aspects of the essence of simulation are elaborated based on the
following definition: simulation is performing a goal-directed experimentation or
gaining experience under controlled conditions by using dynamic models either to
develop/enhance skills or for entertainment; where a dynamic model denotes a model
for which behaviour and/or structure is variable over time. Hence, experimentation
and experience aspects are explained. Several taxonomies, ontologies, and some
ontology-based dictionaries are cited for a comprehensive and integrative perception
of simulation. Finally, the evolution and increasing importance of simulation are
explained.
1.1 Introduction
‘Simulation as a discipline is like mathematics and logic. It can be studied per se to
develop its own theories, methodologies, and tools, and it can be used in a multitude
of problem areas in many disciplines. The uses of simulation involve this second
aspect and make it a vital enabling technology for many disciplines’. The above is
from the conclusion section of another publication [1]. A recent publication ‘Guide
to Simulation-Based Disciplines: Advancing our Computational Future’ elaborates
on the universality of simulation [2], and another publication ‘The Profession of
Modeling and Simulation’ casts light on the professional aspect of simulation [3].
The clarifications given in this chapter on many aspects of simulation are relevant to
the universality of simulation.
The term simulation has been in existence in English since the fourteenth century.
Its meaning is based on the concept of similarity. Depending on the goal of the
similarity, the original non-technical use of the term simulation has positive and
negative connotations. From a positive point of view, simulation implies imitation
such as simulated leather or simulated pearl. From a negative point of view, simulation
implies disguised reality, e.g. counterfeit, feigning, false show, and hypocrisy. Later,
the term simulation acquired technical meanings. However, the term is still used
with its original non-technical connotations. To denote its technical aspects, we use
the following concise and comprehensive definition:
Simulation is performing a goal-directed experimentation or gaining experience under
controlled conditions by using dynamic models either to develop/enhance skills or for
entertainment; where a dynamic model denotes a model for which behaviour and/or
structure is variable over time.
¹ School of Electrical Engineering and Computer Science, University of Ottawa, Ontario, Canada
² The MITRE Corporation, United States
³ German Aerospace Center (DLR), Institute of Flight Systems, Germany
Due to its many aspects, there are many definitions of simulation. About 100 defini-
tions of simulation were compiled and presented in nine categories by Ören [4], and
a critical review of them was offered in a sequel publication [5]. As a testimony of the
variety of simulation, Appendix A lists over 750 types of simulation. Appendix B, a
list of 120 types of input variables, is yet another testimony of the richness of the field.
M&S is essentially composed of two separate activities: modelling and simulation.
While modelling necessitates abstraction, simulation is purely an engineering
activity that involves expertise from the computer science and engineering disciplines [2].
When we talk of simulation as a singular activity, it subsumes model building. Historical
evidence shows that model building has been attempted by various non-technical
means and through constant engagement with the problem-at-hand or the question under
exploration. Model development has been undertaken in different disciplines in
diverse ways. Some examples are as follows:
● Engineering: Model building is done for two purposes: design and control.
The design aspect involves creating model(s) of a ‘would be’ system. The con-
trol aspect necessitates building the model of an ‘existing system’ that needs
exploration of various control algorithms and mechanisms.
● Science: Model building is done to understand a natural phenomenon. New
nomenclature, taxonomy, vocabulary and abstractions are developed. The critical
part is the specification of assumptions that limit the complexity of the real
world in the model description.
● Education: Model building is done to explain, teach, understand, or learn a real-
world phenomenon. The abstraction level is dependent upon the audience that is
undergoing learning.
● Training: Model building is done to impart training or enhance skills (motor,
decision-making and communication, and operation) of the trainee in a specific
complex environment where it is cost prohibitive to involve real-world assets and
systems.
● Entertainment: Model building is done to provide a fictional reality in real or
staged environments for amusement purposes.
● Decision support: Model building is done to evaluate various courses of action
of a real-world state of a system on an existing model. In such cases, it is cost
prohibitive to perform real-world evaluation due to danger to life and property.
In all the disciplines mentioned above, model building incorporates the skill of developing
abstractions. The determination of an abstraction level is contingent upon
various factors such as the problem-at-hand, the desired goal, the available tools,
and the available knowledge. For example, in each economic era, from the age of
farming to the Industrial Age and to the Information Era we are currently in, the problems,
the desired goals, the tools, and the availability of knowledge have evolved,
leading to new representations of models. Some models that describe natural laws
have withstood the test of time, for example Newton’s laws, formulated in the seventeenth
century, and sometimes a completely new theory developed in the twentieth century,
such as quantum mechanics, fundamentally changes the perception of reality. Each
economic era has led to the evolution of these four aspects and, consequently, model
building has evolved accordingly. Model building takes its refuge in mathematics
at the core level and involves constant subject–environment engagement to keep the
developed abstractions attuned to the problem-at-hand. Today, much of the
model development has moved to computerized workbenches, often called integrated
development environments, that bridge the gap between the model builder and the
model representation.
The simulation activity builds upon the model-building activity and presents
the challenge of running the model over time. In a computational environment, a
simulator (a software entity) is tasked with managing the advancement of time. In
a non-computational environment, the perception of movement of time becomes a
critical factor in determining how effective the simulation is. For example, on a stage or
in a theatre, if the modelled ‘act’ is executed slower or faster than real time, it would
yield a completely different experience for the audience. Likewise, in a computational
environment, the advancement of time delivers results that may or may not address
the problem-at-hand. Handling time on an appropriate time base then becomes a
paramount activity in simulation.
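The role of the simulator in advancing time can be illustrated with a minimal discrete-event sketch in Python. This is only one possible way to organize time advance, not a method prescribed in this chapter; the names Simulator, schedule, run, and make_source are hypothetical and chosen purely for illustration. The simulator keeps a clock and a time-ordered list of pending events, and it moves the clock from one event time to the next rather than ticking in fixed steps.

```python
import heapq
from typing import Callable, List, Tuple

# Hypothetical type alias: an event action receives the simulator itself.
Action = Callable[["Simulator"], None]


class Simulator:
    """Illustrative simulator: owns the clock and the pending-event list."""

    def __init__(self) -> None:
        self.now: float = 0.0
        self._queue: List[Tuple[float, int, Action]] = []
        self._seq: int = 0  # tie-breaker: simultaneous events run in scheduling order

    def schedule(self, delay: float, action: Action) -> None:
        """Register an action to occur `delay` time units after the current time."""
        if delay < 0:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self, until: float) -> None:
        """Advance simulated time event by event, stopping at `until`."""
        while self._queue and self._queue[0][0] <= until:
            time, _, action = heapq.heappop(self._queue)
            self.now = time  # the simulator, not the model, moves the clock
            action(self)
        self.now = max(self.now, until)


def make_source(period: float, log: List[float]) -> Action:
    """Model of a periodic source (assumed example): records the time, then reschedules itself."""

    def emit(sim: Simulator) -> None:
        log.append(sim.now)
        sim.schedule(period, emit)

    return emit


log: List[float] = []
sim = Simulator()
sim.schedule(0.0, make_source(2.0, log))
sim.run(until=5.0)
print(log)  # [0.0, 2.0, 4.0]
```

In this sketch the model (the periodic source) only states what happens and after what delay, while the simulator alone decides when the clock moves. A real-time or scaled-time simulator would additionally pace these jumps against wall-clock time, which is where the choice of time base becomes critical.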
The remaining sections are organized as follows. Experimentation aspects of
simulation are discussed in Section 1.2. In Section 1.3, experience aspects to
develop/enhance three types of skills or for entertainment are discussed. Taxonomies
and ontologies of simulation are covered in Section 1.4. The evolution and increasing
importance of simulation are discussed in Section 1.5, and Section 1.6 concludes the chapter.
1. Real system may not exist (as in engineering problems where new systems are
aimed to be built).
2. Real system may not be reachable for experimentation (e.g. testing lunar
vehicles).
thousand Link Trainers were manufactured from 1934 to 1950 [12]. They provided
a means of training basic motor skills for pilots. The current flight simulator market is
about USD 6 billion, and the forecast for 2021 is about USD 7.5 billion [13].
In constructive simulation, simulated people use simulated equipment in a virtual
environment. The aim is to enhance the decision-making and communication skills of
trainees through interactions with the simulation systems. Air traffic control simulation
systems are a typical example [14]. Simulated pilots use simulated
aircraft in air traffic control simulation systems, where the simulation provides the
possibility to train controllers in decision-making and communication skills. One of
the commercial-off-the-shelf products is MaxSim, an air traffic control simulator from
Adacel, which can generate realistic air traffic based on defined scenarios and provides
direct voice communication with virtual pilots via speech recognition
features [15].
One of the key issues of simulation-based training is transfer of training, which
is defined as the degree to which trainees effectively apply the trained skills in real
operation [16]. Research on transfer of training in flight simulators has quite
a long history. Valverde published a paper in 1973 that reviews
flight simulator transfer of training studies since the 1950s [17]. In a recent
study, Pool and Zaal present a cybernetic approach to assess the transfer of training
for manual control skills in flight simulators using multi-channel pilot models [18].
Fidelity, immersion, presence, and buy-in are defined as the four factors that drive
the transfer of training [19]. Fidelity is defined as the extent to which the simulation
matches the real world. Immersion is the individual’s feeling of being
absorbed by the experience, whereas, in situated immersion, presence is defined as the
subjective experience of existence within the simulation [20]. Buy-in is
the user’s acceptance of the experience as a useful training event.
1.4.1 Background
Taxonomy, as the science of classification, is an indispensable aspect of scientific
studies and is concerned with finding, describing, classifying, and naming things.
For example, taxonomies of plants and animals identify logical relationships between
different species. In animal taxonomy, a living organism is assigned successively to a
kingdom, phylum, class, order, family, genus, and species. Another example is the taxonomy
of learning and Bloom’s taxonomy of educational objectives [36]. Taxonomy of
learning and Bloom’s taxonomy of educational objectives are particularly important
Due to the richness of modelling and simulation and its relationship with other
relevant disciplines, several other taxonomies of specific topics will be useful. Even
the most fundamental concepts have several terms to represent nuances. For example,
there are over 150 terms related to ‘variables’, over 90 terms related to ‘values’,
and over 1,000 terms related to or representing types of models (M&S BoK Index
studies). To attest to the richness of the field, two appendices are given. Appendix A is a
list of over 750 types of simulation and Appendix B is a list of 120 types of input variables.
1.4.3 Ontologies of simulation
Silver et al. prepared an ontology for discrete-event modelling and simulation [64].
The book edited by Tolk [65] is a very good source of information about simulation
ontologies. From Tolk’s book, the following are noteworthy contributions to simu-
lation ontology: Partridge et al. [66]; Hofmann [67]; Heath and Jackson [68]; and
Wang et al. [69]. An ontology for simulation systems engineering was developed by
Durak and Ören [70].
An ontology-based dictionary of multimodels was prepared by Ören, Mittal, and
Durak [9]. An ontology-based dictionary of machine understanding can be used for
simulating systems with understanding abilities including systems able to understand
emotions [38].
As a normative view, we think that the development of new and updated, as well
as more diversified, taxonomies, ontologies, and ontology-based dictionaries may be
useful for learning several aspects of simulation, since the field is progressing very rapidly
and becoming an infrastructure for many disciplines.
1.6 Conclusion
Experimentation and experience aspects of simulation have already made it an
invaluable infrastructure for many disciplines and application areas. In this chapter,
the evolution and increasing importance of simulation are elaborated after clarifying
its experimentation and experience aspects. A comprehensive and integrative view
of simulation is helpful for appreciating the many advantages it offers. For this reason,
many taxonomies and ontologies and some ontology-based dictionaries are also
presented.
Disclaimer
The author’s affiliation with The MITRE Corporation is provided for identification
purposes only, and is not intended to convey or imply MITRE’s concurrence with,
or support for, the positions, opinions or viewpoints expressed by the author(s).
Approved for Public Release, Distribution Unlimited [Case Number: PR_17-3254-2].
References
[1] Ören T.I. ‘Uses of simulation’ in Sokolowski J.A., Banks C.M. (eds.). Princi-
ples of Modeling and Simulation: A Multidisciplinary Approach. New Jersey:
John Wiley; 2009. pp. 153–179.
[2] Mittal S., Durak U., Ören T. (eds.). Guide to Simulation-Based Disciplines:
Advancing our Computational Future. Cham: Springer; 2017.
[3] Tolk A., Ören T. (eds.). The Profession of Modeling and Simulation: Discipline,
Ethics, Education, Vocation, Societies, and Economics. Hoboken, NJ: John
Wiley & Sons; 2017.
[4] Ören T.I. ‘The many facets of simulation through a collection of about 100
definitions’. SCS M&S Magazine. 2011, vol. 2(2), pp. 82–92.
[5] Ören T.I. ‘A critical review of definitions and about 400 types of modeling and
simulation’. SCS M&S Magazine. 2011, vol. 2(3), pp. 142–151.
[6] Ören T.I., Zeigler B.P. ‘Concepts for advanced simulation methodologies’.
Simulation. 1979, vol. 32(3), pp. 69–82.
[7] Ören T.I. ‘Modeling and simulation: A comprehensive and integrative view’
in Yilmaz L., Ören T.I. (eds.). Agent-Directed Simulation and Systems
Engineering. Berlin: Wiley; 2009. pp. 3–36.
[8] Ören T.I., Yilmaz L. ‘Philosophical aspects of modeling and simulation’ in
Tolk A. (ed.). Ontology, Epistemology, and Teleology of M&S: Philosophical
Foundations for Intelligent M&S Applications. Berlin, Heidelberg (Germany):
Springer-Verlag; 2013. pp. 157–172.
[9] Ören T., Mittal S., Durak U. ‘The evolution of simulation and its contribu-
tions to many disciplines’ in Mittal S., Durak U., Ören T. (eds.). Guide to
Simulation-Based Disciplines: Advancing our Computational Future. Cham
(Switzerland): Springer; 2017. pp. 3–24.
[10] Bruzzone A.G., Massei M. ‘Simulation-based military training’ in Mittal S.,
Durak U., Ören T. (eds.). Guide to Simulation-Based Disciplines: Advancing
our Computational Future. Cham (Switzerland): Springer; 2017. pp. 315–362.
[11] Bezdek W.J., Maleport J., Olshon R. ‘Live, virtual & constructive simulation
for real time rapid prototyping, experimentation and testing using network
centric operations’. AIAA Modeling and Simulation Technologies Conference
and Exhibit, Honolulu, HI, 2008.
[12] De Angelo J., George L.S., Moody J. The Link Flight Trainer: An Historic
Mechanical Engineering Landmark. ASME International, History and Her-
itage Committee, & Roberson Museum & Science Center, Binghamton, New
York, 2000.
[13] MarketsandMarkets. Flight Simulator Market by Application (Military,
Commercial), by Type of Flight (Fixed Wing, Rotary Wing, Unmanned
Aircraft), Military Component (FFS, FMS, FTD), Commercial Component
(FFS, FBS, FTD), Geography – Global Forecast to 2021 [online].
Available from http://www.marketsandmarkets.com/Market-Reports/flight-simulator-market-22246197.html
[Accessed 09 Sep 2017].
[14] Hopkin V.D. Human Factors in Air Traffic Control. Bristol, PA: CRC Press;
1995.
[15] Adacel. ATC Simulation and Training [online]. Available from
http://www.adacel.com/solutions_services/downloads/brochures/2017_MaxSim_WEB.pdf
[Accessed 11 Sep 2017].
[16] Baldwin T.T., Ford J.K. ‘Transfer of training: A review and directions for future
research’. Personnel Psychology. 1988, vol. 41(1), pp. 63–105.
[17] Valverde H.H. ‘A review of flight simulator transfer of training studies’. Human
Factors. 1973, vol. 15(6), pp. 510–522.
[18] Pool D.M., Zaal P.M.T. ‘A cybernetic approach to assess the training of manual
control skills’. IFAC-PapersOnLine. 2016, vol. 49(19), pp. 343–348.
[19] Alexander A.L., Brunyé T., Sidman J., Weil S.A. From Gaming to Training:
A Review of Studies on Fidelity, Immersion, Presence, and Buy-In and Their
Effects on Transfer in PC-Based Simulations and Games. DARWARS Training
Impact Group, Woburn, MA: 2005.
[20] Witmer B., Singer M. Measuring presence in virtual environments. U.S. Army
Research Institute for the Behavioral and Social Sciences Tech. Report No.
1014, 1994.
[21] Yeh T.Y., Faloutsos P., Reinman G. ‘Enabling real-time physics simulation in
future interactive entertainment’. Proceedings of the 2006 ACM SIGGRAPH
Symposium on Videogames; Boston, MA; 2006.
[22] Eberly D.H. Game Physics. 2nd edition, Boca Raton, FL: CRC Press; 2010.
[23] Millington I. Game Physics Engine Development. San Francisco, CA: Morgan
Kaufmann Publishers; 2007.
[24] Bullet Physics Library [online]. Available from http://bulletphysics.org/wordpress/
[Accessed 11 Sep 2017].
[25] Autodesk® Maya [online]. Available from https://www.autodesk.de/products/maya
[Accessed 11 Sep 2017].
[26] Blender [online]. Available from https://www.blender.org/ [Accessed 11 Sep
2017].
[27] Red Dead Redemption [online]. Available from
http://www.rockstargames.com/games/info/reddeadredemption [Accessed 11 Sep 2017].
[28] Toy Story 3 [online]. Available from
http://games.disney.com.au/toy-story-3-video-game [Accessed 11 Sep 2017].
[29] GameWorks PhysX Overview [online]. Available from
https://developer.nvidia.com/gameworks-physx-overview [Accessed 11 Sep 2017].
[30] Havok [online]. Available from https://www.havok.com/ [Accessed 11 Sep
2017].
[31] Boeing A., Bräunl T. ‘Evaluation of real-time physics simulation systems’.
Proceedings of the 5th International Conference on Computer Graphics and
Interactive Techniques in Australia and Southeast Asia; Perth, Australia; 2007.
[32] Iben H., Meyer M., Petrovic L., Soares O., Anderson J., Witkin A. Artistic
simulation of curly hair. Pixar Animation Studios Technical Memo 12-03a,
2012.
[33] Merida [online]. Available from http://princess.disney.com/merida [Accessed
11 Sep 2017].
[34] Brave [online]. Available from http://movies.disney.com/brave [Accessed 11
Sep 2017].
[35] Mullen T. Bounce, Tumble, and Splash!: Simulating the Physical World with
Blender 3D. Indianapolis, IN: John Wiley & Sons; 2008.
[36] Anderson L.W. A Taxonomy for Learning, Teaching, and Assessing: Pearson
New International Edition: A Revision of Bloom’s Taxonomy of Educational
Objectives, Abridged Edition. London, England: Pearson Education Limited;
2013.
[37] Ören T., Mittal S., Turnitsa C., Diallo S.Y. ‘Simulation-based learning and
education disciplines’ in Mittal S., Durak U., Ören T. (eds.). Guide to
[66] Partridge C., Mitchell A., de Cesare S. ‘Guidelines for developing ontological
architectures in modelling and simulation’ in Tolk A. (ed.). Ontology, Epis-
temology, and Teleology of M&S: Philosophical Foundations for Intelligent
M&S Applications. Berlin, Heidelberg (Germany): Springer-Verlag; 2013, pp.
27–57.
[67] Hofmann M. ‘Ontologies in modeling and simulation: An epistemological
perspective’ in Tolk A. (ed.). Ontology, Epistemology, and Teleology of
M&S: Philosophical Foundations for Intelligent M&S Applications. Berlin,
Heidelberg (Germany): Springer-Verlag; 2013. pp. 59–87.
[68] Heath B.L., Jackson R.A. ‘Ontological implications of modeling and simulation
in postmodernity’ in Tolk A. (ed.). Ontology, Epistemology, and Teleology
of M&S: Philosophical Foundations for Intelligent M&S Applications. Berlin,
Heidelberg (Germany): Springer-Verlag; 2013. pp. 89–103.
[69] Wang W., Wang W., Li Q., Yang F. ‘Ontological, epistemological, and tele-
ological perspectives on service-oriented simulation frameworks’ in Tolk A.
(ed.). Ontology, Epistemology, and Teleology of M&S: Philosophical Foun-
dations for Intelligent M&S Applications. Berlin, Heidelberg (Germany):
Springer-Verlag; 2013, pp. 335–358.
[70] Durak U., Ören T. ‘Towards an ontology for simulation systems engineering’.
Proceedings of the SpringSim’16; Pasadena, CA, 2016.
[71] Ören T.I. ‘Computer-aided modelling systems’ in Cellier F.E. (ed.). Progress
in Modelling and Simulation. London: Academic Press; 1982. pp. 189–203.
[72] Ören T.I., Zeigler B.P. ‘System theoretic foundations of modeling and simula-
tion: A historic perspective and the legacy of A. Wayne Wymore’. Simulation.
2012, vol. 88(9), pp. 1033–1046.
[73] Zeigler B.P. Multifacetted Modeling and Discrete Event Simulation. London:
Academic Press; 1984.
[74] Simon, H.A., Newell A. ‘Simulation of human thinking’ in Greenberger M.
(ed.). Computers and the World of the Future. Cambridge, MA: The MIT Press;
1962. pp. 94–131.
[75] Feigenbaum E.A., Feldman J. (eds.). Computers and Thought. McGraw-Hill
Book Company; 1963
[76] Ören T.I. ‘Artificial intelligence and simulation: A typology’. Proceedings of
the 3rd Conference on Computer Simulation; Mexico City, 1995
[77] Yilmaz L., Ören T.I. (eds.). Agent-Directed Simulation and Systems Engineer-
ing. Berlin: Wiley-Berlin; 2009.
[78] Ören T.I., Yilmaz L. ‘Synergy of systems engineering and modeling and
simulation’. Proceedings of the 2006 International Conference on Modeling
and Simulation – Methodology, Tools, Software Applications (M&S MTSA);
Calgary, AL, Canada, 2006.
[79] NATO-SaaS. Modeling and Simulation as a service: New concepts and
service-oriented architectures. NATO STO Technical Report AC/323(MSG-
131)TP/608, 2015.
[80] Ören T.I., Zeigler B.P., Elzas M.S. (eds.). Simulation and model-based
methodologies: An integrative view. Berlin: Springer-Verlag; 1984.
Chapter 2
Flexible modeling with Simio
David T. Sturrock¹ and C. Dennis Pegden¹
¹ Simio LLC, USA
2.1 Overview
Simio comes with pre-built libraries of objects. For example, the Standard Library
is a set of general purpose objects (source, server, path, sink, etc.) that is commonly
used to model a wide range of discrete systems. Likewise, the Flow Library is a
set of general purpose objects (e.g., tank, pipe, filler) that is used to model systems
involving material flows such as liquids, sand, gravel, etc. Many other libraries are
also available such as the Extras library that represents cranes, elevators, robots,
and more.
In many cases, a modeling project is approached by first building a custom library
of special purpose objects, and then those objects are used as building blocks for creat-
ing a model. For example, a complex communication network involving ships, tanks,
airplanes, command centers, satellites, etc. can be modeled by first creating objects
representing each of the physical components and then placing multiple instances of
these objects into the final model. Objects can be stored in libraries and easily shared.
A beginning modeler may prefer to use pre-built objects from libraries; however, the
system is designed to make it easy for even beginning modelers to build their own
intelligent objects.
As noted above, a Simio model is built by combining objects that represent the
physical components of the system. A Simio model looks like the real system. The
model logic and animation are built in a single step. An object is animated in 3D to
reflect the physical object and its changing state. For example, a robot opens and
closes its gripper, and a battle tank turns its turret. The animated model provides a
moving picture of the system in operation. To simplify the effort of building animated
3D models, Simio can import 2D and 3D background objects as well as 2D and 3D
object representations from the target domain. Simio also provides a direct link to
Trimble 3D Warehouse, a free massive online library of 3D graphic symbols that
contains high-quality 3D symbols from virtually every domain.
Objects are built using graphical processes and the concepts of object-orientation.
There is no need to write programming code to create new custom objects. The activity
of building an object in Simio is identical to the activity of building a model—in fact,
there is no difference between an object and a model. This concept is referred to as
the equivalence principle and is central to the design of Simio. Whenever you build a
model, it is an object that can be instantiated into another model. For example, if you
combine two satellite dishes and six missile launchers into a missile defense battery,
the missile defense battery model is itself an object (see Figure 2.2) that can then
be instantiated any number of times into other models. The missile defense battery
model is an object just like the satellite dish and missile launchers are objects. In
Simio, there is no way to separate the idea of building a model from the concept
of building an object. Every model that is built in Simio is automatically a building
block that can be used in building higher level models.
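The equivalence principle can be illustrated outside Simio with a small sketch. The following Python fragment is only an analogy, not Simio code: the class ModelObject and the helper make_battery are hypothetical names introduced here. It shows a model (the missile defense battery) that is itself an object and is instantiated several times inside a higher level model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelObject:
    """A model that is also an object: it may contain other ModelObjects."""
    name: str
    components: List["ModelObject"] = field(default_factory=list)

    def count_leaves(self) -> int:
        """Count the primitive (non-composed) objects inside this model."""
        if not self.components:
            return 1
        return sum(c.count_leaves() for c in self.components)


def make_battery(name: str) -> ModelObject:
    """Build one missile defense battery: two dishes and six launchers."""
    dishes = [ModelObject(f"{name}.Dish{i}") for i in range(1, 3)]
    launchers = [ModelObject(f"{name}.Launcher{i}") for i in range(1, 7)]
    return ModelObject(name, dishes + launchers)


# The battery model is instantiated three times in a higher level model.
theater = ModelObject("Theater", [make_battery(f"Battery{i}") for i in range(1, 4)])
print(theater.count_leaves())  # 24 (3 batteries x 8 components)
```

Because a battery and a satellite dish are instances of the same class, the higher level model does not need to know whether a component is primitive or composed.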
Composite objects: The previous example in which we defined a new object def-
inition (missile defense battery) by combining other objects (satellite dish and
missile launcher) is one example of how we can create object definitions in Simio.
This type of object is called a composed object because we create this object by
combining two or more component objects. This object-building approach is fully
[Figure: diagram with the labels Intelligent object, Entity, and Transporter]
movement between objects. A node defines a starting or ending point for a link.
Links and nodes can be combined into complex communication and physical net-
works. Although the base link has little intelligence, we can add behavior to allow
it to model unconstrained flow, congested traffic flow, or complex material handling
systems such as accumulating conveyors or power and free conveying systems.
Agents are objects that can freely move through three-dimensional space. Agents
are also typically used for developing agent-based models. This modeling view is
useful for studying systems that are composed of many independently acting intel-
ligent objects that interact with each other and in so doing create the overall system
behavior. Examples of applications include market acceptance of a new product or
service, or population growth of competing species within an environment. Note that
in Simio, graphically defined processes give every object the intelligence to control
its behavior, rather than requiring Java or other programming code as in most other
products.
Entities are objects that can freely move through three-dimensional space. Entities
can move through the system from object to object over a network of links and nodes
or move directly between objects through free space. Examples of entities include
communications such as information packets, or physical items such as tanks, satel-
lites, ships, etc. Note that in traditional modeling systems, the entities are typically
passive and are acted upon by the model processes. However, in Simio, the entities
can have intelligence and control their own behavior.
The final class of object is the Transporter, which is subclassed from the entity class.
A transporter is an entity that has the added capability to pick up, carry, and drop off
one or more other entities. By default, transporters have none of this behavior, but by
adding model logic to this class, we can create a wide range of transporter behaviors.
A transporter can model an airplane, ship, subway car, automated guided vehicle
(AGV), or any other object that can carry other entities from one location to another.
The Standard Library contains a vehicle object and a worker object, both of which
are derived from a transporter object.
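The subclassing relationship described above may be easier to see in code. The sketch below is a loose Python analogy, not the Simio class definitions: the class names mirror the text, but every attribute and method (location, ride_capacity, pick_up, move_to, drop_off) is an assumption made for illustration.

```python
from typing import List


class Entity:
    """Hypothetical stand-in for an object that moves through the model."""

    def __init__(self, name: str, location: str) -> None:
        self.name = name
        self.location = location

    def move_to(self, destination: str) -> None:
        self.location = destination


class Transporter(Entity):
    """An entity that can also pick up, carry, and drop off other entities."""

    def __init__(self, name: str, location: str, ride_capacity: int = 1) -> None:
        super().__init__(name, location)
        self.ride_capacity = ride_capacity
        self.riders: List[Entity] = []

    def pick_up(self, entity: Entity) -> bool:
        # Pickup succeeds only at the same location and with spare capacity.
        if entity.location != self.location or len(self.riders) >= self.ride_capacity:
            return False
        self.riders.append(entity)
        return True

    def move_to(self, destination: str) -> None:
        # Riders travel together with the transporter.
        super().move_to(destination)
        for rider in self.riders:
            rider.location = destination

    def drop_off(self) -> List[Entity]:
        dropped, self.riders = self.riders, []
        return dropped


class Vehicle(Transporter):
    """Illustrative counterpart of a library vehicle derived from a transporter."""


truck = Vehicle("Truck1", "Depot", ride_capacity=2)
crate = Entity("Crate1", "Depot")
truck.pick_up(crate)
truck.move_to("Port")
truck.drop_off()
print(crate.location)  # Port
```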
A key feature of Simio is the ability to create a wide range of object behaviors
from these six basic classes. The Simio modeling framework is application domain
neutral—i.e., these basic classes are not specific to communications, manufacturing,
service systems, healthcare, military, etc. However, it is easy to build application-focused
libraries comprising intelligent objects from these classes designed for
a specific application. For example, it is relatively simple to build an object (in this
case a link) that represents a complex accumulating conveyor for use in manufacturing
applications. The design philosophy of Simio directs that this type of domain-specific
logic belongs to the objects that are built by users, and not programmed into the core
system.
both items are modeled with entities, which move through the 3D model in one of two
ways. The first is to simply move in free space with no constraints in movement. In
this case, the entity can set its own direction, speed, and acceleration. In free space, the
entity is in complete control of its own movement. The second method is to move over
a network of nodes and links, where the network may control and limit the movements
of the entities. Networks are very useful for modeling complex movements.
Networks comprise one or more links, where each link starts and ends at a
node. A node can have any number of incoming and outgoing links. Links can be
unidirectional or bidirectional, have a capacity that limits traffic on the link, and can
have a maximum speed to limit traffic speed. Links also have a selection weight that
can be used in decision rules for routing entities through the network. The example
network in Figure 2.4 has six nodes (labeled A–F) and ten links connecting the nodes,
where the triangles are entities moving through the network.
The complete set of all links in a model is referred to as the global network.
However, links can also belong to one or more subnetworks. For example, the com-
munication links between a set of satellite dishes might be represented by a subnetwork
that is limited for use by signals traveling between satellite dishes, and pathways where
ships travel may be specified by a separate network.
The Standard Library contains four link objects and two node objects. The con-
nector, path, time path, and conveyor are derived from the link object and the basic
node and the transfer node are derived from the node object. The connector moves
entities across the link in zero-time. This type of link is used to model movements
such as signals that travel at the speed of light, for which the travel time is negligible
and can be ignored. The path is a type of link used to model entity movements where
each entity can travel at its own speed and either pass or not pass other entities based
on a property that is specified on the path. The time path is used to model situations
where the travel time on the link is specified by an expression (perhaps involving
random variables and other system status variables). The conveyor is a type of link
that is used to model both accumulating and non-accumulating conveyors that are
found in typical manufacturing and warehousing applications. Although the links and
nodes provided by the Standard Library work for many applications, users can also
create their own custom nodes and links.
Each entity in the network can have a single destination where it is headed, or
it can follow a specified travel sequence through the network. A travel sequence is
an ordered list of nodes (e.g., A, C, D, F in Figure 2.4) that must be visited in the
specified order on the way to its last node in the sequence. In either case, an entity
may have more than one possible routing to its next destination. For example, an
entity traveling from C to E could either take the direct path from C to E or travel
from C to D and then D to E. The longer route might be advantageous, for example, if the link
from C to E were congested and the travel speed on the alternate route through D were
fast enough to warrant the extra travel distance. The decision for which route to take
when moving to its next intermediate or final destination is based on properties that
are specified on the transfer node. The Outbound Link Rule property specifies whether
the link should be selected based on the shortest path or on decision weights that can
be assigned to each link. The Link Preference property specifies whether all links are to be
considered, only links that are currently available, or a specific link is desired.
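The two kinds of outbound link rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Simio's routing implementation: the LINKS table is a hypothetical network (it is not the topology of Figure 2.4), the lengths and selection weights are invented, and the function names are introduced here. The first function applies Dijkstra's algorithm over link lengths; the second picks an outbound link at random in proportion to its selection weight.

```python
import heapq
import random
from typing import Dict, List, Tuple

# Hypothetical network: node -> list of (neighbor, length, selection_weight).
LINKS: Dict[str, List[Tuple[str, float, float]]] = {
    "A": [("B", 4.0, 1.0), ("C", 2.0, 3.0)],
    "B": [("D", 5.0, 1.0)],
    "C": [("D", 3.0, 1.0), ("E", 8.0, 1.0)],
    "D": [("E", 2.0, 2.0), ("F", 6.0, 1.0)],
    "E": [("F", 1.0, 1.0)],
    "F": [],
}


def shortest_path(start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm over link lengths."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length, _weight in LINKS[node]:
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [neighbor]))
    raise ValueError(f"no route from {start} to {goal}")


def pick_by_weight(node: str, rng: random.Random) -> str:
    """Choose an outbound link at random, in proportion to its selection weight."""
    neighbors = [n for n, _length, _w in LINKS[node]]
    weights = [w for _n, _length, w in LINKS[node]]
    return rng.choices(neighbors, weights=weights, k=1)[0]


print(shortest_path("C", "E"))  # (5.0, ['C', 'D', 'E'])
```

In this invented network the route through D (length 5.0) beats the direct link from C to E (length 8.0), which mirrors the situation discussed in the text.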
Object | Class | Description
Flow source | Fixed | Generates a flow of fluid or other mass of a specified entity type
Flow sink | Fixed | Destroys flow entities representing quantities of fluids or other mass that have finished processing in the model
Tank | Fixed | Models a volume or weight capacity-constrained location for holding entities representing quantities of fluids or other mass
Container entity | Entity | Models a type of simple moveable container (e.g., barrels or totes) for carrying flow entities representing quantities of fluids or other mass
Filler | Fixed | Fills containers with flow entities representing quantities of fluids or other mass
Emptier | Fixed | Empties the flow contents of container entities
Item to flow converter | Fixed | Converts entities representing discrete items into flow entities representing quantities of fluids or other mass
Flow to item converter | Fixed | Converts flow entities representing quantities of fluids or other mass into entities representing discrete items
Flow node | Node | Regulates the flow of entities representing quantities of fluid or other mass
Flow connector | Link | A zero-time connection between two flow nodes
interarrival time (typically a random variable). All properties can be used for either
deterministic or stochastic arrivals. Other properties provide flexibility in terminating
the arrival stream, for example, after a specified time or specified number of arrivals.
Many objects also use events to customize their behavior. Events let objects easily
communicate with other objects. For example, an event triggered elsewhere in the
model might cause a source object to create an arrival or entirely stop creating new
arrivals.
Figure 2.6 shows a simple model built using the source, server, and sink objects,
along with path links to define the movements between these objects. In this example,
entities are created at the source, travel to the server where they queue up and wait
for processing, and then travel to the sink where they depart the model.
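For readers who want to see the underlying logic, the source, server, and sink flow can be approximated in plain Python. This is a minimal sketch under simplifying assumptions, not a reproduction of the Simio objects: the server has a capacity of one with a first-in-first-out queue, path travel times are taken as zero, and the function name and the exponential distributions in the usage are choices made for this example.

```python
import random
from typing import Callable, List, Tuple


def run_source_server_sink(
    n_entities: int,
    interarrival: Callable[[], float],
    service: Callable[[], float],
) -> List[Tuple[float, float]]:
    """Return (arrival_time, departure_time) for each entity.

    Single-capacity server with a first-in-first-out queue; travel times on
    the paths are assumed to be zero to keep the sketch short.
    """
    records = []
    clock = 0.0           # time of the most recent arrival at the source
    server_free_at = 0.0  # time at which the server finishes its current entity
    for _ in range(n_entities):
        clock += interarrival()              # source creates the next entity
        start = max(clock, server_free_at)   # wait in the queue if the server is busy
        server_free_at = start + service()   # server processes the entity
        records.append((clock, server_free_at))  # entity departs through the sink
    return records


rng = random.Random(42)
results = run_source_server_sink(
    1000,
    interarrival=lambda: rng.expovariate(1.0),  # mean interarrival time 1.0
    service=lambda: rng.expovariate(1.25),      # mean processing time 0.8
)
mean_time_in_system = sum(d - a for a, d in results) / len(results)
print(f"mean time in system: {mean_time_in_system:.2f}")
```

Passing the interarrival and processing times as callables keeps the same function usable for both deterministic and stochastic arrivals.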
The objects shown in Figure 2.6 all have their default generic graphics; in a
typical model, these would be replaced by more appropriate graphics. For example,
if the server represented an ATM at a bank, we would typically replace
the rectangular server symbol with a graphic symbol of an ATM from 3D
Warehouse. We could also replace the triangles representing entities with animated
walking people. Figure 2.7 enhances that same model with the default graphics for
the server and entity replaced, which required only a few minutes to create.
Figure 2.6 Example model using source, server, and sink objects
The server object that is used in this simple model is one of the most powerful
and commonly used objects in the Standard Library. It can model a wide range of
physical elements of a system that constrain the movement of entities based on one
or more activities that must take place, secondary resources that may be required, and
material that may be consumed. Figure 2.8 illustrates many of the common properties
of the server object as well as an optional attached pie chart in the facility view that
indicates possible resource states.
[Figure 2.8: pie chart for Server1; legend entries Starved, Processing, Blocked, Failed, Off shift, Failed processing, Off shift processing, Setup, Off shift setup; plotted shares 53.3606% and 46.6394%]
The server can model multichannel processors, follow complex work schedules,
and incorporate failure/repair patterns. The server can also model complex operations
composed of a network of tasks that follow precedence relationships and operate
in parallel and/or sequentially. For example, Figure 2.9 illustrates a generic six-step
task sequence where each task has prerequisite tasks. Not only are the number and
relationships of tasks unlimited, but each individual task could require resources or
materials, or even be defined to execute one or more other objects, which themselves
might have networks of tasks.
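A task network with precedence relationships can be evaluated with a short topological pass. The Python sketch below is illustrative only: the precedence table and the durations are hypothetical (the arrows of Figure 2.9 are not reproduced here), resources are assumed to be unlimited so that independent tasks run in parallel, and the function name is introduced for this example. It relies on graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical precedence table: task -> list of prerequisite tasks.
PREREQS: Dict[str, List[str]] = {
    "Task1": [],
    "Task2": ["Task1"],
    "Task3": ["Task1"],
    "Task4": ["Task2"],
    "Task5": ["Task3"],
    "Task6": ["Task4", "Task5"],
}
# Hypothetical task durations in arbitrary time units.
DURATION: Dict[str, float] = {
    "Task1": 2.0, "Task2": 3.0, "Task3": 1.0,
    "Task4": 2.0, "Task5": 5.0, "Task6": 1.0,
}


def earliest_finish_times(
    prereqs: Dict[str, List[str]], duration: Dict[str, float]
) -> Dict[str, float]:
    """Earliest finish time of each task when resources are unlimited."""
    finish: Dict[str, float] = {}
    # static_order() yields every task after all of its prerequisites.
    for task in TopologicalSorter(prereqs).static_order():
        start = max((finish[p] for p in prereqs[task]), default=0.0)
        finish[task] = start + duration[task]
    return finish


finish = earliest_finish_times(PREREQS, DURATION)
print(finish["Task6"])  # 9.0
```

Task6 cannot start until both Task4 (finishing at 7.0) and Task5 (finishing at 8.0) are complete, so the longer branch determines the overall completion time.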
The worker and vehicle are two other objects that are commonly used in Simio
models. The worker object is used to model operators or crew members that move
around the system and perform tasks. For example, a server may request that a worker
come to the server to set it up before processing an entity. The vehicle object
is used to model ships, trucks, AGVs, etc., that travel through the model, picking up
and dropping off entities. Vehicles have flexible work selection and allocation logic,
reliability logic, and many options to control both behavior and animation such as
load and unload time, dwell logic, and automatic parking and homing options.
[Figure 2.9: task sequence diagram with nodes Task1 through Task6]
While each object has object-specific properties as mentioned above, each object
also has categories of properties that are found across many different objects. For
example, objects that incorporate buffers or queues typically have a Buffer Logic
category that contains properties to describe the capacity of those buffers, as well
as the logic that governs balking (bypass queue entry) and reneging (abort queue
waiting). Most objects have a Financials category that specifies the properties that
support comprehensive activity-based costing in all world currencies.
Objects that typically represent some type of machine or equipment have a
Reliability category where failure-related properties such as downtime mode, period
between downtimes, and time to repair are specified. Downtime modes include cal-
endar or processing time between failures, processing count between failures, and
event-based failures.
Many objects also have categories to provide higher level interaction with other
objects such as state assignments, statistics, customized animation, and data log-
ging. Two broad interaction mechanisms—processes and data tables—are discussed
in Sections 2.6 and 2.7, respectively.
2.6 Processes
The use of library objects permits fast, highly productive modeling. But unless the
library is designed to closely match your application, you will often have to customize
objects in order to model accurately enough to meet project objectives. In most OO
simulation products, this customization can only be accomplished by modifying the
object definition using programming code like Java, C++, or a proprietary language.
Doing so takes a level of expertise often not readily available. Simio provides two
alternatives, both based on the patented concept of processes.
A process is a graphical way of defining the logic behind an object. Processes can
be used to make decisions, seize or release resources, search collections of objects or
data, wait for or trigger communications events, assign state variables, record custom
statistics, and much more. Figure 2.10 illustrates a process that makes a decision, then
either seizes, delays, and releases a resource or waits for an event and assigns a state, and
then records an observational statistic. Although over 60 steps are available, the most
commonly used steps are described in Table 2.3.
[Figure 2.10: Process1 running from Begin to End through the steps Decide1, Seize1, Delay1, Release1, Wait1, Assign1, and Tally1, with True and False exits from Decide1]
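The logic of the process described above can be written out as ordinary code to show what the graphical steps express. The following Python sketch is a rough textual analogy, not how Simio executes processes: Token, Resource, and process1 are hypothetical names, the seize is assumed to succeed immediately (no queueing), and which branch corresponds to the True exit of the Decide step is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    """Hypothetical token that executes the process on behalf of an entity."""
    needs_resource: bool
    clock: float = 0.0
    trace: List[str] = field(default_factory=list)


@dataclass
class Resource:
    capacity: int = 1
    in_use: int = 0


def process1(
    token: Token,
    resource: Resource,
    delay_time: float,
    event_time: float,
    state: Dict[str, str],
    tally: List[float],
) -> None:
    """Decide, then either Seize/Delay/Release or Wait/Assign, then Tally."""
    token.trace.append("Decide")
    if token.needs_resource:
        # Seize is assumed to succeed at once; no queueing is modeled here.
        resource.in_use += 1
        token.trace.append("Seize")
        token.clock += delay_time
        token.trace.append("Delay")
        resource.in_use -= 1
        token.trace.append("Release")
    else:
        # Wait until the event fires, then assign a state variable.
        token.clock = max(token.clock, event_time)
        token.trace.append("Wait")
        state["Status"] = "Notified"
        token.trace.append("Assign")
    tally.append(token.clock)  # record an observational statistic
    token.trace.append("Tally")


tally: List[float] = []
state: Dict[str, str] = {}
first = Token(needs_resource=True)
second = Token(needs_resource=False)
process1(first, Resource(), delay_time=2.0, event_time=5.0, state=state, tally=tally)
process1(second, Resource(), delay_time=2.0, event_time=5.0, state=state, tally=tally)
print(first.trace, second.trace, tally)
```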
A process can be used in an object definition to define the logic in a new object
or customize the logic in a subclassed object. But in many cases an even simpler alter-
native is available. Most library objects have “hooks” called add-on process triggers
that can be used to supplement the logic in a specific object instance. Figure 2.11
illustrates the add-on process triggers available in the server object. These support the
database files. But processing such files incrementally during a model run can often
be slow and inconvenient. So Simio extends this commonly available capability to
also create in-memory data repositories called data tables. In-memory tables execute
extremely fast.
The schema or design of data tables is under user control—you can have any
number of columns of different data types, in any order you want. A table can be
designed to be most convenient to the modeler or could be designed to perfectly
match an external data source to avoid transforming the data on each use. Data tables
can be simple tables, like a spreadsheet, or can be comprehensive sets of hierarchical
relational tables linked by keys and foreign keys. Figure 2.12 illustrates a set of
three tables, Job Table, Process Plan, and WIP, that are related by a key field. The
master-detail view is expanded on the first part type to show the relationships.
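The idea of relational tables linked by a key field can be sketched with plain Python data structures. The example below is an assumption-laden illustration: the table names follow the text, but the column names (PartType, Sequence, Station, Quantity), the rows, and the helper functions are hypothetical and are not taken from Figure 2.12.

```python
from typing import Dict, List

# Hypothetical rows; the column names are illustrative only.
job_table: List[Dict] = [
    {"PartType": "P1", "Description": "Bracket"},
    {"PartType": "P2", "Description": "Housing"},
]
process_plan: List[Dict] = [
    {"PartType": "P1", "Sequence": 1, "Station": "Cut"},
    {"PartType": "P1", "Sequence": 2, "Station": "Weld"},
    {"PartType": "P2", "Sequence": 1, "Station": "Mold"},
]
wip: List[Dict] = [
    {"PartType": "P1", "Quantity": 4},
    {"PartType": "P2", "Quantity": 7},
]


def details(table: List[Dict], key_value: str, key: str = "PartType") -> List[Dict]:
    """Return the detail rows whose foreign key matches the master row's key."""
    return [row for row in table if row[key] == key_value]


def master_detail_view(master: List[Dict]) -> Dict[str, Dict]:
    """Expand each master row with its related Process Plan and WIP rows."""
    view = {}
    for row in master:
        part = row["PartType"]
        view[part] = {
            "job": row,
            "process_plan": sorted(details(process_plan, part), key=lambda r: r["Sequence"]),
            "wip": details(wip, part),
        }
    return view


view = master_detail_view(job_table)
print([step["Station"] for step in view["P1"]["process_plan"]])  # ['Cut', 'Weld']
```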
Tables can be built and used entirely within Simio, but it is more common to
import the table data from an external source. Simio incorporates sophisticated table-
input mechanisms. In addition to CSV, Excel, and databases, Simio directly supports
reading data tables in the Business to Manufacturing Markup Language (B2MML).
Since B2MML is used to integrate business systems such as ERP and software such
as SAP [2] with manufacturing systems such as manufacturing execution systems
(MESs), it is a rich source of predefined information for use in simulations. Simio
can also generate tables directly from Wonderware™, a leading MES software. Tables
can be configured to import on demand, or tables with frequently changing data can be
configured to automatically import with each run. Simio has extensive
built-in support for data import (see Figure 2.13), and it also provides the capability,
including sample source code, to customize data import with programming in any of
over 60 .NET languages.
Simio tables are key to implementing two important modeling strategies that
are having a major impact on the simulation industry.
Data-driven modeling is a way of structuring a model so that much of the model
data resides in data tables rather than being scattered throughout the model, and so that
the model can be configured through those data tables (or associated external
files). The combination of these features makes models easier for the modeler to
understand, maintain, and share with others, and makes it easier (i.e., “lowers the
bar”) for stakeholders to use and update the model without comprehensive knowledge
of simulation.
Data-generated modeling is a mechanism for building all or most of a model
entirely from external data. For example, a fairly complete model can be built directly
from data using the B2MML, ISA 95, or Wonderware™ import mechanisms men-
tioned above. Alternatively, the Simio application programming interface (API) can
be used to import model data from virtually any database, spreadsheet, or other data
source. Importing major parts of system configuration and descriptive information
can dramatically lower the time and expertise required to create a model-based solution
to a pressing problem.
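The principle behind data-generated modeling can be shown with a toy example that builds a model structure entirely from tabular data. The sketch below does not use the Simio API or the B2MML format: the two CSV layouts, the column names, and the build_model function are assumptions made for illustration, and the result is only a dictionary describing objects and their outbound links.

```python
import csv
import io
from typing import Dict

# Hypothetical external data: one table of objects and one table of links.
OBJECTS_CSV = """name,type,processing_time
Source1,Source,
Server1,Server,2.5
Sink1,Sink,
"""
LINKS_CSV = """from,to,link_type
Source1,Server1,Path
Server1,Sink1,Path
"""


def build_model(objects_csv: str, links_csv: str) -> Dict[str, dict]:
    """Create an in-memory model description from two CSV tables."""
    objects: Dict[str, dict] = {}
    for row in csv.DictReader(io.StringIO(objects_csv)):
        time_text = row["processing_time"]
        objects[row["name"]] = {
            "type": row["type"],
            "processing_time": float(time_text) if time_text else None,
            "outbound": [],
        }
    for row in csv.DictReader(io.StringIO(links_csv)):
        # Reject links that reference objects missing from the object table.
        if row["from"] not in objects or row["to"] not in objects:
            raise ValueError(f"link references unknown object: {row}")
        objects[row["from"]]["outbound"].append((row["to"], row["link_type"]))
    return objects


model = build_model(OBJECTS_CSV, LINKS_CSV)
print(model["Server1"])
```

Changing the system configuration then means editing the data rather than the model-building code, which is the point of the strategy.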
While the experiment window offers standard textual reports, most people prefer
the built-in Pivot Grid reports. Like the pivot tables featured in many top data analysis
packages, these reports allow you to filter, sort, and recategorize the data. This allows you
to generate concise, custom reports in just a few clicks; you can then save
those reports and reuse them anytime. The details of all scenarios are shown along
with statistical measures like mean, minimum, maximum, and half-width. Both the
summary and the detailed results can also be exported for additional analysis in
external programs.
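The statistical measures listed above are straightforward to compute from replication results. The following Python sketch is an illustration, not Simio's reporting code: the function name summarize is hypothetical, and the half-width uses a normal quantile as an approximation, whereas a Student t quantile is the more usual choice when only a few replications are available.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Dict, List


def summarize(values: List[float], confidence: float = 0.95) -> Dict[str, float]:
    """Mean, minimum, maximum, and confidence-interval half-width.

    The half-width uses a normal quantile (an approximation that is
    reasonable for many replications, optimistic for very few).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two replications are required")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return {
        "mean": mean(values),
        "minimum": min(values),
        "maximum": max(values),
        "half_width": z * stdev(values) / sqrt(n),
    }


# Hypothetical response values from five replications of one scenario.
summary = summarize([10.0, 12.0, 11.0, 13.0, 14.0])
print(summary)
```

The interval mean ± half_width gives a quick indication of how much the replication average might move if the experiment were repeated.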
Simio’s experiment window will automatically run any number of replications,
using all your available processors (up to 16 by default). If you have the common
configuration of a dual-threaded quad core processor, you can run eight replications
in about the same time it would take to run one. With higher versions of Simio, you
can also take full advantage of other computers in your workgroup, and you can even
extend the limit of 16 to make full use of a server farm or network of workgroup
computers. Another approach to running replications is to use the Simio Portal. This
Azure™-based software as a service offering allows you to bring the processing power
of the cloud and scale up to run massively parallel replications and instantly distribute
the results across the internet.
Running many replications quickly is most important when you want to compare
multiple scenarios. Simio allows you to define referenced properties in your model,
which are displayed as controls in your experiment. Controls describe how one sce-
nario differs from another, for example, number of workers or number of servers.
Simio also allows you to define Responses in the experiment. A response is like a
key performance indicator (KPI) that is a quick measure of the performance of each
scenario. Additional statistical information is also recorded on responses to support
the response results view. The response results view is an enhancement of the measure
of risk and error (MORE) analysis technique described by Nelson [3] that makes it
easier for people without a strong statistical background to gain important insight into
their data.
When you have many possible scenarios to evaluate, manually generating them
can be tedious. It is also important to minimize or completely avoid the execution
of poor scenarios. Simio is tightly integrated with OptQuest®, the leading
simulation-based optimization product. OptQuest uses metaheuristics to guide the
search algorithm to quickly find better solutions. OptQuest combines Tabu search,
scatter search, integer programming, and neural networks into a single composite
search algorithm that is orders of magnitude faster than other approaches [4].
Simio has an extensive API that allows customization of virtually all aspects of
Simio in any of over 60 .NET languages. Users with a programming
background can create new tools and customize the menus to display those tools
to your stakeholders. You can create new steps and elements—the fundamentals of
Simio processes. You can create design-time add-ins that support building models
from external data. You can build in new import/export capabilities to support a
unique or proprietary data source for importing individual items, entire data tables, or
even generating models directly from external data. With the API, you can even add
custom experimentation. For example, both OptQuest and the Select Best Scenario
tools were implemented as experiment add-ins.
As an example of highly customized menu items, a customer who was schedul-
ing weather-sensitive operations used the API to add a “Get Weather” menu item
which would log on to a weather subscription service and download regional weather
forecasts into a Simio data table that was directly accessed during planning. This
was combined with other application-specific items to create a custom ribbon using
customer terminology.
While Simio already includes a comprehensive set of scheduling rules (see Section 2.10),
you can customize these or create your own rules. On the analysis end,
although Simio already includes extensive support for experimentation and optimiza-
tion, the API supports creation of custom design of experiments and add-ins such as
a custom optimization algorithm.
In addition to providing comprehensive documentation of the API, Simio supplies
extensive sets of sample code for all the items mentioned above.
Although simulation tools in the past have been primarily used in the design of complex
networks, Simio is specifically designed to also support applications for scheduling
these same systems. In scheduling applications, the focus is on simulating the actual
flow of entities through the network, given an initial starting state for the system. The
simulation of the entity flow then produces an operational schedule of the system, for
a given starting state and a given set of entities. The purpose of the simulation is to
forecast the operational performance of the network in an actual real-time setting.
An example of a complex network scheduling application is the scheduling of
computational tasks to be executed on a network of processors. This application
can be represented as a directed graph, where each node on the graph is a processor
that can perform one or more independent computational tasks. However, because
of data interdependencies, the tasks have precedence relationships that define the
permissible sequences for execution for these tasks. This is a complex scheduling
application; however, it is relatively simple to model the basic network system using
Simio (using routing sequences or task sequences).
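To make the scheduling problem concrete, the sketch below applies a simple greedy list-scheduling heuristic to tasks with precedence relationships. It is not the method Simio uses: the processors are assumed to be identical, communication delays are ignored, and the task names, durations, and the function list_schedule are hypothetical.

```python
from typing import Dict, List, Tuple


def list_schedule(
    prereqs: Dict[str, List[str]],
    duration: Dict[str, float],
    n_processors: int,
) -> Dict[str, Tuple[int, float, float]]:
    """Greedy list scheduling; returns task -> (processor, start, finish)."""
    schedule: Dict[str, Tuple[int, float, float]] = {}
    free_at = [0.0] * n_processors  # time at which each processor becomes idle
    remaining = set(prereqs)

    def data_ready(task: str) -> float:
        # A task may start only after all of its prerequisites have finished.
        return max((schedule[p][2] for p in prereqs[task]), default=0.0)

    while remaining:
        ready = [t for t in remaining if all(p in schedule for p in prereqs[t])]
        if not ready:
            raise ValueError("precedence relationships contain a cycle")
        # Pick the ready task whose inputs are available first (ties by name).
        task = min(ready, key=lambda t: (data_ready(t), t))
        earliest = data_ready(task)
        # Assign it to the processor on which it can start soonest.
        proc = min(range(n_processors), key=lambda i: (max(free_at[i], earliest), i))
        start = max(free_at[proc], earliest)
        finish = start + duration[task]
        schedule[task] = (proc, start, finish)
        free_at[proc] = finish
        remaining.remove(task)
    return schedule


# Hypothetical tasks: B and C depend on A; D depends on B and C.
prereqs = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
duration = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
plan = list_schedule(prereqs, duration, n_processors=2)
print(plan["D"])  # (0, 5.0, 7.0)
```

With two processors, B and C run in parallel and the last task finishes at 7.0; with a single processor the same tasks finish at 8.0.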
Traditional simulation tools lack the necessary features to use them effectively
in a real-time scheduling environment. However, Simio has been designed from the
Figure 2.14 Example scheduling deployment integrated with ERP and MES
entity plan Gantt. These two forms of Gantt charts graphically display activity from
a resource or entity perspective. The Simio implementations provide extra tracking
options such as graphical material inventory, resource states, downtime, schedules,
constraints, and detailed background information on any Gantt item. Data logs are also
the basis for user-designed Dashboard Reports (Figure 2.17). In addition to extensive
tracking and diagnostics, using a drag and drop interface, the Gantt charts can be used
to interact with the plan by such actions as specifying overtime or downtime periods,
or selecting alternate process flows. While these Gantt and log-related features are
primarily designed for planning and scheduling applications, many users have found
them to also be extremely valuable in providing debugging and clear communication
for traditional design applications.
A common problem with most scheduling systems is that the schedule must be
created deterministically. There is no good way to generate a schedule that accu-
rately predicts system downtime, material delays, extended processing times, and
other commonly encountered variability. One approach is to ignore such variability
entirely, which results in an optimistic schedule that becomes infeasible the first time
something goes wrong. Another approach is to build in extra processing time or idle
time to allow for when things go wrong. But unfortunately, this is time that is wasted
when things go well.
Simio initially creates a deterministic plan based on no variability, then it makes
additional stochastic replications that consider all the potential problems and cal-
culates the risk of key milestones or targets being missed. The colored markers on
each order in Figure 2.18 indicate the risk associated with each order. For example,
even though Order-01 and Order-02 have similar slack time, the lower likelihood
of Order-02 achieving its release date target might be due to utilizing an unreliable
machine or consuming materials that are often late. With the knowledge of this risk
before deploying the schedule, the scheduler can use Simio to objectively evaluate
the most cost-effective ways to reduce the risk. This in turn makes the schedule
more robust, i.e., it stays useful for a longer time. A related benefit is the ability to
quickly replan. When a major event (e.g., an equipment failure) invalidates the plan,
in Simio, the replan time is typically a few minutes versus the hours required in most
other approaches.
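The idea of estimating schedule risk from stochastic replications can be illustrated with a minimal sketch. It is not Simio's implementation: the function names, the single-breakdown delay model and all numeric values are assumptions chosen only to show why two orders with the same slack can carry different risk.

```python
import random

def simulate_completion(nominal_hours, failure_prob, repair_hours, rng):
    """One stochastic replication: nominal time plus a possible breakdown delay."""
    delay = repair_hours if rng.random() < failure_prob else 0.0
    return nominal_hours + delay

def on_time_probability(nominal_hours, due_hours, failure_prob, repair_hours,
                        replications=1000, seed=42):
    """Fraction of replications in which the order finishes by its due time."""
    rng = random.Random(seed)
    on_time = sum(
        simulate_completion(nominal_hours, failure_prob, repair_hours, rng) <= due_hours
        for _ in range(replications)
    )
    return on_time / replications

# Two hypothetical orders with identical slack (2 h) in the deterministic plan;
# the second one is routed over a less reliable machine.
order_01 = on_time_probability(nominal_hours=8, due_hours=10,
                               failure_prob=0.05, repair_hours=4)
order_02 = on_time_probability(nominal_hours=8, due_hours=10,
                               failure_prob=0.40, repair_hours=4)
print(f"Order-01 on-time probability: {order_01:.2f}")
print(f"Order-02 on-time probability: {order_02:.2f}")
```

In the deterministic plan both orders look equally safe; only the replications reveal that the second order is more likely to miss its target.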
Figure 2.16 Resource plan Gantt
Figure 2.17 Interactive custom dashboard
2.11 Summary
Library-based modeling has long provided a faster way to build models, but unless the
library was closely matched to your application, it was often necessary to make model
approximations which made your solutions less accurate. The flexibility promised
through OO technology has the potential for dramatic improvements but often has
problems scaling to large models, and the customization of objects and libraries still
required high programming expertise.
Simio was invented with two primary goals in mind. The first was to bring new
technology to the OO simulation field to allow users to more effectively build objects,
libraries, and models without programming. The second goal was to extend the field
of discrete-event simulation beyond the traditional system design applications into
planning and scheduling. Rather than “bolting on” features as needed, Simio was
designed from the ground up to incorporate all the features needed to solve problems
in design, planning, and scheduling, in a single tool using a single model.
The creation of Simio with its data-driven and data-generated modeling features
was timely—just as the concepts of the smart factory promise a new way of operating
our production systems. The smart factory [5], also referred to as the fourth industrial
revolution or Industry 4.0 (Figure 2.19), represents the concept of physical systems
where the components are monitored and connected to a virtual system model to
predict and improve system performance. The virtual factory model provided by
Simio is a key component of the smart factory of the future.
Glossary
B2MML: Business to Manufacturing Markup Language as defined in [6] is a set of
XML schemas implementing the ISA-95, Enterprise-Control System Integration
family of standards, known internationally as IEC/ISO 62264.
[Figure 2.19: Industry 1.0, Industry 2.0, Industry 3.0, Industry 4.0; ERP]
Dispatching rule: An algorithm for deciding which job to process next in a production
facility, such as which job has the earliest due date or which requires a minimum
changeover.
Enterprise resource planning (ERP): Enhancements of the original material require-
ments planning (MRP) functions to bring together accounting, human resources,
and other functions into a fully integrated IT system. ERP also incorporated sup-
ply chain management (SCM) to extend inventory control over a broader scope,
including distribution.
Entity: Part of an object model that can have its own intelligent behavior. Entities can
make decisions, reject requests, decide to take a rest, etc. Entities have object
definitions just like the other objects in the model. Entity objects can be dynam-
ically created and destroyed, moved across a network of links and nodes, move
through 3D space, and move into and out of fixed objects. Examples of entity
objects include customers, parts, or workpieces.
Event: A notification that can be given by one object and responded to by several. It
alerts other objects that an action has occurred.
Experiment: Part of the project that is used for output analysis. The user defines one
or more sets of inputs/outputs (scenarios) and runs multiple replications to get
statistically valid results from which to draw conclusions.
Finite capacity scheduling (FCS): A scheduling approach that accounts for the limited
production capacity of the system. This contrasts with the enterprise resource
planning system that typically assumes an infinite capacity.
Gantt chart: A chart used in scheduling applications for showing activities over a
timeline. A resource Gantt and an entity Gantt show the same information, but
from two different perspectives.
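The two dispatching rules named in the glossary entry above (earliest due date and minimum changeover) can be sketched as simple selection functions. The `Job` record and its fields are hypothetical and are not part of Simio.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    due_date: float     # hours from now (illustrative unit)
    changeover: float   # setup time needed if this job runs next

def earliest_due_date(queue):
    """Dispatching rule: pick the waiting job with the earliest due date."""
    return min(queue, key=lambda job: job.due_date)

def minimum_changeover(queue):
    """Dispatching rule: pick the waiting job that needs the least changeover."""
    return min(queue, key=lambda job: job.changeover)

waiting = [
    Job("A", due_date=12.0, changeover=0.5),
    Job("B", due_date=6.0, changeover=2.0),
    Job("C", due_date=9.0, changeover=0.1),
]
print(earliest_due_date(waiting).name, minimum_changeover(waiting).name)
```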
References
[1] Student Models, Student Simulation Competition [online]. Available from
https://www.simio.com/academics/student-competition.php. Accessed Nov 2018.
[2] Junot Systems, Inc. Advanced SAP MES Integration [online]. Available from
http://mes-to-sap.com/. Accessed Nov 2018.
[3] Nelson, B. L. 2008. “The MORE Plot: Displaying Measures of Risk & Error
From Simulation Output.” In Proceedings of the 2008 Winter Simulation Conference,
edited by S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson,
J. W. Fowler, Piscataway, New Jersey: Institute of Electrical and Electronics
Engineers, Inc.
[4] OptTek Systems, OptQuest [online]. Available from
http://www.opttek.com/products/optquest/. Accessed Nov 2018.
[5] Pegden, C. D. Deliver on Your Promise, How Simulation-Based Scheduling Will
Change Your Business. Pittsburgh, Simio LLC, 2017.
[6] MESA International, Business to Manufacturing Markup Language (B2MML)
[online]. Available from http://www.mesa.org/en/B2MML.asp. Accessed Nov 2018.
Chapter 3
A simulation environment for cybersecurity
attack analysis based on network traffic logs
Salva Daneshgadeh1 , Mehmet Uğur Öney2 ,
Thomas Kemmerich3 , and Nazife Baykal1
The continued and rapid progress of network technology has revolutionized all modern
critical infrastructures and business models. Technologies today are firmly relying on
network and communication facilities which in turn make them dependent on network
security. Network-security investments do not always guarantee the security of orga-
nizations. However, the evaluation of security solutions requires designing, testing
and developing sophisticated security tools which are often very expensive. Simu-
lation and virtualization techniques empower researchers to adapt all experimental
scenarios of network security in a more cost- and time-effective manner before deciding
on the final security solution. This study presents a detailed guideline for modeling
and developing a combined virtualized and simulated environment for computer networks
in which to practice different network attack scenarios. The primary objective of this
study is to create a test bed for network anomaly detection research. The dataset required
for anomaly or attack detection studies can be prepared with the environment proposed
in this study. We used the open-source GNS3 emulation tool, Docker containers,
the pfSense firewall, the NTOPNG network traffic–monitoring tool, the BoNeSi DDoS
botnet simulator, the Ostinato network workload generation tool and a MySQL database
to collect simulated network traffic data. With minor changes, this simulation environment
can also be used in a variety of cybersecurity studies such as vulnerability analysis, attack
detection, penetration testing and monitoring.
3.1 Introduction
A computer network is a set of connected network devices at the edge of the net-
work which are used in personal and professional lives such as PCs, tablets, iPads and
1 Department of Information Systems, Informatics Institute, Middle East Technical University, Turkey
2 Department of Computer Engineering, Atılım University, Turkey
3 Department of Information Security and Communication Technology, Norwegian University of Science
and Technology, Norway
and functions. Therefore, users could easily modify its functions based on their own
needs [7].
The REAL (realistic and large) network simulation tool was based on a modified
version of the NEST 2.5 simulation test bed. The initial motivation for developing it was to
compare the “fair queuing” gateway algorithm with first-come-first-served scheduling
and with competing proposals from Digital Equipment Corporation. REAL was
composed of two parts: a simulation server and a display client. Berkeley UNIX
sockets were used to connect the server to the client. It supported packet-switched,
store-and-forward networks similar to the existing Xerox corporate net and the DARPA
Internet. REAL was able to model many details of the flow in the network and
transport layers [8].
In general, each network simulation or emulation study requires a simulation
scenario which defines the input configuration. According to Bajaj et al. [9], each
simulation scenario is usually made up of four components:
1. Network topology: which defines the physical interconnects between nodes and
the static characteristics of links and nodes.
2. Traffic model: which defines the network usage patterns and locations of unicast
and multicast senders.
3. Test generation: which creates events such as flooding traffic toward specific
node.
4. Network dynamics: such as node and link failures.
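The four scenario components listed above can be captured in a small data structure. The sketch below is only an illustration: the class names, field names and event labels are assumptions and do not come from Bajaj et al. [9] or from any particular simulator.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    a: str
    b: str
    bandwidth_mbps: float
    delay_ms: float

@dataclass
class Flow:
    sender: str
    receivers: list          # one receiver = unicast, several = multicast
    rate_mbps: float

@dataclass
class Event:
    time_s: float
    kind: str                # hypothetical labels, e.g. "flood" or "link_down"
    target: str

@dataclass
class Scenario:
    nodes: list
    links: list = field(default_factory=list)      # 1. network topology
    traffic: list = field(default_factory=list)    # 2. traffic model
    tests: list = field(default_factory=list)      # 3. test generation
    dynamics: list = field(default_factory=list)   # 4. network dynamics

    def undefined_nodes(self):
        """Return node names used by links or flows but missing from the topology."""
        known = set(self.nodes)
        used = set()
        for link in self.links:
            used.update((link.a, link.b))
        for flow in self.traffic:
            used.add(flow.sender)
            used.update(flow.receivers)
        return sorted(used - known)

scenario = Scenario(
    nodes=["attacker", "router", "server"],
    links=[Link("attacker", "router", 100.0, 1.0),
           Link("router", "server", 100.0, 1.0)],
    traffic=[Flow("attacker", ["server"], 5.0)],
    tests=[Event(10.0, "flood", "server")],
    dynamics=[Event(60.0, "link_down", "router")],
)
print(scenario.undefined_nodes())
```

Keeping the four parts separate makes it possible to rerun the same topology and traffic model with different test events or failure patterns.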
Additionally, NS2, NS3, OMNeT++, SSFNet, J-Sim, OPNET and QualNet are
some other examples of the well-known network simulation tools [10]. According
to Wehrle et al. [11], simulation tools have to model different network elements as
follows:
● Network nodes: which illustrate end nodes such as PCs, laptops, servers, tablets
and network devices such as routers, hubs and switches.
● Network devices: which illustrate the physical devices that connect nodes to
communication channels, such as an Ethernet network interface card, a wireless IEEE 802.11 device, etc.
● Communication channels: which illustrate the medium for sharing information
among network devices such as fiber-optic-point-to-point links, shared broadcast
media, wireless spectrum, etc.
● Communication protocols: which model the implementation of standardized and
experimental network protocols such as User Datagram Protocol (UDP), Domain
Name System (DNS), etc.
● Protocol headers: which illustrate the special data related to the specific protocol
in the network packets.
● Network packets: which are the main parts of the information exchange in
computer networks. Network packets consist of protocol header and payload
data.
In addition, Wehrle et al. [11] emphasize the importance of realism over
abstraction in network simulation, as a high level of abstraction might cause simulation
results to diverge substantially from experimental results.
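The relationship between protocol headers, payload data and network packets described in the list above can be made concrete with UDP, whose header consists of four 16-bit fields (source port, destination port, length and checksum). The sketch below is illustrative only; the helper names are assumptions, and the checksum is left at zero, which UDP over IPv4 permits.

```python
import struct

# UDP header: source port, destination port, length, checksum (network byte order).
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_datagram(src_port, dst_port, payload):
    """Prepend a UDP header to the payload bytes."""
    length = UDP_HEADER.size + len(payload)
    # A checksum of 0 means "not computed"; a real stack would normally fill it in.
    header = UDP_HEADER.pack(src_port, dst_port, length, 0)
    return header + payload

def parse_udp_datagram(datagram):
    """Split a datagram back into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    payload = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload

datagram = build_udp_datagram(40000, 53, b"example query")
print(parse_udp_datagram(datagram))
```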
3.1.4 Virtualization
Computer virtualization techniques were first developed in the 1960s by IBM [13]. Virtualization
techniques enable users to divide a physical computer into multiple isolated
environments called virtual machines or guest machines. A virtual machine can also
be seen as an emulation of a physical machine, and virtual machines are another
means of modeling networks. There are two types of virtualization: virtual
machines powered by hypervisors and container-based virtualization
(e.g., Docker).
1 Control group is a Linux kernel feature which provides an isolated workstation with limited resources, called
a container.
2 Namespace is a Linux feature that prevents observation of resources used by different groups.
presents discussion and results; and finally, Section 3.7 summarizes the study and
presents a road map for the future work.
● Both the background and the attack data were synthesized because of privacy issues.
● The false alarm characteristics of the data were neglected; therefore, it is difficult to claim
that the available dataset is similar to the observed data.
● The workload of the synthesized data does not seem to be similar to the traffic in
real networks.
● Most probably, the TCPdump data collector tool was overwhelmed during
heavy traffic load and dropped packets.
● There is no exact definition of the attacks in some cases, such as probing or buffer
overflow.
Gogoi et al. [24] emphasize the nature of the input data as the key aspect of
any anomaly detection system. Input data is defined as a collection of data with
attributes of the same or different types, such as binary, categorical or continuous. As the
nature of the attributes determines the applicability of an anomaly detection technique,
it is important to employ a dataset with the desired attributes. It is unlikely that
any publicly available real dataset perfectly matches the attribute requirements of
all anomaly detection studies. In a nutshell, the combination of real data and a realistic
synthetic dataset which represents the real environment can be seen as a sound
choice for validating novel anomaly detection engines in the rapidly growing computer
and information technology area.
3 https://www.wireshark.org/.
4 MACE is a toolkit to generate a diverse set of attacks [43].
5 The LTProf collects legitimate traffic samples from public traces [44].
6 SURGE is a web workload generation tool which mimics a set of real users accessing a server [45].
7 It is a special network analysis tool written in Python to create network packets [46].
3.3 Methodology
The most challenging aspect of simulation-based anomaly detection research is
proving the reliability and dependability of simulated datasets in comparison to
real-life datasets. On the other hand, a well-designed simulation environment offers
repeatability, programmability and extensibility of the validation instrument [12].
The main purpose of this chapter is to introduce a flexible and reliable simulation
environment built with VMware virtualization software.
To realize the simulated environment, some software and
hardware were required, such as VMware Workstation, GNS3 software and an Ubuntu
Docker image. We also used the open-source pfSense firewall, NTOPNG and MySQL
to apply network rules, collect network flow data and store network flow data, respectively.
Moreover, we utilized a botnet simulator and network traffic–generator tools
to create DDoS attack and normal traffic data samples. We used the GNS3 simulator
to develop our experimental environment; as mentioned in [38], the results
of the GNS3 simulation tool matched the results obtained from the Cisco network.
Additionally, GNS3 is a well-tested and established network-simulation tool,
which is also used by many organizations such as Exxon, Walmart, AT&T, NASA,
etc. [38].
We have implemented a virtual lab named Cyber Security Simulated Lab (CSSL)
in order to create an isolated platform to simulate, test and analyze different types
of security threats. Our infrastructure was built with VMware virtualization
software on one physical machine. In order to connect the virtual machine to the
network, we mapped the external Internet connection of our host machine to the internal
VM network. The CSSL allows us to configure different network topologies for
simulating different attack scenarios. Our virtual test bed is an isolated environment
used mainly to fabricate and collect simulated DDoS attack data. As network technologies
are growing rapidly, we primarily employed open platforms so that different
efforts and packages can be included whenever there is a need [10]. Moreover, using open-source
tools and applications facilitates the repeatability of the study. We initially
defined the network topology as shown in Figure 3.2 using GNS3. The attacker and target
machines are Ubuntu Docker appliances for GNS3. pfSense also runs as a GNS3
appliance. VMnet8 is our exit point to the Internet. For security reasons, we disabled all
incoming and outgoing traffic to/from VMnet8 using firewall rules during the experimental phase.
(For more information refer to Section 3.5.3.)
3.4.1 GNS3
[Figure 3.2: network topology; labels: Attacker, Attacker zone, Switch(MirroringFunction), PfSense_Firewall, Switch, Service_Machine, Simulated inside victim zone]
[Figure 3.3: GNS3 architecture; labels: GNS3 GUI, GNS3 WEB, Controller, Computer server 1, Computer server 2, QEMU*, IOU**]
GNS3 is a graphical network emulation tool which can provide simulation/emulation
of entire networks and many network devices such as links, switches, routers,
firewalls, etc. As can be seen in Figure 3.3, GNS3 has a similar architecture to
Linux computers based on internal interfaces (network to device driver) and application
interfaces (sockets) [11]. All communication in the GNS3 tool is done over
HTTP using JSON; therefore, HTTP basic authentication can be used to securely
access the application programming interfaces [47]. Additionally, GNS3 enables
packet filtering and raw-packet capturing in the network using its direct interface to
the Wireshark application [48]. We installed both the GNS3 Windows application and the GNS3
virtual machine image [49]. As can be seen in Figure 3.4, we also connected them to
each other by setting the remote main server address of the GNS3 Windows application
to the IP address of the GNS3 virtual machine.
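Because the GNS3 server is driven over HTTP with JSON, a request to it can be prepared with the Python standard library alone. The following is a minimal sketch under stated assumptions: the host address, the credentials, port 3080 and the `/v2/version` path are illustrative values that should be checked against the GNS3 API documentation [47] for the installed version. Basic authentication only encodes the credentials, so plain HTTP is reasonable only inside an isolated lab network.

```python
import base64
import json
import urllib.request

def build_gns3_request(host, port, path, user, password):
    """Prepare a GET request with an HTTP basic authentication header."""
    url = f"http://{host}:{port}{path}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request

def fetch_json(request, timeout=5):
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical server address and credentials; nothing is sent until fetch_json is called.
request = build_gns3_request("192.0.2.10", 3080, "/v2/version", "admin", "secret")
print(request.full_url)
```

Calling `fetch_json(request)` against a running GNS3 virtual machine would return the decoded JSON response; the request object is built separately so that it can be inspected without network access.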
3.4.2 Ubuntu
We used the following command to pull GNS3 Ubuntu Docker container on GNS3
VM from Docker registry.
● NAT (network address translation): The virtual machine does not have an IP
address on the external network. Therefore, NAT translates the addresses of virtual
machines in a private VMnet network to that of the host machine. Subsequently,
it uses the host computer's network connection in order to connect to the Internet.
The VMware virtual DHCP server assigns an address to the virtual machine.
NAT provides a transparent and easy-to-configure method to access network
resources.
● Host-only: It provides a network connection between the virtual machine and the
host computer. The virtual machine is connected to the host-operating system
using a virtual Ethernet adapter that is visible to the host-operating system on a
virtual private network. It is not visible to the outside host.
● Custom: It is a more complicated networking configuration option which provides
a customized setup for virtual network adapters. After selecting the “Custom” option,
the user should choose a virtual switch to which the virtual machine’s adapter
is connected.
[Figure labels: VMware, VMnet1 10.5.6.x, VMnet8 10.5.5.x, VMnet2 10.5.7.x, Internet]
Accordingly, we created three corresponding network adapters for the GNS3
virtual machine, as shown in Figure 3.6. We also assigned IP addresses to each of the interfaces
of the GNS3 virtual machine. Figure 3.7 demonstrates the assignment of IP addresses to
interfaces of VMnet8.
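The address plan used for the three virtual networks (VMnet1 on 10.5.6.x, VMnet2 on 10.5.7.x and VMnet8 on 10.5.5.x) can be checked programmatically. The sketch below assumes a /24 prefix for each network, which the ".x" notation suggests but the text does not state; the function name is hypothetical, and the sample addresses are ones that appear in the chapter's figures.

```python
import ipaddress

# Assumed /24 prefixes for the three VMware virtual networks.
VMNETS = {
    "VMnet1": ipaddress.ip_network("10.5.6.0/24"),
    "VMnet2": ipaddress.ip_network("10.5.7.0/24"),
    "VMnet8": ipaddress.ip_network("10.5.5.0/24"),
}

def vmnet_of(address):
    """Return the name of the virtual network containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in VMNETS.items():
        if ip in network:
            return name
    return None

for address in ("10.5.7.70", "10.5.6.60", "10.5.5.3", "192.168.1.1"):
    print(address, "->", vmnet_of(address))
```

A check of this kind is useful before an experiment to confirm that every interface sits in the zone it is supposed to belong to.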
[Figure labels: VMnet8, 10.5.5.2, PfSense_Firewall, Switch, 10.5.5.3]
[Figure labels: VMnet2, Attacker, 10.5.7.70, 10.5.7.3, PfSense_Firewall, 10.5.6.3, 10.5.5.3]
Figure 3.8 IP address assignment for WAN and LAN interfaces of pfSense
8 The list of source IP addresses that participate in the DDoS attack can be provided in a text file and then
passed to BoNeSi using the ‘-i’ parameter.
[Figure labels: VMnet1, Service_Machine, Switch(MirroringFunction), 10.5.6.60, 10.5.6.70, Ostinato]
is vulnerable to all the traditional attacks and exploits worse than normal environ-
ments [52]. Moreover, we wanted an isolated environment to create and test attacks
without affecting the real systems.
9 VNC is a graphical desktop sharing system based on the Remote Frame Buffer protocol to remotely access
and control another computer [54].
[Figure: Traffic (eth0) over time from 14:47:57 to 14:57:56; y-axis from 0 to 4.76 Mbit/s; annotation: Network link saturated]
Figure 3.14 shows a summary time-based graph of the packet arrival
rate during the simulated DDoS attack in NTOPNG.
[Figure series: TCP (rcvd), TCP (sent); y-axis in Mbit/s]
Table 3.1 Fields of NTOPNG logs and their corresponding data types
a Auto incremental.
3.7 Summary
References
[1] Kurose JF, Ross KW. Computer networking: A top-down approach. Addison-
Wesley, Reading; 2010.
[2] Breslau L, Estrin D, Fall K, et al. Advances in network simulation. Computer.
2000;33(5):59–67.
[3] Sarkar NI, Halim SA. A review of simulation of telecommunication networks:
Simulators, classification, comparison, methodologies, and recommendations.
Cyber Journals: Multidisciplinary Journals in Science and Technology, Journal
of Selected Areas in Telecommunications (JSAT). 2011;2(3):10–17.
[4] Chang X. Network simulations with OPNET. In: Proceedings of the 31st Con-
ference on Winter Simulation: Simulation—A Bridge to the Future-Volume 1.
ACM; 1999. p. 307–314.
[5] Gyires T. Network simulation. In: Iványi A, editor. Algorithms of informatics.
vol. 2. Budapest: MondAt Kiadó. 2007.
[6] Chan KFP, De Souza P. Transforming network simulation data to semantic
data for network attack planning. In: ICMLG 2017 5th International Confer-
ence on Management Leadership and Governance. Academic Conferences and
Publishing Limited; 2017. p. 74.
[7] Dupuy A, Schwartz J, Yemini Y, Bacon D. NEST: A network simulation and
prototyping testbed. Communications of the ACM. 1990;33(10):63–74.
[8] Keshav S. REAL: A network simulator. University of California Berkeley,
Berkeley, CA, USA; 1988.
10 Host: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, 16.0GB RAM, 64-bit OS.
11 Virtual Appliance: 1×4 Core processor, 6.1GB RAM.
[9] Bajaj S, Breslau L, Estrin D, et al. Improving simulation for network research.
University of Southern California, Tech. Rep; 1999.
[10] Pan J, Jain R. A survey of network simulation tools: Current status and future
developments. Washington University in St. Louis, Tech. Rep; 2008.
[11] Wehrle K, Günes M, Gross J. Modeling and tools for network simulation.
Aachen: Springer Science & Business Media; 2010.
[12] Behal S, Kumar K. Trends in validation of DDoS research. Procedia Computer
Science. 2016;85:7–15.
[13] Bitner B, Greenlee S. z/VM a brief review of its 40 year history. Available from:
http://www.vm.ibm.com/vm40hist.pdf (accessed 26 April 2016). 2012.
[14] Preeth E, Mulerickal FJP, Paul B, Sastri Y. Evaluation of Docker containers
based on hardware utilization. In: Control Communication & Computing India
(ICCC), 2015 International Conference on. IEEE; 2015. p. 697–700.
[15] Morris D, Voutsinas S, Hambly N, Mann R. Use of Docker for deployment
and testing of astronomy software. Astronomy and Computing. 2017;20:
105–119.
[16] Geng X, Zeng X, Hu L, Guo Z. An novel architecture and inter-process com-
munication scheme to adapt chromium based on Docker container. Procedia
Computer Science. 2017;107:691–696.
[17] Grunewald D, Lützenberger M, Chinnow J, Bye R, Bsufka K, Albayrak
S. Agent-based network security simulation. In: The 10th International
Conference on Autonomous Agents and Multiagent Systems-Volume 3. Inter-
national Foundation for Autonomous Agents and Multiagent Systems; 2011.
p. 1325–1326.
[18] Bhattacharyya DK, Kalita JK. Network anomaly detection: A machine learning
perspective. New York: Chapman and Hall/CRC; 2013.
[19] Gogoi P, Bhattacharyya D, Borah B, Kalita JK. A survey of outlier detec-
tion methods in network anomaly identification. The Computer Journal.
2011;54(4):570–588.
[20] Jyothsna V, Prasad VR, Prasad KM. A review of anomaly based intru-
sion detection systems. International Journal of Computer Applications.
2011;28(7):26–35.
[21] Chandola V, Banerjee A, Kumar V. Anomaly detection: A survey. ACM
Computing Surveys. 2009;41(3), Article 15:1–58.
[22] Ahmed M, Mahmood AN, Hu J. A survey of network anomaly detection
techniques. Journal of Network and Computer Applications. 2016;60: 19–31.
[23] Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the KDD
CUP 99 data set. In: Computational Intelligence for Security and Defense
Applications, 2009. CISDA 2009. IEEE Symposium on. IEEE; 2009. p. 1–6.
[24] Gogoi P, Bhuyan MH, Bhattacharyya D, Kalita JK. Packet and flow based
network intrusion dataset. In: International Conference on Contemporary
Computing. Springer; 2012. p. 322–334.
[25] Botta A, Dainotti A, Pescapé A. A tool for the generation of realistic net-
work workload for emerging networking scenarios. Computer Networks.
2012;56(15):3531–3547.
[42] Zhao X, Qian YK, Wang CS. A framework of evaluation methodologies for
network anomaly detectors. In: Advanced Materials Research. vol. 756. Trans
Tech Publ; 2013. p. 3005–3010.
[43] Sommers J, Yegneswaran V, Barford P. A framework for malicious work-
load generation. In: Proceedings of the 4th ACM SIGCOMM Conference
on Internet Measurement. ACM; 2004. p. 82–87.
[44] Mirkovic J. D-WARD: Source-end defense against distributed denial-of-
service attacks. University of California, Los Angeles, CA; 2003.
[45] Barford P, Crovella M. Generating representative web workloads for network
and server performance evaluation. In: ACM SIGMETRICS Performance
Evaluation Review. vol. 26. ACM; 1998. p. 151–160.
[46] Scapy’s documentation; 2018. Available from: https://scapy.readthedocs.io.
[47] GNS3 Architecture. GNS3 Academy.; 2018. Available from:
http://api.gns3.net/en/latest/general.html#architecture.
[48] Neumann JC. The book of GNS3: Build virtual network labs using Cisco,
Juniper, and More. San Francisco: No Starch Press; 2015.
[49] GNS3 Software. GNS3 Inc.; 2018. Available from:
https://www.gns3.com/software.
[50] Goldstein M. BoNeSi – The DDoS Botnet Simulator; 2016. Available from:
https://github.com/Markus-Go/bonesi.
[51] Ali I, Meghanathan N. Virtual machines and networks-installation, performance
study, advantages and virtualization options. arXiv preprint
arXiv:1105.0061. 2011.
[52] Reuben JS. A survey on virtual machine security. Helsinki University of
Technology. Tech. Rep; 2007.
[53] TightVNC Software. TightVNC Group; 2018. Available from:
https://www.tightvnc.com.
[54] Virtual Network Computing. AT&T Laboratories Cambridge; 2018. Available
from: http://www.hep.phy.cam.ac.uk/vnc_docs/protocol.html.
[55] NTOP Software. NTOP Inc.; 2018. Available from:
https://www.ntop.org/products/traffic-analysis/ntop/.
[56] DrayTek. Vigor3300V user guide V3.0; 2009.
https://www.draytek.com/en/products/products-a-z/router.all/2016/03/30/vigor3300v/.
Part II
Surveys and reviews
Chapter 4
Demand–response management in smart grid:
a survey and future directions
Waseem Akram1 and Muaz A. Niazi1
Nowadays, one of the key areas of research in smart grid (SG) is demand–response
management (DRM). DRM assists in simplifying interactions between the customers
and the utility-service providers. It also helps to improve energy efficiency
and supports load balancing. Studies on DRM have produced a number of
interesting technical discussions and research contributions. Many of these studies
work toward making energy-efficient systems. However, there is a need to work
in the domain of customer satisfaction; this area needs considerable new advances.
Over the past few decades, a number of studies have been carried out in SG regarding
DRM. However, no existing work presents a comprehensive analysis of these
studies. There is a need to investigate the different techniques, their advantages, and
their limitations. By focusing on DRM from a customer satisfaction perspective, in this
chapter, we present a detailed overview of different solutions for developing DRM.
We also group existing solutions and identify trends and challenges in the SG domain
from a DRM perspective.
4.1 Overview
We start by giving an introduction to SG. Then background and basic concepts are
given. Next, we present a detailed review of the literature from a DRM perspective
in SG. Then open research problems are given. Finally, we present the conclusion at the
end of this chapter.
4.2 Introduction
The traditional power system provides one-way power flow to the consumers. On
the other side, the energy demands from the consumer side are continuously growing.
This makes it difficult for the traditional power system to respond to the ever-changing and
rising energy demand of consumers. Due to this issue, the energy sector has started
working toward an efficient and sustainable energy system. This effort introduced the SG
concept in the energy domain.
1 Computer Science Department, COMSATS Institute of Information Technology, Pakistan
The SG introduced a two-way dialog where electricity and information can be
exchanged between utility and consumers. It integrates advanced information and
communication technology (ICT), smart meters, smart appliances, and other sensing
mechanisms [1]. It is a developing network of distributed nodes, where all operations
of the system are controlled by an intelligent and autonomous system [2]. The SG
involves the transmission of energy to the consumers in a controlled and smart way,
which benefits both utility and end users [3].
DRM plays an important role in the SG environment. It enables the dynamic adjustment
of energy demand from consumers in response to price signals and incentives.
This process shifts demand from high-demand periods to low-demand periods, thus reducing energy cost [4,5].
It assists in the interaction between end users, appliances, and the utility service provider,
which minimizes end-user effort in controlling power-consuming devices [6,7]. It also
helps in fault detection and prevention in the system, thus improving system reliability
and sustainability [2].
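The effect of shifting demand in response to price signals can be shown with a toy calculation. The sketch below is not taken from the surveyed literature: the greedy rule, the function names, the hourly prices and the per-hour limit are all assumptions made for illustration.

```python
def schedule_flexible_energy(prices, flexible_kwh, max_kwh_per_hour):
    """Greedily place deferrable energy into the cheapest hours first."""
    schedule = [0.0] * len(prices)
    remaining = flexible_kwh
    for hour in sorted(range(len(prices)), key=lambda h: prices[h]):
        if remaining <= 0:
            break
        allocated = min(max_kwh_per_hour, remaining)
        schedule[hour] = allocated
        remaining -= allocated
    if remaining > 1e-9:
        raise ValueError("flexible demand exceeds the available capacity")
    return schedule

def energy_cost(prices, load):
    """Total cost of an hourly load profile under hourly prices."""
    return sum(price * kwh for price, kwh in zip(prices, load))

# Hypothetical price signal (currency units per kWh) for four hours.
prices = [0.30, 0.10, 0.20, 0.40]
unshifted = [3.0, 0.0, 0.0, 2.0]   # consumer runs appliances in the expensive hours
shifted = schedule_flexible_energy(prices, flexible_kwh=5.0, max_kwh_per_hour=3.0)
print(shifted)
print(energy_cost(prices, unshifted), energy_cost(prices, shifted))
```

The same 5 kWh is consumed in both profiles; only the timing changes, which is what lowers the cost under a time-varying price.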
There are several research challenges related to DRM. The deployment of ICT,
smart meters, and renewable energy resources is a challenging task [1]. Renewable
energy resources have unpredictable fluctuations in power generation, so it is difficult to
predict energy for the day ahead [8]. Another big challenge is decision-making for
demand and consumption at the consumer side. Consumers must decide
how much energy is required for a certain type of appliance in a particular time period,
which makes the consumer decision complex. The users’ demand for energy
changes with time (variable demand); this requires an adaptive strategy in which grid units
can modify their capacity according to the user demand [9]. Reliability is another
issue in an SG environment [10]. Some naturally occurring events lead to cascading
failures of the SG [6,11–14], where a supervisory control and data-acquisition system is
used to detect and prevent faults in the system [3]. The SG has a heterogeneous
structure composed of distributed nodes. All operations are controlled through a
communication network. Current communication techniques are inefficient for
such large and complex systems.
The deployment of renewable energy resources needs more coordination and
control techniques to achieve a reliable and efficient system. A multi-agent system
(MAS) is a useful tool for coordinating and controlling all operations within the SG,
due to its distributed and autonomous properties. MAS is widely used in SG applications.
In articles [13,15–17], MAS is adopted for DRM [18], fault handling [14],
and voltage and storage control [17,19]. In the last couple of decades, researchers
have made a number of contributions to DRM and have made systems in the SG
environment more efficient. However, there is still a need for improvement in the consumer
satisfaction domain.
Over the past few decades, a number of studies have been carried out in SG regarding
DRM. However, no existing work presents a comprehensive analysis of
these studies. There is a need to investigate the different techniques, their advantages, and
limitations. So here in this part, we present a comprehensive and detailed review of
4.3 Backgrounds
In this section, we present the basic background and concepts needed for
understanding DRM in the SG.
4.3.1 Smart grid
The traditional power system is responsible for the generation and transmission of
energy to end users. However, user demand changes with time (variable demand), and a
static approach cannot deal with variable demand. This problem attracted the attention
of researchers and led to the introduction of SG technology. The SG is a complex
system that is being formed from the traditional power system [20]. It integrates
advanced communication and control technology that enables the system to operate
automatically. It also comprises various other technologies such as smart meters,
smart homes, generators, storage devices, appliances, and loads. The result is a
network composed of distributed nodes in which all operations of the system are
controlled intelligently and autonomously. The key benefit of an SG is an efficient
energy system [2].
The National Institute of Standards and Technology (NIST), US Department of
Commerce, presented a conceptual model for the SG domain called the NIST SG
framework 1.0 [21]. This model represents seven different actors/applications that
interact with each other. The conceptual model for the SG is shown in Figure 4.1.
Each actor in this model is described below:
1. Customer: End users that consume and store energy. They may be
residential, industrial, or commercial.
2. Market: Operators in the electricity market.
3. Operation: They manage all energy transmissions.
4. Service provider: They provide services and facilities to the utility and the customer.
5. Generation: They generate and store energy.
6. Transmission: They carry energy over large distances.
7. Distribution: They carry energy to and from customers.
The model components:
1. Social components: Electricity consumers, producers, grid operators.
2. Technical components: Loads (consuming devices), generators, power lines,
buses.
The interaction and behavior of these actors influence, and are influenced by,
the technical system. Changes in the configuration of the technical system affect
the actors’ behavior, and changes in the actors’ behavior affect the technical
system configuration. Therefore, a coupled social-technical system must be considered
in order to achieve a reliable, sustainable, and resilient power system.
[Figure 4.1: Conceptual model for the SG. Recoverable labels: Market, Operation, Service provider.]
discussions on DRM. During our literature study, we found two types of literature:
learning-based techniques and complex-system approaches to address DRM in the SG.
[Figure: classification of DRM approaches in the SG. Recoverable labels: ANN, learning-based, RL, collaboration, CAS, PSO, game theory, security management, HEM, renewable energy resources, energy market, microgrid.]
to 2,425.6 and the energy demand is increased from 1,224.9 to 1,478.4. However, this
work does not handle constraints on power consumption.
In [49], Song et al. proposed another framework for optimal nonstationary demand-side
management in an SG environment. In this method, users select their energy usage
pattern according to their priorities and needs. The authors used a repeated-game
approach that models the interaction among foresighted, price-anticipating users. This
method showed a 50% reduction in energy cost and robustness to errors. However, a
higher threshold value results in a trade-off between cost and peak-to-average ratio.
In [50], Nunna and Doolla studied the management of demand response in networks
of multiple microgrids. In this work, customers participate in a demand–response
strategy. The study proposed a priority-index approach through which customers
participate in the market. This method reduced peak demand, and it was found that
customers with a high priority index get power at a low cost.
In [51], O’Brien et al. focused on DRM in SG applications. In this work, demand
response is modeled as a game-theoretic environment, and the Shapley value (SV) is
used for the payment distribution process. An RL technique is used to estimate the
SV. Simulation results showed that with random sampling, 1,000,000 samples take
58.2 s of execution time, while with sigmoid sampling, 51,129 samples take 6.5 s. The
results also showed that uniform sampling balances demand and response. However,
this method is not suitable for distribution schemes, and its direct estimation is
difficult. The literature summary of DRM is shown in Table 4.1.
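To make the idea of sampling-based Shapley value estimation concrete, the following is a minimal sketch. It is not the estimator used in [51]: it uses plain permutation sampling, and the participant names, the curtailment figures in REDUCTION, the TARGET cap, and the payoff function dr_value are all hypothetical values chosen for illustration.

```python
import random
from itertools import permutations


def shapley_exact(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def shapley_sampled(players, value, n_samples, seed=0):
    """Monte Carlo estimate: average marginal contribution over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_samples):
        rng.shuffle(order)
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / n_samples for p, t in totals.items()}


# Hypothetical curtailment (kW) that each DR participant can offer.
REDUCTION = {"A": 4.0, "B": 2.0, "C": 1.0}
TARGET = 5.0  # hypothetical reduction requested by the utility (kW)


def dr_value(coalition):
    """Hypothetical payoff: reduction delivered by a coalition, capped at TARGET."""
    return min(TARGET, sum(REDUCTION[p] for p in coalition))


exact = shapley_exact(list(REDUCTION), dr_value)
approx = shapley_sampled(list(REDUCTION), dr_value, n_samples=20000)
```

The exact computation enumerates every ordering and therefore scales factorially with the number of participants, which is why sampling is attractive for large DR programs. In both functions the per-participant shares add up to the value of the full coalition, so the whole payment is distributed.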
showed that on cluster creation, the bit error is very large. It has been shown that by
using UDP for communication, the broadcast showed no optimal solution,
while TCP showed a high bandwidth capacity. However, the convergence
rate is increased. With high DR usage, effective communication is achieved at a high
cost. Another disadvantage of this study is that the distributed DR is applied over
large distances; there is no local communication among neighboring channels.
In [54], Tsai et al. worked on distributed DR for large-scale consumer loads in
conjunction with renewable energy resources. In this work, a neighbor-communication
strategy is applied, which results in a low communication cost. They used a
randomized alternating direction method of multipliers (ADMM) for distributed DR. In
this method, there is no need for communication synchronization, and the system can
reach the balance state with few messages. The results showed that a 50% balance
state is achieved. Using an RTP scheme, the method outperformed existing distributed
DR schemes. However, the authors assumed that all consumers involved in the
communication process are trustworthy. The proposed scheme cannot handle wrong data
transmission.
Although DR creates an energy-efficient system by shifting energy demand from
peak hours to off-peak hours, consumers and utility service providers must
communicate with each other continuously in this process. Consumers transmit their
energy demand profile to the grid unit, while the grid side routes energy-cost
information to the consumers. During this communication, the information can become
accessible to unauthorized users, so the communication system needs to be made
secure. In this regard, Wada and Sakurama [55] worked on privacy management and
proposed a masking method to protect the privacy of each individual in a smart
distributed energy system. In this scheme, every agent adds a mask signal to its
state. Then, during the communication process, each agent exchanges its mask with
other agents. To obtain the correct signal, agents subtract the obtained signal from
their own state. The RTP scheme of DR is applied. The results showed that this method
can protect the information of each agent while maintaining a balanced state of the
system. The literature summary of security management in the SG is shown in Table 4.2.
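The general principle behind additive masking can be illustrated with a small sketch. This is not the scheme of [55]; it is a generic pairwise zero-sum masking construction, shown only to make clear why masks can hide individual values while leaving an aggregate unchanged. The function name pairwise_masks, the mask scale, and the demand values are hypothetical.

```python
import random


def pairwise_masks(n_agents, scale=100.0, seed=0):
    """Build one mask per agent from pairwise random numbers that cancel in the sum.

    For every pair (i, j), agent i adds a shared random value r and agent j
    subtracts it, so the masks of all agents add up to zero.
    """
    rng = random.Random(seed)
    masks = [0.0] * n_agents
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            r = rng.uniform(-scale, scale)
            masks[i] += r
            masks[j] -= r
    return masks


# Hypothetical private demand of four consumers (kW).
demands = [3.2, 1.5, 4.8, 2.1]

masks = pairwise_masks(len(demands))
masked = [d + m for d, m in zip(demands, masks)]  # what each agent transmits

total_true = sum(demands)
total_masked = sum(masked)  # equal to total_true up to rounding error
```

An observer who sees only the transmitted values learns the aggregate demand but not any individual profile, provided the pairwise random values stay secret between the two agents that share them.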
Ghazvini et al. have proposed a new HEM algorithm. The proposed scheme schedules
appliances, EVs, and an electric water heater (EWH) in combination with energy
storage. The EV is considered a dispatchable energy source. In this work, a renewable
energy resource, photovoltaic (PV) generation, is also used. They used a simple
rule-based algorithm under different pricing schemes to schedule the EV charging
and EWH heating processes. The simulation results showed a 29.5%–31.5% energy-cost
reduction.
In [57], Luo et al. worked on a large-scale ice-thermal storage system and
investigated how to use it for a fast voltage-control strategy in conjunction with
renewable energy resources. The work presents a modified version of the conventional
system for thermal load management. In this work, a refrigerator is used for the
ice-thermal load. The work showed that the proposed technique can effectively reduce
the ratio of power imbalance in smart homes. The proposed technique is implemented
in a computer-simulation tool. The results showed that the total fluctuation in
voltage frequency was reduced. A possible extension of this work is to apply the
proposed scheme to a large-scale distributed power system.
In smart homes, a smart meter monitors the user load and demand profiles. However,
forecasting the load of individuals at large scale is a challenging task due to the
stochastic nature of individual demand. In this regard, in [58], Yu et al. proposed
the use of a sparse coding technique to model individual loads at large scale in the
distributed power system. In this work, data from 5,000 homes, collected in a project
in collaboration with the electrical power board in Chattanooga (2011–13), was used.
The objective was to forecast the next-day and next-week total load. The results
showed that the accuracy of the system improved by 10%. However, the proposed scheme
needs to be tested with other sparse methods, such as change point detection, in a
distributed system to obtain a more accurate system.
In most previous studies, the game-theoretic approach has been used for DRM.
However, the computation cost of finding the Nash equilibrium is very large. In [59],
Li et al. proposed a sparse load-shifting-based DRM that schedules different smart
home appliances. In this work, bidirectional communication is used to improve the
search for the Nash equilibrium. The objective function minimizes the
peak-to-average ratio (PAR). The proposed algorithm showed a linear cost for
finding the Nash equilibrium. The results showed convergence in 500 iterations.
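Since PAR recurs as an objective throughout the surveyed work, a short sketch of how it is computed may help. PAR is the peak load divided by the average load over the scheduling horizon. The load profile and the helper shift_load below are hypothetical and only illustrate how moving demand out of the peak slot lowers PAR while total energy stays the same; they are not taken from [59].

```python
def peak_to_average_ratio(load):
    """PAR = maximum slot load divided by the mean slot load."""
    return max(load) / (sum(load) / len(load))


def shift_load(load, src, dst, amount):
    """Return a copy of the profile with `amount` moved from slot src to slot dst."""
    shifted = list(load)
    moved = min(amount, shifted[src])  # cannot move more than the slot holds
    shifted[src] -= moved
    shifted[dst] += moved
    return shifted


# Hypothetical demand per time slot (kWh); slot 2 is the peak.
baseline = [2.0, 2.0, 6.0, 2.0]

# Move 2 kWh from the peak slot to the first slot.
shifted = shift_load(baseline, src=2, dst=0, amount=2.0)

par_before = peak_to_average_ratio(baseline)  # 6 / 3
par_after = peak_to_average_ratio(shifted)    # 4 / 3
```

Because load shifting leaves the average unchanged, any reduction in PAR comes entirely from lowering the peak.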
The deployment of DR needs appropriate policy design and new technology. In this
regard, in [60], a MAS is developed for residential DR in a distributed energy network.
In this work, two agents, i.e., a home agent and a retailer agent, are used. The home
agent predicts the load profile of consumers. The RTP scheme of DR is used in this work.
Convex programming is used to model the consumption pattern of consumers.
They used two objective functions, i.e., energy-cost minimization and users’ waiting
time. In this work, two case studies were considered. The simulation results showed
that in Case 1, PAR and cost are reduced by 2.32$ and 62.7$, respectively. For Case
2, PAR and cost are reduced by 1.54$ and 51.82$, respectively.
In [61], Huang et al. introduced the use of a smart-gateway network in
the SG. In this work, a single home with multiple rooms is considered along with a
single power grid and one PV. They used a multi-agent framework. First, the
energy-demand pattern of each room is extracted under some uncertainty assumptions.
Each room is considered an agent. In this work, a dataset of a single building is
used. A minority-game-based DR is used for peak demand reduction. The simulation
results showed that peak load is reduced by 38.5% in summer and 5.8% in winter. The
literature summary of HEM in the SG is shown in Table 4.3.
Reference | Approach and DR scheme | Reported result | Limitation
Yao [62] | EVs charging, binary optimization | Computation time 0.19 s is achieved | Energy cost is ignored
Le Floch [63] | Two types of EV load management, price-based DR | PAR reduced 40% | Only feasible under limited threshold voltage
Jannati [64] | Optimal management of EV with parking plots, time-of-use DR | Operational cost reduced 4.30% | Not tested on other DR strategies
all, it computes the voltage capacity of the system. Then it customizes the load
profile of the consumers by using a price-based DR. The proposed scheme showed
feasibility for some specific cases; for example, if the voltage remains within fixed
limits, the flexible load is achieved. The work is implemented on an IEEE 55-bus
radial distribution network. The results showed a 40% reduction in PAR.
The main benefit of renewable energy resources is to reduce the air pollution
produced by fuel consumption in the power grid. In this context, the number of EVs,
as well as their parking plots, is also increasing to reduce the burden on the power
grid. However, these EVs need to be operated optimally in conjunction with the
parking plots. In this regard, in [64], Jannati and Nazarpour proposed an optimal
management system for EVs and their parking plots. They integrated the model with
wind, PV, and local generators. They also used a hydrogen and fuel cell storage
system. The TOU DR strategy is applied to schedule the charging and discharging
process of EVs. The objective function minimizes the operational cost together with
the cost of charging and discharging the EVs. Mixed integer linear programming is
then used in four case studies. In Case 1, hydrogen storage and DR were not applied.
In Case 2, hydrogen storage was integrated with the model. In Case 3, the DR strategy
was integrated, and in Case 4, both hydrogen storage and the DR strategy were used.
The simulation results of Case 2, Case 3, and Case 4 were compared with Case 1. The
obtained results showed that Case 2, Case 3, and Case 4 reduced operation cost by
1.79%, 4.07%, and 4.30%, respectively. The literature summary of electric vehicles is
shown in Table 4.4.
With the advent of smart homes, the electrical sector encourages users to use
renewable energy, which benefits both users and the grid because the total energy
cost can be reduced. A buy-back strategy encourages users to generate more power from
renewable energy resources, which reduces the load on the main power grid. In [66],
Chiu et al. worked on a buy-back scheme with a dynamic pricing technique. Dynamic
pricing is modeled as a convex optimization dual problem. In this work, a day-ahead
time-dependent pricing scheme is used. It also integrates wind, PV, and battery
storage in the system. The objective function maximizes user and company benefit. The
simulation results showed that a PAR of 1.28 was achieved and peak load was reduced
from 881.11 to 754.18 kWh.
Nowadays, the use of wind energy resources is increasing. However, due to
their stochastic nature, the mismatch between energy demand and power generation is
also increasing. This led to the introduction of micro-combined heat power (CHP), a
hybrid energy system. However, there is a need to analyze the impact of DR in
conjunction with CHP at large scale with wind-energy resources. In [67], Jiang et al.
addressed this issue and proposed an operation model representing the residential
hybrid energy system. The proposed scheme uses price response, micro-CHP, smart
appliances, and a load aggregator. The load aggregator is used to centralize the
loads of different consumers. The scheme is implemented on the IEEE 118-bus system.
The simulation results showed that wind power curtailment is reduced by 78% on 6
buses. It also reduced energy cost by 10.7% and operation cost by 11.7% on 118 buses.
HEMS use DR to schedule home appliances. However, currently, there is no accurate
method that predicts the load consumption of appliances within a residential
building. In [68], Hu and Xiao worked on load prediction within the residential
sector. In this work, an air conditioner is used to train the thermal model.
Historical data of indoor and outdoor temperature were used. Optimization algorithms
such as the trust region algorithm, genetic algorithm (GA), and PSO were used to
schedule the load of the air conditioner. They also used two temperature strategies,
set point and precooling. The simulation results showed a power reduction of 26%.
However, they used a single-speed compressor. The proposed methodology should be
tested on an inverter-driven air conditioner.
In [69], Amrollahi and Bathaee worked on modeling a stand-alone microgrid
that is far from the main power grid. This work investigates DR in the component size
optimization of the microgrid. They considered only wind and solar energy systems.
In this work, component size optimization and cost reduction are done by time-shift
and load scheduling. The simulation results showed that the number of batteries,
inverters, PV capacity, and energy cost reduced to 35.6%, 35%, 1.8%, and 17.1%,
respectively.
In the SG, thermostatic loads such as heaters and air conditioners also help in
reducing energy cost. In [82], Behboodi et al. worked on thermostatic load control
in real-time energy markets by using the transactive control paradigm. In this
work, an ABM is developed that models DR for thermostatic loads. The proposed
scheme can control thermostatic loads under heating and cooling conditions. The
simulation results showed a 10% energy cost reduction. However, this work ignored
other appliances and focused only on the heater and air conditioner. The proposed
work can be extended to integrate other appliances as well as renewable energy
resources.
Reference | Approach and DR scheme | Reported result | Limitation
Wang [65] | Promotion of the use of PV, game-theoretic approach, RTP scheme of DR | Energy cost is reduced, fewer batteries, consumers with high response get larger PV | Not implemented on other DR strategies
Chiu [66] | Modeling renewable energy as buy-back scheme, dynamic pricing technique of DR | PAR achieved 1.28 | Maintains centralized communication infrastructure, not useful in the case of blackout
Jiang [67] | Modeling large-scale CHP with DR, price response DR | Reduced energy cost 10.7% | PAR is ignored
Hu [68] | Load profile prediction, trust region algorithm, GA, PSO | Power reduction 26% | Only modeled single speed compressor
Amrollahi [69] | Modeled stand-alone microgrid, time-shift DR scheme | Energy cost reduced 35.6% | Only wind and solar energy is considered
Behboodi [82] | Optimization of thermostatic load, transactive control scheme, agent-based model | Energy cost reduced 10% | Only focused on heater and air conditioner
Shakeri [70] | Control scheme for thermal energy storage, novel optimization algorithm | Energy cost reduced 20% | Demand and PV capacity prediction was ignored
Regarding HEMS, in [70], Shakeri et al. proposed a new control strategy for
a thermal and storage-management system. The proposed algorithm receives price
information in advance and purchases energy at off-peak hours. This work also
integrated batteries and PV in a residential home. A total of 26 appliances are
used. Results showed a 20% energy cost reduction. However, the proposed scheme
was not tested for demand forecasting and prediction of PV capacity. The literature
summary of renewable energy sources is shown in Table 4.5.
4.4.3.6 Microgrid
Over the past few decades, it has been confirmed that networked microgrids play an
important role in making an energy-efficient and reliable system. However, due to
the unpredictable nature of renewable energy resources, they impose new challenges
on the smart distributed energy system. To address this issue, some papers use
stochastic techniques. In [76], Nikmehr et al. proposed another scheme for networked
microgrids to schedule consumer loads. In this work, the intermittent nature of load
and generation units is considered. They used time-of-use and real-time pricing DR
programs. The optimization technique PSO is used for scheduling consumer load under
uncertainty. The simulation results showed that the execution time of PSO is 241 s,
while another stochastic technique took 2,763 s. The operational cost is reduced to
17.3% and 30.6% with TOU and RTP, respectively.
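As a rough illustration of how PSO can be applied to load scheduling under time-of-use prices, the sketch below implements a basic global-best PSO and uses it to spread a fixed amount of energy over three price slots. It is not the formulation of [76]: the prices, the energy requirement, the per-slot cap, the quadratic penalty weight, and all PSO parameters are hypothetical choices, and uncertainty in load and generation is not modeled.

```python
import random


def pso(cost, dim, lower, upper, n_particles=30, n_iters=200,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `cost` over the box [lower, upper]^dim with global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # personal best positions
    best_cost = [cost(p) for p in pos]       # personal best costs
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the feasible box.
                pos[i][d] = min(upper, max(lower, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, g_cost


PRICES = [0.10, 0.30, 0.20]  # hypothetical TOU price per slot
ENERGY = 6.0                 # hypothetical total energy to deliver (kWh)
CAP = 4.0                    # hypothetical per-slot limit (kWh)


def schedule_cost(x):
    """Energy bill plus a quadratic penalty for missing the energy requirement."""
    bill = sum(p * xi for p, xi in zip(PRICES, x))
    mismatch = sum(x) - ENERGY
    return bill + 10.0 * mismatch ** 2


best_schedule, best_cost = pso(schedule_cost, dim=3, lower=0.0, upper=CAP)
```

A flat schedule of 2 kWh per slot costs 1.2 under these prices, so a useful check is that the returned schedule is cheaper than that while staying within the per-slot cap. The penalty term is a soft constraint, so the delivered energy only approximately matches the requirement.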
In [77], a peer-to-peer network consisting of consumers with generation units, i.e.,
PV, is considered. They used a price-based DR strategy. The energy-sharing problem is
modeled as a dynamic internal pricing scheme which provides the supply and demand
ratio. In this work, the flexibility of consumers’ consumption is considered. The
objective function combines economic cost and users’ willingness. The performance
of the system is evaluated in terms of prosumers’ cost and sharing of energy. The
simulation results showed that total power loss is reduced from 3,321 to 3,187 kWh.
The convergence rate is noted as 60 iterations. However, this work was not tested on
a distributed network.
In [78], the authors worked on the smart microgrid and proposed a stochastic
optimization model with an objective function that minimizes operational cost and CO2
emission in the presence of renewable energy resources. In this work, a probability
density function is used to predict wind speed and solar irradiance. Three types of
consumers were considered, i.e., residential, industrial, and commercial. The
incentive-based DR strategy is applied in three different case studies, i.e., (1)
operational cost and emission; (2) operational cost, emission, and DR; (3)
multi-objective function, operational cost, and emission. The simulation results
showed that by using DR, the operational cost is reduced by 21% and emission by 14%.
The literature summary of microgrid is shown in Table 4.7.
In the context of HEM, different smart appliances consume energy. However, the
demand patterns and available power do not remain the same at all times; their
values change with time. This fluctuation tends to create an imbalance between
energy demand and available energy. To some extent, this issue is addressed
by a number of research works. The study presented in [57] worked on the ice-thermal
storage system. The main purpose of the work was to control voltage in conjunction
with renewable energy resources. A possible extension of this work is to use the
proposed scheme on a large-scale distributed power system. In the real world, the
power system is a complex system, so it needs to be studied at a large
scale to observe the behavior of each component in the system.
In [80], [81], the authors presented work on appliance scheduling. These
studies show how scheduling the power demand of different appliances by using
different heuristic approaches can be effective. However, these studies only
focused on a single home. A possible extension of the current work is to
test the proposed techniques on multiple homes along with different DR strategies to
investigate the load as well as the energy cost pattern.
Nowadays, the concept of EVs has been introduced in the SG domain. EVs
have the capability to store and transmit energy. They are used to store energy
whenever the energy cost from the grid unit is low. Then they sell energy at low cost
when the load on the main grid is high. This reduces the load burden on the grid as
well as high energy cost. In this context, a number of studies, such as [62–64], have
proposed different models that show how to effectively use EVs in SG scenarios for
load and energy-cost reduction. However, there are still open research issues: some
schemes ignore energy cost, some are only feasible under limited voltage, and some
are not tested on different DR strategies.
Renewable energy resources offer an alternative energy source in the form of wind
and PV energy. Users can fulfill their energy demands from these resources. However,
their energy production is unpredictable because it depends on weather conditions.
Over the past few decades, different heuristic optimization techniques have been used
for handling the unpredictable nature of these renewable energy resources. The study
presented in [65] worked on PV promotion. However, it focused only on the RTP scheme
and ignored other DR strategies; the effectiveness of the work needs to be studied
on other DR strategies. The study in [66] proposed a buy-back scheme for
renewable energy resources. This work used a centralized communication
infrastructure, which is not useful in the case of a blackout: a fault in one part
can create disturbance in the whole system. Other studies, such as those presented in
[67–69], [82], also worked on renewable energy resources and demonstrated
energy cost reduction. However, they focused only on cost reduction; other parameters
such as user discomfort and PAR reduction are ignored.
The energy market is responsible for buying energy from power sources and then
selling it to the consumers. Different studies in the literature have also addressed
this area of research. They demonstrated peak load and energy-cost reduction.
However, the current work was presented on a small scale. Studies such as [76–78]
worked on the microgrid. By using optimization techniques, energy cost is
successfully reduced. However, they ignored the PAR parameters as well as the users’
comfort level.
4.6 Conclusions
DRM plays an important role in the SG environment. It offers a broad range of
advantages for system operation by reducing energy cost as well as through its
effects on load balancing. In this part, we covered the different approaches applied
for DRM in the SG and proposed a classification of DRM models according to the
techniques used for their implementation. The current literature in SG from the DRM
aspect is categorized into three main research directions: learning-based approaches,
complex systems, and other techniques. Finally, we described each
technique and its model in detail. We also highlighted the open research problems
that exist in each solution.
References
[1] Gungor VC, Sahin D, Kocak T, et al. Smart grid technologies: Communica-
tion technologies and standards. IEEE Transactions on Industrial Informatics.
2011;7(4):529–539.
[2] Bollinger LA, van Blijswijk MJ, Dijkema GP, et al. An energy systems mod-
elling tool for the social simulation community. Journal of Artificial Societies
and Social Simulation. 2016;19(1):1.
[3] Siano P. Demand response and smart grids: A survey. Renewable and
Sustainable Energy Reviews. 2014;30:461–478.
[4] Thimmapuram PR, Kim J. Consumers’ price elasticity of demand modeling
with economic effects on electricity markets using an agent-based model. IEEE
Transactions on Smart Grid. 2013;4(1):390–397.
[5] Kamyab F, Amini M, Sheykhha S, et al. Demand response program in smart
grid using supply function bidding mechanism. IEEE Transactions on Smart
Grid. 2016;7(3):1277–1284.
[6] Rahman M, Mahmud M, Pota H, et al. A multi-agent approach for enhancing
transient stability of smart grids. International Journal of Electrical Power &
Energy Systems. 2015;67:488–500.
[7] Giraldo J, Mojica-Nava E, Quijano N. Synchronization of isolated micro-
grids with a communication infrastructure using energy storage systems.
International Journal of Electrical Power & Energy Systems. 2014;63:71–82.
[8] Lawrence TM, Boudreau MC, Helsen L, et al. Ten questions concerning
integrating smart buildings into the smart grid. Building and Environment.
2016;108:273–283.
[9] Haider HT, See OH, Elmenreich W. Residential demand response scheme
based on adaptive consumption level pricing. Energy. 2016;113:301–308.
[10] Lakić E, Artač G, Gubina AF. Agent-based modeling of the demand-side
system reserve provision. Electric Power Systems Research. 2015;124:85–91.
[11] Babalola A, Belkacemi R, Zarrabian S. Real-time cascading failures prevention
for multiple contingencies in smart grids through a multi-agent system. IEEE
Transactions on Smart Grid. 2016;9(1):373–385.
[45] Mocci S, Natale N, Pilo F, et al. Demand side integration in LV smart grids
with multi-agent control system. Electric Power Systems Research. 2015;125:
23–33.
[46] Hurtado L, Nguyen P, Kling W. Smart grid and smart building inter-operation
using agent-based particle swarm optimization. Sustainable Energy, Grids and
Networks. 2015;2:32–40.
[47] Wei W, Liu F, Mei S. Energy pricing and dispatch for smart grid retailers under
demand response and market price uncertainty. IEEE Transactions on Smart
Grid. 2015;6(3):1364–1374.
[48] Chai B, Chen J, Yang Z, et al. Demand response management with multiple
utility companies: A two-level game approach. IEEE Transactions on Smart
Grid. 2014;5(2):722–731.
[49] Song L, Xiao Y, Van Der Schaar M. Demand side management in smart
grids using a repeated game framework. IEEE Journal on Selected Areas in
Communications. 2014;32(7):1412–1424.
[50] Nunna HK, Doolla S. Demand response in smart distribution system with
multiple microgrids. IEEE Transactions on Smart Grid. 2012;3(4):1641–1649.
[51] O’Brien G, El Gamal A, Rajagopal R. Shapley value estimation for compen-
sation of participants in demand response programs. IEEE Transactions on
Smart Grid. 2015;6(6):2837–2844.
[52] Rahman MS, Basu A, Kiyomoto S, et al. Privacy-friendly secure bidding for
smart grid demand-response. Information Sciences. 2017;379:229–240.
[53] Moghaddam MHY, Leon-Garcia A, Moghaddassian M. On the performance of
distributed and cloud-based demand response in smart grid. IEEE Transactions
on Smart Grid. 2017;9:5403–5417.
[54] Tsai SC, Tseng YH, Chang TH. Communication-efficient distributed demand
response: A randomized ADMM approach. IEEE Transactions on Smart Grid.
2017;8(3):1085–1095.
[55] Wada K, Sakurama K. Privacy masking for distributed optimization and its
application to demand response in power grids. IEEE Transactions on Industrial
Electronics. 2017;64(6):5118–5128.
[56] Ghazvini MAF, Soares J, Abrishambaf O, et al. Demand response implemen-
tation in smart households. Energy and Buildings. 2017;143:129–148.
[57] Luo X, Lee CK, Ng WM, et al. Use of adaptive thermal storage system as
smart load for voltage control and demand response. IEEE Transactions on
Smart Grid. 2017;8(3):1231–1241.
[58] Yu CN, Mirowski P, Ho TK. A sparse coding approach to household elec-
tricity demand forecasting in smart grids. IEEE Transactions on Smart Grid.
2017;8(2):738–748.
[59] Li C, Yu X, Yu W, et al. Efficient computation for sparse load shifting in demand
side management. IEEE Transactions on Smart Grid. 2017;8(1):250–261.
[60] Wang Z, Paranjape R. Optimal residential demand response for multiple het-
erogeneous homes with real-time price prediction in a multiagent framework.
IEEE Transactions on Smart Grid. 2017;8(3):1173–1184.
[75] Bahrami S, Wong VW, Huang J. An online learning algorithm for demand
response in smart grid. IEEE Transactions on Smart Grid. 2017;9(5):4712–4725.
[76] Nikmehr N, Najafi-Ravadanegh S, Khodaei A. Probabilistic optimal schedul-
ing of networked microgrids considering time-based demand response pro-
grams under uncertainty. Applied Energy. 2017;198:267–279.
[77] Liu N, Yu X, Wang C, et al. An energy sharing model with price-based demand
response for microgrids of peer-to-peer prosumers. IEEE Transactions on
Power Systems. 2017;32(5):3569–3583.
[78] Aghajani G, Shayanfar H, Shayeghi H. Demand side management in a smart
micro-grid in the presence of renewable generation and demand response.
Energy. 2017;126:622–637.
[79] Ellabban O, Abu-Rub H. Smart grid customers’ acceptance and engagement:
An overview. Renewable and Sustainable Energy Reviews. 2016;65:1285–
1298.
[80] Manzoor A, Javaid N, Ullah I, et al. An intelligent hybrid heuristic scheme
for smart metering based demand side management in smart homes. Energies.
2017;10(9):1258.
[81] Ahmad A, Khan A, Javaid N, et al. An optimized home energy management
system with integrated renewable energy and storage resources. Energies.
2017;10(4):549.
[82] Behboodi S, Chassin DP, Djilali N, et al. Transactive control of fast-acting
demand response based on thermostatic loads in real-time retail electricity
markets. Applied Energy. 2018;210:1310–1320.
Chapter 5
Applications of multi-agent systems in smart
grid: a survey and taxonomy
Waseem Akram1 and Muaz A. Niazi1
Multi-agent systems (MASs) in the smart-grid area have received a great deal of
attention from the research community in recent years. Studies applying MAS to the
smart grid have brought a number of interesting technical discussions on simulation
and modeling of the smart grid and research contributions. Researchers are trying to
bring energy efficiency and load balancing to the smart grid. Many of these research
works have achieved efficiency in the power-system domain, while the social system
and consumer satisfaction still need improvement. Focusing on MAS in the smart
grid, in this part, we survey the body of knowledge and discuss the challenges of
simulation and modeling of MAS in the smart grid. We investigate and group the
existing solutions and highlight open research problems.
5.1 Overview
5.2 Introduction
The traditional power system provides one-way power flow: it is responsible for
the generation and transmission of energy to end users. However, user demand changes
with time (variable demand), and one-way power flow cannot deal with variable
demand. This problem attracted the attention of researchers, who introduced
smart-grid technology by integrating information and communication technology (ICT)
with the traditional system. The smart grid is a power system consisting of various
technologies such as smart meters, ICT, smart homes, generators, storage devices,
appliances, and loads.
1
Computer Science Department, COMSATS Institute of Information Technology, Pakistan
The smart grid is a network composed of distributed nodes in which all operations
are controlled intelligently and autonomously in order to achieve an efficient
energy system [1].
Fossil-fuel consumption changes the climate, which has motivated researchers
to introduce renewable energy resources such as solar and wind. However, the output
of these resources is unpredictable because it fluctuates. To achieve the
future sustainability, reliability, and resilience of the smart grid, the research
community is interested in deploying renewable energy resources in the power system.
The structure of the power system is now shifting to a more bottom-up approach. This
means that decisions related to power generation and transmission are taken in a
distributed manner by various actors (agents) in the generation units. These actors
interact with the technical system (the power system) and depend on each other.
The smart-grid system is made up of two main components: a technical system
and a social system. The technical system consists of power plants, power lines,
loads, transformers, and buses. The social system consists of consumers, operators,
and electricity retailers. Each component of the social system interacts with each
component of the technical system.
The deployment of renewable energy resources needs more coordination, management,
and control techniques to achieve a reliable and efficient system.
A MAS is a useful tool for the coordination and management of all operations within
the smart grid because of its distributed and autonomous properties. MASs are widely
used in smart-grid applications, where they are responsible for the management and
control of smart-grid activities. They can perform various tasks such as communication
among different agents, fault detection and prevention, power scheduling, voltage
control, and energy storage.
In the previous literature, a number of research works have been carried out in the
smart-grid domain. However, there is currently no work that investigates and analyzes
these works, and there is a need to find out which technique is feasible in which scenario.
In this chapter, we provide a detailed survey and comparison of the different techniques
available for smart-grid systems over the period 2010–16. The aim of this study is
to present a comprehensive understanding of the smart-grid domain, its applications,
and the open research problems that need to be addressed to obtain a sustainable
and reliable system. We have cited a large number of scientific publications, about
100 papers. To the best of our knowledge, this is the first comprehensive
survey on MAS in the smart grid. During our literature review, we found only one
paper [2] that presents a survey on specific aspects of MAS in the smart grid.
In [2], the author focuses on demand-side management and on generation and
transmission management. Although the author discusses important issues in the domain,
there is no discussion of other relevant aspects such as communication, self-healing,
power scheduling, and storage management.
In this chapter, we aim to present a more comprehensive, concise, and up-to-date
overview by targeting five aspects: communication, demand-side management,
fault detection and prevention, power scheduling, and storage and voltage
management.
Hierarchal framework
Li et al. [3] presented an agent-based decentralized control scheme for a distributed
smart-grid network. It consists of two layers. The bottom layer represents
a communication network composed of agents that act as controllers and collect
information about the grid status. The top layer represents the distribution
process of the power grid network. The agents at the bottom layer control the power
produced by the distributor grids. This study achieved a balanced state between power
and demand. It also reduced communication complexity and voltage variation.
One-way power communication in the smart grid is considered to be slow to respond.
In [4], Al-Agtash presented a novel agent-based model for two-way power communication
in the smart grid. This model provides two-way power flow between the users'
demand and the power generators. The architecture consists of three layers: power
generators, middleware, and electricity agents. The agents operate in an integrated
manner within the smart grid. They control and monitor demand variations and the selling
of power at the customer side. These agents provide reliability, security, and stability
of the system. Simulation results showed that the market price decreased from 80 to
50 per MWh. However, there are still some design issues, such as the API, integrity,
and consistency of agent operation.
The decentralized management system in a smart grid makes each part of the
system intelligent and autonomous. Palicot et al. [5] presented a hierarchal
cognitive radio network architecture for the smart grid. The framework focuses on the
hierarchal position of each element of the system. The results showed that peak power
was reduced from 55,000 W to 900 W. This method reduced pressure on the system and
also reduced the risk of failure.
[Figure: taxonomy of MAS applications in the smart grid (MAS-SG). Recoverable branch
labels: Communication; Demand integration; Fault control; Storage/Voltage. Recoverable
technique labels: Hierarchal, Coalition formation, Group communication, Census, PSO,
Learning (RL, ANN, Bayesian), Collaborative, CAS, Complex model, Adaptive program,
Self-organizing, MAF, WPH, Algorithm, Fuzzy-rule, Sweep technique, Spanning tree,
Monitoring, Volt/Var, State monitoring, Search, Normality analysis, Hill-climbing,
Swarm-intelligence.]
cost estimation that calculates the system incremental cost. This method enhanced
the system vulnerability and it requires less information. The results showed that
information loss and iteration have a direct relationship. This method gives better
results when information loss is 5%.
Bayesian learning
Information about prices and demands may be lost during the communication process.
This incomplete information affects the performance of the smart grid. In [16], Misra
et al. addressed this issue. In their work, an agent-based model is
proposed using a Bayesian learning approach. It consists of two types of agents: customer
agents and grid agents. Customer agents estimate the price given by the grid. Grid agents
estimate the demand of the customers based on the probabilities of their beliefs.
Simulation results showed that utility is increased by 40%. However, this method ignores
the control packet loss rate. The literature summary of communication management is
shown in Table 5.1.
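The belief-based estimation described above rests on Bayes' rule. The following minimal sketch shows a single belief update for a grid agent; it is not the model of [16], and the function name `bayes_update`, the two demand hypotheses, and all probabilities are assumptions made only for illustration.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(h | obs) given a prior P(h) and a likelihood P(obs | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical example: a grid agent's belief about one customer's demand level.
prior = {"low": 0.5, "high": 0.5}
# Assumed probability of observing a large request under each hypothesis.
likelihood_large_request = {"low": 0.2, "high": 0.8}
posterior = bayes_update(prior, likelihood_large_request)
```

When a message is lost, an agent in such a scheme could simply keep its previous belief and apply the update the next time an observation arrives.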
When economic dispatch and demand response are treated as separate and
sequential operations, energy efficiency decreases. Zhang et al. [23] presented an
optimal energy-management strategy in order to maximize social welfare. This method
operates through the coordination of demand response and economic dispatch. Economic
dispatch is provided by the generators and demand response by the customers. This method
is also used to discover the power demand–supply mismatch. The simulation
results showed convergence in 40 iterations.
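As a rough illustration of coordinating the two operations through one shared signal, the sketch below couples a single generator and a single consumer by a price that is adjusted until supply matches demand. It is not the algorithm of Zhang et al. [23]; the quadratic cost and utility models, the parameter values, and the name `coordinate_dispatch` are assumptions for this example only.

```python
def coordinate_dispatch(a=0.1, b=2.0, w=10.0, theta=0.4, step=0.05, iterations=100):
    """Price iteration that maximizes social welfare U(d) - C(g) subject to g = d.

    Assumed models (hypothetical, for illustration only):
      generator cost   C(g) = 0.5 * a * g**2 + b * g
      consumer utility U(d) = w * d - 0.5 * theta * d**2
    """
    price = 0.0
    generation = demand = 0.0
    for _ in range(iterations):
        # Economic dispatch: the generator produces until marginal cost equals the price.
        generation = max(0.0, (price - b) / a)
        # Demand response: the consumer buys until marginal utility equals the price.
        demand = max(0.0, (w - price) / theta)
        # Raise the price when demand exceeds supply, lower it otherwise.
        price += step * (demand - generation)
    return price, generation, demand


price, generation, demand = coordinate_dispatch()
```

With these assumed parameters the mismatch `demand - generation` shrinks at every step, which is the quantity a coordinated scheme can monitor to detect a demand–supply mismatch.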
O'Neill et al. [24] studied another approach to demand response and
proposed a consumer automated energy system. This technique reduces residential
energy cost and usage. The method uses online energy cost estimation and a user
decision policy. The approach is independent of energy price and system behavior.
In this method, users decide which device will use energy and how much. The results
showed a 40% cost reduction by using price-unaware energy scheduling.
Collaborative approach
MASs are widely used for controlling and managing a smart grid. In [1],
Manickavasagam proposed and developed an intelligent energy control center (ECC)
mechanism for the smart grid. This technique consists of two layers: the distributed
energy resources (DERs), which serve as clients, and the ECC, which acts as a server.
The ECC is controlled and monitored by a fuzzy logic controller (FLC). Communication and
negotiation between the clients and the server take place through the Internet protocol.
The simulation results are stored in an Excel database acting as a monitoring agent.
The ECC uses these results for decision-making in the DERs.
However, communication between the results and the FLC is not taken into account.
The mismatch between supply and demand reduces system performance. Parallel
Monte Carlo tree search (P-MCTS) can produce an optimal solution for power
balancing, but it has no coordination support. In [25], Golpayegani et al. extended
the P-MCTS work by introducing collaboration and coordination concepts. Agents
negotiate with each other and present their proposals. This method resolves problems
of agent conflict, load shifting, and charging capacity. The results showed that
charge capacity increased from 33% to 50%. However, this model does not deal with
the prediction of data.
In [26], Le Cadre and Bedo worked on uncertainty in a smart-grid environment
and presented a decentralized hierarchal model based on a learning-game approach. It
is composed of supplier, generator, and consumer agents. The agents forecast the demand
and production of the grid in a collaborative manner. The model determines the price that
balances power and demand. The results showed that, in a shared information network,
a faster convergence rate is achieved using cooperative learning than with
individual learning.
Demand-side integration
Demand-side integration in the smart grid results in security, quality, efficiency, and
a reduction in cost. In [30], Mocci et al. proposed a MAS for the integration of demand
and electric vehicles (EVs). The load agents calculate power demand and act as master
agents. The master agents, together with cooperative agents, send power load and global
data to the demand side. The approach achieved a demand–response rate of 85%. It also
reduced the flow of data. However, this technique cannot calculate the state of the
batteries of different storage units at different times.
In [31], Nunna et al. proposed a priority banking scheme that is concerned with
users' demands. This method gives users a share of the available resources.
It monitors user demand and updates user priorities. This method reduced network
loss by 50% and also reduced dependency on the overall grid. However, this technique
provides fewer shares to users.
5.3.3.1 Self-organizing
Self-organization is an activity in which each or some parts of the
system arrange themselves based on local interactions among the components of
the system. In this section, we discuss some of the self-organization approaches that
have been proposed for addressing the fault-monitoring problem.
failure and loss in the smart grid. The proposed model searches for overloaded
transmission lines and then redistributes the power on those lines. The system decreases
the transmitted power in the overloaded lines and brings the lines back to a working state.
This process successfully halts the cascading failure without load shedding. However,
this approach has major hardware requirements and needs an efficient power-dispatch
history. Additionally, the algorithm involves a large number of constraints.
In [39], Nassar and Salama introduced the dynamic microgrid concept, in which the
microgrid has flexible boundaries. With this feature, the size of the grid can be reduced
or extended according to need. It uses the forward–backward sweep technique for power
flow. In an emergency situation, the self-healing feature is achieved. The results showed
good performance when compared with a fixed-boundary system. However, the computation
time of this technique is very large (15.106/h).
In [40], Chen et al. worked on the restoration of power flow after a
natural disaster. They presented a multi-agent coordination control scheme based on a
mixed-integer linear program. The proposed system controls the on and off
status of switched devices. A local communication technique is used for the discovery
of global information, which is then used for the optimal decision. The
results showed that the computation time of this technique is 0.265 s. However, this work
does not consider communication range, battery capacity, or the requirements for
global information discovery.
Multi-agent framework
Fault detection and diagnosis avoid the loss of synchronous operation in a power
system. In [41], Rahman et al. presented an intelligent agent-based model for system
protection in critical time. This model is capable of autonomous decision-making
for circuit breakers and detects a fault in critical time. Simulation results showed the
flexibility and stability of the system. However, this model cannot be implemented in
large and complex power systems.
Wolf-pack hunting
In [42], Xi et al. presented a multi-agent wolf-pack hunting approach for the
smart-grid system. The wolf-pack idea is derived from the hunting behavior of a wild wolf
pack. The basic idea is to ensure survival in a harsh environment. This model can
handle the optimal management of power distribution and can operate under load
disturbance conditions. Experimental results showed that the convergence rate is
51.37%–57.4% and the error rate is 0.5%. The agents exchange information rapidly and
calculate the optimal policy. It increased utilization cost with reduction of generation cost.
5.3.3.2 Algorithmic approach
Studies based on algorithmic approaches such as fuzzy rules, census,
the sweep technique, and spanning trees have also been presented in the smart-grid
domain. Next, we discuss these studies.
Fuzzy-rule
In [43], Elmitwally et al. proposed a distributed system based on a fuzzy rule-based
multi-agent approach. This work mainly focuses on eliminating congestion of smart-grid
Census scheme
In [44], Teng et al. proposed a restoration framework for emergency situations in a
smart-grid environment. In this method, a dynamic leader agent is used for operation
in emergency and disaster situations, and bus agents operate in normal situations. This
method reduced communication time, and communication bandwidth is conserved
during a disaster.
Sweep technique
In [45], Nguyen and Flueck proposed another decentralized, distributed agent-based
model for the power flow problem. It consists of multiple agents with autonomous,
local-view, and decentralized behavior. The agents use the backward–forward sweep
iteration technique to solve the power flow. The results showed a computation time of 81.96 s.
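The backward–forward sweep mentioned above can be sketched for the simplest case, a single radial feeder whose buses form a chain. This is a centralized, single-phase illustration and not the agent-based implementation of [45]; the function name, the per-unit example values, and the constant-power load model are assumptions.

```python
def forward_backward_sweep(v_source, line_z, loads, tol=1e-8, max_iter=100):
    """Solve a chain-shaped radial feeder by backward-forward sweep.

    v_source : complex voltage at the source bus
    line_z   : line_z[k] is the series impedance of the line feeding bus k
    loads    : loads[k] is the constant complex power drawn at bus k
    Returns the list of bus voltages (source bus excluded).
    """
    n = len(loads)
    v = [v_source] * n  # flat start
    for _ in range(max_iter):
        # Backward sweep: load currents I = conj(S / V), summed from the feeder end.
        i_load = [(s / vk).conjugate() for s, vk in zip(loads, v)]
        i_branch = [0j] * n
        running = 0j
        for k in reversed(range(n)):
            running += i_load[k]
            i_branch[k] = running
        # Forward sweep: update voltages from the source outward.
        v_new = []
        upstream = v_source
        for k in range(n):
            upstream = upstream - line_z[k] * i_branch[k]
            v_new.append(upstream)
        change = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        if change < tol:
            break
    return v


# Hypothetical three-bus feeder in per-unit values.
voltages = forward_backward_sweep(1 + 0j, [0.01 + 0.02j] * 3, [0.1 + 0.05j] * 3)
```

In a multi-agent setting, each bus could hold its own voltage and current and exchange only the branch quantities with its neighbors, which is one way the sweep lends itself to a local-view implementation.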
Self-organizing
The smart grid requires real-time monitoring to provide reliable services for end users.
In [47], Colson and Nehrir proposed a decentralized MAS for real-time power management
in the smart grid. The MAS controls the grid assets based on price, resources, and
users' demand. The experimental results show that decentralized MASs are reliable
for real-time monitoring in the smart grid. It is also shown that, as time continues, the
performance of storage degrades due to discharging.
Hierarchal approach
In [48], Hu et al. proposed a hierarchal approach based on a MAS for smart-grid
operation. This approach integrates the EVs and addresses grid congestion and voltage
violation problems. The results showed good performance for power scheduling and
control. However, the communication between agents is too complex.
Several designs have been proposed for smart-grid architecture, but they still
face feasibility and economy problems. In [49], Chao and Hsiung proposed a fair
energy resource allocation algorithm for electricity trading among smart grids. This
technique prevents starvation situations and fatal problems. It also reduces power cost.
It achieved a 96.25% fairness index even in the worst case. However, this technique
does not take power transmission into account.
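A fairness index of this kind can be computed from the allocations alone. The source does not state which index [49] uses; the sketch below assumes Jain's fairness index, a common choice, and the function name and sample allocations are hypothetical.

```python
def jain_fairness(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), between 1/n and 1."""
    n = len(allocations)
    squares = sum(x * x for x in allocations)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return total * total / (n * squares)


# Hypothetical energy shares (kWh) granted to four buyers.
equal_split = jain_fairness([5.0, 5.0, 5.0, 5.0])      # perfectly fair
single_winner = jain_fairness([20.0, 0.0, 0.0, 0.0])   # one buyer takes everything
```

Under this assumed definition, an index of 1 means every participant receives the same share, and 1/n means a single participant receives everything (starvation of the others).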
Rahman et al. [50] proposed an agent-based model to address the voltage stability
problem. In this model, agents manage their activities through online information
and power flow. They estimate voltage variation by using a distributed synchronous
compensator. Simulation experiments showed robust performance of the system.
However, a communication time delay of 15 ms was observed, although voltage stability
improved.
In a smart-grid environment, there is a need to achieve stability and a reduction in
operation cost. In [51], Radhakrishnan proposed a smart-grid framework based on
a multi-agent distributed energy management system. It performs optimal energy
allocation and management in the smart grid. This model consists of renewable energy
sources, storage devices, and generators. It controls the power balance through the state of
charge of the batteries. Simulation results showed a reduction of total cost from 662.2
to 658.4. However, the performance of the proposed algorithm degrades under some
uncertain conditions.
Census-based approach
A centralized system is not able to handle flexible power loads to maintain the power
balance in a smart-grid environment. In [52], Li et al. proposed a look-ahead schedul-
ing model for flexible loads in a smart-grid environment. This model consists of three
layers: centralized, distributed, and cooperative control. Load agents perform coor-
dination among agents, and cooperative control strategy is used for communication
protocol. This model provides flexible strategies to handle the large flexible load.
However, this model is not able to handle uncertainty.
In [53], Guo et al. proposed an economic dispatch scheme based on the projected
gradient method to address the economic dispatch problem. It decomposes the centralized
optimization into local optimization problems solved by agents. It deals with a stochastic
environment. This scheme presents a finite-time average consensus (census) algorithm.
In this method, agents iteratively calculate the solution of the optimization problem.
Communication among agents is limited. This method achieved plug-and-play operation, and it
does not require any private information. It can handle quadratic and non-quadratic cost
functions. The results showed that the overall cost of the system was reduced.
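The core of an average consensus (census) step is that each agent repeatedly moves its value toward the values of its neighbors. The sketch below shows the standard asymptotic version on a small ring network; it is not the finite-time variant of [53], and the function name, step size, topology, and initial values are assumptions.

```python
def average_consensus(values, neighbors, step=0.3, iterations=200):
    """Each agent i updates x_i <- x_i + step * sum_j (x_j - x_i) over its neighbors j.

    On a connected undirected graph with step < 1 / (maximum degree), all values
    converge to the average of the initial values.
    """
    x = list(values)
    for _ in range(iterations):
        # All agents update simultaneously from the previous round's values.
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


# Hypothetical ring of four agents, each holding a local estimate.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
result = average_consensus([10.0, 20.0, 30.0, 40.0], ring)
```

Each agent only talks to its direct neighbors and never reveals its cost function, which is consistent with the limited communication and privacy properties noted above.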
Kahrobaee et al. [54] presented the concept of a smart home within a smart-grid
environment. In this work, a home is considered an agent that can buy, sell, and
store energy and interact with the grid. The framework consists of home agents in
a distributed multi-agent network. The home agent makes autonomous decisions to
buy, sell, and store energy, based on maximum utility. The home
agent's decisions affect the market price. The results showed that the home agent's
decisions reduced its energy cost, as it buys, sells, and generates energy at the same time.
However, this method is simple and does not address all issues related to demand and
supply.
In [55], Samadi et al. addressed uncertainty issues in the smart grid and presented
an optimization algorithm based on a central unit. This technique needs only an estimate
of future demand and minimizes the energy cost for each user. The results showed
that a peak-to-average load of 25.5% is achieved. It also reduced energy expenses.
However, the complexity of the system is increased.
In [56], Gregoratti and Matamoros presented another approach for power flow
in a smart-grid environment. The proposed technique controls and manages power
flow among multiple microgrids. This technique focuses on protecting private local
information, and it is based on a sub-gradient cost minimization approach. The results
showed a limited number of iterations and a faster convergence rate. However, in this
work, communication with the main grid was not considered.
Cognitive-based approach
In [57], Bu and Yu studied a green cognitive network in a smart-grid application. The
cognitive network monitors smart-grid operation and provides information to the control
unit. Power allocation is performed based on the collected information. Power
allocation, price, and efficiency are modeled as a three-stage Stackelberg game. Results
demonstrated a 31.09% cost reduction. However, this technique does not handle the
incomplete scenario.
Reinforcement learning
The RL technique is an essential tool for the computation and estimation of payoffs to
achieve game equilibrium in a smart-grid environment. Wang et al. [58] presented a scheme
based on the RL technique for energy trading in the smart grid. This method chooses a
random strategy and maximizes the average utility and revenue. The proposed scheme
is able to achieve Nash equilibrium. This technique handles incomplete information
and a stochastic environment. Information is exchanged through the central
unit, which protects private information. However, implementing the finite-action
learning algorithm is a challenging task in a real-valued action environment.
In [59], Samadi et al. worked on load scheduling and power trading in a smart-grid
environment. The study considered a high penetration of renewable resources and
adopted a game-theoretic approach. In this method, users can sell their extra power to
their neighbors locally. This method handles the reverse power flow problem. It
increases the revenue and decreases the energy expenses of the users. The results showed
that the average energy imported is reduced to 820.2 kW from 1,360.9 kW, and the energy
cost is reduced to $40.37 from $60.91.
An energy hub provides interaction between energy carriers in supplying the required
loads. In [60], Sheikhi et al. extended the energy hub system. This study proposed
a cloud-computing concept in which a utility provider and customers interact
through the cloud. The cloud takes the utility power as input and produces output for
the users. This model provides two-way communication between utility companies
and the energy hub. The results showed that energy cost is reduced to 33%. However, the
proposed system is unable to predict consumers' future demands.
In [61], Ghorbani et al. presented a fault-detection technique based on a MAS
in a smart-grid environment. This technique combines centralized and decentralized
features in a hierarchal coordination scheme. It consists of
zone agents, feeder agents, and substation agents. Zone agents provide services
to detect and locate the fault and help feeder agents to restore services using the
Q-learning technique. This method needs fewer messages for communication and
reduces computation time. The results showed that 16 messages are required for
communication among 21 agents, while the centralized and decentralized schemes required
20 and 38 messages, respectively. However, the number of zone agents and feeder agents
remains fixed with the system size, which results in more burden and computational
time in a complex system.
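The Q-learning technique referred to above updates a table of state-action values from observed rewards. The following is a minimal tabular sketch, not the scheme of [61]; the state and action names, the reward, and the learning parameters are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical restoration step: closing a switch restores service (reward 1).
q_table = defaultdict(float)  # unseen state-action pairs start at 0
actions = ["close_switch", "open_switch"]
value = q_update(q_table, "fault_isolated", "close_switch", 1.0,
                 "service_restored", actions)
```

Repeating such updates over many simulated fault events lets an agent rank switching actions without an explicit model of the network.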
Venayagamoorthy et al. [62] proposed an intelligent dynamic energy management
system (I-DEMS) based on neural networks and RL. They used the Bellman equation
for the optimal control signal and calculated the minimum and maximum cost-to-go
functions. They compared this technique with a DEMS based on the decision tree (DT)
method; the DT is inefficient because it supplies energy based on available power. The
results show that I-DEMS is reliable and extends battery life, but this technique does not
predict the battery state.
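For reference, the Bellman optimality recursion on which cost-to-go methods of this kind build can be written in a generic form. The notation below (state $x_t$, control $u_t$, stage cost $U$, system dynamics $f$, discount factor $\gamma$, optimal cost-to-go $J^{*}$) is assumed here for illustration and is not taken from [62].

```latex
J^{*}(x_t) = \min_{u_t} \left[ U(x_t, u_t) + \gamma \, J^{*}(x_{t+1}) \right],
\qquad x_{t+1} = f(x_t, u_t), \qquad 0 < \gamma \le 1
```

The optimal control signal at time $t$ is the $u_t$ that attains the minimum; approaches based on neural networks typically approximate $J^{*}$ rather than tabulate it.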
5.3.5.1 Learning
Storage and voltage-management problems are addressed by using RL and neural
network approaches. Next, we discuss these learning techniques and explain how
different studies addressed the storage and voltage problems in the smart-grid domain.
Reinforcement learning
In [72], Li et al. implemented the RL
technique for the load-balancing problem in the smart grid. The proposed scheme is
based on a dynamic hierarchal approach. It finds an optimal policy to balance power
demand and supply. It handles the curse-of-dimensionality problem. It is a fast-learning
technique in an unknown environment.
In [73], Salehizadeh and Soltaniyan proposed a fuzzy Q-learning technique. It
handles multidimensional renewable power in fewer iterations. With this method, the
number of iterations decreased by 40% compared with other techniques. It models
electricity in a continuous range.
Wind energy is an uncertain and variable energy resource; this affects smart-grid
performance. In [74], de Montigny et al. addressed this issue and proposed a
multi-agent architecture. This method calculates import and export losses. It also
calculates a global-demand forecast using a minute-to-minute strategy. Additionally,
it estimates system performance from historical data. The results obtained through the
minute-to-minute strategy showed that the number of generating unit starts and stops
increased by 5%. However, the computational time of this method is very large.
Load frequency management and control is a hot research topic in a smart-grid
environment. A linear model is not capable of handling the dynamic behavior of
the system. In [75], Daneshfar et al. addressed this issue and proposed a multi-agent
RL technique that consists of two agents: an estimator and a controller. The estimator
agent finds the frequency error, and the controller agent uses genetic optimization for
frequency control. This technique showed the frequency variation falling to zero through
the optimal solution. However, a load disturbance is generated when the maximum
frequency is reached.
In [76], Wei et al. addressed battery-management issues in a smart-grid environment.
This study proposed a dual iterative Q-learning technique based on adaptive
dynamic programming for managing and controlling storage devices. This method uses
a dual iteration: an internal iteration for minimizing the power cost and an external
iteration for finding the Q function that converges to the optimum. The algorithm
converged to the optimal solution in 20 iterations. However, the proposed algorithm finds
the optimal solution indirectly. Initial interaction handles demand response at the customer
side, and the load is considered as a flat point. Real-time interaction is used for
decision-making. This technique used a hidden-mode MDP. Its performance improves as the
training period increases. However, in the studied system, a smart home was not considered.
Integrating different types of energy-storage devices in the smart grid produces
implementation challenges. In [77], Qiu et al. focused on controlling and managing
different types of energy-storage devices. This study proposed an RL-based scheme
to optimize the coordination of energy-storage devices. The results showed that the system
gradually learns with time and arrives at an optimal solution. This study also showed
that system losses decreased. However, it required a large computational time, and it
does not support a power-sharing feature.
Integrating photoelectric energy with the smart grid decreases fossil fuel consumption
as well as the electricity bill. In [78], Wang et al. proposed a near-optimal control
algorithm for a residential storage system that controls power generation, predicts
power consumption, and accounts for various loss components during operation. They
applied the RL technique to predict the amount of energy in the energy storage system
(ESS). This technique performs optimization on energy price and energy demand price.
Experimental results show that the proposed algorithm achieves up to a 72% enhancement
in electricity-cost reduction compared with a baseline storage control algorithm.
A limitation of this system is that the PV generation system only works in sunlight.
Battery management plays a key role in a smart-grid environment. In [79],
Kuznetsova et al. presented a two-step-ahead RL algorithm for battery scheduling
within a microgrid architecture. The microgrid is composed of local consumers, a
generator, and storage devices connected to the external grid. This technique forecasts
power demand and finds optimal actions for battery scheduling. Simulation results showed
a 3.94% improvement in battery. However, the simulation running time is very large.
In [80], Vandael et al. addressed the day-ahead power scheduling problem for EVs
in a smart-grid environment. In this method, the charging process is performed by a
heuristic scheme that controls and manages each EV. The
system collectively learns a cost-effective scheduling strategy for EV charging through
the RL technique. The results showed that average cost increased by 10%. However, this
method has some overloading and over-constraint issues.
In [81], Guan et al. focused on minimizing energy cost in a smart-grid environment.
In this work, the RL technique is applied to find an optimal policy for storage
devices. This method does not require any future prediction of energy generation
and consumption and works in a partially observable environment. The TD-lambda algorithm
is used for convergence to the optimal solution in the non-Markovian environment.
Simulation results showed a 59.8% reduction in energy cost.
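The TD-lambda algorithm spreads each temporal-difference error back over recently visited states through eligibility traces. The sketch below shows tabular TD(λ) value prediction with accumulating traces on a two-state episode; it is not the storage controller of [81], and the state names, rewards, and parameters are hypothetical.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """Run tabular TD(lambda) with accumulating traces over one episode.

    values  : dict mapping state -> current value estimate (updated in place)
    episode : list of (state, reward, next_state); next_state is None at the end
    """
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        delta = reward + gamma * next_value - values[state]  # TD error
        traces[state] += 1.0                                 # mark the visited state
        for s in values:
            values[s] += alpha * delta * traces[s]           # credit recent states
            traces[s] *= gamma * lam                         # decay all traces
    return values


# Hypothetical episode: charging first, then a rewarded discharge.
estimates = td_lambda_episode(
    {"charged": 0.0, "discharging": 0.0},
    [("charged", 0.0, "discharging"), ("discharging", 1.0, None)],
)
```

Because the trace of the earlier state is still non-zero when the reward arrives, the earlier state receives part of the credit in the same episode, which is what makes TD(λ) less dependent on the Markov property than one-step updates.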
Artificial neural network
Battery management plays a key role in the smart grid; it is important to measure the
health of batteries during operation. In [82], Landi and Gross proposed two different
techniques for estimating battery health in smart-grid applications. The first is based on
fuzzy logic and the second on a neural network. These techniques use temperature,
charging/discharging, and the number of cycles as parameters. The results showed a 5%
error rate.
5.3.5.2 Monitoring
In this section, we discuss different approaches presented for storage and voltage
monitoring.
Volt/Var control
In [83], Zhang et al. presented a multi-agent distributed algorithm for integrated
volt/var control in the smart grid. Agents collaboratively control the voltage and the
capacitors. This method deals with the optimization of the voltage profile, the reduction
of system loss, and the switching of capacitors. Two types of agents are used: switching
agents, which detect and resolve system faults, and volt/var control agents, which control
power flow. This technique keeps the voltage above the lower limit but does not handle
keeping the voltage below the upper limit. The results showed that the average time for
solving the power flow is 9.4405 s, which demonstrates an efficient technique. However,
the solution does not lead to the optimum.
Census approach
Researchers are also interested in reducing high power consumption and demand to
reduce cost. In [84], Sharma et al. proposed an agent-based distributed control model
to address this issue. In this model, power-storage devices are used as agents. The model
achieves convergence in the agreement on power consumption. It prevents overcharging
and over-discharging of batteries. The results showed 95% charging and 85% discharging
efficiency. However, the communication between agents is limited, and
the model does not predict the state of the batteries, only their maximum/minimum states.
State monitoring
For dynamic state estimation, Srivastava et al. [85] proposed a MAS for
multi-area power systems. This method divides the whole network into subsystems,
and the algorithm executes on them in parallel. It uses two units, a field unit and a
phasor unit, which run separately. Finally, a central controller integrates their results.
The algorithm follows the cubature Kalman filter. The results showed a voltage error of
2.4 × 10⁻². It was also shown that the extended Kalman filter is not feasible.
In [86], Teleke et al. focused on battery management and proposed a rule-based
control strategy. This technique monitors and controls the charge/discharge limits and
battery lifetime. It also utilizes 70% of the battery capacity. The results showed a
reduction in voltage deviation from 24% to 4%. However, this requires high-capacity batteries.
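A rule-based charge/discharge policy of this general kind can be sketched in a few lines. The rules below are not those of [86]; the state-of-charge window of 0.15 to 0.85 (chosen so that 70% of the capacity is usable), the sign convention, and the name `battery_command` are assumptions for illustration only.

```python
def battery_command(soc, surplus_kw, soc_min=0.15, soc_max=0.85):
    """Return the battery power in kW (positive = charge, negative = discharge).

    soc        -- state of charge as a fraction of capacity (0.0 to 1.0)
    surplus_kw -- generation minus load; negative means a deficit
    """
    if surplus_kw > 0 and soc < soc_max:
        return surplus_kw      # store surplus generation
    if surplus_kw < 0 and soc > soc_min:
        return surplus_kw      # cover the deficit from storage
    return 0.0                 # at a limit or balanced: stay idle
```

For example, `battery_command(0.5, 2.0)` charges at 2.0 kW, while `battery_command(0.85, 2.0)` stays idle because the upper limit has been reached.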
Integrating solar energy into a smart grid makes it an active system, which requires a
cyber-physical management system. In [87], Pahwa et al. presented a goal-based holonic
MAS. This technique uses the concept of nested agents and covers power-control strategy and
state estimation. The results showed an execution time of 93 s and an absolute error of 0.038%.
However, the nested agents increase the complexity of the system.
In [88], Klaimi et al. focused on energy-management systems
and proposed a multi-agent intelligent model for smart-grid applications. In this
technique, intelligent storage devices are used for storing surplus power. This technique
reduced energy cost and access to the grid. Results showed a 60% cost reduction.
5.3.5.3 Searching
The searching techniques used for addressing storage and voltage problems include
self-organizing, normality analysis, hill-climbing, and swarm intelligence. Next,
these search-based techniques are discussed.
Self-organizing
The integration and monitoring of smart microgrids is at an initial stage and needs
more research. In [89], Vaccaro et al. proposed and developed a self-organized
standalone smart microgrid framework for solving and controlling smart microgrid
was discussed in [25–27]. This scheme has open issues regarding communication,
prediction, and probability distribution. The complex adaptive system approach was applied
in [28,29]. This reduced energy cost by 40% and peak load to the 8%–5% range.
However, this approach does not handle high load, and the cost of energy increased
for some users. Demand-side integration was discussed in [30,31]. The open research
issues in this scheme are as follows: it does not estimate the state of
batteries and offers fewer energy shares to the users. In [32], the PSO technique based on
the BEMS framework suffers from the unbalanced-situation issue. The game theory approach was
also applied to address the demand–response problem. This approach has open issues
related to sensitivity, information gathering, and the trade-off between cost and PAR.
It is also not suitable for the distribution scheme.
How to detect and prevent a fault in the system? To address this challenge, a
number of research efforts have been made and are cited in our review. We grouped
these studies into two categories, i.e., self-organizing and algorithmic approaches. The
self-organizing approach consists of adaptive programming, MAF, and WPH. These
approaches can perform the self-healing task efficiently. However, some open
research problems remain: these approaches require major hardware for
implementation, are unable to address complex models, cannot address battery capacity,
and offer no global information discovery.
Algorithmic approaches consist of fuzzy-rule, census, sweep, and spanning tree
techniques. These studies successfully reduced congestion and communication time.
However, some open research problems remain to be addressed. In this
scheme, the system performance degrades in the case of failure, and there is no guarantee of
an optimal solution.
How to perform power scheduling? We surveyed the research work and grouped
it into two categories, i.e., complex system and learning-based models.
The complex system category consists of self-organizing, hierarchical, census, and
cognitive-based approaches. The self-organizing approach is discussed in [47]. This technique
showed good performance in terms of monitoring; however, performance degrades in
discharging periods. The hierarchical scheme is discussed in [48–50]; this approach
can handle the starvation problem and achieved a 96.5% fairness index.
However, this scheme increased complexity and computational time. Census-based
approaches are also reviewed in this part for the power-scheduling task. In [52], the
flexibility concept is introduced, providing a flexible strategy for
power transmission. In [53], a central unit is introduced, achieving a 25.5%
peak-to-average ratio. In [54], the subgradient concept was used for cost
minimization. This showed a fast convergence rate; however, there is no
communication with the main grid. A pruning strategy was discussed in [56] that prunes
those agents which are not participating in the communication. This reduced the search
space size; however, this method is unable to prune agents which are close to
each other.
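Since several of the surveyed schemes are judged by their peak-to-average ratio (PAR), a short worked example may help. This is a minimal sketch using the usual definition (peak load divided by mean load); the load values and the function name `peak_to_average_ratio` are hypothetical.

```python
def peak_to_average_ratio(load):
    """PAR of a load profile: maximum load divided by mean load."""
    return max(load) / (sum(load) / len(load))

# Hypothetical load profile in kW over four time slots.
original = [2.0, 2.0, 4.0, 8.0]
par_original = peak_to_average_ratio(original)    # peak 8, mean 4

# The same total energy, spread evenly by an ideal scheduler.
flattened = [4.0, 4.0, 4.0, 4.0]
par_flattened = peak_to_average_ratio(flattened)  # peak 4, mean 4
```

A perfectly flat profile has a PAR of 1, which is the lower bound; scheduling approaches aim to move the profile toward it without changing the energy delivered.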
Learning-based approaches (RL and ANN) have been adopted to address the power-scheduling
problem in the smart grid. With the adoption of RL-based approaches, private
information was protected from external users. These approaches also provide a reverse
power flow facility, where the user can send extra power back to the main grid. The cloud
interaction concept was introduced in [60], where user and utility can interact with each other through the
cloud. This reduced energy cost to 33%. An ANN-based approach is presented in [71],
which integrates wind energy resources with other resources. However, learning-based
approaches still face open research problems: there is no
collaborative learning, there is a conflict between cost and voltage, and there is no procedure
to predict the system state.
How to manage and handle storage devices and voltage? To address this
problem, a number of research works are discussed and reviewed in this part. We grouped
these works into three categories, i.e., learning-based, monitoring, and search-based
approaches. Regarding the learning-based approach, in [74], a minute-to-minute
forecasting strategy was applied. This increased the number of generating units. However,
the computational time also increased. Different types of energy storage devices
were integrated with the system in [77]; this decreased energy loss. In [81], a two-step-ahead
forecasting strategy was applied, which showed a 3.94% improvement in battery
life. An ANN-based learning scheme was used in [83] for state estimation and showed
a 5% error rate.
Monitoring-based techniques consist of volt/var, census, and state monitoring.
These techniques control voltage and monitor system state. Agent-based distributed
control (ABDC), based on the monitoring approach, prevents overcharging and
over-discharging of the battery. This method achieved 95% and 85% efficiency in charging and
discharging, respectively.
Search-based techniques consist of self-organizing, normality analysis, hill
climbing, and swarm intelligence. The self-organizing technique addressed the application
synchronization problem in [91]. However, this technique is unable to handle semantic
data. Normality analysis is used in [92], which integrates EVs. In [93], a
knowledge-based scheme was used, which provides integration of new agents, reconfiguration, and
replication services. However, this scheme requires large data for intensity control.
5.5 Conclusions
From a simulation and modeling perspective, MAS in the smart grid has recently been
attracting increasing attention from the research community. The growing areas
of interest for MAS in the smart grid are communication protocols, demand
response, self-healing, power scheduling, load balancing, and storage-device
management. A number of research works have developed multi-agent-based
models for the smart grid in the abovementioned domains.
In this part, we covered the different approaches adopted in MAS for smart-grid
modeling and proposed a classification of MAS models according to the techniques
used for their implementation. We then described each technique and its model.
We also highlighted the open research problems that exist in each solution.
The basic objective of MAS in smart-grid modeling is load balancing, i.e., bringing
balance or equilibrium between user demand and generation capacity. In other
words, MAS in smart-grid modeling deals with the energy-optimization process. As far as
the authors are aware, this is the first article which clearly highlights open research
problems in MAS in the smart grid and covers a large number of different research
studies.
The aim of this survey was to allow a comprehensive understanding of the various
emerging developments in the field of the smart grid, the different approaches, and their
advantages and limitations. We hope it will be a good guideline and a starting point
for researchers coming to this field and wishing to increase their knowledge of
the smart-grid domain from a MAS perspective.
References
[1] Manickavasagam K. Intelligent energy control center for distributed gen-
erators using multi-agent system. IEEE Transactions on Power Systems.
2015;30(5):2442–2449.
[2] Siano P. Demand response and smart grids: A survey. Renewable and
Sustainable Energy Reviews. 2014;30:461–478.
[3] Li Q, Chen F, Chen M, et al. Agent-based decentralized control method for
islanded microgrids. IEEE Transactions on Smart Grid. 2016;7(2):637–649.
[4] Al-Agtash S. Electricity agents in smart grid markets. Computers in Industry.
2013;64(3):235–241.
[5] Palicot J, Moy C, Résimont B, et al. Application of hierarchical and distributed
cognitive architecture management for the smart grid. Ad Hoc Networks.
2016;41:86–98.
[6] Larsen GK, van Foreest ND, Scherpen JM. Power supply–demand balance in
a smart grid: An information sharing model for a market mechanism. Applied
Mathematical Modelling. 2014;38(13):3350–3360.
[7] Yan Y, Qian Y, Hu RQ. A secure and efficient scheme for machine-to-
machine communications in smart grid. In: Communications (ICC), 2012
IEEE International Conference on. IEEE; 2012. p. 167–172.
[8] Ye D, Zhang M, Sutanto D. Decentralised dispatch of distributed energy
resources in smart grids via multi-agent coalition formation. Journal of Parallel
and Distributed Computing. 2015;83:30–43.
[9] Dagdougui H, Sacile R. Decentralized control of the power flows in a net-
work of smart microgrids modeled as a team of cooperative agents. IEEE
Transactions on Control Systems Technology. 2014;22(2):510–519.
[10] Nguyen CP, Flueck AJ. Modeling of communication latency in smart grid.
In: Power and Energy Society General Meeting, 2011 IEEE. IEEE; 2011.
p. 1–7.
[11] Zhang Y, Rahbari-Asr N, Chow MY. A robust distributed system incremental
cost estimation algorithm for smart grid economic dispatch with communi-
cations information losses. Journal of Network and Computer Applications.
2016;59:315–324.
[12] Wang Z, Wang L. Adaptive negotiation agent for facilitating bi-directional
energy trading between smart building and utility grid. IEEE Transactions on
Smart Grid. 2013;4(2):702–710.
[13] Yu T, Wang H, Zhou B, et al. Multi-agent correlated equilibrium Q (λ) learning
for coordinated smart generation control of interconnected power grids. IEEE
Transactions on Power Systems. 2015;30(4):1669–1679.
[30] Mocci S, Natale N, Pilo F, et al. Demand side integration in LV smart grids
with multi-agent control system. Electric Power Systems Research. 2015;125:
23–33.
[31] Nunna HK, Saklani AM, Sesetti A, et al. Multi-agent based demand response
management system for combined operation of smart microgrids. Sustainable
Energy, Grids and Networks. 2016;6:25–34.
[32] Hurtado L, Nguyen P, Kling W. Smart grid and smart building inter-operation
using agent-based particle swarm optimization. Sustainable Energy, Grids and
Networks. 2015;2:32–40.
[33] Wei W, Liu F, Mei S. Energy pricing and dispatch for smart grid retailers under
demand response and market price uncertainty. IEEE Transactions on Smart
Grid. 2015;6(3):1364–1374.
[34] Chai B, Chen J, Yang Z, et al. Demand response management with multiple
utility companies: A two-level game approach. IEEE Transactions on Smart
Grid. 2014;5(2):722–731.
[35] Song L, Xiao Y, Van Der Schaar M. Demand side management in smart
grids using a repeated game framework. IEEE Journal on Selected Areas in
Communications. 2014;32(7):1412–1424.
[36] Nunna HK, Doolla S. Demand response in smart distribution system with
multiple microgrids. IEEE Transactions on Smart Grid. 2012;3(4):1641–1649.
[37] O’Brien G, El Gamal A, Rajagopal R. Shapley value estimation for compen-
sation of participants in demand response programs. IEEE Transactions on
Smart Grid. 2015;6(6):2837–2844.
[38] Babalola A, Belkacemi R, Zarrabian S. Real-time cascading failures prevention
for multiple contingencies in smart grids through a multi-agent system. IEEE
Transactions on Smart Grid. 2016;9(1):373–385.
[39] Nassar ME, Salama MM. Adaptive self-adequate microgrids using dynamic
boundaries. IEEE Transactions on Smart Grid. 2016;7(1):105–113.
[40] Chen C, Wang J, Qiu F, et al. Resilient distribution system by micro-
grids formation after natural disasters. IEEE Transactions on Smart Grid.
2016;7(2):958–966.
[41] Rahman M, Mahmud M, Pota H, et al. A multi-agent approach for enhancing
transient stability of smart grids. International Journal of Electrical Power &
Energy Systems. 2015;67:488–500.
[42] Xi L, Zhang Z, Yang B, et al. Wolf pack hunting strategy for automatic gener-
ation control of an islanding smart distribution network. Energy Conversion
and Management. 2016;122:10–24.
[43] Elmitwally A, Elsaid M, Elgamal M, et al. A fuzzy-multiagent self-
healing scheme for a distribution system with distributed generations. IEEE
Transactions on Power Systems. 2015;30(5):2612–2622.
[44] Teng F, Sun Q, Xie X, et al. A disaster-triggered life-support load restora-
tion framework based on multi-agent consensus system. Neurocomputing.
2015;170:339–352.
[45] Nguyen CP, Flueck AJ. A novel agent-based distributed power flow solver for
smart grids. IEEE Transactions on Smart Grid. 2015;6(3):1261–1270.
[62] Venayagamoorthy GK, Sharma RK, Gautam PK, et al. Dynamic energy man-
agement system for a smart microgrid. IEEE Transactions on Neural Networks
and Learning Systems. 2016;27(8):1643–1656.
[63] Li D, Jayaweera SK. Reinforcement learning aided smart-home decision-
making in an interactive smart grid. In: Green Energy and Systems Conference
(IGESC), 2014 IEEE. IEEE; 2014. p. 1–6.
[64] Rayati M, Sheikhi A, Ranjbar AM. Applying reinforcement learning method
to optimize an Energy Hub operation in the smart grid. In: Innovative Smart
Grid Technologies Conference (ISGT), 2015 IEEE Power & Energy Society.
IEEE; 2015. p. 1–5.
[65] Kim BG, Zhang Y, van der Schaar M, et al. Dynamic pricing and energy
consumption scheduling with reinforcement learning. IEEE Transactions on
Smart Grid. 2016;7(5):2187–2198.
[66] Wang X, Zhang M, Ren F, et al. GongBroker: A broker model for power trading
in smart grid markets. In: Web Intelligence and Intelligent Agent Technology
(WI-IAT), 2015 IEEE/WIC/ACM International Conference on. vol. 2. IEEE;
2015. p. 21–24.
[67] Lim Y, Kim HM. Strategic bidding using reinforcement learning for load
shedding in microgrids. Computers & Electrical Engineering. 2014;40(5):
1439–1446.
[68] Zhang Y, van der Schaar M. Structure-aware stochastic load management in
smart grids. In: INFOCOM, 2014 Proceedings IEEE. IEEE; 2014. p. 2643–
2651.
[69] Liao H, Wu Q, Jiang L. Multi-objective optimization by reinforcement learning
for power system dispatch and voltage stability. In: Innovative Smart Grid
Technologies Conference Europe (ISGT Europe), 2010 IEEE PES. IEEE; 2010.
p. 1–8.
[70] Shirzeh H, Naghdy F, Ciufo P, et al. Balancing energy in the smart grid
using distributed value function (DVF). IEEE Transactions on Smart Grid.
2015;6(2):808–818.
[71] Motevasel M, Seifi AR. Expert energy management of a micro-grid con-
sidering wind energy uncertainty. Energy Conversion and Management.
2014;83:58–72.
[72] Li FD, Wu M, He Y, et al. Optimal control in microgrid using multi-agent
reinforcement learning. ISA Transactions. 2012;51(6):743–751.
[73] Salehizadeh MR, Soltaniyan S. Application of fuzzy Q-learning for electricity
market modeling by considering renewable power penetration. Renewable and
Sustainable Energy Reviews. 2016;56:1172–1181.
[74] de Montigny M, Heniche A, Kamwa I, et al. Multiagent stochastic simulation
of minute-to-minute grid operations and control to integrate wind generation
under AC power flow constraints. IEEE Transactions on Sustainable Energy.
2013;4(3):619–629.
[75] Daneshfar F, Bevrani H. Load-frequency control: A GA-based multi-agent
reinforcement learning. IET Generation, Transmission & Distribution.
2010;4(1):13–26.
[76] Wei Q, Liu D, Shi G. A novel dual iterative Q-learning method for optimal
battery management in smart residential environments. IEEE Transactions on
Industrial Electronics. 2015;62(4):2509–2518.
[77] Qiu X, Nguyen TA, Crow ML. Heterogeneous energy storage optimization for
microgrids. IEEE Transactions on Smart Grid. 2016;7(3):1453–1461.
[78] Wang Y, Lin X, Pedram M. A near-optimal model-based control algo-
rithm for households equipped with residential photovoltaic power generation
and energy storage systems. IEEE Transactions on Sustainable Energy.
2016;7(1):77–86.
[79] Kuznetsova E, Li YF, Ruiz C, et al. Reinforcement learning for microgrid
energy management. Energy. 2013;59:133–146.
[80] Vandael S, Claessens B, Ernst D, et al. Reinforcement learning of heuristic EV
fleet charging in a day-ahead electricity market. IEEE Transactions on Smart
Grid. 2015;6(4):1795–1805.
[81] Guan C, Wang Y, Lin X, et al. Reinforcement learning-based control of res-
idential energy storage systems for electric bill minimization. In: Consumer
Communications and Networking Conference (CCNC), 2015 12th Annual
IEEE. IEEE; 2015. p. 637–642.
[82] Landi M, Gross G. Measurement techniques for online battery state of
health estimation in vehicle-to-grid applications. IEEE Transactions on
Instrumentation and Measurement. 2014;63(5):1224–1234.
[83] Zhang X, Flueck AJ, Nguyen CP. Agent-based distributed volt/var control with
distributed power flow solver in smart grid. IEEE Transactions on Smart Grid.
2016;7(2):600–607.
[84] Sharma DD, Singh S, Lin J. Multi-agent based distributed control of distributed
energy storages using load data. Journal of Energy Storage. 2016;5:134–145.
[85] Sharma A, Srivastava SC, Chakrabarti S. Multi-agent-based dynamic state
estimator for multi-area power system. IET Generation, Transmission &
Distribution. 2016;10(1):131–141.
[86] Teleke S, Baran ME, Bhattacharya S, et al. Rule-based control of battery energy
storage for dispatching intermittent renewable sources. IEEE Transactions on
Sustainable Energy. 2010;1(3):117–124.
[87] Pahwa A, DeLoach SA, Natarajan B, et al. Goal-based holonic multiagent
system for operation of power distribution systems. IEEE Transactions on
Smart Grid. 2015;6(5):2510–2518.
[88] Klaimi J, Merghem-Boulahia L, Rahim-Amoud R, et al. An energy manage-
ment approach for smart-grids using intelligent storage systems. In: Digital
Information and Communication Technology and its Applications (DICTAP),
2015 Fifth International Conference on. IEEE; 2015. p. 26–31.
[89] Vaccaro A, Loia V, Formato G, et al. A self-organizing architecture for decen-
tralized smart microgrids synchronization, control, and monitoring. IEEE
Transactions on Industrial Informatics. 2015;11(1):289–298.
[90] Hu J, Saleem A, You S, et al. A multi-agent system for distribution grid
congestion management with electric vehicles. Engineering Applications of
Artificial Intelligence. 2015;38:45–58.
The term Internet refers to the global network infrastructure, connecting more than
15 billion devices around the world. At the time of this writing, it supports a
massive distribution of information, which reaches around 1.5 ZB per year. These
estimates, however, are continuously growing: by 2021, the annual global traffic will
grow to 3.3 ZB per year [1,2]. Therefore, the Internet appears as a complex system
that is continuously evolving over time.
Knowledge of the Internet topology has always been considered important
for researchers, industries, and service providers. In fact, it is extremely
useful to evaluate network resilience [3], analyze topological properties and their
evolution [4], predict and improve the performance of communication protocols and
the effectiveness of routing algorithms [5], solve specific problems involving a
particular topological structure (e.g., how to distribute storage across routers in order to
obtain an optimal caching allocation) [6], and so on. Thus, analytical models showing
Internet characteristics (like average shortest path and shortest path distribution) and
simulation tools able to reproduce Internet-like topologies are key instruments for
most of the research activities in this context.
Nevertheless, the complexity and the dynamism of the overall Internet architecture
make the study of the Internet topology one of the hottest and hardest research
topics [7]. First of all, it is important to have a clear definition of topology.
According to the Open System Interconnection (OSI) model, a topology represents a
simplified way to depict the interconnections among communication entities [8]. However,
what the term communication entities refers to is not completely clear a priori. The scientific
literature, for instance, considers three main levels of granularity, namely interface
level, router level, and Autonomous System (AS) level [9–11]. Thus, a network topology
may expose different information according to the level of granularity taken into
account. Second, unlike other large networks, such as the public switched telephone
network, the Internet did not grow according to a topological design developed by
some central authority or administration [12]. Hence, its huge dimension, rapid change,
and lack of publicly available information inevitably make it hard to capture a complete
snapshot of the overall network infrastructure [13].
1 Department of Electrical and Information Engineering (DEI), Politecnico di Bari, Bari, Italy
2 CNIT, Consorzio Nazionale Interuniversitario per le Telecomunicazioni, Italy
To solve this issue, several methodologies were introduced to infer topology
information, based on both active and passive approaches. These mechanisms must
be properly configured and adapted when applied to the interface, router, and AS levels
of granularity. At the same time, however, it is also important to consider the
limitations they introduce, so as to better estimate the accuracy of the
retrieved data [10,14–17].
Starting from inferred data, it is possible to formulate mathematical models able
to capture statistical characteristics of the Internet. Graph theory is widely used to
reach this goal [18]. In fact, many models have already been developed, which refer to
regular, random, small-world, and the more recent power-law and scale-free graphs
[11,19–22]. Among them, however, the scale-free graph is widely accepted as the
model that best represents Internet-like topologies. A number of network simulators
already implement these models and are able to reproduce Internet-like topologies
that can be used in a variety of research activities.
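To make the notion of a scale-free generator concrete, the following is a minimal sketch of one common construction, the Barabási–Albert preferential-attachment process. It is an illustrative assumption and not the procedure of any specific simulator; the function name `barabasi_albert` and the parameters `n`, `m`, and `seed` are hypothetical.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Return the edge set of an n-node graph grown by preferential attachment.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = {(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)}
    # 'stubs' lists every node once per incident edge, so a uniform draw
    # from it selects a node with probability proportional to its degree.
    stubs = [x for e in sorted(edges) for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.add((t, new))
            stubs.extend((t, new))
    return edges

edges = barabasi_albert(200, 2)
```

Under these assumptions the graph has m(m + 1)/2 initial edges plus m edges per added node, and early nodes tend to accumulate many more links than late ones, which is the source of the heavy-tailed degree distribution.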
Another important step forward in the study of the Internet topology is the modeling
of the shortest path connecting any pair of peers attached to the communication system.
The scientific literature already provides models for both the average shortest path and
the distribution of the shortest path length [23–26].
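The two quantities named above, the average shortest path and the distribution of the shortest path length, can be measured empirically on any unweighted graph with a breadth-first search from every node. The sketch below assumes a small hand-made topology; the adjacency list `adj` and the function names are hypothetical.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_distribution(adj):
    """Empirical distribution of shortest path lengths over all node pairs."""
    counts = Counter()
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if s < t:                  # count each unordered pair once
                counts[d] += 1
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}

# Hypothetical 5-node topology: a hub (node 0) with a short tail 3-4.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
dist = path_length_distribution(adj)
avg = sum(d * p for d, p in dist.items())
```

In this toy example four of the ten node pairs are one hop apart, four are two hops apart, and two are three hops apart, giving an average shortest path of 1.8 hops. Analytical models aim to predict such a histogram without enumerating all pairs.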
Based on these premises, the present book chapter aims at providing an overview
of Internet-like topologies, covering a broad set of aspects, including the level of
granularity, methodologies useful to retrieve topology information, simulation tools,
and analytical models. Then, the accuracy of reference models for the distribution
of the shortest path length (i.e., Gamma, Lognormal, and Weibull distributions) is
evaluated through a massive simulation campaign, carried out by using the Boston
university Representative Internet Topology gEnerator (BRITE) tool [27]. On the one
hand, the obtained results demonstrate that the available models are able to capture the
average value and the distribution of the shortest path over a very broad set of
conditions. On the other hand, they also highlight an unresolved issue: the models require a
case-by-case tuning of their parameters.
The rest of this chapter is organized as follows. Section 6.1 presents the
main levels of granularity of the Internet topology and reviews active and passive
methodologies useful to collect data. Section 6.2 discusses Internet topology models
based on the graph theory and provides an overview of topology generator tools.
Section 6.3 investigates, through computer simulations, the accuracy of analytical
models developed for scale-free networks and identifies useful applications of the
shortest path distribution. Finally, Section 6.4 draws the conclusions.
The scientific literature generally describes the Internet topology through different
levels of granularity. In all cases, however, graph theory is widely adopted
Shortest path models for scale-free network topologies 147
as a key instrument that captures well the required details of the overall network
architecture [10,28]. Such a consideration is also valid for Internet-like topologies,
such as the restricted portion of the Internet handled by a single Internet Service Provider
(ISP). In fact, at the time of this writing, it is common to represent the Internet topology
as an undirected graph, G. More specifically, this graph is characterized by
the ordered pair G = (N, E), where N refers to a set of vertices (also called nodes or
points), connected by a set E of edges (also called arcs or lines) [29–31]. Without loss
of generality, it is possible to assume that devices belonging to the global network
infrastructure establish a bidirectional relationship. Therefore, the graph is considered
undirected because edges do not have any orientation.
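The pair G = (N, E) maps directly onto two sets. In the minimal sketch below, each edge is stored as an unordered pair so that a link reported in either direction counts once; the node names and the helper `make_undirected_graph` are hypothetical.

```python
def make_undirected_graph(nodes, links):
    """Build G = (N, E), storing each edge as an unordered pair."""
    N = set(nodes)
    E = {frozenset(link) for link in links}
    return N, E

N, E = make_undirected_graph(
    ["r1", "r2", "r3"],
    # The r1-r2 link is listed in both directions on purpose.
    [("r1", "r2"), ("r2", "r1"), ("r2", "r3")],
)
```

Here E contains two edges rather than three, because `("r1", "r2")` and `("r2", "r1")` describe the same bidirectional relationship.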
The roles covered by both nodes and edges belonging to an Internet-like topology
strictly depend on the level of granularity selected to model the network itself.
Conventional approaches include interface level, router level, and Autonomous System
(AS) level [9–11] (see the preliminary overview depicted in Figure 6.1). It
is important to note that details about network topology, routing policies, peering
relationships, and resilience are commercially sensitive, could expose potential
vulnerabilities to attackers, and reveal resilience planning. Accordingly, they are not
publicly available. At the same time, the network is dynamic and constantly evolving
because of failures, maintenance, and upgrades. For these reasons, information
regarding both the global structure and the local properties of the Internet cannot be retrieved
in an easy way. Nevertheless, dedicated approaches can be used to partially solve this
problem. They can be divided into two kinds of methodologies, namely, passive and
active [32]. The passive method learns the presence of nodes and their interactions by
simply collecting the information flowing over a wire and generated by other communication
protocols (which work for different purposes). The active method, instead,
sends dedicated packets (i.e., probe messages) to target devices in the
network and collects the related responses.
The following paragraphs describe the three levels of granularity introduced
above. At the same time, they also present the most important passive and active
methodologies used to retrieve and study Internet or Internet-like network topologies.
For each single strategy, pros and cons are evaluated too (see the summary reported
in Table 6.1). Finally, they also provide an overview regarding geographic network
topologies.
Figure 6.1 Internet topology at three main levels of granularity: (a) interface level,
(b) router level, and (c) AS level
topology maps a given network interface and edges refer to direct connections between
nodes [10]. Routers with multiple configured network interfaces are mapped to
multiple logical nodes. Thus, the resulting interface-level topology comprises a
number of nodes equal to the number of active network interfaces with an IP address
and a number of edges equal to the number of direct connections established at
the network layer.
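The counting rule for the interface-level topology can be checked on a toy configuration. The sketch below assumes three routers with two addressed interfaces each; the router names, the private addresses, and the variable names are hypothetical.

```python
# Hypothetical configuration: router name -> IP addresses of its interfaces.
routers = {
    "R1": ["10.0.0.1", "10.0.1.1"],
    "R2": ["10.0.0.2", "10.0.2.1"],
    "R3": ["10.0.1.2", "10.0.2.2"],
}
# Direct network-layer connections between interface addresses.
links = [
    ("10.0.0.1", "10.0.0.2"),   # R1 - R2
    ("10.0.1.1", "10.0.1.2"),   # R1 - R3
    ("10.0.2.1", "10.0.2.2"),   # R2 - R3
]

# Every addressed interface becomes its own logical node.
interface_nodes = {ip for ips in routers.values() for ip in ips}
interface_edges = {frozenset(link) for link in links}
```

Three physical routers therefore yield six interface-level nodes and three edges, which illustrates why one device appears as several logical nodes at this level of granularity.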
Table 6.1 Methodologies used to retrieve and study Internet-like topologies and
their related issues
IP datagram can pass, at most, through x consecutive routers before being discarded.
This is because every intermediate router decrements the TTL value by 1 unit before
triggering the forwarding process. Therefore, as soon as the TTL value reaches
0, the corresponding IP datagram is no longer forwarded toward the destination
interface, and an ICMP Time Exceeded message is sent back to the source node for
notification purposes.
Starting from these premises, traceroute works as follows. At the beginning, the
device that runs the tool issues a group of ICMP messages, whose TTL value is set
to 1. Note that more than one message is sent at each step because the procedure
intends to collect statistical information related to communication delays (such as
minimum, maximum, and average value of the round trip time, generally expressed
in milliseconds). These initial packets reach only the node directly connected to the
sender, before being discarded. The ICMP Time Exceeded messages generated by
this node are used by the sender to infer details about the first network interface of
the forwarding path. Then, a new set of ICMP messages is sent with a TTL value set
to 2. In line with the process described above, the sender can now learn information
about the second hop of the forwarding path toward the destination. This process is
repeated until the destination node is reached. At the end, the sender collects some
details of the network topology, on a hop-by-hop basis [11].
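The hop-by-hop procedure described above can be reproduced with a small simulation. This is a simplified sketch that sends a single probe per TTL value instead of a group and does not measure round trip times; the path, the interface names, and the functions `forward` and `traceroute` are hypothetical.

```python
def forward(path, ttl):
    """Send one probe along `path` with the given initial TTL.

    `path` lists the interfaces after the sender, ending with the destination.
    Returns ("time_exceeded", hop) if the TTL reaches 0 at an intermediate
    router, or ("reached", destination) if the probe arrives.
    """
    for hop in path[:-1]:
        ttl -= 1                       # every router decrements the TTL by 1
        if ttl == 0:
            return ("time_exceeded", hop)
    return ("reached", path[-1])

def traceroute(path, max_ttl=30):
    """Probe with TTL = 1, 2, ... and record which node answers each time."""
    discovered = []
    for ttl in range(1, max_ttl + 1):
        kind, node = forward(path, ttl)
        discovered.append(node)
        if kind == "reached":
            break
    return discovered

path = ["if-A", "if-B", "if-C", "dest"]   # hypothetical forwarding path
hops = traceroute(path)
```

With this path, the probe with a TTL of 1 is answered by `if-A`, the probe with a TTL of 2 by `if-B`, and so on, so `hops` reconstructs the forwarding path one interface at a time.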
It is important to remark that two main limitations affect traceroute [14]. First, if
some routers do not implement ICMP, the acquired forwarding path will not include
some of the intermediate network interfaces. Second, in the event that an intermediate
router implements a load-balancing strategy, traceroute will generate results referring
to multiple paths through which packets are sent. Thus, the acquired forwarding path
will include additional network interfaces and the learned network topology may
not exactly capture reality.
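The second limitation can be illustrated with a toy topology in which one router splits traffic over two branches of different length. The sketch assumes, purely for illustration, that the balancer alternates branches from one probe to the next; all node names and the helper `probe` are hypothetical.

```python
def probe(path, ttl):
    """Return the node that answers a probe with the given TTL on `path`."""
    return path[min(ttl, len(path)) - 1]

# Hypothetical topology: router A balances traffic over two branches.
upper = ["A", "B", "D", "dest"]
lower = ["A", "C", "E", "D", "dest"]

# Assumption: odd TTLs travel on the lower branch, even TTLs on the upper one.
inferred = []
ttl = 1
while True:
    path = upper if ttl % 2 == 0 else lower
    node = probe(path, ttl)
    inferred.append(node)
    if node == "dest":
        break
    ttl += 1

true_links = {frozenset(p) for path in (upper, lower) for p in zip(path, path[1:])}
inferred_links = {frozenset(p) for p in zip(inferred, inferred[1:])}
false_links = inferred_links - true_links
```

Under these assumptions the inferred path is A, B, E, dest, which mixes hops from both branches and contains links, such as B to E, that do not exist in the real topology.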
This technique can suffer from incomplete data because of relationship policies and
routing preferences that make packets observe only some paths and miss
others.
belong to two opposite paths, is used. After having identified the subnets, aliases are
inferred by analyzing path segments [11].
In conclusion, alias resolution techniques are generally considered
accurate. Sometimes, however, retrieved data could be incomplete. The reason is that
traceroute can fail when nodes are disconnected, turned off, or configured not to
respond to probe packets [17].
6.1.3 AS level
Before introducing the last level of granularity useful to describe Internet-like
topologies, it is important to remark that the global network infrastructure appears as
a connection of several autonomous systems (ASs). Each AS is made up of a group
of routers deployed by one or more network operators, on behalf of a single
administrative entity [38]. For instance, an AS can refer to the network of a large company,
a university, a network service provider, and so on. Typically, individual users, small
enterprise networks, and ASs located at the edge of the Internet join the global
network through other ASs, namely, Internet service providers (ISPs). In turn, ISPs may
obtain the same service from one or more upstream ISPs. Each AS is uniquely identified
by an AS number (ASN). Originally, it was defined as a 16-bit integer (admitting a
maximum of 65,536 assignments). Later, a 32-bit ASN was introduced in order to uniquely
identify a higher number of ASs [39]. In addition, ASs are divided into two categories:
transit and stub. A transit AS is part of the core network and usually carries traffic
between isolated domains, managed by different administrative entities. A stub AS,
instead, provides Internet connectivity to end users. Thus, on one side, it is
connected to end users. On the other side, it is connected to the rest of the Internet
through one or more transit ASs. Sometimes, the administrator of a given AS can
Shortest path models for scale-free network topologies 153
change its own traffic relationship with other providers, thus modifying the overall
network architecture, so that the resulting topology evolves constantly.
The AS level of granularity, also known as inter-domain description, depicts the
Internet architecture as a group of interconnected ASs. Accordingly, it leads to an
undirected graph in which each node identifies one AS and each edge represents the
logical peering relationship between two adjacent ASs (see Figure 6.1(c)). Despite its
coarse level of detail, the AS level of granularity is frequently leveraged to study,
control, optimize, and implement inter-domain routing, mechanisms for the
provisioning of the quality of service, and customer-provider and peering relationships
between ISPs.
Also in this case, both passive and active mechanisms can be used to infer infor-
mation related to the AS level topology. The first mechanism basically collects data
generated by the Border Gateway Protocol (BGP) [40] or provided by the Inter-
net Routing Registry [41]. The second one investigates forwarding paths through
traceroute.
There is also the Newman–Watts variant of the Watts–Strogatz network, which does not
remove any edge from the underlying lattice during the building process.
In this model, edges are only added between pairs of nodes, in the same way as in a
Watts–Strogatz network [52].
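The difference with respect to rewiring can be sketched as follows. This is a minimal illustration, not the reference construction of [52]: it assumes a ring lattice with an even number k of neighbors per node and one candidate shortcut per lattice edge, added with probability p. The function names are hypothetical.

```python
import random

def ring_lattice(n, k):
    """Edges of a ring where each node links to its k nearest neighbors (k even)."""
    edges = set()
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            edges.add((min(u, v), max(u, v)))
    return edges

def newman_watts(n, k, p, seed=None):
    """Ring lattice plus random shortcuts; no lattice edge is ever removed."""
    rng = random.Random(seed)
    lattice = ring_lattice(n, k)
    edges = set(lattice)
    for u, _ in sorted(lattice):
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u:
                # Adding to a set silently ignores an already existing edge.
                edges.add((min(u, w), max(u, w)))
    return lattice, edges

lattice, nw_edges = newman_watts(n=20, k=4, p=0.3, seed=1)
print(len(lattice), len(nw_edges))
```

Because the lattice is kept intact, the resulting graph stays connected for any value of p, which is not guaranteed when edges are rewired.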
Moreover, [21] studied the neighborhood size within a given distance. Also in
this case, the relation follows a power law, but it was considered an approximation
because of the small number of samples. In particular, let P(h) be the total number
of pairs of nodes within h hops. P(h) is proportional to the number of hops to the
power of a constant H, according to the relation $P(h) \propto c\,h^{H}$ when $h \ll \delta$,
where δ is the diameter of the network and c = N + 2E.
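The quantity P(h) can be computed with a breadth-first search from every node. The sketch below assumes the counting convention of [21], in which self-pairs are included and every other pair is counted twice, so that P(0) = N and P(1) = N + 2E = c. The function name hop_plot and the toy graph are introduced here for illustration only.

```python
from collections import deque

def hop_plot(adj):
    """Return P where P[h] = number of ordered node pairs within h hops.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    counts = {}  # exact distance -> number of ordered pairs
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            counts[d] = counts.get(d, 0) + 1
    total, cumulative = 0, []
    for h in range(max(counts) + 1):
        total += counts.get(h, 0)
        cumulative.append(total)
    return cumulative

# Path graph 0-1-2-3: N = 4 nodes, E = 3 edges.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
P = hop_plot(adj)
print(P)  # [4, 10, 14, 16]
```

The exponent H would then be estimated from the slope of log P(h) versus log h, restricted to values of h well below the diameter.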
After [21], several researchers supported these findings and tried to further under-
stand the origin of the power law [53,54]. A very important contribution was provided
by the Barabási–Albert model [22]. At the same time, the literature also proposes
opposing theories. For instance, Chen et al. [55] argued that an AS level topology
does not include all the Internet connectivity. In fact, at least 20%–50% of the physi-
cal links are missing. Therefore, the node degree distribution does not follow a strict
power-law relationship.
● Efficiency: the tool should be able to generate large topologies by preserving the
required statistical characteristics and by using a reasonable CPU and memory
consumption.
● User friendliness: the usage of the tool should be easy to learn.
low-level square. BRITE also assigns a bandwidth value to each link according to
one of four distributions:
● Constant: All links have the same value.
● Uniform: Bandwidth values are assigned according to a uniform distribution
between two input values.
● Exponential: Bandwidth values are assigned according to an exponential distri-
bution with mean equal to an input value.
● Heavy-tailed: Bandwidth values are assigned according to a heavy-tailed distri-
bution (Pareto with shape 1.2) with minimum and maximum values equal to two
input values.
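The four options listed above can be sketched with the Python standard library. This is not BRITE code: the function assign_bandwidth is hypothetical, the single input value of the constant and exponential cases is assumed to be bw_min, and the heavy-tailed case is assumed to be truncated by redrawing values above bw_max.

```python
import random

def assign_bandwidth(dist, bw_min, bw_max, rng):
    """Draw one link bandwidth according to the selected distribution."""
    if dist == "constant":
        return bw_min                          # every link gets the same value
    if dist == "uniform":
        return rng.uniform(bw_min, bw_max)
    if dist == "exponential":
        return rng.expovariate(1.0 / bw_min)   # mean equal to bw_min (assumed)
    if dist == "heavy_tailed":
        # Pareto with shape 1.2, scaled to start at bw_min; values above
        # bw_max are redrawn (assumed truncation rule).
        while True:
            value = bw_min * rng.paretovariate(1.2)
            if value <= bw_max:
                return value
    raise ValueError(f"unknown distribution: {dist}")

rng = random.Random(7)
samples = {
    name: [assign_bandwidth(name, 10.0, 100.0, rng) for _ in range(1000)]
    for name in ("constant", "uniform", "exponential", "heavy_tailed")
}
for name, values in samples.items():
    print(name, round(min(values), 2), round(max(values), 2))
```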
path length and graph diameter. For scale-free networks, the average shortest path
length, $\bar{d}$, is approximately equal to
$$\bar{d} \approx \log N, \tag{6.4}$$
where N represents the number of nodes in the topology. In particular, this formula
refers to scale-free networks that are built by connecting each new vertex to m existing
nodes with m = 1. Otherwise, if m > 1, the average shortest path, $\bar{d}$, is asymptotically
equal to
$$\bar{d} \sim \frac{\log N}{\log \log N} \tag{6.5}$$
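These scaling relations can be explored numerically with a small preferential-attachment generator and an all-pairs breadth-first search. The sketch below is an illustration under assumptions: the seed graph is a complete graph on m + 1 nodes, the natural logarithm is used, and the helper names are hypothetical. Since (6.4) and (6.5) describe scaling with N, the simulated values are expected to follow the trend and not to match the expressions exactly.

```python
import math
import random
from collections import deque

def barabasi_albert(n, m, seed=None):
    """Attach each new node to m distinct existing nodes, chosen with
    probability proportional to their degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    targets = []                       # node i appears deg(i) times
    for u in range(m + 1):             # seed: complete graph on m + 1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            targets += [u, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:
            adj[new].add(old)
            adj[old].add(new)
            targets += [new, old]
    return adj

def average_shortest_path(adj):
    """Mean hop distance over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

n = 500
for m in (1, 2):
    graph = barabasi_albert(n, m, seed=0)
    print(m, round(average_shortest_path(graph), 2),
          round(math.log(n), 2),
          round(math.log(n) / math.log(math.log(n)), 2))
```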
$$f(x; \theta, \eta) = \frac{1}{\theta^{\eta}\,\Gamma(\eta)}\, x^{\eta-1} e^{-x/\theta} \tag{6.6}$$
where Γ(·) is the gamma function and θ > 0 and η > 0 are scale and shape parameters,
respectively. This model indicates that the distance distribution of all nodes consists
of two regimes. The former is characterized by a rapid growth. The latter refers to an
exponential decay.
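The two regimes can be seen by evaluating the Gamma density directly. The parameter values below are hypothetical and chosen only for illustration; for η > 1 the density peaks at (η − 1)θ, which separates the growth from the decay.

```python
import math

def gamma_pdf(x, theta, eta):
    """Gamma density with scale theta > 0 and shape eta > 0, for x > 0."""
    return (x ** (eta - 1)) * math.exp(-x / theta) / (theta ** eta * math.gamma(eta))

theta, eta = 1.5, 4.0           # hypothetical parameter values
mode = (eta - 1) * theta        # location of the peak when eta > 1
print("mode:", mode)
for h in range(1, 13):
    print(h, round(gamma_pdf(h, theta, eta), 4))
```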
$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\log x-\mu)^{2}/2\sigma^{2}}, \tag{6.8}$$
where $-\infty < \mu < +\infty$ is the mean of the logarithm of the variable and σ > 0 is
the standard deviation of the logarithm.
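Since the two parameters describe the logarithm of the variable, a simple estimate takes the mean and the standard deviation of the logged samples. The sketch below assumes this maximum-likelihood style estimate; the hop counts are made-up values and the function names are hypothetical.

```python
import math
from statistics import fmean, pstdev

def lognormal_pdf(x, mu, sigma):
    """Lognormal density for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def fit_lognormal(samples):
    """Estimate mu and sigma as mean and standard deviation of log(samples)."""
    logs = [math.log(s) for s in samples]
    return fmean(logs), pstdev(logs)

hop_counts = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8]   # hypothetical path lengths
mu, sigma = fit_lognormal(hop_counts)
for h in range(1, 11):
    print(h, round(lognormal_pdf(h, mu, sigma), 4))
```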
[Figure 6.2: six panels comparing the Weibull, Gamma, and lognormal models with
simulation results as a function of the number of hops h (from 0 to 30). Panels (a),
(c), and (e) show the pdf; panels (b), (d), and (f) show the cdf.]
Figure 6.2 Probability density function and cumulative distribution function of the
shortest path length, obtained for m = 1 and different values of N:
(a) N = 5,000, (b) N = 5,000, (c) N = 10,000, (d) N = 10,000,
(e) N = 20,000, and (f) N = 20,000
[Figure 6.3: six panels comparing the Weibull, Gamma, and lognormal models with
simulation results as a function of the number of hops h (from 0 to 30). Panels (a),
(c), and (e) show the pdf; panels (b), (d), and (f) show the cdf.]
Figure 6.3 Probability density function and cumulative distribution function of the
shortest path length, obtained for m = 2 and different values of N:
(a) N = 5,000, (b) N = 5,000, (c) N = 10,000, (d) N = 10,000,
(e) N = 20,000, and (f) N = 20,000
if the source IP addresses have the same hop-count value as that of an attacker.
Therefore, it is important to check if the hop-count distributions are not clustered
around a single value at various locations of the network. The hop count will be
more effective if the standard deviation is high.
● Epidemic spreading: Shortest path distribution can be exploited in the study of
epidemic spreading models [26,73]. In particular, path-length statistics are closely
[Figure 6.4: six panels comparing the Weibull, Gamma, and lognormal models with
simulation results as a function of the number of hops h (from 0 to 30). Panels (a),
(c), and (e) show the pdf; panels (b), (d), and (f) show the cdf.]
Figure 6.4 Probability density function and cumulative distribution function of the
shortest path length, obtained for m = 3 and different values of N:
(a) N = 5,000, (b) N = 5,000, (c) N = 10,000, (d) N = 10,000,
(e) N = 20,000, and (f) N = 20,000
Table 6.2 Theoretical and simulated average shortest path and average diameter
obtained through computer simulations for different values of m
Table 6.3 Goodness of fit obtained through the Kolmogorov–Smirnov test for the
three shortest path distribution models, for different values of N and m
6.4 Conclusion
The chapter focused on Internet topology models. First, it investigated topology
representations provided at different levels of granularity. Second, it reviewed topology
models based on graph theory and the related topology generator tools. Third, it
studied analytical models describing the average shortest path and the distribution of
the shortest path length for scale-free networks. Starting from these concepts, the
accuracy of reference models for the shortest path was studied and compared through
computer simulations. The obtained results demonstrate that the available models are
able to capture the average value and the distribution of the shortest path length over
a very broad set of conditions. More specifically, when m = 1, the Gamma distribution
shows the lowest distance for all N values. When m = 2 and m = 3, instead, the Weibull
distribution provides the lowest error. The lognormal distribution always registers the
worst behavior. However, all the evaluated models require a case-by-case tuning of
their parameters, which represents an important limitation to be addressed in future
research activities.
Acknowledgment
This work was partially funded by PON projects funded by the Italian MIUR,
including Pico&Pro (code: ARS01_01061), AGREED (code: ARS01_00254), FURTHER
(code: ARS01_01283), and RAFAEL (code: ARS01_00305), and by the research
project E-SHELF (code: OSW3NO1) funded by the Apulia Region – Italy.
References
[1] Cisco. The Zettabyte Era: Trends and Analysis. San Jose, CA, USA: Cisco
Systems, Inc.; 2016.
[2] Huawei. Global Connectivity Index 2016. Huawei Technologies Co., Ltd.;
2016.
[3] Sterbenz JP, Çetinkaya EK, Hameed MA, et al. Evaluation of network
resilience, survivability, and disruption tolerance: analysis, topology gen-
eration, simulation, and experimentation. Telecommunication Systems.
2013;52(2):705–736.
[4] Gregori E, Improta A, Lenzini L, et al. Discovering the geographic
properties of the Internet AS-level topology. Networking Science. 2013;
3(1):34–42.
[5] Sun L, Song F, Yang D, et al. DHR-CCN, distributed hierarchical routing for
content centric network. Journal of Internet Services and Information Security.
2013;3(1/2):71–82.
[6] Wang Y, Li Z, Tyson G, et al. Optimal cache allocation for content-centric
networking. In: Proceedings of International Conference on Network Protocols
(ICNP). IEEE; 2013. p. 1–10.
[7] Oliveira R, Pei D, Willinger W, et al. The (in)completeness of the observed
internet AS-level structure. IEEE/ACM Transactions on Networking (ToN).
2010;18(1):109–122.
[8] Roughan M, Willinger W, Maennel O, et al. 10 Lessons from 10 years of
measuring and modeling the internet’s autonomous systems. IEEE Journal on
Selected Areas in Communications. 2011;29(9):1810–1821.
[9] Donnet B, Friedman T. Internet topology discovery: a survey. IEEE Commu-
nications Surveys & Tutorials. 2007;9(4):56–69.
[10] Motamedi R, Rejaie R, Willinger W. A survey of techniques for Internet
topology discovery. IEEE Communications Surveys & Tutorials. 2015;17(2):
1044–1065.
[11] Haddadi H, Rio M, Iannaccone G, et al. Network topologies: inference,
modeling, and generation. IEEE Communications Surveys & Tutorials.
2008;10(2):48–69.
[12] Zegura EW, Calvert KL, Donahoo MJ. A quantitative comparison of graph-
based models for internet topology. IEEE/ACM Transactions on Networking.
1997;5(6):770–783.
[13] Floyd S, Paxson V. Difficulties in simulating the Internet. IEEE/ACM
Transactions on Networking (ToN). 2001;9(4):392–403.
[14] Augustin B, Cuvellier X, Orgogozo B, et al. Avoiding traceroute anomalies
with Paris traceroute. In: Proceedings of the 6th ACM SIGCOMM Con-
ference on Internet Measurement. IMC ’06. New York, NY: ACM; 2006.
p. 153–158.
[15] Roughan M, Willinger W, Maennel O, et al. 10 lessons from 10 years of measur-
ing and modeling the internet’s autonomous systems. IEEE Journal on Selected
Areas in Communications. 2011;29(9):1810–1821.
[16] Pansiot JJ, Mérindol P, Donnet B, et al. Extracting intra-domain topology from
MRINFO Probing. In: PAM. Springer; 2010. p. 81–90.
[17] Keys K. Internet-scale IP alias resolution techniques. SIGCOMM Computer
Communication Review. 2010;40(1):50–55.
[18] Pastor-Satorras R, Vespignani A. Evolution and structure of the Internet: a
statistical physics approach. Cambridge, UK: Cambridge University Press;
2007.
[19] Erdős P, Rényi A. On the evolution of random graphs. Publication of the
Mathematical Institute of the Hungarian Academy of Sciences. 1960;5(1):17–60.
[20] Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature.
1998;393(6684):440.
[21] Faloutsos M, Faloutsos P, Faloutsos C. On power-law relationships of the inter-
net topology. In: Proceedings of the Conference on Applications, Technologies,
Architectures, and Protocols for Computer Communication. SIGCOMM ’99.
New York, NY: ACM; 1999. p. 251–262.
[22] Barabási AL. Network Science. Cambridge, UK: Cambridge University Press;
2016.
[23] Bauckhage C, Kersting K, Rastegarpanah B. The Weibull as a model of shortest
path distributions in random networks. In: Proceedings of Int. Workshop on
Mining and Learning with Graphs, Chicago, IL; 2013.
[24] Vazquez A. Polynomial growth in branching processes with diverging repro-
ductive number. Physical Review Letters. 2006;96(3):038702.
[25] Kalisky T, Cohen R, Mokryn O, et al. Tomography of scale-free networks and
shortest path trees. Physical Review E. 2006;74(6):066108.
[26] Bauckhage C, Kersting K, Hadiji F. Parameterizing the distance distribution
of undirected networks. In: UAI; 2015. p. 121–130.
[27] Medina A, Lakhina A, Matta I, et al. BRITE: an approach to universal topology
generation. In: Proceedings of Ninth International Symposium on Modeling,
Analysis and Simulation of Computer and Telecommunication Systems. IEEE;
2001. p. 346–353.
[28] Baumann A, Fabian B. How robust is the internet?—Insights from graph anal-
ysis. In: Proceedings of International Conference on Risks and Security of
Internet and Systems. Springer; 2014. p. 247–254.
[29] Bollobás B. Modern Graph Theory. vol. 184. Springer Science & Business
Media; 2013.
[30] Calvert KL, Doar MB, Zegura EW. Modeling internet topology.
IEEE Communications Magazine. 1997;35(6):160–163.
[31] Zegura EW, Calvert KL, Donahoo MJ. A quantitative comparison of graph-
based models for Internet topology. IEEE/ACM Transactions on Networking
(TON). 1997;5(6):770–783.
[32] John W, Tafvelin S, Olovsson T. Passive internet measurement: overview
and guidelines based on experiences. Computer Communications.
2010;33(5):533–550.
[33] Kurose JF, Ross KW. Computer Networking: A Top-Down Approach. vol. 5.
Boston, MA, USA: Addison-Wesley Reading; 2010.
[71] Jin C, Wang H, Shin KG. Hop-count filtering: an effective defense against
spoofed DDoS traffic. In: Proceedings of the 10th ACM Conference on Com-
puter and Communications Security. CCS ’03. New York, NY: ACM; 2003.
p. 30–41.
[72] Wang H, Jin C, Shin KG. Defense against spoofed IP traffic using hop-
count filtering. IEEE/ACM Transactions on Networking (ToN). 2007;15(1):
40–53.
[73] Iannelli F, Koher A, Brockmann D, et al. Effective distances for epidemics
spreading on complex networks. Physical Review E. 2017;95(1):012313.
[74] Hunkeler U, Truong HL, Stanford-Clark A. MQTT-S A publish/subscribe
protocol for wireless sensor networks. In: Proceedings of Int. Conf.
on Communication Systems Software and Middleware. IEEE; 2008.
p. 791–798.
[75] Xylomenos G, Ververidis CN, Siris VA, et al. A survey of information-
centric networking research. IEEE Communications Surveys & Tutorials.
2014;16(2):1024–1049.
[76] Piro G, Amadeo M, Boggia G, et al. Gazing into the crystal ball: when
the Future Internet meets the Mobile Clouds. IEEE Transactions on Cloud
Computing. 2016.
[77] Sanguankotchakorn T, Jaiton P. Effect of triangular routing in mixed IPv4/IPv6
networks. In: Networking, 2008. ICN 2008. Seventh International Conference
on. IEEE; 2008. p. 357–362.
Part III
Case studies and more
Chapter 7
Accurate modeling of VoIP traffic in modern
communication
Homero Toral-Cruz¹, Al-Sakib Khan Pathan²,
and Julio C. Ramírez Pacheco³
7.1 Introduction
In recent years, voice has become one of the most attractive and important
services in telecommunication networks. It can be transmitted over both circuit-switched
and packet-switched networks, whose most common examples are the public switched
telephone network (PSTN) and the Internet, respectively [1]. Compared to the
traditional resource-dedicated PSTN, the Internet is resource-shared. Therefore, the
conditions in the PSTN are totally different from those in the Internet. Voice
transmission using Internet protocol (IP) technology, also called voice over IP (VoIP)
[2], offers several advantages: reduced communication cost, the use of a joint IP
infrastructure, the use in multimedia applications, etc. It is also an interesting fact
that sending wireless phone calls over IP networks is considerably less expensive than
sending them over cellular voice networks. However, such communications must ensure
good performance and quality of voice transmission. Faster wireless networking
technologies and more powerful mobile telephones promise to help solve these problems
[3,4]. With the currently available technologies, VoIP over mobile networks has not yet
become very popular. However, the advantage of such technology is that, keeping the
wireless network as it is and given a higher quality of service (QoS) facility, the
existing infrastructure of IP networks could be used for serving the users. If a company
owns the Wi-Fi connection, such VoIP communication could be provided at a very low
cost. A Wi-Fi or WiMAX network or, if neither of these is available, cellular
technologies could be used to pass the voice traffic to the Internet. This is known as
mobile VoIP (mVoIP). mVoIP via cellular services could achieve QoS by prioritizing
voice packets over those used for data and other traffic types [2].
¹ NETCOM Laboratory, Department of Sciences and Engineering, University of Quintana Roo, Mexico
² Department of Computer Science and Engineering, Southeast University, Bangladesh
³ Department of Basic Sciences and Engineering, Universidad Del Caribe, Mexico
The emergence of new real-time services, such as VoIP and mVoIP, has driven
the growth and evolution of the Internet as a modern communication network,
characterized by its complex nature. This complexity results mainly from the
convergence of information and media transmission (voice, video, and data)
over the same communication channel. As a very high (and increasing) number of
nodes (i.e., devices) are connected to the network, and these are added
in a random and decentralized manner, the network is scale-free,
meaning that the node degree distribution follows a power law [5]. With
the increase in the number of nodes and connections, the traffic load in the network
increases and congestion occurs accordingly.
Congestion is a cause of QoS impairment, which consists of delay issues
(i.e., delay and jitter) and packet loss. For real-time communications, such as VoIP
and mVoIP, jitter and packet loss can have a high impact on the QoS. To achieve a
satisfactory level of voice quality, VoIP networks must be designed by using correct
traffic models [2]. Traffic modeling in classical and modern communication networks
mainly comprises the following steps [6]: (1) Selection of one or more models that
may provide a good description of the statistical properties of packet traffic. In order
to select an adequate traffic model, it is necessary to study the traffic characteristics.
The main characteristics of a traffic source are its average data rate, burstiness, and
correlation. The average data rate gives an indication of the expected traffic volume
for a given period of time. Burstiness describes the tendency of traffic to occur in
clusters. Data burstiness is manifested by the correlation function, which describes the
relationship between packet arrivals at different times and is an important factor in
packet losses due to buffer and bandwidth limitations. (2) Estimation of parameters
for the selected model. Parameter estimation is based on a set of statistics (e.g.,
mean, variance, density function or autocovariance function (ACV), and multifractal
characteristics) that are measured or calculated from observed data traces. The set
of statistics used in the inference process depends on the impact they may have on
the main QoS parameters of interest. (3) Statistical testing for the selection of one of
the considered models and analysis of its suitability to describe the traffic type under
analysis.
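The three traffic characteristics mentioned in step (1) can be estimated from a measured trace. The following sketch assumes the trace is a list of packet counts per fixed time interval and uses the peak-to-mean ratio as one simple burstiness index among several possible ones. The function name and the toy trace are hypothetical.

```python
from statistics import fmean

def traffic_statistics(counts, max_lag=3):
    """Summarize a trace of per-interval packet (or byte) counts."""
    avg = fmean(counts)
    var = fmean([(c - avg) ** 2 for c in counts])
    peak_to_mean = max(counts) / avg          # simple burstiness index
    acf = []
    for k in range(1, max_lag + 1):
        # Sample autocovariance at lag k, normalized by the variance.
        cov = sum((counts[i] - avg) * (counts[i + k] - avg)
                  for i in range(len(counts) - k)) / len(counts)
        acf.append(cov / var)
    return {"mean": avg, "variance": var,
            "peak_to_mean": peak_to_mean, "acf": acf}

trace = [0, 0, 8, 8, 0, 0, 8, 8]   # toy on/off trace, packets per interval
stats = traffic_statistics(trace)
print(stats)
```

For this toy on/off trace, the autocorrelation alternates in sign with the lag, which reflects the periodic clustering of the packets.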
Previously, the statistical properties of IP traffic were described by the Poisson
model, in which the autocorrelation decays exponentially fast; this was adequate
because, in the early stages, the Internet was small and its structure was simple (a
simple packet network). However, the explosive growth in the number of users and the
growing diversity of real-time traffic have revealed complex behaviors
in IP traffic, and the Poisson model is not able to capture all of their statistical
properties [5,7]. Such behaviors can have a significant impact on network performance
and can be well described by long-range dependence (LRD), self-similarity, and
multifractality [6]. The LRD behavior manifests itself along a communication channel
as bursty activity in the packet rate, which produces a wide range of traffic volumes
away from the average rate and persists strongly on all relevant time scales [5]. In
LRD traffic, the autocorrelation decays slowly as a power-law function. This great
variation in the traffic volume leads to buffer overflow and network congestion that
result in packet loss and jitter, which directly impact the quality of VoIP applications.
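The practical difference between the two decay laws can be illustrated by summing the autocorrelation over the lags. In the sketch below, the decay rate 0.5 and the exponent β = 0.5 are assumed values chosen for illustration: the exponential sum settles quickly to a finite limit, whereas the power-law sum with 0 < β < 1 keeps growing, which is a common way of characterizing LRD.

```python
import math

def partial_sum(acf, upto):
    """Sum the autocorrelation values for lags 1..upto."""
    return sum(acf(k) for k in range(1, upto + 1))

def short_range(k, rate=0.5):
    return math.exp(-rate * k)      # exponential decay (assumed rate)

def long_range(k, beta=0.5):
    return k ** (-beta)             # power-law decay, 0 < beta < 1

for upto in (10, 100, 1000, 10000):
    print(upto,
          round(partial_sum(short_range, upto), 4),
          round(partial_sum(long_range, upto), 1))
```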
This chapter presents jitter and packet-loss modeling of VoIP traffic by means
of network measurements. We model the main QoS parameters of VoIP
traffic, which relate to regular VoIP communications over wired
networks as well as to mVoIP (which functions as an application that runs over any
wireless network technology that provides data access to the Internet). Hence, our
modeling work is relevant to both regular VoIP and mVoIP technologies.
network. This concept encapsulates the convergence of network processes where the
convergent network is called multiservice network and is based on IP technology. In
the multiservice network, packets should be transported transparently from host to
host without excessive protocol conversion through an IP network core (Internet) [11].
However, with this convergence, a new technical challenge has emerged. The IP
network core provides best-effort services in most of the cases and cannot guarantee
the QoS of real-time multimedia applications, such as VoIP [12].
The efficient design of modern communication networks is a complex task that
involves advanced mathematical topics. An efficient network design can be achieved by
using accurate models, which ensure that the network has the capabilities necessary
to provide services with a certain standard of QoS [8].
In the abovementioned context, the connectivity of a communication network is
modeled by using concepts from graph theory, i.e., a packet network can be modeled
by means of a graph G = (V, E), where V is the set of nodes, E is the set of links,
and the degree of a node is the number of its nearest neighbors. The degree of a node
is a local quantity; however, the node degree distribution of the entire network is of
particular interest, because it gives important information about the global properties
of a network and can be used to characterize different network topologies. From the
perspective of node degree, the main topologies used in communication networks are
the following [5]:
● Regular-symmetric networks: The regular-symmetric network has the same
degree for all nodes, e.g., the ring network, rectangular toroidal network,
triangular toroidal network, hexagonal toroidal network.
● Random networks: In the random network, the node degree distribution is well
approximated by a binomial distribution.
● Scale-free networks: In the scale-free network, the node degree distribution is
described by a power-law.
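The first two classes can be told apart by computing the node degree distribution directly from the graph G = (V, E). The sketch below builds a ring and a random network and reports the fraction of nodes for each degree. The helper names, network sizes, and link probability are assumptions made for illustration.

```python
import random
from collections import Counter

def ring_network(n):
    """Regular-symmetric example: every node links to its two ring neighbors."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def random_network(n, p, seed=None):
    """Each of the n(n-1)/2 possible links exists independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def degree_distribution(adj):
    """Map each degree k to the fraction of nodes with that degree."""
    counts = Counter(len(neigh) for neigh in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}

ring = ring_network(50)
rand = random_network(1000, 0.01, seed=3)
print(degree_distribution(ring))      # a single degree value
print(degree_distribution(rand))      # spread around (n - 1) * p
```

In the random network the degree of a node is the number of successes among n − 1 independent trials, which is why the distribution is approximately binomial.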
Recent studies have shown that the real topology of modern communication
networks is neither completely random nor completely regular-symmetric; instead, it
shows a prominent characteristic of self-organization, in which the node degree
distribution is described by a power law [5]. This scale-free network model was proposed
by Barabási and Albert in [13] and it involves one of the following three operations
for each time step: add new links between the existing nodes, rewire links, and add
new nodes.
Often, we have a communication network with a particular topology and we seek
traffic models to measure or predict the network performance as a function of some
QoS parameters, such as delay and packet loss. This activity mainly involves the
probabilistic relations between traffic, network resources, and QoS and it is better
known as traffic theory [8].
At the early stage of packet networks, traffic and congestion were rather
sparse, and one of the models widely applied for traffic modeling was
the Poisson model. However, with the evolution from the simple packet network to
the multiservice network, the classical Poisson model fails to model the new aggregated
traffic. Besides, recent studies in modern communication networks have shown that
[Figure 7.2: block diagram of the source terminal. Recoverable labels: RTP/UDP, IP,
lower layers, physical interface, and terminal output reference point; signal stages:
waveform, continuous digital, and asynchronous packets.]
Figure 7.2 Source terminal diagram from a VoIP system. Adopted, with permission,
from Reference [14]
[Figure 7.3: block diagram of the destination terminal. Recoverable labels: de-jitter
buffer, de-packetization, decoder, D/A converter, and speaker.]
Figure 7.3 Destination terminal diagram from a VoIP system. Adopted, with
permission, from Reference [14]
the terminal input reference point. Then it passes through the lower layers and then
through the IP and RTP/UDP blocks. After that, the sequence of voice datagrams is
passed to the de-jitter buffer, which plays a very important role in terms of
QoS because it compensates for the network jitter at the cost of further delay
(buffer delay) and loss (late arrival loss). Therefore, the de-jitter buffer defines the
relationship between jitter and loss on the receiver side.
At this point, the voice stream is already impaired by the delay and loss due to the
traversed network. Additional impairments occur due to the additional delay of the
remaining blocks and the additional loss caused by discarding the packets whose delay
exceeds what the de-jitter buffer, whose size should ideally be optimized, can absorb [16].
Therefore, an important design parameter at the receiver side is the de-jitter buffer
size, because this parameter becomes the essential descriptor of intrinsic quality that
supplants jitter. The packet headers are stripped off and voice samples are extracted
in the de-packetization block. Finally, the de-jittered voice samples are decoded to
recover the original voice signal.
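The trade-off between buffer delay and late arrival loss can be sketched with a fixed-size de-jitter buffer. The model below is a simplification introduced here: the playout deadline is anchored to the smallest observed network delay, whereas real receivers often adapt it over time. The function name and the delay values are hypothetical.

```python
def playout(network_delays_ms, buffer_ms):
    """Fixed de-jitter buffer: a packet is played only if its network delay
    does not exceed the smallest observed delay plus the buffer size."""
    deadline = min(network_delays_ms) + buffer_ms
    late = sum(1 for d in network_delays_ms if d > deadline)
    return {
        "playout_delay_ms": deadline,   # delay experienced by every played packet
        "late_loss_ratio": late / len(network_delays_ms),
    }

delays = [40, 42, 41, 75, 43, 40, 90, 44, 41, 60]   # hypothetical one-way delays (ms)
for size in (5, 25, 50):
    print(size, playout(delays, size))
```

With these values, enlarging the buffer from 5 ms to 50 ms lowers the late arrival loss from 30% to 0%, while the playout delay grows from 45 ms to 90 ms.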
7.3.2.1 H.323
ITU-T (ITU Telecommunication Standardization Sector) H.323 is a set of protocols for
voice, video, and data conferencing over packet-switched networks, such as Ethernet
Local Area Networks (LANs) and the Internet, that do not provide a guaranteed QoS
[17,18]. The H.323 protocol stack is designed to operate above the transport layer of
the underlying network. H.323 was originally developed as one of several
videoconferencing recommendations issued by the ITU-T. The H.323 standard is designed
to allow clients on H.323 networks to communicate with clients on other
videoconferencing networks. The first version of H.323 was issued in 1996; it was designed
for use with Ethernet LANs and borrowed many of its multimedia conferencing aspects from
other H.32x series recommendations. H.323 is part of a large series of
communication standards that enable videoconferencing across a range of networks. This series
also includes H.320 and H.324, which address ISDN and PSTN communications,
respectively. H.323 is known as a broad and flexible recommendation. Although H.323
The H.323 architecture is partitioned into zones. Each zone comprises the collection
of all terminals, GW and MCU managed by a single GK. H.323 is an umbrella
code from 100 to 199 is considered provisional. Responses from 200 to 699 are
final responses.
– 1xx informational: Request received, continuing to process request. The client
should wait for further responses from the server.
– 2xx success: The action was successfully received, understood, and accepted.
The client must terminate any search.
– 3xx redirection: Further action must be taken in order to complete the request.
The client must terminate any existing search but may initiate a new one.
– 4xx client error: The request contains bad syntax or cannot be fulfilled at this
server. The client should try another server or alter the request and retry with
the same server.
– 5xx server error: The request cannot be fulfilled at this server because of
server error. The client should try with another server.
– 6xx global failure: The request is invalid at any server. The client must
abandon search.
The first digit of the status code defines the class of response. The last two digits
do not have any categorization role. For this reason, any response with a status code
between 100 and 199 is referred to as a “1xx response,” any response with a status
code between 200 and 299 as a “2xx response,” and so on.
● SIP requests: The core SIP specification defines six types of SIP requests, each
of them with a different purpose. Every SIP request contains a field, called a
method, which denotes its purpose.
– INVITE: INVITE requests invite users to participate in a session. The body
of INVITE requests contains the description of the session. Significantly,
SIP only handles the invitation to the user and the user’s acceptance of the
invitation. All of the session particulars are handled by the SDP used. Thus,
with a different session description, SIP can invite users to any type of session.
– ACK: ACK requests are used to acknowledge the reception of a final response
to an INVITE. Thus, a client originating an INVITE request issues an ACK
request when it receives a final response for the INVITE.
– CANCEL: CANCEL requests cancel pending transactions. If a SIP server
has received an INVITE but not yet returned a final response, it will stop
processing the INVITE upon receipt of a CANCEL request. If, however, it
has already returned a final response for the INVITE, the CANCEL request
will have no effect on the transaction.
– BYE: BYE requests are used to abandon sessions. In two-party sessions,
abandonment by one of the parties implies that the session is terminated.
– REGISTER: Users send REGISTER requests to inform a server (in this case,
referred to as a registrar server) about their current location.
– OPTIONS: OPTIONS requests query a server about its capabilities, including
which methods and which session description protocols it supports.
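The response classes and request methods listed above can be sketched in a few lines of Python. This is only an illustrative helper, not part of any SIP stack; the names `RESPONSE_CLASSES`, `CORE_METHODS`, and `classify_status` are hypothetical.

```python
# Response class keyed by the first digit of the status code.
RESPONSE_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}

# The six request methods named in the text.
CORE_METHODS = ("INVITE", "ACK", "CANCEL", "BYE", "REGISTER", "OPTIONS")


def classify_status(code):
    """Return (response class, is_final) for a SIP status code."""
    if not 100 <= code <= 699:
        raise ValueError("status code must lie between 100 and 699")
    response_class = RESPONSE_CLASSES[code // 100]  # only the first digit matters
    is_final = code >= 200                          # 1xx responses are provisional
    return response_class, is_final


if __name__ == "__main__":
    for code in (180, 200, 486):
        print(code, classify_status(code))
```

The last two digits are ignored on purpose, mirroring the statement that they have no categorization role.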
SIP is independent of the type of multimedia session handled and of the mechanism
used to describe the session. Sessions consisting of RTP streams carrying audio and
video are usually described using SDP, but some types of sessions can be described
with other description protocols. In short, SIP is used to distribute session descriptions
among potential participants. Once the session description is distributed, SIP can be
used to negotiate and modify the parameters of the session and terminate the session.
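[Figure 7.4: E-Model block diagram. The CODEC type and PLR feed the Ie model; the delay feeds the Id model; Ie and Id feed the E-Model, which outputs R and the MOS.]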
R = R0 − Is − Id − Ie + A (7.1)
where Id is a function of the delay T and Ie is a function of the CODEC type used
(G.711 or G.729) and the PLR. The delay components within the function Id are OWD
and round-trip time delay. In order to simplify the expression for Id, we used (7.3).
Figure 7.4 illustrates how the E-Model may be used to predict the voice quality in
VoIP applications.
The relationship between the R factor and the MOS is given by the following
expression:
\[
\mathrm{MOS} =
\begin{cases}
1, & R < 0 \\
1 + 0.035R + 7 \cdot 10^{-6}\, R(R - 60)(100 - R), & 0 \le R \le 100 \\
4.5, & R > 100
\end{cases}
\tag{7.5}
\]
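A minimal Python sketch of (7.1) and (7.5) follows. The function names are hypothetical, and the input values in the usage lines are arbitrary illustrations rather than values from the chapter; the impairment terms Is, Id, and Ie are simply passed in, since their models are defined elsewhere.

```python
def r_factor(r0, i_s, i_d, i_e, a):
    """Transmission rating factor R per (7.1)."""
    return r0 - i_s - i_d - i_e + a


def r_to_mos(r):
    """Map the R factor to a MOS value per the piecewise expression (7.5)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not measured).
    r = r_factor(r0=90.0, i_s=1.0, i_d=10.0, i_e=5.0, a=0.0)
    print(r, round(r_to_mos(r), 3))
```

Note that the cubic term vanishes at R = 60 and R = 100, so the middle branch meets the upper branch continuously at MOS = 4.5.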
Typically, the R factor values are categorized as shown in Table 7.1 [29]:
where D(K) is the OWD of a packet K of size L, δ represents the propagation delay,
σ is the processing delay, s is the number of hops, L/Ch is the transmission delay, and
Xh(t) is the queuing delay of a packet K of size L at hop h (h = 1, ..., s) with capacity
Ch [33].
7.3.6 Jitter
When voice packets are transmitted from the source terminal (sender) to the destination
terminal (receiver) over IP networks, they may experience variable delay, called
jitter. The packet inter-arrival time (IAT) on the receiver side is not constant even if the
packet inter-departure time (IDT) on the sender side is constant. As a result, packets
arrive at the destination terminal with varying delays between packets [34]. The
difference between the arrival times of successive voice packets on the receiver side
is measured according to RFC 3550 [35]; Figure 7.5 illustrates this jitter measurement
between the sent and the received packets.
[Figure 7.5: Jitter measurement. Sender timeline with packets SK−3, SK−2, SK−1, SK and IDT(K, K−1) = SK − SK−1; receiver timeline with packets RK−3, RK−2, RK−1, RK and IAT(K, K−1) = RK − RK−1.]
Let SK be the RTP timestamp of packet K and RK its arrival time in RTP timestamp
units. The OWD difference between two successive packets, K and K − 1, is then
given by the following equations [34]:
\[
J(K) = (R_K - S_K) - (R_{K-1} - S_{K-1}) = (R_K - R_{K-1}) - (S_K - S_{K-1})
     = \mathrm{IAT}(K) - \mathrm{IDT}(K) \tag{7.7}
\]
\[
\mathrm{IAT}(K) = J(K) + \mathrm{IDT}(K) \tag{7.8}
\]
where IDT(K, K − 1) = (SK − SK−1) is the IDT and IAT(K, K − 1) = (RK − RK−1)
is the IAT, or arrival jitter, for the packets K and K − 1. In the current context,
IAT(K, K − 1) is referred to as jitter [34].
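The computation in (7.7) can be sketched as follows. The function name and the sample timestamps are hypothetical; the timestamps are chosen only so that the arithmetic is easy to follow.

```python
def owd_jitter(send_ts, recv_ts):
    """Return J(K) = IAT(K) - IDT(K) for K = 2..N, as in (7.7).

    send_ts holds the RTP timestamps S_K and recv_ts the arrival times R_K,
    both in RTP timestamp units and in packet order.
    """
    if len(send_ts) != len(recv_ts):
        raise ValueError("send_ts and recv_ts must have the same length")
    jitter = []
    for k in range(1, len(send_ts)):
        iat = recv_ts[k] - recv_ts[k - 1]  # inter-arrival time at the receiver
        idt = send_ts[k] - send_ts[k - 1]  # inter-departure time at the sender
        jitter.append(iat - idt)
    return jitter


if __name__ == "__main__":
    # Assumed example: constant IDT of 160 timestamp units, variable arrivals.
    sent = [0, 160, 320, 480]
    received = [1000, 1165, 1320, 1490]
    print(owd_jitter(sent, received))
```

Positive values mean the packet arrived later than the sender spacing would suggest; negative values mean it caught up.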
When voice packets are transported over IP networks, they may experience both delay
variations and packet loss. From (7.8), a relationship between jitter and packet loss
can be established using the following equations [34]. If packet K − 1 is lost,
\[
\mathrm{IAT}(K) = J(K) + 2 \cdot \mathrm{IDT}(K) \tag{7.9}
\]
Therefore, if n consecutive packets are lost,
\[
\mathrm{IAT}(K) = J(K) + (n + 1) \cdot \mathrm{IDT}(K) \tag{7.10}
\]
Thus, (7.10) describes the effect of packet loss on the VoIP jitter.
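As a worked instance of (7.10), assume purely for illustration an IDT of 20 ms and J(K) = 3 ms (neither value is a measurement from the chapter). If n = 2 consecutive packets are lost before packet K, then:

```latex
\begin{align*}
\mathrm{IAT}(K) &= J(K) + (n + 1)\cdot \mathrm{IDT}(K) \\
                &= 3\ \text{ms} + (2 + 1)\cdot 20\ \text{ms} = 63\ \text{ms}
\end{align*}
```

Without loss (n = 0), the same packet would show IAT(K) = 23 ms, so the two lost packets add two extra inter-departure intervals to the observed inter-arrival time.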
[Figure 7.6: Two-state Markov chain with states S1 (Lost) and S2 (Found) and transition probabilities p12 and p21.]
quality tests is random or Bernoulli-like packet loss. Random loss here means inde-
pendent loss, implying that the loss of a particular packet is independent of whether
or not previous packets were lost. However, random loss does not represent the loss
distributions typically encountered in real networks. For example, losses are often
related to periods of network congestion. Hence, losses may extend over several pack-
ets, showing a dependency between individual loss events. In this chapter, dependent
packet loss is often referred to as bursty. The packet loss is bursty in nature and
exhibits temporal dependency [36]. As noted earlier in the introduction section, if
packet n is lost then normally there is a high probability that packet n + 1 will also be
lost. Consequently, there is a strong correlation between consecutive packet losses,
resulting in bursty packet-loss behavior. Hence, this temporal dependency can be
effectively modeled by a finite Markov chain [36,37].
Let S = {S1, S2, ..., Sm} be the m states of an m-state Markov chain and let pij be
the probability that the chain passes from state Si to state Sj. The transition
probabilities between states can be represented by the transition matrix P [2]:
\[
P =
\begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1m} \\
p_{21} & p_{22} & \cdots & p_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
p_{m1} & p_{m2} & \cdots & p_{mm}
\end{bmatrix}
\tag{7.11}
\]
such that the state probabilities satisfy S1 + S2 + · · · + Sm = 1.
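A chain defined by such a transition matrix can be simulated directly. The sketch below is a generic illustration with hypothetical names. The example matrix is an assumed two-state case in which index 0 stands for a lost packet and index 1 for a received packet; its probabilities are illustrative, not fitted values.

```python
import random


def simulate_chain(P, n_steps, start=0, rng=None):
    """Simulate n_steps states of a Markov chain with transition matrix P.

    P[i][j] is the probability of moving from state i to state j,
    so every row of P must sum to 1.
    """
    rng = rng or random.Random()
    for row in P:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of P must sum to 1")
    state = start
    states = [state]
    for _ in range(n_steps - 1):
        u = rng.random()
        cumulative = 0.0
        next_state = len(P) - 1  # fallback guards against rounding error
        for j, p in enumerate(P[state]):
            cumulative += p
            if u < cumulative:
                next_state = j
                break
        state = next_state
        states.append(state)
    return states


if __name__ == "__main__":
    # Assumed two-state example: state 0 = lost, state 1 = found.
    P = [[0.70, 0.30],
         [0.05, 0.95]]
    trace = simulate_chain(P, 100000, start=1, rng=random.Random(1))
    print("empirical loss rate:", trace.count(0) / len(trace))
```

Because a lost packet is followed by another loss with probability 0.70 in this example, the simulated losses arrive in bursts rather than independently.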
In the two-state Markov chain (see Figure 7.6), one of the states (S1) represents
a packet loss and the other state (S2) represents the case where packets are correctly
[Figure: Four-state Markov chain with states S1 (lost in good state), S2 (found in good state), S3 (lost in bad state), and S4 (found in bad state), and transition probabilities p12, p21, p23, and p32.]
The two two-state chains can be described by four independent transition prob-
abilities (two for each one). Two further probabilities characterize the transitions
between the two two-state chains leading to a total of six independent parameters for
this particular four-state Markov chain [2].
In the four-state Markov chain, states S1 and S3 represent packets lost, S2 and S4
packets found, and six parameters (p21, p12, p43, p34, p23, p32 ∈ (0, 1)) are necessary to
define all the transition probabilities. The four steady-state probabilities of this chain
are [38]
\[
S_1 = \frac{1}{1 + \dfrac{p_{12}}{p_{21}} + \dfrac{p_{12}p_{23}}{p_{21}p_{32}} + \dfrac{p_{12}p_{23}p_{34}}{p_{21}p_{32}p_{43}}} \tag{7.13}
\]
\[
S_2 = \frac{1}{1 + \dfrac{p_{21}}{p_{12}} + \dfrac{p_{23}}{p_{32}} + \dfrac{p_{23}p_{34}}{p_{32}p_{43}}} \tag{7.14}
\]
\[
S_3 = \frac{1}{1 + \dfrac{p_{34}}{p_{43}} + \dfrac{p_{32}}{p_{23}} + \dfrac{p_{21}p_{32}}{p_{12}p_{23}}} \tag{7.15}
\]
\[
S_4 = \frac{1}{1 + \dfrac{p_{43}}{p_{34}} + \dfrac{p_{32}p_{43}}{p_{23}p_{34}} + \dfrac{p_{21}p_{32}p_{43}}{p_{12}p_{23}p_{34}}} \tag{7.16}
\]
The probability of the chain to be either in S1 or in S3 , which corresponds to PLR, is
then r = S1 + S3 [38].
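The steady-state expressions (7.13)–(7.16) and the resulting PLR can be evaluated as in the sketch below. The function name and the six transition probabilities in the usage lines are assumptions for illustration only.

```python
def four_state_steady_state(p12, p21, p23, p32, p34, p43):
    """Steady-state probabilities (S1, S2, S3, S4) per (7.13)-(7.16)."""
    s1 = 1.0 / (1 + p12 / p21 + (p12 * p23) / (p21 * p32)
                + (p12 * p23 * p34) / (p21 * p32 * p43))
    s2 = 1.0 / (1 + p21 / p12 + p23 / p32 + (p23 * p34) / (p32 * p43))
    s3 = 1.0 / (1 + p34 / p43 + p32 / p23 + (p21 * p32) / (p12 * p23))
    s4 = 1.0 / (1 + p43 / p34 + (p32 * p43) / (p23 * p34)
                + (p21 * p32 * p43) / (p12 * p23 * p34))
    return s1, s2, s3, s4


if __name__ == "__main__":
    # Assumed transition probabilities, each in (0, 1).
    s1, s2, s3, s4 = four_state_steady_state(
        p12=0.8, p21=0.02, p23=0.01, p32=0.1, p34=0.4, p43=0.3)
    plr = s1 + s3  # r = S1 + S3: probability of being in a "lost" state
    print("sum of probabilities:", s1 + s2 + s3 + s4)
    print("PLR:", plr)
```

A quick sanity check on any parameter set is that the four probabilities sum to 1 and that neighbouring states balance, e.g., S1·p12 = S2·p21.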
Discrete self-similarity: Let Xt = (Xt; t ∈ N) denote a discrete time series with mean
μX, variance σX², autocorrelation function r(k), and ACV γ(k), k ≥ 0, where Xt
can be interpreted as the jitter at time instance t.
When considering discrete time series, the definition of self-similarity is given
in terms of the aggregated processes, as follows:
\[
X_k^{(m)} = \left(X_k^{(m)};\ k \in \mathbb{N}\right) \tag{7.17}
\]
where m represents the aggregation level and \(X_k^{(m)}\) is obtained by averaging the
original series Xt over nonoverlapping blocks of size m, and each term \(X_k^{(m)}\) is
given by
\[
X_k^{(m)} = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i; \quad k = 1, 2, 3, \ldots \tag{7.18}
\]
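The block averaging in (7.18) can be sketched as follows. The function name is hypothetical, and dropping a trailing incomplete block is an assumption of this sketch, since (7.18) only defines complete blocks.

```python
def aggregate(series, m):
    """Aggregated series X^(m): means over nonoverlapping blocks of size m."""
    if m < 1:
        raise ValueError("aggregation level m must be at least 1")
    n_blocks = len(series) // m  # a trailing incomplete block is dropped
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7]
    print(aggregate(x, 2))
    print(aggregate(x, 3))
```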
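\[
\gamma_X^{(m)}(k) = \frac{\sigma_X^2}{2}\left((k + 1)^{2H} - 2k^{2H} + (k - 1)^{2H}\right), \quad k \ge 1 \tag{7.21}
\]
The time series Xt is called asymptotically second-order self-similar if
\[
\lim_{m \to \infty} \gamma^{(m)}(k) = \frac{\sigma_X^2}{2}\left((k + 1)^{2H} - 2k^{2H} + (k - 1)^{2H}\right) \tag{7.22}
\]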
Second-order self-similarity (in the exact or asymptotic sense) has been a dominant
framework for modeling IP traffic.
So far, the role of second-order self-similarity has been discussed but not much
has been mentioned about the role of H and limiting values. The definition of LRD
and its interconnection with the correlation factor r(k) will now be discussed.
Let r(k) = γ(k)/σX² be the autocorrelation function of Xt with self-similarity
parameter 0 < H < 1, H ≠ 1/2. Then the asymptotic behavior of r(k) is given by the
following equation:
\[
r(k) \sim H(2H - 1)k^{2H - 2}, \quad k \to \infty \tag{7.23}
\]
In particular, if 1/2 < H < 1, r(k) asymptotically behaves as \(ck^{-\eta}\) for 0 < η < 1,
where c > 0 is a constant and η = 2 − 2H. This also means that the correlations
are nonsummable: \(\sum_{k=-\infty}^{\infty} r(k) = \infty\). That is, the autocorrelation function decays
slowly. When r(k) obeys such a power law, the corresponding stationary process Xt is
called long-range dependent. On the other hand, Xt is short-range dependent if the
sum converges, i.e., \(\sum_{k=-\infty}^{\infty} r(k) < \infty\).
Generally speaking, a time series with LRD has a Hurst parameter 0.5 < H < 1; on
the other hand, a time series with short-range dependence (SRD) has a Hurst parameter
0 < H < 0.5.
Following are some simple facts regarding the value of H and its impact on
γ (k) [45].
1. \(\gamma(k) = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}\) for H = 0.5. This is the well-known property of white
Gaussian noise.
2. γ (1) < 0 for 0 < H < 0.5
3. γ (1) > 0 for 0.5 < H < 1
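The three facts above can be checked numerically against the second-order self-similar ACV of (7.21). The sketch below uses a hypothetical function name and assumes unit variance, so that γ(0) = 1.

```python
def acv(k, hurst, variance=1.0):
    """ACV of an exactly second-order self-similar series, per (7.21)."""
    if k == 0:
        return variance
    two_h = 2.0 * hurst
    return 0.5 * variance * ((k + 1) ** two_h - 2 * k ** two_h + (k - 1) ** two_h)


if __name__ == "__main__":
    print("H = 0.5:", [acv(k, 0.5) for k in range(4)])  # nonzero only at k = 0
    print("H = 0.3, lag 1:", acv(1, 0.3))               # negative correlation
    print("H = 0.8, lag 1:", acv(1, 0.8))               # positive correlation
    # For large k the ACV approaches H(2H - 1)k^(2H - 2), as in (7.23).
    k, h = 1000, 0.8
    print(acv(k, h) / (h * (2 * h - 1) * k ** (2 * h - 2)))
```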
\[
\sigma_X^2 = \frac{1}{1 - 2^{2H-2}} \cdot \mathrm{var}(C_{X2,1,t}) \tag{7.29}
\]
Then, the variance of the ith component is related to the variance of Xt as follows:
\[
\mathrm{var}(C_{X2,i,t}) = (1 - r) \cdot r^{i-1} \cdot \sigma_X^2 \tag{7.30}
\]
where
\[
r = 2^{2H-2} \tag{7.31}
\]
The plot of log2[var(CX2,i,t)] vs. i is equivalent to the wavelet-based diagram proposed
in [46], the logscale diagram (LD); i.e., \(\mathrm{var}(C_{X2,i,t}) = E|d_X(j, \cdot)|^2 / 2^j\) when using the
Haar family of wavelet basis functions \(\psi_{j,k}(t) = 2^{-j/2}\psi_0(2^{-j}t - k)\) (see [47]), where:
\[
\psi_0(t) =
\begin{cases}
+1, & 0 \le t < 1/2 \\
-1, & 1/2 \le t < 1 \\
0, & \text{otherwise}
\end{cases}
\tag{7.32}
\]
For a finite-length time series with L octaves, the octave j of the LD
is related to the index i of (7.24) according to (7.33):
\[
j = i \tag{7.33}
\]
The LD of an exactly self-similar time series is a straight line. Hence, a linear
regression can be applied in order to estimate the Hurst index.
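Taking log2 of (7.30) gives log2[var(Ci)] = log2[(1 − r)σX²] + (i − 1)(2H − 2), so the slope of the straight line is 2H − 2 and H = 1 + slope/2. The sketch below illustrates this regression on synthetic variances generated from (7.30) itself; it is not the chapter's estimator code, and the function names are hypothetical.

```python
import math


def component_variances(hurst, variance, n_components):
    """Synthetic var(C_i), i = 1..n, generated from (7.30) and (7.31)."""
    r = 2.0 ** (2 * hurst - 2)
    return [(1 - r) * r ** (i - 1) * variance
            for i in range(1, n_components + 1)]


def hurst_from_components(variances):
    """Least-squares slope of log2(var(C_i)) vs. i, converted to H."""
    xs = list(range(1, len(variances) + 1))
    ys = [math.log2(v) for v in variances]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return 1.0 + slope / 2.0  # slope = 2H - 2


if __name__ == "__main__":
    variances = component_variances(hurst=0.8, variance=2.0, n_components=10)
    print(hurst_from_components(variances))
```

With measured data the points will not lie exactly on a line, and a poor linear fit is itself informative, as discussed later for multifractal traces.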
[Figure: Measurement scenario. LAN A (local cable ISP network, link speed 3 MB) with terminals A1–A4 and a gatekeeper; LAN B (CINVESTAV GDL network, link speed 2 MB) with terminals B1–B4 and a gatekeeper.]
impairments. Several studies have found that self-similarity and LRD can have a
negative impact on IP traffic because they give rise to large losses and/or delays. For
this reason, it is important to analyze the correlation structures (SRD and LRD)
of the VoIP traffic. The Hurst parameter is used to measure the degree of
self-similarity and LRD. Generally speaking, a time series with LRD has a Hurst parameter
of 0.5 < H < 1; on the other hand, a time series with SRD has a Hurst parameter of
0 < H < 0.5.
Unlike other statistics, the Hurst parameter, although mathematically well
defined, cannot be estimated unambiguously from real-world samples. Therefore,
several methods have been developed in order to estimate the Hurst parameter. Exam-
ples of classical estimators are those based on the R/S statistics [51] (and its unbiased
version [52]), detrended fluctuation analysis [52,53], maximum likelihood (ML) [54],
aggregated variance (VAR) [51], wavelet analysis [46], etc. In [55], Clegg developed
an empirical comparison of estimators for data in raw form and corrupted in various
ways. An important observation is that the estimate of the Hurst parameter may
differ from one estimator to another, which makes the selection of the most adequate
estimator a difficult task. The result seems to depend on how well the data sample meets
the assumptions on which the estimator is based. However, analytical and empirical
studies have found that the estimators with the best performance in
bias and standard deviation, and consequently in mean squared error (MSE), are the
Whittle ML estimator and the wavelet-based estimator proposed by Veitch and Abry in [46].
Of these two estimators, the wavelet-based one is computationally simpler and
faster [46,51].
[Figure 7.9: Hurst parameter for VoIP jitter data traces with SRD. Hurst parameter (H) vs. aggregation level (m), m = 1, 2, 4, ..., 128; series: G.711_10 ms, G.711_20 ms, G.711_40 ms, G.711_60 ms.]
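To make the idea of a Hurst estimator concrete, the sketch below implements the aggregated variance (VAR) method named among the classical estimators above. It is not the wavelet-based estimator used in this chapter. It relies on the standard relation var(X^(m)) ∝ m^(2H−2) for a self-similar series; the function names, the default aggregation levels, and the white-noise test input are assumptions of this sketch.

```python
import math
import random


def block_means(series, m):
    """Means over nonoverlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def hurst_aggregated_variance(series, levels=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log2 var(X^(m)) vs. log2 m."""
    xs, ys = [], []
    for m in levels:
        blocks = block_means(series, m)
        if len(blocks) < 2:
            continue
        v = variance(blocks)
        if v > 0:
            xs.append(math.log2(m))
            ys.append(math.log2(v))
    if len(xs) < 2:
        raise ValueError("not enough aggregation levels to fit a line")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return 1.0 + slope / 2.0  # slope = 2H - 2


if __name__ == "__main__":
    rng = random.Random(7)
    noise = [rng.gauss(0.0, 1.0) for _ in range(16384)]
    print(hurst_aggregated_variance(noise))  # close to 0.5 for white noise
```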
Motivated by these observations, and following the methodology proposed in
[56] to find correlations and LRD, the Hurst parameter of the jitter data traces is
estimated with the wavelet-based estimator [46] as a function of the aggregation level m
(m = {1, 2, 4, 8, 16, 32, 64, 128}). Figures 7.9 and 7.10 show the Hurst parameter of
representative jitter data traces at different aggregation levels m. Generally, Hurst
parameters larger than 0.5 for all aggregation levels are a strong indication of LRD.
It can be observed from Figure 7.10 that one set of jitter data traces has Hurst parameters
larger than 0.5 for all aggregation levels, which indicates a high degree of LRD. In
contrast, the other set of jitter data traces, shown in Figure 7.9, has Hurst parameters
lower than 0.5. These results do not indicate LRD; instead, the ACVs decay quickly
to zero, which indicates the absence of memory (SRD).
These results show that VoIP jitter exhibits self-similar characteristics with SRD
or LRD; therefore, a self-similar process can be used to model the jitter behavior.
The discovery of LRD and weak self-similarity in the VoIP jitter data traces
was followed by further work that shows evidence of multifractal behavior.
[Figure 7.10: Hurst parameter for VoIP jitter data traces with LRD. Hurst parameter (H) vs. aggregation level (m), m = 1, 2, 4, ..., 128; series: G.711_10 ms, G.711_20 ms, G.711_40 ms, G.711_60 ms.]
Multifractal behavior is a richer form of scaling behavior
associated with nonuniform local variability, and evidence of it could lead to a complete and
robust model of VoIP traffic over all time scales of engineering interest.
In order to accomplish this analysis, we decomposed the time series of VoIP jitter
into a set of time series or components CX2,i,t as it is defined in (7.24). The behavior
of these components is used to determine the kind of asymptotic fractal scaling. If
the variance of the components of a time series is modeled by a straight line, the
time series exhibits monofractal behavior. Then, a linear regression can be applied
in order to estimate the Hurst parameter. On the other hand, if the variance of the
components cannot be adequately modeled with a linear model, the scaling behavior
should be described with more than one scaling parameter, i.e., the time series exhibits
multifractal behavior [57]. In Figures 7.11 and 7.12, we show the component behavior
of the collected VoIP jitter data traces.
Figure 7.11 shows the component behavior of VoIP jitter data traces that belong to
the data sets with SRD. It is observed that the variance of the components of this time
series is modeled by a straight line; therefore, the time series exhibits monofractal
behavior.
Figure 7.12 shows the component behavior of VoIP jitter data traces that belong to
the data sets with LRD. It is observed that the variance of the components of this time
series cannot be adequately modeled with a linear model, and the scaling behavior
should be described with multiple scaling parameters (biscaling). Therefore, this time
series exhibits multifractal behavior.
[Figure 7.11: Component behavior of VoIP jitter data traces: monofractal behavior. LD-diagram of log2[var(Ci)] vs. i, i = 0, ..., 15; H = 0.43.]
[Figure 7.12: Component behavior of VoIP jitter data traces: multifractal behavior. LD-diagram of log2[var(Ci)] vs. i, i = 0, ..., 15; H1 = 0.43, H2 = 1.11.]
[Figure: Macroscopic packet-loss behavior. PLR [%] vs. time [s], showing PLR1–PLR5 over the windows W1–W5 and the total PLRT over WT.]
These results show that VoIP jitter with SRD and with LRD exhibits monofractal and
multifractal behavior, respectively. This explains why the data traces with SRD have
a high degree of self-similarity (scale invariance): self-similarity is defined for a
single scaling parameter. On the other hand, the data
traces with LRD exhibit weak self-similarity because of their associated nonuniform
local variability (multifractal behavior).
[Figure 7.14: Packet-loss patterns from VoIP test calls: (a) homogeneous PLR and (b) nonhomogeneous PLR. Horizontal axis: samples (×10^4).]
The threshold used to distinguish between a good state (low level of packet loss) and
a bad state (high level of packet loss) is a function of the perceived quality, good or
poor, respectively, according to the computed MOS values.
In Figure 7.14(a) and (b), the microscopic period with lower PLR is delimited by
the solid square, while the microscopic period with higher PLR is delimited by the
dashed square.
FOR n = 2 to N
    IF (P[n] = 1)
        X[n] = X[n] + X[n − 1]
    END IF
END FOR

i = 1
FOR n = 2 to N
    IF (P[n] = 1)
        X̂[i] = X[n − 1]
        i = i + 1
    END IF
END FOR
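A direct Python transcription of the two loops above is sketched below. It assumes that X is the jitter series and P a 0/1 indicator sequence of the same length, and it maps the pseudocode's 1-based index n to the 0-based index n − 1. The function name is hypothetical, and the meaning of P[n] = 1 is taken exactly as written in the pseudocode.

```python
def apply_loss_pattern(x, p):
    """Transcription of the two pseudocode loops; returns the series X-hat."""
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    x = list(x)  # work on a copy so the caller's series is not modified
    n_total = len(x)

    # First loop: where P[n] = 1, add the previous sample to X[n].
    for n in range(1, n_total):
        if p[n] == 1:
            x[n] = x[n] + x[n - 1]

    # Second loop: where P[n] = 1, copy X[n - 1] into the next slot of X-hat.
    x_hat = []
    for n in range(1, n_total):
        if p[n] == 1:
            x_hat.append(x[n - 1])
    return x_hat


if __name__ == "__main__":
    print(apply_loss_pattern([1, 2, 3, 4], [0, 1, 0, 1]))
```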
[Figure: Hurst parameter (H) vs. packet-loss rate (%).]
The figure shows the empirical functions f(PLRτ, Hτ) that were obtained from
simulation results and the function fREAL(PLRε, Hε).
The functions f(PLRτ, Hτ) resulted from applying T packet-loss patterns to
representative VoIP jitter data traces Xt. In these functions, each point represents the
PLRτ and Hτ of a particular new time series X̂tτ.
The function fREAL(PLRε, Hε) is generated by E jitter data traces. In this
function, each point represents the PLRε and Hε of a particular jitter data trace Xtε, where
t = 1, ..., N, ε = 1, 2, ..., E, and E is the number of representative jitter data
traces used.
where HM is the H parameter of the model found; Ĥ0, â, and b̂ are the fitted parameters;
and Ĥ0 is the H parameter when PLR = 0. The fitted parameters are estimated by linear
regression. The strategy to find Ĥ0, â, and b̂ is to minimize the MSE, i.e.,
\(\mathrm{MSE} = \int_r (\hat{H}_0 + \hat{a}\, r^{\hat{b}} - H_\tau)^2 \, dr\), and the proposed model is valid over the
corresponding range of r = PLR (e.g., 0%–4%).
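One possible way to carry out such a fit is sketched below, using the model form Ĥ0 + â·r^b̂ that appears in the MSE expression. For each candidate exponent b on a grid, the model is linear in r^b, so Ĥ0 and â follow from ordinary linear regression, and the b with the smallest mean squared error is kept. The grid search over b, the function name, and the synthetic data are assumptions of this sketch; the chapter does not spell out its fitting procedure, and the discrete mean here stands in for the integral.

```python
def fit_power_law(r_values, h_values, b_grid):
    """Fit H = h0 + a * r**b; returns (h0, a, b, mse) with the lowest MSE."""
    best = None
    n = len(r_values)
    for b in b_grid:
        xs = [r ** b for r in r_values]
        x_mean = sum(xs) / n
        y_mean = sum(h_values) / n
        sxx = sum((x - x_mean) ** 2 for x in xs)
        if sxx == 0:
            continue  # regression is undefined when all r**b are equal
        a = sum((x - x_mean) * (y - y_mean)
                for x, y in zip(xs, h_values)) / sxx
        h0 = y_mean - a * x_mean
        mse = sum((h0 + a * x - y) ** 2 for x, y in zip(xs, h_values)) / n
        if best is None or mse < best[3]:
            best = (h0, a, b, mse)
    if best is None:
        raise ValueError("no usable exponent in b_grid")
    return best


if __name__ == "__main__":
    # Synthetic points generated from H = 0.4 + 0.1 * r**0.5 (not real data).
    r_values = [0.5 * i for i in range(1, 9)]          # PLR from 0.5% to 4%
    h_values = [0.4 + 0.1 * r ** 0.5 for r in r_values]
    b_grid = [0.1 * k for k in range(1, 21)]
    h0, a, b, mse = fit_power_law(r_values, h_values, b_grid)
    print(h0, a, b, mse)
```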
7.6 Conclusions
The rapid development and evolution of communication networks and the emergence of
enhanced services (e.g., VoIP) that can be offered to end users are captured
in the concept of the modern communication network. Modern communication networks
are complex in nature and have the scale-free property, meaning that the number
of nodes and connections increases continuously and congestion occurs accordingly.
Congestion is a cause of impairment in real-time multimedia applications such as
VoIP. The QoS level of a VoIP application depends on many parameters, among which
jitter and packet loss have an important impact. An efficient
network design can nevertheless be achieved by using accurate models.
The current chapter presents the jitter and packet-loss modeling of VoIP traffic
by means of network measurements; these models could be useful both for today’s
networks and for future networks supporting VoIP/mVoIP technologies.
References
[1] A.-S.K. Pathan, M.M. Monowar, and Z.M. Fadlullah, Building Next-
Generation Converged Networks: Theory and Practice USA: CRC Press
Taylor & Francis Group, 2013, 337–360.
[2] H. Toral-Cruz, A.-S.K. Pathan, and J.C.R. Pacheco, Accurate modeling of VoIP
traffic QoS parameters in current and future networks with multifractal and
Markov models, Mathematical and Computer Modelling Journal, 57 (11–12)
(2013): 2832–2845.
[3] D. Geer, The future of mobile VoIP in the enterprise, IEEE Computer, 42 (6)
(2009): 15–18.
[4] S.K. Chui, O.-C. Yue, and W.C. Lau, Impact of handoff control messages on
VoIP over wireless LAN system capacity, Proceedings of the 14th European
Wireless Conference (EW), 22–25 June 2008, 1–5.
[5] L. Kocarev and G. Vattay, Complex Dynamics in Communication Networks.
Germany: Springer-Verlag Berlin Heidelberg, 2005.
[6] A. Nogueira, P. Salvador, R. Valadas, and A. Pacheco, Modeling network
traffic with multifractal behavior, Telecommunication Systems, 24 (2–4) (2003):
339–362.
[7] O.I. Sheluhin, S.M. Smolskiy, and A.V. Osin, Self-Similar Processes in
Telecommunications. England: John Wiley & Sons, Ltd, 2007.
[8] C. Larsson, Design of Modern Communication Networks: Methods and
Applications. The Netherlands: Academic Press, 2014.
[9] J.F. Kurose and K. W. Ross, Computer Networking: A Top-Down Approach.
USA: Addison-Wesley, 2010.
[10] T. Janevski, Traffic Analysis and Design of Wireless IP Networks. USA: Artech
House, Inc., 2003.
[11] H. Hanrahan, Network Convergence: Services, Applications, Transport, and
Operations Support. England: John Wiley & Sons, Ltd, 2007.
[12] J. Jo, G. Hwang and H. Yang, Characteristics of QoS Parameters for VoIP
in the Short-Haul Internet, Proc. International Conferences on Info-tech
and Info-net (ICII), IEEE, Beijing, China, 29 October–1 November 2001,
pp. 498–502.
[13] R. Albert and A.-L. Barabási, Topology of Evolving Networks: Local Events
and Universality, Physical Review Letters, 85 (24) (2000): 5234–5237.
[14] ITU-T Recommendation G.1020, Performance Parameter Definitions for
Quality of Speech and Other Voiceband Applications Using IP Networks, 2006.
[15] L. Estrada-Vargas, Self-similar time series: analysis and
modeling with applications to VoIP, Ph.D. Thesis, Electrical Engineering,
Telecommunication Section, CINVESTAV, Guadalajara, Jalisco, Mexico,
2015.
[16] S. Madhani, S. Shah, A. Gutierrez, Optimized Adaptive Jitter Buffer Design
for Wireless Internet Telephony, Proc. Global Telecommunications Confer-
ence (GLOBECOM), IEEE, Washington, D.C., USA, 26–30 November 2007,
pp. 5248–5253.
[36] O. Hohlfeld, Stochastic Packet Loss Model to Evaluate QoE Impairments, PIK
Journal, 1 (2009): 53–56.
[37] G. Haßlinger and O. Hohlfeld, The Gilbert-Elliott Model for Packet Loss in
Real Time Services on the Internet, Proc. 14th GI/ITG Conference on Mea-
suring, Modelling and Evaluation of Computer and Communication Systems
(MMB), 31 March 2008–2 April 2008, pp. 269–286.
[38] H. Toral-Cruz, D. Torres-Román, and L. Estrada-Vargas, “Analysis and Mod-
eling of QoS Parameters in VoIP Traffic”, Chapter 1 in Advancements in
Distributed Computing and Internet Technologies: Trends and Issues. (A.-S.K.
Pathan, M. Pathan and H. Y. Lee eds.), Hershey, PA, USA: IGI Global, 2011
pp. 1–22.
[39] T. Janevski, Traffic Analysis and Design of Wireless IP Networks. USA: Artech
House Publishers, 2003.
[40] W.E. Leland, M.S. Taqqu, W. Willinger and D.V. Wilson, On the self-similar
nature of Ethernet traffic (extended version), IEEE/ACM Transactions on
Networking (TON), 2 (1) (1994): 1–15.
[41] K. Park and W. Willinger, Self-Similar Network Traffic and Performance
Evaluation. USA: John Wiley & Sons, Inc., 2000.
[42] K.M. Rezaul and V. Grout, A Survey of Performance Evaluation and Control for
Self-Similar Network Traffic, Proc. of the Second International Conference
on Internet Technologies and Applications (ITA), 4–7 September 2007, pp.
514–524.
[43] G. Samorodnitsky, Long range dependence, Foundations and Trends in
Stochastic Systems, 1 (3) (2007): 163–257.
[44] G. Zhang, G. Xie, J. Yang and D. Zhang, Self-Similar Characteristic of Traffic
in Current Metro Area Network, Proc. of the 15th IEEE Workshop on Local
and Metropolitan Area Networks, 10–13 June 2007, pp. 176–181.
[45] J. Gao, Multiscale Analysis of Complex Time Series: Integration of Chaos and
Random Fractal Theory, and Beyond. USA: Wiley-Interscience, 2007.
[46] D. Veitch and P. Abry, A wavelet based joint estimator for the parame-
ters of LRD, IEEE Transactions on Information Theory, 45 (3) (1999):
878–897.
[47] M.V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software. USA:
IEEE Press, 1994, 213–235.
[48] Advanced Information CTS (Centro de Tecnología de Semiconductores)
Property, Alliance FXO/FXS/E1 VoIP System, www.cts-design.com
[49] Wireshark: A Network Protocol Analyzer, http://www.wireshark.org/
[50] H. Toral-Cruz, QoS Parameters Modeling of Self-similar VoIP Traffic and an
Improvement to the E Model, PhD. Thesis, Electrical Engineering, Telecom-
munication Section, CINVESTAV, Guadalajara, Jalisco, Mexico, 2010.
[51] H.-D. J. Jeong, J.-S. R. Lee, and K. Pawlikowski, Comparison of various
estimators in simulated FGN, Simulation Modelling Practice and Theory, 15
(9) (2007): 1173–1191.
[52] J. Mielniczuk and P. Wojdyllo, Estimation of Hurst exponent revisited,
Computational Statistics & Data Analysis, 51 (9) (2007): 4510–4525.
This chapter discusses two of the agent-based modeling (ABM) levels, i.e.,
exploratory agent-based modeling (EABM) and validated agent-based modeling
(VABM). In the first part of the chapter, we briefly explain EABM with the help of
a case study of 5G networks modeled in an agent-based simulator called NetLogo [1].
One of the use cases of 5G networks is the Internet of Things (IoT). We designed,
implemented, and experimented with this case study to explore futuristic approaches that
could ease the implementation of 5G networks, which are still under development and
need further exploration. Next,
we discuss another important level of modeling, i.e., VABM. As the ABM approach
has become an attractive and efficient way of representing large-scale complex
systems, the verification and validation (V&V) of these models has come into question.
Here, we briefly explain VABM with the help of the same 5G case study
used for EABM. Using VABM, we validate whether the case study
is a credible solution.
8.1 Introduction
We have entered a world of Big Data in which numerous devices communicate with
each other and generate large quantities of data as they do so.
Today’s devices generate and accumulate huge amounts of data in their
databases. ABM is a powerful way to put that data to work. An agent-based model
featuring individuals can use real properties and behaviors taken from these databases.
The results deliver refined optimization by providing a precise, easy, and up-to-date
way to model, forecast, and compare scenarios. One emerging issue is modeling
and understanding new network domains such as 5G networks. 5G networks are still
an immature domain: they lack standards and need more exploration. Here, ABM can be
of great help, especially for complex networks where the dynamics of a system are
nonlinear. In this chapter, we first discuss the ABM framework, which assists
complex adaptive system (CAS) modeling at four levels, and give a brief definition
of each level. Next, we discuss one of the levels of ABM, EABM, in detail. We evaluate
EABM on one of the case studies of 5G networks, i.e., IoT, using an ABM simulator.
In the next section, we highlight the importance of modeling in an agent-based simulator.
We also provide a brief comparison of the simulators commonly used by
researchers. For modeling our 5G case study of IoT, we have used NetLogo [1]. We
evaluate different research queries to provide a proof of concept. Next, we discuss
the case study in depth, including its modeling approach, design, and implementation.
We then discuss the results. Lastly, we conclude this section on EABM.
1 Riphah Institute of System Engineering, Riphah International University, Pakistan
2 COMSATS Institute of IT, Pakistan
1. Complex network modeling (CoNeM) level is for developing models using inter-
active data of various system components. This level of the framework involves
the use of complex networks to model, visualize, simulate and analyze any CAS.
2. EABM level is for developing agent-based models for assessing the feasibil-
ity to model for further research. This can, e.g., be useful for developing
proof-of-concept models such as for funding applications without requiring an
extensive learning curve for the researchers. The EABM modeling paradigm
allows researchers to experiment and develop proof-of-concept models of CAS
with the goal of performing experimentation for improving understanding about a
particular real-world complex system. Section 8.1.1.1 discusses EABM in detail.
3. Descriptive agent-based modeling (DREAM) is for developing descriptions of
agent-based models by means of using templates and complex network-based
models. Building DREAM models allows model comparison across scientific
disciplines. DREAM allows researchers to develop semi formal, formal or
pseudo-code-based specifications coupled with complex network representations
of agent-based models allowing models to be better described for communication
across disciplines without requiring the same usage of terminology.
4. VABM using virtual overlay multi-agent system (VOMAS) is for developing a
verified and validated model in a formal manner. This level involves the creation
of a VOMAS for ensuring the validity of the simulation model by checking its
conformance with the real world. Section 8.1.5 discusses VABM and VOMAS
in detail.
gives a brief overview of EABM features. Most agent-based models start out as
exploratory in nature. Examples of EABM studies include work by Palmer
et al. on modeling artificial economic life [5], by Becu et al. on modeling catchment
water management [6], by Holland [7], by Premo on ethnoarchaeology [8], and
work using ABM to model the spread of AIDS [9]. Another example of exploratory
agent-based modeling is a simulation of research considered as an emergent
phenomenon, presented earlier in [10]. EABM explores a wide range of possible
explanations. It builds on an understanding of patterns found in case study data collected
from controlled, repeatable experiments that can be tested. With EABM, more tests
and experiments can be run to explore the behavior of a system. The nonlinear
dynamics of the system are studied and analyzed in order to evolve the model, which
helps in making new discoveries.
● It is difficult to realize and test all situations for large-scale and complex
networks. Can EABM be used to analyze networks with such a huge number
of devices?
● How do metrics affect such large-scale networks, especially networks in new
domains?
● Can we use EABM to provide a proof of concept for such domains?
● How can a particular problem be addressed in such networks?
● How can we investigate the impact of an algorithm on communication
performance?
● Can we apply EABM to more such case studies?
4. On receiving the message agent, the neighbor devices can subsequently forward
the message based on the policy of the message agent, which will be described
below. Next, we discuss the design of the message agents.
Message agents
Message agents are of two types:
1. SACS setup message agents
2. Query message agents
Message agents forward messages to other devices. We describe both types of message
agents as follows:
(1) SACS setup message agents help in establishing the gradient value. They have a
simple task to perform. Once they are on a specific device, they use the algorithm
described in Figure 8.4 to hop from one device to the next, measuring the distance
(radius) as a hop count. When the hop count reaches the SACS radius, the SACS setup
process has reached the end of the line. During each hop, the agents also communicate
their current hop count to the computing device agent on which they are located, and
the computing device agent stores this information locally for future queries. It is
important to note here that the SACS gradient can be established with anything from a
small to a large radius, where the radius is measured in communication hop counts.
With smaller radii, queries looking for a particular content source may subsequently
spend a long time hopping randomly before they locate a SACS gradient “scent,” i.e.,
the hop value flags left behind by the SACS setup message agents in the initial phase.
(2) The second type of message agent is the query agent. The goals of the query
message agents differ from those of the SACS setup message agents. Query agents are
initiated by a user logged on to one of the computing devices, and their goal is to look
for information. The interesting thing to note here is that, according to small-world
complex network theory, a large number of real-world networks have nodes that can
reach most other nodes in a small number of hops. As such, when the owner of a set of
devices is looking for specific information, chances are that the query will locate this
content on one of the local computers or within a few hops of the person. Query agents
are thus responsible for content location discovery based on the “smell” of the content,
discovered by means of the flag values dropped by the previous SACS setup messages
during the initial setup phase.
[Figure: model interface controls. Sliders: prob = 48 (Probability), sacs-radius = 50
(SACS Range), n-gw = 10 (Number of Gateway nodes), sens-radius = 50 (Devices
Communication Range), ttl = 12, gw-cost = 51. Buttons: Setup, Searches, Content Sources.]
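The behavior of the SACS setup message agents can be illustrated with a hop-limited flood. The following Python sketch is an illustration only and is not the chapter's NetLogo code: the function name `setup_sacs_gradient`, the adjacency-dictionary representation of the network, the breadth-first visiting order and the six-device example chain are all assumptions made for this example.

```python
from collections import deque

def setup_sacs_gradient(neighbors, sources, sacs_radius):
    """Return {device: hop count to the nearest content source}.

    neighbors: dict mapping each device to the devices within its
    communication range (hypothetical representation).
    Devices farther than sacs_radius hops from every source get no entry,
    i.e. they carry no SACS "scent".
    """
    hops = {s: 0 for s in sources}            # content sources are at distance 0
    frontier = deque(sources)
    while frontier:
        device = frontier.popleft()
        if hops[device] == sacs_radius:       # end of the line for setup agents
            continue
        for nxt in neighbors[device]:
            if nxt not in hops:               # only unexplored devices are visited
                hops[nxt] = hops[device] + 1  # setup agent leaves its hop count
                frontier.append(nxt)
    return hops

# Hypothetical chain of six devices with one content source at device 0.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
gradient = setup_sacs_gradient(chain, sources=[0], sacs_radius=3)
print(gradient)
```

With a SACS radius of 3, devices 4 and 5 receive no hop value, so a query starting there would have to wander before picking up the gradient, which is the small-radius effect described above.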
8.1.3.2 Implementation
We simulate this model in NetLogo. Below, we describe the functions and components
of the implemented model. To implement the model, we need to declare (1) global
variables, i.e., metrics that are declared once and can be accessed by any procedure
or breed. Then we define (2) breeds, i.e., agentsets to which procedures can be
assigned; in our model, these are devices and messages. Finally, we explain (3) the
procedures or functions assigned to the defined breeds.
Global variables
We have several globally declared variables which include many input (from sliders)
and output variables such as counters as shown in Figure 8.5.
Breeds
We have created two main breeds, computing devices (i.e., breed devices device) and
message agents (i.e., breed messages message).
Procedures
1. Setup: The setup procedure first clears the previous simulation. It then creates
new devices and gateway devices and adjusts the locations of the devices. The
setup algorithm is shown in Figure 8.6.
2. Setup-devices: This function invokes the patches in the simulation. First of all,
each patch creates a random number between 0 and 100. Next, this number is
compared with the probability assigned via a global input variable. Based on a
comparison of these two numbers, the patch might sprout a device at this location.
Next, the device is initialized with certain values. Initially, the agent is given an
unexplored status. The SACS distance is assigned equal to the SACS radius
input global variable. Initially, all nodes are given “false” as the Boolean value
for both the start as well as the goal and the gateway variables. The shapes of the
computing devices are next adjusted to be random (one-of shapes), and they are
slightly randomly moved to ensure that most devices will not overlap previous
devices.
3. Setup-gateway: This function creates the gateway nodes from the previously
created computing device node agents by randomly selecting agents from these
devices. The number of agents to be made gateway nodes is based on a global
input variable. After a node is made a gateway, its color is changed. To show
this here, we have encircled the gateways in Figure 8.7.
4. No-overlapping: This function adjusts the location of each device so that the
devices do not overlap each other.
5. Searches: This function calls the Do-Search function; its input is taken from the
slider “Number of Searches.”
6. Do-Search: This function selects a device that is not a gateway and then calls
the “Create-Search-Node” function.
7. Create-Search-Node: This function creates a search device. It sets the Boolean
variable to 1, which means start, so this device becomes the starting point of the
query. The device is then colored blue and its location is saved in a temporary
variable. The function then hatches k query agents, where k is taken as input
from a slider, and calls the “Setup Query” function for each of them.
8. Setup query: This function assigns values to the query agents. It sets the shape
of the query agents to a circle and their color to green, and it increments the
global total query count.
9. Content sources: This function creates the content sources, taking the number
from the slider “Number of Content Sources.”
10. Create content sources: This function selects a device node that is neither a
gateway nor has a query, and then calls the “Create Content Source Node” function.
11. Create content source node: The selected device node is marked as a content
source (its Boolean variable is set to true) and its color is set to red.
After calling setup, searches and goals functions, the screen now looks as shown
in Figure 8.7.
12. Setup SACS: This function locates all the content source nodes and then calls
the “Setup-SACS-d” function.
13. Setup-SACS-d: This function is repeatedly called on agents until the gradient is
established. It stops if there are no devices in its range. Otherwise, it will also stop
if the calling argument “d” is greater than the global variable “SACS-radius.” If
the SACS-distance is non-zero, it changes the color of the node to gray as shown
in Figure 8.8. To highlight the connections, we show it with links. This way, the
content sources are the only ones which will visually stand out from the other
nodes. After setting these basic attributes, it compares the current argument “d”
with the SACS-radius global variable. If d is less than or equal to this value, it
creates a new agent set. This agent set is formed of all other nodes in a certain
communication radius but is based on a condition of being unexplored till now.
The communication radius is again a configurable value “sens-radius.” Now,
this agent set is not guaranteed to be non-empty so it is tested for emptiness. If
non-empty, each of these agents is asked to execute the same function again but
with an incremented “d” value. Thus, this process continues until the SACS
gradient is properly set up around all SACS content sources, reflecting how the
content sources self-advertise their contents. Afterwards, the simulation
screen can be observed to reflect this setup, as shown in Figure 8.9.
14. Move and move (forever): If there are any queries, this function asks them to
execute the move-RW function; otherwise, it terminates. Move runs for one
tick and move (forever) runs for continuous ticks.
15. Move-RW: The function first creates a list of all nodes within the given
sensing radius. This radius is basically equal in physical terms to the
communication radius. If there are no other devices in range, the query agent
simply dies. If there are other devices, the next step is to choose one of the
nodes as the next location. How this is done depends on the mode of query
movement selected in the user interface. Based on the Boolean variable “SACS?”,
the next node is either a random node, if the variable is false, or else one of
the nodes with the minimum value of the “SACS-dist” variable. Note that this
represents the gradient previously established by the SACS nodes. Once this node
is selected, the query agent first moves to it. Next, it updates its internal
variables, decrementing its TTL value and storing the tagged location node in
the “loc” variable. After moving and updating these values, it calculates the
cost associated with the move. If the location is a gateway node, the move incurs
a cost equal to “gw-cost,” a user-configurable input variable; otherwise, the
cost is simply incremented by 1. Finally, before this function terminates, it
calls the CheckCS function, which is explained next.
16. CheckCS: This function performs goal verification for queries. The query
checks whether it has reached either a gateway node or a content source. If the
current location agent satisfies either of these conditions, the overall number
of successful queries is incremented and the query agent terminates itself. If
it has not reached a goal node, it checks its TTL value. If the TTL value has
reached zero, the query agent also dies.
17. Move devices: This function is to test the effects of mobility of devices on SACS.
It randomly moves a percentage of the devices over time, and when the devices
are out of range, it calls setup-SACS again to reset the SACS setup messages.
However, here the total number of queries will be constant and instead the cost
of communication as well as the number of successful queries will be evaluated in
depth.
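The Move-RW and CheckCS procedures in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's NetLogo implementation: the function `run_query`, the dictionary-based network, the `default_dist` given to devices that carry no SACS value, and the example chain are hypothetical.

```python
import random

def run_query(start, ttl, neighbors, sacs_dist, gateways, sources,
              use_sacs=True, gw_cost=5, default_dist=10**9, rng=None):
    """Walk one query agent until it succeeds, expires or gets stranded.

    Returns (success, cost, final_location).
    """
    rng = rng or random.Random(0)
    loc, cost = start, 0
    while True:
        candidates = list(neighbors[loc])
        if not candidates:                    # no device in range: the query dies
            return False, cost, loc
        if use_sacs:                          # follow the gradient: minimum SACS distance
            best = min(sacs_dist.get(d, default_dist) for d in candidates)
            candidates = [d for d in candidates
                          if sacs_dist.get(d, default_dist) == best]
        loc = rng.choice(candidates)          # random walk step (ties broken randomly)
        ttl -= 1
        cost += gw_cost if loc in gateways else 1
        if loc in gateways or loc in sources:  # CheckCS: goal reached
            return True, cost, loc
        if ttl <= 0:                          # CheckCS: TTL exhausted
            return False, cost, loc

# Hypothetical chain of six devices; the content source is device 0 and the
# gradient values are the hop counts left by the setup agents (radius 3).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
scent = {0: 0, 1: 1, 2: 2, 3: 3}
found = run_query(4, 6, chain, scent, gateways=set(), sources={0})
expired = run_query(4, 3, chain, scent, gateways=set(), sources={0})
print(found, expired)
```

In this example the query starting at device 4 reaches the source in four hops when its TTL allows it, and expires one hop short of the source when the TTL is 3.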
Experiment 2
We vary both the SACS value and the TTL value. We run these simulations for 1,000
and repeat each simulation 50 times. Table 8.4 shows the statistics.
Experiment 3
We vary the SACS value while the TTL is kept constant. We run these simulations for
250 and repeat each simulation 50 times. Table 8.5 shows the statistics.
Experiment 4
We vary both the SACS value and the TTL value. We run these simulations for 1,000
and repeat each simulation 50 times. Table 8.6 shows the statistics.
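The experiment design above (sweep the SACS radius and TTL, repeat each setting 50 times, then summarize) could be organized as in the following Python sketch. It is only an illustration: `sweep` and `toy_run` are hypothetical names, `toy_run` is a stand-in that does not implement the SACS model and produces no real results, and the 95% interval uses a normal approximation chosen for this example.

```python
import itertools
import random
import statistics

def sweep(run_once, sacs_values, ttl_values, repetitions=50, seed=0):
    """Run run_once(sacs_radius, ttl, rng) `repetitions` times per setting.

    Returns {(sacs_radius, ttl): (mean, half_width)}, where half_width is the
    normal-approximation 95% confidence half-width of the mean.
    """
    results = {}
    for sacs_radius, ttl in itertools.product(sacs_values, ttl_values):
        # one reproducible random stream per parameter setting
        rng = random.Random(seed * 1_000_003 + sacs_radius * 1_009 + ttl)
        samples = [run_once(sacs_radius, ttl, rng) for _ in range(repetitions)]
        mean = statistics.mean(samples)
        half_width = 1.96 * statistics.stdev(samples) / repetitions ** 0.5
        results[(sacs_radius, ttl)] = (mean, half_width)
    return results

def toy_run(sacs_radius, ttl, rng):
    """Placeholder for one simulation run; NOT the SACS model."""
    return sacs_radius + ttl + rng.random()

table = sweep(toy_run, sacs_values=[0, 10, 20, 30, 40], ttl_values=[10, 20, 30, 40])
print(len(table), table[(20, 10)])
```

In practice `run_once` would wrap a call into the simulator and return the output metric of interest, such as the number of successful queries or the communication cost.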
[Figure: plot with x-axis SACS-radius (0 to 40).]
8.1.4.4 Discussion
The first set of results details the variation of the SACS-radius and its effect on the
number of successful queries. In Figure 8.10, there are five values for the SACS-radius,
ranging from 0 to 40. Here, the value 0 implies that SACS is not used. Once the SACS
radius reaches 20, the number of successful queries increases drastically, and it keeps
increasing as the SACS radius increases.
The results of the next experiment are shown in Figure 8.11. Here, we increase the
SACS radius and observe the effect on the cost of executing the queries. The number of
queries is very high when SACS is not used, decreases once we start using it and
keeps decreasing as the SACS radius increases.
Next, in Figure 8.12, we plot the TTL value against the cost in number of queries.
We want to see whether the TTL causes a rise in the number of successful queries
and what effect this has. The data is color coded by SACS radius. We see that the
number of queries (the cost) is high when no SACS is used.
Higher TTL values give more SACS radius values.
Figure 8.13 plots the mean number of successful queries against the TTL value. We
can note here that, for TTL = 10, SACS appears to have the best possible successful
query values. The differences between all SACS values are initially very small for
[Figures: plots of the 95% CI of the number of queries versus SACS-radius (0 to 40);
the number of queries versus TTL value, grouped by SACS radius (0, 10, 20, 30, 40);
and bar charts versus TTL value (10 to 40) for Group 1 to Group 4.]
8.1.5 Conclusion
This section gave an in-depth account of exploratory agent-based modeling with a case
study of a 5G network modeled in the ABM simulation tool NetLogo. We discussed
exploratory ABM and described how it can be used for exploring new domains. Using
EABM, we provided a proof of concept, identified problems and analyzed them on an
agent-based model. We used an IoT 5G network as a use case. To the best of our
knowledge, this network case study cannot easily be modeled using any other simulator
because this particular domain is still unexplored in various ways. We saw that it is
easy to actualize and test all situations for large-scale and complex networks using
ABM. Our EABM of the SACS algorithm demonstrates how CAS researchers can use ABM to
perform extensive simulation experiments using parameter sweeping to come up with
comprehensive hypotheses. Our results demonstrated the effective testing and
validation of various exploratory hypotheses. We demonstrated how one hypothesis can
lead to the next and how simulation experiments can be designed to test these
hypotheses.
We next discuss another important level of the modeling framework, i.e., VABM. For
EABM, we designed, implemented and experimented with a case study of 5G networks
to explore futuristic approaches and the ease of implementing them on a network that
is still under development. As the ABM approach has become an attractive and efficient
way of representing large-scale complex systems, the V&V of these models has come
into question. Here, we briefly explain VABM with the help of a case study of 5G
networks modeled in an agent-based simulator called NetLogo [1]. We have already
discussed the NetLogo simulator in Section 8.1.2.1. One of the use cases of 5G
networks is IoT; see Section 8.1.3 for further details. Using validated ABM, we shall
check whether a modeled case study is a credible solution. Simulation provides many
advantages for modeling complex systems, but once a system is simulated, the question
arises whether the simulated model can be trusted. A reliable simulated model with
random behavior must correlate with the real world or with the behavior that is
expected of it. Therefore, a simulated model must first be verified, i.e., checked to
be working as expected, and then validated against the requirements of the assigned
parameters.
8.2.1 Introduction
ABM has become a significant and efficient way of modeling systems, especially
complex systems. Modeling explains and makes it easier to understand real-world
networks that are costly to study or large in scale. A successful simulation that
is able to produce a sufficiently credible solution can be used for prediction. It is
neither feasible in terms of execution nor necessary (many components do not have
much impact on the framework) to construct a simulation model that captures all the
detail and behavior of the real system, so a few assumptions must be made about the
framework to build a simulation model. Therefore, a simulation model is an abstract
representation of a physical system, intended to enhance our ability to understand,
predict or control the behavior of the system. However, the abstractions and
assumptions introduce inaccuracies into the simulation model. One of the important
tasks after simulating a system is determining how accurate the simulation model is
with respect to the real system. There are no set rules or models for validating a
simulated system, although researchers and practitioners in industry use various
techniques to verify and validate their simulated systems.
[Figure: verification and validation of a simulation model. Labels: system
requirements, model input, process, model output, system, verification, validation.]
In the previous section on EABM, we discussed the levels of the ABM framework. We
discussed EABM in depth and explored a case study of a network still under
development, i.e., the IoT in 5G networks. We also highlighted the importance and
purpose of using the NetLogo simulator and provided a brief comparison of different
ABM simulators. We used NetLogo to simulate the IoT case study and evaluated different
research questions to provide a proof of concept. In this section, we validate a
similar simulated 5G network and see whether the model is feasible. We validate one
of the use cases of 5G networks, i.e., IoT. Next, we discuss the case study in depth,
along with its modeling approach and design. We then discuss the results. Lastly, we
conclude this section on VABM.
are not known was selected as elements of what are called the parameters. The
parameters were changed, observed over the networks and then simulated. Many
principles and techniques of model V&V have been presented, e.g., in [20]. However,
it is difficult and time-consuming to use all possible techniques for validating every
model that is developed. Modelers therefore have to choose appropriate techniques
that assure acceptable accuracy and credibility of their model. We use VOMAS as
proposed by [4]. This scheme is designed so that it can be implemented on any kind of
agent-based model. VOMAS is able to monitor position-based (spatial) as well as
non-position-based agents. Subject matter experts (SMEs) are individuals who can lend
validity to a model: an SME can give an accurate specification as well as analyze the
outputs and simulation runs of a model. The VOMAS approach allows experts to be
involved in the design of the agent-based model as well as of the custom-built VOMAS
from scratch. By involving SMEs, who are essentially equivalent to clients in the
software engineering domain, from the start of the project, the VOMAS approach makes
the simulation study a stronger candidate for success. “The Virtual Overlay Multi-agent System
is created for each simulation model separately by a discussion between the simulation
specialist as well as the SMEs. When the actual simulation is executed, the VOMAS
agents perform monitoring as well as logging tasks and can even validate constraints
given by the system designer at design time. VOMAS has been designed to cater
for both face validity as well as model assumptions and IO-transformations. Model
assumptions are ensured by the use of invariants. Face validation is ensured by means
of various techniques based on spatial and non-spatial validation and animation-based
validation. IO-transformations are ensured by means of essential logging components.
Thus, in other words VOMAS provides the complete validation package” [21]. It has
the following features:
elective models that one can use in ABMS. Other approaches to assessing models are
participatory model development with partners and the use of Turing tests.
Reference [22] presents several validation and verification techniques that are
widely used in industrial and research models of manufacturing, engineering and
business processes. These models are mostly discrete event simulation models and aim
to lower the cost of the system and improve the efficiency of its processes. Face
validity means asking the relevant experts whether the model's output is accurate
with respect to the system. Reference [4] explains validation methods and their
taxonomy in detail. The VOMAS technique is considered an extension of Companion
Modeling [5]; it involves both SMEs and simulation specialists in developing an
overlay multi-agent system for the purpose of validation. Even a well-modeled
simulation should not be considered a full depiction of a real system. The reasons
range from the non-availability of complete data sets to weather conditions and
complex and costly parameters that can only be tested in real situations. Some level
of abstraction is always involved in simulating a model. There is no set or defined
way of verifying and validating a model. Particular techniques may be chosen depending
on the paradigm of the model.
1. To monitor, during the actual execution of the simulation experiments, the
various parameters specified by the SME during the design process.
2. To report any extraordinary values or violations of invariants, again
specified on the basis of interactions with the SME.
3. To log the activities of agents during simulation experiments. These logs are
provided to SMEs for post-simulation data analysis.
Thus, in other words, VOMAS is based on an interactive process going back
and forth between the simulation experiments and the analysis of logs by SMEs and
simulation specialists.
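The three features listed above (monitoring, reporting invariant violations and logging) can be illustrated with a small overlay object. The Python sketch below is an assumption-laden illustration rather than the VOMAS implementation: the class `VomasMonitor`, its methods and the two example invariants are hypothetical and were not specified by an SME.

```python
class VomasMonitor:
    """Minimal overlay that logs observed state and checks invariants."""

    def __init__(self):
        self.invariants = {}   # name -> predicate(state) returning True if satisfied
        self.log = []          # (tick, state) records for post-simulation analysis
        self.violations = []   # (tick, invariant name) records reported to the SME

    def add_invariant(self, name, predicate):
        self.invariants[name] = predicate

    def observe(self, tick, state):
        self.log.append((tick, dict(state)))        # feature 3: logging
        for name, predicate in self.invariants.items():
            if not predicate(state):                # features 1 and 2: monitor, report
                self.violations.append((tick, name))

# Hypothetical invariants for illustration only.
monitor = VomasMonitor()
monitor.add_invariant("ttl_non_negative", lambda s: s["ttl"] >= 0)
monitor.add_invariant("successes_bounded", lambda s: s["successful"] <= s["total"])

monitor.observe(1, {"ttl": 3, "successful": 0, "total": 5})
monitor.observe(2, {"ttl": -1, "successful": 6, "total": 5})
print(monitor.violations)
```

Keeping the invariants as named predicates means that the checks agreed with the SME at design time can be listed, reviewed and extended without touching the simulation logic.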
validity, we have to use a different approach. One possible approach in the absence
of real data is thus to perform cross-model validation or “docking” of the models.
8.2.3.5 VOMAS agent design
A naïve way of designing a VOMAS could be to have each device calculate SACS
value over time. We have previously described how the algorithm works.
Output metrics: The output parameters that are of interest in the evaluation of
SACS are as follows:
However, here the total number of queries will be constant, and instead the cost of
communication as well as the number of successful queries will be evaluated in depth.
8.2.6 Conclusion
In this part of the chapter, we applied one of the techniques for validating an
agent-based model. We validated a case study of 5G networks using the VOMAS
methodology. This VABM level of the framework builds upon previous framework
levels such as EABM. As a means of unifying all ideas and concepts, the methods and
case studies extensively involve the use of both agent-based models and complex
network models and methods in different scientific disciplines. We presented a case
study of the IoT in a 5G network, which has a complex communication structure,
demonstrating the broad applicability of the proposed methods, which involve building
customized validation schemes based on the particular case study. This network
involves several devices communicating with each other while taking several other
parameters into account. Unlike traditional validation exercises, VOMAS validation
involves in-simulation agents, which observe and, if needed, interact with the
simulation environment. The individual virtual agents validate the communicating
devices as the messages proceed through the network. We also verified how the network
evolves when the message agents are given a specific TTL or reach flagged devices.
References
[1] 2017 Roundup Of Internet Of Things Forecasts. (last accessed on 27-May-2018).
https://www.forbes.com/sites/louiscolumbus/2017/12/10/2017-roundup-of-internet-of-things-forecasts/#439ad5b41480/.
[2] 5G and Its Incredible Numbers that Will Rock the Global Economy. (last
accessed on 12-July-2018).
https://ict.io/la-5g-et-ses-chiffres-incroyables-qui-vont-faire-vibrer-leconomie-mondiale/.
[3] Andrews, Jeffrey G, Stefano Buzzi, Wan Choi, et al. What will 5g be? IEEE
Journal on Selected Areas in Communications 2014; 32(6). 1065–1082.
[4] Balci, Osman. Verification, validation, and testing. Handbook of simulation
1998; 10. 335–393.
[5] Becu, Nicolas, Pascal Perez, Andrew Walker, Olivier Barreteau & Christophe
Le Page. Agent based simulation of a small catchment water management in
Northern Thailand: description of the catchscape model. Ecological Modelling
2003; 170(2). 319–331.
[6] Gubbi, Jayavardhana, Rajkumar Buyya, Slaven Marusic & Marimuthu
Palaniswami. Internet of things (IoT): a vision, architectural elements,
and future directions. Future Generation Computer Systems 2013; 29(7).
1645–1660.
[7] Holland, John H. Studying complex adaptive systems. Journal of Systems
Science and Complexity 2006; 19(1). 1–8.
[8] Korkalainen, Marko, Mikko Sallinen, Niilo Kärkkäinen & Pirkka Tukeva. Sur-
vey of wireless sensor networks simulation tools for demanding applications.
In Networking and services, 2009. ICNS’09. fifth international conference on,
102–106. IEEE 2009.
[9] Laghari, Samreen & Muaz A Niazi. Modeling the internet of things, self-
organizing and other complex adaptive communication networks: a cognitive
agent-based computing approach. PLoS One 2016; 11(1). e0146760.
[10] Niazi, Muaz & Amir Hussain. Agent-based tools for modeling and simulation
of self-organization in peer-to-peer, adhoc, and other complex networks. IEEE
Communications Magazine 2009; 47(3). 166–173.
[11] Niazi, Muaz A. Emergence of a snake-like structure in mobile distributed
agents: an exploratory agent-based modeling approach. The Scientific World
Journal 2014.
[12] Niazi, Muaz A. Towards a novel unified framework for developing formal,
network and validated agent-based simulation models of complex adaptive
systems. arXiv preprint arXiv:1708.02357. 2017.
[13] Niazi, Muaz A & Amir Hussain. A novel agent-based simulation framework
for sensing in complex adaptive environments. IEEE Sensors Journal 2009;
11(2). 404–412.
[14] Niazi, Muaz A, Qasim Siddique, Amir Hussain & Mario Kolberg.
Verification & validation of an agent-based forest fire simulation model. In
Proceedings of the 2010 spring simulation multiconference, 1. Society for
Computer Simulation International. 2010.
[15] Niyato, Dusit, Marco Maso, Dong In Kim, Ariton Xhafa, Michele Zorzi &
Ashutosh Dutta. Practical perspectives on IoT in 5g networks: from theory
to industrial challenges and business opportunities. IEEE Communications
Magazine 2017; 55(2). 68–69.
[16] Palattella, Maria Rita, Mischa Dohler, Alfredo Grieco, et al. Internet of things
in the 5g era: enablers, architecture, and business models. IEEE Journal on
Selected Areas in Communications 2016; 34(3). 510–527.
[17] Palmer, Richard G, W Brian Arthur, John H Holland, Blake Le Baron & Paul
Tayler. Artificial economic life: a simple model of a stockmarket. Physica D:
Nonlinear Phenomena 1994; 75(1–3). 264–274.
[18] Premo, Luke S. Exploratory agent-based models: towards an experimental
ethnoarchaeology. In Digital discovery: exploring new frontiers in human
heritage. CAA. 29–36. 2006.
[19] Saxena, Navrati, Abhishek Roy, Bharat JR Sahu & HanSeok Kim. Efficient
IoT gateway over 5g wireless: a new design with prototype and implementation
results. IEEE Communications Magazine 2017; 55(2). 97–105.
[20] Siddiqa A, Niazi M. A novel formal agent-based simulation modeling frame-
work of an aids complex adaptive system. International Journal of Agent
Technologies and Systems (IJATS). 2013 Jul 1;5(3). 33–53.
[21] Wilensky, Uri. Netlogo 1999.
[22] Xiang, Xiaorong, Ryan Kennedy, Gregory Madey & Steve Cabaniss. Verifica-
tion and validation of agent-based scientific simulation models. Agent-directed
simulation conference, 47–55. 2005. ISBN: 1-56555-291
Chapter 9
Descriptive agent-based modeling of the
“Chord” P2P protocol
Hasina Attaullah¹, Urva Latif¹, and Kashif Ali¹
¹ Cosmose Research Group, COMSATS University, Pakistan
9.1 Introduction
environment, their behavior, and studying their behavior and the influence of agents
on the system as a whole. An ABM consists of agents and their specific model measures
and types:
1. Complex network modeling is used when interaction data is available; analysis
is carried out to observe emergent patterns in the CAS.
2. Exploratory agent-based modeling is used when interaction data is not available.
It is used for exploring existing concepts, authenticating concepts and predicting
future research directions.
3. Descriptive agent-based modeling is used to compare interdisciplinary models in
order to learn and enhance knowledge.
4. Validation-based modeling is used to predict simulation results according to the
case.
Hash               Key
H(192.168.23.1)    N0
H(192.167.1.1)     N3
H(192.168.22.2)    N4
H(192.169.23.4)    N6
H(191.167.1.3)     N8

Hash    Key
H(A)    K0
H(B)    K3
H(C)    K4
H(D)    K6
H(E)    K8
Chord ring with identifiers N0 to N12; finger tables are shown for nodes N0, N3, N4,
N6 and N8.

Node  Stored keys   Start  Interval  Successor
N0    K0, K12       1      (1,2)     3
                    2      (2,4)     3
                    4      (4,0)     6
N3    K1, K2, K3    4      (4,5)     4
                    5      (5,7)     6
                    7      (7,8)     8
N4    K4            5      (5,6)     6
                    6      (6,8)     6
                    8      (8,0)     8
N6    K5, K6        7      (7,8)     8
                    8      (8,10)    8
                    10     (10,0)    0
N8    K7, K8        9      (9,10)    0
                    10     (10,12)   0
                    12     (12,3)    0

Figure 9.1 Finger table at active nodes. Adapted, with permission, from
Reference [2]
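The key placement shown in Figure 9.1 follows the Chord rule that each key is stored at its successor, the first node whose identifier is equal to or follows the key on the ring. The short Python sketch below illustrates this rule; the function name `successor` and the use of a sorted list with `bisect` are assumptions made for this example, and the node and key identifiers are those shown in the figure.

```python
import bisect

def successor(identifier, nodes):
    """First node whose identifier is >= identifier, wrapping around the ring."""
    ring = sorted(nodes)
    i = bisect.bisect_left(ring, identifier)
    return ring[i % len(ring)]            # past the last node, wrap to the first

active = [0, 3, 4, 6, 8]                  # nodes N0, N3, N4, N6, N8
placement = {}
for key in [0, 1, 2, 3, 4, 5, 6, 7, 8, 12]:
    placement.setdefault(successor(key, active), []).append(key)
print(placement)
```

Key K12 wraps around the ring and is stored at N0, which matches the keys listed next to each node in the figure.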
9.2.7 Stabilization
As nodes frequently join and leave a Chord ring, the Chord network is dynamic, and the
key task of Chord is to execute these operations while preserving the ability to find
every key in the ring. When a node wants to join or leave the ring, the finger
tables should be updated accordingly so that they hold up-to-date information. This
ensures the efficient retrieval of information. To attain this objective, Chord needs
to maintain each node's successor correctly, and each key should be held by its
successor node. When joining or leaving a Chord ring, a node affects the ring by
changing the successor and predecessor of previously joined nodes. To keep these
entries correct, Chord uses the stabilize() function to update the changes in the
finger tables. For fast lookup of entries, it is also necessary for the finger tables
to be accurate.
Figure 9.2 Finger table at nodes after stabilization. Adapted, with permission,
from Reference [2]
● In stabilization, the join function is called first, and then the node's fingers
and predecessor are initialized. Suppose that, in Figure 9.2, N7 wants to join the
Chord ring; initially, the successor of N6 is N8 and the predecessor of N8 is N6.
When N7 runs stabilize(), it asks N8 for its predecessor and determines whether
that node should be N7's successor instead. Stabilize() then informs N7's
successor of N7's presence, giving the successor the chance to change its
predecessor to N7, as depicted in Figure 9.3.
● The finger tables of existing nodes are updated to reflect the changes. For
example, in Figure 9.2, the finger table of N4 is updated when node N7 is added to
the ring: the successor of node N6 is now recorded as N7 in the finger table of
node N4.
Figure 9.3 Successor and predecessor assignment after new node joining
● The last step is the transfer of item keys to the newly joined node. In Figure 9.2,
K7 is shifted from N8 to N7 when N7 joins the Chord ring. N8 is now responsible
only for K8, as K7 has been moved to the newly added node.
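The join and stabilization steps described above can be sketched as follows. This Python example follows the stabilize and notify rules of the Chord protocol in simplified form and is only an illustration: the `Node` class, the two-node starting ring (N6 and N8 only) and the decision to hand keys over inside `notify` are assumptions made here, and finger tables are omitted.

```python
def between(x, a, b):
    """True if x lies strictly inside the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b                 # interval wraps past zero

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None
        self.keys = set()

    def join(self, known_successor):
        """Simplified join: the successor is given instead of looked up."""
        self.predecessor = None
        self.successor = known_successor

    def stabilize(self):
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x            # a closer successor has appeared
        self.successor.notify(self)

    def notify(self, candidate):
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate
            # Assumption: hand over keys outside (candidate, self] at this point.
            moved = {k for k in self.keys
                     if not (k == self.id or between(k, candidate.id, self.id))}
            candidate.keys |= moved
            self.keys -= moved

# Hypothetical two-node ring N6 <-> N8, with N8 holding K7 and K8.
n6, n8 = Node(6), Node(8)
n6.successor, n6.predecessor = n8, n8
n8.successor, n8.predecessor = n6, n6
n8.keys = {7, 8}

n7 = Node(7)
n7.join(n8)
n7.stabilize()     # N8 learns about N7 and hands over K7
n6.stabilize()     # N6 discovers N7 through N8's predecessor pointer
print(n6.successor.id, n7.predecessor.id, n8.predecessor.id, n7.keys, n8.keys)
```

After both stabilization rounds, the successor of N6 is N7, the predecessor of N7 is N6, the predecessor of N8 is N7, and K7 has moved from N8 to N7 while K8 stays at N8.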
9.2.9 PeerSim
9.2.10 Literature review
Many variants of Chord have been proposed in the literature to improve and enhance
the functionality of the Chord P2P protocol. Some of the well-known work is reviewed
below.
time as compared to chord. Limitation of this approach is that only average query
response time is measured, other parameters like latency and failures are ignored.
Similarity-based data retrieval is a major issue in P2P systems. In M-Chord [36], the
generalized vector technique iDistance [37] is used to map data into a one-dimensional
space. The data are divided among the nodes of a ring network, and a similarity search
algorithm for query processing is proposed. Results show that it performs better than
Chord in query processing. A limitation of this work is that, during query processing,
the maximum hop count grows as the number of nodes in the network increases.
A search for data on a node succeeds only when the exact keyword is given, but in most
cases only some features of the data are known. Moreover, the keyword distribution is
mostly skewed: some keywords occur often and others rarely. To deal with this problem,
multi-keyword queries are implemented, and every node is assigned a keyword so that it
refers to the objects that have that keyword [38]. A keyword feature vector of size Q is
formed for the data and stored at different nodes in the Q-dimensional feature space. A
limitation is that the query efficiency of the system still needs improvement.
In Chord, the network grows over time and the time needed to query data increases.
Existing data query approaches support key-value-based search but not semantic-based
search. A small-world-based overlay for P2P search is proposed in [39], in which the
authors introduce the design of an overlay network named semantic small world (SSW).
SSW integrates four ideas: semantic clustering, dimension reduction, the small-world
network, and an efficient search algorithm; together these speed up semantic-based
search in P2P systems. The simulation results compare favorably with those of the
traditional Chord network. However, the proposed approach does not take the physical
network into account: if the nodes are far away from each other, the results would be
different.
The authors combined these protocols to build MESH-CHORD, which minimizes the problems
caused by mobility and achieves better results. Results show that message overhead is
reduced by about 40% in a static environment and that cross-layering increases the share
of successful operations to 94%, whereas with the original Chord it drops to 70%. If the
traffic is high, MESH-CHORD slows down.
topology information (IPv6 addresses carry this information) and uses it in the
construction of the DHT system used in Chord. It creates many small local Chord rings
using the hierarchical IPv6 address prefix and embeds them into a global Chord. This
approach does not define a node-joining mechanism.
In Chord, peers are not aware of the underlying physical paths to other peers, which
causes end-to-end delay because routing may follow long physical paths. In [50], IPv6
anycast (one-to-nearest-one-of-many) is applied to give the nodes of the Chord network
more information about the underlying network topology and thereby achieve better
routing efficiency. The approach is evaluated on 1,740 nodes that join and leave the
network randomly, using the relative delay penalty as the metric; the results show that
querying data across the network is 28% more efficient than in the original Chord. The
drawback is that the approach works only with ideal anycast and fails otherwise, and
ideal anycast is hard to implement in today’s networks.
balancing matrix. Attribute-value pairs are used to increase the ability to search for
content without using its canonical name. If the registration rate or data-hosting rate
of a node exceeds a specified threshold, or the handling matrix is in a dynamic state,
the registration may fail.
The number of hops (path length) and the actual time required for a lookup are the two
parameters that matter in key lookup. Enhanced bidirectional Chord and enhanced
bidirectional Chord with lookup-parasitic random sampling are proposed to reduce
lookup path length and latency, respectively [54]. The two-finger-table approach used
in both is expected to increase robustness. The enhanced bidirectional Chord protocol
can reduce path length by up to 36% and latency by up to 36%; the other approach can
reduce path length and latency by up to 33% and 63%, respectively. However, the
robustness of the proposed model is not evaluated.
Tai et al. [55] proposed LISP-PChord to increase scalability and trustworthiness.
Users keep changing their IP addresses, so the routing tables of the system must be
updated continuously. Pointer nodes are introduced and mapped to physical nodes using
LP and a genetic algorithm. There are three layers: pointer space, logical space, and
physical space. The pointer space achieves routing fairness and the desired load balance
through divisibility.
The work in [56] proposes WILCO, a wireless location-aware Chord-based overlay
mechanism for WMNs that is aware not only of peer availability but also of peers’
locations and services. Location awareness is achieved through a novel geographical
multilevel Chord-ID assignment to the MRs on grid WMNs. An improved finger table is
also proposed that exploits the geographical multilevel ID assignment to minimize the
underlay hop count of overlay messages. The proposed scheme outperforms the original
Chord and the state-of-the-art MeshChord in terms of lookup efficiency and
significantly reduces the overlay message overhead. Its algorithms efficiently deploy
P2P services, content replication, and updates, and use location awareness to improve
service quality.
In [57], an improved version of the Chord algorithm is proposed. As initially designed,
Chord does not consider the physical characteristics of the network, and network latency
is ignored when calculating the optimal path for a data query. These issues are addressed
by first analyzing and identifying the redundant information in the Chord finger table
and then modifying that redundant information so that it can be used more efficiently.
However, no nodes were added or removed at runtime, so the effects of such changes were
not recorded.
and individual-based models. The basic principle of ODD is to make models more
descriptive and understandable.
9.3.1 Purpose
The aim of this work is to design and implement an agent-based model of the Chord P2P
network protocol in NetLogo and to compare its results with an object-based model of
Chord in PeerSim. Chord can provide efficient search, data location, authentication, etc.
The agent-based model of Chord helps us acquire a deep understanding of Chord agents
and their procedures. Through this model, we can study the behavior of agents by
initializing variables to different values and simulating the results.
9.3.2.1 Agents/Individuals
There are four types of agents: nodes, update-nodes, seeker-nodes, and pings.
The node is the most vital agent of the model, as it represents a node of the network.
All active nodes have finger tables, and the data stored in those tables constitute the
state variables of the node. State variables of nodes are
Update-nodes handle finger table updates in the network. State variables of
update-nodes are
Seeker-nodes are nodes that are searching for a successor and predecessor. State
variables of seeker-nodes are
Pings are messages that nodes send to other nodes in the network to find a
successor and to connect their fingers to nodes.
9.3.2.3 Environment
The environment represents the hardware on which the Chord protocol runs. In our
model, the protocol runs on nodes, which can be routers, hubs, or gateways in a
real environment.
9.3.2.4 Collectives
All the nodes combine to form a ring using in-ring. All the nodes in the network
determine their successors, predecessors, and the locations of their fingers, and
links are formed. A key is looked up in the network by following the links between
the nodes.
in node hash id. The complexity of a key lookup is O(log N). A node joining or leaving
the network generates O(log² N) messages to maintain the finger tables. The scope of
the basic principles is at the sub-model level. The model provides insight into the
basic principles because, through agents, we can relate the working of the model to
real-world scenarios. The model uses previously developed theory of agent-based traits
from which system dynamics emerge. In this model, individuals are represented as
network nodes and their behaviors are tracked over time.
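To make the O(log N) lookup cost concrete, the following Python sketch builds finger tables for a static ring and counts the hops of each lookup. It is a simplified illustration under stated assumptions, not the chapter's model: identifiers are drawn at random instead of being hashed, there is no churn, and the names `in_open`, `build_fingers`, `lookup`, and `average_hops` are hypothetical.

```python
import bisect
import random


def in_open(x, a, b):
    """True if x lies in the ring interval (a, b), wrapping past zero."""
    if a < b:
        return a < x < b
    return x > a or x < b


def build_fingers(node_ids, m):
    """Return {node: [successor(node + 2**i) for i in range(m)]}."""
    ids = sorted(node_ids)

    def successor(k):
        i = bisect.bisect_left(ids, k % 2 ** m)
        return ids[i % len(ids)]          # wrap to the smallest id

    return {n: [successor(n + 2 ** i) for i in range(m)] for n in ids}


def lookup(start, key, fingers):
    """Route `key` from `start`; return (responsible node, hops taken)."""
    node, hops = start, 0
    while True:
        succ = fingers[node][0]
        if in_open(key, node, succ) or key == succ:
            return succ, hops
        # Forward to the closest finger that precedes the key.
        node = next(f for f in reversed(fingers[node]) if in_open(f, node, key))
        hops += 1


def average_hops(n_nodes, m, n_queries, seed=1):
    """Mean hop count over random (start node, key) pairs on a random ring."""
    rng = random.Random(seed)
    ids = rng.sample(range(2 ** m), n_nodes)
    fingers = build_fingers(ids, m)
    total = 0
    for _ in range(n_queries):
        _, hops = lookup(rng.choice(ids), rng.randrange(2 ** m), fingers)
        total += hops
    return total / n_queries
```

Because every forward at least halves the remaining distance to the key on the identifier circle, a lookup in this sketch never needs more than m hops, and calling `average_hops` with growing `n_nodes` gives a mean that grows roughly with the logarithm of the network size.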
9.3.4.2 Emergence
The nodes connect to form a ring. For a lookup, the key is forwarded through the
network over the links formed between nodes.
9.3.4.3 Adaptation
When a node leaves or joins the network, the finger tables of all nodes associated with
that node are updated. The updating process ensures successful key lookups, because a
successful lookup requires the finger table of each node to hold correct data.
9.3.4.4 Objectives
The adaptive trait of updating finger tables increases the success of key lookup.
Lookup success is measured on two bases: the number of hops and whether the
particular key is found.
9.3.4.5 Learning
When a node leaves a network, its predecessor and successor update their finger tables
and inform other nodes about it.
9.3.4.6 Sensing
When a node wants to join the network, it sends messages to nearby nodes and
connects to the nearest one. Its immediate successor on the circle is then found, and
the rest of the network is updated through stabilization.
9.3.4.7 Stochasticity
The processes of node joining and leaving are stochastic; they have no fixed
frequency.
9.3.4.8 Interaction
There are direct interactions between nodes. These interactions are represented
through links. For looking up a key in the network, these direct links are used.
9.3.4.9 Collectives
All the nodes join to form a ring-shaped structure, and each node affects this collective
work. Each node is connected to its successor, predecessor, and other nodes through
its fingers.
9.3.4.10 Observation
All the nodes join to form a ring-shaped structure, and each node affects this collective
work. Each node is connected to its successor, predecessor, and other nodes through
its fingers.
9.3.5 Initialization
Hash degree, number of nodes, and add-nodes are the state variables. Initially, all nodes
are defined but are not connected to each other. The initial number of nodes can be
set through a slider. Initialization can vary among simulations.
9.3.7 Sub-models
Dividing the model into sub-models makes the code easier to understand.
9.3.7.1 Set-up
It is called to clear previous variables and set up the environment for a new simulation.
9.3.7.2 Init-node
It initializes nodes and sets their state variable values.
9.3.7.3 Create-network
It creates a ring of some nodes; the rest of the nodes then keep joining the network.
9.3.7.4 Go
It is the main procedure that runs the simulation. All other main procedures, such as
start messages, update messages, and maintenance, are called from it.
9.4.3 Flowchart
A flowchart describes an algorithm or process by giving the step-by-step information
needed for it to work. The flowchart shown in Figure 9.7 explains the steps of initial
network formation in the Chord algorithm. First, it initializes the network size, the
finger size, and the number of nodes that can join or leave the network. The hash
function is used to obtain the hash value of each node’s IP. After hash IDs have been
assigned to the nodes, the nodes try to find their successors using seeker messages.
The fingers of the nodes are placed at the respective nodes in the ring. When a node
leaves or joins, the maintenance procedure is called to update the finger table
(Table 9.4).
Join Node
Find successor
Update table
Maintain finger table
Node agent
Node agents are the main agents of the Chord protocol, as depicted in Table 9.5. They
represent nodes in the network, are placed in a ring, and communicate with each other
using messages. The breed Node is the first to be explained in the pseudo-code-based
specification. It has six variables: hid, predecessor, successor, fingers, in-ring, and
next. The variable “hid” is the hash id, an identification number assigned to a node and
generated by the hash algorithm SHA [25]. Predecessor is the id of the previous node of
any given node, and successor is the id of the next node in the network. The variable
“fingers” holds the nodes that are linked to a specific node at a given time in its
finger table. The variable in-ring is used to create the ring of nodes. The last
variable, “next,” is the next node entry in the finger table.
[Flowchart (Figure 9.7) steps: Start; Network size, number of nodes added/removed;
Assign hashes to nodes and keys; Find Successor; Update finger table; Maintenance;
i++; Receive update (Yes/No); End]
The next breed is “seeker-Node.” The seeker-Node, shown in Table 9.6, represents a node
that wants to join the ring network. It is used to find the successor of a node and to
seek a key in the network. It has five variables. The “sender” variable identifies the
node that wants to know its successor. “seeking” is the data key that the node wants to
find, and “destination” is the successor of the sender node. The variable “entry” is the
entry in the finger table, and the last variable, “in-ring,” is used to join the ring.
The next breed is “update-Node,” which is used to respond to the “seeker-Node” breed
by sending a message to the seeker-Node. The message contains either the address of
the desired successor or, if the desired successor is not found, the address of the
closest preceding node. It has four variables, “connectTonode,”
ABM Agents
ABM Globals
Globals InputGlobals
Globals OutputGlobals
Agents Agent-Attributes
Agents Agent-Breed
ABM BS-Expts
ABM Procedures
BS-Expts network-size
Agent-Breed Node
Agent-Breed Update-Node
Agent-Breed Seeker-Node
Agent-Breed Ping-Node
InputGlobals network-size
InputGlobals stabilize-count
InputGlobals Speed
Node Node-Attributes
Update-Node Update-Node-Attributes
Seeker-Node Seeker-Node-Attributes
Node-Attributes hid
Node-Attributes predecessor
Node-Attributes successor
Node-Attributes fingers
Node-Attributes in-ring
Node-Attributes next
Update-Node-Attributes ConnectToNode
Update-Node-Attributes in-ring
Update-Node-Attributes destination
Update-Node-Attributes entry
Seeker-Node-Attributes sender
Seeker-Node-Attributes seeking
Seeker-Node-Attributes in-ring
Seeker-Node-Attributes destination
Seeker-Node-Attributes entry
Procedure setup
Procedure init-node
Procedure create-network
Procedure start-messages
Procedure lookup-messages
Procedure maintenance
Procedure find-successor
Procedure join-node
Procedure lookup
Procedure stabilize
Procedure notify
Procedure lookup-ping
Procedure fix-fingers
Procedure report
Procedure init-fingers
Procedure receive-update
9.4.4.2 Globals
Next, we explain the globals used in our Chord simulation model. We have four global
input variables, described in Table 9.9. The variables “network-size,” “node-join,”
“node-leave,” and “update-frequency” are taken from sliders and are provided by the
user. The other two variables, “speed” and “stabilize-count,” are declared globally in
the program. The variable “stabilize-count” counts how many times the procedure
stabilize is called, and the variable “speed” is used to calculate the distance between
two nodes (Table 9.9).
Sliders:
network-size : total number of nodes in the ring
node-join : number of nodes that want to join the network
node-leave : number of nodes that want to leave the network
update-frequency : time after which stabilize is called to update the finger table
Switch:
speed : distance between two nodes
stabilize-count : counts the calls to stabilize
9.4.4.3 Procedures
The first procedure to be called is setup. Setup is the key procedure and must be
called at the start of every simulation. Its purpose is to clear previously declared
environment variables and to prepare the workspace for a new simulation. It clears the
output generated by the previous simulation and resets the ticks as well. In our case,
the setup procedure calls the init-node procedure, which initializes the nodes. After
the nodes are initialized, a message is sent to each node to set its label to its hash
id and its color to the given color. The create-network procedure is then called to
create an empty Chord network for new nodes. Finally, the nodes are arranged in a ring
using the layout-circle function (Table 9.10).
Init-node is a procedure that creates the nodes for the simulation. It is called from
the setup procedure. It creates nodes using the global variable network-size from the
slider and sets their successor and predecessor to nobody. It sends a message to each
node to set its shape to circle and calculates hash ids for the nodes as their
identification keys. Hash ids are calculated using 2^HashDegree, where the hash degree
is taken from a slider as a global variable. Finally, it initializes the finger table to
160 entries (Table 9.11).
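As an illustration of how a hash id can be placed on an identifier circle of size 2^HashDegree, the short Python sketch below hashes a node name with SHA-1 and reduces it modulo 2^HashDegree. This is an assumption about one reasonable way to do it, not the chapter's NetLogo code; the function name `chord_id` and the example address are hypothetical. SHA-1 produces 160-bit digests, which is consistent with a finger table of 160 entries when the full identifier space is used.

```python
import hashlib


def chord_id(name, hash_degree):
    """Map a node or key name onto the identifier circle [0, 2**hash_degree)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") % (2 ** hash_degree)


# Example: the same (hypothetical) address on a small and on the full circle.
small_id = chord_id("10.0.0.1", 8)     # falls in [0, 256)
full_id = chord_id("10.0.0.1", 160)    # the complete SHA-1 value
```

A small hash degree keeps the ring easy to visualize but makes identifier collisions more likely, so a simulation would need to check for duplicates when the circle is small.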
neighboring node and forward it accordingly to that specified speed. The procedure
“start-messages” is called from procedure go (Table 9.14).
The procedure “lookup-messages” is called from the procedure “go.” It sends a message
to the agent “Node” to get the successor list and then sorts it. After that, it calls
the procedure “receive-update” to update the finger table of a node and also pings all
the nodes in the successor list.
The procedure “maintenance” connects all newly joining nodes to the ring. It then
calls the procedures “stabilize” and “fix-fingers” to update the finger table
periodically (Table 9.15).
The specification of the procedure “find-successor” is explained for a given node.
In this procedure, if the requested id falls in the range between the given node and its
successor, the procedure returns the successor id to the requesting node. If it is not
in range, the procedure returns the id of the closest preceding node (Table 9.16).
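A single find-successor decision, as described above, might be sketched in Python as follows. The function names, the tuple-shaped reply, and the 16-position example ring are illustrative assumptions rather than the specification in Table 9.16; the reply mirrors the two cases in the text, either the successor id or the id of the closest preceding node.

```python
def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b


def find_successor_step(node_id, successor_id, finger_ids, target):
    """Answer one request at one node.

    Returns ("successor", id) when the target falls between this node and its
    successor, otherwise ("forward", id) naming the closest preceding node.
    """
    if in_interval(target, node_id, successor_id, inclusive_right=True):
        return ("successor", successor_id)
    for finger in reversed(finger_ids):        # farthest finger first
        if in_interval(finger, node_id, target):
            return ("forward", finger)
    return ("forward", successor_id)           # fallback: ask the successor


# Hypothetical ring with nodes 1, 4, 9, 11, 14 on a 16-position circle;
# node 1 has successor 4 and fingers [4, 4, 9, 9].
reply_near = find_successor_step(1, 4, [4, 4, 9, 9], target=3)
reply_far = find_successor_step(1, 4, [4, 4, 9, 9], target=12)
```

Here `reply_near` is `("successor", 4)` because 3 lies between node 1 and its successor, while `reply_far` is `("forward", 9)` because node 9 is the closest finger preceding 12.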
The “join-node” procedure is called from the “lookup-ping” procedure. It initializes
the successor and predecessor to “nobody” and initializes the fingers through the
procedure init-fingers. It generates messages from the requesting node to the
destination node and keeps flooding messages until a link is created between the two
nodes (Table 9.17).
The next procedure is “init-fingers.” In this procedure, the variable next of the
agent “Node” is initialized to “0.” It is called from the procedures “create-network”
and “join-node” (Table 9.18).
The procedure “stabilize” updates the finger table. It is called periodically to keep
the information up to date. Whenever a new node joins the network, the node calls the
“stabilize” procedure. After the procedure is called, the node sets its successor entry
and notifies its successor so that the successor acknowledges it as a predecessor. All
the nodes already connected in the ring also run “stabilize” periodically, which asks
the joining node to set its predecessor to the nearest node (Tables 9.19–9.21).
Procedure lookup-ping : new node sends messages on joining, this procedure handles it
9.4.4.4 Experiments
In this section, we explain the procedures related to the experiments (Tables
9.23–9.25).
Inputs:
Nodes added/removed: [5,10,−5,−10]
network-size: [5,000]
Stop condition: Ticks = 500
Final commands: none
Inputs:
Network-size: [500,1000,5000,10000]
nodes added/removed: [20,−5]
Stop condition: Ticks = 500
Final commands: none
[Plot “Chord-PeerSim”: x-axis: Network size (500 to 50,000); series: LOG (network
size), Average no of hops, Maximum no of hops]
Figure 9.9 Plot showing maximum number of hops, average number of hops and
log of network size in PeerSim
[Plot: Stabilization (y-axis) versus Nodes added/removed (x-axis, −10 to 10)]
maximum number of hop count, and log of network size is plotted. By varying the
network size over 500, 1,000, 5,000, 10,000, and 20,000, we obtain different values for
the hop count. The average hop count increases with the network size: it is 2 for a
network size of 500 and rises to 7.36 for a network size of 50,000.
In Figure 9.10, the interval graph of the stabilization count is plotted for varying
numbers of nodes added and removed. The network size is fixed at 5,000 in this case. The
interval graph shows that stabilization is called more frequently when nodes are
removed. When 10 nodes are removed, the maximum stabilization count is 881, whereas the
mean is 240. When no node is removed and new nodes are only added to the network, there
is no stabilization call.
Figure 9.11 shows that the number of failures rises when more nodes join and leave
the network.
[Plot: Failure (y-axis) versus Nodes added/removed (x-axis, −10 to 10)]
[Plot “Chord–NetLogo”: x-axis: Network size (500 to 10,000); series include Maximum
number of hops]
Figure 9.12 Plot showing maximum hop count and average hop count with varying
network size in NetLogo
[Plot: Stabilization (y-axis) versus Nodes added/removed (x-axis, −10 to 10)]
in some simulators to interpret nodes interaction. PeerSim also does not have a
GUI, but the NetLogo GUI is user friendly and easy to understand and interpret.
One can easily see the behavior of the network and the interaction of nodes. We can
add sliders to the interface and change any input at any time. In Figure 9.8, we can
see that nodes can be added or removed using sliders, and the network size can be set
according to the scenario requirements.
● Plot visualization: In PeerSim, plots cannot be visualized while the simulation is
running. In NetLogo, real-time monitors let us plot the behavior of the network.
● Results comparison: Comparing the results of PeerSim and NetLogo, we observe
similar trends. The average number of hops is greater in NetLogo than in PeerSim, and
the maximum hop count is also greater in NetLogo, as shown in Figure 9.9. The overall
trend is the same in both: the number of hops increases with the network size. The
slight difference in values is due to the different simulation environments. Regarding
the number of stabilization calls, in PeerSim there is no stabilization call when a node
joins, whereas in NetLogo the finger table is updated after a stabilization call, so
NetLogo has more stabilization calls than PeerSim. Even when no node is joining the
network, NetLogo still makes stabilization calls, because stabilize is called
periodically at the interval set by update-frequency.
Table 9.26 Table of eccentricity, betweenness and degree centrality measures of the
Chord network
Node  Betweenness  Degree  Eccentricity
ABM 0.660452 4 0
Agents 0.521469 3 0
Globals 0.238983 3 0
InputGlobals 0.191525 7 15.545455
OutputGlobals 0 1 0
Agent-Attributes 0 1 0
Agent-Breed 0.641243 5 11.075758
BS-Expts 0.098305 4 7
Procedures 0.489831 18 42.232323
node-join 0 1 0
node-leave 0 1 0
network-size 0 1 0
stabilize-count 0 1 0
Speed 0 1 0
update-frequency 0 1 0
network-sizeB 0 1 0
Node 0.209605 2 0
Update-Node 0.155367 2 0
Seeker-Node 0.183051 2 0
ping-Node 0.126554 2 0
Node-Attributes 0.191525 7 14.166667
Update-Node-Attributes 0.129944 5 10
Seeker-Node-Attributes 0.161017 6 12.090909
hid 0 1 0
predecessor 0 1 0
successor 0 1 0
fingers 0 1 0
in-ringn 0 1 0
next 0 1 0
ConnectToNode 0 1 0
in-ringu 0 1 0
destination 0 1 0
entry 0 1 0
sender 0 1 0
seeking 0 1 0
in-rings 0 1 0
destination-s 0 1 0
entry-s 0 1 0
setup 0 1 0
receive-update 0 1 0
init-node 0 1 0
create-network 0 1 0
start-messages 0 1 0
lookup-messages 0 1 0
maintenance 0 1 0
find-successor 0 1 0
join-node 0 1 0
lookup 0 1 0
stabilize 0 1 0
notify 0 1 0
lookup-ping 0 1 0
fix-fingers 0 1 0
report 0 1 0
leave-network 0 1 0
init-fingers 0 1 0
ping-Node-attributes 0.098305 4 7.888889
sender-p 0 1 0
destination-p 0 1 0
in-ringp 0 1 0
node-join-b 0 1 0
node-leave-b 0 1 0
The node Procedures is the largest of all because many procedures are attached to it.
The node InputGlobals is also large because many input global variables are attached
to it. Other large nodes are the agent-attribute nodes: each has many attributes
attached to it, so their size is also increased.
[Network diagrams: node labels as listed in Table 9.26]
Figure 9.15 Complex network model of Chord nodes resized according to degree centrality
[Bar chart: Degree (%) for each network node]
Figure 9.16 Plot showing the degree centrality
[Bar chart: Betweenness (%) for each network node]
decaying curve that shows that there are a few nodes with more links as compared to
other nodes.
In Figure 9.19, degree centrality is plotted as a cumulative distribution that
decreases with increasing degree. Figures 9.20 and 9.21 show eccentricity centrality
and betweenness centrality, both of which exhibit the same curve behavior as degree
centrality.
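The degree values and the cumulative curve can be reproduced for a small part of the network with a few lines of Python. The sketch below is illustrative only: it uses just the twelve edges around ABM, Agents, Globals, and Agent-Breed from the edge list given earlier, so its fractions are not those of Figure 9.19, and the names `EDGES`, `degrees`, and `cumulative_distribution` are hypothetical.

```python
from collections import Counter

# Subset of the ABM edge list (parent, child); the full network has more edges.
EDGES = [
    ("ABM", "Agents"), ("ABM", "Globals"), ("ABM", "BS-Expts"),
    ("ABM", "Procedures"),
    ("Globals", "InputGlobals"), ("Globals", "OutputGlobals"),
    ("Agents", "Agent-Attributes"), ("Agents", "Agent-Breed"),
    ("Agent-Breed", "Node"), ("Agent-Breed", "Update-Node"),
    ("Agent-Breed", "Seeker-Node"), ("Agent-Breed", "Ping-Node"),
]


def degrees(edges):
    """Count the links attached to every vertex of an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def cumulative_distribution(deg):
    """Fraction of vertices having degree x or greater, for each observed x."""
    n = len(deg)
    return {x: sum(1 for d in deg.values() if d >= x) / n
            for x in sorted(set(deg.values()))}


DEG = degrees(EDGES)
CCDF = cumulative_distribution(DEG)
```

In this subset, ABM has degree 4, Agents and Globals have degree 3, and Agent-Breed has degree 5, which agrees with the degree column of Table 9.26 for those four nodes; the remaining vertices appear with degree 1 here only because their own children were left out of the subset.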
[Bar chart: Eccentricity (%) for each network node]
Figure 9.18 Plot showing the eccentricity centrality
[Plot: Fraction of vertices having degree x or greater (y-axis) versus Degree
centrality (x-axis)]
9.5.6 Discussion (ODD vs. DREAM pros and cons of both) and
which is more useful for modeling the chosen P2P protocol
The ODD protocol gives the overview, design concepts, and details of Chord and
explains all the main elements, i.e., emergence, adaptation, objectives, prediction,
[Plot: Fraction of vertices having eccentricity x or greater (y-axis) versus
Eccentricity centrality (x-axis)]
References
[1] Balakrishnan H. Chord: a scalable peer-to-peer lookup service for internet
applications. In: ACM SIGCOMM. Citeseer; 2001.
[21] Niazi M, Hussain A. Agent-based tools for modeling and simulation of self-
organization in peer-to-peer, ad hoc, and other complex networks. IEEE
Communications Magazine. 2009;47(3):166–173.
[22] Shoham Y. Agent-oriented programming. Artificial Intelligence. 1993;60(1):
51–92.
[23] Alharbi H, Hussain A. An agent-based approach for modelling peer to peer
networks. In: Modelling and Simulation (UKSim), 2015 17th UKSim-AMSS
International Conference on. IEEE; 2015. p. 532–537.
[24] Batool K, Niazi MA, Sadik S, et al. Towards modeling complex wireless sensor
networks using agents and networks: a systematic approach. In: TENCON
2014-2014 IEEE Region 10 Conference. IEEE; 2014. p. 1–6.
[25] Burrows JH. Secure hash standard. Department of Commerce: Washington
DC; 1995.
[26] Avramidis A, Kotzanikolaou P, Douligeris C, et al. Chord-PKI: a dis-
tributed trust infrastructure based on P2P networks. Computer Networks.
2012;56(1):378–398.
[27] Sit E, Morris R. Security considerations for peer-to-peer distributed hash
tables. In: International Workshop on Peer-to-Peer Systems. Springer; 2002.
p. 261–269.
[28] Rottondi C, Panzeri A, Yagne C, et al. Mitigation of the eclipse attack in Chord
overlays. Procedia Computer Science. 2014;32:1115–1120.
[29] Srivatsa M, Liu L. Mitigating denial-of-service attacks on the chord over-
lay network: a location hiding approach. IEEE Transactions on Parallel and
Distributed Systems. 2009;20(4):512–527.
[30] Douceur JR. The Sybil attack. In: International Workshop on Peer-to-Peer
Systems. Berlin, Heidelberg: Springer; 2002. p. 251–260.
[31] Uruena M, Cuevas R, Cuevas A, et al. A model to quantify the success of a
Sybil attack targeting reload/chord resources. IEEE Communications Letters.
2013;17(2):428–431.
[32] Nechaev B, Korzun D, Gurtov A. CR-Chord: improving lookup avail-
ability in the presence of malicious DHT nodes. Computer Networks.
2011;55(13):2914–2928.
[33] Meng X, Liu D. GeTrust: a guarantee-based trust model in Chord-based
P2P networks. IEEE Transactions on Dependable and Secure Computing.
2016;10(7):134–147.
[34] Lu EJL, Huang YF, Lu SC. ML-Chord: a multi-layered P2P resource shar-
ing model. Journal of Network and Computer Applications. 2009;32(3):
578–588.
[35] Liu J, Zhuge H. A semantic-based P2P resource organization model R-Chord.
Journal of Systems and Software. 2006;79(11):1619–1631.
[36] Novak D, Zezula P. M-Chord: a scalable distributed similarity search structure.
In: Proceedings of the 1st International Conference on Scalable Information
Systems. ACM; 2006. p. 19.
[37] Jagadish HV, Ooi BC, Tan KL, et al. iDistance: an adaptive B+-tree based
indexing method for nearest neighbor search. ACM Transactions on Database
Systems (TODS). 2005;30(2):364–397.
[38] Joung YJ, Yang LW, Fang CT. Keyword search in DHT-based peer-to-
peer networks. IEEE Journal on Selected Areas in Communications.
2007;25(1):46–61.
[39] Li M, Lee WC, Sivasubramaniam A, et al. SSW: a small-world-based overlay
for peer-to-peer search. IEEE Transactions on Parallel and Distributed
Systems. 2008;19(6):735–749.
[40] Li M, Chen E, Sheu PC. A chord-based novel mobile peer-to-peer file sharing
protocol. In: Asia-Pacific Web Conference. Springer; 2006. p. 806–811.
[41] Woungang I, Tseng FH, Lin YH, et al. MR-Chord: improved chord lookup
performance in structured mobile P2P networks. IEEE Systems Journal.
2015;9(3):743–751.
[42] Canali C, Renda ME, Santi P, et al. Enabling efficient peer-to-peer resource
sharing in wireless mesh networks. IEEE Transactions on Mobile Computing.
2010;9(3):333–347.
[43] Zoels S, Despotovic Z, Kellerer W. On hierarchical DHT systems—an
analytical approach for optimal designs. Computer Communications.
2008;31(3):576–590.
[44] Chou JC, Huang TY, Huang KL. SCALLOP: a scalable and load-balanced
peer-to-peer lookup protocol for high-performance distributed systems. In:
Cluster Computing and the Grid, 2004. CCGrid 2004. IEEE International
Symposium on. IEEE; 2004. p. 19–26.
[45] Kaashoek MF, Karger DR. Koorde: a simple degree-optimal distributed hash
table. In: International Workshop on Peer-to-Peer Systems. Springer; 2003.
p. 98–107.
[46] Chou JY, Huang TY, Huang KL, et al. SCALLOP: a scalable and load-
balanced peer-to-peer lookup protocol. IEEE Transactions on Parallel and
Distributed Systems. 2006;17(5):419–433.
[47] Cuevas R, Uruena M, Banchs A. Routing fairness in chord: analysis and
enhancement. In: INFOCOM 2009, IEEE. IEEE; 2009. p. 1449–1457.
[48] Hong F, Li M, Wu M, et al. PChord: improvement on Chord to achieve
better routing efficiency by exploiting proximity. IEICE Transactions on
Information and Systems. 2006;89(2):546–554.
[49] Xiong J, Zhang Y, Hong P, et al. Reduce Chord routing latency issue in the
context of IPv6. IEEE Communications Letters. 2006;10(1):62–64.
[50] Dao LH, Kim J. A Chord: topology-aware Chord in anycast-enabled
networks. In: Hybrid Information Technology, 2006. ICHIT’06. International
Conference on. vol. 2. IEEE; 2006. p. 334–341.
[51] Rao W, Chen L, Fu AWC, et al. Optimal resource placement in structured
peer-to-peer networks. IEEE Transactions on Parallel and Distributed
Systems. 2010;21(7):1011–1026.
[52] Forestiero A, Leonardi E, Mastroianni C, et al. Self-chord: a bio-inspired P2P
framework for self-organizing distributed systems. IEEE/ACM Transactions
on Networking (TON). 2010;18(5):1651–1664.
[53] Gao J, Steenkiste P. Design and evaluation of a distributed scalable content
discovery system. IEEE Journal on Selected Areas in Communications.
2004;22(1):54–66.
[54] Wu YC, Liu CM, Wang JH. Enhancing the performance of locating data
in chord-based P2P systems. In: Parallel and Distributed Systems, 2008.
ICPADS’08. 14th IEEE International Conference on. IEEE; 2008. p. 841–846.
[55] Tai Z, Sheng W, Dan L. LISP-PCHORD: an enhanced pointer-based DHT to
support LISP. China Communications. 2013;10(7):134–147.
[56] Le-Dang Q, McManis J, Muntean GM. Location-aware chord-based overlay
for wireless mesh networks. IEEE Transactions on Vehicular Technology.
2014;63(3):1378–1387.
[57] Ding S, Zhao X. Analysis and improvement on Chord protocol for structured
P2P. In: Communication Software and Networks (ICCSN), 2011 IEEE 3rd
International Conference on. IEEE; 2011. p. 214–218.
[58] Grimm V, Berger U, Bastiansen F, et al. A standard protocol for describing
individual-based and agent-based models. Ecological Modelling. 2006;198
(1–2):115–126.
[59] R Core Team. R: a language and environment for statistical computing. 2013.
Chapter 10
Descriptive agent-based modeling of
Kademlia peer-to-peer protocol
Hammad-Ur-Rehman1∗ and
Muhammad Qasim Mehboob1∗
10.1 Introduction
Kademlia, a peer-to-peer (P2P) protocol based on distributed hash tables (DHTs),
offers a combination of desirable features that no other protocol offers simultaneously.
One notable feature concerns overhead: other protocols need a large number of messages
to learn about other nodes, whereas Kademlia minimizes the number of these messages [1].
Configuration information spreads automatically among neighboring nodes as a side
effect of key lookups. Nodes therefore have the knowledge they need to route a
query through low-latency paths. Another benefit is that the algorithm [1] used to
discover other nodes also resists basic denial-of-service attacks.
Being a P2P system, Kademlia can be modeled in the form of either complex
network-based [2,3] or agent-based models. Such models represent a complex system
in terms of its multiple components, their behavior and the communication among them
for management and other tasks. Communication between nodes in a P2P system like
Kademlia is complex in nature: finding a node, joining the network and storing a value
all add complexity to the protocol. For these reasons, we can confidently consider
Kademlia a complex system. In complex adaptive systems (CAS) [4], multiple nonlinear
components interact with each other, which leads to emergent behavior. Capturing this
emergent behavior requires modeling Kademlia as a CAS or, more fully, as CACOONS
(Complex Adaptive COmmunicatiOn Networks and environmentS) [5].
To understand a complex system thoroughly, modeling the system is a must [6].
Modeling CAS through traditional solutions is impossible because CAS are highly
robust [4] and have many variables. The emergent behavior of each entity involved
in a system can be better understood through complex system modeling. A system
1 COSMOSE Research Group, Computer Science Department, COMSATS University Islamabad, Pakistan
∗ Both authors contributed equally.
The rest of the chapter is structured as follows: Section 10.2 presents the background
and literature review; Section 10.3 gives the model design; Section 10.4 presents the
results and discussion; and Section 10.5 presents the conclusion and future work.
the particular <Key, Value> pair. To search for a particular value, the algorithm
has to know the key for that value and traverses the network in multiple steps.
Each step finds nodes that are closer to the key, until a node returns the needed
value or no closer nodes are found. Servers near any target key can also be located
using the node ID-based routing algorithm. Kademlia uses the XOR metric to calculate
distance, which is the source of many of its benefits. Because the XOR metric is
symmetric, Kademlia participants receive queries from the same distribution of nodes
that their routing tables contain. Systems like Chord cannot learn useful routing
information from the queries they receive because their metric lacks this symmetry [1],
and this asymmetry leads to inflexible routing tables. Kademlia can send a query to
any node within an interval, which lets it select routes based on latency or send
parallel asynchronous queries. In Kademlia, the nodes close to a particular ID are
located using a single routing algorithm from start to finish, whereas older systems
use one algorithm to get near the particular ID and another for the last hops. Of
all the existing systems, Kademlia most resembles Pastry's first phase. In its
second phase, however, Pastry switches to the numeric difference between IDs. Nodes
that are close under the second metric can be quite distant under the first, which
creates discontinuities at particular node IDs and has a negative effect on performance.
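The XOR metric and the step-by-step search described above can be illustrated with a small Python sketch. It is not the chapter's NetLogo model: the 4-bit IDs, the toy `contacts` table and the function names are assumptions made for illustration, and each step queries a single node rather than several in parallel.

```python
from typing import Dict, Iterable, List


def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs: their bitwise XOR read as an integer."""
    return a ^ b


def closest(ids: Iterable[int], target: int, k: int) -> List[int]:
    """Return the k IDs that are closest to target under the XOR metric."""
    return sorted(ids, key=lambda n: xor_distance(n, target))[:k]


def iterative_lookup(start: int, target: int,
                     contacts: Dict[int, List[int]], k: int = 2) -> List[int]:
    """Query ever-closer nodes until no un-queried node is left in the shortlist."""
    shortlist = closest(contacts[start], target, k)
    queried = {start}
    while True:
        pending = [n for n in shortlist if n not in queried]
        if not pending:
            return shortlist  # no closer, un-queried nodes remain
        node = pending[0]
        queried.add(node)
        # The queried node answers with the closest nodes it knows of.
        learned = closest(contacts.get(node, []), target, k)
        shortlist = closest(set(shortlist) | set(learned), target, k)


# Hypothetical toy network with 4-bit IDs: node ID -> IDs it knows about.
contacts = {
    0b0011: [0b0101, 0b1010],
    0b0101: [0b0011],
    0b1010: [0b1100, 0b0011],
    0b1100: [0b1110, 0b1010],
    0b1110: [0b1100],
}

result = iterative_lookup(0b0011, 0b1110, contacts)  # -> [14, 12]
```

The symmetry of the metric is visible directly: `xor_distance(a, b)` and `xor_distance(b, a)` are always equal, so a node that appears close to a sender also sees that sender as close.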
Figure 10.1 Kademlia binary tree. The black dot represents node 0011. Gray dots
show all subtrees in which node 0011 has a contact. Adapted, with
permission, from Reference [1]
Figure 10.2 Locating a node by ID. The node with prefix 0011 finds node 1110 by
successively learning of and querying closer nodes. The line segment at the
top represents the space of 160-bit IDs. The RPC messages made by 1110
are shown below it. The first RPC goes to node 101, which is already known
to 1110. Subsequent RPCs go to the nodes returned by the previous RPCs.
Adapted, with permission, from Reference [1]
when no node shares a unique prefix with the key, and also in the situation where
some of the given node's subtrees are empty.
10.2.4.4 Node
To route query messages and contact other nodes, nodes store information about
each other. For each 0 ≤ i < 160, every node keeps a list of <IP address, UDP port,
Node ID> triples for the nodes whose distance from itself is between 2^i and 2^(i+1).
These lists are called k-buckets. Each k-bucket is kept sorted by the time its nodes
were last seen: the head holds the least recently seen node, whereas the tail holds
the most recently seen node. Whenever a node receives a message, it updates the
appropriate k-bucket for the sender's node ID. For small values of i, the k-buckets
are generally empty. For large values of i, a list can grow up to size k.
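As a small illustration of the distance ranges above, the following Python sketch computes which k-bucket a contact falls into. The function name `bucket_index` is an assumption made here; the only fact it relies on is that a distance d with 2^i ≤ d < 2^(i+1) has exactly i + 1 significant bits.

```python
def bucket_index(own_id: int, other_id: int) -> int:
    """Return i such that 2**i <= (own_id XOR other_id) < 2**(i + 1)."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not keep itself in a k-bucket")
    # A distance in [2**i, 2**(i+1)) has exactly i + 1 significant bits.
    return distance.bit_length() - 1


# With 4-bit IDs: 0011 XOR 1110 = 1101 (13), and 8 <= 13 < 16, so i = 3.
i = bucket_index(0b0011, 0b1110)
```

Half of all possible IDs fall into the highest-index bucket, a quarter into the next one, and so on, which is consistent with the buckets for small i usually being empty.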
If the sending node already exists in the recipient's k-bucket, the recipient moves
it to the tail of the list. If the sending node is not in the appropriate k-bucket
and the bucket has fewer than k entries, the new sender is simply inserted at the
tail of the list. If the appropriate k-bucket is full, the recipient pings the least
recently seen node. If that node fails to respond, it is evicted and the new node is
inserted at the tail of the list. If, in contrast, the least recently seen node
responds in time, it is moved to the tail of the list and the new node is discarded.
As a result, nodes that are live are never removed from the list.
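The update rules above can be sketched in Python as follows. The class name `KBucket` and the `ping` callback (which returns True when the pinged node answers in time) are assumptions made for illustration; they are not taken from the chapter's NetLogo code.

```python
from typing import Callable, List


class KBucket:
    """One k-bucket: head = least recently seen, tail = most recently seen."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.nodes: List[int] = []

    def update(self, sender: int, ping: Callable[[int], bool]) -> None:
        if sender in self.nodes:
            # Known sender: move it to the tail.
            self.nodes.remove(sender)
            self.nodes.append(sender)
        elif len(self.nodes) < self.k:
            # Room left: insert the new sender at the tail.
            self.nodes.append(sender)
        else:
            oldest = self.nodes[0]
            if ping(oldest):
                # The old node is still live: keep it, discard the new sender.
                self.nodes.pop(0)
                self.nodes.append(oldest)
            else:
                # The old node did not answer: evict it and add the sender.
                self.nodes.pop(0)
                self.nodes.append(sender)


bucket = KBucket(k=2)
bucket.update(1, ping=lambda n: True)
bucket.update(2, ping=lambda n: True)
bucket.update(1, ping=lambda n: True)    # 1 moves to the tail: [2, 1]
bucket.update(3, ping=lambda n: True)    # 2 answers, 3 is discarded: [1, 2]
bucket.update(3, ping=lambda n: False)   # 1 is silent and evicted: [2, 3]
```

Preferring old live contacts over new ones is a deliberate design choice of the protocol: a newly appearing node cannot push established, responsive nodes out of a routing table.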
10.2.4.5 Protocol
Four remote procedure calls (RPCs), namely PING, STORE, FIND_NODE and FIND_VALUE,
constitute the Kademlia protocol.
● PING
In this RPC, a node sends a message to another node, and if it gets a reply, both
nodes update the appropriate k-buckets. This RPC is used to check whether a
node is live.
● STORE
The recipient node stores a <Key, Value> pair received from the sender so that
it can be retrieved later.
● FIND_NODE
In this RPC, the sender sends a 160-bit key and the recipient returns the k triples
it knows of that are closest to the received key. In the best case it returns k
triples, but it can return fewer if it knows of fewer than k nodes.
● FIND_VALUE
This RPC is equivalent to FIND_NODE if the recipient does not hold the
corresponding value. If the value is present, only the value is returned.
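The recipient side of the four RPCs can be sketched in Python as below. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the routing table is flattened into a single contact list instead of k-buckets, and the k-bucket updates that accompany every received message are left out.

```python
from typing import Dict, List, Tuple, Union

# A contact is an <IP address, UDP port, Node ID> triple.
Contact = Tuple[str, int, int]


class RpcNode:
    """Recipient-side handlers for PING, STORE, FIND_NODE and FIND_VALUE."""

    def __init__(self, node_id: int, k: int) -> None:
        self.node_id = node_id
        self.k = k
        self.contacts: List[Contact] = []   # flattened routing table (assumption)
        self.storage: Dict[int, str] = {}   # stored <Key, Value> pairs

    def ping(self) -> bool:
        """PING: a reply tells the sender that this node is live."""
        return True

    def store(self, key: int, value: str) -> None:
        """STORE: keep the pair for later retrieval."""
        self.storage[key] = value

    def find_node(self, key: int) -> List[Contact]:
        """FIND_NODE: up to k known contacts closest to key (XOR metric)."""
        return sorted(self.contacts, key=lambda c: c[2] ^ key)[: self.k]

    def find_value(self, key: int) -> Union[str, List[Contact]]:
        """FIND_VALUE: the value if held, otherwise behave like FIND_NODE."""
        if key in self.storage:
            return self.storage[key]
        return self.find_node(key)


node = RpcNode(node_id=0b0011, k=2)
node.contacts = [("10.0.0.1", 4000, 1), ("10.0.0.2", 4000, 6), ("10.0.0.3", 4000, 12)]
before = node.find_value(7)   # no value stored yet: the two closest contacts
node.store(7, "payload")
after = node.find_value(7)    # the stored value itself
```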
Figure 10.3 Routing table evolution. Initially, a node has a single k-bucket.
As the k-buckets fill, the bucket whose range covers the node ID splits. Adapted,
with permission, from Reference [1]
Figure 10.4 Adding contacts with bucket splitting. Adapted, with permission, from
Reference [12]
The P2P concept has been introduced in mobile networks and has many applications.
However, because of the highly robust, random network topology and the low
transmission range, an information retrieval problem arises. In [16], the authors
proposed an economical and automatic cache-based information retrieval approach
in which the cache is updated according to relevant factors. NS2 simulation
experiments show that this approach gives better results than earlier ones: the
average query response time and the number of network messages are both reduced.
One limitation is that the method was not extended to content-based search in
these networks (Figure 10.5).
[Flowchart, caption not recovered: given a key, find the closest non-empty k-bucket; for these nodes, up to k, check for closer nodes; when no un-queried nodes remain, return max(k) closer nodes.]
Botnets have become popular in recent years and have many applications, but they
have also emerged as one of the most severe forms of cyberattack. Against this
backdrop, the authors of [17] proposed a new botnet called AntBot, in which
command-and-control (C&C) information is spread across all bots. An aMule-based
distributed simulator is used for the implementation. Extensive simulations show
that AntBot operates resiliently against pollution-based mitigation. The authors
could also have evaluated a few other defense mechanisms, which they did not.
The paper [18] is about the management of content retrieval in KAD. Content search
is implemented in KAD using Kademlia DHTs. In P2P systems, information is lost
when node churn occurs. KAD already deals with this by publishing multiple
redundant copies of the information. In [18], the authors only tuned some parameters
and reported the resulting performance. Their results show that lookup performance
can be improved by coupling lookup and content retrieval, but latency remains the
same. They could additionally have explored more design parameters, e.g., the number
of parallel requests, the timeout and the round-trip delay.
P2P networks can be used in applications like file sharing. Peers have to share
and gather a huge amount of computing resources, so their average energy
usage is high. Other approaches [19], including proxy-based ones [20], have been used
to address this. The paper [21] addresses the energy problem by proposing a two-layer
model [22]. The lower layer is composed of the files to be shared, whereas the upper
layer is composed of DHTs, whose task is to index peers and files and to show the
availability of each. Simulation results show that the proposed system uses 50% less
energy than others [23,24] and that more than 80% of files start downloading without
any additional delay. The authors could have shortlisted the peers that have the
highest availability ratios.
Efficient and fast search operations are possible in P2P networks [25]. Their
topology is useful for querying or disseminating valuable information. Problems
occur during broadcasting when a network link fails or a bad node joins the network.
To solve this problem, the paper [26] presents a replication broadcast algorithm for
Kademlia [27]. This algorithm uses only the entries in Kademlia's routing tables.
The results show that replication can be used to enhance speed and reliability. The
drawback is that replication also increases network traffic. A trade-off between
broadcast time and network traffic cost could have been a solution for this.
In structured P2P systems, each peer may join or leave at any time. This is referred
to as churn. In distributed systems and in P2P systems, this high dynamicity causes
routing failures, loss of stored data and random peer views, so minimizing churn is
necessary. To minimize churn, the paper [28] proposed a zone replication technique
that is implemented on top of DHTs and consists of three steps: publishing, maintenance
protocols and searching. PeerfactSim.KOM was used for simulation; the results show
that the proposed solution produces shorter routes and better resistance to churn
than others. Dividing the proposed solution into three parts may, however, increase
the execution time compared with older systems.
Clustering has become a hot topic for P2P systems in recent years. Plain DHTs
have many applications, but they still fall short for some kinds of applications, e.g.,
multimedia. To support a wide range of applications, the authors of [29] presented
Echo and also addressed whether clustering improves querying performance.
Echo (based on the Cayley graph model [30]) is a framework that improves query
efficiency by combining DHT functionality [31] and homogeneity of load with
clustering. They also examined the effect of prefix clustering on congestion using
the notion of “congestion-freeness” [32]. The results show that, because this model
is Cayley dependent, other DHTs can also be implemented using it. One shortcoming is
that they could have maintained peer statistics to make the network strong and firm.
Kademlia and Chord [33] provide an effective way of finding other nodes in P2P
networks. However, lookup queries for finding resources can easily be disrupted along
the path by malicious nodes. In [34], a Reputation for Directory Services (ReDS)
framework is described that first tracks the lookup requests presented by other nodes
and then, based on this tracking, enhances lookups in redundant DHTs. The authors also
explore how shared reputation can work with ReDS in the context of free-rider
prevention [35]. Simulations showed that, over an extensive range of situations, ReDS
improved lookup success rates by 80% or more for Kad and Halo.
In 2008, the IETF P2PSIP effort was designing a protocol for distributed multimedia
services that combines the media session functionality of SIP with P2P resource
localization and decentralized distribution. At that time, the infrastructure
considered for P2PSIP scenarios was a single domain with internal connectivity. The
paper [36] proposes a hierarchical architecture, based on peers called “super peers,”
for interconnecting multiple P2PSIP domains. Every domain has at least one super
peer, which is a stable peer among all its peers, and together the super peers
from the different domains form an upper overlay layer. The PeerfactSim.KOM
simulator [37] was used to validate the routing state and routing performance of the
analytical model. The results show that peers' routing entries and the associated
routing state decrease significantly (by approximately 50%) as the number of domains
increases. A realistic scenario is simulated by setting up churn as explained in [38].
The selection mechanism for super peers can further increase performance.
In the last few years, the P2P network architecture has gained great popularity in
a variety of applications and services, for example, VoIP streaming and collaborative
computing applications. The essential requirements for deploying such networks are
security and integrity. Malicious peers in the network, as in a Sybil attack, can
poison the routing tables and make the retrieval, storage and routing processes time
consuming and extremely difficult. The paper [39] proposes a new trust- and
reputation-based technique that makes retrieving, storing and routing resources in a
Kademlia network more secure. The discrete event simulator DEUS is used for
simulation. The results show that when the trust-based algorithm is applied to
Kademlia's RPCs (PING, STORE, FIND), the network performs 20% better than with the
plain RPCs. The simulation used 200 Sybil nodes against 1000 true nodes. In future
work, it would be interesting to find a closed-form mathematical expression for the
optimal value of the balancing factor.
P2P networks play an important role in communication-oriented systems, but their
performance is significantly affected by the churn phenomenon, particularly in mobile
environments. Until now, no proper analysis has been presented for structured P2P
networks. In [40], the authors evaluate a Kademlia-based P2P system in a
communication-oriented environment in the presence of churn. The Nethawk EAST
simulator is used, and the simulation is conducted on a mobile platform. It is found
that, for a robust system under different levels of churn, a degree of 3 is enough for
both lookup parallelism and resource replication, and that, in terms of CPU load,
200 bytes or less were used for optimal energy consumption. In future, the approach
can be applied to larger settings to confirm the conclusions drawn in this paper.
In the literature, many DHTs have been studied vigorously, and many diverse
suggestions have been made for organizing peers in DHTs, but very few DHTs have
been implemented on a large scale in real systems. Previously developed crawler-based
P2P measurement systems limited the duration of crawls to a few days at best. The
system discussed in [41] is able to operate at a rate of one crawl every 5 min, and
peer behavior is measured in terms of up-time distribution and churn rate. The
findings indicate that, at high resolution, document-sharing peers (of Kad) leave and
join the network following a negative binomial distribution, while the session time of
a peer follows a Weibull distribution. ID repetition has not been taken into
consideration in this paper.
Kademlia and Chord are two relevant DHTs that are used in different P2P applications
to provide decentralized services. A good deal of work in the literature evaluates
the performance of both DHTs, but the results are neither conclusive nor consistent,
because different churn models are used and the key point that churn occurs from the
beginning of a DHT's lifetime is neglected. In [42], a realistic and fair framework
is built from the following:
1. Executing the churn model from the time of peer creation,
2. A performance evaluation methodology that takes into account that different DHTs'
parameters are not equal, and
3. A churn metric that keeps track of the rate of change of the P2P population.
The successful lookup ratio is used as the performance metric, and it is found
that under similar scenarios Kademlia performs better than Chord. The only problem
is that the simulations were conducted on a small scale; the framework was not
tested in a real-world setting.
In any P2P network, users enter and leave the network continuously. This behavior is
called churn. It has become important to understand the resilience properties of the
network when churn occurs at a high rate. In [43], the authors first show that a
lifetime-based dynamic churn model for a P2P system that has reached stationarity is
reducible to a uniform node failure model. Next, a reachable component method is
developed and then used to evaluate the routing performance of P2P networks [44]
under different churn rates. The results show that routing networks based on de
Bruijn graphs [45] show outstanding resilience under extremely high rates of node
turnover. The routing networks tested include randomized-Chord, Chord, CAN and
Kademlia.
Structured overlay networks have gained much popularity in everyday applications.
The main problem of such networks is their vulnerability to attacks that aim to
damage the functionality and structure of the network. In the past, many proposals
[46] for secure architecture designs were presented, but a comprehensive and broadly
acknowledged solution is lacking. In [47], the authors present a new solution called
Layered Identity-based Kademlia-Like Infrastructure (Likir), which aims at a secure
implementation of a DHT-based P2P network. For performance evaluation, both Likir
and Kademlia overlay networks were run on PlanetLab. Lookup operations take longer in
Likir than in Kademlia because of the cryptographic effort spent during a node
session, but Likir is more secure than Kademlia. The evaluation ran small networks on
PlanetLab, whereas real networks have millions of peers, which would further increase
the lookup time for Likir.
Kademlia is one of the most effective key-based routing protocols. In its routing
phase, Kademlia needs to contact log(N) nodes, which creates a bottleneck. To reduce
the number of nodes that participate in the lookup process, the paper [48] proposed
an algorithm named “Shades.” In the proposed algorithm, nodes have their own caches,
which reduce the lookup time in the case of a cache hit. Other algorithms may store
a set, but they cannot count the number of times items are inserted [49]. Simulation
results show that Shades reduces the median number of nodes contributing to each
lookup by 22%–36% compared with other algorithms and by 30%–40% compared with
Kademlia. In real time, it may corrupt the data. The authors have taken the data
corruption problem into consideration.
The paper [50] discusses Kad, another implementation of the Kademlia DHT protocol,
which has over 1 million nodes. All nodes are concurrent and use the eDonkey
file-sharing protocol. Kad has many design weaknesses that attackers can exploit to
make the find and search mechanisms fail. The authors measured the cost and other
parameters of these attacks against 16,000 Kad-connected nodes. The attack described
in the paper has two phases: the “Preparation Phase” and the “Execution Phase.” A
large simulation using DVN scaled up to 200k nodes. Overall, the authors found that
their attacks are more cost effective.
The DHT-based Kad network is the most popular P2P network, and because of its
scalability and reliability, its user base will continue to grow in future. Many
decentralized P2P architectures have been proposed in the literature, but most
existing systems still run on centralized architectures. The article [51] examines
two popular network types: the server-based eDonkey and the decentralized DHT-based
Kad. Comparing these networks, it was found that eDonkey receives all 10% server
requests, leading to the conclusion that eDonkey is more popular; other results,
however, show that Kad is more robust than eDonkey. These comparisons were not done
on a very large scale, so the results may vary at a larger scale.
In this section, we give a detailed description of the ODD and DREAM models for
our Kademlia protocol. We also present the Unified Modeling Language (UML)
diagrams of our protocol.
10.3.2 Overview
● Purpose
The main purpose of this model is to understand how agent-based modeling
techniques can be applied to simulate the Kademlia protocol on a large scale, and
further to compare Kademlia's performance with that of existing non-ABM
models.
● Entities
There are three types of entities in this model: those that are in the network,
those that want to join the network and those that are leaving the network. All
three types of entities are represented by nodes. Each node owns some variables:
a 160-bit node ID, a list of kBuckets to store contacts and <key, value> pairs
to store data.
● Process overview
At first, all three types of nodes are generated and placed randomly; then, using
the protocol's RPCs, they start contacting each other and forming a network.
When one node contacts another to join the network, the recipient node checks the
bucket selected by the XOR distance between their IDs. If that bucket is not full,
the recipient simply adds the new node at the end of it. If the bucket is full,
the recipient pings the node at the top of the bucket. If the pinged node responds,
the new node is discarded; otherwise, the pinged node is removed and the new node
is added. kBuckets are sorted by time: the least recently seen nodes are at the
top and the most recently seen nodes are at the bottom of the buckets.
and the second output that we want to analyze is how latency changes as we increase
the network size.
● Adaptation
Adaptation is basically how agents adapt to environment changes, so it is a test of
the decision-making capabilities of agents when the environment changes. Constraints
are already well defined, and the environment changes accordingly. We test our
model in two scenarios: one in the presence of churn and the other in the
absence of churn.
● Objectives
In an adaptive environment, as the environment changes, the individual node or agent
is also affected by, or rewarded for, its adaptive behavior in accomplishing its own
tasks. The core objective of our model is therefore to analyze how the number of
messages and the latency behave as we change the network size, both when churn is
present and when churn is absent.
● Sensing
Some properties of each node help in decision making when nodes communicate with
each other. These properties also increase the performance of the system. In our
case, to find data, a node does not contact every node in the network; it contacts
only the nodes that are in its buckets. This behavior reduces the number of
messages and also the latency of finding the required data.
● Interaction
The nodes that are in buckets can contact each other to find or store a value.
● Stochasticity
The model is developed using the protocol specification from the Maymounkov and
Mazières paper [1].
● Observation
In the simulation, the following information is collected at every step:
1. Network size or the number of nodes
2. Number of nodes joining the network
3. Number of nodes leaving the network
4. Number of messages
10.3.4 Details
In this section, we describe how the model is initialized and what its initial
states are, the input data (if any is used) and, finally, the model's parameters
and their values.
● Initialization
The NetLogo tool is used for model implementation. NetLogo is a free and open-source
tool for ABM. The model is initialized by calling the “setup” procedure, in which
nodes are created, a random ID is assigned to each of them and all nodes are placed
randomly. Some walkers are also generated and positioned on some of the nodes. The
kBuckets of every node are initialized with nobody.
● Input data
No input data is fed to the model.
● Submodels
The parameters that are used in the model and their values are given in Table 10.1.
Figure 10.8 DREAM methodology for ABM (recovered steps: develop basic understanding
of CAS; perform simulation experiment; discover emergent behavior). Adapted, with
permission, from Reference [3]
agents can move across the network. First, we define the node type
agent (Table 10.2).
We have implemented this model in NetLogo (a toolkit for ABM). A node agent has
four internal variables. hid is the node ID, an opaque 160-bit number that we use as
the label of a particular node. kBuckets holds one list for each bit of the node ID,
because for each 0 ≤ i < 160 every node keeps a list of <IP address, UDP port,
Node ID> triples for those nodes whose distance from itself is between 2^i and
2^(i+1). Next, we define the walker type agent.
[Diagram residue; recovered labels: Hid, Message, value, Location, Make-New-Node, Walker-Attributes, go, Node-Attributes, key, Walker, node-lookup, Forever, init-props, xor-distance, nested-get, tgt, tRefresh, from-distance, Apha, alphaClosestNodes, integer-to-binary, digit-value, xor-distance-btw-nodes, tRepublish, OutputGlobals, from-base, closest-non-empty-bucket]
Breed Walker: This agent is used for a walker that can move across the network.
Here the Walker breed represents walker type agents (Table 10.3). These walkers
move through the network for the purpose of routing. Next, we present the
internal variables used by this type of agent. The location variable stores the
current location of a walker. The message variable holds the message sent to the
walker, which acts on the basis of that message. Now that we have developed a
specification model of all the agent breeds, we next develop the model for the
global variables used in the code. Different simulation configurations depend on
these global variables, so they are important.
● Globals
In this section, we describe the global variables in the simulation model.
For simulation setup, there are eight key input variables. The key global
variables are described in the specification model below.
Sliders
Population: Used for specifying the initial number of nodes
Hash_Degree: Used to specify the size of the hash
initial-seed: Used to specify the initial seed size in the network
Here we have three variables that are used as input, and all of them are sliders
(a GUI element in NetLogo) (Table 10.4). Population is the input provided by the
user to specify the initial number of nodes. Hash_Degree is the input provided by the
user to specify the size of the hash, that is, each node is given a random hash ID
between 0 and 2^(Hash_Degree) − 1. Initial_seed is the input used to specify the
initial seed size in the network.
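The role of the Population and Hash_Degree inputs can be illustrated with a short Python sketch that draws node IDs from the range 0 to 2^(Hash_Degree) − 1. The function name, the `rng_seed` parameter (a random-generator seed, unrelated to the initial-seed slider) and the choice to draw IDs without repetition are assumptions made here, not details taken from the NetLogo model.

```python
import random
from typing import List, Optional


def assign_node_ids(population: int, hash_degree: int,
                    rng_seed: Optional[int] = None) -> List[int]:
    """Draw `population` distinct IDs from 0 .. 2**hash_degree - 1."""
    id_space = 2 ** hash_degree
    if population > id_space:
        raise ValueError("population exceeds the size of the ID space")
    rng = random.Random(rng_seed)
    # Sampling without replacement keeps IDs unique (an assumption of this sketch).
    return rng.sample(range(id_space), population)


ids = assign_node_ids(population=10, hash_degree=8, rng_seed=42)
```

With a Hash_Degree of 160 the ID space matches the 160-bit node IDs of the protocol; smaller values give a compact ID space that is easier to inspect in small experiments.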
● Procedures
Having described the breeds and global variables, we now describe the
procedures that are part of our model. There can be different types of procedures.
The first procedure is the setup procedure (Table 10.5).
For every ABM, the setup procedure is the most important one. Models can use
different names for the setup procedure, but a more or less similar procedure
is needed to set up simulation scenarios. The point to note is that leftovers
from an old simulation should not disturb future simulations. That is why old
results are cleared before the next simulation. This ensures that each
simulation starts from a clean state. Then, the global constants used in the
simulation are assigned their values. The number of nodes depends on the
“Population” slider value. Each node is then traversed and its label is set; the
node ID is used as the node label. For each node, another procedure, “init-props,”
is also called; it is explained later. In the next step, cur is set to “self” for
all nodes. After this, a walker is created and its color is set to one of the base
colors. Finally, the walker's location is set to one of the nodes and the walker is
moved to that location.
“init-props” is the procedure which sets each node's shape and label, stores its x- and
y-coordinates, and sets its id (Table 10.6). At the end, the kBuckets of each node are set
to nobody. To arrange the nodes in a particular order and shape, the “Make Circle” button
on the screen can be used. Next, we describe the procedure named “go.”
“go” is the procedure which is called repeatedly during each simulation
(Table 10.7). It is called when the “go (forever)” button or the “go (single step)”
button is clicked. In this procedure, the ticks counter value is first reported, and
then another procedure named “node-lookup” is called.
“integer-to-binary” is the procedure used to convert an integer to a binary number
(Table 10.8). A loop runs until the number equals zero. In each iteration, “rem” is
first set to number % 2 and added at the start of bitList. Next, “num” is set to
number / 2. Finally, “number” is set to the floor of “num.” When the loop ends,
bitList is returned.
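The conversion loop can be sketched in Python as follows. This is an illustrative sketch rather than the chapter's NetLogo procedure; it assumes that the quotient is obtained by floor division by 2, and the names `bit_list` and `rem` mirror the description.

```python
def integer_to_binary(number):
    """Return the bits of a non-negative integer, most significant bit first."""
    bit_list = []
    while number != 0:
        rem = number % 2          # lowest remaining bit
        bit_list.insert(0, rem)   # add it at the start of the list
        number = number // 2      # floor of number / 2
    return bit_list


bits_of_six = integer_to_binary(6)   # [1, 1, 0]
```

Note that, as written, an input of 0 yields an empty list, because the loop body never runs.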
“nested-get” is the procedure which is used to get a value from a list on the basis
of an index (Table 10.9). It takes a list and an index as input and returns the value
from the list. It is called from “create-network” for each node.
“node-lookup” is the procedure which performs the node lookup
(Table 10.10). The lookup starts by picking alpha nodes from the closest non-empty
kBucket. If the closest non-empty kBucket has fewer than alpha entries, it just picks
the nodes it has knowledge of. The source then sends find node RPCs to the alpha nodes,
where alpha is typically 3. Each of the alpha nodes then sends requests to K further
nodes and checks whether they reply. If alpha = 1, then Kademlia resembles Chord.
“find-alpha-closest-nodes” is the procedure which is used to get alpha closest
nodes from bucket (Table 10.11). It calls “closest-non-empty-bucket” procedure to
get the closest non-empty bucket, and then from the output of that procedure, we get
alpha nodes.
“find-k-closest-nodes” is used to get k closest nodes from bucket (Table 10.12).
It calls “closest-non-empty-bucket” procedure to get the closest non-empty bucket,
and then from the output of that procedure, we get k nodes to return.
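The two procedures above share one pattern: locate the closest non-empty bucket, then take up to alpha or k entries from it. A minimal Python sketch of that pattern follows. It rests on assumptions that are not stated in the chapter: buckets are held in a list, closeness is measured by the difference between bucket indices, and ties go to the lower index. The function names are hypothetical.

```python
def closest_non_empty_bucket(buckets, start_index):
    """Return the non-empty bucket whose index is nearest to start_index ([] if none)."""
    candidates = [i for i, bucket in enumerate(buckets) if bucket]
    if not candidates:
        return []
    best = min(candidates, key=lambda i: abs(i - start_index))
    return buckets[best]


def find_closest_nodes(buckets, start_index, count):
    """Take up to `count` nodes from the closest non-empty bucket."""
    return closest_non_empty_bucket(buckets, start_index)[:count]


ALPHA, K = 3, 20                                  # typical values quoted in the text
buckets = [[], [5, 9], [], [12, 14, 17, 21]]      # made-up node ids for illustration
alpha_nodes = find_closest_nodes(buckets, start_index=2, count=ALPHA)
k_nodes = find_closest_nodes(buckets, start_index=3, count=K)
```

When the chosen bucket holds fewer than alpha entries, as in the first call, the sketch simply returns the entries it has, which matches the behavior described for “node-lookup.”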
“xor-distance-btw-nodes” is a procedure which is used to find xor distance
between two nodes (Table 10.13). It calls “xor-distance” procedure to find xor distance
between src and tgt ids.
node and least recently seen node. If there is no reply from the least recently seen
node, the recipient removes it from the particular kBucket and inserts the new sender
at the tail. Otherwise, if there is a response from the least recently seen node, it is
moved to the tail of the list, and the new sender is discarded.
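The eviction rule described above can be sketched in Python. The full-bucket branch follows the text; the first two branches (a sender already in the bucket, and a bucket that is not yet full) follow the original Kademlia description [1] and are included only to make the sketch complete. The function name and the `ping` callback are assumptions for illustration.

```python
def update_k_bucket(bucket, sender, k, ping):
    """Update a k-bucket ordered from least recently seen (head) to most recent (tail).

    `ping` is a callable that returns True if the given node replies.
    """
    if sender in bucket:
        bucket.remove(sender)            # known node: move it to the tail
        bucket.append(sender)
    elif len(bucket) < k:
        bucket.append(sender)            # room left: insert the sender at the tail
    else:
        oldest = bucket[0]               # least recently seen node
        if ping(oldest):
            bucket.append(bucket.pop(0)) # it replied: move it to the tail, drop sender
        else:
            bucket.pop(0)                # no reply: evict it and insert the sender
            bucket.append(sender)
    return bucket


evicted = update_k_bucket([1, 2, 3], sender=4, k=3, ping=lambda node: False)
kept = update_k_bucket([1, 2, 3], sender=4, k=3, ping=lambda node: True)
```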
“closest-non-empty-bucket” is used to get the closest non-empty bucket from
buckets list (Table 10.15). It calls “xor-distance” procedure to find xor distance
between src and tgt ids. Then from buckets list get the closest non-empty bucket
and return it.
of bin2Len and bin1Len. Then a list of zeros is made and stored in a list named “zero,”
and, finally, a list containing these zeros followed by bin1 is made and stored in bin1.
After the main if condition, a local variable “dist” is declared and initialized to 0,
and a for loop then runs over the arrays bin1 and bin2. In each iteration of the loop,
the corresponding bits of bin1 and bin2 are compared; if they are not equal, “dist” is
incremented by one. Finally, dist is returned.
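A minimal Python sketch of the padding and comparison steps is given below; it is not the chapter's NetLogo code, and it assumes the bit lists are stored most significant bit first. As described, the procedure counts the positions in which the two bit lists differ, that is, the number of 1 bits in their XOR. This is different from the integer value of the XOR that the original Kademlia paper [1] uses as its distance.

```python
def xor_distance(bin1, bin2):
    """Count differing bit positions after left-padding the shorter list with zeros."""
    width = max(len(bin1), len(bin2))
    bin1 = [0] * (width - len(bin1)) + list(bin1)   # pad bin1 if it is shorter
    bin2 = [0] * (width - len(bin2)) + list(bin2)   # pad bin2 if it is shorter
    dist = 0
    for b1, b2 in zip(bin1, bin2):
        if b1 != b2:
            dist += 1
    return dist


example_dist = xor_distance([1, 0, 1], [1, 1])   # compares 101 with 011
```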
“binary-to-integer” is the procedure which is used to convert a binary number to
a decimal number (Table 10.17). It takes bits as input and converts them to a decimal
number.
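For illustration, the reverse conversion can be sketched in Python as below. The sketch assumes the bits are given most significant bit first; the function name mirrors the procedure name but the body is not taken from the chapter.

```python
def binary_to_integer(bits):
    """Convert a list of bits (most significant first) to a decimal integer."""
    value = 0
    for bit in bits:
        value = value * 2 + bit   # shift the running value left and add the next bit
    return value


six = binary_to_integer([1, 1, 0])
```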
Figure 10.10 Kademlia Complex Network Model, resized and colorized according
to degree centrality
shown below in the figures: Figure 10.14 shows the poweRlaw plot of Degree,
Figure 10.15 shows the poweRlaw plot of Closeness centrality and Figure 10.16 shows
the poweRlaw plot of Betweenness centrality (Figures 10.17 and 10.18).
1. Messages
In the Kademlia protocol, the messages exchanged are events in PeerSim.
For every message, we create an instance of the Messages class, which extends the
PeerSim class SimpleEvent.
i. MSG_FINDNODE: The find node action is started using this message.
ii. MSG_ROUTE: This message is used to query the neighbors about the
target node.
iii. MSG_RESPONSE: This message is sent in response to MSG_ROUTE and
contains the information of the k nodes that are nearest to the target node.
2. Code comments
i. Bootstrap process
The WireKOut class is contained in the core of PeerSim. Two classes,
StateBuilder and CustomDistribution, are implemented to facilitate the
network's bootstrap and initialization process.
Figure 10.11 Plot showing the Degree centrality
Figure 10.13 Plot showing the closeness centrality
When the process starts, the WireKOut class randomly creates the
links between nodes, which yields a virtual overlay network. After this,
the CustomDistribution class is used to initialize the network; for example,
every node is assigned a unique id in the range $0 \ldots 2^{\text{BITS}}$.
Here BITS is coming from the Kademlia protocol and usually the value
Figure 10.17 Plot showing the Eigen centrality
Figure 10.19 Number of messages with respect to network size per search
operation in churn absence (x-axis: Network size (number node); y-axis: Number message (avg))
count, delivered messages count and find operations count. STEP is the
only allowed parameter; it defines how frequently the output is written
to stderr.
4. Results
The performance of the Kademlia protocol in search operations is analyzed in two
scenarios, that is, in the presence of churn and in the absence of churn, by varying
the network size. The average number of messages exchanged during the search
operation is presented.
i. Simulation Parameters
In order to simulate a Transmission Control Protocol (TCP)-like connection-
oriented protocol, a reliable channel was selected. In the simulation, the network
size was varied from 128 to 65,536 nodes. In each round, the number of nodes
was doubled, and the total simulation time was around 1 h. The observer step
was 100,000, and the traffic step was calculated from the following equation:
$$\text{TrafficStep} = \frac{\text{Simulation Time}}{\text{Network Size}} \tag{10.3}$$
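As a worked illustration of Equation (10.3), the short Python sketch below enumerates the network sizes obtained by doubling from 128 to 65,536 nodes and computes the corresponding traffic step. The time unit is an assumption: the text gives only a total simulation time of about 1 h, which is expressed here in milliseconds purely for the sake of the example.

```python
def network_sizes(start=128, stop=65536):
    """Network sizes obtained by doubling the number of nodes each round."""
    sizes = []
    n = start
    while n <= stop:
        sizes.append(n)
        n *= 2
    return sizes


def traffic_step(simulation_time, network_size):
    """Equation (10.3): TrafficStep = Simulation Time / Network Size."""
    return simulation_time / network_size


SIMULATION_TIME_MS = 60 * 60 * 1000   # assumption: about 1 h, in milliseconds
sizes = network_sizes()
steps = {n: traffic_step(SIMULATION_TIME_MS, n) for n in sizes}
```

Doubling from 128 to 65,536 gives ten network sizes, and the traffic step halves each time the network size doubles.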
The turbulence step was calculated from the equation below.
Figure 10.20 Number of messages with respect to network size per search
operation in churn presence
Figure 10.21 NetLogo user interface of Kademlia random shape. It has sliders,
buttons, plot, etc.
Figure 10.20 shows the same graph as the previous one, except that churn is now
present. There is very little difference compared with the graph for the absence of
churn. In this graph, turbulence control is activated, and nodes leave and join the
network with the same probability. This is evidence that Kademlia handles churn very
well (Figure 10.21).
Figure 10.22 NetLogo user interface of Kademlia circle shape. It has sliders,
buttons, plot, etc.
Figure 10.23 Number of messages with respect to network size per search
operation in churn absence (x-axis: Network size (number node); y-axis: Number message (avg))
architecture section. We keep track of links that are made between the nodes in the
process of lookup and take them as messages.
10.4.4.1 Configuration
As in PeerSim, the ABM has three main system-wide parameters that can be
used to customize the protocol. The first is BITS, which defines the node id
length; its default value is 20. In the PeerSim simulation, the default value of BITS
is 160, but in NetLogo it is set to 20 to limit the simulation scale and
to avoid overflow errors in hash operations. The second is K, which defines the
length of a single k-bucket; its default value is 20. The third is ALPHA, which is
used in the lookup operation and defines the number of simultaneous lookups; its
default value is 3.
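The three parameters can be gathered into a small configuration object, sketched here in Python. The class name `KademliaConfig` and the `id_space` property are assumptions for illustration; only the parameter names and default values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KademliaConfig:
    bits: int = 20    # BITS: node id length used in the NetLogo model
    k: int = 20       # K: length of a single k-bucket
    alpha: int = 3    # ALPHA: number of simultaneous lookups

    @property
    def id_space(self):
        """Number of distinct node ids that can be expressed with `bits` bits."""
        return 2 ** self.bits


netlogo_cfg = KademliaConfig()            # defaults quoted for the NetLogo model
peersim_cfg = KademliaConfig(bits=160)    # PeerSim default for BITS
```

With BITS set to 20, the id space contains 1,048,576 ids, which gives a sense of how much smaller the NetLogo id space is than the 160-bit space used in PeerSim.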
10.4.4.2 Results
As described earlier, the performance of the Kademlia protocol in search operations is
analyzed in two scenarios, that is, in the presence of churn and in the absence of
churn, by varying the network size. The average number of messages exchanged during
the search operation is presented. We used the BehaviorSpace tool of NetLogo for our
experiment. The experiment consists of 200 simulations, 100 in the presence of churn
and 100 in the absence of churn. The network size is changed from 25 to 400 in steps
of 25. The results from the NetLogo simulation were written to a csv file and then
plotted using MATLAB.
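The post-processing step can be sketched as follows. The chapter used MATLAB; this Python sketch is only an illustration of averaging the message counts per network size. The column names `network_size` and `messages` are hypothetical, and the sample rows are made-up values, not results from the experiment.

```python
import csv
import io
from collections import defaultdict


def average_messages(csv_text):
    """Average the `messages` column for each value of `network_size`."""
    totals = defaultdict(lambda: [0.0, 0])          # size -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[int(row["network_size"])]
        entry[0] += float(row["messages"])
        entry[1] += 1
    return {size: total / count for size, (total, count) in sorted(totals.items())}


# Made-up rows for illustration only.
sample = "network_size,messages\n25,40\n25,60\n50,80\n"
averages = average_messages(sample)

# Network sizes swept in the experiment: 25 to 400 in steps of 25.
sweep = list(range(25, 401, 25))
```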
Figure 10.23 shows the results of the lookup operation in the absence of churn, that
is, in a stable environment. In this plot, the number of messages required per search
operation is plotted against the network size. As the network size increases,
the number of messages required also increases.
Figure 10.24 shows the same graph as the previous one, except that churn is now
present. There is very little difference compared with the graph for the absence of
churn. Nodes leave and join the network with the same probability. This is evidence
that Kademlia handles churn very well.
Figure 10.24 Number of messages with respect to network size per search
operation in churn presence (x-axis: Network size (number node); y-axis: Number message (avg))
10.4.6 Discussion
10.4.6.1 Comparison of ODD and DREAM
In this part of the chapter, we present a comparison of ODD [8] and DREAM [3]
from all perspectives.
Figure 10.25 Number of messages per search operation with respect to network size in
the absence of churn: PeerSim simulation (left) and NetLogo simulation (right)
Figure 10.26 Number of messages per search operation with respect to network size in
the presence of churn: PeerSim simulation (left) and NetLogo simulation (right)
ODD only provides a textual description of an ABM, with the purpose of making
the model more readable; it also promotes rigorous model formulation. To
describe an ABM, it provides a checklist covering key features. ODD has some
limitations, which are described next. According to a survey conducted by Grimm [52],
of the publications that used ODD, only 75% used it correctly; the remaining
25% had some flaws, and in some cases parts of the protocol were even compromised.
So, in [52], the author concludes that writing down an ABM specification using the
ODD protocol is not an easy task.
One issue with ODD is that some of its specifications overlap across different
sections. For example, the introduction and purpose sections overlap: what is asked
in the purpose section is also included in the introduction section.
The submodel section has some similarities with the design concept section, and
its content is described again in the scheduling and process section.
ODD is not suitable for comparing different ABMs or for describing a
large ABM. Some large ABMs have many features, and the description ODD offers
for them is not enough to cover the whole ABM. Comparison between different
ABMs is not possible in ODD due to the lack of quantitative assessment. The only way
to do it is to prepare an ODD checklist for both ABMs, put them in a table and find
the differences and similarities between them. Therefore, comparing two ABMs using
ODD is not an easy task, and reviewing many ABMs is a laborious one.
ODD specifications suffer from redundancy, replication and a lack of information.
Sometimes, different versions of the same ABM are published in different publications,
but the ODD specifications in these publications are almost the same, with only small
changes in the process section and the entities. Another issue with ODD is that an
ABM cannot be replicated from it, because it provides a very specific description of
the model. Replicating an ABM from such specific, redundant and ambiguous information
is quite difficult.
Compared with ODD, DREAM provides a detailed description of an ABM. Using
DREAM, we can develop a complex network model of any ABM, describe it through a
pseudo-code specification and also define steps for network analysis. So, by using
DREAM, we can easily obtain a detailed description and a pseudo-code-based design of
an ABM, which is quite helpful for comparing and replicating ABMs of
any domain.
DREAM makes it possible to understand and analyze visually the complex network of
any ABM without going into much code detail. It provides quantitative measurements
after network analysis has been performed, which ODD lacks. Using these
measurements, one can easily understand, replicate and compare ABMs from different
domains.
DREAM is independent of scientific domains and applicable to any ABM
research domain. DREAM allows models that are developed in different scientific
domains to be compared. For example, to compare two ABMs of different
domains, we first develop the complex networks of both models and then
analyze and compare them in the same manner. So, with DREAM, we not only obtain a
detailed description but can also compare ABMs from different
domains.
DREAM further provides a pseudo-code specification of the ABM. Using this
specification, anyone can understand the ABM regardless of discipline. This
specification translates to code and then to ABM development. Hence, we can
confidently say that DREAM makes it easy to understand and to replicate any ABM.
In [53], the authors carried out an empirical analysis of both methodologies.
They used 13 features to evaluate the methodologies and calculate the rank
of each methodology. The average result was 1.76 for DREAM and 0.69 for ODD.
Hence, the results show that DREAM provides a very detailed specification and is
the most suitable choice for ABM understanding, comparison and replication.
References
[1] Maymounkov P, Mazieres D. Kademlia: A peer-to-peer information system
based on the XOR metric. In: International Workshop on Peer-to-Peer Systems.
Springer; 2002. p. 53–65.
[38] Steiner M, En-Najjary T, Biersack EW. A global view of KAD. In: Proceedings
of the 7th ACM SIGCOMM Conference on Internet Measurement. ACM; 2007.
p. 117–122.
[39] Pecori R. S-Kademlia: A trust and reputation method to mitigate a Sybil attack
in Kademlia. Computer Networks. 2016;94:205–218.
[40] Ou Z, Harjula E, Kassinen O, et al. Performance evaluation of a Kademlia-
based communication-oriented P2P system under churn. Computer Networks.
2010;54(5):689–705.
[41] Steiner M, En-Najjary T, Biersack EW. Long term study of peer behavior in the
KAD DHT. IEEE/ACM Transactions on Networking (ToN). 2009;17(5):1371–
1384.
[42] Medrano-Chávez AG, Pérez-Cortés E, Lopez-Guerrero M. A performance
comparison of Chord and Kademlia DHTs in high churn scenarios. Peer-to-
Peer Networking and Applications. 2015;8(5):807–821.
[43] Kong JS, Bridgewater JS, Roychowdhury VP. Resilience of structured
P2P systems under churn: The reachable component method. Computer
Communications. 2008;31(10):2109–2123.
[44] Sen S, Wang J. Analyzing peer-to-peer traffic across large networks. In:
Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement.
ACM; 2002. p. 137–150.
[45] Fraigniaud P, Gauron P. D2B: A de Bruijn based content-addressable network.
Theoretical Computer Science. 2006;355(1):65–79.
[46] Boneh D, Franklin M. Identity-based encryption from the Weil pairing. SIAM
Journal on Computing. 2003;32(3):586–615.
[47] Aiello LM, Milanesio M, Ruffo G, et al. An identity-based approach to
secure P2P applications with Likir. Peer-to-Peer Networking and Applications.
2011;4(4):420–438.
[48] Einziger G, Friedman R, Kantor Y. Shades: Expediting Kademlia’s lookup
process. Computer Networks. 2016;99:37–50.
[49] Fan L, Cao P, Almeida J, et al. Summary cache: A scalable wide-area web
cache sharing protocol. IEEE/ACM Transactions on Networking. 2000;8(3):
281–293.
[50] Wang P, Tyra J, Chan-Tin E, et al. Attacking the KAD network—Real
world evaluation and high fidelity simulation using DVN. Security and
Communication Networks. 2013;6(12):1556–1575.
[51] Locher T, Schmid S, Wattenhofer R. eDonkey & eMule’s Kad: Measurements &
Attacks. Fundamenta Informaticae. 2011;109(4):383–403.
[52] Grimm V, Berger U, DeAngelis DL, et al. The ODD protocol: A review and
first update. Ecological Modelling. 2010;221(23):2760–2768.
[53] Akram W, Niazi MA, Iantovics LB. Towards Agent-Based Model Specification
in Smart Grid: A Cognitive Agent-based Computing Approach. arXiv preprint
arXiv:1710.03189. 2017.
Chapter 11
Descriptive agent-based modeling of the
“BitTorrent” P2P protocol
Abdul Saboor1, Nasir Khan1, and Mubariz Rehman1
11.1 Introduction
BitTorrent, designed by Cohen [1], is a peer-to-peer (P2P) file-sharing protocol
developed to distribute data in such a manner that the original distributor can
reduce bandwidth use while still reaching the same number of users.
In BitTorrent [1], data is divided into small chunks, and every user holding those
chunks can upload them to other peers in the swarm to save bandwidth, where a swarm
is a group of peers connected to a single torrent. It is one of the most efficient
protocols for replication and distribution of files over the internet [2].
BitTorrent lets users upload and download files with each other at the same time,
as it uses a principle known as tit-for-tat, where a BitTorrent client acts as both
a client and a server. BitTorrent networks are complex networks which improve
the performance of communication systems.
In a network, many clients communicate with each other in a complex P2P
network using BitTorrent. In BitTorrent, peers work as seeders or leechers, which act
both as a client and a server [1]. The main complexity [3] of the BitTorrent model is
that the whole model follows a power-law distribution. Under this mechanism,
when the total number of seeders involved in communication increases with respect to
network size, network clustering tends to decrease. High network clustering
values are only obtained when the swarm size of the network increases exponentially.
Agent-based modeling is one of the most efficient paradigms for complex
adaptive systems (CASs) [4] and for large-scale complex adaptive communication
networks and environments. One of the main limitations of other models is
that they do not offer flexibility in terms of the complex connectivity of P2P
networks. Agent-based modeling of complex systems provides many benefits; for
example, the overall throughput of the system increases and interaction between
nodes becomes easier.
1 Cosmose Research Group, COMSATS University, Pakistan
In the modern era, network sizes are increasing, as are the complexity and
dynamics of different kinds of network communication, such as wireless
networks, mobile ad hoc networks and P2P communication. All of this leads to an
increase in emergent phenomena in complex networks. These effects cause network
congestion and increase the overall communication cost. To avoid these effects,
the network communication needs to be modeled in terms of CASs and complex adaptive
communication networks and environments (CACOONS) [5]. Agent-based modeling of
complex networks overcomes these issues and increases the flexibility of the
systems. Agent-based modeling (ABM) is commonly used to model the dynamic
behavior in CASs. These techniques are more effective for the modeling of complex
networks because the interaction of the networks is handled using visualization
techniques. As simulation experiments on complex systems increase the overall
complexity of the systems, there is a need for a specific modeling approach
for large-scale networks. ABM allows a clear, concise and unambiguous
representation of the complex network model.
In the past few years, unpredictable growth in the scale and complexity
of networks has been observed. Network sizes are increasing rapidly, which results
in an increase in their complexity. In large complex networks, it is very difficult
to test and evaluate each module and its implementation. For this purpose,
simulation based on modeling plays a vital role in the development and
design of distributed system networks. For agent-based modeling, NetLogo, a very
interactive and popular tool, is used. It provides visual simulations of
complex adaptive networks. One of the main features of NetLogo is
that it provides a user-friendly environment.
Previous studies have not developed a descriptive agent-based modeling (DREAM) [4]
specification for BitTorrent P2P networks. To overcome the complexity of distributed
networks, the basic descriptive agent-based model was proposed in [4]. In this
chapter, we have developed a DREAM model of BitTorrent. Overview, design concepts and
details (ODD) is used for the textual representation of BitTorrent. We present a
comparison between the ODD model [6] and the DREAM model. As a result, we conclude
that for agent-based modeling, the NetLogo environment is extremely flexible with
very low space complexity and that the DREAM model provides the basic description of
agent-based modeling.
In the BitTorrent P2P network, the main benefits are as follows:
● Perform complex network analysis
● Perform interdisciplinary model comparison
● Perform simulation experiments
● Discover emergent behavior
11.1.1 Contributions
The main contributions of modeling BitTorrent as an ABM are as follows:
● Agent-based modeling using NetLogo provides a wide range of benefits, such
as real-time analysis and statistical measurements.
● Implementation in NetLogo is more efficient and takes a visual form. Complex
network simulation is easier compared with Network Simulator 2, etc.
in real time is easier. Several agent-based modeling tools are used for simulation;
some of them are Swarm, MASON, NetLogo, MetaABM, etc. Agent-based modeling has a
direct relation with CASs.
Figure 11.3 Peer-to-peer network. Adopted, with permission, from Reference [14]
demand to this neighbor. As a leecher has complete knowledge of the availability of
every chunk in its neighborhood, it always requests the rarest one.
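The selection of the least available chunk can be sketched in Python as follows. This is an illustrative sketch, not code from the chapter's model: the function name, the dictionary of neighbor chunk sets and the tie-breaking rule (lowest chunk index) are all assumptions.

```python
from collections import Counter


def rarest_chunk(own_chunks, neighbor_chunks):
    """Pick the missing chunk held by the fewest neighbors (None if nothing is missing).

    `neighbor_chunks` maps a neighbor id to the set of chunk indices it holds.
    """
    availability = Counter()
    for chunks in neighbor_chunks.values():
        availability.update(chunks)                  # count holders per chunk
    missing = [c for c in availability if c not in own_chunks]
    if not missing:
        return None
    # Fewest holders first; ties are broken by the lowest chunk index.
    return min(missing, key=lambda c: (availability[c], c))


neighbors = {"a": {0, 1, 2}, "b": {1, 3}, "c": {1, 2}}   # made-up neighborhood
choice = rarest_chunk(own_chunks={0}, neighbor_chunks=neighbors)
```

In this example, chunk 3 is held by a single neighbor, so it is requested before chunks 1 and 2.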
11.3.5.1 Peer
A peer is an active BitTorrent client to which other peers can connect for the
purpose of sharing content; in other words, a peer is an instance of a client that is
running on a computer.
11.3.5.2 Swarm
A swarm is the whole network of users or peers that are connected to the same
torrent.
11.3.5.3 Tracker
A tracker is a server responsible for keeping track of all the peers in the swarm,
or of the communication between them, using the BitTorrent
protocol.
11.3.5.4 Leecher
A leecher is a peer that has downloaded only part of the content; the term is also
applied to a peer that downloads but never uploads content or data.
11.3.5.5 Seeder
A seeder is a peer that has finished downloading the content and leaves its connection
open to upload parts of the content to leechers.
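The terms defined above can be tied together in a small Python sketch. It is a simplified illustration rather than the chapter's NetLogo model: the class and method names are hypothetical, and a peer is treated as a seeder as soon as it holds every chunk and as a leecher otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    peer_id: str
    total_chunks: int
    chunks: set = field(default_factory=set)   # indices of the chunks held so far

    @property
    def role(self):
        """A peer holding every chunk is a seeder; otherwise it is a leecher."""
        return "seeder" if len(self.chunks) == self.total_chunks else "leecher"


@dataclass
class Tracker:
    swarm: dict = field(default_factory=dict)  # peer_id -> Peer, for one torrent

    def announce(self, peer):
        """Register a peer in the swarm."""
        self.swarm[peer.peer_id] = peer

    def peers_for(self, requester_id):
        """Return every other peer in the swarm."""
        return [p for pid, p in self.swarm.items() if pid != requester_id]


tracker = Tracker()
seed = Peer("s1", total_chunks=4, chunks={0, 1, 2, 3})
leech = Peer("l1", total_chunks=4, chunks={0})
tracker.announce(seed)
tracker.announce(leech)
others = tracker.peers_for("l1")
```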
supporting the standard BitTorrent protocol but may differ in a few features [12].
To start downloading, the .torrent file is downloaded and opened in a BitTorrent client.
When a user requests a web page, the computer sends the request
to the server where the web page is hosted. The computers from which
web pages are obtained are central servers. This scenario describes how much
of the traffic on the web is handled [12]. To avoid the resulting traffic issues, P2P
protocols were introduced. BitTorrent is classified as a P2P protocol; the computers
involved in the process form a BitTorrent swarm (a set of computers which upload
and download the same torrent). These computers send and receive the data without the
involvement of a central server.
Usually, the .torrent file is loaded into the BitTorrent client in order to join the
BitTorrent swarm. The .torrent file embeds the address of a tracker, which the
BitTorrent client contacts. A tracker is a specific server which holds the addresses
of the connected computers. The tracker has a specific IP address, which is shared
with the swarm for connection establishment.
Once the connection is established between all the systems, the BitTorrent client
downloads the numerous small pieces of the files in the torrent. Once the downloading
process completes, it can share the data with the other BitTorrent clients in the swarm.
In this procedure, every system can download and upload the data of the same
torrent. This process enhances the downloading speed: if there are 10,000 requests for
a single file, then using a P2P protocol not only avoids placing
the traffic load on a single server but also improves the response time. It also
speeds up the systems.
A user who requests a download from the swarm is called a peer or
leecher. After downloading the complete file from the swarm, the user remains
connected, contributing maximum bandwidth so that the other users involved
can easily download the file; such users are called seeders. A peer holding the
complete set of files in the torrent must join the swarm (group of computers) so that
the other systems can start the downloading process. If there is no seeder,
the downloading process will not start. Preference is given to clients that
contribute more upload bandwidth over clients that transfer files to other clients
slowly. Users who contribute more upload
bandwidth speed up the swarm's download process. The distribution of BitTorrent
files follows a flood-like architecture in which many nodes have the file. The
throughput of the system increases as more nodes attach to the swarm. This results in
a reduction of the computation cost of the system and of bandwidth/resource utilization.
Some other P2P protocols suffer from redundancy issues, whereas BitTorrent
avoids them. BitTorrent has also been used to reduce the distribution cost of the
BOINC client-server system [8].
● Both downloading and uploading cause bandwidth congestion, but this can
be avoided with a fast internet connection.
● While a file is being downloaded from a torrent, everyone can see the IP address
of each system. This leads to a lack of security. To enhance security, a
virtual private network can be used.
● Computer performance decreases while P2P BitTorrent software is in use.
In [14], the authors performed extensive trace analysis and modeling to understand
the behavior of such systems and found that the current BitTorrent system delivers
poor service availability, unfair service to peers and fluctuating downloading
performance. Hence, the authors proposed a new architecture design in which the
different torrent tracker sites are organized into an overlay to assist inter-torrent
association.
BitTorrent plays an essential part in the Internet, but up-to-date knowledge of its
ecosystem has been lacking. The authors of [13] provided a broad picture of the
English-language BitTorrent community ecosystem and identified over 4.6 million unique
torrents and 38,996 trackers from the most popular torrent-discovery sites over a
period of 9 months. Furthermore, the authors developed a multitracker crawler and found
that the BitTorrent ecosystem is the most successful open application by many measures.
However, the popularity of BitTorrent content is sensitive to its age.
A complex network model for BitTorrent-like networks is proposed in [21], and its
analytical computations are validated using simulations. Furthermore, the authors
have shown that the proposed model is consistent with a simulator that implements
the BitTorrent protocol. However, the authors did not generalize the model to
heterogeneous configurations or to extensions such as peer exchange.
In [22], the authors developed scalable methodologies by gathering extensive
measurements of BitTorrent demand demographics, deriving the resulting traffic
matrix and, after studying real ISPs, showing that a large fraction of small ISPs
do not have sufficient means to limit traffic. However, locality yields win-win
situations for medium and large ISPs, limited only by unlocalizable torrents that
have too few local neighbors.
A new paradigm for the design of P2P file-sharing networks, based on friendship and
trust, is presented by the authors of [23] and implemented in the TRIBLER P2P
file-sharing system. Furthermore, the authors described how to build a social
overlay and semantics on top of the BitTorrent protocol. The authors have shown how
several TRIBLER mechanisms can produce good performance relative to existing
results and addressed major challenges in P2P research. However, the authors did
not extend TRIBLER with a reputation system, application-level multicasting or
tag-based navigation. In addition, the proposed system is unable to predict users'
file interests and social relationships, or to support proactive file replication
and recommendation.
By analyzing a BitTorrent trace, the authors of [24] confirmed that clustering nodes
that share common interests and are physically close can improve file-searching
efficiency in a P2P system. They proposed SOCNET, which integrates multiple
components, and showed that it performs better than other systems in
trustworthiness, resilience to dynamism, system overhead and file-searching
efficiency.
In [25], the authors considered a BitTorrent network and introduced an agency
service in which agents evaluate the priority of users' requested tasks and order
the tasks by descending priority. Furthermore, the authors designed a new scheme to
avoid duplicate file transfers and to activate free riders. Results showed that,
under limited time and bandwidth, the proposed idea achieved more downloads than
the original protocol.
The authors of [15] addressed the distribution problem and defined a collaborative
file distribution system, including transmission and possession matrices, while
deriving a theoretical bound on the minimum distribution time. Furthermore, the
authors developed several types of algorithm that decide which file pieces are to be
sent to whom, treated as a scheduling problem. The proposed algorithm, a weighted
maximum-flow algorithm, gives better results than the other algorithms and returns
the optimal solution in many cases.
The authors of [26] proposed a methodology to motivate peers to contribute resources
to the network, in which every peer on an access link shares its upload and
download streams. Furthermore, the presented allocation scheme is implemented on
each peer in a distributed manner. Results showed that the proposed scheme improves
peer performance in heterogeneous systems and that, compared with BitTorrent, it
offers significant advantages in a few respects, such as dynamic capacity
allocation and motivating seeds to contribute and remain in the system.
In [27], the authors discussed some motivations of the BitTorrent protocol and
focused on its two main components, the piece revelation strategy and the unchoking
algorithm. Furthermore, the authors used a game-theoretic approach to show that
BitTorrent does not use tit-for-tat and proposed another model, known as the
auction-based model, as well as a new bootstrap mechanism for ping. The proposed
model attains fairness and robustness without any wire-line changes to the
BitTorrent protocol. However, the components considered in this work are treated
orthogonally, and the work focuses on incentives within a swarm rather than between
swarms.
For multimedia broadcasting, the authors of [7] proposed a quality-based strategy
over a P2P network based on PBS and showed that peers with a higher service level
provide higher and more stable multimedia quality. Results showed that this
technique is more efficient for P2P multimedia broadcasting than TFT, which
BitTorrent currently uses. However, the authors did not carry out simulations on
other video-streaming applications.
In [28], the authors proposed an improved version of BitTorrent for P2P
communication in cloud computing. Instead of sharing the complete file, peers and
seeders can share segments of the data with other clients, and a centralized
tracker is proposed in order to increase security. In this way, a peer does not
know the identity of the client. Results show that the proposed model performs well
in cloud computing. However, the authors did not compare the results with other P2P
protocols to demonstrate its significance.
Among P2P protocols, BitTorrent is the most widely used for file sharing. BitTorrent
originally relied on TCP for congestion control; TCP was later replaced by uTP,
which controls congestion at the application level. In [29], the authors studied
the completion time of torrents using both TCP and the newer uTP protocol in order
to compare their performance. The results show that uTP performed better in terms
of torrent completion time. However, the authors did not consider multiple peers on
a single machine.
In [30], the authors performed an analysis and measurement study of the very popular
P2P protocol BitTorrent. The researchers made all the data publicly available so
that others can verify its accuracy. Because BitTorrent relies on global
components, it ensures the reliability of both content and metadata. They also
found that decentralization makes the metadata more exposed. The authors did not
address this decentralization issue in the study.
Researchers in [31] performed simulations to study the BitTorrent protocol and
evaluated its performance under different workloads. In their study, they
considered several performance metrics, such as file download time, peer-link
consumption and the distribution of trackers among various peers. The results show
that the BitTorrent protocol performs optimally in terms of peer-link consumption
and file download time except under some extreme conditions.
P2P networks are among the most widely used networks for sharing information on the
Internet. It is very difficult to check properly whether the data shared through
P2P networks are legitimate or illegitimate. Existing P2P networks follow a simple
handshake protocol that does not include authentication services. The authors of
[32] proposed the XTRA-P2P framework, through which covert channel communication is
avoided. XTRA-P2P is robust against security attacks on P2P networks, and the
proposed technique was observed to be effective against eavesdropping and
brute-force attacks. The authors focus on three parameters: the handshake message,
the bitfield message and the piece message; however, they did not consider other
parameters of the P2P protocol.
The BitTorrent protocol is designed for time-insensitive content. The author of [33]
enhanced the current approach for video streaming, which is time sensitive, so that
some pieces have a higher download priority. In the basic BitTorrent protocol,
pieces are downloaded randomly; the authors introduce a priority-based approach to
adapt the system to streaming. However, the time complexity of the proposed system
is much higher than that of a general P2P network system. The effectiveness and
robustness of the streaming model in real networks remain a challenge.
BitTorrent is a scalable P2P distributed system in which large files can be shared
without overwhelming the capacity of the server. Resource utilization is one of the
challenges in BitTorrent systems. The authors of [34] introduced a discrete event
simulator for BitTorrent systems that equalizes the overall performance of newly
joined nodes that have fewer or more blocks than average nodes. Delay issues for
pre-seeded nodes are also discussed in the paper. By combining the
bandwidth-matching tracker with the pairwise block-level approach, the unfairness
among nodes in a BitTorrent system decreases.
BitTorrent is among the most powerful and complex P2P protocols. Simulation and
experimental evaluation are the basic methodologies for evaluating a P2P protocol.
The authors of [35] introduce Torrent Lab, a dedicated test bed in which live
experiments and BitTorrent simulations are performed. It allows new agents to take
part in the simulation; however, Torrent Lab is machine dependent. Hardware
resources and machine processing capacity are the limitations of this work.
The authors of [36] proposed multi-agent-based modeling of the BitTorrent P2P
protocol. They built a system on the JADE platform in which each peer acts as an
agent and agents exchange information with one another as peers do in BitTorrent.
The proposed model allows more than 1,000 systems to take part in a simulation at
the same time. The results were analyzed over multiple parameters: number of peers,
abort rate, exit rate, etc. A limitation of this work is that JADE agents run
autonomously, whereas real and precise agent-based models run on any real
BitTorrent network.
The authors of [37] proposed a simple fluid model of a P2P system, with which
scalability, efficiency and performance are evaluated. The model achieves high
efficiency with respect to capacity utilization; however, it uses global knowledge
for peer selection, whereas in a real P2P network peers have only a limited view of
other peers.
The author of [38] presents a discrete event simulator for BitTorrent. The main
objective of the work is to scale to as many as 5,000 peers. The work focuses on
the upload capacity of each node and the amount of data each node serves. However,
a limitation of this work is that the peer set of an individual peer is smaller
than usual. The overall behavior of the P2P protocol is affected by the set size
because the selection strategy depends on the size of the peer set.
The authors of [39] simulate the P2P protocol at the packet level. The packet-level
approach is more complex than flow-level simulation of P2P protocols. The
simulation is done on the NS-2 network simulator, which allows multiple algorithms
to be used for peer and piece selection. Constant peer population and flash crowd
scenarios are evaluated. However, the complete scenario, from the appearance of the
first seed until all peers in the network have obtained the shared content, is not
evaluated at the packet level.
The authors of [40] proposed a general P2P simulator (GPS) for BitTorrent modeling.
GPS supports modeling of the download component. Complex networks and large files
are difficult to handle in a model-based approach. However, the proposed simulator
achieves high efficiency by modeling communication at the message level.
Geographical peer selection and Bloom filter usage are included in the BitTorrent
GPS model. However, the run time of the model is much higher than that of the P2P
simulator.
A BitTorrent P2P system generates a large amount of ISP traffic, which increases
cost. To overcome this issue, a new approach is implemented to control cost and
enhance BitTorrent traffic locality. The authors of [41] implemented a technique
based on biased neighbor selection, in which a peer chooses the majority, but not
all, of its neighbors from peers within the same ISP. Compared with other
approaches such as bandwidth limiting, gateway peers and caching, biased neighbor
selection does not require a dedicated server and can be deployed on large P2P
networks. As future work, the proposed technique is to be integrated with bandwidth
limiting and caching to improve the overall performance of the system.
The research article [42] discusses in depth how BitTorrent works. The main
objective is to understand the behavior of the P2P protocol under heavy traffic
load. The authors are concerned with two parameters: download speed and
availability. A comparison between different P2P protocols is also discussed in the
paper.
In [43], different P2P protocols used both in industry and for personal purposes are
studied. The main aim of the paper is to discuss technical issues such as network
flow control, delivery, etc. P2P technologies used in future-generation computing
are also discussed. Traffic measurement is the key focus of the P2P network
protocols studied in this article.
In [44], the authors examined the BitTorrent protocol for data diffusion in a
computational Desktop Grid environment. The authors designed a prototype and found
that, even though the Desktop Grid architecture depends on centralized
coordination, it can easily incorporate this P2P technology without fundamental
changes to its deployment model. Furthermore, an experimental evaluation on a LAN
cluster showed that BitTorrent performs well for large file transfers and scales
with an increasing number of nodes but suffers from overhead when transmitting
small files.
The working of BitTorrent is studied by the authors of [45], together with the
mechanisms it uses to attain optimal performance. The authors investigated the
influence of the number of peers on download rate using the libtorrent client and
the NS-2 simulator. They found that overall download time decreases as the number
of peers increases because of the higher availability of files through multiple
peers. However, download time increases as traffic increases.
Three modifications are presented by the authors of [46] to improve the fairness of
the BitTorrent protocol. According to the authors, all of the presented models
provide some level of improvement. Furthermore, the authors ranked the
modifications by their improvement to fairness, and this ranking also shows how
each proposal modifies BitTorrent.
To show the topology of P2P networks, the authors of [47] presented a plane graph
model based on the BitTorrent protocol. For evaluation, parameters such as
clustering coefficient, betweenness and shortest path are calculated to reveal the
topological features of BitTorrent. The authors showed that a high clustering value
is obtained with a larger swarm, which permits peers to have additional adjacent
neighbors. Furthermore, the authors found a positive correlation between node
strength and betweenness. However, the authors did not analyze the dynamic
evolution of BitTorrent networks.
The authors of [48] presented BitTorrent-based data sharing for mobile devices.
Their work analyzes different cloud-based solutions that use a remote server to
download content via BitTorrent and transfer it to mobile devices in an
energy-efficient way. The model is evaluated through measurements carried out on
mobile devices, which showed that energy consumption and traffic are minimized and
that the content on the cloud server can be accessed in multiple ways, such as
streaming or HTTP. However, servers in the cloud are not feasible from either the
business or the architecture side.
11.4.1 PeerSim
PeerSim is a simulator for P2P network models that provides a scalable, flexible and
efficient environment for P2P simulations. In a real environment, experiments with
P2P systems are very costly and their results are not reproducible. To avoid these
issues, a Java-based simulator called PeerSim was introduced. PeerSim also supports
the testing of specific protocols. The main features of PeerSim are described in
the following sections.
11.4.1.1 Scalability
PeerSim provides a scalable real-time environment for the simulation of P2P network
models. In PeerSim, the network size is not fixed; it depends on the P2P network
protocol being simulated. Large simulations can also be performed in a very
cost-effective manner.
11.4.1.2 Modularity
In the PeerSim simulation environment, all components can be freely included in a
simulation and are easy to configure. The enrollment of components in the
simulation is dynamic and controllable.
is divided into ten chunks in total; each chunk represents a segment, so ten
segments are involved in the simulation. Each turtle has its own ten variables,
each of which corresponds to possession of a particular file segment. For example,
if the segment 5 variable for a given turtle is set to 0, the turtle does not
possess this segment. If, on the other hand, it is set to 1, the turtle possesses
the segment. Accordingly, seeders begin with all ten segment variables set to 1,
while leechers begin with all of them set to 0. The simulation can finish in two
ways: either all turtles turn green, which means they have downloaded the complete
file, or seeds drop out of the simulation and thereby remove one or more segments
from the network. In the latter case, it is not possible for anyone to finish the
simulation properly by downloading the complete file.
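The per-turtle segment variables described above can be pictured with a minimal
Python sketch. This is an illustration only, not the chapter's NetLogo code; the
names NUM_SEGMENTS, new_seeder, new_leecher, has_segment and is_complete are
hypothetical.

```python
NUM_SEGMENTS = 10  # the file is split into ten segments


def new_seeder():
    """A seeder starts with every segment flag set to 1."""
    return [1] * NUM_SEGMENTS


def new_leecher():
    """A leecher starts with every segment flag set to 0."""
    return [0] * NUM_SEGMENTS


def has_segment(flags, index):
    """True if the turtle possesses the segment at this index."""
    return flags[index] == 1


def is_complete(flags):
    """True once every segment flag is 1, i.e. the whole file is held."""
    return all(flag == 1 for flag in flags)
```

A leecher that receives a segment would simply have the matching flag set to 1,
and it counts as finished once is_complete returns True.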
Basic principles
The basic principle of the BitTorrent model is to design a P2P agent-based model
that can simulate performance with less computational complexity and a shorter
simulation run time. Other simulation platforms such as NS2 have higher time
complexity as well as space complexity. To overcome these limitations, agent-based
modeling is adopted. Multiple clients can contribute data to and download data from
the network. One of the main advantages of the proposed model is that it does not
contain congestion in the network model.
Emergence
The main idea behind the proposed network model is to control the environment
dynamically. The turtles involved in the agent-based model and other parameters
such as selfishness are controlled using slider parameters. The outcome of the
model becomes unpredictable as the behavior space and complexity change. The
results depend more on the rules defined inside the model than on the individuals
or the environment.
Adaptation
In the proposed network environment, some specific rules are defined. There must be
at least one seeder in the simulation. The higher the probability of seeds dropping
out of the simulation, the greater the chance of simulation failure. To avoid a
failed outcome, these rules are adopted by the network model. When a leecher has
obtained all segments of the file, it switches its behavior to that of a seeder to
support the environment. If more seeders are involved in the simulation, the run
time of the simulation is reduced. In other words, the number of seeders acts as a
fitness factor of the network.
Objectives
There is a clear negative correlation between the number of initial seeds and the total
time it takes for all turtles to completely download the file. Furthermore, torrents with
few initial seeds also exhibit greater volatility in torrent activity (roughly analogous to
average download speed). Both these behaviors are consistent with real-world obser-
vations. This simulation also beautifully reproduces typical torrent speed behavior
whereby downloads begin slowly (with few seeders) then rapidly speed up as more
file segments get distributed. This is closely related to the observed pattern whereby
some segments are distributed much more widely than others initially (when there
are only 1 or 2 seeds), a behavior that is also reproduced by this simulation.
Learning
In the proposed model, the agents cannot change their behavior with respect to time
or previous history. Their state depends only on the downloaded segments: once a
turtle holds at least one segment, its color changes to blue, and once it has
downloaded all segments, its color changes to green. These are the main learning
concepts of the proposed agent-based model.
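The color rule can be sketched in a few lines of Python. This is an illustrative
sketch rather than the model's NetLogo code; the function name turtle_color is
hypothetical, and red as the color of a leech with no segments is taken from the
Make-turtles description later in the chapter.

```python
def turtle_color(flags):
    """Map a turtle's segment flags (a list of 0/1 values) to its display color."""
    owned = sum(flags)
    if owned == len(flags):
        return "green"  # all segments downloaded
    if owned >= 1:
        return "blue"   # at least one segment downloaded
    return "red"        # no segments yet
```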
Interaction
There is direct interaction between seeders and leeches. In keeping with the main
concept of a P2P network, there is a one-to-one connection between each pair of
nodes. Through these interactions, the segment values change as seeders upload
segments to leeches. The interaction between nodes is represented as a line in the
graphical user interface.
Initialization
At the initial stage of the model, two kinds of entities are initialized: seeders
and leeches. The numbers of seeders and leeches can be changed using the slider
parameters. These numbers need not be the same in every simulation; they change
with the required network specification. The initial conditions have a strong
effect on the model and simulation. If the values increase, the run-time complexity
also increases.
Input data
The proposed model does not use external input data to represent time-varying
processes. Instead, the input is supplied by the user through the state variables.
The simulation runs according to these state-variable values, but the internal
variables are not affected by the input. The success or failure of the network
model depends on the state-variable values. The simulation is said to have
succeeded when all the agents acquire all file segments (i.e., all turtles end up
green). The simulation fails when users are unable to obtain some file segments.
This occurs when seeds drop out (due to selfishness) before certain file segments
are in distribution amongst leeches.
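The success and failure conditions can be expressed as a small check over the
turtles that are still in the swarm. The following Python sketch is an assumption
about how such a check might look, not the model's own code; simulation_status and
its argument are hypothetical names.

```python
def simulation_status(active_flags, num_segments=10):
    """Classify a run from the segment flags of turtles still in the swarm.

    active_flags is a list of 0/1 lists, one per remaining turtle.
    """
    # Success: every remaining turtle holds every segment (all green).
    if active_flags and all(all(flags) for flags in active_flags):
        return "succeeded"
    # Failure: some segment is held by nobody, so it can never be obtained.
    for segment in range(num_segments):
        if not any(flags[segment] for flags in active_flags):
            return "failed"
    return "running"
```

For example, if the only seed drops out while segment 0 has not yet reached any
leech, the check reports a failed run.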
the size of the file) and allows individual users who are downloading the file to
simultaneously upload completed segments to other users. This means that users are
downloading from each other as well as from the server or the initial distributors.
Compared to traditional web hosting where everyone downloads directly from the
server, BitTorrent significantly reduces bandwidth usage and hardware requirements
for the initial distributor, thereby resulting in substantial cost savings. In some cases
(as this model will demonstrate), it is possible for the initial distributors (seeds) to drop
out completely without undermining the network, provided enough file segments are
in distribution.
[Figure: flowchart of the simulation procedure. Steps: Setup; Go; Check random
number of links (Yes/No); Generate selfish links (random number segments);
Distribute random segments; Download complete segments.]
Leeches are node agents that begin with no parts of the file then gradually obtain file
segments from seeds and from other leeches.
[Figures: sequence diagram of the model (Setup, Go, Loop: Generate random segment
number, Check random segment number, Distribute random segment, Download complete
segment); BitTorrent overview (torrent tracker server, peers/seeds, swarm info,
.torrent file); ABM overview (Experiments; Procedures: setup, go, Already
Widespread, Available, Do plots, Selfish turtle dropout, upload file segment, Make
new seed green).]
This network model will be described in detail and a CNA will be performed
next. The result will be discussed in the results section of the chapter (Figure 11.9).
Now, the “Seeds” breed is first described in the specification model. Afterwards,
we note the internal variables that will be used in the simulation. There are two
specific internal variables. One is the “Random_segment_number” variable, which
holds a number generated randomly at each iteration and determines which file
segment a seed will attempt to share.
The other variable is “Segments,” which represents the data chunks of a complete
file and depends on the size of the file. The seeds have all the segments of a
file.
Unlike seeds, leeches are the agents that try to get segments from seeds or other
leeches. After developing a specification model of the agent breeds, we next
develop a model for the various global variables. These variables play an important
role in the configuration of the simulation model.
11.5.5 Globals
In this section, we describe the global variables of the simulation model. The
model has four key input global variables, which are described next in the
specification model given below.
Sliders:
p: The probability that, each time interval, seeds will drop out of the system.
s: The number of seeds that the simulation begins with.
l: The number of leeches the simulation begins with.
Switch:
Smart Seeding: Whether seeds operate intelligently, or randomly.
In this specification, three of the variables, viz., “p,” “s” and “l,” represent
inputs provided by the user via “slider” UI elements in the NetLogo simulation
model, whereas “Smart_seeding” is a switch, i.e., a Boolean input variable. “p” is
the probability (%) that, in each time interval, seeds will drop out of the system.
“s” is the number of seeds that the simulation begins with. “l” is the number of
leeches the simulation begins with, and “Smart_seeding” determines whether seeds
operate intelligently or randomly. The slider variables “s” and “l” are used to
initialize the numbers of seeds and leeches for a particular simulation. The
greater the value of “s,” the faster the network converges, and the greater the
value of “l,” the longer the execution time of a simulation.
These globals are placed on the initial simulation screen at setup, with “s,” “l”
and “p” as sliders used for gradual changes in the input configuration required in
different experiments. This also helps minimize the effects of our configuration,
so that changing the number of agents and repeating the simulations several times
will not skew the eventual results.
The switch “smart_seeding” selects between intelligent and random seeding. When it
is switched off, seeds randomly select a segment to upload. When smart seeding is
switched on, seeds are significantly more likely to upload a segment that is not
already in distribution amongst leeches.
11.5.6 Procedures
The procedure makes seeds “smart” in that they are highly unlikely to seed parts
that are already in distribution amongst leeches. If more than one leech has the
segment corresponding to the random number that was initially generated by a seed,
random numbers will continue to be generated up to three times until a segment is
found that is not yet distributed. From personal experience, real-world torrents
exhibit both random and “smart” behavior depending on tracker parameters.
Procedure Check-if-segment-is-already-widespread:
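A hedged Python sketch of the redraw logic described for this procedure follows.
It is not the chapter's NetLogo listing; the function name pick_segment_for_seed
and its parameters are hypothetical, and "widespread" is taken to mean that more
than one leech already holds the segment, as stated in the text.

```python
import random


def pick_segment_for_seed(leech_flags, smart_seeding, rng=random,
                          num_segments=10, max_redraws=3):
    """Choose the segment a seed will try to upload this tick.

    leech_flags is a list of 0/1 lists, one per leech.
    """
    choice = rng.randrange(num_segments)
    if not smart_seeding:
        return choice  # random seeding: keep the first draw
    for _ in range(max_redraws):
        holders = sum(flags[choice] for flags in leech_flags)
        if holders <= 1:
            break  # not yet widespread, so keep this segment
        choice = rng.randrange(num_segments)  # widespread: draw again
    return choice
```

Because the number of redraws is capped at three, a smart seed can still end up
with a widespread segment, which matches the "highly unlikely" rather than
"never" wording of the description.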
11.5.6.1 Check-if-segment-is-available
This checks to see if the random segment number that has been selected corresponds to
a file segment that the turtle does not possess. If this is the case, the turtle is instructed
to keep generating numbers (up to 100 times) until it finds one that corresponds to a
file segment that it DOES possess. In other words, it ensures leeches are uploading
segments that they actually have.
Procedure Check-if-segment-is-available:
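The bounded retry described for this check can be sketched in Python as follows.
This is an illustration under stated assumptions, not the original listing;
pick_owned_segment is a hypothetical name, and returning None when no owned
segment is found within the limit is a choice made for this sketch.

```python
import random


def pick_owned_segment(flags, rng=random, max_tries=100):
    """Draw segment numbers until one is found that the turtle possesses.

    flags is the turtle's list of 0/1 segment flags.
    """
    choice = rng.randrange(len(flags))
    tries = 0
    while flags[choice] == 0 and tries < max_tries:
        choice = rng.randrange(len(flags))  # redraw: segment not possessed
        tries += 1
    # None signals that no owned segment turned up within the retry limit.
    return choice if flags[choice] == 1 else None
```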
11.5.6.2 Check-if-segment-is-needed-by-others
This procedure checks to see if the random segment number that has been selected
corresponds to a file segment that every other turtle already possesses. If this is the
case, the turtle is instructed to keep generating numbers (up to 100 times) until it
finds one that corresponds to a file segment that another turtle needs. For example,
if everyone else already has segment 2, a turtle would not upload segment 2.
Procedure Check-if-segment-is-needed-by-others:
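The condition being tested here can also be stated directly. The Python sketch
below deliberately swaps the random redraws of the procedure for a direct scan of
all segments, so it illustrates the test itself rather than the model's retry
loop; the name segments_needed_by_others is hypothetical.

```python
def segments_needed_by_others(other_flags, num_segments=10):
    """Return the segment numbers that at least one other turtle still lacks.

    other_flags is a list of 0/1 lists, one per other turtle.
    """
    return [
        segment
        for segment in range(num_segments)
        if any(flags[segment] == 0 for flags in other_flags)
    ]
```

With this helper, the example in the text reads as follows: if every other turtle
already has segment 2, then 2 is absent from the returned list and would not be
uploaded.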
11.5.6.3 Do-plots
Procedure Do-plots:
This procedure creates the histogram showing the number of links during each “tick”
and a plot showing the number of segments completed with time. This is an accurate
representation of total torrent activity; however, it may not directly correspond to an
individual turtle’s activity.
11.5.6.4 Generate-random-segment-number
“ask leeches with [random-segment-number]” is a “workaround” to overcome a prob-
lem whereby leeches who generated a segment number (say 8) who did not possess
that segment would immediately upload it (during that tick) if they happened to receive
it from another turtle. Their random segment number is set to 88888, so they would
not try to upload anything until the next tick (i.e., after they have generated another
random number).
Procedure Generate-random-segment-number:
11.5.6.5 Go
Procedure Go:
In this procedure, the simulation starts and P2P communication begins. If the
complete file has been downloaded by at least one turtle, that turtle generates a
random segment number and, as a seed, turns green.
11.5.6.6 Make-turtles
Procedure Make-turtles:
In this procedure, all leeches are created first, and their initial color is set to
red. Then, in Step 2, the seeds are created. The leeches and seeds are arranged in
a circle.
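The initialization can be pictured with the following Python sketch, which
creates leeches first, then seeds, and spaces them evenly on a circle. It is an
illustration only; make_turtles, the dictionary keys and the radius are
hypothetical, and green as the initial seed color is an assumption based on the
rule that turtles holding the whole file are green.

```python
import math

NUM_SEGMENTS = 10


def make_turtles(num_seeds, num_leeches, radius=10.0):
    """Create leeches (red, no segments) then seeds (all segments) on a circle."""
    total = num_seeds + num_leeches
    turtles = []
    for i in range(total):
        is_leech = i < num_leeches  # leeches are created before seeds
        angle = 2 * math.pi * i / total
        turtles.append({
            "breed": "leech" if is_leech else "seed",
            "color": "red" if is_leech else "green",  # seed color is assumed
            "segments": [0] * NUM_SEGMENTS if is_leech else [1] * NUM_SEGMENTS,
            "xy": (radius * math.cos(angle), radius * math.sin(angle)),
        })
    return turtles
```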
11.5.6.7 Makes-new-seeds-green
Procedure Makes-new-seeds-green:
This makes turtles that have finished downloading all ten segments turn green. It
does not technically change their breed, but this does not really matter in terms of the
performance and accuracy of this simulation.
11.5.6.8 Selfish-green-turtles-dropout
Procedure Selfish-green-turtles-dropout:
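In this procedure, seeds (green turtles that have finished downloading) may drop
out of the system in each time interval, with probability “p,” instead of
continuing to take part in other leeches' downloads; these are called selfish
turtles. If seeds drop out before every segment is in distribution amongst the
leeches, the download cannot be completed and the simulation stops.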
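A minimal Python sketch of a per-tick dropout step is shown below, assuming that
only turtles holding the complete file may leave and that “p” is a percentage as
in the Globals section. It is not the original NetLogo listing; selfish_dropout
and its parameters are hypothetical names.

```python
import random


def selfish_dropout(turtle_flags, p_percent, rng=random):
    """Return the turtles that remain after one tick of selfish dropout.

    turtle_flags is a list of 0/1 lists; a turtle with all flags set is a seed.
    Each seed leaves with probability p_percent / 100.
    """
    remaining = []
    for flags in turtle_flags:
        is_seed = all(flags)
        if is_seed and rng.random() * 100 < p_percent:
            continue  # this seed drops out of the swarm
        remaining.append(flags)
    return remaining
```

With p set to 0 no seed ever leaves, and with p set to 100 every seed leaves on
the first tick, which is the extreme case in which leeches are most likely to be
left without some segment.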
11.5.6.9 Setup
To work with NetLogo’s newer plotting features, the setup procedure calls clear-all
at the beginning and reset-ticks at the end, in place of the older
clear-all-and-reset-ticks.
Procedure Setup:
Begin
1. clear-all
2. set-default-shape turtles “circle”
3. make-turtles
4. reset-ticks
End
11.5.6.10 Upload-file-segment
Assuming turtles actually have the segment they are trying to upload (highly likely
given the “check-if-segment-is-available” above), they will seek out a leech without
that segment and share it with that turtle. The visible link is partly for esthetic purposes
and simply to show where the upload activity is during each “tick.” This process is
repeated for every possible random number (0–9) and therefore every possible file
segment.
Procedure Upload-file-segment:
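The upload step described above can be sketched in Python as follows. This is a
hedged illustration, not the model's code; upload_segment is a hypothetical name,
and choosing the receiving leech at random among those lacking the segment is an
assumption.

```python
import random


def upload_segment(uploader_flags, leech_flags, segment, rng=random):
    """Give one leech that lacks the segment a copy of it.

    Returns the index of the receiving leech, or None if no upload happens.
    """
    if uploader_flags[segment] != 1:
        return None  # the uploader does not actually have this segment
    candidates = [i for i, flags in enumerate(leech_flags) if flags[segment] == 0]
    if not candidates:
        return None  # every leech already has it
    target = rng.choice(candidates)
    leech_flags[target][segment] = 1  # the link would be drawn to this leech
    return target
```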
All the procedures and their communication links are shown in Tables 11.1
and 11.2.
11.5.7 Experiments
As expected, there is a clear negative correlation between the number of initial seeds
and the total time it takes for all turtles to completely download the file. Furthermore,
torrents with few initial seeds also exhibit greater volatility in torrent activity (roughly
analogous to average download speed). Both these behaviors are consistent with real-
world observations. This simulation also beautifully reproduces typical torrent speed
behavior whereby downloads begin slowly (with few seeders) then rapidly speed up
as more file segments get distributed. This is closely related to the observed pattern
whereby some segments are distributed much more widely than others initially (when
there are only 1 or 2 seeds), a behavior that is also reproduced by this simulation.
As selfishness increases, the probability of failure also increases, although the random
nature of this agent behavior creates extreme variance in this relationship (i.e., total
ABM Globals
ABM Procedure
ABM Links
ABM Agents
Global Slider
Global Switch
Global Results
Procedure Already Widespread
Procedure Available
Procedure Needed by other
Procedure Do plots
Procedure Random Segment Number
Procedure Go
Procedure Make turtles
Procedure Make new seed green
Procedure Selfish turtle dropout
Procedure setup
Procedure upload file segment
Links Link Breeds
Agents Agents Breed
Agents Agents Attribute
Attribute Uploading
Attribute Downloading
[Plot: number of completed pieces (ms) versus time]
one represents the selfishness, the second slider sets the number of initial seeds,
and the third slider sets the number of initial leechers. On the left side there is
also a switch for smart seeding and a graph showing links. On the right side, there
are multiple monitors showing segments and a segment graph. To observe the
performance of BitTorrent in NetLogo, we simulated a network setup and parameters
similar to those explained for PeerSim. The results of both NetLogo and PeerSim are
described in subsequent sections.
parameters. The file to be distributed has 10 segments, and each agent owns
10 variables, one per segment: a value of 1 indicates that the turtle possesses that
segment, and 0 indicates that it does not. In this model, all segment variables of
the leeches are initially set to 0. In this experiment, we set the initial seeds to
50, the leeches to 400, and the selfishness (the probability that each green turtle
drops out) to 0, with smart seeding set to ON. We start the experiment by calling
the go procedure. During every tick, each turtle that owns a file segment (initially
only the seeds) produces a random number from 0 to 9, which identifies the file
segment to be uploaded. Turtles only generate numbers for segments that are required
by others. With smart seeding switched ON, seeds generate random numbers up to three
times until they find a segment that is possessed by 1 or fewer leeches, which
reduces the probability of uploading the same segments multiple times. The upload
method starts by asking the turtle to select a leech that does not own the file
segment to be uploaded; that leech is then asked to set the corresponding variable
to 1, which shows that it now owns the segment. A turtle uploads at most one segment
per tick, whereas downloading is not limited. This mirrors real-world scenarios, in
which download capacity greatly exceeds upload capacity for most users. Once a leech
owns a single segment, it turns blue, and when it owns all segments, it turns green;
the figures show all the leeches turning green after obtaining all the segments.
After leeches turn green, there is a likelihood that these green turtles will drop
out, which signifies selfish behavior; the dropout probability for an individual
green turtle can be set by means of a slider parameter. In this scenario, the
simulation can finish in one of two ways. First, it can end when all turtles turn
green. Second, it can end if seeds drop out in a way that completely eliminates one
or more segments from circulation (Figures 11.11–11.13).
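The segment choice with smart seeding, the color rule, and the two stopping conditions described above can be summarized in a short Python sketch. It is an illustration only: the turtle records are assumed to be dictionaries with "breed" and "segments" fields, and all function names are hypothetical, not those of the NetLogo model.

```python
import random

N_SEGMENTS = 10

def leech_holders(turtles, segment):
    """Number of leeches that already possess `segment`."""
    return sum(t["segments"][segment] for t in turtles if t["breed"] == "leech")

def choose_segment(uploader, turtles, smart_seeding, rng=random, max_tries=3):
    """Pick a segment the uploader owns and that some other turtle still needs."""
    needed = [
        s for s in range(N_SEGMENTS)
        if uploader["segments"][s] == 1
        and any(t["segments"][s] == 0 for t in turtles)
    ]
    if not needed:
        return None
    choice = None
    # Smart seeding: redraw up to three times, stopping at a rare segment.
    for _ in range(max_tries if smart_seeding else 1):
        choice = rng.choice(needed)
        if not smart_seeding or leech_holders(turtles, choice) <= 1:
            break
    return choice

def color_of(turtle):
    """Red with no segments, blue with at least one, green with all."""
    owned = sum(turtle["segments"])
    if owned == N_SEGMENTS:
        return "green"
    return "blue" if owned >= 1 else "red"

def simulation_finished(turtles):
    """Stop when everyone is green or when some segment has no holder left."""
    all_green = all(color_of(t) == "green" for t in turtles)
    segment_lost = any(
        all(t["segments"][s] == 0 for t in turtles) for s in range(N_SEGMENTS)
    )
    return all_green or segment_lost
```

The second stopping condition only becomes reachable when selfishness is above 0, because otherwise no holder of a segment ever leaves.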
[Plot: number of completed segments]
pieces for each node in the network. In PeerSim, the distribution of the pieces
among the nodes is chosen randomly at the beginning. When the simulation begins, a
few nodes start to share the torrent with some, but not all, of their neighbor
nodes, and once all nodes own the file, the graph shows a constant value.
In our agent-based model, we followed the same procedure discussed above. The
x-axis shows the time by which each agent obtains all of the segments, and the
y-axis shows the number of completed segments, as can be seen in Figure 11.4.
Note that we distribute segments among agents instead of pieces. When leeches own
all the segments, they turn green, after which they either behave selfishly or the
simulation ends; in the graph, the curve becomes constant once all segments have
been distributed among the agents. Detailed results of both the ABM and PeerSim
are discussed above.
Figure 11.15 Betweenness of NetLogo simulation
[Plot residue: betweenness centrality of each node of the ABM network, and the fraction of vertices having betweenness x or greater]
The plot in Figure 11.18 shows the eccentricity centrality. “Procedures” has the
highest eccentricity centrality value, and ABM has the second highest, followed by
“Agents,” “Agents Breed” and “Agent Attribute.” In Figure 11.19, we have plotted
the eccentricity centrality in R [49] using a power law. The plot shows a decaying
behavior because a few nodes in the network have many connections, whereas most
nodes have only a few connections.
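As a reminder of what eccentricity measures, the following Python sketch computes, for each node, the greatest shortest-path distance to any other node using breadth-first search. The edge list is a small hypothetical fragment loosely modeled on the ABM network of this chapter, not the full network. Whether the chapter's tool reports this raw value or a transform of it is not stated here, so the raw definition is an assumption.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def eccentricity(adjacency, source):
    """Greatest shortest-path distance from `source` to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())

# Hypothetical fragment of the ABM network, for illustration only.
EDGES = [
    ("ABM", "Globals"), ("ABM", "Procedures"), ("ABM", "Links"), ("ABM", "Agents"),
    ("Globals", "Slider"), ("Globals", "Switch"),
    ("Procedures", "Go"), ("Procedures", "Setup"),
    ("Agents", "Agent Breed"), ("Agents", "Agent Attributes"),
]

ADJ = build_adjacency(EDGES)
ECC = {node: eccentricity(ADJ, node) for node in ADJ}
```

In this fragment the hub node "ABM" is at most two hops from every other node, whereas a leaf such as "Slider" is four hops from the farthest leaf.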
The plot in Figure 11.20 shows the eigen centrality of the network. The node with
the highest eigen centrality is “Procedures,” followed by “ABM.” The nodes with the
lowest eigen centrality are “Download” and “Uploads.” In Figure 11.21, we have
plotted the eigen centrality in R [49] using a power law. The plot shows a decaying
behavior because a few nodes in the network have many connections, whereas most
nodes have only a few connections.
The degree centrality of the network is plotted in Figure 11.22. The node with the
highest degree centrality is “Procedures.” The node “Globals” has the second highest
degree centrality; it represents the inputs that are entered from the NetLogo user
interface. In Figure 11.23, we have plotted the degree centrality in R [49] using a
power law. The plot shows a decaying behavior because a few nodes in the network
have many connections, whereas most nodes have only a few connections.
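The decaying curves discussed above plot, for each value x, the fraction of vertices whose centrality is x or greater. A minimal Python sketch of that cumulative fraction is given below; the function name and the sample degree list are illustrative assumptions, and the chapter's own plots were produced in R.

```python
def fraction_at_least(values):
    """Map each distinct value x to the fraction of entries that are >= x."""
    n = len(values)
    return {x: sum(v >= x for v in values) / n for x in sorted(set(values))}

# Hypothetical degree sequence: many low-degree nodes and one hub.
sample_degrees = [1, 1, 1, 2, 4]
curve = fraction_at_least(sample_degrees)
```

For this sample, every vertex has degree 1 or more, two of the five have degree 2 or more, and only one has degree 4, which gives the decaying shape described in the text.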
[Plot residue: eccentricity, eigen, and degree centrality of each node of the ABM network, each paired with a plot of the fraction of vertices having that value x or greater]
● The ODD model is a textual model of the BitTorrent P2P network. It does not
provide a visualized network model for the BitTorrent simulations.
● The ODD model does not support quantitative comparisons, and it involves no
statistical measures.
● Agent-based modeling provides visual modeling of the BitTorrent network.
Visual modeling is necessary for comparing different agent-based models, but
ODD does not support it.
● The DREAM model includes a pseudocode description of the BitTorrent model,
whereas ODD does not allow a pseudocode description of the agent-based
model.
● ODD models do not allow a technical description of the algorithms.
● The DREAM model supports activity diagrams and sequence diagrams of the
network model, whereas ODD does not include this type of modeling support.
In conclusion, ODD models give the basic ideas of the modeling, whereas the
DREAM model allows a detailed description of the agent-based modeling of
BitTorrent. The theory of computation is the basic study of complexity and automata.
Complexity theory provides the theoretical study of complex systems. Agent-based
modelers mostly prefer to address a specific theoretical problem by using
agent-based modeling. The main relation between agent-based modeling and
computation theory is that they are mutually beneficial for the study of CASs. To
address theoretical concepts, complexity provides fuzzy concepts to agent-based
modeling. As agent-based modeling has enhanced understanding, the concepts of
complexity have come to vary from field to field. Theory of computation belongs to the field
11.7 Conclusion
Among the many P2P networks, BitTorrent is widely used for sharing files over a
network. In this chapter, we proposed modeling and simulation of the BitTorrent
protocol by using a combination of agent-based and complex network-based
approaches. The simulation results demonstrate that our proposed ABM-based
BitTorrent model performed better. Furthermore, for ABM specification, we followed
two approaches: the first is ODD and the second is the DREAM methodology. We
presented a qualitative as well as a quantitative comparison of the ODD and DREAM
specification techniques. The comparative study showed that the DREAM methodology
is the more useful approach for documenting an ABM, not only in terms of modeling
but also for replication of the models, specifically for P2P networks.
References
[1] Cohen B. “Incentives build robustness in BitTorrent.” In Workshop on
Economics of Peer-to-Peer systems, vol. 6, pp. 68–72. 2003.
[2] Johnsen JA, Karlsen LE, and Birkeland SS. “Peer-to-peer networking with
BitTorrent.” Department of Telematics, Norwegian University of Science and
Technology (NTNU), Norway. (2005).
[3] Scanlon M and Shen H. “An analysis of BitTorrent cross-swarm peer participa-
tion and geolocational distribution.” In 2014 23rd International Conference on
Computer Communication and Networks (ICCCN), Shanghai, pp. 1–6. 2014.
[4] Niazi MA and Hussain A. Complex adaptive communication networks and
environments: part 1. Simulation Transactions of the Society for Modeling
and Simulation International 2013;89:559–561.
[5] Niazi MA. Complex adaptive systems modeling: a multidisciplinary roadmap.
Complex Adaptive Systems Modeling 2013;1:1.
[6] Grimm V, Berger U, DeAngelis DL, Polhill JG, Giske J, and Railsback
SF. The ODD protocol: a review and first update. Ecological Modelling
2010;221(23):2760–2768.
[35] Bharambe AR, Herley C, and Padmanabhan VN. “Analyzing and Improving
a BitTorrent Networks Performance Mechanisms.” In INFOCOM 2006. 25th
IEEE International Conference on Computer Communications. Proceedings
2006.
[36] Srinivasan A and Aldharrab H. XTRA—eXtended bit-Torrent pRotocol for
Authenticated covert peer communication. Peer-to-Peer Networking and
Applications 2018;11:1–15.
[37] Vlavianos A, Iliofotou M, and Faloutsos M. “BiToS: Enhancing BitTorrent
for supporting streaming applications.” In INFOCOM 2006. 25th IEEE Inter-
national Conference on Computer Communications. Proceedings, pp. 1–6.
IEEE, 2006.
[38] Bindal R, Cao P, Chan W, et al. “Improving traffic locality in BitTorrent via
biased neighbor selection.” In Distributed Computing Systems, 2006. ICDCS
2006, pp. 1–9. IEEE, 2006.
[39] Bharambe AR, Herley C, and Padmanabhan VN. “Analyzing and improving
bittorrent performance.” Microsoft Research, Microsoft Corporation, One
Microsoft Way, Redmond, WA 98052 (2005): 2005-03.
[40] Barcellos MP, Mansilha RB, and Brasileiro FV. “Torrentlab: Investigating
bittorrent through simulation and live experiments.” In Computers and Com-
munications, 2008. ISCC 2008. IEEE Symposium on, pp. 507–512. IEEE,
2008.
[41] Costa-Montenegro E, Burguillo-Rial JC, Gil-Castiñeira F, and González-
Castaño FJ. Implementation and analysis of the BitTorrent protocol with
a multi-agent model. Journal of Network and Computer Applications
2011;34(1):368–383.
[42] Qiu D and Srikant R. “Modeling and performance analysis of BitTorrent-
like peer-to-peer networks.” In ACM SIGCOMM Computer Communication
Review, vol. 34, no. 4, pp. 367–378. ACM, 2004.
[43] CHOE, YungRyn. “Analyzing and improving a bittorrent network’s perfor-
mance mechanisms.” In: ACM MM’07. 2007.
[44] Eger K, Hoßfeld T, Binzenhöfer A, and Kunzmann G. “Efficient simulation
of large-scale p2p networks: packet-level vs. flow-level simulations.” In Pro-
ceedings of the Second Workshop on Use of P2P, GRID and Agents for the
Development of Content Networks, pp. 9–16. ACM, 2007.
[45] Yang W and Abu-Ghazaleh N. “GPS: a general peer-to-peer simulator and
its use for modeling BitTorrent.” In Modeling, Analysis, and Simulation of
Computer and Telecommunication Systems, 2005. 13th IEEE International
Symposium on, pp. 425–432. IEEE, 2005.
[46] R Development Core Team. R: A Language and Environment for Statistical
Computing. Vienna, Austria: the R Foundation for Statistical Computing. 2013.
Available online at http://www.R-project.org/.
[47] Pouwelse JA, Garbacki P, Epema DHJ, and Sips HJ. “An introduction to
the bittorrent peer-to-peer file-sharing system.” In 19th IEEE Annual Com-
puter Communications Workshop, IEEE Technical Committee on Computer
Communications. 2004.
12.1 Introduction
1 Department of Computer Science, COMSATS University Islamabad, Pakistan
through collaborative analysis of the author network. Next, we reveal the core
subject categories of the domain in terms of centrality, frequency of occurrence,
bursts, and sigma. We then identify productive institutions in terms of betweenness
centrality, burstiness, sigma, and frequency of publications, and explore popular
keywords in terms of centrality, frequency of occurrence, co-occurrence burst, and
sigma. Subsequently, we identify core countries in terms of sigma, centrality,
publication frequency, and bursts. Additionally, we explore clusters of the cited
references.
The rest of the paper is organized as follows: Section 12.2 gives the background,
Section 12.3 presents the methodology, Section 12.4 demonstrates the results,
Section 12.5 summarizes the results, and Section 12.6 contains the conclusions.
12.2 Background
In this section, we present the background for better understanding of the social
networks and visual analysis.
that is, some actors have many connections with other actors, whereas others have
only a few. These are heterogeneous networks that follow a power law degree
distribution [1].
Some nodes occupy a central position in the network, and information diffuses
through the network via these central nodes. Diffusion is the dissemination of
events (information, disease, rumor, virus, etc.) across a network. Social networks
are an active source of information dissemination. On the one hand, useful
information can be promoted and propagated efficiently and effectively via social
networks. On the other hand, malicious information such as viruses and rumors can
also propagate uncontrollably in social networks.
Figure 12.2 Citations history of the scientometric data in the domain of “social
networks” over latest 11 years (2007–18)
Figure 12.3 Google Trends for the comparison between science mapping tools.
Color figure can be viewed at: https://www.researchgate.net
It allows dynamic, spatial, temporal, and interactive visualization, and it has a
simple and interactive interface.
It can directly import data from WoS, Scopus, and PubMed. CiteSpace takes
scientometric information and performs the following bibliometric analyses: author,
document, and journal co-citation analysis; coauthor, institution, and country
collaboration analysis; and co-term and keyword co-occurrence analysis. The key
features of CiteSpace include a timeline view, geographic mapping, and dual-map
overlays. It also offers a built-in database and can be connected to MySQL on the
local host. It can export cited references to Endnote and RIS.
It can also generate a summary report containing key information obtained from
the analyzed literature. Extensive documentation is available, including books,
articles, tutorials, demos, a manual, and videos. It also has a Facebook page on
which near real-time help is available.
We have compared CiteSpace with four other science mapping tools: “BibExcel,”
“CRExplorer,” “CitNetExplorer,” and “BiblioTools.” In the Google Trends comparison
shown in Figure 12.3, CiteSpace is on top with an average popularity score of 27.
Social networks—a scientometric visual survey 387
The largest radius of the node Ellison NB (2007) indicates that it is a landmark
node with the highest citation frequency. The thickness of the purple trim around
Kossinets G (2006) shows that it is the most influential article in the domain. The
pink trims around nodes indicate that their centrality score is ≥0.1.
Further details are given below in tabular form.
Table 12.3 presents top articles ranked in terms of citation counts. The top article
is Ellison NB (2007) in Cluster #2 with citation frequency of 558. It has 9,521 citations
on GS. The second one is Fortunato S (2010) in Cluster #0 with citation frequency
of 542. It obtained 6,725 citations on GS. The third is Christakis NA (2007) in Cluster
#0 with citation frequency of 501. It has 4,583 citations on GS. The fourth is Boyd DM
(2007) in Cluster #2 with citation frequency of 455. It has 12,726 citations on GS. The
fifth is Newman MEJ (2010) in Cluster #0 with citation frequency of 417. The sixth is
Kaplan AM (2010) in Cluster #2 with citation frequency of 387. It has 2,493 citations
on GS. The seventh is Borgatti SP (2009) in Cluster #3 with citation frequency of
385. It has 17,398 citations on GS. The eighth is Newman MEJ (2003) in Cluster #1
with citation frequency of 381. It has 1,840 citations on GS. The ninth is Christakis
NA (2008) in Cluster #0 with citation frequency of 375. The tenth is Snijders TAB
(2010) in Cluster #3 with citation frequency of 359. It has 1,320 citations on GS.
Table 12.4 presents top articles listed in terms of citation burst. “The burstiness
of the frequency of an entity over time indicates a specific duration in which an abrupt
change of the frequency takes place” [13]. The top article ranked by citation burst is
Newman MEJ (2003) in Cluster #1 with burstiness of 140.95. The second one is Boyd
DM (2007) in Cluster #2 with bursts of 126.59. The third is Albert R (2002) in Cluster
#1 with bursts of 120.88. The fourth is Ellison NB (2007) in Cluster #2 with bursts of
111.91. The fifth is Watts DJ (1998) in Cluster #1 with bursts of 100.56. The sixth is
Barabasi AL (1999) in Cluster #1 with bursts of 95.75. The seventh is McPherson M
(2001) in Cluster #5 with bursts of 83.29. The eighth is Borgatti SP (2002) in Cluster
#3 with bursts of 80.51. The ninth is Christakis NA (2007) in Cluster #0 with bursts
of 79.06. The tenth is Putnam R D (2000) in Cluster #5 with bursts of 75.83.
betweenness centrality and citation burst” [15]. The top-ranked item by sigma is
Kaplan AM (2010) in Cluster #2 with a sigma of 5819401151475.16. The second one
is Newman MEJ (2003) in Cluster #1 with a sigma of 27277977080.97. The third is
Albert R (2002) in Cluster #1 with a sigma of 79866981.70. The fourth is Centola D
(2010) in Cluster #0 with a sigma of 66555034.25. The fifth is Bond RM (2012) in
Cluster #0 with a sigma of 8847176.59. The sixth is Onnela JP (2007) in Cluster #0
with a sigma of 40726.42. The seventh is Newman MEJ (2010) in Cluster #0 with a
sigma of 25355.81. The eighth is Kossinets G (2006) in Cluster #0 with a sigma of
24045.82. The ninth is McPherson M (2001) in Cluster #5 with a sigma of 19341.76.
The tenth is Amaral LAN (2000) in Cluster #1 with a sigma of 1555.13.
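The very wide range of sigma values becomes easier to read with the formula in view. Sigma in CiteSpace is commonly given as (centrality + 1) raised to the power of burstness; the Python sketch below takes that definition as an assumption, since the full definition is not reproduced in this section, and the function name is hypothetical.

```python
def sigma(centrality, burstness):
    """Sigma as (centrality + 1) ** burstness (assumed CiteSpace definition)."""
    return (centrality + 1.0) ** burstness

# An item with no citation burst has sigma 1 whatever its centrality.
no_burst = sigma(0.07, 0.0)

# Rounded inputs: centrality 0.12 and bursts of 52.20.
large = sigma(0.12, 52.20)
```

Because burstness is an exponent, even a modest centrality combined with a strong burst produces a very large sigma. With the rounded inputs above, the result is on the order of a few hundred, the same order as the 426.27 reported later for the United States in the country network; the remaining gap is what one would expect from rounding the centrality to two decimals.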
After a detailed visualization of the cited-reference co-citation network, our next
target is to identify the largest connected cluster in the network of cited references.
Table 12.7 Summary of the largest connected cluster of the cited references of
co-citation network
a Labelling algorithms: Term Frequency Inverse Document Frequency (TFIDF),
Log-Likelihood Ratio (LLR), and Mutual Information (MI).
The largest cluster (#0) has 30 members and a silhouette value of 0.825. The
silhouette score suggests that the homogeneity of the underlying cluster is relatively
high. It is labeled as “social network” by LLR, “application | complex network” by
TFIDF, and “statistical analysis” by MI. The most active citer to the cluster is 0.17
Pei, S (2013) [16].
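For readers unfamiliar with the silhouette value, the standard definition for a single item is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster; a cluster's score is typically the mean over its members. The Python sketch below applies this textbook definition to one-dimensional toy data. How CiteSpace computes distances internally is not described here, so the data and the function name are purely illustrative.

```python
def silhouette(index, points, labels):
    """Textbook silhouette of one item, using absolute difference as distance."""
    own = labels[index]

    def mean_distance(cluster):
        distances = [
            abs(points[index] - p)
            for j, (p, label) in enumerate(zip(points, labels))
            if label == cluster and j != index
        ]
        return sum(distances) / len(distances) if distances else None

    a = mean_distance(own)
    if a is None:
        return 0.0  # convention: a singleton cluster scores 0
    b = min(mean_distance(c) for c in set(labels) if c != own)
    return (b - a) / max(a, b) if max(a, b) > 0 else 0.0

# Two well-separated toy clusters.
toy_points = [0.0, 1.0, 10.0, 11.0]
toy_labels = [0, 0, 1, 1]
score = silhouette(0, toy_points, toy_labels)
```

Values close to 1, such as those reported for the clusters in this section, mean that members sit much closer to their own cluster than to any other, which is why a high silhouette is read as high homogeneity.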
The second largest cluster (#1) has 28 members and a silhouette value of 0.933
indicating high homogeneity score. The cluster is labeled as “small world network”
by LLR, “role | infection” by TFIDF, and “assortative interaction” by MI. The most
active citer to the cluster is 0.29 Newman, MEJ (2001) [17].
The third largest cluster (#2) has 26 members and a silhouette value of 0.998,
which indicates higher homogeneity score. The cluster is labeled as “social networking
site” by LLR, “feelings | educational method” by TFIDF, and “job search” by MI. The
most active citer to the cluster is 0.23 Krasnova, H (2015) [18].
The fourth largest cluster (#3) has 25 members and a silhouette score of 0.846. It
is labeled as “structural hole” by LLR, “coauthorship network | physical activity” by
TFIDF, and “acute myocardial infarction” by MI. The most active citer to the cluster
is 0.24 Hoppe, B (2010) [19].
The fifth largest cluster (#4) has eight members and a silhouette value of 0.969. It
is labeled as “HIV infection” by LLR, “practice | perspective” by TFIDF, and “social
network” by MI. The most active citer to the cluster is 0.62 Riolo, CS (2001) [20].
The sixth largest cluster (#5) has five members and a silhouette value of 0.945. It
is labeled as “CEO compensation” by LLR, “dyadic stability” by TFIDF, and “social
network” by MI. The most active citer to the cluster is 0.4 Denner, J (2001) [21].
After identifying the largest component in the cited reference co-citation network,
our focus is the analysis of author collaboration network.
73 Liu Y 2012
64 Zhang Y 2009
64 Zhang J 2008
61 Kim J 2011
61 Wang Y 2013
57 Latkin CA 2001
56 Chen Y 2009
52 Lee S 2010
49 Lee J 2010
48 Chen L 2013
Table 12.10 presents top authors sorted in terms of betweenness centrality. The
top-ranked author by betweenness centrality is “Zhang Y” (2009) with centrality score
of 0.04. The second one is “Kim Y” (2011) with centrality score of 0.04. The third
is “Newman MEJ” (2001) with centrality score of 0.03. The fourth is “Lee J” (2010)
with centrality score of 0.03. The fifth is “Wang L” (2007) with centrality score of
0.03. The sixth is “Holme P” (2003) with centrality score of 0.03. The seventh is
“Fowler JH” (2007) with centrality score of 0.03. The eighth is “Park J” (2004) with
centrality score of 0.03. The ninth is “Liljeros F” (2004) with centrality score of 0.03.
The tenth is “Christakis NA” (2008) with centrality score of 0.02.
Table 12.11 presents top authors sorted in terms of sigma score. The top-ranked
author by sigma is “Zhang Y” (2009) with a sigma of 1.98. The second one is “Newman
MEJ” (2001) with a sigma of 1.33. The third is “Kim Y” (2011) with a sigma of 1.31.
The fourth is “Lee J” (2010) with a sigma of 1.29. The fifth is “Wang Y” (2013) with
a sigma of 1.23. The sixth is “Wang L” (2007) with a sigma of 1.21. The seventh
is “Zhang J” (2008) with a sigma of 1.21. The eighth is “Holme P” (2003) with a
sigma of 1.18. The ninth is “Christakis NA” (2008) with a sigma of 1.17. The tenth is
“Park J” (2004) with a sigma of 1.16.
After analyzing coauthors collaboration network, we move toward visual analysis
of the institution’s network.
The thickness of the purple trim around Harvard University indicates that it is the
most central institution, and the largest radius around Harvard indicates that it is
also the landmark node. The red highlights on the Chinese Academy of Sciences
indicate that it is the most active institution with the highest burst.
The detailed analysis is presented in the tabular form below.
Table 12.12 demonstrates the publication count of institutions. The top-ranked
institution by publication frequency is the “Harvard University, USA” (2001) with
(2001) with the centrality of 0.07. It is ranked 32nd in World University Rankings.
The ninth is “University of California, Berkeley, USA” (2001) with the centrality
of 0.07. It is ranked 18th in World University Rankings. The tenth is “University
of Maryland, USA” (2001) with the centrality of 0.06. It is ranked 69th in World
University Rankings.
Table 12.14 presents top institutions sorted in terms of sigma score. The top-
graded institution by sigma is the “Harvard University, USA” (2001) with a sigma
of 1.00. It is ranked sixth in World University Rankings. The second one is the
“Karolinska Institute, Sweden” (2001) with a sigma of 1.00. It is ranked 38th in
World University Rankings. The third is the “University of Michigan, USA” (2001)
with a sigma of 1.00. It is ranked 21st in World University Rankings. The fourth is
the “Stockholm University, Sweden” (2001) with a sigma of 1.00. It is ranked 134th
in World University Rankings. The fifth is the “University of Oxford, UK” (2002)
with a sigma of 1.00. It is ranked first in World University Rankings. The sixth is
“Pennsylvania State University, USA” (2001) with a sigma of 1.00. It is ranked 77th
in World University Rankings. The seventh is the “University of Toronto, Canada”
(2001) with a sigma of 1.00. It is ranked 22nd in World University Rankings. The
eighth is the “University of Melbourne, Australia” (2001) with a sigma of 1.00. It is
ranked 32nd in World University Rankings. The ninth is the “University of California,
Berkeley, USA” (2001) with a sigma of 1.00. It is ranked 18th in World University
Rankings. The tenth is the “University of Maryland, USA” (2001) with a sigma of
1.00. It is ranked 69th in World University Rankings.
Table 12.15 presents top institutions listed in terms of burst of publications.
The top-ranked institution in terms of bursts is “University of Chinese Academy of
Sciences, China” (2014) with bursts of 40.52. It is ranked 189th in World University
Rankings. The second one is “University of Southern California, USA” (2016) with
bursts of 36.5. It is ranked 66th in World University Rankings. The third is “Tsinghua
University, China” (2013) with bursts of 29.49. It is ranked 30th in World University
Rankings. The fourth is “Nanyang Technological University, Singapore” (2005) with
bursts of 27.79. It is ranked 52nd in World University Rankings. The fifth is the
“Nanyang Technological University, Singapore” (2016) with bursts of 26.76. It is
ranked 52nd in World University Rankings.
After giving an overview of the collaborative institution network analysis, we
present an overview of the analysis of the country–country network.
one is People’s Republic of China (2001) with publication frequency of 4,045. The
third is England (2001) with publication frequency of 3,637. The fourth is Australia
(2001) with publication frequency of 2,071. The fifth is Canada (2001) with pub-
lication frequency of 1,955. The sixth is Spain (2001) with publication frequency
of 1,592. The seventh is Germany (2001) with publication frequency of 1,522. The
eighth is Netherlands (2001) with publication frequency of 1,480. The ninth is Italy
(2001) with publication frequency of 1,216. The tenth is South Korea (2001) with
publication frequency of 1,156.
Table 12.17 lists the top countries ranked in terms of burstiness. The top country
ranked by bursts is the United States (2001) with bursts of 52.20. The second one is
Sweden (2001) with bursts of 51.65. The third is India (2002) with bursts of 38.31.
The fourth is Iran (2010) with bursts of 26.02. The fifth is Saudi Arabia (2013) with
bursts of 23.92. The sixth is Pakistan (2011) with bursts of 21.89. The seventh is
Egypt (2002) with bursts of 13.04. The eighth is Croatia (2009) with bursts of 10.57.
The ninth is Israel (2001) with bursts of 8.67. The tenth is Estonia (2008) with bursts
of 7.04.
Table 12.18 presents the central regions in terms of betweenness centrality. The
top-ranked country in terms of centrality is England (2001) with the centrality of
0.25. The second one is the United States (2001) with the centrality of 0.12. The
third is Canada (2001) with the centrality of 0.11. The fourth is Spain (2001) with the
centrality of 0.11. The fifth is France (2001) with the centrality of 0.10. The sixth is
Denmark (2001) with the centrality of 0.08. The seventh is Switzerland (2001) with
the centrality of 0.08. The eighth is Chile (2008) with the centrality of 0.08. The ninth
is Finland (2001) with the centrality of 0.07. The tenth is Germany (2001) with the
centrality of 0.07.
Table 12.19 contains top countries sorted in terms of sigma score. The top-graded
country by sigma is the United States (2001) with a sigma of 426.27. The second one
is Sweden (2001) with a sigma of 6.14. The third is Saudi Arabia (2013) with a sigma
of 2.09. The fourth is India (2002) with a sigma of 1.52. The fifth is Norway (2001)
with a sigma of 1.42. The sixth is Denmark (2001) with a sigma of 1.37. The seventh
is Finland (2001) with a sigma of 1.35. The eighth is Hungary (2002) with a sigma
of 1.27. The ninth is Israel (2001) with a sigma of 1.11. The tenth is Iran (2010) with
a sigma of 1.10.
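The sigma score combines the two preceding measures. CiteSpace describes sigma as (centrality + 1) raised to the power of burstness. This section does not restate that formula, so the sketch below adopts it as an assumption, and the function names are hypothetical. It uses the United States row reported in Tables 12.17 to 12.19 to check whether the three published values are mutually consistent.

```python
def sigma(centrality, burst):
    """Sigma under the assumed definition (centrality + 1) ** burst."""
    return (centrality + 1.0) ** burst


def implied_centrality(sigma_value, burst):
    """Invert the assumed definition to recover the unrounded centrality."""
    return sigma_value ** (1.0 / burst) - 1.0


# United States row as reported in Tables 12.17, 12.18, and 12.19.
us_burst, us_centrality, us_sigma = 52.20, 0.12, 426.27

# The tables round centrality to two decimals, so recomputing sigma from
# 0.12 will not reproduce 426.27 exactly: the exponent amplifies rounding.
print(sigma(us_centrality, us_burst))

# Going the other way shows which unrounded centrality the reported
# sigma and burst would imply under the assumed formula.
print(implied_centrality(us_sigma, us_burst))
```

Under this assumption, the implied centrality for the United States is close to 0.123, which agrees with the 0.12 reported in Table 12.18 after rounding. The formula also suggests why most sigma values in the table sit just above 1: a node with near-zero centrality has a sigma near 1 however strong its burst.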
network” (2001) with a co-occurrence frequency of 11,151. The second one is “net-
work” (2001) with a co-occurrence frequency of 2,983. The third is “model” (2001)
with a co-occurrence frequency of 2,679. The fourth is “behavior” (2001) with a co-
occurrence frequency of 2,452. The fifth is “social network analysis” (2005) with a
co-occurrence frequency of 1,801. The sixth is “Internet” (2002) with a co-occurrence
frequency of 1,740. The seventh is “health” (2001) with a co-occurrence frequency
of 1,689. The eighth is “Facebook” (2010) with a co-occurrence frequency of 1,639.
The ninth is “community” (2001) with a co-occurrence frequency of 1,616. The tenth
is “performance” (2001) with a co-occurrence frequency of 1,557.
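Keyword frequencies of this kind can be tallied from the keyword lists attached to each bibliographic record. A minimal sketch, assuming that a keyword is counted once per record and that matching is case-insensitive; the records and the function name are hypothetical, and the tool used in this study may normalize keywords differently.

```python
from collections import Counter
from itertools import combinations


def keyword_counts(records):
    """Return per-keyword record counts and per-pair co-occurrence counts."""
    occurrence = Counter()
    pair = Counter()
    for keywords in records:
        # Lowercase and de-duplicate so each keyword counts once per record.
        unique = sorted({keyword.lower() for keyword in keywords})
        occurrence.update(unique)
        # Sorted input makes every pair appear in one canonical order.
        pair.update(combinations(unique, 2))
    return occurrence, pair


# Hypothetical records, each holding the keywords of one publication.
records = [
    ["social network", "Facebook", "privacy"],
    ["social network", "health"],
    ["Social Network", "facebook"],
]
occurrence, pair = keyword_counts(records)
print(occurrence.most_common(3))
print(pair[("facebook", "social network")])
```

The pair counts give the edge weights of a keyword co-occurrence network, and the per-keyword counts give the node sizes.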
Table 12.21 presents top keywords based on co-occurrence burst. The top-ranked
keyword by bursts is “media” (2015) with bursts of 135.87. The second one is
“children” (2001) with bursts of 101.12. The third is “stress” (2001) with bursts of 95.19.
The fourth is “social media” (2012) with bursts of 89.61. The fifth is “Twitter” (2014)
with bursts of 80.67. The sixth is “mortality” (2001) with bursts of 77.78. The seventh
is “weak ty” (2008) with bursts of 70.61. The eighth is “quality of life” (2001) with
bursts of 70.20. The ninth is “life” (2001) with bursts of 64.17. The tenth is “care”
(2001) with bursts of 63.19.
Table 12.22 illustrates top keywords listed in terms of betweenness centrality.
The top-ranked keyword by betweenness centrality is “social network” (2001) with
the centrality of 0.50. The second one is “behavior” (2001) with the centrality of 0.13.
The third is “support” (2001) with the centrality of 0.09. The fourth is “Facebook”
(2010) with the centrality of 0.09. The fifth is “mortality” (2001) with the centrality
of 0.08. The sixth is “health” (2001) with the centrality of 0.08. The seventh is
“community” (2001) with the centrality of 0.08. The eighth is “predictor” (2001)
with the centrality of 0.07. The ninth is “children” (2001) with the centrality of 0.06.
The tenth is “population” (2001) with the centrality of 0.06.
Table 12.23 lists top keywords sorted in terms of sigma. The top-ranked keyword
by sigma is “mortality” (2001) with a sigma of 389.96. The second one is “children”
(2001) with a sigma of 234.65. The third is “support” (2001) with a sigma of 13.08.
The fourth is “predictor” (2001) with a sigma of 7.48. The fifth is “social support”
(2001) with a sigma of 6.44. The sixth is “women” (2001) with a sigma of 5.33.
The seventh is “privacy” (2014) with a sigma of 5.11. The eighth is “stress” (2001)
with a sigma of 3.65. The ninth is “HIV” (2001) with a sigma of 3.45. The tenth is
“population” (2001) with a sigma of 3.09.
After giving an overview of the popular keywords of the domain, we present an
overview of the key subject categories of the domain.
In Figure 12.10, it can be noted that “computer science” is the most frequently
occurring category and the most influential category of the “social network” domain. A
detailed analysis is given below in tabular form.
Table 12.24 presents top categories sorted in terms of co-occurrence frequency.
The top-ranked subject category by co-occurrence frequency is “Computer Science”
(2001) with a co-occurrence frequency of 6,939. The second one is “Business & Eco-
nomics” (2001) with a co-occurrence frequency of 4,510. The third is “Psychology”
(2001) with a co-occurrence frequency of 4,097. The fourth is “Computer Science,
Information Systems” (2002) with a co-occurrence frequency of 3,695. The fifth is
“Engineering” (2003) with a co-occurrence frequency of 2,423. The sixth is “Pub-
lic, Environmental & Occupational Health” (2001) with a co-occurrence frequency
of 2,296. The seventh is “Management” (2001) with a co-occurrence frequency of
2,141. The eighth is “Sociology” (2001) with a co-occurrence frequency of 2,011.
The ninth is “Information Science & Library Science” (2001) with a co-occurrence
frequency of 1,834. The tenth is “Business” (2001) with a co-occurrence frequency
of 1,667.
Table 12.25 presents top categories sorted in terms of co-occurrence bursts. The
top-ranked item by bursts is “Sociology” (2001) with bursts of 77.40. The second
one is “Gerontology” (2001) with bursts of 63.49. The third is “Nursing” (2001)
with bursts of 60.05. The fourth is “Geriatrics & Gerontology” (2001) with bursts of
56.93. The fifth is “Hospitality, Leisure, Sport & Tourism” (2016) with bursts of 48.15.
The sixth is “Psychiatry” (2001) with bursts of 44.72. The seventh is “Psychology,
Developmental” (2001) with bursts of 44.70. The eighth is “Demography” (2001)
with bursts of 44.45. The ninth is “Physics, Mathematical” (2001) with bursts of
42.99. The tenth is “Psychology, Applied” (2001) with bursts of 40.51.
Table 12.26 presents top categories sorted in terms of betweenness central-
ity. The top-ranked item by centrality is “Psychology” (2001) with the centrality
of 0.21. The second one is “Computer Science, Interdisciplinary Applications”
(2002) with the centrality of 0.20. The third is “Health Care Sciences & Services”
(2001) with the centrality of 0.20. The fourth is “Mathematics” (2001) with the cen-
trality of 0.18. The fifth is “Social Sciences—Other Topics” (2001) with the centrality
of 0.16. The sixth is “Public, Environmental & Occupational Health” (2001) with the
centrality of 0.15. The seventh is “Environmental Sciences & Ecology” (2001) with
the centrality of 0.13. The eighth is “Engineering” (2003) with the centrality of 0.11.
The ninth is “Social Sciences, Interdisciplinary” (2001) with the centrality of 0.11.
The tenth is “Psychology, Multidisciplinary” (2001) with the centrality of 0.10.
Table 12.27 presents top categories sorted in terms of sigma. The top-ranked item
by sigma is “Sociology” (2001) with a sigma of 350.69. The second one is “Psychol-
ogy” (2001) with a sigma of 14.41. The third is “Psychiatry” (2001) with a sigma of
9.19. The fourth is “Physics, Mathematical” (2001) with a sigma of 8.71. The fifth is
“Social Sciences, Biomedical” (2001) with a sigma of 6.38. The sixth is “Biomedical
Social Sciences” (2001) with a sigma of 6.38. The seventh is “Rehabilitation” (2001)
with a sigma of 5.45. The eighth is “Public, Environmental & Occupational Health”
(2001) with a sigma of 5.35. The ninth is “Physics” (2001) with a sigma of 3.22. The
tenth is “Mathematics, Interdisciplinary Applications” (2001) with a sigma of 2.44.
After analyzing the key subject categories, we perform our final analysis, that of
the journal co-citation network.
unique records. Here we provide highlights of the key findings of this scientometric
study.
First, we observed the popularity of the domain over the past 10 years and found
that, starting from a few citations in 2007, the “social network” domain rose to 18,000
citations in the year 2017 alone. Similarly, the number of publications in the domain
rose from 314 in the year 2001 to 5,710 in the year 2017.
In the subsequent analysis of the cited reference co-citation network, we identified
that the article “Ellison NB (2007)” is the landmark node with a citation frequency of
558, the article “Newman MEJ (2003)” has the strongest citation burst with a strength
of 140.95, the top-ranked article in terms of betweenness centrality is “Centola D
(2010),” and the top-ranked article in terms of sigma is “Kaplan AM (2010).” We also
identified the key turning points in the largest connected component of the cited
reference co-citation network, which are “Kaplan AM (2010),” “Bond RM (2012),”
“McPherson M (2001),” “Watts D (1999),” and “Newman MEJ (2002),” and the key pivot
points, which are “Amaral Lan (2000),” “Watts D (1999),” and “Newman MEJ (2000).”
In the analysis of the collaborative author network, we observed that the author “Liu
Y (2012)” is the landmark node with the highest number of publications, that
“Wang Y (2013)” has the strongest publication burst, and that “Zhang Y (2009)” is the
most central author in the domain and also has the highest burst.
In the institution–institution network analysis, we found that Harvard is the
most productive and the most central organization in the domain. It also has the highest
sigma score. We also observed that the “University of Chinese Academy of Sciences”
has the strongest burst.
In the visualization of the keyword co-occurrence network, we observed that “social
network” is the most popular and most central keyword in the domain. We also
observed that the keyword “media” has the highest co-occurrence burst and the
keyword “mortality” has the highest sigma score.
In the category co-occurrence network, we observed that “computer science” is
the most frequently occurring subject category of the domain. We also observed that
“sociology” has the highest co-occurrence burst and the highest sigma score, and
that “psychology” is the most central category.
Finally, in the analysis of the journal co-citation network, we noted
that the journal “Computers in Human Behavior” has the highest publication frequency;
its impact factor is 3.536.
References
[1] M. Newman, Networks: An introduction: Oxford University Press, New York,
USA, 2010.
[2] M. Newman, Networks: An introduction: New York, NY: Oxford University
Press Inc., 2010, pp. 1–2.
[3] D. J. D. S. Price, “Networks of scientific papers,” Science, USA, Vol. 149, No.
3683, pp. 510–515, 1965.
[4] J. R. Clough and T. S. Evans, “What is the dimension of citation space?,”
Physica A: Statistical Mechanics and its Applications, Netherlands, vol. 448,
pp. 235–247, 2016.
[5] C. Chen, CiteSpace: A practical guide for mapping scientific literature: Nova
Science Publishers, Incorporated, New York, USA, 2016.
[6] M. E. Newman, “Coauthorship networks and patterns of scientific collab-
oration,” Proceedings of the National Academy of Sciences, vol. 101, pp.
5200–5205, 2004.
[7] P. Pradhan, Science mapping and visualization tools used in bibliometric &
scientometric studies: An overview, INFLIBNET Centre, Gandhinagar, Series
23, Report no.4, ISSN 0971-9849 2017.
[8] C. Chen, “CiteSpace II: Detecting and visualizing emerging trends and tran-
sient patterns in scientific literature,” Journal of theAssociation for Information
Science and Technology, vol. 57, pp. 359–377, 2006.
[9] O. Persson, R. Danell, and J. W. Schneider, “How to use BibExcel for var-
ious types of bibliometric analysis,” In: Åström, F., Danell, R., Larsen, B.,
Wiborg-Schneider, J. (Eds.), Celebrating scholarly communication studies: A
Festschrift for Olle Persson at his 60th Birthday, vol. 5, pp. 9–24, 2009.
[10] A. Thor, W. Marx, L. Leydesdorff, and L. Bornmann, “Introducing Cite-
dReferencesExplorer (CRExplorer): A program for reference publication year
spectroscopy with cited references standardization,” Journal of Informetrics,
vol. 10, pp. 503–515, 2016.
[11] N. J. Van Eck and L. Waltman, “CitNetExplorer: A new software tool for
analyzing and visualizing citation networks,” Journal of Informetrics, vol. 8,
pp. 802–823, 2014.