Simgrid Tutorial

The document discusses tools and methodologies for simulating large-scale distributed systems. It introduces the SimGrid framework, which aims to provide standard tools that allow experiments on such systems to be reproducible, use common methodologies, and be easily understood by students. The document outlines SimGrid's agenda, which includes discussing resource models, validation of simulation models, platform modeling, and examples of using SimGrid to simulate scheduling algorithms and compare solutions.

The SimGrid Framework for Research on Large-Scale Distributed Systems

Martin Quinson (Nancy University, France), Arnaud Legrand (CNRS, Grenoble University, France), Henri Casanova (University of Hawai'i at Manoa, USA)
[email protected]

Large-Scale Distributed Systems Research


Large-scale distributed systems are in production today
Grid platforms for e-Science applications
Peer-to-peer file sharing
Distributed volunteer computing
Distributed gaming

Researchers study a broad range of systems


Data lookup and caching algorithms
Application scheduling algorithms
Resource management and resource sharing strategies

They want to study several aspects of their system performance


Response time
Throughput
Scalability
Robustness
Fault-tolerance
Fairness

Main question: comparing several solutions in relevant settings


SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (2/142)

Large-Scale Distributed Systems Science?


Requirement for a Scientific Approach
Reproducible results
You can read a paper, reproduce a subset of its results, and improve on them

Standard methodologies and tools


Grad students can learn their use and become operational quickly
Experimental scenarios can be compared accurately

Current practice in the field: quite different


Very few common methodologies and tools
Experimental settings rarely detailed enough in literature (test source codes?)

Purpose of this tutorial


Present emerging methodologies and tools Show how to use some of them in practice Discuss open questions and future directions
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (3/142)

Agenda
Experiments for Large-Scale Distributed Systems Research
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentations in Large-Scale Distributed Systems
Resource Models in SimGrid
  Analytic Models Underlying SimGrid
  Experimental Validation of the Simulation Models
Platform Instantiation
  Platform Catalog
  Synthetic Topologies
  Topology Mapping
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (4/142)

Agenda
Experiments for Large-Scale Distributed Systems Research
  Methodological Issues
  Main Methodological Approaches
    Real-world experiments
    Simulation
  Tools for Experimentations in Large-Scale Distributed Systems
Resource Models in SimGrid
  Analytic Models Underlying SimGrid
  Experimental Validation of the Simulation Models
Platform Instantiation
  Platform Catalog
  Synthetic Topologies
  Topology Mapping
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (5/142)

Analytical or Experimental?

Analytical works?
Some purely mathematical models exist
They allow a better understanding of principles in spite of dubious applicability (impossibility theorems, parameter influence, . . . )

Theoretical results are difficult to achieve


Everyday practical issues (routing, scheduling) become NP-hard problems
Most of the time, only heuristics whose performance has to be assessed are proposed

Models are too simplistic and rely on ultimately unrealistic assumptions.

One must run experiments


Most published research in the area is experimental

SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (6/142)

Running real-world experiments


Eminently believable to demonstrate the applicability of the proposed approach
Very time- and labor-consuming:
The entire application must be functional
Parameter sweeps; design alternatives

Choosing the right testbed is difficult


My own little testbed?
Well-behaved, controlled, stable
Rarely representative of production platforms

Real production platforms?


Not everyone has access to them; CS experiments are disruptive for users
Experimental settings may change drastically during an experiment (components fail; other users load resources; administrators change config.)

Results remain limited to the testbed


Impact of testbed specificities hard to quantify, even over a collection of testbeds...
Extrapolations and explorations of what-if scenarios are difficult (what if the network were different? what if we had a different workload?)

Experiments are uncontrolled and unrepeatable


No way to test alternatives back-to-back (even if disruption is part of the experiment)

Difficult for others to reproduce results


even though this is the basis for scientific advances!
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (7/142)

Simulation
Simulation solves these difficulties
No need to build a real system, nor the full-fledged application
Ability to conduct controlled and repeatable experiments
(Almost) no limits to experimental scenarios
Possible for anybody to reproduce results

Simulation in a nutshell
Predict aspects of the behavior of a system using an approximate model of it
Model: set of objects defined by a state, plus rules governing the state evolution
Simulator: program computing the evolution according to the rules

Wanted features:
Accuracy: correspondence between simulation and real-world
Scalability: actually usable by computers (fast enough)
Tractability: actually usable by human beings (simple enough to understand)
Instantiability: can actually describe real settings (no magical parameters)
Relevance: captures the object of interest

SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (8/142)

Simulation in Computer Science


Microprocessor Design
A few standard cycle-accurate simulators are used extensively
http://www.cs.wisc.edu/~arch/www/tools.html
Possible to reproduce simulation results

Networking
A few established packet-level simulators: NS-2, DaSSF, OMNeT++, GTNetS
Well-known datasets for network topologies
Well-known generators of synthetic topologies
SSF standard: http://www.ssfnet.org/
Possible to reproduce simulation results

Large-Scale Distributed Systems?


No established simulator up until a few years ago Most people build their own ad-hoc solutions
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (9/142)

Simulation in Parallel and Distributed Computing


Used for decades, but under drastic assumptions in most cases

Simplistic platform model


Fixed computation and communication rates (Flops, Mb/s) Topology either fully connected or bus (no interference or simple ones) Communication and computation are perfectly overlappable

Simplistic application model


All computations are CPU intensive (no disk, no memory, no user) Clear-cut communication and computation phases Computation times even ignored in Distributed Computing community Communication times sometimes ignored in HPC community

Straightforward simulation in most cases


Fill in a Gantt chart or count messages with a computer rather than by hand No need for a simulation standard
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (10/142)

Large-Scale Distributed Systems Simulations?


Simple models justifiable at small scale
Cluster computing (matrix multiply application on a switched dedicated cluster)
Small-scale distributed systems

Hardly justifiable for Large-Scale Distributed Systems


Heterogeneity of components (hosts, links)
Quantitative: CPU clock, link bandwidth and latency Qualitative: ethernet vs myrinet vs quadrics; Pentium vs Cell vs GPU

Dynamicity
Quantitative: resource sharing and availability variation
Qualitative: resources come and go (churn)

Complexity
Hierarchical systems: grids of clusters of multi-processors being multi-cores Resource sharing: network contention, QoS, batches Multi-hop networks, non-negligible latencies Middleware overhead (or optimizations) Interference of computation and communication (and disk, memory, etc)
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (11/142)

Agenda
Experiments for Large-Scale Distributed Systems Research Methodological Issues Main Methodological Approaches Tools for Experimentations in Large-Scale Distributed Systems
Possible designs
Experimentation platforms: Grid5000 and PlanetLab
Emulators: ModelNet and MicroGrid
Packet-level simulators: ns-2, SSFNet and GTNetS
Ad-hoc simulators: ChicagoSim, OptorSim, GridSim, . . .
Peer-to-peer simulators
SimGrid

Resource Models in SimGrid
  Analytic Models Underlying SimGrid
  Experimental Validation of the Simulation Models
Platform Instantiation
  Platform Catalog
  Synthetic Topologies
  Topology Mapping
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (12/142)

Models of Large-Scale Distributed Systems


Model = set of objects defined by a state + set of rules governing the state evolution

Model objects:
Evaluated application: does actions, stimuli to the platform
Resources (network, CPU, disk): constitute the platform, react to stimuli
The application is blocked until its actions are done
A resource can sometimes do actions itself, to represent external load

Expressing interaction rules


From more abstract to less abstract:
Mathematical simulation: based solely on equations
Discrete-event simulation: system = set of dependent actions & events
Emulation: trapping and virtualization of low-level application/system actions
Real execution: no modification

Boundaries are blurred


Tools can combine several paradigms for different resources
Emulators may use a simulator to compute resource availabilities
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (13/142)

Simulation options to express rules


Network
Macroscopic: flows in pipes (mathematical & coarse-grain d.e. simulation); data sizes are liquid amounts, links are pipes
Microscopic: packet-level simulation (fine-grain d.e. simulation)
Emulation: actual flows through some network + timing + time expansion

CPU
Macroscopic: flows of operations in the CPU pipelines
Microscopic: cycle-accurate simulation (fine-grain d.e. simulation)
Emulation: virtualization via another CPU / virtual machine

Applications
Macroscopic: application = analytical flow
Less macroscopic: set of abstract tasks with resource needs and dependencies (coarse-grain d.e. simulation; application specification or pseudo-code API)
Virtualization: emulation of actual code, trapping application-generated events


SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (14/142)

Large-Scale Distributed Systems Simulation Tools


A lot of tools exist
Grid5000, Planetlab, MicroGrid, Modelnet, Emulab, DummyNet ns-2, GTNetS, SSFNet ChicagoSim, GridSim, OptorSim, SimGrid, . . . PeerSim, P2PSim, . . .

How do they compare?


How do they work?
Components taken into account (CPU, network, application)
Options used for each component (direct execution; emulation; discrete-event simulation)

What are their relative qualities?


Accuracy (correspondence between simulation and real-world)
Technical requirements (programming language, specific hardware)
Scale (tractable size of systems at reasonable speed)
Experimental settings configurable and repeatable, or not

SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (15/142)

Experimental tools comparison


Grid5000 Planetlab Modelnet MicroGrid ns-2 SSFNet GTNetS ChicSim OptorSim GridSim P2PSim PlanetSim PeerSim SimGrid CPU direct virtualize emulation coarse d.e. coarse d.e. coarse d.e. math/d.e. Disk direct virtualize amount coarse d.e. (underway) Network direct virtualize emulation fine d.e. fine d.e. fine d.e. fine d.e. coarse d.e. coarse d.e. coarse d.e. constant time math/d.e. Application direct virtualize emulation emulation coarse d.e. coarse d.e. coarse d.e. coarse d.e. coarse d.e. coarse d.e. state machine coarse d.e. state machine d.e./emul Requirement access none lot material none C++ and tcl Java C++ C Java Java C++ Java Java C or Java Settings fixed uncontrolled controlled controlled controlled controlled controlled controlled controlled controlled controlled controlled controlled controlled Scale <5000 hundreds dozens hundreds <1,000 <100,000 <177,000 few 1,000 few 1,000 few 1,000 few 1,000 100,000 1,000,000 few 100,000

Direct execution: no experimental bias (?)
Experimental settings fixed (between hardware upgrades), but not controllable
Virtualization allows sandboxing, but no experimental settings control
Emulation can have high overheads (but captures the overhead)
Discrete-event simulation is slow, but hopefully accurate
To scale, you have to trade speed for accuracy
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (16/142)

Grid5000 (consortium INRIA)


French experimental platform
1500 nodes (3000 CPUs, 4000 cores) over 9 sites
Nation-wide 10Gb dedicated interconnection
http://www.grid5000.org

Scientific tool for computer scientists


Nodes are deployable: install your own OS image
Allows study at any level of the stack:
Network (TCP improvements) Middleware (scalability, scheduling, fault-tolerance) Programming (components, code coupling, GridRPC) Applications

Applications not modified, direct execution
Environment controlled, experiments repeatable
Relative scalability (only 1500-4000 nodes)
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (17/142)

PlanetLab (consortium)
Open platform for developing, deploying, and accessing planetary-scale services
Planetary-scale: 852 nodes, 434 sites, >20 countries

Distributed virtualization: each user can get a slice of the platform
Unbundled management: local behavior defined per node; network-wide behavior as services; multiple competing services in parallel (shared, unprivileged interfaces)
As unstable as the real world
Demonstrates the feasibility of P2P applications or middlewares
No reproducibility!
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (18/142)

ModelNet (UCSD/Duke)
Applications
Emulation and virtualization: actual code executed on virtualized resources
Key trade-off: scalability versus accuracy

Resources: system calls intercepted


gethostname, sockets

CPU: direct execution on CPU


Slowdown not taken into account!

Network: emulation through:


one emulator (running on FreeBSD) on a gigabit LAN
hosts + IP aliasing for virtual nodes
emulation of heterogeneous links
Similar ideas are used in other projects (Emulab, DummyNet, Panda, . . . )
Amin Vahdat et al., Scalability and Accuracy in a Large-Scale Network Emulator, OSDI'02.
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (19/142)

MicroGrid (UCSD)
Applications
Application supported by emulation and virtualization
Actual application code is executed on virtualized resources
Accounts for CPU and network
Application

Resources: wraps syscalls & grid tools


gethostname, sockets, GIS, MDS, NWS

Virtual Resources

CPU: direct execution on fraction of CPU


finds the right mapping

Network: packet-level simulation


parallel version of MaSSF

MicroGrid Physical Resources

Time: synchronize real and virtual time


find the right execution rate
Andrew Chien et al., The MicroGrid: a Scientific Tool for Modeling Computational Grids, SuperComputing 2002.
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (20/142)

Packet-level simulators
ns-2: the most popular one
Several protocols (TCP, UDP, . . . ), several queuing models (DropTail, RED, . . . )
Several application models (HTTP, FTP), wired and wireless networks
Written in C++, configured using TCL. Limited scalability (<1,000)

SSFNet: implementation of SSF standard


Scalable Simulation Framework: unified API for discrete-event simulation of distributed systems
Written in Java, usable on 100,000 nodes

GTNetS: Georgia Tech Network Simulator


Design close to real network protocol philosophy (stacked layers)
C++, reported usable with 177,000 nodes

Simulation tools of / for the networking community


Topic: study network behavior, routing protocols, QoS, . . .
Goal: improve network protocols
Microscopic simulation of packet movements
Inadequate for us (long simulation times, CPU not taken into account)
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (21/142)

ChicagoSim, OptorSim, GridSim, . . .


Network simulators are not adapted; emulation solutions are too heavy
PhD students just need a simulator to plug their algorithm into
Data placement/replication
Grid economy

Many simulators. Most are home-made and short-lived; some are released
ChicSim was designed for the study of data replication (Data Grids), built on ParSec
Ranganathan, Foster. Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications, HPDC'02.

OptorSim developed for the European DataGrid


DataGrid, CERN. OptorSim: Simulating data access optimization algorithms

GridSim focused on Grid economy


Buyya et al., GridSim: A Toolkit for the Modeling and Simulation of Global Grids, CCPE'02.

every [sub-]community seems to have its own simulator

SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (22/142)

PeerSim, P2PSim, . . .
The peer-to-peer community also has its own private collection of simulators: focused on P2P protocols; main challenge = scale

P2PSim Multi-threaded discrete-event simulator. Constant communication time. Alpha release (April 2005)
http://pdos.csail.mit.edu/p2psim/

PlanetSim Multi-threaded discrete-event simulator. Constant communication time. Last release (2006)
http://planet.urv.es/trac/planetsim/wiki/PlanetSim

PeerSim Designed for epidemic protocols; processes = state machines. Two simulation modes: cycle-based (time is discrete) or event-based. Resources are not modeled. 1.0.3 release (December 2007)
http://peersim.sourceforge.net/

OverSim A recent one, based on OMNeT++ (April 2008)
http://www.oversim.org/

SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (23/142)

SimGrid
History

(Hawaii, Grenoble, Nancy)

Created just like other home-made simulators (only a bit earlier ;)
Original goal: scheduling research
Need for speed (parameter sweep); accuracy not negligible (the HPC community is concerned by performance)

SimGrid in a Nutshell
Simulation of communicating processes performing computations
Key feature: blend of mathematical simulation and coarse-grain d.e. simulation
Resources: defined by a rate (MFlop/s or Mb/s) + a latency
Also allows dynamic traces and failures

Tasks can use multiple resources, explicitly or implicitly


Transfer over multiple links, computation using disk and CPU

Simple API to specify a heuristic or an application easily


Casanova, Legrand, Quinson. SimGrid: a Generic Framework for Large-Scale Distributed Experiments, EUROSIM'08.
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (24/142)

Experimental tools comparison


Grid5000 Planetlab Modelnet MicroGrid ns-2 SSFNet GTNetS ChicSim OptorSim GridSim P2PSim PlanetSim PeerSim SimGrid CPU direct virtualize emulation coarse d.e. coarse d.e. coarse d.e. math/d.e. Disk direct virtualize amount coarse d.e. (underway) Network direct virtualize emulation fine d.e. fine d.e. fine d.e. fine d.e. coarse d.e. coarse d.e. coarse d.e. constant time math/d.e. Application direct virtualize emulation emulation coarse d.e. coarse d.e. coarse d.e. coarse d.e. coarse d.e. coarse d.e. state machine coarse d.e. state machine d.e./emul Requirement access none lot material none C++ and tcl Java C++ C Java Java C++ Java Java C or Java Settings fixed uncontrolled controlled controlled controlled controlled controlled controlled controlled controlled controlled controlled controlled controlled Scale <5000 hundreds dozens hundreds <1,000 <100,000 <177,000 few 1,000 few 1,000 few 1,000 few 1,000 100,000 1,000,000 few 100,000

Direct execution: no experimental bias (?)
Experimental settings fixed (between hardware upgrades), but not controllable
Virtualization allows sandboxing, but no experimental settings control
Emulation can have high overheads (but captures the overhead)
Discrete-event simulation is slow, but hopefully accurate
To scale, you have to trade speed for accuracy
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (25/142)

So what simulator should I use?


It really depends on your goal / resources
Grid5000 experiments are very good . . . if you have access and plenty of time
PlanetLab does not enable reproducible experiments
ModelNet, ns-2, SSFNet, GTNetS are meant for networking experiments (no CPU)
ModelNet requires some specific hardware setup
MicroGrid simulations take a lot of time (although they can be parallelized)
SimGrid's models have clear limitations (e.g. for short transfers)
SimGrid simulations are quite easy to set up (but a rewrite is needed)
SimGrid does not require that a full application be written
Ad-hoc simulators are easy to set up, but their validity is still to be shown, i.e., the results obtained may be plainly wrong
Ad-hoc simulators are obviously not generic (difficult to adapt to your own needs)

Key trade-off seems to be accuracy vs speed


The more abstract the simulation, the faster
The less abstract the simulation, the more accurate
Does this trade-off really hold?
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (26/142)

Simulation Validation
Crux of simulation work
Validation is difficult, and almost never done convincingly (not specific to CS: other sciences have the same issue here)

How to validate a model (and obtain scientific results)?


Claim that it is plausible (justification = argumentation)
Show that it is reasonable:
Some validation graphs in a few special cases at best
Validation against another validated simulator
Argue that trends are respected (absolute values may be off), so it is useful to compare algorithms/designs
Conduct an extensive verification campaign against real-world settings

SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (27/142)

Simulation Validation: the FLASH example


FLASH project at Stanford
Building large-scale shared-memory multiprocessors
Went from conception, to design, to actual hardware (32 nodes)
Used simulation heavily over 6 years

Authors compared simulation(s) to the real world


Error is unavoidable (30% error in their case was not rare), negating the impact of "we got 1.5% improvement" claims
Complex simulators do not ensure better simulation results:
Simple simulators worked better than sophisticated ones (which were unstable)
Simple simulators predicted trends as well as slower, sophisticated ones
One should focus on simulating the important things

Calibrating simulators on real-world settings is mandatory

For FLASH, the simple simulator was all that was needed. . .
Gibson, Kunz, Ofelt, Heinrich. FLASH vs. (Simulated) FLASH: Closing the Simulation Loop. Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2000.
SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (28/142)

Conclusion
Large-Scale Distributed System Research is Experimental
Analytical models are too limited
Real-world experiments are hard & limited
Most literature relies on simulation

Simulation for distributed applications is still taking baby steps


Compared for example to the hardware design or networking communities (but more advanced for HPC Grids than for P2P)
Lots of home-made tools, no standard methodology
Very few simulation projects even try to:
Publish their tools for others to use
Validate their tools
Support other people's use: genericity, stability, portability, documentation, . . .

SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (29/142)

Conclusion
Claim: SimGrid may prove helpful to your research
User community much larger than the contributor group
Used in several communities (scheduling, GridRPC, HPC infrastructure, P2P)
Model limits known thanks to validation studies
Easy to use, extensible, fast to execute
Around for almost 10 years

Remainder of this talk: present SimGrid in detail


Under the cover:
Models used
Implementation overview
Tool performance and scalability
Use cases and success stories

Main limitations
Model validity

Practical usage
How to use it for your research

SimGrid for Research on Large-Scale Distributed Systems Experiments for Large-Scale Distributed Systems Research (30/142)

Agenda
Experiments for Large-Scale Distributed Systems Research
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentations in Large-Scale Distributed Systems
Resource Models in SimGrid
  Analytic Models Underlying SimGrid
  Experimental Validation of the Simulation Models
Platform Instantiation
  Platform Catalog
  Synthetic Topologies
  Topology Mapping
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid (31/142)

Agenda
Experiments for Large-Scale Distributed Systems Research
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentations in Large-Scale Distributed Systems
Resource Models in SimGrid
  Analytic Models Underlying SimGrid
    Modeling a Single Resource
    Multi-hop Networks
    Resource Sharing
  Experimental Validation of the Simulation Models
Platform Instantiation
  Platform Catalog
  Synthetic Topologies
  Topology Mapping
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid (32/142)

Analytic Models underlying the SimGrid Framework


Main challenges for SimGrid design
Simulation accuracy:
Designed for the HPC scheduling community: don't mess with the makespan!
At the very least, understand the validity range
Simulation speed:
Users conduct large parameter-sweep experiments over alternatives

Microscopic simulator design


Simulate the packet movements and router algorithms
Simulate the CPU actions (or micro-benchmark classical basic operations)
Hopefully very accurate, but very slow (simulation time >> simulated time)

Going faster while remaining reasonable?


Need to come up with macroscopic models for each kind of resource
Main issue: resource sharing. It emerges naturally in the microscopic approach:
Packets of different connections are interleaved by routers
CPU cycles of different processes get slices of the CPU
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid (33/142)

Modeling a Single Resource


Basic model: Time = L + size/B

Resources work at a given rate (B, in MFlop/s or Mb/s)
Each use has a given latency (L, in s)

Application to processing elements (CPU/cores)


Very widely used (latency usually neglected)
No cache effects and other specific software/hardware interactions
No better analytical model (reality is too complex and changing)
Sharing is easy in steady-state: fair share for each process

Application to networks
Turns out to be inaccurate for TCP: B is not constant, but depends on RTT, packet loss ratio, window size, etc.
Several models were proposed in the literature

SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid (34/142)
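The basic single-resource model can be written in a few lines. This is an illustrative sketch, not SimGrid code; the function names are made up:

```python
def single_resource_time(size, rate, latency):
    """Basic model: Time = L + size/B.

    size: amount of work (MFlop or Mb); rate: B (MFlop/s or Mb/s);
    latency: L (s)."""
    return latency + size / rate

def steady_state_share(rate, n_processes):
    """Steady-state fair sharing: each of n processes gets B/n."""
    return rate / n_processes

# 100 MFlop on a 50 MFlop/s CPU with no latency: 2 seconds
print(single_resource_time(100, 50, 0))    # → 2.0
# Two processes sharing that CPU each progress at 25 MFlop/s
print(steady_state_share(50, 2))           # → 25.0
```

The same two-line rule covers both CPUs and (inaccurately, as the next slides show) network links.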

Modeling TCP performance (single ow, single link)


Padhye, Firoiu, Towsley, Kurose. Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation. IEEE/ACM Transactions on Networking, Vol. 8, Num. 2, 2000.

B = min( Wmax/RTT , 1 / ( RTT·sqrt(2bp/3) + T0 · min(1, 3·sqrt(3bp/8)) · p · (1 + 32p²) ) )

Wmax: receiver advertised window
RTT: round-trip time
p: loss indication rate
b: number of packets acknowledged per ACK
T0: TCP average retransmission timeout value

Model discussion
Captures TCP congestion control (fast retransmit and timeout mechanisms)
Assumes steady-state (no slow-start)
Accuracy shown to be good over a wide range of values
p and b not known in general (model hard to instantiate)

SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid (35/142)
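The formula above is straightforward to evaluate numerically. A sketch (the parameter values used below are arbitrary examples, not measurements):

```python
import math

def padhye_bandwidth(Wmax, RTT, p, b, T0):
    """Padhye et al. steady-state TCP Reno throughput.

    Wmax: receiver advertised window; RTT: round-trip time (s);
    p: loss indication rate; b: packets acknowledged per ACK;
    T0: average retransmission timeout (s).
    Result is in units of Wmax per second."""
    congestion = (RTT * math.sqrt(2 * b * p / 3)
                  + T0 * min(1, 3 * math.sqrt(3 * b * p / 8))
                  * p * (1 + 32 * p ** 2))
    return min(Wmax / RTT, 1 / congestion)

# With a tiny loss rate, the window limit Wmax/RTT dominates
print(padhye_bandwidth(Wmax=20, RTT=0.1, p=1e-6, b=2, T0=0.2))
# With heavier losses, the congestion-control term takes over
print(padhye_bandwidth(Wmax=1000, RTT=0.1, p=0.01, b=2, T0=0.2))
```

The two prints illustrate the two regimes of the min: window-limited versus loss-limited.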

SimGrid model for single TCP ow, single link


Definition of the link l
Ll: physical latency
Bl: physical bandwidth

Time to transfer size bytes over the link:

Time = Ll + size/B'l

Empirical bandwidth: B'l = min(Bl, Wmax/RTT)

Justification: the sender emits Wmax bytes, then waits for the ack (i.e., waits one RTT)
Upper limit: first min term of the previous model
RTT assumed to be twice the physical latency
Router queuing time assumed to be included in this value
SimGrid for Research on Large-Scale Distributed Systems Resource Models in SimGrid (36/142)
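Combining the physical characteristics with the window limit gives a short rule. An illustrative sketch (names made up, not SimGrid's API):

```python
def link_transfer_time(size, B, L, Wmax):
    """Time = L + size/B', with B' = min(B, Wmax/RTT) and RTT = 2L.

    size, Wmax in bits; B in bits/s; L in seconds."""
    if L > 0:
        RTT = 2 * L                     # RTT assumed twice the physical latency
        B_eff = min(B, Wmax / RTT)      # empirical bandwidth
    else:
        B_eff = B                       # zero latency: no window limit
    return L + size / B_eff

# A fat long link: the TCP window, not the physical bandwidth, limits the rate
t = link_transfer_time(size=100e6, B=1e9, L=0.05, Wmax=65536 * 8)
print(t)
```

On this example the window term Wmax/RTT is far below the 1 Gb/s physical rate, so the transfer takes much longer than the raw size/B estimate.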

Modeling Multi-hop Networks: Store & Forward


(figure: a message sent from source S across three hops, over links l1, l2, l3)

First idea, quite natural
Pay the price of going through link 1, then go through link 2, etc.
Analogy with the time to go from one city to another: sum of the times spent on each road
Pay the price of going through link 1, then go through link 2, etc. Analogy to the time to go from a city to another: time on each road

Unfortunately, things don't work this way
The whole message is not stored on each router
Data is split into packets over TCP networks (surprise, surprise)
Transfers on each link occur in parallel

Modeling Multi-hop Networks: Wormhole

S → l1 → l2 → l3 (packets pi,j of size MTU)

Remember networking classes?
Links packetize the stream according to the MTU (Maximum Transmission Unit)
Easy to simulate (SimGrid until 2002; GridSim 4.0 & most ad-hoc tools do)

Unfortunately, things don't work this way
IP packet fragmentation algorithms are complex (when MTUs differ)
TCP contention mechanisms:
Sender only emits cwnd packets before waiting for an ACK
Timeouts, fast retransmit, etc.
⇒ as slow as packet-level simulators, yet not quite as accurate

Macroscopic TCP modeling is a field

TCP bandwidth sharing studied by several authors
Data streams modeled as fluids in pipes
Same model for a single stream over multiple links or multiple streams over multiple links

flow 0 → link 1 (flow 1) → link 2 (flow 2) → ... → link L (flow L)

Notations
L: set of links
Cl: capacity of link l (Cl > 0)
nl: number of flows using link l
F: set of flows; f ∈ P(L)
λf: transfer rate of f

Feasibility constraint
Links deliver at most their capacity: ∀l ∈ L, Σ_{f∋l} λf ≤ Cl


Max-Min Fairness

Objective function: maximize min_{f∈F}(λf)
Equilibrium reached when increasing any λf would decrease some λf' with λf' ≤ λf
Very reasonable goal: gives a fair share to everyone
Optionally, one can add priorities wf for each flow f: maximize min_{f∈F}(wf·λf)

Bottleneck links
For each flow f, one of its links is the limiting one, l (with more capacity on that link l, the flow f would get more overall)
The objective function gives that l is saturated and that f gets the biggest share on it:
∀f ∈ F, ∃l ∈ f: Σ_{f'∋l} λf' = Cl and λf = max{λf', f' ∋ l}

L. Massoulié and J. Roberts, Bandwidth sharing: objectives and algorithms, IEEE/ACM Trans. Netw., vol. 10, no. 3, pp. 320-328, 2002.

Implementation of Max-Min Fairness

Bucket-filling algorithm
Set the bandwidth of all flows to 0
Increase the bandwidth of every flow by ε. And again, and again, and again.
When one link is saturated, all flows using it are limited (removed from the set)
Loop until all flows have found a limiting link

Efficient algorithm
1. Search for the bottleneck link l such that Cl/nl = min_{k∈L} Ck/nk
2. ∀f ∋ l, λf = Cl/nl; update all nk and Ck to remove these flows
3. Loop until all λf are fixed
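The efficient algorithm fits in a few lines of Python. A sketch, not SimGrid's actual C implementation; links and flows are identified by arbitrary keys:

```python
def max_min_fairness(capacities, flows):
    """Max-Min fair bandwidth shares.
    capacities: {link: C_l}; flows: {flow: set of links it crosses}.
    Returns {flow: rate}."""
    cap = dict(capacities)
    todo = {f: set(links) for f, links in flows.items()}
    rate = {}
    while todo:
        # number of still-unfixed flows per link
        n = {l: sum(1 for links in todo.values() if l in links) for l in cap}
        # bottleneck: smallest fair share C_l / n_l among used links
        l = min((k for k in cap if n[k] > 0), key=lambda k: cap[k] / n[k])
        share = cap[l] / n[l]
        for f in [f for f, links in todo.items() if l in links]:
            rate[f] = share
            for k in todo[f]:          # remove the flow from its links
                cap[k] -= share
            del todo[f]
    return rate
```

On a backbone where a 1MB/s link is shared with four 1000MB/s links, the flow crossing the slow link gets 1 and the other flow gets the remaining 999 on their common link.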


Max-Min Fairness on a Homogeneous Linear Network

C1 = C, C2 = C; n1 = 2, n2 = 2
(flow 0 crosses links 1 and 2; flow 1 uses link 1; flow 2 uses link 2)

All links have the same capacity C
Each of them is limiting. Let's choose link 1
λ0 = C/2 and λ1 = C/2
Remove flows 0 and 1; update the link capacities
Link 2 sets λ2 = C/2
We're done computing the bandwidth allocated to each flow: λ0 = λ1 = λ2 = C/2


Max-Min Fairness on a Backbone

C0 = 1, C1 = C2 = C3 = C4 = 1000
n0 = 1, n1 = 1, n2 = 2, n3 = 1, n4 = 1
(Flow 1 crosses links 1 and 2; Flow 2 crosses links 0, 2, 3 and 4)

The limiting link is link 0 since 1/1 = min(1/1, 1000/1, 1000/2, 1000/1, 1000/1)
This fixes λ2 = 1. Update the links
The limiting link is now link 2
This fixes λ1 = 999
Done. We know λ1 = 999 and λ2 = 1

Side note: OptorSim 2.1 on the Backbone

OptorSim (developed @CERN for DataGrid)
One of the rare ad-hoc simulators not using wormhole

Unfortunately, strange resource sharing:
1. For each link, compute the share that each flow may get: Cl/nl
2. For each flow, compute what it gets: λf = min_{l∈f} Cl/nl

On the backbone (C0 = 1, C1..C4 = 1000; n0 = 1, n1 = 1, n2 = 2, n3 = 1, n4 = 1), the per-link shares are 1, 1000, 500, 1000, 1000, so:
λ1 = min(1000, 500, 1000) = 500!!
λ2 = min(1, 500, 1000) = 1

λ1 is limited by link 2, although 499 remain unused on link 2
This unwanted feature is even listed in the README file...
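The flawed policy is easy to reproduce. A sketch of the sharing rule described above, not OptorSim's actual code:

```python
def optorsim_like_shares(capacities, flows):
    """OptorSim-style sharing: each flow gets the minimum, over its
    links, of the per-link equal share C_l / n_l. Capacity left unused
    by bottlenecked flows is never redistributed."""
    n = {l: sum(1 for links in flows.values() if l in links)
         for l in capacities}
    return {f: min(capacities[l] / n[l] for l in links)
            for f, links in flows.items()}
```

On the backbone above, flow 1 gets 500 even though link 2 still has unused capacity; a max-min allocation would give it 999.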

Proportional Fairness

Max-Min validity limits
Max-Min gives a fair share to everyone
Reasonable, but TCP does not do so
Congestion mechanism: Additive Increase, Multiplicative Decrease (AIMD)
Complicates modeling, as shown in the literature

Proportional Fairness
Max-Min gives more to long flows (resource-eager); TCP is known to do the opposite
Objective function: maximize Σ_{f∈F} wf·log(λf) (instead of min_{f∈F} wf·λf for Max-Min)
The log favors short flows

Kelly, Charging and rate control for elastic traffic, European Transactions on Telecommunications, vol. 8, 1997, pp. 33-37.


Implementing Proportional Fairness

Karush-Kuhn-Tucker conditions:
The solution {λf}_{f∈F} is unique
Any other feasible solution {λ'f}_{f∈F} satisfies: Σ_{f∈F} (λ'f − λf)/λf ≤ 0
Compute the point {λf} where the derivative is zero (convex optimization)
Use Lagrange multipliers and a steepest gradient descent

Proportional Fairness on a Homogeneous Linear Network
(flow 0 crosses links 1..L; flow l uses link l only)

Maths give: λ0 = C/(n+1) and ∀l ≠ 0, λl = C·n/(n+1)
I.e., for C=100Mb/s and n=3: λ0 = 25Mb/s, λ1 = λ2 = λ3 = 75Mb/s
Closer to practitioner expectations
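On the homogeneous linear network, symmetry forces λl = C − λ0 on every link l ≠ 0, so the objective collapses to log(λ0) + n·log(C − λ0); a brute-force scan recovers the closed form (a standalone numerical check, not SimGrid code):

```python
import math

def prop_fair_long_flow(capacity, n, steps=100_000):
    """Rate of the long flow on a homogeneous linear network under
    proportional fairness: maximize log(x) + n*log(C - x) over (0, C)."""
    best_x, best_val = None, -math.inf
    for i in range(1, steps):
        x = capacity * i / steps
        val = math.log(x) + n * math.log(capacity - x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x
```

With C=100 and n=3 this finds λ0 ≈ 25, i.e. C/(n+1), while each short flow gets C − λ0 = 75.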

Recent TCP Implementations

More protocol refinement, more model complexity
Every agent changes its window size according to its neighbors' (selfish net-utility maximization)
Computing a distributed gradient for the Lagrange multipliers ⇒ same updates

TCP Vegas converges to a weighted proportional fairness
Objective function: maximize Σ_{f∈F} Lf·log(λf) (Lf being the latency)

TCP Reno is even worse
Objective function: maximize Σ_{f∈F} arctan(λf)

Low, S.H., A Duality Model of TCP and Queue Management Algorithms, IEEE/ACM Transactions on Networking, 2003.

Efficient implementation: possible, but not so trivial
Computing a distributed gradient for the Lagrange multipliers: useless in our setting
Lagrange multipliers computable with an efficient optimal-step gradient descent

So, what is the model used in SimGrid?

--cfg=network_model command line argument
CM02: Max-Min fairness
Vegas: Vegas TCP fairness (Lagrange approach)
Reno: Reno TCP fairness (Lagrange approach)
By default in SimGrid v3.3: CM02
Example: ./my_simulator --cfg=network_model:Vegas

CPU sharing policy
The default Max-Min is sufficient for most cases
cpu_model:ptask_L07: model specific to parallel tasks

Want more?
network_model:gtnets: use the Georgia Tech Network Simulator for the network
Accuracy of a packet-level network simulator without changing your code (!)
Plug your own model in SimGrid!! (usable as a scientific instrument in the TCP modeling field, too)

How are these models used in practice?

Simulation kernel main loop
Data: set of resources with working rate
1. Some actions get created (by the application) and assigned to resources
2. Compute the share of everyone (resource sharing algorithms)
3. Compute the earliest finishing action, advance simulated time to that time
4. Remove finished actions
5. Loop back to 2
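The loop above fits in a dozen lines. A sketch with hypothetical action objects carrying a `remaining` amount of work, and a pluggable `compute_shares` function standing in for step 2:

```python
def simulation_kernel(actions, compute_shares):
    """Minimal simulation kernel loop: advance simulated time to the
    earliest action completion, subtract the work done, repeat."""
    now = 0.0
    while actions:
        shares = compute_shares(actions)                        # step 2
        dt = min(a.remaining / shares[a] for a in actions)      # step 3
        now += dt
        for a in actions:
            a.remaining -= shares[a] * dt
        actions = [a for a in actions if a.remaining > 1e-9]    # step 4
    return now
```

With two actions of 10 and 30 units sharing a capacity-2 resource equally, the first finishes at t=10 and the second, running alone afterwards, at t=20.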

Adding Dynamic Availabilities to the Picture

Trace definition
List of discrete events where the maximal availability changes: t0 → 100%, t1 → 50%, t2 → 80%, etc.

Adding traces doesn't change the kernel main loop
Availability changes are simulation events, just like action ends

SimGrid also accepts state changes (on/off)

Agenda
Experiments for Large-Scale Distributed Systems Research
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentations in Large-Scale Distributed Systems
Resource Models in SimGrid
  Analytic Models Underlying SimGrid
  Experimental Validation of the Simulation Models
    Single link; Dumbbell; Random platforms; Simulation speed
Platform Instantiation
  Platform Catalog
  Synthetic Topologies
  Topology Mapping
Using SimGrid for Practical Grid Experiments
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
Conclusion

SimGrid Validation

Quantitative comparison of SimGrid with packet-level simulators
NS2: The Network Simulator
SSFNet: Scalable Simulation Framework 2.0 (Dartmouth)
GTNetS: Georgia Tech Network Simulator

Methodological limits
Packet-level simulators supposed accurate (comparison to the real world: future work)
Max-Min only: the other models were not part of SimGrid at that time

Challenges
Which topology? Which parameters to consider? (e.g. bandwidth, latency, size, all)
How to estimate performance? (e.g. throughput, communication time)
How to estimate the simulation response time (slowdown)?
How to compute the error? (e.g. Perf_PacketLevel vs. Perf_SimGrid)

Velho, Legrand, Accuracy Study and Improvement of Network Simulation in the SimGrid Framework, to appear in Second International Conference on Simulation Tools and Techniques, SIMUTools'09, Rome, Italy, March 2009.
(other publication by Velho and Legrand submitted to SimuTools'09)



SimGrid Validation

Experiment assumptions
Topology: single link; dumbbell; random topologies (several)
Parameters: data size, #flows, #nodes, link bandwidth and latency
Performance: communication time and bandwidth estimation
All TCP flows start at the same time
All TCP flows are stopped when the first flow completes
Bandwidth estimation is based on the remaining communication
Slowdown: simulation time / simulated time

Notations
B: link nominal bandwidth; L: link latency; S: amount of transmitted data
Error: ε(T_GTNetS, T_SimGrid) = log(T_GTNetS) − log(T_SimGrid)
Symmetrical for over- and under-estimations (thanks to the logs)
Average error: |ε̄| = (1/n)·Σ_i |εi|; max error: |εmax| = max_i |εi|
Gain/loss in percentage: e^|ε̄| − 1 or e^|εmax| − 1
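The log-based error metric is symmetric by construction; a small helper illustrating the definitions above (illustrative code, not from the paper):

```python
import math

def log_error(t_ref, t_sim):
    """epsilon = log(T_ref) - log(T_sim): symmetric for over- and
    under-estimations."""
    return math.log(t_ref) - math.log(t_sim)

def error_summary(pairs):
    """Mean and max |epsilon| over (T_ref, T_sim) pairs, plus the
    corresponding percentage views e^|eps| - 1."""
    errs = [abs(log_error(r, s)) for r, s in pairs]
    mean_e, max_e = sum(errs) / len(errs), max(errs)
    return mean_e, max_e, math.exp(mean_e) - 1, math.exp(max_e) - 1
```

A 10% over-estimation and a 10% under-estimation yield the same |ε|, which is the point of using logs.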

Validation experiments on a single link (1/2)

Experimental settings
TCP source → link → TCP sink (1 flow)
Flow throughput as a function of L and B
Fixed size (S=100MB) and window (W=20KB)

Results
(3D plot of throughput vs. bandwidth and latency. Mesh: SimGrid results, S / (S/min(B, W/2L) + L); points: GTNetS, NS2, SSFNet with TCP_FAST_INTERVAL=default; +: SSFNet with TCP_FAST_INTERVAL=0.01)

Conclusion
SimGrid estimations close to those of packet-level simulators (when S=100MB)
When B < W/(2L) (B=100KB/s, L=500ms): |εmax| ≈ |ε̄| ≈ 1%
When B > W/(2L) (B=100KB/s, L=10ms): |εmax| ≈ |ε̄| ≈ 2%

Validation experiments on a single link (2/3)

Experimental settings
TCP source → link → TCP sink (1 flow)
Compute the achieved bandwidth as a function of S
Fixed L=10ms and B=100MB/s

Evaluation of the CM02 model
(plot: throughput vs. data size for NS2, SSFNet (0.2), SSFNet (0.01), GTNetS, SimGrid)
Packet-level tools don't completely agree
SSFNet TCP_FAST_INTERVAL bad default
GTNetS is equally distant from the others

CM02 doesn't take slow start into account:
S:      < 100KB | [100KB; 10MB] | > 10MB
|ε̄|:   146%    | 17%           | 1%
|εmax|: 508%    | 80%           | 1%

Validation experiments on a single link (3/3)

Experimental settings
TCP source → link → TCP sink (1 flow)
Compute the achieved bandwidth as a function of S
Fixed L=10ms and B=100MB/s

Evaluation of the LV08 model
(plot: throughput vs. data size for NS2, SSFNet (0.2), SSFNet (0.01), GTNetS, SimGrid)
Statistical analysis of the GTNetS slow start
New SimGrid model (Max-Min based):
Bandwidth decreased (92%)
Latency changed to 10.4 × L

This dramatically improves the validity range:
S:      < 100KB | > 100KB
|ε̄|:   12%     | 1%
|εmax|: 162%    | 6%

Validation experiments on the dumbbell topology

Experimental settings
(dumbbell topology: Flows A and B share a backbone link of B MB/s with 20ms latency; the access links are 100MB/s with 10ms latency, except Flow A's whose latency L is varied)
Comparison limited to the GTNetS packet-level simulator
Bandwidth: linearly sampled with 16 points, B ∈ [0.01, 1000] MB/s
Latency: linearly sampled with 20 points, L ∈ [0, 200] ms
Size: S=100MB
Compare the instantaneous bandwidth shares (SimGrid vs. GTNetS)


Validation experiments on the dumbbell topology

Throughput as a function of L when B=100MB/s (limited by latency)
(plot: instantaneous bandwidth share of Flows A and B vs. latency, GTNetS vs. SimGrid)

Similar trends in both simulators:
L < 10 ms: Flow A gets less bandwidth than B
L = 10 ms: Flow A gets as much bandwidth as B
L > 10 ms: Flow A gets more bandwidth than B

Negligible error: |εmax| ≈ 0.1%

Validation experiments on the dumbbell topology

Throughput as a function of L when B=0.1MB/s (limited by bandwidth)
(plot: instantaneous bandwidth share of Flows A and B vs. latency, GTNetS vs. SimGrid)

Analysis
|ε̄| ≈ 15%; |εmax| ≈ 202%
Model inaccurate or badly instantiated...
The trend is respected (as L increases, the share of Flow A increases)

Data fitting on the bandwidth ratio in GTNetS

Ratio = (share of flow A) / (share of flow B) in GTNetS
Data fitting gives: Ratio = (Lj + 8775/Bj) / (Li + 8775/Bi)
(plot: Ratio as a function of B and L, with the fitted approximation)

Conclusion: Max-Min needs priorities
CM02 uses latency only (wi = Li); bandwidth should be considered

LV08 improvements
Max-Min with priorities: wi = Σ_{link k used by flow i} (Lk + 8775/Bk)
Improved results: |ε̄| ≈ 4%; |εmax| ≈ 44%


Validation experiments on the dumbbell topology

Summary of all experiments for both flows
Some are bandwidth-limited, some are latency-limited
(plots: |ε| per experiment for Flows A and B, old vs. improved model)

Conclusion
SimGrid uses an accurate yet fast sharing model
The improved model is validated against GTNetS
Accuracy still has to be evaluated on more general topologies

Validation experiments on random platforms

Experiments on 160 platforms
Platforms generated with BRITE (Waxman model)
Bandwidth: uniform distribution
Homogeneous: B ∈ [100, 128] MB/s
Heterogeneous: B ∈ [10, 128] MB/s
Latency: L ∈ [0; 5] ms (Euclidean distance)
Flow size: S=10MB
#flows: F=150; #nodes: N ∈ [50; 200]
Four scenarios, ten different flow instantiations

Pedro Velho, Arnaud Legrand. Accuracy Study and Improvement of Network Simulation in the SimGrid Framework. Submitted to SimuTools'09.


Validation experiments on random platforms

Summary of experiments
(plots: max error |εmax| and mean error |ε̄| per experiment, old vs. improved model; ratio = log(|εOld|) − log(|εImproved|))

Interpretation
Clear improvement with the new model
|ε̄| < 0.2 (i.e., ≈ 22%); |εmax| still challenging: up to 461%

Simulation speed

200-node/200-flow network, sending 1MB each:
#flows:                           10     | 25     | 50     | 100    | 200
GTNetS simulation time:           0.661s | 1.651s | 3.697s | 7.649s | 15.705s
GTNetS simulation/simulated:      0.856  | 1.712  | 3.589  | 7.468  | 11.515
SimGrid simulation time:          0.002s | 0.008s | 0.028s | 0.137s | 0.536s
SimGrid simulation/simulated:     0.002  | 0.010  | 0.028  | 0.140  | 0.396

200-node/200-flow network, sending 100MB each:
#flows:                           10      | 25      | 50      | 100     | 200
GTNetS simulation time:           65s     | 163s    | 364s    | 753s    | 1562s
GTNetS simulation/simulated:      0.92    | 1.85    | 3.89    | 8.08    | 12.59
SimGrid simulation time:          0.001s  | 0.008s  | 0.028s  | 0.138s  | 0.538s
SimGrid simulation/simulated:     0.00002 | 0.00010 | 0.00029 | 0.00142 | 0.00402

GTNetS execution time is linear in both data size and #flows
SimGrid's only depends on #flows

Conclusion

Models of Grid simulators
Most are overly simplistic (wormhole: slow and inaccurate at best)
Some are plainly wrong (OptorSim's unfortunate sharing policy)

Analytic TCP models: not trivial, but possible
Several models exist in the literature
They can be implemented efficiently
SimGrid implements Max-Min fairness and proportional fairness (Vegas & Reno)

SimGrid almost compares to packet-level simulators
Validity acceptable in many cases (|ε̄| ≤ 5% in most cases)
Validity range clearly delimited
Maximum error still unacceptable
It is often one GTNetS flow that achieves an insignificant throughput
Maybe SimGrid is right and GTNetS is wrong?
SimGrid speedup ≈ 10³; GTNetS slowdown up to 10 (ns-2, SSFNet even worse)
SimGrid execution time depends only on #flows, not on data size
SimGrid can use GTNetS to perform network predictions (for paranoids)

Future Work

Towards real-world experiments
Assess the several models implemented in SimGrid
Assess the packet-level simulators themselves
Use even more realistic platforms: high-contention scenarios
Use more realistic applications: e.g. the NAS benchmarks

Improve the macroscopic TCP models in SimGrid
Decrease the maximum error
Use LV08 by default instead of CM02

Develop new models
Compound models (influence of the computation load over communications)
High-speed networks such as Quadrics or Myrinet (L + size/B doesn't seem sufficient)
Model the disks; model multicores


Platform Instantiation

To use models, one must instantiate them

Key questions
How can I run my tests on realistic platforms?
What is a realistic platform?
What are the platform parameters? What are their values on real platforms?

Sources of platform descriptions
Manual modeling: define the characteristics with your sysadmins
Automatic mapping
Synthetic platform generators

What is a Platform Instance Anyway?

Structural description
Host list
Links and interconnection topology

Peak performance
Bandwidths and latencies
Processing capacity

Background conditions
Load
Failures

Platform description for SimGrid

Example XML file:

<?xml version="1.0"?>
<!DOCTYPE platform SYSTEM "surfxml.dtd">
<platform version="2">
  <host id="Jacquelin" power="137333000"/>
  <host id="Boivin" power="98095000">
    <prop key="someproperty" value="somevalue"/> <!-- attach arbitrary data to hosts/links -->
  </host>
  <link id="1" bandwidth="3430125" latency="0.000536941"/>
  <route src="Jacquelin" dst="Boivin"><link:ctn id="1"/></route>
  <route src="Boivin" dst="Jacquelin"><link:ctn id="1"/></route>
</platform>

Declare all your hosts, with their computing power
Other attributes:
availability_file: trace file to let the power vary
state_file: trace file to specify whether the host is up/down
Declare all your links, with bandwidth and latency
bandwidth_file, latency_file, state_file: trace files
sharing_policy ∈ {shared, fatpipe} (fatpipe ⇒ no sharing)
Declare the routes from each host to each host (list of links)
Arbitrary data can be attached to components using the <prop> tag


Platform Catalog

Several existing platforms modeled
Grid'5000: 9 sites, 25 clusters, 1,528 hosts
GridPP: 18 clusters, 7,948 hosts
DAS-3: 5 clusters, 277 hosts
LCG: 113 clusters, 44,184 hosts

Files available from the Platform Description Archive: http://pda.gforge.inria.fr
(+ tool to extract platform subsets)


Synthetic Topology Generation

Characterizing platform realism (to design a generator)
Examine real platforms
Discover principles
Implement a generator

Topology of the Internet
Subject of studies in the networking community for years
Decentralized growth, obeying complex rules and incentives
Could it have a mathematical structure? Could we then have generative models?

Three generations of graph generators
Random (or flat)
Structural
Degree-based

Random Platform Generators

Two-step generators
1. Nodes are placed on a square (of side c) following a probability law
2. Each couple (u, v) gets interconnected with a given probability

1. Node placement: uniform, or heavy-tailed

Random Platform Generators

2. Probability for (u, v) to be connected

Uniform: uniform probability (not realistic, but simple enough to be popular)

Exponential: P(u, v) = α·e^(−d/(L−d)), 0 < α
d: Euclidean distance between u and v; L = c·√2 (c: side of the placement square)
The number of edges increases with α

Waxman: P(u, v) = α·e^(−d/(βL)), 0 < α, β ≤ 1
The number of edges increases with α; edge length heterogeneity increases with β
Waxman, Routing of Multipoint Connections, IEEE J. on Selected Areas in Comm., 1988.

Locality-aware: P(u, v) = α if d < L·r, β if d ≥ L·r
Zegura, Calvert, Donahoo, A quantitative comparison of graph-based models for Internet topology, IEEE/ACM Transactions on Networking, 1997.
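A Waxman generator is only a few lines. A sketch using the parameterization above; the parameter defaults are illustrative, not standard values:

```python
import math, random

def waxman_graph(n, c=1.0, alpha=0.4, beta=0.1, seed=0):
    """Waxman random topology: place n nodes uniformly on a c-by-c
    square and connect (u, v) with probability alpha*exp(-d/(beta*L)),
    where d is the Euclidean distance and L = c*sqrt(2)."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, c), rng.uniform(0, c)) for _ in range(n)]
    L = c * math.sqrt(2)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])
            if rng.random() < alpha * math.exp(-d / (beta * L)):
                edges.append((u, v))
    return pos, edges
```

Raising alpha raises every connection probability uniformly, so the expected number of edges grows with it, as stated above.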

Structural Topology Generators

Generate the hierarchy explicitly (top-down)

Transit-stub [Zegura et al.]
Start from a connected graph
Replace some nodes by connected graphs
Add some additional edges (GT-ITM, BRITE)

N-level [Zegura et al.]
Iterate the previous algorithm (Tiers, GT-ITM)

(figure: AS-level topology with transit domains, multi-homed stubs, stub domains, and stub-stub edges)

Power Law: Rank Exponent

Analysis of the topology at the AS level
Rank rv of node v: its index in the order of decreasing degree
The degree dv of node v is proportional to its rank rv raised to a constant power R:
dv = k·rv^R

(log-log plots of degree vs. rank:)
Nov 97 (R = −0.81), Apr 98 (R = −0.82), Dec 98 (R = −0.74), Routers 95 (R = −0.48)

Seems to be a necessary condition for topology realism

Faloutsos, Faloutsos, Faloutsos, On Power-law Relationships of the Internet Topology, SIGCOMM 1999, pp. 251-262.

Degree-Based Topology Generators

Power laws received a lot of attention recently
Small-world theory
Not only in CS, but also in sociology for example

Using this idea for realistic platform generation
Enforce the power law by construction of the platform

Barabási-Albert algorithm
Incremental growth
Affinity connection
Probability to connect a new node v to an existing node u depends on du: P(u, v) = du / Σk dk

Barabási and Albert, Emergence of scaling in random networks, Science, vol. 286, 1999, pp. 509-512.
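Preferential attachment is also short to implement. A sketch (the repeated-endpoints list makes degree-proportional sampling trivial; the initial clique is one common choice of seed graph):

```python
import random

def barabasi_albert(n, m=2, seed=0):
    """Barabási-Albert growth: start from a clique of m+1 nodes; each
    new node connects to m distinct existing nodes chosen with
    probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # each node appears in this list once per incident edge, so a
    # uniform draw from it is a degree-proportional draw
    endpoints = [x for e in edges for x in e]
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for u in targets:
            edges.append((u, v))
            endpoints += [u, v]
    return edges
```

Because well-connected nodes keep attracting new edges, a few hubs of high degree emerge, which is exactly the rank/degree power law checked on the next slide.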

Checking two Power Laws

(log-log plots of out-degree vs. rank, and of out-degree frequency, for: Interdomain 11/97, Barabási-Albert (BRITE), Waxman, Transit-Stub (GT-ITM), GT-ITM)

The laws respected by the interdomain topology are a seemingly necessary condition
Barabási-Albert performs the best (as expected)
GT-ITM performs the worst

Power Laws Discussion

Other power laws? On which measurements?
Expansion, resilience, distortion, eccentricity distribution, eigenvalue distribution, set cover size, ...

Methodological limits
Necessary condition ≠ sufficient condition
The laws observed by the Faloutsos brothers are correlated
They could be irrelevant parameters
Barford, Bestavros, Byers, Crovella, On the Marginal Utility of Network Topology Measurements, 1st ACM SIGCOMM Workshop on Internet Measurement, 2001.
They could even be measurement bias!
Lakhina, Byers, Crovella, Xie, Sampling Biases in IP Topology Measurements, INFOCOM'03.

Networks have power laws AND structure!
Cannot afford to trash hierarchical structures just to obey power laws!
Some projects try to combine both (GridG)

So, Structural or Degree-Based Topology Generators?

Observation
AS-level and router-level topologies have similar characteristics
Degree-based generators better represent the large-scale properties of the Internet
Hierarchy seems to arise from degree-based generators
Tangmunarunkit, Govindan, Jamin, Shenker, Willinger, Network topology generators: Degree-based vs. structural, SIGCOMM'02.

Conclusion
10,000-node platforms: degree-based generators perform better
100-node platforms: power laws make no sense; structural generators seem more appropriate

Routing still remains to be characterized
It is known that a multi-hop network route is not always the shortest path
Paxson, Measurements and Analysis of End-to-End Internet Dynamics, PhD Thesis, UCB, 1997.
Generators wrongly assume the opposite

Network Performance (labeling graph edges)

We need more than a graph!
Bandwidth and latency
Sharing capacity (backplane)

Model physical characteristics (peak performance + background)
Some models in topology generators (WAN/LAN/SAN)
Need to simulate background traffic (no accepted model to generate it)
Simulation can be very costly

Model end-to-end performance (usable performance)
Easier way to go
Use real raw measurements (NWS, ...)
Some models exist: Lee, Stepanek, On future global grid communication performance, HCW.


Computing Resources (labeling graph vertices)

Situation quite different from network resources:
Hard to qualify the usable performance
Easy to model peak performance + background conditions

Ad-hoc generalization of peak performance
Look at a real-world platform, e.g., the TeraGrid
Generate new sites based on existing sites

Statistical modeling (as usual)
Examine many production resources
Identify key statistical characteristics
Come up with a generative/predictive model

Synthetic Clusters
Clusters are classical resource
What is the typical distribution of clusters?

Commodity Cluster synthesizer


Examined 114 production clusters (10K+ procs); came up with statistical models
Linear fit between clock rate and release year within a processor family
Quadratic fraction of processors released in a given year

Validated model against a set of 191 clusters (10K+ procs)
Models allow extrapolation to future configurations
Models implemented in a resource generator
Kee, Casanova, Chien, Realistic Modeling and Synthesis of Resources for Computational Grids, Supercomputing 2004.
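As a sketch of what such a generative model looks like, the linear clock-rate fit can be coded as below; the function name and both coefficients are illustrative placeholders, not the values fitted by Kee et al.:

```c
/* Hypothetical linear fit: clock rate (MHz) of a processor family as a
 * function of release year. The coefficients a and b are illustrative
 * placeholders, NOT the fitted values from the Kee et al. paper. */
static double clock_rate_mhz(int release_year) {
    const double a = 500.0;     /* assumed rate in the base year */
    const double b = 400.0;     /* assumed MHz gained per year   */
    const int base_year = 2000;
    return a + b * (release_year - base_year);
}
```

A generator would draw release years from the observed distribution and use such a fit to label each synthetic cluster.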


Background Conditions (workload and resource availability)


Probabilistic Models
Naive: empirical distributions of availability and unavailability intervals
Weibull distributions:
Nurmi, Brevik, Wolski, Modeling Machine Availability in Enterprise and Wide-area Distributed Computing Environments, EuroPar 2005.

Models by Feitelson et al.: job inter-arrival times (Gamma), amount of work requested (Hyper-Gamma), number of processors requested: compound (2^p, 1, . . . )

Feitelson, Workload Characterization and Modeling Book, available at http://www.cs.huji.ac.il/~feit/wlmod/

Traces
The Grid Workloads Archive (http://gwa.ewi.tudelft.nl/pmwiki/)
Resource Prediction System (RPS) toolkit traces (http://www.cs.northwestern.edu/~pdinda/LoadTraces)

Home-made traces with NWS



Example Synthetic Grid Generation


Generate topology and networks
Topology: generate a 5,000-node graph with Tiers
Latency: Euclidean distance (scaled to obtain the desired network diameter)
Bandwidth: set of end-to-end NWS measurements

Generate computational resources


Pick 30% of the end-points
Clusters at each end-point: Kee's synthesizer for year 2008
Cluster load: Feitelson's model (parameters picked randomly)
Resource failures: based on the Grid Workloads Archive

All-in-one tools
GridG
Lu and Dinda, GridG: Generating Realistic Computational Grids, Performance Evaluation Review, Vol. 30::4 2003.

Simulacrum tool
Quinson, Suter, A Platform Description Archive for Reproducible Simulation Experiments, Submitted to SimuTools09.

Agenda
- Experiments for Large-Scale Distributed Systems Research: Methodological Issues; Main Methodological Approaches; Tools for Experimentations in Large-Scale Distributed Systems
- Resource Models in SimGrid: Analytic Models Underlying SimGrid; Experimental Validation of the Simulation Models
- Platform Instanciation: Platform Catalog; Synthetic Topologies; Topology Mapping
- Using SimGrid for Practical Grid Experiments: Overview of the SimGrid Components; SimDag: Comparing Scheduling Heuristics for DAGs; MSG: Comparing Heuristics for Concurrent Sequential Processes; GRAS: Developing and Debugging Real Applications
- Conclusion

Automatic Network Mapping


Main Issue of synthetic generators: Realism!
Solution: Actually map a real platform

Several levels of information (depending on the OSI layer)


Physical inter-connection map (wires in the walls)
Routing infrastructure (path of network packets, from router to switch)
Application level (focus on effects, bandwidth & latency, not causes)
Our goal: conduct experiments at application level; this is not an administration tool

Network Mapping Process: two-step


1. Measurements 2. Reconstruct a graph


Classical Measurements in a Grid Environment?


Use of low-level network protocols (like SNMP or BGP)
Example: Remos. Use of SNMP is restricted for security reasons (DoS or spying)

Use of traceroute or ping (i.e. on ICMP)


Examples: TopoMon, Lumeta, IDMaps, Global Network Positioning
Use of ICMP more and more restricted by admins (for security reasons)

Pathchar
No network privilege required, but must be root on hosts; not adapted to Grid settings

Measurements must be at application-level (no privilege)


Solutions relying on application-level measurements


NWS (Network Weather Service UCSB)
De facto standard (used in Globus, DIET, NINF) to gather info on network Reports bandwidth, latency, CPU availability, and future trends Only quantitative values, no topological information (but one can label a big clique with NWS-provided values)

ENV (Effective Network View, UCSD)


Use interference measurements to build a tree representation

ECO (Efficient Collective Communication, CMU)


Use application-level measurements to optimize collective communications Should be generalized

Existing reconstruction algorithms


Cliques (NWS, ECO) or trees (ENV, Classical latency clustering)

ALNeM (Application-Level Network Mapper)


Long-term goal: be a tool providing topology to network-aware applications Short-term goal: allow the study of network mapping algorithms

[Figure: sensors (S) feed measurements into a DB; reconstruction algorithms 1, 2 and 3 each produce a candidate platform (wrong topology, wrong values, right platform)]

Architecture
Lightweight distributed measurement infrastructure (collection of sensors)
MySQL measurement database
Topology builder, with several reconstruction algorithms
Eyraud-Dubois, Legrand, Quinson, Vivien, A First Step Towards Automatically Building Network Representations, EuroPar07.


Reconstruction algorithms

Basic algorithms
Clique: Connect all pairs of nodes, label with measured values Maximum Bandwidth Spanning Tree and Minimum Latency Spanning Tree

Improved Spanning Tree


Real platforms are not trees; BwTree and LatTree miss edges
Add edges to spanning trees to improve predictions

Aggregation
Grow a set of connected nodes For each new one, connect it to already chosen ones to improve predictions
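One member of the family above, the Maximum Bandwidth Spanning Tree, can be sketched with a Prim-style greedy construction (an illustration; ALNeM's actual implementation may differ):

```c
#define MAXN 64

/* Grow a Maximum Bandwidth Spanning Tree: repeatedly attach the outside
 * node reachable through the highest-bandwidth edge from the connected
 * set. bw is an n x n symmetric matrix of measured bandwidths;
 * parent[i] receives the tree edge chosen for node i (root: -1). */
void max_bw_spanning_tree(int n, double bw[MAXN][MAXN], int parent[MAXN]) {
    int in_tree[MAXN] = {0};
    double best[MAXN];
    int i, j;
    parent[0] = -1;
    in_tree[0] = 1;                  /* root the tree at node 0 */
    for (i = 1; i < n; i++) { best[i] = bw[0][i]; parent[i] = 0; }
    for (j = 1; j < n; j++) {
        int next = -1;
        for (i = 1; i < n; i++)      /* pick best-connected outside node */
            if (!in_tree[i] && (next == -1 || best[i] > best[next]))
                next = i;
        in_tree[next] = 1;
        for (i = 1; i < n; i++)      /* update attachment bandwidths */
            if (!in_tree[i] && bw[next][i] > best[i]) {
                best[i] = bw[next][i];
                parent[i] = next;
            }
    }
}
```

The Minimum Latency Spanning Tree is the same construction with latencies and a minimizing comparison.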


Evaluation methodology
Goal: quantify similarity between initial and reconstructed platforms
Running in situ: beware of experimental bias!
The reconstructed platform doesn't exist in the real world: one cannot compare measurements on both platforms, so it is hard to assess the quality of reconstruction algorithms on real platforms

Testing on simulator: both initial and reconstructed platforms are simulated

Several evaluation metrics
1. Compare end-to-end measurements (communication-level)
2. Compare interference amount:
   Interf((a,b),(c,d)) = 1 iff BW(a -> b while c -> d) <= BW(a -> b) / 2
3. Compare application running times (application-level)

Application                      Comm. schema   // comm   # steps
Token-ring                       Ring           No        1
Broadcast                        Tree           No        1
All2All                          Clique         Yes       1
Parallel Matrix Multiplication   2D             Yes       sqrt(#procs)
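The interference criterion above boils down to a half-bandwidth test; a direct transcription:

```c
/* Interference indicator from the metric above: flows (a,b) and (c,d)
 * interfere when the bandwidth of a->b measured while c->d is active
 * is at most half of the bandwidth of a->b measured alone. */
int interferes(double bw_ab_alone, double bw_ab_with_cd) {
    return bw_ab_with_cd <= bw_ab_alone / 2.0;
}
```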

Evaluation methodology
Apply all evaluations on all reconstructions for several platforms
[Figure: same architecture as before; sensors feed the measurement DB, and each reconstruction algorithm yields a candidate platform]

Measurements: bandwidth matrix, latency matrix
Algorithms: Clique; BW/Lat Spanning Tree; Improved BW/Lat Tree; Aggregate
Evaluation criteria: end-to-end measurements, interference count, application-level timings

Development on simulator, creating a real tool for real platforms


Measurement sensors implemented using GRAS:
Same code running either on top of SimGrid, or in situ (more to come)

ALNeM usable in situ, presumably with same predictive quality



Experiments on simulator: Renater platform


Real platform built manually (real measurements + admin feedback)

[Three bar charts comparing the reconstruction algorithms (x-axis: Clique, TreeBW, TreeLat, ImpTreeBW, ImpTreeLat, Aggregate): end-to-end accuracy (BW, Lat); interference count (correct predictions, false positives, false negatives vs. # actual interferences); application-level accuracy (token, broadcast, all2all, pmm)]

Clique: very good for end-to-end (of course), but no contention captured: missing interferences lead to bad predictions
Spanning trees: missing links lead to bad predictions (over-estimate latency, under-estimate bandwidth, false positive interferences)
Improved spanning trees have good predictive power
Aggregate accuracy is debatable


Experiments on simulator: GridG platforms


GridG is a synthetic platform generator [Lu, Dinda, SuperComputing'03] generating realistic platforms
Experiment: 40 platforms (60 hosts, default GridG parameters)

[Two bar charts (x-axis: Clique, TreeBW, TreeLat, ImpTreeBW, ImpTreeLat, Aggregate): end-to-end accuracy (Bandwidth, Latency); application-level accuracy (token, broadcast, all2all, pmm)]

Interpretation
Naive algorithms lead to poor results
Improved trees yield good reconstructions: ImpTreeBW error is 3% for all2all (worst case)

Adding routers to the picture


New set of experiments: only leaf nodes run the measurement processes

[Two bar charts (x-axis: Clique, TreeBW, TreeLat, ImpTreeBW, ImpTreeLat, Aggregate): end-to-end accuracy (BW, Lat); application-level accuracy (token, broadcast, all2all, pmm)]

Interpretation
None of the proposed heuristics is satisfactory
Future work: improve this!

Conclusions about ALNeM


Reconstruction algorithm evaluation from the application point of view
Several quality criteria: similarity of end-to-end measurements, interferences, application timings
Runs on simulator or in situ thanks to GRAS (& SimGrid)
(successfully reconstructed real platforms, but quality assessment is very hard)

Classical algorithms are not satisfactory


Spanning trees: miss edges, leading to performance under-estimation
Cliques: do not capture any existing interference
Improving spanning trees yields much better results (especially ImpTreeBW)
Still problems with internal routers

Future work
Other measurements from the sensors (new inputs to algorithms)
Interference (but very expensive to acquire); packet gap and back-to-back packets

Method based on successive refinements:
1. Spanning tree as first approximation
2. Refinement by adding some missing links
3. Some (not all) interference measurements to double-check the result

Agenda
- Experiments for Large-Scale Distributed Systems Research: Methodological Issues; Main Methodological Approaches; Tools for Experimentations in Large-Scale Distributed Systems
- Resource Models in SimGrid: Analytic Models Underlying SimGrid; Experimental Validation of the Simulation Models
- Platform Instanciation: Platform Catalog; Synthetic Topologies; Topology Mapping
- Using SimGrid for Practical Grid Experiments: Overview of the SimGrid Components; SimDag: Comparing Scheduling Heuristics for DAGs; MSG: Comparing Heuristics for Concurrent Sequential Processes; GRAS: Developing and Debugging Real Applications
- Conclusion


User-visible SimGrid Components


SimDag
Framework for DAGs of parallel tasks

MSG
Simple applicationlevel simulator

GRAS

AMOK

SMPI
Library to run MPI applications on top of a virtual environment

Framework toolbox to develop distributed applications

XBT: Grounding features (logging, etc.), usual data structures (lists, sets, etc.) and portability layer

SimGrid user APIs


SimDag: model applications as DAGs of (parallel) tasks
MSG: model applications as Concurrent Sequential Processes
GRAS: develop real applications, studied and debugged in the simulator
AMOK: set of distributed tools (bandwidth measurement, failure detector, . . . )
SMPI: simulate MPI codes (still under development)
XBT: grounding toolbox

Which API should I choose?


Your application is a DAG: SimDag
You have an MPI code: SMPI
You study concurrent processes, or distributed applications:
- You need graphs about several heuristics for a paper: MSG
- You develop a real application (or want experiments on a real platform): GRAS

Most popular API (for now): MSG



Argh! Do I really have to code in C?!


No, not necessarily
Some bindings exist: Java bindings to the MSG interface (new in v3.3)
More bindings planned: C++, Python, any scripting language; SimDag interface

Well, sometimes yes, but...


SimGrid itself is written in C for speed and portability (no dependency)
All components naturally usable from C (most of them only accessible from C)
XBT eases some difficulties of C:
Full-featured logs (similar to log4j), exception support (in ANSI C)
Popular abstract data types (dynamic arrays, hash tables, . . . )
Easy string manipulation, configuration, unit testing, . . .

What about portability?


Regularly tested under: Linux (x86, amd64), Windows and MacOSX Supposed to work under any other Unix system (including AIX and Solaris)

Agenda
- Experiments for Large-Scale Distributed Systems Research: Methodological Issues; Main Methodological Approaches; Tools for Experimentations in Large-Scale Distributed Systems
- Resource Models in SimGrid: Analytic Models Underlying SimGrid; Experimental Validation of the Simulation Models
- Platform Instanciation: Platform Catalog; Synthetic Topologies; Topology Mapping
- Using SimGrid for Practical Grid Experiments: Overview of the SimGrid Components; SimDag: Comparing Scheduling Heuristics for DAGs; MSG: Comparing Heuristics for Concurrent Sequential Processes; GRAS: Developing and Debugging Real Applications
- Conclusion

SimDag: Comparing Scheduling Heuristics for DAGs


Root 1 2 3 2 3 4 5 1 4 1 6 5
Time

3
Time

6 End

Main functionalities
1. Create a DAG of tasks
Vertices: tasks (either communication or computation) Edges: precedence relation

2. Schedule tasks on resources 3. Run the simulation (respecting precedences)


Compute the makespan

The SimDag interface


DAG creation
Creating tasks: SD_task_create(name, data)
Creating dependencies: SD_task_dependency_{add/remove}(src, dst)

Scheduling tasks
SD_task_schedule(task, workstation_number, *workstation_list, double *comp_amount, double *comm_amount, double rate)
- Tasks are parallel by default; simply set workstation_number to 1 if not
- Communications are regular tasks; comm_amount is a matrix
- Both computation and communication possible in the same task
- rate: to slow down non-CPU (resp. non-network) bound applications
SD_task_unschedule, SD_task_get_start_time

Running the simulation
SD_simulate(double how_long) (how_long < 0: run until the end)
SD_task_{watch/unwatch}: simulation stops as soon as the task's state changes

Full API in the doxygen-generated documentation



Agenda
- Experiments for Large-Scale Distributed Systems Research: Methodological Issues; Main Methodological Approaches; Tools for Experimentations in Large-Scale Distributed Systems
- Resource Models in SimGrid: Analytic Models Underlying SimGrid; Experimental Validation of the Simulation Models
- Platform Instanciation: Platform Catalog; Synthetic Topologies; Topology Mapping
- Using SimGrid for Practical Grid Experiments: Overview of the SimGrid Components; SimDag: Comparing Scheduling Heuristics for DAGs; MSG: Comparing Heuristics for Concurrent Sequential Processes (Motivations, Concepts and Example of Use; Java bindings; A Glance at SimGrid Internals; Performance Results); GRAS: Developing and Debugging Real Applications
- Conclusion



MSG: Heuristics for Concurrent Sequential Processes


(historical) Motivation
Centralized scheduling does not scale
SimDag (and its predecessor) not adapted to study decentralized heuristics
MSG not strictly limited to scheduling, but particularly convenient for it

Main MSG abstractions


Agent: some code, some private data, running on a given host
set of functions + XML deployment file for arguments

Task: amount of work to do and of data to exchange
MSG_task_create(name, compute_duration, message_size, void *data)
Communication: MSG_task_{put,get}, MSG_task_Iprobe
Execution: MSG_task_execute
MSG_process_sleep, MSG_process_{suspend,resume}

Host: location on which agents execute
Mailbox: similar to MPI tags



The MSG master/workers example: the worker


The master has a large number of tasks to dispatch to its workers for execution

int worker(int argc, char *argv[ ]) {
  m_task_t task;
  int id = atoi(argv[1]);
  char mailbox[80];
  int errcode;

  sprintf(mailbox, "worker-%d", id);
  while (1) {
    errcode = MSG_task_receive(&task, mailbox);
    xbt_assert0(errcode == MSG_OK, "MSG_task_receive failed");
    if (!strcmp(MSG_task_get_name(task), "finalize")) {
      MSG_task_destroy(task);
      break;
    }
    INFO1("Processing %s", MSG_task_get_name(task));
    MSG_task_execute(task);
    INFO1("%s done", MSG_task_get_name(task));
    MSG_task_destroy(task);
  }
  INFO0("I'm done. See you!");
  return 0;
}

The MSG master/workers example: the master


int master(int argc, char *argv[ ]) {
  int number_of_tasks = atoi(argv[1]);
  double task_comp_size = atof(argv[2]);
  double task_comm_size = atof(argv[3]);
  int workers_count = atoi(argv[4]);
  char mailbox[80];
  char buff[64];
  int i;
  m_task_t task;

  /* Dispatching (dumb round-robin algorithm) */
  for (i = 0; i < number_of_tasks; i++) {
    sprintf(buff, "Task_%d", i);
    task = MSG_task_create(buff, task_comp_size, task_comm_size, NULL);
    sprintf(mailbox, "worker-%d", i % workers_count);
    INFO2("Sending %s to mailbox %s", MSG_task_get_name(task), mailbox);
    MSG_task_send(task, mailbox);
  }

  /* Send finalization message to workers */
  INFO0("All tasks dispatched. Let's stop workers");
  for (i = 0; i < workers_count; i++) {
    sprintf(mailbox, "worker-%d", i);
    MSG_task_send(MSG_task_create("finalize", 0, 0, NULL), mailbox);
  }
  INFO0("Goodbye now!");
  return 0;
}


The MSG master/workers example: deployment le


Specifying which agent must be run on which host, and with which arguments: XML deployment file

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "surfxml.dtd">
<platform version="2">
  <!-- The master process (with some arguments) -->
  <process host="Tremblay" function="master">
    <argument value="6"/>         <!-- Number of tasks -->
    <argument value="50000000"/>  <!-- Computation size of tasks -->
    <argument value="1000000"/>   <!-- Communication size of tasks -->
    <argument value="3"/>         <!-- Number of workers -->
  </process>

  <!-- The worker processes (argument: mailbox number to use) -->
  <process host="Jupiter" function="worker"><argument value="0"/></process>
  <process host="Fafard"  function="worker"><argument value="1"/></process>
  <process host="Ginette" function="worker"><argument value="2"/></process>
</platform>


The MSG master/workers example: the main()

Putting things together:

int main(int argc, char *argv[ ]) {
  /* Declare all existing agents, binding their name to their function */
  MSG_function_register("master", &master);
  MSG_function_register("worker", &worker);
  /* Load a platform instance */
  MSG_create_environment("my_platform.xml");
  /* Load a deployment file */
  MSG_launch_application("my_deployment.xml");
  /* Launch the simulation (until its end) */
  MSG_main();
  INFO1("Simulation took %g seconds", MSG_get_clock());
  return 0;
}


The MSG master/workers example: raw output


[Tremblay:master:(1) 0.000000] [example/INFO] Got 3 workers and 6 tasks to process
[Tremblay:master:(1) 0.000000] [example/INFO] Sending Task_0 to worker-0
[Tremblay:master:(1) 0.147613] [example/INFO] Sending Task_1 to worker-1
[Jupiter:worker:(2) 0.147613] [example/INFO] Processing Task_0
[Tremblay:master:(1) 0.347192] [example/INFO] Sending Task_2 to worker-2
[Fafard:worker:(3) 0.347192] [example/INFO] Processing Task_1
[Tremblay:master:(1) 0.475692] [example/INFO] Sending Task_3 to worker-0
[Ginette:worker:(4) 0.475692] [example/INFO] Processing Task_2
[Jupiter:worker:(2) 0.802956] [example/INFO] Task_0 done
[Tremblay:master:(1) 0.950569] [example/INFO] Sending Task_4 to worker-1
[Jupiter:worker:(2) 0.950569] [example/INFO] Processing Task_3
[Fafard:worker:(3) 1.002534] [example/INFO] Task_1 done
[Tremblay:master:(1) 1.202113] [example/INFO] Sending Task_5 to worker-2
[Fafard:worker:(3) 1.202113] [example/INFO] Processing Task_4
[Ginette:worker:(4) 1.506790] [example/INFO] Task_2 done
[Jupiter:worker:(2) 1.605911] [example/INFO] Task_3 done
[Tremblay:master:(1) 1.635290] [example/INFO] All tasks dispatched. Let's stop workers.
[Ginette:worker:(4) 1.635290] [example/INFO] Processing Task_5
[Jupiter:worker:(2) 1.636752] [example/INFO] I'm done. See you!
[Fafard:worker:(3) 1.857455] [example/INFO] Task_4 done
[Fafard:worker:(3) 1.859431] [example/INFO] I'm done. See you!
[Ginette:worker:(4) 2.666388] [example/INFO] Task_5 done
[Tremblay:master:(1) 2.667660] [example/INFO] Goodbye now!
[Ginette:worker:(4) 2.667660] [example/INFO] I'm done. See you!
[2.667660] [example/INFO] Simulation time 2.66766


The MSG master/workers example: colorized output


$ ./my_simulator | MSG_visualization/colorize.pl
[ 0.000][ Tremblay:master ] Got 3 workers and 6 tasks to process
[ 0.000][ Tremblay:master ] Sending Task_0 to worker-0
[ 0.148][ Tremblay:master ] Sending Task_1 to worker-1
[ 0.148][ Jupiter:worker  ] Processing Task_0
[ 0.347][ Tremblay:master ] Sending Task_2 to worker-2
[ 0.347][ Fafard:worker   ] Processing Task_1
[ 0.476][ Tremblay:master ] Sending Task_3 to worker-0
[ 0.476][ Ginette:worker  ] Processing Task_2
[ 0.803][ Jupiter:worker  ] Task_0 done
[ 0.951][ Tremblay:master ] Sending Task_4 to worker-1
[ 0.951][ Jupiter:worker  ] Processing Task_3
[ 1.003][ Fafard:worker   ] Task_1 done
[ 1.202][ Tremblay:master ] Sending Task_5 to worker-2
[ 1.202][ Fafard:worker   ] Processing Task_4
[ 1.507][ Ginette:worker  ] Task_2 done
[ 1.606][ Jupiter:worker  ] Task_3 done
[ 1.635][ Tremblay:master ] All tasks dispatched. Let's stop workers.
[ 1.635][ Ginette:worker  ] Processing Task_5
[ 1.637][ Jupiter:worker  ] I'm done. See you!
[ 1.857][ Fafard:worker   ] Task_4 done
[ 1.859][ Fafard:worker   ] I'm done. See you!
[ 2.666][ Ginette:worker  ] Task_5 done
[ 2.668][ Tremblay:master ] Goodbye now!
[ 2.668][ Ginette:worker  ] I'm done. See you!
[ 2.668][                 ] Simulation time 2.66766


Agenda
- Experiments for Large-Scale Distributed Systems Research: Methodological Issues; Main Methodological Approaches; Tools for Experimentations in Large-Scale Distributed Systems
- Resource Models in SimGrid: Analytic Models Underlying SimGrid; Experimental Validation of the Simulation Models
- Platform Instanciation: Platform Catalog; Synthetic Topologies; Topology Mapping
- Using SimGrid for Practical Grid Experiments: Overview of the SimGrid Components; SimDag: Comparing Scheduling Heuristics for DAGs; MSG: Comparing Heuristics for Concurrent Sequential Processes (Motivations, Concepts and Example of Use; Java bindings; A Glance at SimGrid Internals; Performance Results); GRAS: Developing and Debugging Real Applications
- Conclusion



MSG bindings for Java: master/workers example


import simgrid.msg.*;

public class BasicTask extends simgrid.msg.Task {
  public BasicTask(String name, double computeDuration, double messageSize)
      throws JniException {
    super(name, computeDuration, messageSize);
  }
}

public class FinalizeTask extends simgrid.msg.Task {
  public FinalizeTask() throws JniException {
    super("finalize", 0, 0);
  }
}

public class Worker extends simgrid.msg.Process {
  public void main(String[ ] args) throws JniException, NativeException {
    String id = args[0];
    while (true) {
      Task t = Task.receive("worker-" + id);
      if (t instanceof FinalizeTask)
        break;
      BasicTask task = (BasicTask) t;
      Msg.info("Processing " + task.getName());
      task.execute();
      Msg.info(task.getName() + " done");
    }
    Msg.info("Received Finalize. I'm done. See you!");
  }
}

MSG bindings for Java: master/workers example

import simgrid.msg.*;

public class Master extends simgrid.msg.Process {
  public void main(String[ ] args) throws JniException, NativeException {
    int numberOfTasks = Integer.valueOf(args[0]).intValue();
    double taskComputeSize = Double.valueOf(args[1]).doubleValue();
    double taskCommunicateSize = Double.valueOf(args[2]).doubleValue();
    int workerCount = Integer.valueOf(args[3]).intValue();
    Msg.info("Got " + workerCount + " workers and " + numberOfTasks + " tasks.");

    for (int i = 0; i < numberOfTasks; i++) {
      BasicTask task = new BasicTask("Task_" + i, taskComputeSize, taskCommunicateSize);
      task.send("worker-" + (i % workerCount));
      Msg.info("Send completed for the task " + task.getName()
               + " on the mailbox worker-" + (i % workerCount));
    }
    Msg.info("Goodbye now!");
  }
}


MSG bindings for Java: master/workers example


Rest of the story
XML files (platform, deployment) not modified
No need for a main() function gluing things together:
the Java introspection mechanism is used for this;
simgrid.msg.Msg contains an adapted main() function;
names of the XML files must be passed as command-line arguments
Output very similar too

What about performance loss?

                          #workers
#tasks                100    500   1,000  5,000  10,000
1,000      native     .16    .19    .21    .42    0.74
           java       .41    .59    .94    7.6    27.
10,000     native     .48    .52    .54    .83    1.1
           java       1.6    1.9    2.38   13.    40.
100,000    native     3.7    3.8    4.0    4.4    4.5
           java       14.    13.    15.    29.    77.
1,000,000  native     36.    37.    38.    41.    40.
           java       121.   130.   134.   163.   200.

Small platforms: ok. Larger ones: not quite. . .


Agenda
- Experiments for Large-Scale Distributed Systems Research: Methodological Issues; Main Methodological Approaches; Tools for Experimentations in Large-Scale Distributed Systems
- Resource Models in SimGrid: Analytic Models Underlying SimGrid; Experimental Validation of the Simulation Models
- Platform Instanciation: Platform Catalog; Synthetic Topologies; Topology Mapping
- Using SimGrid for Practical Grid Experiments: Overview of the SimGrid Components; SimDag: Comparing Scheduling Heuristics for DAGs; MSG: Comparing Heuristics for Concurrent Sequential Processes (Motivations, Concepts and Example of Use; Java bindings; A Glance at SimGrid Internals; Performance Results); GRAS: Developing and Debugging Real Applications
- Conclusion



Implementation of CSPs on top of simulation kernel


Idea
Each process is implemented in a thread
Blocking actions (execution and communication) are reported into the kernel
A maestro thread unlocks the runnable threads (when their action is done)

Example
Thread A: send "toto" to B; receive something from B
Thread B: receive something from A; send "blah" to A

[Figure: the maestro repeatedly asks the simulation kernel "who's next?" and wakes threads A and B accordingly until both are done]

Maestro schedules threads, in the order given by the simulation kernel
Mutually exclusive execution (don't fear)
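A minimal, self-contained sketch of this maestro pattern using POSIX semaphores (two per simulated process, which is also why the semaphore limit shows up in the performance tables later); this is an illustration, not SimGrid's actual code:

```c
#include <pthread.h>
#include <semaphore.h>

/* Two semaphores per simulated process: the maestro wakes exactly one
 * process at a time and waits until it blocks again, so user code runs
 * in mutual exclusion while the kernel decides who goes next. */
typedef struct { sem_t can_run, is_blocked; } sim_process_t;

static sim_process_t procs[2];
static int trace[4], trace_len = 0;   /* records the execution order */

static void *process_body(void *arg) {
    sim_process_t *self = &procs[(long)arg];
    for (int step = 0; step < 2; step++) {
        sem_wait(&self->can_run);            /* wait to be scheduled   */
        trace[trace_len++] = (int)(long)arg; /* one "blocking action"  */
        sem_post(&self->is_blocked);         /* report back to maestro */
    }
    return 0;
}

void run_maestro(void) {
    pthread_t t[2];
    for (long i = 0; i < 2; i++) {
        sem_init(&procs[i].can_run, 0, 0);
        sem_init(&procs[i].is_blocked, 0, 0);
        pthread_create(&t[i], 0, process_body, (void *)i);
    }
    for (int step = 0; step < 2; step++)     /* round-robin schedule   */
        for (int i = 0; i < 2; i++) {
            sem_post(&procs[i].can_run);     /* unlock one process     */
            sem_wait(&procs[i].is_blocked);  /* wait for its action    */
        }
    for (int i = 0; i < 2; i++) pthread_join(t[i], 0);
}
```

Here the scheduling order is a fixed round-robin; in SimGrid it is whatever the simulation kernel dictates.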

A Glance at SimGrid Internals


[Layer diagram]
User APIs: SimDag, MSG, SMPI, GRAS (SMURF: SimIX network proxy)
SimIX: POSIX-like API on a virtual platform
SURF: virtual platform simulator
XBT: grounding toolbox
SURF: Simulation kernel, grounding simulation


Contains all the models (uses GTNetS when needed)

SimIX: eases the writing of user APIs based on CSPs
Provided semantics: threads, mutexes and conditions on top of the simulator

SMURF: allows distributing the simulation over a cluster (under development)
Not for speed but for memory limits (at least for now)


Agenda
- Experiments for Large-Scale Distributed Systems Research: Methodological Issues; Main Methodological Approaches; Tools for Experimentations in Large-Scale Distributed Systems
- Resource Models in SimGrid: Analytic Models Underlying SimGrid; Experimental Validation of the Simulation Models
- Platform Instanciation: Platform Catalog; Synthetic Topologies; Topology Mapping
- Using SimGrid for Practical Grid Experiments: Overview of the SimGrid Components; SimDag: Comparing Scheduling Heuristics for DAGs; MSG: Comparing Heuristics for Concurrent Sequential Processes (Motivations, Concepts and Example of Use; Java bindings; A Glance at SimGrid Internals; Performance Results); GRAS: Developing and Debugging Real Applications
- Conclusion



Some Performance Results


Master/Workers on amd64 with 4Gb (times in seconds)

                                      #Workers
#tasks     Context      100    500   1,000   5,000  10,000  25,000
1,000      ucontext    0.16   0.19    0.21    0.42    0.74    1.66
           pthread     0.15   0.18    0.19    0.35    0.55       –
           java        0.41   0.59    0.94     7.6     27.       –
10,000     ucontext    0.48   0.52    0.54    0.83     1.1    1.97
           pthread     0.51   0.56    0.57    0.78    0.95       –
           java         1.6    1.9    2.38     13.     40.       –
100,000    ucontext     3.7    3.8     4.0     4.4     4.5     5.5
           pthread      4.7    4.4     4.6     5.0    5.23       –
           java         14.    13.     15.     29.     77.       –
1,000,000  ucontext     36.    37.     38.     41.     40.     41.
           pthread     42.    44.     46.     48.     47.        –
           java        121.   130.    134.   163.    200.       –

–: #semaphores reached system limit
   (2 semaphores per user process, system limit = 32k semaphores)

Extensibility with UNIX contexts


                           #Workers
#tasks     Stack size   25,000  50,000  100,000  200,000
1,000      128Kb           1.6       –        –        –
           12Kb            0.5     0.9      1.7      3.2
10,000     128Kb           2         –        –        –
           12Kb            0.8     1.2      2        3.5
100,000    128Kb           5.5       –        –        –
           12Kb            3.7     4.1      4.8      6.7
1,000,000  128Kb          41         –        –        –
           12Kb           33      33.6     33.7     35.5
5,000,000  128Kb         206         –        –        –
           12Kb          161      167      161      165

–: out of memory

Scalability limit of GridSim
  1 user process = 3 Java threads (code, input, output)
  System limit = 32k threads ⇒ at most 10,922 user processes


Goals of the GRAS project (Grid Reality And Simulation)

Ease development of large-scale distributed apps: development of real distributed applications using a simulator

[Figure: without GRAS, research code written for the simulator must be rewritten into development code; with GRAS, a single code runs through one API on either the simulator (GRDK) or the real platform (GRE)]

Framework for Rapid Development of Distributed Infrastructure
  Develop and tune on the simulator; deploy in situ without modification
  How: one API, two implementations

Efficient Grid Runtime Environment (result = application = prototype)
  Performance concern: efficient communication of structured data
    How: efficient wire protocol (avoid data conversion)
  Portability concern: because of grid heterogeneity
    How: ANSI C + autoconf + no dependency


Main concepts of the GRAS API


Agents (acting entities)
  Code (C function), private data, location (hosting computer)

Sockets (communication endpoints)
  Server socket: to receive messages
  Client socket: to contact a server (and receive answers)

Messages (what gets exchanged between agents)
  Semantic: message type
  Payload described by a data type description (fixed for a given type)

Callbacks (code to execute when a message is received)
  Also possible to explicitly wait for given messages

Emulation and Virtualization


Same code runs without modification both in simulation and in situ
  In simulation, agents run as threads within a single process
  In situ, each agent runs within its own process

Emulation issues
  How to get the process sleeping? How to get the current time?
    System calls are virtualized: gras_os_time, gras_os_sleep
  How to report computation time into the simulator?
    Asked explicitly by the user, using provided macros
    Time to report can be benchmarked automatically
  What about global data?
    Agent status placed in a specific structure, with an ad-hoc manipulation API


Example of code: ping-pong (1/2)


Code common to client and server
#include "gras.h"
XBT_LOG_NEW_DEFAULT_CATEGORY(test, "Messages specific to this example");

static void register_messages(void) {
  gras_msgtype_declare("ping", gras_datadesc_by_name("int"));
  gras_msgtype_declare("pong", gras_datadesc_by_name("int"));
}

Client code
int client(int argc, char *argv[]) {
  gras_socket_t peer = NULL, from;
  int ping = 1234, pong;

  gras_init(&argc, argv);
  gras_os_sleep(1);                        /* Wait for the server startup */
  peer = gras_socket_client("127.0.0.1", 4000);
  register_messages();
  gras_msg_send(peer, "ping", &ping);
  INFO3("PING(%d) -> %s:%d", ping,
        gras_socket_peer_name(peer), gras_socket_peer_port(peer));
  gras_msg_wait(6000, "pong", &from, &pong);
  gras_exit();
  return 0;
}

Example of code: ping-pong (2/2)


Server code
typedef struct { /* Global private data */
  int endcondition;
} server_data_t;

int server(int argc, char *argv[]) {
  server_data_t *globals;

  gras_init(&argc, argv);
  globals = gras_userdata_new(server_data_t);
  globals->endcondition = 0;
  gras_socket_server(4000);
  register_messages();
  gras_cb_register("ping", &server_cb_ping_handler);

  while (!globals->endcondition) { /* Handle messages until our state changes */
    gras_msg_handle(600.0);        /* Actually, one ping is enough for that */
  }
  free(globals);
  gras_exit();
  return 0;
}

int server_cb_ping_handler(gras_msg_cb_ctx_t ctx, void *payload_data) {
  server_data_t *globals = (server_data_t*)gras_userdata_get(); /* Get the globals */
  globals->endcondition = 1;
  int msg = *(int*)payload_data;                       /* What's the content? */
  gras_socket_t expeditor = gras_msg_cb_ctx_from(ctx); /* Who sent it? */

  /* Send the data back as payload of a pong message to the ping's expeditor */
  gras_msg_send(expeditor, "pong", &msg);
  return 0;
}

Exchanging structured data


GRAS wire protocol: NDR (Native Data Representation)
  Avoid data conversion when possible: the sender writes data on the socket as they sit in memory
  If the receiver's architecture matches, no conversion is needed
  The receiver is able to convert from any architecture

GRAS message payload can be any valid C type
  Structure, enumeration, array, pointer, ...
  Classical garbage-collection algorithm to deep-copy it
  Cycles in pointed structures are detected & recreated

Describing a data type to GRAS


Manual description (excerpt)

  gras_datadesc_type_t gras_datadesc_struct(name);
  gras_datadesc_struct_append(struct_type, name, field_type);
  gras_datadesc_struct_close(struct_type);

Automatic description of a vector

  GRAS_DEFINE_TYPE(s_vect,
    struct s_vect {
      int cnt;
      double *data GRAS_ANNOTE(size, cnt);
    }
  );

  The C declaration is stored into a char* variable to be parsed at runtime




Assessing communication performance


Only communication performance is studied, since computations are not mediated by GRAS. Experiment: timing a ping-pong of structured data (a message of Pastry)
typedef struct {
  int id, row_count;
  double time_sent;
  row_t *rows;
  int leaves[MAX_LEAFSET];
} welcome_msg_t;

typedef struct {
  int which_row;
  int row[COLS][MAX_ROUTESET];
} row_t;

Tested solutions
  GRAS
  PBIO (uses NDR)
  OmniORB (classical CORBA solution)
  MPICH (classical MPI solution)
  XML (Expat parser + hand-crafted communication)

Platform: x86, PPC, sparc (all under Linux)


Performance on a LAN
[Figure: 3×3 grid of bar charts (log scale, 10^-4 to 10^-2 s), one per sender/receiver pair among ppc, sparc and x86, comparing GRAS, MPICH, OmniORB, PBIO and XML; measured times range from 0.5ms (x86 to x86) up to 55.7ms (XML across architectures), with "n/a" where a solution does not run]
MPICH is twice as fast as GRAS, but cannot mix little- and big-endian hosts
PBIO is broken on PPC
XML is much slower (extra conversions + verbose wire encoding)

⇒ GRAS is the best compromise between performance and portability



Assessing API simplicity


Experiment: code complexity measurements on the code from the previous experiment

                               GRAS  MPICH  PBIO  OmniORB  XML
McCabe Cyclomatic Complexity      8     10    10       12   35
Number of lines of code          48     65    84       92  150

Results discussion
  XML: complexity may be an artefact of the Expat parser (but fastest)
  MPICH: manual marshaling/unmarshaling
  PBIO: automatic marshaling, but manual type description
  OmniORB: automatic marshaling, IDL as type description
  GRAS: automatic marshaling & type description (the IDL is C)

Conclusion: GRAS is the least demanding solution from the developer's perspective



Conclusion: GRAS eases infrastructure development


[Figure: SimGrid layered architecture (SimDag, MSG, SMPI, SMURF and GRAS over SimIX, SURF and XBT); with GRAS, the same code runs either on the simulator (GRDK) or on GRE, GRAS in situ]
GRDK: Grid Research & Development Kit
  API for (explicitly) distributed applications
  Study applications in the comfort of the simulator

GRE: Grid Runtime Environment
  Efficient: twice as slow as MPICH, faster than OmniORB, PBIO, XML
  Portable: Linux (11 CPU archs); Windows; Mac OS X; Solaris; IRIX; AIX
  Simple and convenient:
    API simpler than classical communication libraries (+ XBT tools)
    Easy to deploy: ANSI C; no dependency; autotools; < 400kb

GRAS perspectives

Future work on GRAS
  Performance: type precompilation, communication taming and compression
  GRASPE (GRAS Platform Expander) for automatic deployment
  Model-checking as a third mode along with simulation and in-situ execution

Ongoing applications
  Comparison of P2P protocols (Pastry, Chord, etc.)
  Use emulation mode to validate SimGrid models
  Network mapper (ALNeM): capture platform descriptions for the simulator
  Large-scale mutual exclusion service

Future applications
  Platform monitoring tool (bandwidth and latency)
  Group communications & RPC; application-level routing; etc.



Conclusions on Distributed Systems Research

Research on Large-Scale Distributed Systems
  Reflection about common methodologies needed (reproducible results needed)
  Purely theoretical works limited (simplistic settings; NP-complete problems)
  Real-world experiments time- and labor-consuming; limited representativity
  Simulation appealing, if results remain validated

Simulating Large-Scale Distributed Systems
  Packet-level simulators too slow for large-scale studies
  Large amount of ad-hoc simulators, but of debatable validity
  Coarse-grain modeling of TCP flows possible (cf. networking community)
  Model instantiation (platform mapping or generation) remains challenging

SimGrid provides interesting models
  Implements non-trivial coarse-grain models for resources and sharing
  Validity results encouraging with regard to packet-level simulators
  Several orders of magnitude faster than packet-level simulators
  Several models available, with the ability to plug new ones or use packet-level simulation

SimGrid provides several user interfaces

SimDag: Comparing Scheduling Heuristics for DAGs of (parallel) tasks
  Declare tasks and their precedences, schedule them on resources, get the makespan

MSG: Comparing Heuristics for Concurrent Sequential Processes
  Declare independent agents running a given function on a host
  Let them exchange and execute tasks
  Easy interface, rapid prototyping
  New in SimGrid v3.3: Java bindings for MSG

GRAS: Developing and Debugging Real Applications
  Develop once, run in simulation or in situ (debug; test on non-existing platforms)
  Resulting application twice as slow as MPICH, faster than OmniORB
  Highly portable and easy to deploy

Other interfaces coming
  SMPI: simulate MPI applications
  BSP model, OpenMP?

SimGrid is an active and exciting project

Future Plans
  Improve usability (statistics tools, campaign management)
  Extreme scalability for P2P
  Model-checking of GRAS applications
  Emulation solution à la MicroGrid

[Figure: SimGrid layered architecture (SimDag, MSG, SMPI, SMURF and GRAS over SimIX, SURF and XBT)]

Large community
  http://gforge.inria.fr/projects/simgrid/
  130 subscribers to the user mailing list (40 to -devel)
  40 scientific publications using the tool for their experiments
    15 co-signed by one of the core-team members; 25 purely external
  LGPL, 120,000 lines of code (half for examples and regression tests)
  Examples, documentation and tutorials on the web page

Use it in your work!

Detailed agenda

  Experiments for Large-Scale Distributed Systems Research
    Methodological Issues
    Main Methodological Approaches
      Real-world experiments
      Simulation
    Tools for Experimentations in Large-Scale Distributed Systems
      Possible designs
      Experimentation platforms: Grid'5000 and PlanetLab
      Emulators: ModelNet and MicroGrid
      Packet-level simulators: ns-2, SSFNet and GTNetS
      Ad-hoc simulators: ChicagoSim, OptorSim, GridSim, ...
      Peer-to-peer simulators
      SimGrid
  Resource Models in SimGrid
    Analytic Models Underlying SimGrid
      Modeling a Single Resource
      Multi-hop Networks
      Resource Sharing
    Experimental Validation of the Simulation Models
      Single link
      Dumbbell
      Random platforms
      Simulation speed
  Platform Instantiation
    Platform Catalog
    Synthetic Topologies
    Topology Mapping
  Using SimGrid for Practical Grid Experiments
    Overview of the SimGrid Components
    SimDag: Comparing Scheduling Heuristics for DAGs
    MSG: Comparing Heuristics for Concurrent Sequential Processes
      Motivations, Concepts and Example of Use
      Java bindings
      A Glance at SimGrid Internals
      Performance Results
    GRAS: Developing and Debugging Real Applications
      Motivation and project goals
      Functionalities
      Experimental evaluation (performance and simplicity)
      Conclusion and Perspectives
  Conclusion
