Eitan Zahavi Isaac Keslassy Avinoam Kolodny

Learning Routing System

DISTRIBUTED ADAPTIVE ROUTING FOR BIG-DATA APPLICATIONS RUNNING ON DATA CENTER NETWORKS

Eitan Zahavi*+
Isaac Keslassy+
Avinoam Kolodny+

ANCS 2012

* Mellanox Technologies LTD
+ Technion - EE Department

Big Data → Larger Flows

- Data-set sizes keep rising
  - Web2 and Cloud Big-Data applications
- Data center traffic changes to:
  - Longer, higher-BW and fewer flows

[Figure: Google]

Static Routing of Big-Data = Low BW

- Static routing cannot balance a small number of flows
- Congestion: when the total BW of the flows on a link exceeds the link capacity
- When longer and higher-BW flows contend:
  - On a lossy network: packet drop → BW drop
  - On a lossless network: congestion spreading → BW drop

[Figure: data flow through the network]
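To give a rough sense of why a few large flows defeat static routing, the sketch below makes a simplifying assumption that is not taken from the deck: each of n equal flows is statically hashed to one of n equal-capacity links uniformly at random. The probability that no two flows share a link is then n!/n^n. The helper names are hypothetical, and the Monte Carlo estimate serves only as a cross-check.

```python
import math
import random


def prob_no_collision(n: int) -> float:
    """Exact probability that n flows hashed uniformly onto n links use distinct links."""
    return math.factorial(n) / n ** n


def estimate_no_collision(n: int, trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (assumed uniform static hashing)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        links = [rng.randrange(n) for _ in range(n)]  # one static link per flow
        if len(set(links)) == n:  # every link carries exactly one flow
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, round(prob_no_collision(n), 5), estimate_no_collision(n))
```

For n = 4 the exact value is 24/256 ≈ 0.094. Under this model, more than 90% of random static assignments therefore put at least two flows on the same link.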

Traffic Aware Load Balancing Systems

- Adaptive Routing adjusts routing to network load
- Centralized
  - Flows are routed according to global knowledge
  - [Figure: Central Routing Control]
- Distributed
  - Each flow is routed by its input switch with local knowledge
  - [Figure: a Self Routing (SR) unit in each switch]

Central vs. Distributed Adaptive Routing

Property     | Central Adaptive Routing | Distributed Adaptive Routing
-------------|--------------------------|-----------------------------
Scalability  | Low                      | High
Knowledge    | Global                   | Local (to keep scalability)
Non-Blocking | Yes                      | Unknown

- Adaptive Routing is either scalable or has global knowledge
- It is reactive

Research Question

Can a scalable distributed adaptive routing system perform like a centralized system and produce non-blocking routing assignments in reasonable time?

Trial and Error Is Fundamental to Distributed AR

1. Trial 1: randomize an output port and send the traffic → Contention 1 → un-route the contending flow
2. Trial 2: randomize a new output port and send the traffic → Contention 2 → un-route the contending flow
3. Trial 3: randomize a new output port and send the traffic → Convergence!

[Figure: flows re-routed through SR units over the three trials]
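The trial-and-error loop above can be sketched for a single output switch in isolation. The sketch rests on several assumptions that are not from the deck: each link may carry only one flow, the flow to un-route is picked at random among the contending ones, and `trial_and_error` is a hypothetical helper.

```python
import random
from collections import Counter


def trial_and_error(n_flows: int, n_links: int, seed: int = 0, max_trials: int = 100_000):
    """Randomize a link per flow, then keep re-randomizing one contending flow.

    Returns (re_routes, assignment), where assignment[f] is the link used by flow f
    and re_routes counts the trials after the initial randomization (Trial 1).
    """
    if n_flows > n_links:
        raise ValueError("with one flow per link, need n_flows <= n_links")
    rng = random.Random(seed)
    assignment = [rng.randrange(n_links) for _ in range(n_flows)]  # Trial 1
    re_routes = 0
    while re_routes < max_trials:
        load = Counter(assignment)
        contending = [f for f, link in enumerate(assignment) if load[link] > 1]
        if not contending:
            return re_routes, assignment  # Convergence!
        flow = rng.choice(contending)  # un-route one contending flow
        assignment[flow] = rng.randrange(n_links)  # randomize a new output port
        re_routes += 1
    raise RuntimeError("no convergence within max_trials")
```

For example, `trial_and_error(8, 8, seed=1)` returns a contention-free assignment, meaning a permutation of the 8 links, along with the number of extra trials it took.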

Routing Trials Cause BW Loss

Packet simulation:
- R1 is delivered, followed by G1
- R2 is stuck behind G1
- Re-route
- R3 arrives before R2 → out-of-order packet delivery!

[Figure: packets R1, R2, R3 and G1 passing through SR units]

The implication is a significant drop in flow BW:
- TCP* sees out-of-order delivery as a packet drop and throttles the senders
- See Incast papers

* Or any other reliable transport
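The BW loss mechanism can be illustrated with a simplified cumulative-ACK receiver. Standard TCP congestion control (RFC 5681) treats the third duplicate ACK as a loss signal: the sender fast-retransmits and cuts its congestion window. A segment that was only overtaken by three later segments therefore looks like a drop. This sketch ignores SACK, timers and other refinements of real stacks, and the helper names are hypothetical.

```python
def cumulative_acks(arrivals):
    """Return the cumulative ACK sent after each arriving segment (numbered from 1)."""
    received = set()
    next_expected = 1
    acks = []
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:  # advance past every in-order segment
            next_expected += 1
        acks.append(next_expected)  # the ACK names the next missing segment
    return acks


def fast_retransmit_triggered(acks, dup_threshold=3):
    """True if some ACK value repeats dup_threshold times after its first copy."""
    dups = 0
    for prev, cur in zip(acks, acks[1:]):
        dups = dups + 1 if cur == prev else 0
        if dups >= dup_threshold:
            return True
    return False
```

If segment 2 is overtaken by segments 3, 4 and 5, the arrival order `[1, 3, 4, 5, 2]` produces ACKs `[2, 2, 2, 2, 6]`. That sequence contains three duplicates of ACK 2, so the sender reacts as if segment 2 had been lost, even though it arrives a moment later.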

Research Plan

[Figure: timeline of events: New Traffic → Trial 1 → Trial 2 → ... → Trial N → No Contention]

1. Analyze distributed adaptive routing systems
2. Find how many routing trials are required to converge
3. Find conditions that make the system reach a non-blocking assignment in a reasonable time

A Simple Policy for Selecting a Flow to Re-Route

- At each time step, each output switch requests the re-route of a single worst contending flow
- At t=0, a new traffic pattern is applied
  - Randomize output ports and send flows
- At t=0.5, request re-routes
- Repeat for t=t+1 until there is no contention

[Figure: input switches and output switches, with parameters n, m and r]
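A minimal toy simulation of this policy is sketched below. Every modelling choice in it is an assumption rather than the authors' simulator:
- The network has r input switches, r output switches and m middle switches.
- Each input switch places its n ≤ m flows on distinct middle switches, one flow per uplink.
- In a random traffic pattern, every output switch receives exactly n flows.
- Each middle-to-output link (a "bin") may carry up to p flows.
- A randomly chosen contending flow stands in for the "worst" one.

`simulate_reroutes` is a hypothetical name.

```python
import random
from collections import Counter


def simulate_reroutes(r, n, m, p=1, seed=0, max_steps=100_000):
    """Toy model of the per-time-step re-route policy (hypothetical, simplified).

    Returns the number of time steps until no output switch sees contention,
    or None if max_steps is reached first.
    """
    if n > m:
        raise ValueError("need n <= m so each input switch can use distinct uplinks")
    rng = random.Random(seed)
    # Traffic pattern: each output switch receives exactly n flows, in random order.
    dests = [j for j in range(r) for _ in range(n)]
    rng.shuffle(dests)
    flows = [(i, dests[i * n + f]) for i in range(r) for f in range(n)]  # (src, dst)
    # t = 0: each input switch randomizes distinct output ports (middle switches).
    mid = []
    for _ in range(r):
        mid.extend(rng.sample(range(m), n))

    for step in range(max_steps):
        bins = Counter((mid[f], flows[f][1]) for f in range(len(flows)))
        # t + 0.5: each output switch with an overloaded bin requests one re-route.
        requests = []
        for j in range(r):
            bad = [f for f in range(len(flows))
                   if flows[f][1] == j and bins[(mid[f], j)] > p]
            if bad:
                requests.append(rng.choice(bad))  # stand-in for the "worst" flow
        if not requests:
            return step  # no contention: converged
        for f in requests:
            src = flows[f][0]
            new_k = rng.randrange(m)  # may equal the current port (no move this time)
            if new_k == mid[f]:
                continue
            # If a sibling flow of the same input switch already uses new_k, swap them;
            # the sibling's move is an induced move in another output switch's bins.
            for g in range(src * n, src * n + n):
                if mid[g] == new_k:
                    mid[g] = mid[f]
                    break
            mid[f] = new_k
    return None


if __name__ == "__main__":
    for p in (1, 2):
        runs = [simulate_reroutes(r=3, n=3, m=3, p=p, seed=s, max_steps=5_000) for s in range(10)]
        done = [s for s in runs if s is not None]
        print(f"p={p}: converged {len(done)}/10, mean steps {sum(done) / max(len(done), 1):.1f}")
```

The new port is drawn from all m ports, including the current one. This keeps simultaneous requests from flipping back and forth in lockstep. One way to explore the trends in the following slides is to average the step count over many seeds as r grows, or to compare p = 1 with p = 2. This toy model is not expected to reproduce the paper's numbers.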

Evaluation

- Measure the average number of iterations I to convergence
- I grows exponentially with system size!

A Balls and Bins Representation

- Each output switch is a balls and bins system
  - Bins are the switch input links; balls are the link flows
  - Assume 1 ball (= flow) is allowed in each bin (= link)
  - Bins are either empty, good or bad: an empty bin has no ball, a good bin has 1 ball, and a bad bin has more than 1

[Figure: Middle Switch links into SR units, with bins marked empty, good and bad]
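The bin classification above can be written as a short function. It is a sketch that generalizes "a good bin has 1 ball" to "a good bin has between 1 and p balls", where p defaults to 1 to match the assumption on this slide. `classify_bins` is a hypothetical helper.

```python
from collections import Counter


def classify_bins(ball_bins, n_bins, p=1):
    """Label each bin (an output-switch input link) as empty, good or bad.

    ball_bins[b] is the bin index occupied by ball b (a flow). A bin is good
    when it holds 1..p balls and bad when it holds more than p.
    """
    counts = Counter(ball_bins)
    labels = []
    for k in range(n_bins):
        c = counts.get(k, 0)
        labels.append("empty" if c == 0 else "good" if c <= p else "bad")
    return labels
```

Take three balls on bins `[0, 2, 2]` out of 3 bins. With p = 1, the bins are labeled good, empty and bad.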

System Dynamics

- There are two reasons for ball moves: Improvement or Induced-move
- Balls are numbered by their input switch number

[Figure: an Improve move and an Induced move of balls at Output switch 1 and Output switch 2, through the middle switches, with input switches SW1, SW2 and SW3]
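The sketch below shows one way an Improvement can cause an Induced move. It assumes, without claiming this is the paper's exact mechanism, that an input switch keeps one flow per uplink. Moving a flow onto an occupied uplink then forces a swap, so the displaced flow also changes middle switch, and that flow may be headed to a different output switch. `reroute_with_swap` is a hypothetical helper.

```python
def reroute_with_swap(uplinks, flow, new_port):
    """Move `flow` to `new_port` on one input switch (one flow per port).

    uplinks maps port -> flow id and is updated in place. Returns the displaced
    flow (whose move is the induced move), or None if new_port was free.
    """
    port_of = {f: k for k, f in uplinks.items()}
    old_port = port_of[flow]
    if new_port == old_port:
        return None  # nothing to do
    displaced = uplinks.get(new_port)
    uplinks[new_port] = flow  # the Improvement
    if displaced is None:
        del uplinks[old_port]
    else:
        uplinks[old_port] = displaced  # the Induced move
    return displaced
```

For example, with `uplinks = {0: "a", 1: "b"}`, calling `reroute_with_swap(uplinks, "a", 1)` returns `"b"` and leaves `{0: "b", 1: "a"}`.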

The Last Step Governs Convergence

- Estimated Markov chain models
- What is the probability that the required last Improvement does not cause a bad Induced move?
- Each one of the r output switches must do that step
- Therefore convergence time is exponential with r

[Figure: estimated Markov chains for Output switch 1, Output switch 2, ..., Output switch r, each with Good and Bad states (A, B, C, D) and absorbing states]
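Here is a back-of-the-envelope version of this argument. It assumes, as a simplification, that in each attempt every output switch completes its last step independently with the same probability q, where q is the probability that its last Improvement does not cause a bad Induced move. The number of attempts is then geometric with success probability q^r:

$$
P(\text{all } r \text{ output switches succeed in one attempt}) = q^{\,r},
\qquad
\mathbb{E}[I] \approx \frac{1}{q^{\,r}} = \left(\frac{1}{q}\right)^{r}
$$

For example, q = 1/2 gives about 2^r attempts, which grows exponentially with r.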

Introducing p

- Assume a symmetrical system: all flows have the same BW
- What if Flow_BW < Link_BW?
- The network load is Flow_BW/Link_BW
- p = how many balls are allowed in one bin

[Figure: bins with p=1 and p=2]
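All flows have the same BW, so the number of flows a link can carry without exceeding its capacity follows directly from the load. The floor below is our reading of the definition for equal-BW flows:

$$
\text{load} = \frac{\text{Flow\_BW}}{\text{Link\_BW}},
\qquad
p = \left\lfloor \frac{\text{Link\_BW}}{\text{Flow\_BW}} \right\rfloor = \left\lfloor \frac{1}{\text{load}} \right\rfloor
$$

For the two loads used in the fat-tree simulations, 52% gives p = ⌊1.92⌋ = 1 and 48% gives p = ⌊2.08⌋ = 2.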

p has Great Impact on Convergence

- Measure the average number of iterations I to convergence
- I shows a very strong dependency on p

Implementable Distributed System

- Replace congestion detection by flow count with QCN
  - Congestion is detected on the middle switch output, not on the output switch input
- Replace worst-flow selection by congested flow sampling
- Implement as an extension to a detailed InfiniBand flit-level model
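One reading of congested flow sampling is to pick the flow of a randomly sampled packet in a congested queue. Under that reading, flows are chosen in proportion to their share of the queue. This is an illustrative assumption. It is neither the QCN (IEEE 802.1Qau) frame format nor the InfiniBand model used in the deck, and `sample_congested_flow` is a hypothetical helper.

```python
import random
from collections import Counter


def sample_congested_flow(queue, rng):
    """Pick the flow id of one uniformly sampled packet in a congested queue.

    queue lists the flow id of each queued packet, so a flow is chosen with
    probability proportional to its number of packets in the queue.
    """
    if not queue:
        return None
    return rng.choice(queue)


if __name__ == "__main__":
    rng = random.Random(0)
    queue = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
    picks = Counter(sample_congested_flow(queue, rng) for _ in range(10_000))
    print(picks)  # counts roughly in the ratio 6:3:1
```

This replaces the "worst flow" choice with a choice weighted by queue occupancy. The heaviest contributors to congestion are therefore the most likely to be re-routed, without any switch tracking per-flow state.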

52% Load on 1152 Nodes Fat-Tree

- No change in the number of adaptations over time!
- No convergence

48% Load on 1152 Nodes Fat-Tree

[Figure: Switch Routing Adaptations / 10 usec vs. t [sec]]

Conclusions

- Study: distributed adaptive routing of Big-Data flows
- Focus on: time to convergence to a non-blocking routing
- Learning: the cause of the slow convergence
- Corollary: half-link-BW flows converge in a few iterations
- Evaluation: a 1152-node fat-tree simulation reproduces these results

Distributed Adaptive Routing of Half Link_BW Flows is both Non-Blocking and Scalable
