Routing overlay case of study: PASTRY

Andrea Marin

Università Ca’ Foscari di Venezia


Dipartimento di Informatica
Corso di Sistemi Distribuiti

2009
Introduction
Design overview
Self-adaptation
Improving the routing performance

Presentation outline

1 Introduction

2 Design overview

3 Self-adaptation
Node join
Node departure

4 Improving the routing performance

Presentation based on the original paper: A. Rowstron and P. Druschel.
PASTRY: Scalable, decentralized object location and routing for large-scale peer-to-peer systems.

Andrea Marin Routing overlay case of study: PASTRY



What is PASTRY?

PASTRY is an implementation of a Distributed Hash Table (DHT) algorithm for P2P routing overlays
Defined by Rowstron (Microsoft Research) and Druschel (Rice University) in 2001
Salient features:
Fully decentralized
Scalable
High fault tolerance
Used as middleware by several applications:
PAST storage utility
SCRIBE publish/subscribe system
...


Design of PASTRY: summary

Any computer connected to the Internet and running the PASTRY node software can be a PASTRY node
Application-specific security policies may be applied
Each node is identified by a unique 128-bit node identifier (NodeId)
The node identifier is assumed to be generated randomly
Each NodeId is assumed to have the same probability of being chosen
Nodes with similar NodeIds may be geographically far apart
Given a key, PASTRY can deliver a message to the node with the NodeId numerically closest to the key within $\lceil \log_{2^b} N \rceil$ steps, where b is a configuration parameter (usually b = 4) and N is the number of nodes


Sketch of the routing algorithm

Assume we want to find the node in the PASTRY network with the NodeId closest to a given key
Note that NodeId and key are both 128-bit sequences
Both the NodeId and the key can be thought of as sequences of digits in base $2^b$

Routing idea
In each routing step, a node normally forwards the message to a node whose NodeId shares with the key a prefix that is at least one digit longer than the prefix the key shares with the present node's NodeId. If no such node is known, the message is forwarded to a node whose NodeId shares with the key a prefix as long as the present node's, but is numerically closer to the key.
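
The digit view and the shared-prefix length can be sketched in a few lines of Python. This is only an illustration: the function names `to_digits` and `shl` are chosen here (the latter after the notation used later in the slides), and the sketch assumes the identifier length is a multiple of b.

```python
def to_digits(value: int, b: int = 4, bits: int = 128) -> list[int]:
    """Split a `bits`-bit integer into base-2^b digits, most significant first.

    Assumes `bits` is a multiple of `b` (true for 128 bits and b = 4).
    """
    n_digits = bits // b
    mask = (1 << b) - 1
    return [(value >> (b * (n_digits - 1 - i))) & mask for i in range(n_digits)]


def shl(a: list[int], b_digits: list[int]) -> int:
    """Length, in digits, of the prefix shared by two digit sequences."""
    length = 0
    for x, y in zip(a, b_digits):
        if x != y:
            break
        length += 1
    return length


# 16-bit identifiers with b = 2, i.e. eight base-4 digits.
node = to_digits(int("10233102", 4), b=2, bits=16)
key = to_digits(int("10210221", 4), b=2, bits=16)
print(node)            # [1, 0, 2, 3, 3, 1, 0, 2]
print(shl(node, key))  # 3
```

A routing-table hop aims to increase `shl(next_node, key)` by at least one digit compared with `shl(node, key)`.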


State of a node

Each PASTRY node has a state consisting of:
a routing table
used in the first phase of the routing (long distances)
a neighborhood set M
contains the NodeIds and IP addresses of the |M| nodes that are closest (according to a proximity metric) to the considered node
a leaf set L
contains the NodeIds and IP addresses of the |L|/2 nodes with the numerically closest smaller NodeIds and of the |L|/2 nodes with the numerically closest larger NodeIds, relative to the present node's NodeId
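
As a rough illustration of the leaf set definition, the sketch below picks the |L|/2 closest smaller and |L|/2 closest larger identifiers from a list of known NodeIds. The function name and the small integer identifiers are made up for the example, and wrap-around of the identifier space is deliberately ignored.

```python
def leaf_set(own_id: int, known_ids: list[int], size: int) -> tuple[list[int], list[int]]:
    """Return (smaller, larger): the size/2 numerically closest NodeIds on each side.

    Simplification: the identifier space is treated as a line, not a ring.
    """
    half = size // 2
    smaller = sorted((i for i in known_ids if i < own_id), reverse=True)[:half]
    larger = sorted(i for i in known_ids if i > own_id)[:half]
    return smaller, larger


ids = [5, 12, 20, 31, 40, 47, 58, 63]
print(leaf_set(31, ids, size=4))  # ([20, 12], [40, 47])
```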


The routing table

The routing table is a $\lceil \log_{2^b} N \rceil \times (2^b - 1)$ table
b is the configuration parameter
N is the number of PASTRY nodes in the network
The $2^b - 1$ entries at row n each refer to a node whose NodeId shares the first n digits with the present node's NodeId, but whose (n + 1)th digit has one of the $2^b - 1$ possible values other than the (n + 1)th digit of the present node's NodeId.
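
One way to read the row rule: the row is the number of leading digits another NodeId shares with the local one, and the column is its first differing digit. A minimal sketch, assuming NodeIds are written as equal-length digit strings (the function name `table_slot` is hypothetical):

```python
from typing import Optional


def table_slot(own: str, other: str) -> Optional[tuple[int, int]]:
    """Return the (row, column) where `other` fits in the routing table of `own`.

    Both ids are equal-length digit strings in base 2^b (b <= 4, so each digit
    is a single hexadecimal character). Returns None when the ids are identical.
    """
    for row, (x, y) in enumerate(zip(own, other)):
        if x != y:
            return row, int(y, 16)
    return None


print(table_slot("10233102", "10211302"))  # (3, 1)
print(table_slot("10233102", "22301203"))  # (0, 2)
```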


Routing table example

Assuming 16-bit NodeIds and b = 2, numbers are expressed in base $2^b = 4$.


Routing table dimension

The choice of b and N determines the routing table size
The size is approximately $\lceil \log_{2^b} N \rceil \times (2^b - 1)$
The maximum number of hops between any pair of nodes is $\lceil \log_{2^b} N \rceil$ under normal operation
Larger b increases the routing table size but reduces the number of hops
With $10^6$ nodes and b = 4 we have around 75 table entries
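
The figure of about 75 entries can be checked with a few lines of integer arithmetic. The helper below is only a sketch (its name is chosen here); it computes the ceiling of the logarithm by repeated multiplication to avoid floating-point rounding.

```python
def table_shape(n_nodes: int, b: int) -> tuple[int, int]:
    """Return (rows, columns) = (ceil(log_{2^b} N), 2^b - 1)."""
    rows, capacity = 0, 1
    while capacity < n_nodes:
        capacity *= 2 ** b
        rows += 1
    return rows, 2 ** b - 1


for b in (2, 4):
    rows, cols = table_shape(10**6, b)
    print(b, rows, cols, rows * cols)
# 2 10 3 30
# 4 5 15 75
```

With one million nodes, moving from b = 2 to b = 4 halves the number of rows (and hence hops) from 10 to 5 while growing the table from 30 to 75 entries.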


Neighborhood set

The neighborhood set M contains the NodeIds and IP addresses of the |M| nodes that are closest (according to a metric that usually depends on the network topology) to the local node
This set is not normally used in the routing process
It is useful in maintaining locality properties


Leaf set

The leaf set contains the |L| NodeIds numerically closest to the current node's NodeId


Routing algorithm: notation

D: key to route
$R_\ell^i$: the entry in the routing table R at column i, $0 \le i < 2^b$, and row $\ell$, $0 \le \ell < \lfloor 128/b \rfloor$
$L_i$: the i-th closest NodeId in the leaf set L, $-\lfloor |L|/2 \rfloor \le i \le \lfloor |L|/2 \rfloor$
$D_\ell$: the value of the $\ell$-th digit in the key D
shl(A, B): the length, in digits, of the prefix shared by A and B
A: NodeId of the current node


Routing algorithm
if $L_{-\lfloor |L|/2 \rfloor} \le D \le L_{\lfloor |L|/2 \rfloor}$ then
    /* Route to a leaf */
    forward to $L_i$ s.th. $|D - L_i|$ is minimal
else
    $\ell \leftarrow \mathrm{shl}(D, A)$
    if $R_\ell^{D_\ell} \ne$ null then
        /* Route to a node in the routing table */
        forward to $R_\ell^{D_\ell}$
    else
        /* Get as close as you can ... */
        forward to $T \in L \cup R \cup M$ s.th. $\mathrm{shl}(T, D) \ge \ell$, $|T - D| < |A - D|$
    end
end
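
The three cases of the algorithm can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the reference implementation: NodeIds are digit strings, the routing table is a dictionary keyed by (row, column), a missing key plays the role of null, and the function name `next_hop` and its return convention (None meaning "deliver locally") are assumptions made here. The sample data uses only the identifiers that appear in the routing example of the slides.

```python
from typing import Optional


def shl(a: str, b: str) -> int:
    """Length, in digits, of the prefix shared by two digit strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(key: str, node: str, leaves: list[str],
             table: dict[tuple[int, int], str],
             neighbours: list[str], base: int = 4) -> Optional[str]:
    """Return the NodeId to forward to, or None if `node` keeps the message."""
    def val(s: str) -> int:
        return int(s, base)

    def dist(s: str) -> int:
        return abs(val(s) - val(key))

    # Case 1: the key falls within the range covered by the leaf set.
    if leaves and min(map(val, leaves)) <= val(key) <= max(map(val, leaves)):
        best = min(list(leaves) + [node], key=dist)
        return None if best == node else best

    # Case 2: use the routing table entry for the first differing digit.
    row = shl(key, node)
    if row < len(key):
        entry = table.get((row, int(key[row], base)))
        if entry is not None:
            return entry

    # Case 3: any known node with an equally long prefix that is closer.
    known = set(leaves) | set(table.values()) | set(neighbours)
    closer = [t for t in known if shl(t, key) >= row and dist(t) < dist(node)]
    return min(closer, key=dist) if closer else None


node = "10233102"
leaves = ["10233000", "10233122", "10233232"]   # partial leaf set
table = {(3, 1): "10211302"}                    # single known entry
print(next_hop("10233131", node, leaves, table, []))  # 10233122
print(next_hop("10210221", node, leaves, table, []))  # 10211302
```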

Example: how do we route?

Current node: 10233102 (NodeIds written in base 4)
10233131 ⇒ 10233122 (leaf)
10210221 ⇒ 10211302
Target not in L because $10233102_4 - 10210221_4 = 22221_4$, whereas $10233102_4 - 10233000_4 = 102_4$ and $10233232_4 - 10233102_4 = 130_4$
shl(10233102, 10210221) = 3
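
The base-4 arithmetic in this example can be verified directly. The snippet below reads the lowest leaf as 10233000, the value consistent with the stated difference of 102 in base 4; the helper `base4` is written here only for the check.

```python
def base4(n: int) -> str:
    """Render a non-negative integer in base 4."""
    digits = ""
    while n:
        digits = str(n % 4) + digits
        n //= 4
    return digits or "0"


node = int("10233102", 4)
key = int("10210221", 4)
lowest_leaf = int("10233000", 4)
highest_leaf = int("10233232", 4)

print(base4(node - key))                   # 22221
print(base4(node - lowest_leaf))           # 102
print(base4(highest_leaf - node))          # 130
print(lowest_leaf <= key <= highest_leaf)  # False
```

The key lies 22221 (base 4) below the current node, far beyond the 102 covered by the lower half of the leaf set, so the routing table is used instead.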

Routing performance

Theorem (Expected number of routing steps)


The expected number of routing steps with the PASTRY algorithm is $\lceil \log_{2^b} N \rceil$.

Proof
If the target node is reached using the routing table, each step reduces the set of possible targets by a factor of $2^b$
If the target node is in L, then we need 1 step
The third case is more difficult to treat. It is unlikely to happen; experimental results with uniform NodeIds give:
If $|L| = 2^b$, probability < 0.02
If $|L| = 2^{b+1}$, probability < 0.006
When case 3 happens, it adds an additional step
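
The first case of the proof can be spelled out as a short calculation, under the assumption of uniformly distributed NodeIds made throughout the slides: after k routing-table hops, roughly $N / (2^b)^k$ nodes still share the matched prefix with the key, and routing ends once at most one candidate remains.

```latex
\frac{N}{(2^b)^k} \le 1
\;\Longleftrightarrow\;
k \ge \log_{2^b} N
\;\Longrightarrow\;
k = \lceil \log_{2^b} N \rceil ,
\qquad
\text{e.g. } N = 10^6,\ b = 4:\quad
\lceil \log_{16} 10^6 \rceil = \lceil 4.98 \rceil = 5 .
```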


Reliability

In the event of many simultaneous node failures, the number of routing steps may be at worst linear in N (loose upper bound)
Message delivery is guaranteed unless $\lfloor |L|/2 \rfloor$ nodes with consecutive NodeIds fail simultaneously (a very rare event)


PASTRY API (simplified version)

PASTRY exports the following operations:


nodeId = pastryInit(Credentials, Application)
Join a PASTRY network or create a new one
Credentials: needed to authenticate the new node
Application: handle to the application that requires the services
route(msg,key)
PASTRY routes message msg to the node with NodeId
numerically closest to key


Application API (simplified version)

An application that uses PASTRY services must export the following operations:
deliver(msg,key)
PASTRY calls this method to deliver a message that has arrived at its destination
forward(msg,key,nextId)
PASTRY calls this method before forwarding a message. The application may change the message or nextId. Setting nextId to null terminates the delivery.
newLeafs(leafSet)
Used by PASTRY to inform the application about a change in the leaf set
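
To make the callback contract concrete, here is a hypothetical Python rendering of an application object. The class name, the snake_case method names, the "drop" convention and the choice to return the (possibly modified) message and next hop from `forward` are all assumptions of this sketch; the slides only specify the three operations and their purpose.

```python
from typing import Optional


class LoggingApp:
    """Toy application exposing the three callbacks PASTRY expects."""

    def __init__(self) -> None:
        self.delivered: list[tuple[str, str]] = []
        self.leaf_set: list[str] = []

    def deliver(self, msg: str, key: str) -> None:
        # Called on the node whose NodeId is numerically closest to `key`.
        self.delivered.append((msg, key))

    def forward(self, msg: str, key: str,
                next_id: Optional[str]) -> tuple[str, Optional[str]]:
        # Called before each hop; the application may rewrite the message or
        # the next hop. Returning None as next hop stops the delivery.
        if msg == "drop":
            return msg, None
        return msg, next_id

    def new_leafs(self, leaf_set: list[str]) -> None:
        # Called when the local leaf set changes.
        self.leaf_set = list(leaf_set)


app = LoggingApp()
app.deliver("hello", "10233102")
print(app.delivered)                              # [('hello', '10233102')]
print(app.forward("drop", "10233102", "10233122"))  # ('drop', None)
```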


Scenario and assumptions

Node X wants to join a PASTRY network
X's NodeId is computed by the application
E.g., it may be a SHA-1 hash of its IP address or of its public key
X knows a nearby node A (close according to the proximity metric)


Join message

Node X sends to A a join message whose key is X's NodeId
The message is treated by A like all other messages
A routes the message towards the node Z whose NodeId is closest to the key, i.e., closest to X's NodeId
Each node on the path from A to Z sends its state tables to X
X may request additional information from other nodes
X builds its own tables
The interested nodes update their state tables


Neighbourhood set and leaf set

A is assumed to be close to X, so X uses A's neighbourhood set to initialise its own
Z has the NodeId numerically closest to X's, so Z's leaf set is used as the base leaf set of X


Building the routing table

Let $\ell = \mathrm{shl}(X, A) \ge 0$
Rows from 0 to $\ell$ of A become rows from 0 to $\ell$ of X
Row $\ell + 1$ of X is row $\ell + 1$ of B, where B is the node after A on the path to Z
X sends M, L and the routing table to each node from A to Z. These nodes update their states
Simultaneous arrivals cause contention, which is resolved using timestamps
The number of messages sent for a node join is $O(\log_{2^b} N)$
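
The row-copying rule can be sketched as follows. This is a simplified reading, not the protocol itself: for each row r, X takes row r from the first node on the join path that shares at least r digits with X. Two tidy-ups are additions of this sketch rather than statements from the slides: X's own digit column is dropped, and the source node itself is inserted when it differs from X exactly at digit r. All identifiers and table contents below are invented for illustration.

```python
Row = dict[int, str]  # digit value -> NodeId


def shl(a: str, b: str) -> int:
    """Length, in digits, of the prefix shared by two digit strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def seed_routing_table(x_id: str, path: list[tuple[str, list[Row]]]) -> list[Row]:
    """Build X's initial rows from (node_id, routing_table) pairs met on A ... Z."""
    rows: list[Row] = []
    for r in range(len(x_id)):
        own_digit = int(x_id[r], 16)
        # First node on the path whose row r is valid for X.
        source = next(((nid, tbl) for nid, tbl in path
                       if shl(x_id, nid) >= r and r < len(tbl)), None)
        if source is None:
            break
        nid, tbl = source
        row = {d: n for d, n in tbl[r].items() if d != own_digit}
        if shl(x_id, nid) == r:
            row.setdefault(int(nid[r], 16), nid)  # the source itself fits in row r
        rows.append(row)
    return rows


# Hypothetical 4-digit, base-4 network: X = 1023 joins via A = 1300, then B = 1021.
a_table = [{0: "0111", 2: "2222", 3: "3000"}, {0: "1021", 1: "1100", 2: "1200"}]
b_table = [{0: "0111", 2: "2222", 3: "3000"}, {1: "1100", 2: "1200", 3: "1300"},
           {0: "1001", 1: "1010", 3: "1033"}, {0: "1020", 2: "1022"}]
rows = seed_routing_table("1023", [("1300", a_table), ("1021", b_table)])
for r, row in enumerate(rows):
    print(r, sorted(row.items()))
```

Rows 0 and 1 come from A (which shares one digit with X), rows 2 and 3 from B (which shares three).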


Dealing with node departures

Nodes can fail or depart from the network without warning
A node is considered failed when its immediate neighbours (in NodeId space) cannot communicate with it
In this case the state of the nodes that refer to the failed node must be updated


Repairing the leaf set

Scenario:
Node X fails
Node A has X in the leaf set
Actions performed by A to repair its leaf set:
If $\mathrm{NodeId}_A > \mathrm{NodeId}_X$ then A requests the leaf set of the leaf node with the lowest NodeId
If $\mathrm{NodeId}_A < \mathrm{NodeId}_X$ then A requests the leaf set of the leaf node with the highest NodeId
A uses the received set to repair its own
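
A minimal sketch of this repair step, with small integers standing in for NodeIds. The function name, the `fetch_leaf_set` callback (a stand-in for the remote request) and the example values are hypothetical; a real node would also check that the candidates it adopts are alive.

```python
from typing import Callable


def repair_leaf_set(a_id: int, leaves: set[int], failed: int, size: int,
                    fetch_leaf_set: Callable[[int], set[int]]) -> set[int]:
    """Rebuild A's leaf set after `failed` stopped answering."""
    live = leaves - {failed}
    # Remaining leaves on the same side of A as the failed node.
    side = [i for i in live if (i < a_id) == (failed < a_id)]
    if not side:
        return live
    # Ask the extreme leaf on that side: lowest if X < A, highest if X > A.
    donor = min(side) if failed < a_id else max(side)
    candidates = (live | fetch_leaf_set(donor)) - {a_id, failed}
    half = size // 2
    smaller = sorted((i for i in candidates if i < a_id), reverse=True)[:half]
    larger = sorted(i for i in candidates if i > a_id)[:half]
    return set(smaller) | set(larger)


remote = {40: {30, 35, 45, 50}}  # leaf set held by node 40
repaired = repair_leaf_set(50, {40, 45, 55, 60}, failed=45, size=4,
                           fetch_leaf_set=lambda n: remote[n])
print(sorted(repaired))  # [35, 40, 55, 60]
```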


Repairing the routing table

Scenario:
Node X fails
Node A has X as target in the routing table in position $R_\ell^d$
Actions performed by A to repair its routing table:
A asks each target $R_\ell^i$ in its routing table, with $i \ne d$, for its own entry $R_\ell^d$
If none answers with a live node, then A moves on to the targets in row $\ell + 1$ and repeats the procedure
If a suitable node exists, this procedure finds it with high probability
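
The repair loop can be sketched as below. The `ask(node, row, col)` callback is a hypothetical stand-in for the remote query and is assumed to return a live NodeId or None; the table layout (a list of rows mapping digit to NodeId) and the sample identifiers are likewise invented for the example.

```python
from typing import Callable, Optional

Table = list[dict[int, str]]  # table[row][digit] -> NodeId


def repair_entry(table: Table, row: int, col: int,
                 ask: Callable[[str, int, int], Optional[str]]) -> Optional[str]:
    """Find a replacement for the failed entry table[row][col].

    Peers in row `row` are asked first, then peers in the following rows.
    """
    for r in range(row, len(table)):
        for c, peer in table[r].items():
            if r == row and c == col:
                continue  # the failed entry itself
            candidate = ask(peer, row, col)
            if candidate is not None:
                table[row][col] = candidate
                return candidate
    return None


table = [{1: "1000", 2: "2000", 3: "3000"}, {1: "0100"}]
answers = {("3000", 0, 2): "2310"}  # only node 3000 knows a live substitute
found = repair_entry(table, 0, 2, lambda n, r, c: answers.get((n, r, c)))
print(found, table[0][2])  # 2310 2310
```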


Repairing the neighbourhood set

Note that the neighbourhood set is not used in the routing, yet it plays a pivotal role in improving the performance of the PASTRY algorithm
A PASTRY node periodically tests whether the nodes in M are alive
When a node does not answer, the polling node asks the other nodes in its M for their neighbourhood sets. Then it replaces the failed node with the closest (according to the proximity metric) live one.
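
A possible shape for this maintenance step is sketched below. The three callbacks (`fetch` for a remote neighbourhood set, `distance` for the proximity metric and `is_alive` for the liveness probe) are placeholders invented for this example, as are the node names and distances.

```python
from typing import Callable


def repair_neighbourhood(m: set[str], failed: str, own: str,
                         fetch: Callable[[str], set[str]],
                         distance: Callable[[str], float],
                         is_alive: Callable[[str], bool]) -> set[str]:
    """Replace `failed` in M with the closest live node known to the others."""
    survivors = m - {failed}
    candidates: set[str] = set()
    for peer in survivors:
        candidates |= fetch(peer)
    candidates -= survivors | {failed, own}
    live = [c for c in candidates if is_alive(c)]
    if live:
        survivors.add(min(live, key=distance))
    return survivors


remote_sets = {"a": {"b", "d", "e"}, "c": {"a", "e"}}
proximity = {"d": 12.0, "e": 7.5}
new_m = repair_neighbourhood({"a", "b", "c"}, failed="b", own="me",
                             fetch=lambda n: remote_sets[n],
                             distance=lambda n: proximity[n],
                             is_alive=lambda n: True)
print(sorted(new_m))  # ['a', 'c', 'e']
```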


Main idea

The PASTRY routing algorithm may turn out to be inefficient because a few steps in the routing procedure may take a long time
The distribution of NodeIds does not take locality into account
Close NodeIds may be geographically far apart ⇒ long delays for message delivery
The neighbourhood set is used to improve the performance


Assumptions and goal

Assumptions:
Scalar proximity metric
E.g.: number of routing hops, geographic distance
The proximity space given by the proximity metric is Euclidean
The triangle inequality holds
If the metric is not Euclidean, PASTRY routing keeps working but it may not be optimized
Goal:
The nodes on the path of a message delivery from A to B are close according to the proximity metric.


Locality in the routing table

Scenario:
Assume a network satisfies the required property
We show that when a new node X joins the network the property is maintained
X knows A, which is assumed to be close to X
Idea:
$R_0$ of A is used for X. If the property holds for A and A is close to X, then the property holds for X
$R_1$ of X is $R_1$ of B, i.e., the node reached from A. Why can B be considered close to X? The distance should be weighed against the number of possible targets!
The same argument applies to the other routing table rows


Further improvements

The quality of the described approximation may decay due to cascading errors
PASTRY incorporates a second stage in building the locality-aware routing tables
Node X, when joining the network, requests the state from each of the nodes mentioned in its routing table and in its neighbourhood set
Node X replaces entries in its state whenever it receives better information
E.g., $R_\ell^d$ of X may be replaced if the node addressed by $R_\ell^i$ knows a closer node (according to the proximity metric) that fits in $R_\ell^d$.


Locality property

PASTRY locality features guarantee that a good route is found, but not that the best route is found
The process approximates the best routing to the destination
The routing decisions are taken locally!
Recall that a resource is present in the network with k replicas. But the addressed one may not be the closest (according to the proximity metric)
