
FAULT TOLERANT SYSTEMS

http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems

Part 13 - Networks 3
Chapter 4: Network Fault Tolerance

Part.13.1

Copyright 2007 Koren & Krishna, Morgan-Kaufman

Cube-Connected Cycles Networks

Hypercube: multiple paths between nodes, low diameter
Cost: a high node degree (n ports), so a new node design is required when the size of the network increases
Cube-Connected Cycles (CCC) network: keeps the degree of a node fixed at 3 or less (for any n)
Example: a CCC network that corresponds to the H3 hypercube
  Each node of degree 3 in H3 is replaced by a cycle consisting of 3 nodes

General CCC Network

Each node of degree n in the hypercube Hn is replaced by a cycle containing n nodes, where the degree of every node in the cycle is 3
The resulting CCC(n,n) network has n·2^n nodes
We may have a CCC(n,k) network with k·2^n nodes
  Each cycle includes k ≥ n nodes, with the additional k-n nodes having a degree of 2
  The extra nodes of degree 2 have a very small impact on the network properties
We will restrict ourselves to k = n

Labeling CCC Nodes

CCC node label: (i; j)
  i - an n-bit binary number, the label of the corresponding hypercube node
  j (0 ≤ j ≤ n-1) - the position of the node within the cycle
Two nodes (i; j) and (i'; j') are linked by an edge in the CCC if and only if either
  i = i' and j - j' ≡ ±1 (mod n) (a link along the cycle), or
  j = j' and i differs from i' in precisely the j-th bit (a dimension-j edge in the hypercube)
Example: nodes 0 and 2 in H3 are connected through a dimension-1 edge, which corresponds to the CCC edge connecting nodes (0,1) and (2,1)
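The labeling rule can be turned into a small construction, sketched here in Python. The function name `build_ccc` and the choice of numbering bits from the least significant bit (bit 0) are assumptions made for illustration; the sketch also assumes n ≥ 3, so that the two cycle neighbors of a node are distinct.

```python
from itertools import product


def build_ccc(n):
    """Return an adjacency map for CCC(n, n); nodes are (i, j) pairs.

    Assumption: bit j of i is counted from the least significant bit.
    """
    adj = {}
    for i, j in product(range(2 ** n), range(n)):
        adj[(i, j)] = {
            (i, (j + 1) % n),   # link along the cycle, one step forward
            (i, (j - 1) % n),   # link along the cycle, one step backward
            (i ^ (1 << j), j),  # dimension-j hypercube edge
        }
    return adj


ccc3 = build_ccc(3)
print(len(ccc3))                                 # node count, n * 2**n
print({len(nbrs) for nbrs in ccc3.values()})     # set of node degrees
print((2, 1) in ccc3[(0, 1)])                    # edge from the H3 example
```

Checking the node count and the degree set on a small n is a quick way to confirm that the two edge rules were encoded as intended.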

Hypercube vs. CCC

CCC has a lower node degree than the hypercube
CCC has a higher diameter than the hypercube
  The hypercube Hn has a diameter of n
  The CCC(n,n) has a diameter of $2n + \lfloor n/2 \rfloor - 2$ for n ≥ 4
Routing of messages in the CCC is more complicated than in the hypercube
Fault tolerance of the CCC is higher - the failure of a single node in the CCC is similar to a single faulty link in the hypercube
No closed-form expression for the reliability of the CCC

Loop Networks

The cycle topology (also called a loop network) that is replicated in the CCC network can serve as an interconnection network
Advantages:
  simple routing algorithm
  small node degree
Disadvantages:
  an n-node unidirectional loop has a diameter of n-1, with an average of n/2 intermediate forwarding nodes per message
  a unidirectional loop network is not fault tolerant - a single node or link failure disconnects the network
[Figure: a six-node loop network]

Chordal Networks

Reduce the diameter and improve the fault tolerance of a unidirectional loop network by adding extra links, called chords
Each node has an additional backward link connecting it to a node at a distance s, called the skip distance
Node i (0 ≤ i ≤ n-1) has a forward link to node (i+1) mod n and a backward link to node (i-s) mod n
Degree of every node is 4 for any value of n
Example: a 15-node chordal network with a skip distance of 3

Minimizing Diameter of Chordal Networks

Choice of s affects the diameter D; we look for the s that minimizes D
Diameter - the longest distance from source to destination - depends on the routing
Assumption: routing uses backward chords as long as this is advantageous
  b - number of backward chords used
  b·s - number of nodes skipped
  b' - maximum value of b, satisfying b'·s + b' ≤ n

Minimizing Diameter - Cont.

We may need to add a maximum of s-1 forward links to b', so that
  $D = \lfloor n/(s+1) \rfloor + (s-1)$
For most values of n, $s = \lfloor \sqrt{n} \rfloor$ minimizes D
Example: n = 15; optimal s = 3, minimal D = 5
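The search for the best skip distance can be checked numerically. The sketch below assumes the diameter takes the form D = ⌊n/(s+1)⌋ + (s-1), which follows from b'·s + b' ≤ n plus at most s-1 forward links; the function names `chordal_diameter` and `best_skip` are illustrative, and ties are broken in favor of the smaller s.

```python
def chordal_diameter(n, s):
    """Diameter under the assumed routing: b' backward chords plus s-1 forward links."""
    b_max = n // (s + 1)  # largest b with b*s + b <= n
    return b_max + (s - 1)


def best_skip(n):
    """Return the skip distance in 1..n-1 with the smallest diameter (smallest s on ties)."""
    return min(range(1, n), key=lambda s: (chordal_diameter(n, s), s))


n = 15
s = best_skip(n)
print(s, chordal_diameter(n, s))  # prints "3 5", matching the n = 15 example
```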

Reliability of Chordal Networks

Exact reliability calculation is complicated
Instead, calculate the number of paths between the two farthest nodes in the network
  If the number of these paths is maximized, reliability is likely to be high
The minimum path length between the two farthest nodes is b'+s-1
The number of paths of length b'+s-1 consisting of b' backward chords and (s-1) forward links is
  $\binom{b'+s-1}{s-1}$
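Counting these paths amounts to choosing which of the b'+s-1 hops are forward links. A minimal sketch, assuming b' = ⌊n/(s+1)⌋ as implied by b'·s + b' ≤ n (the function name is illustrative):

```python
from math import comb


def farthest_pair_path_count(n, s):
    """Number of orderings of b' backward chords and s-1 forward links."""
    b_max = n // (s + 1)               # assumed maximum number of backward chords
    return comb(b_max + s - 1, s - 1)  # choose positions of the forward links


print(farthest_pair_path_count(15, 3))  # 15-node network with skip distance 3
```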

Increasing Reliability of Chordal Networks

The value of s that maximizes this number of paths is …
However, for most values of n, the value of s that minimizes the diameter yields the same number of paths
Conclusion: in most cases, the value of s that minimizes the diameter also maximizes the number of alternate paths, and should be selected in order to improve the reliability of the network

Reliability of Point-to-Point Networks

Not necessarily a regular structure - often more than one path between any two nodes
Path (or Terminal) Reliability - the probability that there exists an operational path between two specific nodes, given the probabilities of link failures
Example - calculating the path reliability for the source-destination pair N1 - N4

Path Reliability - Example I

Three paths from N1 to N4:
  P1 = {X1,2, X2,4}
  P2 = {X1,3, X3,4}
  P3 = {X1,2, X2,3, X3,4}
$p_{i,j}$ ($q_{i,j}$) - probability that link Xi,j is good (faulty)
Probability of node failure is incorporated into the failure probability of the outgoing links
Events {path Pi is operational} must be modified into an equivalent set of mutually exclusive events; otherwise some cases will be counted more than once
Mutually exclusive events:
  (I) P1 is up
  (II) P2 is up and P1 is down
  (III) P3 is up and both P1 and P2 are down
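The three mutually exclusive events can be turned into a sum and cross-checked by brute force. The sketch below assumes that links fail independently and that the network contains only the five links named in the three paths; the link names and function names are illustrative. Event (III) requires X1,3 and X2,4 to be down, because X1,2 and X3,4 are already up as part of P3.

```python
from itertools import product

LINKS = ["x12", "x24", "x13", "x34", "x23"]
PATHS = [{"x12", "x24"}, {"x13", "x34"}, {"x12", "x23", "x34"}]


def disjoint_sum(p):
    """Sum the probabilities of events (I), (II) and (III)."""
    term1 = p["x12"] * p["x24"]                                   # P1 up
    term2 = p["x13"] * p["x34"] * (1 - p["x12"] * p["x24"])       # P2 up, P1 down
    term3 = (p["x12"] * p["x23"] * p["x34"]
             * (1 - p["x13"]) * (1 - p["x24"]))                   # P3 up, P1 and P2 down
    return term1 + term2 + term3


def enumerate_states(p):
    """Add up the probability of every link state in which some path is fully up."""
    total = 0.0
    for state in product([True, False], repeat=len(LINKS)):
        up = {name for name, ok in zip(LINKS, state) if ok}
        prob = 1.0
        for name, ok in zip(LINKS, state):
            prob *= p[name] if ok else 1 - p[name]
        if any(path <= up for path in PATHS):
            total += prob
    return total


p = {"x12": 0.9, "x24": 0.8, "x13": 0.7, "x34": 0.95, "x23": 0.6}
print(disjoint_sum(p), enumerate_states(p))  # the two values should agree
```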

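Calculating Path Reliability - General

Path reliability of a network with m paths P1, ..., Pm from source Ns to destination Nd
Ei ($\bar{E}_i$) - event in which Pi is operational (faulty)
  $R_{N_s,N_d} = \mathrm{Prob}\{\text{operational path exists}\} = \mathrm{Prob}\{\bigcup_{i=1}^{m} E_i\}$
The m events are decomposed into mutually exclusive events
  $\bigcup_{i=1}^{m} E_i = E_1 \cup (E_2 \cap \bar{E}_1) \cup \cdots \cup (E_m \cap \bar{E}_1 \cap \cdots \cap \bar{E}_{m-1})$
and
  $R_{N_s,N_d} = \mathrm{Prob}\{E_1\} + \mathrm{Prob}\{E_2 \cap \bar{E}_1\} + \cdots + \mathrm{Prob}\{E_m \cap \bar{E}_1 \cap \cdots \cap \bar{E}_{m-1}\}$
or, using conditional probabilities,
  $R_{N_s,N_d} = \mathrm{Prob}\{E_1\} + \mathrm{Prob}\{E_2\}\,\mathrm{Prob}\{\bar{E}_1 \mid E_2\} + \cdots + \mathrm{Prob}\{E_m\}\,\mathrm{Prob}\{\bar{E}_1 \cap \cdots \cap \bar{E}_{m-1} \mid E_m\}$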

Calculating Conditional Probabilities

$\bar{E}_j \mid E_i$ - the event in which Pj is faulty given that Pi is operational
To identify the links which must fail so that $\bar{E}_j \mid E_i$ occurs, conditional sets are used
  $P_{j|i} = P_j - P_i$ = {links that are in Pj but not in Pi}
At least one link in Pj|i needs to fail

Path Reliability - Example II

This six-node network has 9 links - 6 unidirectional and 3 bidirectional
There are thirteen paths from N1 to N6
Paths are ordered from shortest to longest

Calculating Path Reliability for Example II

The first term in the reliability equation is
  $\mathrm{Prob}\{E_1\} = p_{1,3}\, p_{3,5}\, p_{5,6}$
To calculate the second term in the reliability equation, the conditional set P1|2 = {X1,3, X3,5} is used
At least one of the links in this set must fail so that P1 is faulty given that P2 is operational
The second term in the reliability equation is
  $p_{1,2}\, p_{2,5}\, p_{5,6}\,(1 - p_{1,3}\, p_{3,5})$

Example II - Cont.

For calculating the other terms in the sum, the intersection of several conditional sets must be considered
Calculating the fourth term - for P4, the conditional sets are
  P1|4 = {X5,6}
  P2|4 = {X1,2, X2,5, X5,6}
  P3|4 = {X1,2, X2,4}
P1|4 is included in P2|4 - if P1|4 is faulty, so is P2|4 - so P2|4 can be ignored
The fourth term in the reliability equation is
  $p_{1,3}\, p_{3,5}\, p_{4,5}\, p_{4,6}\,(1 - p_{5,6})(1 - p_{1,2}\, p_{2,4})$
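The conditional-set bookkeeping for the fourth term can be sketched with Python sets. The four paths below are not listed in the text; they are inferred from the terms and conditional sets shown (an assumption), and the helper names are illustrative. The product form is only applied when the remaining conditional sets share no links; otherwise the sketch raises an error, since such a term has to be split into disjoint cases by hand.

```python
from math import prod

# Assumption: link sets of the four shortest paths, inferred from the terms above.
PATHS = {
    1: frozenset({"x13", "x35", "x56"}),
    2: frozenset({"x12", "x25", "x56"}),
    3: frozenset({"x12", "x24", "x46"}),
    4: frozenset({"x13", "x35", "x45", "x46"}),
}


def conditional_set(j, i):
    """Links of path j that are not in path i (written Pj|i in the text)."""
    return PATHS[j] - PATHS[i]


def prune_supersets(sets):
    """Drop any set that strictly contains another; it fails whenever the smaller one does."""
    unique = set(sets)
    return [s for s in unique if not any(t < s for t in unique)]


def term(i, p):
    """Prob{Pi up and all earlier paths down}, when the product form applies."""
    cond = prune_supersets(conditional_set(j, i) for j in range(1, i))
    for a in cond:
        for b in cond:
            if a is not b and a & b:
                raise ValueError("conditional sets overlap; split into disjoint cases")
    path_up = prod(p[x] for x in PATHS[i])
    return path_up * prod(1 - prod(p[x] for x in s) for s in cond)


p = {name: 0.9 for path in PATHS.values() for name in path}
print(sorted(conditional_set(1, 4)), sorted(conditional_set(3, 4)))
print(term(4, p))
```

With these paths, `term(3, p)` raises the error because P1|3 and P2|3 both contain the link X5,6.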

Example II - Cont.

Calculating the third term: the event "both P1|3 and P2|3 are faulty" needs to be divided into disjoint cases:
  (I) X5,6 is faulty
  (II) X5,6 is operational and both X1,3 and X2,5 are faulty
  (III) X5,6 and X1,3 are up, and X3,5 and X2,5 are faulty
Resulting expression for the third term:
  $p_{1,2}\, p_{2,4}\, p_{4,6}\,[\,q_{5,6} + p_{5,6}\, q_{1,3}\, q_{2,5} + p_{5,6}\, p_{1,3}\, q_{3,5}\, q_{2,5}\,]$
Remaining terms are calculated similarly
Path reliability is the sum of all thirteen terms

Alternative Calculation of Path Reliability

A given number of components (links); each component can be up or down
We need to calculate the probability that certain combinations of the components are all up
In the last example - 9 links with 2^9 states
Probability of each state is obtained by multiplying 9 factors of the form $p_{i,j}$ or $q_{i,j}$
We add up the probabilities of all the states in which a path from node N1 to node N6 exists
The sum is the path reliability $R_{N_1,N_6}$
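State enumeration needs no list of paths, only a reachability test per state. Since the full link list of the six-node example is not reproduced in the text, the sketch below runs on the smaller four-node network of Example I, with its five links treated as directed (an assumption), and assumes independent link failures. The function names are illustrative.

```python
from itertools import product


def reachable(up_links, source, dest):
    """Depth-first search over the links that are up in this state."""
    frontier, seen = [source], {source}
    while frontier:
        node = frontier.pop()
        if node == dest:
            return True
        for u, v in up_links:
            if u == node and v not in seen:
                seen.add(v)
                frontier.append(v)
    return False


def path_reliability(links, p, source, dest):
    """Sum the probabilities of all 2**len(links) states that connect source to dest."""
    total = 0.0
    for state in product([True, False], repeat=len(links)):
        prob = 1.0
        up_links = []
        for link, is_up in zip(links, state):
            prob *= p[link] if is_up else 1.0 - p[link]
            if is_up:
                up_links.append(link)
        if reachable(up_links, source, dest):
            total += prob
    return total


links = [(1, 2), (2, 4), (1, 3), (3, 4), (2, 3)]
p = {link: 0.9 for link in links}
print(path_reliability(links, p, 1, 4))
```

The cost doubles with every added link, which is why this approach is only practical for small networks.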

Fault-Tolerant Routing

Objective: get a message from source to destination despite a subset of the network being faulty
Basic idea: if no shortest or most convenient path is available because of failures, reroute the message through other paths to the destination
Focus on unicast routing - a message is sent from a source to just one destination
Multicast - copies of a message are sent to a number of nodes - is an extension of the unicast problem

Classification of Routing Algorithms

Centralized routing
  A central controller knows the network state - faulty links or nodes, congested links - and selects a path for each message
  Variation: the source of the message specifies the route for that message
Distributed routing
  No central controller - each intermediate node decides to which node to send the message next
Unique routing
  One path for each source-destination pair
Adaptive routing
  Path selected according to network conditions (congestion)
Implementing fault tolerance in centralized routing
  A centralized router knows the state of each link
  Can use graph-theoretic algorithms to determine all paths
  Secondary considerations (load balancing, number of hops) can be used to select the path
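One graph-theoretic choice a centralized router could make is a breadth-first search restricted to fault-free links and nodes, which uses the number of hops as the secondary consideration. This is a minimal sketch, not the method prescribed by the text; the function name and the representation of faults as sets are assumptions.

```python
from collections import deque


def shortest_fault_free_path(adj, faulty_links, faulty_nodes, src, dst):
    """Return a minimum-hop list of nodes from src to dst that avoids all faults, or None."""
    if src in faulty_nodes or dst in faulty_nodes:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:   # walk the parent pointers back to src
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v in parent or v in faulty_nodes or (u, v) in faulty_links:
                continue
            parent[v] = u
            queue.append(v)
    return None


# Hypothetical four-node bidirectional loop used only for illustration.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(shortest_fault_free_path(ring, set(), set(), 0, 2))
print(shortest_fault_free_path(ring, {(0, 1)}, set(), 0, 2))
```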

Routing in Injured Hypercubes

Routing algorithm must be modified to route around the faulty nodes or links
Basic idea - list the dimensions along which the packet must travel, and traverse them one by one
  As edges are traversed, they are crossed off the list
  If, due to a link or a node failure, the desired link is not available - another edge in the list, if any, is chosen for traversal
  If the packet arrives at some node to find all dimensions on its list down - it backtracks to the previous node and tries again

Formal Routing Algorithm - Notations

TD - list of dimensions that the message has traveled on, in order of traversal
  TD^R - the same list in reversed order
$\bigoplus_{i=1}^{k}$ - exclusive-or operation carried out k times, sequentially
  Example - $\bigoplus_{i=1}^{3} a_i$ means $(a_1 \oplus a_2) \oplus a_3$
D - destination, S - source, d = D ⊕ S
  (⊕ - bitwise exclusive-or operation on corresponding bits of D and S)
SC(A) - set of nodes visited if we travel on each of the dimensions listed in set A
  Example - at node 0010, SC(1,3) = {0000, 1000}

Notations - Cont.

$e_n^i$ - n-bit vector consisting of a 1 in the i-th bit position and 0 everywhere else
  Example - $e_3^2 = 100$
Packets are assumed to consist of
  (I) d, where d = D ⊕ S
  (II) the message being transmitted (the payload)
  (III) the list of dimensions taken so far - TD
Append operation - appending x to the list TD adds x at the end of TD
transmit(j) - send the packet ($d \oplus e_n^j$, message, TD with j appended) along the j-th-dimensional link from the present node

Routing Algorithm for Injured Hypercubes


Example of Routing in H3

H3 with faulty node 011; node 000 wants to send a packet to 111
At 000, d = 111 - sends the message out on dimension 0, to node 001
At 001, d = 110 and TD = (0) - attempts to use the dimension-1 edge - impossible
Bit 2 of d is also 1 - checks and finds that the dimension-2 edge to 101 is available - message is sent to 101 and then to 111
Exercise - what if both 011 and 101 are down?
[Figure: H3 with its dimension-0, dimension-1 and dimension-2 edges marked]
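The informal idea (try the dimensions on the list in turn, backtrack at a dead end) can be sketched as a depth-first search. This is a simplified illustration and not the formal algorithm from the slides: it handles node faults only, tries dimensions in increasing order, and never detours along a dimension in which the current node already agrees with the destination, so it can fail in cases where a longer route exists. The function name `route` is illustrative.

```python
def route(src, dst, faulty, n):
    """Return a fault-free list of nodes from src to dst, or None if the search fails."""
    if src in faulty or dst in faulty:
        return None

    def visit(node):
        if node == dst:
            return [node]
        d = node ^ dst                  # bits set in d are dimensions still to cross
        for j in range(n):
            if not (d >> j) & 1:
                continue
            nxt = node ^ (1 << j)       # neighbor across dimension j
            if nxt in faulty:
                continue                # desired link unavailable, try the next one
            rest = visit(nxt)
            if rest is not None:
                return [node] + rest
        return None                     # every listed dimension failed: backtrack

    return visit(src)


# H3 with faulty node 011, routing from 000 to 111
print([format(v, "03b") for v in route(0b000, 0b111, {0b011}, 3)])
# The exercise: both 011 and 101 are down
print([format(v, "03b") for v in route(0b000, 0b111, {0b011, 0b101}, 3)])
```

In the second call the search backtracks from 001 to 000 and then leaves along dimension 1, which is one way to approach the exercise.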

Origin-Based Routing in Mesh

Depth-first strategy:
  No advance information about faulty nodes
  Backtracks if it arrives at a dead end
If faulty regions are known in advance, no backtracking is necessary
Topology - two-dimensional N×N mesh with at most N-1 failures
  Can be extended to meshes of dimension 3 or higher and to more than N-1 failures
Assumptions:
  All faulty regions are square; if not, additional nodes are declared to have pseudo faults
  Each node knows the distance along each direction to the nearest faulty region in that direction
  One node is defined as the origin
  Origin chosen so that its row and column do not have any faulty nodes - possible since there are no more than N-1 failures

Origin-Based Routing Algorithm - Definitions

Sending a message from node S to node D
Definitions:
  IN-path - edges that take the message closer to the origin
  OUT-path - edges that take the message farther away from the origin (either may be empty)
  Outbox - the smallest rectangular region that contains both the origin and the destination
  V is a safe node with respect to D and a set of faulty nodes F if
    V is in the outbox for D
    there exists a fault-free OUT-path from V to D
  Diagonal band for D - all nodes V in the outbox such that …
Once we get to a safe node, there exists an OUT-path from that node to D
Each step along an OUT-path increases the distance to the origin - the message cannot be traveling forever in circles
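The safe-node definition can be checked mechanically on a small mesh. The sketch below makes several simplifying assumptions that are not in the text: the origin is at (0, 0), the destination has non-negative coordinates, and an OUT step increases exactly one coordinate by 1, so an OUT-path inside the outbox is a monotone path toward D. The names `make_safety_check` and `is_safe` are illustrative.

```python
from functools import lru_cache


def make_safety_check(dest, faulty):
    """Build is_safe(node) for a destination and a set of faulty (x, y) nodes."""
    dx, dy = dest
    faulty = frozenset(faulty)

    @lru_cache(maxsize=None)
    def has_out_path(x, y):
        # Fault-free path that only moves away from the origin and ends at dest.
        if (x, y) in faulty or x > dx or y > dy:
            return False
        if (x, y) == (dx, dy):
            return True
        return has_out_path(x + 1, y) or has_out_path(x, y + 1)

    def is_safe(node):
        x, y = node
        in_outbox = 0 <= x <= dx and 0 <= y <= dy
        return in_outbox and has_out_path(x, y)

    return is_safe


# Hypothetical layout: destination (3, 1) with a 2x2 faulty square next to the origin.
is_safe = make_safety_check((3, 1), {(1, 0), (1, 1), (2, 0), (2, 1)})
print(is_safe((0, 0)), is_safe((3, 0)))
```

Here (0, 0) is not safe because every monotone route to (3, 1) runs into the faulty square, while (3, 0) is safe.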

Origin-Based Routing Algorithm

Three phases:
  Phase 1: The message is routed on an IN-path until it reaches the outbox, at node U
  Phase 2: Compute the distance from U to the nearest safe node and compare it to the distance to the nearest faulty region in that direction
    If the safe node is closer than the fault, route to the safe node; otherwise, continue to route on the IN links
  Phase 3: Once the message is at a safe node U, if there is a safe nonfaulty neighbor V that is closer to the destination, send it to V; otherwise, U must be on the edge of a faulty region
    In such a case, move the message along the edge of the faulty region toward the destination D, and turn toward the diagonal band when it arrives at the corner of the faulty square

Origin-Based Routing Example


Routing a message from node S at the northwest end of the network to D
The message first moves along the IN links, getting closer to the origin
It enters the outbox at node A
Since there is a failure directly east of A, it continues on the IN links until it reaches the origin
Then it continues, skirting the edge of the faulty region until it reaches node B
At this point, it recognizes the existence of a safe node immediately to the north and sends the message through this node to the destination
