Fault Tolerant Systems: Part 13 - Networks - 3 Chapter 4: Network Fault Tolerance
http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems
Part.13.1
Hypercube: multiple paths between nodes and a low diameter.
Cost: a high node degree (n ports per node); a new node design is required when the size of the network increases.
Cube-Connected Cycles (CCC) Network: keeps the degree of a node fixed at 3 or less (for any n).
Example: a CCC network that corresponds to the H3 hypercube - each node of degree 3 in H3 is replaced by a cycle consisting of 3 nodes.
Part.13.2 Copyright 2007 Koren & Krishna, Morgan-Kaufman
Part.13.3
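Nodes (i,j) and (i',j') of the CCC, where i is the label of the hypercube node and j is the position within the cycle that replaces it, are connected by an edge in the CCC if and only if either i = i' and j - j' = ±1 mod n (link along the cycle), or j = j' and i differs from i' in precisely the jth bit (dimension-j edge in the hypercube).
Example: nodes 0 and 2 in H3 are connected through a dimension-1 edge, which corresponds to the CCC edge connecting nodes (0,1) and (2,1).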
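The adjacency rule above can be checked with a small sketch. It assumes each CCC node is encoded as a pair (i, j), with i an n-bit integer label of the hypercube node and j in 0..n-1 its position in the cycle; the helper name `ccc_neighbors` is hypothetical.

```python
def ccc_neighbors(node, n):
    """Neighbors of CCC node (i, j): i is an n-bit hypercube label, j a cycle position."""
    i, j = node
    return {
        (i, (j + 1) % n),   # next node along the cycle that replaces hypercube node i
        (i, (j - 1) % n),   # previous node along the same cycle
        (i ^ (1 << j), j),  # dimension-j hypercube edge: flip bit j of i
    }

n = 3
nodes = [(i, j) for i in range(2 ** n) for j in range(n)]
# For n >= 3 the two cycle neighbors are distinct, so every node has degree 3.
degrees = {len(ccc_neighbors(v, n)) for v in nodes}
print(len(nodes), degrees)                 # prints: 24 {3}
# The H3 dimension-1 edge 0-2 becomes the CCC edge (0,1)-(2,1).
print((2, 1) in ccc_neighbors((0, 1), n))  # prints: True
```

The neighbor function is symmetric (each cycle link and each flipped bit can be undone), so it describes an undirected graph.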
Part.13.4
Hypercube vs. CCC
CCC has a lower node degree than the hypercube.
CCC has a higher diameter than the hypercube: the hypercube has a diameter of n, while the diameter of the CCC(n,n) is larger.
Routing is more complicated than in the hypercube.
Fault tolerance of the CCC is higher: the failure of a single node in the CCC is similar to a single faulty link in the hypercube.
There is no closed-form expression for the reliability of the CCC.
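A brute-force BFS sketch for comparing the two diameters numerically on small networks. It encodes CCC nodes as pairs (i, j) following the adjacency rule stated earlier; the function names are illustrative, and enumerating small n is a sanity check rather than a derivation of a closed-form diameter.

```python
from collections import deque

def diameter(nodes, neighbors):
    """Largest BFS distance over all (source, destination) pairs of a connected graph."""
    best = 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def hypercube_diameter(n):
    # H_n: node i is linked to every label that differs from i in exactly one bit.
    return diameter(range(2 ** n), lambda i: [i ^ (1 << k) for k in range(n)])

def ccc_diameter(n):
    # CCC: two cycle links plus the dimension-j hypercube link of node (i, j).
    nodes = [(i, j) for i in range(2 ** n) for j in range(n)]
    return diameter(nodes, lambda v: [(v[0], (v[1] + 1) % n),
                                      (v[0], (v[1] - 1) % n),
                                      (v[0] ^ (1 << v[1]), v[1])])

for n in range(3, 6):
    print(n, hypercube_diameter(n), ccc_diameter(n))
```

For n = 3 the CCC has 24 nodes and diameter 6, twice the diameter of H3.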
Part.13.5
Loop Networks
The cycle topology (also called a loop network) that is replicated in the CCC network can serve as an interconnection network on its own.
Advantages: a simple routing algorithm and a small node degree.
Disadvantages: an n-node unidirectional loop has a diameter of n-1, with an average of n/2 intermediate forwarding nodes per message; in addition, a unidirectional loop network is not fault tolerant, since a single node or link failure disconnects the network.
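A minimal sketch of these properties for a unidirectional loop with links i -> (i+1) mod n. The node count n = 8 and the failed link are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def loop_distance(src, dst, n):
    """Hops from src to dst in an n-node unidirectional loop (links i -> i+1 mod n)."""
    return (dst - src) % n

def reachable(src, n, failed_link=None):
    """Nodes reachable from src when the single link failed_link = (i, i+1 mod n) is down."""
    seen, node = {src}, src
    while True:
        nxt = (node + 1) % n
        if (node, nxt) == failed_link or nxt in seen:
            return seen
        seen.add(nxt)
        node = nxt

n = 8
dists = [loop_distance(0, d, n) for d in range(1, n)]
print(max(dists))                                     # diameter: n - 1 = 7
print(sum(dists) / len(dists))                        # average distance in hops: n / 2 = 4.0
print(sorted(reachable(0, n, failed_link=(3, 4))))    # [0, 1, 2, 3]: nodes 4..7 are cut off
```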
Part.13.6
Chordal Networks
Chordal networks reduce the diameter and improve the fault tolerance of a unidirectional loop network by adding extra links, called chords.
Each node has an additional backward link connecting it to a node at a distance s, called the skip distance.
Node i (0 ≤ i ≤ n-1) has a forward link to node (i+1) mod n and a backward link to node (i-s) mod n.
The degree of every node is 4 (two incoming and two outgoing links) for any value of n.
Example: a 15-node chordal network with a skip distance of 3.
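The link structure described above can be sketched as a set of directed pairs. This assumes 1 ≤ s ≤ n-2 (for s = n-1 the chord would coincide with the forward link); the helper names are hypothetical.

```python
from collections import deque

def chordal_links(n, s):
    """Directed links of an n-node chordal network with skip distance s (1 <= s <= n-2)."""
    links = set()
    for i in range(n):
        links.add((i, (i + 1) % n))   # forward loop link
        links.add((i, (i - s) % n))   # backward chord of length s
    return links

def reachable(src, links):
    """Nodes reachable from src over the given directed links (BFS)."""
    succ = {}
    for u, v in links:
        succ.setdefault(u, []).append(v)
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

n, s = 15, 3
links = chordal_links(n, s)
out_deg = {sum(1 for u, _ in links if u == i) for i in range(n)}
in_deg = {sum(1 for _, v in links if v == i) for i in range(n)}
print(out_deg, in_deg)                          # {2} {2}: degree 4 per node
# Unlike the plain loop, losing the forward link 3 -> 4 still lets node 0 reach every node.
print(len(reachable(0, links - {(3, 4)})))      # 15
```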
Part.13.7
Minimizing Diameter of Chordal Networks
The choice of s affects the diameter D; we look for the value of s that minimizes D.
Diameter: the longest distance from a source to a destination; it depends on the routing.
Assumption: routing uses backward chords as long as this is advantageous.
b: number of backward chords used; bs: number of nodes skipped by these chords.
b': maximum value of b, satisfying $b's + b' \le n$.
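The effect of s can be explored by brute force. This sketch computes BFS shortest-path distances, not the specific "backward chords while advantageous" routing rule, so it gives the graph diameter as a reference point; function names are hypothetical. Since the network is a circulant digraph (rotating all labels by one is a symmetry), distances from node 0 suffice.

```python
from collections import deque

def chordal_diameter(n, s):
    """Diameter of the chordal network (links i -> i+1 and i -> i-s, mod n) via BFS from node 0."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in ((u + 1) % n, (u - s) % n):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def best_skip(n):
    """Smallest skip distance s in 1..n-1 that minimizes the diameter."""
    return min(range(1, n), key=lambda s: chordal_diameter(n, s))

print(chordal_diameter(15, 3))   # 5
print(best_skip(15))             # 3
```

For n = 15, other skip distances can tie (s = 11 also yields a diameter of 5), which is why the number of alternate paths is a useful secondary criterion.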
Part.13.8
minimizes D
Part.13.9
yields the same number of paths.
Conclusion: in most cases, the value of s that minimizes the diameter also maximizes the number of alternate paths, and it should be selected in order to improve the reliability of the network.
Part.13.11
more than one path between any two nodes.
Path (or Terminal) Reliability: the probability that there exists an operational path between two specific nodes, given the probabilities of link failures.
Three paths from N1 to N4: P1 = {X1,2, X2,4}, P2 = {X1,3, X3,4}, P3 = {X1,2, X2,3, X3,4}.
Pi,j (qi,j): probability that link Xi,j is good (faulty).
The probability of a node failure is incorporated into the failure probability of its outgoing links.
The events {path Pi is operational} must be modified to an equivalent set of mutually exclusive events; otherwise, some cases will be counted more than once.
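For the three paths from N1 to N4, the mutually exclusive events are: P1 is up; P2 is up and P1 is down; P3 is up and both P1 and P2 are down. Given that P3 is up, P1 is down only if X2,4 is faulty, and P2 is down only if X1,3 is faulty, so
$$R_{N_1N_4} = P_{1,2}P_{2,4} + P_{1,3}P_{3,4}\,(1 - P_{1,2}P_{2,4}) + P_{1,2}P_{2,3}P_{3,4}\,q_{1,3}\,q_{2,4}$$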
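A sketch that checks the mutually exclusive decomposition for the N1-to-N4 example against brute-force enumeration of all link states. Links are treated as directed (so these three paths are the only ones), and the link reliabilities are made-up illustrative values.

```python
from itertools import product
from math import prod

# Illustrative link reliabilities P_{i,j}; q_{i,j} = 1 - P_{i,j}.
p = {"12": 0.9, "13": 0.8, "23": 0.95, "24": 0.85, "34": 0.9}
q = {link: 1 - v for link, v in p.items()}
paths = [("12", "24"), ("13", "34"), ("12", "23", "34")]   # P1, P2, P3

# Sum of the three mutually exclusive events.
r_formula = (p["12"] * p["24"]
             + p["13"] * p["34"] * (1 - p["12"] * p["24"])
             + p["12"] * p["23"] * p["34"] * q["13"] * q["24"])

# Brute force: add the probability of every link state in which some path is fully up.
links = sorted(p)
r_brute = 0.0
for state in product([True, False], repeat=len(links)):
    up = dict(zip(links, state))
    weight = prod(p[l] if up[l] else q[l] for l in links)
    if any(all(up[l] for l in path) for path in paths):
        r_brute += weight

print(round(r_formula, 6), round(r_brute, 6))   # the two values agree
```

Simply adding the three path probabilities would overcount the states in which two or three paths are up simultaneously.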
Part.13.13
Calculating Path Reliability - General
Path reliability of a network with m paths P1, ..., Pm from source Ns to destination Nd:
$E_i$ ($\bar{E}_i$) - event in which Pi is operational (faulty)
$$R_{N_sN_d} = \text{Prob}\{\text{operational path exists}\} = \text{Prob}\Big\{\bigcup_{i=1}^{m} E_i\Big\}$$
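One way to write the union as a sum of mutually exclusive events, consistent with the examples in this part, is sketched below. It uses the conditional set $P_{j|i}$ (the links of $P_j$ that are not in $P_i$): given that $P_i$ is up, $P_j$ is down exactly when some link of $P_{j|i}$ has failed, and those links are independent of the links of $P_i$.

```latex
R_{N_sN_d} = \sum_{i=1}^{m} \text{Prob}\Big\{E_i \cap \bigcap_{j<i} \bar{E}_j\Big\}
           = \sum_{i=1}^{m} \Big(\prod_{X_{k,l} \in P_i} P_{k,l}\Big)\,
             \text{Prob}\{\text{each } P_{j|i},\ j<i, \text{ contains a faulty link}\}
```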
Part.13.14
Pj is
Pi is
Ej,
Part.13.15
At least one of the links in the conditional set P1|2 = {X1,3, X3,5} must fail so that P1 is faulty while P2 is operational. The second term in the probability equation is therefore P1,2 P2,5 P5,6 (1- P1,3 P3,5).
Part.13.17
Example II - Cont.
For calculating the other terms in the sum, the intersection of several conditional sets must be considered.
Calculating the fourth term - expression for P4: the conditional sets are P1|4 = {X5,6}, P2|4 = {X1,2, X2,5, X5,6}, and P3|4 = {X1,2, X2,4}.
Since P1|4 is contained in P2|4, if P1|4 is faulty, so is P2|4; therefore P2|4 can be ignored. The fourth term in the reliability equation is P1,3 P3,5 P4,5 P4,6 (1- P5,6) (1- P1,2 P2,4).
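A sketch that computes individual terms from conditional sets, handling overlapping sets by inclusion-exclusion instead of by hand. The link sets of P1 to P4 are inferred from the conditional sets and terms quoted above (the full example has thirteen paths, which are not reproduced here), and the link reliabilities are arbitrary illustrative values; function names are hypothetical.

```python
from itertools import combinations
from math import prod

def all_conditional_sets_fail(cond_sets, p):
    """Probability that each set in cond_sets contains at least one faulty link.
    Inclusion-exclusion: sum over subsets S of (-1)^|S| * Prob{all links in the union of S are up}."""
    total = 0.0
    for r in range(len(cond_sets) + 1):
        for chosen in combinations(cond_sets, r):
            links = set().union(*chosen)
            total += (-1) ** r * prod(p[x] for x in links)
    return total

def term(i, paths, p):
    """Term i (0-based): Prob{path i is up and all earlier paths are down}."""
    pi = set(paths[i])
    cond = [set(pj) - pi for pj in paths[:i]]   # conditional sets P_{j|i}
    return prod(p[x] for x in pi) * all_conditional_sets_fail(cond, p)

# First four paths of Example II (link sets inferred from the quoted terms).
paths = [("13", "35", "56"),          # P1
         ("12", "25", "56"),          # P2
         ("12", "24", "46"),          # P3
         ("13", "35", "45", "46")]    # P4
# Illustrative link reliabilities P_{i,j}.
p = {"12": 0.9, "13": 0.85, "24": 0.8, "25": 0.9,
     "35": 0.95, "45": 0.9, "46": 0.85, "56": 0.8}

second = p["12"] * p["25"] * p["56"] * (1 - p["13"] * p["35"])
fourth = p["13"] * p["35"] * p["45"] * p["46"] * (1 - p["56"]) * (1 - p["12"] * p["24"])
print(abs(term(1, paths, p) - second) < 1e-12)   # True
print(abs(term(3, paths, p) - fourth) < 1e-12)   # True
```

Inclusion-exclusion grows exponentially with the number of earlier paths, so it suits small examples like this one; the hand simplification (dropping supersets such as P2|4) is what keeps the slide computation short.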
Part.13.18
Example II Cont.
Calculating the third term:
The remaining terms are calculated similarly. The path reliability is the sum of all thirteen terms.
Part.13.19
Fault-Tolerant Routing
Objective: get a message from the source to the destination despite a subset of the network being faulty.
Basic idea: if no shortest or most convenient path is available because of failures, reroute the message through other paths to the destination.
Focus on unicast routing - a message is sent from a source to just one destination.
Multicast - copies of a message are sent to a number of nodes - is an extension of the unicast problem.
Part.13.21
One path for each source-destination pair.
Adaptive routing: the path is selected according to network conditions (e.g., congestion).
Implementing fault tolerance in centralized routing: a centralized router that knows the state of each link can use graph-theoretic algorithms to determine all paths; secondary considerations (load balancing, number of hops) can be used to select the path.
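A minimal sketch of what such a centralized router might do, assuming it knows the set of faulty links and uses the number of hops as the selection criterion (one of the secondary considerations mentioned). The 6-node network and the function name are hypothetical, and links are treated as undirected.

```python
from collections import deque

def shortest_path(adj, src, dst, faulty_links=frozenset()):
    """BFS over fault-free links; returns a min-hop path or None if dst is unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:            # walk parent pointers back to src
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and (u, v) not in faulty_links and (v, u) not in faulty_links:
                parent[v] = u
                queue.append(v)
    return None

# Hypothetical 6-node network (undirected links listed per node).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
print(shortest_path(adj, 0, 5))                          # [0, 1, 3, 5]
print(shortest_path(adj, 0, 5, faulty_links={(1, 3)}))   # [0, 2, 4, 5]
```

Load balancing could be added by replacing BFS with a weighted shortest-path search whose link weights reflect current traffic.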
Part.13.22
Part.13.23
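$\bigoplus_{k=1}^{i} a_k$ - the exclusive-or of $a_1, \ldots, a_i$, evaluated from left to right.
Example: $\bigoplus_{i=1}^{3} a_i$ means $(a_1 \oplus a_2) \oplus a_3$.
($\oplus$ - bitwise exclusive-or operation on corresponding bits of D and S)
SC(A) - set of nodes visited if we travel on each of the dimensions listed in set A.
Example: at node 0010, SC(1,3) = {0000, 1000}.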
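The notation maps directly onto integer bit operations. In this sketch hypercube nodes are integers with bit 0 as the least significant bit; the helper `sc` is a hypothetical name, and it returns a list (rather than a set) to keep the visiting order.

```python
from functools import reduce
from operator import xor

def sc(node, dims):
    """Nodes visited, in order, when traveling from node along each listed dimension."""
    visited = []
    for j in dims:
        node ^= 1 << j              # crossing a dimension-j link flips bit j
        visited.append(node)
    return visited

print([format(v, "04b") for v in sc(0b0010, [1, 3])])   # ['0000', '1000']

# XOR-fold: reduce applies xor left to right, i.e. (a1 xor a2) xor a3.
a = [0b0110, 0b0011, 0b1111]
print(format(reduce(xor, a), "04b"))                    # 1010

# d = D xor S has a 1 in every dimension still to be traversed.
S, D = 0b0010, 0b1011
print(format(D ^ S, "04b"))                             # 1001
```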
Part.13.24
Notations - Cont.
$e^n_i$ - n-bit vector consisting of a 1 in the i-th bit position and 0 everywhere else.
Example: $e^3_2 = 100$.
Packets are assumed to consist of:
(I) d, where $d = D \oplus S$;
(II) the message being transmitted (the payload);
(III) the list of dimensions taken so far, TD.
Append operation: $TD \circ x$ - append x to the list TD.
transmit(j): send the packet $(d \oplus e^n_j,\ \text{message},\ TD \circ j)$ along the j-th-dimensional link from the present node.
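A sketch of the packet format and the transmit step, assuming integer node labels and a tuple for TD. The class and function names are hypothetical; `transmit` here also returns the neighbor the packet arrives at, which the slide's transmit(j) leaves implicit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    d: int            # d = D xor S: dimensions still to be traversed
    message: str      # the payload
    td: tuple = ()    # TD: dimensions taken so far

def e(j):
    """e_j: an integer with a 1 in bit position j and 0 elsewhere."""
    return 1 << j

def transmit(packet, j, node):
    """Send the packet along the j-th-dimensional link of node; returns (next node, new packet)."""
    return node ^ e(j), Packet(packet.d ^ e(j), packet.message, packet.td + (j,))

S, D = 0b000, 0b111
pkt = Packet(d=D ^ S, message="hello")
node, pkt = transmit(pkt, 0, S)
print(format(node, "03b"), format(pkt.d, "03b"), pkt.td)   # 001 110 (0,)
```

Making the packet immutable means each hop produces a new (d, message, TD) triple, mirroring the tuple notation above.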
Part.13.25
Part.13.26
Example of Routing in H3
H3 with faulty node 011: node 000 wants to send a packet to 111.
At 000, d = 111: the message is sent out on dimension 0, to node 001.
At 001, d = 110 and TD = (0): the dimension-1 edge is attempted, but this is impossible because node 011 is faulty.
Bit 2 of d is also 1: the node checks and finds that the dimension-2 edge to 101 is available, so the message is sent to 101 and then to 111.
Exercise: what if both 011 and 101 are down?
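A depth-first sketch that reproduces this example. It is a simplification, not the exact algorithm of the slides: at each node it first tries the dimensions where d has a 1 (lowest first), then spare dimensions with a 0 in d that are not yet in TD, and it backtracks when every option leads to a faulty or already visited node. Faulty nodes are given as a set of integer labels; the function name is hypothetical.

```python
def route(src, dst, n, faulty):
    """Depth-first fault-tolerant routing sketch in H_n; returns the node sequence or None."""
    visited = {src}

    def forward(node, td):
        if node == dst:
            return [node]
        d = node ^ dst
        preferred = [j for j in range(n) if (d >> j) & 1]                   # bits still to correct
        spare = [j for j in range(n) if not (d >> j) & 1 and j not in td]   # possible detours
        for j in preferred + spare:
            nxt = node ^ (1 << j)
            if nxt in faulty or nxt in visited:
                continue
            visited.add(nxt)
            rest = forward(nxt, td + (j,))
            if rest is not None:
                return [node] + rest
        return None   # dead end: the caller backtracks and tries its next dimension

    return forward(src, ())

path = route(0b000, 0b111, 3, faulty={0b011})
print([format(v, "03b") for v in path])   # ['000', '001', '101', '111']
```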
Part.13.27
If faulty regions are known in advance, no backtracking is necessary.
Topology: a two-dimensional N×N mesh; two cases are considered - at most N-1 failures, and more than N-1 failures.
Part.13.28
Sending a message from node S to node D.
Definitions:
IN-path: edges that take the message closer to the origin; OUT-path: edges that take the message farther away from the origin (either may be empty).
Outbox: the smallest rectangular region that contains both the origin and the destination.
V is a safe node with respect to D and a set of faulty nodes F if V is in the outbox for D and there exists a fault-free OUT-path from V to D.
Diagonal band for D: all nodes V in the outbox such that … that node to D.
Each step along an OUT-path increases the distance to the origin, so the message cannot travel forever in circles.
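A sketch of the safe-node test. It assumes the origin is the mesh corner (0, 0) and the destination is D = (dx, dy) with nonnegative coordinates, so the outbox is [0, dx] × [0, dy] and an OUT-path step is +1 in x or in y; the function name and the example fault set are hypothetical.

```python
from functools import lru_cache

def is_safe(v, dst, faulty):
    """True if v is in the outbox of dst and a fault-free OUT-path leads from v to dst."""
    dx, dy = dst

    @lru_cache(maxsize=None)
    def out_path_exists(x, y):
        if (x, y) in faulty:
            return False
        if (x, y) == (dx, dy):
            return True
        # OUT-path steps move away from the origin: +1 in x or +1 in y, staying in the outbox.
        return (x < dx and out_path_exists(x + 1, y)) or (y < dy and out_path_exists(x, y + 1))

    x, y = v
    return 0 <= x <= dx and 0 <= y <= dy and out_path_exists(x, y)

faulty = {(1, 2), (2, 1)}
print(is_safe((0, 0), (3, 3), faulty))   # True
print(is_safe((1, 1), (3, 3), faulty))   # False: both OUT-neighbors are faulty
print(is_safe((4, 0), (3, 3), faulty))   # False: outside the outbox
```

Because every OUT-path step strictly increases x + y, the recursion always terminates, which is the same argument the slide uses to rule out messages circling forever.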
Part.13.29
Part.13.31