Distributed Algorithm
The behavior of a distributed algorithm is captured by a transition system, consisting of:
a set C of configurations;
a binary transition relation → on C; and
a set I of initial configurations.
An execution is a sequence γ_0 γ_1 γ_2 ⋯ of configurations with γ_0 ∈ I and γ_i → γ_{i+1} for i ≥ 0.
δ is reachable from γ if there is a sequence γ = γ_0 → γ_1 → ⋯ → γ_k = δ with γ_i → γ_{i+1} for 0 ≤ i < k.
δ is reachable if it is reachable from a γ ∈ I.
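The reachability definition is easy to animate. Below is a minimal Python sketch (ours, not part of the course material), assuming configurations are hashable and the transition relation is given as a successor function:

    from collections import deque

    def reachable(initial, successors):
        """Set of configurations reachable from `initial`,
        where successors(c) yields all d with c -> d."""
        seen = set(initial)
        queue = deque(initial)
        while queue:
            c = queue.popleft()
            for d in successors(c):
                if d not in seen:
                    seen.add(d)
                    queue.append(d)
        return seen

    # Toy transition system: configurations 0, 1, 2, ... with c -> c+1 for c < 3.
    print(reachable({0}, lambda c: [c + 1] if c < 3 else []))   # {0, 1, 2, 3}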
States and Events
The configuration of a distributed algorithm is composed from the states at its processes.
A transition is associated with an event (or, in case of synchronous communication, two events) at one (or two) of its processes.
Local Algorithm at a Process
For simplicity we assume that different channels carry different messages.
The local algorithm at a process consists of:
a set Z of states;
a set I of initial states;
a relation ⊢i of internal events (c, d);
a relation ⊢s of send events (c, m, d); and
a relation ⊢r of receive events (c, m, d).
A process is an initiator if its first event is an internal or send event.
Asynchronous Communication
Let p = (Z_p, I_p, ⊢i_p, ⊢s_p, ⊢r_p) for processes p.
Consider an asynchronous distributed algorithm (p_1, ..., p_N).
C = Z_{p_1} × ⋯ × Z_{p_N} × M(𝓜)   (𝓜 is the set of messages; M(𝓜) the multisets over 𝓜)
I = I_{p_1} × ⋯ × I_{p_N}
(c_1, ..., c_j, ..., c_N, M) → (c_1, ..., d_j, ..., c_N, M) if (c_j, d_j) ∈ ⊢i_{p_j}
(c_1, ..., c_j, ..., c_N, M) → (c_1, ..., d_j, ..., c_N, M ∪ {m}) if (c_j, m, d_j) ∈ ⊢s_{p_j}
(c_1, ..., c_j, ..., c_N, M) → (c_1, ..., d_j, ..., c_N, M \ {m}) if (c_j, m, d_j) ∈ ⊢r_{p_j} and m ∈ M
Synchronous Communication
Consider a synchronous distributed algorithm (p_1, ..., p_N).
C = Z_{p_1} × ⋯ × Z_{p_N}
I = I_{p_1} × ⋯ × I_{p_N}
(c_1, ..., c_j, ..., c_N) → (c_1, ..., d_j, ..., c_N) if (c_j, d_j) ∈ ⊢i_{p_j}
(c_1, ..., c_j, ..., c_k, ..., c_N) → (c_1, ..., d_j, ..., d_k, ..., c_N) if (c_j, m, d_j) ∈ ⊢s_{p_j} and (c_k, m, d_k) ∈ ⊢r_{p_k} for some m ∈ 𝓜
(c_1, ..., c_j, ..., c_k, ..., c_N) → (c_1, ..., d_j, ..., d_k, ..., c_N) if (c_j, m, d_j) ∈ ⊢r_{p_j} and (c_k, m, d_k) ∈ ⊢s_{p_k} for some m ∈ 𝓜
Assertions
An assertion is a predicate on the set of configurations of an algorithm.
An assertion is a safety property if it is true in each configuration of each execution of the algorithm.
An assertion is a liveness property if it is true in some configuration of each execution of the algorithm.
Invariants
Assertion P is an invariant if:
P(γ) for all γ ∈ I; and
if γ → δ and P(γ), then P(δ).
Each invariant is a safety property.
Clocks
A clock maps occurrences of events in a computation to a partially ordered set such that a ≺ b ⇒ Θ(a) < Θ(b).
Lamport's logical clock: if k is the clock value of the previous event at the same process (or 0 if there is none), then an internal or send event a gets LC(a) = k + 1, and a receive event a of a message sent at event b gets LC(a) = max{k, LC(b)} + 1.
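A minimal sketch of Lamport's update rule in Python (class and method names are ours):

    class LamportClock:
        def __init__(self):
            self.time = 0                      # clock value of the previous event

        def internal(self):
            self.time += 1                     # LC(a) = k + 1
            return self.time

        def send(self):
            self.time += 1
            return self.time                   # time stamp carried by the message

        def receive(self, msg_time):
            self.time = max(self.time, msg_time) + 1   # LC(a) = max{k, LC(b)} + 1
            return self.time

    p, q = LamportClock(), LamportClock()
    t = p.send()                               # p sends at time 1
    q.internal()                               # q is at time 1
    print(q.receive(t))                        # 2: the receive is ordered after the send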
Complexity Measures
Resource consumption of distributed algorithms can be computed in several ways.
Message complexity: Total number of messages exchanged by the algorithm.
Bit complexity: Total number of bits exchanged by the algorithm. (Only interesting when messages are very long.)
Time complexity: Amount of time consumed by the algorithm. (We assume: (1) event processing takes no time, and (2) a message is received at most one time unit after it is sent.)
Space complexity: Amount of space needed for the processes in the algorithm.
Different computations may give rise to different consumption of resources. We consider worst- and average-case complexity (the latter with a probability distribution over all computations).
Assumptions
Unless stated otherwise, we assume:
asynchronous communication;
Termination: C is finite;
Awerbuch's Algorithm
A node holding the token for the first time informs all neighbors except its father (and the node to which it will forward the token).
The token is only forwarded to nodes that were not yet visited by the token (except when a node sends the token to its father).
Awerbuch's Algorithm - Complexity
Message complexity: < 4|E| messages
Frond edges carry 2 information and 2 acknowledgement messages.
Tree edges carry 2 forwarded tokens, and possibly 1 information/acknowledgement pair.
Time complexity: ≤ 4|V| − 2 time units
Tree edges carry 2 forwarded tokens.
Nodes wait at most 2 time units before forwarding the token.
Question
Are the acknowledgements in Awerbuch's algorithm really needed?
Cidon's Algorithm
Abolishes the acknowledgements of Awerbuch's algorithm.
Dijkstra-Scholten Algorithm
Requires (1) a centralized basic algorithm, and (2) an undirected graph.
A tree T is maintained, in which the initiator is the root, and all active processes are nodes of T. Initially, T consists of the initiator.
sc_p estimates (from above) the number of children of process p in T.
When the initiator p_0 is passive and sc_{p_0} = 0, it calls Announce.
Question
Any suggestions to make the Dijkstra-Scholten algorithm
decentralized?
Shavit-Francez Algorithm
Allows a decentralized basic algorithm; requires an undirected graph.
A forest F of trees is maintained, rooted in the initiators. Each process is in at most one tree of F.
Initially, each initiator constitutes a tree in F.
When p_0 is passive, it sends a white token.
Tree Election Algorithm
Let G be an undirected, acyclic graph.
The tree election algorithm starts with a wake-up phase, driven by the initiators.
The local algorithm at an awake node u:
u waits until it has received ids from all neighbors except one, which becomes its father Nb_u;
u computes min_u, the smallest id among its own id and the ids it received;
u sends min_u to Nb_u;
when u receives an id v from Nb_u, it computes min'_u, being the minimum of min_u and v;
if min'_u = u, then u becomes the leader;
u sends min'_u to all neighbors except Nb_u.
Message complexity: 2|V| − 2 messages
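A sequential sketch of this election in Python (illustrative only: the message passing is collapsed into a work list, and the function name is ours):

    def tree_election(adj):
        """adj: node id -> set of neighbour ids of an undirected acyclic graph.
        Returns the elected leader, i.e. the smallest id."""
        pending = {u: set(nbs) for u, nbs in adj.items()}   # neighbours not yet heard from
        minimum = {u: u for u in adj}                       # min_u, initially u's own id
        queue = [u for u in adj if len(pending[u]) == 1]    # nodes ready to send to Nb_u
        root = None
        while queue:
            u = queue.pop(0)
            if len(pending[u]) != 1:
                continue
            father = next(iter(pending[u]))                 # Nb_u
            pending[u].clear()                              # u has sent min_u away
            minimum[father] = min(minimum[father], minimum[u])
            pending[father].discard(u)
            if len(pending[father]) == 1:
                queue.append(father)
            elif not pending[father]:
                root = father                               # heard from all neighbours
        return minimum[root]

    print(tree_election({1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3}}))   # 1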
Question
Why does u compute the minimum of min_u and v?
Tree Election Algorithm - Example
(Figures: a tree with nodes 1, ..., 6; the minima propagate inward along the edges and back out again, until every node knows the minimum 1 and node 1 becomes the leader.)
Question
How could the echo algorithm be used to get an election algorithm
for any undirected graph?
Echo Algorithm with Extinction
Election for undirected graphs, based on the echo algorithm: each initiator starts a wave tagged with its id, and competing waves extinguish one another until one initiator's wave completes.
Gallager-Humblet-Spira Algorithm
In the Gallager-Humblet-Spira algorithm for minimum spanning trees, fragments grow along least-weight outgoing edges: when e_F is found, the fragment at the other end is asked to collaborate in a merge.
Level, name and core edge
Fragments carry a level L ∈ ℕ and a name FN.
Fragments F = (L, FN) and F' = (L', FN') are joined as follows:
L < L': F merges into F' via e_F, and F ∪ F' = (L', FN');
L > L': F' merges into F via e_F', and F ∪ F' = (L, FN);
L = L' and e_F = e_F': F ∪ F' = (L + 1, weight of e_F).
The core edge of a fragment is the last edge that connected two sub-fragments at the same level; its end points are the core nodes.
Parameters of a node
Each node keeps track of:
the level and name (i.e., the weight of the core edge) of its fragment;
Suppose fragment F = (L, FN) reaches, via its edge e_F = uv, a node v in fragment F' = (L', FN'):
If L < L', then ⟨initiate, L', FN', find/found⟩ is sent by v to u, and forwarded through F; F ∪ F' = (L', FN').
If L > L', then ...
If L = L', then ...
Impossibility of Election in Anonymous Networks
Theorem: There is no terminating algorithm for electing a leader in an asynchronous anonymous graph.
Proof: Take a (directed) ring of size N.
In a symmetric configuration, all nodes are in the same state and all channels carry the same messages.
If γ_0 is symmetric and γ_0 → γ_1, then there is an execution γ_1 → γ_2 → ⋯ → γ_N where γ_N is symmetric.
So there is an infinite fair execution.
Probabilistic Algorithms
In a probabilistic algorithm, each process p holds two local algorithms, L⁰_p and L¹_p.
Let ρ_p : ℕ → {0, 1} for each p. In a ρ-computation, the k-th event at p is performed according to L^{ρ_p(k)}_p.
For a probabilistic algorithm where all computations terminate in a correct configuration, L⁰ is a correct non-probabilistic algorithm.
Monte Carlo and Las Vegas Algorithms
A probabilistic algorithm is Monte Carlo if:
it always terminates; and
the probability that a terminal configuration is correct is greater than zero.
It is Las Vegas if:
the probability that it terminates is greater than zero; and
all terminal configurations are correct.
Itai-Rodeh Election Algorithm
At an active process p at level ℓ:
p gets (ℓ', u, h, b) with ℓ < ℓ', or ℓ = ℓ' and id_p > u: it becomes passive and sends (ℓ', u, h+1, b).
p gets (ℓ', u, h, b) with ℓ > ℓ', or ℓ = ℓ' and id_p < u: it purges the message.
p gets (ℓ, id_p, h, b) with h < N: it sends (ℓ, id_p, h+1, false).
p gets (ℓ, id_p, N, false): it proceeds to an election at level ℓ + 1.
p gets (ℓ, id_p, N, true): it becomes the leader.
Passive processes pass on messages, increasing their hop count by one.
Itai-Rodeh Election Algorithm - Correctness + Complexity
Correctness: Eventually one leader is elected, with probability 1.
Average-case message complexity: In the order of N log N messages.
Without levels, the algorithm would break down (if channels are
non-FIFO).
Example: (Figure: a ring where u < v and v < w, x; with non-FIFO channels, an outdated message (v, 1, true) can return to a process that has already moved on.)
Question
Any suggestions how to adapt the echo algorithm with extinction,
to get an election algorithm for arbitrary anonymous undirected
graphs?
Election in Arbitrary Anonymous Networks
We use the echo algorithm with extinction, with random selection of identities, for election in anonymous undirected graphs in which all nodes know the network size.
Initially, initiators are active at level 0, and non-initiators are passive.
Each active process selects a random id, and starts a wave, tagged with its id and level 0.
Suppose process p in wave v at level ℓ is hit by wave w at level ℓ':
if ℓ < ℓ', or ℓ = ℓ' and w < v: p adopts wave w, and treats the message as a message of wave w;
if ℓ > ℓ', or ℓ = ℓ' and w > v: p purges the message;
if ℓ = ℓ' and w = v: p treats the message as an ordinary message of its own wave.
Consider a computation C of a ring of size N that elects a leader, and a ring of size 2N. Let the random choices ρ(p_i) and ρ(p_{i+N}) in the larger ring coincide with ρ(p_i) in C. (The probability of such an assignment is (1/2)^{N·L}.) Let each event at a p_i in C be executed concurrently at p_i and p_{i+N}. This yields a computation of the larger ring in which two leaders are elected.
All-Pairs Shortest-Path Problem
Let G = (V, E) be a directed, weighted graph, with weights ω_uv > 0.
We want to compute for each pair of nodes u, v a shortest path from u to v in G.
For S ⊆ V, d_S(u, v) denotes the length of a shortest path in G with all intermediate nodes in S.
d_S(u, u) = 0
d_∅(u, v) = ω_uv if uv ∈ E and u ≠ v
d_∅(u, v) = ∞ if uv ∉ E and u ≠ v
d_{S∪{w}}(u, v) = min{ d_S(u, v), d_S(u, w) + d_S(w, v) }
Note that d_V is the standard distance function.
Floyd-Warshall Algorithm
Exploits the last equality to compute d_S, where S grows from ∅ to V.
S := ∅
forall u, v ∈ V do
  if u = v then D_u[v] := 0; Nb_u[v] := ⊥
  else if uv ∈ E then D_u[v] := ω_uv; Nb_u[v] := v
  else D_u[v] := ∞; Nb_u[v] := ⊥
while S ≠ V do
  pick w from V \ S   (w-pivot round)
  forall u, v ∈ V do
    if D_u[w] + D_w[v] < D_u[v] then
      D_u[v] := D_u[w] + D_w[v]; Nb_u[v] := Nb_u[w]
  S := S ∪ {w}
Time complexity: Θ(|V|³)
Floyd-Warshall Algorithm - Example
(Figure: a cycle u, v, w, x with weights u–v = 4, v–w = 1, w–x = 1, x–u = 1.)
pivot u: D_x[v] := 5, D_v[x] := 5; Nb_x[v] := u, Nb_v[x] := u
pivot v: D_u[w] := 5, D_w[u] := 5; Nb_u[w] := v, Nb_w[u] := v
pivot w: D_x[v] := 2, D_v[x] := 2; Nb_x[v] := w, Nb_v[x] := w
pivot x: D_u[w] := 2, D_w[u] := 2; Nb_u[w] := x, Nb_w[u] := x
         D_u[v] := 3, D_v[u] := 3; Nb_u[v] := x, Nb_v[u] := w
Question
How can the Floyd-Warshall algorithm be turned into a distributed
algorithm?
Toueg's Algorithm
A distributed version of Floyd-Warshall computes the routing tables at their nodes.
Given an undirected, weighted graph.
Assumption: Each node knows from the start the identities of all nodes in V. (Because pivots must be picked uniformly at all nodes.)
At the w-pivot round, w broadcasts its values D_w[v], for all v ∈ V.
If Nb_u[w] = ⊥ with u ≠ w at the w-pivot round, then D_u[w] = ∞, so D_u[w] + D_w[v] ≥ D_u[v] for all v ∈ V. Hence the sink tree of w can be used to broadcast D_w.
Nodes u with Nb_u[w] ≠ ⊥ must tell Nb_u[w] to pass on D_w:
u sends ⟨ys, w⟩ to Nb_u[w] if it is not ⊥, which yields a global broadcast of D_w at the w-pivot round.
Chandy-Misra Algorithm
A centralized algorithm to compute all shortest paths to initiator v_0.
Again, an undirected, weighted graph is assumed.
Each node uses only the D_w[v_0] values from its neighbors w.
Initially, D_{v_0}[v_0] = 0, D_u[v_0] = ∞ if u ≠ v_0, and Nb_u[v_0] = ⊥.
v_0 sends the message ⟨mydist, 0⟩ to its neighbors.
When a node u receives ⟨mydist, d⟩ from a neighbor w, and d + ω_uw < D_u[v_0], then:
D_u[v_0] := d + ω_uw and Nb_u[v_0] := w;
u sends ⟨mydist, D_u[v_0]⟩ to its neighbors (except w).
Termination detection by the Dijkstra-Scholten algorithm.
Worst-case message complexity: Exponential
Worst-case message complexity for minimum-hop: O(|V|² · |E|)
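A sequential simulation of this relaxation in Python (illustrative: the messages are kept in a FIFO work queue, and the graph is the one from the example that follows):

    INF = float("inf")

    def chandy_misra(adj, v0):
        """adj: node -> list of (neighbour, weight) pairs. Returns (D, Nb)."""
        D = {u: INF for u in adj}
        Nb = {u: None for u in adj}
        D[v0] = 0
        queue = [(v0, 0)]                        # pending <mydist, d> messages
        while queue:
            w, d = queue.pop(0)
            for u, wt in adj[w]:
                if d + wt < D[u]:                # test on receipt of <mydist, d>
                    D[u] = d + wt
                    Nb[u] = w
                    queue.append((u, D[u]))      # u informs its neighbours in turn
        return D, Nb

    adj = {"v0": [("u", 4), ("w", 6), ("x", 1)],
           "u":  [("v0", 4), ("w", 1), ("x", 1)],
           "w":  [("v0", 6), ("u", 1), ("x", 1)],
           "x":  [("v0", 1), ("u", 1), ("w", 1)]}
    print(chandy_misra(adj, "v0"))               # D: u = 2, w = 2, x = 1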
Chandy-Misra Algorithm - Example
(Figure: initiator v_0 and nodes u, w, x; edge weights v_0–u = 4, v_0–w = 6, v_0–x = 1, u–w = 1, u–x = 1, w–x = 1.)
D_{v_0} := 0, Nb_{v_0} := ⊥
D_w := 6, Nb_w := v_0
D_u := 7, Nb_u := w
D_x := 8, Nb_x := u
D_x := 7, Nb_x := w
D_u := 4, Nb_u := v_0
D_w := 5, Nb_w := u
D_x := 6, Nb_x := w
D_x := 5, Nb_x := u
D_x := 1, Nb_x := v_0
D_w := 2, Nb_w := x
D_u := 3, Nb_u := w
D_u := 2, Nb_u := x
Merlin-Segall Algorithm
A centralized algorithm to compute all shortest paths to initiator v_0.
Again, an undirected, weighted graph is assumed.
Initially, D_{v_0}[v_0] = 0, D_u[v_0] = ∞ if u ≠ v_0, and the Nb_u[v_0] form a sink tree with root v_0.
At each update round, for u ≠ v_0:
1. v_0 sends ⟨mydist, 0⟩ to its neighbors.
2. Let u get ⟨mydist, d⟩ from neighbor w.
If d + ω_uw < D_u[v_0], then D_u[v_0] := d + ω_uw (and u stores w as future value for Nb_u[v_0]).
If w = Nb_u[v_0], u sends ⟨mydist, D_u[v_0]⟩ to its neighbors except Nb_u[v_0].
3. When u has received a mydist message from all its neighbors, it sends ⟨mydist, D_u[v_0]⟩ to Nb_u[v_0], and next updates Nb_u[v_0].
v_0 starts a new update round after receiving mydist from all neighbors.
Merlin-Segall Algorithm - Termination and Complexity
After i update rounds, all shortest paths of ≤ i hops have been computed. The algorithm terminates after |V| − 1 update rounds.
Message complexity: Θ(|V|² · |E|)
Example: (Figures: three update rounds on a small weighted graph; the distance estimates shrink each round until they equal the actual distances to the initiator.)
Merlin-Segall Algorithm - Topology Changes
A number is attached to mydist messages.
When a channel fails or becomes operational, adjacent nodes send the number of the update round to v_0 via the sink tree. (If the message meets a failed tree link, it is discarded.)
When v_0 receives such a message, it starts a new set of |V| − 1 update rounds, with a higher number.
If a failed channel is part of the sink tree, the remaining tree is extended to a sink tree, similar to the initialization phase.
Example: (Figure: a sink tree toward initiator 0 with nodes u, v, w, x, y, z; one channel fails.)
y signals to z that the failed channel was part of the sink tree.
Breadth-First Search
Consider an undirected graph.
A spanning tree is a breadth-first search tree if each tree path to the root is minimum-hop.
The Chandy-Misra algorithm for minimum-hop paths computed a breadth-first search tree using O(|V| · |E|) messages (for each root).
Breadth-First Search - A Simple Algorithm
Initially (after round 0), the initiator is at level 0, and all other nodes are at level ∞.
After round f ≥ 0, each node at f hops from the initiator (1) is at level f, and (2) knows which neighbors are at level f − 1.
We explain what happens in round f + 1:
(Figure: explore messages travel from the nodes at level f toward level f + 1, while forward and reverse messages travel down and up the tree between the initiator and level f.)
Breadth-First Search - A Simple Algorithm
Worst-case message complexity: O(|V|² + |E|)
There are at most |V| rounds.
Each round, a tree edge carries at most one forward and one replying reverse.
In total, an edge carries at most 2 explores and replying reverses.
In total, a frond edge carries at most one spurious forward and one replying no-child.
Worst-case time complexity: O(|V|²)
Computing ℓ levels per round instead of one gives a better trade-off:
Levels kℓ + 1 up to (k+1)ℓ are computed in 2(k+1)ℓ time units.
Let ℓ = ⌈|V|/√|E|⌉. Then both message and time complexity are O(|V| · √|E|).
Deadlock-Free Packet Switching
Let G = (V, E) be a directed graph, supplied with routing tables.
Each node has buffers to store data packets on their way to their destination.
For simplicity we assume synchronous communication.
Possible events: generation, forwarding, and consumption of packets.
Destination Scheme
Let T_i be the sink tree, given by the routing tables, toward destination i; each node carries one buffer per destination.
If uv is an edge in T_i, then the i-th buffer of u is linked to the i-th buffer of v.
Hops-So-Far Scheme
k is the length of a longest path in any T_i.
In the hops-so-far scheme, each node carries k + 1 buffers.
If uv is an edge in some T_i, then each j-th buffer of u is linked to the (j+1)-th buffer of v.
Forward-Count Controller
Suppose that for a packet p at a node u, the number s_u(p) of hops that p still has to make to its destination is always known.
f_u is the number of free buffers at u.
k_u is the length of a longest path, starting in u, in any sink tree in G.
In the forward-count controller, each node u contains k_u + 1 buffers. A packet p is accepted at node u if and only if s_u(p) < f_u.
If the buffers of a node u are all empty, u can accept any packet.
Unlike the previous controllers, an accepted packet can be placed in any buffer.
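The acceptance condition is easy to state in code. A toy sketch (class and names are ours):

    class Node:
        def __init__(self, k_u):
            self.capacity = k_u + 1              # k_u + 1 buffers
            self.packets = []                    # remaining hop counts s_u(p)

        def free(self):
            return self.capacity - len(self.packets)    # f_u

        def accept(self, s_p):
            if s_p < self.free():                # the forward-count condition
                self.packets.append(s_p)
                return True
            return False

    u = Node(k_u=3)
    print(u.accept(3))                           # True: 3 < 4 free buffers
    print(u.accept(3))                           # False: 3 < 3 fails
    print(u.accept(2))                           # True: 2 < 3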
Forward-Count Controller - Correctness
Theorem: Forward-count controllers are deadlock-free.
Proof: Consider a reachable configuration γ where no forwarding or consumption is possible. Suppose, toward a contradiction, that in γ some buffer is occupied.
Select a packet p, at some node u, with s_u(p) minimal. p must be forwarded to a node w, but is blocked:
s_w(p) ≥ f_w
Then some buffer in w is occupied. Let q be the packet at w that arrived last. Let f_w^old be the number of free buffers before q's arrival. Since q was accepted,
s_w(q) < f_w^old ≤ f_w + 1
Hence, we get a contradiction:
s_u(p) = s_w(p) + 1 ≥ f_w + 1 > s_w(q)
So in γ, all buffers are empty.
Acyclic Orientation Cover Controller
Let G be undirected. An acyclic orientation of G is a directed, acyclic graph obtained by directing all edges of G.
Acyclic orientations G_1, ..., G_n of G are an acyclic orientation cover of a set T of paths in G if each path in T is the concatenation of paths P_1, ..., P_n in G_1, ..., G_n.
Given an acyclic orientation cover G_1, ..., G_n of the sink trees. In the acyclic orientation cover controller, each node has n buffers.
If vw is an edge in G_i, then the i-th buffer of v is linked to the i-th buffer of w, and if i < n, the i-th buffer of w is linked to the (i+1)-th buffer of v.
Example
For each undirected ring there exists a deadlock-free controller that uses three buffers per node and allows packets to travel via minimum-hop paths.
For instance, in case of a ring of size six: (Figure: three acyclic orientations G_1, G_2, G_3 of the ring.)
Acyclic Orientation Cover Controller - Correctness
Theorem: Acyclic orientation cover controllers are deadlock-free.
Proof: Consider a reachable configuration γ. Make forwarding and consumption transitions to a configuration δ where no forwarding or consumption is possible.
Since G_n is acyclic, packets in n-th buffers can travel to their destinations. So in δ, n-th buffers are empty.
Suppose all (i+1)-th buffers are empty in δ, for some i < n. Then all i-th buffers must also be empty in δ. For else, since G_i is acyclic, some packet in an i-th buffer could be forwarded or consumed.
Concluding, in δ all buffers are empty.
Fault Tolerance
A process may (1) crash, i.e., execute no further events, or even (2) become Byzantine, meaning that it can perform arbitrary events.
Assumptions: The graph is complete, i.e., there is an undirected channel between each pair of different processes. Thus, failing processes never make the network disconnected.
Crashing of processes cannot be observed.
Consensus: Correct processes must uniformly decide 0 or 1.
Assumption: Initial configurations are bivalent, meaning that both decisions occur in some terminal configurations (that are reachable by correct transitions).
Given a configuration, a set S of processes is b-potent if by only executing events at processes in S, some process in S can decide b.
Impossibility of 1-Crash Consensus
Theorem: There is no terminating algorithm for 1-crash consensus (i.e., only one process may crash).
Proof: Suppose, toward a contradiction, there is such an algorithm. Let γ be a reachable bivalent configuration. Then there are transitions γ → δ_0 and γ → δ_1, where δ_0 can lead to decision 0 and δ_1 can lead to decision 1.
Bracha-Toueg Crash Consensus Algorithm
Let t < N/2. In round k, at each correct, undecided process p:
p sends ⟨k, value_p, weight_p⟩ to all processes (including itself).
If w > N/2 for more than t incoming messages ⟨k, b, w⟩, then p decides b. (Note that t < N − t.)
When p decides b, it broadcasts ⟨k+1, b, N−t⟩ and ⟨k+2, b, N−t⟩, and terminates.
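The decision rule can be tallied as follows (a sketch of a single round at one process; we assume the round update takes the majority value among the incoming votes, with its multiplicity as the new weight):

    def round_step(votes, N, t):
        """votes: list of (value, weight) pairs received this round.
        Returns (new_value, new_weight, decided_value_or_None)."""
        count = {0: 0, 1: 0}
        heavy = {0: 0, 1: 0}                     # votes with weight > N/2
        for b, w in votes:
            count[b] += 1
            if w > N / 2:
                heavy[b] += 1
        value = 0 if count[0] >= count[1] else 1
        weight = count[value]
        decided = next((b for b in (0, 1) if heavy[b] > t), None)
        return value, weight, decided

    # N = 3, t = 1: two 0-votes of weight 2 suffice to decide 0.
    print(round_step([(0, 2), (0, 2)], N=3, t=1))    # (0, 2, 0)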
Bracha-Toueg Crash Consensus Algorithm - Example
N = 3 and t = 1. Each round a correct process requires two
incoming messages, and two b-votes with weight 2 to decide b.
(Messages of a process to itself are not depicted.)
(Figure: three processes with initial values 0, 0, 1 exchange votes ⟨round, value, weight⟩ such as ⟨1, 0, 1⟩, ⟨2, 0, 2⟩, ⟨3, 0, 1⟩ and ⟨4, 0, 2⟩; one process crashes, and the two remaining processes receive two 0-votes with weight 2 and decide 0.)
Bracha-Toueg Crash Consensus Algorithm - Correctness
Theorem: Let t < N/2. The Bracha-Toueg t-crash consensus algorithm is a Las Vegas algorithm that terminates with probability 1.
Proof (part I): Suppose a process decides b in round k. Then in this round, value_q = b and weight_q > N/2 for more than t processes q.
So in round k, each correct process receives a message ⟨k, b, w⟩ with w > N/2.
Hence, in round k+1, all correct processes vote b.
Then, after round k+2, all correct processes have decided b.
Concluding, all correct processes decide for the same value.
Bracha-Toueg Crash Consensus Algorithm - Correctness
Proof (part II): Assumption: Scheduling of messages is fair.
Let S be a set of N − t processes that do not crash.
Due to fair scheduling, there is a chance ρ > 0 that in a round each process in S receives its first N − t messages from processes in S.
So with chance ρ³ this happens in three consecutive rounds k, k+1, k+2.
After round k, all processes in S have the same value b.
After round k+1, all processes in S have weight N − t > N/2.
Since N − t > t, after round k+2, all processes in S have decided b.
Concluding, the algorithm terminates with probability 1.
Impossibility of ⌈N/3⌉-Byzantine Consensus
Theorem: Let t ≥ N/3. There is no t-Byzantine consensus algorithm.
Proof: Suppose, toward a contradiction, that there is such an algorithm. Since t ≥ N/3, we can choose sets S and T of processes with |S| = |T| = N − t and |S ∩ T| ≤ t.
Let configuration γ be reachable by a sequence of correct transitions (so that still any process can become Byzantine). In γ, S and T are either both 0-potent or both 1-potent. For else, since the processes in S ∩ T may become Byzantine, S and T could independently decide for different values.
Since the initial configuration is bivalent, there is a configuration γ, reachable by correct transitions, and a correct transition γ → δ, with S and T both only b-potent in γ and only (1−b)-potent in δ. Such a transition cannot exist.
Bracha-Toueg Byzantine Consensus Algorithm
Let t < N/3. Bracha and Toueg gave a t-Byzantine consensus algorithm.
Again, in every round, correct processes broadcast their value, and wait for N − t incoming messages. (No weights are needed.)
A correct process decides b upon receiving more than (N−t)/2 + t = (N+t)/2 b-votes in one round. (Note that (N+t)/2 < N − t.)
Echo Mechanism
Complication: A Byzantine process may send different votes to different processes.
Example: Let N = 4 and t = 1. Each round, a correct process waits for 3 votes, and needs 3 b-votes to decide b.
(Figure: the Byzantine process sends 1-votes to one correct process and 0-votes to the others, so that one correct process decides 1 while the other two decide 0.)
Solution: Each incoming vote is verified using an echo mechanism. A vote is accepted after more than (N+t)/2 confirming echoes.
Bracha-Toueg Byzantine Consensus Algorithm
Initially, each correct process randomly chooses 0 or 1.
In round k, at each correct, undecided p:
Bracha-Toueg Byzantine Consensus Alg. - Correctness
Theorem: Let t < N/3. The Bracha-Toueg t-Byzantine consensus algorithm is a Las Vegas algorithm that terminates with probability 1.
Proof: Each round, the correct processes eventually accept N − t votes, since there are N − t correct processes (and N − t > (N+t)/2).
Suppose in round k, correct processes p and q accept votes b and b′ for the same process, each confirmed by more than (N+t)/2 echo messages. More than t processes, so at least one correct process, sent such messages to both p and q. Then b = b′.
Bracha-Toueg Byzantine Consensus Alg. - Correctness
Suppose a correct process decides b in round k. In this round it accepts more than (N+t)/2 b-votes. So in round k, each correct process accepts more than (N+t)/2 − t = (N−t)/2 b-votes. Hence, in round k+1, value_q = b for each correct q. This implies that the correct processes vote b in all rounds > k. (Namely, in rounds > k, each correct process accepts at least N − 2t > (N−t)/2 b-votes.)
Let S be a set of N − t processes that do not become Byzantine.
Due to fair scheduling, there is a chance ρ > 0 that in a round each process in S accepts N − t votes from processes in S. So there is a chance ρ² that this happens in two consecutive rounds k, k+1.
After round k, all processes in S have the same value b. After round k+1, all processes in S have decided b.
Failure Detectors and Synchronous Systems
A failure detector at a process keeps track of which processes have (or may have) crashed.
Given a (known or unknown) upper bound on network delay, and
heartbeat messages by each process, one can implement a failure
detector.
In a synchronous system, processes execute in lock step.
Given local clocks that have a known bounded inaccuracy, and a
known upper bound on network delay, one can transform a system
based on asynchronous communication into a synchronous system.
With a failure detector, and for a synchronous system, the proof
for impossibility of 1-crash consensus no longer applies. Consensus
algorithms have been developed for these settings.
Failure Detection
Aim: To detect crashed processes.
T is the time domain. F(τ) is the set of crashed processes at time τ. A process cannot observe τ and F(τ).
τ_1 ≤ τ_2 ⇒ F(τ_1) ⊆ F(τ_2) (i.e., no restart).
Crash(F) = ⋃_{τ∈T} F(τ), and H(p, τ) is the set of processes suspected to be crashed by process p at time τ.
Each execution is decorated with a failure pattern F and a failure detector history H.
We require that a failure detector is complete: From some time onward, every crashed process is suspected by every correct process:
p ∈ Crash(F) ∧ q ∉ Crash(F) ⇒ ∃τ ∀τ′ ≥ τ: p ∈ H(q, τ′)
Strongly Accurate Failure Detection
A failure detector is strongly accurate if only crashed processes are suspected:
∀τ ∀p, q ∉ F(τ): p ∉ H(q, τ)
Assumptions:
Consensus with such a failure detector: in round k, p_k (if not crashed) broadcasts its value.
A failure detector is eventually strongly accurate if from some time onward only crashed processes are suspected:
∃τ ∀τ′ ≥ τ ∀p, q ∉ Crash(F): p ∉ H(q, τ′)
Assumptions:
Chandra-Toueg Algorithm
A failure detector is eventually weakly accurate if from some time onward some process is never suspected:
∃p ∃τ ∀τ′ ≥ τ ∀q ∉ Crash(F): p ∉ H(q, τ′)
Let t < N/2. Assume a complete and eventually weakly accurate failure detector. Chandra and Toueg gave a rotating coordinator algorithm for t-crash consensus.
Each process q records the last round ts_q in which it updated value_q. Initially, value_q ∈ {0, 1} and ts_q = 0.
Processes are numbered: p_1, ..., p_N. Round k is coordinated by p_c with c = (k mod N) + 1.
Note: Tel presents a simplified version of the Chandra-Toueg algorithm (without acknowledgements and time stamps), which only works for t < N/3.
Chandra-Toueg Algorithm
In round k:
Every correct process q sends ⟨k, value_q, ts_q⟩ to the coordinator p_c.
p_c (if not crashed) waits until N − t such messages arrived, and selects one, say ⟨k, b, ts⟩, with ts as large as possible.
p_c broadcasts ⟨k, b⟩.
Every correct process that receives ⟨k, b⟩ sets its value to b and its time stamp to k, and replies with ack; a process that suspects p_c replies with nack.
p_c (if not crashed) waits until N − t acknowledgements arrived. If more than t of them are ack, then p_c decides b, and broadcasts ⟨decide, b⟩. (Note that N − t > t.)
When a process did not yet decide and receives ⟨decide, b⟩, it decides b.
Chandra-Toueg Algorithm - Example
N = 3 and t = 1.
(Figure: a run with messages ⟨1, 0, 0⟩, ⟨1, 0⟩, ⟨ack, 1⟩, ⟨2, 1, 0⟩, ⟨2, 0⟩, ⟨ack, 2⟩ and ⟨decide, 0⟩; coordinators crash in early rounds, and eventually every correct process decides 0 with time stamp 2.)
Chandra-Toueg Algorithm - Correctness
Theorem: Let t < N/2. The Chandra-Toueg algorithm is a terminating algorithm for t-crash consensus.
Proof: If the coordinator in some round k receives > t acks, then (for some b ∈ {0, 1}):
(1) there are > t processes q with ts_q ≥ k; and
(2) ts_q ≥ k implies value_q = b.
These properties are preserved in rounds ℓ > k; this follows by induction on ℓ.
By (1), in round ℓ the coordinator receives at least one message with time stamp ≥ k. Hence, by (2), the coordinator of round ℓ broadcasts ⟨ℓ, b⟩.
So from round k onward, processes can only decide b.
Since the failure detector is eventually weakly accurate, from some round onward some process p will never be suspected.
So when p becomes the coordinator, it receives N − t acks. Since N − t > t, it decides.
All correct processes eventually receive the decide message of p, and also decide.
Question
Why is it difficult to implement a failure detector for Byzantine processes?
Local Clocks with Bounded Drift
Suppose we have a dense time domain.
Let each process p have a local clock C_p(τ), which returns a time value at real time τ.
We assume that each local clock has ρ-bounded drift, compared to real time: for some ρ > 0,
(1/(1+ρ)) (τ_2 − τ_1) ≤ C_p(τ_2) − C_p(τ_1) ≤ (1+ρ) (τ_2 − τ_1)
Clock Synchronization
At certain time intervals, the processes synchronize clocks: they read each other's clock values, and adjust their local clocks.
The aim is to achieve, for some δ > 0,
|C_p(τ) − C_q(τ)| ≤ δ
for all τ.
Due to drift, this precision may degrade over time, necessitating repeated synchronizations.
Clock Synchronization
Suppose that after each synchronization, at say real time τ, for all processes p, q:
|C_p(τ) − C_q(τ)| ≤ δ_0
for some δ_0 < δ.
Due to ρ-bounded drift, at real time τ + R,
|C_p(τ+R) − C_q(τ+R)| ≤ δ_0 + ((1+ρ) − 1/(1+ρ)) R < δ_0 + 2ρR
So there should be a synchronization every (δ − δ_0)/(2ρ) time units.
We assume a bound δ_max on network delay. For simplicity, let δ_max be much smaller than δ (so that this delay can be ignored).
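A small numeric check of these bounds (the values of ρ, δ and δ_0 are chosen arbitrarily):

    rho, delta, delta0 = 0.001, 0.1, 0.04

    def precision(R):
        """Precision after running R time units without synchronization."""
        return delta0 + ((1 + rho) - 1 / (1 + rho)) * R    # < delta0 + 2*rho*R

    R_max = (delta - delta0) / (2 * rho)   # largest safe resynchronization interval
    print(R_max, precision(R_max) <= delta)                # ~30.0 True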
Mahaney-Schneider Synchronizer
Consider a complete network of N processes, where at most t processes can become Byzantine.
In the Mahaney-Schneider synchronizer, each correct process at a synchronization:
1. Collects the clock values of all processes (waiting for 2δ_max).
2. Discards those reported values τ for which less than N − t processes report a value in the interval [τ − δ, τ + δ] (they are from Byzantine processes).
3. Replaces all discarded and non-received values with some value τ such that there are accepted reported values τ_1 and τ_2 with τ_1 ≤ τ ≤ τ_2.
4. Takes the average of these N values as its new clock value.
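A sketch of one such round at a correct process (illustrative; reports is the list of collected clock values, with None for a missing report, and step 3 simply reuses an accepted value):

    def synchronize(reports, N, t, delta):
        values = [v for v in reports if v is not None]
        # Step 2: keep a value only if at least N - t reports are delta-close.
        accepted = [v for v in values
                    if sum(1 for w in values if abs(v - w) <= delta) >= N - t]
        # Step 3: replace discarded and missing values by an accepted one.
        filled = [v if v in accepted else accepted[0] for v in reports]
        # Step 4: the new clock value is the average of N values.
        return sum(filled) / N

    # N = 4, t = 1: the Byzantine outlier 9.0 is discarded before averaging.
    print(synchronize([10.0, 10.1, 9.0, 10.05], N=4, t=1, delta=0.5))   # 10.0375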
Mahaney-Schneider Synchronizer - Correctness
Lemma: Let t < N/3. If values a_p and a_q pass the filters of correct processes p and q, respectively, in some synchronization round, then
|a_p − a_q| ≤ 2δ
Proof: At least N − t processes reported a value in [a_p − δ, a_p + δ] to p.
And at least N − t processes reported a value in [a_q − δ, a_q + δ] to q.
Since N − 2t > t, at least one correct process r reported a value in [a_p − δ, a_p + δ] to p, and in [a_q − δ, a_q + δ] to q.
Since r reports the same value to p and q, it follows that
|a_p − a_q| ≤ 2δ
Mahaney-Schneider Synchronizer - Correctness
Theorem: Let t < N/3. The Mahaney-Schneider synchronizer is t-Byzantine robust.
Proof: Let a_pr (resp. a_qr) be the value that correct process p (resp. q) accepted or computed for process r, in some synchronization round.
By the lemma, for all r, |a_pr − a_qr| ≤ 2δ.
Moreover, a_pr = a_qr for all correct r.
Hence, for all correct p and q,
|(1/N) Σ_r a_pr − (1/N) Σ_r a_qr| ≤ (1/N) · t · 2δ < (2/3)δ
So we can take δ_0 = (2/3)δ, and there should be a synchronization every δ/(6ρ) time units.
Impossibility of ⌈N/3⌉-Byzantine Synchronizers
Theorem: Let t ≥ N/3. There is no t-Byzantine robust synchronizer.
Proof: Let N = 3, t = 1. Processes are p, q, r; r is Byzantine. (The construction below easily extends to general N and t ≥ N/3.)
Let the local clock of p run faster than the local clock of q. Suppose a synchronization takes place at real time τ.
r sends C_p(τ) + δ to p, and C_q(τ) − δ to q.
p and q cannot recognize that r is Byzantine. So they have to stay within range δ of the value reported by r. Hence p cannot decrease its clock value, and q cannot increase its clock value.
By repeating this scenario at each synchronization round, the clock values of p and q get further and further apart.
Synchronous Networks
A synchronous network proceeds in pulses. In one pulse, each process:
1. sends messages;
2. receives messages; and
3. performs internal events.
A message is sent and received in the same pulse.
Such synchrony is called lockstep.
From Synchronizer to Synchronous Network
Assume ρ-bounded local clocks, and a synchronizer with precision δ.
For simplicity, let the maximum network delay δ_max, and the time to perform internal events in a pulse, be much smaller than δ.
When a process reads clock value (i−1)(1+ρ)²δ, it starts pulse i.
Key question: Does each process receive all messages for pulse i before the start of pulse i+1?
From Synchronizer to Synchronous Network
When a process reads clock value (i−1)(1+ρ)²δ, it starts pulse i.
Since the synchronizer has precision δ, and the clock of q is ρ-bounded (from below), for all τ,
C_q⁻¹(τ) ≤ C_p⁻¹(τ) + (1+ρ)δ
And since the clock of p is ρ-bounded (from above), for all τ and σ ≥ 0,
C_p⁻¹(τ) + σ ≤ C_p⁻¹(τ + (1+ρ)σ)
Hence C_q⁻¹((i−1)(1+ρ)²δ) ≤ C_p⁻¹(i(1+ρ)²δ), so p receives the message from q for pulse i before the start of pulse i+1.
Byzantine Broadcast
Consider a synchronous network of N processes, where at most t processes can become Byzantine.
One process g, called the general, is given an input x_g ∈ V. The other processes are called lieutenants.
Requirements for t-Byzantine broadcast:
Termination: every correct process decides a value;
Dependence: if the general is correct, every correct process decides x_g;
Agreement: all correct processes decide the same value.
Public-Key Cryptosystems
A public-key cryptosystem consists of a finite message domain 𝓜 and, for each process q, functions S_q, P_q : 𝓜 → 𝓜 with S_q(P_q(m)) = P_q(S_q(m)) = m for m ∈ 𝓜.
S_q is kept secret, P_q is made public.
Underlying assumption: Computing S_q from P_q is expensive.
p sends a secret message m to q: P_q(m)
p sends a signed message m to q: ⟨m, S_p(m)⟩
Example: RSA cryptosystem.
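A toy RSA instance illustrating S_q and P_q (tiny, insecure parameters, for illustration only; requires Python 3.8+ for the modular inverse):

    p, q = 61, 53
    n = p * q                         # public modulus
    phi = (p - 1) * (q - 1)
    e = 17                            # public exponent, coprime with phi
    d = pow(e, -1, phi)               # secret exponent: d*e = 1 mod phi

    P = lambda m: pow(m, e, n)        # public function P_q
    S = lambda m: pow(m, d, n)        # secret function S_q

    m = 42
    assert S(P(m)) == m == P(S(m))    # S_q(P_q(m)) = P_q(S_q(m)) = m
    signed = (m, S(m))                # signed message <m, S_p(m)>
    print(P(signed[1]) == signed[0])  # True: verified with the public key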
Lamport-Shostak-Pease Authenticating Algorithm
Pulse 1: The general broadcasts ⟨x_g, (S_g(x_g), g)⟩, and decides x_g.
Pulse i: If a lieutenant q receives a message ⟨v, (σ_1, p_1) : ⋯ : (σ_i, p_i)⟩ that is valid, i.e.:
p_1 = g,
p_1, ..., p_i, q are distinct, and
P_{p_k}(σ_k) = v for k = 1, ..., i,
then q includes v in the set W_q.
If i ≤ t and |W_q| ≤ 2, then in pulse i+1, q broadcasts
⟨v, (σ_1, p_1) : ⋯ : (σ_i, p_i) : (S_q(v), q)⟩
After pulse t+1, each correct lieutenant p decides
v if W_p is a singleton {v}, or
⊥ otherwise (the general is Byzantine)
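The validity test can be written down directly. A sketch with stand-in signatures (a real run would use a public-key cryptosystem as above):

    def sign(proc, v):
        return ("sig", proc, v)               # stand-in for S_proc(v)

    def verify(proc, sigma, v):
        return sigma == ("sig", proc, v)      # stand-in for P_proc(sigma) = v

    def valid(v, chain, g, q):
        """chain: list of (sigma_k, p_k) pairs of a received message."""
        procs = [p for _, p in chain]
        return (procs[0] == g                                  # p_1 = g
                and len(set(procs + [q])) == len(procs) + 1    # all distinct, q fresh
                and all(verify(p, s, v) for s, p in chain))    # signatures check

    chain = [(sign("g", 0), "g"), (sign("p", 0), "p")]
    print(valid(0, chain, g="g", q="q"))      # True
    print(valid(1, chain, g="g", q="q"))      # False: the signatures are for 0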
Lamport-Shostak-Pease Authenticating Alg. - Correctness
Theorem: The Lamport-Shostak-Pease authenticating algorithm is a t-Byzantine broadcast algorithm, for any t.
Proof: If the general is correct, then owing to authentication, correct lieutenants only add x_g to their sets W_q. So they all decide x_g.
Suppose a correct lieutenant includes v in its set in pulse t+1. The valid message it received carries t+1 signatures, so it was signed by a correct lieutenant, which already included v in its own set in an earlier pulse and broadcast it. Hence, after pulse t+1,
either W_p = ∅ for all correct p,
or |W_p| ≥ 2 for all correct p,
or W_p = {v} for all correct p, for some v ∈ V.
In the first two cases, all correct processes decide ⊥ (the general is Byzantine).
In the third case, they all decide v.
Example
N = 4 and t = 2. (Figure: the general g and lieutenant r are Byzantine; p and q are correct.)
pulse 1: g sends ⟨0, (S_g(0), g)⟩ to p and q
         g sends ⟨1, (S_g(1), g)⟩ to r
         W_p = W_q = {0}
pulse 2: p broadcasts ⟨0, (S_g(0), g) : (S_p(0), p)⟩
         q broadcasts ⟨0, (S_g(0), g) : (S_q(0), q)⟩
         r sends ⟨1, (S_g(1), g) : (S_r(1), r)⟩ to q
         W_p = {0} and W_q = {0, 1}
pulse 3: q broadcasts ⟨1, (S_g(1), g) : (S_r(1), r) : (S_q(1), q)⟩
         W_p = W_q = {0, 1}
p and q decide ⊥.
Mutual Exclusion
Processes p_0, ..., p_{N−1} contend for the critical section.
A process that can enter the critical section is called privileged.
For each execution, we require mutual exclusion and no starvation:
mutual exclusion: in each configuration, at most one process is privileged;
no starvation: if a process p_i tries to enter the critical section, and no process remains privileged forever, then p_i will eventually become privileged.
Raymond's Algorithm
Requires an undirected graph, which must, also initially, form a sink tree. At any time, the root, holding a token, is privileged.
Each process maintains a FIFO queue, which may contain identities of its children, and its own id. Initially, this queue is empty.
Queue maintenance:
When a process that is not the root gets a new head at its (non-empty) queue, it asks its father for the token.
Ricart-Agrawala Algorithm
When p_i wants to enter the critical section, it sends a request, carrying a time stamp ts_i, to all other processes. p_j sends permission to p_i as soon as:
p_j is neither inside nor trying to enter the critical section; or
p_j sent a request with time stamp ts_j, and either ts_i < ts_j, or ts_i = ts_j and i < j.
p_i enters the critical section when it has received permission from all other processes.
Ricart-Agrawala Algorithm - Correctness
Mutual exclusion: Two processes cannot send permission to each other concurrently.
Because when a process p sends permission to q, p has not issued a request, and the logical time of p is greater than the time stamp of q's request.
No starvation: Eventually a request will have the smallest time stamp of all requests in the network.
Ricart-Agrawala Algorithm - Example 1
N = 2, and p_0 and p_1 both are at logical time 0.
p_1 sends request(1, 1) to p_0. When p_0 receives this message, it sets its logical time to 1.
p_0 sends permission to p_1.
p_0 sends request(2, 0) to p_1. When p_1 receives this message, it does not send permission to p_0, because (1, 1) < (2, 0).
p_1 receives permission from p_0, and enters the critical section.
Ricart-Agrawala Algorithm - Example 2
N = 2, and p_0 and p_1 both are at logical time 0.
p_1 sends request(1, 1) to p_0, and p_0 sends request(1, 0) to p_1.
When p_0 receives the request from p_1, it does not send permission to p_1, because (1, 0) < (1, 1).
When p_1 receives the request from p_0, it sends permission to p_0.
p_0 receives permission from p_1, and enters the critical section.
Ricart-Agrawala Algorithm - Optimization
Drawback: High message complexity.
Carvalho-Roucairol optimization: After a first entry of the critical section, a process only needs to send a request to processes that it has sent permission to (since its last exit from the critical section).
Question
Suppose a leader has been elected in the network. Give a mutual
exclusion algorithm, with no starvation.
What is a drawback of such a mutual exclusion algorithm?
Mutual Exclusion with Shared Variables
Hagit Attiya and Jennifer Welch, Distributed Computing, McGraw
Hill, 1998 (Chapter 4)
See also Chapter 10 of: Nancy Lynch, Distributed Algorithms, 1996
Processes communicate via shared variables (called registers) in
shared memory.
read/write registers allow a process to perform an atomic read or
write.
A single-reader (or single-writer) register is readable (or writable)
by one process.
A multi-reader (or multi-writer) register is readable (or writable) by
all processes.
An Incorrect Solution for Mutual Exclusion
Let flag be a multi-reader/multi-writer register, with range {0, 1}.
A process wanting to enter the critical section waits until flag = 0. Then it writes flag := 1, and becomes privileged.
When it exits the critical section, it writes flag := 0.
The problem is that there is a time delay between reading flag = 0 and writing flag := 1, so that multiple processes can perform this read and write in parallel.
Dijkstra's Mutual Exclusion Algorithm
turn is a multi-reader/multi-writer register with range {0, ..., N−1}.
flag[i] is a multi-reader/single-writer register, only writable by p_i, with range {0, 1, 2}.
Initially they all have value 0.
We present the pseudocode for process p_i.
⟨Entry⟩: L: flag[i] := 1
            while turn ≠ i do
              if flag[turn] = 0 then turn := i
            flag[i] := 2
            for j ≠ i do
              if flag[j] = 2 then goto L
⟨Exit⟩: flag[i] := 0
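A direct transcription with Python threads (illustrative; it relies on the interpreter executing each register read and write atomically, as the algorithm assumes of its registers):

    import threading

    N = 3
    turn = 0
    flag = [0] * N

    def enter(i):
        global turn
        while True:                            # label L
            flag[i] = 1
            while turn != i:
                if flag[turn] == 0:
                    turn = i
            flag[i] = 2
            if all(flag[j] != 2 for j in range(N) if j != i):
                return                         # no competitor at 2: enter
                                               # otherwise: goto L

    def leave(i):
        flag[i] = 0

    shared = 0

    def worker(i):
        global shared
        for _ in range(200):
            enter(i)
            shared += 1                        # critical section
            leave(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(shared)                              # 600: no update was lost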
Dijkstra's Mutual Exclusion Algorithm - Correctness
This algorithm provides mutual exclusion.
And if a process p
i
tries to enter the critical section, and no
process remains in the critical section forever, then some process
will eventually become privileged (no deadlock).
However, there can be starvation.
Dijkstra's Mutual Exclusion Algorithm - Example
Let N = 3.
flag[1] := 1
flag[2] := 1
p_1 and p_2 read turn = 0
p_1 and p_2 read flag[0] = 0
turn := 1
turn := 2
flag[1] := 2
flag[2] := 2
flag[1] := 1
flag[2] := 1
p_1 and p_2 read turn = 2
p_1 reads flag[2] ≠ 0
flag[2] := 2
p_2 enters the critical section
p_2 exits the critical section
flag[2] := 0
flag[2] := 1
p_1 reads flag[2] ≠ 0
flag[2] := 2
p_2 enters the critical section
Fischer's Algorithm
Uses time delays, and the assumption that an operation can be performed within one time unit.
turn is a multi-reader/multi-writer register with range {−1, 0, ..., N−1}. Initially it has value −1.
We present the pseudocode for process p_i.
⟨Entry⟩: L: wait until turn = −1
            turn := i   (takes less than one time unit)
            delay of more than one time unit
            if turn ≠ i then goto L
⟨Exit⟩: turn := −1
Fischer's algorithm guarantees mutual exclusion and no deadlock.
Lamport's Bakery Algorithm
Multi-reader/single-writer registers number[i] and choosing[i] range over ℕ and {0, 1}, respectively; they are only writable by p_i.
Initially they all have value 0.
When p_i wants to enter the critical section, it writes a number to number[i] that is greater than number[j] for all j ≠ i.
Different processes can concurrently obtain the same number; therefore the ticket of p_i is the pair (number[i], i).
choosing[i] = 1 while p_i is obtaining a number.
When the critical section is empty, and no process is obtaining a number, the process with the smallest ticket (n, i) with n > 0 enters.
When p_i exits the critical section, number[i] is set to 0.
Lamport's Bakery Algorithm
We present the pseudocode for process p_i.
⟨Entry⟩: choosing[i] := 1
         number[i] := max{number[0], ..., number[N−1]} + 1
         choosing[i] := 0
         for j ≠ i do
           wait until choosing[j] = 0
           wait until number[j] = 0 or (number[j], j) > (number[i], i)
⟨Exit⟩: number[i] := 0
The bakery algorithm provides mutual exclusion and no starvation.
Drawback: Can have high synchronization delays.
Lamport's Bakery Algorithm - Example
Let N = 2. (Read the left column first, then the right.)
choosing[1] := 1                                  p_0 exits the critical section
choosing[0] := 1                                  number[0] := 0
p_0 and p_1 read number[0] and number[1]          choosing[0] := 1
number[1] := 1                                    p_0 reads number[0] and number[1]
choosing[1] := 0                                  number[0] := 2
p_1 reads choosing[0] = 1                         choosing[0] := 0
number[0] := 1                                    p_1 reads choosing[0] = 0 and
choosing[0] := 0                                      (number[1], 1) < (number[0], 0)
p_0 reads choosing[1] = 0 and                     p_1 enters the critical section
    (number[0], 0) < (number[1], 1)
p_0 enters the critical section
Mutual Exclusion for Two Processes
Assume processes p_0 and p_1.
flag[i] is a multi-reader/single-writer register, only writable by p_i.
Its range is {0, 1}; initially it has value 0.
        code for p_0                    code for p_1
Entry:                                  L: flag[1] := 0
                                        wait until flag[0] = 0
        flag[0] := 1                    flag[1] := 1
        wait until flag[1] = 0          if flag[0] = 1 then goto L
Exit:   flag[0] := 0                    flag[1] := 0
This algorithm provides mutual exclusion and no deadlock.
However, p_1 may never progress from Entry to the critical section, while p_0 enters the critical section infinitely often (starvation of p_1).
Question
How can this mutual exclusion algorithm for two processes be adapted so that it provides no starvation?
Peterson2P Algorithm
flag[i] is a multi-reader/single-writer register, only writable by p_i.
priority is a multi-reader/multi-writer register.
They all have range {0, 1}; initially they have value 0.
        code for p_0                                  code for p_1
Entry:  L: flag[0] := 0                               L: flag[1] := 0
        wait until (flag[1] = 0 or priority = 0)      wait until (flag[0] = 0 or priority = 1)
        flag[0] := 1                                  flag[1] := 1
        if priority = 1 then                          if priority = 0 then
            if flag[1] = 1 then goto L                    if flag[0] = 1 then goto L
        else wait until flag[1] = 0                   else wait until flag[0] = 0
Exit:   priority := 1                                 priority := 0
        flag[0] := 0                                  flag[1] := 0
The Peterson2P algorithm provides mutual exclusion and no starvation.
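Both columns can be folded into one Python routine parameterized by the process id (a sketch; `other` abbreviates 1 − i):

flag = [0, 0]     # flag[i] only writable by p_i
priority = 0      # multi-reader/multi-writer register

def enter(i):
    other = 1 - i
    while True:                                        # label L
        flag[i] = 0
        while not (flag[other] == 0 or priority == i):
            pass                                       # wait
        flag[i] = 1
        if priority == other:
            if flag[other] == 1:
                continue                               # goto L
        else:
            while flag[other] != 0:
                pass                                   # wait until flag[other] = 0
        return                                         # p_i is privileged

def exit_cs(i):
    global priority
    priority = 1 - i
    flag[i] = 0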
Peterson2P Algorithm - Example
(Read the left column first, then the right.)
flag[1] := 0 (L)                          flag[0] := 0 (L)
flag[1] := 1                              flag[0] := 1
flag[0] := 0 (L)                          p_0 enters the critical section
flag[0] := 1                              flag[1] := 1
flag[1] := 0                              p_0 exits the critical section
p_0 enters the critical section           priority := 1
p_0 exits the critical section            flag[0] := 0
priority := 1                             flag[0] := 0 (L)
flag[0] := 0                              p_1 enters the critical section
Question
How can the Peterson2P algorithm be transformed into a mutual exclusion algorithm for N ≥ 2 processes?
PetersonNP Algorithm
Assume processes p_0, . . . , p_N−1.
Processes compete pairwise, using the Peterson2P algorithm, in a tournament tree, which is a complete binary tree.
Mellor-Crummey-Scott Lock
Let process q exit the critical section.
If next_q = p, then q sets locked_p := false, upon which p can enter the critical section.
If next_q = ⊥, then q performs compare(q, ⊥) on last.
If q finds that last = p ≠ q, it waits until next_q = p, and then sets locked_p := false.
Note that p only needs to repeatedly poll its local variables locked_p and (sometimes, for a short period) next_p.
Mellor-Crummey-Scott Lock - Example
q performs fetch-and-store(q) on last: last := q
q enters the critical section
p performs fetch-and-store(p) on last: last := p
p performs locked_p := true
q exits the critical section, and reads next_q = ⊥
q performs compare(q, ⊥) on last
Since last = p ≠ q, q must wait until next_q = p
p performs next_q := p
q performs locked_p := false
p enters the critical section
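The lock can be sketched in Python as follows; a threading.Lock stands in for the hardware atomics fetch-and-store and compare (this substitution is ours, purely for illustration), and None plays the role of ⊥:

import threading

class Node:
    def __init__(self):
        self.locked = False
        self.next = None                       # next_q; None plays the role of ⊥

class MCSLock:
    def __init__(self):
        self.last = None                       # tail of the waiting queue
        self._atomic = threading.Lock()        # stands in for hardware atomics

    def fetch_and_store(self, node):
        with self._atomic:                     # atomically: old := last; last := node
            old, self.last = self.last, node
            return old

    def compare(self, expected, new):
        with self._atomic:                     # atomically: if last = expected then last := new
            if self.last is expected:
                self.last = new
                return True
            return False

    def acquire(self, me):
        me.next = None
        pred = self.fetch_and_store(me)
        if pred is not None:                   # someone is ahead of us
            me.locked = True
            pred.next = me                     # announce ourselves as successor
            while me.locked:                   # spin only on our own node
                pass

    def release(self, me):
        if me.next is None:
            if self.compare(me, None):         # no successor: reset the tail
                return
            while me.next is None:             # a successor is announcing itself
                pass
        me.next.locked = False                 # pass the privilege on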
Self-Stabilization
All configurations are initial configurations.
An algorithm is self-stabilizing if every execution reaches a correct configuration.
Advantages:
fault tolerance
straightforward initialization
Processes communicate via registers in shared memory.
Dijkstra's Self-Stabilizing Token Ring
Let p_0, . . . , p_N−1 form a directed ring, where each p_i holds a value x_i ∈ {0, . . . , K−1} with K ≥ N.
p_i with 0 < i < N is privileged if x_i ≠ x_{i−1}.
p_0 is privileged if x_0 = x_{N−1}.
Each privileged process is allowed to change its value, causing the loss of its privilege:
  x_i := x_{i−1} when x_i ≠ x_{i−1}, for 0 < i < N;
  x_0 := (x_{N−1} + 1) mod K when x_0 = x_{N−1}.
If K ≥ N, then Dijkstra's token ring self-stabilizes. That is, each execution will reach a configuration where mutual exclusion is satisfied.
Moreover, Dijkstra's token ring guarantees no starvation.
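A small randomized simulation (a sketch; the scheduler picks an arbitrary privileged process at each step) shows convergence from an arbitrary initial configuration:

import random

N, K = 4, 4
x = [random.randrange(K) for _ in range(N)]   # arbitrary initial configuration

def privileged():
    procs = [i for i in range(1, N) if x[i] != x[i - 1]]
    if x[0] == x[N - 1]:
        procs.append(0)
    return procs

for step in range(1, 101):
    i = random.choice(privileged())           # at least one process is privileged
    if i == 0:
        x[0] = (x[N - 1] + 1) % K
    else:
        x[i] = x[i - 1]
    if len(privileged()) == 1:                # mutual exclusion reached
        print("stabilized after", step, "events:", x)
        break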
Dijkstra's Token Ring - Example
Let N = K = 4. Consider the initial configuration in which the four processes hold the values 0, 1, 2, 3.
It is not hard to see that it self-stabilizes.
[Diagram: a sequence of ring configurations in which values propagate around the ring until only one process is privileged.]
Dijkstra's Token Ring - Correctness
Theorem: If K ≥ N, then Dijkstra's token ring self-stabilizes.
Proof: In each configuration at least one process is privileged.
A transition never increases the number of privileged processes.
Consider an execution. After at most (N−1)N/2 events at p_1, . . . , p_N−1, an event must happen at p_0. So during the execution, x_0 ranges over all values in {0, . . . , K−1}. Since p_1, . . . , p_N−1 only copy values, and K ≥ N, in some configuration of the execution, x_0 ≠ x_i for all 0 < i < N.
The next time p_0 becomes privileged, clearly x_i = x_0 for all 0 < i < N. So then mutual exclusion has been achieved.
Question
Can you argue why, if N ≥ 3, Dijkstra's token ring also self-stabilizes when K = N − 1?
This lower bound for K is sharp!
Dijkstra's Token Ring - Lower Bound for K
Example: Let N ≥ 4 and K = N−2, and consider the following initial configuration.
[Diagram: a ring of N processes with values chosen such that the execution can avoid ever reaching a stable configuration.]
It does not always self-stabilize.
Dijkstra's Token Ring - Message Complexity
Worst-case message complexity: Mutual exclusion is achieved after at most O(N^2) transitions.
p_i for 0 < i < N can copy the initial values of p_0, . . . , p_{i−1}. (Total: (N−1)N/2 events.)
p_0 takes on at most N new values to attain a fresh value. These values can be copied by p_1, . . . , p_N−1. (Total: N^2 events.)
Arora-Gouda Self-Stabilizing Election Algorithm
Given an undirected network.
Let an upper bound K on the network size be known to all processes.
The process with the largest id becomes the leader.
Each process p_i maintains the following variables:
Neigh_i : the set of identities of its neighbors
father_i : its father in the sink tree
leader_i : the root of the sink tree
dist_i : its distance from the root
Arora-Gouda Election Algorithm - Complications
Due to arbitrary initialization, there are three complications.
Complication 1: Multiple processes may consider themselves root of the sink tree.
Complication 2: There may be cycles in the sink tree.
Complication 3: leader_i may not be the id of any process in the network.
Arora-Gouda Election Algorithm
A process p_i declares itself leader, i.e.
  leader_i := i    father_i := ⊥    dist_i := 0
if it detects an inconsistency in its local variables:
  leader_i < i; or
  father_i = ⊥ and (leader_i ≠ i or dist_i > 0); or
  father_i ∉ Neigh_i; or
  dist_i ≥ K.
Suppose father_i = j with j ∈ Neigh_i and dist_j < K.
If leader_i ≠ leader_j, then leader_i := leader_j.
If dist_i ≠ dist_j + 1, then dist_i := dist_j + 1.
Arora-Gouda Election Algorithm
If leader_i < leader_j where j ∈ Neigh_i and dist_j < K, then
  leader_i := leader_j    father_i := j    dist_i := dist_j + 1
To obtain a breadth-first search tree, one can add:
If leader_i = leader_j where j ∈ Neigh_i and dist_j + 1 < dist_i, then
  leader_i := leader_j    father_i := j    dist_i := dist_j + 1
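The local rules on this and the previous slide can be summarized in a small Python sketch (an illustration under assumed data structures, not the authors' code). Processes are indexed by their ids, all registers are modeled as plain lists, and None plays the role of ⊥:

K = 10                         # assumed known upper bound on the network size

def arora_gouda_step(i, Neigh, leader, father, dist):
    """Apply one Arora-Gouda rule at process i; return True if a register changed."""
    inconsistent = (
        leader[i] < i
        or (father[i] is None and (leader[i] != i or dist[i] > 0))
        or (father[i] is not None and father[i] not in Neigh[i])
        or dist[i] >= K
    )
    if inconsistent:                           # declare itself leader
        leader[i], father[i], dist[i] = i, None, 0
        return True
    j = father[i]
    if j is not None and dist[j] < K:          # copy the father's information
        if leader[i] != leader[j]:
            leader[i] = leader[j]
            return True
        if dist[i] != dist[j] + 1:
            dist[i] = dist[j] + 1
            return True
    for j in Neigh[i]:
        if dist[j] < K:
            if leader[i] < leader[j]:          # adopt a neighbor with a larger leader
                leader[i], father[i], dist[i] = leader[j], j, dist[j] + 1
                return True
            if leader[i] == leader[j] and dist[j] + 1 < dist[i]:
                father[i], dist[i] = j, dist[j] + 1   # BFS shortcut
                return True
    return False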
Arora-Gouda Election Algorithm - Example
[Diagram: a sequence of configurations of a five-process network p_1, . . . , p_5. Initially every process claims leader = 6, although 6 is not the id of any process. An inconsistency in the dist values is detected, upon which p_3 declares itself leader (leader_3 := 3, father_3 := ⊥, dist_3 := 0) and the leader value 3 spreads through the network. Then p_4 declares itself leader and the value 4 spreads, and finally p_5 declares itself leader; the network converges to a sink tree with root p_5 and leader value 5.]
Arora-Gouda Election Algorithm - Correctness
A subgraph in the network with a leader value j that is not an id of any node in this subgraph contains an inconsistency or a cycle.
Such an inconsistency or cycle will eventually cause a process in this subgraph to declare itself leader.
Let i be the largest id of any process in the network.
p_i will eventually declare itself leader.
After p_i has declared itself leader, the algorithm will eventually converge to a spanning tree with root p_i.
Afek-Kutten-Yung Self-Stabilizing Election Algorithm
No upper bound on the network size needs to be known.
The process with the largest id becomes the leader.
A process p_i declares itself leader, i.e.
  leader_i := i    father_i := ⊥    dist_i := 0
if these three variables do not yet all have these values, and p_i detects even the slightest inconsistency in its local variables:
  leader_i ≤ i or father_i ∉ Neigh_i or
  leader_i ≠ leader_{father_i} or dist_i ≠ dist_{father_i} + 1
p_i can make a neighbor p_j its father if leader_i < leader_j:
  leader_i := leader_j    father_i := j    dist_i := dist_j + 1
Question
Suppose that during an application of the Afek-Kutten-Yung leader
election algorithm, the created subgraph contains a cycle.
Why will at least one of the processes on this cycle declare itself
leader?
Afek-Kutten-Yung Election Algorithm - Complication
Processes can infinitely often join a component of the created subgraph with a false leader.
Example: Given two adjacent processes p_0 and p_1.
leader_0 = leader_1 = 2; father_0 = 1 and father_1 = 0; dist_0 = dist_1 = 0.
Since dist_0 ≠ dist_1 + 1, p_0 declares itself leader:
  leader_0 := 0, father_0 := ⊥ and dist_0 := 0.
Since leader_0 < leader_1, p_0 makes p_1 its father:
  leader_0 := 2, father_0 := 1 and dist_0 := 2.
Since dist_1 ≠ dist_0 + 1, p_1 declares itself leader:
  leader_1 := 1, father_1 := ⊥ and dist_1 := 0.
Since leader_1 < leader_0, p_1 makes p_0 its father:
  leader_1 := 2, father_1 := 0 and dist_1 := 3.
Et cetera.
Afek-Kutten-Yung Election Algorithm - Join Requests
Let leader_i < leader_j for some j ∈ Neigh_i.
Before p_i makes p_j its father, first it sends a join request to p_j.
This request is forwarded through p_j's component, toward the root (if any) of this component.
The root sends back a grant toward p_i, which travels the reverse path of the request.
When p_i receives this grant, it makes p_j its father:
  leader_i := leader_j, father_i := j and dist_i := dist_j + 1.
If p_j's component has no root, p_i will never join this component.
Communication is performed using shared variables, so join requests and grants are encoded in shared variables.
Afek-Kutten-Yung Election Algorithm - Join Requests
A process can be forwarding (and awaiting a grant for) at most one join request at a time.
Join requests and grants between inconsistent nodes are not forwarded.
Example: Given a ring with nodes u, v, w, and let x > u, v, w.
Initially, u and v consider themselves leader, while w considers u its father and x the leader.
Since leader_w > leader_v, v sends a join request to w.
Without the aforementioned consistency check, w would forward this join request to u. Since u considers itself leader, it would send back an ack to v (via w), and v would make w its father.
Since leader_w ≠ leader_u, w would make itself leader.
Now we would have a configuration symmetric to the initial one.
Afek-Kutten-Yung Election Algorithm - Example
Given two adjacent processes p_0 and p_1.
leader_0 = leader_1 = 2; father_0 = 1 and father_1 = 0; dist_0 = dist_1 = 0.
Since dist_0 ≠ dist_1 + 1, p_0 declares itself leader:
  leader_0 := 0, father_0 := ⊥ and dist_0 := 0.
Since leader_0 < leader_1, p_0 sends a join request to p_1.
This join request does not immediately trigger a grant.
Since dist_1 ≠ dist_0 + 1, p_1 declares itself leader:
  leader_1 := 1, father_1 := ⊥ and dist_1 := 0.
Since p_1 is now a proper root, it grants the join request of p_0, which makes p_1 its father:
  leader_0 := 1, father_0 := 1 and dist_0 := 1.
Afek-Kutten-Yung Election Algorithm - Correctness
A subgraph in the network with a leader value j that is not an id of any node in this subgraph contains an inconsistency, so a process in this subgraph will declare itself leader.
Each process can only finitely often (each time due to incorrect initial register values) join a subgraph with a false leader.
Let i be the largest id of any process in the network.
p_i will eventually declare itself leader.
After p_i has declared itself leader, the algorithm will eventually converge to a spanning tree with root p_i.
Garbage Collection
Processes are provided with memory, and root objects carry references to (local or remote) heap objects.
Heap objects can also carry references to each other.
Processes can perform three operations related to references:
reference creation;
reference duplication; and
reference deletion.
The aim of garbage collection is to reclaim inaccessible heap objects.
Garbage Collection - Reference Counting
Reference counting is based on keeping track of the number of references to an object. If it drops to zero, the object is garbage.
Drawback: It cannot reclaim cyclic garbage.
Real-Time Scheduling
Real-time systems must meet requirements regarding:
functional behavior
time constraints
resource requirements
Jobs are divided over processors, and are competing for resources.
A scheduler decides in which order jobs are performed on a processor, and which resources they can claim.
Terminology
arrival time: when a job arrives at a processor
release time: when a job becomes available for execution
execution time: amount of processor time needed to perform the job (assuming it executes alone and all resources are available)
absolute deadline: when a job is required to be completed
relative deadline: maximum allowed length of time from arrival until completion of a job
hard deadline: late completion not allowed
soft deadline: late completion allowed
slack: available idle time of a job until the next deadline
A preemptive job can be suspended at any time during its execution.
Out of scope:
overrun management
performance
migration of jobs
Periodic Tasks
A periodic task T = (r, p, e) is given by a release time r, a period p, and an execution time e.
For simplicity we assume that the relative deadline of each periodic job is equal to its period.
Periodic Tasks - Example
T_1 = (1, 2, 1) and T_2 = (0, 3, 1).
[Timeline over [0, 6] with the jobs of both tasks.]
The hyperperiod is 6.
The conflict at time 3 must be resolved by some scheduler.
Job Queues at a Processor
We focus on individual aperiodic and sporadic jobs.
[Diagram: queues of periodic tasks, aperiodic jobs, and sporadic jobs feed a processor; sporadic jobs first pass an acceptance test and are accepted or rejected.]
Sporadic jobs are only accepted when they can be completed in time.
Aperiodic jobs are always accepted, and performed such that periodic and accepted sporadic jobs do not miss their deadlines.
The queueing discipline of aperiodic jobs tries to minimize e.g. the average tardiness (completion time monus deadline, i.e. 0 for jobs that complete on time), or the number of missed soft deadlines.
Utilization
The utilization of a periodic task (r, p, e) is e/p.
The utilization of a processor is the sum of the utilizations of its periodic tasks.
Assumptions: Jobs are preemptive, and there is no resource competition.
Theorem: Scheduling the periodic tasks of a processor is feasible if and only if the utilization of the processor is ≤ 1.
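For the earlier example this check is a one-liner in Python (a sketch; exact fractions avoid rounding, and task triples are (r, p, e)):

from fractions import Fraction
from math import lcm

tasks = [(1, 2, 1), (0, 3, 1)]                         # the earlier example tasks

utilization = sum(Fraction(e, p) for _, p, e in tasks)
print("utilization:", utilization)                     # 1/2 + 1/3 = 5/6
print("feasible:", utilization <= 1)                   # True
print("hyperperiod:", lcm(*(p for _, p, _ in tasks)))  # 6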
Scheduler
The scheduler of a processor schedules and allocates resources to jobs (according to some scheduling algorithms and resource access control algorithms).
Priority-Driven Scheduling
On-line scheduling: The schedule is computed at run-time, e.g. because of:
jitter
system modifications
nondeterminism
Scheduling decisions are taken when jobs are released or completed.
Priorities are for instance based on:
deadline (EDF)
slack (LST)
We will focus on EDF scheduling.
Periodic tasks and jobs are assumed to be preemptive.
RM Scheduler
Rate Monotonic: A shorter period gives a higher priority.
Advantage: Priority at the level of tasks makes RM easier to analyze than EDF/LST.
Example: Non-optimality of the RM scheduler (one processor, preemptive jobs, no competition for resources).
Let T_1 = (0, 4, 2) and T_2 = (0, 6, 3).
[Timeline over [0, 12]: the RM schedule misses a deadline, while an EDF/LST schedule meets all deadlines.]
Remark: If for all periods p < p′, p is a divisor of p′, then the RM scheduler is optimal.
EDF Scheduler
Earliest Deadline First: The earlier the deadline, the higher the priority.
Theorem: Given one processor and preemptive jobs. When jobs do not compete for resources, the EDF scheduler is optimal.
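A unit-step EDF simulator in Python (a sketch under the stated assumptions: one processor, preemptive jobs, no resources; for simplicity, deadlines are only checked when a job completes):

import heapq

def edf_schedule(jobs):
    """jobs: list of (release, execution, deadline). True if all deadlines are met."""
    t = 0
    pending = sorted(jobs)                  # sorted by release time
    ready = []                              # min-heap keyed on absolute deadline
    i = 0
    while i < len(pending) or ready:
        while i < len(pending) and pending[i][0] <= t:
            r, e, d = pending[i]
            heapq.heappush(ready, (d, e))   # earliest deadline = highest priority
            i += 1
        if not ready:
            t = pending[i][0]               # idle until the next release
            continue
        d, e = heapq.heappop(ready)
        t += 1                              # run the highest-priority job one unit
        if e > 1:
            heapq.heappush(ready, (d, e - 1))
        elif t > d:
            return False                    # deadline missed
    return True

# The RM example above over the hyperperiod 12: jobs as (release, execution, deadline).
jobs = [(0, 2, 4), (4, 2, 8), (8, 2, 12), (0, 3, 6), (6, 3, 12)]
print(edf_schedule(jobs))                   # True: EDF meets all deadlines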
Example: Non-optimality in case of non-preemption.
[Timeline: jobs J_1 and J_2 with releases r_1, r_2 and deadlines d_1, d_2; a non-EDF schedule meets both deadlines, while the EDF schedule misses one.]
EDF Scheduler
Example: Non-optimality in case of resource competition.
Let J_1 and J_2 both require resource R.
[Timeline: as in the previous example, a non-EDF schedule meets both deadlines, while the EDF schedule misses one because of blocking on R.]
EDF Scheduler - Drawbacks
The dynamic priority of periodic tasks makes it difficult to analyze which deadlines are met in case of overloads.
Late jobs can cause other jobs to miss their deadlines (good overrun management is needed).
LST Scheduler
Least Slack Time first: Less slack gives a job a higher priority.
With the LST scheduler, priorities of jobs change dynamically.
Remark: Continuous scheduling decisions would lead to context switch overhead in case of two jobs with the same slack.
Theorem: Given one processor and preemptive jobs. When jobs do not compete for resources, the LST scheduler is optimal.
Drawback of LST: It is computationally expensive.
Scheduling Anomaly
Let jobs be non-preemptive. Then shorter execution times can lead to violation of deadlines.
Example: Consider the EDF (or LST) scheduler.
[Timeline: three jobs J_1, J_2, J_3 with releases r_1, r_2, r_3 and deadlines d_1, d_2, d_3; when one job executes for less time than expected, the schedule changes and a deadline is missed.]
If jobs are preemptive, and there is no competition for resources, then there is no scheduling anomaly.
Scheduling Aperiodic Jobs
(For the moment, we ignore sporadic jobs.)
Background: Aperiodic jobs are only scheduled in idle time.
Drawback: Needless delay of aperiodic jobs.
Slack stealing: Periodic tasks and accepted sporadic jobs may be interrupted if there is sufficient slack.
Example: T_1 = (0, 2, 1/2) and T_2 = (0, 3, 1/2). Aperiodic jobs are available in [0, 6].
[Timeline over [0, 6] comparing background scheduling with slack stealing.]
Drawback: Difficult to compute in case of jitter.
Polling Server
Given a period p_s, and an execution time e_s for aperiodic jobs in such a period.
At the start of a new period, the first e_s time units can be used to execute aperiodic jobs.
Consider periodic tasks T_k = (r_k, p_k, e_k) for k = 1, . . . , n.
The polling server works if
  ∑_{k=1..n} e_k/p_k + e_s/p_s ≤ 1
Drawback: Aperiodic jobs released just after a poll may be delayed needlessly.
Deferrable Server
Allows a polling server to save its execution time within a period p_s (but not beyond this period!) if the aperiodic queue is empty.
The EDF scheduler treats the deadline of a deferrable server at the end of a period p_s as a hard deadline.
Remark: ∑_{k=1..n} e_k/p_k + e_s/p_s ≤ 1 does not guarantee that periodic jobs meet their deadlines.
Example: T = (2, 5, 3 1/3) and p_s = 3, e_s = 1. An aperiodic job with e = 2 arrives at time 2.
[Timeline over [0, 7]: T misses its deadline at time 7.]
Drawbacks: Partial use of the available bandwidth.
It is difficult to determine good values for p_s and e_s.
Total Bandwidth Server
Fix an allowed utilization rate u_s for the server, such that
  ∑_{k=1..n} e_k/p_k + u_s ≤ 1
When the aperiodic queue is non-empty, a deadline d is determined for the head of the queue, according to the rules below.
Let the head of the aperiodic queue have execution time e.
When, at a time t, either a job arrives at the empty aperiodic queue, or an aperiodic job completes and the tail of the aperiodic queue is non-empty:
  d := max(d, t) + e/u_s
Initially, d = 0.
Total Bandwidth Server
Aperiodic jobs can now be treated in the same way as periodic jobs, by the EDF scheduler.
In the absence of sporadic jobs, aperiodic jobs meet the deadlines assigned to them (which may differ from their actual soft deadlines).
Example: T_1 = (0, 2, 1) and T_2 = (0, 3, 1). We fix u_s = 1/6.
[Timeline over [0, 12] with the release times of three aperiodic jobs A, A′, A″.]
A, released at 1 with e = 1/2, gets (at 1) deadline 1 + 3 = 4.
A′, released at 2 with e′ = 2/3, gets (at 2 1/2) deadline 4 + 4 = 8.
A″, released at 3 with e″ = 1/3, gets (at 6 1/3) deadline 8 + 2 = 10.
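The deadline assignments in this example can be reproduced with a few lines of Python (a sketch of the assignment rule only; the actual execution is left to EDF):

from fractions import Fraction

class TotalBandwidthServer:
    def __init__(self, u_s):
        self.u_s = Fraction(u_s)
        self.d = Fraction(0)                        # initially d = 0

    def assign(self, t, e):
        # d := max(d, t) + e / u_s; the job is then handed to EDF with deadline d
        self.d = max(self.d, Fraction(t)) + Fraction(e) / self.u_s
        return self.d

tbs = TotalBandwidthServer(Fraction(1, 6))
print(tbs.assign(1, Fraction(1, 2)))                # at 1,     e  = 1/2 -> 4
print(tbs.assign(Fraction(5, 2), Fraction(2, 3)))   # at 2 1/2, e' = 2/3 -> 8
print(tbs.assign(Fraction(19, 3), Fraction(1, 3)))  # at 6 1/3, e" = 1/3 -> 10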
Acceptance Test for Sporadic Jobs
A sporadic job with deadline d and execution time e is accepted at time t if the utilization (of the periodic and accepted sporadic jobs) in the time interval [t, d] is never more than 1 − e/(d−t).
If accepted, the utilization in [t, d] is increased with e/(d−t).
Example: Periodic task T = (0, 2, 1).
A sporadic job with r = 1, e = 2 and d = 6 is accepted.
Utilization in [1, 6] is increased to 9/10.
A sporadic job with r = 2, e = 2 and d = 20 is rejected (although it could be scheduled).
A sporadic job with r = 3, e = 1 and d = 13 is accepted.
Utilization in [3, 6] is increased to 1, and utilization in [6, 13] to 3/5.
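A sketch of this acceptance test in Python, reproducing the example (the utilization over time is piecewise constant, so it suffices to check it at interval endpoints):

from fractions import Fraction as F

U_periodic = F(1, 2)          # utilization of the periodic task T = (0, 2, 1)
accepted = []                 # (t, d, extra) for accepted sporadic jobs

def util_at(x):
    """Utilization at time x: periodic share plus active sporadic shares."""
    return U_periodic + sum(u for (t, d, u) in accepted if t <= x < d)

def accept(t, d, e):
    extra = F(e) / (d - t)
    breakpoints = {t} | {b for (s, f, _) in accepted for b in (s, f) if t <= b < d}
    if all(util_at(x) <= 1 - extra for x in breakpoints):
        accepted.append((t, d, extra))
        return True
    return False

print(accept(1, 6, 2))    # True:  utilization in [1, 6] becomes 9/10
print(accept(2, 20, 2))   # False: 9/10 > 1 - 1/9 on [2, 6]
print(accept(3, 13, 1))   # True:  [3, 6] -> 1 and [6, 13] -> 3/5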
Acceptance Test for Sporadic Jobs
Periodic and accepted sporadic jobs are scheduled by the EDF scheduler.
The acceptance test may reject schedulable sporadic jobs.
The total bandwidth server can be integrated with an acceptance test for sporadic jobs (e.g. by making the allowed utilization rate u_s dynamic).
Resource Access Control Algorithms
Resource units can be requested by jobs during their execution, and are allocated to jobs in a mutually exclusive fashion.
When a requested resource is refused, the job is preempted (blocked).
Resource sharing gives rise to scheduling anomalies.
Resource Access Control Algorithms
Dangers of resource sharing:
(1) Deadlock can occur.
Example: J_1 > J_2 (J_1 has the higher priority), and both jobs require two resources.
[Timeline: J_2 acquires one of the resources; J_1 preempts J_2 and acquires the other; then each job requires the resource held by the other: deadlock.]
(2) A job J can be blocked by lower-priority jobs.
Example: J > J_1 > · · · > J_k, and J and J_k require the red resource.
[Timeline: J_k holds the resource when J is released, so J is blocked; meanwhile J_1, . . . , J_{k−1} preempt J_k one after another, prolonging the blocking of J.]
Question
How can we avoid blocking by lower-priority jobs?
Priority Inheritance
When a job J requires a resource R and becomes blocked because a lower-priority job J′ holds R, then J′ inherits the priority of J for as long as it holds R.