
Software Tools for Technology Transfer manuscript No.
(will be inserted by the editor)

Distributed State Space Minimization

Stefan Blom¹, Simona Orzan²⋆

¹ CWI, The Netherlands, e-mail: [email protected]
² Eindhoven University of Technology, The Netherlands, e-mail: [email protected]

⋆ Partially supported by PROGRESS, the embedded systems research program of the Dutch organisation for Scientific Research NWO, the Dutch Ministry of Economic Affairs and the Technology Foundation STW, grant CES.5009.

The date of receipt and acceptance will be inserted by the editor

Abstract We present a new algorithm, and its distributed implementation, for reducing labeled transition systems modulo strong bisimulation. The base of this algorithm is the Kanellakis-Smolka 'naive method', which has a high theoretical complexity, but is successful in practice and well suited to parallelization. This basic approach is combined with optimizations inspired by the Kanellakis-Smolka algorithm for the case of bounded fanout, which has the best known time complexity. The distributed implementation is improved with respect to previous attempts by a better overlap between communication and computation, which results in an efficient usage of both memory and processing power. We also discuss the time complexity of this algorithm and show experimental results with sequential and distributed prototype tools.

1 Introduction

There is currently a lot of interest in building distributed model checking tools, both symbolic [11,2] and enumerative [24,16,15,1]. The symbolic tools manipulate a compressed representation of the state space. The enumerative tools explicitly compute all the states and transitions of the state space and can be sub-divided into on-the-fly and full-generation. An on-the-fly tool will compute the transitions leading from a state on demand, while it is checking a property. A full-generation tool will first compute the whole state space and only then start checking the property. The main advantage of on-the-fly tools is that if the property can be proved or disproved by exploring only a small part of the state space, the unnecessary generation of the rest of the state space is avoided. However, if proving a property requires visiting the whole state space, then full-generation tools have an important advantage: after being generated, a state space can be reduced modulo an equivalence that preserves the properties to be checked. The reduction can considerably reduce the size of the state space that needs to be verified. This is particularly important in cases where the original state space is too big to be verified on a single machine, like in two recent case studies with the µCRL toolset: a cache coherence protocol [20] and a model of JavaSpaces [21]. In both cases, verification on a single machine was possible for the reduced state space only. Moreover, state space reduction could only be performed using the distributed tool.

This paper proposes a new distributed solution for the problem of state space reduction modulo strong bisimulation equivalence. This is a widely used system equivalence which preserves all properties expressible as µ-calculus formulas.

We choose clusters of workstations as target architecture because it is the most common environment able to offer the memory and processing power required by model checking industrial applications. So, we are interested in message-passing algorithms that can handle very large problem instances on a comparatively small number of processors and that work well for the specific type of labeled transition systems representing state spaces. State spaces have bounded fanout and usually a small depth. The states are distributed evenly among the network nodes (workers), and the transitions are managed by the worker that owns their initial states.

History The strong bisimulation reduction is found in the literature under the name multiple relational coarsest partition problem (MRCPP): given a set S and a number of relations on S, ρ1 · · · ρr, find a partition of S into subsets S1 · · · Ss such that for any two subsets Si, Sj and any relation ρk, either all or none of the elements of Si are in the relation ρk with an element of Sj. Moreover, the coarsest partition with this property is required, that is, the one with the least number of subsets. MRCPP is an immediate generalization of the relational coarsest partition problem (RCPP), which
treats the case when the set of relations is a singleton. Let N, M be the sizes of the set S and of the relation, respectively. The most well-known solutions for RCPP are the O(MN) one proposed by Kanellakis and Smolka [14] and the later O(M log N) one by Paige and Tarjan [19]. [14] also contains an O(N log N) solution for the restricted case of bounded fanout. Long before these, Hopcroft described an O(N log N) algorithm for the deterministic case [12], when the relation is a function.

Towards a good distributed solution The main challenge in building good distributed tools is dividing the computation in such a way that communication is triggered rather infrequently, while at the same time avoiding large idle times. Ideally, workers have equal computation loads and they rarely need access to remote data. In our case, workers should be able to compute as much as possible from the states and transitions that they own.

All the algorithms above are based on partition refinement and they vary in how the refinement step is defined. In the naive algorithm of [14], the refinement consists in putting in different blocks any two states that can be distinguished with respect to the current partition. To distinguish states, it suffices to compare the sets of outgoing transitions, i.e., the sets of pairs (label, destination block). This ensures an independent treatment of the states, very suitable for parallelization. In the other, theoretically better, algorithms, it is essential that the states in the same block can be easily retrieved. To achieve this, extra administration is necessary (like sorting), which can be very expensive in a distributed setting.

A distributed implementation of the Kanellakis-Smolka naive algorithm has been presented in [5]. However, the question remains how to distribute one of the theoretically better algorithms, or at least use some of their tricks. Therefore in this paper we propose a new algorithm that keeps the simplicity and symmetry of the naive algorithm, while employing some optimizations similar to those used in the bounded fanout Kanellakis-Smolka [14] or Paige-Tarjan [19]. We will refer to the new algorithm as "optimized".

Naive versus optimized In our implementations, a unique ID (an integer) is assigned to each block and partitions are represented as arrays of IDs. The signature of a state x with respect to a partition is a set of pairs of labels and IDs, such that a pair (a, id) is in the set if and only if there is a transition with the label a from the state x to another state belonging to the block with the ID id. Two states are distinguishable with respect to a partition if they have different signatures with respect to that partition.

The naive algorithm [5] computes the signatures of all states in every iteration and randomly assigns IDs to each signature. It terminates when the number of signatures becomes stable.

The optimized algorithm doesn't recompute the signatures on each iteration. Instead, it modifies the old signatures. While this recomputation goes on, the states with modified signatures are marked. Next, we assign new IDs to the signatures of marked states as follows: if only some of the states in a block are marked then the signatures of the marked states all get new IDs and the unmarked states keep their old ID; if all states in a block are marked then the old ID is reused for the signature which occurs most often and new IDs are assigned to the others. This is similar to the strategy used in the Hopcroft or the Paige-Tarjan algorithm, which always split with respect to the smallest block. The algorithm terminates if there are no more changes.

A close variation of the optimized algorithm achieves a time complexity of O(N log N). Namely, this is the case if the old ID is always given to the most often occurring signature, not only when all states of the old block are unstable. However, to implement this, all the states belonging to a given block should be easily retrievable and this is not trivial to arrange in a distributed setting.

Related work Parallel versions of Kanellakis-Smolka and Paige-Tarjan have been proposed [25,22], with time complexities O(N^(1+ε)) using N^ε CREW PRAM processors (for any fixed ε < 1), and O(N log N) with O(M/N) CREW PRAM processors, respectively. These algorithms are designed for shared memory machines and they are difficult to translate efficiently to a distributed memory setting. It would however be interesting to see how they work on a virtual shared memory. We expect that the latency of the shared memory simulation would seriously affect their performance.

There exist on-the-fly algorithms for bisimilarity checking, both sequential [18] and distributed [13]. They are based on solving boolean equation systems and can be used to compare two state spaces with respect to an equivalence notion, while generating them. Our problem is rather to find the equivalence classes of a given state space, which is quite different and cannot be immediately solved by these algorithms. In fact, we are not aware of any algorithm that attempts to solve on-the-fly bisimilarity reduction.

Overview The next section introduces some definitions and formalizes the problem of reduction modulo strong bisimulation equivalence. In section 3 the optimization of the naive algorithm by using a marking procedure is discussed and it is justified that the (sequential) algorithm thus obtained is still correct. Also the theoretical complexity of the new algorithm is discussed. Then, in section 4, the distributed implementation of this new optimized algorithm is briefly explained. Some performance data is presented in section 5 and some concluding remarks in section 6.

2 Bisimilarity checking, bisimulation minimization and the Relational Coarsest Partition Problem

Let Act be a fixed set of labels, representing actions. A labeled transition system (LTS) is a triple (S, T, s0) consisting of a set of states S, a set of transitions T ⊆ S × Act × S and an initial state s0 ∈ S. When T is understood, we will use the notation p −a→ q for (p, a, q) ∈ T.
Definition 1. (strong bisimulation)
Let (S, T, s0) be an LTS. A binary relation R ⊆ S × S is a strong bisimulation if for all p, q ∈ S such that p R q:
– if p −a→ p′ then ∃q′ ∈ S : q −a→ q′ ∧ p′ R q′, and
– if q −a→ q′ then ∃p′ ∈ S : p −a→ p′ ∧ p′ R q′.

If a strong bisimulation R exists, such that p R q, then we say that p and q are bisimilar states. (S¹, T¹, s0¹) and (S², T², s0²), with S¹ ∩ S² = ∅, are bisimilar labeled transition systems if their initial states s0¹ and s0² are bisimilar in the compound LTS (S¹ ∪ S², T¹ ∪ T², s0¹).

The problem that we focus on, bisimulation minimization, is to find the equivalence classes of the largest strong bisimulation on the states of a given LTS. Or, in other words, given an LTS, find the LTS that is strongly bisimilar to it and has the minimal number of states.

A related problem is that of bisimilarity checking: given an LTS S = (S, T, s0) and two states p, q ∈ S, decide whether p and q are strongly bisimilar. This problem reduces to the minimization problem, since it suffices to check whether p and q are in the same equivalence class of the largest bisimulation relation definable on the states of S. The way of deciding whether two transition systems represent the same behavior is to apply a bisimulation minimization algorithm to the compound LTS (S¹ ∪ S², T¹ ∪ T², s0¹) and see whether s0¹ and s0² end up in the same class.

The bisimulation minimization problem is equivalent to the relational coarsest partition problem. For an LTS (S, T, s0), a partition of the elements of S is a set of disjoint blocks {Bi | i ∈ I} s.t. ∪_{i∈I} Bi = S. An equivalence relation can be represented as a partition with a block for every equivalence class. The relational coarsest partition problem is to find, for a given LTS and a given initial partition π0 of S, a partition π s.t.:
– π is a refinement of π0;
– ∀p, q ∈ B ∈ π : ∀a ∈ Act : ∀B′ ∈ π : ∃p′ ∈ B′ : (p, a, p′) ∈ T ⇐⇒ ∃q′ ∈ B′ : (q, a, q′) ∈ T;
– π has the fewest blocks among partitions satisfying the above two conditions.
(A partition π′ is a refinement of π if every block of π′ is contained in a block of π: ∀C ∈ π′ : ∃B ∈ π : C ⊆ B.)
The algorithm discussed in this paper solves the bisimulation minimization problem by solving the Relational Coarsest Partition Problem with π0 = {S}.

3 The optimized algorithm

In [5], an algorithm is presented that uses the set of all outgoing transitions (signatures) as the criterion to distinguish states, as opposed to theoretically more efficient algorithms that reason in terms of blocks. The signature of a state s with respect to a partition π is the set of all transitions going from s to blocks of π:

    sig_π(s) = {(a, B) | s −a→ s′ and s′ ∈ B ∈ π}

Two states are distinguishable with respect to a partition if and only if they have different signatures with respect to that partition. A partition π is called stable if every two states of every block in that partition are indistinguishable, i.e. have the same signature with respect to π.

While performing successive refinements, the signature-based bisimulation minimization algorithm keeps the states with the same signature in the same block. The correctness of this method follows from two facts. First, a stable partition is a bisimulation relation (states are equivalent if they are in the same block). Second, bisimilar states have the same signature with respect to every computed refinement. Hence, the computed bisimulation relation is the coarsest one.

We indicate the current partition by a function ID : S −→ Nat that assigns block identifiers to states. The naive algorithm proceeds roughly as follows (E is the number of equivalence classes in the current partition, oE is the number of
equivalence classes in the previous partition):

    ∀x ∈ S: ID(x) := 0; E := 1; oE := 0;
    while E ≠ oE
        ∀x: compute sig(x)
        assign new IDs to states, s.t. ID(x) = ID(y) ⇐⇒ sig(x) = sig(y)
        oE := E; E := number of new IDs used
In this scheme, all signatures are recomputed in every iteration, which can be an unnecessary and costly effort in the case of large input LTSs whose structure needs a lot of iterations to stabilize and where very few partition blocks can be split per iteration (very few signatures actually change).
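To make this concrete, here is a minimal sequential sketch of the naive loop in Python (the names are ours and purely illustrative; the actual prototype tools are not written in Python):

    from collections import defaultdict

    def naive_reduce(states, transitions):
        """Naive signature refinement [5]: recompute all signatures each round."""
        out = defaultdict(list)               # outgoing transitions per state
        for x, a, y in transitions:
            out[x].append((a, y))
        ID = {x: 0 for x in states}           # initial partition: a single block
        E, oE = 1, 0
        while E != oE:                        # stop when the block count is stable
            sig = {x: frozenset((a, ID[y]) for a, y in out[x]) for x in states}
            fresh = {}                        # one fresh ID per distinct signature
            for x in states:
                ID[x] = fresh.setdefault(sig[x], len(fresh))
            oE, E = E, len(fresh)
        return ID                             # final block identifier per state

Every iteration touches every transition; this is exactly the waste that the marking idea below removes.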
The main idea of our new optimized approach is to mark, in every iteration, those states that might have suffered a signature change, i.e. the states that have an outgoing transition to a state whose ID changed in the current iteration. Then, in the next iteration, only the signatures of the marked states need to be recomputed, since the other signatures do not change. Moreover, the unmarked states do not even need to be checked in the next refinement step: they just keep their old block identifier.
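The marking step itself can be sketched in the same illustrative style; the inverse transition relation built here plays the role of the In sets of the distributed implementation in Section 4:

    from collections import defaultdict

    def mark_unstable(transitions, old_ID, new_ID):
        """A state is unstable iff one of its successors changed block."""
        preds = defaultdict(set)              # inverse transition relation
        for x, a, y in transitions:
            preds[y].add(x)
        unstable = set()
        for y in old_ID:
            if new_ID[y] != old_ID[y]:        # y moved to a different block,
                unstable |= preds[y]          # so its predecessors are marked
        return unstable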
We will refer to the marked states as unstable. Note that, unlike other algorithms that mark whole blocks (like bounded fanout Kanellakis-Smolka), we insist on reasoning about unstable states and do not require that the states belonging to the same block be easily retrievable. Extra attention has to be paid to ensure the correctness of the splitting procedure, but it pays off, since the ability to work directly on states provides parallel/distributed workers with a high(er) degree of independence.
Figure 2 shows an example of how the bounded fanout Kanellakis-Smolka algorithm and the optimized algorithm refine the same state space with transitions labeled a (the thin ones) and b (the thick ones). Since Kanellakis-Smolka is actually defined for a single relation, we adopted a straightforward treatment of labels: refine alternately with respect to a and b, until the partition stabilizes. Note the difference in the aggressiveness of the splitting and, consequently, in the number of iterations.

Figure 2. A refinement example, as performed by the Kanellakis-Smolka bounded fanout algorithm (left) and our optimized algorithm (right). The two types of transitions represent two different labels. The unfilled circles represent the unstable states at the beginning of each iteration. [image not reproduced]
The optimized algorithm OSBR is presented in Figure 1 and uses the following notations and data structures:
– U, νU – the unstable states sets for the current and the next iteration, respectively;
– E – the number of blocks in the current partition. Throughout the algorithm, the invariant is kept that the blocks of the current partition are numbered {0 . . . E − 1};
– c : {0 . . . E − 1} −→ Nat – the number of states in each block;
– Reusable – the set of block identifiers that can be reused in the next iteration, since all the states belonging to those blocks are marked unstable. These identifiers should be reused in order to preserve the above mentioned invariant. Moreover, the identifier of every block should be reused for one of its own sub-blocks, to ensure termination of the series of iterations;
– ST – a signatures hashtable used to map the signatures of the current iteration to new IDs (the block identifiers of the next iteration).
ID^f is the final partition, the blocks of which represent the states of the minimized LTS.

    1.  E := 1; c(0) := |S|; U := S;
    2.  for all x ∈ S
            ID(x) := 0; sig(x) := ∅;
    3.  while U ≠ ∅
    4.      for all x ∈ U
                sig(x) := {(a, ID(y)) | x −a→ y};
    5.      Reusable := {i | 0 ≤ i < E ∧ c(i) = |U ∩ {x | ID(x) = i}|};
    6.      ST := ∅; νU := ∅;
    7.      for all x ∈ U
    8.          oid := ID(x);
    9.          if (sig(x), i) ∈ ST for some i then
    10.             ID(x) := i
                else
    11.             if ID(x) ∉ Reusable then
    12.                 c(E) := 0;
    13.                 ID(x) := E;
    14.                 E := E + 1;
                    else
    15.                 Reusable := Reusable − {ID(x)};
    16.             ST := ST ∪ {(sig(x), ID(x))};
    17.         if oid ≠ ID(x) then
    18.             νU := νU ∪ {y ∈ S | y −a→ x};
    19.         c(oid) := c(oid) − 1;
    20.         c(ID(x)) := c(ID(x)) + 1;
    21.     U := νU;
    22. for all x ∈ S: ID^f(x) := ID(x);

Figure 1. (OSBR) The optimized algorithm
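Steps 5-20, the identifier reuse, are the subtle part. The following compact Python rendering of one refinement pass is a sketch under the same illustrative conventions as above; count is a list indexed by block identifier, matching the array c of Figure 1, and the marking of νU is factored out into the changed list:

    def refine_step(unstable, sig, ID, count, E):
        """One pass of steps 5-20 of Figure 1 over the unstable states."""
        per_block = {}                        # unstable states per block
        for x in unstable:
            per_block[ID[x]] = per_block.get(ID[x], 0) + 1
        reusable = {i for i, n in per_block.items() if count[i] == n}
        table = {}                            # ST: signature -> new identifier
        changed = []
        for x in unstable:
            oid = ID[x]
            if sig[x] in table:
                ID[x] = table[sig[x]]
            elif oid in reusable:             # first state of a fully unstable
                reusable.discard(oid)         # block keeps the old identifier
                table[sig[x]] = oid
            else:                             # otherwise allocate a fresh one
                table[sig[x]] = ID[x] = E
                count.append(0)
                E += 1
            count[oid] -= 1
            count[ID[x]] += 1
            if ID[x] != oid:
                changed.append(x)             # predecessors become unstable next
        return E, changed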
The termination and correctness of OSBR follow from a few simple properties, listed below.

Lemma 1. Let U^n, sig^n, ID^n, E^n denote the set of unstable states, the signatures mapping, the ID mapping, and the number of equivalence classes at the beginning of the n-th iteration of the optimized algorithm (i.e. before the n-th execution of step 3) — the count starts at 0. The following properties hold, for any n ≥ 0:

1. (∀x ∈ S) ID^n(x) < E^n.
(The blocks of the current partition are numbered {0 · · · E − 1}.)
2. (∀ 0 ≤ i < E^{n−1}) ∃x ∈ S s.t. ID^n(x) = ID^{n−1}(x) = i, and (∀ 0 ≤ i < E^n) ∃x ∈ S s.t. ID^n(x) = i.
(Every identifier in the set {0 · · · E − 1} is really used. Old blocks pass their identifiers to sub-blocks of their own.)
3. (∀x ∈ S: ID^n(x) = ID^{n−1}(x)) iff U^n = ∅.
(A partition is final iff the unstable set becomes empty.)
4. (∀x, y ∈ S) ID^n(x) = ID^n(y) iff sig^n(x) = sig^n(y), and sig^n(x) ≠ sig^n(y) =⇒ sig^{n+1}(x) ≠ sig^{n+1}(y).
(The block identifiers are given correctly. The newer partitions are refinements of the older ones.)
5. E^{n−1} ≤ E^n, and E^n = E^{n−1} iff (∀x ∈ S: ID^n(x) = ID^{n−1}(x)).
(The number of blocks increases with the refinements, until it stops because the final partition is reached.)

Proof. The properties 1, 3 and 4 will be proved independently, by induction on n. Property 2 relies on 4, and property 5 relies on 1 and 2.

1. (∀x ∈ S) ID^0(x) = 0 < 1 = E^0. In every iteration, the only place where fresh values are introduced for ID is step 13 (in step 10 an old value is used). But E is also immediately increased (step 14), therefore the invariant stays true.

2. For n = 0 the property is obviously true, since ID^0(x) = 0 for all states x. Suppose it is true for E^{n−1}, ID^{n−1} and let us look at how E^n and ID^n are computed, in the (n−1)-th iteration. First, the set Reusable is constructed (step 5), containing the identifiers of the blocks whose states are all marked unstable. Let i be any identifier 0 ≤ i < E^n. We distinguish three cases:
– i ∈ Reusable. Then all states x with ID^{n−1}(x) = i (according to the induction hypothesis, there is at least one) must be in U^{n−1}. Let y be the first of these states that is handled in step 7 of the algorithm. sig^n(y) cannot already be in ST, since this would mean that there exists a state z from another block (ID^{n−1}(z) ≠ ID^{n−1}(y)) with the same signature (sig^{n−1}(z) = sig^{n−1}(y)), which contradicts point 4 of this lemma. Therefore, y will only be affected by steps 15 and 16, which do not modify the value of ID. Thus, ID^n(y) = ID^{n−1}(y) = i.
– i ∉ Reusable ∧ i < E^{n−1}. Then there must be a state x for which ID^{n−1}(x) = i and x ∉ U^{n−1}. It follows, since the steps sequence 7−20 does not regard x, that ID^n(x) = ID^{n−1}(x) = i.
– E^{n−1} ≤ i < E^n. This means that i is "created" in the steps 12−14 as identifier for a new block. In step 13 the ID^n of the first state of this block is explicitly defined as being i.

3. Let us consider an iteration n that satisfies ∀x ∈ S: ID^n(x) = ID^{n−1}(x). This means that in the iteration n−1, the condition in step 17, which compares exactly ID^n(x) and ID^{n−1}(x), was never satisfied, thus νU remains ∅, that is, U^n = ∅. The converse is also true: if U^n = ∅ then νU ended up empty in the previous iteration. This could only happen if the condition on line 17 was never met, that is, the value of ID was not changed for any state. Formally, ∀x ∈ S: ID^n(x) = ID^{n−1}(x).

4. We prove this by induction on n ≥ 0. The case n = 0 follows from the fact that (∀x) sig^0(x) = ∅ and (∀x) ID^0(x) = 0. To prove the first half of the invariant for an arbitrary n, we consider three cases:
– x, y ∈ U^{n−1}. In this case, both sig^n(x) and sig^n(y) are inserted in the hashtable ST, which ensures the same ID value for (and only for) equal signatures.
– x, y ∉ U^{n−1}. Then the signatures and IDs do not change, i.e. sig^n(x) = sig^{n−1}(x), ID^n(x) = ID^{n−1}(x) and sig^n(y) = sig^{n−1}(y), ID^n(y) = ID^{n−1}(y). From the induction hypothesis, it follows that sig^n(x) = sig^n(y) iff ID^n(x) = ID^n(y).
– x ∈ U^{n−1} and y ∉ U^{n−1}. Then there must be a state z that has caused the instability of x, i.e. there is a transition x −a→ z with ID^{n−1}(z) ≠ ID^{n−2}(z). Then ID^{n−1}(z) = i ≥ E^{n−2}, therefore the pair (a, i) cannot be in sig^{n−1}(y). And since sig^n(x) is recomputed and sig^n(y) is not, it follows that sig^n(x) ≠ sig^n(y). It remains to prove that ID^n(x) ≠ ID^n(y) as well. Let us first notice that

    if ID^n(x) ≠ ID^{n−1}(x) then ID^n(x) ≥ E^{n−1}.    (1)

If sig^{n−1}(x) = sig^{n−1}(y), then (by the induction hypothesis) ID^{n−1}(x) = ID^{n−1}(y) = i with i ∉ Reusable (since y ∉ U^{n−1}); thus ID^n(x) ≠ i, thus by (1) ID^n(x) ≥ E^{n−1}, while ID^n(y) = ID^{n−1}(y) < E^{n−1}. If, on the contrary, sig^{n−1}(x) ≠ sig^{n−1}(y), then ID^{n−1}(x) ≠ ID^{n−1}(y) (induction hypothesis). ID^n(x) is computed in the fragment 7−20 and the outcome can be ID^n(x) = ID^{n−1}(x) ≠ ID^{n−1}(y) = ID^n(y), or ID^n(x) ≠ ID^{n−1}(x). In the latter case, ID^n(x) ≥ E^{n−1} by (1), while ID^n(y) = ID^{n−1}(y) < E^{n−1}.

And now we prove the second half of the property. Let x and y be two states for which sig^n(x) ≠ sig^n(y). Then (w.l.o.g.) there is some pair (a, ID^{n−1}(z)) ∈ sig^n(x) and ∉ sig^n(y). If sig^n(y) does not contain any pair (a, j) then clearly sig^{n+1}(x) ≠ sig^{n+1}(y). Otherwise, let y −a→ t be any of the a-transitions from y. Then (a, ID^{n−1}(t)) ∈ sig^n(y) and ID^{n−1}(t) ≠ ID^{n−1}(z), which means (induction hypothesis) that sig^{n−1}(t) ≠ sig^{n−1}(z) and, further, sig^n(t) ≠ sig^n(z). Above we have proved that this is equivalent to ID^n(z) ≠ ID^n(t). Thus, sig^{n+1}(x) contains the pair (a, ID^n(z)) and sig^{n+1}(y) does not.

5. From points 1 and 2 of this lemma it follows that (∀n) E^n is exactly the number of different values of ID^n. Therefore, if ∀x ∈ S: ID^n(x) = ID^{n−1}(x), then obviously E^n = E^{n−1}.
We will now prove the converse statement. Let n be such that E^n = E^{n−1} and suppose there exists an x ∈ S with ID^n(x) = i ≠ ID^{n−1}(x). Property 2 says that there exists y ∈ S such that ID^n(y) = ID^{n−1}(y) = i. But this would mean ID^n(x) = ID^n(y) and ID^{n−1}(x) ≠ ID^{n−1}(y), which is in contradiction with property 4. □

Theorem 1. (termination and correctness of OSBR) For any LTS (S, T, s0), OSBR terminates and the equivalence relation ∼ determined by ID^f (x ∼ y ⇐⇒ ID^f(x) = ID^f(y)) is the largest strong bisimulation on S.

Proof. It is easy to see that for any iteration n > 0, E^n ≥ E^{n−1}. It is also clear that E^n > E^{n−1} can happen only finitely often, since from points 1 and 2 of Lemma 1 it follows that ∀n: E^n ≤ |S|. Hence eventually E^n = E^{n−1} and then the algorithm stops (points 3 and 5 of Lemma 1 and the exit condition of the loop at step 3). This proves termination.

We will now justify that ∼ is a strong bisimulation on S. Let nf be the number of the last iteration, that is, ID^f = ID^{nf}. Let x and y be any two equivalent states and let x −a→ z be any transition from x. To prove that ∼ is a strong bisimulation, we have to prove that there exists a transition y −a→ t with t ∼ z. From ID^{nf}(x) = ID^{nf}(y) and property 4 of Lemma 1 it follows that sig^{nf}(x) = sig^{nf}(y). Then, since (a, ID^{nf−1}(z)) must be in sig^{nf}(y), there is a state t with ID^{nf−1}(t) = ID^{nf−1}(z) and (a, ID^{nf−1}(t)) ∈ sig^{nf}(y). But nf is the final iteration, thus U^{nf} = ∅, which implies (Lemma 1, property 3) ID^{nf}(t) = ID^{nf−1}(t), and similarly for z. Thus, ID^{nf}(t) = ID^{nf}(z), or in other words t ∼ z.

Finally, to prove that ∼ is the coarsest strong bisimulation, let ∼′ be any other strong bisimulation; we show that ∀x, y ∈ S: x ∼′ y =⇒ x ∼ y. To this end, we prove by induction that ∀n ≥ 0: x ∼′ y =⇒ ID^n(x) = ID^n(y). The base case n = 0 is immediate. Suppose the statement is true for n−1 and let x, y be two states such that x ∼′ y. Then for every x −a→ z there is y −a→ t with z ∼′ t, and thus also (induction hypothesis) ID^{n−1}(z) = ID^{n−1}(t). According to the signature definition, this means that sig^n(x) = sig^n(y). From property 4 of Lemma 1, ID^n(x) = ID^n(y). □

3.1 Remarks on the time complexity

The optimized algorithm uses a much more careful refinement strategy than the naive algorithm, which results in a better practical performance in many cases. This is supported by the experiments presented in Section 5.

The theoretical time complexity is a point that deserves some attention. Since in the worst case the optimized algorithm behaves just like the naive one, the complexity of OSBR is not worse than O(MN). With Example 1 we show that this is a tight bound for the implementation presented in Figure 1. However, with a very small but significant modification, the optimized algorithm reaches O(N log N). In Figure 3 we show this modified version and below we prove its complexity. We show by Example 2 that this bound is strict as well.

    ∀x ∈ S: ID(x) := 0; E := 1; oE := 0; U := S;
    while E ≠ oE
        ∀B ∈ π s.t. B ∩ U ≠ ∅
            – ∀x ∈ B ∩ U: compute sig(x)
            – split B ∩ U into B1 · · · Bp
            – assign the old identifier of B to the largest block among B − U, B1, · · ·, Bp
            – assign new IDs to the other new blocks
        oE := E; E := number of new IDs used

Figure 3. An O(N log N) algorithm (not implemented)

Theorem 2. (complexity) For any LTS (S, T, s0) with bounded fanout, the algorithm in Figure 3 terminates in O(N log N) steps, where N is the size of S.

Proof. When a state changes its ID, the size of its new block is at most half of the size of its old block; otherwise, the old ID would be kept. This means that any state changes its ID at most log N times.

Let (s, t) be any transition. Since t can change its ID at most log N times, this transition will be used for an update request at most log N times. That is, the maximal number of update requests due to state t is log N. Since there are N states, the total number of update requests during the whole run of the algorithm is at most N log N. Every signature check is triggered by one or more update requests, therefore we will have at most N log N signature checks.

The signature checks are expensive in the general case, but they are possible in constant time when we consider bounded fanout. Hence, the total complexity is O(N log N). □
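The halving argument at the start of this proof can be spelled out in one line: after k identifier changes, the block of a state has size at most N/2^k, and a block contains at least one state, so

    N / 2^k ≥ 1  =⇒  k ≤ log₂ N,

which is where the log N factor comes from.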
In order to achieve this bound in practice, a completely different implementation is needed, with carefully chosen data structures that allow the fast retrieval of states from the same block. But this is exactly what we tried to avoid, because this operation is not natural in a distributed setting. Therefore, we have chosen the theoretically less competitive but practically successful implementation presented in Figure 1, where, when a block is split, the old identifier is passed on to the stable sub-block, which is not necessarily the largest sub-block.

Example 1. The optimized algorithm can really reach O(MN) computation steps. Consider for example the state space in Figure 4, with N states and N transitions. After the first iteration, two blocks are identified, one containing the states with the signature {0, 0}, the other {1, 0}. The first has N/2 − 1 states, the latter N/2 + 1. Therefore, the second keeps the identifier 0 and the first block gets a new identifier. Consequently, N/2 − 2 states are marked unstable, while the last state in the 0-chain is not, which means that all the N/2 − 2 states will change their identifier in the next iteration. This leads to the first N/2 − 3 states in the 0-chain being marked unstable. In the next iterations, the same effect is repeated. Over the whole run, there will have been (N/2 − 2) + (N/2 − 3) + · · · + 2 + 1 unstable markings (i.e., signature recomputations). This means O(N²).
Figure 4. Example 1, showing that the bound O(N²) for the optimized algorithm is tight. [image not reproduced]
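For the record, the number of signature recomputations in Example 1 sums to

    (N/2 − 2) + (N/2 − 3) + · · · + 2 + 1 = (N/2 − 2)(N/2 − 1)/2,

which is Θ(N²), confirming that the O(MN) bound (here M = N) is indeed reached.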
Example 2. Consider the state space in Figure 5. It is a mirrored binary tree, with transitions labelled a in the upper part, and transitions labelled 0 and 1 in the mirrored part. This shape can be easily extended to obtain a state space with O(N) states, O(log N) of which are placed on the mirror line. Note that every state has a unique trace to the final bottom state (deadlock). This means that if we apply the minimization algorithm, every state ends up in its own block. After the first iteration, the states in the upper part of the state space are all put together in one block, and the lower part is split into 3 blocks: the bottommost state, which has no outgoing transitions, and two other blocks, one of the states having 0 as outgoing transition, the other 1. Due to the symmetry of the construction, these two large blocks are equal. In every further iteration, every large block splits again into a singleton block (the bottommost state of the old block) and two other blocks, equal in size. (The argument for this is too tedious and we omit it.) Therefore, the reduction needs O(N log N) steps.

Figure 5. Example 2, showing that the bound O(N log N) for the algorithm in Figure 3 is tight. [image not reproduced]

4 The distributed implementation

4.1 Framework, assumptions

Our target architecture is a cluster whose nodes are connected by a high bandwidth network (Distributed Memory Machine). We assume that both the nodes and the network are reliable (no node failures, no message loss) and that the order of messages between nodes is preserved. Processes communicate by executing send destination-process message (non-blocking) and receive message (blocking). We assume that messages with the same source and destination keep their order. A message is a structure with a tag field (newsig, newid, update etc.) and data whose meaning depends on the tag.

The input of our algorithm is an LTS (S, T, s0) with N states and M transitions, and it has bounded fanout, which is a reasonable assumption for state spaces. An important hypothesis is that the input size (given by N and M) is much bigger than the number of processors available. That is what makes our framework different from other parallel implementations that assign a processor to each state and each transition.

4.2 Description

There are W workers, each consisting of two threads: a segment manager, which maintains a part (a segment) of the LTS and computes the signatures of the unstable states, and a signatures server, which maintains a part of the signature table ST and computes the new IDs. The data structures occurring in Figure 1 get distributed as follows:
– worker i, or rather segment manager i, is responsible for a subset S_i of S, with S_i ∩ S_j = ∅ for all i ≠ j and ∪_i S_i = S. The function SM : S −→ {0 . . . W − 1} maps every state to its base segment manager.
– the transition set T generates for every segment manager i the sets

    In_i = {(x, a, y) | y ∈ S_i ∧ x −a→ y}
    Out_i = {(x, a, ID(y)) | x ∈ S_i ∧ x −a→ y},

where ID identifies the current partition.
– the unstable states sets U, νU are maintained by the managers in the form of U_i = U ∩ S_i and νU_i = νU ∩ S_i, respectively.
– the set of block identifiers {0 . . . E − 1} gets divided into the disjoint sets IDS_0 . . . IDS_{W−1} and distributed to the W signatures servers by a mapping SS. Server i also maintains the part of the counts array c and of the signature table ST corresponding to IDS_i:

    ST_i = {(oid, s, Lx) | SS(oid) = i ∧ Lx = [x ∈ S | ID(x) = oid ∧ sig(x) = s]}

Here Lx is the list of all unstable states that have s as their signature. Lx is necessary because, unlike in the sequential implementation, in the distributed one it is not possible to generate the new ID at the moment of signature insertion.
In the distributed algorithm, like in the sequential one, a series of iterations is executed. In between iterations, workers synchronize in order to decide whether the final partition has been reached. The computation inside an iteration is asynchronous and directed only by messages, as sketched in Figure 6. There are five phases distinguishable within an iteration:
– managers compute the signatures of the unstable states and send them (newsig) to the appropriate servers;
– servers receive the signatures (newsig) and insert them in their local ST;
– servers compute new IDs for the unstable states and send them (newid) back to the managers;
– managers receive the new IDs for their unstable states (newid) and send messages to the parent states of their own states that changed ID (update);
– managers receive and process the update messages (update).

    SEGMENT MANAGER i
    1   νU_i := ∅
    2   for all x ∈ U_i
            compute sig(x)
            send SS(ID(x)) ⟨newsig : ID(x), sig(x), x⟩
    3   loop
            receive a message
            case ⟨newid : x, i⟩:
                for all y : (y, a, x) ∈ In_i
                    send SM(y) ⟨update : y, a, ID(x), i⟩
                ID(x) := i
            case ⟨update : x, a, oid, i⟩:
                Out_i := Out_i − (x, a, oid) + (x, a, i)
                νU_i := νU_i ∪ {x}

    SIGNATURES SERVER i
    1   ST_i := ∅
    2   loop
            receive ⟨newsig : oid, s, x⟩
            if (oid, s, Lx) ∈ ST_i then Lx := Lx + [x]
            else ST_i := ST_i ∪ {(oid, s, [x])}
        Reusable_i := {oid ∈ IDS_i | c(oid) = Σ_{(oid,s,Lx) ∈ ST_i} |Lx|}
        decide on νIDS_i
    3   for all (oid, s, Lx) ∈ ST_i
            if oid ∉ Reusable_i then take id from νIDS_i
            else id := oid
            for all x ∈ Lx
                send SM(x) ⟨newid : x, id⟩
    4   re-balance c, νIDS

Figure 6. (ODBR) A distributed iteration

Due to the division of tasks between managers and servers, the first and the second phase happen in parallel (step 2 in Figure 6). Also the last three (step 3) are overlapped. The overlapping limits the amount of CPU idle time, by allowing computation and communication to proceed in parallel. For instance, the servers can already proceed with inserting signatures in the table while managers prepare and send more signature messages. In the actual runs of the program, a worker (manager + server) uses one processor.
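To illustrate step 3 of the segment manager, here is a hedged Python sketch of its message loop. The queue-like inbox, the 'done' control message and the helper names are our inventions; the real tool exchanges MPI messages:

    def manager_loop(inbox, send, SM, In, Out, ID, unstable_next):
        """React to newid/update messages until the iteration ends (Figure 6).
        In maps a state to its list of (predecessor, label) pairs; Out is a
        set of (state, label, block-ID) triples."""
        while True:
            tag, *data = inbox.get()          # blocking receive
            if tag == 'newid':                # a server assigned state x its ID
                x, new = data
                if new != ID[x]:              # only a real change matters
                    for y, a in In.get(x, ()):                 # notify owners
                        send(SM(y), ('update', y, a, ID[x], new))  # of preds
                ID[x] = new
            elif tag == 'update':             # a successor of x changed block
                x, a, oid, new = data
                Out.discard((x, a, oid)); Out.add((x, a, new))
                unstable_next.add(x)          # x's signature must be recomputed
            elif tag == 'done':               # hypothetical end-of-iteration mark
                return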
The main advantage of overlapping the phases is the memory gain: since the consumers and producers of messages are active at the same time, the messages don't have to be stored. Thus, less memory is used.

4.3 Correctness argument

Lemma 2. The following properties hold for the distributed algorithm:
1. in every iteration, the signatures of the states in the same block are sent to the same server;
2. every time a block splits, one of the new blocks gets the old ID;
3. in every iteration, finitely many newsig and newid messages are generated;
4. in every iteration, a received newid message generates finitely many update messages;
5. (∀n > 0) if ID_d^n is the state partition at the beginning of iteration n of ODBR, and ID_s^n is the state partition at the beginning of iteration n of OSBR, then (∀x, y ∈ S) ID_d^n(x) = ID_d^n(y) ⇐⇒ ID_s^n(x) = ID_s^n(y).

Proof. 1. Indeed, if ID(x) = ID(y) = i, both sig(x) and sig(y) are sent (step 2 in the segment manager) to the signature server responsible for i, SS(i).

2. Consider a block with the identifier i. If there are states x ∈ S_j − U_j with ID(x) = i, then it is clear: all these states are not touched this iteration, i.e. they keep their old ID. If, on the contrary, all the states x with ID(x) = i are in some unstable set (∀x ∈ S with ID(x) = i, ∃j : x ∈ U_j), then all their signatures will be computed and sent to the same server (step 2 in the segment manager). At the signature server side, all these signatures get inserted in ST_i and counted – and i is added to the Reusable_i set. Further, in step 3, when the first triple (i, s, Lx) is encountered, all the states in Lx get i as their new ID.

3. The number of newsig and newid messages is limited by the total size of the sets U_i, i.e. by |S|.

4. For each ⟨newid : x, i⟩ message, |In_i| messages (that is, ≤ |S|) with the tag update are sent.

5. By induction on n. □

Theorem 3. (termination and correctness of ODBR) For any LTS (S, T, s0), ODBR terminates and the ID_d^f function computed is the same as the ID^f computed by OSBR.

Proof. The properties (1), (2) from Lemma 2 ensure that the invariants from Lemma 1 are also true in the distributed implementation ODBR. (3), (4) ensure that the computation within an iteration terminates. The global termination is justified by the one-to-one mapping between iterations of the sequential algorithm OSBR and iterations of the distributed implementation ODBR (5). From (5) and the correctness of OSBR (Theorem 1) it follows that the partition computed is indeed the correct one. □
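The synchronization between iterations mentioned in Section 4.2 amounts to a global test of whether any worker changed an identifier. With MPI (which the experiments below use, though not from Python) this is a single reduction; a purely illustrative mpi4py sketch:

    from mpi4py import MPI

    def partition_is_final(num_local_id_changes):
        """All workers agree the partition is final iff nobody split a block."""
        total = MPI.COMM_WORLD.allreduce(num_local_id_changes, op=MPI.SUM)
        return total == 0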
5 Experiments

We implemented both the sequential and the distributed versions of the optimized algorithm and compared their performance with the naive ones. The experiments were done on an 8 node dual CPU PC cluster and an SGI Origin 2000.¹ The test set consists of state spaces generated by case studies carried out with the µCRL toolset [4]. Problem sizes before and after reduction can be found in Table 1. 1394-LL and 1394-LE are models of the firewire link layer [17] and of the firewire leader election protocol with 17 nodes [23]. CCP-2p3t is a cache coherence protocol model with two processes and 3 threads [20], and CCP is an older (and smaller) variant of it. lift5 and lift6 are models of a distributed lift system with 5 and 6 legs [10]. token ring is the model of a Token Ring leader election for 4 stations.²

¹ The cluster nodes are dual AMD Athlon MP 1600+ machines with 2G memory each, running Linux and connected by gigabit ethernet. The Origin 2000 is a ccNUMA machine with 32 CPUs and 64G of memory running IRIX, of which we used 16 MIPS R10k processors. On the cluster, we used LAM/MPI 6.5. On the SGI, we used the native MPI implementation.
² The original LOTOS model [8] was translated to µCRL by Judi Romijn and extended from 3 to 4 stations.

    problem        original                            minimized
                   states   transitions  disk space    states   transitions  number of
                   (in 10^6) (10^6)      (MB)          (10^6)   (10^6)       iterations
    CCP            0.21     0.68         15            0.077    0.24         66
    1394-LL        0.37     0.64         15            0.034    0.076        73
    lift5          2.2      8.7          101           0.032    0.14         86
    CCP-2p3t       7.8      59           678           1.0      6.6          94
    token ring     19.9     132          1513          8.4      51.1         6
    lift6          33.9     165          1898          0.12     0.65         91
    1394-LE        44.8     387          4430          1.1      7.7          51

Table 1. Problem sizes.

5.1 Sequential tools compared

In [5], the usefulness of the signature approach was proved, by analysis and performance comparisons of the naive algorithm with existing tools. In order to justify the good performance of the marking procedure, we first present a comparison between the naive and the optimized sequential implementations (Table 2). The tests were run on one of the cluster machines. It is clear from this table that the marking procedure (used by the optimized version) can give significant gains in time – see the numbers for both cache coherence protocols. The sequential optimized implementation needs more memory than the naive one, since it keeps both the forward and the inverse transition systems. On the other hand, the naive one consumes more memory for the hashtable – all signatures have to be inserted, while only some have to be considered by the optimized implementation. Therefore, we expect that the optimized version will be less memory expensive than the naive one when it comes to large examples. The distributed implementation confirms this idea.

    problem        bcg_min            naive              optimized
                   time(s)  mem(M)    time(s)  mem(M)    time(s)  mem(M)
    CCP            15.0     18        21.3     20        4.5      18
    1394-LL        18.5     19        6.2      14        3.3      21
    lift5          113      184       64       123       43       214
    CCP-2p3t       -        -         4363     968       779      1187

Table 2. A comparison of single threaded tools. The times include the I/O operations.

5.2 Distributed tools compared

Table 3 shows a comparison of the naive and optimized distributed implementations on the cluster, for a number of large LTSs. The numbers listed for the memory usage represent the maximum total memory touched on all 8 workstations during a run.

    problem        d. naive – 16 CPUs    d. opt – 16 CPUs
                   time(s)   mem(M)      time(s)   mem(M)
    lift5          33        460         20        480
    CCP-2p3t       550       4430        104       1658
    token ring     120       10802       231       4508
    lift6          702       5958        346       3834
    1394-LE        555       15388       428       8737

Table 3. A comparison of distributed implementations. The times are without I/O.

The runs indicate that the optimized implementation outperforms the naive one most of the time. The optimized version is designed to perform better when the partition refinement series needs a large number of iterations to stabilize, yet very few blocks split in every iteration. This is exactly the case for the CCP state space. On the other hand, for state spaces like the Token Ring protocol, where almost all blocks split in every iteration and the whole process ends in just a few rounds, the naive version works faster, since it doesn't waste time on administration issues. In all larger examples, though, the memory gain is obvious – and for the bisimulation reduction problem, memory is a more critical resource than time.

To test how the optimized distributed algorithm scales, we ran on the cluster a series of experiments using 1-8 machines (2-16 processors).
Figure 7 shows the runtimes (in seconds) and the memory usage needed to reduce lift6 and CCP-2p3t. Since lift6 is a real industrial case study with serious memory requirements, it couldn't be run single threaded on a cluster node, or distributed on fewer than 3 nodes. We see that for both distributed implementations and both case studies presented, the memory usage scales well, i.e. the total memory needed on the cluster is almost constant, regardless of the number of machines used. Hence, more machines available will mean fewer resources occupied on each machine.

Figure 7. Runtimes and memory usage for CCP-2p3t and for lift6. [plots not reproduced: naive and optimized implementations on the cluster, plotted against the number of CPUs (1-16)]

On runtimes, however, the naive implementation scales in a more predictable manner, while the optimized times don't seem to scale up as nicely. This is partly due to the nondeterminism present in the optimized implementation – signatures can arrive at servers in any order, the order influences the assignment of new IDs to states, the new IDs determine how many unstable states there are in the next iteration, and thus how much time that iteration will cost, etc. It is also due to the possibly unbalanced distribution of signatures to servers, which introduces unpredictable idle times. Last, there is some latency due to the MPI implementation. We compared (Figure 8) the reduction of lift5 on the cluster with the reduction on a shared memory machine that uses its native MPI implementation. It appears that the optimized algorithm does scale better on this other MPI.

Figure 8. Runtimes for lift5 on the SGI and on the cluster. [plot not reproduced: curves for cluster single CPU/node, cluster dual CPU/node, cluster single threaded, SGI, and SGI single threaded, against 1-16 CPUs]

5.3 The VLTS test suite

After analysing the behaviour of the two algorithms on some special case studies, we turn to "anonymous" state spaces from the VLTS benchmark [6]. Figure 9 shows the times and total memory usage of the optimized algorithm relative to those of the naive algorithm. Unlike the other measurements presented, the times considered now are total, that is, the I/O operations are included. The 25 state spaces in this selection are small to medium size (between 0.06 and 12 million states, and between 0.3 million and 60 million transitions) and get reduced modulo strong bisimulation in less than 100 iterations.
Figure 9. The VLTS test suite. [plots not reproduced: total runtimes and total memory used on 8 processors (4 machines), optimized relative to naive, plotted against the number of iterations]

The stars mark the very small state spaces, i.e. those that get reduced in less than 5 seconds by both algorithms. We present the state spaces ordered by the number of iterations in which the reduction procedure stabilizes. This is a relevant order only for the time performance, not for the memory usage.

As apparent from the figure, the relative time performance of the optimized version is indeed influenced by the number of iterations and the size of the state space. This is roughly because, compared to the naive version, it spends (much) more time on the initial setup – and this time pays back only if the reduction process has some length. Note that for very short reductions it can be almost 3 times slower than the naive version, but for lengthy ones it is usually much faster (up to 6 times faster).

Regarding the memory usage, we may notice that the optimized version is indeed almost always an improvement. Exceptions are the small state spaces, where the fixed size buffers used by the optimized implementation are significantly larger than needed. This could be fixed by using dynamic buffers.

6 Conclusions

We designed and implemented an optimized version of the algorithm described in [5]. The optimized version uses a marking technique for incremental computation of partitions and it allows communication and computation to proceed in parallel. The result is a distributed strong bisimulation reduction tool, which outperforms its straightforward counterpart in memory usage and, in most cases, also in time.

The optimized algorithm doesn't improve on the worst-case theoretical complexity (O((MN + N²)/W), where M is the number of transitions and N the number of states), since there exist input LTSs where all states will be marked unstable in every iteration. As the cost of marking is linear in the number of transitions, the complexity doesn't get worse either. In practice, the cost of processing a state using marking is typically twice the cost without. Hence, marking wins if on average less than 50% of the states are marked. On most practical state spaces this condition holds and marking visibly improves the performance.

The gain in time comes from the more elaborate treatment of partition refining. The gain in memory usage is due to two elements. First, the same improved refinement procedure makes sure that the hashtable accommodates fewer signatures, and thus consumes less memory. The second and more important reason is that computation and communication are not separate phases anymore, but are interleaved, saving this way the memory needed for storing intermediate results.

The concept of signature refinement also works for other equivalences, like branching bisimulation [9], weak bisimulation and τ*a equivalence as defined in [7].

Another promising approach to generation, reduction and model checking of large state spaces is using the disk as extra storage. This technique would allow increasing the size of the models that can be verified without resorting to better equipment. There has not been much research in this direction yet.

References

1. J. Barnat, L. Brim, and J. Stříbrná. Distributed LTL model-checking in SPIN. In Proceedings SPIN'01, volume 2057 of LNCS, pages 200–216, 2001.
2. G. Behrmann, T. Hune, and F.W. Vaandrager. Distributed timed model checking - How the search order matters. In Proceedings CAV'00, volume 1855 of LNCS, pages 216–231, 2000.
3. G. Berry, H. Comon, and A. Finkel, editors. Proceedings CAV'01, volume 2102 of LNCS, 2001.
4. S.C.C. Blom, W.J. Fokkink, J.F. Groote, I. van Langevelde, B. Lisser, and J.C. van de Pol. µCRL: A toolset for analysing algebraic specifications. In Proceedings CAV'01, volume 2102 of LNCS, pages 250–254, 2001.
5. S.C.C. Blom and S.M. Orzan. A distributed algorithm for strong bisimulation reduction of state spaces. In Proceedings PDMC'02, volume 68 of ENTCS, 2002.
6. CWI/SEN2 and INRIA/VASY. The VLTS benchmark. http://www.inrialpes.fr/vasy/cadp/resources/benchmark_bcg.html.
7. J.-C. Fernandez and L. Mounier. Verifying bisimulations "on the fly". In Proceedings FORTE'90, 1990.
8. H. Garavel and L. Mounier. Specification and verification of various distributed leader election algorithms for unidirectional ring networks. Science of Computer Programming, 29(1–2):171–197, 1997.
9. R.J. van Glabbeek and W.P. Weijland. Branching time and
abstraction in bisimulation semantics. Journal of the ACM,
43(3):555–600, 1996.
10. J.F. Groote, J. Pang, and A.G. Wouters. Analyzing a distributed
system for lifting trucks. Journal of Logic and Algebraic Pro-
gramming, 55(1–2):21–56, 2003.
11. O. Grumberg, T. Heyman, and A. Schuster. Distributed sym-
bolic model checking for µ-calculus. In Berry et al. [3], pages
350–362.
12. J.E. Hopcroft. An n log n algorithm for minimizing the states
in a finite automaton. In The Theory of Machines and Compu-
tations, pages 189–196. Academic Press, 1971.
13. C. Joubert and R. Mateescu. Distributed on-the-fly equivalence
checking. In Proceedings PDMC’04, ENTCS, 2004. To appear.
14. P.C. Kanellakis and S.A. Smolka. CCS expressions, finite state
processes and three problems of equivalence. In Proceedings
of 2nd Annual ACM Symposium on Principles of Distributed
Computing, pages 228–240, 1983.
15. F. Lerda and R. Sisto. Distributed-memory model checking with SPIN. In Proceedings SPIN'99, volume 1680 of LNCS, 1999.
16. M. Leucker and T. Noll. Truth/SLC - A parallel verification
platform for concurrent systems. In Berry et al. [3], pages 255–
259.
17. S.P. Luttik. Description and formal specification of the Link
Layer of P1394. In Proceedings of the 2nd International Work-
shop on Applied Formal Methods in System Design, 1997.
18. R. Mateescu. A generic on-the-fly solver for alternation-free boolean equation systems. In Proceedings TACAS'03, volume 2619 of LNCS, pages 81–96, 2003.
19. R. Paige and R. Tarjan. Three partition refinement algorithms.
SIAM Journal of Computing, 16(6):973–989, 1987.
20. J. Pang, W.J. Fokkink, R. Hofman, and R. Veldema. Model
checking a cache coherence protocol for a Java DSM imple-
mentation. In Proceedings FMPPTA’03, 2003.
21. J.C. van de Pol and M. Valero Espada. Verification of JavaS-
paces parallel programs. In Proceedings ACSD’03, pages 196–
205, 2003.
22. S. Rajasekaran and I. Lee. Parallel algorithms for relational
coarsest partition problems. IEEE Transactions on Parallel and
Distributed Systems, 9(7):687–699, 1998.
23. J.M.T. Romijn. A timed verification of the IEEE 1394 leader election protocol. Formal Methods in System Design, 19(2):165–194, 2001.
24. U. Stern and D. Dill. Parallelizing the Murφ verifier. In Pro-
ceedings CAV’97, volume 1254 of LNCS, pages 256–278, 1997.
25. S. Zhang and S.A. Smolka. Towards efficient parallelization of
equivalence checking algorithms. In Proceedings FORTE’92,
volume C-10 of IFIP Transactions, pages 133–146, 1993.
