A Parallel GED Algorithm - Zeina
Graph edit distance (GED) has emerged as a powerful and flexible graph matching
paradigm that can be used to address different tasks in pattern recognition, machine
learning, and data mining. GED is an error-tolerant graph matching technique that
computes the minimum-cost sequence of basic edit operations needed to transform one
graph into another by means of insertion, deletion and substitution of nodes or edges.
Unfortunately, computing GED is an NP-hard combinatorial optimization problem, so
the question of devising fast and accurate algorithms is of primary interest. In this paper, a parallel
algorithm for exact GED computation is proposed. Our proposal is based on a branch-
and-bound algorithm coupled with a load balancing strategy. Parallel threads run a
branch-and-bound algorithm to explore the solution space and to discard misleading
partial solutions, while the load balancing scheme ensures that no thread
remains idle. Experiments on 4 publicly available datasets empirically demonstrated that,
under time constraints, our proposal drastically improves over a sequential approach and
a naive parallel approach. Our proposal was compared to 6 other methods and provided
more precise solutions while requiring low memory usage. Experiments also showed that
having more precise solutions does not necessarily lead to higher classification rates. Such
a result raises the question of the benefit of more precise solutions in a classification
context.
Keywords: Graph Matching; Parallel Computing; Graph Edit Distance; Pattern Recog-
nition; Load Balancing.
1. Introduction
Attributed graphs are powerful data structures for the representation of complex
entities. In a graph-based representation, vertices and their attributes describe ob-
jects (or part of objects) while edges represent interrelationships between the ob-
jects. Due to the inherent genericity of graph-based representations, and thanks to
the improvement of computer capacities, structural representations have become
more and more popular in the field of Pattern Recognition.
Graph edit distance (GED) is a graph matching paradigm whose concept was
first reported in 37 . Its basic idea is to find the best set of transformations that can
transform graph G1 into graph G2 by means of edit operations on graph G1 . The
December 15, 2016 15:17 WSPC/INSTRUCTION FILE ws-ijprai
allowed operations are inserting, deleting and/or substituting vertices and their
corresponding edges. GED can be used as a dissimilarity measure for arbitrarily
structured and arbitrarily attributed graphs. In contrast to other approaches, it does
not suffer from any restrictions and can be applied to any type of graph (including
hypergraphs 22 ). The main drawback of GED is its computational complexity which
is exponential in the number of vertices of the involved graphs.
Many fast heuristic GED methods have been proposed in the literature 41,43,18,20,38,19,6.
However, these heuristic algorithms can only find unbounded sub-optimal values.
On the other hand, only few exact approaches have been proposed 39,23,35,2.
Parallel computing has been fruitfully employed to handle time-consuming op-
erations. Research results in the area of parallel algorithms for solving machine
learning and computer vision problems have been reported in 24. These studies
demonstrated that parallelism can be exploited efficiently in various machine intel-
ligence and vision problems such as deep learning 15 or the fast Fourier transform 40. In
this paper, we take advantage of parallel computing to solve the exact GED problem.
The main contribution of this paper is a parallel exact algorithm based on a
load balancing strategy for solving the GED problem. This paper builds on the idea
that a parallel execution can help to converge faster to the optimal solution. Our
method is very generic and can be applied to directed or undirected fully attributed
graphs (i.e., with attributes on both vertices and edges). By limiting the run-time,
our exact method provides (sub)optimal solutions and becomes an efficient upper
bound approximation of GED. A complete comparative study is provided where
6 exact and approximate GED algorithms were compared on a set of 4 graph
datasets. By considering both the quality of the proposed solutions and the speed
of the algorithms, we show that our proposal is a good choice when a fast decision
is required, as in a classification context, or when time matters less but a precise
solution is required, as in image registration.
This paper is organized as follows: Section 2 presents the important definitions
necessary for introducing our GED algorithm. Then, Section 3 reviews the existing
approximate and exact approaches for computing GED. Section 4 describes the
proposed parallel scheme based on a load balancing paradigm. Section 5 presents the
experiments and analyses the obtained results. Section 6 provides some concluding
remarks.
2. Problem Statements
In this section, we first introduce the GED problem which is formally defined as an
optimization problem. Secondly, to cope with the inherent complexity of the GED
problem, the use of parallel computing is argued. However, the parallel execution
of a combinatorial optimization problem is not trivial, and consequently the question
of load balancing arises. Finally, the load balancing problem is formally
defined and presented to lay the foundation of an efficient parallel algorithm.
Fig. 1. An incomplete search tree example for solving the GED problem. The first floor represents
possible matchings of vertex A with each vertex of the second graph (in blue). A tree node is a
partial solution which is to say a partial edit path.
Bunke 29 showed that if each elementary operation satisfies the criteria of a distance
(separability, symmetry and triangular inequality) then GED is a metric. Recently,
methods to learn the edit costs between graphs have been published 14.
The discussion around the cost functions is beyond the topic of this paper that
essentially focuses on the GED computation.
run-time. Portions that encompass the optimal solution with high probability are
expanded and explored exhaustively, while portions that lead to unfruitful solutions
are discarded at run-time. To ensure that parallel threads are always busy, tree
nodes have to be dispatched at run-time. Hence, the local workload of a thread is
difficult to predict.
The parallel execution of combinatorial optimization problems relies on load
balancing strategies to divide the global workload of all threads iteratively at run-
time. From the viewpoint of a workload distribution strategy, parallel optimizations
fall in the asynchronous communication category where no thread waits another
thread to finish in order to start a new task 5 . A thread initiates a balancing
operation when it becomes lightly loaded or overloaded. The objective of the data
distribution strategies is to ensure a fast convergence to the optimal solution such
that all the tree nodes are evaluated as fast as possible. In this paper, we propose
a parallel GED approach equipped with a load balancing strategy. This approach
ensures that all threads have the same amount of load and all threads explore the
most promising tree nodes first.
Initiation Rule This rule dictates when to initiate a load balancing operation.
The execution of a balancing operation incurs non-negligible overhead; its invoca-
tion must weigh this overhead cost against its expected performance benefit. An
initiation policy is thus needed to determine whether a balancing operation will be
profitable.
3. Related Work
In this section, an overview of the GED methods presented in the literature is
given. Since our goal is to speed up the computation of GED, parallelism is highly
desirable. Therefore, we also cover parallel methods dedicated to solving branch-
and-bound (BnB) problems, drawing inspiration from some of these works for
parallelizing the GED computation.
that has the least g(p) + h(p) is chosen where g(p) represents the cost of the partial
edit path accumulated so far whereas h(p) denotes the estimated cost from p to
a leaf node representing a complete edit path. The sum g(p) + h(p) is referred to
as a lower bound lb(p). Given that the estimation of the future costs h(p) is lower
than, or equal to, the real costs, an optimal path from the root node to a leaf node
is guaranteed to be found 32. Leaf nodes correspond to feasible solutions, i.e.,
complete edit paths. In the worst case, the space complexity can be expressed as
O(|Γ|) 13 where |Γ| is the cardinality of the set of all possible edit paths. Since |Γ|
is exponential in the number of vertices involved in the graphs, the memory usage
is still an issue.
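To make the best-first mechanism concrete, here is a minimal Python sketch of an A∗-style search over vertex assignments. It is an illustrative toy, not the paper's implementation: graphs are reduced to vertex label lists, edit costs are unit costs, edge costs are ignored, and the heuristic is simply the number of surplus vertices, which is admissible under these unit-cost assumptions.

```python
import heapq
from itertools import count

def ged_astar(labels1, labels2, c_sub=1, c_del=1, c_ins=1):
    """Best-first (A*) search for a vertex-only edit distance.

    A tree node is a partial edit path: the first i vertices of G1 are
    already substituted (mapped to a vertex of G2) or deleted.  The
    heuristic h(p) is the admissible bound |remaining1 - remaining2|:
    every surplus vertex must eventually be deleted or inserted.
    """
    n1, n2 = len(labels1), len(labels2)
    tie = count()                       # tie-breaker so the heap never compares sets
    # state: (lb, tie, g, i, frozenset of used G2 indices)
    heap = [(abs(n1 - n2), next(tie), 0.0, 0, frozenset())]
    while heap:
        lb, _, g, i, used = heapq.heappop(heap)
        if i == n1:                     # all G1 vertices processed:
            return g + c_ins * (n2 - len(used))  # insert leftover G2 vertices
        rem1, rem2 = n1 - i - 1, n2 - len(used)
        for j in range(n2):             # substitute vertex i with vertex j
            if j in used:
                continue
            cost = 0 if labels1[i] == labels2[j] else c_sub
            h = abs(rem1 - (rem2 - 1))
            heapq.heappush(heap, (g + cost + h, next(tie),
                                  g + cost, i + 1, used | {j}))
        h = abs(rem1 - rem2)            # or delete vertex i
        heapq.heappush(heap, (g + c_del + h, next(tie),
                              g + c_del, i + 1, used))
    return float("inf")
```

Because the heuristic never overestimates under these unit costs, the first complete path popped from the heap is optimal, exactly as guaranteed for A∗ in the text; the heap, however, can grow exponentially, which is the memory issue discussed above.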
To overcome the A∗ problem, a recent depth-first BnB GED algorithm, referred
to as DF, has been proposed in 2 . This algorithm speeds up the computations of
GED thanks to its upper and lower bounds pruning strategy and its preprocessing
step. Moreover, this algorithm does not exhaust memory, as the number of pending
edit paths stored at any time t is relatively small: the space complexity is equal
to |V1|·|V2| in the worst case.
In both A∗ and DF, h(p) can be estimated by mapping the unprocessed vertices
and edges of graph G1 to the unmapped vertices and edges of graph G2 such that the resulting
cost is minimal. The unprocessed edges of both graphs are handled separately from
the unprocessed vertices. This mapping is done in a faster way than the exact
computation and should return a good approximation of the true future cost. Note
that the smaller the difference between h(p) and the real future cost, the fewer
nodes will be expanded by A∗ and DF.
Almohamad and Duffuaa 4 proposed the first linear programming formula-
tion of the weighted graph matching problem. It consists of determining the permu-
tation matrix minimizing the L1 norm of the difference between the adjacency matrix
of the input graph and the permuted adjacency matrix of the target one. More
recently, Justice and Hero 23 also proposed a binary linear programming formu-
lation of the graph edit distance problem. Graph matching is treated as finding a subgraph of
a larger graph known as the edit grid. The edit grid only needs to have as many
vertices as the sum of the total number of vertices in the graphs being compared.
One drawback of this method is that it does not take into account attributes on
edges which limits the range of application.
Table 1 synthesizes the aforementioned methods in terms of the size of the
graphs they could match, the execution time and the complexity. One can see the
complexity of exact GED reflected in the limited number of vertices that these methods
can match. Based on these facts, researchers turned to the approximate GED
side.
A beam search algorithm, referred to as BS, was proposed in 27. The purpose of BS is to prune the search tree while searching for an
optimal edit path. Instead of exploring all edit paths in the search tree, a parameter
s is set to an integer x that keeps only the x most promising partial
edit paths in the set of promising candidates.
In 32, the problem of graph matching is reduced to finding the minimum assign-
ment cost, where, in the worst case, the maximum number of operations needed by
the algorithm is O(n^3). This algorithm is referred to as BP. Since BP considers lo-
cal structures rather than global ones, the optimal GED is overestimated. Recently,
researchers have observed that BP's overestimation is very often due to a few in-
correctly assigned vertices. That is, only a few vertex substitutions are responsible
for additional (unnecessary) edge operations in subsequent steps, resulting in an
overestimation of the optimal edit distance. In 34, BP is used
as an initial step. Then, pairwise swapping of vertices (local search) is done aiming
at improving the accuracy of the distance obtained so far. In 36 , a search procedure
based on a genetic algorithm is proposed to improve the accuracy of BP . These
improvements increase run times. However, they improve the accuracy of the BP
solution.
In 21 , the authors propose a novel modification of the Hausdorff distance that
takes into account not only substitution, but also deletion and insertion cost.
H(V1, V2) is defined as follows: H(V1, V2) = Σ_{u∈V1} min_{v∈V2} c̄1(u, v) + Σ_{v∈V2} min_{u∈V1} c̄2(u, v),
which can be interpreted as the sum of distances to the most similar vertex in the
other graph. This approach allows multiple vertex assignments, consequently, the
time complexity is reduced to quadratic (i.e., O(n2 )) with respect to the number
of vertices of the involved graphs. In 33,7 , the GED is shown to be equivalent to a
Quadratic Assignment Problem (QAP). In 7, with this quadratic formulation, two
well-known graph matching methods, the Integer Projected Fixed Point method 25
and the Graduated Non-Convexity and Concavity Procedure 26, are applied to GED. In
25, this heuristic improves an initial solution by trying to solve a linear assignment
problem and the relaxed QAP where binary constraints are relaxed to the contin-
uous domain. Iteratively, the quadratic formulation is linearly approximated by its
1st-order expansion around the current solution. The resulting assignment helps at
guiding the minimization of the relaxed QAP. In 26 , a path following algorithm aims
at approximating the solution of a QAP by considering a convex-concave relaxation
through the modified quadratic function.
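The Hausdorff-style bound of 21 discussed above can be sketched in a few lines. This is a hedged toy version over vertex labels only: the function name, the unit costs, and the halving of substitution costs (so that a matched pair, charged once from each side, never exceeds its true substitution cost) are our illustrative assumptions.

```python
def hausdorff_bound(labels1, labels2, c_sub=1.0, c_del=1.0, c_ins=1.0):
    """Quadratic-time Hausdorff-style lower bound on the vertex edit cost.

    Each vertex is charged its cheapest fate: deletion/insertion, or
    (half of) the cost of substitution with the most similar vertex of
    the other graph.  Multiple assignments to the same vertex are
    allowed, so the result only bounds the true edit cost from below.
    """
    def half_sub(a, b):
        return 0.0 if a == b else c_sub / 2.0

    total = 0.0
    for u in labels1:                   # cheapest fate of each G1 vertex
        total += min([half_sub(u, v) for v in labels2] + [c_del])
    for v in labels2:                   # cheapest fate of each G2 vertex
        total += min([half_sub(u, v) for u in labels1] + [c_ins])
    return total
```

Since every vertex is handled independently with a single pass over the other graph's vertices, the running time is O(n^2), matching the quadratic complexity stated in the text.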
3.1.3. Synthesis
Table 2 summarizes the aforementioned approximate GED methods. Approximate
GED methods often have a polynomial computational time in the size of the input
graphs and thus are much faster than the exact ones. Nevertheless, these methods
do not guarantee to find the optimal matching. We believe that the more
complex the graphs, the larger the error committed by the approximate methods.
Graphs are generally more complex when neighborhoods and attributes
do not make it easy to differentiate between vertices. On the exact GED side, only a
few approaches have been proposed to push back the graph size restriction 39,23,35,2.
For all these reasons, we believe that proposing a fast and exact GED algorithm is
of great interest.
memory. Both 11 and 8 solved the irregularity of BnB. However, the explorations
in both approaches take longer. That is because, in the first kernel, each thread
generates only one child at a time, while the elimination of branches occurs in
the second kernel.
An OpenMP approach has been put forward in 16. T threads are established
by the master program. Moreover, the master program generates tree nodes and
puts them in a queue. Then T tree nodes are removed from the queue and assigned,
one to each thread. The best solution must be modified carefully: only one thread
can change it at any time. The same holds when a thread tries to insert
a new tree node in the global shared queue. In this approach, each thread only
takes one node, explores it and, at the end of its exploration, sends its result to the
master, which forwards the message to the other slaves if the upper bound is updated.
Thus, this model did not tackle the irregularity of BnB.
when it has no node left in its queue. Load balancing is performed if the number of
inactive threads with empty queues is above a threshold T . The best permutation
and the best degree of match are only updated at the end of each iteration, which
does not prune the search space as fast as possible.
3.2.3. Synthesis
Based on the aforementioned parallel BnB algorithms and to the best of our knowl-
edge, none of these algorithms addressed the GED problem. We believe that propos-
ing a parallel branch-and-bound algorithm dedicated to solving the GED problem
is of great interest since the computational time will be improved. The search tree
of GED is irregular (i.e., the number of tree nodes varies depending on the ability
of the lower and upper bounds in pruning the search tree) and thus the regular
parallel approaches (e.g., 11,8,16 ) are not suitable for such a problem.
The approaches in 30 , 12 and 28 are interesting since the communication is
asynchronous b and thus there is no need to stop a thread that has not finished its
tasks, unless another thread runs out of tasks. In 12 , however, load balancing is
not integrated. Thus, when there are no more problems to be generated by the
master thread, some threads might remain idle for a certain amount of time while
waiting for the other threads to finish their assigned tasks. For GED, load balancing
is important to keep the amount of work balanced among all threads.
On this basis, we propose a parallel GED method equipped with a load balancing
strategy. This paper is an extension of the most recent BnB algorithm
(DF ). When thinking of a parallel and/or distributed version of DF, the edit
paths can be considered as atomic tasks: edit paths can be dispatched
to threads in order to divide the GED problem into smaller
problems. It is hard to estimate the time needed by a thread to explore a sub-tree
(i.e., to become idle). Likewise, the number of CPUs and/or machines has to be
adapted to the amount and type of data to be analyzed. Some experiments
in Section 5.4 illustrate this point and are followed by a discussion.
b Asynchronous communication indicates that no thread waits for another thread to finish in order
to start its new task 5 .
finishes all its assigned edit paths. The algorithm terminates when all threads fin-
ish the exploration of their assigned edit paths. Our algorithm, denoted by PDFS,
consists of three main steps: Initialization-Decomposition-Assignment, Branch-and-
Bound and Load Balancing. Figure 2 depicts all the steps of PDFS.
Each element ci,j in the matrix Cv corresponds to the cost of assigning the ith
vertex of graph G1 to the jth vertex of graph G2. The left upper corner
of the matrix contains all possible vertex substitutions, the right upper corner
represents the cost of deleting the vertices of G1, and the left bottom corner
contains the cost of inserting the vertices of G2; the off-diagonal elements of these
two corners are set to infinity, and the bottom right corner concerns the
substitution ε → ε. Similarly, Ce contains
all the possible substitutions, deletions and insertions of edges of G1 and G2, and is
constructed in the very same way as Cv. The matrices Cv and Ce
are used as an input of the following phase.
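A minimal sketch of the construction of Cv, following the classical square-cost-matrix layout; the function name and the unit/label-equality costs are illustrative assumptions, and the ε → ε block is set to 0 since it encodes no real edit operation:

```python
INF = float("inf")

def vertex_cost_matrix(labels1, labels2, c_sub=1.0, c_del=1.0, c_ins=1.0):
    """Build the (n+m) x (n+m) vertex cost matrix Cv as a list of lists.

    Upper-left : substitution costs (vertex i of G1 -> vertex j of G2)
    Upper-right: deletion cost of each G1 vertex on the diagonal, inf elsewhere
    Lower-left : insertion cost of each G2 vertex on the diagonal, inf elsewhere
    Lower-right: eps -> eps entries, set to 0 (no real edit operation)
    """
    n, m = len(labels1), len(labels2)
    C = [[INF] * (n + m) for _ in range(n + m)]
    for i, a in enumerate(labels1):
        for j, b in enumerate(labels2):
            C[i][j] = 0.0 if a == b else c_sub   # substitution block
    for i in range(n):
        C[i][m + i] = c_del                      # delete vertex i of G1
    for j in range(m):
        C[n + j][j] = c_ins                      # insert vertex j of G2
    for i in range(m):
        for j in range(n):
            C[n + i][m + j] = 0.0                # eps -> eps block
    return C
```

The infinite off-diagonal entries make any assignment that pairs a vertex with the wrong ε slot unusable, so a minimum-cost assignment on C always corresponds to a valid set of vertex edit operations.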
Second, the vertices of G1 are sorted in order to start with the most promising
vertices of G1. BP is applied to establish an initial edit path EP 32. Afterwards,
the edit operations of EP are sorted in ascending order of matching cost, where
EPsorted = {u → v} ∀u ∈ V1, v ∈ V2 ∪ {ε}. At last, from EPsorted, each u ∈ V1 is inserted
into sorted-V1. This ordering helps in exploring the most promising vertices first.
Third, a first upper bound (UB ) is computed by the BP algorithm, as it is relatively
fast and provides reasonable results; see 32 for more details.
Algorithm 1 Dispatch-Tasks
Input: A set of partial edit paths Q generated by A∗ and T threads.
Output: The local list OPEN of each thread Ti
1: Q ← sortAscending(Q)
2: for Tindex ∈ T do
3: OPEN_Tindex ← ∅
4: end for
5: i = 0 ▷ a variable used for thread indices
6: for p ∈ Q do
7: index = i % |T|
8: OPEN_Tindex.addTask(p)
9: i++
10: end for
11: Return OPEN_Tindex ∀ index ∈ {1, · · · , |T|}
Each thread maintains a local heap to keep its assigned edit paths and explores
them locally. This iterative, round-robin assignment guarantees that nodes of
diverse difficulty are associated with each thread.
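Algorithm 1 can be sketched in a few lines of Python (the `key` parameter, standing in for the lower bound lb(p), is an illustrative assumption):

```python
def dispatch_tasks(partial_edit_paths, n_threads, key=None):
    """Algorithm 1 in miniature: round-robin dispatch of partial edit paths.

    Paths are sorted in ascending order (by lower bound when `key`
    extracts lb(p)), then dealt out like cards so every thread's OPEN
    list receives a mix of promising and less promising nodes.
    """
    queue = sorted(partial_edit_paths, key=key)
    open_lists = [[] for _ in range(n_threads)]
    for i, p in enumerate(queue):
        open_lists[i % n_threads].append(p)
    return open_lists
```

For example, dispatching five paths with bounds 5, 3, 1, 4, 2 over two threads gives one thread the 1st, 3rd and 5th cheapest paths and the other the 2nd and 4th, so neither thread is stuck with only hard nodes.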
4.2. Branch-and-Bound Method
In this section we explain the components of BnB that each thread executes on its
assigned partial edit paths. First, the rules of selecting edit paths, branching and
bounding are described. Second, updating the upper bound and pruning the search
tree are detailed.
Branching Procedure: Initially, each thread only has its assigned edit paths in its
local heap (OPEN ), i.e., the set of edit paths found so far. The exploration
starts with the most promising vertex u1 in sorted -V1 in order to generate the
children of the selected edit path. The children consist of substituting u1 with each
of the vertices of G2, in addition to the deletion of u1 (i.e., u1 → ε). Then, the children
are added to OPEN. Consequently, a minimum edit path (pmin ) is chosen to be
explored by selecting the minimum cost node (i.e., min(g(p) + h(p))) among the
children of pmin, and so on. We backtrack to continue the search for a good edit
path when pmin has no children left; in this case, we try out the next child of pmin's parent, and so on.
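The branching-and-bounding loop can be illustrated with a small depth-first sketch. Again this is a vertex-labels-only toy with unit costs and a trivial heuristic (h = 0, i.e., pruning on g(p) alone), not the paper's DF implementation:

```python
def df_ged(labels1, labels2, ub=float("inf"), c=1):
    """Depth-first branch-and-bound over vertex assignments (unit costs).

    At each level the next G1 vertex is substituted with every unused G2
    vertex or deleted; branches whose accumulated cost already reaches
    the current upper bound are pruned.  Returns the best complete-path
    cost found.
    """
    n1, n2 = len(labels1), len(labels2)

    def rec(i, used, g, best):
        if g >= best:                 # bound: prune this partial edit path
            return best
        if i == n1:                   # leaf: insert remaining G2 vertices
            return min(best, g + c * (n2 - len(used)))
        children = []                 # branch: substitutions, then deletion
        for j in range(n2):
            if j not in used:
                cost = 0 if labels1[i] == labels2[j] else c
                children.append((cost, j))
        children.sort()               # explore cheapest children first
        for cost, j in children:
            best = rec(i + 1, used | {j}, g + cost, best)
        best = rec(i + 1, used, g + c, best)   # delete vertex i of G1
        return best

    return rec(0, frozenset(), 0, ub)
```

Passing a good initial `ub` (e.g., a BP-style upper bound, as in the paper) tightens the pruning test from the very first branch, which is exactly why the initialization phase computes UB before the exploration starts. Memory stays small because only the current path and its pending siblings are stored.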
Upper Bound Update: The best upper bound is globally shared by all threads
(shared UB ). When a thread finds a better upper bound (i.e., a complete edit path
whose cost is less than the current UB ), the shared UB is updated.
Heuristic: After comparing several heuristics h(p) from the literature, we selected
the bipartite graph matching heuristic proposed in 32. The complexity of such
a method is O(max(|V1|, |V2|)^3 + max(|E1|, |E2|)^3). For each tree node p, the unmatched
vertices and edges are handled completely independently. Unmatched vertices of
G1 and unmatched vertices of G2 are matched at best by solving a linear sum
assignment problem. Unmatched edges of both graphs are handled analogously.
Obviously, this procedure allows multiple substitutions involving the same vertex
or edge and, therefore, it possibly represents an invalid way to edit the remaining
part of G1 into the remaining part of G2. However, the estimated cost certainly
constitutes a lower bound of the exact cost.
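The assignment step at the heart of this heuristic can be sketched as follows. For clarity the sketch brute-forces the linear sum assignment problem over permutations, which is only viable for toy sizes; an O(n^3) Hungarian-type solver (e.g., SciPy's linear_sum_assignment) would be used in practice:

```python
from itertools import permutations

def lsap_cost(cost):
    """Optimal linear-sum-assignment cost of a square cost matrix.

    Applied to the cost matrix of the unmatched vertices (and,
    separately, to that of the unmatched edges), the optimal assignment
    cost gives the estimate h(p).  Brute force over all permutations is
    used here purely for illustration.
    """
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```

Because the vertex and edge assignments are solved independently, their combined cost can be cheaper than any single consistent edit path, which is precisely why the estimate is a valid lower bound.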
two threads are involved in the load balancing operation: a heavy and an idle thread.
When a thread becomes idle, the heaviest thread is in charge of giving the
idle thread some edit paths to explore. All the edit paths of the heavy thread are
ordered by their lb(p). The heavy thread then distributes the best edit paths between
itself and the idle thread. This procedure guarantees the exploration of the best edit
paths first, since each thread holds some promising edit paths.
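A minimal sketch of this sharing step (the function name and in-place list handling are illustrative assumptions; a real implementation would operate on the threads' heaps under proper locking):

```python
def balance(heavy_open, idle_open, key=None):
    """Load-balancing step: share the heavy thread's edit paths.

    The heavy thread's pending edit paths are ordered by lb(p) (via
    `key`), then dealt out alternately so both the heavy and the
    formerly idle thread hold some of the most promising nodes.  Both
    lists are modified in place.
    """
    pending = sorted(heavy_open, key=key)
    heavy_open.clear()
    idle_open.clear()
    for k, p in enumerate(pending):
        (heavy_open if k % 2 == 0 else idle_open).append(p)
    return heavy_open, idle_open
```

The alternating deal mirrors the round-robin dispatch of the initialization phase: neither thread ends up with only unpromising nodes, so both keep contributing to tightening the shared upper bound.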
Threads Communication: All threads share Cv, Ce, sorted -V1 and UB. Since
all threads try to find a better UB, a memory coherence protocol is required on the
shared memory location of UB. When two threads simultaneously try to update
UB, a mutex-based synchronization mechanism ensures that
only one thread can access the resource at a given point in time.
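The mutex-protected update of the shared UB can be sketched with Python's threading primitives (the class name and the unlocked pre-check are illustrative choices):

```python
import threading

class SharedUpperBound:
    """Globally shared UB with mutex-protected updates.

    try_update only takes the lock when the candidate looks better, and
    re-checks under the lock so concurrent improvements are never lost.
    """
    def __init__(self, value=float("inf")):
        self.value = value
        self._lock = threading.Lock()

    def try_update(self, candidate):
        if candidate >= self.value:        # cheap unlocked pre-check
            return False
        with self._lock:                   # serialize the actual write
            if candidate < self.value:
                self.value = candidate
                return True
            return False
```

The double check keeps the common case (a candidate worse than UB) lock-free, so threads pruning with the shared bound rarely contend for the mutex.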
5. Experiments
This section aims at evaluating the proposed contribution through an experimen-
tal study that compares 9 methods on reference datasets. We first describe the
datasets, the methods that have been studied and the protocol. Then, the results
are presented and discussed.
5.1. Datasets
To the best of our knowledge, few publicly available graph databases are dedicated
to the precise evaluation of graph matching tasks. Moreover, most of these datasets con-
sist of synthetic graphs that are not representative of PR problems involving graph
matching under noise and distortion. We shed light on the IAM graph repository,
a widely used repository dedicated to a wide spectrum of tasks in pattern
recognition and machine learning 31. Moreover, it contains graphs with both symbolic
and numeric attributes, which is not often the case for other datasets. Consequently,
the GED algorithms involved in the experiments are applied to three different real
world graph datasets taken from the IAM repository (i.e., GREC, Mutagenicity
(MUTA) and Protein datasets) 31 . Continuous attributes on vertices and edges of
GREC play an important role in the matching process whereas MUTA is represen-
tative of GM problems where graphs have only symbolic attributes. On the other
hand, the Protein database contains numeric attributes on each vertex as well as a
string sequence that is used to represent the amino acid sequence.
For the scalability experiment, the subsets of GREC, MUTA and Protein pro-
posed in the repository GDR4GED 1 were chosen. The
classification experiment, on the other hand, was conducted on the train and test
sets of each dataset.
In addition to these datasets, a chemical dataset, called PAH, taken from
GREYC's Chemistry dataset repository c , was also integrated in the experiments. This
c https://brunl01.users.greyc.fr/CHEMISTRY/index.html
dataset is quite challenging since it has no attributes on either vertices or edges.
Table 3 summarizes the characteristics of all the selected datasets.
These datasets have been chosen by carefully reviewing all the publicly avail-
able datasets that have been used in the reference works mentioned in section 3
(LETTER, GREC, COIL, Alkane, FINGERPRINT, PAH, MUTA, PROTEIN and
AIDS to name the most frequent ones). On the basis of this review, a subset of these
datasets has been chosen in order to get a good representativeness of the different
graph features which can affect GED computation (size and labelling):
Table 3. The Characteristics of the GREC, Mutagenicity, Protein and PAH Datasets.
between two graphs G1 and G2 except the lower bound GED which only returns a
distance between two graphs.
PDFS and naive-PDFS were implemented using Java threads. The evaluation
of both algorithms was conducted on a 24-core Intel i5 processor 2.10GHz, 16GB
memory. For sequential algorithms, evaluations were conducted on one core.
5.3. Protocol
In this section, the experimental protocol is presented and the objectives of the
experiment are described.
Let S be a graph dataset consisting of k graphs, S = {g1 , g2 , ..., gk }. Let
M = Me ∪ Ma be the set of all the GED methods listed in Section 5.2, with
Me = {A∗, DF, PDFS } the set of exact methods and Ma = {BP, BS-1, BS-10,
BS-100, H} the set of approximate methods (where x in BS was set to 1, 10 and 100).
Given a method m ∈ M, we computed all the pairwise comparisons d(gi , gj )m ,
where d(gi , gj )m is the value returned by method m on the graph pair (gi , gj )
within certain time and memory limits.
Two types of experiments were carried out: a scalability experiment and a classifica-
tion experiment.
time_p^k = (1 / (m × m)) Σ_{i=1}^{m} Σ_{j=1}^{m} time(G_i, G_j)_p, with (i, j) ∈ ⟦1, m⟧^2, ∀k ∈ #subsets (2)
All these metrics have been proposed in 1. This experiment was decomposed into
2 tests:
Accuracy Test The aim was to illustrate the error committed by approximate
methods relative to exact methods. In an ideal case, no time constraint (CT ) should
be imposed, in order to reach the optimal solution. Due to the large number of considered
matchings and the exponential complexity of the tested algorithms, we allowed a
maximum CT of 300 seconds. This time constraint was large enough to let the
methods search deeply into the solution space and to ensure that many nodes would
be explored. The key idea was to reach optimality whenever possible, or
at least to get as close as possible to the optimal solution. This use
case matters when it is important to accurately compare images represented by
graphs, even if the execution time is long.
Speed Test The goal was to evaluate the accuracy of exact methods against ap-
proximate methods when time matters, that is to say, in a context of very limited
time. Thus, for each dataset, we selected the slowest graph comparison using an ap-
proximate method among BP and H as a first time constraint. Unlike BP and H,
BS is not included, as it is a tree-search algorithm that can output a solution
even under a limited CT. Mathematically speaking, CT is defined as follows:
CT = max{time_m(g_i, g_j)} (4)
m,i,j
where m ∈ {BP, H}, (i, j) ∈ ⟦1, k⟧^2 and time is a function returning the running
time of method m for a given graph comparison. This ensures that BP and H
can solve any instance. When the time limit is reached, the best solution found so far
is output by BS as well as by the exact GED methods. Time and memory limits thus
play a crucial role in our experiments, since they impact such methods. In Table 4,
we display the time limits used for each dataset.
In all the experiments (i.e., scalability and classification), CM was set to 1 GB.
Among all the aforementioned methods, we expected A∗ to violate CM, especially
when graphs get larger. In the small CT context, the number of threads in PDFS was
set to 3; since CT was quite small, we did not want to lose time
decomposing the workload among a large number of threads. Moreover, because of
the complexity of the calculation of lb, it was removed from each of BS, A∗ , DF
and PDFS.
5.4. Parameters
We study the effect of increasing the number of threads T on both the accuracy and
speed of naive-PDFS and PDFS. This test was carried out using a 24-core CPU,
with T varied from 2 to 128 threads. Moreover, the effect of several values of N, described
in Section 4.1, was studied. Five values of N were chosen: -1, 100, 250, 500 and
1000, where N = -1 represents the decomposition of the first floor of the search tree
into all possible branches, N = 100 and 250 moderately perform load balancing,
while N = 500 and 1000 are the exhaustive cases where threads have much less time
dedicated to load balancing, since each thread is assigned a sufficient number
of tasks before the parallelism starts. We expected PDFS to perform better when
increasing N, up to a threshold beyond which the accuracy of the algorithm degrades.
5.5. Results
In this section, the results are presented and discussed. We conducted
experiments on all the involved datasets; however, for the parameter selection
part, we only show the results on GREC-20 1, since this dataset is representative
of the other datasets. The time unit is always milliseconds.
Method      #best found solutions   #optimal solutions   Idle Time over CPU Time
PDFS-2T     67                      48                   1.7 × 10^-5
PDFS-4T     79                      54                   9.5 × 10^-5
PDFS-8T     83                      66                   2.8 × 10^-4
PDFS-16T    92                      69                   7.7 × 10^-4
PDFS-32T    94                      69                   0.011
PDFS-64T    95                      68                   0.043
PDFS-128T   98                      66                   0.169
[Figure: deviation score plots for PDFS-64 threads (left) and PDFS-128 threads (right); both axes range from 0.0 to 1.0.]
Method           #best found solutions   #optimal solutions   Idle Time over CPU Time
PDFS-1st Floor   82                      53                   0.067
PDFS-100 EP      85                      66                   0.067
PDFS-250 EP      85                      41                   0.053
PDFS-500 EP      82                      41                   0.023
PDFS-1000 EP     83                      40                   0.018
Table 6. The effect of the number of edit paths on the performance of PDFS.
Method       #optimal solutions   Mean CPU Time (ms)   Mean Variance (ms)
naive-PDFS   63                   4792444              361157.6
PDFS         91                   7275476              49880.62
Table 7. Comparison between naive-PDFS and PDFS when executed on the GREC dataset.
Large Time Constraint Regarding the number of best found solutions and the
number of optimal solutions, PDFS always outperformed DF on GREC, MUTA,
Protein and PAH, see Figures 4, 5 and 6.
On MUTA, the deviation of BP was 20%; this confirms that the more
complex the graphs, the less accurate the answer achieved by BP, see Figure 6(b).
BP considers only local, rather than global, edge structure during the optimization
process 32, and so when graphs get larger, its solution drifts far from the exact one.
Despite the out-performance of PDFS over BP, H and DF, it did not outperform BS
in terms of number of best found solutions, see Figure 4(b). The major differences
between these algorithms are the search space and the Vertices-Sorting strategies
which are adapted in PDFS and not in BS. Since BP did not give a good estimation
on MUTA, it was irrelevant when sorting vertices of G1 resulting in the exploration
of misleading nodes in the search tree. Since the graphs of MUTA are relatively
large, backtracking nodes took time. However, the difference between BS and PDFS
in terms of deviation was only 0.1%.
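The role of the vertices-sorting strategy can be sketched as follows; the cheapest-assignment-first criterion and all names here are illustrative assumptions rather than the paper's exact rule. The point is that the exploration order is derived from BP's approximate matching, so a poor BP estimate (as on MUTA) yields a misleading order:

```python
def sort_vertices_by_estimate(vertices, assignment_cost):
    """Order the vertices of G1 before branch-and-bound exploration.

    assignment_cost maps each vertex of G1 to the cost of its assignment
    in an approximate matching (e.g., BP's bipartite solution).  Here we
    expand the cheapest (most confident) assignments first; when the
    estimate is unreliable, this ordering sends the search into
    misleading branches that are costly to backtrack on large graphs.
    """
    return sorted(vertices, key=lambda v: assignment_cost[v])

# Hypothetical assignment costs taken from an approximate matching:
costs = {"u1": 3.0, "u2": 0.5, "u3": 1.2}
print(sort_vertices_by_estimate(["u1", "u2", "u3"], costs))  # prints ['u2', 'u3', 'u1']
```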
On Protein-30, BS-100 was superior to PDFS in terms of the number of best found
solutions, with 50 better solutions. However, this was not the case on a bigger dataset
like Protein-40, where BS-100 output unfeasible solutions because of the tremendous
size of the search tree, and thus PDFS outperformed it. On average, over all
databases and among all methods, PDFS obtained the best deviation, see Figure 6.
Exploring the search tree in parallel is advantageous when we are also
interested in finding more optimal solutions, see Figure 5. The results in Figure 5
demonstrate that the number of optimal solutions found by PDFS was always
greater than or equal to the number found by DF and A∗, except
on MUTA-20 where A∗ outperformed it. For instance, PDFS found 9.6%
more optimal solutions than DF on GREC and 10% more on
PAH. Note that, without time constraints, all the exact GED algorithms must find
all the optimal solutions except A∗, which suffers from a memory bottleneck.
Small Time Constraint Concerning the number of best found solutions, even
under a small CT, PDFS outperformed DF; the average difference between
DF and PDFS was 10% on GREC, 16% on MUTA, 15% on Protein and 11% on
PAH, see Figure 7.
A∗ obtained the highest deviation rates (around 30% on GREC, 73% on MUTA,
86% on Protein and 51.94% on PAH) since it did not have time to output feasible
solutions. Despite being among the slowest algorithms, PDFS obtained
the lowest deviation (0% on both GREC and Protein, 5% on MUTA and 6% on
PAH), see Figure 8. BS-100 output unfeasible solutions on MUTA-50, MUTA-
60, MUTA-70, MUTA-MIX and Protein due to the small CT.
[Figure: Number of best solutions found (in %) by BP, H, BS-1, BS-10, BS-100, A*, DF and PDFS.]
On Protein, one can see a different behavior (see Table 9). DF-UB-L̅B̅ was the
fastest while DF-UB-LB was the slowest, because of the time consumed to
calculate distances using the cost functions of Protein. Thus, as on GREC, PDFS-
UB-LB was included in the tests. Despite its slowness, DF-UB-LB was also
the best algorithm in terms of classification rate. PDFS-UB-LB was 36% faster
than DF-UB-LB. Even though BS took a comparable amount of time to classify graphs
(compared to DF), its results fell far short of those obtained by DF. A∗ was not able
to find feasible solutions for every pair of graphs. In contrast, all the variants
of DF were always able to output feasible solutions before halting.
Computing lb(p) and a first upper bound UB was time consuming on such a
large database. Since the CT of MUTA was set to 500 ms, we kept only DF-U̅B̅-LB
and A∗-LB. Results showed that DF-U̅B̅-LB was twice as slow as BP; however,
both of them succeeded in finding the best classification rate (i.e., approximately
0.70). PDFS-U̅B̅-LB was also able to find the same classification rate and was 40%
faster than DF-U̅B̅-LB.

[Figure: Number of best and optimal solutions found (in %) by BP, H, BS-1, BS-10, BS-100, A*, DF and PDFS.]

From all the aforementioned results, one can conclude that even if the deviation
of DF and PDFS was better than that of BP, this had no effect on
the classification rate. In other words, for such an application, one does not need
a very accurate algorithm in order to obtain a good classification rate.
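This observation is consistent with how GED is typically used for classification: a k-nearest-neighbor rule only needs the distances to rank neighbors correctly, not to be exact. A minimal 1-NN sketch — the graph representation and all names are illustrative:

```python
from collections import Counter

def knn_classify(test_graph, train_set, distance, k=1):
    """Classify test_graph by majority vote among its k nearest training
    graphs, where distance(g1, g2) is any (possibly approximate) GED
    estimate.  Only the ranking of distances matters, which is why an
    approximate GED can reach the same classification rate as an exact one.
    """
    neighbors = sorted(train_set, key=lambda item: distance(test_graph, item[0]))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Toy example: "graphs" are integers and |a - b| plays the role of GED.
train = [(1, "A"), (2, "A"), (10, "B")]
print(knn_classify(3, train, lambda a, b: abs(a - b)))  # prints A
```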
[Figure: Deviation scores of BP, H, BS-1, BS-10, BS-100, A*, DF and PDFS under the small time constraint.]
[Figure: Number of best solutions found (in %) by BP, H, BS-1, BS-10, BS-100, A*, DF and PDFS on the GREC (5-20, MIX), MUTA (10-70, MIX), Protein (20-40, MIX) and PAH subsets.]
threads solve their assigned edit paths in a fully parallel manner. A work-stealing or
balancing process is performed whenever a thread finishes all its assigned edit paths.
Moreover, synchronization is applied in order to ensure upper-bound coherence.
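The scheme above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the node representation, the stealing policy (oldest node from the first non-empty victim) and the naive termination test are all assumptions. Each worker runs a depth-first search over its own deque, prunes against a lock-protected shared upper bound, and steals work when its deque runs dry:

```python
import threading
from collections import deque

class SharedUB:
    """Global upper bound shared by all workers; the lock keeps
    concurrent updates coherent, as required for sound pruning."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def try_improve(self, candidate):
        with self._lock:
            if candidate < self._value:
                self._value = candidate
                return True
            return False

def worker(my_deque, all_deques, ub, expand, bound, is_complete, cost):
    """Depth-first exploration with pruning and work stealing."""
    while True:
        if my_deque:
            node = my_deque.pop()           # LIFO: depth-first on own work
        else:
            # Steal the oldest node from the first non-empty victim.
            for victim in all_deques:
                if victim is not my_deque and victim:
                    try:
                        node = victim.popleft()
                        break
                    except IndexError:
                        continue
            else:
                return                      # no visible work: terminate
        if bound(node) >= ub.get():
            continue                        # prune misleading partial solution
        if is_complete(node):
            ub.try_improve(cost(node))
        else:
            my_deque.extend(expand(node))

# Single-threaded demo on a toy search tree: each node is (depth, cost),
# two children per node with step costs 1 and 2, complete at depth 2,
# so the optimal complete cost is 2.
deques = [deque([(0, 0)]), deque()]
ub = SharedUB(float("inf"))
worker(deques[0], deques, ub,
       expand=lambda n: [(n[0] + 1, n[1] + 1), (n[0] + 1, n[1] + 2)],
       bound=lambda n: n[1],
       is_complete=lambda n: n[0] == 2,
       cost=lambda n: n[1])
print(ub.get())  # prints 2
```

In a real multi-threaded run, each worker executes this loop in its own thread; a correct termination test additionally requires detecting that no thread is still expanding nodes, which is omitted here for brevity.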
In the experimental part, we proposed to evaluate both exact and approximate
GED approaches under large and small time constraints, on 4 publicly available
datasets (GREC, MUTA, Protein and PAH). These constraints are devoted to accuracy
and speed tests, respectively. The small time constraints ensured that the approximate
methods BP and H were able to find a solution. Experiments demonstrated
the importance of the load balancing strategy when compared to a naive method
that includes neither static nor dynamic load balancing. Under small and
large time constraints, PDFS proved to have the minimum deviation, the maximum
number of best found solutions and the maximum number of optimal solutions.

[Figure: Deviation scores of BP, H, BS-1, BS-10, BS-100, A*, DF and PDFS on the four datasets.]

However, since our goal was to elaborate methods dealing with rich and
complex attributed graphs, it should be noted that BS was slightly superior to PDFS in terms of deviation
when evaluated on the MUTA dataset under the large time constraint. This could be
improved by learning the best sorting strategy for a given database. Results also
indicated that there is always a trade-off between deviation and running time. In
other words, approximate methods are fast but not as accurate as
exact methods. On the other hand, DF and PDFS take longer but lead to
better results (except on MUTA). By limiting the run-time, our exact method provides
(sub)optimal solutions and becomes an efficient upper-bound approximation
of GED with the same classification rate as the best approximate method.
Even though DF, and thus PDFS, were more accurate than BP, their classification
Acronym                  Details
DF-U̅B̅-L̅B̅                DF without upper bound and with h(p) = 0
DF-U̅B̅-LB                DF without UB and with h(p) = lb2
DF-UB-L̅B̅                DF with an initial UB equal to BP and h(p) = 0
DF-UB-LB                 DF with an initial UB equal to BP and h(p) = lb2
PDFS                     Parallel GED with the best parameters of DF
A∗-LB                    The A∗ algorithm with lb2
A∗                       The A∗ algorithm without lb2
BS-1, BS-10 and BS-100   Beam Search with OPEN size = 1, 10 and 100, respectively
BP                       The bipartite GM
H                        The Hausdorff algorithm
             GREC                    Protein
             R      Time (ms)        R      Time (ms)
DF-U̅B̅-L̅B̅    0.98   171401.54        0.44    128469.57
DF-U̅B̅-LB    0.98   163979.45        0.52    124361.61
DF-UB-L̅B̅    0.98   140675.00        0.40    147371.86
DF-UB-LB     0.98   140525.48        0.52    145779.68
PDFS-UB-LB   0.98    99850.79        0.52     80038.33
A∗-LB        0.89   358158.76        0.29   1065106.80
A∗           0.53   222045.94        0.26    194021.88
BS-1         0.98    69236.34        0.24    129571.76
BS-10        0.94    83928.21        0.26    139294.88
BS-100       0.58    83928.20        0.26    141265.41
BP           0.98    62294.60        0.52     59041.84
H            0.96    63563.74        0.43     71990.62

Table 9. Classification on GREC and Protein. The best exact and approximate methods are
marked in bold style. Note that the response time is the average time needed to classify each test
graph.
rate was as good as that of the best approximate GED method (i.e., BP). On this basis, one could
ask: what is the benefit of having a more precise algorithm in a classification
context?
A promising future work would be to make PDFS more scalable in order to obtain more
precise, and thus more optimal, solutions under large time constraints. This could be
achieved by extending PDFS from a single-machine algorithm to a multi-machine
             MUTA
             R         Time (ms)
DF-U̅B̅-LB    0.70089   1139134.29
PDFS-U̅B̅-LB  0.70       760861.51
A∗-LB        0.4574     856793.02
BS-1         0.55      1015688.00
BS-10        0.55      1256793.02
BS-100       0.55      1383838.66
BP           0.70       528546.64
H            0.70       376135.51

Table 10. Classification on MUTA. The best exact and approximate methods are marked in bold
style.
one. Moreover, learning how to sort the vertices of G1 based on the structure and characteristics
of the graphs is another promising perspective towards a more precise algorithm.
References
1. Zeina Abu-Aisheh, Romain Raveaux, and Jean-Yves Ramel. A graph database repos-
itory and performance evaluation metrics for graph edit distance. In Graph-Based
Representations in Pattern Recognition - GbRPR 2015., pages 138–147, 2015.
2. Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, and Patrick Martineau. An
exact graph edit distance algorithm for solving pattern recognition problems. pages
271–278, 2015.
3. R. Allen, L. Cinque, S. Tanimoto, L. Shapiro, and D. Yasuda. A parallel algorithm
for graph matching and its MasPar implementation. IEEE Transactions on Parallel
and Distributed Systems, 8(5):490–501, 1997.
4. H. A. Almohamad and Salih O. Duffuaa. A linear programming approach for
the weighted graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell.,
15(5):522–525, 1993.
5. Dimitri P. Bertsekas and John N. Tsitsiklis. Parallel and Distributed Computation:
Numerical Methods. Athena Scientific, 1997.
6. Sébastien Bougleux, Luc Brun, Vincenzo Carletti, Pasquale Foggia, Benoît Gaüzère,
and Mario Vento. Graph edit distance as a quadratic assignment problem. Pattern
Recognition Letters, 2016.
7. Sébastien Bougleux, Luc Brun, Vincenzo Carletti, Pasquale Foggia, Benoît Gaüzère,
and Mario Vento. Graph edit distance as a quadratic assignment problem. Pattern
Recognition Letters, 2016.
8. A. Boukedjar, M. E. Lalami, and D. El-Baz. Parallel branch and bound on a CPU-GPU
system. In Parallel, Distributed and Network-Based Processing (PDP), pages 392–398,
2012.
9. Luc Brun. Relationships between graph edit distance and maximal common structural
subgraph. 2012.
10. H. Bunke. On a relation between graph edit distance and maximum common subgraph.
Pattern Recognition Letters, 18:689–694, 1997.
11. Imen Chakroun and Nordine Melab. Operator-level GPU-accelerated branch and bound
algorithms. In ICCS, volume 18, 2013.
12. Chia-Shin Chung, James Flynn, and Janche Sang. Parallelization of a branch and
bound algorithm on multicore systems. Journal of Software Engineering and Applications,
5:12–18, 2012.
13. Thomas H. Cormen et al. Introduction to Algorithms. The MIT Press, 3rd edition,
2009.
14. Xavier Cortés and Francesc Serratosa. Learning graph-matching edit-costs based on
the optimality of the oracle’s node correspondences. Pattern Recognition Letters,
56:22–29, 2015.
15. Li Deng and Dong Yu. Deep learning: Methods and applications. Found. Trends Signal
Process., 7:197–387, 2014.
16. I. Dorta, C. Leon, and C. Rodriguez. A comparison between MPI and OpenMP branch-and-bound
skeletons. In Parallel and Distributed Processing Symposium, 2003. Proceedings.
International, pages 66–73, 2003.
17. Maciej Drozdowski. Scheduling for Parallel Processing. Springer Publishing Company,
Incorporated, 1st edition, 2009.
18. Stefan Fankhauser, Kaspar Riesen, Horst Bunke, and Peter J. Dickinson. Suboptimal
graph isomorphism using bipartite matching. IJPRAI, 26(6), 2012.
19. Miquel Ferrer, Francesc Serratosa, and Kaspar Riesen. A first step towards exact
graph edit distance using bipartite graph matching. In Graph-Based Representations
in Pattern Recognition - 10th IAPR-TC-15 International Workshop, pages 77–86,
2015.
20. Andreas Fischer, Ching Y. Suen, Volkmar Frinken, Kaspar Riesen, and Horst Bunke.
A fast matching algorithm for graph-based handwriting recognition. Graph-Based
Representations in Pattern Recognition, pages 194–203, 2013.
21. Andreas Fischer, Ching Y. Suen, Volkmar Frinken, Kaspar Riesen, and Horst Bunke.
Approximation of graph edit distance based on Hausdorff matching. Pattern Recognition,
48(2):331–343, 2015.
22. H. Bunke and G. Allermann. Inexact graph matching for structural pattern recognition.
Pattern Recognition Letters, 1:245–253, 1983.
23. D. Justice and A. Hero. A binary linear programming formulation of the graph edit distance.
IEEE Trans. Pattern Anal. Mach. Intell., 28:1200–1214, 2006.
24. Vipin Kumar, P. S. Gopalakrishnan, and Laveen N. Kanal, editors. Parallel Algorithms
for Machine Intelligence and Vision. Springer-Verlag New York, Inc., New York, NY,
USA, 1990.
25. Marius Leordeanu, Martial Hebert, and Rahul Sukthankar. An integer projected
fixed point method for graph matching and MAP inference. In Proceedings of Neural
Information Processing Systems, pages 1114–1122, 2009.
26. Zhiyong Liu and Hong Qiao. GNCCP - graduated nonconvexity and concavity procedure.
IEEE Trans. Pattern Anal. Mach. Intell., 36:1258–1267, 2014.
27. M. Neuhaus, K. Riesen, and H. Bunke. Fast suboptimal algorithms for the computation
of graph edit distance. Proceedings of the 11th International Workshop on Structural and
Syntactic Pattern Recognition, 28:163–172, 2006.
28. Michael O. Neary and Peter R. Cappello. Advanced eager scheduling for Java-based
adaptive parallel computing. Concurrency - Practice and Experience, 17:797–819,
2005.
29. M. Neuhaus and H. Bunke. Bridging the gap between graph edit distance and kernel
machines. Machine Perception and Artificial Intelligence, 68:17–61, 2007.
30. V. N. Rao and V. Kumar. Parallel depth-first search on multiprocessors. Part I: Implementation.
International Journal of Parallel Programming, 16(6):479–499, 1987.
31. K. Riesen and H. Bunke. IAM graph database repository for graph based pattern recog-