
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23)

PathLAD+: An Improved Exact Algorithm for Subgraph Isomorphism Problem
Yiyuan Wang (1, 2), Chenghou Jin (3), Shaowei Cai (4, 5, ∗) and Qingwei Lin (6)
1 School of Computer Science and Information Technology, Northeast Normal University, China
2 Key Laboratory of Applied Statistics of MOE, Northeast Normal University, Changchun, China
3 Computer School, Beijing Information Science and Technology University, Beijing, China
4 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
5 School of Computer Science and Technology, University of Chinese Academy of Sciences, China
6 Microsoft Research, China

Abstract
The subgraph isomorphism problem (SIP) is a challenging problem with wide practical applications. Although the problem is theoretically hard, researchers have designed various algorithms for solving the SIP over the last decade. In this work, we propose three main heuristics and develop an improved exact algorithm for the SIP. First, we design a probing search procedure that checks whether a solution can be obtained quickly, before the main search starts. Second, we design a novel matching ordering as a value-ordering heuristic, which uses information obtained from the probing search procedure to preferentially select promising target vertices. Third, we discuss the characteristics of different propagation methods in the context of the SIP and present an adaptive propagation method that strikes a good balance between these methods. Experimental results on a broad range of real-world benchmarks show that our proposed algorithm performs better than state-of-the-art algorithms for the SIP.

1 Introduction
The (non-induced) subgraph isomorphism problem (SIP), also known as the subgraph matching problem, involves deciding whether a copy of a pattern graph exists in a target graph. As one of the basic concepts of graph theory, the SIP can be seen as a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle. Recently, the SIP has been used in various domains, such as symbol recognition [Lladós et al., 2001], social networks [Snijders et al., 2006], computer vision [Damiand et al., 2011], biochemical data [Bonnici et al., 2013], RDF query processing [Kim et al., 2015] and graph databases [Wang et al., 2022]. For example, the SIP has also been used in cheminformatics to search for similarities between chemical compounds based on their structural formulae [Ohlrich et al., 1993].

It is well known that the SIP is NP-complete in the general case [Johnson and Garey, 1979]. For the optimization version of the SIP, i.e., the maximum common induced subgraph (MCS) problem, many methods have been presented [McCreesh et al., 2016a; McCreesh et al., 2017; Liu et al., 2020; Gocht et al., 2020; Zhou et al., 2022; Liu et al., 2022]. MCS approaches can be applied directly to the SIP, but they usually perform poorly in practice because the SIP is a decision problem. Thus, the SIP is still considered a challenging problem.

In the last decade, many researchers have focused on designing exact methods for the SIP [Zampelli et al., 2010; Solnon, 2010; Bonnici et al., 2013; Audemard et al., 2014; McCreesh and Prosser, 2015; Carletti et al., 2017; McCreesh et al., 2018; Archibald et al., 2019; Solnon, 2019; McCreesh et al., 2020]. We list some representative SIP solvers below. VF2, an early algorithm for the SIP, used a state space representation of the matching process and introduced a set of five feasibility rules for pruning the search tree [Cordella et al., 2004]. Bonnici et al. [2013] developed a search strategy called RI based on the pattern graph topology, which significantly reduced the search space without any complex pruning rules or reduction procedures. Solnon [2010] introduced a filtering algorithm called LAD based on local all-different constraints. LAD was further improved by combining the local all-different constraints with the exploitation of path length properties, resulting in the PathLAD algorithm [Kotthoff et al., 2016]. Very recently, Kraiczy and McCreesh [2021] improved Glasgow [McCreesh et al., 2020] with a new form of filtering based on clique-finding, yielding a new algorithm called Glasgow+Clq. According to the literature, the current best algorithm for the SIP is Glasgow+Clq [Kraiczy and McCreesh, 2021].

∗Corresponding author


1.1 Our Contribution
Aiming to further improve the performance of SIP solving, we choose the PathLAD algorithm as our baseline in this work. Our proposed algorithm consists of two parts: a probing search procedure and a main search procedure. The three main novel ideas in our algorithm are as follows.

First, we propose a probing search procedure in which the algorithm makes several quick attempts to decide whether the pattern graph is isomorphic to a subgraph of the target graph. This procedure has two main purposes. On the one hand, if it solves a given instance, we obtain an outcome within a short time. On the other hand, if it cannot reach an outcome (i.e., it reaches its cutoff time), we do not fall back on a traditional restart mechanism. Instead, we extract information from this search procedure to guide the main search procedure. That is, the algorithm learns useful search information about pairs of pattern vertices and target vertices and then uses this information in the main procedure.

Second, we design a new matching ordering method to decide which target vertex is selected from the domain of the corresponding pattern vertex. Several matching ordering methods have been proposed recently [Archibald et al., 2019; Wang et al., 2022]. For example, Archibald et al. [2019] found that it is effective to preferentially select vertices with high degree when selecting a matching target vertex. We follow this line of research by applying the degree information of target vertices in the matching process. At the same time, we also use the search information generated by the probing search procedure as another matching criterion. Our matching ordering method combines these two principles into a novel scoring function, denoted oscore, which is used in the matching process.

Third, we present an adaptive propagation method that dynamically uses different strong propagation methods for the SIP. Previous algorithms have always applied strong propagation methods to remove unnecessary vertices from the corresponding domains. However, these propagation methods consume a lot of run time in practice, which reduces the performance of these algorithms. In some cases, weak propagation methods can backtrack quickly or reduce the size of the corresponding domains effectively on some branches. Conversely, these algorithms would also perform badly without any strong propagation method, because they would fail to abandon some dead-end branches immediately. Based on these considerations, we analyze the properties of strong propagation methods and combine them with the search information generated by the main search procedure to dynamically employ different strong propagation methods during the search. To the best of our knowledge, this is the first time that different propagation methods have been used dynamically to accelerate the search for the SIP.

By incorporating these ideas, we develop an improved exact algorithm for the SIP called PathLAD+. Extensive experiments are carried out to evaluate PathLAD+ on the benchmarks used in the literature. Experimental results show that PathLAD+ outperforms four state-of-the-art SIP algorithms on all the benchmarks. In addition, our experimental analyses show that the proposed strategies play important roles in the strong performance of our algorithm.

In the next section, we introduce the necessary background. After that, we present our proposed algorithm and its components. Experimental results are shown in Section 4. Finally, we draw conclusions.

2 Preliminaries
2.1 Basic Definitions and Notations
Let $G = (V, E)$ be an undirected graph with vertex set $V = \{v_1, v_2, \ldots, v_n\}$ and edge set $E = \{e_1, e_2, \ldots, e_m\}$. Each edge $e$ is a 2-element subset of $V$, i.e., $e = (v, u)$. For an edge $e = (v, u)$, we say that vertices $v$ and $u$ are the endpoints of $e$. For a vertex $v \in V$, the neighborhood of $v$ is denoted as $N_G(v) = \{u \mid (v, u) \in E\}$ and its degree as $\deg_G(v) = |N_G(v)|$. A finite walk is a sequence of edges $(e'_1, e'_2, \ldots, e'_{q-1})$ for which there is a sequence of vertices $(v'_1, v'_2, \ldots, v'_q)$ such that $e'_i = (v'_i, v'_{i+1})$ for $i \in [1, q-1]$. A path is a finite walk in which all vertices and all edges are distinct, denoted as $\zeta^G = (v'_1, v'_2, \ldots, v'_q)$. The length of $\zeta^G$ is denoted as $|\zeta^G| = q$.

Given a pattern graph $G_p = (V_p, E_p)$ and a target graph $G_t = (V_t, E_t)$, the SIP is to decide whether $G_p$ is isomorphic to some subgraph of $G_t$. Formally, the aim of the SIP is to find an injective matching $f : V_p \to V_t$ that associates a different target vertex with each pattern vertex and preserves pattern edges, i.e., $(f(v), f(u)) \in E_t$ for all $(v, u) \in E_p$. Note that the subgraph is not necessarily induced: two pattern vertices that are not linked by an edge may be matched to two target vertices that are linked by an edge. During the search procedure, the current list of matched pattern and target pairs is denoted as $D = \{\{v^p_1, v^t_1\}, \ldots, \{v^p_r, v^t_r\}\}$. For a pattern vertex $v^p \in V_p$, the domain of $v^p$ is the set of target vertices that may be matched to $v^p$, i.e., $Dom(v^p) = \{v^t_1, v^t_2, \ldots, v^t_l\}$, and the size of its domain is $|Dom(v^p)| = l$.

2.2 Some Propagation Methods for the SIP
Recently, three filtering propositions [Zampelli et al., 2010; McCreesh and Prosser, 2015] have been used in Glasgow+Clq [Kraiczy and McCreesh, 2021]. We first introduce these three propositions, which are used to judge whether pattern vertices can be matched to the corresponding target vertices.

Proposition 1. Given a pattern graph $G_p = (V_p, E_p)$ and a target graph $G_t = (V_t, E_t)$, if $v^p \in V_p$ can be matched to $v^t \in V_t$ (i.e., $f(v^p) = v^t$), then $\deg_{G_p}(v^p) \le \deg_{G_t}(v^t)$.

Proposition 2. Given a pattern graph $G_p = (V_p, E_p)$ and a target graph $G_t = (V_t, E_t)$, if $v^p \in V_p$ can be matched to $v^t \in V_t$ (i.e., $f(v^p) = v^t$), then for every position $i$, the $i$-th value of $ND(v^t)$ is not less than the $i$-th value of $ND(v^p)$, where $ND(v^p) = \{\deg_{G_p}(u^p) \mid u^p \in N_{G_p}(v^p)\}$, $ND(v^t) = \{\deg_{G_t}(u^t) \mid u^t \in N_{G_t}(v^t)\}$, and the elements of both $ND(v^p)$ and $ND(v^t)$ are arranged in descending order of degree.


Proposition 3. Given a pattern graph $G_p = (V_p, E_p)$ and a target graph $G_t = (V_t, E_t)$, if $v^p$ and $u^p$ in $V_p$ can be matched to $v^t$ and $u^t$ in $V_t$ (i.e., $f(v^p) = v^t$ and $f(u^p) = u^t$), and $Path_p$ and $Path_t$ are not empty, then $|Path_p| \le |Path_t|$, where $Path_p = \{\zeta^p \mid \zeta^p = (v^p, \ldots, u^p), |\zeta^p| = 3\}$ and $Path_t = \{\zeta^t \mid \zeta^t = (v^t, \ldots, u^t), |\zeta^t| = 3\}$. Since a path with three vertices from $v^p$ to $u^p$ passes through exactly one intermediate vertex, $|Path_p|$ is the number of common neighbours of $v^p$ and $u^p$ (and likewise for $|Path_t|$).

Some propagation methods for difference constraints [Solnon, 2010] are used in PathLAD [Kotthoff et al., 2016]. They are listed below.

• Vertex constraint, denoted FC(Diff): whenever a pattern vertex $v^p$ is matched to a target vertex $v^t$, FC(Diff) removes $v^t$ from the domains of all non-matched pattern vertices. The time complexity of FC(Diff) is $O(|V_p|)$.
• Edge constraint, denoted FC(Edges): whenever a pattern vertex $v^p$ is matched to a target vertex $v^t$, FC(Edges) removes every target vertex not adjacent to $v^t$ from the domain of every pattern vertex adjacent to $v^p$. The time complexity of FC(Edges) is $O(\deg_{G_p}(v^p) \cdot |V_t|)$.
• Global all-different constraint, denoted GAC(allDiff): it ensures that all pattern vertices can be assigned to different target vertices. In detail, if a set of $k$ pattern vertices is found whose domains together contain only $k$ target vertices, then those target vertices can be removed from the domains of all other pattern vertices. The time complexity of GAC(allDiff) is $O(|V_p|^2 \cdot |V_t|^2)$.
• Filtering method, denoted LAD-filtering: for $v^t \in Dom(v^p)$, a bipartite graph is defined as $G_{(v^p,v^t)} = (N_{G_p}(v^p), N_{G_t}(v^t), E_{(v^p,v^t)})$, where $E_{(v^p,v^t)} = \{(v', u') \mid v' \in N_{G_p}(v^p), u' \in N_{G_t}(v^t), u' \in Dom(v')\}$. If no matching of $G_{(v^p,v^t)}$ covers $N_{G_p}(v^p)$, then the pattern vertices adjacent to $v^p$ cannot all be matched to different target vertices adjacent to $v^t$, and thus $v^t$ can be removed from $Dom(v^p)$. The time complexity of LAD-filtering is $O(|V_p| \cdot |V_t| \cdot \deg_{G_p}(v^p)^2 \cdot \deg_{G_t}(v^t)^2)$.

Note that the two strong propagation methods, LAD-filtering and GAC(allDiff), are implemented with the matching algorithm of Hopcroft and Karp; more details can be found in [Solnon, 2010].

3 The PathLAD+ Algorithm
This section describes the proposed PathLAD+ algorithm, shown in Algorithm 1. Details of the important functions in PathLAD+ are presented in the following subsections. The variables switchL and switchA control whether the algorithm uses LAD-filtering and GAC(allDiff), respectively. The variable nbnodes records the total number of calls and backtracks of SearchSIP, nbfail records the number of backtracks of SearchSIP, and Nb is used in our adaptive propagation method. The output st takes one of three values: true means that the algorithm returns a successful matched list; false means that the pattern graph is not isomorphic to any subgraph of the target graph; unknown means that the algorithm cannot solve the given instance within the cutoff time. At the beginning, the algorithm reduces the domains of the pattern vertices according to Propositions 1 and 2. If any domain becomes empty, the algorithm returns false. Otherwise, six variables are initialized (Lines 5–7). The algorithm then runs two procedures: a probing search procedure (ProSearch in Line 8) and a main search procedure (SearchSIP in Line 9).

Algorithm 1 PathLAD+
Input: Pattern graph $G_p$, target graph $G_t$ and the cutoff time
Output: outcome st
1: reduce the domains of pattern vertices based on Propositions 1 and 2;
2: if some domains become empty then
3:     return false;
4: end if
5: nbnodes := nbfail := 0 and Nb := +∞;
6: switchL := switchA := 1;
7: st := unknown;
8: ProSearch($G_p$, $G_t$);
9: return SearchSIP($G_p$, $G_t$, ∅, cutoff);

3.1 The Search Framework for SIP
The main function SearchSIP, shown in Algorithm 2, is recursive. The input D is an already-matched list. If all vertices in the pattern graph are matched, i.e., the algorithm has found a matching for all pattern vertices, the algorithm returns true (Lines 1–2). Otherwise, if the time limit is reached, the algorithm returns unknown (Lines 3–4). The value of nbnodes is then increased by 1 (Line 6). The algorithm chooses the non-matched pattern vertex $v^p_i$ with the smallest domain size, breaking ties in favor of the largest degree (Line 7). Afterward, the algorithm orders the target vertices in $Dom(v^p_i)$ with a novel matching ordering method (i.e., oscore), which is introduced in Section 3.3 (Line 8). In Lines 9–23, the algorithm tries to match each target vertex in $Dom(v^p_i)$ in turn. Before executing Line 9, the algorithm saves the domains of all pattern vertices; in Line 14, it restores them to the domains saved before Line 9. The algorithm tries, in order, to match a target vertex $v^t_i$ in $Dom(v^p_i)$ to the selected pattern vertex $v^p_i$ (Line 10). Each time, the algorithm reduces the domain of every non-matched pattern vertex with an adaptive propagation method, APM, which is described in Section 3.4 (Line 11). If the domain of some pattern vertex becomes empty, which means that $v^t_i$ cannot be matched to $v^p_i$, the algorithm restores the domains and then continues with the next target vertex in $Dom(v^p_i)$ (Lines 12–16); in this case, nbfail and nbnodes are both increased by 1 (Line 13). If no domain becomes empty, the algorithm recursively searches for the next pattern vertex (Line 17), and st stores the result of this recursive call. If st equals false, the algorithm restores the related domains and then continues with the next target vertex (Lines
18–19).

Algorithm 2 SearchSIP
Input: Pattern graph $G_p$, target graph $G_t$, an already-matched list of pattern and target pairs $D = \{\{v^p_1, v^t_1\}, \ldots, \{v^p_{i-1}, v^t_{i-1}\}\}$ and the cutoff time
Output: outcome st
1: if all the pattern vertices have been matched to respective target vertices then
2:     return true;
3: else if elapsed time > cutoff then
4:     return unknown;
5: end if
6: nbnodes++;
7: select a vertex $v^p_i$ with the smallest domain size $|Dom(v^p_i)|$ from all non-matched pattern vertices, breaking ties by picking the one with the largest degree; /∗ recording info values in ProSearch, see Sec. 3.2 ∗/
8: sort the vertices in $Dom(v^p_i)$ in descending order of oscore values; /∗ see Sec. 3.3 ∗/
9: for each target vertex $v^t_i \in Dom(v^p_i)$ satisfying Proposition 3 do
10:     match $v^t_i$ to $v^p_i$;
11:     $Dom(v^p_j)$ := APM($G_t$, $v^p_j$) for each non-matched pattern vertex $v^p_j$;
12:     if some domains become empty then
13:         nbfail++ and nbnodes++;
14:         restore the domains of some pattern vertices;
15:         continue;
16:     end if
17:     st := SearchSIP($G_p$, $G_t$, $D \cup \{\{v^p_i, v^t_i\}\}$, cutoff);
18:     if st == false then
19:         restore the domains of some pattern vertices;
20:     else
21:         return st;
22:     end if
23: end for
24: return false;

3.2 The Probing Search Procedure for SIP
Before the main search procedure is called, a probing search procedure, ProSearch, tries to solve the instance in a short time. If ProSearch succeeds, the algorithm either quickly returns a matched list of pattern and target pairs or determines that the pattern graph is not isomorphic to any subgraph of the target graph. Otherwise, the algorithm still collects useful information from this search procedure, denoted info. This information records which target vertices are included in the domain of each pattern vertex during the procedure. The information obtained from ProSearch thus reflects which vertex pairs have more potential.

The info values are updated as follows. First, the info value of each pair of pattern and target vertices is initialized to 0. During the search, whenever a pattern vertex $v^p$ is selected, we scan all target vertices in its domain (Line 7 in Algorithm 2), and for each target vertex $v^t_i$ in $Dom(v^p)$, $info(\{v^p, v^t_i\})$ is increased by 1.

ProSearch works as follows. During ProSearch, the algorithm reduces the domains of pattern vertices with the four propagation methods and Proposition 3, all introduced in Section 2.2. ProSearch has two search modes. In the first mode, the goal is to explore deep levels of the search tree. The algorithm runs SearchSIP with a cutoff time of 10 seconds without sorting the domains in any way (i.e., in the default lexicographical order). Some SIP instances can be solved with only a small number of conflicts, and for these instances the heavy commitment to early branching choices made by backtracking search can be extremely costly [Archibald et al., 2019]. For this reason, in the second mode, the algorithm runs SearchSIP 20 times with a cutoff time of 1 second each. To diversify early branching choices, the second mode randomly shuffles the target vertices in each domain every time.

In preliminary experiments, we found that updating info during the main search procedure made some vertex pairs accumulate very high info values and led to poor performance. Thus, we restrict the updating of info to ProSearch.

3.3 A Novel Matching Ordering Method
In SearchSIP, we select the non-matched pattern vertex with the smallest domain size. After choosing a pattern vertex, the next key step is to select a target vertex from its domain. The choice of matching ordering method affects only the performance on instances that have a successful matched list. It has no influence on instances where the pattern graph is not isomorphic to any subgraph of the target graph, because a complete search must then be performed.

During the probing search procedure, we use info to collect information on the relationship between pattern and target vertices. Suppose that, after some pattern vertices have been matched to different target vertices, a target vertex $v^t_i$ is often still included in the domain $Dom(v^p)$, i.e., $info(\{v^p, v^t_i\})$ has a high value. We then assume that $v^t_i$ has more potential to match $v^p$ than other target vertices, because the matched pair $\{v^p, v^t_i\}$ tends to cause few conflicts. This means that the other pattern vertices are more likely to find corresponding target vertices in the subsequent search when $v^t_i$ is matched to $v^p$. At the same time, our matching ordering method also considers structural information about the target graph, namely vertex degree.

As a result, we define a novel ordering score as follows.

Definition 1. For a pattern graph $G_p = (V_p, E_p)$ and a target graph $G_t = (V_t, E_t)$, the ordering score function, denoted oscore, is a function on $v^p \in V_p$ and $v^t \in Dom(v^p)$ such that

$$oscore(v^p, v^t) = info(\{v^p, v^t\}) + \deg_{G_t}(v^t)$$


In our proposed matching ordering method, when the algorithm chooses a pattern vertex $v^p$, the target vertices in the domain of the selected pattern vertex are arranged in descending order of their oscore values (Line 8 in Algorithm 2). The matching ordering method therefore depends on the search information collected by ProSearch. In the experimental section, we show that this method performs better than several other sorting methods.

3.4 Adaptive Propagation Method
For the SIP, Glasgow [McCreesh et al., 2020] and PathLAD [Kotthoff et al., 2016] both perform very well, but they use completely different search strategies. In particular, Glasgow combines a weak propagation method with a fast restart mechanism. In our preliminary experiments, Glasgow made at least $10^4$ recursive calls per second on some instances. In contrast, PathLAD uses a strong propagation method at every stage of the search, so it sometimes makes fewer than one recursive call per second on large target graphs. As far as we have observed, no current algorithm for the SIP uses propagation methods of different strengths at different stages of the search. Our motivation is therefore to design a method that can apply propagation methods flexibly during the search.

Algorithm 3 APM
Input: Target graph $G_t$ and a non-matched pattern vertex $v^p$
Output: The reduced domain $Dom(v^p)$ of $v^p$
1: reduce $Dom(v^p)$ based on FC(Diff) and FC(Edges);
2: if $G_t$ is not a sparse graph then
3:     if nbnodes > max_tries && nbfail/nbnodes > $\beta_1$ then
4:         switchL := 0;
5:         if switchL == 0 for the first time then
6:             Nb := nbnodes;
7:         end if
8:     end if
9:     if switchL == 0 && nbnodes > 2Nb && nbfail/nbnodes > $\beta_2$ then
10:        switchA := 0;
11:    end if
12: end if
13: if switchL == 1 then
14:     reduce $Dom(v^p)$ based on LAD-filtering;
15: end if
16: if switchA == 1 then
17:     reduce $Dom(v^p)$ based on GAC(allDiff);
18: end if
19: return $Dom(v^p)$;
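Below is a minimal Python sketch of the control logic in Algorithm 3, together with the single-pair test performed by LAD-filtering. The thresholds `max_tries`, `beta1` and `beta2` are passed in as arguments because this section does not give their values; the values in the example are illustrative only. `APMState`, `apm_switches`, `is_sparse` and `lad_supported` are hypothetical names. The sparsity test follows the definition given for APM (median target degree below $deg_m$ = 20). GAC(allDiff) itself is not implemented here. The matching uses simple augmenting paths (Kuhn's algorithm) rather than the Hopcroft–Karp algorithm mentioned in Section 2.2.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency map: vertex -> set of neighbours


def is_sparse(gt: Graph, deg_m: int = 20) -> bool:
    """Target graph is 'sparse' if its median vertex degree is below deg_m."""
    return statistics.median([len(nbrs) for nbrs in gt.values()]) < deg_m


@dataclass
class APMState:
    nbnodes: int = 0
    nbfail: int = 0
    nb: float = float("inf")  # Nb: nbnodes when LAD-filtering was first disabled
    switch_l: bool = True     # switchL: LAD-filtering still enabled
    switch_a: bool = True     # switchA: GAC(allDiff) still enabled


def apm_switches(s: APMState, sparse: bool, max_tries: int,
                 beta1: float, beta2: float) -> List[str]:
    """Lines 2-18 of Algorithm 3: update the one-way switches, list the propagators."""
    if not sparse and s.nbnodes > 0:
        ratio = s.nbfail / s.nbnodes
        if s.nbnodes > max_tries and ratio > beta1:
            if s.switch_l:            # first time LAD-filtering is switched off
                s.nb = s.nbnodes
            s.switch_l = False
        if not s.switch_l and s.nbnodes > 2 * s.nb and ratio > beta2:
            s.switch_a = False
    chosen = ["FC(Diff)", "FC(Edges)"]  # weak propagators always run (Line 1)
    if s.switch_l:
        chosen.append("LAD-filtering")
    if s.switch_a:
        chosen.append("GAC(allDiff)")
    return chosen


def lad_supported(gp: Graph, gt: Graph, doms: Dict[int, Set[int]], vp: int, vt: int) -> bool:
    """LAD-filtering test: can N_Gp(vp) be matched injectively into N_Gt(vt)
    using only edges (u, w) with w in Dom(u)? Uses Kuhn's augmenting paths."""
    owner: Dict[int, int] = {}        # target neighbour -> pattern neighbour

    def augment(u: int, seen: Set[int]) -> bool:
        for w in gt[vt]:
            if w in doms[u] and w not in seen:
                seen.add(w)
                if w not in owner or augment(owner[w], seen):
                    owner[w] = u
                    return True
        return False

    return all(augment(u, set()) for u in gp[vp])


# Illustrative thresholds only: max_tries=100, beta1=beta2=0.5.
s = APMState(nbnodes=1000, nbfail=600)
print(apm_switches(s, False, 100, 0.5, 0.5))   # LAD-filtering dropped, Nb = 1000
s.nbnodes, s.nbfail = 2500, 1500
print(apm_switches(s, False, 100, 0.5, 0.5))   # nbnodes > 2Nb: GAC(allDiff) dropped too

triangle: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
p3: Graph = {0: {1}, 1: {0, 2}, 2: {1}}
doms = {v: {0, 1, 2} for v in triangle}
print(lad_supported(triangle, p3, doms, 0, 1), lad_supported(triangle, p3, doms, 0, 0))
```

Once a switch is turned off, the sketch never turns it back on, which matches the one-way switching of Algorithm 3. On a sparse target graph, the function leaves both strong propagators enabled, as Line 2 prescribes.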

In ProSearch, the algorithm always uses LAD-filtering and GAC(allDiff), both of which have high time complexity. Therefore, in the main search procedure, we design an adaptive propagation method to guide the use of the strong propagation methods GAC(allDiff) and LAD-filtering. The pseudo-code of APM is shown in Algorithm 3.

Let us consider LAD-filtering first. We define a target graph to be sparse if the median of its vertex degrees is less than $deg_m$; in our work, $deg_m$ is set to 20. When a pattern vertex $v^p$ is matched to a target vertex $v^t_i \in Dom(v^p)$, LAD-filtering ensures that the pattern vertices in $N_{G_p}(v^p)$ can be matched to pairwise different target vertices in $N_{G_t}(v^t_i)$. Its execution time depends on the degrees of $v^p$ and $v^t_i$. Because $\deg_{G_t}(v^t_i)$ must be greater than or equal to $\deg_{G_p}(v^p)$, we only need to consider the degree of the target vertex $v^t_i$. If the target graph is sparse, the execution time of LAD-filtering is reasonable, and we consider it feasible to use it at every stage of the main search procedure.

Otherwise, if the algorithm often backtracks because of many conflicts, it can switch to the weak propagation methods FC(Diff) and FC(Edges) instead of the strong ones. Weak propagation methods may lead the search deeper into a wrong branch than strong ones would. However, the algorithm can backtrack faster because the complexity of these weak methods is quite low. In such cases, calling LAD-filtering repeatedly during the main search procedure wastes a lot of computation time. We therefore detect whether backtracking occurs often in the main search procedure by observing the values of nbnodes and nbfail, and we use a parameter max_tries as a threshold on nbnodes. On the one hand, if nbnodes is smaller than max_tries, the search is probably still at an early stage. Because backtracking over branch selections is costly, we want to detect more conflicts by using LAD-filtering on the wrong branches as early as possible. On the other hand, if nbnodes is larger than max_tries, the algorithm has already explored some parts of the search space. In this case, we consider that the relationship between nbfail and nbnodes provides useful information about the given instance. A large proportion of failed nodes (i.e., $nbfail/nbnodes > \beta_1$, where $\beta_1$ is a parameter) means that the algorithm has already backtracked a lot. In that case, the algorithm stops using LAD-filtering (Lines 3–4).

Next, we consider the other strong propagation method, GAC(allDiff). Although this constraint has high time complexity in theory, it is actually faster than LAD-filtering in most cases, for the following reason. GAC(allDiff) constructs a bipartite graph between pattern vertices and target vertices. If a pattern vertex $v^p$ is matched to a target vertex $v^t_i$, GAC(allDiff) removes $v^t_i$ from the domains of the other pattern vertices and ensures that all pattern vertices can still be matched to different target vertices. Removing a selected target vertex from the bipartite graph only requires finding the next free target vertices for some pattern vertices by looking for augmenting paths [Derigs, 1981]. In practice, the target graph is usually larger than the corresponding pattern graph. Thus, when the sizes of the two graphs differ greatly, GAC(allDiff) is likely to run in linear time.

In the main search procedure, after disabling LAD-filtering, the algorithm begins to consider whether to disable GAC(allDiff). When LAD-filtering is disabled for the first time, the variable Nb records the current value of nbnodes (Lines 5–6). GAC(allDiff) continues to be used until nbnodes has doubled since
LAD-filtering was first disabled, i.e., until $nbnodes > 2Nb$. At that point, if the algorithm still backtracks frequently (i.e., $nbfail/nbnodes > \beta_2$, where $\beta_2$ is a parameter), we disable GAC(allDiff) (Lines 9–10). For the rest of the search, the algorithm employs no strong propagation method, i.e., neither LAD-filtering nor GAC(allDiff).

Note that, in our work, switching between propagation methods is one-way. The pruning power of weak propagation methods increases significantly with search depth, so there is no need to switch back to strong propagation methods once the search reaches a certain depth. In preliminary experiments, we found that one-way switching was both straightforward and effective, whereas two-way switching performed poorly on some instances. Recently, researchers have developed dynamic choice methods for several well-known problems, such as the CSP [Stergiou, 2021]. One crucial step in algorithm design is to dynamically combine various methods for a particular problem. Notably, our method is the first to use a dynamic choice approach to select propagation methods for the SIP.

We now summarize how the parameters $\beta_1$ and $\beta_2$ affect the choice of propagators. $\beta_1$ and $\beta_2$ are two thresholds that determine whether the search on a given instance backtracks frequently because of numerous conflicts. When the conflict ratio $nbfail/nbnodes$ exceeds these thresholds, we switch to simpler propagation methods so that the search can backtrack quickly. Specifically, a higher value of $\beta_1$ indicates a greater tolerance for conflicts while all propagation methods are in use; once the conflict ratio surpasses $\beta_1$, we discard LAD-filtering. Likewise, a larger value of $\beta_2$ implies a higher tolerance for conflicts while GAC(allDiff) is the only strong propagation method in use; once the conflict ratio exceeds $\beta_2$, we also abandon GAC(allDiff).

4 Experimental Evaluation
In this section, we evaluate PathLAD+ on a broad range of benchmarks and compare it against state-of-the-art algorithms for the SIP.

4.1 Benchmarks
For our experiments, we select all instances used in [Kraiczy and McCreesh, 2021; Liu et al., 2022], which can also be downloaded from the website¹. In total, we choose 15396 instances, which can be grouped into 8 benchmarks.
• images-CVIU11 (6278 instances): This benchmark includes 43 pattern graphs and 146 target graphs, which have been generated from segmented images [Damiand et al., 2011]. In the benchmark, pattern graphs have be-
• images-PR15 (24 instances): There are 24 pattern graphs with between 4 and 170 vertices and 1 target graph with 4838 vertices. All the graphs have been derived from segmented images [Solnon et al., 2015].
• scalefree (100 instances): Each instance contains a target graph with between 200 and 1000 vertices and a pattern graph whose number of vertices is 90% of that of the corresponding target graph. All the instances in the benchmark have been randomly generated using a power-law degree distribution [Solnon, 2010].
• si (1170 instances): Each instance is composed of a target graph (between 200 and 1296 vertices) and a pattern graph (between 20% and 60% of the vertices of the corresponding target graph). This benchmark comprises bounded valence graphs, modified bounded valence graphs, 4D meshes, and randomly generated graphs [Solnon, 2010].
• phase-transition (200 instances): These random instances are chosen to be close to the satisfiable–unsatisfiable phase transition. Pattern graphs have 30 vertices, while target graphs have 150 vertices [McCreesh et al., 2016b].
• LV (1176 instances): 49 selected graphs with between 10 and 128 vertices are used as both pattern and target graphs; this benchmark has been used in previous work [Liu et al., 2020]. These graphs have different properties [Solnon, 2010], such as being connected, biconnected or triconnected.
• LargerLV (3430 instances): The 49 LV graphs above are used as pattern graphs, and 70 other graphs, with between 138 and 6671 vertices, are used as target graphs. More details of the target graphs can be found on the website².

4.2 Experiment Setup
We compare PathLAD+ with four state-of-the-art SIP algorithms: Glasgow+Clq [Kraiczy and McCreesh, 2021], PathLAD [Kotthoff et al., 2016], RI [Bonnici et al., 2013] and VF2 [Cordella et al., 2004]. The code of these competitors was kindly provided by their authors. Our source code is publicly available on GitHub³. Our proposed algorithm and the four competitors are all implemented in C++ and compiled by g++ with the '-O3' option. All the algorithms are run on an Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz with 512GB RAM under CentOS 7.9. The cutoff time is 3600 seconds for each instance. According to our preliminary experiments,
tween 22 and 151 vertices, whereas target graphs have parameters max tries, β1 , and β2 are set to 1000, 0.85, and
between 1072 and 5972 vertices. 0.8, respectively.
• meshes-CVIU11 (3018 instances): It is composed of 6 For each algorithm, we report the number of instances for
pattern graphs and 503 target graphs, which have been each benchmark (#inst) and the number of successful solved
generated from meshes modeling 3D object [Damiand instances (#solved). The bold values in the tables indicate
et al., 2011]. The number of vertices for pattern graphs the best solution among all the algorithms.
is from 40 to 199, while the number of vertices is from 2
208 to 5873. https://github.com/ciaranm/cpaior2021-finding-subgraphs-
with-side-constraints/tree/main/instances/largerGraphs
1 3
http://liris.cnrs.fr/csolnon/SIP.html https://github.com/yiyuanwang1988/PathLAD-Plus
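The one-way propagator switching driven by β1 and β2 can be sketched in a few lines. The Python below is a minimal illustration, not the authors' C++ code. It uses the reported settings β1 = 0.85 and β2 = 0.8 and makes several assumptions. The conflict ratio is passed in as a given number, because its exact definition is not restated here. The label "simple" is a hypothetical placeholder for the "simple propagation methods". The β2 test is assumed to apply only after LAD-filtering has already been dropped, which is one reading of the description.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Propagation stages; the search only ever moves to a higher stage."""
    ALL = 0          # all propagators, including LAD-filtering
    NO_LAD = 1       # LAD-filtering discarded; GAC(allDiff) is the strongest left
    SIMPLE_ONLY = 2  # GAC(allDiff) discarded as well


# Ordered from cheapest to strongest. "simple" is a hypothetical placeholder
# for the paper's "simple propagation methods".
PROPAGATORS = ["simple", "GAC(allDiff)", "LAD-filtering"]


def next_stage(stage: Stage, conflict_ratio: float,
               beta1: float = 0.85, beta2: float = 0.8) -> Stage:
    """One-way switching (assumed reading): advance at most one stage, never go back."""
    if stage == Stage.ALL and conflict_ratio > beta1:
        return Stage.NO_LAD
    if stage == Stage.NO_LAD and conflict_ratio > beta2:
        return Stage.SIMPLE_ONLY
    return stage


def active_propagators(stage: Stage) -> list:
    """Propagators still in use at a given stage (strongest ones dropped first)."""
    return PROPAGATORS[: len(PROPAGATORS) - stage]


if __name__ == "__main__":
    stage = Stage.ALL
    for ratio in [0.5, 0.9, 0.82, 0.3]:  # hypothetical conflict ratios
        stage = next_stage(stage, ratio)
        print(ratio, stage.name, active_propagators(stage))
```

With these hypothetical ratios, the stage moves from ALL to NO_LAD to SIMPLE_ONLY. It then stays at SIMPLE_ONLY even when the ratio later falls to 0.3, which is the one-way behavior the text prefers over two-way switching.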


                            #solved
Benchmark           #inst   PathLAD+   Glasgow+Clq   PathLAD      RI     VF2
images-CVIU11        6278       6278          6278      6278    6278    6278
meshes-CVIU11        3018       3008          2987      2983    2695    2647
images-PR15            24         24            24        24      24      24
scalefree             100        100           100       100      82      21
si                   1170       1170          1170      1109    1163     886
phase-transition      200        134           128        44      31       0
LV                   1176       1139          1136      1130    1039     811
LargerLV             3430       3344          3318      3300    3154    2505
#total              15396      15197         15141     14968   14466   13172

Table 1: Experiment results on all the benchmarks.

4.3 Experiment Results


Table 1 shows the experiment results of our proposed algorithm and all competitors. PathLAD+ solves at least as many instances as our baseline algorithm PathLAD on every benchmark. It solves strictly more on meshes-CVIU11, si, phase-transition, LV, and LargerLV. Overall, PathLAD+ dominates Glasgow+Clq, PathLAD, RI, and VF2: no competitor solves more instances than PathLAD+ on any benchmark. Because all algorithms handle the simple instances very well (e.g., every algorithm solves all of images-CVIU11 and images-PR15), we mainly focus on the hard ones. PathLAD+ performs significantly better than all competitors on some hard benchmarks, especially meshes-CVIU11. On this benchmark, every competitor leaves more than 30 instances unsolved, whereas PathLAD+ leaves only 10 unsolved within the cutoff time. Among the selected 15396 instances, PathLAD+ solves 15197 within the cutoff time, whereas the strongest competitor, Glasgow+Clq, solves only 15141. Furthermore, to display the performance of each algorithm visually, we report detailed results in Figure 1, which confirms the effectiveness of our proposed algorithm.

Figure 1: Detailed results of PathLAD+ and all competitors on all the benchmarks.
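The comparisons in this paragraph can be checked mechanically against Table 1. The short Python sketch below is an illustration and not part of the paper's artifacts. It stores the solved counts from Table 1 and recomputes the #total row and the unsolved counts on meshes-CVIU11. It also checks the claim that PathLAD+ never solves fewer instances than a competitor on any benchmark. The function names are hypothetical.

```python
SOLVERS = ["PathLAD+", "Glasgow+Clq", "PathLAD", "RI", "VF2"]

# benchmark -> (#inst, #solved per solver in SOLVERS order), copied from Table 1
TABLE1 = {
    "images-CVIU11": (6278, (6278, 6278, 6278, 6278, 6278)),
    "meshes-CVIU11": (3018, (3008, 2987, 2983, 2695, 2647)),
    "images-PR15": (24, (24, 24, 24, 24, 24)),
    "scalefree": (100, (100, 100, 100, 82, 21)),
    "si": (1170, (1170, 1170, 1109, 1163, 886)),
    "phase-transition": (200, (134, 128, 44, 31, 0)),
    "LV": (1176, (1139, 1136, 1130, 1039, 811)),
    "LargerLV": (3430, (3344, 3318, 3300, 3154, 2505)),
}


def unsolved(benchmark: str) -> dict:
    """Instances each solver failed to solve within the 3600 s cutoff."""
    n_inst, solved = TABLE1[benchmark]
    return {name: n_inst - s for name, s in zip(SOLVERS, solved)}


def totals() -> tuple:
    """Total #inst and total #solved per solver (the #total row)."""
    n_inst = sum(n for n, _ in TABLE1.values())
    per_solver = [sum(col) for col in zip(*(s for _, s in TABLE1.values()))]
    return n_inst, dict(zip(SOLVERS, per_solver))


def never_worse(a: str, b: str) -> bool:
    """True if solver a solves at least as many instances as b on every benchmark."""
    ia, ib = SOLVERS.index(a), SOLVERS.index(b)
    return all(s[ia] >= s[ib] for _, s in TABLE1.values())


if __name__ == "__main__":
    print(totals())
    print(unsolved("meshes-CVIU11"))
    print(all(never_worse("PathLAD+", other) for other in SOLVERS[1:]))
```

Running it reproduces the #total row of Table 1. It also gives 10 unsolved meshes-CVIU11 instances for PathLAD+, against 31, 35, 323, and 371 for Glasgow+Clq, PathLAD, RI, and VF2.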
4.4 Analysis of Proposed Strategies
To confirm the effectiveness of our proposed matching ordering method, we evaluate different matching ordering methods on our baseline algorithm PathLAD:
1) PathLAD-our uses our proposed matching ordering method;
2) PathLAD-degree selects the target vertex with the largest degree from the given domain;
3) PathLAD-random chooses a random target vertex from the given domain;
4) PathLAD-anti picks the target vertex with the smallest degree from the given domain.
Different matching ordering methods only affect satisfiable instances (those for which a subgraph isomorphism exists) [Archibald et al., 2019]. For this reason, Figure 2 shows the performance of the different matching ordering methods on these instances. The results show that our proposed matching ordering method performs better than the other methods. Moreover, the proposed ordering method makes effective use of the information generated by the probing search procedure, and it clearly improves performance on the SIP.
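The three baseline orderings compared here (degree, random, anti) are easy to state as orderings of a domain of candidate target vertices. The Python sketch below shows one way to express them. It assumes that the domain is a list of target vertex ids and that `degree` maps each target vertex to its degree in the target graph. The function name `order_domain` and the strategy labels are hypothetical, and ties are broken by the original domain order. The proposed ordering (PathLAD-our) is deliberately not reproduced, because it depends on information from the probing search that this section does not detail.

```python
import random
from typing import Dict, List, Optional


def order_domain(domain: List[int], degree: Dict[int, int], strategy: str,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Order in which candidate target vertices are tried for one pattern vertex.

    For a fixed domain, trying candidates in this sorted order is equivalent to
    repeatedly picking the best remaining one ("degree": largest degree first,
    "anti": smallest degree first). Python's sort is stable, so ties keep
    their order in `domain`.
    """
    if strategy == "degree":  # PathLAD-degree: largest degree first
        return sorted(domain, key=lambda v: -degree[v])
    if strategy == "anti":    # PathLAD-anti: smallest degree first
        return sorted(domain, key=lambda v: degree[v])
    if strategy == "random":  # PathLAD-random: uniformly random order
        order = list(domain)
        (rng or random.Random()).shuffle(order)
        return order
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    degree = {0: 3, 1: 5, 2: 1, 3: 5}  # toy target-vertex degrees
    domain = [0, 1, 2, 3]
    print(order_domain(domain, degree, "degree"))  # [1, 3, 0, 2]
    print(order_domain(domain, degree, "anti"))    # [2, 0, 1, 3]
    print(order_domain(domain, degree, "random", random.Random(0)))
```

Passing a seeded `random.Random` keeps the random baseline reproducible across runs. This matters when the same instance set is used to compare several orderings.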
We compare PathLAD with an alternative algorithm, PathLAD-1, that uses the adaptive propagation method. Neither PathLAD-1 nor PathLAD uses any matching ordering method. The effectiveness of the adaptive propagation method can be clearly observed in Figure 3, where differently colored points show instances from different benchmarks. Figures 2 and 3 show that the two proposed strategies play an important role in our algorithm. In addition, because PathLAD is the baseline of our proposed algorithm, Figure 4 compares the run time of PathLAD+ with that of PathLAD. Once again, the results show the superiority of PathLAD+.

Figure 2: The run time of PathLAD with different matching ordering strategies on all the isomorphic satisfiable instances.
Figure 3: The run time of PathLAD and PathLAD-1 on all the benchmarks.
Figure 4: The run time of PathLAD+ and PathLAD on all the benchmarks.

5 Conclusion
In this paper, we propose a probing search procedure, a novel matching ordering method, and an adaptive propagation method for the SIP. Based on these strategies, we develop an efficient algorithm called PathLAD+. Experiments show that PathLAD+ significantly outperforms the state-of-the-art SIP algorithms.
As for future work, the proposed adaptive propagation method can be considered as a general idea for solving other NP-hard problems [Chen et al., 2023].

Acknowledgements
This work was supported by CAS Project for Young Scientists in Basic Research (Grant No. YSBR-040), NSFC (61806050), Jilin Science and Technology Association QT202005, and Science and Technology Development Program of Jilin Province (YDZJ202201ZYTS412 and 20230101060JC). We would like to thank the anonymous referees for their helpful comments.

References
[Archibald et al., 2019] Blair Archibald, Fraser Dunlop, Ruth Hoffmann, Ciaran McCreesh, Patrick Prosser, and James Trimble. Sequential and parallel solution-biased search for subgraph algorithms. In CPAIOR, pages 20–38, 2019.
[Audemard et al., 2014] Gilles Audemard, Christophe Lecoutre, Mouny Samy-Modeliar, Gilles Goncalves, and Daniel Porumbel. Scoring-based neighborhood dominance for the subgraph isomorphism problem. In CP, pages 125–141, 2014.
[Bonnici et al., 2013] Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, and Alfredo Ferro. A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinformatics, 14(7):1–13, 2013.
[Carletti et al., 2017] Vincenzo Carletti, Pasquale Foggia, Alessia Saggese, and Mario Vento. Introducing VF3: A new algorithm for subgraph isomorphism. In GbRPR, pages 128–139, 2017.
[Chen et al., 2023] Jiejiang Chen, Shaowei Cai, Yiyuan Wang, Wenhao Xu, Jia Ji, and Minghao Yin. Improved local search for the minimum weight dominating set problem in massive graphs by using a deep optimization mechanism. Artificial Intelligence, 314:103819, 2023.
[Cordella et al., 2004] Luigi P Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(10):1367–1372, 2004.
[Damiand et al., 2011] Guillaume Damiand, Christine Solnon, Colin De la Higuera, Jean-Christophe Janodet, and Émilie Samuel. Polynomial algorithms for subisomorphism of nD open combinatorial maps. Computer Vision and Image Understanding, 115(7):996–1010, 2011.
[Derigs, 1981] Ulrich Derigs. A shortest augmenting path method for solving minimal perfect matching problems. Networks, 11(4):379–390, 1981.
[Gocht et al., 2020] Stephan Gocht, Ross McBride, Ciaran McCreesh, Jakob Nordström, Patrick Prosser, and James Trimble. Certifying solvers for clique and maximum common (connected) subgraph problems. In CP, pages 338–357, 2020.


[Johnson and Garey, 1979] David S Johnson and Michael R Garey. Computers and Intractability: A Guide to the Theory of NP-Completeness. WH Freeman, 1979.
[Kim et al., 2015] Jinha Kim, Hyungyu Shin, Wook-Shin Han, Sungpack Hong, and Hassan Chafi. Taming subgraph isomorphism for RDF query processing. Proceedings of the VLDB Endowment, 8(11):1238–1249, 2015.
[Kotthoff et al., 2016] Lars Kotthoff, Ciaran McCreesh, and Christine Solnon. Portfolios of subgraph isomorphism algorithms. In LION, pages 107–122, 2016.
[Kraiczy and McCreesh, 2021] Sonja Kraiczy and Ciaran McCreesh. Solving graph homomorphism and subgraph isomorphism problems faster through clique neighbourhood constraints. In IJCAI, pages 1396–1402, 2021.
[Liu et al., 2020] Yanli Liu, Chu-Min Li, Hua Jiang, and Kun He. A learning based branch and bound for maximum common subgraph related problems. In AAAI, pages 2392–2399, 2020.
[Liu et al., 2022] Yanli Liu, Jiming Zhao, Chu-Min Li, Hua Jiang, and Kun He. Hybrid learning with new value function for the maximum common subgraph problem. arXiv preprint arXiv:2208.08620, 2022.
[Lladós et al., 2001] Josep Lladós, Enric Martí, and Juan J. Villanueva. Symbol recognition by error-tolerant subgraph matching between region adjacency graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1137–1143, 2001.
[McCreesh and Prosser, 2015] Ciaran McCreesh and Patrick Prosser. A parallel, backjumping subgraph isomorphism algorithm using supplemental graphs. In CP, pages 295–312, 2015.
[McCreesh et al., 2016a] Ciaran McCreesh, Samba Ndojh Ndiaye, Patrick Prosser, and Christine Solnon. Clique and constraint models for maximum common (connected) subgraph problems. In CP, pages 350–368, 2016.
[McCreesh et al., 2016b] Ciaran McCreesh, Patrick Prosser, and James Trimble. Heuristics and really hard instances for subgraph isomorphism problems. In IJCAI, pages 631–638, 2016.
[McCreesh et al., 2017] Ciaran McCreesh, Patrick Prosser, and James Trimble. A partitioning algorithm for maximum common subgraph problems. In IJCAI, pages 712–719, 2017.
[McCreesh et al., 2018] Ciaran McCreesh, Patrick Prosser, Christine Solnon, and James Trimble. When subgraph isomorphism is really hard, and why this matters for graph databases. Journal of Artificial Intelligence Research, 61:723–759, 2018.
[McCreesh et al., 2020] Ciaran McCreesh, Patrick Prosser, and James Trimble. The Glasgow subgraph solver: Using constraint programming to tackle hard subgraph isomorphism problem variants. In ICGT, pages 316–324, 2020.
[Ohlrich et al., 1993] Miles Ohlrich, Carl Ebeling, Eka Ginting, and Lisa Sather. SubGemini: Identifying subcircuits using a fast subgraph isomorphism algorithm. In DAC, pages 31–37, 1993.
[Snijders et al., 2006] Tom AB Snijders, Philippa E Pattison, Garry L Robins, and Mark S Handcock. New specifications for exponential random graph models. Sociological Methodology, 36(1):99–153, 2006.
[Solnon et al., 2015] Christine Solnon, Guillaume Damiand, Colin De La Higuera, and Jean-Christophe Janodet. On the complexity of submap isomorphism and maximum common submap problems. Pattern Recognition, 48(2):302–316, 2015.
[Solnon, 2010] Christine Solnon. Alldifferent-based filtering for subgraph isomorphism. Artificial Intelligence, 174(12-13):850–864, 2010.
[Solnon, 2019] Christine Solnon. Experimental evaluation of subgraph isomorphism solvers. In GbRPR, pages 1–13, 2019.
[Stergiou, 2021] Kostas Stergiou. Adaptive constraint propagation in constraint satisfaction: review and evaluation. Artificial Intelligence Review, 54(7):5055–5093, 2021.
[Wang et al., 2022] Hanchen Wang, Ying Zhang, Lu Qin, Wei Wang, Wenjie Zhang, and Xuemin Lin. Reinforcement learning based query vertex ordering model for subgraph matching. In ICDE, pages 245–258, 2022.
[Zampelli et al., 2010] Stéphane Zampelli, Yves Deville, and Christine Solnon. Solving subgraph isomorphism problems with constraint programming. Constraints, 15(3):327–353, 2010.
[Zhou et al., 2022] Jianrong Zhou, Kun He, Jiongzhi Zheng, Chu-Min Li, and Yanli Liu. A strengthened branch and bound algorithm for the maximum common (connected) subgraph problem. In IJCAI, pages 1908–1914, 2022.
