PathLAD+: An Improved Exact Algorithm for Subgraph Isomorphism Problem
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23)
In our proposed matching ordering method, when the algorithm chooses a pattern vertex $v^p$, the positions of target vertices in the domain of a selected pattern are arranged in a descending order of the oscore values (Line 8 in Algorithm 2). The proposed matching ordering method depends on the search information of ProSearch. In the experimental section, we will show that this method has outstanding performance compared to several other sorting methods.

3.4 Adaptive Propagation Method

For the SIP, Glasgow [McCreesh et al., 2020] and PathLAD [Kotthoff et al., 2016] have outstanding performance, but they are completely different search strategies. Especially, Glasgow combines a weak propagation method with a fast restart mechanism. According to our preliminary experiments, Glasgow can make at least $10^4$ recursive calls per second for some instances. On the contrary, PathLAD uses a strong propagation method at each stage of the search, and thus it sometimes makes less than one recursive call per second when dealing with some large target graphs. Based on our observations, no current algorithms for the SIP use different strengths of propagation methods at different stages of the search. Thus, our motivation is to design a method that can flexibly use some propagation methods in the search.

Algorithm 3 APM
Input: Target graph $G_t$ and a non-matched pattern vertex $v^p$
Output: The reduced domain $D(v^p)$ of $v^p$
 1: reduce $D(v^p)$ based on FC(Diff) and FC(Edges);
 2: if $G_t$ is not a sparse graph then
 3:     if nbnodes > max_tries && nbfail/nbnodes > $\beta_1$ then
 4:         switchL := 0;
 5:         if switchL == 0 at the first time then
 6:             Nb := nbnodes;
 7:         end if
 8:     end if
 9:     if switchL == 0 && nbnodes > 2Nb && nbfail/nbnodes > $\beta_2$ then
10:         switchA := 0;
11:     end if
12: end if
13: if switchL == 1 then
14:     reduce $D(v^p)$ based on LAD-filtering;
15: end if
16: if switchA == 1 then
17:     reduce $D(v^p)$ based on GAC(allDiff);
18: end if
19: return $D(v^p)$;

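The switching logic of Algorithm 3 can be sketched in Python as follows. This is only an illustrative reading, not the authors' C++ implementation: the names `ApmState`, `is_sparse`, and `choose_propagators` are hypothetical, the propagators themselves are not implemented (the function only returns the names of the methods that would be run), and the default values deg_m = 20, max_tries = 1000, β1 = 0.85, and β2 = 0.8 are the settings reported in the paper. The sketch also assumes that "at the first time" in Line 5 means Nb is recorded only at the moment switchL changes from 1 to 0.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Sequence


@dataclass
class ApmState:
    """Counters and one-way switches, named after Algorithm 3 (hypothetical class)."""
    nbnodes: int = 0           # number of search nodes explored so far
    nbfail: int = 0            # failure counter (nbfail in Algorithm 3)
    switch_l: bool = True      # switchL == 1: LAD-filtering still enabled
    switch_a: bool = True      # switchA == 1: GAC(allDiff) still enabled
    nb: Optional[int] = None   # Nb: nbnodes when LAD-filtering was first disabled


def is_sparse(target_degrees: Sequence[int], deg_m: int = 20) -> bool:
    """A target graph is sparse if the median of its vertex degrees is below deg_m."""
    return median(target_degrees) < deg_m


def fail_ratio(state: ApmState) -> float:
    """nbfail / nbnodes, guarded against division by zero."""
    return state.nbfail / state.nbnodes if state.nbnodes > 0 else 0.0


def choose_propagators(state: ApmState, sparse: bool, max_tries: int = 1000,
                       beta1: float = 0.85, beta2: float = 0.8) -> List[str]:
    """Update the switches and return the names of the propagators to run."""
    if not sparse:
        # Lines 3-8: disable LAD-filtering after frequent backtracking.
        if state.nbnodes > max_tries and fail_ratio(state) > beta1:
            if state.switch_l:            # first time: remember nbnodes as Nb
                state.nb = state.nbnodes
            state.switch_l = False
        # Lines 9-11: later, possibly disable GAC(allDiff) as well.
        if (not state.switch_l
                and state.nbnodes > 2 * state.nb
                and fail_ratio(state) > beta2):
            state.switch_a = False

    propagators = ["FC(Diff)", "FC(Edges)"]   # Line 1: weak propagation always runs
    if state.switch_l:
        propagators.append("LAD-filtering")   # Lines 13-15
    if state.switch_a:
        propagators.append("GAC(allDiff)")    # Lines 16-18
    return propagators
```

Because neither switch is ever set back to enabled, the sketch is one-way by construction: once a strong propagator has been dropped for a non-sparse target graph, later calls never reinstate it.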
In the ProSearch procedure, the algorithm always uses LAD-filtering and GAC(allDiff). Both of them have high time complexity. Therefore, in the main search procedure, we design an adaptive propagation method to guide the use of the strong propagation methods GAC(allDiff) and LAD-filtering. The pseudo-code of APM is shown in Algorithm 3.

Let us consider LAD-filtering first. We define a target graph to be sparse if the median of its vertex degrees is less than $deg_m$. In our work, $deg_m$ is set to 20. When a pattern vertex $v^p$ is matched to a target vertex $v_i^t \in Dom(v^p)$, LAD-filtering ensures that every pattern vertex in $N_{G_p}(v^p)$ can match different target vertices in $N_{G_t}(v_i^t)$. Its execution time depends on the degrees of $v^p$ and $v_i^t$. Because $deg_{G_t}(v_i^t)$ must be at least $deg_{G_p}(v^p)$, we just need to focus on the degree of the target vertex $v_i^t$. If the target graph is sparse, the execution time of LAD-filtering is reasonable and we think that using it at every stage of the main search procedure is feasible.

In other cases, if the algorithm often backtracks due to many conflicts, it can switch to weak propagation methods, namely FC(Diff) and FC(Edges), instead of strong propagation methods. Although weak propagation methods may search deeper on a wrong branch than strong propagation methods, the algorithm can backtrack faster because the complexity of these weak methods is quite low. In such cases, calling LAD-filtering many times during the main search procedure wastes a lot of computation time. In our work, we analyze whether backtracking often occurs in the main search procedure by observing the values of nbnodes and nbfail. Meanwhile, we use a parameter max_tries as a threshold on nbnodes. In detail, on the one hand, if nbnodes is smaller than max_tries, the search may still be in its early stage. Because backtracking for branch selection is costly, we want to expose more conflicts by using LAD-filtering on the wrong branches as early as possible. On the other hand, if nbnodes is larger than max_tries, the algorithm has already explored some parts of the whole search space. In this case, we think the relationship between nbfail and nbnodes can provide useful information about a given instance. If the number of failures in the search procedure is large relative to nbnodes (i.e., nbfail/nbnodes > $\beta_1$, where $\beta_1$ is a parameter), the algorithm has already backtracked a lot and thus no longer uses LAD-filtering (Lines 3–4).

In the following, we consider the other strong propagation method, GAC(allDiff). Although this constraint has high time complexity in theory, it is actually faster than LAD-filtering in most cases, for the following reason. GAC(allDiff) constructs a bipartite graph between pattern vertices and target vertices. If a pattern vertex $v^p$ is matched to a target vertex $v_i^t$, GAC(allDiff) removes $v_i^t$ from the domains of some other pattern vertices and ensures that all pattern vertices can still match different target vertices. Removing a selected target vertex from the generated bipartite graph only requires finding the next free target vertices for some pattern vertices by looking for an augmenting path [Derigs, 1981]. In fact, a given target graph is usually larger than the corresponding pattern graph. Thus, when the sizes of the two graphs are quite different, GAC(allDiff) is likely to run in linear time.

In the main search procedure, after disabling LAD-filtering, the algorithm begins to consider whether to disable GAC(allDiff). When LAD-filtering is disabled for the first time, we use the variable Nb to record the current value of nbnodes (Lines 5–6). GAC(allDiff) will continue to be used until nbnodes has doubled since
LAD-filtering was first disabled, i.e., nbnodes > 2Nb. At this time, if the algorithm still backtracks frequently (i.e., nbfail/nbnodes > $\beta_2$, where $\beta_2$ is a parameter), we disable GAC(allDiff) (Lines 9–10). In the subsequent search procedure, the algorithm does not employ any strong propagation method, i.e., neither LAD-filtering nor GAC(allDiff).

Note that, in our work, the switch of propagation methods is one-way. The strength of weak propagation methods increases significantly with search depth, so there is no need to switch back to strong propagation methods once the search depth reaches a certain point. Based on our preliminary experiments, we found that one-way switching was both straightforward and effective, whereas two-way switching exhibited poor performance on some instances. Recently, researchers have developed dynamic choice methods for several well-known problems, such as CSP [Stergiou, 2021]. One crucial step in algorithm design is to dynamically combine various methods for a particular problem. It is worth noting that our method is the first to use a dynamic choice approach to select propagation methods for the SIP.

Here, we summarize the impact of the parameters $\beta_1$ and $\beta_2$ on the propagator choices. Parameters $\beta_1$ and $\beta_2$ are two thresholds that define whether a given instance tends to backtrack due to numerous conflicts. When the conflict ratio is larger than these two parameters, we turn to simple propagation methods to make backtracking fast. Specifically, a higher value of $\beta_1$ indicates a greater tolerance for conflicts while all propagation methods are still in use. However, if the conflict ratio surpasses $\beta_1$, we discard the LAD-filtering method. On the other hand, a larger value of $\beta_2$ implies a higher tolerance for conflicts while relying solely on the strong propagation method GAC(allDiff). When the conflict ratio exceeds $\beta_2$, we also abandon GAC(allDiff).

4 Experimental Evaluation

In this section, we carry out experiments to evaluate PathLAD+ on a broad range of benchmarks, compared against the state-of-the-art algorithms for the SIP.

4.1 Benchmarks

For our experiments, we select all the instances used in [Kraiczy and McCreesh, 2021; Liu et al., 2022], which can also be downloaded from the website¹. In total, we choose 15396 instances, which can be grouped into 8 benchmarks.

• images-CVIU11 (6278 instances): This benchmark includes 43 pattern graphs and 146 target graphs, which have been generated from segmented images [Damiand et al., 2011]. In the benchmark, pattern graphs have between 22 and 151 vertices, whereas target graphs have between 1072 and 5972 vertices.

• meshes-CVIU11 (3018 instances): It is composed of 6 pattern graphs and 503 target graphs, which have been generated from meshes modeling 3D objects [Damiand et al., 2011]. The number of vertices of the pattern graphs ranges from 40 to 199, while that of the target graphs ranges from 208 to 5873.

• images-PR15 (24 instances): There are 24 pattern graphs that have between 4 and 170 vertices and 1 target graph that has 4838 vertices. All the graphs have been derived from segmented images [Solnon et al., 2015].

• scalefree (100 instances): Each instance contains a target graph with between 200 and 1000 vertices and a pattern graph with 90% of the vertices of the corresponding target graph. All the instances in the benchmark have been randomly generated using a power law distribution of degrees [Solnon, 2010].

• si (1170 instances): Each instance is composed of a target graph (between 200 and 1296 vertices) and a pattern graph (between 20% and 60% of the vertices of the corresponding target graph). This benchmark is built from bounded valence graphs, modified bounded valence graphs, 4D meshes, and randomly generated graphs [Solnon, 2010].

• phase-transition (200 instances): These random instances are chosen to be close to the satisfiable-unsatisfiable phase transition. Pattern graphs have 30 vertices, while target graphs have 150 vertices [McCreesh et al., 2016b].

• LV (1176 instances): The 49 selected graphs, which have between 10 and 128 vertices, serve as both pattern and target graphs, and this benchmark has already been used as a test benchmark [Liu et al., 2020]. These graphs have different properties [Solnon, 2010], such as being connected, biconnected, triconnected, etc.

• LargerLV (3430 instances): Each instance uses one of the above 49 LV graphs as the pattern graph and one of 70 other graphs, which have between 138 and 6671 vertices, as the target graph. More details of the target graphs can be seen on the website².

4.2 Experiment Setup

We compare PathLAD+ with four state-of-the-art SIP algorithms, including Glasgow+Clq [Kraiczy and McCreesh, 2021], PathLAD [Kotthoff et al., 2016], RI [Bonnici et al., 2013] and VF2 [Cordella et al., 2004]. The codes of these competitors are kindly provided by the authors. Our source code is publicly available at GitHub³. Our proposed algorithm and the four competitors are all implemented in C++ and compiled by g++ with the '-O3' option. All the algorithms are run on an Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz with 512GB RAM under CentOS 7.9. The cutoff time is 3600 seconds for each instance. According to our preliminary experiments, parameters max_tries, $\beta_1$, and $\beta_2$ are set to 1000, 0.85, and 0.8, respectively.

For each algorithm, we report the number of instances for each benchmark (#inst) and the number of successfully solved instances (#solved). The bold values in the tables indicate the best solution among all the algorithms.

¹ http://liris.cnrs.fr/csolnon/SIP.html
² https://github.com/ciaranm/cpaior2021-finding-subgraphs-with-side-constraints/tree/main/instances/largerGraphs
³ https://github.com/yiyuanwang1988/PathLAD-Plus
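As a quick consistency check on the benchmark description in Section 4.1, the per-benchmark instance counts can be tallied against the stated total. The sketch below uses only numbers quoted in the text; the dictionary name `BENCHMARK_INSTANCES` is illustrative. The reading that the LV count corresponds to unordered pairs of the 49 graphs is an assumption that happens to match the figure of 1176, not something the text states explicitly.

```python
from math import comb

# Instance counts per benchmark, as listed in Section 4.1.
BENCHMARK_INSTANCES = {
    "images-CVIU11": 6278,
    "meshes-CVIU11": 3018,
    "images-PR15": 24,
    "scalefree": 100,
    "si": 1170,
    "phase-transition": 200,
    "LV": 1176,
    "LargerLV": 3430,
}

total = sum(BENCHMARK_INSTANCES.values())
print(len(BENCHMARK_INSTANCES), total)   # 8 benchmarks, 15396 instances

# LargerLV: 49 pattern graphs paired with 70 target graphs.
larger_lv_pairs = 49 * 70                # 3430

# LV (assumption): every unordered pair of the 49 graphs forms one instance.
lv_pairs = comb(49, 2)                   # 1176
```

Both derived counts agree with the figures in the list, and the eight counts sum to the 15396 instances reported.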
Figure 3: The run time of PathLAD and PathLAD-1 on all the benchmarks.

Figure 4: The run time of PathLAD+ and PathLAD on all the benchmarks.

colored points show the instance from the different benchmarks. Figures 2 and 3 intuitively show that the two proposed strategies play an important role in our proposed algorithm. Besides, because PathLAD is the baseline algorithm of our proposed algorithm, we compare PathLAD+ with PathLAD in terms of run time in Figure 4. Once again, the results show the superiority of PathLAD+.

5 Conclusion

In this paper, we propose a probing search procedure, a novel matching ordering method, and an adaptive propagation method for the SIP. Based on the above strategies, we develop an efficient algorithm called PathLAD+. Experiments show that PathLAD+ significantly outperforms the state-of-the-art SIP algorithms.

As for future work, the proposed adaptive propagation

Acknowledgements

This work was supported by CAS Project for Young Scientists in Basic Research (Grant No.YSBR-040), NSFC (61806050), Jilin Science and Technology Association QT202005, and Science and Technology Development Program of Jilin Province (YDZJ202201ZYTS412 and 20230101060JC). We would like to thank the anonymous referees for their helpful comments.

References

[Archibald et al., 2019] Blair Archibald, Fraser Dunlop, Ruth Hoffmann, Ciaran McCreesh, Patrick Prosser, and James Trimble. Sequential and parallel solution-biased search for subgraph algorithms. In CPAIOR, pages 20–38, 2019.

[Audemard et al., 2014] Gilles Audemard, Christophe Lecoutre, Mouny Samy-Modeliar, Gilles Goncalves, and Daniel Porumbel. Scoring-based neighborhood dominance for the subgraph isomorphism problem. In CP, pages 125–141, 2014.

[Bonnici et al., 2013] Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, and Alfredo Ferro. A subgraph isomorphism algorithm and its application to biochemical data. BMC bioinformatics, 14(7):1–13, 2013.

[Carletti et al., 2017] Vincenzo Carletti, Pasquale Foggia, Alessia Saggese, and Mario Vento. Introducing vf3: A new algorithm for subgraph isomorphism. In GbRPR, pages 128–139, 2017.

[Chen et al., 2023] Jiejiang Chen, Shaowei Cai, Yiyuan Wang, Wenhao Xu, Jia Ji, and Minghao Yin. Improved local search for the minimum weight dominating set problem in massive graphs by using a deep optimization mechanism. Artificial Intelligence, 314:103819, 2023.

[Cordella et al., 2004] Luigi P Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. A (sub) graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(10):1367–1372, 2004.

[Damiand et al., 2011] Guillaume Damiand, Christine Solnon, Colin De la Higuera, Jean-Christophe Janodet, and Émilie Samuel. Polynomial algorithms for subisomorphism of nd open combinatorial maps. Computer Vision and Image Understanding, 115(7):996–1010, 2011.

[Derigs, 1981] Ulrich Derigs. A shortest augmenting path method for solving minimal perfect matching problems. Networks, 11(4):379–390, 1981.

[Gocht et al., 2020] Stephan Gocht, Ross McBride, Ciaran McCreesh, Jakob Nordström, Patrick Prosser, and James Trimble. Certifying solvers for clique and maximum common (connected) subgraph problems. In CP, pages 338–357, 2020.
[Johnson and Garey, 1979] David S Johnson and Michael R Garey. Computers and intractability: A guide to the theory of NP-completeness. WH Freeman, 1979.

[Kim et al., 2015] Jinha Kim, Hyungyu Shin, Wook-Shin Han, Sungpack Hong, and Hassan Chafi. Taming subgraph isomorphism for rdf query processing. Proceedings of the VLDB Endowment, 8(11):1238–1249, 2015.

[Kotthoff et al., 2016] Lars Kotthoff, Ciaran McCreesh, and Christine Solnon. Portfolios of subgraph isomorphism algorithms. In LION, pages 107–122, 2016.

[Kraiczy and McCreesh, 2021] Sonja Kraiczy and Ciaran McCreesh. Solving graph homomorphism and subgraph isomorphism problems faster through clique neighbourhood constraints. In IJCAI, pages 1396–1402, 2021.

[Liu et al., 2020] Yanli Liu, Chu-Min Li, Hua Jiang, and Kun He. A learning based branch and bound for maximum common subgraph related problems. In AAAI, pages 2392–2399, 2020.

[Liu et al., 2022] Yanli Liu, Jiming Zhao, Chu-Min Li, Hua Jiang, and Kun He. Hybrid learning with new value function for the maximum common subgraph problem. arXiv preprint arXiv:2208.08620, 2022.

[Lladós et al., 2001] Josep Lladós, Enric Martí, and Juan J. Villanueva. Symbol recognition by error-tolerant subgraph matching between region adjacency graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1137–1143, 2001.

[McCreesh and Prosser, 2015] Ciaran McCreesh and Patrick Prosser. A parallel, backjumping subgraph isomorphism algorithm using supplemental graphs. In CP, pages 295–312, 2015.

[McCreesh et al., 2016a] Ciaran McCreesh, Samba Ndojh Ndiaye, Patrick Prosser, and Christine Solnon. Clique and constraint models for maximum common (connected) subgraph problems. In CP, pages 350–368, 2016.

[McCreesh et al., 2016b] Ciaran McCreesh, Patrick Prosser, and James Trimble. Heuristics and really hard instances for subgraph isomorphism problems. In IJCAI, pages 631–638, 2016.

[McCreesh et al., 2017] Ciaran McCreesh, Patrick Prosser, and James Trimble. A partitioning algorithm for maximum common subgraph problems. In IJCAI, pages 712–719, 2017.

[McCreesh et al., 2018] Ciaran McCreesh, Patrick Prosser, Christine Solnon, and James Trimble. When subgraph isomorphism is really hard, and why this matters for graph databases. Journal of Artificial Intelligence Research, 61:723–759, 2018.

[McCreesh et al., 2020] Ciaran McCreesh, Patrick Prosser, and James Trimble. The glasgow subgraph solver: Using constraint programming to tackle hard subgraph isomorphism problem variants. In ICGT, pages 316–324, 2020.

[Ohlrich et al., 1993] Miles Ohlrich, Carl Ebeling, Eka Ginting, and Lisa Sather. Subgemini: Identifying subcircuits using a fast subgraph isomorphism algorithm. In DAC, pages 31–37, 1993.

[Snijders et al., 2006] Tom AB Snijders, Philippa E Pattison, Garry L Robins, and Mark S Handcock. New specifications for exponential random graph models. Sociological methodology, 36(1):99–153, 2006.

[Solnon et al., 2015] Christine Solnon, Guillaume Damiand, Colin De La Higuera, and Jean-Christophe Janodet. On the complexity of submap isomorphism and maximum common submap problems. Pattern Recognition, 48(2):302–316, 2015.

[Solnon, 2010] Christine Solnon. Alldifferent-based filtering for subgraph isomorphism. Artificial Intelligence, 174(12-13):850–864, 2010.

[Solnon, 2019] Christine Solnon. Experimental evaluation of subgraph isomorphism solvers. In GbRPR, pages 1–13, 2019.

[Stergiou, 2021] Kostas Stergiou. Adaptive constraint propagation in constraint satisfaction: review and evaluation. Artificial Intelligence Review, 54(7):5055–5093, 2021.

[Wang et al., 2022] Hanchen Wang, Ying Zhang, Lu Qin, Wei Wang, Wenjie Zhang, and Xuemin Lin. Reinforcement learning based query vertex ordering model for subgraph matching. In ICDE, pages 245–258, 2022.

[Zampelli et al., 2010] Stéphane Zampelli, Yves Deville, and Christine Solnon. Solving subgraph isomorphism problems with constraint programming. Constraints, 15(3):327–353, 2010.

[Zhou et al., 2022] Jianrong Zhou, Kun He, Jiongzhi Zheng, Chu-Min Li, and Yanli Liu. A strengthened branch and bound algorithm for the maximum common (connected) subgraph problem. In IJCAI, pages 1908–1914, 2022.