SpecPart: A Supervised Spectral Framework for Hypergraph Partitioning Solution Improvement

Ismail Bustany, Advanced Micro Devices, San Jose, CA, USA ([email protected])
Andrew B. Kahng, University of California San Diego, La Jolla, CA, USA ([email protected])
Ioannis Koutis, New Jersey Institute of Technology, Newark, NJ, USA ([email protected])
Bodhisatta Pramanik, Iowa State University, Ames, IA, USA ([email protected])
Zhiang Wang, University of California San Diego, La Jolla, CA, USA ([email protected])
ABSTRACT
State-of-the-art hypergraph partitioners follow the multilevel paradigm, constructing multiple levels of progressively coarser hypergraphs that are used to drive cut refinements on each level of the hierarchy. Multilevel partitioners are subject to two limitations: (i) hypergraph coarsening processes rely on local neighborhood structure without fully considering the global structure of the hypergraph, and (ii) refinement heuristics can stagnate on local minima. In this paper, we describe SpecPart, the first supervised spectral framework that directly tackles these two limitations. SpecPart solves a generalized eigenvalue problem that captures the balanced partitioning objective and global hypergraph structure in a low-dimensional vertex embedding, while leveraging initial high-quality solutions from multilevel partitioners as hints. SpecPart further constructs a family of trees from the vertex embedding and partitions them with a tree-sweeping algorithm. Then, a novel overlay of multiple tree-based partitioning solutions, followed by lifting to a coarsened hypergraph where an ILP partitioning instance is solved, alleviates local stagnation. We have validated SpecPart on multiple sets of benchmarks. Experimental results show that for some benchmarks, SpecPart can substantially improve the cutsize, by more than 50%, with respect to the best published solutions obtained with the leading partitioners hMETIS and KaHyPar.

CCS CONCEPTS
• Hardware → Physical design (EDA); • Theory of computation → Design and analysis of algorithms.

KEYWORDS
Hypergraph Partitioning, Supervised Spectral Partitioning

ACM Reference Format:
Ismail Bustany, Andrew B. Kahng, Ioannis Koutis, Bodhisatta Pramanik, and Zhiang Wang. 2022. SpecPart: A Supervised Spectral Framework for Hypergraph Partitioning Solution Improvement. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD '22), October 30-November 3, 2022, San Diego, CA, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3508352.3549390

This work is licensed under a Creative Commons Attribution International 4.0 License.
ICCAD '22, October 30-November 3, 2022, San Diego, CA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9217-4/22/10...$15.00
https://doi.org/10.1145/3508352.3549390

1 INTRODUCTION
Hypergraphs are a generalization of graphs where hyperedges, the counterpart of edges in a graph, can connect more than two vertices. A fundamental NP-hard problem related to hypergraphs is to partition all the vertices into balanced blocks such that each block has bounded size and the cutsize, i.e., the number of hyperedges spanning multiple blocks, is minimized. This balanced hypergraph partitioning has been a well-studied, fundamental combinatorial optimization problem with applications throughout VLSI CAD. Balanced partitioning can also enable efficient distributed computations when solving area-constrained hypergraph optimization problems. Many hypergraph partitioners have been proposed over the past decades. State-of-the-art hypergraph partitioners, including MLPart [21], PaToH [9], KaHyPar [24], and hMETIS [6], usually follow the multilevel paradigm [6]. The multilevel paradigm constructs a hierarchy of progressively coarser hypergraphs using local clustering heuristics [24], partitions the coarsest hypergraph, then uncoarsens and refines the partitioning solution at each level of the hierarchy [11, 14].

Multilevel partitioners are powerful but subject to two limitations. The first stems from the propensity of partition refinement heuristics to become trapped on local minima that persist through levels in the hierarchy. It is reasonable to hypothesize that any given solution obtained by a multilevel partitioner is "in the vicinity" of potentially much better solutions. However, finding such solutions may require some type of global understanding of the hypergraph. That brings us to the second limitation of the multilevel paradigm: the coarsening phase and refinement decisions are usually based on local structure and greedy computational objectives, hence the global structure of the hypergraph is not explicitly taken into account.

We thus consider a cut obtained by a multilevel partitioner as a hint for a better solution, and set out to design a solution improvement method that leverages the hint while using global structural information. This kind of global structure of the hypergraph can be exposed by spectral algorithms [26-29] based on the well-known Cheeger inequality [31]. Spectral partitioning algorithms have been generalized by Cucuringu et al. [1] to supervised partitioning instances, e.g., instances where a hint is available. More specifically, the algorithm of [1] formulates supervised partitioning as a generalized eigenvalue problem satisfying a generalized Cheeger inequality. This suggests a clear direction towards obtaining improved partitioning solutions.

We propose SpecPart, the first supervised spectral framework for hypergraph partitioning solution improvement. In this work, we focus on the bipartitioning problem, which is often used as a subroutine in k-way partitioners.

Our contributions include:
• A novel method that incorporates pre-computed hint solutions into a generalized eigenvalue problem. The computed eigenvectors yield high-quality vertex embeddings that are superior to those obtained without supervision. Importantly, our carefully engineered code yields a practically fast implementation. [Section 4.1]
• A novel algorithm for converting a vertex embedding into a partitioning solution. The algorithm uses the embedding to construct a family of trees that in some sense distill the cut structure of the hypergraph. Then, fast algorithms can be used on the trees to explore a large space of candidate solutions from which the best can be picked. [Section 4.2]
• A novel cut overlay method for improving a small pool of initial solutions. Specifically, we compute clusters by removing from the hypergraph the union of the hyperedges cut by any of the solutions in the pool. The size of the clustered hypergraph is small, but it nearly always contains an improved solution that can often be computed optimally using an ILP formulation. [Section 3]
• We have validated SpecPart on multiple benchmark sets (the ISPD98 VLSI Circuit Benchmark Suite [4], Titan23 [8], and industrial benchmarks from a leading FPGA company) with state-of-the-art partitioners (hMETIS [6] and KaHyPar [24]). Experimental results show that for some benchmarks, SpecPart can substantially improve the cutsize, by more than 50%, with respect to hMETIS and/or KaHyPar. [Section 5.1]
• We apply autotuning to tune the hyperparameters of existing partitioners and generate a better initial solution for SpecPart. Experiments suggest that autotuning-based SpecPart can further push the leaderboard for these benchmarks. [Section 5.3]

SpecPart draws strength from recent theoretical and algorithmic progress [1, 18, 20, 22]. In particular, a careful choice of the numerical solvers enables a very efficient implementation. Moreover, SpecPart's capacity to include supervision information makes it potentially even more powerful in industrial pipelines. We thus believe that our work may eventually lead to a departure from the multilevel paradigm that has dominated the field for the past quarter-century.

2 PRELIMINARIES

2.1 Hypergraph Partitioning Formulation
In a hypergraph H(V, E), V is a set of vertices, with each vertex v ∈ V associated with a weight w_v, and E is a set of hyperedges, where a hyperedge e ∈ E is a subset of V. Each hyperedge e can also be associated with a weight w_e. Given a positive integer k (k ≥ 2) and a positive real number ε (ε ≤ 1.0/k), the k-way balanced hypergraph partitioning problem is to partition V into k disjoint blocks S = {V_0, V_1, ..., V_{k-1}} such that (letting W = Σ_{v∈V} w_v):
• (1/k − ε)W ≤ Σ_{v∈V_i} w_v ≤ (1/k + ε)W, for 0 ≤ i ≤ k−1;
• cutsize_H(S) = Σ_{{e | e ⊄ V_i for any i}} w_e is minimized.
Here k is the number of blocks in the partitioning solution, ε is the allowed imbalance between blocks, V_i is a partition block, and we say that S is an ε-balanced partitioning solution.

2.2 Laplacians, Cuts and Eigenvectors
Suppose G = (V, E, w) is a weighted graph. The Laplacian matrix L_G of G is defined as follows: (i) L(u, v) = −w_{e_uv} if u ≠ v, and (ii) L(u, u) = Σ_{v≠u} w_{e_uv}. Let x be an indicator vector for the bipartition solution S = {V_0, V_1}, containing 1s in entries corresponding to V_1 and 0s everywhere else (V_0). Then, we have

  xᵀLx = cutsize_G(S).   (1)

Let us now consider an example of how balanced graph bipartitioning relates to spectral methods. Let K be the Laplacian of a complete unweighted graph on V. Using expression (1), we have

  R(x) = (xᵀLx) / (xᵀKx) = cutsize_G(S) / (|S| · |V − S|).

Minimizing R(x) over 0-1 vectors x incentivizes a small cutsize_G(S) with a simultaneous balance between |S| and |V − S|, hence R(x) can be viewed as a proxy for the balanced partitioning objective. We can relax the problem over the real vectors x constrained to be orthogonal to the common null space of L and K. It is well understood that the minimum is achieved by the first non-trivial eigenvector of the problem Lx = λKx. (A small numerical sketch of identity (1) and the ratio R(x) follows Table 1.)

2.3 Spectral Embeddings and Partitioning
Spectral graph partitioning algorithms embed the vertices of an input graph G into an m-dimensional space and then cluster the points in this geometric space. The vertex embedding comes from the computation of m non-trivial eigenvectors of an appropriate eigenvalue problem involving the Laplacian L_G of the graph G. More specifically, if X ∈ R^{|V|×m} is the matrix containing the m (column) eigenvectors, then row X_u of X is the embedding of vertex u.

Spectral algorithms have also been used for hypergraph partitioning. In this context, the hypergraph H is first transformed to a corresponding graph G, and then the spectral embedding is computed using L_G. For example, the eigenvalue problem solved in [26] is

  L_G x = λ D_w x   (2)

where D_w is the diagonal matrix containing positive vertex weights. In this paper we solve the more general problem

  L_G x = λ B x   (3)

where B is also a graph Laplacian. In practical instances, hypergraphs are "essentially" connected, with possibly a few outstanding vertices and edges that can be processed separately. Thus, since G can be considered connected, the problem is well-defined even if B does not correspond to a connected graph, because L_G's null space is a subspace of that of B [19]. This enables us to handle zero vertex weights as required in practice, and to encode prior supervision information into the matrix B in a natural "graphical" way.

Term | Description
H(V, E) | Hypergraph H with vertices V and hyperedges E
H_c(V_c, E_c) | Clustered hypergraph H_c where each vertex v_c in V_c corresponds to a group of vertices in H(V, E)
G(V, E) | Graph G with vertices V and edges E
G̃ | Spectral sparsifier of G
T(V, E_T) | Tree T with vertices V and edges E_T
u, v | Vertices in V
e_uv | Edge or hyperedge connecting u and v
e_T | Edge of tree T
w_v, w_e | Weight of vertex v, or hyperedge e, respectively
k | Number of blocks in a partitioning solution
S | Partitioning solution, S = {V_0, V_1, ..., V_{k-1}}
ε | Allowed imbalance (1-49) between blocks in S
cut(S) | Cut of S, cut(S) = {e | e ⊄ V_i for any i}
cutsize_H(S) | Cutsize of S on (hyper)graph H
ISSHP | Iterative Supervised Spectral Hypergraph Partitioning
Table 1: Notation
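As promised above, here is a small numerical sketch of identity (1) and the ratio R(x), written in Julia (the language of our implementation, cf. Section 5). The tiny graph, the bipartition and all names are purely illustrative and do not come from the released SpecPart code:

    using LinearAlgebra

    # A hypothetical 4-vertex unweighted graph G and the bipartition
    # S = {V0, V1} with V0 = {1, 2} and V1 = {3, 4}; two edges cross the cut.
    edges = [(1, 2), (2, 3), (3, 4), (1, 3)]
    n = 4
    L = zeros(n, n)                  # Laplacian L_G, built from the edge list
    for (u, v) in edges
        L[u, u] += 1;  L[v, v] += 1
        L[u, v] -= 1;  L[v, u] -= 1
    end

    x = [0.0, 0.0, 1.0, 1.0]         # indicator vector of V1
    @assert x' * L * x == 2          # identity (1): xᵀLx = cutsize_G(S)

    K = n * I - ones(n, n)           # Laplacian of the complete graph on V
    R = (x' * L * x) / (x' * K * x)  # cutsize_G(S) / (|S|·|V−S|) = 2/4

The same quotient, with K replaced by the Laplacian B of Eq. (3), is the quantity minimized by the generalized eigenvalue problems used throughout this paper.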

Parameter Description (default setting)


𝑚 Number of eigenvectors (𝑚 = 2)
𝜏 Number of trees (𝜏 = 8)
𝛿 Number of best solutions (𝛿 = 5)
𝛽 Number of iterations of ISSHP (𝛽 = 2)
𝜁 Number of random cycles (𝜁 = 2)
𝛾 Threshold of number of hyperedges (𝛾 = 300)
𝜃 Number of iterations of eigenvalue solver (𝜃 = 80)
Table 2: Parameters of SpecPart framework.

Figure 2: Two vertex embeddings of ISPD IBM14 benchmark. Both are


2.4 ILP for Hypergraph Partitioning based on the smallest two eigenvectors, computed without supervision
Hypergraph partitioning can be solved optimally by casting the prob- (Eq. 2) and with supervision (Eq. 3). The red and blue dots highlight
vertices bipartitioned by hMETIS with 𝜖 = 2. With supervision, the
lem as an integer linear program (ILP) [23]. To write balanced hyper-
distinction between the bipartitioned vertices is cleaner.
graph partitioning as an ILP, for each block 𝑉𝑖 we introduce integer
{0,1} variables, 𝑥 𝑣,𝑖 for each vertex 𝑣, and 𝑦𝑒,𝑖 for each hyperedge 𝑒,
and require that: — distill the cut structure of the hypergraph (Section 4.2). Then, fast
• 𝑥 𝑣,𝑖 = 1 if 𝑣 ∈ 𝑉𝑖 • 𝑦𝑒,𝑖 = 1 if 𝑒 ⊆ 𝑉𝑖 tree-based algorithms are employed to find the best solution 𝑆𝑏𝑒𝑠𝑡
We then define the  following constraints for each 𝑖 ∈ [0, 𝑘 − 1]: on those trees. Finally, we set 𝑆 = 𝑆𝑏𝑒𝑠𝑡 and the process iterates.
• (1/𝑘 − 𝜖)𝑊 ≤ 𝑣 ∈𝑉𝑖 𝑤 𝑣 𝑥 𝑣,𝑖 ≤ (1/𝑘 + 𝜖)𝑊 2. Cut-Overlay Clustering and Optimal-Attempt Partitioning.

• 𝑘−1𝑗=0 𝑥 𝑣,𝑗 = 1 for 𝑣 ∈ 𝑉 In the course of its iterations, ISSHP generates a collection of dif-
• 𝑦𝑒,𝑖 ≤ 𝑥 𝑣,𝑖 for each 𝑒 ∈ 𝐸, and each 𝑣 ∈ 𝑒 ferent solutions. We select the 𝛿 best solutions, denoted as “candidate
 partitioning solutions” in Figure 1.
where 𝑊 = 𝑣 ∈𝑉 𝑤 𝑣 . The objective is
  Cut-Overlay clustering. Let 𝐸 1, . . . , 𝐸𝛿 ⊂ 𝐸 be the sets of hy-
𝑀𝑎𝑥𝑖𝑚𝑖𝑧𝑒 𝑤𝑒 𝑦𝑒,𝑖 . peredges cut in the 𝛿 candidate solutions. We remove the union of
𝑒 ∈𝐸 0≤𝑖 ≤𝑘−1 these sets from 𝐻 to yield a number of connected clusters. Then, we
perform a cluster contraction process that is standard in multilevel
3 SPECPART: AN OVERVIEW partitioners, to give rise to a clustered hypergraph 𝐻𝑐 (𝑉𝑐 , 𝐸𝑐 ). A solu-
The architecture of our SpecPart framework is shown in Figure 1. tion on 𝐻𝑐 can be “lifted” to 𝐻 , and by construction it is guaranteed
The input is a hypergraph 𝐻 (𝑉 , 𝐸), an initial partitioning solution that 𝐻𝑐 contains a solution which is at least as good as the best
𝑆𝑖𝑛𝑖𝑡 , and 𝜖, the allowed imbalance between blocks in a partition- among the cuts 𝐸𝑖 .
ing solution. The output is an improved partitioning solution 𝑆𝑜𝑢𝑡 . Optimal-Attempt Partitioning. While one would expect that 𝐻𝑐
Here the initial partitioning solution 𝑆𝑖𝑛𝑖𝑡 can come from any source, has not many more than 2𝛿 vertices, empirically we often observe
including available open-source partitioners.1 hundreds of vertices and hyperedges (e.g., even for 𝛿 = 5). Given
such a size for 𝐻𝑐 , we would also expect that it is infeasible to
run an ILP-based partitioner on it. Remarkably, due to the special
generative process that yields 𝐻𝑐 , it is often the case that the ILP
computes within stringent walltime a solution that is better than any
of the 𝛿 solutions in the pool. In our current implementation, we
include a parameter 𝛾; in the case when the number of hyperedges in
𝐻𝑐 is larger than 𝛾 (default value of 𝛾 is 300) we run hMETIS on 𝐻𝑐 .

4 THE ISSHP ALGORITHM


The Iterative Supervised Spectral Hypergraph Partitioning (ISSHP)
process is described in Algorithm 1, with pointers to subsequent
sections that discuss the details.
Figure 1: Overview of the SpecPart framework.
4.1 Vertex Embedding Generation
The SpecPart framework consists of two major components: In order to generate a vertex embedding, we need to construct the
1. Iterative Supervised Spectral Hypergraph Partitioning. generalized eigenvalue problem and compute the first 𝑚 nontrivial
ISSHP constitutes the fundamental algorithmic core of SpecPart. eigenvectors. Here 𝑚 is the number of eigenvectors that we use,
The initial solution 𝑆𝑖𝑛𝑖𝑡 is incorporated into a generalized eigenvalue which is set to 2 by default.
problem in order to generate a vertex embedding (Section 4.1). With
the hint from 𝑆 = 𝑆𝑖𝑛𝑖𝑡 , the vertex embedding from the generalized  $MJRVF &YQBOTJPO (SBQI We define the clique expan-
eigenvalue problem is of higher quality relative to that obtained sion graph 𝐺 of the hypergraph 𝐻 , as a sum, i.e., superposition,
from the standard eigenvalue problem, as illustrated in Figure 2. The of weighted cliques; the clique corresponding to edge 𝑒 ∈ 𝐸 has
embedding is used to compute a family of trees that — in some sense the same vertices as 𝑒 and edge weights |𝑒 1|−1 . Graph 𝐺 has size
 2
1 The input initial solution 𝑆 𝑒 ∈𝐸 |𝑒 | where |𝑒 | is the size of hyperedge
 𝑒. This is usually quite
𝑖𝑛𝑖𝑡 may even be a partial solution where block membership
information is given for only some of the vertices. This may be potentially useful in large relative to the input size |𝐼 | = 𝑒 ∈𝐸 |𝑒 |. For this reason we only
practical situations but we do not consider it further in this paper. construct a function 𝑓𝐿𝐺 that evaluates matrix-vector products of
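The following is a minimal Julia sketch of the Cut-Overlay Clustering step just described (function and variable names are ours and do not come from the released SpecPart code); it assumes vertices numbered 1..n and each candidate solution given as a vector of block ids:

    # Union-find with path compression, used to merge vertices joined by
    # hyperedges that survive the removal of the overlaid cuts.
    function findroot(parent, u)
        while parent[u] != u
            parent[u] = parent[parent[u]]   # path compression
            u = parent[u]
        end
        return u
    end

    # hyperedges :: Vector{Vector{Int}}; solutions :: Vector{Vector{Int}}.
    function cut_overlay(n::Int, hyperedges, solutions)
        # A hyperedge is cut by a solution if its vertices span several blocks.
        iscut(e, sol) = any(sol[v] != sol[e[1]] for v in e)
        incutunion = [any(iscut(e, s) for s in solutions) for e in hyperedges]

        parent = collect(1:n)
        for (j, e) in enumerate(hyperedges)
            incutunion[j] && continue                # keep only uncut hyperedges
            for v in e                               # merge e into one cluster
                parent[findroot(parent, v)] = findroot(parent, e[1])
            end
        end

        cluster = [findroot(parent, v) for v in 1:n] # cluster id per vertex
        Hc = Vector{Vector{Int}}()                   # contracted hyperedges of H_c
        for e in hyperedges
            ec = unique(cluster[v] for v in e)
            length(ec) > 1 && push!(Hc, ec)          # drop hyperedges internal to a cluster
        end
        return cluster, Hc
    end

The clustered hypergraph H_c produced this way is then handed to the ILP-based Optimal-Attempt Partitioning, or to hMETIS when its number of hyperedges exceeds γ.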

4 THE ISSHP ALGORITHM
The Iterative Supervised Spectral Hypergraph Partitioning (ISSHP) process is described in Algorithm 1, with pointers to subsequent sections that discuss the details.

Algorithm 1: ISSHP: Iterative Supervised Spectral Hypergraph Partitioning
Input: Hypergraph H(V, E), initial partitioning solution S_best
Output: Candidate partitioning solutions {S_cj}
1 Construct the Laplacian L_G of the clique expansion of H (4.1.1)
2 Construct the Laplacian B_base of the weight-balance graph (4.1.2)
3 for i = 0; i < β; i++ do
4   Construct the Laplacian B_{S_best} based on the hint S_best (4.1.3)
5   Let B = B_base + B_{S_best}
6   Solve the generalized eigenvalue problem L_G x = λBx to compute m nontrivial eigenvectors (4.1.5)
7   Construct a family of trees {T_ij} based on the computed eigenvectors (4.2)
8   Generate candidate solutions {S_ij} by running tree-sweep and METIS on the trees {T_ij} (4.3)
9   Set S_best to the best partitioning solution in {S_ij}
10 end
11 Construct {S_cj} by picking the best δ solutions from {{S_ij}}
12 return {S_cj}

4.1 Vertex Embedding Generation
In order to generate a vertex embedding, we need to construct the generalized eigenvalue problem and compute the first m nontrivial eigenvectors. Here m is the number of eigenvectors that we use, which is set to 2 by default.

4.1.1 Clique Expansion Graph. We define the clique expansion graph G of the hypergraph H as a sum, i.e., superposition, of weighted cliques; the clique corresponding to hyperedge e ∈ E has the same vertices as e and edge weights 1/(|e| − 1). Graph G has size Σ_{e∈E} |e|(|e| − 1)/2, where |e| is the size of hyperedge e. This is usually quite large relative to the input size |I| = Σ_{e∈E} |e|. For this reason we only construct a function f_{L_G} that evaluates matrix-vector products of the form L_G x, where L_G is the Laplacian of G; this is all we need to perform the eigenvector computation. In all places where Algorithm 1 mentions the construction of a Laplacian, we construct the equivalent function for evaluating matrix-vector products. This is further justified in Section 4.1.5. The function f_{L_G} is an application of the following identity, based on expressing L_G as a sum of Laplacians of cliques:

  L_G x = Σ_{e∈E} (1/(|e| − 1)) · (x − (xᵀ1_e / 1_eᵀ1_e) · 1_e),   (4)

where 1_e is the 1-0 vector with 1s in the entries corresponding to the vertices in e. By exploiting the sparsity in 1_e, the product is implemented to run in O(|I|) time.

4.1.2 Weight-Balance Graph. The weight-balance graph G_w is a complete weighted graph used to capture arbitrary vertex weights and incentivize balanced cuts, as we elaborate in Section 4.1.4. G_w has the same vertices as the hypergraph H, and edges of weight w_u · w_v between any two vertices u and v. Let w_{V_i} be the weight of block V_i in a partitioning solution S, i.e.,

  w_{V_i} = Σ_{v∈V_i} w_v.   (5)

We have

  w_{V_0} · w_{V_1} = (Σ_{v∈V_0} w_v) · (Σ_{u∈V_1} w_u) = Σ_{v∈V_0, u∈V_1} w_v · w_u = Σ_{v∈V_0, u∈V_1} w_{e_vu} = cutsize_{G_w}(S).   (6)

We now discuss how to compute matrix-vector products with the Laplacian matrix of G_w, which we denote by B_base. Let w be the vector of vertex weights. We have the identity

  B_base x = w ∘ x − (xᵀ1 / 1ᵀ1) · w,   (7)

where 1 is the all-ones vector and ∘ denotes the Hadamard product. Clearly, this can be carried out in O(|V|) time. In general, any vector x can be written in the form x = y + c1, where yᵀ1 = 0. Substituting this decomposition of x into the above equation, we get that B_base x = w ∘ y. In other words, B_base acts like a diagonal matrix on y and nullifies the constant component of x.

4.1.3 Hint Graph. The hint graph G_h is a complete bipartite graph on the two vertex sets V_0 and V_1 defined by the hint solution S_best. It is used to incentivize the computation of cuts that are similar to S_best, as elaborated in Section 4.1.4. If B_{S_best} denotes the Laplacian of the hint graph, then

  B_{S_best} x = (x − (xᵀ1 / 1ᵀ1) · 1) − (x − (xᵀ1_{V_0} / 1_{V_0}ᵀ1_{V_0}) · 1_{V_0}) − (x − (xᵀ1_{V_1} / 1_{V_1}ᵀ1_{V_1}) · 1_{V_1}),   (8)

where 1_{V_i} denotes the 1-0 vector with 1s in entries corresponding to the vertices in V_i. By exploiting the sparsity in 1_{V_i}, the product is implemented in O(|V|) time.

4.1.4 Intuition on the Constructed Graphs. We solve the generalized eigenvalue problem L_G x = λBx, where B = B_base + B_{S_best}. From the discussion in Section 2.2, recall that the eigenvalue problem is directly related to solving

  min_x R(x) = min_x (xᵀL_G x) / (xᵀBx) = min_x (xᵀL_G x) / (xᵀB_base x + xᵀB_{S_best} x)   (9)

over the real vectors x. Recall also that this is a relaxation of the minimization problem over 0-1 cut indicator vectors. Let x_S be the indicator vector for some set S ⊂ V. Then, using Equation (1), we have:
• x_Sᵀ L_G x_S = cutsize_G(S), which is a proxy for cutsize_H(S). Thus, the numerator incentivizes smaller cuts in H.
• x_Sᵀ B_base x_S = cutsize_{G_w}(S). By Equation (6), this is equal to w_S · w_{V−S}, where w_S is the total weight of the vertices in S. Thus the denominator incentivizes a large w_S · w_{V−S}, which implies balance.
• x_Sᵀ B_{S_best} x_S is maximized when all edges of the hint graph G_h are cut; thus the denominator incentivizes cutting many edges that are also cut by the hint.

Figure 3: Graphs used in ISSHP, Algorithm 1.
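As an illustration of the matrix-free approach of this section, the following is a minimal Julia sketch of the products in identities (4) and (7); Eq. (8) is applied analogously. The restriction of each clique term to the support of 1_e is made explicit in the code, and all function names are ours, not from the released SpecPart code:

    # Eq. (4): apply the clique-expansion Laplacian L_G to x in O(|I|) time,
    # without ever forming the matrix.
    function apply_LG(hyperedges::Vector{Vector{Int}}, x::Vector{Float64})
        y = zeros(length(x))
        for e in hyperedges
            s = sum(x[v] for v in e) / length(e)      # (xᵀ1_e) / (1_eᵀ1_e)
            for v in e                                # support of 1_e
                y[v] += (x[v] - s) / (length(e) - 1)  # clique weight 1/(|e|−1)
            end
        end
        return y
    end

    # Eq. (7): apply the weight-balance Laplacian B_base to x in O(|V|) time.
    apply_Bbase(w::Vector{Float64}, x::Vector{Float64}) =
        w .* x .- (sum(x) / length(x)) .* w

    # Sanity check of the decomposition x = y + c·1 with yᵀ1 = 0: B_base x = w ∘ y,
    # i.e., the constant component of x is nullified.
    w = [1.0, 2.0, 3.0, 4.0]
    y = [1.0, -1.0, 2.0, -2.0]                        # mean-zero component
    @assert apply_Bbase(w, y .+ 5.0) ≈ w .* y

Functions of exactly this shape are what the eigensolver of Section 4.1.5 consumes in place of explicit matrices.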
4.1.5 Computation. We solve the generalized eigenvalue problem L_G x = λBx using the preconditioned eigensolver LOBPCG [13]. Due to its iterative nature, LOBPCG does not require the explicit matrices L_G and B, but merely functions that evaluate matrix-vector products with them. For fast computation, the solver can utilize a preconditioner for L_G, also in an implicit functional form. To compute the preconditioner we first obtain an explicit graph G̃ that is spectrally similar to G and has size at most 3|I|, where |I| = Σ_{e∈E} |e|. More specifically, we build G̃ by replacing every hyperedge e in H with the sum of 3 uniformly weighted random cycles on the vertices V_e of e. This is an essentially optimal sparse spectral approximation for the clique on V_e.² Since G is a sum of cliques, and G̃ is a sum of tight spectral approximations of cliques, standard graph support theory [38] implies that G̃ is a tight spectral approximation of G. Finally, we compute a preconditioner of L_G̃ using the CMG algorithm [20]; by transitivity [38], it is also a preconditioner for L_G.

² The construction relies on theory about the asymptotic properties of random d-regular expanders (e.g., see [32] or Theorem 4.16 in [33]). For the hyperedges in our context, the near-optimality of our construction can also be verified numerically.

4.2 Tree Construction
After solving the generalized eigenvalue problem, we have a matrix X ∈ R^{|V|×m} of m computed eigenvectors {x_1, x_2, ..., x_m} that we use to construct a number of trees on V.

4.2.1 Paths. We first use a standard linear ordering algorithm [39] to obtain a path graph for each eigenvector x_i, by sorting the vertices in V based on x_i in non-decreasing order and connecting the sorted vertices in that order. The path graph is implicit in the proof of the Cheeger inequality [31], which shows that a relatively good cut of the graph into two parts can be found by sweeping over the n − 1 tree cuts. We thus use the m eigenvectors to construct m path graphs in total. These path graphs naturally arrange together vertices with similar global positioning, but neighboring nodes in the path are not necessarily neighbors in the original hypergraph H. That means the local neighborhood information is not fully preserved in the paths.

4.2.2 Clique Expansion Spanning Trees. To address the issue of preserving local information, we work with a weighted graph that reflects both the connectivity of H and the global information contained in the embedding, adapting an idea that has been used in work on k-way Cheeger inequalities [22].

Concretely, we form a graph Ĝ by replacing every edge e of H with a sum of ζ cycles (as discussed also in Section 4.1.5). Suppose that Y ∈ R^{|V|×d} is an embedding matrix, and denote by Y_u the row of Y containing the embedding of vertex u. We construct the weighted graph Ĝ_Y by setting the length of each edge e_uv ∈ Ĝ to ||Y_u − Y_v||_2, i.e., equal to the Euclidean distance between the two vertices in the embedding. We will be computing spanning trees of Ĝ_Y.

LSST: A desired property for a spanning tree T̂ of Ĝ_Y is to preserve the embedding information contained in Ĝ as faithfully as possible. Thus, we let T̂ be a Low Stretch Spanning Tree (LSST) of Ĝ, which by definition means that the length l(e_uv) of each edge in Ĝ is approximated on average, and up to a small function f(|V|), by the distance between the nodes u and v in T̂ [2]. We compute the LSST using the AKPW algorithm of Alon et al. [2]. The output of the AKPW algorithm depends on the vertex ordering of its input. To make it invariant to the vertex ordering in the original hypergraph H, we reorder Ĝ_Y using the order induced by sorting the smallest non-trivial eigenvector computed earlier. Empirically, this order has the advantage of producing slightly better LSSTs.

MST: A graph can contain multiple different LSSTs, with each of them approximating to different degrees the length l(e_uv) for any given e_uv. It should also be noted that the AKPW algorithm is known to be suboptimal with respect to the approximation factor f(|V|); more sophisticated algorithms exist, but they are far from practical. For these reasons we also compute a Minimum Spanning Tree of Ĝ. For most weighted graphs an MST can be viewed as an easy-to-compute proxy to an LSST, which potentially has better or complementary distance-preserving properties relative to the tree computed by the AKPW algorithm. We construct the MST using Kruskal's algorithm [3].

4.2.3 Family of Trees. Recall now that we have a matrix X of m eigenvectors. We construct the LSST and MST for the graphs Ĝ_{X_i} for i = 1, ..., m, and for the graph Ĝ_X. Along with the path graphs, these comprise a family F of trees. In total, we have τ = m + 2(m + 1) trees: m path graphs, m + 1 MSTs, and m + 1 LSSTs. In the default setting (m = 2), τ = 8.

4.3 Cut Distilling and Partitioning on a Tree
We will use each tree T in the family of trees to distill the cut structure of H over T, in the following sense. For any fixed tree T = (V, E_T), observe that the removal of an edge e_T of T yields a partitioning S_{e_T} of V, and thus of the original hypergraph H. We would thus like to reweight each edge e_T ∈ E_T with the corresponding cutsize_H(S_{e_T}). Computing these edge weights on T can be done in O(Σ_e |e| log |e|) time, via an elaborate algorithm involving the computation of least common ancestors (LCA) on T, in combination with dynamic programming on T. We now describe the main idea by example; the omitted details can be found in our code.

Figure 4: Hyperedge, junctions and their numerical labels.

We consider T to be rooted at an arbitrary vertex. In the example of Figure 4, consider hyperedge e = {v_1, v_5, v_9}. The LCA of its nodes is v_7. Then, the weight of e should be accounted for on the set C_e ⊂ E_T of all tree edges that are ancestors of {v_1, v_5, v_9} and descendants of v_7. We do this as follows. (i) We compute a set of junction vertices that are LCAs of {v_1, v_5} and {v_1, v_5, v_9}. (ii) We then "label" these junctions with −w_e, where w_e is the weight of e. More generally, for a hyperedge e = {v_{i_1}, ..., v_{i_k}} ordered according to T, we calculate the LCAs for the k−1 sets {v_{i_1}, ..., v_{i_j}} for j = 2, ..., k, and the junctions are labeled with appropriate negative multiples of w_e. We also label the vertices in e with w_e. (iii) All other vertices are implicitly labeled with 0. Consider an arbitrary edge e_T of the tree, and compute the sum-below-e_T, i.e., the sum of the labels of vertices that are descendants of e_T.
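A minimal Julia sketch of this sum-below computation is given here, assuming the per-vertex labels from steps (i)-(iii) have already been tallied; the parent-array tree representation and all names are our illustrative choices, not the released SpecPart code:

    # Bottom-up accumulation of sum-below-e_T for every tree edge, in O(|V|) time.
    # parent[v] = parent of v in the rooted tree T (parent[root] = 0); order lists
    # the vertices so that every child appears before its parent (e.g., reverse
    # BFS order from the root); label[v] = tallied label of vertex v.
    function sums_below(parent::Vector{Int}, order::Vector{Int}, label::Vector{Float64})
        below = copy(label)            # below[v] accumulates the labels in v's subtree
        for v in order
            parent[v] != 0 && (below[parent[v]] += below[v])
        end
        return below                   # below[v] = sum-below of the edge (v, parent[v])
    end

Once the labels of all hyperedges have been tallied, these per-edge sums equal cutsize_H(S_{e_T}), as the text below explains.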
This sum will be w_e on all edges of C_e and 0 otherwise, thus correctly accounting for the hyperedge e on the intended set of edges C_e.

In order to compute the correct total counts on all tree edges, we iterate over hyperedges, compute their junctions, and tally the associated labels. Then, for any tree edge e_T, the sum-below-e_T will equal cutsize_H(S_{e_T}). These sums can be computed in O(|V|) time via dynamic programming on T. A similar application of dynamic programming can compute the total weight of the vertices that lie below e_T on T. We can thus compute the value of the balanced cut objective for S_{e_T} and pick the S_{e_T} that minimizes the objective among the n − 1 cuts suggested by the tree.

For a partition S ⊂ V that cuts more than one edge on T we have cutsize_H(S) ≤ cutsize_T(S), and owing to the spectral origin of T we hope that cutsize_T(S) can provide a good proxy for cutsize_H(S) over the cuts of H. Therefore, we use METIS [5] to solve a balanced partitioning problem on the reweighted tree, with the original vertex weights from H. This can potentially return a partition S ⊂ V that cuts more than one edge on T. In some cases we do get cutsize_H(S) ≤ cutsize_H(S_{e_T}), thus further improving the solution.

5 EXPERIMENTAL VALIDATION
The SpecPart framework is implemented in Julia [10], and we provide both Julia and Python interfaces. We use CPLEX [36] and LOBPCG [17] as our ILP solver and eigenvalue solver, respectively. We run all experiments on a server with 56 Xeon E5-2650L 1.70GHz processors and 256 GB memory. We have compared our framework with two state-of-the-art hypergraph partitioners³ (hMETIS [6] and KaHyPar [24]) on three different sets of benchmarks (the ISPD98 VLSI Circuit Benchmark Suite [4], the Titan23 Suite [8], and an industrial benchmark suite from a leading FPGA company).⁴ The statistics of these benchmarks are summarized in Tables 3, 4 and 5, respectively.

³ We do not compare our results with PaToH since it generates weaker cuts compared to hMETIS and KaHyPar on the ISPD98, Titan23 and industrial benchmarks.
⁴ We make public, with a permissive open-source license, all partition solutions, scripts and code at [41].
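To make the Optimal-Attempt Partitioning step concrete, the following is a minimal sketch of the Section 2.4 ILP for bipartitioning (k = 2), as it would be invoked on the small clustered hypergraph H_c. It is written with the JuMP modeling package and the open-source HiGHS solver purely for illustration; our implementation uses CPLEX [36], and all names below are ours:

    using JuMP, HiGHS

    # Bipartition of a hypergraph with vertex weights wv and hyperedge weights we;
    # epsilon is the allowed imbalance, as defined in Section 2.1.
    function ilp_bipartition(hyperedges, wv, we, epsilon)
        n, k, W = length(wv), 2, sum(wv)
        model = Model(HiGHS.Optimizer)
        @variable(model, x[1:n, 1:k], Bin)                   # x[v,i] = 1 iff v ∈ V_i
        @variable(model, y[1:length(hyperedges), 1:k], Bin)  # y[j,i] = 1 iff e_j ⊆ V_i
        for i in 1:k
            @constraint(model, sum(wv[v] * x[v, i] for v in 1:n) >= (1 / k - epsilon) * W)
            @constraint(model, sum(wv[v] * x[v, i] for v in 1:n) <= (1 / k + epsilon) * W)
            for (j, e) in enumerate(hyperedges), v in e
                @constraint(model, y[j, i] <= x[v, i])       # e_j in V_i only if all its vertices are
            end
        end
        @constraint(model, [v in 1:n], sum(x[v, i] for i in 1:k) == 1)
        # Maximizing the weight of uncut hyperedges minimizes the cutsize.
        @objective(model, Max, sum(we[j] * y[j, i] for j in eachindex(hyperedges), i in 1:k))
        optimize!(model)
        return [findfirst(i -> value(x[v, i]) > 0.5, 1:k) for v in 1:n]
    end

Because H_c typically has only hundreds of vertices and hyperedges, such a model is solvable within a stringent walltime, as noted in Section 3.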
4QFD1BSU " 4VQFSWJTFE 4QFDUSBM 'SBNFXPSL GPS )ZQFSHSBQI 1BSUJUJPOJOH 4PMVUJPO *NQSPWFNFOU *$$"% ً 0DUPCFS /PWFNCFS   4BO %JFHP $" 64"

Benchmark | |V| | |E| | Best (ε = 2 / 10) | SpecPart (ε = 2 / 10) | Best_w (ε = 2 / 10) | SpecPart_w (ε = 2 / 10)
IBM01 | 12752 | 14111 | 203 [7] / 180 [43] | 202 / 171 | 227 [44] / 215 [43] | 215 / 197
IBM02 | 19601 | 19584 | 354 [43] / 262 [43] | 336 / 262 | 266 [43] / 266 [45] | 282 / 256
IBM03 | 23136 | 27401 | 957 [4] / 956 [7] | 959 / 952 | 748 [43] / 681 [43] | 813 / 541
IBM04 | 27507 | 31970 | 595 [43] / 542 [43] | 593 / 388 | 506 [43] / 440 [43] | 476 / 393
IBM05 | 29347 | 28446 | 1733 [43] / 1715 [7] | 1720 / 1688 | 1727 [43] / 1716 [44] | 1724 / 1692
IBM06 | 32498 | 34826 | 978 [43] / 885 [43] | 963 / 733 | 531 [43] / 367 [43] | 500 / 306
IBM07 | 45926 | 48117 | 951 [43] / 853 [43] | 935 / 760 | 739 [43] / 737 [43] | 776 / 634
IBM08 | 51309 | 50513 | 1159 [4] / 1159 [4] | 1146 / 1140 | 1188 [43] / 1157 [43] | 1196 / 1116
IBM09 | 53395 | 60902 | 629 [43] / 624 [25] | 620 / 519 | 523 [43] / 523 [43] | 519 / 519
IBM10 | 69429 | 75196 | 1333 [43] / 1254 [25] | 1318 / 1261 | 1133 [43] / 756 [43] | 1076 / 443
IBM11 | 70558 | 81454 | 1071 [43] / 960 [25] | 1062 / 764 | 781 [43] / 695 [43] | 765 / 649
IBM12 | 71076 | 77240 | 1918 [43] / 1872 [25] | 1920 / 1842 | 1998 [43] / 1982 [43] | 1965 / 1973
IBM13 | 84199 | 99666 | 859 [43] / 832 [25] | 848 / 693 | 902 [43] / 833 [43] | 843 / 822
IBM14 | 147605 | 152772 | 1865 [43] / 1805 [25] | 1859 / 1768 | 1772 [43] / 1527 [43] | 1819 / 1339
IBM15 | 161570 | 186608 | 2833 [43] / 2622 [25] | 2741 / 2235 | 2099 [43] / 1801 [43] | 1904 / 1605
IBM16 | 183484 | 190048 | 2059 [43] / 1720 [25] | 1951 / 1619 | 1692 [43] / 1668 [43] | 1623 / 1619
IBM17 | 185495 | 189581 | 2403 [43] / 2210 [25] | 2354 / 1989 | 2353 [43] / 2257 [43] | 2270 / 2008
IBM18 | 210613 | 201920 | 1587 [43] / 1541 [43] | 1535 / 1537 | 1664 [43] / 1522 [43] | 1612 / 1532
Table 3: Statistics of the ISPD98 VLSI circuit benchmark suite [4]. Best and Best_w represent the best published cutsizes for unit weights and actual weights, respectively. SpecPart and SpecPart_w represent the cutsizes generated by SpecPart for unit weights and actual weights, respectively.

5.1 Experimental Results
In this section, we present the experimental results of SpecPart with the default parameter settings.⁵ We run SpecPart as follows. Given a hypergraph H and an imbalance factor ε, we first run hMETIS and/or KaHyPar on H to generate an initial partitioning solution S_init, which is leveraged by SpecPart as a "hint" to generate an improved partition S_out. Here we run hMETIS and KaHyPar with their respective default parameter settings.⁶ To avoid any possible confusion, we adopt these conventions: SpecPart_h and SpecPart_k represent the cutsizes of SpecPart with the initial solutions generated by hMETIS and KaHyPar, respectively; SpecPart represents the best cutsize between SpecPart_h and SpecPart_k; and hMETIS_i and KaHyPar_i represent the best cutsizes generated by running hMETIS and KaHyPar i times with different random seeds, respectively.

⁵ The default values for the parameters (δ, β, γ, ζ, θ and m) are shown in Table 2.
⁶ The default parameter setting for hMETIS [7] is: Nruns = 10, CType = 1, RType = 1, Vcycle = 1, Reconst = 0 and seed = 0. The default configuration file we use for KaHyPar is cut_rKaHyPar_sea20.ini [40].

Benchmark | |V| | |E| | hMETIS_5 (ε = 2 / 20) | SpecPart_h (ε = 2 / 20) | hMETIS_20 (ε = 2 / 20) | SpecPart_20 (ε = 2 / 20)
sparcT1_core | 91976 | 92827 | 1073 / 1242 | 1012 / 903 | 1066 / 1172 | 1012 / 903
neuron | 92290 | 125305 | 260 / 228 | 252 / 206 | 260 / 228 | 252 / 206
stereovision | 94050 | 127085 | 213 / 129 | 180 / 91 | 180 / 129 | 180 / 91
des90 | 111221 | 139557 | 403 / 377 | 402 / 358 | 402 / 377 | 402 / 358
SLAM_spheric | 113115 | 142408 | 1061 / 1061 | 1061 / 1061 | 1061 / 1061 | 1061 / 1061
cholesky_mc | 113250 | 144948 | 301 / 478 | 285 / 345 | 285 / 478 | 285 / 345
segmentation | 138295 | 179051 | 141 / 112 | 126 / 78 | 136 / 112 | 126 / 78
bitonic_mesh | 192064 | 235328 | 667 / 554 | 585 / 483 | 614 / 554 | 587 / 483
dart | 202354 | 223301 | 849 / 546 | 807 / 543 | 844 / 540 | 807 / 540
openCV | 217453 | 284108 | 535 / 552 | 510 / 518 | 511 / 541 | 510 / 518
stap_qrd | 240240 | 290123 | 399 / 295 | 399 / 295 | 399 / 295 | 399 / 295
minres | 261359 | 320540 | 215 / 189 | 215 / 189 | 215 / 189 | 215 / 189
cholesky_bdti | 266422 | 342688 | 1161 / 1024 | 1156 / 998 | 1157 / 947 | 1156 / 947
denoise | 275638 | 356848 | 814 / 478 | 416 / 224 | 722 / 478 | 416 / 224
sparcT2_core | 300109 | 302663 | 1282 / 1630 | 1244 / 1245 | 1273 / 1447 | 1244 / 1245
gsm_switch | 493260 | 507821 | 5883 / 5352 | 1852 / 1407 | 5077 / 5352 | 1827 / 1407
mes_noc | 547544 | 577664 | 674 / 632 | 641 / 617 | 648 / 632 | 634 / 617
LU230 | 574372 | 669477 | 3328 / 2710 | 3273 / 2677 | 3328 / 2677 | 3273 / 2677
LU_Network | 635456 | 726999 | 549 / 528 | 525 / 524 | 549 / 528 | 525 / 524
sparcT1_chip2 | 820886 | 821274 | 1198 / 1023 | 899 / 783 | 1198 / 951 | 899 / 783
directrf | 931275 | 1374742 | 588 / 343 | 574 / 295 | 588 / 295 | 574 / 295
bitcoin_miner | 1089284 | 1448151 | 1576 / 1225 | 1514 / 1225 | 1489 / 1225 | 1297 / 1225
Table 4: Statistics of the Titan23 suite [8]. hMETIS_5 and hMETIS_20 represent the best cutsizes generated by running hMETIS 5 and 20 times with different random seeds. SpecPart_h represents the cutsize generated by SpecPart where the hint is obtained from running hMETIS once with the default random seed. SpecPart_20 represents the cutsize generated by SpecPart where the hint is the solution corresponding to hMETIS_20.

Benchmark | # Vertices | # Hyperedges | KaHyPar (ε = 2 / 20) | KaHyPar_10 (ε = 2 / 20) | SpecPart_k (ε = 2 / 20)
industrial01 | 349927 | 428676 | 2910 / 2426 | 2806 / 2426 | 2814 / 2401
industrial02 | 499718 | 778588 | 1871 / 1436 | 1455 / 955 | 520 / 234
industrial03 | 522302 | 553375 | 10398 / 8628 | 8720 / 7646 | 8392 / 6711
industrial04 | 570076 | 648667 | 2232 / 2889 | 2058 / 2889 | 2057 / 2369
industrial05 | 656245 | 829321 | 2679 / 1838 | 2670 / 1838 | 2670 / 1829
industrial06 | 733740 | 796261 | 10929 / 8321 | 9852 / 7646 | 9884 / 7646
industrial07 | 733740 | 796261 | 680 / 560 | 680 / 560 | 680 / 560
industrial08 | 1245270 | 1262096 | 39785 / 34659 | 39518 / 34614 | 39546 / 34614
Table 5: Statistics of the industrial benchmark suite from a leading FPGA company. KaHyPar and KaHyPar_10 represent the best cutsizes generated by running KaHyPar once and 10 times, respectively. SpecPart_k represents the cutsize generated by SpecPart where the hint is obtained from running KaHyPar once with the default random seed.

5.1.1 ISPD98 Benchmarks with Unit Weights. Here we present results for the ISPD98 VLSI Circuit Benchmark Suite with unit vertex weights. In Table 3 we present the solutions generated by SpecPart and compare them with the corresponding best previously published solutions, with references to the corresponding publications.

Figure 5: Results of SpecPart on the ISPD98 VLSI Circuit Benchmark Suite [4] and the Titan23 Suite [8] with different imbalance factors: (a) ε = 2, (b) ε = 10, (c) ε = 2, (d) ε = 10, (e) ε = 2, (f) ε = 20.

Figures 5(a)-(b) report the solution sizes obtained from SpecPart, KaHyPar_5, and hMETIS_5, normalized by the best published solution sizes. While hMETIS_5 and (mostly) KaHyPar_5 also improve upon these previous solutions, it can be seen that SpecPart generates a significant improvement over both KaHyPar and hMETIS on a number of instances. The reasoning behind picking hMETIS_5 is motivated by an "iso" (similar) runtime comparison. For these relatively small instances SpecPart has approximately a 50% runtime overhead over hMETIS_5, which is subject to significant improvement. This illustrates that SpecPart can improve very quickly upon solutions computed under stringent walltime requirements.⁷

⁷ Of course, hMETIS and KaHyPar can be run for more random starts. We include such an experimental study for the larger and more interesting Titan23 and industrial benchmarks, but we omit it for ISPD98.

5.1.2 ISPD98 Benchmarks with Actual Weights. We further verify our framework on the vertex-weighted ISPD98 benchmarks. Mirroring the considerations of Section 5.1.1, the results are presented in Table 3 and Figures 5(c)-(d). The inclusion of weights makes the problem more general and potentially more difficult. Here, we see a tendency of SpecPart to yield bigger improvements.

The Titan23 and industrial benchmarks are interesting not just because they are significantly larger than ISPD98, but also because they are generated by different, more modern synthesis processes. They hence provide a "test of time" for hMETIS, but also for KaHyPar, which does not include Titan23 in its experimental study [24].

5.1.3 Titan23 Benchmarks. Table 4 and Figures 5(e)-(f) show the results. While the SpecPart runtime overhead over hMETIS_5 remains at around 50%, the runtime of KaHyPar on some of these benchmarks is very large (more than two hours), too high for any reasonable industrial setting (for more details on runtime see [41]). For this reason we do not compare against KaHyPar. It should be noted that because we could not find previously published results on Titan23, Figure 5 reports cut sizes normalized by those obtained by hMETIS_5, i.e., the best cut size generated by running hMETIS five times with different random seeds. It can be seen that SpecPart generates significantly better partitioning solutions; the improvements are even more than 50% for the benchmarks gsm_switch and denoise. To further examine the performance of SpecPart, we add these experiments: (i) run hMETIS twenty times with different random seeds and report the best cut size hMETIS_20; and (ii) set the solution corresponding to hMETIS_20 as the initial solution to SpecPart and generate the cutsize SpecPart_20. We observe that SpecPart_h is still much better than hMETIS_20 for almost all the benchmarks. SpecPart_20 is also better than SpecPart_h for some benchmarks. This suggests that SpecPart can achieve better performance even when standard partitioners are allowed significantly more running time (see also Section 5.3).

5.1.4 Industrial Benchmarks from a Leading FPGA Company. Table 5 presents the results for the industrial benchmark suite from a leading FPGA company. Here we present results for imbalance factors ε = 2 and 20, as per guidance from our industrial collaborator. We do not compare against hMETIS because it fails with a segmentation fault on these benchmarks.

KaHyPar remains impractically slow on these large benchmarks, taking almost one hour on some of the industrial benchmarks; SpecPart adds less than 5% overhead to a single run of KaHyPar. Nevertheless, we allow the very large runtime and report a comparison with a single run of KaHyPar and with KaHyPar_10 in Table 5. It can be seen that even when the hint is based on a fairly expensive computation (a single run of KaHyPar), SpecPart can still generate significant improvements, even over KaHyPar_10, on some of the benchmarks, especially industrial02, where the improvement is more than 50%. We speculate that the improvements would have been greater if based on a hint provided by hMETIS, which is in general much faster than KaHyPar.

Figure 6: (a) Validation of SpecPart default parameter values, as discussed in Section 5.2. (b) QoR vs. runtime overhead comparison on benchmark sparcT2_core (ε = 10). (c) QoR vs. runtime overhead comparison on benchmark gsm_switch (ε = 10). Panels (b, c) compare Multi-start-hMETIS, Solution-overlay-part, SpecPart_h, and Autotune_i-SpecPart. Multi-start-hMETIS = best cutsize from running hMETIS multiple times with different random seeds. Solution-overlay-part = cutsize from running Cut-Overlay Clustering and Optimal-Attempt Partitioning directly on candidate solutions. SpecPart_h = cutsize from SpecPart when the initial solution is from one hMETIS run with the default random seed. Autotune_i-SpecPart = cutsize from SpecPart when the initial solution is from autotuning of hMETIS with i trials.

5.2 Validation of Parameters
We now discuss the effect of tuning parameters on SpecPart. The parameters we explore are the number of best solutions (δ), the number of iterations of ISSHP (β), the number of random cycles (ζ), and the threshold on the number of hyperedges in the clustered hypergraph H_c (γ). We define the score value as the average improvement of SpecPart_h with respect to hMETIS_5 on the benchmarks sparcT1_core, cholesky_mc, segmentation, denoise, gsm_switch and directrf. When we sweep (i.e., vary the value of) one parameter, the remaining parameters are fixed at their default values (Table 2) and ε is set to 20. The results appear in Figure 6(a). Sweeping for δ and γ did not change the score value in our experiments. Using m > 2 did not generate further improvement. We also note that using hMETIS instead of ILP for Optimal-Attempt Partitioning worsens the score value by 2.43%. From the results of tuning parameters on SpecPart, we establish that our default parameter setting is a local minimum in the hyperparameter search space.

5.3 Effect of ISSHP and Solution Enhancement

5.3.1 Effect of ISSHP. In order to show the effect of ISSHP in the SpecPart framework, we run Cut-Overlay Clustering and Optimal-Attempt Partitioning directly on candidate solutions, which are generated by running hMETIS multiple times with different random seeds. The flow is as follows. (i) We generate candidate solutions {S_1, S_2, ..., S_ψ} by running hMETIS ψ times with different random seeds, and report the best cutsize Multi-start-hMETIS. Here ψ is an integer parameter ranging from one to twenty. (ii) We run Cut-Overlay Clustering and Optimal-Attempt Partitioning directly on the best five solutions from {S_1, S_2, ..., S_ψ} and report the cutsize Solution-overlay-part. For each value of ψ, we run the above flow 100 times and report the average result in Figures 6(b, c). We observe that Solution-overlay-part is much better than Multi-start-hMETIS, and that SpecPart generates superior solutions in less runtime compared to Multi-start-hMETIS and Solution-overlay-part. This suggests that ISSHP is an important component of SpecPart.

5.3.2 Solution Enhancement. hMETIS has parameters whose setting may significantly impact the quality of the generated partitioning solutions. We use Ray [42] to tune the following parameters of hMETIS: CType with possible values {1, 2, 3, 4, 5}, RType with possible values {1, 2, 3}, Vcycle with possible values {1, 2, 3}, and Reconst with possible values {0, 1}. The search algorithm we use in Ray [42] is HyperOptSearch. We set the number of trials to five, ten and forty, i.e., Ray will launch five, ten and forty runs of hMETIS with different parameters, respectively. We set the number of threads to ten to reduce the runtime. The results appear in Figures 6(b, c). Here we normalize the cutsize and runtime to those of running hMETIS once with the default random seed. Autotuning increases the runtime for hMETIS and computes a better hint S_init, yet we see a further 2% and 4% cutsize improvement from SpecPart for sparcT2_core and gsm_switch, respectively, lending further support to the observation in Section 5.1.3.

6 CONCLUSION AND FUTURE DIRECTIONS
We have proposed SpecPart, the first general supervised framework for hypergraph partitioning solution improvement. Experiments confirm its outstanding performance compared to traditional multilevel partitioners with similar runtime. The code, scripts, and best known solution vectors are available through [41]. SpecPart opens multiple future research directions, with its k-way generalization being a priority. SpecPart can be integrated with the internal levels of multilevel partitioners; producing improved solutions on each level may lead to further improved solutions. We also believe that Cut-Overlay Clustering and Optimal-Attempt Partitioning are of independent interest and amenable to machine learning techniques.

Acknowledgments. Bodhisatta Pramanik thanks Dr. Chris Chu for his early guidance. We thank Dr. Grigor Gasparyan for providing testcases and sharing his thoughts on SpecPart. This work was partially supported by NSF grants CCF-2112665, CCF-2039863 and CCF-1813374, and by DARPA HR0011-18-2-0032.

REFERENCES
[1] M. Cucuringu, I. Koutis, S. Chawla, G. Miller and R. Peng, "Simple and scalable constrained clustering: a generalized spectral method", Proc. International Conference on Artificial Intelligence and Statistics, 2016, pp. 445-454.
[2] N. Alon, R. M. Karp, D. Peleg and D. West, "A graph-theoretic game and its application to the k-server problem", SIAM Journal on Computing (24)(1) (1995), pp. 78-100.
[3] J. B. Kruskal, "On the shortest spanning subtree of a graph and the traveling salesman problem", Proc. American Mathematical Society (7)(1) (1956), pp. 48-50.
[4] C. J. Alpert, "The ISPD98 circuit benchmark suite", Proc. ACM/IEEE International Symposium on Physical Design (ISPD), 1998, pp. 80-85.
[5] G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs", SIAM Journal on Scientific Computing (20)(1) (1998), pp. 359-392.
[6] G. Karypis, R. Aggarwal, V. Kumar and S. Shekhar, "Multilevel hypergraph partitioning: applications in VLSI domain", IEEE Transactions on Very Large Scale Integration (VLSI) Systems (7)(1) (1999), pp. 69-79.
[7] G. Karypis and V. Kumar, "hMETIS, a hypergraph partitioning package, version 1.5.3", 1998. http://glaros.dtc.umn.edu/gkhome/fetch/sw/hMETIS/manual.pdf
[8] K. E. Murray, S. Whitty, S. Liu, J. Luu and V. Betz, "Titan: enabling large and complex benchmarks in academic CAD", Proc. International Conference on Field Programmable Logic and Applications, 2013, pp. 1-8.
[9] Ü. Çatalyürek and C. Aykanat, "PaToH (partitioning tool for hypergraphs)", Boston, MA, Springer US, 2011.
[10] J. Bezanson, A. Edelman, S. Karpinski and V. B. Shah, "Julia: a fresh approach to numerical computing", SIAM Review (59)(1) (2017), pp. 65-98.
[11] C. M. Fiduccia and R. M. Mattheyses, "A linear-time heuristic for improving network partitions", Proc. IEEE/ACM Design Automation Conference (DAC), 1982, pp. 175-181.
[12] R. Shaydulin, J. Chen and I. Safro, "Relaxation-based coarsening for multilevel hypergraph partitioning", Multiscale Modeling & Simulation (17)(1) (2019), pp. 482-506.
[13] A. V. Knyazev, "Toward the optimal preconditioned eigensolver: locally optimal block preconditioned conjugate gradient method", SIAM Journal on Scientific Computing (23)(2) (2001), pp. 517-541.
[14] T. Heuer, P. Sanders and S. Schlag, "Network flow-based refinement for multilevel hypergraph partitioning", ACM Journal of Experimental Algorithmics (24)(2) (2019), pp. 1-36.
[15] D. Kucar, S. Areibi and A. Vannelli, "Hypergraph partitioning techniques", Dynamics of Continuous, Discrete & Impulsive Systems, Series A: Mathematical Analysis (11)(2) (2004), pp. 339-367.
[16] R. Merris, "Laplacian matrices of graphs: a survey", Linear Algebra and its Applications (197) (1994), pp. 143-176.
[17] A. V. Knyazev, I. Lashuk, M. E. Argentati and E. Ovchinnikov, "Block locally optimal preconditioned eigenvalue xolvers (BLOPEX) in hypre and PETSc", SIAM Journal on Scientific Computing (25)(5) (2007), pp. 2224-2239.
[18] I. Koutis, G. L. Miller and R. Peng, "Approaching optimality for solving SDD linear systems", SIAM Journal on Computing (43)(1) (2014), pp. 337-354.
[19] G. W. Stewart and J. G. Sun, Matrix Perturbation Theory, Academic Press, 1990.
[20] I. Koutis, G. L. Miller and D. Tolliver, "Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing", Computer Vision and Image Understanding (115)(12) (2011), pp. 1638-1646.
[21] A. E. Caldwell, A. B. Kahng and I. L. Markov, "Improved algorithms for hypergraph bipartitioning", Proc. IEEE/ACM Design Automation Conference (DAC), 2000, pp. 661-666.
[22] J. R. Lee, S. O. Gharan and L. Trevisan, "Multiway spectral partitioning and higher-order Cheeger inequalities", Journal of the ACM (61) (2014), pp. 1-30.
[23] T. Heuer, "Engineering initial partitioning algorithms for direct k-way hypergraph partitioning", Karlsruher Institut für Technologie, 2015.
[24] S. Schlag, T. Heuer, L. Gottesbüren, Y. Akhremtsev, C. Schulz and P. Sanders, "High-quality hypergraph partitioning", ACM Journal of Experimental Algorithmics (2022).
[25] S. Schlag, V. Henne, T. Heuer, H. Meyerhenke, P. Sanders and C. Schulz, "k-way hypergraph partitioning via n-level recursive bisection", Proc. Meeting on Algorithm Engineering and Experiments (ALENEX), 2016, pp. 53-67.
[26] J. Y. Zien, M. D. F. Schlag and P. K. Chan, "Multilevel spectral hypergraph partitioning with arbitrary vertex sizes", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (18)(9) (1999), pp. 1389-1399.
[27] L. Hagen and A. B. Kahng, "Fast spectral methods for ratio cut partitioning and clustering", Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1991, pp. 10-13.
[28] N. Rebagliati and A. Verri, "Spectral clustering with more than K eigenvectors", Neurocomputing (74)(9) (2011), pp. 1391-1401.
[29] C. J. Alpert and A. B. Kahng, "Multiway partitioning via geometric embeddings, orderings, and dynamic programming", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (14)(11) (1995), pp. 1342-1358.
[30] R. Horaud, "A short tutorial on graph Laplacians, Laplacian embedding, and spectral clustering", 2009. https://csustan.csustan.edu/~tom/Clustering/GraphLaplacian-tutorial.pdf
[31] F. R. K. Chung, "Spectral graph theory", CBMS Regional Conference Series in Mathematics, 1997.
[32] M. Kapralov and R. Panigrahy, "Spectral sparsification via random spanners", Proc. Innovations in Theoretical Computer Science Conference, 2012, pp. 393-398.
[33] S. Hoory, N. Linial and A. Wigderson, "Expander graphs and their applications", Bulletin of the American Mathematical Society (43) (2006), pp. 439-561.
[34] C. Ravishankar, D. Gaitonde and T. Bauer, "Placement strategies for 2.5D FPGA fabric architectures", Proc. International Conference on Field Programmable Logic and Applications (FPL), 2018, pp. 16-164.
[35] R. L. Graham and P. Hell, "On the history of the minimum spanning tree problem", Annals of the History of Computing (7)(1) (1985), pp. 43-57.
[36] IBM ILOG CPLEX optimizer, https://www.ibm.com/analytics/cplex-optimizer.
[37] V. D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre, "Fast unfolding of communities in large networks", Journal of Statistical Mechanics: Theory and Experiment (2008)(10), P10008.
[38] E. G. Boman and B. Hendrickson, "Support theory for preconditioning", SIAM Journal on Matrix Analysis and Applications (25)(3) (2003), pp. 694-717.
[39] C. J. Alpert, A. B. Kahng and S.-Z. Yao, "Spectral partitioning with multiple eigenvectors", Discrete Applied Mathematics (90)(1) (1999), pp. 3-26.
[40] KaHyPar configuration file, https://github.com/kahypar/kahypar/blob/master/config/cut_rKaHyPar_sea20.ini
[41] Partition solutions, scripts and SpecPart, https://github.com/TILOS-AI-Institute/HypergraphPartitioning.
[42] Ray, https://docs.ray.io/en/latest/index.html.
[43] Latest actual area results for hMETIS, https://vlsicad.ucsd.edu/UCLAWeb/cheese/errata.html.
[44] Comparison of UCLA MLPart (v4.17) and hMETIS (v1.5.3) on instances with actual cell areas (2% configuration), https://vlsicad.ucsd.edu/UCLAWeb/benchmarks/hMETISML02Tab.html.
[45] Comparison of UCLA MLPart (v4.17) and hMETIS (v1.5.3) on instances with actual cell areas (10% configuration), https://vlsicad.ucsd.edu/UCLAWeb/benchmarks/hMETISML10Tab.html.
