
Multi-Objective Hypergraph Partitioning Algorithms for Cut and Maximum Subdomain Degree Minimization

Navaratnasothie Selvakkumaran and George Karypis, Member, IEEE

The authors are with the Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN 55455. E-mail: {selva,karypis}@cs.umn.edu.

Abstract: In this paper we present a family of multi-objective hypergraph partitioning algorithms based on the multilevel paradigm, which are capable of producing solutions in which both the cut and the maximum subdomain degree are simultaneously minimized. This type of partitioning is critical for existing and emerging applications in VLSI CAD, as it makes it possible to both minimize and evenly distribute the interconnects across the physical devices. Our experimental evaluation on the ISPD98 benchmark shows that our algorithms produce solutions whose maximum subdomain degree is reduced by up to 36% compared to those produced by hMETIS, while achieving comparable quality in terms of cut.

Index Terms: Multi-chip partitioning, placement, interconnect congestion.

I. INTRODUCTION

Hypergraph partitioning is an important problem with extensive applications to many areas, including VLSI design [5], efficient storage of large databases on disks [32], information retrieval [37], and data mining [12], [20]. The problem is to partition the vertices of a hypergraph into k equal-size subdomains, such that the number of the hyperedges connecting vertices in different subdomains (called the cut) is minimized. The importance of the problem has attracted a considerable amount of research interest, and over the last thirty years a variety of heuristic algorithms have been developed that offer different cost-quality trade-offs. The survey by Alpert and Kahng [5] provides a detailed description and comparison of various such schemes. Recently a new class of hypergraph partitioning algorithms has been developed [2], [4], [11], [14], [19], that are based upon the multilevel paradigm. In these algorithms, a sequence of successively smaller hypergraphs is constructed. A partitioning of the smallest hypergraph is computed. This partitioning is then successively projected to the next level finer hypergraph, and at each level an iterative refinement algorithm (e.g., KL [26] or FM [13]) is used to further improve its quality. Experiments presented in [2]-[4], [9], [19], [25], [35] have shown that multilevel hypergraph partitioning algorithms can produce substantially better solutions than those produced by non-multilevel schemes.

However, despite the success of multilevel algorithms in producing partitionings in which the cut is minimized, this cut is not uniformly distributed across the different subdomains. That is, the number of hyperedges that are cut by a particular subdomain (referred to as the subdomain degree) is significantly higher than that cut by other subdomains. This is illustrated in Table I, which shows the ratios of the maximum subdomain degree over the average subdomain degree of various k-way partitionings obtained for the ISPD98 benchmark [3] using the state-of-the-art hMETIS [22] multilevel hypergraph partitioning algorithm. In many cases, the resulting partitionings contain subdomains whose degree is up to two times higher than the average degree of the remaining subdomains.

TABLE I
THE RATIOS OF THE MAXIMUM SUBDOMAIN DEGREE OVER THE AVERAGE SUBDOMAIN DEGREE OF VARIOUS SOLUTIONS FOR THE ISPD98 BENCHMARK.

         4-way   8-way   16-way   32-way   64-way
ibm01    1.27    1.55    1.60     1.70     1.76
ibm02    1.35    1.35    1.43     1.51     1.55
ibm03    1.18    1.43    1.68     1.70     1.84
ibm04    1.28    1.35    1.41     1.72     2.39
ibm05    1.16    1.17    1.24     1.33     1.41
ibm06    1.22    1.46    1.46     1.50     1.63
ibm07    1.29    1.46    1.79     1.94     2.04
ibm08    1.06    1.22    1.45     1.73     2.12
ibm09    1.09    1.23    1.65     1.91     2.31
ibm10    1.23    1.43    1.69     1.78     1.85
ibm11    1.21    1.55    1.54     1.66     2.02
ibm12    1.26    1.47    1.72     2.10     2.15
ibm13    1.31    1.81    1.66     1.91     1.85
ibm14    1.20    1.47    1.46     1.63     1.96
ibm15    1.28    1.51    1.71     1.87     2.09
ibm16    1.22    1.39    1.45     1.70     1.84
ibm17    1.18    1.42    1.52     1.80     2.13
ibm18    1.16    1.61    2.33     2.65     2.78

For many existing and emerging applications in VLSI CAD, producing partitioning solutions that both minimize the cut and also minimize the maximum subdomain degree is of great importance. For example, within the context of partitioning-driven placement, since the maximum subdomain degree of a partition (also known as the bin degree) can be considered a lower bound on the number of routing resources that is required, being able to compute partitions that both minimize the number of interconnects (which is achieved by minimizing the cut) and also evenly distribute these interconnects across the
physical device to eliminate high-density interconnect regions (which is achieved by minimizing the maximum subdomain degree) can significantly reduce the peak demand of routing resources and thus help in reducing the peak congestion [17], [38]. Similarly, in the context of multi-chip configurations, a partitioned design cannot be mapped onto a set of chips if there is a partition that exceeds the number of available I/O pins. For example, in the multi-chip configuration shown in Figure 1, if the degree of a subdomain exceeds 20, it cannot be mapped to any of the available chips. To address this problem, techniques based on pin-multiplexing have been developed [6] that allow multiple signals to go through the same I/O pins by using time division multiplexing. However, this approach reduces the speed at which the system can operate (due to time division) and increases the overall system design complexity (due to the extra logic required for the multiplexing). However, the costs associated with pin-multiplexing can be significantly reduced and even eliminated by computing a decomposition that significantly reduces the maximum number of I/O pins required by any given partition.

Fig. 1. An example of multi-chip systems with grid interconnect topology.

In this paper we present a family of hypergraph partitioning algorithms based on the multilevel paradigm that are capable of producing solutions in which both the cut and the maximum subdomain degree are simultaneously minimized. Our algorithms treat the minimization of the maximum subdomain degree as a multi-objective optimization problem that is solved once a high-quality, cut-based, k-way partitioning has been obtained. Toward this goal, we present highly effective multi-objective refinement algorithms that are capable of producing solutions that explicitly minimize the maximum subdomain degree and ensure that the cut does not significantly increase. This approach has a number of inherent advantages. First, by building upon a cut-based k-way partitioning, it leverages the huge body of existing research on this topic, and it can benefit from future improvements. Second, because the initial k-way solution is of extremely high quality, it allows the algorithm to focus on minimizing the maximum subdomain degree without being overly concerned about the cut of the final solution. Finally, it provides a user-adjustable and predictable framework in which the user can specify how much (if any) deterioration of the cut he or she is willing to tolerate in order to reduce the maximum subdomain degree. We experimentally evaluated the performance of these algorithms on the ISPD98 [3] benchmark and compared them against the solutions produced by hMETIS [22]. Our experimental results show that our algorithms are capable of producing solutions whose maximum subdomain degree is lower by 5% to 36% while producing comparable solutions in terms of cut. Moreover, the computational complexity of these algorithms is relatively low, requiring on the average no more than twice the amount of time required by hMETIS in most cases.

The rest of the paper is organized as follows. Section II provides some definitions, describes the notation that is used throughout the paper, and provides a brief description of the multilevel graph partitioning paradigm. Section III discusses the various issues arising with minimizing the maximum subdomain degree and formally defines the two multi-objective formulations used in this paper. Sections IV and V describe the direct and aggressive multi-phase refinement algorithms that we developed to simultaneously minimize the maximum subdomain degree and the cut of the resulting partitioning. Section VI experimentally evaluates these algorithms and compares them against hMETIS. Finally, Section VII provides some concluding remarks and outlines directions of future research.

II. DEFINITIONS AND NOTATION

A hypergraph G = (V, E) is a set of vertices V and a set of hyperedges E. Each hyperedge is a subset of the set of vertices V. The size of a hyperedge is the cardinality of this subset. A vertex v is said to be incident on a hyperedge e if v ∈ e. Each vertex v and hyperedge e has a weight associated with it, denoted by w(v) and w(e), respectively.

A decomposition of V into k disjoint subsets V_1, V_2, ..., V_k such that ∪_i V_i = V is called a k-way partitioning of V. We will use the terms subdomain or partition to refer to each one of these k sets. A k-way partitioning of V is denoted by a vector P such that P[i] indicates the partition number that vertex i belongs to. We say that a k-way partitioning of V satisfies a balancing constraint specified by [l, u], where l < u, if for each subdomain V_i, l ≤ Σ_{v ∈ V_i} w(v) ≤ u. The cut of a k-way partitioning of V is equal to the sum of the weights of the hyperedges that contain vertices from different subdomains. The subdomain degree of V_i is equal to the sum of the weights of the hyperedges that contain at least one vertex in V_i and one vertex in V \ V_i. The maximum subdomain degree of a k-way partitioning is the highest subdomain degree over all k partitions. The sum-of-external-degrees (abbreviated as SOED) of a k-way partitioning is equal to the sum of the subdomain degrees of all the partitions. A net is said to be exposed w.r.t. a subdomain when it contributes to the subdomain's degree.

Given a k-way partitioning of V and a vertex v ∈ V that belongs to partition V_i, v's internal degree, denoted by ID_i(v), is equal to the sum of the weights of its incident hyperedges that contain only vertices from V_i, and v's external degree with respect to partition V_j, denoted by ED_j(v), is equal to the sum of the weights of its incident hyperedges whose remaining vertices all belong to partition V_j.
The k-way hypergraph partitioning problem is defined as follows. Given a hypergraph G = (V, E) and a balancing constraint specified by [l, u], compute a k-way partitioning of V such that it satisfies the balancing constraint and minimizes the cut. The requirement that the size of each partition satisfies the balancing constraint is referred to as the partitioning constraint, and the requirement that a certain function is optimized is referred to as the partitioning objective.

A. The Multilevel Paradigm for Hypergraph Partitioning

The key idea behind the multilevel approach for hypergraph partitioning is fairly simple and straightforward. Multilevel partitioning algorithms, instead of trying to compute the partitioning directly in the original hypergraph, first obtain a sequence of successive approximations of the original hypergraph. Each one of these approximations represents a problem whose size is smaller than the size of the original hypergraph. This process continues until a level of approximation is reached in which the hypergraph contains only a few tens of vertices. At this point, these algorithms compute a partitioning of that hypergraph. Since the size of this hypergraph is quite small, even simple algorithms such as Kernighan-Lin (KL) [26] or Fiduccia-Mattheyses (FM) [13] lead to reasonably good solutions. The final step of these algorithms is to take the partitioning computed at the smallest hypergraph and use it to derive a partitioning of the original hypergraph. This is usually done by propagating the solution through the successive approximations of the hypergraph and using simple approaches to further refine the solution.

In the multilevel partitioning terminology, the above process is described in terms of three phases: the coarsening phase, in which the sequence of successively coarser approximate hypergraphs is obtained; the initial partitioning phase, in which the smallest hypergraph is partitioned; and the uncoarsening and refinement phase, in which the solution of the smallest hypergraph is projected to the next level finer graph, and at each level an iterative refinement algorithm such as KL or FM is used to further improve the quality of the partitioning. The various phases of the multilevel approach in the context of hypergraph bisection are illustrated in Figure 2.

Fig. 2. The various phases of the multilevel hypergraph bisection. During the coarsening phase, the size of the hypergraph is successively decreased; during the initial partitioning phase, a bisection of the smaller hypergraph is computed; and during the uncoarsening and refinement phase, the bisection is successively refined as it is projected to the larger hypergraphs. During the uncoarsening and refinement phase, the dashed lines indicate projected partitionings and dark solid lines indicate partitionings that were produced after refinement.
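The following compact, runnable sketch shows the three phases in code. Its component heuristics are deliberately naive stand-ins (random pair merging for coarsening, a random initial bisection, one greedy pass for refinement, and no balancing), so it illustrates only the structure, not the quality, of multilevel schemes such as hMETIS.

```python
import random

def cut(edges, part):
    return sum(1 for e in edges if len({part[v] for v in e}) > 1)

def coarsen(n, edges):
    """Coarsening phase: merge random pairs of vertices (a crude matching)."""
    order = list(range(n))
    random.shuffle(order)
    cmap = [0] * n
    for i, v in enumerate(order):
        cmap[v] = i // 2                          # shuffled vertices paired up
    cedges = [tuple({cmap[v] for v in e}) for e in edges]
    return (n + 1) // 2, [e for e in cedges if len(e) > 1], cmap

def refine(n, edges, part):
    """One greedy FM-flavored pass (balance constraints ignored for brevity)."""
    for v in range(n):
        before = cut(edges, part)
        part[v] ^= 1                              # tentatively move v across
        if cut(edges, part) >= before:
            part[v] ^= 1                          # keep the move only if it helps
    return part

def multilevel_bisect(n, edges):
    if n <= 20:                                   # "a few tens of vertices"
        return refine(n, edges, [random.randint(0, 1) for _ in range(n)])
    nc, cedges, cmap = coarsen(n, edges)          # coarsening phase
    cpart = multilevel_bisect(nc, cedges)         # initial partitioning (recursive)
    part = [cpart[cmap[v]] for v in range(n)]     # project to the finer hypergraph
    return refine(n, edges, part)                 # uncoarsening and refinement

random.seed(0)
edges = [(i, i + 1, i + 2) for i in range(98)]    # a toy chain hypergraph
print(cut(edges, multilevel_bisect(100, edges)))
```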
This paradigm was independently studied by Bui and Jones [8] in the context of computing fill-reducing matrix reorderings, by Hendrickson and Leland [15] in the context of finite element mesh-partitioning, by Hauck and Borriello [14] (called Optimized KLFM), and by Cong and Smith [11] for hypergraph partitioning. Karypis and Kumar extensively studied this paradigm in [21], [23], [24] for the partitioning of graphs. They presented novel graph coarsening schemes and showed both experimentally and analytically that even a good bisection of the coarsest graph alone is already a very good bisection of the original graph. These coarsening schemes made the overall multilevel paradigm very robust and made it possible to use simplified variants of KL or FM refinement schemes during the uncoarsening phase, which significantly sped up the refinement process without compromising overall quality. METIS [23], a multilevel graph partitioning algorithm based upon this work, routinely finds substantially better partitionings than other popular techniques such as spectral-based partitioning algorithms [7], [28], in a fraction of the time required by them. Karypis et al. [19] extended their multilevel graph partitioning work to hypergraph partitioning. The hMETIS [22] package contains many of these algorithms and has been shown to produce high-quality partitionings for a wide range of circuits.

III. MINIMIZING THE MAXIMUM SUBDOMAIN DEGREE

There are two different approaches for computing a k-way partitioning of a hypergraph. One is based on recursive bisectioning and the other on direct k-way partitioning [18]. In recursive bisectioning, the overall partitioning is obtained by initially bisecting the hypergraph to obtain a two-way partitioning. Then, each of these parts is further bisected to obtain a four-way partitioning, and so on. Assuming that k is a power of two, the final k-way partitioning can be obtained in log(k) such steps (or after performing k − 1 bisections). In this approach, each partitioning step usually takes into account information from only two partitions, and as such it does not have sufficient information to explicitly minimize the maximum subdomain degree of the resulting k-way partitioning. In principle, additional information can be propagated down at each bisection level to account for the degrees of the various subdomains. For example, during each bisection step, the change in the degrees of the adjacent subdomains can be taken into account (either explicitly or via variations of terminal-propagation-based techniques [16]) to favor solutions that, in addition to minimizing the cut, also reduce these subdomain degrees. However, the limitation of such approaches is that they end up over-constraining the problem because not only do they try to reduce the maximum subdomain degree of the final k-way partitioning, but they also try to reduce the maximum degree of the intermediate lower-k partitioning solutions.
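Under the same assumptions as the sketches above, recursive bisectioning is a short driver around any bisection routine (the multilevel sketch above could serve as `bisect`, an assumed callable); note how each step sees only the hyperedges internal to its own two sub-parts, which is precisely why the scheme cannot explicitly control the final maximum subdomain degree.

```python
def recursive_kway(vertices, edges, k, bisect):
    """Return {vertex: subdomain id} with k subdomains, k a power of two."""
    if k == 1:
        return {v: 0 for v in vertices}
    side = bisect(vertices, edges)                   # one 0/1 bisection step
    parts = {}
    for s in (0, 1):
        vs = [v for v in vertices if side[v] == s]
        vset = set(vs)
        es = [e for e in edges if all(v in vset for v in e)]  # interior edges only
        for v, p in recursive_kway(vs, es, k // 2, bisect).items():
            parts[v] = s * (k // 2) + p              # log2(k) levels, k-1 bisections
    return parts
```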
For this reason, approaches based on direct k-way partitioning are better suited for the problem of minimizing the maximum subdomain degree, as they provide a concurrent view of the entire k-way partitioning solution. The ability of direct k-way partitioning to optimize objective functions that depend on knowing how the hyperedges are partitioned across all k partitions has been recognized by various researchers, and a number of different algorithms have been developed to minimize objective functions such as the sum-of-external-degrees, scaled cost, absorption, etc. [5], [10], [25], [30], [34]. Moreover, direct k-way partitioning can potentially produce much better solutions than a method that computes a k-way partitioning via recursive bisection. In fact, for certain classes of graphs it was shown that recursive bisectioning can be up to an O(log n) factor worse than the optimal solution [33].

However, despite the inherent advantage of direct k-way partitioning in naturally modeling much more complex objectives, and the theoretical results which suggest that it can lead to superior partitioning solutions, a number of studies have shown that existing direct k-way partitioning algorithms for hypergraphs produce solutions that are in general inferior to those produced via recursive bisectioning [10], [25], [30], [34]. The primary reason for that is the fact that computationally efficient k-way partitioning refinement algorithms are often trapped into local minima, and usually require much more sophisticated and expensive optimizers to climb out of them.

To overcome these conflicting requirements and characteristics, our algorithms for minimizing the maximum subdomain degree combine the best features of the recursive bisectioning and direct k-way partitioning approaches. We achieve this by treating the minimization of the maximum subdomain degree as a post-processing problem to be performed once a high-quality k-way partitioning has been obtained. Specifically, we use existing state-of-the-art multilevel-based techniques [19], [22] to obtain an initial k-way solution via repeated bisectioning, and then refine this solution using various k-way partitioning refinement algorithms that (i) explicitly minimize the maximum subdomain degree, (ii) ensure that the cut does not significantly increase, and (iii) ensure that the balancing constraints of the resulting k-way partitioning are satisfied.

This approach has a number of inherent advantages. First, by building upon a cut-based k-way partitioning, it leverages the huge body of existing research on this topic, and it can benefit from future improvements. Second, in terms of cut, its initial k-way solution is of extremely high quality, allowing us to primarily focus on minimizing the maximum subdomain degree without being overly concerned about the cut of the final solution (as long as the partitioning is not significantly perturbed). Third, it allows for a user-adjustable and predictable framework in which the user can specify how much (if any) deterioration of the cut he or she is willing to tolerate in order to reduce the maximum subdomain degree.

To actually perform the maximum subdomain-degree focused k-way refinement we developed two classes of algorithms. Both of them treat the problem as a multi-objective optimization problem, but they differ on the starting point of that refinement. The first algorithm, called Direct Multi-Phase Refinement, directly optimizes the multi-objective cost using the k-way V-cycle framework [19], while the second algorithm, called Aggressive Multi-Phase Refinement, utilizes refinement strategies that enable large-scale perturbations of the solution space. Details on the exact multi-objective formulation are provided in the rest of this section, and the two refinement algorithms are described in subsequent sections.

A. Multi-Objective Problem Formulation

In general, the objectives of producing a k-way partitioning that both minimize the cut and the maximum subdomain degree are reasonably well correlated with each other, as partitionings with low cuts will also tend to have low maximum subdomain degrees. However, this correlation is not perfect, and these two objectives can actually be at odds with each other. That is, a reduction in the maximum subdomain degree may only be achieved if the cut of the partitioning is increased. This situation arises with vertices that are adjacent to vertices that belong to more than two subdomains. For example, consider a vertex v that belongs to the maximum degree partition V_i and let V_q and V_r be two other partitions such that v is connected to vertices in V_i, V_q, and V_r. Now, if ED_q(v) − ID_i(v) < 0 and ED_r(v) − ID_i(v) < 0, then the move of v to either partition V_q or V_r will increase the cut, but if ED_q(v) + ED_r(v) − ID_i(v) > 0, then moving v to either V_q or V_r will actually decrease V_i's subdomain degree. One such scenario is illustrated in Figure 3, in which vertex v from partition V_i is connected to vertices x, y, and z of partitions V_i, V_q, and V_r, respectively, and the weights of the respective edges are 6, 5, and 3. Moving vertex v from partition V_i to either partition V_q or V_r will reduce the subdomain degree of V_i; however, either of these moves will increase the overall cut of the partitioning. For example, if v moves to V_q, the subdomain degree of V_i will reduce from 8 to 6, whereas the overall cut will increase from 8 to 9. This discussion suggests that in order to develop effective algorithms that explicitly minimize the maximum subdomain degree and the cut, these two objectives need to be coupled together into a multi-objective framework that allows the optimization algorithm to intelligently select the preferred solution.

Fig. 3. An example in which the objectives of maximum subdomain degree and cut are in conflict with each other. Let's say V_i is the subdomain with the maximum degree. If v is moved to either V_q or V_r it will increase the cut by one or three, respectively. However, both moves will reduce the maximum subdomain degree by two (5 + 3 − 6).
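The arithmetic of this example can be checked directly (the values below are taken from the Figure 3 description in the text):

```python
ID_i, ED_q, ED_r = 6, 5, 3          # v's internal weight and external weights

# Both single-destination gains are negative, so either move increases the cut:
print(ED_q - ID_i, ED_r - ID_i)     # -1 -3  -> cut grows by 1 (to V_q) or 3 (to V_r)

# ...yet v's total external weight exceeds its internal weight, so moving v
# out of V_i lowers V_i's subdomain degree by ED_q + ED_r - ID_i:
print(ED_q + ED_r - ID_i)           # 2  -> degree of V_i drops from 8 to 6
```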
The problem of multi-objective optimization within the context of graph and hypergraph partitioning has been extensively studied in the literature [1], [27], [29], [31], [36], and two general approaches have been developed for combining multiple objectives. The first approach keeps the different objectives separate and couples them by assigning different priorities to them. Essentially, in this scheme, a solution that optimizes the highest priority objective the most is always preferred, and the lower priority objectives are used as tie-breakers (i.e., used to select among equivalent solutions in terms of the higher priority objectives). The second approach creates an explicit multi-objective function that numerically combines the individual functions. For example, a multi-objective function can be obtained as the weighted sum of the individual objective functions. In this scheme, the choice of the weight values is used to determine the relative importance of the various objectives. One of the advantages of such an approach is that it tends to produce somewhat more natural and predictable solutions, as it will prefer solutions that, to a certain extent, optimize all the different objective functions.

In our algorithms we used both of these methods to combine the two different objectives. Specifically, our priority-based scheme produces a multi-objective solution in which the maximum subdomain degree is the highest priority objective and the cut is the second highest. This choice of priorities was motivated by the fact that within our framework, the solution is already at a local minimum in terms of cut; thus, focusing on the maximum subdomain degree is a natural choice.

Our combining multi-objective function couples the different objectives using the formula

    Cost = α · (MaximumDegree) + Cut,    (1)

where MaximumDegree is the maximum subdomain degree, Cut is the hyperedge cut, and α is a user-specified weight indicating the importance of the maximum subdomain degree relative to the cut. Selecting the proper value of this parameter is, in general, problem dependent. As discussed earlier, in many cases the maximum subdomain degree can only be reduced by increasing the overall cut of the partitioning. As a result, in order for Equation 1 to provide meaningful maximum subdomain degree reduction, α should be greater than 1.0. Moreover, since the cut-worsening moves that lead to improvements in the maximum subdomain degree are those in which the moved vertices are connected to vertices of different partitions (i.e., corner vertices), the value of α should be an increasing function of the number of partitions k, thus allowing for the movement of vertices that are adjacent to many subdomains (as long as such moves reduce the maximum subdomain degree). The sensitivity to these parameters is further studied in the experiments shown in Section VI.

In addition, in both of these schemes, we break ties in favor of solutions that lead to lower sum-of-external-degrees. This was motivated by the fact that lower SOED solutions may lead to subsequent improvements in either one of the main objective functions. Also, if the gain of a move is tied even after considering SOED, the ability of the move to improve area balancing is considered for tie breaking.
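Both combining methods translate directly into code. The following is a minimal sketch (not the authors' implementation) in which Python's lexicographic tuple comparison realizes the cost-then-SOED-then-balance preference order described above:

```python
def combined_cost(max_degree, cut, alpha):
    """Equation 1: Cost = alpha * MaximumDegree + Cut, with alpha > 1.0
    and alpha growing with the number of partitions k."""
    return alpha * max_degree + cut

def prefer(a, b):
    """Pick the better of two candidate solutions, each summarized as a tuple
    (combined cost, SOED, balance penalty); lower is better at every level."""
    return min(a, b)        # lexicographic: cost first, then the tie-breakers

a = (combined_cost(10, 80, 2.0), 150, 0)   # cost 100.0
b = (combined_cost(11, 77, 2.0), 140, 0)   # cost  99.0 -> preferred
print(prefer(a, b))
```

For the priority-based scheme the first tuple element would simply be the pair (maximum degree, cut) instead of the weighted sum.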
IV. DIRECT MULTI-PHASE REFINEMENT

Our first k-way refinement algorithm for the multi-objective problem formulations described in Section III-A is based on the multi-phase refinement approach implemented by hMETIS and was initially described in [19]. The idea behind multi-phase refinement is quite simple. It consists of two phases, namely a coarsening and an uncoarsening phase. The uncoarsening phase is identical to the uncoarsening phase of the multilevel hypergraph partitioning algorithm [19]. The coarsening phase, called restricted coarsening [19], however, is somewhat different, as it preserves the initial partitioning that is input to the algorithm. Given a hypergraph G and a partitioning P, during the coarsening phase a sequence of successively coarser hypergraphs and their partitionings is constructed. Let (G_i, P_i) for i = 1, 2, ..., m be this sequence of hypergraphs and partitionings. Given a hypergraph G_i and its partitioning P_i, restricted coarsening will collapse together vertices that belong to the same partition. The partitioning P_{i+1} of the next level coarser hypergraph G_{i+1} is computed by simply inheriting the partition from G_i. By constructing G_{i+1} and P_{i+1} in this way, we ensure that the number of hyperedges cut by P_{i+1} is identical to the number of hyperedges cut by P_i in G_i.
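A sketch of restricted coarsening is shown below, assuming a pairwise matching `match` in which `match[v]` is v's merge partner (or v itself) and the caller guarantees P[v] == P[match[v]]; the example illustrates why the coarse partitioning cuts exactly the same hyperedges.

```python
def restricted_coarsen(n, edges, P, match):
    """Collapse matched same-partition vertex pairs; return the coarser
    hypergraph (vertex count, hyperedges) and its inherited partitioning."""
    cmap, nxt = {}, 0
    for v in range(n):
        root = min(v, match[v])              # one coarse node per matched pair
        if root not in cmap:
            cmap[root] = nxt
            nxt += 1
    cvert = [cmap[min(v, match[v])] for v in range(n)]
    cP = [0] * nxt
    for v in range(n):
        cP[cvert[v]] = P[v]                  # coarse node inherits v's partition
    cedges = [tuple(sorted({cvert[v] for v in e})) for e in edges]
    return nxt, [e for e in cedges if len(e) > 1], cP

# 4 vertices, P = [0,0,1,1]; vertices 0 and 1 (same partition) are matched.
print(restricted_coarsen(4, [(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1], [1, 0, 2, 3]))
# -> (3, [(0, 1), (1, 2)], [0, 1, 1]): exactly one hyperedge is cut both
#    before and after coarsening.
```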

The set of vertices to be collapsed together in this restricted coarsening scheme can be selected by using any of the coarsening schemes that have been previously developed [19]. In our algorithm, we use the first-choice (FC) scheme [25] as our default, since it leads to the best overall solutions [22]. The FC scheme is derived by modifying the commonly used edge-coarsening scheme. In the edge-coarsening scheme, a vertex is randomly selected and merged with a highly connected and unmatched neighbor. The connectivity to the neighbors is estimated by representing each hyperedge e by a clique of edges, each with a weight of w(e)/(|e| − 1), and by summing the weights of the edges common to each neighbor and the vertex in consideration. The FC coarsening scheme is derived from the edge-coarsening scheme by relaxing the requirement that a vertex is matched only with another unmatched vertex. Specifically, in the FC coarsening scheme, the vertices are again visited in a random order. However, for each vertex v, all vertices (both matched and unmatched) that belong to hyperedges incident to v are considered, and the one that is connected via the edge with the largest weight is matched with v, breaking ties in favor of unmatched vertices. The FC scheme tends to remove a large amount of the exposed hyperedge weight in successive coarse hypergraphs, and thus makes it easy to find high-quality initial partitionings that require little refinement during the uncoarsening phase.

Due to the randomization in the coarsening phase, successive runs of the multi-phase refinement algorithm can lead to additional improvements of the partitioning solution. For this reason, in our algorithm we perform multiple such iterations, and the entire process is stopped when the solution quality does not improve in successive iterations. Such an approach is identical to the V-cycle refinement algorithm used by hMETIS [22].
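The clique-model connectivity estimate and the FC tie-breaking rule can be sketched as follows (the function name is an assumption, and a full implementation would precompute incidence lists rather than scan all hyperedges per vertex):

```python
from collections import defaultdict

def best_fc_neighbor(v, hyperedges, weights, matched):
    """Return the neighbor of v with the largest clique-model connectivity,
    breaking ties in favor of unmatched vertices (FC also allows matched ones)."""
    conn = defaultdict(float)
    for e, w in zip(hyperedges, weights):
        if v in e and len(e) > 1:
            for u in e:
                if u != v:
                    conn[u] += w / (len(e) - 1)   # each hyperedge as a clique
    if not conn:
        return None                               # v has no neighbors
    return max(conn, key=lambda u: (conn[u], u not in matched))
```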
The actual k-way partitioning refinement at a given level during the uncoarsening phase is performed using a greedy algorithm that is motivated by a similar algorithm used in the direct k-way partitioning algorithm of hMETIS. More precisely, the greedy k-way refinement algorithm works as follows. Consider a hypergraph G = (V, E) and its partitioning vector P. The vertices are visited in a random order. Let v be such a vertex, and let P[v] = a be the partition that v belongs to. If v is a node internal to partition a, then v is not moved. If v is at the boundary of the partition, then v can potentially be moved to one of the partitions N(v) that vertices adjacent to v belong to (the set N(v) is often referred to as the neighborhood of v). Let N′(v) be the subset of N(v) that contains all partitions b such that the movement of vertex v to partition b does not violate the balancing constraint. Now the partition b ∈ N′(v) that leads to the greatest positive reduction in the multi-objective function is selected, and v is moved to that partition.
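A sketch of one such greedy pass is given below; `cost_of`, `neighborhood`, and `balance_ok` are assumed callables, and the full re-evaluation of the objective per candidate move stands in for the incremental ID/ED gain bookkeeping an efficient implementation would use:

```python
import random

def greedy_kway_pass(vertices, P, neighborhood, cost_of, balance_ok):
    order = list(vertices)
    random.shuffle(order)                      # vertices visited in random order
    for v in order:
        a = P[v]
        best, best_gain = a, 0.0
        for b in neighborhood(v):              # N(v): partitions adjacent to v
            if b == a or not balance_ok(v, b): # keep only N'(v), the feasible set
                continue
            before = cost_of(P)
            P[v] = b                           # tentative move
            gain = before - cost_of(P)         # positive = cost reduction
            P[v] = a
            if gain > best_gain:
                best, best_gain = b, gain
        P[v] = best                            # greedy: only improving moves
    return P
```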
V. AGGRESSIVE MULTI-PHASE REFINEMENT

One of the potential problems with the multi-objective refinement algorithm described in Section IV is that it is limited in the extent to which it can make large-scale perturbations on the initial k-way partitioning produced by the cut-focused recursive-bisectioning algorithm. This is due to the combination of two factors. First, the greedy, non-hill-climbing nature of its refinement algorithm limits the perturbations that are explored, and second, since it is based on an FM-derived framework, it is constrained to make moves that do not violate the balancing constraints of the resulting solution. As a result (shown later in our experiments (Section VI)), it tends to produce solutions that retain the low-cut characteristics of the initial k-way solution, but it does not significantly reduce the maximum subdomain degree. Ideally, we would like a multi-objective refinement algorithm that is capable of effectively exploring the entire space of possible solutions in order to select the one that best optimizes the particular multi-objective function.

Toward this goal, we developed two multi-objective refinement algorithms that allow large-scale perturbations of the partitioning produced by the recursive bisectioning algorithm. These algorithms are described in detail in the following sections.

A. Bottom-Up Aggressive Multi-Phase Refinement

The first algorithm, referred to as bottom-up aggressive multi-phase refinement, consists of five major steps (outlined in Figure 4) and operates as follows.

Fig. 4. The various steps of the bottom-up aggressive multi-phase refinement algorithm. Flowchart: (1) compute 2^L·k subdomains by recursive bisection; (2) apply direct multi-phase refinement and collapse the subdomains into macro nodes; (3) bottom-up initialization: create pairs from the macro nodes, swap macro nodes to improve the pairs, and convert the pairs into larger macro nodes, repeating until k nodes remain; (4) restore the 2^L·k macro nodes (2^L per subdomain) and apply randomized/hill-climb pair-wise macro-node swapping; (5) restore the original hypergraph and apply direct multi-phase refinement.

In the first step, given the initial k-way partitioning, the algorithm proceeds to further subdivide each of these partitions into 2^l parts (where l is a user-specified parameter). These subdivisions are performed using a min-cut hypergraph partitioning algorithm, resulting in a high-quality fine-grain partitioning. During the second step, this 2^l·k-way partitioning is refined using the direct multi-phase refinement algorithm described in Section IV to optimize the particular multi-objective function. Each of the resulting 2^l·k partitions is then collapsed into a single node, which we will refer to as a macro node.

During the third step, a k-way partitioning of these macro nodes is computed such that each partition has exactly 2^l macro nodes. The goal of this macro-node partitioning is to obtain an initial partitioning that has a low maximum subdomain degree, and it is achieved by greedily combining macro nodes that lead to the smallest maximum subdomain degree as follows. For each pair of macro-nodes u_i and u_j (i < j), let v_{i,j} be the node obtained by merging u_i and u_j, and let deg(u_i) and deg(v_{i,j}) be the degrees of macro-node u_i and merged node v_{i,j}, respectively. For l = 1, the algorithm orders all possible macro-node pairs in non-increasing order based on the maximum degree of their constituent macro-nodes (i.e., for each pair v_{i,j} it considers max{deg(u_i), deg(u_j)}), and the macro-node pairs that have the same maximum degree are ordered in non-decreasing order of their resulting degree (i.e., for each pair v_{i,j} it considers deg(v_{i,j})). The algorithm then traverses the list in that order to identify the pairs of unmatched macro-nodes that form the initial partitioning. As a result of this traversal order, the algorithm provides the highest flexibility to the macro-nodes that have high degree and tries to combine them with the macro-nodes that will lead to pairs that have the smallest degree. However, for l > 1, since a direct extension of such an approach is not computationally feasible (the number of combinations that needs to be considered increases exponentially with l), the algorithm obtains the partitioning in a bottom-up fashion by repeatedly applying the above scheme l times. In addition, after each round of pairings, the macro-node-level partitioning is further refined by applying the pair-wise macro-node swapping algorithm described in Section V-A.1.
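For l = 1, the pairing order reduces to a sort followed by a greedy matching, as in the sketch below (`deg` and `merged_deg` are assumed helpers returning a macro-node's degree and the degree of a merged pair):

```python
def pair_macro_nodes(nodes, deg, merged_deg):
    pairs = [(i, j) for i in nodes for j in nodes if i < j]
    pairs.sort(key=lambda p: (-max(deg(p[0]), deg(p[1])),  # hardest nodes first
                              merged_deg(p[0], p[1])))     # cheapest merges first
    matched, matching = set(), []
    for i, j in pairs:
        if i not in matched and j not in matched:          # greedy matching
            matching.append((i, j))
            matched.update((i, j))
    return matching
```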
In the fourth step, the quality, in terms of the particular multi-objective function, of the resulting macro-node-level partitioning is improved using a pair-wise macro-node swapping algorithm (described in Section V-A.1). This algorithm operates at the macro-node level and selects two macro-nodes, each one from a different partition, and swaps the partitions that they belong to so as to improve the overall quality of the solution. Since, by construction, each macro node is approximately of the same size, such swaps almost always lead to feasible solutions in terms of the balance constraint. The use of such a refinement algorithm was the primary motivation behind the development of the aggressive multi-phase algorithm, as it allows us to move large portions of the hypergraph between partitions without having to either violate the balancing constraints or rely on a sequence of small vertex moves in order to achieve the same effect. Moreover, because by construction each macro-node corresponds to a good cluster (as opposed to a random collection of nodes), such swaps can indeed lead to improved quality very efficiently.

Finally, in the fifth step, the macro-node-based partitioning is used to induce a partitioning of the original hypergraph, which is then further improved using the direct multi-phase refinement algorithm described in Section IV.

1) Macro-node Partitioning Refinement: We developed two algorithms for refining a partitioning solution at the macro-node level. The differences between the two algorithms are the method used to identify the pairs of macro nodes to be swapped and the policy used in determining whether or not a particular swap will be accepted. Details on these two schemes are provided in the next two sections.

a) Randomized Pair-wise Node Swapping: In this scheme, two nodes belonging to different partitions are randomly selected, and the quality of the partitioning resulting from their swap is evaluated in terms of the particular multi-objective function. If the swap leads to a better solution, it is performed; otherwise it is not. Swaps that neither improve nor degrade the particular multi-objective function are also allowed, as they often introduce desirable perturbations. The primary motivation for this algorithm is its low computational complexity, and in practice it produces very good results. Also, when there are two nodes per subdomain, the randomized pair-wise node swapping can be done quite efficiently by pre-computing the cut and degree of all possible pairings and storing them in a 2D table. This lookup-based swapping takes less than one second to evaluate the cost of one million swaps on a 1.5 GHz workstation.
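A sketch of this scheme follows; `cost_of` is an assumed callable evaluating the multi-objective cost of the current macro-node partitioning P (a dict mapping macro-node to partition), and the 2D-table specialization mentioned above is omitted:

```python
import random

def randomized_swap(P, cost_of, trials):
    nodes = list(P)
    for _ in range(trials):
        u, v = random.sample(nodes, 2)
        if P[u] == P[v]:
            continue                        # only cross-partition swaps
        before = cost_of(P)
        P[u], P[v] = P[v], P[u]             # tentative swap
        if cost_of(P) > before:             # strictly worse: undo it
            P[u], P[v] = P[v], P[u]         # (equal-cost swaps are kept as
    return P                                #  desirable perturbations)
```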
b) Coordinated Sequence of Pair-wise Node Swaps: One of the limitations of the previous scheme is that it lacks the ability to climb out of local minima, as it does not allow any swaps that decrease the value of the objective function. To overcome this problem, we developed a heuristic refinement algorithm that can be considered an extension of the classical Kernighan-Lin algorithm [26] to k-way refinement, and that operates as follows.

The algorithm consists of a number of iterations. During each iteration it identifies and performs a sequence of macro-node swaps that improve the value of the objective function, and it terminates when no such sequence can be identified within a particular iteration. Each of these iterations is performed as follows. Let k be the number of partitions, m the number of macro-nodes, and q = m/k the number of macro-nodes per partition. Since each macro-node v in partition V_i can be swapped with any macro-node belonging to a different partition, there are a total of m(m − q) possible pairs of macro-nodes that can be swapped. For each of these swaps, the algorithm computes the improvement in the value of the objective function (i.e., the gain) achieved by performing it, and inserts all the m(m − q) possible swaps into a max-priority queue based on this gain value. Then it proceeds to repeatedly (i) extract from this queue the macro-node pair whose swap leads to the highest gain, (ii) modify the partitioning by performing the macro-node swap, (iii) record the current value of the objective function, and (iv) update the gains of the macro-node pairs in the priority queue to reflect the new partitioning. Once the priority queue becomes empty, the algorithm determines the point within this sequence of swaps that resulted in the best value of the objective function and reverts the swaps that it performed after that point. An outline of a single iteration of this hill-climb swap algorithm is presented in Algorithm 1.

Algorithm 1 Hill-climbing algorithm for identifying a sequence of pair-wise macro node swaps to reach a lower cost.

    Compute initial gain values for all possible pairs
    Insert them in a priority queue
    while pairs exist in the priority queue do
        Pop the highest gain pair
        Make the swap
        Lock the pair
        if cost is minimum then
            Record roll-back point
            Record new minimum cost
        end if
        if maximum subdomain degree changed then
            Update the gain values of all pairs remaining in the priority queue
        else
            Update the gain values of affected pairs remaining in the priority queue
        end if
    end while
    Roll back to the minimum cost point (i.e., undo all swaps after the minimum cost point in reverse order)
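A runnable sketch of one such iteration is shown below. For clarity it rescans the unlocked pairs for the best gain instead of maintaining the max-priority queue with the selective gain updates described next, but the hill-climbing behavior (keep swapping even uphill, then roll back to the best point) is the same; `cost_of` is an assumed callable.

```python
def hill_climb_iteration(P, cost_of, pairs):
    """pairs: the cross-partition macro-node pairs eligible for swapping."""
    history, best_cost, best_idx = [], cost_of(P), 0

    def gain(pair):
        u, v = pair
        P[u], P[v] = P[v], P[u]
        g = -cost_of(P)                          # higher gain = lower cost after
        P[u], P[v] = P[v], P[u]
        return g

    unlocked = list(pairs)
    while unlocked:
        u, v = max(unlocked, key=gain)           # pop the highest-gain pair
        unlocked.remove((u, v))                  # lock it for this iteration
        P[u], P[v] = P[v], P[u]                  # make the swap, even if uphill
        history.append((u, v))
        if cost_of(P) < best_cost:               # record the roll-back point
            best_cost, best_idx = cost_of(P), len(history)
    for u, v in reversed(history[best_idx:]):    # undo swaps past the minimum
        P[u], P[v] = P[v], P[u]
    return P
```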
Due to the global nature of the maximum subdomain degree cost, if a macro-node swap changes the value of the maximum subdomain degree, the gains of all the pairwise swaps that are still in the priority queue need to be recomputed. However, if a swap does not change the value of the maximum subdomain degree, then only the gains of the macro-node pairs that contain nodes adjacent to those being swapped need to be recomputed. Since only a small fraction of the swaps will end up changing the value of the maximum subdomain degree, the cost of updating the priority queue is relatively small. Despite this, since the algorithm needs to evaluate all m(m − q) possible pairs of swaps, its runtime complexity is significantly higher than that of the randomized swapping algorithm.

It is easy to see that this algorithm is quite similar in spirit to the Kernighan-Lin algorithm, with the key difference being that the priority queue stores the effect of a pairwise macro-node swap as opposed to the effect of a single vertex move. This swapping-based view allows the algorithm to operate for an arbitrary number of partitions k.

B. Top-Down Aggressive Multi-Phase Refinement

The key parameter of the aggressive refinement scheme described in Section V-A is the value of l, which controls the granularity of the macro-nodes that are used. The effectiveness of the overall refinement approach can be affected both for small as well as large values of l. Small values may lead to large macro-nodes whose swaps do not improve the quality, whereas large values may lead to small macro-nodes that require a coordinated sequence of swaps to achieve the desired perturbations. Moreover, large values of l have the additional drawback of increasing the overall runtime of the algorithm, as they require more time to obtain the initial clusters and more refinement time.

In developing the bottom-up aggressive multi-phase refinement algorithm we initially expected that its performance would continue to improve as we increased the value of l, until the point at which the size of the macro-nodes becomes so small that the macro-node-based partitioning does not provide any advantages over the unclustered hypergraph. Our experimental results (presented later in Section VI) partially verified this intuition but also showed that the point after which we obtain no improvements is actually much higher than what we had expected. In particular, our results will show that as l increases from zero to two, the bottom-up scheme can achieve progressively better results, but its performance for l ≥ 3 is actually worse than that for l = 2. In analyzing this behavior we realized that there is another parameter that affects the effectiveness of this scheme, which has to do with the bottom-up nature of the k-way partitioning that is computed in the third step of the algorithm.

Recall from Section V-A that for l > 1 the k-way macro-node partitioning is computed by repeatedly merging successively larger partitions, followed by a swapping-based refinement. For example, when l = 3, we first merge pairs of macro-nodes to obtain a 4k-way partitioning, then merge these partitions to obtain a 2k-way partitioning, and finally obtain the desired k-way partitioning by merging these 2k partitions. Moreover, between these merging operations we apply the swap-based refinement to optimize the overall quality of the 4k-, 2k-, and k-way partitionings. The problem with this approach arises from the fact that as we move from the 8k macro-nodes to a 4k-, 2k-, and k-way partitioning, we optimize the intermediate solutions so that they minimize their respective maximum subdomain degree. However, the maximum subdomain degree at the 4k-way (or 2k-way) partitioning level may not directly affect the maximum subdomain degree at the desired k-way partitioning level. In fact, due to the heuristic nature of the refinement strategies, the fact that these intermediate maximum subdomain degrees have been optimized may reduce our ability to obtain low maximum subdomain degrees at the k-way level. Thus, an inherent problem with the bottom-up scheme is the fact that it ends up spending a lot of time refining intermediate solutions that may not directly or indirectly benefit our ultimate objective, which is to obtain a k-way partitioning that minimizes the maximum subdomain degree.

Having taken cognizance of this phenomenon, we devised another aggressive multi-phase refinement algorithm that employs a top-down framework, which allows it to always optimize the objective function within the context of the k-way partitioning solution. The overall structure of the algorithm is shown in Figure 5 and shares many of the characteristics of the bottom-up algorithm. In particular, the last two steps of these algorithms are identical, and they differ only in the first three steps, of which the third step represents the key conceptual difference.

Fig. 5. The various steps of the top-down aggressive multi-phase refinement algorithm. Flowchart: (1) compute k subdomains by recursive bisection; (2) apply direct multi-phase refinement; (3) for i = 1, ..., L: restore the k subdomains (for i > 1), recursively bisect each of the k subdomains to obtain 2^i·k subdomains, collapse the subdomains into macro nodes, and refine by pair-wise macro-node swapping; (4) restore the 2^L·k macro nodes (2^L per subdomain) and apply randomized/hill-climb pair-wise macro-node swapping; (5) restore the original hypergraph and apply direct multi-phase refinement.
The top-down algorithm starts by computing a k-way partitioning that minimizes the cut using recursive bisectioning, and it further refines the solution using the direct multi-phase refinement algorithm of Section IV, which takes into consideration the multi-objective nature of the problem. During the third step it performs l levels of aggressive refinement by repeatedly creating macro-nodes and swapping them between the k partitions. Specifically, during the ith refinement level, it splits each one of the k partitions into 2^i sub-partitions (resulting in a total of 2^i·k sub-partitions), creates a macro node for each sub-partition, and optimizes the overall k-way solution by applying a swap-based macro-node refinement algorithm (Section V-A.1). Note that the initial macro-node-level k-way partitioning is obtained by inheriting the current k-way partitioning of the hypergraph. Since this approach always focuses on optimizing the solution at the k-way partitioning level, it does not suffer from the limitations of the bottom-up scheme. Moreover, as l increases, this scheme considers for swapping successively smaller macro-nodes, which allows it to perform large-scale perturbations at multiple scales (as was the case with the bottom-up scheme).

VI. EXPERIMENTAL RESULTS

We experimentally evaluated our multi-objective partitioning algorithms on the 18 hypergraphs that are part of the ISPD98 circuit partitioning benchmark suite [3] (with unit area). The characteristics of these hypergraphs are shown in Table II. For each of these circuits, we computed 4-, 8-, 16-, 32-, and 64-way partitioning solutions using the recursive bisection-based partitioning routine of hMETIS 1.5.3 [22] and the various algorithms that we developed for minimizing the maximum subdomain degree. The hMETIS solutions were obtained by using a 49-51 bisection balance constraint and hMETIS's default set of parameters. Since these balance constraints are specified at each bisection level, the final k-way partitioning may have a somewhat higher load imbalance. To ensure that the results produced by our algorithm can be easily compared against those produced by hMETIS, we used the resulting minimum and maximum partition sizes obtained by hMETIS as the balancing constraints for our multi-objective k-way refinement algorithm.

TABLE II
THE CHARACTERISTICS OF THE HYPERGRAPHS USED TO EVALUATE OUR ALGORITHM.

Benchmark   No. of vertices   No. of hyperedges
ibm01       12506             14111
ibm02       19342             19584
ibm03       22853             27401
ibm04       27220             31970
ibm05       28146             28446
ibm06       32332             34826
ibm07       45639             48117
ibm08       51023             50513
ibm09       53110             60902
ibm10       68685             75196
ibm11       70152             81454
ibm12       70439             77240
ibm13       83709             99666
ibm14       147088            152772
ibm15       161187            186608
ibm16       182980            190048
ibm17       184752            189581
ibm18       210341            201920

The quality of the solutions produced by our algorithms and those produced by hMETIS was evaluated by looking at three different quality measures: the maximum subdomain degree, the cut, and the average subdomain degree. To ensure the statistical significance of our experimental results, these measures were averaged over ten different runs for each particular set of experiments.

Furthermore, due to space constraints, our comparisons against hMETIS are presented in a summary form, which shows the relative maximum subdomain degree (RMax), relative cut (RCut), and relative average degree (RDeg) achieved by our algorithms over those achieved by hMETIS recursive bisection, averaged over the entire set of 18 benchmarks. To ensure the meaningful averaging of these ratios, we first took their log2 values, calculated their mean μ, and then used 2^μ as their average. This geometric mean of ratios ensures that ratios corresponding to comparable degradations or improvements (i.e., ratios that are less than or greater than one) are given equal importance.
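This averaging is a plain geometric mean; computing it through log2 values makes the symmetry explicit, since reciprocal degradations and improvements cancel instead of biasing the average upward as an arithmetic mean would:

```python
from math import log2

def geo_mean(ratios):
    mu = sum(log2(r) for r in ratios) / len(ratios)
    return 2 ** mu

print(geo_mean([0.8, 1.25]))   # 1.0 -- a gain and its reciprocal loss cancel
```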
IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN, VOL XX, NO. XX, 2005 10

TABLE III
D IRECT M ULTI -P HASE R EFINEMENT R ESULTS .

Prioritized Combined, = 1 Combined, = 2 Combined, = k


k RMax RCut RDeg RMax RCut RDeg RMax RCut RDeg RMax RCut RDeg
4 0.955 0.981 0.948 0.940 0.967 0.934 0.928 0.964 0.931 0.929 0.967 0.934
8 0.890 0.967 0.913 0.877 0.947 0.892 0.886 0.952 0.897 0.881 0.959 0.906
16 0.884 0.969 0.898 0.876 0.958 0.886 0.886 0.965 0.894 0.886 0.966 0.894
32 0.865 0.967 0.886 0.874 0.959 0.874 0.871 0.963 0.877 0.870 0.964 0.878
64 0.851 0.970 0.880 0.864 0.966 0.872 0.876 0.970 0.875 0.859 0.969 0.875
RMax, RCut, and RDeg are the average maximum subdomain degree, cut, and average subdomain degree, respectively of the multi-objective solution relative
to hMETIS. Numbers less than one indicate that the multi-objective algorithm produces solutions that have lower maximum subdomain degree, cut, or average
subdomain degree than those produced by hMETIS.

the average and thus there is significantly more room for superior to the randomized algorithm. For example, for l = 2
improvement. both of these measures are over 10% better than the corre-
Furthermore, the direct multi-phase refinement algorithm sponding measures for the randomized algorithm. However,
also leads to partitionings that on the average have lower cut in terms of the maximum subdomain degree (measured by
and average subdomain degree. Specifically, the cut tends to RMax), the hill-climbing algorithm provides little advantage.
improve by 1% to 4%, whereas the average subdomain degree In fact, its overall performance is slightly worse than the
improves by 5% to 13%. Finally, comparing the different randomized schemeleading to solutions whose maximum
multi-objective formulations we can see that in general, there subdomain degree is about 1% to 3% higher for l = 1 and
are very few differences between them, with both of them l = 2, respectively.
leading to comparable solutions. The mixed performance of the hill-climbing algorithm and
its inability to produce solutions that have lower maximum
B. Aggressive Multi-Phase Refinement subdomain degree suggest that this type of refinement may not
Our experimental evaluation of the aggressive multi-phase be well-suited for the step-nature of the maximum subdomain
refinement schemes (described in Section V) focused along degree objective. Since there are relatively few macro-node
two directions. First, we designed a set of experiments to swaps that affect the maximum subdomain degree, the priority
evaluate the effectiveness of the macro-node-level partitioning queue used by the hill-climbing algorithm forces it to order
refinement algorithms used by these schemes and second, the moves based on their gains with respect to the cut (as it
we performed a series of experiments that were designed is the secondary objective). Because of this, this refinement is
to evaluate the effectiveness of the bottom-up and top-down very effective in minimizing RCut and RDeg but it does not
schemes within the context of aggressive refinement. affect RMax. In fact, as the results suggest, this emphasis on
1) Evaluation of Macro-node Partitioning Refinement Algorithms: To directly evaluate the relative performance of the two refinement algorithms described in Section V-A.1, we performed a series of experiments using a simple version of the aggressive refinement schemes. Specifically, we computed a 2^l·k-way partitioning, collapsed each partition into a macro-node, and obtained an initial k-way partitioning of these macro-nodes using a random assignment. This initial partitioning was then refined using the two macro-node partitioning refinement algorithms: randomized swap and hill-climbing swap. This experiment was performed for each one of the circuits in our benchmark suite, and the overall performance achieved by the two algorithms for k = 8, 16, 32 and l = 1, 2 relative to that obtained by hMETIS's recursive bisectioning algorithm is shown in Table IV. Note that for this set of experiments, the two objectives of maximum subdomain degree and cut were combined using a priority scheme, which uses the minimization of the maximum subdomain degree as the primary objective.
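A minimal sketch of this randomized-swap refinement is given below (our illustration, not the implementation used in the paper; cost is a caller-supplied stand-in for the priority-based objective just described):

    import random

    def randomized_swap_refine(assign, cost, max_passes=10):
        """Refine a k-way partitioning of macro-nodes by randomized swaps.

        assign[i] is the part of macro-node i (initially a random
        assignment, as in the experiment above). Swapping two macro-nodes
        that lie in different parts preserves balance, since every part
        keeps the same number of macro-nodes. cost(assign) returns the
        tuple (max_subdomain_degree, cut), so tuple comparison realizes
        the priority scheme: maximum subdomain degree first, cut second.
        """
        n = len(assign)
        best = cost(assign)
        for _ in range(max_passes):
            improved = False
            for _ in range(n):  # one pass: n random swap trials
                i, j = random.sample(range(n), 2)
                if assign[i] == assign[j]:
                    continue
                assign[i], assign[j] = assign[j], assign[i]  # tentative swap
                trial = cost(assign)
                if trial < best:
                    best, improved = trial, True  # keep the improving swap
                else:
                    assign[i], assign[j] = assign[j], assign[i]  # undo
            if not improved:
                break
        return assign, best

In a real implementation the cost would, of course, be maintained incrementally rather than recomputed from scratch after every tentative swap.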
From these results we can see that, contrary to our initial expectations, the hill-climbing algorithm does not outperform the randomized-swapping algorithm on all three performance metrics. Specifically, in terms of the cut (RCut) and the average degree (RDeg), the hill-climbing algorithm is superior to the randomized algorithm. For example, for l = 2 both of these measures are over 10% better than the corresponding measures for the randomized algorithm. However, in terms of the maximum subdomain degree (measured by RMax), the hill-climbing algorithm provides little advantage. In fact, its overall performance is slightly worse than that of the randomized scheme, leading to solutions whose maximum subdomain degree is about 1% to 3% higher for l = 1 and l = 2, respectively.

The mixed performance of the hill-climbing algorithm and its inability to produce solutions that have a lower maximum subdomain degree suggest that this type of refinement may not be well-suited for the step-nature of the maximum subdomain degree objective. Since there are relatively few macro-node swaps that affect the maximum subdomain degree, the priority queue used by the hill-climbing algorithm forces it to order the moves based on their gains with respect to the cut (as it is the secondary objective). Because of this, the refinement is very effective in minimizing RCut and RDeg, but it does not affect RMax. In fact, as the results suggest, this emphasis on the cut may affect the ability of subsequent swaps to reduce the maximum subdomain degree. To see if this is indeed the case, we performed another sequence of experiments in which we modified the randomized algorithm so that it performed the moves using the same priority-queue-based approach used by the hill-climbing scheme, and terminated each inner iteration as soon as the priority queue contained negative-gain vertices (i.e., it did not perform any hill-climbing). Our experiments (not presented here) showed that this scheme produced results whose RMax was worse than that of both the randomized and hill-climbing approaches, but its RCut and RDeg were between those obtained by the randomized and the hill-climbing schemes, verifying our hypothesis that due to the somewhat conflicting nature of the two objectives, a greedy ordering scheme does not necessarily lead to better results.
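The ordering effect described above can be made concrete with a small sketch (ours; the gain functions are hypothetical stand-ins). Because the gain in maximum subdomain degree is zero for most candidate swaps, the secondary cut gain effectively dictates the order in which the moves are made:

    import heapq

    def order_swaps(swaps, maxdeg_gain, cut_gain):
        """Order candidate macro-node swaps the way the hill-climbing
        refinement does: primarily by the gain in maximum subdomain
        degree, secondarily by the gain in cut. heapq is a min-heap, so
        the gains are negated to pop the most beneficial move first."""
        heap = [(-maxdeg_gain(s), -cut_gain(s), i) for i, s in enumerate(swaps)]
        heapq.heapify(heap)
        ordered = []
        while heap:
            _, _, i = heapq.heappop(heap)
            ordered.append(swaps[i])
        return ordered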
The columns of Table IV labeled RTime show the amount of time required by the two refinement algorithms. As expected, the randomized algorithm is faster than the hill-climbing algorithm, and its relative runtime advantage improves as the number of macro-nodes increases. Due to the mixed performance of the hill-climbing algorithm and its considerably higher computational requirements for large values of l, our subsequent experiments used only the randomized refinement algorithm.

TABLE IV
ANALYSIS OF RANDOMIZED VS. HILL-CLIMB SWAPPING.

Randomized Swap Hill-climb Swap


l=1 l=2 l=1 l=2
k RMax RCut RDeg RTime RMax RCut RDeg RTime RMax RCut RDeg RTime RMax RCut RDeg RTime
8 0.953 1.001 1.005 1.979 0.999 1.113 1.137 2.201 0.956 0.995 0.993 1.710 0.970 1.032 1.038 2.830
16 0.930 1.018 1.027 1.931 0.917 1.127 1.174 2.321 0.948 1.016 1.021 2.273 0.943 1.031 1.039 15.577
32 0.905 1.017 1.036 1.883 0.845 1.112 1.168 2.323 0.921 1.007 1.014 12.900 0.913 1.022 1.035 549.267

2) Evaluation of Bottom-up and Top-down Schemes: Table V shows the performance achieved by the bottom-up and top-down aggressive multi-phase refinement schemes for l = 1, ..., 3 and k = 4, 8, ..., 64 relative to that obtained by hMETIS's recursive bisectioning algorithm. Specifically, for each value of l and k, this table shows four sets of results. The first two sets (one for the bottom-up and one for the top-down scheme) were obtained using the priority-based multi-objective formulation, whereas the remaining two sets used the combining scheme. Due to space constraints, we only present results in which the two objectives were combined using α = k.
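The two formulations being compared can be summarized by the keys under which candidate solutions are ranked. The following sketch is ours; in particular, the weighted-sum form of the combined key is only an assumed illustration of how α trades the two objectives off, not the paper's exact formula:

    def prioritized_key(max_deg, cut):
        # Priority scheme: minimize the maximum subdomain degree first
        # and use the cut only to break ties (lexicographic order).
        return (max_deg, cut)

    def combined_key(max_deg, cut, alpha):
        # Combining scheme (assumed weighted-sum form): alpha controls
        # how strongly the maximum subdomain degree is weighted against
        # the cut; the results in Table V use alpha = k.
        return alpha * max_deg + cut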
From these results, we can observe a number of general trends about the performance of the aggressive multi-phase refinement schemes and their sensitivities to the various parameters. In particular, as l increases from one to two (i.e., each partition is further subdivided into two or four parts), the effectiveness of the multi-objective partitioning algorithm in producing solutions that have a lower maximum subdomain degree than the solutions obtained by hMETIS improves. In general, for l = 1, the multi-objective algorithm reduces the maximum subdomain degree by 7% to 28%, whereas for l = 2 the corresponding improvements range from 6% to 35%. However, these improvements lead to solutions in which the cut and the average subdomain degree obtained for l = 2 are somewhat higher than those obtained for l = 1. For example, for l = 1, the multi-objective algorithm is capable of improving the cut over hMETIS by 0% to 4%, whereas for l = 2 it leads to solutions whose cut is up to 5% worse than those obtained by hMETIS. Note that these observations are to a large extent independent of the particular multi-objective formulation or the method used to obtain the initial macro-node-level partitioning.
For the reasons discussed in Section V-B, the trend of successive improvements in the maximum subdomain degree no longer holds for the bottom-up scheme when l = 3. In particular, the improvements in the maximum subdomain degree relative to hMETIS are in the range of 0% to 35%, which is somewhat lower than the corresponding improvements for l = 2. On the other hand, the top-down scheme is able to further reduce the maximum subdomain degree when l = 3, leading to results that are 10% to 36% lower than the corresponding results of hMETIS. Note that this trend continues for higher values of l as well (due to space constraints these results are not reported here). These results suggest that the top-down scheme is better than the bottom-up scheme for large values of l. However, a closer inspection of the results reveals that for l = 1 and l = 2 and large values of k, the bottom-up scheme actually leads to solutions that are somewhat better than those obtained by the top-down scheme. We believe that this is because, for small values of l, the macro-node pairing scheme used by the bottom-up scheme to derive the macro-node-level k-way partitioning (which takes into account all possible pairings of macro-nodes) is inherently more powerful than the macro-node-level refinement used by the top-down scheme. This becomes more evident for large values of k, for which there is considerably more room for alternate pairings, resulting in relatively better results.
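To make the pairing argument concrete, the following is a greedy simplification of such a macro-node pairing step (our sketch; the actual bottom-up scheme evaluates the possible pairings more exhaustively):

    def greedy_pairing(conn):
        """Pair 2k macro-nodes into k parts, merging the most strongly
        connected available pair first. conn[i][j] holds the total weight
        of the hyperedges shared by macro-nodes i and j."""
        n = len(conn)  # n = 2k macro-nodes
        candidates = sorted(
            ((conn[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
            reverse=True)
        pairs, used = [], set()
        for w, i, j in candidates:
            if i not in used and j not in used:
                pairs.append((i, j))
                used.update((i, j))
        return pairs

For large k the number of alternate pairings grows quickly, which is consistent with the relative advantage of the bottom-up scheme observed above.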
C. Comparison of Direct and Aggressive Multi-phase Refinement Schemes

Comparing the results obtained by the aggressive multi-phase refinement with the corresponding results obtained by the direct multi-phase refinement algorithm (Tables V and III), we can see that in terms of the maximum subdomain degree the aggressive scheme leads to substantially better solutions than those obtained by the direct scheme, whereas in terms of the cut and the average subdomain degree the direct scheme is superior. These results are in agreement with the design principles behind these two multi-phase refinement schemes for the multi-objective optimization problem at hand, and illustrate that the former is capable of making relatively large perturbations on the initial partitioning obtained by recursive bisectioning, as long as these perturbations improve the multi-objective function. In general, the aggressive multi-phase refinement scheme with l = 1 dominates the direct scheme, as it leads to better improvements in terms of the maximum subdomain degree and still improves over hMETIS in terms of cut and average degree. However, if the goal is to achieve the highest reduction in the maximum subdomain degree, then the aggressive scheme with l = 2 should be the preferred choice, as it achieves that reduction with relatively little degradation of the cut.

D. Runtime Complexity

Table VI shows the amount of time required by the various multi-objective partitioning algorithms using either direct or aggressive multi-phase refinement. For each value of k and each multi-objective algorithm, this table shows the total amount of time that was required to partition all 18 benchmarks, relative to the amount of time required by hMETIS to compute the corresponding partitionings. From these results we can see that the multi-objective algorithm that uses the direct multi-phase refinement is the least computationally expensive, requiring around 50% more time than hMETIS.

TABLE V
AGGRESSIVE MULTI-PHASE REFINEMENT RESULTS.

l=1
Prioritized Combined, α = k
Bottom-Up Top-Down Bottom-Up Top-Down
k RMax RCut RDeg RMax RCut RDeg RMax RCut RDeg RMax RCut RDeg
4 0.927 0.990 0.958 0.911 0.956 0.929 0.904 0.972 0.941 0.897 0.957 0.927
8 0.838 0.995 0.945 0.849 0.960 0.918 0.834 0.992 0.943 0.830 0.968 0.921
16 0.787 1.005 0.942 0.811 0.980 0.928 0.795 1.000 0.935 0.812 0.991 0.934
32 0.754 0.993 0.923 0.762 0.984 0.913 0.758 0.991 0.917 0.795 0.989 0.921
64 0.724 0.996 0.916 0.738 0.988 0.910 0.721 0.993 0.905 0.749 0.992 0.911
l=2
Prioritized Combined, α = k
Bottom-Up Top-Down Bottom-Up Top-Down
k RMax RCut RDeg RMax RCut RDeg RMax RCut RDeg RMax RCut RDeg
4 0.938 1.021 0.991 0.901 0.943 0.917 0.905 0.992 0.963 0.883 0.956 0.924
8 0.825 1.046 1.004 0.822 0.964 0.922 0.814 1.041 1.001 0.806 0.974 0.926
16 0.749 1.049 1.008 0.761 0.997 0.943 0.751 1.048 1.003 0.761 1.013 0.952
32 0.693 1.041 0.991 0.704 1.000 0.936 0.689 1.033 0.976 0.728 1.017 0.945
64 0.654 1.040 0.983 0.664 1.007 0.934 0.652 1.041 0.974 0.704 1.018 0.937
l=3
Prioritized Combined, α = k
Bottom-Up Top-Down Bottom-Up Top-Down
k RMax RCut RDeg RMax RCut RDeg RMax RCut RDeg RMax RCut RDeg
4 1.007 1.121 1.091 0.899 0.953 0.927 0.950 1.058 1.029 0.877 0.950 0.919
8 0.848 1.119 1.088 0.815 0.974 0.931 0.842 1.109 1.073 0.796 0.982 0.934
16 0.759 1.101 1.070 0.748 1.000 0.945 0.754 1.077 1.034 0.759 1.022 0.966
32 0.697 1.095 1.059 0.682 1.006 0.943 0.700 1.064 1.010 0.722 1.023 0.954
64 0.701 1.100 1.052 0.645 1.012 0.941 0.663 1.066 1.006 0.683 1.024 0.945
RMax, RCut, and RDeg are the average maximum subdomain degree, cut, and average subdomain degree, respectively, of the multi-objective solution relative to hMETIS. Numbers less than one indicate that the multi-objective algorithm produces solutions that have a lower maximum subdomain degree, cut, or average subdomain degree than those produced by hMETIS.

TABLE VI
THE AMOUNT OF TIME REQUIRED BY THE MULTI-OBJECTIVE ALGORITHMS RELATIVE TO THAT REQUIRED BY hMETIS.

        Direct    Aggressive Schemes
        Scheme    Bottom-Up               Top-Down
k                 l=1    l=2    l=3       l=1    l=2    l=3
4       1.431     2.081  2.794  3.809     2.139  3.374  5.785
8       1.399     2.151  2.990  3.924     2.505  3.520  5.561
16      1.397     2.029  3.018  3.584     2.355  3.462  5.418
32      1.450     2.018  2.763  3.599     2.548  4.078  5.627
64      1.535     2.060  3.067  4.522     2.979  4.025  6.103

On the other hand, the time required by the aggressive multi-phase refinement schemes is somewhat higher and increases with the value of l. However, even for these schemes, the overall computational requirements are relatively small. For instance, in the case of the bottom-up scheme, for l = 1 and l = 2 (the cases in which it led to the best results) it only requires two and three times more time than hMETIS, respectively; and in the case of the top-down scheme its runtime is two to six times higher than that of hMETIS as l increases from one to three.

VII. CONCLUSIONS

In this paper we described a family of multi-objective hypergraph partitioning algorithms for computing k-way partitionings that simultaneously minimize the cut and the maximum subdomain degree of the resulting partitions. Our experimental evaluation showed that these algorithms are quite effective in optimizing these two objectives with relatively low computational requirements. The key factor contributing to the success of these algorithms was the idea of focusing on the maximum subdomain degree objective once a good solution with respect to the cut has been identified. We believe that such a framework can be applied to a number of other multi-objective problems involving objectives that are reasonably well-correlated with each other.

The multi-objective algorithms presented here can be improved further in a number of directions. In particular, our results showed that the aggressive multi-phase refinement approach, especially when deployed in a top-down fashion, is very promising in reducing the maximum subdomain degree. Within this framework, our experiments revealed that due to the step-nature of the maximum subdomain degree objective, the hill-climbing macro-node refinement algorithm is not significantly more effective than the randomized swapping algorithm in reducing the maximum subdomain degree.

Developing computationally scalable refinement algorithms that can successfully climb out of local minima for this type of objective is still an open research problem, whose solution can lead to even better results both for the partitioning problem addressed in this paper and for other objective functions that share similar characteristics. Also, our work so far has focused on producing multi-objective solutions which satisfy the same balancing constraints as those resulting from the initial recursive-bisectioning-based solution. However, additional improvements can be obtained by relaxing the lower-bound constraint. Our preliminary results with such an approach appear promising.

ACKNOWLEDGMENT

This work was supported in part by NSF CCR-9972519, EIA-9986042, ACI-9982274, ACI-0133464, and ACI-0312828; by the Digital Technology Center at the University of Minnesota; and by the Army High Performance Computing Research Center (AHPCRC) under the auspices of the Department of the Army, Army Research Laboratory (ARL), under Cooperative Agreement number DAAD19-01-2-0014. The content does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Access to research and computing facilities was provided by the Digital Technology Center and the Minnesota Supercomputing Institute.
REFERENCES

[1] C. Ababei, N. Selvakkumaran, K. Bazargan, and G. Karypis. Multi-objective circuit partitioning for cutsize and path-based delay minimization. In Proceedings of ICCAD, 2002. Also available on WWW at URL http://www.cs.umn.edu/karypis.
[2] C. Alpert and A. Kahng. A hybrid multilevel/genetic approach for circuit partitioning. In Proceedings of the Fifth ACM/SIGDA Physical Design Workshop, pages 100-105, 1996.
[3] C. J. Alpert. The ISPD98 circuit benchmark suite. In Proc. of the Intl. Symposium of Physical Design, pages 80-85, 1998.
[4] C. J. Alpert, J. H. Huang, and A. B. Kahng. Multilevel circuit partitioning. In Proc. of the 34th ACM/IEEE Design Automation Conference, 1997.
[5] C. J. Alpert and A. B. Kahng. Recent directions in netlist partitioning. Integration, the VLSI Journal, 19(1-2):1-81, 1995.
[6] J. Babb, R. Tessier, and A. Agarwal. Virtual wires: Overcoming pin limitations in FPGA-based logic emulators. In IEEE Workshop on FPGAs for Custom Computing Machines, pages 142-151. IEEE Computer Society Press, 1993.
[7] S. T. Barnard and H. D. Simon. A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. In Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, pages 711-718, 1993.
[8] T. Bui and C. Jones. A heuristic for reducing fill in sparse matrix factorization. In 6th SIAM Conf. Parallel Processing for Scientific Computing, pages 445-452, 1993.
[9] A. E. Caldwell, A. B. Kahng, and I. L. Markov. Improved algorithms for hypergraph bipartitioning. In Asia and South Pacific Design Automation Conference, pages 661-666, 2000.
[10] J. Cong and S. K. Lim. Multiway partitioning with pairwise movement. In Proceedings of ICCAD, pages 512-516, 1998.
[11] J. Cong and M. L. Smith. A parallel bottom-up clustering algorithm with applications to circuit partitioning in VLSI design. In Proc. ACM/IEEE Design Automation Conference, pages 755-760, 1993.
[12] R. Cooley, B. Mobasher, and J. Srivastava. Web mining: Information and pattern discovery on the world wide web. In International Conference on Tools with Artificial Intelligence, pages 558-567, Newport Beach, 1997. IEEE.
[13] C. M. Fiduccia and R. M. Mattheyses. A linear time heuristic for improving network partitions. In Proc. 19th IEEE Design Automation Conference, pages 175-181, 1982.
[14] S. Hauck and G. Borriello. An evaluation of bipartitioning techniques. In Proc. Chapel Hill Conference on Advanced Research in VLSI, 1995.
[15] B. Hendrickson and R. Leland. A multilevel algorithm for partitioning graphs. Technical Report SAND93-1301, Sandia National Laboratories, 1993.
[16] B. Hendrickson, R. Leland, and R. V. Driessche. Enhancing data locality by using terminal propagation. In Proceedings of the 29th Hawaii International Conference on System Science, 1996.
[17] B. Hu and M. Marek-Sadowska. Congestion minimization during placement without estimation. In Proceedings of ICCAD, pages 737-745, Nov. 2002.
[18] G. Karypis. Multilevel hypergraph partitioning. In J. Cong and J. Shinnerl, editors, Multilevel Optimization Methods for VLSI, chapter 6. Kluwer Academic Publishers, Boston, MA, 2002.
[19] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: Application in VLSI domain. IEEE Transactions on VLSI Systems, 20(1), 1999. A short version appears in the proceedings of DAC 1997.
[20] G. Karypis, E. Han, and V. Kumar. Chameleon: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 32(8):68-75, 1999.
[21] G. Karypis and V. Kumar. Analysis of multilevel graph partitioning. In Proceedings of Supercomputing, 1995. Also available on WWW at URL http://www.cs.umn.edu/karypis.
[22] G. Karypis and V. Kumar. hMETIS 1.5: A hypergraph partitioning package. Technical report, Department of Computer Science, University of Minnesota, 1998. Available on the WWW at URL http://www.cs.umn.edu/metis.
[23] G. Karypis and V. Kumar. METIS 4.0: Unstructured graph partitioning and sparse matrix ordering system. Technical report, Department of Computer Science, University of Minnesota, 1998. Available on the WWW at URL http://www.cs.umn.edu/metis.
[24] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1), 1999. Also available on WWW at URL http://www.cs.umn.edu/karypis. A short version appears in Intl. Conf. on Parallel Processing 1995.
[25] G. Karypis and V. Kumar. Multilevel k-way hypergraph partitioning. VLSI Design, 2000.
[26] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(2):291-307, 1970.
[27] P. Fishburn. Decision and Value Theory. J. Wiley & Sons, New York, 1964.
[28] A. Pothen, H. D. Simon, and K.-P. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal of Matrix Analysis and Applications, 11(3):430-452, 1990.
[29] R. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. J. Wiley & Sons, New York, 1976.
[30] L. Sanchis. Multiple-way network partitioning. IEEE Trans. on Computers, 38(1):62-81, 1989.
[31] K. Schloegel, G. Karypis, and V. Kumar. A new algorithm for multi-objective graph partitioning. In Proceedings of EuroPar '99, pages 322-331, 1999.
[32] S. Shekhar and D. R. Liu. Partitioning similarity graphs: A framework for declustering problems. Information Systems Journal, 21(4), 1996.
[33] H. D. Simon and S.-H. Teng. How good is recursive bisection? Technical Report RNR-93-012, NAS Systems Division, NASA, Moffett Field, CA, 1993.
[34] M. Wang, S. K. Lim, J. Cong, and M. Sarrafzadeh. Multi-way partitioning using bi-partition heuristics. In Proceedings of ASPDAC, pages 441-446. IEEE, January 2000.
[35] S. Wichlund and E. J. Aas. On multilevel circuit partitioning. In Intl. Conference on Computer Aided Design, 1998.
[36] P. Yu. Multiple-Criteria Decision Making: Concepts, Techniques, and Extensions. Plenum Press, New York, 1985.
[37] H. Zha, X. He, C. Ding, H. Simon, and M. Gu. Bipartite graph partitioning and data clustering. In CIKM, 2001.
[38] K. Zhong and S. Dutt. Algorithms for simultaneous satisfaction of multiple constraints and objective optimization in a placement flow with application to congestion control. In Proceedings of DAC, pages 854-859, 2002.

Navaratnasothie Selvakkumaran received a Ph.D. from the Department of Computer Science at the University of Minnesota in 2005, and is currently working at Xilinx Corporation. His interests are in the areas of VLSI physical design automation, graph partitioning, and optimization algorithms.

George Karypis received his Ph.D. degree in computer science at the University of Minnesota and is currently an associate professor in the Department of Computer Science and Engineering at the University of Minnesota. His research interests span the areas of parallel algorithm design, data mining, bioinformatics, information retrieval, applications of parallel processing in scientific computing and optimization, sparse matrix computations, parallel preconditioners, and parallel programming languages and libraries. His research has resulted in the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), parallel Cholesky factorization (PSPASES), collaborative filtering-based recommendation algorithms (SUGGEST), clustering high-dimensional datasets (CLUTO), and finding frequent patterns in diverse datasets (PAFI). He has coauthored over ninety journal and conference papers on these topics and a book titled Introduction to Parallel Computing (Addison-Wesley, 2003, 2nd edition). In addition, he serves on the program committees of many conferences and workshops on these topics and is an associate editor of the IEEE Transactions on Parallel and Distributed Systems.
