Genetic Algorithms For Graph Partitioning and Incremental Graph Partitioning
Harpal Maini, Kishan Mehrotra, Chilukuri Mohan, Sanjay Ranka
School of Computer and Information Science, 4-116 CST, Syracuse University, Syracuse, NY 13244-4100
email: hsmaini/kishan/mohan/[email protected]
Abstract
Partitioning graphs into equally large groups of nodes, while minimizing the number of edges between different groups, is an extremely important problem in parallel computing. This paper presents genetic algorithms for suboptimal graph partitioning, with new crossover operators (KNUX, DKNUX) that lead to orders of magnitude improvement over traditional genetic operators in solution quality and speed. Our method can improve on good solutions previously obtained by other algorithms or graph theoretic heuristics, minimizing either the total communication cost or the worst case cost of communication for a single processor. We also extend our algorithm to Incremental Graph Partitioning problems, in which the graph structure or system properties change with time.
Introduction
Graph partitioning is the task of dividing the nodes of a graph into groups called partitions, in such a way that each partition has roughly the same number of nodes, while minimizing the cut-size, i.e., the number of edges that connect nodes in different partitions. This problem has important applications in parallel computing. For instance, efficiently parallelizing many scientific and engineering applications requires partitioning data or tasks among processors, such that the computational load on each processor is roughly the same, while inter-processor communication is minimized.
'Partially supported by NSF grant CCR-9110812 and DARPA contract #DABY63-91-C-0028. The contents of this paper do not necessarily reflect the position or policy of the United States government, and no official endorsement should be inferred.
Obtaining exact solutions for graph partitioning is computationally intractable, and several suboptimal methods have been suggested for finding good solutions to the graph partitioning problem. Important heuristics include recursive coordinate bisection, recursive graph bisection, recursive spectral bisection, mincut based methods, clustering techniques, geometry-based mapping, block-based spatial decomposition, and scattered decomposition [3, 11, 12, 15].

Genetic algorithms (GAs) are stochastic state-space search techniques modeled on natural evolutionary mechanisms [4]. The population, a set of individuals (potential solutions to the optimization problem), steadily changes with time due to the application of operators such as crossover and mutation. A selection process determines which individuals (from among parents and offspring) remain in the next generation. Genetic algorithms have been used in the past to find good suboptimal solutions to the graph partitioning problem [1, 5, 6, 8]. We present genetic algorithms for graph partitioning, using new crossover operators that utilize information available from the history of genetic search. Our work is characterized by the following features:
1. Use of prior information to improve solutions.
2. Efficient partitioning of graphs to which incremental ...
3. ... algorithm model.
4. Refinement of partitions obtained by other methods.
5. Optimization of the worst case communication cost, a non-differentiable function.
We have obtained excellent results due to newly developed genetic recombination operators (KNUX and DKNUX) that exploit domain-specific knowledge. These give improved solutions and faster convergence rates when compared with the traditional crossover operators. Exact comparisons with the genetic algorithms of other researchers are not possible because benchmark problems and results are unavailable. However, our experiments with the traditional crossover operators used by some of these researchers gave results of lower quality than those obtained with the operators presented in this paper. The results achieved by our methods are better than or comparable to those of the best known methods for graph partitioning, for graphs with a few hundred nodes. The quality of solutions obtained using DKNUX is competitive with recursive spectral bisection as a graph partitioning strategy, especially for incremental graph partitioning. However, genetic algorithms do require much more execution time than greedy algorithms, and are recommended in applications where the quality of solution is important enough to warrant the extra computational effort. Fortunately, GAs are readily parallelizable, with near-linear speedups. A graph contraction step should precede the partitioning of very large graphs using GAs. The rest of this section introduces notation. Section 2 describes how genetic algorithms are applied to the graph partitioning problem. Experimental results follow.
Authorized licensed use limited to: ULAKBIM UASL - Bahcesehir University. Downloaded on March 9, 2009 at 16:44 from IEEE Xplore. Restrictions apply.
We would like to make an assignment such that the time $W(q) + \beta C(q)$ spent by every node is minimized, where $\beta$ represents the cost of unit computation/cost of unit communication on a machine. This objective is achieved by minimizing either

$$\sum_q W(q) + \beta \sum_q C(q),$$

which focuses on the total communication cost, or

$$\max_q W(q) + \beta \max_q C(q),$$

which focuses on the communication cost for the worst partition. For domain decomposition methods, optimizing the latter function is more desirable. The former has been traditionally used because most methods require that the function being optimized is differentiable. Our methods work with either formulation.
1.1 Notation
Consider a graph $G = (V, E)$, where $V$ represents a set of vertices and $E$ represents a set of undirected edges; the number of vertices is $n = |V|$ and the number of edges is $m = |E|$. The graph partitioning problem consists of finding an assignment scheme $M : V \to P$ that maps vertices to partitions. We denote by $B(q)$ the set of vertices assigned to a partition $q$, i.e., $B(q) = \{v \in V : M(v) = q\}$. For graphs representing the computational structure of a physical domain, each vertex $v_i \in V$, $1 \le i \le n$, corresponds to a physical coordinate in a $d$-dimensional space $(x_{i1}, x_{i2}, \ldots, x_{id})$, and each edge is a pair $(v_{i_1}, v_{i_2})$. For such graphs, these edges connect physically proximate vertices. The weight $w_i$ corresponds to the computation cost (or weight) of the vertex $v_i$. The cost of an edge, $w_e(v_1, v_2)$, is given by the amount of interaction between vertices $v_1$ and $v_2$. Thus the weight of every partition can be defined as
This section describes the representation used to solve the graph partitioning problem, the functions being optimized, the genetic operators used, and various methods of improving the performance of the GA.
2.1 Representation
For graph partitioning, we select a vector representation for each individual (candidate solution), in which the ith element of an individual is j iff the ith node of the graph is allocated to the partition labelled j. For instance, the string 11100011 represents the mapping that assigns nodes 1,2,3,7,8 to partition (processor) 1 and nodes 4,5,6 to partition (processor) 0.
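The encoding above can be illustrated with a minimal Python sketch. The helper name `decode` is hypothetical, and nodes are numbered from 1 only to match the example string in the text.

```python
from collections import defaultdict

def decode(individual):
    """Map a partition string to {partition label: [1-based node numbers]}."""
    groups = defaultdict(list)
    for node, label in enumerate(individual, start=1):
        # The i-th symbol is the partition to which node i is allocated.
        groups[int(label)].append(node)
    return dict(groups)

parts = decode("11100011")
print(parts)
```

With the string from the text, nodes 1, 2, 3, 7, 8 land in partition 1 and nodes 4, 5, 6 in partition 0.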
2.2 Fitness function
$$W(q) = \sum_{v_i \in B(q)} w_i.$$

The cost of all the outgoing edges from a partition represents the total communication cost of that partition and is given by

$$C(q) = \sum_{v_1 \in B(q),\; v_2 \notin B(q)} w_e(v_1, v_2).$$
Fitness is a numerical quantity evaluated using the implied load imbalance and communication costs. If the graph is one in which the $i$th node is adjacent to the $(i+1)$st node for each $i$, then 11100011 would be less fit than 11100001 (which is a more balanced partition), but more fit than 10101011 (which has 6 inter-partition edges). We use the following two fitness functions, approximating $W(q)$ by $(|B(q)| - n/k)^2$, where $k$ is the number of partitions.

Fitness Function 1: $\sum_q (|B(q)| - n/k)^2 + \beta \sum_q C(q)$
important to obtain a good, fast heuristic estimate of a solution. DKNUX utilizes information inherent in the history of the genetic search, and continually updates the estimate I to be the current best solution, using this to build the bias vector.
2.5
Fitness Function 2: $\sum_q (|B(q)| - n/k)^2 + \beta \max_q C(q)$
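As a rough illustration of the two fitness functions, the following Python sketch evaluates them on the chain graph used in the example above. It assumes unit node and edge weights, 0-based node indices, and $\beta = 1$; the function names are hypothetical, and the values are treated as costs (smaller means fitter).

```python
def sizes_and_cuts(individual, edges, k):
    """Return (|B(q)| for each q, C(q) for each q) with unit weights."""
    sizes = [0] * k
    cuts = [0] * k
    for label in individual:
        sizes[label] += 1
    for u, v in edges:
        if individual[u] != individual[v]:
            # A cut edge leaves both of the partitions it touches.
            cuts[individual[u]] += 1
            cuts[individual[v]] += 1
    return sizes, cuts

def fitness1(individual, edges, k, beta=1.0):
    """Load imbalance plus beta times the total communication cost."""
    sizes, cuts = sizes_and_cuts(individual, edges, k)
    n = len(individual)
    return sum((s - n / k) ** 2 for s in sizes) + beta * sum(cuts)

def fitness2(individual, edges, k, beta=1.0):
    """Load imbalance plus beta times the worst-partition communication cost."""
    sizes, cuts = sizes_and_cuts(individual, edges, k)
    n = len(individual)
    return sum((s - n / k) ** 2 for s in sizes) + beta * max(cuts)

chain = [(i, i + 1) for i in range(7)]  # node i adjacent to node i + 1
for s in ("11100001", "11100011", "10101011"):
    ind = [int(c) for c in s]
    print(s, fitness1(ind, chain, 2), fitness2(ind, chain, 2))
```

Under these assumptions the three strings are ranked in the order stated in the text: 11100001 has the lowest cost and 10101011 the highest.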
2.3 Crossover
One-point crossover [4] works by selecting a site in chromosomes $\alpha\beta$ and $\gamma\delta$ to produce $\alpha\delta$ and $\gamma\beta$. A popular generalization is 2-point crossover, in which the parents $\alpha\beta\gamma$ and $\delta\epsilon\phi$ produce offspring $\alpha\epsilon\gamma$ and $\delta\beta\phi$. This has been further generalized to $k$-point crossover. In uniform crossover (UX) [14], the $i$th component of an offspring is chosen to be the same as that of one of the two parents, with equal probability. UX ignores the fact that one parent may have much better genetic material than another, or that one region of the search space is already known to produce individuals of higher fitness than other regions. UX can be described in terms of a bit-vector mask, each bit of which determines the parent from which an offspring inherits a value for a particular bit-position.

Our new Knowledge-based Non-Uniform Crossover operator (KNUX) generalizes this idea, using a bias probability vector $p = (p_1, \ldots, p_n)$, where each $p_i$ is a real number in $[0, 1]$. The value of each bias probability $p_i$ depends on $i$, the relative fitness of the parent strings, and on problem-specific knowledge. Given $p$ and the two parents, $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$, the offspring $c = (c_1, \ldots, c_n)$ is obtained such that if $a_i = b_i$, then $c_i = a_i$, else the probability that $c_i = a_i$ is $p_i$.

For graph partitioning, an initial candidate solution $I$ is first generated. Let $\nu(i)$ be the set of neighbors of node $i$ in the graph under consideration. For any candidate solution $X$, let $\#(i, X, I)$ be the number of nodes in $\nu(i)$ that are allocated by $I$ to partition $X_i$. If $a$ and $b$ are the two parents, then we define
pi =
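The KNUX rule can be sketched in Python as follows. The formula for $p_i$ is not recoverable from the text here, so this sketch ASSUMES the normalized ratio $p_i = \#(i,a,I) / (\#(i,a,I) + \#(i,b,I))$, with $p_i = 0.5$ when both counts are zero; the names `count_agree` and `knux` are likewise hypothetical.

```python
import random

def count_agree(i, x, reference, neighbors):
    """#(i, X, I): neighbors of node i that I allocates to partition X[i]."""
    return sum(1 for j in neighbors[i] if reference[j] == x[i])

def knux(a, b, reference, neighbors, rng=random):
    """Biased crossover: where parents agree, copy; otherwise pick a[i] with p_i."""
    child = []
    for i in range(len(a)):
        if a[i] == b[i]:
            child.append(a[i])
            continue
        na = count_agree(i, a, reference, neighbors)
        nb = count_agree(i, b, reference, neighbors)
        # ASSUMED bias formula (not taken from the paper): normalized ratio.
        p = na / (na + nb) if na + nb > 0 else 0.5
        child.append(a[i] if rng.random() < p else b[i])
    return child

# Two disjoint edges 0-1 and 2-3; the estimate I groups each edge together.
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}
estimate = [0, 0, 1, 1]
print(knux([0, 0, 1, 1], [1, 1, 0, 0], estimate, neighbors))
```

Setting every $p_i$ to 0.5 recovers uniform crossover, which is the sense in which KNUX generalizes UX. A DKNUX-style variant would simply pass the current best individual as `reference` on each generation.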
We use a coarse-grained, distributed-population genetic algorithm (DPGA), where individuals are distributed into various subpopulations which may be physically located on different processors configured in some architecture (e.g., mesh). Crossovers are restricted to occur between members of the same subpopulation. Each subpopulation periodically communicates copies of its best individuals to its neighboring subpopulations (situated on neighboring processors in the parallel architecture); this is how genetic information is exchanged.
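One migration step of such a distributed-population scheme might look like the sketch below. It assumes a hypercube of subpopulations (as used in the experiments), treats fitness as a cost to minimize, and ASSUMES that incoming migrants replace the worst local individuals; the replacement policy and the function names are illustrative choices, not details given in the text.

```python
def hypercube_neighbors(rank, dim):
    """Ranks adjacent to `rank` in a dim-dimensional hypercube (flip one bit)."""
    return [rank ^ (1 << d) for d in range(dim)]

def migrate(subpops, cost, dim, n_migrants=1):
    """Each subpopulation sends copies of its best individuals to its neighbors."""
    best = [sorted(pop, key=cost)[:n_migrants] for pop in subpops]
    updated = []
    for rank, pop in enumerate(subpops):
        incoming = [list(ind)
                    for nb in hypercube_neighbors(rank, dim)
                    for ind in best[nb]]
        # ASSUMED policy: keep the best locals, let migrants replace the worst.
        keep = max(0, len(pop) - len(incoming))
        survivors = sorted(pop, key=cost)[:keep]
        updated.append((survivors + incoming)[:len(pop)])
    return updated

# Four subpopulations on a 2-dimensional hypercube; cost is the single gene value.
pops = [[[r], [10 + r], [20 + r]] for r in range(4)]
print(migrate(pops, cost=lambda ind: ind[0], dim=2))
```

Crossover and selection would run independently inside each subpopulation between migration steps.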
2.6 Population Initialization
The initial population can be seeded with a pre-estimated heuristic solution such as that obtained through an Index-Based Partitioning scheme or the results of recursive spectral bisection. In the incremental case, the previous partitioning can itself be used to generate a good partitioning for the changed graph by randomly assigning the new graph nodes to various partitions, while at the same time ensuring that balance is maintained.
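A minimal sketch of incremental seeding follows. It assumes the new nodes are appended after the existing ones, and it keeps balance by choosing at random among the currently smallest partitions; that tie-breaking rule and the name `seed_incremental` are assumptions made for illustration.

```python
import random

def seed_incremental(old_assignment, n_new, k, rng=random):
    """Extend a previous partitioning to n_new added nodes, keeping sizes balanced."""
    assignment = list(old_assignment)  # existing nodes keep their partitions
    sizes = [assignment.count(q) for q in range(k)]
    for _ in range(n_new):
        smallest = min(sizes)
        # Random choice among the least-loaded partitions preserves balance.
        q = rng.choice([p for p in range(k) if sizes[p] == smallest])
        assignment.append(q)
        sizes[q] += 1
    return assignment

seeded = seed_incremental([0, 0, 0, 1, 1, 1], n_new=4, k=2)
print(seeded)
```

Calling this repeatedly yields different seeds, so an initial population can be built from several such individuals.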
2.7 Hill Climbing
It is possible to perform hill-climbing on offspring, to obtain the nearest local optimum of the fitness function. Only the boundary points of each partition (those with neighbors in other partitions) are examined to see if migrating them to the appropriate neighboring partition improves fitness.
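The boundary-only hill-climbing step can be sketched as below, assuming unit weights, a cost in the form of Fitness Function 1, and a first-improvement acceptance rule; the acceptance rule and the function names are assumptions, not details stated in the text.

```python
def cost(assignment, neighbors, k, beta=1.0):
    """Load imbalance plus beta times the sum over q of C(q), unit weights."""
    n = len(assignment)
    sizes = [0] * k
    for q in assignment:
        sizes[q] += 1
    # Counting ordered pairs counts each cut edge once per partition it leaves.
    cut_ends = sum(1 for u in range(n) for v in neighbors[u]
                   if assignment[u] != assignment[v])
    return sum((s - n / k) ** 2 for s in sizes) + beta * cut_ends

def hill_climb(assignment, neighbors, k, beta=1.0):
    """Move boundary nodes to neighboring partitions while the cost decreases."""
    current = list(assignment)
    best = cost(current, neighbors, k, beta)
    improved = True
    while improved:
        improved = False
        for u in range(len(current)):
            # Only boundary nodes have neighbors in a different partition.
            foreign = {current[v] for v in neighbors[u]} - {current[u]}
            for q in sorted(foreign):
                old = current[u]
                current[u] = q
                c = cost(current, neighbors, k, beta)
                if c < best:
                    best, improved = c, True
                    break
                current[u] = old  # undo a move that did not help
    return current

chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
print(hill_climb([1, 1, 1, 0, 1, 0, 0, 0], chain, k=2))
```

Recomputing the full cost after every trial move keeps the sketch short; an implementation intended for larger graphs would update the cost incrementally.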
Experimental Results
2.4
The quality of solutions obtained by KNUX depends on the quality of the heuristic estimate ($I$ above) used to derive bias probabilities. It is therefore
In this section, we compare the results obtained using our approach with those of traditional heuristics (e.g., IBP or RSB) as well as with genetic algorithms that invoke traditional crossover operators. The figures are obtained by averaging the results of 5 runs, and the tables represent the best solutions obtained in these 5 runs. All experiments were done with algorithm DPGA set with a total population size of 320, a crossover rate $p_c = 0.7$, and a mutation rate $p_m = 0.01$. Tables 1, 2 and 3 report $\sum_q C(q)/2$ values, while Tables 4, 5 and 6 report $\max_q C(q)$ values, where $C(q)$ is the number of edges that cut across partition $q$. Experiments were conducted with a single population as well as with 16 subpopulations configured as a four-dimensional hypercube. Graphs with unit weight nodes and edges were assumed, although weighted edges and nodes can also be handled easily. For clarity, the cut-size numbers are given in the tables, instead of the actual fitness function values; for graph partitioning, smaller cut-size numbers indicate superior performance. The results show that KNUX and DKNUX clearly outperform two-point crossover, and that DKNUX is competitive with recursive spectral bisection as a graph partitioning strategy.
Number of Partitions 139 Nodes Cut Using RSB 213 Nodes Cut Using 243 Nodes Cut Using Cut Using 279 Nodes Cut Using Cut Using RSB DKNUX
11
RSB
DKNUX RSB
43 47 36 37
n
78 88 139 155
3.1
Table 2: Improving the Solution found through Recursive Spectral Bisection, using Fitness Function 1. In each case, the total number of inter-partition edges is reported for the best individual explored by the GA.
3.3
Fast heuristic algorithms can be used to obtain an initial candidate solution which is then improved by applying the genetic algorithm. Table 1 compares the results of Recursive Spectral Bisection (RSB) [11, 12, 13] with the GA initialized by a solution obtained by the Index-Based Partitioning algorithm (IBP) [10] described in the Appendix.
Number of Partitions 167 Nodes Cut Using DKNUX Cut Using RSB 144 Nodes Cut Using DKNUX Cut Using RSB 2 20 20
I
4
63 59
8 109 120
11 33 I 65 I 120
I/
36
Unlike other methods which can be used only with a differentiable optimization function, genetic algorithms can be used to solve the original problem, directly optimizing $\alpha \sum_q (|B(q)| - n/k)^2 + \beta \max_q \sum_{i \in q,\, j \notin q} e_{ij}$. Table 4 exhibits the effect of partitioning graphs of 78, 88, 98, 144 and 167 nodes into 4 and 8 partitions. Table 4 shows that the best solution found using operator DKNUX is better than that obtained using RSB in most cases. In other cases, improvements can be obtained by seeding the initial population with a heuristically obtained good solution such as that of the index-based partitioner.
78
I 119
Table 1: A Comparison of the Best Solutions found Using DKNUX and RSB: starting with a population initialized with an IBP solution, using Fitness Function 1. In each case, the total number of inter-partition edges is reported for the best individual explored by the GA.
Conclusions
3.2
For this series of experiments, we start with a graph, partition it, then modify it by adding some number of nodes in a local area chosen randomly within the graph. The modified graphs are then partitioned.
We have solved the graph partitioning problem using GAs with new knowledge-based crossover operators; problem-specific knowledge is used to generate bias probabilities, and the environment and current population play roles in controlling genetic expression. The trajectory that the population takes in search space is constrained, driving evolution in certain preferred directions. We have introduced novel operators that exploit the locality information inherent in most computational graphs. We have shown that this enhances the speed and performance of genetic search by orders of magnitude. We have demonstrated that genetic algorithms can be
Number of Partitions: 4

Graph        Worst Cut Using DKNUX    Worst Cut Using RSB
78 Nodes     23                       26
88 Nodes     24                       33
98 Nodes     24                       30
213 Nodes    40                       46
243 Nodes    45                       51
279 Nodes    42                       46
309 Nodes    44                       46
a
20
Table 5: A Comparison of the Best Solutions found Using DKNUX: Improving Upon RSB Solutions Using Fitness Function 2. Worst Cut refers to $\max_q C(q)$, where $C(q)$ is the number of edges leading out of partition $q$. For the GA, the maximum number of edges leading out of a partition is reported, for the best individual explored by the GA.
Number of Partitions 78 plus 10 nodes Worst Cut Using DKNUX Worst Cut Using RSB 78 plus 20 nodes Worst Cut Using DKNUX 118 plus 21 Nodes Worst Cut Using DKNUX Worst Cut Using RSB 118 plus 41 Nodes Worst Cut Using DKNUX Worst Cut Using RSB 183 plus 30 Nodes Worst Cut Using DKNUX Worst Cut Using RSB 183 plus 60 Nodes Worst Cut Using DKNUX Worst Cut Using RSB 249 plus 30 Nodes Worst Cut Using DKNUX Worst Cut Using RSB 249 plus 60 Nodes Worst Cut Using DKNUX Worst Cut Using RSB -
56 52
Table 6: A Comparison of the Best Solutions found Using DKNUX and RSB: Incremental Partitioning with Fitness Function 2. Worst Cut refers to $\max_q C(q)$, where $C(q)$ is the number of edges leading out of partition $q$. For the GA, the maximum number of edges leading out of a partition is reported, for the best individual explored by the GA.
1 Number of Partitions I(
Number of Partitions
118 plus 41 Nodes Cut Using DKNUX Cut Using RSB 183 plus 30 Nodes Cut Using DKNUX Cut Using RSB
)I 31 1 66 I 120
(1
33 37 41
75 72 82
Table 3: A Comparison of the Best Solutions found Using DKNUX and RSB: Incremental Graph Partitioning, using Fitness Function 1. In each case, the total number of inter-partition edges is reported for the best individual explored by the GA.
used to greatly refine previously estimated partitions with the help of KNUX and DKNUX. We show how the strategies discussed in this paper extend naturally to incremental graph partitioning. The incremental partitioning results obtained using DKNUX could not be obtained by a simple deterministic algorithm that assigns each new node to the partition to which most of its nearest neighbors belong. Performance can be further improved by incorporating a hill-climbing step. We have presented preliminary results showing the feasibility of this approach and the gains obtainable by examining the history of the search process; unfortunately, partitioning very large graphs does require a large amount of computation by the genetic algorithm. A prior graph contraction step would allow these techniques to be applied to graphs much larger than those explored in this paper [13]. Some gains can be expected from executing the GA on parallel computers, since DPGA is an inherently parallel algorithm from which we can expect near-linear speedups. We are currently parallelizing the algorithm to run on distributed memory machines such as the CM-5 and the Intel Paragon.
Table 4: A Comparison of the Best Solutions found Using DKNUX and RSB: Starting with a Randomly Initialized Population and Using Fitness Function 2. Worst Cut refers to $\max_q C(q)$, where $C(q)$ is the number of edges leading out of partition $q$. For the GA, the maximum number of edges leading out of a partition is reported, for the best individual explored by the GA.

References

[1] J. P. Cohoon, W. N. Martin, and D. S. Richards, A multi-population genetic algorithm for solving the k-partition problem on hypercubes, Proc. 4th ICGA, 1991, pp. 244-248.
[2] R. Collins and D. Jefferson, Selection in massively parallel genetic algorithms, Proc. 4th ICGA, 1991, pp. 249-256.
[3] F. Ercal, Heuristic Approaches to Task Allocation for Parallel Computing, Ph.D. Thesis, Ohio State University, 1988.
[4] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[5] D. R. Jones and M. A. Beltramo, Solving partitioning problems with genetic algorithms, Proc. of the 4th ICGA, 1991, pp. 442-450.
[6] Gregor von Laszewski, Intelligent structural operators for the k-way graph partitioning problem, Proc. of the 4th ICGA, 1991, pp. 45-52.
[7] H. S. Maini, Incorporation of Knowledge in Genetic Recombination, Ph.D. Thesis, School of Computer & Information Science, Syracuse University, August 1994.
[8] Nashat Mansour, Physical Optimization Algorithms for Mapping Data to Distributed-Memory Multiprocessors, Ph.D. Thesis, School of Computer and Information Science, Syracuse University, 1992.
[9] H. Muhlenbein, Parallel genetic algorithms, population genetics and combinatorial optimization, Proc. 3rd ICGA, 1989, pp. 416-422.
[10] Chao-Wei Ou, Sanjay Ranka, and Geoffrey Fox, Fast mapping and remapping algorithm for irregular and adaptive problems, Proc. of the International Conference on Parallel and Distributed Computing, December 1993.
[11] A. Pothen, H. Simon, and K-P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Matrix Anal. Appl., 11, 3 (July), 1990, pp. 430-452.
[12] H. Simon, Partitioning of unstructured mesh problems for parallel processing, Proc. Conf. Parallel Methods on Large Scale Structural Analysis and Physics Applications, Pergamon Press, 1991.
[13] S. T. Barnard, H. D. Simon, A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems, Technical Report RNR-92-033, November 1992.
[14] G. Syswerda, Uniform crossover in genetic algorithms, Proc. of the 3rd ICGA, 1989, pp. 2-9.
[15] R. D. Williams, Performance of dynamic load balancing algorithms for unstructured mesh calculations, Concurrency: Practice and Experience, 3(5), 1991, pp. 457-481.

(a)
00 01 02 03 04 05 06 07
08 09 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63

(b)
00 01 04 05 16 17 20 21
02 03 06 07 18 19 22 23
08 09 12 13 24 25 28 29
10 11 14 15 26 27 30 31
32 33 36 37 48 49 52 53
34 35 38 39 50 51 54 55
40 41 44 45 56 57 60 61
42 43 46 47 58 59 62 63

Figure 1: (a) Row-Major and (b) Shuffled Row-Major Indexing for an 8 x 8 image
A simple example of interleaving indices is as follows. Suppose index1 = 001, index2 = 010, and index3 = 110. Then the interleaved index would be 001011100. In the above case the number of bits in each dimension is equal. This can easily be generalized to cases where the sizes are different. For example, if index1 = 101, index2 = 01, and index3 = 0, then the interleaved index would be 100110. This is done by choosing bits (right to left) of each of the dimensions one by one, starting from dimension 3. When the bits of a particular dimension are no longer available, that dimension is not considered. After indexing is done, an efficient sorting algorithm can be applied to sort the vertices according to their indices. Finally, this sorted list is divided into P equal sublists.
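The interleaving rule and the final sort-and-split step can be sketched in Python as follows. Indices are passed as bit strings, one per dimension; the function names and the contiguous-chunk split are assumptions made for illustration.

```python
def interleave(indices):
    """Interleave bit strings, taking bits right to left, last dimension first."""
    collected = []  # least-significant output bit first
    longest = max(len(bits) for bits in indices)
    for pos in range(longest):
        for bits in reversed(indices):
            if pos < len(bits):  # skip dimensions that have run out of bits
                collected.append(bits[len(bits) - 1 - pos])
    return "".join(reversed(collected))

def index_based_partition(vertex_indices, P):
    """Sort vertices by interleaved index and cut the list into P sublists."""
    order = sorted(range(len(vertex_indices)),
                   key=lambda v: int(interleave(vertex_indices[v]), 2))
    n = len(order)
    return [order[q * n // P:(q + 1) * n // P] for q in range(P)]

print(interleave(["001", "010", "110"]))
print(interleave(["101", "01", "0"]))

# Vertices of a 4 x 4 grid, numbered row by row, with (row, column) bit strings.
grid = [(format(r, "02b"), format(c, "02b")) for r in range(4) for c in range(4)]
print(index_based_partition(grid, 4))
```

On the 4 x 4 grid each of the four sublists is a 2 x 2 block, which illustrates why sorting by interleaved index tends to keep physically proximate vertices together.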
[Figure 2 plots. Panels: 213 to 2, 213 to 4, 243 to 2, 243 to 4, 243 to 8. Curves: Two-Point and DKNUX, with RSB reference levels (legends include RSB(47), RSB(151), RSB(154)). Horizontal axis: GENERATIONS, 100 to 500.]
Figure 2: Partitioning a 213 node graph and a 243 node graph into 2, 4 and 8 partitions: The effect of operators Two-Point and DKNUX on improving solutions obtained through RSB
[Figure 3 plots. Panels include 139 to 8, 159 to 4, 159 to 8. Curves: Two-Point and DKNUX, with RSB reference levels (legends include RSB(75)). Horizontal axis: GENERATIONS, 100 to 500.]
Figure 3: Partitioning a 118 node graph incremented by 21 nodes and by 41 nodes into 2, 4 and 8 partitions: A comparison between Two-Point, DKNUX and RSB