Counting and generating integer partitions in parallel

1992, Proceedings ICCI '92: Fourth International Conference on Computing and Information

Laura A. Sanchis
Computer Science Department, Colgate University, Hamilton, NY 13346

0-8186-2812-X/92 $03.00 © 1992 IEEE

Abstract

We present parallel shared memory algorithms for counting the number of partitions of a given integer $N$, where the partitions may be subject to restrictions, such as being composed of distinct parts, of a given number of parts, and/or of parts belonging to a specified set. We show that this can be done in polylogarithmic parallel time, although the algorithm requires an excessive number of processors. We also present more practical algorithms that run in time $O(\sqrt{N}(\log N)^2)$ but use far fewer processors. The technique used in these algorithms can be used to obtain adaptive, optimal algorithms for the case when a limited number of processors is available. Parallel logarithmic time algorithms that generate partitions uniformly at random, using the quantities computed by the counting algorithms, are also presented.

1 Introduction

Let $n$ be a positive integer. A partition of $n$ into $m$ parts is a representation of the form $n = k_1 + k_2 + \dots + k_m$, where each $k_i$ is a positive integer. The partition number $P(n)$ gives the number of partitions of the integer $n$ (the order of the parts is not important), while $P(n,k)$ denotes the number of partitions of $n$ having largest part equal to $k$. $P(n)$ and $P(n,k)$ are fundamental combinatorial quantities with several applications, including the random generation of partitions. However, no computationally useful closed forms for these quantities are known; they are typically computed using recurrence relations.

One of the basic partition identities is the following, which may be found among other places in [11]:

$$P(n,k) = P(n-k,k) + P(n-1,k-1) \qquad (1)$$

(The first summand gives the number of partitions with more than one part equal to $k$, while the second summand gives the number of partitions with only one part equal to $k$.) This formula allows one to compute sequentially in $O(N^2)$ time all quantities $P(n,k)$ for $1 \le k \le n \le N$ for a given integer $N$. Given these quantities it is then possible to generate at random a partition of $n$, for any $n$ between 1 and $N$, in $O(n)$ time. This algorithm is described in [11].

The above recurrence relation and others like it are not very suitable for parallelization. This paper presents other types of partition identities which allow for more parallelization in the counting of partitions. The computed quantities are then used in parallel algorithms for generating integer partitions at random. The generation algorithms rely on the fact that a polynomial number of random choices can be made in constant time, in parallel, and then the pertinent ones combined in logarithmic time to produce the required random partition.

Several algorithms for the enumeration of combinatorial objects can be found in the literature. Sequential algorithms for the generation of all of the partitions of a given integer, either in general or subject to some restrictions, may be found in [10, 5, 14, 11, 13]. The reference [11] includes an algorithm for the random generation of integer partitions. Most parallel algorithms in the area have focused on enumeration rather than on random generation (see [3, 1, 9, 4] and chapter 6 of [2]). Reference [3] deals with generation of integer partitions. A sublogarithmic time parallel algorithm for the random generation of permutations may be found in [12].

We assume the CREW PRAM (concurrent-read, exclusive-write parallel RAM) model of computation (see [6]). If $x_1, x_2, \dots, x_n$ are natural numbers, then the prefix-sum computation problem consists of computing the $n$ partial sums $\sum_{i=1}^{k} x_i$ for $1 \le k \le n$. We will use the result that prefix sum computation can be done in time $O(\log n)$ using $n/\log n$ processors ([8], Brent's theorem [6]).

We will need the following additional definitions. Let $P'(n,k)$ denote the number of partitions of $n$ in which the largest part is less than or equal to $k$. Thus $P'(n,k) = \sum_{l=1}^{k} P(n,l)$. If $S = \{s_1, s_2, \dots, s_p\}$, where $s_1 < s_2 < \dots < s_p$, denote by $P_S(n,k)$ the number of partitions of $n$ having largest part equal to $s_k$ and all parts belonging to the set $S$. $P_S(n)$ and $P'_S(n,k)$ are defined analogously. We also define $PD(n)$ to be the number of partitions of $n$ having distinct parts. $PD(n,k)$, $PD'(n,k)$, $PD_S(n,k)$, etc. are defined analogously. Finally, define $P(n,k,m)$ to be the number of partitions of the integer $n$ into $m$ parts, for which the largest part of the partition is $k$, and define $P'(n,k,m) = \sum_{l=1}^{k} P(n,l,m)$ and $P''(n,k,m) = \sum_{l=1}^{m} P(n,k,l)$.

2 Polylogarithmic time algorithm

As remarked in the introduction, Formula 1 can be used to compute the quantities $P(n,k)$ for $1 \le k \le n \le N$ sequentially in quadratic time. The obvious way to parallelize this algorithm is to use $N$ steps, each of which computes, for each subsequent $n$, the quantities $P(n,k)$ for $1 \le k \le n$, in constant time, using at most $N$ processors. This process takes $O(N)$ time; it is clear that no further parallelism can be obtained from this particular formula. By finding other formulas where the computation of a particular quantity depends on quantities whose parameters are further removed from those of the quantity being computed, more parallelism may be introduced. The algorithm presented in this section is based on the following formula for $P(n,k,m)$:

$$P(n,k,m) = \sum_{n_1} \sum_{l_1} P(n_1, l_1, \lfloor m/2 \rfloor)\, P''(n - n_1 - l_1 \lceil m/2 \rceil,\; k - l_1,\; \lceil m/2 \rceil)$$

(It is assumed that $P''(n',k',m') = 0$ if $n' < 0$ or if $k' > n'$, and that $P''(0,0,m') = 1$.) This formula can be explained as follows. Let $n = k_1 + k_2 + \dots + k_m$, where $1 \le k_1 \le k_2 \le \dots \le k_m = k$. For any pair of integers $n_1, l_1$, if we assume that $k_{\lfloor m/2 \rfloor} = l_1$ and $k_1 + \dots + k_{\lfloor m/2 \rfloor} = n_1$, then there are $P(n_1, l_1, \lfloor m/2 \rfloor)$ ways of choosing the parts $k_1, \dots, k_{\lfloor m/2 \rfloor}$, and $P''(n - n_1 - l_1 \lceil m/2 \rceil, k - l_1, \lceil m/2 \rceil)$ ways of choosing the remaining parts.

The algorithm proceeds in stages. First the quantities $P(n,k,1)$ and $P''(n,k,1)$ are computed for all relevant $n, k$. At each successive stage $t \ge 1$, the quantities $P(n,k,m)$, $P''(n,k,m)$ will have been computed for all $n, k$ and all $m$ such that $1 \le m \le 2^{t-1}$, and therefore they can be used to compute these quantities for all $m$ such that $2^{t-1} < m \le 2^t$, using the above formula. Thus the algorithm takes $\lceil \log N \rceil$ stages; it can be shown that each stage requires $O(\log N)$ time and uses at most $\lceil N^5/\log N \rceil$ processors.

3 More practical algorithms

In this section we present more practical parallel algorithms for computing the values $P_S(n,k)$ and $P'_S(n,k)$ for $1 \le k \le n \le N$, for a given $N$ and a given set $S$. These algorithms run in time $O(\sqrt{N} \log N)$ but require only $\lceil N^{1.5}/\log N \rceil$ processors, if the quantities $P_S(n,k)$ and $P'_S(n,k)$ do not exceed the word size of the machine. However, these quantities may grow quite rapidly with increasing $n$, particularly when the partitions are not subject to restrictions. For the sake of clarity we present the algorithms ignoring the potential word size problems, and then show what adjustments must be made in order to take the magnitude of the computed quantities into account.

3.1 The basic algorithm

This algorithm is based on derivations from Formula 1. By repeated applications of this formula we obtain the following expanded formulas:

$$P(n,k) = \sum_{r=0}^{\lfloor n/k \rfloor - 1} P(n - rk - 1,\, k-1) \qquad (2)$$

Here $P(n - rk - 1, k-1)$ gives the number of partitions of $n$ with largest part equal to $k$, in which exactly $r+1$ of the parts have size $k$. For $n > k$,

$$P(n,k) = \sum_{l=1}^{k} P(n-k,\, l) = P'(n-k,\, k) \qquad (3)$$

$P(n-k,l)$ gives the number of partitions of $n$ with largest part equal to $k$ and second largest part equal to $l$.

Fix some integer $q$, $1 \le q \le N$, and assume that the quantities $P(n,l)$ have already been determined for all $n$ and all $l$ such that $1 \le l \le \min(q,n)$. Note that this implies that all values $P(n,k)$, $1 \le k \le n$, are known for all $n$ between 1 and $q$, inclusive. It follows that Formula 3 can now be used to compute, in parallel, the values $P(n,k)$ for all $n, k$ such that $q < n \le 2q+1$ and $q < k \le n$ (since in this case $n - k \le q$). This will then make it possible to compute the values $P(n,k)$ for all $n, k$ such that $2q+1 < n \le 3q+2$ and $q < k \le n$, and so on. All remaining $P(n,k)$ values can thus be computed in roughly $N/(q+1)$ stages.

The greater $q$ is, the fewer stages are required, but the more values $P(n,l)$ must be precomputed. Formula 2 allows us to precompute the values $P(n,l)$ for $1 \le n \le N$ and $1 \le l \le \min(q,n)$ in $q$ stages. Thus the total number of stages is about $q + N/(q+1)$. It is not hard to see that this quantity is minimized by setting $q = \sqrt{N} - 1$, resulting in $O(\sqrt{N})$ total number of stages.

Let the first part of the algorithm, where the values $P(n,l)$ are computed for all $n, l$ where $1 \le n \le N$ and $1 \le l \le \min(q,n)$, be denoted as Phase 1. The rest of the algorithm will be denoted as Phase 2. It can be shown that each stage of Phase 1 can be performed in $O(\log N)$ time using $N-1$ processors, using prefix sum computations. Likewise each stage of Phase 2 can be performed in $O(\log N)$ parallel time using $\lceil (q+1)N/\log N \rceil = \lceil N^{1.5}/\log N \rceil$ processors (details are omitted). Thus the whole algorithm takes $O(\sqrt{N} \log N)$ parallel time. The product of time and processors is $O(N^2)$. Since $O(N^2)$ quantities are computed, this many sequential steps are necessary, and the parallel algorithm is optimal.

3.2 Adaptive optimal algorithms

When fewer than $\lceil N^{1.5}/\log N \rceil$ but at least $N$ processors are available, the strategy used in the basic algorithm can still be employed, by adjusting the number of stages $q$ used in Phase 1 to the number of available processors for Phase 2. Suppose $T$ processors are available, where $N \le T < N^{1.5}/\log N$. If $q$ stages are used for Phase 1, then Phase 2 requires $N(q+1)/\log N$ processors. Setting $T$ equal to this quantity, we get that $q$ should be $(T \log N)/N - 1$. Using this many stages for Phase 1, we obtain an algorithm that uses $T$ processors and whose time complexity is $O((q + N/(q+1)) \log N) = O(T(\log N)^2/N - \log N + N^2/T)$. Hence the product of time and processors is $O(T^2(\log N)^2/N - T \log N + N^2)$, which is $O(N^2)$ since we are assuming that $T < N^{1.5}/\log N$. Thus the algorithm is optimal.

3.3 Other variations

A similar algorithm can be used in the more general case where each part of the partition must belong to a specified set $S = \{s_1, \dots, s_p\}$, where $1 \le s_1 < s_2 < \dots < s_p \le N$ and $p = |S| > 0$. This algorithm is based on generalizations of the formulas used in the basic case; the details are omitted.

The algorithm may also be adapted to count partitions consisting of distinct parts. Two phases are again used, but the same partition identity is used in both phases, namely $PD_S(n,k) = PD'_S(n - s_k, k-1)$. At iteration $k$ of Phase 1, the values $PD_S(n,k)$ are computed for all $n \ge s_k$ using the above formula. The values $PD'_S(n,k)$ are also computed by setting $PD'_S(n,k) = PD'_S(n,k-1) + PD_S(n,k)$. These computations take constant parallel time. Because of this, a lower overall runtime can be achieved. Suppose that $S = \{1, 2, \dots, N\}$. Then if we set $q = \sqrt{N \log N}$, Phase 2 consists of at most $N/\sqrt{N \log N}$ stages, each taking $O(\log N)$ time. Thus the whole algorithm has time complexity $O(\sqrt{N \log N})$, but requires $\lceil N^{1.5}/\sqrt{\log N} \rceil$ processors. Again, it is possible, as in the basic algorithm, to achieve a run time of $O(\sqrt{N} \log N)$ using $\lceil N^{1.5}/\log N \rceil$ processors.

Partitions into a fixed number of parts can also be counted. For unrestricted partitions, one may use the fact that the number of partitions of $n$ into $k$ parts equals the number of partitions of $n$ with largest part equal to $k$. For restricted partitions, the values $P_S(n,k,m)$ must be computed, and hence more processors are required (details omitted).

3.4 Taking the magnitude of the computed quantities into account

As previously remarked, the quantities $P(n,k)$, $P'(n,k)$, etc., grow rapidly (exponentially in $\sqrt{n}$) as $n$ increases. The asymptotic behavior of $P(n)$ has been extensively analyzed; in [7] may be found the following approximation: $\log P(n) \sim A\sqrt{n}$, where $A = \pi\sqrt{2/3}$. The quantities $P(n,k)$, $P'(n,k)$, $P_S(n,k)$, etc. are of course no larger than $P(n)$.

Suppose that $W$ words are required to hold each computed quantity. Adding 2 integers, each of which occupies $W$ words, may be performed in parallel in $O(\log W)$ time using $\lceil W/\log W \rceil$ processors ($O(\log W)$ time is required to propagate the carry bits; $W/\log W$ instead of $W$ processors suffice by using the technique of processor improvement, Brent's theorem [6]). Hence the algorithms presented in this section may be amended so that the time required is multiplied by a factor of $\log W$ and the number of processors is multiplied by $\lceil W/\log W \rceil$.

If we pessimistically assume that the word size of each processor is $O(\log N)$, where $N$ is the largest integer for which partitions are being counted, then each of the quantities to be computed requires at most $W = O(\sqrt{N}/\log N)$ words. So the basic algorithm will require $O(N^2/(\log N)^3)$ processors and $O(\sqrt{N}(\log N)^2)$ time. This algorithm is still optimal since $O(N^{2.5}/\log N)$ values (words) are being computed. Note however that for many practical purposes, setting $W$ to a small constant will suffice. In this case it is probably more efficient to perform each single sum sequentially (in $O(W)$ steps) rather than multiplying the number of processors by $\lceil W/\log W \rceil$. Also we remark that the same problem must be dealt with in any sequential algorithm that computes these quantities.

4 Generating partitions

The idea behind the generating algorithms is to first generate in parallel all of the random choices that may be necessary for the construction of the random partition. The random choices that are actually used can be combined to produce the required partition in $O(\log N)$ parallel time, where $N$ is the integer being partitioned.

We describe an algorithm for generating a random partition of $N$ with parts restricted to lie in the set $S$ and with largest part equal to $s_K$. In order to generate a random partition of $N$, first choose the largest part by generating a random number between 0 and $P_S(N)$ and using the probabilities derived from the quantities $P'_S(N,1), \dots, P'_S(N,p)$ (this can be done in constant parallel time using $p$ processors). The algorithm relies on the following formula:

$$P_S(n,k) = P_S(n - s_k,\, k) + P_S(n - s_k + s_{k-1},\, k-1)$$

We assume the processors are indexed by the tuples $(n,k)$, for $1 \le s_k \le n \le N$. The following is a sketch of the algorithm. More details will be found in another version of this paper.

First, each processor $(n,k)$ assumes that a partition of $n$ having largest part equal to $s_k$ must be generated, and decides, using the appropriate probabilities (derived from the $P_S(n,k)$ quantities), whether the second largest part should also equal $s_k$ (in which case $Next_{(n,k)}$ is set equal to 1) or whether this part should be smaller than $s_k$ (in which case $Next_{(n,k)}$ is set equal to 0). In either case, a link is established to the processor that would make the next choice about the partition. Once the links are established, following the links from processor $(N,K)$ identifies all the processors that actually participate in choosing the partition. There are at most $N$ of these, and they are marked active.

The next step is to identify the processors that actually specify the size of a part of the generated partition. These are the processors whose parent link comes from a processor $(n,k)$ with $Next_{(n,k)} = 1$. These processors are marked Chosen. The links are updated so that processors that are not marked Chosen are skipped in the linked list of processors; the skipped processors are deactivated. Finally, each chosen processor can be assigned the part number in the sequence for which it is responsible, so that it can write out the part in the correct position.

The algorithm as described above takes $O(\log N)$ time and requires at most $Np$ processors. Adjustments may be made to take into account the magnitude of the partition quantities (details omitted from this version). Using other partition identities, similar algorithms can be devised for the random generation of the other types of partitions discussed in this paper.

References

[1] Selim G. Akl. Adaptive and optimal parallel algorithms for enumerating permutations and combinations. The Computer Journal, 30(5):433-436, 1987.

[2] Selim G. Akl. The Design and Analysis of Parallel Algorithms. Prentice Hall, 1989.

[3] S.G. Akl and I. Stojmenovic. Parallel algorithms for generating integer partitions and compositions. Technical Report TR-91-34, Computer Science Department, University of Ottawa, September 1991, to appear in Journal of Combinatorial Mathematics and Combinatorial Computing.

[4] G.H. Chen and Maw-Sheng Chern. Parallel generation of permutations and combinations. BIT, 26:277-283, 1986.

[5] T.I. Fenner and G. Loizou. Tree traversal related algorithms for generating integer partitions. SIAM Journal on Computing, 12(3):551-564, 1983.

[6] Alan Gibbons and Wojciech Rytter. Efficient Parallel Algorithms. Cambridge University Press, 1988.

[7] Marshall Hall. Combinatorial Theory. John Wiley & Sons, 1986.

[8] R.E. Ladner and M.J. Fischer. Parallel prefix computation. Journal of the ACM, 27(4):831-838, 1980.

[9] Chau-Jy Lin and Jong-Chuang Tsay. A systolic generation of combinations. BIT, 29:23-36, 1989.

[10] T.V. Narayana, R.M. Mathsen, and J. Sarangi. An algorithm for generating partitions and its applications. Journal of Combinatorial Theory, 11:54-61, 1971.

[11] Albert Nijenhuis and Herbert S. Wilf. Combinatorial Algorithms. Academic Press, 1978.

[12] Sanguthevar Rajasekaran and John H. Reif. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM Journal on Computing, 18, 1989.

[13] W. Riha and K.R. James. Efficient algorithms for doubly and multiply restricted partitions. Computing, 16:163-168, 1976.

[14] Carla D. Savage. Gray code sequences of partitions. Journal of Algorithms, 10:577-595, 1989.
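As an illustration of the counting identities and of the generation idea, the following Python sketch may be helpful. It is not the parallel algorithm of the paper: everything runs sequentially, the set of allowed parts is assumed to be $\{1, \dots, N\}$, the base case $P(0,0) = 1$ is an assumption of the sketch, and the function names `partition_table`, `two_phase_table` and `random_partition` are hypothetical. The first function fills the table with the basic identity $P(n,k) = P(n-k,k) + P(n-1,k-1)$; the second simulates the two-phase scheme (repeated-application sums for the first $q$ columns, then sums over the second largest part, completing $q+1$ rows per stage); the third makes the binary "next part equal or smaller" choices one after another instead of in parallel.

```python
import random


def partition_table(N):
    """P[n][k] = number of partitions of n with largest part exactly k."""
    P = [[0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1  # empty partition; base case assumed by this sketch
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            # several parts equal to k  +  exactly one part equal to k
            P[n][k] = P[n - k][k] + P[n - 1][k - 1]
    return P


def two_phase_table(N, q):
    """Same table via the two-phase scheme; returns (P, Phase 2 stage count)."""
    P = [[0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1
    q = max(1, min(q, N))
    # Phase 1: column l is built from column l - 1 only
    # (sum over r + 1 = number of parts equal to l).
    for l in range(1, q + 1):
        for n in range(l, N + 1):
            P[n][l] = sum(P[n - r * l - 1][l - 1] for r in range(n // l))
    # Phase 2: each stage completes the next q + 1 rows; within a stage
    # every entry depends only on rows finished in earlier stages.
    done, stages = q, 0  # rows 0..done are complete
    while done < N:
        top = min(N, done + q + 1)
        for n in range(done + 1, top + 1):
            for k in range(q + 1, n + 1):
                # P'(n - k, k); index 0 is included so that P(n, n) = 1
                P[n][k] = sum(P[n - k][: k + 1])
        done, stages = top, stages + 1
    return P, stages


def random_partition(N, P, rng=random):
    """Uniform random partition of N >= 1, parts in non-increasing order."""
    # Largest part K is chosen with probability P(N, K) / P(N).
    K = rng.choices(range(N + 1), weights=P[N])[0]
    parts, n, k = [K], N, K
    while n > 0:
        if rng.randrange(P[n][k]) < P[n - k][k]:
            parts.append(k)  # next part also equals k (Next = 1)
            n -= k
        else:
            n, k = n - 1, k - 1  # next part is smaller (Next = 0)
    return parts


if __name__ == "__main__":
    table = partition_table(30)
    print(sum(table[10]))  # number of partitions of 10
    print(random_partition(30, table))
```

In this sketch the probabilities along any chain of choices telescope to $1/P(N)$, which is why the output is uniform; exact integer comparisons via `randrange` avoid floating-point bias in the binary choices. A parallel version would replace the inner sums by prefix-sum computations and the sequential loop in `random_partition` by the link-following step described in Section 4.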