Counting and Generating Integer Partitions in Parallel
Laura A. Sanchis
Computer Science Department, Colgate University
Hamilton, NY 13346
0-8186-2812-X/92 $03.00 © 1992 IEEE

Abstract

We present parallel shared memory algorithms for counting the number of partitions of a given integer $N$, where the partitions may be subject to restrictions, such as being composed of distinct parts, of a given number of parts, and/or of parts belonging to a specified set. We show that this can be done in polylogarithmic parallel time, although the algorithm requires an excessive number of processors. We also present more practical algorithms that run in time $O(\sqrt{N}(\log N)^2)$ but use far fewer processors. The technique used in these algorithms can be used to obtain adaptive, optimal algorithms for the case when a limited number of processors is available. Parallel logarithmic time algorithms that generate partitions uniformly at random, using the quantities computed by the counting algorithms, are also presented.

1 Introduction

Let $n$ be a positive integer. A partition of $n$ into $m$ parts is a representation of the form $n = k_1 + k_2 + \cdots + k_m$, where each $k_i$ is a positive integer. The partition number $P(n)$ gives the number of partitions of the integer $n$ (the order of the parts is not important), while $P(n,k)$ denotes the number of partitions of $n$ having largest part equal to $k$. $P(n)$ and $P(n,k)$ are fundamental combinatorial quantities with several applications, including the random generation of partitions. However, no computationally useful closed forms for these quantities are known; they are typically computed using recurrence relations. One of the basic partition identities is the following, which may be found among other places in [11]:

$$P(n,k) = P(n-k,k) + P(n-1,k-1) \qquad (1)$$

(The first summand gives the number of partitions with more than one part equal to $k$, while the second summand gives the number of partitions with only one part equal to $k$.) This formula allows one to compute sequentially in $O(N^2)$ time all quantities $P(n,k)$ for $1 \le k \le n \le N$ for a given integer $N$. Given these quantities it is then possible to generate at random a partition of $n$, for any $n$ between 1 and $N$, in $O(n)$ time. This algorithm is described in [11].

The above recurrence relation and others like it are not very suitable for parallelization. This paper presents other types of partition identities which allow for more parallelization in the counting of partitions. The computed quantities are then used in parallel algorithms for generating integer partitions at random. The generation algorithms rely on the fact that a polynomial number of random choices can be made in constant time, in parallel, and then the pertinent ones combined in logarithmic time to produce the required random partition.

Several algorithms for the enumeration of combinatorial objects can be found in the literature. Sequential algorithms for the generation of all of the partitions of a given integer, either in general or subject to some restrictions, may be found in [10, 5, 14, 11, 13]. Reference [11] includes an algorithm for the random generation of integer partitions. Most parallel algorithms in the area have focused on enumeration rather than on random generation (see [3, 1, 9, 4] and chapter 6 of [2]). Reference [3] deals with generation of integer partitions. A sublogarithmic time parallel algorithm for the random generation of permutations may be found in [12].

We assume the CREW PRAM (concurrent-read, exclusive-write parallel RAM) model of computation (see [6]). If $x_1, x_2, \ldots, x_n$ are natural numbers, then the prefix-sum computation problem consists of computing the $n$ partial sums $\sum_{i=1}^{k} x_i$ for $1 \le k \le n$. We will use the result that prefix sum computation can be done in time $O(\log n)$ using $n/\log n$ processors ([8], Brent's theorem [6]).

We will need the following additional definitions. Let $P'(n,k)$ denote the number of partitions of $n$ in which the largest part is less than or equal to $k$. Thus $P'(n,k) = \sum_{l=1}^{k} P(n,l)$. If $S = \{s_1, s_2, \ldots, s_p\}$, where $s_1 < s_2 < \cdots < s_p$, denote by $P_S(n,k)$ the number of partitions of $n$ having largest part equal to $s_k$ and all parts belonging to the set $S$. $P_S(n)$ and $P'_S(n,k)$ are defined analogously. We also define $PD(n)$ to be the number of partitions of $n$ having distinct parts. $PD(n,k)$, $PD'(n,k)$, $PD_S(n,k)$, etc. are defined analogously. Finally, define $P(n,k,m)$ to be the number of partitions of the integer $n$ into $m$ parts, for which the largest part of the partition is $k$, and define $P'(n,k,m) = \sum_{l=1}^{k} P(n,l,m)$ and $P''(n,k,m) = \sum_{l=1}^{m} P(n,k,l)$.
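As a concrete illustration of Formula 1 and of the definition of $P'(n,k)$, the following sequential Python sketch tabulates both quantities and then draws a partition uniformly at random by choosing one part at a time. It is only a sketch under stated assumptions: the function names `partition_tables` and `random_partition` are hypothetical, the convention $P(0,0) = 1$ is assumed so that the recurrence needs no special cases, and the sampler is a simple part-by-part method, not the algorithm of [11] and not a parallel algorithm.

```python
import random


def partition_tables(N):
    """Return (P, Pp) with P[n][k] = P(n, k) and Pp[n][k] = P'(n, k)."""
    P = [[0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1  # assumed convention: the empty partition of 0
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            # Formula 1: more than one part equal to k, or exactly one.
            P[n][k] = P[n - k][k] + P[n - 1][k - 1]
    # Pp[n][k] is a running sum of row n; for n >= 1 this is the sum of
    # P(n, l) over 1 <= l <= k, and Pp[0][k] = 1 by the convention above.
    Pp = [[0] * (N + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        running = 0
        for k in range(N + 1):
            running += P[n][k]
            Pp[n][k] = running
    return P, Pp


def random_partition(n, P, rng=random):
    """Draw a partition of n uniformly at random, largest part first."""
    parts = []
    bound = n  # the next part may not exceed the previous one
    while n > 0:
        limit = min(bound, n)
        total = sum(P[n][k] for k in range(1, limit + 1))
        r = rng.randrange(total)
        k = 1
        while r >= P[n][k]:  # pick k with probability P(n, k) / total
            r -= P[n][k]
            k += 1
        parts.append(k)
        n -= k
        bound = k
    return parts


if __name__ == "__main__":
    P, Pp = partition_tables(10)
    print(Pp[10][10])               # P(10), the number of partitions of 10
    print(random_partition(10, P))  # one random partition of 10
```

The sampler is uniform because the number of partitions of $n$ with largest part $k$ equals the number of partitions of $n-k$ with all parts at most $k$, so the product of the step probabilities telescopes to $1/P(n)$.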
2 Polylogarithmic time algorithm

As remarked in the introduction, Formula 1 can be used to compute the quantities $P(n,k)$ for $1 \le k \le n \le N$ sequentially in quadratic time. The obvious way to parallelize this algorithm is to use $N$ steps, each of which computes, for each subsequent $n$, the quantities $P(n,k)$ for $1 \le k \le n$, in constant time, using at most $N$ processors. This process takes $O(N)$ time; it is clear that no further parallelism can be obtained from this particular formula.

By finding other formulas where the computation of a particular quantity depends on quantities whose parameters are further removed from those of the quantity being computed, more parallelism may be introduced. The algorithm presented in this section is based on the following formula for $P(n,k,m)$:

$$P(n,k,m) = \sum_{n_1}\sum_{l_1} P(n_1, l_1, \lfloor m/2 \rfloor)\; P''(n - n_1 - l_1\lceil m/2 \rceil,\; k - l_1,\; \lceil m/2 \rceil)$$

(It is assumed that $P''(n',k',m') = 0$ if $n' < 0$ or if $k' > n'$, and that $P''(0,0,m') = 1$.) This formula can be explained as follows. Let $n = k_1 + k_2 + \cdots + k_m$, where $1 \le k_1 \le k_2 \le \cdots \le k_m = k$. For any pair of integers $n_1, l_1$, if we assume that $k_{\lfloor m/2 \rfloor} = l_1$ and $k_1 + \cdots + k_{\lfloor m/2 \rfloor} = n_1$, then there are $P(n_1, l_1, \lfloor m/2 \rfloor)$ ways of choosing the parts $k_1, \ldots, k_{\lfloor m/2 \rfloor}$, and $P''(n - n_1 - l_1\lceil m/2 \rceil, k - l_1, \lceil m/2 \rceil)$ ways of choosing the remaining parts.

The algorithm proceeds in stages. First the quantities $P(n,k,1)$ and $P''(n,k,1)$ are computed for all relevant $n, k$. At each successive stage $t \ge 1$, the quantities $P(n,k,m)$, $P''(n,k,m)$ will have been computed for all $n, k$ and all $m$ such that $1 \le m \le 2^{t-1}$, and therefore they can be used to compute these quantities for all $m$ such that $2^{t-1} < m \le 2^t$, using the above formula. Thus the algorithm takes $\lceil \log N \rceil$ stages; it can be shown that each stage requires $O(\log N)$ time and uses at most $\lceil N^5/16 \rceil$ processors.

3 More practical algorithms

In this section we present more practical parallel algorithms for computing the values $P_S(n,k)$ and $P'_S(n,k)$ for $1 \le k \le n \le N$, for a given $N$ and a given set $S$. These algorithms run in time $O(\sqrt{N}\log N)$ but require only $\lceil N^{1.5}/\log N \rceil$ processors, if the quantities $P_S(n,k)$ and $P'_S(n,k)$ do not exceed the word size of the machine. However, these quantities may grow quite rapidly with increasing $n$, particularly when the partitions are not subject to restrictions. For the sake of clarity we present the algorithms ignoring the potential word size problems, and then show what adjustments must be made in order to take the magnitude of the computed quantities into account.

3.1 The basic algorithm

This algorithm is based on derivations from Formula 1. By repeated applications of this formula we obtain the following expanded formulas:

$$P(n,k) = \sum_{r=0}^{\lfloor n/k \rfloor - 1} P(n - rk - 1,\; k-1) \qquad (2)$$

Here $P(n - rk - 1, k-1)$ gives the number of partitions of $n$ with largest part equal to $k$, in which exactly $r+1$ of the parts have size $k$. For $n > k$,

$$P(n,k) = \sum_{l=1}^{k} P(n-k,\; l) = P'(n-k,\; k) \qquad (3)$$

$P(n-k,l)$ gives the number of partitions of $n$ with largest part equal to $k$ and second largest part equal to $l$.

Fix some integer $q$, $1 \le q \le N$, and assume that the quantities $P(n,l)$ have already been determined for all $n$ and all $l$ such that $1 \le l \le \min(q,n)$. Note that this implies that all values $P(n,k)$, $1 \le k \le n$, are known for all $n$ between 1 and $q$, inclusive. It follows that Formula 3 can now be used to compute, in parallel, the values $P(n,k)$ for all $n, k$ such that $q < n \le 2q+1$ and $q < k \le n$ (since in this case $n - k \le q$). This will then make it possible to compute the values $P(n,k)$ for all $n, k$ such that $2q+1 < n \le 3q+2$ and $q < k \le n$, and so on. All remaining $P(n,k)$ values can thus be computed in roughly $N/(q+1)$ stages. The greater $q$ is, the fewer stages are required, but the more values $P(n,l)$ must be precomputed. Formula 2 allows us to precompute the values $P(n,l)$ for $1 \le n \le N$ and $1 \le l \le \min(q,n)$ in $q$ stages. Thus the total number of stages is about $q + N/(q+1)$. It is not hard to see that this quantity is minimized by setting $q = \sqrt{N} - 1$, resulting in $O(\sqrt{N})$ total number of stages. Let the first part of the algorithm, where the values $P(n,l)$ are computed for all $n, l$ where $1 \le n \le N$ and $1 \le l \le \min(q,n)$, be denoted as Phase 1. The rest of the algorithm will be denoted as Phase 2. It can be shown that each stage of Phase 1 can be performed in $O(\log N)$ time using $N - 1$ processors, using prefix sum computations. Likewise each stage of Phase 2 can be performed in $O(\log N)$ parallel time using $\lceil (q+1)N/\log N \rceil = \lceil N^{1.5}/\log N \rceil$ processors (details are omitted). Thus the whole algorithm takes $O(\sqrt{N}\log N)$ parallel time. The product of time and processors is $O(N^2)$. Since $O(N^2)$ quantities are computed, this many sequential steps are necessary, and the parallel algorithm is optimal.

3.2 Adaptive optimal algorithms

When fewer than $\lceil N^{1.5}/\log N \rceil$ but at least $N$ processors are available, the strategy used in the basic algorithm can still be employed, by adjusting the number of stages $q$ used in Phase 1 to the number of available processors for Phase 2.

Suppose $T$ processors are available, where $N \le T < N^{1.5}/\log N$. If $q$ stages are used for Phase 1, then Phase 2 requires $N(q+1)/\log N$ processors. Setting $T$ equal to this quantity, we get that $q$ should be $(T\log N)/N - 1$. Using this many stages for Phase 1, we obtain an algorithm that uses $T$ processors and whose time complexity is $O((q + N/(q+1))\log N) = O(T(\log N)^2/N - \log N + N^2/T)$. Hence the product of time and processors is $O(T^2(\log N)^2/N - T\log N + N^2)$, which is $O(N^2)$ since we are assuming that $T < N^{1.5}/\log N$. Thus the algorithm is optimal.

3.3 Other variations

A similar algorithm can be used in the more general case where each part of the partition must belong to a specified set $S = \{s_1, \ldots, s_p\}$, where $1 \le s_1 < s_2 < \cdots < s_p \le N$ and $p = |S| > 0$. This algorithm is based on formulas that generalize those used in the basic case. The details are omitted.

The algorithm may also be adapted to count partitions consisting of distinct parts. Two phases are again used, but the same partition identity is used in both phases, namely $PD_S(n,k) = PD'_S(n - s_k,\; k-1)$. At iteration $k$ of Phase 1, the values $PD_S(n,k)$ are computed for all $n \ge s_k$ using the above formula. The values $PD'_S(n,k)$ are also computed by setting $PD'_S(n,k) = PD'_S(n,k-1) + PD_S(n,k)$. These computations take constant parallel time. Because of this, a lower overall runtime can be achieved. Suppose that $S = \{1, 2, \ldots, N\}$. Then if we set $q = \sqrt{N\log N}$, Phase 2 consists of at most $N/\sqrt{N\log N}$ stages, each taking $O(\log N)$ time. Thus the whole algorithm has time complexity $O(\sqrt{N\log N})$, but requires $\lceil N^{1.5}/\sqrt{\log N} \rceil$ processors. Again, it is possible, as in the basic algorithm, to achieve a run time of $O(\sqrt{N}\log N)$ using $\lceil N^{1.5}/\log N \rceil$ processors.

Partitions into a fixed number of parts can also be counted. For unrestricted partitions, one may use the fact that the number of partitions of $n$ into $k$ parts equals the number of partitions of $n$ with largest part equal to $k$. For restricted partitions, the values $P_S(n,k,m)$ must be computed, and hence more processors are required (details omitted).

3.4 Taking the magnitude of the computed quantities into account

As previously remarked, the quantities $P(n,k)$, $P'(n,k)$, etc., grow exponentially as $n$ increases. The asymptotic behavior of $P(n)$ has been extensively analyzed; in [7] may be found the following approximation: $\log P(n) \sim A\sqrt{n}$, where $A = \pi\sqrt{2/3}$. The quantities $P(n,k)$, $P'(n,k)$, $P_S(n,k)$, etc. are of course no larger than $P(n)$. Suppose that $W$ words are required to hold each computed quantity. Adding 2 integers, each of which occupies $W$ words, may be performed in parallel in $O(\log W)$ time using $\lceil W/\log W \rceil$ processors ($O(\log W)$ time is required to propagate the carry bits; $W/\log W$ instead of $W$ processors suffice by using the technique of processor improvement, Brent's theorem [6]). Hence the algorithms presented in this section may be amended so that the time required is multiplied by a factor of $\log W$ and the number of processors is multiplied by $\lceil W/\log W \rceil$.

If we pessimistically assume that the word size of each processor is $O(\log N)$, where $N$ is the largest integer for which partitions are being counted, then each of the quantities to be computed requires at most $W = O(\sqrt{N}/\log N)$ words. So the basic algorithm will require $O(N^2/(\log N)^3)$ processors and $O(\sqrt{N}(\log N)^2)$ time. This algorithm is still optimal since $O(N^{2.5}/\log N)$ values (words) are being computed.

Note however that for many practical purposes, setting $W$ to a small constant will suffice. In this case it is probably more efficient to perform each single sum sequentially (in $O(W)$ steps) rather than multiplying the number of processors by $\lceil W/\log W \rceil$. Also we remark that the same problem must be dealt with in any sequential algorithm that computes these quantities.
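To give a feel for the magnitudes discussed in Section 3.4, the following Python sketch computes $P(N)$ exactly with arbitrary-precision integers, compares $\log P(N)$ with $A\sqrt{N}$, and counts how many machine words the value would occupy. The names `partition_numbers` and `growth_report` are hypothetical, the constant $A = \pi\sqrt{2/3}$ is the standard asymptotic value, and a word of $\lceil \log_2 N \rceil$ bits is an assumption chosen only to mirror the pessimistic $O(\log N)$ word size used above.

```python
from math import ceil, log, log2, pi, sqrt


def partition_numbers(N):
    """Return p with p[n] = P(n) for 0 <= n <= N, as exact integers."""
    p = [1] + [0] * N
    for part in range(1, N + 1):       # allow parts of size `part`
        for n in range(part, N + 1):
            p[n] += p[n - part]
    return p


def growth_report(N):
    """Return (log P(N) / (A sqrt(N)), words needed to store P(N))."""
    p_N = partition_numbers(N)[N]
    A = pi * sqrt(2 / 3)               # assumed asymptotic constant
    ratio = log(p_N) / (A * sqrt(N))
    word_bits = max(1, ceil(log2(N)))  # assumed word size of about log N bits
    words = -(-p_N.bit_length() // word_bits)  # ceiling division
    return ratio, words


if __name__ == "__main__":
    for N in (100, 400, 1000):
        ratio, words = growth_report(N)
        print(N, round(ratio, 3), words)
```

The ratio stays below 1 and approaches it only slowly, because the approximation ignores a polynomial factor in $n$; the word count grows roughly like $\sqrt{N}/\log N$, in line with the bound on $W$.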
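The two-phase schedule of the basic algorithm in Section 3.1 can be checked with a short sequential simulation. In the Python sketch below, each iteration of an outer loop stands for one parallel stage, and the inner loops stand for work that the processors would perform simultaneously. The function names are hypothetical, the convention $P(0,0) = 1$ is assumed, and the inner sums are written out naively rather than as parallel prefix sums, so the sketch illustrates only the data dependencies and the stage count, not the stated processor bounds.

```python
from math import isqrt


def table_by_formula_1(N):
    """Reference table P[n][k] computed row by row from Formula 1."""
    P = [[0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1  # assumed convention: the empty partition of 0
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            P[n][k] = P[n - k][k] + P[n - 1][k - 1]
    return P


def two_phase_table(N, q):
    """Fill P[n][k] in stages; return the table and the number of stages."""
    P = [[0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1
    stages = 0
    # Phase 1: one stage per k <= q, using Formula 2 for every n at once.
    for k in range(1, min(q, N) + 1):
        for n in range(k, N + 1):
            P[n][k] = sum(P[n - r * k - 1][k - 1] for r in range(n // k))
        stages += 1
    # Phase 2: blocks of q + 1 consecutive values of n, using Formula 3.
    lo = q  # rows 0..lo are completely known at this point
    while lo < N:
        hi = min(lo + q + 1, N)
        for n in range(lo + 1, hi + 1):
            for k in range(q + 1, n + 1):
                # n - k <= lo, so row n - k is already complete. The sum
                # starts at l = 0 so that k == n picks up P(0, 0) = 1;
                # Formula 3 itself is stated for n > k.
                P[n][k] = sum(P[n - k][l] for l in range(0, k + 1))
        stages += 1
        lo = hi
    return P, stages


if __name__ == "__main__":
    N = 40
    q = isqrt(N) - 1  # integer stand-in for q = sqrt(N) - 1
    P, stages = two_phase_table(N, q)
    print(P == table_by_formula_1(N), stages)
```

Varying `q` shows the trade-off described in the text: a larger `q` lengthens Phase 1 and shortens Phase 2, and the total is smallest when `q` is close to $\sqrt{N}$.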
4 Generating partitions

The idea behind the generating algorithms is to first generate in parallel all of the random choices that may be necessary for the construction of the random partition. The random choices that are actually used can be combined to produce the required partition in $O(\log N)$ parallel time, where $N$ is the integer being partitioned. We describe an algorithm for generating a random partition of $N$ with parts restricted to lie in the set $S$ and with largest part equal to $s_K$. In order to generate a random partition of $N$, first choose the largest part by generating a random number between 0 and $P_S(N)$ and using the probabilities derived from the quantities $P'_S(N,1), \ldots, P'_S(N,p)$ (this can be done in constant parallel time using $p$ processors). The algorithm relies on the following formula:

$$P_S(n,k) = P_S(n - s_k,\; k) + P'_S(n - s_k,\; k-1)$$

We assume the processors are indexed by the tuples $(n,k)$, for $1 \le s_k \le n \le N$.

The following is a sketch of the algorithm. More details will be found in another version of this paper. First, each processor $(n,k)$ assumes that a partition of $n$ having largest part equal to $s_k$ must be generated, and decides, using the appropriate probabilities (derived from the $P_S(n,k)$ quantities), whether the second largest part should also equal $s_k$ (in which case $Next_{(n,k)}$ is set equal to 1) or whether this part should be smaller than $s_k$ (in which case $Next_{(n,k)}$ is set equal to 0). In either case, a link is established to the processor that would make the next choice about the partition. Once the links are established, following the links from processor $(N,K)$ identifies all the processors that actually participate in choosing the partition. There are at most $N$ of these, and they are marked active. The next step is to identify the processors that actually specify the size of a part of the generated partition. These are the processors whose parent link comes from a processor $(n,k)$ with $Next_{(n,k)} = 1$. These processors are marked Chosen. The links are updated so that processors that are not marked Chosen are skipped in the linked list of processors; the skipped processors are deactivated. Finally, each chosen processor can be assigned the part number in the sequence for which it is responsible, so that it can write out the part in the correct position.

The algorithm as described above takes $O(\log N)$ time and requires at most $Np$ processors. Adjustments may be made to take into account the magnitude of the partition quantities (details omitted from this version). Using other partition identities, similar algorithms can be devised for the random generation of the other types of partitions discussed in this paper.

References

[1] Selim G. Akl. Adaptive and optimal parallel algorithms for enumerating permutations and combinations. The Computer Journal, 30(5):433-436, 1987.
[2] Selim G. Akl. The Design and Analysis of Parallel Algorithms. Prentice Hall, 1989.
[3] S.G. Akl and I. Stojmenovic. Parallel algorithms for generating integer partitions and compositions. Technical Report TR-91-34, Computer Science Department, University of Ottawa, September 1991, to appear in Journal of Combinatorial Mathematics and Combinatorial Computing.
[4] G.H. Chen and Maw-Sheng Chern. Parallel generation of permutations and combinations. BIT, 26:277-283, 1986.
[5] T.I. Fenner and G. Loizou. Tree traversal related algorithms for generating integer partitions. SIAM Journal on Computing, 12(3):551-564, 1983.
[6] Alan Gibbons and Wojciech Rytter. Efficient Parallel Algorithms. Cambridge University Press, 1988.
[7] Marshall Hall. Combinatorial Theory. John Wiley & Sons, 1986.
[8] R.E. Ladner and M.J. Fischer. Parallel prefix computation. Journal of the ACM, 27(4):831-838, 1980.
[9] Chau-Jy Lin and Jong-Chuang Tsay. A systolic generation of combinations. BIT, 29:23-36, 1989.
[10] T.V. Narayana, R.M. Mathsen, and J. Sarangi. An algorithm for generating partitions and its applications. Journal of Combinatorial Theory, 11:54-61, 1971.
[11] Albert Nijenhuis and Herbert S. Wilf. Combinatorial Algorithms. Academic Press, 1978.
[12] Sanguthevar Rajasekaran and John H. Reif. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM Journal on Computing, 18, 1989.
[13] W. Riha and K.R. James. Efficient algorithms for doubly and multiply restricted partitions. Computing, 16:163-168, 1976.
[14] Carla D. Savage. Gray code sequences of partitions. Journal of Algorithms, 10:577-595, 1989.