Programming Techniques
S. L. Graham, R. L. Rivest, Editors

Fast Parallel Sorting Algorithms

D. S. Hirschberg
Rice University

A parallel bucket-sort algorithm is presented that requires time O(log n) and the use of n processors. The algorithm makes use of a technique that requires more space than the product of processors and time. A realistic model is used in which no memory contention is permitted. A procedure is also presented to sort n numbers in time O(k log n) using n^{1+1/k} processors, for k an arbitrary integer. The model of computation for this procedure permits simultaneous fetches from the same memory location.

Key Words and Phrases: parallel processing, sorting, algorithms, bucket sort
CR Categories: 3.74, 4.34, 5.25, 5.31

There is often a time-space tradeoff in serial algorithms. In order to solve a problem within a certain time bound, a minimal amount of space is required. This space requirement may be reduced if we allow more time for the process. In the limit, there will be a minimum amount of space required.

Much work has recently been devoted to developing algorithms for parallel processors. Problem areas include sorting [3, 6, 16, 17], evaluation of polynomials and general arithmetic expressions [14, 4], and matrix- and graph-theoretic problems [15, 5, 2, 12, 9]. In parallel algorithms, there is a similar tradeoff between time and the number of processors used. In order to solve a problem using a bounded number of processors, a minimal amount of time is required.

General permission to make fair use in teaching or research of all or part of this material is granted to individual readers and to nonprofit libraries acting for them provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. To otherwise reprint a figure, table, other substantial excerpt, or the entire work requires specific permission, as does republication, or systematic or multiple reproduction.
Research supported by NSF grant MCS-76-07683.
Author's present address: Department of Electrical Engineering, Rice University, Houston, TX 77001.
© 1978 ACM 0001-0782/78/0800-0657 $00.75

657  Communications of the ACM, August 1978, Volume 21, Number 8
This time requirement may be reduced if we allow more processors to help in the computation. In the limit, there will be a minimum time requirement.

We shall present an algorithm design technique which dramatically exemplifies the three-way tradeoff between space, time, and processors. It is hoped that this technique can also be applied to strictly serial algorithms.

Muller and Preparata [13] were the first to exhibit a network capable of sorting n numbers in time O(log n). Their method required O(n^2) processing elements. We shall present an algorithm for sorting n numbers in time O(log n) that requires asymptotically fewer processors.

We first present parallel bucket-sorting algorithms in which, at the expense of greater space requirements, the number of processors and the amount of time used are both reduced. The algorithms are unusual in that the space requirements are greater than the processor-time requirements.

Our computational model assumes that all processors have access to a common memory as well as having small local memories. All processors are synchronized and follow the instructions of a unique instruction stream. This model has been called an SIMD (Single Instruction stream, Multiple Data stream) computer [7]. The instructions may involve memory references or constants that are a linear function of the bits in the binary representation of the processor number, the processors being numbered by consecutive integers starting from zero. We assume that the addition of two numbers can be executed in one time unit.

Our preliminary algorithm will sort n numbers using n (parallel) processors in time O(log n) under the assumption that the numbers that are to be sorted, {c_i}, are from {0, 1, ..., m-1}, and with the proviso that duplicate numbers (should there be any) are to be discarded. The proviso will be dropped in the second algorithm. This is a parallel version of the "bucket sort."

An obvious implementation of the parallel bucket sort would be for each processor p_i (which has temporarily been "assigned to" c_i, the ith number being sorted) to place the value i in bucket c_i. The problem with this solution is that, in general, there may be several values of i with identical numbers c_i. A memory conflict would result from the simultaneous attempts of several processors to store different values of i into the same bucket.

Our answer to this problem is to eliminate duplicate copies of the same number. Processor p_i will be (temporarily) deactivated if there is another processor p_j whose index, j, is smaller than i, and c_j = c_i. Then, for each number appearing among the numbers being sorted, only one processor (the one with smallest index) will be active when we place i in bucket c_i.

Our implementation of the elimination procedure is interesting. We shall have m areas of memory, one for each bucket. Each area will be of size n, the number of input numbers to be sorted. Within each area, j, the processors p_i having c_i = j will leave marks indicating their presence. Then, in a binary-tree fashion, they will search for the presence of (the marks of) other active processors. If two processors discover each other's presence (such discoveries will turn out to be simultaneous), the lower-ranking one (i.e. the one with smaller index) will continue while the higher-ranking one will deactivate.

There being n locations per area (numbered 0 through n - 1), each processor p_i can make its mark at location i in area c_i without fear of memory conflict. Iteratively, each processor then determines whether or not its "buddy" is active within the same area. (Here, "buddy" is defined analogously to the definition used in the Buddy System for dynamic memory allocation [10].) If so, then the processor with higher rank (i.e. larger index) will deactivate. If the buddy is not active, or if it is active but is of higher rank, then the processor will continue, shifting its mark to the location of the buddy if that location is of lower index than the one currently in use. If the buddy was active, then, of course, no shift will occur.

After the kth iteration, a mark will be present at each location whose last k bits are zeros and whose other (log n) - k bits coincide with the corresponding bits of the address of a processor active in the same area. Thus, each such location will be marked iff any of 2^k processors had been active in that area originally. After log n iterations, the first location in an area will be marked iff any of the n processors originally were active in that area, i.e. iff any of the n numbers to be sorted was j, the area (bucket) number.

The algorithm is expressed formally below. Variables in capitals are in common memory; variables in lower case are in local memory (i.e. there will be one copy of each such variable for each processor).

Algorithm 1--parallel bucket sort
Input:  0 <= i <= n-1   A[j, i] = 0
        0 <= j <= m-1   B[j] = 0
        c_i in {0, 1, ..., m-1}, not necessarily distinct
Output: 0 <= i <= n-1   A[j, i] = 0
        0 <= j <= m-1   B[j] = min i s.t. c_i = j, 0 if none such
Let e_k = 0...010...0, all bits 0 except the kth from the right

for all i do
    Let i = x_{log n} ... x_2 x_1 be the binary representation of i
    x <- i                     -- x is the location that p_i is marking
    A[c_i, x] <- 1
    flag <- 1                  -- flag = 1 iff p_i is active
    for k <- 1 step 1 until log n do
        begin
        buddy <- x XOR e_k     -- address of buddy
        count <- A[c_i, buddy] -- count != 0 if buddy is active
        if x_k = 1 AND count != 0 then
            flag <- 0          -- buddy is active and we are the higher of
                               -- the two: we wait and buddy continues
        if x_k = 1 AND count = 0 AND flag = 1 then
            begin              -- buddy is not active, so there is no
            A[c_i, x] <- 0     -- problem in using its space
            x <- buddy
            A[c_i, x] <- 1
            end
        end
    if flag = 1 then B[c_i] <- i
    A[c_i, x] <- 0
It is noted that this bucket-sort algorithm requires space S = O(mn), time T = O(log n), and the use of n processors.

We now present another bucket-sort algorithm. This algorithm will give the actual ranking of the input numbers, equal numbers being kept in the same relative order but getting different ranks, assuming that the input numbers are from a predefined small set.

The algorithm follows the same basic pattern set by our previous algorithm. However, instead of a simple mark bit, we shall keep a running count of how many processors were originally active in each block of indices of size 2^k. If a processor encounters an active buddy, then only the lower buddy continues to be active. In any case, all processors p_i (active or not) will add to their running count the number of processors (that were originally active) having indices greater than i. Active processors keep their count at the head of the largest block that they have investigated (which will be of size 2^k). At the end there will be at most one active processor per area, and A[j, 0] will be the number of different i's such that c_i = j. An inactive processor, p_i, will keep its count at the head of the largest block which had no other processors of index smaller than i.

Algorithm 2.1--parallel bucket sort (part 1)
Input:  0 <= i <= n-1   A[j, i] = 0
        0 <= j <= m-1   B[j] = 0
        c_i in {0, 1, ..., m-1}, not necessarily distinct
Local output:
        r = max k s.t. there is no t in the same 2^k-block as i
            with t < i and c_t = c_i
        y = head of the 2^r-block containing i
Output: 0 <= i <= n-1   A[j, i] = 0 except A[c_i, y] = # of t >= i s.t. c_t = c_i
        0 <= j <= m-1   B[j] = # of i s.t. c_i = j, 0 if none such

for all i do
    flag <- 1
    r <- log n
    x <- i         -- x is the head of the largest block that p_i has counted
    y <- i         -- y is the location at which p_i is accumulating that count
    A[c_i, y] <- 1 -- A[c_i, y] will hold # of t >= i s.t. c_t = c_i
    for k <- 1 step 1 until log n do
        begin
        buddy <- x XOR e_k
        count <- A[c_i, buddy]
        if x_k = 0 then A[c_i, y] <- A[c_i, y] + count
                               -- buddy is at higher index: accumulate its count
        else (x_k = 1)
            begin
            x <- buddy         -- buddy is head of the 2^k-size block
            if count != 0      -- buddy is active: we are not the lowest
                               -- ranked in this 2^k-block
                then if flag = 1 then [flag <- 0; r <- k - 1]
                else null      -- (eliminates "dangling else")
            else (count = 0)   -- buddy is inactive
                if flag = 1 then
                    begin      -- if we are still active, our count must be
                               -- moved to the head of the block
                    A[c_i, x] <- A[c_i, y]
                    A[c_i, y] <- 0
                    y <- x
                    end (of then block)
            end (of else x_k = 1)
        end (of for loop)
    if flag = 1 then B[c_i] <- A[c_i, y]   -- y will be zero here

At this point we have isolated one representative of each number that appears among the numbers to be sorted, and we have obtained a count of how many times each number occurs. We will now accumulate the counts (for each number c_i) of all numbers that are greater than c_i in order to know the actual ranking of the numbers, assuming that duplicate numbers will be kept. This accumulation will be done in a manner similar to that used previously.

Algorithm 2.2--parallel bucket sort (part 2)
Input:  from Algorithm 2.1
Output: 0 <= i <= n-1   D[i] = sorted position of c_i, only for the first
        instance of each c_i; larger values of c_i will have smaller values
        of D = (# of k s.t. c_k > c_i) + 1

for all i do
    D[i] <- 0
    if flag = 1 then
        begin
        Let c_i = w_{log m} ... w_2 w_1 be the binary representation of c_i
        flag2 <- 1
        w <- c_i       -- w is the head of the largest block that p_i has counted
        z <- c_i       -- z is the location at which p_i is accumulating that
                       -- count; B[z] now = # of k s.t. c_k = z
        for k <- 1 step 1 until log m do
            begin
            buddy <- w XOR e_k
            count <- B[buddy]      -- B[z] will = # of k s.t. c_k >= z
            if w_k = 0 then
                B[z] <- B[z] + count
            else (w_k = 1)
                begin
                w <- buddy
                if count != 0 then
                    flag2 <- 0
                else if flag2 = 1 then
                    begin
                    B[w] <- B[z]
                    B[z] <- 0
                    z <- w
                    end (of then block)
                end (of else w_k = 1)
            end (of for loop)
        D[i] <- B[z] - A[c_i, y] + 1
        A[c_i, y] <- B[z]
        B[z] <- 0
        end (of if flag = 1)
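The rank that this counting machinery builds can be stated serially: each c_i receives D[i] = (# of k with c_k > c_i) + (# of k <= i with c_k = c_i), so larger values come first and equal values keep their input order. The quadratic-time function below is our own restatement for checking expected output, not the paper's O(log n + log m) parallel procedure.

```python
# Serial restatement of the D-values that Algorithm 2 computes in parallel:
# D[i] = (# of k with c_k > c_i) + (# of k <= i with c_k = c_i).
def d_values(c):
    D = []
    for i, ci in enumerate(c):
        greater = sum(1 for ck in c if ck > ci)                 # strictly larger keys
        earlier_equal = sum(1 for ck in c[:i + 1] if ck == ci)  # ties up through index i
        D.append(greater + earlier_equal)
    return D

# Duplicates get distinct ranks and keep their relative order:
print(d_values([5, 1, 5, 3, 1, 3, 0, 5]))
# prints [1, 6, 2, 4, 7, 5, 8, 3]
```

Note that the D-values are always a permutation of 1..n, which is what lets the final placement step run without memory-store conflicts.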

At this point, the representative of each number that appears among the {c_i} has a count of the total number of c_i's that are greater than it plus the number of c_i's that are equal to it. Each of the duplicates has a count of the number of c_i's that are equal to it but of higher index. The D-value, i.e. rank, is just the difference of these two quantities plus one.

The D-value of the representatives was calculated at the end of Algorithm 2.2, and so the D-values of the duplicates can be calculated by:

for all i do
    if flag = 0 then D[i] <- A[c_i, 0] - A[c_i, y] + 1

If we insist that not more than one processor may simultaneously access a location, not even for fetches, then the D-values of duplicates can be evaluated using the reverse of the procedure of Algorithm 2.1. This is done below in Algorithm 2.3.

Algorithm 2.3--parallel bucket sort (part 3)
Input:  from Algorithm 2.2
Output: 0 <= i <= n-1   A[j, i] = 0
        D[i] = sorted position of c_i
             = (# of k s.t. c_k > c_i) + (# of k <= i s.t. c_k = c_i)
        -- larger values of c_i will have smaller values of D;
        -- equal values of c_i will keep their relative order

for all i do
    for k <- (log n) - 1 step -1 until 0 do
        if r = k then
            begin                  -- get value of A[c_i, 0] from buddy location
            x <- y XOR e_{k+1}
            D[i] <- A[c_i, x] - A[c_i, y] + 1
            A[c_i, y] <- A[c_i, x]
            end
        else if r > k then
            begin
            x <- y OR (i AND e_k)
            if x != y then
                begin
                A[c_i, x] <- A[c_i, y]
                A[c_i, y] <- 0
                y <- x
                end (of then block)
            end (of if r > k)
        (end of for loop)
    A[c_i, y] <- 0

It is noted that Algorithm 2 (the sequence of Algorithms 2.1, 2.2, and 2.3) requires space S = O(mn), time T = O(log n + log m), and the use of n processors.

The algorithms given above assume that an area (A) of memory has been initialized to zero. This is not unreasonable: many instances of this algorithm can be executed one after another, and the memory will be clear upon the termination of each program.

However, there are methods which make it unnecessary to initialize the area. For serial programs, one can include at each location a pointer to a backpointer on a stack. Each time an entry is accessed, verification can be made that the contents are not random by checking that the pointer in that entry points to the active region on the stack and that the backpointer points to the entry [1].

This method is also valid for parallel programs unless we add the restriction that simultaneous multiple access to the same location is prohibited even for memory-fetch instructions. It is possible that several entries that are accessed in parallel may have random contents, more than one of which points to the same location. A memory-fetch conflict would ensue.

For situations similar to that in Algorithm 2, the following is a possible solution. At each step of the process, each active processor can initialize the location its buddy is working on (to zero), then reinitialize the contents of the location it is working on (to its latest value). A location will thus be initialized if either of the two processes to which it might be relevant is active.

We now present algorithms that will sort n arbitrary numbers in time O(log n). They are based on an extension of an algorithm due to Gavril [8] that merges two linearly ordered sets in time O(log n). Our first algorithm to do this, Algorithm 3, will require the use of n^{3/2} processors.

Algorithm 3--parallel sort using n^{3/2} processors
Input:  0 <= i <= n-1, c_i in integers
Output: {c_i} will be stably sorted, smallest first

1. Partition the n input numbers into n^{1/2} groups, each having n^{1/2} elements.
2. Within each group do:
   For each element, j, determine count[j] = (# of i such that c_i < c_j) + (# of i <= j such that c_i = c_j). This can be done in time O(log n) using n^{1/2} processors per element (a total of n processors per group, or n^{3/2} processors in toto). The n^{1/2} processors for element j will be assigned, one to each element i in j's group, to compare c_i with c_j. Summing the results of these comparisons can be done in time O(log n).
3. Within each group do:
   Bucket sort the elements, using count[j] as the key for the jth element in the group. This is done by: c_{count[j]} <- c_j, where j and count[j] are offsets (of value at most n^{1/2}) from the beginning of each group. There will be no memory conflicts since the count[j]'s within a group are all distinct. Steps 2 and 3 have effectively sorted the elements within each group using an "enumeration sort" [11].
4. All elements do a binary search of the n^{1/2} groups. That is, each element (c_j) in group g has n^{1/2} processors which are assigned, one to each group, to do a binary search on the elements in a group (which are sorted) so as to determine, for all groups k, the value of
   count[j, k] = if k < g, # of elements i such that c_i <= c_j
                 if k = g, j
                 if k > g, # of elements i such that c_i < c_j
   where c_i refers to the ith element in group k and c_j is fixed.
5. For all elements, j, evaluate count[j] = sum (over k) of count[j, k]. This can be done in time O(log n) and requires n^{1/2} processors per element, for a total of n^{3/2} processors.
6. Do a bucket sort on all n elements using count[j] as the key for the jth element. Again, there will be no memory conflicts since count[j] will be the rank of the jth element.
7. END of Algorithm 3.

We note that Algorithm 3 requires time O(log n) and the use of n^{3/2} processors. We now show a simple modification of Algorithm 3 which will use the same order of magnitude of time and require only n^{4/3} processors.

Algorithm 4--parallel sort using n^{4/3} processors

1. Partition the n input numbers into n^{2/3} groups, each having n^{1/3} elements.
2. Within each group do:
   For each element, j, determine count[j] = (# of i such that c_i < c_j) + (# of i <= j such that c_i = c_j).
3. Within each group do:
   Bucket sort the count[j]'s obtained in step 2. This will rearrange the elements in rank order within each group.
4. Divide the n^{2/3} groups into n^{1/3} sectors, each sector consisting of n^{1/3} groups.
5. Within each sector do:
   For each element (j) in group g, do a binary search of each of the n^{1/3} groups in j's sector to determine, for all k, the value of
   count[j, k] = if k < g, # of i in group k such that c_i <= c_j
                 if k = g, j
                 if k > g, # of i in group k such that c_i < c_j.
   Then, for each element j, evaluate count[j] = (# of i in j's sector such that c_i < c_j) + (# of i <= j in j's sector such that c_i = c_j). This number is simply the sum (over k) of count[j, k].
6. Within each sector, do a bucket sort of the elements within the sector using count[j] as the key for element j. This will rearrange the elements in rank order within each sector.
7. For all elements (j) in sector t, do a binary search of each of the n^{1/3} sectors to determine, for all k, the value of
   count[j, k] = if k < t, # of i in sector k such that c_i <= c_j
                 if k = t, j
                 if k > t, # of i in sector k such that c_i < c_j.
   Then evaluate count[j] = the sum (over k) of count[j, k].
8. Do a bucket sort of all n elements.
9. END of Algorithm 4.

In a like manner, we can exhibit an algorithm to sort n numbers in O(k log n) time that uses n^{1+1/k} processors. Interestingly, by setting k = log n (initially splitting the n elements into n/2 groups of 2 each), we obtain an algorithm to sort n numbers in O(log^2 n) time using O(n) processors, the same resources used by Batcher's algorithms.

We note that these algorithms, although avoiding memory-store conflicts, do have memory-fetch conflicts. That is, we allow more than one processor to simultaneously access the same memory location. As an open problem, we pose the question: Can n numbers be sorted in time O(log n) if memory-fetch conflicts are not permitted?

Received December 1976; revised September 1977

References
1. Aho, A.V., Hopcroft, J.E., and Ullman, J.D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1973, p. 71.
2. Arjomandi, E. A study of parallelism in graph theory. Ph.D. Th., Dept. of Comptr. Sci., U. of Toronto, Toronto, Ont., Dec. 1975.
3. Batcher, K.E. Sorting networks and their applications. Proc. AFIPS 1968 SJCC, Vol. 32, AFIPS Press, Montvale, N.J., pp. 307-314.
4. Brent, R.P. The parallel evaluation of general arithmetic expressions. J. ACM 21, 2 (April 1974), 201-206.
5. Csanky, L. Fast parallel matrix inversion algorithms. Proc. 16th Annual Symp. on Foundations of Comptr. Sci., IEEE, Berkeley, Calif., Oct. 1975, pp. 11-12.
6. Even, S. Parallelism in tape-sorting. Comm. ACM 17, 4 (April 1974), 202-204.
7. Flynn, M.J. Very high-speed computing systems. Proc. IEEE 54 (Dec. 1966), 1901-1909.
8. Gavril, F. Merging with parallel processors. Comm. ACM 18, 10 (Oct. 1975), 588-591.
9. Hirschberg, D.S. Parallel algorithms for the transitive closure and the connected component problems. Proc. 8th Annual ACM Symp. on Theory of Comptng., Hershey, Pa., May 1976, pp. 55-57.
10. Knuth, D.E. The Art of Computer Programming, Vol. 1: Fundamental Algorithms. Addison-Wesley, Reading, Mass., Sec. Ed., 1973.
11. Knuth, D.E. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley, Reading, Mass., 1973.
12. Levitt, K.N., and Kautz, W.H. Cellular arrays for the solution of graph problems. Comm. ACM 15, 9 (Sept. 1972), 789-801.
13. Muller, D.E., and Preparata, F.P. Bounds to complexities of networks for sorting and for switching. J. ACM 22, 2 (April 1975), 195-201.
14. Munro, I., and Paterson, M. Optimal algorithms for parallel polynomial evaluation. J. Comptr. Syst. Sci. 7 (1973), 189-198.
15. Muraoka, Y., and Kuck, D.J. On the time required for a sequence of matrix products. Comm. ACM 16, 1 (Jan. 1973), 22-26.
16. Stone, H.S. Parallel processing with the perfect shuffle. IEEE Trans. Comptrs. C-20 (1971), 153-161.
17. Thompson, C.D., and Kung, H.T. Sorting on a mesh-connected parallel computer. Proc. 8th Annual ACM Symp. on Theory of Comptng., Hershey, Pa., May 1976, pp. 58-64.
18. Valiant, L.G. Parallelism in comparison problems. SIAM J. Comptng. 4, 3 (Sept. 1975), 348-355.

