S. The first step of a query algorithm compares the x-value of the query point with the line L; the two possible outcomes are illustrated in Figure 5 as points u and v. If the point lies to the left of L (as u does), then we find its rank in S by recursively searching the substructure representing A, for it cannot dominate any point in B. If the point lies to the right of L (as v does), then searching B tells how many points in B are dominated by v, but we still must find how many points in A are dominated by v. To do this we need only calculate v's y-rank in A; this is illustrated in Figure 6.

We can now describe the planar ECDF tree more precisely. An internal node representing a set of N points will contain an x-value (representing the line L), a pointer to a left son (representing A, the N/2 points with lesser x-values), a right son representing B, and an array of the N/2 points of A sorted by y-value. To build an ECDF tree recursively one divides the set into A and B, builds the subtrees representing each, and then sorts the elements of A by y-value (actually by presorting). To search the tree recursively one first compares the x-value of the node with the x-value of the query point. If the query point is less, then only the left son is searched recursively. If the value is greater, then the right son is searched recursively, a binary search is done in the sorted y-sequence representing A to find the query point's y-rank in A, and the two ranks are added together and returned as the result.

To analyze this search structure we again use recurrences. In counting the preprocessing cost we note that the recurrence describing the algorithm (with presorting) is

P(N) = 2P(N/2) + O(N)

and the solution is

P(N) = O(N lg N).

To store an N element set we must store two N/2 element sets plus one sorted list of N/2 elements, so the recurrence is

S(N) = 2S(N/2) + N/2

which has solution

S(N) = O(N lg N).

In analyzing the search time our recurrence will depend on whether the point lies in A or B, so we assume it lies in B and analyze its worst case. In this case we must make one comparison, perform a binary search in a structure of size N/2, and then recursively search a structure of size N/2. The cost of this will be

Q(N) = Q(N/2) + O(lg N)

so we know that the worst-case cost of searching is

Q(N) = O(lg^2 N).

Having analyzed the performance of the planar ECDF tree, we can turn our attention to higher-dimensional ECDF searching problems.

A node representing an N-element ECDF tree in 3-space contains two subtrees (each representing N/2 points in 3-space) and a two-dimensional ECDF tree (representing the projection of the points in A onto the cut plane P). This structure is built recursively (analogous to the ECDF3 algorithm). The searching algorithm compares the query point's x-value to the value defining the cut plane, and if less, searches only the left substructure. If the query point lies in B, then the right substructure is searched, and a search is done in the two-dimensional ECDF tree. The full k-dimensional structure is analogous: A node in this structure contains two substructures of N/2 points in k-space, and one substructure of N/2 points in (k-1)-space. The recurrences describing the structure containing N points in k-space are

P(N, k) = 2P(N/2, k) + P(N/2, k-1) + O(N),
S(N, k) = 2S(N/2, k) + S(N/2, k-1) + O(1),
Q(N, k) = Q(N/2, k) + Q(N/2, k-1) + O(1).

We can use the performance of the two-dimensional structure as a basis for induction on k, and thus establish (for fixed values of k) that

P(N, k) = O(N lg^{k-1} N),
S(N, k) = O(N lg^{k-1} N),
Q(N, k) = O(lg^k N).

It is interesting to note how faithfully the actions of the multidimensional divide-and-conquer algorithms are described by the recurrences. Indeed, the recurrences might
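The planar ECDF tree described above (a cut value for L, two sons for A and B, and the y-values of A in sorted order) can be sketched in Python as follows. This is only an illustration under stated assumptions: all x-values are taken to be distinct, the rank of a query point counts the points strictly less than it in both coordinates, and every name (ECDFNode, build_ecdf_tree, rank) is ours rather than the paper's. For brevity the sketch sorts A at each node instead of presorting, so its build cost exceeds the O(N lg N) bound obtained with presorting.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class ECDFNode:
    """Node of a planar ECDF tree; all field names are illustrative."""
    points: List[Point]                  # holds the single point at a leaf
    cut_x: float = 0.0                   # x-value of the dividing line L
    left: Optional["ECDFNode"] = None    # subtree for A (lesser x-values)
    right: Optional["ECDFNode"] = None   # subtree for B (greater x-values)
    a_ys: Optional[List[float]] = None   # y-values of A in sorted order


def build_ecdf_tree(points: List[Point]) -> ECDFNode:
    """Build the tree, assuming all x-values are distinct."""
    return _build(sorted(points))


def _build(pts: List[Point]) -> ECDFNode:
    if len(pts) <= 1:
        return ECDFNode(points=list(pts))
    mid = len(pts) // 2
    a, b = pts[:mid], pts[mid:]
    # Sorting A here is for brevity; presorting by y, as in the text,
    # would avoid the extra sort at every node.
    return ECDFNode(points=[], cut_x=b[0][0], left=_build(a),
                    right=_build(b), a_ys=sorted(y for _, y in a))


def rank(node: ECDFNode, q: Point) -> int:
    """Count the points strictly dominated by q in both coordinates."""
    qx, qy = q
    if node.left is None:  # leaf: compare directly
        return sum(1 for x, y in node.points if x < qx and y < qy)
    if qx < node.cut_x:
        # q lies left of L, so it cannot dominate any point of B.
        return rank(node.left, q)
    # q lies right of L: every point of A has a lesser x-value, so the
    # contribution of A is q's y-rank in A, found by binary search.
    return rank(node.right, q) + bisect_left(node.a_ys, qy)
```

Each call does one comparison, at most one binary search, and one recursive call on half the points, which is the pattern counted by the recurrence Q(N) = Q(N/2) + O(lg N).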
performing the same task rated on the two dimensions of space efficiency and time efficiency. If we plot these measures as points in the x-y plane, then a point (program) dominates another only if it is more space efficient and more time efficient. The maximal programs of the set are the only ones we might consider for use, because any other program is dominated by one of the maxima. In general, if we are seeking to maximize some multivariate goodness function (monotone in all variables) over some finite point set, then it suffices to consider only maxima of the set. This observation can significantly decrease the cost of optimization if many optimizations are to be performed. Such computation is common in econometric problems.

Problems about maxima are very similar to problems about ECDFs. If we define the negation of point set A (written -A) to consist of each of the points of A multiplied by -1, then a point is a maximum of A if and only if its rank in -A is zero (for if it is dominated by no points in A, then it dominates no points in -A). By this observation we can solve the all-points maxima problem in O(N lg^{k-1} N) time and the maxima searching problem with similar preprocessing time and space and O(lg^k N) query time, by using the ECDF algorithms of Section 2.1. In this section we investigate a different multidimensional divide-and-conquer algorithm that allows us to reduce those cost functions by a factor of O(lg N). The all-points maxima algorithm we will see is due to Kung et al. [17] (although our presentation is less complicated than theirs). The searching structure of this section is described here for the first time. Although the algorithms that we will see are similar to the ECDF algorithms of the last section in many respects, they do have some interesting expected-time properties that the ECDF algorithms do not have. Having made these introductory comments, we can now turn our attention to the maxima problems, investigating first the all-points problem and then the searching problem.

The maximum of N points on a line is just the maximum element of the set, which can be found in exactly N - 1 comparisons. Computing the maxima of N points in the plane is just a bit more difficult. Looking at Figure 7, we notice that the maxima (circled) are increasing upward as the point set is scanned right to left. This suggests an algorithm: Sort the points into increasing x-order and then scan that sorted list right to left, observing successive "highest y-values so far observed" and marking those as maxima. It is easy to prove that this algorithm gives exactly the maxima, for a point is maximal if and only if all points with greater x-values (before it on the list) have lesser y-values. The computational cost of the algorithm will be O(N lg N) for the sort and then O(N) for the scan. (So note that if we have presorted the list, then the total time for finding the maxima is linear.)

We can also develop a multidimensional divide-and-conquer algorithm to solve the planar problem. As before, we divide by L into A and B and solve those subproblems recursively (finding the maxima of each set). This is illustrated in Figure 8, in which the maxima of A are circled and the maxima of B are in boxes. Because no point in B is dominated by any point in A, the maxima of B are also maxima of the entire set S. Thus the third step (the "marriage" step) of our algorithm must discard points which are maxima of A but not of the whole set, i.e., those maxima of A which are dominated by some point in B. Since all points in B x-dominate all points in A, we need check only for y-domination. We therefore project the maxima of A and B onto L, then discard A-points dominated by B-points on the line. This third step can be easily implemented by just comparing the y-value of all A-maxima with the maximum y-value of the B-maxima and discarding all A's with lesser y-value (we described it otherwise to ease the transition to higher spaces). The running time of this algorithm is described by the recurrence

T(N) = 2T(N/2) + O(N)

which has solution O(N lg N).

We can generalize the planar algorithm to yield a maxima algorithm for 3-space. The first step divides into A and B, and the second step recursively finds the maxima of each of those sets. Since every maximum of B is a maximum of the whole set, the third step must discard every maximum of A which is dominated by a maximum of B. This is accomplished by projecting the respective maxima sets onto the plane and then solving the planar problem. We could modify the two-dimensional maxima algorithm to solve this task, but it will be slightly more efficient to use the "scanning" algorithm. Suppose we
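The two planar maxima algorithms described above, the right-to-left scan and the divide-and-conquer version with its "marriage" step, can be sketched as follows. This is a minimal illustration that assumes all x-values and all y-values are distinct; the function names maxima_scan and maxima_dc are ours.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def maxima_scan(points: List[Point]) -> List[Point]:
    """Sort by x, scan right to left, keep each new highest y-value."""
    result = []
    best_y = float("-inf")
    for p in sorted(points, reverse=True):  # decreasing x-value
        if p[1] > best_y:                   # highest y observed so far
            result.append(p)
            best_y = p[1]
    return result


def maxima_dc(points: List[Point]) -> List[Point]:
    """Divide-and-conquer maxima; sorts once, then recurses on halves."""
    return _maxima_dc(sorted(points))


def _maxima_dc(pts: List[Point]) -> List[Point]:
    if len(pts) <= 1:
        return list(pts)
    mid = len(pts) // 2
    max_a = _maxima_dc(pts[:mid])   # maxima of A (lesser x-values)
    max_b = _maxima_dc(pts[mid:])   # maxima of B (greater x-values)
    # Marriage step: every point of B x-dominates every point of A, so a
    # maximum of A survives only if its y-value exceeds the largest
    # y-value among the maxima of B.
    top_b = max(y for _, y in max_b)
    return [p for p in max_a if p[1] > top_b] + max_b
```

On the points (1,5), (2,3), (3,4), (4,1), (5,2) both functions report the maxima (1,5), (3,4), and (5,2), though in different orders.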
and dominate point L." This kind of query is usually called an orthogonal range query because we are in fact giving for each dimension i a range R_i = [l_i, u_i] and then asking the search to report all points x such that x_i is in range R_i for all i. A geometric interpretation of the query is that we are asking for all points that lie in a given hyper-rectangle. Such a search might be used in querying a geographic database to list all cities with latitude between 37° and 41° N and longitude between 102° and 109° W (this asks for all cities in Colorado). In addition to database problems, range queries are also used in certain statistical applications. These applications and a survey of the different approaches to the problem are discussed in Bentley and Friedman's [5] survey of range searching. The multidimensional divide-and-conquer technique that we will see has also been applied to this problem by Lee and Wong [18], Lueker [20], and Willard [28], who independently achieved structures very similar to the ones we describe.

In certain applications of the range searching problem we are not interested in actually processing each point found in the query rectangle; it suffices rather to know only how many such points there are. (One such example is multivariate density estimation.) Such a problem can be solved by using the ECDF searching algorithm of Section 2.1 and the principle of inclusion and exclusion. Figure 9 illustrates how four planar rank queries can be combined to tell the number of points in rectangle R (we use "r" as an abbreviation for "rank"); in k-space 2^k range searches are sufficient.

The sorted array is one suitable structure for range searching in one-dimensional point sets. The points are organized into increasing order exactly as they were for the ECDF searching problem of Section 2.1. To answer a query we do two binary searches in the array to locate the positions of the low and high ends of the range; this identifies a sequence of points in the array which are the answer to the query, and they can then be reported by a simple procedure. The analysis of this structure for range searching is very similar to our previous analysis: The storage cost is linear and the preprocessing cost is O(N lg N). The query cost is then O(lg N) for the binary searches plus O(F), if a total of F points are found to be in the region. Note that any algorithm for range searching must include a term of O(F) in the analysis of query time.

We will now describe range trees, a structure introduced by Bentley [4]; as usual, we first examine the planar case. There are six elements in a range tree's node describing set S. These values are illustrated in Figure 10. The reals LO and HI give the minimum and maximum x-values in the set S (these are accumulated "down" the tree as it is built). The real MID holds the x-value defining the line L, which divides S into A and B, as usual; we then store two pointers to range trees representing the sets A and B. The final element stored in the node is a pointer to a sorted array, containing the points of S sorted by y-value. A range tree can be built recursively in a manner similar to constructing an ECDF tree. We answer a range query asking for all points with x-value in range X and y-value in range Y by visiting the root of the tree with the following recursive procedure. When visiting node N we compare the range X to the range [LO, HI]. If [LO, HI] is contained in X, then we can do a range search in the sorted array for all points in the range Y (all these points satisfy both the X and Y ranges). If the X range lies wholly to one side of MID, then we search only the appropriate subtree (recursively); otherwise we search both subtrees. If one views this recursive process as happening all at once, we see that we are performing a set of range searches in a set of arrays sorted by y. The preprocessing costs of this structure and the storage costs are both O(N lg N). To analyze the query cost we note that at most two sorted lists are searched at each of the lg N levels of the tree, and each of those searches costs at most O(lg N), plus the number of points found during that search. The query cost of this structure is therefore O(lg^2 N + F), where F (as before) is the number of points found in the desired range.

The range tree structure can of course be generalized to k-space. Each node in such a range tree contains pointers to two subtrees representing N/2 points in k-space and one N point subtree in (k-1)-space. Analysis of range trees shows that

P(N, k) = O(N lg^{k-1} N),
S(N, k) = O(N lg^{k-1} N),
Q(N, k) = O(lg^k N + F)

where F is the number of points found.
223 Communications of the ACM, April 1980, Volume 23, Number 4
Fig. 11. Fixed-radius near neighbor algorithm.

Saxe [24] has used the decision tree model of computation to show a lower bound on the range searching problem of approximately 2k lg N. Bentley and Maurer [7] have given a range searching data structure that realizes this query time, at the cost of extremely high storage and preprocessing requirements. An interesting open problem is to give bounds on the complexity of this problem in the presence of only limited space (or pre-
this direction.
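The counting form of range searching mentioned earlier in this section, in which four planar rank queries are combined by inclusion and exclusion, can be sketched as follows. The sketch makes assumptions of its own: the rank of a point counts the points strictly less than it in both coordinates, the rectangle is half-open, and the brute-force rank function merely stands in for an ECDF search structure. Both function names are ours.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rank(points: List[Point], q: Point) -> int:
    """Stand-in for an ECDF search: points strictly below q in x and y."""
    return sum(1 for x, y in points if x < q[0] and y < q[1])


def count_in_rectangle(points: List[Point], x_lo: float, x_hi: float,
                       y_lo: float, y_hi: float) -> int:
    """Count points with x_lo <= x < x_hi and y_lo <= y < y_hi."""
    return (rank(points, (x_hi, y_hi))    # everything below the top-right corner
            - rank(points, (x_lo, y_hi))  # remove the strip left of the rectangle
            - rank(points, (x_hi, y_lo))  # remove the strip below the rectangle
            + rank(points, (x_lo, y_lo))) # restore the corner removed twice
```

Replacing the stand-in with a planar ECDF tree search would answer each count with four O(lg^2 N) rank queries and no O(F) reporting term.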
3. Closest-Point Problems
4. Additional Work

In Sections 2 and 3 we saw many aspects of the multidimensional divide-and-conquer paradigm, but there are many other aspects that can only be briefly mentioned. The paradigm has been used to create fast

9 The author cannot resist pointing out that the planar divide-and-conquer paradigm is also used by police officers. Murray [22] offers the following advice in a hypothetical situation: "A crowd of rioters far outnumbers the police assigned to disperse it. If you were in command, the best action to take would be to split the crowd into two or more parts and disperse the parts separately." An interesting open problem is to apply other algorithmic paradigms to problems in police work, thus establishing a discipline of "computational criminology."
References
1. Aho, A.V., Hopcroft, J.E., and Ullman, J.D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
2. Bentley, J.L. Multidimensional binary search trees used for associative searching. Comm. ACM 18, 9 (Sept. 1975), 509-517.
3. Bentley, J.L. Divide and conquer algorithms for closest point problems in multidimensional space. Unpublished Ph.D. dissertation, Univ. of North Carolina, Chapel Hill, N.C., 1976.
4. Bentley, J.L. Decomposable searching problems. Inform. Proc. Letters 8, 5 (June 1979), 244-251.
5. Bentley, J.L., and Friedman, J.H. Algorithms and data structures for range searching. Comptng. Surv. 11, 4 (Dec. 1979), 397-409.
6. Bentley, J.L., Kung, H.T., Schkolnick, M., and Thompson, C.D. On the average number of maxima in a set of vectors and applications. J. ACM 25, 4 (Oct. 1978), 536-543.
7. Bentley, J.L., and Maurer, H.A. Efficient worst-case data structures for range searching. To appear in Acta Informatica (1980).
8. Bentley, J.L., and Shamos, M.I. Divide and conquer in multidimensional space. In Proc. ACM Symp. Theory of Comptng., May 1976, pp. 220-230.
9. Bentley, J.L., and Shamos, M.I. A problem in multivariate statistics: Algorithm, data structure, and applications. In Proc. 15th Allerton Conf. Communication, Control, and Comptng., Sept. 1977, pp. 193-201.
10. Blum, M., et al. Time bounds for selection. J. Comptr. Syst. Sci. 7, 4 (Aug. 1972), 448-461.
11. Dobkin, D., and Lipton, R.J. Multidimensional search problems. SIAM J. Comptng. 5, 2 (June 1976), 181-186.
12. Fredman, M. A near optimal data structure for a type of range query problem. In Proc. 11th ACM Symp. Theory of Comptng., April 1979, pp. 62-66.
13. Fredman, M., and Weide, B.W. On the complexity of computing the measure of ∪[a_i, b_i]. Comm. ACM 21, 7 (July 1978), 540-544.
14. Friedman, J.H. A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comptrs. C-26, 4 (April 1977), 404-408.
15. Friedman, J.H. A nested partitioning algorithm for numerical multiple integration. Rep. SLAC-PUB-2006, Stanford Linear Accelerator Ctr., 1978.
16. Knuth, D.E. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley, Reading, Mass., 1973.
17. Kung, H.T., Luccio, F., and Preparata, F.P. On finding the maxima of a set of vectors. J. ACM 22, 4 (Oct. 1975), 469-476.
18. Lee, D.T., and Wong, C.K. Quintary trees: A file structure for multidimensional database systems. To appear in ACM Trans. Database Syst.
19. Lipton, R., and Tarjan, R.E. Applications of a planar separator theorem. In Proc. 18th Symp. Foundations of Comptr. Sci., Oct. 1977, pp. 162-170.
20. Lueker, G. A data structure for orthogonal range queries. In Proc. 19th Symp. Foundations of Comptr. Sci., Oct. 1978, pp. 28-34.
21. Monier, L. Combinatorial solutions of multidimensional divide-and-conquer recurrences. To appear in the J. of Algorithms.
22. Murray, J.A. Lieutenant, Police Department--The Complete Study Guide for Scoring High (4th ed.). Arco, New York, 1966, p. 184, question 3.
23. Reddy, D.R., and Rubin, S. Representation of three-dimensional objects. Carnegie-Mellon Comptr. Sci. Rep. CMU-CS-78-113, Carnegie-Mellon Univ., Pittsburgh, Pa., 1978.
24. Saxe, J.B. On the number of range queries in k-space. Discrete Appl. Math. 1, 3 (Nov. 1979), 217-225.
25. Shamos, M.I. Computational geometry. Unpublished Ph.D. dissertation, Yale Univ., New Haven, Conn., 1978.
26. Shamos, M.I. Geometric complexity. In Proc. 7th ACM Symp. Theory of Comptng., May 1975, pp. 224-233.
27. Weide, B. A survey of analysis techniques for discrete algorithms. Comptng. Surv. 9, 4 (Dec. 1977), 291-313.
28. Willard, D.E. New data structures for orthogonal queries. Harvard Aiken Comptr. Lab. Rep., Cambridge, Mass., 1978.
29. Yao, F.F. On finding the maximal elements in a set of planar vectors. Rep. UIUCDCS-R-74-667, Comptr. Sci. Dept., Univ. of Illinois, Urbana, July 1974.

Programming Techniques
R. Rivest, Editor

A Unifying Look at Data Structures

Jean Vuillemin
University of Paris-South

Examples of fruitful interaction between geometrical combinatorics and the design and analysis of algorithms are presented. A demonstration is given of the way in which a simple geometrical construction yields new and efficient algorithms for various searching and list manipulation problems.

Key Words and Phrases: data structures, dictionaries, linear list, search, merge, permutations, analysis of algorithms

CR Categories: 4.34, 5.24, 5.25, 5.32, 8.1

1. Introduction

Whenever two combinatorial structures are counted by the same number, there exist bijections (one-one mappings) between the two structures. One goal of geometrical combinatorics (see, for example, Foata and Schutzenberger [7]) is to explicitly construct such bijections. This is bringing the field very close to computer science: One can regard combinatorial representations of remarkable numbers as equivalent data structures; explicit bijections between such representations provide coding and decoding algorithms between the structures. Earlier investigations along these lines are reported in Françon et al. [10] and Flajolet et al. [6].

This paper should be regarded as an introduction to using methods of geometrical combinatorics in the field of algorithm design and analysis. For this purpose, we consider representation of n! as a running example and demonstrate how we are led to discovering new and efficient data structures and algorithms for solving various data manipulation problems.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

This work was supported by the National Center for Scientific Research (CNRS), Paris, under Grant 3941.

Author's address: J. Vuillemin, Laboratory for Information Research, Building 490, University of Paris-South, 91405 Orsay, France.

© 1980 ACM 0001-0782/80/0400-0229 $00.75.