Unit - 2 DAA
Brute Force – Computing a^n – String Matching - Closest-Pair and Convex-Hull Problems - Exhaustive Search -
Travelling Salesman Problem - Knapsack Problem - Assignment problem. Divide and Conquer Methodology – Binary
Search – Merge sort – Quick sort – Heap Sort - Multiplication of Large Integers – Closest-Pair and Convex - Hull
Problems.
Brute force is a straightforward approach to solving a problem, usually directly based on the
problem's statement and definitions of the concepts involved; for example, the straightforward algorithm to find the gcd of two numbers.
The brute-force approach is, in fact, an important algorithm design strategy for the following reasons:
• First, unlike some of the other strategies, brute force is applicable to a very wide variety of problems. It is used for many elementary but important algorithmic tasks such as computing the sum of n numbers, finding the largest element in a list, and so on.
• Second, for some problems it yields reasonable algorithms of at least some practical value with no limitation on instance size.
• Third, the expense of designing a more efficient algorithm may be unjustifiable if only a few instances of the problem need to be solved and a brute-force algorithm can solve them with acceptable speed.
• Fourth, even though it may be inefficient, it can still be used to solve small instances of a problem.
• Last, it can serve an important theoretical or educational purpose.
The closest-pair problem calls for finding the two closest points in a set of n points. It is the
simplest of a variety of problems in computational geometry that deals with proximity of points in the
plane or higher-dimensional spaces. Points in question can represent such physical objects as airplanes
or post offices as well as database records, statistical samples, DNA sequences, and so on. An air-
traffic controller might be interested in the two closest planes as the most probable collision candidates. A regional postal service manager might need a solution to the closest-pair problem to find candidate post-office locations to be closed.
One of the important applications of the closest-pair problem is cluster analysis in statistics.
Based on n data points, hierarchical cluster analysis seeks to organize them in a hierarchy of clusters
based on some similarity metric. For numerical data, this metric is usually the Euclidean distance; for
text and other nonnumerical data, metrics such as the Hamming distance are used. A bottom-up
algorithm begins with each element as a separate cluster and merges them into successively larger
clusters by combining the closest pair of clusters.
For simplicity, we consider the two-dimensional case of the closest-pair problem. We assume that the
points in question are specified in a standard fashion by their (x, y) Cartesian coordinates and that the
distance between two points pi(xi,yi) and pj(xj, yj ) is the standard Euclidean distance
d(pi, pj) = sqrt((xi − xj)^2 + (yi − yj)^2).
The brute-force approach to solving this problem leads to the following obvious algorithm: compute
the distance between each pair of distinct points and find a pair with the smallest distance. Of course,
we do not want to compute the distance between the same pair of points twice. To avoid doing so, we
consider only the pairs of points (pi, pj ) for which i < j.
Pseudocode below computes the distance between the two closest points; getting the closest points
themselves requires just a trivial modification.
ALGORITHM BruteForceClosestPair(P)
//Finds distance between two closest points in the plane by brute force
//Input: A list P of n (n ≥ 2) points p1(x1, y1), . . . , pn(xn, yn)
//Output: The distance between the closest pair of points
d ← ∞
for i ← 1 to n − 1 do
    for j ← i + 1 to n do
        d ← min(d, sqrt((xi − xj)^2 + (yi − yj)^2)) //sqrt is the square root function
return d
The basic operation of the algorithm is computing the square root. In the age of electronic
calculators with a square-root button, one might be led to believe that computing the square root is as
simple an operation as, say, addition or multiplication. Of course, it is not. For starters, even for most
integers, square roots are irrational numbers that therefore can be found only approximately. Moreover,
computing such approximations is not a trivial matter. But, in fact, computing square roots in the loop
can be avoided! (Can you think how?) The trick is to realize that we can simply ignore the square-root
function and compare the values (xi− xj )2 + (yi− yj )2 themselves.
We can do this because the smaller a number of which we take the square root, the smaller its
square root, or, as mathematicians say, the square-root function is strictly increasing. Then the basic
operation of the algorithm will be squaring a number. The number of times it will be executed is
C(n) = Σ(i=1 to n−1) Σ(j=i+1 to n) 2 = n(n − 1) ∈ Θ(n^2).
Of course, speeding up the innermost loop of the algorithm can only decrease the algorithm's running time by a constant factor, but it cannot improve its asymptotic efficiency class.
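To make this concrete, here is a short Python sketch of the brute-force algorithm (our own illustration; the function name and sample points are not from the text). It applies the square-root trick: squared distances are compared inside the loop, and a single square root is taken at the end.

from itertools import combinations

def brute_force_closest_pair(points):
    # Compare squared distances in the loop; sqrt is taken once at the end.
    best = float("inf")
    for (x1, y1), (x2, y2) in combinations(points, 2):  # all pairs with i < j
        best = min(best, (x1 - x2) ** 2 + (y1 - y2) ** 2)
    return best ** 0.5

# The closest pair here is (1, 1) and (2, 2), at distance sqrt(2).
print(brute_force_closest_pair([(0, 0), (5, 4), (1, 1), (2, 2)]))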
2.2.1 TRAVELLING SALESMAN PROBLEM
The problem asks to find the shortest tour through a given set of n cities that visits each city exactly once before returning to the city where it started. The problem can be stated as the problem of finding the shortest Hamiltonian circuit of a weighted graph, with the graph's vertices representing the cities and the edge weights specifying the distances.
A Hamiltonian circuit is defined as a cycle that passes through all the vertices of the graph exactly once. It can also be defined as a sequence of n + 1 adjacent vertices vi0, vi1, …, vin−1, vi0, where the first vertex of the sequence is the same as the last one while all the other n − 1 vertices are distinct. We can obtain the tours by generating all the permutations of the n − 1 intermediate cities, computing the tour lengths, and finding the shortest among them.
If we fix the starting city, a tour is fully specified by a permutation of the remaining n − 1 intermediate cities. Moreover, a tour and its reverse have the same length, so we may consider only tours in which, say, vertex B precedes vertex C; the total number of permutations is then (n − 1)!/2, which is impractical except for small values of n. If the starting vertex is not fixed, the number of permutations is larger still.
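A minimal Python sketch of this exhaustive search (the distance matrix below is made up for illustration). City 0 is fixed as the starting city, so only (n − 1)! tours are generated:

from itertools import permutations

def tsp_brute_force(dist):
    # dist is an n-by-n symmetric matrix of intercity distances.
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):       # (n-1)! orderings
        tour = (0,) + perm + (0,)                # return to the start
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour

d = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(tsp_brute_force(d))   # (18, (0, 1, 3, 2, 0)) for this matrix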
2.2.2 KNAPSACK PROBLEM
The problem states: given n items of known weights w1, w2, …, wn and values v1, v2, …, vn and a knapsack of capacity W, find the most valuable subset of the items that fits into the knapsack. For example, consider a transport plane that has to deliver the most valuable set of items to a remote location without exceeding its capacity. The exhaustive-search approach generates all 2^n subsets of the n items, computes the total weight of each subset to identify the feasible ones, and finds the subset of the largest value among them.
Example:
W = 10, weights (w1, w2, w3, w4) = (7, 3, 4, 5) and values (v1, v2, v3, v4) = (42, 12, 40, 25). The most valuable feasible subset consists of items 3 and 4, with total weight 9 and total value 65.
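A Python sketch of the exhaustive search for this instance (our illustration): it tries all 2^n subsets and keeps the most valuable feasible one.

from itertools import combinations

def knapsack_brute_force(weights, values, capacity):
    n = len(weights)
    best_value, best_subset = 0, ()
    for r in range(n + 1):                        # subsets of every size
        for subset in combinations(range(n), r):  # 2**n subsets in total
            weight = sum(weights[i] for i in subset)
            value = sum(values[i] for i in subset)
            if weight <= capacity and value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset

# Prints (65, (2, 3)): items 3 and 4 in 1-based numbering.
print(knapsack_brute_force([7, 3, 4, 5], [42, 12, 40, 25], 10))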
Thus, for both the travelling salesman and knapsack problems, exhaustive search leads to algorithms that are inefficient on every input. These two problems are the best-known examples of NP-hard problems, for which no polynomial-time algorithm is known. The techniques of backtracking and branch-and-bound often enable us to solve some instances of such problems in far less than exponential time, although they do not improve the exponential worst case.
2.2.3 ASSIGNMENT PROBLEM
The problem is: given n people who need to be assigned to execute n jobs, one person per job, the cost if the i-th person is assigned to the j-th job is a known quantity C[i, j] for each pair i, j = 1, 2, …, n. The problem is to find an assignment with the smallest total cost.
Example:
             Job 1   Job 2   Job 3   Job 4
Person 1       9       2       7       8
Person 2       6       4       3       7
Person 3       5       8       1       8
Person 4       7       6       9       4
From the problem, we can obtain a cost matrix, C. The problem calls for a selection of one
element in each row of the matrix so that all selected elements are in different columns and the total
sum of the selected elements is the smallest possible.
We can describe feasible solutions to the assignment problem as n-tuples <j1, j2, …, jn> in which the i-th component indicates the column of the element selected in the i-th row, i.e., the job number assigned to the i-th person. For example, <2, 3, 4, 1> indicates a feasible assignment of person 1 to job 2, person 2 to job 3, person 3 to job 4, and person 4 to job 1. There is a one-to-one correspondence between feasible assignments and permutations of the first n integers. Exhaustive search therefore requires generating all the permutations of the integers 1, 2, …, n, computing the total cost of each assignment by summing up the corresponding elements of the cost matrix, and finally selecting the one with the smallest sum.
Since the number of permutations is n!, exhaustive search is impractical except for very small instances. However, there is an efficient polynomial-time algorithm for this problem, called the Hungarian method; the exponential cost of exhaustive search here reflects the weakness of that approach rather than an inherent difficulty of the problem.
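The exhaustive search just described is easy to express in Python (an illustrative sketch): generate all n! permutations, sum the corresponding costs, and keep the cheapest.

from itertools import permutations

def assignment_brute_force(cost):
    n = len(cost)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):          # n! feasible assignments
        total = sum(cost[i][perm[i]] for i in range(n))  # person i gets job perm[i]
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm

C = [[9, 2, 7, 8],
     [6, 4, 3, 7],
     [5, 8, 1, 8],
     [7, 6, 9, 4]]
# Prints (13, (1, 0, 2, 3)), i.e., the assignment <2, 1, 3, 4> with total cost 13.
print(assignment_brute_force(C))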
Divide and conquer is a well-known design technique that works according to the following plan:
a) A problem's instance is divided into several smaller instances of the same problem, ideally of about the same size.
b) The smaller instances are solved recursively.
c) The solutions of the smaller instances are combined to get a solution of the original
problem.
As an example, let us consider the problem of computing the sum of n numbers a0, a1, …, an−1. If n > 1, we can divide the problem into two instances: compute the sum of the first ⌈n/2⌉ numbers and the sum of the remaining numbers, recursively. Once each subsum is obtained, add the two values to get the final solution. If n = 1, then return a0 as the solution.
i.e. a0 + a1 + … + an−1 = (a0 + … + a⌈n/2⌉−1) + (a⌈n/2⌉ + … + an−1)
This is not an efficient way to compute a sum: the brute-force algorithm is both simpler and faster here. Hence, not all problems should be solved by divide and conquer. The technique is, however, ideally suited for parallel computation, in which each subproblem can be solved simultaneously by its own processor.
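A minimal Python sketch of this divide-and-conquer summation (purely illustrative, since the brute-force loop is simpler in practice):

def dc_sum(a, lo, hi):
    # Sum of a[lo..hi] by divide and conquer.
    if lo == hi:                 # instance of size 1
        return a[lo]
    mid = (lo + hi) // 2         # divide into two halves
    return dc_sum(a, lo, mid) + dc_sum(a, mid + 1, hi)   # combine

a = [4, 7, 1, 9, 3, 5]
print(dc_sum(a, 0, len(a) - 1))  # 29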
Analysis:
In general, an instance of size n can be divided into several instances of size n/b, with a of them needing to be solved. Here, a and b are constants with a ≥ 1 and b > 1. Assuming that size n is a power of b to simplify the analysis, the recurrence relation for the running time T(n) is:
T(n) = a T (n / b) + f (n)
Where f (n) is a function, which is the time spent on dividing the problem into smaller ones and on
combining their solutions. This is called the general divide-and- conquer recurrence. The order of
growth of its solution T(n) depends on the values of the constants a and b and the order of growth of
the function f (n).
Theorem (Master Theorem):
If f(n) ∈ Θ(n^d) where d ≥ 0 in the above recurrence equation, then
T(n) ∈ Θ(n^d)            if a < b^d
T(n) ∈ Θ(n^d log n)      if a = b^d
T(n) ∈ Θ(n^(log_b a))    if a > b^d
For example, the recurrence equation for the number of additions A(n) made by the divide-and-conquer sum-computation algorithm on inputs of size n = 2^k is:
A(n) = 2A(n/2) + 1
Here a = 2, b = 2, and d = 0; hence, since a > b^d,
A(n) ∈ Θ(n^(log_b a)) = Θ(n^(log2 2)) = Θ(n).
The merging of two sorted arrays can be done as follows: two pointers are initialized to point to the first elements of the arrays being merged. The elements pointed to are compared, and the smaller of them is added to a new array being constructed; the index of the smaller element is then incremented to point to its immediate successor in the array it was copied from. This operation is repeated until one of the two given arrays is exhausted; then the remaining elements of the other array are copied to the end of the new array.
ALGORITHM Merge(B[0..p−1], C[0..q−1], A[0..p+q−1])
//Merges two sorted arrays into one sorted array
//Input: Arrays B[0..p−1] and C[0..q−1], both sorted
//Output: Sorted array A[0..p+q−1] of the elements of B and C
i ← 0; j ← 0; k ← 0
while i < p and j < q do
    if B[i] ≤ C[j]
        A[k] ← B[i]; i ← i + 1
    else
        A[k] ← C[j]; j ← j + 1
    k ← k + 1
if i = p
    copy C[j..q−1] to A[k..p+q−1]
else copy B[i..p−1] to A[k..p+q−1]
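Mergesort itself recursively sorts the two halves of the array and then merges them exactly as above. A Python sketch (our own, with the merge inlined):

def merge_sort(a):
    # Returns a new sorted list; the input list is left unchanged.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    b = merge_sort(a[:mid])      # sort the first half
    c = merge_sort(a[mid:])      # sort the second half
    merged, i, j = [], 0, 0      # two-pointer merge, as in the pseudocode
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            merged.append(b[i]); i += 1
        else:
            merged.append(c[j]); j += 1
    merged.extend(b[i:] or c[j:])   # copy the unexhausted remainder
    return merged

print(merge_sort([8, 3, 2, 9, 7, 1, 5, 4]))   # [1, 2, 3, 4, 5, 7, 8, 9]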
Analysis:
Assuming for simplicity that n is a power of 2, the recurrence relation for the number of key comparisons C(n) is
C(n) = 2C(n/2) + Cmerge(n) for n > 1, C(1) = 0,
where Cmerge(n) is the number of key comparisons made during the merging stage. At each step of the merge, exactly one comparison is made, after which the total number of elements in the two arrays still needing to be processed is reduced by one. In the worst case, neither of the two arrays becomes empty before the other one contains just one element. Therefore, for the worst case, Cmerge(n) = n − 1 and the recurrence is:
Cworst(n) = 2Cworst(n/2) + n − 1 for n > 1, Cworst(1) = 0.
When n is a power of 2, n = 2^k, successive substitution (writing the recurrence in the simplified form C(n) = 2C(n/2) + cn) gives:
C(n) = 2C(n/2) + cn
     = 2(2C(n/4) + cn/2) + cn = 4C(n/4) + 2cn
     = 4(2C(n/8) + cn/4) + 2cn = 8C(n/8) + 3cn
     ...
     = 2^k C(1) + kcn
     = nC(1) + cn log2 n.
Since k = log2 n, it is easy to see that if 2^k ≤ n ≤ 2^(k+1), then
C(n) ≤ C(2^(k+1)) ∈ Θ(n log2 n).
Mergesort has two principal drawbacks:
1. It requires a linear amount of extra storage, since the merging is done into a separate array.
2. The stack space used for recursion: the maximum depth of the stack is proportional to log2 n when the algorithm is developed in a top-down manner. The need for stack space can be eliminated by developing the algorithm in a bottom-up fashion.
Mergesort can also be done as an in-place algorithm, but such a version is more complicated and has a larger multiplicative constant.
Quicksort is another sorting algorithm based on the divide-and-conquer strategy. Unlike mergesort, which divides the elements according to their position in the array, quicksort divides them according to their values. It rearranges the elements of a given array A[0..n−1] to achieve its partition, a situation where all the elements before some position s are smaller than or equal to A[s] and all the elements after position s are greater than or equal to A[s]. After this partition, A[s] will be in its final position, and the algorithm proceeds recursively for the two subarrays on either side of s. The partition is achieved by comparing the elements of the subarray with a chosen element, called the pivot. By default, the pivot is taken to be the first element of the subarray, i.e., p = A[l].
The rearrangement uses an efficient method based on two scans of the subarray: one left to right and the other right to left, each comparing elements with the pivot. The left-to-right scan starts with the second element. Since we want elements smaller than the pivot to be in the first part of the subarray, this scan skips over elements that are smaller than the pivot and stops on encountering the first element greater than or equal to the pivot. The right-to-left scan starts with the last element of the subarray. Since we want elements larger than the pivot to be in the second part of the subarray, this scan skips over elements that are larger than the pivot and stops on encountering the first element smaller than or equal to the pivot.
Three situations may arise, depending on whether or not the scanning indices have crossed. If the scanning indices i and j have not crossed, i.e., i < j, we exchange A[i] and A[j] and resume the scans by incrementing i and decrementing j, respectively.
If the scanning indices have crossed over, i.e. i>j, we have partitioned the array after
exchanging the pivot with A [j].
Finally, if the scanning indices stop while pointing to the same element, i.e., i = j, the value they are pointing to must be equal to p, and the array is partitioned. The cases i > j and i = j can be combined by checking i ≥ j and then exchanging the pivot with A[j].
The complete partition algorithm (Hoare's scheme) is:
ALGORITHM HoarePartition(A[l..r])
//Partitions subarray A[l..r], with A[l] as the pivot
p ← A[l]; i ← l; j ← r + 1
repeat
    repeat i ← i + 1 until A[i] ≥ p
    repeat j ← j − 1 until A[j] ≤ p
    swap(A[i], A[j])
until i ≥ j
swap(A[i], A[j]) //undo the last swap when i ≥ j
swap(A[l], A[j]) //exchange the pivot with A[j]
return j
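A Python sketch of quicksort with this partitioning scheme (the explicit bound checks are ours, since Python lacks the sentinels the pseudocode relies on):

def hoare_partition(a, l, r):
    # Partition a[l..r] around the pivot a[l]; return the split position.
    p = a[l]
    i, j = l + 1, r
    while True:
        while i <= r and a[i] < p:   # L-to-R scan: stop on element >= pivot
            i += 1
        while a[j] > p:              # R-to-L scan: stop on element <= pivot
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        i, j = i + 1, j - 1
    a[l], a[j] = a[j], a[l]          # place the pivot in its final position
    return j

def quicksort(a, l=0, r=None):
    if r is None:
        r = len(a) - 1
    if l < r:
        s = hoare_partition(a, l, r)
        quicksort(a, l, s - 1)       # sort the left subarray
        quicksort(a, s + 1, r)       # sort the right subarray

data = [5, 3, 1, 9, 8, 2, 4, 7]
quicksort(data)
print(data)                          # [1, 2, 3, 4, 5, 7, 8, 9]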
Analysis:
The efficiency is based on the number of key comparisons. If all the splits happen in the middle of the subarrays, we have the best case. The number of key comparisons is:
Cbest(n) = 2Cbest(n/2) + n for n > 1, Cbest(1) = 0.
According to the Master Theorem, Cbest(n) ∈ Θ(n log2 n); solving the recurrence exactly for n = 2^k yields Cbest(n) = n log2 n.
In the worst case, all the splits will be skewed to the extreme: one of the two subarrays will be empty, while the size of the other will be just one less than the size of the subarray being partitioned. This happens for increasing arrays, i.e., inputs that are already sorted. If A[0..n−1] is a strictly increasing array and we use A[0] as the pivot, the left-to-right scan will stop on A[1] while the right-to-left scan will go all the way to reach A[0], indicating the split at position 0. So, after making n + 1 comparisons to get to this partition and exchanging the pivot A[0] with itself, the algorithm will find itself with the strictly increasing array A[1..n−1] to sort. This sorting of strictly increasing arrays of diminishing sizes will continue until the last one, A[n−2..n−1], has been processed. The total number of key comparisons made will be equal to:
Cworst(n) = (n + 1) + n + … + 3 = (n + 1)(n + 2)/2 − 3 ∈ Θ(n^2).
Finally, for the average-case efficiency, let Cavg(n) be the average number of key comparisons made by quicksort on a randomly ordered array of size n. Assuming that the partition split can happen in each position s (0 ≤ s ≤ n − 1) with the same probability 1/n, we get the following recurrence relation:
Cavg(n) = (1/n) Σ(s=0 to n−1) [(n + 1) + Cavg(s) + Cavg(n − 1 − s)] for n > 1,
Cavg(0) = 0, Cavg(1) = 0.
Its solution turns out to be Cavg(n) ≈ 2n ln n ≈ 1.39 n log2 n.
Thus, on the average, quicksort makes only about 39% more comparisons than in the best case. Efforts to refine the algorithm include better pivot-selection methods (such as median-of-three partitioning, which uses as the pivot the median of the leftmost, rightmost, and middle elements of the array); switching to a simpler sort on smaller subfiles; and recursion elimination (so-called nonrecursive quicksort). Together these improvements can cut the running time of the algorithm by 20% to 25%.
Partitioning is also useful in applications other than sorting, such as the selection problem.
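Binary search, whose analysis follows, works on a sorted array: it compares a search key K with the array's middle element A[m] and, depending on the outcome, stops or continues the search in the left or right half. A Python sketch of the iterative version (our own illustration):

def binary_search(a, key):
    # Returns an index of key in sorted list a, or -1 for an unsuccessful search.
    l, r = 0, len(a) - 1
    while l <= r:
        m = (l + r) // 2        # middle position
        if key == a[m]:         # three-way comparison with A[m]
            return m
        elif key < a[m]:
            r = m - 1           # continue in the left half
        else:
            l = m + 1           # continue in the right half
    return -1

a = [3, 14, 27, 31, 39, 42, 55, 70, 74, 81, 85, 93, 98]
print(binary_search(a, 70))     # 7
print(binary_search(a, 10))     # -1 (unsuccessful search)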
Analysis:
The efficiency of binary search is measured by counting the number of times the search key K is compared with an element of the array. For simplicity, we count three-way comparisons; this assumes that after one comparison of K with A[m], the algorithm can determine whether K is smaller than, equal to, or larger than A[m]. The number of comparisons depends not only on n but also on the particular instance of the problem. The worst case Cw(n) includes all arrays that do not contain the search key; after one comparison, the algorithm considers an array of half the size:
Cw(n) = Cw(⌊n/2⌋) + 1 for n > 1, Cw(1) = 1     (1)
To solve such a recurrence equation, first assume that n = 2^k to obtain the solution
Cw(2^k) = k + 1 = log2 n + 1.
In fact, Cw(n) = ⌊log2 n⌋ + 1 holds for every positive integer n. To verify it for any positive even number n = 2i, where i > 0, note that the left-hand side of eqn (1) is then
Cw(n) = ⌊log2 n⌋ + 1 = ⌊log2 2i⌋ + 1 = (⌊log2 i⌋ + 1) + 1 = ⌊log2 i⌋ + 2,
while the right-hand side is
Cw(⌊n/2⌋) + 1 = Cw(i) + 1 = (⌊log2 i⌋ + 1) + 1 = ⌊log2 i⌋ + 2.
Since both expressions are the same, we proved the assertion.
The worst-case efficiency is thus in Θ(log n): since the algorithm reduces the size of the remaining array by about half on each iteration, the number of iterations needed to reduce the initial size n to the final size 1 is about log2 n. Moreover, the logarithmic function grows so slowly that its values remain small even for very large n. The average-case number of key comparisons is only slightly smaller than in the worst case. More accurately, for a successful search Cavg(n) ≈ log2 n − 1, and for an unsuccessful search Cavg(n) ≈ log2 n + 1.
Though binary search is an optimal algorithm with respect to worst-case comparison counts, there are algorithms, such as interpolation search, that give a better average-case efficiency, and the hashing technique does not even require the array to be sorted. Binary search is also applied to solving nonlinear equations in one unknown.
A binary tree T is defined as a finite set of nodes that is either empty or consists of a
root and two disjoint binary trees TL and TR called the left and right sub tree of the root.
Here we apply the divide-and-conquer technique by dividing a tree into its left and right subtrees. As an example, we consider a recursive algorithm for computing the height of a binary tree. Note that the height of a tree is defined as the length of the longest path from the root to a leaf; hence, it can be computed as the maximum of the heights of the root's left and right subtrees plus 1. We also define the height of the empty tree as −1. The recursive algorithm is as follows:
ALGORITHM Height(T)
//Computes recursively the height of a binary tree
//Input: A binary tree T
//Output: The height of T
if T = ∅ return −1
else return max{Height(TL), Height(TR)} + 1
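A direct Python transcription (the Node class is our own minimal representation of a binary tree):

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def height(t):
    # Height of a binary tree; the empty tree (None) has height -1.
    if t is None:
        return -1
    return max(height(t.left), height(t.right)) + 1

# A tree whose longest root-to-leaf path has length 2.
root = Node(left=Node(left=Node()), right=Node())
print(height(root))   # 2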
Analysis:
The efficiency is based on the number of addition operations and the number of comparisons made to check whether the tree is empty. For the empty tree, the comparison T = ∅ is executed once but there are no additions; for a single-node tree, the numbers of comparisons and additions are three and one, respectively.
The tree's extension can be drawn by replacing the empty subtrees with special nodes, which helps in the analysis. The extra nodes (shown as squares) are called external; the original nodes (shown as circles) are called internal. The extension of the empty binary tree is a single external node.
The algorithm Height makes one addition for every internal node of the extended tree and one comparison to check whether the tree is empty for every internal and external node. The number of external nodes x is always one more than the number of internal nodes n:
i.e. x = n + 1
We prove this equality by induction on the number of internal nodes n ≥ 0. The induction's basis is true because for n = 0 we have the empty tree, which has one external node by definition. In general, let us assume that x = k + 1 for any extended binary tree with 0 ≤ k < n internal nodes. Let T be an extended binary tree with n internal nodes and x external nodes, let nL and xL be the numbers of internal and external nodes in the left subtree of T, and let nR and xR be the numbers of internal and external nodes in its right subtree. Since n > 0, T has a root, which is an internal node, and hence n = nL + nR + 1. Using the induction hypothesis,
x = xL + xR = (nL + 1) + (nR + 1) = (nL + nR + 1) + 1 = n + 1,
which completes the proof.
Therefore, in algorithm Height, the number of comparisons C(n) to check whether the tree is empty is
C(n) = n + x = n + (n + 1) = 2n + 1,
while the number of additions is A(n) = n.
Other algorithms for binary trees can be designed with similar recursive calls, but not all of them require traversing both subtrees. For example, the find, insert, and delete operations of a binary search tree traverse only one of the two subtrees; hence, they should be considered applications of the variable-size-decrease technique (decrease-and-conquer) rather than of divide-and-conquer.
Some applications, notably cryptology, require manipulation of integers of more than 100 decimal digits; such numbers are too long to fit into a single machine word. Moreover, multiplying two n-digit numbers in the straightforward pencil-and-paper way requires n^2 digit multiplications. Using divide and conquer, we can decrease the number of multiplications at the cost of a slight increase in the number of additions.
For example, take the two numbers 23 and 14. They can be represented as follows:
23 = 2 x 10^1 + 3 x 10^0 and 14 = 1 x 10^1 + 4 x 10^0
Now, let us multiply them:
23 * 14 = (2 x 10^1 + 3 x 10^0) * (1 x 10^1 + 4 x 10^0)
        = (2*1) 10^2 + (3*1 + 2*4) 10^1 + (3*4) 10^0
        = 322
When done straightforwardly, this uses four digit multiplications. The number can be reduced to three digit multiplications by taking advantage of the products (2 * 1) and (3 * 4) that need to be computed anyway:
3 * 1 + 2 * 4 = (2 + 3) * (1 + 4) − (2 * 1) − (3 * 4)
In general, for any pair of two-digit numbers a = a1a0 and b = b1b0, their product c can be computed by the formula:
c = a * b = c2 10^2 + c1 10^1 + c0
where
c2 = a1 * b1 is the product of the first digits,
c0 = a0 * b0 is the product of the second digits, and
c1 = (a1 + a0) * (b1 + b0) − (c2 + c0) is the product of the sum of the a's digits and the sum of the b's digits minus the sum of c2 and c0.
Now, applying divide and conquer, consider two n-digit integers a and b, where n is a positive even number. Divide both numbers in the middle: the first and second halves of a's digits are a1 and a0, respectively; similarly for b they are b1 and b0. Then:
a = a1a0 implies that a = a1 10^(n/2) + a0, and
b = b1b0 implies that b = b1 10^(n/2) + b0.
Therefore, c = a * b = (a1 10^(n/2) + a0) * (b1 10^(n/2) + b0)
= (a1 * b1) 10^n + (a1 * b0 + a0 * b1) 10^(n/2) + (a0 * b0)
= c2 10^n + c1 10^(n/2) + c0
Computing c2, c0, and c1 by the formulas above requires only three multiplications of n/2-digit numbers, so the recurrence for the number of digit multiplications M(n) is
M(n) = 3M(n/2) for n > 1, M(1) = 1.
Solving it by backward substitutions for n = 2^k yields
M(2^k) = 3M(2^(k−1)) = 3^2 M(2^(k−2)) = … = 3^i M(2^(k−i)) = … = 3^k M(2^(k−k)) = 3^k.
Since k = log2 n,
M(n) = 3^(log2 n) = n^(log2 3) ≈ n^1.585,
which is much less than the n^2 digit multiplications of the straightforward method.
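This three-multiplication scheme is known as Karatsuba's algorithm. A Python sketch (our illustration, using divmod to split the decimal digits):

def karatsuba(a, b):
    # Multiply nonnegative integers with three recursive multiplications
    # instead of four, following the derivation above.
    if a < 10 or b < 10:                  # one-digit base case
        return a * b
    n = max(len(str(a)), len(str(b)))
    half = n // 2
    a1, a0 = divmod(a, 10 ** half)        # a = a1*10^half + a0
    b1, b0 = divmod(b, 10 ** half)
    c2 = karatsuba(a1, b1)                # product of the first halves
    c0 = karatsuba(a0, b0)                # product of the second halves
    c1 = karatsuba(a1 + a0, b1 + b0) - (c2 + c0)
    return c2 * 10 ** (2 * half) + c1 * 10 ** half + c0

print(karatsuba(23, 14))       # 322
print(karatsuba(2101, 1130))   # 2374130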
Strassen's matrix multiplication deals with multiplying two n x n matrices. In 1969, Volker Strassen introduced a set of formulas that needs fewer multiplications at the cost of a larger number of additions. For two 2-by-2 matrices A = [aij] and B = [bij], the entries of the product C = AB are computed as
c00 = m1 + m4 − m5 + m7        c01 = m3 + m5
c10 = m2 + m4                  c11 = m1 + m3 − m2 + m6
where
m1 = (a00 + a11) * (b00 + b11)
m2 = (a10 + a11) * b00
m3 = a00 * (b01 − b11)
m4 = a11 * (b10 − b00)
m5 = (a00 + a01) * b11
m6 = (a10 − a00) * (b00 + b01)
m7 = (a01 − a11) * (b10 + b11)
Thus, to multiply two 2-by-2 matrices, Strassen's algorithm requires seven multiplications and 18 additions/subtractions, whereas the brute-force algorithm requires eight multiplications and four additions. Let A and B be two n-by-n matrices where n is a power of two. (If it is not, pad the rows and columns with zeros.) We can divide A, B, and their product C into four n/2-by-n/2 submatrices and apply Strassen's formulas to these blocks, the m's then being n/2-by-n/2 matrices computed recursively.
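A Python sketch of the 2-by-2 scalar case (our illustration; for larger n the same seven formulas are applied to the n/2-by-n/2 blocks):

def strassen_2x2(A, B):
    # Multiply two 2-by-2 matrices with Strassen's seven products.
    (a00, a01), (a10, a11) = A
    (b00, b01), (b10, b11) = B
    m1 = (a00 + a11) * (b00 + b11)
    m2 = (a10 + a11) * b00
    m3 = a00 * (b01 - b11)
    m4 = a11 * (b10 - b00)
    m5 = (a00 + a01) * b11
    m6 = (a10 - a00) * (b00 + b01)
    m7 = (a01 - a11) * (b10 + b11)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 + m3 - m2 + m6]]

# Sanity check against the ordinary definition of the matrix product.
print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]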
Analysis:
Let M(n) be the number of multiplications made in multiplying two n-by-n matrices by Strassen's algorithm. The recurrence relation is
M(n) = 7M(n/2) for n > 1, M(1) = 1.
Solving it by backward substitutions for n = 2^k yields
M(2^k) = 7M(2^(k−1)) = 7^2 M(2^(k−2)) = … = 7^i M(2^(k−i)) = … = 7^k M(2^(k−k)) = 7^k.
Since k = log2 n,
M(n) = 7^(log2 n) = n^(log2 7) ≈ n^2.807,
which is smaller than the n^3 multiplications required by the brute-force algorithm.
Since this saving is obtained by increasing the number of additions, the number of additions A(n) has to be checked as well. To multiply two matrices of order n > 1, the algorithm needs to multiply seven matrices of order n/2 and make 18 additions/subtractions of matrices of size n/2; when n = 1, no additions are made since two numbers are simply multiplied. The recurrence relation is
A(n) = 7A(n/2) + 18(n/2)^2 for n > 1, A(1) = 0.
By the Master Theorem, A(n) ∈ Θ(n^(log2 7)). In other words, the number of additions has the same order of growth as the number of multiplications. Thus, Strassen's algorithm runs in Θ(n^(log2 7)) time, which is better than the Θ(n^3) of the brute-force algorithm.
Let P be a set of n > 1 points in the Cartesian plane. For the sake of simplicity, we assume
that the points are distinct. We can also assume that the points are ordered in nondecreasing order of
their x coordinate. (If they were not, we could sort them first by an efficient sorting algorithm such as
mergesort.) It will also be convenient to have the points sorted in a separate list in nondecreasing
order of the y coordinate; we will denote such a list Q.
If 2 ≤ n ≤ 3, the problem can be solved by the obvious brute-force algorithm. If n > 3, we can divide the points into two subsets Pl and Pr of ⌈n/2⌉ and ⌊n/2⌋ points, respectively, by drawing a vertical line through the median m of their x coordinates so that ⌈n/2⌉ points lie to the left of or on the line itself and ⌊n/2⌋ points lie to the right of or on the line. Then we can solve the closest-pair problem recursively for subsets Pl and Pr. Let dl and dr be the smallest distances between pairs of points in Pl and Pr, respectively, and let d = min{dl, dr}.
Note that d is not necessarily the smallest distance between all the point pairs because points
of a closer pair can lie on the opposite sides of the separating line. Therefore, as a step combining the
solutions to the smaller subproblems, we need to examine such points. Obviously, we can limit our
attention to the points inside the symmetric vertical strip of width 2d around the separating line, since
the distance between any other pair of points is at least d. Let S be the list of points inside the strip, obtained from Q and hence ordered in nondecreasing order of their y coordinate.
Scan this list, updating the information about dmin, the minimum distance seen so far, if we encounter a closer pair of points. Initially, dmin = d, and subsequently dmin ≤ d. Let p(x, y) be a point on this list. For a point p'(x', y') to have a chance of being closer to p than dmin, the point must follow p on list S and the difference between their y coordinates must be less than dmin.
Geometrically, this means that p' must belong to a rectangle of width 2d and height dmin straddling the separating line. The principal insight exploited by the algorithm is the observation that this rectangle can contain only a few such points, because the points in each half (left and right) of the rectangle must be at least distance d apart. It is easy to prove that the total number of such points in the rectangle, including p, does not exceed eight; a more careful analysis reduces this number to six. Thus, the algorithm can consider no more than five points following p on the list S before moving on to the next point. Here is pseudocode of the algorithm.
ALGORITHM EfficientClosestPair(P, Q)
//Solves the closest-pair problem by divide and conquer
//Input: An array P of n ≥ 2 points sorted by x coordinate and an array Q of the same points sorted by y coordinate
//Output: Euclidean distance between the closest pair of points
if n ≤ 3
    return the minimal distance found by the brute-force algorithm
else
    solve the problem recursively for the left half of the points and for the right half, obtaining dl and dr; d ← min{dl, dr}
    m ← P[⌈n/2⌉ − 1].x
    copy all the points of Q for which |x − m| < d into array S[0..num − 1]
    dminsq ← d^2
    for i ← 0 to num − 2 do
        k ← i + 1
        while k ≤ num − 1 and (S[k].y − S[i].y)^2 < dminsq
            dminsq ← min((S[k].x − S[i].x)^2 + (S[k].y − S[i].y)^2, dminsq)
            k ← k + 1
return sqrt(dminsq)
The running time of the algorithm satisfies the recurrence T(n) = 2T(n/2) + f(n), where f(n) ∈ Θ(n). Applying the Master Theorem (with a = 2, b = 2, and d = 1), we get T(n) ∈ Θ(n log n). The necessity to presort the input points does not change the overall efficiency class if the sorting is done by an O(n log n) algorithm such as mergesort. In fact, this is the best efficiency class one can achieve, because it has been proved that any algorithm for this problem must be in Ω(n log n) under some natural assumptions about the operations an algorithm can perform.
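A Python sketch of the whole algorithm (our illustration; it assumes distinct points, as in the text above):

from math import sqrt

def closest_pair(points):
    # Divide-and-conquer closest pair; assumes at least two distinct points.
    def dist(p, q):
        return sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

    def solve(P, Q):
        n = len(P)
        if n <= 3:                        # brute force on small instances
            return min(dist(P[i], P[j])
                       for i in range(n) for j in range(i + 1, n))
        mid = n // 2
        m = P[mid - 1][0]                 # x coordinate of the dividing line
        Pl, Pr = P[:mid], P[mid:]
        left = set(Pl)
        Ql = [p for p in Q if p in left]           # y-sorted left half
        Qr = [p for p in Q if p not in left]       # y-sorted right half
        d = min(solve(Pl, Ql), solve(Pr, Qr))
        S = [p for p in Q if abs(p[0] - m) < d]    # strip of width 2d
        dminsq = d * d
        for i in range(len(S) - 1):
            k = i + 1
            while k < len(S) and (S[k][1] - S[i][1]) ** 2 < dminsq:
                dminsq = min(dminsq,
                             (S[k][0] - S[i][0]) ** 2 + (S[k][1] - S[i][1]) ** 2)
                k += 1
        return sqrt(dminsq)

    P = sorted(points)                             # sorted by x coordinate
    Q = sorted(points, key=lambda p: p[1])         # sorted by y coordinate
    return solve(P, Q)

pts = [(2, 3), (12, 30), (40, 50), (5, 1), (12, 10), (3, 4)]
print(closest_pair(pts))    # 1.414..., the pair (2, 3) and (3, 4)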
Let us revisit the convex-hull problem: find the smallest convex polygon that contains n given points in the plane. We consider here a divide-and-conquer algorithm called quickhull because of its resemblance to quicksort.
Let S be a set of n > 1 points p1(x1, y1), . . . , pn(xn, yn) in the Cartesian plane. We assume that the points are sorted in nondecreasing order of their x coordinates, with ties resolved by increasing order of the y coordinates of the points involved. It is not difficult to prove the geometrically obvious fact that the leftmost point p1 and the rightmost point pn are two distinct extreme points of the set's convex hull.
Let p1pn be the straight line through points p1 and pn directed from p1 to pn. This line separates the points of S into two sets: S1 is the set of points to the left of this line, and S2 is the set of points to the right of this line. (We say that point q3 is to the left of the line q1q2 directed from point q1 to point q2 if q1q2q3 forms a counterclockwise cycle. Later, we cite an analytical way to check this condition, based on checking the sign of a determinant formed by the coordinates of the three points.) The points of S on the line p1pn, other than p1 and pn, cannot be extreme points of the convex hull and hence are excluded from further consideration.
The boundary of the convex hull of S is made up of two polygonal chains: an "upper" boundary and a "lower" boundary. The "upper" boundary, called the upper hull, is a sequence of line segments with vertices at p1, some of the points in S1 (if S1 is not empty), and pn. The "lower" boundary, called the lower hull, is a sequence of line segments with vertices at p1, some of the points in S2 (if S2 is not empty), and pn. The fact that the convex hull of the entire set S is composed of the upper and lower hulls, which can be constructed independently and in a similar fashion, is a very useful observation exploited by several algorithms for this problem.
For concreteness, let us discuss how quickhull proceeds to construct the upper hull; the lower hull can be constructed in the same manner. If S1 is empty, the upper hull is simply the line segment with endpoints p1 and pn. If S1 is not empty, the algorithm identifies the point pmax in S1 that is the farthest from the line p1pn. If there is a tie, the point that maximizes the angle ∠pmax p1 pn can be selected. (Note that point pmax maximizes the area of the triangle with two vertices at p1 and pn and the third one at some other point of S1.) Then the algorithm identifies all the points of set S1 that are to the left of the line p1pmax; these are the points that will make up the set S1,1. The points of S1 to the left of the line pmaxpn will make up the set S1,2. It is not difficult to prove the following:
pmax is a vertex of the upper hull.
The points inside the triangle p1pmaxpn cannot be vertices of the upper hull (and hence can be eliminated from further consideration).
There are no points to the left of both lines p1pmax and pmaxpn.
Therefore, the algorithm can continue constructing the upper hulls of p1 ∪ S1,1 ∪ pmax and pmax ∪ S1,2 ∪ pn recursively and then simply concatenate them to get the upper hull of the whole set.
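A compact Python sketch of quickhull (our illustration; area2 is the determinant-based left-of test mentioned above, and ties for the farthest point are broken arbitrarily rather than by the angle rule):

def quickhull(points):
    # Returns the vertices of the convex hull of a set of distinct points,
    # listed in boundary order starting from the leftmost point.
    def area2(q1, q2, q3):
        # Twice the signed area of triangle q1 q2 q3; positive iff q3 is
        # to the left of the directed line q1 -> q2.
        return (q2[0] - q1[0]) * (q3[1] - q1[1]) - (q3[0] - q1[0]) * (q2[1] - q1[1])

    def hull_side(p1, pn, S):
        # Hull vertices strictly between p1 and pn, for points S lying
        # to the left of the directed line p1 -> pn.
        if not S:
            return []
        pmax = max(S, key=lambda p: area2(p1, pn, p))     # farthest from the line
        S11 = [p for p in S if area2(p1, pmax, p) > 0]    # left of p1 -> pmax
        S12 = [p for p in S if area2(pmax, pn, p) > 0]    # left of pmax -> pn
        return hull_side(p1, pmax, S11) + [pmax] + hull_side(pmax, pn, S12)

    pts = sorted(points)
    p1, pn = pts[0], pts[-1]
    S1 = [p for p in pts if area2(p1, pn, p) > 0]   # points above the line
    S2 = [p for p in pts if area2(p1, pn, p) < 0]   # points below the line
    upper = hull_side(p1, pn, S1)
    lower = hull_side(pn, p1, S2)
    return [p1] + upper + [pn] + lower

pts = [(0, 0), (4, 4), (8, 0), (4, 1), (2, 5), (6, 5), (4, -2)]
print(quickhull(pts))   # [(0, 0), (2, 5), (6, 5), (8, 0), (4, -2)]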