Symbolic and Numerical Computation For Artificial Intelligence
edited by
Deepak Kapur
Department of Computer Science
State University of New York, USA
Joseph L. Mundy
AI Laboratory
GE Corporate R&D, Schenectady, USA
Academic Press
Harcourt Brace Jovanovich, Publishers
London San Diego New York
Boston Sydney Tokyo Toronto
ACADEMIC PRESS LIMITED
24--28 Oval Road
London NW1
US edition published by
ACADEMIC PRESS INC.
San Diego, CA 92101
Copyright © 1992 by
ACADEMIC PRESS LIMITED
A catalogue record for this book is available from the British Library
ISBN 0-12-220535-9
Daniel P. Huttenlocher†
Computer Science Department
Cornell University, Ithaca, NY 14853
Klara Kedem
Computer Science Department
Tel Aviv University, Tel Aviv, Israel
1. Introduction
† This work was supported in part by NSF grant IRI-9057928 and matching funds from General
Electric and Kodak, and in part by the Air Force Office of Scientific Research under contract
AFOSR-91-0328. The second author was supported by a fellowship from the Pikkowski-Valazzi Fund and by the
Eshkol grant 04601-90.
202 D.P. Huttenlocher and K. Kedem
exists some transformation $T \in \mathcal{T}$ such that $T(A) = B$, where $\mathcal{T}$ is a given transformation
group. When A and B are the same shape according to this definition, we say A = B.
For example, the triangles A and B with side lengths 3, 4, 5 and 6, 8, 10 respectively have
the same shape under a similarity transformation (translation, rotation and change of
scale). In this paper we consider several types of geometric objects: (i) sets of points in
$\mathbb{R}^d$, $d = 2, 3$, (ii) sets of line segments in the plane, and (iii) simple polygons in the plane.
We discuss methods for comparing shapes of the first two types under translation, and
shapes of the third type under similarity transformation.
In pattern recognition and model-based vision applications, a stored set of 'model'
shapes is often compared with an unknown shape that has been detected by some sensory
device. The difference between each model shape and the unknown shape is computed,
and the model that is closest to the unknown shape is reported as the best match.
We have argued elsewhere (Arkin et al., 1991) that for such applications the function
used to measure the difference between shapes should be a metric (see Mumford (1987)
for similar arguments). This means that given a class of geometric objects the shape
difference function d should obey the following properties, for any three shapes A, B and
C:
1 $d(A, B) \geq 0$ for all A and B.
2 $d(A, B) = 0$ if and only if $A = B$ (Identity).
3 $d(A, B) = d(B, A)$ for all A and B (Symmetry).
4 $d(A, B) + d(B, C) \geq d(A, C)$ for all A, B, and C (Triangle Inequality).
similarity transformations. This distance was originally reported in Arkin et al. (1991).
The distance function can be computed in time $O(mn \log(mn))$ for two simple polygons
with m and n vertices respectively. It is based on computing the $L_2$ distance between
the turning function representations of the two polygons.
A shape comparison function D(A, B) that measures the minimum Hausdorff distance
between two point sets A and B under translation was defined in Huttenlocher and
Kedem (1990). Here we present the definition of this function, claim that it is a metric,
and describe methods for computing it efficiently.
The Hausdorff distance between two sets, $A = \{a_1, \ldots, a_p\}$ and $B = \{b_1, \ldots, b_q\}$,
where each $a_i$, $b_j$ is either a point or a line segment, is given by
$$H(A, B) = \max(h(A^*, B^*), h(B^*, A^*)) \qquad (2.1)$$
where $A^*$ (resp. $B^*$) is the union of all points and segments in A (resp. B),
$$h(A^*, B^*) = \max_{a \in A^*} \min_{b \in B^*} \rho(a, b), \qquad (2.2)$$
and $\rho(a, b)$ is the underlying metric. The function $h(A^*, B^*)$ is the directed Hausdorff
distance from $A^*$ to $B^*$, and measures the distance of the point of $A^*$ that is farthest
from any point of $B^*$ (under $\rho$). Intuitively, the Hausdorff distance is small if and only
if each point of $A^*$ is near some point of $B^*$ and vice versa (it measures the distance of
the maximal outlying point).
It is well-known that the function H(A, B) is a metric over the set of all closed,
bounded sets. The Hausdorff distance, H(A, B), can be trivially computed in time O(pq)
for two point sets of size p and q respectively; with some care, this can be improved to
O((p + q) log(p + q)) (Alt et al., 1991).
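The trivial O(pq) computation mentioned above can be sketched as follows. This is only an illustration for finite point sets under the Euclidean metric; the function names and the sample coordinates are assumptions made for the example, not part of the original text.

```python
import math

def directed_hausdorff(A, B):
    """h(A, B): largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A)), following equation (2.1)."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Hypothetical sample data: B differs from A by one far-away point.
A = [(0, 0), (1, 0), (0, 1)]
B = [(0, 0), (1, 0), (0, 4)]

print(directed_hausdorff(A, B))  # 1.0: (0, 1) is at distance 1 from (0, 0)
print(hausdorff(A, B))           # 3.0: (0, 4) is at distance 3 from (0, 1)
```

The two directed distances differ here, which is why the symmetric distance takes the larger of the two.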
The shape comparison function D(A, B) is then defined to be the minimum value of
the Hausdorff distance under translation. Without loss of generality we assume that the
set A is fixed, and only the set B is allowed to translate; then
$$D(A, B) = \min_x H(A, B \oplus x) \qquad (2.3)$$
where $B \oplus x = \{b + x \mid b \in B\}$, and H is the Hausdorff distance as defined above. That is,
the distance is defined to be the minimal value of the Hausdorff distance over all possible
translations of the set B.
A problem closely related to measuring D(A, B) is that of finding the best approximate
congruence under translation for two sets of n points, A and B (Alt et al., 1988). Formally,
this problem is to find the translation x of B and the bijection $l : B \to A$ that minimizes
$$d = \max_{b \in B} \rho(b + x, l(b)).$$
This function is not very well suited to pattern recognition problems, however, because
a sensing device will often merge two close points into one or split one point into two.
When this happens there will either no longer be a total matching between the point
sets, or the least cost matching will be forced to pair relatively distant points with one
another and thus greatly increase the cost. In contrast, the minimum Hausdorff distance
under translation just measures the proximity of each point in one set to the nearest
point in the other, without requiring a matching between the sets.
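The contrast can be illustrated on a tiny example. The sketch below fixes the translation at zero and finds the best bijection by exhaustive search, which is only feasible for very small sets; the function names and the point coordinates are hypothetical. Set A has two nearby points on the right and set B has two nearby points on the left, as might happen when a sensor splits one point and merges another.

```python
import math
from itertools import permutations

def bottleneck_cost(A, B):
    """Smallest achievable maximum pair distance over all bijections B -> A."""
    return min(max(math.dist(b, a) for b, a in zip(B, image))
               for image in permutations(A))

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (Euclidean metric)."""
    h_ab = max(min(math.dist(a, b) for b in B) for a in A)
    h_ba = max(min(math.dist(a, b) for a in A) for b in B)
    return max(h_ab, h_ba)

A = [(0, 0), (10, 0), (10, 1)]   # one point on the left, two on the right
B = [(0, 0), (0, 1), (10, 0)]    # two points on the left, one on the right

print(bottleneck_cost(A, B))  # 10.0: some pair must span the two clusters
print(hausdorff(A, B))        # 1.0: every point has a close neighbour
```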
We now describe how to compute the minimum Hausdorff distance under translation,
$D(A, B) = \min_x H(A, B \oplus x)$, for sets of points in $\mathbb{R}^2$ and $\mathbb{R}^3$. First we consider sets of
points in the plane, and then show how the method generalizes to points in space. Note
that because in this section the sets A and B contain only points, their unions A* and B*
can be identified with A and B respectively. The main idea is to consider the distances
defined in (2.1) and (2.2) as functions that depend on the translation x of the set B. These
functions lead to constructs called Voronoi surfaces. Below we define Voronoi surfaces
and the upper envelope (pointwise maximum) of a set of Voronoi surfaces. We then show
the correspondence between the upper envelope of Voronoi surfaces and our problem of
finding the minimum Hausdorff distance between sets of points.
Given a set $S = \{p_i \mid i = 1, \ldots, n\}$ of sources (points or line segments) in $\mathbb{R}^d$, and
some metric $\rho(\cdot, \cdot)$, the Voronoi diagram of S, denoted by Vor(S), is the decomposition
of $\mathbb{R}^d$ into 'Voronoi cells' $C_1, \ldots, C_n$, where each cell $C_i$ contains those points of $\mathbb{R}^d$ that
are closer to $p_i$ than to any other source (with closeness measured using the metric $\rho$).
Consider now the function
$$d(x) = \min_{q \in S} \rho(x, q). \qquad (2.4)$$
The graph of this function, $\{(x, d(x)) \mid x \in \mathbb{R}^d\}$, is a surface which we call the Voronoi
surface of S (we use a slight abuse of notation and refer to the surface also as d(x)). Note
that this surface is at a local minimum (of zero) exactly when x is coincident with some
source $p_i \in S$, and is at a local maximum for certain points that lie along the boundary of
cells of Vor(S). That is, the surface gives the distance from x to the nearest point $p_i \in S$.
An illustration of such a surface in one-dimension is shown in figure 1, where the four
Shape Metrics 205
points $p_i \in S$ are denoted by open circles, and a particular point x is denoted by a closed
circle. The height of the surface, d(x), clearly gives the distance to the nearest point $p_i$.
The analogous surface for sets of points in $\mathbb{R}^2$ looks like an irregular 'egg-carton'.
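A one-dimensional Voronoi surface of this kind is easy to evaluate directly. The following sketch uses four hypothetical source positions (the actual positions in figure 1 are not recoverable here) and samples d(x) at a few points.

```python
def voronoi_surface(x, sources):
    """d(x) = min over sources q of |x - q|, i.e. equation (2.4) in one dimension."""
    return min(abs(x - q) for q in sources)

sources = [1.0, 2.0, 4.0, 7.0]   # assumed positions of p1, ..., p4

for x in [1.0, 1.5, 3.0, 5.5, 9.0]:
    print(x, voronoi_surface(x, sources))
# The surface is 0 at a source (x = 1.0) and peaks midway between
# neighbouring sources, e.g. 1.0 at x = 3.0 and 1.5 at x = 5.5.
```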
The upper envelope (pointwise maximum) of m Voronoi surfaces was investigated in
Huttenlocher et al. (1991a). This work provided bounds on the number of vertices of
the upper envelope of m Voronoi surfaces, and presented theoretical algorithms for effi-
ciently computing the upper envelope. We now discuss the relation between the problem
of computing the minimum Hausdorff distance and the problem of computing the up-
per envelope of Voronoi surfaces. This yields algorithms for computing the minimum
Hausdorff distance for sets of points in $\mathbb{R}^2$ and $\mathbb{R}^3$, and for sets of line segments in $\mathbb{R}^2$.
In more detail, we can express the distance between a pair of points $a_i \in A$ and $b_j \in B$,
as $b_j$ undergoes a translation x, by
$$\delta_{i,j}(x) = \rho(a_i, b_j + x) = \rho(a_i - b_j, x)$$
where $\rho$ is the underlying metric, which can be any $L_p$ metric (but in this paper we will
refer to the most common metrics, namely, $L_1$, $L_\infty$ and $L_2$). We then define the function
$d_i(x)$ to be the lower envelope of the functions $\delta_{i,j}(x)$ for a fixed point $a_i \in A$ and over
all $b_j \in B$,
$$d_i(x) = \min_{b_j \in B} \delta_{i,j}(x). \qquad (2.5)$$
If we denote the set $a_i \ominus B$ by $S_i$ (i.e. $S_i = \{a_i - b_j \mid b_j \in B\}$) then substituting we obtain
$$d_i(x) = \min_{p \in S_i} \rho(p, x),$$
which is by definition the Voronoi surface of $S_i$ from equation (2.4). Recall that this
surface specifies the distance from a point x to the closest point of the set $S_i = a_i \ominus B$.
Similarly, denoting the set $A \ominus b_j$ by $S'_j$, the function
$$d'_j(x) = \min_{a_i \in A} \delta_{i,j}(x) = \min_{p \in S'_j} \rho(p, x)$$
is the lower envelope of the functions $\delta_{i,j}(x)$ for a given $b_j \in B$ and over all $a_i \in A$.
Denote by f(x) the upper envelope of the functions $d_i(x)$, $d'_j(x)$, then
$$f(x) = \max\left(\max_{a_i \in A} d_i(x), \max_{b_j \in B} d'_j(x)\right) = H(A, B \oplus x). \qquad (2.6)$$
Hence
$$\min_x f(x) = \min_x H(A, B \oplus x).$$
Thus, in order to determine the minimum Hausdorff distance between two sets A and B,
where the set B is translated by x, we have to identify the value of x that minimizes the
upper envelope of all the Voronoi surfaces defined by the sets $S_i = a_i \ominus B$ and $S'_j = A \ominus b_j$.
Moreover, note that f(x) specifies the value of $H(A, B \oplus x)$ for each translation x of the
set B.
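The identity between the upper envelope and the translated Hausdorff distance can be checked numerically on small inputs. The sketch below assumes the Euclidean metric and evaluates each Voronoi surface by brute force rather than by constructing any diagram; the function names and sample points are illustrative assumptions.

```python
import math

def voronoi_surface(x, sources):
    """Distance from x to the nearest source, as in equation (2.4)."""
    return min(math.dist(x, s) for s in sources)

def upper_envelope(x, A, B):
    """Pointwise maximum of the p + q Voronoi surfaces of the difference sets."""
    S = [[(a[0] - b[0], a[1] - b[1]) for b in B] for a in A]        # a_i minus B
    S_prime = [[(a[0] - b[0], a[1] - b[1]) for a in A] for b in B]  # A minus b_j
    return max(voronoi_surface(x, sources) for sources in S + S_prime)

def hausdorff(A, B):
    h_ab = max(min(math.dist(a, b) for b in B) for a in A)
    h_ba = max(min(math.dist(a, b) for a in A) for b in B)
    return max(h_ab, h_ba)

A = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
B = [(0.5, 0.5), (2.5, 1.0)]

for x in [(0.0, 0.0), (-0.5, -0.5), (1.0, -2.0)]:
    shifted = [(b[0] + x[0], b[1] + x[1]) for b in B]
    # The envelope value at x agrees with H(A, B translated by x).
    assert math.isclose(upper_envelope(x, A, B), hausdorff(A, shifted))
```

Minimizing `upper_envelope` over x would then give D(A, B); the sketch only verifies the pointwise identity and does not attempt that minimization.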
CLAIM 2.2. The number of local minima of f(x) is $O(pq(p + q))$ for the metrics $L_1$ and
$L_\infty$ as underlying metrics, and is $O(pq(p + q)\,\alpha(pq))$ for $L_2$ as the underlying metric.
PROOF. It was shown in Huttenlocher et al. (1991a) that the upper envelope of m Voronoi
surfaces with a total of n source points is of complexity $O(mn)$ for the $L_1$ and $L_\infty$ metrics,
and $O(mn\,\alpha(n))$ for the $L_2$ metric (where $\alpha$ is the inverse Ackermann function). In computing
the minimum Hausdorff distance we have $m = p + q$ sets ($S_i$, $1 \le i \le p$ and $S'_j$, $1 \le j \le q$),
and a total of $n = 2pq$ points over all the sets. Substituting these quantities, the result
follows immediately. □
In order to determine D(A, B) we must identify the global minimum of f(x) which can
be done by calculating all the local minima and inspecting each of them. Huttenlocher
et al. (1991a) showed that the upper envelope of m Voronoi surfaces with a total of n
source points can be computed in time O(mn log(n)). Computing the upper envelope
clearly dominates the running time, and thus,
CLAIM 2.3. The minimum Hausdorff distance under translation between two sets of
points in the plane (and the translation that achieves this minimum) can be computed in
time $O(pq(p + q) \log(pq))$ for the metrics $L_1$, $L_2$, and $L_\infty$.
We do not present this algorithm here, however, because the method that we have
implemented is an approximation based on rasterizing the Voronoi surfaces. The exact
method requires the computation of O(pq) Voronoi diagrams and the computation of
O(pq) unions of convex polygons having altogether O(pq(p + q)) edges and vertices,
which in practice is not as fast as the rasterized method. Moreover, since most data
from pattern matching and computer vision applications is already in raster form, the
rasterized method is particularly appropriate.
When the sets $A = \{a_1, \ldots, a_p\}$ and $B = \{b_1, \ldots, b_q\}$ consist of points in $\mathbb{R}^3$, and the
underlying distance metric $\rho$ is $L_2$, we can apply the results of Huttenlocher et al. (1991a)
on the upper envelope of four-dimensional Voronoi surfaces. They show that with m sets
and a total of n source points, the complexity of computing the upper envelope in this
case is $O(mn^2 \log(m)\,\alpha(n))$. For two sets of points A and B in space, with p and q points
respectively, this yields the following bound (again $m = p + q$, $n = 2pq$).
CLAIM 2.4. The minimum Hausdorff distance under translation between two sets of
points in $\mathbb{R}^3$ based on the $L_2$ metric (and the translation that achieves this minimum)
can be computed in time $O((pq)^2 (p + q)\,\alpha(pq) \log^2(p + q))$.
The problem of computing the minimum Hausdorff distance for sets of line segments
and points can also be solved by the technique of upper envelopes of Voronoi surfaces,
although some extra care is needed here. The reason for the difficulties is that in the case
of line segments there is a substantial difference between the finite sets A, B and the
infinite sets A*, B* (the unions of the points of A and of B respectively). Thus, direct
application of the method in the previous section would call for computing the upper
envelope of uncountably many Voronoi surfaces, each having a set of sources obtained
as the Minkowski differences of segments in A and a point in B*, or of a point in A*
and the segments of B. However, it suffices instead to form the upper envelope of only a
finite number of surfaces, each obtained by computing the Minkowski sum of the Voronoi
surface of the (reflected) set A with each segment of B, or the Minkowski sum of the
Voronoi surface of (the reflected) set B with each segment of A, as we describe
below.
Let $A = \{a_1, \ldots, a_p\}$ and $B = \{b_1, \ldots, b_q\}$ be two sets of points and segments in the
plane, where we require that the segments are all open, and that if a set contains an
(open) segment it also contains its endpoints; in other words, each closed line segment
appears in a set as three distinct (and pairwise disjoint) sites: its relative interior and
its endpoints (note that segments can share endpoints).
Define as above $A^* = \bigcup_{a \in A} a$, and $B^* = \bigcup_{b \in B} b$. We want to compute
$$D(A, B) = \min_x H(A, B \oplus x) =$$
and that the right hand side of this equation is the Voronoi surface of (the reflected)
B translated by $y \in A^*$. Similarly, for the other minimization, we have for any point
$z \in b_j$ and for every x, $\min_{y \in A^*} \rho(x, y - z) = \min_{a_i \in A} \rho(x, a_i - z)$, which is the Voronoi
surface of Vor(A) translated by $z \in B^*$. That is, in the minimization portions of these
expressions, we can minimize over (translated) objects in A or in B and not over their
unions. We denote these Voronoi surfaces by
$$d_z(x) = \min_{a_i \in A} \rho(x, a_i - z)$$
It follows that f(x) is the upper envelope of at most p + q surfaces, each defined either
by $D_i(x) = \max_{y \in a_i} d_y(x)$, or by $D'_j(x) = \max_{z \in b_j} d_z(x)$. $D_i(x)$ is the upper boundary
of the volume obtained by sweeping the surface of Vor(-B) horizontally along $a_i$; $D'_j(x)$
has a similar interpretation.
Let us fix a segment $a_i \in A$. Denote the endpoints of this segment by $a'$ and $a''$. For
each face F of $d_{a'}(x)$, when $y \in a_i$ moves from $a'$ to $a''$, the face F is swept horizontally
along the segment $a_i = a'' - a'$. The resulting swept volume is simply the Minkowski sum
$F \oplus a_i$, which we denote by $\mathcal{F}$. For general metrics, including $L_2$, the structure of the
swept volume is rather complicated, but in the $L_1$ and $L_\infty$ metrics the structure is quite
straightforward. In these cases each such face F is a polygon, hence the boundary of $\mathcal{F}$
is a prism whose two bases are parallel to F and all its other sides are parallelograms
whose parallel edges are parallel to $a_i$. It follows that we can represent the swept surfaces
as a collection of $O(pq)$ triangles: every $D_i$, for $i = 1, \ldots, p$, has $O(q)$ triangles, and every
$D'_j$, for $j = 1, \ldots, q$, has $O(p)$ triangles, hence a total of $O(pq)$. This leads to the following claim.
CLAIM 2.5. The minimum Hausdorff distance under translation between two sets A and
B, of p and q line segments respectively, with $L_1$ and $L_\infty$ as underlying metrics, in the
plane, can be computed in time $O((pq)^2 \alpha(pq))$.
This is because the upper envelope f(x) is the upper envelope of $O(pq)$ triangles, so
its complexity is $O((pq)^2 \alpha(pq))$ (Pach and Sharir, 1989), and it can be computed in time
$O((pq)^2 \alpha(pq))$ (Edelsbrunner et al., 1989).
It is interesting to compare this result with recent results of Alt et al. (1991) for
computing the minimum Hausdorff distance under translation, between sets of line segments
in the plane, in time O((pq?(P + q) log(pq)) for the L2 metric (whereas Claim 2.5 is
for $L_1$ and $L_\infty$). Recently Agarwal et al. (1992) have come up with an algorithm that
computes the minimum Hausdorff distance between sets of line segments in the plane
under the $L_2$ metric in time $O((pq)^2 (p + q) \log^3(pq))$, using parametric search.
We now turn to the task of comparing two point sets A and B where the points of
each set lie on an integer grid. For machine vision and pattern recognition applications
this is a reasonable model, because most data comes from grid-based sensors. Thus we
assume that we are given two sets of points $A = \{a_1, \ldots, a_p\}$ and $B = \{b_1, \ldots, b_q\}$ such
that each point $a_i \in A$ and $b_j \in B$ has integer coordinates. Consider the characteristic
function of the set A,
$$a(x, y) = \begin{cases} 1 & \text{if } (x, y) \in A \\ 0 & \text{otherwise.} \end{cases}$$
This function is always zero for any non-integer values of x or y, because the set A is
restricted to have points with integral coordinates. Thus the function can be represented
using a binary array A[k, l] where the (k, l)-th entry in the array is nonzero exactly when
the point $(k, l) \in A$ (which of course is common in image processing applications). The
set B has an analogously defined characteristic function and its array representation
B[s, t].
As in the continuous case considered above, we wish to compute the Hausdorff distance
buffering operation. Thus D[x, y] can be computed rapidly by rendering q cones (or
approximations thereof) and computing the view from $z = -\infty$.
(and similarly for $F_B[x, y]$). Note that this maximization can be performed very rapidly
with special-purpose graphics hardware for doing pan (shift) and z-buffer operations.
It is also possible to view the computation of $F_A[x, y]$ slightly differently, and note
that (2.7) is simply equivalent to maximizing the product of A[k, l] and D[s, t] at a given
relative position,
$$F_A[x, y] = \max_k \max_l A[k, l]\, D[k - x, l - y]. \qquad (2.8)$$
In other words, the maximization can be performed by 'positioning' the reflected D[k, l]
at each location (x, y), and computing the maximum of the product of A with D (and
similarly for $F_B[x, y]$).
This form of the directed Hausdorff distance under translation is very similar to the
binary correlation of the two arrays A[k, l] and B[s, t],
$$C[x, y] = \sum_k \sum_l A[k, l]\, B[k - x, l - y].$$
The only differences are that the array B[s, t] in the correlation is replaced by the distance
array D[s, t] (the distance to the nearest pixel of B[s, t]), and the summation operations
in the correlation are replaced by maximization operations. Binary correlation is one of
the most commonly used tools in image processing, and thus it is interesting to briefly
compare the Hausdorff-distance under translation with correlation. One drawback of the
correlation measure is that it is not a metric. In particular, this means that there may
be several dissimilar models that all have high correlations with the same portion of
the same image. Second, for binary images the correlation operation is quite sensitive to
errors in the classification of a point (i.e. a 1-pixel that is classified as a 0 or vice versa),
because it measures the exact superposition of points in A[k, l] and B[s, t]. In contrast, the
minimum Hausdorff distance measures nearby points in the two sets, and thus is not very
sensitive to pixel classification errors as long as nearby pixels were correctly classified.
Algorithm 2.1. Given two input binary image arrays, A[k, l] and B[s, t], compute the
discrete directed Hausdorff distance under translation, $F_B[x, y]$, from B[s, t] to A[k, l] for
each translation (x, y) of B[s, t].
1. Compute the array D'[k, l] that specifies the distance to the closest nonzero pixel of
A[k, l] using a distance transform or special graphics hardware, as discussed above.
2. Denote the nonzero pixels of B[s, t] by $(s_1, t_1), \ldots, (s_q, t_q)$. For each value (x, y):
a. Compute the maximum over all values j, $1 \le j \le q$, of $B[s_j, t_j]\, D'[s_j + x, t_j + y]$.
b. Set $F_B[x, y]$ to the maximum computed in the previous step.
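The two steps of Algorithm 2.1 can be sketched as follows. This is an illustrative version only: the distance transform is computed by brute force instead of a fast transform or graphics hardware, only translations that keep the nonzero pixels of B inside the array of A are evaluated, and the function names and the tiny test images are assumptions made for the example.

```python
import math

def distance_transform(img):
    """Brute-force D'[k][l]: distance from (k, l) to the nearest nonzero pixel."""
    on = [(k, l) for k, row in enumerate(img) for l, v in enumerate(row) if v]
    return [[min(math.hypot(k - a, l - b) for a, b in on)
             for l in range(len(img[0]))] for k in range(len(img))]

def directed_hausdorff_under_translation(A, B):
    """F_B[(x, y)] for every translation keeping B's nonzero pixels inside A's array."""
    D_prime = distance_transform(A)                                   # step 1
    on_B = [(s, t) for s, row in enumerate(B) for t, v in enumerate(row) if v]
    rows, cols = len(A), len(A[0])
    F_B = {}
    for x in range(-rows + 1, rows):
        for y in range(-cols + 1, cols):
            if all(0 <= s + x < rows and 0 <= t + y < cols for s, t in on_B):
                # Step 2: B[s_j][t_j] is 1 on these pixels, so the product
                # reduces to a lookup in D'.
                F_B[(x, y)] = max(D_prime[s + x][t + y] for s, t in on_B)
    return F_B

A = [[0, 0, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
B = [[1, 1],
     [1, 0]]

F_B = directed_hausdorff_under_translation(A, B)
best = min(F_B, key=F_B.get)
print(best, F_B[best])  # (1, 1) 0.0: B lands exactly on the pattern in A
```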
The bidirectional Hausdorff distance is the maximum of the directed Hausdorff distance
from A to B and from B to A. However, in computing the bidirectional distance we must
be sure to translate the sets A and B consistently. Simply using Algorithm 2.1 as a
subroutine ends up with inconsistent translations (the translations will have opposite
signs from each other). The output F[x, y] specifies a distance for each translation (x, y)
of the model array B[s, t].
Algorithm 2.2. Given two input binary image arrays A[k, l] and B[s, t], compute the
discrete Hausdorff distance under translation, F[x, y], for each translation (x, y) of B[s, t].
1. Compute the array D[s, t] that specifies the distance to the closest nonzero pixel of
B[s, t] using a distance transform or special graphics hardware, as discussed above.
2. Compute the analogous array D'[k, l] for A[k, l].
3. Denote the nonzero pixels of A[k, l] by $(k_1, l_1), \ldots, (k_p, l_p)$, and denote the nonzero
pixels of B[s, t] by $(s_1, t_1), \ldots, (s_q, t_q)$. For each value (x, y):
a. Compute the maximum over all values i, $1 \le i \le p$, of $A[k_i, l_i]\, D[k_i - x, l_i - y]$.
b. Compute the maximum over all values j, $1 \le j \le q$, of $B[s_j, t_j]\, D'[s_j + x, t_j + y]$.
c. Set F[x, y] to the larger of the values computed in the previous two steps.
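A minimal sketch of Algorithm 2.2 follows. To keep it short, the two distance arrays are replaced by a brute-force nearest-pixel lookup that is valid at any integer position, so it shows the consistent use of the translation (subtracting (x, y) in step 3a, adding it in step 3b) rather than an efficient implementation. The function names, the set of candidate translations and the test images are assumptions.

```python
import math

def nonzero_pixels(img):
    return [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]

def nearest_distance(pixels, r, c):
    """Stand-in for a distance-array lookup: distance from (r, c) to the nearest pixel."""
    return min(math.hypot(r - pr, c - pc) for pr, pc in pixels)

def hausdorff_under_translation(A, B, translations):
    """F[(x, y)]: Hausdorff distance between A and B translated by (x, y)."""
    on_A, on_B = nonzero_pixels(A), nonzero_pixels(B)
    F = {}
    for x, y in translations:
        a_to_b = max(nearest_distance(on_B, k - x, l - y) for k, l in on_A)  # step 3a
        b_to_a = max(nearest_distance(on_A, s + x, t + y) for s, t in on_B)  # step 3b
        F[(x, y)] = max(a_to_b, b_to_a)                                      # step 3c
    return F

A = [[0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 1, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0]]
B = [[1, 1],
     [1, 0]]

translations = [(x, y) for x in range(-4, 5) for y in range(-4, 5)]
F = hausdorff_under_translation(A, B, translations)
best = min(F, key=F.get)
print(best, F[best])  # (2, 1) 0.0
```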
practical use in many situations because there are often nonzero pixels of the image that
have nothing to do with an instance of the object. In such cases, the distance "from the
image to the model" will never be small, because these image pixels will not be near any
nonzero pixels of the translated model. Thus, rather than maximizing over all nonzero
pixels $(k_i, l_i)$ of A[k, l], it is more natural to maximize over those nonzero pixels of A[k, l]
that are covered by the translated array B[s, t]. This is analogous to the operation of
correlation, where in effect a given translation of the model masks out all of the image
except the portion that is covered by the model.
In many machine vision and pattern recognition applications it is also important to be
able to identify instances of a model that are only partly visible (either due to occlusion
or to failure of the sensing device to detect the entire object). The Hausdorff distance
under translation can naturally be extended to the problem of finding the best partial
matches between a 'model' bitmap B[s, t] and an 'image' bitmap A[k, l]. Recall that for
each translation t = (x, y) the computation of F[x, y] simply determines the distance of the point
of the translated model $B \oplus t$ that is farthest from any point of the image A (and vice
versa). Thus in effect each point of $B \oplus t$ is ranked by the distance to the nearest point of
A (and vice versa). That is, the largest ranked point (the point farthest from any point
of the other set) determines the distance. Hence, rather than maximizing over these
rankings, it is possible to compute some quantile, or percentage value (for more details
see Huttenlocher et al., 1991b). This yields a natural notion of the best partial match at
each translation.
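The ranking idea can be sketched as follows for a single fixed translation. The rule used here for picking the rank, K = ceil(fraction × |B|), is an assumption made for illustration; the precise definition is the one in Huttenlocher et al. (1991b). The function name and sample points are likewise hypothetical.

```python
import math

def partial_directed_hausdorff(B, A, fraction):
    """K-th smallest nearest-neighbour distance from points of B to A,
    with K = ceil(fraction * |B|) (an assumed quantile rule)."""
    ranked = sorted(min(math.dist(b, a) for a in A) for b in B)
    k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[k - 1]

image = [(0, 0), (1, 0), (2, 0), (3, 0)]
model = [(0, 0), (1, 0), (2, 0), (3, 10)]   # last model point has no nearby image point

print(partial_directed_hausdorff(model, image, 1.0))   # 10.0: the full maximum
print(partial_directed_hausdorff(model, image, 0.75))  # 0.0: best three of four points
```

With the fraction set to 1.0 the function reduces to the ordinary directed Hausdorff distance; smaller fractions ignore the worst-matching points.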
We have implemented the above algorithms for computing the rasterized approxima-
tion to the Hausdorff distance under translation. In experiments contrasting the method
with binary correlation, we have found that the Hausdorff distance is substantially less
sensitive to small perturbations in the data than is correlation. The main reason is that
the Hausdorff distance measures spatial proximity, whereas correlation measures exact
superposition.
We now describe a metric for comparing polygonal shapes under similarity transfor-
mations (Arkin et al., 1991). Consider a simple polygon represented as a sequence of
points $P = (p_1, \ldots, p_n)$, $p_i \in \mathbb{R}^2$. An alternative representation of P is the turning
function $\Theta_P(s)$, which measures the cumulative angle of the counter-clockwise tangent as
a function of the arc-length s (starting from some reference point on the boundary).
That is, $\Theta_P(s)$ keeps track of the turning that takes place (see figure 2). Note that this
is somewhat different from the standard definition, because we measure the cumulative
angle rather than the angle between the tangent and the reference orientation. Without
loss of generality, we assume that each polygon is rescaled so that the total perimeter
length is 1; hence, $\Theta_P$ is a function from [0, 1] to $\mathbb{R}$.
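One way to build such a turning function for a polygon is sketched below. It assumes the vertices are listed counter-clockwise, takes the first vertex as the reference point and the x-axis as the reference orientation, and returns the piecewise-constant function as a list of (arc length, cumulative angle) steps; the function name and this representation are choices made for the example.

```python
import math

def turning_function(poly):
    """Return [(s_i, theta_i)]: theta_i is the cumulative tangent angle on the
    edge that starts at normalized arc length s_i."""
    n = len(poly)
    edges = [(poly[(i + 1) % n][0] - poly[i][0],
              poly[(i + 1) % n][1] - poly[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)
    steps = []
    s = 0.0
    theta = math.atan2(edges[0][1], edges[0][0])   # angle of the first edge
    for i in range(n):
        if i > 0:
            (ux, uy), (vx, vy) = edges[i - 1], edges[i]
            # Signed turn at the vertex: positive for left turns, negative for right.
            theta += math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)
        steps.append((s, theta))
        s += lengths[i] / perimeter                # rescale perimeter to 1
    return steps

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for s, theta in turning_function(square):
    print(s, theta)
# Steps at s = 0, 0.25, 0.5, 0.75 with angles 0, pi/2, pi, 3*pi/2.
```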
We define the distance between two polygons P and Q to be the minimum of the $L_2$
distance between their two turning functions $\Theta_P(s)$ and $\Theta_Q(s)$, where the minimization
is done over all possible relative orientations and all possible starting locations, $s_0$, along
the boundary. Schwartz and Sharir (1984) have defined a similar distance function that is
limited to comparing convex polygons. However, they compute an approximation based
on discretizing the turning functions into equally spaced points, where the quality of the
approximation depends on the number of points chosen.
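Consider two polygons P and Q and their associated turning functions $\Theta_P(s)$ and
$\Theta_Q(s)$. The $L_2$ distance between $\Theta_P(s)$ and $\Theta_Q(s)$ is given by
$$d_2(P, Q) = \left( \min_{\theta \in \mathbb{R},\, t \in [0,1]} \int_0^1 |\Theta_P(s + t) - \Theta_Q(s) + \theta|^2 \, ds \right)^{1/2} = \left( \min_{\theta \in \mathbb{R},\, t \in [0,1]} h_{P,Q}(t, \theta) \right)^{1/2},$$
where
$$h_{P,Q}(t, \theta) = \int_0^1 |\Theta_P(s + t) - \Theta_Q(s) + \theta|^2 \, ds.$$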
Figure 3. The rectangular strips formed by the functions $\Theta_P(s)$ (solid lines) and $\Theta_Q(s)$ (dashed
lines). The shaded region between $\Theta_P(s + t)$ and $\Theta_Q(s)$ is the area being minimized.
PROOF. Clearly $d_2(\cdot, \cdot)$ is everywhere nonnegative, is symmetric, and has the identity
property, because the $L_2$ norm is a metric and has these properties. By a straightforward
application of the Minkowski inequality for $L_p$ metrics, it can also be shown that $d_2(\cdot, \cdot)$
obeys the triangle inequality; see Arkin et al. (1991) for more detail. □
We can compute the value of $h_{P,Q}(t, \theta)$ for a fixed t simply by adding up the value of
the integral within each strip defined by a consecutive pair of discontinuities in $\Theta_P(s)$
and $\Theta_Q(s)$ (see figure 3). The integral within a strip is trivially computed as the width of
the strip times the square of the difference $|\Theta_P(s + t) - \Theta_Q(s)|$ (which is constant within
each strip). Note that if m and n are the numbers of vertices in P and Q, respectively,
then there are m + n strips and that as $\theta$ changes, the value of the integral for each strip
is a quadratic function of $\theta$. Thus it is straightforward to show that,
CLAIM 3.2. For any fixed value of t, $h_{P,Q}(t, \theta)$ is a quadratic function of $\theta$.
In order to compute $d_2(P, Q)$, we must minimize $h_{P,Q}(t, \theta)$ over all t and $\theta$. We begin
by finding the optimal $\theta$ for any fixed value of t. To simplify notation in the following
discussion, we use $f(s) = \Theta_P(s)$, $g(s) = \Theta_Q(s)$, and $h(t, \theta) = h_{P,Q}(t, \theta)$.
PROOF.
$$\frac{\partial h(t, \theta)}{\partial \theta} = \int_0^1 (2\theta + 2f(s + t) - 2g(s)) \, ds = 2\theta + 2\int_0^1 (f(s + t) - g(s)) \, ds.$$
Claim 3.2 assures us that the minimum occurs when we set this quantity equal to zero
and solve for $\theta$. Thus,
$$\theta^*(t) = \int_0^1 (g(s) - f(s + t)) \, ds.$$
□
Substituting the expression for $\theta^*(t)$ in $d_2(P, Q)$ we are left with a one-variable
minimization problem,
$$d_2(P, Q) = \left\{ \min_{t \in [0,1]} \left[ \int_0^1 [f(s + t) - g(s)]^2 \, ds - [\theta^*(t)]^2 \right] \right\}^{1/2} \qquad (3.1)$$
In order to compute $d_2(P, Q)$ we show that the function we are minimizing, $h(t, \theta)$,
achieves its minimum at one of mn discrete points on [0, 1], which we call critical events.
Recall that in the process of finding $d_2(P, Q)$ we have to shift the function f(s) to
f(s + t) for $t \in [0, 1]$. During this shifting operation, the breakpoints of f collide with the
breakpoints of g. We define a critical event as a value of t where a breakpoint of f collides
with a breakpoint of g. Clearly there are mn such critical events for m breakpoints in f
and n breakpoints in g. Using the fact that the minimum is obtained at a critical event,
we show how to compute $d_2(P, Q)$ in time $O(n^2 \log(n))$ (or $O(mn \log(mn))$ for unequal
numbers of vertices).
CLAIM 3.4. If $f(\cdot)$ and $g(\cdot)$ are two piecewise-constant functions with m and n
breakpoints respectively, then for constant $\theta$,
$$h(t, \theta) = \int_0^1 (f(s + t) - g(s) + \theta)^2 \, ds$$
is a piecewise-linear function of t, with mn breakpoints.
PROOF. We give a geometric proof. First recall that for a given value of t the
discontinuities in f and g define a set of m + n rectangular strips (see figure 3). The value of
$h(t, \theta)$ is simply the sum over all these strips of the width of a strip times the square of
its height. Except at critical events, as f is shifted the width of each strip changes, but
the height remains constant. Each changing rectangle contributes to changes in $h(t, \theta)$.
If t is the amount of shift, then for a shrinking rectangle, the change is (-t) times the
square of the height; for a growing rectangle the change is (+t) times the square of the
height. Since the heights are constant, the change in $h(t, \theta)$ is a sum of linear terms and
is therefore linear. Breakpoints in $h(t, \theta)$ clearly occur at each of the mn critical events
where a discontinuity of f is aligned with a discontinuity of g. □
This result leads to a straightforward algorithm for computing $d_2(P, Q)$. Let $(t^*, \theta^*)$
be the location of the minimum value of $h(t, \theta)$. By the preceding proposition, $h(t, \theta^*)$ is
piecewise-linear as a function of t with breakpoints at a fixed set of critical values; thus,
$t^*$ must be at one of the critical values. Now, $h(t, \theta^*(t)) = h(t, 0) - [\theta^*(t)]^2 = h(t, 0) - [\alpha - 2\pi t]^2$
(from equation 3.1, with $\alpha = \int_0^1 g(s)\,ds - \int_0^1 f(s)\,ds$), so it suffices to evaluate
$h(t, 0) = \int_0^1 [f(s + t) - g(s)]^2 \, ds$ at critical values of t. For each such value of t, recall that
we can compute h(t, 0) in linear time, simply by adding up the width times the squared height
of each of the strips. The optimal value $\theta^*(t)$ for each t can then be computed in constant
time (by Claim 3.3). Thus the time for each critical event is linear, and the overall running
time is $O(mn(m + n))$. This time
bound can be improved by using a somewhat more complex algorithm.
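The straightforward O(mn(m + n)) procedure just described can be sketched as follows. Turning functions are represented as lists of (start, value) steps on [0, 1) whose first step starts at 0, extended by f(s + 1) = f(s) + 2π; this representation, the function names and the sample polygons are assumptions made for the example. For brevity the optimal angle is recomputed from the strips at each critical shift instead of using its closed form.

```python
import math
from bisect import bisect_right

def evaluate(steps, s):
    """Value at s of a turning function [(start, value), ...] with steps[0][0] == 0,
    extended beyond [0, 1) by f(s + 1) = f(s) + 2*pi."""
    wraps = math.floor(s)
    starts = [b for b, _ in steps]
    return steps[bisect_right(starts, s - wraps) - 1][1] + 2 * math.pi * wraps

def strips(f, g, t):
    """Yield (width, f(s + t) - g(s)) for each strip between consecutive breakpoints."""
    cuts = sorted({0.0, 1.0} | {b for b, _ in g} | {(b - t) % 1.0 for b, _ in f})
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)
        yield hi - lo, evaluate(f, mid + t) - evaluate(g, mid)

def turning_distance(f, g):
    """Evaluate equation (3.1), minimizing over the critical shifts only."""
    events = sorted({(bf - bg) % 1.0 for bf, _ in f for bg, _ in g})
    best = math.inf
    for t in events:
        h0 = sum(w * d * d for w, d in strips(f, g, t))    # h(t, 0)
        theta = -sum(w * d for w, d in strips(f, g, t))    # optimal angle at this t
        best = min(best, h0 - theta * theta)
    return math.sqrt(max(best, 0.0))                       # guard against round-off

half_pi = math.pi / 2
square = [(0.0, 0.0), (0.25, half_pi), (0.5, 2 * half_pi), (0.75, 3 * half_pi)]
rotated = [(s, v + 1.0) for s, v in square]                # same square turned by 1 radian
rectangle = [(0.0, 0.0), (1 / 3, half_pi), (0.5, 2 * half_pi), (5 / 6, 3 * half_pi)]

print(turning_distance(square, rotated))    # close to 0: rotation is factored out
print(turning_distance(square, rectangle))  # strictly positive
```

Algorithm 3.1 avoids the linear-time re-evaluation at every critical event by updating the slope incrementally; the sketch above does not attempt that refinement.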
CLAIM 3.5. The distance $d_2(P, Q)$ between two polygons P and Q (with m and n vertices)
can be computed exactly in time $O(mn \log(mn))$.
The basic idea is to compute $h(t, \theta^*(t))$ for each of the critical values of t. From the
above discussion we know that it suffices to evaluate $h(t, 0) = \int_0^1 [f(s + t) - g(s)]^2 \, ds$ at
critical values of t. Now we observe that by keeping track of a small set of values we can
easily determine how the function h(t, 0) changes at each critical event. The values we
keep track of are based on the rectangular strips that appear between the two functions
f(s) and g(s). Recall that g(s) is fixed in place and that f(s) is shifted backwards by t.
For a given value of t, the discontinuities in f(s + t) and g(s) define a set of rectangular
strips, as was illustrated in figure 3. Each rectangular strip has f at the top and g at the
bottom or vice-versa. The sides of a strip are determined by discontinuities in f and g.
We separate the strips into two groups based on the discontinuities at the sides of the
strips: $R_{fg}$ for those with f on the left and g on the right and $R_{gf}$ for those with g on the
left and f on the right. We keep track of two quantities: $H_{fg}$ and $H_{gf}$. $H_{fg}$ is the sum
of the squares of the heights of all the strips in $R_{fg}$, and $H_{gf}$ is the sum of the squares
of the heights of all the strips in $R_{gf}$. The algorithm is based on the observation that for
values of t between two critical events the slope of h(t, 0) is $H_{fg} - H_{gf}$. This follows from
the fact that, as f is shifted backwards by t, $R_{fg}$ is the set of all strips that increase in
width by t, and $R_{gf}$ is the set of all strips that decrease in width by t. The widths of the
$R_{ff}$ and $R_{gg}$ strips remain unchanged.
Figure 4. Comparing several polygons with a square using $d_2(P, Q)$.
Consider what happens at one of the critical events, where the change is no longer
simply linear. We claim that the quantities $H_{fg}$ and $H_{gf}$ can be easily updated at these
points. To see this note that, at a critical event, a $gf$-type strip disappears (its width
goes to zero) and a new $fg$-type strip appears (see figure 3). At the same time, the right
boundary of the adjacent strip to the left is converted from $g$ to $f$, and the left boundary
of the adjacent strip to the right is converted from $f$ to $g$. To update $H_{fg}$ and $H_{gf}$ we
need to know just the values of $f$ and $g$ around the critical event.
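Written out, the update at a critical event might look as follows. This is a sketch under the assumption that exactly one discontinuity of $f$ meets one discontinuity of $g$ at the event; simultaneous events would be processed one at a time. The symbols are introduced here for illustration only: $f^-$ and $f^+$ are the values of $f$ just to the left and right of its discontinuity, $g^-$ and $g^+$ are the corresponding values of $g$, and $X$ and $Y$ are the types ($f$ or $g$) of the far boundaries of the adjacent strips to the left and to the right.

```latex
\begin{aligned}
&\text{vanishing $gf$ strip:} && H_{gf} \leftarrow H_{gf} - (f^- - g^+)^2\\
&\text{new $fg$ strip:}       && H_{fg} \leftarrow H_{fg} + (f^+ - g^-)^2\\
&\text{left strip, } X = f \ (fg \to f\!f): && H_{fg} \leftarrow H_{fg} - (f^- - g^-)^2\\
&\text{left strip, } X = g \ (gg \to gf):   && H_{gf} \leftarrow H_{gf} + (f^- - g^-)^2\\
&\text{right strip, } Y = g \ (fg \to gg):  && H_{fg} \leftarrow H_{fg} - (f^+ - g^+)^2\\
&\text{right strip, } Y = f \ (f\!f \to gf): && H_{gf} \leftarrow H_{gf} + (f^+ - g^+)^2
\end{aligned}
```

Exactly one of the two left-strip cases and one of the two right-strip cases applies, so each event costs a constant number of arithmetic operations.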
Algorithm 3.1. Given two polygons $P$ and $Q$, compute the distance $d_2(P, Q)$.
1. Compute the turning function representations, $f$ and $g$, of the polygons $P$ and $Q$,
respectively.
2. Initialize:
• Given the piecewise-constant functions $f$ and $g$, determine the critical events:
the shifts of $f$ by $t$ such that a discontinuity in $f$ coincides with a discontinuity
in $g$. Sort these critical events by how far $f$ must be shifted for each event
to occur. Let $c_0, c_1, \ldots, c_e$ be the ordered list of shifts for the critical events;
$c_0 = 0$.
• Calculate $h(0, 0)$. This involves summing the contributions of each of $m + n$
strips and takes linear time.
• Determine initial values for $H_{fg}$ and $H_{gf}$.
3. For $i = 1$ to $e$
• Determine the value of
$h(c_i, 0) = (H_{fg} - H_{gf})(c_i - c_{i-1}) + h(c_{i-1}, 0)$.
• Update $H_{fg}$ and $H_{gf}$.
It is easy to see that the time for initialization is dominated by the time it takes to sort
the critical events: $O(e \log e)$, where $e$ is the number of critical events, or $O(mn \log(mn))$,
where $m$ and $n$ are the sizes of the two polygons. The updates required for the remainder
of the algorithm take a total of $O(e)$, or $O(mn)$, time.
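The Python sketch below illustrates steps 2 and 3 under stated assumptions; it is not the authors' implementation. Each turning function is assumed to be given as a pair (breakpoints, values) on a perimeter normalized to length 1, with the first breakpoint at 0, and to be extended by F(s + 1) = F(s) + 2π, as for a counterclockwise simple polygon. All function names (`evaluate`, `strips`, `h_direct`, `h_at_events`, `integral`, `d2`) are hypothetical. For brevity the sketch rebuilds $H_{fg}$ and $H_{gf}$ from scratch between each pair of events instead of updating them in constant time, so it does not achieve the $O(mn \log(mn))$ bound. The last function eliminates $\theta$ by completing the square and takes the distance to be the square root of the smallest resulting value over the critical events; both steps are assumptions about material outside this passage.

```python
import bisect
import math

TWO_PI = 2.0 * math.pi


def evaluate(breaks, values, x):
    """Turning function at x, extended by F(x + 1) = F(x) + 2*pi."""
    k = math.floor(x)
    i = bisect.bisect_right(breaks, x - k) - 1
    return values[i] + TWO_PI * k


def strips(f, g, t):
    """Strips between f(s + t) and g(s) over one period.

    Returns (width, height, left_type, right_type) tuples; the types record
    whether each side of the strip is a discontinuity of f or of g.
    """
    (fb, fv), (gb, gv) = f, g
    marks = sorted([((a - t) % 1.0, "f") for a in fb] + [(b, "g") for b in gb])
    out = []
    for k, (left, ltype) in enumerate(marks):
        right, rtype = marks[(k + 1) % len(marks)]
        if k == len(marks) - 1:
            right += 1.0  # the last strip wraps around the period
        mid = 0.5 * (left + right)
        height = evaluate(fb, fv, mid + t) - evaluate(gb, gv, mid)
        out.append((right - left, height, ltype, rtype))
    return out


def h_direct(f, g, t):
    """h(t, 0) as the sum of width * height^2 over all strips."""
    return sum(w * ht * ht for w, ht, _, _ in strips(f, g, t))


def h_at_events(f, g):
    """Steps 2 and 3: (c_i, h(c_i, 0)) for every critical shift c_i."""
    events = sorted({(a - b) % 1.0 for a in f[0] for b in g[0]} | {0.0})
    h = h_direct(f, g, 0.0)
    result = [(0.0, h)]
    for prev, cur in zip(events, events[1:]):
        # Simplification: rebuild H_fg and H_gf at a shift strictly between
        # the two events instead of updating them in constant time.
        s = strips(f, g, 0.5 * (prev + cur))
        h_fg = sum(ht * ht for _, ht, lt, rt in s if (lt, rt) == ("f", "g"))
        h_gf = sum(ht * ht for _, ht, lt, rt in s if (lt, rt) == ("g", "f"))
        h += (h_fg - h_gf) * (cur - prev)
        result.append((cur, h))
    return result


def integral(breaks, values):
    """Integral over [0, 1] of a piecewise-constant function."""
    ends = list(breaks[1:]) + [1.0]
    return sum(v * (b - a) for v, a, b in zip(values, breaks, ends))


def d2(f, g):
    """Assumed distance: sqrt of min over events of h(c, 0) - (alpha - 2*pi*c)^2."""
    alpha = integral(*g) - integral(*f)
    best = min(h - (alpha - TWO_PI * c) ** 2 for c, h in h_at_events(f, g))
    return math.sqrt(max(best, 0.0))  # clamp tiny negative rounding errors
```

As a usage example, a square can be encoded as `([0.0, 0.25, 0.5, 0.75], [0.0, math.pi / 2, math.pi, 3 * math.pi / 2])`; calling `d2` on two copies of it gives zero up to rounding, and each value returned by `h_at_events` can be checked against `h_direct` at the same shift.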
Figure 5. Comparing several polygons with a triangle using $d_2(P, Q)$.
We have implemented this method, and it runs quickly in practice, even for polygons
with hundreds of vertices. We illustrate some of the qualitative aspects of the distance
function $d_2(P, Q)$ by comparing some simple polygons using the above algorithm. In
addition to providing a distance, $d_2(P, Q)$, between two polygons, the method gives the
relative orientation, $\theta^*$, and the corresponding reference points of the two polygons for
which this distance is attained. Consider the ten shapes shown in figures 4 and 5. In
figure 4 the shapes are ordered by their distance from the square, and in figure 5 the
same shapes are ordered by their distance from the triangle. (Note that the numbers under
each shape reflect just the ordering, and not the magnitude of the distance.) The order of
the shapes corresponds remarkably well to our intuitive idea of shape resemblance. The
match to the cut-off triangle suggests that the metric is useful for matching partially
occluded objects, as long as the overall shape of the object does not change too radically.
A straightforward extension of the algorithm applies to shapes composed of piecewise
circular arcs rather than line segments. In this case, the function $\Theta_P(s)$ is piecewise linear
rather than piecewise constant.
4. Summary
We have discussed a number of methods for comparing shapes. We defined two geometric
objects $A$ and $B$ to have the same shape, $A = B$, if they are in the same equivalence
class with respect to a given transformation group (i.e. if $T(A) = B$ for some
transformation $T$ in the group). We have argued that shape comparison functions should obey
metric properties, and have presented several distance metrics that are efficiently
computable both in theory and practice. The methods are applicable to problems in pattern
recognition, computer vision, and robotics. In particular, we described a function for
comparing sets of points or line segments in the plane using the Hausdorff distance as
a function of translation. This shape comparison function can be computed efficiently
in theory, and a close approximation can be computed efficiently in practice. We also
investigated how this method can be extended to sets of points in $\mathbb{R}^3$. The second class
of shape comparison functions we discussed is based on comparing the turning function
representations of polygons. This latter class of methods does not appear to extend easily
to shapes in higher dimensions, which is an important area for many applications.
References
P.K. Agarwal, M. Sharir and S. Toledo (1992), "Applications of parametric searching in geometric
optimization", Proc. 3rd ACM-SIAM Symp. Discrete Algorithms, to appear.
H. Alt, B. Behrends and J. Blomer (1991), "Measuring the resemblance of polygonal shapes", Proc. 7th
ACM Symp. Computational Geom., N. Conway, NH, 186-193.
H. Alt, K. Mehlhorn, H. Wagener and E. Welzl (1988), "Congruence, similarity, and symmetries of
geometric objects", Discrete and Computational Geom., 3, 237-256.
E. Arkin, L.P. Chew, D.P. Huttenlocher, K. Kedem and J.S.B. Mitchell (1991), "An efficiently computable
metric for comparing polygonal shapes", IEEE Trans. Patt. Anal. Mach. Intell., 13(3), 209-216.
G. Borgefors (1986), "Distance transforms in digital images", Comput. Vision Graph. Image Processing,
34, 344-371.
H. Edelsbrunner, L.J. Guibas and M. Sharir (1989), "The upper envelope of piecewise linear functions",
Discrete and Computational Geom., 4, 311-336.
H. Edelsbrunner and R. Seidel (1986), "Voronoi diagrams and arrangements", Discrete and Computa-
tional Geom., 1, 25-44.
D.P. Huttenlocher and K. Kedem (1990), "Computing the minimum Hausdorff distance for point sets
under translation", Proc. 6th ACM Symp. Computational Geom., Berkeley, CA, 340-349.
D.P. Huttenlocher, K. Kedem and M. Sharir (1991a), "The upper envelope of Voronoi surfaces and its
applications", Proc. 7th ACM Symp. Computational Geom., N. Conway, NH, 194-203.
D.P. Huttenlocher, G.A. Klanderman and W.J. Rucklidge (1991b), Comparing Images Using the
Hausdorff Distance Under Translation, Technical Report 91-1211, Dept. of Comput. Sci., Cornell
University, Ithaca, NY.
D. Mumford (1987), "The problem of robust shape descriptors", Proc. 1st Int. Conf. Comput. Vision,
IEEE Comput. Soc. Press, 602-606.
J. Pach and M. Sharir (1989), "The upper envelope of piecewise linear functions and the region enclosed
by convex plates, I: combinatorial analysis", Discrete and Computational Geom., 4, 291-309.
H.L. Royden (1968), Real Analysis, Macmillan, NY.
J.T. Schwartz and M. Sharir (1984), Some Remarks on Robot Vision, Technical Report 119, Robotics
Report 25, Courant Institute of Math. Sci., New York University, NY.