
International Journal of Computer Vision 59(2), 167–181, 2004

© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.




Efficient Graph-Based Image Segmentation

PEDRO F. FELZENSZWALB
Artificial Intelligence Lab, Massachusetts Institute of Technology
[email protected]

DANIEL P. HUTTENLOCHER
Computer Science Department, Cornell University
[email protected]

Received September 24, 1999; Revised August 26, 2003; Accepted September 17, 2003

Abstract. This paper addresses the problem of segmenting an image into regions. We define a predicate for
measuring the evidence for a boundary between two regions using a graph-based representation of the image. We
then develop an efficient segmentation algorithm based on this predicate, and show that although this algorithm
makes greedy decisions it produces segmentations that satisfy global properties. We apply the algorithm to image
segmentation using two different kinds of local neighborhoods in constructing the graph, and illustrate the results
with both real and synthetic images. The algorithm runs in time nearly linear in the number of graph edges and
is also fast in practice. An important characteristic of the method is its ability to preserve detail in low-variability
image regions while ignoring detail in high-variability regions.

Keywords: image segmentation, clustering, perceptual organization, graph algorithm

1. Introduction

The problems of image segmentation and grouping remain great challenges for computer vision. Since the time of the Gestalt movement in psychology (e.g., Wertheimer, 1938), it has been known that perceptual grouping plays a powerful role in human visual perception. A wide range of computational vision problems could in principle make good use of segmented images, were such segmentations reliably and efficiently computable. For instance, intermediate-level vision problems such as stereo and motion estimation require an appropriate region of support for correspondence operations. Spatially non-uniform regions of support can be identified using segmentation techniques. Higher-level problems such as recognition and image indexing can also make use of segmentation results in matching, to address problems such as figure-ground separation and recognition by parts.

Our goal is to develop computational approaches to image segmentation that are broadly useful, much in the way that other low-level techniques such as edge detection are used in a wide range of computer vision tasks. In order to achieve such broad utility, we believe it is important that a segmentation method have the following properties:

1. Capture perceptually important groupings or regions, which often reflect global aspects of the image. Two central issues are to provide precise characterizations of what is perceptually important, and to be able to specify what a given segmentation technique does. We believe that there should be precise definitions of the properties of a resulting segmentation, in order to better understand the method as well as to facilitate the comparison of different approaches.

2. Be highly efficient, running in time nearly linear in the number of image pixels. In order to be of practical use, we believe that segmentation methods should run at speeds similar to edge detection or other low-level visual processing techniques, meaning nearly linear time and with low constant factors. For example, a segmentation technique that runs at several frames per second can be used in video processing applications.

While the past few years have seen considerable progress in eigenvector-based methods of image segmentation (e.g., Shi and Malik, 1997; Weiss, 1999), these methods are too slow to be practical for many applications. In contrast, the method described in this paper has been used in large-scale image database applications as described in Ratan et al. (1999). While there are other approaches to image segmentation that are highly efficient, these methods generally fail to capture perceptually important non-local properties of an image as discussed below. The segmentation technique developed here both captures certain perceptually important non-local image characteristics and is computationally efficient, running in O(n log n) time for n image pixels and with low constant factors, and can run in practice at video rates.

As with certain classical clustering methods (Urquhart, 1982; Zahn, 1971), our method is based on selecting edges from a graph, where each pixel corresponds to a node in the graph, and certain neighboring pixels are connected by undirected edges. Weights on each edge measure the dissimilarity between pixels. However, unlike the classical methods, our technique adaptively adjusts the segmentation criterion based on the degree of variability in neighboring regions of the image. This results in a method that, while making greedy decisions, can be shown to obey certain non-obvious global properties. We also show that other adaptive criteria, closely related to the one developed here, result in problems that are computationally difficult (NP-hard).

We now turn to a simple synthetic example illustrating some of the non-local image characteristics captured by our segmentation method. Consider the image shown in the top left of Fig. 1. Most people will say that this image has three distinct regions: a rectangular-shaped intensity ramp in the left half, a constant intensity region with a hole on the right half, and a high-variability rectangular region inside the constant region. This example illustrates some perceptually important properties that we believe should be captured by a segmentation algorithm.

Figure 1. A synthetic image with three perceptually distinct regions, and the three largest regions found by our segmentation method (image
320 × 240 pixels; algorithm parameters σ = 0.8, k = 300, see Section 5 for an explanation of the parameters).

First, widely varying intensities should not alone be judged as evidence for multiple regions. Such wide variation in intensities occurs both in the ramp on the left and in the high variability region on the right. Thus it is not adequate to assume that regions have nearly constant or slowly varying intensities.

A second perceptually important aspect of the example in Fig. 1 is that the three regions cannot be obtained using purely local decision criteria. This is because the intensity difference across the boundary between the ramp and the constant region is actually smaller than many of the intensity differences within the high variability region. Thus, in order to segment such an image, some kind of adaptive or non-local criterion must be used.

The method that we introduce in Section 3.1 measures the evidence for a boundary between two regions by comparing two quantities: one based on intensity differences across the boundary, and the other based on intensity differences between neighboring pixels within each region. Intuitively, the intensity differences across the boundary of two regions are perceptually important if they are large relative to the intensity differences inside at least one of the regions. We develop a simple algorithm which computes segmentations using this idea. The remaining parts of Fig. 1 show the three largest regions found by our algorithm. Although this method makes greedy decisions, it produces results that capture certain global properties which are derived below and whose consequences are illustrated by the example in Fig. 1. The method also runs in a small fraction of a second for the 320 × 240 image in the example.

The organization of this paper is as follows. In the next section we discuss some related work, including both classical formulations of segmentation and recent graph-based methods. In Section 3 we consider a particular graph-based formulation of the segmentation problem and define a pairwise region comparison predicate. Then in Section 4 we present an algorithm for efficiently segmenting an image using this predicate, and derive some global properties that it obeys even though it is a greedy algorithm. In Section 5 we show results for a number of images using the image grid to construct a graph-based representation of the image data. Then in Section 6 we illustrate the method using more general graphs, but where the number of edges is still linear in the number of pixels. Using this latter approach yields results that capture high-level scene properties such as extracting a flower bed as a single region, while still preserving fine detail in other portions of the image. In the Appendix we show that a straightforward generalization of the region comparison predicate presented in Section 3 makes the problem of finding a good segmentation NP-hard.

2. Related Work

There is a large literature on segmentation and clustering, dating back over 30 years, with applications in many areas other than computer vision (cf. Jain and Dubes, 1988). In this section we briefly consider some of the related work that is most relevant to our approach: early graph-based methods (e.g., Urquhart, 1982; Zahn, 1971), region merging techniques (e.g., Cooper, 1998; Pavlidas, 1977), techniques based on mapping image pixels to some feature space (e.g., Comaniciu and Meer, 1997, 1999) and more recent formulations in terms of graph cuts (e.g., Shi and Malik, 1997; Wu and Leahy, 1993) and spectral methods (e.g., Weiss, 1999).

Graph-based image segmentation techniques generally represent the problem in terms of a graph G = (V, E), where each node vi ∈ V corresponds to a pixel in the image and an edge (vi, vj) ∈ E connects vertices vi and vj. A weight is associated with each edge based on some property of the pixels that it connects, such as their image intensities. Depending on the method, there may or may not be an edge connecting each pair of vertices. The earliest graph-based methods use fixed thresholds and local measures in computing a segmentation. The work of Zahn (1971) presents a segmentation method based on the minimum spanning tree (MST) of the graph. This method has been applied both to point clustering and to image segmentation. For image segmentation the edge weights in the graph are based on the differences between pixel intensities, whereas for point clustering the weights are based on distances between points.

The segmentation criterion in Zahn's method is to break MST edges with large weights. The inadequacy of simply breaking large edges, however, is illustrated by the example in Fig. 1. As mentioned in the introduction, differences between pixels within the high variability region can be larger than those between the ramp and the constant region. Thus, depending on the threshold, simply breaking large weight edges would either result in the high variability region being split into multiple regions, or would merge the ramp and the constant region together.

The algorithm proposed by Urquhart (1982) attempts to address this shortcoming by normalizing the weight of an edge using the smallest weight incident on the vertices touching that edge. When applied to image segmentation problems, however, this is not enough to provide a reasonable adaptive segmentation criterion. For example, many pixels in the high variability region of Fig. 1 have some neighbor that is highly similar.

Another early approach to image segmentation is that of splitting and merging regions according to how well each region fits some uniformity criterion (e.g., Cooper, 1998; Pavlidas, 1977). Generally these uniformity criteria obey a subset property, such that when a uniformity predicate U(A) is true for some region A then U(B) is also true for any B ⊂ A. Usually such criteria are aimed at finding either uniform intensity or uniform gradient regions. No region uniformity criterion that has been proposed to date could be used to correctly segment the example in Fig. 1, due to the high variation region. Either this region would be split into pieces, or it would be merged with the surrounding area.

A number of approaches to segmentation are based on finding compact clusters in some feature space (cf. Comaniciu and Meer, 1997; Jain and Dubes, 1988). These approaches generally assume that the image is piecewise constant, because searching for pixels that are all close together in some feature space implicitly requires that the pixels be alike (e.g., similar color). A recent technique using feature space clustering (Comaniciu and Meer, 1999) first transforms the data by smoothing it in a way that preserves boundaries between regions. This smoothing operation has the overall effect of bringing points in a cluster closer together. The method then finds clusters by dilating each point with a hypersphere of some fixed radius, and finding connected components of the dilated points. This technique for finding clusters does not require all the points in a cluster to lie within any fixed distance. The technique is actually closely related to the region comparison predicate that we introduce in Section 3.1, which can be viewed as an adaptive way of selecting an appropriate dilation radius. We return to this issue in Section 6.

Finally we briefly consider a class of segmentation methods based on finding minimum cuts in a graph, where the cut criterion is designed in order to minimize the similarity between pixels that are being split. Work by Wu and Leahy (1993) introduced such a cut criterion, but it was biased toward finding small components. This bias was addressed with the normalized cut criterion developed by Shi and Malik (1997), which takes into account self-similarity of regions. These cut-based approaches to segmentation capture non-local properties of the image, in contrast with the early graph-based methods. However, they provide only a characterization of each cut rather than of the final segmentation.

The normalized cut criterion provides a significant advance over the previous work in Wu and Leahy (1993), both from a theoretical and practical point of view (the resulting segmentations capture intuitively salient parts of an image). However, the normalized cut criterion also yields an NP-hard computational problem. While Shi and Malik develop approximation methods for computing the minimum normalized cut, the error in these approximations is not well understood. In practice these approximations are still fairly hard to compute, limiting the method to relatively small images or requiring computation times of several minutes. Recently Weiss (1999) has shown how the eigenvector-based approximations developed by Shi and Malik relate to more standard spectral partitioning methods on graphs. However, all such methods are too slow for many practical applications.

An alternative to the graph cut approach is to look for cycles in a graph embedded in the image plane. For example, in Jermyn and Ishikawa (2001) the quality of each cycle is normalized in a way that is closely related to the normalized cuts approach.

3. Graph-Based Segmentation

We take a graph-based approach to segmentation. Let G = (V, E) be an undirected graph with vertices vi ∈ V, the set of elements to be segmented, and edges (vi, vj) ∈ E corresponding to pairs of neighboring vertices. Each edge (vi, vj) ∈ E has a corresponding weight w(vi, vj), which is a non-negative measure of the dissimilarity between neighboring elements vi and vj. In the case of image segmentation, the elements in V are pixels and the weight of an edge is some measure of the dissimilarity between the two pixels connected by that edge (e.g., the difference in intensity, color, motion, location or some other local attribute). In Sections 5 and 6 we consider particular edge sets and weight functions for image segmentation. However, the formulation here is independent of these definitions.

In the graph-based approach, a segmentation S is a partition of V into components such that each component (or region) C ∈ S corresponds to a connected component in a graph G′ = (V, E′), where E′ ⊆ E. In other words, any segmentation is induced by a subset of the edges in E. There are different ways to measure the quality of a segmentation, but in general we want the elements in a component to be similar, and elements in different components to be dissimilar. This means that edges between two vertices in the same component should have relatively low weights, and edges between vertices in different components should have higher weights.

3.1. Pairwise Region Comparison Predicate

In this section we define a predicate, D, for evaluating whether or not there is evidence for a boundary between two components in a segmentation (two regions of an image). This predicate is based on measuring the dissimilarity between elements along the boundary of the two components relative to a measure of the dissimilarity among neighboring elements within each of the two components. The resulting predicate compares the inter-component differences to the within-component differences and is thereby adaptive with respect to the local characteristics of the data.

We define the internal difference of a component C ⊆ V to be the largest weight in the minimum spanning tree of the component, MST(C, E). That is,

Int(C) = max_{e ∈ MST(C,E)} w(e).   (1)

One intuition underlying this measure is that a given component C only remains connected when edges of weight at least Int(C) are considered.

We define the difference between two components C1, C2 ⊆ V to be the minimum weight edge connecting the two components. That is,

Dif(C1, C2) = min_{vi ∈ C1, vj ∈ C2, (vi,vj) ∈ E} w(vi, vj).   (2)

If there is no edge connecting C1 and C2 we let Dif(C1, C2) = ∞. This measure of difference could in principle be problematic, because it reflects only the smallest edge weight between two components. In practice we have found that the measure works quite well in spite of this apparent limitation. Moreover, changing the definition to use the median weight, or some other quantile, in order to make it more robust to outliers, makes the problem of finding a good segmentation NP-hard, as discussed in the Appendix. Thus a small change to the segmentation criterion vastly changes the difficulty of the problem.

The region comparison predicate evaluates if there is evidence for a boundary between a pair of components by checking if the difference between the components, Dif(C1, C2), is large relative to the internal difference within at least one of the components, Int(C1) and Int(C2). A threshold function is used to control the degree to which the difference between components must be larger than the minimum internal difference. We define the pairwise comparison predicate D(C1, C2) as

D(C1, C2) = true if Dif(C1, C2) > MInt(C1, C2), and false otherwise,   (3)

where the minimum internal difference, MInt, is defined as

MInt(C1, C2) = min(Int(C1) + τ(C1), Int(C2) + τ(C2)).   (4)

The threshold function τ controls the degree to which the difference between two components must be greater than their internal differences in order for there to be evidence of a boundary between them (D to be true). For small components, Int(C) is not a good estimate of the local characteristics of the data. In the extreme case, when |C| = 1, Int(C) = 0. Therefore, we use a threshold function based on the size of the component,

τ(C) = k/|C|,   (5)

where |C| denotes the size of C and k is some constant parameter. That is, for small components we require stronger evidence for a boundary. In practice k sets a scale of observation, in that a larger k causes a preference for larger components. Note, however, that k is not a minimum component size. Smaller components are allowed when there is a sufficiently large difference between neighboring components.
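To make these definitions concrete, the following is a minimal sketch (our illustration, not code from the paper) of the comparison predicate, assuming the internal difference Int(C) and the size |C| of each component have already been computed; all function names are ours.

```python
def tau(size, k):
    # Threshold function tau(C) = k / |C| from Eq. (5).
    return k / size

def mint(int1, size1, int2, size2, k):
    # Minimum internal difference MInt(C1, C2) from Eq. (4).
    return min(int1 + tau(size1, k), int2 + tau(size2, k))

def boundary_predicate(dif, int1, size1, int2, size2, k):
    # D(C1, C2) from Eq. (3): there is evidence for a boundary when the
    # difference between the components exceeds MInt(C1, C2).
    return dif > mint(int1, size1, int2, size2, k)

# Example: Dif = 20 between a component with Int = 2 and 100 elements and a
# component with Int = 5 and 4 elements gives a boundary when k = 300,
# because MInt = min(2 + 3, 5 + 75) = 5 < 20.
print(boundary_predicate(dif=20, int1=2, size1=100, int2=5, size2=4, k=300))
```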
Any non-negative function of a single component can be used for τ without changing the algorithmic results in Section 4. For instance, it is possible to have the segmentation method prefer components of certain shapes, by defining a τ which is large for components that do not fit some desired shape and small for ones that do.

This would cause the segmentation algorithm to aggressively merge components that are not of the desired shape. Such a shape preference could be as weak as preferring components that are not long and thin (e.g., using a ratio of perimeter to area) or as strong as preferring components that match a particular shape model. Note that the result of this would not solely be components of the desired shape; however, for any two neighboring components one of them would be of the desired shape.

4. The Algorithm and its Properties

In this section we describe and analyze an algorithm for producing a segmentation using the decision criterion D introduced above. We will show that a segmentation produced by this algorithm obeys the properties of being neither too coarse nor too fine, according to the following definitions.

Definition 1. A segmentation S is too fine if there is some pair of regions C1, C2 ∈ S for which there is no evidence for a boundary between them.

In order to define the complementary notion of what it means for a segmentation to be too coarse (to have too few components), we first introduce the notion of a refinement of a segmentation. Given two segmentations S and T of the same base set, we say that T is a refinement of S when each component of T is contained in (or equal to) some component of S. In addition, we say that T is a proper refinement of S when T ≠ S. Note that if T is a proper refinement of S, then T can be obtained by splitting one or more regions of S. When T is a proper refinement of S we say that T is finer than S and that S is coarser than T.

Definition 2. A segmentation S is too coarse when there exists a proper refinement of S that is not too fine.

This captures the intuitive notion that if regions of a segmentation can be split and yield a segmentation where there is evidence for a boundary between all pairs of neighboring regions, then the initial segmentation has too few regions.

Two natural questions arise about segmentations that are neither too coarse nor too fine, namely whether or not one always exists, and if so whether or not it is unique. First we note that in general there can be more than one segmentation that is neither too coarse nor too fine, so such a segmentation is not unique. On the question of existence, there is always some segmentation that is both not too coarse and not too fine, as we now establish.

Property 1. For any (finite) graph G = (V, E) there exists some segmentation S that is neither too coarse nor too fine.

It is easy to see why this property holds. Consider the segmentation where all the elements are in a single component. Clearly this segmentation is not too fine, because there is only one component. If the segmentation is also not too coarse we are done. Otherwise, by the definition of what it means to be too coarse there is a proper refinement that is not too fine. Pick one of those refinements and keep repeating this procedure until we obtain a segmentation that is not too coarse. The procedure can only go on for n − 1 steps because whenever we pick a proper refinement we increase the number of components in the segmentation by at least one, and the finest segmentation we can get is the one where every element is in its own component.

We now turn to the segmentation algorithm, which is closely related to Kruskal's algorithm for constructing a minimum spanning tree of a graph (cf. Cormen et al., 1990). It can be implemented to run in O(m log m) time, where m is the number of edges in the graph.

Algorithm 1. Segmentation algorithm.

The input is a graph G = (V, E), with n vertices and m edges. The output is a segmentation of V into components S = (C1, . . . , Cr).

0. Sort E into π = (o1, . . . , om), by non-decreasing edge weight.
1. Start with a segmentation S^0, where each vertex vi is in its own component.
2. Repeat step 3 for q = 1, . . . , m.
3. Construct S^q given S^{q−1} as follows. Let vi and vj denote the vertices connected by the q-th edge in the ordering, i.e., oq = (vi, vj). If vi and vj are in disjoint components of S^{q−1} and w(oq) is small compared to the internal difference of both those components, then merge the two components; otherwise do nothing. More formally, let Ci^{q−1} be the component of S^{q−1} containing vi and Cj^{q−1} the component containing vj.

If Ci^{q−1} ≠ Cj^{q−1} and w(oq) ≤ MInt(Ci^{q−1}, Cj^{q−1}), then S^q is obtained from S^{q−1} by merging Ci^{q−1} and Cj^{q−1}. Otherwise S^q = S^{q−1}.
4. Return S = S^m.
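As a concrete illustration of Algorithm 1, the following is a sketch under our own naming (not the authors' released implementation). It keeps the components in a disjoint-set forest and stores, for each component, its size and internal difference Int, updated as discussed in Section 4.1; edges are assumed to be (weight, i, j) tuples.

```python
def segment_graph(num_vertices, edges, k):
    """Greedy segmentation in the spirit of Algorithm 1. edges is a list of
    (w, i, j) tuples; k is the threshold-function constant from Eq. (5).
    Returns a list mapping each vertex to its component representative."""
    parent = list(range(num_vertices))
    rank = [0] * num_vertices
    size = [1] * num_vertices          # |C| for each component root
    internal = [0.0] * num_vertices    # Int(C) for each component root

    def find(x):
        # Path compression by halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Union by rank; returns the root of the merged component.
        if rank[a] < rank[b]:
            a, b = b, a
        parent[b] = a
        if rank[a] == rank[b]:
            rank[a] += 1
        size[a] += size[b]
        return a

    # Step 0: sort the edges by non-decreasing weight.
    edges = sorted(edges)
    # Steps 1-3: consider each edge in order and merge when D does not hold.
    for w, i, j in edges:
        a, b = find(i), find(j)
        if a == b:
            continue
        m_int = min(internal[a] + k / size[a], internal[b] + k / size[b])
        if w <= m_int:
            root = union(a, b)
            # The merging edge is the maximum weight edge in the MST of the
            # new component (see Lemma 1), so Int is updated in constant time.
            internal[root] = w
    return [find(v) for v in range(num_vertices)]

# Tiny usage example: a path of four vertices with one large gap splits into
# two components, [0, 0, 0, 3].
print(segment_graph(4, [(1, 0, 1), (1, 1, 2), (10, 2, 3)], k=5))
```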
We now establish that a segmentation S produced by Algorithm 1 obeys the global properties of being neither too fine nor too coarse when using the region comparison predicate D defined in (3). That is, although the algorithm makes only greedy decisions it produces a segmentation that satisfies these global properties. Moreover, we show that any of the possible non-decreasing weight edge orderings that could be picked in Step 0 of the algorithm produce the same segmentation.

Lemma 1. In Step 3 of the algorithm, when considering edge oq, if two distinct components are considered and not merged then one of these two components will be in the final segmentation. Let Ci^{q−1} and Cj^{q−1} denote the two components connected by edge oq = (vi, vj) when this edge is considered by the algorithm. Then either Ci = Ci^{q−1} or Cj = Cj^{q−1}, where Ci is the component containing vi and Cj is the component containing vj in the final segmentation S.

Proof: There are two cases that would result in a merge not happening. Say that it is due to w(oq) > Int(Ci^{q−1}) + τ(Ci^{q−1}). Since edges are considered in non-decreasing weight order, w(ok) ≥ w(oq) for all k ≥ q + 1. Thus no additional merges will happen to this component, i.e., Ci = Ci^{q−1}. The case for w(oq) > Int(Cj^{q−1}) + τ(Cj^{q−1}) is analogous.

Note that Lemma 1 implies that the edge causing the merge of two components is exactly the minimum weight edge between the components. Thus the edges causing merges are exactly the edges that would be selected by Kruskal's algorithm for constructing the minimum spanning tree (MST) of each component.

Theorem 1. The segmentation S produced by Algorithm 1 is not too fine according to Definition 1, using the region comparison predicate D defined in (3).

Proof: By definition, in order for S to be too fine there is some pair of components for which D does not hold. There must be at least one edge between such a pair of components that was considered in Step 3 and did not cause a merge. Let oq = (vi, vj) be the first such edge in the ordering. In this case the algorithm decided not to merge Ci^{q−1} with Cj^{q−1}, which implies w(oq) > MInt(Ci^{q−1}, Cj^{q−1}). By Lemma 1 we know that either Ci = Ci^{q−1} or Cj = Cj^{q−1}. Either way we see that w(oq) > MInt(Ci, Cj), implying D holds for Ci and Cj, which is a contradiction.

Theorem 2. The segmentation S produced by Algorithm 1 is not too coarse according to Definition 2, using the region comparison predicate D defined in (3).

Proof: In order for S to be too coarse there must be some proper refinement, T, that is not too fine. Consider the minimum weight edge e that is internal to a component C ∈ S but connects distinct components A, B ∈ T. Note that by the definition of refinement A ⊂ C and B ⊂ C.

Since T is not too fine, either w(e) > Int(A) + τ(A) or w(e) > Int(B) + τ(B). Without loss of generality, say the former is true. By construction any edge connecting A to another sub-component of C has weight at least as large as w(e), which is in turn larger than the maximum weight edge in MST(A, E) because w(e) > Int(A). Thus the algorithm, which considers edges in non-decreasing weight order, must consider all the edges in MST(A, E) before considering any edge from A to other parts of C. So the algorithm must have formed A before forming C, and in forming C it must have merged A with some other sub-component of C. The weight of the edge that caused this merge must be at least as large as w(e). However, the algorithm would not have merged A in this case because w(e) > Int(A) + τ(A), which is a contradiction because the algorithm did form C.

Theorem 3. The segmentation produced by Algorithm 1 does not depend on which non-decreasing weight order of the edges is used.

Proof: Any ordering can be changed into another one by only swapping adjacent elements. Thus it is sufficient to show that swapping the order of two adjacent edges of the same weight in the non-decreasing weight ordering does not change the result produced by Algorithm 1.

Let e1 and e2 be two edges of the same weight that are adjacent in some non-decreasing weight ordering. Clearly if, when the algorithm considers the first of these two edges, they connect disjoint pairs of components or exactly the same pair of components, then the order in which the two are considered does not matter.

The only case we need to check is when e1 is between two components A and B and e2 is between one of these components, say B, and some other component C.

Now we show that e1 causes a merge when considered after e2 exactly when it would cause a merge if considered before e2. First, suppose that e1 causes a merge when considered before e2. This implies w(e1) ≤ MInt(A, B). If e2 were instead considered before e1, either e2 would not cause a merge and trivially e1 would still cause a merge, or e2 would cause a merge in which case the new component B ∪ C would have Int(B ∪ C) = w(e2) = w(e1). So we know w(e1) ≤ MInt(A, B ∪ C), which implies e1 will still cause a merge. On the other hand, suppose that e1 does not cause a merge when considered before e2. This implies w(e1) > MInt(A, B). Then either w(e1) > Int(A) + τ(A), in which case this would still be true if e2 were considered first (because e2 does not touch A), or w(e1) > Int(B) + τ(B). In this second case, if e2 were considered first it could not cause a merge since w(e2) = w(e1) and so w(e2) > MInt(B, C). Thus when considering e1 after e2 we still have w(e1) > MInt(A, B) and e1 does not cause a merge.

4.1. Implementation Issues and Running Time

Our implementation maintains the segmentation S using a disjoint-set forest with union by rank and path compression (cf. Cormen et al., 1990). The running time of the algorithm can be factored into two parts. First, in Step 0 it is necessary to sort the weights into non-decreasing order. For integer weights this can be done in linear time using counting sort, and in general it can be done in O(m log m) time using any one of several sorting methods.
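As a small aside (our sketch, not from the paper), when the weights are small integers, e.g., absolute intensity differences in the range 0–255, Step 0 can be performed with a counting sort along these lines:

```python
def counting_sort_edges(edges, max_weight=255):
    # Bucket the edges by integer weight; concatenating the buckets gives a
    # non-decreasing ordering in O(m + max_weight) time.
    buckets = [[] for _ in range(max_weight + 1)]
    for w, i, j in edges:
        buckets[w].append((w, i, j))
    return [e for bucket in buckets for e in bucket]

print(counting_sort_edges([(3, 0, 1), (0, 1, 2), (3, 2, 3), (1, 0, 2)], max_weight=3))
```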
Steps 1–3 of the algorithm take O(mα(m)) time, where α is the very slow-growing inverse Ackermann function. In order to check whether two vertices are in the same component we use set-find on each vertex, and in order to merge two components we use set-union. Thus there are at most three disjoint-set operations per edge. The computation of MInt can be done in constant time per edge if we know Int and the size of each component. Maintaining Int for a component can be done in constant time for each merge, as the maximum weight edge in the MST of a component is simply the edge causing the merge. This is because Lemma 1 implies that the edge causing the merge is the minimum weight edge between the two components being merged. The size of a component after a merge is simply the sum of the sizes of the two components being merged.

5. Results for Grid Graphs

First we consider the case of monochrome (intensity) images. Color images are handled as three separate monochrome images, as discussed below. As in other graph-based approaches to image segmentation (e.g., Shi and Malik, 1997; Wu and Leahy, 1993; Zahn, 1971) we define an undirected graph G = (V, E), where each image pixel pi has a corresponding vertex vi ∈ V. The edge set E is constructed by connecting pairs of pixels that are neighbors in an 8-connected sense (any other local neighborhood could be used). This yields a graph where m = O(n), so the running time of the segmentation algorithm is O(n log n) for n image pixels. We use an edge weight function based on the absolute intensity difference between the pixels connected by an edge,

w(vi, vj) = |I(pi) − I(pj)|,

where I(pi) is the intensity of the pixel pi. In general we use a Gaussian filter to smooth the image slightly before computing the edge weights, in order to compensate for digitization artifacts. We always use a Gaussian with σ = 0.8, which does not produce any visible change to the image but helps remove artifacts.
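A rough sketch of this grid-graph construction is given below (our own code, not the authors'; it assumes NumPy and uses SciPy's gaussian_filter as a stand-in for the smoothing step). The resulting edge list can be fed to a segmentation routine such as the segment_graph sketch above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grid_graph_edges(image, sigma=0.8):
    """Build 8-connected grid-graph edges for a monochrome image.
    Returns (w, i, j) tuples where i, j index pixels in row-major order and
    w = |I(p_i) - I(p_j)| computed on the slightly smoothed image."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    h, w = smoothed.shape
    idx = lambda y, x: y * w + x
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]  # each undirected edge once
    edges = []
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    weight = abs(smoothed[y, x] - smoothed[ny, nx])
                    edges.append((weight, idx(y, x), idx(ny, nx)))
    return edges

# Possible usage with the segment_graph sketch above:
# labels = segment_graph(h * w, grid_graph_edges(img, sigma=0.8), k=300)
```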

For color images we run the algorithm three times, once for each of the red, green and blue color planes, and then intersect these three sets of components. Specifically, we put two neighboring pixels in the same component when they appear in the same component in all three of the color plane segmentations. Alternatively one could run the algorithm just once on a graph where the edge weights measure the distance between pixels in some color space; however, experimentally we obtained better results by intersecting the segmentations for each color plane in the manner just described.
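One way to realize this intersection (only a sketch with hypothetical names; it assumes the three per-plane segmentations are given as label maps of equal size and, for simplicity, uses 4-connected neighbors) is to treat the triple of plane labels as a key and take connected components of pixels whose keys agree:

```python
def intersect_segmentations(labels_r, labels_g, labels_b):
    """Put two neighboring pixels in the same output component only if they
    share a component in all three per-plane label maps."""
    h, w = len(labels_r), len(labels_r[0])
    parent = list(range(h * w))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def key(y, x):
        return (labels_r[y][x], labels_g[y][x], labels_b[y][x])

    for y in range(h):
        for x in range(w):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w and key(y, x) == key(ny, nx):
                    parent[find(ny * w + nx)] = find(y * w + x)
    # Two pixels share a component exactly when they are joined by a path of
    # neighbors that agree in all three color-plane segmentations.
    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```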
There is one runtime parameter for the algorithm, which is the value of k that is used to compute the threshold function τ. Recall we use the function τ(C) = k/|C|, where |C| is the number of elements in C. Thus k effectively sets a scale of observation, in that a larger k causes a preference for larger components. We use two different parameter settings for the examples in this section (and throughout the paper), depending on the resolution of the image and the degree to which fine detail is important in the scene. For instance, in the 128 × 128 images of the COIL database of objects we use k = 150. In the 320 × 240 or larger images, such as the street scene and the baseball player, we use k = 300.

The first image in Fig. 2 shows a street scene. Note that there is considerable variation in the grassy slope leading up to the fence. It is this kind of variability that our algorithm is designed to handle (recall the high variability region in the synthetic example in Fig. 1). The second image shows the segmentation, where each region is assigned a random color. The six largest components found by the algorithm are: three of the grassy areas behind the fence, the grassy slope, the van, and the roadway. The missing part of the roadway at the lower left is a visibly distinct region in the color image from which this segmentation was computed (a spot due to an imaging artifact). Note that the van is also not uniform in color, due to specular reflections, but these are diffuse enough that they are treated as internal variation and incorporated into a single region.

The first image in Fig. 3 shows two baseball players (from Shi and Malik, 1997). As in the previous example, there is a grassy region with considerable variation. The uniforms of the players also have substantial variation due to folds in the cloth. The second image shows the segmentation. The six largest components found by the algorithm are: the back wall, the Mets emblem, a large grassy region (including part of the wall under the top player), each of the two players' uniforms, and a small grassy patch under the second player. The large grassy region includes part of the wall due to the relatively high variation in the region, and the fact that there is a long slow change in intensity (not strong evidence for a boundary) between the grass and the wall. This "boundary" is similar in magnitude to those within the player uniforms due to folds in the cloth.

Figure 4 shows the results of the algorithm for an image of an indoor scene, where both fine detail and larger structures are perceptually important. Note that the segmentation preserves small regions such as the name tags the people are wearing and things behind the windows, while creating single larger regions for high variability areas such as the air conditioning duct near the top of the image, the clothing and the furniture. This image also shows that sometimes small "boundary regions" are found, for example at the edge of the jacket or shirt. Such narrow regions occur because there is a one or two pixel wide area that is halfway between the two neighboring regions in color and intensity. This is common in any segmentation method based on grid graphs. Such regions can be eliminated if desired, by removing long thin regions whose color or intensity is close to the average of neighboring regions.

Figure 5 shows three simple objects from the Columbia COIL image database. Shown for each is the largest region found by our algorithm that is not part of the black background. Note that each of these objects has a substantial intensity gradient across the face of the object, yet the regions are correctly segmented. This illustrates another situation that the algorithm was designed to handle, slow changes in intensity due to lighting.

6. Results for Nearest Neighbor Graphs

One common approach to image segmentation is based on mapping each pixel to a point in some feature space, and then finding clusters of similar points (e.g., Comaniciu and Meer, 1997, 1999; Jain and Dubes, 1988). In this section we investigate using the graph-based segmentation algorithm from Section 4 in order to find such clusters of similar points. In this case, the graph G = (V, E) has a vertex corresponding to each feature point (each pixel) and there is an edge (vi, vj) connecting pairs of feature points vi and vj that are nearby in the feature space, rather than using neighboring pixels in the image grid. There are several possible ways of determining which feature points to connect by edges. We connect each point to a fixed number of nearest neighbors. Another possibility is to use all the neighbors within some fixed distance d. In any event, it is desirable to avoid considering all O(n^2) pairs of feature points.

The weight w(vi, vj) of an edge is the distance between the two corresponding points in feature space. For the experiments shown here we map each pixel to the feature point (x, y, r, g, b), where (x, y) is the location of the pixel in the image and (r, g, b) is the color value of the pixel. We use the L2 (Euclidean) distance between points as the edge weights, although other distance functions are possible.

The internal difference measure, Int(C), has a relatively simple underlying intuition for points in feature space. It specifies the minimum radius of dilation necessary to connect the set of feature points contained in C together into a single volume in feature space. Consider replacing each feature point by a ball with radius r. From the definition of the MST it can be seen that the union of these balls will form one single connected volume only when r ≥ Int(C)/2.

Figure 2. A street scene (320 × 240 color image), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).

Figure 3. A baseball scene (432 × 294 grey image), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).

Figure 4. An indoor scene (image 320 × 240, color), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).

The difference between components, Dif(C1, C2), also has a simple underlying intuition. It specifies the minimum radius of dilation necessary to connect at least one point of C1 to a point of C2. Our segmentation technique is thus closely related to the work of Comaniciu and Meer (1999), which similarly takes an approach to clustering based on dilating points in a parameter space (however they first use a novel transformation of the data that we do not perform, and then use a fixed dilation radius rather than the variable one that we use).

Rather than constructing the complete graph, where all points are neighbors of one another, we find a small fixed number of neighbors for each point.

This results in a graph with O(n) edges for n image pixels, and an overall running time of the segmentation method of O(n log n). There are many possible ways of picking a small fixed number of neighbors for each point. We use the ANN algorithm (Arya and Mount, 1993) to find the nearest neighbors for each point. This algorithm is quite fast in practice, given a 5-dimensional feature space with several hundred thousand points. The ANN method also allows the finding of approximate nearest neighbors, which runs more quickly than finding the actual nearest neighbors. For the examples reported here we use ten nearest neighbors of each pixel to generate the edges of the graph.
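A rough sketch of this nearest neighbor graph construction follows (our code; it uses scikit-learn's NearestNeighbors as a stand-in for the ANN library of Arya and Mount, and the (x, y, r, g, b) feature map described above):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_graph_edges(image_rgb, num_neighbors=10):
    """Map each pixel to (x, y, r, g, b) and connect it to its nearest
    neighbors in that feature space; weights are Euclidean distances."""
    h, w, _ = image_rgb.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([xs.ravel(), ys.ravel(),
                             image_rgb.reshape(-1, 3)]).astype(float)
    nn = NearestNeighbors(n_neighbors=num_neighbors + 1).fit(feats)
    dists, nbrs = nn.kneighbors(feats)   # column 0 is each point itself
    edges = []
    for i in range(feats.shape[0]):
        for d, j in zip(dists[i, 1:], nbrs[i, 1:]):
            # Symmetric duplicates may occur; they are harmless for the
            # segmentation algorithm, which skips same-component edges.
            edges.append((d, i, int(j)))
    return edges

# Possible usage with the segment_graph sketch above:
# labels = segment_graph(h * w, nn_graph_edges(img), k=300)
```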
One of the key differences from the previous section, where the image grid was used to define the graph, is that the nearest neighbors in feature space capture more spatially non-local properties of the image. In the grid-graph case, all of the neighbors in the graph are neighbors in the image. Here, points can be far apart in the image and still be among a handful of nearest neighbors (if their color is highly similar and intervening image pixels are of dissimilar color). For instance, this can result in segmentations with regions that are disconnected in the image, which did not happen in the grid-graph case.

Figure 6 shows a synthetic image from Perona and Freeman (1998) and Gdalyahu et al. (1999) and its segmentation, using k = 150 and with no smoothing (σ = 0). In this example the spatially disconnected regions do not reflect interesting scene structures, but we will see examples below which do.

For the remaining examples in this section, we use k = 300 and σ = 0.8, as in the previous section. First, we note that the nearest neighbor graph produces similar results to the grid graph for images in which the perceptually salient regions are spatially connected. For instance, the street scene and baseball player scene considered in the previous section yield very similar segmentations using either the nearest neighbor graph or the grid graph, as can be seen by comparing the results in Fig. 7 with those in Figs. 2 and 3.

Figure 8 shows two additional examples using the nearest neighbor graph. These results are not possible to achieve with the grid graph approach because certain interesting regions are not spatially connected. The first example shows a flower garden, where the red flowers are spatially disjoint in the foreground of the image, and then merge together in the background. Most of these flowers are merged into a single region, which would not be possible with the grid-graph method. The second example in Fig. 8 shows the Eiffel tower at night. The bright yellow light forms a spatially disconnected region. These examples show that the segmentation method, coupled with the use of a nearest neighbor graph, can capture very high level properties of images, while preserving perceptually important region boundaries.

Figure 5. Three images from the COIL database, and the largest non-background component found in each image (128 × 128 color images; algorithm parameters σ = 0.8, k = 150).

Figure 6. A synthetic image (40 × 32 grey image) and the segmentation using the nearest neighbor graph (σ = 0, k = 150).

Figure 7. Segmentation of the street and baseball player scenes from the previous section, using the nearest neighbor graph rather than the
grid graph (σ = 0.8, k = 300).

Figure 8. Segmentation using the nearest neighbor graph can capture spatially non-local regions (σ = 0.8, k = 300).

7. Summary and Conclusions

In this paper we have introduced a new method for image segmentation based on pairwise region comparison. We have shown that the notions of a segmentation being too coarse or too fine can be defined in terms of a function which measures the evidence for a boundary between pairs of regions. Our segmentation algorithm makes simple greedy decisions, and yet produces segmentations that obey the global properties of being not too coarse and not too fine using a particular region comparison function. The method runs in O(m log m) time for m graph edges and is also fast in practice, generally running in a fraction of a second.

The pairwise region comparison predicate we use considers the minimum weight edge between two regions in measuring the difference between them. Thus our algorithm will merge two regions even if there is a single low weight edge between them. This is not as much of a problem as it might first appear, in part because this edge weight is compared only to the minimum spanning tree edges of each component. For instance, the examples considered in Sections 5 and 6 illustrate that the method finds segmentations that capture many perceptually important aspects of complex imagery. Nonetheless, one can envision measures that require more than a single cheap connection before deciding that there is no evidence for a boundary between two regions. One natural way of addressing this issue is to use a quantile rather than the minimum edge weight. However, in this case finding a segmentation that is neither too coarse nor too fine is an NP-hard problem (as shown in the Appendix). Our algorithm is unique, in that it is both highly efficient and yet captures non-local properties of images.
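For concreteness, a quantile-based difference of the kind alluded to here could be computed along the following lines (our sketch; Eq. (6) in the Appendix gives the formal definition):

```python
def quantile_dif(cross_edge_weights, K=0.5):
    # K-th quantile of the weights of all edges joining two regions;
    # K = 0.5 gives the median, and K -> 0 approaches the minimum of Eq. (2).
    if not cross_edge_weights:
        return float("inf")
    weights = sorted(cross_edge_weights)
    index = min(int(K * len(weights)), len(weights) - 1)
    return weights[index]

print(quantile_dif([4.0, 1.0, 9.0, 2.0], K=0.5))  # -> 4.0
```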
We have illustrated our image segmentation algorithm with two different kinds of graphs. The first of these uses the image grid to define a local neighborhood between image pixels, and measures the difference in intensity (or color) between each pair of neighbors. The second of these maps the image pixels to points in a feature space that combines the (x, y) location and (r, g, b) color value. Edges in the graph connect points that are close together in this feature space. The algorithm yields good results using both kinds of graphs, but the latter type of graph captures more perceptually global aspects of the image.

Image segmentation remains a challenging problem; however, we are beginning to make substantial progress through the introduction of graph-based algorithms that both help refine our understanding of the problem and provide useful computational tools. The work reported here and the normalized cuts approach (Shi and Malik, 1997) are just a few illustrations of these recent advances.

Appendix: NP-Hardness of D with Quantiles

Intuitively the region comparison predicate D defined in Section 3.1 could be made more robust by changing the definition of the difference between two regions to reflect a quantile rather than the minimum weight edge between them. We show that with this modification the problem of finding a segmentation that is neither too coarse nor too fine becomes NP-hard.

The only difference between the new problem and the old one is the definition of the difference between two regions C1, C2 ∈ S in Eq. (2), which becomes

Dif(C1, C2) = Kth_{vi ∈ C1, vj ∈ C2, (vi,vj) ∈ E} w(vi, vj),   (6)

where Kth selects the Kth quantile of its arguments (K should be between zero and one). For example, with K = 0.5 the difference becomes the median edge weight between the two components. This quantile is computed over all edges (vi, vj) such that vi ∈ C1 and vj ∈ C2.

We reduce the min ratio cut problem with uniform capacities and demands to the problem of finding a good segmentation. The min ratio cut problem with uniform capacities and demands is the following: we are given a graph G = (V, E) and another set of edges F. Each edge in E indicates a unit capacity between a pair of nodes and each edge in F indicates a unit demand between a pair of nodes. The value of a cut (A, B) is the ratio of the total capacity and the total demand between the sets A and B. So it is the ratio of the number of edges in E crossing the cut and the number of edges in F crossing the cut. Finding the value of the minimum ratio cut is an NP-hard problem (cf. Ausiello et al., to appear).

First we show how to transform an instance of this problem to one where the sets E and F are disjoint, without modifying the value of the minimum cut. For every edge (a, b) ∈ E ∩ F we create a new node ab, and exchange the edge in E with the edges (a, ab) and (b, ab). For a cut with a and b on the same side, it is always better to keep ab on that side too, and the value of the cut is the same as in the original graph. For a cut with a and b on different sides the node ab can be on either side; there will be one capacity and one demand edge crossing the cut and the value of the cut is again the same as in the original graph.

Now we show how to decide if the modified instance of the min ratio cut problem has a cut with value at most v by solving a segmentation problem. Let c be the number of edges from E crossing a cut (A, B) and similarly d the number of edges from F crossing (A, B). It is easy to show that the cut value is small exactly when the fraction of edges crossing the cut that come from F is large:

c/d ≤ v ⇔ d/(c + d) ≥ 1/(v + 1).   (7)

Define G′ = (V, E′) where E′ = E ∪ F. We let the edges from E have weight zero and the edges from F have weight one.

Lemma 2. The graph G has a cut with value at most v if and only if a segmentation of G′ is not one single component, where Dif is defined in Eq. (6), K = 1 − 1/(v + 1) and τ(C) = 0 for all C.

Proof: First we show that if G has a cut (A, B) with value at most v there exists C ⊆ A such that the segmentation {C, C̄} is not too fine. We just need to find C such that Int(C) = 0 and Dif(C, C̄) = 1. If G has a cut (A, B) with value at most v, then Eq. (7) tells us that d/(c + d) ≥ 1/(v + 1). Remember that there are d edges of weight one and c edges of weight zero crossing the cut. So the fraction of weight one edges crossing the cut is at least 1/(v + 1). Look at the connected components of A using only edges of weight zero. Clearly Int(C) = 0 for all such components. Let C be the component with the largest fraction of weight one edges going to B. This fraction must be at least 1/(v + 1). Moreover, the fraction of weight one edges between C and C̄ = V − C is at least as large, since C̄ = B ∪ (C̄ ∩ A) and there are only weight one edges between C and C̄ ∩ A. This implies the fraction of weight zero edges between C and C̄ is less than 1 − 1/(v + 1) = K. So the Kth quantile weight between C and C̄ is one. Thus Dif(C, C̄) = 1 and the segmentation S = {C, C̄} of G′ is not too fine. Hence the segmentation of G′ as a single component is too coarse.

If G′ has a segmentation that is not a single component, S = {C1, . . . , Cl}, then the Kth quantile edge weight between every pair of components Ci and Cj is one (or else the segmentation would be too fine). Thus the Kth quantile edge weight between C1 and C̄1 = C2 ∪ · · · ∪ Cl is one. So the fraction of weight one edges between C1 and C̄1 is at least 1/(v + 1). Equation (7) implies that the value of the cut (C1, C̄1) is at most v.

It is straightforward to see that the transformation of the min ratio cut problem to the problem of finding a segmentation presented above can be done in polynomial time. This is sufficient to show the hardness of the segmentation problem.

Theorem 4. The problem of finding a segmentation that is neither too coarse nor too fine using Dif as defined in Eq. (6) is NP-hard.

Acknowledgments

This work was supported in part by gifts from Intel, Microsoft and Xerox corporations, in part by DARPA under contract DAAL01-97-K-0104, and in part by NSF Research Infrastructure award CDA-9703470. We would like to thank Shree Nayar, Jianbo Shi and Daphna Weinshall for use of their images. We would also like to thank Jon Kleinberg, Eva Tardos and Dan Ramras for discussions about the algorithm and the NP-hardness result.

References

Arya, S. and Mount, D.M. 1993. Approximate nearest neighbor searching. In Proc. 4th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 271–280.
Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti Spaccamela, A., and Protasi, M. (to appear). Complexity and Approximation. Combinatorial Optimization Problems and their Approximability Properties. Springer-Verlag: Berlin.
Comaniciu, D. and Meer, P. 1997. Robust analysis of feature spaces: Color image segmentation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 750–755.
Comaniciu, D. and Meer, P. 1999. Mean shift analysis and applications. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1197–1203.
Cooper, M.C. 1998. The tractability of segmentation and scene analysis. International Journal of Computer Vision, 30(1):27–42.
Cormen, T.H., Leiserson, C.E., and Rivest, R.L. 1990. Introduction to Algorithms. The MIT Press: McGraw-Hill Book Company.
Felzenszwalb, P. and Huttenlocher, D. 1998. Image segmentation using local variation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 98–104.
Gdalyahu, Y., Weinshall, D., and Werman, M. 1999. Stochastic clustering by typical cuts. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2596–2601.
Jain, A.K. and Dubes, R.C. 1988. Algorithms for Clustering Data. Prentice Hall.
Jermyn, I. and Ishikawa, H. 2001. Globally optimal regions and boundaries as minimum ratio weight cycles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:1075–1088.
Pavlidas, T. 1977. Structural Pattern Recognition. Springer-Verlag.
Perona, P. and Freeman, W. 1998. A factorization approach to grouping. In Proceedings of the European Conference on Computer Vision, pp. 655–670.
Ratan, A.L., Maron, O., Grimson, W.E.L., and Lozano-Perez, T. 1999. A framework for learning query concepts in image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 423–431.
Shi, J. and Malik, J. 1997. Normalized cuts and image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 731–737.
Urquhart, R. 1982. Graph theoretical clustering based on limited neighborhood sets. Pattern Recognition, 15(3):173–187.

Weiss, Y. 1999. Segmentation using eigenvectors: A unifying view. In Proceedings of the International Conference on Computer Vision, 2:975–982.
Wertheimer, M. 1938. Laws of organization in perceptual forms (partial translation). In A Sourcebook of Gestalt Psychology, W.B. Ellis (Ed.). Harcourt, Brace and Company, pp. 71–88.
Wu, Z. and Leahy, R. 1993. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:1101–1113.
Zahn, C.T. 1971. Graph-theoretic methods for detecting and describing gestalt clusters. IEEE Transactions on Computing, 20:68–86.
