Contour Detection and Hierarchical Image Segmentation
Pablo Arbelaez
Michael Maire
Charless Fowlkes
Jitendra Malik
Abstract—This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.
1 INTRODUCTION
This paper presents a unified approach to contour detection and image segmentation. Contributions include:
• A high performance contour detector, combining local and global image information.
• A method to transform any contour signal into a hierarchy of regions while preserving contour quality.
• Extensive quantitative evaluation and the release of a new annotated dataset.
Figures 1 and 2 summarize our main results. The two figures represent the evaluation of multiple contour detection (Figure 1) and image segmentation (Figure 2) algorithms on the Berkeley Segmentation Dataset (BSDS300) [1], using the precision-recall framework introduced in [2]. This benchmark operates by comparing machine generated contours to human ground-truth data (Figure 3) and allows evaluation of segmentations in the same framework by regarding region boundaries as contours.
Especially noteworthy in Figure 1 is the contour detector gPb, which compares favorably with other leading techniques, providing equal or better precision for most choices of recall. In Figure 2, gPb-owt-ucm provides universally better performance than alternative segmentation algorithms. We introduced the gPb and gPb-owt-ucm algorithms in [3] and [4], respectively. This paper offers comprehensive versions of these algorithms, motivation behind their design, and additional experiments which support our basic claims.
We begin with a review of the extensive literature on contour detection and image segmentation in Section 2. Section 3 covers the development of the gPb contour detector. We couple multiscale local brightness, color, and texture cues to a powerful globalization framework using spectral clustering. The local cues, computed by applying oriented gradient operators at every location in the image, define an affinity matrix representing the similarity between pixels. From this matrix, we derive a generalized eigenproblem and solve for a fixed number of eigenvectors which encode contour information. Using a classifier to recombine this signal with the local cues, we obtain a large improvement over alternative globalization schemes built on top of similar cues.
To produce high-quality image segmentations, we link this contour detector with a generic grouping algorithm described in Section 4 and consisting of two steps. First, we introduce a new image transformation called the Oriented Watershed Transform for constructing a set of initial regions from an oriented contour signal. Second, using an agglomerative clustering procedure, we form these regions into a hierarchy which can be represented by an Ultrametric Contour Map, the real-valued image obtained by weighting each boundary by its scale of disappearance. We provide experiments on the BSDS300 as well as the BSDS500, a superset newly released here.
Although the precision-recall framework [2] has found widespread use for evaluating contour detectors, considerable effort has also gone into developing metrics to directly measure the quality of regions produced by segmentation algorithms. Noteworthy examples include the Probabilistic Rand Index, introduced in this context by [5], the Variation of Information [6], [7], and the Segmentation Covering criteria used in the PASCAL challenge [8]. We consider all of these metrics and demonstrate that gPb-owt-ucm delivers an across-the-board improvement over existing algorithms.
Sections 5 and 6 explore ways of connecting our purely bottom-up contour and segmentation machinery

• P. Arbeláez and J. Malik are with the Department of Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley, CA, 94720. E-mail: {arbelaez,malik}@eecs.berkeley.edu
• M. Maire is with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA, 91125. E-mail: [email protected]
• C. Fowlkes is with the Department of Computer Science, University of California at Irvine, Irvine, CA, 92697. E-mail: [email protected]
[Figures 1 and 2: precision-recall curves on the BSDS300. Figure 1 legend (maximum F-measures): Human 0.79; gPb 0.70; Multiscale, Ren (2008) 0.68; BEL, Dollar, Tu, Belongie (2006) 0.66; Mairal, Leordeanu, Bach, Herbert, Ponce (2008) 0.66; Min Cover, Felzenszwalb, McAllester (2006) 0.65; Pb, Martin, Fowlkes, Malik (2004) 0.65; Untangling Cycles, Zhu, Song, Shi (2007) 0.64; CRF, Ren, Fowlkes, Malik (2005) 0.64; Canny (1986) 0.58; Perona, Malik (1990) 0.56; Hildreth, Marr (1980) 0.50; Prewitt (1970) 0.48; Sobel (1968) 0.48; Roberts (1965) 0.47. Figure 2 legend: Human 0.79; gPb-owt-ucm 0.71; UCM, Arbelaez (2006) 0.67; Mean Shift, Comaniciu, Meer (2002) 0.63; Normalized Cuts, Cour, Benezit, Shi (2005) 0.62; Canny-owt-ucm 0.58; Felzenszwalb, Huttenlocher (2004) 0.58; Av. Diss., Bertelli, Sumengen, Manjunath, Gibou (2008) 0.58; SWA, Alpert, Galun, Basri, Brandt (2007) 0.56; ChanVese, Bertelli, Sumengen, Manjunath, Gibou (2008) 0.55; Donoser, Urschler, Hirzer, Bischof (2009) 0.55; Yang, Wright, Ma, Sastry (2007) 0.53.]

Fig. 1. Evaluation of contour detectors on the Berkeley Segmentation Dataset (BSDS300) Benchmark [2]. Leading contour detection approaches are ranked according to their maximum F-measure (2·Precision·Recall / (Precision + Recall)) with respect to human ground-truth boundaries. Iso-F curves are shown in green. Our gPb detector [3] performs significantly better than other algorithms [2], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28] across almost the entire operating regime. Average agreement between human subjects is indicated by the green dot.

Fig. 2. Evaluation of segmentation algorithms on the BSDS300 Benchmark. Paired with our gPb contour detector as input, our hierarchical segmentation algorithm gPb-owt-ucm [4] produces regions whose boundaries match ground-truth better than those produced by other methods [7], [29], [30], [31], [32], [33], [34], [35].
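For reference, the F-measure used for the rankings above is the harmonic mean of precision and recall; a minimal Python sketch (illustration only, not part of the benchmark code):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# A detector operating at precision 0.75 and recall 0.65 scores 0.975 / 1.4.
score = f_measure(0.75, 0.65)
```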
spanning tree of R. Considering edges in non-decreasing order by weight, each step of the algorithm merges components R1 and R2 connected by the current edge if the edge weight is less than:

min(Int(R1) + τ(R1), Int(R2) + τ(R2))   (1)

where τ(R) = k/|R| and k is a scale parameter that can be used to set a preference for component size.
The Mean Shift algorithm [34] offers an alternative clustering framework. Here, pixels are represented in the joint spatial-range domain by concatenating their spatial coordinates and color values into a single vector. Applying mean shift filtering in this domain yields a convergence point for each pixel. Regions are formed by grouping together all pixels whose convergence points are closer than hs in the spatial domain and hr in the range domain, where hs and hr are respective bandwidth parameters. Additional merging can also be performed to enforce a constraint on minimum region area.
Spectral graph theory [48], and in particular the Normalized Cuts criterion [45], [46], provides a way of integrating global image information into the grouping process. In this framework, given an affinity matrix W whose entries encode the similarity between pixels, one defines the diagonal matrix Dii = Σj Wij and solves for the generalized eigenvectors of the linear system:

(D − W)v = λDv   (2)

Traditionally, after this step, K-means clustering is applied to obtain a segmentation into regions. This approach often breaks uniform regions where the eigenvectors have smooth gradients. One solution is to reweight the affinity matrix [47]; others have proposed alternative graph partitioning formulations [49], [50], [51].
A recent variant of Normalized Cuts for image segmentation is the Multiscale Normalized Cuts (NCuts) approach of Cour et al. [33]. The fact that W must be sparse, in order to avoid a prohibitively expensive computation, limits the naive implementation to using only local pixel affinities. Cour et al. solve this limitation by computing sparse affinity matrices at multiple scales, setting up cross-scale constraints, and deriving a new eigenproblem for this constrained multiscale cut.
Sharon et al. [52] propose an alternative to improve the computational efficiency of Normalized Cuts. This approach, inspired by algebraic multigrid, iteratively coarsens the original graph by selecting a subset of nodes such that each variable on the fine level is strongly coupled to one on the coarse level. The same merging strategy is adopted in [31], where the strong coupling of a subset S of the graph nodes V is formalized as:

(Σj∈S pij) / (Σj∈V pij) > ψ   ∀i ∈ V − S   (3)

where ψ is a constant and pij is the probability of merging i and j, estimated from brightness and texture similarity.
Many approaches to image segmentation fall into a different category than those covered so far, relying on the formulation of the problem in a variational framework. An example is the model proposed by Mumford and Shah [53], where the segmentation of an observed image u0 is given by the minimization of the functional:

F(u, C) = ∫Ω (u − u0)² dx + µ ∫Ω\C |∇u|² dx + ν|C|   (4)

where u is piecewise smooth in Ω\C and µ, ν are weighting parameters. Theoretical properties of this model can be found in, e.g., [53], [54]. Several algorithms have been developed to minimize the energy (4) or its simplified version, where u is piecewise constant in Ω\C. Koepfler et al. [55] proposed a region merging method for this purpose. Chan and Vese [56], [57] follow a different approach, expressing (4) in the level set formalism of Osher and Sethian [58], [59]. Bertelli et al. [30] extend this approach to more general cost functions based on pairwise pixel similarities. Recently, Pock et al. [60] proposed to solve a convex relaxation of (4), thus obtaining robustness to initialization. Donoser et al. [29] subdivide the problem into several figure/ground segmentations, each initialized using low-level saliency and solved by minimizing an energy based on Total Variation.

2.3 Benchmarks
Though much of the extensive literature on contour detection predates its development, the BSDS [2] has since found wide acceptance as a benchmark for this task [23], [24], [25], [26], [27], [28], [35], [61]. The standard for evaluating segmentation algorithms is less clear.
One option is to regard the segment boundaries as contours and evaluate them as such. However, a methodology that directly measures the quality of the segments is also desirable. Some types of errors, e.g. a missing pixel in the boundary between two regions, may not be reflected in the boundary benchmark, but can have substantial consequences for segmentation quality, e.g. incorrectly merging large regions. One might argue that the boundary benchmark favors contour detectors over segmentation methods, since the former are not burdened with the constraint of producing closed curves. We therefore also consider various region-based metrics.

2.3.1 Variation of Information
The Variation of Information metric was introduced for the purpose of clustering comparison [6]. It measures the distance between two segmentations in terms of their average conditional entropy given by:

VI(S, S′) = H(S) + H(S′) − 2I(S, S′)   (5)

where H and I represent respectively the entropies and mutual information between two clusterings of data S and S′. In our case, these clusterings are test and ground-truth segmentations. Although VI possesses some interesting theoretical properties [6], its perceptual meaning and applicability in the presence of several ground-truth segmentations remain unclear.
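The VI distance of equation (5) can be computed from the joint label histogram of two segmentations. A minimal sketch (the helper name is ours, and base-2 logarithms are an assumption; the metric is defined up to the choice of logarithm base):

```python
import numpy as np

def variation_of_information(seg_a, seg_b):
    """VI(S, S') = H(S) + H(S') - 2 I(S, S'), computed from the joint
    label histogram of two segmentations given as integer label images."""
    a = np.asarray(seg_a).ravel()
    b = np.asarray(seg_b).ravel()
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)      # count co-occurring label pairs
    joint /= a.size                     # joint distribution
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    entropy = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log2(joint[nz] / np.outer(pa, pb)[nz]))
    return entropy(pa) + entropy(pb) - 2.0 * mi

# Identical segmentations are at distance zero.
s = np.array([[0, 0, 1], [0, 1, 1]])
vi_same = variation_of_information(s, s)
```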
truth segmentations S and G is given by the sum of the

where N denotes the total number of pixels in the image. Similarly, the covering of a machine segmentation S by a family of ground-truth segmentations {Gi} is defined by first covering S separately with each human segmentation Gi, and then averaging over the different humans. To achieve perfect covering the machine segmentation must explain all of the human data. We can then define two quality descriptors for regions: the covering of S by {Gi} and the covering of {Gi} by S.

3 CONTOUR DETECTION
As a starting point for contour detection, we consider the work of Martin et al. [2], who define a function Pb(x, y, θ) that predicts the posterior probability of a boundary with orientation θ at each image pixel (x, y) by measuring the difference in local image brightness, color, and texture channels. In this section, we review these cues, introduce our own multiscale version of the Pb detector, and describe the new globalization method we run on top of this multiscale local detector.

[Figure 4 residue: half-disc histogram illustration; only the panel label "Lower Half-Disc Histogram" survives.]

χ²(g, h) = (1/2) Σi (g(i) − h(i))² / (g(i) + h(i))   (9)

We then apply second-order Savitzky-Golay filtering [63] to enhance local maxima and smooth out multiple detection peaks in the direction orthogonal to θ. This is equivalent to fitting a cylindrical parabola, whose axis is oriented along direction θ, to a local 2D window surrounding each pixel and replacing the response at the pixel with that estimated by the fit.
Figure 4 shows an example. This computation is motivated by the intuition that contours correspond to image discontinuities and histograms provide a robust mechanism for modeling the content of an image region. A strong oriented gradient response means a pixel is likely to lie on the boundary between two distinct regions.
The Pb detector combines the oriented gradient signals obtained from transforming an input image into four separate feature channels and processing each channel independently. The first three correspond to the channels of the CIE Lab colorspace, which we refer to
Fig. 5. Filters for creating textons. We use 8 oriented
even- and odd-symmetric Gaussian derivative filters and
a center-surround (difference of Gaussians) filter.
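Equation (9) is the χ² distance between the two half-disc histograms g and h; a minimal sketch (not the authors' implementation; bins with g(i) + h(i) = 0 are skipped, since they contribute nothing):

```python
import numpy as np

def chi_squared(g, h):
    """Chi-squared histogram distance of equation (9)."""
    g = np.asarray(g, dtype=float)
    h = np.asarray(h, dtype=float)
    m = (g + h) > 0                       # skip empty bins
    return 0.5 * np.sum((g[m] - h[m]) ** 2 / (g[m] + h[m]))

# Identical histograms are at distance 0; disjoint ones at distance 1.
d_same = chi_squared([0.5, 0.5], [0.5, 0.5])
d_disjoint = chi_squared([1.0, 0.0], [0.0, 1.0])
```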
Fig. 7. Spectral Pb. Left: Image. Middle Left: The thinned non-max suppressed multiscale Pb signal defines a sparse
affinity matrix connecting pixels within a fixed radius. Pixels i and j have a low affinity as a strong boundary separates
them, whereas i and k have high affinity. Middle: First four generalized eigenvectors resulting from spectral clustering.
Middle Right: Partitioning the image by running K-means clustering on the eigenvectors erroneously breaks smooth
regions. Right: Instead, we compute gradients of the eigenvectors, transforming them back into a contour signal.
Fig. 8. Eigenvectors carry contour information. Left: Image and maximum response of spectral Pb over orientations, sPb(x, y) = maxθ {sPb(x, y, θ)}. Right Top: First four generalized eigenvectors, v1, ..., v4, used in creating sPb. Right Bottom: Maximum gradient response over orientations, maxθ {∇θ vk(x, y)}, for each eigenvector.
two halves of a disc of radius σ(i, s) centered at (x, y) and divided by a diameter at angle θ. The parameters αi,s weight the relative contribution of each gradient signal. In our experiments, we sample θ at eight equally spaced orientations in the interval [0, π). Taking the maximum response over orientations yields a measure of boundary strength at each pixel:

mPb(x, y) = maxθ {mPb(x, y, θ)}   (11)

An optional non-maximum suppression step [22] produces thinned, real-valued contours.
In contrast to [2] and [28] which use a logistic regression classifier to combine cues, we learn the weights αi,s by gradient ascent on the F-measure using the training images and corresponding ground-truth of the BSDS.

3.3 Globalization
Spectral clustering lies at the heart of our globalization machinery. The key element differentiating the algorithm described in this section from other approaches [45], [47] is the "soft" manner in which we use the eigenvectors obtained from spectral partitioning.
As input to the spectral clustering stage, we construct a sparse symmetric affinity matrix W using the intervening contour cue [49], [64], [65], the maximal value of mPb along a line connecting two pixels. We connect all pixels i and j within a fixed radius r with affinity:

Wij = exp(−maxp∈ij {mPb(p)} / ρ)   (12)

where ij is the line segment connecting i and j and ρ is a constant. We set r = 5 pixels and ρ = 0.1.
In order to introduce global information, we define Dii = Σj Wij and solve for the generalized eigenvectors {v0, v1, ..., vn} of the system (D − W)v = λDv (2), corresponding to the n+1 smallest eigenvalues 0 = λ0 ≤ λ1 ≤ ... ≤ λn. Figure 7 displays an example with four eigenvectors. In practice, we use n = 16.
At this point, the standard Normalized Cuts approach associates with each pixel a length n descriptor formed from entries of the n eigenvectors and uses a clustering
8
Thresholded P bg Tg
Thresholded gP b
gP bT
Fig. 9. Benefits of globalization. When compared with the local detector P b, our detector gP b reduces clutter and
completes contours. The thresholds shown correspond to the points of maximal F-measure on the curves in Figure 1.
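The spectral stage solves the generalized eigenproblem (D − W)v = λDv of equation (2) for an affinity matrix built from the intervening contour cue. A toy sketch with a dense 4 × 4 affinity, assuming SciPy is available (the actual W is large and sparse, with n = 16 eigenvectors):

```python
import numpy as np
from scipy.linalg import eigh

# Toy 4-pixel graph: two tightly coupled pairs joined by weak links
# (a stand-in for the sparse intervening-contour affinities of eq. 12).
W = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.1, 0.0],
              [0.1, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.9, 1.0]])
D = np.diag(W.sum(axis=1))

# Generalized eigenproblem (D - W) v = lambda D v of equation (2);
# eigh returns eigenvalues in ascending order, the smallest being
# (numerically) zero for a connected graph.
eigenvalues, eigenvectors = eigh(D - W, D)
```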
with Gaussian directional derivative filters at multiple orientations θ, obtaining oriented signals {∇θ vk(x, y)}. The resulting spectral boundary signal weights each eigenvector by the inverse square root of its eigenvalue:

sPb(x, y, θ) = Σ (k=1 to n) (1/√λk) · ∇θ vk(x, y)   (13)

[Figure 10 plot residue: precision-recall curves; surviving legend entry: [F = 0.65] Pb − Martin, Fowlkes, Malik (2004).]
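Summing oriented gradients of the eigenvector images, each weighted by 1/√λk, yields the spectral signal sPb; a toy sketch in which finite differences stand in for the Gaussian directional derivative filters, with a function name of our own choosing:

```python
import numpy as np

def spectral_pb(eigvecs, eigvals, thetas):
    """Sum oriented gradients of the eigenvector images, each weighted
    by 1 / sqrt(lambda_k). np.gradient approximates the oriented
    Gaussian derivative filters used in the paper."""
    out = np.zeros(eigvecs[0].shape + (len(thetas),))
    for vk, lam in zip(eigvecs, eigvals):
        dy, dx = np.gradient(vk)                 # derivatives along rows, cols
        for t, theta in enumerate(thetas):
            # Directional derivative for orientation theta.
            out[:, :, t] += np.abs(np.cos(theta) * dx +
                                   np.sin(theta) * dy) / np.sqrt(lam)
    return out

# A step-edge eigenvector responds at theta = 0 (a vertical boundary)
# and not at theta = pi/2.
v = np.zeros((8, 8))
v[:, 4:] = 1.0
spb = spectral_pb([v], [0.25], [0.0, np.pi / 2])
```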
Fig. 11. Watershed Transform. Left: Image. Middle Left: Boundary strength E(x, y). We regard E(x, y) as a
topographic surface and flood it from its local minima. Middle Right: This process partitions the image into catchment
basins P0 and arcs K0 . There is exactly one basin per local minimum and the arcs coincide with the locations where
the floods originating from distinct minima meet. Local minima are marked with red dots. Right: Each arc weighted by
the mean value of E(x, y) along it. This weighting scheme produces artifacts, such as the strong horizontal contours
in the small gap between the two statues.
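The flooding process in the caption can be sketched with a minimal priority-flood, a simplification of the actual watershed transform (the function name and toy surface are ours):

```python
import heapq
import numpy as np

def flood(E, markers):
    """Minimal priority-flood sketch: treat E(x, y) as a topographic
    surface and grow the labeled seeds outward, lowest value first."""
    h, w = E.shape
    labels = markers.copy()
    heap = [(E[i, j], i, j) for i in range(h) for j in range(w)
            if markers[i, j] > 0]
    heapq.heapify(heap)
    while heap:
        _, i, j = heapq.heappop(heap)
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and labels[ni, nj] == 0:
                labels[ni, nj] = labels[i, j]
                heapq.heappush(heap, (E[ni, nj], ni, nj))
    return labels

# Toy boundary strength E(x, y): a vertical ridge splitting two plateaus.
E = np.zeros((6, 6))
E[:, 3] = 1.0
markers = np.zeros((6, 6), dtype=int)
markers[3, 0] = 1          # seed in the left local minimum
markers[3, 5] = 2          # seed in the right local minimum
basins = flood(E, markers)

# Arcs are where distinct basins meet; each arc would then be weighted
# by the mean value of E along it, as in Figure 11 (right).
arc = np.zeros_like(E, dtype=bool)
arc[:, :-1] = basins[:, :-1] != basins[:, 1:]
weight = E[arc].mean()
```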
and spectral signals:

gPb(x, y, θ) = Σs Σi βi,s · Gi,σ(i,s)(x, y, θ) + γ · sPb(x, y, θ)   (14)

We subsequently rescale gPb using a sigmoid to match a probabilistic interpretation. As with mPb (10), the weights βi,s and γ are learned by gradient ascent on the F-measure using the BSDS training images.

3.4 Results
Qualitatively, the combination of the multiscale cues with our globalization machinery translates into a reduction of clutter edges and completion of contours in the output, as shown in Figure 9.
Figure 10 breaks down the contributions of the multiscale and spectral signals to the performance of gPb. These precision-recall curves show that the reduction of false positives due to the use of global information in sPb is concentrated in the high thresholds, while gPb takes the best of both worlds, relying on sPb in the high precision regime and on mPb in the high recall regime. Looking again at the comparison of contour detectors on the BSDS300 benchmark in Figure 1, the mean improvement in precision of gPb with respect to the single scale Pb is 10% in the recall range [0.1, 0.9].

4 SEGMENTATION
The nonmax suppressed gPb contours produced in the previous section are often not closed and hence do not partition the image into regions. These contours may still be useful, e.g. as a signal on which to compute image descriptors. However, closed regions offer additional advantages. Regions come with their own scale estimates and provide natural domains for computing features used in recognition. Many visual tasks can also benefit from the complexity reduction achieved by transforming an image with millions of pixels into a few hundred or thousand "superpixels" [67].
In this section, we show how to recover closed contours, while preserving the gains in boundary quality achieved in the previous section. Our algorithm, first reported in [4], builds a hierarchical segmentation by exploiting the information in the contour signal. We introduce a new variant of the watershed transform [68], [69], the Oriented Watershed Transform (OWT), for producing a set of initial regions from contour detector output. We then construct an Ultrametric Contour Map (UCM) [35] from the boundaries of these initial regions. This sequence of operations (OWT-UCM) can be seen as generic machinery for going from contours to a hierarchical region tree. Contours encoded in the resulting hierarchical segmentation retain real-valued weights indicating their likelihood of being a true boundary. For a given threshold, the output is a set of closed contours that can be treated as either a segmentation or as a boundary detector for the purposes of benchmarking.
To describe our algorithm in the most general setting, we now consider an arbitrary contour detector, whose output E(x, y, θ) predicts the probability of an image boundary at location (x, y) and orientation θ.

4.1 Oriented Watershed Transform
Using the contour signal, we first construct a finest partition for the hierarchy, an over-segmentation whose regions determine the highest level of detail considered.
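The cue combination of equation (14) is a weighted sum of oriented gradient arrays; a toy sketch with made-up cue names and fixed weights (in the paper, βi,s and γ are learned by gradient ascent on the F-measure):

```python
import numpy as np

def global_pb(cues, betas, spb, gamma):
    """Weighted sum of local multiscale cues and the spectral signal,
    as in equation (14). `cues` maps a (channel, scale) key to an
    oriented gradient array G_{i,sigma(i,s)}(x, y, theta)."""
    gpb = gamma * spb
    for key, g in cues.items():
        gpb = gpb + betas[key] * g
    return gpb

# Toy inputs: two cue channels plus a spectral signal, eight orientations.
shape = (4, 4, 8)
cues = {("brightness", 0): np.full(shape, 0.2),
        ("texture", 1): np.full(shape, 0.1)}
betas = {("brightness", 0): 0.5, ("texture", 1): 0.5}
gpb = global_pb(cues, betas, spb=np.full(shape, 0.4), gamma=0.5)
```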
Fig. 12. Oriented Watershed Transform. Left: Input boundary signal E(x, y) = maxθ E(x, y, θ). Middle Left:
Watershed arcs computed from E(x, y). Note that thin regions give rise to artifacts. Middle: Watershed arcs with
an approximating straight line segment subdivision overlaid. We compute this subdivision in a scale-invariant manner
by recursively breaking an arc at the point maximally distant from the straight line segment connecting its endpoints, as
shown in Figure 13. Subdivision terminates when the distance from the line segment to every point on the arc is less
than a fixed fraction of the segment length. Middle Right: Oriented boundary strength E(x, y, θ) for four orientations θ.
In practice, we sample eight orientations. Right: Watershed arcs reweighted according to E at the orientation of their
associated line segments. Artifacts, such as the horizontal contours breaking the long skinny regions, are suppressed
as their orientations do not agree with the underlying E(x, y, θ) signal.
Fig. 14. Hierarchical segmentation from contours. Far Left: Image. Left: Maximal response of contour detector gPb over orientations. Middle Left: Weighted contours resulting from the Oriented Watershed Transform - Ultrametric Contour Map (OWT-UCM) algorithm using gPb as input. This single weighted image encodes the entire hierarchical segmentation. By construction, applying any threshold to it is guaranteed to yield a set of closed contours (the ones with weights above the threshold), which in turn define a segmentation. Moreover, the segmentations are nested. Increasing the threshold is equivalent to removing contours and merging the regions they separated. Middle Right: The initial oversegmentation corresponding to the finest level of the UCM, with regions represented by their mean color. Right and Far Right: Contours and corresponding segmentation obtained by thresholding the UCM at level 0.5.
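The thresholding property described in the caption can be illustrated on a toy UCM (a minimal sketch assuming SciPy; a real UCM comes from the OWT-UCM algorithm):

```python
import numpy as np
from scipy import ndimage

def segment_at_scale(ucm, k):
    """Threshold the UCM at level k: contours with weight > k survive,
    weaker ones merge; label the resulting closed regions."""
    labels, num_regions = ndimage.label(ucm <= k)
    return labels, num_regions

# Toy UCM: zero inside regions, one boundary of strength 0.7.
ucm = np.zeros((5, 5))
ucm[:, 2] = 0.7
_, n_low = segment_at_scale(ucm, 0.5)    # boundary survives: two regions
_, n_high = segment_at_scale(ucm, 0.9)   # boundary removed: one region
```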
E(x, y, θ), to assign each arc pixel (x, y) a boundary strength of E(x, y, o(x, y)). We quantize o(x, y) in the same manner as θ, so this operation is a simple lookup. Finally, each original arc in K0 is assigned a weight equal to the average boundary strength of the pixels it contains. Comparing the middle left and far right panels of Figure 12 shows this reweighting scheme removes artifacts.

4.2 Ultrametric Contour Map
Contours have the advantage that it is fairly straightforward to represent uncertainty in the presence of a true underlying contour, i.e. by associating a binary random variable to it. One can interpret the boundary strength assigned to an arc by the Oriented Watershed Transform (OWT) of the previous section as an estimate of the probability of that arc being a true contour.
It is not immediately obvious how to represent uncertainty about a segmentation. One possibility, which we exploit here, is the Ultrametric Contour Map (UCM) [35], which defines a duality between closed, non-self-intersecting weighted contours and a hierarchy of regions. The base level of this hierarchy respects even weak contours and is thus an oversegmentation of the image. Upper levels of the hierarchy respect only strong contours, resulting in an under-segmentation. Moving between levels offers a continuous trade-off between these extremes. This shift in representation from a single segmentation to a nested collection of segmentations frees later processing stages to use information from multiple levels or select a level based on additional knowledge.
Our hierarchy is constructed by a greedy graph-based region merging algorithm. We define an initial graph G = (P0, K0, W(K0)), where the nodes are the regions P0, the links are the arcs K0 separating adjacent regions, and the weights W(K0) are a measure of dissimilarity between regions. The algorithm proceeds by sorting the links by similarity and iteratively merging the most similar regions. Specifically:
1) Select minimum weight contour: C* = argmin C∈K0 W(C).
2) Let R1, R2 ∈ P0 be the regions separated by C*.
3) Set R = R1 ∪ R2, and update: P0 ← P0 \ {R1, R2} ∪ {R} and K0 ← K0 \ {C*}.
4) Stop if K0 is empty. Otherwise, update weights W(K0) and repeat.
This process produces a tree of regions, where the leaves are the initial elements of P0, the root is the entire image, and the regions are ordered by the inclusion relation.
We define dissimilarity between two adjacent regions as the average strength of their common boundary in K0, with weights W(K0) initialized by the OWT. Since at every step of the algorithm all remaining contours must have strength greater than or equal to those previously removed, the weight of the contour currently being removed cannot decrease during the merging process. Hence, the constructed region tree has the structure of an indexed hierarchy and can be described by a dendrogram, where the height H(R) of each region R is the value of the dissimilarity at which it first appears. Stated equivalently, H(R) = W(C) where C is the contour whose removal formed R. The hierarchy also yields a metric on P0 × P0, with the distance between two regions given by the height of the smallest containing segment:

D(R1, R2) = min{H(R) : R1, R2 ⊆ R}   (15)

This distance satisfies the ultrametric property:

D(R1, R2) ≤ max(D(R1, R), D(R, R2))   (16)

since if R is merged with R1 before R2, then D(R1, R2) = D(R, R2), or if R is merged with R2 before R1, then D(R1, R2) = D(R1, R). As a consequence, the whole
hierarchy can be represented as an Ultrametric Contour Map (UCM) [35], the real-valued image obtained by weighting each boundary by its scale of disappearance.
Figure 14 presents an example of our method. The UCM is a weighted contour image that, by construction, has the remarkable property of producing a set of closed curves for any threshold. Conversely, it is a convenient representation of the region tree since the segmentation at a scale k can be easily retrieved by thresholding the UCM at level k. Since our notion of scale is the average contour strength, the UCM values reflect the contrast between neighboring regions.

4.3 Results
While the OWT-UCM algorithm can use any source of contours for the input E(x, y, θ) signal (e.g. the Canny edge detector before thresholding), we obtain best results by employing the gPb detector [3] introduced in Section 3. We report experiments using both gPb as well as the baseline Canny detector, and refer to the resulting segmentation algorithms as gPb-owt-ucm and Canny-owt-ucm, respectively.
Figures 15 and 16 illustrate results of gPb-owt-ucm on images from the BSDS500. Since the OWT-UCM algorithm produces hierarchical region trees, obtaining a single segmentation as output involves a choice of scale. One possibility is to use a fixed threshold for all images in the dataset, calibrated to provide optimal performance on the training set. We refer to this as the optimal dataset scale (ODS). We also evaluate performance when the optimal threshold is selected by an oracle on a per-image basis. With this choice of optimal image scale (OIS), one naturally obtains even better segmentations.

                        BSDS300             BSDS500
                     ODS   OIS   AP      ODS   OIS   AP
  Human              0.79  0.79  −       0.80  0.80  −
  gPb-owt-ucm        0.71  0.74  0.73    0.73  0.76  0.73
  [34] Mean Shift    0.63  0.66  0.54    0.64  0.68  0.56
  [33] NCuts         0.62  0.66  0.43    0.64  0.68  0.45
  Canny-owt-ucm      0.58  0.63  0.58    0.60  0.64  0.58
  [32] Felz-Hutt     0.58  0.62  0.53    0.61  0.64  0.56
  [31] SWA           0.56  0.59  0.54    −     −     −
  gPb                0.70  0.72  0.66    0.71  0.74  0.65
  Canny              0.58  0.62  0.58    0.60  0.63  0.58

TABLE 1. Boundary benchmarks on the BSDS. Results for six different segmentation methods (upper table) and two contour detectors (lower table) are given. Shown are the F-measures when choosing an optimal scale for the entire dataset (ODS) or per image (OIS), as well as the average precision (AP). Figures 1, 2, and 17 show the full precision-recall curves for these algorithms.

Region benchmarks on the BSDS300 (Covering, Probabilistic Rand Index, and Variation of Information):

                        Covering           PRI           VI
                     ODS   OIS  Best    ODS   OIS    ODS   OIS
  Human              0.73  0.73  −      0.87  0.87   1.16  1.16
  gPb-owt-ucm        0.59  0.65  0.75   0.81  0.85   1.65  1.47
  [34] Mean Shift    0.54  0.58  0.66   0.78  0.80   1.83  1.63
  [32] Felz-Hutt     0.51  0.58  0.68   0.77  0.82   2.15  1.79
  Canny-owt-ucm      0.48  0.56  0.66   0.77  0.82   2.11  1.81
  [33] NCuts         0.44  0.53  0.66   0.75  0.79   2.18  1.84
  [31] SWA           0.47  0.55  0.66   0.75  0.80   2.06  1.75
  [29] Total Var.    0.57  −     −      0.78  −      1.81  −
  [70] T+B Encode    0.54  −     −      0.78  −      1.86  −
  [30] Av. Diss.     0.47  −     −      0.76  −      2.62  −
  [30] ChanVese      0.49  −     −      0.75  −      2.54  −

Region benchmarks on the BSDS500:

                        Covering           PRI           VI
                     ODS   OIS  Best    ODS   OIS    ODS   OIS
  Human              0.72  0.72  −      0.88  0.88   1.17  1.17
  gPb-owt-ucm        0.59  0.65  0.74   0.83  0.86   1.69  1.48
  [34] Mean Shift    0.54  0.58  0.66   0.79  0.81   1.85  1.64
  [32] Felz-Hutt     0.52  0.57  0.69   0.80  0.82   2.21  1.87
  Canny-owt-ucm      0.49  0.55  0.66   0.79  0.83   2.19  1.89
  [33] NCuts         0.45  0.53  0.67   0.78  0.80   2.23  1.89
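The ODS/OIS distinction used throughout these tables can be sketched as follows (a simplification: the actual boundary benchmark aggregates matched-pixel counts across images before computing F, rather than averaging per-image scores):

```python
import numpy as np

def ods_ois(scores):
    """`scores[i][t]`: benchmark score of image i at threshold index t.
    ODS commits to one threshold for the whole dataset; OIS lets an
    oracle pick the best threshold separately for each image."""
    scores = np.asarray(scores, dtype=float)
    ods = scores.mean(axis=0).max()   # best fixed threshold
    ois = scores.max(axis=1).mean()   # best per-image threshold
    return ods, ois

# Two images scored at three candidate thresholds.
ods, ois = ods_ois([[0.5, 0.7, 0.6],
                    [0.8, 0.5, 0.7]])
```

By construction, OIS can never be worse than ODS, which is why the OIS columns dominate the ODS columns in the tables above.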
Fig. 15. Hierarchical segmentation results on the BSDS500. From Left to Right: Image, Ultrametric Contour Map (UCM) produced by gPb-owt-ucm, and segmentations obtained by thresholding at the optimal dataset scale (ODS) and optimal image scale (OIS). All images are from the test set.
Fig. 16. Additional hierarchical segmentation results on the BSDS500. From Top to Bottom: Image, UCM produced by gPb-owt-ucm, and ODS and OIS segmentations. All images are from the test set.
[Fig. 17 plot: precision-recall curves with iso-F contours. Maximum F-measures: F = 0.80 Human; F = 0.73 gPb-owt-ucm; F = 0.71 gPb; F = 0.64 Mean Shift (Comaniciu, Meer 2002); F = 0.64 Normalized Cuts (Cour, Benezit, Shi 2005); F = 0.61 Felzenszwalb, Huttenlocher (2004); F = 0.60 Canny; F = 0.60 Canny-owt-ucm.]

Fig. 17. Boundary benchmark on the BSDS500. Comparing boundaries to human ground-truth allows us to evaluate contour detectors [3], [22] (dotted lines) and segmentation algorithms [4], [32], [33], [34] (solid lines) in the same framework. Performance is consistent when going from the BSDS300 (Figures 1 and 2) to the BSDS500 (above). Furthermore, the OWT-UCM algorithm preserves contour detector quality: for both gPb and Canny, the resulting segment boundaries match the quality of the original contours.

Fig. 18. Pairwise comparison of segmentation algorithms on the BSDS300. The coordinates of the red dots are the boundary benchmark scores at the optimal image scale (OIS) for each of the two methods compared on single images. Boxed totals represent the number of images where one algorithm is better. For example, the top-left shows gPb-owt-ucm outscores NCuts on 99/100 images. When comparing with SWA, we further restrict the output of the second method to match the number of regions produced by SWA. All differences are statistically significant except between NCuts and Mean Shift.

[Fig. 19 plot: segmentation quality (left) and Variation of Information (right) as a function of scale for gPb-owt-ucm and Canny-owt-ucm.]

4.4.2 Region Quality
Fig. 20. Interactive segmentation. Left: Image. Middle: UCM produced by gPb-owt-ucm (grayscale) with additional user annotations (color dots and lines). Right: The region hierarchy defined by the UCM allows us to automatically propagate annotations to unlabeled segments, resulting in the desired labeling of the image with minimal user effort.
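The propagation in Figure 20 can be sketched on a toy region tree. The merge-list encoding of the hierarchy and the function name are illustrative assumptions, not the paper's implementation; an unlabeled group simply inherits a user label the first time it merges with a labeled one:

```python
# Hypothetical sketch: propagate user labels through a UCM region tree.
def propagate_labels(n_leaves, merges, seeds):
    # merges: (id_a, id_b) pairs in order of increasing UCM strength;
    # merge k creates the new region with id n_leaves + k.
    # seeds: {leaf_id: user label} from the annotations.
    group = {i: {i} for i in range(n_leaves)}    # leaves under each region
    label = dict(seeds)                          # region id -> label
    out = dict(seeds)                            # final per-leaf labels
    for k, (a, b) in enumerate(merges):
        la, lb = label.get(a), label.get(b)
        if la is not None and lb is None:
            for leaf in group[b]:
                out.setdefault(leaf, la)         # unlabeled side inherits
            lb = la
        elif lb is not None and la is None:
            for leaf in group[a]:
                out.setdefault(leaf, lb)
            la = lb
        nid = n_leaves + k
        group[nid] = group[a] | group[b]
        label[nid] = la if la == lb else None    # conflict: stop propagating
    return out
```

With two seeds on four leaves, e.g. `propagate_labels(4, [(0, 1), (2, 3), (4, 5)], {0: 'fg', 3: 'bg'})`, every leaf receives the label of the annotated region it merges with at the lowest UCM strength.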
5 INTERACTIVE SEGMENTATION

Until now, we have only discussed fully automatic image segmentation. Human assisted segmentation is relevant for many applications, and recent approaches rely on the graph-cuts formalism [72], [73], [74] or other energy minimization procedures [75] to extract foreground regions. For example, [72] casts the task of determining binary foreground/background pixel assignments in terms of a cost function with both unary and pairwise potentials. The unary potentials encode agreement with estimated foreground or background region models and the pairwise potentials bias neighboring pixels not separated by a strong boundary to have the same label.

6 MULTISCALE FOR OBJECT ANALYSIS

Our contour detection and segmentation algorithms capture multiscale information by combining local gradient cues computed at three different scales, as described in Section 3.2. We did not see any performance benefit on the BSDS by using additional scales. However, this fact is not an invitation to conclude that a simple combination of a limited range of local cues is a sufficient solution to the problem of multiscale image analysis. Rather, it is a statement about the nature of the BSDS. The fixed resolution of the BSDS images and the inherent photographic bias of the dataset lead to the situation in
Fig. 21. Multiscale segmentation for object detection. Top: Images from the PASCAL 2008 dataset, with objects outlined at ground-truth locations. Detailed Views: For each window, we show the boundaries obtained by running the entire gPb-owt-ucm segmentation algorithm at multiple scales. Scale increases by factors of 2 moving from left to right (and top to bottom for the blue window). The total scale range is thus larger than the three scales used internally for each segmentation. Highlighted Views: The highlighted scale best captures the target object's boundaries. Note the link between this scale and the absolute size of the object in the image. For example, the small sailboat (red outline) is correctly segmented only at the finest scale. In other cases (e.g. parrot, magenta outline), bounding contours appear across several scales, but informative internal contours are scale sensitive. A window-based object detector could learn and exploit an optimal coupling between object size and segmentation scale.
which a small range of scales captures the boundaries humans find important.

Dealing with the full variety one expects in high resolution images of complex scenes requires more than a naive weighted average of signals across the scale range. Such an average would blur information, resulting in good performance for medium-scale contours, but poor detection of both fine-scale and large-scale contours. Adaptively selecting the appropriate scale at each location in the image is desirable, but it is unclear how to estimate this robustly using only bottom-up cues.

For some applications, in particular object detection, we can instead use a top-down process to guide scale selection. Suppose we wish to apply a classifier to determine whether a subwindow of the image contains an instance of a given object category. We need only report a positive answer when the object completely fills the subwindow, as the detector will be run on a set of windows densely sampled from the image. Thus, we know the size of the object we are looking for in each window and hence the scale at which contours belonging to the object would appear. Varying the contour scale with the window size produces the best input signal for the object detector. Note that this procedure does not prevent the object detector itself from using multiscale information, but rather provides the correct central scale.

As each segmentation internally uses gradients at three scales, [σ/2, σ, 2σ], by stepping by a factor of 2 in scale between segmentations, we can reuse shared local cues. The globalization stage (sPb signal) can optionally be customized for each window by computing it using only a limited surrounding image region. This strategy, used here, results in more work overall (a larger number of simpler globalization problems), which can be mitigated by not sampling sPb as densely as one samples windows.
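A minimal sketch of this window-to-scale coupling, assuming a hypothetical reference window size and a geometric ladder of central scales (both illustrative choices, not values from the paper):

```python
import math

def internal_scales(sigma):
    # each segmentation uses local gradients at [sigma/2, sigma, 2*sigma]
    return [sigma / 2, sigma, 2 * sigma]

def scale_for_window(win_size, ref_size=64, sigma0=1.0, n_levels=4):
    # hypothetical rule: double the central scale each time the window
    # doubles relative to the reference size, clamped to the ladder
    k = round(math.log2(max(win_size / ref_size, 1.0)))
    return sigma0 * (2 ** min(k, n_levels - 1))
```

Adjacent ladder steps σ and 2σ share the gradient scales {σ, 2σ}, which is why stepping by factors of 2 lets segmentations at neighboring scales reuse local cues.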
APPENDIX
EFFICIENT COMPUTATION

Computing the oriented gradient of histograms (Figure 4) directly as outlined in Section 3.1 is expensive. In particular, for an N pixel image and a disc of radius r it takes O(N r^2) time to compute since a region of area O(r^2) is examined at every pixel location. This entire procedure is repeated 32 times (4 channels with 8 orientations) for each of 3 choices of r (the cost of the largest scale dominates the time complexity). Martin et al. [2] suggest ways to speed up this computation, including incremental updating of the histograms as the disc is swept across the image. However, this strategy still requires O(N r) time. We present an algorithm for the oriented gradient of histograms computation that runs in O(N) time, independent of the radius r.

Fig. 22. Efficient computation of the oriented gradient of histograms. Left: The two half-discs of interest can be approximated arbitrarily well by a series of rectangular boxes. We found a single box of equal area to the half-disc to be a sufficient approximation. Middle: Replacing the circular disc of Figure 4 with the approximation reduces the problem to computing the histograms within rectangular regions. Right: Instead of rotating the rectangles, rotate the image and use the integral image trick to compute the histograms efficiently. Rotate the final result to map it back to the original coordinate frame.

Following Figure 22, we can approximate each half-disc by a series of rectangles. It turns out that a single rectangle suffices to approximate each half-disc. For an intuition as to why a single rectangle turns out to be sufficient, look again at the overlap of the rectangle with the half disc in the lower left of Figure 22. The majority of the pixels used in forming the histogram lie within both the rectangle and the disc, and those pixels that differ are far from the center of the disc (the pixel at which we are computing the gradient). Thus, we are only slightly changing the shape of the region we use for context around each pixel. Figure 23 shows that the result using the single rectangle approximation is visually indistinguishable from that using the original half-disc.
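A minimal sketch of the rectangle-plus-integral-image idea, in the axis-aligned case (the rotation described in Figure 22 reduces the oriented case to this one). Here `labels` is a hypothetical precomputed map from pixels to histogram bins, and the function names are illustrative:

```python
import numpy as np

def bin_integral_images(labels, n_bins):
    # one integral image per histogram bin; labels[y, x] in [0, n_bins)
    h, w = labels.shape
    ii = np.zeros((n_bins, h + 1, w + 1))
    for b in range(n_bins):
        ii[b, 1:, 1:] = np.cumsum(np.cumsum(labels == b, axis=0), axis=1)
    return ii

def rect_hist(ii, top, left, bottom, right):
    # histogram of the rectangle [top:bottom, left:right), computed with
    # four lookups per bin, independent of the rectangle's area
    return (ii[:, bottom, right] - ii[:, top, right]
            - ii[:, bottom, left] + ii[:, top, left])

def chi2_distance(h1, h2, eps=1e-10):
    # chi-squared distance between the two normalized half histograms
    h1 = h1 / max(h1.sum(), eps)
    h2 = h2 / max(h2.sum(), eps)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

Each `rect_hist` call costs a constant number of lookups per bin, so sweeping the two half-rectangles over all N pixels and comparing their histograms costs O(N) rather than O(N r^2).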
Note that the same image rotation technique can be used for computing convolutions with any oriented separable filter, such as the oriented Gaussian derivative filters used for textons (Figure 5) or the second-order Savitzky-Golay filters used for spatial smoothing of our oriented gradient output. Rotating the image, convolving with two 1D filters, and inverting the rotation is more efficient than convolving with a rotated 2D filter. Moreover, in this case, no approximation is required as these operations are equivalent up to the numerical accuracy of the interpolation done when rotating the image. This means that all of the filtering performed as part of the local cue computation can be done in O(N r) time instead of O(N r^2) time where here r = max(w, h) and w and h are the width and height of the 2D filter matrix. For large r, the computation time can be further reduced by using the Fast Fourier Transform to calculate the convolution.

The entire local cue computation is also easily parallelized. The image can be partitioned into overlapping subimages to be processed in parallel. In addition, the 96 intermediate results (3 scales of 4 channels with 8 orientations) can all be computed in parallel as they are independent subproblems. Catanzaro et al. [77] have created a parallel GPU implementation of our gPb contour detector. They also exploit the integral histogram trick introduced here, with the single rectangle approximation, while replicating our precision-recall performance curve on the BSDS benchmark. The speed improvements due to both the reduction in computational complexity and parallelization make our gPb contour detector and gPb-owt-ucm segmentation algorithm practical tools.
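The equivalence between a separable 2D filter and two 1D passes, which underlies the rotation trick above, can be checked directly. This sketch uses a symmetric kernel so that convolution and correlation coincide, and all names are illustrative:

```python
import numpy as np

def conv2d_direct(img, k):
    # zero-padded 'same' correlation; for the symmetric kernels used
    # here this coincides with convolution
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    pad = np.pad(img.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += k[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def conv2d_separable(img, u, v):
    # rows with v, then columns with u: two O(r) passes per pixel
    # instead of one O(r^2) pass with the full 2D kernel outer(u, v)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, v, mode='same'), 1,
                              img.astype(float))
    return np.apply_along_axis(lambda c: np.convolve(c, u, mode='same'), 0, tmp)
```

With u = v = [1, 2, 1], `conv2d_direct(img, np.outer(u, v))` agrees with `conv2d_separable(img, u, v)` while doing O(r) work per pixel per pass instead of O(r^2).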
REFERENCES

[1] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," ICCV, 2001.
[2] D. Martin, C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color and texture cues," PAMI, 2004.
[3] M. Maire, P. Arbeláez, C. Fowlkes, and J. Malik, "Using contours to detect and localize junctions in natural images," CVPR, 2008.
[4] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik, "From contours to regions: An empirical evaluation," CVPR, 2009.
[5] R. Unnikrishnan, C. Pantofaru, and M. Hebert, "Toward objective evaluation of image segmentation algorithms," PAMI, 2007.
[6] M. Meila, "Comparing clusterings: An axiomatic view," ICML, 2005.
[7] A. Y. Yang, J. Wright, Y. Ma, and S. S. Sastry, "Unsupervised segmentation of natural images via lossy data compression," CVIU, 2008.
[8] M. Everingham, L. van Gool, C. Williams, J. Winn, and A. Zisserman, "PASCAL 2008 Results," http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html, 2008.
[9] D. Hoiem, A. A. Efros, and M. Hebert, "Geometric context from a single image," ICCV, 2005.
[10] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie, "Objects in context," ICCV, 2007.
[11] T. Malisiewicz and A. A. Efros, "Improving spatial support for objects via multiple segmentations," BMVC, 2007.
[12] N. Ahuja and S. Todorovic, "Connected segmentation tree: A joint representation of region layout and hierarchy," CVPR, 2008.
[13] A. Saxena, S. H. Chung, and A. Y. Ng, "3-d depth reconstruction from a single still image," IJCV, 2008.
[14] T. Brox, C. Bregler, and J. Malik, "Large displacement optical flow," CVPR, 2009.
[15] C. Gu, J. Lim, P. Arbeláez, and J. Malik, "Recognition using regions," CVPR, 2009.
[16] J. Lim, P. Arbeláez, C. Gu, and J. Malik, "Context by region ancestry," ICCV, 2009.
[17] L. G. Roberts, "Machine perception of three-dimensional solids," in Optical and Electro-Optical Information Processing, J. T. Tippett et al., Eds. Cambridge, MA: MIT Press, 1965.
[18] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[19] J. M. S. Prewitt, "Object enhancement and extraction," in Picture Processing and Psychopictorics, B. Lipkin and A. Rosenfeld, Eds. New York: Academic Press, 1970.
[20] D. C. Marr and E. Hildreth, "Theory of edge detection," Proceedings of the Royal Society of London, 1980.
[21] P. Perona and J. Malik, "Detecting and localizing edges composed of steps, peaks and roofs," ICCV, 1990.
[22] J. Canny, "A computational approach to edge detection," PAMI, 1986.
[23] X. Ren, C. Fowlkes, and J. Malik, "Scale-invariant contour completion using conditional random fields," ICCV, 2005.
[24] Q. Zhu, G. Song, and J. Shi, "Untangling cycles for contour grouping," ICCV, 2007.
[25] P. Felzenszwalb and D. McAllester, "A min-cover approach for finding salient curves," POCV, 2006.
[26] J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce, "Discriminative sparse image models for class-specific edge detection and image interpretation," ECCV, 2008.
[27] P. Dollar, Z. Tu, and S. Belongie, "Supervised learning of edges and object boundaries," CVPR, 2006.
[28] X. Ren, "Multi-scale improves boundary detection in natural images," ECCV, 2008.
[29] M. Donoser, M. Urschler, M. Hirzer, and H. Bischof, "Saliency driven total variational segmentation," ICCV, 2009.
[30] L. Bertelli, B. Sumengen, B. Manjunath, and F. Gibou, "A variational framework for multi-region pairwise similarity-based image segmentation," PAMI, 2008.
[31] S. Alpert, M. Galun, R. Basri, and A. Brandt, "Image segmentation by probabilistic bottom-up aggregation and cue integration," CVPR, 2007.
[32] P. Felzenszwalb and D. Huttenlocher, "Efficient graph-based image segmentation," IJCV, 2004.
[33] T. Cour, F. Benezit, and J. Shi, "Spectral segmentation with multiscale graph decomposition," CVPR, 2005.
[34] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," PAMI, 2002.
[35] P. Arbeláez, "Boundary extraction in natural images using ultrametric contour maps," POCV, 2006.
[36] M. C. Morrone and R. Owens, "Feature detection from local energy," Pattern Recognition Letters, 1987.
[37] W. T. Freeman and E. H. Adelson, "The design and use of steerable filters," PAMI, 1991.
[38] T. Lindeberg, "Edge detection and ridge detection with automatic scale selection," IJCV, 1998.
[39] Z. Tu, "Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering," ICCV, 2005.
[40] P. Parent and S. W. Zucker, "Trace inference, curvature consistency, and curve detection," PAMI, 1989.
[41] L. R. Williams and D. W. Jacobs, "Stochastic completion fields: A neural model of illusory contour shape and salience," ICCV, 1995.
[42] J. Elder and S. Zucker, "Computing contour closures," ECCV, 1996.
[43] Y. Weiss, "Correctness of local probability propagation in graphical models with loops," Neural Computation, 2000.
[44] S. Belongie, C. Carson, H. Greenspan, and J. Malik, "Color and texture-based image segmentation using EM and its application to content-based image retrieval," ICCV, pp. 675–682, 1998.
[45] J. Shi and J. Malik, "Normalized cuts and image segmentation," PAMI, 2000.
[46] J. Malik, S. Belongie, T. Leung, and J. Shi, "Contour and texture analysis for image segmentation," IJCV, 2001.
[47] D. Tolliver and G. L. Miller, "Graph partitioning by spectral rounding: Applications in image segmentation and clustering," CVPR, 2006.
[48] F. R. K. Chung, Spectral Graph Theory. American Mathematical Society, 1997.
[49] C. Fowlkes and J. Malik, "How much does globalization help segmentation?" UC Berkeley, Tech. Rep. CSD-04-1340, 2004.
[50] S. Wang, T. Kubota, J. M. Siskind, and J. Wang, "Salient closed boundary extraction with ratio contour," PAMI, 2005.
[51] S. X. Yu, "Segmentation induced by scale invariance," CVPR, 2005.
[52] E. Sharon, M. Galun, D. Sharon, R. Basri, and A. Brandt, "Hierarchy and adaptivity in segmenting visual scenes," Nature, vol. 442, pp. 810–813, 2006.
[53] D. Mumford and J. Shah, "Optimal approximations by piecewise smooth functions, and associated variational problems," Communications on Pure and Applied Mathematics, pp. 577–684, 1989.
[54] J. M. Morel and S. Solimini, Variational Methods in Image Segmentation. Birkhäuser, 1995.
[55] G. Koepfler, C. Lopez, and J. Morel, "A multiscale algorithm for image segmentation by variational method," SIAM Journal on Numerical Analysis, 1994.
[56] T. Chan and L. Vese, "Active contours without edges," IP, 2001.
[57] L. Vese and T. Chan, "A multiphase level set framework for image segmentation using the Mumford and Shah model," IJCV, 2002.
[58] S. Osher and J. Sethian, "Fronts propagation with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations," Journal of Computational Physics, 1988.
[59] J. A. Sethian, Level Set Methods and Fast Marching Methods. Cambridge University Press, 1999.
[60] T. Pock, D. Cremers, H. Bischof, and A. Chambolle, "An algorithm for minimizing the piecewise smooth Mumford-Shah functional," ICCV, 2009.
[61] F. J. Estrada and A. D. Jepson, "Benchmarking image segmentation algorithms," IJCV, 2009.
[62] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, pp. 846–850, 1971.
[63] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, 1964.
[64] C. Fowlkes, D. Martin, and J. Malik, "Learning affinity functions for image segmentation: Combining patch-based and gradient-based approaches," CVPR, 2003.
[65] T. Leung and J. Malik, "Contour continuity in region-based image segmentation," ECCV, 1998.
[66] S. Belongie and J. Malik, "Finding boundaries in natural images: A new method using point descriptors and area completion," ECCV, 1998.
[67] X. Ren and J. Malik, "Learning a classification model for segmentation," ICCV, 2003.
[68] S. Beucher and F. Meyer, Mathematical Morphology in Image Processing. Marcel Dekker, 1992, ch. 12.
[69] L. Najman and M. Schmitt, "Geodesic saliency of watershed contours and hierarchical segmentation," PAMI, 1996.
[70] S. Rao, H. Mobahi, A. Yang, S. Sastry, and Y. Ma, "Natural image segmentation with adaptive texture and boundary encoding," ACCV, 2009.
[71] J. Shotton, J. Winn, C. Rother, and A. Criminisi, "Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation," ECCV, 2006.
[72] Y. Boykov and M.-P. Jolly, "Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images," ICCV, 2001.
[73] C. Rother, V. Kolmogorov, and A. Blake, ""Grabcut": Interactive foreground extraction using iterated graph cuts," SIGGRAPH, 2004.
[74] Y. Li, J. Sun, C.-K. Tang, and H.-Y. Shum, "Lazy snapping," SIGGRAPH, 2004.
[75] S. Bagon, O. Boiman, and M. Irani, "What is a good image segment? A unified approach to segment extraction," ECCV, 2008.
[76] P. Arbeláez and L. Cohen, "Constrained image segmentation from hierarchical boundaries," CVPR, 2008.
[77] B. Catanzaro, B.-Y. Su, N. Sundaram, Y. Lee, M. Murphy, and K. Keutzer, "Efficient, high-quality image contour detection," ICCV, 2009.

Pablo Arbeláez received a PhD with honors in Applied Mathematics from the Université Paris Dauphine in 2005. He has been a Postdoctoral Scholar with the Computer Vision group at UC Berkeley since 2007. His research interests are in computer vision, where he has worked on a number of problems, including perceptual grouping, object recognition and the analysis of biological images.

Michael Maire received a BS with honors from the California Institute of Technology in 2003 and a PhD in Computer Science from the University of California, Berkeley in 2009. He is currently a Postdoctoral Scholar in the Department of Electrical Engineering at Caltech. His research interests are in computer vision as well as its use in biological image and video analysis.

Charless Fowlkes received a BS with honors from Caltech in 2000 and a PhD in Computer Science from the University of California, Berkeley in 2005, where his research was supported by a US National Science Foundation Graduate Research Fellowship. He is currently an Assistant Professor in the Department of Computer Science at the University of California, Irvine. His research interests are in computer and human vision, and in applications to biological image analysis.

Jitendra Malik was born in Mathura, India in 1960. He received the B.Tech degree in Electrical Engineering from Indian Institute of Technology, Kanpur in 1980 and the PhD degree in Computer Science from Stanford University in 1985. In January 1986, he joined the University of California at Berkeley, where he is currently the Arthur J. Chick Professor in the Computer Science Division, Department of Electrical Engineering and Computer Sciences. He is also on the faculty of the Cognitive Science and Vision Science groups. During 2002-2004 he served as the Chair of the Computer Science Division and during 2004-2006 as the Department Chair of EECS. He serves on the advisory board of Microsoft Research India, and on the Governing Body of IIIT Bangalore.

His current research interests are in computer vision, computational modeling of human vision and analysis of biological images. His work has spanned a range of topics in vision including image segmentation, perceptual grouping, texture, stereopsis and object recognition with applications to image based modeling and rendering in computer graphics, intelligent vehicle highway systems, and biological image analysis. He has authored or co-authored more than a hundred and fifty research papers on these topics, and graduated twenty-five PhD students who occupy prominent places in academia and industry. According to Google Scholar, four of his papers have received more than a thousand citations.

He received the gold medal for the best graduating student in Electrical Engineering from IIT Kanpur in 1980 and a Presidential Young Investigator Award in 1989. At UC Berkeley, he was selected for the Diane S. McEntyre Award for Excellence in Teaching in 2000, a Miller Research Professorship in 2001, and appointed to be the Arthur J. Chick Professor in 2002. He received the Distinguished Alumnus Award from IIT Kanpur in 2008. He was awarded the Longuet-Higgins Prize for a contribution that has stood the test of time twice, in 2007 and in 2008. He is a fellow of the IEEE and the ACM.