Statistical Region Merging: Richard Nock and Frank Nielsen

This document discusses statistical region merging as an approach to image segmentation. It proposes a model for generating theoretical images and shows that region merging can approximate the optimal segmentation of observed images generated by this model with only overmerging errors that are statistically small. Experimental results on gray-scale and color images using the described approach demonstrate the quality of segmentations obtained.


IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 26, NO. 11, NOVEMBER 2004

Statistical Region Merging


Richard Nock and Frank Nielsen

Abstract—This paper explores a statistical basis for a process often described in computer vision: image segmentation by region
merging following a particular order in the choice of regions. We exhibit a particular blend of algorithmics and statistics whose
segmentation error is, as we show, limited from both the qualitative and quantitative standpoints. This approach can be efficiently
approximated in linear time/space, leading to a fast segmentation algorithm tailored to processing images described using most
common numerical pixel attribute spaces. The conceptual simplicity of the approach makes it easy to modify so that it copes with hard
noise corruption, handles occlusion, allows control of the segmentation scale, and processes unconventional data such as spherical
images. Experiments on gray-level and color images, obtained with a short, readily available C code, display the quality of the
segmentations obtained.

Index Terms—Grouping, image segmentation.

• R. Nock is with the Université Antilles-Guyane, Département Scientifique Inter-facultaire/GRIMAAG Lab., B.P. 7209, 97278 Schoelcher, Martinique, France. E-mail: [email protected].
• F. Nielsen is with Sony Computer Science Laboratories Inc., 3-14-13 Higashi Gotanda, Shinagawa-Ku, Tokyo 141-0022, Japan. E-mail: [email protected].

Manuscript received 8 Aug. 2003; revised 26 Jan. 2004; accepted 1 Apr. 2004. Recommended for acceptance by R. Basri. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0219-0803.
0162-8828/04/$20.00 © 2004 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION

It has been established since the Gestalt movement in psychology that perceptual grouping plays a fundamental role in human perception. Even though this observation is rooted in the early part of the 20th century, the adaptation and automation of the segmentation (and, more generally, grouping) task with computers has remained so far a tantalizing and central problem for image processing. Vision is widely accepted as an inference problem, i.e., the search for what caused the observed data [1]. In this respect, the grouping problem can be roughly presented as the transformation of the collection of pixels of an image into a visually meaningful partition of regions and objects.

This implicitly postulates the existence of optimal segmentation(s) which we should aim at recovering or approximating, and this task implies casting the perceptual formulation of optimality into a formalized, well-defined problem. A prominent trend in grouping focuses on graph cuts, mapping image pixels onto graph vertices, and the spatial relationships between pixels onto weighted graph edges. The objective is to minimize a cut criterion, given that any cut on this graph yields a partition of the image into (hopefully) coherent visual patterns. Cut criteria range from conventional [2] to more sophisticated criteria, tailored to grouping [3], [4], [5]. These are basically global criteria; however, the strategies adopted for their minimization range through a broad spectrum, from local [6] to global optimization [5], through intermediate choices [7], [3]. Global optimization strategies have the advantage of directly tackling the problem as a whole, and may offer good approximations [5], though possibly at some algorithmic expense [3], [5].

In this paper, we focus on a different strategy which belongs to the family of region growing and merging techniques [8], [9]. In region merging, regions are sets of pixels with homogeneous properties and they are iteratively grown by combining smaller regions or pixels, pixels being elementary regions. Region growing/merging techniques usually work with a statistical test to decide the merging of regions [9]. A merging predicate uses this test, and builds the segmentation on the basis of (essentially) local decisions. This locality in decisions has to preserve global properties, such as those responsible for the perceptual units of the image [8]. In Fig. 1, the grassy region below the castle is one such unit, even when its variability is high compared to the other regions of the image. In that case, a good region merging algorithm has to find a good balance between preserving this unit and the risk of overmerging for the remaining regions. Fig. 1b shows the result of our approach. As long as the approach is greedy, two essential components participate in defining a region merging algorithm: the merging predicate and the order followed to test the merging of regions. There is a lack of theoretical results on the way these two components interact, and can benefit from each other. This might be partially due to the fact that most approaches use assumptions on distributions, more or less restrictive, which would make any theoretical insight into how region merging works restricted to such settings and, therefore, of possibly moderate interest (see, e.g., [10] for related criticisms).

Our aim in this paper is to propose a path and its milestones from a novel model of image generation, the theoretical properties of possible segmentation approaches to a practical, readily available system of image segmentation, and its extensions to miscellaneous problems related to image segmentation. First, the key idea of this model is to really formulate image segmentation as an inference problem [1]. It is the reconstruction of regions on the observed image, based on an unknown theoretical (true) image on which the true regions we seek are statistical regions whose borders are defined from a simple axiom. Second, we show the existence of a particular blend of statistics and algorithmics to process observed images generated with this model, by region merging, with two statistical properties. With high probability, the algorithm suffers only one source of error for image segmentation: overmerging, that is, the fact that some observed region may contain more than one true region. The algorithm suffers neither undermerging nor the (most frequent) hybrid cases where observed regions may partially span several true regions. Yet, there is more: With
high probability, this overmerging error is, as we show, formally small as the algorithm manages an accuracy in segmentation close to the optimum, up to low order terms. The algorithm has some desirable features: It relies on a simple interaction between a merging predicate easily implementable, and an order in merging approximable in linear time. Furthermore, it can be adapted to most numerical feature description spaces (RGB, HSI, $L^*u^*v^*$, etc.). Third, we provide a C-code implementation of this last algorithm, which is a few hundred lines of C, and experiments on various benchmark images, as well as comparisons with other algorithms. Last, we show how to extend the algorithm to naturally cope with hard noise and/or significantly occluded images at very affordable algorithmic complexity. Though running the algorithm does not require tuning its parameters, the control of a statistical complexity parameter makes it possible to adjust the segmentation scale in a simple manner.

Fig. 1. An RGB image and the segmentation found by our segmentation method (regions are white bordered and averaged inside).

The next section presents our model of image generation. Section 3 details our analysis and algorithm, first for the gray-level setting, and then for color images. Section 4 presents our experiments. The last two sections conclude and detail the code availability.

2 PRELIMINARY NOTATIONS AND MODELS

The notation $|\cdot|$ stands for cardinal. The observed image, $I$, contains $|I|$ pixels, each containing Red-Green-Blue (RGB) values, each of the three belonging to the set $\{1, 2, \ldots, g\}$ (in practice, we would have $g = 256$). We have deliberately chosen not to use complex formulations of the colors, such as the $L^*u^*v^*$ space [10].

$I$ is an observation of a perfect scene $I^*$ we do not know of, in which pixels are perfectly represented by a family of distributions, from which each of the observed color channels is sampled. In $I^*$, the optimal (or true, or statistical) regions represent theoretical objects sharing a common homogeneity property:

• Inside any statistical region and given any color channel $\in \{R, G, B\}$, the statistical pixels have the same expectation for this color channel.
• The expectations of adjacent statistical regions are different for at least one color channel $\in \{R, G, B\}$.

$I$ is obtained from $I^*$ by sampling each statistical pixel for observed RGB values. Fig. 2 presents an example of a color channel for one pixel in $I^*$ and how to generate the corresponding observed color channel of the pixel in $I$. In each pixel of $I^*$, each color channel is replaced by a set of exactly $Q$ independent random variables (r.v.), taking positive values on domains bounded by $g/Q$, such that any possible sum of outcomes of these $Q$ r.v. belongs to $\{1, 2, \ldots, g\}$. Fig. 3 gives an example of some true image $I^*$ (in fact, it is the result of our algorithm, run on Fig. 3b) which displays the expectation of statistical pixels, and the observed image $I$ generated from $I^*$. Given the homogeneity property, frontiers between true regions are connecting pixels with differences in their color expectations, and the ideal segmentation of $I$ relies on the frontiers between the statistical regions shown on $I^*$ in Fig. 3.

Fig. 2. Generation of a single color channel for one pixel from a statistical region O of $I^*$ to some observed pixel of $I$.

Fig. 3. Schematic view of some theoretical image $I^*$, and a corresponding observed image $I$. Only the average over R, G, B of the theoretical pixels' first moments are shown in $I^*$. According to the homogeneity property (see text), (a) shows the true (optimal) segmentation of $I$. a is the process generating the observed image (see also Fig. 2), and b is grouping's objective (i.e., find the statistical regions' borders, given $I$).

The sampling of each pixel and its color channels are supposed independent from each other. It is important to note that this is the only assumption we make on $I^*$, and the frequent independent identically distributed (i.i.d.) assumption is relaxed in this model to that of ordinary independence. Inside a statistical region, it can be the case that all distributions associated to each pixel are different, as long as the homogeneity property is satisfied. This freedom has a counterpart, which led us to introduce $Q$, not necessarily to make our model more general, but, essentially, for practical purposes: The conventional choice $Q = 1$ would actually make it hard to estimate reliably anything for small regions or, equivalently, would make it necessary to consider very large images to improve the statistical accuracy of the segmentation. Notice that $Q$ is a parameter which makes sense: It allows us to quantify the statistical complexity of $I^*$, the generality of the model, and the statistical hardness of the task as well. From an experimental standpoint, tuning $Q$ modifies the statistical complexity of the scene, and makes it possible to control the coarseness of the segmentation, with the possibility to build a hierarchy of coarse-to-fine (multiscale) segmentations of an image [3].

3 THEORETICAL ANALYSIS AND ALGORITHMS

For the sake of simplicity, we first state our theoretical results for a single color band (e.g., gray-level). On this basis, the extension of the results to more numerical channels, such as RGB, does not require an involved analysis: It is presented in Section 3.3. Recall that it is enough to give a merging predicate and an order to test region mergings, to completely define our segmentation algorithm.

3.1 Merging Predicate

Our first result is based on the following theorem.

Theorem 1 (The independent bounded difference inequality, [11]). Let $X = (X_1, X_2, \ldots, X_n)$ be a family of $n$ independent r.v. with $X_k$ taking values in a set $A_k$ for each $k$. Suppose that the real-valued function $f$ defined on $\prod_k A_k$ satisfies $|f(x) - f(x')| \le c_k$ whenever vectors $x$ and $x'$ differ only in the $k$th coordinate. Let $\mu$ be the expected value of the r.v. $f(X)$. Then, for any $\tau \ge 0$,

$$\Pr(f(X) - \mu \ge \tau) \le \exp\left(-2\tau^2 \Big/ \sum_k (c_k)^2\right). \qquad (1)$$

From this theorem, we obtain the following result on the deviation of observed differences between regions of $I$. Here, the notation $E(R)$ for some arbitrary region $R$ is the expectation over all corresponding statistical pixels of $I^*$ of their sum of expectations of their $Q$ r.v. for the single color band, and $\bar{R}$ is the observed average of this color band.

Corollary 1. Consider a fixed couple $(R, R')$ of regions of $I$. $\forall\, 0 < \delta \le 1$, the probability is no more than $\delta$ that

$$|(\bar{R} - \bar{R}') - E(\bar{R} - \bar{R}')| \ge g \sqrt{\frac{1}{2Q} \left(\frac{1}{|R|} + \frac{1}{|R'|}\right) \ln\frac{2}{\delta}}. \qquad (2)$$

Proof. Suppose we shift the value of the outcome of one r.v. among the $Q(|R| + |R'|)$ possible for the couple $(R, R')$. $|\bar{R} - \bar{R}'|$ is subject to a variation of at most $c_R = g/(Q|R|)$ when this modification affects region $R$ (among $Q|R|$ possible), and at most $c_{R'} = g/(Q|R'|)$ for a change inside $R'$ (among $Q|R'|$ possible). We get $\sum_k (c_k)^2 = Q(|R|(c_R)^2 + |R'|(c_{R'})^2) = (g^2/Q)((1/|R|) + (1/|R'|))$. Using the fact that the deviation with the absolute value is at most twice that without, and using Theorem 1 (solving for $\tau$) brings our result. □

Suppose we do $N$ merging tests in $I$. Then, with probability $\ge 1 - (N\delta)$, all couples of regions $(R, R')$ whose merging is tested shall satisfy $|(\bar{R} - \bar{R}') - E(\bar{R} - \bar{R}')| \le b(R, R')$, with $b(R, R')$ the right member of Corollary 1. Remark that $N$ is small: for a single-pass algorithm, $N < |I|^2$. In our 4-connexity setting (each pixel is connected to its north, south, east, and west neighbors when they exist), we even have $N < 2|I|$. What we really need to test the merging of two observed regions $R$ and $R'$ is a predicate accurate enough when the pixels of $R \cup R'$ come from the same statistical region of $I^*$. From this standpoint, using Corollary 1 to devise a merging predicate is straightforward: In this case, we have $E(\bar{R} - \bar{R}') = 0$ and, thus, with high probability, the deviation $|\bar{R} - \bar{R}'|$ does not exceed $b(R, R')$. The merging predicate on two candidate regions $R$ and $R'$ could thus be “merge $R$ and $R'$ iff $|\bar{R} - \bar{R}'| \le b(R, R')$,” with $b(R, R')$ a merging threshold. We shall see hereafter that such a predicate is optimistic: Under some assumption, it tends sometimes to favor overmerging (i.e., it does more merges than necessary to actually recover $I^*$), but this phenomenon formally remains quantitatively small. For both theoretical and practical considerations, we are going to replace this merging predicate by one slightly more optimistic, i.e., with a larger merging threshold. This one turns out to theoretically incur the same error (up to low order terms), and it gives very good visual results. Let $\mathcal{R}_l$ be the set of regions with $l$ pixels and $b(R) = g \sqrt{(1/(2Q|R|)) \ln(|\mathcal{R}_{|R|}|/\delta)}$. Remark that provided regions $R$ and $R'$ are not empty,

$$b(R, R') \le \sqrt{b^2(R) + b^2(R')} < b(R) + b(R'). \qquad (3)$$

Hereafter, we prove a quantitative bound on the error obtained with the largest quantity (the right one) used as merging threshold: it holds for both others as well. The center quantity is the merging threshold we use. An upperbound on $|\mathcal{R}_l|$ makes it quite reasonable with regard to $b(R, R')$. Considering that a region is an unordered bag of pixels (each color channel is given $0, 1, \ldots, l$ pixels), we may fix $|\mathcal{R}_l| \le (l + 1)^{\min\{l, g\}}$ (we have $l + 1$ choices for the number of pixels having each color channel, which makes $|\mathcal{R}_l| \le (l + 1)^g$, and then we reduce this large upperbound by counting the duplicates for $l < g$). To summarize, our merging predicate is:

$$P(R, R') = \begin{cases} \text{true} & \text{if } |\bar{R}' - \bar{R}| \le \sqrt{b^2(R) + b^2(R')} \\ \text{false} & \text{otherwise.} \end{cases} \qquad (4)$$

3.2 Order in Merging

The order in which we test the merging of regions follows a simple invariant $\mathcal{A}$ which we define as follows:

• $(\mathcal{A}) \stackrel{\text{def}}{=}$ when any test between two (parts of) true regions occurs, that means that all tests inside each of the two true regions have previously occurred.
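As a concrete reading of the merging predicate (4) from Section 3.1, the following Python sketch evaluates $b^2(R)$ and the test for a single color band. It is only an illustration under stated assumptions: the function names (`log_region_count`, `b_squared`, `should_merge`) and the default value of `delta` are hypothetical choices made here, and the bound $|\mathcal{R}_l| \le (l+1)^{\min\{l,g\}}$ is plugged in directly in place of $|\mathcal{R}_l|$.

```python
import math


def log_region_count(size, g=256):
    """Natural log of the upper bound (l + 1)^min(l, g) on |R_l|, with l = size."""
    return min(size, g) * math.log(size + 1)


def b_squared(size, g=256, Q=32, delta=1e-6):
    """b(R)^2 = g^2 * (1 / (2 Q |R|)) * ln(|R_{|R|}| / delta).

    delta=1e-6 is an arbitrary illustrative default, not a value from the text.
    """
    log_term = log_region_count(size, g) - math.log(delta)
    return g * g * log_term / (2.0 * Q * size)


def should_merge(mean_r, size_r, mean_s, size_s, g=256, Q=32, delta=1e-6):
    """Predicate (4): merge iff |mean_s - mean_r| <= sqrt(b^2(R) + b^2(R'))."""
    threshold = math.sqrt(b_squared(size_r, g, Q, delta)
                          + b_squared(size_s, g, Q, delta))
    return abs(mean_s - mean_r) <= threshold
```

With these defaults, two 1000-pixel regions whose averages differ by 10 gray levels pass the test, whereas two 100000-pixel regions whose averages differ by 180 do not, because the threshold for very large regions is much tighter than for single pixels.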
It is crucial to note that $\mathcal{A}$ does not postulate the knowledge of the segmentation of $I^*$. To make it clear why we should strive to fulfill $\mathcal{A}$, let us first recall the three types of error a segmentation can suffer. First, undermerging represents the case where one or more regions obtained are strict subparts of true regions. Second, overmerging represents the case where some regions obtained strictly contain more than one true region. Third, there is the “hybrid” (and most probable) case where some regions obtained contain more than one strict subpart of true regions. We have already partially outlined this in the preceding section related to the merging predicate: together with $P$ (4), $\mathcal{A}$ makes it possible to control the segmentation error from both the qualitative and quantitative standpoints. The next theorem states that only overmerging occurs with high probability. In this theorem, we define $s^*(I)$ as the set of regions of the ideal (optimal) segmentation of $I$ (defined from $I^*$, see Fig. 3) and $s(I)$ the set of regions in our segmentation of $I$.

Theorem 2. With probability $\ge 1 - O(|I|\delta)$, the segmentation on $I$ satisfying $\mathcal{A}$ is an overmerging of $I^*$, that is: $\forall O \in s^*(I), \exists R \in s(I) : O \subseteq R$.

Proof. From Corollary 1, with probability $> 1 - (N\delta) = 1 - O(|I|\delta)$, any couple of regions $(R, R')$ coming from the same statistical region of $I^*$, and whose merging is tested, satisfy $|\bar{R} - \bar{R}'| \le b(R, R')$. Since $b(R, R') \le \sqrt{b^2(R) + b^2(R')}$, our merging predicate $P(R, R')$ (4) would authorize the merging of $R$ and $R'$. Using the fact that $\mathcal{A}$ holds together with this property, we first rebuild all true regions of $I^*$, and then eventually make some more merges: The segmentation obtained is an overmerging of $I^*$ with high probability, as claimed. □

The next theorem shows a quantitative upperbound on the error incurred with respect to the optimal segmentation. We define this error as the weighted average of the (absolute) channel differences over all nonempty intersections of regions between $s^*(I)$ and $s(I)$:

$$\mathrm{Err}(s(I)) = \mathcal{E}_{R \cap O,\; R \in s(I),\; O \in s^*(I)}\, |E(O) - E(R)|, \qquad (5)$$

with $\mathcal{E}$ denoting the expectation with associated probability measure $|R \cap O|/|I|$ on $R \cap O$.

Theorem 3. $\forall\, 0 < \delta < 1$, with probability $\ge 1 - O(|I|\delta)$:

$$\mathrm{Err}(s(I)) \le O\left(g \sqrt{\frac{|s^*(I)| \ln |s^*(I)|}{|I| Q} \left(\ln\frac{1}{\delta} + g \ln |I|\right)}\right). \qquad (6)$$

(Proof omitted.) This theorem is interesting for three (mostly) theoretical reasons. First, the constant hidden in the big-Oh notation is small ($< \sqrt{6}$); second, it is proven for the largest merging threshold in (3). Last, if we ignore log-terms, the error incurred by our segmentation is driven by $g \sqrt{|s^*(I)|/(|I|Q)}$, a close order approximation to the optimum [12].

3.3 Color Images

The merging predicate for the RGB setting is:

$$P(R, R') = \begin{cases} \text{true} & \text{if } \forall a \in \{R, G, B\},\; |\bar{R}'_a - \bar{R}_a| \le \sqrt{b^2(R) + b^2(R')} \\ \text{false} & \text{otherwise.} \end{cases} \qquad (7)$$

Here, $\bar{R}_a$ denotes the observed average for color channel $a$ in region $R$. Provided invariant $\mathcal{A}$ holds as in Section 3.2, our predicate preserves overmerging, and the same bound as that of Theorem 3 holds on the error if we measure it as the sum of errors over the three color channels.

3.4 Our Algorithm: SRM

In 4-connexity, there are $N < 2|I|$ couples of adjacent pixels. Let $S_I$ be the set of these couples. Let $f(p, p')$ be a real-valued function, with $p$ and $p'$ pixels of $I$. Our segmentation algorithm, SRM (for Statistical Region Merging) is simple. We first sort the couples of $S_I$ in increasing order of $f(\cdot,\cdot)$, and then traverse this order only once. We make for any current couple of pixels $(p, p') \in S_I$ for which $R(p) \ne R(p')$ ($R(p)$ stands for the current region to which $p$ belongs) the test $P(R(p), R(p'))$, and merge $R(p)$ and $R(p')$ iff it returns true. The objective is obviously to choose $f(\cdot,\cdot)$ so as to approximate $\mathcal{A}$ as best as possible.

The next section reviews some choices we have made for $f(\cdot,\cdot)$, each of constant time computation. Because we do not update the list of merging tests after merging two regions, a simple ordering based on radix sorting with color differences as the keys yields a preordering time complexity $O(|I| \log(g))$, linear in $|I|$, for our basic implementations of SRM. The merging steps afterward are space/time computational optimal [13], which makes SRM also optimal from both standpoints. The execution time of our basic implementation of SRM, which is not optimized, segments our largest images ($512 \times 512$) in about one second on an Intel Pentium IV 2.40 GHz processor.

4 EXPERIMENTAL RESULTS

Our model of image generation makes implicitly the assumption that observed color variations inside true regions should reasonably be smaller than between true regions. Such an assumption is made explicit in [8]. Thus, a good way to approximate $\mathcal{A}$ is to capture the between-pixel local gradients, and then compute their maximal per-channel variation in $f(\cdot,\cdot)$: $f(p, p') = \max_{a \in \{R, G, B\}} f_a(p, p')$. Below, we review some experiments using SRM. The reader may keep in mind that, unless otherwise stated, the values of the parameters of SRM are the same for all images: $\delta = 1/(6|I|^2)$ (Corollary 1) and $Q = 32$. Furthermore, the images are used as they are, i.e., without any preprocessing. Therefore, there is no extensive domain nor image-dependent tuning of the parameters.

4.1 Basic Choices for $f(\cdot,\cdot)$

We have tested two basic choices to compute $f_a(p, p')$. The simplest choice is to pick directly the pixel channel values ($p_a$ and $p'_a$):

$$f_a(p, p') = |p_a - p'_a|. \qquad (8)$$

Our second choice for $f_a(\cdot,\cdot)$ consists of extending convolution kernels classically used in edge detection for pixel-wise gradient estimation. In 4-connexity, neighbor pixels are either horizontal or vertical. Thus, we only need $\hat{\partial}_x$ or $\hat{\partial}_y$ between neighbor pixels $p$ and $p'$, for each color channel. We have chosen Sobel convolution filters, where smoothing is performed by the convolution mask $[1\ 2\ 1]$ and the derivative filter is performed by the convolution mask $[-1\ 0\ 0\ 1]$.
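Sections 3.3, 3.4, and 4.1 together describe the whole procedure: sort the 4-connexity couples by $f$, traverse them once, and merge with predicate (7). The Python sketch below strings these steps together for the simplest choice (8). It is a hedged illustration, not the authors' C code: the function name `srm_segment`, the list-of-tuples image layout, the union-find bookkeeping, and the use of Python's built-in comparison sort in place of radix sort are all assumptions made here. The parameter values $\delta = 1/(6|I|^2)$ and $Q = 32$ are the ones quoted in Section 4.

```python
import math


def srm_segment(image, Q=32, g=256):
    """Segment `image` (list of rows of (r, g, b) tuples); return a grid of region labels."""
    h, w = len(image), len(image[0])
    n = h * w
    delta = 1.0 / (6.0 * n * n)              # delta = 1 / (6 |I|^2), Section 4
    parent = list(range(n))                  # union-find forest over pixel indices
    size = [1] * n                           # |R| for each root
    sums = [list(image[i // w][i % w]) for i in range(n)]  # per-channel sums per root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    def b_squared(s):
        log_count = min(s, g) * math.log(s + 1)   # ln of the bound (l + 1)^min(l, g)
        return g * g * (log_count - math.log(delta)) / (2.0 * Q * s)

    # Build the couples of adjacent pixels, keyed by f(p, p') = max_a |p_a - p'_a| (8).
    pairs = []
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    f = max(abs(u - v) for u, v in zip(image[y][x], image[ny][nx]))
                    pairs.append((f, y * w + x, ny * w + nx))
    pairs.sort()                             # comparison sort standing in for radix sort

    # Single traversal: test predicate (7) on the current regions of each couple.
    for _, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        threshold = math.sqrt(b_squared(size[ri]) + b_squared(size[rj]))
        if all(abs(sums[ri][c] / size[ri] - sums[rj][c] / size[rj]) <= threshold
               for c in range(3)):
            if size[ri] < size[rj]:
                ri, rj = rj, ri              # attach the smaller region to the larger
            parent[rj] = ri
            size[ri] += size[rj]
            for c in range(3):
                sums[ri][c] += sums[rj][c]

    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

On an 8 × 8 image whose left half is (10, 10, 10) and right half is (250, 250, 250), this sketch returns two regions; if the two halves differ by only 20 gray levels, they are merged into one, which illustrates the overmerging behavior discussed in Section 3.2. Because the couples are sorted once and never reordered, lowering `Q` loosens the threshold and yields coarser segmentations.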
Regardless of the choice of $f_a(\cdot,\cdot)$, the fact that we do not reorder the merging list during the merging steps might appear to be quite a strong constraint for efficient approximations to $\mathcal{A}$. The following simple experiment suggests that this is not the case, and it uses our simplest implementation of $f_a(\cdot,\cdot)$ (8). Fig. 4 displays the result of SRM on gray-level images (with $\delta = 1/|I|^2$), and the result of the same algorithm in which the order is replaced by a conventional scanning of the image (from left to right and top to bottom) [13]. The preordering clearly manages dramatic improvements over conventional scanning.

Fig. 4. An experimental display of the importance of sorting. Regions obtained in the segmentations are gray-level averaged with white borders.

Fig. 5 presents experiments obtained with our two methods for computing $f_a(\cdot,\cdot)$ on images for which results are visually different. On images with significant color gradients, there is a visual advantage to SRM(w) (e.g., cup). Notice also the segmentation of SRM(w) on bldg, a picture exhibiting a large amount of motion blur. Remark from squirrel that SRM is able to isolate regions with high variability (e.g., the grass), and obtains results even better than [9] on the squirrel image: Their segmentation, although tailor-made for textured images, obtains a segmentation of the grass with many holes, a common drawback of region-merging techniques [9].

Fig. 6. Sample results comparing both versions of SRM and [8]. Segmentation conventions are [8]'s: region colors are chosen at random.
4.2 Random Noise Corruption

We have chosen to study the way SRM handles noise with our two choices for $f_a(\cdot,\cdot)$, against two hard noise types. Each color channel $\in \{R, G, B\}$ of each pixel $\in I$ is transformed with probability $q \in [0, 1]$ into a new value:

• chosen uniformly in $\{1, 2, \ldots, g\}$ for transmission noise ($t(q)$), or
• chosen uniformly in $\{1, g\}$ (the extremes) for salt and pepper noise ($s(q)$).
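The two noise models above are simple enough to sketch directly. The Python snippet below is an illustration only: the function name `corrupt`, the `kind` strings, and the list-of-tuples image layout are hypothetical choices made here, and channel values are assumed to lie in $\{1, \ldots, g\}$ as in Section 2 (an 8-bit image stored as 0 to 255 would need shifting by one).

```python
import random

NOISE_KINDS = ("transmission", "salt_and_pepper")


def corrupt(image, q, kind="transmission", g=256, rng=None):
    """Return a noisy copy of `image` (rows of (r, g, b) tuples with values in 1..g).

    Each channel of each pixel is independently replaced with probability q.
    """
    if kind not in NOISE_KINDS:
        raise ValueError(f"unknown noise kind: {kind!r}")
    rng = rng or random.Random()

    def new_value():
        if kind == "transmission":
            return rng.randint(1, g)         # uniform in {1, 2, ..., g}
        return rng.choice((1, g))            # uniform in the extremes {1, g}

    return [[tuple(new_value() if rng.random() < q else c for c in pixel)
             for pixel in row]
            for row in image]
```

Passing a seeded `random.Random` instance as `rng` makes a corruption reproducible, which is convenient when comparing segmentations of the same noisy image.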
Fig. 5. Comparison of SRM without specific gradient estimation ((8), center images) and with convolution kernels (right images). Regions are color averaged with white borders.

Fig. 6 shows different images corrupted with increasing amounts of noise, and the results of [8] and SRM. From 45 percent noise, the results of [8] appear to be random, whereas SRM still manages to find most interesting regions of the images. However, on some images, SRM obtained a brutal degradation of its performances for significant noise levels and for both ways of computing $f_a(\cdot,\cdot)$, indicating that the algorithm seems to reach its limits. To handle larger noise
corruption, we have extended both solutions for $f_a(\cdot,\cdot)$ to more robust estimations, paying no significant additional computational cost. First, we replace $p_a$ (8), by a moving average over a region defining a neighborhood around the pixel: $f_a(p, p') = |\bar{N}_p(p')_a - \bar{N}_{p'}(p)_a|$. Here, $N_p(p')$ is the region defined by the set of points of $I$ that are within Manhattan distance $\le 2\Delta$ to $p'$ (for some integer $\Delta \ge 0$), and that are closer to $p'$ than they are to $p$. Whenever $\Delta = 0$, this expression matches that of (8). Second, we replace our convolution filter by larger ones, where smoothing is performed by the convolution mask $[1\ 2\ \cdots\ \Delta + 1\ \cdots\ 1]$, and the derivative filter on each color channel is replaced by a local least-square linear regression on points whose abscissae are those of the convolution mask [… + 1 …], and ordinates by defined by color channels of the corresponding pixels. Whenever $\Delta = 1$, this matches our Sobel filter's extension in Section 4.1.

Using radix sorting again with $f(\cdot,\cdot)$ values as the keys, the whole time complexity of our modifications of SRM becomes $O(|I|(\log g + \Delta))$, which is still linear in $|I|$ if $\Delta$ is constant.

Fig. 7. Results and comparisons with related approaches of SRM with our noise extensions for (8) (w/o) and Sobel-type filters (w), with $\Delta = 10$. (See text for the segmentation conventions.)

Fig. 7 reports results on the castle of Fig. 1. Conventions for the segmentations results are as follows: [10]'s regions are averaged with the original colors, [8]'s are averaged with random colors, and SRMs follow [10]'s (with white bordered regions). Notice that the number of regions found by [10], [8] explode with corruption, a phenomenon which does not appear for SRM modified. The segmentation time for the three algorithms gives a clear advantage to [8] and SRM modified. This image gives a slight advantage to SRM(w/o) over SRM(w) for noise handling, but we have remarked that both versions perform quite similarly on average. The best value of $\Delta$, which controls the local number of pixels on which each gradient approximation is computed, seems to vary between images, but it always has to remain small so as not to obtain “boxy-shaped” regions. In this respect, $\Delta = 10$ is not far from the maximal value.

4.3 Handling Occlusions

In our model, occlusions make it necessary to relax the 4-connexity constraint on the statistical regions of $I^*$. Handling them from an experimental point of view is simple: We first run SRM as already presented. Then, in a second stage, we run it again with two major modifications on the preordering step: We replace the pixels of $I$ by the regions found after the first step, and replace the 4-adjacency graph between pixels by the clique graph between these regions. We also replace $f_a(\cdot,\cdot)$ in (8) by: $f_a(R, R') = |\bar{R}'_a - \bar{R}_a|$. Radix sorting with the $f(\cdot,\cdot)$ values as the keys brings an overall time complexity $O((|I| + k^2) \log g)$, where $k$ is the number of regions found after the first step. The fact that our approach relies on slight overmerging tends to narrow the influence of $k$ in the complexity. Fig. 8 shows some results obtained, on which SRM has managed to rebuild accurately the principal occluded regions (such as the road on the road image, despite the relative noise of this video-extracted picture).

Fig. 8. Results on images with occlusions. The largest regions found by SRM are displayed for each image.

4.4 Controlling the Scale of the Segmentation

Some authors have emphasized the need to control the coarseness of a segmentation [7], [3], [4]. The objective of multiscale segmentation is to get a hierarchy of segmentations at different scales, and get at each scale a level of details compatible with the perceptual organization of the image at this scale. In our case, controlling the scale is made easy with the tuning of parameter $Q$: The smaller it is, the harder is the statistical estimation task, and the less
numerous are the regions in the final segmentation. To visualize this, we run SRM with $\Delta = 2$ for our extension of Sobel convolution filters (see Section 4.2), and make Q range through the values 1, 2, 4, ..., 256. Fig. 9 presents the results obtained on the house image. It is interesting to note that as Q increases, the regions found get smaller, but they often correspond to smaller perceptual regions of the image at different scales (e.g., the house gets segmented gradually, from itself as a whole until all its details are extracted: facades, windows, rooftops, etc.).

Fig. 9. Segmentations of SRM on image house, for different values of Q. Regions found are white-bordered (see text for details).

5 CONCLUSION

In this paper, we propose a segmentation algorithm based on the idea that perceptual grouping with region merging has to catch the big picture of a scene by only having primary local glimpses of it. Our algorithm is based on a model of image generation which captures the idea that grouping is an inference problem. This provides us with a simple merging predicate and a simple ordering in merges which, with high probability, both suffer only one source of error (overmerging) and achieve a low error in segmentation. It can be reliably approximated by a very fast segmentation algorithm, SRM, which from our experiments tends indeed to satisfy our goal of image segmentation. Experiments display the ability of the approach to cope with significant noise corruption, handle occlusions, and perform scale-sensitive segmentations.

ACKNOWLEDGMENTS

The authors would like to thank the reviewers for their insightful comments on this paper. This work was done while Richard Nock was visiting Sony CSL Tokyo. Resources related to SRM may be obtained from the authors’ Web pages.

REFERENCES

[1] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Prentice Hall, 2003.
[2] Z. Wu and R. Leahy, “An Optimal Graph Theoretic Approach to Data Clustering,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 1101-1113, 1993.
[3] E. Sharon, A. Brandt, and R. Basri, “Fast Multiscale Image Segmentation,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 70-77, 2000.
[4] E. Sharon, A. Brandt, and R. Basri, “Segmentation and Boundary Detection Using Multiscale Intensity Measurement,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 469-476, 2001.
[5] J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 888-905, 2000.
[6] A. Barbu and S.-C. Zhu, “Graph Partition by Swendsen-Wang Cuts,” Proc. Ninth IEEE Int’l Conf. Computer Vision, pp. 320-327, 2003.
[7] M. Galun, E. Sharon, R. Basri, and A. Brandt, “Texture Segmentation by Multiscale Aggregation of Filter Responses and Shape Elements,” Proc. Ninth IEEE Int’l Conf. Computer Vision, pp. 716-725, 2003.
[8] P.F. Felzenszwalb and D.P. Huttenlocher, “Image Segmentation Using Local Variations,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 98-104, 1998.
[9] S.-C. Zhu and A. Yuille, “Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multiband Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, pp. 884-900, 1996.
[10] D. Comaniciu and P. Meer, “Robust Analysis of Feature Spaces: Color Image Segmentation,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 750-755, 1997.
[11] C. McDiarmid, “Concentration,” Probabilistic Methods for Algorithmic Discrete Math., M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, eds., pp. 1-54, Springer Verlag, 1998.
[12] M.J. Kearns and Y. Mansour, “A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization,” Proc. 15th Int’l Conf. Machine Learning, pp. 269-277, 1998.
[13] C. Fiorio and J. Gustedt, “Two Linear Time Union-Find Strategies for Image Processing,” Theoretical Computer Science, vol. 154, pp. 165-181, 1996.

Richard Nock received the agronomical engineering degree from the Ecole Nationale Superieure Agronomique de Montpellier, France (1993), the PhD degree in computer science (1998), and an accreditation to lead research (HDR, 2002) from the University of Montpellier II, France. Since 1998, he has been a faculty member at the Universite Antilles-Guyane in Guadeloupe and in Martinique, where his primary research interests include machine learning, data mining, computational complexity, and image processing.

Frank Nielsen received the BSc and MSc degrees from Ecole Normale Superieure (ENS) of Lyon (France) in 1992 and 1994, respectively. He defended his PhD thesis on adaptive computational geometry prepared at INRIA Sophia-Antipolis (France) under the supervision of Professor J.-D. Boissonnat in 1996. As a civil servant of the University of Nice (France), he gave lectures at the engineering schools ESSI and ISIA (Ecole des Mines). In 1997, he served in the army as a scientific member in the computer science laboratory of Ecole Polytechnique. In 1998, he joined Sony Computer Science Laboratories Inc., Tokyo (Japan), as a researcher. His current research interests include geometry, vision, graphics, learning, and optimization.
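As an illustration of the scale control described in Section 4.4, the following is a minimal sketch of region merging on a one-dimensional toy signal, showing how the number of regions may vary with Q. It is not the authors' implementation. The function name `srm_1d`, the toy signal, the default g = 256, the choice $\delta = 1/(6n^2)$, and the deviation bound $b^2(R) = g^2 \ln(2/\delta) / (2Q|R|)$ are all assumptions made for the example, and sorting neighbor pairs by absolute difference is a simplified stand-in for the ordering in merges.

```python
import math


def srm_1d(signal, Q, g=256):
    """Toy 1D region merging with an SRM-style predicate (illustrative only)."""
    n = len(signal)
    # Assumed confidence term: delta = 1 / (6 n^2), so ln(2 / delta) = ln(12 n^2).
    log_term = math.log(2.0 * 6.0 * n * n)
    parent = list(range(n))              # union-find forest over pixels
    size = [1] * n                       # region sizes, valid at roots
    total = [float(v) for v in signal]   # region intensity sums, valid at roots

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def b2(r):
        # Assumed squared deviation bound for region r; it grows as Q shrinks.
        return g * g * log_term / (2.0 * Q * size[r])

    # Visit neighboring pixel pairs by increasing absolute difference.
    edges = sorted(range(n - 1), key=lambda i: abs(signal[i + 1] - signal[i]))
    for i in edges:
        a, b = find(i), find(i + 1)
        if a == b:
            continue
        diff = total[a] / size[a] - total[b] / size[b]
        # Merge when the squared mean difference is within the combined bound.
        if diff * diff <= b2(a) + b2(b):
            parent[b] = a
            size[a] += size[b]
            total[a] += total[b]
    return len({find(i) for i in range(n)})


# Three plateaus; Q ranges through 1, 2, 4, ..., 256 as in Section 4.4.
signal = [0] * 8 + [40] * 8 + [200] * 8
counts = {Q: srm_1d(signal, Q) for Q in (1, 2, 4, 8, 16, 32, 64, 128, 256)}
print(counts)
```

On this toy signal, small values of Q make the bound loose enough that all plateaus merge, while larger values keep them apart, which matches the qualitative behavior reported for Fig. 9. The exact thresholds depend on the assumed bound and should not be read as properties of SRM itself.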
