5 Segmentation Chapter

This document discusses image thresholding techniques for image segmentation. It begins by introducing thresholding as a simple yet effective method for segmenting images based on intensity values. Global thresholding uses a single intensity threshold value across the entire image, while local or adaptive thresholding uses variable threshold values that change across the image. The success of thresholding depends on factors like the separation between intensity peaks in the image histogram and the amount of noise. Thresholding becomes more difficult as noise increases and merges intensity peaks. The document then examines how noise and non-uniform illumination can affect the image histogram and complicate thresholding.

738 Chapter 10 ■ Image Segmentation

original image. The lines were obtained by joining all gaps not exceeding
20% of the image height (approximately 100 pixels). These lines clearly corre-
spond to the edges of the runway of interest.
Note that the only key knowledge needed to solve this problem was the ori-
entation of the runway and the observer’s position relative to it. In other
words, a vehicle navigating autonomously would know that if the runway of in-
terest faces north, and the vehicle’s direction of travel also is north, the runway
should appear vertically in the image. Other relative orientations are handled
in a similar manner. The orientations of runways throughout the world are
available in flight charts, and direction of travel is easily obtainable using GPS
(Global Positioning System) information. This information also could be used
to compute the distance between the vehicle and the runway, thus allowing es-
timates of parameters such as expected length of lines relative to image size, as
we did in this example. ■

10.3 Thresholding
Because of its intuitive properties, simplicity of implementation, and computa-
tional speed, image thresholding enjoys a central position in applications of
image segmentation. Thresholding was introduced in Section 3.1.1, and we
have used it in various discussions since then. In this section, we discuss thresh-
olding in a more formal way and develop techniques that are considerably
more general than what has been presented thus far.

10.3.1 Foundation
In the previous section, regions were identified by first finding edge segments
and then attempting to link the segments into boundaries. In this section, we
discuss techniques for partitioning images directly into regions based on inten-
sity values and/or properties of these values.
The basics of intensity thresholding
Suppose that the intensity histogram in Fig. 10.35(a) corresponds to an image,
f(x, y), composed of light objects on a dark background, in such a way that ob-
ject and background pixels have intensity values grouped into two dominant
modes. One obvious way to extract the objects from the background is to se-
lect a threshold, T, that separates these modes. Then, any point (x, y) in the
image at which f(x, y) > T is called an object point; otherwise, the point is
called a background point. In other words, the segmented image, g(x, y), is
given by

    g(x, y) = 1 if f(x, y) > T
              0 if f(x, y) ≤ T          (10.3-1)

(Although we follow convention in using 0 intensity for the background and 1 for object pixels, any two distinct values can be used in Eq. (10.3-1).)

When T is a constant applicable over an entire image, the process given in this
equation is referred to as global thresholding. When the value of T changes
over an image, we use the term variable thresholding. The term local or
regional thresholding is used sometimes to denote variable thresholding in

a b
FIGURE 10.35 Intensity histograms that can be partitioned (a) by a single threshold, T, and (b) by dual thresholds, T1 and T2.

which the value of T at any point (x, y) in an image depends on properties of


a neighborhood of (x, y) (for example, the average intensity of the pixels in
the neighborhood). If T depends on the spatial coordinates (x, y) themselves,
then variable thresholding is often referred to as dynamic or adaptive thresh-
olding. Use of these terms is not universal, and one is likely to see them used
interchangeably in the literature on image processing.
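In code, the decision in Eq. (10.3-1) is a single vectorized comparison per pixel. The sketch below is illustrative (the tiny image and the threshold value are made up for the example), not code from the book:

```python
import numpy as np

def global_threshold(f, T):
    """Eq. (10.3-1): label a pixel 1 (object) if f > T, else 0 (background)."""
    return (f > T).astype(np.uint8)

# Illustrative 2 x 3 "image": dark background (~25) with light object pixels (~210)
f = np.array([[ 20,  30, 200],
              [210,  25, 220]], dtype=np.uint8)
g = global_threshold(f, T=128)
print(g.tolist())  # -> [[0, 0, 1], [1, 0, 1]]
```

As the text notes, any two distinct values could be substituted for the 0 and 1 labels.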
Figure 10.35(b) shows a more difficult thresholding problem involving a
histogram with three dominant modes corresponding, for example, to two
types of light objects on a dark background. Here, multiple thresholding classifies a point (x, y) as belonging to the background if f(x, y) ≤ T1, to one object class if T1 < f(x, y) ≤ T2, and to the other object class if f(x, y) > T2.
That is, the segmented image is given by

    g(x, y) = a if f(x, y) > T2
              b if T1 < f(x, y) ≤ T2          (10.3-2)
              c if f(x, y) ≤ T1

where a, b, and c are any three distinct intensity values. We discuss dual thresh-
olding in Section 10.3.6. Segmentation problems requiring more than two
thresholds are difficult (often impossible) to solve, and better results usually
are obtained using other methods, such as variable thresholding, as discussed
in Section 10.3.7, or region growing, as discussed in Section 10.4.
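A dual-threshold sketch of Eq. (10.3-2) using np.select; the output values a, b, and c are arbitrary distinct labels (here 2, 1, and 0):

```python
import numpy as np

def dual_threshold(f, T1, T2, a=2, b=1, c=0):
    """Eq. (10.3-2): a where f > T2, b where T1 < f <= T2, c where f <= T1."""
    # np.select tests conditions in order, so "f > T1" only catches values <= T2
    return np.select([f > T2, f > T1], [a, b], default=c)

f = np.array([10, 100, 200])
print(dual_threshold(f, T1=64, T2=160).tolist())  # -> [0, 1, 2]
```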
Based on the preceding discussion, we may infer intuitively that the success
of intensity thresholding is directly related to the width and depth of the val-
ley(s) separating the histogram modes. In turn, the key factors affecting the
properties of the valley(s) are: (1) the separation between peaks (the further
apart the peaks are, the better the chances of separating the modes); (2) the
noise content in the image (the modes broaden as noise increases); (3) the rel-
ative sizes of objects and background; (4) the uniformity of the illumination
source; and (5) the uniformity of the reflectance properties of the image.

The role of noise in image thresholding


As an illustration of how noise affects the histogram of an image, consider
Fig. 10.36(a). This simple synthetic image is free of noise, so its histogram consists
of two “spike” modes, as Fig. 10.36(d) shows. Segmenting this image into two
regions is a trivial task involving a threshold placed anywhere between the two


a b c
d e f
FIGURE 10.36 (a) Noiseless 8-bit image. (b) Image with additive Gaussian noise of mean 0 and standard
deviation of 10 intensity levels. (c) Image with additive Gaussian noise of mean 0 and standard deviation of
50 intensity levels. (d)–(f) Corresponding histograms.

modes. Figure 10.36(b) shows the original image corrupted by Gaussian noise
of zero mean and a standard deviation of 10 intensity levels. Although the cor-
responding histogram modes are now broader [Fig. 10.36(e)], their separation
is large enough so that the depth of the valley between them is sufficient to
make the modes easy to separate. A threshold placed midway between the two
peaks would do a nice job of segmenting the image. Figure 10.36(c) shows the
result of corrupting the image with Gaussian noise of zero mean and a stan-
dard deviation of 50 intensity levels. As the histogram in Fig. 10.36(f) shows,
the situation is much more serious now, as there is no way to differentiate be-
tween the two modes. Without additional processing (such as the methods dis-
cussed in Sections 10.3.4 and 10.3.5) we have little hope of finding a suitable
threshold for segmenting this image.

The role of illumination and reflectance


Figure 10.37 illustrates the effect that illumination can have on the histogram of
an image. Figure 10.37(a) is the noisy image from Fig. 10.36(b), and Fig. 10.37(d)
shows its histogram. As before, this image is easily segmentable with a single
threshold. We can illustrate the effects of nonuniform illumination by multiply-
ing the image in Fig. 10.37(a) by a variable intensity function, such as the in-
tensity ramp in Fig. 10.37(b), whose histogram is shown in Fig. 10.37(e).
Figure 10.37(c) shows the product of the image and this shading pattern. As
Fig. 10.37(f) shows, the deep valley between peaks was corrupted to the point


a b c
d e f
FIGURE 10.37 (a) Noisy image. (b) Intensity ramp in the range [0.2, 0.6]. (c) Product of (a) and (b).
(d)–(f) Corresponding histograms.

where separation of the modes without additional processing (see Sections 10.3.4 and 10.3.5) is no longer possible. Similar results would be obtained if the illumination were perfectly uniform but the reflectance of the image were not, due, for example, to natural reflectivity variations in the surface of objects and/or background.

(In theory, the histogram of a ramp image is uniform. In practice, achieving perfect uniformity depends on the size of the image and the number of intensity bits. For example, a 256 × 256, 256-level ramp image has a uniform histogram, but a 256 × 257 ramp image with the same number of intensities does not.)

The key point in the preceding paragraph is that illumination and reflectance play a central role in the success of image segmentation using thresholding or other segmentation techniques. Therefore, controlling these factors when it is possible to do so should be the first step considered in the solution of a segmentation problem. There are three basic approaches to the problem when
control over these factors is not possible. One is to correct the shading pattern
directly. For example, nonuniform (but fixed) illumination can be corrected by
multiplying the image by the inverse of the pattern, which can be obtained by
imaging a flat surface of constant intensity. The second approach is to attempt
to correct the global shading pattern via processing using, for example, the
top-hat transformation introduced in Section 9.6.3. The third approach is to
“work around” nonuniformities using variable thresholding, as discussed in
Section 10.3.7.
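The first approach (measuring the shading pattern by imaging a flat surface, then dividing it out) can be sketched as follows; the constant scene and ramp pattern are synthetic stand-ins for a real scene and a flat-field calibration frame:

```python
import numpy as np

# Simulated flat-field correction: a nonuniform (but fixed) illumination
# pattern multiplies the true scene; dividing by the imaged flat field
# (equivalent to multiplying by the inverse of the pattern) recovers the scene.
true_scene = np.full((4, 4), 100.0)                # constant-reflectance scene
shading = np.linspace(0.2, 0.6, 16).reshape(4, 4)  # intensity ramp, as in Fig. 10.37(b)
observed = true_scene * shading                    # what the camera records
flat_field = shading                               # image of a uniform flat surface
corrected = observed / flat_field                  # shading divided out
print(np.allclose(corrected, true_scene))  # -> True
```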

10.3.2 Basic Global Thresholding


As noted in the previous section, when the intensity distributions of objects
and background pixels are sufficiently distinct, it is possible to use a single
(global) threshold applicable over the entire image. In most applications, there

is usually enough variability between images that, even if global thresholding


is a suitable approach, an algorithm capable of estimating automatically the
threshold value for each image is required. The following iterative algorithm
can be used for this purpose:
1. Select an initial estimate for the global threshold, T.
2. Segment the image using T in Eq. (10.3-1). This will produce two groups of pixels: G1, consisting of all pixels with intensity values > T, and G2, consisting of pixels with values ≤ T.
3. Compute the average (mean) intensity values m1 and m2 for the pixels in G1 and G2, respectively.
4. Compute a new threshold value:

       T = (1/2)(m1 + m2)

5. Repeat Steps 2 through 4 until the difference between values of T in successive iterations is smaller than a predefined parameter ΔT.

This simple algorithm works well in situations where there is a reasonably clear valley between the modes of the histogram related to objects and background. Parameter ΔT is used to control the number of iterations in situations where speed is an important issue. In general, the larger ΔT is, the fewer iterations the algorithm will perform. The initial threshold must be chosen greater than the minimum and less than the maximum intensity level in the image (Problem 10.28). The average intensity of the image is a good initial choice for T.
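The five steps above translate directly into NumPy. This is a sketch (the two-mode synthetic data is illustrative, and a real implementation would guard against a constant image, for which one class would be empty):

```python
import numpy as np

def basic_global_threshold(f, delta_T=0.5):
    """Iterative global thresholding, Steps 1-5 (assumes a non-constant image)."""
    T = f.mean()                        # Step 1: initial estimate (the mean intensity)
    while True:
        G1, G2 = f[f > T], f[f <= T]    # Step 2: split pixels into two groups
        m1, m2 = G1.mean(), G2.mean()   # Step 3: mean intensity of each group
        T_new = 0.5 * (m1 + m2)         # Step 4: new threshold, midway between means
        if abs(T_new - T) < delta_T:    # Step 5: stop when the change is below delta_T
            return T_new
        T = T_new

# Synthetic data with modes near 60 and 190; the threshold settles near 125
rng = np.random.default_rng(0)
f = np.concatenate([rng.normal(60, 5, 5000), rng.normal(190, 5, 5000)])
print(120 < basic_global_threshold(f) < 130)  # -> True
```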

EXAMPLE 10.15: Global thresholding.

■ Figure 10.38 shows an example of segmentation based on a threshold estimated using the preceding algorithm. Figure 10.38(a) is the original image, and Fig. 10.38(b) is the image histogram, showing a distinct valley. Application of the preceding iterative algorithm resulted in the threshold T = 125.4 after three iterations, starting with T = m (the average image intensity), and using ΔT = 0. Figure 10.38(c) shows the result obtained using T = 125 to segment the original image. As expected from the clear separation of modes in the histogram, the segmentation between object and background was quite effective. ■

The preceding algorithm was stated in terms of successively thresholding


the input image and calculating the means at each step because it is more intuitive to introduce it in this manner. However, it is possible to develop a more efficient procedure by expressing all computations in terms of the image histogram, which has to be computed only once (Problem 10.26).
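A histogram-based version might look like this (a sketch assuming an 8-bit image; cum_n and cum_in are hypothetical helper names for the cumulative pixel count and cumulative intensity sum, from which both class means follow by table lookup):

```python
import numpy as np

def basic_global_threshold_hist(f, delta_T=0.5, L=256):
    """Iterative global thresholding computed from the histogram alone."""
    counts = np.bincount(f.ravel(), minlength=L).astype(float)
    i = np.arange(L)
    cum_n = np.cumsum(counts)       # number of pixels with intensity <= k
    cum_in = np.cumsum(i * counts)  # sum of intensities up to level k
    T = cum_in[-1] / cum_n[-1]      # initial estimate: the global mean
    while True:                     # assumes a non-constant image, so cum_n[k] > 0
        k = int(T)
        m2 = cum_in[k] / cum_n[k]                               # mean of values <= T
        m1 = (cum_in[-1] - cum_in[k]) / (cum_n[-1] - cum_n[k])  # mean of values >  T
        T_new = 0.5 * (m1 + m2)
        if abs(T_new - T) < delta_T:
            return T_new
        T = T_new

f = np.concatenate([np.full(100, 60, np.uint8), np.full(100, 190, np.uint8)])
print(round(basic_global_threshold_hist(f)))  # -> 125
```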

10.3.3 Optimum Global Thresholding Using Otsu’s Method


Thresholding may be viewed as a statistical-decision theory problem whose
objective is to minimize the average error incurred in assigning pixels to two
or more groups (also called classes). This problem is known to have an elegant
closed-form solution known as the Bayes decision rule (see Section 12.2.2).
The solution is based on only two parameters: the probability density function
(PDF) of the intensity levels of each class and the probability that each class
occurs in a given application. Unfortunately, estimating PDFs is not a trivial


a b c
FIGURE 10.38 (a) Noisy fingerprint. (b) Histogram. (c) Segmented result using a global threshold (the border
was added for clarity). (Original courtesy of the National Institute of Standards and Technology.)

matter, so the problem usually is simplified by making workable assumptions


about the form of the PDFs, such as assuming that they are Gaussian functions.
Even with simplifications, the process of implementing solutions using these as-
sumptions can be complex and not always well-suited for practical applications.
The approach discussed in this section, called Otsu’s method (Otsu [1979]), is
an attractive alternative. The method is optimum in the sense that it maximizes
the between-class variance, a well-known measure used in statistical discrimi-
nant analysis. The basic idea is that well-thresholded classes should be distinct
with respect to the intensity values of their pixels and, conversely, that a thresh-
old giving the best separation between classes in terms of their intensity values
would be the best (optimum) threshold. In addition to its optimality, Otsu’s
method has the important property that it is based entirely on computations
performed on the histogram of an image, an easily obtainable 1-D array.
Let {0, 1, 2, …, L - 1} denote the L distinct intensity levels in a digital image of size M × N pixels, and let ni denote the number of pixels with intensity i. The total number, MN, of pixels in the image is MN = n0 + n1 + n2 + … + nL-1. The normalized histogram (see Section 3.3) has components pi = ni/MN, from which it follows that

    Σ_{i=0}^{L-1} pi = 1,    pi ≥ 0          (10.3-3)

Now, suppose that we select a threshold T(k) = k, 0 < k < L - 1, and use it


to threshold the input image into two classes, C1 and C2, where C1 consists of
all the pixels in the image with intensity values in the range [0, k] and C2 con-
sists of the pixels with values in the range [k + 1, L - 1]. Using this threshold,
the probability, P1(k), that a pixel is assigned to (i.e., thresholded into) class C1
is given by the cumulative sum

    P1(k) = Σ_{i=0}^{k} pi          (10.3-4)

Viewed another way, this is the probability of class C1 occurring. For example,
if we set k = 0, the probability of class C1 having any pixels assigned to it is
zero. Similarly, the probability of class C2 occurring is
    P2(k) = Σ_{i=k+1}^{L-1} pi = 1 - P1(k)          (10.3-5)

From Eq. (3.3-18), the mean intensity value of the pixels assigned to class C1 is

    m1(k) = Σ_{i=0}^{k} i P(i|C1)
          = Σ_{i=0}^{k} i P(C1|i)P(i)/P(C1)          (10.3-6)
          = [1/P1(k)] Σ_{i=0}^{k} i pi

where P1(k) is given in Eq. (10.3-4). The term P(i|C1) in the first line of Eq. (10.3-6) is the probability of value i, given that i comes from class C1. The second line in the equation follows from Bayes' formula:

    P(A|B) = P(B|A)P(A)/P(B)

The third line follows from the fact that P(C1|i), the probability of C1 given i, is 1 because we are dealing only with values of i from class C1. Also, P(i) is the probability of the ith value, which is simply the ith component of the histogram, pi. Finally, P(C1) is the probability of class C1, which we know from Eq. (10.3-4) is equal to P1(k).
Similarly, the mean intensity value of the pixels assigned to class C2 is

    m2(k) = Σ_{i=k+1}^{L-1} i P(i|C2)
          = [1/P2(k)] Σ_{i=k+1}^{L-1} i pi          (10.3-7)

The cumulative mean (average intensity) up to level k is given by

    m(k) = Σ_{i=0}^{k} i pi          (10.3-8)

and the average intensity of the entire image (i.e., the global mean) is given by

    mG = Σ_{i=0}^{L-1} i pi          (10.3-9)

The validity of the following two equations can be verified by direct substitution of the preceding results:

    P1 m1 + P2 m2 = mG          (10.3-10)

and

    P1 + P2 = 1          (10.3-11)

where we have omitted the ks temporarily in favor of notational clarity.


In order to evaluate the "goodness" of the threshold at level k we use the normalized, dimensionless metric

    η = σB² / σG²          (10.3-12)

where σG² is the global variance [i.e., the intensity variance of all the pixels in the image, as given in Eq. (3.3-19)],

    σG² = Σ_{i=0}^{L-1} (i - mG)² pi          (10.3-13)

and σB² is the between-class variance, defined as

    σB² = P1(m1 - mG)² + P2(m2 - mG)²          (10.3-14)

This expression can be written also as

    σB² = P1 P2 (m1 - m2)²
        = (mG P1 - m)² / [P1(1 - P1)]          (10.3-15)

where mG and m are as stated earlier. The first line of this equation follows from Eqs. (10.3-14), (10.3-10), and (10.3-11). The second line follows from Eqs. (10.3-5) through (10.3-9). This form is slightly more efficient computationally because the global mean, mG, is computed only once, so only two parameters, m and P1, need to be computed for any value of k.

(The second step in Eq. (10.3-15) makes sense only if P1 is greater than 0 and less than 1, which, in view of Eq. (10.3-11), implies that P2 must satisfy the same condition.)
We see from the first line in Eq. (10.3-15) that the farther the two means m1 and m2 are from each other the larger σB² will be, indicating that the between-class variance is a measure of separability between classes. Because σG² is a constant, it follows that η also is a measure of separability, and maximizing this metric is equivalent to maximizing σB². The objective, then, is to determine the threshold value, k, that maximizes the between-class variance, as stated at the beginning of this section. Note that Eq. (10.3-12) assumes implicitly that σG² > 0. This variance can be zero only when all the intensity levels in the image are the same, which implies the existence of only one class of pixels. This in turn means that η = 0 for a constant image since the separability of a single class from itself is zero.

Reintroducing k, we have the final results:

    η(k) = σB²(k) / σG²          (10.3-16)

and

    σB²(k) = [mG P1(k) - m(k)]² / { P1(k)[1 - P1(k)] }          (10.3-17)

Then, the optimum threshold is the value, k*, that maximizes σB²(k):

    σB²(k*) = max_{0 ≤ k ≤ L-1} σB²(k)          (10.3-18)

In other words, to find k* we simply evaluate Eq. (10.3-18) for all integer values of k (such that the condition 0 < P1(k) < 1 holds) and select the value of k that yielded the maximum σB²(k). If the maximum exists for more than one value of k, it is customary to average the various values of k for which σB²(k) is maximum. It can be shown (Problem 10.33) that a maximum always exists, subject to the condition that 0 < P1(k) < 1. Evaluating Eqs. (10.3-17) and (10.3-18) for all values of k is a relatively inexpensive computational procedure, because the maximum number of integer values that k can have is L.
Once k* has been obtained, the input image f(x, y) is segmented as before:

    g(x, y) = 1 if f(x, y) > k*
              0 if f(x, y) ≤ k*          (10.3-19)

for x = 0, 1, 2, …, M - 1 and y = 0, 1, 2, …, N - 1. Note that all the quantities needed to evaluate Eq. (10.3-17) are obtained using only the histogram
of f(x, y). In addition to the optimum threshold, other information regarding
the segmented image can be extracted from the histogram. For example,
P1(k*) and P2(k*), the class probabilities evaluated at the optimum threshold,
indicate the portions of the areas occupied by the classes (groups of pixels) in
the thresholded image. Similarly, the means m1(k*) and m2(k*) are estimates
of the average intensity of the classes in the original image.
The normalized metric η, evaluated at the optimum threshold value, η(k*), can be used to obtain a quantitative estimate of the separability of classes, which in turn gives an idea of the ease of thresholding a given image. This measure has values in the range

    0 ≤ η(k*) ≤ 1          (10.3-20)

(Although our interest is in the value of η at the optimum threshold, k*, this inequality holds in general for any value of k in the range [0, L - 1].)

The lower bound is attainable only by images with a single, constant intensity level, as mentioned earlier. The upper bound is attainable only by 2-valued images with intensities equal to 0 and L - 1 (Problem 10.34).

Otsu’s algorithm may be summarized as follows:

1. Compute the normalized histogram of the input image. Denote the components of the histogram by pi, i = 0, 1, 2, …, L - 1.
2. Compute the cumulative sums, P1(k), for k = 0, 1, 2, …, L - 1, using Eq. (10.3-4).
3. Compute the cumulative means, m(k), for k = 0, 1, 2, …, L - 1, using Eq. (10.3-8).
4. Compute the global intensity mean, mG, using Eq. (10.3-9).
5. Compute the between-class variance, σB²(k), for k = 0, 1, 2, …, L - 1, using Eq. (10.3-17).
6. Obtain the Otsu threshold, k*, as the value of k for which σB²(k) is maximum. If the maximum is not unique, obtain k* by averaging the values of k corresponding to the various maxima detected.
7. Obtain the separability measure, η*, by evaluating Eq. (10.3-16) at k = k*.
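Steps 1–7 map almost line for line onto vectorized NumPy. The following is a sketch, assuming an 8-bit image; levels where P1(k) is 0 or 1 (for which Eq. (10.3-17) is undefined) are zeroed out, and tied maxima are averaged as in Step 6:

```python
import numpy as np

def otsu_threshold(f, L=256):
    """Otsu's method: return (k*, eta*) computed from the histogram alone."""
    p = np.bincount(f.ravel(), minlength=L) / f.size   # Step 1: normalized histogram
    i = np.arange(L)
    P1 = np.cumsum(p)                                  # Step 2: cumulative sums,  Eq. (10.3-4)
    m = np.cumsum(i * p)                               # Step 3: cumulative means, Eq. (10.3-8)
    mG = m[-1]                                         # Step 4: global mean,      Eq. (10.3-9)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma2_B = (mG * P1 - m) ** 2 / (P1 * (1.0 - P1))  # Step 5: Eq. (10.3-17)
    sigma2_B = np.nan_to_num(sigma2_B)                 # undefined where P1 is 0 or 1
    # Step 6: average tied maxima (any k between well-separated modes qualifies)
    k_star = int(np.mean(np.flatnonzero(sigma2_B == sigma2_B.max())))
    sigma2_G = np.sum((i - mG) ** 2 * p)               # global variance, Eq. (10.3-13)
    eta = sigma2_B[k_star] / sigma2_G                  # Step 7: Eq. (10.3-16)
    return k_star, eta

# Two-valued image with intensities 0 and 255: separability should reach 1,
# the upper bound in Eq. (10.3-20)
f = np.concatenate([np.zeros(50, np.uint8), np.full(50, 255, np.uint8)])
k_star, eta = otsu_threshold(f)
print(k_star, round(eta, 3))  # -> 127 1.0
```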

The following example illustrates the preceding concepts.

EXAMPLE 10.16: Optimum global thresholding using Otsu's method.

■ Figure 10.39(a) shows an optical microscope image of polymersome cells, and Fig. 10.39(b) shows its histogram. The objective of this example is to segment the molecules from the background. Figure 10.39(c) is the result of using the basic global thresholding algorithm developed in the previous section. Because the histogram has no distinct valleys and the intensity difference between the background and objects is small, the algorithm failed to achieve the desired segmentation. Figure 10.39(d) shows the result obtained using Otsu's method. This result obviously is superior to Fig. 10.39(c). The threshold value computed by the basic algorithm was 169, while the threshold computed by Otsu's method was 181, which is closer to the lighter areas in the image defining the cells. The separability measure η was 0.467.

(Polymersomes are cells artificially engineered using polymers. Polymersomes are invisible to the human immune system and can be used, for example, to deliver medication to targeted regions of the body.)

As a point of interest, applying Otsu’s method to the fingerprint image in


Example 10.15 yielded a threshold of 125 and a separability measure of 0.944.
The threshold is identical to the value (rounded to the nearest integer) ob-
tained with the basic algorithm. This is not unexpected, given the nature of the
histogram. In fact, the separability measure is high due primarily to the rela-
tively large separation between modes and the deep valley between them. ■

10.3.4 Using Image Smoothing to Improve Global Thresholding


As noted in Fig. 10.36, noise can turn a simple thresholding problem into an
unsolvable one. When noise cannot be reduced at the source, and thresholding
is the segmentation method of choice, a technique that often enhances perfor-
mance is to smooth the image prior to thresholding. We illustrate the approach
with an example.
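The smoothing step is just an averaging (box) filter applied before thresholding; the sketch below builds a 5 × 5 mean filter from two 1-D convolutions using NumPy alone (in practice one would more likely reach for scipy.ndimage.uniform_filter):

```python
import numpy as np

def box_filter(f, size=5):
    """Separable size x size averaging filter with edge padding."""
    pad = size // 2
    g = np.pad(f.astype(float), pad, mode="edge")
    kernel = np.ones(size) / size
    # Convolve each row, then each column, of the padded image
    g = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, g)
    g = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, g)
    return g

# Noisy two-mode image: smoothing narrows the histogram modes
rng = np.random.default_rng(1)
f = np.full((40, 40), 64.0); f[:, 20:] = 192.0
noisy = f + rng.normal(0, 50, f.shape)
smooth = box_filter(noisy)
print(noisy.std() > smooth.std())  # -> True: smoothing reduced the spread
```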
Figure 10.40(a) is the image from Fig. 10.36(c), Fig. 10.40(b) shows its his-
togram, and Fig. 10.40(c) is the image thresholded using Otsu’s method. Every
black point in the white region and every white point in the black region is a

a b
c d
FIGURE 10.39 (a) Original image. (b) Histogram (high peaks were clipped to highlight details in the lower values). (c) Segmentation result using the basic global algorithm from Section 10.3.2. (d) Result obtained using Otsu's method. (Original image courtesy of Professor Daniel A. Hammer, the University of Pennsylvania.)

thresholding error, so the segmentation was highly unsuccessful. Figure 10.40(d) shows the result of smoothing the noisy image with an averaging mask of size 5 × 5 (the image is of size 651 × 814 pixels), and Fig. 10.40(e) is its histogram. The improvement in the shape of the histogram due to smoothing is evident, and we would expect thresholding of the smoothed image to be nearly perfect. As Fig. 10.40(f) shows, this indeed was the case. The slight distortion of the boundary between object and background in the segmented, smoothed image was caused by the blurring of the boundary. In fact, the more aggressively we smooth an image, the more boundary errors we should anticipate in the segmented result.
Next we consider the effect of reducing the size of the region in Fig. 10.40(a) with respect to the background. Figure 10.41(a) shows the result. The noise in this image is additive Gaussian noise with zero mean and a standard deviation of 10 intensity levels (as opposed to 50 in the previous example). As Fig. 10.41(b) shows, the histogram has no clear valley, so we would expect segmentation to fail, a fact that is confirmed by the result in Fig. 10.41(c). Figure 10.41(d) shows the image smoothed with an averaging mask of size 5 × 5, and Fig. 10.41(e) is the corresponding histogram. As expected, the net effect was to reduce the spread of the histogram, but the distribution still is unimodal. As Fig. 10.41(f) shows, segmentation failed again. The reason for the failure can be traced to the fact that the region is so small that its contribution to the histogram is insignificant compared to the intensity spread caused by noise. In


a b c
d e f
FIGURE 10.40 (a) Noisy image from Fig. 10.36 and (b) its histogram. (c) Result obtained using Otsu's method. (d) Noisy image smoothed using a 5 × 5 averaging mask and (e) its histogram. (f) Result of thresholding using Otsu's method.

situations such as this, the approach discussed in the following section is more
likely to succeed.

10.3.5 Using Edges to Improve Global Thresholding


Based on the discussion in the previous four sections, we conclude that the
chances of selecting a “good” threshold are enhanced considerably if the his-
togram peaks are tall, narrow, symmetric, and separated by deep valleys. One ap-
proach for improving the shape of histograms is to consider only those pixels that
lie on or near the edges between objects and the background. An immediate and
obvious improvement is that histograms would be less dependent on the relative
sizes of objects and the background. For instance, the histogram of an image composed of a small object on a large background area (or vice versa) would be dominated by a large peak because of the high concentration of one type of pixels. We saw in the previous section that this can lead to failure in thresholding.
If only the pixels on or near the edges between objects and background
were used, the resulting histogram would have peaks of approximately the
same height. In addition, the probability that any of those pixels lies on an object
would be approximately equal to the probability that it lies on the back-
ground, thus improving the symmetry of the histogram modes. Finally, as indi-
cated in the following paragraph, using pixels that satisfy some simple
measures based on gradient and Laplacian operators has a tendency to deepen
the valley between histogram peaks.


a b c
d e f
FIGURE 10.41 (a) Noisy image and (b) its histogram. (c) Result obtained using Otsu's method. (d) Noisy image smoothed using a 5 × 5 averaging mask and (e) its histogram. (f) Result of thresholding using Otsu's method. Thresholding failed in both cases.

The approach just discussed assumes that the edges between objects and background are known. This information clearly is not available during segmentation, as finding a division between objects and background is precisely what segmentation is all about. However, with reference to the discussion in Section 10.2, an indication of whether a pixel is on an edge may be obtained by computing its gradient or Laplacian. For example, the average value of the Laplacian is 0 at the transition of an edge (see Fig. 10.10), so the valleys of histograms formed from the pixels selected by a Laplacian criterion can be expected to be sparsely populated. This property tends to produce the desirable deep valleys discussed above. In practice, comparable results typically are obtained using either the gradient or Laplacian images, with the latter being favored because it is computationally more attractive and is also an isotropic edge detector.

The preceding discussion is summarized in the following algorithm, where f(x, y) is the input image:

1. Compute an edge image as either the magnitude of the gradient, or absolute value of the Laplacian, of f(x, y) using any of the methods discussed in Section 10.2.
2. Specify a threshold value, T.
3. Threshold the image from Step 1 using the threshold from Step 2 to produce a binary image, gT(x, y). This image is used as a mask image in the following step to select pixels from f(x, y) corresponding to "strong" edge pixels.

(It is possible to modify this algorithm so that both the magnitude of the gradient and the absolute value of the Laplacian images are used. In this case, we would specify a threshold for each image and form the logical OR of the two results to obtain the marker image. This approach is useful when more control is desired over the points deemed to be valid edge points.)

4. Compute a histogram using only the pixels in f(x, y) that correspond to


the locations of the 1-valued pixels in gT(x, y).
5. Use the histogram from Step 4 to segment f(x, y) globally using, for ex-
ample, Otsu’s method.
If T is set to the maximum value of the edge image then, according to Eq. (10.3-1), g_T(x, y) will consist of all 0s, implying that all pixels of f(x, y) will be used to compute the image histogram. In this case, the preceding algorithm becomes global thresholding, in which the histogram of the original image is used without modification. It is customary to specify the value of T corresponding to a percentile, which typically is set high (e.g., in the high 90s) so that few pixels in the gradient/Laplacian image will be used in the computation.

(Note: The nth percentile is the smallest number that is greater than n% of the numbers in a given set. For example, if you received a 95 on a test and this score was greater than 85% of all the students taking the test, then you would be in the 85th percentile with respect to the test scores.)

The following examples illustrate the concepts just discussed. The first example uses the gradient and the second uses the Laplacian. Similar results can be obtained in both examples using either approach. The important issue is to generate a suitable derivative image.
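The five-step procedure above can be sketched as follows. This is an illustrative NumPy sketch, not the book's code: the gradient magnitude is approximated with central differences via np.gradient, Otsu's threshold is computed from the masked histogram, and the helper names (otsu_threshold, edge_guided_threshold) are our own.

```python
import numpy as np

def otsu_threshold(hist):
    """Return the Otsu threshold for a 256-bin histogram."""
    p = hist / hist.sum()                      # normalized histogram
    omega = np.cumsum(p)                       # class probability P1(k)
    mu = np.cumsum(np.arange(256) * p)         # cumulative mean m(k)
    mu_g = mu[-1]                              # global mean
    # Between-class variance for every candidate threshold k.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_g * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b2)))

def edge_guided_threshold(f, percentile=99.7):
    """Steps 1-5: threshold f using only pixels near strong edges."""
    # Step 1: edge image (gradient magnitude via central differences).
    gy, gx = np.gradient(f.astype(float))
    edge = np.hypot(gx, gy)
    # Step 2: T at a high percentile of the edge image.
    T = np.percentile(edge, percentile)
    # Step 3: binary mask g_T(x, y) of "strong" edge pixels.
    mask = edge >= T
    # Step 4: histogram of f over the masked locations only.
    hist = np.bincount(f[mask].ravel(), minlength=256)
    # Step 5: Otsu's method on that histogram.
    return otsu_threshold(hist)
```

The returned value is then applied globally to f, e.g. `f > t`, exactly as in the examples that follow.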

EXAMPLE 10.17: Using edge information based on the gradient to improve global thresholding.

■ Figures 10.42(a) and (b) show the image and histogram from Fig. 10.41. You saw that this image could not be segmented by smoothing followed by thresholding. The objective of this example is to solve the problem using edge information. Figure 10.42(c) is the gradient magnitude image thresholded at the 99.7 percentile. Figure 10.42(d) is the image formed by multiplying this (mask) image by the input image. Figure 10.42(e) is the histogram of the nonzero elements in Fig. 10.42(d). Note that this histogram has the important features discussed earlier; that is, it has reasonably symmetrical modes separated by a deep valley. Thus, while the histogram of the original noisy image offered no hope for successful thresholding, the histogram in Fig. 10.42(e) indicates that thresholding of the small object from the background is indeed possible. The result in Fig. 10.42(f) shows that indeed this is the case. This image was obtained by using Otsu's method to obtain a threshold based on the histogram in Fig. 10.42(e) and then applying this threshold globally to the noisy image in Fig. 10.42(a). The result is nearly perfect. ■

FIGURE 10.42 (a) Noisy image from Fig. 10.41(a) and (b) its histogram. (c) Gradient magnitude image thresholded at the 99.7 percentile. (d) Image formed as the product of (a) and (c). (e) Histogram of the nonzero pixels in the image in (d). (f) Result of segmenting image (a) with the Otsu threshold based on the histogram in (e). The threshold was 134, which is approximately midway between the peaks in this histogram.

EXAMPLE 10.18: Using edge information based on the Laplacian to improve global thresholding.

■ In this example we consider a more complex thresholding problem. Figure 10.43(a) shows an 8-bit image of yeast cells in which we wish to use global thresholding to obtain the regions corresponding to the bright spots. As a starting point, Fig. 10.43(b) shows the image histogram, and Fig. 10.43(c) is the result obtained using Otsu's method directly on the image, using the histogram shown. We see that Otsu's method failed to achieve the original objective of detecting the bright spots and, although the method was able to isolate some of the cell regions themselves, several of the segmented regions on the right are not disjoint. The threshold computed by the Otsu method was 42 and the separability measure was 0.636.

Figure 10.43(d) shows the image g_T(x, y) obtained by computing the absolute value of the Laplacian image and then thresholding it with T set to 115 on an intensity scale in the range [0, 255]. This value of T corresponds approximately to the 99.5 percentile of the values in the absolute Laplacian image, so thresholding at this level should result in a sparse set of pixels, as Fig. 10.43(d) shows. Note in this image how the points cluster near the edges of the bright spots, as expected from the preceding discussion. Figure 10.43(e) is the histogram of the nonzero pixels in the product of (a) and (d). Finally, Fig. 10.43(f) shows the result of globally segmenting the original image using Otsu's method based on the histogram in Fig. 10.43(e). This result agrees with the locations of the bright spots in the image. The threshold computed by the Otsu method was 115 and the separability measure was 0.762, both of which are higher than the values obtained by using the original histogram.

By varying the percentile at which the threshold is set, we can even improve on the segmentation of the cell regions. For example, Fig. 10.44 shows the result obtained using the same procedure as in the previous paragraph, but with the threshold set at 55, which is approximately 5% of the maximum value of the absolute Laplacian image. This value is at the 53.9 percentile of the values in that image. This result clearly is superior to the result in Fig. 10.43(c) obtained using Otsu's method with the histogram of the original image. ■

FIGURE 10.43 (a) Image of yeast cells. (b) Histogram of (a). (c) Segmentation of (a) with Otsu's method using the histogram in (b). (d) Thresholded absolute Laplacian. (e) Histogram of the nonzero pixels in the product of (a) and (d). (f) Original image thresholded using Otsu's method based on the histogram in (e). (Original image courtesy of Professor Susan L. Forsburg, University of Southern California.)

FIGURE 10.44 Image in Fig. 10.43(a) segmented using the same procedure as explained in Figs. 10.43(d)–(f), but using a lower value to threshold the absolute Laplacian image.

10.3.6 Multiple Thresholds

Thus far, we have focused attention on image segmentation using a single global threshold. The thresholding method introduced in Section 10.3.3 can be extended to an arbitrary number of thresholds, because the separability measure on which it is based also extends to an arbitrary number of classes (Fukunaga [1972]). In the case of K classes, C_1, C_2, …, C_K, the between-class variance generalizes to the expression

\sigma_B^2 = \sum_{k=1}^{K} P_k (m_k - m_G)^2   (10.3-21)

where

P_k = \sum_{i \in C_k} p_i   (10.3-22)

and

m_k = \frac{1}{P_k} \sum_{i \in C_k} i\, p_i   (10.3-23)

and m_G is the global mean given in Eq. (10.3-9). The K classes are separated by K − 1 thresholds whose values, k_1*, k_2*, …, k_{K−1}*, are the values that maximize Eq. (10.3-21):

\sigma_B^2(k_1^*, k_2^*, \ldots, k_{K-1}^*) = \max_{0 < k_1 < k_2 < \cdots < k_{K-1} < L-1} \sigma_B^2(k_1, k_2, \ldots, k_{K-1})   (10.3-24)

Although this result is perfectly general, it begins to lose meaning as the number of classes increases, because we are dealing with only one variable (intensity). In fact, the between-class variance usually is cast in terms of multiple variables expressed as vectors (Fukunaga [1972]). In practice, using multiple global thresholding is considered a viable approach when there is reason to believe that the problem can be solved effectively with two thresholds. Applications that require more than two thresholds generally are solved using more than just intensity values. Instead, the approach is to use additional descriptors (e.g., color) and the application is cast as a pattern recognition problem, as explained in Section 10.3.8.

(Note: Thresholding with two thresholds sometimes is referred to as hysteresis thresholding.)

For three classes consisting of three intensity intervals (which are separated by two thresholds), the between-class variance is given by

\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2 + P_3 (m_3 - m_G)^2   (10.3-25)

where

P_1 = \sum_{i=0}^{k_1} p_i, \qquad
P_2 = \sum_{i=k_1+1}^{k_2} p_i, \qquad
P_3 = \sum_{i=k_2+1}^{L-1} p_i   (10.3-26)

and

m_1 = \frac{1}{P_1} \sum_{i=0}^{k_1} i\, p_i, \qquad
m_2 = \frac{1}{P_2} \sum_{i=k_1+1}^{k_2} i\, p_i, \qquad
m_3 = \frac{1}{P_3} \sum_{i=k_2+1}^{L-1} i\, p_i   (10.3-27)

As in Eqs. (10.3-10) and (10.3-11), the following relationships hold:

P_1 m_1 + P_2 m_2 + P_3 m_3 = m_G   (10.3-28)

and

P_1 + P_2 + P_3 = 1   (10.3-29)

We see that the P and m terms and, therefore, σ_B² are functions of k_1 and k_2. The two optimum threshold values, k_1* and k_2*, are the values that maximize σ_B²(k_1, k_2). In other words, as in the single-threshold case discussed in Section 10.3.3, we find the optimum thresholds by finding

\sigma_B^2(k_1^*, k_2^*) = \max_{0 < k_1 < k_2 < L-1} \sigma_B^2(k_1, k_2)   (10.3-30)

The procedure starts by selecting the first value of k_1 (that value is 1, because looking for a threshold at 0 intensity makes no sense; also, keep in mind that the increment values are integers because we are dealing with intensities). Next, k_2 is incremented through all its values greater than k_1 and less than L − 1 (i.e., k_2 = k_1 + 1, …, L − 2). Then k_1 is incremented to its next value and k_2 is incremented again through all its values greater than k_1. This procedure is repeated until k_1 = L − 3. The result of this process is a 2-D array, σ_B²(k_1, k_2), and the last step is to look for the maximum value in this array. The values of k_1 and k_2 corresponding to that maximum are the optimum thresholds, k_1* and k_2*. If there are several maxima, the corresponding values of k_1 and k_2 are averaged to obtain the final thresholds. The thresholded image is then given by

g(x, y) = \begin{cases} a & \text{if } f(x, y) \le k_1^* \\ b & \text{if } k_1^* < f(x, y) \le k_2^* \\ c & \text{if } f(x, y) > k_2^* \end{cases}   (10.3-31)

where a, b, and c are any three valid intensity values.
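The exhaustive search just described can be written directly from Eqs. (10.3-25) through (10.3-27). The sketch below is our own illustration (the function names are ours, and it keeps the first maximum rather than averaging ties, a simplification of the procedure in the text); the brute-force O(L²) double loop is fast enough for L = 256.

```python
import numpy as np

def dual_otsu(f, L=256):
    """Exhaustive search for the two Otsu thresholds k1* < k2*."""
    hist = np.bincount(f.ravel(), minlength=L)
    p = hist / hist.sum()                 # normalized histogram p_i
    i = np.arange(L)
    P = np.cumsum(p)                      # P(k)  = sum_{i<=k} p_i
    M = np.cumsum(i * p)                  # M(k)  = sum_{i<=k} i*p_i
    m_G = M[-1]                           # global mean

    best, k1_best, k2_best = -1.0, 1, 2
    for k1 in range(1, L - 2):
        for k2 in range(k1 + 1, L - 1):
            P1, P2, P3 = P[k1], P[k2] - P[k1], 1.0 - P[k2]
            if P1 == 0 or P2 == 0 or P3 == 0:
                continue                  # empty class: variance undefined
            m1 = M[k1] / P1
            m2 = (M[k2] - M[k1]) / P2
            m3 = (m_G - M[k2]) / P3
            # Eq. (10.3-25): three-class between-class variance.
            s2b = (P1 * (m1 - m_G) ** 2 + P2 * (m2 - m_G) ** 2
                   + P3 * (m3 - m_G) ** 2)
            if s2b > best:                # keep first maximum (simplified)
                best, k1_best, k2_best = s2b, k1, k2
    return k1_best, k2_best

def apply_dual(f, k1, k2, vals=(0, 128, 255)):
    """Eq. (10.3-31): map f into three intensity levels a, b, c."""
    g = np.where(f <= k1, vals[0], np.where(f <= k2, vals[1], vals[2]))
    return g.astype(np.uint8)
```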


Finally, we note that the separability measure defined in Section 10.3.3 for one threshold extends directly to multiple thresholds:

\eta(k_1^*, k_2^*) = \frac{\sigma_B^2(k_1^*, k_2^*)}{\sigma_G^2}   (10.3-32)

where σ_G² is the total image variance from Eq. (10.3-13).

EXAMPLE 10.19: Multiple global thresholding.

■ Figure 10.45(a) shows an image of an iceberg. The objective of this example is to segment the image into three regions: the dark background, the illuminated area of the iceberg, and the area in shadows. It is evident from the image histogram in Fig. 10.45(b) that two thresholds are required to solve this problem. The procedure discussed above resulted in the thresholds k_1* = 80 and k_2* = 177, which we note from Fig. 10.45(b) are near the centers of the two histogram valleys. Figure 10.45(c) is the segmentation that resulted from using these two thresholds in Eq. (10.3-31). The separability measure was 0.954. The principal reason this example worked out so well can be traced to the histogram having three distinct modes separated by reasonably wide, deep valleys. ■

10.3.7 Variable Thresholding

As discussed in Section 10.3.1, factors such as noise and nonuniform illumination play a major role in the performance of a thresholding algorithm. We showed in Sections 10.3.4 and 10.3.5 that image smoothing and using edge information can help significantly. However, it frequently is the case that this type of preprocessing is either impractical or simply ineffective in improving the situation to the point where the problem is solvable by any of the methods discussed thus far. In such situations, the next level of thresholding complexity involves variable thresholding. In this section, we discuss various techniques for choosing variable thresholds.

Image partitioning

One of the simplest approaches to variable thresholding is to subdivide an image into nonoverlapping rectangles. This approach is used to compensate for non-uniformities in illumination and/or reflectance. The rectangles are chosen small enough so that the illumination of each is approximately uniform. We illustrate this approach with an example.

FIGURE 10.45 (a) Image of iceberg. (b) Histogram. (c) Image segmented into three regions using dual Otsu thresholds. (Original image courtesy of NOAA.)

EXAMPLE 10.20: Variable thresholding via image partitioning.

■ Figure 10.46(a) shows the image from Fig. 10.37(c), and Fig. 10.46(b) shows its histogram. When discussing Fig. 10.37(c) we concluded that this image could not be segmented with a global threshold, a fact confirmed by Figs. 10.46(c) and (d), which show the results of segmenting the image using the iterative scheme discussed in Section 10.3.2 and Otsu's method, respectively. Both methods produced comparable results, in which numerous segmentation errors are visible.
Figure 10.46(e) shows the original image subdivided into six rectangular regions, and Fig. 10.46(f) is the result of applying Otsu's global method to each subimage. Although some errors in segmentation are visible, image subdivision produced a reasonable result on an image that is quite difficult to segment. The reason for the improvement is explained easily by analyzing the histogram of each subimage. As Fig. 10.47 shows, each subimage is characterized by a bimodal histogram with a deep valley between the modes, a fact that we know will lead to effective global thresholding.

Image subdivision generally works well when the objects of interest and the background occupy regions of reasonably comparable size, as in Fig. 10.46. When this is not the case, the method typically fails because of the likelihood of subdivisions containing only object or background pixels. Although this situation can be addressed by using additional techniques to determine when a subdivision contains both types of pixels, the logic required to address different scenarios can get complicated. In such situations, methods such as those discussed in the remainder of this section typically are preferable. ■

FIGURE 10.46 (a) Noisy, shaded image and (b) its histogram. (c) Segmentation of (a) using the iterative global algorithm from Section 10.3.2. (d) Result obtained using Otsu's method. (e) Image subdivided into six subimages. (f) Result of applying Otsu's method to each subimage individually.

FIGURE 10.47 Histograms of the six subimages in Fig. 10.46(e).
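The partitioning approach can be sketched as follows; this is an illustrative implementation with our own helper names, which simply recomputes an Otsu threshold from scratch inside each rectangle and thresholds that rectangle independently.

```python
import numpy as np

def otsu(block):
    """Single global Otsu threshold of an 8-bit array."""
    hist = np.bincount(block.ravel(), minlength=256)
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # class probability P1(k)
    mu = np.cumsum(np.arange(256) * p)         # cumulative mean m(k)
    with np.errstate(divide="ignore", invalid="ignore"):
        s2b = (mu[-1] * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(s2b)))

def partitioned_threshold(f, rows=2, cols=3):
    """Apply Otsu's method independently to each rectangular subimage."""
    g = np.zeros_like(f, dtype=np.uint8)
    for rs in np.array_split(np.arange(f.shape[0]), rows):
        for cs in np.array_split(np.arange(f.shape[1]), cols):
            block = f[rs[0]:rs[-1] + 1, cs[0]:cs[-1] + 1]
            # Each rectangle gets its own threshold, compensating for
            # slowly varying illumination across the full image.
            g[rs[0]:rs[-1] + 1, cs[0]:cs[-1] + 1] = block > otsu(block)
    return g
```

With rows=2 and cols=3 this reproduces the six-subimage layout of Fig. 10.46(e); the grid size is a parameter the user must match to the scale of the illumination variation.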

Variable thresholding based on local image properties

A more general approach than the image subdivision method discussed in the previous section is to compute a threshold at every point, (x, y), in the image based on one or more specified properties computed in a neighborhood of (x, y). Although this may seem like a laborious process, modern algorithms and hardware allow for fast neighborhood processing, especially for common functions such as logical and arithmetic operations.

We illustrate the basic approach to local thresholding using the standard deviation and mean of the pixels in a neighborhood of every point in an image. These two quantities are quite useful for determining local thresholds because they are descriptors of local contrast and average intensity. Let σ_xy and m_xy denote the standard deviation and mean value of the set of pixels contained in a neighborhood, S_xy, centered at coordinates (x, y) in an image (see Section 3.3.4 regarding computation of the local mean and standard deviation). The following are common forms of variable, local thresholds:

T_{xy} = a\sigma_{xy} + b m_{xy}   (10.3-33)

where a and b are nonnegative constants, and

T_{xy} = a\sigma_{xy} + b m_G   (10.3-34)

where m_G is the global image mean. The segmented image is computed as

g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T_{xy} \\ 0 & \text{if } f(x, y) \le T_{xy} \end{cases}   (10.3-35)

where f(x, y) is the input image. This equation is evaluated for all pixel locations in the image, and a different threshold is computed at each location (x, y) using the pixels in the neighborhood S_xy.

Significant power (with a modest increase in computation) can be added to local thresholding by using predicates based on the parameters computed in the neighborhood of (x, y):

g(x, y) = \begin{cases} 1 & \text{if } Q(\text{local parameters}) \text{ is true} \\ 0 & \text{if } Q(\text{local parameters}) \text{ is false} \end{cases}   (10.3-36)

where Q is a predicate based on parameters computed using the pixels in neighborhood S_xy. For example, consider the following predicate, Q(σ_xy, m_xy), based on the local mean and standard deviation:

Q(\sigma_{xy}, m_{xy}) = \begin{cases} \text{true} & \text{if } f(x, y) > a\sigma_{xy} \text{ AND } f(x, y) > b m_{xy} \\ \text{false} & \text{otherwise} \end{cases}   (10.3-37)

Note that Eq. (10.3-35) is a special case of Eq. (10.3-36), obtained by letting Q be true if f(x, y) > T_xy and false otherwise. In this case, the predicate is based simply on the intensity at a point.
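Equations (10.3-36) and (10.3-37) can be prototyped as follows. This is a sketch under stated assumptions, not the book's code: local statistics are computed by stacking shifted copies of an edge-replicated padded image, the helper names are ours, and the use_global_mean flag anticipates the variant (global mean in place of m_xy) used in Example 10.21.

```python
import numpy as np

def local_stats(f, n=3):
    """Local mean and standard deviation over an n-by-n neighborhood,
    computed by stacking shifted copies (borders handled by replication)."""
    fp = np.pad(f.astype(float), n // 2, mode="edge")
    stack = np.stack([fp[r:r + f.shape[0], c:c + f.shape[1]]
                      for r in range(n) for c in range(n)])
    return stack.mean(axis=0), stack.std(axis=0)

def predicate_threshold(f, a=30.0, b=1.5, use_global_mean=True, n=3):
    """Eqs. (10.3-36)/(10.3-37): g = 1 where f > a*sigma_xy AND f > b*m."""
    m_xy, s_xy = local_stats(f, n)
    m = f.mean() if use_global_mean else m_xy   # global-mean variant
    return ((f > a * s_xy) & (f > b * m)).astype(np.uint8)
```

With a = 30 and b = 1.5 this mirrors the parameter choices reported in Example 10.21, where the values were found experimentally.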

EXAMPLE 10.21: Variable thresholding based on local image properties.

■ Figure 10.48(a) shows the yeast image from Example 10.18. This image has three predominant intensity levels, so it is reasonable to assume that perhaps dual thresholding could be a good segmentation approach. Figure 10.48(b) is the result of using the dual thresholding method explained in Section 10.3.6. As the figure shows, it was possible to isolate the bright areas from the background, but the mid-gray regions on the right side of the image were not segmented properly (recall that we encountered a similar problem with Fig. 10.43(c) in Example 10.18). To illustrate the use of local thresholding, we computed the local standard deviation σ_xy for all (x, y) in the input image using a neighborhood of size 3 × 3. Figure 10.48(c) shows the result. Note how the faint outer lines correctly delineate the boundaries of the cells. Next, we formed a predicate of the form shown in Eq. (10.3-37), but using the global mean instead of m_xy. Choosing the global mean generally gives better results when the background is nearly constant and all the object intensities are above or below the background intensity. The values a = 30 and b = 1.5 were used in completing the specification of the predicate (these values were determined experimentally, as is usually the case in applications such as this). The image was then segmented using Eq. (10.3-36). As Fig. 10.48(d) shows, the result agrees quite closely with the two types of intensity regions prevalent in the input image. Note in particular that all the outer regions were segmented properly and that most of the inner, brighter regions were isolated correctly. ■

FIGURE 10.48 (a) Image from Fig. 10.43. (b) Image segmented using the dual thresholding approach discussed in Section 10.3.6. (c) Image of local standard deviations. (d) Result obtained using local thresholding.

Using moving averages

A special case of the local thresholding method just discussed is based on computing a moving average along scan lines of an image. This implementation is quite useful in document processing, where speed is a fundamental requirement. The scanning typically is carried out line by line in a zigzag pattern to reduce illumination bias. Let z_{k+1} denote the intensity of the point encountered in the scanning sequence at step k + 1. The moving average (mean intensity) at this new point is given by

m(k+1) = \frac{1}{n} \sum_{i=k+2-n}^{k+1} z_i = m(k) + \frac{1}{n}\bigl( z_{k+1} - z_{k+1-n} \bigr)   (10.3-38)

(Note: The first expression is valid for k ≥ n − 1; when k is less than n − 1, averages are formed with the available points. Similarly, the second expression is valid for k ≥ n.)

where n denotes the number of points used in computing the average and m(1) = z_1/n. This initial value is not strictly correct because the average of a single point is the value of the point itself. However, we use m(1) = z_1/n so that no special computations are required when Eq. (10.3-38) first starts up. Another way of viewing it is that this is the value we would obtain if the border of the image were padded with n − 1 zeros. The algorithm is initialized only once, not at every row. Because a moving average is computed for every point in the image, segmentation is implemented using Eq. (10.3-35) with T_xy = b·m_xy, where b is constant and m_xy is the moving average from Eq. (10.3-38) at point (x, y) in the input image.
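A vectorized sketch of this scheme is shown below. It is our own illustration, not the book's code: the zigzag scan is simulated by reversing every other row so the scan line is continuous, and the moving average of Eq. (10.3-38), with the book's zero-padded initialization m(1) = z_1/n, is computed as a convolution over the flattened scan line.

```python
import numpy as np

def moving_average_threshold(f, n=20, b=0.5):
    """Threshold each pixel against b times the moving average of the
    previous n samples along a zigzag scan, per Eq. (10.3-38)."""
    h, w = f.shape
    # Zigzag scan: reverse every other row so the scan is continuous.
    z = f.astype(float)
    z[1::2] = z[1::2, ::-1]
    z = z.ravel()
    # Prepend n-1 zeros: this reproduces the initialization m(1) = z1/n
    # (equivalent to padding the image border with n-1 zeros).
    zp = np.concatenate([np.zeros(n - 1), z])
    m = np.convolve(zp, np.ones(n) / n, mode="valid")  # one m per pixel
    # Eq. (10.3-35) with T = b * m, then undo the zigzag reordering.
    g = (z > b * m).astype(np.uint8).reshape(h, w)
    g[1::2] = g[1::2, ::-1]
    return g
```

With n = 20 and b = 0.5 this corresponds to the parameter choices reported in Example 10.22 for text whose average stroke width is 4 pixels.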

EXAMPLE 10.22: Document thresholding using moving averages.

■ Figure 10.49(a) shows an image of handwritten text shaded by a spot intensity pattern. This form of intensity shading is typical of images obtained with a photographic flash. Figure 10.49(b) is the result of segmentation using the Otsu global thresholding method. It is not unexpected that global thresholding could not overcome the intensity variation. Figure 10.49(c) shows successful segmentation with local thresholding using moving averages. A rule of thumb is to let n equal 5 times the average stroke width. In this case, the average width was 4 pixels, so we let n = 20 in Eq. (10.3-38) and used b = 0.5.

As another illustration of the effectiveness of this segmentation approach, we used the same parameters as in the previous paragraph to segment the image in Fig. 10.50(a), which is corrupted by a sinusoidal intensity variation typical of the variation that may occur when the power supply in a document scanner is not grounded properly. As Figs. 10.50(b) and (c) show, the segmentation results are comparable to those in Fig. 10.49.

It is of interest to note that successful segmentation results were obtained in both cases using the same values for n and b, which shows the relative ruggedness of the approach. In general, thresholding based on moving averages works well when the objects of interest are small (or thin) with respect to the image size, a condition satisfied by images of typed or handwritten text. ■

10.3.8 Multivariable Thresholding

Thus far, we have been concerned with thresholding based on a single variable: gray-scale intensity. In some cases, a sensor can make available more than one variable to characterize each pixel in an image, and thus allow multivariable thresholding. A notable example is color imaging, where red (R), green (G), and blue (B) components are used to form a composite color image (see Chapter 6). In this case, each "pixel" is characterized by three values, and can be represented as a 3-D vector, z = (z_1, z_2, z_3)^T, whose components are the RGB colors at a point. These 3-D points often are referred to as voxels, to denote volumetric elements, as opposed to image elements.

FIGURE 10.49 (a) Text image corrupted by spot shading. (b) Result of global thresholding using Otsu's method. (c) Result of local thresholding using moving averages.

As discussed in some detail in Section 6.7, multivariable thresholding may be viewed as a distance computation. Suppose that we want to extract from a color image all regions having a specified color range: say, reddish hues. Let a denote the average reddish color in which we are interested. One way to segment a color image based on this parameter is to compute a distance measure, D(z, a), between an arbitrary color point, z, and the average color, a. Then, we segment the input image as follows:

g = \begin{cases} 1 & \text{if } D(\mathbf{z}, \mathbf{a}) < T \\ 0 & \text{otherwise} \end{cases}   (10.3-39)

where T is a threshold, and it is understood that the distance computation is performed at all coordinates in the input image to generate the corresponding segmented values in g. Note that the inequalities in this equation are the opposite of the inequalities we used in Eq. (10.3-1) for thresholding a single variable. The reason is that the equation D(z, a) = T defines a volume (see Fig. 6.43), and it is more intuitive to think of segmented pixel values as being contained within the volume and background pixel values as being on the surface or outside the volume. Equation (10.3-39) reduces to Eq. (10.3-1) by letting D(z, a) = −f(x, y).

Observe that the condition f(x, y) > T basically says that the Euclidean distance between the value of f and the origin of the real line exceeds the value of T. Thus, thresholding is based on the computation of a distance measure, and the form of Eq. (10.3-39) depends on the measure used. If, in general, z is an n-dimensional vector, we know from Section 2.6.6 that the n-dimensional Euclidean distance is defined as

D(\mathbf{z}, \mathbf{a}) = \lVert \mathbf{z} - \mathbf{a} \rVert = \left[ (\mathbf{z} - \mathbf{a})^T (\mathbf{z} - \mathbf{a}) \right]^{1/2}   (10.3-40)

FIGURE 10.50 (a) Text image corrupted by sinusoidal shading. (b) Result of global thresholding using Otsu's method. (c) Result of local thresholding using moving averages.

The equation D(z, a) = T describes a sphere (called a hypersphere) in n-dimensional Euclidean space (Fig. 6.43 shows a 3-D example). A more powerful distance measure is the so-called Mahalanobis distance, defined as

D(\mathbf{z}, \mathbf{a}) = \left[ (\mathbf{z} - \mathbf{a})^T \mathbf{C}^{-1} (\mathbf{z} - \mathbf{a}) \right]^{1/2}   (10.3-41)

where C is the covariance matrix of the zs, as discussed in Section 12.2.2. In this case, D(z, a) = T describes an n-dimensional hyperellipse (Fig. 6.43 shows a 3-D example). This expression reduces to Eq. (10.3-40) when C = I, the identity matrix.

We gave a detailed example in Section 6.7 regarding the use of these expressions. We also discuss in Section 12.2 the problem of segmenting regions out of an image using pattern recognition techniques based on decision functions, which may be viewed as a multiclass, multivariable thresholding problem.

10.4 Region-Based Segmentation

(Note: You should review the terminology introduced in Section 10.1 before proceeding.)

As discussed in Section 10.1, the objective of segmentation is to partition an image into regions. In Section 10.2, we approached this problem by attempting to find boundaries between regions based on discontinuities in intensity levels, whereas in Section 10.3, segmentation was accomplished via thresholds based on the distribution of pixel properties, such as intensity values or color. In this section, we discuss segmentation techniques that are based on finding the regions directly.

10.4.1 Region Growing

As its name implies, region growing is a procedure that groups pixels or subregions into larger regions based on predefined criteria for growth. The basic approach is to start with a set of "seed" points and from these grow regions by appending to each seed those neighboring pixels that have predefined properties similar to the seed (such as specific ranges of intensity or color).

Selecting a set of one or more starting points often can be based on the nature of the problem, as shown later in Example 10.23. When a priori information is not available, the procedure is to compute at every pixel the same set of properties that ultimately will be used to assign pixels to regions during the growing process. If the result of these computations shows clusters of values, the pixels whose properties place them near the centroid of these clusters can be used as seeds.

The selection of similarity criteria depends not only on the problem under consideration, but also on the type of image data available. For example, the analysis of land-use satellite imagery depends heavily on the use of color. This problem would be significantly more difficult, or even impossible, to solve without the inherent information available in color images. When the images are monochrome, region analysis must be carried out with a set of descriptors based on intensity levels and spatial properties (such as moments or texture). We discuss descriptors useful for region characterization in Chapter 11.

Descriptors alone can yield misleading results if connectivity properties are not used in the region-growing process. For example, visualize a random arrangement of pixels with only three distinct intensity values. Grouping pixels with the same intensity level to form a "region" without paying attention to connectivity would yield a segmentation result that is meaningless in the context of this discussion.

Another problem in region growing is the formulation of a stopping rule. Region growth should stop when no more pixels satisfy the criteria for inclusion in that region. Criteria such as intensity values, texture, and color are local in nature and do not take into account the "history" of region growth. Additional criteria that increase the power of a region-growing algorithm utilize the concept of size, likeness between a candidate pixel and the pixels grown so far (such as a comparison of the intensity of a candidate and the average intensity of the grown region), and the shape of the region being grown. The use of these types of descriptors is based on the assumption that a model of expected results is at least partially available.

Let f(x, y) denote an input image array; S(x, y) denote a seed array containing 1s at the locations of seed points and 0s elsewhere; and Q denote a predicate to be applied at each location (x, y). Arrays f and S are assumed to be of the same size. A basic region-growing algorithm based on 8-connectivity may be stated as follows. (See Sections 2.5.2 and 9.5.3 regarding connected components, and Section 9.2.1 regarding erosion.)

1. Find all connected components in S(x, y) and erode each connected component to one pixel; label all such pixels found as 1. All other pixels in S are labeled 0.
2. Form an image f_Q such that, at a pair of coordinates (x, y), f_Q(x, y) = 1 if the input image satisfies the given predicate, Q, at those coordinates; otherwise, f_Q(x, y) = 0.
3. Let g be an image formed by appending to each seed point in S all the 1-valued points in f_Q that are 8-connected to that seed point.
4. Label each connected component in g with a different region label (e.g., 1, 2, 3, …). This is the segmented image obtained by region growing.

We illustrate the mechanics of this algorithm by an example.
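The four steps translate into the sketch below. It is our own simplification of the algorithm, not the book's code: instead of morphological erosion in Step 1, seed points that have already been absorbed into a grown region are skipped, which reduces each connected seed component to a single effective representative; Steps 3 and 4 (growth over 8-connected pixels of f_Q, and labeling) are done in one breadth-first pass per seed.

```python
import numpy as np
from collections import deque

def region_grow(f, S, Q):
    """Basic region growing. f: image; S: binary seed array; Q: vectorized
    predicate, Q(f) -> boolean array f_Q. Returns a label image in which
    0 is background and 1, 2, 3, ... are the grown regions."""
    fQ = Q(f)                                 # Step 2: predicate image
    labels = np.zeros(f.shape, dtype=np.int32)
    nbrs = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]            # 8-connected neighborhood
    next_label = 0
    for r, c in zip(*np.nonzero(S)):
        if labels[r, c] or not fQ[r, c]:
            continue                          # Step 1 (simplified): skip
        next_label += 1                       # seeds already absorbed
        labels[r, c] = next_label
        q = deque([(r, c)])
        while q:                              # Steps 3-4: BFS growth
            y, x = q.popleft()
            for dr, dc in nbrs:
                yy, xx = y + dr, x + dc
                if (0 <= yy < f.shape[0] and 0 <= xx < f.shape[1]
                        and fQ[yy, xx] and not labels[yy, xx]):
                    labels[yy, xx] = next_label
                    q.append((yy, xx))
    return labels
```

A predicate of the kind used in Example 10.23 would be, for instance, `lambda x: np.abs(x.astype(int) - 255) <= T`, marking pixels within T intensity levels of the seed value.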

EXAMPLE 10.23: Segmentation by region growing.

■ Figure 10.51(a) shows an 8-bit X-ray image of a weld (the horizontal dark region) containing several cracks and porosities (the bright regions running horizontally through the center of the image). We illustrate the use of region growing by segmenting the defective weld regions. These regions could be used in applications such as weld inspection, for inclusion in a database of historical studies, or for controlling an automated welding system.

The first order of business is to determine the seed points. From the physics of the problem, we know that cracks and porosities will attenuate X-rays considerably less than solid welds, so we expect the regions containing these types of defects to be significantly brighter than other parts of the X-ray image. We can extract the seed points by thresholding the original image, using a threshold set at a high percentile. Figure 10.51(b) shows the histogram of the image, and Fig. 10.51(c) shows the thresholded result obtained with a threshold equal to the 99.9 percentile of intensity values in the image, which in this case was 254 (see Section 10.3.5 regarding percentiles). Figure 10.51(d) shows the result of morphologically eroding each connected component in Fig. 10.51(c) to a single point.

FIGURE 10.51 (a) X-ray image of a defective weld. (b) Histogram. (c) Initial seed image. (d) Final seed image (the points were enlarged for clarity). (e) Absolute value of the difference between (a) and (c). (f) Histogram of (e). (g) Difference image thresholded using dual thresholds. (h) Difference image thresholded with the smallest of the dual thresholds. (i) Segmentation result obtained by region growing. (Original image courtesy of X-TEK Systems, Ltd.)
Next, we have to specify a predicate. In this example, we are interested in appending to each seed all the pixels that (a) are 8-connected to that seed and (b) are "similar" to it. Using intensity differences as a measure of similarity, our predicate applied at each location (x, y) is

Q = \begin{cases} \text{TRUE} & \text{if the absolute difference of the intensities between the seed and the pixel at } (x, y) \text{ is} \le T \\ \text{FALSE} & \text{otherwise} \end{cases}

where T is a specified threshold. Although this predicate is based on intensity differences and uses a single threshold, we could specify more complex schemes in which a different threshold is applied to each pixel, and properties other than differences are used. In this case, the preceding predicate is sufficient to solve the problem, as the rest of this example shows.
From the previous paragraph, we know that the smallest seed value is 255 because the image was thresholded with a threshold of 254. Figure 10.51(e) shows the absolute value of the difference between the images in Figs. 10.51(a) and (c). The image in Fig. 10.51(e) contains all the differences needed to compute the predicate at each location (x, y). Figure 10.51(f) shows the corresponding histogram. We need a threshold to use in the predicate to establish similarity. The histogram has three principal modes, so we can start by applying to the difference image the dual thresholding technique discussed in Section 10.3.6. The resulting two thresholds in this case were T1 = 68 and T2 = 126, which we see correspond closely to the valleys of the histogram. (As a brief digression, we segmented the image using these two thresholds. The result in Fig. 10.51(g) shows that the problem of segmenting the defects cannot be solved using dual thresholds, even though the thresholds are in the main valleys.)

Figure 10.51(h) shows the result of thresholding the difference image with only T1. The black points are the pixels for which the predicate was TRUE; the others failed the predicate. The important result here is that the points in the good regions of the weld failed the predicate, so they will not be included in the final result. The points in the outer region will be considered by the region-growing algorithm as candidates. However, Step 3 will reject the outer points, because they are not 8-connected to the seeds. In fact, as Fig. 10.51(i) shows, this step resulted in the correct segmentation, indicating that the use of connectivity was a fundamental requirement in this case. Finally, note that in Step 4 we used the same value for all the regions found by the algorithm. In this case, it was visually preferable to do so. ■

10.4.2 Region Splitting and Merging


The procedure discussed in the last section grows regions from a set of seed
points. An alternative is to subdivide an image initially into a set of arbitrary,
disjoint regions and then merge and/or split the regions in an attempt to satis-
fy the conditions of segmentation stated in Section 10.1. The basics of splitting
and merging are discussed next.

Let R represent the entire image region and select a predicate Q. One
approach for segmenting R is to subdivide it successively into smaller and
smaller quadrant regions so that, for any region Ri, Q(Ri) = TRUE. We start
with the entire region. If Q(R) = FALSE, we divide the image into quadrants.
If Q is FALSE for any quadrant, we subdivide that quadrant into subquad-
rants, and so on. This particular splitting technique has a convenient represen-
tation in the form of so-called quadtrees, that is, trees in which each node has
exactly four descendants, as Fig. 10.52 shows (the images corresponding to the
nodes of a quadtree sometimes are called quadregions or quadimages). Note
that the root of the tree corresponds to the entire image and that each node
corresponds to the subdivision of a node into four descendant nodes. In this
case, only R4 was subdivided further.
If only splitting is used, the final partition normally contains adjacent re-
gions with identical properties. This drawback can be remedied by allowing
merging as well as splitting. Satisfying the constraints of segmentation outlined
in Section 10.1 requires merging only adjacent regions whose combined pixels
satisfy the predicate Q (see Section 2.5.2 regarding region adjacency). That is,
two adjacent regions Rj and Rk are merged only if Q(Rj ∪ Rk) = TRUE.
The preceding discussion can be summarized by the following procedure in
which, at any step, we
1. Split into four disjoint quadrants any region Ri for which Q(Ri) = FALSE.
2. When no further splitting is possible, merge any adjacent regions Rj and
Rk for which Q(Rj ∪ Rk) = TRUE.
3. Stop when no further merging is possible.
It is customary to specify a minimum quadregion size beyond which no further
splitting is carried out.
Numerous variations of the preceding basic theme are possible. For example,
a significant simplification results if in Step 2 we allow merging of any two ad-
jacent regions Ri and Rj if each one satisfies the predicate individually. This re-
sults in a much simpler (and faster) algorithm, because testing of the predicate
is limited to individual quadregions. As the following example shows, this sim-
plification is still capable of yielding good segmentation results.
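The simplified procedure can be sketched as follows (Python assumed; `predicate` is any Boolean test on a region, and adjacent accepted quadregions merge implicitly because they are all marked as foreground):

```python
import numpy as np

def split_merge(image, predicate, min_size):
    """Recursively split any region for which predicate() is FALSE;
    mark regions that individually satisfy the predicate as foreground
    (the simplified merge step discussed in the text)."""
    out = np.zeros(image.shape, dtype=bool)

    def split(r0, r1, c0, c1):
        region = image[r0:r1, c0:c1]
        if predicate(region):
            out[r0:r1, c0:c1] = True        # accept region as a whole
        elif (r1 - r0) > min_size:          # split into four quadrants
            rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
            split(r0, rm, c0, cm)
            split(r0, rm, cm, c1)
            split(rm, r1, c0, cm)
            split(rm, r1, cm, c1)
        # minimum-size regions that fail the predicate stay background

    split(0, image.shape[0], 0, image.shape[1])
    return out
```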

FIGURE 10.52 (a) Partitioned image, showing R subdivided into R1, R2, R3, and R4,
with R4 subdivided further into R41, R42, R43, and R44. (b) Corresponding quadtree.
R represents the entire image region.
768 Chapter 10 ■ Image Segmentation

EXAMPLE 10.24: Segmentation by region splitting and merging.
■ Figure 10.53(a) shows a 566 × 566 X-ray band image of the Cygnus Loop.
The objective of this example is to segment out of the image the “ring” of less
dense matter surrounding the dense center. The region of interest has some
obvious characteristics that should help in its segmentation. First, we note that
the data in this region has a random nature, indicating that its standard devia-
tion should be greater than the standard deviation of the background (which is
near 0) and of the large central region, which is fairly smooth. Similarly, the
mean value (average intensity) of a region containing data from the outer ring
should be greater than the mean of the darker background and less than the
mean of the large, lighter central region. Thus, we should be able to segment
the region of interest using the following predicate:

    Q = TRUE    if σ > a AND 0 < m < b
        FALSE   otherwise

where m and σ are the mean and standard deviation of the pixels in a quadre-
gion, and a and b are constants.
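For the values found in this analysis (a = 10, b = 125), the predicate might be coded as the following sketch (the helper name is ours):

```python
import numpy as np

def Q(region, a=10, b=125):
    """TRUE if the region's standard deviation exceeds a and its mean
    lies strictly between 0 and b (a and b taken from the example)."""
    m = region.mean()
    s = region.std()
    return s > a and 0 < m < b
```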
Analysis of several regions in the outer area of interest revealed that the
mean intensity of pixels in those regions did not exceed 125 and the standard
deviation was always greater than 10. Figures 10.53(b) through (d) show the
results obtained using these values for a and b, and varying the minimum size
allowed for the quadregions from 32 to 8. The pixels in a quadregion whose

FIGURE 10.53 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by
NASA’s Hubble Telescope. (b)–(d) Results of limiting the smallest allowed quadregion
to sizes of 32 × 32, 16 × 16, and 8 × 8 pixels, respectively. (Original image courtesy of
NASA.)

pixels satisfied the predicate were set to white; all others in that region were set
to black. The best result in terms of capturing the shape of the outer region was
obtained using quadregions of size 16 × 16. The black squares in Fig. 10.53(d)
are quadregions of size 8 × 8 whose pixels did not satisfy the predicate. Using
smaller quadregions would result in increasing numbers of such black regions.
Using regions larger than the one illustrated here results in a more “block-
like” segmentation. Note that in all cases the segmented regions (white pixels)
completely separate the inner, smoother region from the background. Thus,
the segmentation effectively partitioned the image into three distinct areas
that correspond to the three principal features in the image: background,
dense, and sparse regions. Using any of the white regions in Fig. 10.53 as a
mask would make it a relatively simple task to extract these regions from the
original image (Problem 10.40). As in Example 10.23, these results could not
have been obtained using edge- or threshold-based segmentation. ■

As used in the preceding example, properties based on the mean and standard
deviation of pixel intensities in a region attempt to quantify the texture of the
region (see Section 11.3.3 for a discussion on texture). The concept of texture
segmentation is based on using measures of texture in the predicates. In other
words, we can perform texture segmentation by any of the methods discussed
in this section simply by specifying predicates based on texture content.

10.5 Segmentation Using Morphological Watersheds


Thus far, we have discussed segmentation based on three principal concepts:
(a) edge detection, (b) thresholding, and (c) region growing. Each of these ap-
proaches was found to have advantages (for example, speed in the case of
global thresholding) and disadvantages (for example, the need for post-
processing, such as edge linking, in edge-based segmentation). In this section
we discuss an approach based on the concept of so-called morphological
watersheds. As will become evident in the following discussion, segmentation
by watersheds embodies many of the concepts of the other three approaches
and, as such, often produces more stable segmentation results, including con-
nected segmentation boundaries. This approach also provides a simple frame-
work for incorporating knowledge-based constraints (see Fig. 1.23) in the
segmentation process.

10.5.1 Background
The concept of watersheds is based on visualizing an image in three dimen-
sions: two spatial coordinates versus intensity, as in Fig. 2.18(a). In such a
“topographic” interpretation, we consider three types of points: (a) points be-
longing to a regional minimum; (b) points at which a drop of water, if placed at
the location of any of those points, would fall with certainty to a single mini-
mum; and (c) points at which water would be equally likely to fall to more
than one such minimum. For a particular regional minimum, the set of points
satisfying condition (b) is called the catchment basin or watershed of that

minimum. The points satisfying condition (c) form crest lines on the topo-
graphic surface and are termed divide lines or watershed lines.
The principal objective of segmentation algorithms based on these concepts
is to find the watershed lines. The basic idea is simple, as the following analogy
illustrates. Suppose that a hole is punched in each regional minimum and that
the entire topography is flooded from below by letting water rise through the
holes at a uniform rate. When the rising water in distinct catchment basins is
about to merge, a dam is built to prevent the merging. The flooding will even-
tually reach a stage when only the tops of the dams are visible above the water
line. These dam boundaries correspond to the divide lines of the watersheds.
Therefore, they are the (connected) boundaries extracted by a watershed seg-
mentation algorithm.
These ideas can be explained further with the aid of Fig. 10.54. Figure 10.54(a)
shows a gray-scale image and Fig. 10.54(b) is a topographic view, in which the
height of the “mountains” is proportional to intensity values in the input
image. For ease of interpretation, the backsides of structures are shaded. This
is not to be confused with intensity values; only the general topography of the
three-dimensional representation is of interest. In order to prevent the rising
water from spilling out through the edges of the image, we imagine the

FIGURE 10.54 (a) Original image. (b) Topographic view. (c)–(d) Two stages of
flooding.

perimeter of the entire topography (image) being enclosed by dams of height
greater than the highest possible mountain, whose value is determined by the
highest possible intensity value in the input image.
Suppose that a hole is punched in each regional minimum [shown as dark
areas in Fig. 10.54(b)] and that the entire topography is flooded from below by
letting water rise through the holes at a uniform rate. Figure 10.54(c) shows the
first stage of flooding, where the “water,” shown in light gray, has covered only
areas that correspond to the very dark background in the image. In Figs. 10.54(d)
and (e) we see that the water now has risen into the first and second catchment
basins, respectively. As the water continues to rise, it will eventually overflow
from one catchment basin into another. The first indication of this is shown in
Fig. 10.54(f). Here, water from the left basin actually overflowed into the basin on
the right and a short “dam” (consisting of single pixels) was built to prevent
water from merging at that level of flooding (the details of dam building are dis-
cussed in the following section).The effect is more pronounced as water continues
to rise, as shown in Fig. 10.54(g).This figure shows a longer dam between the two
catchment basins and another dam in the top part of the right basin. The latter
dam was built to prevent merging of water from that basin with water from areas
corresponding to the background. This process is continued until the maximum

FIGURE 10.54 (Continued) (e) Result of further flooding. (f) Beginning of merging of
water from two catchment basins (a short dam was built between them). (g) Longer
dams. (h) Final watershed (segmentation) lines. (Courtesy of Dr. S. Beucher,
CMM/Ecole des Mines de Paris.)

level of flooding (corresponding to the highest intensity value in the image) is


reached. The final dams correspond to the watershed lines, which are the de-
sired segmentation result. The result for this example is shown in Fig.
10.54(h) as dark, 1-pixel-thick paths superimposed on the original image.
Note the important property that the watershed lines form connected paths,
thus giving continuous boundaries between regions.
One of the principal applications of watershed segmentation is in the ex-
traction of nearly uniform (bloblike) objects from the background. Regions
characterized by small variations in intensity have small gradient values. Thus,
in practice, we often see watershed segmentation applied to the gradient of an
image, rather than to the image itself. In this formulation, the regional minima
of catchment basins correlate nicely with the small value of the gradient corre-
sponding to the objects of interest.
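A minimal sketch of this preprocessing step, using Sobel derivatives to approximate the gradient magnitude (SciPy assumed; this is one common choice, not the only one):

```python
import numpy as np
from scipy import ndimage

def gradient_magnitude(image):
    """Approximate the gradient magnitude with Sobel derivatives;
    watershed is then run on this image rather than on the original."""
    f = image.astype(float)
    gx = ndimage.sobel(f, axis=1)   # horizontal derivative
    gy = ndimage.sobel(f, axis=0)   # vertical derivative
    return np.hypot(gx, gy)
```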

10.5.2 Dam Construction


Before proceeding, let us consider how to construct the dams or watershed
lines required by watershed segmentation algorithms. Dam construction is
based on binary images, which are members of 2-D integer space Z² (see
Section 2.4.2). The simplest way to construct dams separating sets of binary
points is to use morphological dilation (see Section 9.2.2).
The basics of how to construct dams using dilation are illustrated in Fig. 10.55.
Figure 10.55(a) shows portions of two catchment basins at flooding step n - 1
and Fig. 10.55(b) shows the result at the next flooding step, n. The water has
spilled from one basin to the other and, therefore, a dam must be built to keep
this from happening. In order to be consistent with notation to be introduced
shortly, let M1 and M2 denote the sets of coordinates of points in two regional
minima. Then let the set of coordinates of points in the catchment basin associ-
ated with these two minima at stage n - 1 of flooding be denoted by Cn - 1(M1)
and Cn - 1(M2), respectively. These are the two gray regions in Fig. 10.55(a).
Let C[n - 1] denote the union of these two sets. There are two connected
components in Fig. 10.55(a) (see Section 2.5.2 regarding connected compo-
nents) and only one connected component in Fig. 10.55(b). This connected
component encompasses the earlier two components, shown dashed. The fact
that two connected components have become a single component indicates
that water between the two catchment basins has merged at flooding step n.
Let this connected component be denoted q. Note that the two components
from step n - 1 can be extracted from q by performing the simple AND oper-
ation q ∩ C[n - 1]. We note also that all points belonging to an individual
catchment basin form a single connected component.
Suppose that each of the connected components in Fig. 10.55(a) is dilated
by the structuring element shown in Fig. 10.55(c), subject to two conditions:
(1) The dilation has to be constrained to q (this means that the center of the
structuring element can be located only at points in q during dilation), and (2)
the dilation cannot be performed on points that would cause the sets being di-
lated to merge (become a single connected component). Figure 10.55(d) shows
that a first dilation pass (in light gray) expanded the boundary of each original
connected component. Note that condition (1) was satisfied by every point

FIGURE 10.55 (a) Two partially flooded catchment basins at stage n - 1 of flooding.
(b) Flooding at stage n, showing that water has spilled between basins. (c) Structuring
element used for dilation: a 3 × 3 array of 1s with the origin at its center. (d) Result of
dilation and dam construction, distinguishing first-dilation points, second-dilation
points, and dam points.

during dilation, and condition (2) did not apply to any point during the dila-
tion process; thus the boundary of each region was expanded uniformly.
In the second dilation (shown in black), several points failed condition (1)
while meeting condition (2), resulting in the broken perimeter shown in the fig-
ure. It also is evident that the only points in q that satisfy the two conditions
under consideration describe the 1-pixel-thick connected path shown crossed-
hatched in Fig. 10.55(d). This path constitutes the desired separating dam at
stage n of flooding. Construction of the dam at this level of flooding is complet-
ed by setting all the points in the path just determined to a value greater than the
maximum intensity value of the image. The height of all dams is generally set at
1 plus the maximum allowed value in the image. This will prevent water from
crossing over the part of the completed dam as the level of flooding is increased.
It is important to note that dams built by this procedure, which are the desired
segmentation boundaries, are connected components. In other words, this
method eliminates the problems of broken segmentation lines.
Although the procedure just described is based on a simple example, the
method used for more complex situations is exactly the same, including the use
of the 3 × 3 symmetric structuring element shown in Fig. 10.55(c).
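The two dilation conditions can be sketched as follows (Python with SciPy assumed; the function name is ours, and condition 2 is approximated by treating any pixel reached by two or more components in the same pass as a dam point):

```python
import numpy as np
from scipy import ndimage

def build_dam(q, components):
    """Dilate each labeled component inside q with a 3x3 SE of 1s.
    A pixel is annexed only if it lies in q (condition 1) and is
    reached by exactly one component; pixels reached by two or more
    components become dam points (a stand-in for condition 2).
    q: boolean mask; components: integer label image, 0 = unlabeled."""
    se = np.ones((3, 3), dtype=bool)
    labels = components.copy()
    dam = np.zeros(q.shape, dtype=bool)
    while True:
        claims = {}
        for lab in np.unique(labels[labels > 0]):
            grown = ndimage.binary_dilation(labels == lab, se) & q
            new = grown & (labels == 0) & ~dam
            for r, c in zip(*np.nonzero(new)):
                claims.setdefault((r, c), []).append(lab)
        if not claims:
            break
        for (r, c), labs in claims.items():
            if len(labs) == 1:
                labels[r, c] = labs[0]      # uncontested: annex the pixel
            else:
                dam[r, c] = True            # contested: dam point
    return labels, dam
```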

10.5.3 Watershed Segmentation Algorithm


Let M1, M2, …, MR be sets denoting the coordinates of the points in the
regional minima of an image g(x, y). As indicated at the end of Section 10.5.1,
this typically will be a gradient image. Let C(Mi) be a set denoting the coordi-
nates of the points in the catchment basin associated with regional minimum
Mi (recall that the points in any catchment basin form a connected component).
The notation min and max will be used to denote the minimum and maximum
values of g(x, y). Finally, let T[n] represent the set of coordinates (s, t) for
which g(s, t) 6 n. That is,

T[n] = 5(s, t) ƒ g(s, t) 6 n6 (10.5-1)

Geometrically, T[n] is the set of coordinates of points in g(x, y) lying below
the plane g(x, y) = n.
The topography will be flooded in integer flood increments, from
n = min + 1 to n = max + 1. At any step n of the flooding process, the algo-
rithm needs to know the number of points below the flood depth. Conceptual-
ly, suppose that the coordinates in T[n] that are below the plane g(x, y) = n
are “marked” black, and all other coordinates are marked white. Then when
we look “down” on the xy-plane at any increment n of flooding, we will see a
binary image in which black points correspond to points in the function that
are below the plane g(x, y) = n. This interpretation is quite useful in helping
clarify the following discussion.
Let Cn(Mi) denote the set of coordinates of points in the catchment basin
associated with minimum Mi that are flooded at stage n. With reference to the
discussion in the previous paragraph, Cn(Mi) may be viewed as a binary image
given by

    Cn(Mi) = C(Mi) ∩ T[n]                    (10.5-2)

In other words, Cn(Mi) = 1 at location (x, y) if (x, y) ∈ C(Mi) AND
(x, y) ∈ T[n]; otherwise Cn(Mi) = 0. The geometrical interpretation of this re-
sult is straightforward. We are simply using the AND operator to isolate at
stage n of flooding the portion of the binary image in T[n] that is associated
with regional minimum Mi.
Next, we let C[n] denote the union of the flooded catchment basins at stage n:

    C[n] = ∪_{i=1}^{R} Cn(Mi)                    (10.5-3)

Then C[max + 1] is the union of all catchment basins:

    C[max + 1] = ∪_{i=1}^{R} C(Mi)                    (10.5-4)
It can be shown (Problem 10.41) that the elements in both Cn(Mi) and T[n] are
never replaced during execution of the algorithm, and that the number of ele-
ments in these two sets either increases or remains the same as n increases.
Thus, it follows that C[n - 1] is a subset of C[n]. According to Eqs. (10.5-2)
and (10.5-3), C[n] is a subset of T[n], so it follows that C[n - 1] is a subset of
T[n]. From this we have the important result that each connected component
of C[n - 1] is contained in exactly one connected component of T[n].
The algorithm for finding the watershed lines is initialized with
C[min + 1] = T[min + 1]. The algorithm then proceeds recursively, computing
C[n] from C[n - 1]. A procedure for obtaining C[n] from C[n - 1] is as fol-
lows. Let Q[n] denote the set of connected components in T[n]. Then, for each
connected component q ∈ Q[n], there are three possibilities:
1. q ∩ C[n - 1] is empty.
2. q ∩ C[n - 1] contains one connected component of C[n - 1].
3. q ∩ C[n - 1] contains more than one connected component of C[n - 1].

Construction of C[n] from C[n - 1] depends on which of these three conditions


holds. Condition 1 occurs when a new minimum is encountered, in which case
connected component q is incorporated into C[n - 1] to form C[n]. Condition 2
occurs when q lies within the catchment basin of some regional minimum, in
which case q is incorporated into C[n - 1] to form C[n]. Condition 3 occurs
when all, or part, of a ridge separating two or more catchment basins is en-
countered. Further flooding would cause the water level in these catchment
basins to merge. Thus a dam (or dams if more than two catchment basins are
involved) must be built within q to prevent overflow between the catchment
basins. As explained in the previous section, a one-pixel-thick dam can be con-
structed when needed by dilating q ∩ C[n - 1] with a 3 × 3 structuring
element of 1s, and constraining the dilation to q.
Algorithm efficiency is improved by using only values of n that correspond
to existing intensity values in g(x, y); we can determine these values, as well as
the values of min and max, from the histogram of g(x, y).
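The complete flooding procedure might be rendered as the following didactic (and deliberately unoptimized) sketch — Python with SciPy is assumed, the function name is ours, and condition 3 is handled by growing the competing basins one ring at a time and turning contested pixels into watershed points:

```python
import numpy as np
from scipy import ndimage

def watershed(g):
    """Flood g from its regional minima; returns (basins, ridge) where
    basins is a label image and ridge marks watershed (dam) points."""
    g = np.asarray(g, dtype=int)
    basins = np.zeros(g.shape, dtype=int)      # 0 = not yet flooded
    ridge = np.zeros(g.shape, dtype=bool)
    nxt = 1                                    # next basin label
    conn = np.ones((3, 3), dtype=bool)         # 8-connectivity
    for n in np.unique(g):                     # only existing intensities
        comps, k = ndimage.label(g <= n, conn)
        for q in range(1, k + 1):
            mask = comps == q
            inside = np.unique(basins[mask & (basins > 0)])
            if inside.size == 0:               # condition 1: new minimum
                basins[mask & ~ridge] = nxt
                nxt += 1
            elif inside.size == 1:             # condition 2: extend basin
                basins[mask & ~ridge & (basins == 0)] = inside[0]
            else:                              # condition 3: build dams
                free = mask & (basins == 0) & ~ridge
                while free.any():
                    claims = np.zeros(g.shape, dtype=int)
                    multi = np.zeros(g.shape, dtype=bool)
                    for lab in inside:
                        reach = (ndimage.binary_dilation(basins == lab,
                                                         conn) & free)
                        multi |= reach & (claims > 0)
                        claims[reach] = lab
                    grew = claims > 0
                    if not grew.any():
                        break                  # unreachable plateau pixels
                    basins[grew & ~multi] = claims[grew & ~multi]
                    ridge |= multi
                    free &= ~grew
    return basins, ridge
```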

FIGURE 10.56 (a) Image of blobs. (b) Image gradient. (c) Watershed lines.
(d) Watershed lines superimposed on original image. (Courtesy of Dr. S. Beucher,
CMM/Ecole des Mines de Paris.)

EXAMPLE 10.25: Illustration of the watershed segmentation algorithm.
■ Consider the image and its gradient in Figs. 10.56(a) and (b), respectively.
Application of the watershed algorithm just described yielded the watershed
lines (white paths) of the gradient image in Fig. 10.56(c). These segmentation
boundaries are shown superimposed on the original image in Fig. 10.56(d). As
noted at the beginning of this section, the segmentation boundaries have the
important property of being connected paths. ■

10.5.4 The Use of Markers


Direct application of the watershed segmentation algorithm in the form
discussed in the previous section generally leads to oversegmentation due to
noise and other local irregularities of the gradient. As Fig. 10.57 shows, over-
segmentation can be serious enough to render the result of the algorithm vir-
tually useless. In this case, this means a large number of segmented regions. A
practical solution to this problem is to limit the number of allowable regions
by incorporating a preprocessing stage designed to bring additional knowl-
edge into the segmentation procedure.
An approach used to control oversegmentation is based on the concept of
markers. A marker is a connected component belonging to an image. We have
internal markers, associated with objects of interest, and external markers, as-
sociated with the background. A procedure for marker selection typically will
consist of two principal steps: (1) preprocessing; and (2) definition of a set of
criteria that markers must satisfy. To illustrate, consider Fig. 10.57(a) again.

FIGURE 10.57 (a) Electrophoresis image. (b) Result of applying the watershed
segmentation algorithm to the gradient image. Oversegmentation is evident.
(Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)

Part of the problem that led to the oversegmented result in Fig. 10.57(b) is the
large number of potential minima. Because of their size, many of these minima
are irrelevant detail. As has been pointed out several times in earlier discus-
sions, an effective method for minimizing the effect of small spatial detail is to
filter the image with a smoothing filter. This is an appropriate preprocessing
scheme in this particular case.
Suppose that we define an internal marker as (1) a region that is surround-
ed by points of higher “altitude”; (2) such that the points in the region form a
connected component; and (3) in which all the points in the connected com-
ponent have the same intensity value. After the image was smoothed, the in-
ternal markers resulting from this definition are shown as light gray, bloblike
regions in Fig. 10.58(a). Next, the watershed algorithm was applied to the

FIGURE 10.58 (a) Image showing internal markers (light gray regions) and external
markers (watershed lines). (b) Result of segmentation. Note the improvement over
Fig. 10.57(b). (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)

smoothed image, under the restriction that these internal markers be the only
allowed regional minima. Figure 10.58(a) shows the resulting watershed lines.
These watershed lines are defined as the external markers. Note that the
points along the watershed line pass along the highest points between neigh-
boring markers.
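A crude sketch of this internal-marker definition (SciPy assumed; the 3 × 3 neighborhood-minimum test below is only a rough stand-in for a true regional-minimum operator and can misfire on plateaus):

```python
import numpy as np
from scipy import ndimage

def internal_markers(image, sigma=1.0):
    """Smooth the image, then mark pixels that are minima of their own
    3x3 neighborhood; connected groups of such pixels are the markers."""
    smooth = ndimage.gaussian_filter(image.astype(float), sigma)
    minima = smooth == ndimage.minimum_filter(smooth, size=3)
    markers, n = ndimage.label(minima)
    return markers, n
```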
The external markers in Fig. 10.58(a) effectively partition the image into
regions, with each region containing a single internal marker and part of the
background. The problem is thus reduced to partitioning each of these regions
into two: a single object and its background. We can bring to bear on this sim-
plified problem many of the segmentation techniques discussed earlier in this
chapter. Another approach is simply to apply the watershed segmentation
algorithm to each individual region. In other words, we simply take the gradient
of the smoothed image [as in Fig. 10.56(b)] and then restrict the algorithm to
operate on a single watershed that contains the marker in that particular re-
gion. The result obtained using this approach is shown in Fig. 10.58(b). The
improvement over the image in Fig. 10.57(b) is evident.
Marker selection can range from simple procedures based on intensity
values and connectivity, as was just illustrated, to more complex descriptions in-
volving size, shape, location, relative distances, texture content, and so on (see
Chapter 11 regarding descriptors).The point is that using markers brings a priori
knowledge to bear on the segmentation problem. The reader is reminded that
humans often aid segmentation and higher-level tasks in everyday vision by
using a priori knowledge, one of the most familiar being the use of context.Thus,
the fact that segmentation by watersheds offers a framework that can make ef-
fective use of this type of knowledge is a significant advantage of this method.

10.6 The Use of Motion in Segmentation


Motion is a powerful cue used by humans and many other animals to extract
objects or regions of interest from a background of irrelevant detail. In imag-
ing applications, motion arises from a relative displacement between the sens-
ing system and the scene being viewed, such as in robotic applications,
autonomous navigation, and dynamic scene analysis. In the following sections
we consider the use of motion in segmentation both spatially and in the fre-
quency domain.

10.6.1 Spatial Techniques


Basic approach
One of the simplest approaches for detecting changes between two image
frames f(x, y, ti) and f(x, y, tj) taken at times ti and tj, respectively, is to com-
pare the two images pixel by pixel. One procedure for doing this is to form a
difference image. Suppose that we have a reference image containing only sta-
tionary components. Comparing this image against a subsequent image of the
same scene, but including a moving object, results in the difference of the two
images canceling the stationary elements, leaving only nonzero entries that
correspond to the nonstationary image components.
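This frame-differencing idea can be sketched directly (NumPy assumed; the names are ours, and the threshold T is our addition here, used to suppress small variations due to noise):

```python
import numpy as np

def difference_image(f_i, f_j, T=0):
    """True where frames f_i and f_j (same scene, times t_i and t_j)
    differ by more than T; stationary components cancel out."""
    return np.abs(f_j.astype(int) - f_i.astype(int)) > T
```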
