5 Segmentation Chapter
original image. The lines were obtained by joining all gaps not exceeding
20% of the image height (approximately 100 pixels). These lines clearly corre-
spond to the edges of the runway of interest.
Note that the only key knowledge needed to solve this problem was the ori-
entation of the runway and the observer’s position relative to it. In other
words, a vehicle navigating autonomously would know that if the runway of in-
terest faces north, and the vehicle’s direction of travel also is north, the runway
should appear vertically in the image. Other relative orientations are handled
in a similar manner. The orientations of runways throughout the world are
available in flight charts, and direction of travel is easily obtainable using GPS
(Global Positioning System) information. This information also could be used
to compute the distance between the vehicle and the runway, thus allowing es-
timates of parameters such as expected length of lines relative to image size, as
we did in this example. ■
10.3 Thresholding
Because of its intuitive properties, simplicity of implementation, and computa-
tional speed, image thresholding enjoys a central position in applications of
image segmentation. Thresholding was introduced in Section 3.1.1, and we
have used it in various discussions since then. In this section, we discuss thresh-
olding in a more formal way and develop techniques that are considerably
more general than what has been presented thus far.
10.3.1 Foundation
In the previous section, regions were identified by first finding edge segments
and then attempting to link the segments into boundaries. In this section, we
discuss techniques for partitioning images directly into regions based on inten-
sity values and/or properties of these values.
The basics of intensity thresholding
Suppose that the intensity histogram in Fig. 10.35(a) corresponds to an image,
f(x, y), composed of light objects on a dark background, in such a way that ob-
ject and background pixels have intensity values grouped into two dominant
modes. One obvious way to extract the objects from the background is to se-
lect a threshold, T, that separates these modes. Then, any point (x, y) in the
image at which f(x, y) > T is called an object point; otherwise, the point is
called a background point. In other words, the segmented image, g(x, y), is
given by
g(x, y) = { 1  if f(x, y) > T
          { 0  if f(x, y) ≤ T          (10.3-1)

Although we follow convention in using 0 intensity for the background and 1 for object pixels, any two distinct values can be used in Eq. (10.3-1).

When T is a constant applicable over an entire image, the process given in this equation is referred to as global thresholding. When the value of T changes over an image, we use the term variable thresholding. The term local or regional thresholding is used sometimes to denote variable thresholding in
10.3 ■ Thresholding 739
a b
FIGURE 10.35 Intensity histograms that can be partitioned (a) by a single threshold, and (b) by dual thresholds.
g(x, y) = { a  if f(x, y) > T2
          { b  if T1 < f(x, y) ≤ T2          (10.3-2)
          { c  if f(x, y) ≤ T1
where a, b, and c are any three distinct intensity values. We discuss dual thresh-
olding in Section 10.3.6. Segmentation problems requiring more than two
thresholds are difficult (often impossible) to solve, and better results usually
are obtained using other methods, such as variable thresholding, as discussed
in Section 10.3.7, or region growing, as discussed in Section 10.4.
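As array operations, Eqs. (10.3-1) and (10.3-2) reduce to elementwise comparisons. The following NumPy sketch is illustrative only; the threshold values, the label values a = 2, b = 1, c = 0, and the toy image are assumptions, not taken from the text.

```python
import numpy as np

def threshold_single(f, T):
    """Eq. (10.3-1): 1 for object points (f > T), 0 for background."""
    return (f > T).astype(np.uint8)

def threshold_dual(f, T1, T2, a=2, b=1, c=0):
    """Eq. (10.3-2): a where f > T2, b where T1 < f <= T2, c where f <= T1."""
    g = np.full(f.shape, c, dtype=np.uint8)
    g[(f > T1) & (f <= T2)] = b
    g[f > T2] = a
    return g

# Toy image: dark background, mid-gray region, bright object.
f = np.array([[10, 120, 240],
              [10, 120, 240]], dtype=np.uint8)
g1 = threshold_single(f, T=100)         # two classes
g2 = threshold_dual(f, T1=60, T2=180)   # three classes
```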
Based on the preceding discussion, we may infer intuitively that the success
of intensity thresholding is directly related to the width and depth of the val-
ley(s) separating the histogram modes. In turn, the key factors affecting the
properties of the valley(s) are: (1) the separation between peaks (the further
apart the peaks are, the better the chances of separating the modes); (2) the
noise content in the image (the modes broaden as noise increases); (3) the rel-
ative sizes of objects and background; (4) the uniformity of the illumination
source; and (5) the uniformity of the reflectance properties of the image.
a b c
d e f
FIGURE 10.36 (a) Noiseless 8-bit image. (b) Image with additive Gaussian noise of mean 0 and standard
deviation of 10 intensity levels. (c) Image with additive Gaussian noise of mean 0 and standard deviation of
50 intensity levels. (d)–(f) Corresponding histograms.
modes. Figure 10.36(b) shows the original image corrupted by Gaussian noise
of zero mean and a standard deviation of 10 intensity levels. Although the cor-
responding histogram modes are now broader [Fig. 10.36(e)], their separation
is large enough so that the depth of the valley between them is sufficient to
make the modes easy to separate. A threshold placed midway between the two
peaks would do a nice job of segmenting the image. Figure 10.36(c) shows the
result of corrupting the image with Gaussian noise of zero mean and a stan-
dard deviation of 50 intensity levels. As the histogram in Fig. 10.36(f) shows,
the situation is much more serious now, as there is no way to differentiate be-
tween the two modes. Without additional processing (such as the methods dis-
cussed in Sections 10.3.4 and 10.3.5) we have little hope of finding a suitable
threshold for segmenting this image.
a b c
d e f
FIGURE 10.37 (a) Noisy image. (b) Intensity ramp in the range [0.2, 0.6]. (c) Product of (a) and (b).
(d)–(f) Corresponding histograms.
where separation of the modes without additional processing (see Sections 10.3.4 and 10.3.5) is no longer possible. Similar results would be obtained if the illumination was perfectly uniform, but the reflectance of the image was not, due, for example, to natural reflectivity variations in the surface of objects and/or background.

In theory, the histogram of a ramp image is uniform. In practice, achieving perfect uniformity depends on the size of the image and the number of intensity bits. For example, a 256 × 256, 256-level ramp image has a uniform histogram, but a 256 × 257 ramp image with the same number of intensities does not.

The key point in the preceding paragraph is that illumination and reflectance play a central role in the success of image segmentation using thresholding or other segmentation techniques. Therefore, controlling these factors when it is possible to do so should be the first step considered in the solution of a segmentation problem. There are three basic approaches to the problem when
control over these factors is not possible. One is to correct the shading pattern
directly. For example, nonuniform (but fixed) illumination can be corrected by
multiplying the image by the inverse of the pattern, which can be obtained by
imaging a flat surface of constant intensity. The second approach is to attempt
to correct the global shading pattern via processing using, for example, the
top-hat transformation introduced in Section 9.6.3. The third approach is to
“work around” nonuniformities using variable thresholding, as discussed in
Section 10.3.7.
EXAMPLE 10.15: Global thresholding.

■ Figure 10.38 shows an example of segmentation based on a threshold estimated using the preceding algorithm. Figure 10.38(a) is the original image, and Fig. 10.38(b) is the image histogram, showing a distinct valley. Application of the preceding iterative algorithm resulted in the threshold T = 125.4 after three iterations, starting with T = m (the average image intensity), and using ΔT = 0. Figure 10.38(c) shows the result obtained using T = 125 to segment the original image. As expected from the clear separation of modes in the histogram, the segmentation between object and background was quite effective. ■
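The iterative algorithm the example applies (developed in Section 10.3.2) can be sketched as follows; the synthetic bimodal data are an assumption for illustration.

```python
import numpy as np

def basic_global_threshold(f, delta_T=0.0):
    """Iterative global thresholding (Section 10.3.2): start with T equal to
    the mean intensity; repeatedly split the pixels at T, set T to the
    midpoint of the two group means, and stop when T changes by <= delta_T."""
    T = f.mean()
    while True:
        m1 = f[f > T].mean()     # mean of the provisional object pixels
        m2 = f[f <= T].mean()    # mean of the provisional background pixels
        T_new = 0.5 * (m1 + m2)
        if abs(T_new - T) <= delta_T:
            return T_new
        T = T_new

# Bimodal toy data: background near 20, object near 200.
f = np.array([20, 22, 18, 21, 200, 198, 202, 199], dtype=float)
T = basic_global_threshold(f)
```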
a b c
FIGURE 10.38 (a) Noisy fingerprint. (b) Histogram. (c) Segmented result using a global threshold (the border
was added for clarity). (Original courtesy of the National Institute of Standards and Technology.)
Σ_{i=0}^{L-1} pi = 1,   pi ≥ 0          (10.3-3)

P1(k) = Σ_{i=0}^{k} pi          (10.3-4)
Viewed another way, this is the probability of class C1 occurring. For example,
if we set k = 0, the probability of class C1 having any pixels assigned to it is
zero. Similarly, the probability of class C2 occurring is
P2(k) = Σ_{i=k+1}^{L-1} pi = 1 − P1(k)          (10.3-5)
From Eq. (3.3-18), the mean intensity value of the pixels assigned to class C1 is

m1(k) = Σ_{i=0}^{k} i P(i|C1)
      = Σ_{i=0}^{k} i P(C1|i) P(i) / P(C1)          (10.3-6)
      = (1/P1(k)) Σ_{i=0}^{k} i pi
where P1(k) is given in Eq. (10.3-4). The term P(i|C1) in the first line of Eq. (10.3-6) is the probability of value i, given that i comes from class C1. The second line in the equation follows from Bayes' formula:

P(A|B) = P(B|A) P(A) / P(B)

The third line follows from the fact that P(C1|i), the probability of C1 given i, is 1 because we are dealing only with values of i from class C1. Also, P(i) is the probability of the ith value, which is simply the ith component of the histogram, pi. Finally, P(C1) is the probability of class C1, which we know from Eq. (10.3-4) is equal to P1(k).
Similarly, the mean intensity value of the pixels assigned to class C2 is

m2(k) = Σ_{i=k+1}^{L-1} i P(i|C2) = (1/P2(k)) Σ_{i=k+1}^{L-1} i pi          (10.3-7)
The cumulative mean intensity up to level k is

m(k) = Σ_{i=0}^{k} i pi          (10.3-8)

and the average intensity of the entire image (i.e., the global mean) is given by

mG = Σ_{i=0}^{L-1} i pi          (10.3-9)
The validity of the following two equations can be verified by direct substitution of the preceding results:

η = σB² / σG²          (10.3-12)

where σG² is the global variance [i.e., the intensity variance of all the pixels in the image, as given in Eq. (3.3-19)],

σG² = Σ_{i=0}^{L-1} (i − mG)² pi          (10.3-13)

Introducing the dependence on k explicitly, we have

η(k) = σB²(k) / σG²          (10.3-16)

and

σB²(k) = [mG P1(k) − m(k)]² / ( P1(k) [1 − P1(k)] )          (10.3-17)
Then, the optimum threshold is the value, k*, that maximizes σB²(k):

σB²(k*) = max_{0 ≤ k ≤ L−1} σB²(k)          (10.3-18)

In other words, to find k* we simply evaluate Eq. (10.3-18) for all integer values of k (such that the condition 0 < P1(k) < 1 holds) and select the value of k that yields the maximum σB²(k). If the maximum exists for more than one value of k, it is customary to average the various values of k for which σB²(k) is maximum. It can be shown (Problem 10.33) that a maximum always exists, subject to the condition that 0 < P1(k) < 1. Evaluating Eqs. (10.3-17) and (10.3-18) for all values of k is a relatively inexpensive computational procedure, because the maximum number of integer values that k can have is L.
Once k* has been obtained, the input image f(x, y) is segmented as before:

g(x, y) = { 1  if f(x, y) > k*
          { 0  if f(x, y) ≤ k*          (10.3-19)
Otsu's algorithm may be summarized as follows:

1. Compute the normalized histogram of the input image. Denote the components of the histogram by pi, i = 0, 1, 2, …, L − 1.
2. Compute the cumulative sums, P1(k), for k = 0, 1, 2, …, L − 1, using Eq. (10.3-4).
3. Compute the cumulative means, m(k), for k = 0, 1, 2, …, L − 1, using Eq. (10.3-8).
4. Compute the global intensity mean, mG, using Eq. (10.3-9).
5. Compute the between-class variance, σB²(k), for k = 0, 1, 2, …, L − 1, using Eq. (10.3-17).
6. Obtain the Otsu threshold, k*, as the value of k for which σB²(k) is maximum. If the maximum is not unique, obtain k* by averaging the values of k corresponding to the various maxima detected.
7. Obtain the separability measure, η*, by evaluating Eq. (10.3-16) at k = k*.
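Steps 1 through 7 map directly onto a few lines of NumPy. The following is a minimal sketch assuming an 8-bit image (L = 256); the synthetic two-level test image is not from the text.

```python
import numpy as np

def otsu_threshold(f, L=256):
    """Otsu's method following Steps 1-7 above; returns (k*, eta*)."""
    # Step 1: normalized histogram p_i.
    p = np.bincount(f.ravel(), minlength=L).astype(float)
    p /= p.sum()
    i = np.arange(L, dtype=float)
    P1 = np.cumsum(p)        # Step 2: cumulative sums P1(k)
    m = np.cumsum(i * p)     # Step 3: cumulative means m(k)
    mG = m[-1]               # Step 4: global mean
    # Step 5: between-class variance, Eq. (10.3-17), where 0 < P1(k) < 1.
    valid = (P1 > 0) & (P1 < 1)
    s2B = np.zeros(L)
    s2B[valid] = (mG * P1[valid] - m[valid]) ** 2 / (P1[valid] * (1 - P1[valid]))
    # Step 6: k* maximizes s2B; ties are averaged.
    k_star = np.flatnonzero(s2B == s2B.max()).mean()
    # Step 7: separability measure eta* = s2B(k*) / global variance.
    s2G = np.sum((i - mG) ** 2 * p)
    eta_star = s2B[int(round(k_star))] / s2G
    return k_star, eta_star

# Synthetic two-level image; any threshold between the modes separates it.
f = np.array([10] * 50 + [200] * 50, dtype=np.uint8)
k_star, eta_star = otsu_threshold(f)
```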
EXAMPLE 10.16: Optimum global thresholding using Otsu's method.

■ Figure 10.39(a) shows an optical microscope image of polymersome cells, and Fig. 10.39(b) shows its histogram. The objective of this example is to segment the molecules from the background. Figure 10.39(c) is the result of using the basic global thresholding algorithm developed in the previous section. Because the histogram has no distinct valleys and the intensity difference between the background and objects is small, the algorithm failed to achieve the desired segmentation. Figure 10.39(d) shows the result obtained using Otsu's method. This result obviously is superior to Fig. 10.39(c). The threshold value computed by the basic algorithm was 169, while the threshold computed by Otsu's method was 181, which is closer to the lighter areas in the image defining the cells. The separability measure η was 0.467.

Polymersomes are cells artificially engineered using polymers. Polymersomes are invisible to the human immune system and can be used, for example, to deliver medication to targeted regions of the body.
a b
c d
FIGURE 10.39 (a) Original image. (b) Histogram (high peaks were clipped to highlight details in the lower values). (c) Segmentation result using the basic global algorithm from Section 10.3.2. (d) Result obtained using Otsu's method. (Original image courtesy of Professor Daniel A. Hammer, the University of Pennsylvania.)
a b c
d e f
FIGURE 10.40 (a) Noisy image from Fig. 10.36 and (b) its histogram. (c) Result obtained using Otsu’s method.
(d) Noisy image smoothed using a 5 × 5 averaging mask and (e) its histogram. (f) Result of thresholding using
Otsu’s method.
situations such as this, the approach discussed in the following section is more
likely to succeed.
a b c
d e f
FIGURE 10.41 (a) Noisy image and (b) its histogram. (c) Result obtained using Otsu’s method. (d) Noisy
image smoothed using a 5 × 5 averaging mask and (e) its histogram. (f) Result of thresholding using Otsu's
method. Thresholding failed in both cases.
The approach just discussed assumes that the edges between objects and
background are known. This information clearly is not available during segmen-
tation, as finding a division between objects and background is precisely what
segmentation is all about. However, with reference to the discussion in Section
10.2, an indication of whether a pixel is on an edge may be obtained by comput-
ing its gradient or Laplacian. For example, the average value of the Laplacian is
0 at the transition of an edge (see Fig. 10.10), so the valleys of histograms formed
from the pixels selected by a Laplacian criterion can be expected to be sparsely
populated. This property tends to produce the desirable deep valleys discussed
above. In practice, comparable results typically are obtained using either the gradient or Laplacian images, with the latter being favored because it is computationally more attractive and is also an isotropic edge detector.

It is possible to modify this algorithm so that both the magnitude of the gradient and the absolute value of the Laplacian images are used. In this case, we would specify a threshold for each image and form the logical OR of the two results to obtain the marker image. This approach is useful when more control is desired over the points deemed to be valid edge points.

The preceding discussion is summarized in the following algorithm, where f(x, y) is the input image:

1. Compute an edge image as either the magnitude of the gradient, or absolute value of the Laplacian, of f(x, y) using any of the methods discussed in Section 10.2.
2. Specify a threshold value, T.
3. Threshold the image from Step 1 using the threshold from Step 2 to produce a binary image, gT(x, y). This image is used as a mask image in the following step to select pixels from f(x, y) corresponding to "strong" edge pixels.
4. Compute a histogram using only the pixels in f(x, y) that correspond to the locations of the 1-valued pixels in gT(x, y).
5. Use the histogram from Step 4 to segment f(x, y) globally using, for example, Otsu's method.
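A sketch of the steps above, continued with Otsu's threshold computed from the histogram of the masked pixels. The 4-neighbor Laplacian, the `>=` comparison in the mask, the fallback for an empty mask, and the toy image are implementation assumptions.

```python
import numpy as np

def laplacian_abs(f):
    """Absolute value of a simple 4-neighbor Laplacian (edge-replicated pad)."""
    fp = np.pad(f.astype(float), 1, mode='edge')
    lap = (fp[:-2, 1:-1] + fp[2:, 1:-1] + fp[1:-1, :-2] + fp[1:-1, 2:]
           - 4.0 * fp[1:-1, 1:-1])
    return np.abs(lap)

def edge_guided_threshold(f, percentile=99.7):
    """Steps 1-5: mask pixels near strong edges, then run Otsu's method on
    the histogram of only those pixels and apply the threshold globally."""
    edge = laplacian_abs(f)                # Step 1: edge image
    T = np.percentile(edge, percentile)    # Step 2: high-percentile threshold
    mask = edge >= T                       # Step 3: marker image g_T
    samples = f[mask] if mask.any() else f.ravel()
    # Steps 4-5: Otsu on the selected samples only (deeper histogram valleys).
    p = np.bincount(samples.ravel(), minlength=256).astype(float)
    p /= p.sum()
    i = np.arange(256.0)
    P1, m = np.cumsum(p), np.cumsum(i * p)
    mG = m[-1]
    valid = (P1 > 0) & (P1 < 1)
    s2 = np.zeros(256)
    s2[valid] = (mG * P1[valid] - m[valid]) ** 2 / (P1[valid] * (1 - P1[valid]))
    k = int(np.flatnonzero(s2 == s2.max()).mean())
    return (f > k).astype(np.uint8), k

# Toy image: a 4 x 4 bright square on a dark background.
f = np.zeros((10, 10), dtype=np.uint8)
f[3:7, 3:7] = 200
g, k = edge_guided_threshold(f, percentile=75)   # small image: low percentile
```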
EXAMPLE 10.17: Using edge information based on the gradient to improve global thresholding.

■ Figures 10.42(a) and (b) show the image and histogram from Fig. 10.41. You saw that this image could not be segmented by smoothing followed by thresholding. The objective of this example is to solve the problem using edge information. Figure 10.42(c) is the gradient magnitude image thresholded at the
a b c
d e f
FIGURE 10.42 (a) Noisy image from Fig. 10.41(a) and (b) its histogram. (c) Gradient magnitude image
thresholded at the 99.7 percentile. (d) Image formed as the product of (a) and (c). (e) Histogram of the
nonzero pixels in the image in (d). (f) Result of segmenting image (a) with the Otsu threshold based on the
histogram in (e). The threshold was 134, which is approximately midway between the peaks in this histogram.
752 Chapter 10 ■ Image Segmentation
99.7 percentile. Figure 10.42(d) is the image formed by multiplying this (mask)
image by the input image. Figure 10.42(e) is the histogram of the nonzero ele-
ments in Fig. 10.42(d). Note that this histogram has the important features dis-
cussed earlier; that is, it has reasonably symmetrical modes separated by a
deep valley. Thus, while the histogram of the original noisy image offered no
hope for successful thresholding, the histogram in Fig. 10.42(e) indicates that
thresholding of the small object from the background is indeed possible. The
result in Fig. 10.42(f) shows that indeed this is the case. This image was ob-
tained by using Otsu’s method to obtain a threshold based on the histogram in
Fig. 10.42(e) and then applying this threshold globally to the noisy image in
Fig. 10.42(a). The result is nearly perfect. ■
EXAMPLE 10.18: Using edge information based on the Laplacian to improve global thresholding.

■ In this example we consider a more complex thresholding problem. Figure 10.43(a) shows an 8-bit image of yeast cells in which we wish to use global thresholding to obtain the regions corresponding to the bright spots. As a starting point, Fig. 10.43(b) shows the image histogram, and Fig. 10.43(c) is the result obtained using Otsu's method directly on the image, using the histogram
shown. We see that Otsu’s method failed to achieve the original objective of
detecting the bright spots, and, although the method was able to isolate some
of the cell regions themselves, several of the segmented regions on the right
are not disjoint. The threshold computed by the Otsu method was 42 and the
separability measure was 0.636.
Figure 10.43(d) shows the image gT(x, y) obtained by computing the absolute
value of the Laplacian image and then thresholding it with T set to 115 on an
intensity scale in the range [0, 255]. This value of T corresponds approximately
to the 99.5 percentile of the values in the absolute Laplacian image, so thresh-
olding at this level should result in a sparse set of pixels, as Fig. 10.43(d) shows.
Note in this image how the points cluster near the edges of the bright spots, as
expected from the preceding discussion. Figure 10.43(e) is the histogram of the
nonzero pixels in the product of (a) and (d). Finally, Fig. 10.43(f) shows the re-
sult of globally segmenting the original image using Otsu’s method based on
the histogram in Fig. 10.43(e). This result agrees with the locations of the
bright spots in the image. The threshold computed by the Otsu method was
115 and the separability measure was 0.762, both of which are higher than the
values obtained by using the original histogram.
By varying the percentile at which the threshold is set we can even improve
on the segmentation of the cell regions. For example, Fig. 10.44 shows the re-
sult obtained using the same procedure as in the previous paragraph, but with
the threshold set at 55, which is approximately 5% of the maximum value of
the absolute Laplacian image. This value is at the 53.9 percentile of the values
in that image. This result clearly is superior to the result in Fig. 10.43(c)
obtained using Otsu’s method with the histogram of the original image. ■
a b c
d e f
FIGURE 10.43 (a) Image of yeast cells. (b) Histogram of (a). (c) Segmentation of (a) with Otsu’s method
using the histogram in (b). (d) Thresholded absolute Laplacian. (e) Histogram of the nonzero pixels in the
product of (a) and (d). (f) Original image thresholded using Otsu’s method based on the histogram in (e).
(Original image courtesy of Professor Susan L. Forsburg, University of Southern California.)
FIGURE 10.44
Image in
Fig. 10.43(a)
segmented using
the same
procedure as
explained in
Figs. 10.43(d)–(f),
but using a lower
value to threshold
the absolute
Laplacian image.
where

Pk = Σ_{i∈Ck} pi          (10.3-22)

and

mk = (1/Pk) Σ_{i∈Ck} i pi          (10.3-23)

and mG is the global mean given in Eq. (10.3-9). The K classes are separated by K − 1 thresholds whose values, k1*, k2*, …, k(K−1)*, are the values that maximize Eq. (10.3-21):

σB²(k1*, k2*, …, k(K−1)*) = max_{0 < k1 < k2 < … < L−1} σB²(k1, k2, …, k(K−1))          (10.3-24)
Although this result is perfectly general, it begins to lose meaning as the num-
ber of classes increases, because we are dealing with only one variable (inten-
sity). In fact, the between-class variance usually is cast in terms of multiple
variables expressed as vectors (Fukunaga [1972]). In practice, using multiple
global thresholding is considered a viable approach when there is reason to
believe that the problem can be solved effectively with two thresholds. Appli-
cations that require more than two thresholds generally are solved using more
than just intensity values. Instead, the approach is to use additional descriptors
(e.g., color) and the application is cast as a pattern recognition problem, as ex-
plained in Section 10.3.8.
Thresholding with two thresholds sometimes is referred to as hysteresis thresholding.

For three classes consisting of three intensity intervals (which are separated by two thresholds) the between-class variance is given by:

σB² = P1 (m1 − mG)² + P2 (m2 − mG)² + P3 (m3 − mG)²          (10.3-25)
where

P1 = Σ_{i=0}^{k1} pi
P2 = Σ_{i=k1+1}^{k2} pi          (10.3-26)
P3 = Σ_{i=k2+1}^{L-1} pi
and

m1 = (1/P1) Σ_{i=0}^{k1} i pi
m2 = (1/P2) Σ_{i=k1+1}^{k2} i pi          (10.3-27)
m3 = (1/P3) Σ_{i=k2+1}^{L-1} i pi
We see that the P and m terms and, therefore σB², are functions of k1 and k2. The two optimum threshold values, k1* and k2*, are the values that maximize σB²(k1, k2). In other words, as in the single-threshold case discussed in Section 10.3.3, we find the optimum thresholds by finding

σB²(k1*, k2*) = max_{0 < k1 < k2 < L−1} σB²(k1, k2)          (10.3-28)
The procedure starts by selecting the first value of k1 (that value is 1 because looking for a threshold at 0 intensity makes no sense; also, keep in mind that the increment values are integers because we are dealing with intensities). Next, k2 is incremented through all its values greater than k1 and less than L − 1 (i.e., k2 = k1 + 1, …, L − 2). Then k1 is incremented to its next value and k2 is incremented again through all its values greater than k1. This procedure is repeated until k1 = L − 3. The result of this process is a 2-D array, σB²(k1, k2), and the last step is to look for the maximum value in this array. The values of k1 and k2 corresponding to that maximum are the optimum thresholds, k1* and k2*. If there are several maxima, the corresponding values of k1 and k2 are averaged to obtain the final thresholds. The thresholded image is then given by

g(x, y) = { a  if f(x, y) ≤ k1*
          { b  if k1* < f(x, y) ≤ k2*          (10.3-31)
          { c  if f(x, y) > k2*
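The exhaustive search just described can be sketched as a double loop over (k1, k2). The three-level toy image is an assumption for illustration.

```python
import numpy as np

def otsu_two_thresholds(f, L=256):
    """Exhaustive search for the dual Otsu thresholds k1* < k2* maximizing
    the three-class between-class variance of Eq. (10.3-25)."""
    p = np.bincount(f.ravel(), minlength=L).astype(float)
    p /= p.sum()
    i = np.arange(L, dtype=float)
    cp = np.concatenate(([0.0], np.cumsum(p)))      # cp[k+1] = p_0 + ... + p_k
    cm = np.concatenate(([0.0], np.cumsum(i * p)))  # cm[k+1] = sum of i * p_i
    mG = cm[-1]
    best, ties = -1.0, []
    for k1 in range(1, L - 2):                  # k1 = 1, ..., L-3
        for k2 in range(k1 + 1, L - 1):         # k2 = k1+1, ..., L-2
            P1 = cp[k1 + 1]
            P2 = cp[k2 + 1] - cp[k1 + 1]
            P3 = cp[-1] - cp[k2 + 1]
            if P1 <= 0.0 or P2 <= 0.0 or P3 <= 0.0:
                continue
            m1 = cm[k1 + 1] / P1
            m2 = (cm[k2 + 1] - cm[k1 + 1]) / P2
            m3 = (cm[-1] - cm[k2 + 1]) / P3
            s2B = (P1 * (m1 - mG) ** 2 + P2 * (m2 - mG) ** 2
                   + P3 * (m3 - mG) ** 2)
            if s2B > best:
                best, ties = s2B, [(k1, k2)]
            elif s2B == best:
                ties.append((k1, k2))
    k1s = sum(t[0] for t in ties) / len(ties)   # average ties, as in the text
    k2s = sum(t[1] for t in ties) / len(ties)
    return k1s, k2s

# Three-mode toy image; any k1 in [20,127] and k2 in [128,229] separates it.
f = np.array([20] * 30 + [128] * 30 + [230] * 30, dtype=np.uint8)
k1, k2 = otsu_two_thresholds(f)
```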
EXAMPLE 10.19: Multiple global thresholding.

■ Figure 10.45(a) shows an image of an iceberg. The objective of this example is to segment the image into three regions: the dark background, the illuminated area of the iceberg, and the area in shadows. It is evident from the image histogram in Fig. 10.45(b) that two thresholds are required to solve this problem. The procedure discussed above resulted in the thresholds k1* = 80 and k2* = 177, which we note from Fig. 10.45(b) are near the centers
of the two histogram valleys. Figure 10.45(c) is the segmentation that result-
ed using these two thresholds in Eq. (10.3-31). The separability measure was
0.954. The principal reason this example worked out so well can be traced to
the histogram having three distinct modes separated by reasonably wide,
deep valleys. ■
Image partitioning
One of the simplest approaches to variable thresholding is to subdivide an
image into nonoverlapping rectangles. This approach is used to compensate
for non-uniformities in illumination and/or reflectance. The rectangles are
chosen small enough so that the illumination of each is approximately uni-
form. We illustrate this approach with an example.
a b c
FIGURE 10.45 (a) Image of iceberg. (b) Histogram. (c) Image segmented into three regions using dual Otsu
thresholds. (Original image courtesy of NOAA.)
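The partitioning idea just described can be sketched by running Otsu's method independently on each rectangle. The 2 × 3 tile grid and the toy shaded image are assumptions for illustration.

```python
import numpy as np

def otsu_k(samples, L=256):
    """Single Otsu threshold for a 1-D array of uint8 samples."""
    p = np.bincount(samples, minlength=L).astype(float) / samples.size
    i = np.arange(L, dtype=float)
    P1, m = np.cumsum(p), np.cumsum(i * p)
    mG = m[-1]
    valid = (P1 > 0) & (P1 < 1)
    s2 = np.zeros(L)
    s2[valid] = (mG * P1[valid] - m[valid]) ** 2 / (P1[valid] * (1 - P1[valid]))
    return int(np.flatnonzero(s2 == s2.max()).mean())

def partitioned_threshold(f, rows, cols):
    """Subdivide f into rows x cols rectangles and threshold each rectangle
    independently with Otsu's method, to compensate for uneven shading."""
    g = np.zeros_like(f, dtype=np.uint8)
    H, W = f.shape
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * H // rows, (r + 1) * H // rows),
                  slice(c * W // cols, (c + 1) * W // cols))
            tile = f[sl]
            g[sl] = (tile > otsu_k(tile.ravel())).astype(np.uint8)
    return g

# Toy shaded image: each 2 x 2 tile is bimodal, but the levels drift upward
# from left to right, so no single global threshold recovers the pattern.
f = np.array([[10, 60, 100, 160, 200, 250]] * 4, dtype=np.uint8)
g = partitioned_threshold(f, rows=2, cols=3)
```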
EXAMPLE 10.20: Variable thresholding via image partitioning.

■ Figure 10.46(a) shows the image from Fig. 10.37(c), and Fig. 10.46(b) shows its histogram. When discussing Fig. 10.37(c) we concluded that this image could not be segmented with a global threshold, a fact confirmed by Figs. 10.46(c) and (d), which show the results of segmenting the image using the iterative scheme discussed in Section 10.3.2 and Otsu's method, respectively.
Both methods produced comparable results, in which numerous segmentation
errors are visible.
Figure 10.46(e) shows the original image subdivided into six rectangular
regions, and Fig. 10.46(f) is the result of applying Otsu’s global method to each
subimage. Although some errors in segmentation are visible, image subdivi-
sion produced a reasonable result on an image that is quite difficult to seg-
ment. The reason for the improvement is explained easily by analyzing the
histogram of each subimage. As Fig. 10.47 shows, each subimage is character-
ized by a bimodal histogram with a deep valley between the modes, a fact that
we know will lead to effective global thresholding.
Image subdivision generally works well when the objects of interest and the
background occupy regions of reasonably comparable size, as in Fig. 10.46.
When this is not the case, the method typically fails because of the likelihood
of subdivisions containing only object or background pixels. Although this sit-
uation can be addressed by using additional techniques to determine when a
subdivision contains both types of pixels, the logic required to address different
a b c
d e f
FIGURE 10.46 (a) Noisy, shaded image and (b) its histogram. (c) Segmentation of (a) using the iterative
global algorithm from Section 10.3.2. (d) Result obtained using Otsu’s method. (e) Image subdivided into six
subimages. (f) Result of applying Otsu’s method to each subimage individually.
FIGURE 10.47
Histograms of the
six subimages in
Fig. 10.46(e).
g(x, y) = { 1  if f(x, y) > Txy
          { 0  if f(x, y) ≤ Txy          (10.3-35)

where f(x, y) is the input image. This equation is evaluated for all pixel locations in the image, and a different threshold is computed at each location (x, y) using the pixels in the neighborhood Sxy.
g(x, y) = { 1  if Q(local parameters) is true
          { 0  if Q(local parameters) is false          (10.3-36)

Q(σxy, mxy) = { true   if f(x, y) > aσxy AND f(x, y) > bmxy
              { false  otherwise          (10.3-37)

Note that Eq. (10.3-35) is a special case of Eq. (10.3-36), obtained by letting Q be true if f(x, y) > Txy and false otherwise. In this case, the predicate is based simply on the intensity at a point.
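Equation (10.3-37) can be sketched as follows, with the local mean and standard deviation computed over an n × n neighborhood. The constants a and b that work are image dependent (Example 10.21 uses a = 30 and b = 1.5 with the global mean in place of mxy); the values below, and the toy image, are chosen only for illustration.

```python
import numpy as np

def local_predicate_threshold(f, a, b, n=3):
    """Variable thresholding with the predicate of Eq. (10.3-37):
    g(x, y) = 1 where f(x, y) > a*sigma_xy AND f(x, y) > b*m_xy, with the
    mean m_xy and standard deviation sigma_xy taken over an n x n
    neighborhood S_xy (borders handled by edge replication)."""
    fp = np.pad(f.astype(float), n // 2, mode='edge')
    H, W = f.shape
    g = np.zeros((H, W), dtype=np.uint8)
    for x in range(H):
        for y in range(W):
            win = fp[x:x + n, y:y + n]      # neighborhood centered at (x, y)
            if f[x, y] > a * win.std() and f[x, y] > b * win.mean():
                g[x, y] = 1
    return g

# Toy image: flat background at 50 with one bright pixel at 200.
f = np.full((7, 7), 50, dtype=np.uint8)
f[3, 3] = 200
g = local_predicate_threshold(f, a=2.0, b=1.5)
```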
EXAMPLE 10.21: Variable thresholding based on local image properties.

■ Figure 10.48(a) shows the yeast image from Example 10.18. This image has three predominant intensity levels, so it is reasonable to assume that perhaps dual thresholding could be a good segmentation approach. Figure 10.48(b) is the result of using the dual thresholding method explained in Section 10.3.6. As the figure shows, it was possible to isolate the bright areas from the background, but the mid-gray regions on the right side of the image were not segmented properly (recall that we encountered a similar problem with Fig. 10.43(c) in Example 10.18). To illustrate the use of local thresholding, we computed the local standard deviation σxy for all (x, y) in the input image using a neighborhood of size 3 × 3. Figure 10.48(c) shows the result. Note how the faint outer lines correctly delineate the boundaries of the cells. Next, we formed a predicate of the form shown in Eq. (10.3-37) but using the global mean instead of mxy. Choosing the global mean generally gives better results when the background is nearly constant and all the object intensities are above or below the background intensity. The values a = 30 and b = 1.5 were used in completing the specification of the predicate (these values were determined experimentally, as is usually the case in applications such as this). The image was then segmented using Eq. (10.3-36). As Fig. 10.48(d) shows, the result agrees quite closely with the two types of intensity regions prevalent in the input image. Note in particular that all the outer regions were segmented properly and that most of the inner, brighter regions were isolated correctly. ■
a b
c d
FIGURE 10.48 (a) Image from Fig. 10.43. (b) Image segmented using the dual thresholding approach discussed in Section 10.3.6. (c) Image of local standard deviations. (d) Result obtained using local thresholding.
reduce illumination bias. Let z(k+1) denote the intensity of the point encountered in the scanning sequence at step k + 1. The moving average (mean intensity) at this new point is given by

m(k + 1) = (1/n) Σ_{i=k+2−n}^{k+1} zi = m(k) + (1/n) [z(k+1) − z(k+1−n)]          (10.3-38)

where n is the number of points used in computing the average.
EXAMPLE 10.22: Document thresholding using moving averages.

■ Figure 10.49(a) shows an image of handwritten text shaded by a spot intensity pattern. This form of intensity shading is typical of images obtained with a photographic flash. Figure 10.49(b) is the result of segmentation using the Otsu global thresholding method. It is not unexpected that global thresholding could not overcome the intensity variation. Figure 10.49(c) shows successful segmentation with local thresholding using moving averages. A rule of thumb is to let n equal 5 times the average stroke width. In this case, the average width was 4 pixels, so we let n = 20 in Eq. (10.3-38) and used b = 0.5.
As another illustration of the effectiveness of this segmentation approach
we used the same parameters as in the previous paragraph to segment the
image in Fig. 10.50(a), which is corrupted by a sinusoidal intensity variation
typical of the variation that may occur when the power supply in a document
scanner is not grounded properly. As Figs. 10.50(b) and (c) show, the segmen-
tation results are comparable to those in Fig. 10.49.
It is of interest to note that successful segmentation results were obtained in
both cases using the same values for n and b, which shows the relative rugged-
ness of the approach. In general, thresholding based on moving averages
works well when the objects of interest are small (or thin) with respect to the
image size, a condition satisfied by images of typed or handwritten text. ■
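A sketch of thresholding against a moving average computed along a zigzag (boustrophedon) scan of the rows, with Txy = b·mxy as in the examples. The running mean is recomputed from a bounded window rather than updated recursively as in Eq. (10.3-38), which is equivalent but simpler to read; the toy "document" row is an assumption.

```python
import numpy as np
from collections import deque

def moving_average_threshold(f, n, b):
    """Threshold each pixel against b times the moving average of the last n
    intensities encountered along a zigzag scan of the rows, i.e.,
    Eq. (10.3-35) with T_xy = b * m_xy."""
    H, W = f.shape
    g = np.zeros((H, W), dtype=np.uint8)
    window = deque(maxlen=n)                   # last n intensities seen
    for x in range(H):
        cols = range(W) if x % 2 == 0 else range(W - 1, -1, -1)
        for y in cols:
            window.append(float(f[x, y]))
            m = sum(window) / len(window)      # moving average at this point
            g[x, y] = 1 if f[x, y] > b * m else 0
    return g

# Toy "document" row: bright paper (100, then 200) with dark strokes (10, 20).
f = np.array([[100, 100, 10, 100, 100, 100, 200, 200, 20, 200]],
             dtype=np.uint8)
g = moving_average_threshold(f, n=4, b=0.5)
```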
a b c
FIGURE 10.49 (a) Text image corrupted by spot shading. (b) Result of global thresholding using Otsu’s
method. (c) Result of local thresholding using moving averages.
g = { 1  if D(z, a) < T
    { 0  otherwise          (10.3-39)

D(z, a) = ‖z − a‖ = [ (z − a)ᵀ (z − a) ]^(1/2)          (10.3-40)
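Equations (10.3-39) and (10.3-40) can be evaluated per pixel as follows. Interpreting z as an RGB pixel vector and a as a prototype (average) color is an assumption about the surrounding discussion, and the colors and T below are illustrative.

```python
import numpy as np

def color_distance_segment(img, a, T):
    """Eqs. (10.3-39)-(10.3-40): a pixel with vector value z maps to 1 when
    the Euclidean distance D(z, a) = ||z - a|| to the prototype a is < T."""
    diff = img.astype(float) - np.asarray(a, dtype=float)
    D = np.sqrt(np.sum(diff ** 2, axis=-1))    # ||z - a|| at every pixel
    return (D < T).astype(np.uint8)

# Toy 1 x 3 RGB image: pure red, dark red, blue; prototype a = pure red.
img = np.array([[[255, 0, 0], [200, 30, 30], [0, 0, 255]]], dtype=np.uint8)
g = color_distance_segment(img, a=(255, 0, 0), T=100)
```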
a b c
FIGURE 10.50 (a) Text image corrupted by sinusoidal shading. (b) Result of global thresholding using Otsu’s
method. (c) Result of local thresholding using moving averages.
10.4 ■ Region-Based Segmentation 763
EXAMPLE 10.23: Segmentation by region growing.

■ Figure 10.51(a) shows an 8-bit X-ray image of a weld (the horizontal dark region) containing several cracks and porosities (the bright regions running horizontally through the center of the image). We illustrate the use of region
growing by segmenting the defective weld regions. These regions could be
used in applications such as weld inspection, for inclusion in a database of his-
torical studies, or for controlling an automated welding system.
The first order of business is to determine the seed points. From the physics
of the problem, we know that cracks and porosities will attenuate X-rays con-
siderably less than solid welds, so we expect the regions containing these types
of defects to be significantly brighter than other parts of the X-ray image. We
can extract the seed points by thresholding the original image, using a thresh-
old set at a high percentile. Figure 10.51(b) shows the histogram of the image
FIGURE 10.51 (a) X-ray image of a defective weld. (b) Histogram. (c) Initial seed image. (d) Final seed image (the points were enlarged for clarity). (e) Absolute value of the difference between (a) and (c). (f) Histogram of (e). (g) Difference image thresholded using dual thresholds. (h) Difference image thresholded with the smallest of the dual thresholds. (i) Segmentation result obtained by region growing. (Original image courtesy of X-TEK Systems, Ltd.)
and Fig. 10.51(c) shows the thresholded result obtained with a threshold equal
to the 99.9 percentile of intensity values in the image, which in this case was
254 (see Section 10.3.5 regarding percentiles). Figure 10.51(d) shows the result
of morphologically eroding each connected component in Fig. 10.51(c) to a
single point.
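The seed-extraction step just described can be sketched as follows. Thresholding at a high percentile matches the example; collapsing each 8-connected component to its first-visited pixel is a simple stand-in for the morphological erosion of each component to a single point (the helper name is illustrative):

```python
import numpy as np
from collections import deque

def extract_seeds(image, percentile=99.9):
    """Seed selection as in the weld example: threshold at a high
    percentile of the intensities, then collapse each 8-connected
    component of the result to a single seed point (here, the first
    pixel visited in that component)."""
    T = np.percentile(image, percentile)
    mask = image >= T
    seen = np.zeros_like(mask, dtype=bool)
    seeds = []
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not seen[r, c]:
                seeds.append((r, c))      # one seed per component
                q = deque([(r, c)])       # flood-fill the component
                seen[r, c] = True
                while q:
                    y, x = q.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                q.append((ny, nx))
    return seeds
```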
Next, we have to specify a predicate. In this example, we are interested in appending to each seed all the pixels that (a) are 8-connected to that seed and (b) satisfy the predicate
Q = \begin{cases} \text{TRUE} & \text{if the absolute difference of the intensities} \\ & \text{between the seed and the pixel at } (x, y) \text{ is } \leq T \\ \text{FALSE} & \text{otherwise} \end{cases}
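Taken together, the 8-connectivity condition and the intensity-difference predicate give a breadth-first region-growing pass. A sketch under those assumptions (the helper name and threshold value are illustrative):

```python
import numpy as np
from collections import deque

def region_grow(image, seeds, T):
    """Grow a region from each seed: append every pixel that is
    8-connected to the growing region and whose absolute intensity
    difference from the seed value is <= T."""
    img = np.asarray(image, dtype=float)
    grown = np.zeros(img.shape, dtype=np.uint8)
    rows, cols = img.shape
    for (sr, sc) in seeds:
        if grown[sr, sc]:
            continue
        seed_val = img[sr, sc]
        q = deque([(sr, sc)])
        grown[sr, sc] = 1
        while q:
            y, x = q.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not grown[ny, nx]
                            and abs(img[ny, nx] - seed_val) <= T):
                        grown[ny, nx] = 1
                        q.append((ny, nx))
    return grown
```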
10.4.2 Region Splitting and Merging

Let R represent the entire image region and select a predicate Q. One
approach for segmenting R is to subdivide it successively into smaller and
smaller quadrant regions so that, for any region Ri, Q(Ri) = TRUE. We start
with the entire region. If Q(R) = FALSE, we divide the image into quadrants.
If Q is FALSE for any quadrant, we subdivide that quadrant into subquad-
rants, and so on. This particular splitting technique has a convenient represen-
tation in the form of so-called quadtrees, that is, trees in which each node has
exactly four descendants, as Fig. 10.52 shows (the images corresponding to the
nodes of a quadtree sometimes are called quadregions or quadimages). Note
that the root of the tree corresponds to the entire image and that each node
corresponds to the subdivision of a node into four descendant nodes. In this
case, only R4 was subdivided further.
If only splitting is used, the final partition normally contains adjacent re-
gions with identical properties. This drawback can be remedied by allowing
merging as well as splitting. Satisfying the constraints of segmentation outlined in Section 10.1 requires merging only adjacent regions whose combined pixels satisfy the predicate Q (see Section 2.5.2 regarding region adjacency). That is, two adjacent regions Rj and Rk are merged only if Q(Rj ∪ Rk) = TRUE.
The preceding discussion can be summarized by the following procedure in
which, at any step, we
1. Split into four disjoint quadrants any region Ri for which Q(Ri) = FALSE.
2. When no further splitting is possible, merge any adjacent regions Rj and
Rk for which Q(Rj ∪ Rk) = TRUE.
3. Stop when no further merging is possible.
It is customary to specify a minimum quadregion size beyond which no further
splitting is carried out.
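Step 1 of the procedure maps directly onto a recursive quadtree descent. In the sketch below, Q is any user-supplied predicate and min_size stops further splitting; under the simplified merging variant discussed in the text, the segmentation is then obtained by keeping every leaf quadregion for which Q is TRUE. The function name is illustrative:

```python
import numpy as np

def split_regions(img, Q, min_size=8):
    """Region splitting (Step 1): recursively split any quadrant for
    which Q is FALSE, down to a minimum quadregion size. Returns the
    leaf quadregions as (row, col, height, width) tuples."""
    leaves = []

    def split(r, c, h, w):
        region = img[r:r + h, c:c + w]
        if Q(region) or min(h, w) <= min_size:
            leaves.append((r, c, h, w))
            return
        h2, w2 = h // 2, w // 2        # four disjoint quadrants
        split(r, c, h2, w2)
        split(r, c + w2, h2, w - w2)
        split(r + h2, c, h - h2, w2)
        split(r + h2, c + w2, h - h2, w - w2)

    split(0, 0, img.shape[0], img.shape[1])
    return leaves
```

For instance, with a homogeneity predicate such as `lambda R: R.std() < 5`, an image whose four quadrants are each uniform splits exactly once, yielding four leaves.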
Numerous variations of the preceding basic theme are possible. For example,
a significant simplification results if in Step 2 we allow merging of any two ad-
jacent regions Ri and Rj if each one satisfies the predicate individually. This re-
sults in a much simpler (and faster) algorithm, because testing of the predicate
is limited to individual quadregions. As the following example shows, this sim-
plification is still capable of yielding good segmentation results.
FIGURE 10.52 (a) Partitioned image. (b) Corresponding quadtree. R represents the entire image region.
EXAMPLE 10.24: Segmentation by region splitting and merging.

■ Figure 10.53(a) shows a 566 * 566 X-ray band image of the Cygnus Loop. The objective of this example is to segment out of the image the “ring” of less dense matter surrounding the dense center. The region of interest has some
obvious characteristics that should help in its segmentation. First, we note that
the data in this region has a random nature, indicating that its standard devia-
tion should be greater than the standard deviation of the background (which is
near 0) and of the large central region, which is fairly smooth. Similarly, the
mean value (average intensity) of a region containing data from the outer ring
should be greater than the mean of the darker background and less than the
mean of the large, lighter central region. Thus, we should be able to segment
the region of interest using the following predicate:
Q = \begin{cases} \text{TRUE} & \text{if } \sigma > a \text{ AND } 0 < m < b \\ \text{FALSE} & \text{otherwise} \end{cases}

where m and σ are the mean and standard deviation of the pixels in a quadregion, and a and b are constants.
Analysis of several regions in the outer area of interest revealed that the
mean intensity of pixels in those regions did not exceed 125 and the standard
deviation was always greater than 10. Figures 10.53(b) through (d) show the
results obtained using these values for a and b, and varying the minimum size
allowed for the quadregions from 32 to 8. The pixels in a quadregion whose
FIGURE 10.53 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by NASA’s Hubble Telescope. (b)–(d) Results of limiting the smallest allowed quadregion to sizes of 32 * 32, 16 * 16, and 8 * 8 pixels, respectively. (Original image courtesy of NASA.)
pixels satisfied the predicate were set to white; all others in that region were set
to black. The best result in terms of capturing the shape of the outer region was
obtained using quadregions of size 16 * 16. The black squares in Fig. 10.53(d)
are quadregions of size 8 * 8 whose pixels did not satisfy the predicate. Using
smaller quadregions would result in increasing numbers of such black regions.
Using regions larger than the one illustrated here results in a more “block-
like” segmentation. Note that in all cases the segmented regions (white pixels)
completely separate the inner, smoother region from the background. Thus,
the segmentation effectively partitioned the image into three distinct areas
that correspond to the three principal features in the image: background,
dense, and sparse regions. Using any of the white regions in Fig. 10.53 as a
mask would make it a relatively simple task to extract these regions from the
original image (Problem 10.40). As in Example 10.23, these results could not
have been obtained using edge- or threshold-based segmentation. ■
As used in the preceding example, properties based on the mean and standard
deviation of pixel intensities in a region attempt to quantify the texture of the
region (see Section 11.3.3 for a discussion on texture). The concept of texture
segmentation is based on using measures of texture in the predicates. In other
words, we can perform texture segmentation by any of the methods discussed
in this section simply by specifying predicates based on texture content.
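Concretely, the mean/standard-deviation test of Example 10.24 can be packaged as a reusable texture predicate. This is a sketch: the defaults a = 10 and b = 125 are the values found empirically in that example, and the factory-function form is just one convenient way to bind them:

```python
import numpy as np

def texture_predicate(a=10.0, b=125.0):
    """Return a predicate Q(region) in the form used by the
    splitting/merging procedure: TRUE when the region is "busy"
    (standard deviation > a) and its mean intensity lies in (0, b)."""
    def Q(region):
        m = float(np.mean(region))
        s = float(np.std(region))
        return s > a and 0 < m < b
    return Q
```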
10.5 Segmentation Using Morphological Watersheds

10.5.1 Background
The concept of watersheds is based on visualizing an image in three dimen-
sions: two spatial coordinates versus intensity, as in Fig. 2.18(a). In such a
“topographic” interpretation, we consider three types of points: (a) points be-
longing to a regional minimum; (b) points at which a drop of water, if placed at
the location of any of those points, would fall with certainty to a single mini-
mum; and (c) points at which water would be equally likely to fall to more
than one such minimum. For a particular regional minimum, the set of points
satisfying condition (b) is called the catchment basin or watershed of that
minimum. The points satisfying condition (c) form crest lines on the topo-
graphic surface and are termed divide lines or watershed lines.
The principal objective of segmentation algorithms based on these concepts
is to find the watershed lines. The basic idea is simple, as the following analogy
illustrates. Suppose that a hole is punched in each regional minimum and that
the entire topography is flooded from below by letting water rise through the
holes at a uniform rate. When the rising water in distinct catchment basins is
about to merge, a dam is built to prevent the merging. The flooding will even-
tually reach a stage when only the tops of the dams are visible above the water
line. These dam boundaries correspond to the divide lines of the watersheds.
Therefore, they are the (connected) boundaries extracted by a watershed seg-
mentation algorithm.
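The flooding analogy can be approximated in code by a priority flood: starting from marker-labeled regional minima, pixels are absorbed in order of increasing "altitude," and a pixel reachable from two different minima is tagged as a divide (watershed) point. This is a simplified stand-in for the dam-construction algorithm described below, not the dilation-based method itself; names and the 4-connectivity choice are illustrative:

```python
import heapq
import numpy as np

def watershed_flood(image, markers):
    """Priority-flood sketch of the watershed idea. markers is an
    integer image with a distinct positive label per regional minimum
    and 0 elsewhere; divide-line pixels come back labeled -1."""
    WSHED = -1
    labels = markers.copy().astype(int)
    rows, cols = image.shape
    heap = []
    for r in range(rows):
        for c in range(cols):
            if labels[r, c] > 0:
                heapq.heappush(heap, (image[r, c], r, c))
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                # Labels already assigned around the candidate pixel.
                neigh = {labels[nr + d2r, nc + d2c]
                         for d2r, d2c in ((-1, 0), (1, 0), (0, -1), (0, 1))
                         if 0 <= nr + d2r < rows and 0 <= nc + d2c < cols}
                neigh.discard(0)
                neigh.discard(WSHED)
                if len(neigh) == 1:
                    labels[nr, nc] = neigh.pop()
                    heapq.heappush(heap, (image[nr, nc], nr, nc))
                else:
                    labels[nr, nc] = WSHED  # water would meet here: dam
    return labels
```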
These ideas can be explained further with the aid of Fig. 10.54. Figure 10.54(a)
shows a gray-scale image and Fig. 10.54(b) is a topographic view, in which the
height of the “mountains” is proportional to intensity values in the input
image. For ease of interpretation, the backsides of structures are shaded. This
is not to be confused with intensity values; only the general topography of the
three-dimensional representation is of interest. In order to prevent the rising
water from spilling out through the edges of the image, we imagine the
FIGURE 10.54 (a) Original image. (b) Topographic view. (c)–(d) Two stages of flooding.
FIGURE 10.54 (Continued) (e) Result of further flooding. (f) Beginning of merging of water from two catchment basins (a short dam was built between them). (g) Longer dams. (h) Final watershed (segmentation) lines. (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)
FIGURE 10.55 (a) Two partially flooded catchment basins at stage n - 1 of flooding. (b) Flooding at stage n, showing that water has spilled between basins. (c) Structuring element used for dilation: a 3 * 3 array of 1s with the origin at its center. (d) Result of dilation and dam construction (the figure distinguishes first-dilation points, second-dilation points, and dam points).
during dilation, and condition (2) did not apply to any point during the dila-
tion process; thus the boundary of each region was expanded uniformly.
In the second dilation (shown in black), several points failed condition (1)
while meeting condition (2), resulting in the broken perimeter shown in the fig-
ure. It also is evident that the only points in q that satisfy the two conditions
under consideration describe the 1-pixel-thick connected path shown cross-hatched in Fig. 10.55(d). This path constitutes the desired separating dam at
stage n of flooding. Construction of the dam at this level of flooding is complet-
ed by setting all the points in the path just determined to a value greater than the
maximum intensity value of the image. The height of all dams is generally set at
1 plus the maximum allowed value in the image. This will prevent water from
crossing over the part of the completed dam as the level of flooding is increased.
It is important to note that dams built by this procedure, which are the desired
segmentation boundaries, are connected components. In other words, this
method eliminates the problems of broken segmentation lines.
Although the procedure just described is based on a simple example, the
method used for more complex situations is exactly the same, including the use
of the 3 * 3 symmetric structuring element shown in Fig. 10.55(c).
FIGURE 10.56 (a) Image of blobs. (b) Image gradient. (c) Watershed lines. (d) Watershed lines superimposed on original image. (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)
EXAMPLE 10.25: Illustration of the watershed segmentation algorithm.

■ Consider the image and its gradient in Figs. 10.56(a) and (b), respectively. Application of the watershed algorithm just described yielded the watershed lines (white paths) of the gradient image in Fig. 10.56(c). These segmentation boundaries are shown superimposed on the original image in Fig. 10.56(d). As
noted at the beginning of this section, the segmentation boundaries have the
important property of being connected paths. ■
FIGURE 10.57 (a) Electrophoresis image. (b) Result of applying the watershed segmentation algorithm to the gradient image. Oversegmentation is evident. (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)
Part of the problem that led to the oversegmented result in Fig. 10.57(b) is the
large number of potential minima. Because of their size, many of these minima
are irrelevant detail. As has been pointed out several times in earlier discus-
sions, an effective method for minimizing the effect of small spatial detail is to
filter the image with a smoothing filter. This is an appropriate preprocessing
scheme in this particular case.
Suppose that we define an internal marker as (1) a region that is surround-
ed by points of higher “altitude”; (2) such that the points in the region form a
connected component; and (3) in which all the points in the connected com-
ponent have the same intensity value. After the image was smoothed, the in-
ternal markers resulting from this definition are shown as light gray, bloblike
regions in Fig. 10.58(a). Next, the watershed algorithm was applied to the
FIGURE 10.58 (a) Image showing internal markers (light gray regions) and external markers (watershed lines). (b) Result of segmentation. Note the improvement over Fig. 10.57(b). (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)
smoothed image, under the restriction that these internal markers be the only
allowed regional minima. Figure 10.58(a) shows the resulting watershed lines.
These watershed lines are defined as the external markers. Note that the
points along the watershed line pass along the highest points between neigh-
boring markers.
The external markers in Fig. 10.58(a) effectively partition the image into
regions, with each region containing a single internal marker and part of the
background. The problem is thus reduced to partitioning each of these regions
into two: a single object and its background. We can bring to bear on this sim-
plified problem many of the segmentation techniques discussed earlier in this
chapter. Another approach is simply to apply the watershed segmentation
algorithm to each individual region. In other words, we simply take the gradient
of the smoothed image [as in Fig. 10.56(b)] and then restrict the algorithm to
operate on a single watershed that contains the marker in that particular re-
gion. The result obtained using this approach is shown in Fig. 10.58(b). The improvement over the image in Fig. 10.57(b) is evident.
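The internal-marker definition given earlier (conditions 1 through 3: a connected, equal-intensity region surrounded by points of higher "altitude") can be sketched as a flat-zone scan. The resulting label image could then be imposed as the only allowed regional minima in a marker-constrained watershed (for instance, skimage.segmentation.watershed accepts such a markers array). The helper name and the 4-connected flat-zone fill are illustrative choices:

```python
import numpy as np
from collections import deque

def internal_markers(image):
    """Label every flat zone (4-connected component of equal-intensity
    pixels) whose outside neighbors are all strictly higher, i.e., every
    flat regional minimum. Returns a label image: one positive label per
    marker, 0 elsewhere."""
    rows, cols = image.shape
    labels = np.zeros((rows, cols), dtype=int)
    seen = np.zeros((rows, cols), dtype=bool)
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if seen[r, c]:
                continue
            v = image[r, c]
            comp, q = [(r, c)], deque([(r, c)])
            seen[r, c] = True
            is_min = True
            while q:                      # flood-fill the flat zone
                y, x = q.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if image[ny, nx] == v and not seen[ny, nx]:
                        seen[ny, nx] = True
                        comp.append((ny, nx))
                        q.append((ny, nx))
                    elif image[ny, nx] < v:
                        is_min = False    # a lower neighbor: not a minimum
            if is_min:
                for (y, x) in comp:
                    labels[y, x] = next_label
                next_label += 1
    return labels
```

In a full marker-controlled pipeline the image would first be smoothed, the markers extracted as above, and the watershed then computed on the gradient of the smoothed image restricted to those markers.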
Marker selection can range from simple procedures based on intensity
values and connectivity, as was just illustrated, to more complex descriptions in-
volving size, shape, location, relative distances, texture content, and so on (see
Chapter 11 regarding descriptors). The point is that using markers brings a priori knowledge to bear on the segmentation problem. The reader is reminded that humans often aid segmentation and higher-level tasks in everyday vision by using a priori knowledge, one of the most familiar being the use of context. Thus,
the fact that segmentation by watersheds offers a framework that can make ef-
fective use of this type of knowledge is a significant advantage of this method.