Mod3 - Computer Vision
where the terms in capital letters are the Fourier transforms of the corresponding
terms in Eq. (5-1). These two equations are the foundation for most of the restora-
tion material in this chapter.
In the following three sections, we work only with degradations caused by noise.
Beginning in Section 5.5 we look at several methods for image restoration in the
presence of both the degradation function and noise.
FIGURE 5.1
A model of the image degradation/restoration process.
5.2 Noise Models
Gaussian Noise
Because of its mathematical tractability in both the spatial and frequency domains,
Gaussian noise models are used frequently in practice. In fact, this tractability is so
convenient that it often results in Gaussian models being used in situations in which
they are marginally applicable at best.
The PDF of a Gaussian random variable, z, is defined by the following familiar
expression:
p(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(z - \bar{z})^2 / 2\sigma^2}, \qquad -\infty < z < \infty    (5-3)

where z represents intensity, z̄ is the mean (average) value of z, and σ is its standard
deviation. Figure 5.2(a) shows a plot of this function. We know that for a Gaussian
random variable, the probability that values of z are in the range z̄ ± σ is approxi-
mately 0.68; the probability is about 0.95 that the values of z are in the range z̄ ± 2σ.
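As an illustration of how such noise can be simulated, the following is a minimal NumPy sketch (not part of the original text; the function name and the default σ are illustrative choices) that adds zero-mean Gaussian noise with the PDF of Eq. (5-3) to an 8-bit grayscale image:

```python
import numpy as np

def add_gaussian_noise(image, mean=0.0, sigma=20.0, seed=None):
    """Add Gaussian noise with the PDF of Eq. (5-3) to an 8-bit grayscale image."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=mean, scale=sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    # Clip back to the valid 8-bit intensity range [0, 255].
    return np.clip(noisy, 0, 255).astype(np.uint8)
```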
FIGURE 5.2 Some important probability density functions: (a) Gaussian, (b) Rayleigh, (c) Erlang (gamma), (d) exponential, (e) uniform, (f) salt-and-pepper.
Rayleigh Noise
The PDF of Rayleigh noise is given by
p(z) = \begin{cases} \dfrac{2}{b}(z - a)\, e^{-(z - a)^2 / b} & z \ge a \\[4pt] 0 & z < a \end{cases}    (5-4)
The mean and variance of z when this random variable is characterized by a Ray-
leigh PDF are
\bar{z} = a + \sqrt{\pi b / 4}    (5-5)
and
\sigma^2 = \frac{b(4 - \pi)}{4}    (5-6)
Figure 5.2(b) shows a plot of the Rayleigh density. Note the displacement from the
origin, and the fact that the basic shape of the density is skewed to the right. The
Rayleigh density can be quite useful for modeling the shape of skewed histograms.
Erlang (Gamma) Noise
The PDF of Erlang noise is given by

p(z) = \begin{cases} \dfrac{a^{b} z^{\,b-1}}{(b - 1)!}\, e^{-az} & z \ge 0 \\[4pt] 0 & z < 0 \end{cases}    (5-7)

where the parameters are such that a > 0, b is a positive integer, and "!" indicates
factorial. The mean and variance of z are
\bar{z} = \frac{b}{a}    (5-8)
and
\sigma^2 = \frac{b}{a^2}    (5-9)
Figure 5.2(c) shows a plot of this density. Although Eq. (5-7) often is referred to as
the gamma density, strictly speaking this is correct only when the denominator is
the gamma function, Г(b). When the denominator is as shown, the density is more
appropriately called the Erlang density.
Exponential Noise
The PDF of exponential noise is given by
p(z) = \begin{cases} a\, e^{-az} & z \ge 0 \\ 0 & z < 0 \end{cases}    (5-10)

where a > 0.
Uniform Noise
The PDF of uniform noise is
p(z) = \begin{cases} \dfrac{1}{b - a} & a \le z \le b \\[4pt] 0 & \text{otherwise} \end{cases}    (5-13)
The mean and variance of z are
\bar{z} = \frac{a + b}{2}    (5-14)
and
\sigma^2 = \frac{(b - a)^2}{12}    (5-15)
Figure 5.2(e) shows a plot of the uniform density.
Salt-and-Pepper Noise
If k represents the number of bits used to represent the intensity values in a digital
image, then the range of possible intensity values for that image is [0, 2^k − 1] (e.g.,
[0, 255] for an 8-bit image). The PDF of salt-and-pepper noise is given by

p(z) = \begin{cases} P_s & \text{for } z = 2^k - 1 \\ P_p & \text{for } z = 0 \\ 1 - (P_s + P_p) & \text{for } z = V \end{cases}    (5-16)

where V is any integer value in the open interval (0, 2^k − 1). (When image intensities
are scaled to the range [0, 1], we replace the value of salt in this equation by 1.) We
have included 0 as an explicit value in the equation to indicate that the value of
pepper noise is assumed to be zero.
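A hedged sketch of how an image can be corrupted with this noise model follows (the function name and default probabilities are illustrative, and intensities are assumed to be 8-bit):

```python
import numpy as np

def add_salt_and_pepper(image, ps=0.05, pp=0.05, seed=None):
    """Corrupt an 8-bit image per Eq. (5-16): salt = 255, pepper = 0, and a pixel
    keeps its original value (the 'V' case) with probability 1 - (ps + pp)."""
    rng = np.random.default_rng(seed)
    noisy = image.copy()
    r = rng.random(image.shape)
    noisy[r < pp] = 0                           # pepper
    noisy[(r >= pp) & (r < pp + ps)] = 255      # salt
    return noisy
```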
As a group, the preceding PDFs provide useful tools for modeling a broad range
of noise corruption situations found in practice. For example, Gaussian noise arises
in an image due to factors such as electronic circuit noise and sensor noise caused by
poor illumination and/or high temperature. The Rayleigh density is helpful in char-
acterizing noise phenomena in range imaging. The exponential and gamma densities
find application in laser imaging. Impulse noise is found in situations where quick
transients, such as faulty switching, take place during imaging. The uniform density
is perhaps the least descriptive of practical situations. However, the uniform density
is quite useful as the basis for numerous random number generators that are used
extensively in simulations (Gonzalez, Woods, and Eddins [2009]).
Figure 5.3 shows a test pattern used for illustrating the noise models just discussed. This is a suitable pat-
tern to use because it is composed of simple, constant areas that span the gray scale from black to near
white in only three increments. This facilitates visual analysis of the characteristics of the various noise
components added to an image.
Figure 5.4 shows the test pattern after addition of the six types of noise in Fig. 5.2. Below each image
is the histogram computed directly from that image. The parameters of the noise were chosen in each
case so that the histogram corresponding to the three intensity levels in the test pattern would start to
merge. This made the noise quite visible, without obscuring the basic structure of the underlying image.
We see a close correspondence in comparing the histograms in Fig. 5.4 with the PDFs in Fig. 5.2.
The histogram for the salt-and-pepper example does not contain a specific peak for V because, as you
will recall, V is used only during the creation of the noise image to leave values in the original image
unchanged. Of course, in addition to the salt and pepper peaks, there are peaks for the other intensi-
ties in the image. With the exception of slightly different overall intensity, it is difficult to differentiate
FIGURE 5.3
Test pattern used
to illustrate the
characteristics of
the PDFs from
Fig. 5.2.
a b c
d e f
FIGURE 5.4 Images and histograms resulting from adding Gaussian, Rayleigh, and Erlang noise to the image in
Fig. 5.3.
visually between the first five images in Fig. 5.4, even though their histograms are significantly different.
The salt-and-pepper appearance of the image in Fig. 5.4(i) is the only one that is visually indicative of
the type of noise causing the degradation.
PERIODIC NOISE
Periodic noise in images typically arises from electrical or electromechanical inter-
ference during image acquisition. This is the only type of spatially dependent noise
we will consider in this chapter. As we will discuss in Section 5.4, periodic noise can
be reduced significantly via frequency domain filtering. For example, consider the
image in Fig. 5.5(a). This image is corrupted by additive (spatial) sinusoidal noise.
The Fourier transform of a pure sinusoid is a pair of conjugate impulses† located at
† Be careful not to confuse the term impulse in the frequency domain with the use of the same term in impulse
noise discussed earlier, which is in the spatial domain.
g h i
j k l
FIGURE 5.4 (continued) Images and histograms resulting from adding exponential, uniform, and salt-and-pepper noise
to the image in Fig. 5.3. In the salt-and-pepper histogram, the peaks at the origin (zero intensity) and at the far end
of the scale are shown displaced slightly so that they do not blend with the page background.
the conjugate frequencies of the sine wave (see Table 4.4). Thus, if the amplitude of
a sine wave in the spatial domain is strong enough, we would expect to see in the
spectrum of the image a pair of impulses for each sine wave in the image. As shown
in Fig. 5.5(b), this is indeed the case. Eliminating or reducing these impulses in the
frequency domain will eliminate or reduce the sinusoidal noise in the spatial domain.
We will have much more to say in Section 5.4 about this and other examples of peri-
odic noise.
a b
FIGURE 5.5
(a) Image
corrupted by
additive
sinusoidal noise.
(b) Spectrum
showing two
conjugate
impulses caused
by the sine wave.
(Original
image courtesy of
NASA.)
ESTIMATION OF NOISE PARAMETERS
The parameters of periodic noise typically are estimated by inspection of the Fourier spectrum of the image. It sometimes is possible to infer the periodicity
of noise components directly from the image, but this is possible only in simplis-
tic cases. Automated analysis is possible in situations in which the noise spikes are
either exceptionally pronounced, or when knowledge is available about the general
location of the frequency components of the interference (see Section 5.4).
The parameters of noise PDFs may be known partially from sensor specifications,
but it is often necessary to estimate them for a particular imaging arrangement. If
the imaging system is available, one simple way to study the characteristics of system
noise is to capture a set of “flat” images. For example, in the case of an optical sen-
sor, this is as simple as imaging a solid gray board that is illuminated uniformly. The
resulting images typically are good indicators of system noise.
When only images already generated by a sensor are available, it is often possible
to estimate the parameters of the PDF from small patches of reasonably constant
background intensity. For example, the vertical strips shown in Fig. 5.6 were cropped
from the Gaussian, Rayleigh, and uniform images in Fig. 5.4. The histograms shown
were calculated using image data from these small strips. The histograms in Fig. 5.4
that correspond to the histograms in Fig. 5.6 are the ones in the middle of the group
of three in Figs. 5.4(d), (e), and (k). We see that the shapes of these histograms cor-
respond quite closely to the shapes of the corresponding histograms in Fig. 5.6. Their
heights are different due to scaling, but the shapes are unmistakably similar.
The simplest use of the data from the image strips is for calculating the mean and
variance of intensity levels. Consider a strip (subimage) denoted by S, and let pS(zi),
i = 0, 1, 2, … , L − 1, denote the probability estimates (normalized histogram values)
of the intensities of the pixels in S, where L is the number of possible intensities in
the entire image (e.g., 256 for an 8-bit image). As in Eqs. (2-69) and (2-70), we esti-
mate the mean and variance of the pixel values in S as follows:

\bar{z} = \sum_{i=0}^{L-1} z_i\, p_S(z_i)    (5-19)
and

\sigma^2 = \sum_{i=0}^{L-1} (z_i - \bar{z})^2\, p_S(z_i)    (5-20)
FIGURE 5.6 Histograms computed using small strips (shown as inserts) from (a) the Gaussian, (b) the Rayleigh, and
(c) the uniform noisy images in Fig. 5.4.
The shape of the histogram identifies the closest PDF match. If the shape is approxi-
mately Gaussian, then the mean and variance are all we need because the Gaussian
PDF is specified completely by these two parameters. For the other shapes discussed
earlier, we use the mean and variance to solve for the parameters a and b. Impulse
noise is handled differently because the estimate needed is of the actual probability
of occurrence of white and black pixels. Obtaining this estimate requires that both
black and white pixels be visible, so a mid-gray, relatively constant area is needed in
the image in order to be able to compute a meaningful histogram of the noise. The
heights of the peaks corresponding to black and white pixels are the estimates of Pp
and Ps in Eq. (5-16).
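The estimation procedure just described can be sketched in a few lines of NumPy. The helper names are mine, and the parameter formulas simply invert Eqs. (5-8)/(5-9) and (5-14)/(5-15); treat this as an illustration rather than the text's implementation:

```python
import numpy as np

def estimate_noise_stats(strip):
    """Estimate the mean and variance of Eqs. (5-19) and (5-20) from a flat image strip."""
    z = strip.astype(np.float64).ravel()
    z_bar = z.mean()
    variance = ((z - z_bar) ** 2).mean()
    return z_bar, variance

def uniform_params(z_bar, variance):
    """Invert Eqs. (5-14) and (5-15): a = z_bar - sqrt(3*var), b = z_bar + sqrt(3*var)."""
    half_width = np.sqrt(3.0 * variance)
    return z_bar - half_width, z_bar + half_width

def erlang_params(z_bar, variance):
    """Invert Eqs. (5-8) and (5-9): a = z_bar / var, b = z_bar**2 / var."""
    return z_bar / variance, z_bar ** 2 / variance
```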
5.3 Restoration in the Presence of Noise Only—Spatial Filtering
When the only degradation present in an image is noise, Eqs. (5-1) and (5-2) become

g(x, y) = f(x, y) + \eta(x, y)

and

G(u, v) = F(u, v) + N(u, v)
The noise terms generally are unknown, so subtracting them from g(x, y) [G(u, v)]
to obtain f (x, y) [F(u, v)] typically is not an option. In the case of periodic noise,
sometimes it is possible to estimate N(u, v) from the spectrum of G(u, v), as noted
in Section 5.2. In this case N(u, v) can be subtracted from G(u, v) to obtain an esti-
mate of the original image, but this type of knowledge is the exception, rather than
the rule.
Spatial filtering is the method of choice for estimating f (x, y) [i.e., denoising
image g(x, y)] in situations when only additive random noise is present. Spatial fil-
tering was discussed in detail in Chapter 3. With the exception of the nature of the
computation performed by a specific filter, the mechanics for implementing all the
filters that follow are exactly as discussed in Sections 3.4 through 3.7.
MEAN FILTERS
In this section, we discuss briefly the noise-reduction capabilities of the spatial filters
introduced in Section 3.5 and develop several other filters whose performance is in
many cases superior to the filters discussed in that section.
Arithmetic Mean Filter
Let Sxy represent the set of coordinates in a rectangular subimage window (neighborhood) of size m × n, centered on point (x, y). The arithmetic mean filter computes the average value of the corrupted image g(x, y) in the area defined by Sxy:

\hat{f}(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} g(r, c)    (5-23)

where, as in Eq. (2-43), r and c are the row and column coordinates of the pixels
contained in the neighborhood Sxy. This operation can be implemented using a spa-
tial kernel of size m × n in which all coefficients have value 1/mn. A mean filter
smooths local variations in an image, and noise is reduced as a result of blurring.
Geometric Mean Filter
An image restored using a geometric mean filter is given by the expression

\hat{f}(x, y) = \left[ \prod_{(r, c) \in S_{xy}} g(r, c) \right]^{\frac{1}{mn}}    (5-24)

where \prod indicates multiplication. Here, each restored pixel is given by the product of
all the pixels in the subimage area, raised to the power 1/mn. As Example 5.2 below
illustrates, a geometric mean filter achieves smoothing comparable to an arithmetic
mean filter, but it tends to lose less image detail in the process.
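A minimal sketch of both filters in Python, assuming SciPy is available (the log-domain trick for the geometric mean and the small epsilon are implementation choices, not from the text):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def arithmetic_mean_filter(g, size=3):
    """Eq. (5-23): local average of g over a size x size neighborhood."""
    return uniform_filter(g.astype(np.float64), size=size, mode='reflect')

def geometric_mean_filter(g, size=3, eps=1e-6):
    """Eq. (5-24): product of the window pixels raised to 1/(mn), computed as
    exp(mean(log(g))) for numerical stability; eps guards against log(0)."""
    log_g = np.log(g.astype(np.float64) + eps)
    return np.exp(uniform_filter(log_g, size=size, mode='reflect')) - eps
```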
Harmonic Mean Filter
The harmonic mean filtering operation is given by the expression

\hat{f}(x, y) = \frac{mn}{\displaystyle\sum_{(r, c) \in S_{xy}} \frac{1}{g(r, c)}}    (5-25)
The harmonic mean filter works well for salt noise, but fails for pepper noise. It does
well also with other types of noise like Gaussian noise.
Contraharmonic Mean Filter
The contraharmonic mean filter yields a restored image based on the expression

\hat{f}(x, y) = \frac{\displaystyle\sum_{(r, c) \in S_{xy}} g(r, c)^{Q+1}}{\displaystyle\sum_{(r, c) \in S_{xy}} g(r, c)^{Q}}    (5-26)
where Q is called the order of the filter. This filter is well suited for reducing or vir-
tually eliminating the effects of salt-and-pepper noise. For positive values of Q, the
filter eliminates pepper noise. For negative values of Q, it eliminates salt noise. It
cannot do both simultaneously. Note that the contraharmonic filter reduces to the
arithmetic mean filter if Q = 0, and to the harmonic mean filter if Q = −1.
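A sketch of the contraharmonic filter under the same assumptions (SciPy available; the epsilon avoids raising zero to a negative power). Setting Q = 0 or Q = −1 reproduces the arithmetic and harmonic mean filters, respectively:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def contraharmonic_mean_filter(g, size=3, Q=1.5, eps=1e-6):
    """Eq. (5-26): Q > 0 reduces pepper noise, Q < 0 reduces salt noise."""
    g = g.astype(np.float64) + eps
    num = uniform_filter(g ** (Q + 1), size=size, mode='reflect')
    den = uniform_filter(g ** Q, size=size, mode='reflect')
    return num / den
```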
Figure 5.7(a) shows an 8-bit X-ray image of a circuit board, and Fig. 5.7(b) shows the same image, but
corrupted with additive Gaussian noise of zero mean and variance of 400. For this type of image, this is
a significant level of noise. Figures 5.7(c) and (d) show, respectively, the result of filtering the noisy image
with an arithmetic mean filter of size 3 × 3 and a geometric mean filter of the same size. Although both
filters did a reasonable job of attenuating the contribution due to noise, the geometric mean filter did
not blur the image as much as the arithmetic filter. For instance, the connector fingers at the top of the
image are sharper in Fig. 5.7(d) than in (c). The same is true in other parts of the image.
Figure 5.8(a) shows the same circuit image, but corrupted now by pepper noise with probability of
0.1. Similarly, Fig. 5.8(b) shows the image corrupted by salt noise with the same probability. Figure 5.8(c)
shows the result of filtering Fig. 5.8(a) using a contraharmonic mean filter with Q = 1.5, and Fig. 5.8(d)
shows the result of filtering Fig. 5.8(b) with Q = −1.5. Both filters did a good job of reducing the effect of
the noise. The positive-order filter did a better job of cleaning the background, at the expense of slightly
thinning and blurring the dark areas. The opposite was true of the negative order filter.
In general, the arithmetic and geometric mean filters (particularly the latter) are well suited for ran-
dom noise like Gaussian or uniform noise. The contraharmonic filter is well suited for impulse noise, but
it has the disadvantage that it must be known whether the noise is dark or light in order to select the
proper sign for Q. The results of choosing the wrong sign for Q can be disastrous, as Fig. 5.9 shows. Some
of the filters discussed in the following sections eliminate this shortcoming.
c d
FIGURE 5.7
(a) X-ray image
of circuit board.
(b) Image
corrupted by
additive Gaussian
noise. (c) Result
of filtering with
an arithmetic
mean filter of size
3 × 3. (d) Result
of filtering with a
geometric mean
filter of the same
size. (Original
image courtesy of
Mr. Joseph E.
Pascente, Lixi,
Inc.)
ORDER-STATISTIC FILTERS
We introduced order-statistic filters in Section 3.6. We now expand the discussion
in that section and introduce some additional order-statistic filters. As noted in Sec-
tion 3.6, order-statistic filters are spatial filters whose response is based on ordering
(ranking) the values of the pixels contained in the neighborhood encompassed by
the filter. The ranking result determines the response of the filter.
Median Filter
The best-known order-statistic filter in image processing is the median filter, which,
as its name implies, replaces the value of a pixel by the median of the intensity levels
in a predefined neighborhood of that pixel:
where, as before, Sxy is a subimage (neighborhood) centered on point (x, y). The val-
ue of the pixel at (x, y) is included in the computation of the median. Median filters
c d
FIGURE 5.8
(a) Image
corrupted by
pepper noise with
a probability of
0.1. (b) Image
corrupted by salt
noise with the
same
probability.
(c) Result of
filtering (a) with
a 3 × 3 contra-
harmonic filter
with Q = 1.5. (d) Result
of filtering (b)
with Q = −1.5.
a b
FIGURE 5.9
Results of
selecting the
wrong sign in
contraharmonic
filtering.
(a) Result of
filtering Fig. 5.8(a)
with a
contraharmonic
filter of size 3 × 3
and Q = −1.5.
(b) Result of
filtering Fig. 5.8(b)
using Q = 1.5.
are quite popular because, for certain types of random noise, they provide excellent
noise-reduction capabilities, with considerably less blurring than linear smoothing
filters of similar size. Median filters are particularly effective in the presence of both
bipolar and unipolar impulse noise, as Example 5.3 below shows. Computation of
the median and implementation of this filter are discussed in Section 3.6.
Max and Min Filters
The median represents the 50th percentile of a ranked set of numbers, but ranking lends itself to many other possibilities. For example, using the 100th percentile results in the so-called max filter, given by

\hat{f}(x, y) = \max_{(r, c) \in S_{xy}} \{ g(r, c) \}    (5-28)
This filter is useful for finding the brightest points in an image or for eroding dark
regions adjacent to bright areas. Also, because pepper noise has very low values, it
is reduced by this filter as a result of the max selection process in the subimage area
Sxy .
The 0th percentile filter is the min filter:
\hat{f}(x, y) = \min_{(r, c) \in S_{xy}} \{ g(r, c) \}    (5-29)
This filter is useful for finding the darkest points in an image or for eroding light
regions adjacent to dark areas. Also, it reduces salt noise as a result of the min opera-
tion.
Midpoint Filter
The midpoint filter computes the midpoint between the maximum and minimum
values in the area encompassed by the filter:
\hat{f}(x, y) = \frac{1}{2} \left[ \max_{(r, c) \in S_{xy}} \{ g(r, c) \} + \min_{(r, c) \in S_{xy}} \{ g(r, c) \} \right]    (5-30)
Note that this filter combines order statistics and averaging. It works best for ran-
domly distributed noise, like Gaussian or uniform noise.
Alpha-Trimmed Mean Filter
Suppose that we delete the d/2 lowest and the d/2 highest intensity values of g(r, c) in the neighborhood Sxy, and let gR(r, c) represent the remaining mn − d pixels. A filter formed by averaging these remaining pixels is called an alpha-trimmed mean filter:

\hat{f}(x, y) = \frac{1}{mn - d} \sum_{(r, c) \in S_{xy}} g_R(r, c)    (5-31)

where the value of d can range from 0 to mn − 1. When d = 0 the alpha-trimmed fil-
ter reduces to the arithmetic mean filter discussed earlier. If we choose d = mn − 1,
the filter becomes a median filter. For other values of d, the alpha-trimmed filter is
useful in situations involving multiple types of noise, such as a combination of salt-
and-pepper and Gaussian noise.
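The order-statistic filters discussed above can be sketched as follows, assuming SciPy is available (median, max, and min filters map directly onto scipy.ndimage calls; the midpoint and alpha-trimmed wrappers and their names are mine):

```python
import numpy as np
from scipy.ndimage import median_filter, maximum_filter, minimum_filter, generic_filter

def midpoint_filter(g, size=3):
    """Eq. (5-30): average of the max and min in each neighborhood."""
    g = g.astype(np.float64)
    return 0.5 * (maximum_filter(g, size=size) + minimum_filter(g, size=size))

def alpha_trimmed_mean_filter(g, size=3, d=2):
    """Eq. (5-31): drop the d/2 lowest and d/2 highest values in each window
    (d even, 0 <= d <= mn - 1) and average the remaining mn - d pixels."""
    def trimmed_mean(window):
        w = np.sort(window)
        return w[d // 2 : len(w) - d // 2].mean()
    return generic_filter(g.astype(np.float64), trimmed_mean, size=size)

# Eqs. (5-27) to (5-29) map directly to library calls, for example:
# median_filter(g, size=3); maximum_filter(g, size=3); minimum_filter(g, size=3)
```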
Figure 5.10(a) shows the circuit board image corrupted by salt-and-pepper noise with probabilities
Ps = Pp = 0.1. Figure 5.10(b) shows the result of median filtering with a filter of size 3 × 3. The improve-
ment over Fig. 5.10(a) is significant, but several noise points still are visible. A second pass [on the im-
age in Fig. 5.10(b)] with the median filter removed most of these points, leaving only few, barely visible
noise points. These were removed with a third pass of the filter. These results are good examples of the
power of median filtering in handling impulse-like additive noise. Keep in mind that repeated passes
of a median filter will blur the image, so it is desirable to keep the number of passes as low as possible.
Figure 5.11(a) shows the result of applying the max filter to the pepper noise image of Fig. 5.8(a). The
filter did a reasonable job of removing the pepper noise, but we note that it also removed (set to a light
intensity level) some dark pixels from the borders of the dark objects. Figure 5.11(b) shows the result
of applying the min filter to the image in Fig. 5.8(b). In this case, the min filter did a better job than the
max filter on noise removal, but it removed some white points around the border of light objects. These
made the light objects smaller and some of the dark objects larger (like the connector fingers in the top
of the image) because white points around these objects were set to a dark level.
The alpha-trimmed filter is illustrated next. Figure 5.12(a) shows the circuit board image corrupted
this time by additive, uniform noise of variance 800 and zero mean. This is a high level of noise corrup-
tion that is made worse by further addition of salt-and-pepper noise with Ps = Pp = 0.1, as Fig. 5.12(b)
shows. The high level of noise in this image warrants use of larger filters. Figures 5.12(c) through (f) show
the results, respectively, obtained using arithmetic mean, geometric mean, median, and alpha-trimmed
mean (with d = 6) filters of size 5 × 5. As expected, the arithmetic and geometric mean filters (especially
the latter) did not do well because of the presence of impulse noise. The median and alpha-trimmed
filters performed much better, with the alpha-trimmed filter giving slightly better noise reduction. For
example, note in Fig. 5.12(f) that the fourth connector finger from the top left is slightly smoother in
the alpha-trimmed result. This is not unexpected because, for a high value of d, the alpha-trimmed filter
approaches the performance of the median filter, but still retains some smoothing capabilities.
ADAPTIVE FILTERS
Once selected, the filters discussed thus far are applied to an image without regard
for how image characteristics vary from one point to another. In this section, we
take a look at two adaptive filters whose behavior changes based on statistical char-
acteristics of the image inside the filter region defined by the m × n rectangular
neighborhood Sxy . As the following discussion shows, adaptive filters are capable
of performance superior to that of the filters discussed thus far. The price paid for improved filtering power is an increase in filter complexity.
c d
FIGURE 5.10
(a) Image
corrupted by salt-
and- pepper noise
with probabilities
Ps = Pp = 0.1.
(b) Result of one
pass with a medi-
an filter of size
3 × 3. (c) Result
of processing (b)
with this filter.
(d) Result of
processing (c)
with the same
filter.
a b
FIGURE 5.11
(a) Result of
filtering Fig. 5.8(a)
with a max filter
of size 3 × 3.
(b) Result of
filtering Fig. 5.8(b)
with a min filter of
the same size.
a b
c d
e f
FIGURE 5.12
(a) Image
corrupted by
additive uniform
noise. (b) Image
additionally
corrupted by
additive salt-and-
pepper noise.
(c)-(f) Image (b)
filtered with a
5 × 5:
(c) arithmetic
mean filter;
(d) geometric
mean filter;
(e) median filter;
(f) alpha-trimmed
mean filter, with
d = 6.
Adaptive, Local Noise Reduction Filter
An adaptive expression for obtaining f̂(x, y), based on the local mean z̄_Sxy and local variance σ_Sxy² of the pixels in neighborhood Sxy, is

\hat{f}(x, y) = g(x, y) - \frac{\sigma_\eta^2}{\sigma_{S_{xy}}^2}\left[ g(x, y) - \bar{z}_{S_{xy}} \right]    (5-32)

The only quantity that needs to be known a priori is σ_η², the variance of the noise
corrupting image f(x, y). This is a constant that can be estimated from sample noisy
images using Eq. (3-26). The other parameters are computed from the pixels in
neighborhood Sxy using Eqs. (3-27) and (3-28).
An assumption in Eq. (5-32) is that the ratio of the two variances does not exceed 1,
which implies that σ_η² ≤ σ_Sxy². The noise in our model is additive and position indepen-
dent, so this is a reasonable assumption to make because Sxy is a subset of g(x, y).
However, we seldom have exact knowledge of σ_η². Therefore, it is possible for this
condition to be violated in practice. For that reason, a test should be built into an
implementation of Eq. (5-32) so that the ratio is set to 1 if the condition σ_η² > σ_Sxy²
occurs. This makes this filter nonlinear. However, it prevents nonsensical results (i.e.,
negative intensity levels, depending on the value of z̄_Sxy) due to a potential lack of
knowledge about the variance of the image noise. Another approach is to allow the
negative values to occur, and then rescale the intensity values at the end. The result
then would be a loss of dynamic range in the image.
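A minimal sketch of Eq. (5-32), including the clamping of the variance ratio described above (SciPy assumed; noise_var is the estimate of σ_η²):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_local_noise_filter(g, noise_var, size=7):
    """Adaptive, local noise-reduction filter of Eq. (5-32)."""
    g = g.astype(np.float64)
    local_mean = uniform_filter(g, size=size, mode='reflect')
    local_var = uniform_filter(g * g, size=size, mode='reflect') - local_mean ** 2
    ratio = noise_var / np.maximum(local_var, 1e-12)
    ratio = np.minimum(ratio, 1.0)   # set the ratio to 1 when noise_var exceeds the local variance
    return g - ratio * (g - local_mean)
```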
c d
FIGURE 5.13
(a) Image
corrupted by
additive
Gaussian noise of
zero mean and a
variance of 1000.
(b) Result of
arithmetic mean
filtering.
(c) Result of
geometric mean
filtering.
(d) Result of
adaptive noise-
reduction filtering.
All filters used
were of size 7 × 7.
The preceding results used a value for σ_η² that matched the variance of the noise exactly. If this
quantity is not known, and the estimate used is too low, the algorithm will return an image that closely
resembles the original because the corrections will be smaller than they should be. Estimates that are
too high will cause the ratio of the variances to be clipped at 1.0, and the algorithm will subtract the
mean from the image more frequently than it would normally. If negative values are allowed and the
image is rescaled at the end, the result will be a loss of dynamic range, as mentioned previously.
The adaptive median-filtering algorithm uses two processing levels, denoted level A
and level B, at each point (x, y). Let zmin, zmax, and zmed denote the minimum, maximum, and median of the intensity values in Sxy, let zxy be the intensity at coordinates (x, y), and let Smax be the maximum allowed size of Sxy:

Level A:  If zmin < zmed < zmax, go to level B
          Else, increase the size of Sxy
          If the size of Sxy ≤ Smax, repeat level A
          Else, output zmed

Level B:  If zmin < zxy < zmax, output zxy
          Else, output zmed

where the sizes of Sxy and Smax are odd, positive integers greater than 1. Another option in the
last step of level A is to output zxy instead of zmed. This produces a slightly less
blurred result, but can fail to detect salt (pepper) noise embedded in a constant
background having the same value as pepper (salt) noise.
This algorithm has three principal objectives: to remove salt-and-pepper (impulse)
noise, to provide smoothing of other noise that may not be impulsive, and to reduce
distortion, such as excessive thinning or thickening of object boundaries. The values
zmin and zmax are considered statistically by the algorithm to be “impulse-like” noise
components in region Sxy , even if these are not the lowest and highest possible pixel
values in the image.
With these observations in mind, we see that the purpose of level A is to deter-
mine if the median filter output, zmed , is an impulse (salt or pepper) or not. If the
condition zmin < zmed < zmax holds, then zmed cannot be an impulse for the reason
mentioned in the previous paragraph. In this case, we go to level B and test to see
if the point in the center of the neighborhood is itself an impulse (recall that (x, y)
is the location of the point being processed, and zxy is its intensity). If the condition
zmin < zxy < zmax is true, then the pixel at zxy cannot be the intensity of an impulse for
the same reason that zmed was not. In this case, the algorithm outputs the unchanged
pixel value, zxy . By not changing these "intermediate-level" points, distortion is
reduced in the filtered image. If the condition zmin < zxy < zmax is false, then either
zxy = zmin or zxy = zmax . In either case, the value of the pixel is an extreme value and
the algorithm outputs the median value, zmed , which we know from level A is not a
noise impulse. The last step is what the standard median filter does. The problem is
that the standard median filter replaces every point in the image by the median of
the corresponding neighborhood. This causes unnecessary loss of detail.
Continuing with the explanation, suppose that level A does find an impulse (i.e.,
it fails the test that would cause it to branch to level B). The algorithm then increas-
es the size of the neighborhood and repeats level A. This looping continues until
the algorithm either finds a median value that is not an impulse (and branches to
stage B), or the maximum neighborhood size is reached. If the maximum size is
reached, the algorithm returns the value of zmed. Note that there is no guarantee
that this value is not an impulse. The smaller the noise probabilities Ps and/or Pp are,
or the larger Smax is allowed to be, the less likely it is that a premature exit will occur.
This is plausible. As the density of the noise impulses increases, it stands to reason
that we would need a larger window to “clean up” the noise spikes.
Every time the algorithm outputs a value, the center of neighborhood Sxy is
moved to the next location in the image. The algorithm then is reinitialized and
applied to the pixels in the new region encompassed by the neighborhood. As indi-
cated in Problem 3.37, the median value can be updated iteratively from one loca-
tion to the next, thus reducing computational load.
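The levels A and B described above translate directly into code. The following is an unoptimized sketch (pixel-by-pixel loops; reflect padding and the starting window size of 3 are my choices, not from the text):

```python
import numpy as np

def adaptive_median_filter(g, s_max=7):
    """Adaptive median filtering (levels A and B) with maximum window size s_max."""
    g = np.asarray(g)
    pad = s_max // 2
    padded = np.pad(g, pad, mode='reflect')
    out = g.astype(np.float64)
    rows, cols = g.shape
    for x in range(rows):
        for y in range(cols):
            size = 3
            while True:
                half = size // 2
                window = padded[x + pad - half : x + pad + half + 1,
                                y + pad - half : y + pad + half + 1]
                z_min, z_max = window.min(), window.max()
                z_med = np.median(window)
                z_xy = g[x, y]
                if z_min < z_med < z_max:                               # level A passed
                    out[x, y] = z_xy if z_min < z_xy < z_max else z_med  # level B
                    break
                size += 2                                                # grow the window
                if size > s_max:
                    out[x, y] = z_med                                    # maximum size reached
                    break
    return out
```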
Figure 5.14(a) shows the circuit-board image corrupted by salt-and-pepper noise with probabilities
Ps = Pp = 0.25, which is 2.5 times the noise level used in Fig. 5.10(a). Here the noise level is high enough
to obscure most of the detail in the image. As a basis for comparison, the image was filtered first using a
7 × 7 median filter, the smallest filter required to remove most visible traces of impulse noise in this case.
Figure 5.14(b) shows the result. Although the noise was effectively removed, the filter caused significant
FIGURE 5.14 (a) Image corrupted by salt-and-pepper noise with probabilities Ps = Pp = 0.25. (b) Result of filtering
with a 7 × 7 median filter. (c) Result of adaptive median filtering with Smax = 7.
loss of detail in the image. For instance, some of the connector fingers at the top of the image appear
distorted or broken. Other image details are similarly distorted.
Figure 5.14(c) shows the result of using the adaptive median filter with Smax = 7. Noise removal
performance was similar to the median filter. However, the adaptive filter did a much better job of pre-
serving sharpness and detail. The connector fingers are less distorted, and some other features that were
either obscured or distorted beyond recognition by the median filter appear sharper and better defined
in Fig. 5.14(c). Two notable examples are the feed-through small white holes throughout the board, and
the dark component with eight legs in the bottom, left quadrant of the image.
Considering the high level of noise in Fig. 5.14(a), the adaptive algorithm performed quite well. The
choice of maximum allowed size for Sxy depends on the application, but a reasonable starting value can
be estimated by experimenting with various sizes of the standard median filter first. This will establish a
visual baseline regarding expectations on the performance of the adaptive algorithm.
5.4 Periodic Noise Reduction Using Frequency Domain Filtering
NOTCH REJECT FILTERS
A notch reject filter transfer function is constructed as a product of highpass filter transfer functions whose centers have been translated to the centers of the notches:

H_{NR}(u, v) = \prod_{k=1}^{Q} H_k(u, v)\, H_{-k}(u, v)    (5-33)

where Q is the number of notch pairs, and H_k(u, v) and H_{-k}(u, v) are highpass filter transfer functions whose centers
are at (u_k, v_k) and (−u_k, −v_k), respectively.† These centers are specified with respect
to the center of the frequency rectangle, (⌊M/2⌋, ⌊N/2⌋), where, as usual,
M and N are the number of rows and columns in the input image. Thus, the distance
computations for the filter transfer functions are given by
D_k(u, v) = \left[ (u - M/2 - u_k)^2 + (v - N/2 - v_k)^2 \right]^{1/2}    (5-34)

and

D_{-k}(u, v) = \left[ (u - M/2 + u_k)^2 + (v - N/2 + v_k)^2 \right]^{1/2}    (5-35)
For example, the following is a Butterworth notch reject filter transfer function of
order n with three notch pairs:
H_{NR}(u, v) = \prod_{k=1}^{3} \left[ \frac{1}{1 + \left[ D_{0k}/D_k(u, v) \right]^{n}} \right] \left[ \frac{1}{1 + \left[ D_{0k}/D_{-k}(u, v) \right]^{n}} \right]    (5-36)
Because notches are specified as symmetric pairs, the constant D0k is the same for
each pair. However, this constant can be different from one pair to another. Other
notch reject filter functions are constructed in the same manner, depending on the
highpass filter function chosen. As explained in Section 4.10, a notch pass filter
transfer function is obtained from a notch reject function using the expression

H_{NP}(u, v) = 1 - H_{NR}(u, v)

where H_NP(u, v) is the transfer function of the notch pass filter corresponding to
the notch reject filter with transfer function HNR (u, v). Figure 5.15 shows perspec-
tive plots of the transfer functions of ideal, Gaussian, and Butterworth notch reject
filters with one notch pair. As we discussed in Chapter 4, we see again that the shape
of the Butterworth transfer function represents a transition between the sharpness
of the ideal function and the broad, smooth shape of the Gaussian transfer function.
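A sketch of a Butterworth notch reject transfer function and its application, following the convention of Eqs. (5-33)–(5-36) (the centers are given as offsets (u_k, v_k) from the middle of the frequency rectangle; the function names and the use of unpadded FFTs are my choices, the latter mirroring the simplification noted in the footnote):

```python
import numpy as np

def butterworth_notch_reject(shape, centers, d0, n=2):
    """Product of Butterworth highpass pairs centered at (u_k, v_k) and (-u_k, -v_k), Eq. (5-36)."""
    M, N = shape
    u = np.arange(M).reshape(-1, 1) - M // 2
    v = np.arange(N).reshape(1, -1) - N // 2
    H = np.ones(shape, dtype=np.float64)
    for uk, vk in centers:
        Dk = np.sqrt((u - uk) ** 2 + (v - vk) ** 2)      # Eq. (5-34)
        D_k = np.sqrt((u + uk) ** 2 + (v + vk) ** 2)     # Eq. (5-35)
        H *= 1.0 / (1.0 + (d0 / np.maximum(Dk, 1e-8)) ** n)
        H *= 1.0 / (1.0 + (d0 / np.maximum(D_k, 1e-8)) ** n)
    return H

def apply_frequency_filter(g, H):
    """Filter image g with a centered transfer function H."""
    G = np.fft.fftshift(np.fft.fft2(g))
    return np.real(np.fft.ifft2(np.fft.ifftshift(G * H)))
```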
As we show in the second part of the following example, we are not limited to
notch filter transfer functions of the form just discussed. We can construct notch
filter functions of arbitrary shape, provided they are symmetric about the center of the frequency rectangle.
† Remember, frequency domain transfer functions are symmetric about the center of the frequency rectangle, so
the notches are specified as symmetric pairs. Also, recall from Section 4.10 that we use unpadded images when
working with notch filters in order to simplify the specification of notch locations.
FIGURE 5.15 Perspective plots of (a) ideal, (b) Gaussian, and (c) Butterworth notch reject filter transfer functions.
Figure 5.16(a) is the same as Fig. 2.45(a), which we used in Section 2.6 to introduce the concept of filter-
ing in the frequency domain. We now look in more detail at the process of denoising this image, which is
corrupted by a single, 2-D additive sine wave. You know from Table 4.4 that the Fourier transform of a
pure sine wave is a pair of complex, conjugate impulses, so we would expect the spectrum to have a pair
of bright dots at the frequencies of the sine wave. As Fig. 5.16(b) shows, this is indeed the case. Because
we can determine the location of these impulses accurately, eliminating them is a simple task, consisting
of using a notch filter transfer function whose notches coincide with the location of the impulses.
Figure 5.16(c) shows an ideal notch reject filter transfer function, which is an array of 1's (shown in
white) and two small circular regions of 0's (shown in black). Figure 5.16(d) shows the result of filtering
the noisy image with this transfer function. The sinusoidal noise was virtually eliminated, and a number of
details that were previously obscured by the interference are clearly visible in the filtered image (see, for
example, the thin fiducial marks and the fine detail in the terrain and rock formations). As we showed
in Example 4.25, obtaining an image of the interference pattern is straightforward. We simply turn the
reject filter into a pass filter by subtracting it from 1, and filter the input image with it. Figure 5.17 shows
the result.
Figure 5.18(a) shows the same image as Fig. 4.50(a), but covering a larger area (the interference
pattern is the same). When we discussed lowpass filtering of that image in Chapter 4, we indicated that
there were better ways to reduce the effect of the scan lines. The notch filtering approach that follows
reduces the scan lines significantly, without introducing blurring. Unless blurring is desirable for reasons
we discussed in Section 4.9, notch filtering generally gives much better results.
Just by looking at the nearly horizontal lines of the noise pattern in Fig. 5.18(a), we expect its con-
tribution in the frequency domain to be concentrated along the vertical axis of the DFT. However,
the noise is not dominant enough to have a clear pattern along this axis, as is evident in the spectrum
shown in Fig. 5.18(b). The approach to follow in cases like this is to use a narrow, rectangular notch filter
function that extends along the vertical axis, and thus eliminates all components of the interference
along that axis. We do not filter near the origin to avoid eliminating the dc term and low frequencies,
c d
FIGURE 5.16
(a) Image cor-
rupted by sinusoi-
dal interference.
(b) Spectrum
showing the
bursts of energy
caused by the
interference. (The
bursts were
enlarged for
display purposes.)
(c) Notch filter
(the radius of the
circles is 2 pixels)
used to eliminate
the energy bursts.
(The thin borders
are not part of the
data.)
(d) Result of
notch reject
filtering.
(Original
image courtesy of
NASA.)
which, as you know from Chapter 4, are responsible for the intensity differences between smooth areas.
Figure 5.18(c) shows the filter transfer function we used, and Fig. 5.18(d) shows the filtered result. Most
of the fine scan lines were eliminated or significantly attenuated. In order to get an image of the noise
pattern, we proceed as before by converting the reject filter into a pass filter, and then filtering the input
image with it. Figure 5.19 shows the result.
FIGURE 5.17
Sinusoidal
pattern extracted
from the DFT
of Fig. 5.16(a)
using a notch pass
filter.
c d
FIGURE 5.18
(a) Satellite image
of Florida and the
Gulf of Mexico.
(Note horizontal
sensor scan lines.)
(b) Spectrum of
(a). (c) Notch
reject filter
transfer
function. (The
thin black border
is not part of the
data.) (d) Filtered
image. (Original
image courtesy of
NOAA.)
FIGURE 5.19
Noise pattern
extracted from
Fig. 5.18(a) by
notch pass
filtering.
OPTIMUM NOTCH FILTERING
The first step in this approach is to extract the principal frequency components of the interference pattern. This can be done by placing a notch pass filter transfer function, H_NP(u, v), at the location of each spike. The corresponding interference pattern in the spatial domain is then the inverse Fourier transform

h(x, y) = \mathcal{F}^{-1}\{ H_{NP}(u, v)\, G(u, v) \}    (5-39)
Because the corrupted image is assumed to be formed by the addition of the uncor-
rupted image f (x, y) and the interference, h(x, y), if the latter were known com-
pletely, subtracting the pattern from g(x, y) to obtain f (x, y) would be a simple mat-
ter. The problem, of course, is that this filtering procedure usually yields only an
approximation of the true noise pattern. The effect of components not present in
the estimate of h(x, y) can be minimized by subtracting from g(x, y) a weighted
portion of h(x, y) to obtain an estimate of f(x, y):

\hat{f}(x, y) = g(x, y) - w(x, y)\, h(x, y)    (5-40)

where w(x, y) is a weighting (or modulation) function to be determined. One approach
is to select w(x, y) so that the variance of f̂(x, y) is minimized over a neighborhood
Sxy of size m × n centered on (x, y). The local variance of f̂(x, y) at point (x, y) can
be estimated from the samples as

\sigma^2(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} \left[ \hat{f}(r, c) - \bar{\hat{f}}\, \right]^2    (5-41)

where \bar{\hat{f}} is the average value of f̂ in the neighborhood Sxy:

\bar{\hat{f}} = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} \hat{f}(r, c)    (5-42)
Points on or near the edge of the image can be treated by considering partial neigh-
borhoods or by padding the border with 0's.
Substituting Eq. (5-40) into Eq. (5-41) we obtain
\sigma^2(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} \Bigl\{ \left[ g(r, c) - w(r, c)\, h(r, c) \right] - \left[ \bar{g} - \overline{wh} \right] \Bigr\}^2    (5-43)
where g and wh denote the average values of g and of the product wh in neighbor-
hood Sxy , respectively.
If we assume that w is approximately constant in Sxy we can replace w(r, c) by
the value of w at the center of the neighborhood:

w(r, c) \approx w(x, y)    (5-44)

for (r, c) ∈ Sxy. Because w is assumed to be approximately constant in the neighborhood, it also follows that

\overline{wh} = w(x, y)\, \bar{h}    (5-45)

in Sxy, where \bar{h} is the average value of h in the neighborhood. Using these approxi-
mations, Eq. (5-43) becomes
\sigma^2(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} \Bigl\{ \left[ g(r, c) - w(x, y)\, h(r, c) \right] - \left[ \bar{g} - w(x, y)\, \bar{h} \right] \Bigr\}^2    (5-46)
To minimize σ²(x, y) with respect to w(x, y) we solve

\frac{\partial \sigma^2(x, y)}{\partial w(x, y)} = 0    (5-47)

for w(x, y). The result is

w(x, y) = \frac{\overline{g h} - \bar{g}\, \bar{h}}{\overline{h^2} - \bar{h}^2}    (5-48)
To obtain the value of the restored image at point (x, y) we use this equation to com-
pute w(x, y) and then substitute it into Eq. (5-40). To obtain the complete restored
image, we perform this procedure at every point in the noisy image, g.
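A compact sketch of this optimum notch procedure, with the local averages of Eq. (5-48) computed by box filtering (SciPy assumed; the neighborhood size is an illustrative choice). Here h is the noise pattern obtained from Eq. (5-39):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def optimum_notch_restore(g, h, size=15):
    """Compute w(x, y) from Eq. (5-48) and return f_hat = g - w*h from Eq. (5-40)."""
    g = g.astype(np.float64)
    h = h.astype(np.float64)
    mean = lambda a: uniform_filter(a, size=size, mode='reflect')
    num = mean(g * h) - mean(g) * mean(h)
    den = mean(h * h) - mean(h) ** 2
    w = num / np.maximum(den, 1e-12)
    return g - w * h
```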
Figure 5.20(a) shows a digital image of the Martian terrain taken by the Mariner 6 spacecraft. The image
is corrupted by a semi-periodic interference pattern that is considerably more complex (and much more
subtle) than those we have studied thus far. The Fourier spectrum of the image, shown in Fig. 5.20(b),
has a number of “starlike” bursts of energy caused by the interference. As expected, these components
are more difficult to detect than those we have seen before. Figure 5.21 shows the spectrum again, but
without centering. This image offers a somewhat clearer view of the interference components because
the more prominent dc term and low frequencies are "out of the way," in the top left of the spectrum.
Figure 5.22(a) shows the spectrum components that, in the judgement of an experienced image ana-
lyst, are associated with the interference. Applying a notch pass filter to these components and using
Eq. (5-39) yielded the spatial noise pattern, h(x, y), shown in Fig. 5.22(b). Note the similarity between
this pattern and the structure of the noise in Fig. 5.20(a).
a b
FIGURE 5.20
(a) Image of the
Martian
terrain taken by
Mariner 6.
(b) Fourier
spectrum showing
periodic
interference.
(Courtesy of
NASA.)
FIGURE 5.21
Uncentered
Fourier spectrum
of the image
in Fig. 5.20(a).
(Courtesy of
NASA.)
Finally, Fig. 5.23 shows the restored image, obtained using Eq. (5-40) with the interference pattern just
discussed. Function w(x, y) was computed using the procedure explained in the preceding paragraphs.
As you can see, the periodic interference was virtually eliminated from the noisy image in Fig. 5.20(a).
For the moment, let us assume that the noise term is zero, so that g(x, y) = H[ f(x, y)], where H denotes the degradation operator. Based on
the discussion in Section 2.6, H is linear if

H[ a f_1(x, y) + b f_2(x, y) ] = a\, H[ f_1(x, y) ] + b\, H[ f_2(x, y) ]

for any two images f1 and f2 and any two scalars a and b.
a b
FIGURE 5.22
(a) Fourier spec-
trum of N(u, v),
and
(b) corresponding
spatial noise
interference
pattern, h(x, y).
(Courtesy of
NASA.)
Image Segmentation
Preview
The material in the previous chapter began a transition from image processing methods whose inputs
and outputs are images, to methods in which the inputs are images but the outputs are attributes extract-
ed from those images. Most of the segmentation algorithms in this chapter are based on one of two basic
properties of image intensity values: discontinuity and similarity. In the first category, the approach is
to partition an image into regions based on abrupt changes in intensity, such as edges. Approaches in
the second category are based on partitioning an image into regions that are similar according to a set
of predefined criteria. Thresholding, region growing, and region splitting and merging are examples of
methods in this category. We show that improvements in segmentation performance can be achieved
by combining methods from distinct categories, such as techniques in which edge detection is combined
with thresholding. We discuss also image segmentation using clustering and superpixels, and give an
introduction to graph cuts, an approach ideally suited for extracting the principal regions of an image.
This is followed by a discussion of image segmentation based on morphology, an approach that com-
bines several of the attributes of segmentation based on the techniques presented in the first part of the
chapter. We conclude the chapter with a brief discussion on the use of motion cues for segmentation.
10.1 FUNDAMENTALS
Let R represent the entire spatial region occupied by an image. We may view image
segmentation as a process that partitions R into n subregions, R1, R2 , …, Rn , such
that
(a) R1 ∪ R2 ∪ ⋯ ∪ Rn = R.
(b) Ri is a connected set, for i = 1, 2, … , n.
(c) Ri ∩ Rj = ∅ for all i and j, i ≠ j.
(d) Q(Ri) = TRUE for i = 1, 2, … , n.
(e) Q(Ri ∪ Rj) = FALSE for any adjacent regions Ri and Rj.
where Q(Rk) is a logical predicate defined over the points in set Rk, and ∅ is the
null set. The symbols ∪ and ∩ represent set union and intersection, respectively, as
defined in Section 2.6. Two regions Ri and Rj are said to be adjacent if their union
forms a connected set, as defined in Section 2.5. If the set formed by the union of two
regions is not connected, the regions are said to be disjoint.
Condition (a) indicates that the segmentation must be complete, in the sense that
every pixel must be in a region. Condition (b) requires that points in a region be con-
nected in some predefined sense (e.g., the points must be 8-connected). Condition
(c) says that the regions must be disjoint. Condition (d) deals with the properties
that must be satisfied by the pixels in a segmented region—for example, Q(Ri) =
TRUE if all pixels in Ri have the same intensity. Finally, condition (e) indicates
that two adjacent regions Ri and Rj must be different in the sense of predicate Q.†
Thus, we see that the fundamental problem in segmentation is to partition an
image into regions that satisfy the preceding conditions. Segmentation algorithms
for monochrome images generally are based on one of two basic categories dealing
with properties of intensity values: discontinuity and similarity. In the first category,
we assume that boundaries of regions are sufficiently different from each other, and
from the background, to allow boundary detection based on local discontinuities in
intensity. Edge-based segmentation is the principal approach used in this category.
Region-based segmentation approaches in the second category are based on parti-
tioning an image into regions that are similar according to a set of predefined criteria.
Figure 10.1 illustrates the preceding concepts. Figure 10.1(a) shows an image of a
region of constant intensity superimposed on a darker background, also of constant
intensity. These two regions comprise the overall image. Figure 10.1(b) shows the
result of computing the boundary of the inner region based on intensity discontinui-
ties. Points on the inside and outside of the boundary are black (zero) because there
are no discontinuities in intensity in those regions. To segment the image, we assign
one level (say, white) to the pixels on or inside the boundary, and another level (e.g.,
black) to all points exterior to the boundary. Figure 10.1(c) shows the result of such
a procedure. We see that conditions (a) through (c) stated at the beginning of this
†
In general, Q can be a compound expression such as, "Q(Ri) = TRUE if the average intensity of the pixels in
region Ri is less than mi AND if the standard deviation of their intensity is greater than si,” where mi and si
are specified constants.
a b c
d e f
FIGURE 10.1
(a) Image of a
constant intensity
region.
(b) Boundary
based on intensity
discontinuities.
(c) Result of
segmentation.
(d) Image of a
texture region.
(e) Result of
intensity discon-
tinuity computa-
tions (note the
large number of
small edges).
(f) Result of
segmentation
based on region
properties.
section are satisfied by this result. The predicate of condition (d) is: If a pixel is on,
or inside the boundary, label it white; otherwise, label it black. We see that this predi-
cate is TRUE for the points labeled black or white in Fig. 10.1(c). Similarly, the two
segmented regions (object and background) satisfy condition (e).
The next three images illustrate region-based segmentation. Figure 10.1(d) is
similar to Fig. 10.1(a), but the intensities of the inner region form a textured pattern.
Figure 10.1(e) shows the result of computing intensity discontinuities in this image.
The numerous spurious changes in intensity make it difficult to identify a unique
boundary for the original image because many of the nonzero intensity changes are
connected to the boundary, so edge-based segmentation is not a suitable approach.
However, we note that the outer region is constant, so all we need to solve this seg-
mentation problem is a predicate that differentiates between textured and constant
regions. The standard deviation of pixel values is a measure that accomplishes this
because it is nonzero in areas of the texture region, and zero otherwise. Figure 10.1(f)
shows the result of dividing the original image into subregions of size 8 × 8. Each
subregion was then labeled white if the standard deviation of its pixels was posi-
tive (i.e., if the predicate was TRUE), and zero otherwise. The result has a "blocky"
appearance around the edge of the region because groups of 8 × 8 squares were
labeled with the same intensity (smaller squares would have given a smoother
region boundary). Finally, note that these results also satisfy the five segmentation
conditions stated at the beginning of this section.
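The region-based predicate used for Fig. 10.1(f) can be sketched as follows (the block size of 8 and the output labels 0/255 follow the description above; the function name is mine):

```python
import numpy as np

def texture_block_segmentation(image, block=8):
    """Label each block x block subregion 255 if the standard deviation of its
    pixels is positive (predicate TRUE), and 0 otherwise."""
    rows, cols = image.shape
    out = np.zeros_like(image, dtype=np.uint8)
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            if image[r:r + block, c:c + block].std() > 0:
                out[r:r + block, c:c + block] = 255
    return out
```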
10.2 POINT, LINE, AND EDGE DETECTION
The focus of this section is on segmentation methods that are based on detecting sharp, local changes in intensity. The three types of image features in which
we are interested are isolated points, lines, and edges. Edge pixels are pixels at which
the intensity of an image changes abruptly, and edges (or edge segments) are sets of
connected edge pixels (see Section 2.5 regarding connectivity). Edge detectors are
local image processing tools designed to detect edge pixels. A line may be viewed as
a (typically) thin edge segment in which the intensity of the background on either
side of the line is either much higher or much lower than the intensity of the line
pixels. In fact, as we will discuss later, lines give rise to so-called "roof edges." Finally,
an isolated point may be viewed as a foreground (background) pixel surrounded by
background (foreground) pixels.
(When we refer to lines, we are referring to thin structures, typically just a few pixels thick. Such lines may correspond, for example, to elements of a digitized architectural drawing, or roads in a satellite image.)
BACKGROUND
As we saw in Section 3.5, local averaging smoothes an image. Given that averaging
is analogous to integration, it is intuitive that abrupt, local changes in intensity can
be detected using derivatives. For reasons that will become evident shortly, first- and
second-order derivatives are particularly well suited for this purpose.
Derivatives of a digital function are defined in terms of finite differences. There
are various ways to compute these differences but, as explained in Section 3.6, we
require that any approximation used for first derivatives (1) must be zero in areas
of constant intensity; (2) must be nonzero at the onset of an intensity step or ramp;
and (3) must be nonzero at points along an intensity ramp. Similarly, we require that
an approximation used for second derivatives (1) must be zero in areas of constant
intensity; (2) must be nonzero at the onset and end of an intensity step or ramp; and
(3) must be zero along intensity ramps. Because we are dealing with digital quanti-
ties whose values are finite, the maximum possible intensity change is also finite, and
the shortest distance over which a change can occur is between adjacent pixels.
We obtain an approximation to the first-order derivative at an arbitrary point x of
a one-dimensional function f(x) by expanding the function f(x + Δx) into a Taylor
series about x:

f(x + \Delta x) = \sum_{n=0}^{\infty} \frac{(\Delta x)^n}{n!} \frac{\partial^n f(x)}{\partial x^n}
             = f(x) + \Delta x \frac{\partial f(x)}{\partial x} + \frac{(\Delta x)^2}{2!} \frac{\partial^2 f(x)}{\partial x^2} + \frac{(\Delta x)^3}{3!} \frac{\partial^3 f(x)}{\partial x^3} + \cdots    (10-1)

(Remember, the notation n! means "n factorial": n! = 1 × 2 × ⋯ × n. Although f is a function of only one variable, we use partial-derivative notation for consistency with the discussion of functions of two variables later in this section.)

where Δx is the separation between samples of f. For our purposes, this separation
is measured in pixel units. Thus, following the convention in the book, Δx = −1 for
the sample preceding x and Δx = 1 for the sample following x. When Δx = 1, Eq.
(10-1) becomes

f(x + 1) = \sum_{n=0}^{\infty} \frac{1}{n!} \frac{\partial^n f(x)}{\partial x^n} = f(x) + \frac{\partial f(x)}{\partial x} + \frac{1}{2!} \frac{\partial^2 f(x)}{\partial x^2} + \frac{1}{3!} \frac{\partial^3 f(x)}{\partial x^3} + \cdots    (10-2)

Similarly, when Δx = −1, Eq. (10-1) becomes

f(x - 1) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} \frac{\partial^n f(x)}{\partial x^n} = f(x) - \frac{\partial f(x)}{\partial x} + \frac{1}{2!} \frac{\partial^2 f(x)}{\partial x^2} - \frac{1}{3!} \frac{\partial^3 f(x)}{\partial x^3} + \cdots    (10-3)
In what follows, we compute intensity differences using just a few terms of the Taylor
series. For first-order derivatives we use only the linear terms, and we can form dif-
ferences in one of three ways.
The forward difference is obtained from Eq. (10-2):
\frac{\partial f(x)}{\partial x} = f'(x) = f(x + 1) - f(x)    (10-4)
where, as you can see, we kept only the linear terms. The backward difference is simi-
larly obtained by keeping only the linear terms in Eq. (10-3):
\frac{\partial f(x)}{\partial x} = f'(x) = f(x) - f(x - 1)    (10-5)
and the central difference is obtained by subtracting Eq. (10-3) from Eq. (10-2):
\frac{\partial f(x)}{\partial x} = f'(x) = \frac{f(x + 1) - f(x - 1)}{2}    (10-6)
The higher terms of the series that we did not use represent the error between an
exact and an approximate derivative expansion. In general, the more terms we use
from the Taylor series to represent a derivative, the more accurate the approxima-
tion will be. To include more terms implies that more points are used in the approxi-
mation, yielding a lower error. However, it turns out that central differences have
a lower error for the same number of points (see Problem 10.1). For this reason,
derivatives are usually expressed as central differences.
The second-order derivative based on a central difference, \partial^2 f(x)/\partial x^2, is obtained
by adding Eqs. (10-2) and (10-3):

\frac{\partial^2 f(x)}{\partial x^2} = f''(x) = f(x + 1) - 2 f(x) + f(x - 1)    (10-7)
To obtain the third-order central derivative we need one more point on either side
of x. That is, we need the Taylor expansions for f(x + 2) and f(x − 2), which we
obtain from Eqs. (10-2) and (10-3) with Δx = 2 and Δx = −2, respectively. The strat-
egy is to combine the two Taylor expansions to eliminate all derivatives lower than
the third. The result after ignoring all higher-order terms [see Problem 10.2(a)] is

\frac{\partial^3 f(x)}{\partial x^3} = f'''(x) = \frac{f(x + 2) - 2 f(x + 1) + 0 f(x) + 2 f(x - 1) - f(x - 2)}{2}    (10-8)
Similarly [see Problem 10.2(b)], the fourth finite difference (the highest we use in
the book) after ignoring all higher order terms is given by
\frac{\partial^4 f(x)}{\partial x^4} = f''''(x) = f(x + 2) - 4 f(x + 1) + 6 f(x) - 4 f(x - 1) + f(x - 2)    (10-9)
Table 10.1 summarizes the first four central derivatives just discussed. Note the
symmetry of the coefficients about the center point. This symmetry is at the root
of why central differences have a lower approximation error for the same number
of points than the other two differences. For two variables, we apply the results in
Table 10.1 to each variable independently. For example,
\frac{\partial^2 f(x, y)}{\partial x^2} = f(x + 1, y) - 2 f(x, y) + f(x - 1, y)    (10-10)
and
\frac{\partial^2 f(x, y)}{\partial y^2} = f(x, y + 1) - 2 f(x, y) + f(x, y - 1)    (10-11)
It is easily verified that the first and second-order derivatives in Eqs. (10-4)
through (10-7) satisfy the conditions stated at the beginning of this section regarding
derivatives of the first and second order. To illustrate this, consider Fig. 10.2. Part (a)
shows an image of various objects, a line, and an isolated point. Figure 10.2(b) shows
a horizontal intensity profile (scan line) through the center of the image, including
the isolated point. Transitions in intensity between the solid objects and the back-
ground along the scan line show two types of edges: ramp edges (on the left) and
step edges (on the right). As we will discuss later, intensity transitions involving thin
objects such as lines often are referred to as roof edges.
Figure 10.2(c) shows a simplified profile, with just enough points to make it possi-
ble for us to analyze manually how the first- and second-order derivatives behave as
they encounter a point, a line, and the edges of objects. In this diagram the transition
a b
c
FIGURE 10.2
(a) Image. (b) Horizontal intensity profile that includes the isolated point indicated by the arrow. (c) Subsampled profile; the dashes were added for clarity. The numbers in the boxes are the intensity values of the dots shown in the profile. The derivatives were obtained using Eq. (10-4) for the first derivative and Eq. (10-7) for the second.
Intensity values along the subsampled profile: 5 5 4 3 2 1 0 0 0 6 0 0 0 0 1 3 1 0 0 0 0 7 7 7 7
in the ramp spans four pixels, the noise point is a single pixel, the line is three pixels
thick, and the transition of the step edge takes place between adjacent pixels. The
number of intensity levels was limited to eight for simplicity.
Consider the properties of the first and second derivatives as we traverse the
profile from left to right. Initially, the first-order derivative is nonzero at the onset
and along the entire intensity ramp, while the second-order derivative is nonzero
only at the onset and end of the ramp. Because the edges of digital images resemble
this type of transition, we conclude that first-order derivatives produce “thick” edges,
and second-order derivatives much thinner ones. Next we encounter the isolated
noise point. Here, the magnitude of the response at the point is much stronger for
the second- than for the first-order derivative. This is not unexpected, because a
second-order derivative is much more aggressive than a first-order derivative in
enhancing sharp changes. Thus, we can expect second-order derivatives to enhance
fine detail (including noise) much more than first-order derivatives. The line in this
example is rather thin, so it too is fine detail, and we see again that the second deriva-
tive has a larger magnitude. Finally, note in both the ramp and step edges that the second derivative has opposite signs as it transitions into and out of an edge; this "double-edge" effect can be used to locate edges, as we show later in this section.
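The behavior just described can be reproduced numerically from the profile values listed in Fig. 10.2 (a short NumPy check; the derivative formulas are exactly Eqs. (10-4) and (10-7)):

```python
import numpy as np

# Intensity values of the subsampled profile in Fig. 10.2(c).
f = np.array([5, 5, 4, 3, 2, 1, 0, 0, 0, 6, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 7, 7, 7, 7], dtype=float)

first = f[1:] - f[:-1]                     # forward difference, Eq. (10-4)
second = f[2:] - 2 * f[1:-1] + f[:-2]      # central second difference, Eq. (10-7)

print("f'  :", first.astype(int))
print("f'' :", second.astype(int))
# The ramp gives a constant nonzero first derivative, while the second derivative
# is nonzero only at the onset and end of the ramp; the isolated point and the
# thin line produce much stronger second-derivative responses.
```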
FIGURE 10.3
A general 3 × 3
spatial filter
kernel. The w’s
are the kernel
coefficients
(weights).
DETECTION OF ISOLATED POINTS
Based on the conclusions reached earlier in this section, point detection should be based on the second derivative. Using the general 3 × 3 kernel of Fig. 10.3, the response of the kernel at the point on which it is centered is

Z = \sum_{k=1}^{9} w_k z_k    (10-12)

where z_k is the intensity of the pixel under kernel coefficient w_k. Detection is based on the Laplacian,

\nabla^2 f(x, y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}    (10-13)
where the partial derivatives are computed using the second-order finite differences
in Eqs. (10-10) and (10-11). The Laplacian is then

\nabla^2 f(x, y) = f(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1) - 4 f(x, y)    (10-14)
As explained in Section 3.6, this expression can be implemented using the Lapla-
cian kernel in Fig. 10.4(a), used in Example 10.1. We then say that a point has been
detected at a location (x, y) on which the kernel is centered if the absolute value of
the response of the filter at that point exceeds a specified threshold. Such points are
labeled 1 and all others are labeled 0 in the output image, thus producing a binary
image. In other words, we use the expression:
\[
g(x, y) = \begin{cases} 1 & \text{if } |Z(x, y)| > T \\ 0 & \text{otherwise} \end{cases} \tag{10-15}
\]
where g(x, y) is the output image, T is a nonnegative threshold, and Z is given by
Eq. (10-12). This formulation simply measures the weighted differences between a
pixel and its 8-neighbors. Intuitively, the idea is that the intensity of an isolated point
will be quite different from its surroundings, and thus will be easily detectable by
this type of kernel. Differences in intensity that are considered of interest are those
large enough (as determined by T) to be considered isolated points. Note that, as
usual for a derivative kernel, the coefficients sum to zero, indicating that the filter
response will be zero in areas of constant intensity.
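As a concrete sketch of the rule in Eq. (10-15), assuming NumPy/SciPy and a Laplacian kernel with a -8 center and unit neighbors (the sign convention is immaterial here because the absolute response is thresholded); detect_points and its 90% default threshold are illustrative choices that mirror the example that follows, not part of the original text:

```python
import numpy as np
from scipy.ndimage import convolve

# Laplacian kernel (coefficients sum to zero, so the response is zero in
# regions of constant intensity). The -8-center form is assumed here.
LAPLACIAN = np.array([[1,  1, 1],
                      [1, -8, 1],
                      [1,  1, 1]], dtype=float)

def detect_points(image, frac=0.9):
    """Isolated-point detection per Eq. (10-15): label a pixel 1 when the
    absolute Laplacian response exceeds T = frac * max|Z|, else 0."""
    Z = convolve(image.astype(float), LAPLACIAN, mode='reflect')
    T = frac * np.abs(Z).max()
    return (np.abs(Z) > T).astype(np.uint8)
```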
Figure 10.4(b) is an X-ray image of a turbine blade from a jet engine. The blade has a porosity mani-
fested by a single black pixel in the upper-right quadrant of the image. Figure 10.4(c) is the result of fil-
tering the image with the Laplacian kernel, and Fig. 10.4(d) shows the result of Eq. (10-15) with T equal
to 90% of the highest absolute pixel value of the image in Fig. 10.4(c). The single pixel is clearly visible
in this image at the tip of the arrow (the pixel was enlarged to enhance its visibility). This type of detec-
tion process is specialized because it is based on abrupt intensity changes at single-pixel locations that
are surrounded by a homogeneous background in the area of the detector kernel. When this condition
is not satisfied, other methods discussed in this chapter are more suitable for detecting intensity changes.
LINE DETECTION
The next level of complexity is line detection. Based on the discussion earlier in this
section, we know that for line detection we can expect second derivatives to result
in a stronger filter response, and to produce thinner lines than first derivatives. Thus,
we can use the Laplacian kernel in Fig. 10.4(a) for line detection also, keeping in
mind that the double-line effect of the second derivative must be handled properly.
The following example illustrates the procedure.
FIGURE 10.4 (a) Laplacian kernel used for point detection. (b) X-ray image of a turbine blade with a
porosity manifested by a single black pixel. (c) Result of convolving the kernel with the image. (d) Result
of using Eq. (10-15) was a single point (shown enlarged at the tip of the arrow). (Original image courtesy
of X-TEK Systems, Ltd.)
Figure 10.5(a) shows a 486 × 486 (binary) portion of a wire-bond mask for an electronic circuit, and
Fig. 10.5(b) shows its Laplacian image. Because the Laplacian image contains negative values (see the
discussion after Example 3.18), scaling is necessary for display. As the magnified section shows, mid gray
represents zero, darker shades of gray represent negative values, and lighter shades are positive. The
double-line effect is clearly visible in the magnified region.
At first, it might appear that the negative values can be handled simply by taking the absolute value
of the Laplacian image. However, as Fig. 10.5(c) shows, this approach doubles the thickness of the lines.
A more suitable approach is to use only the positive values of the Laplacian (in noisy situations we use
the values that exceed a positive threshold to eliminate random variations about zero caused by the
noise). As Fig. 10.5(d) shows, this approach results in thinner lines that generally are more useful. Note
in Figs. 10.5(b) through (d) that when the lines are wide with respect to the size of the Laplacian kernel,
the lines are separated by a zero "valley." This is not unexpected. For example, when the 3 × 3 kernel is
centered on a line of constant intensity 5 pixels wide, the response will be zero, thus producing the effect
just mentioned. When we talk about line detection, the assumption is that lines are thin with respect to
the size of the detector. Lines that do not satisfy this assumption are best treated as regions and handled
by the edge detection methods discussed in the following section.
The Laplacian detector kernel in Fig. 10.4(a) is isotropic, so its response is inde-
pendent of direction (with respect to the four directions of the 3 × 3 kernel: verti-
cal, horizontal, and two diagonals). Often, interest lies in detecting lines in specified
c d
FIGURE 10.5
(a) Original
image.
(b) Laplacian
image; the
magnified
section shows the
positive/negative
double-line effect
characteristic of
the Laplacian.
(c) Absolute value
of the Laplacian.
(d) Positive values
of the Laplacian.
directions. Consider the kernels in Fig. 10.6. Suppose that an image with a constant
background and containing various lines (oriented at 0°, ±45°, and 90°) is filtered
with the first kernel. The maximum responses would occur at image locations in
which a horizontal line passes through the middle row of the kernel. This is easily
verified by sketching a simple array of 1's with a line of a different intensity (say, 5's)
running horizontally through the array. A similar experiment would reveal that the
second kernel in Fig. 10.6 responds best to lines oriented at +45°; the third kernel
to vertical lines; and the fourth kernel to lines in the −45° direction. The preferred
direction of each kernel is weighted with a larger coefficient (i.e., 2) than other possi-
ble directions. The coefficients in each kernel sum to zero, indicating a zero response
in areas of constant intensity.
Let Z1, Z2, Z3, and Z4 denote the responses of the kernels in Fig. 10.6, from left
to right, where the Z's are given by Eq. (10-12). Suppose that an image is filtered
with these four kernels, one at a time. If, at a given point in the image, |Zk| > |Zj|
for all j ≠ k, that point is said to be more likely associated with a line in the direc-
tion of kernel k. For example, if at a point in the image, |Z1| > |Zj| for j = 2, 3, 4, that
   -1  -1  -1        2  -1  -1       -1   2  -1       -1  -1   2
    2   2   2       -1   2  -1       -1   2  -1       -1   2  -1
   -1  -1  -1       -1  -1   2       -1   2  -1        2  -1  -1
   Horizontal          +45°            Vertical           −45°

FIGURE 10.6 Line detection kernels. Detection angles are with respect to the axis system in Fig. 2.19, with positive
angles measured counterclockwise with respect to the (vertical) x-axis.
point is said to be more likely associated with a horizontal line. If we are interested
in detecting all the lines in an image in the direction defined by a given kernel, we
simply run the kernel through the image and threshold the absolute value of the
result, as in Eq. (10-15). The nonzero points remaining after thresholding are the
strongest responses which, for lines one pixel thick, correspond closest to the direc-
tion defined by the kernel. The following example illustrates this procedure.
Figure 10.7(a) shows the image used in the previous example. Suppose that we are interested in find-
ing all the lines that are one pixel thick and oriented at +45°. For this purpose, we use the kernel in
Fig. 10.6(b). Figure 10.7(b) is the result of filtering the image with that kernel. As before, the shades
darker than the gray background in Fig. 10.7(b) correspond to negative values. There are two principal
segments in the image oriented in the +45° direction, one in the top left and one at the bottom right. Fig-
ures 10.7(c) and (d) show zoomed sections of Fig. 10.7(b) corresponding to these two areas. The straight
line segment in Fig. 10.7(d) is brighter than the segment in Fig. 10.7(c) because the line segment in the
bottom right of Fig. 10.7(a) is one pixel thick, while the one at the top left is not. The kernel is "tuned"
to detect one-pixel-thick lines in the +45° direction, so we expect its response to be stronger when such
lines are detected. Figure 10.7(e) shows the positive values of Fig. 10.7(b). Because we are interested in
the strongest response, we let T = 254 (the maximum value in Fig. 10.7(e) minus one). Figure 10.7(f)
shows in white the points whose values satisfied the condition g ≥ T, where g is the image in Fig. 10.7(e).
The isolated points in the figure are points that also had similarly strong responses to the kernel. In the
original image, these points and their immediate neighbors are oriented in such a way that the kernel
produced a maximum response at those locations. These isolated points can be detected using the kernel
in Fig. 10.4(a) and then deleted, or they can be deleted using morphological operators, as discussed in the
last chapter.
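A minimal sketch of the directional line-detection procedure just described, assuming NumPy/SciPy; LINE_KERNELS and detect_lines are illustrative names, and the threshold T = max − 1 mirrors the wire-bond example above:

```python
import numpy as np
from scipy.ndimage import convolve

# Line-detection kernels of Fig. 10.6: preferred direction weighted by 2,
# remaining coefficients -1, so each kernel sums to zero.
LINE_KERNELS = {
    "horizontal": np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]], float),
    "+45":        np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]], float),
    "vertical":   np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]], float),
    "-45":        np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]], float),
}

def detect_lines(image, direction="+45"):
    """Filter with the chosen kernel, keep only the positive responses, and
    threshold near the maximum to retain the strongest one-pixel-thick lines."""
    Z = convolve(image.astype(float), LINE_KERNELS[direction], mode='reflect')
    Z[Z < 0] = 0                       # positive values of the response only
    T = Z.max() - 1                    # keep only the strongest responses
    return (Z >= T).astype(np.uint8)
```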
EDGE MODELS
Edge detection is an approach used frequently for segmenting images based on
abrupt (local) changes in intensity. We begin by introducing several ways to model
edges and then discuss a number of approaches for edge detection.
a b c
de f
FIGURE 10.7 (a) Image of a wire-bond template. (b) Result of processing with the +45° line detector kernel in Fig.
10.6. (c) Zoomed view of the top left region of (b). (d) Zoomed view of the bottom right region of (b). (e) The image
in (b) with all negative values set to zero. (f) All points (in white) whose values satisfied the condition g ≥ T, where
g is the image in (e) and T = 254 (the maximum pixel value in the image minus 1). (The points in (f) were enlarged
to make them easier to see.)
Edge models are classified according to their intensity profiles. A step edge is
characterized by a transition between two intensity levels occurring ideally over the
distance of one pixel. Figure 10.8(a) shows a section of a vertical step edge and
a horizontal intensity profile through the edge. Step edges occur, for example, in
images generated by a computer for use in areas such as solid modeling and ani-
mation. These clean, ideal edges can occur over the distance of one pixel, provided
that no additional processing (such as smoothing) is used to make them look “real.”
Digital step edges are used frequently as edge models in algorithm development.
For example, the Canny edge detection algorithm discussed later in this section was
derived originally using a step-edge model.
In practice, digital images have edges that are blurred and noisy, with the degree
of blurring determined principally by limitations in the focusing mechanism (e.g.,
lenses in the case of optical images), and the noise level determined principally by
the electronic components of the imaging system. In such situations, edges are more
FIGURE 10.8
From left to right,
models (ideal
representations) of
a step, a ramp, and
a roof edge, and
their corresponding
intensity profiles.
closely modeled as having an intensity ramp profile, such as the edge in Fig. 10.8(b).
The slope of the ramp is inversely proportional to the degree to which the edge is
blurred. In this model, we no longer have a single “edge point” along the profile.
Instead, an edge point now is any point contained in the ramp, and an edge segment
would then be a set of such points that are connected.
A third type of edge is the so-called roof edge, having the characteristics illus-
trated in Fig. 10.8(c). Roof edges are models of lines through a region, with the
base (width) of the edge being determined by the thickness and sharpness of the
line. In the limit, when its base is one pixel wide, a roof edge is nothing more than
a one-pixel-thick line running through a region in an image. Roof edges arise, for
example, in range imaging, when thin objects (such as pipes) are closer to the sensor
than the background (such as walls). The pipes appear brighter and thus create an
image similar to the model in Fig. 10.8(c). Other areas in which roof edges appear
routinely are in the digitization of line drawings and also in satellite images, where
thin features, such as roads, can be modeled by this type of edge.
It is not unusual to find images that contain all three types of edges. Although
blurring and noise result in deviations from the ideal shapes, edges in images that
are reasonably sharp and have a moderate amount of noise do resemble the charac-
teristics of the edge models in Fig. 10.8, as the profiles in Fig. 10.9 illustrate. What the
models in Fig. 10.8 allow us to do is write mathematical expressions for edges in the
development of image processing algorithms. The performance of these algorithms
will depend on the differences between actual edges and the models used in devel-
oping the algorithms.
Figure 10.10(a) shows the image from which the segment in Fig. 10.8(b) was extract-
ed. Figure 10.10(b) shows a horizontal intensity profile. This figure shows also the first
and second derivatives of the intensity profile. Moving from left to right along the
intensity profile, we note that the first derivative is positive at the onset of the ramp
and at points on the ramp, and it is zero in areas of constant intensity. The second
derivative is positive at the beginning of the ramp, negative at the end of the ramp,
zero at points on the ramp, and zero at points of constant intensity. The signs of the
derivatives just discussed would be reversed for an edge that transitions from light to
dark. The intersection between the zero intensity axis and a line extending between
the extrema of the second derivative marks a point called the zero crossing of the
second derivative.
We conclude from these observations that the magnitude of the first derivative
can be used to detect the presence of an edge at a point in an image. Similarly, the
sign of the second derivative can be used to determine whether an edge pixel lies on
FIGURE 10.9 A 1508 × 1970 image showing (zoomed) actual ramp (bottom, left), step (top,
right), and roof edge profiles. The profiles are from dark to light, in the areas enclosed by the
small circles. The ramp and step profiles span 9 pixels and 2 pixels, respectively. The base of the
roof edge is 3 pixels. (Original image courtesy of Dr. David R. Pickens, Vanderbilt University.)
the dark or light side of an edge. Two additional properties of the second derivative
around an edge are: (1) it produces two values for every edge in an image; and (2)
its zero crossings can be used for locating the centers of thick edges, as we will show
later in this section. Some edge models utilize a smooth transition into and out of
FIGURE 10.10 (a) Two regions of constant intensity separated by an ideal ramp edge. (b) Detail near
the edge, showing a horizontal intensity profile, and its first and second derivatives.
the ramp (see Problem 10.9). However, the conclusions reached using those models
are the same as with an ideal ramp, and working with the latter simplifies theoretical
formulations. Finally, although attention thus far has been limited to a 1-D horizon-
tal profile, a similar argument applies to an edge of any orientation in an image. We
simply define a profile perpendicular to the edge direction at any desired point, and
interpret the results in the same manner as for the vertical edge just discussed.
The edge models in Fig. 10.8 are free of noise. The image segments in the first column in Fig. 10.11 show
close-ups of four ramp edges that transition from a black region on the left to a white region on the right
(keep in mind that the entire transition from black to white is a single edge). The image segment at the
top left is free of noise. The other three images in the first column are corrupted by additive Gaussian
noise with zero mean and standard deviation of 0.1, 1.0, and 10.0 intensity levels, respectively. The graph
below each image is a horizontal intensity profile passing through the center of the image. All images
have 8 bits of intensity resolution, with 0 and 255 representing black and white, respectively.
Consider the image at the top of the center column. As discussed in connection with Fig. 10.10(b), the
derivative of the scan line on the left is zero in the constant areas. These are the two black bands shown
in the derivative image. The derivatives at points on the ramp are constant and equal to the slope of the
ramp. These constant values in the derivative image are shown in gray. As we move down the center col-
umn, the derivatives become increasingly different from the noiseless case. In fact, it would be difficult
to associate the last profile in the center column with the first derivative of a ramp edge. What makes
these results interesting is that the noise is almost visually undetectable in the images on the left column.
These examples are good illustrations of the sensitivity of derivatives to noise.
As expected, the second derivative is even more sensitive to noise. The second derivative of the noise-
less image is shown at the top of the right column. The thin white and black vertical lines are the positive
and negative components of the second derivative, as explained in Fig. 10.10. The gray in these images
represents zero (as discussed earlier, scaling causes zero to show as gray). The only noisy second deriva-
tive image that barely resembles the noiseless case corresponds to noise with a standard deviation of 0.1.
The remaining second-derivative images and profiles clearly illustrate that it would be difficult indeed to
detect their positive and negative components, which are the truly useful features of the second deriva-
tive in terms of edge detection.
The fact that such little visual noise can have such a significant impact on the two key derivatives
used for detecting edges is an important issue to keep in mind. In particular, image smoothing should be
a serious consideration prior to the use of derivatives in applications where noise with levels similar to
those we have just discussed is likely to be present.
In summary, the three steps performed typically for edge detection are:
1. Image smoothing for noise reduction. The need for this step is illustrated by the
results in the second and third columns of Fig. 10.11.
2. Detection of edge points. As mentioned earlier, this is a local operation that
extracts from an image all points that are potential edge-point candidates.
3. Edge localization. The objective of this step is to select from the candidate
points only the points that are members of the set of points comprising an edge.
The remainder of this section deals with techniques for achieving these objectives.
FIGURE 10.11 First column: 8-bit images with values in the range [0, 255], and intensity profiles
of a ramp edge corrupted by Gaussian noise of zero mean and standard deviations of 0.0, 0.1,
1.0, and 10.0 intensity levels, respectively. Second column: First-derivative images and inten-
sity profiles. Third column: Second-derivative images and intensity profiles.
BASIC EDGE DETECTION
As illustrated in the preceding discussion, detecting changes in intensity for the pur-
pose of finding edges can be accomplished using first- or second-order derivatives.
We begin with first-order derivatives, and work with second-order derivatives in the
following subsection.
The tool of choice for finding edge strength and direction at an arbitrary location
(x, y) of an image, f, is the gradient, denoted by ∇f and defined as the vector

\[
\nabla f(x, y) \equiv \operatorname{grad}\big[f(x, y)\big] \equiv
\begin{bmatrix} g_x(x, y) \\ g_y(x, y) \end{bmatrix} =
\begin{bmatrix} \partial f(x, y) / \partial x \\ \partial f(x, y) / \partial y \end{bmatrix} \tag{10-16}
\]

This vector has the well-known property that it points in the direction of maximum
rate of change of f at (x, y) (see Problem 10.10). Equation (10-16) is valid at an
arbitrary (but single) point (x, y). When evaluated for all applicable values of x
and y, ∇f (x, y) becomes a vector image, each element of which is a vector given by
Eq. (10-16). The magnitude, M(x, y), of this gradient vector at a point (x, y) is given
by its Euclidean vector norm:

\[
M(x, y) = \|\nabla f(x, y)\| = \sqrt{g_x^2(x, y) + g_y^2(x, y)} \tag{10-17}
\]

This is the value of the rate of change in the direction of the gradient vector at point
(x, y). Note that M(x, y), ∇f (x, y) , gx (x, y), and gy (x, y) are arrays of the same
size as f, created when x and y are allowed to vary over all pixel locations in f. It is
common practice to refer to M(x, y) and ∇f (x, y) as the gradient image, or simply
as the gradient when the meaning is clear. The summation, square, and square root
operations are elementwise operations, as defined in Section 2.6.
The direction of the gradient vector at a point (x, y) is given by
\[
\alpha(x, y) = \tan^{-1}\!\left[\frac{g_y(x, y)}{g_x(x, y)}\right] \tag{10-18}
\]
Angles are measured in the counterclockwise direction with respect to the x-axis
(see Fig. 2.19). This is also an image of the same size as f, created by the elementwise
division of gy by gx over all applicable values of x and y. As the following example
illustrates, the direction of an edge at a point (x, y) is orthogonal to the direction,
a(x, y), of the gradient vector at the point.
Figure 10.12(a) shows a zoomed section of an image containing a straight edge segment. Each square
corresponds to a pixel, and we are interested in obtaining the strength and direction of the edge at the
point highlighted with a box. The shaded pixels in this figure are assumed to have value 0, and the white
FIGURE 10.12 Using the gradient to determine edge strength and direction at a point. Note that the edge direction
is perpendicular to the direction of the gradient vector at the point where the gradient is computed. Each square
represents one pixel. (Recall from Fig. 2.19 that the origin of our coordinate system is at the top, left.)
pixels have value 1. We discuss after this example an approach for computing the derivatives in the x-
and y-directions using a 3 × 3 neighborhood centered at a point. The method consists of subtracting the
pixels in the top row of the neighborhood from the pixels in the bottom row to obtain the partial deriva-
tive in the x-direction. Similarly, we subtract the pixels in the left column from the pixels in the right col-
umn of the neighborhood to obtain the partial derivative in the y-direction. It then follows, using these
differences as our estimates of the partials, that ∂f/∂x = −2 and ∂f/∂y = 2 at the point in question. Then,

\[
g_x = \frac{\partial f}{\partial x} = -2 \qquad \text{and} \qquad g_y = \frac{\partial f}{\partial y} = 2
\]

from which we obtain \( \|\nabla f\| = 2\sqrt{2} \) at that point. Similarly, the direction of the gradient vector at the
same point follows from Eq. (10-18): \( \alpha = \tan^{-1}(g_y / g_x) = -45^{\circ} \), which is the same as 135° measured in
the positive (counterclockwise) direction with respect to the x-axis in our image coordinate system (see
Fig. 2.19). Figure 10.12(b) shows the gradient vector and its direction angle.
As mentioned earlier, the direction of an edge at a point is orthogonal to the gradient vector at that
point. So the direction angle of the edge in this example is α − 90° = 135° − 90° = 45°, as Fig. 10.12(c)
shows. All edge points in Fig. 10.12(a) have the same gradient, so the entire edge segment is in the same
direction. The gradient vector sometimes is called the edge normal. When the vector is normalized to unit
length by dividing it by its magnitude, the resulting vector is referred to as the edge unit normal.
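A quick numerical check of this example, assuming NumPy; the values −2 and 2 are the partial derivatives estimated above:

```python
import numpy as np

gx, gy = -2.0, 2.0                          # partials estimated in the example
magnitude = np.hypot(gx, gy)                # 2.828... = 2*sqrt(2)
alpha = np.degrees(np.arctan2(gy, gx))      # 135.0 degrees (gradient direction)
edge_direction = alpha - 90.0               # 45.0 degrees (edge direction)
```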
Gradient Operators
Obtaining the gradient of an image requires computing the partial derivatives ∂f/∂x
and ∂f/∂y at every pixel location in the image. For the gradient, we typically use a
forward or centered finite difference (see Table 10.1). Using forward differences we
obtain
\[
g_x(x, y) = \frac{\partial f(x, y)}{\partial x} = f(x+1, y) - f(x, y) \tag{10-19}
\]

and

\[
g_y(x, y) = \frac{\partial f(x, y)}{\partial y} = f(x, y+1) - f(x, y) \tag{10-20}
\]

These two equations can be implemented for all values of x and y by filtering f(x, y)
with the 1-D kernels in Fig. 10.13.

FIGURE 10.13 1-D kernels used to implement Eqs. (10-19) and (10-20).
When diagonal edge direction is of interest, we need 2-D kernels. The Roberts
cross-gradient operators (Roberts [1965]) are one of the earliest attempts to use 2-D
kernels with a diagonal preference. Consider the 3 × 3 region in Fig. 10.14(a). The
Roberts operators are based on implementing the diagonal differences

\[
g_x = \frac{\partial f}{\partial x} = (z_9 - z_5) \tag{10-21}
\]

and

\[
g_y = \frac{\partial f}{\partial y} = (z_8 - z_6) \tag{10-22}
\]

(Filter kernels used to compute the derivatives needed for the gradient are often
called gradient operators, difference operators, edge operators, or edge detectors.)
These derivatives can be implemented by filtering an image with the kernels shown
in Figs. 10.14(b) and (c).
Kernels of size 2 × 2 are simple conceptually, but they are not as useful for com-
puting edge direction as kernels that are symmetric about their centers, the smallest
of which are of size 3 × 3. These kernels take into account the nature of the data on
opposite sides of the center point, and thus carry more information regarding the
direction of an edge. The simplest digital approximations to the partial derivatives
using kernels of size 3 × 3 are given by
\[
\begin{aligned}
g_x &= \frac{\partial f}{\partial x} = (z_7 + z_8 + z_9) - (z_1 + z_2 + z_3)\\[1mm]
g_y &= \frac{\partial f}{\partial y} = (z_3 + z_6 + z_9) - (z_1 + z_4 + z_7)
\end{aligned} \tag{10-23}
\]

(Observe that these two equations are first-order central differences as given in
Eq. (10-6), but multiplied by 2.)
In this formulation, the difference between the third and first rows of the 3 × 3 region
approximates the derivative in the x-direction, and the difference between the third
and first columns approximates the derivative in the y-direction. Intuitively, we would
expect these approximations to be more accurate than the approximations obtained
using the Roberts operators. Equations (10-22) and (10-23) can be implemented over
an entire image by filtering it with the two kernels in Figs. 10.14(d) and (e). These
kernels are called the Prewitt operators (Prewitt [1970]).
A slight variation of the preceding two equations uses a weight of 2 in the center
coefficient:
FIGURE 10.14 A 3 × 3 region of an image (the z's are intensity values), and various kernels
used to compute the gradient at the point labeled z5.

(a) Image region:
     z1  z2  z3
     z4  z5  z6
     z7  z8  z9

(b), (c) Roberts:
     -1   0          0  -1
      0   1          1   0

(d), (e) Prewitt:
     -1  -1  -1      -1   0   1
      0   0   0      -1   0   1
      1   1   1      -1   0   1

(f), (g) Sobel:
     -1  -2  -1      -1   0   1
      0   0   0      -2   0   2
      1   2   1      -1   0   1
\[
g_x = \frac{\partial f}{\partial x} = (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3) \tag{10-24}
\]

and

\[
g_y = \frac{\partial f}{\partial y} = (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7) \tag{10-25}
\]
It can be demonstrated (see Problem 10.12) that using a 2 in the center location pro-
vides image smoothing. Figures 10.14(f) and (g) show the kernels used to implement
Eqs. (10-24) and (10-25). These kernels are called the Sobel operators (Sobel [1970]).
The Prewitt kernels are simpler to implement than the Sobel kernels, but the
slight computational difference between them typically is not an issue. The fact
that the Sobel kernels have better noise-suppression (smoothing) characteristics
makes them preferable because, as mentioned earlier in the discussion of Fig. 10.11,
noise suppression is an important issue when dealing with derivatives. Note that the
coefficients of all the kernels in Fig. 10.14 sum to zero, thus giving a response of zero
in areas of constant intensity, as expected of derivative operators. (Recall the impor-
tant result in Problem 3.32 that using a kernel whose coefficients sum to zero produces
a filtered image whose pixels also sum to zero. This implies in general that some pixels
will be negative. Similarly, if the kernel coefficients sum to 1, the sum of pixels in the
original and filtered images will be the same; see Problem 3.31.)
Any of the pairs of kernels from Fig. 10.14 are convolved with an image to obtain
the gradient components gx and gy at every pixel location. These two partial deriva-
tive arrays are then used to estimate edge strength and direction. Obtaining the
magnitude of the gradient requires the computations in Eq. (10-17). This imple-
mentation is not always desirable because of the computational burden required
by squares and square roots, and an approach used frequently is to approximate the
magnitude of the gradient by absolute values:

\[
M(x, y) \approx |g_x| + |g_y| \tag{10-26}
\]
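A minimal sketch of this computation, assuming NumPy/SciPy; correlate is used so that the kernels match Eqs. (10-24) and (10-25) literally, and sobel_gradient is an illustrative name:

```python
import numpy as np
from scipy.ndimage import correlate

# Sobel kernels of Figs. 10.14(f) and (g).
SOBEL_X = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_gradient(f):
    """Return the |gx| + |gy| gradient magnitude of Eq. (10-26) and the
    gradient angle of Eq. (10-18) for image f."""
    gx = correlate(f.astype(float), SOBEL_X, mode='reflect')
    gy = correlate(f.astype(float), SOBEL_Y, mode='reflect')
    M = np.abs(gx) + np.abs(gy)
    alpha = np.arctan2(gy, gx)    # radians, measured from the x-axis (rows)
    return M, alpha
```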
Figure 10.16 illustrates the Sobel absolute value response of the two components of the gradient, gx
and gy , as well as the gradient image formed from the sum of these two components. The directionality
of the horizontal and vertical components of the gradient is evident in Figs. 10.16(b) and (c). Note, for
FIGURE 10.15 Kirsch compass kernels. The edge direction of strongest response of each kernel
is labeled below it.

     N               NW               W               SW
 -3  -3   5      -3   5   5       5   5   5       5   5  -3
 -3   0   5      -3   0   5      -3   0  -3       5   0  -3
 -3  -3   5      -3  -3  -3      -3  -3  -3      -3  -3  -3

     S               SE               E               NE
  5  -3  -3      -3  -3  -3      -3  -3  -3      -3  -3  -3
  5   0  -3       5   0  -3      -3   0  -3      -3   0   5
  5  -3  -3       5   5  -3       5   5   5      -3   5   5
example, how strong the roof tile, horizontal brick joints, and horizontal segments of the windows are in
Fig. 10.16(b) compared to other edges. In contrast, Fig. 10.16(c) favors features such as the vertical com-
ponents of the façade and windows. It is common terminology to use the term edge map when referring
to an image whose principal features are edges, such as gradient magnitude images. The intensities of the
image in Fig. 10.16(a) were scaled to the range [0, 1]. We use values in this range to simplify parameter
selection in the various methods for edge detection discussed in this section.
FIGURE 10.16 (a) Image of size 834 × 1114 pixels, with intensity values scaled to the range [0, 1].
(b) gx, the component of the gradient in the x-direction, obtained using the Sobel kernel in Fig. 10.14(f)
to filter the image. (c) gy, obtained using the kernel in Fig. 10.14(g). (d) The gradient image formed from
the sum of (b) and (c).
FIGURE 10.17
Gradient angle
image computed
using Eq. (10-18).
Areas of constant
intensity in this
image indicate
that the direction
of the gradient
vector is the same
at all the pixel
locations in those
regions.
Figure 10.17 shows the gradient angle image computed using Eq. (10-18). In general, angle images are
not as useful as gradient magnitude images for edge detection, but they do complement the information
extracted from an image using the magnitude of the gradient. For instance, the constant intensity areas
in Fig. 10.16(a), such as the front edge of the sloping roof and top horizontal bands of the front wall,
are constant in Fig. 10.17, indicating that the gradient vector direction at all the pixel locations in those
regions is the same. As we will show later in this section, angle information plays a key supporting role
in the implementation of the Canny edge detection algorithm, a widely used edge detection scheme.
The original image in Fig. 10.16(a) is of reasonably high resolution, and at the
distance the image was acquired, the contribution made to image detail by the wall
bricks is significant. This level of fine detail often is undesirable in edge detection
because it tends to act as noise, which is enhanced by derivative computations and
thus complicates detection of the principal edges. One way to reduce fine detail is
to smooth the image prior to computing the edges. Figure 10.18 shows the same
sequence of images as in Fig. 10.16, but with the original image smoothed first using
a 5 5 averaging filter (see Section 3.5 regarding smoothing filters). The response
of each kernel now shows almost no contribution due to the bricks, with the results
being dominated mostly by the principal edges in the image.
Figures 10.16 and 10.18 show that the horizontal and vertical Sobel kernels do
not differentiate between edges oriented in the ±45° directions. If it is important to
emphasize edges oriented in particular diagonal directions, then one of the Kirsch
kernels in Fig. 10.15 should be used. Figures 10.19(a) and (b) show the responses of
the +45° (NW) and −45° (SW) Kirsch kernels, respectively. The stronger diagonal
selectivity of these kernels is evident in these figures. Both kernels have similar
responses to horizontal and vertical edges, but the response in these directions is weaker.
c d
FIGURE 10.18
Same sequence as
in Fig. 10.16, but
with the original
image smoothed
using a 5 × 5 aver-
aging kernel prior
to edge detection.
Figure 10.20(a) shows the result of thresholding Fig. 10.16(d), the gradient of the
original image; edge pixels are shown in white and all other pixels in black. Comparing
this image with Fig. 10.16(d), we see that there are fewer edges
in the thresholded image, and that the edges in this image are much sharper (see,
for example, the edges in the roof tile). On the other hand, numerous edges, such
as the sloping line defining the far edge of the roof (see arrow), are broken in the
thresholded image.
When interest lies both in highlighting the principal edges and on maintaining
as much connectivity as possible, it is common practice to use both smoothing and
thresholding. Figure 10.20(b) shows the result of thresholding Fig. 10.18(d), which is
the gradient of the smoothed image. This result shows a reduced number of broken
edges; for instance, compare the corresponding edges identified by the arrows in
Figs. 10.20(a) and (b).
a b
FIGURE 10.19
Diagonal edge
detection.
(a) Result of using
the Kirsch kernel in
Fig. 10.15(c).
(b) Result of using
the kernel in Fig.
10.15(d). The input
image in both cases
was Fig. 10.18(a).
a b
FIGURE 10.20
(a) Result of
thresholding
Fig. 10.16(d), the
gradient of the
original image.
(b) Result of
thresholding
Fig. 10.18(d), the
gradient of the
smoothed image.
The Marr-Hildreth Edge Detector
The Marr-Hildreth edge detector is based on smoothing the image with a 2-D
Gaussian function of the form

\[
G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10-27}
\]

with standard deviation σ (sometimes σ is called the space constant in this context),
and then computing the Laplacian of the result. The Laplacian of this Gaussian
function is

\[
\nabla^2 G(x, y) = \frac{\partial^2 G(x, y)}{\partial x^2} + \frac{\partial^2 G(x, y)}{\partial y^2}
= \frac{\partial}{\partial x}\!\left[\frac{-x}{\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\right]
+ \frac{\partial}{\partial y}\!\left[\frac{-y}{\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\right]
= \left[\frac{x^2}{\sigma^4} - \frac{1}{\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}}
+ \left[\frac{y^2}{\sigma^4} - \frac{1}{\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10-28}
\]

Collecting terms, we obtain

\[
\nabla^2 G(x, y) = \left[\frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10-29}
\]

This expression is called the Laplacian of a Gaussian (LoG).
FIGURE 10.21 (a) 3-D plot of the negative of the LoG. (b) Negative of the LoG displayed as an
image. (c) Cross section of (a) showing zero crossings. (d) 5 × 5 kernel approximation to the
shape in (a); the negative of this kernel would be used in practice:

      0   0  -1   0   0
      0  -1  -2  -1   0
     -1  -2  16  -2  -1
      0  -1  -2  -1   0
      0   0  -1   0   0
Because the LoG is isotropic (invariant to rotation), it responds equally to intensity
changes in any direction, thus avoiding having to use multiple kernels to calculate the
strongest response at any point in the image.
The Marr-Hildreth algorithm consists of convolving the LoG kernel with an input
image,

\[
g(x, y) = \left[\nabla^2 G(x, y)\right] \star f(x, y) \tag{10-30}
\]

and then finding the zero crossings of g(x, y) to determine the locations of edges in
f(x, y). Because the Laplacian and convolution are linear processes, we can write
Eq. (10-30) as

\[
g(x, y) = \nabla^2 \left[G(x, y) \star f(x, y)\right] \tag{10-31}
\]

indicating that we can smooth the image first with a Gaussian filter and then compute
the Laplacian of the result. These expressions are implemented in the spatial domain
using the following procedure:
1. Filter the input image with an n × n Gaussian lowpass kernel obtained by sam-
pling Eq. (10-27).
2. Compute the Laplacian of the image resulting from Step 1 using, for example,
the 3 × 3 kernel in Fig. 10.4(a). [Steps 1 and 2 implement Eq. (10-31).]
3. Find the zero crossings of the image from Step 2.
To specify the size of the Gaussian kernel, recall from our discussion of Fig. 3.35 that
the values of a Gaussian function at a distance larger than 3σ from the mean are
small enough so that they can be ignored. As discussed in Section 3.5, this implies
using a Gaussian kernel of size ⌈6σ⌉ × ⌈6σ⌉, where ⌈6σ⌉ denotes the ceiling of 6σ; that
is, the smallest integer not less than 6σ. (As explained in Section 3.5, ⌈·⌉ and ⌊·⌋ denote
the ceiling and floor functions; they map a real number to the smallest following, or
the largest previous, integer, respectively.) Because we work with kernels of odd dimen-
sions, we would use the smallest odd integer satisfying this condition. Using a kernel
smaller than this will "truncate" the LoG function, with the degree of truncation
being inversely proportional to the size of the kernel. Using a larger kernel would
make little difference in the result.
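For instance, the kernel-size rule just stated can be written as a one-line helper (a sketch; gaussian_kernel_size is an illustrative name):

```python
import math

def gaussian_kernel_size(sigma):
    """Smallest odd n with n >= 6*sigma, per the ceiling rule above."""
    n = math.ceil(6 * sigma)
    return n if n % 2 == 1 else n + 1   # e.g., sigma = 4 gives n = 25
```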
One approach for finding the zero crossings at any pixel, p, of the filtered image,
g(x, y), is to use a 3 × 3 neighborhood centered at p. A zero crossing at p implies
that the signs of at least two of its opposing neighboring pixels must differ. There are
four cases to test: left/right, up/down, and the two diagonals. If the values of g(x, y)
are being compared against a threshold (a common approach), then not only must
the signs of opposing neighbors be different, but the absolute value of their numeri-
cal difference must also exceed the threshold before we can call p a zero-crossing
pixel. (Attempts to find zero crossings by finding the coordinates (x, y) such that
g(x, y) = 0 are impractical because of noise and other computational inaccuracies.)
We illustrate this approach in Example 10.7.
Computing zero crossings is the key feature of the Marr-Hildreth edge-detection
method. The approach discussed in the previous paragraph is attractive because of
its simplicity of implementation and because it generally gives good results. If the
accuracy of the zero-crossing locations found using this method is inadequate in a
particular application, then a technique proposed by Huertas and Medioni [1986]
for finding zero crossings with subpixel accuracy can be employed.
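A compact sketch of the three-step algorithm plus the 3 × 3 zero-crossing test, assuming SciPy's gaussian_filter and laplace; the threshold is expressed as a fraction of the maximum absolute LoG response, as in the example that follows, and boundary handling by wraparound (np.roll) is a simplification:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def marr_hildreth(f, sigma=4.0, thresh_frac=0.04):
    """Gaussian smoothing, Laplacian (Eq. (10-31)), then zero crossings whose
    local contrast exceeds a threshold."""
    g = laplace(gaussian_filter(f.astype(float), sigma))
    T = thresh_frac * np.abs(g).max()
    edges = np.zeros(g.shape, dtype=np.uint8)
    # Opposing-neighbor pairs: left/right, up/down, and the two diagonals.
    shifts = [((0, 1), (0, -1)), ((1, 0), (-1, 0)),
              ((1, 1), (-1, -1)), ((1, -1), (-1, 1))]
    for a, b in shifts:
        p = np.roll(g, a, axis=(0, 1))
        q = np.roll(g, b, axis=(0, 1))
        # Sign change across p, plus a difference larger than T.
        edges |= ((np.sign(p) != np.sign(q)) & (np.abs(p - q) > T)).astype(np.uint8)
    return edges
```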
Figure 10.22(a) shows the building image used earlier and Fig. 10.22(b) is the result of Steps 1 and 2 of
the Marr-Hildreth algorithm, using σ = 4 (approximately 0.5% of the short dimension of the image)
and n = 25 to satisfy the size condition stated above. As in Fig. 10.5, the gray tones in this image are due
to scaling. Figure 10.22(c) shows the zero crossings obtained using the 3 × 3 neighborhood approach just
discussed, with a threshold of zero. Note that all the edges form closed loops. This so-called “spaghetti
effect” is a serious drawback of this method when a threshold value of zero is used (see Problem 10.17).
We avoid closed-loop edges by using a positive threshold.
Figure 10.22(d) shows the result of using a threshold approximately equal to 4% of the maximum
value of the LoG image. The majority of the principal edges were readily detected, and “irrelevant” fea-
tures, such as the edges due to the bricks and the tile roof, were filtered out. This type of performance
is virtually impossible to obtain using the gradient-based edge-detection techniques discussed earlier.
Another important consequence of using zero crossings for edge detection is that the resulting edges are
1 pixel thick. This property simplifies subsequent stages of processing, such as edge linking.
The LoG can be approximated by a difference of Gaussians (DoG):

\[
\mathrm{DoG}(x, y) = \frac{1}{2\pi\sigma_1^2}\, e^{-\frac{x^2 + y^2}{2\sigma_1^2}}
- \frac{1}{2\pi\sigma_2^2}\, e^{-\frac{x^2 + y^2}{2\sigma_2^2}} \tag{10-32}
\]
FIGURE 10.22
(a) Image of size 834 × 1114 pixels, with intensity values scaled to the range [0, 1].
(b) Result of Steps 1 and 2 of the Marr-Hildreth algorithm using σ = 4 and n = 25.
(c) Zero cross-
ings of (b) using
a threshold of 0
(note the closed-
loop edges).
(d) Zero cross-
ings found using a
threshold equal to
4% of the maxi-
mum value of the
image in (b). Note
the thin edges.
with σ1 > σ2. Experimental results suggest that certain "channels" in the human
vision system are selective with respect to orientation and frequency, and can be
modeled using Eq. (10-32) with a ratio of standard deviations of 1.75:1. Using the
ratio 1.6:1 preserves the basic characteristics of these observations and also pro-
vides a closer “engineering” approximation to the LoG function (Marr and Hil-
dreth [1980]). In order for the LoG and DoG to have the same zero crossings, the
value of s for the LoG must be selected based on the following equation (see
Problem 10.19):
\[
\sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 - \sigma_2^2}\,
\ln\!\left[\frac{\sigma_1^2}{\sigma_2^2}\right] \tag{10-33}
\]
Although the zero crossings of the LoG and DoG will be the same when this value
of s is used, their amplitude scales will be different. We can make them compatible
by scaling both functions so that they have the same value at the origin.
The profiles in Figs. 10.23(a) and (b) were generated with standard devia-
tion ratios of 1:1.75 and 1:1.6, respectively (by convention, the curves shown are
inverted, as in Fig. 10.21). The LoG profiles are the solid lines, and the DoG profiles
are dotted. The curves shown are intensity profiles through the center of the LoG
and DoG arrays, generated by sampling Eqs. (10-29) and (10-32), respectively. The
amplitude of all curves at the origin were normalized to 1. As Fig. 10.23(b) shows,
the ratio 1:1.6 yielded a slightly closer approximation of the LoG and DoG func-
tions (for example, compare the bottom lobes of the two figures).
a b
FIGURE 10.23
(a) Negatives of
the LoG (solid)
and DoG
(dotted) profiles
using a s ratio of
1.75:1. (b) Profiles
obtained using a
ratio of 1.6:1.
Gaussian kernels are separable (see Section 3.4). Therefore, both the LoG and
the DoG filtering operations can be implemented with 1-D convolutions instead of
using 2-D convolutions directly (see Problem 10.19). For an image of size M × N
and a kernel of size n × n, doing so reduces the number of multiplications and addi-
tions for each convolution from being proportional to n²MN for 2-D convolutions
to being proportional to nMN for 1-D convolutions. This implementation difference
is significant. For example, if n = 25, a 1-D implementation will require on the order
of 12 times fewer multiplication and addition operations than using 2-D convolution.
The Canny Edge Detector
Canny [1986] showed that a good approximation to the optimal detector for 1-D step
edges corrupted by white Gaussian noise is the first derivative of a Gaussian,

\[
\frac{d}{dx}\, e^{-\frac{x^2}{2\sigma^2}} = -\frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}} \tag{10-34}
\]
†
Recall that white noise is noise having a frequency spectrum that is continuous and uniform over a specified
frequency band. White Gaussian noise is white noise in which the distribution of amplitude values is Gaussian.
Gaussian white noise is a good approximation of many real-world situations and generates mathematically
tractable models. It has the useful property that its values are statistically independent.
where the approximation was only about 20% worse than using the optimized
numerical solution (a difference of this magnitude generally is visually impercep-
tible in most applications).
Generalizing the preceding result to 2-D involves recognizing that the 1-D
approach still applies in the direction of the edge normal (see Fig. 10.12). Because
the direction of the normal is unknown beforehand, this would require applying the
1-D edge detector in all possible directions. This task can be approximated by first
smoothing the image with a circular 2-D Gaussian function, computing the gradient
of the result, and then using the gradient magnitude and direction to estimate edge
strength and direction at every point.
Let f (x, y) denote the input image and G(x, y) denote the Gaussian function:

\[
G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10-35}
\]

We form a smoothed image, fs(x, y), by convolving f and G:

\[
f_s(x, y) = G(x, y) \star f(x, y) \tag{10-36}
\]

Then we compute the gradient magnitude and direction (angle) of fs, as discussed earlier:

\[
\|\nabla f_s(x, y)\| = \sqrt{g_x^2(x, y) + g_y^2(x, y)} \tag{10-37}
\]

and

\[
\alpha(x, y) = \tan^{-1}\!\left[\frac{g_y(x, y)}{g_x(x, y)}\right] \tag{10-38}
\]

with gx(x, y) = ∂fs(x, y)/∂x and gy(x, y) = ∂fs(x, y)/∂y. Any of the derivative fil-
ter kernel pairs in Fig. 10.14 can be used to obtain gx(x, y) and gy(x, y). Equation
(10-36) is implemented using an n × n Gaussian kernel whose size is discussed below.
Keep in mind that ∇fs(x, y) and a(x, y) are arrays of the same size as the image
from which they are computed.
Gradient image ∇fs (x, y) typically contains wide ridges around local maxima.
The next step is to thin those ridges. One approach is to use nonmaxima suppres-
sion. The essence of this approach is to specify a number of discrete orientations of
the edge normal (gradient vector). For example, in a 3 × 3 region we can define four
orientations† for an edge passing through the center point of the region: horizontal,
vertical, +45°, and −45°. Figure 10.24(a) shows the situation for the two possible
orientations of a horizontal edge. Because we have to quantize all possible edge
directions into four ranges, we have to define a range of directions over which we
consider an edge to be horizontal. We determine edge direction from the direction
of the edge normal, which we obtain directly from the image data using Eq. (10-38).
As Fig. 10.24(b) shows, if the edge normal is in the range of directions from −22.5° to
†
Every edge has two possible orientations. For example, an edge whose normal is oriented at 0° and an edge
whose normal is oriented at 180° are the same horizontal edge.
FIGURE 10.24 (a) Two possible orientations of a horizontal edge (shaded) in a 3 × 3 neighborhood.
(b) Range of values (shaded) of a, the direction angle of the edge normal for a horizontal edge.
(c) The angle ranges of the edge normals for the four types of edge directions in a 3 × 3 neighborhood.
Each edge direction has two ranges, shown in corresponding shades.
22.5° or from 157.5° to −157.5°, we call the edge a horizontal edge. Figure 10.24(c)
shows the angle ranges corresponding to the four directions under consideration.
Let d1, d2, d3, and d4 denote the four basic edge directions just discussed for
a 3 × 3 region: horizontal, −45°, vertical, and +45°, respectively. We can formulate
the following nonmaxima suppression scheme for a 3 × 3 region centered at an
arbitrary point (x, y) in a(x, y):
1. Find the direction dk that is closest to a(x, y).
2. Let K denote the value of ∇fs at (x, y). If K is less than the value of ∇fs at one
or both of the neighbors of point (x, y) along dk, let gN(x, y) = 0 (suppression);
otherwise, let gN(x, y) = K.
When repeated for all values of x and y, this procedure yields a nonmaxima sup-
pressed image gN (x, y) that is of the same size as fs (x, y). For example, with reference
to Fig. 10.24(a), letting (x, y) be at p5, and assuming a horizontal edge through p5,
the pixels of interest in Step 2 would be p2 and p8. Image gN (x, y) contains only the
thinned edges; it is equal to image ∇fs (x, y) with the nonmaxima edge points sup-
pressed.
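A sketch of this suppression scheme, assuming the angle image alpha comes from arctan2(gy, gx) with gx taken along image rows (as in the earlier Sobel sketch); nonmaxima_suppression is an illustrative name, and the explicit loop is written for clarity rather than speed:

```python
import numpy as np

def nonmaxima_suppression(M, alpha):
    """Quantize the edge-normal direction into four bins and zero out pixels
    of M that are not maxima along their direction dk (Steps 1 and 2 above)."""
    gN = np.zeros_like(M)
    # Neighbor offsets along the gradient direction for the four bins:
    # 0: along rows, 1: +45 diagonal, 2: along columns, 3: -45 diagonal.
    offsets = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (1, -1)}
    ang = np.rad2deg(alpha) % 180.0          # edge normals are symmetric mod 180
    d = (((ang + 22.5) // 45).astype(int)) % 4
    rows, cols = M.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            di, dj = offsets[d[i, j]]
            if M[i, j] >= M[i + di, j + dj] and M[i, j] >= M[i - di, j - dj]:
                gN[i, j] = M[i, j]
    return gN
```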
The final operation is to threshold gN (x, y) to reduce false edge points. In the
Marr-Hildreth algorithm we did this using a single threshold, in which all values
below the threshold were set to 0. If we set the threshold too low, there will still
be some false edges (called false positives). If the threshold is set too high, then
valid edge points will be eliminated (false negatives). Canny’s algorithm attempts to
improve on this situation by using hysteresis thresholding which, as we will discuss
in Section 10.3, uses two thresholds: a low threshold, TL and a high threshold, TH .
Experimental evidence (Canny [1986]) suggests that the ratio of the high to low
threshold should be in the range of 2:1 to 3:1.
We can visualize the thresholding operation as creating two additional images:

\[
g_{NH}(x, y) = g_N(x, y) \ge T_H \tag{10-39}
\]

and

\[
g_{NL}(x, y) = g_N(x, y) \ge T_L \tag{10-40}
\]
Initially, gNH (x, y) and gNL(x, y) are set to 0. After thresholding, gNH (x, y) will usu-
ally have fewer nonzero pixels than gNL(x, y), but all the nonzero pixels in gNH (x, y)
will be contained in gNL(x, y) because the latter image is formed with a lower thresh-
old. We eliminate from gNL(x, y) all the nonzero pixels from gNH (x, y) by letting
\[
g_{NL}(x, y) = g_{NL}(x, y) - g_{NH}(x, y) \tag{10-41}
\]
The nonzero pixels in gNH (x, y) and gNL(x, y) may be viewed as being “strong”
and “weak” edge pixels, respectively. After the thresholding operations, all strong
pixels in gNH (x, y) are assumed to be valid edge pixels, and are so marked imme-
diately. Depending on the value of TH , the edges in gNH (x, y) typically have gaps.
Longer edges are formed using the following procedure:
(a) Locate the next unvisited edge pixel, p, in gNH (x, y).
(b) Mark as valid edge pixels all the weak pixels in gNL(x, y) that are connected to
p using, say, 8-connectivity.
(c) If all nonzero pixels in gNH (x, y) have been visited go to Step (d). Else, return
to Step (a).
(d) Set to zero all pixels in gNL(x, y) that were not marked as valid edge pixels.
At the end of this procedure, the final image output by the Canny algorithm is
formed by appending to gNH (x, y) all the nonzero pixels from gNL(x, y).
We used two additional images, gNH (x, y) and gNL(x, y) to simplify the discussion.
In practice, hysteresis thresholding can be implemented directly during nonmaxima
suppression, and thresholding can be implemented directly on gN (x, y) by forming a
list of strong pixels and the weak pixels connected to them.
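A sketch of hysteresis thresholding expressed with connected components, which is equivalent to growing the strong pixels through 8-connected weak pixels; hysteresis_threshold is an illustrative name and SciPy's label is assumed:

```python
import numpy as np
from scipy.ndimage import label

def hysteresis_threshold(gN, TL, TH):
    """Keep pixels above TH ("strong") plus any pixel above TL ("weak") that
    belongs to an 8-connected component containing a strong pixel."""
    strong = gN > TH
    weak_or_strong = gN > TL
    labels, n = label(weak_or_strong, structure=np.ones((3, 3), dtype=int))
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True   # components touching a strong pixel
    keep[0] = False                          # background label
    return keep[labels]
```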
Summarizing, the Canny edge detection algorithm consists of the following steps:
1. Smooth the input image with a Gaussian filter.
2. Compute the gradient magnitude and angle images.
3. Apply nonmaxima suppression to the gradient magnitude image.
4. Use double thresholding and connectivity analysis to detect and link edges.
Although the edges after nonmaxima suppression are thinner than raw gradient edg-
es, the former can still be thicker than one pixel. To obtain edges one pixel thick, it is
typical to follow Step 4 with one pass of an edge-thinning algorithm (see Section 9.5).
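Chaining the pieces sketched above gives the following outline of the four steps (sobel_gradient, nonmaxima_suppression, and hysteresis_threshold refer to the earlier illustrative helpers, not library functions; the default parameters mirror the example below):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def canny_sketch(f, sigma=4.0, TL=0.04, TH=0.10):
    fs = gaussian_filter(f.astype(float), sigma)   # Step 1: Gaussian smoothing
    M, alpha = sobel_gradient(fs)                  # Step 2: magnitude and angle
    M = M / M.max()                                # scale to [0, 1] for thresholding
    gN = nonmaxima_suppression(M, alpha)           # Step 3: thin the ridges
    return hysteresis_threshold(gN, TL, TH)        # Step 4: hysteresis + linking
```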
As mentioned earlier, smoothing is accomplished by convolving the input image
with a Gaussian kernel whose size, n × n, must be chosen. Once a value of σ has
been specified, we can use the approach discussed in connection with the Marr-Hil-
dreth algorithm to determine an odd value of n that provides the "full" smoothing
capability of the Gaussian filter for the specified value of σ. (Usually, selecting a
suitable value of σ for the first time in an application requires experimentation.)
Some final comments on implementation: As noted earlier in the discussion of
the Marr-Hildreth edge detector, the 2-D Gaussian function in Eq. (10-35) is sepa-
rable into a product of two 1-D Gaussians. Thus, Step 1 of the Canny algorithm can
be formulated as 1-D convolutions that operate on the rows (columns) of the image
one at a time, and then work on the columns (rows) of the result. Furthermore, if
we use the approximations in Eqs. (10-19) and (10-20), we can also implement the
gradient computations required for Step 2 as 1-D convolutions (see Problem 10.22).
Figure 10.25(a) shows the familiar building image. For comparison, Figs. 10.25(b) and (c) show, respec-
tively, the result in Fig. 10.20(b) obtained using the thresholded gradient, and Fig. 10.22(d) using the
Marr-Hildreth detector. Recall that the parameters used in generating those two images were selected
to detect the principal edges, while attempting to reduce “irrelevant” features, such as the edges of the
bricks and the roof tiles.
Figure 10.25(d) shows the result obtained with the Canny algorithm using the parameters TL = 0.04,
TH = 0.10 (2.5 times the value of the low threshold), σ = 4, and a kernel of size 25 × 25, which cor-
responds to the smallest odd integer not less than 6σ. These parameters were chosen experimentally
c d
FIGURE 10.25
(a) Original image
of size 834 × 1114
pixels, with
intensity values
scaled to the range
[0, 1].
(b) Thresholded
gradient of the
smoothed image.
(c) Image obtained
using the
Marr-Hildreth
algorithm.
(d) Image obtained
using the Canny
algorithm. Note the
significant
improvement of
the Canny image
compared to the
other two.
to achieve the objectives stated in the previous paragraph for the gradient and Marr-Hildreth images.
Comparing the Canny image with the other two images, we see in the Canny result significant improve-
ments in detail of the principal edges and, at the same time, more rejection of irrelevant features. For
example, note that both edges of the concrete band lining the bricks in the upper section of the image
were detected by the Canny algorithm, whereas the thresholded gradient lost both of these edges, and
the Marr-Hildreth method detected only the upper one. In terms of filtering out irrelevant detail, the
Canny image does not contain a single edge due to the roof tiles; this is not true in the other two images.
The quality of the lines with regard to continuity, thinness, and straightness is also superior in the Canny
image. Results such as these have made the Canny algorithm a tool of choice for edge detection.
As another comparison of the three principal edge-detection methods discussed in this section, consider
Fig. 10.26(a), which shows a 512 × 512 head CT image. Our objective is to extract the edges of the outer
contour of the brain (the gray region in the image), the contour of the spinal region (shown directly
behind the nose, toward the front of the brain), and the outer contour of the head. We wish to generate
the thinnest, continuous contours possible, while eliminating edge details related to the gray content in
the eyes and brain areas.
Figure 10.26(b) shows a thresholded gradient image that was first smoothed using a 5 × 5 averaging
kernel. The threshold required to achieve the result shown was 15% of the maximum value of the gradi-
ent image. Figure 10.26(c) shows the result obtained with the Marr-Hildreth edge-detection algorithm
with a threshold of 0.002, σ = 3, and a kernel of size 19 × 19. Figure 10.26(d) was obtained using the
Canny algorithm with TL = 0.05, TH = 0.15 (3 times the value of the low threshold), σ = 2, and a kernel
of size 13 × 13.
c d
FIGURE 10.26
(a) Head CT image
of size 512 × 512
pixels, with
intensity values
scaled to the range
[0, 1].
(b) Thresholded
gradient of the
smoothed image.
(c) Image obtained
using the Marr-Hil-
dreth algorithm.
(d) Image obtained
using the Canny
algorithm.
(Original image
courtesy of Dr.
David R. Pickens,
Vanderbilt
University.)
In terms of edge quality and the ability to eliminate irrelevant detail, the results in Fig. 10.26 correspond
closely to the results and conclusions in the previous example. Note also that the Canny algorithm was
the only procedure capable of yielding a totally unbroken edge for the posterior boundary of the brain,
and the closest boundary of the spinal cord. It was also the only procedure capable of finding the cleanest
contours, while eliminating all the edges associated with the gray brain matter in the original image.
The price paid for the improved performance of the Canny algorithm is a sig-
nificantly more complex implementation than the two approaches discussed earlier.
In some applications, such as real-time industrial image processing, cost and speed
requirements usually dictate the use of simpler techniques, principally the thresh-
olded gradient approach. When edge quality is the driving force, the Marr-Hildreth
and Canny algorithms, especially the latter, offer superior alternatives.
Local Processing
A simple approach for linking edge points is to analyze the characteristics of pixels
in a small neighborhood about every point (x, y) that has been declared an edge
point by one of the techniques discussed in the preceding sections. All points that
are similar according to predefined criteria are linked, forming an edge of pixels that
share common properties according to the specified criteria.
The two principal properties used for establishing similarity of edge pixels in this
kind of local analysis are (1) the strength (magnitude) and (2) the direction of the
gradient vector. The first property is based on Eq. (10-17). Let Sxy denote the set of
coordinates of a neighborhood centered at point (x, y) in an image. An edge pixel
with coordinates (s, t) in Sxy is similar in magnitude to the pixel at (x, y) if

\[
|M(s, t) - M(x, y)| \le T_M
\]

where TM is a positive threshold.
The direction angle of the gradient vector is given by Eq. (10-18). An edge pixel
with coordinates (s, t) in Sxy has an angle similar to the pixel at (x, y) if

\[
|a(s, t) - a(x, y)| \le A
\]
where A is a positive angle threshold. As noted earlier, the direction of the edge at
(x, y) is perpendicular to the direction of the gradient vector at that point.
A pixel with coordinates (s, t) in Sxy is considered to be linked to the pixel at (x, y)
if both magnitude and direction criteria are satisfied. This process is repeated for
every edge pixel. As the center of the neighborhood is moved from pixel to pixel, a
record of linked points is kept. A simple bookkeeping procedure is to assign a dif-
ferent intensity value to each set of linked edge pixels.
The preceding formulation is computationally expensive because all neighbors of
every point have to be examined. A simplification particularly well suited for real
time applications consists of the following steps:
1. Compute the gradient magnitude and angle arrays, M(x, y) and a(x, y), of the
input image, f (x, y).
2. Form a binary image, g(x, y), whose value at any point (x, y) is given by:

\[
g(x, y) = \begin{cases} 1 & \text{if } M(x, y) > T_M \text{ and } a(x, y) = A \pm T_A \\ 0 & \text{otherwise} \end{cases}
\]

   where TM is a threshold, A is a specified angle direction, and ±TA defines a
   "band" of acceptable directions about A.
3. Scan the rows of g and fill (set to 1) all gaps (sets of 0's) in each row that do
   not exceed a specified length, K. By definition, a gap is bounded at both ends by 1's.
4. To detect gaps in any other direction, rotate g by that angle, apply the horizontal
   scanning procedure in Step 3, and rotate the result back.
When interest lies in horizontal and vertical edge linking, Step 4 becomes a simple
procedure in which g is rotated ninety degrees, the rows are scanned, and the result
is rotated back. This is the application found most frequently in practice and, as the
following example shows, this approach can yield good results. In general, image
rotation is an expensive computational process so, when linking in numerous angle
directions is required, it is more practical to combine Steps 3 and 4 into a single,
radial scanning procedure.
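A sketch of Steps 2 and 3 for the horizontal case, assuming NumPy and angles in degrees; link_edges_horizontal is an illustrative name, and the simple |alpha − A| comparison ignores angle wraparound for brevity:

```python
import numpy as np

def link_edges_horizontal(M, alpha, TM, A, TA, K):
    """Form the binary image g from the magnitude/angle criteria, then fill
    (set to 1) horizontal gaps of K or fewer pixels in each row of g."""
    g = ((M > TM) & (np.abs(alpha - A) <= TA)).astype(np.uint8)
    for row in g:
        ones = np.flatnonzero(row)
        for a, b in zip(ones[:-1], ones[1:]):   # consecutive edge pixels in the row
            if 1 < b - a <= K + 1:              # gap of K or fewer zeros between them
                row[a:b] = 1
    return g
```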
Figure 10.27(a) shows a 534 × 566 image of the rear of a vehicle. The objective of this example is to
illustrate the use of the preceding algorithm for finding rectangles whose sizes makes them suitable
candidates for license plates. The formation of these rectangles can be accomplished by detecting
a b c
de f
FIGURE 10.27
(a) Image of the rear
of a vehicle.
(b) Gradient magni-
tude image.
(c) Horizontally
connected edge
pixels.
(d) Vertically con-
nected edge pixels.
(e) The logical OR
of (c) and (d).
(f) Final result,
using morphological
thinning. (Original
image courtesy of
Perceptics
Corporation.)
strong horizontal and vertical edges. Figure 10.27(b) shows the gradient magnitude image, M(x, y), and
Figs. 10.27(c) and (d) show the result of Steps 3 and 4 of the algorithm, obtained by letting TM equal
30% of the maximum gradient value, A = 90°, TA = 45°, and filling all gaps of 25 or fewer pixels
(approximately 5% of the image width). A large range of allowable angle directions was required to
detect the rounded corners of the license plate enclosure, as well as the rear windows of the vehicle.
Figure 10.27(e) is the result of forming the logical OR of the two preceding images, and Fig. 10.27(f)
was obtained by thinning 10.27(e) with the thinning procedure discussed in Section 9.5. As Fig. 10.27(f)
shows, the rectangle corresponding to the license plate was clearly detected in the image. It would be a simple matter to isolate the license plate from all the rectangles in the image, using the fact that the width-to-height ratio of license plates has distinctive proportions (e.g., a 2:1 ratio in U.S. plates).
Global Processing Using the Hough Transform
An alternative to local analysis is to link edge points by first determining whether they lie on a curve of specified shape. In principle, one could find all lines determined by every pair of edge points and then find all subsets of points that are close to particular lines; but for n points this involves on the order of n² lines and n³ comparisons of every point to all lines. This is a computationally prohibitive task in most applications.
Hough [1962] proposed an alternative approach, commonly referred to as the Hough transform. Let (xi, yi) denote a point in the xy-plane and consider the general equation of a straight line in slope-intercept form: yi = a xi + b. Infinitely many lines pass through (xi, yi), but they all satisfy the equation yi = a xi + b for varying values of a and b. However, writing this equation as b = −xi a + yi and considering the ab-plane (also called parameter space) yields the equation of a single line for a fixed point (xi, yi). Furthermore, a second point (xj, yj) also has a single line in parameter space associated with it, which intersects the line associated with (xi, yi) at some point (a′, b′) in parameter space, where a′ is the slope and b′ the intercept of the line containing both (xi, yi) and (xj, yj) in the xy-plane (we are assuming, of course, that the lines are not parallel). In fact, all points on this line have lines in parameter space that intersect at (a′, b′). Figure 10.28 illustrates these concepts. (The original formulation of the Hough transform presented here works with straight lines; for a generalization to arbitrary shapes, see Ballard [1981].)
In principle, the parameter-space lines corresponding to all points (xk, yk) in the xy-plane could be plotted, and the principal lines in that plane could be found by identifying points in parameter space where large numbers of parameter-space lines intersect. However, a difficulty with this approach is that a (the slope of a line) approaches infinity as the line approaches the vertical direction. One way around this difficulty is to use the normal representation of a line:

$x\cos\theta + y\sin\theta = \rho$   (10-44)
FIGURE 10.28 (a) xy-plane. (b) Parameter space (ab-plane), in which the lines b = −xi a + yi and b = −xj a + yj intersect at (a′, b′).
FIGURE 10.29 (a) (ρ, θ) parameterization of a line in the xy-plane: x cos θ + y sin θ = ρ. (b) Sinusoidal curves in the ρθ-plane; the point of intersection (ρ′, θ′) corresponds to the line passing through points (xi, yi) and (xj, yj) in the xy-plane. (c) Division of the ρθ-plane into accumulator cells.
Figure 10.30 illustrates the Hough transform based on Eq. (10-44). Figure 10.30(a) shows an image of size M × M (M = 101) with five labeled white points, and Fig. 10.30(b) shows each of these points mapped onto the ρθ-plane using subdivisions of one unit for the ρ and θ axes. The range of θ values is ±90°, and the range of ρ values is ±√2·M. As Fig. 10.30(b) shows, each curve has a different sinusoidal shape. The horizontal line resulting from the mapping of point 1 is a sinusoid of zero amplitude.
The points labeled A (not to be confused with accumulator values) and B in Fig. 10.30(b) illustrate the colinearity detection property of the Hough transform. For example, point A marks the intersection of three of the curves; its location indicates that the corresponding three points lie on a straight line passing through the origin (ρ = 0) and oriented at +45° [see Fig. 10.29(a)]. Similarly, the curves intersecting at point B in parameter space indicate that points 2, 3, and 4 lie on a straight line oriented at −45°, and whose distance from the origin is ρ = 71 (one-half the diagonal distance from the origin of the image to the opposite corner, rounded to the nearest integer
FIGURE 10.30 (a) Image of size 101 × 101 pixels, containing five white points (four in the corners and one in the center). (b) Corresponding parameter space.
value). Finally, the points labeled Q, R, and S in Fig. 10.30(b) illustrate the fact that the Hough transform exhibits a reflective adjacency relationship at the right and left edges of the parameter space. This property is the result of the manner in which ρ and θ change sign at the ±90° boundaries.
Although the focus thus far has been on straight lines, the Hough transform is
applicable to any function of the form g(v, c) = 0, where v is a vector of coordinates and c is a vector of coefficients. For example, points lying on the circle

$(x - c_1)^2 + (y - c_2)^2 = c_3^2$   (10-45)
can be detected by using the basic approach just discussed. The difference is the
presence of three parameters c1, c2 , and c3 that result in a 3-D parameter space with
cube-like cells, and accumulators of the form A(i, j, k). The procedure is to incre-
ment c1 and c2 , solve for the value of c3 that satisfies Eq. (10-45), and update the
accumulator cell associated with the triplet (c1, c2 , c3 ). Clearly, the complexity of the
Hough transform depends on the number of coordinates and coefficients in a given
functional representation. As noted earlier, generalizations of the Hough transform
to detect curves with no simple analytic representations are possible, as is the appli-
cation of the transform to grayscale images.
Returning to the edge-linking problem, an approach based on the Hough trans-
form is as follows:
1. Obtain a binary edge map using any of the methods discussed earlier in this section.
2. Specify subdivisions in the ρθ-plane.
3. Examine the counts of the accumulator cells for high pixel concentrations.
4. Examine the relationship (principally for continuity) between pixels in a chosen
cell.
Continuity in this case usually is based on computing the distance between discon-
nected pixels corresponding to a given accumulator cell. A gap in a line associated
with a given cell is bridged if the length of the gap is less than a specified threshold.
Being able to group lines based on direction is a global concept applicable over the
entire image, requiring only that we examine pixels associated with specific accumu-
lator cells. The following example illustrates these concepts.
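As an illustration of the accumulator-cell procedure, here is a minimal NumPy sketch of the line Hough transform of Eq. (10-44), assuming a binary edge map as input. The function name and the choice of cell sizes are ours; peaks in the returned array correspond to prominent lines, and the pixels that voted for a chosen cell are the ones examined for continuity in Step 4.

```python
import numpy as np

def hough_lines(edge_map, d_theta=1.0, d_rho=1.0):
    """Accumulate votes in (rho, theta) space for a binary edge map.

    theta spans [-90, 90) degrees and rho spans [-D, D], where D is the image
    diagonal, following x*cos(theta) + y*sin(theta) = rho.
    """
    ys, xs = np.nonzero(edge_map)
    rows, cols = edge_map.shape
    D = int(np.ceil(np.hypot(rows, cols)))
    thetas = np.deg2rad(np.arange(-90.0, 90.0, d_theta))
    rhos = np.arange(-D, D + d_rho, d_rho)
    acc = np.zeros((len(rhos), len(thetas)), dtype=np.int64)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in zip(xs, ys):
        # rho of the sinusoid for this point at every theta; vote in the nearest cell
        r = x * cos_t + y * sin_t
        idx = np.round((r - rhos[0]) / d_rho).astype(int)
        acc[idx, np.arange(len(thetas))] += 1
    return acc, rhos, thetas
```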
Figure 10.31(a) shows an aerial image of an airport. The objective of this example is to use the Hough
transform to extract the two edges defining the principal runway. A solution to such a problem might be
of interest, for instance, in applications involving autonomous air navigation.
The first step is to obtain an edge map. Figure 10.31(b) shows the edge map obtained using Canny’s
algorithm with the same parameters and procedure used in Example 10.9. For the purpose of computing
the Hough transform, similar results can be obtained using any of the other edge-detection techniques
discussed earlier. Figure 10.31(c) shows the Hough parameter space obtained using 1° increments for θ and one-pixel increments for ρ.
The runway of interest is oriented approximately 1° off the north direction, so we select the cells corresponding to ±90° and containing the highest count, because the runways are the longest lines oriented in these directions. The small boxes on the edges of Fig. 10.31(c) highlight these cells. As mentioned earlier in connection with Fig. 10.30(b), the Hough transform exhibits adjacency at the edges. Another way of interpreting this property is that a line oriented at +90° and a line oriented at −90° are equivalent (i.e.,
they are both vertical). Figure 10.31(d) shows the lines corresponding to the two accumulator cells just
discussed, and Fig. 10.31(e) shows the lines superimposed on the original image. The lines were obtained
by joining all gaps not exceeding 20% (approximately 100 pixels) of the image height. These lines clearly
correspond to the edges of the runway of interest.
Note that the only information needed to solve this problem was the orientation of the runway and
the observer’s position relative to it. In other words, a vehicle navigating autonomously would know
that if the runway of interest faces north, and the vehicle’s direction of travel also is north, the runway
should appear vertically in the image. Other relative orientations are handled in a similar manner. The
FIGURE 10.31 (a) A 502 × 564 aerial image of an airport. (b) Edge map obtained using Canny’s algorithm. (c) Hough
parameter space (the boxes highlight the points associated with long vertical lines). (d) Lines in the image plane
corresponding to the points highlighted by the boxes. (e) Lines superimposed on the original image.
orientations of runways throughout the world are available in flight charts, and the direction of travel
is easily obtainable using GPS (Global Positioning System) information. This information also could be
used to compute the distance between the vehicle and the runway, thus allowing estimates of param-
eters such as expected length of lines relative to image size, as we did in this example.
10.3 THRESHOLDING
Because of its intuitive properties, simplicity of implementation, and computational
speed, image thresholding enjoys a central position in applications of image segmen-
tation. Thresholding was introduced in Section 3.1, and we have used it in various
discussions since then. In this section, we discuss thresholding in a more formal way,
and develop techniques that are considerably more general than what has been pre-
sented thus far.
FOUNDATION
In the previous section, regions were identified by first finding edge segments,
then attempting to link the segments into boundaries. In this section, we discuss
techniques for partitioning images directly into regions based on intensity values
and/or properties of these values.
Suppose that the intensity histogram in Fig. 10.32(a) corresponds to an image, f(x, y), composed of light objects on a dark background, such that object and background pixels have intensity values grouped into two dominant modes. One obvious way to extract the objects from the background is to select a threshold, T, that separates these modes; the segmented image, g(x, y), is then given by

$g(x, y) = 1$ if $f(x, y) > T$; $\;g(x, y) = 0$ if $f(x, y) \le T$   (10-46)

When T is a constant applicable over an entire image, the process given in this equa-
tion is referred to as global thresholding. When the value of T changes over an image,
we use the term variable thresholding. The terms local or regional thresholding are
used sometimes to denote variable thresholding in which the value of T at any point
(x, y) in an image depends on properties of a neighborhood of (x, y) (for example,
the average intensity of the pixels in the neighborhood). If T depends on the spa-
tial coordinates (x, y) themselves, then variable thresholding is often referred to as
dynamic or adaptive thresholding. Use of these terms is not universal.
Figure 10.32(b) shows a more difficult thresholding problem involving a histo-
gram with three dominant modes corresponding, for example, to two types of light
objects on a dark background. Here, multiple thresholding classifies a point (x, y) as belonging to the background if f(x, y) ≤ T1, to one object class if T1 < f(x, y) ≤ T2, and to the other object class if f(x, y) > T2. That is, the segmented image is given by

$g(x, y) = a$ if $f(x, y) > T_2$; $\;b$ if $T_1 < f(x, y) \le T_2$; $\;c$ if $f(x, y) \le T_1$   (10-47)
FIGURE 10.32 Intensity histograms that can be partitioned (a) by a single threshold, T, and (b) by dual thresholds, T1 and T2.
where a, b, and c are any three distinct intensity values. We will discuss dual threshold-
ing later in this section. Segmentation problems requiring more than two thresholds
are difficult (or often impossible) to solve, and better results usually are obtained using
other methods, such as variable thresholding, as will be discussed later in this section,
or region growing, as we will discuss in Section 10.4.
Based on the preceding discussion, we may infer intuitively that the success of
intensity thresholding is related directly to the width and depth of the valley(s) sepa-
rating the histogram modes. In turn, the key factors affecting the properties of the
valley(s) are: (1) the separation between peaks (the further apart the peaks are, the
better the chances of separating the modes); (2) the noise content in the image (the
modes broaden as noise increases); (3) the relative sizes of objects and background;
(4) the uniformity of the illumination source; and (5) the uniformity of the reflectance
properties of the image.
FIGURE 10.33 (a) Noiseless 8-bit image. (b) Image with additive Gaussian noise of mean 0 and standard deviation of 10 intensity levels. (c) Image with additive Gaussian noise of mean 0 and standard deviation of 50 intensity levels. (d) through (f) Corresponding histograms.
[see Fig. 10.33(e)], but their separation is enough so that the depth of the valley
between them is sufficient to make the modes easy to separate. A threshold placed
midway between the two peaks would do the job. Figure 10.33(c) shows the result
of corrupting the image with Gaussian noise of zero mean and a standard deviation
of 50 intensity levels. As the histogram in Fig. 10.33(f) shows, the situation is much
more serious now, as there is no way to differentiate between the two modes. With-
out additional processing (such as the methods discussed later in this section) we
have little hope of finding a suitable threshold for segmenting this image.
FIGURE 10.34 (a) Noisy image. (b) Intensity ramp in the range [0.2, 0.6]. (c) Product of (a) and (b). (d) through (f) Corresponding histograms.
perfectly uniform, but the reflectance of the image was not, as a result, for example,
of natural reflectivity variations in the surface of objects and/or background.
The important point is that illumination and reflectance play a central role in the
success of image segmentation using thresholding or other segmentation techniques.
Therefore, controlling these factors when possible should be the first step consid-
ered in the solution of a segmentation problem. There are three basic approaches
to the problem when control over these factors is not possible. The first is to correct
the shading pattern directly. For example, nonuniform (but fixed) illumination can
be corrected by multiplying the image by the inverse of the pattern, which can be
obtained by imaging a flat surface of constant intensity. The second is to attempt
to correct the global shading pattern via processing using, for example, the top-hat
transformation introduced in Section 9.8. The third approach is to “work around”
nonuniformities using variable thresholding, as discussed later in this section.
FIGURE 10.35 (a) Noisy fingerprint. (b) Histogram. (c) Segmented result using a global threshold (thin image border added for clarity). (Original image courtesy of the National Institute of Standards and Technology.)
The initial threshold must be chosen greater than the minimum and less than the maximum intensity level in the image (the average intensity of the image is a good initial choice for T). If this condition is met, the algorithm converges in a finite number of steps, whether or not the modes are separable (see Problem 10.30).
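The basic global algorithm referred to in this discussion is iterative: start from an initial estimate of T (the average image intensity is a good choice), split the pixels into the two groups defined by T, recompute T as the midpoint of the two group means, and stop when successive values of T change by less than ΔT. A minimal sketch, with our own function name and no claim of matching the book's exact pseudocode:

```python
import numpy as np

def basic_global_threshold(f, dT=0.5):
    """Iterative global threshold: T converges to the midpoint of the class means."""
    T = f.mean()                                  # initial estimate
    while True:
        low, high = f[f <= T], f[f > T]           # split pixels at the current T
        # midpoint of the two class means (guard against an empty class)
        T_new = 0.5 * (low.mean() + high.mean()) if low.size and high.size else T
        if abs(T_new - T) <= dT:
            return T_new
        T = T_new
```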
Figure 10.35 shows an example of segmentation using the preceding iterative algorithm. Figure 10.35(a) is the original image and Fig. 10.35(b) is the image histogram, showing a distinct valley. Application of the basic global algorithm resulted in the threshold T = 125.4 after three iterations, starting with T equal to the average intensity of the image, and using ΔT = 0. Figure 10.35(c) shows the result obtained using T = 125 to segment the original image. As expected from the clear separation of modes in the histogram, the segmentation between object and background was perfect.
OPTIMUM GLOBAL THRESHOLDING USING OTSU’S METHOD
Thresholding may be viewed as a statistical-decision problem whose objective is to minimize the average error incurred in assigning pixels to two or more groups (classes). Otsu’s method (Otsu [1979]) is an attractive approach: it is optimum in the sense that it maximizes the between-class variance, a well-known measure used in statistical discriminant analysis. The basic idea is that properly thresholded classes should be distinct with respect to the intensity values of their pixels and, conversely, that a threshold giving the best separation between classes in terms of their intensity values would be the best (optimum) threshold. In addition to its optimality, Otsu’s method has the important property that it is based entirely on computations performed on the histogram of an image, an easily obtainable 1-D array (see Section 3.3).
Let {0, 1, 2, …, L − 1} denote the set of L distinct integer intensity levels in a digital image of size M × N pixels, and let ni denote the number of pixels with intensity i. The total number, MN, of pixels in the image is MN = n0 + n1 + n2 + ⋯ + nL−1. The normalized histogram (see Section 3.3) has components pi = ni / MN, from which it follows that

$\sum_{i=0}^{L-1} p_i = 1, \qquad p_i \ge 0$   (10-48)
Now suppose that we select a threshold T(k) = k, 0 < k < L − 1, and use it to threshold the input image into two classes, c1 and c2, where c1 consists of all the pixels with intensity values in the range [0, k], and c2 consists of the pixels with values in the range [k + 1, L − 1]. Using this threshold, the probability, P1(k), that a pixel is assigned to (i.e., thresholded into) class c1 is given by the cumulative sum

$P_1(k) = \sum_{i=0}^{k} p_i$   (10-49)
Viewed another way, this is the probability of class c1 occurring. For example, if we set k = 0, the probability of class c1 having any pixels assigned to it is zero. Similarly, the probability of class c2 occurring is

$P_2(k) = \sum_{i=k+1}^{L-1} p_i = 1 - P_1(k)$   (10-50)

The mean intensity value of the pixels in class c1 is

$m_1(k) = \sum_{i=0}^{k} i\,P(i \mid c_1) = \sum_{i=0}^{k} i\,\frac{P(c_1 \mid i)\,P(i)}{P(c_1)} = \frac{1}{P_1(k)} \sum_{i=0}^{k} i\,p_i$   (10-51)
where P1(k) is given by Eq. (10-49). The term P(i | c1) in Eq. (10-51) is the probability of intensity value i, given that i comes from class c1. The second form of the equation follows from Bayes’ formula,

$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$

and the last form follows from the fact that P(c1 | i), the probability of c1 given i, is 1 because we are dealing only with values of i from class c1. Also, P(i) is the probability of the ith value, which is the ith component of the histogram, pi. Finally, P(c1) is the probability of class c1 which, from Eq. (10-49), is equal to P1(k).
Similarly, the mean intensity value of the pixels assigned to class c2 is

$m_2(k) = \sum_{i=k+1}^{L-1} i\,P(i \mid c_2) = \frac{1}{P_2(k)} \sum_{i=k+1}^{L-1} i\,p_i$   (10-52)

The cumulative mean (average intensity) up to level k is given by

$m(k) = \sum_{i=0}^{k} i\,p_i$   (10-53)

and the average intensity of the entire image (i.e., the global mean) is given by

$m_G = \sum_{i=0}^{L-1} i\,p_i$   (10-54)
The validity of the following two equations can be verified by direct substitution of the preceding results:

$P_1 m_1 + P_2 m_2 = m_G$   (10-55)

and

$P_1 + P_2 = 1$   (10-56)

where we have omitted the ks temporarily in favor of notational clarity. In order to evaluate the goodness of the threshold at level k, we use the normalized, dimensionless measure

$\eta = \sigma_B^2 / \sigma_G^2$   (10-57)

where σG² is the global variance [i.e., the intensity variance of all the pixels in the image, as given in Eq. (3-26)],

$\sigma_G^2 = \sum_{i=0}^{L-1} (i - m_G)^2\,p_i$   (10-58)

and σB² is the between-class variance, defined as

$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2$   (10-59)
Expression (10-59) can also be written as

$\sigma_B^2 = P_1 P_2 (m_1 - m_2)^2 = \frac{(m_G P_1 - m)^2}{P_1 (1 - P_1)}$   (10-60)

The first form of this equation follows from Eqs. (10-55), (10-56), and (10-59). The second form follows from Eqs. (10-50) through (10-54). This form is slightly more efficient computationally because the global mean, mG, is computed only once, so only two parameters, m and P1, need to be computed for any value of k.
The first form of Eq. (10-60) indicates that the farther the two means m1 and m2 are from each other, the larger σB² will be, implying that the between-class variance is a measure of separability between classes. Because σG² is a constant, it follows that η is also a measure of separability, and maximizing it is equivalent to maximizing σB². The objective, then, is to determine the threshold value, k, that maximizes the between-class variance. Reintroducing k, we have

$\eta(k) = \sigma_B^2(k) / \sigma_G^2$   (10-61)

and

$\sigma_B^2(k) = \frac{\left[m_G P_1(k) - m(k)\right]^2}{P_1(k)\left[1 - P_1(k)\right]}$   (10-62)
Then, the optimum threshold is the value, k*, that maximizes σB²(k):

$\sigma_B^2(k^*) = \max_{0 \le k \le L-1} \sigma_B^2(k)$   (10-63)
To find k* we simply evaluate this equation for all integer values of k [subject to the condition 0 < P1(k) < 1] and select the value of k that yields the maximum σB²(k). If the maximum exists for more than one value of k, it is customary to average the various values of k for which σB²(k) is maximum. It can be shown (see Problem 10.36) that a maximum always exists, subject to the condition 0 < P1(k) < 1. Evaluat-
ing Eqs. (10-62) and (10-63) for all values of k is a relatively inexpensive computa-
tional procedure, because the maximum number of integer values that k can have
is L, which is only 256 for 8-bit images.
Once k* has been obtained, the input image f(x, y) is segmented as before:

$g(x, y) = 1$ if $f(x, y) > k^*$; $\;g(x, y) = 0$ if $f(x, y) \le k^*$   (10-64)
In general, the measure in Eq. (10-61) has values in the range

$0 \le \eta(k) \le 1$   (10-65)

for values of k in the range [0, L − 1]. When evaluated at the optimum threshold k*, this measure is a quantitative estimate of the separability of classes, which in turn gives us an idea of the accuracy of thresholding a given image with k*. The lower bound in Eq. (10-65) is attainable only by images with a single, constant intensity level. The upper bound is attainable only by two-valued images with intensities equal to 0 and L − 1 (see Problem 10.37).
Otsu’s algorithm may be summarized as follows:
1. Compute the normalized histogram of the input image. Denote the components of the histogram by pi, i = 0, 1, 2, …, L − 1.
2. Compute the cumulative sums, P1(k), for k = 0, 1, 2, …, L − 1, using Eq. (10-49).
3. Compute the cumulative means, m(k), for k = 0, 1, 2, …, L − 1, using Eq. (10-53).
4. Compute the global mean, mG, using Eq. (10-54).
5. Compute the between-class variance term, σB²(k), for k = 0, 1, 2, …, L − 1, using Eq. (10-62).
6. Obtain the Otsu threshold, k*, as the value of k for which σB²(k) is maximum. If the maximum is not unique, obtain k* by averaging the values of k corresponding to the various maxima detected.
7. Compute the global variance, σG², using Eq. (10-58), and then obtain the separability measure, η*, by evaluating Eq. (10-61) with k = k*.
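The seven steps above translate almost directly into array operations on the histogram. The following sketch (NumPy; the function name is ours, and the input is assumed to be an integer image with values in [0, L − 1]) returns both the Otsu threshold k* and the separability measure η*:

```python
import numpy as np

def otsu_threshold(f, L=256):
    """Otsu's method following Steps 1-7: returns (k_star, eta_star)."""
    hist = np.bincount(f.ravel(), minlength=L).astype(float)
    p = hist / hist.sum()                       # Step 1: normalized histogram
    P1 = np.cumsum(p)                           # Step 2: cumulative sums P1(k)
    m = np.cumsum(np.arange(L) * p)             # Step 3: cumulative means m(k)
    mG = m[-1]                                  # Step 4: global mean
    num = (mG * P1 - m) ** 2
    den = P1 * (1.0 - P1)
    # Step 5: Eq. (10-62), defined only where 0 < P1(k) < 1
    sigma_b2 = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)
    # Step 6: average the k values if the maximum is not unique
    k_star = int(np.mean(np.flatnonzero(sigma_b2 == sigma_b2.max())))
    # Step 7: global variance, Eq. (10-58), and separability measure
    sigma_g2 = np.sum(((np.arange(L) - mG) ** 2) * p)
    eta_star = sigma_b2[k_star] / sigma_g2 if sigma_g2 > 0 else 0.0
    return k_star, eta_star

# Usage sketch: k, eta = otsu_threshold(img); g = (img > k).astype(np.uint8)
```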
Figure 10.36(a) shows an optical microscope image of polymersome cells. These are cells artificially engi-
neered using polymers. They are invisible to the human immune system and can be used, for example,
to deliver medication to targeted regions of the body. Figure 10.36(b) shows the image histogram. The
objective of this example is to segment the molecules from the background. Figure 10.36(c) is the result
of using the basic global thresholding algorithm discussed earlier. Because the histogram has no distinct
valleys and the intensity difference between the background and objects is small, the algorithm failed to
achieve the desired segmentation. Figure 10.36(d) shows the result obtained using Otsu’s method. This
result obviously is superior to Fig. 10.36(c). The threshold value computed by the basic algorithm was
169, while the threshold computed by Otsu’s method was 182, which is closer to the lighter areas in the
image defining the cells. The separability measure η* was 0.467.
As a point of interest, applying Otsu’s method to the fingerprint image in Example 10.13 yielded a
threshold of 125 and a separability measure of 0.944. The threshold is identical to the value (rounded to
the nearest integer) obtained with the basic algorithm. This is not unexpected, given the nature of the
histogram. In fact, the separability measure is high because of the relatively large separation between
modes and the deep valley between them.
FIGURE 10.36 (a) Original image. (b) Histogram (high peaks were clipped to highlight details in the lower values). (c) Segmentation result using the basic global algorithm from Section 10.3. (d) Result using Otsu’s method. (Original image courtesy of Professor Daniel A. Hammer, the University of Pennsylvania.)
FIGURE 10.37 (a) Noisy image from Fig. 10.33(c) and (b) its histogram. (c) Result obtained using Otsu’s method. (d) Noisy image smoothed using a 5 × 5 averaging kernel and (e) its histogram. (f) Result of thresholding using Otsu’s method.
Next, we investigate the effect of severely reducing the size of the foreground
region with respect to the background. Figure 10.38(a) shows the result. The noise in
this image is additive Gaussian noise with zero mean and a standard deviation of 10
intensity levels (as opposed to 50 in the previous example). As Fig. 10.38(b) shows,
the histogram has no clear valley, so we would expect segmentation to fail, a fact that
is confirmed by the result in Fig. 10.38(c). Figure 10.38(d) shows the image smoothed
with an averaging kernel of size 5 × 5, and Fig. 10.38(e) is the corresponding histo-
gram. As expected, the net effect was to reduce the spread of the histogram, but the
distribution still is unimodal. As Fig. 10.38(f) shows, segmentation failed again. The
reason for the failure can be traced to the fact that the region is so small that its con-
tribution to the histogram is insignificant compared to the intensity spread caused
by noise. In situations such as this, the approach discussed in the following section is
more likely to succeed.
FIGURE 10.38 (a) Noisy image and (b) its histogram. (c) Result obtained using Otsu’s method. (d) Noisy image smoothed using a 5 × 5 averaging kernel and (e) its histogram. (f) Result of thresholding using Otsu’s method. Thresholding failed in both cases to extract the object of interest. (See Fig. 10.39 for a better solution.)
USING EDGES TO IMPROVE GLOBAL THRESHOLDING
One approach for improving the shape of histograms is to consider only those pixels that lie on or near the edges between the objects and the background. An immediate and obvious improvement is that his-
tograms should be less dependent on the relative sizes of objects and background.
For instance, the histogram of an image composed of a small object on a large back-
ground area (or vice versa) would be dominated by a large peak because of the high
concentration of one type of pixels. We saw in Fig. 10.38 that this can lead to failure
in thresholding.
If only the pixels on or near the edges between objects and background were
used, the resulting histogram would have peaks of approximately the same height. In
addition, the probability that any of those pixels lies on an object would be approxi-
mately equal to the probability that it lies on the background, thus improving the
symmetry of the histogram modes. Finally, as indicated in the following paragraph,
using pixels that satisfy some simple measures based on gradient and Laplacian
operators has a tendency to deepen the valley between histogram peaks.
The approach just discussed assumes that the edges between objects and back-
ground are known. This information clearly is not available during segmentation,
as finding a division between objects and background is precisely what segmenta-
tion aims to do. However, an indication of whether a pixel is on an edge may be
obtained by computing its gradient or Laplacian. For example, the average value
of the Laplacian is 0 at the transition of an edge (see Fig. 10.10), so the valleys of
histograms formed from the pixels selected by a Laplacian criterion can be expected
to be sparsely populated. This property tends to produce the desirable deep valleys
discussed above. In practice, comparable results typically are obtained using either
the gradient or Laplacian images, with the latter being favored because it is compu-
tationally more attractive and is also created using an isotropic edge detector.
The preceding discussion is summarized in the following algorithm, where f(x, y) is the input image:
1. Compute an edge image as either the magnitude of the gradient, or the absolute value of the Laplacian, of f(x, y), using any of the methods in Section 10.2.
2. Specify a threshold value, T.
3. Threshold the image from Step 1 using T from Step 2 to produce a binary image, gT(x, y). This image is used as a mask image in the following step to select pixels from f(x, y) corresponding to “strong” edge pixels in the mask.
4. Compute a histogram using only the pixels in f(x, y) that correspond to the locations of the 1-valued pixels in gT(x, y).
5. Use the histogram from Step 4 to segment f(x, y) globally using, for example, Otsu’s method.
Note that it is possible to modify this algorithm so that both the magnitude of the gradient and the absolute value of the Laplacian images are used; in this case, we would specify a threshold for each image and form the logical OR of the two results to obtain the marker image. This approach is useful when more control is desired over the points deemed to be valid edge points.
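A compact sketch of the five steps follows, using a simple difference-based gradient as the edge image and reusing the otsu_threshold function sketched earlier (both choices are ours; the text allows any of the Section 10.2 edge detectors in Step 1 and any global method in Step 5):

```python
import numpy as np

def edge_improved_otsu(f, percentile=99.7):
    """Steps 1-5: build the histogram only from pixels flagged by strong edges."""
    # Step 1: edge image as the magnitude of the gradient (simple differences here)
    gy, gx = np.gradient(f.astype(float))
    edge = np.hypot(gx, gy)
    # Steps 2-3: threshold the edge image at a high percentile to form the mask gT
    T = np.percentile(edge, percentile)
    gT = edge > T
    # Step 4: pixels of f at the 1-valued mask locations (f assumed uint8)
    masked = f[gT]
    # Step 5: Otsu threshold computed from the histogram of those pixels only
    k, _ = otsu_threshold(masked)
    return (f > k).astype(np.uint8), k
```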
Figures 10.39(a) and (b) show the image and histogram from Fig. 10.38. You saw that this image could
not be segmented by smoothing followed by thresholding. The objective of this example is to solve the
problem using edge information. Figure 10.39(c) is the mask image, gT(x, y), formed as the gradient magnitude image thresholded at the 99.7 percentile. Figure 10.39(d) is the image formed by multiplying the
mask by the input image. Figure 10.39(e) is the histogram of the nonzero elements in Fig. 10.39(d). Note
that this histogram has the important features discussed earlier; that is, it has reasonably symmetrical
modes separated by a deep valley. Thus, while the histogram of the original noisy image offered no hope
for successful thresholding, the histogram in Fig. 10.39(e) indicates that thresholding of the small object
from the background is indeed possible. The result in Fig. 10.39(f) shows that this is the case. This image
was generated using Otsu’s method [to obtain a threshold based on the histogram in Fig. 10.39(e)], and
then applying the Otsu threshold globally to the noisy image in Fig. 10.39(a). The result is nearly perfect.
FIGURE 10.39 (a) Noisy image from Fig. 10.38(a) and (b) its histogram. (c) Mask image formed as the gradient magnitude image thresholded at the 99.7 percentile. (d) Image formed as the product of (a) and (c). (e) Histogram of the nonzero pixels in the image in (d). (f) Result of segmenting image (a) with the Otsu threshold based on the histogram in (e). The threshold was 134, which is approximately midway between the peaks in this histogram.
In this example, we consider a more complex thresholding problem. Figure 10.40(a) shows an 8-bit
image of yeast cells for which we want to use global thresholding to obtain the regions corresponding
to the bright spots. As a starting point, Fig. 10.40(b) shows the image histogram, and Fig. 10.40(c) is
the result obtained using Otsu’s method directly on the image, based on the histogram shown. We see
that Otsu’s method failed to achieve the original objective of detecting the bright spots. Although the
method was able to isolate some of the cell regions themselves, several of the segmented regions on the
right were actually joined. The threshold computed by the Otsu method was 42, and the separability
measure was 0.636.
Figure 10.40(d) shows the mask image gT (x, y) obtained by computing the absolute value of the
Laplacian image, then thresholding it with T set to 115 on an intensity scale in the range [0, 255]. This
value of T corresponds approximately to the 99.5 percentile of the values in the absolute Laplacian
image, so thresholding at this level results in a sparse set of pixels, as Fig. 10.40(d) shows. Note in this
image how the points cluster near the edges of the bright spots, as expected from the preceding dis-
cussion. Figure 10.40(e) is the histogram of the nonzero pixels in the product of (a) and (d). Finally,
Fig. 10.40(f) shows the result of globally segmenting the original image using Otsu’s method based on
the histogram in Fig. 10.40(e). This result agrees with the locations of the bright spots in the image. The
threshold computed by the Otsu method was 115, and the separability measure was 0.762, both of which
are higher than the values obtained by using the original histogram.
FIGURE 10.40 (a) Image of yeast cells. (b) Histogram of (a). (c) Segmentation of (a) with Otsu’s method using the histogram in (b). (d) Mask image formed by thresholding the absolute Laplacian image. (e) Histogram of the nonzero pixels in the product of (a) and (d). (f) Original image thresholded using Otsu’s method based on the histogram in (e). (Original image courtesy of Professor Susan L. Forsburg, University of Southern California.)
By varying the percentile at which the threshold is set, we can even improve the segmentation of the
complete cell regions. For example, Fig. 10.41 shows the result obtained using the same procedure as in
the previous paragraph, but with the threshold set at 55, which is approximately 5% of the maximum
value of the absolute Laplacian image. This value is at the 53.9 percentile of the values in that image.
This result clearly is superior to the result in Fig. 10.40(c) obtained using Otsu’s method with the histo-
gram of the original image.
MULTIPLE THRESHOLDS
Thus far, we have focused attention on image segmentation using a single global threshold. Otsu’s method can be extended to an arbitrary number of thresholds because the separability measure on which it is based also extends to an arbitrary number of classes.
FIGURE 10.41 Image in Fig. 10.40(a) segmented using the same procedure as explained in Figs. 10.40(d) through (f), but using a lower value to threshold the absolute Laplacian image.
In the case of K classes, c1, c2, …, cK, the between-class variance generalizes to

$\sigma_B^2 = \sum_{k=1}^{K} P_k (m_k - m_G)^2$   (10-66)

where

$P_k = \sum_{i \in c_k} p_i$   (10-67)

and

$m_k = \frac{1}{P_k} \sum_{i \in c_k} i\,p_i$   (10-68)
As before, mG is the global mean given in Eq. (10-54). The K classes are separated by K − 1 thresholds whose values, k1*, k2*, …, k*K−1, are the values that maximize Eq. (10-66):

$\sigma_B^2(k_1^*, k_2^*, \ldots, k_{K-1}^*) = \max_{0 < k_1 < k_2 < \cdots < k_{K-1} < L-1} \sigma_B^2(k_1, k_2, \ldots, k_{K-1})$   (10-69)
For three classes consisting of three intensity intervals (which are separated by two thresholds), the between-class variance is given by

$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2 + P_3 (m_3 - m_G)^2$   (10-70)

where

$P_1 = \sum_{i=0}^{k_1} p_i, \qquad P_2 = \sum_{i=k_1+1}^{k_2} p_i, \qquad P_3 = \sum_{i=k_2+1}^{L-1} p_i$   (10-71)

(Recall from the discussion of the Canny edge detector that thresholding with two thresholds is referred to as hysteresis thresholding.)
and

$m_1 = \frac{1}{P_1} \sum_{i=0}^{k_1} i\,p_i, \qquad m_2 = \frac{1}{P_2} \sum_{i=k_1+1}^{k_2} i\,p_i, \qquad m_3 = \frac{1}{P_3} \sum_{i=k_2+1}^{L-1} i\,p_i$   (10-72)
We see from Eqs. (10-71) and (10-72) that the P and m terms, and therefore σB², are functions of k1 and k2. The two optimum threshold values, k1* and k2*, are the values that maximize σB²(k1, k2). That is, as indicated in Eq. (10-69), we find the optimum thresholds by finding

$\sigma_B^2(k_1^*, k_2^*) = \max_{0 < k_1 < k_2 < L-1} \sigma_B^2(k_1, k_2)$   (10-75)
The procedure starts by selecting the first value of k1 (that value is 1 because looking for a threshold at 0 intensity makes no sense; also, keep in mind that the increment values are integers because we are dealing with integer intensity values). Next, k2 is incremented through all its values greater than k1 and less than L − 1 (i.e., k2 = k1 + 1, …, L − 2). Then, k1 is incremented to its next value and k2 is incremented again through all its values greater than k1. This procedure is repeated until k1 = L − 3. The result is a 2-D array, σB²(k1, k2), and the last step is to look for the maximum value in this array. The values of k1 and k2 corresponding to that maximum are the optimum thresholds, k1* and k2*.
If there are several maxima, the corresponding values of k1 and k2 are averaged to
obtain the final thresholds. The thresholded image is then given by
$g(x, y) = a$ if $f(x, y) \le k_1^*$; $\;b$ if $k_1^* < f(x, y) \le k_2^*$; $\;c$ if $f(x, y) > k_2^*$   (10-76)
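For K = 3, the exhaustive search over (k1, k2) implied by Eq. (10-75) is straightforward to code. The sketch below (NumPy; the function name is ours, ties between maxima are not averaged for brevity, and the inner sums could be precomputed as cumulative sums for speed) returns the two thresholds used in Eq. (10-76):

```python
import numpy as np

def dual_otsu(f, L=256):
    """Exhaustive search for the two thresholds maximizing Eq. (10-70)."""
    p = np.bincount(f.ravel(), minlength=L).astype(float)
    p /= p.sum()
    i = np.arange(L)
    mG = np.sum(i * p)                       # global mean, Eq. (10-54)
    best, k1s, k2s = -1.0, 0, 0
    for k1 in range(1, L - 2):               # k1 = 1, ..., L-3
        for k2 in range(k1 + 1, L - 1):      # k2 = k1+1, ..., L-2
            P1, P2, P3 = p[:k1+1].sum(), p[k1+1:k2+1].sum(), p[k2+1:].sum()
            if min(P1, P2, P3) == 0:         # skip empty classes
                continue
            m1 = np.sum(i[:k1+1] * p[:k1+1]) / P1
            m2 = np.sum(i[k1+1:k2+1] * p[k1+1:k2+1]) / P2
            m3 = np.sum(i[k2+1:] * p[k2+1:]) / P3
            sb2 = P1*(m1-mG)**2 + P2*(m2-mG)**2 + P3*(m3-mG)**2   # Eq. (10-70)
            if sb2 > best:
                best, k1s, k2s = sb2, k1, k2
    return k1s, k2s
```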
Figure 10.42(a) shows an image of an iceberg. The objective of this example is to segment the image into
three regions: the dark background, the illuminated area of the iceberg, and the area in shadows. It is
evident from the image histogram in Fig. 10.42(b) that two thresholds are required to solve this problem.
The procedure discussed above resulted in the thresholds k1* = 80 and k2* = 177, which we note from Fig. 10.42(b) are near the centers of the two histogram valleys. Figure 10.42(c) is the segmentation that
resulted using these two thresholds in Eq. (10-76). The separability measure was 0.954. The principal
reason this example worked out so well can be traced to the histogram having three distinct modes
separated by reasonably wide, deep valleys. But we can do even better using superpixels, as you will see
in Section 10.5.
FIGURE 10.42 (a) Image of an iceberg. (b) Histogram. (c) Image segmented into three regions using dual Otsu thresholds.
(Original image courtesy of NOAA.)
VARIABLE THRESHOLDING
As discussed earlier in this section, factors such as noise and nonuniform illumina-
tion play a major role in the performance of a thresholding algorithm. We showed
that image smoothing and the use of edge information can help significantly. How-
ever, sometimes this type of preprocessing is either impractical or ineffective in
improving the situation, to the point where the problem cannot be solved by any
of the thresholding methods discussed thus far. In such situations, the next level of
thresholding complexity involves variable thresholding, as we will illustrate in the
following discussion.
A basic approach to variable thresholding is to compute a threshold at every point, (x, y), based on one or more specified properties of the pixels in a neighborhood Sxy centered at (x, y), for example its mean mxy and standard deviation σxy, as in Txy = aσxy + bmxy, where a and b are nonnegative constants. The segmented image is then computed as

$g(x, y) = 1$ if $f(x, y) > T_{xy}$; $\;g(x, y) = 0$ if $f(x, y) \le T_{xy}$   (10-80)

where f(x, y) is the input image. This equation is evaluated for all pixel locations
in the image, and a different threshold is computed at each location (x, y) using the
pixels in the neighborhood Sxy .
Significant power (with a modest increase in computation) can be added to vari-
able thresholding by using predicates based on the parameters computed in the neigh-
borhood of a point (x, y) :
$g(x, y) = 1$ if $Q$ is TRUE; $\;g(x, y) = 0$ if $Q$ is FALSE   (10-81)
where Q is a predicate based on parameters computed using the pixels in neighbor-
hood Sxy. For example, consider the following predicate, Q(σxy, mxy), based on the local mean and standard deviation:

$Q(\sigma_{xy}, m_{xy}) = \text{TRUE if } f(x, y) > a\sigma_{xy} \text{ AND } f(x, y) > bm_{xy}; \;\text{FALSE otherwise}$   (10-82)
Note that Eq. (10-80) is a special case of Eq. (10-81), obtained by letting Q be TRUE if f(x, y) > Txy and FALSE otherwise. In this case, the predicate is based simply on the intensity at a point.
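A sketch of variable thresholding with the predicate of Eq. (10-82) is shown below (NumPy 1.20+ for sliding_window_view; function names are ours, the default values a = 30 and b = 1.5 are taken from the example that follows, and n is assumed odd). Setting use_global_mean=True replaces mxy by the global mean, as is done in that example:

```python
import numpy as np

def local_stats(f, n=3):
    """Local mean and standard deviation over an n x n neighborhood (n odd)."""
    f = f.astype(float)
    pad = n // 2
    fp = np.pad(f, pad, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(fp, (n, n))
    return windows.mean(axis=(-1, -2)), windows.std(axis=(-1, -2))

def variable_threshold(f, a=30.0, b=1.5, use_global_mean=True, n=3):
    """Predicate of Eq. (10-82): f > a*sigma_xy AND f > b*m (local or global mean)."""
    m_xy, s_xy = local_stats(f, n)
    m = f.mean() if use_global_mean else m_xy
    return ((f > a * s_xy) & (f > b * m)).astype(np.uint8)
```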
Figure 10.43(a) shows the yeast image from Example 10.16. This image has three predominant inten-
sity levels, so it is reasonable to assume that perhaps dual thresholding could be a good segmentation
approach. Figure 10.43(b) is the result of using the dual thresholding method summarized in Eq. (10-76).
As the figure shows, it was possible to isolate the bright areas from the background, but the mid-gray
regions on the right side of the image were not segmented (i.e., separated) properly. To illustrate the use
FIGURE 10.43 (a) Image from Fig. 10.40. (b) Image segmented using the dual thresholding approach given by Eq. (10-76). (c) Image of local standard deviations. (d) Result obtained using local thresholding.
of local thresholding, we computed the local standard deviation σxy for all (x, y) in the input image using a neighborhood of size 3 × 3. Figure 10.43(c) shows the result. Note how the faint outer lines correctly
delineate the boundaries of the cells. Next, we formed a predicate of the form shown in Eq. (10-82), but
using the global mean instead of mxy . Choosing the global mean generally gives better results when the
background is nearly constant and all the object intensities are above or below the background intensity.
The values a = 30 and b = 1.5 were used to complete the specification of the predicate (these values
were determined experimentally, as is usually the case in applications such as this). The image was then
segmented using Eq. (10-82). As Fig. 10.43(d) shows, the segmentation was quite successful. Note in par-
ticular that all the outer regions were segmented properly, and that most of the inner, brighter regions
were isolated correctly.
A special case of variable thresholding is based on computing a moving average along scan lines of an image. This implementation is useful in applications such as document processing, where speed is a fundamental requirement. Let zk+1 denote the intensity of the point encountered in the scanning sequence at step k + 1. The moving average (mean intensity) at this new point is given by

$m(k+1) = \frac{1}{n} \sum_{i=k+2-n}^{k+1} z_i = m(k) + \frac{1}{n}\left(z_{k+1} - z_{k+1-n}\right) \quad \text{for } k \ge n - 1$   (10-83)
where n is the number of points used in computing the average, and m(1) = z1. The condition imposed on k is so that all subscripts on zk are positive; all this means is that n points must be available for computing the average. When k is less than the limit shown (this happens near the image borders), the averages are formed with the available image points. Because a moving average is computed for every point in the image, segmentation is implemented using Eq. (10-80) with Txy = c·mxy, where c is a positive scalar, and mxy is the moving average from Eq. (10-83) at point (x, y) in the input image.
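A sketch of thresholding with moving averages follows (the function name is ours; each row is scanned left to right for simplicity, although the scanning direction could also alternate from row to row). The running sum implements the recursive form of Eq. (10-83), with fewer points used near the start of each row, and the threshold at every point is c times the moving average:

```python
import numpy as np

def moving_average_threshold(f, n=20, c=0.5):
    """Threshold each pixel against c times the row-wise moving average of Eq. (10-83)."""
    f = f.astype(float)
    g = np.zeros(f.shape, dtype=np.uint8)
    for r in range(f.shape[0]):
        s = 0.0
        for k in range(f.shape[1]):
            # running sum of the last min(k+1, n) intensities along the row
            s += f[r, k] - (f[r, k - n] if k >= n else 0.0)
            m = s / min(k + 1, n)
            g[r, k] = 1 if f[r, k] > c * m else 0
    return g
```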
Figure 10.44(a) shows an image of handwritten text shaded by a spot intensity pattern. This form of
intensity shading is typical of images obtained using spot illumination (such as a photographic flash).
Figure 10.44(b) is the result of segmentation using the Otsu global thresholding method. It is not unex-
pected that global thresholding could not overcome the intensity variation because the method gener-
ally performs poorly when the areas of interest are embedded in a nonuniform illumination field. Figure
10.44(c) shows successful segmentation with local thresholding using moving averages. For images of
written material, a rule of thumb is to let n equal five times the average stroke width. In this case, the average width was 4 pixels, so we let n = 20 in Eq. (10-83) and used c = 0.5.
FIGURE 10.44 (a) Text image corrupted by spot shading. (b) Result of global thresholding using Otsu’s method.
(c) Result of local thresholding using moving averages.
As another illustration of the effectiveness of this segmentation approach, we used the same param-
eters as in the previous paragraph to segment the image in Fig. 10.45(a), which is corrupted by a sinu-
soidal intensity variation typical of the variations that may occur when the power supply in a document
scanner is not properly grounded. As Figs. 10.45(b) and (c) show, the segmentation results are compa-
rable to those in Fig. 10.44.
Note that successful segmentation results were obtained in both cases using the same values for n
and c, which shows the relative ruggedness of the approach. In general, thresholding based on moving
averages works well when the objects of interest are small (or thin) with respect to the image size, a
condition satisfied by images of typed or handwritten text.
10.4 SEGMENTATION BY REGION GROWING AND BY REGION SPLITTING AND MERGING
REGION GROWING
As its name implies, region growing is a procedure that groups pixels or subregions
into larger regions based on predefined criteria for growth. The basic approach is to
start with a set of “seed” points, and from these grow regions by appending to each
seed those neighboring pixels that have predefined properties similar to the seed
(such as ranges of intensity or color).
Selecting a set of one or more starting points can often be based on the nature of
the problem, as we show later in Example 10.20. When a priori information is not
FIGURE 10.45 (a) Text image corrupted by sinusoidal shading. (b) Result of global thresholding using Otsu’s method. (c) Result of local thresholding using moving averages.
available, the procedure is to compute at every pixel the same set of properties that
ultimately will be used to assign pixels to regions during the growing process. If the
result of these computations shows clusters of values, the pixels whose properties
place them near the centroid of these clusters can be used as seeds.
The selection of similarity criteria depends not only on the problem under con-
sideration, but also on the type of image data available. For example, the analysis of
land-use satellite imagery depends heavily on the use of color. This problem would
be significantly more difficult, or even impossible, to solve without the inherent infor-
mation available in color images. When the images are monochrome, region analysis
must be carried out with a set of descriptors based on intensity levels and spatial
properties (such as moments or texture). We will discuss descriptors useful for region
characterization in Chapter 11.
Descriptors alone can yield misleading results if connectivity properties are not
used in the region-growing process. For example, visualize a random arrangement of
pixels that have three distinct intensity values. Grouping pixels with the same inten-
sity value to form a “region,” without paying attention to connectivity, would yield a
segmentation result that is meaningless in the context of this discussion.
Another problem in region growing is the formulation of a stopping rule. Region
growth should stop when no more pixels satisfy the criteria for inclusion in that
region. Criteria such as intensity values, texture, and color are local in nature and
do not take into account the “history” of region growth. Additional criteria that can
increase the power of a region-growing algorithm utilize the concept of size, like-
ness between a candidate pixel and the pixels grown so far (such as a comparison of
the intensity of a candidate and the average intensity of the grown region), and the
shape of the region being grown. The use of these types of descriptors is based on
the assumption that a model of expected results is at least partially available.
Let: f (x, y) denote an input image; S(x, y) denote a seed array containing 1’s
at the locations of seed points and 0’s elsewhere; and Q denote a predicate to be
applied at each location (x, y). Arrays f and S are assumed to be of the same size.
A basic region-growing algorithm based on 8-connectivity may be stated as follows.
1. Find all connected components in S(x, y) and reduce each connected component to one pixel; label all such pixels found as 1. All other pixels in S are labeled 0. (See Sections 2.5 and 9.5 regarding connected components, and Section 9.2 regarding erosion.)
2. Form an image fQ such that, at each point (x, y), fQ(x, y) = 1 if the input image satisfies a given predicate, Q, at those coordinates, and fQ(x, y) = 0 otherwise.
3. Let g be an image formed by appending to each seed point in S all the 1-valued
points in fQ that are 8-connected to that seed point.
4. Label each connected component in g with a different region label (e.g., integers
or letters). This is the segmented image obtained by region growing.
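A breadth-first sketch of Steps 2 through 4 is shown below (the function name is ours; seeds is assumed to be the list of single-pixel seed coordinates produced by Step 1, and the predicate is the absolute-difference test used in the example that follows, with T as a parameter). Each connected region grown from a seed receives its own integer label:

```python
import numpy as np
from collections import deque

def region_grow(f, seeds, T=68):
    """8-connected region growing: append to each seed every pixel whose absolute
    intensity difference from that seed does not exceed T."""
    f = f.astype(float)
    labels = np.zeros(f.shape, dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for label, (sr, sc) in enumerate(seeds, start=1):
        if labels[sr, sc]:              # seed already absorbed by an earlier region
            continue
        seed_val = f[sr, sc]
        q = deque([(sr, sc)])
        labels[sr, sc] = label
        while q:
            r, c = q.popleft()
            for dr, dc in offsets:      # visit the 8-neighbors
                rr, cc = r + dr, c + dc
                if (0 <= rr < f.shape[0] and 0 <= cc < f.shape[1]
                        and labels[rr, cc] == 0
                        and abs(f[rr, cc] - seed_val) <= T):   # predicate Q
                    labels[rr, cc] = label
                    q.append((rr, cc))
    return labels
```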
Figure 10.46(a) shows an 8-bit X-ray image of a weld (the horizontal dark region) containing several
cracks and porosities (the bright regions running horizontally through the center of the image). We illus-
trate the use of region growing by segmenting the defective weld regions. These regions could be used
in applications such as weld inspection, for inclusion in a database of historical studies, or for controlling
an automated welding system.
The first thing we do is determine the seed points. From the physics of the problem, we know that
cracks and porosities will attenuate X-rays considerably less than solid welds, so we expect the regions
containing these types of defects to be significantly brighter than other parts of the X-ray image. We
can extract the seed points by thresholding the original image, using a threshold set at a high percen-
tile. Figure 10.46(b) shows the histogram of the image, and Fig. 10.46(c) shows the thresholded result
obtained with a threshold equal to the 99.9 percentile of intensity values in the image, which in this case
was 254 (see Section 10.3 regarding percentiles). Figure 10.46(d) shows the result of morphologically
eroding each connected component in Fig. 10.46(c) to a single point.
Next, we have to specify a predicate. In this example, we are interested in appending to each seed
all the pixels that (a) are 8-connected to that seed, and (b) are “similar” to it. Using absolute intensity
differences as a measure of similarity, our predicate applied at each location (x, y) is
Q = TRUE if the absolute difference of the intensities between the seed and the pixel at (x, y) is ≤ T; FALSE otherwise
where T is a specified threshold. Although this predicate is based on intensity differences and uses a
single threshold, we could specify more complex schemes in which a different threshold is applied to
each pixel, and properties other than differences are used. In this case, the preceding predicate is suf-
ficient to solve the problem, as the rest of this example shows.
From the previous paragraph, we know that all seed values are 255 because the image was thresh-
olded with a threshold of 254. Figure 10.46(e) shows the difference between the seed value (255) and
Fig. 10.46(a). The image in Fig. 10.46(e) contains all the differences needed to compute the predicate at
each location (x, y). Figure 10.46(f) shows the corresponding histogram. We need a threshold to use in
the predicate to establish similarity. The histogram has three principal modes, so we can start by apply-
ing to the difference image the dual thresholding technique discussed in Section 10.3. The resulting two
thresholds in this case were T1 = 68 and T2 = 126, which we see correspond closely to the valleys of
the histogram. (As a brief digression, we segmented the image using these two thresholds. The result in
FIGURE 10.46 (a) X-ray image of a defective weld. (b) Histogram. (c) Initial seed image. (d) Final seed image (the points were enlarged for clarity). (e) Absolute value of the difference between the seed value (255) and (a). (f) Histogram of (e). (g) Difference image thresholded using dual thresholds. (h) Difference image thresholded with the smallest of the dual thresholds. (i) Segmentation result obtained by region growing. (Original image courtesy of X-TEK Systems, Ltd.)
Fig. 10.46(g) shows that segmenting the defects cannot be accomplished using dual thresholds, despite
the fact that the thresholds are in the deep valleys of the histogram.)
Figure 10.46(h) shows the result of thresholding the difference image with only T1. The black points
are the pixels for which the predicate was TRUE; the others failed the predicate. The important result
here is that the points in the good regions of the weld failed the predicate, so they will not be included
in the final result. The points in the outer region will be considered by the region-growing algorithm as
candidates. However, Step 3 will reject the outer points because they are not 8-connected to the seeds.
In fact, as Fig. 10.46(i) shows, this step resulted in the correct segmentation, indicating that the use of
connectivity was a fundamental requirement in this case. Finally, note that in Step 4 we used the same
value for all the regions found by the algorithm. In this case, it was visually preferable to do so because
all those regions have the same physical meaning in this application—they all represent porosities.
REGION SPLITTING AND MERGING
An alternative to region growing is to subdivide an image initially into a set of disjoint regions and then merge and/or split the regions in an attempt to satisfy the conditions of segmentation. Letting R represent the entire image region and Q denote a predicate, one approach is to subdivide R successively into smaller and smaller quadrant regions, organized as a quadtree (see Fig. 10.47), using the following procedure:
1. Split into four disjoint quadrants any region Ri for which Q(Ri) = FALSE.
2. When no further splitting is possible, merge any adjacent regions Rj and Rk for which Q(Rj ∪ Rk) = TRUE.
FIGURE 10.47 (a) Partitioned image. (b) Corresponding quadtree. R represents the entire image region.
Numerous variations of this basic theme are possible. For example, a significant
simplification results if in Step 2 we allow merging of any two adjacent regions Rj
and Rk if each one satisfies the predicate individually. This results in a much sim-
pler (and faster) algorithm, because testing of the predicate is limited to individual
quadregions. As the following example shows, this simplification is still capable of
yielding good segmentation results.
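A recursive sketch of this simplified variant is shown below (function and variable names are ours). A quadregion satisfying the predicate is marked immediately, which has the same effect as merging adjacent satisfying regions; regions that fail the predicate are split until the minimum allowed size is reached, as in the example that follows. The predicate constants shown (standard deviation greater than 10, mean between 0 and 125) are the values used in that example:

```python
import numpy as np

def split_segment(f, predicate, min_size=16):
    """Quadtree splitting with the simplified merge: mark any quadregion that
    satisfies the predicate; otherwise split it while both children would still
    be at least min_size on a side."""
    out = np.zeros(f.shape, dtype=np.uint8)

    def split(r0, r1, c0, c1):
        region = f[r0:r1, c0:c1]
        if predicate(region):
            out[r0:r1, c0:c1] = 1
        elif (r1 - r0) >= 2 * min_size and (c1 - c0) >= 2 * min_size:
            rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
            split(r0, rm, c0, cm); split(r0, rm, cm, c1)
            split(rm, r1, c0, cm); split(rm, r1, cm, c1)

    split(0, f.shape[0], 0, f.shape[1])
    return out

# Predicate with the constants used in the Cygnus Loop example (a = 10, b = 125):
Q = lambda R: R.std() > 10 and 0 < R.mean() < 125
```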
Figure 10.48(a) shows a 566 × 566 X-ray image of the Cygnus Loop supernova. The objective of this
example is to segment (extract from the image) the “ring” of less dense matter surrounding the dense
inner region. The region of interest has some obvious characteristics that should help in its segmenta-
tion. First, we note that the data in this region has a random nature, indicating that its standard devia-
tion should be greater than the standard deviation of the background (which is near 0) and of the large
central region, which is smooth. Similarly, the mean value (average intensity) of a region containing
data from the outer ring should be greater than the mean of the darker background and less than the
mean of the lighter central region. Thus, we should be able to segment the region of interest using the
following predicate:
FIGURE 10.48 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by NASA’s Hubble Telescope. (b) through (d) Results of limiting the smallest allowed quadregion to sizes of 32 × 32, 16 × 16, and 8 × 8 pixels, respectively. (Original image courtesy of NASA.)
Q(R) = TRUE if σR > a AND 0 < mR < b; FALSE otherwise
where sR and mR are the standard deviation and mean of the region being processed, and a and b are
nonnegative constants.
Analysis of several regions in the outer area of interest revealed that the mean intensity of pixels
in those regions did not exceed 125, and the standard deviation was always greater than 10. Figures
10.48(b) through (d) show the results obtained using these values for a and b, and varying the minimum
size allowed for the quadregions from 32 to 8. The pixels in a quadregion that satisfied the predicate
were set to white; all others in that region were set to black. The best result in terms of capturing the
shape of the outer region was obtained using quadregions of size 16 × 16. The small black squares in Fig. 10.48(d) are quadregions of size 8 × 8 whose pixels did not satisfy the predicate. Using smaller
quadregions would result in increasing numbers of such black regions. Using regions larger than the one
illustrated here would result in a more “block-like” segmentation. Note that in all cases the segmented
region (white pixels) was a connected region that completely separates the inner, smoother region from
the background. Thus, the segmentation effectively partitioned the image into three distinct areas that
correspond to the three principal features in the image: background, a dense region, and a sparse region.
Using any of the white regions in Fig. 10.48 as a mask would make it a relatively simple task to extract
these regions from the original image (see Problem 10.43). As in Example 10.20, these results could not
have been obtained using edge- or threshold-based segmentation.
As used in the preceding example, properties based on the mean and standard
deviation of pixel intensities in a region attempt to quantify the texture of the region
(see Section 11.3 for a discussion on texture). The concept of texture segmentation
is based on using measures of texture in the predicates. In other words, we can per-
form texture segmentation by any of the methods discussed in this section simply by
specifying predicates based on texture content.