Mod3 - Computer Vision
where the terms in capital letters are the Fourier transforms of the corresponding
terms in Eq. (5-1). These two equations are the foundation for most of the restora-
tion material in this chapter.
In the following three sections, we work only with degradations caused by noise.
Beginning in Section 5.5 we look at several methods for image restoration in the
presence of both the degradation function and noise.
FIGURE 5.1
A model of the image degradation/restoration process.
5.2 Noise Models
Gaussian Noise
Because of its mathematical tractability in both the spatial and frequency domains,
Gaussian noise models are used frequently in practice. In fact, this tractability is so
convenient that it often results in Gaussian models being used in situations in which
they are marginally applicable at best.
The PDF of a Gaussian random variable, z, is defined by the following familiar
expression:
p(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(z - \bar{z})^2 / 2\sigma^2}, \qquad -\infty < z < \infty    (5-3)

where z represents intensity, z̄ is the mean (average) value of z, and σ is its standard
deviation. Figure 5.2(a) shows a plot of this function. We know that for a Gaussian
random variable, the probability that values of z are in the range z̄ ± σ is approxi-
mately 0.68; the probability is about 0.95 that the values of z are in the range z̄ ± 2σ.
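As an illustration of how such noise can be simulated, the following is a minimal NumPy sketch (not part of the original text; the function name and the default σ are illustrative choices) that adds zero-mean Gaussian noise with the PDF of Eq. (5-3) to an 8-bit grayscale image:

```python
import numpy as np

def add_gaussian_noise(image, mean=0.0, sigma=20.0, seed=None):
    """Add Gaussian noise with the PDF of Eq. (5-3) to an 8-bit grayscale image."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=mean, scale=sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    # Clip back to the valid 8-bit intensity range [0, 255].
    return np.clip(noisy, 0, 255).astype(np.uint8)
```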
FIGURE 5.2 Some important probability density functions: (a) Gaussian, (b) Rayleigh, (c) Erlang (gamma), (d) exponential, (e) uniform, (f) salt-and-pepper.
Rayleigh Noise
The PDF of Rayleigh noise is given by
p(z) = \begin{cases} \dfrac{2}{b}(z - a)\, e^{-(z - a)^2 / b} & z \ge a \\[4pt] 0 & z < a \end{cases}    (5-4)
The mean and variance of z when this random variable is characterized by a Ray-
leigh PDF are
\bar{z} = a + \sqrt{\pi b / 4}    (5-5)
and
\sigma^2 = \frac{b(4 - \pi)}{4}    (5-6)
Figure 5.2(b) shows a plot of the Rayleigh density. Note the displacement from the
origin, and the fact that the basic shape of the density is skewed to the right. The
Rayleigh density can be quite useful for modeling the shape of skewed histograms.
Erlang (Gamma) Noise
The PDF of Erlang noise is given by

p(z) = \begin{cases} \dfrac{a^{b} z^{\,b-1}}{(b - 1)!}\, e^{-az} & z \ge 0 \\[4pt] 0 & z < 0 \end{cases}    (5-7)

where the parameters are such that a > 0, b is a positive integer, and "!" indicates
factorial. The mean and variance of z are
\bar{z} = \frac{b}{a}    (5-8)
and
\sigma^2 = \frac{b}{a^2}    (5-9)
Figure 5.2(c) shows a plot of this density. Although Eq. (5-7) often is referred to as
the gamma density, strictly speaking this is correct only when the denominator is
the gamma function, Г(b). When the denominator is as shown, the density is more
appropriately called the Erlang density.
Exponential Noise
The PDF of exponential noise is given by
p(z) = \begin{cases} a\, e^{-az} & z \ge 0 \\ 0 & z < 0 \end{cases}    (5-10)

where a > 0.
Uniform Noise
The PDF of uniform noise is
p(z) = \begin{cases} \dfrac{1}{b - a} & a \le z \le b \\[4pt] 0 & \text{otherwise} \end{cases}    (5-13)
The mean and variance of z are
\bar{z} = \frac{a + b}{2}    (5-14)
and
\sigma^2 = \frac{(b - a)^2}{12}    (5-15)
Figure 5.2(e) shows a plot of the uniform density.
Salt-and-Pepper Noise
If k represents the number of bits used to represent the intensity values in a digital
image, then the range of possible intensity values for that image is [0, 2^k − 1] (e.g.,
[0, 255] for an 8-bit image). The PDF of salt-and-pepper noise is given by

p(z) = \begin{cases} P_s & \text{for } z = 2^k - 1 \\ P_p & \text{for } z = 0 \\ 1 - (P_s + P_p) & \text{for } z = V \end{cases}    (5-16)

where V is any integer value in the open interval (0, 2^k − 1). (When image intensities
are scaled to the range [0, 1], we replace the value of salt in this equation by 1.) We
have included 0 as an explicit value in the equation to indicate that the value of
pepper noise is assumed to be zero.
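A hedged sketch of how an image can be corrupted with this noise model follows (the function name and default probabilities are illustrative, and intensities are assumed to be 8-bit):

```python
import numpy as np

def add_salt_and_pepper(image, ps=0.05, pp=0.05, seed=None):
    """Corrupt an 8-bit image per Eq. (5-16): salt = 255, pepper = 0, and a pixel
    keeps its original value (the 'V' case) with probability 1 - (ps + pp)."""
    rng = np.random.default_rng(seed)
    noisy = image.copy()
    r = rng.random(image.shape)
    noisy[r < pp] = 0                           # pepper
    noisy[(r >= pp) & (r < pp + ps)] = 255      # salt
    return noisy
```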
As a group, the preceding PDFs provide useful tools for modeling a broad range
of noise corruption situations found in practice. For example, Gaussian noise arises
in an image due to factors such as electronic circuit noise and sensor noise caused by
poor illumination and/or high temperature. The Rayleigh density is helpful in char-
acterizing noise phenomena in range imaging. The exponential and gamma densities
find application in laser imaging. Impulse noise is found in situations where quick
transients, such as faulty switching, take place during imaging. The uniform density
is perhaps the least descriptive of practical situations. However, the uniform density
is quite useful as the basis for numerous random number generators that are used
extensively in simulations (Gonzalez, Woods, and Eddins [2009]).
Figure 5.3 shows a test pattern used for illustrating the noise models just discussed. This is a suitable pat-
tern to use because it is composed of simple, constant areas that span the gray scale from black to near
white in only three increments. This facilitates visual analysis of the characteristics of the various noise
components added to an image.
Figure 5.4 shows the test pattern after addition of the six types of noise in Fig. 5.2. Below each image
is the histogram computed directly from that image. The parameters of the noise were chosen in each
case so that the histogram corresponding to the three intensity levels in the test pattern would start to
merge. This made the noise quite visible, without obscuring the basic structure of the underlying image.
We see a close correspondence in comparing the histograms in Fig. 5.4 with the PDFs in Fig. 5.2.
The histogram for the salt-and-pepper example does not contain a specific peak for V because, as you
will recall, V is used only during the creation of the noise image to leave values in the original image
unchanged. Of course, in addition to the salt and pepper peaks, there are peaks for the other intensi-
ties in the image. With the exception of slightly different overall intensity, it is difficult to differentiate
FIGURE 5.3
Test pattern used
to illustrate the
characteristics of
the PDFs from
Fig. 5.2.
a b c
d e f
FIGURE 5.4 Images and histograms resulting from adding Gaussian, Rayleigh, and Erlang noise to the image in
Fig. 5.3.
visually between the first five images in Fig. 5.4, even though their histograms are significantly different.
The salt-and-pepper appearance of the image in Fig. 5.4(i) is the only one that is visually indicative of
the type of noise causing the degradation.
PERIODIC NOISE
Periodic noise in images typically arises from electrical or electromechanical inter-
ference during image acquisition. This is the only type of spatially dependent noise
we will consider in this chapter. As we will discuss in Section 5.4, periodic noise can
be reduced significantly via frequency domain filtering. For example, consider the
image in Fig. 5.5(a). This image is corrupted by additive (spatial) sinusoidal noise.
The Fourier transform of a pure sinusoid is a pair of conjugate impulses† located at
† Be careful not to confuse the term impulse in the frequency domain with the use of the same term in impulse
noise discussed earlier, which is in the spatial domain.
g h i
j k l
FIGURE 5.4 (continued) Images and histograms resulting from adding exponential, uniform, and salt-and-pepper noise
to the image in Fig. 5.3. In the salt-and-pepper histogram, the peaks at the origin (zero intensity) and at the far end
of the scale are shown displaced slightly so that they do not blend with the page background.
the conjugate frequencies of the sine wave (see Table 4.4). Thus, if the amplitude of
a sine wave in the spatial domain is strong enough, we would expect to see in the
spectrum of the image a pair of impulses for each sine wave in the image. As shown
in Fig. 5.5(b), this is indeed the case. Eliminating or reducing these impulses in the
frequency domain will eliminate or reduce the sinusoidal noise in the spatial domain.
We will have much more to say in Section 5.4 about this and other examples of peri-
odic noise.
a b
FIGURE 5.5
(a) Image
corrupted by
additive
sinusoidal noise.
(b) Spectrum
showing two
conjugate
impulses caused
by the sine wave.
(Original
image courtesy of
NASA.)
ESTIMATION OF NOISE PARAMETERS
The parameters of periodic noise typically are estimated by inspection of the Fourier spectrum of the image. It sometimes is possible to infer the periodicity
of noise components directly from the image, but this is possible only in simplis-
tic cases. Automated analysis is possible in situations in which the noise spikes are
either exceptionally pronounced, or when knowledge is available about the general
location of the frequency components of the interference (see Section 5.4).
The parameters of noise PDFs may be known partially from sensor specifications,
but it is often necessary to estimate them for a particular imaging arrangement. If
the imaging system is available, one simple way to study the characteristics of system
noise is to capture a set of “flat” images. For example, in the case of an optical sen-
sor, this is as simple as imaging a solid gray board that is illuminated uniformly. The
resulting images typically are good indicators of system noise.
When only images already generated by a sensor are available, it is often possible
to estimate the parameters of the PDF from small patches of reasonably constant
background intensity. For example, the vertical strips shown in Fig. 5.6 were cropped
from the Gaussian, Rayleigh, and uniform images in Fig. 5.4. The histograms shown
were calculated using image data from these small strips. The histograms in Fig. 5.4
that correspond to the histograms in Fig. 5.6 are the ones in the middle of the group
of three in Figs. 5.4(d), (e), and (k). We see that the shapes of these histograms cor-
respond quite closely to the shapes of the corresponding histograms in Fig. 5.6. Their
heights are different due to scaling, but the shapes are unmistakably similar.
The simplest use of the data from the image strips is for calculating the mean and
variance of intensity levels. Consider a strip (subimage) denoted by S, and let pS(zi),
i = 0, 1, 2, … , L − 1, denote the probability estimates (normalized histogram values)
of the intensities of the pixels in S, where L is the number of possible intensities in
the entire image (e.g., 256 for an 8-bit image). As in Eqs. (2-69) and (2-70), we esti-
mate the mean and variance of the pixel values in S as follows:

\bar{z} = \sum_{i=0}^{L-1} z_i\, p_S(z_i)    (5-19)
and

\sigma^2 = \sum_{i=0}^{L-1} (z_i - \bar{z})^2\, p_S(z_i)    (5-20)
FIGURE 5.6 Histograms computed using small strips (shown as inserts) from (a) the Gaussian, (b) the Rayleigh, and
(c) the uniform noisy images in Fig. 5.4.
The shape of the histogram identifies the closest PDF match. If the shape is approxi-
mately Gaussian, then the mean and variance are all we need because the Gaussian
PDF is specified completely by these two parameters. For the other shapes discussed
earlier, we use the mean and variance to solve for the parameters a and b. Impulse
noise is handled differently because the estimate needed is of the actual probability
of occurrence of white and black pixels. Obtaining this estimate requires that both
black and white pixels be visible, so a mid-gray, relatively constant area is needed in
the image in order to be able to compute a meaningful histogram of the noise. The
heights of the peaks corresponding to black and white pixels are the estimates of Pp
and Ps in Eq. (5-16).
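The estimation procedure just described can be sketched in a few lines of NumPy. The helper names are mine, and the parameter formulas simply invert Eqs. (5-8)/(5-9) and (5-14)/(5-15); treat this as an illustration rather than the text's implementation:

```python
import numpy as np

def estimate_noise_stats(strip):
    """Estimate the mean and variance of Eqs. (5-19) and (5-20) from a flat image strip."""
    z = strip.astype(np.float64).ravel()
    z_bar = z.mean()
    variance = ((z - z_bar) ** 2).mean()
    return z_bar, variance

def uniform_params(z_bar, variance):
    """Invert Eqs. (5-14) and (5-15): a = z_bar - sqrt(3*var), b = z_bar + sqrt(3*var)."""
    half_width = np.sqrt(3.0 * variance)
    return z_bar - half_width, z_bar + half_width

def erlang_params(z_bar, variance):
    """Invert Eqs. (5-8) and (5-9): a = z_bar / var, b = z_bar**2 / var."""
    return z_bar / variance, z_bar ** 2 / variance
```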
5.3 Restoration in the Presence of Noise Only—Spatial Filtering
When the only degradation present in an image is noise, Eqs. (5-1) and (5-2) become

g(x, y) = f(x, y) + \eta(x, y)

and

G(u, v) = F(u, v) + N(u, v)
The noise terms generally are unknown, so subtracting them from g(x, y) [G(u, v)]
to obtain f (x, y) [F(u, v)] typically is not an option. In the case of periodic noise,
sometimes it is possible to estimate N(u, v) from the spectrum of G(u, v), as noted
in Section 5.2. In this case N(u, v) can be subtracted from G(u, v) to obtain an esti-
mate of the original image, but this type of knowledge is the exception, rather than
the rule.
Spatial filtering is the method of choice for estimating f (x, y) [i.e., denoising
image g(x, y)] in situations when only additive random noise is present. Spatial fil-
tering was discussed in detail in Chapter 3. With the exception of the nature of the
computation performed by a specific filter, the mechanics for implementing all the
filters that follow are exactly as discussed in Sections 3.4 through 3.7.
MEAN FILTERS
In this section, we discuss briefly the noise-reduction capabilities of the spatial filters
introduced in Section 3.5 and develop several other filters whose performance is in
many cases superior to the filters discussed in that section.
Arithmetic Mean Filter
Let Sxy represent the set of coordinates in a rectangular subimage window (neighborhood) of size m × n, centered on point (x, y). The arithmetic mean filter computes the average value of the corrupted image g(x, y) in the area defined by Sxy:

\hat{f}(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} g(r, c)    (5-23)

where, as in Eq. (2-43), r and c are the row and column coordinates of the pixels
contained in the neighborhood Sxy. This operation can be implemented using a spa-
tial kernel of size m × n in which all coefficients have value 1/mn. A mean filter
smooths local variations in an image, and noise is reduced as a result of blurring.
Geometric Mean Filter
An image restored using a geometric mean filter is given by the expression

\hat{f}(x, y) = \left[ \prod_{(r, c) \in S_{xy}} g(r, c) \right]^{\frac{1}{mn}}    (5-24)

where \prod indicates multiplication. Here, each restored pixel is given by the product of
all the pixels in the subimage area, raised to the power 1/mn. As Example 5.2 below
illustrates, a geometric mean filter achieves smoothing comparable to an arithmetic
mean filter, but it tends to lose less image detail in the process.
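A minimal sketch of both filters in Python, assuming SciPy is available (the log-domain trick for the geometric mean and the small epsilon are implementation choices, not from the text):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def arithmetic_mean_filter(g, size=3):
    """Eq. (5-23): local average of g over a size x size neighborhood."""
    return uniform_filter(g.astype(np.float64), size=size, mode='reflect')

def geometric_mean_filter(g, size=3, eps=1e-6):
    """Eq. (5-24): product of the window pixels raised to 1/(mn), computed as
    exp(mean(log(g))) for numerical stability; eps guards against log(0)."""
    log_g = np.log(g.astype(np.float64) + eps)
    return np.exp(uniform_filter(log_g, size=size, mode='reflect')) - eps
```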
Harmonic Mean Filter
The harmonic mean filtering operation is given by the expression

\hat{f}(x, y) = \frac{mn}{\displaystyle\sum_{(r, c) \in S_{xy}} \frac{1}{g(r, c)}}    (5-25)
The harmonic mean filter works well for salt noise, but fails for pepper noise. It does
well also with other types of noise like Gaussian noise.
Contraharmonic Mean Filter
The contraharmonic mean filter yields a restored image based on the expression

\hat{f}(x, y) = \frac{\displaystyle\sum_{(r, c) \in S_{xy}} g(r, c)^{Q+1}}{\displaystyle\sum_{(r, c) \in S_{xy}} g(r, c)^{Q}}    (5-26)
where Q is called the order of the filter. This filter is well suited for reducing or vir-
tually eliminating the effects of salt-and-pepper noise. For positive values of Q, the
filter eliminates pepper noise. For negative values of Q, it eliminates salt noise. It
cannot do both simultaneously. Note that the contraharmonic filter reduces to the
arithmetic mean filter if Q = 0, and to the harmonic mean filter if Q = −1.
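A sketch of the contraharmonic filter under the same assumptions (SciPy available; the epsilon avoids raising zero to a negative power). Setting Q = 0 or Q = −1 reproduces the arithmetic and harmonic mean filters, respectively:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def contraharmonic_mean_filter(g, size=3, Q=1.5, eps=1e-6):
    """Eq. (5-26): Q > 0 reduces pepper noise, Q < 0 reduces salt noise."""
    g = g.astype(np.float64) + eps
    num = uniform_filter(g ** (Q + 1), size=size, mode='reflect')
    den = uniform_filter(g ** Q, size=size, mode='reflect')
    return num / den
```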
Figure 5.7(a) shows an 8-bit X-ray image of a circuit board, and Fig. 5.7(b) shows the same image, but
corrupted with additive Gaussian noise of zero mean and variance of 400. For this type of image, this is
a significant level of noise. Figures 5.7(c) and (d) show, respectively, the result of filtering the noisy image
with an arithmetic mean filter of size 3 × 3 and a geometric mean filter of the same size. Although both
filters did a reasonable job of attenuating the contribution due to noise, the geometric mean filter did
not blur the image as much as the arithmetic filter. For instance, the connector fingers at the top of the
image are sharper in Fig. 5.7(d) than in (c). The same is true in other parts of the image.
Figure 5.8(a) shows the same circuit image, but corrupted now by pepper noise with probability of
0.1. Similarly, Fig. 5.8(b) shows the image corrupted by salt noise with the same probability. Figure 5.8(c)
shows the result of filtering Fig. 5.8(a) using a contraharmonic mean filter with Q = 1.5, and Fig. 5.8(d)
shows the result of filtering Fig. 5.8(b) with Q = −1.5. Both filters did a good job of reducing the effect of
the noise. The positive-order filter did a better job of cleaning the background, at the expense of slightly
thinning and blurring the dark areas. The opposite was true of the negative order filter.
In general, the arithmetic and geometric mean filters (particularly the latter) are well suited for ran-
dom noise like Gaussian or uniform noise. The contraharmonic filter is well suited for impulse noise, but
it has the disadvantage that it must be known whether the noise is dark or light in order to select the
proper sign for Q. The results of choosing the wrong sign for Q can be disastrous, as Fig. 5.9 shows. Some
of the filters discussed in the following sections eliminate this shortcoming.
c d
FIGURE 5.7
(a) X-ray image
of circuit board.
(b) Image
corrupted by
additive Gaussian
noise. (c) Result
of filtering with
an arithmetic
mean filter of size
3 × 3. (d) Result
of filtering with a
geometric mean
filter of the same
size. (Original
image courtesy of
Mr. Joseph E.
Pascente, Lixi,
Inc.)
ORDER-STATISTIC FILTERS
We introduced order-statistic filters in Section 3.6. We now expand the discussion
in that section and introduce some additional order-statistic filters. As noted in Sec-
tion 3.6, order-statistic filters are spatial filters whose response is based on ordering
(ranking) the values of the pixels contained in the neighborhood encompassed by
the filter. The ranking result determines the response of the filter.
Median Filter
The best-known order-statistic filter in image processing is the median filter, which,
as its name implies, replaces the value of a pixel by the median of the intensity levels
in a predefined neighborhood of that pixel:
where, as before, Sxy is a subimage (neighborhood) centered on point (x, y). The val-
ue of the pixel at (x, y) is included in the computation of the median. Median filters
c d
FIGURE 5.8
(a) Image
corrupted by
pepper noise with
a probability of
0.1. (b) Image
corrupted by salt
noise with the
same
probability.
(c) Result of
filtering (a) with
a 3 × 3 contra-
harmonic filter
with Q = 1.5. (d) Result
of filtering (b)
with Q = −1.5.
a b
FIGURE 5.9
Results of
selecting the
wrong sign in
contraharmonic
filtering.
(a) Result of
filtering Fig. 5.8(a)
with a
contraharmonic
filter of size 3 × 3
and Q = −1.5.
(b) Result of
filtering Fig. 5.8(b)
using Q = 1.5.
are quite popular because, for certain types of random noise, they provide excellent
noise-reduction capabilities, with considerably less blurring than linear smoothing
filters of similar size. Median filters are particularly effective in the presence of both
bipolar and unipolar impulse noise, as Example 5.3 below shows. Computation of
the median and implementation of this filter are discussed in Section 3.6.
Max and Min Filters
The median represents the 50th percentile of a ranked set of numbers, but ranking lends itself to many other possibilities. For example, using the 100th percentile results in the so-called max filter, given by

\hat{f}(x, y) = \max_{(r, c) \in S_{xy}} \{ g(r, c) \}    (5-28)
This filter is useful for finding the brightest points in an image or for eroding dark
regions adjacent to bright areas. Also, because pepper noise has very low values, it
is reduced by this filter as a result of the max selection process in the subimage area
Sxy .
The 0th percentile filter is the min filter:
\hat{f}(x, y) = \min_{(r, c) \in S_{xy}} \{ g(r, c) \}    (5-29)
This filter is useful for finding the darkest points in an image or for eroding light
regions adjacent to dark areas. Also, it reduces salt noise as a result of the min opera-
tion.
Midpoint Filter
The midpoint filter computes the midpoint between the maximum and minimum
values in the area encompassed by the filter:
\hat{f}(x, y) = \frac{1}{2} \left[ \max_{(r, c) \in S_{xy}} \{ g(r, c) \} + \min_{(r, c) \in S_{xy}} \{ g(r, c) \} \right]    (5-30)
Note that this filter combines order statistics and averaging. It works best for ran-
domly distributed noise, like Gaussian or uniform noise.
Alpha-Trimmed Mean Filter
Suppose that we delete the d/2 lowest and the d/2 highest intensity values of g(r, c) in the neighborhood Sxy, and let gR(r, c) represent the remaining mn − d pixels. A filter formed by averaging these remaining pixels is called an alpha-trimmed mean filter:

\hat{f}(x, y) = \frac{1}{mn - d} \sum_{(r, c) \in S_{xy}} g_R(r, c)    (5-31)

where the value of d can range from 0 to mn − 1. When d = 0 the alpha-trimmed fil-
ter reduces to the arithmetic mean filter discussed earlier. If we choose d = mn − 1,
the filter becomes a median filter. For other values of d, the alpha-trimmed filter is
useful in situations involving multiple types of noise, such as a combination of salt-
and-pepper and Gaussian noise.
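The order-statistic filters discussed above can be sketched as follows, assuming SciPy is available (median, max, and min filters map directly onto scipy.ndimage calls; the midpoint and alpha-trimmed wrappers and their names are mine):

```python
import numpy as np
from scipy.ndimage import median_filter, maximum_filter, minimum_filter, generic_filter

def midpoint_filter(g, size=3):
    """Eq. (5-30): average of the max and min in each neighborhood."""
    g = g.astype(np.float64)
    return 0.5 * (maximum_filter(g, size=size) + minimum_filter(g, size=size))

def alpha_trimmed_mean_filter(g, size=3, d=2):
    """Eq. (5-31): drop the d/2 lowest and d/2 highest values in each window
    (d even, 0 <= d <= mn - 1) and average the remaining mn - d pixels."""
    def trimmed_mean(window):
        w = np.sort(window)
        return w[d // 2 : len(w) - d // 2].mean()
    return generic_filter(g.astype(np.float64), trimmed_mean, size=size)

# Eqs. (5-27) to (5-29) map directly to library calls, for example:
# median_filter(g, size=3); maximum_filter(g, size=3); minimum_filter(g, size=3)
```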
Figure 5.10(a) shows the circuit board image corrupted by salt-and-pepper noise with probabilities
Ps = Pp = 0.1. Figure 5.10(b) shows the result of median filtering with a filter of size 3 × 3. The improve-
ment over Fig. 5.10(a) is significant, but several noise points still are visible. A second pass [on the im-
age in Fig. 5.10(b)] with the median filter removed most of these points, leaving only few, barely visible
noise points. These were removed with a third pass of the filter. These results are good examples of the
power of median filtering in handling impulse-like additive noise. Keep in mind that repeated passes
of a median filter will blur the image, so it is desirable to keep the number of passes as low as possible.
Figure 5.11(a) shows the result of applying the max filter to the pepper noise image of Fig. 5.8(a). The
filter did a reasonable job of removing the pepper noise, but we note that it also removed (set to a light
intensity level) some dark pixels from the borders of the dark objects. Figure 5.11(b) shows the result
of applying the min filter to the image in Fig. 5.8(b). In this case, the min filter did a better job than the
max filter on noise removal, but it removed some white points around the border of light objects. These
made the light objects smaller and some of the dark objects larger (like the connector fingers in the top
of the image) because white points around these objects were set to a dark level.
The alpha-trimmed filter is illustrated next. Figure 5.12(a) shows the circuit board image corrupted
this time by additive, uniform noise of variance 800 and zero mean. This is a high level of noise corrup-
tion that is made worse by further addition of salt-and-pepper noise with Ps = Pp = 0.1, as Fig. 5.12(b)
shows. The high level of noise in this image warrants use of larger filters. Figures 5.12(c) through (f) show
the results, respectively, obtained using arithmetic mean, geometric mean, median, and alpha-trimmed
mean (with d = 6) filters of size 5 × 5. As expected, the arithmetic and geometric mean filters (especially
the latter) did not do well because of the presence of impulse noise. The median and alpha-trimmed
filters performed much better, with the alpha-trimmed filter giving slightly better noise reduction. For
example, note in Fig. 5.12(f) that the fourth connector finger from the top left is slightly smoother in
the alpha-trimmed result. This is not unexpected because, for a high value of d, the alpha-trimmed filter
approaches the performance of the median filter, but still retains some smoothing capabilities.
ADAPTIVE FILTERS
Once selected, the filters discussed thus far are applied to an image without regard
for how image characteristics vary from one point to another. In this section, we
take a look at two adaptive filters whose behavior changes based on statistical char-
acteristics of the image inside the filter region defined by the m × n rectangular
neighborhood Sxy . As the following discussion shows, adaptive filters are capable
of performance superior to that of the filters discussed thus far. The price paid for improved filtering power is an increase in filter complexity.
c d
FIGURE 5.10
(a) Image
corrupted by salt-
and- pepper noise
with probabilities
Ps = Pp = 0.1.
(b) Result of one
pass with a medi-
an filter of size
3 × 3. (c) Result
of processing (b)
with this filter.
(d) Result of
processing (c)
with the same
filter.
a b
FIGURE 5.11
(a) Result of
filtering Fig. 5.8(a)
with a max filter
of size 3 × 3.
(b) Result of
filtering Fig. 5.8(b)
with a min filter of
the same size.
a b
c d
e f
FIGURE 5.12
(a) Image
corrupted by
additive uniform
noise. (b) Image
additionally
corrupted by
additive salt-and-
pepper noise.
(c)-(f) Image (b)
filtered with a
5 × 5:
(c) arithmetic
mean filter;
(d) geometric
mean filter;
(e) median filter;
(f) alpha-trimmed
mean filter, with
d = 6.
Adaptive, Local Noise Reduction Filter
An adaptive expression for obtaining f̂(x, y), based on the local mean z̄_Sxy and local variance σ_Sxy² of the pixels in neighborhood Sxy, is

\hat{f}(x, y) = g(x, y) - \frac{\sigma_\eta^2}{\sigma_{S_{xy}}^2}\left[ g(x, y) - \bar{z}_{S_{xy}} \right]    (5-32)

The only quantity that needs to be known a priori is σ_η², the variance of the noise
corrupting image f(x, y). This is a constant that can be estimated from sample noisy
images using Eq. (3-26). The other parameters are computed from the pixels in
neighborhood Sxy using Eqs. (3-27) and (3-28).
An assumption in Eq. (5-32) is that the ratio of the two variances does not exceed 1,
which implies that σ_η² ≤ σ_Sxy². The noise in our model is additive and position indepen-
dent, so this is a reasonable assumption to make because Sxy is a subset of g(x, y).
However, we seldom have exact knowledge of σ_η². Therefore, it is possible for this
condition to be violated in practice. For that reason, a test should be built into an
implementation of Eq. (5-32) so that the ratio is set to 1 if the condition σ_η² > σ_Sxy²
occurs. This makes this filter nonlinear. However, it prevents nonsensical results (i.e.,
negative intensity levels, depending on the value of z̄_Sxy) due to a potential lack of
knowledge about the variance of the image noise. Another approach is to allow the
negative values to occur, and then rescale the intensity values at the end. The result
then would be a loss of dynamic range in the image.
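A minimal sketch of Eq. (5-32), including the clamping of the variance ratio described above (SciPy assumed; noise_var is the estimate of σ_η²):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_local_noise_filter(g, noise_var, size=7):
    """Adaptive, local noise-reduction filter of Eq. (5-32)."""
    g = g.astype(np.float64)
    local_mean = uniform_filter(g, size=size, mode='reflect')
    local_var = uniform_filter(g * g, size=size, mode='reflect') - local_mean ** 2
    ratio = noise_var / np.maximum(local_var, 1e-12)
    ratio = np.minimum(ratio, 1.0)   # set the ratio to 1 when noise_var exceeds the local variance
    return g - ratio * (g - local_mean)
```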
c d
FIGURE 5.13
(a) Image
corrupted by
additive
Gaussian noise of
zero mean and a
variance of 1000.
(b) Result of
arithmetic mean
filtering.
(c) Result of
geometric mean
filtering.
(d) Result of
adaptive noise-
reduction filtering.
All filters used
were of size 7 × 7.
The preceding results used a value for σ_η² that matched the variance of the noise exactly. If this
quantity is not known, and the estimate used is too low, the algorithm will return an image that closely
resembles the original because the corrections will be smaller than they should be. Estimates that are
too high will cause the ratio of the variances to be clipped at 1.0, and the algorithm will subtract the
mean from the image more frequently than it would normally. If negative values are allowed and the
image is rescaled at the end, the result will be a loss of dynamic range, as mentioned previously.
The adaptive median-filtering algorithm uses two processing levels, denoted level A
and level B, at each point (x, y). Let zmin, zmax, and zmed denote the minimum, maximum, and median of the intensity values in Sxy, let zxy be the intensity at coordinates (x, y), and let Smax be the maximum allowed size of Sxy:

Level A:  If zmin < zmed < zmax, go to level B
          Else, increase the size of Sxy
          If the size of Sxy ≤ Smax, repeat level A
          Else, output zmed

Level B:  If zmin < zxy < zmax, output zxy
          Else, output zmed

where the sizes of Sxy and Smax are odd, positive integers greater than 1. Another option in the
last step of level A is to output zxy instead of zmed. This produces a slightly less
blurred result, but can fail to detect salt (pepper) noise embedded in a constant
background having the same value as pepper (salt) noise.
This algorithm has three principal objectives: to remove salt-and-pepper (impulse)
noise, to provide smoothing of other noise that may not be impulsive, and to reduce
distortion, such as excessive thinning or thickening of object boundaries. The values
zmin and zmax are considered statistically by the algorithm to be “impulse-like” noise
components in region Sxy , even if these are not the lowest and highest possible pixel
values in the image.
With these observations in mind, we see that the purpose of level A is to deter-
mine if the median filter output, zmed , is an impulse (salt or pepper) or not. If the
condition zmin < zmed < zmax holds, then zmed cannot be an impulse for the reason
mentioned in the previous paragraph. In this case, we go to level B and test to see
if the point in the center of the neighborhood is itself an impulse (recall that (x, y)
is the location of the point being processed, and zxy is its intensity). If the condition
zmin < zxy < zmax is true, then the pixel at zxy cannot be the intensity of an impulse for
the same reason that zmed was not. In this case, the algorithm outputs the unchanged
pixel value, zxy . By not changing these "intermediate-level" points, distortion is
reduced in the filtered image. If the condition zmin < zxy < zmax is false, then either
zxy = zmin or zxy = zmax . In either case, the value of the pixel is an extreme value and
the algorithm outputs the median value, zmed , which we know from level A is not a
noise impulse. The last step is what the standard median filter does. The problem is
that the standard median filter replaces every point in the image by the median of
the corresponding neighborhood. This causes unnecessary loss of detail.
Continuing with the explanation, suppose that level A does find an impulse (i.e.,
it fails the test that would cause it to branch to level B). The algorithm then increas-
es the size of the neighborhood and repeats level A. This looping continues until
the algorithm either finds a median value that is not an impulse (and branches to
stage B), or the maximum neighborhood size is reached. If the maximum size is
reached, the algorithm returns the value of zmed. Note that there is no guarantee
that this value is not an impulse. The smaller the noise probabilities Ps and/or Pp are,
or the larger Smax is allowed to be, the less likely it is that a premature exit will occur.
This is plausible. As the density of the noise impulses increases, it stands to reason
that we would need a larger window to “clean up” the noise spikes.
Every time the algorithm outputs a value, the center of neighborhood Sxy is
moved to the next location in the image. The algorithm then is reinitialized and
applied to the pixels in the new region encompassed by the neighborhood. As indi-
cated in Problem 3.37, the median value can be updated iteratively from one loca-
tion to the next, thus reducing computational load.
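The levels A and B described above translate directly into code. The following is an unoptimized sketch (pixel-by-pixel loops; reflect padding and the starting window size of 3 are my choices, not from the text):

```python
import numpy as np

def adaptive_median_filter(g, s_max=7):
    """Adaptive median filtering (levels A and B) with maximum window size s_max."""
    g = np.asarray(g)
    pad = s_max // 2
    padded = np.pad(g, pad, mode='reflect')
    out = g.astype(np.float64)
    rows, cols = g.shape
    for x in range(rows):
        for y in range(cols):
            size = 3
            while True:
                half = size // 2
                window = padded[x + pad - half : x + pad + half + 1,
                                y + pad - half : y + pad + half + 1]
                z_min, z_max = window.min(), window.max()
                z_med = np.median(window)
                z_xy = g[x, y]
                if z_min < z_med < z_max:                               # level A passed
                    out[x, y] = z_xy if z_min < z_xy < z_max else z_med  # level B
                    break
                size += 2                                                # grow the window
                if size > s_max:
                    out[x, y] = z_med                                    # maximum size reached
                    break
    return out
```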
Figure 5.14(a) shows the circuit-board image corrupted by salt-and-pepper noise with probabilities
Ps = Pp = 0.25, which is 2.5 times the noise level used in Fig. 5.10(a). Here the noise level is high enough
to obscure most of the detail in the image. As a basis for comparison, the image was filtered first using a
7 × 7 median filter, the smallest filter required to remove most visible traces of impulse noise in this case.
Figure 5.14(b) shows the result. Although the noise was effectively removed, the filter caused significant
FIGURE 5.14 (a) Image corrupted by salt-and-pepper noise with probabilities Ps = Pp = 0.25. (b) Result of filtering
with a 7 × 7 median filter. (c) Result of adaptive median filtering with Smax = 7.
loss of detail in the image. For instance, some of the connector fingers at the top of the image appear
distorted or broken. Other image details are similarly distorted.
Figure 5.14(c) shows the result of using the adaptive median filter with Smax = 7. Noise removal
performance was similar to the median filter. However, the adaptive filter did a much better job of pre-
serving sharpness and detail. The connector fingers are less distorted, and some other features that were
either obscured or distorted beyond recognition by the median filter appear sharper and better defined
in Fig. 5.14(c). Two notable examples are the feed-through small white holes throughout the board, and
the dark component with eight legs in the bottom, left quadrant of the image.
Considering the high level of noise in Fig. 5.14(a), the adaptive algorithm performed quite well. The
choice of maximum allowed size for Sxy depends on the application, but a reasonable starting value can
be estimated by experimenting with various sizes of the standard median filter first. This will establish a
visual baseline regarding expectations on the performance of the adaptive algorithm.
5.4 Periodic Noise Reduction Using Frequency Domain Filtering
NOTCH REJECT FILTERS
A notch reject filter transfer function is constructed as a product of highpass filter transfer functions whose centers have been translated to the centers of the notches:

H_{NR}(u, v) = \prod_{k=1}^{Q} H_k(u, v)\, H_{-k}(u, v)    (5-33)

where Q is the number of notch pairs, and H_k(u, v) and H_{-k}(u, v) are highpass filter transfer functions whose centers
are at (u_k, v_k) and (−u_k, −v_k), respectively.† These centers are specified with respect
to the center of the frequency rectangle, (⌊M/2⌋, ⌊N/2⌋), where, as usual,
M and N are the number of rows and columns in the input image. Thus, the distance
computations for the filter transfer functions are given by
D_k(u, v) = \left[ (u - M/2 - u_k)^2 + (v - N/2 - v_k)^2 \right]^{1/2}    (5-34)

and

D_{-k}(u, v) = \left[ (u - M/2 + u_k)^2 + (v - N/2 + v_k)^2 \right]^{1/2}    (5-35)
For example, the following is a Butterworth notch reject filter transfer function of
order n with three notch pairs:
H_{NR}(u, v) = \prod_{k=1}^{3} \left[ \frac{1}{1 + \left[ D_{0k}/D_k(u, v) \right]^{n}} \right] \left[ \frac{1}{1 + \left[ D_{0k}/D_{-k}(u, v) \right]^{n}} \right]    (5-36)
Because notches are specified as symmetric pairs, the constant D0k is the same for
each pair. However, this constant can be different from one pair to another. Other
notch reject filter functions are constructed in the same manner, depending on the
highpass filter function chosen. As explained in Section 4.10, a notch pass filter
transfer function is obtained from a notch reject function using the expression

H_{NP}(u, v) = 1 - H_{NR}(u, v)

where H_NP(u, v) is the transfer function of the notch pass filter corresponding to
the notch reject filter with transfer function HNR (u, v). Figure 5.15 shows perspec-
tive plots of the transfer functions of ideal, Gaussian, and Butterworth notch reject
filters with one notch pair. As we discussed in Chapter 4, we see again that the shape
of the Butterworth transfer function represents a transition between the sharpness
of the ideal function and the broad, smooth shape of the Gaussian transfer function.
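A sketch of a Butterworth notch reject transfer function and its application, following the convention of Eqs. (5-33)–(5-36) (the centers are given as offsets (u_k, v_k) from the middle of the frequency rectangle; the function names and the use of unpadded FFTs are my choices, the latter mirroring the simplification noted in the footnote):

```python
import numpy as np

def butterworth_notch_reject(shape, centers, d0, n=2):
    """Product of Butterworth highpass pairs centered at (u_k, v_k) and (-u_k, -v_k), Eq. (5-36)."""
    M, N = shape
    u = np.arange(M).reshape(-1, 1) - M // 2
    v = np.arange(N).reshape(1, -1) - N // 2
    H = np.ones(shape, dtype=np.float64)
    for uk, vk in centers:
        Dk = np.sqrt((u - uk) ** 2 + (v - vk) ** 2)      # Eq. (5-34)
        D_k = np.sqrt((u + uk) ** 2 + (v + vk) ** 2)     # Eq. (5-35)
        H *= 1.0 / (1.0 + (d0 / np.maximum(Dk, 1e-8)) ** n)
        H *= 1.0 / (1.0 + (d0 / np.maximum(D_k, 1e-8)) ** n)
    return H

def apply_frequency_filter(g, H):
    """Filter image g with a centered transfer function H."""
    G = np.fft.fftshift(np.fft.fft2(g))
    return np.real(np.fft.ifft2(np.fft.ifftshift(G * H)))
```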
As we show in the second part of the following example, we are not limited to
notch filter transfer functions of the form just discussed. We can construct notch
filter functions of arbitrary shape, provided they are symmetric about the center of the frequency rectangle.
† Remember, frequency domain transfer functions are symmetric about the center of the frequency rectangle, so
the notches are specified as symmetric pairs. Also, recall from Section 4.10 that we use unpadded images when
working with notch filters in order to simplify the specification of notch locations.
FIGURE 5.15 Perspective plots of (a) ideal, (b) Gaussian, and (c) Butterworth notch reject filter transfer functions.
Figure 5.16(a) is the same as Fig. 2.45(a), which we used in Section 2.6 to introduce the concept of filter-
ing in the frequency domain. We now look in more detail at the process of denoising this image, which is
corrupted by a single, 2-D additive sine wave. You know from Table 4.4 that the Fourier transform of a
pure sine wave is a pair of complex, conjugate impulses, so we would expect the spectrum to have a pair
of bright dots at the frequencies of the sine wave. As Fig. 5.16(b) shows, this is indeed the case. Because
we can determine the location of these impulses accurately, eliminating them is a simple task, consisting
of using a notch filter transfer function whose notches coincide with the location of the impulses.
Figure 5.16(c) shows an ideal notch reject filter transfer function, which is an array of 1's (shown in
white) and two small circular regions of 0's (shown in black). Figure 5.16(d) shows the result of filtering
the noisy image with this transfer function. The sinusoidal noise was virtually eliminated, and a number of
details that were previously obscured by the interference are clearly visible in the filtered image (see, for
example, the thin fiducial marks and the fine detail in the terrain and rock formations). As we showed
in Example 4.25, obtaining an image of the interference pattern is straightforward. We simply turn the
reject filter into a pass filter by subtracting it from 1, and filter the input image with it. Figure 5.17 shows
the result.
Figure 5.18(a) shows the same image as Fig. 4.50(a), but covering a larger area (the interference
pattern is the same). When we discussed lowpass filtering of that image in Chapter 4, we indicated that
there were better ways to reduce the effect of the scan lines. The notch filtering approach that follows
reduces the scan lines significantly, without introducing blurring. Unless blurring is desirable for reasons
we discussed in Section 4.9, notch filtering generally gives much better results.
Just by looking at the nearly horizontal lines of the noise pattern in Fig. 5.18(a), we expect its con-
tribution in the frequency domain to be concentrated along the vertical axis of the DFT. However,
the noise is not dominant enough to have a clear pattern along this axis, as is evident in the spectrum
shown in Fig. 5.18(b). The approach to follow in cases like this is to use a narrow, rectangular notch filter
function that extends along the vertical axis, and thus eliminates all components of the interference
along that axis. We do not filter near the origin to avoid eliminating the dc term and low frequencies,
c d
FIGURE 5.16
(a) Image cor-
rupted by sinusoi-
dal interference.
(b) Spectrum
showing the
bursts of energy
caused by the
interference. (The
bursts were
enlarged for
display purposes.)
(c) Notch filter
(the radius of the
circles is 2 pixels)
used to eliminate
the energy bursts.
(The thin borders
are not part of the
data.)
(d) Result of
notch reject
filtering.
(Original
image courtesy of
NASA.)
which, as you know from Chapter 4, are responsible for the intensity differences between smooth areas.
Figure 5.18(c) shows the filter transfer function we used, and Fig. 5.18(d) shows the filtered result. Most
of the fine scan lines were eliminated or significantly attenuated. In order to get an image of the noise
pattern, we proceed as before by converting the reject filter into a pass filter, and then filtering the input
image with it. Figure 5.19 shows the result.
FIGURE 5.17
Sinusoidal
pattern extracted
from the DFT
of Fig. 5.16(a)
using a notch pass
filter.
c d
FIGURE 5.18
(a) Satellite image
of Florida and the
Gulf of Mexico.
(Note horizontal
sensor scan lines.)
(b) Spectrum of
(a). (c) Notch
reject filter
transfer
function. (The
thin black border
is not part of the
data.) (d) Filtered
image. (Original
image courtesy of
NOAA.)
FIGURE 5.19
Noise pattern
extracted from
Fig. 5.18(a) by
notch pass
filtering.
OPTIMUM NOTCH FILTERING
The first step in this approach is to extract the principal frequency components of the interference pattern. This can be done by placing a notch pass filter transfer function, H_NP(u, v), at the location of each spike. The corresponding interference pattern in the spatial domain is then the inverse Fourier transform

h(x, y) = \mathcal{F}^{-1}\{ H_{NP}(u, v)\, G(u, v) \}    (5-39)
Because the corrupted image is assumed to be formed by the addition of the uncor-
rupted image f (x, y) and the interference, h(x, y), if the latter were known com-
pletely, subtracting the pattern from g(x, y) to obtain f (x, y) would be a simple mat-
ter. The problem, of course, is that this filtering procedure usually yields only an
approximation of the true noise pattern. The effect of components not present in
the estimate of h(x, y) can be minimized by subtracting from g(x, y) a weighted
portion of h(x, y) to obtain an estimate of f(x, y):

\hat{f}(x, y) = g(x, y) - w(x, y)\, h(x, y)    (5-40)

where w(x, y) is a weighting (or modulation) function to be determined. One approach
is to select w(x, y) so that the variance of f̂(x, y) is minimized over a neighborhood
Sxy of size m × n centered on (x, y). The local variance of f̂(x, y) at point (x, y) can
be estimated from the samples as

\sigma^2(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} \left[ \hat{f}(r, c) - \bar{\hat{f}}\, \right]^2    (5-41)

where \bar{\hat{f}} is the average value of f̂ in the neighborhood Sxy:

\bar{\hat{f}} = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} \hat{f}(r, c)    (5-42)
Points on or near the edge of the image can be treated by considering partial neigh-
borhoods or by padding the border with 0's.
Substituting Eq. (5-40) into Eq. (5-41) we obtain
\sigma^2(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} \Bigl\{ \left[ g(r, c) - w(r, c)\, h(r, c) \right] - \left[ \bar{g} - \overline{wh} \right] \Bigr\}^2    (5-43)
where g and wh denote the average values of g and of the product wh in neighbor-
hood Sxy , respectively.
If we assume that w is approximately constant in Sxy we can replace w(r, c) by
the value of w at the center of the neighborhood:

w(r, c) \approx w(x, y)    (5-44)

for (r, c) ∈ Sxy. Because w is assumed to be approximately constant in the neighborhood, it also follows that

\overline{wh} = w(x, y)\, \bar{h}    (5-45)

in Sxy, where \bar{h} is the average value of h in the neighborhood. Using these approxi-
mations, Eq. (5-43) becomes
\sigma^2(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} \Bigl\{ \left[ g(r, c) - w(x, y)\, h(r, c) \right] - \left[ \bar{g} - w(x, y)\, \bar{h} \right] \Bigr\}^2    (5-46)
To minimize σ²(x, y) with respect to w(x, y) we solve

\frac{\partial \sigma^2(x, y)}{\partial w(x, y)} = 0    (5-47)

for w(x, y). The result is

w(x, y) = \frac{\overline{g h} - \bar{g}\, \bar{h}}{\overline{h^2} - \bar{h}^2}    (5-48)
To obtain the value of the restored image at point (x, y) we use this equation to com-
pute w(x, y) and then substitute it into Eq. (5-40). To obtain the complete restored
image, we perform this procedure at every point in the noisy image, g.
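A compact sketch of this optimum notch procedure, with the local averages of Eq. (5-48) computed by box filtering (SciPy assumed; the neighborhood size is an illustrative choice). Here h is the noise pattern obtained from Eq. (5-39):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def optimum_notch_restore(g, h, size=15):
    """Compute w(x, y) from Eq. (5-48) and return f_hat = g - w*h from Eq. (5-40)."""
    g = g.astype(np.float64)
    h = h.astype(np.float64)
    mean = lambda a: uniform_filter(a, size=size, mode='reflect')
    num = mean(g * h) - mean(g) * mean(h)
    den = mean(h * h) - mean(h) ** 2
    w = num / np.maximum(den, 1e-12)
    return g - w * h
```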
Figure 5.20(a) shows a digital image of the Martian terrain taken by the Mariner 6 spacecraft. The image
is corrupted by a semi-periodic interference pattern that is considerably more complex (and much more
subtle) than those we have studied thus far. The Fourier spectrum of the image, shown in Fig. 5.20(b),
has a number of “starlike” bursts of energy caused by the interference. As expected, these components
are more difficult to detect than those we have seen before. Figure 5.21 shows the spectrum again, but
without centering. This image offers a somewhat clearer view of the interference components because
the more prominent dc term and low frequencies are "out of the way," in the top left of the spectrum.
Figure 5.22(a) shows the spectrum components that, in the judgement of an experienced image ana-
lyst, are associated with the interference. Applying a notch pass filter to these components and using
Eq. (5-39) yielded the spatial noise pattern, h(x, y), shown in Fig. 5.22(b). Note the similarity between
this pattern and the structure of the noise in Fig. 5.20(a).
a b
FIGURE 5.20
(a) Image of the
Martian
terrain taken by
Mariner 6.
(b) Fourier
spectrum showing
periodic
interference.
(Courtesy of
NASA.)
FIGURE 5.21
Uncentered
Fourier spectrum
of the image
in Fig. 5.20(a).
(Courtesy of
NASA.)
Finally, Fig. 5.23 shows the restored image, obtained using Eq. (5-40) with the interference pattern just
discussed. Function w(x, y) was computed using the procedure explained in the preceding paragraphs.
As you can see, the periodic interference was virtually eliminated from the noisy image in Fig. 5.20(a).
For the moment, let us assume that the noise term is zero, so that g(x, y) = H[ f(x, y)], where H denotes the degradation operator. Based on
the discussion in Section 2.6, H is linear if

H[ a f_1(x, y) + b f_2(x, y) ] = a\, H[ f_1(x, y) ] + b\, H[ f_2(x, y) ]

for any two images f1 and f2 and any two scalars a and b.
a b
FIGURE 5.22
(a) Fourier spec-
trum of N(u, v),
and
(b) corresponding
spatial noise
interference
pattern, h(x, y).
(Courtesy of
NASA.)
Image Segmentation
Preview
The material in the previous chapter began a transition from image processing methods whose inputs
and outputs are images, to methods in which the inputs are images but the outputs are attributes extract-
ed from those images. Most of the segmentation algorithms in this chapter are based on one of two basic
properties of image intensity values: discontinuity and similarity. In the first category, the approach is
to partition an image into regions based on abrupt changes in intensity, such as edges. Approaches in
the second category are based on partitioning an image into regions that are similar according to a set
of predefined criteria. Thresholding, region growing, and region splitting and merging are examples of
methods in this category. We show that improvements in segmentation performance can be achieved
by combining methods from distinct categories, such as techniques in which edge detection is combined
with thresholding. We discuss also image segmentation using clustering and superpixels, and give an
introduction to graph cuts, an approach ideally suited for extracting the principal regions of an image.
This is followed by a discussion of image segmentation based on morphology, an approach that com-
bines several of the attributes of segmentation based on the techniques presented in the first part of the
chapter. We conclude the chapter with a brief discussion on the use of motion cues for segmentation.
10.1 FUNDAMENTALS
Let R represent the entire spatial region occupied by an image. We may view image
segmentation as a process that partitions R into n subregions, R1, R2 , …, Rn , such
that
(a) R1 ∪ R2 ∪ ⋯ ∪ Rn = R.
(b) Ri is a connected set, for i = 1, 2, … , n.
(c) Ri ∩ Rj = ∅ for all i and j, i ≠ j.
(d) Q(Ri) = TRUE for i = 1, 2, … , n.
(e) Q(Ri ∪ Rj) = FALSE for any adjacent regions Ri and Rj.
where Q(Rk) is a logical predicate defined over the points in set Rk, and ∅ is the
null set. The symbols ∪ and ∩ represent set union and intersection, respectively, as
defined in Section 2.6. Two regions Ri and Rj are said to be adjacent if their union
forms a connected set, as defined in Section 2.5. If the set formed by the union of two
regions is not connected, the regions are said to be disjoint.
Condition (a) indicates that the segmentation must be complete, in the sense that
every pixel must be in a region. Condition (b) requires that points in a region be con-
nected in some predefined sense (e.g., the points must be 8-connected). Condition
(c) says that the regions must be disjoint. Condition (d) deals with the properties
that must be satisfied by the pixels in a segmented region—for example, Q(Ri) =
TRUE if all pixels in Ri have the same intensity. Finally, condition (e) indicates
that two adjacent regions Ri and Rj must be different in the sense of predicate Q.†
Thus, we see that the fundamental problem in segmentation is to partition an
image into regions that satisfy the preceding conditions. Segmentation algorithms
for monochrome images generally are based on one of two basic categories dealing
with properties of intensity values: discontinuity and similarity. In the first category,
we assume that boundaries of regions are sufficiently different from each other, and
from the background, to allow boundary detection based on local discontinuities in
intensity. Edge-based segmentation is the principal approach used in this category.
Region-based segmentation approaches in the second category are based on parti-
tioning an image into regions that are similar according to a set of predefined criteria.
Figure 10.1 illustrates the preceding concepts. Figure 10.1(a) shows an image of a
region of constant intensity superimposed on a darker background, also of constant
intensity. These two regions comprise the overall image. Figure 10.1(b) shows the
result of computing the boundary of the inner region based on intensity discontinui-
ties. Points on the inside and outside of the boundary are black (zero) because there
are no discontinuities in intensity in those regions. To segment the image, we assign
one level (say, white) to the pixels on or inside the boundary, and another level (e.g.,
black) to all points exterior to the boundary. Figure 10.1(c) shows the result of such
a procedure. We see that conditions (a) through (c) stated at the beginning of this
†
In general, Q can be a compound expression such as, "Q(Ri) = TRUE if the average intensity of the pixels in
region Ri is less than mi AND if the standard deviation of their intensity is greater than si,” where mi and si
are specified constants.
a b c
d e f
FIGURE 10.1
(a) Image of a
constant intensity
region.
(b) Boundary
based on intensity
discontinuities.
(c) Result of
segmentation.
(d) Image of a
texture region.
(e) Result of
intensity discon-
tinuity computa-
tions (note the
large number of
small edges).
(f) Result of
segmentation
based on region
properties.
section are satisfied by this result. The predicate of condition (d) is: If a pixel is on,
or inside the boundary, label it white; otherwise, label it black. We see that this predi-
cate is TRUE for the points labeled black or white in Fig. 10.1(c). Similarly, the two
segmented regions (object and background) satisfy condition (e).
The next three images illustrate region-based segmentation. Figure 10.1(d) is
similar to Fig. 10.1(a), but the intensities of the inner region form a textured pattern.
Figure 10.1(e) shows the result of computing intensity discontinuities in this image.
The numerous spurious changes in intensity make it difficult to identify a unique
boundary for the original image because many of the nonzero intensity changes are
connected to the boundary, so edge-based segmentation is not a suitable approach.
However, we note that the outer region is constant, so all we need to solve this seg-
mentation problem is a predicate that differentiates between textured and constant
regions. The standard deviation of pixel values is a measure that accomplishes this
because it is nonzero in areas of the texture region, and zero otherwise. Figure 10.1(f)
shows the result of dividing the original image into subregions of size 8 × 8. Each
subregion was then labeled white if the standard deviation of its pixels was posi-
tive (i.e., if the predicate was TRUE), and zero otherwise. The result has a "blocky"
appearance around the edge of the region because groups of 8 × 8 squares were
labeled with the same intensity (smaller squares would have given a smoother
region boundary). Finally, note that these results also satisfy the five segmentation
conditions stated at the beginning of this section.
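The region-based predicate used for Fig. 10.1(f) can be sketched as follows (the block size of 8 and the output labels 0/255 follow the description above; the function name is mine):

```python
import numpy as np

def texture_block_segmentation(image, block=8):
    """Label each block x block subregion 255 if the standard deviation of its
    pixels is positive (predicate TRUE), and 0 otherwise."""
    rows, cols = image.shape
    out = np.zeros_like(image, dtype=np.uint8)
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            if image[r:r + block, c:c + block].std() > 0:
                out[r:r + block, c:c + block] = 255
    return out
```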
10.2 POINT, LINE, AND EDGE DETECTION
The focus of this section is on segmentation methods that are based on detecting sharp, local changes in intensity. The three types of image features in which
we are interested are isolated points, lines, and edges. Edge pixels are pixels at which
the intensity of an image changes abruptly, and edges (or edge segments) are sets of
connected edge pixels (see Section 2.5 regarding connectivity). Edge detectors are
local image processing tools designed to detect edge pixels. A line may be viewed as
a (typically) thin edge segment in which the intensity of the background on either
side of the line is either much higher or much lower than the intensity of the line
pixels. In fact, as we will discuss later, lines give rise to so-called "roof edges." Finally,
an isolated point may be viewed as a foreground (background) pixel surrounded by
background (foreground) pixels.
(When we refer to lines, we are referring to thin structures, typically just a few pixels thick. Such lines may correspond, for example, to elements of a digitized architectural drawing, or roads in a satellite image.)
BACKGROUND
As we saw in Section 3.5, local averaging smoothes an image. Given that averaging
is analogous to integration, it is intuitive that abrupt, local changes in intensity can
be detected using derivatives. For reasons that will become evident shortly, first- and
second-order derivatives are particularly well suited for this purpose.
Derivatives of a digital function are defined in terms of finite differences. There
are various ways to compute these differences but, as explained in Section 3.6, we
require that any approximation used for first derivatives (1) must be zero in areas
of constant intensity; (2) must be nonzero at the onset of an intensity step or ramp;
and (3) must be nonzero at points along an intensity ramp. Similarly, we require that
an approximation used for second derivatives (1) must be zero in areas of constant
intensity; (2) must be nonzero at the onset and end of an intensity step or ramp; and
(3) must be zero along intensity ramps. Because we are dealing with digital quanti-
ties whose values are finite, the maximum possible intensity change is also finite, and
the shortest distance over which a change can occur is between adjacent pixels.
We obtain an approximation to the first-order derivative at an arbitrary point x of
a one-dimensional function f(x) by expanding the function f(x + Δx) into a Taylor
series about x:

f(x + \Delta x) = \sum_{n=0}^{\infty} \frac{(\Delta x)^n}{n!} \frac{\partial^n f(x)}{\partial x^n}
             = f(x) + \Delta x \frac{\partial f(x)}{\partial x} + \frac{(\Delta x)^2}{2!} \frac{\partial^2 f(x)}{\partial x^2} + \frac{(\Delta x)^3}{3!} \frac{\partial^3 f(x)}{\partial x^3} + \cdots    (10-1)

(Remember, the notation n! means "n factorial": n! = 1 × 2 × ⋯ × n. Although f is a function of only one variable, we use partial-derivative notation for consistency with the discussion of functions of two variables later in this section.)

where Δx is the separation between samples of f. For our purposes, this separation
is measured in pixel units. Thus, following the convention in the book, Δx = −1 for
the sample preceding x and Δx = 1 for the sample following x. When Δx = 1, Eq.
(10-1) becomes

f(x + 1) = \sum_{n=0}^{\infty} \frac{1}{n!} \frac{\partial^n f(x)}{\partial x^n} = f(x) + \frac{\partial f(x)}{\partial x} + \frac{1}{2!} \frac{\partial^2 f(x)}{\partial x^2} + \frac{1}{3!} \frac{\partial^3 f(x)}{\partial x^3} + \cdots    (10-2)

Similarly, when Δx = −1, Eq. (10-1) becomes

f(x - 1) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} \frac{\partial^n f(x)}{\partial x^n} = f(x) - \frac{\partial f(x)}{\partial x} + \frac{1}{2!} \frac{\partial^2 f(x)}{\partial x^2} - \frac{1}{3!} \frac{\partial^3 f(x)}{\partial x^3} + \cdots    (10-3)
In what follows, we compute intensity differences using just a few terms of the Taylor
series. For first-order derivatives we use only the linear terms, and we can form dif-
ferences in one of three ways.
The forward difference is obtained from Eq. (10-2):
\frac{\partial f(x)}{\partial x} = f'(x) = f(x + 1) - f(x)    (10-4)
where, as you can see, we kept only the linear terms. The backward difference is simi-
larly obtained by keeping only the linear terms in Eq. (10-3):
\frac{\partial f(x)}{\partial x} = f'(x) = f(x) - f(x - 1)    (10-5)
and the central difference is obtained by subtracting Eq. (10-3) from Eq. (10-2):
\frac{\partial f(x)}{\partial x} = f'(x) = \frac{f(x + 1) - f(x - 1)}{2}    (10-6)
The higher terms of the series that we did not use represent the error between an
exact and an approximate derivative expansion. In general, the more terms we use
from the Taylor series to represent a derivative, the more accurate the approxima-
tion will be. To include more terms implies that more points are used in the approxi-
mation, yielding a lower error. However, it turns out that central differences have
a lower error for the same number of points (see Problem 10.1). For this reason,
derivatives are usually expressed as central differences.
The second-order derivative based on a central difference, \partial^2 f(x)/\partial x^2, is obtained
by adding Eqs. (10-2) and (10-3):

\frac{\partial^2 f(x)}{\partial x^2} = f''(x) = f(x + 1) - 2 f(x) + f(x - 1)    (10-7)
To obtain the third-order central derivative we need one more point on either side
of x. That is, we need the Taylor expansions for f(x + 2) and f(x − 2), which we
obtain from Eqs. (10-2) and (10-3) with Δx = 2 and Δx = −2, respectively. The strat-
egy is to combine the two Taylor expansions to eliminate all derivatives lower than
the third. The result after ignoring all higher-order terms [see Problem 10.2(a)] is

\frac{\partial^3 f(x)}{\partial x^3} = f'''(x) = \frac{f(x + 2) - 2 f(x + 1) + 0 f(x) + 2 f(x - 1) - f(x - 2)}{2}    (10-8)
Similarly [see Problem 10.2(b)], the fourth finite difference (the highest we use in
the book) after ignoring all higher order terms is given by
\frac{\partial^4 f(x)}{\partial x^4} = f''''(x) = f(x + 2) - 4 f(x + 1) + 6 f(x) - 4 f(x - 1) + f(x - 2)    (10-9)
Table 10.1 summarizes the first four central derivatives just discussed. Note the
symmetry of the coefficients about the center point. This symmetry is at the root
of why central differences have a lower approximation error for the same number
of points than the other two differences. For two variables, we apply the results in
Table 10.1 to each variable independently. For example,
\frac{\partial^2 f(x, y)}{\partial x^2} = f(x + 1, y) - 2 f(x, y) + f(x - 1, y)    (10-10)
and
\frac{\partial^2 f(x, y)}{\partial y^2} = f(x, y + 1) - 2 f(x, y) + f(x, y - 1)    (10-11)
It is easily verified that the first and second-order derivatives in Eqs. (10-4)
through (10-7) satisfy the conditions stated at the beginning of this section regarding
derivatives of the first and second order. To illustrate this, consider Fig. 10.2. Part (a)
shows an image of various objects, a line, and an isolated point. Figure 10.2(b) shows
a horizontal intensity profile (scan line) through the center of the image, including
the isolated point. Transitions in intensity between the solid objects and the back-
ground along the scan line show two types of edges: ramp edges (on the left) and
step edges (on the right). As we will discuss later, intensity transitions involving thin
objects such as lines often are referred to as roof edges.
Figure 10.2(c) shows a simplified profile, with just enough points to make it possi-
ble for us to analyze manually how the first- and second-order derivatives behave as
they encounter a point, a line, and the edges of objects. In this diagram the transition
a b
c
FIGURE 10.2
(a) Image. (b) Horizontal intensity profile that includes the isolated point indicated by the arrow. (c) Subsampled profile; the dashes were added for clarity. The numbers in the boxes are the intensity values of the dots shown in the profile. The derivatives were obtained using Eq. (10-4) for the first derivative and Eq. (10-7) for the second.
Intensity values along the subsampled profile: 5 5 4 3 2 1 0 0 0 6 0 0 0 0 1 3 1 0 0 0 0 7 7 7 7
in the ramp spans four pixels, the noise point is a single pixel, the line is three pixels
thick, and the transition of the step edge takes place between adjacent pixels. The
number of intensity levels was limited to eight for simplicity.
Consider the properties of the first and second derivatives as we traverse the
profile from left to right. Initially, the first-order derivative is nonzero at the onset
and along the entire intensity ramp, while the second-order derivative is nonzero
only at the onset and end of the ramp. Because the edges of digital images resemble
this type of transition, we conclude that first-order derivatives produce “thick” edges,
and second-order derivatives much thinner ones. Next we encounter the isolated
noise point. Here, the magnitude of the response at the point is much stronger for
the second- than for the first-order derivative. This is not unexpected, because a
second-order derivative is much more aggressive than a first-order derivative in
enhancing sharp changes. Thus, we can expect second-order derivatives to enhance
fine detail (including noise) much more than first-order derivatives. The line in this
example is rather thin, so it too is fine detail, and we see again that the second deriva-
tive has a larger magnitude. Finally, note in both the ramp and step edges that the second derivative has opposite signs as it transitions into and out of an edge; this "double-edge" effect can be used to locate edges, as we show later in this section.
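The behavior just described can be reproduced numerically from the profile values listed in Fig. 10.2 (a short NumPy check; the derivative formulas are exactly Eqs. (10-4) and (10-7)):

```python
import numpy as np

# Intensity values of the subsampled profile in Fig. 10.2(c).
f = np.array([5, 5, 4, 3, 2, 1, 0, 0, 0, 6, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 7, 7, 7, 7], dtype=float)

first = f[1:] - f[:-1]                     # forward difference, Eq. (10-4)
second = f[2:] - 2 * f[1:-1] + f[:-2]      # central second difference, Eq. (10-7)

print("f'  :", first.astype(int))
print("f'' :", second.astype(int))
# The ramp gives a constant nonzero first derivative, while the second derivative
# is nonzero only at the onset and end of the ramp; the isolated point and the
# thin line produce much stronger second-derivative responses.
```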
FIGURE 10.3
A general 3 × 3
spatial filter
kernel. The w’s
are the kernel
coefficients
(weights).
DETECTION OF ISOLATED POINTS
Based on the conclusions reached earlier in this section, point detection should be based on the second derivative. Using the general 3 × 3 kernel of Fig. 10.3, the response of the kernel at the point on which it is centered is

Z = \sum_{k=1}^{9} w_k z_k    (10-12)

where z_k is the intensity of the pixel under kernel coefficient w_k. Detection is based on the Laplacian,

\nabla^2 f(x, y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}    (10-13)
where the partial derivatives are computed using the second-order finite differences
in Eqs. (10-10) and (10-11). The Laplacian is then

\nabla^2 f(x, y) = f(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1) - 4 f(x, y)    (10-14)
As explained in Section 3.6, this expression can be implemented using the Lapla-
cian kernel in Fig. 10.4(a), used in Example 10.1. We then say that a point has been
detected at a location (x, y) on which the kernel is centered if the absolute value of
the response of the filter at that point exceeds a specified threshold. Such points are
labeled 1 and all others are labeled 0 in the output image, thus producing a binary
image. In other words, we use the expression:
\[
g(x, y) = \begin{cases} 1 & \text{if } |Z(x, y)| > T \\ 0 & \text{otherwise} \end{cases} \tag{10-15}
\]
where g(x, y) is the output image, T is a nonnegative threshold, and Z is given by
Eq. (10-12). This formulation simply measures the weighted differences between a
pixel and its 8-neighbors. Intuitively, the idea is that the intensity of an isolated point
will be quite different from its surroundings, and thus will be easily detectable by
this type of kernel. Differences in intensity that are considered of interest are those
large enough (as determined by T) to be considered isolated points. Note that, as
usual for a derivative kernel, the coefficients sum to zero, indicating that the filter
response will be zero in areas of constant intensity.
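As a concrete sketch of the rule in Eq. (10-15), assuming NumPy/SciPy and a Laplacian kernel with a -8 center and unit neighbors (the sign convention is immaterial here because the absolute response is thresholded); detect_points and its 90% default threshold are illustrative choices that mirror the example that follows, not part of the original text:

```python
import numpy as np
from scipy.ndimage import convolve

# Laplacian kernel (coefficients sum to zero, so the response is zero in
# regions of constant intensity). The -8-center form is assumed here.
LAPLACIAN = np.array([[1,  1, 1],
                      [1, -8, 1],
                      [1,  1, 1]], dtype=float)

def detect_points(image, frac=0.9):
    """Isolated-point detection per Eq. (10-15): label a pixel 1 when the
    absolute Laplacian response exceeds T = frac * max|Z|, else 0."""
    Z = convolve(image.astype(float), LAPLACIAN, mode='reflect')
    T = frac * np.abs(Z).max()
    return (np.abs(Z) > T).astype(np.uint8)
```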
Figure 10.4(b) is an X-ray image of a turbine blade from a jet engine. The blade has a porosity mani-
fested by a single black pixel in the upper-right quadrant of the image. Figure 10.4(c) is the result of fil-
tering the image with the Laplacian kernel, and Fig. 10.4(d) shows the result of Eq. (10-15) with T equal
to 90% of the highest absolute pixel value of the image in Fig. 10.4(c). The single pixel is clearly visible
in this image at the tip of the arrow (the pixel was enlarged to enhance its visibility). This type of detec-
tion process is specialized because it is based on abrupt intensity changes at single-pixel locations that
are surrounded by a homogeneous background in the area of the detector kernel. When this condition
is not satisfied, other methods discussed in this chapter are more suitable for detecting intensity changes.
LINE DETECTION
The next level of complexity is line detection. Based on the discussion earlier in this
section, we know that for line detection we can expect second derivatives to result
in a stronger filter response, and to produce thinner lines than first derivatives. Thus,
we can use the Laplacian kernel in Fig. 10.4(a) for line detection also, keeping in
mind that the double-line effect of the second derivative must be handled properly.
The following example illustrates the procedure.
FIGURE 10.4 (a) Laplacian kernel used for point detection. (b) X-ray image of a turbine blade with a
porosity manifested by a single black pixel. (c) Result of convolving the kernel with the image. (d) Result
of using Eq. (10-15) was a single point (shown enlarged at the tip of the arrow). (Original image courtesy
of X-TEK Systems, Ltd.)
Figure 10.5(a) shows a 486 × 486 (binary) portion of a wire-bond mask for an electronic circuit, and
Fig. 10.5(b) shows its Laplacian image. Because the Laplacian image contains negative values (see the
discussion after Example 3.18), scaling is necessary for display. As the magnified section shows, mid gray
represents zero, darker shades of gray represent negative values, and lighter shades are positive. The
double-line effect is clearly visible in the magnified region.
At first, it might appear that the negative values can be handled simply by taking the absolute value
of the Laplacian image. However, as Fig. 10.5(c) shows, this approach doubles the thickness of the lines.
A more suitable approach is to use only the positive values of the Laplacian (in noisy situations we use
the values that exceed a positive threshold to eliminate random variations about zero caused by the
noise). As Fig. 10.5(d) shows, this approach results in thinner lines that generally are more useful. Note
in Figs. 10.5(b) through (d) that when the lines are wide with respect to the size of the Laplacian kernel,
the lines are separated by a zero "valley." This is not unexpected. For example, when the 3 × 3 kernel is
centered on a line of constant intensity 5 pixels wide, the response will be zero, thus producing the effect
just mentioned. When we talk about line detection, the assumption is that lines are thin with respect to
the size of the detector. Lines that do not satisfy this assumption are best treated as regions and handled
by the edge detection methods discussed in the following section.
The Laplacian detector kernel in Fig. 10.4(a) is isotropic, so its response is inde-
pendent of direction (with respect to the four directions of the 3 × 3 kernel: verti-
cal, horizontal, and two diagonals). Often, interest lies in detecting lines in specified
c d
FIGURE 10.5
(a) Original
image.
(b) Laplacian
image; the
magnified
section shows the
positive/negative
double-line effect
characteristic of
the Laplacian.
(c) Absolute value
of the Laplacian.
(d) Positive values
of the Laplacian.
directions. Consider the kernels in Fig. 10.6. Suppose that an image with a constant
background and containing various lines (oriented at 0°, ±45°, and 90°) is filtered
with the first kernel. The maximum responses would occur at image locations in
which a horizontal line passes through the middle row of the kernel. This is easily
verified by sketching a simple array of 1's with a line of a different intensity (say, 5's)
running horizontally through the array. A similar experiment would reveal that the
second kernel in Fig. 10.6 responds best to lines oriented at +45°; the third kernel
to vertical lines; and the fourth kernel to lines in the −45° direction. The preferred
direction of each kernel is weighted with a larger coefficient (i.e., 2) than other possi-
ble directions. The coefficients in each kernel sum to zero, indicating a zero response
in areas of constant intensity.
Let Z1, Z2, Z3, and Z4 denote the responses of the kernels in Fig. 10.6, from left
to right, where the Z's are given by Eq. (10-12). Suppose that an image is filtered
with these four kernels, one at a time. If, at a given point in the image, |Zk| > |Zj|
for all j ≠ k, that point is said to be more likely associated with a line in the direc-
tion of kernel k. For example, if at a point in the image, |Z1| > |Zj| for j = 2, 3, 4, that
   -1  -1  -1        2  -1  -1       -1   2  -1       -1  -1   2
    2   2   2       -1   2  -1       -1   2  -1       -1   2  -1
   -1  -1  -1       -1  -1   2       -1   2  -1        2  -1  -1
   Horizontal          +45°            Vertical           −45°

FIGURE 10.6 Line detection kernels. Detection angles are with respect to the axis system in Fig. 2.19, with positive
angles measured counterclockwise with respect to the (vertical) x-axis.
point is said to be more likely associated with a horizontal line. If we are interested
in detecting all the lines in an image in the direction defined by a given kernel, we
simply run the kernel through the image and threshold the absolute value of the
result, as in Eq. (10-15). The nonzero points remaining after thresholding are the
strongest responses which, for lines one pixel thick, correspond closest to the direc-
tion defined by the kernel. The following example illustrates this procedure.
Figure 10.7(a) shows the image used in the previous example. Suppose that we are interested in find-
ing all the lines that are one pixel thick and oriented at +45°. For this purpose, we use the kernel in
Fig. 10.6(b). Figure 10.7(b) is the result of filtering the image with that kernel. As before, the shades
darker than the gray background in Fig. 10.7(b) correspond to negative values. There are two principal
segments in the image oriented in the +45° direction, one in the top left and one at the bottom right. Fig-
ures 10.7(c) and (d) show zoomed sections of Fig. 10.7(b) corresponding to these two areas. The straight
line segment in Fig. 10.7(d) is brighter than the segment in Fig. 10.7(c) because the line segment in the
bottom right of Fig. 10.7(a) is one pixel thick, while the one at the top left is not. The kernel is "tuned"
to detect one-pixel-thick lines in the +45° direction, so we expect its response to be stronger when such
lines are detected. Figure 10.7(e) shows the positive values of Fig. 10.7(b). Because we are interested in
the strongest response, we let T = 254 (the maximum value in Fig. 10.7(e) minus one). Figure 10.7(f)
shows in white the points whose values satisfied the condition g ≥ T, where g is the image in Fig. 10.7(e).
The isolated points in the figure are points that also had similarly strong responses to the kernel. In the
original image, these points and their immediate neighbors are oriented in such a way that the kernel
produced a maximum response at those locations. These isolated points can be detected using the kernel
in Fig. 10.4(a) and then deleted, or they can be deleted using morphological operators, as discussed in the
last chapter.
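A minimal sketch of the directional line-detection procedure just described, assuming NumPy/SciPy; LINE_KERNELS and detect_lines are illustrative names, and the threshold T = max − 1 mirrors the wire-bond example above:

```python
import numpy as np
from scipy.ndimage import convolve

# Line-detection kernels of Fig. 10.6: preferred direction weighted by 2,
# remaining coefficients -1, so each kernel sums to zero.
LINE_KERNELS = {
    "horizontal": np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]], float),
    "+45":        np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]], float),
    "vertical":   np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]], float),
    "-45":        np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]], float),
}

def detect_lines(image, direction="+45"):
    """Filter with the chosen kernel, keep only the positive responses, and
    threshold near the maximum to retain the strongest one-pixel-thick lines."""
    Z = convolve(image.astype(float), LINE_KERNELS[direction], mode='reflect')
    Z[Z < 0] = 0                       # positive values of the response only
    T = Z.max() - 1                    # keep only the strongest responses
    return (Z >= T).astype(np.uint8)
```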
EDGE MODELS
Edge detection is an approach used frequently for segmenting images based on
abrupt (local) changes in intensity. We begin by introducing several ways to model
edges and then discuss a number of approaches for edge detection.
a b c
de f
FIGURE 10.7 (a) Image of a wire-bond template. (b) Result of processing with the +45° line detector kernel in Fig.
10.6. (c) Zoomed view of the top left region of (b). (d) Zoomed view of the bottom right region of (b). (e) The image
in (b) with all negative values set to zero. (f) All points (in white) whose values satisfied the condition g ≥ T, where
g is the image in (e) and T = 254 (the maximum pixel value in the image minus 1). (The points in (f) were enlarged
to make them easier to see.)
Edge models are classified according to their intensity profiles. A step edge is
characterized by a transition between two intensity levels occurring ideally over the
distance of one pixel. Figure 10.8(a) shows a section of a vertical step edge and
a horizontal intensity profile through the edge. Step edges occur, for example, in
images generated by a computer for use in areas such as solid modeling and ani-
mation. These clean, ideal edges can occur over the distance of one pixel, provided
that no additional processing (such as smoothing) is used to make them look “real.”
Digital step edges are used frequently as edge models in algorithm development.
For example, the Canny edge detection algorithm discussed later in this section was
derived originally using a step-edge model.
In practice, digital images have edges that are blurred and noisy, with the degree
of blurring determined principally by limitations in the focusing mechanism (e.g.,
lenses in the case of optical images), and the noise level determined principally by
the electronic components of the imaging system. In such situations, edges are more
FIGURE 10.8
From left to right,
models (ideal
representations) of
a step, a ramp, and
a roof edge, and
their corresponding
intensity profiles.
closely modeled as having an intensity ramp profile, such as the edge in Fig. 10.8(b).
The slope of the ramp is inversely proportional to the degree to which the edge is
blurred. In this model, we no longer have a single “edge point” along the profile.
Instead, an edge point now is any point contained in the ramp, and an edge segment
would then be a set of such points that are connected.
A third type of edge is the so-called roof edge, having the characteristics illus-
trated in Fig. 10.8(c). Roof edges are models of lines through a region, with the
base (width) of the edge being determined by the thickness and sharpness of the
line. In the limit, when its base is one pixel wide, a roof edge is nothing more than
a one-pixel-thick line running through a region in an image. Roof edges arise, for
example, in range imaging, when thin objects (such as pipes) are closer to the sensor
than the background (such as walls). The pipes appear brighter and thus create an
image similar to the model in Fig. 10.8(c). Other areas in which roof edges appear
routinely are in the digitization of line drawings and also in satellite images, where
thin features, such as roads, can be modeled by this type of edge.
It is not unusual to find images that contain all three types of edges. Although
blurring and noise result in deviations from the ideal shapes, edges in images that
are reasonably sharp and have a moderate amount of noise do resemble the charac-
teristics of the edge models in Fig. 10.8, as the profiles in Fig. 10.9 illustrate. What the
models in Fig. 10.8 allow us to do is write mathematical expressions for edges in the
development of image processing algorithms. The performance of these algorithms
will depend on the differences between actual edges and the models used in devel-
oping the algorithms.
Figure 10.10(a) shows the image from which the segment in Fig. 10.8(b) was extract-
ed. Figure 10.10(b) shows a horizontal intensity profile. This figure shows also the first
and second derivatives of the intensity profile. Moving from left to right along the
intensity profile, we note that the first derivative is positive at the onset of the ramp
and at points on the ramp, and it is zero in areas of constant intensity. The second
derivative is positive at the beginning of the ramp, negative at the end of the ramp,
zero at points on the ramp, and zero at points of constant intensity. The signs of the
derivatives just discussed would be reversed for an edge that transitions from light to
dark. The intersection between the zero intensity axis and a line extending between
the extrema of the second derivative marks a point called the zero crossing of the
second derivative.
We conclude from these observations that the magnitude of the first derivative
can be used to detect the presence of an edge at a point in an image. Similarly, the
sign of the second derivative can be used to determine whether an edge pixel lies on
FIGURE 10.9 A 1508 × 1970 image showing (zoomed) actual ramp (bottom, left), step (top,
right), and roof edge profiles. The profiles are from dark to light, in the areas enclosed by the
small circles. The ramp and step profiles span 9 pixels and 2 pixels, respectively. The base of the
roof edge is 3 pixels. (Original image courtesy of Dr. David R. Pickens, Vanderbilt University.)
the dark or light side of an edge. Two additional properties of the second derivative
around an edge are: (1) it produces two values for every edge in an image; and (2)
its zero crossings can be used for locating the centers of thick edges, as we will show
later in this section. Some edge models utilize a smooth transition into and out of
FIGURE 10.10 (a) Two regions of constant intensity separated by an ideal ramp edge. (b) Detail near
the edge, showing a horizontal intensity profile, and its first and second derivatives.
the ramp (see Problem 10.9). However, the conclusions reached using those models
are the same as with an ideal ramp, and working with the latter simplifies theoretical
formulations. Finally, although attention thus far has been limited to a 1-D horizon-
tal profile, a similar argument applies to an edge of any orientation in an image. We
simply define a profile perpendicular to the edge direction at any desired point, and
interpret the results in the same manner as for the vertical edge just discussed.
The edge models in Fig. 10.8 are free of noise. The image segments in the first column in Fig. 10.11 show
close-ups of four ramp edges that transition from a black region on the left to a white region on the right
(keep in mind that the entire transition from black to white is a single edge). The image segment at the
top left is free of noise. The other three images in the first column are corrupted by additive Gaussian
noise with zero mean and standard deviation of 0.1, 1.0, and 10.0 intensity levels, respectively. The graph
below each image is a horizontal intensity profile passing through the center of the image. All images
have 8 bits of intensity resolution, with 0 and 255 representing black and white, respectively.
Consider the image at the top of the center column. As discussed in connection with Fig. 10.10(b), the
derivative of the scan line on the left is zero in the constant areas. These are the two black bands shown
in the derivative image. The derivatives at points on the ramp are constant and equal to the slope of the
ramp. These constant values in the derivative image are shown in gray. As we move down the center col-
umn, the derivatives become increasingly different from the noiseless case. In fact, it would be difficult
to associate the last profile in the center column with the first derivative of a ramp edge. What makes
these results interesting is that the noise is almost visually undetectable in the images on the left column.
These examples are good illustrations of the sensitivity of derivatives to noise.
As expected, the second derivative is even more sensitive to noise. The second derivative of the noise-
less image is shown at the top of the right column. The thin white and black vertical lines are the positive
and negative components of the second derivative, as explained in Fig. 10.10. The gray in these images
represents zero (as discussed earlier, scaling causes zero to show as gray). The only noisy second deriva-
tive image that barely resembles the noiseless case corresponds to noise with a standard deviation of 0.1.
The remaining second-derivative images and profiles clearly illustrate that it would be difficult indeed to
detect their positive and negative components, which are the truly useful features of the second deriva-
tive in terms of edge detection.
The fact that such little visual noise can have such a significant impact on the two key derivatives
used for detecting edges is an important issue to keep in mind. In particular, image smoothing should be
a serious consideration prior to the use of derivatives in applications where noise with levels similar to
those we have just discussed is likely to be present.
In summary, the three steps performed typically for edge detection are:
1. Image smoothing for noise reduction. The need for this step is illustrated by the
results in the second and third columns of Fig. 10.11.
2. Detection of edge points. As mentioned earlier, this is a local operation that
extracts from an image all points that are potential edge-point candidates.
3. Edge localization. The objective of this step is to select from the candidate
points only the points that are members of the set of points comprising an edge.
The remainder of this section deals with techniques for achieving these objectives.
FIGURE 10.11 First column: 8-bit images with values in the range [0, 255], and intensity profiles
of a ramp edge corrupted by Gaussian noise of zero mean and standard deviations of 0.0, 0.1,
1.0, and 10.0 intensity levels, respectively. Second column: First-derivative images and inten-
sity profiles. Third column: Second-derivative images and intensity profiles.
BASIC EDGE DETECTION
As illustrated in the preceding discussion, detecting changes in intensity for the pur-
pose of finding edges can be accomplished using first- or second-order derivatives.
We begin with first-order derivatives, and work with second-order derivatives in the
following subsection.
The tool of choice for finding edge strength and direction at an arbitrary location
(x, y) of an image, f, is the gradient, denoted by ∇f and defined as the vector

\[
\nabla f(x, y) \equiv \operatorname{grad}\big[f(x, y)\big] \equiv
\begin{bmatrix} g_x(x, y) \\ g_y(x, y) \end{bmatrix} =
\begin{bmatrix} \partial f(x, y) / \partial x \\ \partial f(x, y) / \partial y \end{bmatrix} \tag{10-16}
\]

This vector has the well-known property that it points in the direction of maximum
rate of change of f at (x, y) (see Problem 10.10). Equation (10-16) is valid at an
arbitrary (but single) point (x, y). When evaluated for all applicable values of x
and y, ∇f (x, y) becomes a vector image, each element of which is a vector given by
Eq. (10-16). The magnitude, M(x, y), of this gradient vector at a point (x, y) is given
by its Euclidean vector norm:

\[
M(x, y) = \|\nabla f(x, y)\| = \sqrt{g_x^2(x, y) + g_y^2(x, y)} \tag{10-17}
\]

This is the value of the rate of change in the direction of the gradient vector at point
(x, y). Note that M(x, y), ∇f (x, y) , gx (x, y), and gy (x, y) are arrays of the same
size as f, created when x and y are allowed to vary over all pixel locations in f. It is
common practice to refer to M(x, y) and ∇f (x, y) as the gradient image, or simply
as the gradient when the meaning is clear. The summation, square, and square root
operations are elementwise operations, as defined in Section 2.6.
The direction of the gradient vector at a point (x, y) is given by
\[
\alpha(x, y) = \tan^{-1}\!\left[\frac{g_y(x, y)}{g_x(x, y)}\right] \tag{10-18}
\]
Angles are measured in the counterclockwise direction with respect to the x-axis
(see Fig. 2.19). This is also an image of the same size as f, created by the elementwise
division of gy by gx over all applicable values of x and y. As the following example
illustrates, the direction of an edge at a point (x, y) is orthogonal to the direction,
a(x, y), of the gradient vector at the point.
Figure 10.12(a) shows a zoomed section of an image containing a straight edge segment. Each square
corresponds to a pixel, and we are interested in obtaining the strength and direction of the edge at the
point highlighted with a box. The shaded pixels in this figure are assumed to have value 0, and the white
FIGURE 10.12 Using the gradient to determine edge strength and direction at a point. Note that the edge direction
is perpendicular to the direction of the gradient vector at the point where the gradient is computed. Each square
represents one pixel. (Recall from Fig. 2.19 that the origin of our coordinate system is at the top, left.)
pixels have value 1. We discuss after this example an approach for computing the derivatives in the x-
and y-directions using a 3 × 3 neighborhood centered at a point. The method consists of subtracting the
pixels in the top row of the neighborhood from the pixels in the bottom row to obtain the partial deriva-
tive in the x-direction. Similarly, we subtract the pixels in the left column from the pixels in the right col-
umn of the neighborhood to obtain the partial derivative in the y-direction. It then follows, using these
differences as our estimates of the partials, that ∂f/∂x = −2 and ∂f/∂y = 2 at the point in question. Then,

\[
g_x = \frac{\partial f}{\partial x} = -2 \qquad \text{and} \qquad g_y = \frac{\partial f}{\partial y} = 2
\]

from which we obtain \( \|\nabla f\| = 2\sqrt{2} \) at that point. Similarly, the direction of the gradient vector at the
same point follows from Eq. (10-18): \( \alpha = \tan^{-1}(g_y / g_x) = -45^{\circ} \), which is the same as 135° measured in
the positive (counterclockwise) direction with respect to the x-axis in our image coordinate system (see
Fig. 2.19). Figure 10.12(b) shows the gradient vector and its direction angle.
As mentioned earlier, the direction of an edge at a point is orthogonal to the gradient vector at that
point. So the direction angle of the edge in this example is α − 90° = 135° − 90° = 45°, as Fig. 10.12(c)
shows. All edge points in Fig. 10.12(a) have the same gradient, so the entire edge segment is in the same
direction. The gradient vector sometimes is called the edge normal. When the vector is normalized to unit
length by dividing it by its magnitude, the resulting vector is referred to as the edge unit normal.
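A quick numerical check of this example, assuming NumPy; the values −2 and 2 are the partial derivatives estimated above:

```python
import numpy as np

gx, gy = -2.0, 2.0                          # partials estimated in the example
magnitude = np.hypot(gx, gy)                # 2.828... = 2*sqrt(2)
alpha = np.degrees(np.arctan2(gy, gx))      # 135.0 degrees (gradient direction)
edge_direction = alpha - 90.0               # 45.0 degrees (edge direction)
```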
Gradient Operators
Obtaining the gradient of an image requires computing the partial derivatives ∂f/∂x
and ∂f/∂y at every pixel location in the image. For the gradient, we typically use a
forward or centered finite difference (see Table 10.1). Using forward differences we
obtain
\[
g_x(x, y) = \frac{\partial f(x, y)}{\partial x} = f(x+1, y) - f(x, y) \tag{10-19}
\]

and

\[
g_y(x, y) = \frac{\partial f(x, y)}{\partial y} = f(x, y+1) - f(x, y) \tag{10-20}
\]

These two equations can be implemented for all values of x and y by filtering f(x, y)
with the 1-D kernels in Fig. 10.13.

FIGURE 10.13 1-D kernels used to implement Eqs. (10-19) and (10-20).
When diagonal edge direction is of interest, we need 2-D kernels. The Roberts
cross-gradient operators (Roberts [1965]) are one of the earliest attempts to use 2-D
kernels with a diagonal preference. Consider the 3 × 3 region in Fig. 10.14(a). The
Roberts operators are based on implementing the diagonal differences

\[
g_x = \frac{\partial f}{\partial x} = (z_9 - z_5) \tag{10-21}
\]

and

\[
g_y = \frac{\partial f}{\partial y} = (z_8 - z_6) \tag{10-22}
\]

(Filter kernels used to compute the derivatives needed for the gradient are often
called gradient operators, difference operators, edge operators, or edge detectors.)
These derivatives can be implemented by filtering an image with the kernels shown
in Figs. 10.14(b) and (c).
Kernels of size 2 × 2 are simple conceptually, but they are not as useful for com-
puting edge direction as kernels that are symmetric about their centers, the smallest
of which are of size 3 × 3. These kernels take into account the nature of the data on
opposite sides of the center point, and thus carry more information regarding the
direction of an edge. The simplest digital approximations to the partial derivatives
using kernels of size 3 × 3 are given by
\[
\begin{aligned}
g_x &= \frac{\partial f}{\partial x} = (z_7 + z_8 + z_9) - (z_1 + z_2 + z_3)\\[1mm]
g_y &= \frac{\partial f}{\partial y} = (z_3 + z_6 + z_9) - (z_1 + z_4 + z_7)
\end{aligned} \tag{10-23}
\]

(Observe that these two equations are first-order central differences as given in
Eq. (10-6), but multiplied by 2.)
In this formulation, the difference between the third and first rows of the 3 × 3 region
approximates the derivative in the x-direction, and the difference between the third
and first columns approximates the derivative in the y-direction. Intuitively, we would
expect these approximations to be more accurate than the approximations obtained
using the Roberts operators. Equations (10-22) and (10-23) can be implemented over
an entire image by filtering it with the two kernels in Figs. 10.14(d) and (e). These
kernels are called the Prewitt operators (Prewitt [1970]).
A slight variation of the preceding two equations uses a weight of 2 in the center
coefficient:
FIGURE 10.14 A 3 × 3 region of an image (the z's are intensity values), and various kernels
used to compute the gradient at the point labeled z5.

(a) Image region:
     z1  z2  z3
     z4  z5  z6
     z7  z8  z9

(b), (c) Roberts:
     -1   0          0  -1
      0   1          1   0

(d), (e) Prewitt:
     -1  -1  -1      -1   0   1
      0   0   0      -1   0   1
      1   1   1      -1   0   1

(f), (g) Sobel:
     -1  -2  -1      -1   0   1
      0   0   0      -2   0   2
      1   2   1      -1   0   1
\[
g_x = \frac{\partial f}{\partial x} = (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3) \tag{10-24}
\]

and

\[
g_y = \frac{\partial f}{\partial y} = (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7) \tag{10-25}
\]
It can be demonstrated (see Problem 10.12) that using a 2 in the center location pro-
vides image smoothing. Figures 10.14(f) and (g) show the kernels used to implement
Eqs. (10-24) and (10-25). These kernels are called the Sobel operators (Sobel [1970]).
The Prewitt kernels are simpler to implement than the Sobel kernels, but the
slight computational difference between them typically is not an issue. The fact
that the Sobel kernels have better noise-suppression (smoothing) characteristics
makes them preferable because, as mentioned earlier in the discussion of Fig. 10.11,
noise suppression is an important issue when dealing with derivatives. Note that the
coefficients of all the kernels in Fig. 10.14 sum to zero, thus giving a response of zero
in areas of constant intensity, as expected of derivative operators. (Recall the impor-
tant result in Problem 3.32 that using a kernel whose coefficients sum to zero produces
a filtered image whose pixels also sum to zero. This implies in general that some pixels
will be negative. Similarly, if the kernel coefficients sum to 1, the sum of pixels in the
original and filtered images will be the same; see Problem 3.31.)
Any of the pairs of kernels from Fig. 10.14 are convolved with an image to obtain
the gradient components gx and gy at every pixel location. These two partial deriva-
tive arrays are then used to estimate edge strength and direction. Obtaining the
magnitude of the gradient requires the computations in Eq. (10-17). This imple-
mentation is not always desirable because of the computational burden required
by squares and square roots, and an approach used frequently is to approximate the
magnitude of the gradient by absolute values:

\[
M(x, y) \approx |g_x| + |g_y| \tag{10-26}
\]
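A minimal sketch of this computation, assuming NumPy/SciPy; correlate is used so that the kernels match Eqs. (10-24) and (10-25) literally, and sobel_gradient is an illustrative name:

```python
import numpy as np
from scipy.ndimage import correlate

# Sobel kernels of Figs. 10.14(f) and (g).
SOBEL_X = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_gradient(f):
    """Return the |gx| + |gy| gradient magnitude of Eq. (10-26) and the
    gradient angle of Eq. (10-18) for image f."""
    gx = correlate(f.astype(float), SOBEL_X, mode='reflect')
    gy = correlate(f.astype(float), SOBEL_Y, mode='reflect')
    M = np.abs(gx) + np.abs(gy)
    alpha = np.arctan2(gy, gx)    # radians, measured from the x-axis (rows)
    return M, alpha
```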
Figure 10.16 illustrates the Sobel absolute value response of the two components of the gradient, gx
and gy , as well as the gradient image formed from the sum of these two components. The directionality
of the horizontal and vertical components of the gradient is evident in Figs. 10.16(b) and (c). Note, for
FIGURE 10.15 Kirsch compass kernels. The edge direction of strongest response of each kernel
is labeled below it.

     N               NW               W               SW
 -3  -3   5      -3   5   5       5   5   5       5   5  -3
 -3   0   5      -3   0   5      -3   0  -3       5   0  -3
 -3  -3   5      -3  -3  -3      -3  -3  -3      -3  -3  -3

     S               SE               E               NE
  5  -3  -3      -3  -3  -3      -3  -3  -3      -3  -3  -3
  5   0  -3       5   0  -3      -3   0  -3      -3   0   5
  5  -3  -3       5   5  -3       5   5   5      -3   5   5
example, how strong the roof tile, horizontal brick joints, and horizontal segments of the windows are in
Fig. 10.16(b) compared to other edges. In contrast, Fig. 10.16(c) favors features such as the vertical com-
ponents of the façade and windows. It is common terminology to use the term edge map when referring
to an image whose principal features are edges, such as gradient magnitude images. The intensities of the
image in Fig. 10.16(a) were scaled to the range [0, 1]. We use values in this range to simplify parameter
selection in the various methods for edge detection discussed in this section.
FIGURE 10.16 (a) Image of size 834 × 1114 pixels, with intensity values scaled to the range [0, 1].
(b) gx, the component of the gradient in the x-direction, obtained using the Sobel kernel in Fig. 10.14(f)
to filter the image. (c) gy, obtained using the kernel in Fig. 10.14(g). (d) The gradient image formed from
the sum of (b) and (c).
FIGURE 10.17
Gradient angle
image computed
using Eq. (10-18).
Areas of constant
intensity in this
image indicate
that the direction
of the gradient
vector is the same
at all the pixel
locations in those
regions.
Figure 10.17 shows the gradient angle image computed using Eq. (10-18). In general, angle images are
not as useful as gradient magnitude images for edge detection, but they do complement the information
extracted from an image using the magnitude of the gradient. For instance, the constant intensity areas
in Fig. 10.16(a), such as the front edge of the sloping roof and top horizontal bands of the front wall,
are constant in Fig. 10.17, indicating that the gradient vector direction at all the pixel locations in those
regions is the same. As we will show later in this section, angle information plays a key supporting role
in the implementation of the Canny edge detection algorithm, a widely used edge detection scheme.
The original image in Fig. 10.16(a) is of reasonably high resolution, and at the
distance the image was acquired, the contribution made to image detail by the wall
bricks is significant. This level of fine detail often is undesirable in edge detection
because it tends to act as noise, which is enhanced by derivative computations and
thus complicates detection of the principal edges. One way to reduce fine detail is
to smooth the image prior to computing the edges. Figure 10.18 shows the same
sequence of images as in Fig. 10.16, but with the original image smoothed first using
a 5 5 averaging filter (see Section 3.5 regarding smoothing filters). The response
of each kernel now shows almost no contribution due to the bricks, with the results
being dominated mostly by the principal edges in the image.
Figures 10.16 and 10.18 show that the horizontal and vertical Sobel kernels do
not differentiate between edges oriented in the ±45° directions. If it is important to
emphasize edges oriented in particular diagonal directions, then one of the Kirsch
kernels in Fig. 10.15 should be used. Figures 10.19(a) and (b) show the responses of
the +45° (NW) and −45° (SW) Kirsch kernels, respectively. The stronger diagonal
selectivity of these kernels is evident in these figures. Both kernels have similar
responses to horizontal and vertical edges, but the response in these directions is weaker.
c d
FIGURE 10.18
Same sequence as
in Fig. 10.16, but
with the original
image smoothed
using a 5 × 5 aver-
aging kernel prior
to edge detection.
Figure 10.20(a) shows the result of thresholding Fig. 10.16(d), the gradient of the
original image; edge pixels are shown in white and all other pixels in black. Comparing
this image with Fig. 10.16(d), we see that there are fewer edges
in the thresholded image, and that the edges in this image are much sharper (see,
for example, the edges in the roof tile). On the other hand, numerous edges, such
as the sloping line defining the far edge of the roof (see arrow), are broken in the
thresholded image.
When interest lies both in highlighting the principal edges and on maintaining
as much connectivity as possible, it is common practice to use both smoothing and
thresholding. Figure 10.20(b) shows the result of thresholding Fig. 10.18(d), which is
the gradient of the smoothed image. This result shows a reduced number of broken
edges; for instance, compare the corresponding edges identified by the arrows in
Figs. 10.20(a) and (b).
a b
FIGURE 10.19
Diagonal edge
detection.
(a) Result of using
the Kirsch kernel in
Fig. 10.15(c).
(b) Result of using
the kernel in Fig.
10.15(d). The input
image in both cases
was Fig. 10.18(a).
a b
FIGURE 10.20
(a) Result of
thresholding
Fig. 10.16(d), the
gradient of the
original image.
(b) Result of
thresholding
Fig. 10.18(d), the
gradient of the
smoothed image.
The Marr-Hildreth Edge Detector
The Marr-Hildreth edge detector is based on smoothing the image with a 2-D
Gaussian function of the form

\[
G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10-27}
\]

with standard deviation σ (sometimes σ is called the space constant in this context),
and then computing the Laplacian of the result. The Laplacian of this Gaussian
function is

\[
\nabla^2 G(x, y) = \frac{\partial^2 G(x, y)}{\partial x^2} + \frac{\partial^2 G(x, y)}{\partial y^2}
= \frac{\partial}{\partial x}\!\left[\frac{-x}{\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\right]
+ \frac{\partial}{\partial y}\!\left[\frac{-y}{\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\right]
= \left[\frac{x^2}{\sigma^4} - \frac{1}{\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}}
+ \left[\frac{y^2}{\sigma^4} - \frac{1}{\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10-28}
\]

Collecting terms, we obtain

\[
\nabla^2 G(x, y) = \left[\frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10-29}
\]

This expression is called the Laplacian of a Gaussian (LoG).
FIGURE 10.21 (a) 3-D plot of the negative of the LoG. (b) Negative of the LoG displayed as an
image. (c) Cross section of (a) showing zero crossings. (d) 5 × 5 kernel approximation to the
shape in (a); the negative of this kernel would be used in practice:

      0   0  -1   0   0
      0  -1  -2  -1   0
     -1  -2  16  -2  -1
      0  -1  -2  -1   0
      0   0  -1   0   0
Because the LoG is isotropic (invariant to rotation), it responds equally to intensity
changes in any direction, thus avoiding having to use multiple kernels to calculate the
strongest response at any point in the image.
The Marr-Hildreth algorithm consists of convolving the LoG kernel with an input
image,

\[
g(x, y) = \left[\nabla^2 G(x, y)\right] \star f(x, y) \tag{10-30}
\]

and then finding the zero crossings of g(x, y) to determine the locations of edges in
f(x, y). Because the Laplacian and convolution are linear processes, we can write
Eq. (10-30) as

\[
g(x, y) = \nabla^2 \left[G(x, y) \star f(x, y)\right] \tag{10-31}
\]

indicating that we can smooth the image first with a Gaussian filter and then compute
the Laplacian of the result. These expressions are implemented in the spatial domain
using the following procedure:
1. Filter the input image with an n × n Gaussian lowpass kernel obtained by sam-
pling Eq. (10-27).
2. Compute the Laplacian of the image resulting from Step 1 using, for example,
the 3 × 3 kernel in Fig. 10.4(a). [Steps 1 and 2 implement Eq. (10-31).]
3. Find the zero crossings of the image from Step 2.
To specify the size of the Gaussian kernel, recall from our discussion of Fig. 3.35 that
the values of a Gaussian function at a distance larger than 3σ from the mean are
small enough so that they can be ignored. As discussed in Section 3.5, this implies
using a Gaussian kernel of size ⌈6σ⌉ × ⌈6σ⌉, where ⌈6σ⌉ denotes the ceiling of 6σ; that
is, the smallest integer not less than 6σ. (As explained in Section 3.5, ⌈·⌉ and ⌊·⌋ denote
the ceiling and floor functions; they map a real number to the smallest following, or
the largest previous, integer, respectively.) Because we work with kernels of odd dimen-
sions, we would use the smallest odd integer satisfying this condition. Using a kernel
smaller than this will "truncate" the LoG function, with the degree of truncation
being inversely proportional to the size of the kernel. Using a larger kernel would
make little difference in the result.
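For instance, the kernel-size rule just stated can be written as a one-line helper (a sketch; gaussian_kernel_size is an illustrative name):

```python
import math

def gaussian_kernel_size(sigma):
    """Smallest odd n with n >= 6*sigma, per the ceiling rule above."""
    n = math.ceil(6 * sigma)
    return n if n % 2 == 1 else n + 1   # e.g., sigma = 4 gives n = 25
```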
One approach for finding the zero crossings at any pixel, p, of the filtered image,
g(x, y), is to use a 3 × 3 neighborhood centered at p. A zero crossing at p implies
that the signs of at least two of its opposing neighboring pixels must differ. There are
four cases to test: left/right, up/down, and the two diagonals. If the values of g(x, y)
are being compared against a threshold (a common approach), then not only must
the signs of opposing neighbors be different, but the absolute value of their numeri-
cal difference must also exceed the threshold before we can call p a zero-crossing
pixel. (Attempts to find zero crossings by finding the coordinates (x, y) such that
g(x, y) = 0 are impractical because of noise and other computational inaccuracies.)
We illustrate this approach in Example 10.7.
Computing zero crossings is the key feature of the Marr-Hildreth edge-detection
method. The approach discussed in the previous paragraph is attractive because of
its simplicity of implementation and because it generally gives good results. If the
accuracy of the zero-crossing locations found using this method is inadequate in a
particular application, then a technique proposed by Huertas and Medioni [1986]
for finding zero crossings with subpixel accuracy can be employed.
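A compact sketch of the three-step algorithm plus the 3 × 3 zero-crossing test, assuming SciPy's gaussian_filter and laplace; the threshold is expressed as a fraction of the maximum absolute LoG response, as in the example that follows, and boundary handling by wraparound (np.roll) is a simplification:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def marr_hildreth(f, sigma=4.0, thresh_frac=0.04):
    """Gaussian smoothing, Laplacian (Eq. (10-31)), then zero crossings whose
    local contrast exceeds a threshold."""
    g = laplace(gaussian_filter(f.astype(float), sigma))
    T = thresh_frac * np.abs(g).max()
    edges = np.zeros(g.shape, dtype=np.uint8)
    # Opposing-neighbor pairs: left/right, up/down, and the two diagonals.
    shifts = [((0, 1), (0, -1)), ((1, 0), (-1, 0)),
              ((1, 1), (-1, -1)), ((1, -1), (-1, 1))]
    for a, b in shifts:
        p = np.roll(g, a, axis=(0, 1))
        q = np.roll(g, b, axis=(0, 1))
        # Sign change across p, plus a difference larger than T.
        edges |= ((np.sign(p) != np.sign(q)) & (np.abs(p - q) > T)).astype(np.uint8)
    return edges
```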
Figure 10.22(a) shows the building image used earlier and Fig. 10.22(b) is the result of Steps 1 and 2 of
the Marr-Hildreth algorithm, using σ = 4 (approximately 0.5% of the short dimension of the image)
and n = 25 to satisfy the size condition stated above. As in Fig. 10.5, the gray tones in this image are due
to scaling. Figure 10.22(c) shows the zero crossings obtained using the 3 × 3 neighborhood approach just
discussed, with a threshold of zero. Note that all the edges form closed loops. This so-called “spaghetti
effect” is a serious drawback of this method when a threshold value of zero is used (see Problem 10.17).
We avoid closed-loop edges by using a positive threshold.
Figure 10.22(d) shows the result of using a threshold approximately equal to 4% of the maximum
value of the LoG image. The majority of the principal edges were readily detected, and “irrelevant” fea-
tures, such as the edges due to the bricks and the tile roof, were filtered out. This type of performance
is virtually impossible to obtain using the gradient-based edge-detection techniques discussed earlier.
Another important consequence of using zero crossings for edge detection is that the resulting edges are
1 pixel thick. This property simplifies subsequent stages of processing, such as edge linking.
The LoG can be approximated by a difference of Gaussians (DoG):

\[
\mathrm{DoG}(x, y) = \frac{1}{2\pi\sigma_1^2}\, e^{-\frac{x^2 + y^2}{2\sigma_1^2}}
- \frac{1}{2\pi\sigma_2^2}\, e^{-\frac{x^2 + y^2}{2\sigma_2^2}} \tag{10-32}
\]
FIGURE 10.22
(a) Image of size 834 × 1114 pixels, with intensity values scaled to the range [0, 1].
(b) Result of Steps 1 and 2 of the Marr-Hildreth algorithm using σ = 4 and n = 25.
(c) Zero cross-
ings of (b) using
a threshold of 0
(note the closed-
loop edges).
(d) Zero cross-
ings found using a
threshold equal to
4% of the maxi-
mum value of the
image in (b). Note
the thin edges.
with σ1 > σ2. Experimental results suggest that certain "channels" in the human
vision system are selective with respect to orientation and frequency, and can be
modeled using Eq. (10-32) with a ratio of standard deviations of 1.75:1. Using the
ratio 1.6:1 preserves the basic characteristics of these observations and also pro-
vides a closer “engineering” approximation to the LoG function (Marr and Hil-
dreth [1980]). In order for the LoG and DoG to have the same zero crossings, the
value of s for the LoG must be selected based on the following equation (see
Problem 10.19):
\[
\sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 - \sigma_2^2}\,
\ln\!\left[\frac{\sigma_1^2}{\sigma_2^2}\right] \tag{10-33}
\]
Although the zero crossings of the LoG and DoG will be the same when this value
of s is used, their amplitude scales will be different. We can make them compatible
by scaling both functions so that they have the same value at the origin.
The profiles in Figs. 10.23(a) and (b) were generated with standard devia-
tion ratios of 1:1.75 and 1:1.6, respectively (by convention, the curves shown are
inverted, as in Fig. 10.21). The LoG profiles are the solid lines, and the DoG profiles
are dotted. The curves shown are intensity profiles through the center of the LoG
and DoG arrays, generated by sampling Eqs. (10-29) and (10-32), respectively. The
amplitude of all curves at the origin were normalized to 1. As Fig. 10.23(b) shows,
the ratio 1:1.6 yielded a slightly closer approximation of the LoG and DoG func-
tions (for example, compare the bottom lobes of the two figures).
a b
FIGURE 10.23
(a) Negatives of
the LoG (solid)
and DoG
(dotted) profiles
using a s ratio of
1.75:1. (b) Profiles
obtained using a
ratio of 1.6:1.
Gaussian kernels are separable (see Section 3.4). Therefore, both the LoG and
the DoG filtering operations can be implemented with 1-D convolutions instead of
using 2-D convolutions directly (see Problem 10.19). For an image of size M × N
and a kernel of size n × n, doing so reduces the number of multiplications and addi-
tions for each convolution from being proportional to n²MN for 2-D convolutions
to being proportional to nMN for 1-D convolutions. This implementation difference
is significant. For example, if n = 25, a 1-D implementation will require on the order
of 12 times fewer multiplication and addition operations than using 2-D convolution.
The Canny Edge Detector
Canny [1986] showed that a good approximation to the optimal detector for 1-D step
edges corrupted by white Gaussian noise is the first derivative of a Gaussian,

\[
\frac{d}{dx}\, e^{-\frac{x^2}{2\sigma^2}} = -\frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}} \tag{10-34}
\]
†
Recall that white noise is noise having a frequency spectrum that is continuous and uniform over a specified
frequency band. White Gaussian noise is white noise in which the distribution of amplitude values is Gaussian.
Gaussian white noise is a good approximation of many real-world situations and generates mathematically
tractable models. It has the useful property that its values are statistically independent.
where the approximation was only about 20% worse than using the optimized
numerical solution (a difference of this magnitude generally is visually impercep-
tible in most applications).
Generalizing the preceding result to 2-D involves recognizing that the 1-D
approach still applies in the direction of the edge normal (see Fig. 10.12). Because
the direction of the normal is unknown beforehand, this would require applying the
1-D edge detector in all possible directions. This task can be approximated by first
smoothing the image with a circular 2-D Gaussian function, computing the gradient
of the result, and then using the gradient magnitude and direction to estimate edge
strength and direction at every point.
Let f (x, y) denote the input image and G(x, y) denote the Gaussian function:

\[
G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10-35}
\]

We form a smoothed image, fs(x, y), by convolving f and G:

\[
f_s(x, y) = G(x, y) \star f(x, y) \tag{10-36}
\]

Then we compute the gradient magnitude and direction (angle) of fs, as discussed earlier:

\[
\|\nabla f_s(x, y)\| = \sqrt{g_x^2(x, y) + g_y^2(x, y)} \tag{10-37}
\]

and

\[
\alpha(x, y) = \tan^{-1}\!\left[\frac{g_y(x, y)}{g_x(x, y)}\right] \tag{10-38}
\]

with gx(x, y) = ∂fs(x, y)/∂x and gy(x, y) = ∂fs(x, y)/∂y. Any of the derivative fil-
ter kernel pairs in Fig. 10.14 can be used to obtain gx(x, y) and gy(x, y). Equation
(10-36) is implemented using an n × n Gaussian kernel whose size is discussed below.
Keep in mind that ∇fs(x, y) and a(x, y) are arrays of the same size as the image
from which they are computed.
Gradient image ∇fs (x, y) typically contains wide ridges around local maxima.
The next step is to thin those ridges. One approach is to use nonmaxima suppres-
sion. The essence of this approach is to specify a number of discrete orientations of
the edge normal (gradient vector). For example, in a 3 × 3 region we can define four
orientations† for an edge passing through the center point of the region: horizontal,
vertical, +45°, and −45°. Figure 10.24(a) shows the situation for the two possible
orientations of a horizontal edge. Because we have to quantize all possible edge
directions into four ranges, we have to define a range of directions over which we
consider an edge to be horizontal. We determine edge direction from the direction
of the edge normal, which we obtain directly from the image data using Eq. (10-38).
As Fig. 10.24(b) shows, if the edge normal is in the range of directions from −22.5° to
†
Every edge has two possible orientations. For example, an edge whose normal is oriented at 0° and an edge
whose normal is oriented at 180° are the same horizontal edge.
FIGURE 10.24 (a) Two possible orientations of a horizontal edge (shaded) in a 3 × 3 neighborhood.
(b) Range of values (shaded) of a, the direction angle of the edge normal for a horizontal edge.
(c) The angle ranges of the edge normals for the four types of edge directions in a 3 × 3 neighborhood.
Each edge direction has two ranges, shown in corresponding shades.
22.5° or from 157.5° to −157.5°, we call the edge a horizontal edge. Figure 10.24(c)
shows the angle ranges corresponding to the four directions under consideration.
Let d1, d2, d3, and d4 denote the four basic edge directions just discussed for
a 3 × 3 region: horizontal, −45°, vertical, and +45°, respectively. We can formulate
the following nonmaxima suppression scheme for a 3 × 3 region centered at an
arbitrary point (x, y) in a(x, y):
1. Find the direction dk that is closest to a(x, y).
2. Let K denote the value of ∇fs at (x, y). If K is less than the value of ∇fs at one
or both of the neighbors of point (x, y) along dk, let gN(x, y) = 0 (suppression);
otherwise, let gN(x, y) = K.
When repeated for all values of x and y, this procedure yields a nonmaxima sup-
pressed image gN (x, y) that is of the same size as fs (x, y). For example, with reference
to Fig. 10.24(a), letting (x, y) be at p5, and assuming a horizontal edge through p5,
the pixels of interest in Step 2 would be p2 and p8. Image gN (x, y) contains only the
thinned edges; it is equal to image ∇fs (x, y) with the nonmaxima edge points sup-
pressed.
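A sketch of this suppression scheme, assuming the angle image alpha comes from arctan2(gy, gx) with gx taken along image rows (as in the earlier Sobel sketch); nonmaxima_suppression is an illustrative name, and the explicit loop is written for clarity rather than speed:

```python
import numpy as np

def nonmaxima_suppression(M, alpha):
    """Quantize the edge-normal direction into four bins and zero out pixels
    of M that are not maxima along their direction dk (Steps 1 and 2 above)."""
    gN = np.zeros_like(M)
    # Neighbor offsets along the gradient direction for the four bins:
    # 0: along rows, 1: +45 diagonal, 2: along columns, 3: -45 diagonal.
    offsets = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (1, -1)}
    ang = np.rad2deg(alpha) % 180.0          # edge normals are symmetric mod 180
    d = (((ang + 22.5) // 45).astype(int)) % 4
    rows, cols = M.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            di, dj = offsets[d[i, j]]
            if M[i, j] >= M[i + di, j + dj] and M[i, j] >= M[i - di, j - dj]:
                gN[i, j] = M[i, j]
    return gN
```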
The final operation is to threshold gN (x, y) to reduce false edge points. In the
Marr-Hildreth algorithm we did this using a single threshold, in which all values
below the threshold were set to 0. If we set the threshold too low, there will still
be some false edges (called false positives). If the threshold is set too high, then
valid edge points will be eliminated (false negatives). Canny’s algorithm attempts to
improve on this situation by using hysteresis thresholding which, as we will discuss
in Section 10.3, uses two thresholds: a low threshold, TL and a high threshold, TH .
Experimental evidence (Canny [1986]) suggests that the ratio of the high to low
threshold should be in the range of 2:1 to 3:1.
We can visualize the thresholding operation as creating two additional images:

\[
g_{NH}(x, y) = g_N(x, y) \ge T_H \tag{10-39}
\]

and

\[
g_{NL}(x, y) = g_N(x, y) \ge T_L \tag{10-40}
\]
Initially, gNH (x, y) and gNL(x, y) are set to 0. After thresholding, gNH (x, y) will usu-
ally have fewer nonzero pixels than gNL(x, y), but all the nonzero pixels in gNH (x, y)
will be contained in gNL(x, y) because the latter image is formed with a lower thresh-
old. We eliminate from gNL(x, y) all the nonzero pixels from gNH (x, y) by letting
\[
g_{NL}(x, y) = g_{NL}(x, y) - g_{NH}(x, y) \tag{10-41}
\]
The nonzero pixels in gNH (x, y) and gNL(x, y) may be viewed as being “strong”
and “weak” edge pixels, respectively. After the thresholding operations, all strong
pixels in gNH (x, y) are assumed to be valid edge pixels, and are so marked imme-
diately. Depending on the value of TH , the edges in gNH (x, y) typically have gaps.
Longer edges are formed using the following procedure:
(a) Locate the next unvisited edge pixel, p, in gNH (x, y).
(b) Mark as valid edge pixels all the weak pixels in gNL(x, y) that are connected to
p using, say, 8-connectivity.
(c) If all nonzero pixels in gNH (x, y) have been visited go to Step (d). Else, return
to Step (a).
(d) Set to zero all pixels in gNL(x, y) that were not marked as valid edge pixels.
At the end of this procedure, the final image output by the Canny algorithm is
formed by appending to gNH (x, y) all the nonzero pixels from gNL(x, y).
We used two additional images, gNH (x, y) and gNL(x, y) to simplify the discussion.
In practice, hysteresis thresholding can be implemented directly during nonmaxima
suppression, and thresholding can be implemented directly on gN (x, y) by forming a
list of strong pixels and the weak pixels connected to them.
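A sketch of hysteresis thresholding expressed with connected components, which is equivalent to growing the strong pixels through 8-connected weak pixels; hysteresis_threshold is an illustrative name and SciPy's label is assumed:

```python
import numpy as np
from scipy.ndimage import label

def hysteresis_threshold(gN, TL, TH):
    """Keep pixels above TH ("strong") plus any pixel above TL ("weak") that
    belongs to an 8-connected component containing a strong pixel."""
    strong = gN > TH
    weak_or_strong = gN > TL
    labels, n = label(weak_or_strong, structure=np.ones((3, 3), dtype=int))
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True   # components touching a strong pixel
    keep[0] = False                          # background label
    return keep[labels]
```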
Summarizing, the Canny edge detection algorithm consists of the following steps:
1. Smooth the input image with a Gaussian filter.
2. Compute the gradient magnitude and angle images.
3. Apply nonmaxima suppression to the gradient magnitude image.
4. Use double thresholding and connectivity analysis to detect and link edges.
Although the edges after nonmaxima suppression are thinner than raw gradient edg-
es, the former can still be thicker than one pixel. To obtain edges one pixel thick, it is
typical to follow Step 4 with one pass of an edge-thinning algorithm (see Section 9.5).
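Chaining the pieces sketched above gives the following outline of the four steps (sobel_gradient, nonmaxima_suppression, and hysteresis_threshold refer to the earlier illustrative helpers, not library functions; the default parameters mirror the example below):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def canny_sketch(f, sigma=4.0, TL=0.04, TH=0.10):
    fs = gaussian_filter(f.astype(float), sigma)   # Step 1: Gaussian smoothing
    M, alpha = sobel_gradient(fs)                  # Step 2: magnitude and angle
    M = M / M.max()                                # scale to [0, 1] for thresholding
    gN = nonmaxima_suppression(M, alpha)           # Step 3: thin the ridges
    return hysteresis_threshold(gN, TL, TH)        # Step 4: hysteresis + linking
```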
As mentioned earlier, smoothing is accomplished by convolving the input image
with a Gaussian kernel whose size, n × n, must be chosen. Once a value of σ has
been specified, we can use the approach discussed in connection with the Marr-Hil-
dreth algorithm to determine an odd value of n that provides the "full" smoothing
capability of the Gaussian filter for the specified value of σ. (Usually, selecting a
suitable value of σ for the first time in an application requires experimentation.)
Some final comments on implementation: As noted earlier in the discussion of
the Marr-Hildreth edge detector, the 2-D Gaussian function in Eq. (10-35) is sepa-
rable into a product of two 1-D Gaussians. Thus, Step 1 of the Canny algorithm can
be formulated as 1-D convolutions that operate on the rows (columns) of the image
one at a time, and then work on the columns (rows) of the result. Furthermore, if
we use the approximations in Eqs. (10-19) and (10-20), we can also implement the
gradient computations required for Step 2 as 1-D convolutions (see Problem 10.22).
Figure 10.25(a) shows the familiar building image. For comparison, Figs. 10.25(b) and (c) show, respec-
tively, the result in Fig. 10.20(b) obtained using the thresholded gradient, and Fig. 10.22(d) using the
Marr-Hildreth detector. Recall that the parameters used in generating those two images were selected
to detect the principal edges, while attempting to reduce “irrelevant” features, such as the edges of the
bricks and the roof tiles.
Figure 10.25(d) shows the result obtained with the Canny algorithm using the parameters TL = 0.04,
TH = 0.10 (2.5 times the value of the low threshold), σ = 4, and a kernel of size 25 × 25, which cor-
responds to the smallest odd integer not less than 6σ. These parameters were chosen experimentally
c d
FIGURE 10.25
(a) Original image
of size 834 × 1114
pixels, with
intensity values
scaled to the range
[0, 1].
(b) Thresholded
gradient of the
smoothed image.
(c) Image obtained
using the
Marr-Hildreth
algorithm.
(d) Image obtained
using the Canny
algorithm. Note the
significant
improvement of
the Canny image
compared to the
other two.
to achieve the objectives stated in the previous paragraph for the gradient and Marr-Hildreth images.
Comparing the Canny image with the other two images, we see in the Canny result significant improve-
ments in detail of the principal edges and, at the same time, more rejection of irrelevant features. For
example, note that both edges of the concrete band lining the bricks in the upper section of the image
were detected by the Canny algorithm, whereas the thresholded gradient lost both of these edges, and
the Marr-Hildreth method detected only the upper one. In terms of filtering out irrelevant detail, the
Canny image does not contain a single edge due to the roof tiles; this is not true in the other two images.
The quality of the lines with regard to continuity, thinness, and straightness is also superior in the Canny
image. Results such as these have made the Canny algorithm a tool of choice for edge detection.
As another comparison of the three principal edge-detection methods discussed in this section, consider
Fig. 10.26(a), which shows a 512 × 512 head CT image. Our objective is to extract the edges of the outer
contour of the brain (the gray region in the image), the contour of the spinal region (shown directly
behind the nose, toward the front of the brain), and the outer contour of the head. We wish to generate
the thinnest, continuous contours possible, while eliminating edge details related to the gray content in
the eyes and brain areas.
Figure 10.26(b) shows a thresholded gradient image that was first smoothed using a 5 × 5 averaging
kernel. The threshold required to achieve the result shown was 15% of the maximum value of the gradi-
ent image. Figure 10.26(c) shows the result obtained with the Marr-Hildreth edge-detection algorithm
with a threshold of 0.002, σ = 3, and a kernel of size 19 × 19. Figure 10.26(d) was obtained using the
Canny algorithm with TL = 0.05, TH = 0.15 (3 times the value of the low threshold), σ = 2, and a kernel
of size 13 × 13.
c d
FIGURE 10.26
(a) Head CT image
of size 512 × 512
pixels, with
intensity values
scaled to the range
[0, 1].
(b) Thresholded
gradient of the
smoothed image.
(c) Image obtained
using the Marr-Hil-
dreth algorithm.
(d) Image obtained
using the Canny
algorithm.
(Original image
courtesy of Dr.
David R. Pickens,
Vanderbilt
University.)
In terms of edge quality and the ability to eliminate irrelevant detail, the results in Fig. 10.26 correspond
closely to the results and conclusions in the previous example. Note also that the Canny algorithm was
the only procedure capable of yielding a totally unbroken edge for the posterior boundary of the brain,
and the closest boundary of the spinal cord. It was also the only procedure capable of finding the cleanest
contours, while eliminating all the edges associated with the gray brain matter in the original image.
The price paid for the improved performance of the Canny algorithm is a sig-
nificantly more complex implementation than the two approaches discussed earlier.
In some applications, such as real-time industrial image processing, cost and speed
requirements usually dictate the use of simpler techniques, principally the thresh-
olded gradient approach. When edge quality is the driving force, the Marr-Hildreth
and Canny algorithms, especially the latter, offer superior alternatives.
Local Processing
A simple approach for linking edge points is to analyze the characteristics of pixels
in a small neighborhood about every point (x, y) that has been declared an edge
point by one of the techniques discussed in the preceding sections. All points that
are similar according to predefined criteria are linked, forming an edge of pixels that
share common properties according to the specified criteria.
The two principal properties used for establishing similarity of edge pixels in this
kind of local analysis are (1) the strength (magnitude) and (2) the direction of the
gradient vector. The first property is based on Eq. (10-17). Let Sxy denote the set of
coordinates of a neighborhood centered at point (x, y) in an image. An edge pixel
with coordinates (s, t) in Sxy is similar in magnitude to the pixel at (x, y) if

\[
|M(s, t) - M(x, y)| \le T_M
\]

where TM is a positive threshold.
The direction angle of the gradient vector is given by Eq. (10-18). An edge pixel
with coordinates (s, t) in Sxy has an angle similar to the pixel at (x, y) if

\[
|a(s, t) - a(x, y)| \le A
\]
where A is a positive angle threshold. As noted earlier, the direction of the edge at
(x, y) is perpendicular to the direction of the gradient vector at that point.
A pixel with coordinates (s, t) in Sxy is considered to be linked to the pixel at (x, y)
if both magnitude and direction criteria are satisfied. This process is repeated for
every edge pixel. As the center of the neighborhood is moved from pixel to pixel, a
record of linked points is kept. A simple bookkeeping procedure is to assign a dif-
ferent intensity value to each set of linked edge pixels.
The preceding formulation is computationally expensive because all neighbors of
every point have to be examined. A simplification particularly well suited for real
time applications consists of the following steps:
1. Compute the gradient magnitude and angle arrays, M(x, y) and a(x, y), of the
input image, f (x, y).
2. Form a binary image, g(x, y), whose value at any point (x, y) is given by:

\[
g(x, y) = \begin{cases} 1 & \text{if } M(x, y) > T_M \text{ and } a(x, y) = A \pm T_A \\ 0 & \text{otherwise} \end{cases}
\]

   where TM is a threshold, A is a specified angle direction, and ±TA defines a
   "band" of acceptable directions about A.
3. Scan the rows of g and fill (set to 1) all gaps (sets of 0's) in each row that do
   not exceed a specified length, K. By definition, a gap is bounded at both ends by 1's.
4. To detect gaps in any other direction, rotate g by that angle, apply the horizontal
   scanning procedure in Step 3, and rotate the result back.
When interest lies in horizontal and vertical edge linking, Step 4 becomes a simple
procedure in which g is rotated ninety degrees, the rows are scanned, and the result
is rotated back. This is the application found most frequently in practice and, as the
following example shows, this approach can yield good results. In general, image
rotation is an expensive computational process so, when linking in numerous angle
directions is required, it is more practical to combine Steps 3 and 4 into a single,
radial scanning procedure.
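A sketch of Steps 2 and 3 for the horizontal case, assuming NumPy and angles in degrees; link_edges_horizontal is an illustrative name, and the simple |alpha − A| comparison ignores angle wraparound for brevity:

```python
import numpy as np

def link_edges_horizontal(M, alpha, TM, A, TA, K):
    """Form the binary image g from the magnitude/angle criteria, then fill
    (set to 1) horizontal gaps of K or fewer pixels in each row of g."""
    g = ((M > TM) & (np.abs(alpha - A) <= TA)).astype(np.uint8)
    for row in g:
        ones = np.flatnonzero(row)
        for a, b in zip(ones[:-1], ones[1:]):   # consecutive edge pixels in the row
            if 1 < b - a <= K + 1:              # gap of K or fewer zeros between them
                row[a:b] = 1
    return g
```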
Figure 10.27(a) shows a 534 × 566 image of the rear of a vehicle. The objective of this example is to
illustrate the use of the preceding algorithm for finding rectangles whose sizes makes them suitable
candidates for license plates. The formation of these rectangles can be accomplished by detecting
a b c
de f
FIGURE 10.27
(a) Image of the rear
of a vehicle.
(b) Gradient magni-
tude image.
(c) Horizontally
connected edge
pixels.
(d) Vertically con-
nected edge pixels.
(e) The logical OR
of (c) and (d).
(f) Final result,
using morphological
thinning. (Original
image courtesy of
Perceptics
Corporation.)
strong horizontal and vertical edges. Figure 10.27(b) shows the gradient magnitude image, M(x, y), and
Figs. 10.27(c) and (d) show the result of Steps 3 and 4 of the algorithm, obtained by letting TM equal
30% of the maximum gradient value, A = 90°, TA = 45°, and filling all gaps of 25 or fewer pixels
(approximately 5% of the image width). A large range of allowable angle directions was required to
detect the rounded corners of the license plate enclosure, as well as the rear windows of the vehicle.
Figure 10.27(e) is the result of forming the logical OR of the two preceding images, and Fig. 10.27(f)
was obtained by thinning 10.27(e) with the thinning procedure discussed in Section 9.5. As Fig. 10.27(f)
shows, the rectangle corresponding to the license plate was clearly detected in the image. It would be a simple matter to isolate the license plate from all the rectangles in the image, using the fact that the width-to-height ratio of license plates has distinctive proportions (e.g., a 2:1 ratio in U.S. plates).
Global Processing Using the Hough Transform
An alternative to local analysis is to link edge points by first determining whether they lie on a curve of specified shape. In principle, one could find all lines determined by every pair of edge points and then find all subsets of points that are close to particular lines; but for n points this involves on the order of n² lines and n³ comparisons of every point to all lines. This is a computationally prohibitive task in most applications.
Hough [1962] proposed an alternative approach, commonly referred to as the Hough transform. Let (xi, yi) denote a point in the xy-plane and consider the general equation of a straight line in slope-intercept form: yi = a xi + b. Infinitely many lines pass through (xi, yi), but they all satisfy the equation yi = a xi + b for varying values of a and b. However, writing this equation as b = −xi a + yi and considering the ab-plane (also called parameter space) yields the equation of a single line for a fixed point (xi, yi). Furthermore, a second point (xj, yj) also has a single line in parameter space associated with it, which intersects the line associated with (xi, yi) at some point (a′, b′) in parameter space, where a′ is the slope and b′ the intercept of the line containing both (xi, yi) and (xj, yj) in the xy-plane (we are assuming, of course, that the lines are not parallel). In fact, all points on this line have lines in parameter space that intersect at (a′, b′). Figure 10.28 illustrates these concepts. (The original formulation of the Hough transform presented here works with straight lines; for a generalization to arbitrary shapes, see Ballard [1981].)
In principle, the parameter-space lines corresponding to all points (xk, yk) in the xy-plane could be plotted, and the principal lines in that plane could be found by identifying points in parameter space where large numbers of parameter-space lines intersect. However, a difficulty with this approach is that a (the slope of a line) approaches infinity as the line approaches the vertical direction. One way around this difficulty is to use the normal representation of a line:

$x\cos\theta + y\sin\theta = \rho$   (10-44)
FIGURE 10.28 (a) xy-plane. (b) Parameter space (ab-plane), in which the lines b = −xi a + yi and b = −xj a + yj intersect at (a′, b′).
FIGURE 10.29 (a) (ρ, θ) parameterization of a line in the xy-plane: x cos θ + y sin θ = ρ. (b) Sinusoidal curves in the ρθ-plane; the point of intersection (ρ′, θ′) corresponds to the line passing through points (xi, yi) and (xj, yj) in the xy-plane. (c) Division of the ρθ-plane into accumulator cells.
Figure 10.30 illustrates the Hough transform based on Eq. (10-44). Figure 10.30(a) shows an image of size M × M (M = 101) with five labeled white points, and Fig. 10.30(b) shows each of these points mapped onto the ρθ-plane using subdivisions of one unit for the ρ and θ axes. The range of θ values is ±90°, and the range of ρ values is ±√2·M. As Fig. 10.30(b) shows, each curve has a different sinusoidal shape. The horizontal line resulting from the mapping of point 1 is a sinusoid of zero amplitude.
The points labeled A (not to be confused with accumulator values) and B in Fig. 10.30(b) illustrate the colinearity detection property of the Hough transform. For example, point A marks the intersection of three of the curves; its location indicates that the corresponding three points lie on a straight line passing through the origin (ρ = 0) and oriented at +45° [see Fig. 10.29(a)]. Similarly, the curves intersecting at point B in parameter space indicate that points 2, 3, and 4 lie on a straight line oriented at −45°, and whose distance from the origin is ρ = 71 (one-half the diagonal distance from the origin of the image to the opposite corner, rounded to the nearest integer
FIGURE 10.30 (a) Image of size 101 × 101 pixels, containing five white points (four in the corners and one in the center). (b) Corresponding parameter space.
value). Finally, the points labeled Q, R, and S in Fig. 10.30(b) illustrate the fact that the Hough transform exhibits a reflective adjacency relationship at the right and left edges of the parameter space. This property is the result of the manner in which ρ and θ change sign at the ±90° boundaries.
Although the focus thus far has been on straight lines, the Hough transform is
applicable to any function of the form g(v, c) = 0, where v is a vector of coordinates and c is a vector of coefficients. For example, points lying on the circle

$(x - c_1)^2 + (y - c_2)^2 = c_3^2$   (10-45)
can be detected by using the basic approach just discussed. The difference is the
presence of three parameters c1, c2 , and c3 that result in a 3-D parameter space with
cube-like cells, and accumulators of the form A(i, j, k). The procedure is to incre-
ment c1 and c2 , solve for the value of c3 that satisfies Eq. (10-45), and update the
accumulator cell associated with the triplet (c1, c2 , c3 ). Clearly, the complexity of the
Hough transform depends on the number of coordinates and coefficients in a given
functional representation. As noted earlier, generalizations of the Hough transform
to detect curves with no simple analytic representations are possible, as is the appli-
cation of the transform to grayscale images.
Returning to the edge-linking problem, an approach based on the Hough trans-
form is as follows:
1. Obtain a binary edge map using any of the methods discussed earlier in this section.
2. Specify subdivisions in the ρθ-plane.
3. Examine the counts of the accumulator cells for high pixel concentrations.
4. Examine the relationship (principally for continuity) between pixels in a chosen
cell.
Continuity in this case usually is based on computing the distance between discon-
nected pixels corresponding to a given accumulator cell. A gap in a line associated
with a given cell is bridged if the length of the gap is less than a specified threshold.
Being able to group lines based on direction is a global concept applicable over the
entire image, requiring only that we examine pixels associated with specific accumu-
lator cells. The following example illustrates these concepts.
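As an illustration of the accumulator-cell procedure, here is a minimal NumPy sketch of the line Hough transform of Eq. (10-44), assuming a binary edge map as input. The function name and the choice of cell sizes are ours; peaks in the returned array correspond to prominent lines, and the pixels that voted for a chosen cell are the ones examined for continuity in Step 4.

```python
import numpy as np

def hough_lines(edge_map, d_theta=1.0, d_rho=1.0):
    """Accumulate votes in (rho, theta) space for a binary edge map.

    theta spans [-90, 90) degrees and rho spans [-D, D], where D is the image
    diagonal, following x*cos(theta) + y*sin(theta) = rho.
    """
    ys, xs = np.nonzero(edge_map)
    rows, cols = edge_map.shape
    D = int(np.ceil(np.hypot(rows, cols)))
    thetas = np.deg2rad(np.arange(-90.0, 90.0, d_theta))
    rhos = np.arange(-D, D + d_rho, d_rho)
    acc = np.zeros((len(rhos), len(thetas)), dtype=np.int64)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in zip(xs, ys):
        # rho of the sinusoid for this point at every theta; vote in the nearest cell
        r = x * cos_t + y * sin_t
        idx = np.round((r - rhos[0]) / d_rho).astype(int)
        acc[idx, np.arange(len(thetas))] += 1
    return acc, rhos, thetas
```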
Figure 10.31(a) shows an aerial image of an airport. The objective of this example is to use the Hough
transform to extract the two edges defining the principal runway. A solution to such a problem might be
of interest, for instance, in applications involving autonomous air navigation.
The first step is to obtain an edge map. Figure 10.31(b) shows the edge map obtained using Canny’s
algorithm with the same parameters and procedure used in Example 10.9. For the purpose of computing
the Hough transform, similar results can be obtained using any of the other edge-detection techniques
discussed earlier. Figure 10.31(c) shows the Hough parameter space obtained using 1° increments for θ and one-pixel increments for ρ.
The runway of interest is oriented approximately 1° off the north direction, so we select the cells corresponding to ±90° and containing the highest count, because the runways are the longest lines oriented in these directions. The small boxes on the edges of Fig. 10.31(c) highlight these cells. As mentioned earlier in connection with Fig. 10.30(b), the Hough transform exhibits adjacency at the edges. Another way of interpreting this property is that a line oriented at +90° and a line oriented at −90° are equivalent (i.e.,
they are both vertical). Figure 10.31(d) shows the lines corresponding to the two accumulator cells just
discussed, and Fig. 10.31(e) shows the lines superimposed on the original image. The lines were obtained
by joining all gaps not exceeding 20% (approximately 100 pixels) of the image height. These lines clearly
correspond to the edges of the runway of interest.
Note that the only information needed to solve this problem was the orientation of the runway and
the observer’s position relative to it. In other words, a vehicle navigating autonomously would know
that if the runway of interest faces north, and the vehicle’s direction of travel also is north, the runway
should appear vertically in the image. Other relative orientations are handled in a similar manner. The
FIGURE 10.31 (a) A 502 × 564 aerial image of an airport. (b) Edge map obtained using Canny’s algorithm. (c) Hough
parameter space (the boxes highlight the points associated with long vertical lines). (d) Lines in the image plane
corresponding to the points highlighted by the boxes. (e) Lines superimposed on the original image.
orientations of runways throughout the world are available in flight charts, and the direction of travel
is easily obtainable using GPS (Global Positioning System) information. This information also could be
used to compute the distance between the vehicle and the runway, thus allowing estimates of param-
eters such as expected length of lines relative to image size, as we did in this example.
10.3 THRESHOLDING
Because of its intuitive properties, simplicity of implementation, and computational
speed, image thresholding enjoys a central position in applications of image segmen-
tation. Thresholding was introduced in Section 3.1, and we have used it in various
discussions since then. In this section, we discuss thresholding in a more formal way,
and develop techniques that are considerably more general than what has been pre-
sented thus far.
FOUNDATION
In the previous section, regions were identified by first finding edge segments,
then attempting to link the segments into boundaries. In this section, we discuss
techniques for partitioning images directly into regions based on intensity values
and/or properties of these values.
Suppose that the intensity histogram in Fig. 10.32(a) corresponds to an image, f(x, y), composed of light objects on a dark background, such that object and background pixels have intensity values grouped into two dominant modes. One obvious way to extract the objects from the background is to select a threshold, T, that separates these modes; the segmented image, g(x, y), is then given by

$g(x, y) = 1$ if $f(x, y) > T$; $\;g(x, y) = 0$ if $f(x, y) \le T$   (10-46)

When T is a constant applicable over an entire image, the process given in this equa-
tion is referred to as global thresholding. When the value of T changes over an image,
we use the term variable thresholding. The terms local or regional thresholding are
used sometimes to denote variable thresholding in which the value of T at any point
(x, y) in an image depends on properties of a neighborhood of (x, y) (for example,
the average intensity of the pixels in the neighborhood). If T depends on the spa-
tial coordinates (x, y) themselves, then variable thresholding is often referred to as
dynamic or adaptive thresholding. Use of these terms is not universal.
Figure 10.32(b) shows a more difficult thresholding problem involving a histo-
gram with three dominant modes corresponding, for example, to two types of light
objects on a dark background. Here, multiple thresholding classifies a point (x, y) as belonging to the background if f(x, y) ≤ T1, to one object class if T1 < f(x, y) ≤ T2, and to the other object class if f(x, y) > T2. That is, the segmented image is given by

$g(x, y) = a$ if $f(x, y) > T_2$; $\;b$ if $T_1 < f(x, y) \le T_2$; $\;c$ if $f(x, y) \le T_1$   (10-47)
FIGURE 10.32 Intensity histograms that can be partitioned (a) by a single threshold, T, and (b) by dual thresholds, T1 and T2.
where a, b, and c are any three distinct intensity values. We will discuss dual threshold-
ing later in this section. Segmentation problems requiring more than two thresholds
are difficult (or often impossible) to solve, and better results usually are obtained using
other methods, such as variable thresholding, as will be discussed later in this section,
or region growing, as we will discuss in Section 10.4.
Based on the preceding discussion, we may infer intuitively that the success of
intensity thresholding is related directly to the width and depth of the valley(s) sepa-
rating the histogram modes. In turn, the key factors affecting the properties of the
valley(s) are: (1) the separation between peaks (the further apart the peaks are, the
better the chances of separating the modes); (2) the noise content in the image (the
modes broaden as noise increases); (3) the relative sizes of objects and background;
(4) the uniformity of the illumination source; and (5) the uniformity of the reflectance
properties of the image.
FIGURE 10.33 (a) Noiseless 8-bit image. (b) Image with additive Gaussian noise of mean 0 and standard deviation of 10 intensity levels. (c) Image with additive Gaussian noise of mean 0 and standard deviation of 50 intensity levels. (d) through (f) Corresponding histograms.
[see Fig. 10.33(e)], but their separation is enough so that the depth of the valley
between them is sufficient to make the modes easy to separate. A threshold placed
midway between the two peaks would do the job. Figure 10.33(c) shows the result
of corrupting the image with Gaussian noise of zero mean and a standard deviation
of 50 intensity levels. As the histogram in Fig. 10.33(f) shows, the situation is much
more serious now, as there is no way to differentiate between the two modes. With-
out additional processing (such as the methods discussed later in this section) we
have little hope of finding a suitable threshold for segmenting this image.
FIGURE 10.34 (a) Noisy image. (b) Intensity ramp in the range [0.2, 0.6]. (c) Product of (a) and (b). (d) through (f) Corresponding histograms.
perfectly uniform, but the reflectance of the image was not, as a result, for example,
of natural reflectivity variations in the surface of objects and/or background.
The important point is that illumination and reflectance play a central role in the
success of image segmentation using thresholding or other segmentation techniques.
Therefore, controlling these factors when possible should be the first step consid-
ered in the solution of a segmentation problem. There are three basic approaches
to the problem when control over these factors is not possible. The first is to correct
the shading pattern directly. For example, nonuniform (but fixed) illumination can
be corrected by multiplying the image by the inverse of the pattern, which can be
obtained by imaging a flat surface of constant intensity. The second is to attempt
to correct the global shading pattern via processing using, for example, the top-hat
transformation introduced in Section 9.8. The third approach is to “work around”
nonuniformities using variable thresholding, as discussed later in this section.
FIGURE 10.35 (a) Noisy fingerprint. (b) Histogram. (c) Segmented result using a global threshold (thin image border added for clarity). (Original image courtesy of the National Institute of Standards and Technology.)
The initial threshold must be chosen greater than the minimum and less than the maximum intensity level in the image (the average intensity of the image is a good initial choice for T). If this condition is met, the algorithm converges in a finite number of steps, whether or not the modes are separable (see Problem 10.30).
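The basic global algorithm referred to in this discussion is iterative: start from an initial estimate of T (the average image intensity is a good choice), split the pixels into the two groups defined by T, recompute T as the midpoint of the two group means, and stop when successive values of T change by less than ΔT. A minimal sketch, with our own function name and no claim of matching the book's exact pseudocode:

```python
import numpy as np

def basic_global_threshold(f, dT=0.5):
    """Iterative global threshold: T converges to the midpoint of the class means."""
    T = f.mean()                                  # initial estimate
    while True:
        low, high = f[f <= T], f[f > T]           # split pixels at the current T
        # midpoint of the two class means (guard against an empty class)
        T_new = 0.5 * (low.mean() + high.mean()) if low.size and high.size else T
        if abs(T_new - T) <= dT:
            return T_new
        T = T_new
```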
Figure 10.35 shows an example of segmentation using the preceding iterative algorithm. Figure 10.35(a) is the original image and Fig. 10.35(b) is the image histogram, showing a distinct valley. Application of the basic global algorithm resulted in the threshold T = 125.4 after three iterations, starting with T equal to the average intensity of the image, and using ΔT = 0. Figure 10.35(c) shows the result obtained using T = 125 to segment the original image. As expected from the clear separation of modes in the histogram, the segmentation between object and background was perfect.
OPTIMUM GLOBAL THRESHOLDING USING OTSU’S METHOD
Thresholding may be viewed as a statistical-decision problem whose objective is to minimize the average error incurred in assigning pixels to two or more groups (classes). Otsu’s method (Otsu [1979]) is an attractive approach: it is optimum in the sense that it maximizes the between-class variance, a well-known measure used in statistical discriminant analysis. The basic idea is that properly thresholded classes should be distinct with respect to the intensity values of their pixels and, conversely, that a threshold giving the best separation between classes in terms of their intensity values would be the best (optimum) threshold. In addition to its optimality, Otsu’s method has the important property that it is based entirely on computations performed on the histogram of an image, an easily obtainable 1-D array (see Section 3.3).
Let {0, 1, 2, …, L − 1} denote the set of L distinct integer intensity levels in a digital image of size M × N pixels, and let ni denote the number of pixels with intensity i. The total number, MN, of pixels in the image is MN = n0 + n1 + n2 + ⋯ + nL−1. The normalized histogram (see Section 3.3) has components pi = ni / MN, from which it follows that

$\sum_{i=0}^{L-1} p_i = 1, \qquad p_i \ge 0$   (10-48)
Now suppose that we select a threshold T(k) = k, 0 < k < L − 1, and use it to threshold the input image into two classes, c1 and c2, where c1 consists of all the pixels with intensity values in the range [0, k], and c2 consists of the pixels with values in the range [k + 1, L − 1]. Using this threshold, the probability, P1(k), that a pixel is assigned to (i.e., thresholded into) class c1 is given by the cumulative sum

$P_1(k) = \sum_{i=0}^{k} p_i$   (10-49)
Viewed another way, this is the probability of class c1 occurring. For example, if we set k = 0, the probability of class c1 having any pixels assigned to it is zero. Similarly, the probability of class c2 occurring is

$P_2(k) = \sum_{i=k+1}^{L-1} p_i = 1 - P_1(k)$   (10-50)

The mean intensity value of the pixels in class c1 is

$m_1(k) = \sum_{i=0}^{k} i\,P(i \mid c_1) = \sum_{i=0}^{k} i\,\frac{P(c_1 \mid i)\,P(i)}{P(c_1)} = \frac{1}{P_1(k)} \sum_{i=0}^{k} i\,p_i$   (10-51)
where P1(k) is given by Eq. (10-49). The term P(i | c1) in Eq. (10-51) is the probability of intensity value i, given that i comes from class c1. The second form of the equation follows from Bayes’ formula,

$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$

and the last form follows from the fact that P(c1 | i), the probability of c1 given i, is 1 because we are dealing only with values of i from class c1. Also, P(i) is the probability of the ith value, which is the ith component of the histogram, pi. Finally, P(c1) is the probability of class c1 which, from Eq. (10-49), is equal to P1(k).
Similarly, the mean intensity value of the pixels assigned to class c2 is

$m_2(k) = \sum_{i=k+1}^{L-1} i\,P(i \mid c_2) = \frac{1}{P_2(k)} \sum_{i=k+1}^{L-1} i\,p_i$   (10-52)

The cumulative mean (average intensity) up to level k is given by

$m(k) = \sum_{i=0}^{k} i\,p_i$   (10-53)

and the average intensity of the entire image (i.e., the global mean) is given by

$m_G = \sum_{i=0}^{L-1} i\,p_i$   (10-54)
The validity of the following two equations can be verified by direct substitution of the preceding results:

$P_1 m_1 + P_2 m_2 = m_G$   (10-55)

and

$P_1 + P_2 = 1$   (10-56)

where we have omitted the ks temporarily in favor of notational clarity. In order to evaluate the goodness of the threshold at level k, we use the normalized, dimensionless measure

$\eta = \sigma_B^2 / \sigma_G^2$   (10-57)

where σG² is the global variance [i.e., the intensity variance of all the pixels in the image, as given in Eq. (3-26)],

$\sigma_G^2 = \sum_{i=0}^{L-1} (i - m_G)^2\,p_i$   (10-58)

and σB² is the between-class variance, defined as

$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2$   (10-59)
Expression (10-59) can also be written as

$\sigma_B^2 = P_1 P_2 (m_1 - m_2)^2 = \frac{(m_G P_1 - m)^2}{P_1 (1 - P_1)}$   (10-60)

The first form of this equation follows from Eqs. (10-55), (10-56), and (10-59). The second form follows from Eqs. (10-50) through (10-54). This form is slightly more efficient computationally because the global mean, mG, is computed only once, so only two parameters, m and P1, need to be computed for any value of k.
The first form of Eq. (10-60) indicates that the farther the two means m1 and m2 are from each other, the larger σB² will be, implying that the between-class variance is a measure of separability between classes. Because σG² is a constant, it follows that η is also a measure of separability, and maximizing it is equivalent to maximizing σB². The objective, then, is to determine the threshold value, k, that maximizes the between-class variance. Reintroducing k, we have

$\eta(k) = \sigma_B^2(k) / \sigma_G^2$   (10-61)

and

$\sigma_B^2(k) = \frac{\left[m_G P_1(k) - m(k)\right]^2}{P_1(k)\left[1 - P_1(k)\right]}$   (10-62)
Then, the optimum threshold is the value, k*, that maximizes σB²(k):

$\sigma_B^2(k^*) = \max_{0 \le k \le L-1} \sigma_B^2(k)$   (10-63)
To find k* we simply evaluate this equation for all integer values of k [subject to the condition 0 < P1(k) < 1] and select the value of k that yields the maximum σB²(k). If the maximum exists for more than one value of k, it is customary to average the various values of k for which σB²(k) is maximum. It can be shown (see Problem 10.36) that a maximum always exists, subject to the condition 0 < P1(k) < 1. Evaluat-
ing Eqs. (10-62) and (10-63) for all values of k is a relatively inexpensive computa-
tional procedure, because the maximum number of integer values that k can have
is L, which is only 256 for 8-bit images.
Once k* has been obtained, the input image f(x, y) is segmented as before:

$g(x, y) = 1$ if $f(x, y) > k^*$; $\;g(x, y) = 0$ if $f(x, y) \le k^*$   (10-64)
In general, the measure in Eq. (10-61) has values in the range

$0 \le \eta(k) \le 1$   (10-65)

for values of k in the range [0, L − 1]. When evaluated at the optimum threshold k*, this measure is a quantitative estimate of the separability of classes, which in turn gives us an idea of the accuracy of thresholding a given image with k*. The lower bound in Eq. (10-65) is attainable only by images with a single, constant intensity level. The upper bound is attainable only by two-valued images with intensities equal to 0 and L − 1 (see Problem 10.37).
Otsu’s algorithm may be summarized as follows:
1. Compute the normalized histogram of the input image. Denote the components of the histogram by pi, i = 0, 1, 2, …, L − 1.
2. Compute the cumulative sums, P1(k), for k = 0, 1, 2, …, L − 1, using Eq. (10-49).
3. Compute the cumulative means, m(k), for k = 0, 1, 2, …, L − 1, using Eq. (10-53).
4. Compute the global mean, mG, using Eq. (10-54).
5. Compute the between-class variance term, σB²(k), for k = 0, 1, 2, …, L − 1, using Eq. (10-62).
6. Obtain the Otsu threshold, k*, as the value of k for which σB²(k) is maximum. If the maximum is not unique, obtain k* by averaging the values of k corresponding to the various maxima detected.
7. Compute the global variance, σG², using Eq. (10-58), and then obtain the separability measure, η*, by evaluating Eq. (10-61) with k = k*.
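The seven steps above translate almost directly into array operations on the histogram. The following sketch (NumPy; the function name is ours, and the input is assumed to be an integer image with values in [0, L − 1]) returns both the Otsu threshold k* and the separability measure η*:

```python
import numpy as np

def otsu_threshold(f, L=256):
    """Otsu's method following Steps 1-7: returns (k_star, eta_star)."""
    hist = np.bincount(f.ravel(), minlength=L).astype(float)
    p = hist / hist.sum()                       # Step 1: normalized histogram
    P1 = np.cumsum(p)                           # Step 2: cumulative sums P1(k)
    m = np.cumsum(np.arange(L) * p)             # Step 3: cumulative means m(k)
    mG = m[-1]                                  # Step 4: global mean
    num = (mG * P1 - m) ** 2
    den = P1 * (1.0 - P1)
    # Step 5: Eq. (10-62), defined only where 0 < P1(k) < 1
    sigma_b2 = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)
    # Step 6: average the k values if the maximum is not unique
    k_star = int(np.mean(np.flatnonzero(sigma_b2 == sigma_b2.max())))
    # Step 7: global variance, Eq. (10-58), and separability measure
    sigma_g2 = np.sum(((np.arange(L) - mG) ** 2) * p)
    eta_star = sigma_b2[k_star] / sigma_g2 if sigma_g2 > 0 else 0.0
    return k_star, eta_star

# Usage sketch: k, eta = otsu_threshold(img); g = (img > k).astype(np.uint8)
```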
Figure 10.36(a) shows an optical microscope image of polymersome cells. These are cells artificially engi-
neered using polymers. They are invisible to the human immune system and can be used, for example,
to deliver medication to targeted regions of the body. Figure 10.36(b) shows the image histogram. The
objective of this example is to segment the molecules from the background. Figure 10.36(c) is the result
of using the basic global thresholding algorithm discussed earlier. Because the histogram has no distinct
valleys and the intensity difference between the background and objects is small, the algorithm failed to
achieve the desired segmentation. Figure 10.36(d) shows the result obtained using Otsu’s method. This
result obviously is superior to Fig. 10.36(c). The threshold value computed by the basic algorithm was
169, while the threshold computed by Otsu’s method was 182, which is closer to the lighter areas in the
image defining the cells. The separability measure η* was 0.467.
As a point of interest, applying Otsu’s method to the fingerprint image in Example 10.13 yielded a
threshold of 125 and a separability measure of 0.944. The threshold is identical to the value (rounded to
the nearest integer) obtained with the basic algorithm. This is not unexpected, given the nature of the
histogram. In fact, the separability measure is high because of the relatively large separation between
modes and the deep valley between them.
FIGURE 10.36 (a) Original image. (b) Histogram (high peaks were clipped to highlight details in the lower values). (c) Segmentation result using the basic global algorithm from Section 10.3. (d) Result using Otsu’s method. (Original image courtesy of Professor Daniel A. Hammer, the University of Pennsylvania.)
FIGURE 10.37 (a) Noisy image from Fig. 10.33(c) and (b) its histogram. (c) Result obtained using Otsu’s method. (d) Noisy image smoothed using a 5 × 5 averaging kernel and (e) its histogram. (f) Result of thresholding using Otsu’s method.
Next, we investigate the effect of severely reducing the size of the foreground
region with respect to the background. Figure 10.38(a) shows the result. The noise in
this image is additive Gaussian noise with zero mean and a standard deviation of 10
intensity levels (as opposed to 50 in the previous example). As Fig. 10.38(b) shows,
the histogram has no clear valley, so we would expect segmentation to fail, a fact that
is confirmed by the result in Fig. 10.38(c). Figure 10.38(d) shows the image smoothed
with an averaging kernel of size 5 × 5, and Fig. 10.38(e) is the corresponding histo-
gram. As expected, the net effect was to reduce the spread of the histogram, but the
distribution still is unimodal. As Fig. 10.38(f) shows, segmentation failed again. The
reason for the failure can be traced to the fact that the region is so small that its con-
tribution to the histogram is insignificant compared to the intensity spread caused
by noise. In situations such as this, the approach discussed in the following section is
more likely to succeed.
FIGURE 10.38 (a) Noisy image and (b) its histogram. (c) Result obtained using Otsu’s method. (d) Noisy image smoothed using a 5 × 5 averaging kernel and (e) its histogram. (f) Result of thresholding using Otsu’s method. Thresholding failed in both cases to extract the object of interest. (See Fig. 10.39 for a better solution.)
USING EDGES TO IMPROVE GLOBAL THRESHOLDING
One approach for improving the shape of histograms is to consider only those pixels that lie on or near the edges between the objects and the background. An immediate and obvious improvement is that his-
tograms should be less dependent on the relative sizes of objects and background.
For instance, the histogram of an image composed of a small object on a large back-
ground area (or vice versa) would be dominated by a large peak because of the high
concentration of one type of pixels. We saw in Fig. 10.38 that this can lead to failure
in thresholding.
If only the pixels on or near the edges between objects and background were
used, the resulting histogram would have peaks of approximately the same height. In
addition, the probability that any of those pixels lies on an object would be approxi-
mately equal to the probability that it lies on the background, thus improving the
symmetry of the histogram modes. Finally, as indicated in the following paragraph,
using pixels that satisfy some simple measures based on gradient and Laplacian
operators has a tendency to deepen the valley between histogram peaks.
The approach just discussed assumes that the edges between objects and back-
ground are known. This information clearly is not available during segmentation,
as finding a division between objects and background is precisely what segmenta-
tion aims to do. However, an indication of whether a pixel is on an edge may be
obtained by computing its gradient or Laplacian. For example, the average value
of the Laplacian is 0 at the transition of an edge (see Fig. 10.10), so the valleys of
histograms formed from the pixels selected by a Laplacian criterion can be expected
to be sparsely populated. This property tends to produce the desirable deep valleys
discussed above. In practice, comparable results typically are obtained using either
the gradient or Laplacian images, with the latter being favored because it is compu-
tationally more attractive and is also created using an isotropic edge detector.
The preceding discussion is summarized in the following algorithm, where f(x, y) is the input image:
1. Compute an edge image as either the magnitude of the gradient, or the absolute value of the Laplacian, of f(x, y), using any of the methods in Section 10.2.
2. Specify a threshold value, T.
3. Threshold the image from Step 1 using T from Step 2 to produce a binary image, gT(x, y). This image is used as a mask image in the following step to select pixels from f(x, y) corresponding to “strong” edge pixels in the mask.
4. Compute a histogram using only the pixels in f(x, y) that correspond to the locations of the 1-valued pixels in gT(x, y).
5. Use the histogram from Step 4 to segment f(x, y) globally using, for example, Otsu’s method.
Note that it is possible to modify this algorithm so that both the magnitude of the gradient and the absolute value of the Laplacian images are used; in this case, we would specify a threshold for each image and form the logical OR of the two results to obtain the marker image. This approach is useful when more control is desired over the points deemed to be valid edge points.
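A compact sketch of the five steps follows, using a simple difference-based gradient as the edge image and reusing the otsu_threshold function sketched earlier (both choices are ours; the text allows any of the Section 10.2 edge detectors in Step 1 and any global method in Step 5):

```python
import numpy as np

def edge_improved_otsu(f, percentile=99.7):
    """Steps 1-5: build the histogram only from pixels flagged by strong edges."""
    # Step 1: edge image as the magnitude of the gradient (simple differences here)
    gy, gx = np.gradient(f.astype(float))
    edge = np.hypot(gx, gy)
    # Steps 2-3: threshold the edge image at a high percentile to form the mask gT
    T = np.percentile(edge, percentile)
    gT = edge > T
    # Step 4: pixels of f at the 1-valued mask locations (f assumed uint8)
    masked = f[gT]
    # Step 5: Otsu threshold computed from the histogram of those pixels only
    k, _ = otsu_threshold(masked)
    return (f > k).astype(np.uint8), k
```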
Figures 10.39(a) and (b) show the image and histogram from Fig. 10.38. You saw that this image could
not be segmented by smoothing followed by thresholding. The objective of this example is to solve the
problem using edge information. Figure 10.39(c) is the mask image, gT(x, y), formed as the gradient magnitude image thresholded at the 99.7 percentile. Figure 10.39(d) is the image formed by multiplying the
mask by the input image. Figure 10.39(e) is the histogram of the nonzero elements in Fig. 10.39(d). Note
that this histogram has the important features discussed earlier; that is, it has reasonably symmetrical
modes separated by a deep valley. Thus, while the histogram of the original noisy image offered no hope
for successful thresholding, the histogram in Fig. 10.39(e) indicates that thresholding of the small object
from the background is indeed possible. The result in Fig. 10.39(f) shows that this is the case. This image
was generated using Otsu’s method [to obtain a threshold based on the histogram in Fig. 10.39(e)], and
then applying the Otsu threshold globally to the noisy image in Fig. 10.39(a). The result is nearly perfect.
FIGURE 10.39 (a) Noisy image from Fig. 10.38(a) and (b) its histogram. (c) Mask image formed as the gradient magnitude image thresholded at the 99.7 percentile. (d) Image formed as the product of (a) and (c). (e) Histogram of the nonzero pixels in the image in (d). (f) Result of segmenting image (a) with the Otsu threshold based on the histogram in (e). The threshold was 134, which is approximately midway between the peaks in this histogram.
In this example, we consider a more complex thresholding problem. Figure 10.40(a) shows an 8-bit
image of yeast cells for which we want to use global thresholding to obtain the regions corresponding
to the bright spots. As a starting point, Fig. 10.40(b) shows the image histogram, and Fig. 10.40(c) is
the result obtained using Otsu’s method directly on the image, based on the histogram shown. We see
that Otsu’s method failed to achieve the original objective of detecting the bright spots. Although the
method was able to isolate some of the cell regions themselves, several of the segmented regions on the
right were actually joined. The threshold computed by the Otsu method was 42, and the separability
measure was 0.636.
Figure 10.40(d) shows the mask image gT (x, y) obtained by computing the absolute value of the
Laplacian image, then thresholding it with T set to 115 on an intensity scale in the range [0, 255]. This
value of T corresponds approximately to the 99.5 percentile of the values in the absolute Laplacian
image, so thresholding at this level results in a sparse set of pixels, as Fig. 10.40(d) shows. Note in this
image how the points cluster near the edges of the bright spots, as expected from the preceding dis-
cussion. Figure 10.40(e) is the histogram of the nonzero pixels in the product of (a) and (d). Finally,
Fig. 10.40(f) shows the result of globally segmenting the original image using Otsu’s method based on
the histogram in Fig. 10.40(e). This result agrees with the locations of the bright spots in the image. The
threshold computed by the Otsu method was 115, and the separability measure was 0.762, both of which
are higher than the values obtained by using the original histogram.
FIGURE 10.40 (a) Image of yeast cells. (b) Histogram of (a). (c) Segmentation of (a) with Otsu’s method using the histogram in (b). (d) Mask image formed by thresholding the absolute Laplacian image. (e) Histogram of the nonzero pixels in the product of (a) and (d). (f) Original image thresholded using Otsu’s method based on the histogram in (e). (Original image courtesy of Professor Susan L. Forsburg, University of Southern California.)
By varying the percentile at which the threshold is set, we can even improve the segmentation of the
complete cell regions. For example, Fig. 10.41 shows the result obtained using the same procedure as in
the previous paragraph, but with the threshold set at 55, which is approximately 5% of the maximum
value of the absolute Laplacian image. This value is at the 53.9 percentile of the values in that image.
This result clearly is superior to the result in Fig. 10.40(c) obtained using Otsu’s method with the histo-
gram of the original image.
MULTIPLE THRESHOLDS
Thus far, we have focused attention on image segmentation using a single global threshold. Otsu’s method can be extended to an arbitrary number of thresholds because the separability measure on which it is based also extends to an arbitrary number of classes.
FIGURE 10.41 Image in Fig. 10.40(a) segmented using the same procedure as explained in Figs. 10.40(d) through (f), but using a lower value to threshold the absolute Laplacian image.
In the case of K classes, c1, c2, …, cK, the between-class variance generalizes to

$\sigma_B^2 = \sum_{k=1}^{K} P_k (m_k - m_G)^2$   (10-66)

where

$P_k = \sum_{i \in c_k} p_i$   (10-67)

and

$m_k = \frac{1}{P_k} \sum_{i \in c_k} i\,p_i$   (10-68)
As before, mG is the global mean given in Eq. (10-54). The K classes are separated by K − 1 thresholds whose values, k1*, k2*, …, k*K−1, are the values that maximize Eq. (10-66):

$\sigma_B^2(k_1^*, k_2^*, \ldots, k_{K-1}^*) = \max_{0 < k_1 < k_2 < \cdots < k_{K-1} < L-1} \sigma_B^2(k_1, k_2, \ldots, k_{K-1})$   (10-69)
For three classes consisting of three intensity intervals (which are separated by two thresholds), the between-class variance is given by

$\sigma_B^2 = P_1 (m_1 - m_G)^2 + P_2 (m_2 - m_G)^2 + P_3 (m_3 - m_G)^2$   (10-70)

where

$P_1 = \sum_{i=0}^{k_1} p_i, \qquad P_2 = \sum_{i=k_1+1}^{k_2} p_i, \qquad P_3 = \sum_{i=k_2+1}^{L-1} p_i$   (10-71)

(Recall from the discussion of the Canny edge detector that thresholding with two thresholds is referred to as hysteresis thresholding.)
and

$m_1 = \frac{1}{P_1} \sum_{i=0}^{k_1} i\,p_i, \qquad m_2 = \frac{1}{P_2} \sum_{i=k_1+1}^{k_2} i\,p_i, \qquad m_3 = \frac{1}{P_3} \sum_{i=k_2+1}^{L-1} i\,p_i$   (10-72)
We see from Eqs. (10-71) and (10-72) that the P and m terms, and therefore σB², are functions of k1 and k2. The two optimum threshold values, k1* and k2*, are the values that maximize σB²(k1, k2). That is, as indicated in Eq. (10-69), we find the optimum thresholds by finding

$\sigma_B^2(k_1^*, k_2^*) = \max_{0 < k_1 < k_2 < L-1} \sigma_B^2(k_1, k_2)$   (10-75)
The procedure starts by selecting the first value of k1 (that value is 1 because looking for a threshold at 0 intensity makes no sense; also, keep in mind that the increment values are integers because we are dealing with integer intensity values). Next, k2 is incremented through all its values greater than k1 and less than L − 1 (i.e., k2 = k1 + 1, …, L − 2). Then, k1 is incremented to its next value and k2 is incremented again through all its values greater than k1. This procedure is repeated until k1 = L − 3. The result is a 2-D array, σB²(k1, k2), and the last step is to look for the maximum value in this array. The values of k1 and k2 corresponding to that maximum are the optimum thresholds, k1* and k2*.
If there are several maxima, the corresponding values of k1 and k2 are averaged to
obtain the final thresholds. The thresholded image is then given by
$g(x, y) = a$ if $f(x, y) \le k_1^*$; $\;b$ if $k_1^* < f(x, y) \le k_2^*$; $\;c$ if $f(x, y) > k_2^*$   (10-76)
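For K = 3, the exhaustive search over (k1, k2) implied by Eq. (10-75) is straightforward to code. The sketch below (NumPy; the function name is ours, ties between maxima are not averaged for brevity, and the inner sums could be precomputed as cumulative sums for speed) returns the two thresholds used in Eq. (10-76):

```python
import numpy as np

def dual_otsu(f, L=256):
    """Exhaustive search for the two thresholds maximizing Eq. (10-70)."""
    p = np.bincount(f.ravel(), minlength=L).astype(float)
    p /= p.sum()
    i = np.arange(L)
    mG = np.sum(i * p)                       # global mean, Eq. (10-54)
    best, k1s, k2s = -1.0, 0, 0
    for k1 in range(1, L - 2):               # k1 = 1, ..., L-3
        for k2 in range(k1 + 1, L - 1):      # k2 = k1+1, ..., L-2
            P1, P2, P3 = p[:k1+1].sum(), p[k1+1:k2+1].sum(), p[k2+1:].sum()
            if min(P1, P2, P3) == 0:         # skip empty classes
                continue
            m1 = np.sum(i[:k1+1] * p[:k1+1]) / P1
            m2 = np.sum(i[k1+1:k2+1] * p[k1+1:k2+1]) / P2
            m3 = np.sum(i[k2+1:] * p[k2+1:]) / P3
            sb2 = P1*(m1-mG)**2 + P2*(m2-mG)**2 + P3*(m3-mG)**2   # Eq. (10-70)
            if sb2 > best:
                best, k1s, k2s = sb2, k1, k2
    return k1s, k2s
```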
Figure 10.42(a) shows an image of an iceberg. The objective of this example is to segment the image into
three regions: the dark background, the illuminated area of the iceberg, and the area in shadows. It is
evident from the image histogram in Fig. 10.42(b) that two thresholds are required to solve this problem.
The procedure discussed above resulted in the thresholds k1* = 80 and k2* = 177, which we note from Fig. 10.42(b) are near the centers of the two histogram valleys. Figure 10.42(c) is the segmentation that
resulted using these two thresholds in Eq. (10-76). The separability measure was 0.954. The principal
reason this example worked out so well can be traced to the histogram having three distinct modes
separated by reasonably wide, deep valleys. But we can do even better using superpixels, as you will see
in Section 10.5.
FIGURE 10.42 (a) Image of an iceberg. (b) Histogram. (c) Image segmented into three regions using dual Otsu thresholds.
(Original image courtesy of NOAA.)
VARIABLE THRESHOLDING
As discussed earlier in this section, factors such as noise and nonuniform illumina-
tion play a major role in the performance of a thresholding algorithm. We showed
that image smoothing and the use of edge information can help significantly. How-
ever, sometimes this type of preprocessing is either impractical or ineffective in
improving the situation, to the point where the problem cannot be solved by any
of the thresholding methods discussed thus far. In such situations, the next level of
thresholding complexity involves variable thresholding, as we will illustrate in the
following discussion.
A basic approach to variable thresholding is to compute a threshold at every point, (x, y), based on one or more specified properties of the pixels in a neighborhood Sxy centered at (x, y), for example its mean mxy and standard deviation σxy, as in Txy = aσxy + bmxy, where a and b are nonnegative constants. The segmented image is then computed as

$g(x, y) = 1$ if $f(x, y) > T_{xy}$; $\;g(x, y) = 0$ if $f(x, y) \le T_{xy}$   (10-80)

where f(x, y) is the input image. This equation is evaluated for all pixel locations
in the image, and a different threshold is computed at each location (x, y) using the
pixels in the neighborhood Sxy .
Significant power (with a modest increase in computation) can be added to vari-
able thresholding by using predicates based on the parameters computed in the neigh-
borhood of a point (x, y) :
$g(x, y) = 1$ if $Q$ is TRUE; $\;g(x, y) = 0$ if $Q$ is FALSE   (10-81)
where Q is a predicate based on parameters computed using the pixels in neighbor-
hood Sxy. For example, consider the following predicate, Q(σxy, mxy), based on the local mean and standard deviation:

$Q(\sigma_{xy}, m_{xy}) = \text{TRUE if } f(x, y) > a\sigma_{xy} \text{ AND } f(x, y) > bm_{xy}; \;\text{FALSE otherwise}$   (10-82)
Note that Eq. (10-80) is a special case of Eq. (10-81), obtained by letting Q be TRUE if f(x, y) > Txy and FALSE otherwise. In this case, the predicate is based simply on the intensity at a point.
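A sketch of variable thresholding with the predicate of Eq. (10-82) is shown below (NumPy 1.20+ for sliding_window_view; function names are ours, the default values a = 30 and b = 1.5 are taken from the example that follows, and n is assumed odd). Setting use_global_mean=True replaces mxy by the global mean, as is done in that example:

```python
import numpy as np

def local_stats(f, n=3):
    """Local mean and standard deviation over an n x n neighborhood (n odd)."""
    f = f.astype(float)
    pad = n // 2
    fp = np.pad(f, pad, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(fp, (n, n))
    return windows.mean(axis=(-1, -2)), windows.std(axis=(-1, -2))

def variable_threshold(f, a=30.0, b=1.5, use_global_mean=True, n=3):
    """Predicate of Eq. (10-82): f > a*sigma_xy AND f > b*m (local or global mean)."""
    m_xy, s_xy = local_stats(f, n)
    m = f.mean() if use_global_mean else m_xy
    return ((f > a * s_xy) & (f > b * m)).astype(np.uint8)
```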
Figure 10.43(a) shows the yeast image from Example 10.16. This image has three predominant inten-
sity levels, so it is reasonable to assume that perhaps dual thresholding could be a good segmentation
approach. Figure 10.43(b) is the result of using the dual thresholding method summarized in Eq. (10-76).
As the figure shows, it was possible to isolate the bright areas from the background, but the mid-gray
regions on the right side of the image were not segmented (i.e., separated) properly. To illustrate the use
FIGURE 10.43 (a) Image from Fig. 10.40. (b) Image segmented using the dual thresholding approach given by Eq. (10-76). (c) Image of local standard deviations. (d) Result obtained using local thresholding.
of local thresholding, we computed the local standard deviation σxy for all (x, y) in the input image using a neighborhood of size 3 × 3. Figure 10.43(c) shows the result. Note how the faint outer lines correctly
delineate the boundaries of the cells. Next, we formed a predicate of the form shown in Eq. (10-82), but
using the global mean instead of mxy . Choosing the global mean generally gives better results when the
background is nearly constant and all the object intensities are above or below the background intensity.
The values a = 30 and b = 1.5 were used to complete the specification of the predicate (these values
were determined experimentally, as is usually the case in applications such as this). The image was then
segmented using Eq. (10-82). As Fig. 10.43(d) shows, the segmentation was quite successful. Note in par-
ticular that all the outer regions were segmented properly, and that most of the inner, brighter regions
were isolated correctly.
A special case of variable thresholding is based on computing a moving average along scan lines of an image. This implementation is useful in applications such as document processing, where speed is a fundamental requirement. Let zk+1 denote the intensity of the point encountered in the scanning sequence at step k + 1. The moving average (mean intensity) at this new point is given by

$m(k+1) = \frac{1}{n} \sum_{i=k+2-n}^{k+1} z_i = m(k) + \frac{1}{n}\left(z_{k+1} - z_{k+1-n}\right) \quad \text{for } k \ge n - 1$   (10-83)
where n is the number of points used in computing the average, and m(1) = z1. The condition imposed on k is so that all subscripts on zk are positive; all this means is that n points must be available for computing the average. When k is less than the limit shown (this happens near the image borders), the averages are formed with the available image points. Because a moving average is computed for every point in the image, segmentation is implemented using Eq. (10-80) with Txy = c·mxy, where c is a positive scalar, and mxy is the moving average from Eq. (10-83) at point (x, y) in the input image.
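A sketch of thresholding with moving averages follows (the function name is ours; each row is scanned left to right for simplicity, although the scanning direction could also alternate from row to row). The running sum implements the recursive form of Eq. (10-83), with fewer points used near the start of each row, and the threshold at every point is c times the moving average:

```python
import numpy as np

def moving_average_threshold(f, n=20, c=0.5):
    """Threshold each pixel against c times the row-wise moving average of Eq. (10-83)."""
    f = f.astype(float)
    g = np.zeros(f.shape, dtype=np.uint8)
    for r in range(f.shape[0]):
        s = 0.0
        for k in range(f.shape[1]):
            # running sum of the last min(k+1, n) intensities along the row
            s += f[r, k] - (f[r, k - n] if k >= n else 0.0)
            m = s / min(k + 1, n)
            g[r, k] = 1 if f[r, k] > c * m else 0
    return g
```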
Figure 10.44(a) shows an image of handwritten text shaded by a spot intensity pattern. This form of
intensity shading is typical of images obtained using spot illumination (such as a photographic flash).
Figure 10.44(b) is the result of segmentation using the Otsu global thresholding method. It is not unex-
pected that global thresholding could not overcome the intensity variation because the method gener-
ally performs poorly when the areas of interest are embedded in a nonuniform illumination field. Figure
10.44(c) shows successful segmentation with local thresholding using moving averages. For images of
written material, a rule of thumb is to let n equal five times the average stroke width. In this case, the average width was 4 pixels, so we let n = 20 in Eq. (10-83) and used c = 0.5.
FIGURE 10.44 (a) Text image corrupted by spot shading. (b) Result of global thresholding using Otsu’s method.
(c) Result of local thresholding using moving averages.
As another illustration of the effectiveness of this segmentation approach, we used the same param-
eters as in the previous paragraph to segment the image in Fig. 10.45(a), which is corrupted by a sinu-
soidal intensity variation typical of the variations that may occur when the power supply in a document
scanner is not properly grounded. As Figs. 10.45(b) and (c) show, the segmentation results are compa-
rable to those in Fig. 10.44.
Note that successful segmentation results were obtained in both cases using the same values for n
and c, which shows the relative ruggedness of the approach. In general, thresholding based on moving
averages works well when the objects of interest are small (or thin) with respect to the image size, a
condition satisfied by images of typed or handwritten text.
10.4 SEGMENTATION BY REGION GROWING AND BY REGION SPLITTING AND MERGING
REGION GROWING
As its name implies, region growing is a procedure that groups pixels or subregions
into larger regions based on predefined criteria for growth. The basic approach is to
start with a set of “seed” points, and from these grow regions by appending to each
seed those neighboring pixels that have predefined properties similar to the seed
(such as ranges of intensity or color).
Selecting a set of one or more starting points can often be based on the nature of
the problem, as we show later in Example 10.20. When a priori information is not
FIGURE 10.45 (a) Text image corrupted by sinusoidal shading. (b) Result of global thresholding using Otsu’s method. (c) Result of local thresholding using moving averages.
available, the procedure is to compute at every pixel the same set of properties that
ultimately will be used to assign pixels to regions during the growing process. If the
result of these computations shows clusters of values, the pixels whose properties
place them near the centroid of these clusters can be used as seeds.
The selection of similarity criteria depends not only on the problem under con-
sideration, but also on the type of image data available. For example, the analysis of
land-use satellite imagery depends heavily on the use of color. This problem would
be significantly more difficult, or even impossible, to solve without the inherent infor-
mation available in color images. When the images are monochrome, region analysis
must be carried out with a set of descriptors based on intensity levels and spatial
properties (such as moments or texture). We will discuss descriptors useful for region
characterization in Chapter 11.
Descriptors alone can yield misleading results if connectivity properties are not
used in the region-growing process. For example, visualize a random arrangement of
pixels that have three distinct intensity values. Grouping pixels with the same inten-
sity value to form a “region,” without paying attention to connectivity, would yield a
segmentation result that is meaningless in the context of this discussion.
Another problem in region growing is the formulation of a stopping rule. Region
growth should stop when no more pixels satisfy the criteria for inclusion in that
region. Criteria such as intensity values, texture, and color are local in nature and
do not take into account the “history” of region growth. Additional criteria that can
increase the power of a region-growing algorithm utilize the concept of size, like-
ness between a candidate pixel and the pixels grown so far (such as a comparison of
the intensity of a candidate and the average intensity of the grown region), and the
shape of the region being grown. The use of these types of descriptors is based on
the assumption that a model of expected results is at least partially available.
Let: f (x, y) denote an input image; S(x, y) denote a seed array containing 1’s
at the locations of seed points and 0’s elsewhere; and Q denote a predicate to be
applied at each location (x, y). Arrays f and S are assumed to be of the same size.
A basic region-growing algorithm based on 8-connectivity may be stated as follows.
1. Find all connected components in S(x, y) and reduce each connected component to one pixel; label all such pixels found as 1. All other pixels in S are labeled 0. (See Sections 2.5 and 9.5 regarding connected components, and Section 9.2 regarding erosion.)
2. Form an image fQ such that, at each point (x, y), fQ(x, y) = 1 if the input image satisfies a given predicate, Q, at those coordinates, and fQ(x, y) = 0 otherwise.
3. Let g be an image formed by appending to each seed point in S all the 1-valued
points in fQ that are 8-connected to that seed point.
4. Label each connected component in g with a different region label (e.g., integers
or letters). This is the segmented image obtained by region growing.
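A breadth-first sketch of Steps 2 through 4 is shown below (the function name is ours; seeds is assumed to be the list of single-pixel seed coordinates produced by Step 1, and the predicate is the absolute-difference test used in the example that follows, with T as a parameter). Each connected region grown from a seed receives its own integer label:

```python
import numpy as np
from collections import deque

def region_grow(f, seeds, T=68):
    """8-connected region growing: append to each seed every pixel whose absolute
    intensity difference from that seed does not exceed T."""
    f = f.astype(float)
    labels = np.zeros(f.shape, dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for label, (sr, sc) in enumerate(seeds, start=1):
        if labels[sr, sc]:              # seed already absorbed by an earlier region
            continue
        seed_val = f[sr, sc]
        q = deque([(sr, sc)])
        labels[sr, sc] = label
        while q:
            r, c = q.popleft()
            for dr, dc in offsets:      # visit the 8-neighbors
                rr, cc = r + dr, c + dc
                if (0 <= rr < f.shape[0] and 0 <= cc < f.shape[1]
                        and labels[rr, cc] == 0
                        and abs(f[rr, cc] - seed_val) <= T):   # predicate Q
                    labels[rr, cc] = label
                    q.append((rr, cc))
    return labels
```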
Figure 10.46(a) shows an 8-bit X-ray image of a weld (the horizontal dark region) containing several
cracks and porosities (the bright regions running horizontally through the center of the image). We illus-
trate the use of region growing by segmenting the defective weld regions. These regions could be used
in applications such as weld inspection, for inclusion in a database of historical studies, or for controlling
an automated welding system.
The first thing we do is determine the seed points. From the physics of the problem, we know that
cracks and porosities will attenuate X-rays considerably less than solid welds, so we expect the regions
containing these types of defects to be significantly brighter than other parts of the X-ray image. We
can extract the seed points by thresholding the original image, using a threshold set at a high percen-
tile. Figure 10.46(b) shows the histogram of the image, and Fig. 10.46(c) shows the thresholded result
obtained with a threshold equal to the 99.9 percentile of intensity values in the image, which in this case
was 254 (see Section 10.3 regarding percentiles). Figure 10.46(d) shows the result of morphologically
eroding each connected component in Fig. 10.46(c) to a single point.
Next, we have to specify a predicate. In this example, we are interested in appending to each seed
all the pixels that (a) are 8-connected to that seed, and (b) are “similar” to it. Using absolute intensity
differences as a measure of similarity, our predicate applied at each location (x, y) is
Q = TRUE if the absolute difference of the intensities between the seed and the pixel at (x, y) is ≤ T; FALSE otherwise
where T is a specified threshold. Although this predicate is based on intensity differences and uses a
single threshold, we could specify more complex schemes in which a different threshold is applied to
each pixel, and properties other than differences are used. In this case, the preceding predicate is suf-
ficient to solve the problem, as the rest of this example shows.
From the previous paragraph, we know that all seed values are 255 because the image was thresh-
olded with a threshold of 254. Figure 10.46(e) shows the difference between the seed value (255) and
Fig. 10.46(a). The image in Fig. 10.46(e) contains all the differences needed to compute the predicate at
each location (x, y). Figure 10.46(f) shows the corresponding histogram. We need a threshold to use in
the predicate to establish similarity. The histogram has three principal modes, so we can start by apply-
ing to the difference image the dual thresholding technique discussed in Section 10.3. The resulting two
thresholds in this case were T1 = 68 and T2 = 126, which we see correspond closely to the valleys of
the histogram. (As a brief digression, we segmented the image using these two thresholds. The result in
FIGURE 10.46 (a) X-ray image of a defective weld. (b) Histogram. (c) Initial seed image. (d) Final seed image (the points were enlarged for clarity). (e) Absolute value of the difference between the seed value (255) and (a). (f) Histogram of (e). (g) Difference image thresholded using dual thresholds. (h) Difference image thresholded with the smallest of the dual thresholds. (i) Segmentation result obtained by region growing. (Original image courtesy of X-TEK Systems, Ltd.)
Fig. 10.46(g) shows that segmenting the defects cannot be accomplished using dual thresholds, despite
the fact that the thresholds are in the deep valleys of the histogram.)
Figure 10.46(h) shows the result of thresholding the difference image with only T1. The black points
are the pixels for which the predicate was TRUE; the others failed the predicate. The important result
here is that the points in the good regions of the weld failed the predicate, so they will not be included
in the final result. The points in the outer region will be considered by the region-growing algorithm as
candidates. However, Step 3 will reject the outer points because they are not 8-connected to the seeds.
In fact, as Fig. 10.46(i) shows, this step resulted in the correct segmentation, indicating that the use of
connectivity was a fundamental requirement in this case. Finally, note that in Step 4 we used the same
value for all the regions found by the algorithm. In this case, it was visually preferable to do so because
all those regions have the same physical meaning in this application—they all represent porosities.
REGION SPLITTING AND MERGING
An alternative to region growing is to subdivide an image initially into a set of disjoint regions and then merge and/or split the regions in an attempt to satisfy the conditions of segmentation. Letting R represent the entire image region and Q denote a predicate, one approach is to subdivide R successively into smaller and smaller quadrant regions, organized as a quadtree (see Fig. 10.47), using the following procedure:
1. Split into four disjoint quadrants any region Ri for which Q(Ri) = FALSE.
2. When no further splitting is possible, merge any adjacent regions Rj and Rk for which Q(Rj ∪ Rk) = TRUE.
FIGURE 10.47 (a) Partitioned image. (b) Corresponding quadtree. R represents the entire image region.
Numerous variations of this basic theme are possible. For example, a significant
simplification results if in Step 2 we allow merging of any two adjacent regions Rj
and Rk if each one satisfies the predicate individually. This results in a much sim-
pler (and faster) algorithm, because testing of the predicate is limited to individual
quadregions. As the following example shows, this simplification is still capable of
yielding good segmentation results.
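A recursive sketch of this simplified variant is shown below (function and variable names are ours). A quadregion satisfying the predicate is marked immediately, which has the same effect as merging adjacent satisfying regions; regions that fail the predicate are split until the minimum allowed size is reached, as in the example that follows. The predicate constants shown (standard deviation greater than 10, mean between 0 and 125) are the values used in that example:

```python
import numpy as np

def split_segment(f, predicate, min_size=16):
    """Quadtree splitting with the simplified merge: mark any quadregion that
    satisfies the predicate; otherwise split it while both children would still
    be at least min_size on a side."""
    out = np.zeros(f.shape, dtype=np.uint8)

    def split(r0, r1, c0, c1):
        region = f[r0:r1, c0:c1]
        if predicate(region):
            out[r0:r1, c0:c1] = 1
        elif (r1 - r0) >= 2 * min_size and (c1 - c0) >= 2 * min_size:
            rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
            split(r0, rm, c0, cm); split(r0, rm, cm, c1)
            split(rm, r1, c0, cm); split(rm, r1, cm, c1)

    split(0, f.shape[0], 0, f.shape[1])
    return out

# Predicate with the constants used in the Cygnus Loop example (a = 10, b = 125):
Q = lambda R: R.std() > 10 and 0 < R.mean() < 125
```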
Figure 10.48(a) shows a 566 × 566 X-ray image of the Cygnus Loop supernova. The objective of this
example is to segment (extract from the image) the “ring” of less dense matter surrounding the dense
inner region. The region of interest has some obvious characteristics that should help in its segmenta-
tion. First, we note that the data in this region has a random nature, indicating that its standard devia-
tion should be greater than the standard deviation of the background (which is near 0) and of the large
central region, which is smooth. Similarly, the mean value (average intensity) of a region containing
data from the outer ring should be greater than the mean of the darker background and less than the
mean of the lighter central region. Thus, we should be able to segment the region of interest using the
following predicate:
FIGURE 10.48 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by NASA’s Hubble Telescope. (b) through (d) Results of limiting the smallest allowed quadregion to sizes of 32 × 32, 16 × 16, and 8 × 8 pixels, respectively. (Original image courtesy of NASA.)
Q(R) = TRUE if σR > a AND 0 < mR < b; FALSE otherwise
where sR and mR are the standard deviation and mean of the region being processed, and a and b are
nonnegative constants.
Analysis of several regions in the outer area of interest revealed that the mean intensity of pixels
in those regions did not exceed 125, and the standard deviation was always greater than 10. Figures
10.48(b) through (d) show the results obtained using these values for a and b, and varying the minimum
size allowed for the quadregions from 32 to 8. The pixels in a quadregion that satisfied the predicate
were set to white; all others in that region were set to black. The best result in terms of capturing the
shape of the outer region was obtained using quadregions of size 16 × 16. The small black squares in Fig. 10.48(d) are quadregions of size 8 × 8 whose pixels did not satisfy the predicate. Using smaller
quadregions would result in increasing numbers of such black regions. Using regions larger than the one
illustrated here would result in a more “block-like” segmentation. Note that in all cases the segmented
region (white pixels) was a connected region that completely separates the inner, smoother region from
the background. Thus, the segmentation effectively partitioned the image into three distinct areas that
correspond to the three principal features in the image: background, a dense region, and a sparse region.
Using any of the white regions in Fig. 10.48 as a mask would make it a relatively simple task to extract
these regions from the original image (see Problem 10.43). As in Example 10.20, these results could not
have been obtained using edge- or threshold-based segmentation.
As used in the preceding example, properties based on the mean and standard
deviation of pixel intensities in a region attempt to quantify the texture of the region
(see Section 11.3 for a discussion on texture). The concept of texture segmentation
is based on using measures of texture in the predicates. In other words, we can per-
form texture segmentation by any of the methods discussed in this section simply by
specifying predicates based on texture content.