
MODULE 2

3 Intensity Transformations and Spatial Filtering
It makes all the difference whether one sees darkness
through the light or brightness through the shadows.
David Lindsay

Preview
The term spatial domain refers to the image plane itself, and image processing methods in this category are based on direct manipulation of pixels in an image. This is in contrast to image processing in a transform domain which, as introduced in Section 2.6.7 and discussed in more detail in Chapter 4, involves first transforming an image into the transform domain, doing the processing there, and obtaining the inverse transform to bring the results back into the spatial domain. Two principal categories of spatial processing are intensity transformations and spatial filtering. As you will learn in this chapter, intensity transformations operate on single pixels of an image, principally for the purpose of contrast manipulation and image thresholding. Spatial filtering deals with performing operations, such as image sharpening, by working in a neighborhood of every pixel in an image. In the sections that follow, we discuss a number of "classical" techniques for intensity transformations and spatial filtering. We also discuss in some detail fuzzy techniques that allow us to incorporate imprecise, knowledge-based information in the formulation of intensity transformations and spatial filtering algorithms.


higher contrast than the original by darkening the intensity levels below k and brightening the levels above k. In this technique, sometimes called contrast stretching (see Section 3.2.4), values of r lower than k are compressed by the transformation function into a narrow range of s, toward black. The opposite is true for values of r higher than k. Observe how an intensity value r0 is mapped to obtain the corresponding value s0. In the limiting case shown in Fig. 3.2(b), T(r) produces a two-level (binary) image. A mapping of this form is called a thresholding function. Some fairly simple, yet powerful, processing approaches can be formulated with intensity transformation functions. In this chapter, we use intensity transformations principally for image enhancement. In Chapter 10, we use them for image segmentation. Approaches whose results depend only on the intensity at a point sometimes are called point processing techniques, as opposed to the neighborhood processing techniques discussed earlier in this section.

3.1.2 About the Examples in This Chapter


Although intensity transformations and spatial filtering span a broad range of applications, most of the examples in this chapter are applications to image enhancement. Enhancement is the process of manipulating an image so that the result is more suitable than the original for a specific application. The word specific is important here because it establishes at the outset that enhancement techniques are problem oriented. Thus, for example, a method that is quite useful for enhancing X-ray images may not be the best approach for enhancing satellite images taken in the infrared band of the electromagnetic spectrum. There is no general "theory" of image enhancement. When an image is processed for visual interpretation, the viewer is the ultimate judge of how well a particular method works. When dealing with machine perception, a given technique is easier to quantify. For example, in an automated character-recognition system, the most appropriate enhancement method is the one that results in the best recognition rate, leaving aside other considerations such as the computational requirements of one method over another.
Regardless of the application or method used, however, image enhancement is one of the most visually appealing areas of image processing. By its very nature, beginners in image processing generally find enhancement applications interesting and relatively simple to understand. Therefore, using examples from image enhancement to illustrate the spatial processing methods developed in this chapter not only saves having an extra chapter in the book dealing with image enhancement but, more importantly, is an effective approach for introducing newcomers to the details of processing techniques in the spatial domain. As you will see as you progress through the book, the basic material developed in this chapter is applicable to a much broader scope than just image enhancement.

3.2 Some Basic Intensity Transformation Functions


Intensity transformations are among the simplest of all image processing techniques. The values of pixels, before and after processing, will be denoted by r and s, respectively. As indicated in the previous section, these values are related by an expression of the form s = T(r), where T is a transformation that maps a pixel value r into a pixel value s. Because we are dealing with digital quantities, values of a transformation function typically are stored in a one-dimensional array and the mappings from r to s are implemented via table lookups. For an 8-bit environment, a lookup table containing the values of T will have 256 entries.
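As a brief illustration of the lookup-table idea, the following sketch applies an arbitrary 256-entry table to an 8-bit image. Python with NumPy is assumed throughout this chapter's sketches; the function name `apply_lut` is illustrative, not from the book:

```python
import numpy as np

def apply_lut(image, lut):
    """Apply a 256-entry lookup table to an 8-bit image.

    Each output pixel is s = T(r) = lut[r]; NumPy's integer
    indexing performs the table lookup for every pixel at once.
    """
    lut = np.asarray(lut, dtype=np.uint8)
    assert lut.size == 256, "an 8-bit environment needs a 256-entry table"
    return lut[image]

# The identity transformation, T(r) = r, leaves the image unchanged.
identity = np.arange(256, dtype=np.uint8)
img = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
assert np.array_equal(apply_lut(img, identity), img)
```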
As an introduction to intensity transformations, consider Fig. 3.3, which shows three basic types of functions used frequently for image enhancement: linear (negative and identity transformations), logarithmic (log and inverse-log transformations), and power-law (nth power and nth root transformations). The identity function is the trivial case in which output intensities are identical to input intensities. It is included in the graph only for completeness.

3.2.1 Image Negatives


The negative of an image with intensity levels in the range [0, L - 1] is obtained by using the negative transformation shown in Fig. 3.3, which is given by the expression

s = L - 1 - r   (3.2-1)

Reversing the intensity levels of an image in this manner produces the equivalent of a photographic negative. This type of processing is particularly suited for enhancing white or gray detail embedded in dark regions of an image, especially when the black areas are dominant in size. Figure 3.4 shows an example. The original image is a digital mammogram showing a small lesion. In spite of the fact that the visual content is the same in both images, note how much easier it is to analyze the breast tissue in the negative image in this particular case.

FIGURE 3.3 Some basic intensity transformation functions (negative, identity, log, inverse log, nth power, and nth root). All curves were scaled to fit in the range shown.

FIGURE 3.4 (a) Original digital mammogram. (b) Negative image obtained using the negative transformation in Eq. (3.2-1). (Courtesy of G.E. Medical Systems.)
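The negative transformation is simple enough to state in a few lines of code. Below is a minimal sketch, assuming a NumPy array holding an 8-bit image; the function name is illustrative:

```python
import numpy as np

def negative(image, L=256):
    """Negative transformation of Eq. (3.2-1): s = L - 1 - r."""
    return (L - 1 - image.astype(np.int32)).astype(image.dtype)

img = np.array([[0, 100], [200, 255]], dtype=np.uint8)
print(negative(img))  # [[255 155] [ 55   0]]
```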

3.2.2 Log Transformations


The general form of the log transformation in Fig. 3.3 is

s = c log(1 + r)   (3.2-2)

where c is a constant, and it is assumed that r ≥ 0. The shape of the log curve in Fig. 3.3 shows that this transformation maps a narrow range of low intensity values in the input into a wider range of output levels. The opposite is true of higher values of input levels. We use a transformation of this type to expand the values of dark pixels in an image while compressing the higher-level values. The opposite is true of the inverse log transformation.
Any curve having the general shape of the log functions shown in Fig. 3.3 would accomplish this spreading/compressing of intensity levels in an image, but the power-law transformations discussed in the next section are much more versatile for this purpose. The log function has the important characteristic that it compresses the dynamic range of images with large variations in pixel values. A classic illustration of an application in which pixel values have a large dynamic range is the Fourier spectrum, which will be discussed in Chapter 4. At the moment, we are concerned only with the image characteristics of spectra. It is not unusual to encounter spectrum values that range from 0 to 10^6 or higher. While processing numbers such as these presents no problems for a computer, image display systems generally will not be able to reproduce faithfully such a wide range of intensity values. The net effect is that a significant degree of intensity detail can be lost in the display of a typical Fourier spectrum.
As an illustration of log transformations, Fig. 3.5(a) shows a Fourier spectrum with values in the range 0 to 1.5 × 10^6. When these values are scaled linearly for display in an 8-bit system, the brightest pixels will dominate the display, at the expense of lower (and just as important) values of the spectrum. The effect of this dominance is illustrated vividly by the relatively small area of the image in Fig. 3.5(a) that is not perceived as black. If, instead of displaying the values in this manner, we first apply Eq. (3.2-2) (with c = 1 in this case) to the spectrum values, then the range of values of the result becomes 0 to 6.2, which is more manageable. Figure 3.5(b) shows the result of scaling this new range linearly and displaying the spectrum in the same 8-bit display. The wealth of detail visible in this image as compared to an unmodified display of the spectrum is evident from these pictures. Most of the Fourier spectra seen in image processing publications have been scaled in just this manner.

FIGURE 3.5 (a) Fourier spectrum. (b) Result of applying the log transformation in Eq. (3.2-2) with c = 1.
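A short sketch of Eq. (3.2-2) follows. Note one practical liberty: rather than using c = 1 and then scaling linearly for display as described above, this version folds the linear scaling into the choice of c, which is equivalent for display purposes:

```python
import numpy as np

def log_transform(values, L=256):
    """Log transformation of Eq. (3.2-2), s = c log(1 + r), with c
    chosen so the output spans the full display range [0, L - 1]."""
    values = np.asarray(values, dtype=np.float64)
    c = (L - 1) / np.log10(1 + values.max())
    return c * np.log10(1 + values)

# A spectrum-like array with the huge dynamic range discussed above:
spectrum = np.array([0.0, 10.0, 1.0e3, 1.5e6])
print(log_transform(spectrum))  # the 0-to-1.5e6 range is compressed into 0-to-255
```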

3.2.3 Power-Law (Gamma) Transformations


Power-law transformations have the basic form

s = c r^γ   (3.2-3)

where c and γ are positive constants. Sometimes Eq. (3.2-3) is written as s = c(r + ε)^γ to account for an offset (that is, a measurable output when the input is zero). However, offsets typically are an issue of display calibration and as a result they are normally ignored in Eq. (3.2-3). Plots of s versus r for various values of γ are shown in Fig. 3.6. As in the case of the log transformation, power-law curves with fractional values of γ map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher values of input levels. Unlike the log function, however, we notice here a family of possible transformation curves obtained simply by varying γ. As expected, we see in Fig. 3.6 that curves generated with values of γ > 1 have exactly the opposite effect as those generated with values of γ < 1. Finally, we note that Eq. (3.2-3) reduces to the identity transformation when c = γ = 1.

FIGURE 3.6 Plots of the equation s = c r^γ for various values of γ (γ = 0.04, 0.10, 0.20, 0.40, 0.67, 1, 1.5, 2.5, 5.0, 10.0, and 25.0; c = 1 in all cases). All curves were scaled to fit in the range shown.
A variety of devices used for image capture, printing, and display respond according to a power law. By convention, the exponent in the power-law equation is referred to as gamma [hence our use of this symbol in Eq. (3.2-3)]. The process used to correct these power-law response phenomena is called gamma correction. For example, cathode ray tube (CRT) devices have an intensity-to-voltage response that is a power function, with exponents varying from approximately 1.8 to 2.5. With reference to the curve for γ = 2.5 in Fig. 3.6, we see that such display systems would tend to produce images that are darker than intended. This effect is illustrated in Fig. 3.7. Figure 3.7(a) shows a simple intensity-ramp image input into a monitor. As expected, the output of the monitor appears darker than the input, as Fig. 3.7(b) shows. Gamma correction in this case is straightforward. All we need to do is preprocess the input image before inputting it into the monitor by performing the transformation s = r^(1/2.5) = r^0.4. The result is shown in Fig. 3.7(c). When input into the same monitor, this gamma-corrected input produces an output that is close in appearance to the original image, as Fig. 3.7(d) shows. A similar analysis would apply to other imaging devices such as scanners and printers, the only difference being the device-dependent value of gamma (Poynton [1996]).
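A minimal power-law sketch in NumPy follows. It assumes intensities are normalized to [0, 1] before the power is applied, a common convention so that c = 1 maps the full input range onto the full output range; the names are illustrative:

```python
import numpy as np

def power_law(image, gamma, c=1.0, L=256):
    """Power-law transformation of Eq. (3.2-3): s = c * r**gamma.
    Intensities are normalized to [0, 1] before the power is applied,
    so c = 1 maps the full input range onto the full output range."""
    r = image.astype(np.float64) / (L - 1)
    s = c * np.power(r, gamma)
    return np.clip(s * (L - 1), 0, L - 1).astype(np.uint8)

ramp = np.arange(256, dtype=np.uint8)  # an intensity ramp, as in Fig. 3.7(a)
corrected = power_law(ramp, 1 / 2.5)   # s = r**0.4 offsets a gamma-2.5 display
```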
FIGURE 3.7 (a) Intensity-ramp image. (b) Image as viewed on a simulated monitor with a gamma of 2.5. (c) Gamma-corrected image. (d) Corrected image as viewed on the same monitor. Compare (d) and (a).

Gamma correction is important if displaying an image accurately on a computer screen is of concern. Images that are not corrected properly can look either bleached out or, what is more likely, too dark. Trying to reproduce colors accurately also requires some knowledge of gamma correction because varying the value of gamma changes not only the intensity, but also the ratios of red to green to blue in a color image. Gamma correction has become increasingly important in the past few years, as the use of digital images for commercial purposes over the Internet has increased. It is not unusual that images created for a popular Web site will be viewed by millions of people, the majority of whom will have different monitors and/or monitor settings. Some computer systems even have partial gamma correction built in. Also, current image standards do not contain the value of gamma with which an image was created, thus complicating the issue further. Given these constraints, a reasonable approach when storing images in a Web site is to preprocess the images with a gamma that represents an "average" of the types of monitors and computer systems that one expects in the open market at any given point in time.
EXAMPLE 3.1: Contrast enhancement using power-law transformations.

■ In addition to gamma correction, power-law transformations are useful for general-purpose contrast manipulation. Figure 3.8(a) shows a magnetic resonance image (MRI) of an upper thoracic human spine with a fracture dislocation and spinal cord impingement. The fracture is visible near the vertical center of the spine, approximately one-fourth of the way down from the top of the picture. Because the given image is predominantly dark, an expansion of intensity levels is desirable. This can be accomplished with a power-law transformation with a fractional exponent. The other images shown in the figure were obtained by processing Fig. 3.8(a) with the power-law transformation function of Eq. (3.2-3). The values of gamma corresponding to images (b) through (d) are 0.6, 0.4, and 0.3, respectively (the value of c was 1 in all cases). We note that, as gamma decreased from 0.6 to 0.4, more detail became visible. A further decrease of gamma to 0.3 enhanced a little more detail in the background, but began to reduce contrast to the point where the image started to have a very slight "washed-out" appearance, especially in the background. By comparing all results, we see that the best enhancement in terms of contrast and discernable detail was obtained with γ = 0.4. A value of γ = 0.3 is an approximate limit below which contrast in this particular image would be reduced to an unacceptable level. ■

FIGURE 3.8 (a) Magnetic resonance image (MRI) of a fractured human spine. (b)–(d) Results of applying the transformation in Eq. (3.2-3) with c = 1 and γ = 0.6, 0.4, and 0.3, respectively. (Original image courtesy of Dr. David R. Pickens, Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center.)

EXAMPLE 3.2: Another illustration of power-law transformations.

■ Figure 3.9(a) shows the opposite problem of Fig. 3.8(a). The image to be processed now has a washed-out appearance, indicating that a compression of intensity levels is desirable. This can be accomplished with Eq. (3.2-3) using values of γ greater than 1. The results of processing Fig. 3.9(a) with γ = 3.0, 4.0, and 5.0 are shown in Figs. 3.9(b) through (d). Suitable results were obtained with gamma values of 3.0 and 4.0, the latter having a slightly more appealing appearance because it has higher contrast. The result obtained with γ = 5.0 has areas that are too dark, in which some detail is lost. The dark region to the left of the main road in the upper left quadrant is an example of such an area. ■

FIGURE 3.9 (a) Aerial image. (b)–(d) Results of applying the transformation in Eq. (3.2-3) with c = 1 and γ = 3.0, 4.0, and 5.0, respectively. (Original image for this example courtesy of NASA.)

3.2.4 Piecewise-Linear Transformation Functions


A complementary approach to the methods discussed in the previous three sections is to use piecewise linear functions. The principal advantage of piecewise linear functions over the types of functions we have discussed thus far is that the form of piecewise functions can be arbitrarily complex. In fact, as you will see shortly, a practical implementation of some important transformations can be formulated only as piecewise functions. The principal disadvantage of piecewise functions is that their specification requires considerably more user input.

Contrast stretching
One of the simplest piecewise linear functions is a contrast-stretching transformation. Low-contrast images can result from poor illumination, lack of dynamic range in the imaging sensor, or even the wrong setting of a lens aperture during image acquisition. Contrast stretching is a process that expands the range of intensity levels in an image so that it spans the full intensity range of the recording medium or display device.
Figure 3.10(a) shows a typical transformation used for contrast stretching. The locations of points (r1, s1) and (r2, s2) control the shape of the transformation function. If r1 = s1 and r2 = s2, the transformation is a linear function that produces no changes in intensity levels. If r1 = r2, s1 = 0, and s2 = L - 1, the transformation becomes a thresholding function that creates a binary image, as illustrated in Fig. 3.2(b). Intermediate values of (r1, s1) and (r2, s2) produce various degrees of spread in the intensity levels of the output image, thus affecting its contrast. In general, r1 ≤ r2 and s1 ≤ s2 is assumed so that the function is single valued and monotonically increasing. This condition preserves the order of intensity levels, thus preventing the creation of intensity artifacts in the processed image.
Figure 3.10(b) shows an 8-bit image with low contrast. Figure 3.10(c) shows the result of contrast stretching, obtained by setting (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L - 1), where rmin and rmax denote the minimum and maximum intensity levels in the image, respectively. Thus, the transformation function stretched the levels linearly from their original range to the full range [0, L - 1]. Finally, Fig. 3.10(d) shows the result of using the thresholding function defined previously, with (r1, s1) = (m, 0) and (r2, s2) = (m, L - 1), where m is the mean intensity level in the image. The original image on which these results are based is a scanning electron microscope image of pollen, magnified approximately 700 times.

FIGURE 3.10 Contrast stretching. (a) Form of the transformation function T(r), with control points (r1, s1) and (r2, s2). (b) A low-contrast image. (c) Result of contrast stretching. (d) Result of thresholding. (Original image courtesy of Dr. Roger Heady, Research School of Biological Sciences, Australian National University, Canberra, Australia.)
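The following sketch implements the three-segment stretching function of Fig. 3.10(a) in NumPy. The guards against zero-width segments are an implementation choice, not from the book:

```python
import numpy as np

def contrast_stretch(image, r1, s1, r2, s2, L=256):
    """Piecewise-linear contrast stretching controlled by the points
    (r1, s1) and (r2, s2) of Fig. 3.10(a); requires r1 <= r2, s1 <= s2.
    The max(..., 1) guards avoid division by zero for degenerate segments."""
    r = image.astype(np.float64)
    out = np.piecewise(
        r,
        [r < r1, (r >= r1) & (r <= r2), r > r2],
        [lambda x: x * s1 / max(r1, 1),
         lambda x: s1 + (x - r1) * (s2 - s1) / max(r2 - r1, 1),
         lambda x: s2 + (x - r2) * (L - 1 - s2) / max(L - 1 - r2, 1)],
    )
    return np.clip(out, 0, L - 1).astype(np.uint8)

# Full-range stretch of Fig. 3.10(c): (r1, s1) = (rmin, 0), (r2, s2) = (rmax, L - 1).
img = np.random.randint(90, 160, size=(8, 8), dtype=np.uint8)
stretched = contrast_stretch(img, int(img.min()), 0, int(img.max()), 255)
```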

Intensity-level slicing
Highlighting a specific range of intensities in an image often is of interest. Applications include enhancing features such as masses of water in satellite imagery and enhancing flaws in X-ray images. The process, often called intensity-level slicing, can be implemented in several ways, but most are variations of two basic themes. One approach is to display in one value (say, white) all the values in the range of interest and in another (say, black) all other intensities. This transformation, shown in Fig. 3.11(a), produces a binary image. The second approach, based on the transformation in Fig. 3.11(b), brightens (or darkens) the desired range of intensities but leaves all other intensity levels in the image unchanged.
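Both slicing themes reduce to a few lines of NumPy. The sketch below assumes an 8-bit image; the function names are illustrative:

```python
import numpy as np

def slice_binary(image, a, b, L=256):
    """First approach, Fig. 3.11(a): white inside [a, b], black elsewhere."""
    return np.where((image >= a) & (image <= b), L - 1, 0).astype(np.uint8)

def slice_preserve(image, a, b, value):
    """Second approach, Fig. 3.11(b): set the band [a, b] to a chosen
    value and leave all other intensity levels unchanged."""
    out = image.copy()
    out[(image >= a) & (image <= b)] = value
    return out

img = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
binary = slice_binary(img, 150, 255)         # highlight a bright band, as in Fig. 3.12(b)
darkened = slice_preserve(img, 100, 160, 0)  # black out a mid-gray band, as in Fig. 3.12(c)
```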

FIGURE 3.11 (a) This transformation highlights intensity range [A, B] and reduces all other intensities to a lower level. (b) This transformation highlights range [A, B] and preserves all other intensity levels.
EXAMPLE 3.3: Intensity-level slicing.

■ Figure 3.12(a) is an aortic angiogram near the kidney area (see Section 1.3.2 for a more detailed explanation of this image). The objective of this example is to use intensity-level slicing to highlight the major blood vessels that appear brighter as a result of an injected contrast medium. Figure 3.12(b) shows the result of using a transformation of the form in Fig. 3.11(a), with the selected band near the top of the scale, because the range of interest is brighter than the background. The net result of this transformation is that the blood vessels and parts of the kidneys appear white, while all other intensities are black. This type of enhancement produces a binary image and is useful for studying the shape of the flow of the contrast medium (to detect blockages, for example).
If, on the other hand, interest lies in the actual intensity values of the region of interest, we can use the transformation in Fig. 3.11(b). Figure 3.12(c) shows the result of using such a transformation in which a band of intensities in the mid-gray region around the mean intensity was set to black, while all other intensities were left unchanged. Here, we see that the gray-level tonality of the major blood vessels and part of the kidney area were left intact. Such a result might be useful when interest lies in measuring the actual flow of the contrast medium as a function of time in a series of images. ■

FIGURE 3.12 (a) Aortic angiogram. (b) Result of using a slicing transformation of the type illustrated in Fig. 3.11(a), with the range of intensities of interest selected in the upper end of the gray scale. (c) Result of using the transformation in Fig. 3.11(b), with the selected area set to black, so that grays in the area of the blood vessels and kidneys were preserved. (Original image courtesy of Dr. Thomas R. Gest, University of Michigan Medical School.)

Bit-plane slicing
Pixels are digital numbers composed of bits. For example, the intensity of each pixel in a 256-level gray-scale image is composed of 8 bits (i.e., one byte). Instead of highlighting intensity-level ranges, we could highlight the contribution made to total image appearance by specific bits. As Fig. 3.13 illustrates, an 8-bit image may be considered as being composed of eight 1-bit planes, with plane 1 containing the lowest-order bit of all pixels in the image and plane 8 all the highest-order bits.

FIGURE 3.13 Bit-plane representation of an 8-bit image, from bit plane 1 (least significant) to bit plane 8 (most significant).
Figure 3.14(a) shows an 8-bit gray-scale image and Figs. 3.14(b) through (i) are its eight 1-bit planes, with Fig. 3.14(b) corresponding to the lowest-order bit. Observe that the four higher-order bit planes, especially the last two, contain a significant amount of the visually significant data. The lower-order planes contribute to more subtle intensity details in the image. The original image has a gray border whose intensity is 194. Notice that the corresponding borders of some of the bit planes are black (0), while others are white (1). To see why, consider a pixel in, say, the middle of the lower border of Fig. 3.14(a). The corresponding pixels in the bit planes, starting with the highest-order plane, have values 1 1 0 0 0 0 1 0, which is the binary representation of decimal 194. The value of any pixel in the original image can be similarly reconstructed from its corresponding binary-valued pixels in the bit planes.

FIGURE 3.14 (a) An 8-bit gray-scale image of size 500 × 1192 pixels. (b) through (i) Bit planes 1 through 8, with bit plane 1 corresponding to the least significant bit. Each bit plane is a binary image.
In terms of intensity transformation functions, it is not difficult to show that the binary image for the 8th bit plane of an 8-bit image can be obtained by processing the input image with a thresholding intensity transformation function that maps all intensities between 0 and 127 to 0 and maps all levels between 128 and 255 to 1. The binary image in Fig. 3.14(i) was obtained in just this manner. It is left as an exercise (Problem 3.4) to obtain the intensity transformation functions for generating the other bit planes.
Decomposing an image into its bit planes is useful for analyzing the relative importance of each bit in the image, a process that aids in determining the adequacy of the number of bits used to quantize the image. Also, this type of decomposition is useful for image compression (the topic of Chapter 8), in which fewer than all planes are used in reconstructing an image. For example, Fig. 3.15(a) shows an image reconstructed using bit planes 8 and 7. The reconstruction is done by multiplying the pixels of the nth plane by the constant 2^(n-1). This is nothing more than converting the nth significant binary bit to decimal. Each plane used is multiplied by the corresponding constant, and all planes used are added to obtain the gray-scale image. Thus, to obtain Fig. 3.15(a), we multiplied bit plane 8 by 128, bit plane 7 by 64, and added the two planes. Although the main features of the original image were restored, the reconstructed image appears flat, especially in the background. This is not surprising because two planes can produce only four distinct intensity levels. Adding plane 6 to the reconstruction helped the situation, as Fig. 3.15(b) shows. Note that the background of this image has perceptible false contouring. This effect is reduced significantly by adding the 5th plane to the reconstruction, as Fig. 3.15(c) illustrates. Using more planes in the reconstruction would not contribute significantly to the appearance of this image. Thus, we conclude that storing the four highest-order bit planes would allow us to reconstruct the original image in acceptable detail. Storing these four planes instead of the original image requires 50% less storage (ignoring memory architecture issues).
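A sketch of bit-plane extraction and the reconstruction just described, assuming NumPy and 8-bit images; the function names are illustrative:

```python
import numpy as np

def bit_plane(image, n):
    """Extract bit plane n of an 8-bit image (n = 1 is the least significant)."""
    return (image >> (n - 1)) & 1

def reconstruct(image, planes):
    """Rebuild a gray-scale image from selected bit planes by multiplying
    plane n by 2**(n - 1) and summing, as described for Fig. 3.15."""
    out = np.zeros(image.shape, dtype=np.uint16)
    for n in planes:
        out += bit_plane(image, n).astype(np.uint16) << (n - 1)
    return out.astype(np.uint8)

img = np.array([[194]], dtype=np.uint8)  # the gray border intensity of Fig. 3.14
print([int(bit_plane(img, n)[0, 0]) for n in range(8, 0, -1)])  # [1, 1, 0, 0, 0, 0, 1, 0]
print(reconstruct(img, [8, 7]))          # [[192]]  (128 + 64)
```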

FIGURE 3.15 Images reconstructed using (a) bit planes 8 and 7; (b) bit planes 8, 7, and 6; and (c) bit planes 8, 7, 6, and 5. Compare (c) with Fig. 3.14(a).

3.3 Histogram Processing


The histogram of a digital image with intensity levels in the range [0, L - 1] is a discrete function h(rk) = nk, where rk is the kth intensity value and nk is the number of pixels in the image with intensity rk. It is common practice to normalize a histogram by dividing each of its components by the total number of pixels in the image, denoted by the product MN, where, as usual, M and N are the row and column dimensions of the image. Thus, a normalized histogram is given by p(rk) = nk/MN, for k = 0, 1, 2, …, L - 1. Loosely speaking, p(rk) is an estimate of the probability of occurrence of intensity level rk in an image. The sum of all components of a normalized histogram is equal to 1. (Consult the book Web site for a review of basic probability theory.)
Histograms are the basis for numerous spatial domain processing techniques. Histogram manipulation can be used for image enhancement, as shown in this section. In addition to providing useful image statistics, we shall see in subsequent chapters that the information inherent in histograms also is quite useful in other image processing applications, such as image compression and segmentation. Histograms are simple to calculate in software and also lend themselves to economic hardware implementations, thus making them a popular tool for real-time image processing.
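Computing a normalized histogram takes one line with NumPy's `bincount`; the sketch below is illustrative:

```python
import numpy as np

def normalized_histogram(image, L=256):
    """Normalized histogram of Eq. (3.3-7): p(r_k) = n_k / MN."""
    counts = np.bincount(image.ravel(), minlength=L)
    return counts / image.size

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
p = normalized_histogram(img)
assert abs(p.sum() - 1.0) < 1e-12  # components of a normalized histogram sum to 1
```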
As an introduction to histogram processing for intensity transformations, consider Fig. 3.16, which shows the pollen image of Fig. 3.10 in four basic intensity characteristics: dark, light, low contrast, and high contrast. The right side of the figure shows the histograms corresponding to these images. The horizontal axis of each histogram plot corresponds to intensity values, rk. The vertical axis corresponds to values of h(rk) = nk, or p(rk) = nk/MN if the values are normalized. Thus, histograms may be viewed graphically simply as plots of h(rk) = nk versus rk or p(rk) = nk/MN versus rk.
We note in the dark image that the components of the histogram are concentrated on the low (dark) side of the intensity scale. Similarly, the components of the histogram of the light image are biased toward the high side of the scale. An image with low contrast has a narrow histogram located typically toward the middle of the intensity scale. For a monochrome image this implies a dull, washed-out gray look. Finally, we see that the components of the histogram in the high-contrast image cover a wide range of the intensity scale and, further, that the distribution of pixels is not too far from uniform, with very few vertical lines being much higher than the others. Intuitively, it is reasonable to conclude that an image whose pixels tend to occupy the entire range of possible intensity levels and, in addition, tend to be distributed uniformly, will have an appearance of high contrast and will exhibit a large variety of gray tones. The net effect will be an image that shows a great deal of gray-level detail and has a high dynamic range. It will be shown shortly that it is possible to develop a transformation function that can automatically achieve this effect, based only on information available in the histogram of the input image.
FIGURE 3.16 Four basic image types: dark, light, low contrast, and high contrast, and their corresponding histograms.

3.3.1 Histogram Equalization


Consider for a moment continuous intensity values, and let the variable r denote the intensities of an image to be processed. As usual, we assume that r is in the range [0, L - 1], with r = 0 representing black and r = L - 1 representing white. For r satisfying these conditions, we focus attention on transformations (intensity mappings) of the form

s = T(r),   0 ≤ r ≤ L - 1   (3.3-1)

that produce an output intensity level s for every pixel in the input image having intensity r. We assume that:

(a) T(r) is a monotonically† increasing function in the interval 0 ≤ r ≤ L - 1; and
(b) 0 ≤ T(r) ≤ L - 1 for 0 ≤ r ≤ L - 1.

In some formulations to be discussed later, we use the inverse

r = T⁻¹(s),   0 ≤ s ≤ L - 1   (3.3-2)

in which case we change condition (a) to

(a′) T(r) is a strictly monotonically increasing function in the interval 0 ≤ r ≤ L - 1.

The requirement in condition (a) that T(r) be monotonically increasing guarantees that output intensity values will never be less than corresponding input values, thus preventing artifacts created by reversals of intensity. Condition (b) guarantees that the range of output intensities is the same as the input. Finally, condition (a′) guarantees that the mappings from s back to r will be one-to-one, thus preventing ambiguities.
Figure 3.17(a) shows a function that satisfies conditions (a) and (b). Here, we see that it is possible for multiple values to map to a single value and still satisfy these two conditions. That is, a monotonic transformation function performs a one-to-one or many-to-one mapping. This is perfectly fine when mapping from r to s. However, Fig. 3.17(a) presents a problem if we wanted to recover the values of r uniquely from the mapped values (inverse mapping can be visualized by reversing the direction of the arrows). This would be possible for the inverse mapping of sk in Fig. 3.17(a), but the inverse mapping of sq is a range of values, which, of course, prevents us in general from recovering the original value of r that resulted in sq. As Fig. 3.17(b) shows, requiring that T(r) be strictly monotonic guarantees that the inverse mappings will be single valued (i.e., the mapping is one-to-one in both directions). This is a theoretical requirement that allows us to derive some important histogram processing techniques later in this chapter. Because in practice we deal with integer intensity values, we are forced to round all results to their nearest integer values. Therefore, when strict monotonicity is not satisfied, we address the problem of a nonunique inverse transformation by looking for the closest integer matches. Example 3.8 gives an illustration of this.

FIGURE 3.17 (a) A monotonically increasing function, showing how multiple values can map to a single value. (b) A strictly monotonically increasing function; this is a one-to-one mapping, both ways.

† Recall that a function T(r) is monotonically increasing if T(r₂) ≥ T(r₁) for r₂ > r₁. T(r) is strictly monotonically increasing if T(r₂) > T(r₁) for r₂ > r₁. Similar definitions apply to monotonically decreasing functions.
The intensity levels in an image may be viewed as random variables in the interval [0, L - 1]. A fundamental descriptor of a random variable is its probability density function (PDF). Let pr(r) and ps(s) denote the PDFs of r and s, respectively, where the subscripts on p are used to indicate that pr and ps are different functions in general. A fundamental result from basic probability theory is that if pr(r) and T(r) are known, and T(r) is continuous and differentiable over the range of values of interest, then the PDF of the transformed (mapped) variable s can be obtained using the simple formula

p_s(s) = p_r(r) \left| \frac{dr}{ds} \right|   (3.3-3)

Thus, we see that the PDF of the output intensity variable, s, is determined by the PDF of the input intensities and the transformation function used [recall that r and s are related by T(r)].
A transformation function of particular importance in image processing has the form

s = T(r) = (L - 1) \int_0^r p_r(w)\, dw   (3.3-4)

where w is a dummy variable of integration. The right side of this equation is recognized as the cumulative distribution function (CDF) of the random variable r. Because PDFs always are positive, and recalling that the integral of a function is the area under the function, it follows that the transformation function of Eq. (3.3-4) satisfies condition (a) because the area under the function cannot decrease as r increases. When the upper limit in this equation is r = (L - 1), the integral evaluates to 1 (the area under a PDF curve always is 1), so the maximum value of s is (L - 1) and condition (b) is satisfied also.
To find the ps(s) corresponding to the transformation just discussed, we use Eq. (3.3-3). We know from Leibniz's rule in basic calculus that the derivative of a definite integral with respect to its upper limit is the integrand evaluated at the limit. That is,

\frac{ds}{dr} = \frac{dT(r)}{dr} = (L - 1) \frac{d}{dr} \left[ \int_0^r p_r(w)\, dw \right] = (L - 1)\, p_r(r)   (3.3-5)

Substituting this result for dr/ds in Eq. (3.3-3), and keeping in mind that all probability values are positive, yields

p_s(s) = p_r(r) \left| \frac{dr}{ds} \right| = p_r(r) \left| \frac{1}{(L - 1)\, p_r(r)} \right| = \frac{1}{L - 1},   0 ≤ s ≤ L - 1   (3.3-6)

We recognize the form of ps(s) in the last line of this equation as a uniform probability density function. Simply stated, we have demonstrated that performing the intensity transformation in Eq. (3.3-4) yields a random variable, s, characterized by a uniform PDF. It is important to note from this equation that T(r) depends on pr(r) but, as Eq. (3.3-6) shows, the resulting ps(s) always is uniform, independently of the form of pr(r). Figure 3.18 illustrates these concepts.

FIGURE 3.18 (a) An arbitrary PDF, pr(r). (b) Result of applying the transformation in Eq. (3.3-4) to all intensity levels, r. The resulting intensities, s, have a uniform PDF, 1/(L - 1), independently of the form of the PDF of the r's.
EXAMPLE 3.4: Illustration of Eqs. (3.3-4) and (3.3-6).

■ To fix ideas, consider the following simple example. Suppose that the (continuous) intensity values in an image have the PDF

p_r(r) = \begin{cases} \dfrac{2r}{(L - 1)^2} & \text{for } 0 ≤ r ≤ L - 1 \\ 0 & \text{otherwise} \end{cases}

From Eq. (3.3-4),

s = T(r) = (L - 1) \int_0^r p_r(w)\, dw = \frac{2}{L - 1} \int_0^r w\, dw = \frac{r^2}{L - 1}

Suppose next that we form a new image with intensities, s, obtained using this transformation; that is, the s values are formed by squaring the corresponding intensity values of the input image and dividing them by (L - 1). For example, consider an image in which L = 10, and suppose that a pixel in an arbitrary location (x, y) in the input image has intensity r = 3. Then the pixel in that location in the new image is s = T(r) = r²/9 = 1. We can verify that the PDF of the intensities in the new image is uniform simply by substituting pr(r) into Eq. (3.3-6) and using the fact that s = r²/(L - 1); that is,

p_s(s) = p_r(r) \left| \frac{dr}{ds} \right| = \frac{2r}{(L - 1)^2} \left| \left[ \frac{d}{dr}\, \frac{r^2}{L - 1} \right]^{-1} \right| = \frac{2r}{(L - 1)^2} \left| \frac{L - 1}{2r} \right| = \frac{1}{L - 1}

where the last step follows from the fact that r is nonnegative and we assume that L > 1. As expected, the result is a uniform PDF. ■

For discrete values, we deal with probabilities (histogram values) and summations instead of probability density functions and integrals.† As mentioned earlier, the probability of occurrence of intensity level rk in a digital image is approximated by

p_r(r_k) = \frac{n_k}{MN},   k = 0, 1, 2, …, L - 1   (3.3-7)

where MN is the total number of pixels in the image, nk is the number of pixels that have intensity rk, and L is the number of possible intensity levels in the image (e.g., 256 for an 8-bit image). As noted in the beginning of this section, a plot of pr(rk) versus rk is commonly referred to as a histogram.

† The conditions of monotonicity stated earlier apply also in the discrete case. We simply restrict the values of the variables to be discrete.
The discrete form of the transformation in Eq. (3.3-4) is

s_k = T(r_k) = (L - 1) \sum_{j=0}^{k} p_r(r_j) = \frac{L - 1}{MN} \sum_{j=0}^{k} n_j,   k = 0, 1, 2, …, L - 1   (3.3-8)

Thus, a processed (output) image is obtained by mapping each pixel in the input image with intensity rk into a corresponding pixel with level sk in the output image, using Eq. (3.3-8). The transformation (mapping) T(rk) in this equation is called a histogram equalization or histogram linearization transformation. It is not difficult to show (Problem 3.10) that this transformation satisfies conditions (a) and (b) stated previously in this section.
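Equation (3.3-8) translates directly into a cumulative sum over the normalized histogram, applied as a lookup table. A minimal NumPy sketch, assuming an 8-bit input; the function name is illustrative:

```python
import numpy as np

def equalize(image, L=256):
    """Histogram equalization via Eq. (3.3-8): s_k is (L - 1) times the
    cumulative sum of p_r(r_j), rounded and applied as a lookup table."""
    p = np.bincount(image.ravel(), minlength=L) / image.size
    T = np.round((L - 1) * np.cumsum(p)).astype(np.uint8)  # the mapping T(r_k)
    return T[image]

img = np.random.randint(0, 64, size=(64, 64), dtype=np.uint8)  # a dark image
eq = equalize(img)  # intensities now spread toward the full [0, 255] range
```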

EXAMPLE 3.5: A simple illustration of histogram equalization.

■ Before continuing, it will be helpful to work through a simple example. Suppose that a 3-bit image (L = 8) of size 64 × 64 pixels (MN = 4096) has the intensity distribution shown in Table 3.1, where the intensity levels are integers in the range [0, L - 1] = [0, 7].
The histogram of our hypothetical image is sketched in Fig. 3.19(a). Values of the histogram equalization transformation function are obtained using Eq. (3.3-8). For instance,

s_0 = T(r_0) = 7 \sum_{j=0}^{0} p_r(r_j) = 7\, p_r(r_0) = 1.33

Similarly,

s_1 = T(r_1) = 7 \sum_{j=0}^{1} p_r(r_j) = 7\, p_r(r_0) + 7\, p_r(r_1) = 3.08

and s2 = 4.55, s3 = 5.67, s4 = 6.23, s5 = 6.65, s6 = 6.86, s7 = 7.00. This transformation function has the staircase shape shown in Fig. 3.19(b).

TABLE 3.1 Intensity distribution and histogram values for a 3-bit, 64 × 64 digital image.

rk       nk      pr(rk) = nk/MN
r0 = 0   790     0.19
r1 = 1   1023    0.25
r2 = 2   850     0.21
r3 = 3   656     0.16
r4 = 4   329     0.08
r5 = 5   245     0.06
r6 = 6   122     0.03
r7 = 7   81      0.02
FIGURE 3.19 Illustration of histogram equalization of a 3-bit (8 intensity levels) image. (a) Original histogram. (b) Transformation function. (c) Equalized histogram.

At this point, the s values still have fractions because they were generated by summing probability values, so we round them to the nearest integer:

s0 = 1.33 → 1    s4 = 6.23 → 6
s1 = 3.08 → 3    s5 = 6.65 → 7
s2 = 4.55 → 5    s6 = 6.86 → 7
s3 = 5.67 → 6    s7 = 7.00 → 7

These are the values of the equalized histogram. Observe that there are only five distinct intensity levels. Because r0 = 0 was mapped to s0 = 1, there are 790 pixels in the histogram-equalized image with this value (see Table 3.1). Also, there are in this image 1023 pixels with a value of s1 = 3 and 850 pixels with a value of s2 = 5. However, both r3 and r4 were mapped to the same value, 6, so there are (656 + 329) = 985 pixels in the equalized image with this value. Similarly, there are (245 + 122 + 81) = 448 pixels with a value of 7 in the histogram-equalized image. Dividing these numbers by MN = 4096 yielded the equalized histogram in Fig. 3.19(c).
Because a histogram is an approximation to a PDF, and no new allowed intensity levels are created in the process, perfectly flat histograms are rare in practical applications of histogram equalization. Thus, unlike its continuous counterpart, it cannot be proved (in general) that discrete histogram equalization results in a uniform histogram. However, as you will see shortly, using Eq. (3.3-8) has the general tendency to spread the histogram of the input image so that the intensity levels of the equalized image span a wider range of the intensity scale. The net result is contrast enhancement. ■
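The arithmetic of this example is easy to check; the following snippet reproduces the rounded mapping [1, 3, 5, 6, 6, 7, 7, 7] from the counts in Table 3.1 (a verification sketch, not from the book):

```python
import numpy as np

n_k = np.array([790, 1023, 850, 656, 329, 245, 122, 81])  # counts from Table 3.1
p_r = n_k / 4096                                          # MN = 4096
s = np.round(7 * np.cumsum(p_r)).astype(int)              # Eq. (3.3-8) with L - 1 = 7
print(s)  # [1 3 5 6 6 7 7 7], matching the rounded values above
```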

We discussed earlier in this section the many advantages of having intensity values that cover the entire gray scale. In addition to producing intensities that have this tendency, the method just derived has the additional advantage that it is fully "automatic." In other words, given an image, the process of histogram equalization consists simply of implementing Eq. (3.3-8), which is based on information that can be extracted directly from the given image, without the need for further parameter specifications. We note also the simplicity of the computations required to implement the technique.
The inverse transformation from s back to r is denoted by

r_k = T^{-1}(s_k),   k = 0, 1, 2, …, L - 1   (3.3-9)

It can be shown (Problem 3.10) that this inverse transformation satisfies conditions (a′) and (b) only if none of the levels, rk, k = 0, 1, 2, …, L - 1, are missing from the input image, which in turn means that none of the components of the image histogram are zero. Although the inverse transformation is not used in histogram equalization, it plays a central role in the histogram-matching scheme developed in the next section.

EXAMPLE 3.6: Histogram equalization.

■ The left column in Fig. 3.20 shows the four images from Fig. 3.16, and the center column shows the result of performing histogram equalization on each of these images. The first three results from top to bottom show significant improvement. As expected, histogram equalization did not have much effect on the fourth image because the intensities of this image already span the full intensity scale. Figure 3.21 shows the transformation functions used to generate the equalized images in Fig. 3.20. These functions were generated using Eq. (3.3-8). Observe that transformation (4) has a nearly linear shape, indicating that the inputs were mapped to nearly equal outputs.
The third column in Fig. 3.20 shows the histograms of the equalized images. It is of interest to note that, while all these histograms are different, the histogram-equalized images themselves are visually very similar. This is not unexpected because the basic difference between the images in the left column is one of contrast, not content. In other words, because the images have the same content, the increase in contrast resulting from histogram equalization was enough to render any intensity differences in the equalized images visually indistinguishable. Given the significant contrast differences between the original images, this example illustrates the power of histogram equalization as an adaptive contrast enhancement tool. ■

3.3.2 Histogram Matching (Specification)


As indicated in the preceding discussion, histogram equalization automatically determines a transformation function that seeks to produce an output image that has a uniform histogram. When automatic enhancement is desired, this is a good approach because the results from this technique are predictable and the method is simple to implement. We show in this section that there are applications in which attempting to base enhancement on a uniform histogram is not the best approach. In particular, it is useful sometimes to be able to specify the shape of the histogram that we wish the processed image to have. The method used to generate a processed image that has a specified histogram is called histogram matching or histogram specification.
FIGURE 3.20 Left column: images from Fig. 3.16. Center column: corresponding histogram-equalized images. Right column: histograms of the images in the center column.

FIGURE 3.21 Transformation functions for histogram equalization. Transformations (1) through (4) were obtained from the histograms of the images (from top to bottom) in the left column of Fig. 3.20 using Eq. (3.3-8).

Let us return for a moment to continuous intensities r and z (considered continuous random variables), and let pr(r) and pz(z) denote their corresponding continuous probability density functions. In this notation, r and z denote the intensity levels of the input and output (processed) images, respectively. We can estimate pr(r) from the given input image, while pz(z) is the specified probability density function that we wish the output image to have.
Let s be a random variable with the property

s = T(r) = (L - 1) \int_0^r p_r(w)\, dw   (3.3-10)

where, as before, w is a dummy variable of integration. We recognize this expression as the continuous version of histogram equalization given in Eq. (3.3-4). Suppose next that we define a random variable z with the property

G(z) = (L - 1) \int_0^z p_z(t)\, dt = s   (3.3-11)

where t is a dummy variable of integration. It then follows from these two equations that G(z) = T(r) and, therefore, that z must satisfy the condition

z = G^{-1}[T(r)] = G^{-1}(s)   (3.3-12)

The transformation T(r) can be obtained from Eq. (3.3-10) once pr(r) has been estimated from the input image. Similarly, the transformation function G(z) can be obtained using Eq. (3.3-11) because pz(z) is given.
Equations (3.3-10) through (3.3-12) show that an image whose intensity levels have a specified probability density function can be obtained from a given image by using the following procedure:
1. Obtain pr(r) from the input image and use Eq. (3.3-10) to obtain the values of s.
2. Use the specified PDF in Eq. (3.3-11) to obtain the transformation function G(z).
3. Obtain the inverse transformation z = G⁻¹(s); because z is obtained from s, this process is a mapping from s to z, the latter being the desired values.
4. Obtain the output image by first equalizing the input image using Eq. (3.3-10); the pixel values in this image are the s values. For each pixel with value s in the equalized image, perform the inverse mapping z = G⁻¹(s) to obtain the corresponding pixel in the output image. When all pixels have been thus processed, the PDF of the output image will be equal to the specified PDF.

EXAMPLE 3.7: Histogram specification.

■ Assuming continuous intensity values, suppose that an image has the intensity PDF p_r(r) = 2r/(L - 1)² for 0 ≤ r ≤ L - 1, and p_r(r) = 0 for other values of r. Find the transformation function that will produce an image whose intensity PDF is p_z(z) = 3z²/(L - 1)³ for 0 ≤ z ≤ L - 1, and p_z(z) = 0 for other values of z.
First, we find the histogram equalization transformation for the interval [0, L - 1]:

s = T(r) = (L - 1) \int_0^r p_r(w)\, dw = \frac{2}{L - 1} \int_0^r w\, dw = \frac{r^2}{L - 1}

By definition, this transformation is 0 for values outside the range [0, L - 1]. Squaring the values of the input intensities and dividing them by (L - 1) will produce an image whose intensities, s, have a uniform PDF because this is a histogram-equalization transformation, as discussed earlier.
We are interested in an image with a specified histogram, so we find next

G(z) = (L - 1) \int_0^z p_z(w)\, dw = \frac{3}{(L - 1)^2} \int_0^z w^2\, dw = \frac{z^3}{(L - 1)^2}

over the interval [0, L - 1]; this function is 0 elsewhere by definition. Finally, we require that G(z) = s, but G(z) = z³/(L - 1)²; so z³/(L - 1)² = s, and we have

z = \left[ (L - 1)^2 s \right]^{1/3}

So, if we multiply every histogram-equalized pixel by (L - 1)² and raise the product to the power 1/3, the result will be an image whose intensities, z, have the PDF p_z(z) = 3z²/(L - 1)³ in the interval [0, L - 1], as desired.
Because s = r²/(L - 1), we can generate the z's directly from the intensities, r, of the input image:

z = \left[ (L - 1)^2 s \right]^{1/3} = \left[ (L - 1)^2 \frac{r^2}{L - 1} \right]^{1/3} = \left[ (L - 1)\, r^2 \right]^{1/3}

Thus, squaring the value of each pixel in the original image, multiplying the result by (L - 1), and raising the product to the power 1/3 will yield an image whose intensity levels, z, have the specified PDF. We see that the intermediate step of equalizing the input image can be skipped; all we need is to obtain the transformation function T(r) that maps r to s. Then, the two steps can be combined into a single transformation from r to z. ■
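The result of this example can be checked numerically. The snippet below samples r from the PDF 2r/(L - 1)² by inverse-transform sampling, applies the combined transformation, and confirms that the mean of z matches the value 3(L - 1)/4 implied by the specified PDF (a verification sketch, not from the book):

```python
import numpy as np

L = 256
u = np.random.rand(100_000)
r = (L - 1) * np.sqrt(u)         # inverse-transform sampling: the CDF of r is r^2/(L-1)^2
z = ((L - 1) * r**2) ** (1 / 3)  # the combined transformation derived above
print(np.mean(z) / (L - 1))      # ≈ 0.75, since the mean of 3z^2/(L-1)^3 is 3(L-1)/4
```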
As the preceding example shows, histogram specification is straightforward in principle. In practice, a common difficulty is finding meaningful analytical expressions for T(r) and G⁻¹. Fortunately, the problem is simplified significantly when dealing with discrete quantities. The price paid is the same as for histogram equalization, where only an approximation to the desired histogram is achievable. In spite of this, however, some very useful results can be obtained, even with crude approximations.
The discrete formulation of Eq. (3.3-10) is the histogram equalization transformation in Eq. (3.3-8), which we repeat here for convenience:

s_k = T(r_k) = (L - 1) \sum_{j=0}^{k} p_r(r_j) = \frac{L - 1}{MN} \sum_{j=0}^{k} n_j,   k = 0, 1, 2, …, L - 1   (3.3-13)

where, as before, MN is the total number of pixels in the image, nj is the number of pixels that have intensity value rj, and L is the total number of possible intensity levels in the image. Similarly, given a specific value of sk, the discrete formulation of Eq. (3.3-11) involves computing the transformation function

G(z_q) = (L - 1) \sum_{i=0}^{q} p_z(z_i)   (3.3-14)

for a value of q, so that

G(z_q) = s_k   (3.3-15)

where pz(zi) is the ith value of the specified histogram. As before, we find the desired value zq by obtaining the inverse transformation:

z_q = G^{-1}(s_k)   (3.3-16)

In other words, this operation gives a value of z for each value of s; thus, it performs a mapping from s to z.
In practice, we do not need to compute the inverse of G. Because we deal with intensity levels that are integers (e.g., 0 to 255 for an 8-bit image), it is a simple matter to compute all the possible values of G using Eq. (3.3-14) for q = 0, 1, 2, …, L - 1. These values are scaled and rounded to their nearest integer values spanning the range [0, L - 1]. The values are stored in a table. Then, given a particular value of sk, we look for the closest match in the values stored in the table. If, for example, the 64th entry in the table is the closest to sk, then q = 63 (recall that we start counting at 0) and z63 is the best solution to Eq. (3.3-15). Thus, the given value sk would be associated with z63 (i.e., that specific value of sk would map to z63). Because the z's are intensities used as the basis for specifying the histogram pz(z), it follows that z0 = 0, z1 = 1, …, z(L-1) = L - 1, so z63 would have the intensity value 63. By repeating this procedure, we would find the mapping of each value of sk to the value of zq that is the closest solution to Eq. (3.3-15). These mappings are the solution to the histogram-specification problem.
Recalling that the sk's are the values of the histogram-equalized image, we may summarize the histogram-specification procedure as follows (a code sketch appears after this list):

1. Compute the histogram pr(r) of the given image, and use it to find the histogram equalization transformation in Eq. (3.3-13). Round the resulting values, sk, to the integer range [0, L - 1].
2. Compute all values of the transformation function G using Eq. (3.3-14) for q = 0, 1, 2, …, L - 1, where pz(zi) are the values of the specified histogram. Round the values of G to integers in the range [0, L - 1]. Store the values of G in a table.
3. For every value of sk, k = 0, 1, 2, …, L - 1, use the stored values of G from step 2 to find the corresponding value of zq so that G(zq) is closest to sk, and store these mappings from s to z. When more than one value of zq satisfies the given sk (i.e., the mapping is not unique), choose the smallest value by convention.
4. Form the histogram-specified image by first histogram-equalizing the input image and then mapping every equalized pixel value, sk, of this image to the corresponding value zq in the histogram-specified image using the mappings found in step 3. As in the continuous case, the intermediate step of equalizing the input image is conceptual. It can be skipped by combining the two transformation functions, T and G⁻¹, as Example 3.8 shows.
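A minimal NumPy sketch of steps 1 through 4, assuming the specified histogram is given as a length-L array that sums to 1; the function name is illustrative:

```python
import numpy as np

def histogram_specify(image, pz, L=256):
    """Histogram specification following steps 1 through 4 above.
    pz is the specified histogram: a length-L array summing to 1."""
    # Step 1: scaled, rounded histogram-equalization values s_k, Eq. (3.3-13).
    pr = np.bincount(image.ravel(), minlength=L) / image.size
    s = np.round((L - 1) * np.cumsum(pr)).astype(int)
    # Step 2: table of rounded G(z_q) values, Eq. (3.3-14).
    G = np.round((L - 1) * np.cumsum(pz)).astype(int)
    # Step 3: for each s_k, the smallest z_q whose G(z_q) is closest to s_k
    # (np.argmin returns the first, i.e., smallest, index on ties).
    mapping = np.array([int(np.argmin(np.abs(G - sk))) for sk in s])
    # Step 4: the combined r -> s -> z transformation, applied as a lookup table.
    return mapping[image].astype(np.uint8)
```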

As mentioned earlier, for G⁻¹ to satisfy conditions (a′) and (b), G has to be strictly monotonic, which, according to Eq. (3.3-14), means that none of the values pz(zi) of the specified histogram can be zero (Problem 3.10). When working with discrete quantities, the fact that this condition may not be satisfied is not a serious implementation issue, as step 3 above indicates. The following example illustrates this numerically.

EXAMPLE 3.8: A simple example of histogram specification.

■ Consider again the 64 × 64 hypothetical image from Example 3.5, whose histogram is repeated in Fig. 3.22(a). It is desired to transform this histogram so that it will have the values specified in the second column of Table 3.2. Figure 3.22(b) shows a sketch of this histogram.
The first step in the procedure is to obtain the scaled histogram-equalized values, which we did in Example 3.5:

s0 = 1    s2 = 5    s4 = 7    s6 = 7
s1 = 3    s3 = 6    s5 = 7    s7 = 7

FIGURE 3.22 (a) Histogram of a 3-bit image. (b) Specified histogram. (c) Transformation function obtained from the specified histogram. (d) Result of performing histogram specification. Compare (b) and (d). [Plots of pr(rk), pz(zq), and G(zq) versus intensity level, 0 to 7.]

In the next step, we compute all the values of the transformation function, G,
using Eq. (3.3-14):
G(z_0) = 7 \sum_{j=0}^{0} p_z(z_j) = 7\, p_z(z_0) = 0.00

Similarly,

G(z_1) = 7 \sum_{j=0}^{1} p_z(z_j) = 7\, [p_z(z_0) + p_z(z_1)] = 0.00

and
G(z2) = 0.00   G(z3) = 1.05   G(z4) = 2.45   G(z5) = 4.55   G(z6) = 5.95   G(z7) = 7.00

TABLE 3.2 Specified and actual histograms (the values in the third column are from the computations performed in the body of Example 3.8).

    zq        Specified pz(zq)    Actual pz(zq)
    z0 = 0         0.00               0.00
    z1 = 1         0.00               0.00
    z2 = 2         0.00               0.00
    z3 = 3         0.15               0.19
    z4 = 4         0.20               0.25
    z5 = 5         0.30               0.21
    z6 = 6         0.20               0.24
    z7 = 7         0.15               0.11

As in Example 3.5, these fractional values are converted to integers in our valid range, [0, 7]. The results are:

G(z0) = 0.00 → 0    G(z4) = 2.45 → 2
G(z1) = 0.00 → 0    G(z5) = 4.55 → 5
G(z2) = 0.00 → 0    G(z6) = 5.95 → 6
G(z3) = 1.05 → 1    G(z7) = 7.00 → 7

These results are summarized in Table 3.3, and the transformation function is
sketched in Fig. 3.22(c). Observe that G is not strictly monotonic, so condition
(a′) is violated. Therefore, we make use of the approach outlined in step 3 of
the algorithm to handle this situation.
In the third step of the procedure, we find the smallest value of zq so that
the value G(zq) is the closest to sk. We do this for every value of sk to create
the required mappings from s to z. For example, s0 = 1, and we see that
G(z3) = 1, which is a perfect match in this case, so we have the correspon-
dence s0 → z3. That is, every pixel whose value is 1 in the histogram-equalized
image would map to a pixel valued 3 (in the corresponding location) in the
histogram-specified image. Continuing in this manner, we arrive at the map-
pings in Table 3.4.
In the final step of the procedure, we use the mappings in Table 3.4 to map
every pixel in the histogram equalized image into a corresponding pixel in the
newly created histogram-specified image. The values of the resulting his-
togram are listed in the third column of Table 3.2, and the histogram is
sketched in Fig. 3.22(d). The values of pz (zq) were obtained using the same
procedure as in Example 3.5. For instance, we see in Table 3.4 that s = 1 maps
to z = 3, and there are 790 pixels in the histogram-equalized image with a
value of 1. Therefore, pz(z3) = 790/4096 ≈ 0.19.
Although the final result shown in Fig. 3.22(d) does not match the specified
histogram exactly, the general trend of moving the intensities toward the high
end of the intensity scale definitely was achieved. As mentioned earlier, ob-
taining the histogram-equalized image as an intermediate step is useful for ex-
plaining the procedure, but this is not necessary. Instead, we could list the
mappings from the rs to the ss and from the ss to the zs in a three-column

TABLE 3.3 All possible values of the transformation function G scaled, rounded, and ordered with respect to z.

    zq        G(zq)
    z0 = 0      0
    z1 = 1      0
    z2 = 2      0
    z3 = 3      1
    z4 = 4      2
    z5 = 5      5
    z6 = 6      6
    z7 = 7      7

TABLE 3.4 Mappings of all the values of sk into corresponding values of zq.

    sk → zq
    1 → 3
    3 → 4
    5 → 5
    6 → 6
    7 → 7

table. Then, we would use those mappings to map the original pixels directly
into the pixels of the histogram-specified image. ■
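
As a check, running the specify_histogram sketch given earlier on the data of this example reproduces these results. The image below is a hypothetical reconstruction built from the pixel counts of Example 3.5; any 64 * 64 arrangement with the same histogram would do:

```python
import numpy as np

# Pixel counts for r = 0, ..., 7 from Example 3.5, and the specified
# histogram from the second column of Table 3.2.
counts = np.array([790, 1023, 850, 656, 329, 245, 122, 81])
f = np.repeat(np.arange(8), counts).reshape(64, 64)
p_z = np.array([0.00, 0.00, 0.00, 0.15, 0.20, 0.30, 0.20, 0.15])

g = specify_histogram(f, p_z)
print(np.bincount(g.ravel(), minlength=8) / g.size)
# -> approximately [0, 0, 0, 0.19, 0.25, 0.21, 0.24, 0.11],
#    the "Actual" column of Table 3.2
```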

EXAMPLE 3.9: Comparison between histogram equalization and histogram matching.
■ Figure 3.23(a) shows an image of the Mars moon, Phobos, taken by NASA's Mars Global Surveyor. Figure 3.23(b) shows the histogram of Fig. 3.23(a). The image is dominated by large, dark areas, resulting in a histogram characterized by a large concentration of pixels in the dark end of the gray scale. At first glance, one might conclude that histogram equalization would be a good approach to enhance this image, so that details in the dark areas become more visible. It is demonstrated in the following discussion that this is not so.
Figure 3.24(a) shows the histogram equalization transformation [Eq. (3.3-8)
or (3.3-13)] obtained from the histogram in Fig. 3.23(b). The most relevant
characteristic of this transformation function is how fast it rises from intensity
level 0 to a level near 190. This is caused by the large concentration of pixels in
the input histogram having levels near 0. When this transformation is applied
to the levels of the input image to obtain a histogram-equalized result, the net
effect is to map a very narrow interval of dark pixels into the upper end of the
gray scale of the output image. Because numerous pixels in the input image
have levels precisely in this interval, we would expect the result to be an image
with a light, washed-out appearance. As Fig. 3.24(b) shows, this is indeed the

FIGURE 3.23 (a) Image of the Mars moon Phobos taken by NASA's Mars Global Surveyor. (b) Histogram: number of pixels (×10⁴) versus intensity. (Original image courtesy of NASA.)

FIGURE 3.24 (a) Transformation function for histogram equalization (output intensity versus input intensity). (b) Histogram-equalized image (note the washed-out appearance). (c) Histogram of (b): number of pixels (×10⁴) versus intensity.

case. The histogram of this image is shown in Fig. 3.24(c). Note how all the in-
tensity levels are biased toward the upper one-half of the gray scale.
Because the problem with the transformation function in Fig. 3.24(a) was
caused by a large concentration of pixels in the original image with levels near
0, a reasonable approach is to modify the histogram of that image so that it
does not have this property. Figure 3.25(a) shows a manually specified function
that preserves the general shape of the original histogram, but has a smoother
transition of levels in the dark region of the gray scale. Sampling this function
into 256 equally spaced discrete values produced the desired specified his-
togram. The transformation function G(z) obtained from this histogram using
Eq. (3.3-14) is labeled transformation (1) in Fig. 3.25(b). Similarly, the inverse
transformation G⁻¹(s) from Eq. (3.3-16) (obtained using the step-by-step pro-
cedure discussed earlier) is labeled transformation (2) in Fig. 3.25(b). The en-
hanced image in Fig. 3.25(c) was obtained by applying transformation (2) to
the pixels of the histogram-equalized image in Fig. 3.24(b). The improvement
of the histogram-specified image over the result obtained by histogram equal-
ization is evident by comparing these two images. It is of interest to note that a
rather modest change in the original histogram was all that was required to
obtain a significant improvement in appearance. Figure 3.25(d) shows the his-
togram of Fig. 3.25(c). The most distinguishing feature of this histogram is
how its low end has shifted right toward the lighter region of the gray scale
(but not excessively so), as desired. ■

FIGURE 3.25 (a) Specified histogram. (b) Transformations: curve (1) is G(z), obtained from the specified histogram; curve (2) is the inverse transformation used for enhancement. (c) Enhanced image using mappings from curve (2). (d) Histogram of (c).

Although it probably is obvious by now, we emphasize before leaving this section that histogram specification is, for the most part, a trial-and-error
process. One can use guidelines learned from the problem at hand, just as we
did in the preceding example. At times, there may be cases in which it is possi-
ble to formulate what an “average” histogram should look like and use that as
the specified histogram. In cases such as these, histogram specification be-
comes a straightforward process. In general, however, there are no rules for
specifying histograms, and one must resort to analysis on a case-by-case basis
for any given enhancement task.

3.3.3 Local Histogram Processing


The histogram processing methods discussed in the previous two sections are
global, in the sense that pixels are modified by a transformation function
based on the intensity distribution of an entire image. Although this global ap-
proach is suitable for overall enhancement, there are cases in which it is neces-
sary to enhance details over small areas in an image. The number of pixels in
these areas may have negligible influence on the computation of a global
transformation whose shape does not necessarily guarantee the desired local
enhancement. The solution is to devise transformation functions based on the
intensity distribution in a neighborhood of every pixel in the image.
The histogram processing techniques previously described are easily adapted
to local enhancement. The procedure is to define a neighborhood and move
its center from pixel to pixel. At each location, the histogram of the points in
the neighborhood is computed and either a histogram equalization or his-
togram specification transformation function is obtained. This function is
then used to map the intensity of the pixel centered in the neighborhood. The
center of the neighborhood region is then moved to an adjacent pixel location
and the procedure is repeated. Because only one row or column of the neigh-
borhood changes during a pixel-to-pixel translation of the neighborhood, up-
dating the histogram obtained in the previous location with the new data
introduced at each motion step is possible (Problem 3.12). This approach has
obvious advantages over repeatedly computing the histogram of all pixels in
the neighborhood region each time the region is moved one pixel location.
Another approach used sometimes to reduce computation is to utilize
nonoverlapping regions, but this method usually produces an undesirable
“blocky” effect.
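
A brute-force sketch of this procedure (ours, not from the text) recomputes the neighborhood histogram at every pixel rather than updating it incrementally as suggested above, but it shows the logic plainly:

```python
import numpy as np

def local_hist_equalize(f, size=3, L=256):
    """Local histogram equalization with a size x size neighborhood.
    Only the center pixel of each neighborhood is remapped."""
    a = size // 2
    fp = np.pad(f, a, mode='reflect')            # pad so the window fits at the borders
    g = np.empty_like(f)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            window = fp[x:x + size, y:y + size]
            hist = np.bincount(window.ravel(), minlength=L)
            cdf = np.cumsum(hist) / window.size
            g[x, y] = int(round((L - 1) * cdf[f[x, y]]))
    return g
```

The choice of mirror padding at the borders is ours; any of the padding strategies discussed later in this chapter could be used instead.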

EXAMPLE 3.10: Local histogram equalization.
■ Figure 3.26(a) shows an 8-bit, 512 * 512 image that at first glance appears to contain five black squares on a gray background. The image is slightly noisy,
but the noise is imperceptible. Figure 3.26(b) shows the result of global his-
togram equalization. As often is the case with histogram equalization of
smooth, noisy regions, this image shows significant enhancement of the noise.
Aside from the noise, however, Fig. 3.26(b) does not reveal any new significant
details from the original, other than a very faint hint that the top left and bot-
tom right squares contain an object. Figure 3.26(c) was obtained using local
histogram equalization with a neighborhood of size 3 * 3. Here, we see signif-
icant detail contained within the dark squares. The intensity values of these ob-
jects were too close to the intensity of the large squares, and their sizes were
too small, to influence global histogram equalization significantly enough to
show this detail. ■

3.3.4 Using Histogram Statistics for Image Enhancement


Statistics obtained directly from an image histogram can be used for image en-
hancement. Let r denote a discrete random variable representing intensity val-
ues in the range [0, L - 1], and let p(ri) denote the normalized histogram

a b c
FIGURE 3.26 (a) Original image. (b) Result of global histogram equalization. (c) Result of local
histogram equalization applied to (a), using a neighborhood of size 3 * 3.

component corresponding to value ri. As indicated previously, we may view p(ri) as an estimate of the probability that intensity ri occurs in the image from
which the histogram was obtained.
As we discussed in Section 2.6.8, the nth moment of r about its mean is de-
fined as
\mu_n(r) = \sum_{i=0}^{L-1} (r_i - m)^n\, p(r_i)   (3.3-17)

where m is the mean (average intensity) value of r (i.e., the average intensity of the pixels in the image):

m = \sum_{i=0}^{L-1} r_i\, p(r_i)   (3.3-18)

(We follow convention in using m for the mean value. Do not confuse it with the same symbol used to denote the number of rows in an m * n neighborhood, in which we also follow notational convention.)

The second moment is particularly important:
\mu_2(r) = \sum_{i=0}^{L-1} (r_i - m)^2\, p(r_i)   (3.3-19)

We recognize this expression as the intensity variance, normally denoted by σ² (recall that the standard deviation is the square root of the variance). Whereas
the mean is a measure of average intensity, the variance (or standard devia-
tion) is a measure of contrast in an image. Observe that all moments are com-
puted easily using the preceding expressions once the histogram has been
obtained from a given image.
When working with only the mean and variance, it is common practice to es-
timate them directly from the sample values, without computing the histogram.
Appropriately, these estimates are called the sample mean and sample variance.
They are given by the following familiar expressions from basic statistics:
m = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)   (3.3-20)

and

\sigma^2 = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [f(x, y) - m]^2   (3.3-21)

for x = 0, 1, 2, …, M - 1 and y = 0, 1, 2, …, N - 1. (The denominator in Eq. (3.3-21) is written sometimes as MN - 1 instead of MN. This is done to obtain a so-called unbiased estimate of the variance. However, we are more interested in Eqs. (3.3-21) and (3.3-19) agreeing when the histogram in the latter equation is computed from the same image used in Eq. (3.3-21). For this we require the MN term. The difference is negligible for any image of practical size.) In other words, as we know, the mean intensity of an image can be obtained simply by summing the values of all its pixels and dividing the sum by the total number of pixels in the image. A similar interpretation applies to Eq. (3.3-21). As we illustrate in the following example, the results obtained using these two equations are identical to the results obtained using Eqs. (3.3-18) and (3.3-19), provided that the histogram used in these equations is computed from the same image used in Eqs. (3.3-20) and (3.3-21).

EXAMPLE 3.11: Computing histogram statistics.
■ Before proceeding, it will be useful to work through a simple numerical example to fix ideas. Consider the following 2-bit image of size 5 * 5:
0 0 1 1 2
1 2 3 0 1
3 3 2 2 0
2 3 1 0 0
1 1 3 2 2

The pixels are represented by 2 bits; therefore, L = 4 and the intensity levels
are in the range [0, 3]. The total number of pixels is 25, so the histogram has the
components
p(r_0) = 6/25 = 0.24;   p(r_1) = 7/25 = 0.28;   p(r_2) = 7/25 = 0.28;   p(r_3) = 5/25 = 0.20
where the numerator in p(ri) is the number of pixels in the image with intensity
level ri. We can compute the average value of the intensities in the image using
Eq. (3.3-18):
m = \sum_{i=0}^{3} r_i\, p(r_i)

= (0)(0.24) + (1)(0.28) + (2)(0.28) + (3)(0.20)


= 1.44
Letting f(x, y) denote the preceding 5 * 5 array and using Eq. (3.3-20), we obtain
m = \frac{1}{25} \sum_{x=0}^{4} \sum_{y=0}^{4} f(x, y) = 1.44

As expected, the results agree. Similarly, the result for the variance is the same
(1.1264) using either Eq. (3.3-19) or (3.3-21). ■
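
These numbers are easy to verify in a few lines (a NumPy sketch of Eqs. (3.3-18)–(3.3-21); the code is ours, not from the text):

```python
import numpy as np

f = np.array([[0, 0, 1, 1, 2],
              [1, 2, 3, 0, 1],
              [3, 3, 2, 2, 0],
              [2, 3, 1, 0, 0],
              [1, 1, 3, 2, 2]])

p = np.bincount(f.ravel(), minlength=4) / f.size   # [0.24, 0.28, 0.28, 0.20]
r = np.arange(4)
m = np.sum(r * p)                                  # Eq. (3.3-18): 1.44
var = np.sum((r - m) ** 2 * p)                     # Eq. (3.3-19): 1.1264
# The sample estimates of Eqs. (3.3-20) and (3.3-21) agree exactly:
assert np.isclose(m, f.mean()) and np.isclose(var, f.var())
```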
We consider two uses of the mean and variance for enhancement purposes.
The global mean and variance are computed over an entire image and are use-
ful for gross adjustments in overall intensity and contrast. A more powerful
use of these parameters is in local enhancement, where the local mean and
variance are used as the basis for making changes that depend on image char-
acteristics in a neighborhood about each pixel in an image.
Let (x, y) denote the coordinates of any pixel in a given image, and let Sxy
denote a neighborhood (subimage) of specified size, centered on (x, y). The
mean value of the pixels in this neighborhood is given by the expression
m_{S_{xy}} = \sum_{i=0}^{L-1} r_i\, p_{S_{xy}}(r_i)   (3.3-22)

where pSxy is the histogram of the pixels in region Sxy. This histogram has L
components, corresponding to the L possible intensity values in the input image.
However, many of the components are 0, depending on the size of Sxy. For ex-
ample, if the neighborhood is of size 3 * 3 and L = 256, only between 1 and 9
of the 256 components of the histogram of the neighborhood will be nonzero.
These non-zero values will correspond to the number of different intensities in
Sxy (the maximum number of possible different intensities in a 3 * 3 region is 9,
and the minimum is 1).
The variance of the pixels in the neighborhood similarly is given by
\sigma_{S_{xy}}^2 = \sum_{i=0}^{L-1} (r_i - m_{S_{xy}})^2\, p_{S_{xy}}(r_i)   (3.3-23)

As before, the local mean is a measure of average intensity in neighborhood Sxy, and the local variance (or standard deviation) is a measure of intensity
contrast in that neighborhood. Expressions analogous to (3.3-20) and (3.3-21)
can be written for neighborhoods. We simply use the pixel values in the neigh-
borhoods in the summations and the number of pixels in the neighborhood in
the denominator.
As the following example illustrates, an important aspect of image process-
ing using the local mean and variance is the flexibility they afford in developing
simple, yet powerful enhancement techniques based on statistical measures
that have a close, predictable correspondence with image appearance.

EXAMPLE 3.12: Local enhancement using histogram statistics.
■ Figure 3.27(a) shows an SEM (scanning electron microscope) image of a tungsten filament wrapped around a support. The filament in the center of the image and its support are quite clear and easy to study. There is another filament structure on the right, dark side of the image, but it is almost imper-
ceptible, and its size and other characteristics certainly are not easily discern-
able. Local enhancement by contrast manipulation is an ideal approach to
problems such as this, in which parts of an image may contain hidden features.

a b c
FIGURE 3.27 (a) SEM image of a tungsten filament magnified approximately 130×.
(b) Result of global histogram equalization. (c) Image enhanced using local histogram
statistics. (Original image courtesy of Mr. Michael Shaffer, Department of Geological
Sciences, University of Oregon, Eugene.)

In this particular case, the problem is to enhance dark areas while leaving
the light area as unchanged as possible because it does not require enhance-
ment. We can use the concepts presented in this section to formulate an en-
hancement method that can tell the difference between dark and light and, at
the same time, is capable of enhancing only the dark areas. A measure of
whether an area is relatively light or dark at a point (x, y) is to compare the av-
erage local intensity, mSxy, to the average image intensity, called the global
mean and denoted mG. This quantity is obtained with Eq. (3.3-18) or (3.3-20)
using the entire image. Thus, we have the first element of our enhancement
scheme: We will consider the pixel at a point (x, y) as a candidate for processing
if mSxy ≤ k0mG, where k0 is a positive constant with value less than 1.0.
Because we are interested in enhancing areas that have low contrast, we also
need a measure to determine whether the contrast of an area makes it a candi-
date for enhancement. We consider the pixel at a point (x, y) as a candidate for
enhancement if σSxy ≤ k2σG, where σG is the global standard deviation obtained using Eq. (3.3-19) or (3.3-21) and k2 is a positive constant. The value
of this constant will be greater than 1.0 if we are interested in enhancing light
areas and less than 1.0 for dark areas.
Finally, we need to restrict the lowest values of contrast we are willing to ac-
cept; otherwise the procedure would attempt to enhance constant areas, whose
standard deviation is zero. Thus, we also set a lower limit on the local standard
deviation by requiring that k1σG ≤ σSxy, with k1 < k2. A pixel at (x, y) that
meets all the conditions for local enhancement is processed simply by multi-
plying it by a specified constant, E, to increase (or decrease) the value of its in-
tensity level relative to the rest of the image. Pixels that do not meet the
enhancement conditions are not changed.

We summarize the preceding approach as follows. Let f(x, y) represent the value of an image at any image coordinates (x, y), and let g(x, y) represent the corresponding enhanced value at those coordinates. Then,

g(x, y) = \begin{cases} E \cdot f(x, y) & \text{if } m_{S_{xy}} \le k_0 m_G \text{ AND } k_1 \sigma_G \le \sigma_{S_{xy}} \le k_2 \sigma_G \\ f(x, y) & \text{otherwise} \end{cases}   (3.3-24)

for x = 0, 1, 2, …, M - 1 and y = 0, 1, 2, …, N - 1, where, as indicated above, E, k0, k1, and k2 are specified parameters, mG is the global mean of the input image, and σG is its standard deviation. Parameters mSxy and σSxy are the
local mean and standard deviation, respectively. As usual, M and N are the row
and column image dimensions.
Choosing the parameters in Eq. (3.3-24) generally requires a bit of experi-
mentation to gain familiarity with a given image or class of images. In this
case, the following values were selected: E = 4.0, k0 = 0.4, k1 = 0.02, and
k2 = 0.4. The relatively low value of 4.0 for E was chosen so that, when it was
multiplied by the levels in the areas being enhanced (which are dark), the re-
sult would still tend toward the dark end of the scale, and thus preserve the
general visual balance of the image. The value of k0 was chosen as less than
half the global mean because we can see by looking at the image that the areas
that require enhancement definitely are dark enough to be below half the
global mean. A similar analysis led to the choice of values for k1 and k2.
Choosing these constants is not difficult in general, but their choice definitely
must be guided by a logical analysis of the enhancement problem at hand. Fi-
nally, the size of the local area Sxy should be as small as possible in order to
preserve detail and keep the computational burden as low as possible. We
chose a region of size 3 * 3.
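
Equation (3.3-24) maps almost directly into code. The following sketch (ours, not from the text) uses the parameter values chosen above and a brute-force local window; the final clipping to the 8-bit range is our addition:

```python
import numpy as np

def local_stats_enhance(f, E=4.0, k0=0.4, k1=0.02, k2=0.4, size=3):
    """Local enhancement by Eq. (3.3-24): multiply a pixel by E when its
    neighborhood is dark (low local mean) and has low but nonzero contrast."""
    mG, sG = f.mean(), f.std()                   # global mean and standard deviation
    a = size // 2
    fp = np.pad(f.astype(float), a, mode='reflect')
    g = f.astype(float).copy()
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            w = fp[x:x + size, y:y + size]
            mS, sS = w.mean(), w.std()
            if mS <= k0 * mG and k1 * sG <= sS <= k2 * sG:
                g[x, y] = E * f[x, y]             # pixel meets all three conditions
    return np.clip(g, 0, 255).astype(f.dtype)    # keep a valid 8-bit range (ours)
```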
As a basis for comparison, we enhanced the image using global histogram
equalization. Figure 3.27(b) shows the result. The dark area was improved but
details still are difficult to discern, and the light areas were changed, something
we did not want to do. Figure 3.27(c) shows the result of using the local statis-
tics method explained above. In comparing this image with the original in Fig.
3.27(a) or the histogram equalized result in Fig. 3.27(b), we note the obvious
detail that has been brought out on the right side of Fig. 3.27(c). Observe, for
example, the clarity of the ridges in the dark filaments. It is noteworthy that
the light-intensity areas on the left were left nearly intact, which was one of
our initial objectives. ■

3.4 Fundamentals of Spatial Filtering


In this section, we introduce several basic concepts underlying the use of spa-
tial filters for image processing. Spatial filtering is one of the principal tools
used in this field for a broad spectrum of applications, so it is highly advisable
that you develop a solid understanding of these concepts. As mentioned at the
beginning of this chapter, the examples in this section deal mostly with the use
of spatial filters for image enhancement. Other applications of spatial filtering
are discussed in later chapters.

The name filter is borrowed from frequency domain processing, which is the topic of the next chapter, where “filtering” refers to accepting (passing) or
rejecting certain frequency components. For example, a filter that passes low
frequencies is called a lowpass filter. The net effect produced by a lowpass fil-
ter is to blur (smooth) an image. We can accomplish a similar smoothing di-
rectly on the image itself by using spatial filters (also called spatial masks,
kernels, templates, and windows). In fact, as we show in Chapter 4, there is a
one-to-one correspondence between linear spatial filters and filters in the frequency domain. (See Section 2.6.2 regarding linearity.) However, spatial filters offer considerably more versatility be-
cause, as you will see later, they can be used also for nonlinear filtering,
something we cannot do in the frequency domain.

3.4.1 The Mechanics of Spatial Filtering


In Fig. 3.1, we explained briefly that a spatial filter consists of (1) a
neighborhood (typically a small rectangle), and (2) a predefined operation that
is performed on the image pixels encompassed by the neighborhood. Filtering
creates a new pixel with coordinates equal to the coordinates of the center of
the neighborhood, and whose value is the result of the filtering operation.† A
processed (filtered) image is generated as the center of the filter visits each
pixel in the input image. If the operation performed on the image pixels is lin-
ear, then the filter is called a linear spatial filter. Otherwise, the filter is
nonlinear. We focus attention first on linear filters and then illustrate some
simple nonlinear filters. Section 5.3 contains a more comprehensive list of non-
linear filters and their application.
Figure 3.28 illustrates the mechanics of linear spatial filtering using a 3 * 3
neighborhood. At any point (x, y) in the image, the response, g(x, y), of the fil-
ter is the sum of products of the filter coefficients and the image pixels encom-
passed by the filter:
g(x, y) = w(-1, -1) f(x - 1, y - 1) + w(-1, 0) f(x - 1, y) + ⋯ + w(0, 0) f(x, y) + ⋯ + w(1, 1) f(x + 1, y + 1)

Observe that the center coefficient of the filter, w(0, 0), aligns with the pixel at
location (x, y). For a mask of size m * n, we assume that m = 2a + 1 and
n = 2b + 1, where a and b are positive integers. This means that our focus in
the following discussion is on filters of odd size, with the smallest being of size 3 * 3. (It certainly is possible to work with filters of even size or mixed even and odd sizes. However, working with odd sizes simplifies indexing and also is more intuitive because the filters have centers falling on integer values.) In general, linear spatial filtering of an image of size M * N with a filter of size m * n is given by the expression:

g(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t) f(x + s, y + t)
where x and y are varied so that each pixel in w visits every pixel in f.


† The filtered pixel value typically is assigned to a corresponding location in a new image created to hold
the results of filtering. It is seldom the case that filtered pixels replace the values of the corresponding
location in the original image, as this would change the content of the image while filtering still is being
performed.


FIGURE 3.28 The mechanics of linear spatial filtering using a 3 * 3 filter mask. The form chosen to denote
the coordinates of the filter mask coefficients simplifies writing expressions for linear filtering.

3.4.2 Spatial Correlation and Convolution


There are two closely related concepts that must be understood clearly when
performing linear spatial filtering. One is correlation and the other is
convolution. Correlation is the process of moving a filter mask over the image
and computing the sum of products at each location, exactly as explained in
the previous section. The mechanics of convolution are the same, except that
the filter is first rotated by 180°. The best way to explain the differences be-
tween the two concepts is by example. We begin with a 1-D illustration.
Figure 3.29(a) shows a 1-D function, f, and a filter, w, and Fig. 3.29(b) shows
the starting position to perform correlation. The first thing we note is that there

FIGURE 3.29 Illustration of 1-D correlation and convolution of a filter with a discrete unit impulse. Note that correlation and convolution are functions of displacement. [Key values from the figure: f = (0 0 0 1 0 0 0 0); w = (1 2 3 2 8); full correlation result = (0 0 0 8 2 3 2 1 0 0 0 0), cropped to (0 8 2 3 2 1 0 0); full convolution result = (0 0 0 1 2 3 2 8 0 0 0 0), cropped to (0 1 2 3 2 8 0 0).]

are parts of the functions that do not overlap. The solution to this problem is to pad f with enough 0s on either side to allow each pixel in w to visit every pixel in f. If the filter is of size m, we need m - 1 0s on either side of f. (Zero padding is not the only option. For example, we could duplicate the value of the first and last element m - 1 times on each side of f, or mirror the first and last m - 1 elements and use the mirrored values for padding.) Figure 3.29(c) shows a properly padded function. The first value of correlation is the sum of products of f and w for the initial position shown in Fig. 3.29(c) (the sum of products is 0). This corresponds to a displacement x = 0. To obtain the second value of correlation, we shift w one pixel location to the right (a displacement of x = 1) and compute the sum of products. The result again is 0. In fact, the first nonzero result is when x = 3, in which case the 8 in w overlaps the 1 in f and the result of correlation is 8. Proceeding in this manner, we obtain the full correlation result in Fig. 3.29(g). Note that it took 12 values of x (i.e., x = 0, 1, 2, …, 11) to fully slide w past f so that each pixel in w visited every pixel in f. Often, we like to work with correlation arrays that are the same size as f, in which case we crop the full correlation to the size of the original function, as Fig. 3.29(h) shows.
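
NumPy's 1-D routines reproduce the values of Fig. 3.29 directly (a sketch; np.correlate and np.convolve perform the necessary padding internally when mode='full'):

```python
import numpy as np

f = np.array([0, 0, 0, 1, 0, 0, 0, 0])          # discrete unit impulse
w = np.array([1, 2, 3, 2, 8])                   # the filter of Fig. 3.29

print(np.correlate(f, w, mode='full'))          # [0 0 0 8 2 3 2 1 0 0 0 0]
print(np.convolve(f, w, mode='full'))           # [0 0 0 1 2 3 2 8 0 0 0 0]
# Convolution is correlation with the filter rotated by 180 degrees:
print(np.correlate(f, w[::-1], mode='full'))    # matches np.convolve(f, w)
```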

There are two important points to note from the discussion in the preceding
paragraph. First, correlation is a function of displacement of the filter. In other
words, the first value of correlation corresponds to zero displacement of the
filter, the second corresponds to one unit displacement, and so on. The second
thing to notice is that correlating a filter w with a function that contains all 0s
and a single 1 yields a result that is a copy of w, but rotated by 180°. We call a
function that contains a single 1 with the rest being 0s a discrete unit impulse.
So we conclude that correlation of a function with a discrete unit impulse
yields a rotated version of the function at the location of the impulse.
The concept of convolution is a cornerstone of linear system theory. As you
will learn in Chapter 4, a fundamental property of convolution is that convolv-
ing a function with a unit impulse yields a copy of the function at the location
of the impulse. We saw in the previous paragraph that correlation yields a copy
of the function also, but rotated by 180°. (Note that rotation by 180° is equivalent to flipping the function horizontally.) Therefore, if we pre-rotate the filter and perform the same sliding sum of products operation, we should be able to
the case. Thus, we see that to perform convolution all we do is rotate one func-
tion by 180° and perform the same operations as in correlation. As it turns out,
it makes no difference which of the two functions we rotate.
The preceding concepts extend easily to images, as Fig. 3.30 shows. For a fil-
ter of size m * n, we pad the image with a minimum of m - 1 rows of 0s at
the top and bottom and n - 1 columns of 0s on the left and right. In this case,
m and n are equal to 3, so we pad f with two rows of 0s above and below and
two columns of 0s to the left and right, as Fig. 3.30(b) shows. Figure 3.30(c)
shows the initial position of the filter mask for performing correlation, and
Fig. 3.30(d) shows the full correlation result. Figure 3.30(e) shows the corre-
sponding cropped result. Note again that the result is rotated by 180°. (In 2-D, rotation by 180° is equivalent to flipping the mask along one axis and then the other.) For convolution, we pre-rotate the mask as before and repeat the sliding sum of
again that convolution of a function with an impulse copies the function at the
location of the impulse. It should be clear that, if the filter mask is symmetric,
correlation and convolution yield the same result.
If, instead of containing a single 1, image f in Fig. 3.30 had contained a re-
gion identically equal to w, the value of the correlation function (after nor-
malization) would have been maximum when w was centered on that region
of f. Thus, as you will see in Chapter 12, correlation can be used also to find
matches between images.
Summarizing the preceding discussion in equation form, we have that the
correlation of a filter w(x, y) of size m * n with an image f(x, y), denoted as w(x, y) ☆ f(x, y), is given by the equation listed at the end of the last section, which we repeat here for convenience:

w(x, y) ☆ f(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t) f(x + s, y + t)   (3.4-1)

This equation is evaluated for all values of the displacement variables x and y
so that all elements of w visit every pixel in f, where we assume that f has been
padded appropriately. As explained earlier, a = (m - 1)/2, b = (n - 1)/2, and we assume for notational convenience that m and n are odd integers.

FIGURE 3.30 Correlation (middle row) and convolution (last row) of a 2-D filter with a 2-D discrete, unit impulse. The 0s are shown in gray to simplify visual analysis. [Panels: (a) f(x, y), a single 1 on a background of 0s, and the 3 * 3 mask w(x, y) with coefficients 1 through 9; (b) padded f; (c) initial position for w; (d) full and (e) cropped correlation results; (f) rotated w; (g) full and (h) cropped convolution results.]

In a similar manner, the convolution of w(x, y) and f(x, y), denoted by w(x, y) ★ f(x, y),† is given by the expression

w(x, y) ★ f(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t) f(x - s, y - t)   (3.4-2)

where the minus signs on the right flip f (i.e., rotate it by 180°). (Often, when the meaning is clear, we denote the result of correlation or convolution by a function g(x, y), instead of writing w(x, y) ☆ f(x, y) or w(x, y) ★ f(x, y). For example, see the equation at the end of the previous section, and Eq. (3.5-1).) Flipping and shifting f instead of w is done for notational simplicity and also to follow
convention. The result is the same. As with correlation, this equation is eval-
uated for all values of the displacement variables x and y so that every ele-
ment of w visits every pixel in f, which we assume has been padded
appropriately. You should expand Eq. (3.4-2) for a 3 * 3 mask and convince
yourself that the result using this equation is identical to the example in
Fig. 3.30. In practice, we frequently work with an algorithm that implements


† Because correlation and convolution are commutative, we have that w(x, y) ☆ f(x, y) = f(x, y) ☆ w(x, y) and w(x, y) ★ f(x, y) = f(x, y) ★ w(x, y).

Eq. (3.4-1). If we want to perform correlation, we input w into the algorithm; for convolution, we input w rotated by 180°. The reverse is true if an algo-
rithm that implements Eq. (3.4-2) is available instead.
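
The same relationship is easy to verify in 2-D. The following sketch (our own loop implementation of Eq. (3.4-1) with zero padding, cropped to the size of f) obtains convolution by rotating the mask 180° before correlating, exactly as described above:

```python
import numpy as np

def correlate2d(f, w):
    """Eq. (3.4-1): sliding sum of products with zero padding,
    cropped to the size of f."""
    m, n = w.shape
    a, b = m // 2, n // 2
    fp = np.pad(f, ((a, a), (b, b)))                 # zero padding
    g = np.zeros_like(f)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            g[x, y] = np.sum(w * fp[x:x + m, y:y + n])
    return g

def convolve2d(f, w):
    return correlate2d(f, np.rot90(w, 2))            # rotate the mask by 180 degrees

f = np.zeros((5, 5), dtype=int); f[2, 2] = 1         # 2-D discrete unit impulse
w = np.arange(1, 10).reshape(3, 3)                   # the mask of Fig. 3.30
print(correlate2d(f, w))   # copy of w rotated by 180 degrees at the impulse
print(convolve2d(f, w))    # copy of w itself at the impulse
```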
As mentioned earlier, convolution is a cornerstone of linear system theory.
As you will learn in Chapter 4, the property that the convolution of a function
with a unit impulse copies the function at the location of the impulse plays a
central role in a number of important derivations. We will revisit convolution
in Chapter 4 in the context of the Fourier transform and the convolution the-
orem. Unlike Eq. (3.4-2), however, we will be dealing with convolution of
functions that are of the same size. The form of the equation is the same, but
the limits of summation are different.
Using correlation or convolution to perform spatial filtering is a matter of
preference. In fact, because either Eq. (3.4-1) or (3.4-2) can be made to per-
form the function of the other by a simple rotation of the filter, what is impor-
tant is that the filter mask used in a given filtering task be specified in a way
that corresponds to the intended operation. All the linear spatial filtering re-
sults in this chapter are based on Eq. (3.4-1).
Finally, we point out that you are likely to encounter the terms convolution filter, convolution mask, or convolution kernel in the image pro-
cessing literature. As a rule, these terms are used to denote a spatial filter,
and not necessarily that the filter will be used for true convolution. Similarly,
“convolving a mask with an image” often is used to denote the sliding, sum-
of-products process we just explained, and does not necessarily differentiate
between correlation and convolution. Rather, it is used generically to denote
either of the two operations. This imprecise terminology is a frequent source
of confusion.

3.4.3 Vector Representation of Linear Filtering


When interest lies in the characteristic response, R, of a mask either for cor-
relation or convolution, it is convenient sometimes to write the sum of
products as

R = w_1 z_1 + w_2 z_2 + ⋯ + w_{mn} z_{mn} = \sum_{k=1}^{mn} w_k z_k = \mathbf{w}^T \mathbf{z}   (3.4-3)

(Consult the Tutorials section of the book Web site for a brief review of vectors and matrices.)
where the ws are the coefficients of an m * n filter and the zs are the corre-
sponding image intensities encompassed by the filter. If we are interested in
using Eq. (3.4-3) for correlation, we use the mask as given. To use the same
equation for convolution, we simply rotate the mask by 180°, as explained in
the last section. It is implied that Eq. (3.4-3) holds for a particular pair of coor-
dinates (x, y). You will see in the next section why this notation is convenient
for explaining the characteristics of a given linear filter.
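
In code, Eq. (3.4-3) reduces to a single dot product (a sketch; the only requirement is that the mask and the neighborhood be flattened in the same order):

```python
import numpy as np

w = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]]) / 16.0                 # any 3 x 3 mask
z = np.random.randint(0, 256, (3, 3))            # intensities under the mask
R = w.ravel() @ z.ravel()                        # Eq. (3.4-4): R = w^T z
assert np.isclose(R, np.sum(w * z))              # same as the sum of products
```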

FIGURE 3.31 Another representation of a general 3 * 3 filter mask: coefficients w1, w2, w3 (top row), w4, w5, w6 (middle row), and w7, w8, w9 (bottom row).

As an example, Fig. 3.31 shows a general 3 * 3 mask with coefficients labeled as above. In this case, Eq. (3.4-3) becomes

R = w_1 z_1 + w_2 z_2 + ⋯ + w_9 z_9 = \sum_{k=1}^{9} w_k z_k = \mathbf{w}^T \mathbf{z}   (3.4-4)
where w and z are 9-dimensional vectors formed from the coefficients of the
mask and the image intensities encompassed by the mask, respectively.

3.4.4 Generating Spatial Filter Masks


Generating an m * n linear spatial filter requires that we specify mn mask co-
efficients. In turn, these coefficients are selected based on what the filter is
supposed to do, keeping in mind that all we can do with linear filtering is to im-
plement a sum of products. For example, suppose that we want to replace the
pixels in an image by the average intensity of a 3 * 3 neighborhood centered
on those pixels. The average value at any location (x, y) in the image is the sum
of the nine intensity values in the 3 * 3 neighborhood centered on (x, y) di-
vided by 9. Letting z_i, i = 1, 2, …, 9, denote these intensities, the average is

R = \frac{1}{9} \sum_{i=1}^{9} z_i

But this is the same as Eq. (3.4-4) with coefficient values w_i = 1/9. In other words, a linear filtering operation with a 3 * 3 mask whose coefficients are 1/9
implements the desired averaging. As we discuss in the next section, this oper-
ation results in image smoothing. We discuss in the following sections a num-
ber of other filter masks based on this basic approach.
In some applications, we have a continuous function of two variables, and
the objective is to obtain a spatial filter mask based on that function. For ex-
ample, a Gaussian function of two variables has the basic form
h(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}

where σ is the standard deviation and, as usual, we assume that coordinates x


and y are integers. To generate, say, a 3 * 3 filter mask from this function, we

sample it about its center. Thus, w_1 = h(-1, -1), w_2 = h(-1, 0), …, w_9 = h(1, 1). An m * n filter mask is generated in a similar manner. Recall
that a 2-D Gaussian function has a bell shape, and that the standard deviation
controls the “tightness” of the bell.
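
A sketch of this sampling procedure (ours; the normalization so that the coefficients sum to 1 is a common practice, added here so the mask computes a weighted average):

```python
import numpy as np

def gaussian_mask(size=3, sigma=1.0):
    """Sample h(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) about its center.
    Normalizing the coefficients to sum to 1 is our addition."""
    a = size // 2
    x, y = np.meshgrid(np.arange(-a, a + 1), np.arange(-a, a + 1))
    h = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return h / h.sum()

print(gaussian_mask(3, 1.0))   # w1 = h(-1,-1), w2 = h(-1,0), ..., w9 = h(1,1)
```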
Generating a nonlinear filter requires that we specify the size of a neigh-
borhood and the operation(s) to be performed on the image pixels contained
in the neighborhood. For example, recalling that the max operation is nonlin-
ear (see Section 2.6.2), a 5 * 5 max filter centered at an arbitrary point (x, y)
of an image obtains the maximum intensity value of the 25 pixels and assigns
that value to location (x, y) in the processed image. Nonlinear filters are quite
powerful, and in some applications can perform functions that are beyond the
capabilities of linear filters, as we show later in this chapter and in Chapter 5.

3.5 Smoothing Spatial Filters


Smoothing filters are used for blurring and for noise reduction. Blurring is
used in preprocessing tasks, such as removal of small details from an image
prior to (large) object extraction, and bridging of small gaps in lines or curves.
Noise reduction can be accomplished by blurring with a linear filter and also
by nonlinear filtering.
3.5.1 Smoothing Linear Filters
The output (response) of a smoothing, linear spatial filter is simply the average
of the pixels contained in the neighborhood of the filter mask. These filters
sometimes are called averaging filters. As mentioned in the previous section,
they also are referred to as lowpass filters.
The idea behind smoothing filters is straightforward. Replacing the value of every pixel in an image by the average of the intensity levels in the neighborhood defined by the filter mask produces an image with reduced “sharp” transitions in intensities. Because random noise typically
consists of sharp transitions in intensity levels, the most obvious application of
smoothing is noise reduction. However, edges (which almost always are desir-
able features of an image) also are characterized by sharp intensity transitions,
so averaging filters have the undesirable side effect that they blur edges. An-
other application of this type of process includes the smoothing of false con-
tours that result from using an insufficient number of intensity levels, as
discussed in Section 2.4.3. A major use of averaging filters is in the reduction
of “irrelevant” detail in an image. By “irrelevant” we mean pixel regions that
are small with respect to the size of the filter mask. This latter application is il-
lustrated later in this section.
Figure 3.32 shows two 3 * 3 smoothing filters. Use of the first filter yields
the standard average of the pixels under the mask. This can best be seen by
substituting the coefficients of the mask into Eq. (3.4-4):
R = \frac{1}{9} \sum_{i=1}^{9} z_i

which is the average of the intensity levels of the pixels in the 3 * 3 neighbor-
hood defined by the mask, as discussed earlier. Note that, instead of being 1/9,

FIGURE 3.32 Two 3 * 3 smoothing (averaging) filter masks. The constant multiplier in front of each mask is equal to 1 divided by the sum of the values of its coefficients, as is required to compute an average.

(a)  (1/9) × [ 1 1 1 ; 1 1 1 ; 1 1 1 ]        (b)  (1/16) × [ 1 2 1 ; 2 4 2 ; 1 2 1 ]
the coefficients of the filter are all 1s. The idea here is that it is computationally
more efficient to have coefficients valued 1. At the end of the filtering process
the entire image is divided by 9. An m * n mask would have a normalizing
constant equal to 1/mn. A spatial averaging filter in which all coefficients are
equal sometimes is called a box filter.
The second mask in Fig. 3.32 is a little more interesting. This mask yields a so-
called weighted average, terminology used to indicate that pixels are multiplied by
different coefficients, thus giving more importance (weight) to some pixels at the
expense of others. In the mask shown in Fig. 3.32(b) the pixel at the center of the
mask is multiplied by a higher value than any other, thus giving this pixel more
importance in the calculation of the average. The other pixels are inversely
weighted as a function of their distance from the center of the mask. The diagonal terms are farther away from the center than the orthogonal neighbors (by a factor of √2) and, thus, are weighted less than the immediate neighbors of the center pixel. The basic strategy behind weighting the center point the highest and then reducing the value of the coefficients as a function of increasing distance from the origin is simply an attempt to reduce blurring in the smoothing process. We could
have chosen other weights to accomplish the same general objective. However,
the sum of all the coefficients in the mask of Fig. 3.32(b) is equal to 16, an attrac-
tive feature for computer implementation because it is an integer power of 2. In
practice, it is difficult in general to see differences between images smoothed by
using either of the masks in Fig. 3.32, or similar arrangements, because the area
spanned by these masks at any one location in an image is so small.
With reference to Eq. (3.4-1), the general implementation for filtering an
M * N image with a weighted averaging filter of size m * n (m and n odd) is
given by the expression
g(x, y) = \frac{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t) f(x + s, y + t)}{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)}   (3.5-1)

The parameters in this equation are as defined in Eq. (3.4-1). As before, it is un-
derstood that the complete filtered image is obtained by applying Eq. (3.5-1)
for x = 0, 1, 2, …, M - 1 and y = 0, 1, 2, …, N - 1. The denominator in

Eq. (3.5-1) is simply the sum of the mask coefficients and, therefore, it is a con-
stant that needs to be computed only once.
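
As an illustration, both masks of Fig. 3.32 can be applied with a library correlation routine (a sketch using SciPy's ndimage.correlate; the zero-padding border mode is chosen to match the discussion, and the test image is ours):

```python
import numpy as np
from scipy import ndimage

f = np.random.randint(0, 256, (128, 128)).astype(float)   # arbitrary test image

box = np.ones((3, 3)) / 9.0                     # mask of Fig. 3.32(a)
weighted = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]]) / 16.0         # mask of Fig. 3.32(b)

g_box = ndimage.correlate(f, box, mode='constant')        # zero padding at borders
g_weighted = ndimage.correlate(f, weighted, mode='constant')
```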

EXAMPLE 3.13: Image smoothing with masks of various sizes.
■ The effects of smoothing as a function of filter size are illustrated in Fig. 3.33, which shows an original image and the corresponding smoothed results obtained using square averaging filters of sizes m = 3, 5, 9, 15, and 35 pixels, re-
spectively. The principal features of these results are as follows: For m = 3, we
note a general slight blurring throughout the entire image but, as expected, de-
tails that are of approximately the same size as the filter mask are affected con-
siderably more. For example, the 3 * 3 and 5 * 5 black squares in the image,
the small letter “a,” and the fine grain noise show significant blurring when com-
pared to the rest of the image. Note that the noise is less pronounced, and the
jagged borders of the characters were pleasingly smoothed.
The result for m = 5 is somewhat similar, with a slight further increase in
blurring. For m = 9 we see considerably more blurring, and the 20% black cir-
cle is not nearly as distinct from the background as in the previous three im-
ages, illustrating the blending effect that blurring has on objects whose
intensities are close to that of its neighboring pixels. Note the significant fur-
ther smoothing of the noisy rectangles. The results for m = 15 and 35 are ex-
treme with respect to the sizes of the objects in the image. This type of
aggressive blurring generally is used to eliminate small objects from an image.
For instance, the three small squares, two of the circles, and most of the noisy
rectangle areas have been blended into the background of the image in
Fig. 3.33(f). Note also in this figure the pronounced black border. This is a re-
sult of padding the border of the original image with 0s (black) and then
trimming off the padded area after filtering. Some of the black was blended
into all filtered images, but became truly objectionable for the images
smoothed with the larger filters. ■

As mentioned earlier, an important application of spatial averaging is to


blur an image for the purpose of getting a gross representation of objects of
interest, such that the intensity of smaller objects blends with the back-
ground and larger objects become “bloblike” and easy to detect. The size of
the mask establishes the relative size of the objects that will be blended with
the background. As an illustration, consider Fig. 3.34(a), which is an image
from the Hubble telescope in orbit around the Earth. Figure 3.34(b) shows
the result of applying a 15 * 15 averaging mask to this image. We see that a
number of objects have either blended with the background or their inten-
sity has diminished considerably. It is typical to follow an operation like this
with thresholding to eliminate objects based on their intensity. The result of
using the thresholding function of Fig. 3.2(b) with a threshold value equal to
25% of the highest intensity in the blurred image is shown in Fig. 3.34(c).
Comparing this result with the original image, we see that it is a reasonable
representation of what we would consider to be the largest, brightest ob-
jects in that image.

FIGURE 3.33 (a) Original image, of size 500 * 500 pixels. (b)–(f) Results of smoothing with square averaging filter masks of sizes m = 3, 5, 9, 15, and 35, respectively. The black squares at the top are of sizes 3, 5, 9, 15, 25, 35, 45, and 55 pixels, respectively; their borders
are 25 pixels apart. The letters at the bottom range in size from 10 to 24 points, in
increments of 2 points; the large letter at the top is 60 points. The vertical bars are 5 pixels
wide and 100 pixels high; their separation is 20 pixels. The diameter of the circles is 25
pixels, and their borders are 15 pixels apart; their intensity levels range from 0% to 100%
black in increments of 20%. The background of the image is 10% black. The noisy
rectangles are of size 50 * 120 pixels.

a b c
FIGURE 3.34 (a) Image of size 528 * 485 pixels from the Hubble Space Telescope. (b) Image filtered with a
15 * 15 averaging mask. (c) Result of thresholding (b). (Original image courtesy of NASA.)

3.5.2 Order-Statistic (Nonlinear) Filters


Order-statistic filters are nonlinear spatial filters whose response is based on or-
dering (ranking) the pixels contained in the image area encompassed by the fil-
ter, and then replacing the value of the center pixel with the value determined
by the ranking result. The best-known filter in this category is the median filter,
which, as its name implies, replaces the value of a pixel by the median of the in-
tensity values in the neighborhood of that pixel (the original value of the pixel is
included in the computation of the median). Median filters are quite popular be-
cause, for certain types of random noise, they provide excellent noise-reduction
capabilities, with considerably less blurring than linear smoothing filters of simi-
lar size. Median filters are particularly effective in the presence of impulse noise,
also called salt-and-pepper noise because of its appearance as white and black
dots superimposed on an image.
The median, ξ, of a set of values is such that half the values in the set are less than or equal to ξ, and half are greater than or equal to ξ. In order to perform median filtering at a point in an image, we first sort the values of the pixels in the neighborhood, determine their median, and assign that value to the cor-
responding pixel in the filtered image. For example, in a 3 * 3 neighborhood
the median is the 5th largest value, in a 5 * 5 neighborhood it is the 13th
largest value, and so on. When several values in a neighborhood are the same,
all equal values are grouped. For example, suppose that a 3 * 3 neighborhood
has values (10, 20, 20, 20, 15, 20, 20, 25, 100). These values are sorted as (10, 15,
20, 20, 20, 20, 20, 25, 100), which results in a median of 20. Thus, the principal
function of median filters is to force points with distinct intensity levels to be
more like their neighbors. In fact, isolated clusters of pixels that are light or
dark with respect to their neighbors, and whose area is less than m²/2 (one-half the filter area), are eliminated by an m * m median filter. In this case
“eliminated” means forced to the median intensity of the neighbors. Larger
clusters are affected considerably less.
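
A small experiment (ours) makes the contrast with averaging concrete; it uses SciPy's median and uniform filters on a flat gray image corrupted with synthetic salt-and-pepper noise:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
f = np.full((64, 64), 128, dtype=np.uint8)       # flat gray test image
u = rng.random(f.shape)
f[u < 0.05] = 0                                  # pepper noise
f[u > 0.95] = 255                                # salt noise

g_med = ndimage.median_filter(f, size=3)                 # 3 x 3 median filter
g_avg = ndimage.uniform_filter(f.astype(float), size=3)  # 3 x 3 box average
# g_med returns to a nearly flat 128, while g_avg smears each noise
# impulse into its 3 x 3 neighborhood instead of removing it.
```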

a b c
FIGURE 3.35 (a) X-ray image of circuit board corrupted by salt-and-pepper noise. (b) Noise reduction with
a 3 * 3 averaging mask. (c) Noise reduction with a 3 * 3 median filter. (Original image courtesy of Mr.
Joseph E. Pascente, Lixi, Inc.)

Although the median filter is by far the most useful order-statistic filter in
image processing, it is by no means the only one. The median represents the
50th percentile of a ranked set of numbers, but recall from basic statistics that ranking lends itself to many other possibilities. (See Section 10.3.5 regarding percentiles.) For example, using the 100th
percentile results in the so-called max filter, which is useful for finding the
brightest points in an image. The response of a 3 * 3 max filter is given by
R = max{z_k | k = 1, 2, …, 9}. The 0th percentile filter is the min filter, used
for the opposite purpose. Median, max, min, and several other nonlinear filters
are considered in more detail in Section 5.3.

EXAMPLE 3.14: Use of median filtering for noise reduction.
■ Figure 3.35(a) shows an X-ray image of a circuit board heavily corrupted by salt-and-pepper noise. To illustrate the point about the superiority of median filtering over average filtering in situations such as this, we show in Fig.
3.35(b) the result of processing the noisy image with a 3 * 3 neighborhood av-
eraging mask, and in Fig. 3.35(c) the result of using a 3 * 3 median filter. The
averaging filter blurred the image and its noise reduction performance was
poor. The superiority in all respects of median over average filtering in this
case is quite evident. In general, median filtering is much better suited than av-
eraging for the removal of salt-and-pepper noise. ■

3.6 Sharpening Spatial Filters


The principal objective of sharpening is to highlight transitions in intensity.
Uses of image sharpening vary and include applications ranging from electron-
ic printing and medical imaging to industrial inspection and autonomous guid-
ance in military systems. In the last section, we saw that image blurring could be
accomplished in the spatial domain by pixel averaging in a neighborhood. Be-
cause averaging is analogous to integration, it is logical to conclude that sharp-
ening can be accomplished by spatial differentiation. This, in fact, is the case,

and the discussion in this section deals with various ways of defining and imple-
menting operators for sharpening by digital differentiation. Fundamentally, the
strength of the response of a derivative operator is proportional to the degree
of intensity discontinuity of the image at the point at which the operator is ap-
plied. Thus, image differentiation enhances edges and other discontinuities
(such as noise) and deemphasizes areas with slowly varying intensities.

3.6.1 Foundation
In the two sections that follow, we consider in some detail sharpening filters
that are based on first- and second-order derivatives, respectively. Before pro-
ceeding with that discussion, however, we stop to look at some of the funda-
mental properties of these derivatives in a digital context. To simplify the
explanation, we focus attention initially on one-dimensional derivatives. In
particular, we are interested in the behavior of these derivatives in areas of
constant intensity, at the onset and end of discontinuities (step and ramp dis-
continuities), and along intensity ramps. As you will see in Chapter 10, these
types of discontinuities can be used to model noise points, lines, and edges in
an image. The behavior of derivatives during transitions into and out of these
image features also is of interest.
The derivatives of a digital function are defined in terms of differences.
There are various ways to define these differences. However, we require that
any definition we use for a first derivative (1) must be zero in areas of constant
intensity; (2) must be nonzero at the onset of an intensity step or ramp; and
(3) must be nonzero along ramps. Similarly, any definition of a second deriva-
tive (1) must be zero in constant areas; (2) must be nonzero at the onset and
end of an intensity step or ramp; and (3) must be zero along ramps of constant
slope. Because we are dealing with digital quantities whose values are finite,
the maximum possible intensity change also is finite, and the shortest distance
over which that change can occur is between adjacent pixels.
A basic definition of the first-order derivative of a one-dimensional function f(x) is the difference

∂f/∂x = f(x + 1) − f(x)    (3.6-1)

(We return to Eq. (3.6-1) in Section 10.2.1 and show how it follows from a Taylor
series expansion. For now, we accept it as a definition.) We used a partial
derivative here in order to keep the notation the same as when we consider an
image function of two variables, f(x, y), at which time we will be dealing with
partial derivatives along the two spatial axes. Use of a partial derivative in the
present discussion does not affect in any way the nature of what we are trying
to accomplish. Clearly, ∂f/∂x = df/dx when there is only one variable in the
function; the same is true for the second derivative.
We define the second-order derivative of f(x) as the difference

∂²f/∂x² = f(x + 1) + f(x − 1) − 2f(x)    (3.6-2)
It is easily verified that these two definitions satisfy the conditions stated
above. To illustrate this, and to examine the similarities and differences between

[FIGURE 3.36 Illustration of the first and second derivatives of a 1-D digital function representing a section of a horizontal intensity profile from an image. In (a) and (c) data points are joined by dashed lines as a visualization aid. Scan line values: 6 6 6 6 5 4 3 2 1 1 1 1 1 1 6 6 6 6 6. First derivative: 0 0 −1 −1 −1 −1 −1 0 0 0 0 0 5 0 0 0 0. Second derivative: 0 0 −1 0 0 0 0 1 0 0 0 0 5 −5 0 0 0.]

first- and second-order derivatives of a digital function, consider the example
in Fig. 3.36.
Figure 3.36(b) (center of the figure) shows a section of a scan line (inten-
sity profile). The values inside the small squares are the intensity values in
the scan line, which are plotted as black dots above it in Fig. 3.36(a). The
dashed line connecting the dots is included to aid visualization. As the fig-
ure shows, the scan line contains an intensity ramp, three sections of con-
stant intensity, and an intensity step. The circles indicate the onset or end of
intensity transitions. The first- and second-order derivatives computed
using the two preceding definitions are included below the scan line in Fig.
3.36(b), and are plotted in Fig. 3.36(c). When computing the first derivative
at a location x, we subtract the value of the function at that location from
the next point. So this is a “look-ahead” operation. Similarly, to compute the
second derivative at x, we use the previous and the next points in the com-
putation. To avoid a situation in which the previous or next points are out-
side the range of the scan line, we show derivative computations in Fig. 3.36
from the second through the penultimate points in the sequence.
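The two definitions are easy to verify numerically. The short NumPy sketch below (our own, not part of the text) applies Eqs. (3.6-1) and (3.6-2) to the scan-line values of Fig. 3.36:

import numpy as np

# Intensity profile from Fig. 3.36 (ramp, constant sections, and a step).
f = np.array([6, 6, 6, 6, 5, 4, 3, 2, 1, 1, 1, 1, 1, 1, 6, 6, 6, 6, 6])

# First derivative per Eq. (3.6-1): f(x+1) - f(x), the "look-ahead" difference.
d1 = f[1:] - f[:-1]

# Second derivative per Eq. (3.6-2): f(x+1) + f(x-1) - 2f(x), defined
# only at interior points of the sequence.
d2 = f[2:] + f[:-2] - 2 * f[1:-1]

print(d1)  # zero on constant runs, -1 along the ramp, 5 at the step
print(d2)  # nonzero only at the onset/end of the ramp and of the step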
Let us consider the properties of the first and second derivatives as we tra-
verse the profile from left to right. First, we encounter an area of constant inten-
sity and, as Figs. 3.36(b) and (c) show, both derivatives are zero there, so condition
(1) is satisfied for both. Next, we encounter an intensity ramp followed by a step,
and we note that the first-order derivative is nonzero at the onset of the ramp and

the step; similarly, the second derivative is nonzero at the onset and end of both
the ramp and the step; therefore, property (2) is satisfied for both derivatives. Fi-
nally, we see that property (3) is satisfied also for both derivatives because the
first derivative is nonzero and the second is zero along the ramp. Note that the
sign of the second derivative changes at the onset and end of a step or ramp. In
fact, we see in Fig. 3.36(c) that in a step transition a line joining these two values
crosses the horizontal axis midway between the two extremes. This zero crossing
property is quite useful for locating edges, as you will see in Chapter 10.
Edges in digital images often are ramp-like transitions in intensity, in which
case the first derivative of the image would result in thick edges because the de-
rivative is nonzero along a ramp. On the other hand, the second derivative would
produce a double edge one pixel thick, separated by zeros. From this, we con-
clude that the second derivative enhances fine detail much better than the first
derivative, a property that is ideally suited for sharpening images.Also, as you will
learn later in this section, second derivatives are much easier to implement than
first derivatives, so we focus our attention initially on second derivatives.

3.6.2 Using the Second Derivative for Image Sharpening—The Laplacian
In this section we consider the implementation of 2-D, second-order deriva-
tives and their use for image sharpening. We return to this derivative in
Chapter 10, where we use it extensively for image segmentation. The approach
basically consists of defining a discrete formulation of the second-order deriv-
ative and then constructing a filter mask based on that formulation. We are in-
terested in isotropic filters, whose response is independent of the direction of
the discontinuities in the image to which the filter is applied. In other words,
isotropic filters are rotation invariant, in the sense that rotating the image and
then applying the filter gives the same result as applying the filter to the image
first and then rotating the result.
It can be shown (Rosenfeld and Kak [1982]) that the simplest isotropic derivative
operator is the Laplacian, which, for a function (image) f(x, y) of two
variables, is defined as

∇²f = ∂²f/∂x² + ∂²f/∂y²    (3.6-3)
Because derivatives of any order are linear operations, the Laplacian is a lin-
ear operator. To express this equation in discrete form, we use the definition in
Eq. (3.6-2), keeping in mind that we have to carry a second variable. In the
x-direction, we have

∂²f/∂x² = f(x + 1, y) + f(x − 1, y) − 2f(x, y)    (3.6-4)

and, similarly, in the y-direction we have

∂²f/∂y² = f(x, y + 1) + f(x, y − 1) − 2f(x, y)    (3.6-5)

Therefore, it follows from the preceding three equations that the discrete
Laplacian of two variables is
∇²f(x, y) = f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4f(x, y)    (3.6-6)
This equation can be implemented using the filter mask in Fig. 3.37(a), which
gives an isotropic result for rotations in increments of 90°. The mechanics of
implementation are as in Section 3.5.1 for linear smoothing filters. We simply
are using different coefficients here.
The diagonal directions can be incorporated in the definition of the digital
Laplacian by adding two more terms to Eq. (3.6-6), one for each of the two di-
agonal directions. The form of each new term is the same as either Eq. (3.6-4) or
(3.6-5), but the coordinates are along the diagonals. Because each diagonal term
also contains a -2f(x, y) term, the total subtracted from the difference terms
now would be -8f(x, y). Figure 3.37(b) shows the filter mask used to imple-
ment this new definition. This mask yields isotropic results in increments of 45°.
You are likely to see in practice the Laplacian masks in Figs. 3.37(c) and (d).
They are obtained from definitions of the second derivatives that are the nega-
tives of the ones we used in Eqs. (3.6-4) and (3.6-5). As such, they yield equiva-
lent results, but the difference in sign must be kept in mind when combining (by
addition or subtraction) a Laplacian-filtered image with another image.
Because the Laplacian is a derivative operator, its use highlights intensity
discontinuities in an image and deemphasizes regions with slowly varying in-
tensity levels. This will tend to produce images that have grayish edge lines and
other discontinuities, all superimposed on a dark, featureless background.
Background features can be “recovered” while still preserving the sharpening

[FIGURE 3.37 (a) Filter mask used to implement Eq. (3.6-6). (b) Mask used to implement an extension of this equation that includes the diagonal terms. (c) and (d) Two other implementations of the Laplacian found frequently in practice. The masks are:
(a) 0 1 0 / 1 −4 1 / 0 1 0    (b) 1 1 1 / 1 −8 1 / 1 1 1
(c) 0 −1 0 / −1 4 −1 / 0 −1 0    (d) −1 −1 −1 / −1 8 −1 / −1 −1 −1]

effect of the Laplacian simply by adding the Laplacian image to the original.
As noted in the previous paragraph, it is important to keep in mind which def-
inition of the Laplacian is used. If the definition used has a negative center co-
efficient, then we subtract, rather than add, the Laplacian image to obtain a
sharpened result. Thus, the basic way in which we use the Laplacian for image
sharpening is
g(x, y) = f(x, y) + c[∇²f(x, y)]    (3.6-7)
where f(x, y) and g(x, y) are the input and sharpened images, respectively.
The constant is c = -1 if the Laplacian filters in Fig. 3.37(a) or (b) are used,
and c = 1 if either of the other two filters is used.
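A minimal sketch of Eq. (3.6-7), assuming the masks of Figs. 3.37(a) and (b) and SciPy's ndimage module for the filtering (the function name, border handling, and output clipping to [0, 255] are our own choices):

import numpy as np
from scipy.ndimage import convolve

def laplacian_sharpen(f, diagonal=False):
    """Sharpen per Eq. (3.6-7). The masks of Fig. 3.37(a)/(b) have a
    negative center coefficient, so c = -1 and we subtract the Laplacian."""
    f = f.astype(float)
    if diagonal:  # mask of Fig. 3.37(b): isotropic in increments of 45 degrees
        mask = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], dtype=float)
    else:         # mask of Fig. 3.37(a): isotropic in increments of 90 degrees
        mask = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    lap = convolve(f, mask, mode='nearest')  # both masks are 180-degree
                                             # symmetric, so convolution and
                                             # correlation coincide here
    return np.clip(f - lap, 0, 255)          # g = f + c*lap with c = -1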

EXAMPLE 3.15: Image sharpening using the Laplacian.

■ Figure 3.38(a) shows a slightly blurred image of the North Pole of the
moon. Figure 3.38(b) shows the result of filtering this image with the Laplacian
mask in Fig. 3.37(a). Large sections of this image are black because the
Laplacian contains both positive and negative values, and all negative values
are clipped at 0 by the display.
A typical way to scale a Laplacian image is to add to it its minimum value to
bring the new minimum to zero and then scale the result to the full [0, L - 1]
intensity range, as explained in Eqs. (2.6-10) and (2.6-11). The image in
Fig. 3.38(c) was scaled in this manner. Note that the dominant features of the
image are edges and sharp intensity discontinuities. The background, previously
black, is now gray due to scaling. This grayish appearance is typical of Laplacian
images that have been scaled properly. Figure 3.38(d) shows the result obtained
using Eq. (3.6-7) with c = -1. The detail in this image is unmistakably clearer
and sharper than in the original image. Adding the original image to the Lapla-
cian restored the overall intensity variations in the image, with the Laplacian in-
creasing the contrast at the locations of intensity discontinuities. The net result is
an image in which small details were enhanced and the background tonality was
reasonably preserved. Finally, Fig. 3.38(e) shows the result of repeating the pre-
ceding procedure with the filter in Fig. 3.37(b). Here, we note a significant im-
provement in sharpness over Fig. 3.38(d). This is not unexpected because using
the filter in Fig. 3.37(b) provides additional differentiation (sharpening) in the
diagonal directions. Results such as those in Figs. 3.38(d) and (e) have made the
Laplacian a tool of choice for sharpening digital images. ■

3.6.3 Unsharp Masking and Highboost Filtering


A process that has been used for many years by the printing and publishing in-
dustry to sharpen images consists of subtracting an unsharp (smoothed) ver-
sion of an image from the original image. This process, called unsharp masking,
consists of the following steps:

1. Blur the original image.


2. Subtract the blurred image from the original (the resulting difference is
called the mask).
3. Add the mask to the original.

[FIGURE 3.38 (a) Blurred image of the North Pole of the moon. (b) Laplacian without scaling. (c) Laplacian with scaling. (d) Image sharpened using the mask in Fig. 3.37(a). (e) Result of using the mask in Fig. 3.37(b). (Original image courtesy of NASA.)]

Letting f̄(x, y) denote the blurred image, unsharp masking is expressed in
equation form as follows. First we obtain the mask:

g_mask(x, y) = f(x, y) − f̄(x, y)    (3.6-8)

Then we add a weighted portion of the mask back to the original image:

g(x, y) = f(x, y) + k · g_mask(x, y)    (3.6-9)

where we included a weight, k (k ≥ 0), for generality. When k = 1, we have
unsharp masking, as defined above. When k > 1, the process is referred to as

[FIGURE 3.39 1-D illustration of the mechanics of unsharp masking. (a) Original signal. (b) Blurred signal with original shown dashed for reference. (c) Unsharp mask. (d) Sharpened signal, obtained by adding (c) to (a).]

highboost filtering. Choosing k < 1 de-emphasizes the contribution of the
unsharp mask.
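The three steps above translate directly into code. A minimal sketch, assuming a Gaussian blur as in Example 3.16 and SciPy's gaussian_filter (the function name and clipping range are our own choices):

import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(f, sigma=3.0, k=1.0):
    """Eqs. (3.6-8) and (3.6-9): k = 1 gives unsharp masking,
    k > 1 highboost filtering, k < 1 a de-emphasized mask."""
    f = f.astype(float)
    blurred = gaussian_filter(f, sigma)   # step 1: blur the original
    mask = f - blurred                    # step 2: mask = original - blurred
    return np.clip(f + k * mask, 0, 255)  # step 3: add the weighted mask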
Figure 3.39 explains how unsharp masking works. The intensity profile in
Fig. 3.39(a) can be interpreted as a horizontal scan line through a vertical edge
that transitions from a dark to a light region in an image. Figure 3.39(b) shows
the result of smoothing, superimposed on the original signal (shown dashed)
for reference. Figure 3.39(c) is the unsharp mask, obtained by subtracting the
blurred signal from the original. By comparing this result with the section of
Fig. 3.36(c) corresponding to the ramp in Fig. 3.36(a), we note that the unsharp
mask in Fig. 3.39(c) is very similar to what we would obtain using a second-
order derivative. Figure 3.39(d) is the final sharpened result, obtained by
adding the mask to the original signal. The points at which a change of slope in
the intensity occurs in the signal are now emphasized (sharpened). Observe
that negative values were added to the original. Thus, it is possible for the final
result to have negative intensities if the original image has any zero values or
if the value of k is chosen large enough to emphasize the peaks of the mask to
a level larger than the minimum value in the original. Negative values would
cause a dark halo around edges, which, if k is large enough, can produce objec-
tionable results.

EXAMPLE 3.16: Image sharpening using unsharp masking.

■ Figure 3.40(a) shows a slightly blurred image of white text on a dark gray
background. Figure 3.40(b) was obtained using a Gaussian smoothing filter
(see Section 3.4.4) of size 5 × 5 with σ = 3. Figure 3.40(c) is the unsharp
mask, obtained using Eq. (3.6-8). Figure 3.40(d) was obtained using unsharp

[FIGURE 3.40 (a) Original image. (b) Result of blurring with a Gaussian filter. (c) Unsharp mask. (d) Result of using unsharp masking. (e) Result of using highboost filtering.]

masking [Eq. (3.6-9) with k = 1]. This image is a slight improvement over the
original, but we can do better. Figure 3.40(e) shows the result of using Eq. (3.6-9)
with k = 4.5, the largest possible value we could use and still keep positive all the
values in the final result. The improvement in this image over the original is
significant. ■

3.6.4 Using First-Order Derivatives for (Nonlinear) Image Sharpening—The Gradient
First derivatives in image processing are implemented using the magnitude of
the gradient. For a function f(x, y), the gradient of f at coordinates (x, y) is
defined as the two-dimensional column vector

∇f ≡ grad(f) ≡ [gx, gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ    (3.6-10)

(We discuss the gradient in detail in Section 10.2.5. Here, we are interested
only in using the magnitude of the gradient for image sharpening.) This vector
has the important geometrical property that it points in the direction of the
greatest rate of change of f at location (x, y).
The magnitude (length) of vector ∇f, denoted as M(x, y), where

M(x, y) = mag(∇f) = √(gx² + gy²)    (3.6-11)

is the value at (x, y) of the rate of change in the direction of the gradient vec-
tor. Note that M(x, y) is an image of the same size as the original, created when
x and y are allowed to vary over all pixel locations in f. It is common practice
to refer to this image as the gradient image (or simply as the gradient when the
meaning is clear).

Because the components of the gradient vector are derivatives, they are lin-
ear operators. However, the magnitude of this vector is not because of the
squaring and square root operations. On the other hand, the partial derivatives
in Eq. (3.6-10) are not rotation invariant (isotropic), but the magnitude of the
gradient vector is. In some implementations, it is more suitable computational-
ly to approximate the squares and square root operations by absolute values:

M(x, y) ≈ |gx| + |gy|    (3.6-12)

This expression still preserves the relative changes in intensity, but the isotropic
property is lost in general. However, as in the case of the Laplacian, the isotrop-
ic properties of the discrete gradient defined in the following paragraph are pre-
served only for a limited number of rotational increments that depend on the
filter masks used to approximate the derivatives. As it turns out, the most popu-
lar masks used to approximate the gradient are isotropic at multiples of 90°.
These results are independent of whether we use Eq. (3.6-11) or (3.6-12), so
nothing of significance is lost in using the latter equation if we choose to do so.
As in the case of the Laplacian, we now define discrete approximations to
the preceding equations and from there formulate the appropriate filter
masks. In order to simplify the discussion that follows, we will use the notation
in Fig. 3.41(a) to denote the intensities of image points in a 3 × 3 region. For

[FIGURE 3.41 (a) A 3 × 3 region of an image (the z's are intensity values), labeled row by row as
z1 z2 z3
z4 z5 z6
z7 z8 z9
(b)–(c) Roberts cross-gradient operators:
(b) −1 0 / 0 1    (c) 0 −1 / 1 0
(d)–(e) Sobel operators:
(d) −1 −2 −1 / 0 0 0 / 1 2 1    (e) −1 0 1 / −2 0 2 / −1 0 1
All the mask coefficients sum to zero, as expected of a derivative operator.]

example, the center point, z5, denotes f(x, y) at an arbitrary location, (x, y); z1
denotes f(x - 1, y - 1); and so on, using the notation introduced in Fig. 3.28.
As indicated in Section 3.6.1, the simplest approximations to a first-order de-
rivative that satisfy the conditions stated in that section are gx = (z8 - z5) and
gy = (z6 - z5). Two other definitions proposed by Roberts [1965] in the early
development of digital image processing use cross differences:

gx = (z9 - z5) and gy = (z8 - z6) (3.6-13)

If we use Eqs. (3.6-11) and (3.6-13), we compute the gradient image as

M(x, y) = [(z9 − z5)² + (z8 − z6)²]^(1/2)    (3.6-14)

If we use Eqs. (3.6-12) and (3.6-13), then

M(x, y) ≈ |z9 − z5| + |z8 − z6|    (3.6-15)

where it is understood that x and y vary over the dimensions of the image in
the manner described earlier. The partial derivative terms needed in equation
(3.6-13) can be implemented using the two linear filter masks in Figs. 3.41(b)
and (c). These masks are referred to as the Roberts cross-gradient operators.
Masks of even sizes are awkward to implement because they do not have a
center of symmetry. The smallest filter masks in which we are interested are of
size 3 × 3. Approximations to gx and gy using a 3 × 3 neighborhood centered
on z5 are as follows:
gx = ∂f/∂x = (z7 + 2z8 + z9) − (z1 + 2z2 + z3)    (3.6-16)

and

gy = ∂f/∂y = (z3 + 2z6 + z9) − (z1 + 2z4 + z7)    (3.6-17)
These equations can be implemented using the masks in Figs. 3.41(d) and (e).
The difference between the third and first rows of the 3 × 3 image region im-
plemented by the mask in Fig. 3.41(d) approximates the partial derivative in
the x-direction, and the difference between the third and first columns in the
other mask approximates the derivative in the y-direction. After computing
the partial derivatives with these masks, we obtain the magnitude of the gradi-
ent as before. For example, substituting gx and gy into Eq. (3.6-12) yields

M(x, y) ≈ |(z7 + 2z8 + z9) − (z1 + 2z2 + z3)| + |(z3 + 2z6 + z9) − (z1 + 2z4 + z7)|    (3.6-18)

The masks in Figs. 3.41(d) and (e) are called the Sobel operators. The idea be-
hind using a weight value of 2 in the center coefficient is to achieve some
smoothing by giving more importance to the center point (we discuss this in
more detail in Chapter 10). Note that the coefficients in all the masks shown in
Fig. 3.41 sum to 0, indicating that they would give a response of 0 in an area of
constant intensity, as is expected of a derivative operator.
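Below is a small sketch of the Sobel gradient of Eq. (3.6-18), using the absolute-value approximation of Eq. (3.6-12). We use correlation (the mask slid over the image without flipping), matching the mechanics of Section 3.5.1; the function name and border handling are our own choices:

import numpy as np
from scipy.ndimage import correlate

def sobel_gradient(f):
    """Gradient image M(x, y) of Eq. (3.6-18): |gx| + |gy| with the
    Sobel masks of Figs. 3.41(d) and (e)."""
    f = f.astype(float)
    gx_mask = np.array([[-1, -2, -1],
                        [ 0,  0,  0],
                        [ 1,  2,  1]], dtype=float)   # Eq. (3.6-16)
    gy_mask = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)     # Eq. (3.6-17)
    gx = correlate(f, gx_mask, mode='nearest')
    gy = correlate(f, gy_mask, mode='nearest')
    return np.abs(gx) + np.abs(gy)                    # Eq. (3.6-12)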

As mentioned earlier, the computations of gx and gy are linear operations
because they involve derivatives and, therefore, can be implemented
as a sum of products using the spatial masks in Fig. 3.41. The nonlinear as-
as a sum of products using the spatial masks in Fig. 3.41. The nonlinear as-
pect of sharpening with the gradient is the computation of M(x, y) involving
squaring and square roots, or the use of absolute values, all of which are
nonlinear operations. These operations are performed after the linear
process that yields gx and gy.

EXAMPLE 3.17: Use of the gradient for edge enhancement.

■ The gradient is used frequently in industrial inspection, either to aid humans
in the detection of defects or, what is more common, as a preprocessing
step in automated inspection. We will have more to say about this in Chapters
10 and 11. However, it will be instructive at this point to consider a simple example
to show how the gradient can be used to enhance defects and eliminate
slowly changing background features. In this example, enhancement is used as
a preprocessing step for automated inspection, rather than for human analysis.
Figure 3.42(a) shows an optical image of a contact lens, illuminated by a
lighting arrangement designed to highlight imperfections, such as the two edge
defects in the lens boundary seen at 4 and 5 o’clock. Figure 3.42(b) shows the
gradient obtained using Eq. (3.6-12) with the two Sobel masks in Figs. 3.41(d)
and (e). The edge defects also are quite visible in this image, but with the
added advantage that constant or slowly varying shades of gray have been
eliminated, thus simplifying considerably the computational task required for
automated inspection. The gradient can be used also to highlight small specks
that may not be readily visible in a gray-scale image (specks like these can be
foreign matter, air pockets in a supporting solution, or minuscule imperfections
in the lens). The ability to enhance small discontinuities in an otherwise flat
gray field is another important feature of the gradient. ■

[FIGURE 3.42 (a) Optical image of contact lens (note defects on the boundary at 4 and 5 o'clock). (b) Sobel gradient. (Original image courtesy of Pete Sites, Perceptics Corporation.)]

examples from image enhancement in this chapter not only saves having an
extra chapter in the book but, more importantly, is an effective tool for intro-
ducing newcomers to filtering techniques in the frequency domain. We use
frequency domain processing methods for other applications in Chapters 5, 8,
10, and 11.

4.2 Preliminary Concepts


In order to simplify the progression of ideas presented in this chapter, we
pause briefly to introduce several of the basic concepts that underlie the mate-
rial that follows in later sections.

4.2.1 Complex Numbers


A complex number, C, is defined as
C = R + jI (4.2-1)
where R and I are real numbers, and j is an imaginary number equal to the
square root of −1; that is, j = √−1. Here, R denotes the real part of the complex
number and I its imaginary part. Real numbers are a subset of complex
numbers in which I = 0. The conjugate of a complex number C, denoted C*,
is defined as

C* = R − jI    (4.2-2)
Complex numbers can be viewed geometrically as points in a plane (called the
complex plane) whose abscissa is the real axis (values of R) and whose ordi-
nate is the imaginary axis (values of I). That is, the complex number R + jI is
point (R, I) in the rectangular coordinate system of the complex plane.
Sometimes, it is useful to represent complex numbers in polar coordinates,

C = |C| (cos θ + j sin θ)    (4.2-3)

where |C| = √(R² + I²) is the length of the vector extending from the origin of
the complex plane to point (R, I), and θ is the angle between the vector and the
real axis. Drawing a simple diagram of the real and complex axes with the vector
in the first quadrant will reveal that tan θ = I/R, or θ = arctan(I/R). The
arctan function returns angles in the range [−π/2, π/2]. However, because I
and R can be positive and negative independently, we need to be able to obtain
angles in the full range [−π, π]. This is accomplished simply by keeping track
of the sign of I and R when computing θ. Many programming languages do this
automatically via so-called four-quadrant arctangent functions. For example,
MATLAB provides the function atan2(Imag, Real) for this purpose.
Using Euler's formula,

e^{jθ} = cos θ + j sin θ    (4.2-4)

where e = 2.71828…, gives the following familiar representation of complex
numbers in polar coordinates,

C = |C| e^{jθ}    (4.2-5)

where |C| and θ are as defined above. For example, the polar representation of
the complex number 1 + j2 is √5 e^{jθ}, where θ = 63.4° or 1.1 radians. The preceding
equations are applicable also to complex functions. For example, a
complex function, F(u), of a variable u, can be expressed as the sum
F(u) = R(u) + jI(u), where R(u) and I(u) are the real and imaginary component
functions. As previously noted, the complex conjugate is F*(u) = R(u) − jI(u),
the magnitude is |F(u)| = √(R(u)² + I(u)²), and the angle is
θ(u) = arctan[I(u)/R(u)]. We return to complex functions several times in the
course of this and the next chapter.
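A two-line check of the polar representation, using Python's standard cmath module (the example value 1 + j2 is the one from the text; everything else is our own):

import cmath

C = 1 + 2j
magnitude = abs(C)           # sqrt(R^2 + I^2) = sqrt(5), about 2.236
theta = cmath.phase(C)       # four-quadrant angle: about 1.107 rad (63.4 deg)
# Eq. (4.2-5): C = |C| e^{j theta}
assert abs(magnitude * cmath.exp(1j * theta) - C) < 1e-12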

4.2.2 Fourier Series


As indicated in Section 4.1.1, a function f(t) of a continuous variable t that is pe-
riodic with period, T, can be expressed as the sum of sines and cosines multiplied
by appropriate coefficients. This sum, known as a Fourier series, has the form

f(t) = Σ_{n=−∞}^{∞} cn e^{j2πnt/T}    (4.2-6)

where

cn = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−j2πnt/T} dt,   for n = 0, ±1, ±2, …    (4.2-7)

are the coefficients. The fact that Eq. (4.2-6) is an expansion of sines and
cosines follows from Euler's formula, Eq. (4.2-4). We will return to the Fourier
series later in this section.

4.2.3 Impulses and Their Sifting Property


Central to the study of linear systems and the Fourier transform is the concept
of an impulse and its sifting property. A unit impulse of a continuous variable t
located at t = 0, denoted δ(t), is defined as

δ(t) = ∞ if t = 0;  0 if t ≠ 0    (4.2-8a)

and is constrained also to satisfy the identity

∫_{−∞}^{∞} δ(t) dt = 1    (4.2-8b)

(An impulse is not a function in the usual sense. A more accurate name is a
distribution or generalized function. However, one often finds in the literature
the names impulse function, delta function, and Dirac delta function, despite
the misnomer.)
Physically, if we interpret t as time, an impulse may be viewed as a spike of
infinite amplitude and zero duration, having unit area. An impulse has the
so-called sifting property with respect to integration,

∫_{−∞}^{∞} f(t) δ(t) dt = f(0)    (4.2-9)

provided that f(t) is continuous at t = 0, a condition typically satisfied in practice.
(To sift means literally to separate, or to separate out by putting through a
sieve.) Sifting simply yields the value of the function f(t) at the location of the
impulse (i.e., the origin, t = 0, in the previous equation). A more general statement

of the sifting property involves an impulse located at an arbitrary point t0, denoted
by δ(t − t0). In this case, the sifting property becomes

∫_{−∞}^{∞} f(t) δ(t − t0) dt = f(t0)    (4.2-10)
which yields the value of the function at the impulse location, t0. For instance,
if f(t) = cos(t), using the impulse δ(t − π) in Eq. (4.2-10) yields the result
f(π) = cos(π) = −1. The power of the sifting concept will become quite evident
shortly.
Let x represent a discrete variable. The unit discrete impulse, δ(x), serves the
same purposes in the context of discrete systems as the impulse δ(t) does when
working with continuous variables. It is defined as

δ(x) = 1 if x = 0;  0 if x ≠ 0    (4.2-11a)

Clearly, this definition also satisfies the discrete equivalent of Eq. (4.2-8b):

Σ_{x=−∞}^{∞} δ(x) = 1    (4.2-11b)

The sifting property for discrete variables has the form

Σ_{x=−∞}^{∞} f(x) δ(x) = f(0)    (4.2-12)

or, more generally using a discrete impulse located at x = x0,

Σ_{x=−∞}^{∞} f(x) δ(x − x0) = f(x0)    (4.2-13)

As before, we see that the sifting property simply yields the value of the function
at the location of the impulse. Figure 4.2 shows the unit discrete impulse
diagrammatically. Unlike its continuous counterpart, the discrete impulse is an
ordinary function.
Of particular interest later in this section is an impulse train, s_ΔT(t), defined
as the sum of infinitely many periodic impulses ΔT units apart:

s_ΔT(t) = Σ_{n=−∞}^{∞} δ(t − nΔT)    (4.2-14)

[FIGURE 4.2 A unit discrete impulse located at x = x0. Variable x is discrete, and δ is 0 everywhere except at x = x0.]

[FIGURE 4.3 An impulse train.]

Figure 4.3 shows an impulse train. The impulses can be continuous or discrete.

4.2.4 The Fourier Transform of Functions of One Continuous Variable
The Fourier transform of a continuous function f(t) of a continuous variable, t,
denoted ℱ{f(t)}, is defined by the equation†

ℱ{f(t)} = ∫_{−∞}^{∞} f(t) e^{−j2πμt} dt    (4.2-15)

where μ is also a continuous variable. Because t is integrated out, ℱ{f(t)} is a
function only of μ. We denote this fact explicitly by writing the Fourier transform
as ℱ{f(t)} = F(μ); that is, the Fourier transform of f(t) may be written
for convenience as

F(μ) = ∫_{−∞}^{∞} f(t) e^{−j2πμt} dt    (4.2-16)

Conversely, given F(μ), we can obtain f(t) back using the inverse Fourier
transform, f(t) = ℱ⁻¹{F(μ)}, written as

f(t) = ∫_{−∞}^{∞} F(μ) e^{j2πμt} dμ    (4.2-17)
where we made use of the fact that variable μ is integrated out in the inverse
transform and wrote simply f(t), rather than the more cumbersome notation
f(t) = ℱ⁻¹{F(μ)}. Equations (4.2-16) and (4.2-17) comprise the so-called
Fourier transform pair. They indicate the important fact mentioned in
Section 4.1 that a function can be recovered from its transform.
Using Euler's formula we can express Eq. (4.2-16) as

F(μ) = ∫_{−∞}^{∞} f(t) [cos(2πμt) − j sin(2πμt)] dt    (4.2-18)


† Conditions for the existence of the Fourier transform are complicated to state in general (Champeney
[1987]), but a sufficient condition for its existence is that the integral of the absolute value of f(t), or the
integral of the square of f(t), be finite. Existence is seldom an issue in practice, except for idealized signals,
such as sinusoids that extend forever. These are handled using generalized impulse functions. Our
primary interest is in the discrete Fourier transform pair which, as you will see shortly, is guaranteed to
exist for all finite functions.

If f(t) is real, we see that its transform in general is complex. Note that the
Fourier transform is an expansion of f(t) multiplied by sinusoidal terms whose
frequencies are determined by the values of μ (variable t is integrated out, as
mentioned earlier). Because the only variable left after integration is frequency,
we say that the domain of the Fourier transform is the frequency domain.
We discuss the frequency domain and its properties in more detail later in this
chapter. In our discussion, t can represent any continuous variable, and the
units of the frequency variable μ depend on the units of t. For example, if t represents
time in seconds, the units of μ are cycles/sec or Hertz (Hz). If t represents
distance in meters, then the units of μ are cycles/meter, and so on. In
other words, the units of the frequency domain are cycles per unit of the independent
variable of the input function. (For consistency in terminology used in
the previous two chapters, and to be used later in this chapter in connection
with images, we refer to the domain of variable t in general as the spatial domain.)

EXAMPLE 4.1: Obtaining the Fourier transform of a simple function.

■ The Fourier transform of the function in Fig. 4.4(a) follows from Eq. (4.2-16):

F(μ) = ∫_{−∞}^{∞} f(t) e^{−j2πμt} dt = ∫_{−W/2}^{W/2} A e^{−j2πμt} dt

     = [−A/(j2πμ)] [e^{−j2πμt}]_{t=−W/2}^{t=W/2} = [−A/(j2πμ)] [e^{−jπμW} − e^{jπμW}]

     = [A/(j2πμ)] [e^{jπμW} − e^{−jπμW}]

     = AW sin(πμW)/(πμW)

where we used the trigonometric identity sin θ = (e^{jθ} − e^{−jθ})/2j. In this case
the complex terms of the Fourier transform combined nicely into a real sine

[FIGURE 4.4 (a) A simple function; (b) its Fourier transform; and (c) the spectrum. All functions extend to infinity in both directions.]

function. The result in the last step of the preceding expression is known as the
sinc function:

sinc(m) = sin(πm)/(πm)    (4.2-19)

where sinc(0) = 1, and sinc(m) = 0 for all other integer values of m. Figure 4.4(b)
shows a plot of F(μ).
In general, the Fourier transform contains complex terms, and it is customary
for display purposes to work with the magnitude of the transform (a real
quantity), which is called the Fourier spectrum or the frequency spectrum:

|F(μ)| = AW |sin(πμW)/(πμW)|

Figure 4.4(c) shows a plot of |F(μ)| as a function of frequency. The key properties
to note are that the locations of the zeros of both F(μ) and |F(μ)| are
inversely proportional to the width, W, of the "box" function, that the height of
the lobes decreases as a function of distance from the origin, and that the function
extends to infinity for both positive and negative values of μ. As you will
see later, these properties are quite helpful in interpreting the spectra of two-
dimensional Fourier transforms of images. ■
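The closed form AW sin(πμW)/(πμW) can be checked by numerical integration of Eq. (4.2-16) over the box's support. A small sketch with assumed values A = 1 and W = 2; note that NumPy's np.sinc(x) is defined as sin(πx)/(πx), matching Eq. (4.2-19):

import numpy as np

A, W = 1.0, 2.0                                  # assumed box height and width
t = np.linspace(-W / 2, W / 2, 100001)           # the box's support

def F(mu):
    # Numerical version of Eq. (4.2-16); f(t) is zero outside [-W/2, W/2].
    return np.trapz(A * np.exp(-2j * np.pi * mu * t), t)

for mu in (0.0, 0.3, 1.0):
    closed_form = A * W * np.sinc(mu * W)        # AW sin(pi mu W)/(pi mu W)
    assert abs(F(mu) - closed_form) < 1e-6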

EXAMPLE 4.2: Fourier transform of an impulse and of an impulse train.

■ The Fourier transform of a unit impulse located at the origin follows from
Eq. (4.2-16):

F(μ) = ∫_{−∞}^{∞} δ(t) e^{−j2πμt} dt
     = ∫_{−∞}^{∞} e^{−j2πμt} δ(t) dt
     = e^{−j2πμ·0} = e⁰
     = 1

where the third step follows from the sifting property in Eq. (4.2-9). Thus, we
see that the Fourier transform of an impulse located at the origin of the spatial
domain is a constant in the frequency domain. Similarly, the Fourier transform
of an impulse located at t = t0 is

F(μ) = ∫_{−∞}^{∞} δ(t − t0) e^{−j2πμt} dt
     = ∫_{−∞}^{∞} e^{−j2πμt} δ(t − t0) dt
     = e^{−j2πμt0}
     = cos(2πμt0) − j sin(2πμt0)

where the third line follows from the sifting property in Eq. (4.2-10) and the
last line follows from Euler’s formula. These last two lines are equivalent rep-
resentations of a unit circle centered on the origin of the complex plane.
In Section 4.3, we make use of the Fourier transform of a periodic im-
pulse train. Obtaining this transform is not as straightforward as we just
showed for individual impulses. However, understanding how to derive the
transform of an impulse train is quite important, so we take the time to de-
rive it in detail here. We start by noting that the only difference in the form
of Eqs. (4.2-16) and (4.2-17) is the sign of the exponential. Thus, if a function
f(t) has the Fourier transform F(μ), then the latter function evaluated at t,
that is, F(t), must have the transform f(−μ). Using this symmetry property
and given, as we showed above, that the Fourier transform of an impulse
δ(t − t0) is e^{−j2πμt0}, it follows that the function e^{−j2πt0t} has the transform
δ(−μ − t0). By letting −t0 = a, it follows that the transform of e^{j2πat} is
δ(−μ + a) = δ(μ − a), where the last step is true because δ is nonzero only
when μ = a, which is the same result for either δ(−μ + a) or δ(μ − a), so
the two forms are equivalent.
The impulse train s_ΔT(t) in Eq. (4.2-14) is periodic with period ΔT, so we
know from Section 4.2.2 that it can be expressed as a Fourier series:

s_ΔT(t) = Σ_{n=−∞}^{∞} cn e^{j2πnt/ΔT}

where

cn = (1/ΔT) ∫_{−ΔT/2}^{ΔT/2} s_ΔT(t) e^{−j2πnt/ΔT} dt

With reference to Fig. 4.3, we see that the integral in the interval
[−ΔT/2, ΔT/2] encompasses only the impulse of s_ΔT(t) that is located at the
origin. Therefore, the preceding equation becomes

cn = (1/ΔT) ∫_{−ΔT/2}^{ΔT/2} δ(t) e^{−j2πnt/ΔT} dt = (1/ΔT) e⁰ = 1/ΔT

The Fourier series expansion then becomes

s_ΔT(t) = (1/ΔT) Σ_{n=−∞}^{∞} e^{j2πnt/ΔT}

Our objective is to obtain the Fourier transform of this expression. Because
summation is a linear process, obtaining the Fourier transform of a sum is

the same as obtaining the sum of the transforms of the individual components.
These components are exponentials, and we established earlier in this
example that

ℱ{e^{j2πnt/ΔT}} = δ(μ − n/ΔT)

So, S(μ), the Fourier transform of the periodic impulse train s_ΔT(t), is

S(μ) = ℱ{s_ΔT(t)} = ℱ{(1/ΔT) Σ_{n=−∞}^{∞} e^{j2πnt/ΔT}}

     = (1/ΔT) ℱ{Σ_{n=−∞}^{∞} e^{j2πnt/ΔT}}

     = (1/ΔT) Σ_{n=−∞}^{∞} δ(μ − n/ΔT)

This fundamental result tells us that the Fourier transform of an impulse train
with period ΔT is also an impulse train, whose period is 1/ΔT. This inverse
proportionality between the periods of s_ΔT(t) and S(μ) is analogous to what
we found in Fig. 4.4 in connection with a box function and its transform. This
property plays a fundamental role in the remainder of this chapter. ■
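The inverse-proportionality result is easy to observe with a discrete analogue. In the NumPy sketch below (our own construction, not the book's derivation), the FFT of a sampled impulse train with period ΔT = 8 samples is itself a train whose nonzero bins are spaced N/ΔT apart:

import numpy as np

N, dT = 240, 8                  # N samples; one impulse every dT samples
s = np.zeros(N)
s[::dT] = 1                     # discrete impulse train with period dT

S = np.abs(np.fft.fft(s))
peaks = np.nonzero(S > 1e-9)[0]
print(peaks)                    # [0, 30, 60, ..., 210]: spacing N/dT, i.e.,
                                # a frequency-domain train with period 1/dT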

4.2.5 Convolution
We need one more building block before proceeding. We introduced the idea
of convolution in Section 3.4.2. You learned in that section that convolution of
two functions involves flipping (rotating by 180°) one function about its origin
and sliding it past the other. At each displacement in the sliding process, we
perform a computation, which in the case of Chapter 3 was a sum of products.
In the present discussion, we are interested in the convolution of two continuous
functions, f(t) and h(t), of one continuous variable, t, so we have to use integration
instead of a summation. The convolution of these two functions,
denoted as before by the operator ★, is defined as

f(t) ★ h(t) = ∫_{−∞}^{∞} f(τ) h(t − τ) dτ    (4.2-20)

where the minus sign accounts for the flipping just mentioned, t is the
displacement needed to slide one function past the other, and τ is a dummy
variable that is integrated out. We assume for now that the functions extend
from −∞ to ∞.
We illustrated the basic mechanics of convolution in Section 3.4.2, and we
will do so again later in this chapter and in Chapter 5. At the moment, we are

interested in finding the Fourier transform of Eq. (4.2-20). We start with
Eq. (4.2-15):

ℱ{f(t) ★ h(t)} = ∫_{−∞}^{∞} [∫_{−∞}^{∞} f(τ) h(t − τ) dτ] e^{−j2πμt} dt

              = ∫_{−∞}^{∞} f(τ) [∫_{−∞}^{∞} h(t − τ) e^{−j2πμt} dt] dτ

The term inside the brackets is the Fourier transform of h(t − τ). We show
later in this chapter that ℱ{h(t − τ)} = H(μ) e^{−j2πμτ}, where H(μ) is the
Fourier transform of h(t). Using this fact in the preceding equation gives us

ℱ{f(t) ★ h(t)} = ∫_{−∞}^{∞} f(τ) [H(μ) e^{−j2πμτ}] dτ

              = H(μ) ∫_{−∞}^{∞} f(τ) e^{−j2πμτ} dτ

              = H(μ) F(μ)

(The same result would be obtained if the order of f(t) and h(t) were reversed,
so convolution is commutative.)

Recalling from Section 4.2.4 that we refer to the domain of t as the spatial domain,
and the domain of μ as the frequency domain, the preceding equation
tells us that the Fourier transform of the convolution of two functions in the
spatial domain is equal to the product in the frequency domain of the Fourier
transforms of the two functions. Conversely, if we have the product of the two
transforms, we can obtain the convolution in the spatial domain by computing
the inverse Fourier transform. In other words, f(t) ★ h(t) and H(μ) F(μ) are a
Fourier transform pair. This result is one-half of the convolution theorem and
is written as

f(t) ★ h(t) ⇔ H(μ) F(μ)    (4.2-21)

The double arrow is used to indicate that the expression on the right is ob-
tained by taking the Fourier transform of the expression on the left, while the
expression on the left is obtained by taking the inverse Fourier transform of
the expression on the right.
Following a similar development would result in the other half of the convolution
theorem:

f(t) h(t) ⇔ H(μ) ★ F(μ)    (4.2-22)

which states that convolution in the frequency domain is analogous to multiplication
in the spatial domain, the two being related by the forward and inverse
Fourier transforms, respectively. As you will see later in this chapter, the
convolution theorem is the foundation for filtering in the frequency domain.
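Equation (4.2-21) has a direct discrete counterpart: multiplying DFTs performs circular convolution. A small NumPy check (array size and random seed are arbitrary choices of ours):

import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=64)
h = rng.normal(size=64)

# Product of the transforms, then back to the "spatial" domain...
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)))

# ...matches a direct circular-convolution sum.
direct = np.array([sum(f[k] * h[(n - k) % 64] for k in range(64))
                   for n in range(64)])
assert np.allclose(via_fft, direct)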

4.5 Extension to Functions of Two Variables


In this section, we extend to two variables the concepts introduced in Sections
4.2 through 4.4.

4.5.1 The 2-D Impulse and Its Sifting Property


The impulse, δ(t, z), of two continuous variables, t and z, is defined as in
Eq. (4.2-8):

δ(t, z) = ∞ if t = z = 0;  0 otherwise    (4.5-1a)

and

∫_{−∞}^{∞} ∫_{−∞}^{∞} δ(t, z) dt dz = 1    (4.5-1b)
L- q L- q
As in the 1-D case, the 2-D impulse exhibits the sifting property under
integration,

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(t, z) δ(t, z) dt dz = f(0, 0)    (4.5-2)

or, more generally for an impulse located at coordinates (t0, z0),

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(t, z) δ(t − t0, z − z0) dt dz = f(t0, z0)    (4.5-3)

As before, we see that the sifting property yields the value of the function
f(t, z) at the location of the impulse.
For discrete variables x and y, the 2-D discrete impulse is defined as

δ(x, y) = 1 if x = y = 0;  0 otherwise    (4.5-4)

and its sifting property is

Σ_{x=−∞}^{∞} Σ_{y=−∞}^{∞} f(x, y) δ(x, y) = f(0, 0)    (4.5-5)

where f(x, y) is a function of discrete variables x and y. For an impulse located
at coordinates (x0, y0) (see Fig. 4.12) the sifting property is

Σ_{x=−∞}^{∞} Σ_{y=−∞}^{∞} f(x, y) δ(x − x0, y − y0) = f(x0, y0)    (4.5-6)

As before, the sifting property of a discrete impulse yields the value of the discrete
function f(x, y) at the location of the impulse.

[FIGURE 4.12 Two-dimensional unit discrete impulse. Variables x and y are discrete, and δ is zero everywhere except at coordinates (x0, y0).]

4.5.2 The 2-D Continuous Fourier Transform Pair


Let f(t, z) be a continuous function of two continuous variables, t and z. The
two-dimensional, continuous Fourier transform pair is given by the expressions

F(μ, ν) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(t, z) e^{−j2π(μt + νz)} dt dz    (4.5-7)

and

f(t, z) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(μ, ν) e^{j2π(μt + νz)} dμ dν    (4.5-8)

where μ and ν are the frequency variables. When referring to images, t and z
are interpreted to be continuous spatial variables. As in the 1-D case, the domain
of the variables μ and ν defines the continuous frequency domain.

EXAMPLE 4.5: Obtaining the 2-D Fourier transform of a simple function.

■ Figure 4.13(a) shows a 2-D function analogous to the 1-D case in Example 4.1.
Following a procedure similar to the one used in that example gives the result

F(μ, ν) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(t, z) e^{−j2π(μt + νz)} dt dz = ∫_{−T/2}^{T/2} ∫_{−Z/2}^{Z/2} A e^{−j2π(μt + νz)} dt dz

        = ATZ [sin(πμT)/(πμT)] [sin(πνZ)/(πνZ)]

The magnitude (spectrum) is given by the expression

|F(μ, ν)| = ATZ |sin(πμT)/(πμT)| |sin(πνZ)/(πνZ)|

Figure 4.13(b) shows a portion of the spectrum about the origin. As in the 1-D
case, the locations of the zeros in the spectrum are inversely proportional to

[FIGURE 4.13 (a) A 2-D function, and (b) a section of its spectrum (not to scale). The block is longer along the t-axis, so the spectrum is more "contracted" along the μ-axis. Compare with Fig. 4.4.]

the values of T and Z. Thus, the larger T and Z are, the more “contracted” the
spectrum will become, and vice versa. ■

4.5.3 Two-Dimensional Sampling and the 2-D Sampling Theorem


In a manner similar to the 1-D case, sampling in two dimensions can be modeled
using the sampling function (2-D impulse train):

s_ΔTΔZ(t, z) = Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} δ(t − mΔT, z − nΔZ)    (4.5-9)

where ΔT and ΔZ are the separations between samples along the t- and z-axis
of the continuous function f(t, z). Equation (4.5-9) describes a set of periodic
impulses extending infinitely along the two axes (Fig. 4.14). As in the 1-D case
illustrated in Fig. 4.5, multiplying f(t, z) by s_ΔTΔZ(t, z) yields the sampled
function.
Function f(t, z) is said to be band-limited if its Fourier transform is 0 outside
a rectangle established by the intervals [−μ_max, μ_max] and [−ν_max, ν_max];
that is,

F(μ, ν) = 0  for |μ| ≥ μ_max and |ν| ≥ ν_max    (4.5-10)

The two-dimensional sampling theorem states that a continuous, band-limited
function f(t, z) can be recovered with no error from a set of its samples if the
sampling intervals are

ΔT < 1/(2μ_max)    (4.5-11)

and

ΔZ < 1/(2ν_max)    (4.5-12)

or, expressed in terms of the sampling rate, if

[FIGURE 4.14 Two-dimensional impulse train.]

1/ΔT > 2μ_max    (4.5-13)

and

1/ΔZ > 2ν_max    (4.5-14)

Stated another way, we say that no information is lost if a 2-D, band-limited, continuous
function is represented by samples acquired at rates greater than twice
the highest frequency content of the function in both the μ- and ν-directions.
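As a worked example of Eqs. (4.5-11)–(4.5-14), assuming hypothetical band limits of 50 and 30 cycles per unit distance:

# Hypothetical band limits (cycles per unit distance) of f(t, z).
mu_max, nu_max = 50.0, 30.0

# Eqs. (4.5-11) and (4.5-12): required sample spacings.
dT_limit = 1 / (2 * mu_max)    # must choose dT < 0.01
dZ_limit = 1 / (2 * nu_max)    # must choose dZ < 0.0167 (approximately)

# Eqs. (4.5-13) and (4.5-14): equivalently, sampling rates must exceed
# 2*mu_max = 100 and 2*nu_max = 60 samples per unit distance.
print(dT_limit, dZ_limit)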
Figure 4.15 shows the 2-D equivalents of Figs. 4.6(b) and (d). A 2-D ideal box
filter has the form illustrated in Fig. 4.13(a). The dashed portion of Fig. 4.15(a)
shows the location of the filter to achieve the necessary isolation of a single pe-
riod of the transform for reconstruction of a band-limited function from its sam-
ples, as in Section 4.3.3. From Section 4.3.4, we know that if the function is
under-sampled the periods overlap, and it becomes impossible to isolate a single
period, as Fig. 4.15(b) shows. Aliasing would result under such conditions.

4.5.4 Aliasing in Images


In this section, we extend the concept of aliasing to images and discuss several
aspects related to image sampling and resampling.

[FIGURE 4.15 Two-dimensional Fourier transforms of (a) an over-sampled, and (b) under-sampled band-limited function. The dashed rectangle shows the footprint of an ideal lowpass (box) filter.]

Extension from 1-D aliasing


As in the 1-D case, a continuous function f(t, z) of two continuous variables, t and
z, can be band-limited in general only if it extends infinitely in both coordinate directions.
The very act of limiting the duration of the function introduces corrupting
frequency components extending to infinity in the frequency domain, as explained
in Section 4.3.4. Because we cannot sample a function infinitely, aliasing is always
present in digital images, just as it is present in sampled 1-D functions. There are
two principal manifestations of aliasing in images: spatial aliasing and temporal
aliasing. Spatial aliasing is due to under-sampling, as discussed in Section 4.3.4.
Temporal aliasing is related to time intervals between images in a sequence of images.
One of the most common examples of temporal aliasing is the "wagon
wheel" effect, in which wheels with spokes in a sequence of images (for example,
in a movie) appear to be rotating backwards. This is caused by the frame rate being
too low with respect to the speed of wheel rotation in the sequence.
Our focus in this chapter is on spatial aliasing. The key concerns with spatial
aliasing in images are the introduction of artifacts such as jaggedness in line
features, spurious highlights, and the appearance of frequency patterns not pre-
sent in the original image. The following example illustrates aliasing in images.

EXAMPLE 4.6: Aliasing in images.

■ Suppose that we have an imaging system that is perfect, in the sense that it
is noiseless and produces an exact digital image of what it sees, but the number
of samples it can take is fixed at 96 × 96 pixels. If we use this system to digitize
checkerboard patterns, it will be able to resolve patterns that are up to
96 × 96 squares, in which the size of each square is 1 × 1 pixels. In this limiting
case, each pixel in the resulting image will correspond to one square in the
pattern. We are interested in examining what happens when the detail (the
size of the checkerboard squares) is less than one camera pixel; that is, when
the imaging system is asked to digitize checkerboard patterns that have more
than 96 × 96 squares in the field of view. (This example should not be construed
as being unrealistic. Sampling a "perfect" scene under noiseless, distortion-free
conditions is common when converting computer-generated models and
vector drawings to digital images.)
Figures 4.16(a) and (b) show the result of sampling checkerboards whose
squares are of size 16 and 6 pixels on the side, respectively. These results are as
expected. However, when the size of the squares is reduced to slightly less than
one camera pixel a severely aliased image results, as Fig. 4.16(c) shows. Finally,
reducing the size of the squares to slightly less than 0.5 pixels on the side yielded
the image in Fig. 4.16(d). In this case, the aliased result looks like a normal
checkerboard pattern. In fact, this image would result from sampling a checker-
board image whose squares were 12 pixels on the side. This last image is a good
reminder that aliasing can create results that may be quite misleading. ■
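The 96 × 96 thought experiment is easy to reproduce. The sketch below is our own construction (sizes chosen so that one "camera pixel" spans several scene units); it samples a checkerboard whose squares are smaller than one pixel, producing a false, lower-frequency pattern like the one in Fig. 4.16(d):

import numpy as np

def checkerboard(n, square):
    """n x n checkerboard whose squares are 'square' scene units on the side."""
    y, x = np.mgrid[0:n, 0:n]
    return ((x // square + y // square) % 2).astype(float)

scene = checkerboard(960, square=4)   # squares of 4 scene units
step = 10                             # one camera pixel covers 10 scene units,
sampled = scene[::step, ::step]       # so each square is only 0.4 pixel wide
# 'sampled' is 96 x 96; it shows a false, coarser checkerboard-like pattern
# rather than the true 240 x 240-square scene.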

The effects of aliasing can be reduced by slightly defocusing the scene to be
digitized so that high frequencies are attenuated. As explained in Section 4.3.4,
anti-aliasing filtering has to be done at the “front-end,” before the image is
sampled. There are no such things as after-the-fact software anti-aliasing filters
that can be used to reduce the effects of aliasing caused by violations of the
sampling theorem. Most commercial digital image manipulation packages do
have a feature called “anti-aliasing.” However, as illustrated in Examples 4.7

[FIGURE 4.16 Aliasing in images. In (a) and (b), the lengths of the sides of the squares are 16 and 6 pixels, respectively, and aliasing is visually negligible. In (c) and (d), the sides of the squares are 0.9174 and 0.4798 pixels, respectively, and the results show significant aliasing. Note that (d) masquerades as a "normal" image.]

and 4.8, this term is related to blurring a digital image to reduce additional
aliasing artifacts caused by resampling. The term does not apply to reducing
aliasing in the original sampled image. A significant number of commercial
digital cameras have true anti-aliasing filtering built in, either in the lens or on
the surface of the sensor itself. For this reason, it is difficult to illustrate alias-
ing using images obtained with such cameras.

Image interpolation and resampling


As in the 1-D case, perfect reconstruction of a band-limited image function
from a set of its samples requires 2-D convolution in the spatial domain with a
sinc function. As explained in Section 4.3.5, this theoretically perfect recon-
struction requires interpolation using infinite summations which, in practice,
forces us to look for approximations. One of the most common applications of
2-D interpolation in image processing is in image resizing (zooming and
shrinking). Zooming may be viewed as over-sampling, while shrinking may be
viewed as under-sampling. The key difference between these two operations
and the sampling concepts discussed in previous sections is that zooming and
shrinking are applied to digital images.
Interpolation was explained in Section 2.4.4. Our interest there was to illus-
trate the performance of nearest neighbor, bilinear, and bicubic interpolation.
In this section, we give some additional examples with a focus on sampling and
anti-aliasing issues. A special case of nearest neighbor interpolation that ties in
nicely with over-sampling is zooming by pixel replication, which is applicable
when we want to increase the size of an image an integer number of times. For

instance, to double the size of an image, we duplicate each column. This dou-
bles the image size in the horizontal direction. Then, we duplicate each row of
the enlarged image to double the size in the vertical direction. The same pro-
cedure is used to enlarge the image any integer number of times. The intensity-
level assignment of each pixel is predetermined by the fact that new locations
are exact duplicates of old locations.
Image shrinking is done in a manner similar to zooming. Under-sampling is
achieved by row-column deletion (e.g., to shrink an image by one-half, we
delete every other row and column). We can use the zooming grid analogy in
Section 2.4.4 to visualize the concept of shrinking by a non-integer factor, except
that we now expand the grid to fit over the original image, do intensity-
level interpolation, and then shrink the grid back to its specified size. To reduce
aliasing, it is a good idea to blur an image slightly before shrinking it, as
sketched below (we discuss frequency domain blurring in Section 4.8). (The
process of resampling an image without using band-limiting blurring is called
decimation.) An alternate technique is to super-sample the original scene and
then reduce (resample) its size by row and column deletion. This can yield
sharper results than with smoothing, but it clearly requires access to the original
scene. Clearly, if we have no access to the original scene (as typically is the
case in practice) super-sampling is not an option.
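A minimal sketch of the shrinking options just described, assuming a 3 × 3 averaging (box) filter for the band-limiting blur as in Example 4.7 (the function name is our own):

import numpy as np
from scipy.ndimage import uniform_filter

def shrink_by_half(image, antialias=True):
    """Shrink by row-column deletion, optionally attenuating high
    frequencies first with a 3 x 3 averaging filter (cf. Fig. 4.17(c)).
    Deleting rows and columns without the blur is plain decimation."""
    img = image.astype(float)
    if antialias:
        img = uniform_filter(img, size=3, mode='nearest')
    return img[::2, ::2]   # delete every other row and column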

EXAMPLE 4.7: Illustration of aliasing in resampled images.

■ The effects of aliasing generally are worsened when the size of a digital
image is reduced. Figure 4.17(a) is an image purposely created to illustrate the
effects of aliasing (note the thinly-spaced parallel lines in all garments worn by
the subject). There are no objectionable artifacts in Fig. 4.17(a), indicating that

[FIGURE 4.17 Illustration of aliasing on resampled images. (a) A digital image with negligible visual aliasing. (b) Result of resizing the image to 50% of its original size by pixel deletion. Aliasing is clearly visible. (c) Result of blurring the image in (a) with a 3 × 3 averaging filter prior to resizing. The image is slightly more blurred than (b), but aliasing is no longer objectionable. (Original image courtesy of the Signal Compression Laboratory, University of California, Santa Barbara.)]

the sampling rate used initially was sufficient to avoid visible aliasing. In
Fig. 4.17(b), the image was reduced to 50% of its original size using row-
column deletion. The effects of aliasing are quite visible in this image (see,
for example the areas around the subject’s knees). The digital “equivalent”
of anti-aliasing filtering of continuous images is to attenuate the high fre-
quencies of a digital image by smoothing it before resampling. Figure
4.17(c) shows the result of smoothing the image in Fig. 4.17(a) with a 3 * 3
averaging filter (see Section 3.5) before reducing its size. The improvement
over Fig. 4.17(b) is evident. Images (b) and (c) were resized up to their orig-
inal dimension by pixel replication to simplify comparisons. ■

When you work with images that have strong edge content, the effects of
aliasing are seen as block-like image components, called jaggies. The following
example illustrates this phenomenon.

EXAMPLE 4.8: Illustration of jaggies in image shrinking.

■ Figure 4.18(a) shows a 1024 × 1024 digital image of a computer-generated scene in which aliasing is negligible. Figure 4.18(b) is the result of reducing the size of (a) by 75% to 256 × 256 pixels using bilinear interpolation and then using pixel replication to bring the image back to its original size in order to make the effects of aliasing (jaggies in this case) more visible. As in Example 4.7, the effects of aliasing can be made less objectionable by smoothing the image before resampling. Figure 4.18(c) is the result of using a 5 × 5 averaging filter prior to reducing the size of the image. As this figure shows, jaggies were reduced significantly. The size reduction and increase to the original size in Fig. 4.18(c) were done using the same approach used to generate Fig. 4.18(b). ■

FIGURE 4.18 Illustration of jaggies. (a) A 1024 × 1024 digital image of a computer-generated scene with negligible visible aliasing. (b) Result of reducing (a) to 25% of its original size using bilinear interpolation. (c) Result of blurring the image in (a) with a 5 × 5 averaging filter prior to resizing it to 25% using bilinear interpolation. (Original image courtesy of D. P. Mitchell, Mental Landscape, LLC.)

EXAMPLE 4.9: Illustration of jaggies in image zooming.

■ In the previous two examples, we used pixel replication to zoom the small resampled images. This is not a preferred approach in general, as Fig. 4.19 illustrates. Figure 4.19(a) shows a 1024 × 1024 zoomed image generated by pixel replication from a 256 × 256 section out of the center of the image in Fig. 4.18(a). Note the "blocky" edges. The zoomed image in Fig. 4.19(b) was generated from the same 256 × 256 section, but using bilinear interpolation. The edges in this result are considerably smoother. For example, the edges of the bottle neck and the large checkerboard squares are not nearly as blocky in (b) as they are in (a). ■

Moiré patterns
Before leaving this section, we examine another type of artifact, called moiré
patterns,† that sometimes result from sampling scenes with periodic or nearly
periodic components. In optics, moiré patterns refer to beat patterns pro-
duced between two gratings of approximately equal spacing. These patterns
are a common everyday occurrence. We see them, for example, in overlapping
insect window screens and on the interference between TV raster lines and
striped materials. In digital image processing, the problem arises routinely
when scanning media print, such as newspapers and magazines, or in images
with periodic components whose spacing is comparable to the spacing be-
tween samples. It is important to note that moiré patterns are more general
than sampling artifacts. For instance, Fig. 4.20 shows the moiré effect using ink
drawings that have not been digitized. Separately, the patterns are clean and
void of interference. However, superimposing one pattern on the other creates

FIGURE 4.19 Image zooming. (a) A 1024 × 1024 digital image generated by pixel replication from a 256 × 256 image extracted from the middle of Fig. 4.18(a). (b) Image generated using bilinear interpolation, showing a significant reduction in jaggies.


† The term moiré is a French word (not the name of a person) that appears to have originated with weavers, who first noticed interference patterns visible on some fabrics; the term is rooted in the word mohair, a cloth made from Angora goat hair.

FIGURE 4.20 Examples of the moiré effect. These are ink drawings, not digitized patterns. Superimposing one pattern on the other is equivalent mathematically to multiplying the patterns.

a beat pattern that has frequencies not present in either of the original pat-
terns. Note in particular the moiré effect produced by two patterns of dots, as
this is the effect of interest in the following discussion.
Newspapers and other printed materials make use of so-called halftone dots, which are black dots or ellipses whose sizes and various joining schemes are used to simulate gray tones. (Color printing uses red, green, and blue dots to produce the sensation in the eye of continuous color.) As a rule, the following numbers are typical: newspapers are printed using 75 halftone dots per inch (dpi for short), magazines use 133 dpi, and high-quality brochures use 175 dpi. Figure 4.21 shows

what happens when a newspaper image is sampled at 75 dpi. The sampling lattice (which is oriented vertically and horizontally) and dot patterns on the newspaper image (oriented at ±45°) interact to create a uniform moiré pattern that makes the image look blotchy. (We discuss a technique in Section 4.10.2 for reducing moiré interference patterns.)

FIGURE 4.21 A newspaper image of size 246 × 168 pixels sampled at 75 dpi showing a moiré pattern. The moiré pattern in this image is the interference pattern created between the ±45° orientation of the halftone dots and the north–south orientation of the sampling grid used to digitize the image.
As a related point of interest, Fig. 4.22 shows a newspaper image sam-
pled at 400 dpi to avoid moiré effects. The enlargement of the region sur-
rounding the subject’s left eye illustrates how halftone dots are used to
create shades of gray. The dot size is inversely proportional to image inten-
sity. In light areas, the dots are small or totally absent (see, for example, the
white part of the eye). In light gray areas, the dots are larger, as shown
below the eye. In darker areas, when dot size exceeds a specified value (typ-
ically 50%), dots are allowed to join along two specified directions to form
an interconnected mesh (see, for example, the left part of the eye). In some
cases the dots join along only one direction, as in the top right area below
the eyebrow.

4.5.5 The 2-D Discrete Fourier Transform and Its Inverse

A development similar to the material in Sections 4.3 and 4.4 would yield the following 2-D discrete Fourier transform (DFT):

    F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) e^{-j 2\pi (ux/M + vy/N)}        (4.5-15)

where f(x, y) is a digital image of size M × N. As in the 1-D case, Eq. (4.5-15) must be evaluated for values of the discrete variables u and v in the ranges u = 0, 1, 2, ..., M − 1 and v = 0, 1, 2, ..., N − 1.†

(Sometimes you will find in the literature the 1/MN constant in front of the DFT instead of the IDFT. At times, the constant is expressed as 1/√(MN) and is included in front of both the forward and inverse transforms, thus creating a more symmetric pair. Any of these formulations is correct, provided that you are consistent.)

FIGURE 4.22 A newspaper image and an enlargement showing how halftone dots are arranged to render shades of gray.


† As mentioned in Section 4.4.1, keep in mind that in this chapter we use (t, z) and (μ, ν) to denote 2-D continuous spatial and frequency-domain variables. In the 2-D discrete case, we use (x, y) for spatial variables and (u, v) for frequency-domain variables.

Given the transform F(u, v), we can obtain f(x, y) by using the inverse discrete Fourier transform (IDFT):

    f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v) e^{j 2\pi (ux/M + vy/N)}        (4.5-16)

for x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1. Equations (4.5-15) and (4.5-16) constitute the 2-D discrete Fourier transform pair. The rest of this chapter is based on properties of these two equations and their use for image filtering in the frequency domain.

4.6 Some Properties of the 2-D Discrete Fourier Transform

In this section, we introduce several properties of the 2-D discrete Fourier transform and its inverse.

4.6.1 Relationships Between Spatial and Frequency Intervals

The relationships between spatial sampling and the corresponding frequency-domain intervals are as explained in Section 4.4.2. Suppose that a continuous function f(t, z) is sampled to form a digital image, f(x, y), consisting of M × N samples taken in the t- and z-directions, respectively. Let ΔT and ΔZ denote the separations between samples (see Fig. 4.14). Then, the separations between the corresponding discrete, frequency-domain variables are given by

    \Delta u = \frac{1}{M \Delta T}        (4.6-1)

and

    \Delta v = \frac{1}{N \Delta Z}        (4.6-2)

respectively. Note that the separations between samples in the frequency domain are inversely proportional both to the spacing between spatial samples and to the number of samples.
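As a quick numerical check of Eqs. (4.6-1) and (4.6-2), consider a hypothetical 512 × 512 image sampled at 0.5 units per sample in both directions (the numbers are chosen only for illustration):

    M, N = 512, 512      # number of samples in the t- and z-directions
    dT, dZ = 0.5, 0.5    # spatial separations between samples

    du = 1.0 / (M * dT)  # Eq. (4.6-1): frequency spacing = 1/256
    dv = 1.0 / (N * dZ)  # Eq. (4.6-2): frequency spacing = 1/256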

4.6.2 Translation and Rotation

It can be shown by direct substitution into Eqs. (4.5-15) and (4.5-16) that the Fourier transform pair satisfies the following translation properties (Problem 4.16):

    f(x, y) e^{j 2\pi (u_0 x/M + v_0 y/N)} ⇔ F(u - u_0, v - v_0)        (4.6-3)

and

    f(x - x_0, y - y_0) ⇔ F(u, v) e^{-j 2\pi (x_0 u/M + y_0 v/N)}        (4.6-4)



That is, multiplying f(x, y) by the exponential shown shifts the origin of the DFT to (u₀, v₀) and, conversely, multiplying F(u, v) by the negative of that exponential shifts the origin of f(x, y) to (x₀, y₀). As we illustrate in Example 4.13, translation has no effect on the magnitude (spectrum) of F(u, v).

Using the polar coordinates

    x = r\cos\theta \quad y = r\sin\theta \quad u = \omega\cos\varphi \quad v = \omega\sin\varphi

results in the following transform pair:

    f(r, \theta + \theta_0) ⇔ F(\omega, \varphi + \theta_0)        (4.6-5)

which indicates that rotating f(x, y) by an angle θ₀ rotates F(u, v) by the same angle. Conversely, rotating F(u, v) rotates f(x, y) by the same angle.
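Both properties are easy to check numerically; a sketch in NumPy (np.roll implements the circular translation implied by the DFT's periodicity; the test image is arbitrary):

    import numpy as np

    f = np.random.default_rng(0).random((64, 64))  # arbitrary real image
    g = np.roll(f, shift=(10, 5), axis=(0, 1))     # circular translation

    F, G = np.fft.fft2(f), np.fft.fft2(g)
    # By Eq. (4.6-4), translation leaves the spectrum (magnitude) unchanged:
    assert np.allclose(np.abs(F), np.abs(G))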

4.6.3 Periodicity

As in the 1-D case, the 2-D Fourier transform and its inverse are infinitely periodic in the u and v directions; that is,

    F(u, v) = F(u + k_1 M, v) = F(u, v + k_2 N) = F(u + k_1 M, v + k_2 N)        (4.6-6)

and

    f(x, y) = f(x + k_1 M, y) = f(x, y + k_2 N) = f(x + k_1 M, y + k_2 N)        (4.6-7)

where k₁ and k₂ are integers.


The periodicities of the transform and its inverse are important issues in the implementation of DFT-based algorithms. Consider the 1-D spectrum in Fig. 4.23(a). As explained in Section 4.4.1, the transform data in the interval from 0 to M − 1 consists of two back-to-back half periods meeting at point M/2. For display and filtering purposes, it is more convenient to have in this interval a complete period of the transform in which the data are contiguous, as in Fig. 4.23(b). It follows from Eq. (4.6-3) that

    f(x) e^{j 2\pi u_0 x/M} ⇔ F(u - u_0)

In other words, multiplying f(x) by the exponential term shown shifts the data so that the origin, F(0), is located at u₀. If we let u₀ = M/2, the exponential term becomes e^{j\pi x}, which is equal to (−1)^x because x is an integer. In this case,

    f(x)(-1)^x ⇔ F(u - M/2)

That is, multiplying f(x) by (−1)^x shifts the data so that F(0) is at the center of the interval [0, M − 1], which corresponds to Fig. 4.23(b), as desired.

In 2-D the situation is more difficult to graph, but the principle is the same, as Fig. 4.23(c) shows. Instead of two half periods, there are now four quarter periods meeting at the point (M/2, N/2).
FIGURE 4.23 Centering the Fourier transform. (a) A 1-D DFT showing an infinite number of periods. (b) Shifted DFT obtained by multiplying f(x) by (−1)^x before computing F(u). (c) A 2-D DFT showing an infinite number of periods. The solid area is the M × N data array, F(u, v), obtained with Eq. (4.5-15); this array consists of four quarter periods. (d) A shifted DFT obtained by multiplying f(x, y) by (−1)^{x+y} before computing F(u, v). The data now contains one complete, centered period, as in (b).

The dashed rectangles in Fig. 4.23 correspond to the infinite number of periods of the 2-D DFT. As in the 1-D case, visualization is simplified if we shift the data so that F(0, 0) is at (M/2, N/2). Letting (u₀, v₀) = (M/2, N/2) in Eq. (4.6-3) results in the expression

    f(x, y)(-1)^{x+y} ⇔ F(u - M/2, v - N/2)        (4.6-8)

Using this equation shifts the data so that F(0, 0) is at the center of the frequency rectangle defined by the intervals [0, M − 1] and [0, N − 1], as desired. Figure 4.23(d) shows the result. We illustrate these concepts later in this section as part of Example 4.11 and Fig. 4.24.
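In code, the (−1)^(x+y) multiplication and a direct reordering of quadrants give the same centered transform; a sketch in NumPy (np.fft.fftshift performs the quadrant swap):

    import numpy as np

    f = np.random.default_rng(1).random((8, 8))
    x, y = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")

    F_centered = np.fft.fft2(f * (-1.0) ** (x + y))  # Eq. (4.6-8)
    F_shifted = np.fft.fftshift(np.fft.fft2(f))      # equivalent quadrant swap
    assert np.allclose(F_centered, F_shifted)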

4.6.4 Symmetry Properties

An important result from functional analysis is that any real or complex function, w(x, y), can be expressed as the sum of an even and an odd part (each of which can be real or complex):

    w(x, y) = w_e(x, y) + w_o(x, y)        (4.6-9)

where the even and odd parts are defined as

    w_e(x, y) \triangleq \frac{w(x, y) + w(-x, -y)}{2}        (4.6-10a)

and

    w_o(x, y) \triangleq \frac{w(x, y) - w(-x, -y)}{2}        (4.6-10b)

Substituting Eqs. (4.6-10a) and (4.6-10b) into Eq. (4.6-9) gives the identity w(x, y) ≡ w(x, y), thus proving the validity of the latter equation. It follows from the preceding definitions that

    w_e(x, y) = w_e(-x, -y)        (4.6-11a)

and that

    w_o(x, y) = -w_o(-x, -y)        (4.6-11b)

Even functions are said to be symmetric and odd functions are antisymmetric. Because all indices in the DFT and IDFT are positive, when we talk about symmetry (antisymmetry) we are referring to symmetry (antisymmetry) about the center point of a sequence. In terms of Eq. (4.6-11), indices to the right of the center point of a 1-D array are considered positive, and those to the left are considered negative (similarly in 2-D). In our work, it is more convenient to think only in terms of nonnegative indices, in which case the definitions of evenness and oddness become:

    w_e(x, y) = w_e(M - x, N - y)        (4.6-12a)

and

    w_o(x, y) = -w_o(M - x, N - y)        (4.6-12b)

where, as usual, M and N are the number of rows and columns of a 2-D array.

We know from elementary mathematical analysis that the product of two even or two odd functions is even, and that the product of an even and an odd function is odd. In addition, the only way that a discrete function can be odd is if all its samples sum to zero. (To convince yourself that the samples of an odd function sum to zero, sketch one period of a 1-D sine wave about the origin or any other interval spanning one period.) These properties lead to the important result that

    \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} w_e(x, y)\, w_o(x, y) = 0        (4.6-13)

for any two discrete even and odd functions w_e and w_o. In other words, because the argument of Eq. (4.6-13) is odd, the result of the summations is 0. The functions can be real or complex.

EXAMPLE 4.10: Even and odd functions.

■ Although evenness and oddness are visualized easily for continuous functions, these concepts are not as intuitive when dealing with discrete sequences. The following illustrations will help clarify the preceding ideas. Consider the 1-D sequence

    f = {f(0)  f(1)  f(2)  f(3)} = {2  1  1  1}

in which M = 4. To test for evenness, the condition f(x) = f(4 − x) must be satisfied; that is, we require that

    f(0) = f(4),  f(2) = f(2),  f(1) = f(3),  f(3) = f(1)

Because f(4) is outside the range being examined, and it can be any value, the value of f(0) is immaterial in the test for evenness. We see that the next three conditions are satisfied by the values in the array, so the sequence is even. In fact, we conclude that any 4-point even sequence has to have the form

    {a  b  c  b}

That is, only the second and last points must be equal in a 4-point even sequence.

An odd sequence has the interesting property that its first term, w_o(0, 0), is always 0, a fact that follows directly from Eq. (4.6-10b). Consider the 1-D sequence

    g = {g(0)  g(1)  g(2)  g(3)} = {0  −1  0  1}

We easily can confirm that this is an odd sequence by noting that the terms in the sequence satisfy the condition g(x) = −g(4 − x). For example, g(1) = −g(3). Any 4-point odd sequence has the form

    {0  −b  0  b}

That is, when M is an even number, a 1-D odd sequence has the property that the points at locations 0 and M/2 always are zero. When M is odd, the first term still has to be 0, but the remaining terms form pairs with equal value but opposite sign.

The preceding discussion indicates that evenness and oddness of sequences depend also on the length of the sequences. For example, we already showed that the sequence {0 −1 0 1} is odd. However, the sequence {0 −1 0 1 0} is neither odd nor even, although the "basic" structure appears to be odd. This is an important issue in interpreting DFT results. We show later in this section that the DFTs of even and odd functions have some very important characteristics. Thus, it often is the case that understanding when a function is odd or even plays a key role in our ability to interpret image results based on DFTs.
The same basic considerations hold in 2-D. For example, the 6 × 6 2-D sequence

    0  0   0  0  0  0
    0  0   0  0  0  0
    0  0  −1  0  1  0
    0  0  −2  0  2  0
    0  0  −1  0  1  0
    0  0   0  0  0  0

is odd. (As an exercise, you should use Eq. (4.6-12b) to convince yourself that this 2-D sequence is odd.) However, adding another row and column of 0s would give a result that is neither odd nor even. Note that the inner structure of this array is a Sobel mask, as discussed in Section 3.6.4. We return to this mask in Example 4.15. ■
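The test in Eq. (4.6-12b) translates directly into code; a NumPy sketch applied to the 6 × 6 array above (the modular indexing reflects the wraparound of nonnegative DFT indices):

    import numpy as np

    def is_odd(w):
        """Check w(x, y) = -w((M - x) % M, (N - y) % N) for all (x, y)."""
        M, N = w.shape
        x, y = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
        return np.array_equal(w, -w[(M - x) % M, (N - y) % N])

    w = np.zeros((6, 6), dtype=int)
    w[2:5, 2] = [-1, -2, -1]
    w[2:5, 4] = [1, 2, 1]
    print(is_odd(w))                  # True
    print(is_odd(np.pad(w, (0, 1))))  # False: the padded 7x7 array is not odd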

Armed with the preceding concepts, we can establish a number of important symmetry properties of the DFT and its inverse. A property used frequently is that the Fourier transform of a real function, f(x, y), is conjugate symmetric:

    F^*(u, v) = F(-u, -v)        (4.6-14)

If f(x, y) is imaginary, its Fourier transform is conjugate antisymmetric: F*(−u, −v) = −F(u, v). (Conjugate symmetry also is called hermitian symmetry. The term antihermitian is used sometimes to refer to conjugate antisymmetry.) The proof of Eq. (4.6-14) is as follows:

    F^*(u, v) = \left[ \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) e^{-j 2\pi (ux/M + vy/N)} \right]^*

              = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f^*(x, y) e^{j 2\pi (ux/M + vy/N)}

              = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) e^{-j 2\pi ([-u]x/M + [-v]y/N)}

              = F(-u, -v)

where the third step follows from the fact that f(x, y) is real. A similar approach can be used to prove the conjugate antisymmetry exhibited by the transform of imaginary functions.

Table 4.1 lists symmetries and related properties of the DFT that are useful in digital image processing. Recall that the double arrows indicate Fourier transform pairs; that is, for any row in the table, the properties on the right are satisfied by the Fourier transform of the function having the properties listed on the left, and vice versa. For example, entry 5 reads: The DFT of a real function f(x, y), in which (x, y) is replaced by (−x, −y), is F*(u, v), where F(u, v), the DFT of f(x, y), is a complex function, and vice versa.
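Equation (4.6-14) is easy to verify numerically; a sketch (negative indices are evaluated modulo M and N, consistent with the periodicity of the DFT):

    import numpy as np

    f = np.random.default_rng(2).random((16, 16))  # arbitrary real image
    F = np.fft.fft2(f)

    M, N = F.shape
    u, v = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    assert np.allclose(np.conj(F), F[(-u) % M, (-v) % N])  # F*(u,v) = F(-u,-v)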

TABLE 4.1 Some symmetry properties of the 2-D DFT and its inverse. R(u, v) and I(u, v) are the real and imaginary parts of F(u, v), respectively. The term complex indicates that a function has nonzero real and imaginary parts.

     Spatial Domain†                        Frequency Domain†
  1) f(x, y) real                    ⇔      F*(u, v) = F(−u, −v)
  2) f(x, y) imaginary               ⇔      F*(−u, −v) = −F(u, v)
  3) f(x, y) real                    ⇔      R(u, v) even; I(u, v) odd
  4) f(x, y) imaginary               ⇔      R(u, v) odd; I(u, v) even
  5) f(−x, −y) real                  ⇔      F*(u, v) complex
  6) f(−x, −y) complex               ⇔      F(−u, −v) complex
  7) f*(x, y) complex                ⇔      F*(−u, −v) complex
  8) f(x, y) real and even           ⇔      F(u, v) real and even
  9) f(x, y) real and odd            ⇔      F(u, v) imaginary and odd
 10) f(x, y) imaginary and even      ⇔      F(u, v) imaginary and even
 11) f(x, y) imaginary and odd       ⇔      F(u, v) real and odd
 12) f(x, y) complex and even        ⇔      F(u, v) complex and even
 13) f(x, y) complex and odd         ⇔      F(u, v) complex and odd

† Recall that x, y, u, and v are discrete (integer) variables, with x and u in the range [0, M − 1], and y and v in the range [0, N − 1]. To say that a complex function is even means that its real and imaginary parts are even, and similarly for an odd complex function.

EXAMPLE 4.11: 1-D illustrations of properties from Table 4.1.

■ With reference to the even and odd concepts discussed earlier and illustrated in Example 4.10, the following 1-D sequences and their transforms are short examples of the properties listed in Table 4.1. The numbers in parentheses on the right are the individual elements of F(u), and similarly for f(x) in the last two properties.

    Property   f(x)                                         F(u)
       3       {1 2 3 4}                             ⇔      {(10) (−2 + 2j) (−2) (−2 − 2j)}
       4       j{1 2 3 4}                            ⇔      {(10j) (−2 − 2j) (−2j) (2 − 2j)}
       8       {2 1 1 1}                             ⇔      {(5) (1) (1) (1)}
       9       {0 −1 0 1}                            ⇔      {(0) (2j) (0) (−2j)}
      10       j{2 1 1 1}                            ⇔      {(5j) (j) (j) (j)}
      11       j{0 −1 0 1}                           ⇔      {(0) (−2) (0) (2)}
      12       {(4 + 4j) (3 + 2j) (0 + 2j) (3 + 2j)} ⇔      {(10 + 10j) (4 + 2j) (−2 + 2j) (4 + 2j)}
      13       {(0 + 0j) (1 + 1j) (0 + 0j) (−1 − j)} ⇔      {(0 + 0j) (2 − 2j) (0 + 0j) (−2 + 2j)}

For example, in property 3 we see that a real function with elements {1 2 3 4} has a Fourier transform whose real part, {10 −2 −2 −2}, is even and whose imaginary part, {0 2 0 −2}, is odd. Property 8 tells us that a real even function has a transform that is real and even also. Property 12 shows that an even complex function has a transform that is also complex and even. The other property examples are analyzed in a similar manner. ■

EXAMPLE 4.12: Proving several symmetry properties of the DFT from Table 4.1.

■ In this example, we prove several of the properties in Table 4.1 to develop familiarity with manipulating these important properties, and to establish a basis for solving some of the problems at the end of the chapter. We prove only the properties on the right given the properties on the left. The converse is proved in a manner similar to the proofs we give here.

Consider property 3, which reads: If f(x, y) is a real function, the real part of its DFT is even and the imaginary part is odd; similarly, if a DFT has real and imaginary parts that are even and odd, respectively, then its IDFT is a real function. We prove this property formally as follows. F(u, v) is complex in general, so it can be expressed as the sum of a real and an imaginary part: F(u, v) = R(u, v) + jI(u, v). Then, F*(u, v) = R(u, v) − jI(u, v). Also, F(−u, −v) = R(−u, −v) + jI(−u, −v). But, as proved earlier, if f(x, y) is real then F*(u, v) = F(−u, −v), which, based on the preceding two equations, means that R(u, v) = R(−u, −v) and I(u, v) = −I(−u, −v). In view of Eqs. (4.6-11a) and (4.6-11b), this proves that R is an even function and I is an odd function.

Next, we prove property 8. If f(x, y) is real, we know from property 3 that the real part of F(u, v) is even, so to prove property 8 all we have to do is show that if f(x, y) is real and even then the imaginary part of F(u, v) is 0 (i.e., F is real). The steps are as follows:

    F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) e^{-j 2\pi (ux/M + vy/N)}

which we can write as



    F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [f_r(x, y)] e^{-j 2\pi (ux/M + vy/N)}

            = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [f_r(x, y)] e^{-j 2\pi ux/M} e^{-j 2\pi vy/N}

            = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even}][\text{even} - j\,\text{odd}][\text{even} - j\,\text{odd}]

            = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even}][\text{even}\cdot\text{even} - 2j\,\text{even}\cdot\text{odd} - \text{odd}\cdot\text{odd}]

            = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even}\cdot\text{even}] - 2j \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even}\cdot\text{odd}] - \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even}\cdot\text{even}]

            = \text{real}

The fourth step follows from Euler's equation and the fact that the cos and sin are even and odd functions, respectively. We also know that, in addition to being real, f is an even function (the premise of property 8). The only term in the penultimate line containing imaginary components is the second term, which is 0 according to Eq. (4.6-13). Thus, if f is real and even then F is real. As noted earlier, F is also even because f is real. This concludes the proof.
Finally, we prove the validity of property 6. From the definition of the DFT,

    \Im\{f(-x, -y)\} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(-x, -y) e^{-j 2\pi (ux/M + vy/N)}

(Note that we are not making a change of variable here. We are evaluating the DFT of f(−x, −y), so we simply insert this function into the equation, as we would any other function.) Because of periodicity, f(−x, −y) = f(M − x, N − y). If we now define m = M − x and n = N − y, then

    \Im\{f(-x, -y)\} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n) e^{-j 2\pi (u[M-m]/M + v[N-n]/N)}

(To convince yourself that the summations are correct, try a 1-D transform and expand a few terms by hand.) Because exp[−j2π(integer)] = 1, it follows that

    \Im\{f(-x, -y)\} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n) e^{j 2\pi (um/M + vn/N)}

                     = F(-u, -v)

This concludes the proof. ■

4.6.5 Fourier Spectrum and Phase Angle

Because the 2-D DFT is complex in general, it can be expressed in polar form:

    F(u, v) = |F(u, v)| e^{j\phi(u, v)}        (4.6-15)

where the magnitude

    |F(u, v)| = [R^2(u, v) + I^2(u, v)]^{1/2}        (4.6-16)

is called the Fourier (or frequency) spectrum, and

    \phi(u, v) = \arctan\left[\frac{I(u, v)}{R(u, v)}\right]        (4.6-17)

is the phase angle. Recall from the discussion in Section 4.2.1 that the arctan must be computed using a four-quadrant arctangent, such as MATLAB's atan2(Imag, Real) function.

Finally, the power spectrum is defined as

    P(u, v) = |F(u, v)|^2 = R^2(u, v) + I^2(u, v)        (4.6-18)

As before, R and I are the real and imaginary parts of F(u, v), and all computations are carried out for the discrete variables u = 0, 1, 2, ..., M − 1 and v = 0, 1, 2, ..., N − 1. Therefore, |F(u, v)|, φ(u, v), and P(u, v) are arrays of size M × N.
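In NumPy these quantities follow directly from the real and imaginary parts of the 2-D DFT; a sketch (np.arctan2 is the four-quadrant arctangent; the log(1 + |F|) scaling anticipates the display technique used in Example 4.13 below):

    import numpy as np

    f = np.random.default_rng(3).random((64, 64))  # arbitrary image
    F = np.fft.fft2(f)

    spectrum = np.abs(F)                           # Eq. (4.6-16)
    phase = np.arctan2(F.imag, F.real)             # Eq. (4.6-17)
    power = spectrum ** 2                          # Eq. (4.6-18)
    display = np.log1p(np.fft.fftshift(spectrum))  # centered, log-scaled view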
The Fourier transform of a real function is conjugate symmetric [Eq. (4.6-14)], which implies that the spectrum has even symmetry about the origin:

    |F(u, v)| = |F(-u, -v)|        (4.6-19)

The phase angle exhibits the following odd symmetry about the origin:

    \phi(u, v) = -\phi(-u, -v)        (4.6-20)

It follows from Eq. (4.5-15) that

    F(0, 0) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)

FIGURE 4.24 (a) Image. (b) Spectrum showing bright spots in the four corners. (c) Centered spectrum. (d) Result showing increased detail after a log transformation. The zero crossings of the spectrum are closer in the vertical direction because the rectangle in (a) is longer in that direction. The coordinate convention used throughout the book places the origin of the spatial and frequency domains at the top left.

which indicates that the zero-frequency term is proportional to the average value of f(x, y). That is,

    F(0, 0) = MN \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \right] = MN\,\bar{f}(x, y)        (4.6-21)

where f̄ denotes the average value of f. Then,

    |F(0, 0)| = MN\,|\bar{f}(x, y)|        (4.6-22)

Because the proportionality constant MN usually is large, |F(0, 0)| typically is the largest component of the spectrum by a factor that can be several orders of magnitude larger than other terms. Because frequency components u and v are zero at the origin, F(0, 0) sometimes is called the dc component of the transform. This terminology is from electrical engineering, where "dc" signifies direct current (i.e., current of zero frequency).
EXAMPLE 4.13: The 2-D Fourier spectrum of a simple function.

■ Figure 4.24(a) shows a simple image and Fig. 4.24(b) shows its spectrum, whose values were scaled to the range [0, 255] and displayed in image form. The origins of both the spatial and frequency domains are at the top left. Two things are apparent in Fig. 4.24(b). As expected, the area around the origin of the

transform contains the highest values (and thus appears brighter in the image). However, note that the four corners of the spectrum contain similarly high values. The reason is the periodicity property discussed in the previous section. To center the spectrum, we simply multiply the image in (a) by (−1)^{x+y} before computing the DFT, as indicated in Eq. (4.6-8). Figure 4.24(c) shows the result, which clearly is much easier to visualize (note the symmetry about the center point). Because the dc term dominates the values of the spectrum, the dynamic range of other intensities in the displayed image is compressed. To bring out those details, we perform a log transformation, as described in Section 3.2.2. Figure 4.24(d) shows the display of log(1 + |F(u, v)|). The increased rendition of detail is evident. Most spectra shown in this and subsequent chapters are scaled in this manner.
It follows from Eqs. (4.6-4) and (4.6-5) that the spectrum is insensitive to image translation (the absolute value of the exponential term is 1), but that it rotates by the same angle as a rotated image. Figure 4.25 illustrates these properties. The spectrum in Fig. 4.25(b) is identical to the spectrum in Fig. 4.24(d). Clearly, the images in Figs. 4.24(a) and 4.25(a) are different, so if their Fourier spectra are the same then, based on Eq. (4.6-15), their phase angles must be different. Figure 4.26 confirms this. Figures 4.26(a) and (b) are the phase angle arrays (shown as images) of the DFTs of Figs. 4.24(a) and 4.25(a). Note the lack of similarity between the phase images, in spite of the fact that the only difference between their corresponding images is simple translation. In general, visual analysis of phase angle images yields little intuitive information. For instance, due to its 45° orientation, one would expect intuitively that the phase angle in

FIGURE 4.25 (a) The rectangle in Fig. 4.24(a) translated, and (b) the corresponding spectrum. (c) Rotated rectangle, and (d) the corresponding spectrum. The spectrum corresponding to the translated rectangle is identical to the spectrum corresponding to the original image in Fig. 4.24(a).

FIGURE 4.26 Phase angle array corresponding (a) to the image of the centered rectangle in Fig. 4.24(a), (b) to the translated image in Fig. 4.25(a), and (c) to the rotated image in Fig. 4.25(c).

Fig. 4.26(a) should correspond to the rotated image in Fig. 4.25(c), rather than to
the image in Fig. 4.24(a). In fact, as Fig. 4.26(c) shows, the phase angle of the ro-
tated image has a strong orientation that is much less than 45°. ■

The components of the spectrum of the DFT determine the amplitudes of the sinusoids that combine to form the resulting image. At any given frequency in the DFT of an image, a large amplitude implies a greater prominence of a sinusoid of that frequency in the image. Conversely, a small amplitude implies that less of that sinusoid is present in the image. Although, as Fig. 4.26 shows, the contribution of the phase components is less intuitive, it is just as important. The phase is a measure of displacement of the various sinusoids with respect to their origin. Thus, while the magnitude of the 2-D DFT is an array whose components determine the intensities in the image, the corresponding phase is an array of angles that carry much of the information about where discernible objects are located in the image. The following example clarifies these concepts further.

EXAMPLE 4.14: Further illustration of the properties of the Fourier spectrum and phase angle.

■ Figure 4.27(b) is the phase angle of the DFT of Fig. 4.27(a). There is no detail in this array that would lead us by visual analysis to associate it with features in its corresponding image (not even the symmetry of the phase angle is visible). However, the importance of the phase in determining shape characteristics is evident in Fig. 4.27(c), which was obtained by computing the inverse DFT of Eq. (4.6-15) using only phase information (i.e., with |F(u, v)| = 1 in the equation). Although the intensity information has been lost (remember, that information is carried by the spectrum), the key shape features in this image are unmistakably from Fig. 4.27(a).

Figure 4.27(d) was obtained using only the spectrum in Eq. (4.6-15) and computing the inverse DFT. This means setting the exponential term to 1, which in turn implies setting the phase angle to 0. The result is not unexpected. It contains only intensity information, with the dc term being the most dominant. There is no shape information in the image because the phase was set to zero.

FIGURE 4.27 (a) Woman. (b) Phase angle. (c) Woman reconstructed using only the phase angle. (d) Woman reconstructed using only the spectrum. (e) Reconstruction using the phase angle corresponding to the woman and the spectrum corresponding to the rectangle in Fig. 4.24(a). (f) Reconstruction using the phase of the rectangle and the spectrum of the woman.

Finally, Figs. 4.27(e) and (f) show yet again the dominance of the phase in de-
termining the feature content of an image. Figure 4.27(e) was obtained by com-
puting the IDFT of Eq. (4.6-15) using the spectrum of the rectangle in Fig. 4.24(a)
and the phase angle corresponding to the woman. The shape of the woman
clearly dominates this result. Conversely, the rectangle dominates Fig. 4.27(f),
which was computed using the spectrum of the woman and the phase angle of
the rectangle. ■
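A sketch of the phase-only and magnitude-only reconstructions used in this example (image loading is omitted; f is assumed to be a 2-D floating-point array):

    import numpy as np

    def phase_only(f):
        """Reconstruction with |F(u, v)| = 1 in Eq. (4.6-15)."""
        F = np.fft.fft2(f)
        return np.real(np.fft.ifft2(np.exp(1j * np.angle(F))))

    def magnitude_only(f):
        """Reconstruction with the phase angle set to 0 in Eq. (4.6-15)."""
        F = np.fft.fft2(f)
        return np.real(np.fft.ifft2(np.abs(F)))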

4.6.6 The 2-D Convolution Theorem

Extending Eq. (4.4-10) to two variables results in the following expression for 2-D circular convolution:

    f(x, y) ★ h(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, h(x - m, y - n)        (4.6-23)

for x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1. As in Eq. (4.4-10), Eq. (4.6-23) gives one period of a 2-D periodic sequence. The 2-D convolution theorem is given by the expressions

    f(x, y) ★ h(x, y) ⇔ F(u, v) H(u, v)        (4.6-24)

and, conversely,

    f(x, y) h(x, y) ⇔ F(u, v) ★ H(u, v)        (4.6-25)

where F and H are obtained using Eq. (4.5-15) and, as before, the double arrow is used to indicate that the left and right sides of the expressions constitute a Fourier transform pair. Our interest in the remainder of this chapter is in Eq. (4.6-24), which states that the inverse DFT of the product F(u, v)H(u, v) yields f(x, y) ★ h(x, y), the 2-D spatial convolution of f and h. Similarly, the DFT of the spatial convolution yields the product of the transforms in the frequency domain. Equation (4.6-24) is the foundation of linear filtering and, as explained in Section 4.7, is the basis for all the filtering techniques discussed in this chapter.

Because we are dealing here with discrete quantities, computation of the Fourier transforms is carried out with a DFT algorithm. (We discuss efficient ways to compute the DFT in Section 4.11.) If we elect to compute the spatial convolution using the IDFT of the product of the two transforms, then the periodicity issues discussed in Section 4.6.3 must be taken into account. We give a 1-D example of this and then extend the conclusions to two variables. The left column of Fig. 4.28 implements convolution of two functions, f and h, using the 1-D equivalent of Eq. (3.4-2) which, because the two functions are of the same size, is written as

    f(x) ★ h(x) = \sum_{m=0}^{399} f(m)\, h(x - m)

This equation is identical to Eq. (4.4-10), but the requirement on the displace-
ment x is that it be sufficiently large to cause the flipped (rotated) version of h
to slide completely past f. In other words, the procedure consists of (1) mirror-
ing h about the origin (i.e., rotating it by 180°) [Fig. 4.28(c)], (2) translating the
mirrored function by an amount x [Fig. 4.28(d)], and (3) for each value x of
translation, computing the entire sum of products in the right side of the pre-
ceding equation. In terms of Fig. 4.28 this means multiplying the function in
Fig. 4.28(a) by the function in Fig. 4.28(d) for each value of x. The displacement
x ranges over all values required to completely slide h across f. Figure 4.28(e)
shows the convolution of these two functions. Note that convolution is a func-
tion of the displacement variable, x, and that the range of x required in this ex-
ample to completely slide h past f is from 0 to 799.
If we use the DFT and the convolution theorem to obtain the same result as
in the left column of Fig. 4.28, we must take into account the periodicity inher-
ent in the expression for the DFT. This is equivalent to convolving the two pe-
riodic functions in Figs. 4.28(f) and (g). The convolution procedure is the same
as we just discussed, but the two functions now are periodic. Proceeding with
these two functions as in the previous paragraph would yield the result in
Fig. 4.28(j) which obviously is incorrect. Because we are convolving two peri-
odic functions, the convolution itself is periodic. The closeness of the periods in
Fig. 4.28 is such that they interfere with each other to cause what is commonly
referred to as wraparound error. According to the convolution theorem, if we
had computed the DFT of the two 400-point functions, f and h, multiplied the

FIGURE 4.28 Left column: convolution of two discrete functions obtained using the approach discussed in Section 3.4.2; the panels show f(m), h(m), the mirrored h(−m), the translated h(x − m), and the convolution result. The result in (e) is correct. Right column: convolution of the same functions, but taking into account the periodicity implied by the DFT. Note in (j) how data from adjacent periods produce wraparound error, yielding an incorrect convolution result. To obtain the correct result, function padding must be used.

two transforms, and then computed the inverse DFT, we would have obtained the erroneous 400-point segment of the convolution shown in Fig. 4.28(j).

Fortunately, the solution to the wraparound error problem is simple. Consider two functions, f(x) and h(x), composed of A and B samples, respectively. It can be shown (Brigham [1988]) that if we append zeros to both functions so that they have the same length, denoted by P, then wraparound is avoided by choosing

    P \geq A + B - 1        (4.6-26)

In our example, each function has 400 points, so the minimum value we could use is P = 799, which implies that we would append 399 zeros to the trailing edge of each function. This process is called zero padding. (The zeros could be appended also to the beginning of the functions, or they could be divided between the beginning and end of the functions. It is simpler to append them at the end.) As an exercise, you

should convince yourself that if the periods of the functions in Figs. 4.28(f) and
(g) were lengthened by appending to each period at least 399 zeros, the result
would be a periodic convolution in which each period is identical to the correct
result in Fig. 4.28(e). Using the DFT via the convolution theorem would result
in a 799-point spatial function identical to Fig. 4.28(e). The conclusion, then, is
that to obtain the same convolution result between the “straight” representa-
tion of the convolution equation approach in Chapter 3, and the DFT ap-
proach, functions in the latter must be padded prior to computing their
transforms.
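The wraparound effect and its padding fix are easy to reproduce numerically; a 1-D sketch with short functions (lengths chosen small for clarity):

    import numpy as np

    f = np.ones(4)                  # A = 4 samples
    h = np.array([1.0, 2.0, 3.0])   # B = 3 samples

    # Circular convolution via the DFT, no padding: wraparound error.
    Hc = np.fft.fft(np.append(h, 0))  # h padded only to the length of f
    circular = np.real(np.fft.ifft(np.fft.fft(f) * Hc))  # [6, 6, 6, 6]

    # Zero padding to P >= A + B - 1 = 6 recovers the correct linear result.
    P = 6
    padded = np.real(np.fft.ifft(np.fft.fft(f, n=P) * np.fft.fft(h, n=P)))
    assert np.allclose(padded, np.convolve(f, h))  # [1, 3, 6, 6, 5, 3]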
Visualizing a similar example in 2-D would be more difficult, but we would arrive at the same conclusion regarding wraparound error and the need for appending zeros to the functions. Let f(x, y) and h(x, y) be two image arrays of sizes A × B and C × D pixels, respectively. Wraparound error in their circular convolution can be avoided by padding these functions with zeros, as follows:

    f_p(x, y) = \begin{cases} f(x, y) & 0 \le x \le A-1 \text{ and } 0 \le y \le B-1 \\ 0 & A \le x \le P \text{ or } B \le y \le Q \end{cases}        (4.6-27)

and

    h_p(x, y) = \begin{cases} h(x, y) & 0 \le x \le C-1 \text{ and } 0 \le y \le D-1 \\ 0 & C \le x \le P \text{ or } D \le y \le Q \end{cases}        (4.6-28)

with

    P \geq A + C - 1        (4.6-29)

and

    Q \geq B + D - 1        (4.6-30)

The resulting padded images are of size P × Q. If both arrays are of the same size, M × N, then we require that

    P \geq 2M - 1        (4.6-31)

and

    Q \geq 2N - 1        (4.6-32)

We give an example in Section 4.7.2 showing the effects of wraparound error on images. As a rule, DFT algorithms tend to execute faster with arrays of even size, so it is good practice to select P and Q as the smallest even integers that satisfy the preceding equations. If the two arrays are of the same size, this means that P and Q are selected as twice the array size.
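A minimal 2-D sketch combining Eqs. (4.6-31) and (4.6-32) with the convolution theorem (it assumes both arrays are of the same size M × N, so P and Q are taken as twice the array size; np.fft.fft2 zero-pads when given the s argument):

    import numpy as np

    def dft_convolve(f, h):
        """2-D linear convolution of same-size arrays via the padded DFT."""
        M, N = f.shape
        P, Q = 2 * M, 2 * N                 # even sizes, Eqs. (4.6-31)-(4.6-32)
        F = np.fft.fft2(f, s=(P, Q))        # zero-padded transforms
        H = np.fft.fft2(h, s=(P, Q))
        g = np.real(np.fft.ifft2(F * H))
        return g[: 2 * M - 1, : 2 * N - 1]  # valid extent of the linear result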
The two functions in Figs. 4.28(a) and (b) conveniently become zero before the end of the sampling interval. If one or both of the functions were not zero at the end of the interval, then a discontinuity would be created when zeros were appended to the function to eliminate wraparound error. This is analogous to multiplying a function by a box, which in the frequency domain would imply convolution of the original transform with a sinc function (see Example 4.1). This, in turn, would create so-called frequency leakage, caused by the high-frequency components of the sinc function. Leakage produces a blocky effect on images. Although leakage never can be totally eliminated, it can be reduced significantly by multiplying the sampled function by another function that tapers smoothly to near zero at both ends of the sampled record to dampen the sharp transitions (and thus the high-frequency components) of the box. This approach, called windowing or apodizing, is an important consideration when fidelity in image reconstruction (as in high-definition graphics) is desired. (A simple apodizing function is a triangle, centered on the data record, which tapers to 0 at both ends of the record. This is called the Bartlett window. Other common windows are the Hamming and the Hann windows. We can even use a Gaussian function. We return to the issue of windowing in Section 5.11.5.) If you are faced with the need for windowing, a good approach is to use a 2-D Gaussian function (see Section 4.8.3). One advantage of this function is that its Fourier transform is Gaussian also, thus producing low leakage.
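A sketch of tapering an image before computing its DFT (np.hanning gives a 1-D Hann window; forming the 2-D window as an outer product is one common construction, used here for illustration):

    import numpy as np

    f = np.random.default_rng(4).random((128, 128))  # arbitrary image block
    w = np.outer(np.hanning(128), np.hanning(128))   # 2-D Hann taper
    F_windowed = np.fft.fft2(f * w)   # reduced leakage compared with fft2(f)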
4.6.7 Summary of 2-D Discrete Fourier Transform Properties

Table 4.2 summarizes the principal DFT definitions introduced in this chapter. Separability is discussed in Section 4.11.1 and obtaining the inverse using a forward transform algorithm is discussed in Section 4.11.2. Correlation is discussed in Chapter 12.

TABLE 4.2 Summary of DFT definitions and corresponding expressions.

1) Discrete Fourier transform (DFT) of f(x, y):
       F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) e^{-j 2\pi (ux/M + vy/N)}

2) Inverse discrete Fourier transform (IDFT) of F(u, v):
       f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v) e^{j 2\pi (ux/M + vy/N)}

3) Polar representation:
       F(u, v) = |F(u, v)| e^{j\phi(u, v)}

4) Spectrum:
       |F(u, v)| = [R^2(u, v) + I^2(u, v)]^{1/2},   R = Real(F), I = Imag(F)

5) Phase angle:
       \phi(u, v) = \tan^{-1}\left[\frac{I(u, v)}{R(u, v)}\right]

6) Power spectrum:
       P(u, v) = |F(u, v)|^2

7) Average value:
       \bar{f}(x, y) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) = \frac{1}{MN} F(0, 0)

8) Periodicity (k₁ and k₂ are integers):
       F(u, v) = F(u + k_1 M, v) = F(u, v + k_2 N) = F(u + k_1 M, v + k_2 N)
       f(x, y) = f(x + k_1 M, y) = f(x, y + k_2 N) = f(x + k_1 M, y + k_2 N)

9) Convolution:
       f(x, y) ★ h(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, h(x - m, y - n)

10) Correlation:
       f(x, y) ☆ h(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f^*(m, n)\, h(x + m, y + n)

11) Separability: The 2-D DFT can be computed by computing 1-D DFT transforms along the rows (columns) of the image, followed by 1-D transforms along the columns (rows) of the result. See Section 4.11.1.

12) Obtaining the inverse Fourier transform using a forward transform algorithm:
       MN f^*(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F^*(u, v) e^{-j 2\pi (ux/M + vy/N)}
    This equation indicates that inputting F*(u, v) into an algorithm that computes the forward transform (right side of the above equation) yields MN f*(x, y). Taking the complex conjugate and dividing by MN gives the desired inverse. See Section 4.11.2.

Table 4.3 summarizes some important DFT pairs. Although our focus is on discrete functions, the last two entries in the table are Fourier transform pairs that can be derived only for continuous variables (note the use of continuous variable notation). We include them here because, with proper interpretation, they are quite useful in digital image processing. The differentiation pair can be used to derive the frequency-domain equivalent of the Laplacian defined in Eq. (3.6-3) (Problem 4.26). The Gaussian pair is discussed in Section 4.7.4.

TABLE 4.3 Summary of DFT pairs. The closed-form expressions in 12 and 13 are valid only for continuous variables. They can be used with discrete variables by sampling the closed-form, continuous expressions.

1) Symmetry properties: See Table 4.1

2) Linearity:
       a f_1(x, y) + b f_2(x, y) ⇔ a F_1(u, v) + b F_2(u, v)

3) Translation (general):
       f(x, y) e^{j 2\pi (u_0 x/M + v_0 y/N)} ⇔ F(u - u_0, v - v_0)
       f(x - x_0, y - y_0) ⇔ F(u, v) e^{-j 2\pi (u x_0/M + v y_0/N)}

4) Translation to center of the frequency rectangle, (M/2, N/2):
       f(x, y)(-1)^{x+y} ⇔ F(u - M/2, v - N/2)
       f(x - M/2, y - N/2) ⇔ F(u, v)(-1)^{u+v}

5) Rotation:
       f(r, \theta + \theta_0) ⇔ F(\omega, \varphi + \theta_0)
       x = r\cos\theta,  y = r\sin\theta,  u = \omega\cos\varphi,  v = \omega\sin\varphi

6) Convolution theorem†:
       f(x, y) ★ h(x, y) ⇔ F(u, v) H(u, v)
       f(x, y) h(x, y) ⇔ F(u, v) ★ H(u, v)

7) Correlation theorem†:
       f(x, y) ☆ h(x, y) ⇔ F^*(u, v) H(u, v)
       f^*(x, y) h(x, y) ⇔ F(u, v) ☆ H(u, v)

8) Discrete unit impulse:
       \delta(x, y) ⇔ 1

9) Rectangle:
       \text{rect}[a, b] ⇔ ab \frac{\sin(\pi u a)}{\pi u a} \frac{\sin(\pi v b)}{\pi v b} e^{-j\pi(ua + vb)}

10) Sine:
       \sin(2\pi u_0 x + 2\pi v_0 y) ⇔ j\frac{1}{2}\left[\delta(u + M u_0, v + N v_0) - \delta(u - M u_0, v - N v_0)\right]

11) Cosine:
       \cos(2\pi u_0 x + 2\pi v_0 y) ⇔ \frac{1}{2}\left[\delta(u + M u_0, v + N v_0) + \delta(u - M u_0, v - N v_0)\right]

The following Fourier transform pairs are derivable only for continuous variables, denoted as before by t and z for spatial variables and by μ and ν for frequency variables. These results can be used for DFT work by sampling the continuous forms.

12) Differentiation (the expressions on the right assume that f(±∞, ±∞) = 0):
       \left(\frac{\partial}{\partial t}\right)^m \left(\frac{\partial}{\partial z}\right)^n f(t, z) ⇔ (j 2\pi\mu)^m (j 2\pi\nu)^n F(\mu, \nu)
       \frac{\partial^m f(t, z)}{\partial t^m} ⇔ (j 2\pi\mu)^m F(\mu, \nu);   \frac{\partial^n f(t, z)}{\partial z^n} ⇔ (j 2\pi\nu)^n F(\mu, \nu)

13) Gaussian (A is a constant):
       A 2\pi\sigma^2 e^{-2\pi^2\sigma^2(t^2 + z^2)} ⇔ A e^{-(\mu^2 + \nu^2)/2\sigma^2}

† Assumes that the functions have been extended by zero padding. Convolution and correlation are associative, commutative, and distributive.
Tables 4.1 through 4.3 provide a summary of properties useful when working
with the DFT. Many of these properties are key elements in the development of
the material in the rest of this chapter, and some are used in subsequent chapters.

4.7 The Basics of Filtering in the Frequency Domain


In this section, we lay the groundwork for all the filtering techniques discussed
in the remainder of the chapter.

4.7.1 Additional Characteristics of the Frequency Domain


We begin by observing in Eq. (4.5-15) that each term of F(u, v) contains all val-
ues of f(x, y), modified by the values of the exponential terms. Thus, with the
exception of trivial cases, it usually is impossible to make direct associations be-
tween specific components of an image and its transform. However, some gen-
eral statements can be made about the relationship between the frequency

components of the Fourier transform and spatial features of an image. For instance, because frequency is directly related to spatial rates of change, it is not
difficult intuitively to associate frequencies in the Fourier transform with pat-
terns of intensity variations in an image. We showed in Section 4.6.5 that the
slowest varying frequency component (u = v = 0) is proportional to the aver-
age intensity of an image. As we move away from the origin of the transform,
the low frequencies correspond to the slowly varying intensity components of
an image. In an image of a room, for example, these might correspond to
smooth intensity variations on the walls and floor. As we move further away
from the origin, the higher frequencies begin to correspond to faster and faster
intensity changes in the image. These are the edges of objects and other compo-
nents of an image characterized by abrupt changes in intensity.
Filtering techniques in the frequency domain are based on modifying the
Fourier transform to achieve a specific objective and then computing the in-
verse DFT to get us back to the image domain, as introduced in Section
2.6.7. It follows from Eq. (4.6-15) that the two components of the transform
to which we have access are the transform magnitude (spectrum) and the
phase angle. Section 4.6.5 covered the basic properties of these two compo-
nents of the transform. We learned there that visual analysis of the phase
component generally is not very useful. The spectrum, however, provides
some useful guidelines as to gross characteristics of the image from which
the spectrum was generated. For example, consider Fig. 4.29(a), which is a
scanning electron microscope image of an integrated circuit, magnified ap-
proximately 2500 times. Aside from the interesting construction of the de-
vice itself, we note two principal features: strong edges that run approximately at ±45° and two white, oxide protrusions resulting from thermally-induced failure. The Fourier spectrum in Fig. 4.29(b) shows prominent components along the ±45° directions that correspond to the edges just mentioned. Looking carefully along the vertical axis, we see a vertical component

FIGURE 4.29 (a) SEM image of a damaged integrated circuit. (b) Fourier spectrum of (a). (Original image courtesy of Dr. J. M. Hudak, Brockhouse Institute for Materials Research, McMaster University, Hamilton, Ontario, Canada.)

that is off-axis slightly to the left. This component was caused by the edges of
the oxide protrusions. Note how the angle of the frequency component with
respect to the vertical axis corresponds to the inclination (with respect to the
horizontal axis) of the long white element, and note also the zeros in the ver-
tical frequency component, corresponding to the narrow vertical span of the
oxide protrusions.
These are typical of the types of associations that can be made in general
between the frequency and spatial domains. As we show later in this chapter,
even these types of gross associations, coupled with the relationships men-
tioned previously between frequency content and rate of change of intensity
levels in an image, can lead to some very useful results. In the next section,
we show the effects of modifying various frequency ranges in the transform
of Fig. 4.29(a).

4.7.2 Frequency Domain Filtering Fundamentals

Filtering in the frequency domain consists of modifying the Fourier transform of an image and then computing the inverse transform to obtain the processed result. Thus, given a digital image, f(x, y), of size M × N, the basic filtering equation in which we are interested has the form:

    g(x, y) = \Im^{-1}[H(u, v) F(u, v)]        (4.7-1)

where ℑ⁻¹ is the IDFT, F(u, v) is the DFT of the input image, f(x, y), H(u, v) is a filter function (also called simply the filter, or the filter transfer function), and g(x, y) is the filtered (output) image. Functions F, H, and g are arrays of size M × N, the same as the input image. The product H(u, v)F(u, v) is formed using array multiplication, as defined in Section 2.6.1. The filter function modifies the transform of the input image to yield a processed output, g(x, y). (If H is real and symmetric and f is real, as is typically the case, then the IDFT in Eq. (4.7-1) should yield real quantities in theory. In practice, the inverse generally contains parasitic complex terms from round-off and other computational inaccuracies. Thus, it is customary to take the real part of the IDFT to form g.) Specification of H(u, v) is simplified considerably by using functions that are symmetric about their center, which requires that F(u, v) be centered also. As explained in Section 4.6.3, this is accomplished by multiplying the input image by (−1)^{x+y} prior to computing its transform.†
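Putting the pieces together, the following is a sketch of Eq. (4.7-1) with the padding and centering steps discussed earlier (the Gaussian lowpass H and the cutoff parameter D0 are illustrative choices, not prescribed by the equation itself):

    import numpy as np

    def filter_frequency_domain(f, D0=30.0):
        """Lowpass filtering of image f via Eq. (4.7-1), with padding/centering."""
        M, N = f.shape
        P, Q = 2 * M, 2 * N                       # padded size, Eqs. (4.6-31)-(4.6-32)
        x, y = np.meshgrid(np.arange(P), np.arange(Q), indexing="ij")
        fp = np.zeros((P, Q))
        fp[:M, :N] = f                            # zero padding
        F = np.fft.fft2(fp * (-1.0) ** (x + y))   # centered DFT
        D2 = (x - P / 2) ** 2 + (y - Q / 2) ** 2  # squared distance from center
        H = np.exp(-D2 / (2.0 * D0 ** 2))         # Gaussian lowpass filter
        g = np.real(np.fft.ifft2(H * F)) * (-1.0) ** (x + y)  # undo centering
        return g[:M, :N]                          # crop to original size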
We are now in a position to consider the filtering process in some detail. One
of the simplest filters we can construct is a filter H(u, v) that is 0 at the center of
the transform and 1 elsewhere. This filter would reject the dc term and “pass”
(i.e., leave unchanged) all other terms of F(u, v) when we form the product
H(u, v)F(u, v). We know from Eq. (4.6-21) that the dc term is responsible for the
average intensity of an image, so setting it to zero will reduce the average intensi-
ty of the output image to zero. Figure 4.30 shows the result of this operation using
Eq. (4.7-1). As expected, the image became much darker. (An average of zero


† Many software implementations of the 2-D DFT (e.g., MATLAB) do not center the transform. This implies that filter functions must be arranged to correspond to the same data format as the uncentered transform (i.e., with the origin at the top left). The net result is that filters are more difficult to generate and display. We use centering in our discussions to aid in visualization, which is crucial in developing a clear understanding of filtering concepts. Either method can be used in practice, as long as consistency is maintained.

FIGURE 4.30 Result of filtering the image in Fig. 4.29(a) by setting to 0 the term F(M/2, N/2) in the Fourier transform.

implies the existence of negative intensities. Therefore, although it illustrates the principle, Fig. 4.30 is not a true representation of the original, as all negative intensities were clipped (set to 0) for display purposes.)
As noted earlier, low frequencies in the transform are related to slowly
varying intensity components in an image, such as the walls of a room or a
cloudless sky in an outdoor scene. On the other hand, high frequencies are
caused by sharp transitions in intensity, such as edges and noise. Therefore, we
would expect that a filter H(u, v) that attenuates high frequencies while passing
low frequencies (appropriately called a lowpass filter) would blur an image,
while a filter with the opposite property (called a highpass filter) would en-
hance sharp detail, but cause a reduction in contrast in the image. Figure 4.31 il-
lustrates these effects. Note the similarity between Figs. 4.31(e) and Fig. 4.30.
The reason is that the highpass filter shown eliminates the dc term, resulting in
the same basic effect that led to Fig. 4.30. Adding a small constant to the filter
does not affect sharpening appreciably, but it does prevent elimination of the
dc term and thus preserves tonality, as Fig. 4.31(f) shows.
Equation (4.7-1) involves the product of two functions in the frequency do-
main which, by the convolution theorem, implies convolution in the spatial do-
main. We know from the discussion in Section 4.6.6 that if the functions in
question are not padded we can expect wraparound error. Consider what hap-
pens when we apply Eq. (4.7-1) without padding. Figure 4.32(a) shows a sim-
ple image, and Fig. 4.32(b) is the result of lowpass filtering the image with a
Gaussian lowpass filter of the form shown in Fig. 4.31(a). As expected, the
image is blurred. However, the blurring is not uniform; the top white edge is
blurred, but the side white edges are not. Padding the input image according to
Eqs. (4.6-31) and (4.6-32) before applying Eq. (4.7-1) results in the filtered
image in Fig. 4.32(c). This result is as expected.
Figure 4.33 illustrates the reason for the discrepancy between Figs. 4.32(b)
and (c). The dashed areas in Fig. 4.33 correspond to the image in Fig. 4.32(a).
Figure 4.33(a) shows the periodicity implicit in the use of the DFT, as ex-
plained in Section 4.6.3. Imagine convolving the spatial representation of the
blurring filter with this image. When the filter is passing through the top of the
4.7 ■ The Basics of Filtering in the Frequency Domain 259

FIGURE 4.31 Top row: frequency domain filters H(u, v). Bottom row: corresponding filtered images obtained using Eq. (4.7-1). We used a = 0.85 in (c) to obtain (f) (the height of the filter itself is 1). Compare (f) with Fig. 4.29(a).

a b c
FIGURE 4.32 (a) A simple image. (b) Result of blurring with a Gaussian lowpass filter without padding.
(c) Result of lowpass filtering with padding. Compare the light area of the vertical edges in (b) and (c).

a b
FIGURE 4.33 2-D image periodicity inherent in using the DFT. (a) Periodicity without
image padding. (b) Periodicity after padding with 0s (black). The dashed areas in the
center correspond to the image in Fig. 4.32(a). (The thin white lines in both images are
superimposed for clarity; they are not part of the data.)

dashed image, it will encompass part of the image and also part of the bottom
of the periodic image right above it. When a dark and a light region reside
under the filter, the result is a mid-gray, blurred output. However, when the fil-
ter is passing through the top right side of the image, the filter will encompass
only light areas in the image and its right neighbor. The average of a constant
is the same constant, so filtering will have no effect in this area, giving the re-
sult in Fig. 4.32(b). Padding the image with 0s creates a uniform border around
the periodic sequence, as Fig. 4.33(b) shows. Convolving the blurring function
with the padded “mosaic” of Fig. 4.33(b) gives the correct result in Fig. 4.32(c).
You can see from this example that failure to pad an image can lead to erro-
neous results. If the purpose of filtering is only for rough visual analysis, the
padding step is skipped sometimes.
Thus far, the discussion has centered on padding the input image, but
Eq. (4.7-1) also involves a filter that can be specified either in the spatial or in
the frequency domain. However, padding is done in the spatial domain, which
raises an important question about the relationship between spatial padding
and filters specified directly in the frequency domain.
At first glance, one could conclude that the way to handle padding of a
frequency domain filter is to construct the filter to be of the same size as the
image, compute the IDFT of the filter to obtain the corresponding spatial fil-
ter, pad that filter in the spatial domain, and then compute its DFT to return
to the frequency domain. The 1-D example in Fig. 4.34 illustrates the pitfalls in
this approach. Figure 4.34(a) shows a 1-D ideal lowpass filter in the frequency
domain. The filter is real and has even symmetry, so we know from property 8
in Table 4.1 that its IDFT will be real and symmetric also. Figure 4.34(b)
shows the result of multiplying the elements of the frequency domain filter

a c
b d
FIGURE 4.34 (a) Original filter specified in the (centered) frequency domain. (b) Spatial representation obtained by computing the IDFT of (a). (c) Result of padding (b) to twice its length (note the discontinuities). (d) Corresponding filter in the frequency domain obtained by computing the DFT of (c). Note the ringing caused by the discontinuities in (c). (The curves appear continuous because the points were joined to simplify visual analysis.)

by (-1)^u and computing its IDFT to obtain the corresponding spatial filter.
The extremes of this spatial function are not zero so, as Fig. 4.34(c) shows,
zero-padding the function created two discontinuities (padding the two ends
of the function is the same as padding one end, as long as the total number of
zeros used is the same).
To get back to the frequency domain, we compute the DFT of the spatial, padded filter. Figure 4.34(d) shows the result. The discontinuities in the spatial filter created ringing in its frequency domain counterpart, as you would expect from the results in Example 4.1. Viewed another way, we know from that example that the Fourier transform of a box function is a sinc function with frequency components extending to infinity, and we would expect the same behavior from the inverse transform of a box. That is, the spatial representation of an ideal (box) frequency domain filter has components extending to infinity. (See the end of Section 4.3.3 regarding the definition of an ideal filter.) Therefore, any spatial truncation of the filter to implement zero-padding will introduce discontinuities, which will then in general result in ringing in the frequency domain (truncation can be avoided in this case if it is done at zero crossings, but we are interested in general procedures, and not all filters have zero crossings).
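A minimal 1-D sketch of this experiment, assuming NumPy and an arbitrary cutoff of 32 samples (our own illustration of the procedure in Fig. 4.34, not code from the text):

```python
import numpy as np

# Specify an ideal lowpass filter in the centered frequency domain, bring it
# to the spatial domain, zero-pad it, and go back. The padding discontinuities
# show up as ringing in the frequency domain.
N = 256
u = np.arange(N)
H = (np.abs(u - N // 2) <= 32).astype(float)    # centered ideal lowpass filter

h = np.real(np.fft.ifft(H * (-1.0) ** u))       # spatial filter (sinc-like)
hp = np.concatenate([h, np.zeros(N)])           # pad to twice the length
Hp = np.fft.fft(hp)                             # transform of the padded filter
# |Hp| is no longer a clean box: it oscillates (rings) near the cutoff.
```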
What the preceding results tell us is that, because we cannot work with an infi-
nite number of components, we cannot use an ideal frequency domain filter [as in

Fig. 4.34(a)] and simultaneously use zero padding to avoid wraparound error. A
decision on which limitation to accept is required. Our objective is to work with
specified filter shapes in the frequency domain (including ideal filters) without
having to be concerned with truncation issues. One approach is to zero-pad im-
ages and then create filters in the frequency domain to be of the same size as the
padded images (remember, images and filters must be of the same size when
using the DFT). Of course, this will result in wraparound error because no
padding is used for the filter, but in practice this error is mitigated significantly by
the separation provided by the padding of the image, and it is preferable to ring-
ing. Smooth filters (such as those in Fig. 4.31) present even less of a problem.
Specifically, then, the approach we will follow in this chapter in order to work
with filters of a specified shape directly in the frequency domain is to pad images
to size P * Q and construct filters of the same dimensions. As explained ear-
lier, P and Q are given by Eqs. (4.6-29) and (4.6-30).
We conclude this section by analyzing the phase angle of the filtered trans-
form. Because the DFT is a complex array, we can express it in terms of its real
and imaginary parts:

F(u, v) = R(u, v) + jI(u, v) (4.7-2)

Equation (4.7-1) then becomes

g(x, y) = ᑣ⁻¹{H(u, v)R(u, v) + jH(u, v)I(u, v)}     (4.7-3)

The phase angle is not altered by filtering in the manner just described be-
cause H(u, v) cancels out when the ratio of the imaginary and real parts is
formed in Eq. (4.6-17). Filters that affect the real and imaginary parts equally,
and thus have no effect on the phase, are appropriately called zero-phase-shift
filters. These are the only types of filters considered in this chapter.
Even small changes in the phase angle can have dramatic (usually undesir-
able) effects on the filtered output. Figure 4.35 illustrates the effect of some-
thing as simple as a scalar change. Figure 4.35(a) shows an image resulting
from multiplying the angle array in Eq. (4.6-15) by 0.5, without changing

a b
FIGURE 4.35
(a) Image resulting
from multiplying by
0.5 the phase angle
in Eq. (4.6-15) and
then computing the
IDFT. (b) The
result of
multiplying the
phase by 0.25. The
spectrum was not
changed in either of
the two cases.

|F(u, v)|, and then computing the IDFT. The basic shapes remain unchanged,
but the intensity distribution is quite distorted. Figure 4.35(b) shows the result
of multiplying the phase by 0.25. The image is almost unrecognizable.

4.7.3 Summary of Steps for Filtering in the Frequency Domain


The material in the previous two sections can be summarized as follows:
1. Given an input image f(x, y) of size M * N, obtain the padding parameters P and Q from Eqs. (4.6-31) and (4.6-32). Typically, we select P = 2M and Q = 2N.
2. Form a padded image, fp(x, y), of size P * Q by appending the necessary number of zeros to f(x, y).
3. Multiply fp(x, y) by (-1)^(x+y) to center its transform. (As noted earlier, centering helps in visualizing the filtering process and in generating the filter functions themselves, but centering is not a fundamental requirement.)
4. Compute the DFT, F(u, v), of the image from step 3.
5. Generate a real, symmetric filter function, H(u, v), of size P * Q with center at coordinates (P/2, Q/2).† Form the product G(u, v) = H(u, v)F(u, v) using array multiplication; that is, G(i, k) = H(i, k)F(i, k).
6. Obtain the processed image:

gp(x, y) = {real[ᑣ⁻¹(G(u, v))]} (-1)^(x+y)

where the real part is selected in order to ignore parasitic complex components resulting from computational inaccuracies, and the subscript p indicates that we are dealing with padded arrays.
7. Obtain the final processed result, g(x, y), by extracting the M * N region from the top, left quadrant of gp(x, y).

† If H(u, v) is to be generated from a given spatial filter, h(x, y), then we form hp(x, y) by padding the spatial filter to size P * Q, multiply the expanded array by (-1)^(x+y), and compute the DFT of the result to obtain a centered H(u, v). Example 4.15 illustrates this procedure.
Figure 4.36 illustrates the preceding steps. The legend in the figure explains the
source of each image. If it were enlarged, Fig. 4.36(c) would show black dots
interleaved in the image because negative intensities are clipped to 0 for dis-
play. Note in Fig. 4.36(h) the characteristic dark border exhibited by lowpass
filtered images processed using zero padding.
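The seven steps translate almost line for line into code. The following is a minimal NumPy sketch (our own illustration; make_H is a hypothetical helper that must return a centered, real, symmetric filter of the padded size):

```python
import numpy as np

def frequency_filter(f, make_H):
    """Steps 1-7 of Section 4.7.3. make_H(P, Q) is assumed to return a
    centered, real, symmetric filter of size P x Q."""
    M, N = f.shape
    P, Q = 2 * M, 2 * N                                  # step 1
    fp = np.zeros((P, Q)); fp[:M, :N] = f                # step 2: zero padding
    x = np.arange(P)[:, None]
    y = np.arange(Q)[None, :]
    fp = fp * (-1.0) ** (x + y)                          # step 3: center transform
    F = np.fft.fft2(fp)                                  # step 4
    G = make_H(P, Q) * F                                 # step 5: array product
    gp = np.real(np.fft.ifft2(G)) * (-1.0) ** (x + y)    # step 6
    return gp[:M, :N]                                    # step 7: crop to M x N

# e.g., pairing this routine with the Gaussian lowpass helper sketched earlier:
# g = frequency_filter(f, lambda P, Q: gaussian_lowpass(P, Q, 30))
```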

4.7.4 Correspondence Between Filtering in the Spatial and Frequency Domains
The link between filtering in the spatial and frequency domains is the convo-
lution theorem. In Section 4.7.2, we defined filtering in the frequency domain
as the multiplication of a filter function, H(u, v), times F(u, v), the Fourier
transform of the input image. Given a filter H(u, v), suppose that we want to
find its equivalent representation in the spatial domain. If we let
f(x, y) = δ(x, y), it follows from Table 4.3 that F(u, v) = 1. Then, from Eq. (4.7-1), the filtered output is ᑣ⁻¹{H(u, v)}. But this is the inverse transform of the frequency domain filter, which is the corresponding filter in the



a b c
d e f
g h
FIGURE 4.36 (a) An M * N image, f. (b) Padded image, fp of size P * Q. (c) Result of multiplying fp by (-1)^(x+y). (d) Spectrum of Fp. (e) Centered Gaussian lowpass filter, H, of size P * Q. (f) Spectrum of the product HFp. (g) gp, the product of (-1)^(x+y) and the real part of the IDFT of HFp. (h) Final result, g, obtained by cropping the first M rows and N columns of gp.

spatial domain. Conversely, it follows from a similar analysis and the convolu-
tion theorem that, given a spatial filter, we obtain its frequency domain repre-
sentation by taking the forward Fourier transform of the spatial filter.
Therefore, the two filters form a Fourier transform pair:

h(x, y) ⇔ H(u, v)     (4.7-4)

where h(x, y) is a spatial filter. Because this filter can be obtained from the re-
sponse of a frequency domain filter to an impulse, h(x, y) sometimes is re-
ferred to as the impulse response of H(u, v). Also, because all quantities in a
discrete implementation of Eq. (4.7-4) are finite, such filters are called finite
impulse response (FIR) filters. These are the only types of linear spatial filters
considered in this book.
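As a quick numerical illustration of this transform-pair relationship (our own sketch, with an arbitrary Gaussian standing in for the frequency domain filter):

```python
import numpy as np

# The spatial (impulse-response) representation of a centered frequency
# domain filter H is its inverse DFT, de-centered first:
P, Q = 64, 64
u = np.arange(P)[:, None] - P / 2
v = np.arange(Q)[None, :] - Q / 2
H = np.exp(-(u ** 2 + v ** 2) / (2.0 * 10.0 ** 2))   # centered Gaussian, D0 = 10
h = np.real(np.fft.ifft2(np.fft.ifftshift(H)))       # FIR filter h(x, y)
```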
We introduced spatial convolution in Section 3.4.1 and discussed its imple-
mentation in connection with Eq. (3.4-2), which involved convolving func-
tions of different sizes. When we speak of spatial convolution in terms of the

convolution theorem and the DFT, it is implied that we are convolving peri-
odic functions, as explained in Fig. 4.28. For this reason, as explained earlier,
Eq. (4.6-23) is referred to as circular convolution. Furthermore, convolution
in the context of the DFT involves functions of the same size, whereas in
Eq. (3.4-2) the functions typically are of different sizes.
In practice, we prefer to implement convolution filtering using Eq. (3.4-2)
with small filter masks because of speed and ease of implementation in
hardware and/or firmware. However, filtering concepts are more intuitive in
the frequency domain. One way to take advantage of the properties of both
domains is to specify a filter in the frequency domain, compute its IDFT,
and then use the resulting, full-size spatial filter as a guide for constructing
smaller spatial filter masks (more formal approaches are mentioned in
Section 4.11.4). This is illustrated next. Later in this section, we illustrate
also the converse, in which a small spatial filter is given and we obtain its
full-size frequency domain representation. This approach is useful for ana-
lyzing the behavior of small spatial filters in the frequency domain. Keep in
mind during the following discussion that the Fourier transform and its in-
verse are linear processes (Problem 4.14), so the discussion is limited to lin-
ear filtering.
In the following discussion, we use Gaussian filters to illustrate how
frequency domain filters can be used as guides for specifying the coefficients
of some of the small masks discussed in Chapter 3. Filters based on Gaussian
functions are of particular interest because, as noted in Table 4.3, both the
forward and inverse Fourier transforms of a Gaussian function are real
Gaussian functions. We limit the discussion to 1-D to illustrate the underly-
ing principles. Two-dimensional Gaussian filters are discussed later in this
chapter.
Let H(u) denote the 1-D frequency domain Gaussian filter:

H(u) = A e^(−u²/2σ²)     (4.7-5)

where σ is the standard deviation of the Gaussian curve. The corresponding filter in the spatial domain is obtained by taking the inverse Fourier transform of H(u) (Problem 4.31):

h(x) = √(2π) σ A e^(−2π²σ²x²)     (4.7-6)

These equations† are important for two reasons: (1) They are a Fourier transform pair, both components of which are Gaussian and real. This facilitates analysis because we do not have to be concerned with complex numbers. In addition, Gaussian curves are intuitive and easy to manipulate. (2) The functions behave reciprocally. When H(u) has a broad profile (large value of σ), h(x) has a narrow profile, and vice versa. In fact, as σ approaches infinity, H(u) tends toward a constant function and h(x) tends toward an impulse, which implies no filtering in the frequency and spatial domains, respectively.

† As mentioned in Table 4.3, closed forms for the forward and inverse Fourier transforms of Gaussians are valid only for continuous functions. To use discrete formulations we simply sample the continuous Gaussian transforms. Our use of discrete variables here implies that we are dealing with sampled transforms.
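This reciprocal behavior can be checked numerically. The sketch below is ours (the rms width is just one convenient measure of spread); it samples Eq. (4.7-5) for two values of σ and measures the width of the resulting h(x):

```python
import numpy as np

def rms_width_of_h(sigma, N=1024):
    """Sample Eq. (4.7-5) (with A = 1), invert it, and measure the rms
    spread of the resulting spatial filter h(x)."""
    u = np.arange(N)
    H = np.exp(-(u - N / 2) ** 2 / (2 * sigma ** 2))     # centered Gaussian
    h = np.fft.fftshift(np.abs(np.fft.ifft(np.fft.ifftshift(H))))
    x = np.arange(N) - N / 2
    return np.sqrt(np.sum(x ** 2 * h) / np.sum(h))

print(rms_width_of_h(4.0), rms_width_of_h(32.0))         # broader H -> narrower h
```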
Figures 4.37(a) and (b) show plots of a Gaussian lowpass filter in the fre-
quency domain and the corresponding lowpass filter in the spatial domain.
Suppose that we want to use the shape of h(x) in Fig. 4.37(b) as a guide for
specifying the coefficients of a small spatial mask. The key similarity be-
tween the two filters is that all their values are positive. Thus, we conclude
that we can implement lowpass filtering in the spatial domain by using a
mask with all positive coefficients (as we did in Section 3.5.1). For reference,
Fig. 4.37(b) shows two of the masks discussed in that section. Note the recip-
rocal relationship between the width of the filters, as discussed in the previ-
ous paragraph. The narrower the frequency domain filter, the more it will
attenuate the low frequencies, resulting in increased blurring. In the spatial
domain, this means that a larger mask must be used to increase blurring, as
illustrated in Example 3.13.
More complex filters can be constructed using the basic Gaussian function of Eq. (4.7-5). For example, we can construct a highpass filter as the difference of Gaussians:

H(u) = A e^(−u²/2σ₁²) − B e^(−u²/2σ₂²)     (4.7-7)

with A ≥ B and σ₁ > σ₂. The corresponding filter in the spatial domain is

h(x) = √(2π) σ₁ A e^(−2π²σ₁²x²) − √(2π) σ₂ B e^(−2π²σ₂²x²)     (4.7-8)

Figures 4.37(c) and (d) show plots of these two equations. We note again the reciprocity in width, but the most important feature here is that h(x) has a positive center term with negative terms on either side. The small masks shown in
a c
b d
FIGURE 4.37 (a) A 1-D Gaussian lowpass filter in the frequency domain. (b) Spatial lowpass filter corresponding to (a). (c) Gaussian highpass filter in the frequency domain. (d) Spatial highpass filter corresponding to (c). The small 2-D masks shown are spatial filters we used in Chapter 3.

Fig. 4.37(d) “capture” this property. These two masks were used in Chapter 3
as sharpening filters, which we now know are highpass filters.
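A short numerical sketch of the difference-of-Gaussians construction (our own illustration; the values of A, B, σ₁, and σ₂ are arbitrary, subject to A ≥ B and σ₁ > σ₂):

```python
import numpy as np

# Difference-of-Gaussians highpass filter, Eq. (4.7-7), sampled on N points,
# and its spatial counterpart obtained numerically (compare Eq. (4.7-8)).
N = 512
u = np.arange(N)
A, B, s1, s2 = 1.0, 1.0, 60.0, 15.0        # A >= B and s1 > s2, as required
H = (A * np.exp(-(u - N / 2) ** 2 / (2 * s1 ** 2))
     - B * np.exp(-(u - N / 2) ** 2 / (2 * s2 ** 2)))

h = np.fft.fftshift(np.real(np.fft.ifft(np.fft.ifftshift(H))))
c = N // 2
print(h[c] > 0, h[c - 5] < 0, h[c + 5] < 0)   # positive center, negative lobes
```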
Although we have gone through significant effort to get here, be assured
that it is impossible to truly understand filtering in the frequency domain
without the foundation we have just established. In practice, the frequency
domain can be viewed as a “laboratory” in which we take advantage of the
correspondence between frequency content and image appearance. As is
demonstrated numerous times later in this chapter, some tasks that would be
exceptionally difficult or impossible to formulate directly in the spatial do-
main become almost trivial in the frequency domain. Once we have selected a
specific filter via experimentation in the frequency domain, the actual imple-
mentation of the method usually is done in the spatial domain. One approach
is to specify small spatial masks that attempt to capture the “essence” of the
full filter function in the spatial domain, as we explained in Fig. 4.37. A more
formal approach is to design a 2-D digital filter by using approximations
based on mathematical or statistical criteria. We touch on this point again in
Section 4.11.4.

EXAMPLE 4.15: Obtaining a frequency domain filter from a small spatial mask.

■ In this example, we start with a spatial mask and show how to generate its corresponding filter in the frequency domain. Then, we compare the filtering results obtained using frequency domain and spatial techniques. This type of analysis is useful when one wishes to compare the performance of given spatial masks against one or more “full” filter candidates in the frequency domain, or to gain deeper understanding about the performance of a mask. To keep matters simple, we use the 3 * 3 Sobel vertical edge detector from Fig. 3.41(e). Figure 4.38(a) shows a 600 * 600 pixel image, f(x, y), that we wish to filter, and Fig. 4.38(b) shows its spectrum.
Figure 4.39(a) shows the Sobel mask, h(x, y) (the perspective plot is ex-
plained below). Because the input image is of size 600 * 600 pixels and the fil-
ter is of size 3 * 3 we avoid wraparound error by padding f and h to size

a b
FIGURE 4.38
(a) Image of a
building, and
(b) its spectrum.

a b
c d
FIGURE 4.39 (a) A spatial mask (the Sobel vertical edge detector, with rows −1 0 1; −2 0 2; −1 0 1) and perspective plot of its corresponding frequency domain filter. (b) Filter shown as an image. (c) Result of filtering Fig. 4.38(a) in the frequency domain with the filter in (b). (d) Result of filtering the same image with the spatial filter in (a). The results are identical.

602 * 602 pixels, according to Eqs. (4.6-29) and (4.6-30). The Sobel mask ex-
hibits odd symmetry, provided that it is embedded in an array of zeros of even
size (see Example 4.10). To maintain this symmetry, we place h(x, y) so that its
center is at the center of the 602 * 602 padded array. This is an important as-
pect of filter generation. If we preserve the odd symmetry with respect to the
padded array in forming hp(x, y), we know from property 9 in Table 4.1 that
H(u, v) will be purely imaginary. As we show at the end of this example, this
will yield results that are identical to filtering the image spatially using h(x, y).
If the symmetry were not preserved, the results would no longer be the same.
The procedure used to generate H(u, v) is: (1) multiply hp(x, y) by (-1)^(x+y) to center the frequency domain filter; (2) compute the forward DFT of the result in (1); (3) set the real part of the resulting DFT to 0 to account for parasitic real parts (we know that H(u, v) has to be purely imaginary); and (4) multiply the result by (-1)^(u+v). This last step reverses the multiplication of H(u, v) by (-1)^(u+v), which was implicit when h(x, y) was moved to the center of hp(x, y).
Figure 4.39(a) shows a perspective plot of H(u, v), and Fig. 4.39(b) shows

H(u, v) as an image. As expected, the function is odd, hence the antisymmetry about its center. Function H(u, v) is used as any other frequency domain filter in the procedure outlined in Section 4.7.3.
Figure 4.39(c) is the result of using the filter just obtained in the proce-
dure outlined in Section 4.7.3 to filter the image in Fig. 4.38(a). As expected
from a derivative filter, edges are enhanced and all the constant intensity
areas are reduced to zero (the grayish tone is due to scaling for display).
Figure 4.39(d) shows the result of filtering the same image in the spatial do-
main directly, using h(x, y) in the procedure outlined in Section 3.6.4. The re-
sults are identical. ■
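The four-step generation procedure of this example can be sketched in a few lines of NumPy (our own illustration of the steps described above, not the book's code):

```python
import numpy as np

h = np.array([[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]], dtype=float)    # Sobel vertical edge detector

M = N = 600
P, Q = M + 2, N + 2                         # 602 x 602, as in the example
hp = np.zeros((P, Q))
c = P // 2
hp[c - 1:c + 2, c - 1:c + 2] = h            # mask centered in the padded array

x = np.arange(P)[:, None]
y = np.arange(Q)[None, :]
H = np.fft.fft2(hp * (-1.0) ** (x + y))     # steps (1) and (2)
H = 1j * H.imag                             # step (3): keep only the imaginary part
H = H * (-1.0) ** (x + y)                   # step (4); (u, v) ranges match (x, y)
```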

4.8 Image Smoothing Using Frequency Domain Filters


The remainder of this chapter deals with various filtering techniques in the fre-
quency domain. We begin with lowpass filters. Edges and other sharp intensity
transitions (such as noise) in an image contribute significantly to the high-
frequency content of its Fourier transform. Hence, smoothing (blurring) is
achieved in the frequency domain by high-frequency attenuation; that is, by
lowpass filtering. In this section, we consider three types of lowpass filters:
ideal, Butterworth, and Gaussian. These three categories cover the range from
very sharp (ideal) to very smooth (Gaussian) filtering. The Butterworth filter
has a parameter called the filter order. For high order values, the Butterworth
filter approaches the ideal filter. For lower order values, the Butterworth filter
is more like a Gaussian filter. Thus, the Butterworth filter may be viewed as
providing a transition between two “extremes.” All filtering in this section fol-
lows the procedure outlined in Section 4.7.3, so all filter functions, H(u, v), are understood to be discrete functions of size P * Q; that is, the discrete frequency variables are in the range u = 0, 1, 2, …, P − 1 and v = 0, 1, 2, …, Q − 1.

4.8.1 Ideal Lowpass Filters


A 2-D lowpass filter that passes without attenuation all frequencies within a
circle of radius D0 from the origin and “cuts off” all frequencies outside this
circle is called an ideal lowpass filter (ILPF); it is specified by the function

H(u, v) = 1 if D(u, v) ≤ D0;  0 if D(u, v) > D0     (4.8-1)

where D0 is a positive constant and D(u, v) is the distance between a point (u, v) in the frequency domain and the center of the frequency rectangle; that is,

D(u, v) = [(u − P/2)² + (v − Q/2)²]^(1/2)     (4.8-2)

where, as before, P and Q are the padded sizes from Eqs. (4.6-31) and (4.6-32).
Figure 4.40(a) shows a perspective plot of H(u, v) and Fig. 4.40(b) shows the
filter displayed as an image. As mentioned in Section 4.3.3, the name ideal
indicates that all frequencies on or inside a circle of radius D0 are passed

a b c
FIGURE 4.40 (a) Perspective plot of an ideal lowpass-filter transfer function. (b) Filter displayed as an image. (c) Filter radial cross section.

without attenuation, whereas all frequencies outside the circle are completely
attenuated (filtered out). The ideal lowpass filter is radially symmetric about
the origin, which means that the filter is completely defined by a radial cross
section, as Fig. 4.40(c) shows. Rotating the cross section by 360° yields the fil-
ter in 2-D.
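A direct implementation of Eqs. (4.8-1) and (4.8-2) (our own sketch):

```python
import numpy as np

def ilpf(P, Q, D0):
    """Ideal lowpass filter of Eq. (4.8-1), centered per Eq. (4.8-2)."""
    u = np.arange(P)[:, None]
    v = np.arange(Q)[None, :]
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)
    return (D <= D0).astype(float)
```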
For an ILPF cross section, the point of transition between H(u, v) = 1 and
H(u, v) = 0 is called the cutoff frequency. In the case of Fig. 4.40, for example,
the cutoff frequency is D0. The sharp cutoff frequencies of an ILPF cannot be
realized with electronic components, although they certainly can be simulated
in a computer. The effects of using these “nonphysical” filters on a digital
image are discussed later in this section.
The lowpass filters introduced in this chapter are compared by studying
their behavior as a function of the same cutoff frequencies. One way to estab-
lish a set of standard cutoff frequency loci is to compute circles that enclose specified amounts of total image power PT. This quantity is obtained by summing the components of the power spectrum of the padded images at each point (u, v), for u = 0, 1, …, P − 1 and v = 0, 1, …, Q − 1; that is,

PT = Σᵤ Σᵥ P(u, v)     (4.8-3)

where P(u, v) is given in Eq. (4.6-18). If the DFT has been centered, a circle of radius D0 with origin at the center of the frequency rectangle encloses α percent of the power, where

α = 100 [Σᵤ Σᵥ P(u, v) / PT]     (4.8-4)

and the summation is taken over values of (u, v) that lie inside the circle or on its boundary.
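A sketch of Eqs. (4.8-3) and (4.8-4) for a padded image (ours; it assumes the simple P = 2M, Q = 2N padding used elsewhere in this section):

```python
import numpy as np

def enclosed_power_percent(f, D0):
    """Percentage of total image power enclosed by a circle of radius D0
    in the centered spectrum of the padded image."""
    M, N = f.shape
    P, Q = 2 * M, 2 * N
    F = np.fft.fftshift(np.fft.fft2(f, s=(P, Q)))
    Puv = np.abs(F) ** 2                              # power spectrum P(u, v)
    u = np.arange(P)[:, None]
    v = np.arange(Q)[None, :]
    inside = (u - P / 2) ** 2 + (v - Q / 2) ** 2 <= D0 ** 2
    return 100.0 * Puv[inside].sum() / Puv.sum()      # alpha of Eq. (4.8-4)
```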

Figures 4.41(a) and (b) show a test pattern image and its spectrum. The circles superimposed on the spectrum have radii of 10, 30, 60, 160, and 460 pixels, respectively. These circles enclose α percent of the image power, for α = 87.0, 93.1, 95.7, 97.8, and 99.2%, respectively. The spectrum falls off rapidly, with 87% of the total power being enclosed by a relatively small circle of radius 10.

EXAMPLE 4.16: Image smoothing using an ILPF.

■ Figure 4.42 shows the results of applying ILPFs with cutoff frequencies at the radii shown in Fig. 4.41(b). Figure 4.42(b) is useless for all practical purposes, unless the objective of blurring is to eliminate all detail in the image,
except the “blobs” representing the largest objects. The severe blurring in
this image is a clear indication that most of the sharp detail information in
the picture is contained in the 13% power removed by the filter. As the filter
radius increases, less and less power is removed, resulting in less blurring.
Note that the images in Figs. 4.42(c) through (e) are characterized by “ring-
ing,” which becomes finer in texture as the amount of high frequency con-
tent removed decreases. Ringing is visible even in the image [Fig. 4.42(e)] in
which only 2% of the total power was removed. This ringing behavior is a
characteristic of ideal filters, as you will see shortly. Finally, the result for α = 99.2 shows very slight blurring in the noisy squares but, for the most
part, this image is quite close to the original. This indicates that little edge
information is contained in the upper 0.8% of the spectrum power in this
particular case.
It is clear from this example that ideal lowpass filtering is not very practical. However, it is useful to study the behavior of ILPFs as part of our development of

a b
FIGURE 4.41 (a) Test pattern of size 688 * 688 pixels, and (b) its Fourier spectrum. The
spectrum is double the image size due to padding but is shown in half size so that it fits
in the page. The superimposed circles have radii equal to 10, 30, 60, 160, and 460 with
respect to the full-size spectrum image. These radii enclose 87.0, 93.1, 95.7, 97.8, and
99.2% of the padded image power, respectively.

a b
c d
e f
FIGURE 4.42 (a) Original image. (b)–(f) Results of filtering using ILPFs with cutoff
frequencies set at radii values 10, 30, 60, 160, and 460, as shown in Fig. 4.41(b). The
power removed by these filters was 13, 6.9, 4.3, 2.2, and 0.8% of the total, respectively.

filtering concepts. Also, as shown in the discussion that follows, some interest-
ing insight is gained by attempting to explain the ringing property of ILPFs in
the spatial domain. ■

The blurring and ringing properties of ILPFs can be explained using the
convolution theorem. Figure 4.43(a) shows the spatial representation, h(x, y), of
an ILPF of radius 10, and Fig. 4.43(b) shows the intensity profile of a line passing
through the center of the image. Because a cross section of the ILPF in the fre-
quency domain looks like a box filter, it is not unexpected that a cross section of
the corresponding spatial filter has the shape of a sinc function. Filtering in the
spatial domain is done by convolving h(x, y) with the image. Imagine each pixel
in the image being a discrete impulse whose strength is proportional to the in-
tensity of the image at that location. Convolving a sinc with an impulse copies
the sinc at the location of the impulse. The center lobe of the sinc is the principal
cause of blurring, while the outer, smaller lobes are mainly responsible for ring-
ing. Convolving the sinc with every pixel in the image provides a nice model for
explaining the behavior of ILPFs. Because the “spread” of the sinc function is in-
versely proportional to the radius of H(u, v), the larger D0 becomes, the more
the spatial sinc approaches an impulse which, in the limit, causes no blurring at
all when convolved with the image. This type of reciprocal behavior should be
routine to you by now. In the next two sections, we show that it is possible to
achieve blurring with little or no ringing, which is an important objective in
lowpass filtering.

4.8.2 Butterworth Lowpass Filters


The transfer function of a Butterworth lowpass filter (BLPF) of order n, and with cutoff frequency at a distance D0 from the origin, is defined as

H(u, v) = 1 / (1 + [D(u, v)/D0]^(2n))     (4.8-5)

where D(u, v) is given by Eq. (4.8-2). Figure 4.44 shows a perspective plot, image display, and radial cross sections of the BLPF function. (The transfer function of the Butterworth lowpass filter normally is written as the square root of our expression. However, our interest here is in the basic form of the filter, so we exclude the square root for computational convenience.)
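Equation (4.8-5) in NumPy form (our own sketch; note that H = 0.5 when D(u, v) = D0, regardless of the order n):

```python
import numpy as np

def blpf(P, Q, D0, n):
    """Butterworth lowpass filter of Eq. (4.8-5)."""
    u = np.arange(P)[:, None]
    v = np.arange(Q)[None, :]
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))
```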

a b
FIGURE 4.43
(a) Representation
in the spatial
domain of an
ILPF of radius 5
and size
1000 * 1000.
(b) Intensity
profile of a
horizontal line
passing through
the center of the
image.

a b c
FIGURE 4.44 (a) Perspective plot of a Butterworth lowpass-filter transfer function. (b) Filter displayed as an image. (c) Filter radial cross sections of orders 1 through 4.

Unlike the ILPF, the BLPF transfer function does not have a sharp discon-
tinuity that gives a clear cutoff between passed and filtered frequencies. For
filters with smooth transfer functions, defining a cutoff frequency locus at
points for which H(u, v) is down to a certain fraction of its maximum value is customary. In Eq. (4.8-5), H(u, v) = 0.5 (down 50% from its maximum value of 1) when D(u, v) = D0.

EXAMPLE 4.17: Image smoothing with a Butterworth lowpass filter.

■ Figure 4.45 shows the results of applying the BLPF of Eq. (4.8-5) to Fig. 4.45(a), with n = 2 and D0 equal to the five radii in Fig. 4.41(b). Unlike the results in Fig. 4.42 for the ILPF, we note here a smooth transition in blurring as a function of increasing cutoff frequency. Moreover, no ringing is visible in any of the images processed with this particular BLPF, a fact attributed to the filter’s smooth transition between low and high frequencies. ■

A BLPF of order 1 has no ringing in the spatial domain. Ringing generally


is imperceptible in filters of order 2, but can become significant in filters of
higher order. Figure 4.46 shows a comparison between the spatial representa-
tion of BLPFs of various orders (using a cutoff frequency of 5 in all cases).
Shown also is the intensity profile along a horizontal scan line through the cen-
ter of each filter. These filters were obtained and displayed using the same pro-
cedure used to generate Fig. 4.43. To facilitate comparisons, additional
enhancing with a gamma transformation [see Eq. (3.2-3)] was applied to the
images of Fig. 4.46. The BLPF of order 1 [Fig. 4.46(a)] has neither ringing nor
negative values. The filter of order 2 does show mild ringing and small negative
values, but they certainly are less pronounced than in the ILPF. As the remain-
ing images show, ringing in the BLPF becomes significant for higher-order fil-
ters. A Butterworth filter of order 20 exhibits characteristics similar to those of
the ILPF (in the limit, both filters are identical). BLPFs of order 2 are a good
compromise between effective lowpass filtering and acceptable ringing.

a b
c d
e f
FIGURE 4.45 (a) Original image. (b)–(f) Results of filtering using BLPFs of order 2,
with cutoff frequencies at the radii shown in Fig. 4.41. Compare with Fig. 4.42.

a b c d
FIGURE 4.46 (a)–(d) Spatial representation of BLPFs of order 1, 2, 5, and 20, and corresponding intensity
profiles through the center of the filters (the size in all cases is 1000 * 1000 and the cutoff frequency is 5).
Observe how ringing increases as a function of filter order.

4.8.3 Gaussian Lowpass Filters


Gaussian lowpass filters (GLPFs) of one dimension were introduced in Section 4.7.4 as an aid in exploring some important relationships between the spatial and frequency domains. The form of these filters in two dimensions is given by

H(u, v) = e^(−D²(u, v)/2σ²)     (4.8-6)

where, as in Eq. (4.8-2), D(u, v) is the distance from the center of the frequency rectangle. Here we do not use a multiplying constant as in Section 4.7.4 in order to be consistent with the filters discussed in the present section, whose highest value is 1. As before, σ is a measure of spread about the center. By letting σ = D0, we can express the filter using the notation of the other filters in this section:

H(u, v) = e^(−D²(u, v)/2D0²)     (4.8-7)

where D0 is the cutoff frequency. When D(u, v) = D0, the GLPF is down to 0.607 of its maximum value.
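Equation (4.8-7) in NumPy form (our own sketch):

```python
import numpy as np

def glpf(P, Q, D0):
    """Gaussian lowpass filter of Eq. (4.8-7); H = exp(-1/2) ~ 0.607 at D = D0."""
    u = np.arange(P)[:, None]
    v = np.arange(Q)[None, :]
    D2 = (u - P / 2) ** 2 + (v - Q / 2) ** 2
    return np.exp(-D2 / (2.0 * D0 ** 2))
```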
As Table 4.3 shows, the inverse Fourier transform of the GLPF is Gaussian
also. This means that a spatial Gaussian filter, obtained by computing the
IDFT of Eq. (4.8-6) or (4.8-7), will have no ringing. Figure 4.47 shows a per-
spective plot, image display, and radial cross sections of a GLPF function, and
Table 4.4 summarizes the lowpass filters discussed in this section.

a b c
FIGURE 4.47 (a) Perspective plot of a GLPF transfer function. (b) Filter displayed as an image. (c) Filter radial cross sections for various values of D0.

TABLE 4.4 Lowpass filters. D0 is the cutoff frequency and n is the order of the Butterworth filter.

Ideal:        H(u, v) = 1 if D(u, v) ≤ D0;  0 if D(u, v) > D0
Butterworth:  H(u, v) = 1 / (1 + [D(u, v)/D0]^(2n))
Gaussian:     H(u, v) = e^(−D²(u, v)/2D0²)

EXAMPLE 4.18: Image smoothing with a Gaussian lowpass filter.

■ Figure 4.48 shows the results of applying the GLPF of Eq. (4.8-7) to Fig. 4.48(a), with D0 equal to the five radii in Fig. 4.41(b). As in the case of the BLPF of order 2 (Fig. 4.45), we note a smooth transition in blurring as a function of increasing cutoff frequency. The GLPF achieved slightly less smoothing than the BLPF of order 2 for the same value of cutoff frequency, as can be seen, for example, by comparing Figs. 4.45(c) and 4.48(c). This is expected, because the profile of the GLPF is not as “tight” as the profile of the BLPF of order 2. However, the results are quite comparable, and we are assured of no ringing in the case of the GLPF. This is an important characteristic in practice, especially in situations (e.g., medical imaging) in which any type of artifact is unacceptable. In cases where tight control of the transition between low and high frequencies about the cutoff frequency is needed, the BLPF presents a more suitable choice. The price of this additional control over the filter profile is the possibility of ringing. ■

4.8.4 Additional Examples of Lowpass Filtering


In the following discussion, we show several practical applications of lowpass
filtering in the frequency domain. The first example is from the field of ma-
chine perception with application to character recognition; the second is from
the printing and publishing industry; and the third is related to processing

a b
c d
e f
FIGURE 4.48 (a) Original image. (b)–(f) Results of filtering using GLPFs with cutoff
frequencies at the radii shown in Fig. 4.41. Compare with Figs. 4.42 and 4.45.

a b
FIGURE 4.49
(a) Sample text of
low resolution
(note broken
characters in
magnified view).
(b) Result of
filtering with a
GLPF (broken
character
segments were
joined).

satellite and aerial images. Similar results can be obtained using the lowpass
spatial filtering techniques discussed in Section 3.5.
Figure 4.49 shows a sample of text of poor resolution. One encounters text
like this, for example, in fax transmissions, duplicated material, and historical
records. This particular sample is free of additional difficulties like smudges,
creases, and torn sections. The magnified section in Fig. 4.49(a) shows that the
characters in this document have distorted shapes due to lack of resolution,
and many of the characters are broken. Although humans fill these gaps visu-
ally without difficulty, machine recognition systems have real difficulties read-
ing broken characters. One approach for handling this problem is to bridge
small gaps in the input image by blurring it. Figure 4.49(b) shows how well
characters can be “repaired” by this simple process using a Gaussian lowpass
filter with D0 = 80. The images are of size 444 * 508 pixels.
Lowpass filtering is a staple in the printing and publishing industry, where it is used for numerous preprocessing functions, including unsharp masking, as discussed in Section 3.6.3. (We discuss unsharp masking in the frequency domain in Section 4.9.5.) “Cosmetic” processing is another use of lowpass filtering prior to printing.
for producing a smoother, softer-looking result from a sharp original. For
human faces, the typical objective is to reduce the sharpness of fine skin lines
and small blemishes. The magnified sections in Figs. 4.50(b) and (c) clearly
show a significant reduction in fine skin lines around the eyes in this case. In
fact, the smoothed images look quite soft and pleasing.
Figure 4.51 shows two applications of lowpass filtering on the same image,
but with totally different objectives. Figure 4.51(a) is an 808 * 754 very high
resolution radiometer (VHRR) image showing part of the Gulf of Mexico
(dark) and Florida (light), taken from a NOAA satellite (note the horizontal
sensor scan lines). The boundaries between bodies of water were caused by
loop currents. This image is illustrative of remotely sensed images in which sen-
sors have the tendency to produce pronounced scan lines along the direction in
which the scene is being scanned (see Example 4.24 for an illustration of a

a b c
FIGURE 4.50 (a) Original image (784 * 732 pixels). (b) Result of filtering using a GLPF with D0 = 100.
(c) Result of filtering using a GLPF with D0 = 80. Note the reduction in fine skin lines in the magnified
sections in (b) and (c).

physical cause). Lowpass filtering is a crude but simple way to reduce the effect
of these lines, as Fig. 4.51(b) shows (we consider more effective approaches in
Sections 4.10 and 5.4.1). This image was obtained using a GLPF with D0 = 50.
The reduction in the effect of the scan lines can simplify the detection of fea-
tures such as the interface boundaries between ocean currents.
Figure 4.51(c) shows the result of significantly more aggressive Gaussian
lowpass filtering with D0 = 20. Here, the objective is to blur out as much de-
tail as possible while leaving large features recognizable. For instance, this type
of filtering could be part of a preprocessing stage for an image analysis system
that searches for features in an image bank. An example of such features could
be lakes of a given size, such as Lake Okeechobee in the lower eastern region
of Florida, shown as a nearly round dark region in Fig. 4.51(c). Lowpass filter-
ing helps simplify the analysis by averaging out features smaller than the ones
of interest.

4.9 Image Sharpening Using Frequency Domain Filters


In the previous section, we showed that an image can be smoothed by attenu-
ating the high-frequency components of its Fourier transform. Because edges
and other abrupt changes in intensities are associated with high-frequency
components, image sharpening can be achieved in the frequency domain by
highpass filtering, which attenuates the low-frequency components without
disturbing high-frequency information in the Fourier transform. As in Section

a b c
FIGURE 4.51 (a) Image showing prominent horizontal scan lines. (b) Result of filtering using a GLPF with
D0 = 50. (c) Result of using a GLPF with D0 = 20. (Original image courtesy of NOAA.)

4.8, we consider only zero-phase-shift filters that are radially symmetric. All
filtering in this section is based on the procedure outlined in Section 4.7.3, so
all filter functions, H(u, v), are understood to be discrete functions of size
P * Q; that is, the discrete frequency variables are in the range u = 0, 1, 2, …, P − 1 and v = 0, 1, 2, …, Q − 1.
A highpass filter is obtained from a given lowpass filter using the equation

HHP (u, v) = 1 - HLP (u, v) (4.9-1)

where HLP (u, v) is the transfer function of the lowpass filter. That is, when the
lowpass filter attenuates frequencies, the highpass filter passes them, and vice
versa.
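In code, Eq. (4.9-1) is a one-line transformation of the lowpass builders sketched earlier (the function names ilpf, blpf, and glpf come from those earlier sketches, not from the text):

```python
P, Q, D0, n = 512, 512, 60, 2      # example sizes and cutoff
H_ihpf = 1.0 - ilpf(P, Q, D0)      # Eq. (4.9-2)
H_bhpf = 1.0 - blpf(P, Q, D0, n)   # agrees with Eq. (4.9-3) wherever D(u, v) > 0
H_ghpf = 1.0 - glpf(P, Q, D0)      # Eq. (4.9-4)
```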
In this section, we consider ideal, Butterworth, and Gaussian highpass fil-
ters. As in the previous section, we illustrate the characteristics of these filters
in both the frequency and spatial domains. Figure 4.52 shows typical 3-D plots,
image representations, and cross sections for these filters. As before, we see
that the Butterworth filter represents a transition between the sharpness of
the ideal filter and the broad smoothness of the Gaussian filter. Figure 4.53,
discussed in the sections that follow, illustrates what these filters look like in
the spatial domain. The spatial filters were obtained and displayed by using the
procedure used to generate Figs. 4.43 and 4.46.

4.9.1 Ideal Highpass Filters


A 2-D ideal highpass filter (IHPF) is defined as

H(u, v) = 0 if D(u, v) ≤ D0;  1 if D(u, v) > D0     (4.9-2)

a b c
d e f
g h i
FIGURE 4.52 Top row: Perspective plot, image representation, and cross section of a typical ideal highpass filter. Middle and bottom rows: The same sequence for typical Butterworth and Gaussian highpass filters.

where D0 is the cutoff frequency and D(u, v) is given by Eq. (4.8-2). This ex-
pression follows directly from Eqs. (4.8-1) and (4.9-1). As intended, the IHPF
is the opposite of the ILPF in the sense that it sets to zero all frequencies inside
a circle of radius D0 while passing, without attenuation, all frequencies outside
the circle. As in the case of the ILPF, the IHPF is not physically realizable. How-
ever, we consider it here for completeness and, as before, because its proper-
ties can be used to explain phenomena such as ringing in the spatial domain.
The discussion will be brief.
Because of the way in which they are related [Eq. (4.9-1)], we can expect
IHPFs to have the same ringing properties as ILPFs. This is demonstrated

a b c
FIGURE 4.53 Spatial representation of typical (a) ideal, (b) Butterworth, and (c) Gaussian frequency domain highpass filters, and corresponding intensity profiles through their centers.

clearly in Fig. 4.54, which consists of various IHPF results using the original
image in Fig. 4.41(a) with D0 set to 30, 60, and 160 pixels, respectively. The ring-
ing in Fig. 4.54(a) is so severe that it produced distorted, thickened object
boundaries (e.g., look at the large letter “a”). Edges of the top three circles do
not show well because they are not as strong as the other edges in the image
(the intensity of these three objects is much closer to the background intensity,

a b c
FIGURE 4.54 Results of highpass filtering the image in Fig. 4.41(a) using an IHPF with D0 = 30, 60, and 160.

giving discontinuities of smaller magnitude). Looking at the “spot” size of the


spatial representation of the IHPF in Fig. 4.53(a) and keeping in mind that fil-
tering in the spatial domain is convolution of the spatial filter with the image
helps explain why the smaller objects and lines appear almost solid white.
Look in particular at the three small squares in the top row and the thin, ver-
tical bars in Fig. 4.54(a). The situation improved somewhat with D0 = 60.
Edge distortion is quite evident still, but now we begin to see filtering on the
smaller objects. Due to the now familiar inverse relationship between the fre-
quency and spatial domains, we know that the spot size of this filter is smaller
than the spot of the filter with D0 = 30. The result for D0 = 160 is closer to
what a highpass-filtered image should look like. Here, the edges are much
cleaner and less distorted, and the smaller objects have been filtered prop-
erly. Of course, the constant background in all images is zero in these
highpass-filtered images because highpass filtering is analogous to differ-
entiation in the spatial domain.

4.9.2 Butterworth Highpass Filters


A 2-D Butterworth highpass filter (BHPF) of order n and cutoff frequency D0
is defined as

H(u, v) = 1 / (1 + [D0/D(u, v)]^(2n))     (4.9-3)

where D(u, v) is given by Eq. (4.8-2). This expression follows directly from
Eqs. (4.8-5) and (4.9-1). The middle row of Fig. 4.52 shows an image and cross
section of the BHPF function.
As with lowpass filters, we can expect Butterworth highpass filters to behave more smoothly than IHPFs. Figure 4.55 shows the performance of a BHPF, of

a b c
FIGURE 4.55 Results of highpass filtering the image in Fig. 4.41(a) using a BHPF of order 2 with D0 = 30, 60,
and 160, corresponding to the circles in Fig. 4.41(b). These results are much smoother than those obtained
with an IHPF.

a b c
FIGURE 4.56 Results of highpass filtering the image in Fig. 4.41(a) using a GHPF with D0 = 30, 60, and 160,
corresponding to the circles in Fig. 4.41(b). Compare with Figs. 4.54 and 4.55.

order 2 and with D0 set to the same values as in Fig. 4.54. The boundaries are
much less distorted than in Fig. 4.54, even for the smallest value of cutoff fre-
quency. Because the spot sizes in the center areas of the IHPF and the BHPF
are similar [see Figs. 4.53(a) and (b)], the performance of the two filters on the
smaller objects is comparable. The transition into higher values of cutoff fre-
quencies is much smoother with the BHPF.

4.9.3 Gaussian Highpass Filters


The transfer function of the Gaussian highpass filter (GHPF) with cutoff fre-
quency locus at a distance D0 from the center of the frequency rectangle is
given by
H(u, v) = 1 − e^(−D²(u, v)/2D0²)     (4.9-4)
where D(u, v) is given by Eq. (4.8-2). This expression follows directly from
Eqs. (4.8-7) and (4.9-1). The third row in Fig. 4.52 shows a perspective plot,
image, and cross section of the GHPF function. Following the same format as
for the BHPF, we show in Fig. 4.56 comparable results using GHPFs. As ex-
pected, the results obtained are more gradual than with the previous two fil-
ters. Even the filtering of the smaller objects and thin bars is cleaner with the
Gaussian filter. Table 4.5 contains a summary of the highpass filters discussed
in this section.

TABLE 4.5 Highpass filters. D0 is the cutoff frequency and n is the order of the Butterworth filter.

Ideal:        H(u, v) = 0 if D(u, v) ≤ D0;  1 if D(u, v) > D0
Butterworth:  H(u, v) = 1 / (1 + [D0/D(u, v)]^(2n))
Gaussian:     H(u, v) = 1 − e^(−D²(u, v)/2D0²)

EXAMPLE 4.19: Using highpass filtering and thresholding for image enhancement.

■ Figure 4.57(a) is a 1026 * 962 image of a thumb print in which smudges (a typical problem) are evident. A key step in automated fingerprint recognition is enhancement of print ridges and the reduction of smudges. Enhancement is useful also in human interpretation of prints. In this example, we use highpass filtering to enhance the ridges and reduce the effects of smudging. Enhancement of the ridges is accomplished by the fact that they contain high frequencies, which are unchanged by a highpass filter. On the other hand, the filter reduces low frequency components, which correspond to slowly varying intensities in the image, such as the background and smudges. Thus, enhancement is achieved by reducing the effect of all features except those with high frequencies, which are the features of interest in this case.

Figure 4.57(b) is the result of using a Butterworth highpass filter of order 4 with a cutoff frequency of 50. (The value D0 = 50 is approximately 2.5% of the short dimension of the padded image. The idea is for D0 to be close to the origin so low frequencies are attenuated, but not completely eliminated. A range of 2% to 5% of the short dimension is a good starting point.) As expected, the highpass-filtered image lost its gray tones because the dc term was reduced to 0. The net result is that dark tones typically predominate in highpass-filtered images, thus requiring additional processing to enhance details of interest. A simple approach is to threshold the filtered image. Figure 4.57(c) shows the result of setting to black all negative values and to white all positive values in the filtered image. Note how the ridges are clear and the effect of the smudges has been reduced considerably. In fact, ridges that are barely visible in the top, right section of the image in Fig. 4.57(a) are nicely enhanced in Fig. 4.57(c). ■

4.9.4 The Laplacian in the Frequency Domain


In Section 3.6.2, we used the Laplacian for image enhancement in the spatial
domain. In this section, we revisit the Laplacian and show that it yields equiv-
alent results using frequency domain techniques. It can be shown (Problem
4.26) that the Laplacian can be implemented in the frequency domain using
the filter

H(u, v) = −4π²(u² + v²)     (4.9-5)

a b c
FIGURE 4.57 (a) Thumb print. (b) Result of highpass filtering (a). (c) Result of
thresholding (b). (Original image courtesy of the U.S. National Institute of Standards
and Technology.)

or, with respect to the center of the frequency rectangle, using the filter

H(u, v) = −4π²[(u − P/2)² + (v − Q/2)²]
        = −4π²D²(u, v)     (4.9-6)

where D(u, v) is the distance function given in Eq. (4.8-2). Then, the Laplacian image is obtained as:

∇²f(x, y) = ᑣ⁻¹{H(u, v)F(u, v)}     (4.9-7)

where F(u, v) is the DFT of f(x, y). As explained in Section 3.6.2, enhancement is achieved using the equation:

g(x, y) = f(x, y) + c∇²f(x, y)     (4.9-8)

Here, c = −1 because H(u, v) is negative. In Chapter 3, f(x, y) and ∇²f(x, y) had comparable values. However, computing ∇²f(x, y) with Eq. (4.9-7) introduces DFT scaling factors that can be several orders of magnitude larger than the maximum value of f. Thus, the differences between f and its Laplacian must be brought into comparable ranges. The easiest way to handle this problem is to normalize the values of f(x, y) to the range [0, 1] (before computing its DFT) and divide ∇²f(x, y) by its maximum value, which will bring it to the approximate range [−1, 1] (recall that the Laplacian has negative values). Equation (4.9-8) can then be applied.
In the frequency domain, Eq. (4.9-8) is written as

g(x, y) = ᑣ⁻¹{F(u, v) − H(u, v)F(u, v)}
        = ᑣ⁻¹{[1 − H(u, v)]F(u, v)}
        = ᑣ⁻¹{[1 + 4π²D²(u, v)]F(u, v)}     (4.9-9)

Although this result is elegant, it has the same scaling issues just mentioned, compounded by the fact that the normalizing factor is not as easily computed. For this reason, Eq. (4.9-8) is the preferred implementation in the frequency domain, with ∇²f(x, y) computed using Eq. (4.9-7) and scaled using the approach mentioned in the previous paragraph.
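A sketch of the preferred implementation just described (ours; padding is omitted for brevity, so a small amount of wraparound blurring is accepted):

```python
import numpy as np

def laplacian_sharpen(f):
    """Eqs. (4.9-7) and (4.9-8) with the normalization described above:
    f scaled to [0, 1], the Laplacian to roughly [-1, 1], and c = -1."""
    f = f.astype(float) / f.max()                    # normalize f to [0, 1]
    M, N = f.shape
    F = np.fft.fftshift(np.fft.fft2(f))
    u = np.arange(M)[:, None]
    v = np.arange(N)[None, :]
    H = -4.0 * np.pi ** 2 * ((u - M / 2) ** 2 + (v - N / 2) ** 2)  # Eq. (4.9-6)
    lap = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    lap = lap / np.abs(lap).max()                    # bring to approx. [-1, 1]
    g = f - lap                                      # Eq. (4.9-8) with c = -1
    return np.clip(g, 0, 1)
```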

EXAMPLE 4.20: Image sharpening in the frequency domain using the Laplacian.

■ Figure 4.58(a) is the same as Fig. 3.38(a), and Fig. 4.58(b) shows the result of using Eq. (4.9-8), in which the Laplacian was computed in the frequency domain using Eq. (4.9-7). Scaling was done as described in connection with that equation. We see by comparing Figs. 4.58(b) and 3.38(e) that the frequency domain and spatial results are identical visually. Observe that the results in these two figures correspond to the Laplacian mask in Fig. 3.37(b), which has a −8 in the center (Problem 4.26). ■

a b
FIGURE 4.58
(a) Original,
blurry image.
(b) Image
enhanced using
the Laplacian in
the frequency
domain. Compare
with Fig. 3.38(e).

4.9.5 Unsharp Masking, Highboost Filtering, and High-Frequency-Emphasis Filtering
In this section, we discuss frequency domain formulations of the unsharp
masking and high-boost filtering image sharpening techniques introduced in
Section 3.6.3. Using frequency domain methods, the mask defined in Eq. (3.6-8)
is given by

gmask(x, y) = f(x, y) - fLP (x, y) (4.9-10)

with

fLP(x, y) = ᑣ⁻¹[HLP(u, v)F(u, v)]     (4.9-11)

where HLP (u, v) is a lowpass filter and F(u, v) is the Fourier transform of
f(x, y). Here, fLP (x, y) is a smoothed image analogous to f(x, y) in Eq. (3.6-8).
Then, as in Eq. (3.6-9),

g(x, y) = f(x, y) + k * gmask(x, y) (4.9-12)

This expression defines unsharp masking when k = 1 and highboost filtering when k > 1. Using the preceding results, we can express Eq. (4.9-12) entirely in terms of frequency domain computations involving a lowpass filter:

g(x, y) = ᑣ⁻¹{[1 + k * [1 − HLP(u, v)]]F(u, v)}     (4.9-13)

Using Eq. (4.9-1), we can express this result in terms of a highpass filter:

g(x, y) = ᑣ⁻¹{[1 + k * HHP(u, v)]F(u, v)}     (4.9-14)



The expression contained within the square brackets is called a high-frequency-


emphasis filter. As noted earlier, highpass filters set the dc term to zero, thus
reducing the average intensity in the filtered image to 0. The high-frequency-
emphasis filter does not have this problem because of the 1 that is added to the
highpass filter. The constant, k, gives control over the proportion of high fre-
quencies that influence the final result. A slightly more general formulation of
high-frequency-emphasis filtering is the expression

g(x, y) = ᑣ⁻¹{[k1 + k2 * HHP(u, v)]F(u, v)}     (4.9-15)

where k1 ≥ 0 controls the offset from the origin [see Fig. 4.31(c)] and k2 ≥ 0 controls the contribution of high frequencies.
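A sketch of Eq. (4.9-15) with a Gaussian highpass filter (ours; the constants default to those used in Example 4.21 below, and padding is again omitted for brevity):

```python
import numpy as np

def high_freq_emphasis(f, D0=40.0, k1=0.5, k2=0.75):
    """High-frequency-emphasis filtering, Eq. (4.9-15), using a GHPF."""
    M, N = f.shape
    F = np.fft.fftshift(np.fft.fft2(f.astype(float)))
    u = np.arange(M)[:, None]
    v = np.arange(N)[None, :]
    D2 = (u - M / 2) ** 2 + (v - N / 2) ** 2
    Hhp = 1.0 - np.exp(-D2 / (2.0 * D0 ** 2))        # GHPF, Eq. (4.9-4)
    G = (k1 + k2 * Hhp) * F
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))
```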

EXAMPLE 4.21: Image enhancement using high-frequency-emphasis filtering.

■ Figure 4.59(a) shows a 416 * 596 chest X-ray with a narrow range of intensity levels. The objective of this example is to enhance the image using high-frequency-emphasis filtering. X-rays cannot be focused in the same manner that optical lenses are focused, and the resulting images generally tend to be slightly blurred. Because the intensities in this particular image are biased toward the dark end of the gray scale, we also take this opportunity to give an example of how spatial domain processing can be used to complement frequency-domain filtering.

Figure 4.59(b) shows the result of highpass filtering using a Gaussian filter with D0 = 40 (approximately 5% of the short dimension of the padded image). (Artifacts such as ringing are unacceptable in medical imaging. Thus, it is good practice to avoid using filters that have the potential for introducing artifacts in the processed image. Because spatial and frequency domain Gaussian filters are Fourier transform pairs, these filters produce smooth results that are void of artifacts.) As expected, the filtered result is rather featureless, but it shows faintly the principal edges in the image. Figure 4.59(c) shows the advantage of high-emphasis filtering, where we used Eq. (4.9-15) with k1 = 0.5 and k2 = 0.75. Although the image is still dark, the gray-level tonality due to the low-frequency components was not lost.

As discussed in Section 3.3.1, an image characterized by intensity levels in a narrow range of the gray scale is an ideal candidate for histogram equalization. As Fig. 4.59(d) shows, this was indeed an appropriate method to further enhance the image. Note the clarity of the bone structure and other details that simply are not visible in any of the other three images. The final enhanced image is a little noisy, but this is typical of X-ray images when their gray scale is expanded. The result obtained using a combination of high-frequency emphasis and histogram equalization is superior to the result that would be obtained by using either method alone. ■

4.9.6 Homomorphic Filtering


The illumination-reflectance model introduced in Section 2.3.4 can be used to develop a frequency domain procedure for improving the appearance of an image by simultaneous intensity range compression and contrast enhancement. From the discussion in that section, an image f(x, y) can be expressed as the product of its illumination, i(x, y), and reflectance, r(x, y), components:

$$f(x, y) = i(x, y)\, r(x, y) \qquad (4.9\text{-}16)$$



FIGURE 4.59 (a) A chest X-ray image. (b) Result of highpass filtering with a Gaussian filter. (c) Result of high-frequency-emphasis filtering using the same filter. (d) Result of performing histogram equalization on (c). (Original image courtesy of Dr. Thomas R. Gest, Division of Anatomical Sciences, University of Michigan Medical School.)

This equation cannot be used directly to operate on the frequency components of illumination and reflectance because the Fourier transform of a product is not the product of the transforms:

$$\mathfrak{I}[f(x, y)] \neq \mathfrak{I}[i(x, y)]\, \mathfrak{I}[r(x, y)] \qquad (4.9\text{-}17)$$
However, suppose that we define

$$z(x, y) = \ln f(x, y) = \ln i(x, y) + \ln r(x, y) \qquad (4.9\text{-}18)$$

(If an image f(x, y) with intensities in the range [0, L - 1] has any 0 values, a 1 must be added to every element of the image to avoid having to deal with ln(0). The 1 is then subtracted at the end of the filtering process.) Then,

$$\mathfrak{I}\{z(x, y)\} = \mathfrak{I}\{\ln f(x, y)\} = \mathfrak{I}\{\ln i(x, y)\} + \mathfrak{I}\{\ln r(x, y)\} \qquad (4.9\text{-}19)$$

or

$$Z(u, v) = F_i(u, v) + F_r(u, v) \qquad (4.9\text{-}20)$$



where $F_i(u, v)$ and $F_r(u, v)$ are the Fourier transforms of $\ln i(x, y)$ and $\ln r(x, y)$, respectively.
We can filter $Z(u, v)$ using a filter $H(u, v)$ so that

$$S(u, v) = H(u, v)Z(u, v) = H(u, v)F_i(u, v) + H(u, v)F_r(u, v) \qquad (4.9\text{-}21)$$

The filtered image in the spatial domain is

$$s(x, y) = \mathfrak{I}^{-1}\{S(u, v)\} = \mathfrak{I}^{-1}\{H(u, v)F_i(u, v)\} + \mathfrak{I}^{-1}\{H(u, v)F_r(u, v)\} \qquad (4.9\text{-}22)$$

By defining

$$i'(x, y) = \mathfrak{I}^{-1}\{H(u, v)F_i(u, v)\} \qquad (4.9\text{-}23)$$

and

$$r'(x, y) = \mathfrak{I}^{-1}\{H(u, v)F_r(u, v)\} \qquad (4.9\text{-}24)$$

we can express Eq. (4.9-22) in the form

$$s(x, y) = i'(x, y) + r'(x, y) \qquad (4.9\text{-}25)$$

Finally, because z(x, y) was formed by taking the natural logarithm of the input image, we reverse the process by taking the exponential of the filtered result to form the output image:

$$g(x, y) = e^{s(x, y)} = e^{i'(x, y)}\, e^{r'(x, y)} = i_0(x, y)\, r_0(x, y) \qquad (4.9\text{-}26)$$

where

$$i_0(x, y) = e^{i'(x, y)} \qquad (4.9\text{-}27)$$

and

$$r_0(x, y) = e^{r'(x, y)} \qquad (4.9\text{-}28)$$

are the illumination and reflectance components of the output (processed) image.

FIGURE 4.60 Summary of steps in homomorphic filtering:
f(x, y) → ln → DFT → H(u, v) → (DFT)⁻¹ → exp → g(x, y)

The filtering approach just derived is summarized in Fig. 4.60. This method
is based on a special case of a class of systems known as homomorphic systems.
In this particular application, the key to the approach is the separation of the
illumination and reflectance components achieved in the form shown in
Eq. (4.9-20). The homomorphic filter function H(u, v) then can operate on
these components separately, as indicated by Eq. (4.9-21).
The illumination component of an image generally is characterized by slow spatial variations, while the reflectance component tends to vary abruptly, particularly at the junctions of dissimilar objects. These characteristics lead to associating the low frequencies of the Fourier transform of the logarithm of an image with illumination and the high frequencies with reflectance. Although these associations are rough approximations, they can be used to advantage in image filtering, as illustrated in Example 4.22.
A good deal of control can be gained over the illumination and reflectance components with a homomorphic filter. This control requires specification of a filter function H(u, v) that affects the low- and high-frequency components of the Fourier transform in different, controllable ways. Figure 4.61 shows a cross section of such a filter. If the parameters $\gamma_L$ and $\gamma_H$ are chosen so that $\gamma_L < 1$ and $\gamma_H > 1$, the filter function in Fig. 4.61 tends to attenuate the contribution made by the low frequencies (illumination) and amplify the contribution made by the high frequencies (reflectance). The net result is simultaneous dynamic range compression and contrast enhancement.
The shape of the function in Fig. 4.61 can be approximated using the basic form of a highpass filter. For example, using a slightly modified form of the Gaussian highpass filter yields the function

$$H(u, v) = (\gamma_H - \gamma_L)\left[1 - e^{-c\, D^2(u, v)/D_0^2}\right] + \gamma_L \qquad (4.9\text{-}29)$$

FIGURE 4.61 Radial cross section of a circularly symmetric homomorphic filter function. The vertical axis shows H(u, v), which rises from γL at low frequencies to γH at high frequencies; it is located at the center of the frequency rectangle, and D(u, v) is the distance from the center.

where D(u, v) is defined in Eq. (4.8-2) and the constant c controls the sharpness of the slope of the function as it transitions between $\gamma_L$ and $\gamma_H$. This filter is similar to the high-frequency-emphasis filter discussed in the previous section.
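
The pipeline of Fig. 4.60 is compact enough to sketch directly. Below is a minimal NumPy version, assuming a 2-D grayscale array and the filter of Eq. (4.9-29); the function and parameter names are ours, with defaults mirroring Example 4.22 below.

import numpy as np

def homomorphic_filter(f, gamma_l=0.25, gamma_h=2.0, c=1.0, d0=80.0):
    # ln -> DFT -> H(u, v) -> (DFT)^-1 -> exp, as in Fig. 4.60.
    M, N = f.shape
    P, Q = 2 * M, 2 * N                          # padded size, so H is P x Q
    z = np.log1p(f.astype(np.float64))           # ln(f + 1) sidesteps ln(0)

    zp = np.zeros((P, Q))
    zp[:M, :N] = z
    Z = np.fft.fftshift(np.fft.fft2(zp))

    u = np.arange(P) - P / 2
    v = np.arange(Q) - Q / 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2       # D^2(u, v)
    H = (gamma_h - gamma_l) * (1.0 - np.exp(-c * D2 / d0 ** 2)) + gamma_l  # Eq. (4.9-29)

    s = np.real(np.fft.ifft2(np.fft.ifftshift(H * Z)))[:M, :N]
    return np.expm1(s)                           # exp(s) - 1 undoes the +1

The log1p/expm1 pair implements the note following Eq. (4.9-18): add 1 before taking the logarithm, then subtract it after filtering.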

EXAMPLE 4.22: Image enhancement using homomorphic filtering.

■ Figure 4.62(a) shows a full body PET (Positron Emission Tomography) scan of size 1162 × 746 pixels. The image is slightly blurry and many of its low-intensity features are obscured by the high intensity of the “hot spots” dominating the dynamic range of the display. (These hot spots were caused by a tumor in the brain and one in the lungs.) Figure 4.62(b) was obtained by homomorphic filtering Fig. 4.62(a) using the filter in Eq. (4.9-29) with $\gamma_L = 0.25$, $\gamma_H = 2$, $c = 1$, and $D_0 = 80$. A cross section of this filter looks just like Fig. 4.61, with a slightly steeper slope. (Recall that filtering uses image padding, so the filter is of size P × Q.)
Note in Fig. 4.62(b) how much sharper the hot spots, the brain, and the skeleton are in the processed image, and how much more detail is visible. Reducing the effect of the dominant illumination components (the hot spots) freed the dynamic range of the display to make lower intensities much more visible. Similarly, because homomorphic filtering enhances the high frequencies, the reflectance components of the image (edge information) were sharpened considerably. The enhanced image in Fig. 4.62(b) is a significant improvement over the original. ■

FIGURE 4.62 (a) Full body PET scan. (b) Image enhanced using homomorphic filtering. (Original image courtesy of Dr. Michael E. Casey, CTI PET Systems.)

4.10 Selective Filtering

The filters discussed in the previous two sections operate over the entire frequency rectangle. There are applications in which it is of interest to process specific bands of frequencies or small regions of the frequency rectangle. Filters in the first category are called bandreject or bandpass filters; filters in the second category are called notch filters.

4.10.1 Bandreject and Bandpass Filters

These types of filters are easy to construct using the concepts from the previous two sections. Table 4.6 shows expressions for ideal, Butterworth, and Gaussian bandreject filters, where D(u, v) is the distance from the center of the frequency rectangle, as given in Eq. (4.8-2), D0 is the radial center of the band, and W is the width of the band. Figure 4.63(a) shows a Gaussian bandreject filter in image form, where black is 0 and white is 1.
A bandpass filter is obtained from a bandreject filter in the same manner
that we obtained a highpass filter from a lowpass filter:

HBP (u, v) = 1 - HBR (u, v) (4.10-1)

Figure 4.63(b) shows a Gaussian bandpass filter in image form.
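
As a sketch of how the Gaussian entry of Table 4.6 might be coded for a P × Q frequency rectangle (the function name and the small guard against division by zero at the exact center are our additions):

import numpy as np

def gaussian_bandreject(P, Q, d0, w):
    # H(u, v) = 1 - exp(-[(D^2 - D0^2) / (D W)]^2), per Table 4.6
    u = np.arange(P) - P / 2
    v = np.arange(Q) - Q / 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    D = np.maximum(D, 1e-8)        # avoid 0/0 at the center of the rectangle
    return 1.0 - np.exp(-((D ** 2 - d0 ** 2) / (D * w)) ** 2)

# The corresponding bandpass filter follows from Eq. (4.10-1):
# H_bp = 1.0 - gaussian_bandreject(P, Q, d0, w)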

4.10.2 Notch Filters

Notch filters are the most useful of the selective filters. A notch filter rejects (or passes) frequencies in a predefined neighborhood about the center of the frequency rectangle. Zero-phase-shift filters must be symmetric about the origin, so a notch with center at $(u_0, v_0)$ must have a corresponding notch at location $(-u_0, -v_0)$. Notch reject filters are constructed as products of highpass filters whose centers have been translated to the centers of the notches. The general form is:

$$H_{NR}(u, v) = \prod_{k=1}^{Q} H_k(u, v)\, H_{-k}(u, v) \qquad (4.10\text{-}2)$$

where $H_k(u, v)$ and $H_{-k}(u, v)$ are highpass filters whose centers are at $(u_k, v_k)$ and $(-u_k, -v_k)$, respectively. These centers are specified with respect to the center of the frequency rectangle, (M/2, N/2).

TABLE 4.6 Bandreject filters. W is the width of the band, D is the distance D(u, v) from the center of the filter, D0 is the cutoff frequency, and n is the order of the Butterworth filter. We show D instead of D(u, v) to simplify the notation in the table.

Ideal:
$$H(u, v) = \begin{cases} 0 & \text{if } D_0 - \dfrac{W}{2} \le D \le D_0 + \dfrac{W}{2} \\[4pt] 1 & \text{otherwise} \end{cases}$$

Butterworth:
$$H(u, v) = \dfrac{1}{1 + \left[\dfrac{DW}{D^2 - D_0^2}\right]^{2n}}$$

Gaussian:
$$H(u, v) = 1 - e^{-\left[\frac{D^2 - D_0^2}{DW}\right]^2}$$

FIGURE 4.63 (a) Bandreject Gaussian filter. (b) Corresponding bandpass filter. The thin black border in (a) was added for clarity; it is not part of the data.

The distance computations for each filter are thus carried out using the expressions

$$D_k(u, v) = \left[(u - M/2 - u_k)^2 + (v - N/2 - v_k)^2\right]^{1/2} \qquad (4.10\text{-}3)$$

and

$$D_{-k}(u, v) = \left[(u - M/2 + u_k)^2 + (v - N/2 + v_k)^2\right]^{1/2} \qquad (4.10\text{-}4)$$

For example, the following is a Butterworth notch reject filter of order n, containing three notch pairs:

$$H_{NR}(u, v) = \prod_{k=1}^{3} \left[\frac{1}{1 + [D_{0k}/D_k(u, v)]^{2n}}\right] \left[\frac{1}{1 + [D_{0k}/D_{-k}(u, v)]^{2n}}\right] \qquad (4.10\text{-}5)$$

where $D_k$ and $D_{-k}$ are given by Eqs. (4.10-3) and (4.10-4). The constant $D_{0k}$ is the same for each pair of notches, but it can be different for different pairs. Other notch reject filters are constructed in the same manner, depending on the highpass filter chosen. As with the filters discussed earlier, a notch pass filter is obtained from a notch reject filter using the expression

$$H_{NP}(u, v) = 1 - H_{NR}(u, v) \qquad (4.10\text{-}6)$$
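
The product in Eq. (4.10-2) translates directly into a loop over notch pairs. The following is a minimal sketch of the Butterworth case, Eq. (4.10-5), generalized to any number of pairs; the function name and the guard against division by zero at a notch center are our additions, not part of the text.

import numpy as np

def butterworth_notch_reject(P, Q, centers, d0, n):
    # centers holds (u_k, v_k) offsets from the center of the
    # P x Q frequency rectangle; d0 plays the role of D_0k,
    # taken here as the same value for every pair.
    u = np.arange(P)[:, None] - P / 2
    v = np.arange(Q)[None, :] - Q / 2
    H = np.ones((P, Q))
    for uk, vk in centers:
        Dk = np.maximum(np.sqrt((u - uk) ** 2 + (v - vk) ** 2), 1e-8)   # Eq. (4.10-3)
        Dmk = np.maximum(np.sqrt((u + uk) ** 2 + (v + vk) ** 2), 1e-8)  # Eq. (4.10-4)
        H *= 1.0 / (1.0 + (d0 / Dk) ** (2 * n))
        H *= 1.0 / (1.0 + (d0 / Dmk) ** (2 * n))
    return H  # the notch pass version is 1 - H, Eq. (4.10-6)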

As the next three examples show, one of the principal applications of notch filtering is for selectively modifying local regions of the DFT. This type of processing typically is done interactively, working directly on DFTs obtained without padding. The advantages of working interactively with actual DFTs (as opposed to having to “translate” from padded to actual frequency values) outweigh any wraparound errors that may result from not using padding in the filtering process. Also, as we show in Section 5.4.4, even more powerful notch filtering techniques than those discussed here are based on unpadded DFTs. To get an idea of how DFT values change as a function of padding, see Problem 4.22.

EXAMPLE 4.23: Reduction of moiré patterns using notch filtering.

■ Figure 4.64(a) is the scanned newspaper image from Fig. 4.21, showing a prominent moiré pattern, and Fig. 4.64(b) is its spectrum. We know from Table 4.3 that the Fourier transform of a pure sine, which is a periodic function, is a pair of conjugate symmetric impulses. The symmetric “impulse-like” bursts in Fig. 4.64(b) are a result of the near periodicity of the moiré pattern. We can attenuate these bursts by using notch filtering.

FIGURE 4.64 (a) Sampled newspaper image showing a moiré pattern. (b) Spectrum. (c) Butterworth notch reject filter multiplied by the Fourier transform. (d) Filtered image.

Figure 4.64(c) shows the result of multiplying the DFT of Fig. 4.64(a) by a Butterworth notch reject filter with D0 = 3 and n = 4 for all notch pairs. The value of the radius was selected (by visual inspection of the spectrum) to encompass the energy bursts completely, and the value of n was selected to give notches with mildly sharp transitions. The locations of the centers of the notches were determined interactively from the spectrum. Figure 4.64(d) shows the result obtained with this filter using the procedure outlined in Section 4.7.3. The improvement is significant, considering the low resolution and degradation of the original image. ■

EXAMPLE 4.24: Enhancement of corrupted Cassini Saturn image by notch filtering.

■ Figure 4.65(a) shows an image of part of the rings surrounding the planet Saturn. This image was captured by Cassini, the first spacecraft to enter the planet’s orbit. The vertical sinusoidal pattern was caused by an AC signal superimposed on the camera video signal just prior to digitizing the image. This was an unexpected problem that corrupted some images from the mission. Fortunately, this type of interference is fairly easy to correct by postprocessing. One approach is to use notch filtering.
Figure 4.65(b) shows the DFT spectrum. Careful analysis of the vertical axis reveals a series of small bursts of energy which correspond to the nearly sinusoidal interference.

FIGURE 4.65 (a) 674 × 674 image of the Saturn rings showing nearly periodic interference. (b) Spectrum: the bursts of energy in the vertical axis near the origin correspond to the interference pattern. (c) A vertical notch reject filter. (d) Result of filtering. The thin black border in (c) was added for clarity; it is not part of the data. (Original image courtesy of Dr. Robert A. West, NASA/JPL.)

FIGURE 4.66 (a) Result (spectrum) of applying a notch pass filter to the DFT of Fig. 4.65(a). (b) Spatial pattern obtained by computing the IDFT of (a).

A simple approach is to use a narrow, rectangular notch reject filter starting with the lowest frequency burst and extending along the rest of the vertical axis. Figure 4.65(c) shows such a filter (white represents 1 and black 0). Figure 4.65(d) shows the result of filtering the corrupted image with this filter. This result is a significant improvement over the original image.
We isolated the frequencies in the vertical axis using a notch pass version of the same filter [Fig. 4.66(a)]. Then, as Fig. 4.66(b) shows, the IDFT of these frequencies yielded the spatial interference pattern itself. ■

4.11 Implementation
We have focused attention thus far on theoretical concepts and on examples of
filtering in the frequency domain. One thing that should be clear by now is that
computational requirements in this area of image processing are not trivial.
Thus, it is important to develop a basic understanding of methods by which
Fourier transform computations can be simplified and speeded up. This sec-
tion deals with these issues.

4.11.1 Separability of the 2-D DFT

As mentioned in Table 4.2, the 2-D DFT is separable into 1-D transforms. We can write Eq. (4.5-15) as

$$F(u, v) = \sum_{x=0}^{M-1} e^{-j2\pi ux/M} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi vy/N} = \sum_{x=0}^{M-1} F(x, v)\, e^{-j2\pi ux/M} \qquad (4.11\text{-}1)$$

where

$$F(x, v) = \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi vy/N} \qquad (4.11\text{-}2)$$
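
In code, separability means the 2-D DFT can be computed as 1-D DFTs along the rows followed by 1-D DFTs along the columns. A minimal NumPy sketch (the function name is ours):

import numpy as np

def dft2_via_rows_then_cols(f):
    F_xv = np.fft.fft(f, axis=1)     # Eq. (4.11-2): 1-D DFT of each row
    return np.fft.fft(F_xv, axis=0)  # Eq. (4.11-1): 1-D DFT of each column

# Sanity check against the library's direct 2-D DFT:
f = np.random.rand(4, 6)
assert np.allclose(dft2_via_rows_then_cols(f), np.fft.fft2(f))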
