DIP Material

Introduction

What Is Digital Image Processing?


An image may be defined as a two-dimensional function, f(x, y), where
x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is
called the intensity or gray level of the image at that point. When x, y, and the amplitude values
of f are all finite, discrete quantities, we call the image a digital image. The field of digital image
processing refers to processing digital images by means of a digital computer. Note that a digital
image is composed of a finite number of elements, each of which has a particular location and
value. These elements are referred to as picture elements, image elements, pels, and pixels. Pixel
is the term most widely used to denote the elements of a digital image.

Fundamental Steps in Digital Image Processing

The steps involved in digital image processing fall into two broad categories:
1. Methods whose inputs and outputs are images.
2. Methods whose inputs may be images but whose outputs are attributes extracted from those images.

i) Image acquisition
It could be as simple as being given an image that is already in digital form. Generally, the image acquisition stage also involves preprocessing such as scaling.

ii) Image Enhancement


It is among the simplest and most appealing areas of digital image processing. The idea is to bring out details that are obscured, or simply to highlight certain features of interest in an image. Image enhancement is a very subjective area of image processing.
iii) Image Restoration
It also deals with improving the appearance of an image. However, it is an objective approach, in the sense that restoration techniques tend to be based on mathematical or probabilistic models of image degradation. Enhancement, on the other hand, is based on human subjective preferences regarding what constitutes a “good” enhancement result.

iv) Color image processing


It is an area that has been gaining importance because of the widespread use of digital images over the internet. Color image processing deals with color models and their implementation in image processing applications.

v) Wavelets and Multiresolution Processing


These are the foundation for representing images in various degrees of resolution.

vi) Compression
It deals with techniques for reducing the storage required to save an image, or the bandwidth required to transmit it over a network. It has two major approaches:
a) Lossless Compression
b) Lossy Compression

vii) Morphological processing


It deals with tools for extracting image components that are useful in the representation and description of the shape and boundary of objects. It is mainly used in automated inspection applications.

viii) Representation and Description


It almost always follows the output of a segmentation stage, which is raw pixel data constituting either the boundary of a region or all the points in the region itself. In either case, converting the data to a form suitable for computer processing is necessary.

ix) Recognition
It is the process that assigns a label to an object based on its descriptors. It is the last step of image processing, and it may use artificial intelligence techniques.

x) Knowledge base
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge base. This knowledge may be as simple as detailing regions of an image where the information of interest is known to be located, thus limiting the search that has to be conducted in seeking that information. The knowledge base can also be quite complex, such as an interrelated list of all major possible defects in a materials inspection problem, or an image database containing high-resolution satellite images of a region in connection with change-detection applications.
Components of an Image Processing System

Image Sensors
With reference to sensing, two elements are required to acquire digital images. The first is a physical device that is sensitive to the energy radiated by the object we wish to image; the second is a digitizer, a device for converting the output of the physical sensing device into digital form.

Specialized image processing hardware:


It consists of the digitizer just mentioned, plus hardware that performs other primitive operations, such as an arithmetic logic unit (ALU), which performs arithmetic (e.g., addition and subtraction) and logical operations in parallel on entire images.

Computer:
It is a general-purpose computer and can range from a PC to a supercomputer, depending on the application. In dedicated applications, specially designed computers are sometimes used to achieve a required level of performance.

Software
It consists of specialized modules that perform specific tasks. A well-designed package also includes the capability for the user to write code that, as a minimum, utilizes the specialized modules. More sophisticated software packages allow the integration of these modules.

Mass storage
This capability is a must in image processing applications. An image of size 1024 × 1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires one megabyte of storage space if the image is not compressed.
Digital storage for image processing applications falls into three principal categories:
i) Short-term storage for use during processing
ii) On-line storage for relatively fast retrieval
iii) Archival storage, such as magnetic tapes and disks

Image displays
Image displays in use today are mainly color TV monitors. These monitors are driven by the outputs of image and graphics display cards that are an integral part of the computer system.

Hardcopy devices
The devices for recording images include laser printers, film cameras, heat-sensitive devices, inkjet units, and digital units such as optical and CD-ROM disks. Film provides the highest possible resolution, but paper is the obvious medium of choice for written applications.

Networking
It is almost a default function in any computer system in use today. Because of the large amount of data inherent in image processing applications, the key consideration in image transmission is bandwidth.

Elements of Visual Perception

Structure of the human Eye:


The eye is nearly a sphere, with an average diameter of approximately 20 mm. The eye is enclosed by three membranes:

a) The cornea and sclera: the cornea is a tough, transparent tissue that covers the anterior surface of the eye. The rest of the optic globe is covered by the sclera.
b) The choroid: it contains a network of blood vessels that serve as the major source of nutrition to the eye. It helps to reduce the amount of extraneous light entering the eye. It has two parts:
(1) the iris diaphragm, which contracts or expands to control the amount of light that enters the eye, and
(2) the ciliary body.
The lens is made up of concentric layers of fibrous cells and is suspended by fibers that attach to
the ciliary body. It contains 60 to 70% water, about 6% fat, and more protein than any other
tissue in the eye.
The lens is colored by a slightly yellow pigmentation that increases with age. In extreme cases,
excessive clouding of the lens, caused by the affliction commonly referred to as cataracts, can
lead to poor color discrimination and loss of clear vision.
c) Retina: it is the innermost membrane of the eye. When the eye is properly focused, light from an object outside the eye is imaged on the retina. There are various light receptors over the surface of the retina.
The two major classes of receptors are:
1) Cones: these number about 6 to 7 million and are located primarily in the central portion of the retina, called the fovea. They are highly sensitive to color. Humans can resolve fine detail with cones because each one is connected to its own nerve end. Cone vision is called photopic or bright-light vision.
2) Rods: these are much more numerous, about 75 to 150 million, and are distributed over the entire retinal surface. The larger area of distribution, and the fact that several rods are connected to a single nerve end, give a general, overall picture of the field of view. They are not involved in color vision and are sensitive to low levels of illumination. Rod vision is called scotopic or dim-light vision.
The area of the retina where the optic nerve emerges has no receptors; it is called the blind spot. The figure shows the density of rods and cones for a cross section of the right eye passing through the region of emergence of the optic nerve from the eye.

Image Formation in the Eye:


 The major difference between the lens of the eye and an ordinary optical lens is that the former is flexible.
 The shape of the lens of the eye is controlled by tension in the fibers of the ciliary body. To focus on distant objects, the controlling muscles cause the lens to be relatively flattened; to focus on objects near the eye, these muscles allow the lens to become thicker.
 The distance between the center of the lens and the retina is called the focal length; it varies from approximately 17 mm to about 14 mm as the refractive power of the lens increases from its minimum to its maximum.
When the eye focuses on an object farther away than about 3 m, the lens exhibits its lowest refractive power. When the eye focuses on a nearby object, the lens is most strongly refractive.

The geometry in Fig. illustrates how to obtain the dimensions of an image formed on the retina.
For example, suppose that a person is looking at a tree 15 m high at a distance of 100 m. Letting
h denote the height of that object in the retinal image, the geometry of Fig. yields 15/100 = h/17
or h = 2.55mm. The retinal image is focused primarily on the region of the fovea. Perception
then takes place by the relative excitation of light receptors, which transform radiant energy into
electrical impulses that ultimately are decoded by the brain.

Brightness Adaption and Discrimination:


Digital images are displayed as a discrete set of intensities. The range of light intensity levels to which the human visual system can adapt is enormous, on the order of 10^10, from the scotopic threshold to the glare limit. Experimental evidence indicates that subjective brightness is a logarithmic function of the light intensity incident on the eye.

The curve represents the range of intensities to which the visual system can adapt. However, the visual system cannot operate over such a dynamic range simultaneously. Rather, this large variation is accomplished by changes in its overall sensitivity, a phenomenon called brightness adaptation. For any given set of conditions, the current sensitivity level of the visual system is called the brightness adaptation level, Ba in the curve. The small intersecting curve represents the range of subjective brightness that the eye can perceive when adapted to this level. It is restricted at level Bb, at and below which all stimuli are perceived as indistinguishable blacks. The upper portion of the curve is not actually restricted; higher intensities would simply raise the adaptation level above Ba. The ability of the eye to discriminate between changes in light intensity at any specific adaptation level is also of considerable interest. Take a flat, uniformly illuminated area large enough to occupy the entire field of view of the subject. It may be a diffuser, such as opaque glass, that is illuminated from behind by a light source whose intensity, I, can be varied. To this field is added an increment of illumination ΔI in the form of a short-duration flash that appears as a circle in the center of the uniformly illuminated field. If ΔI is not bright enough, the subject cannot see any perceivable change.

As ΔI gets stronger, the subject may begin to indicate a perceived change. ΔIc is the increment of illumination discernible 50% of the time with background illumination I. The quantity ΔIc/I is called the Weber ratio.
A small Weber ratio means that a small percentage change in intensity is discernible, representing “good” brightness discrimination.
A large Weber ratio means that a large percentage change in intensity is required, representing “poor” brightness discrimination.
A plot of log ΔIc/I as a function of log I has the general shape shown in Fig. This curve shows that brightness discrimination is poor (the Weber ratio is large) at low levels of illumination, and it improves significantly (the Weber ratio decreases) as background illumination increases.
Two phenomena clearly demonstrate that perceived brightness is not a simple function of intensity. The
first is based on the fact that the visual system tends to undershoot or overshoot around the boundary of
regions of different intensities. Figure 2.7(a) shows a striking example of this phenomenon. Although the intensity of the stripes is constant, we actually perceive a brightness pattern that is strongly scalloped near the boundaries. These seemingly scalloped bands are called Mach bands.

The second phenomenon, called simultaneous contrast, is related to the fact that a region’s
perceived brightness does not depend simply on its intensity, as Fig demonstrates. All the center
squares have exactly the same intensity.

However, they appear to the eye to become darker as the background gets lighter.

Other examples of human perception phenomena are optical illusions, in which the eye fills in
nonexisting information or wrongly perceives geometrical properties of objects. Figure shows
some examples. In Fig. (a), the outline of a square is seen clearly, despite the fact that no lines defining such a figure are part of the image. The same effect, this time with a circle, can be seen in Fig. (b); note how just a few lines are sufficient to give the illusion of a complete circle. The
two horizontal line segments in Fig. (c) are of the same length, but one appears shorter than the
other. Finally, all lines in Fig.(d) that are oriented at 45° are equidistant and parallel. Yet the
crosshatching creates the illusion that those lines are far from being parallel. Optical illusions are
a characteristic of the human visual system that is not fully understood.
Image Sensing and Acquisition:

The types of images in which we are interested are generated by the combination of an
“illumination” source and the reflection or absorption of energy from that source by the elements
of the “scene” being imaged.

Depending on the nature of the source, illumination energy is reflected from, or transmitted
through, objects. An example in the first category is light reflected from a planar surface. An
example in the second category is when X-rays pass through a patient’s body for the purpose of
generating a diagnostic X-ray film. In some applications, the reflected or transmitted energy is
focused onto a photo converter (e.g., a phosphor screen), which converts the energy into visible
light. Electron microscopy and some applications of gamma imaging use this approach.
The idea is simple: Incoming energy is transformed into a voltage by the combination of input
electrical power and sensor material that is responsive to the particular type of energy being
detected.
The output voltage waveform is the response of the sensor(s), and a digital quantity is obtained
from each sensor by digitizing its response. In this section, we look at the principal modalities for
image sensing and generation.
Figure 2.12 shows the three principal sensor arrangements used to transform illumination energy into digital images.
Image Acquisition Using a Single Sensor:
The figure shows the components of a single sensor. The most familiar sensor of this type is the photodiode, which is constructed of silicon materials and whose output voltage waveform is proportional to light. The use of a filter in front of a sensor improves selectivity.
For example, a green (pass) filter in front of a light sensor favors light in the green band of
the color spectrum. As a consequence, the sensor output will be stronger for green light than for
other components in the visible spectrum.
In order to generate a 2-D image using a single sensor, there has to be relative
displacements in both the x- and y-directions between the sensor and the area to be imaged.
Figure 2.13 shows an arrangement used in high-precision scanning, where a film negative is
mounted onto a drum whose mechanical rotation provides displacement in one dimension. The
single sensor is mounted on a lead screw that provides motion in the perpendicular direction.
Since mechanical motion can be controlled with high precision, this method is an inexpensive
(but slow) way to obtain high-resolution images. Other similar mechanical arrangements use a
flat bed, with the sensor moving in two linear directions. These types of mechanical digitizers
sometimes are referred to as microdensitometers.
Image Acquisition Using Sensor Strips:
 A geometry that is used much more frequently than single sensors consists of an in-line arrangement of sensors in the form of a sensor strip, as shown in the figure.
 The strip provides imaging elements in one direction. Motion perpendicular to the strip
provides imaging in the other direction. This is the type of arrangement used in most flat
bed scanners.
 Sensing devices with 4000 or more in-line sensors are possible.
 In-line sensors are used routinely in airborne imaging applications, in which the imaging
system is mounted on an aircraft that flies at a constant altitude and speed over the
geographical area to be imaged.
 One- dimensional imaging sensor strips that respond to various bands of the
electromagnetic spectrum are mounted perpendicular to the direction of flight.
 The imaging strip gives one line of an image at a time, and the motion of the strip
completes the other dimension of a two-dimensional image.
 Lenses or other focusing schemes are used to project the area to be scanned onto the sensors.
 Sensor strips mounted in a ring configuration are used in medical and industrial imaging
to obtain cross-sectional (“slice”) images of 3-D objects

Image Acquisition Using Sensor Arrays:


Individual sensors can also be arranged in the form of a 2-D array. Numerous electromagnetic and some ultrasonic sensing devices frequently are arranged in this format, which is also the predominant arrangement found in digital cameras. A typical sensor for these cameras is a CCD array.
CCD sensors are used widely in digital cameras and other light sensing instruments. The
response of each sensor is proportional to the integral of the light energy projected onto the
surface of the sensor, a property that is used in astronomical and other applications requiring low
noise images. Noise reduction is achieved by letting the sensor integrate the input light signal
over minutes or even hours. Because the sensor array is two-dimensional, its key advantage is that a complete image can be obtained by focusing the energy pattern onto the surface of the array. Motion obviously is not necessary, as it is with the single-sensor and strip arrangements.
This figure shows the energy from an illumination source being reflected from a scene element,
but, as mentioned at the beginning of this section, the energy also could be transmitted through
the scene elements. The first function performed by the imaging system is to collect the
incoming energy and focus it onto an image plane. If the illumination is light, the front end of the
imaging system is a lens, which projects the viewed scene onto the lens focal plane. The sensor
array, which is coincident with the focal plane, produces outputs proportional to the integral of
the light received at each sensor. Digital and analog circuitry sweep these outputs and convert
them to a video signal, which is then digitized by another section of the imaging system

A Simple Image Formation Model:


An image is a two-dimensional function of the form f(x, y). The value or amplitude of f at spatial
coordinates (x, y) is a positive scalar quantity whose physical meaning is determined by the source of the image.
When an image is generated from a physical process, its intensity values are proportional to energy radiated by a
physical source.
Hence,
0 < f(x, y) < ∞
The function f(x, y) may be characterized by two components: (1) the amount of source
illumination incident on the scene being viewed, and (2) the amount of illumination reflected by
the objects in the scene. Appropriately, these are called the illumination and reflectance
components and are denoted by i(x, y) and r(x, y) , respectively. The two functions combine as a
product to form :
f(x, y) = i(x, y) r(x, y)
where
0 < i(x, y) < ∞ and 0 < r(x, y) < ∞

Reflectance is bounded by 0 (total absorption) and 1 (total reflectance)


Let the intensity (gray level) of a monochrome image at any coordinates (x0, y0) be denoted by
l = f(x0, y0)
Then l lies in the range Lmin ≤ l ≤ Lmax, where Lmin = imin rmin and Lmax = imax rmax.
The interval [Lmin, Lmax] is called the gray (or intensity) scale. Common practice is to shift this interval numerically to the interval [0, L-1], where l = 0 is considered black and l = L-1 is considered white on the gray scale. All intermediate values are shades of gray varying from black to white.
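As an illustration of this model, the following NumPy sketch (the array shapes and the gradient/uniform choices are illustrative assumptions, not from the notes) forms a synthetic image as the product f(x, y) = i(x, y) r(x, y) and then shifts the resulting interval [Lmin, Lmax] onto the gray scale [0, L-1]:

import numpy as np

L = 256                                    # number of gray levels (8-bit image)
M, N = 256, 256

# Illumination component: a smooth horizontal gradient, 0 < i(x, y) < infinity
i = np.tile(np.linspace(50.0, 200.0, N), (M, 1))

# Reflectance component: bounded by 0 (total absorption) and 1 (total reflectance)
r = np.random.uniform(0.1, 0.9, size=(M, N))

f = i * r                                  # f(x, y) = i(x, y) r(x, y)

# Shift [Lmin, Lmax] to [0, L-1]: 0 is black, L-1 is white
g = np.round((f - f.min()) / (f.max() - f.min()) * (L - 1)).astype(np.uint8)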
Image Sampling and Quantization:
To create a digital image, we need to convert the continuous sensed data into digital form. This
involves two processes: sampling and quantization. A continuous image, f(x, y), that we want to
convert to digital form. An image may be continuous with respect to the x- and y-coordinates,
and also in amplitude. To convert it to digital form, we have to sample the function in both
coordinates and in amplitude. Digitizing the coordinate values is called sampling. Digitizing the
amplitude values is called quantization.

The one-dimensional function shown in Fig. 2.16(b) is a plot of amplitude (gray level) values of
the continuous image along the line segment AB. The random variations are due to image noise.
To sample this function, we take equally spaced samples along line AB, The location of each
sample is given by a vertical tick mark in the bottom part of the figure. The samples are shown
as small white squares superimposed on the function. The set of these discrete locations gives the
sampled function. However, the values of the samples still span (vertically) a continuous range
of gray-level values. In order to form a digital function, the gray-level values also must be
converted (quantized) into discrete quantities. The gray-level scale on the right side of the figure is divided into eight discrete levels, ranging from black to white. The vertical tick marks indicate the specific value
assigned to each of the eight gray levels. The continuous gray levels are quantized simply by
assigning one of the eight discrete gray levels to each sample. The assignment is made depending
on the vertical proximity of a sample to a vertical tick mark. The digital samples resulting from
both sampling and quantization.
When a sensing array is used for image acquisition, there is no motion and the number of
sensors in the array establishes the limits of sampling in both directions. Quantization of the
sensor outputs is as before. Figure 2.17 illustrates this concept. Figure 2.17(a) shows a
continuous image projected onto the plane of an array sensor. Figure 2.17(b) shows the image
after sampling and quantization. Clearly, the quality of a digital image is determined to a large
degree by the number of samples and discrete intensity levels used in sampling and quantization.

Representing Digital Images


The result of sampling and quantization is a matrix of real numbers. Assume that an image f(x, y) is sampled so that the resulting digital image has M rows and N columns. The values of the coordinates (x, y) now become discrete quantities; thus, the value of the coordinates at the origin becomes (x, y) = (0, 0). The matrix can be represented in the following form as well.

Each element of this matrix is called an image element, picture element, pixel, or pel.
Traditional matrix notation to denote a digital image and its elements:

The sampling process may be viewed as partitioning the x-y plane into a grid, with the coordinates of the center of each grid cell being a pair of elements from the Cartesian product Z², which is the set of all ordered pairs of elements (zi, zj) with zi and zj being integers from Z. Hence, f(x, y) is a digital image if (x, y) are integers from Z² and f is a function that assigns a gray-level value (that is, a real number from the set of real numbers R) to each distinct pair of coordinates (x, y). This functional assignment is the quantization process.

If the gray levels are also integers, Z replaces R, and a digital image becomes a 2-D function whose coordinates and amplitude values are integers. Due to processing, storage, and hardware considerations, the number of gray levels typically is an integer power of 2, L = 2^k. Then the number, b, of bits required to store a digital image is
b = M × N × k
When M = N, this equation becomes b = N² k.
When an image can have 2^k gray levels, it is referred to as a “k-bit image”. An image with 256 possible gray levels is called an “8-bit image” (because 256 = 2^8).
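For example, the storage requirement b = M × N × k can be computed directly (a small sketch, not part of the original notes):

def storage_bytes(M, N, k):
    """Storage for an M x N image with L = 2**k gray levels, in bytes."""
    b = M * N * k                     # b = M * N * k bits
    return b // 8

# A 1024 x 1024 image with 8-bit pixels needs 1,048,576 bytes (one megabyte),
# which matches the mass-storage example given earlier.
print(storage_bytes(1024, 1024, 8))   # 1048576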
There are three basic ways to represent f(x, y), as shown in the figure.

The dynamic range of an imaging system is defined as the ratio of the maximum measurable intensity to the minimum detectable intensity level in the system. As a rule, the upper limit is determined by saturation and the lower limit by noise.
The difference in intensity between the highest and lowest intensity levels in an image is called the contrast of the image.
Spatial and Intensity Resolution:
Spatial resolution is a measure of the smallest discernible detail in an image. Spatial resolution can be stated in a number of ways, such as line pairs per unit distance or dots (pixels) per unit distance. Suppose we construct a chart with alternating vertical lines of width W, with the space between them also having width W, so that a line pair consists of one such line and its adjacent space. Thus, the width of a line pair is 2W, and there are 1/2W line pairs per unit distance. Resolution is simply the largest number of discernible line pairs per unit distance.
Dots per unit distance are a measure of image resolution used commonly in the printing and
publishing industry. In the U.S., this measure usually is expressed as dots per inch (dpi). To give
you an idea of quality, newspapers are printed with a resolution of 75 dpi, magazines at 133 dpi,
glossy brochures at 175 dpi, and the book page at which you are presently looking is printed at
2400 dpi.
Intensity resolution refers to the smallest discernible change in gray level. Measuring discernible changes in gray level is a highly subjective process. Reducing the number of bits k while keeping the spatial resolution constant creates the problem of false contouring. It is caused by the use of an insufficient number of gray levels in the smooth areas of a digital image, and it is called false contouring because the ridges resemble topographic contours in a map. It is generally quite visible in images displayed using 16 or fewer uniformly spaced gray levels.
Isopreference Curves
To see the effect of varying N and k simultaneously, test images having a low, intermediate, and high level of detail are used.

Different images were generated by varying N and k, and observers were then asked to rank the results according to their subjective quality. The results were summarized in the form of isopreference curves in the N-k plane.

Isopreference curves tend to shift right and upward; their shapes in each of the three image categories are shown in the figure. A shift up and to the right in the curve simply means larger values for N and k, which implies better picture quality. The results show that isopreference curves tend to become more vertical as the detail in the image increases. This suggests that, for images with a large amount of detail, only a few gray levels may be needed: for a fixed value of N, the perceived quality of this type of image is nearly independent of the number of gray levels used.

Image Interpolation:
Interpolation is a basic tool used extensively in tasks such as zooming, shrinking, rotating, and
geometric corrections. All these tasks are called resampling methods.
Interpolation is the process of using known data to estimate values at unknown locations.

Suppose that an image of 500 X 500 size pixels has to be enlarged 1.5 times to 750 X750
pixels. A simple way to visualize zooming is to create an imaginary 750 X750 grid with the same
pixel spacing as the original, and then shrink it so that it fits exactly over the original image.
Obviously, the pixel spacing in the shrunken 750 X750 grid will be less than the pixel spacing in
the original image.
To perform intensity-level assignment for any point in the overlay, we look for its closest pixel
in the original image and assign the intensity of that pixel to the new pixel in the 750 X750
grid.When we are finished assigning intensities to all the points in the overlay grid, we expand it
to the original specified size to obtain the zoomed image. The method is called nearest neighbor
interpolation because it assigns to each new location the intensity of its nearest neighbor in the
original image.
A more suitable approach is bilinear interpolation, in which we use the four nearest neighbors to
estimate the intensity at a given location. Let (x, y) denote the coordinates of the location to
which we want to assign an intensity value and let v (x, y) denote that intensity value. For
bilinear interpolation, the assigned value is obtained using the equation
v(x, y) = ax + by + cxy + d
where the four coefficients are determined from the four equations in four unknowns that can be
written using the four nearest neighbors of point (x, y).
The next level of complexity is bicubic interpolation, which involves the sixteen nearest neighbors of a point. The intensity value assigned to point (x, y) is obtained using the equation
v(x, y) = Σ (i = 0 to 3) Σ (j = 0 to 3) aij x^i y^j
where the sixteen coefficients aij are determined from the sixteen equations in sixteen unknowns that can be written using the sixteen nearest neighbors of point (x, y).
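The following is a minimal NumPy sketch of nearest-neighbor and bilinear resampling, written from the descriptions above; the function names and the simple coordinate mapping are illustrative assumptions, not part of the original notes.

import numpy as np

def zoom_nearest(img, new_h, new_w):
    """Assign to each new location the intensity of its nearest neighbor in the original."""
    h, w = img.shape
    rows = np.clip(np.round(np.arange(new_h) * h / new_h).astype(int), 0, h - 1)
    cols = np.clip(np.round(np.arange(new_w) * w / new_w).astype(int), 0, w - 1)
    return img[rows][:, cols]

def zoom_bilinear(img, new_h, new_w):
    """Estimate each new intensity from the four nearest neighbors, v = ax + by + cxy + d."""
    h, w = img.shape
    out = np.zeros((new_h, new_w))
    for y in range(new_h):
        for x in range(new_w):
            sy = y * (h - 1) / (new_h - 1)      # position in the original image
            sx = x * (w - 1) / (new_w - 1)
            y0, x0 = int(sy), int(sx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = sy - y0, sx - x0
            out[y, x] = (img[y0, x0] * (1 - dx) * (1 - dy) + img[y0, x1] * dx * (1 - dy) +
                         img[y1, x0] * (1 - dx) * dy + img[y1, x1] * dx * dy)
    return out

# A 1.5x enlargement, analogous to the 500 x 500 -> 750 x 750 example above:
img = np.random.randint(0, 256, size=(50, 50)).astype(np.float64)
zoomed = zoom_bilinear(img, 75, 75)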
CHAPTER-2 Intensity Transformations & Spatial Filtering, Filtering in the
Frequency Domain

UNIT-2

INTENSITY TRANSFORMATIONS & SPATIAL FILTERING


The objective of image enhancement is to improve the quality of an image even when degradation is present. This can be achieved by increasing the dominance of some features or by decreasing the ambiguity between different regions. Image enhancement approaches fall
into two broad categories: spatial domain methods and frequency domain methods. The term
spatial domain refers to the image plane itself, and approaches in this category are based on
direct manipulation of pixels in an image. Frequency domain processing techniques are based
on modifying the Fourier transform of an image.
The two principal categories of spatial domain are Intensity transformations and
spatial filtering.
 Intensity transformations operate on single pixels of an image, for the purpose of contrast manipulation and image thresholding.
 Spatial filtering deals with performing operations, such as image sharpening, by working in a neighborhood of every pixel in an image.
2.1 The Basics of Intensity transformations and Spatial filtering
The term spatial domain refers to the aggregate of pixels composing an image. The
Spatial domain processes will be denoted by the expression
g(x, y) = T[f(x, y)]
Where f(x, y) is the input image, g(x, y) is the processed image, and T is an operator
on f, defined over some neighborhood of (x, y). The operator can apply to a single image or
to a set of images, such as performing the pixel-by-pixel sum of a sequence of images for
noise reduction. The following figure shows the basic implementation of spatial domain on a
single image.

Fig: A 3×3 neighborhood about a point (x, y) in an image in the spatial domain.


The point (x, y) is an arbitrary location in the image and the small region shown
containing the point is a neighborhood of (x, y). The neighborhood is rectangular, centered on
(x, y) and much smaller in size than the image.
The process consists of moving the origin of the neighborhood from pixel to pixel and
applying the operator T to the pixels in the neighborhood to yield the output at that location.
Thus for any specific location (x, y) the value of the output image g at those coordinates is
equal to the result of applying T to the neighborhood with origin at (x, y) in f. This procedure
is called spatial filtering, in which the neighborhood, along with a predefined operation is
called a spatial filter. The smallest possible neighborhood is of size 1×1. In this case, g
depends only on the value of f at a single point (x, y) and T becomes an intensity
transformation or gray level mapping of the form
s = T(r)
where s and r are variables representing the intensities of g and f at any point (x, y). The effect of applying the transformation T(r) to every pixel of f to generate the corresponding pixels in g would be to produce an image of higher contrast than the original, by darkening the levels below m and brightening the levels above m in the original image. This is known as contrast stretching (Fig. (a)): the values of r below m are compressed by the transformation function into a narrow range of s, toward black, and the opposite effect takes place for values of r above m. In the limiting case shown in Fig. (b), T(r) produces a two-level (binary) image. A mapping of this form is called a thresholding function. Because the enhancement at any point in an image depends only on the gray level at that point, techniques in this category are referred to as point processing.

Fig: Intensity transformation functions (a) Contrast Stretching (b) Thresholding


2.2 Intensity transformation Functions


There are three basic types of functions used frequently for image enhancement:
linear (negative and identity transformations), logarithmic (log and inverse-log
transformations), and power-law (nth power and nth root transformations). The identity
function is the trivial case in which output intensities are identical to input intensities. It is
included in the graph only for completeness.

Fig: Some basic Intensity transformation functions used for image enhancement.
Image Negatives
The negative of an image with gray levels in the range [0, L-1] is obtained by using the negative transformation, which is given by the expression
s = L - 1 - r
Reversing the intensity levels of an image in this manner produces the equivalent of a
photographic negative. This type of processing is particularly suited for enhancing white or
gray detail embedded in dark regions of an image, especially when the black areas are
dominant in size.

Fig: (a) Original digital mammogram. (b) Negative image obtained using the negative
transformation
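A minimal NumPy sketch of the negative transformation for an 8-bit image (the random test array is only a stand-in for a real image such as the mammogram above):

import numpy as np

L = 256
img = np.random.randint(0, L, size=(64, 64), dtype=np.uint8)   # stand-in for a real image
negative = (L - 1) - img                                       # s = L - 1 - r at every pixel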


Log Transformations
The general form of the log transformation is
s = c log(1 + r)
where c is a constant, and it is assumed that r ≥ 0. The shape of the log curve shows that this transformation maps a narrow range of low gray-level values in the input image into a wider range of output levels. The opposite is true of higher values of input levels. We would use a transformation of this type to expand the values of dark pixels in an image while compressing the higher-level values. The opposite is true of the inverse log transformation. The log transformation has an important characteristic: it compresses the dynamic range of images with large variations in pixel values. For this reason, it is often used to display Fourier spectra.

Fig: (a) Fourier spectrum. (b) Result of applying the log transformation
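A short sketch of the log transformation, with c chosen so that the output fills the 8-bit range (this particular choice of c is an assumption for illustration, not something the notes prescribe):

import numpy as np

img = np.random.randint(0, 256, size=(64, 64)).astype(np.float64)   # stand-in image

c = 255.0 / np.log(1 + img.max())       # scale so the maximum input maps to 255
s = c * np.log(1 + img)                 # s = c log(1 + r), r >= 0
s = np.round(s).astype(np.uint8)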
Power –Law Transformations
Power-law transformations have the basic form
s = c r^γ
where c and γ are positive constants. Plots of s versus r for various values of γ are shown in the following figure.

Fig: Plots of the equation s = c r^γ for various values of γ (c = 1 in all cases).


The curves generated with values of γ > 1 have exactly the opposite effect as those generated with values of γ < 1. The transformation reduces to the identity transformation when c = γ = 1. Power-law transformations are used in a variety of devices for image capture, printing, and display. The exponent in the power-law equation is referred to as gamma, and the process used to correct this power-law response phenomenon is called gamma correction.
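A small sketch of the power-law (gamma) transformation; normalizing to [0, 1] before applying the exponent is a common convention assumed here, not something the notes specify:

import numpy as np

def power_law(img, gamma, c=1.0):
    """Apply s = c * r**gamma to an image normalized to the range [0, 1]."""
    r = img.astype(np.float64) / 255.0
    s = c * np.power(r, gamma)
    return np.round(255 * np.clip(s, 0.0, 1.0)).astype(np.uint8)

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
brighter = power_law(img, 0.4)    # gamma < 1 expands dark values
darker = power_law(img, 2.5)      # gamma > 1 has the opposite effect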
Piecewise-Linear Transformation Functions
The principal advantage of piecewise linear functions is that these functions can be
arbitrarily complex. But their specification requires considerably more user input. These
transformations are of 3 types.
Contrast Stretching:
Contrast stretching is a process that expands the range of intensity values in an image in order to utilize the full dynamic range. It is one of the simplest piecewise-linear functions. Low-contrast images can result from poor illumination. The following figure shows a typical transformation used for contrast stretching.

Fig: Form of transformation function


The locations of points (r1, s1) and (r2, s2) control the shape of the transformation
function. If r1 = s1 and r2 = s2, the transformation is a linear function that produces no changes in gray levels. If r1 = r2, s1 = 0, and s2 = L-1, the transformation becomes a thresholding function that creates a binary image. Intermediate values of (r1, s1) and (r2, s2) produce various
degrees of spread in the gray levels of the output image, thus affecting its contrast. In general,
r1 ≤ r2 and s1 ≤ s2 is assumed so that the function is single valued and monotonically
increasing. This condition preserves the order of gray levels, thus preventing the creation of
intensity artifacts in the processed image.
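A minimal sketch of the piecewise-linear contrast-stretching function through the control points (r1, s1) and (r2, s2); the vectorized np.where formulation and the r1 < r2 assumption are implementation choices, not from the notes:

import numpy as np

def contrast_stretch(img, r1, s1, r2, s2, L=256):
    """Piecewise-linear mapping through (r1, s1) and (r2, s2); assumes r1 < r2 and s1 <= s2."""
    r = img.astype(np.float64)
    low = (s1 / max(r1, 1)) * r                                   # segment below r1
    mid = ((s2 - s1) / (r2 - r1)) * (r - r1) + s1                 # segment between r1 and r2
    high = ((L - 1 - s2) / max(L - 1 - r2, 1)) * (r - r2) + s2    # segment above r2
    out = np.where(r < r1, low, np.where(r <= r2, mid, high))
    return np.round(out).astype(np.uint8)

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
# Choosing (r1, s1) = (r_min, 0) and (r2, s2) = (r_max, L-1) stretches the full range
stretched = contrast_stretch(img, int(img.min()), 0, int(img.max()), 255)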


Intensity-Level Slicing:
The process of highlighting a specific range of intensities in an image is known as
Intensity-Level Slicing. There are two basic approaches that can be adopted for intensity-level slicing.
 One approach is to display a high value for all gray levels in the range of interest and
a low value for all other intensities. This transformation produces a binary image.
 The second approach, based on the transformation, brightens the desired range of gray
levels but preserves the background and gray-level in the image without change.

Fig: (a) The transformation highlights range [A, B] of gray levels and reduces all others to a constant
level. (b) The transformation highlights range [A, B] but preserves all other levels.
Bit-Plane Slicing:
Instead of highlighting gray-level ranges, highlighting the contribution made to total
image appearance by specific bits might be desired. Suppose that each pixel in an image is
represented by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from
bit-plane 0 for the least significant bit to bit plane 7 for the most significant bit. In terms of 8-
bit bytes, plane 0 contains all the lowest order bits in the bytes comprising the pixels in the
image and plane 7 contains all the high-order bits. The following figure shows the various bit
planes for an image.

Fig: Bit Plane Representation of an 8-bit image.


The higher-order bits contain the majority of the visually significant data. The other
bit planes contribute to more subtle details in the image. Separating a digital image into its bit
planes is useful for analyzing the relative importance played by each bit of the image, a
process that aids in determining the adequacy of the number of bits used to quantize each
pixel.
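A short sketch of bit-plane slicing for an 8-bit image using bitwise shifts (the reconstruction from the two highest planes is an illustrative example, not from the notes):

import numpy as np

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)     # 8-bit test image

# bit_planes[k] holds 1 where bit k of the pixel is set (plane 0 = least significant)
bit_planes = [(img >> k) & 1 for k in range(8)]

# Rebuilding from only the two highest-order planes keeps most of the visually
# significant data, as described above.
approx = (bit_planes[7].astype(np.uint8) << 7) | (bit_planes[6].astype(np.uint8) << 6)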

2.3 Histogram Processing

The histogram of an image is a representation of the relative frequency of occurrence of the various gray levels in the image. In general, the histogram of a digital image with gray levels in the range [0, L-1] is a discrete function h(rk) = nk, where rk is the kth gray level and nk is the number of pixels in the image having gray level rk. A normalized histogram is obtained by dividing each of its values by the total number of pixels in the image, and it is given by the equation

p(rk) = nk / MN, for k = 0, 1, 2, …, L-1
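The normalized histogram can be computed directly; a minimal sketch using np.bincount (variable names are illustrative):

import numpy as np

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
L = 256
M, N = img.shape

n_k = np.bincount(img.ravel(), minlength=L)    # h(r_k) = n_k, the unnormalized histogram
p_rk = n_k / (M * N)                           # p(r_k) = n_k / MN, the normalized histogram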

Histograms are the basis for numerous spatial domain processing techniques.
Histogram manipulation can be used effectively for image enhancement and also is quite
useful in other image processing applications, such as image compression and segmentation.
Histograms are simple to calculate in software and also lend themselves to economic
hardware implementations, thus making them a popular tool for real-time image processing.
The histogram also indicates into which of the following categories an image falls. Generally, images are classified as follows:

 Dark images: The components of the histogram are concentrated on the low side of
the intensity scale.
 Bright images: The components of the histogram are biased towards the high side of
the intensity scale.
 Low contrast: An image with low contrast has a histogram that will be narrow and
will be centered toward the middle of the gray scale.
 High contrast: The components of histogram in the high-contrast image cover a
broad range of the gray scale.


Fig: Dark, light, low contrast, high contrast images, and their corresponding histograms.

Histogram Equalization

Let the variable r represent the gray levels of the image to be enhanced. We assume that r has been normalized to the interval [0, L-1], with r = 0 representing black and r = L-1 representing white. We then consider transformations of the form
s = T(r), 0 ≤ r ≤ L-1
which produce an output intensity level s for every pixel in the input image having intensity r. Assume that the transformation function T(r) satisfies the following conditions:

(a) T(r) is a single-valued and monotonically increasing function in the interval 0 ≤ r ≤ L-1
(b) 0 ≤ T(r) ≤ L-1 for 0 ≤ r ≤ L-1

The transformation function should be single valued so that the inverse transformation exists. The condition that T(r) be monotonically increasing guarantees that the output intensity values will never be less than the corresponding input values. The second condition guarantees that the output gray levels will be in the same range as the input levels. The gray
levels of the image may be viewed as random variables in the interval [0, L-1]. The most
fundamental descriptor of a random variable is its probability density function (PDF). Let
Pr(r) and Ps(s) denote the probability density functions of random variables r and s
respectively. If pr(r) and T(r) are known and T⁻¹(s) satisfies condition (a), then the probability
density function Ps(s) of the transformed variable s can be obtained by
ps(s) = pr(r) |dr/ds|


Thus, the probability density function of the transformed variable, s is determined by


the PDF of the input gray-level and by the chosen transformation function. A transformation
function of particular importance in image processing has the form
s = T(r) = ∫ (0 to r) pr(w) dw

Where w is a dummy variable of integration. The right side of this equation is


recognized as the cumulative distribution function (CDF) of random variable r. Since
probability density functions are always positive, and recalling that the integral of a function
is the area under the function, it follows that this transformation function is single valued and
monotonically increasing, and, therefore, satisfies condition (a). Similarly, since the integral of a probability density function over its full range is 1, the output values lie in the allowed range, so condition (b) is satisfied.
Using this definition of T, the derivative of s with respect to r is
ds/dr = pr(r)
Substituting dr/ds = 1/pr(r) into the expression for ps(s) gives
ps(s) = pr(r) |dr/ds| = pr(r) · 1/pr(r) = 1
Hence ps(s) is a uniform probability density function, independent of the form of pr(r). For discrete values we deal with probabilities and summations instead of probability density functions and integrals. The probability of occurrence of gray level rk in an image is approximated by
pr(rk) = nk / MN, k = 0, 1, 2, …, L-1
Where MN is the total number of pixels in the image, nk is the number of pixels that
have gray level rk and L is the number of possible intensity levels in the image. The discrete
version of the transformation function given is
sk = T(rk) = Σ (j = 0 to k) pr(rj), k = 0, 1, 2, …, L-1


Thus, a processed (output) image is obtained by mapping each pixel with level rk in
the input image into a corresponding pixel with level sk in the output image. Hence the
transformation is called histogram equalization or histogram linearization.
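A compact NumPy sketch of discrete histogram equalization. Note that the discrete sum above yields values in [0, 1]; the sketch scales them by (L-1), a common convention assumed here so that the output occupies the full gray scale:

import numpy as np

def histogram_equalize(img, L=256):
    """Map each level r_k to s_k, the cumulative sum of p_r(r_j), scaled to [0, L-1]."""
    M, N = img.shape
    p_r = np.bincount(img.ravel(), minlength=L) / (M * N)   # p_r(r_k) = n_k / MN
    cdf = np.cumsum(p_r)                                    # running sum of p_r(r_j)
    s = np.round((L - 1) * cdf).astype(np.uint8)            # equalization transformation
    return s[img]                                           # look up the new level of every pixel

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
equalized = histogram_equalize(img)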
Histogram Matching
Histogram equalization automatically determines a transformation function that seeks
to produce an output image that has a uniform histogram. This is a good approach because
the results from this technique are predictable and the method is simple to implement.
However in some applications the enhancement on a uniform histogram is not the best
approach. In particular, it is useful sometimes to be able to specify the shape of the histogram
that we wish the processed image to have. The method used to generate a processed image
that has a specified histogram is called histogram matching or histogram specification.
Let r and z be continuous gray levels (considered continuous random variables), and let pr(r) and pz(z) denote their corresponding continuous probability density functions, where r and z denote the gray levels of the input and output (processed) images, respectively. We can estimate pr(r) from the given input image, while pz(z) is the specified
probability density function that we wish the output image to have. Let s be a random
variable with the property,
s = T(r) = ∫ (0 to r) pr(w) dw

where w is a dummy variable of integration. This is the continuous version of histogram equalization. Now define a random variable z with the property
G(z) = ∫ (0 to z) pz(t) dt = s
where t is a dummy variable of integration. From these two equations it follows that G(z) = T(r) and, therefore, that z must satisfy the condition
z = G⁻¹(s) = G⁻¹[T(r)]
The transformation T(r) can be obtained once pr(r) has been estimated from the input image. Similarly, the transformation function G(z) can be obtained once pz(z) is given. Assuming that G⁻¹ exists, an image with a specified probability density function can be obtained from an input image by using the following procedure:
 Obtain the transformation function T(r) by using s = T(r) = ∫ (0 to r) pr(w) dw
 Obtain the transformation function G(z) by using G(z) = ∫ (0 to z) pz(t) dt = s
 Obtain the inverse transformation function G⁻¹ by using z = G⁻¹(s) = G⁻¹[T(r)]


 Obtain the output image by applying z = G⁻¹(s) = G⁻¹[T(r)] to all the pixels in the input image. The result of this procedure will be an image whose gray levels, z, have the specified probability density function pz(z).
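A minimal discrete sketch of this procedure using cumulative histograms; the use of np.searchsorted to invert G is an implementation choice assumed here, not something the notes specify:

import numpy as np

def histogram_match(img, target_pdf, L=256):
    """Map input levels so the output histogram approximates target_pdf (an array of length L)."""
    M, N = img.shape
    p_r = np.bincount(img.ravel(), minlength=L) / (M * N)
    T = np.cumsum(p_r)            # s = T(r), cumulative distribution of the input image
    G = np.cumsum(target_pdf)     # G(z), cumulative distribution of the specified histogram
    # For each s = T(r), find the smallest z with G(z) >= s, i.e. z = G^-1(T(r))
    z = np.searchsorted(G, T).clip(0, L - 1).astype(np.uint8)
    return z[img]

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
target = np.ones(256) / 256.0     # a flat target PDF reduces to histogram equalization
matched = histogram_match(img, target)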
Local Histogram Processing
The histogram processing methods are global, in the sense that pixels are modified by
a transformation function based on the intensity distribution of an entire image. Although this
global approach is suitable for overall enhancement, there are cases in which it is necessary to
enhance details over small areas in an image. The number of pixels in these areas may have
negligible influence on the computation of a global transformation whose shape does not
necessarily guarantee the desired local enhancement. The solution is to devise transformation
functions based on the intensity distribution or other properties in the neighborhood of every
pixel in the image.
The histogram processing techniques previously described are easily adaptable to
local enhancement. The procedure is to define a square or rectangular neighborhood and
move the center of this area from pixel to pixel. At each location, the histogram of the points
in the neighborhood is computed and either a histogram equalization or histogram
specification transformation function is obtained. This function is finally used to map the
intensity of the pixel centered in the neighborhood. The center of the neighborhood region is
then moved to an adjacent pixel location and the procedure is repeated. Since only one new
row or column of the neighborhood changes during a pixel-to-pixel translation of the region,
updating the histogram obtained in the previous location with the new data introduced at each
motion step is possible. This approach has obvious advantages over repeatedly computing the
histogram over all pixels in the neighborhood region each time the region is moved one pixel
location. Another approach sometimes used to reduce computation is to utilize nonoverlapping regions, but this method usually produces an undesirable checkerboard effect.
2.4 Enhancement Using Arithmetic/Logic Operations
Arithmetic/logic operations involving images are performed on a pixel-by-pixel basis
between two or more images. For example, subtraction of two images results in a new
image whose pixel at coordinates (x, y) is the difference between the pixels in that same
location in the two images being subtracted. Depending on the hardware and/or software
being used, the actual mechanics of implementing arithmetic/logic operations can be done
sequentially, one pixel at a time, or in parallel, where all operations are performed
simultaneously.


Logic operations similarly operate on a pixel-by-pixel basis. Basically we can


implement the AND, OR, and NOT logic operators because these three operators are
functionally complete. Any other logic operator can be implemented by using only these three
basic functions. When performing logic operations on gray-scale images, pixel values are processed as strings of binary numbers.
Image Subtraction
The difference between two images f(x, y) and h(x, y), expressed as
g(x, y) = f(x, y) - h(x, y)
is obtained by computing the difference between all pairs of corresponding pixels from f and h. The key usefulness of subtraction is the enhancement of differences between images.
Generally the higher-order bit planes of an image carry a significant amount of visually
relevant detail, while the lower planes contribute more to fine (often imperceptible) detail.
Image Averaging
Consider a noisy image g(x, y) formed by the addition of noise η (x, y) to an original
image f(x, y); that is,
g(x, y) = f(x, y) + η(x, y)
If an image ḡ(x, y) is formed by averaging K different noisy images,
ḡ(x, y) = (1/K) Σ (i = 1 to K) gi(x, y)
As K increases, the variability (noise) of the pixel values at each location (x, y) decreases. Because E{ḡ(x, y)} = f(x, y), this means that ḡ(x, y) approaches f(x, y) as the number of noisy images used in the averaging process increases. The images gi(x, y) must be registered (aligned) in order to avoid the introduction of blurring and other artifacts in the output image.
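A short sketch illustrating how averaging K registered noisy images reduces noise; the Gaussian noise model and the constant test image are assumptions made only for the demonstration:

import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)                                    # noise-free image f(x, y)

K = 50
noisy = [f + rng.normal(0.0, 20.0, f.shape) for _ in range(K)]  # g_i = f + eta_i

g_bar = np.mean(noisy, axis=0)                                  # average of the K noisy images
# The noise standard deviation drops by roughly sqrt(K)
print(np.std(noisy[0] - f), np.std(g_bar - f))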


2.5 Basics of Spatial Filtering


A spatial filter consists of a neighborhood and a predefined operation that is performed on the image pixels within the neighborhood. Filtering creates a new pixel with coordinates equal to the coordinates of the center of the neighborhood, and whose value is the result of the filtering operation. A processed image is generated as the center of the filter visits each pixel in the input image. If the operation performed on the image pixels is linear, the filter is called a linear spatial filter; otherwise, it is nonlinear.
The mechanics of spatial filtering are illustrated in the following figure. The process
consists simply of moving the filter mask from point to point in an image. At each point (x,
y), the response g(x, y) of the filter at that point is given by a sum of products of the filter
coefficients and the corresponding image pixels in the area spanned by the filter mask.

Fig: The mechanics of linear spatial filtering using a 3×3 filter mask.

For the 3×3 mask shown in the figure, the result (or response) g(x, y) of linear filtering with the filter mask at a point (x, y) in the image is
g(x, y) = w(-1, -1) f(x-1, y-1) + w(-1, 0) f(x-1, y) + … + w(0, 0) f(x, y) + … + w(1, 1) f(x+1, y+1)
which is the sum of products of the mask coefficients with the corresponding pixels
directly under the mask. Observe that the coefficient w (0, 0) coincides with image value f(x,
y), indicating that the mask is centered at (x, y) when the computation of the sum of products
takes place. For a mask of size m×n, we assume that m=2a+1 and n=2b+1, where a and b are
nonnegative integers.


In general, linear filtering of an image f of size M×N with a filter mask of size m×n is given by the expression
g(x, y) = Σ (s = -a to a) Σ (t = -b to b) w(s, t) f(x + s, y + t)
where a = (m-1)/2, b = (n-1)/2, and x and y are varied so that each pixel in w visits every pixel in f.
Spatial Correlation and Convolution
Correlation is the process of moving a filter mask over the image and computing the sum of products at each location. The mechanism of convolution is the same, except that the filter is first rotated by 180°. The difference between correlation and convolution can be illustrated with a 1-D example, as in the figure.

The correlation of a filter w(x, y) of size m×n with an image f(x, y) is given by the equation
w(x, y) ☆ f(x, y) = Σ (s = -a to a) Σ (t = -b to b) w(s, t) f(x + s, y + t)


The convolution of a filter w(x, y) of size m×n with an image f(x, y) is given by the equation
w(x, y) * f(x, y) = Σ (s = -a to a) Σ (t = -b to b) w(s, t) f(x - s, y - t)
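A minimal sketch of 2-D correlation and convolution written directly from these definitions; the zero padding at the borders and the function names are implementation choices assumed here:

import numpy as np

def correlate2d(img, w):
    """Slide the mask over the image and compute the sum of products at each location."""
    m, n = w.shape
    a, b = m // 2, n // 2                              # m = 2a + 1, n = 2b + 1
    padded = np.pad(img.astype(np.float64), ((a, a), (b, b)))
    out = np.zeros(img.shape)
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            out[x, y] = np.sum(w * padded[x:x + m, y:y + n])
    return out

def convolve2d(img, w):
    """Convolution is correlation with the mask rotated by 180 degrees."""
    return correlate2d(img, np.rot90(w, 2))

img = np.random.randint(0, 256, size=(32, 32))
box = np.ones((3, 3)) / 9.0                            # 3x3 averaging (box) mask
smoothed = correlate2d(img, box)                       # symmetric mask: same result either way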
2.6 Smoothing Spatial Filters


Smoothing filters are used for blurring and for noise reduction. Blurring is used in
preprocessing steps, such as removal of small details from an image prior to object extraction,
and bridging of small gaps in lines or curves. Noise reduction can be accomplished by
blurring with a linear filter and also by nonlinear filtering.
Smoothing Linear Filters
The output (response) of a smoothing, linear spatial filter is simply the average of the pixels contained in the neighborhood of the filter mask. These filters are called averaging filters and are also referred to as lowpass filters.
A smoothing filter replaces the value of every pixel in an image by the average of the gray levels in the neighborhood defined by the filter mask; this process results in an image with reduced “sharp” transitions in gray levels, which is useful for noise reduction. However, edges are also characterized by sharp transitions in gray levels, so averaging filters have the undesirable side effect that they blur edges. A major use of averaging filters is in the reduction of “irrelevant” detail in an image. The following 3×3 smoothing filter yields the standard average of the pixels under the mask, which can be obtained by substituting the coefficients of the mask into
R = (1/9) Σ (i = 1 to 9) zi

It is the average of the gray levels of the pixels in the 3×3 neighborhood defined by
the mask. An m×n mask would have a normalizing constant equal to 1/mn. A spatial
averaging filter in which all coefficients are equal is called a box filter.

Fig: Average filter mask (left) and weighted average filter mask (right)


The second mask is called weighted average which is used to indicate that pixels are
multiplied by different coefficients. In this mask the pixel at the center of the mask is
multiplied by a higher value than any other, thus giving this pixel more importance in the
calculation of the average. The other pixels are inversely weighted as a function of their
distance from the center of the mask. The diagonal terms are further away from the center
than the orthogonal neighbors and, thus, are weighed less than these immediate neighbors of
the center pixel.
The general implementation for filtering an M×N image with a weighted averaging
filter of size m×n (m and n odd) is given by the expression

The complete filtered image is obtained by applying the above equation for x = 0, 1, 2, …, M−1 and y = 0, 1, 2, …, N−1. The denominator is simply the sum of the mask
coefficients and, therefore, it is a constant that needs to be computed only once. This scale
factor is applied to all the pixels of the output image after the filtering process is completed.
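As a sketch of both masks in use (the 1-2-4 weighted mask shown here is the commonly used weighted-average mask and is an assumption, since the figure itself is not reproduced); dividing by the sum of the coefficients implements the normalizing constant described above:

```python
import numpy as np
from scipy.ndimage import convolve   # both masks are symmetric, so convolution equals correlation here

box = np.ones((3, 3)) / 9.0                               # standard average (box) filter
weighted = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]], dtype=float) / 16.0       # weighted average; 16 = sum of coefficients

f = np.random.randint(0, 256, size=(64, 64)).astype(float)  # toy image (illustrative)
smoothed_box = convolve(f, box, mode='nearest')
smoothed_weighted = convolve(f, weighted, mode='nearest')
```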
Order-Statistics Filters
Order-statistics filters are nonlinear spatial filters whose response is based on ordering
(ranking) the pixels contained in the image area encompassed by the filter, and then replacing
the value of the center pixel with the value determined by the ranking result. The best-known
example in this category is the “Median filter”. It replaces the value of a pixel by the median
of the gray levels in the neighborhood of that pixel. Median filters are quite popular because,
they provide excellent noise-reduction capabilities. They are effective in the presence of
impulse noise, also called salt-and-pepper noise because of its appearance as white and black
dots superimposed on an image.
In order to perform median filtering at a point in an image, we first sort the values of
the pixel in question and its neighbors, determine their median, and assign this value to that
pixel. For example, in a 3×3 neighborhood the median is the 5th largest value, in a 5×5
neighborhood the 13th largest value, and so on. When several values in a neighborhood are
the same, all equal values are grouped. For example, suppose that a 3×3 neighborhood has
values (10, 20, 20, 20, 15, 20, 20, 25, 100). These values are sorted as (10, 15, 20, 20, 20, 20,
20, 25, 100), which results in a median of 20. Thus, the principal function of median filters is
to force points with distinct gray levels to be more like their neighbors.

The median represents the 50th percentile of a ranked set of numbers, but the ranking
lends itself to many other possibilities. For example, using the 100th percentile yields the max filter, which is useful for finding the brightest points in an image. The 0th percentile gives the min filter, used for the opposite purpose.
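A brute-force sketch of the median, max, and min (percentile) filters described above; the border replication and the function name are assumptions:

```python
import numpy as np

def order_statistic_filter(f, size=3, statistic='median'):
    """Replace each pixel by an order statistic of its size-by-size neighborhood."""
    a = size // 2
    fp = np.pad(f.astype(float), a, mode='edge')          # replicate borders (assumed)
    g = np.zeros(f.shape, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            window = fp[x:x + size, y:y + size]
            if statistic == 'median':
                g[x, y] = np.median(window)               # 50th percentile
            elif statistic == 'max':
                g[x, y] = window.max()                    # 100th percentile (max filter)
            else:
                g[x, y] = window.min()                    # 0th percentile (min filter)
    return g

# The neighborhood example from the text: the sorted values give a median of 20.
print(np.median([10, 20, 20, 20, 15, 20, 20, 25, 100]))   # 20.0
```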
2.7 Sharpening Spatial Filters
The principal objective of sharpening is to highlight fine detail in an image or to
enhance detail that has been blurred, either in error or as a natural effect of a particular
method of image acquisition. It includes applications ranging from electronic printing and
medical imaging to industrial inspection and autonomous guidance in military systems.
As smoothing can be achieved by integration, sharpening can be achieved by spatial
differentiation. The strength of response of derivative operator is proportional to the degree of
discontinuity of the image at that point at which the operator is applied. Thus image
differentiation enhances edges and other discontinuities and deemphasizes areas with slowly varying gray levels.
The derivatives of a digital function are defined in terms of differences. There are
various ways to define these differences. A basic definition of the first-order derivative of a
one-dimensional image f(x) is the difference
df/dx = f(x+1) − f(x)
The first order derivative must satisfy the properties such as,
 Must be zero in the areas of constant gray-level values.
 Must be nonzero at the onset of a gray-level step or ramp.
 Must be nonzero along ramps.

Similarly, we define a second-order derivative as the difference

d²f/dx² = f(x+1) + f(x−1) − 2f(x)

The second order derivative must satisfy the properties such as,
 Must be zero in the areas of constant gray-level values.
 Must be nonzero at the onset and end of a gray-level step or ramp.
 Must be zero along ramps of constant slope.


Use of Second Derivatives for Enhancement–The Laplacian

The second order derivatives in image processing are implemented by using the
Laplacian operator. The Laplacian for an image f(x, y), is defined as

∇²f = d²f/dx² + d²f/dy²

From the definition, the second order derivative in x-direction is

d²f/dx² = f(x+1, y) + f(x−1, y) − 2f(x, y)
Similarly in y-direction is
d²f/dy² = f(x, y+1) + f(x, y−1) − 2f(x, y)
Then
∇²f = f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4f(x, y)

This equation can be implemented using the mask shown in the following, which
gives an isotropic result for rotations in increments of 90°.The diagonal directions can be
incorporated in the definition of the digital Laplacian by adding two more terms to the above
equation, one for each of the two diagonal directions. The form of each new term is the same
but the coordinates are along the diagonals. Since each diagonal term also contains a –2f(x, y)
term, the total subtracted from the difference terms now would be –8f(x, y). The mask used to
implement this new definition is shown in the figure. This mask yields isotropic results for
increments of 45°. The other two masks are also used frequently in practice.

Fig.(a) Filter mask used to implement the digital Laplacian (b) Mask used to implement an
extension of this equation that includes the diagonal neighbors. (c)&(d) Two other
Implementations of the Laplacian.

The Laplacian with the negative sign gives equivalent results. Because the Laplacian
is a derivative operator, it highlights gray-level discontinuities in an image and deemphasizes
regions with slowly varying gray levels. This will tend to produce images that have grayish
edge lines and other discontinuities, all superimposed on a dark, featureless background.
Background features can be “recovered” while still preserving the sharpening effect of the
Laplacian operation simply by adding the original and Laplacian images. If the definition of the Laplacian uses a negative center coefficient, then we subtract, rather than add, the Laplacian image to obtain a sharpened result. Thus, the basic way in which we use the Laplacian for image enhancement is
g(x, y) = f(x, y) − ∇²f(x, y)   if the center coefficient of the Laplacian mask is negative
g(x, y) = f(x, y) + ∇²f(x, y)   if the center coefficient of the Laplacian mask is positive
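A sketch of Laplacian sharpening with the 4-neighbor mask (negative center coefficient, so the Laplacian is subtracted); the clipping to the 8-bit range is an assumption:

```python
import numpy as np
from scipy.ndimage import convolve

laplacian_mask = np.array([[0,  1, 0],
                           [1, -4, 1],
                           [0,  1, 0]], dtype=float)   # 4-neighbor Laplacian, negative center

def laplacian_sharpen(f):
    lap = convolve(f.astype(float), laplacian_mask, mode='nearest')
    g = f - lap                                         # subtract because the center coefficient is negative
    return np.clip(g, 0, 255)                           # keep the result displayable (8-bit range assumed)
```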

Unsharp masking and High-Boost filtering

A process that has been used for many years in the publishing industry to sharpen
images consists of subtracting a blurred version of an image from the original image. This
process, called unsharp masking, consists of following steps
 Blur the original image.
 Subtract the blurred image from the original.
 Add the mask to the original.
Let f̄(x, y) denote the blurred image. The unsharp mask is expressed as
gmask(x, y) = f(x, y) − f̄(x, y)
Then we add a weighted portion of the mask to the original image:
g(x, y) = f(x, y) + k · gmask(x, y), where k is a weighting coefficient.
When k = 1, the process is unsharp masking.
When k > 1, it is referred to as high-boost filtering.
When k < 1, it de-emphasizes the contribution of the unsharp mask.
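The three steps translate directly into a short sketch; the use of a box blur and the parameter defaults are assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def high_boost(f, k=1.0, size=3):
    """Unsharp masking (k = 1) and high-boost filtering (k > 1)."""
    f = f.astype(float)
    blurred = uniform_filter(f, size=size)   # step 1: blur the original image (box blur assumed)
    mask = f - blurred                       # step 2: subtract the blurred image from the original
    return f + k * mask                      # step 3: add the weighted mask to the original
```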


Use of First Derivatives for Enhancement–The Gradient

The first derivatives in image processing are implemented using the magnitude of the
gradient. The gradient of f at coordinates (x, y) is defined as the two-dimensional column
vector

The magnitude of the vector ∇f, denoted M(x, y), is

M(x, y) = mag(∇f) = √(gx² + gy²)

The components of the gradient vector itself are linear operators, but the magnitude of
this vector obviously is not because of the squaring and square root. The computational
burden of implementing the above equation over an entire image is not trivial, and it is
common practice to approximate the magnitude of the gradient by using absolute values
instead of squares and square roots:
M(x, y) ≡ | gx | + | gy|
This equation is simpler to compute and it still preserves relative changes in gray
levels, but the isotropic feature property is lost in general. However, as in the case of the
Laplacian, the isotropic properties of the digital gradient are preserved only for a limited
number of rotational increments that depend on the masks used to approximate the
derivatives. As it turns out, the most popular masks used to approximate the gradient give the
same result only for vertical and horizontal edges and thus the isotropic properties of the
gradient are preserved only for multiples of 90°.
Let us denote the intensities of image points in a 3×3 region shown in figure (a). For
example, the center point, z5 , denotes f(x, y), z1 denotes f(x-1, y-1), and so on. The simplest
approximations to a first-order derivative that satisfy the conditions stated are gx= (z8-z5) and
gy = (z6-z5). Two other definitions proposed by Roberts [1965] in the early development of
digital image processing use cross differences:

gx=(z9-z5) and gy=(z8-z6)


Then we compute the gradient and its absolute values as

This equation can be implemented with the two masks shown in figure (b) and
(c).These masks are referred to as the Roberts cross-gradient operators. Masks of even size
are difficult to implement. The smallest filter mask in which we are interested is of size 3×3.
An approximation using absolute values, still at point z5 , but using a 3×3 mask, is

These equations can be implemented using the masks shown in figure (d) and (e). The
difference between the third and first rows of the 3×3 image region approximates the
derivative in the x-direction, and the difference between the third and first columns
approximates the derivative in the y-direction. The masks shown in figure (d) and (e) are
referred to as the Sobel operators. The magnitude of the gradient using these masks is again approximated as M(x, y) ≈ |gx| + |gy|.
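A sketch of the Sobel gradient magnitude using the |gx| + |gy| approximation; because absolute values are taken, the sign convention of the masks (and the kernel flip performed by convolution) does not affect the result:

```python
import numpy as np
from scipy.ndimage import convolve

sobel_x = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)   # difference of third and first rows
sobel_y = sobel_x.T                                # difference of third and first columns

def sobel_gradient_magnitude(f):
    gx = convolve(f.astype(float), sobel_x, mode='nearest')
    gy = convolve(f.astype(float), sobel_y, mode='nearest')
    return np.abs(gx) + np.abs(gy)                 # M(x, y) ≈ |gx| + |gy|
```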


FILTERING IN THE FREQUENCY DOMAIN


2.8. Basic Concepts
Complex Numbers:
A complex number, C, is defined as
C = R + jI
where R and I are real numbers and j is the imaginary unit, j = √−1. Here, R denotes the real part of the complex number and I its imaginary part. Real numbers are a subset of complex numbers in which I = 0. The conjugate of a complex number C, denoted by C*, is defined as
C* = R − jI
Sometimes it is useful to represent complex numbers in polar coordinates,
C = |C|(cos θ + j sin θ)
where |C| = √(R² + I²) and θ = tan⁻¹(I / R).


Fourier series:
A function f(t) of a continuous variable t that is periodic with period T can be expressed as the sum of sines and cosines multiplied by appropriate coefficients. This sum, known as a Fourier series, has the form

Impulses and Their Sifting Property:

A unit impulse of a continuous variable t located at t = 0, is defined as

and is also required to satisfy the identity


The sifting property of the impulse with respect to integration is

More generally, the sifting property for an impulse located at an arbitrary point t₀, denoted by δ(t − t₀), is

Let x represent a discrete variable, the unit discrete impulse, δ (x) is defined as

and also satisfies the condition

The sifting property for discrete variables has the form

In general, for a discrete impulse located at x = x₀,

The Fourier Transform of Functions of One Continuous Variable:

The Fourier transform of a continuous function f(t) of a continuous variable, t, is


defined by the equation,

and its Inverse Fourier transform


Convolution:
The convolution of two continuous functions, f(t) and h(t), of one continuous variable,
t, is defined as

and its Fourier Transform is

Hence the convolution theorem is written as

and

Sampling and the Fourier Transform of Sampled Functions

Sampling is the process of converting a Continuous function into a sequence of


discrete values. Let us consider a continuous function, f(t), that we wish to sample at uniform
intervals (ΔT) of the independent variable t. One way to model sampling is to multiply f(t) by a sampling function equal to a train of impulses ΔT units apart; the sampled function is then defined as


Where 𝑓(t) denotes the sampled function and each component of this summation is
an impulse weighted by the value of f(t) at the location of the impulse. The value of each
sample is then given by the "strength" of the weighted impulse, which we obtain by
integration. That is, the value, f k, of an arbitrary sample in the sequence is given by

Fig: (a) A continuous function. (b) Train of impulses used to model the sampling process.(c)
Sampled function formed as the product of (a) and (b). (d) Sample values obtained by
integration and using the sifting property of the impulse.


The Fourier transform of sampled function is

Where

Hence the convolution is

The summation in the last line shows that the Fourier transform of the sampled
function is an infinite, periodic sequence of copies of the transform of the original,
continuous function. The separation between copies is determined by the value of 1/ ΔT.

The quantity 1/ΔT is the sampling rate used to generate the sampled function. When the sampling rate is high enough to provide sufficient separation between the periods, and thus preserve the integrity of F(µ), the function is said to be over-sampled. When the sampling rate is just enough to preserve F(µ), the function is critically sampled. When the sampling rate is below the minimum required to maintain distinct copies of F(µ), and thus fails to preserve the original transform, the function is under-sampled.


Fig. (a) Fourier transform of a band-limited function. (b)-(d) Transforms of the corresponding
sampled function under the conditions of over-sampling, critically sampling, and under-
sampling, respectively.

Sampling Theorem:
A function f(t) whose Fourier transform is zero for values of frequencies outside a
finite interval [-µmax, µmax] about the origin is called a band-limited function. We can recover
f(t) from its sampled version- if we can isolate a copy of F(µ) from the periodic sequence of
copies of this function contained in 𝐹 (µ). 𝐹 (µ) is a continuous, periodic function with period
1/ ΔT. This implies that we can recover f(t) from that single period by using the inverse
Fourier transform. Extracting from 𝐹 (µ) a single period that is equal to F (µ) is possible if the
separation between copies is sufficient with separation period
1/ΔT > 2µmax
This equation indicates that a continuous, band-limited function can be recovered
completely from a set of its samples if the samples are acquired at a rate exceeding twice the
highest frequency content of the function. This result is known as the sampling theorem.


Fig. (a) Transform of a band-limited function. (b) Transform resulting from critically
sampling the same function.
2.9. Extension to Functions of Two Variables
The 2-D Impulse and Its Sifting Property:
The impulse, δ (t, z), of two continuous variables, t and z, is defined as in

The 2-D impulse exhibits the sifting property under integration,

For discrete variables x and y, the 2-D discrete impulse is defined as


Where f(x, y) is a function of discrete variables x and y. For an impulse located at


coordinates (x0, y0) the sifting property is

The 2-D Continuous Fourier Transform Pair

Let f (t, z) be a continuous function of two continuous variables, t and z. The two-
dimensional, continuous Fourier transform pair is given by the expressions

Two-Dimensional Sampling and the 2-D Sampling Theorem:


Sampling in two dimensions can be modeled using the sampling function

Where ΔT and ΔZ are the separations between samples along the t- and z-axis of the
continuous function f (t, z). Function f(t, z) is said to be band-limited if its Fourier Transform
is zero outside a rectangle established by the intervals [-µmax, µmax] and [-vmax , vmax] that is,

The two-dimensional sampling theorem states that a continuous, band-limited


function f(t, z) can be recovered with no error from a set of its samples if the sampling
intervals are,


The 2-D Discrete Fourier Transform and Its Inverse


The discrete Fourier transform and its inverse Fourier transform of an image f(x, y) of
size M×N is defined as

The Fourier spectrum, phase angle, and power spectrum are defined as

Where R(u, v) and I(u, v) are the real and imaginary parts of F(u, v). Some properties of the 2-D Fourier transform are listed below,


2.10. The Basics of Filtering in the Frequency Domain


Filtering in the frequency domain is based on modifying the Fourier transform of an image and then computing the inverse transform to obtain the processed result.
g(x,y) = F-1[H(u,v)F(u,v)]
Fundamental steps involved in the Frequency Domain are:
1. Given an input image f(x,y) of size M×N, obtain padding parameters P and Q.
Typically, P=2M and Q=2N.
2. Form a padded image fp(x,y) of size P×Q by appending the necessary number of
zeros to f(x,y).
3. Multiply fp(x,y) by (-1)x+y to centre its transform.
4. Compute the DFT, F(u,v), of the image from step 3.
5. Generate a real, symmetric filter function, H(u,v), of size P×Q with centre at
coordinates (P/2,Q/2). Form the product G(u,v)=H(u,v)F(u,v) using array
multiplication.


6. Obtain the processed image:

7. Obtain the final processed result, g(x,y), by extracting the M×N region from the top,
left quadrant of gp(x,y).
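The seven steps map directly onto a NumPy sketch; `make_H` stands for any routine that builds the centered P×Q transfer function (the name is an assumption):

```python
import numpy as np

def frequency_domain_filter(f, make_H):
    """Filter image f in the frequency domain following steps 1-7 above."""
    M, N = f.shape
    P, Q = 2 * M, 2 * N                                   # step 1: padding parameters
    fp = np.zeros((P, Q)); fp[:M, :N] = f                 # step 2: zero-padded image
    x, y = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    fp = fp * (-1.0) ** (x + y)                           # step 3: multiply by (-1)^(x+y) to center the transform
    F = np.fft.fft2(fp)                                   # step 4: compute the DFT
    H = make_H(P, Q)                                      # step 5: real, symmetric filter centered at (P/2, Q/2)
    G = H * F                                             #         array multiplication
    gp = np.real(np.fft.ifft2(G)) * (-1.0) ** (x + y)     # step 6: inverse DFT, real part, undo centering
    return gp[:M, :N]                                     # step 7: extract the M-by-N region
```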
2.11. Image Smoothing using Frequency Domain Filters:
Smoothing is achieved in the frequency domain filtering by attenuating a specified
range of high frequency components in the transform of a given image.
Ideal Low pass Filter
The ideal low pass filter (ILPF) passes without attenuation all frequencies within a circle of radius D0 from the origin and “cuts off” all frequencies outside this circle. The 2-D ideal low pass filter transfer function is

Where D(u,v) is the distance between a point (u,v) and the centre of the frequency
rectangle:

Fig: 3.3 (a) Perspective plot of an ideal Low pass Filter transfer function. (b) Filter displayed
as an image. (c) Filter Radial cross section.

The point of transition between H(u,v) = 1 and H(u,v) = 0 is called the cutoff
frequency. The sharp cutoff of an ILPF cannot be realized with electronic components, and it produces a ringing effect in which a series of lines of decreasing intensity lies parallel to edges. To avoid this ringing effect, Gaussian low-pass or Butterworth low-pass filters are preferred.


Butterworth Low pass Filter:


The transfer functions of a Butterworth low-pass filter (BLPF) of order n and with
cutoff frequency at a distance D0 from the origin:

As n increases, the filter becomes sharper and the ringing in the spatial domain increases. For n = 1 the BLPF produces no ringing; for n = 2 ringing is present but usually imperceptible.

Fig: 3.4 (a) Perspective plot of a Butterworth Low pass Filter transfer function. (b) Filter
displayed as an image. (c) Filter Radial cross section of orders through n=1 to 4.
Gaussian Low pass Filter:
The transfer function of a 2D Gaussian low-pass filter (GLPF) is defined as

The Gaussian LPF transfer function is controlled by the value of cut-off frequency D0.
The advantage of the Gaussian filter is that it never causes ringing effect.

Fig. (a) Perspective plot of a Gaussian Low pass Filter transfer function. (b) Filter displayed as an image. (c) Filter Radial cross section for various values of D0.
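The three transfer functions can be sketched as follows; the expressions are the standard forms of these filters, used here because the equations themselves are not reproduced above:

```python
import numpy as np

def distance_from_center(P, Q):
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    return np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)    # D(u, v)

def ideal_lpf(P, Q, D0):
    return (distance_from_center(P, Q) <= D0).astype(float)

def butterworth_lpf(P, Q, D0, n=2):
    D = distance_from_center(P, Q)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))

def gaussian_lpf(P, Q, D0):
    D = distance_from_center(P, Q)
    return np.exp(-(D ** 2) / (2.0 * D0 ** 2))
```

Any of these can be passed as the make_H argument of the frequency-domain sketch given earlier, e.g. lambda P, Q: gaussian_lpf(P, Q, D0=60).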


2.12. Image Sharpening using Frequency Domain Filters:


Image sharpening can be achieved in the frequency domain by High-pass filtering
which attenuates the low frequency components without disturbing high-frequency
components of the Fourier transform of the image. A high-pass filter HHP(u, v) can be obtained from a given low-pass filter HLP(u, v) by

HHP (u, v) = 1- HLP (u, v)

Ideal High pass Filter


A 2-D Ideal High pass Filter (IHPF) is defined as:

The IHPF is the opposite of the ILPF in the sense that it sets to zero all frequencies inside a circle of radius D0 while passing, without attenuation, all frequencies outside the circle.

Butterworth High pass Filter


A 2-D Butterworth High pass Filter of order n and cutoff frequency Do is defined as

The order n determines the sharpness of the cutoff value and the amount of ringing.
The transition into higher values of cutoff frequencies is much smoother with the BHPF.

Gaussian High pass Filter


The transfer function of the Gaussian high pass filter with cutoff frequency locus at a
distance Do from the center of the frequency rectangle is given by

The results obtained by using GHPF are more gradual than with the IHPF, BHPF
filters. Even the filtering of the smaller objects and thin bars is cleaner with the Gaussian
filter.


Fig. Perspective plot, Filter displayed as an image and Filter Radial cross section.
2.13. The Laplacian in the Frequency Domain
The Laplacian for an image f(x, y) is defined as
∇²f(x, y) = d²f/dx² + d²f/dy²

We know that,

The Fourier Transform of Laplacian for an image f(x, y) is

Then

Hence the Laplacian can be implemented in the frequency domain by using the filter


In all filtering operations, the assumption is that the origin of F(u, v) has been centered by performing the operation f(x, y)(−1)^(x+y) prior to taking the transform of the image. If f (and F) are of size M×N, this operation shifts the center of the transform so that (u, v) = (0, 0) is at point (M/2, N/2) in the frequency rectangle. As before, the center of the filter function also needs to be shifted:

The Laplacian-filtered image in the spatial domain is obtained by computing the


inverse Fourier transform of H (u, v) F (u, v):

Conversely, computing the Laplacian in the spatial domain and computing the Fourier
transform of the result is equivalent to multiplying F(u, v) by H(u, v). Hence this dual
relationship in the Fourier-transform-pair notation,

The Enhanced image g(x, y) can be obtained by subtracting the Laplacian from the
original image,

2.14 Unsharp Masking, High-Boost Filtering & High-Frequency Emphasis Filtering
Using frequency domain, the filter mask can be defined as
gmask (x, y) = f(x, y) - fLP(x, y)
With
fLP(x, y)= F-1[HLP(u,v)F(u,v)]
Where HLP is a low pass filter, F(u,v) is the Fourier transform of f(x,y), and fLP(x,y) is a
smoothed image. Then
g(x, y) = f(x,y)+k* gmask(x,y)
This expression defines unsharp masking when k=1 and high boost filtering when
k>1. Using frequency domain the g(x, y) can be expressed as
g(x, y)= F-1{[1+k*[1-HLP(u,v)]]F(u,v)}

g(x, y)= F-1{[1+k*HHP(u,v)]F(u,v)}


The expression contained within the square brackets is called “High-Frequency


Emphasis filter”. The high-pass filter sets the dc term to zero, thus reducing the average
intensity in the filtered image to 0. The high-frequency emphasis filter does not have this
problem because of the 1 that is added to the high pass filter. The constant k gives control
over the proportion of high frequencies that influence the final result. Now, the high
frequency-emphasis filtering is expressed as,
g(x, y)= F-1{[k1+k2*HHP(u,v)]F(u,v)}
Where k1 ≥ 0 controls the offset from the origin and k2 ≥ 0 controls the contribution of high frequencies.
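A minimal sketch of building the high-frequency emphasis transfer function from a low-pass one; the particular values of k1 and k2 are assumptions:

```python
import numpy as np

def high_frequency_emphasis(H_lp, k1=0.5, k2=1.5):
    """Form k1 + k2 * HHP(u, v) from a low-pass transfer function HLP(u, v)."""
    H_hp = 1.0 - H_lp           # high-pass filter obtained from the low-pass filter
    return k1 + k2 * H_hp       # k1: offset from the origin, k2: contribution of high frequencies
```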
2.15. Homomorphic filtering
The illumination-reflectance model can be used to develop a frequency domain
procedure for improving the appearance of an image by simultaneous gray-level range
compression and contrast enhancement. An image f(x, y) can be expressed as the product of
illumination and reflectance components,

This equation cannot be used directly to operate separately on the frequency


components of illumination and reflectance because the Fourier transform of the product of
two functions is not separable; in other words,

Where Fi(u, v) and Fr(u, v) are the Fourier transforms of ln i(x, y) and ln r(x, y), respectively. If we process Z(u, v) by means of a filter function H(u, v) then, from


Where the filtered image in the spatial domain is

Now we can express

Finally, z (x, y) was formed by taking the logarithm of the original image f (x, y), and
the inverse operation yields the desired enhanced image, denoted by g(x, y).

Where

are the illumination and reflectance components of the output image. The filtering
approach is summarized in the following figure.

Fig: Summary of steps in Homomorphic Filtering


This enhancement approach is based on a special case of a class of systems known as
homomorphic systems. In this particular application, the key to the approach is the separation
of the illumination and reflectance components achieved. The homomorphic filter function H
(u, v) can then operate on these components separately. The illumination component of an
image generally is characterized by slow spatial variations, while the reflectance component
tends to vary abruptly, particularly at the junctions of dissimilar objects.
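A sketch of the homomorphic chain (logarithm, DFT, filter, inverse DFT, exponential). The specific filter function H used here, a Gaussian-shaped high-frequency-emphasis curve with limits γL and γH, is a commonly used choice and an assumption, as are all parameter values:

```python
import numpy as np

def homomorphic_filter(f, D0=30.0, gamma_L=0.5, gamma_H=2.0, c=1.0):
    f = f.astype(float)
    z = np.log1p(f)                                       # z(x, y) = ln(1 + f) avoids ln(0)
    Z = np.fft.fftshift(np.fft.fft2(z))                   # centered transform of the log image
    M, N = f.shape
    u, v = np.meshgrid(np.arange(M), np.arange(N), indexing='ij')
    D2 = (u - M / 2) ** 2 + (v - N / 2) ** 2
    H = (gamma_H - gamma_L) * (1.0 - np.exp(-c * D2 / D0 ** 2)) + gamma_L   # assumed filter shape
    s = np.real(np.fft.ifft2(np.fft.ifftshift(H * Z)))    # filtered log image
    return np.expm1(s)                                    # exponentiate to undo the logarithm
```

With gamma_L < 1 and gamma_H > 1 this attenuates the slowly varying illumination component while amplifying the reflectance detail.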


2.16. Selective filtering


The low-pass and high-pass filters discussed so far operate over the entire frequency rectangle. In applications where it is of interest to process only specific bands of frequencies, selective filters are used. They are,
 Band Pass Filter
 Band Reject Filter
 Notch Filters

Band Reject and Band Pass Filter


The Band Reject Filter transfer function is defined as

Where W is the width of the band, D is the distance D(u, v) from the centre of the filter, D0 is the cutoff frequency, and n is the order of the Butterworth filter. Band reject filters are very effective in removing periodic noise, and the ringing effect they introduce is normally small.
A band pass filter is obtained from the band reject filter as


Fig: Band Reject Filter and its corresponding Band Pass Filter


Notch Filters
A notch filter rejects (or passes) frequencies in a predefined neighborhood about the
centre of the frequency rectangle. It is constructed as products of high pass filters whose
centers have been translated to the centers of the notches. The general form is defined as

Where Hk(u,v ) and H-k(u,v ) are high pass filters whose centers are at (uk, vk) and
(-uk, -vk) respectively. These centers are specified with respect to the center of the frequency
rectangle (M/2, N/2). The distance computations for each filter are defined as

A Notch Pass filter (NP) is obtained from a Notch Reject filter (NR) using:

For example the Butterworth notch reject filter of order n, containing three notch pairs
is defined as

PREVIOUS QUESTIONS
1. What is meant by image enhancement? Explain the various approaches used in image
enhancement.
2. a) Explain the Gray level transformation. Give the applications.
b) Compare frequency domain methods and spatial domain methods used in image
enhancement
3. Explain in detail about histogram processing.
4. What is meant by Histogram Equalization? Explain.
5. Explain how Fourier transforms are useful in digital image processing?
6. What is meant by Enhancement by point processing? Explain.


7. Discuss in detail about the procedure involved in Histogram matching.


8. Prove that for continuous signal Histogram equalization results in flat histogram
9. Differentiate the spatial image enhancement and image enhancement in frequency
domain?
10. Explain in detail image averaging and image subtraction.
11. What is meant by Histogram of an image? Sketch Histograms of basic image types.
12. Explain about image smoothing using spatial filters
13. Explain the following concepts: Local enhancement.
14. Explain spatial filtering in image enhancement.
15. Explain about fuzzy techniques for intensity transformations.
16. Explain about the basic of filtering in the frequency domain.
17. Explain image smoothing using frequency domain filters.
18. Explain about the discrete Fourier transform (DFT) of one variable and two variables.
19. Explain about selective filtering
20. What is homomorphic filter? How to implement it?
21. Discuss about the properties of 2-D Discrete Fourier transform.
22. Explain the process of sampling in two dimensional functions.
23. Explain the process of two dimensional convolution?




UNIT-3
IMAGE RESTORATION AND RECONSTRUCTION
Introduction
Image Restoration is the process to recover an image that has been degraded by using
a priori knowledge of the degradation phenomenon. These techniques are oriented toward
modeling the degradation and applying the inverse process to recover the original image.
Restoration improves an image in some predefined sense. Image enhancement techniques are a subjective process, whereas image restoration techniques are an objective process.

3.1 A Model of Image Degradation/Restoration Process


In the image degradation process, a degradation function operates on an input image and an additive noise term is added to produce the degraded image. The image degradation model is shown below.

Fig: A Model of Image Degradation/Restoration Process

Let f(x, y) is an input image and g(x, y) is the degraded image with some knowledge
about the degradation function H and some knowledge about the additive noise term η(x, y).
The objective of the restoration is to obtain an estimate 𝑓(x, y) of the original image. If H is a
linear, position-invariant process, then the degraded image is given in the spatial domain by
g(x,y)=f(x,y)*h(x,y)+η(x,y)
Where h(x, y) is the spatial representation of the degraded function. The degrade image in
frequency domain is represented as
G(u,v)=F(u,v)H(u,v)+N(u,v)
The terms in the capital letters are the Fourier Transform of the corresponding terms
in the spatial domain.


3.2 Noise Models


The principal sources of noise in digital images are image acquisition and transmission.
 During image acquisition, the performance of image sensors gets affected by a variety
of factors such as environmental conditions and the quality of sensing elements.
 During image transmission, the images are corrupted due to the interference
introduced in the channel used for transmission.
The Noise components are considered as random variables, characterized by a
probability density function. The most common PDFs found in digital image processing
applications are given below.
Gaussian Noise
Gaussian noise is also known as „normal‟ noise. The Probability density function of a
Gaussian random variable z is given by

Where z represents intensity, z̄ is the mean, σ is the standard deviation, and its square (σ²) is the variance of z. Approximately 70% of the values of Gaussian noise lie in the range [(z̄ − σ), (z̄ + σ)] and about 95% lie in the range [(z̄ − 2σ), (z̄ + 2σ)].
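A sketch of adding zero-mean Gaussian noise to an image for testing restoration filters; the mean and standard deviation values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def add_gaussian_noise(f, mean=0.0, sigma=20.0):
    noisy = f.astype(float) + rng.normal(mean, sigma, size=f.shape)
    return np.clip(noisy, 0, 255)      # keep intensities in the 8-bit range (assumed)
```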

Rayleigh Noise
The PDF of Rayleigh Noise is given by


Applications:
 It is used for characterizing noise phenomenon in range imaging.
 It describes the error in the measurement instrument.
 It describes the noise affected in radar.
 It determines the noise occurred when the signal is passed through the band pass
filter.

Erlang (gamma) Noise

The probability density function of Erlang noise is given by


Exponential Noise
The probability density function of Exponential noise is given by

p(z) is maximum at z=0


Mean: z̄ = 1/a        Variance: σ² = 1/a²

Applications:
 It is used to describe the size of the raindrop.
 It is used to describe the fluctuations in received power reflected from certain targets
 It finds application in Laser imaging.
Uniform Noise
The probability density function of Uniform noise is given by


Salt and Pepper Noise (Impulse Noise)


The probability density function of Salt and Impulse noise is given by

If b > a, gray level b will appear as a light dot in the image and level a will appear as a dark dot. Salt-and-pepper noise is also called bipolar impulse noise, data-drop-out noise, or spike noise.
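A sketch of corrupting an image with salt-and-pepper noise; the probabilities and the levels a = 0, b = 255 are assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def add_salt_and_pepper(f, pa=0.05, pb=0.05, a=0, b=255):
    """Pepper (level a) with probability pa, salt (level b) with probability pb."""
    noisy = f.copy()
    r = rng.random(f.shape)
    noisy[r < pa] = a                        # dark dots (pepper)
    noisy[(r >= pa) & (r < pa + pb)] = b     # light dots (salt)
    return noisy
```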

Periodic Noise
Periodic noise in an image arises from electrical or electromechanical interference during image acquisition. It is the only type of spatially dependent noise considered here, and its parameters are estimated from the Fourier spectrum of the image. Periodic noise tends to
produce frequency spikes that often can be detected even by visual analysis. The mean and
variance are defined as


3.3 Restoration in the Presence of Noise only-Spatial Filtering


When the only degradation present in an image is noise,
g(x, y) = f(x,y)+ η(x,y)
and
G (u, v) = F (u, v) + N (u, v)
The noise terms are unknown so subtracting them from g(x, y) or G (u, v) is not a
realistic approach. In the case of periodic noise it is possible to estimate N (u, v) from the
spectrum G (u, v). So N (u, v) can be subtracted from G (u, v) to obtain an estimate of
original image. Spatial filtering can be done when only additive noise is present.

Mean Filters

Arithmetic Mean Filter:


It is the simplest mean filter. Let Sxy represents the set of coordinates in the sub image
of size m*n centered at point (x, y). The arithmetic mean filter computes the average value of
the corrupted image g(x, y) in the area defined by Sxy. The value of the restored image f at
any point (x, y) is the arithmetic mean computed using the pixels in the region defined by Sxy.

This operation can be implemented using a convolution mask in which all coefficients have the value 1/mn. A mean filter smooths local variations in an image, and noise is reduced as a result of blurring.

Geometric Mean Filter:


An image restored using a geometric mean filter is given by the expression

Here, each restored pixel is given by the product of the pixel in the sub-image
window, raised to the power 1/mn. A Geometric means filter achieves smoothing comparable
to the arithmetic mean filter, but it tends to lose image details in the process.


Harmonic Mean Filter:


The harmonic mean filtering operation is given by the expression

The harmonic mean filter works well for salt noise but fails for pepper noise. It does
well also with other types of noise.
Contra harmonic Mean Filter:
The contra harmonic mean filter yields a restored image based on the expression

Where Q is called the order of the filter and this filter is well suited for reducing the
effects of salt and pepper noise. For positive values of Q the filter eliminates pepper noise.
For negative values of Q it eliminates salt noise. It cannot do both simultaneously. The contra
harmonic filter reduces to arithmetic mean filter if Q=0 and to the harmonic filter if Q= -1.
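A brute-force sketch of the contraharmonic mean filter of order Q; the small epsilon used to avoid raising zero to a negative power is an implementation assumption:

```python
import numpy as np

def contraharmonic_mean(g, size=3, Q=1.5):
    """Q > 0 reduces pepper noise, Q < 0 reduces salt noise; Q = 0 is the arithmetic mean, Q = -1 the harmonic mean."""
    a = size // 2
    gp = np.pad(g.astype(float), a, mode='edge')
    out = np.zeros(g.shape, dtype=float)
    eps = 1e-12                                            # avoids 0 raised to a negative power (assumed)
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            w = gp[x:x + size, y:y + size] + eps
            out[x, y] = np.sum(w ** (Q + 1)) / np.sum(w ** Q)
    return out
```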

Order-Statistics Filters
Order-statistics filters are spatial filters whose response is based on ordering the pixels
contained in the image area encompassed by the filter. The response of the filter at any point
is determined by the ranking result.
Median Filter:
It is the best known order statistic filter. It replaces the value of a pixel by the median
of gray levels in the Neighborhood of the pixel.

The value of the pixel at (x, y) is included in the computation of the median. Median
filters are quite popular because for certain types of random noise, they provide excellent
noise reduction capabilities with considerably less blurring than smoothing filters of similar
size. These are effective for bipolar and unipolar impulse noise.


Max and Min Filters:


The median filter represents the 50th percentile of a ranked set of numbers. Using the 100th percentile results in the so-called max filter, which can be defined as

This filter is useful for finding the brightest points in an image. Because pepper noise has very low values, it is reduced by the max filter through the max selection process in the subimage area Sxy.
The 0th percentile filter is the min filter

This filter is useful for finding the darkest points in an image. Also, it reduces salt noise as a result of the min operation.
Midpoint Filter:
The midpoint filter simply computes the midpoint between the maximum and
minimum values in the area encompassed by the filter.

It combines order statistics and averaging. This filter works best for randomly distributed noise, like Gaussian or uniform noise.

Alpha-trimmed mean Filter:


Suppose we delete the d/2 lowest and the d/2 highest intensity values of g(s, t) in the neighborhood Sxy, and let gr(s, t) represent the remaining mn − d pixels. A filter formed by averaging these remaining pixels is called an alpha-trimmed mean filter.

The value of d can range from 0 to mn − 1. If d = 0 this filter reduces to the arithmetic mean filter. If d = mn − 1, the filter becomes a median filter. For other values of d the alpha-trimmed mean filter is useful for multiple types of noise, such as a combination of salt-and-pepper and Gaussian noise.


3.4 Adaptive Filters


An adaptive filter is one whose behavior changes based on the statistical characteristics of the image inside the filter region Sxy.
Adaptive, local noise reduction filter
The simplest statistical measures of a random variable are its mean and variance. The
mean gives a measure of average intensity in the region over which the mean is computed
and the variance gives a measure of contrast in that region.
Let the filter is operate on a local region SXY. The response of the filter at any point
(x, y) is based on four quantities: (a) g(x, y), the value of noisy image at (x, y); (b) 𝜎𝜂2 the
variance of the noise corrupting f(x, y) to form g(x, y); (c) mL, the local mean of the pixels in
SXY; and (d) 𝜎𝐿2 , the local variance of the pixels in SXY. Hence the behavior of the filter is,
 If 𝜎𝜂2 is zero, the filter should return simply the value of g(x, y).
 If the local variance is high relative to 𝜎𝜂2 that means (𝜎𝐿2 > 𝜎𝜂2 ), the filter should
return a value close to g(x, y).
 If the two variances are equal, the filter returns the arithmetic mean value of the pixel
in SXY.
An adaptive filter for obtaining the restored image is

2

fˆ ( x, y )  g ( x, y )  2 g ( x, y )  mL 
L
The only quantity that needs to be known or estimated is the variance of the overall
noise is 𝜎𝜂2 . The other parameters are computed from the pixels in SXY.
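A sketch of this adaptive filter; capping the variance ratio at 1 when the local variance is smaller than the noise variance is a common safeguard and an assumption here:

```python
import numpy as np

def adaptive_local_filter(g, noise_var, size=7):
    """f_hat(x, y) = g(x, y) - (noise_var / local_var) * (g(x, y) - local_mean)."""
    a = size // 2
    gp = np.pad(g.astype(float), a, mode='edge')
    out = np.zeros(g.shape, dtype=float)
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            w = gp[x:x + size, y:y + size]
            m_L, var_L = w.mean(), w.var()
            ratio = 1.0 if var_L == 0 else min(noise_var / var_L, 1.0)   # capped at 1 (assumed safeguard)
            out[x, y] = g[x, y] - ratio * (g[x, y] - m_L)
    return out
```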
Adaptive median filter
Adaptive median filters seek to preserve detail while smoothing non-impulse noise. The filter changes the size of Sxy during the filtering operation, depending on certain conditions. The output of the filter is a single value used to replace the value of the pixel at (x, y).
Let us consider the following parameters,

zmin = minimum intensity value in SXY


zmax = maximum intensity value in SXY
zmed = median of intensity values in SXY
zxy = intensity value at co-ordinates (x, y)
Smax = maximum allowed size of SXY


The adaptive median filtering algorithm works in two stages, denoted as stage A and
stage B as follows:
Stage A: A1 = zmed - zmin
A2 = zmed - zmax
If A1>0 AND A2<0, go to stage B
Else increase the window size
If window size ≤ Smax repeat stage A
Else output zmed
Stage B: B1 = zxy - zmin
B2 = zxy - zmax
If B1>0 AND B2<0, output zxy.
Else output zmed
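Stages A and B translate directly into the following sketch (the maximum window size Smax = 7 is an assumed default):

```python
import numpy as np

def adaptive_median_filter(g, S_max=7):
    pad = S_max // 2
    gp = np.pad(g.astype(float), pad, mode='edge')
    out = np.zeros(g.shape, dtype=float)
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            size = 3
            while True:
                a = size // 2
                w = gp[x + pad - a:x + pad + a + 1, y + pad - a:y + pad + a + 1]
                z_min, z_max, z_med = w.min(), w.max(), np.median(w)
                z_xy = float(g[x, y])
                if z_min < z_med < z_max:                                  # stage A: A1 > 0 and A2 < 0
                    out[x, y] = z_xy if z_min < z_xy < z_max else z_med    # stage B
                    break
                size += 2                                                   # increase the window size
                if size > S_max:
                    out[x, y] = z_med                                       # window exceeded Smax: output z_med
                    break
    return out
```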
3.5 Periodic Noise Reduction by Frequency Domain Filtering
Periodic noise appears in the Fourier transform as concentrated bursts of energy at locations corresponding to the frequencies of the periodic interference. It can be removed by using selective filters.
Band Reject Filter:
The Band Reject Filter transfer function is defined as

Where W is the width of the band, D is the distance D(u, v) from the centre of the filter, D0 is the cutoff frequency, and n is the order of the Butterworth filter. Band reject filters are very effective in removing periodic noise, and the ringing effect they introduce is normally small.
The perspective plots of these filters are

Fig: Perspective plots of (a) Ideal (b) Butterworth and (c) Gaussian Band Reject Filters


Band Pass Filter:


A band pass filter performs the opposite operation of a band reject filter. A
Band pass filter is obtained from the band reject filter as

Notch Filters:
A notch filter rejects (or passes) frequencies in a predefined neighborhood about the
centre of the frequency rectangle. It is constructed as products of high pass filters whose
centers have been translated to the centers of the notches. The general form is defined as

Where Hk(u,v ) and H-k(u,v ) are high pass filters whose centers are at (uk, vk) and
(-uk, -vk) respectively. These centers are specified with respect to the center of the frequency
rectangle (M/2, N/2).

Fig: Perspective plots of (a) Ideal (b) Butterworth and (c) Gaussian Notch Reject Filters

A Notch Pass filter (NP) is obtained from a Notch Reject filter (NR) using:


3.6 Linear, Position-Invariant Degradations


The input–output relationship before the restoration stage is expressed as
g(x,y)=H[ f(x,y)] +η(x,y)
Let us assume that η(x, y) =0 then
g(x, y)=H[ f(x,y)]
If H is linear
H[af1(x,y)+bf2(x,y)]=aH[f1(x,y)]+bH[f2(x,y)]
Where a and b are scalars. f1(x,y) and f2(x,y) are any two input images. If a=b=1
H[f1(x,y)+f2(x,y)]=H[f1(x,y)]+H[f2(x,y)]
It is called the property of additivity. This property says that, if H is a linear operator,
the response to a sum of two inputs is equal to the sum of the two responses.
An operator having the input–output relationship g(x, y) = H[f(x, y)] is said to
be position invariant if
H[f(x-α, y-β)] = g(x-α, y-β)
It indicates that the response at any point in the image depends only on the value of
the input at that point not on its position. If the impulse signal can be considered


3.7 Estimating the Degradation Function


There are three principal ways to estimate the degradation function for use in image restoration: (1) observation, (2) experimentation, and (3) mathematical modeling.
Estimation by Image observation:
In this process the degradation function is estimated by observing the Image. Select
the sub image whose signal content is strong. Let the observed sub image be denoted by
gs(x, y) and the processed sub image is 𝑓 (x, y). The estimated degradation function can be
expressed as

From the characteristics of this equation, we then deduce the complete degradation
function H (u, v) based on the consideration of position invariance.
Estimation by Experimentation:
The degradation function can be estimated accurately when equipment identical to the one used to obtain the degraded image is available. Images similar to the degraded image can be acquired
with various system settings until they are degraded as closely as possible to the image we
wish to restore. Now obtain the impulse response of the degradation by imaging an impulse
using the same system settings.
An impulse is simulated by a bright dot of light, as bright as possible to reduce the
effect of noise. Then the degradation image can be expressed as

Where G(u, v) is the Fourier transform of observed image and A is a constant


describing the strength of the impulse.
Estimation by Modeling:
The degradation function can also be estimated by mathematical modeling, which can take into account the environmental conditions that cause the degradation. For example, an atmospheric turbulence model can be expressed as

Where k is a constant that depends on the nature of the turbulence. Let f(x, y) is an
image that undergoes planar motion and that xo (t) and yo (t) are the time varying components
in the direction of x and y. The total blurring image g(x, y) is expressed as


The Fourier Transform of g(x, y) is G(u, v)

Reversing the order of the integration then

Let us define H (u, v)

Then the above expression can be expressed as

If the motion variables xo(t) and yo(t) are known, then the degradation function H(u, v) becomes

3.8 Inverse Filtering


The simplest approach to restoration is direct inverse filtering, where we compute an estimate F̂(u, v) of the transform of the original image simply by dividing the transform of the degraded image G(u, v) by the degradation function:

We know that G (u,v) = H(u, v)F(u, v) + N(u, v)


Then

From the above expression we observe that even if we know the degradation function we cannot recover the undegraded image exactly, because N(u, v) is not known. Moreover, if the degradation function has zero or very small values, the ratio N(u, v)/H(u, v) can easily dominate the estimate F̂(u, v). One way to avoid this problem is to limit the filter to frequencies near the origin, because H(u, v) is usually largest there.
3.9 Minimum Mean Square Error Filtering (Wiener Filtering)
This filtering process incorporates both the degradation function and the statistical characteristics of noise. The image and noise in this method are considered as random
variables. The objective is to find an estimate 𝑓 of the uncorrupted image f such that the
mean square error between them is minimized. This error is measured by

Where E { } is the expected value of the argument. It is assumed that noise and image
are uncorrelated; one or the other has zero mean; the intensity levels in the estimate are a
linear function of the levels in the degraded image. Based on these conditions the minimum
of the error function in frequency domain is given by

This result is known as the “Wiener filter”. The term inside the brackets is commonly referred to as the minimum mean square error filter or the least square error filter. It does not have the same problem as the inverse filter with zeros in the degradation function, unless the entire denominator is zero for the same values of u and v.
H(u, v) = degradation function
H*(u, v) = complex conjugate of H(u, v)


The signal to noise ratio in frequency domain

The signal to noise ratio in spatial domain

The mean square error is obtained by using the expression

The modified expression to estimate 𝑓 by using minimum mean square error filtering
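In practice the unknown noise-to-signal power-spectrum ratio in the Wiener expression is often replaced by a constant K that is tuned interactively; a sketch of that simplified frequency-domain filter (the constant-K substitution and its default value are assumptions):

```python
import numpy as np

def wiener_filter(G, H, K=0.01):
    """Estimate F_hat(u, v) from the degraded spectrum G and degradation function H."""
    # Equivalent to (1/H) * |H|^2 / (|H|^2 + K) applied to G, written to avoid dividing by H directly.
    return (np.conj(H) / (np.abs(H) ** 2 + K)) * G
```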

3.10 Constrained Least Square Filtering


The problem with the wiener filter is that it is necessary to know the power spectrum
of noise and image. The degraded image is given in the spatial domain by
g(x,y)=f(x,y)*h(x,y)+η(x,y)
In vector-matrix form

Where g, f, and η are vectors of dimension MN×1 and H is a matrix of dimension MN×MN, which is very large and highly sensitive to noise. In constrained least squares filtering, optimality of restoration is based on a measure of smoothness, such as the second derivative (Laplacian) of the image. The problem is to find the minimum of a criterion function C, defined as


subject to the constraint

The frequency domain solution to this optimization problem is given by the


expression

Where γ is a parameter that must be adjusted so that constraint is satisfied and P (u, v)
is the Fourier transform of the function

It is possible to adjust the parameter γ interactively until acceptable results are


achieved. A procedure for computing γ by iteration is as follows. Define a residual vector r as

Since F̂(u, v) is a function of γ, r is also a function of this parameter, and it can be shown that φ(γ) = rᵀr is a monotonically increasing function of γ.

We want to adjust γ so that

Where a is an accuracy factor; if this condition holds, the constraint is satisfied. Because φ(γ) is monotonic, finding the desired value of γ is not difficult.

The variance and the mean of the entire image are


Hence the noise is

3.11 Geometric Mean Filter


The geometric mean filter is a slight generalization of the Wiener filter, having the form

with α and β being positive real constants. Based on the values of α and β, the geometric mean filter reduces to different filters:
α = 1 => inverse filter
α = 0 => parametric Wiener filter (standard Wiener filter when β = 1)
α = 1/2 => actual geometric mean
α = 1/2 and β = 1 => spectrum equalization filter
PREVIOUS QUESTIONS
1. What is meant by image restoration? Explain the image degradation model
2. Discuss about the noise models
3. Explain the concept of algebraic image restoration
4. Discuss the advantages and disadvantages of wiener filter with regard to image
restoration.
5. Explain about noise modeling based on distribution function
6. Explain about wiener filter in noise removal
7. What is geometric mean filter? Explain
8. Explain the following. a) Minimum Mean square error filtering. b) Inverse filtering.
9. Discuss about Constrained Least Square restoration of a digital image in detail.
10. Explain in detail about different types of order statistics filters for Restoration.
11. Name different types of estimating the degradation function for use in image
restoration and explain in detail estimation by modeling.
12. Explain periodic noise reduction by frequency domain filtering
13. Explain adaptive filter and also what the two levels of adaptive median filtering
algorithms are




UNIT-5
Wavelets and Multi-resolution Processing
Image Compression
Introduction
In recent years, there have been significant advancements in algorithms and
architectures for the processing of image, video, and audio signals. These advancements have
proceeded along several directions. On the algorithmic front, new techniques have led to the
development of robust methods to reduce the size of the image, video, or audio data. Such
methods are extremely vital in many applications that manipulate and store digital data.
Informally, we refer to the process of size reduction as a compression process. We will
define this process in a more formal way later. On the architecture front, it is now feasible to
put sophisticated compression processes on a relatively low-cost single chip; this has spurred
a great deal of activity in developing multimedia systems for the large consumer market.
One of the exciting prospects of such advancements is that multimedia information
comprising image, video, and audio has the potential to become just another data type. This
usually implies that multimedia information will be digitally encoded so that it can be
manipulated, stored, and transmitted along with other digital data types. For such data usage
to be pervasive, it is essential that the data encoding is standard across different platforms
and applications. This will foster widespread development of applications and will also
promote interoperability among systems from different vendors. Furthermore, standardisation
can lead to the development of cost-effective implementations, which in turn will promote
the widespread use of multimedia information. This is the primary motivation behind the
emergence of image and video compression standards.
Compression is a process intended to yield a compact digital representation of a
signal. In the literature, the terms source coding, data compression, bandwidth compression,
and signal compression are all used to refer to the process of compression. In the cases where
the signal is defined as an image, a video stream, or an audio signal, the generic problem of
compression is to minimise the bit rate of their digital representation. There are many
applications that benefit when image, video, and audio signals are available in compressed
form. Without compression, most of these applications would not be feasible!
Example 1: Let us consider facsimile image transmission. In most facsimile machines, the
document is scanned and digitised. Typically, an 8.5x11 inches page is scanned at 200 dpi;


thus, resulting in 3.74 Mbits. Transmitting this data over a low-cost 14.4 kbits/s modem
would require 5.62 minutes. With compression, the transmission time can be reduced to 17
seconds. This results in substantial savings in transmission costs.

Example 2: Let us consider a video-based CD-ROM application. Full-motion video, at 30


fps and a 720 x 480 resolution, generates data at 20.736 Mbytes/s. At this rate, only 31
seconds of video can be stored on a 650 MByte CD-ROM. Compression technology can
increase the storage capacity to 74 minutes, for VHS-grade video quality.

Image, video, and audio signals are amenable to compression due to the factors below.
There is considerable statistical redundancy in the signal.

1. Within a single image or a single video frame, there exists significant correlation
among neighbour samples. This correlation is referred to as spatial correlation.
2. For data acquired from multiple sensors (such as satellite images), there exists
significant correlation amongst samples from these sensors. This correlation is
referred to as spectral correlation.
3. For temporal data (such as video), there is significant correlation amongst samples in
different segments of time. This is referred to as temporal correlation.

The term data compression refers to the process of reducing the amount of data
required to represent a given quantity of information. A clear distinction must be made
between data and information. They are not synonymous. In fact, data are the means by
which information is conveyed. Various amounts of data may be used to represent the same
amount of information. Such might be the case, for example, if a long-winded individual and
someone who is short and to the point were to relate the same story. Here, the information of
interest is the story; words are the data used to relate the information. If the two individuals
use a different number of words to tell the same basic story, two different versions of the
story are created, and at least one includes nonessential data. That is, it contains data (or
words) that either provide no relevant information or simply restate that which is already
known. It is thus said to contain data redundancy.
Data redundancy is a central issue in digital image compression. It is not an abstract
concept but a mathematically quantifiable entity. If n1 and n2 denote the number of
information-carrying units in two data sets that represent the same information, the relative
data redundancy RD of the first data set (the one characterized by n1) can be defined as

RD = 1 - 1/CR

where CR, commonly called the compression ratio, is

CR = n1 / n2
For the case n2 = n1, CR = 1 and RD = 0, indicating that (relative to the second data
set) the first representation of the information contains no redundant data. When n2 << n1,
CR → ∞ and RD → 1, implying significant compression and highly redundant data. Finally, when
n2 >> n1, CR → 0 and RD → -∞, indicating that the second data set contains much more data
than the original representation. This, of course, is the normally undesirable case of data
expansion. In general, CR and RD lie in the open intervals (0, ∞) and (-∞, 1), respectively. A
practical compression ratio, such as 10 (or 10:1), means that the first data set has 10
information carrying units (say, bits) for every 1 unit in the second or compressed data set.
The corresponding redundancy of 0.9 implies that 90% of the data in the first data set is
redundant.
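These two quantities are simple to compute. A minimal sketch in Python, assuming only that the
sizes n1 and n2 of the two representations are known (the function name and example values are
illustrative, not from the text):

def redundancy_stats(n1, n2):
    """Compression ratio CR = n1/n2 and relative redundancy RD = 1 - 1/CR."""
    cr = n1 / n2
    rd = 1.0 - 1.0 / cr
    return cr, rd

# Example: 10 information-carrying units for every 1 unit in the compressed set
cr, rd = redundancy_stats(10, 1)
print(cr, rd)   # 10.0 0.9 -> 90% of the data in the first set is redundant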
In digital image compression, three basic data redundancies can be identified and
exploited: coding redundancy, interpixel redundancy, and psychovisual redundancy.
Data compression is achieved when one or more of these redundancies are reduced or
eliminated.
Coding Redundancy:
In this, we utilize formulation to show how the gray-level histogram of an image also
can provide a great deal of insight into the construction of codes to reduce the amount of data
used to represent it.
Let us assume, once again, that a discrete random variable rk in the interval [0, 1]
represents the gray levels of an image and that each rk occurs with probability pr(rk):

pr(rk) = nk / n,    k = 0, 1, 2, ..., L-1

where L is the number of gray levels, nk is the number of times that the kth gray level appears
in the image, and n is the total number of pixels in the image. If the number of bits used to
represent each value of rk is l(rk), then the average number of bits required to represent each
pixel is

Lavg = Σ l(rk) pr(rk),    with the sum taken over k = 0, 1, ..., L-1


That is, the average length of the code words assigned to the various gray-level values
is found by summing the product of the number of bits used to represent each gray level and
the probability that the gray level occurs. Thus the total number of bits required to code an M
X N image is MNLavg.
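As an illustrative sketch (not from the text), Lavg can be estimated directly from an image
histogram and a table of code-word lengths; natural 8-bit binary coding uses l(rk) = 8 for every
level:

import numpy as np

def average_code_length(image, code_lengths):
    """Lavg = sum of l(rk) * pr(rk) over k, with pr(rk) = nk / n."""
    levels = len(code_lengths)
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / image.size                            # pr(rk)
    return float(np.sum(np.asarray(code_lengths) * p))

img = np.random.randint(0, 256, (64, 64))
print(average_code_length(img, [8] * 256))           # 8.0 bits/pixel for natural coding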
Interpixel Redundancy:
Consider the images shown in Figs. 1.1(a) and (b). As Figs. 1.1(c) and (d) show, these
images have virtually identical histograms. Note also that both histograms are trimodal,
indicating the presence of three dominant ranges of gray-level values. Because the gray levels
in these images are not equally probable, variable-length coding can be used to reduce the
coding redundancy that would result from a straight or natural binary encoding of their
pixels. The coding process, however, would not alter the level of correlation between the
pixels within the images. In other words, the codes used to represent the gray levels of each
image have nothing to do with the correlation between pixels. These correlations result from
the structural or geometric relationships between the objects in the image.


Fig.1.1 Two images and their gray-level histograms and normalized autocorrelation
coefficients along one line.

Figures 1.1(e) and (f) show the respective autocorrelation coefficients computed along
one line of each image,

γ(n) = A(n) / A(0)

where

A(n) = 1/(N - n) Σ f(x, y) f(x, y + n),    summed over y = 0, 1, ..., N - 1 - n
The scaling factor in Eq. above accounts for the varying number of sum terms that
arise for each integer value of n. Of course, n must be strictly less than N, the number of
pixels on a line. The variable x is the coordinate of the line used in the computation. Note the
dramatic difference between the shape of the functions shown in Figs. 1.1(e) and (f). Their
shapes can be qualitatively related to the structure in the images in Figs. 1.1(a) and (b).This
relationship is particularly noticeable in Fig. 1.1 (f), where the high correlation between
pixels separated by 45 and 90 samples can be directly related to the spacing between the
vertically oriented matches of Fig. 1.1(b). In addition, the adjacent pixels of both images are
highly correlated. When n is 1, γ is 0.9922 and 0.9928 for the images of Figs. 1.1 (a) and (b),
respectively. These values are typical of most properly sampled television images.
These illustrations reflect another important form of data redundancy—one directly
related to the interpixel correlations within an image. Because the value of any given pixel
can be reasonably predicted from the value of its neighbors, the information carried by
individual pixels is relatively small. Much of the visual contribution of a single pixel to an
image is redundant; it could have been guessed on the basis of the values of its neighbors. A
variety of names, including spatial redundancy, geometric redundancy, and interframe
redundancy, have been coined to refer to these interpixel dependencies. We use the term
interpixel redundancy to encompass them all.
In order to reduce the interpixel redundancies in an image, the 2-D pixel array
normally used for human viewing and interpretation must be transformed into a more
efficient (but usually "nonvisual") format. For example, the differences between adjacent
pixels can be used to represent an image. Transformations of this type (that is, those that
remove interpixel redundancy) are referred to as mappings. They are called reversible
mappings if the original image elements can be reconstructed from the transformed data set.

Psychovisual Redundancy:
The brightness of a region, as perceived by the eye, depends on factors other than
simply the light reflected by the region. For example, intensity variations (Mach bands) can
be perceived in an area of constant intensity. Such phenomena result from the fact that the
eye does not respond with equal sensitivity to all visual information. Certain information
simply has less relative importance than other information in normal visual processing. This
information is said to be psychovisually redundant. It can be eliminated without significantly
impairing the quality of image perception.
That psychovisual redundancies exist should not come as a surprise, because human
perception of the information in an image normally does not involve quantitative analysis of
every pixel value in the image. In general, an observer searches for distinguishing features
such as edges or textural regions and mentally combines them into recognizable groupings.
The brain then correlates these groupings with prior knowledge in order to complete the
image interpretation process. Psychovisual redundancy is fundamentally different from the
redundancies discussed earlier. Unlike coding and interpixel redundancy, psychovisual
redundancy is associated with real or quantifiable visual information. Its elimination is
possible only because the information itself is not essential for normal visual processing.
Since the elimination of psychovisually redundant data results in a loss of quantitative
information, it is commonly referred to as quantization.
This terminology is consistent with normal usage of the word, which generally means
the mapping of a broad range of input values to a limited number of output values. As it is an
irreversible operation (visual information is lost), quantization results in lossy data
compression.
Fidelity Criterion.
The removal of psycho visually redundant data results in a loss of real or quantitative
visual information. Because information of interest may be lost, a repeatable or reproducible
means of quantifying the nature and extent of information loss is highly desirable. Two
general classes of criteria are used as the basis for such an assessment:
A) Objective fidelity criteria and
B) Subjective fidelity criteria.
When the level of information loss can be expressed as a function of the original or
input image and the compressed and subsequently decompressed output image, it is said to be
based on an objective fidelity criterion. A good example is the root-mean-square (rms) error
between an input and output image. Let f(x, y) represent an input image and let f^(x, y) denote
an estimate or approximation of f(x, y) that results from compressing and subsequently
decompressing the input. For any value of x and y, the error e(x, y) between f(x, y) and f^(x, y)
can be defined as

e(x, y) = f^(x, y) - f(x, y)

so that the total error between the two images is

Σ Σ [f^(x, y) - f(x, y)],    summed over x = 0, ..., M-1 and y = 0, ..., N-1

where the images are of size M X N. The root-mean-square error, erms, between f(x, y)
and f^(x, y) then is the square root of the squared error averaged over the M X N array, or

erms = [ (1/MN) Σ Σ [f^(x, y) - f(x, y)]^2 ]^(1/2)

A closely related objective fidelity criterion is the mean-square signal-to-noise ratio of
the compressed-decompressed image. If f^(x, y) is considered to be the sum of the original
image f(x, y) and a noise signal e(x, y), the mean-square signal-to-noise ratio of the output
image, denoted SNRms, is

SNRms = Σ Σ f^(x, y)^2 / Σ Σ [f^(x, y) - f(x, y)]^2

The rms value of the signal-to-noise ratio, denoted SNRrms, is obtained by taking the
square root of Eq. above.
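A small sketch of these two objective measures for NumPy arrays f (input) and f_hat
(decompressed output); the function names are assumptions:

import numpy as np

def rms_error(f, f_hat):
    """Root-mean-square error between the input and the decompressed output."""
    e = f_hat.astype(float) - f.astype(float)
    return np.sqrt(np.mean(e ** 2))

def snr_ms(f, f_hat):
    """Mean-square signal-to-noise ratio; its square root gives SNRrms."""
    e = f_hat.astype(float) - f.astype(float)
    return np.sum(f_hat.astype(float) ** 2) / np.sum(e ** 2)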
Although objective fidelity criteria offer a simple and convenient mechanism for
evaluating information loss, most decompressed images ultimately are viewed by humans.
Consequently, measuring image quality by the subjective evaluations of a human observer
often is more appropriate. This can be accomplished by showing a "typical" decompressed
image to an appropriate cross section of viewers and averaging their evaluations. The
evaluations may be made using an absolute rating scale or by means of side-by-side
comparisons of f(x, y) and f^(x, y).


Image Compression models


A compression system consists of two distinct structural blocks: an encoder and a
decoder. An input image f(x, y) is fed into the encoder, which creates a set of symbols from
the input data. After transmission over the channel, the encoded representation is fed to the
decoder, where a reconstructed output image f^(x, y) is generated. In general, f^(x, y) may or
may not be an exact replica of f(x, y). If it is, the system is error free or information
preserving; if not, some level of distortion is present in the reconstructed image. Both the
encoder and decoder shown in Fig. 3.1 consist of two relatively independent functions or
subblocks. The encoder is made up of a source encoder, which removes input redundancies,
and a channel encoder, which increases the noise immunity of the source encoder's output. As
would be expected, the decoder includes a channel decoder followed by a source decoder. If
the channel between the encoder and decoder is noise free (not prone to error), the channel
encoder and decoder are omitted, and the general encoder and decoder become the source
encoder and decoder, respectively.

Fig.3.1 A general compression system model


The Source Encoder and Decoder:
The source encoder is responsible for reducing or eliminating any coding, interpixel,
or psychovisual redundancies in the input image. The specific application and associated
fidelity requirements dictate the best encoding approach to use in any given situation.
Normally, the approach can be modeled by a series of three independent operations.


Fig.3.2 (a) Source encoder and (b) source decoder model


As Fig. 3.2 (a) shows, each operation is designed to reduce one of the three
redundancies. Figure 3.2 (b) depicts the corresponding source decoder. In the first stage of
the source encoding process, the mapper transforms the input data into a (usually nonvisual)
format designed to reduce interpixel redundancies in the input image. This operation
generally is reversible and may or may not reduce directly the amount of data required to
represent the image.
Run-length coding is an example of a mapping that directly results in data
compression in this initial stage of the overall source encoding process. The representation of
an image by a set of transform coefficients is an example of the opposite case. Here, the
mapper transforms the image into an array of coefficients, making its interpixel redundancies
more accessible for compression in later stages of the encoding process.
The second stage, or quantizer block in Fig. 3.2 (a), reduces the accuracy of the
mapper's output in accordance with some preestablished fidelity criterion. This stage reduces
the psychovisual redundancies of the input image. This operation is irreversible. Thus it must
be omitted when error-free compression is desired.
In the third and final stage of the source encoding process, the symbol coder creates a
fixed- or variable-length code to represent the quantizer output and maps the output in
accordance with the code. The term symbol coder distinguishes this coding operation from
the overall source encoding process. In most cases, a variable-length code is used to represent
the mapped and quantized data set. It assigns the shortest code words to the most frequently
occurring output values and thus reduces coding redundancy. The operation, of course, is
reversible. Upon completion of the symbol coding step, the input image has been processed
to remove each of the three redundancies.
Figure 3.2(a) shows the source encoding process as three successive operations, but
all three operations are not necessarily included in every compression system. Recall, for
example, that the quantizer must be omitted when error-free compression is desired. In
addition, some compression techniques normally are modeled by merging blocks that are
physically separate in Fig. 3.2(a). In the predictive compression systems, for instance, the
mapper and quantizer are often represented by a single block, which simultaneously performs
both operations.

The source decoder shown in Fig. 3.2(b) contains only two components: a symbol
decoder and an inverse mapper. These blocks perform, in reverse order, the inverse
operations of the source encoder's symbol encoder and mapper blocks. Because quantization
results in irreversible information loss, an inverse quantizer block is not included in the
general source decoder model shown in Fig. 3.2(b).
The Channel Encoder and Decoder:
The channel encoder and decoder play an important role in the overall encoding-
decoding process when the channel of Fig. 3.1 is noisy or prone to error. They are designed to
reduce the impact of channel noise by inserting a controlled form of redundancy into the
source encoded data. As the output of the source encoder contains little redundancy, it would
be highly sensitive to transmission noise without the addition of this "controlled redundancy."
One of the most useful channel encoding techniques was devised by R. W. Hamming
(Hamming [1950]). It is based on appending enough bits to the data being encoded to ensure
that some minimum number of bits must change between valid code words. Hamming
showed, for example, that if 3 bits of redundancy are added to a 4-bit word, so that the
distance between any two valid code words is 3, all single-bit errors can be detected and
corrected. (By appending additional bits of redundancy, multiple-bit errors can be detected
and corrected.) The 7-bit Hamming (7, 4) code word h1, h2, h3, ..., h6, h7 associated with a 4-
bit binary number b3b2b1b0 is

h1 = b3 ⊕ b2 ⊕ b0
h2 = b3 ⊕ b1 ⊕ b0
h3 = b3
h4 = b2 ⊕ b1 ⊕ b0
h5 = b2
h6 = b1
h7 = b0

where ⊕ denotes the exclusive OR operation. Note that bits h1, h2, and h4 are even-
parity bits for the bit fields b3b2b0, b3b1b0, and b2b1b0, respectively. (Recall that a string of
binary bits has even parity if the number of bits with a value of 1 is even.) To decode a
Hamming encoded result, the channel decoder must check the encoded value for odd parity
over the bit fields in which even parity was previously established. A single-bit error is
indicated by a nonzero parity word c4c2c1, where

c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7
c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7
c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7

If a nonzero value is found, the decoder simply complements the code word bit position
indicated by the parity word. The decoded binary value is then extracted from the corrected
code word as h3h5h6h7.
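A minimal sketch of the (7, 4) encoder and single-error-correcting decoder, using exactly the
parity assignments stated above (the function names and test word are illustrative):

def hamming74_encode(b3, b2, b1, b0):
    """Return [h1..h7]; h1, h2, h4 are even-parity bits over the stated fields."""
    h1 = b3 ^ b2 ^ b0
    h2 = b3 ^ b1 ^ b0
    h4 = b2 ^ b1 ^ b0
    return [h1, h2, b3, h4, b2, b1, b0]

def hamming74_decode(h):
    """Correct a single-bit error and return (b3, b2, b1, b0)."""
    h1, h2, h3, h4, h5, h6, h7 = h
    c1 = h1 ^ h3 ^ h5 ^ h7
    c2 = h2 ^ h3 ^ h6 ^ h7
    c4 = h4 ^ h5 ^ h6 ^ h7
    pos = 4 * c4 + 2 * c2 + c1          # nonzero parity word c4c2c1 locates the error
    if pos:
        h[pos - 1] ^= 1                 # complement the indicated bit position
    return h[2], h[4], h[5], h[6]       # decoded value is h3 h5 h6 h7

word = hamming74_encode(1, 0, 1, 1)
word[3] ^= 1                            # introduce a single-bit channel error
print(hamming74_decode(word))           # (1, 0, 1, 1)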

Variable-Length Coding:
The simplest approach to error-free image compression is to reduce only coding
redundancy. Coding redundancy normally is present in any natural binary encoding of the
gray levels in an image. It can be eliminated by coding the gray levels. To do so requires
construction of a variable-length code that assigns the shortest possible code words to the
most probable gray levels. Here, we examine several optimal and near optimal techniques for
constructing such a code. These techniques are formulated in the language of information
theory. In practice, the source symbols may be either the gray levels of an image or the output
of a gray-level mapping operation (pixel differences, run lengths, and so on).
Huffman coding:
The most popular technique for removing coding redundancy is due to Huffman
(Huffman [1952]). When coding the symbols of an information source individually, Huffman
coding yields the smallest possible number of code symbols per source symbol. In terms of
the noiseless coding theorem, the resulting code is optimal for a fixed value of n, subject to
the constraint that the source symbols be coded one at a time.
The first step in Huffman's approach is to create a series of source reductions by
ordering the probabilities of the symbols under consideration and combining the lowest
probability symbols into a single symbol that replaces them in the next source reduction.
Figure 4.1 illustrates this process for binary coding (K-ary Huffman codes can also be
constructed). At the far left, a hypothetical set of source symbols and their probabilities are
ordered from top to bottom in terms of decreasing probability values. To form the first source
reduction, the bottom two probabilities, 0.06 and 0.04, are combined to form a "compound
symbol" with probability 0.1. This compound symbol and its associated probability are
placed in the first source reduction column so that the
probabilities of the reduced source are also ordered from the most to the least probable. This
process is then repeated until a reduced source with two symbols (at the far right) is reached.
The second step in Huffman's procedure is to code each reduced source, starting with
the smallest source and working back to the original source. The minimal length binary code
for a two-symbol source, of course, is the symbols 0 and 1. As Fig. 4.2 shows, these symbols
are assigned to the two symbols on the right (the assignment is arbitrary; reversing the order
of the 0 and 1 would work just as well). As the reduced source symbol with probability 0.6
was generated by combining two symbols in the reduced source to its left, the 0 used to code
it is now assigned to both of these symbols, and a 0 and 1 are arbitrarily

Fig.4.1 Huffman source reductions.

Fig.4.2 Huffman code assignment procedure.

appended to each to distinguish them from each other. This operation is then repeated for
each reduced source until the original source is reached. The final code appears at the far left
in Fig. 4.2. The average length of this code is

Lavg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/symbol

and the entropy of the source is 2.14 bits/symbol. The resulting Huffman code efficiency is
2.14/2.2 = 0.973.
Huffman's procedure creates the optimal code for a set of symbols and probabilities
subject to the constraint that the symbols be coded one at a time. After the code has been
created, coding and/or decoding is accomplished in a simple lookup table manner. The code
itself is an instantaneous uniquely decodable block code. It is called a block code because
each source symbol is mapped into a fixed sequence of code symbols. It is instantaneous,

because each code word in a string of code symbols can be decoded without referencing
succeeding symbols. It is uniquely decodable, because any string of code symbols can be
decoded in only one way. Thus, any string of Huffman encoded symbols can be decoded by
examining the individual symbols of the string in a left to right manner. For the binary code
of Fig. 4.2, a left-to-right scan of the encoded string 010100111100 reveals that the first valid
code word is 01010, which is the code for symbol a3. The next valid code word is 011, which
corresponds to symbol a1. Continuing in this manner reveals the completely decoded message
to be a3a1a2a2a6.
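The source-reduction and code-assignment steps can be sketched with a priority queue. This is a
generic implementation, not the book's figure; the probabilities are the ones from the example
above:

import heapq

def huffman_code(probs):
    """Build a binary Huffman code for a {symbol: probability} source."""
    heap = [[p, i, [s]] for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in probs}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)      # combine the two least probable groups
        p2, i, group2 = heapq.heappop(heap)
        for s in group1:
            codes[s] = "0" + codes[s]
        for s in group2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, [p1 + p2, i, group1 + group2])
    return codes

probs = {"a2": 0.4, "a6": 0.3, "a1": 0.1, "a4": 0.1, "a3": 0.06, "a5": 0.04}
codes = huffman_code(probs)
print(sum(probs[s] * len(codes[s]) for s in probs))   # about 2.2 bits/symbol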
Arithmetic coding:
Unlike the variable-length codes described previously, arithmetic coding generates
nonblock codes. In arithmetic coding, which can be traced to the work of Elias, a one-to-one
correspondence between source symbols and code words does not exist. Instead, an entire
sequence of source symbols (or message) is assigned a single arithmetic code word. The code
word itself defines an interval of real numbers between 0 and 1. As the number of symbols in
the message increases, the interval used to represent it becomes smaller and the number of
information units (say, bits) required to represent the interval becomes larger. Each symbol of
the message reduces the size of the interval in accordance with its probability of occurrence.
Because the technique does not require, as does Huffman's approach, that each source symbol
translate into an integral number of code symbols (that is, that the symbols be coded one at a
time), it achieves (but only in theory) the bound established by the noiseless coding theorem.

Fig.5.1 Arithmetic coding procedure


Figure 5.1 illustrates the basic arithmetic coding process. Here, a five-symbol
sequence or message, a1a2a3a3a4, from a four-symbol source is coded. At the start of the


coding process, the message is assumed to occupy the entire half-open interval [0, 1). As
Table 5.1 shows, this interval is initially subdivided into four regions based on the
probabilities of each source symbol. Symbol a1, for example, is associated with subinterval
[0, 0.2). Because it is the first symbol of the message being coded, the message interval is
initially narrowed to [0, 0.2). Thus in Fig. 5.1 [0, 0.2) is expanded to the full height of the
figure and its end points labeled by the values of the narrowed range. The narrowed range is
then subdivided in accordance with the original source symbol probabilities and the process
continues with the next message symbol.

Table 5.1 Arithmetic coding example


In this manner, symbol a2 narrows the subinterval to [0.04, 0.08), a3 further narrows it
to [0.056, 0.072), and so on. The final message symbol, which must be reserved as a special
end-of- message indicator, narrows the range to [0.06752, 0.0688). Of course, any number
within this subinterval—for example, 0.068—can be used to represent the message.
In the arithmetically coded message of Fig. 5.1, three decimal digits are used to represent the
five-symbol message. This translates into 3/5 or 0.6 decimal digits per source symbol and
compares favorably with the entropy of the source, which is 0.58 decimal digits or 10-ary
units/symbol. As the length of the sequence being coded increases, the resulting arithmetic
code approaches the bound established by the noiseless coding theorem.
In practice, two factors cause coding performance to fall short of the bound: (1) the
addition of the end-of-message indicator that is needed to separate one message from an-
other; and (2) the use of finite precision arithmetic. Practical implementations of arithmetic
coding address the latter problem by introducing a scaling strategy and a rounding strategy
(Langdon and Rissanen [1981]). The scaling strategy renormalizes each subinterval to the [0,
1) range before subdividing it in accordance with the symbol probabilities. The rounding
strategy guarantees that the truncations associated with finite precision arithmetic do not
prevent the coding subintervals from being represented accurately.
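A sketch of the interval-narrowing loop alone, with none of the scaling or finite-precision
machinery; the probability model below assumes a four-symbol source with probabilities
a1:0.2, a2:0.2, a3:0.4, a4:0.2, which is consistent with the subintervals quoted above:

def arithmetic_encode(message, model):
    """Narrow [0, 1) once per symbol; model maps symbol -> (low, high)."""
    low, high = 0.0, 1.0
    for s in message:
        span = high - low
        s_low, s_high = model[s]
        low, high = low + span * s_low, low + span * s_high
    return low, high          # any number in [low, high) represents the message

model = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}
print(arithmetic_encode(["a1", "a2", "a3", "a3", "a4"], model))
# approximately (0.06752, 0.0688), the final range quoted in the example above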
LZW Coding:
The technique, called Lempel-Ziv-Welch (LZW) coding, assigns fixed-length code
words to variable length sequences of source symbols but requires no a priori knowledge of
the probability of occurrence of the symbols to be encoded. LZW compression has been
integrated into a variety of mainstream imaging file formats, including the graphic
interchange format (GIF), tagged image file format (TIFF), and the portable document format
(PDF). LZW coding is conceptually very simple (Welch [1984]). At the onset of the coding
process, a codebook or "dictionary" containing the source symbols to be coded is constructed.
For 8-bit monochrome images, the first 256 words of the dictionary are assigned to the gray
values 0, 1, 2..., and 255. As the encoder sequentially examines the image's pixels, gray-level
sequences that are not in the dictionary are placed in algorithmically determined (e.g., the
next unused) locations. If the first two pixels of the image are white, for instance, sequence
“255-255” might be assigned to location 256, the address following the locations reserved for
gray levels 0 through 255. The next time that two consecutive white pixels are encountered,
code word 256, the address of the location containing sequence 255-255, is used to represent
them. If a 9-bit, 512-word dictionary is employed in the coding process, the original (8 + 8)
bits that were used to represent the two pixels are replaced by a single 9-bit code word.
Clearly, the size of the dictionary is an important system parameter. If it is too small, the
detection of matching gray-level sequences will be less likely; if it is too large, the size of the
code words will adversely affect compression performance.
Consider the following 4 x 4, 8-bit image of a vertical edge:

Table 6.1 details the steps involved in coding its 16 pixels. A 512-word dictionary with the
following starting content is assumed:

Locations 256 through 511 are initially unused. The image is encoded by processing
its pixels in a left-to-right, top-to-bottom manner. Each successive gray-level value is


concatenated with a variable (column 1 of Table 6.1) called the "currently recognized
sequence." As can be seen, this variable is initially null or empty. The dictionary is searched
for each concatenated sequence and, if found (as was the case in the first row of the table), the
currently recognized sequence is replaced by the newly concatenated and recognized (i.e.,
located in the dictionary) sequence. This was done in column 1 of row 2.

Table 6.1 LZW coding example


No output codes are generated, nor is the dictionary altered. If the concatenated
sequence is not found, however, the address of the currently recognized sequence is output as
the next encoded value, the concatenated but unrecognized sequence is added to the
dictionary, and the currently recognized sequence is initialized to the current pixel value. This
occurred in row 2 of the table. The last two columns detail the gray-level sequences that are
added to the dictionary when scanning the entire 4 x 4 image. Nine additional code words are
defined. At the conclusion of coding, the dictionary contains 265 code words and the LZW
algorithm has successfully identified several repeating gray-level sequences—leveraging
them to reduce the original 128-bit image to 90 bits (i.e., 10 9-bit codes). The encoded output
is obtained by reading the third column from top to bottom. The resulting compression ratio
is 1.42:1.
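A sketch of the encoding loop just described. The two gray levels 39 and 126 used in the test
row are illustrative stand-ins for the vertical-edge image (any two levels give the same code
structure); dictionary-overflow handling is omitted:

def lzw_encode(pixels):
    """LZW-encode a sequence of 8-bit gray levels into a list of code words."""
    dictionary = {(i,): i for i in range(256)}    # locations 0-255 hold the gray levels
    next_code = 256
    recognized = ()                               # currently recognized sequence
    output = []
    for p in pixels:
        candidate = recognized + (p,)
        if candidate in dictionary:
            recognized = candidate                # keep growing the recognized sequence
        else:
            output.append(dictionary[recognized])
            dictionary[candidate] = next_code     # add the unrecognized sequence
            next_code += 1
            recognized = (p,)
    if recognized:
        output.append(dictionary[recognized])
    return output

row = [39, 39, 126, 126]                          # one row of a 4 x 4 two-level edge
print(lzw_encode(row * 4))                        # 10 code words for the 16 pixels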
A unique feature of the LZW coding just demonstrated is that the coding dictionary or
code book is created while the data are being encoded. Remarkably, an LZW decoder builds


an identical decompression dictionary as it simultaneously decodes the encoded data stream.


Although not needed in this example, most practical applications require a strategy for
handling dictionary overflow. A simple solution is to flush or reinitialize the dictionary when
it becomes full and continue coding with a new initialized dictionary. A more complex option
is to monitor compression performance and flush the dictionary when it becomes poor or
unacceptable. Alternately, the least used dictionary entries can be tracked and replaced when
necessary.
Bit-Plane Coding:
An effective technique for reducing an image's interpixel redundancies is to process
the image's bit planes individually. The technique, called bit-plane coding, is based on the
concept of decomposing a multilevel (monochrome or color) image into a series of binary
images and compressing each binary image via one of several well-known binary
compression methods.
Bit-plane decomposition:
The gray levels of an m-bit gray-scale image can be represented in the form of the
base 2 polynomial

am-1 2^(m-1) + am-2 2^(m-2) + ... + a1 2^1 + a0 2^0
Based on this property, a simple method of decomposing the image into a collection
of binary images is to separate the m coefficients of the polynomial into m 1-bit bit planes.
The zeroth-order bit plane is generated by collecting the a0 bits of each pixel, while the
(m-1)st-order bit plane contains the am-1 bits or coefficients. In general, each bit plane is
numbered from 0 to m-1 and is constructed by setting its pixels equal to the values of the
appropriate bits or polynomial coefficients from each pixel in the original image. The
inherent disadvantage of this approach is that small changes in gray level can have a
significant impact on the complexity of the bit planes. If a pixel of intensity 127 (01111111)
is adjacent to a pixel of intensity 128 (10000000), for instance, every bit plane will contain a
corresponding 0 to 1 (or 1 to 0) transition. For example, as the most significant bits of the two
binary codes for 127 and 128 are different, bit plane 7 will contain a zero-valued pixel next to
a pixel of value 1, creating a 0 to 1 (or 1 to 0) transition at that point.
An alternative decomposition approach (which reduces the effect of small gray-level
variations) is to first represent the image by an m-bit Gray code. The m-bit Gray code
gm-1...g2g1g0 that corresponds to the polynomial in Eq. above can be computed from

gi = ai ⊕ ai+1,    0 <= i <= m-2
gm-1 = am-1

Here, ⊕ denotes the exclusive OR operation. This code has the unique property that
successive code words differ in only one bit position. Thus, small changes in gray level are
less likely to affect all m bit planes. For instance, when gray levels 127 and 128 are adjacent,
only the 7th bit plane will contain a 0 to 1 transition, because the Gray codes that correspond
to 127 and 128 are 01000000 and 11000000, respectively.
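Both decompositions are one-liners on a NumPy image; the Gray-code mapping above is equivalent
to n XOR (n >> 1). A small sketch (the test array is illustrative):

import numpy as np

def bit_planes(image, m=8):
    """Return plane 0 (LSB) through plane m-1 (MSB) of an m-bit image."""
    return [(image >> k) & 1 for k in range(m)]

def gray_code(image):
    """gm-1 = am-1, gi = ai XOR ai+1, implemented as n XOR (n >> 1)."""
    return image ^ (image >> 1)

img = np.array([[127, 128]], dtype=np.uint8)
print([int(p[0, 1] != p[0, 0]) for p in bit_planes(img)])             # every plane changes
print([int(p[0, 1] != p[0, 0]) for p in bit_planes(gray_code(img))])  # only plane 7 changes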

Lossless Predictive Coding:


The error-free compression approach does not require decomposition of an image into
a collection of bit planes. The approach, commonly referred to as lossless predictive coding,
is based on eliminating the interpixel redundancies of closely spaced pixels by extracting and
coding only the new information in each pixel. The new information of a pixel is defined as
the difference between the actual and predicted value of that pixel.
Figure 8.1 shows the basic components of a lossless predictive coding system. The
system consists of an encoder and a decoder, each containing an identical predictor. As each
successive pixel of the input image, denoted fn, is introduced to the encoder, the predictor
generates the anticipated value of that pixel based on some number of past inputs. The output
of the predictor is then rounded to the nearest integer, denoted f^n, and used to form the
difference or prediction error

en = fn - f^n

which is coded using a variable-length code (by the symbol encoder) to generate the next
element of the compressed data stream.

Fig.8.1 A lossless predictive coding model: (a) encoder; (b) decoder


The decoder of Fig. 8.1 (b) reconstructs en from the received variable-length code words and
performs the inverse operation

fn = en + f^n
Various local, global, and adaptive methods can be used to generate f^n. In most
cases, however, the prediction is formed by a linear combination of m previous pixels. That
is,

f^n = round[ Σ αi fn-i ],    summed over i = 1, 2, ..., m
where m is the order of the linear predictor, round is a function used to denote the
rounding or nearest integer operation, and the αi, for i = 1,2,..., m are prediction coefficients.
In raster scan applications, the subscript n indexes the predictor outputs in accordance with
their time of occurrence. That is, fn, f^n and en in Eqns. above could be replaced with the more
explicit notation f (t), f^(t), and e (t), where t represents time. In other cases, n is used as an
index on the spatial coordinates and/or frame number (in a time sequence of images) of an
image. In 1-D linear predictive coding, for example, Eq. above can be written as

f^(x, y) = round[ Σ αi f(x, y - i) ],    summed over i = 1, 2, ..., m

where each subscripted variable is now expressed explicitly as a function of spatial
coordinates x and y. The Eq. indicates that the 1-D linear prediction f^(x, y) is a function of
the previous pixels on the current line alone. In 2-D predictive coding, the prediction is a
function of the previous pixels in a left-to-right, top-to-bottom scan of an image. In the 3-D
case, it is based on these pixels and the previous pixels of preceding frames. Equation above
cannot be evaluated for the first m pixels of each line, so these pixels must be coded by using
other means (such as a Huffman code) and considered as an overhead of the predictive
coding process. A similar comment applies to the higher-dimensional cases.
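A sketch of first-order (m = 1, α1 = 1) lossless predictive coding along a single row; the first
pixel of the row is carried as overhead, as noted above, and all names are illustrative:

import numpy as np

def predict_encode_row(row):
    """e[n] = f[n] - f^[n], with the prediction f^[n] = f[n-1]."""
    f = row.astype(int)
    e = np.empty_like(f)
    e[0] = f[0]                      # overhead: coded by other means
    e[1:] = f[1:] - f[:-1]           # prediction errors
    return e

def predict_decode_row(e):
    """Inverse operation f[n] = e[n] + f^[n]."""
    return np.cumsum(e)

row = np.array([100, 102, 103, 103, 110])
e = predict_encode_row(row)
print(e, predict_decode_row(e))      # small errors, exact reconstruction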

Lossy Predictive Coding:


In this type of coding, we add a quantizer to the lossless predictive model and
examine the resulting trade-off between reconstruction accuracy and compression
performance. As Fig.9 shows, the quantizer, which absorbs the nearest integer function of the
error-free encoder, is inserted between the symbol encoder and the point at which the
prediction error is formed. It maps the prediction error into a limited range of outputs,
denoted e^n which establish the amount of compression and distortion associated with lossy
predictive coding.

Fig. 9 A lossy predictive coding model: (a) encoder and (b) decoder

In order to accommodate the insertion of the quantization step, the error-free encoder
of figure must be altered so that the predictions generated by the encoder and decoder are
equivalent. As Fig.9 (a) shows, this is accomplished by placing the lossy encoder's predictor
within a feedback loop, where its input, denoted f˙n, is generated as a function of past
predictions and the corresponding quantized errors. That is,

f˙n = e˙n + f^n
This closed loop configuration prevents error buildup at the decoder's output. Note from Fig.
9(b) that the output of the decoder also is given by the above Eqn.
Optimal predictors:
The optimal predictor used in most predictive coding applications minimizes the
encoder's mean-square prediction error

E{en^2} = E{[fn - f^n]^2}

subject to the constraint that

f˙n = e˙n + f^n ≈ en + f^n = fn

and

f^n = Σ αi fn-i,    summed over i = 1, 2, ..., m
That is, the optimization criterion is chosen to minimize the mean-square prediction


error, the quantization error is assumed to be negligible (e˙n ≈ en), and the prediction is
constrained to a linear combination of m previous pixels. These restrictions are not essential,
but they simplify the analysis considerably and, at the same time, decrease the computational
complexity of the predictor. The resulting predictive coding approach is referred to as
differential pulse code modulation (DPCM).
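A small DPCM sketch with a uniform quantizer inside the prediction feedback loop, so the encoder
and decoder form identical predictions from the quantized data; the step size and first-order
predictor are assumptions made for illustration:

import numpy as np

def dpcm_encode(row, step=4, alpha=1.0):
    """Quantized prediction errors, with the predictor fed by decoded values."""
    f = row.astype(float)
    codes = [f[0]]                                    # first pixel sent as overhead
    f_dot = f[0]
    for n in range(1, len(f)):
        f_hat = round(alpha * f_dot)                  # prediction from the decoded past
        e_hat = step * round((f[n] - f_hat) / step)   # uniform quantizer
        codes.append(e_hat)
        f_dot = e_hat + f_hat                         # what the decoder will reconstruct
    return codes

def dpcm_decode(codes, step=4, alpha=1.0):
    out = [codes[0]]
    for e_hat in codes[1:]:
        out.append(e_hat + round(alpha * out[-1]))
    return out

row = np.array([100, 102, 103, 103, 110])
print(dpcm_decode(dpcm_encode(row)))                  # close to, but not exactly, the input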
Transform Coding:
All the predictive coding techniques operate directly on the pixels of an image and
thus are spatial domain methods. In this coding, we consider compression techniques that are
based on modifying the transform of an image. In transform coding, a reversible, linear
transform (such as the Fourier transform) is used to map the image into a set of transform
coefficients, which are then quantized and coded. For most natural images, a significant
number of the coefficients have small magnitudes and can be coarsely quantized (or
discarded entirely) with little image distortion. A variety of transformations, including the
discrete Fourier transform (DFT), can be used to transform the image data.

Fig. 10 A transform coding system: (a) encoder; (b) decoder.


Figure 10 shows a typical transform coding system. The decoder implements the
inverse sequence of steps (with the exception of the quantization function) of the encoder,
which performs four relatively straightforward operations: subimage decomposition,
transformation, quantization, and coding. An N X N input image first is subdivided into
subimages of size n X n, which are then transformed to generate (N/n)^2 subimage transform
arrays, each of size n X n. The goal of the transformation process is to decorrelate the pixels
of each subimage, or to pack as much information as possible into the smallest number of
transform coefficients. The quantization stage then selectively eliminates or more coarsely
quantizes the coefficients that carry the least information. These coefficients have the
smallest impact on reconstructed subimage quality. The encoding process terminates by


coding (normally using a variable-length code) the quantized coefficients. Any or all of the
transform encoding steps can be adapted to local image content, called adaptive transform
coding, or fixed for all subimages, called nonadaptive transform coding.
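A hedged sketch of the encoder path for a single n x n subimage, using an explicitly constructed
orthonormal DCT-II matrix; the transform choice, block size and keep-the-largest-coefficients
rule are assumptions, and a real coder would follow this with symbol coding:

import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: C[k, m] = a(k) cos((2m + 1) k pi / 2n)."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] *= 1.0 / np.sqrt(2.0)
    return c * np.sqrt(2.0 / n)

def encode_block(block, keep=10):
    """Transform one subimage and discard all but the largest coefficients."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T                      # forward 2-D transform
    threshold = np.sort(np.abs(coeffs), axis=None)[-keep]
    return np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

def decode_block(coeffs):
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c                       # inverse transform

block = np.outer(np.arange(8), np.ones(8)) * 10.0  # a smooth 8 x 8 ramp subimage
error = np.abs(decode_block(encode_block(block)) - block).max()
print(error)                                        # small reconstruction error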
Wavelet Coding:
The wavelet coding is based on the idea that the coefficients of a transform that
decorrelates the pixels of an image can be coded more efficiently than the original pixels
themselves. If the transform's basis functions—in this case wavelets—pack most of the
important visual information into a small number of coefficients, the remaining coefficients
can be quantized coarsely or truncated to zero with little image distortion.
Figure 11 shows a typical wavelet coding system. To encode a 2^J X 2^J image, an
analyzing wavelet, Ψ, and minimum decomposition level, J - P, are selected and used to
compute the image's discrete wavelet transform. If the wavelet has a complementary scaling
function φ, the fast wavelet transform can be used. In either case, the computed transform
converts a large portion of the original image to horizontal, vertical, and diagonal
decomposition coefficients with zero mean and Laplacian-like distributions.

Fig.11 A wavelet coding system: (a) encoder; (b) decoder.


Since many of the computed coefficients carry little visual information, they can be
quantized and coded to minimize intercoefficient and coding redundancy. Moreover, the
quantization can be adapted to exploit any positional correlation across the P decomposition
levels. One or more of the lossless coding methods, including run-length, Huffman,
arithmetic, and bit-plane coding, can be incorporated into the final symbol coding step.
Decoding is accomplished by inverting the encoding operations—with the exception of
quantization, which cannot be reversed exactly.

The principal difference between the wavelet-based system and the transform coding
system is the omission of the transform coder's subimage processing stages.
Because wavelet transforms are both computationally efficient and inherently local
(i.e., their basis functions are limited in duration), subdivision of the original image is
unnecessary.
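A one-level 2-D Haar decomposition (relevant to questions 10 and 12 below) can be sketched with
pairwise sums and differences; the orthonormal 1/sqrt(2) normalisation used here is one common
convention and is an assumption:

import numpy as np

def haar_level(image):
    """One Haar level: approximation plus vertical/horizontal/diagonal details."""
    a = image.astype(float)
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)      # low-pass along rows
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)      # high-pass along rows
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)    # repeat along columns
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

print(haar_level(np.array([[3, -1], [6, 2]])))       # subbands 5, -3, 4 and 0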

PREVIOUS QUESTIONS
1. Draw the functional block diagram of image compression system and explain the
purpose of each block.
2. Explain the need for image compression. How is the run-length encoding approach used
for compression? Is it lossy? Justify.
3. Describe about wavelet packets.
4. Write short notes on: i) Arithmetic coding. ii) Vector quantization. iii) JPEG
standards.
5. Explain about the Fast Wavelet Transform.
6. Explain two-band sub-band coding and decoding system.
7. What are the various requirements for multi-resolution analysis? Explain.
8. What is block transform coding? Explain.
9. With an example, explain Huffman coding.
10. Write about Haar Wavelet transform.
11. What is meant by redundancy in image? Explain its role in image processing.
12. Compute the Haar transform of the 2 x 2 image
    F = [ 3  −1
          6   2 ]


UNIT-6
Morphological Image Processing
Introduction
The word morphology commonly denotes a branch of biology that deals with the form
and structure of animals and plants. Morphology in image processing is a tool for extracting
image components that are useful in the representation and description of region shape, such
as boundaries and skeletons. Furthermore, the morphological operations can be used for
filtering, thinning and pruning. The language of morphology comes from set theory,
where image objects can be represented by sets.
Some Basic Concepts from Set Theory:
 If every element of a set A is also an element of another set B, then A is said to be a
subset of B, denoted as A ⊆ B
 The union of two sets A and B is denoted by C = A ∪ B
 The intersection of two sets A and B is denoted by D = A ∩ B
 Two sets A and B are disjoint or mutually exclusive if A ∩ B = ∅
 The complement of a set A is the set of elements not contained in A:
Ac = {w | w ∉ A}
 The difference of two sets A and B, denoted A - B, is defined as
A - B = {w | w ∈ A, w ∉ B} = A ∩ Bc

Reflection and Translation by examples:


• Need for a reference point.
• Reflection of B, denoted B^: B^ = {w | w = -b, for b ∈ B}
• Translation of A by x=(x1,x2), denoted by (A)x is defined as:
(A)x = {c| c=a+x, for a∈A}

Dilation
Dilation is used for expanding an element A by using structuring element B. Dilation
of A by B is defined by the following equation:

A ⊕ B = {z | (B^)z ∩ A ≠ ∅}                                            (9.2-1)

This equation is based on obtaining the reflection of B about its origin and shifting
this reflection by z. The dilation of A by B is the set of all displacements z, such that B^ and A
overlap by at least one element. Based on this interpretation, equation (9.2-1) can be
rewritten as:

A ⊕ B = {z | [(B^)z ∩ A] ⊆ A}

Dilation is typically applied to binary image, but there are versions that work on gray
scale image. The basic effect of the operator on a binary image is to gradually enlarge the
boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of
foreground pixels grow in size while holes within those regions become smaller.
Any pixel in the output image touched by the dot in the structuring element is set to
ON when any point of the structuring element touches an ON pixel in the original image. This
tends to close up holes in an image by expanding the ON regions. It also makes objects
larger. Note that the result depends upon both the shape of the structuring element and the
location of its origin.
Summary effects of dilation:
 Expand/enlarge objects in the image
 Fill gaps or bays of insufficient width
 Fill small holes of sufficiently small size
 Connects objects separated by a distance less than the size of the window
Erosion
Erosion is used for shrinking of element A by using element B. Erosion for sets A
and B in Z2 is defined by the following equation:

A ⊖ B = {z | (B)z ⊆ A}

This equation indicates that the erosion of A by B is the set of all points z such that B,
translated by z, is contained in A.


Any pixel in the output image touched by the · in the structuring element is set to ON
when every point of the structuring element touches an ON pixel in the original image. This
tends to make objects smaller by removing pixels.
Duality between dilation and erosion:
Dilation and erosion are duals of each other with respect to set
complementation and reflection. That is,

(A ⊖ B)c = Ac ⊕ B^
Opening:
An erosion followed by a dilation using the same structuring element for both
operations:

A ∘ B = (A ⊖ B) ⊕ B

 Smooth contours
 Break narrow isthmuses
 Remove thin protrusions


Closing:
A dilation followed by an erosion using the same structuring element for both
operations:

A • B = (A ⊕ B) ⊖ B

 Smooth contours
 Fuse narrow breaks and long thin gulfs
 Remove small holes and fill gaps
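A brute-force NumPy sketch of these four binary operations, written directly from the
translation definitions above (scipy.ndimage offers binary_dilation and binary_erosion as
ready-made equivalents; the cross-shaped structuring element and test image are illustrative):

import numpy as np

def translate(a, dr, dc):
    """Shift a binary image by (dr, dc), filling uncovered pixels with background."""
    out = np.zeros_like(a)
    src = a[max(0, -dr):a.shape[0] - max(0, dr), max(0, -dc):a.shape[1] - max(0, dc)]
    out[max(0, dr):max(0, dr) + src.shape[0], max(0, dc):max(0, dc) + src.shape[1]] = src
    return out

def dilate(a, b):
    """Union of A translated by every point of B."""
    return np.any([translate(a, dr, dc) for dr, dc in b], axis=0)

def erode(a, b):
    """Points z such that B translated by z is contained in A."""
    return np.all([translate(a, -dr, -dc) for dr, dc in b], axis=0)

def opening(a, b):
    return dilate(erode(a, b), b)

def closing(a, b):
    return erode(dilate(a, b), b)

cross = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]    # 3 x 3 cross-shaped element
a = np.zeros((7, 7), dtype=bool)
a[2:5, 2:5] = True                                    # a 3 x 3 square object
print(dilate(a, cross).sum(), erode(a, cross).sum())  # 21 and 1: grow, then shrink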


Hit-or-Miss Transform:
The hit-and-miss transform is a basic tool for shape detection. The hit-or-miss
transform is a general binary morphological operation that can be used to look for particular
patterns of foreground and background pixels in an image.
Concept: To detect a shape:
 Hit object
 Miss background
Let the origin of each shape be located at its center of gravity.
 Suppose we want to find the location of a shape X in a (larger) image A.
 Let X be enclosed by a small window, say W.
 The local background of X with respect to W is defined as the set difference (W - X).
 Applying the erosion operator of A by X gives the set of locations of the origin of X
such that X is completely contained in A.
 It may also be viewed geometrically as the set of all locations of the origin of X at
which X found a match (hit) in A.
 Apply the erosion operator on the complement of A by the local background set (W - X).
 Notice that the set of locations for which X exactly fits inside A is the intersection of
these two last operators above.
 If B denotes the set composed of X and its background, B = (B1, B2); B1 = X,
B2 = (W - X).
 The match (or set of matches) of B in A is denoted

A ⊛ B = (A ⊖ B1) ∩ (Ac ⊖ B2)
B1: Object related, B2: Background related


 The reason for using this kind of structuring element B = (B1, B2) is that, by
definition, two or more objects are distinct only if they are disjoint (disconnected)
sets.
 In some applications, we may be interested in detecting certain patterns (combinations)
of 1's and 0's rather than individual objects.
 In this case a background is not required and the hit-or-miss transform reduces to
simple erosion.
 This simplified pattern detection scheme is used in some of the algorithms for –
identifying characters within a text.


The structural elements used for Hit-or-miss transforms are an extension to the ones
used with dilation, erosion etc. The structural elements can contain both foreground and
background pixels, rather than just foreground pixels, i.e. both ones and zeros. The
structuring element is superimposed over each pixel in the input image, and if an exact match
is found between the foreground and background pixels in the structuring element and the
image, the input pixel lying below the origin of the structuring element is set to the
foreground pixel value. If it does not match, the input pixel is replaced by the background pixel
value.
Basic Morphological Algorithms

Boundary Extraction:
The boundary of a set A is obtained by first eroding A by structuring element B and
then taking the set difference of A and its erosion. The resultant image after subtracting the
eroded image from the original image has the boundary of the objects extracted. The
thickness of the boundary depends on the size of the structuring element. The boundary β(A)
of a set A is

β(A) = A - (A ⊖ B)
Region Filling or Hole Filling:


A hole may be defined as a background region surrounded by a connected
border of foreground pixels. This algorithm is based on a set of dilations,
complementations and intersections. Let p be a point inside the boundary, and let X0
be an array of 0s with a 1 at the location of p. The holes are then filled iteratively:

Xk = (Xk-1 ⊕ B) ∩ Ac,    k = 1, 2, 3, ...

 The process stops when Xk = Xk-1



 The set Xk contains all the filled holes


 The result, given by the union of A and Xk, is a set that contains the filled holes
and their boundaries.
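A sketch of this iterative procedure, using scipy.ndimage.binary_dilation for the dilation step
(the seed point, structuring element and ring-shaped test image are illustrative):

import numpy as np
from scipy.ndimage import binary_dilation

def fill_hole(a, seed):
    """Xk = dilate(Xk-1) ∩ Ac, starting from one seed pixel inside the hole."""
    x = np.zeros_like(a, dtype=bool)
    x[seed] = True
    not_a = ~a
    while True:
        x_next = binary_dilation(x) & not_a
        if np.array_equal(x_next, x):
            return a | x                   # filled holes together with the boundary
        x = x_next

a = np.zeros((7, 7), dtype=bool)
a[1:6, 1:6] = True
a[2:5, 2:5] = False                        # a foreground ring with a hole inside
print(fill_hole(a, (3, 3)).sum())          # 25: the 3 x 3 hole has been filled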

Extraction of Connected Components:


Extraction of connected components from a binary image is central to many
automated image analysis applications.
 Let A be a set containing one or more connected components, and form an array X0
whose elements are 0s except at one location known to correspond to a point in each
connected component of A, which is set to 1.
 The objective is to start with X0 and find all the connected components, using the
iteration Xk = (Xk-1 ⊕ B) ∩ A, k = 1, 2, 3, ..., which terminates when Xk = Xk-1.


Convex Hull:
A is said to be convex if a straight line segment joining any two points in A lies
entirely within A.
 The convex hull H of set S is the smallest convex set containing S
 The set difference H-S is called the convex deficiency of S
The convex hull and convex deficiency are useful for object description. The algorithm
iteratively applies the hit-or-miss transform to A with one structuring element, takes the union
of the result with A, and repeats the procedure with the remaining structuring elements.
Let Bi, i = 1, 2, 3, 4, represent the four structuring elements. Then we need to
implement the equation

X(i, k) = (X(i, k-1) ⊛ Bi) ∪ A,    i = 1, 2, 3, 4 and k = 1, 2, 3, ...

Let us consider

X(i, 0) = A

and, for each i, the procedure terminates when

X(i, k) = X(i, k-1)

If

Di denotes the converged result X(i, conv),

then the convex hull of A is defined as

C(A) = D1 ∪ D2 ∪ D3 ∪ D4
Thinning:
The thinning of a set A by a structuring element B can be defined in terms of the hit-
or-miss transform:

A ⊗ B = A - (A ⊛ B) = A ∩ (A ⊛ B)c
A more useful expression for thinning A symmetrically is based on a sequence of


structuring elements:
{B}={B1, B2, B3, …, Bn}


where Bi is a rotated version of Bi-1. Using this concept we define thinning by a
sequence of structuring elements:

A ⊗ {B} = ((...((A ⊗ B1) ⊗ B2) ...) ⊗ Bn)

The process is to thin by one pass with B1, then thin the result with one pass with B2,
and so on until A is thinned with one pass with Bn. The entire process is repeated until no
further changes occur. Each pass is performed using the equation:

A ⊗ Bi = A ∩ (A ⊛ Bi)c
Thickening:
Thickening is a morphological dual of thinning and is defined as

A ⊙ B = A ∪ (A ⊛ B)

As in thinning, thickening can be defined as a sequential operation:

A ⊙ {B} = ((...((A ⊙ B1) ⊙ B2) ...) ⊙ Bn)

The structuring elements used for thickening have the same form as in thinning, but with all
1's and 0's interchanged.

Skeletons:
The skeleton of A is defined in terms of erosions and openings:

S(A) = ∪ Sk(A),    k = 0, 1, ..., K

with

Sk(A) = (A ⊖ kB) - (A ⊖ kB) ∘ B

where B is the structuring element and (A ⊖ kB) indicates k successive
erosions of A:

(A ⊖ kB) = ((...((A ⊖ B) ⊖ B) ⊖ ...) ⊖ B)

k times, and K is the last iterative step before A erodes to an empty set; in other words:

K = max{ k | (A ⊖ kB) ≠ ∅ }

S(A) can be obtained as the union of the skeleton subsets Sk(A). A can also be
reconstructed from the subsets Sk(A) by using the equation

A = ∪ (Sk(A) ⊕ kB),    k = 0, 1, ..., K

where (Sk(A) ⊕ kB) denotes k successive dilations of Sk(A); that is:

(Sk(A) ⊕ kB) = ((...((Sk(A) ⊕ B) ⊕ B) ⊕ ...) ⊕ B)

GRAY SCALE MORPHOLOGY


Gray Scale Images:
In gray-scale images, in contrast to binary images, we deal with digital image
functions of the form f(x,y) as an input image and b(x,y) as a structuring element. (x,y) are
integers from Z x Z that represent coordinates in the image. f(x,y) and b(x,y) are functions
that assign a gray-level value to each distinct pair of coordinates. For example, the domain of
gray values can be 0-255, where 0 is black and 255 is white.

Dilation:
The equation for gray-scale dilation is

(f ⊕ b)(s, t) = max{ f(s - x, t - y) + b(x, y) | (s - x), (t - y) ∈ Df ; (x, y) ∈ Db }

Df and Db are the domains of f and b. The condition that (s-x),(t-y) need to be in the
domain of f and x,y in the domain of b is analogous to the condition in the binary definition
of dilation, where the two sets need to overlap by at least one element.


We will illustrate the previous equation in 1-D, which gives an equation in one variable:

(f ⊕ b)(s) = max{ f(s - x) + b(x) | (s - x) ∈ Df ; x ∈ Db }

The requirement that (s-x) be in the domain of f and x in the domain of b implies
that f and b overlap by at least one element. Unlike the binary case, f, rather than the
structuring element b, is shifted. Conceptually, f sliding by b is really no different from b
sliding by f. The general effect of performing dilation on a gray-scale image is twofold:
if all the values of the structuring element are positive, then the output image tends to
be brighter than the input, and dark details are either reduced or eliminated, depending on how
their values and shape relate to the structuring element used for dilation.
Erosion:
Gray-scale erosion is defined as:

(f ⊖ b)(s, t) = min{ f(s + x, t + y) - b(x, y) | (s + x), (t + y) ∈ Df ; (x, y) ∈ Db }

The condition that (s+x),(t+y) have to be in the domain of f, and x,y have to be in the
domain of b, is completely analogous to the condition in the binary definition of erosion,
where the structuring element has to be completely contained in the set being eroded. As
with dilation, we illustrate with a 1-D function:

(f ⊖ b)(s) = min{ f(s + x) - b(x) | (s + x) ∈ Df ; x ∈ Db }

• General effect of performing an erosion on grayscale images (a numerical sketch of both
operations follows below):

 If all elements of the structuring element are positive, the output image tends
to be darker than the input image.
 Bright details in the input image that are smaller in area than the
structuring element are reduced, with the degree of reduction determined
by the grayscale values surrounding the bright detail and by the shape and
amplitude values of the structuring element itself.
• As in the binary case, grayscale erosion and dilation are duals with respect to
function complementation and reflection.
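
The brightening and darkening behaviour described above can be checked on a toy 1-D signal. scipy.ndimage implements the additive max/min definitions used here; the library choice and the sample values are assumptions for illustration only.

    import numpy as np
    from scipy import ndimage

    f = np.array([3, 3, 9, 3, 3, 1, 1, 8, 1, 1], dtype=float)   # toy 1-D "image"
    b = np.array([1, 1, 1], dtype=float)                        # flat, positive SE

    dilated = ndimage.grey_dilation(f, structure=b)   # max of f(s-x) + b(x): brighter
    eroded  = ndimage.grey_erosion(f, structure=b)    # min of f(s+x) - b(x): darker

    print(dilated)   # narrow dark valleys are raised or removed
    print(eroded)    # narrow bright peaks (the 9 and 8) are flattened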
Opening:
In the opening of a gray-scale image we remove small light details, while leaving the
overall gray levels and the larger bright features relatively undisturbed:

f ∘ b = (f ⊖ b) ⊕ b

The structuring element is rolled against the underside of the surface of f. All peaks that are
narrow with respect to the diameter of the structuring element are reduced in amplitude
and sharpness. The initial erosion removes the small details, but it also darkens the image. The
subsequent dilation increases the overall intensity of the image again without reintroducing
the details totally removed by erosion.
Opening a gray-scale picture can thus be described as pushing the structuring element B up
against the underside of the scan-line graph of f, while traversing the graph according to the
curvature of B.


Closing:
In the closing of a gray-scale image we remove small dark details, while leaving the
overall gray levels and the larger dark features relatively undisturbed:

f • b = (f ⊕ b) ⊖ b

The structuring element is rolled on top of the surface of f. Peaks are essentially left
in their original form (assuming that their separation at the narrowest points exceeds the
diameter of the structuring element). The initial dilation removes the dark details and
brightens the image. The subsequent erosion darkens the image again without reintroducing the
details totally removed by dilation.
Closing a gray-scale picture can be described as pushing the structuring element B down on top
of the scan-line graph of f, while traversing the graph according to the curvature of B; the
peaks usually remain in their original form.
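
A small numerical sketch of these two effects follows; numpy and scipy are assumed, and the image contents and 3×3 structuring element size are arbitrary illustrative choices.

    import numpy as np
    from scipy import ndimage

    img = np.full((64, 64), 100.0)          # mid-gray background
    img[20:40, 20:40] = 180.0               # large bright feature
    img[5, 5] = 255.0                       # small bright detail
    img[50, 50] = 0.0                       # small dark detail

    opened = ndimage.grey_opening(img, size=(3, 3))   # removes the small bright spike
    closed = ndimage.grey_closing(img, size=(3, 3))   # removes the small dark spike

    print(opened[5, 5], closed[50, 50])     # both pulled back toward the background
    print(opened[30, 30], closed[30, 30])   # the large feature is left untouched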

Applications of Gray-Scale Morphology:

Morphological smoothing
 Perform an opening followed by a closing with the same structuring element.
 The net result of these two operations is to remove or attenuate both bright and
dark artifacts and noise, as in the sketch below.
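
A one-line sketch of morphological smoothing on a noisy test image (the random image and the 3×3 element are illustrative assumptions):

    import numpy as np
    from scipy import ndimage

    img = np.random.default_rng(1).normal(128.0, 20.0, size=(64, 64))   # noisy test image

    # Opening followed by closing with the same flat 3x3 structuring element
    # attenuates both bright and dark impulse-like noise.
    smoothed = ndimage.grey_closing(ndimage.grey_opening(img, size=(3, 3)), size=(3, 3))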


Morphological gradient
 Dilation and erosion are used to compute the morphological gradient of an image,
denoted g:

g = (f ⊕ b) − (f ⊖ b)

 It is used to highlight sharp gray-level transitions in the input image.

 Gradients obtained using symmetrical structuring elements tend to depend less on edge
directionality.
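
A minimal sketch of the morphological gradient on a synthetic step edge (the library and test values are assumptions):

    import numpy as np
    from scipy import ndimage

    img = np.zeros((32, 32))
    img[:, 16:] = 200.0                      # vertical step edge

    # g = (f dilated by b) - (f eroded by b), with a flat 3x3 element.
    g = ndimage.grey_dilation(img, size=(3, 3)) - ndimage.grey_erosion(img, size=(3, 3))
    # ndimage.morphological_gradient(img, size=(3, 3)) gives the same result directly.
    print(g[:, 14:18].max())                 # large response only along the transition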
Top-hat and Bottom-hat transformations:
 Combining image subtraction with opening and closing gives results referred to as the top-hat
and bottom-hat transformations.
 The top-hat transformation of a gray-scale image f is defined as

h = f − (f ∘ b)

 The bottom-hat transformation of a gray-scale image f is defined as

h = (f • b) − f

 The top-hat transform is used for light objects on a dark background and the bottom-
hat transform is used for the converse.
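
A quick sketch of both transforms (the image contents and the 9×9 element size are illustrative assumptions):

    import numpy as np
    from scipy import ndimage

    img = np.full((64, 64), 50.0)
    img[30:34, 30:34] = 200.0                 # small light object on a dark background

    top_hat    = img - ndimage.grey_opening(img, size=(9, 9))   # f - (f opened by b)
    bottom_hat = ndimage.grey_closing(img, size=(9, 9)) - img   # (f closed by b) - f
    # scipy also exposes these transforms directly as ndimage.white_tophat and ndimage.black_tophat.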
Textural segmentation:
 The objective is to find the boundary between different image regions based on
their textural content.
 Close the input image using successively larger structuring elements.
 Then a single opening is performed, and finally a simple threshold yields the
boundary between the textural regions.


Granulometry:
 Granulometry is a field that deals principally with determining the size distribution
of particles in an image.
 Because the particles are lighter than the background, we can use a morphological
approach to determine the size distribution and, at the end, construct a histogram of it
(see the sketch below).
 The approach is based on the idea that opening operations of a particular size have the most
effect on regions of the input image that contain particles of similar size.
 This type of processing is useful for describing regions with a predominant
particle-like character.
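
A sketch of a granulometric size distribution, assuming light particles on a dark background; the helper function, the disk radii and the use of the remaining total intensity as the "surface area" measure are illustrative choices, not prescribed by the notes.

    import numpy as np
    from scipy import ndimage

    def granulometry(img, max_radius=10):
        """Open img with disks of increasing radius and record the remaining total
        intensity; large drops between successive radii indicate many particles of
        roughly that size, and the differences form the size-distribution histogram."""
        remaining = []
        for r in range(1, max_radius + 1):
            y, x = np.ogrid[-r:r + 1, -r:r + 1]
            disk = (x * x + y * y) <= r * r                    # disk-shaped footprint
            remaining.append(ndimage.grey_opening(img, footprint=disk).sum())
        return -np.diff(np.array(remaining))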


Image Segmentation
Image segmentation divides an image into regions that are connected and have some
similarity within the region and some difference between adjacent regions. The goal is
usually to find individual objects in an image. There are fundamentally two
kinds of approaches to segmentation: discontinuity-based and similarity-based.

 Similarity may be due to pixel intensity, color or texture.


 Differences are sudden changes (discontinuities) in any of these, but especially
sudden changes in intensity along a boundary line, which is called an edge.

Detection of Discontinuities:

There are three kinds of discontinuities of intensity: points, lines and edges. The most
common way to look for discontinuities is to scan a small mask over the image. The mask
determines which kind of discontinuity to look for.

The response of the mask at any point in the image is

R = w1·z1 + w2·z2 + … + w9·z9 = Σ (i = 1, …, 9) wi·zi

where zi is the gray level of the pixel associated with mask coefficient wi.

Point Detection: an isolated point is one whose gray value is significantly different from its
background. A point is detected at the location on which the mask is centered if

|R| ≥ T, where T is a nonnegative threshold.
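
A sketch of point detection with the standard 3×3 mask; the test image and the threshold heuristic are assumptions for illustration.

    import numpy as np
    from scipy import ndimage

    img = np.full((32, 32), 50.0)
    img[16, 16] = 255.0                                  # isolated bright point

    mask = np.array([[-1, -1, -1],
                     [-1,  8, -1],
                     [-1, -1, -1]], dtype=float)         # coefficients sum to zero

    R = ndimage.convolve(img, mask, mode='reflect')
    T = 0.9 * np.abs(R).max()                            # nonnegative threshold
    points = np.abs(R) >= T                              # True only near (16, 16)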


Line Detection:

 Only slightly more complex than point detection is finding a one-pixel-wide line in an
image.

 With 3×3 masks the only three-point straight lines are horizontal, vertical, or
diagonal (+45° or −45°).

 The preferred direction is weighted with a larger coefficient (see the masks in the sketch
below).

 The coefficients in each mask sum to zero, so the response is zero in areas of constant gray
level.
 Either compare the response values of the individual masks (run all masks), or run only the
mask of the specified direction.
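
The four classical line masks and a comparison of their responses, as a sketch (the test image is an assumption):

    import numpy as np
    from scipy import ndimage

    # Each mask sums to zero; the preferred direction carries the larger coefficient 2.
    line_masks = {
        'horizontal': np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]], float),
        '+45':        np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]], float),
        'vertical':   np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]], float),
        '-45':        np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]], float),
    }

    img = np.zeros((32, 32))
    img[16, :] = 255.0                                   # a one-pixel-wide horizontal line

    responses = {d: ndimage.convolve(img, m) for d, m in line_masks.items()}
    best = max(responses, key=lambda d: np.abs(responses[d]).max())
    print(best)                                          # 'horizontal'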

Edge Detection:

An edge is a set of connected pixels that lie on the boundary between two regions.
• An edge is a 'local' concept, in contrast to the 'more global' concept of a region boundary.
• Edges are measured by gray-level transitions.
• Edge models include ideal (step) and blurred (ramp) edges.


 The first derivative can be used to detect the presence of an edge (whether a point is on a ramp).
 The sign of the second derivative can be used to determine whether an edge pixel lies
on the dark or light side of an edge.
 The second derivative produces two values per edge.
 Its zero crossing lies near the edge midpoint.
 For non-horizontal edges, define a profile perpendicular to the edge direction.


Edges in the presence of noise

 Derivatives are sensitive to even fairly small amounts of noise.
 Consider smoothing the image prior to the use of derivatives.
Edge definition again
 An edge point is a point whose first derivative is above a pre-specified threshold.
 An edge is a set of connected edge points.
 Derivatives are computed through gradients (first order) and Laplacians (second order).

Edge Detection Gradient Operators:

Gradient
– A vector pointing in the direction of the maximum rate of change of f at coordinates (x, y):

∇f = [Gx, Gy]^T = [∂f/∂x, ∂f/∂y]^T


– Magnitude: gives the size of the rate of change (sometimes itself referred to as the gradient):

∇f = mag(∇f) = [Gx² + Gy²]^(1/2)

– Direction: perpendicular to the direction of the edge at (x, y):

α(x, y) = tan⁻¹(Gy / Gx)

The partial derivatives are computed through 2×2 or 3×3 masks. The Sobel operators introduce
some smoothing and give more importance to the center point, as in the sketch below.
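
A sketch of computing gradient magnitude and direction with the Sobel operators; scipy's sobel filter is used as an assumed convenience for applying the standard 3×3 kernels.

    import numpy as np
    from scipy import ndimage

    img = np.zeros((32, 32))
    img[:, 16:] = 255.0                      # vertical step edge

    gx = ndimage.sobel(img, axis=1)          # approximates df/dx
    gy = ndimage.sobel(img, axis=0)          # approximates df/dy

    magnitude = np.hypot(gx, gy)             # |grad f| = sqrt(Gx^2 + Gy^2)
    direction = np.arctan2(gy, gx)           # alpha(x, y) = atan(Gy / Gx)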


Laplacian
– Second-order derivative of a 2-D function:

∇²f = ∂²f/∂x² + ∂²f/∂y²
– Digital approximations by proper masks

 Its use for edge detection is complementary to the gradient.

 Cons: the Laplacian is very sensitive to noise and produces double edges.
 Pros: it tells whether a pixel lies on the dark or light side of an edge, and its zero
crossings are of better use for localization.
 Laplacian of Gaussian (LoG): apply preliminary smoothing, then find edges through zero
crossings.
 Consider the Gaussian function

h(r) = −exp(−r² / (2σ²)),  where r² = x² + y²

The Laplacian of h is

∇²h(r) = −[(r² − σ²) / σ⁴] exp(−r² / (2σ²))

The Laplacian of a Gaussian is sometimes called the Mexican hat function. It can also
be computed by smoothing the image with the Gaussian smoothing mask, followed by
application of the Laplacian mask.
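
A sketch of LoG edge detection via zero crossings; scipy's gaussian_laplace combines the Gaussian smoothing and the Laplacian, and the sigma value and the crude zero-crossing test are assumptions for illustration.

    import numpy as np
    from scipy import ndimage

    img = np.zeros((64, 64))
    img[:, 32:] = 200.0                                  # step edge

    log = ndimage.gaussian_laplace(img, sigma=2.0)       # smooth with Gaussian, then Laplacian

    # Mark pixels where the sign of the LoG response changes between neighbours;
    # these zero crossings lie near the edge midpoint.
    sign = np.sign(log)
    zero_cross = (np.diff(sign, axis=0, prepend=sign[:1, :]) != 0) | \
                 (np.diff(sign, axis=1, prepend=sign[:, :1]) != 0)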


Edge Linking and Boundary Detection


Local Processing:
 Two properties of edge points are useful for edge linking:
o the strength (or magnitude) of the detected edge points
o their directions (determined from gradient directions)
 This is usually done in local neighborhoods.
 Adjacent edge points with similar magnitude and direction are linked.
For example, an edge pixel with coordinates (x0, y0) in a predefined neighborhood of (x, y)
is similar to the pixel at (x, y) if

Strength of the gradient vector: |∇f(x, y) − ∇f(x0, y0)| ≤ E, where E is a nonnegative threshold

Gradient vector direction: |α(x, y) − α(x0, y0)| < A, where A is a nonnegative angle threshold

Both the magnitude and the angle criteria must be satisfied for the two points to be linked.

Global Processing via the Hough Transform:


Hough transform: a way of finding edge points in an image that lie along a straight line.
Example: the xy-plane versus the ab-plane (parameter space), for lines of the form

yi = a·xi + b


In practice the normal representation ρ = x·cosθ + y·sinθ is used. The Hough transform consists
of finding all pairs of values of ρ and θ which satisfy this equation for the lines passing through
(x, y). These pairs are accumulated in what is basically a 2-dimensional histogram (an accumulator
array). When plotted, the pairs of ρ and θ for a single point trace out a sinusoidal curve. The
process is repeated for all appropriate (x, y) locations, and peaks in the accumulator correspond
to lines.
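
A compact accumulator-based sketch of the transform in normal form (the resolution choices are assumptions):

    import numpy as np

    def hough_lines(edge_img, n_theta=180):
        """Accumulate votes for rho = x*cos(theta) + y*sin(theta) over all edge
        points of a binary image; peaks in the accumulator correspond to lines."""
        h, w = edge_img.shape
        thetas = np.deg2rad(np.arange(-90.0, 90.0, 180.0 / n_theta))
        rho_max = int(np.ceil(np.hypot(h, w)))
        acc = np.zeros((2 * rho_max + 1, len(thetas)), dtype=np.int64)
        ys, xs = np.nonzero(edge_img)
        for x, y in zip(xs, ys):
            rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + rho_max
            acc[rho, np.arange(len(thetas))] += 1    # each point votes along a sinusoid
        return acc, thetas, np.arange(-rho_max, rho_max + 1)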

Thresholding:

The range of intensity levels covered by objects of interest is different from the
background.
g(x, y) = 1 if f(x, y) > T
g(x, y) = 0 if f(x, y) ≤ T

where T is the threshold.

Illumination:

Fig. a) Computer-generated reflectance function b) Histogram of the reflectance
function c) Computer-generated illumination function d) Product of (a) and (c) e)
Histogram of the product image.


 An image is the product of a reflectance component and an illumination component.

 Reflectance captures the nature of the objects and the background.
 Poor (nonuniform) illumination can impede the segmentation.
 The final histogram is the result of convolving the histograms of the log-reflectance
and log-illumination functions.
 Normalization is possible if the illumination function is known.

Basic Global Thresholding:

Choose an initial threshold T midway between the maximum and minimum gray levels.

– Appropriate for industrial inspection applications with controllable illumination.
– Automatic algorithm (a sketch in code follows below):
• Segment the image with the initial T into regions G1 and G2.
• Compute the average gray levels m1 and m2 of the two regions.
• Compute the new threshold T = 0.5·(m1 + m2).
• Repeat until the change of T in successive iterations is acceptably small.
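
A direct sketch of the automatic algorithm just listed; it assumes the image actually contains pixels on both sides of the initial threshold (i.e. a bimodal histogram).

    import numpy as np

    def global_threshold(img, eps=0.5):
        """Iteratively estimate a global threshold T: split into G1/G2, average
        the two class means, and stop when T changes by less than eps."""
        t = 0.5 * (float(img.min()) + float(img.max()))
        while True:
            g1, g2 = img[img > t], img[img <= t]
            new_t = 0.5 * (g1.mean() + g2.mean())
            if abs(new_t - t) < eps:
                return new_t
            t = new_t

    # Example: a toy image with background pixels near 50 and object pixels near 200.
    img = np.concatenate([np.full(500, 50.0), np.full(200, 200.0)]).reshape(70, 10)
    print(global_threshold(img))   # converges to 125.0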

Region Based Segmentation:


Edges and thresholds sometimes do not give good results for segmentation. Region-
based segmentation is based on the connectivity of similar pixels in a region.

– Each region must be uniform.

– Connectivity of the pixels within the region is very important.


There are two main approaches to region-based segmentation: region growing and
region splitting.

Region Growing:

 Let R represent the entire image region.

 Segmentation is a process that partitions R into subregions, R1,R2,…,Rn, such that


(a) ⋃ (i = 1, …, n) Ri = R

(b) Ri is a connected region, i = 1, 2, …, n

(c) Ri ∩ Rj = ∅ for all i and j, i ≠ j

(d) P(Ri) = TRUE for i = 1, 2, …, n

(e) P(Ri ∪ Rj) = FALSE for any adjacent regions Ri and Rj

where P(Rk) is a logical predicate defined over the points in set Rk.

For example: P(Rk) = TRUE if all pixels in Rk have the same gray level.
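
A minimal single-seed region-growing sketch based on a similar predicate ("gray level within tol of the seed value"); the seed point, the tolerance and the use of 4-connectivity are illustrative choices, not prescribed by the notes.

    import numpy as np
    from collections import deque

    def region_grow(img, seed, tol=10.0):
        """Grow a region from seed (row, col): add 4-connected neighbours whose
        gray level differs from the seed value by at most tol."""
        h, w = img.shape
        region = np.zeros((h, w), dtype=bool)
        seed_val = float(img[seed])
        region[seed] = True
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and not region[rr, cc] \
                        and abs(float(img[rr, cc]) - seed_val) <= tol:
                    region[rr, cc] = True
                    queue.append((rr, cc))
        return region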


Region Splitting:

 First there is a large region (possibly the entire image).


 Then a predicate (measurement) is used to determine if the region is uniform.
 If not, then the method requires that the region be split into subregions.
 Then each of these two regions is independently tested by the predicate
(measurement).
 This procedure continues until all resulting regions are uniform.
 The main problem with region splitting is determining where to split a region.
 One method to divide a region is to use a quadtree structure.
 Quadtree: a tree in which nodes have exactly four descendants.


Segmentation by Morphological Watersheds:


 The concept of watersheds is based on visualizing an image in three dimensions: two
spatial coordinates versus gray levels.
 In such a topographic interpretation, we consider three types of points:
(a) Points belonging to a regional minimum
(b) Points at which a drop of water would fall with certainty to a single minimum
(c) Points at which water would be equally likely to fall to more than one such
minimum.
 The principal objective of segmentation algorithms based on these concepts is to find
the watershed lines.
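
As a rough sketch of marker-based watershed segmentation, the following assumes scipy.ndimage's watershed_ift routine (a tooling choice not made in the notes); the toy image and the marker placement are illustrative, and the routine expects unsigned 8- or 16-bit input.

    import numpy as np
    from scipy import ndimage

    # Two dark basins separated by a bright ridge; the watershed line follows the ridge.
    img = np.full((7, 9), 200, dtype=np.uint8)
    img[1:6, 1:4] = 50                         # basin 1
    img[1:6, 5:8] = 50                         # basin 2

    markers = np.zeros(img.shape, dtype=np.int16)
    markers[3, 2] = 1                          # marker inside basin 1 (a regional minimum)
    markers[3, 6] = 2                          # marker inside basin 2

    labels = ndimage.watershed_ift(img, markers)   # each pixel labelled by its catchment basin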


PREVIOUS QUESTIONS
1. With necessary figures, explain the opening and closing operations.
2. Explain the following morphological algorithms i) Boundary extraction ii) Hole
filling.
3. Explain the following morphological algorithms i) Thinning ii) Thickening
4. What is Hit-or-Miss transformation? Explain.
5. Discuss about Grey-scale morphology.
6. Write short notes on Geometric Transformation.
7. Explain about edge detection using gradient operator.
8. What is meant by edge linking? Explain edge linking using local processing
9. Explain edge linking using Hough transform.
10. Describe Watershed segmentation Algorithm
11. Discuss about region based segmentation.
12. Explain the concept of Thresholding in image segmentation and discuss its
merits and limitations.

