DIP Material
There are two broad categories of methods in digital image processing:
1. Methods whose inputs and outputs are images.
2. Methods whose inputs may be images but whose outputs are attributes extracted from those images.
i) Image acquisition
It could be as simple as being given an image that is already in digital form. Generally, the image acquisition stage also involves preprocessing such as scaling.
vi) Compression
It deals with techniques for reducing the storage required to save an image, or the bandwidth required to transmit it over a network. It has two major approaches:
a) Lossless Compression
b) Lossy Compression
ix) Recognition
It is the process of assigning a label to an object based on its descriptors. It is the last step of image processing and generally makes use of artificial intelligence software.
x) Knowledge base
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge base. This knowledge may be as simple as detailing regions of an image where the information of interest is known to be located, thus limiting the search that has to be conducted in seeking that information. The knowledge base can also be quite complex, such as an interrelated list of all major possible defects in a materials inspection problem, or an image database containing high-resolution satellite images of a region in connection with change-detection applications.
Components of Image Processing System
Image Sensors
With reference to sensing, two elements are required to acquire digital images. The first is a physical device that is sensitive to the energy radiated by the object we wish to image, and the second is specialized image processing hardware.
Computer:
It is a general-purpose computer and can range from a PC to a supercomputer, depending on the application. In dedicated applications, specially designed computers are sometimes used to achieve a required level of performance.
Software
It consists of specialized modules that perform specific tasks. A well-designed package also includes the capability for the user to write code that, as a minimum, utilizes the specialized modules. More sophisticated software packages allow the integration of those modules.
Mass storage
This capability is a must in image processing applications. An image of size 1024 × 1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires one megabyte of storage space if the image is not compressed. Image processing applications fall into three principal categories of storage:
i) Short term storage for use during processing
ii) On line storage for relatively fast retrieval
iii) Archival storage such as magnetic tapes and disks
Image displays
Image displays in use today are mainly color TV monitors. These monitors are driven by the outputs of image and graphics display cards that are an integral part of the computer system.
Hardcopy devices
The devices for recording images include laser printers, film cameras, heat-sensitive devices, inkjet units, and digital units such as optical and CD-ROM disks. Film provides the highest possible resolution, but paper is the obvious medium of choice for written material.
Networking
It is almost a default function in any computer system in use today. Because of the large amount of data inherent in image processing applications, the key consideration in image transmission is bandwidth.
Structure of the Human Eye
a) The cornea and sclera: the cornea is a tough, transparent tissue that covers the anterior surface of the eye. The rest of the optic globe is covered by the sclera.
b) The choroid: it contains a network of blood vessels that serve as the major source of nutrition to the eye. It also helps to reduce the amount of extraneous light entering the eye. It has two parts:
(1) Iris diaphragm: it contracts or expands to control the amount of light that enters the eye.
(2) Ciliary body.
The lens is made up of concentric layers of fibrous cells and is suspended by fibers that attach to
the ciliary body. It contains 60 to 70% water, about 6% fat, and more protein than any other
tissue in the eye.
The lens is colored by a slightly yellow pigmentation that increases with age. In extreme cases,
excessive clouding of the lens, caused by the affliction commonly referred to as cataracts, can
lead to poor color discrimination and loss of clear vision.
c) Retina: it is the innermost membrane of the eye. When the eye is properly focused, light from an object outside the eye is imaged on the retina. There are various light receptors distributed over the surface of the retina.
The two major classes of the receptors are-
1) Cones: these number about 6 to 7 million and are located primarily in the central portion of the retina, called the fovea. They are highly sensitive to color. Humans can resolve fine detail with the cones because each one is connected to its own nerve end. Cone vision is called photopic or bright-light vision.
2) Rods: these are much greater in number, about 75 to 150 million, and are distributed over the entire retinal surface. The larger area of distribution, and the fact that several rods are connected to a single nerve end, give a general overall picture of the field of view. They are not involved in color vision and are sensitive to low levels of illumination. Rod vision is called scotopic or dim-light vision.
The area of the retina where the optic nerve emerges contains no receptors; it is called the blind spot. The figure shows the density of rods and cones for a cross section of the right eye passing through the region of emergence of the optic nerve from the eye.
The geometry in Fig. illustrates how to obtain the dimensions of an image formed on the retina.
For example, suppose that a person is looking at a tree 15 m high at a distance of 100 m. Letting
h denote the height of that object in the retinal image, the geometry of Fig. yields 15/100 = h/17
or h = 2.55mm. The retinal image is focused primarily on the region of the fovea. Perception
then takes place by the relative excitation of light receptors, which transform radiant energy into
electrical impulses that ultimately are decoded by the brain.
The curve represents the range of intensities to which the visual system can adapt. The visual system cannot operate over such a dynamic range simultaneously; rather, it accomplishes this large variation by changing its overall sensitivity, a phenomenon called brightness adaptation. For any given set of conditions, the current sensitivity level of the visual system is called the brightness adaptation level, Ba in the curve. The short intersecting curve represents the range of subjective brightness that the eye can perceive when adapted to this level. It is restricted at level Bb, at and below which all stimuli are perceived as indistinguishable blacks. The upper portion of the curve is not actually restricted; much higher intensities would simply raise the adaptation level above Ba. The ability of the eye to discriminate between changes in light intensity at any specific adaptation level is also of considerable interest. Take a flat, uniformly illuminated area large enough to occupy the entire field of view of the subject. It may be a diffuser, such as opaque glass, illuminated from behind by a light source whose intensity, I, can be varied. To this field is added an increment of illumination, ΔI, in the form of a short-duration flash that appears as a circle in the center of the uniformly illuminated field. If ΔI is not bright enough, the subject cannot perceive any change.
As ΔI gets stronger, the subject may indicate a perceived change. ΔIc is the increment of illumination discernible 50% of the time with background illumination I. The quantity ΔIc/I is called the Weber ratio.
A small value of the Weber ratio means that a small percentage change in intensity is discernible, representing "good" brightness discrimination.
A large value of the Weber ratio means that a large percentage change in intensity is required, representing "poor" brightness discrimination.
A plot of log ΔIc/I as a function of log I has the general shape shown in the figure. This curve shows that brightness discrimination is poor (the Weber ratio is large) at low levels of illumination, and that it improves significantly (the Weber ratio decreases) as background illumination increases.
Two phenomena clearly demonstrate that perceived brightness is not a simple function of intensity. The
first is based on the fact that the visual system tends to undershoot or overshoot around the boundary of
regions of different intensities. Figure 2.7(a) shows a striking example of this phenomenon. Although the intensity of the stripes is constant, we actually perceive a brightness pattern that is strongly scalloped near the boundaries. These seemingly scalloped bands are called Mach bands.
The second phenomenon, called simultaneous contrast, is related to the fact that a region’s
perceived brightness does not depend simply on its intensity, as Fig demonstrates. All the center
squares have exactly the same intensity.
However, they appear to the eye to become darker as the background gets lighter.
Other examples of human perception phenomena are optical illusions, in which the eye fills in
nonexisting information or wrongly perceives geometrical properties of objects. Figure shows
some examples. In Fig. (a), the outline of a square is seen clearly, despite the fact that no lines defining such a figure are part of the image. The same effect, this time with a circle, can be seen in Fig. (b); note how just a few lines are sufficient to give the illusion of a complete circle. The
two horizontal line segments in Fig. (c) are of the same length, but one appears shorter than the
other. Finally, all lines in Fig.(d) that are oriented at 45° are equidistant and parallel. Yet the
crosshatching creates the illusion that those lines are far from being parallel. Optical illusions are
a characteristic of the human visual system that is not fully understood.
Image Sensing and Acquisition:
The types of images in which we are interested are generated by the combination of an
“illumination” source and the reflection or absorption of energy from that source by the elements
of the “scene” being imaged.
Depending on the nature of the source, illumination energy is reflected from, or transmitted
through, objects. An example in the first category is light reflected from a planar surface. An
example in the second category is when X-rays pass through a patient’s body for the purpose of
generating a diagnostic X-ray film. In some applications, the reflected or transmitted energy is
focused onto a photo converter (e.g., a phosphor screen), which converts the energy into visible
light. Electron microscopy and some applications of gamma imaging use this approach.
The idea is simple: Incoming energy is transformed into a voltage by the combination of input
electrical power and sensor material that is responsive to the particular type of energy being
detected.
The output voltage waveform is the response of the sensor(s), and a digital quantity is obtained
from each sensor by digitizing its response. In this section, we look at the principal modalities for
image sensing and generation.
Figure 2.12 shows the three principal sensor arrangements used to transform illumination energy into digital images.
Image Acquisition Using a Single Sensor:
The figure shows the components of a single sensor. The most familiar sensor of this type is the photodiode, which is constructed of silicon materials and whose output voltage waveform is proportional to the incident light. The use of a filter in front of a sensor improves selectivity.
For example, a green (pass) filter in front of a light sensor favors light in the green band of
the color spectrum. As a consequence, the sensor output will be stronger for green light than for
other components in the visible spectrum.
In order to generate a 2-D image using a single sensor, there has to be relative
displacements in both the x- and y-directions between the sensor and the area to be imaged.
Figure 2.13 shows an arrangement used in high-precision scanning, where a film negative is
mounted onto a drum whose mechanical rotation provides displacement in one dimension. The
single sensor is mounted on a lead screw that provides motion in the perpendicular direction.
Since mechanical motion can be controlled with high precision, this method is an inexpensive
(but slow) way to obtain high-resolution images. Other similar mechanical arrangements use a
flat bed, with the sensor moving in two linear directions. These types of mechanical digitizers
sometimes are referred to as microdensitometers.
Image Acquisition Using Sensor Strips:
A geometry that is used much more frequently than single sensors consists of an in-line arrangement of sensors in the form of a sensor strip, as the figure shows.
The strip provides imaging elements in one direction. Motion perpendicular to the strip
provides imaging in the other direction. This is the type of arrangement used in most flat
bed scanners.
Sensing devices with 4000 or more in-line sensors are possible.
In-line sensors are used routinely in airborne imaging applications, in which the imaging
system is mounted on an aircraft that flies at a constant altitude and speed over the
geographical area to be imaged.
One- dimensional imaging sensor strips that respond to various bands of the
electromagnetic spectrum are mounted perpendicular to the direction of flight.
The imaging strip gives one line of an image at a time, and the motion of the strip
completes the other dimension of a two-dimensional image.
Lenses or other focusing schemes are used to project the area to be scanned onto the sensors.
Sensor strips mounted in a ring configuration are used in medical and industrial imaging to obtain cross-sectional ("slice") images of 3-D objects.
The one-dimensional function shown in Fig. 2.16(b) is a plot of amplitude (gray level) values of
the continuous image along the line segment AB. The random variations are due to image noise.
To sample this function, we take equally spaced samples along line AB. The location of each sample is given by a vertical tick mark in the bottom part of the figure, and the samples are shown as small white squares superimposed on the function. The set of these discrete locations gives the sampled function. However, the values of the samples still span (vertically) a continuous range of gray-level values. In order to form a digital function, the gray-level values also must be converted (quantized) into discrete quantities. The gray-level scale on the right side of the figure is divided into eight discrete levels, ranging from black to white, and the vertical tick marks indicate the specific value assigned to each of the eight gray levels. The continuous gray levels are quantized simply by assigning one of the eight discrete gray levels to each sample; the assignment is made according to the vertical proximity of a sample to a tick mark. The digital samples resulting from both sampling and quantization are shown in the figure.
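As a rough illustration of the sampling and quantization just described, the following Python/NumPy sketch (not part of the original text; the scan-line profile and the choice of eight levels are assumptions) takes equally spaced samples of a 1-D intensity profile and maps each sample to the nearest of eight discrete gray levels.

import numpy as np

# Continuous intensity profile along a hypothetical scan line AB (assumed for illustration).
t = np.linspace(0, 1, 1000)
profile = 0.5 + 0.4 * np.sin(2 * np.pi * 3 * t) + 0.02 * np.random.randn(t.size)

# Sampling: keep equally spaced samples along the line.
num_samples = 32
sample_idx = np.linspace(0, t.size - 1, num_samples).astype(int)
samples = profile[sample_idx]

# Quantization: map each sample to the nearest of 8 discrete gray levels in [0, 1].
levels = 8
quantized = np.round(np.clip(samples, 0, 1) * (levels - 1)) / (levels - 1)

print(np.column_stack((samples[:5], quantized[:5])))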
When a sensing array is used for image acquisition, there is no motion and the number of
sensors in the array establishes the limits of sampling in both directions. Quantization of the
sensor outputs is as before. Figure 2.17 illustrates this concept. Figure 2.17(a) shows a
continuous image projected onto the plane of an array sensor. Figure 2.17(b) shows the image
after sampling and quantization. Clearly, the quality of a digital image is determined to a large
degree by the number of samples and discrete intensity levels used in sampling and quantization.
Each element of this matrix is called an image element, picture element, pixel, or pel. Using traditional matrix notation, a digital image is denoted by an M × N array whose element in row x and column y is f(x, y), for x = 0, 1, ..., M-1 and y = 0, 1, ..., N-1.
The sampling process may be viewed as partitioning the xy-plane into a grid, with the coordinates of the center of each grid cell being a pair of elements from the Cartesian product Z², which is the set of all ordered pairs (zi, zj) with zi and zj being integers from Z. Hence f(x, y) is a digital image if it assigns a gray level (that is, a real number from the set of real numbers R) to each distinct pair of coordinates (x, y). This functional assignment is the quantization process.
If the gray levels are also integers, Z replaces R, and a digital image becomes a 2-D function whose coordinates and amplitude values are integers. Due to processing, storage, and hardware considerations, the number of gray levels typically is an integer power of 2: L = 2^k. Then the number b of bits required to store a digital image is
b = M × N × k
When M = N, the equation becomes b = N² × k.
When an image can have 2^k gray levels, it is referred to as a "k-bit image". An image with 256 possible gray levels is called an "8-bit image" (because 256 = 2^8).
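A quick arithmetic check of b = M × N × k, written as a small Python sketch (the image size used is just an example):

def storage_bits(M, N, k):
    """Number of bits b = M * N * k needed to store an M x N image with 2**k gray levels."""
    return M * N * k

# Example: a 1024 x 1024 image with 8 bits per pixel.
b = storage_bits(1024, 1024, 8)
print(b, "bits =", b // 8, "bytes =", b / (8 * 2**20), "MiB")  # 1 MiB, matching the text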
There are three basic ways to represent f(x, y): as a plotted surface, as a visual intensity array (the displayed image), and as a 2-D numerical array.
The dynamic range of an imaging system is the ratio of the maximum measurable intensity to the minimum detectable intensity level in the system. As a rule, the upper limit is determined by saturation and the lower limit by noise.
The difference in intensity between the highest and lowest intensity levels in an image is called the contrast of the image.
Spatial and Intensity Resolution:
Spatial resolution is a measure of the smallest discernible detail in an image. Spatial resolution can be stated in a number of ways, with line pairs per unit distance and dots (pixels) per unit distance being common measures. Suppose we construct a chart with alternating vertical lines of width W, with the spaces between them also having width W; a line pair then consists of one such line and its adjacent space. Thus, the width of a line pair is 2W, and there are 1/2W line pairs per unit distance. A widely used definition of resolution is the largest number of discernible line pairs per unit distance.
Dots per unit distance are a measure of image resolution used commonly in the printing and
publishing industry. In the U.S., this measure usually is expressed as dots per inch (dpi). To give
you an idea of quality, newspapers are printed with a resolution of 75 dpi, magazines at 133 dpi,
glossy brochures at 175 dpi, and the book page at which you are presently looking is printed at
2400 dpi.
Intensity resolution refers to the smallest discernible change in gray level. Measuring discernible changes in gray level is a highly subjective process. Reducing the number of bits k while keeping the spatial resolution constant creates the problem of false contouring, which is caused by the use of an insufficient number of gray levels in the smooth areas of a digital image. It is called so because the ridges resemble topographic contours in a map. False contouring is generally quite visible in images displayed using 16 or fewer uniformly spaced gray levels.
Iso-preference Curves
To see the effect of varying N and k simultaneously, pictures having a low, a medium, and a high level of detail are used. Different images were generated by varying N and k, and observers were then asked to rank the results according to their subjective quality. The results were summarized in the form of iso-preference curves in the N-k plane.
Each iso-preference curve tends to shift right and upward, but its shape differs for each of the three image categories shown in the figure. A shift up and to the right in the curve simply means larger values of N and k, which implies better picture quality. The results show that the iso-preference curves tend to become more vertical as the amount of detail in the image increases. This suggests that for images with a large amount of detail only a few gray levels may be needed: for a fixed value of N, the perceived quality of this type of image is nearly independent of the number of gray levels used.
Image Interpolation:
Interpolation is a basic tool used extensively in tasks such as zooming, shrinking, rotating, and geometric corrections. All of these tasks are essentially resampling methods.
Interpolation is the process of using known data to estimate values at unknown locations.
Suppose that an image of size 500 × 500 pixels has to be enlarged 1.5 times to 750 × 750 pixels. A simple way to visualize zooming is to create an imaginary 750 × 750 grid with the same pixel spacing as the original, and then shrink it so that it fits exactly over the original image. Obviously, the pixel spacing in the shrunken 750 × 750 grid will be less than the pixel spacing in the original image.
To perform intensity-level assignment for any point in the overlay, we look for its closest pixel in the original image and assign the intensity of that pixel to the new pixel in the 750 × 750 grid. When we are finished assigning intensities to all the points in the overlay grid, we expand it to the originally specified size to obtain the zoomed image. The method is called nearest-neighbor interpolation because it assigns to each new location the intensity of its nearest neighbor in the original image.
A more suitable approach is bilinear interpolation, in which we use the four nearest neighbors to
estimate the intensity at a given location. Let (x, y) denote the coordinates of the location to
which we want to assign an intensity value and let v (x, y) denote that intensity value. For
bilinear interpolation, the assigned value is obtained using the equation
v(x, y) = ax + by + cxy + d
where the four coefficients are determined from the four equations in four unknowns that can be
written using the four nearest neighbors of point (x, y).
The next level of complexity is bicubic interpolation, which involves the sixteen nearest neighbors of a point. The intensity value assigned to point (x, y) is obtained using the equation
v(x, y) = Σ_{i=0}^{3} Σ_{j=0}^{3} a_ij x^i y^j
where the sixteen coefficients are determined from the sixteen equations in sixteen unknowns that can be written using the sixteen nearest neighbors of point (x, y).
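As a hedged illustration of the resampling ideas above, the following Python sketch implements nearest-neighbor and bilinear zooming on a tiny array (the test image and target size are assumptions; this is not the text's implementation):

import numpy as np

def zoom_nearest(img, new_h, new_w):
    """Nearest-neighbor interpolation: each output pixel copies its closest input pixel."""
    h, w = img.shape
    rows = np.round(np.linspace(0, h - 1, new_h)).astype(int)
    cols = np.round(np.linspace(0, w - 1, new_w)).astype(int)
    return img[rows][:, cols]

def zoom_bilinear(img, new_h, new_w):
    """Bilinear interpolation: v(x, y) = a*x + b*y + c*x*y + d from the 4 nearest neighbors."""
    h, w = img.shape
    out = np.empty((new_h, new_w), dtype=float)
    for i in range(new_h):
        for j in range(new_w):
            y = i * (h - 1) / (new_h - 1)   # continuous coordinates in the input grid
            x = j * (w - 1) / (new_w - 1)
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            out[i, j] = (img[y0, x0] * (1 - dx) * (1 - dy) + img[y0, x1] * dx * (1 - dy)
                         + img[y1, x0] * (1 - dx) * dy + img[y1, x1] * dx * dy)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)   # assumed 4x4 test image
print(zoom_nearest(img, 6, 6))
print(zoom_bilinear(img, 6, 6))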
UNIT-2
CHAPTER-2: Intensity Transformations & Spatial Filtering, Filtering in the Frequency Domain
Spatial domain techniques operate directly on the pixels of an image. A spatial domain process is denoted by the expression g(x, y) = T[f(x, y)], where f(x, y) is the input image, g(x, y) is the output image, and T is an operator on f defined over a neighborhood of the point (x, y).
Fig: A 3×3 neighborhood about a point (x, y) in an image in the spatial domain.
The point (x, y) is an arbitrary location in the image, and the small region shown containing the point is a neighborhood of (x, y). The neighborhood is rectangular, centered on (x, y), and much smaller in size than the image.
The process consists of moving the origin of the neighborhood from pixel to pixel and applying the operator T to the pixels in the neighborhood to yield the output at that location. Thus, for any specific location (x, y), the value of the output image g at those coordinates is equal to the result of applying T to the neighborhood with origin at (x, y) in f. This procedure is called spatial filtering, in which the neighborhood, along with a predefined operation, is called a spatial filter. The smallest possible neighborhood is of size 1×1. In this case, g depends only on the value of f at the single point (x, y), and T becomes an intensity transformation or gray-level mapping of the form
s = T(r)
where s and r are variables denoting, respectively, the intensity of g and f at any point (x, y). Applying a transformation T(r) of the form shown in Fig. (a) to every pixel of f would produce an image of higher contrast than the original, by darkening the levels below m and brightening the levels above m in the original image. This is known as contrast stretching: the values of r below m are compressed by the transformation function into a narrow range of s, toward black, and the opposite effect takes place for values of r above m. In the limiting case shown in Fig. (b), T(r) produces a two-level (binary) image; a mapping of this form is called a thresholding function. Since the enhancement at any point in the image depends only on the gray level at that point, techniques in this category are referred to as point processing.
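A minimal point-processing sketch in Python, applying a thresholding function s = T(r) to every pixel (the threshold value m and the random test image are assumed for illustration):

import numpy as np

def threshold(image, m, L=256):
    """Two-level (binary) mapping: r <= m -> 0 (black), r > m -> L-1 (white)."""
    return np.where(image > m, L - 1, 0).astype(np.uint8)

img = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)  # assumed test image
print(threshold(img, m=128))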
Fig: Some basic Intensity transformation functions used for image enhancement.
Image Negatives
The negative of an image with gray levels in the range [0, L-1] is obtained by using the negative transformation, which is given by the expression
s = L - 1 - r
Reversing the intensity levels of an image in this manner produces the equivalent of a
photographic negative. This type of processing is particularly suited for enhancing white or
gray detail embedded in dark regions of an image, especially when the black areas are
dominant in size.
Fig: (a) Original digital mammogram. (b) Negative image obtained using the negative
transformation
Log Transformations
The general form of the log transformation is
s = c log(1 + r)
where c is a constant and it is assumed that r ≥ 0. The shape of the log curve shows that this transformation maps a narrow range of low gray-level values in the input image into a wider range of output levels; the opposite is true of higher values of input levels. We would use a transformation of this type to expand the values of dark pixels in an image while compressing the higher-level values. The inverse log transformation has the opposite effect. The log transformation has the important characteristic that it compresses the dynamic range of images with large variations in pixel values, and it is commonly employed to display Fourier spectra.
Fig: (a) Fourier spectrum. (b) Result of applying the log transformation
Power –Law Transformations
Power-law transformations have the basic form s = c·r^γ, where c and γ are positive constants. Plots of s versus r for various values of γ are shown in the following figure.
Fig: Plots of the equation s = c·r^γ for various values of γ (c = 1 in all cases).
The curves generated with values of γ > 1 have exactly the opposite effect of those generated with values of γ < 1. The transformation reduces to the identity transformation when c = γ = 1. Power-law transformations are used in a variety of devices for image capture, printing, and display. The exponent in the power-law equation is referred to as gamma, and the process used to correct this power-law response phenomenon is called gamma correction.
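The three basic gray-level transformations above (negative, log, and power-law) can be sketched in Python as follows; this is an illustrative sketch only, and the scaling constant used in the log transform and the gamma value are assumptions, not values from the text:

import numpy as np

def negative(img, L=256):
    """s = L - 1 - r"""
    return (L - 1 - img.astype(np.int32)).astype(np.uint8)

def log_transform(img, c=None, L=256):
    """s = c * log(1 + r), with c chosen here so the output also spans [0, L-1]."""
    r = img.astype(np.float64)
    if c is None:
        c = (L - 1) / np.log(L)      # assumed scaling choice
    return np.clip(c * np.log1p(r), 0, L - 1).astype(np.uint8)

def gamma_transform(img, gamma, c=1.0, L=256):
    """s = c * r**gamma, computed on intensities normalized to [0, 1]."""
    r = img.astype(np.float64) / (L - 1)
    return np.clip(c * r**gamma * (L - 1), 0, L - 1).astype(np.uint8)

img = np.arange(0, 256, dtype=np.uint8).reshape(16, 16)   # assumed ramp image
print(negative(img)[0, :4], log_transform(img)[0, :4], gamma_transform(img, 0.4)[0, :4])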
Piecewise-Linear Transformation Functions
The principal advantage of piecewise-linear functions is that they can be arbitrarily complex. However, their specification requires considerably more user input. These transformations are of three types.
Contrast Stretching:
Contrast stretching is a process that expands the range of intensity values in an image in order to utilize the full dynamic range of intensity values. It is one of the simplest piecewise-linear functions. Low-contrast images can result from poor illumination. The following figure shows a typical transformation used for contrast stretching.
Intensity-Level Slicing:
The process of highlighting a specific range of intensities in an image is known as intensity-level slicing. There are two basic approaches that can be adopted for intensity-level slicing.
One approach is to display a high value for all gray levels in the range of interest and a low value for all other intensities. This transformation produces a binary image.
The second approach brightens the desired range of gray levels but preserves the background and the gray-level tonalities elsewhere in the image.
Fig: (a) The transformation highlights range [A, B] of gray levels and reduces all others to a constant
level. (b) The transformation highlights range [A, B] but preserves all other levels.
Bit-Plane Slicing:
Instead of highlighting gray-level ranges, highlighting the contribution made to total
image appearance by specific bits might be desired. Suppose that each pixel in an image is
represented by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from
bit-plane 0 for the least significant bit to bit plane 7 for the most significant bit. In terms of 8-
bit bytes, plane 0 contains all the lowest order bits in the bytes comprising the pixels in the
image and plane 7 contains all the high-order bits. The following figure shows the various bit
planes for an image.
The higher-order bits contain the majority of the visually significant data. The other
bit planes contribute to more subtle details in the image. Separating a digital image into its bit
planes is useful for analyzing the relative importance played by each bit of the image, a
process that aids in determining the adequacy of the number of bits used to quantize each
pixel.
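A short Python sketch of bit-plane slicing for an 8-bit image (illustrative only; the random test image is assumed):

import numpy as np

def bit_planes(img):
    """Return the eight 1-bit planes of an 8-bit image, plane 0 = LSB ... plane 7 = MSB."""
    return [(img >> k) & 1 for k in range(8)]

img = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)  # assumed test image
planes = bit_planes(img)
# Reconstruct the image from its planes to verify the decomposition.
recon = sum((planes[k].astype(np.uint16) << k) for k in range(8)).astype(np.uint8)
print(np.array_equal(recon, img))   # True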
Histogram Processing
The normalized histogram of a digital image with gray levels rk in the range [0, L-1] is given by
p(rk) = nk / MN,  for k = 0, 1, 2, ..., L-1
where nk is the number of pixels having gray level rk and MN is the total number of pixels in the image.
Histograms are the basis for numerous spatial domain processing techniques.
Histogram manipulation can be used effectively for image enhancement and also is quite
useful in other image processing applications, such as image compression and segmentation.
Histograms are simple to calculate in software and also lend themselves to economic
hardware implementations, thus making them a popular tool for real-time image processing.
One purpose of the histogram is to indicate the category into which an image falls. Generally, images are classified as follows:
Dark images: The components of the histogram are concentrated on the low side of
the intensity scale.
Bright images: The components of the histogram are biased towards the high side of
the intensity scale.
Low contrast: An image with low contrast has a histogram that will be narrow and
will be centered toward the middle of the gray scale.
High contrast: The components of histogram in the high-contrast image cover a
broad range of the gray scale.
Fig: Dark, light, low contrast, high contrast images, and their corresponding histograms.
Histogram Equalization
Let the variable r represent the gray levels of the image to be enhanced. We assume that r has been normalized to the interval [0, L-1], with r = 0 representing black and r = L-1 representing white. We then consider transformations of the form
s = T(r), 0 ≤ r ≤ L-1
which produce an output intensity level s for every pixel in the input image having intensity r. Assume that the transformation function T(r) satisfies the following conditions:
(a) T(r) is a single-valued and monotonically increasing function in the interval 0 ≤ r ≤ L-1
(b) 0 ≤ T(r) ≤ L-1 for 0 ≤ r ≤ L-1
It can be shown that the resulting ps(s) is a uniform probability density function, independent of the form of pr(r). For discrete values we deal with probabilities and summations instead of probability density functions and integrals. The probability of occurrence of gray level rk in an image is approximated by
pr(rk) = nk / MN,  k = 0, 1, 2, ..., L-1
where MN is the total number of pixels in the image, nk is the number of pixels that have gray level rk, and L is the number of possible intensity levels in the image. The discrete version of the transformation function is
sk = T(rk) = (L-1) Σ_{j=0}^{k} pr(rj),  k = 0, 1, 2, ..., L-1
Thus, a processed (output) image is obtained by mapping each pixel with level rk in the input image into a corresponding pixel with level sk in the output image. This transformation is called histogram equalization or histogram linearization.
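A compact Python sketch of the discrete histogram-equalization mapping sk = (L-1) Σ pr(rj) (illustrative; the random low-contrast test image is assumed):

import numpy as np

def histogram_equalize(img, L=256):
    """Map each gray level r_k to s_k = (L-1) * cumulative sum of p_r(r_j)."""
    hist = np.bincount(img.ravel(), minlength=L)              # n_k
    p = hist / img.size                                        # p_r(r_k) = n_k / MN
    s = np.round((L - 1) * np.cumsum(p)).astype(np.uint8)      # transformation table T(r_k)
    return s[img]                                              # map every pixel through T

img = np.random.randint(0, 64, size=(64, 64), dtype=np.uint8)  # assumed low-contrast image
eq = histogram_equalize(img)
print(img.min(), img.max(), "->", eq.min(), eq.max())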
Histogram Matching
Histogram equalization automatically determines a transformation function that seeks
to produce an output image that has a uniform histogram. This is a good approach because
the results from this technique are predictable and the method is simple to implement.
However, in some applications, enhancement based on a uniform histogram is not the best approach. In particular, it is sometimes useful to be able to specify the shape of the histogram
that we wish the processed image to have. The method used to generate a processed image
that has a specified histogram is called histogram matching or histogram specification.
Let us consider continuous gray levels r and z (treated as continuous random variables), and let pr(r) and pz(z) denote their corresponding continuous probability density functions, where r and z denote the gray levels of the input and output (processed) images, respectively. We can estimate pr(r) from the given input image, while pz(z) is the specified probability density function that we wish the output image to have. Let s be a random variable with the property
s = T(r) = (L-1) ∫₀ʳ pr(w) dw
Next, define a random variable z with the property
G(z) = (L-1) ∫₀ᶻ pz(t) dt = s
Obtain the inverse transformation function G⁻¹, so that z = G⁻¹(s) = G⁻¹[T(r)].
Obtain the output image by applying z = G⁻¹[T(r)] to all the pixels in the input image. The result of this procedure will be an image whose gray levels, z, have the specified probability density function pz(z).
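A Python sketch of discrete histogram specification built from the two cumulative distributions T(r) and G(z) (illustrative; the target distribution chosen here is an assumption, and G⁻¹ is approximated by a nearest-value search):

import numpy as np

def histogram_match(img, target_pdf, L=256):
    """Map input levels r -> z so the output histogram approximates target_pdf."""
    src_hist = np.bincount(img.ravel(), minlength=L) / img.size
    T = np.cumsum(src_hist)              # s = T(r), CDF of the input image
    G = np.cumsum(target_pdf)            # G(z), CDF of the specified histogram
    # For each s = T(r), find z minimizing |G(z) - s|  (a discrete stand-in for G^{-1}).
    mapping = np.array([np.argmin(np.abs(G - s)) for s in T], dtype=np.uint8)
    return mapping[img]

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)   # assumed input image
target = np.linspace(1, 2, 256); target /= target.sum()           # assumed target p_z(z)
out = histogram_match(img, target)
print(out.dtype, out.shape)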
Local Histogram Processing
The histogram processing methods are global, in the sense that pixels are modified by
a transformation function based on the intensity distribution of an entire image. Although this
global approach is suitable for overall enhancement, there are cases in which it is necessary to
enhance details over small areas in an image. The number of pixels in these areas may have
negligible influence on the computation of a global transformation whose shape does not
necessarily guarantee the desired local enhancement. The solution is to devise transformation
functions based on the intensity distribution or other properties in the neighborhood of every
pixel in the image.
The histogram processing techniques previously described are easily adaptable to
local enhancement. The procedure is to define a square or rectangular neighborhood and
move the center of this area from pixel to pixel. At each location, the histogram of the points
in the neighborhood is computed and either a histogram equalization or histogram
specification transformation function is obtained. This function is finally used to map the
intensity of the pixel centered in the neighborhood. The center of the neighborhood region is
then moved to an adjacent pixel location and the procedure is repeated. Since only one new
row or column of the neighborhood changes during a pixel-to-pixel translation of the region,
updating the histogram obtained in the previous location with the new data introduced at each
motion step is possible. This approach has obvious advantages over repeatedly computing the
histogram over all pixels in the neighborhood region each time the region is moved one pixel
location. Another approach sometimes used to reduce computation is to utilize non-overlapping regions, but this method usually produces an undesirable checkerboard effect.
2.4 Enhancement Using Arithmetic/Logic Operations
Arithmetic/logic operations involving images are performed on a pixel-by-pixel basis
between two or more images. For example, subtraction of two images results in a new image whose pixel at coordinates (x, y) is the difference between the pixels at that same
location in the two images being subtracted. Depending on the hardware and/or software
being used, the actual mechanics of implementing arithmetic/logic operations can be done
sequentially, one pixel at a time, or in parallel, where all operations are performed
simultaneously.
Consider a noisy image g(x, y) formed by adding noise η(x, y) to an original image f(x, y), and let 𝑔̄(x, y) denote the image formed by averaging K different noisy images of the same scene:
𝑔̄(x, y) = (1/K) Σ_{i=1}^{K} gi(x, y)
As K increases, the variability (noise) of the pixel values at each location (x, y) decreases. Because E{𝑔̄(x, y)} = f(x, y), this means that 𝑔̄(x, y) approaches f(x, y) as the number of noisy images used in the averaging process increases. The images gi(x, y) must be registered (aligned) in order to avoid the introduction of blurring and other artifacts in the output image.
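A brief Python sketch of noise reduction by averaging K registered noisy images (illustrative; the constant test image, noise level, and K are assumptions):

import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)                  # assumed noise-free image f(x, y)
K = 50                                         # number of noisy observations
noisy = [f + rng.normal(0, 20, f.shape) for _ in range(K)]   # g_i = f + eta_i

g_bar = np.mean(noisy, axis=0)                # averaged image

print("single-image noise std :", np.std(noisy[0] - f))
print("averaged-image noise std:", np.std(g_bar - f))        # roughly std / sqrt(K)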
Fig: The mechanics of linear spatial filtering using a 3×3 filter mask.
For the 3×3 mask shown in the figure, the result (or response) g(x, y) of linear filtering with the filter mask at a point (x, y) in the image is the sum of products of the mask coefficients with the corresponding pixels directly under the mask:
g(x, y) = w(-1,-1) f(x-1, y-1) + w(-1,0) f(x-1, y) + ... + w(0,0) f(x, y) + ... + w(1,1) f(x+1, y+1)
Observe that the coefficient w(0, 0) coincides with the image value f(x, y), indicating that the mask is centered at (x, y) when the computation of the sum of products takes place. For a mask of size m×n, we assume that m = 2a+1 and n = 2b+1, where a and b are nonnegative integers.
In general, linear filtering of an image f of size M×N with a filter mask of size m×n is given by the expression
g(x, y) = Σ_{s=-a}^{a} Σ_{t=-b}^{b} w(s, t) f(x+s, y+t)
where x and y are varied so that each pixel in w visits every pixel in f.
Spatial Correlation and Convolution
Correlation is the process of moving a filter mask over the image and computing the sum of products at each location. The mechanics of convolution are the same, except that the filter is first rotated by 180°. The difference between correlation and convolution can be illustrated with a 1-D example, as in the sketch below.
The correlation of a filter w(x, y) of size m×n with an image f(x, y) is given by
w(x, y) ☆ f(x, y) = Σ_{s=-a}^{a} Σ_{t=-b}^{b} w(s, t) f(x+s, y+t)
The convolution of a filter w(x, y) of size m×n with an image f(x, y) is given by
w(x, y) ∗ f(x, y) = Σ_{s=-a}^{a} Σ_{t=-b}^{b} w(s, t) f(x−s, y−t)
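The difference between correlation and convolution can be seen in the following 1-D Python sketch (the filter and the impulse signal are assumed for illustration; convolution simply flips the filter before the sliding sum of products):

import numpy as np

def correlate1d(f, w):
    """Slide w over f and compute the sum of products at each position (zero padding)."""
    a = len(w) // 2
    fp = np.pad(f, a)                          # zero-pad so the output has the same length
    return np.array([np.sum(w * fp[i:i + len(w)]) for i in range(len(f))])

def convolve1d(f, w):
    """Convolution = correlation with the filter rotated by 180 degrees."""
    return correlate1d(f, w[::-1])

f = np.array([0, 0, 0, 1, 0, 0, 0, 0], dtype=float)   # discrete unit impulse (assumed signal)
w = np.array([1, 2, 3, 2, 8], dtype=float)            # assumed asymmetric filter

print(correlate1d(f, w))   # correlation yields a rotated (flipped) copy of w
print(convolve1d(f, w))    # convolution reproduces w at the impulse location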
For a 3×3 smoothing mask with all coefficients equal to 1/9, the response is the average of the gray levels of the pixels in the 3×3 neighborhood defined by the mask. An m×n mask would have a normalizing constant equal to 1/mn. A spatial averaging filter in which all coefficients are equal is called a box filter.
The second mask is called weighted average which is used to indicate that pixels are
multiplied by different coefficients. In this mask the pixel at the center of the mask is
multiplied by a higher value than any other, thus giving this pixel more importance in the
calculation of the average. The other pixels are inversely weighted as a function of their
distance from the center of the mask. The diagonal terms are further away from the center
than the orthogonal neighbors and, thus, are weighed less than these immediate neighbors of
the center pixel.
The general implementation for filtering an M×N image with a weighted averaging filter of size m×n (m and n odd) is given by the expression
g(x, y) = [ Σ_{s=-a}^{a} Σ_{t=-b}^{b} w(s, t) f(x+s, y+t) ] / [ Σ_{s=-a}^{a} Σ_{t=-b}^{b} w(s, t) ]
The complete filtered image is obtained by applying the above equation for x=0, 1, 2,
………. M-1 and y=0, 1, 2,………., N-1. The denominator is simply the sum of the mask
coefficients and, therefore, it is a constant that needs to be computed only once. This scale
factor is applied to all the pixels of the output image after the filtering process is completed.
Order-Statistics Filters
Order-statistics filters are nonlinear spatial filters whose response is based on ordering
(ranking) the pixels contained in the image area encompassed by the filter, and then replacing
the value of the center pixel with the value determined by the ranking result. The best-known
example in this category is the “Median filter”. It replaces the value of a pixel by the median
of the gray levels in the neighborhood of that pixel. Median filters are quite popular because they provide excellent noise-reduction capabilities. They are especially effective in the presence of
impulse noise, also called salt-and-pepper noise because of its appearance as white and black
dots superimposed on an image.
In order to perform median filtering at a point in an image, we first sort the values of
the pixel in question and its neighbors, determine their median, and assign this value to that
pixel. For example, in a 3×3 neighborhood the median is the 5th largest value, in a 5×5
neighborhood the 13th largest value, and so on. When several values in a neighborhood are
the same, all equal values are grouped. For example, suppose that a 3×3 neighborhood has
values (10, 20, 20, 20, 15, 20, 20, 25, 100). These values are sorted as (10, 15, 20, 20, 20, 20,
20, 25, 100), which results in a median of 20. Thus, the principal function of median filters is
to force points with distinct gray levels to be more like their neighbors.
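A small Python sketch of 3×3 median filtering for removing isolated salt-and-pepper pixels (illustrative; the test image and noise positions are assumed):

import numpy as np

def median_filter(img, size=3):
    """Replace each pixel by the median of the size x size neighborhood (edge-replicated)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

img = np.full((8, 8), 100, dtype=np.uint8)
img[2, 3], img[5, 5] = 255, 0          # assumed salt and pepper pixels
print(median_filter(img)[2, 3], median_filter(img)[5, 5])   # both restored to 100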
The median represents the 50th percentile of a ranked set of numbers, but the ranking lends itself to many other possibilities. For example, using the 100th percentile results in the so-called max filter, which is useful for finding the brightest points in an image. The 0th percentile filter is the min filter, used for the opposite purpose.
2.7 Sharpening Spatial Filters
The principal objective of sharpening is to highlight fine detail in an image or to
enhance detail that has been blurred, either in error or as a natural effect of a particular
method of image acquisition. It includes applications ranging from electronic printing and
medical imaging to industrial inspection and autonomous guidance in military systems.
As smoothing can be achieved by integration (averaging), sharpening can be achieved by spatial differentiation. The strength of the response of a derivative operator is proportional to the degree of discontinuity of the image at the point at which the operator is applied. Thus, image differentiation enhances edges and other discontinuities and deemphasizes areas with slowly varying gray levels.
The derivatives of a digital function are defined in terms of differences. There are
various ways to define these differences. A basic definition of the first-order derivative of a
one-dimensional image f(x) is the difference
∂f/∂x = f(x + 1) − f(x)
The first order derivative must satisfy the properties such as,
Must be zero in the areas of constant gray-level values.
Must be nonzero at the onset of a gray-level step or ramp.
Must be nonzero along ramps.
Similarly, a second-order derivative of the one-dimensional image f(x) is defined as the difference
∂²f/∂x² = f(x + 1) + f(x − 1) − 2f(x)
The second order derivative must satisfy the properties such as,
Must be zero in the areas of constant gray-level values.
Must be nonzero at the onset and end of a gray-level step or ramp.
Must be zero along ramps of constant slope.
The second order derivatives in image processing are implemented by using the
Laplacian operator. The Laplacian for an image f(x, y), is defined as
∇²f = ∂²f/∂x² + ∂²f/∂y²
In the x-direction, the discrete second derivative is
∂²f/∂x² = f(x + 1, y) + f(x − 1, y) − 2f(x, y)
and similarly, in the y-direction,
∂²f/∂y² = f(x, y + 1) + f(x, y − 1) − 2f(x, y)
Then
∇2 𝑓 = 𝑓 𝑥 + 1, 𝑦 + 𝑓 𝑥 − 1, 𝑦 + 𝑓 𝑥, 𝑦 + 1 + 𝑓 𝑥, 𝑦 − 1 − 4𝑓(𝑥, 𝑦)
This equation can be implemented using the mask shown in the following, which
gives an isotropic result for rotations in increments of 90°.The diagonal directions can be
incorporated in the definition of the digital Laplacian by adding two more terms to the above
equation, one for each of the two diagonal directions. The form of each new term is the same
but the coordinates are along the diagonals. Since each diagonal term also contains a –2f(x, y)
term, the total subtracted from the difference terms now would be –8f(x, y). The mask used to
implement this new definition is shown in the figure. This mask yields isotropic results for
increments of 45°. The other two masks are also used frequently in practice.
Fig.(a) Filter mask used to implement the digital Laplacian (b) Mask used to implement an
extension of this equation that includes the diagonal neighbors. (c)&(d) Two other
Implementations of the Laplacian.
The Laplacian with a negative sign gives equivalent results. Because the Laplacian is a derivative operator, it highlights gray-level discontinuities in an image and deemphasizes regions with slowly varying gray levels. This tends to produce images that have grayish edge lines and other discontinuities, all superimposed on a dark, featureless background. Background features can be "recovered" while still preserving the sharpening effect of the Laplacian operation simply by adding the Laplacian image to the original. If the definition used has a negative center coefficient, then we subtract, rather than add, the Laplacian image to obtain a sharpened result. Thus, the basic way in which we use the Laplacian for image enhancement is
g(x, y) = f(x, y) − ∇²f(x, y)  (if the Laplacian mask has a negative center coefficient)
g(x, y) = f(x, y) + ∇²f(x, y)  (if the mask has a positive center coefficient)
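A Python sketch of Laplacian sharpening with the center-negative 3×3 mask, so that g = f − ∇²f (illustrative; the step-edge test image and the zero-padding choice are assumptions):

import numpy as np

LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)   # isotropic for 90-degree rotations

def filter2d(img, kernel):
    """Plain correlation of img with a 3x3 kernel, zero-padded borders."""
    p = np.pad(img.astype(float), 1)
    out = np.zeros_like(img, dtype=float)
    for s in range(3):
        for t in range(3):
            out += kernel[s, t] * p[s:s + img.shape[0], t:t + img.shape[1]]
    return out

img = np.zeros((9, 9)); img[:, 4:] = 100.0    # assumed step edge
lap = filter2d(img, LAPLACIAN)
sharpened = np.clip(img - lap, 0, 255)        # negative-center mask -> subtract the Laplacian
print(sharpened[4])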
A process that has been used for many years in the publishing industry to sharpen images consists of subtracting a blurred version of an image from the original image. This process, called unsharp masking, consists of the following steps:
Blur the original image.
Subtract the blurred image from the original (the difference is called the mask).
Add the mask to the original.
Let f̄(x, y) denote the blurred image. The unsharp mask is expressed as
g_mask(x, y) = f(x, y) − f̄(x, y)
Then we add a weighted portion of the mask back to the original image:
g(x, y) = f(x, y) + k · g_mask(x, y), where k is a weighting coefficient.
When k = 1, the process is standard unsharp masking.
When k > 1, the process is referred to as high-boost filtering.
When k < 1, the contribution of the unsharp mask is de-emphasized.
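A Python sketch of unsharp masking and high-boost filtering, using a simple 3×3 box blur as the smoothing step (the choice of blur and the k values are assumptions for illustration):

import numpy as np

def box_blur(img):
    """3x3 averaging with edge replication: a simple stand-in for 'blur the original image'."""
    p = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for s in range(3):
        for t in range(3):
            out += p[s:s + img.shape[0], t:t + img.shape[1]]
    return out / 9.0

def unsharp(img, k=1.0):
    """g = f + k * (f - f_blurred);  k = 1 -> unsharp masking, k > 1 -> high-boost."""
    f = img.astype(float)
    mask = f - box_blur(f)
    return np.clip(f + k * mask, 0, 255)

img = np.zeros((9, 9)); img[:, 4:] = 100.0   # assumed step edge
print(unsharp(img, k=1.0)[4])
print(unsharp(img, k=2.0)[4])                # stronger edge enhancement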
The first derivatives in image processing are implemented using the magnitude of the gradient. The gradient of f at coordinates (x, y) is defined as the two-dimensional column vector
∇f = [gx, gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ
and its magnitude is M(x, y) = √(gx² + gy²).
The components of the gradient vector itself are linear operators, but the magnitude of this vector obviously is not, because of the squaring and square-root operations. The computational burden of implementing the magnitude over an entire image is not trivial, and it is common practice to approximate it by using absolute values instead of squares and square roots:
M(x, y) ≡ | gx | + | gy|
This equation is simpler to compute and it still preserves relative changes in gray
levels, but the isotropic feature property is lost in general. However, as in the case of the
Laplacian, the isotropic properties of the digital gradient are preserved only for a limited
number of rotational increments that depend on the masks used to approximate the
derivatives. As it turns out, the most popular masks used to approximate the gradient give the
same result only for vertical and horizontal edges and thus the isotropic properties of the
gradient are preserved only for multiples of 90°.
Let the intensities of the image points in a 3×3 region be denoted as shown in figure (a). For example, the center point z5 denotes f(x, y), z1 denotes f(x-1, y-1), and so on. The simplest approximations to a first-order derivative that satisfy the conditions stated above are gx = (z8 − z5) and
gy = (z6 − z5). Two other definitions, proposed by Roberts [1965] in the early development of digital image processing, use cross differences:
gx = z9 − z5  and  gy = z8 − z6
These equations can be implemented with the two masks shown in figures (b) and (c); these masks are referred to as the Roberts cross-gradient operators. Masks of even size are awkward to implement, and the smallest filter mask in which we are interested is of size 3×3. An approximation using absolute values, still at point z5 but using a 3×3 mask, is
gx = (z7 + 2z8 + z9) − (z1 + 2z2 + z3)
gy = (z3 + 2z6 + z9) − (z1 + 2z4 + z7)
These expressions can be implemented using the masks shown in figures (d) and (e). The difference between the third and first rows of the 3×3 image region approximates the derivative in the x-direction, and the difference between the third and first columns approximates the derivative in the y-direction. The masks shown in figures (d) and (e) are referred to as the Sobel operators. The magnitude of the gradient obtained with these masks is approximated as
M(x, y) ≈ |gx| + |gy|
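A Python sketch of the Sobel gradient magnitude using the |gx| + |gy| approximation (illustrative; the vertical step-edge test image is assumed):

import numpy as np

SOBEL_X = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)   # third row minus first row
SOBEL_Y = SOBEL_X.T                                # third column minus first column

def correlate2d(img, k):
    p = np.pad(img.astype(float), 1)
    out = np.zeros_like(img, dtype=float)
    for s in range(3):
        for t in range(3):
            out += k[s, t] * p[s:s + img.shape[0], t:t + img.shape[1]]
    return out

img = np.zeros((8, 8)); img[:, 4:] = 100.0        # assumed vertical step edge
gx = correlate2d(img, SOBEL_X)
gy = correlate2d(img, SOBEL_Y)
grad = np.abs(gx) + np.abs(gy)                    # M(x, y) ~ |gx| + |gy|
print(grad[4])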
The sifting property of an impulse located at an arbitrary point t0, denoted by δ(t − t0), is defined as
∫_{−∞}^{∞} f(t) δ(t − t0) dt = f(t0)
Let x represent a discrete variable; the unit discrete impulse δ(x) is defined as δ(x) = 1 for x = 0 and δ(x) = 0 otherwise.
Convolution:
The convolution of two continuous functions, f(t) and h(t), of one continuous variable t is defined as
f(t) ∗ h(t) = ∫_{−∞}^{∞} f(τ) h(t − τ) dτ
Sampling a continuous function f(t) can be modeled as multiplying it by a train of impulses spaced ΔT apart:
f̃(t) = Σ_{n=−∞}^{∞} f(t) δ(t − nΔT)
where f̃(t) denotes the sampled function, and each component of this summation is an impulse weighted by the value of f(t) at the location of the impulse. The value of each sample is then given by the "strength" of the weighted impulse, which we obtain by integration. That is, the value fk of an arbitrary sample in the sequence is given by
fk = ∫_{−∞}^{∞} f(t) δ(t − kΔT) dt = f(kΔT)
Fig: (a) A continuous function. (b) Train of impulses used to model the sampling process. (c) Sampled function formed as the product of (a) and (b). (d) Sample values obtained by integration and using the sifting property of the impulse.
The Fourier transform of the sampled function can be written as
F̃(µ) = (1/ΔT) Σ_{n=−∞}^{∞} F(µ − n/ΔT)
where F(µ) is the Fourier transform of the original continuous function. The summation shows that the Fourier transform of the sampled function is an infinite, periodic sequence of copies of the transform of the original, continuous function, with the separation between copies determined by the value of 1/ΔT.
The quantity 1/ΔT is the sampling rate used to generate the sampled function. If the sampling rate is high enough to provide sufficient separation between the periods, and thus preserve the integrity of F(µ), the function is said to be over-sampled. If the sampling rate is just enough to preserve F(µ), the function is critically sampled. If the sampling rate is below the minimum required to maintain distinct copies of F(µ), and thus fails to preserve the original transform, the function is under-sampled.
Fig. (a) Fourier transform of a band-limited function. (b)-(d) Transforms of the corresponding
sampled function under the conditions of over-sampling, critically sampling, and under-
sampling, respectively.
Sampling Theorem:
A function f(t) whose Fourier transform is zero for frequencies outside a finite interval [−µmax, µmax] about the origin is called a band-limited function. We can recover f(t) from its sampled version if we can isolate a single copy of F(µ) from the periodic sequence of copies of this function contained in F̃(µ). F̃(µ) is a continuous, periodic function with period 1/ΔT, so we can recover f(t) from that single period by using the inverse Fourier transform. Extracting from F̃(µ) a single period that is equal to F(µ) is possible if the separation between copies is sufficient, which requires
1/ΔT > 2µmax
This equation indicates that a continuous, band-limited function can be recovered
completely from a set of its samples if the samples are acquired at a rate exceeding twice the
highest frequency content of the function. This result is known as the sampling theorem.
Fig. (a) Transform of a band-limited function. (b) Transform resulting from critically
sampling the same function.
2.9. Extension to Functions of Two Variables
The 2-D Impulse and Its Sifting Property:
The impulse δ(t, z) of two continuous variables, t and z, is defined analogously to the 1-D impulse, and it satisfies the corresponding 2-D sifting property.
Let f(t, z) be a continuous function of two continuous variables, t and z. The two-dimensional, continuous Fourier transform pair is given by the expressions
F(µ, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(t, z) e^(−j2π(µt + vz)) dt dz
and
f(t, z) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(µ, v) e^(j2π(µt + vz)) dµ dv
Two-dimensional sampling is modeled, as in the 1-D case, by a train of impulses, where ΔT and ΔZ are the separations between samples along the t- and z-axes of the continuous function f(t, z). The function f(t, z) is said to be band-limited if its Fourier transform is zero outside a rectangle established by the intervals [−µmax, µmax] and [−vmax, vmax]; that is, F(µ, v) = 0 for |µ| ≥ µmax and |v| ≥ vmax.
The 2-D Fourier transform is in general complex and can be written as F(u, v) = R(u, v) + jI(u, v), where R(u, v) and I(u, v) are the real and imaginary parts of F(u, v). Some properties of the Fourier transform are listed below.
7. Obtain the final processed result, g(x, y), by extracting the M×N region from the top left quadrant of gp(x, y).
2.11. Image Smoothing using Frequency Domain Filters:
Smoothing is achieved in the frequency domain by attenuating a specified range of high-frequency components in the transform of a given image.
Ideal Low pass Filter
An ideal low-pass filter (ILPF) passes without attenuation all frequencies within a circle of radius D0 from the origin and "cuts off" all frequencies outside this circle. Its 2-D transfer function is
H(u, v) = 1 if D(u, v) ≤ D0, and H(u, v) = 0 if D(u, v) > D0
where D(u, v) is the distance between a point (u, v) and the center of the frequency rectangle:
D(u, v) = [(u − P/2)² + (v − Q/2)²]^(1/2)
Fig: 3.3 (a) Perspective plot of an ideal Low pass Filter transfer function. (b) Filter displayed
as an image. (c) Filter Radial cross section.
The point of transition between H(u, v) = 1 and H(u, v) = 0 is called the cutoff frequency. The sharp cutoff of an ILPF cannot be realized with electronic components, and it produces a ringing effect in which a series of lines of decreasing intensity lie parallel to the edges. To avoid this ringing effect, Butterworth low-pass or Gaussian low-pass filters are preferred.
Butterworth Low pass Filter:
The transfer function of a Butterworth low-pass filter (BLPF) of order n, with cutoff frequency at distance D0 from the origin, is
H(u, v) = 1 / [1 + (D(u, v)/D0)^(2n)]
As n increases, the filter becomes sharper, with increased ringing in the spatial domain. For n = 1 it produces no ringing effect, and for n = 2 ringing is present but imperceptible.
Fig: 3.4 (a) Perspective plot of a Butterworth Low pass Filter transfer function. (b) Filter displayed as an image. (c) Filter radial cross sections for orders n = 1 through 4.
Gaussian Low pass Filter:
The transfer function of a 2-D Gaussian low-pass filter (GLPF) is defined as
H(u, v) = e^(−D²(u, v) / 2D0²)
The GLPF transfer function is controlled by the value of the cutoff frequency D0. The advantage of the Gaussian filter is that it never causes ringing.
Fig. (a) Perspective plot of a Gaussian Low pass Filter transfer function. (b) Filter displayed as an image. (c) Filter radial cross sections for various values of D0.
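A Python sketch of frequency-domain Gaussian low-pass filtering, H(u, v) = exp(−D²/(2D0²)), applied with NumPy's FFT (illustrative; the random test image, the value of D0, and the use of fftshift centering instead of the (−1)^(x+y) multiplication and padding are assumptions):

import numpy as np

def gaussian_lowpass(shape, d0):
    """Centered GLPF transfer function H(u, v) = exp(-D^2(u, v) / (2 * d0^2))."""
    M, N = shape
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    V, U = np.meshgrid(v, u)
    D2 = U**2 + V**2
    return np.exp(-D2 / (2.0 * d0**2))

def filter_frequency(img, H):
    """Multiply the centered spectrum by H and transform back."""
    F = np.fft.fftshift(np.fft.fft2(img))      # center the transform (instead of (-1)^(x+y))
    G = H * F
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))

rng = np.random.default_rng(1)
img = rng.random((64, 64))                     # assumed noisy test image
smoothed = filter_frequency(img, gaussian_lowpass(img.shape, d0=10))
print(img.std(), smoothed.std())               # smoothing reduces variation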
2.12. Image Sharpening using Frequency Domain Filters:
Image sharpening in the frequency domain is achieved by attenuating the low-frequency components of the transform without disturbing the high-frequency ones. A high-pass filter is obtained from a given low-pass filter as H_HP(u, v) = 1 − H_LP(u, v).
The IHPF is the opposite of the ILPF in the sense that it sets to zero all frequencies inside a circle of radius D0 while passing, without attenuation, all frequencies outside the circle.
For the Butterworth high-pass filter (BHPF), the order n determines the sharpness of the cutoff and the amount of ringing; the transition toward higher cutoff frequencies is much smoother with the BHPF than with the IHPF.
The results obtained with the Gaussian high-pass filter (GHPF) are more gradual than those of the IHPF and BHPF. Even the filtering of smaller objects and thin bars is cleaner with the Gaussian filter.
Fig. Perspective plot, Filter displayed as an image and Filter Radial cross section.
2.13. The Laplacian in the Frequency Domain
The Laplacian for an image f(x, y) is defined as
∇²f(x, y) = ∂²f/∂x² + ∂²f/∂y²
From the differentiation property of the Fourier transform, ℑ[dⁿf/dxⁿ] = (j2πu)ⁿ F(u), it follows that
ℑ[∇²f(x, y)] = −4π²(u² + v²) F(u, v)
Hence the Laplacian can be implemented in the frequency domain by using the filter
H(u, v) = −4π²(u² + v²)
In all filtering operations, the assumption is that the origin of F(u, v) has been centered by performing the operation f(x, y)(−1)^(x+y) prior to taking the transform of the image. If f (and F) are of size M×N, this operation shifts the center of the transform so that (u, v) = (0, 0) is at point (M/2, N/2) in the frequency rectangle. As before, the center of the filter function also needs to be shifted:
H(u, v) = −4π²[(u − M/2)² + (v − N/2)²]
Conversely, computing the Laplacian in the spatial domain and then computing the Fourier transform of the result is equivalent to multiplying F(u, v) by H(u, v). We can express this dual relationship in Fourier-transform-pair notation as
∇²f(x, y) ⇔ H(u, v) F(u, v)
The enhanced image g(x, y) can then be obtained by subtracting the Laplacian from the original image:
g(x, y) = f(x, y) − ∇²f(x, y)
Homomorphic Filtering
An image f(x, y) can be expressed as the product of its illumination and reflectance components, f(x, y) = i(x, y) r(x, y). Because the Fourier transform of a product is not the product of the transforms, we work with z(x, y) = ln f(x, y) = ln i(x, y) + ln r(x, y), whose transform is Z(u, v) = Fi(u, v) + Fr(u, v), where Fi(u, v) and Fr(u, v) are the Fourier transforms of ln i(x, y) and ln r(x, y), respectively. If we process Z(u, v) by means of a filter function H(u, v), then S(u, v) = H(u, v) Z(u, v) = H(u, v) Fi(u, v) + H(u, v) Fr(u, v), and the filtered result in the spatial domain is s(x, y) = ℑ⁻¹[S(u, v)].
Finally, because z(x, y) was formed by taking the logarithm of the original image f(x, y), the inverse operation (exponentiation) yields the desired enhanced image, g(x, y) = e^(s(x, y)) = i0(x, y) r0(x, y), where i0(x, y) and r0(x, y) are the illumination and reflectance components of the output image. The filtering approach is summarized in the figure.
Band Reject and Band Pass Filters
The Butterworth band-reject filter of order n is defined as
H_BR(u, v) = 1 / [1 + (D(u, v) W / (D²(u, v) − D0²))^(2n)]
where W is the width of the band, D(u, v) is the distance from the center of the filter, D0 is the cutoff frequency, and n is the order of the Butterworth filter. Band-reject filters are very effective in removing periodic noise, and the ringing effect is normally small.
A band-pass filter is obtained from the corresponding band-reject filter as
H_BP(u, v) = 1 − H_BR(u, v)
Fig: Band Reject Filter and its corresponding Band Pass Filter
Notch Filters
A notch filter rejects (or passes) frequencies in a predefined neighborhood about the center of the frequency rectangle. Notch reject filters are constructed as products of high-pass filters whose centers have been translated to the centers of the notches. The general form is
H_NR(u, v) = Π_{k=1}^{Q} Hk(u, v) H−k(u, v)
where Hk(u, v) and H−k(u, v) are high-pass filters whose centers are at (uk, vk) and (−uk, −vk), respectively. These centers are specified with respect to the center of the frequency rectangle, (M/2, N/2), and the distance computations for each filter are carried out relative to those shifted centers.
A notch pass filter (NP) is obtained from a notch reject filter (NR) using
H_NP(u, v) = 1 − H_NR(u, v)
For example, a Butterworth notch reject filter of order n containing three notch pairs is the product of three such pairs of Butterworth high-pass filters.
PREVIOUS QUESTIONS
1. What is meant by image enhancement? Explain the various approaches used in image
enhancement.
2. a) Explain the Gray level transformation. Give the applications.
b) Compare frequency domain methods and spatial domain methods used in image
enhancement
3. Explain in detail about histogram processing.
4. What is meant by Histogram Equalization? Explain.
5. Explain how Fourier transforms are useful in digital image processing?
6. What is meant by Enhancement by point processing? Explain.
UNIT-3
IMAGE RESTORATION AND RECONSTRUCTION
Introduction
Image restoration is the process of recovering an image that has been degraded, using a priori knowledge of the degradation phenomenon. Restoration techniques are oriented toward modeling the degradation and applying the inverse process to recover the original image. Restoration improves an image in some predefined sense. Image enhancement techniques are largely a subjective process, whereas image restoration techniques are an objective process.
Let f(x, y) be the input image and g(x, y) the degraded image, given some knowledge about the degradation function H and some knowledge about the additive noise term η(x, y). The objective of restoration is to obtain an estimate f̂(x, y) of the original image. If H is a linear, position-invariant process, then the degraded image is given in the spatial domain by
g(x,y)=f(x,y)*h(x,y)+η(x,y)
where h(x, y) is the spatial representation of the degradation function. The degraded image in the frequency domain is represented as
G(u,v)=F(u,v)H(u,v)+N(u,v)
The terms in the capital letters are the Fourier Transform of the corresponding terms
in the spatial domain.
Gaussian Noise
The PDF of Gaussian noise is given by
p(z) = (1 / (√(2π) σ)) e^(−(z − z̄)² / 2σ²)
where z represents intensity, z̄ is the mean, and σ is its standard deviation; the square of the standard deviation, σ², is called the variance of z. Approximately 70% of the values of Gaussian noise will be in the range [(z̄ − σ), (z̄ + σ)], and about 95% will be in the range [(z̄ − 2σ), (z̄ + 2σ)].
Rayleigh Noise
The PDF of Rayleigh Noise is given by
Applications:
It is used for characterizing noise phenomena in range imaging.
It describes the error in measurement instruments.
It describes the noise affecting radar imaging.
It describes the noise that occurs when a signal is passed through a band pass filter.
Exponential Noise
The probability density function of Exponential noise is given by
Applications:
It is used to describe the size of raindrops.
It is used to describe the fluctuations in received power reflected from certain targets.
It finds application in laser imaging.
Uniform Noise
The probability density function of Uniform noise is given by
Impulse (Salt-and-Pepper) Noise
If b > a, gray level b will appear as a light dot in the image and level a will appear as a dark dot. Salt-and-pepper noise is also called bipolar impulse noise, data-drop-out noise or spike noise.
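For experimentation, the noise models above can be sampled directly with NumPy; the means, scales and impulse probabilities below are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (256, 256)

gaussian    = rng.normal(loc=20.0, scale=10.0, size=shape)   # mean z_bar = 20, sigma = 10
rayleigh    = rng.rayleigh(scale=10.0, size=shape)
exponential = rng.exponential(scale=10.0, size=shape)
uniform     = rng.uniform(low=0.0, high=50.0, size=shape)

def add_impulse(img, pa=0.05, pb=0.05, a=0, b=255):
    """Bipolar impulse noise: level a appears as dark dots, level b as light dots."""
    out = img.copy()
    r = rng.random(img.shape)
    out[r < pa] = a
    out[(r >= pa) & (r < pa + pb)] = b
    return out
```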
Periodic Noise
Periodic noise in an image arises from electrical or electromechanical interference during image acquisition. This is the only type of spatially dependent noise, and its parameters are estimated from the Fourier spectrum of the image. Periodic noise tends to
produce frequency spikes that often can be detected even by visual analysis. The mean and
variance are defined as
Mean Filters
Arithmetic Mean Filter:
This operation can be implemented using a convolution mask in which all coefficients have the value 1/mn. A mean filter smoothes local variations in an image, and noise is reduced as a result of blurring.
Geometric Mean Filter:
Here, each restored pixel is given by the product of the pixels in the subimage window, raised to the power 1/mn. A geometric mean filter achieves smoothing comparable to the arithmetic mean filter, but it tends to lose less image detail in the process.
Harmonic Mean Filter:
The harmonic mean filter works well for salt noise but fails for pepper noise. It also does well with other types of noise.
Contra harmonic Mean Filter:
The contra harmonic mean filter yields a restored image based on the expression
Where Q is called the order of the filter and this filter is well suited for reducing the
effects of salt and pepper noise. For positive values of Q the filter eliminates pepper noise.
For negative values of Q it eliminates salt noise. It cannot do both simultaneously. The contraharmonic filter reduces to the arithmetic mean filter if Q = 0 and to the harmonic mean filter if Q = −1.
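The mean filters above can be sketched directly from their definitions. In the fragment below, the 3 x 3 window, the reflective padding and the small epsilon guarding the logarithm and division are implementation assumptions.

```python
import numpy as np

def _windows(g, m, n):
    """Yield (x, y, window) for every m x n neighbourhood of a padded image."""
    pad = np.pad(g.astype(np.float64), ((m // 2,), (n // 2,)), mode='reflect')
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            yield x, y, pad[x:x + m, y:y + n]

def arithmetic_mean(g, m=3, n=3):
    out = np.empty(g.shape, dtype=np.float64)
    for x, y, w in _windows(g, m, n):
        out[x, y] = w.mean()
    return out

def geometric_mean(g, m=3, n=3):
    out = np.empty(g.shape, dtype=np.float64)
    for x, y, w in _windows(g, m, n):
        out[x, y] = np.exp(np.log(w + 1e-12).mean())   # product raised to 1/mn
    return out

def contraharmonic_mean(g, Q=1.5, m=3, n=3):
    # Q = 0 gives the arithmetic mean, Q = -1 the harmonic mean
    out = np.empty(g.shape, dtype=np.float64)
    for x, y, w in _windows(g, m, n):
        w = w + 1e-12
        out[x, y] = (w ** (Q + 1)).sum() / (w ** Q).sum()
    return out
```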
Order-Statistic Filters
Order-statistic filters are spatial filters whose response is based on ordering the pixels contained in the image area encompassed by the filter. The response of the filter at any point
is determined by the ranking result.
Median Filter:
It is the best known order statistic filter. It replaces the value of a pixel by the median
of gray levels in the Neighborhood of the pixel.
The value of the pixel at (x, y) is included in the computation of the median. Median
filters are quite popular because for certain types of random noise, they provide excellent
noise reduction capabilities with considerably less blurring than smoothing filters of similar
size. These are effective for bipolar and unipolar impulse noise.
Max and Min Filters:
The 100th percentile filter is the max filter. This filter is used for finding the brightest points in an image. Because pepper noise has very low values, it is reduced by the max filter through the max selection process in the subimage area S_xy.
The 0th percentile filter is the min filter. This filter is useful for finding the darkest points in an image; it also reduces salt noise as a result of the min operation.
Midpoint Filter:
The midpoint filter simply computes the midpoint between the maximum and
minimum values in the area encompassed by the filter.
It combines order statistics and averaging. This filter works best for randomly distributed noise like Gaussian or uniform noise.
Alpha-Trimmed Mean Filter:
The value of d can range from 0 to mn − 1. If d = 0 this filter reduces to the arithmetic mean filter. If d = mn − 1, the filter becomes a median filter. For other values of d, the alpha-trimmed mean filter is useful for images corrupted by multiple types of noise, such as a combination of salt-and-pepper and Gaussian noise.
Adaptive Local Noise Reduction Filter:
f̂(x, y) = g(x, y) − (σ_η² / σ_L²) [ g(x, y) − m_L ]
The only quantity that needs to be known or estimated is the variance of the overall noise, σ_η². The other parameters, the local mean m_L and the local variance σ_L², are computed from the pixels in S_xy.
Adaptive median filter
Adaptive median filters are used to preserve detail while smoothing non-impulse noise. The filter changes the size of S_xy during the filtering operation, depending on certain conditions. The output of the filter is a single value used to replace the value of the pixel at (x, y). Let us consider the following parameters:
z_min = minimum gray level in S_xy, z_max = maximum gray level in S_xy, z_med = median of the gray levels in S_xy, z_xy = gray level at coordinates (x, y), and S_max = maximum allowed size of S_xy.
The adaptive median filtering algorithm works in two stages, denoted as stage A and
stage B as follows:
Stage A: A1 = zmed - zmin
A2 = zmed - zmax
If A1>0 AND A2<0, go to stage B
Else increase the window size
If window size ≤ Smax repeat stage A
Else output zmed
Stage B: B1 = zxy - zmin
B2 = zxy - zmax
If B1>0 AND B2<0, output zxy.
Else output zmed
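A direct, unoptimized transcription of stages A and B is sketched below; the initial and maximum window sizes (3 and 7) and the reflective padding are illustrative assumptions.

```python
import numpy as np

def adaptive_median(g, S_init=3, S_max=7):
    """Two-stage adaptive median filter (stages A and B above)."""
    pad = S_max // 2
    padded = np.pad(g, pad, mode='reflect')
    out = g.astype(np.float64)
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            S = S_init
            while True:
                k = S // 2
                w = padded[x + pad - k:x + pad + k + 1, y + pad - k:y + pad + k + 1]
                z_min, z_max, z_med = w.min(), w.max(), np.median(w)
                z_xy = g[x, y]
                if z_min < z_med < z_max:                       # stage A passed
                    out[x, y] = z_xy if z_min < z_xy < z_max else z_med   # stage B
                    break
                S += 2                                          # else grow the window
                if S > S_max:
                    out[x, y] = z_med
                    break
    return out
```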
3.5 Periodic Noise Reduction by Frequency Domain Filtering
Periodic noise in an image appears as concentrated bursts of energy in the Fourier
transform at locations corresponding to the frequencies of the periodic interference. This can
be removed by using selective filters.
Band Reject Filter:
The Band Reject Filter transfer function is defined as
Where W is the width of the band, D is the distance D(u, v) from the centre of the filter, D0 is the cutoff frequency and n is the order of the Butterworth filter. The band reject filters are very effective in removing periodic noise, and the ringing effect is normally small.
The perspective plots of these filters are
Fig: Perspective plots of (a) Ideal (b) Butterworth and (c) Gaussian Band Reject Filters
Notch Filters:
A notch filter rejects (or passes) frequencies in a predefined neighborhood about the
centre of the frequency rectangle. It is constructed as products of high pass filters whose
centers have been translated to the centers of the notches. The general form is defined as
Where Hk(u,v ) and H-k(u,v ) are high pass filters whose centers are at (uk, vk) and
(-uk, -vk) respectively. These centers are specified with respect to the center of the frequency
rectangle (M/2, N/2).
Fig: Perspective plots of (a) Ideal (b) Butterworth and (c) Gaussian Notch Reject Filters
A notch pass filter (NP) is obtained from a notch reject filter (NR) using H_NP(u, v) = 1 − H_NR(u, v).
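The Butterworth transfer functions described above can be generated on a frequency grid as follows; D0, W, the notch centres and the small epsilons guarding divisions are illustrative choices.

```python
import numpy as np

def _grid(M, N):
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    V, U = np.meshgrid(v, u)
    return U, V

def butterworth_band_reject(M, N, D0=40.0, W=10.0, n=2):
    """H(u, v) = 1 / (1 + [D W / (D^2 - D0^2)]^(2n))."""
    U, V = _grid(M, N)
    D = np.sqrt(U ** 2 + V ** 2)
    return 1.0 / (1.0 + ((D * W) / (D ** 2 - D0 ** 2 + 1e-6)) ** (2 * n))

def butterworth_notch_reject(M, N, centers, D0=10.0, n=2):
    """Product of high pass filters centred at (u_k, v_k) and (-u_k, -v_k)."""
    U, V = _grid(M, N)
    H = np.ones((M, N))
    for uk, vk in centers:
        Dk  = np.sqrt((U - uk) ** 2 + (V - vk) ** 2) + 1e-6
        Dmk = np.sqrt((U + uk) ** 2 + (V + vk) ** 2) + 1e-6
        H *= 1.0 / (1.0 + (D0 / Dk) ** (2 * n)) / (1.0 + (D0 / Dmk) ** (2 * n))
    return H            # the corresponding notch pass filter is 1 - H
```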
Estimation by Image Observation:
From the characteristics of this estimate, we then deduce the complete degradation function H(u, v) based on the assumption of position invariance.
Estimation by Experimentation:
The degradation can be estimated accurately when equipment similar to that used to acquire the degraded image is available. Images similar to the degraded image can be acquired with various system settings until they are degraded as closely as possible to the image we wish to restore. The impulse response of the degradation is then obtained by imaging an impulse using the same system settings.
An impulse is simulated by a bright dot of light, as bright as possible to reduce the effect of noise. The degradation function can then be expressed as
Estimation by Modeling:
Where k is a constant that depends on the nature of the turbulence. Let f(x, y) be an image that undergoes planar motion, and let x0(t) and y0(t) be the time-varying components of motion in the x and y directions. The total blurred image g(x, y) is expressed as
If the motion variables x0(t) and y0(t) are known, then the degradation function H(u, v) can be obtained as
Then
Inverse Filtering
From the above expression we can observe that even if we know the degradation function, we cannot recover the undegraded image exactly because N(u, v) is not known. If the degradation function has zero or very small values, the ratio N(u, v)/H(u, v) could easily dominate the estimate F̂(u, v). To avoid this problem we limit the filter to frequencies near the origin, because H(u, v) is usually largest at the origin.
3.9 Minimum Mean Square Error Filtering (Wiener Filtering)
This filtering approach incorporates both the degradation function and the statistical characteristics of noise. The image and the noise are treated as random variables, and the objective is to find an estimate f̂ of the uncorrupted image f such that the
mean square error between them is minimized. This error is measured by
Where E { } is the expected value of the argument. It is assumed that noise and image
are uncorrelated; one or the other has zero mean; the intensity levels in the estimate are a
linear function of the levels in the degraded image. Based on these conditions the minimum
of the error function in frequency domain is given by
This result is known as the Wiener filter. The term inside the brackets is commonly referred to as the minimum mean square error filter or the least square error filter. It does not have the same problem as the inverse filter with zeros in the degradation function, unless the entire denominator is zero for the same values of u and v.
H(u, v) = degradation function
H*(u, v) = complex conjugate of H(u, v)
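A minimal frequency-domain sketch of this filter follows, with the noise-to-signal power spectrum ratio approximated by a constant K (a common simplification); h is assumed to be the point spread function sampled on, or zero-padded to, the image grid.

```python
import numpy as np

def wiener_filter(g, h, K=0.01):
    """F_hat = [ H* / (|H|^2 + K) ] G, with K approximating the noise-to-signal ratio."""
    G = np.fft.fft2(g)
    H = np.fft.fft2(h, s=g.shape)
    F_hat = (np.conj(H) / (np.abs(H) ** 2 + K)) * G
    return np.real(np.fft.ifft2(F_hat))
```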
Constrained Least Squares Filtering
The estimate f̂ in this approach is obtained from the expression
Where γ is a parameter that must be adjusted so that the constraint on the noise is satisfied, and P(u, v) is the Fourier transform of the Laplacian operator p(x, y).
Geometric Mean Filter
α and β being positive real constants. Based on the values of α and β, the geometric mean filter performs different actions:
α = 1 => inverse filter
α = 0 => parametric Wiener filter (standard Wiener filter when β = 1)
α = 1/2 => actual geometric mean
α = 1/2 and β = 1 => spectrum equalization filter
PREVIOUS QUESTIONS
1. What is meant by image restoration? Explain the image degradation model
2. Discuss about the noise models
3. Explain the concept of algebraic image restoration
4. Discuss the advantages and disadvantages of the Wiener filter with regard to image
restoration.
5. Explain about noise modeling based on distribution functions
6. Explain about the Wiener filter in noise removal
7. What is geometric mean filter? Explain
8. Explain the following. a) Minimum Mean square error filtering. b) Inverse filtering.
9. Discuss about Constrained Least Square restoration of a digital image in detail.
10. Explain in detail about different types of order statistics filters for Restoration.
11. Name different types of estimating the degradation function for use in image
restoration and explain in detail estimation by modeling.
12. Explain periodic noise reduction by frequency domain filtering
13. Explain the adaptive filter and describe the two stages of the adaptive median filtering
algorithm
UNIT-5
Wavelets and Multi-resolution Processing
Image Compression
Introduction
In recent years, there have been significant advancements in algorithms and
architectures for the processing of image, video, and audio signals. These advancements have
proceeded along several directions. On the algorithmic front, new techniques have led to the
development of robust methods to reduce the size of the image, video, or audio data. Such
methods are extremely vital in many applications that manipulate and store digital data.
Informally, we refer to the process of size reduction as a compression process. We will
define this process in a more formal way later. On the architecture front, it is now feasible to
put sophisticated compression processes on a relatively low-cost single chip; this has spurred
a great deal of activity in developing multimedia systems for the large consumer market.
One of the exciting prospects of such advancements is that multimedia information
comprising image, video, and audio has the potential to become just another data type. This
usually implies that multimedia information will be digitally encoded so that it can be
manipulated, stored, and transmitted along with other digital data types. For such data usage
to be pervasive, it is essential that the data encoding is standard across different platforms
and applications. This will foster widespread development of applications and will also
promote interoperability among systems from different vendors. Furthermore, standardisation
can lead to the development of cost-effective implementations, which in turn will promote
the widespread use of multimedia information. This is the primary motivation behind the
emergence of image and video compression standards.
Compression is a process intended to yield a compact digital representation of a
signal. In the literature, the terms source coding, data compression, bandwidth compression,
and signal compression are all used to refer to the process of compression. In the cases where
the signal is defined as an image, a video stream, or an audio signal, the generic problem of
compression is to minimise the bit rate of their digital representation. There are many
applications that benefit when image, video, and audio signals are available in compressed
form. Without compression, most of these applications would not be feasible!
Example 1: Let us consider facsimile image transmission. In most facsimile machines, the
document is scanned and digitised. Typically, an 8.5x11 inches page is scanned at 200 dpi;
thus, resulting in 3.74 Mbits. Transmitting this data over a low-cost 14.4 kbits/s modem
would require about 4.3 minutes (3.74 x 10^6 bits / 14,400 bits/s is roughly 260 s). With compression, the transmission time can be reduced to about 17
seconds. This results in substantial savings in transmission costs.
Image, video, and audio signals are amenable to compression due to the factors below.
There is considerable statistical redundancy in the signal.
1. Within a single image or a single video frame, there exists significant correlation
among neighbouring samples. This correlation is referred to as spatial correlation.
2. For data acquired from multiple sensors (such as satellite images), there exists
significant correlation amongst samples from these sensors. This correlation is
referred to as spectral correlation.
3. For temporal data (such as video), there is significant correlation amongst samples in
different segments of time. This is referred to as temporal correlation.
The term data compression refers to the process of reducing the amount of data
required to represent a given quantity of information. A clear distinction must be made
between data and information. They are not synonymous. In fact, data are the means by
which information is conveyed. Various amounts of data may be used to represent the same
amount of information. Such might be the case, for example, if a long-winded individual and
someone who is short and to the point were to relate the same story. Here, the information of
interest is the story; words are the data used to relate the information. If the two individuals
use a different number of words to tell the same basic story, two different versions of the
story are created, and at least one includes nonessential data. That is, it contains data (or
words) that either provide no relevant information or simply restate that which is already
known. It is thus said to contain data redundancy.
Data redundancy is a central issue in digital image compression. It is not an abstract
concept but a mathematically quantifiable entity. If n1 and n2 denote the number of
information-carrying units in two data sets that represent the same information, the relative
data redundancy RD of the first data set (the one characterized by n1) can be defined as
RD = 1 − 1/CR
where CR = n1/n2 is called the compression ratio.
For the case n2 = n1, CR = 1 and RD = 0, indicating that (relative to the second data set) the first representation of the information contains no redundant data. When n2 << n1, CR → ∞ and RD → 1, implying significant compression and highly redundant data. Finally, when n2 >> n1, CR → 0 and RD → −∞, indicating that the second data set contains much more data than the original representation. This, of course, is the normally undesirable case of data expansion. In general, CR and RD lie in the open intervals (0, ∞) and (−∞, 1), respectively. A
practical compression ratio, such as 10 (or 10:1), means that the first data set has 10
information carrying units (say, bits) for every 1 unit in the second or compressed data set.
The corresponding redundancy of 0.9 implies that 90% of the data in the first data set is
redundant.
In digital image compression, three basic data redundancies can be identified and
exploited: coding redundancy, interpixel redundancy, and psychovisual redundancy.
Data compression is achieved when one or more of these redundancies are reduced or
eliminated.
Coding Redundancy:
Here we use a simple formulation to show how the gray-level histogram of an image can provide a great deal of insight into the construction of codes to reduce the amount of data used to represent it.
Let us assume, once again, that a discrete random variable rk in the interval [0, 1] represents the gray levels of an image and that each rk occurs with probability pr(rk):
pr(rk) = nk / n,   k = 0, 1, 2, ..., L − 1
where L is the number of gray levels, nk is the number of times that the kth gray level appears in the image, and n is the total number of pixels in the image. If the number of bits used to represent each value of rk is l(rk), then the average number of bits required to represent each pixel is
Lavg = Σ (from k = 0 to L − 1) l(rk) pr(rk)
That is, the average length of the code words assigned to the various gray-level values
is found by summing the product of the number of bits used to represent each gray level and
the probability that the gray level occurs. Thus the total number of bits required to code an M
X N image is MNLavg.
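As a small numerical check of these relations, the histogram-based quantities can be computed as follows; code_lengths is a hypothetical table of l(rk) values for an 8-bit image.

```python
import numpy as np

def coding_stats(image, code_lengths):
    """Return (Lavg, first-order entropy estimate) for an 8-bit image."""
    hist = np.bincount(image.ravel(), minlength=256)
    p = hist / hist.sum()                          # pr(rk) = nk / n
    L_avg = float(np.sum(code_lengths * p))        # average bits per pixel
    nz = p > 0
    entropy = float(-np.sum(p[nz] * np.log2(p[nz])))
    return L_avg, entropy                          # total bits = M * N * Lavg
```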
Interpixel Redundancy:
Consider the images shown in Figs. 1.1(a) and (b). As Figs. 1.1(c) and (d) show, these
images have virtually identical histograms. Note also that both histograms are trimodal,
indicating the presence of three dominant ranges of gray-level values. Because the gray levels
in these images are not equally probable, variable-length coding can be used to reduce the
coding redundancy that would result from a straight or natural binary encoding of their
pixels. The coding process, however, would not alter the level of correlation between the
pixels within the images. In other words, the codes used to represent the gray levels of each
image have nothing to do with the correlation between pixels. These correlations result from
the structural or geometric relationships between the objects in the image.
Fig.1.1 Two images and their gray-level histograms and normalized autocorrelation
coefficients along one line.
Figures 1.1(e) and (f) show the respective autocorrelation coefficients computed along
one line of each image.
where
γ(n) = A(n) / A(0),   with   A(n) = (1/(N − n)) Σ (from y = 0 to N − 1 − n) f(x, y) f(x, y + n)
The scaling factor in Eq. above accounts for the varying number of sum terms that
arise for each integer value of n. Of course, n must be strictly less than N, the number of
pixels on a line. The variable x is the coordinate of the line used in the computation. Note the
dramatic difference between the shape of the functions shown in Figs. 1.1(e) and (f). Their
shapes can be qualitatively related to the structure in the images in Figs. 1.1(a) and (b).This
relationship is particularly noticeable in Fig. 1.1 (f), where the high correlation between
pixels separated by 45 and 90 samples can be directly related to the spacing between the
vertically oriented matches of Fig. 1.1(b). In addition, the adjacent pixels of both images are
highly correlated. When n is 1, γ is 0.9922 and 0.9928 for the images of Figs. 1.1 (a) and (b),
respectively. These values are typical of most properly sampled television images.
These illustrations reflect another important form of data redundancy—one directly
related to the interpixel correlations within an image. Because the value of any given pixel
can be reasonably predicted from the value of its neighbors, the information carried by
individual pixels is relatively small. Much of the visual contribution of a single pixel to an
image is redundant; it could have been guessed on the basis of the values of its neighbors. A
variety of names, including spatial redundancy, geometric redundancy, and interframe
redundancy, have been coined to refer to these interpixel dependencies. We use the term
interpixel redundancy to encompass them all.
In order to reduce the interpixel redundancies in an image, the 2-D pixel array
normally used for human viewing and interpretation must be transformed into a more
efficient (but usually "nonvisual") format. For example, the differences between adjacent
pixels can be used to represent an image. Transformations of this type (that is, those that
remove interpixel redundancy) are referred to as mappings. They are called reversible
mappings if the original image elements can be reconstructed from the transformed data set.
Psychovisual Redundancy:
The brightness of a region, as perceived by the eye, depends on factors other than
simply the light reflected by the region. For example, intensity variations (Mach bands) can
be perceived in an area of constant intensity. Such phenomena result from the fact that the
eye does not respond with equal sensitivity to all visual information. Certain information
simply has less relative importance than other information in normal visual processing. This
information is said to be psychovisually redundant. It can be eliminated without significantly
impairing the quality of image perception.
That psychovisual redundancies exist should not come as a surprise, because human
perception of the information in an image normally does not involve quantitative analysis of
every pixel value in the image. In general, an observer searches for distinguishing features
such as edges or textural regions and mentally combines them into recognizable groupings.
The brain then correlates these groupings with prior knowledge in order to complete the
image interpretation process. Psychovisual redundancy is fundamentally different from the
redundancies discussed earlier. Unlike coding and interpixel redundancy, psychovisual
redundancy is associated with real or quantifiable visual information. Its elimination is
possible only because the information itself is not essential for normal visual processing.
Since the elimination of psychovisually redundant data results in a loss of quantitative
information, it is commonly referred to as quantization.
This terminology is consistent with normal usage of the word, which generally means
the mapping of a broad range of input values to a limited number of output values. As it is an
irreversible operation (visual information is lost), quantization results in lossy data
compression.
Fidelity Criterion.
The removal of psycho visually redundant data results in a loss of real or quantitative
visual information. Because information of interest may be lost, a repeatable or reproducible
means of quantifying the nature and extent of information loss is highly desirable. Two
general classes of criteria are used as the basis for such an assessment:
A) Objective fidelity criteria and
B) Subjective fidelity criteria.
When the level of information loss can be expressed as a function of the original or
input image and the compressed and subsequently decompressed output image, it is said to be
based on an objective fidelity criterion. A good example is the root-mean-square (rms) error
between an input and output image. Let f(x, y) represent an input image and let f̂(x, y) denote an estimate or approximation of f(x, y) that results from compressing and subsequently decompressing the input. For any value of x and y, the error e(x, y) between f(x, y) and f̂(x, y) can be defined as
e(x, y) = f̂(x, y) − f(x, y)
so the total error between the two images is Σ Σ [f̂(x, y) − f(x, y)], where the images are of size M x N and the sums run over all x and y. The root-mean-square error, e_rms, between f(x, y) and f̂(x, y) is then the square root of the squared error averaged over the M x N array, or
e_rms = [ (1/MN) Σ Σ ( f̂(x, y) − f(x, y) )² ]^(1/2)
The mean-square signal-to-noise ratio of the output image is SNR_ms = Σ Σ f̂(x, y)² / Σ Σ ( f̂(x, y) − f(x, y) )². The rms value of the signal-to-noise ratio, denoted SNR_rms, is obtained by taking the square root of this expression.
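These objective criteria translate directly into code; the sketch below assumes f and f_hat are arrays of the same size.

```python
import numpy as np

def objective_fidelity(f, f_hat):
    """Return (e_rms, SNR_rms) between an image and its decompressed approximation."""
    f = f.astype(np.float64)
    f_hat = f_hat.astype(np.float64)
    e = f_hat - f                                   # e(x, y)
    e_rms = np.sqrt(np.mean(e ** 2))
    snr_ms = np.sum(f_hat ** 2) / np.sum(e ** 2)    # mean-square signal-to-noise ratio
    return e_rms, np.sqrt(snr_ms)
```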
Although objective fidelity criteria offer a simple and convenient mechanism for
evaluating information loss, most decompressed images ultimately are viewed by humans.
Consequently, measuring image quality by the subjective evaluations of a human observer
often is more appropriate. This can be accomplished by showing a "typical" decompressed
image to an appropriate cross section of viewers and averaging their evaluations. The
evaluations may be made using an absolute rating scale or by means of side-by-side
comparisons of f(x, y) and f^(x, y).
The source decoder shown in Fig. 3.2(b) contains only two components: a symbol
decoder and an inverse mapper. These blocks perform, in reverse order, the inverse
operations of the source encoder's symbol encoder and mapper blocks. Because quantization
results in irreversible information loss, an inverse quantizer block is not included in the
general source decoder model shown in Fig. 3.2(b).
The Channel Encoder and Decoder:
The channel encoder and decoder play an important role in the overall encoding-
decoding process when the channel of Fig. 3.1 is noisy or prone to error. They are designed to
reduce the impact of channel noise by inserting a controlled form of redundancy into the
source encoded data. As the output of the source encoder contains little redundancy, it would
be highly sensitive to transmission noise without the addition of this "controlled redundancy."
One of the most useful channel encoding techniques was devised by R. W. Hamming
(Hamming [1950]). It is based on appending enough bits to the data being encoded to ensure
that some minimum number of bits must change between valid code words. Hamming
showed, for example, that if 3 bits of redundancy are added to a 4-bit word, so that the
distance between any two valid code words is 3, all single-bit errors can be detected and
corrected. (By appending additional bits of redundancy, multiple-bit errors can be detected
and corrected.) The 7-bit Hamming (7, 4) code word h1 h2 h3 ... h6 h7 associated with a 4-bit binary number b3 b2 b1 b0 is
h1 = b3 ⊕ b2 ⊕ b0,  h2 = b3 ⊕ b1 ⊕ b0,  h3 = b3,  h4 = b2 ⊕ b1 ⊕ b0,  h5 = b2,  h6 = b1,  h7 = b0
where ⊕ denotes the exclusive OR operation. Note that bits h1, h2, and h4 are even-parity bits for the bit fields b3 b2 b0, b3 b1 b0, and b2 b1 b0, respectively. (Recall that a string of binary bits has even parity if the number of bits with a value of 1 is even.) To decode a Hamming encoded result, the channel decoder must check the encoded value for odd parity over the bit fields in which even parity was previously established. A single-bit error is indicated by a nonzero parity word c4 c2 c1, where
c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7,  c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7,  c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7
If a nonzero value is found, the decoder simply complements the code word bit position
indicated by the parity word. The decoded binary value is then extracted from the corrected
code word as h3h5h6h7.
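The parity relations above transcribe directly into code; this sketch encodes a 4-bit word, corrects a single-bit error and extracts the data bits h3 h5 h6 h7.

```python
def hamming74_encode(b3, b2, b1, b0):
    """Encode a 4-bit word into the 7-bit Hamming code word h1..h7."""
    h1 = b3 ^ b2 ^ b0          # even parity over b3 b2 b0
    h2 = b3 ^ b1 ^ b0          # even parity over b3 b1 b0
    h4 = b2 ^ b1 ^ b0          # even parity over b2 b1 b0
    return [h1, h2, b3, h4, b2, b1, b0]            # h1 h2 h3 h4 h5 h6 h7

def hamming74_decode(h):
    """Correct a single-bit error and return the decoded bits (b3, b2, b1, b0)."""
    h1, h2, h3, h4, h5, h6, h7 = h
    c1 = h1 ^ h3 ^ h5 ^ h7
    c2 = h2 ^ h3 ^ h6 ^ h7
    c4 = h4 ^ h5 ^ h6 ^ h7
    pos = c4 * 4 + c2 * 2 + c1                     # nonzero parity word = error position
    if pos:
        h = list(h)
        h[pos - 1] ^= 1                            # complement the indicated bit
    return h[2], h[4], h[5], h[6]                  # decoded value h3 h5 h6 h7
```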
Variable-Length Coding:
The simplest approach to error-free image compression is to reduce only coding
redundancy. Coding redundancy normally is present in any natural binary encoding of the
gray levels in an image. It can be eliminated by coding the gray levels. To do so requires
construction of a variable-length code that assigns the shortest possible code words to the
most probable gray levels. Here, we examine several optimal and near optimal techniques for
constructing such a code. These techniques are formulated in the language of information
theory. In practice, the source symbols may be either the gray levels of an image or the output
of a gray-level mapping operation (pixel differences, run lengths, and so on).
Huffman coding:
The most popular technique for removing coding redundancy is due to Huffman
(Huffman [1952]). When coding the symbols of an information source individually, Huffman
coding yields the smallest possible number of code symbols per source symbol. In terms of
the noiseless coding theorem, the resulting code is optimal for a fixed value of n, subject to
the constraint that the source symbols be coded one at a time.
The first step in Huffman's approach is to create a series of source reductions by
ordering the probabilities of the symbols under consideration and combining the lowest
probability symbols into a single symbol that replaces them in the next source reduction.
Figure 4.1 illustrates this process for binary coding (K-ary Huffman codes can also be
constructed). At the far left, a hypothetical set of source symbols and their probabilities are
ordered from top to bottom in terms of decreasing probability values. To form the first source
reduction, the bottom two probabilities, 0.06 and 0.04, are combined to form a "compound
symbol" with probability 0.1. This compound symbol and its associated probability are
placed in the first source reduction column so that the
probabilities of the reduced source are also ordered from the most to the least probable. This
process is then repeated until a reduced source with two symbols (at the far right) is reached.
The second step in Huffman's procedure is to code each reduced source, starting with
the smallest source and working back to the original source. The minimal length binary code
for a two-symbol source, of course, is the symbols 0 and 1. As Fig. 4.2 shows, these symbols
are assigned to the two symbols on the right (the assignment is arbitrary; reversing the order
of the 0 and 1 would work just as well). As the reduced source symbol with probability 0.6
was generated by combining two symbols in the reduced source to its left, the 0 used to code
it is now assigned to both of these symbols, and a 0 and 1 are arbitrarily
appended to each to distinguish them from each other. This operation is then repeated for
each reduced source until the original source is reached. The final code appears at the far left
in Fig. 4.2. The average length of this code is
Lavg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/symbol
and the entropy of the source is 2.14 bits/symbol. The resulting Huffman code efficiency is 2.14/2.2 ≈ 0.973.
Huffman's procedure creates the optimal code for a set of symbols and probabilities
subject to the constraint that the symbols be coded one at a time. After the code has been
created, coding and/or decoding is accomplished in a simple lookup table manner. The code
itself is an instantaneous uniquely decodable block code. It is called a block code because
each source symbol is mapped into a fixed sequence of code symbols. It is instantaneous,
because each code word in a string of code symbols can be decoded without referencing
succeeding symbols. It is uniquely decodable, because any string of code symbols can be
decoded in only one way. Thus, any string of Huffman encoded symbols can be decoded by
examining the individual symbols of the string in a left to right manner. For the binary code
of Fig. 4.2, a left-to-right scan of the encoded string 010100111100 reveals that the first valid
code word is 01010, which is the code for symbol a3. The next valid code word is 011, which corresponds to symbol a1. Continuing in this manner reveals the completely decoded message to be a3 a1 a2 a2 a6.
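A compact heap-based sketch of the two-step procedure follows; the six probabilities match the example discussed above, although tie-breaking may yield code words that differ from Fig. 4.2 while giving the same average length of 2.2 bits/symbol.

```python
import heapq

def huffman_code(symbol_probs):
    """Build a Huffman code from a {symbol: probability} dictionary."""
    # each heap entry: (probability, tie-breaker, {symbol: partial code word})
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(symbol_probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)        # two least probable "compound symbols"
        p2, i, c2 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, i, merged))
    return heap[0][2]

probs = {'a2': 0.4, 'a6': 0.3, 'a1': 0.1, 'a4': 0.1, 'a3': 0.06, 'a5': 0.04}
code = huffman_code(probs)
L_avg = sum(probs[s] * len(code[s]) for s in probs)   # 2.2 bits/symbol for this source
```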
Arithmetic coding:
Unlike the variable-length codes described previously, arithmetic coding generates
nonblock codes. In arithmetic coding, which can be traced to the work of Elias, a one-to-one
correspondence between source symbols and code words does not exist. Instead, an entire
sequence of source symbols (or message) is assigned a single arithmetic code word. The code
word itself defines an interval of real numbers between 0 and 1. As the number of symbols in
the message increases, the interval used to represent it becomes smaller and the number of
information units (say, bits) required to represent the interval becomes larger. Each symbol of
the message reduces the size of the interval in accordance with its probability of occurrence.
Because the technique does not require, as does Huffman's approach, that each source symbol
translate into an integral number of code symbols (that is, that the symbols be coded one at a
time), it achieves (but only in theory) the bound established by the noiseless coding theorem.
At the start of the coding process, the message is assumed to occupy the entire half-open interval [0, 1). As Table 5.2 shows, this interval is initially subdivided into four regions based on the probabilities of each source symbol. Symbol a1, for example, is associated with the subinterval [0, 0.2). Because it is the first symbol of the message being coded, the message interval is initially narrowed to [0, 0.2). Thus in Fig. 5.1 [0, 0.2) is expanded to the full height of the
figure and its end points labeled by the values of the narrowed range. The narrowed range is
then subdivided in accordance with the original source symbol probabilities and the process
continues with the next message symbol.
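The interval-narrowing step can be sketched as follows. The four-symbol model and the example message are illustrative; only the a1 sub-interval [0, 0.2) is taken from the text.

```python
def arithmetic_interval(message, model):
    """Narrow [0, 1) symbol by symbol; any number in the final interval
    can represent the whole message. `model` maps symbol -> (low, high)."""
    low, high = 0.0, 1.0
    for s in message:
        s_low, s_high = model[s]
        span = high - low
        high = low + span * s_high
        low = low + span * s_low
    return low, high

# illustrative four-symbol source; sub-intervals follow the symbol probabilities
model = {'a1': (0.0, 0.2), 'a2': (0.2, 0.4), 'a3': (0.4, 0.8), 'a4': (0.8, 1.0)}
print(arithmetic_interval(['a1', 'a2', 'a3', 'a3', 'a4'], model))
```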
LZW Coding:
Lempel-Ziv-Welch (LZW) coding assigns fixed-length code words to variable-length sequences of source symbols and requires no a priori knowledge of the probability of occurrence of the symbols to be encoded. LZW compression has been integrated into a variety of mainstream imaging file formats, including the graphic interchange format (GIF), tagged image file format (TIFF), and the portable document format (PDF). LZW coding is conceptually very simple (Welch [1984]). At the onset of the coding
process, a codebook or "dictionary" containing the source symbols to be coded is constructed.
For 8-bit monochrome images, the first 256 words of the dictionary are assigned to the gray
values 0, 1, 2..., and 255. As the encoder sequentially examines the image's pixels, gray-level
sequences that are not in the dictionary are placed in algorithmically determined (e.g., the
next unused) locations. If the first two pixels of the image are white, for instance, sequence
“255-255” might be assigned to location 256, the address following the locations reserved for
gray levels 0 through 255. The next time that two consecutive white pixels are encountered,
code word 256, the address of the location containing sequence 255-255, is used to represent
them. If a 9-bit, 512-word dictionary is employed in the coding process, the original (8 + 8)
bits that were used to represent the two pixels are replaced by a single 9-bit code word.
Clearly, the size of the dictionary is an important system parameter. If it is too small, the
detection of matching gray-level sequences will be less likely; if it is too large, the size of the
code words will adversely affect compression performance.
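A sketch of the dictionary-building procedure for a 1-D sequence of gray levels follows, using the 9-bit, 512-word dictionary of the example.

```python
def lzw_encode(pixels):
    """LZW-encode a sequence of 8-bit gray levels with a 512-word dictionary."""
    dictionary = {(i,): i for i in range(256)}    # locations 0..255 hold the gray levels
    next_code = 256
    current = ()
    out = []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate                   # keep growing the recognised sequence
        else:
            out.append(dictionary[current])
            if next_code < 512:                   # 9-bit, 512-word dictionary
                dictionary[candidate] = next_code
                next_code += 1
            current = (p,)
    if current:
        out.append(dictionary[current])
    return out

# e.g. lzw_encode([255, 255, 255, 255]) -> [255, 256, 255]
```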
Consider the following 4 x 4, 8-bit image of a vertical edge:
Table 6.1 details the steps involved in coding its 16 pixels. A 512-word dictionary with the
following starting content is assumed:
Locations 256 through 511 are initially unused. The image is encoded by processing its pixels in a left-to-right, top-to-bottom manner. Each successive gray-level value is concatenated with the currently recognized sequence; if the concatenated sequence is found in the dictionary the recognized sequence is extended, otherwise the code of the recognized sequence is output and the concatenated sequence is added to the dictionary.
Bit-Plane Coding:
The gray levels of an m-bit monochrome image can be represented in the form of the base-2 polynomial a_(m−1) 2^(m−1) + a_(m−2) 2^(m−2) + ... + a_1 2 + a_0. Based on this property, a simple method of decomposing the image into a collection of binary images is to separate the m coefficients of the polynomial into m 1-bit bit planes.
The zeroth-order bit plane is generated by collecting the a0 bits of each pixel, while the (m -
1) st-order bit plane contains the am-1, bits or coefficients. In general, each bit plane is
numbered from 0 to m-1 and is constructed by setting its pixels equal to the values of the
appropriate bits or polynomial coefficients from each pixel in the original image. The
inherent disadvantage of this approach is that small changes in gray level can have a
significant impact on the complexity of the bit planes. If a pixel of intensity 127 (01111111)
is adjacent to a pixel of intensity 128 (10000000), for instance, every bit plane will contain a
corresponding 0 to 1 (or 1 to 0) transition. For example, as the most significant bits of the two
binary codes for 127 and 128 are different, bit plane 7 will contain a zero-valued pixel next to
a pixel of value 1, creating a 0 to 1 (or 1 to 0) transition at that point.
An alternative decomposition approach (which reduces the effect of small gray-level
variations) is to first represent the image by an m-bit Gray code. The m-bit Gray code g_(m−1) ... g2 g1 g0 that corresponds to the polynomial in Eq. above can be computed from
g_i = a_i ⊕ a_(i+1),  0 ≤ i ≤ m − 2,   and   g_(m−1) = a_(m−1)
Here, ⊕ denotes the exclusive OR operation. This code has the unique property that successive code words differ in only one bit position. Thus, small changes in gray level are less likely to affect all m bit planes. For instance, when gray levels 127 and 128 are adjacent, only the 7th bit plane will contain a 0 to 1 transition, because the Gray codes that correspond to 127 and 128 are 01000000 and 11000000, respectively.
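The Gray-code mapping g = a XOR (a >> 1) and the bit-plane split can be written compactly:

```python
import numpy as np

def gray_code_bit_planes(image):
    """Decompose an 8-bit image into the bit planes of its Gray-coded values."""
    a = image.astype(np.uint8)
    g = a ^ (a >> 1)                              # g_i = a_i XOR a_(i+1), g_(m-1) = a_(m-1)
    return [(g >> i) & 1 for i in range(8)]       # plane 0 (LSB) ... plane 7 (MSB)

# adjacent gray levels 127 and 128 differ in a single Gray-coded bit plane
print(format(127 ^ (127 >> 1), '08b'), format(128 ^ (128 >> 1), '08b'))
# -> 01000000 11000000
```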
The decoder of Fig. 8.1(b) reconstructs e_n from the received variable-length code words and performs the inverse operation
f_n = e_n + f̂_n
Various local, global, and adaptive methods can be used to generate f^n. In most
cases, however, the prediction is formed by a linear combination of m previous pixels. That
is,
f̂_n = round[ Σ (from i = 1 to m) α_i f_(n−i) ]
where m is the order of the linear predictor, round is a function used to denote the
rounding or nearest integer operation, and the αi, for i = 1,2,..., m are prediction coefficients.
In raster scan applications, the subscript n indexes the predictor outputs in accordance with
their time of occurrence. That is, fn, f^n and en in Eqns. above could be replaced with the more
explicit notation f (t), f^(t), and e (t), where t represents time. In other cases, n is used as an
index on the spatial coordinates and/or frame number (in a time sequence of images) of an
image. In 1-D linear predictive coding, for example, Eq. above can be written as
f̂(x, y) = round[ Σ (from i = 1 to m) α_i f(x, y − i) ]
Lossy Predictive Coding:
Fig. 9 A lossy predictive coding model: (a) encoder and (b) decoder
In order to accommodate the insertion of the quantization step, the error-free encoder
of figure must be altered so that the predictions generated by the encoder and decoder are
equivalent. As Fig.9 (a) shows, this is accomplished by placing the lossy encoder's predictor
within a feedback loop, where its input, denoted f˙n, is generated as a function of past
predictions and the corresponding quantized errors. That is,
This closed loop configuration prevents error buildup at the decoder's output. Note from Fig.
9(b) that the output of the decoder also is given by the above Eqn.
Optimal predictors:
The optimal predictor used in most predictive coding applications minimizes the encoder's mean-square prediction error
E{ e_n² } = E{ [ f_n − f̂_n ]² }
subject to the constraints that ė_n ≈ e_n and
f̂_n = Σ (from i = 1 to m) α_i f_(n−i)
That is, the optimization criterion is chosen to minimize the mean-square prediction
error, the quantization error is assumed to be negligible (e˙n ≈ en), and the prediction is
constrained to a linear combination of m previous pixels. These restrictions are not essential,
but they simplify the analysis considerably and, at the same time, decrease the computational
complexity of the predictor. The resulting predictive coding approach is referred to as
differential pulse code modulation (DPCM).
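The DPCM loop can be sketched for one image row, with the predictor inside the feedback loop as in Fig. 9(a); the first-order predictor (alpha = 1) and the uniform quantizer step delta are illustrative assumptions.

```python
import numpy as np

def dpcm_encode(row, alpha=(1.0,), delta=4):
    """1-D lossy predictive coding (DPCM) of one image row."""
    m = len(alpha)
    f_dot = np.zeros(len(row), dtype=np.float64)   # decoder-side reconstruction
    e_hat = np.zeros(len(row), dtype=np.float64)   # quantized prediction errors
    for n in range(len(row)):
        # prediction from previously *reconstructed* samples (feedback loop)
        prev = f_dot[max(0, n - m):n][::-1]
        pred = round(float(np.dot(alpha[:len(prev)], prev))) if n > 0 else 0
        e = row[n] - pred
        e_hat[n] = delta * np.round(e / delta)     # uniform quantizer
        f_dot[n] = pred + e_hat[n]                 # f_dot_n = e_hat_n + f_hat_n
    return e_hat, f_dot
```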
Transform Coding:
All the predictive coding techniques operate directly on the pixels of an image and
thus are spatial domain methods. In this coding, we consider compression techniques that are
based on modifying the transform of an image. In transform coding, a reversible, linear
transform (such as the Fourier transform) is used to map the image into a set of transform
coefficients, which are then quantized and coded. For most natural images, a significant
number of the coefficients have small magnitudes and can be coarsely quantized (or
discarded entirely) with little image distortion. A variety of transformations, including the
discrete Fourier transform (DFT), can be used to transform the image data.
A typical transform coding system first divides the image into subimages, transforms each subimage, quantizes the resulting coefficients, and then codes (normally using a variable-length code) the quantized coefficients. Any or all of the
transform encoding steps can be adapted to local image content, called adaptive transform
coding, or fixed for all subimages, called nonadaptive transform coding.
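A crude sketch of these steps (subimage decomposition, transformation, quantization/truncation, reconstruction) using the 2-D DFT follows; the 8 x 8 block size and the "keep the 10 largest coefficients" rule are illustrative assumptions, and entropy coding of the retained coefficients is omitted.

```python
import numpy as np

def block_transform_code(image, block=8, keep=10):
    """Transform 8x8 subimages with the 2-D DFT and keep only the largest coefficients."""
    M, N = image.shape
    out = np.zeros((M, N), dtype=np.float64)
    for i in range(0, M - block + 1, block):
        for j in range(0, N - block + 1, block):
            sub = image[i:i + block, j:j + block].astype(np.float64)
            T = np.fft.fft2(sub)
            thresh = np.sort(np.abs(T).ravel())[-keep]   # keep-th largest magnitude
            T[np.abs(T) < thresh] = 0                    # discard small coefficients
            out[i:i + block, j:j + block] = np.real(np.fft.ifft2(T))
    return out
```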
Wavelet Coding:
The wavelet coding is based on the idea that the coefficients of a transform that
decorrelates the pixels of an image can be coded more efficiently than the original pixels
themselves. If the transform's basis functions—in this case wavelets—pack most of the
important visual information into a small number of coefficients, the remaining coefficients
can be quantized coarsely or truncated to zero with little image distortion.
Figure 11 shows a typical wavelet coding system. To encode a 2^J x 2^J image, an analyzing wavelet, Ψ, and a minimum decomposition level, J − P, are selected and used to compute the image's discrete wavelet transform. If the wavelet has a complementary scaling function φ, the fast wavelet transform can be used. In either case, the computed transform
converts a large portion of the original image to horizontal, vertical, and diagonal
decomposition coefficients with zero mean and Laplacian-like distributions.
The principal difference between the wavelet-based system and the transform coding
system is the omission of the transform coder's subimage processing stages.
Because wavelet transforms are both computationally efficient and inherently local
(i.e., their basis functions are limited in duration), subdivision of the original image is unnecessary.
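As a concrete, minimal illustration of a decorrelating wavelet transform (and of question 12 below), the one-level 2-D Haar transform of a 2 x 2 image is T = H F H^T with the orthonormal Haar matrix H:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # 2 x 2 orthonormal Haar matrix

def haar_2x2(F):
    """One-level 2-D Haar transform T = H F H^T."""
    return H @ F @ H.T

F = np.array([[3.0, -1.0], [6.0, 2.0]])
print(haar_2x2(F))     # approximation, horizontal, vertical and diagonal terms
# -> [[ 5.  4.]
#     [-3.  0.]]
```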
PREVIOUS QUESTIONS
1. Draw the functional block diagram of image compression system and explain the
purpose of each block.
2. Explain the need for image compression. How run length encoding approach is used
for compression? Is it lossy? Justify.
3. Describe about wavelet packets.
4. Write short notes on: i) Arithmetic coding. ii) Vector quantization. iii) JPEG
standards.
5. Explain about the Fast Wavelet Transform.
6. Explain two-band sub-band coding and decoding system.
7. What are the various requirements for multi-resolution analysis? Explain.
8. What is block transform coding? Explain.
9. With an example, explain Huffman coding.
10. Write about Haar Wavelet transform.
11. What is meant by redundancy in image? Explain its role in image processing.
12. Compute the Haar transform of the 2 x 2 image F = [ 3  −1 ; 6  2 ]
UNIT-6
Morphological Image Processing
Introduction
The word morphology commonly denotes a branch of biology that deals with the form
and structure of animals and plants. Morphology in image processing is a tool for extracting
image components that are useful in the representation and description of region shape, such
as boundaries and skeletons. Furthermore, the morphological operations can be used for
filtering, thinning and pruning. The language of morphology comes from set theory, where image objects can be represented by sets.
Some Basic Concepts from Set Theory:
If every element of a set A is also an element of another set B, then A is said to be a
subset of B, denoted as A ⊆ B
The union of two sets A and B, denoted by C = A∪B
The intersection of two sets A and B, denote by D = A∩B
Dilation
Dilation is used for expanding an element A by using a structuring element B. The dilation of A by B is defined by the following equation:
This equation is based on obtaining the reflection of B about its origin and shifting this reflection by z. The dilation of A by B is then the set of all displacements z such that the reflected B and A overlap by at least one element. Based on this interpretation, Eq. (9.2-1) can be rewritten as:
Dilation is typically applied to binary image, but there are versions that work on gray
scale image. The basic effect of the operator on a binary image is to gradually enlarge the
boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of
foreground pixels grow in size while holes within those regions become smaller.
Any pixel in the output image touched by the dot in the structuring element is set to ON when any point of the structuring element touches an ON pixel in the original image. This
tends to close up holes in an image by expanding the ON regions. It also makes objects
larger. Note that the result depends upon both the shape of the structuring element and the
location of its origin.
Summary effects of dilation:
Expand/enlarge objects in the image
Fill gaps or bays of insufficient width
Fill small holes of sufficiently small size
Connects objects separated by a distance less than the size of the window
Erosion
Erosion is used for shrinking of element A by using element B. Erosion for Sets A
and B in Z2, is defined by the following equation:
This equation indicates that the erosion of A by B is the set of all points z such that B, translated by z, is contained in A.
Any pixel in the output image touched by the dot in the structuring element is set to ON when every point of the structuring element touches an ON pixel in the original image. This tends to make objects smaller by removing pixels.
Duality between dilation and erosion:
Dilation and erosion are duals of each other with respect to set
complementation and reflection. That is,
Opening:
An erosion followed by a dilation using the same structuring element for both
operations.
Smooth contour
Break narrow isthmuses
Remove thin protrusion
Closing:
A dilation followed by an erosion using the same structuring element for both operations.
Smooth contour
Fuse narrow breaks, and long thin gulfs.
Remove small holes, and fill gaps.
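The four operations above can be exercised with scipy.ndimage; the test image and the 3 x 3 structuring element below are illustrative.

```python
import numpy as np
from scipy import ndimage

A = np.zeros((64, 64), dtype=bool)
A[20:44, 20:44] = True                        # a square object
A[30, 10:20] = True                           # a thin protrusion
B = np.ones((3, 3), dtype=bool)               # structuring element

dilated = ndimage.binary_dilation(A, structure=B)   # enlarges objects, fills small gaps
eroded  = ndimage.binary_erosion(A, structure=B)    # shrinks objects, removes thin parts
opened  = ndimage.binary_opening(A, structure=B)    # erosion followed by dilation
closed  = ndimage.binary_closing(A, structure=B)    # dilation followed by erosion
```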
Hit-or-Miss Transform:
The hit-and-miss transform is a basic tool for shape detection. The hit-or-miss
transform is a general binary morphological operation that can be used to look for particular
patterns of foreground and background pixels in an image.
Concept: To detect a shape:
Hit object
Miss background
Let the origin of each shape be located at its center of gravity.
If we want to find the location of a shape– X , at (larger) image, A
Let X be enclosed by a small window, say – W.
The local background of X with respect to W is defined as the set difference (W -
X).
Applying the erosion operator to A with X as the structuring element gives the set of locations of the origin of X such that X is completely contained in A.
It may also be viewed geometrically as the set of all locations of the origin of X at which X found a match (hit) in A.
Then apply the erosion operator to the complement of A with the local background set (W − X) as the structuring element.
Notice that the set of locations at which X exactly fits inside A is the intersection of the two erosions above.
If B denotes the set composed of X and its background, B = (B1, B2), with B1 = X and B2 = (W − X).
The match (or set of matches) of B in A, denoted A ⊛ B, is
A ⊛ B = (A ⊖ B1) ∩ (A^c ⊖ B2)
The structural elements used for Hit-or-miss transforms are an extension to the ones
used with dilation, erosion etc. The structural elements can contain both foreground and
background pixels, rather than just foreground pixels, i.e. both ones and zeros. The
structuring element is superimposed over each pixel in the input image, and if an exact match
is found between the foreground and background pixels in the structuring element and the
image, the input pixel lying below the origin of the structuring element is set to the
foreground pixel value. If it does not match, the input pixel is set to the background pixel value.
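scipy.ndimage provides a binary hit-or-miss operator that takes the foreground shape B1 = X and the background shape B2 = W − X separately; the 3 x 3 square and the 5 x 5 window below are illustrative.

```python
import numpy as np
from scipy import ndimage

A = np.zeros((16, 16), dtype=bool)
A[5:8, 5:8] = True                                   # a 3x3 object to be located

X = np.ones((3, 3), dtype=bool)                      # shape to hit (foreground), B1
WX = np.zeros((5, 5), dtype=bool)                    # local background W - X, B2
WX[[0, -1], :] = True
WX[:, [0, -1]] = True

hits = ndimage.binary_hit_or_miss(A, structure1=X, structure2=WX)
print(np.argwhere(hits))      # origin locations where X exactly fits inside A
```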
Basic Morphological Algorithms
Boundary Extraction:
The boundary of a set A is obtained by first eroding A by a structuring element B and then taking the set difference of A and its erosion. The resultant image, obtained by subtracting the eroded image from the original image, contains the boundary of the objects. The thickness of the boundary depends on the size of the structuring element. The boundary β(A) of a set A is therefore
β(A) = A − (A ⊖ B)
Convex Hull:
A is said to be convex if a straight line segment joining any two points in A lies
entirely within A.
The convex hull H of set S is the smallest convex set containing S
The set difference H-S is called the convex deficiency of S
The convex hull and convex deficiency are useful for object description. The algorithm iteratively applies the hit-or-miss transform to A with the first structuring element, takes the union of the result with A, and repeats the procedure with the remaining structuring elements.
Let B^i, i = 1, 2, 3, 4, represent the four structuring elements. The procedure consists of implementing the equation
X_k^i = (X_(k−1)^i ⊛ B^i) ∪ A,   i = 1, 2, 3, 4 and k = 1, 2, 3, ...,   with X_0^i = A
When the procedure converges (X_k^i = X_(k−1)^i), let D^i = X_k^i; the convex hull of A is the union of the four sets D^i.
Thinning:
The thinning of a set A by a structuring element B can be defined in terms of the hit-or-miss transform:
A ⊗ B = A − (A ⊛ B) = A ∩ (A ⊛ B)^c
A more useful expression thins A by a sequence of structuring elements {B} = {B1, B2, ..., Bn}. The process is to thin A by one pass with B1, then thin the result with one pass with B2, and so on, until A is thinned with one pass with Bn. The entire process is repeated until no further changes occur. Each individual pass is performed using the equation above.
Thickening:
Thickening is the morphological dual of thinning and is defined as
A ⊙ B = A ∪ (A ⊛ B)
The structuring elements used for thickening have the same form as in thinning, but with all 1's and 0's interchanged.
Skeletons:
The skeleton of A is defined in terms of erosions and openings:
S(A) = ∪ (k = 0 to K) S_k(A),   with   S_k(A) = (A ⊖ kB) − (A ⊖ kB) ∘ B
where (A ⊖ kB) denotes A eroded by B k successive times, and K is the last iterative step before A erodes to an empty set; in other words,
K = max{ k | (A ⊖ kB) ≠ ∅ }
Thus S(A) can be obtained as the union of the skeleton subsets S_k(A). A can also be reconstructed from the subsets S_k(A) by using the equation
A = ∪ (k = 0 to K) ( S_k(A) ⊕ kB )
where (S_k(A) ⊕ kB) denotes S_k(A) dilated by B k successive times.
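A direct sketch of the skeleton equations above, accumulating S_k(A) until A erodes to the empty set; the 3 x 3 structuring element is an assumption.

```python
import numpy as np
from scipy import ndimage

def morphological_skeleton(A, size=3):
    """S(A) = union over k of [ (A eroded k times) minus its opening by B ]."""
    B = np.ones((size, size), dtype=bool)
    skeleton = np.zeros_like(A, dtype=bool)
    eroded = A.copy()
    while eroded.any():                                    # stop when A erodes to empty
        opened = ndimage.binary_opening(eroded, structure=B)
        skeleton |= eroded & ~opened                       # S_k(A)
        eroded = ndimage.binary_erosion(eroded, structure=B)
    return skeleton
```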
Gray-Scale Morphology
Dilation:
The equation for gray-scale dilation is
(f ⊕ b)(s, t) = max{ f(s − x, t − y) + b(x, y) | (s − x, t − y) ∈ D_f and (x, y) ∈ D_b }
where D_f and D_b are the domains of f and b, respectively. The condition that (s − x, t − y) must be in the domain of f and (x, y) in the domain of b is analogous to the condition in the binary definition of dilation, where the two sets have to overlap by at least one element.
We illustrate the previous equation in 1-D, which gives an equation in one variable:
The requirement that (s − x) be in the domain of f and x in the domain of b implies that f and b overlap by at least one element. Unlike the binary case, f, rather than the structuring element b, is shifted. Conceptually, f sliding past b is really no different from b sliding past f. The general effect of performing dilation on a gray-scale image is twofold:
If all the values of the structuring element are positive, then the output image tends to be brighter than the input, and dark details are either reduced or eliminated, depending on how their values and shapes relate to the structuring element used for dilation.
Erosion:
Gray-scale erosion is defined as:
(f ⊖ b)(s, t) = min{ f(s + x, t + y) − b(x, y) | (s + x, t + y) ∈ D_f and (x, y) ∈ D_b }
The condition that (s + x, t + y) have to be in the domain of f, and x, y have to be in the domain of b, is completely analogous to the condition in the binary definition of erosion, where the structuring element has to be completely contained in the set being eroded. As in the case of dilation, we illustrate with a 1-D function.
The structuring element is rolled along the underside of the surface of f. All peaks that are narrow with respect to the diameter of the structuring element are reduced in amplitude and sharpness.
Opening:
The initial erosion removes the details, but it also darkens the image. The subsequent dilation again increases the overall intensity of the image without reintroducing the details totally removed by erosion.
Opening a gray-scale picture can be described as pushing the structuring element B up against the underside of the scan-line graph of f while traversing the graph according to the curvature of B.
Closing:
In the closing of a gray-scale image, we remove small dark details while leaving the overall gray levels and larger dark features relatively undisturbed.
The structuring element is rolled on top of the surface of f. Peaks essentially are left
in their original form (assume that their separation at the narrowest points exceeds the
diameter of the structuring element). The initial dilation removes the dark details and
brightens the image. The subsequent erosion darkens the image without reintroducing the
details totally removed by dilation
Closing a gray-scale picture can be described as pushing the structuring element B down on top of the scan-line graph of f while traversing the graph according to the curvature of B. The peaks usually remain in their original form.
Morphological smoothing
Perform opening followed by a closing
The net result of these two operations is to remove or attenuate both bright and
dark artifacts and noise.
Morphological gradient
Dilation and erosion are used to compute the morphological gradient of an image, denoted g:
g = (f ⊕ b) − (f ⊖ b)
Top-hat transformation
The top-hat transformation, h = f − (f ∘ b), is used for light objects on a dark background, and the bottom-hat transformation, h = (f • b) − f, is used for the converse.
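The gray-scale operations of this section are available in scipy.ndimage; the random test image and the flat 5 x 5 structuring element below are illustrative.

```python
import numpy as np
from scipy import ndimage

f = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(np.float64)
size = (5, 5)                                               # flat structuring element b

opened   = ndimage.grey_opening(f, size=size)               # removes small bright details
closed   = ndimage.grey_closing(f, size=size)               # removes small dark details
smoothed = ndimage.grey_closing(opened, size=size)          # opening followed by closing
gradient = ndimage.morphological_gradient(f, size=size)     # (f dilated) - (f eroded)
top_hat  = ndimage.white_tophat(f, size=size)               # f - opening(f)
bot_hat  = ndimage.black_tophat(f, size=size)               # closing(f) - f
```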
Textural segmentation:
The objective is to find the boundary between different image regions based on
their textural content.
Close the input image by using successively larger structuring elements.
Then a single opening is performed, and finally a simple thresholding yields the boundary between the textural regions.
Granulometry:
Granulometry is a field that deals principally with determining the size distribution of particles in an image.
Because the particles are lighter than the background, we can use a morphological approach to determine the size distribution and, at the end, construct a histogram of it.
The approach is based on the idea that opening operations of a particular size have the most effect on regions of the input image that contain particles of similar size.
This type of processing is useful for describing regions with a predominant
particle-like character.
Image Segmentation
Image segmentation divides an image into regions that are connected and have some
similarity within the region and some difference between adjacent regions. The goal is
usually to find individual objects in an image. For the most part there are fundamentally two
kinds of approaches to segmentation: discontinuity and similarity.
Detection of Discontinuities:
There are three kinds of discontinuities of intensity: points, lines and edges. The most
common way to look for discontinuities is to scan a small mask over the image. The mask
determines which kind of discontinuity to look for.
R = w1 z1 + w2 z2 + ... + w9 z9 = Σ (from i = 1 to 9) w_i z_i
Point Detection: A point whose gray value is significantly different from its background is detected at locations where |R| ≥ T, where T is a nonnegative threshold.
Line Detection:
Slightly more complex than point detection is finding a one-pixel-wide line in an image.
For digital images, the only three-pixel straight lines are horizontal, vertical, or diagonal (+45° or −45°).
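The standard 3 x 3 point- and line-detection masks can be applied with a convolution and a threshold; the threshold T is an assumption to be tuned per image.

```python
import numpy as np
from scipy import ndimage

point_mask = np.array([[-1, -1, -1], [-1,  8, -1], [-1, -1, -1]])   # point detection

horizontal = np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]])   # line detection masks
plus45     = np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]])
vertical   = np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]])
minus45    = np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]])

def detect(f, mask, T):
    R = ndimage.convolve(f.astype(np.float64), mask)   # R = sum of w_i z_i at each pixel
    return np.abs(R) >= T
```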
Edge Detection:
Edge is a set of connected pixels that lie on the boundary between two regions
• ’Local’ concept in contrast to ’more global’ boundary concept
• To be measured by grey-level transitions
• Ideal and blurred edges
First derivative can be used to detect the presence of an edge (if a point is on a ramp).
The sign of the second derivative can be used to determine whether an edge pixel lies on the dark or light side of an edge.
The second derivative produces two values per edge.
Zero crossing near the edge midpoint
Non-horizontal edges – define a profile perpendicular to the edge direction
Gradient
– Vector pointing to the direction of maximum rate of change of f at coordinates (x,y)
∇f = [ G_x, G_y ]^T = [ ∂f/∂x, ∂f/∂y ]^T
– Magnitude: gives the quantity of the increase (sometimes referred to as the gradient too)
∇f = mag(∇f) = [ G_x² + G_y² ]^(1/2)
Partial derivatives computed through 2x2 or 3x3 masks. Sobel operators introduce
some smoothing and give more importance to the center point
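A small sketch of the gradient magnitude computed from the Sobel masks:

```python
import numpy as np
from scipy import ndimage

def sobel_gradient(f):
    """Gradient magnitude from the Sobel partial-derivative masks."""
    f = f.astype(np.float64)
    gx = ndimage.sobel(f, axis=0)              # approximates df/dx
    gy = ndimage.sobel(f, axis=1)              # approximates df/dy
    return np.sqrt(gx ** 2 + gy ** 2)          # mag(grad f)
```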
Laplacian
– Second-order derivative of a 2-D function
∇²f = ∂²f/∂x² + ∂²f/∂y²
– Digital approximations by proper masks
If h(r) = −exp(−r²/2σ²) is a Gaussian function with r² = x² + y², the Laplacian of h is
∇²h(r) = −[ (r² − σ²) / σ⁴ ] exp(−r²/2σ²)
The Laplacian of a Gaussian sometimes is called the Mexican hat function. It also can
be computed by smoothing the image with the Gaussian smoothing mask, followed by
application of the Laplacian mask.
The Hough transform consists of finding all pairs of values of ρ and θ which satisfy the equation ρ = x cos θ + y sin θ for lines passing through (x, y). These are accumulated in what is basically a 2-dimensional histogram. When plotted, these pairs of ρ and θ trace out a sinusoidal curve. The process is repeated for all appropriate (x, y) locations.
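A minimal accumulator for the normal representation ρ = x cos θ + y sin θ is sketched below; the angular resolution of 1° is an assumption.

```python
import numpy as np

def hough_lines(edge_image, n_theta=180):
    """Accumulate rho = x cos(theta) + y sin(theta) for every edge pixel into a
    2-D (rho, theta) histogram; peaks correspond to straight lines."""
    rows, cols = edge_image.shape
    thetas = np.deg2rad(np.arange(-90, 90, 180 / n_theta))
    rho_max = int(np.ceil(np.hypot(rows, cols)))
    accumulator = np.zeros((2 * rho_max + 1, len(thetas)), dtype=np.int64)
    ys, xs = np.nonzero(edge_image)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[rhos + rho_max, np.arange(len(thetas))] += 1
    return accumulator, thetas
```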
Thresholding:
The range of intensity levels covered by objects of interest is different from the
background.
g(x, y) = 1 if f(x, y) > T
g(x, y) = 0 if f(x, y) ≤ T
Illumination:
Region-Based Segmentation:
There are two main approaches to region-based segmentation: region growing and region splitting.
Region Growing:
For example: P(Rk)=TRUE if all pixels in Rk have the same gray level.
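A simple region-growing sketch uses a 4-connected flood fill with the predicate |f(x, y) − f(seed)| ≤ T; the threshold T is an illustrative assumption.

```python
import numpy as np
from collections import deque

def region_grow(f, seed, T=10):
    """Grow a region from `seed`, adding 4-neighbours whose gray level is
    within T of the seed value (the predicate P(R_k))."""
    region = np.zeros(f.shape, dtype=bool)
    seed_val = float(f[seed])
    queue = deque([seed])
    region[seed] = True
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if (0 <= nx < f.shape[0] and 0 <= ny < f.shape[1]
                    and not region[nx, ny]
                    and abs(float(f[nx, ny]) - seed_val) <= T):
                region[nx, ny] = True
                queue.append((nx, ny))
    return region
```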
Region Splitting:
PREVIOUS QUESTIONS
1. With necessary figures, explain the opening and closing operations.
2. Explain the following morphological algorithms i) Boundary extraction ii) Hole
filling.
3. Explain the following morphological algorithms i) Thinning ii) Thickening
4. What is Hit-or-Miss transformation? Explain.
5. Discuss about Grey-scale morphology.
6. Write short notes on Geometric Transformation
7. Explain about edge detection using gradient operator.
8. What is meant by edge linking? Explain edge linking using local processing
9. Explain edge linking using Hough transform.
10. Describe Watershed segmentation Algorithm
11. Discuss about region based segmentation.
12. Explain the concept of Thresholding in image segmentation and discuss its
merits and limitations.