EC8093 - DIGITAL IMAGE PROCESSING NOTES
SNO CONTENT
UNIT I DIGITAL IMAGE FUNDAMENTALS
1. Introduction
2. Components
3. Steps in Digital Image Processing
4. Elements of Visual Perception
5. Image Sensing and Acquisition
6. Image Sampling and Quantization
7. Relationships between pixels
8. Color models
9. Part A Question and Answers
10. Part B Questions
UNIT II IMAGE ENHANCEMENT
1. Introduction
2. Spatial Domain
3. Gray level transformations
4. Histogram processing
5. Basics of Spatial Filtering
6. Smoothing and Sharpening Spatial Filtering
7. Frequency Domain: Introduction to Fourier Transform
8. Smoothing frequency domain filters – Ideal, Butterworth and Gaussian filters
9. Sharpening frequency domain filters – Ideal, Butterworth and Gaussian filters
10. Part A Question and Answers
11. Part B Questions
UNIT III IMAGE RESTORATION
1. Noise models – Mean Filters
2. Order Statistics
3. Adaptive filters
4. Band reject Filters – Band pass Filters
5. Notch Filters – Optimum Notch Filtering
6. Inverse Filtering
7. Wiener filtering
8. Blind Image Restoration
9. Morphological Processing
10. Erosion and Dilation
11. Part A Question and Answers
12. Part B Questions
UNIT IV IMAGE COMPRESSION
1. Introduction
2. Wavelets
3. Subband coding
4. Multiresolution expansions
5. Compression: Fundamentals
6. Image Compression models – Error Free Compression
4. Introduction
5. Boundary representation
6. Chain Code
7. Polygonal approximation, signature, boundary segments
8. Boundary description
9. Shape number
10. Fourier Descriptor
11. Moments – Regional Descriptors
12. Topological feature
13. Texture
14. Patterns and Pattern classes
15. Part A Question and Answers
16. Part B Questions
UNIT 1
DIGITAL IMAGE FUNDAMENTALS
REFERRED BOOK:
1. Rafael C. Gonzalez, Richard E. Woods - Digital Image Processing
2. Anil K. Jain - Fundamentals of Digital Image Processing
UNIT I
DIGITAL IMAGE FUNDAMENTALS
1. INTRODUCTION
Interest in digital image processing methods stems from two principal application areas:
improvement of pictorial information for human interpretation; and processing of image data for
storage, transmission, and representation for autonomous machine perception.
Definition of an Image:
An image may be defined as a two-dimensional function, f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a digital image. The field of digital image processing refers to processing digital images by means of a digital computer.
Note: A digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are referred to as picture elements, image elements, pels, and pixels. Pixel is the term most widely used to denote the elements of a digital image.
2. COMPONENTS
An image processing system can consist of a light source that illuminates the scene, a sensor system, a frame grabber that collects images, and a computer that stores the software packages needed to process the collected images. Some I/O interfaces, the computer screen, and output devices (e.g., printers) can be included as well.
Fig 1.1 Components of an image processing System
Page |5
ww
Computer: w.E Fig 1.2 Elements of Image processing system
The computer in an image processing system is a general-purpose computer and can range
asy
from a PC to a supercomputer. In dedicated applications, sometimes custom computers are
used to achieve a required level of performance. For general purpose image processing any
En
well-equipped PC-type machine can be used.
Software:
Software for image processing consists of specialized modules that perform specific tasks. A well-designed package also includes the capability for the user to write code that utilizes the specialized modules. More sophisticated software packages allow the integration of those modules and general-purpose software commands from at least one computer language.
Mass storage:
Mass storage capability is a must in image processing applications. When dealing with thousands, or even millions, of images, providing adequate storage in an image processing system can be a challenge.
Storage is measured in bytes (eight bits), Kbytes (one thousand bytes), Mbytes (one million bytes), Gbytes (meaning giga, or one billion, bytes), and Tbytes (meaning tera, or one trillion, bytes).
Digital storage for image processing applications falls into three principal categories:
a) Short-term storage for use during processing
One method of providing short-term storage is computer memory. Another is by specialized
boards, called frame buffers, that store one or more images and can be accessed rapidly,
usually at video rates (e.g., at 30 complete images per second).
It allows virtually instantaneous image zoom, as well as scroll (vertical shifts) and pan
(horizontal shifts). Frame buffers are housed in the specialized image processing hardware unit.
b) On-line storage for relatively fast recall
On-line storage generally takes the form of magnetic disks or optical-media storage. The key
factor characterizing on-line storage is frequent access to the stored data.
c) Archival storage, characterized by infrequent access.
Archival storage is characterized by massive storage requirements but infrequent need for
access. Magnetic tapes and optical disks housed in “jukeboxes” are the usual media for
archival applications.
Image displays:
Image displays in use are mainly color (preferably flat-screen) TV monitors. Monitors are driven by the outputs of image and graphics display cards that are an integral part of the computer system.
Hardcopy:
Hardcopy devices for recording images include laser printers, film cameras, heat-sensitive devices, inkjet units, and digital units such as optical and CD-ROM disks. Film provides the highest possible resolution, but paper is the obvious medium of choice for written material.
Networking:
Networking is almost a default function in any computer system in use today. Because of the large
amount of data inherent in image processing applications, the key consideration in image
transmission is bandwidth. In dedicated networks, this typically is not a problem, but
communications with remote sites via the Internet are not always as efficient.
3. ELEMENTS OF VISUAL PERCEPTION
Structure of the Human Eye:
This figure shows a simplified horizontal cross section of the human eye. The eye is nearly a sphere, with an average diameter of approximately 20 mm. Three membranes enclose the eye: the cornea and sclera outer cover, the choroid, and the retina. The cornea is a tough, transparent tissue that covers the anterior surface of the eye. Continuous with the cornea, the sclera is an opaque membrane that encloses the remainder of the optic globe.
The choroid lies directly below the sclera. This membrane contains a network of blood vessels that serve as the major source of nutrition to the eye. Even superficial injury to the choroid, often not deemed serious, can lead to severe eye damage as a result of inflammation that restricts blood flow. The choroid coat is heavily pigmented and hence helps to reduce the amount of extraneous light entering the eye and the backscatter within the optic globe. At its anterior extreme, the choroid is divided into the ciliary body and the iris. The latter contracts or expands to control the amount of light that enters the eye. The central opening of the iris (the pupil) varies in diameter from approximately 2 to 8 mm. The front of the iris contains the visible pigment of the eye, whereas the back contains a black pigment.
The lens is made up of concentric layers of fibrous cells and is suspended by fibers that attach to the ciliary body. It contains 60 to 70% water, about 6% fat, and more protein than any other tissue in the eye. The lens is colored by a slightly yellow pigmentation that increases with age. In extreme cases, excessive clouding of the lens, caused by the affliction commonly referred to as cataracts, can lead to poor color discrimination and loss of clear vision. The lens absorbs approximately 8% of the visible light spectrum, with relatively higher absorption at shorter wavelengths. Both infrared and ultraviolet light are absorbed appreciably by proteins within the lens structure and, in excessive amounts, can damage the eye.
The innermost membrane of the eye is the retina, which lines the inside of the wall’s
entire posterior portion. When the eye is properly focused, light from an object outside the eye
is imaged on the retina. Pattern vision is afforded by the distribution of discrete light receptors
over the surface of the retina.
There are two classes of receptors: cones and rods. The cones in each eye number
between 6 and 7 million. They are located primarily in the central portion of the retina, called
the fovea, and are highly sensitive to color. Humans can resolve fine details with these cones
largely because each one is connected to its own nerve end. Muscles controlling the eye rotate
the eyeball until the image of an object of interest falls on the fovea. Cone vision is called
photopic or bright-light vision.
Fig 1.3 Simplified horizontal cross section of the human eye
The number of rods is much larger: some 75 to 150 million are distributed over the retinal surface. The larger area of distribution, and the fact that several rods are connected to a single nerve end, reduce the amount of detail discernible by these receptors. Rods serve to give a general, overall picture of the field of view. They are not involved in color vision and are sensitive to low levels of illumination. For example, objects that appear brightly colored in daylight appear as colorless forms when seen by moonlight, because only the rods are stimulated. This phenomenon is known as scotopic or dim-light vision.
This figure shows the density of rods and cones for a cross section of the right eye
passing through the region of emergence of the optic nerve from the eye. The absence of
receptors in this area results in the so-called blind spot. Except for this region, the distribution
of receptors is radially symmetric about the fovea.
Receptor density is measured in degrees from the fovea (that is, in degrees off axis, as
measured by the angle formed by the visual axis and a line passing through the center of the
lens and intersecting the retina). In figure the cones are most dense in the center of the retina
(in the center area of the fovea). Note also that rods increase in density from the center out to approximately 20° off axis and then decrease in density out to the extreme periphery of the
retina. The fovea itself is a circular indentation in the retina of about 1.5 mm in diameter.
The focal length needed to achieve proper focus is obtained by varying the shape of the lens. The fibers in the ciliary body accomplish this, flattening or thickening the lens for distant or near objects, respectively. The distance between the center of the lens and the retina along the visual axis is approximately 17 mm. The range of focal lengths is approximately 14 mm to 17 mm, the latter taking place when the eye is relaxed and focused on distances greater than about 3 m.
Fig 1.4 Density of rods and cones for a cross section of the right eye
Fig 1.5 Graphical representation of the eye looking at a palm tree. Point C is the optical centre of the lens.
The geometry illustrates how to obtain the dimensions of an image formed on the retina. For example, suppose that a person is looking at a tree 15 m high at a distance of 100 m. If h is the height of the retinal image in mm, similar triangles give 15/100 = h/17, so h = 2.55 mm. The retinal image is focused primarily on the region of the fovea. Perception then takes place by the relative excitation of light receptors, which transform radiant energy into electrical impulses that ultimately are decoded by the brain.
Terminologies:
Brightness:
Brightness is an attribute of visual perception in which a source appears to be radiating
or reflecting light. In other words, brightness is the perception elicited by the luminance of a
visual target. This is a subjective attribute/property of an object being observed and one of the
color appearance parameters of color appearance models
Hue:
Hue is one of the main properties (called color appearance parameters) of a color,
defined technically (in the CIECAM02 model), as "the degree to which a stimulus can be
described as similar to or different from stimuli that are described as red, green, blue, and
yellow" .
Mach bands:
Mach bands are an optical illusion named after the physicist Ernst Mach. The illusion exaggerates the contrast between edges of slightly differing shades of gray, as soon as they contact one another, by triggering edge detection in the human visual system.
The Mach band effect is due to the spatial high-boost filtering performed by the human visual system on the luminance channel of the image captured by the retina. This filtering is largely performed in the retina itself, by lateral inhibition among its neurons.
5. IMAGE SENSING AND ACQUISITION
Incoming energy is transformed into a voltage by the combination of input electrical power and sensor material that is responsive to the particular type of energy being detected. The output voltage waveform is the response of the sensor(s), and a digital quantity is obtained from each sensor by digitizing its response.
For a single light sensor (for example, a photodiode), the output voltage waveform is proportional to the incident light. The use of a filter in front of a sensor improves selectivity. For example, a green (pass) filter in front of a light sensor favors light in the green band of the color spectrum. As a consequence, the sensor output will be stronger for green light than for other components in the visible spectrum. In order to generate a 2-D image using a single sensor, there has to be relative displacement in both the x- and y-directions between the sensor and the area to be imaged.
Figure 1.8 shows an arrangement used in high-precision scanning, where a film negative is mounted onto a drum whose mechanical rotation provides displacement in one dimension. The single sensor is mounted on a lead screw that provides motion in the perpendicular direction. Since mechanical motion can be controlled with high precision, this method is an inexpensive (but slow) way to obtain high-resolution images.
Fig 1.8 Combining a single sensor with motion to generate a 2D image.
Other similar mechanical arrangements use a flat bed, with the sensor moving in two linear directions. These types of mechanical digitizers are sometimes referred to as microdensitometers. Another example of imaging with a single sensor places a laser source coincident with the sensor. Moving mirrors are used to control the outgoing beam in a scanning pattern and to direct the reflected laser signal onto the sensor.
A geometry that is used much more frequently than single sensors consists of an in-line arrangement of sensors in the form of a sensor strip.
Fig 1.9 LINE SENSOR
The strip provides imaging elements in one direction. Motion perpendicular to the strip provides imaging in the other direction, as shown in (a). This is the type of arrangement used in most flat bed scanners. Sensing devices with 4000 or more in-line sensors are possible. In-line sensors are used routinely in airborne imaging applications, in which the imaging system is mounted on an aircraft that flies at a constant altitude and speed over the geographical area to be imaged.
One-dimensional imaging sensor strips that respond to various bands of the electromagnetic
spectrum are mounted perpendicular to the direction of flight. The imaging strip gives one line
of an image at a time, and the motion of the strip completes the other dimension of a two-
dimensional image. Lenses or other focusing schemes are used to project the area to be
scanned onto the sensors. Sensor strips mounted in a ring configuration are used in medical and industrial imaging to obtain cross-sectional (“slice”) images of 3-D objects.
This is the basis for computerized axial tomography (CAT): a rotating X-ray source provides illumination, and the sensors opposite the source collect the X-ray energy that passes through the object (the sensors obviously have to be sensitive to X-ray energy). It is important to note that the output of the sensors must be processed by reconstruction algorithms whose objective is to transform the sensed data into meaningful cross-sectional images. In other words, images are not obtained directly from the sensors by motion alone; they require extensive processing. A 3-D digital volume consisting of stacked images is generated as the object is moved in a direction perpendicular to the sensor ring. Other modalities of imaging based on the CAT principle include magnetic resonance imaging (MRI) and positron emission tomography (PET). The illumination sources, sensors, and types of images are different, but conceptually they are very similar to the basic imaging approach.
The key advantage of a 2-D sensor array is that a complete image can be obtained by focusing the energy pattern onto the surface of the array. The principal manner in which array sensors are used is shown in the figure.
w.E
asy
En
gi nee
rin
g.n
Fig 1.13 An Example of the digital image acquisition process
This figure shows the energy from an illumination source being reflected from a scene element but, as mentioned at the beginning of this section, the energy also could be transmitted through
the scene elements. The first function performed by the imaging system shown in Figure is to
collect the incoming energy and focus it onto an image plane. If the illumination is light, the front
end of the imaging system is a lens, which projects the viewed scene onto the lens focal plane,
as Figure shows. The sensor array, which is coincident with the focal plane, produces outputs
proportional to the integral of the light received at each sensor. Digital and analog circuitry
sweep these outputs and convert them to a video signal, which is then digitized by another
section of the imaging system. The output is a digital image, as shown diagrammatically in
Figure.
6. IMAGE SAMPLING AND QUANTIZATION
Introduction:
There are numerous ways to acquire images, but the objective is to generate digital images from sensed data. The output of most sensors is a continuous voltage waveform whose amplitude and spatial behavior are related to the physical phenomenon being sensed. To create a digital image, the continuous sensed data must be converted into digital form. This involves two processes: sampling and quantization.
Fig 1.14 Concept of sampling and quantization
The spatial location of each sample is indicated by a vertical tick mark in the bottom part
of the figure. The samples are shown as small white squares superimposed on the function.
The set of these discrete locations gives the sampled function. However, the values of the
samples still span (vertically) a continuous range of intensity values. In order to form a digital
function, the intensity values also must be converted (quantized) into discrete quantities.
The right side shows the intensity scale divided into eight discrete intervals, ranging from
black to white. The vertical tick marks indicate the specific value assigned to each of the eight
intensity intervals. The continuous intensity levels are quantized by assigning one of the eight
values to each sample.
The assignment is made depending on the vertical proximity of a sample to a vertical
tick mark. The digital samples resulting from both sampling and quantization. Starting at the top
of the image and carrying out this procedure line by line produces a two-dimensional digital
image. It is implied that, in addition to the number of discrete levels used, the accuracy achieved
in quantization is highly dependent on the noise content of the sampled signal.
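A minimal sketch (not part of the original notes) of the uniform quantization step described above, using NumPy. The eight-level case mirrors the example in the text; the array of sampled intensities in [0, 255] is a hypothetical input.

```python
import numpy as np

def quantize(f, levels=8):
    """Uniformly quantize intensities in [0, 255] into the given number of levels."""
    f = np.asarray(f, dtype=np.float64)
    step = 256.0 / levels                                   # width of each intensity interval
    indices = np.clip(np.floor(f / step), 0, levels - 1)    # interval index assigned to each sample
    # Map each index to a representative gray level (interval midpoint on the 0..255 scale)
    return ((indices + 0.5) * step).astype(np.uint8)

# Example: a 2x4 block of sampled intensities quantized to 8 discrete levels
samples = np.array([[12, 40, 97, 130], [160, 200, 230, 255]])
print(quantize(samples, levels=8))
```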
When a sensing strip is used for image acquisition, the number of sensors in the strip
establishes the sampling limitations in one image direction. Mechanical motion in the other
direction can be controlled more accurately, but it makes little sense to try to achieve sampling
density in one direction that exceeds the sampling limits established by the number of sensors
in the other. Quantization of the sensor outputs completes the process of generating a digital
image.
Fig 1.15 Quantized image
When a sensing array is used for image acquisition, there is no motion and the number of sensors in the array establishes the limits of sampling in both directions. Quantization of the sensor outputs is as before. This figure illustrates the concept: a continuous image is projected onto the plane of an array sensor. Clearly, the quality of a digital image is determined to a large degree by the number of samples and discrete intensity levels used in sampling and quantization.
Definition
a. Band limited function: a function f(x) is said to be band limited if its Fourier transform F(u) is zero outside a finite band of frequencies, i.e., F(u) = 0 for |u| > W for some finite W.
7. RELATIONSHIPS BETWEEN PIXELS
Neighbors of a pixel
Assuming that a pixel p has the coordinates (x, y), its horizontal and vertical neighbors have the coordinates (x+1, y), (x−1, y), (x, y+1), (x, y−1); this set is called the 4-neighbors of p, N4(p). Together with the four diagonal neighbors (x+1, y+1), (x+1, y−1), (x−1, y+1), (x−1, y−1), they form the 8-neighbors of p, N8(p).
If an image point falls in the neighborhood of one particular pixel, we then call this image point an adjacency of that pixel. Normally, there are two types of adjacency, namely 4-adjacency and 8-adjacency.
1) 4-adjacency: two pixels p and q are 4-adjacent if q is in the set N4(p).
2) 8-adjacency: two pixels p and q are 8-adjacent if q is in the set N8(p). One example is shown below, where the red "1s" form the 8-adjacency set.
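A small illustrative sketch (not from the notes) that enumerates N4(p) and N8(p) and checks 4-/8-adjacency for hypothetical pixel coordinates; bounds checking against the image size is omitted for brevity.

```python
def n4(x, y):
    """4-neighbors of pixel (x, y): horizontal and vertical neighbors."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def n8(x, y):
    """8-neighbors: the 4-neighbors plus the four diagonal neighbors."""
    diagonals = [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]
    return n4(x, y) + diagonals

def adjacent(p, q, kind=4):
    """True if q is in N4(p) (kind=4) or in N8(p) (kind=8)."""
    return q in (n4(*p) if kind == 4 else n8(*p))

print(adjacent((2, 2), (2, 3), kind=4))   # True: vertical neighbor
print(adjacent((2, 2), (3, 3), kind=4))   # False: diagonal pixels are only 8-adjacent
```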
Connectivity is related to, but differs from, adjacency. Two pixels from a subset G are connected if and only if there is a path linking them that consists entirely of pixels within G. If G is a connected subset, it is called a region of the image. A boundary is the group of pixels of a region that have one or more neighbors that do not belong to the region.
Distance Measures
Assume there are two image points with coordinates (x1, y1) and (x2, y2). A distance measure is normally used to evaluate how close these two pixels are and how they are related. A number of distance measures are commonly used for this purpose.
The Euclidean distance between the two points is defined as
De = sqrt[(x1 − x2)² + (y1 − y2)²]
The city-block distance between the two points is
D4 = |x1 − x2| + |y1 − y2|
and the chessboard distance between the two points is
D8 = max(|x1 − x2|, |y1 − y2|)
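A brief sketch (not from the notes) computing the three distance measures defined above for two assumed example points.

```python
import math

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def city_block(p, q):             # D4 distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chessboard(p, q):             # D8 distance
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (2, 3), (5, 7)
print(euclidean(p, q))    # 5.0
print(city_block(p, q))   # 7
print(chessboard(p, q))   # 4
```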
Definition of color:
Color is that aspect of visible radiant energy by which an observer may distinguish between different spectral compositions.
Color is generally characterized by attaching names to the different stimuli, e.g. white, gray, black, red, green, blue. Color stimuli are generally more pleasing to the eye than “black and white” stimuli. Consequently, pictures with color are widespread in TV, photography, and printing.
Color Models
The purpose of a color model (also called color space or color system) is to facilitate the specification of colors in some standard, generally accepted way. In essence, a color model is a specification of a coordinate system and a subspace within that system where each color is represented by a single point.
Most color models in use today are oriented either toward hardware (such as for color monitors and printers) or toward applications where color manipulation is a goal (such as in the creation of color graphics for animation).
In terms of digital image processing, the hardware-oriented models most commonly used in practice are the RGB (red, green, blue) model for color monitors and a broad class of color video cameras; the CMY (cyan, magenta, yellow) and CMYK (cyan, magenta, yellow, black) models for color printing; and the HSI (hue, saturation, intensity) model, which corresponds closely with the way humans describe and interpret color.
The HSI model also has the advantage that it decouples the color and gray-scale information in an image, making it suitable for many of the gray-scale techniques developed in this book. There are numerous color models in use today due to the fact that color science is a broad field that encompasses many areas of application.
The RGB Color Model
Fig 1.16 RGB 24-bit color cube
In this model, the gray scale (points of equal RGB values) extends from black to white along the line joining these two points. The different colors in this model are points on or inside the cube, and are defined by vectors extending from the origin. For convenience, the assumption is that all color values have been normalized so that the cube shown in Fig. 1.16 is the unit cube; that is, all values of R, G, and B are assumed to be in the range [0, 1]. Images represented in the RGB color model consist of three component images, one for each primary color. When fed into an RGB monitor, these three images combine on the screen to produce a composite color image. The number of bits used to represent each pixel in RGB space is called the pixel depth.
Consider an RGB image in which each of the red, green, and blue images is an 8-bit image. Under these conditions each RGB color pixel [that is, a triplet of values (R, G, B)] is said to have a depth of 24 bits (3 image planes times the number of bits per plane).
The term full-color image is often used to denote a 24-bit RGB color image. The total number of colors in a 24-bit RGB image is (2^8)^3 = 16,777,216. Figure 1.16 shows the 24-bit RGB color cube.
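A tiny sketch (not from the notes) working out the pixel depth and total color count for an 8-bit-per-plane RGB image, and normalizing one assumed pixel to the unit cube.

```python
bits_per_plane = 8
planes = 3
pixel_depth = bits_per_plane * planes            # 24 bits per pixel
total_colors = (2 ** bits_per_plane) ** planes   # 16,777,216 distinct colors

# Normalize an 8-bit RGB triplet to the unit cube [0, 1]^3
r, g, b = 255, 128, 0
rgb_normalized = (r / 255.0, g / 255.0, b / 255.0)
print(pixel_depth, total_colors, rgb_normalized)
```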
The HSI Color Model
As we have seen, creating colors in the RGB and CMY models and changing from one
model to the other is a straightforward process. These color systems are ideally suited for
hardware implementations. In addition, the RGB system matches nicely with the fact that the
human eye is strongly perceptive to red, green, and blue primaries. Unfortunately, the RGB, CMY, and other similar color models are not well suited for describing colors in terms that are
practical for human interpretation. For example, one does not refer to the color of an automobile
by giving the percentage of each of the primaries composing its color. Furthermore, we do not
think of color images as being composed of three primary images that combine to form that
single image.
When humans view a color object, we describe it by its hue, saturation, and brightness.
The hue is a color attribute that describes a pure color (pure yellow, orange, or red), whereas
saturation gives a measure of the degree to which a pure color is diluted by white light. Brightness is a subjective descriptor that is practically impossible to measure.
It embodies the achromatic notion of intensity and is one of the key factors in describing color sensation. The intensity (gray level) is a most useful descriptor of monochromatic images. This quantity definitely is measurable and easily interpretable. The model we are about to present, called the HSI (hue, saturation, intensity) color model, decouples the intensity component from the color-carrying information (hue and saturation) in a color image.
As a result, the HSI model is an ideal tool for developing image processing algorithms based on color descriptions that are natural and intuitive to humans, who are the developers and users of these algorithms.
In the arrangement shown in Fig. 1.17, the line (intensity axis) joining the black and white vertices is vertical. To determine the intensity component of any color point, pass a plane perpendicular to the intensity axis and containing the color point. The intersection of the plane with the intensity axis gives a point with intensity value in the range [0, 1]. The saturation of points on the intensity axis is zero, as evidenced by the fact that all points along this axis are gray.
Fig 1.17 Conceptual relationships between the RGB and HSI color Models
In order to see how hue can be determined from a given RGB point, consider Fig. 1.17(b), which shows a plane defined by three points (black, white, and cyan). The fact that the black and white points are contained in the plane tells us that the intensity axis also is contained in the plane.
All points contained in the plane segment defined by the intensity axis and the boundaries of the cube have the same hue (cyan in this case). By rotating the shaded plane about the vertical intensity axis, we would obtain different hues.
Fig 1.18 Hue and saturation in the HSI color Model
From these concepts we arrive at the conclusion that the hue, saturation, and intensity values required to form the HSI space can be obtained from the RGB color cube. That is, it is possible to convert any RGB point to a corresponding point in the HSI color model by using geometrical formulas.
The important components of the HSI color space are the vertical intensity axis, the length of the vector to a color point, and the angle this vector makes with the red axis. Therefore, it is not unusual to see the HSI planes defined in terms of a hexagon, a triangle, or even a circle, as Fig. 1.19 shows. The shape chosen does not matter because any one of these shapes can be warped into one of the other two by a geometric transformation. Figure 1.19 shows the HSI model based on color triangles and also on circles.
Given an image in RGB color format, the H component of each RGB pixel is obtained using the equation
H = θ if B ≤ G, and H = 360° − θ if B > G, where
θ = cos⁻¹ { ½[(R − G) + (R − B)] / [(R − G)² + (R − B)(G − B)]^(1/2) }
The saturation component is given by
S = 1 − 3 min(R, G, B) / (R + G + B)
and the intensity component is given by
I = (R + G + B) / 3
It is assumed that the RGB values have been normalized to the range [0, 1], and that the angle θ is measured with respect to the red axis of the HSI space, as indicated in Fig. 1.17. Hue can be normalized to the range [0, 1] by dividing by 360°. The other two HSI components already are in this range if the given RGB values are in the interval [0, 1].
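A minimal sketch (not from the notes) of the RGB-to-HSI conversion described above for a single normalized pixel; it assumes R, G, B are already in [0, 1] and returns H in degrees.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (each value in [0, 1]) to (H in degrees, S, I)."""
    eps = 1e-12  # guard against division by zero for gray pixels
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / (i + eps)   # equals 1 - 3*min/(R+G+B)
    return h, s, i

print(rgb_to_hsi(1.0, 0.0, 0.0))   # pure red  -> hue near 0 degrees
print(rgb_to_hsi(0.0, 0.0, 1.0))   # pure blue -> hue near 240 degrees
```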
Fig 1.19 The HSI color model based on (a) Triangular,(b) Circular color planes.
PART A
1. Define Image.
An Image may be defined as a two dimensional function f(x,y) where x & y are spatial(plane)
coordinates, and the amplitude of f at any pair of coordinates (x,y) is called intensity or gray level
of the image at that point. When x,y and the amplitude values of f are all finite, discrete quantities
we call the image as Digital Image.
2. Define Image Sampling.
Digitization of spatial coordinates (x,y) is called Image Sampling. To be suitable for computer
processing, an image function f(x,y) must be digitized both spatially and in magnitude.
3. Define Quantization.
Digitizing the amplitude values is called Quantization. The quality of a digital image is determined to a large degree by the number of samples and discrete gray levels used in sampling and quantization.
4. Define Dynamic Range.
The range of values spanned by the gray scale is called the dynamic range of an image. An image will have high contrast if the dynamic range is high, and will have a dull, washed-out gray look if the dynamic range is low.
5. Define Mach band effect.
The spatial interaction of Luminance from an object and its surround creates a phenomenon called
the mach band effect.
6. Define Brightness.
Brightness of an object is the perceived luminance of the surround. Two objects with different
surroundings would have identical luminance but different brightness.
11. Define Hue and Saturation.
Hue is a color attribute that represents the dominant color as perceived by an observer, whereas saturation gives a measure of the degree to which a pure color is diluted by white light.
12. List the applications of color models.
The applications of color models are,
i. RGB model --- used for color monitors & color video cameras
ii. CMY model --- used for color printing
iii. HSI model --- used for color image processing
13. What is Chromatic Adaptation?
The hue of a perceived color depends on the adaptation of the viewer. For example, the American flag will not immediately appear red, white, and blue if the viewer has been subjected to high-intensity red light before viewing the flag. The color of the flag will appear to shift in hue toward the red complement, cyan.
14. Define Resolution.
Resolution is the smallest discernible detail in an image. Spatial resolution is the smallest discernible detail in an image, and gray-level resolution refers to the smallest discernible change in gray level.
15. Write the M X N digital image in compact matrix form.
f(x, y) =
[ f(0,0)     f(0,1)     ...  f(0,N-1)
  f(1,0)     f(1,1)     ...  f(1,N-1)
  ...        ...        ...  ...
  f(M-1,0)   f(M-1,1)   ...  f(M-1,N-1) ]
16. Write the expression to find the number of bits required to store a digital image.
The number of bits required to store a digital image is b = M × N × k. When M = N, this becomes b = N²k.
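A small worked sketch (not from the notes) of this formula for an assumed 1024 × 1024 image with k = 8 bits per pixel.

```python
def image_storage_bits(M, N, k):
    """Number of bits needed to store an M x N image with k bits per pixel: b = M*N*k."""
    return M * N * k

bits = image_storage_bits(1024, 1024, 8)
print(bits, "bits =", bits // 8, "bytes =", bits // (8 * 1024), "Kbytes")
# 8388608 bits = 1048576 bytes = 1024 Kbytes (1 Mbyte)
```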
17. What is meant by pixel?
A digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are referred to as pixels, image elements, picture elements, or pels.
18. Define digital image.
A digital image is an image f(x,y), that has been discretized both in spatial coordinates and
brightness.
19. List the steps involved in digital image processing.
The steps involved in digital image processing are,
i.Image Acquisition.
ii.Preprocessing.
iii.Segmentation.
iv. Representation and Description.
v. Recognition and Interpretation.
20. Define Recognition and Interpretation.
Recognition is a process that assigns a label to an object based on the information provided by its descriptors. Interpretation means assigning meaning to a recognized object.
21. Specify the elements of DIP system.
The elements of DIP system are,
i.Image acquisition.ii.Storage.iii.Processing.iv.Communication.v.Display.
22. List the categories of digital storage.
The categories of digital storage are,
i. Short-term storage for use during processing.
ii. On-line storage for relatively fast recall.
iii. Archival storage, characterized by infrequent access.
23. Write the two types of light receptors.
The two types of light receptors are, i.Cones. ii.Rods.
24. How are cones and rods distributed in the retina?
In each eye, cones are in the range 6-7 million, and rods are in the range 75-150 million.
25. Define subjective brightness and brightness adaptation.
Subjective brightness is intensity as perceived by the human visual system. Brightness adaptation means that the human visual system cannot operate over the entire range from the scotopic threshold to the glare limit simultaneously; it accomplishes this large variation by changing its overall sensitivity.
Zooming may be viewed as over sampling. It involves the creation of new pixel locations and the
assignment of gray levels to those new locations.
Radiance is the total amount of energy that flows from the light source, and it is usually measured
in watts (w).
32. Define the term Luminance.
Luminance, measured in lumens (lm), gives a measure of the amount of energy an observer perceives from a light source.
33. What is an image transform?
An image can be expanded in terms of a discrete set of basis arrays called basis images. These basis images can be generated by unitary matrices. Alternatively, a given N×N image can be viewed as an N²×1 vector. An image transform provides a set of coordinates or basis vectors for the vector space.
34. List the applications of transforms.
The applications of transforms are,
i. To reduce bandwidth
ii. To reduce redundancy
iii. To extract features.
35. What are the properties of unitary transforms?
The determinant and the eigenvalues of a unitary matrix have unit magnitude.
36. List the applications of DIP.
The applications of DIP are,
i.Remote sensing
ii.Image transmission and storage for business application
iii.Medical imaging
iv.Astronomy
v. Gamma-ray imaging
vi. X-ray imaging
vii. Imaging in the UV band
viii. Ultrasound imaging
40. What is aliasing and how can it be reduced?
Aliasing is an unwanted effect which is always present in a sampled image. A function is undersampled if the sampling frequency is too low to satisfy the Shannon sampling theorem. In this condition, additional frequency components are introduced into the sampled function and corrupt the sampled image. This effect is known as aliasing.
Aliasing can be decreased by reducing the high-frequency components, which is done by blurring or smoothing the image before sampling.
41. Mention the difference between a monochrome and a grayscale image. (Nov-2013)
Monochrome image – each pixel is stored as a single bit, either ‘0’ or ‘1’.
Grayscale image – each pixel is stored as a byte, with a value between 0 and 255.
PART B
1. Explain the structure of the human eye.
2. Explain the RGB model. (Nov/Dec 2011)
3. Describe the HSI color image model.
4. Explain sampling and quantization.
5. Explain the Mach band effect.
6. Explain color image fundamentals.
7. Describe the basic relationship between the pixels.
8. What is Frame buffer? Discuss the categories of digital storage for image processing
applications. (Nov/Dec 2012)
9. Write notes on elements of visual perception.
UNIT 2
IMAGE ENHANCEMENT
REFERRED BOOK:
1. Rafael C. Gonzalez, Richard E. Woods - Digital Image Processing
2. Anil K. Jain - Fundamentals of Digital Image Processing
UNIT II
IMAGE ENHANCEMENT
1. INTRODUCTION
Image enhancement refers to accentuation, or sharpening, of image features such as edges,
boundaries, or contrast to make a graphic display more useful for display and analysis. The
enhancement process does not increase the inherent information content in the data. But it
does increase the dynamic range of the chosen features so that they can be detected easily.
Image enhancement includes gray level and contrast manipulation, noise reduction, edge
crispening and sharpening, filtering, interpolation and magnification, pseudo coloring, and so
on. The greatest difficulty in image enhancement is quantifying the criterion for enhancement.
Therefore, a large number of image enhancement techniques are empirical and require
interactive procedures to obtain satisfactory results. However, image enhancement remains a
very important topic because of its usefulness in virtually all image processing applications.
2. Spatial Domain
The term spatial domain refers to the aggregate of pixels composing an image. Spatial domain methods are procedures that operate directly on these pixels and are denoted by the expression
g(x, y) = T[f(x, y)]
where f(x, y) is the input image, g(x, y) is the processed image, and T is an operator on f, defined over some neighborhood of (x, y). In addition, T can operate on a set of input images, such as performing the pixel-by-pixel sum of K images for noise reduction.
The principal approach in defining a neighborhood about a point (x, y) is to use a square or rectangular subimage area centered at (x, y), as the figure shows. The center of the subimage is moved from pixel to pixel starting, say, at the top left corner.
Fig 2.1 3×3 neighbourhood about a point (x, y) in an image in the spatial domain.
The operator T is applied at each location (x, y) to yield the output, g, at that location. The process utilizes only the pixels in the area of the image spanned by the neighborhood. Although other neighborhood shapes, such as approximations to a circle, sometimes are used, square and rectangular arrays are by far the most predominant because of their ease of implementation.
The simplest form of T is when the neighborhood is of size 1×1 (that is, a single pixel). In this case, g depends only on the value of f at (x, y), and T becomes a gray-level (also called an intensity or mapping) transformation function of the form
s = T(r)
where, for simplicity in notation, r and s are variables denoting, respectively, the gray level of f(x, y) and g(x, y) at any point (x, y). For example, if T(r) has the form shown in the figure, the effect of this transformation would be to produce an image of higher contrast than the original by darkening the levels below m and brightening the levels above m in the original image. In this technique, known as contrast stretching, the values of r below m are compressed by the transformation function into a narrow range of s, toward black.
Fig 2.3 Intensity transformation function-Threshold function
3. Gray Level Transformations
As an introduction to gray-level transformations, there are three basic types of functions used frequently for image enhancement: linear (negative and identity transformations), logarithmic (log and inverse-log transformations), and power-law (nth power and nth root transformations).
Fig 2.4 Some Basic Intensity transformation functions
Image Negatives
The negative of an image with gray levels in the range [0, L−1] is obtained by using the negative transformation, which is given by the expression
s = L − 1 − r
Reversing the intensity levels of an image in this manner produces the equivalent of a photographic negative. This type of processing is particularly suited for enhancing white or gray detail embedded in dark regions of an image, especially when the black areas are dominant in size.
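A minimal sketch (not from the notes) of the negative transformation s = L − 1 − r applied to an 8-bit image array (L = 256 assumed).

```python
import numpy as np

def negative(f, L=256):
    """Photographic-negative transformation: s = L - 1 - r."""
    return (L - 1 - f.astype(np.int32)).astype(np.uint8)

f = np.array([[0, 64], [128, 255]], dtype=np.uint8)
print(negative(f))   # [[255 191] [127   0]]
```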
Log Transformations
The general form of the log transformation is
s = c log(1 + r)
where c is a constant, and it is assumed that r ≥ 0. The shape of the log curve shows that this
transformation maps a narrow range of low gray-level values in the input image into a wider range
of output levels. The opposite is true of higher values of input levels. We would use a transformation
of this type to expand the values of dark pixels in an image while compressing the higher-level
values. The opposite is true of the inverse log transformation. Any curve having the general shape
of the log functions would accomplish this spreading/compressing of gray levels in an image. The
log function has the important characteristic that it compresses the dynamic range of images with
large variations in pixel values.
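A short sketch (not from the notes) of the log transformation; here c is chosen so that an 8-bit input spans [0, 255] at the output, which is an assumed scaling choice rather than a prescribed value.

```python
import numpy as np

def log_transform(f, L=256):
    """s = c * log(1 + r), with c scaled so the maximum input maps to L-1."""
    c = (L - 1) / np.log(L)                 # assumed scaling constant
    return (c * np.log1p(f.astype(np.float64))).astype(np.uint8)

f = np.array([[0, 10, 100, 255]], dtype=np.uint8)
print(log_transform(f))   # dark values are expanded, bright values compressed
```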
Power-Law Transformations
Power-law transformations have the basic form
s = c r^γ
where c and γ are positive constants. The equation can also be written as
s = c (r + ε)^γ
to account for an offset (a measurable output when the input is zero). However, offsets typically are an issue of display calibration and as a result they are normally ignored.
Plots of s versus r for various values of γ are shown in the figure. As in the case of the log transformation, power-law curves with fractional values of γ map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher values of input levels. Unlike the log function, however, we notice here a family of possible transformation curves obtained simply by varying γ. As expected, curves generated with values of γ > 1 have exactly the opposite effect as those generated with values of γ < 1. Finally, the equation reduces to the identity transformation when c = γ = 1.
A variety of devices used for image capture, printing, and display respond according to a power law. By convention, the exponent in the power-law equation is referred to as gamma. The process used to correct this power-law response phenomenon is called gamma correction.
Fig 2.5 Plots of the equation for various values of γ
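A brief sketch (not from the notes) of the power-law transformation s = c·r^γ on a normalized 8-bit image; the γ values used are only illustrative assumptions.

```python
import numpy as np

def power_law(f, gamma, c=1.0, L=256):
    """Gamma correction: normalize to [0,1], apply s = c * r**gamma, rescale to [0, L-1]."""
    r = f.astype(np.float64) / (L - 1)
    s = c * np.power(r, gamma)
    return (np.clip(s, 0, 1) * (L - 1)).astype(np.uint8)

f = np.array([[0, 64, 128, 255]], dtype=np.uint8)
print(power_law(f, gamma=0.5))   # gamma < 1 brightens dark values
print(power_law(f, gamma=2.0))   # gamma > 1 darkens the image
```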
Contrast stretching
One of the simplest piecewise linear functions is a contrast-stretching transformation. Low-contrast
images can result from poor illumination, lack of dynamic range in the imaging sensor, or even
wrong setting of a lens aperture during image acquisition. The idea behind contrast stretching is to
increase the dynamic range of the gray levels in the image being processed. Figure shows a typical
transformation used for contrast stretching. The locations of points (r1, s1) and (r2, s2) control the
shape of the transformation function. If r1 = s1 and r2 = s2, the transformation is a linear function that produces no changes in gray levels. If r1 = r2, s1 = 0 and s2 = L−1, the transformation becomes a thresholding function that creates a binary image. Intermediate values of (r1, s1) and (r2, s2) produce
various degrees of spread in the gray levels of the output image, thus affecting its contrast.
Fig 2.6 Contrast stretching a) Form of transformation function b)Low contrast image c)
Result of contrast stretching d) Thresholding
In general, r1 ≤ r2 and s1 ≤ s2 is assumed so that the function is single valued and monotonically increasing. This condition preserves the order of gray levels, thus preventing the creation of intensity artifacts in the processed image. Figure (b) shows an 8-bit image with low contrast. Figure (c) shows the result of contrast stretching, obtained by setting (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L−1), where rmin and rmax denote the minimum and maximum gray levels in the image, respectively. Thus, the transformation function stretched the levels linearly from their original range to the full range [0, L−1]. Finally, Figure (d) shows the result of using the thresholding function defined previously, with r1 = r2 = m, the mean gray level in the image.
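A compact sketch (not from the notes) of piecewise-linear contrast stretching with control points (r1, s1) and (r2, s2); the example call implements the min-max stretch described above, and replicate handling of degenerate segments is an implementation assumption.

```python
import numpy as np

def contrast_stretch(f, r1, s1, r2, s2, L=256):
    """Piecewise-linear mapping through (0,0), (r1,s1), (r2,s2), (L-1,L-1)."""
    r = f.astype(np.float64)
    out = np.piecewise(
        r,
        [r < r1, (r >= r1) & (r <= r2), r > r2],
        [lambda r: s1 / max(r1, 1) * r,
         lambda r: s1 + (s2 - s1) / max(r2 - r1, 1) * (r - r1),
         lambda r: s2 + (L - 1 - s2) / max(L - 1 - r2, 1) * (r - r2)],
    )
    return np.clip(out, 0, L - 1).astype(np.uint8)

f = np.array([[90, 120, 150, 180]], dtype=np.uint8)
# Min-max stretch: (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, 255)
print(contrast_stretch(f, f.min(), 0, f.max(), 255))
```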
Fig 2.7 a. This transformation highlights intensity range [A, B] and reduces all other intensities to a lower level.
b. This transformation highlights intensity range [A, B] and preserves all other intensity levels.
Gray-level slicing: highlighting a specific range of gray levels in an image is often desired. One approach is to display a high value for all gray levels in the range of interest and a low value for all other gray levels. This transformation, shown in Figure (a), produces a binary image. The second approach, based on the transformation shown in Figure (b), brightens the desired range of gray levels but preserves the background and gray-level tonalities in the image. Figure (c) shows a gray-scale image, and Figure (d) shows the result of using the transformation in Figure (a).
2. HISTOGRAM
2.1 Definition:
The histogram of a digital image with gray levels in the range [0,L-1] is a discrete function
h(rk) =nk , where rk is the gray level and nk is the number of pixels in the image having gray
level rk . It is common practice to normalize a histogram by dividing each of its values by the
total number of pixels in the image, denoted by n. Thus, a normalized histogram is given by
p(rk)= nk /n , for k=0,1,...,L-1.p(rk) gives an estimate of the probability of occurrence of gray level
rk .
Histograms are the basis for numerous spatial domain processing techniques. Histogram
manipulation can be used effectively for image enhancement. Histogram can also be used for
image compression and segmentation. Histograms are simple to calculate in software and also
lend themselves to economic hardware implementations, thus making them a popular tool for
real-time image processing.
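A short sketch (not from the notes) that computes the histogram h(rk) = nk and the normalized histogram p(rk) = nk / n for an 8-bit image.

```python
import numpy as np

def histogram(f, L=256):
    """Return counts n_k and normalized histogram p(r_k) = n_k / n for gray levels 0..L-1."""
    counts = np.bincount(f.ravel(), minlength=L)
    return counts, counts / f.size

f = np.array([[0, 1, 1, 2], [2, 2, 3, 3]], dtype=np.uint8)
n_k, p_k = histogram(f)
print(n_k[:4])   # [1 2 3 2]
print(p_k[:4])   # [0.125 0.25 0.375 0.25]
```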
2.3 Histogram Equalization
Consider transformations of the form
s = T(r), 0 ≤ r ≤ 1
that produce a level s for every pixel value r in the original image. For reasons that will become
obvious shortly, we assume that the transformation function T(r) satisfies the following conditions:
(a) T(r) is single-valued and monotonically increasing in the interval 0 ≤ r ≤ 1
(b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1
The requirement in (a) that T(r) be single valued is needed to guarantee that the inverse transformation will exist, and the monotonicity condition preserves the increasing order from black to white in the output image. A transformation function that is not monotonically increasing could result in at least a section of the intensity range being inverted, thus producing some inverted gray levels in the output image. Condition (b) guarantees that the output gray levels will be in the same range as the input levels.
Figure 2.8 gives an example of a transformation function that satisfies these two conditions. The inverse transformation from s back to r is denoted
r = T⁻¹(s), 0 ≤ s ≤ 1
Fig. 2.8 Gray-level transformation function that is both single valued and monotonically increasing.
The gray levels in an image may be viewed as random variables in the interval [0,1].One
of the most fundamental descriptors of a random variable is its probability density function (PDF).
Let pr(r) and ps(s) denote the probability density functions of random variables r and s,
respectively, where the subscripts on p are used to denote that pr and ps are different functions.
A basic result from an elementary probability theory is that, if pr(r) and T(r) are known
and satisfies condition (a), then the probability density function ps(s) of the transformed variable
s can be obtained using a rather simple formula:
ps(s) = pr(r) |dr/ds|
A transformation function of particular importance in image processing has the form
s = T(r) = ∫0^r pr(ω) dω
where ω is a dummy variable of integration.
The right side of this equation is recognized as the cumulative distribution function (CDF) of random variable r. Since probability density functions are always positive, and recalling that the integral of a function is the area under the function, it follows that this transformation function is single valued and monotonically increasing and, therefore, satisfies condition (a). Similarly, the integral of a probability density function for variables in the range [0, 1] also is in the range [0, 1], so condition (b) is satisfied as well.
For discrete values, the probability of occurrence of gray level rk in an image is approximated by
pr(rk) = nk / n,  k = 0, 1, 2, ..., L−1
where n is the total number of pixels in the image, nk is the number of pixels that have gray level rk, and L is the total number of possible gray levels in the image. The discrete version of the transformation function is
sk = T(rk) = Σ (j = 0 to k) pr(rj) = Σ (j = 0 to k) nj / n,  k = 0, 1, 2, ..., L−1
Thus, a processed (output) image is obtained by mapping each pixel with level rk in the
input image into a corresponding pixel with level sk in the output image.
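A compact sketch (not from the notes) of discrete histogram equalization using the cumulative sum above; it assumes an 8-bit input and rescales sk to [0, 255] for display.

```python
import numpy as np

def equalize(f, L=256):
    """Map each level r_k to s_k = (L-1) * sum_{j<=k} n_j / n (discrete histogram equalization)."""
    counts = np.bincount(f.ravel(), minlength=L)
    cdf = np.cumsum(counts) / f.size                    # s_k values in [0, 1]
    mapping = np.round((L - 1) * cdf).astype(np.uint8)  # rescale to displayable gray levels
    return mapping[f]                                   # apply the transformation to every pixel

f = np.array([[52, 55, 61], [59, 79, 61], [85, 170, 200]], dtype=np.uint8)
print(equalize(f))
```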
Histogram Specification
Suppose the random variable u ≥ 0 with probability density pu(u) is to be transformed to v ≥ 0 such that it has a specified probability density pv(v). For this to be true, we define a uniform random variable
w = Fu(u) = ∫0^u pu(α) dα
which must also equal the distribution function of the desired variable, w = Fv(v). Eliminating w, we obtain
v = Fv⁻¹(Fu(u))
If u and v are given as discrete random variables that take values xi and yi, i = 0, 1, ..., L−1, with probabilities pu(xi) and pv(yi) respectively, then v can be implemented approximately as follows. Define
W = Σ (j = 0 to k) pu(xj)  for the input value u = xk,  and  W̃n = Σ (j = 0 to n) pv(yj)
Choose the smallest value of n such that W̃n − W ≥ 0. Then v = yn is the output corresponding to u. Figure 2.9 shows this algorithm.
Fig.2.9 Histogram Specification
Basics of Spatial Filtering
If the operation performed on the image pixels is linear, then the filter is called a linear spatial filter; otherwise it is a nonlinear spatial filter.
Consider the mechanics of linear spatial filtering using a 3×3 neighborhood. At any point (x, y) in the image, the response g(x, y) of the filter is the sum of products of the filter coefficients and the image pixels encompassed by the filter:
g(x, y) = w(−1,−1)f(x−1,y−1) + w(−1,0)f(x−1,y) + ... + w(0,0)f(x,y) + ... + w(1,1)f(x+1,y+1)
Observe that the center coefficient of the filter, w(0,0), aligns with the pixel at location (x, y). For a mask of size m×n, we assume that m = 2a+1 and n = 2b+1, where a and b are positive integers. This means the focus is on filters of odd size.
In general, linear spatial filtering of an image of size M×N with a filter of size m×n is given by the expression
g(x, y) = Σ (s = −a to a) Σ (t = −b to b) w(s, t) f(x+s, y+t)
where x and y are varied so that each pixel in w visits every pixel in f.
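A minimal sketch (not from the notes) of this sum-of-products filtering for a mask of odd size; handling the image borders by replicate padding is an implementation assumption, not something specified in the text.

```python
import numpy as np

def spatial_filter(f, w):
    """Linear spatial filtering: g(x,y) = sum_s sum_t w(s,t) f(x+s, y+t)."""
    m, n = w.shape                        # mask size, assumed odd (m = 2a+1, n = 2b+1)
    a, b = m // 2, n // 2
    padded = np.pad(f.astype(np.float64), ((a, a), (b, b)), mode="edge")
    g = np.zeros(f.shape, dtype=np.float64)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            region = padded[x:x + m, y:y + n]   # neighborhood centered on (x, y)
            g[x, y] = np.sum(w * region)
    return g

f = np.arange(25, dtype=np.float64).reshape(5, 5)
box = np.ones((3, 3)) / 9.0               # 3x3 averaging (box) mask
print(spatial_filter(f, box))
```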
6.1 SMOOTHING SPATIAL FILTERS
Smoothing filters are used for blurring and for noise reduction. Blurring is used in preprocessing tasks, such as removal of small details from an image prior to large object extraction, and bridging of small gaps in lines or curves. Noise reduction can be accomplished by blurring with a linear filter and also by nonlinear filtering.
The output (response) of a smoothing, linear spatial filter is simply the average of the pixels contained in the neighborhood of the filter mask. These are also called averaging filters and are also referred to as lowpass filters.
The idea behind smoothing filters is straightforward. By replacing the value of every pixel in an image by the average of the intensity levels in the neighborhood defined by the filter mask, this process results in an image with reduced sharp transitions in intensities. Because random noise typically consists of sharp transitions in intensity levels, the most obvious application of smoothing is noise reduction. However, edges also are characterized by sharp intensity transitions, so averaging filters have the undesirable side effect that they blur edges.
Another application of this type of process includes the smoothing of false contours that result from using an insufficient number of gray levels. A major use of averaging filters is in the reduction of “irrelevant” detail in an image. By “irrelevant” we mean pixel regions that are small with respect to the size of the filter mask.
The figure shows two 3×3 smoothing filters. Use of the first filter yields the standard average of the pixels under the mask:
R = (1/9) Σ (i = 1 to 9) zi
which is the average of the gray levels of the pixels in the 3×3 neighborhood defined by the mask.
Note that, instead of being 1/9, the coefficients of the filter are all 1's. The idea here is that it is computationally more efficient to have coefficients valued 1. At the end of the filtering process the entire image is divided by 9. An m×n mask would have a normalizing constant equal to 1/mn. A spatial averaging filter in which all coefficients are equal is sometimes called a box filter.
The second mask yields a so-called weighted average, terminology used to indicate that pixels are multiplied by different coefficients, thus giving more importance (weight) to some pixels at the expense of others. In this mask the pixel at the center is multiplied by a higher value than any other, thus giving this pixel more importance in the calculation of the average. The other pixels are inversely weighted as a function of their distance from the center of the mask.
Fig 2.11 3×3 smoothing (averaging) filter masks
The diagonal terms are further away from the center than the orthogonal neighbors (by a factor of √2) and, thus, are weighted less than these immediate neighbors of the center pixel. The basic strategy behind weighting the center point the highest and then reducing the value of the coefficients as a function of increasing distance from the origin is simply an attempt to reduce blurring in the smoothing process. We could have picked other weights to accomplish the same general objective. However, the sum of all the coefficients in the mask is equal to 16, an attractive feature for computer implementation because it is an integer power of 2. In practice, it is difficult in general to see differences between images smoothed by using either of the masks or similar arrangements, because the area these masks span at any one location in an image is so small.
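A short sketch (not from the notes) defining the 3×3 box mask and the weighted-average mask described above and applying them with SciPy; the use of scipy.ndimage.convolve with replicate ("nearest") borders is an implementation assumption.

```python
import numpy as np
from scipy import ndimage

box = np.ones((3, 3)) / 9.0                   # standard 3x3 averaging (box) filter

weighted = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]]) / 16.0        # weighted average; coefficients sum to 16

f = np.random.randint(0, 256, (6, 6)).astype(np.float64)
print(ndimage.convolve(f, box, mode="nearest"))
print(ndimage.convolve(f, weighted, mode="nearest"))
```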
Order-Statistics Filters
Order-statistics filters are nonlinear spatial filters whose response is based on ordering
(ranking) the pixels contained in the image area encompassed by the filter, and then replacing the
value of the center pixel with the value determined by the ranking result. The best-known example
in this category is the median filter, which, as its name implies, replaces the value of a pixel by the
median of the gray levels in the neighborhood of that pixel (the original value of the pixel is included
in the computation of the median). Median filters are quite popular because, for certain types of
random noise, they provide excellent noise-reduction capabilities, with considerably less blurring
than linear smoothing filters of similar size. Median filters are particularly effective in the presence
of impulse noise, also called salt-and-pepper noise because of its appearance as white and black
dots superimposed on an image. The median, j, of a set of values is such that half the values in
the set are less than or equal to j, and half are greater than or equal to j. In order to perform median
filtering at a point in an image, we first sort the values of the pixel in question and its neighbors,
determine their median, and assign this value to that pixel.
For example, in a 3X3 neighborhood the median is the 5th largest value, in a 5X5 neighborhood
the 13th largest value, and so on. When several values in a neighborhood are the same, all equal
values are grouped. For example, suppose that a 3X3 neighborhood has values (10, 20, 20, 20,
15, 20, 20, 25, 100). These values are sorted as (10, 15, 20, 20, 20, 20, 20, 25, 100), which results in a median of 20. Thus, the principal function of median filters is to force points with distinct gray levels to be more like their neighbors. In fact, isolated clusters of pixels that are light or dark with respect to their neighbors, and whose area is less than n²/2 (one-half the filter area), are eliminated
by an nXn median filter. In this case “eliminated” means forced to the median intensity of the
neighbors. Larger clusters are affected considerably less. Although the median filter is by far the
most useful order-statistics filter in image processing, it is by no means the only one. The median
represents the 50th percentile of a ranked set of numbers, but the reader will recall from basic
statistics that ranking lends itself to many other possibilities. For example, using the 100th
percentile results in the so-called max filter, which is useful in finding the brightest points in an
image. The response of a 3X3 max filter is given by R = max{zk | k = 1, 2, ..., 9}. The 0th percentile filter is the min filter, used for the opposite purpose.
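A minimal sketch of median, max and min filtering using SciPy's rank-order filters (the noisy image array and the 3x3 window size are assumptions for illustration):

import numpy as np
from scipy.ndimage import median_filter, maximum_filter, minimum_filter

img = np.random.randint(0, 256, (64, 64))   # hypothetical noisy image

med = median_filter(img, size=3)   # each pixel replaced by the median of its 3x3 neighborhood
mx  = maximum_filter(img, size=3)  # 100th percentile: emphasizes the brightest points
mn  = minimum_filter(img, size=3)  # 0th percentile: emphasizes the darkest points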
Sharpening Spatial Filters
The principal objective of sharpening is to highlight fine detail in an image or to enhance detail that has been blurred. Applications of image sharpening range from electronic printing and medical imaging to industrial inspection and autonomous guidance in military systems.
The derivatives of a digital function are defined in terms of differences. There are various ways to define these differences. However, we require that any definition we use for a first derivative (1) must be zero in flat segments (areas of constant gray-level values); (2) must be nonzero at the onset of a gray-level step or ramp; and (3) must be nonzero along ramps. Similarly, any definition of a second derivative (1) must be zero in flat areas; (2) must be nonzero at the onset and end of a gray-level step or ramp; and (3) must be zero along ramps of constant slope. Since we are dealing with digital quantities whose values are finite, the maximum possible gray-level change also is finite, and the shortest distance over which that change can occur is between adjacent pixels.
A basic definition of the first-order derivative of a one-dimensional function f(x) is the difference ∂f/∂x = f(x+1) - f(x); a corresponding definition of the second-order derivative is ∂²f/∂x² = f(x+1) + f(x-1) - 2f(x).
Fig 2.12
Let us consider the properties of the first and second derivatives as we traverse the profile from left to right. First, we note that the first-order derivative is nonzero along the entire ramp, while the second-order derivative is nonzero only at the onset and end of the ramp. Because edges in an image resemble this type of transition, we conclude that first-order derivatives produce "thick" edges and second-order derivatives, much finer ones. Next we encounter the isolated noise point. Here, the response at and around the point is much stronger for the second- than for the first-order derivative. Of course, this is not unexpected. A second-order derivative is much more aggressive than a first-order derivative in enhancing sharp changes. Thus, we can expect a second-order derivative to enhance fine detail (including noise) much more than a first-order derivative. The thin line is a fine detail, and we see essentially the same difference between the two derivatives. If the maximum gray level of the line had been the same as the isolated point, the response of the second derivative would have been stronger for the latter. Finally, in this case, the response of the two derivatives is the same at the gray-level step (in most cases when the transition into a step is not from zero, the second derivative will be weaker). We also note that the second derivative has a transition from positive back to negative. In an image, this shows as a thin double line. This "double-edge" effect will be important when we use derivatives for edge detection. It is of interest also to note that if the gray level of the thin line had been the same as the step, the response of the second derivative would have been stronger for the line than for the step.
In summary, comparing the response between first- and second-order derivatives, we arrive
at the following conclusions. (1) First-order derivatives generally produce thicker edges in an
image. (2) Second-order derivatives have a stronger response to fine detail, such as thin lines and
isolated points. (3) First order derivatives generally have a stronger response to a gray-level step.
(4) Second- order derivatives produce a double response at step changes in gray level.
We also note of second-order derivatives that, for similar changes in gray-level values in
an image, their response is stronger to a line than to a step, and to a point than to a line. In most
applications, the second derivative is better suited than the first derivative for image enhancement
because of the ability of the former to enhance fine detail. For this, and for reasons of simpler
implementation and extensions, we will focus attention initially on uses of the second derivative for
enhancement. Although the principle of use of first derivatives in image processing is for edge
extraction, they do have important uses in image enhancement.
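One common way to realize second-derivative enhancement is the standard Laplacian mask; the following is a minimal sketch under that assumption (NumPy/SciPy assumed, grayscale image and value range hypothetical):

import numpy as np
from scipy.ndimage import convolve

# standard 3x3 Laplacian mask (a discrete, isotropic second derivative)
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

img = np.random.rand(64, 64)              # hypothetical grayscale image in [0, 1]
lap = convolve(img, laplacian)            # second-derivative response (fine detail, noise)
sharpened = np.clip(img - lap, 0.0, 1.0)  # subtract because the center coefficient is negative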
8.IMAGE SMOOTHING USING FREQUENCY DOMAIN FILTERS
Smoothing is achieved in the frequency domain by high-frequency attenuation, that is, by lowpass filtering. We consider three types of lowpass filter: ideal, Butterworth and Gaussian. These three categories cover the range from very sharp (ideal) to very smooth (Gaussian) filtering. The Butterworth filter has a parameter called the filter order. For high order values, the Butterworth filter approaches the ideal filter. For lower order values, the Butterworth filter is more like a Gaussian filter. Thus the Butterworth filter may be viewed as providing a transition between the two extremes.
A 2-D lowpass filter that passes without attenuation all frequencies within a circle of radius D0 from the origin and cuts off all frequencies outside this circle is called an ideal lowpass filter (ILPF). It is specified by the function
H(u,v) = 1 if D(u,v) ≤ D0, and H(u,v) = 0 if D(u,v) > D0
where D0 is a positive constant and D(u,v) is the distance between a point (u,v) in the frequency domain and the center of the frequency rectangle,
D(u,v) = [ (u - P/2)² + (v - Q/2)² ]^(1/2)
where P and Q are the padded sizes. Figure (a) shows a perspective plot of H(u,v). Figure (b) shows the filter displayed as an image.
Fig 2.13 a) Perspective plot of an ideal lowpass filter transfer function
The name ideal indicates that all frequencies on or inside a circle of radius D0 are passed without attenuation, whereas all frequencies outside the circle are completely attenuated. The ideal lowpass filter is radially symmetric about the origin, which means that the filter is completely defined by a radial cross section, as shown in the figure. Rotating the cross section by 360° yields the filter in 2-D.
For an ILPF cross section, the point of transition between H(u,v) = 1 and H(u,v) = 0 is called the cutoff frequency. In the figure the cutoff frequency is D0. The sharp cutoff frequencies of an ILPF cannot be realized with electronic components, although they certainly can be simulated in a computer.
One way to establish a set of standard cutoff frequency loci is to compute circles that enclose specified amounts of the total image power PT. This quantity is obtained by summing the components of the power spectrum of the padded image at each point (u,v), for u = 0, 1, ..., P-1 and v = 0, 1, ..., Q-1:
PT = Σu Σv P(u,v)
If the DFT has been centered, a circle of radius D0 with origin at the center of the frequency rectangle encloses α percent of the power, where
α = 100 [ Σu Σv P(u,v) / PT ]
and the summation is taken over values of (u,v) that lie inside the circle or on its boundary.
Figures (a) and (b) show a test pattern image and its spectrum. The circles superimposed on the spectrum have radii of 10, 30, 60, 160 and 460. These circles enclose α percent of the image power for α = 87, 93.1, 95.7, 97.8 and 99.2%, respectively. Note that 87% of the total power is enclosed by the relatively small circle of radius 10.
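A small sketch of how the enclosed power α can be computed for a given radius (NumPy assumed; the test image is hypothetical):

import numpy as np

def enclosed_power_percent(img, radius):
    # percentage of total spectrum power enclosed by a circle of the given radius
    # about the center of the centered DFT
    F = np.fft.fftshift(np.fft.fft2(img))
    P = np.abs(F) ** 2                       # power spectrum P(u, v)
    Pu, Qv = P.shape
    u, v = np.meshgrid(np.arange(Pu), np.arange(Qv), indexing='ij')
    D = np.sqrt((u - Pu / 2) ** 2 + (v - Qv / 2) ** 2)
    return 100.0 * P[D <= radius].sum() / P.sum()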
Fig 2.15 Test image and its Fourier spectrum
The transfer function of a Butterworth lowpass filter (BLPF) of order n, with cutoff frequency at a distance D0 from the origin, is defined as
H(u,v) = 1 / [ 1 + ( D(u,v) / D0 )^(2n) ]
This figure shows a perspective plot, image display and radial cross sections of the BLPF function.
Fig 2.16 a) Perspective plot of a BLPF transfer function b) Filter displayed as an image c) Filter radial cross sections of various orders
Unlike the ILPF, the BLPF transfer function does not have a sharp discontinuity that gives a clear cutoff between passed and filtered frequencies. For filters with smooth transfer functions, it is customary to define a cutoff frequency locus at points for which H(u,v) is down to a certain fraction of its maximum value.
Gaussian lowpass filters (GLPFs) were introduced in one dimension as an aid in exploring some important relationships between the spatial and frequency domains. The form of these filters in the frequency domain is given by
H(u,v) = e^( -D²(u,v) / 2σ² )
where D(u,v) is the distance from the center of the frequency rectangle and σ is a measure of spread about the center. By letting σ = D0 we can write
H(u,v) = e^( -D²(u,v) / 2D0² )
where D0 is the cutoff frequency.
The table shows that the inverse Fourier transform of the GLPF is also Gaussian. This means that a spatial Gaussian filter can be obtained by computing the IDFT of the above equation.
Fig 2.17 a) Perspective plot of a GLPF transfer function b) Filter displayed as an image c) Filter radial cross sections for various values of D0
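A minimal sketch that builds the three lowpass transfer functions and applies one of them to a (hypothetical, already padded) image; NumPy is assumed and D0, the order n and the image size are illustrative values:

import numpy as np

def distance_grid(P, Q):
    # D(u, v): distance of each frequency sample from the center of the P x Q rectangle
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    return np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)

def lowpass(P, Q, D0, kind='ideal', n=2):
    D = distance_grid(P, Q)
    if kind == 'ideal':
        return (D <= D0).astype(float)              # sharp cutoff at D0
    if kind == 'butterworth':
        return 1.0 / (1.0 + (D / D0) ** (2 * n))    # order n controls the transition
    return np.exp(-(D ** 2) / (2.0 * D0 ** 2))      # Gaussian, sigma = D0

img = np.random.rand(128, 128)                      # hypothetical padded image
H = lowpass(*img.shape, D0=30, kind='butterworth')
F = np.fft.fftshift(np.fft.fft2(img))
smoothed = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
# The corresponding highpass filter of the next section is simply 1 - H.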
9.IMAGE SHARPENING USING FREQUENCY DOMAIN FILTERS
Image sharpening can be achieved in the frequency domain by high pass filtering, which
attenuates the low frequency components without disturbing high frequency information in the
Fourier transform.
A highpass filter is obtained from a given lowpass filter using the equation
HHP(u,v) = 1 - HLP(u,v)
where HLP(u,v) is the transfer function of the lowpass filter. That is, frequencies that the lowpass filter attenuates are passed by the highpass filter, and vice versa.
The ideal highpass filter (IHPF) is defined as
H(u,v) = 0 if D(u,v) ≤ D0, and H(u,v) = 1 if D(u,v) > D0
where D0 is the cutoff frequency. The IHPF is the opposite of the ILPF in the sense that it sets to zero all frequencies inside a circle of radius D0 while passing, without attenuation, all frequencies outside the circle. As in the case of the ILPF, the IHPF is not physically realizable.
As with lowpass filters, we can expect the Butterworth highpass filter (BHPF) to behave more smoothly than the IHPF. The boundaries are less distorted, even for the smallest values of cutoff frequency, and the transition into higher cutoff frequencies is much smoother with the BHPF.
PART A
2. What are frequency domain techniques?
Frequency domain techniques are based on modifying the Fourier transform of an image.
3. What do you mean by Point processing?
Image enhancement in which the new value of a pixel depends only on the gray level at that point is often referred to as point processing.
4.What is gray level slicing?
Highlighting a specific range of gray levels in an image is referred to as gray level slicing. It is used in satellite imagery and X-ray images.
5. What do you mean by Mask or Kernels?
A Mask is a small two-dimensional array, in which the value of the mask coefficient determines the
nature of the process, such as image sharpening.
6. What is Image Negative?
The negative of an image with gray levels in the range [0, L-1] is obtained by using the negative transformation, which is given by the expression s = L - 1 - r, where s is the output pixel and r is the input pixel.
7. Define Histogram.
The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function h(rk) = nk, where rk is the kth gray level and nk is the number of pixels in the image having gray level rk.
8. What is histogram equalization?
It is a technique used to obtain an image with a uniform histogram; it is also known as histogram linearization. The condition for a uniform histogram is ps(s) = 1.
9. What is contrast stretching?
Contrast stretching produces an image of higher contrast than the original by darkening the levels below m and brightening the levels above m in the input image.
10. What is spatial filtering?
Spatial filtering is the process of moving the filter mask from point to point in an image. For linear
spatial filter, the response is given by a sum of products of the filter coefficients, and the
corresponding image pixels in the area spanned by the filter mask.
11. Define averaging filters.
The output of a smoothing, linear spatial filter is the average of the pixels contained in the
neighborhood of the filter mask. These filters are called averaging filters.
12. What is a maximum filter?
The 100th percentile filter is the maximum filter; it is used for finding the brightest points in an image.
13. What is a minimum filter?
The 0th percentile filter is the minimum filter, used for finding the darkest points in an image.
14. Define high boost filter.
The high boost filtered image is defined as
HBF = A (original image) - LPF
    = (A-1) (original image) + (original image - LPF)
HBF = (A-1) (original image) + HPF
15. State the conditions on the transformation function s = T(r).
i. T(r) is single-valued and monotonically increasing in the interval 0 ≤ r ≤ 1, and
ii. 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1.
16. Write the applications of sharpening filters.
The applications of sharpening filters are as follows:
i. Electronic printing and medical imaging to industrial applications
ii. Autonomous target detection in smart weapons.
17. Name the different types of derivative filters.
The different types of derivative filters are
i. Prewitt operators
ii. Roberts cross-gradient operators
iii. Sobel operators.
Part B
UNIT 3
IMAGE RESTORATION AND SEGMENTATION
REFERRED BOOK:
1. Rafael C. Gonzalez, Richard E. Woods - Digital Image Processing,
2. Anil K. Jain - Fundamentals of Digital Image Processing.
UNIT III
IMAGE RESTORATION
1.Arithmetic Mean Filter
This is the simplest mean filter. Let Sxy represent the set of coordinates in a rectangular subimage window (neighborhood) of size mXn centered at point (x,y). The arithmetic mean filter computes the average value of the corrupted image g(x,y) in the area defined by Sxy. The value of the restored image f̂ at point (x,y) is simply the arithmetic mean computed using the pixels in the region defined by Sxy:
f̂(x,y) = (1/mn) Σ(s,t)∈Sxy g(s,t)
This operation can be implemented using a spatial filter of size mXn in which all coefficients have value 1/mn. A mean filter smooths local variations in an image, and noise is reduced as a result of blurring.
Geometric mean filter
An image restored using a geometric mean filter is given by the expression
f̂(x,y) = [ Π(s,t)∈Sxy g(s,t) ]^(1/mn)
Here, each restored pixel is given by the product of the pixels in the subimage window, raised to the power 1/mn. A geometric mean filter achieves smoothing comparable to the arithmetic mean filter, but it tends to lose less image detail in the process.
Harmonic mean filter
The harmonic mean filtering operation is given by the expression
f̂(x,y) = mn / Σ(s,t)∈Sxy [ 1 / g(s,t) ]
The harmonic mean filter works well for salt noise, but fails for pepper noise. It does well also with other types of noise, like Gaussian noise.
Contraharmonic mean filter
The contraharmonic mean filtering operation yields a restored image based on the expression
f̂(x,y) = Σ(s,t)∈Sxy g(s,t)^(Q+1) / Σ(s,t)∈Sxy g(s,t)^Q
where Q is called the order of the filter. This filter is well suited for reducing or virtually eliminating the effects of salt-and-pepper noise. For positive values of Q the filter eliminates pepper noise; for negative values of Q it eliminates salt noise. It cannot do both simultaneously. Note that the contraharmonic filter reduces to the arithmetic mean filter if Q = 0, and to the harmonic mean filter if Q = -1.
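A minimal brute-force sketch of the four mean filters over an m x n window (NumPy assumed; gray levels are assumed nonnegative, the window size and Q are illustrative, and border pixels are simply left unchanged):

import numpy as np

def mean_filters(g, m=3, n=3, Q=1.5):
    g = g.astype(float)
    f_arith = g.copy(); f_geo = g.copy(); f_harm = g.copy(); f_contra = g.copy()
    a, b = m // 2, n // 2
    for x in range(a, g.shape[0] - a):
        for y in range(b, g.shape[1] - b):
            w = g[x - a:x + a + 1, y - b:y + b + 1]
            f_arith[x, y]  = w.mean()                              # (1/mn) * sum
            f_geo[x, y]    = np.exp(np.log(w + 1e-12).mean())      # product^(1/mn)
            f_harm[x, y]   = w.size / np.sum(1.0 / (w + 1e-12))    # mn / sum(1/g)
            f_contra[x, y] = np.sum(w ** (Q + 1)) / (np.sum(w ** Q) + 1e-12)  # order Q
    return f_arith, f_geo, f_harm, f_contra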
2.ORDER STATISTICS FILTERS
Order statistics filters are spatial filters whose response is based on ordering the values of the pixels contained in the image area encompassed by the filter. The ranking result determines the response of the filter.
Median filter
The best known order statistics filter is the median filter, which, as its name implies, replaces the value of a pixel by the median of the intensity levels in the neighborhood of that pixel.
The value of the pixel at (x,y) is included in the computation of the median. Median filters are quite
popular because for certain types of random noise, they provide excellent noise reduction
capabilities, with considerably less blurring than linear smoothing filters of similar size. Median
filters are particularly effective in the presence of both bipolar and unipolar impulse noise.
Max and Min filters
The 100th percentile filter is called the max filter.
This filter is used for finding the brightest points in an image. Also, because pepper noise has
very low values, it is reduced by this filter as a result of the max selection process in the subimage
area.
The 0th percentile filter is the min filter
This filter is used for finding the darkest point in an image. Also it reduces salt noise as a result
of min operation.
Midpoint filter
The midpoint filter computes the midpoint between the maximum and minimum values in the area encompassed by the filter:
f̂(x,y) = (1/2) [ max(s,t)∈Sxy g(s,t) + min(s,t)∈Sxy g(s,t) ]
It works best for randomly distributed noise, like Gaussian or uniform noise.
3.ADAPTIVE FILTERS
An adaptive filter operates on a local region Sxy. The response of the filter at any point (x,y) on which the region is centered is based on four quantities:
a) g(x,y), the value of the noisy image at (x,y)
b) σ²η, the variance of the noise corrupting f(x,y) to form g(x,y)
c) mL, the local mean of the pixels in Sxy
d) σ²L, the local variance of the pixels in Sxy
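The standard adaptive, local noise-reduction rule built from these four quantities is f̂ = g - (σ²η/σ²L)(g - mL); the rule itself is not written out above, so the following is a sketch of that usual formulation (NumPy/SciPy assumed, window size illustrative):

import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_local_noise_filter(g, noise_var, size=7):
    # f_hat = g - (noise_var / local_var) * (g - local_mean)
    g = g.astype(float)
    local_mean = uniform_filter(g, size)
    local_var = uniform_filter(g ** 2, size) - local_mean ** 2
    ratio = noise_var / np.maximum(local_var, noise_var)  # clamp so the ratio never exceeds 1
    return g - ratio * (g - local_mean)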
4.BAND REJECT FILTERS
Periodic noise can be analyzed and reduced effectively using frequency domain techniques. The basic idea is that periodic noise appears as concentrated bursts of energy in the Fourier transform, at locations corresponding to the frequencies of the periodic interference. The approach is to use a selective filter to isolate the noise. The three types of selective filters are
1) Band reject filter 2) Band pass filter 3) Notch filter
One of the principal applications of band reject filtering is noise removal in applications where the general location of the noise components in the frequency domain is approximately known. A good example is an image corrupted by additive periodic noise that can be approximated as a two-dimensional sinusoidal function. It is not difficult to show that the Fourier transform of a sine consists of two impulses that are mirror images of each other about the origin of the transform. The impulses are both imaginary and are complex conjugates of each other.
4.1.BAND PASS FILTER
A band pass filter performs the opposite operation of a band reject filter. The transfer function HBP(u,v) of a band pass filter is obtained from a corresponding band reject filter with transfer function HBR(u,v) by using the equation
HBP(u,v) = 1 - HBR(u,v)
Performing straight bandpass filtering on an image is not a common procedure because it generally removes too much image detail. However, bandpass filtering is quite useful in isolating the effects on an image caused by selected frequency bands. In such filtering most image detail is lost, but the information that remains is useful: the noise pattern recovered this way is quite close to the noise that corrupted the image. In other words, bandpass filtering helps to isolate the noise pattern. This is a useful result because it simplifies analysis of the noise, reasonably independent of image content.
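A minimal sketch of an ideal band reject transfer function and its band pass complement (NumPy assumed; the center radius D0 and band width W are illustrative):

import numpy as np

def ideal_bandreject(P, Q, D0, W):
    # zero inside the ring of width W centered on radius D0, one elsewhere
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)
    H = np.ones((P, Q))
    H[(D >= D0 - W / 2) & (D <= D0 + W / 2)] = 0.0
    return H

H_br = ideal_bandreject(256, 256, D0=60, W=10)
H_bp = 1.0 - H_br   # the band pass filter is the complement of the band reject filter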
5.NOTCH FILTER
A notch filter rejects or passes frequencies in predefined neighborhoods about a center
frequency. 3-D plots of ideal, Butterworth and Gaussian notch filters are shown below. Due to the symmetry of the Fourier transform, notch filters must appear in symmetric pairs about the origin in order to obtain meaningful results.
Fig 3.1 Perspective plots of a) ideal b) Butterworth c) Gaussian notch filters
The transfer functions of notch filters are related by
HNP(u,v) = 1 - HNR(u,v)
where HNP(u,v) is the transfer function of the notch pass filter corresponding to the notch reject filter with transfer function HNR(u,v).
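A sketch of a Butterworth notch reject filter built as a product of highpass terms placed at each interference spike and its mirror about the center (NumPy assumed; the spike coordinates, D0 and the order are illustrative):

import numpy as np

def butterworth_notch_reject(P, Q, centers, D0=9, n=2):
    # centers holds (uk, vk) offsets of the interference spikes from the spectrum center
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    H = np.ones((P, Q))
    for uk, vk in centers:
        Dk  = np.sqrt((u - P / 2 - uk) ** 2 + (v - Q / 2 - vk) ** 2)
        Dmk = np.sqrt((u - P / 2 + uk) ** 2 + (v - Q / 2 + vk) ** 2)
        H *= (1.0 / (1.0 + (D0 / (Dk + 1e-8)) ** (2 * n))) \
           * (1.0 / (1.0 + (D0 / (Dmk + 1e-8)) ** (2 * n)))
    return H

H_nr = butterworth_notch_reject(256, 256, centers=[(30, 40)])
H_np = 1.0 - H_nr   # corresponding notch pass filter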
Optimum Notch Filtering
The notch filtering approach that follows reduces the noise in the image without introducing appreciable blurring. The method is optimum in the sense that it minimizes the local variance of the restored estimate f̂(x,y).
The procedure consists of first isolating the principal contributions of the interference pattern and then subtracting a variable, weighted portion of the pattern from the corrupted image. The basic approach is quite general and can be applied to other restoration tasks in which multiple periodic interference is a problem.
The first step is to extract the principal frequency components of the interference pattern. This can be done by placing a notch pass filter HNP(u,v) at the location of each spike. If the filter is constructed to pass only components associated with the interference pattern, then the Fourier transform of the interference noise pattern is given by the expression
N(u,v) = HNP(u,v) G(u,v)
and the corresponding spatial noise pattern η(x,y) is obtained by taking its inverse Fourier transform. The corrupted image is then restored by subtracting a weighted portion of this pattern,
f̂(x,y) = g(x,y) - w(x,y) η(x,y)
where f̂(x,y) is the estimate of f(x,y) and w(x,y) is to be determined. The function w(x,y) is called a weighting or modulation function, and the objective of the procedure is to select this function so that the result is optimized in some meaningful way. One approach is to select w(x,y) so that the variance of the estimate f̂(x,y) is minimized over a specified neighborhood of every point (x,y).
Consider a neighborhood of size (2a+1) by (2b+1) about a point (x,y). The local variance of
𝑓̂(𝑥, 𝑦) at coordinates (x,y) can be estimated from the samples as follows.
Points on or near the edge of the image can be treated by considering partial neighborhoods or by padding the border with 0s. Assuming that w(x,y) remains essentially constant over the neighborhood gives the approximation w(x+s, y+t) ≈ w(x,y) for every point in the neighborhood. With these approximations, the expression for the local variance becomes
6.INVERSE FILTERING
The simplest approach to restoration is direct inverse filtering, where we compute an estimate F̂(u,v) of the transform of the original image simply by dividing the transform of the degraded image, G(u,v), by the degradation function H(u,v):
F̂(u,v) = G(u,v) / H(u,v)
The division is an array operation.
The image in figure 3.2 was inverse filtered using the exact inverse of the degradation function that generated it. The degradation function used was
H(u,v) = exp{ -k [ (u - M/2)² + (v - N/2)² ]^(5/6) }
with k = 0.0025. The M/2 and N/2 constants are offset values; they center the function so that it corresponds with the centered Fourier transform. In this case, M = N = 480. We know that a Gaussian-shaped function has no zeros, so that will not be a concern here.
However, the degradation values become so small that the result of full inverse filtering (Fig. 3.2(a)) is useless. Figures (b) through (d) show the results of cutting off values of the ratio G(u,v)/H(u,v) outside a radius of 40, 70, and 85, respectively. The cutoff was implemented by applying to the ratio a Butterworth lowpass function of order 10.
This provided a sharp (but smooth) transition at the desired radius. Radii near 70 yielded the best visual results [Fig. 3.2(c)]. Radius values below that tended toward blurred images, as illustrated in Fig. 3.2(b), which was obtained using a radius of 40.
Values above 70 started to produce degraded images, as illustrated in Fig.3.2 (d), which
was obtained using a radius of 85. The image content is almost visible in this image behind a
"curtain" of noise, but the noise definitely dominates the result. Further increases in radius values
produced images that looked more and more like Fig.3.2(a).
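A sketch of this radially limited inverse filtering (NumPy assumed; g is a hypothetical degraded image and H the known, centered degradation function of the same shape):

import numpy as np

def limited_inverse_filter(g, H, radius, order=10):
    # inverse filter G/H, with the ratio cut off outside the given radius by a
    # Butterworth lowpass window
    M, N = g.shape
    G = np.fft.fftshift(np.fft.fft2(g))
    u, v = np.meshgrid(np.arange(M), np.arange(N), indexing='ij')
    D = np.sqrt((u - M / 2) ** 2 + (v - N / 2) ** 2)
    window = 1.0 / (1.0 + (D / radius) ** (2 * order))
    F_hat = (G / (H + 1e-8)) * window
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))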
Fig 3.2
7.MINIMUM MEAN SQUARE ERROR (WIENER) FILTERING
The inverse filtering approach discussed in the previous section makes no explicit provision for handling noise. In this section, we discuss an approach that incorporates both the degradation function and statistical characteristics of noise into the restoration process.
The method is founded on considering images and noise as random variables, and the objective is to find an estimate f̂ of the uncorrupted image f such that the mean square error between them is minimized. This error measure is
e² = E{ (f - f̂)² }
where E{.} is the expected value of the argument. It is assumed that the noise and the image are uncorrelated; that one or the other has zero mean; and that the intensity levels in the estimate are a linear function of the levels in the degraded image. Based on these conditions, the minimum of the error function is given in the frequency domain by the expression
F̂(u,v) = [ (1/H(u,v)) · |H(u,v)|² / ( |H(u,v)|² + Sη(u,v)/Sf(u,v) ) ] G(u,v)
This result is known as the Wiener filter. The filter, which consists of the terms inside the brackets, also is commonly referred to as the minimum mean square error filter or the least square error filter. The terms in F̂(u,v) are as follows: |H(u,v)|² = H*(u,v)H(u,v), where H*(u,v) is the complex conjugate of H(u,v); Sη(u,v) = |N(u,v)|² is the power spectrum of the noise; and Sf(u,v) = |F(u,v)|² is the power spectrum of the undegraded image.
H(u,v) is the transform of the degradation function and G(u,v) is the transform of the degraded image. The restored image in the spatial domain is given by the inverse Fourier transform of the frequency-domain estimate F̂(u,v). Note that if the noise is zero, the noise power spectrum vanishes and the Wiener filter reduces to the inverse filter.
A number of useful measures are based on the power spectra of noise and of the undegraded image. One of the most important is the signal-to-noise ratio, approximated using frequency domain quantities such as
SNR = Σu Σv |F(u,v)|² / Σu Σv |N(u,v)|²
This ratio gives a measure of the level of information bearing signal power (i.e., of the original,
undegraded image) to the level of noise power. Images with low noise tend to have a high SNR
and, conversely, the same image with a higher level of noise has a lower SNR.
This ratio by itself is of limited value, but it is an important metric used in characterizing the performance of restoration algorithms. The mean square error given in statistical form can be approximated also in terms of a summation involving the original and restored images:
MSE = (1/MN) Σx Σy [ f(x,y) - f̂(x,y) ]²
If the restored image is considered to be the "signal" and the difference between it and the original to be noise, a signal-to-noise ratio can also be defined in the spatial domain as the ratio of Σ f̂² to Σ (f - f̂)².
The closer f̂ and f are, the larger this ratio will be. Sometimes the square root of these measures is used instead, in which case they are referred to as the root-mean-square signal-to-noise ratio and the root-mean-square error, respectively. An approach used frequently when these quantities are not known or cannot be estimated is to approximate F̂(u,v) by the expression
F̂(u,v) = [ (1/H(u,v)) · |H(u,v)|² / ( |H(u,v)|² + K ) ] G(u,v)
where K is a specified constant. Fig 3.3 shows a comparison of inverse and Wiener filtering.
Fig 3.3 Comparison of Inverse and Wiener Filtering.
a) Result of Inverse Filtering, (b) Radially limited inverse, (c) Wiener Filtering
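A sketch of the parametric Wiener filter with the spectral ratio replaced by the constant K (NumPy assumed; g is a hypothetical degraded image and H the centered degradation function):

import numpy as np

def wiener_filter(g, H, K=0.01):
    G = np.fft.fftshift(np.fft.fft2(g))
    H2 = np.abs(H) ** 2
    F_hat = (H2 / (H * (H2 + K) + 1e-12)) * G   # (1/H) * |H|^2 / (|H|^2 + K)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))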
PART A
1. Define Restoration.
Restoration is a process of reconstructing or recovering an image that has been degraded by using
a priori knowledge of the degradation phenomenon. Thus restoration techniques are oriented
towards modeling the degradation and applying the inverse process inorder to recover the original
image.
2. How is a degradation process modeled?
A system operator H, together with an additive noise term n(x,y), operates on an input image f(x,y) to produce a degraded image g(x,y).
3. What is homogeneity property and what is the significance of this property?
Homogeneity property states that
H[k1 f1(x,y)] = k1 H[f1(x,y)]
Where H=operator
K1=constant
asy
f(x,y)=input image. It says that the response to a constant multiple of any input is equal to the
response to that input multiplied by the same constant.
4. What is meant by image restoration?
Restoration attempts to reconstruct or recover an image that has been degraded, by using a priori knowledge of the degrading phenomenon.
5. What is a circulant matrix?
A square matrix in which each row is a circular shift of the preceding row, and the first row is a circular shift of the last row, is called a circulant matrix.
6. What is the concept behind the algebraic approach to restoration?
The algebraic approach is the concept of seeking an estimate of f, denoted f^, that minimizes a predefined criterion of performance, where f is the image.
7. Why is the image subjected to Wiener filtering?
This method of filtering consider images and noise as random process and the objective is to find
an estimate f^ of the uncorrupted image f such that the mean square error between them is
minimized. So that image is subjected to wiener filtering to minimize the error.
8. Define spatial transformation.
Spatial transformation is defined as the rearrangement of pixels on an image plane.
9. Define Gray-level interpolation.
Gray-level interpolation deals with the assignment of gray levels to pixels in the spatially
transformed image.
10. Give one example for the principal source of noise.
The principal sources of noise in digital images arise during image acquisition (digitization) and/or transmission. The performance of imaging sensors is affected by a variety of factors, such as environmental conditions during image acquisition and the quality of the sensing elements, for example light levels and sensor temperature.
11. When does the degradation model satisfy position invariant property?
13. Which is the most frequent method to overcome the difficulty to formulate the spatial
relocation of pixels?
The most frequently used method is tiepoints, which are subsets of pixels whose locations in the input (distorted) and output (corrected) images are known precisely.
14. What are the three methods of estimating the degradation function?
The three methods of estimating the degradation function are: i. observation, ii. experimentation, iii. mathematical modeling.
15. How is the blur caused by uniform linear motion removed?
An image f(x,y) undergoes planar motion in the x- and y-directions, and x0(t) and y0(t) are the time-varying components of motion. The total exposure at any point of the recording medium (digital memory) is obtained by integrating the instantaneous exposure over the time interval during which the imaging system shutter is open.
16. What is inverse filtering?
The simplest approach to restoration is direct inverse filtering, in which an estimate F^(u,v) of the transform of the original image is computed simply by dividing the transform of the degraded image G(u,v) by the degradation function H(u,v).
17. Give the difference between enhancement and restoration.
Enhancement is based primarily on the pleasing aspects it might present to the viewer, whereas image restoration attempts to reconstruct or recover an image that has been degraded by using some prior knowledge of the degradation phenomenon.
18. What forms can degradation take?
Degradation may take the form of sensor noise, blur due to camera misfocus, or relative object-camera motion.
19. What is unconstrained restoration?
It is also known as the least square error approach,
n = g - Hf
To estimate the original image f^, the noise n has to be minimized, giving f^ = H^-1 g.
20. Draw the image observation model.
UNIT 4
IMAGE COMPRESSION
REFERRED BOOK:
1. Rafael C. Gonzalez, Richard E. Woods - Digital Image Processing,
2. Jayaraman- Digital Image Processing.
UNIT IV
IMAGE COMPRESSION
1.INTRODUCTION
Wavelet transform is the transformation which makes easy to compress, transmit and
analyze many images. Unlike Fourier transform whose basis functions are sinusoids, wavelet
transforms are based on small waves called wavelets, of varying frequency and limited duration.
This allows them to provide the equivalent of a musical score for an image, revealing not only
what notes to play but also when to play them. The Fourier transform, on the other hand, provides only the notes or frequency information; temporal information is lost in the transformation process.
Wavelets are the foundation of a powerful new approach to signal processing and analysis called multiresolution theory. Multiresolution theory incorporates and unifies techniques from a variety of disciplines, including subband coding from signal processing, quadrature mirror filtering from digital speech recognition, and pyramidal image processing. Multiresolution theory is concerned with the representation and analysis of signals at more than one resolution.
If the objects are small in size or low in contrast, we normally examine them at high resolutions; if they are large in size or high in contrast, a coarse view is all that is required. If both small and large objects, or low and high contrast objects, are present simultaneously, it can be advantageous to study them at several resolutions. This is the fundamental motivation for multiresolution processing.
2.Subband Coding
Another important imaging technique with ties to multiresolution analysis is subband coding. In subband coding, an image is decomposed into a set of bandlimited components, called subbands. The decomposition is performed so that the subbands can be reassembled to reconstruct the original image without error. The decomposition and reconstruction are performed by means of digital filters.
Consider a simple digital filter constructed from three basic components: unit delays, multipliers and adders. Along the top of the filter, unit delays are connected in series to create K-1 delayed versions of the input sequence f(n). The output of the filter is the sum of the scaled, delayed inputs,
f̂(n) = Σ(k=0 to K-1) h(k) f(n-k)
The K multiplication constants h(0), ..., h(K-1) in the figure and the above equation are called filter coefficients. Each coefficient defines a filter tap, and the filter is said to be of order K.
By substituting δ(n) for the input f(n) and making use of the shifting property of the unit discrete impulse, we find that the impulse response of the filter is the K-element sequence of filter coefficients that define the filter. Physically, the unit impulse is shifted from left to right across the top of the filter, producing an output that assumes the value of the coefficient at the location of the delayed impulse. Because there are K coefficients, the impulse response is of length K and the filter is called a finite impulse response (FIR) filter.
The figure below shows the impulse responses of six functionally related filters. Filter h2(n) in figure b is a sign-reversed version of h1(n), i.e. h2(n) = -h1(n).
Filters h3(n) and h4(n) in figures c and d are order-reversed versions of h1(n):
h3(n) = h1(-n)
h4(n) = h1(K-1-n)
Filter h4(n) is an order-reversed and translated (i.e. shifted) version of h1(n); neglecting translation, the responses of the two filters are identical.
Filter h5(n) in fig e, which is defined as
h5(n) = (-1)^n h1(n)
is called a modulated version of h1(n). Because modulation changes the signs of all odd-indexed coefficients, h5(1) = -h1(1) and h5(3) = -h1(3), while h5(0) = h1(0) and h5(2) = h1(2). Finally, the sequence shown in fig f is an order-reversed version of h1(n) that is also modulated:
h6(n) = (-1)^n h1(K-1-n)
This sequence is included to illustrate the fact that sign reversal, order reversal, and modulation are sometimes combined in the specification of the relationship between two filters.
Fig 4.2 Six functionally related filter response a) reference response b) sign reversal
c) and d) order reversal e) modulation f) order reversal and modulation
asy
Consider the two band subband codingand decoding system. As indicated in the figure,
the system is composed of two filter banks, each containing two FIR filters of the type. Note that
En
each of the four FIR filters is depicted as a single block, with the impulse response of each filter
written inside it. The analysis filter bank, which includes filters h0(n) and h1(n),is used to break
input sequence f(n) into two half length sequences f lp(n) and fhp(n), the subband that represent
gi
the input. Note that filters h0(n) and h1(n) are half band filters whose idealized transfer
nee
characteristics, h0 and h1. Filter h0(n) is a lowpass filter whose output, subband flp(n), is called an
approximation of f(n); filter h1(n) is a highpass filter whose output, subband fhp(n), is called the
high frequency ordetail part of f(n). Synthesis bankfilters g0(n) and g1(n) combine flp(n) and fhp(n)
rin
to produce f(n). The goal in subband coding is to select h 0(n), h1(n), g0(n) and g1(n) so that f(n)
= f(n). That is, so that the input and output of the subband coding and decoding system are
identical. When this is accomplished, the resulting system is said to employ perfect
reconstruction filters.
g.n
There are many two-band, real-coefficient, FIR, perfect reconstruction filter banks described in the filter bank literature. In all of them, the synthesis filters are modulated versions of the analysis filters, with one synthesis filter being sign reversed as well. For perfect reconstruction, the impulse responses of the synthesis and analysis filters must be related in one of the following two ways:
g0(n) = (-1)^n h1(n)
g1(n) = (-1)^(n+1) h0(n)
or
g0(n) = (-1)^(n+1) h1(n)
g1(n) = (-1)^n h0(n)
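A minimal sketch that builds the synthesis filters from given analysis filters using the first pair of relations above (NumPy assumed; the Haar-like analysis coefficients are hypothetical example values):

import numpy as np

def synthesis_from_analysis(h0, h1):
    # g0(n) = (-1)^n h1(n),  g1(n) = (-1)^(n+1) h0(n)
    n = np.arange(len(h0))
    g0 = ((-1.0) ** n) * np.asarray(h1, dtype=float)
    g1 = ((-1.0) ** (n + 1)) * np.asarray(h0, dtype=float)
    return g0, g1

h0 = np.array([1, 1]) / np.sqrt(2)    # hypothetical lowpass analysis filter
h1 = np.array([1, -1]) / np.sqrt(2)   # hypothetical highpass analysis filter
g0, g1 = synthesis_from_analysis(h0, h1)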
Fig 4.3 a) A two band subband coding and decoding system b) its spectrum splitting properties
Filters h0(n), h1(n), g0(n) and g1(n) are said to be cross-modulated because diagonally opposed filters are related by modulation. Moreover, they can be shown to satisfy the following biorthogonality condition:
⟨hi(2n-k), gj(k)⟩ = δ(i-j) δ(n),  i, j = {0, 1}
Here, ⟨hi(2n-k), gj(k)⟩ denotes the inner product of hi(2n-k) and gj(k). When i is not equal to j, the inner product is 0; when i and j are equal, the product is the unit discrete impulse function, δ(n). Of special interest in subband coding - and in the development of the fast wavelet transform - are filters that move beyond biorthogonality and require
⟨gi(n), gj(n + 2m)⟩ = δ(i-j) δ(m),  i, j = {0, 1}
which defines orthonormality for perfect reconstruction filter banks. In addition to the above equation, orthonormal filters can be shown to satisfy the following two conditions:
g1(n) = (-1)^n g0(Keven - 1 - n)
hi(n) = gi(Keven - 1 - n),  i = {0, 1}
where the subscript on Keven is used to indicate that the number of filter coefficients must be divisible by 2. Synthesis filter g1 is related to g0 by order reversal and modulation. In addition, both h0 and h1 are order-reversed versions of the synthesis filters g0 and g1, respectively. Thus, an orthonormal filter bank can be developed around the impulse response of a single filter, called the prototype; the remaining filters can be computed from the specified prototype's impulse response. For biorthogonal filter banks, two prototypes are required. The generation of useful prototype filters, whether orthonormal or biorthogonal, is beyond the scope of this chapter.
We note that 1-D orthonormal and biorthogonal filters can be used as 2-D separable filters for the processing of images. The separable filters are first applied in one dimension and then in the other. Moreover, downsampling is performed in two stages - once before the second filtering operation - to reduce the overall number of computations. The resulting filtered outputs, denoted a(m,n), dV(m,n), dH(m,n), and dD(m,n), are called the approximation, vertical detail, horizontal detail, and diagonal detail subbands of the input image, respectively. These subbands can be split into four smaller subbands, which can be split again, and so on.
Fig 4.4 A two dimensional, four band filter bank for subband image coding
3.MULTIRESOLUTION EXPANSION
The well-known imaging techniques discussed above play an important role in a mathematical framework called multiresolution analysis (MRA). In MRA, a scaling function is used to create a series of approximations of a function or image, each differing by a factor of 2 in resolution from its nearest neighboring approximations. Additional functions, called wavelets, are then used to encode the difference in information between adjacent approximations.
Series Expansions
A signal or function f(x) can often be better analyzed as a linear combination of expansion functions,
f(x) = Σk αk φk(x)
where k is an integer index of a finite or infinite sum, the αk are real-valued expansion coefficients, and the φk(x) are real-valued expansion functions. If the expansion is unique - that is, there is only one set of αk for any given f(x) - the φk(x) are called basis functions, and the expansion set {φk(x)} is called a basis for the class of functions that can be so expressed. The expressible functions form a function space that is referred to as the closed span of the expansion set, denoted
V = Span_k { φk(x) }
For any function space V and corresponding expansion set {φk(x)}, there is a set of dual functions, denoted {φ̃k(x)}, that can be used to compute the αk coefficients for any f(x) ∈ V. These coefficients are computed by taking the integral inner products of the dual φ̃k(x) and the function f(x):
αk = ⟨φ̃k(x), f(x)⟩ = ∫ φ̃k*(x) f(x) dx
Case 1: If the expansion functions form an orthonormal basis for V, the basis and its dual are equivalent. That is, φk(x) = φ̃k(x), and αk becomes
αk = ⟨φk(x), f(x)⟩
The αk are computed as the inner products of the basis functions and f(x).
Case 2: If the expansion functions are not orthonormal but are an orthogonal basis for V, the basis functions and their duals are called biorthogonal.
Case 3: If the expansion set is not a basis for V but supports the expansion, it is a spanning set in which there is more than one set of αk for any f(x) ∈ V. The expansion functions and their duals are said to be overcomplete or redundant. They form a frame in which
A ||f(x)||² ≤ Σk |⟨φk(x), f(x)⟩|² ≤ B ||f(x)||²
for some A > 0 and B < ∞. Dividing this equation by the norm squared of f(x), we see that A and B "frame" the normalized inner products of the expansion coefficients and the function. If A = B the expansion set is called a tight frame, and it can be shown that
f(x) = (1/A) Σk ⟨φk(x), f(x)⟩ φk(x)
Scaling Functions
Consider the set of expansion functions composed of integer translations and binary scalings of the real, square-integrable function φ(x); this is the set {φj,k(x)}, where
φj,k(x) = 2^(j/2) φ(2^j x - k)
for all j, k ∈ Z and φ(x) ∈ L²(R). Here k determines the position of φj,k(x) along the x-axis, and j determines the width of φj,k(x), that is, how broad or narrow it is along the x-axis. The term 2^(j/2) controls the amplitude of the function. Because the shape of φj,k(x) changes with j, φ(x) is called a scaling function. By choosing φ(x) properly, {φj,k(x)} can be made to span L²(R), which is the set of all measurable, square-integrable functions.
If we restrict j to a specific value, j = j0, the resulting expansion set {φj0,k(x)} is a subset of {φj,k(x)} that spans a subspace of L²(R). We can define that subspace as
Vj0 = Span_k { φj0,k(x) }
That is, Vj0 is the span of φj0,k(x) over k. If f(x) ∈ Vj0, we can write
f(x) = Σk αk φj0,k(x)
More generally, we will denote the subspace spanned over k for any j as
Vj = Span_k { φj,k(x) }
Increasing j increases the size of Vj, allowing functions with smaller variations or finer detail to be included in the subspace. As j increases, the φj,k(x) that are used to represent the subspace functions become narrower and separated by smaller changes in x.
There are four fundamental requirements of multiresolution analysis:
MRA Requirement 1: The scaling function is orthogonal to its integer translates.
MRA Requirement 2: The subspaces spanned by the scaling function at low scales are nested within those spanned at higher scales.
MRA Requirement 3: The only function that is common to all Vj is f(x) = 0.
MRA Requirement 4: Any function can be represented with arbitrary precision.
Wavelet Functions
Given a scaling function that meets the MRA requirements, we can define a wavelet function ψ(x) that, together with its integer translates and binary scalings, spans the difference between any two adjacent scaling subspaces, Vj and Vj+1.
Fig 4.5 Nested function spaces spanned by a scaling function
The space of all measurable, square-integrable functions can be expressed as
L²(R) = V0 ⊕ W0 ⊕ W1 ⊕ W2 ⊕ ...
or, more generally, as
L²(R) = ... ⊕ W-2 ⊕ W-1 ⊕ W0 ⊕ W1 ⊕ W2 ⊕ ...
which eliminates the scaling function and represents a function in terms of wavelets alone. If f(x) is an element of V1 but not V0, its approximation uses scaling functions from V0, and wavelets from W0 encode the difference between this approximation and the actual function.
The above expansion can be generalized to any starting scale j0. Since wavelet spaces reside within the space spanned by the next higher resolution scaling functions, any wavelet function can be expressed as a weighted sum of shifted, double-resolution scaling functions:
ψ(x) = Σn hψ(n) √2 φ(2x - n)
where the hψ(n) are called wavelet function coefficients and hψ is the wavelet vector.
4.COMPRESSION
The rapid growth of digital imaging applications, including desktop publishing, multimedia, teleconferencing and high definition television, has increased the need for effective and standardized image compression techniques. The basic goal of image compression is to represent an image with a minimum number of bits at an acceptable image quality. All image compression algorithms strive to remove statistical redundancy and exploit perceptual irrelevancy while reducing the amount of data as much as possible.
With the growth of multimedia and high definition television technologies, the amount of information that is handled by computers has grown exponentially over the past decades. Hence, storage and transmission of the digital image component of multimedia systems is a major problem. The amount of data required to represent images at an acceptable level of quality is extremely large. High quality image data requires large amounts of storage space and transmission bandwidth, something which the current technology is unable to handle technically and economically. One possible solution to this problem is to compress the information so that the storage space and transmission time can be reduced.
For example, if we want to store a 1600x1200 color image, the space required to store the image is
1200 x 1600 x 3 = 5,760,000 bytes = 46,080,000 bits = 5.76 Mbytes.
The maximum space available on one floppy disk is 1.44 Mbytes. With four floppies the available space is 1.44 x 4 = 5.76 Mbytes. That is, a minimum of four floppies is required to store an RGB image of size 1600x1200.
The amount of data transmitted through the internet doubles every year, and a large portion of that data comprises images. Reducing the bandwidth needs of any given device will result in significant cost reductions and will make the use of the device more affordable. Image compression offers ways to represent an image in a more compact form, so that images can be stored in less space and transmitted faster.
Source Coding
The goal of source coding is efficient conversion of the source data into a sequence of bits. The source encoder reduces redundancy, which means a decrease in the average number of bits required to represent the image. The source decoder performs essentially the inverse operation.
Fig 4.6 Image Compression Scheme
Channel
The channel is a mathematical model of the medium over which communication occurs.
Channel Coding
The channel encoder introduces controlled redundancy into the compressed output of the source encoder. The purpose of the channel encoder is to protect the communication against noise and other transmission errors in the channel. The channel decoder exploits the redundancy in the bit sequence to reproduce the compressed bits.
Classification of image compression
1. Lossless compression or reversible compression rin
2. Lossy compression or Irreversible compression
5.HUFFMAN CODING
The performance of a Huffman code is measured in terms of its:
• Average length
• Efficiency
• Variance
Prefix Code
A code is a prefix code if no code word is the prefix of another code word. The main advantage of a prefix code is that it is uniquely decodable. An example of a prefix code is the Huffman code.
Types of Huffman codes
Huffman code can be broadly classified into (i) Binary Huffman code, (ii) Non- Binary
Huffman code, (iii) Extended Huffman code, and (iv) Adaptive Huffman code.
Binary Huffman Code
In a binary Huffman code, the code for each symbol will be a combination of ones and zeros. The Huffman code can be represented as a binary tree in which the leaves correspond to the symbols. The Huffman code for any symbol can be obtained by traversing the tree from the root node to the leaf corresponding to the symbol, assigning a 0 to the code word every time the traversal takes an upper branch and a 1 every time it takes a lower branch.
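A minimal sketch of binary Huffman code construction using a priority queue; the symbol probabilities are hypothetical example values:

import heapq

def huffman_code(probs):
    # repeatedly merge the two least probable nodes; 0/1 label the two branches
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # least probable node
        p2, _, c2 = heapq.heappop(heap)   # second least probable node
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {'a1': 0.4, 'a2': 0.3, 'a3': 0.1, 'a4': 0.1, 'a5': 0.06, 'a6': 0.04}
codes = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in codes.items())   # average code word length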
Non-Binary Huffman Code
As a non-binary Huffman code, consider the ternary Huffman code. The number of nodes to be combined initially is given by the formula
J = 2 + (N - 2) mod (D - 1)
where
J = number of symbols to be combined initially
N = number of source symbol probabilities
D = number of symbols in the code alphabet
Adaptive Huffman Code
The adaptive Huffman code learns the symbol probabilities by dynamically using the symbol
counts to adjust the code tree. The decoder must use the same initial counts and the count
incrementation algorithm used by the encoder, so the encoder - decoder pair maintains the
same tree.
Limitations of Huffman code
The average code word length L̄ achieved by Huffman coding satisfies the inequality
H(S) ≤ L̄ < H(S) + pmax + 0.086
where H(S) is the entropy of the source alphabet and pmax denotes the maximum occurrence probability in the set of source symbols. The inequality indicates that the upper bound of the average code word length of the Huffman code is determined by the entropy and the maximum occurrence probability of the source symbols being encoded.
Huffman coding always encodes a source symbol with an integer number of bits. If the probability distribution of the source symbols is such that some probabilities are very small while some are quite large, then the entropy of the source alphabet will be close to zero, since the uncertainty is very small.
In such cases at least one bit is still required to represent each symbol, so the average code word length is one bit even though the entropy is much smaller, which means that the redundancy is very close to one. This inefficiency is due to the fact that Huffman coding always encodes a source symbol with an integer number of bits. To overcome this limitation, arithmetic coding was proposed, which is stream based: a string of source symbols is encoded as a string of code symbols.
6. RUN LENGTH CODING
Images with repeating intensities along their rows (or columns) can often be compressed by representing runs of identical intensities as run-length pairs, where each run-length pair specifies the start of a new intensity and the number of consecutive pixels that have that intensity.
The technique, referred to as run-length encoding (RLE), was developed in the 1950s and became, along with its 2-D extensions, the standard compression approach in facsimile (FAX) coding. Compression is achieved by eliminating a simple form of spatial redundancy - groups of identical intensities. When there are few (or no) runs of identical pixels, run-length encoding results in data expansion.
Run-length encoding is particularly effective when compressing binary images. Because there are only two possible intensities (black and white), adjacent pixels are more likely to be identical.
The basic idea is to code each adjacent group (i.e., run) of 0s or 1s encountered in a left-to-right scan of a row by its length, and to establish a convention for determining the value of the run. The most common conventions are (1) to specify the value of the first run of each row, or (2) to assume that each row begins with a white run, whose run length may in fact be zero.
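A minimal sketch of run-length encoding for one binary row under convention (2), where the row is assumed to begin with a white run (1 = white, 0 = black):

def rle_encode_row(row):
    # run lengths, starting with a (possibly empty) white run
    runs, current, length = [], 1, 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs

def rle_decode_row(runs):
    # rebuild the row: runs alternate white, black, white, ...
    row, value = [], 1
    for length in runs:
        row.extend([value] * length)
        value = 1 - value
    return row

# e.g. rle_encode_row([0, 0, 1, 1, 1, 0]) returns [0, 2, 3, 1]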
The black and white run lengths can be coded separately using variable-length codes that are specifically tailored to their own statistics. For example, letting symbol aj represent a black run of length j, we can estimate the probability that aj was emitted by an imaginary black run-length source by dividing the number of black run lengths of length j in the entire image by the total number of black runs. An estimate of the entropy of this black run-length source, denoted H0, follows by substituting these probabilities into the entropy expression; the entropy of the corresponding white run-length source, H1, is estimated in the same way. The approximate run-length entropy of the image is then
HRL = (H0 + H1) / (L0 + L1)
where the variables L0 and L1 denote the average values of black and white run lengths,
respectively. HRL provides an estimate of the average number of bits per pixel required to
code the run lengths in a binary image using a variable-length code.
6.1.ARITHMETIC CODING
The code word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval used to represent it becomes smaller, and the number of information units (say, bits) required to represent the interval becomes larger. Each symbol of the message reduces the size of the interval in accordance with its probability of occurrence. The technique does not require that each source symbol translate into an integral number of code symbols.
The figure illustrates the basic arithmetic coding process. Here, a five-symbol sequence or message, a1 a2 a3 a3 a4, from a four-symbol source is coded. At the start of the coding process, the message is assumed to occupy the entire half-open interval [0, 1).
Fig 4.7 Arithmetic coding procedure
As the table shows, this interval is subdivided initially into four regions based on the probabilities of each source symbol. Symbol a1, for example, is associated with subinterval [0, 0.2). Because it is the first symbol of the message being coded, the message interval is initially narrowed to [0, 0.2). Thus [0, 0.2) is expanded to the full height of the figure and its end points labeled by the values of the narrowed range. The narrowed range is then subdivided in accordance with the original source symbol probabilities, and the process continues with the next message symbol. In this manner, symbol a2 narrows the subinterval to [0.04, 0.08), a3 further narrows it to [0.056, 0.072), and so on.
The final message symbol, which must be reserved as a special end-of-message indicator, narrows the range to [0.06752, 0.0688). Any number within this subinterval - for example, 0.068 - can be used to represent the message.
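A minimal sketch of the interval-narrowing step for this example; the symbol probabilities used (a1 = 0.2, a2 = 0.2, a3 = 0.4, a4 = 0.2) are the assumed values behind the numbers quoted above:

def arithmetic_encode(message, probs):
    # assign cumulative ranges, e.g. a1 -> [0, 0.2), a2 -> [0.2, 0.4), ...
    ranges, start = {}, 0.0
    for symbol, p in probs.items():
        ranges[symbol] = (start, start + p)
        start += p
    low, high = 0.0, 1.0
    for symbol in message:            # narrow [low, high) once per symbol
        span = high - low
        s_low, s_high = ranges[symbol]
        low, high = low + span * s_low, low + span * s_high
    return low, high

interval = arithmetic_encode(['a1', 'a2', 'a3', 'a3', 'a4'],
                             {'a1': 0.2, 'a2': 0.2, 'a3': 0.4, 'a4': 0.2})
# interval is approximately (0.06752, 0.0688)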
In the arithmetically coded message, three decimal digits are used to represent the five-symbol message. This translates into 0.6 decimal digits per source symbol and compares favorably with the entropy of the source, which is 0.58 decimal digits per source symbol. As the length of the sequence being coded increases, the resulting arithmetic code approaches the bound established by Shannon's first theorem. In practice, two factors cause coding performance to fall short of the bound: (1) the addition of the end-of-message indicator needed to separate one message from another, and (2) the use of finite-precision arithmetic.
7.BIT-PLANE CODING
An m-bit grayscale image can be decomposed into m binary bit planes that are coded separately. Before decomposition the image is commonly mapped to an m-bit Gray code, whose bits gi are computed from the binary bits bi by g(m-1) = b(m-1) and gi = bi ⊕ b(i+1) for 0 ≤ i ≤ m-2, where ⊕ denotes the exclusive OR operation. This code has the unique property that successive code words differ in only one bit position. Thus, small changes in intensity are less likely to affect all m bit planes.
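A small sketch of the binary-to-Gray mapping and bit-plane extraction (NumPy assumed; the tiny test image is hypothetical):

import numpy as np

def to_gray_code(img):
    # g = b XOR (b >> 1): successive intensities differ in exactly one bit
    b = img.astype(np.uint8)
    return b ^ (b >> 1)

def bit_planes(img, bits=8):
    # individual bit planes of an (optionally Gray-coded) image
    return [(img >> i) & 1 for i in range(bits)]

gray = to_gray_code(np.array([[127, 128], [200, 201]], dtype=np.uint8))
planes = bit_planes(gray)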
8.PREDICTIVE CODING
Fig 4.9 Lossless predictive coding model: encoder and decoder
The system consists of an encoder and a decoder, each containing an identical predictor. As successive samples of the discrete-time input signal f(n) are introduced to the encoder, the predictor generates the anticipated value of each sample based on a specified number of past samples. The output of the predictor is then rounded to the nearest integer, denoted f̂(n), and used to form the difference or prediction error
e(n) = f(n) - f̂(n)
which is encoded using a variable-length code to generate the next element of the compressed data stream. The decoder reconstructs e(n) from the received variable-length code words and performs the inverse operation, f(n) = e(n) + f̂(n), to recreate the original sequence.
Various local, global and adaptive methods can be used to generate f̂(n). In many cases the prediction is formed as a linear combination of m previous samples,
f̂(n) = round[ Σ(i=1 to m) αi f(n-i) ]
where m is the order of the linear predictor, round denotes the rounding or nearest-integer operation, and the αi, for i = 1, 2, ..., m, are prediction coefficients. If the input sequence is considered to be samples of an image, the f(n) in these equations are pixels, and the m samples used to predict the value of each pixel come from the current scan line (1-D linear predictive coding), from the current and previous scan lines (2-D linear predictive coding), or from the current image and previous images in a sequence of images (3-D linear predictive coding). Thus, for 1-D linear predictive coding the prediction can be written as
f̂(x,y) = round[ Σ(i=1 to m) αi f(x, y-i) ]
1-D linear prediction is a function of the previous pixels on the current line alone. In 2-D predictive coding the prediction is a function of previous pixels in a left-to-right, top-to-bottom scan of an image. In the 3-D case it is based on these pixels and on pixels of preceding frames. The prediction cannot be evaluated for the first m pixels of each line, so those pixels must be coded by other means, such as a Huffman code, and considered as an overhead of the predictive coding process. Similar comments apply for higher dimensions.
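A minimal sketch of 1-D lossless predictive coding with a first-order, previous-pixel predictor (NumPy assumed; α = 1 and the image are illustrative):

import numpy as np

def predictive_encode_rows(img, alpha=1.0):
    # predict each pixel from the rounded alpha * previous pixel; keep the error
    img = img.astype(int)
    pred = np.zeros_like(img)
    pred[:, 1:] = np.round(alpha * img[:, :-1]).astype(int)
    return img - pred              # first column is coded as-is (overhead)

def predictive_decode_rows(error, alpha=1.0):
    # invert the encoder by accumulating predictions left to right
    img = np.zeros_like(error)
    img[:, 0] = error[:, 0]
    for j in range(1, error.shape[1]):
        img[:, j] = error[:, j] + np.round(alpha * img[:, j - 1]).astype(int)
    return img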
9.LOSSY PREDICTIVE CODING
We add a quantizer to the lossless predictive coding model and examine the tradeoff between reconstruction accuracy and compression performance within the context of spatial predictors.
In this figure the quantizer replaces the nearest integer function of the error free encoder
is inserted between the symbol encoder and the point at which the prediction error is formed. It
maps the prediction error into a limited range of outputs denoted𝑒̂ (𝑛), which establishes the
amount of compression and distortion that occurs.
In order to accommodate the insertion of the quantization step, the error free encoder
must be altered so that the predictions generated by the encoder and decoder are equivalent.
The block diagram is accomplished by placing the lossy encoder's predictor within a feedback
loop, where its input, denoted 𝑓̇ (𝑛) is generated as a function of past predictions and the
corresponding quantized errors. The output of the decoder is given by
This closed loop configuration prevents error buildup at the decoder's output.
Delta Modulation
Delta modulation (DM) is a simple but well-known form of lossy predictive coding in which the
predictor and quantizer are defined as
f̂(n) = α ḟ(n−1)
ê(n) = +ς for e(n) > 0, and −ς otherwise
where α is the prediction coefficient and ς is a positive constant. Because the output of the
quantizer, ê(n), can be represented by a single bit, the symbol encoder can use a 1-bit
fixed-length code. The resulting DM code rate is 1 bit/pixel.
The table illustrates the mechanics of the delta modulation process, showing the calculations
needed to compress and reconstruct the input sequence. The process begins with the error-free
transfer of the first input sample to the decoder. With the initial condition ḟ(0) = f(0) = 14
established at both the encoder and decoder, the remaining outputs can be computed by
repeatedly evaluating the above equations for n = 1, 2, 3, .... From the tabulation, a graph of
both the input and the completely decoded output is drawn.
Fig 4.11 An example of delta modulation
Note that in the rapidly changing area from n = 14 to 19, where ς was too small to represent
the input's largest changes, a distortion known as slope overload occurs. Moreover, when ς was
too large to represent the input's smallest changes, as in the relatively smooth region from
n = 0 to n = 7, granular noise appears. In images these two phenomena lead to blurred object
edges and grainy or noisy surfaces.
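The mechanics above can be reproduced with a few lines of code. The following is a minimal delta-modulation encoder/decoder sketch, not an excerpt from the notes; the choices α = 1, ς = 6.5 and the toy input (starting at f(0) = 14, as in the tabulated example) are illustrative assumptions.

import numpy as np

def delta_modulate(f, alpha=1.0, zeta=6.5):
    f = np.asarray(f, dtype=float)
    f_dot = np.empty_like(f)          # reconstructed (decoded) sequence
    e_hat = np.zeros_like(f)          # 1-bit quantized error, +/- zeta
    f_dot[0] = f[0]                   # first sample transferred error-free
    for n in range(1, len(f)):
        f_pred = alpha * f_dot[n - 1]                 # predictor
        e_hat[n] = zeta if f[n] - f_pred > 0 else -zeta
        f_dot[n] = f_pred + e_hat[n]                  # closed-loop reconstruction
    return e_hat, f_dot

if __name__ == "__main__":
    # Smooth start (granular noise) followed by a steep ramp (slope overload).
    f = [14, 15, 14, 15, 14, 15, 14, 20, 30, 44, 58, 72, 86, 100]
    _, decoded = delta_modulate(f)
    print(np.round(decoded, 1))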
11.Compression Standards
The above figure lists the most important image compression standards, file formats and
containers in use today, grouped by the type of image handled. The entries in black are
UNIT IV
IMAGE COMPRESSION
PART A
1. What is Data Compression?
Data compression requires the identification and extraction of source redundancy. In other
words, data compression seeks to reduce the number of bits used to store or transmit
information.
2. What are two main types of Data compression?
The two main types of data compression are lossless compression and lossy compression.
Lossless compression can recover the exact original data after compression. It is used mainly
for compressing database records, spreadsheets or word processing files, where exact
replication of the original is essential.
Lossy compression gives up a certain loss of accuracy in exchange for a substantial increase
in compression. Lossy compression is more effective when used to compress graphic images
and digitized voice, where losses outside visual or aural perception can be tolerated.
that compress a body of data on its way to a storage device and decompresses it when it is
retrieved. In terms of communications, the bandwidth of a digital communication link can be
effectively increased by compressing data at the sending end and decompressing data at the
receiving end. At any given time, the ability of the Internet to transfer data is fixed. Thus, if data
can effectively be compressed wherever possible, significant improvements of data throughput
can be achieved. Many files can be combined into one compressed document making sending
easier.
codes. The general strategy is to allow the code length to vary from character to character and
to ensure that the frequently occurring characters have shorter codes.
11. What is Arithmetic Coding?
Arithmetic compression is an alternative to Huffman compression; it enables characters to be
represented with fractional bit lengths. Arithmetic coding works by representing a number by an
interval of real numbers greater than or equal to zero, but less than one. As a message becomes
longer, the interval needed to represent it becomes smaller and smaller, and the number of bits
needed to specify it increases.
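A small sketch makes the interval-narrowing idea concrete. Assuming the four-symbol source a1...a4 with probabilities 0.2, 0.2, 0.4, 0.2 and the five-symbol message a1 a2 a3 a3 a4 used in the worked example earlier in this unit, the code below reproduces the final interval [0.06752, 0.0688); the function name and data layout are illustrative assumptions.

def arithmetic_interval(message, probs):
    # Fixed sub-interval of [0, 1) for each symbol, in dictionary order.
    low_of, cum = {}, 0.0
    for s, p in probs.items():
        low_of[s] = cum
        cum += p
    low, high = 0.0, 1.0
    for s in message:                     # narrow the interval symbol by symbol
        span = high - low
        high = low + span * (low_of[s] + probs[s])
        low = low + span * low_of[s]
    return low, high

if __name__ == "__main__":
    probs = {"a1": 0.2, "a2": 0.2, "a3": 0.4, "a4": 0.2}
    lo, hi = arithmetic_interval(["a1", "a2", "a3", "a3", "a4"], probs)
    # approximately [0.06752, 0.0688); any value inside (e.g. 0.068) codes the message
    print(round(lo, 5), round(hi, 5))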
12. What is transform coding?
In transform coding, a reversible linear transform is applied to the image.
Since this is a linear process and no information is lost, the number of coefficients produced is
equal to the number of pixels transformed. The desired effect is that most of the energy in the
image will be contained in a few large transform coefficients. If it is generally the same few
coefficients that contain most of the energy in most pictures, then the coefficients may be further
coded by lossless entropy coding. In addition, it is likely that the smaller coefficients can be
coarsely quantized or deleted (lossy coding) without doing visible damage to the reproduced
image.
13. What is zig-zag sequence?
The purposes of the zig-zag scan are: i. To group the low-frequency coefficients at the top of the
vector. ii. To map the 8 x 8 block to a 1 x 64 vector.
14. What is block code?
When each source symbol is mapped into a fixed sequence of code symbols or code words, the
code is called a block code.
In arithmetic coding, a one-to-one correspondence between source symbols and code words
does not exist; instead, a single arithmetic code word is assigned to an entire sequence of
source symbols. A code word defines an interval of real numbers between 0 and 1.
PART B
1. Write notes on Run length encoding and shift codes?
2. a) Determine the Huffman Code assignment procedure for the following data
SYMBOL    PROBABILITY
A1        0.1
A2        0.4
A3        0.06
A4        0.1
A5        0.04
A6        0.3
Compute the average length of the code and the entropy of the source. Also find the efficiency?
(Nov/Dec 2012)
b) A source emits 4 symbols {a,b,c,d} with probabilities {0.4, 0.2, 0.1, 0.3} respectively.
Construct arithmetic coding to encode and decode the word ‘DAD’?
3 For the image shown below compute the compression ratio that can be achieved
using Huffman coding.
3 3 3 2
3 3 3 2
3 2 2 2
2 1 1 0
4. A source emits three symbols A,B,C with a probability {0.5,0.25,0.25} respectively.
UNIT 5
IMAGE SEGMENTATION AND REPRESENTATION
REFERRED BOOK:
1. Rafael C. Gonzalez, Richard E. Woods - Digital Image Processing,
2. Jayaraman- Digital Image Processing.
Edge detection is the approach used most frequently for segmenting images based on abrupt
(local) changes in intensity. Edge models are classified according to their intensity profiles.
a. Step Edge
The figure shows a section of a vertical step edge and a horizontal intensity profile through the
edge.
Fig 3.4 Step edge model
Step edges occur, for example, in images generated by a computer for use in areas such
as solid modeling and animation. These clean, ideal edges can occur over the distance of 1
pixel, provided that no additional processing (such as smoothing) is used to make them "real."
Digital step edges are used frequently as edge models in algorithm development.
b. Ramp Edge
In practice, digital images have edges that are blurred and noisy, with the degree of blurring
determined principally by limitations in the focusing mechanism and the noise level determined
principally by the electronic components of the imaging system.
In such situations, edges are more closely modeled as having an intensity ramp profile, such
as the edge in the figure. The slope of the ramp is inversely proportional to the degree of blurring
in the edge.
Fig 3.5 Ramp Edge profile
c. Roof Edge
A third model of an edge is the so-called roof edge, having the characteristics illustrated in the
figure.
Roof edges are models of lines through a region, with the base (width) of a roof edge being
determined by the thickness and sharpness of the line. In the limit, when its base is 1 pixel wide,
a roof edge is really nothing more than a 1-pixel-thick line running through a region in an image.
Roof edges arise, for example, in range imaging, when thin objects (such as pipes) are closer to
the sensor than their equidistant background (such as walls). The pipes appear brighter and thus
create an image similar to the model in the figure. Roof edges also appear routinely in the
digitization of line drawings and in satellite images, where thin features, such as roads, can be
modeled by this type of edge.
EDGE DETECTION
Edge detection is the process of finding meaningful transition in an image. Edge detection is one
of the central tasks of the lower levels of image processing. The points where sharp changes in
the brightness occur typically form the border between different objects. These points can be
detected by computing intensity differences in local image regions. That is, the edge-detection
algorithm should look for a neighborhood with strong signs of change. Most of the edge detectors
work on measuring the intensity gradient at a point in the image.
Importance of Edge Detection
Edge detection is a problem of fundamental importance in image analysis. The purpose of edge
detection is to identify areas of an image where a large change in intensity occurs. These
changes are often associated with some physical boundary in the scene from which the image
is derived. In typical images, edges characterize object boundaries and are useful for
segmentation, registration and identification of objects in a scene.
a. Gradient Operator
A gradient is a two-dimensional vector that points in the direction in which the image intensity
grows fastest. The gradient operator ∇ is given by the vector of partial derivatives,
∇f = [ ∂f/∂x, ∂f/∂y ]ᵀ
The two functions that can be expressed in terms of the directional derivatives are the gradient
magnitude and the gradient orientation; it is possible to compute both.
The gradient magnitude gives the amount of difference between pixels in the neighborhood,
which indicates the strength of the edge. The gradient magnitude is defined by
|∇f| = √[ (∂f/∂x)² + (∂f/∂y)² ]
The magnitude of the gradient gives the maximum rate of increase per unit distance in the
direction of the gradient.
The gradient orientation gives the direction of the greatest change, which presumably is the
direction across the edge. The gradient orientation is given by
ϴ = tan⁻¹[ (∂f/∂y) / (∂f/∂x) ]
where the angle is measured with respect to the x-axis.
The direction of the edge is perpendicular to the direction of the gradient vector at that point.
Figure indicates the gradient of the edge pixel. The circle indicates the location of the pixel.
An image is a function of two variables; the equation above refers only to the partial derivative
along the x-axis. Pixel discontinuity can be determined along eight possible directions: up, down,
left, right, and along the four diagonals. Another method of calculating the first-order derivative
is to estimate the finite difference:
The finite difference can be approximated as
Using the pixel coordinate notation and considering that corresponds to the direction we have
small to reliably find edges in the presence of noise. The simplest way to implement the first-
order partial derivative is by using the Roberts cross-gradient operator.
The partial derivatives given above can be implemented by approximating them with two
2 x 2 masks, the Roberts cross-gradient operator masks. These filters have the shortest support;
thus the position of the edges is more accurate, but the problem with the short support of the
filters is their vulnerability to noise.
d. Prewitt kernels
The Prewitt kernels are named after Judy Prewitt. Prewitt kernels are based on the idea of central
difference. The Prewitt edge detector is a much better operator than the Roberts operator.
Consider the arrangement of pixels about the central pixel as shown below
The constant in the above expressions implies the emphasis given to pixels closer to the centre
of the mask. and are the approximations at
The Prewitt masks have longer support. The Prewitt mask differentiates in one direction and
averages in the other direction, so the edge detector is less vulnerable to noise.
e. Sobel Operator
The Sobel kernels are named after Irwin Sobel. The Sobel kernel relies on central differences,
but gives greater weight to the central pixels when averaging; the Sobel kernels can therefore be
thought of as Prewitt kernels with additional weight given to the centre row and column.
The Sobel masks in matrix form are given as
Gx:  -1  0  1        Gy:  -1  -2  -1
     -2  0  2               0   0   0
     -1  0  1               1   2   1
The noise-suppression characteristics of a Sobel mask is better than that of a Prewitt mask.
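The following sketch (not from the notes) applies the Sobel masks with a naive sliding-window correlation and thresholds the gradient magnitude; the kernel sign convention, the helper names and the threshold value are illustrative assumptions.

import numpy as np

# Common Sobel kernels (one sign convention): SOBEL_X responds to vertical
# edges, SOBEL_Y to horizontal edges.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def correlate_same(img, kernel):
    # Naive 'same-size' correlation with zero padding (no SciPy dependency).
    k = kernel.shape[0] // 2
    padded = np.pad(img.astype(float), k)
    out = np.zeros_like(img, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(padded[r:r + 2*k + 1, c:c + 2*k + 1] * kernel)
    return out

def sobel_edges(img, threshold=100.0):
    gx = correlate_same(img, SOBEL_X)
    gy = correlate_same(img, SOBEL_Y)
    magnitude = np.hypot(gx, gy)                  # gradient strength
    orientation = np.degrees(np.arctan2(gy, gx))  # direction of greatest change
    return magnitude, orientation, magnitude > threshold

if __name__ == "__main__":
    img = np.zeros((8, 8)); img[:, 4:] = 255      # vertical step edge
    mag, theta, edges = sobel_edges(img)
    print(edges.astype(int))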
f. Frei- Chen Edge Detector
Edge detection using Frei-Chen masks is implemented by mapping the intensity vector using a
linear transformation and then detecting edges based on the angle between the intensity vector
and its projection onto the edge subspace. Frei-Chen edge detection is realized with the
normalized weights. Frei-Chen masks are unique masks, which contain all the basis vectors.
This implies that 3 x 3 image area is represented with the weighted sum of nine Frei-Chen masks.
Primarily, the image is convolved with each of the nine masks.
Then an inner product of the convolution results of each mask is performed. The nine Frei-Chen
masks are given above.
The first four Frei-Chen masks are used for edges and the next four are used for lines and the
last mask is used to compute averages. For edge detection, appropriate masks are chosen and
the image is projected onto it.
g. Second-Derivative Method of Detecting Edges in an Image
Finding the ideal edge is equivalent to finding the point where the derivative is maximum or
minimum. The maximum or minimum value of a function can be computed by differentiating
the given function and finding places where the derivative is zero. Differentiating the first
derivative gives the second derivative; finding the optimal edges is therefore equivalent to finding
places where the second derivative is zero. When the differential operators are applied to
images, the zeros rarely fall exactly on a pixel; typically, they fall between pixels. The zeros can
be isolated by finding the zero crossings. A zero crossing is the place where one pixel is positive
and a neighbouring pixel is negative.
For images, there is a single measure, similar to the gradient magnitude, that measures the
second derivative. It is obtained by taking the dot product of the gradient operator ∇ with itself,
giving the Laplacian ∇².
The Laplacian operation can be expressed in terms of difference equations as given below:
This implies that
The 3x3 Laplacian operator is given by g.n
The Laplacian operator subtracts the brightness values of each of the neighbouring pixels from
the central pixel. When a discontinuity is present within the neighbourhood in the form of a
point, line or edge, the result of the Laplacian is a non-zero value. It may be either positive or
negative, depending on where the central point lies with respect to the edge. The Laplacian
operator is rotationally invariant: it does not depend on the directions used, as long as they
are orthogonal.
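A hedged sketch of the Laplacian mask in use is given below. The mask follows the "centre minus neighbours" description above (one of the two common sign conventions); the function name and the toy step-edge image are illustrative assumptions.

import numpy as np

# 3x3 Laplacian mask: centre positive, 4-neighbours negative.
LAPLACIAN = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)

def laplacian_response(img):
    # Nonzero responses flag points, lines and edges; sign flips across an edge.
    padded = np.pad(img.astype(float), 1)
    out = np.zeros_like(img, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(padded[r:r + 3, c:c + 3] * LAPLACIAN)
    return out

if __name__ == "__main__":
    img = np.zeros((7, 7)); img[:, 3:] = 100      # step edge between columns 2 and 3
    resp = laplacian_response(img)
    print(resp[3])    # a negative/positive pair straddles the edge: a zero crossing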
Step 1 Smoothing of the input image f(m,n)
The input image f(m,n) is smoothed by convolving it with the Gaussian mask h(m,n) to get the
resultant smooth image g(m,n).
Step 2 The Laplacian operator is applied to the result obtained in Step 1. This is represented by
therefore
Here, f(m,n) represents the input image and h(m,n) represents the Gaussian mask. The
Gaussian mask is given by
h(m, n) = exp[ −(m² + n²) / (2σ²) ]
Convolution is a linear operator; hence
On differentiating the Gaussian kernel,
Fig 3.8 Differentiation of Gaussian Function
Disadvantages of LOG Operator:
The LOG operator being a second-derivative operator, the influence of noise is considerable. It
always generates closed contours, which is not realistic.
9.Edge Linking via Hough Transform
Ideally, edge detection should yield sets of pixels lying only on edges. In practice, these pixels
seldom characterize edges completely because of noise, breaks in the edges due to non uniform
illumination, and other effects that introduce spurious discontinuities in intensity values.
Therefore, edge detection typically is followed by linking algorithms designed to assemble
edge pixels into meaningful edges and/or region boundaries.
Given n points in an image, suppose that we want to find subsets of these points that lie on
straight lines. One possible solution is to first find all lines determined by every pair of points
and then find all subsets of points that are close to particular lines. This approach involves
finding on the order of n² lines and then performing on the order of n³ comparisons of every
point to all lines, which is a computationally prohibitive task in all but the most trivial applications.
Hough proposed an alternative approach, commonly referred to as the Hough transform.
Consider a point (xi, yi) in the xy-plane and the general equation of a straight line in
slope-intercept form, yi = a·xi + b.
Infinitely many lines pass through (xi, yi), but they all satisfy the equation yi = a·xi + b for
varying values of a and b. However, writing this equation as b = −xi·a + yi and considering the
ab-plane (parameter space) yields the equation of a single line for the fixed pair (xi, yi).
Furthermore, a second point (xj, yj) also has a line in parameter space associated with it, and,
unless they are parallel, this line intersects the line associated with (xi, yi) at some point
(a', b'), where a' is the slope and b' the intercept of the line containing both (xi, yi) and (xj, yj)
in the xy-plane. In fact, all the points on this line have lines in parameter space that intersect at
(a', b'). The figure illustrates these concepts.
Fig 3.9
In principle, the parameter-space lines corresponding to all points (xk, yk) in the xy-plane could
be plotted, and the principal lines in that plane could be found by identifying points in parameter
space where large numbers of parameter-space lines intersect. A practical difficulty with this
approach is that a (the slope of a line) approaches infinity as the line approaches the vertical
direction. One way around this difficulty is to use the normal representation of a line:
x cos ϴ + y sin ϴ = ρ
Figure 3.10 illustrates the geometrical interpretation of the parameters ρ and ϴ. A horizontal
line has ϴ = 0°, with ρ being equal to the positive x-intercept. Similarly, a vertical line has
ϴ = 90°, with ρ being equal to the positive y-intercept. Each sinusoidal curve in Figure 3.10
represents the family of lines that pass through a particular point (xk, yk) in the xy-plane.
The intersection point (ρ', ϴ') in figure 3.10(b) corresponds to the line that passes through
both (xi, yi) and (xj, yj) in figure 3.10(a). The computational attractiveness of the Hough transform
arises from subdividing the ρϴ parameter space into so-called accumulator cells, as figure
3.10(c) illustrates, where [ρmin, ρmax] and [ϴmin, ϴmax] are the expected ranges of the
parameter values: typically −90° ≤ ϴ ≤ 90° and −D ≤ ρ ≤ D, where D is the maximum distance
between opposite corners in an image.
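As a rough illustration of the accumulator-cell idea, the sketch below votes in a discretized (ρ, ϴ) space for a list of edge-pixel coordinates; the grid sizes, helper names and the toy diagonal line are assumptions made for the example only.

import numpy as np

def hough_lines(edge_points, shape, n_theta=180, n_rho=200):
    # Accumulate votes in (rho, theta) space for the given edge pixels.
    rows, cols = shape
    d = np.hypot(rows, cols)                     # max distance between opposite corners
    thetas = np.deg2rad(np.linspace(-90.0, 90.0, n_theta))
    rhos = np.linspace(-d, d, n_rho)
    accumulator = np.zeros((n_rho, n_theta), dtype=int)
    for (y, x) in edge_points:
        rho_vals = x * np.cos(thetas) + y * np.sin(thetas)   # normal representation
        rho_idx = np.clip(np.searchsorted(rhos, rho_vals), 0, n_rho - 1)
        accumulator[rho_idx, np.arange(n_theta)] += 1
    return accumulator, rhos, thetas

if __name__ == "__main__":
    # Points on the line y = x: the strongest cell sits near theta = -45 deg, rho = 0.
    pts = [(i, i) for i in range(50)]
    acc, rhos, thetas = hough_lines(pts, shape=(50, 50))
    r, t = np.unravel_index(acc.argmax(), acc.shape)
    print(round(rhos[r], 1), round(np.rad2deg(thetas[t]), 1))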
10.REGION-BASED SEGMENTATION
The objective of segmentation is to partition an image into regions. We now discuss
segmentation techniques that are based on finding the regions directly:
➢ Region Growing
➢ Region Splitting and Merging
Region Growing
Region growing is a procedure that groups pixels or sub regions into larger regions based on
predefined criteria for growth. The basic approach is to start with a set of "seed" points and
from these grow regions by appending to each seed those neighboring pixels that have
predefined properties similar to the seed (such as specific ranges of intensity or color).
Selecting a set of one or more starting points often can be based on the nature of the problem.
When a priori information is not available, the procedure is to compute at every pixel the same
set of properties that ultimately will be used to assign pixels to regions during the growing
process.
If the result of these computations shows clusters of values, the pixels whose properties place
them near the centroid of these clusters can be used as seeds. The selection of similarity criteria
depends not only on the problem under consideration, but also on the type of image data
available.
For example, the analysis of land-use satellite imagery depends heavily on the use of color. This
problem would be significantly more difficult, or even impossible, to solve without the inherent
information available in color images.
When the images are monochrome, region analysis must be carried out with a set of descriptors
based on intensity levels and spatial properties
Descriptors alone can yield misleading results if connectivity properties are not used in the
region-growing process. For example, visualize a random arrangement of pixels with only three
distinct intensity values.
Let f(x,y) denote an input image array, S(x,y) denote a seed array containing 1s at the locations
of seed points and 0s elsewhere, and Q denote a predicate to be applied at each location (x,y).
Arrays f and S are assumed to be of the same size. A basic region-growing algorithm based on
8-connectivity may be stated as follows.
1. Find all connected components in S(x,y) and erode each connected component to one
pixel; label all such pixels found as 1. All other pixels in S are labeled 0.
2. Form an image f0 such that, at each pair of coordinates (x,y), f0(x,y) = 1 if the input image
satisfies the given predicate Q at those coordinates; otherwise, f0(x,y) = 0.
3. Let g be an image formed by appending to each seed point in S all the 1-valued points
in f0 that are 8-connected to that seed point.
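The steps above translate directly into a breadth-first traversal. The sketch below is a minimal, simplified implementation: it grows a label outward from each seed using 8-connectivity and a user-supplied predicate Q. The intensity-difference predicate, the threshold of 20 and the toy image are illustrative assumptions, not part of the notes.

import numpy as np
from collections import deque

def region_grow(f, seeds, predicate):
    # Grow regions by appending 8-connected neighbours that satisfy the predicate.
    rows, cols = f.shape
    labels = np.zeros((rows, cols), dtype=int)
    next_label = 1
    for seed in seeds:
        if labels[seed] != 0:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and labels[rr, cc] == 0
                            and predicate(f, (rr, cc), seed)):
                        labels[rr, cc] = next_label
                        queue.append((rr, cc))
        next_label += 1
    return labels

if __name__ == "__main__":
    img = np.array([[10, 12, 90, 91],
                    [11, 13, 92, 90],
                    [12, 11, 89, 93],
                    [10, 12, 91, 92]])
    # Predicate Q: intensity within 20 of the seed's intensity (an assumption).
    q = lambda f, p, seed: abs(int(f[p]) - int(f[seed])) <= 20
    print(region_grow(img, seeds=[(0, 0), (0, 2)], predicate=q))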
Region Splitting and Merging
The procedure discussed in the last section grows regions from a set of seed points. An
alternative is to subdivide an image initially into a set of arbitrary, disjoint regions and then merge
and/or split the regions in an attempt to satisfy the conditions of segmentation.
Let R represent the entire image region and select a predicate Q. One approach for segmenting
R is to subdivide it successively into smaller and smaller quadrant regions so that, for any region
Ri, Q(Ri) = TRUE. Start with the entire region: if Q(R) = FALSE, divide the image into quadrants.
If Q is FALSE for any quadrant, further subdivide that quadrant into sub-quadrants, and so on.
This particular splitting technique has a convenient representation in the form of so-called
quadtrees, that is, trees in which each node has exactly four descendants.
The figure shows such a quadtree (the images corresponding to the nodes of a quadtree
sometimes are called quadregions or quadimages). Note that the root of the tree corresponds to
the entire image and that each node corresponds to the subdivision of a node into four
descendant nodes. In this case, only one quadrant was subdivided further. If only splitting is
used, the final partition normally contains adjacent regions with identical properties. This
drawback can be remedied by allowing merging as well as splitting. Satisfying the constraints of
segmentation requires merging only adjacent regions whose combined pixels satisfy the
predicate Q. That is, two adjacent regions Rj and Rk are merged only if Q(Rj ∪ Rk) = TRUE.
Fig 3.11 a) Partitioned image b) Corresponding quadtree; R represents the entire image region
UNIT V
IMAGE REPRESENTATION AND RECOGNITION
1.INTRODUCTION
After an image has been segmented into regions, the resulting aggregate of segmented
pixels usually is represented and described in a form suitable for further computer processing.
Basically, representing a region involves two choices.
1. We can represent a region in terms of its external characteristics(Boundary)
2. We can represent it in terms of its internal characteristics(the pixels comprising the region)
Choosing a representation scheme is only part of the task of making the data useful to a
computer. The next task is to describe the region based on the chosen representation.
An external representation is chosen when the primary focus is on shape characteristics.
An internal representation is selected when the primary focus is on the regional properties, such
as color and texture.
2.Boundary Following
We assume 1) that we are working with binary images in which object and background points
are labeled 1 and 0, respectively;
2) that images are padded with a border of 0s to eliminate the possibility of an object merging
with the image border.
For convenience we limit the discussion to single regions.
Given a binary region R or its boundary, an algorithm for following the border of R or the
given boundary, consists of the following steps.
1. Let the starting point b0 be the uppermost, leftmost point in the image. Denote by c0 the
west neighbor of b0. Clearly c0 is always a background point. Examine the 8 neighbors of b0,
starting at c0 and proceeding in a clockwise direction. Let b1 denote the first neighbor
encountered whose value is 1, and let c1 be the point immediately preceding b1 in the sequence.
Store the locations of b0 and b1 for use in step 5.
2. Let b = b1 and c = c1.
3. Let the 8 neighbors of b, starting at c and proceeding in a clockwise direction, be denoted by
n1, n2, n3, ..., n8. Find the first nk labeled 1.
4. Let b=nk and c=nk-1
5. Repeat steps 3 and 4 until b = b0 and the next boundary point found is b1. The sequence of b
points found when the algorithm stops constitutes the set of ordered boundary points.
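A compact sketch of the border-following procedure above is given below, assuming a single object labelled 1 that does not touch the image border (consistent with the padding assumption). For simplicity this version stops as soon as b returns to b0, which is adequate for simple regions; the full algorithm also checks that the next point found is b1. Function and constant names are illustrative.

import numpy as np

# 8-neighbour offsets in clockwise order, starting from the west neighbour
# (rows increase downwards).
CLOCKWISE = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def follow_boundary(img):
    coords = np.argwhere(img == 1)
    b0 = tuple(int(v) for v in coords[0])   # row-major scan: uppermost, leftmost point
    c = (b0[0], b0[1] - 1)                  # c0: west neighbour, a background point
    b, boundary = b0, [b0]
    while True:
        start = CLOCKWISE.index((c[0] - b[0], c[1] - b[1]))
        for i in range(1, 9):               # examine neighbours clockwise from c
            off = CLOCKWISE[(start + i) % 8]
            nk = (b[0] + off[0], b[1] + off[1])
            if img[nk] == 1:
                prev = CLOCKWISE[(start + i - 1) % 8]
                c = (b[0] + prev[0], b[1] + prev[1])
                b = nk
                break
        else:                               # isolated single-pixel region
            return boundary
        if b == b0 and len(boundary) > 1:
            return boundary
        boundary.append(b)

if __name__ == "__main__":
    img = np.zeros((6, 6), dtype=int)
    img[2:4, 2:4] = 1                       # a 2x2 square object
    print(follow_boundary(img))             # ordered boundary points, clockwise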
3.CHAIN CODES
Chain codes are used to represent a boundary by a connected sequence of straight line
segments of specified length and direction. This representation is based on 4 or 8 connectivity
of the segments. The direction of each segment is coded by using a numbering scheme. A
boundary code formed as a sequence of such directional numbers is referred to as a Freeman
chain code.
Digital images usually are acquired and processed in a grid format with equal spacing in the x
and y directions, so a chain code can be generated by following a boundary in a clockwise
direction and assigning a direction to the segments connecting every pair of pixels. This method
is unacceptable for two principal reasons:
1) The resulting chain tends to be quite long.
2) Any small disturbances along the boundary due to noise or imperfect segmentation cause
changes in the code that may not be related to the principal shape features of the boundary.
An approach frequently used to circumvent these problems is to resample the boundary by
selecting a larger grid spacing. Then, as the boundary is traversed, a boundary point is assigned
to each node of the large grid, depending on the proximity of the original boundary to that node.
The resampled boundary obtained in this way can then be represented by a 4- or 8-directional
code. Here the coarser boundary points are represented by an 8-directional chain code. The
starting point in the figure is at the topmost, leftmost point of the boundary, which gives the chain
code 0766...12. The accuracy of the resulting code representation depends on the spacing of
the sampling grid.
The chain code of a boundary depends on the starting point. However, the code can be
normalized with respect to the starting point by a straightforward procedure: we simply treat the
chain code as a circular sequence of direction numbers and redefine the starting point so that
the resulting sequence of numbers forms an integer of minimum magnitude. We can normalize
also for rotation by using the first difference of the chain code instead of the code itself. This
difference is obtained by counting the number of direction changes (in a counterclockwise sense)
that separate two adjacent elements of the code. For instance, the first difference of the
4-direction chain code 10103322 is 3133030. If we treat the code as a circular sequence to
normalize with respect to the starting point, then the first element of the difference is computed
by using the transition between the last and first components of the chain; here the result is
33133030. Size normalization can be achieved by altering the size of the resampling grid.
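These normalization steps can be checked with a few lines of code. The sketch below computes the circular first difference of a 4-directional chain code and the minimum-magnitude starting-point normalization, reproducing the 10103322 -> 33133030 example above; the function names are illustrative.

def first_difference(chain, directions=4):
    # Counterclockwise direction changes between adjacent elements, circular.
    return [(chain[i] - chain[i - 1]) % directions for i in range(len(chain))]

def normalize_start(code):
    # Pick the rotation of the circular sequence forming the smallest integer.
    rotations = [code[i:] + code[:i] for i in range(len(code))]
    return min(rotations, key=lambda seq: int("".join(map(str, seq))))

if __name__ == "__main__":
    chain = [1, 0, 1, 0, 3, 3, 2, 2]           # 4-directional chain code 10103322
    diff = first_difference(chain)             # -> 33133030, as in the text
    print("".join(map(str, diff)))
    print("".join(map(str, normalize_start(diff))))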
These normalizations are exact only if the boundaries themselves are invariant to rotation and
scale change. For instance, the same object digitized in two different orientations will have
different boundary shapes in general, with the degree of dissimilarity being proportional to image
resolution. This effect can be reduced by selecting chain elements that are long in proportion to
the distance between pixels in the digitized image.
SIGNATURES
A signature is a 1-D functional representation of a boundary and may be generated in various
ways. One of the simplest is to plot the distance from the centroid to the boundary as a function
of angle.
Fig 5.2 Distance versus angle signatures r(ϴ). In one case r(ϴ) is constant; in the other, the
signature consists of repetitions of a pattern.
Signatures generated by this approach are invariant to translation, but they do depend on
rotation and scaling. Normalization with respect to rotation can be achieved by finding a way to
select the same starting point to generate the signature, regardless of the shape's orientation.
One way is to select the starting point as the point farthest from the centroid, assuming that this
point is unique for each shape of interest. Another way is to select the point on the eigen axis
that is farthest from the centroid. This method requires more computation but is more rugged,
because the direction of the eigen axis is determined by using all contour points. Yet another
way is to obtain the chain code of the boundary and then use the approach described earlier,
assuming that the coding is coarse enough so that rotation does not affect its circularity.
Based on the assumptions of uniformity in scaling with respect to both axes, and that sampling
is taken at equal intervals of ϴ, changes in the size of a shape result in changes in the amplitude
values of the corresponding signature. One way to normalize for this is to scale all functions so
that they always span the same range of values. The main advantage of this method is simplicity,
but it has the potentially serious disadvantage that scaling of the entire function depends on only
two values, the minimum and maximum. If the shapes are noisy, this dependence can be a
source of significant error from object to object.
A more rugged approach is to divide each sample by the variance of the signature, assuming
that the variance is not zero. Use of the variance yields a variable scaling factor that is inversely
proportional to changes in size and works much as automatic gain control does.
Distance versus angle is not the only way to generate a signature. Another way is to traverse
the boundary and, corresponding to each point on the boundary, plot the angle between a line
tangent to the boundary at that point and a reference line. Although the resulting signature is
quite different from the r(ϴ) curves, it would carry information about basic shape characteristics.
For instance, horizontal segments in the curve would correspond to straight lines along the
boundary, because the tangent angle would be constant there.
BOUNDARY SEGMENTS
Fig 5.3 A region S and its convex deficiency; partitioned boundary
The figure shows an object (set S) and its convex deficiency (shaded regions). The region
boundary can be partitioned by following the contour of S and marking the points at which a
transition is made into or out of a component of the convex deficiency. In principle this scheme
is independent of region size and orientation.
In practice, digital boundaries tend to be irregular because of digitization, noise and variations in
segmentation. These effects result in convex deficiencies that have small, meaningless
components scattered randomly throughout the boundary. Rather than attempt to sort out these
irregularities by post-processing, a common approach is to smooth a boundary prior to
partitioning. There are a number of ways to do so. One way is to traverse the boundary and
replace the coordinates of each pixel by the average coordinates of K of its neighbors along
the boundary. This approach works for small irregularities, but it is time consuming and difficult
to control. Large values of K can result in excessive smoothing, whereas small values of K might
not be sufficient in some segments of the boundary. A more rugged technique is to use a
polygonal approximation prior to finding the convex deficiency of a region. Most digital
boundaries of interest are simple polygons.
The concepts of a convex hull and its deficiency are equally useful for describing an entire
region, as well as just its boundary. For example, description of a region might be based on its
area and the area of its convex deficiency, the relative location of these components.
4.BOUNDARY DESCRIPTORS
SHAPE NUMBERS
The shape number of the boundary based on the 4-directional code is defined as the first
difference of smallest magnitude. The order n of a shape number is defined as the number of
digits in its representation. Moreover n is even for a closed boundary and its value limits the
number of possible different shapes. This figure shows all the shapes of order 4,6 and 8 along
with their chain code representation, first differences and corresponding shape numbers. Note
that the first difference is computed by treating the chain code as a circular sequence. Although
the first difference of a chain code is independent of rotation, in general the coded boundary
depends on the orientation of the grid. One way to normalize the grid orientation is by aligning
the chain code grid with the sides of the basic rectangle defined.
Fig 5.4 All shapes of order 4, 6 and 8. The dots indicate the starting point.
In practice, for a desired shape order we find the rectangle of order n whose eccentricity best
approximates that of the basic rectangle and use this new rectangle to establish the grid size.
Fig 5.5 Steps in the generation of the shape number
5.FOURIER DESCRIPTOR
This figure shows a K-point digital boundary in the xy-plane. Starting at an arbitrary point (x0,y0),
coordinate pairs (x0,y0), (x1,y1), (x2,y2), ..., (xK-1,yK-1) are encountered in traversing the
boundary, say in the clockwise direction. These coordinates can be expressed in the form
x(k) = xk and y(k) = yk. With this notation the boundary itself can be represented as the sequence
of coordinates s(k) = [x(k), y(k)], for k = 0, 1, 2, ..., K−1.
Moreover, each coordinate pair can be treated as a complex number, so that s(k) = x(k) + j·y(k).
The x-axis is treated as the real axis and the y-axis as the imaginary axis of a sequence of
complex numbers. Although the interpretation of the sequence was recast, the nature of the
boundary itself was not changed. This representation has one great advantage: it reduces a 2-D
problem to a 1-D problem.
Fig 5.6 A digital boundary and its representation as a complex sequence. The points (x0,y0)
and (x1,y1) shown are the first two points in the sequence.
The discrete Fourier transform of s(k) is
a(u) = Σ s(k) exp(−j2πuk/K),   u = 0, 1, 2, ..., K−1   (sum over k = 0 to K−1)
The complex coefficients a(u) are called the Fourier descriptors of the boundary. The inverse
Fourier transform of these coefficients restores s(k):
s(k) = (1/K) Σ a(u) exp(j2πuk/K),   k = 0, 1, 2, ..., K−1   (sum over u = 0 to K−1)
Suppose that instead of all the Fourier coefficients, only the first P coefficients are used. This is
equivalent to setting a(u) = 0 for u > P−1. The result is an approximation to s(k). Although only
P terms are used to obtain each component of the approximation, k still ranges from 0 to K−1.
A few Fourier descriptors can be used to capture the gross essence of a boundary. This
property is valuable because these coefficients carry shape information. Thus they can be used
as the basis for differentiating between distinct boundary shapes.
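A minimal sketch of the idea, using NumPy's FFT, is shown below. The boundary is converted to the complex sequence s(k) = x(k) + j·y(k), its descriptors a(u) are computed, and the shape is rebuilt from only P low-order coefficients (a symmetric variant of setting a(u) = 0 for u > P−1). The toy rectangular boundary and the value P = 8 are illustrative assumptions.

import numpy as np

def fourier_descriptors(boundary_xy):
    # Treat boundary points (x, y) as complex numbers and take their DFT.
    s = boundary_xy[:, 0] + 1j * boundary_xy[:, 1]
    return np.fft.fft(s)

def reconstruct(descriptors, P):
    # Keep only the P coefficients of lowest |frequency| and invert.
    a = np.zeros_like(descriptors)
    freqs = np.fft.fftfreq(len(descriptors))
    idx = np.argsort(np.abs(freqs))[:P]
    a[idx] = descriptors[idx]
    s = np.fft.ifft(a)
    return np.column_stack([s.real, s.imag])

if __name__ == "__main__":
    # Toy boundary: 64 points around a 20 x 10 rectangle, traversed in order.
    xs = np.concatenate([np.linspace(0, 20, 16), np.full(16, 20),
                         np.linspace(20, 0, 16), np.full(16, 0)])
    ys = np.concatenate([np.full(16, 0), np.linspace(0, 10, 16),
                         np.full(16, 10), np.linspace(10, 0, 16)])
    a = fourier_descriptors(np.column_stack([xs, ys]))
    approx = reconstruct(a, P=8)    # a few low-order descriptors keep the gross shape
    print(np.round(approx[:4], 2))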
The table summarizes the Fourier descriptors for a boundary sequence s(k) that undergoes
rotation, translation, scaling and changes in starting point.
In other words, translation consists of adding a constant displacement to all coordinates in the
boundary. Note that translation has no effect on the descriptors except for u = 0, which has an
impulse term. Finally, the expression sp(k) = s(k − k0) means redefining the sequence as
which merely changes the starting point of the sequence to k=k0 from k=0.
6.STATISTICAL MOMENTS
The shape of boundary segments can be described quantitatively by using statistical
moments such as mean, variance and higher order moments.
The first figure shows a segment of a boundary, and the second figure shows the segment
represented as a 1-D function g(r) of an arbitrary variable r. This function is obtained by
connecting the two end points of the segment and rotating the line segment until it is horizontal.
The coordinates of the points are rotated by the same angle.
Let us treat the amplitude of g as a discrete random variable v and form an amplitude histogram
p(vi), i = 0, 1, 2, ..., A−1, where A is the number of discrete amplitude increments into which we
divide the amplitude scale. p(vi) is the estimate of the probability of vi.
The nth moment of v about its mean is
μn(v) = Σ (vi − m)ⁿ p(vi)   (sum over i = 0 to A−1)
where
m = Σ vi p(vi)
The quantity m is recognized as the mean or average value of v, and μ2 as its variance.
Generally, only the first few moments are required to differentiate between signatures of clearly
distinct shapes.
An alternative approach is to normalize g(r) to unit area and treat it as a histogram. In other
words, g(ri) is now treated as the probability of the value ri occurring. In this case r is treated as
the random variable and the moments are
μn(r) = Σ (ri − m)ⁿ g(ri)   (sum over i = 0 to K−1)
where
m = Σ ri g(ri)
and K is the number of points on the boundary segment.
The advantage of moments over other techniques is that their implementation is straightforward,
and they also carry a physical interpretation of boundary shape. The insensitivity of this approach
to rotation is clear. Size normalization can be achieved by scaling the range of values of g and r.
7.REGIONAL DESCRIPTORS
transformation. In general, the number of holes will change if the region is torn or folded. Because
stretching affects distance, topological properties do not depend on the notion of distance or on
any properties implicitly based on the concept of a distance measure.
Fig 5.8 A region with two holes
Another topological property useful for region description is the number of connected
components. A region with three connected components is shown in the figure. The number of
holes H and the number of connected components C in a figure can be used to define the Euler
number E:
E = C − H
Fig 5.9 A region with three connected components
The Euler number is also a topological property. The regions shown in the figure have Euler
numbers equal to 0 and −1, respectively, because region A has one connected component and
one hole, and region B has one connected component but two holes.
Regions represented by straight line segments have a particularly simple interpretation
in terms of Euler number. The figure shows a polygonal network. Classifying interior regions of
such a network into faces and holes is often important. Denoting the number of vertices by V,
the number of edges by Q, and the number of faces by F gives the following relationship called
the Euler formula
V-Q+F = C-H
Fig 5.10 A region containing a polygonal network
The network has 7 vertices, 11 edges, 2 faces, 1 connected region and 3 holes. Thus the Euler
number is -2
7-11+2=1-3=-2
Topological descriptors provide an additional feature that is often useful in characterizing regions
in a scene.
8.TEXTURE
Texture provides measures of properties such as smoothness, coarseness and regularity. The
three principal approaches used in image processing to describe the texture of a region are
statistical, structural and spectral. Statistical approaches yield characterizations of textures as
smooth, coarse, grainy and so on. Structural techniques deal with the arrangement of image
primitives, such as the description of texture based on regularly spaced parallel lines. Spectral
techniques are based on properties of the Fourier spectrum and are used primarily to detect
global periodicity in an image by identifying high-energy, narrow peaks in the spectrum.
Statistical Approaches
One of the simplest approaches for describing texture is to use statistical moments of the
intensity histogram of an image or region. Let z be a random variable denoting intensity and let
p(zi), i = 0, 1, 2, ..., L−1, be the corresponding histogram, where L is the number of distinct
intensity levels. The nth moment of z about the mean is
μn(z) = Σ (zi − m)ⁿ p(zi)   (sum over i = 0 to L−1), where m = Σ zi p(zi) is the mean intensity.
The second moment (the variance) is a measure of intensity contrast. The third moment is a
measure of the skewness of the histogram, while the fourth moment is a measure of its relative
flatness. The fifth and higher moments are not so easily related to histogram shape, but they do
provide further quantitative discrimination of the texture content. Some useful additional texture
measures based on histograms include a measure of uniformity, given by
U = Σ p²(zi)   (sum over i = 0 to L−1)
and an average entropy measure, defined as
e = −Σ p(zi) log2 p(zi)   (sum over i = 0 to L−1)
Because the p's have values in the range [0, 1] and their sum equals 1, the measure U is
maximum for an image in which all intensity levels are equal (maximally uniform) and decreases
from there. Entropy is a measure of variability and is 0 for a constant image.
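A minimal sketch of these histogram-based descriptors follows: it computes the mean, the second to fourth central moments, the uniformity U and the entropy e for a constant region and for a noisy region. The function name, image sizes and random seed are illustrative assumptions.

import numpy as np

def texture_descriptors(img, levels=256):
    # Histogram-based texture measures: central moments, U = sum p^2,
    # and entropy e = -sum p log2 p.
    hist = np.bincount(img.ravel(), minlength=levels)
    p = hist / hist.sum()
    z = np.arange(levels)
    mean = np.sum(z * p)
    moments = {n: np.sum(((z - mean) ** n) * p) for n in (2, 3, 4)}
    uniformity = np.sum(p ** 2)
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return mean, moments, uniformity, entropy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    smooth = np.full((64, 64), 128, dtype=np.uint8)           # constant region
    coarse = rng.integers(0, 256, (64, 64), dtype=np.uint8)   # noisy region
    for name, img in (("smooth", smooth), ("coarse", coarse)):
        m, mom, U, e = texture_descriptors(img)
        # Constant region: variance 0, U = 1, entropy 0, as stated above.
        print(name, round(mom[2], 1), round(U, 4), round(e, 2))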
Let Q be an operator that defines the position of two pixels relative to each other, and consider
an image f with L possible intensity levels. Let G be a matrix whose element gij is the number of
times that pixel pairs with intensities zi and zj occur in f in the position specified by Q. A matrix
formed in this manner is referred to as a gray-level co-occurrence matrix. When the meaning is
clear, G is referred to simply as a co-occurrence matrix.
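The construction of G can be sketched in a few lines, assuming the position operator Q is "one pixel immediately to the right"; the function name, the offset convention and the 4-level toy image are illustrative assumptions.

import numpy as np

def cooccurrence_matrix(img, offset=(0, 1), levels=256):
    # g[i, j] counts pixel pairs with intensities (z_i, z_j) at the given offset.
    dr, dc = offset
    rows, cols = img.shape
    G = np.zeros((levels, levels), dtype=int)
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                G[img[r, c], img[rr, cc]] += 1
    return G

if __name__ == "__main__":
    img = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]])
    print(cooccurrence_matrix(img, offset=(0, 1), levels=4))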
Structural Approaches
A second category of texture description is based on structural concepts. Suppose we
have a rule of the form S------->aS, which indicates that the symbol S can be rewritten as aS. If
a represents a circle and the meaning of "circles to the right" is assigned to a string of the form
aaa..., then the rule S-------> aS allows generation of the texture pattern.
Suppose we add some new rules to this scheme: S ---> bA, A ---> cA, A ---> c, A ---> bS,
S ---> a, where the presence of a b means "circle down" and the presence of a c means "circle
to the left". We can then generate a string of the form aaabccbaa that corresponds to a 3 x 3
matrix of circles.
The basic idea is that a simple texture primitive can be used to form more complex texture
patterns by means of some rules that limit the number of possible arrangements of the primitives.
Texture Primitive
Fig 5.11 2D texture pattern generated by this and other rules
Spectral Approaches
The Fourier spectrum is ideally suited for describing the directionality of periodic or almost
periodic 2-D patterns in an image. These global texture patterns are easily distinguishable as
concentrations of high-energy bursts in the spectrum. We consider three features of the Fourier
spectrum that are useful for texture description: 1) prominent peaks in the spectrum give the
principal direction of the texture patterns; 2) the location of the peaks in the frequency plane
gives the fundamental spatial period of the patterns; 3) eliminating any periodic components via
filtering leaves non-periodic image elements, which can then be described by statistical
techniques. Recall that the spectrum is symmetric about the origin, so only half of the frequency
plane needs to be considered. Thus, for the purpose of analysis, every periodic pattern is
associated with only one peak in the spectrum, rather than two.
Detection and interpretation of the spectrum features are simplified by expressing the
spectrum in polar coordinates to yield a function S(r,ϴ), where S is the spectrum function and r
and ϴ are the variables in this coordinate system. For each direction ϴ, S(r,ϴ) may be
considered a 1-D function Sϴ(r). Similarly, for each frequency r, Sr(ϴ) is a 1-D function.
Analysing Sϴ(r) for a fixed value of ϴ yields the behavior of the spectrum along a radial direction
from the origin, whereas analysing Sr(ϴ) for a fixed value of r yields the behavior along a circle
centered on the origin.
A global description is obtained by integrating (summing) these functions:
S(r) = Σϴ Sϴ(r)   and   S(ϴ) = Σr Sr(ϴ)
The results constitute a pair of values [S(r), S(ϴ)] for each pair of coordinates (r, ϴ). By varying
these coordinates we can generate two 1-D functions, S(r) and S(ϴ), that constitute a
spectral-energy description of texture for an entire image or region under consideration.
Descriptors of these functions themselves can be computed in order to characterize their
behavior quantitatively.
9.PATTERNS AND PATTERN CLASSES
A pattern is an arrangement of descriptors. The name feature is used often in the pattern
recognition literature to denote a descriptor. A pattern class is a family of patterns that share
some common properties. Pattern classes are denoted ω1,ω2,...ωw where w is the number of
classes. Pattern recognition by machine involves techniques for assigning patterns to their
respective classes.
Three common pattern arrangements used in practice are vectors, strings and trees. Pattern
vectors are represented by bold lowercase letters such as x, y and z and take the form
x = (x1, x2, ..., xn)ᵀ, where each component xi represents the ith descriptor and n is the total
number of such descriptors associated with the pattern. Pattern vectors are represented as
columns (n x 1 matrices).
The nature of the components of a pattern vector x depends on the approach used to describe
the physical pattern itself. A classic example is the use of discriminant analysis to recognize
three types of iris flowers (Iris setosa, virginica and versicolor) by measuring the width and length
of their petals.
In our present terminology, each flower is described by two measurements which leads
to 2D pattern vector of the form
where x1 and x2 correspond to petal length and width, respectively. The three pattern classes in
this case, denoted ω1, ω2 and ω3, correspond to the varieties setosa, virginica and versicolor,
respectively.
Fig 5.12 Three types of iris flower described by two measurements
Because the petals of flowers vary in width and length, the pattern vectors describing these
flowers also will vary, not only between different classes but also within a class. Each flower in
this figure becomes a point in 2-D Euclidean space. We note that measurements of petal width
and length in this case adequately separated the class of Iris setosa from the other two, but did
not separate the virginica and versicolor types from each other as successfully. This result
illustrates the classic feature-selection problem, in which the degree of class separability
depends strongly on the choice of descriptors selected for an application.
Another example of pattern vector generation involves different types of noisy shapes, a sample
of which is shown in the figure. If we elect to represent each object by its signature, we obtain
1-D signals of the form shown in the figure. We sample each signature at some specified interval
values denoted by ϴ1, ϴ2, ..., ϴn. Then we can form pattern vectors by letting x1 = r(ϴ1),
x2 = r(ϴ2), ..., xn = r(ϴn). These vectors become points in n-dimensional Euclidean space, and
each pattern class can be imagined as a cloud of points in n dimensions.
Instead of using the signature amplitudes directly, we can compute the first n statistical moments
of a given signature and use these descriptors as components of each pattern vector. In fact,
pattern vectors can be generated in numerous different ways.
The techniques for generating pattern vectors yield pattern classes characterized by quantitative
information. In some applications, pattern characteristics are best described by structural
relationships. For example, fingerprint recognition is based on the interrelationships of print
features called minutiae. Together with their relative sizes and locations, these features are
primitive components that describe fingerprint ridge properties, such as abrupt endings,
branching, merging and disconnected segments. Recognition problems of this type, in which not
only quantitative measures about each feature but also the spatial relationships between the
features determine class membership, generally are best solved by structural approaches.
A simple staircase pattern is shown in the figure. This pattern could be sampled and expressed
in terms of a pattern vector, but the basic structure, consisting of repetitions of two simple
primitive elements, would be lost in this method of description. Let the pattern be the string of
symbols w = ...abababab... The structure of this particular class of patterns is captured in this
description by requiring that connectivity be defined in a head-to-tail manner and by allowing only
alternating symbols. This structural construct is applicable to staircases of any length, but
excludes other types of structures that could be generated by other combinations of the
primitives a and b. The structure is coded in terms of the primitives a and b to yield the string
description ...ababab...
PART A
1. What is segmentation?
The first step in image analysis is to segment the image. Segmentation subdivides an image into
its constituent parts or objects.
2. Write the applications of segmentation
The applications of segmentation are: i. Detection of isolated points. ii. Detection of lines and
edges in an image.
4. How is a discontinuity detected in an image using segmentation?
The steps used to detect a discontinuity in an image are:
i. Compute the sum of the products of the mask coefficients with the gray levels contained in the
region encompassed by the mask.
ii. The response of the mask at any point in the image is
R = w1z1 + w2z2 + w3z3 + ... + w9z9
iii. where zi is the gray level of the pixel associated with mask coefficient wi.
iv. The response of the mask is defined with respect to its center location.
5. Why is edge detection the most common approach for detecting discontinuities?
Isolated points and thin lines are not frequent occurrences in most practical applications, so edge
detection is mostly preferred for the detection of discontinuities.
6. How are the derivatives obtained in edge detection during formulation?
The first derivative at any point in an image is obtained by using the magnitude of the gradient
at that point; the second derivative is similarly obtained by using the Laplacian.
The approach for linking edge points is to analyse the characteristics of pixels in a small
neighborhood (3x3 or 5x5) about every point (x,y) in an image that has undergone edge
detection. All points that are similar are linked, forming a boundary of pixels that share some
common properties.
An edge is a set of connected pixels that lie on the boundary between two regions. Edges are
more closely modeled as having a ramp like profile. The slope of the ramp is
inversely proportional to the degree of blurring in the edge.
10. What is meant by object point and background point?
To extract the objects from the background, select a threshold T that separates the object and
background modes. Then any point (x,y) for which f(x,y) > T is called an object point; otherwise
the point is called a background point.
11. Define region growing.
Region growing is a procedure that groups pixels or subregions into larger regions based on
predefined criteria. The basic approach is to start with a set of seed points and, from these, grow
regions by appending to each seed those neighboring pixels that have properties similar to the
seed.
12. What are markers?
An approach used to control over-segmentation is based on markers. A marker is a connected
component belonging to an image. We have internal markers, associated with objects of interest,
and external markers, associated with the background.
13. What are the two principal steps involved in marker selection?
The two steps are:
i. Preprocessing. ii. Definition of a set of criteria that markers must satisfy.
14. What is pattern?
gi nee
Pattern is a quantitative or structural description of an object or some other entity of interest in
an image. It is formed by one or more descriptors.
15. What is a pattern class?
It is a family of patterns that share some common properties. Pattern classes are denoted as
ω1, ω2, ..., ωw, where w is the number of classes.
Chain codes are used to represent a boundary by a connected sequence of straight-line
segments of specified length and direction. Typically, this representation is based on 4- or
8-connectivity of the segments. The direction of each segment is coded by using a numbering
scheme.
19. What are the demerits of chain code?
i. The resulting chain of codes tends to be quite long.
ii. Any small disturbance along the boundary due to noise or imperfect segmentation causes
changes in the code that may not be related to the principal shape features of the boundary.
A digital boundary can be approximated with arbitrary accuracy by a polygon. For a closed curve,
the approximation is exact when the number of segments in the polygon is equal to the number
of points in the boundary, so that each pair of adjacent points defines a segment in the polygon.
21.Specify the various polygonal approximation methods.
The various polygonal approximation methods are
i.Minimum perimeter polygons.
ii.Merging techniques.
iii.Splitting techniques.
22.Name few boundary descriptors.
i.Simple descriptors.
ii.Shape descriptors.
iii.Fourier descriptors.
23. Define length of a boundary.
The length of a boundary is the number of pixels along the boundary. For example, for a
chain-coded curve with unit spacing in both directions, the number of vertical and horizontal
components plus √2 times the number of diagonal components gives its exact length.
24. Define shape numbers.
Shape number is defined as the first difference of smallest magnitude. The order n of a shape
number is the number of digits in its representation.
25. Name few measures used as simple descriptors in region descriptors
i. Area.
ii. Perimeter.
iii. Mean and median gray levels.
The approaches used to describe the texture of a region are:
i. Statistical approach.
ii. Structural approach.
iii. Spectral approach.
Part B
asy
1. Write short notes on image segmentation.
2. Write short notes on edge detection.
a) Shape Number
b) Fourier Descriptor
c) Moments
17.Explain in detail about regional descriptor.
18. Write a brief note on Recognition based on Matching.