Digital Image Processing Notes 2020
Syllabus
Unit I
Unit II
Image Enhancement: In spatial domain: Basic gray level transformations, Histogram processing,
enhancement using arithmetic/logic operations, smoothing spatial filters, sharpening
spatial filters.
In Frequency domain: Introduction to the Fourier transform and frequency domain concepts,
smoothing frequency-domain filters, Sharpening frequency domain filters.
Unit III
Image Restoration and Colour Image processing: Various noise models, image restoration using
spatial domain filtering, image restoration using frequency domain filtering, Estimating the
degradation function, Inverse filtering.
Colour fundamentals, Colour models, Colour transformation, Smoothing and Sharpening, Colour
segmentation
Unit IV
Text Books:
Table of Contents
1.1 Introduction
The field of digital image processing refers to processing digital images by means of a digital
computer. A digital image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called picture elements, image elements, pels
and pixels. Pixel is the term used most widely to denote the elements of a digital image.
An image is a two-dimensional function that represents a measure of some characteristic such
as brightness or color of a viewed scene. An image is a projection of a 3-D scene into a 2D
projection plane.
An image may be defined as a two-dimensional function f (x, y), where x and y are spatial
(plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the
intensity of the image at that point.
The term gray level is used often to refer to the intensity of monochrome images. Color images
are formed by a combination of individual 2-D images.
For example, in the RGB color system a color image consists of three individual component
images (red, green and blue). For this reason, many of the techniques developed for
monochrome images can be extended to color images by processing the three component
images individually.
An image may be continuous with respect to the x- and y- coordinates and also in amplitude.
Converting such an image to digital form requires that the coordinates, as well as the amplitude,
be digitized.
1.2 Light and Electromagnetic Spectrum
Light, or Visible Light, commonly refers to electromagnetic radiation that can be detected by
the human eye. The entire electromagnetic spectrum is extremely broad, ranging from low-energy
radio waves with wavelengths measured in meters, to high-energy gamma rays
with wavelengths of less than 1 × 10⁻¹¹ meters. Electromagnetic radiation, as the name
suggests, describes fluctuations of electric and magnetic fields, transporting energy at the
Speed of Light (which is ~ 300,000 km/sec through a vacuum). Light can also be described in
terms of a stream of photons, massless packets of energy, each travelling with wavelike
properties at the speed of light.
Visible light is not inherently different from the other parts of the electromagnetic spectrum,
except that the human eye can detect visible waves. It corresponds to only a very narrow
window of the electromagnetic spectrum, ranging from about 400 nm for violet light to
700 nm for red light. Radiation with wavelengths shorter than 400 nm is referred to as
ultraviolet (UV) and radiation with wavelengths longer than 700 nm is referred to as infrared
(IR), neither of which can be detected by the human eye.
Gamma rays: Gamma rays are the highest-frequency (shortest-wavelength) electromagnetic
radiation and therefore carry a lot of energy. They are used in radiotherapy to kill cancerous cells.
X-rays: X-rays are electromagnetic radiation of extremely short wavelength and high frequency.
They are used to detect bone fractures, cavities and impacted wisdom teeth.
Ultraviolet rays: Ultraviolet radiation comes from the sun and is transmitted in waves or
particles at different wavelengths and frequencies. Uses of UV light include tanning,
detecting forged bank notes in shops, and hardening some types of dental filling.
Visible light: Visible light is a very narrow band of frequencies of electromagnetic waves that
are perceptible by the human eye. The eye contains specialized cells called rods and cones that
are sensitive to the visible spectrum. As mentioned previously, most of us see visible light
every day. For example, the sun produces visible light. Incandescent light bulbs, fluorescent,
and neon lights are other examples of visible light that we may see on a regular basis.
Infrared rays: Infrared radiation (IR), sometimes known as infrared light, is electromagnetic
radiation (EMR) with wavelengths longer than those of visible light. It is used in sensors and
TV remote
Microwave: An electromagnetic wave with a frequency in the range of about 100 megahertz to 30
gigahertz (lower than infrared but higher than other radio waves). Microwaves are used in
radar, radio transmission, cooking, and other applications.
Radio wave: Radio waves occupy the portion of the electromagnetic spectrum at lower
frequencies than microwaves. They are used in standard broadcast radio and television.
1.3 Components of Image Processing
Image sensors: With reference to sensing, two elements are required to acquire a digital image.
The first is a physical device that is sensitive to the energy radiated by the object we wish to
image, and the second is a digitizer that converts the sensed output into digital form.
Specialized image processing hardware: It consists of the digitizer just mentioned, plus
hardware that performs other primitive operations, such as an arithmetic logic unit (ALU), which
performs arithmetic (such as addition and subtraction) and logical operations in parallel on images.
Computer: It is a general-purpose computer and can range from a PC to a supercomputer,
depending on the application. In dedicated applications, specially designed computers are
sometimes used to achieve a required level of performance.
Software: It consists of specialized modules that perform specific tasks. A well-designed
package also includes the capability for the user to write code that, as a minimum, utilizes the
specialized modules. More sophisticated software packages allow the integration of these
modules.
Mass storage: This capability is a must in image processing applications. An image of size
1024 x 1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires one
megabyte of storage space if the image is not compressed. Image processing applications fall
into three principal categories of storage:
i) Short term storage for use during processing
ii) On line storage for relatively fast retrieval
iii) Archival storage such as magnetic tapes and disks
Image display: Image displays in use today are mainly color TV monitors. These monitors are
driven by the outputs of image and graphics displays cards that are an integral part of computer
system.
Hardcopy devices: The devices for recording images include laser printers, film cameras, heat-sensitive
devices, inkjet units and digital units such as optical and CD-ROM disks. Film provides
the highest possible resolution, but paper is the obvious medium of choice for written
applications.
Networking: It is almost a default function in any computer system in use today. Because of
the large amount of data inherent in image processing applications, the key consideration in
image transmission is bandwidth.
1.4 Image formation and Digitization
In order to become suitable for digital processing, an image function f(x, y) must be digitized
both spatially and in amplitude. Typically, a frame grabber or digitizer is used to sample and
quantize the analogue video signal. Hence, to create a digital image, we need to convert
continuous data into digital form. This is done in two steps:
Sampling
Quantization
Sampling
Mathematically, sampling can be defined as mapping the signal's domain from the continuous
space R to the discrete space N. Sampling of a signal is the process of creating a discrete signal
from a continuous one, in such a way that the values (samples) are taken only at certain places
(or with certain time steps) from the original continuous signal. Thus, the image can be seen as a matrix.
Quantization
Quantization corresponds to a discretization of the intensity values, that is, of the co-domain
of the function. After sampling and quantization, we get f : [1, ..., N] × [1, ..., M] → [0, ..., L].
Typically, 256 levels (8 bits/pixel) suffice to represent the intensity. For color images,
256 levels are usually used for each color component.
1.5 Neighbors of pixels, adjacency and connectivity
1.5.1 Neighbors of pixels
a. N4(p): 4-neighbors of p:
Any pixel p at (x, y) has two vertical and two horizontal neighbors, given by (x+1, y),
(x-1, y), (x, y+1), (x, y-1).
This set of pixels is called the 4-neighbors of p and is denoted by N4(p).
Each of them is at a unit distance from p.
b. ND(p): diagonal neighbors of p:
The four diagonal neighbors of p have coordinates (x+1, y+1), (x+1, y-1), (x-1, y+1),
(x-1, y-1) and are denoted by ND(p).
c. N8(p): 8-neighbors of p:
N4(p) and ND(p) together are called the 8-neighbors of p, denoted by N8(p).
N8(p) = N4(p) ∪ ND(p)
Some of the points in N4, ND and N8 may fall outside the image when p lies on
the border of the image.
Two pixels are connected if they are neighbors and their gray levels satisfy some
specified criterion of similarity.
For example, in a binary image two pixels are connected if they are 4-neighbors and
have the same value (0 or 1).
Let V be a set of intensity values used to define adjacency and connectivity.
In a binary image V = {1}, if we are referring to adjacency of pixels with value 1.
In a gray-scale image the idea is the same, but V typically contains more elements,
for example V = {180, 181, 182, ..., 200}.
If the possible intensity values are 0 to 255, V could be any subset of these 256 values.
Types of adjacency
1. 4-adjacency: Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
2. 8-adjacency: Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
3. m-adjacency (mixed): Two pixels p and q with values from V are m-adjacent if:
q is in N4(p), or
q is in ND(p) and the set N4(p) ∩ N4(q) has no pixel whose value is from V (no intersection).
Mixed adjacency is a modification of 8-adjacency introduced to eliminate the
ambiguities that often arise when 8-adjacency is used (it eliminates multiple-path
connections).
Pixel arrangement as shown in the figure for V = {1}.
1.6 Region and Boundaries
A 4-path between the two regions does not exist, so their union is not a connected set under
4-adjacency.
Suppose an image contains K disjoint regions Rk, k = 1, 2, ..., K, none of which touches the
image border.
Let Ru denote the union of all the K regions and (Ru)c denote its complement (the complement
of a set S is the set of points that are not in S). Ru is called the foreground and (Ru)c is called
the background of the image.
The boundary (border or contour) of a region R is the set of points that are adjacent to
points in the complement of R.
1.7 Distance Measures
For pixels p and q with coordinates (x, y) and (s, t) respectively, the Euclidean distance between p and q is defined as:
De(p, q) = [(x - s)² + (y - t)²]^(1/2)
Pixels having a distance less than or equal to some value r from (x,y) are the points
contained in a disk of radius ‘r’ centered at (x,y)
The D4 distance (also called city-block distance) between p and q is defined as:
D4(p, q) = |x - s| + |y - t|
Pixels having a D4 distance from (x,y), less than or equal to some value r form a
Diamond centered at (x,y)
Example:
The pixels with distance D4 ≤ 2 from (x, y) form the following contours of constant distance.
The pixels with D4 = 1 are the 4-neighbors of (x,y)
The D8 distance (also called chessboard distance) between p and q is defined as:
D8(p, q) = max(|x - s|, |y - t|)
Pixels having a D8 distance from (x,y), less than or equal to some value r form a square
Centered at (x,y).
Example:
The pixels with D8 distance ≤ 2 from (x, y) form the following contours of constant distance.
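The three distance measures can be checked with a small plain-Python sketch; the centre pixel and the 5x5 grid below are illustrative choices, not taken from the notes:

def euclidean_dist(p, q):
    # De(p, q) = [(x - s)^2 + (y - t)^2]^(1/2)
    (x, y), (s, t) = p, q
    return ((x - s) ** 2 + (y - t) ** 2) ** 0.5

def d4_dist(p, q):
    # City-block distance: |x - s| + |y - t|
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8_dist(p, q):
    # Chessboard distance: max(|x - s|, |y - t|)
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

centre = (2, 2)
for i in range(5):
    print([d4_dist(centre, (i, j)) for j in range(5)])
# Entries with value <= 2 form a diamond around the centre; swapping in
# d8_dist instead gives a square of side 5 centred on the same pixel.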
Dm distance:
It is defined as the shortest m-path between the points.
In this case, the distance between two pixels will depend on the values of the
pixels along the path, as well as the values of their neighbors.
• Example:
Consider the following arrangement of pixels and assume that p, p2, and p4
have value 1 and that p1 and p3 can have a value of 0 or 1. Suppose
that we consider adjacency of pixels with value 1 (i.e., V = {1}).
Case 1: If p1 = 0 and p3 = 0
The length of the shortest m-path (the Dm distance) is 2 (p, p2, p4).
Case 2: If p1 = 1 and p3 = 0
Now p and p2 are no longer m-adjacent (see the m-adjacency definition), so the length
of the shortest m-path becomes 3 (p, p1, p2, p4).
Case 3: If p1 = 0 and p3 = 1
The same applies here, and the shortest m-path will be 3 (p, p2, p3, p4).
Case 4: If p1 = 1 and p3 = 1
The length of the shortest m-path will be 4 (p, p1, p2, p3, p4).
1.8 Applications
Since digital image processing has very wide applications and almost all of the technical
fields are impacted by DIP, we will just discuss some of the major applications of DIP.
Digital image processing has a broad spectrum of applications, such as
Remote sensing via satellites and other spacecrafts
Image transmission and storage for business applications
Medical processing,
RADAR (Radio Detection and Ranging)
SONAR(Sound Navigation and Ranging) and
Acoustic image processing (The study of underwater sound is known as
underwater acoustics or hydro acoustics.)
Robotics and automated inspection of industrial parts.
Images acquired by satellites are useful in tracking of
Earth resources;
Geographical mapping;
Prediction of agricultural crops,
Urban growth and weather monitoring
Flood and fire control and many other environmental applications.
Space image applications include:
Recognition and analysis of objects contained in images obtained from deep space-
probe missions.
Image transmission and storage applications occur in broadcast television
Teleconferencing
Transmission of facsimile images (Printed documents and graphics) for office
automation
Communication over computer networks
Closed-circuit television based security monitoring systems and
In military communications.
Medical applications:
Processing of chest X- rays
Cineangiograms
Projection images of transaxial tomography and
Medical images that occur in radiology nuclear magnetic resonance(NMR)
Ultrasonic scanning
UNIT II
IMAGE ENHANCEMENTS
g(x,y) = T[f(x,y)]
where f(x, y) is the input image, g(x, y) is the processed image, and T is an operator on f,
defined over some neighborhood of (x, y). The principal approach in defining a neighborhood
about a point (x, y) is to use a square or rectangular subimage area centered at (x, y), as Fig.
shows. The center of the subimage is moved from pixel to pixel starting, say, at the top left
corner. The operator T is applied at each location (x, y) to yield the output, g, at that location.
The process utilizes only the pixels in the area of the image spanned by the neighborhood.
The simplest form of T is when the neighbourhood is of size 1*1 (that is, a single pixel). In
this case, g depends only on the value of f at (x, y), and T becomes a gray-level (also called an
intensity or mapping) transformation function of the form
s = T (r)
where r denotes the gray level of the input pixel and s the gray level of the corresponding
output pixel. T is a transformation function that maps each value of r into a value of s.
For example, if T(r) has the form shown in Fig. 2.2(a), the effect of this transformation would
be to produce an image of higher contrast than the original by darkening the levels below m
and brightening the levels above m in the original image. In this technique, known as contrast
stretching, the values of r below m are compressed by the transformation function into a narrow
range of s, toward black. The opposite effect takes place for values of r above m.
In the limiting case shown in Fig. 2.2(b), T(r) produces a two-level (binary) image. A mapping
of this form is called a thresholding function.
One of the principal approaches in this formulation is based on the use of so-called masks (also
referred to as filters, kernels, templates, or windows). Basically, a mask is a small (say, 3*3)
2-D array, such as the one shown in Fig. 2.1, in which the values of the mask coefficients
determine the nature of the process, such as image sharpening. Enhancement techniques based
on this type of approach often are referred to as mask processing or filtering.
Image enhancement can be done through gray level transformations which are discussed
below.
1. 2. Basic gray level transformations:
Linear transformation
Log transformations
Power law transformations
Piecewise-Linear transformation functions
(a) Linear Transformation:
Linear transformation includes the identity and negative transformations. The identity
transformation is shown by a straight line: each value of the input image is mapped directly
to the same value in the output image, so the output image is identical to the input image,
hence the name identity transformation. It has been shown below:
Image negative or negative transformation: The negative of an image with gray-level values in
the range [0, L-1] is obtained by the negative transformation s = T(r) = L - 1 - r,
where r is the gray-level value at pixel (x, y) and L is the number of gray levels in the image.
The result looks like a photographic negative. It is useful for enhancing white or gray detail
embedded in dark regions of an image. The overall graph of these transformations is shown below.
Fig. Some basic gray-level transformation functions used for image enhancement.
Fig. Negative transformations.
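As a rough illustration, the negative transformation can be sketched in NumPy as follows; the synthetic array img stands in for a real 8-bit image and L = 256 is assumed:

import numpy as np

def negative(image, L=256):
    # s = (L - 1) - r : white detail in dark regions becomes easier to see
    return (L - 1) - image

img = np.arange(0, 256, dtype=np.uint8).reshape(16, 16)   # stand-in for a real image
neg = negative(img)
print(img[0, :4], "->", neg[0, :4])    # [0 1 2 3] -> [255 254 253 252]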
(b) Log transformations:
The log transformations can be defined by this formula
s = c log (r + 1).
where s and r are the pixel values of the output and input images and c is a constant. The
value 1 is added to each pixel value of the input image because if there is a pixel intensity
of 0 in the image, then log(0) is undefined. So 1 is added to make the argument of the
logarithm at least 1.
During log transformation, the dark pixels in an image are expanded compared to the higher
pixel values, while the higher pixel values are compressed. This results in the following image
enhancement.
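A minimal NumPy sketch of the log transformation, assuming an 8-bit image and choosing the constant c so that the largest input value maps to L-1 (one common convention, not the only one):

import numpy as np

def log_transform(image, L=256):
    # s = c * log(1 + r); assumes the image is not completely black (r.max() > 0)
    r = image.astype(np.float64)
    c = (L - 1) / np.log(1 + r.max())
    return (c * np.log1p(r)).astype(np.uint8)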
(c) Power-law (gamma) transformations:
There are two further transformations of the power-law type, the nth power and nth root
transformations. These transformations can be given by the expression:
s = c·r^γ   (6)
The symbol γ is called gamma, due to which this transformation is also known as the gamma
transformation.
Variation in the value of γ varies the enhancement of the images. Different display devices /
monitors have their own gamma correction, that is why they display their image at
different intensity.
where c and γ are positive constants. Sometimes Eq. (6) is written as s = c(r + ε)^γ
to account for an offset (that is, a measurable output when the input is zero). Plots of s versus
r for various values of γ are shown in the figure. As in the case of the log transformation, power-law
curves with fractional values of γ map a narrow range of dark input values into a wider range
of output values, with the opposite being true for higher values of input levels. Unlike the log
function, however, we notice here a family of possible transformation curves obtained simply
by varying γ.
In the figure, curves generated with values of γ > 1 have exactly the opposite effect of those
generated with values of γ < 1. Finally, note that Eq. (6) reduces to the identity
transformation when c = γ = 1.
Fig. Plot of the equation s = c·r^γ for various values of γ (c = 1 in all cases).
This type of transformation is used for enhancing images for different types of display devices.
The gamma of different display devices is different. For example, the gamma of a CRT lies
between 1.8 and 2.5, which means the image displayed on a CRT appears darker than intended.
Varying gamma (γ) gives a family of possible transformation curves s = c·r^γ, where c and γ
are positive constants:
γ > 1 compresses dark values and expands bright values.
γ < 1 (similar to the log transformation) expands dark values and compresses bright values.
When c = γ = 1, the expression reduces to the identity transformation.
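A hedged NumPy sketch of the power-law (gamma) transformation, assuming intensities are first normalized to [0, 1] and then rescaled back to 8 bits (function name is illustrative):

import numpy as np

def gamma_transform(image, gamma, c=1.0, L=256):
    # s = c * r^gamma, computed on intensities normalized to [0, 1]
    r = image.astype(np.float64) / (L - 1)
    s = c * np.power(r, gamma)
    return np.clip(s * (L - 1), 0, L - 1).astype(np.uint8)

# gamma < 1 expands dark values (brightens), gamma > 1 compresses them (darkens):
# bright = gamma_transform(img, 0.4)
# dark   = gamma_transform(img, 2.2)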
(d) Piecewise-linear transformation functions:
A complementary approach to the methods discussed above is to use piecewise linear functions.
The principal advantage of piecewise linear functions over the types of functions we have
discussed thus far is that the form of piecewise functions can be arbitrarily complex.
Contrast stretching: One of the simplest piecewise linear functions is a contrast-stretching
transformation. Low-contrast images can result from poor illumination, lack of dynamic range
in the imaging sensor, or even a wrong setting of the lens aperture during image acquisition.
s = T(r)
Figure x(a) shows a typical transformation used for contrast stretching. The locations of points
(r1, s1) and (r2, s2) control the shape of the transformation function. If r1 = s1 and r2 = s2, the
transformation is a linear function that produces no changes in gray levels. If r1 = r2, s1 = 0 and
s2 = L-1, the transformation becomes a thresholding function that creates a binary image, as
illustrated in the figure.
Intermediate values of (r1, s1) and (r2, s2) produce various degrees of spread in the gray levels
of the output image, thus affecting its contrast. In general, r1 ≤ r2 and s1 ≤ s2 is assumed so that
the function is single-valued and monotonically increasing.
Fig. x Contrast stretching. (a) Form of the transformation function. (b) A low-contrast image.
(c) Result of contrast stretching. (d) Result of thresholding. (Original image courtesy of
Dr. Roger Heady, Research School of Biological Sciences, Australian National University,
Canberra, Australia.)
Figure x(b) shows an 8-bit image with low contrast. Fig. x(c) shows the result of contrast
stretching, obtained by setting (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L-1) where rmin and rmax
denote the minimum and maximum gray levels in the image, respectively. Thus, the
transformation function stretched the levels linearly from their original range to the full range
[0, L-1]. Finally, Fig. x(d) shows the result of using the thresholding function defined
previously, with r1=r2=m, the mean gray level in the image. The original image on which these
results are based is a scanning electron microscope image of pollen, magnified approximately
700 times.
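The contrast-stretching function can be viewed as a piecewise-linear mapping through (0, 0), (r1, s1), (r2, s2) and (L-1, L-1). The NumPy sketch below assumes 0 < r1 ≤ r2 < L-1 and is only an illustration of the idea:

import numpy as np

def contrast_stretch(image, r1, s1, r2, s2, L=256):
    # Piecewise-linear mapping through (0,0), (r1,s1), (r2,s2), (L-1, L-1)
    r = image.astype(np.float64)
    xp = [0, r1, r2, L - 1]          # input breakpoints
    fp = [0, s1, s2, L - 1]          # output breakpoints
    s = np.interp(r, xp, fp)         # linear interpolation between breakpoints
    return s.astype(np.uint8)

# Full-range stretch of a low-contrast image (breakpoints assumed strictly increasing):
# stretched = contrast_stretch(img, int(img.min()), 0, int(img.max()), 255)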
Gray-level slicing
Fig. y (a)This transformation highlights range [A, B] of gray levels and reduces all others to a
constant level (b) This transformation highlights range [A, B] but preserves all other levels.
(c) An image. (d) Result of using the transformation in (a).
Bit-plane slicing:
Instead of highlighting gray-level ranges, highlighting the contribution made to total image
appearance by specific bits might be desired. Suppose that each pixel in an image is represented
by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from bit-plane 0
for the least significant bit to bit plane 7 for the most significant bit. In terms of 8-bit bytes,
plane 0 contains all the lowest order bits in the bytes comprising the pixels in the image and
plane 7 contains all the high-order bits.
Figure 3.12 illustrates these ideas, and Fig. 3.14 shows the various bit planes for the image
shown in Fig. 3.13. Note that the higher-order bits (especially the top four) contain the majority
of the visually significant data. The other bit planes contribute to subtler details in the image.
Separating a digital image into its bit planes is useful for analysing the relative importance
played by each bit of the image, a process that aids in determining the adequacy
of the number of bits used to quantize each pixel.
In terms of bit-plane extraction for an 8-bit image, it is not difficult to show that the (binary)
image for bit-plane 7 can be obtained by processing the input image with a thresholding gray-level
transformation function that (1) maps all levels in the image between 0 and 127 to one
level (for example, 0); and (2) maps all levels between 128 and 255 to another (for example,
255). The binary image for bit-plane 7 in the figure was obtained in just this manner.
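Bit-plane extraction reduces to shifting and masking. A small NumPy sketch (the 2x2 array is a made-up example):

import numpy as np

def bit_plane(image, plane):
    # Extract bit-plane `plane` (0 = least significant, 7 = most significant)
    return (image >> plane) & 1

img = np.array([[200, 100],
                [ 15, 130]], dtype=np.uint8)
print(bit_plane(img, 7))   # 1 wherever the pixel is >= 128: [[1 0] [0 1]]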
1. 3. Histogram Processing:
The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function of
the form
h(rk) = nk
where rk is the kth gray level and nk is the number of pixels in the image having the level rk.
A normalized histogram is given by p(rk) = nk / n, where n is the total number of pixels in the image.
p(rk) gives an estimate of the probability of occurrence of gray level rk. The sum of all
components of a normalized histogram is equal to 1. The histogram plots are simply plots of
h(rk) = nk versus rk.
In a dark image the components of the histogram are concentrated on the low (dark) side of
the gray scale. In a bright image the histogram components are biased towards the high
side of the gray scale. The histogram of a low-contrast image will be narrow and centred
towards the middle of the gray scale.
The components of the histogram of a high-contrast image cover a broad range of the gray
scale. The net effect of this is an image that shows a great deal of gray-level detail and
has a high dynamic range.
Histogram Equalization:
Let r denote the gray levels of the image to be enhanced, treated as a continuous quantity. The
range of r is [0, 1], with r = 0 representing black and r = 1 representing white. The transformation
function is of the form
s = T(r), where 0 ≤ r ≤ 1
The transformation function is assumed to fulfil two conditions: (a) T(r) is single-valued and
monotonically increasing in the interval 0 ≤ r ≤ 1; and (b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1. The
transformation function should be single-valued so that the inverse transformation exists, and
the monotonically increasing condition preserves the increasing order from black to white in
the output image. The second condition guarantees that the output gray levels will be in the
same range as the input levels. The gray levels of the image may be viewed as random variables
in the interval [0, 1]. The most fundamental descriptor of a random variable is its probability
density function (PDF). Let pr(r) and ps(s) denote the probability density functions of the
random variables r and s respectively. A basic result from elementary probability theory states
that if pr(r) and T(r) are known and T⁻¹(s) satisfies condition (a), then the probability density
function ps(s) of the transformed variable is given by
ps(s) = pr(r) |dr/ds|
Thus the PDF of the transformed variable s is determined by the gray-level PDF of the input
image and by the chosen transformation function. A transformation function of particular
importance in image processing is the cumulative distribution function (CDF) of r,
s = T(r) = ∫ pr(w) dw, integrated from w = 0 to w = r,
which produces a uniform ps(s) and therefore an equalized histogram.
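For digital images this transformation is applied in its discrete form, s_k = (L-1) · Σ p_r(r_j), summed over j = 0, ..., k. A compact NumPy sketch of discrete histogram equalization (the function name and the 8-bit assumption are illustrative):

import numpy as np

def histogram_equalize(image, L=256):
    # Discrete equalization: s_k = round((L-1) * sum of p_r(r_j) for j <= k)
    hist = np.bincount(image.ravel(), minlength=L)   # n_k
    pdf = hist / image.size                          # p_r(r_k) = n_k / n
    cdf = np.cumsum(pdf)                             # the transformation T(r_k)
    mapping = np.round((L - 1) * cdf).astype(np.uint8)
    return mapping[image]                            # look up T(r) for every pixel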
Image Addition:
Blending two images: H(x,y) = α I(x,y) + (1-α) J(x,y)
Applications: Brightening an image, Image Compositing, (Additive) Dissolves
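A minimal sketch of blending in NumPy, assuming two 8-bit images I and J of the same size:

import numpy as np

def blend(I, J, alpha):
    # H(x, y) = alpha * I(x, y) + (1 - alpha) * J(x, y), with alpha in [0, 1]
    h = alpha * I.astype(np.float64) + (1 - alpha) * J.astype(np.float64)
    return np.clip(h, 0, 255).astype(np.uint8)

# Sweeping alpha from 0 to 1 produces an additive dissolve from J to I.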
Image Subtraction:
A standard logical operation can be performed between images such as NOT, OR, XOR, and
AND. In general, logical operation is performed between each corresponding bit of the image
pixel representation (i.e. a bit-wise operator).
NOT (inversion): This inverts the image representation. In the simplest case of a
binary image, the (black) background pixels become (white) and vice versa.
OR/XOR: are useful for processing binary-valued images (0 or 1) to detect objects
which have moved between frames. Binary objects are typically produced through
application of thresholding to a grey-scale image.
Logical AND: is commonly used for detecting differences in images, highlighting
target regions with a binary mask or producing bit-planes through an image.
1. 5 Smoothing Spatial Filters:
Smoothing filters are used for blurring and for noise reduction.
Blurring is used in pre-processing steps, such as removal of small details from an
image prior to object extraction, and bridging of small gaps in lines or curves.
Noise reduction can be accomplished by blurring with a linear filter and also by
nonlinear filtering.
Example:
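As a sketch of a simple averaging (box) filter in NumPy, using an explicit loop and zero padding at the borders (a real implementation would normally use a library convolution instead):

import numpy as np

def average_filter(image, size=3):
    # Replace each pixel by the mean of its size x size neighbourhood (zero-padded borders)
    pad = size // 2
    padded = np.pad(image.astype(np.float64), pad, mode='constant')
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out.astype(np.uint8)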
1. 6 Sharpening Spatial filters:
To highlight fine detail in an image or to enhance detail that has been blurred, either in
error or as a natural effect of a particular method of image acquisition.
Uses of image sharpening vary and include applications ranging from electronic
printing and medical imaging to industrial inspection and autonomous target detection
in smart weapons.
The shape of the impulse response needed to implement a high pass spatial filter
indicates that the filter should have positive coefficients near its centre, and negative
coefficients in the outer periphery.
Example: filter mask of a 3x3 sharpening filter
The output of the filter might have gray levels outside the range [0, L-1].
The results of high-pass filtering therefore involve some form of scaling and/or clipping to
make sure that the gray levels of the final result are within [0, L-1].
Differentiation can be expected to have the opposite effect of averaging, which tends
to blur detail in an image; it therefore sharpens an image and can be used to detect edges.
The most common method of differentiation in image processing applications is the
gradient.
For a function f(x, y), the gradient of f at coordinates (x, y) is defined as the vector
∇f = [∂f/∂x, ∂f/∂y]ᵀ
and its magnitude is |∇f| = [(∂f/∂x)² + (∂f/∂y)²]^(1/2).
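A rough NumPy sketch of a gradient-magnitude computation using simple first differences (Sobel or Prewitt masks are what is usually used in practice; this only illustrates the idea):

import numpy as np

def gradient_magnitude(image):
    # Approximate |grad f| with first differences along the two axes
    f = image.astype(np.float64)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]     # partial derivative along x (rows)
    gy[:, :-1] = f[:, 1:] - f[:, :-1]     # partial derivative along y (columns)
    return np.sqrt(gx ** 2 + gy ** 2)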
2. Image enhancement in frequency domain
2. 1. Introduction
In frequency-domain processing, a digital image is converted from the spatial domain to the
frequency domain, where filtering is used to enhance the image for a specific application. The
fast Fourier transform is the tool used to convert an image from the spatial domain to the
frequency domain. For smoothing an image a low-pass filter is implemented, and for sharpening
an image a high-pass filter is implemented. Each kind of filter can be analysed in its ideal,
Butterworth and Gaussian forms.
The basic principle of frequency-domain analysis in image filtering is to compute the 2-D
discrete Fourier transform of the image.
2.3 Fourier transformation: The Fourier transformation is a fundamental tool for image processing.
It is used for decomposing an image into sine and cosine components. The input image is in the
spatial domain and the output is represented in the Fourier or frequency domain. The Fourier
transformation is used in a wide range of applications such as image filtering, image
compression, image analysis and image reconstruction.
Example:
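A minimal NumPy sketch of computing the Fourier spectrum of an image; the centring and log scaling are the usual display conventions assumed here:

import numpy as np

def fourier_spectrum(image):
    # 2-D DFT of the image, centred, log-scaled for display
    F = np.fft.fft2(image.astype(np.float64))
    F = np.fft.fftshift(F)                # move the zero-frequency term to the centre
    return np.log1p(np.abs(F))            # log scaling compresses the dynamic range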
2. 4. Smoothing frequency domain filters:
Ideal low-pass filter (ILPF): An ideal low-pass filter cuts off all high-frequency components at a
distance greater than a certain distance D0 (the cut-off frequency) from the origin of the centred
transform:
H(u, v) = 1 if D(u, v) ≤ D0, and H(u, v) = 0 if D(u, v) > D0
where D0 is a positive constant and D(u, v) is the distance between a point (u, v) in the
frequency domain and the centre of the frequency rectangle, that is
D(u, v) = [(u - P/2)² + (v - Q/2)²]^(1/2)
where P and Q are the padded sizes from the basic padding equations. Wraparound error in the
circular convolution can be avoided by padding these functions with zeros.
Fig: ideal low pass filter 3-D view and 2-D view and line graph.
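A sketch of an ideal low-pass filter and frequency-domain filtering in NumPy; for brevity the padding to size P x Q discussed above is omitted, so this is an illustration rather than a full implementation:

import numpy as np

def ideal_lowpass(shape, d0):
    # H(u, v) = 1 where D(u, v) <= D0 and 0 elsewhere (centred coordinates)
    P, Q = shape
    u = np.arange(P) - P / 2
    v = np.arange(Q) - Q / 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)   # D(u, v)
    return (D <= d0).astype(np.float64)

def apply_frequency_filter(image, H):
    # G(u, v) = H(u, v) F(u, v), then transform back to the spatial domain
    F = np.fft.fftshift(np.fft.fft2(image.astype(np.float64)))
    g = np.fft.ifft2(np.fft.ifftshift(H * F))
    return np.real(g)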
Fig. below(a) Test pattern of size 688x688 pixels, and (b) its Fourier spectrum. The spectrum
is double the image size due to padding but is shown in half size so that it fits in the page. The
superimposed circles have radii equal to 10, 30, 60, 160 and 460 with respect to the full-size
spectrum image. These radii enclose 87.0, 93.1, 95.7, 97.8 and 99.2% of the padded image
power respectively.
Fig: (a) Test pattern of size 688x688 pixels (b) its Fourier spectrum
Fig: (a) original image, (b)-(f) Results of filtering using ILPFs with cut-off frequencies set at
radii values 10, 30, 60, 160 and 460, as shown in fig.2.2.2(b). The power removed by these
filters was 13, 6.9, 4.3, 2.2 and 0.8% of the total, respectively.
The severe blurring in this image is a clear indication that most of the sharp detail information
in the picture is contained in the 13% of the power removed by the filter. As the filter radius
increases, less and less power is removed, resulting in less blurring. Figures (c) through (e) are
characterized by "ringing", which becomes finer in texture as the amount of high-frequency
content removed decreases.
Ideal low-pass filter function is a rectangular function. The inverse Fourier transform of a
rectangular function is a sinc function.
Butterworth low-pass filter (BLPF): Profiles through the centre of Butterworth filters of
increasing order (the size in all cases is 1000x1000 and the cut-off frequency is 5) show how
ringing increases as a function of filter order.
The transfer function of a Butterworth low-pass filter (BLPF) of order n, with cut-off
frequency at a distance D0 from the origin, is defined as
H(u, v) = 1 / [1 + (D(u, v)/D0)^(2n)]
Transfer function does not have sharp discontinuity establishing cut-off between passed and
filtered frequencies.
Fig. (a) perspective plot of a Butterworth low pass filter transfer function. (b) Filter displayed
as an image. (c)Filter radial cross sections of order 1 through 4.
Unlike the ILPF, the BLPF transfer function does not have a sharp discontinuity that gives a
clear cut-off between passed and filtered frequencies.
Gaussian low-pass filters (GLPF): The form of these filters in two dimensions is given by
H(u, v) = e^(-D²(u, v) / (2·D0²))
where D0 is the cut-off frequency. When D(u, v) = D0, the GLPF is down to 0.607 of its
maximum value. This means that a spatial Gaussian filter, obtained by computing the IDFT of
the above equation, will have no ringing. The figure shows a perspective plot, image display and
radial cross sections of a GLPF function.
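Hedged NumPy sketches of the GLPF and BLPF transfer functions; both can be applied with the same apply_frequency_filter routine sketched earlier for the ILPF:

import numpy as np

def _distance_grid(shape):
    # D(u, v): distance of each frequency sample from the centre of the rectangle
    P, Q = shape
    u = np.arange(P) - P / 2
    v = np.arange(Q) - Q / 2
    return np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)

def gaussian_lowpass(shape, d0):
    # H(u, v) = exp(-D^2 / (2 D0^2)); drops to 0.607 at D = D0, no ringing
    D = _distance_grid(shape)
    return np.exp(-(D ** 2) / (2.0 * d0 ** 2))

def butterworth_lowpass(shape, d0, n=2):
    # H(u, v) = 1 / (1 + (D/D0)^(2n)); smooth transition, no sharp cut-off
    D = _distance_grid(shape)
    return 1.0 / (1.0 + (D / d0) ** (2 * n))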
Fig. (a) Perspective plot of a GLPF transfer function. (b) Filter displayed as an image. (c) Filter
radial cross sections for various values of D0
Fig.(a) Original image. (b)-(f) Results of filtering using GLPFs with cut off frequencies at the
radii shown in fig.2.2.2. compare with fig.2.2.3 and fig.2.2.6
Fig. (a) Original image (784x 732 pixels). (b) Result of filtering using a GLPF with D0 = 100.
(c) Result of filtering using a GLPF with D0 = 80. Note the reduction in fine skin lines in the
magnified sections in (b) and (c).
The figure shows an application of low-pass filtering for producing a smoother, softer-looking result
from a sharp original. For human faces, the typical objective is to reduce the sharpness of fine
skin lines and small blemishes.
2. 5. Sharpening frequency domain filters:
The filter functions H(u, v) are understood to be discrete functions of size P×Q; that is,
the discrete frequency variables are in the range u = 0, 1, 2, ..., P-1 and v = 0, 1, 2, ..., Q-1.
A high-pass filter is obtained from a given low-pass filter using the equation
Hhp(u, v) = 1 - Hlp(u, v)
where Hlp(u, v) is the transfer function of the low-pass filter. That is, when the low-pass filter
attenuates frequencies, the high-pass filter passes them, and vice versa.
We consider ideal, Butterworth, and Gaussian high-pass filters. As in the previous section, we
illustrate the characteristics of these filters in both the frequency and spatial domains. The figure
shows typical 3-D plots, image representations and cross sections for these filters. As before,
we see that the Butterworth filter represents a transition between the sharpness of the ideal
filter and the broad smoothness of the Gaussian filter. The figure discussed in the sections that
follow illustrates what these filters look like in the spatial domain; the spatial filters were
obtained and displayed using the procedure described earlier.
Fig: Top row: Perspective plot, image representation, and cross section of a typical ideal high-
pass filter. Middle and bottom rows: The same sequence for typical butter-worth and Gaussian
high-pass filters.
The ideal high-pass filter (IHPF) is defined by
H(u, v) = 0 if D(u, v) ≤ D0, and H(u, v) = 1 if D(u, v) > D0
Where D0 is the cut-off frequency and D(u, v) is given by eq. As intended, the IHPF is the
opposite of the ILPF in the sense that it sets to zero all frequencies inside a circle of radius D0
while passing, without attenuation, all frequencies outside the circle. As in case of the ILPF,
the IHPF is not physically realizable.
Fig. Spatial representation of typical (a) ideal (b) Butter-worth and (c) Gaussian frequency
domain high-pass filters, and corresponding intensity profiles through their centres. We can
expect IHPFs to have the same ringing properties as ILPFs. This is demonstrated clearly in Fig.
which consists of various IHPF results using the original image in Fig.(a) with D0 set to 30, 60
and 160 pixels, respectively. The ringing in Fig. (a) is so severe that it produced distorted,
thickened object boundaries (e.g., look at the large letter “a”). Edges of the top three circles do
not show well because they are not as strong as the other edges in the image (the intensity of
these three objects is much closer to the background intensity, giving discontinuities of smaller
magnitude).
Fig. Results of high-pass filtering the image in Fig.(a) using an IHPF with D0 = 30, 60, and
160.
The situation improved somewhat with D0 = 60. Edge distortion is quite evident still, but now
we begin to see filtering on the smaller objects. Due to the now familiar inverse relationship
between the frequency and spatial domains, we know that the spot size of this filter is smaller
than the spot of the filter with D0 = 30. The result for D0 = 160 is closer to what a high-pass
filtered image should look like. Here, the edges are much cleaner and less distorted, and the
smaller objects have been filtered properly.
Of course, the constant background in all images is zero in these high-pass filtered images
because high pass filtering is analogous to differentiation in the spatial domain.
A 2-D Butterworth high-pass filter (BHPF) of order n and cut-off frequency D0 is defined as
H(u, v) = 1 / [1 + (D0/D(u, v))^(2n)]
where D(u, v) is given by Eq. (3). This expression follows directly from Eqs. (3) and (6). The
middle row of Fig. 2.2.11 shows an image and cross section of the BHPF function. Butterworth
high-pass filters behave more smoothly than IHPFs. Fig. 2.2.14 shows the performance of a
BHPF of order 2 with D0 set to the same values as in Fig. 2.2.13. The boundaries are much
less distorted than in Fig. 2.2.13, even for the smallest value of the cut-off frequency.
FILTERED RESULTS: BHPF:
Fig. Results of high-pass filtering the image in Fig.2.2.2(a) using a BHPF of order 2 with D0
= 30, 60, and 160 corresponding to the circles in Fig.2.2.2(b). These results are much smoother
than those obtained with an IHPF.
The transfer function of the Gaussian high-pass filter (GHPF) with cut-off frequency locus at a
distance D0 from the centre of the frequency rectangle is given by
H(u, v) = 1 - e^(-D²(u, v) / (2·D0²))
where D(u, v) is given by Eq. (4). This expression follows directly from Eqs. (2) and (6). The
third row in Fig. 2.2.11 shows a perspective plot, image and cross section of the GHPF
function. Following the same format as for the BHPF, we show in Fig. 2.2.15 comparable
results using GHPFs. As expected, the results obtained are more gradual than with the previous
two filters.
Fig. Results of high-pass filtering the image in fig.(a) using a GHPF with D0 = 30, 60 and 160,
corresponding to the circles in Fig.(b)
UNIT III
IMAGE RESTORATION AND COLOUR IMAGE PROCESSING
1. Image Restoration:
1.1 Introduction
The degradation process is modelled as a degradation function which, together with an additive
noise term, operates on an input image. The input image is represented by f(x, y) and the noise
term by η(x, y); combined, they give the degraded image g(x, y). Given g(x, y), some knowledge
about the degradation function H, and some knowledge about the additive noise term η(x, y),
the objective of restoration is to obtain an estimate f'(x, y) of the original image. We want the
estimate to be as close as possible to the original image; the more we know about H and η, the
closer f'(x, y) will be to f(x, y). If H is a linear, position-invariant process, then the degraded
image is given in the spatial domain by
g(x,y)=f(x,y)*h(x,y)+η(x,y)
G(u,v)=F(u,v)H(u,v)+N(u,v)
The terms in the capital letters are the Fourier Transform of the corresponding terms in
the spatial domain.
Fig: A model of the image Degradation / Restoration process
The principal sources of noise in digital images arise during image acquisition and/or
transmission. The performance of imaging sensors is affected by a variety of factors, such as
environmental conditions during image acquisition and the quality of the sensing elements
themselves. Images are corrupted during transmission principally due to interference in the
channels used for transmission. Since the main sources of noise in digital images result from
atmospheric disturbance and image sensor circuitry, the following assumptions can be made:
the noise model is spatially invariant (independent of spatial location), and the noise model is
uncorrelated with the object function.
Gaussian Noise:
This noise model is used frequently in practice because of its tractability in both the spatial
and frequency domains. The PDF of a Gaussian random variable z is
p(z) = (1 / (√(2π)·σ)) · e^(-(z - μ)² / (2σ²))
where z represents the gray level, μ is the mean (average) value of z, and σ is the standard deviation.
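A small NumPy sketch that corrupts an 8-bit image with additive Gaussian noise of a chosen mean and standard deviation (the default sigma is an arbitrary illustrative value):

import numpy as np

def add_gaussian_noise(image, mean=0.0, sigma=15.0):
    # Add Gaussian noise with the given mean and standard deviation, then clip to 8 bits
    noise = np.random.normal(mean, sigma, image.shape)
    noisy = image.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)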
Rayleigh Noise:
(iii) Gamma (Erlang) Noise: Its shape is similar to that of the Rayleigh distribution. The
corresponding PDF is referred to as the gamma density; strictly speaking, this is correct only
when the denominator is the gamma function.
(iv) Exponential Noise:
Exponential distribution has an exponential shape. The PDF of exponential noise is given as
Where a>0. The mean and variance of this density are given by
The mean and variance of this noise is
When the only degradation present in an image is additive noise, the model becomes
g(x,y)=f(x,y)+η(x,y)
or
G(u,v)= F(u,v)+ N(u,v)
The noise terms are unknown so subtracting them from g(x,y) or G(u,v) is not a realistic
approach. In the case of periodic noise it is possible to estimate N(u,v) from the spectrum
G(u,v). So N(u,v) can be subtracted from G(u,v) to obtain an estimate of original image. Spatial
filtering can be done when only additive noise is present. The following techniques can be used
to reduce the noise effect:
i) Mean Filters:
(a) Arithmetic mean filter:
It is the simplest mean filter. Let Sxy represent the set of coordinates in a subimage window of
size m×n centered at point (x, y). The arithmetic mean filter computes the average value of the
corrupted image g(x, y) in the area defined by Sxy. The value of the restored image f̂ at any
point (x, y) is the arithmetic mean computed using the pixels in the region defined by Sxy:
f̂(x, y) = (1/mn) Σ g(s, t), summed over (s, t) ∈ Sxy
This operation can be implemented using a convolution mask in which all coefficients have the
value 1/mn. A mean filter smooths local variations in an image, and noise is reduced as a result
of the blurring: every pixel value is replaced by the (weighted) mean value of its neighboring
pixels, which produces a smoothing effect in the image.
(b) Geometric mean filter:
Here, each restored pixel is given by the product of the pixels in the subimage window, raised
to the power 1/mn. A geometric mean filter achieves smoothing comparable to the arithmetic
mean filter, but it tends to lose less image detail in the process.
(c) Harmonic Mean filter:
The harmonic mean filtering operation is given by the expression
The harmonic mean filter works well for salt noise but fails for pepper noise. It does well with
Gaussian noise also.
(d) Order statistics filter:
Order-statistics filters are spatial filters whose response is based on ordering (ranking) the pixels
contained in the image area encompassed by the filter. The response of the filter at any point is
determined by the ranking result.
(e) Median filter:
It is the best-known order-statistics filter; it replaces the value of a pixel by the median of the
gray levels in the neighbourhood of that pixel (the original value of the pixel is included in the
computation of the median). Median filters are quite popular because, for certain types of
random noise, they provide excellent noise-reduction capabilities with considerably less blurring
than linear smoothing filters of similar size. They are particularly effective in the presence of
bipolar and unipolar impulse noise.
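A straightforward (unoptimized) NumPy sketch of the median filter, replicating border pixels for padding:

import numpy as np

def median_filter(image, size=3):
    # Replace each pixel by the median of its size x size neighbourhood
    pad = size // 2
    padded = np.pad(image, pad, mode='edge')   # replicate border pixels
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out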
(f) Max and min filters:
Using the 100th percentile of a ranked set of numbers gives the max filter, defined by the
corresponding equation. It is used for finding the brightest points in an image. Because pepper
noise has very low values, it is reduced by the max filter through the max selection process in
the subimage area Sxy.
The 0th percentile filter is the min filter.
This filter is useful for finding the darkest points in an image; it also reduces salt noise as a
result of the min operation.
(g) Midpoint filter:
The midpoint filter simply computes the midpoint between the maximum and minimum values
in the area encompassed by the filter.
It combines order statistics and averaging, and works best for randomly distributed noise such
as Gaussian or uniform noise.
Band reject filters: The parameters used in the band reject transfer functions are
D(u, v): the distance from the origin of the centred frequency rectangle,
W: the width of the band, and
D0: the radial centre of the band.
Butterworth Band Reject Filter:
These filters are mostly used when the location of the noise components in the frequency domain
is known. Sinusoidal noise can easily be removed by using these kinds of filters because it
appears as two impulses that are mirror images of each other about the origin of the frequency
transform. These filters should not be applied indiscriminately, because they may remove too
much image detail, but they are effective in isolating the effect of selected frequency bands on
an image.
Notch Filters:
A notch filter rejects (or passes) frequencies in predefined neighborhoods about a centre
frequency. Due to the symmetry of the Fourier transform, notch filters must appear in symmetric
pairs about the origin. The transfer function of an ideal notch reject filter of radius D0, with
centres at (u0, v0) and, by symmetry, at (-u0, -v0), is
1.5 Inverse Filtering
The simplest approach to restoration is direct inverse filtering, where we compute an estimate
F̂(u, v) of the transform of the original image simply by dividing the transform of the degraded
image G(u, v) by the degradation function H(u, v):
F̂(u, v) = G(u, v) / H(u, v)
From the above equation we observe that we cannot recover the undegraded image exactly
because N(u, v) is a random function whose Fourier transform is not known. One approach to
get around the zero or small-value problem is to limit the filter frequencies to values near the
origin. We know that H(0,0) is equal to the average values of h(x, y). By Limiting the analysis
to frequencies near the origin we reduce the probability of encountering zero values.
Minimum mean Square Error (Wiener) filtering:
The inverse filtering approach has poor performance. The wiener filtering approach uses the
degradation function and statistical characteristics of noise into the restoration process. The
objective is to find an estimate 𝑓̂ of the uncorrupted image f such that the mean square error
between them is minimized. The error measure is given by
where H(u, v) is the degradation function,
H*(u, v) is the complex conjugate of H(u, v),
|H(u, v)|² = H*(u, v) H(u, v),
Sη(u, v) = |N(u, v)|² is the power spectrum of the noise, and
Sf(u, v) = |F(u, v)|² is the power spectrum of the undegraded image.
The power spectrum of the undegraded image is rarely known. An approach frequently used
when these quantities are not known or cannot be estimated is to replace the ratio
Sη(u, v)/Sf(u, v) by a constant K, which is adjusted interactively.
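A hedged NumPy sketch of this constant-K Wiener filter; it assumes the degradation function H is supplied on the same unshifted DFT grid that np.fft.fft2 produces for the degraded image:

import numpy as np

def wiener_filter(degraded, H, K=0.01):
    # F_hat = [ conj(H) / (|H|^2 + K) ] * G, i.e. the Wiener filter with the
    # noise-to-signal power ratio S_n/S_f approximated by the constant K
    G = np.fft.fft2(degraded.astype(np.float64))
    H = np.asarray(H, dtype=np.complex128)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + K) * G
    return np.real(np.fft.ifft2(F_hat))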
Constrained least squares filtering: The optimality criterion for this restoration approach is based
on a measure of smoothness, such as the second derivative of the image (the Laplacian). The
method seeks the minimum of a criterion function C, defined as the sum over the image of the
squared Laplacian of f̂, subject to the constraint ||g - Hf̂||² = ||η||²,
where ||w||² ≜ wᵀw is the Euclidean vector norm, f̂ is the estimate of the undegraded image,
and ∇² is the Laplacian operator.
The frequency domain solution to this optimization problem is given by
where γ is a parameter that must be adjusted so that the constraint is satisfied, and P(u, v) is the
Fourier transform of the Laplacian operator.
1.6 Colour Fundamentals and Colour Models
The RGB colour model: All colour values R, G and B have been normalized to the range [0, 1];
however, each of R, G and B can equivalently be represented on a scale from 0 to 255. Each
RGB colour image consists of three component images, one for each primary colour, as shown
in the figure below. These three images are combined on the screen to produce a colour image.
The total number of bits used to represent each pixel in an RGB image is called the pixel depth.
For example, if each of the red, green and blue component images is an 8-bit image, the pixel
depth of the RGB image is 24 bits. The figure below shows the component images of an RGB
image.
The CMY colour model is obtained from the RGB values by [C, M, Y] = [1, 1, 1] - [R, G, B],
where all colour values have been normalized to the range [0, 1]. In printing, combining equal
amounts of cyan, magenta and yellow produces a muddy-looking black. In order to produce true
black, a fourth colour, black, is added, giving rise to the CMYK colour model.
The figure below shows the CMYK component images of an RGB image.
Converting colours from HSI to RGB
Fig. A full-colour image and its HSI component images
1.7 Colour Transformation
As with gray-level transformations, we model colour transformations using the expression
g(x, y) = T[f(x, y)]
where f(x, y) is a colour input image, g(x, y) is the transformed colour output image, and T is
the colour transform. This colour transform can also be written as a set of component
transformations si = Ti(r1, r2, ..., rn), i = 1, 2, ..., n, where ri and si are the colour components of
f(x, y) and g(x, y) at a pixel.
For example, we may wish to modify the intensity of the image shown in the figure below.
Fig. (a) Original image. (b) Result of decreasing its intensity
Colour Complement
Colour complement replaces each colour with its opposite colour in the colour
circle of the Hue component. This operation is analogous to image negative in a
gray scale image
Colour Slicing
Example
Colour image sharpening
Example:
Segmentation in RGB vector space
Example:
UNIT IV
IMAGE COMPRESSION AND IMAGE SEGMENTATION
1.1 Introduction
Image compression is an application of data compression that encodes the original image with
fewer bits. The objective of image compression is to reduce the redundancy of the image and to
store or transmit the data in an efficient form. The main goal of such a system is to reduce the
storage requirement as much as possible while keeping the decoded image displayed on the
monitor as similar to the original image as possible.
1. 2 Image Compression Model
Compression is of two types, lossy and lossless. A typical image compression system comprises
two main blocks, an Encoder (Compressor) and a Decoder (Decompressor).
The image f(x, y) is fed to the encoder, which encodes the image so as to make it suitable for
transmission. The decoder receives this transmitted signal and reconstructs the output image
f'(x, y). If the system is error free, f'(x, y) will be a replica of f(x, y).
The encoder and the decoder are made up of two blocks each. The encoder is made up of a
Source encoder and a Channel encoder. The source encoder removes the input redundancies
while the channel encoder increases the noise immunity of the source encoders. The decoder
consists of a channel decoder and a source decoder. The function of the channel decoder is to
ensure that the system is immune to noise. Hence if the channel between the encoder and the
decoder is noise free, the channel encoder and the channel decoder are omitted.
Source encoder and decoder: The three basic types of redundancy in an image are interpixel
redundancy, coding redundancy and psychovisual redundancy. Run-length coding is used to
eliminate or reduce interpixel redundancy, Huffman encoding is used to eliminate or reduce
coding redundancy, and I.G.S. quantization is used to reduce psychovisual redundancy. The job
of the source decoder is to get back the original signal. Run-length coding, Huffman encoding
and I.G.S. coding are thus examples of source encoders and decoders.
The input image is passed through a mapper, which reduces the interpixel redundancies. The
mapping stage is a lossless technique and hence a reversible operation. The output of the
mapper is passed through a quantizer block, which reduces the psychovisual redundancies. It
compresses the data by eliminating some information and hence is an irreversible operation;
quantization is what makes schemes such as JPEG lossy. Hence, in the case of lossless
compression, the quantizer block is omitted. The final block of the source encoder is the symbol
encoder. This block creates a variable-length code to represent the output of the quantizer. The
Huffman code is a typical example of a symbol encoder. The symbol encoder reduces coding
redundancies.
The source decoder block performs exactly the reverse operations of the symbol encoder and
the mapper blocks. It is important to note that the source decoder has only two blocks: since
quantization is irreversible, an inverse quantizer block does not exist. If the channel between the
encoder and the decoder is noise free, the channel encoder and channel decoder can be ignored.
Channel encoder and decoder: The channel encoder is used to make the system immune to
transmission noise. Since the output of the source encoder has very little redundancy, it is
highly susceptible to noise. The channel encoder inserts a controlled form of redundancy to the
source encoder output making it more noise resistant.
1. 3 Error-free compression or lossless compression
Lossless compression is also known as entropy coding, as it uses decomposition techniques to
minimize redundancy. The original image can be perfectly recovered from the compressed
image with lossless compression techniques, and these techniques do not add noise to the signal.
Following techniques are included in lossless compression:
Variable Length Coding (VLC)
Run length encoding
Differential coding
Predictive coding
Dictionary-based coding
Variable Length Coding (VLC): Most entropy-based encoding techniques rely on assigning
variable-length code words to each symbol, whereas the most likely symbols are assigned
shorter code words. In the case of image coding, the symbols may be raw pixel values or the
numerical values obtained at the output of the mapper stage (e.g., differences between
consecutive pixels, run-lengths, etc.). The most popular entropy-based encoding technique is
the Huffman code. It provides the least amount of information units (bits) per source symbol.
It is described in more detail in a separate short article
Run length encoding (RLE): RLE is one of the simplest data compression techniques. It
consists of replacing a sequence (run) of identical symbols by a pair containing the symbol and
the run length. It is used as the primary compression technique in the 1-D CCITT Group 3 fax
standard and in conjunction with other techniques in the JPEG image compression standard
(described in a separate short article).
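A minimal plain-Python sketch of run-length encoding and decoding of a row of pixel values (the sample row is a made-up example):

def rle_encode(symbols):
    # Replace each run of identical symbols by a (symbol, run_length) pair
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    # Invert rle_encode by expanding each run back into its symbols
    return [s for s, n in runs for _ in range(n)]

row = [255, 255, 255, 0, 0, 255]
print(rle_encode(row))                        # [(255, 3), (0, 2), (255, 1)]
print(rle_decode(rle_encode(row)) == row)     # True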
Differential coding: Differential coding techniques exploit the interpixel redundancy in
digital images. The basic idea consists of applying a simple difference operator to neighbouring
pixels to calculate a difference image, whose values are likely to fall within a much narrower
consequently reduced entropy – Huffman coding or other VLC schemes will produce shorter
code words for the difference image.
Predictive coding:
Predictive coding techniques constitute another example of exploration of inter pixel
redundancy, in which the basic idea is to encode only the new information in each pixel. This
new information is usually defined as the difference between the actual and the predicted value
of that pixel. The key component is the predictor, whose function is to generate an estimated
(predicted) value for each pixel of the input image based on previous pixel values. The
predictor's output is rounded to the nearest integer and compared with the actual pixel value;
the difference between the two, called the prediction error, is then encoded by a VLC encoder.
Since prediction errors are likely to be smaller than the original pixel values, the VLC encoder
will likely generate shorter code words. There are several local, global, and adaptive prediction
algorithms in the literature. In most cases, the predicted pixel value is a linear combination of
previous pixels.
Dictionary-based coding:
Dictionary-based coding techniques are based on the idea of incrementally building a
dictionary (table) while receiving the data. Unlike VLC techniques, dictionary-based
techniques use fixed-length code words to represent variable-length strings of symbols that
commonly occur together. Consequently, there is no need to calculate, store, or transmit the
probability distribution of the source, which makes these algorithms extremely convenient and
popular. The best-known variant of dictionary-based coding algorithms is the LZW (Lempel-
Ziv-Welch) encoding scheme, used in popular multimedia file formats such as GIF, TIFF, and
PDF.
1. 4 Lossy Compression
Lossy compression methods have larger compression ratios as compared to the lossless
compression techniques. Lossy methods are used for most applications. By this the output
image that is reconstructed image is not exact copy but somehow resembles it at larger portion.
As shown in the figure, the prediction, transformation and decomposition steps are completely
reversible; the loss of information is due to the quantization step. The entropy coding applied
after quantization is lossless. At the decoder, entropy decoding is applied to the compressed
signal values to get the quantized signal values back; de-quantization is then applied and an
image that resembles the original is recovered.
Lossy compression includes following methods:
Block truncation coding
Code Vector quantization
Fractal coding
Transform coding
Sub-band coding
Block Truncation Coding: The image is divided into blocks, as in fractal coding. Each N by N
window of the image is treated as a block, and the mean of the pixel values within that block is
computed; this mean is normally used as the threshold. A bitmap for the block is then generated
by replacing every pixel whose value is greater than or equal to the threshold with a 1 and every
other pixel with a 0. Then, for each of the two groups in the bitmap, a reconstruction value is
determined as the average of the values of the corresponding pixels in the original block.
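The per-block procedure can be sketched as follows, using the block mean as the threshold and the two group averages as the reconstruction levels (function names are illustrative):

import numpy as np

def btc_encode_block(block):
    # Threshold the block at its mean and keep the bitmap plus the average
    # value of each of the two groups (pixels below / at-or-above the mean).
    t = block.mean()
    bitmap = block >= t
    hi = block[bitmap].mean() if bitmap.any() else t
    lo = block[~bitmap].mean() if (~bitmap).any() else t
    return bitmap, lo, hi

def btc_decode_block(bitmap, lo, hi):
    # Each pixel is reconstructed with one of the two levels.
    return np.where(bitmap, hi, lo)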
Vector Quantization: The basic idea in vector quantization is to create a dictionary (codebook)
of fixed-size vectors, called code vectors; each code vector is a block of pixel values. A given
image is partitioned into non-overlapping blocks called image vectors, the codebook is built and
indexed from this information, and it is then used to encode the original image. Each image vector
is replaced by the index of its closest code vector, and these indices are then entropy coded.
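Encoding with a given codebook reduces to a nearest-neighbour search, as in the following sketch; how the codebook itself is trained (for example by k-means) is a separate issue, and the names used here are illustrative.

import numpy as np

def vq_encode(image_vectors, codebook):
    # image_vectors: (N, d) flattened blocks; codebook: (K, d) code vectors.
    # Each block is represented by the index of its nearest code vector.
    dists = ((image_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def vq_decode(indices, codebook):
    # Replace each index by its code vector; the blocks are then reassembled.
    return codebook[indices]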
Fractal Compression: The basic idea behind this coding is to divide the image into segments
using attributes such as colour difference, edges, frequency content and texture. Different parts
of the same image usually resemble one another. A dictionary of fractal segments, used as a
look-up table, stores codes that are compact sets of numbers. By applying an iterative algorithm
to these fractals, the image is encoded. This scheme is most effective for compressing images
that are natural and textured.
Transform Coding: In this coding, transforms such as the Discrete Fourier Transform (DFT),
the Discrete Cosine Transform (DCT) and the Discrete Sine Transform are used to map the pixel
values from the spatial domain into the frequency domain. These transforms have an energy
compaction property: only a few coefficients carry most of the energy of the original image
signal, and these are sufficient to reproduce it. Only those few significant coefficients are
retained and the remaining ones are discarded; the retained coefficients are then quantized and
encoded. DCT coding has been the most commonly used transform for image data.
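The energy compaction idea can be illustrated with a 2-D DCT applied to a block, keeping only a small square of low-frequency coefficients; this is only a sketch, not the full JPEG procedure.

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # Separable 2-D DCT: transform the columns, then the rows.
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

def compress_block(block, keep=8):
    # Keep only the keep x keep low-frequency coefficients and discard the rest.
    c = dct2(block.astype(float))
    mask = np.zeros_like(c)
    mask[:keep, :keep] = 1
    return idct2(c * mask)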
Subband Coding: In this scheme, the image is first analyzed (filtered) into frequency sub-bands,
and quantization and coding are then applied to each sub-band separately. This coding is very
useful because the quantization and coding can be matched more accurately to the statistics of
each sub-band.
1. 5 Detection of Discontinuities
There are three basic types of gray-level discontinuities: points, lines and edges.
Detection is based on convolving the image with a suitable spatial mask.
Point Detection:
A point is detected at the location p(i, j) on which the mask is centered if |R| > T,
where T is a nonnegative threshold and R is the response obtained with the following mask.
The idea is that the gray level of an isolated point will be quite different from the gray
level of its neighbours.
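The standard 3x3 point-detection mask weights the centre pixel by 8 and each of its eight neighbours by -1, so that R is zero over regions of constant gray level. A small sketch of its use (function and variable names are illustrative):

import numpy as np
from scipy.ndimage import convolve

point_mask = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]])

def detect_points(image, T):
    # |R| > T marks locations whose gray level differs strongly from the neighbourhood.
    R = convolve(image.astype(float), point_mask)
    return np.abs(R) > T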
Line masks (3x3): horizontal line, +45° line, vertical line, -45° line.
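The standard 3x3 line-detection masks are listed below; each responds most strongly to a one-pixel-wide line in its direction.

import numpy as np

# The four 3x3 line-detection masks commonly given in the literature.
horizontal = np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]])
plus_45    = np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]])
vertical   = np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]])
minus_45   = np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]])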
If, at a certain point in the image, |Ri|>|Rj| for all j ≠ i, that point is said to be more likely
associated with a line in the direction of mask i.
Edge Detection
It locates sharp changes in the intensity function.
Edges are pixels where brightness changes abruptly.
A change of the image function can be described by a gradient that points in the
direction of the largest growth of the image function.
An edge is a property attached to an individual pixel and is calculated from the image
function behaviour in a neighbourhood of the pixel.
The magnitude of the first derivative detects the presence of an edge.
The sign of the second derivative determines whether an edge pixel lies on the dark side
or the light side of the edge.
(a) Gradient operator:
For a function f(x,y), the gradient of f at coordinates (x',y') is defined as the vector
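∇f = [Gx, Gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ,
with magnitude ∇f = mag(∇f) = (Gx² + Gy²)^(1/2)
(in the standard notation, with Gx and Gy denoting the partial derivatives of f).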
Its magnitude can be approximated in the digital domain in a number of ways, which
result in a number of operators such as Roberts, Prewitt and Sobel operators for
computing its value.
(b) Laplacian operator:
The Laplacian has the same properties in all directions and is therefore invariant to
rotation in the image.
It can also be implemented in digital form in various ways.
For a 3x3 region, the mask is given as
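 0  -1   0
-1   4  -1
 0  -1   0

(the standard form obtained from ∇²f = ∂²f/∂x² + ∂²f/∂y² with the digital approximation
∇²f = 4f(x, y) - [f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1)]; the negated version,
with -4 at the centre, is equally common).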
It is seldom used in practice for edge detection for the following reasons:
1. As a 2nd-order derivative, it is unacceptably sensitive to noise.
2. It produces double edges and is unable to detect edge direction.
The Laplacian usually plays the secondary role of detector for establishing whether a
pixel is on the dark or light side of an edge.
1. 6 Edge linking and boundary detection
The techniques of detecting intensity discontinuities yield pixels lying only on the
boundary between regions.
In practice, this set of pixels seldom characterizes a boundary completely because of
noise, breaks in the boundary caused by nonuniform illumination, and other effects that
introduce spurious intensity discontinuities.
Edge detection algorithms are typically followed by linking and other boundary
detection procedures designed to assemble edge pixels into meaningful boundaries.
(a) Local Processing
Two principal properties used for establishing similarity of edge pixels in this kind
of analysis are:
1. The strength of the response of the gradient operator used to produce the edge pixel.
2. The direction of the gradient.
In a small neighbourhood, e.g. 3x3 or 5x5, all points with common properties are linked.
A point (x',y') in the neighbourhood of (x,y) is linked to the pixel at (x,y) if both the
following magnitude and direction criteria are satisfied.
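In the usual formulation these criteria are
|∇f(x, y) - ∇f(x', y')| ≤ E   and   |α(x, y) - α(x', y')| < A,
where E is a nonnegative magnitude threshold, A is an angle threshold, and
α = tan⁻¹(Gy / Gx) is the direction of the gradient.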
1. 7 Thresholding
Thresholding is one of the most important approaches to image segmentation.
If background and object pixels have gray levels grouped into 2 dominant modes, they
can be separated with a threshold easily.
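Segmentation with a single global threshold T amounts to the labelling g(x, y) = 1 if f(x, y) > T and g(x, y) = 0 otherwise, as in the following sketch:

import numpy as np

def global_threshold(image, T):
    # Pixels brighter than T are labelled object (1), the rest background (0);
    # the roles may be reversed depending on the scene.
    return (image > T).astype(np.uint8)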
Multilevel thresholding is in general less reliable as it is difficult to establish effective
thresholds to isolate the regions of interest.
Adaptive thresholding
The threshold value varies over the image as a function of local image characteristics.
Image f is divided into sub images.
A threshold is determined independently in each sub image.
If a threshold cannot be determined in a sub image, it can be interpolated from the
thresholds obtained in neighbouring sub images.
Each sub image is then processed with respect to its local threshold.
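A minimal sketch of this block-based adaptive scheme follows; using the sub-image mean as the local threshold is just one possible choice, and no interpolation between neighbouring sub-images is attempted here.

import numpy as np

def adaptive_threshold(image, block=32):
    # Threshold each block x block sub-image independently with its own local threshold.
    out = np.zeros(image.shape, dtype=np.uint8)
    for i in range(0, image.shape[0], block):
        for j in range(0, image.shape[1], block):
            sub = image[i:i + block, j:j + block]
            T = sub.mean()                       # one possible local threshold
            out[i:i + block, j:j + block] = (sub > T).astype(np.uint8)
    return out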
Threshold selection based on boundary characteristics
A reliable threshold must be selected to identify the mode peaks of a given histogram.
This capability is very important for automatic threshold selection in situations where
image characteristics can change over a broad range of intensity distributions.
If we consider only those pixels that lie on or near the boundary between objects and
the background, the associated histogram has well-separated peaks, giving a good chance
of selecting a good threshold.
The gradient can indicate if a pixel is on an edge or not.
The Laplacian can tell if a given pixel lies on the dark or light (background or object)
side of an edge.
The gradient and Laplacian can be combined to produce a three-level image:
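In the usual formulation, a pixel is labelled 0 if ∇f < T; it is labelled + if ∇f ≥ T and
∇²f ≥ 0; and it is labelled - if ∇f ≥ T and ∇²f < 0,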
where T is a nonnegative threshold applied to the gradient magnitude ∇f.