Digital Image Processing Course Material-TAB
Prepared by:
Sri. T. Aravinda Babu
Asst. Prof.,
Dept. of ECE, CBIT
Contents:
Introduction
Applications
Course Orientation
When x, y, and the intensity values of f are all finite, discrete quantities, the image is called a digital image.
Typical Applications:
Noise filtering
Content enhancement
Contrast enhancement
Deblurring
Remote sensing
Typical Applications:
Industrial machine vision for product assembly and inspection
1. Coding redundancy
2. Interpixel redundancy
3. Psychovisual redundancy
Applications:
1. Reduced storage
2. Reduction in bandwidth
SEE: 70 Marks
CIE: 30 Marks
Credits: 3
The figure is typical of the type of images that could be obtained using the 15-tone equipment.
Step 1: Image Acquisition
The image is captured by a sensor (e.g., a camera) and digitized, using an analog-to-digital converter, if the output of the camera or sensor is not already in digital form.
Step 2: Image Enhancement
The process of manipulating an image so that the result is more suitable than the original for a specific application.
The idea behind enhancement techniques is to bring out details that are hidden, or simply to highlight certain features of interest in an image.
Step 3: Image Restoration
Restoration improves the appearance of an image. Restoration techniques tend to be based on mathematical or probabilistic models of image degradation. Enhancement, on the other hand, is based on human subjective preferences regarding what constitutes a "good" enhancement result.
6. Image Displays
The displays in use today are mainly color (preferably flat screen) TV
monitors. Monitors are driven by the outputs of the image and graphics
display cards that are an integral part of a computer system.
7. Hardcopy devices
Used for recording images, include laser printers, film cameras, heat-sensitive
devices, inkjet units and digital units, such as optical and CD-ROM disks.
(1) the amount of source illumination incident on the scene being viewed, i(x, y), and (2) the amount of illumination reflected by the objects in the scene, r(x, y). Reflectance is bounded by 0 (total absorption) and 1 (total reflectance).
Let the intensity (gray level) of a monochrome image at coordinates (x, y) be denoted by L = f(x, y).
The interval [Lmin ,Lmax ] is called the intensity (or gray) scale.
All intermediate values are shades of gray varying from black to white.
The human visual system consists mainly of the eye (image sensor or
camera), optic nerve (transmission path), and brain (image
information processing unit or computer).
Cornea : Tough, transparent tissue that covers the anterior surface of the eye.
Sclera : Outer tough opaque membrane, covers rest of the optic globe.
Retina : Innermost membrane of the eye. When the eye is properly focused, light from an object outside the eye is imaged on the retina. Pattern vision is afforded by the distribution of discrete light receptors over the surface of the retina.
Receptors
Rods - 75-150 million, distributed over the entire retina; responsible for scotopic (dim-light) vision; not color sensitive; give a general overall picture rather than fine detail.
In an ordinary photographic camera, the lens has a fixed focal length. Focusing
at various distances is achieved by varying the distance between the lens and
the imaging plane, where the film (or imaging chip in the case of a digital
camera) is located.
In the human eye, the distance between the center of the lens and the imaging
sensor (the retina) is fixed, and the focal length needed to achieve proper focus
is obtained by varying the shape of the lens.
The farther the object, the smaller the refractive power of the lens and the larger the focal length.
For example, with the lens-to-retina distance taken as 17 mm, an observer viewing a 15 m high object from 100 m away obtains a retinal image of height h given by 15/100 = h/17, or h = 2.55 mm.
Perception then takes place by the relative excitation of light receptors, which
transform radiant energy into electrical impulses that ultimately are decoded
by the brain.
Brightness Discrimination
The ability of the eye to discriminate between changes
in brightness levels is called brightness discrimination.
The increment of intensity ∆Ic that is just discriminable over a background intensity I is measured.
Weber ratio: the ratio ∆Ic/I.
Small value of Weber ratio --- good brightness
discrimination, a small percentage change in intensity is
discriminable.
Large value of Weber ratio --- poor brightness
discrimination, a large percentage change in intensity is
required.
Brightness discrimination is better (smaller Weber ratio) at high intensities than at low intensities.
Perceived Brightness is not a Simple Function of Light Intensity
1. Sampling 2. Quantization
The grabbed image is now a digital image and can be accessed as a two
dimensional array of data.
The sampling and quantization process is illustrated in the figure. Starting at the top of the image and carrying out this procedure line by line produces a two-dimensional digital image.
Suppose that we sample the continuous image into a 2-D array, f(x,y)
containing M-rows and N- columns where ( x, y) are discrete coordinates.
The value of the image at any coordinates (x, y) is denoted f(x, y), where x and y are integers. The section of the real plane spanned by the coordinates
of an image is called the spatial domain, with x and y being referred to as
spatial variables or spatial coordinates.
Image digitization requires that decisions be made regarding the values for
M, N, and for the number L, of discrete intensity levels.
The discrete levels are equally spaced and that they are integers in the
range [0,L-1].
The range of values spanned by the gray scale is referred to as the dynamic
range, a term used in different ways in different fields
Dynamic range establishes the lowest and highest intensity levels that a system can represent in an image.
Image contrast is defined as the difference in intensity between the highest and lowest intensity levels in an image.
Q: Suppose a pixel has 1 bit; how many gray levels can it represent?
A: 2¹ = 2 gray levels: 0 (black) and 1 (white).
Q: Suppose a pixel has 2 bits; how many gray levels can it represent?
A: 2² = 4 gray levels. In general, k bits can represent L = 2ᵏ gray levels.
Magazines – 133dpi
Glossy brochures-175dpi
Adjacency
Path
Connectivity
Region
Boundary
Distance
Diagonal-Neighbours of pixel
8-Neighbours of pixel
The 4-neighbours and the diagonal neighbours of p are called 8-
neighbours of p : N8(p).
4-adjacency
8-adjacency
M-adjacency:
(a) q is in N4(p )
or
(b) q is in ND(p ) and the set N4(p ) ∩ N4(q ) has no pixels whose values
are from V
where points (x0, y0) = (x, y) and (xn, yn) = (s, t), and pixels (xi, yi) and (xi−1, yi−1) are adjacent for 1 ≤ i ≤ n. In this case, n is the length of the path. If (x0, y0) = (xn, yn), the path is a closed path.
Connectivity between pixels
It is an important concept in digital image processing.
Example: For V = {2, 3, 4}, compute the lengths of the shortest 4-, 8-, and m-paths between p and q in the following image.
Put another way, the border of a region is the set of pixels in the region that have at least one background neighbor.
Boundary of the region is defined as a set of pixels in the region that have
one or more neighbors that are not in R. Boundary is the edge of a region.
Example: Consider the following arrangement of pixels and assume that p, p2, and p4 have value 1 and that p1 and p3 can have a value of 0 or 1. Suppose that we consider adjacency of pixels with value 1 (i.e., V = {1}). Now compute the distance Dm between points p and p4.
Here we have 4 cases:
Case 1: If p1 = 0 and p3 = 0, the shortest m-path has length 2 (p, p2, p4).
Case 2: If p1 = 1 and p3 = 0, p and p2 are no longer m-adjacent, and the shortest m-path has length 3 (p, p1, p2, p4).
Case 3: If p1 = 0 and p3 = 1, the same applies here, and the shortest m-path has length 3 (p, p2, p3, p4).
Case 4: If p1 = 1 and p3 = 1, the shortest m-path has length 4 (p, p1, p2, p3, p4).
Binary Images
Multispectral images
The actual information stored in the digital image data is the gray level
information in each spectral band.
Typical color images are represented as red, green and blue (RGB image).
These are not images in the usual sense because the information they represent is not directly visible to the human visual system.
Image files are composed of digital data in one of these formats that can be
rasterized for use on a computer display or printer.
1. GIF
2. JPEG
3. PNG
4. TIFF
5. BMP
GIF offers an extremely limited color range, suitable for the web but not for printing or photography.
JPEG files are images that have been compressed to store a lot of
information in a small size file.
A JPEG is compressed and loses some of the image detail during compression in order to make the file small.
JPEG files are usually used for photographs on the web, because they create
a small file that is easily loaded on a web page and also looks good.
They improve quality and compression ratio, but also require more
computational power to process.
Prepared by:
Sri. T. Aravinda Babu
Asst. Prof.,
Dept. of ECE, CBIT
Image Transform
Transforms are mathematical tools that allow us to move from one domain to another (e.g., from the time domain to the frequency domain and vice versa).
Image Transform: the transform maps an N×N image into an N×N coefficient matrix, and the inverse transform maps the coefficient matrix back into another N×N image.
The transform represents an image as a weighted sum of elementary images called basis images. These basis images are generated from unitary matrices.
Filtering (correlation)
Orthogonal matrix: A Aᵀ = Aᵀ A = I
Unitary matrix: A A*ᵀ = A*ᵀ A = I
For a real matrix, A = A*; thus a real orthogonal matrix is also unitary.
2D Unitary Transform:
In matrix notation, the forward transform is V = A U Aᵀ and the inverse is U = A*ᵀ V A*.
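As an illustration (a minimal NumPy sketch, not part of the original notes), the following verifies the forward and inverse relations using the unitary DFT matrix; the choice of matrix A and the random test block U are assumptions made only for this example:

import numpy as np

N = 4
# Unitary DFT matrix: A[k, n] = exp(-2j*pi*k*n/N) / sqrt(N)
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
A = np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)

# Unitarity check: A A*^T = I
assert np.allclose(A @ A.conj().T, np.eye(N))

U = np.random.rand(N, N)           # "image" block
V = A @ U @ A.T                    # forward 2D transform, V = A U A^T
U_rec = A.conj().T @ V @ A.conj()  # inverse, U = A*^T V A*
assert np.allclose(U_rec, U)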
Properties
1. The elements of F are complex valued; thus F is a complex matrix.
Forward Transformation
Reverse Transformation
2D Discrete Cosine Transform(2D-DCT)
For an N × N image u(m, n), the 2D forward DCT is defined as
v(k, l) = α(k) α(l) Σ(m=0..N−1) Σ(n=0..N−1) u(m, n) cos[(2m+1)πk / 2N] cos[(2n+1)πl / 2N]
where α(0) = √(1/N) and α(k) = √(2/N) for 1 ≤ k ≤ N−1.
Properties of DCT
1. The DCT matrix C is real, not symmetric, and unitary.
2. The DCT is not the real part of the unitary DFT.
3. The DCT has excellent energy compaction for highly correlated data; hence it is widely used in image compression.
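A small sketch of the 2D DCT and its energy compaction, assuming SciPy is available; the helper names dct2/idct2 and the smooth test block (standing in for highly correlated image data) are introduced here for illustration:

import numpy as np
from scipy.fftpack import dct, idct

def dct2(x):
    # Separable 2D DCT-II with orthonormal ("ortho") scaling, i.e. unitary
    return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(X):
    return idct(idct(X, axis=0, norm="ortho"), axis=1, norm="ortho")

x = np.linspace(0.0, 1.0, 8)
u = np.outer(x, x)                  # smooth, highly correlated 8x8 block
V = dct2(u)
assert np.allclose(idct2(V), u)     # perfect reconstruction

energy = V ** 2
print(energy[:2, :2].sum() / energy.sum())  # most energy in low-order coefficients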
where bi(n) represents the iᵗʰ bit (counting from the LSB) of the decimal number n represented in binary.
Data compression
for p = 0: q = 0 or 1; for p ≠ 0: 1 ≤ q ≤ 2ᵖ
Prepared by:
Sri. T. Aravinda Babu
Asst. Prof.,
Dept. of ECE, CBIT
UNIT – III
Spatial Enhancement Techniques: Introduction, Histogram
equalization, direct histogram specification, Local
enhancement.
Too dark
Too light
Noise
s = T[r], where r is the input pixel value and s is the output pixel value.
Image negative
Log transformation
Contrast stretching
Histogram equalization
Histogram specification
Image negative
The digital negative of an image with intensity levels in the range [0, L−1] is given by
s = (L − 1) − r
In this transformation, highest grey level is mapped to lowest and vice versa.
For an 8-bit image, the transformation is s=255-r.
Middle gray levels change relatively little, whereas dark gray levels become bright and vice versa.
s = c log(1 + r)
This transformation maps narrow range of low intensity values in the input
into a wider range of output levels. The opposite is true of higher values of
input levels.
s = c r^γ
As in case of the log transformation, power law curves with fractional values
of γ map a narrow range of dark input values into a wider range of output
values, with the opposite being true for higher values of input levels.
A variety of devices used for image capture, printing, and display respond according to a power law. The process used to correct these power-law response phenomena is called gamma correction.
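The three point transformations above can be sketched in a few lines of NumPy (an illustrative sketch; the constants c, γ, and L, and the ramp test image, are free choices made for this example):

import numpy as np

def negative(r, L=256):
    # s = (L-1) - r
    return (L - 1) - r

def log_transform(r, c=1.0):
    # s = c * log(1 + r); spreads dark values, compresses bright ones
    return c * np.log1p(r.astype(np.float64))

def gamma_transform(r, gamma, c=1.0, L=256):
    # s = c * r^gamma, applied on normalized intensities
    r_norm = r / (L - 1)
    return (L - 1) * c * r_norm ** gamma

img = np.arange(256, dtype=np.uint8).reshape(16, 16)
neg = negative(img)
dark_boost = gamma_transform(img, gamma=0.4)   # gamma < 1 brightens dark regions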
A practical alternative is to use piecewise linear functions. Their principal disadvantage is that their specification requires considerably more user input.
Contrast Stretching
Gray-level Slicing
Bit-plane slicing
Low contrast images can result from poor illumination, lack of dynamic
range in the image sensor, or even the wrong setting of a lens aperture
during image acquisition.
Intermediate values of (r1, s1) and (r2, s2) produce various degrees of spread in
the gray levels of the output image, thus affecting its contrast.
(r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L−1), where rmin and rmax denote the minimum and maximum gray levels in the image. The transformation function then stretches the levels linearly from their original range to the full range [0, L−1]. Figure (d) shows the result of using the thresholding function with (r1, s1) = (m, 0) and (r2, s2) = (m, L−1), where m is the mean gray level in the image.
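A minimal sketch of piecewise linear contrast stretching using np.interp, assuming NumPy; the control points (r1, s1) and (r2, s2) and the synthetic low-contrast image are illustrative choices:

import numpy as np

def contrast_stretch(img, r1, s1, r2, s2, L=256):
    # Piecewise linear mapping through control points (r1, s1) and (r2, s2).
    # With (r1, s1) = (r_min, 0) and (r2, s2) = (r_max, L-1) this stretches
    # the original range linearly onto the full scale [0, L-1].
    xp = [0, r1, r2, L - 1]
    fp = [0, s1, s2, L - 1]
    return np.interp(img, xp, fp).astype(np.uint8)

img = np.random.randint(80, 150, size=(64, 64), dtype=np.uint8)  # low contrast
out = contrast_stretch(img, r1=80, s1=0, r2=150, s2=255)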
Gray-level slicing highlights a specific range of gray levels in an image. It can be implemented in several ways, but the two basic themes are:
One approach is to display a high value for all gray levels in the range of interest and a low value for all other gray levels, producing a binary image. The second approach brightens the desired range of gray levels but leaves all other gray levels unchanged.
Figure (c) shows a gray-scale image, and Figure (d) shows the result of applying the gray-level slicing transformation.
It is a graph between various grey levels on x-axis and the number of times a
grey level has occurred in an image on y-axis.
h(rk) = nk
Histogram Processing
Histograms are the basis for numerous spatial domain processing techniques.
Dark image: the components of the histogram are concentrated on the low side of the gray scale.
Bright image: the components of the histogram are concentrated on the high side of the gray scale.
High-contrast image: the components of the histogram cover a broad range of the gray scale.
Consider the transformation s = T(r), where 0 ≤ r ≤ 1.
Conditions on T(r):
(a) T(r) is single-valued and monotonically increasing in 0 ≤ r ≤ 1; single-valuedness (a one-to-one relationship) guarantees that the inverse transformation exists.
(b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1, which guarantees that the output gray levels will be in the same range as the input levels.
The inverse transformation is r = T⁻¹(s), 0 ≤ s ≤ 1.
Probability Density Function
The gray levels in an image may be viewed as random variables in the
interval [0,1]
Let pr(r) and ps(s) denote the probability density functions of r and s, respectively.
If pr(r) and T(r) are known and T⁻¹(s) satisfies condition (a), then ps(s) can be obtained using the formula
ps(s) = pr(r) |dr/ds|
The PDF of the transformed variable s is determined by the gray-level PDF of
the input image and by the chosen transformation function
s = T(r) = ∫₀ʳ pr(w) dw
where w is a dummy variable of integration. This T(r) is the cumulative distribution function of r; substituting it into the formula above yields ps(s) = 1 for 0 ≤ s ≤ 1, i.e., a uniform density.
The probability of occurrence of gray level rk in an image is approximated by
pr(rk) = nk / n,   k = 0, 1, ..., L−1
The discrete version of the transformation is given by
sk = T(rk) = Σ(j=0..k) pr(rj) = Σ(j=0..k) nj / n,   k = 0, 1, ..., L−1
Thus, an output image is obtained by mapping each pixel with level rk in
the input image into a corresponding pixel with level sk in the output image
Figure: image and histogram before and after histogram equalization.
Example
4×4 image, gray scale = [0, 9]:
2 3 3 2
4 2 4 3
3 2 3 5
2 4 2 4
Histogram: gray level 2 occurs 6 times, level 3 occurs 5 times, level 4 occurs 4 times, and level 5 occurs once.
Gray level (j):          0    1    2      3      4      5     6    7    8    9
No. of pixels (nj):      0    0    6      5      4      1     0    0    0    0
Running sum Σ nj:        0    0    6      11     15     16    16   16   16   16
sk = Σ nj / n:           0    0    6/16   11/16  15/16  1     1    1    1    1
Multiplying each sk by the maximum gray level (9) and rounding gives the mapping 2 → 3, 3 → 6, 4 → 8, 5 → 9.
Output image, gray scale = [0, 9]:
3 6 6 3
8 3 8 6
6 3 6 9
3 8 3 8
The histogram of the equalized image is spread more evenly over the gray scale.
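The worked example can be reproduced with a short NumPy sketch of the discrete transformation sk = Σ nj / n (the function name hist_equalize is introduced here for illustration):

import numpy as np

def hist_equalize(img, L):
    # sk = T(rk) = sum_{j<=k} n_j / n, then scale to [0, L-1] and round
    n = img.size
    hist = np.bincount(img.ravel(), minlength=L)
    cdf = np.cumsum(hist) / n
    mapping = np.round(cdf * (L - 1)).astype(int)
    return mapping[img]

img = np.array([[2, 3, 3, 2],
                [4, 2, 4, 3],
                [3, 2, 3, 5],
                [2, 4, 2, 4]])
print(hist_equalize(img, L=10))
# maps 2->3, 3->6, 4->8, 5->9, matching the worked example above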
Example: for the given 8 × 8 image with gray levels in [0, 7], obtain the histogram-equalized image.
Histogram specifications
Histogram equalization is capable of generating an approximation of a
uniform histogram.
Sometimes the ability to specify particular histogram shapes capable of highlighting certain gray levels in an image is desirable.
Let pr(r) and pz(z) be the original and desired probability density functions, respectively. Histogram equalization of the original image gives s = T(r) = ∫₀ʳ pr(w) dw. If the desired image were available, its levels could likewise be equalized: v = G(z) = ∫₀ᶻ pz(w) dw.
ps(s) and pv(v) would be identical uniform densities, because the final result is independent of the density inside the integral.
Thus, if instead of using v in the inverse process z = G⁻¹(v) we use the uniform levels s obtained from the original image, the resulting levels z = G⁻¹(s) would have the desired probability density function.
This global approach is suitable for overall enhancement, but generally fails
when the objective is to enhance details over small areas in an image.
The procedure is to define a neighborhood and move its center from pixel to
pixel in a horizontal or vertical direction.
For example, a filter that passes low frequencies is called a lowpass filter.
If the operation performed on the image pixels is linear, then the filter is
called a linear spatial filter.
If the operation performed on the image pixels is nonlinear, then the filter is
called a nonlinear spatial filter.
Other terms for a spatial filter kernel are mask, template, and window. We use the term filter kernel or simply kernel.
Linear filtering of an image f of size M×N with a filter kernel of size m×n is given by the expression
g(x, y) = Σ(s=−a..a) Σ(t=−b..b) w(s, t) f(x+s, y+t)
where a = (m−1)/2 and b = (n−1)/2. This equation implements the sum of products for a kernel of arbitrary odd size.
We simply move the filter mask from point to point in the image; at each point (x, y), the response of the filter at that point is calculated using a predefined relationship.
R = w1 z1 + w2 z2 + ... + w(mn) z(mn) = Σ(i=1..mn) wi zi
Noise reduction can be accomplished by blurring with a linear filter and also by nonlinear filtering.
The output is simply the average of the pixels contained in the neighborhood of the filter mask. These are called averaging filters or lowpass filters.
Replacing the value of every pixel in an image by the average of the gray levels
in the neighborhood will reduce the “sharp” transitions in gray levels.
The low pass filter preserves the smooth region in the image and it removes
the sharp variations leading to blurring effect.
The blurring effect increases with the size of the mask.
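A quick sketch of averaging (box) filtering with SciPy, illustrating that a larger mask blurs more (the image and mask sizes are arbitrary choices):

import numpy as np
from scipy.ndimage import uniform_filter

img = np.random.randint(0, 256, size=(128, 128)).astype(np.float64)
smooth3 = uniform_filter(img, size=3)   # 3x3 box average
smooth9 = uniform_filter(img, size=9)   # larger mask -> more blurring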
A smoothing filter reduces sharp transitions in gray levels, which include:
random noise in the image
edges of objects in the image
Smoothing can therefore reduce noise (desirable) but also blur edges (undesirable).
In a smoothing spatial mask, the sum of the mask coefficients equals 1.
g(x, y) = [ Σ(s=−a..a) Σ(t=−b..b) w(s, t) f(x+s, y+t) ] / [ Σ(s=−a..a) Σ(t=−b..b) w(s, t) ]
Median filter replaces the value of a pixel by the median of the gray levels
in the neighborhood of that pixel (the original value of the pixel is
included in the computation of the median).
All pixels in the neighborhood of the pixel in the original image which are
identified by the mask are sorted in the ascending or descending order.
This filtering technique is popular because, for certain types of random noise (impulse or salt-and-pepper noise), it provides excellent noise-reduction capabilities, with considerably less blurring than linear smoothing filters of similar size.
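A minimal salt-and-pepper example using SciPy's median filter (the noise fractions and image are arbitrary choices for illustration):

import numpy as np
from scipy.ndimage import median_filter

img = np.full((64, 64), 128, dtype=np.uint8)
rng = np.random.default_rng(0)
noise = rng.random(img.shape)
img[noise < 0.05] = 0      # pepper
img[noise > 0.95] = 255    # salt

denoised = median_filter(img, size=3)  # 3x3 median removes the impulses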
Example : Median Filters
Sharpening Spatial Filters
Derivative operator
The strength of the response of a derivative operator is
proportional to the degree of discontinuity of the image at the point
at which the operator is applied.
First-order derivative
a basic definition of the first-order derivative of a one-dimensional
function f(x) is the difference
∂f/∂x = f(x + 1) − f(x)
Second-order derivative
Similarly, we define the second-order derivative of a one-dimensional function f(x) as the difference
∂²f/∂x² = f(x + 1) + f(x − 1) − 2f(x)
First and second order derivative
First-order derivatives produce thicker edges in an image and generally have a stronger response to a gray-level step.
First and Second-order derivative of f(x,y)
When we consider an image function of two variables, f(x, y), we deal with partial derivatives along the two spatial axes.
Gradient operator: ∇f = [ ∂f(x, y)/∂x, ∂f(x, y)/∂y ]ᵀ
Laplacian operator (a linear operator): ∇²f = ∂²f(x, y)/∂x² + ∂²f(x, y)/∂y²
Discrete Form of Laplacian
From
∂²f/∂x² = f(x+1, y) + f(x−1, y) − 2f(x, y)
∂²f/∂y² = f(x, y+1) + f(x, y−1) − 2f(x, y)
we obtain
∇²f = f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4f(x, y)
Laplacian mask
The background effect can be corrected easily by adding the original image to the Laplacian image; be careful about the sign of the center coefficient of the Laplacian filter used.
Example
(a) Image of the North Pole of the Moon; (c) Laplacian image scaled for display purposes. The Laplacian mask used (including diagonal terms):
 1  1  1
 1 -8  1
 1  1  1
Mask of Laplacian + addition
To simplify the computation, we can create a single mask that performs both operations, Laplacian filtering and addition of the original image.
g(x, y) = f(x, y) − [ f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4f(x, y) ]
        = 5f(x, y) − [ f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) ]
The corresponding composite mask is:
 0 -1  0
-1  5 -1
 0 -1  0
Note: g(x, y) = f(x, y) − ∇²f(x, y) when the Laplacian mask has a negative center coefficient, and g(x, y) = f(x, y) + ∇²f(x, y) when it has a positive center coefficient.

 0 -1  0     0  0  0     0 -1  0
-1  5 -1  =  0  1  0  + -1  4 -1
 0 -1  0     0  0  0     0 -1  0

-1 -1 -1     0  0  0    -1 -1 -1
-1  9 -1  =  0  1  0  + -1  8 -1
-1 -1 -1     0  0  0    -1 -1 -1
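A sketch of Laplacian sharpening with the composite center-5 mask, assuming SciPy (the boundary mode and test image are arbitrary choices):

import numpy as np
from scipy.ndimage import convolve

# Composite Laplacian-sharpening mask (center 5): g = 5f - (4-neighbour sum)
sharpen4 = np.array([[ 0, -1,  0],
                     [-1,  5, -1],
                     [ 0, -1,  0]], dtype=np.float64)

img = np.random.rand(64, 64)
g = convolve(img, sharpen4, mode="nearest")  # g(x, y) = f(x, y) - Laplacian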
Unsharp masking
Unsharp masking is one of the techniques used for edge enhancement.
fs(x, y) = f(x, y) − f̄(x, y)
sharpened image = original image − blurred image
where f̄(x, y) denotes a blurred (smoothed) version of f(x, y).
High-boost filtering
A high boost filter is also known as a high frequency emphasis filter.
fhb(x, y) = A f(x, y) − f̄(x, y),   A ≥ 1
          = (A − 1) f(x, y) + [ f(x, y) − f̄(x, y) ]
          = (A − 1) f(x, y) + fs(x, y)
If we use the Laplacian filter to create the sharpened image fs(x, y) with addition of the original image, then
fs(x, y) = f(x, y) − ∇²f(x, y)   (Laplacian mask with negative center)
fs(x, y) = f(x, y) + ∇²f(x, y)   (Laplacian mask with positive center)
so that
fhb(x, y) = A f(x, y) − ∇²f(x, y)   or   fhb(x, y) = A f(x, y) + ∇²f(x, y)
High-boost Masks (A ≥ 1)
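A minimal high-boost sketch using a Gaussian blur for f̄ (the Gaussian smoothing choice and the values of A and sigma are assumptions made for illustration):

import numpy as np
from scipy.ndimage import gaussian_filter

def high_boost(f, A=2.0, sigma=1.0):
    # f_s = f - blurred(f) is the unsharp mask;
    # f_hb = (A - 1) f + f_s, which expands to A f - blurred(f)
    blurred = gaussian_filter(f, sigma=sigma)
    f_s = f - blurred
    return (A - 1.0) * f + f_s

img = np.random.rand(64, 64)
boosted = high_boost(img, A=1.7)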
Gradient Operator
Derivative operators are used in edge detection techniques.
∇f = [ Gx, Gy ]ᵀ = [ ∂f/∂x, ∂f/∂y ]ᵀ
∇f = mag(∇f) = [ Gx² + Gy² ]^(1/2) = [ (∂f/∂x)² + (∂f/∂y)² ]^(1/2)
A common approximation is
∇f ≈ |Gx| + |Gy|
Note that the magnitude computation is nonlinear in either form.
Gradient Masks
Consider a 3×3 region with gray levels:
z1 z2 z3
z4 z5 z6
z7 z8 z9
Simplest approximation (differences within a 2×2 neighborhood):
Gx = z8 − z5  and  Gy = z6 − z5
∇f = [ Gx² + Gy² ]^(1/2) = [ (z8 − z5)² + (z6 − z5)² ]^(1/2)
∇f ≈ |z8 − z5| + |z6 − z5|
Roberts cross-gradient operators (2×2):
Gx = z9 − z5  and  Gy = z8 − z6
∇f = [ (z9 − z5)² + (z8 − z6)² ]^(1/2)
∇f ≈ |z9 − z5| + |z8 − z6|
Sobel operators (3×3):
Gx = (z7 + 2z8 + z9) − (z1 + 2z2 + z3)
Gy = (z3 + 2z6 + z9) − (z1 + 2z4 + z7)
∇f ≈ |Gx| + |Gy|
The weight value 2 achieves some smoothing by giving more importance to the center point.
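The Sobel gradients and both magnitude forms in SciPy (a sketch; the axis conventions follow scipy.ndimage, and the test image is arbitrary):

import numpy as np
from scipy.ndimage import sobel

img = np.random.rand(64, 64)
gx = sobel(img, axis=0)               # derivative along rows
gy = sobel(img, axis=1)               # derivative along columns
grad_approx = np.abs(gx) + np.abs(gy) # |Gx| + |Gy| approximation
grad_mag = np.hypot(gx, gy)           # full magnitude sqrt(Gx^2 + Gy^2)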
Filtering in frequency domain
These three categories cover the range from very sharp (ideal) to very smooth
(Gaussian) filtering.
The shape of a Butterworth filter is controlled by a parameter called the filter order.
For large values of this parameter, the Butterworth filter approaches the ideal filter.
For lower values, the Butterworth filter is more like a Gaussian filter.
Contrast enhancement
A good deal of control can be gained over the illumination and reflectance
components with a homomorphic filter.
Image zooming
Zooming enlarges a picture so that the details in the image become more visible and clear.
UNIT-III
IMAGE RESTORATION
The distortion correction equations yield non integer values for x' and y'. Because the
distorted image g is digital, its pixel values are defined only at integer coordinates. Thus using
non integer values for x' and y' causes a mapping into locations of g for which no gray levels are
defined. Inferring what the gray-level values at those locations should be, based only on the pixel
values at integer coordinate locations, then becomes necessary. The technique used to
accomplish this is called gray-level interpolation.
The simplest scheme for gray-level interpolation is based on a nearest neighbor approach. This method, also called zero-order interpolation, is illustrated in Fig. 6.1. This figure shows
(A) The mapping of integer (x, y) coordinates into fractional coordinates (x', y') by means of the distortion-correction equations;
(B) The selection of the closest integer coordinate neighbor to (x', y'); and
(C) The assignment of the gray level of this nearest neighbor to the pixel located at (x, y).
Although nearest neighbor interpolation is simple to implement, this method often has the
drawback of producing undesirable artifacts, such as distortion of straight edges in images of
high resolution. Smoother results can be obtained by using more sophisticated techniques, such
as cubic convolution interpolation, which fits a surface of the sin(z)/z type through a much larger
number of neighbors (say, 16) in order to obtain a smooth estimate of the gray level at any
desired point. Typical areas in which smoother approximations generally are required include 3-
D graphics and medical imaging. The price paid for smoother approximations is additional
computational burden. For general-purpose image processing a bilinear interpolation approach
that uses the gray levels of the four nearest neighbors usually is adequate. This approach is
straightforward. Because the gray level of each of the four integral nearest neighbors of a non
integral pair of coordinates (x', y') is known, the gray-level value at these coordinates, denoted
v(x', y'), can be interpolated from the values of its neighbors by using the relationship
v(x', y') = a x' + b y' + c x' y' + d
where the four coefficients are determined from the four equations in four unknowns written using the four known neighbors of (x', y').
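A minimal bilinear interpolation sketch at a single fractional coordinate (the function name bilinear is introduced here for illustration; no boundary handling is included):

import numpy as np

def bilinear(img, x, y):
    # Interpolate the gray level at fractional coordinates (x, y) from the
    # four integral nearest neighbours (a 2x2 neighbourhood).
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    f00 = img[x0, y0]
    f10 = img[x0 + 1, y0]
    f01 = img[x0, y0 + 1]
    f11 = img[x0 + 1, y0 + 1]
    return (f00 * (1 - dx) * (1 - dy) + f10 * dx * (1 - dy)
            + f01 * (1 - dx) * dy + f11 * dx * dy)

img = np.arange(16.0).reshape(4, 4)
print(bilinear(img, 1.5, 2.25))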
The inverse filtering approach makes no explicit provision for handling noise. The Wiener filtering approach, in contrast, incorporates both the degradation function and the statistical characteristics of noise into the restoration process. The method is founded on considering images and noise as random
processes, and the objective is to find an estimate f̂ of the uncorrupted image f such that the mean
square error between them is minimized. This error measure is given by
e² = E{ (f − f̂)² }
where E{•} is the expected value of the argument. It is assumed that the noise and the image are
uncorrelated; that one or the other has zero mean; and that the gray levels in the estimate are a
linear function of the levels in the degraded image. Based on these conditions, the minimum of
the error function is given in the frequency domain by the expression
F̂(u, v) = [ (1/H(u, v)) |H(u, v)|² / ( |H(u, v)|² + Sη(u, v)/Sf(u, v) ) ] G(u, v)
where we used the fact that the product of a complex quantity with its conjugate is equal to the
magnitude of the complex quantity squared. This result is known as the Wiener filter, after N.
Wiener [1942], who first proposed the concept in the year shown. The filter, which consists of
the terms inside the brackets, also is commonly referred to as the minimum mean square error
filter or the least square error filter. The Wiener filter does not have the same problem as the
inverse filter with zeros in the degradation function, unless both H(u, v) and S η(u, v) are zero for
the same value(s) of u and v.
As before, H (u, v) is the transform of the degradation function and G (u, v) is the
transform of the degraded image. The restored image in the spatial domain is given by the
inverse Fourier transform of the frequency-domain estimate F̂(u, v). Note that if the noise is
zero, then the noise power spectrum vanishes and the Wiener filter reduces to the inverse filter.
When we are dealing with spectrally white noise, the spectrum |N(u, v)|² is a constant,
which simplifies things considerably. However, the power spectrum of the undegraded image
seldom is known. An approach used frequently when these quantities are not known or cannot be
estimated is to approximate the equation as
F̂(u, v) = [ (1/H(u, v)) |H(u, v)|² / ( |H(u, v)|² + K ) ] G(u, v)
where K is a specified constant.
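A frequency-domain sketch of the Wiener filter with the constant-K approximation, assuming NumPy; note that H*/( |H|² + K ) equals (1/H)·|H|²/(|H|² + K) and avoids explicit division by H:

import numpy as np

def wiener_deconv(g, h, K=0.01):
    # Constant-K Wiener filter: F_hat = [1/H * |H|^2 / (|H|^2 + K)] G
    G = np.fft.fft2(g)
    H = np.fft.fft2(h, s=g.shape)        # pad the PSF to image size
    H2 = np.abs(H) ** 2
    F_hat = (np.conj(H) / (H2 + K)) * G  # equivalent, zero-safe form
    return np.real(np.fft.ifft2(F_hat))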
As Fig. 6.3 shows, the degradation process is modeled as a degradation function that,
together with an additive noise term, operates on an input image f(x, y) to produce a degraded
image g(x, y). Given g(x, y), some knowledge about the degradation function H, and some
knowledge about the additive noise term η(x, y), the objective of restoration is to obtain an
estimate f̂(x, y) of the original image. The estimate should be as close as possible to the original
input image and, in general, the more we know about H and η, the closer f̂(x, y) will be to f(x, y).
4. Explain about the restoration filters used when the image degradation is due to noise
only.
1. Mean filters
2. Order-statistics filters and
3. Adaptive filters
This is the simplest of the mean filters. Let Sxy represent the set of coordinates in a
rectangular subimage window of size m × n, centered at point (x, y). The arithmetic mean
filtering process computes the average value of the corrupted image g(x, y) in the area defined by
Sxy. The value of the restored image f̂ at any point (x, y) is simply the arithmetic mean computed using the pixels in the region defined by Sxy. In other words,
f̂(x, y) = (1/mn) Σ(s,t)∈Sxy g(s, t)
This operation can be implemented using a convolution mask in which all coefficients have
value 1/mn
In the geometric mean filter, each restored pixel is given by the product of the pixels in the subimage window, raised to the power 1/mn:
f̂(x, y) = [ Π(s,t)∈Sxy g(s, t) ]^(1/mn)
A geometric mean filter achieves smoothing comparable to the arithmetic mean filter, but it tends to lose less image detail in the process.
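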
The harmonic mean filter, f̂(x, y) = mn / Σ(s,t)∈Sxy [ 1/g(s, t) ], works well for salt noise but fails for pepper noise. It does well also with other types of noise like Gaussian noise.
The contraharmonic mean filtering operation yields a restored image based on the expression
f̂(x, y) = Σ(s,t)∈Sxy g(s, t)^(Q+1) / Σ(s,t)∈Sxy g(s, t)^Q
where Q is called the order of the filter. This filter is well suited for reducing or virtually
eliminating the effects of salt-and-pepper noise. For positive values of Q, the filter eliminates
pepper noise. For negative values of Q it eliminates salt noise. It cannot do both simultaneously.
Note that the contra harmonic filter reduces to the arithmetic mean filter if Q = 0, and to the
harmonic mean filter if Q = -1.
The best-known order-statistics filter is the median filter, which, as its name implies,
replaces the value of a pixel by the median of the gray levels in the neighborhood of that pixel:
The original value of the pixel is included in the computation of the median. Median filters are
quite popular because, for certain types of random noise, they provide excellent noise-reduction
capabilities, with considerably less blurring than linear smoothing filters of similar size. Median
filters are particularly effective in the presence of both bipolar and unipolar impulse noise.
Although the median filter is by far the order-statistics filler most used in image
processing, it is by no means the only one. The median represents the 50th percentile of a ranked
set of numbers, but the reader will recall from basic statistics that ranking lends itself to many
other possibilities. For example, using the 100th percentile results in the so-called max filter, given by
f̂(x, y) = max(s,t)∈Sxy { g(s, t) }
This filter is useful for finding the brightest points in an image. Also, because pepper noise has
very low values, it is reduced by this filter as a result of the max selection process in the
subimage area S xy.
Using the 0th percentile results in the min filter, f̂(x, y) = min(s,t)∈Sxy { g(s, t) }. This filter is useful for finding the darkest points in an image. Also, it reduces salt noise as a result of the min operation.
The midpoint filter simply computes the midpoint between the maximum and minimum values in the area encompassed by the filter:
f̂(x, y) = ½ [ max(s,t)∈Sxy g(s, t) + min(s,t)∈Sxy g(s, t) ]
Note that this filter combines order statistics and averaging. This filter works best for randomly
distributed noise, like Gaussian or uniform noise.
It is a filter formed by deleting the d/2 lowest and the d/2 highest gray-level values of g(s, t) in the neighborhood Sxy. Let gr(s, t) represent the remaining mn − d pixels. A filter formed by averaging these remaining pixels is called an alpha-trimmed mean filter:
f̂(x, y) = (1 / (mn − d)) Σ(s,t)∈Sxy gr(s, t)
where the value of d can range from 0 to mn - 1. When d = 0, the alpha- trimmed filter reduces to
the arithmetic mean filter. If d = (mn - l)/2, the filter becomes a median filter. For other values
of d, the alpha-trimmed filter is useful in situations involving multiple types of noise, such as a
combination of salt-and-pepper and Gaussian noise.
Adaptive filters are filters whose behavior changes based on statistical characteristics of
the image inside the filter region defined by the m X n rectangular window Sxy.
The simplest statistical measures of a random variable are its mean and variance. These
are reasonable parameters on which to base an adaptive filler because they are quantities closely
related to the appearance of an image. The mean gives a measure of average gray level in the
region over which the mean is computed, and the variance gives a measure of average contrast in
that region.
This filter operates on a local region, Sxy. The response of the filter at any point (x, y) on which the region is centered is based on four quantities: (a) g(x, y), the value of the noisy image at (x, y); (b) σ²η, the variance of the noise corrupting f(x, y) to form g(x, y); (c) mL, the local mean of the pixels in Sxy; and (d) σ²L, the local variance of the pixels in Sxy.
1. If σ²η is zero, the filter should return simply the value of g(x, y). This is the trivial, zero-noise
case in which g (x, y) is equal to f (x, y).
2. If the local variance is high relative to σ2η the filter should return a value close to g (x, y). A
high local variance typically is associated with edges, and these should be preserved.
3. If the two variances are equal, we want the filter to return the arithmetic mean value of the
pixels in S xy. This condition occurs when the local area has the same properties as the overall
image, and local noise is to be reduced simply by averaging.
A filter with this behavior is
f̂(x, y) = g(x, y) − (σ²η / σ²L) [ g(x, y) − mL ]
The only quantity that needs to be known or estimated is the variance of the overall noise, σ²η. The other parameters are computed from the pixels in Sxy at each location (x, y) on which the filter window is centered.
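A sketch of this adaptive, local noise reduction filter with the ratio clipped to 1 where the local variance falls below the noise variance (the window size and the clipping choice are assumptions):

import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_local_filter(g, noise_var, size=7):
    # f_hat = g - (sigma_eta^2 / sigma_L^2) * (g - m_L)
    m_L = uniform_filter(g, size=size)                   # local mean
    var_L = uniform_filter(g * g, size=size) - m_L ** 2  # local variance
    ratio = np.minimum(noise_var / np.maximum(var_L, 1e-12), 1.0)
    return g - ratio * (g - m_L)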
The median filter performs well as long as the spatial density of the impulse noise is not large (as
a rule of thumb, Pa and Pb less than 0.2). The adaptive median filtering can handle impulse noise
with probabilities even larger than these. An additional benefit of the adaptive median filter is
that it seeks to preserve detail while smoothing nonimpulse noise, something that the
"traditional" median filter does not do. The adaptive median filter also works in a rectangular
window area Sxy. Unlike those filters, however, the adaptive median filter changes (increases) the
size of Sxy during filter operation, depending on certain conditions. The output of the filter is a
single value used to replace the value of the pixel at (x, y), the particular point on which the
window Sxy is centered at a given time.
The adaptive median filtering algorithm works in two levels, denoted level A and level B, as follows:
Level A:  A1 = zmed − zmin
          A2 = zmed − zmax
          If A1 > 0 AND A2 < 0, go to level B; otherwise increase the window size.
          If the window size ≤ Smax, repeat level A; otherwise output zmed.
Level B:  B1 = zxy − zmin
          B2 = zxy − zmax
          If B1 > 0 AND B2 < 0, output zxy; otherwise output zmed.
Here zmin, zmax, and zmed are the minimum, maximum, and median gray levels in Sxy; zxy is the gray level at coordinates (x, y); and Smax is the maximum allowed size of Sxy.
An image is represented by two-dimensional functions of the form f(x, y). The value or
amplitude of f at spatial coordinates (x, y) is a positive scalar quantity whose physical meaning is
determined by the source of the image. When an image is generated from a physical process, its
values are proportional to energy radiated by a physical source (e.g., electromagnetic waves). As
a consequence, f(x, y) must be nonzero and finite; that is, 0 < f(x, y) < ∞ … (1)
Appropriately, these are called the illumination and reflectance components and are denoted by
i(x, y) and r(x, y), respectively. The two functions combine as a product to form f(x, y):
f(x, y) = i(x, y) r(x, y) … (2)
where 0 < i(x, y) < ∞ … (3)
and 0 < r(x, y) < 1 … (4)
Equation (4) indicates that reflectance is bounded by 0 (total absorption) and 1 (total
reflectance).The nature of i (x, y) is determined by the illumination source, and r (x, y) is
determined by the characteristics of the imaged objects. It is noted that these expressions also are
applicable to images formed via transmission of the illumination through a medium, such as a
chest X-ray.
The simplest approach to restoration is direct inverse filtering, where an estimate F̂(u, v) of the transform of the original image is computed simply by dividing the transform of the degraded image, G(u, v), by the degradation function:
F̂(u, v) = G(u, v) / H(u, v)
Substituting G(u, v) = F(u, v) H(u, v) + N(u, v) gives
F̂(u, v) = F(u, v) + N(u, v) / H(u, v)
This tells us that even if the degradation function is known, the undegraded image cannot be recovered exactly [the inverse Fourier transform of F̂(u, v)], because N(u, v) is a random function whose Fourier transform is not known.
If the degradation has zero or very small values, then the ratio N(u, v)/H(u, v) could easily dominate the estimate F̂(u, v).
One approach to get around the zero or small-value problem is to limit the filter frequencies to values near the origin. H(0, 0) is equal to the average value of h(x, y), and this is usually the highest value of H(u, v) in the frequency domain. Thus, by limiting the analysis to frequencies near the origin, the probability of encountering zero values is reduced.
The following are among the most common PDFs found in image processing applications.
Gaussian noise
Because of its mathematical tractability in both the spatial and frequency domains,
Gaussian (also called normal) noise models are used frequently in practice. In fact, this
tractability is so convenient that it often results in Gaussian models being used in situations in
which they are marginally applicable at best.
p(z) = (1 / (√(2π) σ)) e^(−(z − µ)² / (2σ²)) … (1)
where z represents gray level, µ is the mean (average) value of z, and σ is its standard
deviation. The standard deviation squared, σ², is called the variance of z. A plot of this function
is shown in Fig. 5.10. When z is described by Eq. (1), approximately 70% of its values will be in
the range [(µ - σ), (µ +σ)], and about 95% will be in the range [(µ - 2σ), (µ + 2σ)].
Rayleigh noise
The PDF of Rayleigh noise is given by
p(z) = (2/b)(z − a) e^(−(z−a)²/b) for z ≥ a, and 0 for z < a
with mean µ = a + √(πb/4) and variance σ² = b(4 − π)/4.
Figure 5.10 shows a plot of the Rayleigh density. Note the displacement from the origin and the
fact that the basic shape of this density is skewed to the right. The Rayleigh density can be quite
useful for approximating skewed histograms.
Erlang (gamma) noise
The PDF of Erlang noise is given by
p(z) = (aᵇ z^(b−1) / (b−1)!) e^(−az) for z ≥ 0, and 0 for z < 0
where the parameters are such that a > 0, b is a positive integer, and "!" indicates factorial. The mean and variance of this density are given by
µ = b / a
σ² = b / a²
Exponential noise
The PDF of exponential noise is given by p(z) = a e^(−az) for z ≥ 0, and 0 for z < 0, where a > 0. The mean and variance are
µ = 1 / a
σ² = 1 / a²
Uniform noise
The PDF of uniform noise is given by p(z) = 1/(b − a) for a ≤ z ≤ b, and 0 otherwise. The mean and variance are
µ = (a + b) / 2
σ² = (b − a)² / 12
Impulse (salt-and-pepper) noise
The PDF of (bipolar) impulse noise is given by p(z) = Pa for z = a, Pb for z = b, and 0 otherwise.
If b > a, gray-level b will appear as a light dot in the image. Conversely, level a will
appear like a dark dot. If either Pa or Pb is zero, the impulse noise is called unipolar. If neither
probability is zero, and especially if they are approximately equal, impulse noise values will
resemble salt-and-pepper granules randomly distributed over the image. For this reason, bipolar
impulse noise also is called salt-and-pepper noise. Shot and spike noise also are terms used to
refer to this type of noise.
11. Enumerate the differences between the image enhancement and image restoration.
(i) Image enhancement techniques are heuristic procedures designed to manipulate an image in
order to take advantage of the psychophysical aspects of the human visual system, whereas image
restoration techniques are basically reconstruction techniques by which a degraded image is
reconstructed by using some of the prior knowledge of the degradation phenomenon.
(ii) Image enhancement can be implemented by spatial and frequency domain techniques, whereas image restoration can be implemented by frequency domain and algebraic techniques.
(iii) The computational complexity of image enhancement is relatively low compared with that of image restoration, since the algebraic methods of restoration require manipulation of a large number of simultaneous equations. But, under some conditions, the computational complexity of restoration can be reduced to the same level as that required by traditional frequency domain techniques.
(iv) Image enhancement techniques are problem oriented, whereas image restoration techniques
are general and are oriented towards modeling the degradation and applying the reverse process
in order to reconstruct the original image.
(v) Masks are used in spatial domain methods for image enhancement, whereas masks are not
used for image restoration techniques.
(vi) Contrast stretching is considered an image enhancement technique because it is based on the pleasing aspects of the image to the viewer, whereas removal of image blur by applying a deblurring function is considered an image restoration technique.
12. Explain about iterative nonlinear restoration using the Lucy–Richardson algorithm.
With Pij as the point spread function, the pixels in the observed (degraded) image are expressed as
di = Σj Pij uj
Here, uj is the value of pixel j in the (unknown) original image and di is the observed value at pixel i.
The L-R algorithm cannot be used in applications in which the psf (Pij) depends on one or more unknown variables.
The L-R algorithm is based on a maximum-likelihood formulation in which Poisson statistics are used to model the image. Maximizing the likelihood of the model gives an equation that is satisfied when the following iteration converges:
f̂(k+1)(x, y) = f̂k(x, y) [ h(−x, −y) ⋆ ( g(x, y) / ( h(x, y) ⋆ f̂k(x, y) ) ) ]
Here ⋆ denotes convolution, h is the point spread function, g is the degraded image, and f̂k is the estimate of the restored image after k iterations.
The factor f̂ present in the denominator on the right-hand side leads to the nonlinearity. Since the algorithm is a type of nonlinear restoration, it is stopped when a satisfactory result is obtained.
The basic syntax of the function deconvlucy, with which the L-R algorithm is implemented, is
fr = deconvlucy(g, psf, NUMIT, DAMPAR, WEIGHT)
where g is the degraded image, fr is the restored image, psf is the point spread function, and NUMIT is the number of iterations.
DAMPAR
The DAMPAR parameter is a scalar that specifies the threshold deviation of the resulting image from the degraded image g. Iterations are suppressed for pixels that deviate within the DAMPAR value from their original value, so as to reduce noise.
WEIGHT
The WEIGHT parameter assigns a weight to each pixel. It is an array of the same size as the degraded image g. In applications where a pixel would lead to an improper image, that pixel can be excluded by assigning it a weight of 0. The pixels may also be given weights depending upon the flat-field correction, which is essential according to the image array. Weights are used in applications such as blurring with a specified psf; they are used to exclude pixels that are present at the boundary of the image and are blurred separately by the psf.
If the array size of the psf is n × n, then the border of zero weights used is ceil(n/2) pixels wide.
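For reference, a bare-bones L-R iteration in NumPy (a sketch, not the deconvlucy implementation; it assumes the psf sums to 1, and scikit-image users can call skimage.restoration.richardson_lucy instead):

import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(g, psf, num_iter=30):
    # Iterative L-R update: f <- f * [ psf_mirror conv (g / (psf conv f)) ]
    f = np.full(g.shape, g.mean())
    psf_mirror = psf[::-1, ::-1]
    for _ in range(num_iter):
        denom = fftconvolve(f, psf, mode="same")
        f = f * fftconvolve(g / np.maximum(denom, 1e-12),
                            psf_mirror, mode="same")
    return f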
UNIT-V
IMAGE COMPRESSION
Data redundancy is a central issue in digital image compression. It is not an abstract concept but
a mathematically quantifiable entity. If n1 and n2 denote the number of information-carrying
units in two data sets that represent the same information, the relative data redundancy RD of the first data set (the one characterized by n1) can be defined as
RD = 1 − 1/CR
where CR, commonly called the compression ratio, is CR = n1 / n2.
For the case n2 = n1, CR = 1 and RD = 0, indicating that (relative to the second data set) the first representation of the information contains no redundant data. When n2 << n1, CR → ∞ and RD → 1, implying significant compression and highly redundant data. Finally, when n2 >> n1, CR → 0 and RD → −∞, indicating that the second data set contains much more data than the original representation. This, of course, is the normally undesirable case of data expansion. In general, CR and RD lie in the open intervals (0, ∞) and (−∞, 1), respectively.
A practical compression ratio, such as 10 (or 10:1), means that the first data set has 10 information-carrying units (say, bits) for every 1 unit in the second or compressed data set. The corresponding redundancy of 0.9 implies that 90% of the data in the first data set is redundant.
In digital image compression, three basic data redundancies can be identified and exploited:
coding redundancy, interpixel redundancy, and psychovisual redundancy. Data compression
is achieved when one or more of these redundancies are reduced or eliminated.
Coding Redundancy:
Here, we use a simple formulation to show how the gray-level histogram of an image can provide a great deal of insight into the construction of codes to reduce the amount of data used to represent it.
Let us assume, once again, that a discrete random variable rk in the interval [0, 1] represents the gray levels of an image and that each rk occurs with probability pr(rk):
pr(rk) = nk / n,   k = 0, 1, 2, ..., L − 1
where L is the number of gray levels, nk is the number of times that the kth gray level appears in
the image, and n is the total number of pixels in the image. If the number of bits used to represent
each value of rk is l(rk), then the average number of bits required to represent each pixel is
Lavg = Σ(k=0..L−1) l(rk) pr(rk)
That is, the average length of the code words assigned to the various gray-level values is found
by summing the product of the number of bits used to represent each gray level and the
probability that the gray level occurs. Thus the total number of bits required to code an M X N
image is MNLavg.
Interpixel Redundancy:
Consider the images shown in Figs. 1.1(a) and (b). As Figs. 1.1(c) and (d) show, these images
have virtually identical histograms. Note also that both histograms are trimodal, indicating the
presence of three dominant ranges of gray-level values. Because the gray levels in these images
are not equally probable, variable-length coding can be used to reduce the coding redundancy
that would result from a straight or natural binary encoding of their pixels. The coding process,
however, would not alter the level of correlation between the pixels within the images. In other
words, the codes used to represent the gray levels of each image have nothing to do with the
correlation between pixels. These correlations result from the structural or geometric
relationships between the objects in the image.
Fig.1.1 Two images and their gray-level histograms and normalized autocorrelation
coefficients along one line.
Figures 1.1(e) and (f) show the respective autocorrelation coefficients computed along one line
of each image.
γ(Δn) = A(Δn) / A(0)
where
A(Δn) = (1 / (N − Δn)) Σ(y=0..N−1−Δn) f(x, y) f(x, y + Δn)
The scaling factor in Eq. above accounts for the varying number of sum terms that arise for each
integer value of Δn. Of course, Δn must be strictly less than N, the number of pixels on a line.
The variable x is the coordinate of the line used in the computation. Note the dramatic difference
between the shape of the functions shown in Figs. 1.1(e) and (f). Their shapes can be
qualitatively related to the structure in the images in Figs. 1.1(a) and (b).This relationship is
particularly noticeable in Fig. 1.1 (f), where the high correlation between pixels separated by 45
and 90 samples can be directly related to the spacing between the vertically oriented matches of
Fig. 1.1(b). In addition, the adjacent pixels of both images are highly correlated. When Δn is 1, γ
is 0.9922 and 0.9928 for the images of Figs. 1.1 (a) and (b), respectively. These values are
typical of most properly sampled television images.
Psychovisual Redundancy:
The brightness of a region, as perceived by the eye, depends on factors other than simply the
light reflected by the region. For example, intensity variations (Mach bands) can be perceived in
an area of constant intensity. Such phenomena result from the fact that the eye does not respond
with equal sensitivity to all visual information. Certain information simply has less relative
importance than other information in normal visual processing. This information is said to be
psychovisually redundant. It can be eliminated without significantly impairing the quality of
image perception.
The elimination of psychovisually redundant data results in a loss of quantitative information, so the process is commonly referred to as quantization. This terminology is consistent with normal usage of the word, which generally
means the mapping of a broad range of input values to a limited number of output values. As it is
an irreversible operation (visual information is lost), quantization results in lossy data
compression.
When the level of information loss can be expressed as a function of the original or input image
and the compressed and subsequently decompressed output image, it is said to be based on an
objective fidelity criterion. A good example is the root-mean-square (rms) error between an input
and output image. Let f(x, y) represent an input image and let f̂(x, y) denote an estimate or
approximation of f(x, y) that results from compressing and subsequently decompressing the
input. For any value of x and y, the error e(x, y) between f(x, y) and f̂(x, y) can be defined as
e(x, y) = f̂(x, y) − f(x, y)
where the images are of size M × N. The root-mean-square error, erms, between f(x, y) and f̂(x, y) then is the square root of the squared error averaged over the M × N array, or
erms = [ (1/MN) Σ(x=0..M−1) Σ(y=0..N−1) ( f̂(x, y) − f(x, y) )² ]^(1/2)
A closely related objective fidelity criterion is the mean-square signal-to-noise ratio of the compressed-decompressed image. If f̂(x, y) is considered to be the sum of the original image f(x, y) and a noise signal e(x, y), the mean-square signal-to-noise ratio of the output image, denoted SNRms, is
SNRms = Σ(x=0..M−1) Σ(y=0..N−1) f̂(x, y)² / Σ(x=0..M−1) Σ(y=0..N−1) ( f̂(x, y) − f(x, y) )²
The rms value of the signal-to-noise ratio, denoted SNRrms, is obtained by taking the square root of this expression.
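Both objective fidelity criteria are a few lines of NumPy (the helper names are introduced here for illustration):

import numpy as np

def rms_error(f, f_hat):
    e = f_hat.astype(np.float64) - f.astype(np.float64)
    return np.sqrt(np.mean(e * e))   # e_rms

def snr_ms(f, f_hat):
    fh = f_hat.astype(np.float64)
    e = fh - f.astype(np.float64)
    return np.sum(fh * fh) / np.sum(e * e)   # SNR_ms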
Although objective fidelity criteria offer a simple and convenient mechanism for
evaluating information loss, most decompressed images ultimately are viewed by humans.
Consequently, measuring image quality by the subjective evaluations of a human observer often
is more appropriate. This can be accomplished by showing a "typical" decompressed image to an
appropriate cross section of viewers and averaging their evaluations. The evaluations may be
made using an absolute rating scale or by means of side-by-side comparisons of f(x, y) and f̂(x, y).
Fig. 3.1 shows, a compression system consists of two distinct structural blocks: an encoder and a
decoder. An input image f(x, y) is fed into the encoder, which creates a set of symbols from the
input data. After transmission over the channel, the encoded representation is fed to the decoder,
where a reconstructed output image f^(x, y) is generated. In general, f^(x, y) may or may not be
an exact replica of f(x, y). If it is, the system is error free or information preserving; if not, some
level of distortion is present in the reconstructed image. Both the encoder and decoder shown in
Fig. 3.1 consist of two relatively independent functions or subblocks. The encoder is made up of
a source encoder, which removes input redundancies, and a channel encoder, which increases the
noise immunity of the source encoder's output. As would be expected, the decoder includes a
channel decoder followed by a source decoder. If the channel between the encoder and decoder
is noise free (not prone to error), the channel encoder and decoder are omitted, and the general
encoder and decoder become the source encoder and decoder, respectively.
The source encoder is responsible for reducing or eliminating any coding, interpixel, or
psychovisual redundancies in the input image. The specific application and associated fidelity
requirements dictate the best encoding approach to use in any given situation. Normally, the
approach can be modeled by a series of three independent operations. As Fig. 3.2 (a) shows, each
operation is designed to reduce one of the three redundancies. Figure 3.2 (b) depicts the
corresponding source decoder. In the first stage of the source encoding process, the mapper
transforms the input data into a (usually nonvisual) format designed to reduce interpixel
redundancies in the input image. This operation generally is reversible and may or may not
reduce directly the amount of data required to represent the image.
Run-length coding is an example of a mapping that directly results in data compression in this
initial stage of the overall source encoding process. The representation of an image by a set of
transform coefficients is an example of the opposite case. Here, the mapper transforms the image
into an array of coefficients, making its interpixel redundancies more accessible for compression
in later stages of the encoding process.
The second stage, or quantizer block in Fig. 3.2 (a), reduces the
accuracy of the mapper's output in accordance with some preestablished fidelity criterion. This
stage reduces the psychovisual redundancies of the input image. This operation is irreversible.
Thus it must be omitted when error-free compression is desired.
In the third and final stage of the source encoding process, the symbol
coder creates a fixed- or variable-length code to represent the quantizer output and maps the
output in accordance with the code. The term symbol coder distinguishes this coding operation
from the overall source encoding process. In most cases, a variable-length code is used to
represent the mapped and quantized data set. It assigns the shortest code words to the most
frequently occurring output values and thus reduces coding redundancy. The operation, of
course, is reversible. Upon completion of the symbol coding step, the input image has been
processed to remove each of the three redundancies.
Figure 3.2(a) shows the source encoding process as three successive operations, but all three
operations are not necessarily included in every compression system. Recall, for example, that
the quantizer must be omitted when error-free compression is desired. In addition, some
compression techniques normally are modeled by merging blocks that are physically separate in
Fig. 3.2(a). In the predictive compression systems, for instance, the mapper and quantizer are
often represented by a single block, which simultaneously performs both operations.
The source decoder shown in Fig. 3.2(b) contains only two components: a symbol
decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations of
the source encoder's symbol encoder and mapper blocks. Because quantization results in
irreversible information loss, an inverse quantizer block is not included in the general source
decoder model shown in Fig. 3.2(b).
The channel encoder and decoder play an important role in the overall encoding-decoding
process when the channel of Fig. 3.1 is noisy or prone to error. They are designed to reduce the
impact of channel noise by inserting a controlled form of redundancy into the source encoded
data. As the output of the source encoder contains little redundancy, it would be highly sensitive
to transmission noise without the addition of this "controlled redundancy." One of the most
useful channel encoding techniques was devised by R. W. Hamming (Hamming [1950]). It is
based on appending enough bits to the data being encoded to ensure that some minimum number
of bits must change between valid code words. Hamming showed, for example, that if 3 bits of
redundancy are added to a 4-bit word, so that the distance between any two valid code words is
3, all single-bit errors can be detected and corrected. (By appending additional bits of
redundancy, multiple-bit errors can be detected and corrected.) The 7-bit Hamming (7, 4) code
word h1, h2, h3, ..., h6, h7 associated with a 4-bit binary number b3b2b1b0 is
h1 = b3 Ⓧ b2 Ⓧ b0     h3 = b3
h2 = b3 Ⓧ b1 Ⓧ b0     h5 = b2
h4 = b2 Ⓧ b1 Ⓧ b0     h6 = b1
                       h7 = b0
where Ⓧ denotes the exclusive OR operation. Note that bits h1, h2, and h4 are even- parity bits for
the bit fields b3 b2 b0, b3b1b0, and b2b1b0, respectively. (Recall that a string of binary bits has
even parity if the number of bits with a value of 1 is even.) To decode a Hamming encoded
result, the channel decoder must check the encoded value for odd parity over the bit fields in
which even parity was previously established. A single-bit error is indicated by a nonzero parity
word c4c2c1, where
c1 = h1 Ⓧ h3 Ⓧ h5 Ⓧ h7
c2 = h2 Ⓧ h3 Ⓧ h6 Ⓧ h7
c4 = h4 Ⓧ h5 Ⓧ h6 Ⓧ h7
If a nonzero value is found, the decoder simply complements the code word bit position
indicated by the parity word. The decoded binary value is then extracted from the corrected code
word as h3h5h6h7.
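A sketch of Hamming (7, 4) encoding and single-error correction matching the parity relations above (the function names are introduced here for illustration):

def hamming_encode(b3, b2, b1, b0):
    # h1, h2, h4 are even-parity bits over (b3 b2 b0), (b3 b1 b0), (b2 b1 b0)
    h1 = b3 ^ b2 ^ b0
    h2 = b3 ^ b1 ^ b0
    h4 = b2 ^ b1 ^ b0
    return [h1, h2, b3, h4, b2, b1, b0]    # h1 h2 h3 h4 h5 h6 h7

def hamming_decode(h):
    h1, h2, h3, h4, h5, h6, h7 = h
    c1 = h1 ^ h3 ^ h5 ^ h7
    c2 = h2 ^ h3 ^ h6 ^ h7
    c4 = h4 ^ h5 ^ h6 ^ h7
    syndrome = c4 * 4 + c2 * 2 + c1        # position of a single-bit error
    if syndrome:
        h[syndrome - 1] ^= 1               # complement the flagged bit
    return h[2], h[4], h[5], h[6]          # decoded b3 b2 b1 b0

code = hamming_encode(1, 0, 1, 1)
code[3] ^= 1                               # inject a single-bit error
print(hamming_decode(code))                # -> (1, 0, 1, 1)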
Variable-Length Coding:
The simplest approach to error-free image compression is to reduce only coding redundancy.
Coding redundancy normally is present in any natural binary encoding of the gray levels in an
image. It can be eliminated by coding the gray levels. To do so requires construction of a variable-
length code that assigns the shortest possible code words to the most probable gray levels. Here,
we examine several optimal and near optimal techniques for constructing such a code. These
techniques are formulated in the language of information theory. In practice, the source symbols
may be either the gray levels of an image or the output of a gray-level mapping operation (pixel
differences, run lengths, and so on).
Huffman coding:
The most popular technique for removing coding redundancy is due to Huffman (Huffman
[1952]). When coding the symbols of an information source individually, Huffman coding yields
the smallest possible number of code symbols per source symbol. In terms of the noiseless
coding theorem, the resulting code is optimal for a fixed value of n, subject to the constraint that
the source symbols be coded one at a time.
The first step in Huffman's approach is to create a series of source reductions by ordering the
probabilities of the symbols under consideration and combining the lowest probability symbols
into a single symbol that replaces them in the next source reduction. Figure 4.1 illustrates this
process for binary coding (K-ary Huffman codes can also be constructed). At the far left, a
hypothetical set of source symbols and their probabilities are ordered from top to bottom in terms
of decreasing probability values. To form the first source reduction, the bottom two probabilities,
0.06 and 0.04, are combined to form a "compound symbol" with probability 0.1. This compound
symbol and its associated probability are placed in the first source reduction column so that the
probabilities of the reduced source are also ordered from the most to the least probable. This
process is then repeated until a reduced source with two symbols (at the far right) is reached.
The second step in Huffman's procedure is to code each reduced source, starting with the smallest
source and working back to the original source. The minimal-length binary code for a two-symbol
source is, of course, the symbols 0 and 1. When a compound symbol is expanded back into the two
symbols that were combined to form it, its code word is assigned to both, and a 0 and a 1 are
appended to each to distinguish them from each other. This operation is then repeated for each
reduced source until the original source is reached. The final code appears at the far left in Fig.
4.2. The average length of this code is
Lavg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/symbol
and the entropy of the source is 2.14 bits/symbol. The resulting Huffman code efficiency is
0.973.
Huffman's procedure creates the optimal code for a set of symbols and probabilities subject to
the constraint that the symbols be coded one at a time. After the code has been created, coding
and/or decoding is accomplished in a simple lookup table manner. The code itself is an
instantaneous uniquely decodable block code. It is called a block code because each source
symbol is mapped into a fixed sequence of code symbols. It is instantaneous, because each code
word in a string of code symbols can be decoded without referencing succeeding symbols. It is
uniquely decodable, because any string of code symbols can be decoded in only one way. Thus,
any string of Huffman encoded symbols can be decoded by examining the individual symbols of
the string in a left to right manner. For the binary code of Fig. 4.2, a left-to-right scan of the
encoded string 010100111100 reveals that the first valid code word is 01010, which is the code
for symbol a3. The next valid code word is 011, which corresponds to symbol a1. Continuing in this
manner reveals the completely decoded message to be a3a1a2a2a6.
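The reduction-and-assignment procedure lends itself to a short Python sketch using a priority queue. The probabilities below are those implied by the source of Fig. 4.1 (they agree with the stated entropy of 2.14 bits/symbol); because the 0/1 assignments at each step are arbitrary, the resulting code words may differ from those of Fig. 4.2, but the average length is still the optimal 2.2 bits/symbol.

import heapq
from itertools import count

def huffman_code(probs):
    # Each heap entry carries a group's total probability and its partial code words.
    tiebreak = count()                    # keeps heap comparisons well defined
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)   # combine the two least probable entries
        p2, _, g2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in g1.items()}
        merged.update({s: "1" + w for s, w in g2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"a1": 0.1, "a2": 0.4, "a3": 0.06, "a4": 0.1, "a5": 0.04, "a6": 0.3}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)   # 2.2 bits/symbol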
Arithmetic coding:
Unlike the variable-length codes described previously, arithmetic coding generates nonblock
codes. In arithmetic coding, which can be traced to the work of Elias, a one-to-one
correspondence between source symbols and code words does not exist. Instead, an entire
sequence of source symbols (or message) is assigned a single arithmetic code word. The code
word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the
message increases, the interval used to represent it becomes smaller and the number of
information units (say, bits) required to represent the interval becomes larger. Each symbol of the
message reduces the size of the interval in accordance with its probability of occurrence.
Because the technique does not require, as does Huffman's approach, that each source symbol
translate into an integral number of code symbols (that is, that the symbols be coded one at a
time), it achieves (but only in theory) the bound established by the noiseless coding theorem.
Figure 5.1 illustrates the basic arithmetic coding process. Here, a five-symbol sequence or
message, a1a2a3a3a4, from a four-symbol source is coded. At the start of the coding process, the
message is assumed to occupy the entire half-open interval [0, 1). As Table 5.2 shows, this
interval is initially subdivided into four regions based on the probabilities of each source symbol.
Symbol a1, for example, is associated with subinterval [0, 0.2). Because it is the first symbol of
the message being coded, the message interval is initially narrowed to [0, 0.2). Thus in Fig. 5.1
[0, 0.2) is expanded to the full height of the figure and its end points labeled by the values of the
narrowed range. The narrowed range is then subdivided in accordance with the original source
symbol probabilities and the process continues with the next message symbol.
In this manner, symbol a2 narrows the subinterval to [0.04, 0.08), a3 further narrows it to [0.056,
0.072), and so on. The final message symbol, which must be reserved as a special end-of-
message indicator, narrows the range to [0.06752, 0.0688). Of course, any number within this
subinterval—for example, 0.068—can be used to represent the message.
In the arithmetically coded message of Fig. 5.1, three decimal digits are used
to represent the five-symbol message. This translates into 3/5 or 0.6 decimal digits per source
symbol and compares favorably with the entropy of the source, which is 0.58 decimal digits
(10-ary units) per symbol. As the length of the sequence being coded increases, the resulting arithmetic
code approaches the bound established by the noiseless coding theorem.
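A minimal Python sketch of this interval-narrowing process, using the subintervals implied by the worked example (P(a1) = P(a2) = P(a4) = 0.2 and P(a3) = 0.4, with a4 doubling as the end-of-message indicator), reproduces the final range up to floating-point noise:

intervals = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def arithmetic_encode(message):
    low, high = 0.0, 1.0                 # the message starts as all of [0, 1)
    for symbol in message:
        span = high - low
        s_low, s_high = intervals[symbol]
        low, high = low + span * s_low, low + span * s_high
    return low, high                     # any number in [low, high) codes the message

print(arithmetic_encode(["a1", "a2", "a3", "a3", "a4"]))   # about (0.06752, 0.0688)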
In practice, two factors cause coding performance to fall short of the bound: (1)
the addition of the end-of-message indicator that is needed to separate one message from an-
other; and (2) the use of finite precision arithmetic. Practical implementations of arithmetic
coding address the latter problem by introducing a scaling strategy and a rounding strategy
(Langdon and Rissanen [1981]). The scaling strategy renormalizes each subinterval to the [0, 1)
range before subdividing it in accordance with the symbol probabilities. The rounding strategy
guarantees that the truncations associated with finite precision arithmetic do not prevent the
coding subintervals from being represented accurately.
LZW Coding:
The technique, called Lempel-Ziv-Welch (LZW) coding, assigns fixed-length code words to
variable length sequences of source symbols but requires no a priori knowledge of the
probability of occurrence of the symbols to be encoded. LZW compression has been integrated
into a variety of mainstream imaging file formats, including the graphic interchange format
(GIF), tagged image file format (TIFF), and the portable document format (PDF).
LZW coding is conceptually very simple (Welch [1984]). At the outset of the
coding process, a codebook or "dictionary" containing the source symbols to be coded is
constructed. For 8-bit monochrome images, the first 256 words of the dictionary are assigned to
the gray values 0, 1, 2..., and 255. As the encoder sequentially examines the image's pixels, gray-
level sequences that are not in the dictionary are placed in algorithmically determined (e.g., the
next unused) locations. If the first two pixels of the image are white, for instance, sequence "255-
255" might be assigned to location 256, the address following the locations reserved for gray
levels 0 through 255. The next time that two consecutive white pixels are encountered, code
word 256, the address of the location containing sequence 255-255, is used to represent them. If
a 9-bit, 512-word dictionary is employed in the coding process, the original (8 + 8) bits that were
used to represent the two pixels are replaced by a single 9-bit code word. Clearly, the size of the
dictionary is an important system parameter. If it is too small, the detection of matching gray-
level sequences will be less likely; if it is too large, the size of the code words will adversely
affect compression performance.
Table 6.1 details the steps involved in coding the 16 pixels of a 4 x 4, 8-bit image. A 512-word dictionary with the
following starting content is assumed:
Locations 256 through 511 are initially unused. The image is encoded by processing its pixels in
a left-to-right, top-to-bottom manner. Each successive gray-level value is concatenated with a
variable—column 1 of Table 6.1 —called the "currently recognized sequence." As can be seen,
this variable is initially null or empty. The dictionary is searched for each concatenated sequence
and, if it is found, as was the case in the first row of the table, the currently recognized sequence is
replaced by the newly concatenated and recognized (i.e., located in the dictionary) sequence. This
was done in column 1 of row 2.
No output codes are generated, nor is the dictionary altered. If the concatenated sequence is not
found, however, the address of the currently recognized sequence is output as the next encoded
value, the concatenated but unrecognized sequence is added to the dictionary, and the currently
recognized sequence is initialized to the current pixel value. This occurred in row 2 of the table.
The last two columns detail the gray-level sequences that are added to the dictionary when
scanning the entire 4 x 4 image. Nine additional code words are defined. At the conclusion of
coding, the dictionary contains 265 code words and the LZW algorithm has successfully
identified several repeating gray-level sequences—leveraging them to reduce the original 128-bit
image to 90 bits (i.e., 10 9-bit codes). The encoded output is obtained by reading the third
column from top to bottom. The resulting compression ratio is 1.42:1.
A unique feature of the LZW coding just demonstrated is that the coding
dictionary or code book is created while the data are being encoded. Remarkably, an LZW
decoder builds an identical decompression dictionary as it simultaneously decodes the encoded
data stream. Although not needed in this example, most practical applications require a strategy
for handling dictionary overflow. A simple solution is to flush or reinitialize the dictionary when
it becomes full and continue coding with a new initialized dictionary. A more complex option is
to monitor compression performance and flush the dictionary when it becomes poor or
unacceptable. Alternatively, the least-used dictionary entries can be tracked and replaced when
necessary.
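A minimal Python sketch of the encoder just described follows. It simply stops adding entries once the 512-word dictionary is full; the flushing and usage-tracking strategies mentioned above are omitted for brevity.

def lzw_encode(pixels, code_bits=9):
    dictionary = {(g,): g for g in range(256)}   # locations 0-255: the gray levels
    next_code = 256
    recognized = ()                              # the currently recognized sequence
    out = []
    for g in pixels:
        candidate = recognized + (g,)
        if candidate in dictionary:
            recognized = candidate               # keep extending the match
        else:
            out.append(dictionary[recognized])   # emit address of recognized sequence
            if next_code < 2 ** code_bits:       # add new sequence while room remains
                dictionary[candidate] = next_code
                next_code += 1
            recognized = (g,)                    # restart from the current pixel
    if recognized:
        out.append(dictionary[recognized])       # flush the final sequence
    return out

codes = lzw_encode([255, 255, 255, 255] * 4)     # e.g., 16 white pixels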
Bit-Plane Coding:
An effective technique for reducing an image's interpixel redundancies is to process the image's
bit planes individually. The technique, called bit-plane coding, is based on the concept of
decomposing a multilevel (monochrome or color) image into a series of binary images and
compressing each binary image via one of several well-known binary compression methods.
Bit-plane decomposition:
The gray levels of an m-bit gray-scale image can be represented in the form of the base-2
polynomial
am-1 2^(m-1) + am-2 2^(m-2) + ... + a1 2^1 + a0 2^0
Based on this property, a simple method of decomposing the image into a collection of binary
images is to separate the m coefficients of the polynomial into m 1-bit bit planes. The zeroth-
order bit plane is generated by collecting the a0 bits of each pixel, while the (m-1)st-order bit
plane contains the am-1 bits or coefficients. In general, each bit plane is numbered from 0 to m-1
and is constructed by setting its pixels equal to the values of the appropriate bits or polynomial
coefficients from each pixel in the original image. The inherent disadvantage of this approach is
that small changes in gray level can have a significant impact on the complexity of the bit planes.
If a pixel of intensity 127 (01111111) is adjacent to a pixel of intensity 128 (10000000), for
instance, every bit plane will contain a corresponding 0 to 1 (or 1 to 0) transition. For example,
as the most significant bits of the two binary codes for 127 and 128 are different, bit plane 7 will
contain a zero-valued pixel next to a pixel of value 1, creating a 0 to 1 (or 1 to 0) transition at
that point.
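For an 8-bit image held in a NumPy array, the decomposition can be sketched as follows:

import numpy as np

def bit_planes(image, m=8):
    # Plane i holds the a_i coefficient of each pixel's base-2 polynomial.
    img = np.asarray(image, dtype=np.uint8)
    return [(img >> i) & 1 for i in range(m)]    # plane 0 first, plane m-1 last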
An alternative decomposition approach, which reduces the effect of small gray-level variations, is
to first represent the image by an m-bit Gray code. The m-bit Gray code gm-1...g1g0 that
corresponds to the polynomial above can be computed from
gi = ai Ⓧ ai+1    0 ≤ i ≤ m-2
gm-1 = am-1
Here, Ⓧ denotes the exclusive OR operation. This code has the unique property that successive
code words differ in only one bit position. Thus, small changes in gray level are less likely to
affect all m bit planes. For instance, when gray levels 127 and 128 are adjacent, only the 7th bit
plane will contain a 0 to 1 transition, because the Gray codes that correspond to 127 and 128 are
01000000 and 11000000, respectively.
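The property is easy to verify: computed all at once, the Gray code of an integer a is simply a Ⓧ (a >> 1), as this short Python sketch shows.

def to_gray(a):
    return a ^ (a >> 1)              # g_i = a_i XOR a_(i+1), with g_(m-1) = a_(m-1)

print(format(to_gray(127), "08b"))   # 01000000
print(format(to_gray(128), "08b"))   # 11000000 -- differs only in bit plane 7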
Lossless Predictive Coding:
The error-free compression approach discussed here does not require decomposition of an image into a
collection of bit planes. The approach, commonly referred to as lossless predictive coding, is
based on eliminating the interpixel redundancies of closely spaced pixels by extracting and
coding only the new information in each pixel. The new information of a pixel is defined as the
difference between the actual and predicted value of that pixel.
The decoder of Fig. 8.1(b) reconstructs en from the received variable-length code words and
performs the inverse operation
fn = en + f^n
Various local, global, and adaptive methods can be used to generate f^n. In most cases, however,
the prediction is formed by a linear combination of m previous pixels. That is,
f^n = round [ Σ (i = 1 to m) αi fn-i ]
where m is the order of the linear predictor, round is a function used to denote the rounding or
nearest integer operation, and the αi, for i = 1,2,..., m are prediction coefficients. In raster scan
applications, the subscript n indexes the predictor outputs in accordance with their time of
occurrence. That is, fn, f^n, and en in the equations above could be replaced with the more explicit
notation f(t), f^(t), and e(t), where t represents time. In other cases, n is used as an index on the
spatial coordinates and/or frame number (in a time sequence of images) of an image. In 1-D
linear predictive coding, for example, the equation above can be written as
f^(x, y) = round [ Σ (i = 1 to m) αi f(x, y-i) ]
where each subscripted variable is now expressed explicitly as a function of spatial coordinates x
and y. The equation indicates that the 1-D linear prediction f^(x, y) is a function of the previous pixels
on the current line alone. In 2-D predictive coding, the prediction is a function of the previous
pixels in a left-to-right, top-to-bottom scan of an image. In the 3-D case, it is based on these
pixels and the previous pixels of preceding frames. The equation above cannot be evaluated for the
first m pixels of each line, so these pixels must be coded by using other means (such as a
Huffman code) and considered as an overhead of the predictive coding process. A similar
comment applies to the higher-dimensional cases.
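A minimal Python sketch of a 1-D lossless predictive encoder/decoder pair follows; the prediction order and coefficients are illustrative, and the first m pixels are carried unchanged as the overhead noted above.

def predictive_encode(row, alphas=(1.0,)):
    m = len(alphas)
    e = list(row[:m])                      # overhead: first m pixels sent as-is
    for n in range(m, len(row)):
        f_hat = round(sum(alphas[i] * row[n - 1 - i] for i in range(m)))
        e.append(row[n] - f_hat)           # new information: actual minus predicted
    return e

def predictive_decode(e, alphas=(1.0,)):
    m = len(alphas)
    f = list(e[:m])
    for n in range(m, len(e)):
        f_hat = round(sum(alphas[i] * f[n - 1 - i] for i in range(m)))
        f.append(e[n] + f_hat)             # inverse operation: f_n = e_n + f^_n
    return f

Because no quantizer intervenes, predictive_decode(predictive_encode(row)) reproduces row exactly.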
Lossy Predictive Coding:
In this type of coding, we add a quantizer to the lossless predictive model and examine the
resulting trade-off between reconstruction accuracy and compression performance. As Fig. 9
shows, the quantizer, which absorbs the nearest-integer function of the error-free encoder, is
inserted between the symbol encoder and the point at which the prediction error is formed. It
maps the prediction error into a limited range of outputs, denoted e˙n, which establish the amount
of compression and distortion associated with lossy predictive coding.
Fig. 9 A lossy predictive coding model: (a) encoder and (b) decoder.
In order to accommodate the insertion of the quantization step, the error-free encoder of figure
must be altered so that the predictions generated by the encoder and decoder are equivalent. As
Fig.9 (a) shows, this is accomplished by placing the lossy encoder's predictor within a feedback
loop, where its input, denoted f˙n, is generated as a function of past predictions and the
corresponding quantized errors. That is,
f˙n = e˙n + f^n
This closed-loop configuration prevents error buildup at the decoder's output. Note from Fig. 9(b)
that the output of the decoder is also given by the above equation.
Optimal predictors:
The optimal predictor used in most predictive coding applications minimizes the encoder's mean-
square prediction error
E{en^2} = E{[fn - f^n]^2}
subject to the constraints
f˙n = e˙n + f^n ≈ fn
and
f^n = Σ (i = 1 to m) αi fn-i
That is, the optimization criterion is chosen to minimize the mean-square prediction error, the
quantization error is assumed to be negligible (e˙n ≈ en), and the prediction is constrained to a
linear combination of m previous pixels. These restrictions are not essential, but they simplify
the analysis considerably and, at the same time, decrease the computational complexity of the
predictor. The resulting predictive coding approach is referred to as differential pulse code
modulation (DPCM).
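A minimal Python sketch of DPCM with a first-order (previous-pixel) predictor and a uniform quantizer illustrates the feedback loop; the step size is an illustrative parameter, not a prescribed value.

def dpcm_encode(row, step=4):
    f_dot = [row[0]]                       # first pixel carried as overhead
    e_quant = []
    for n in range(1, len(row)):
        f_hat = f_dot[-1]                  # predict from the *reconstructed* value
        e_hat = step * round((row[n] - f_hat) / step)   # quantizer: the only loss
        e_quant.append(e_hat)
        f_dot.append(f_hat + e_hat)        # f˙n = e˙n + f^n inside the loop
    return f_dot[0], e_quant

def dpcm_decode(first, e_quant):
    f = [first]
    for e_hat in e_quant:
        f.append(f[-1] + e_hat)            # identical reconstruction: no error buildup
    return f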
Transform Coding:
The predictive coding techniques discussed so far operate directly on the pixels of an image and
thus are spatial-domain methods. Here, we consider compression techniques that are based on
modifying the transform of an image. In transform coding, a reversible, linear transform (such as
the Fourier transform) is used to map the image into a set of transform coefficients, which are
then quantized and coded. For most natural images, a significant number of the coefficients have
small magnitudes and can be coarsely quantized (or discarded entirely) with little image
distortion. A variety of transformations, including the discrete Fourier transform (DFT), can be
used to transform the image data.
Figure 10 shows a typical transform coding system. The decoder implements the inverse
sequence of steps (with the exception of the quantization function) of the encoder, which
performs four relatively straightforward operations: subimage decomposition, transformation,
quantization, and coding. An N x N input image is first subdivided into subimages of size n x n,
which are then transformed to generate (N/n)^2 subimage transform arrays, each of size n x n.
The goal of the transformation process is to decorrelate the pixels of each subimage, or to pack
as much information as possible into the smallest number of transform coefficients. The
quantization stage then selectively eliminates or more coarsely quantizes the coefficients that
carry the least information. These coefficients have the smallest impact on reconstructed
subimage quality. The encoding process terminates by coding (normally using a variable-length
code) the quantized coefficients. Any or all of the transform encoding steps can be adapted to
local image content, called adaptive transform coding, or fixed for all subimages, called
nonadaptive transform coding.
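As an illustration, the following Python sketch (using SciPy's DCT routines, assumed available) performs the transformation and coarse-quantization steps on a single subimage by simply truncating all but the largest-magnitude coefficients; a complete coder would also quantize and variable-length code the retained values.

import numpy as np
from scipy.fft import dctn, idctn

def transform_code_block(block, keep=10):
    coeffs = dctn(block, norm="ortho")               # reversible linear transform
    threshold = np.sort(np.abs(coeffs).ravel())[-keep]
    coeffs[np.abs(coeffs) < threshold] = 0           # discard low-information coefficients
    return idctn(coeffs, norm="ortho")               # reconstructed subimage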
Wavelet Coding:
Wavelet coding is based on the idea that the coefficients of a transform that decorrelates the
pixels of an image can be coded more efficiently than the original pixels themselves. If the
transform's basis functions—in this case wavelets—pack most of the important visual
information into a small number of coefficients, the remaining coefficients can be quantized
coarsely or truncated to zero with little image distortion.
Since many of the computed coefficients carry little visual information, they can be quantized
and coded to minimize intercoefficient and coding redundancy. Moreover, the quantization can
be adapted to exploit any positional correlation across the P decomposition levels. One or more
of the lossless coding methods, including run-length, Huffman, arithmetic, and bit-plane coding,
can be incorporated into the final symbol coding step. Decoding is accomplished by inverting the
encoding operations—with the exception of quantization, which cannot be reversed exactly.
Because wavelet transforms are both computationally efficient and inherently local (i.e., their
basis functions are limited in duration), subdivision of the original image is unnecessary.
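The core truncation step can be sketched in Python with the third-party PyWavelets package (assumed available); note that, consistent with the remark above, the transform is applied to the whole image with no subimage decomposition.

import numpy as np
import pywt

def wavelet_truncate(image, levels=3, frac=0.05):
    coeffs = pywt.wavedec2(image, "haar", level=levels)     # P-level decomposition
    arr, slices = pywt.coeffs_to_array(coeffs)
    k = max(1, int(frac * arr.size))                        # retain ~5% of coefficients
    threshold = np.sort(np.abs(arr).ravel())[-k]
    arr[np.abs(arr) < threshold] = 0                        # truncate to zero: the lossy step
    coeffs = pywt.array_to_coeffs(arr, slices, output_format="wavedec2")
    return pywt.waverec2(coeffs, "haar")                    # decode by inverting the transform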