Digital Image Processing Fundamentals: There's More To It Than Meets The Eye
such that

\lim_{t \to a} s(t) = s(a)    (5.1)

We use the symbol R to denote the set of real numbers. Thus R = \{x : x is a real number\}, which says that R is the set of all x such that x is a real number. We read (5.1) as saying that, in the limit as t approaches a, where a is a member of the set of real numbers, s(t) = s(a). The expression \{x : P(x)\} is read as the set of all x such that P(x) is true [Moore 64].
This is an iff (i.e., if and only if) condition. Thus, the converse must also be true. That is,
s(t) is not continuous iff there exists a value, a, such that

\lim_{t \to a} s(t) \neq s(a)    (5.2)

is true.
For example, if s(t) has multiple values at a, then the limit does not exist at a.
The analog-to-digital conversion consists of a sampler and a quantizer. The quantization is
typically performed by dividing the signal into several uniform steps. This has the effect of
introducing quantization noise. Quantization noise is given, in dB, using
SNR \leq 6b + 4.8    (5.3)

where SNR is the signal-to-noise ratio in dB and b is the number of bits. To prove (5.3), we follow [Moore] and assume that the input signal ranges from -1 to 1 volts. That is,

s(t) \in \{x : x \in \mathbb{R} \text{ and } -1 \leq x \leq 1\}    (5.3a)
Note that the number of quantization intervals is 2^b. The least significant bit has a quantization size of V_{qe} = 2^{-b}. Following [Mitra], we obtain the bound on the size of the error with:

-V_{qe} \leq e \leq V_{qe}    (5.3b)
The variance of a random variable, X, is found by

\sigma_X^2 = \int_{-\infty}^{\infty} (x - \eta_X)^2 f_X(x)\,dx,

where f_X(x) is a probability density function and \eta_X is the mean. For a signal whose average is zero, the variance of the error bounded by (5.3b) is
\sigma_e^2 = \frac{1}{2V_{qe}} \int_{-V_{qe}}^{V_{qe}} V^2\,dV = \frac{V_{qe}^2}{3}    (5.3c)
The signal-to-noise ratio for the quantization power is
SNR b
b
( )
+ 10 3 2 20 2 10 3
2
log log log (5.3d)
Hence the range on the upper bound for the signal-to-quantization noise power is
SNR b + 6 4 8 . (5.3).
Q.E.D.
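As a numerical sanity check on the bound of (5.3), the following sketch (not from the text; the class and method names are ours) uniformly quantizes a full-range ramp signal to b bits and compares the measured SNR against 6b + 4.8:

```java
// Sketch: empirically compare measured quantization SNR with the
// 6b + 4.8 dB upper bound of (5.3).
public class QuantizerSnr {

    // Quantize x in [-1, 1] to 2^b uniform levels (mid-rise).
    static double quantize(double x, int b) {
        double step = 2.0 / (1 << b);
        return Math.min(1.0, (Math.floor(x / step) + 0.5) * step);
    }

    // Measured SNR in dB for a full-range ramp test signal of n samples.
    static double measuredSnrDb(int b, int n) {
        double sig = 0, err = 0;
        for (int i = 0; i < n; i++) {
            double x = -1.0 + 2.0 * i / (n - 1);
            double e = quantize(x, b) - x;
            sig += x * x;
            err += e * e;
        }
        return 10.0 * Math.log10(sig / err);
    }

    public static void main(String[] args) {
        for (int b = 4; b <= 10; b += 2)
            System.out.printf("b=%d measured=%.1f dB bound=%.1f dB%n",
                    b, measuredSnrDb(b, 100000), 6.0 * b + 4.8);
    }
}
```

Because the ramp's power (1/3) is below the unit peak power assumed in the proof, the measured SNR (about 6b dB) falls under the bound, as expected.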
In the above proof we assumed that uniform steps were used over a signal whose
average value is zero. In fact, a digitizer does not have to requantize an image so that
steps are uniform. An in-depth examination of the effects of non-linear quantization
on SNR is given in [Gersho]. Following Gersho, we generalize the result of (5.3),
defining the SNR as
SNR_{dB} = 20\log_{10}\frac{\sigma}{\langle e(x) \mid p(x) \rangle}    (5.3e)

where \sigma = standard deviation and \langle e \mid p \rangle is the mean-square distortion defined by the inner product between the square of the quantization error for value x and the probability of value x. The inner product between e and p is given by

\langle e \mid p \rangle = \int e(x)p(x)\,dx    (5.3f)

where

e(x) = (Q(x) - x)^2    (5.3g).
The inner product is an important tool in transform theory. We will expand our
discussion of the inner product when we touch upon the topic of sampling.
We define Q(x) as the quantized value for x. Maximizing SNR requires that we select
the quantizer to minimize (5.3f), given a priori knowledge of the PDF (if the PDF is
available). Recall that for an image, we compute the PMF (using the Histogram
class) as well as the CMF. As we shall see later, (5.3f) is minimized for k-level
thresholding (an intensity reduction to k colors) when the regions of the CMF are
divided into k sections. The color is then remapped into the center of each of the
CMF regions. Hence (5.3f) provides a mathematical basis for reducing the number of
colors in an image provided that the PDF is of zero mean (i.e, no DC offset) and has
even symmetry about zero, that is, p(-x) = p(x). Also, we assume that the quantizer has odd symmetry about zero, i.e., Q(-x) = -Q(x).
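The mean-square distortion of (5.3f) is easy to evaluate numerically. The sketch below (not from the text; the class, the 4-level quantizer and the uniform PDF are our illustrative assumptions) approximates the inner product for an odd-symmetric quantizer, as assumed above:

```java
// Sketch: numerically approximate the mean-square distortion <e|p>
// of (5.3f) for a hypothetical 4-level odd-symmetric quantizer Q(x)
// and a uniform PDF p(x) on [-1, 1].
public class Distortion {

    // A simple odd-symmetric 4-level quantizer on [-1, 1].
    static double q(double x) {
        if (x < -0.5) return -0.75;
        if (x <  0.0) return -0.25;
        if (x <  0.5) return  0.25;
        return 0.75;
    }

    // Uniform PDF on [-1, 1].
    static double p(double x) { return (x >= -1 && x <= 1) ? 0.5 : 0.0; }

    // <e|p> = integral of (Q(x) - x)^2 p(x) dx, by the midpoint rule.
    static double meanSquareDistortion(int n) {
        double sum = 0, dx = 2.0 / n;
        for (int i = 0; i < n; i++) {
            double x = -1.0 + (i + 0.5) * dx;
            double e = q(x) - x;   // quantization error, squared below per (5.3g)
            sum += e * e * p(x) * dx;
        }
        return sum;
    }

    public static void main(String[] args) {
        // For a uniform PDF and uniform steps of size 0.5, the
        // distortion is step^2/12 = 0.25/12.
        System.out.println(meanSquareDistortion(200000));
    }
}
```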
A simple zero-memory 4-point quantizer inputs 4 decision levels and outputs 4
corresponding values for input values that range within the 4 decision levels. When
the decision levels are placed into an array of double precision numbers, in Java (for
the 256 gray-scale values) we write:
public void thresh4(double d[]) {
    // Build a 256-entry look-up table mapping each gray value to one
    // of four output levels, using the decision levels in d[0..3].
    short lut[] = new short[256];
    for (int i = 0; i < lut.length; i++) {
        if (i < d[0]) lut[i] = 0;
        else if (i < d[1]) lut[i] = (short) d[0];
        else if (i < d[2]) lut[i] = (short) d[1];
        else if (i < d[3]) lut[i] = (short) d[2];
        else lut[i] = 255;
    }
    applyLut(lut);
}
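The look-up-table construction can be exercised on its own, without the image class that supplies applyLut. The following standalone sketch (our own class name; the decision levels are an arbitrary example) builds the same mapping so it can be inspected directly:

```java
// Sketch: standalone version of the 4-level LUT construction,
// without the applyLut dependency.
public class Lut4 {

    // Build a 256-entry LUT from four decision levels d[0..3].
    static short[] buildLut(double[] d) {
        short[] lut = new short[256];
        for (int i = 0; i < lut.length; i++) {
            if (i < d[0]) lut[i] = 0;
            else if (i < d[1]) lut[i] = (short) d[0];
            else if (i < d[2]) lut[i] = (short) d[1];
            else if (i < d[3]) lut[i] = (short) d[2];
            else lut[i] = 255;
        }
        return lut;
    }

    public static void main(String[] args) {
        // Uniform decision levels: inputs collapse to 0, 64, 128, 192 or 255.
        short[] lut = buildLut(new double[] {64, 128, 192, 255});
        System.out.println(lut[0] + " " + lut[100] + " " + lut[200] + " " + lut[255]);
    }
}
```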
We shall revisit quantization in Section 5.2.2.
Using the Java AWT's Image class, we have seen that 32 bits are used per pixel (red,
green, blue and alpha). Only 24 bits of these are used for color, however. Section 5.2.2
shows how this relates to the software of this book.
Recall also that the digitization process led to sampling an analog signal. Sampling a signal
alters the harmonic content (also known as the spectra) of the signal. Sampling a
continuous signal may be performed with a pre-filter and a switch. Fig. 5-2 shows a continuous function, f(x), being sampled at a frequency of f_s.

Fig. 5-2. Sampling System (an anti-aliasing filter followed by a switch driven at f_s)

The switch in Fig. 5-2 is like a binary amplifier that is turned on and off every 1/f_s seconds. It multiplies f(x) by an amplification factor of zero or one. Mathematically, sampling is expressed as a pulse train, p(x), multiplied by the input signal f(x), i.e., sampling is f(x)p(x).
To discuss the pulse train mathematically, we must introduce the notation for an impulse.
The unit impulse, or Dirac delta, is a generalized function that is defined by
\int_{-\epsilon}^{\epsilon} \delta(x)\,dx = \int_{-\infty}^{\infty} \delta(x)\,dx = 1    (5.4)

where \epsilon is arbitrarily small. The Dirac delta has unit area in a small neighborhood located at x = 0. Multiply the Dirac delta by a function and integrate, and it will sift out the value of the function at the point where the delta's argument is zero:

\int_{-\epsilon}^{\epsilon} f(x)\delta(x)\,dx = \int_{-\infty}^{\infty} f(x)\delta(x)\,dx = f(0)    (5.5)
This is called the sifting property of the Dirac delta. In fact, the Dirac delta is equal to
zero whenever its argument is non-zero. To make the Dirac activate, given a non-zero
argument, we bias the argument with an offset, \delta(x - x_{offset}). A pulse train is created by adding an infinite number of Dirac deltas together:

p(x) = \sum_{n=-\infty}^{\infty} \delta(x - n/f_s)    (5.6)

f(x)p(x) = f(x) \sum_{n=-\infty}^{\infty} \delta(x - n/f_s)    (5.7)
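The sifting property of (5.5) can be demonstrated numerically by standing in a narrow rectangular pulse of unit area for the Dirac delta. The sketch below is ours (not from the text) and assumes nothing beyond the definitions above:

```java
// Sketch: approximate delta(x) by a unit-area rectangular pulse of
// width w, and check the sifting property (5.5) numerically.
public class Sifting {

    // Narrow rectangular pulse of unit area: an approximation to delta(x).
    static double deltaApprox(double x, double w) {
        return (Math.abs(x) < w / 2) ? 1.0 / w : 0.0;
    }

    // Numerically integrate f(x) * deltaApprox(x - x0) over [x0-1, x0+1].
    static double sift(java.util.function.DoubleUnaryOperator f,
                       double x0, double w, int n) {
        double dx = 2.0 / n, sum = 0;
        for (int i = 0; i < n; i++) {
            double x = (x0 - 1) + (i + 0.5) * dx;
            sum += f.applyAsDouble(x) * deltaApprox(x - x0, w) * dx;
        }
        return sum;
    }

    public static void main(String[] args) {
        // As the pulse narrows, the integral approaches cos(0.3),
        // i.e., the value of f "sifted out" at x0 = 0.3.
        System.out.println(sift(Math::cos, 0.3, 1e-3, 2000000));
    }
}
```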
To find the spectra of (5.7) requires that we perform a Fourier transform. The
Fourier transform, just like any transform, performs a correlation between a function
and a kernel. The kernel of a transform typically consists of an orthogonal basis
about which the reconstruction of a waveform may occur. Two functions are
orthogonal if their inner product < f | g > =0. Recall that the inner product is given by
\langle f \mid g \rangle = \int f(x)g(x)\,dx \quad \text{(inner product)}    (5.7a)
From linear algebra, we recall that a collection of linearly independent functions
forms a basis if every value in the set of all possible values may be expressed as a
linear combination of the basis set. Functions are linearly independent iff no linear combination with coefficients that are not all zero sums to zero. Conversely, functions are linearly dependent iff there exists a set of coefficients, not all zero, for which the summation is zero. For example:

c_1\cos(x) + c_2\sin(x) = 0 \iff c_1 = c_2 = 0    (5.7b)
The ability to sum a series of sine and cosine functions together to create an arbitrary function is known as the superposition principle and applies only to periodic waveforms. This was discovered in the 1800s by Jean Baptiste Joseph Fourier [Halliday] and is expressed as a summation of sines and cosines, with constants that are called Fourier coefficients:

f(t) = \sum_{k=0}^{\infty} (a_k\cos kt + b_k\sin kt)    (5.7c)
We note that (5.7c) shows that the periodic signal has discrete spectral components.
We find the Fourier coefficients by taking the inner product of the function, f(t), with the basis functions, sine and cosine. That is:

a_k = \langle f \mid \cos(kt) \rangle
b_k = \langle f \mid \sin(kt) \rangle    (5.7d)
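The coefficient recipe of (5.7d) can be tried out directly. The sketch below (ours, not from the text) evaluates the inner product over one period by the midpoint rule; note that the usual 1/\pi normalization over [-\pi, \pi], which the text leaves implicit, is applied when a coefficient is recovered:

```java
// Sketch: recover a Fourier coefficient via the inner product of
// (5.7d), and check orthogonality of two basis functions.
public class FourierCoeffs {

    // Inner product <f|g> over one period [-pi, pi], midpoint rule.
    static double inner(java.util.function.DoubleUnaryOperator f,
                        java.util.function.DoubleUnaryOperator g, int n) {
        double dt = 2 * Math.PI / n, sum = 0;
        for (int i = 0; i < n; i++) {
            double t = -Math.PI + (i + 0.5) * dt;
            sum += f.applyAsDouble(t) * g.applyAsDouble(t) * dt;
        }
        return sum;
    }

    public static void main(String[] args) {
        java.util.function.DoubleUnaryOperator f =
                t -> 3 * Math.sin(2 * t) + 0.5 * Math.cos(t);
        // Recover b_2 = 3 via the inner product with sin(2t),
        // normalized by pi.
        System.out.println(inner(f, t -> Math.sin(2 * t), 100000) / Math.PI);
        // Distinct harmonics are orthogonal: <cos(t)|sin(2t)> = 0.
        System.out.println(inner(t -> Math.cos(t), t -> Math.sin(2 * t), 100000));
    }
}
```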
For an elementary introduction to linear algebra, see [Anton]. For a concise summary
see [Stollnitz]. For an alternative derivation see [Lyon and Rao].
It is also possible to approximate an aperiodic waveform. This is done with the
Fourier transform. The Fourier transform uses sine and cosine as the basis functions
to form the inner product, as seen in (5.7a):
F(u) = \int f(x)e^{-j2\pi ux}\,dx = \langle f \mid e^{j2\pi ux} \rangle    (5.8)
By Euler's identity,

e^{i\theta} = \cos\theta + i\sin\theta    (5.9)

we see that the sine and cosine basis functions are separated by being placed on the real and imaginary axes.
Substituting (5.7) into (5.8) yields

F(u) * P(u) = \int \left[ f(x) \sum_{n=-\infty}^{\infty} \delta(x - n/f_s) \right] e^{-j2\pi ux}\,dx    (5.10)

where

P(u) = \int \sum_{n=-\infty}^{\infty} \delta(x - n/f_s)\, e^{-j2\pi ux}\,dx    (5.11)
The term

F(u) * P(u) = \int F(\lambda)P(u - \lambda)\,d\lambda    (5.12)
defines a convolution. We can write (5.10) because multiplication in the time domain
is equivalent to convolution in the frequency domain. This is known as the
convolution theorem. Taking the Fourier transform of the convolution between two
functions in the time domain results in
\langle f * p \mid e^{j2\pi ux} \rangle = \left\langle \int f(\lambda)p(x - \lambda)\,d\lambda \;\Big|\; e^{j2\pi ux} \right\rangle    (5.13)

which is expanded by (5.8) to yield:

\int \left[ \int f(\lambda)p(x - \lambda)\,d\lambda \right] e^{-j2\pi ux}\,dx    (5.13a)

Changing the order of integration in (5.13a) yields

\int f(\lambda) \left[ \int p(x - \lambda)e^{-j2\pi ux}\,dx \right] d\lambda    (5.13b)

With

P(u) = \int p(x)e^{-j2\pi ux}\,dx    (5.13c)

and

F(u) = \int f(x)e^{-j2\pi ux}\,dx    (5.13d)

the inner integral of (5.13b) is e^{-j2\pi u\lambda}P(u) (by the shift property), so that (5.13b) becomes P(u)\int f(\lambda)e^{-j2\pi u\lambda}\,d\lambda and we get

F(u)P(u) = \int [f(x) * p(x)]\, e^{-j2\pi ux}\,dx    (5.14)
This shows that convolution in the time domain is multiplication in the frequency
domain. We can also show that convolution in the frequency domain is equal to
multiplication in the time domain. See [Carlson] for an alternative proof.
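A discrete version of the convolution theorem is easy to verify with a naive DFT. The sketch below (ours, not from the text) checks that the DFT of a circular convolution equals the pointwise product of the DFTs:

```java
// Sketch: discrete check of the convolution theorem with a naive DFT.
public class ConvTheorem {

    // Real and imaginary parts of the DFT of x at bin k.
    static double[] dft(double[] x, int k) {
        double re = 0, im = 0;
        for (int n = 0; n < x.length; n++) {
            double ang = -2 * Math.PI * k * n / x.length;
            re += x[n] * Math.cos(ang);
            im += x[n] * Math.sin(ang);
        }
        return new double[] { re, im };
    }

    // Circular convolution of two equal-length sequences.
    static double[] circConv(double[] f, double[] p) {
        int N = f.length;
        double[] y = new double[N];
        for (int n = 0; n < N; n++)
            for (int m = 0; m < N; m++)
                y[n] += f[m] * p[(n - m + N) % N];
        return y;
    }

    // Largest mismatch between DFT(f (*) p) and DFT(f) x DFT(p) at bin 1.
    static double mismatch() {
        double[] f = { 1, 2, 3, 4 }, p = { 0, 1, 0, 0 };
        double[] lhs = dft(circConv(f, p), 1);
        double[] F = dft(f, 1), P = dft(p, 1);
        double re = F[0] * P[0] - F[1] * P[1];   // complex product F(1)P(1)
        double im = F[0] * P[1] + F[1] * P[0];
        return Math.max(Math.abs(lhs[0] - re), Math.abs(lhs[1] - im));
    }

    public static void main(String[] args) {
        System.out.println(mismatch());   // near zero
    }
}
```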
As a result of the convolution theorem, the Fourier transform of an impulse train is also an impulse train,

F(u) * P(u) = F(u) * f_s \sum_{n=-\infty}^{\infty} \delta(u - nf_s)    (5.15)

Finally, we see that sampling a signal at a rate of f_s causes the spectrum to be reproduced at f_s intervals:

F(u) * P(u) = f_s \sum_{n=-\infty}^{\infty} F(u - nf_s)    (5.16)
(5.16) demonstrates the reason why a band limiting filter is needed before the
switching function of Fig. 5-2. This leads directly to the sampling theorem which
states that a band limited signal may be reconstructed without error if the sample rate
is twice the bandwidth. Such a sample rate is called the Nyquist rate and is given by
f_s = 2B Hz.
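Sampling below the Nyquist rate makes distinct frequencies indistinguishable. The sketch below (ours, not from the text) shows that a 9 kHz tone sampled at 8 kHz produces exactly the same samples as a 1 kHz tone, which is why the band-limiting filter must precede the switch:

```java
// Sketch: aliasing -- two tones separated by the sample rate f_s
// yield identical samples.
public class Aliasing {

    // Max difference between samples of sin at f1 and at f1 - fs.
    static double sampleGap(double f1, double fs, int n) {
        double worst = 0;
        for (int i = 0; i < n; i++) {
            double t = i / fs;   // sample instants, spaced 1/fs
            worst = Math.max(worst, Math.abs(
                    Math.sin(2 * Math.PI * f1 * t)
                  - Math.sin(2 * Math.PI * (f1 - fs) * t)));
        }
        return worst;
    }

    public static void main(String[] args) {
        // A 9 kHz tone sampled at 8 kHz is indistinguishable from 1 kHz.
        System.out.println(sampleGap(9000.0, 8000.0, 1000));
    }
}
```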
5. 2. 2. Image Digitization
Typically, a camera is used to digitize an image. Modern CCD cameras have photodiodes arranged in a rectangular array. Flat-bed scanners use a movable platen and a linear array of photodiodes to perform the two-dimensional digitization.
Older tube type cameras used a wide variety of materials on a photosensitive surface. The
materials vary in sensitivity and output. See [Galbiati] for a more detailed description of tube cameras.
The key point about digitizing an image in two dimensions is that we are able to detect both
the power of the incident energy as well as the direction.
The process of digitizing an image is characterized by the amount of spatial resolution and the signal-to-noise ratio (i.e., number of bits per pixel) that the digitizer has. Often the number of bits per pixel is limited by performing a thresholding. Thresholding (a topic treated more
thoroughly in Chap. 10) reduces the number of color values available in an image. This
simulates the effect of having fewer bits per pixel available for display. There are several
techniques available for thresholding. For the grayscale image, one may use the cumulative
mass function for the probability of a gray value to create a new look-up table. Another
approach is simply to divide the look-up table into uniform sections. Fig. 5-3 shows the mandrill before and after the thresholding operation. The decision about when to increment the color value was made based on the CMF of the image. The numbers of bits per pixel (bpp), shown in Fig. 5-3, ranging from left to right, top to bottom, are: 1 bpp, 2 bpp, 3 bpp and 8
bpp. Keep in mind that at a bit rate of 28 kbps (the rate of a modest Internet connection
over a phone line) the 8 bpp image (128x128) will take 4 seconds to download. Compare
this to the uncompressed 1 bpp image which will take 0.5 seconds to download. Also note
that the signal-to-noise ratio for these images ranges from 10 dB to 52 dB.
Fig. 5-3. Quantizing with Fewer Bits Per Pixel
The code snippet allows the cumulative mass function of the image to bias decisions about
when to increment the color value. The input to the code is the number of gray values, k.
There are several methods to perform the quantization. The one shown in Fig. 5-3 is useful
in edge detection (a topic covered in Chap. 10). The kgreyThresh method follows:
public void kgreyThresh(double k) {
    // Build a look-up table with k gray levels, placing the thresholds
    // where the cumulative mass function (CMF) crosses each 1/k quantile.
    Histogram rh = new Histogram(r, "red");
    double cmf[] = rh.getCMF();
    TransformTable tt = new TransformTable(cmf.length);
    short lut[] = tt.getLut();
    int q = 1;                       // index of the next quantile
    short v = 0;                     // current output gray value
    short dv = (short) (255 / k);    // step between output levels
    for (int i = 0; i < lut.length; i++) {
        if (cmf[i] > q / k) {
            v += dv;
            q++;
            if (q == k) v = 255;     // saturate the last level
        }
        lut[i] = v;
    }
    tt.setLut(lut);
    tt.clip();
    tt.print();
    applyLut(lut);
}
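The CMF-driven part of kgreyThresh can be factored out and run without the Histogram and TransformTable classes. The sketch below is ours; the linear CMF in main stands in for the CMF of an image with a flat histogram:

```java
// Sketch: standalone CMF-driven k-level look-up-table construction,
// mirroring the loop in kgreyThresh.
public class KLevelLut {

    static short[] buildLut(double[] cmf, double k) {
        short[] lut = new short[cmf.length];
        int q = 1;                       // index of the next quantile
        short v = 0;                     // current output gray value
        short dv = (short) (255 / k);    // step between output levels
        for (int i = 0; i < cmf.length; i++) {
            if (cmf[i] > q / k) {        // CMF crossed the next 1/k quantile
                v += dv;
                q++;
                if (q == k) v = 255;     // saturate the last level
            }
            lut[i] = v;
        }
        return lut;
    }

    public static void main(String[] args) {
        // A linear CMF (flat histogram) gives uniformly spaced steps.
        double[] cmf = new double[256];
        for (int i = 0; i < 256; i++) cmf[i] = (i + 1) / 256.0;
        short[] lut = buildLut(cmf, 4.0);
        System.out.println(lut[0] + " " + lut[100] + " " + lut[255]);
    }
}
```

For an image whose histogram is concentrated in a few gray values, the CMF crossings (and hence the thresholds) shift toward those values, which is the point of biasing the decisions by the CMF.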
5. 2. 3. Image Display
One display device that has come into common use is the cathode-ray tube (CRT). The
cathode ray tube displays an image using three additive colors: red, green and blue. These
colors are emitted using phosphors that are stimulated with a flow of electrons. Different
phosphors have different colors (spectral radiance).
There are three kinds of television systems in the world today: NTSC, PAL and SECAM. NTSC, which stands for National Television System Committee, is used in North America and Japan. PAL stands for phase alternating line and is used in parts of Europe, Asia, South America and Africa. SECAM stands for Séquentiel Couleur à Mémoire (sequential chrominance signal and memory) and is used in France, Eastern Europe and Russia.
The gamut of colors and the reference color known as white (called white balance) are
different on each of the systems.
Another type of display in common use is the computer monitor.
Factors that afflict all displays include: ambient light, brightness (black level) and contrast
(picture). There are also phosphor chromaticity differences between different CRTs. These
alter the color gamut that may be displayed.
Manufacturers' products are sometimes adopted as a standard for the color gamut to be displayed by all monitors. For example, one U.S. manufacturer, Conrac, had a phosphor
that was adopted by SMPTE (Society of Motion Picture and Television Engineers) as the
basis for the SMPTE C phosphors.
The CRTs have a transfer function like that of (4.14), assuming the value v ranges from zero to one:

f(v) = v^{\gamma}    (5.17)

Typically, the exponent is termed the gamma of a monitor and has a value of about 2.2 [Blinn].
As Blinn points out, for a gamma of 2, only 194 values appear in a look-up table of
256 values. His suggestion that 16 bits per color might be enough to perform image
processing has been taken to heart, and this becomes another compelling reason to
use the Java short for storing image values. Thus, the image processing software in
this book does all its image processing as if intensity were linearly related to the value
of a pixel. With the storage of 48 bits per pixel (for red, green and blue) versus the
Java AWT model of 24 bits per red, green and blue value, we have increased our
signal-to-noise ratio for our image representation by 48 dB per color. So far, we have
not made good use of this extra bandwidth, but it is nice to know that it is there if we
need it.
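Blinn's observation about lost codes can be checked directly. The sketch below (ours, not from the text; the exact count depends on the rounding convention, so we only show that many of the 256 codes go unused) builds an 8-bit gamma look-up table and counts its distinct entries:

```java
// Sketch: count the distinct entries in an 8-bit look-up table for a
// power-law (gamma) transfer function, illustrating why 8 bits per
// color wastes codes under gamma mapping.
public class GammaLut {

    static int distinctValues(double gamma) {
        boolean[] seen = new boolean[256];
        int count = 0;
        for (int i = 0; i < 256; i++) {
            int v = (int) Math.round(255.0 * Math.pow(i / 255.0, gamma));
            if (!seen[v]) { seen[v] = true; count++; }
        }
        return count;
    }

    public static void main(String[] args) {
        // Well under 256 distinct values survive for gamma = 2.
        System.out.println(distinctValues(2.0));
    }
}
```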