DIP Unit-I
Introduction
The ability to see is one of the truly remarkable characteristics of living beings. It enables them to
perceive and assimilate in a short span of time an incredible amount of knowledge about the world
around them. The scope and variety of that which can pass through the eye and be interpreted by the
brain is nothing short of astounding.
It is thus with some degree of trepidation that we introduce the concept of visual information, because
in the broadest sense, the overall significance of the term is overwhelming. Instead of taking into
account all of the ramifications of visual information, the first restriction we shall impose is that of finite
image size. In other words, the viewer receives his or her visual information as if looking through a
rectangular window of finite dimensions. This assumption is usually necessary in dealing with real-world
systems such as cameras, microscopes and telescopes; they all have finite fields of view
and can handle only finite amounts of information.
The second assumption we make is that the viewer is incapable of depth perception on his own. That
is, in the scene being viewed he cannot tell how far away objects are by the normal use of binocular
vision or by changing the focus of his eyes.
This scenario may seem a bit dismal. But in reality, this model describes an overwhelming proportion
of systems that handle visual information, including television, photographs, X-rays, etc.
In this setup, the visual information is determined completely by the wavelengths and amplitudes of the light
that passes through each point of the window and reaches the viewer's eye. If the world outside were to
be removed and a projector installed that reproduced exactly the light distribution on the window, the
viewer inside would not be able to tell the difference.
Thus, the problem of numerically representing visual information is reduced to that of representing the
distribution of light energy and wavelengths on the finite area of the window. We assume that the image
perceived is "monochromatic" and static. It is determined completely by the perceived light energy
(weighted sum of energy at perceivable wavelengths) passing through each point on the window and
reaching the viewer's eye. If we impose Cartesian coordinates on the window, we can represent the
perceived light energy or "intensity" at a point (x, y) by a(x, y). Thus a(x, y) represents the
monochromatic visual information or "image" at the instant of time under consideration. As images that
occur in real-life situations cannot be exactly specified with a finite amount of numerical data, an
approximation of a(x, y) must be made if it is to be dealt with by practical systems. Since number
bases can be changed without loss of information, we may assume a(x, y) to be represented by
binary digital data. In this form the data is most suitable for several applications such as transmission
via digital communications facilities, storage within digital memory media, or processing by computer.
The 2D continuous image a(x, y) is divided into N rows and M columns. The intersection of a row
and a column is termed a pixel. The value assigned to the integer
coordinates [m, n], with {m = 0, 1, ..., M - 1} and {n = 0, 1, ..., N - 1}, is a[m, n]. In fact, in most
cases a(x, y), which we might consider to be the physical signal that impinges on the face of a 2D
sensor, is actually a function of many variables including depth (z), color (λ), and time (t). Unless
otherwise stated, we will consider the case of 2D, monochromatic, static images in this module.
The image shown in Figure (1.1) has been divided into N rows and M columns. The value assigned
to every pixel is the average brightness in the pixel rounded to the nearest integer value. The process
of representing the amplitude of the 2D signal at a given coordinate as an integer value with L different
gray levels is usually referred to as amplitude quantization or simply quantization.
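As an illustration of quantization, the short sketch below (in Python with numpy; the 16 x 16 size and L = 256 are arbitrary choices for illustration, not values from the text) maps a continuous-valued image onto L integer gray levels by rounding:

import numpy as np

# Synthetic "continuous" image with brightnesses in [0.0, 1.0];
# in practice this would come from a sensor or an image file.
rng = np.random.default_rng(0)
analog = rng.random((16, 16))        # N = 16 rows, M = 16 columns (arbitrary)

L = 256                              # number of gray levels (B = 8 bits)
# Round each sample to the nearest of the L equally spaced levels 0 .. L-1.
quantized = np.clip(np.round(analog * (L - 1)), 0, L - 1).astype(np.uint8)

print(quantized.shape, quantized.min(), quantized.max())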
Common values
There are standard values for the various parameters encountered in digital image processing. These
values can be caused by video standards, by algorithmic requirements, or by the desire to keep digital
circuitry simple. Table 1 gives some common values.

Table 1: Common values of digital image parameters

Parameter     Symbol    Typical values
Rows          N         256, 512, 525, 625, 1024, 1035
Columns       M         256, 512, 768, 1024, 1320
Gray Levels   L         2, 64, 256, 1024, 4096, 16384
Quite frequently we see cases of M = N = 2^K, where K is an integer (e.g. K = 8, 9, 10). This can be motivated by digital
circuitry or by the use of certain algorithms such as the (fast) Fourier transform.
The number of distinct gray levels is usually a power of 2, that is, L = 2^B, where B is the number of bits
in the binary representation of the brightness levels. When B > 1 we speak of a gray-level image;
when B = 1 we speak of a binary image. In a binary image there are just two gray levels, which can be
referred to, for example, as "black" and "white" or "0" and "1".
Suppose that a continuous image a(x, y) is approximated by equally spaced samples arranged in the
form of an N x N array as:

           [ a(0,0)      a(0,1)     ...   a(0,N-1)   ]
a(x, y) ≈  [ a(1,0)      a(1,1)     ...   a(1,N-1)   ]        (1)
           [   ...         ...      ...     ...      ]
           [ a(N-1,0)    a(N-1,1)   ...   a(N-1,N-1) ]

Each element of the array, referred to as a "pixel", is a discrete quantity. The array represents a digital
image.
The above digitization requires a decision to be made on a value for N as well as on the number of
discrete gray levels allowed for each pixel.
It is common practice in digital image processing to let N = 2^n and L = number of gray levels = 2^B. It
is assumed that the discrete levels are equally spaced between 0 and L - 1 in the gray scale.
A reasonable question to ask at this point is: how many samples and gray levels are required for a good
approximation? This brings up the question of resolution. The resolution (i.e. the degree of discernible
detail) of an image is strongly dependent on both N and the number of gray levels L. The more these parameters are increased,
the closer the digitized array will approximate the original image.
Unfortunately, this leads to large storage requirements, and the processing requirements likewise increase rapidly
as functions of N and L.
There is a variety of ways to classify and characterize image operations. The reason for doing so is to
understand what type of results we might expect to achieve with a given type of operation or what
might be the computational burden associated with a given operation.
Types of operations
The types of operations that can be applied to digital images to transform an input image a[m, n] into
an output image b[m, n] (or another representation) can be classified into three categories as shown in
Table 2.
Table 2: Types of image operations. Image size = N x N; neighborhood size = P x P. Note that the
complexity is specified in operations per pixel.

Operation   Characterization                                                        Complexity/pixel
Point       the output value at a specific coordinate depends only on the input
            value at that same coordinate                                           constant
Local       the output value at a specific coordinate depends on the input
            values in a neighborhood of that same coordinate                        P^2
Global      the output value at a specific coordinate depends on all the values
            in the input image                                                      N^2
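A minimal sketch of the three categories follows (assuming numpy and scipy are available; the negation, the 3 x 3 mean filter, and the Fourier transform are arbitrary stand-ins for a point, a local, and a global operation):

import numpy as np
from scipy import ndimage

a = np.random.default_rng(1).integers(0, 256, size=(64, 64)).astype(float)

# Point operation: each output pixel depends only on the input pixel at the
# same coordinate (here: image negation).
b_point = 255.0 - a

# Local operation: each output pixel depends on a P x P neighborhood
# (here: a 3 x 3 mean filter, so P = 3 and the cost is ~P^2 per pixel).
b_local = ndimage.uniform_filter(a, size=3)

# Global operation: each output pixel depends on every input pixel
# (here: the 2D discrete Fourier transform).
b_global = np.fft.fft2(a)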
Types of neighborhoods
Neighborhood operations play a key role in modern digital image processing. It is therefore important
to understand how images can be sampled and how that relates to the various neighborhoods that can
be used to process an image.
Rectangular sampling - In most cases, images are sampled by laying a rectangular grid over an image
as illustrated in Figure(1.1). This results in the type of sampling shown in Figure(1.3ab). Hexagonal
sampling-An alternative sampling scheme is shown in Figure (1.3c) and is termed hexagonal sampling.
Both sampling schemes have been studied extensively and both represent a possible periodic tiling of
the continuous image space. However, rectangular sampling, due to hardware and software
considerations, remains the method of choice. Local operations produce an output pixel value
b[m = m0, n = n0] based upon the pixel values in the neighborhood of a[m = m0, n = n0]. Some of the most common neighborhoods are the 4-connected
neighborhood and the 8-connected neighborhood in the case of rectangular sampling, and the 6-
connected neighborhood in the case of hexagonal sampling, illustrated in Figure (1.3).
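As a concrete illustration, the helper below (a Python/numpy sketch; the function name and the boundary handling are my own choices) lists the 4-connected and 8-connected neighbor offsets on a rectangular grid and gathers the neighbor values of a given pixel:

import numpy as np

# Offsets (dm, dn) defining the common neighborhoods on a rectangular grid.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]                     # 4-connected
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]              # 8-connected

def neighbors(image, m, n, offsets):
    """Values of the neighbors of pixel (m, n) that fall inside the image."""
    rows, cols = image.shape
    return [image[m + dm, n + dn]
            for dm, dn in offsets
            if 0 <= m + dm < rows and 0 <= n + dn < cols]

a = np.arange(25).reshape(5, 5)
print(neighbors(a, 2, 2, N4))   # [7, 17, 11, 13]
print(neighbors(a, 0, 0, N8))   # corner pixel: only 3 neighbors exist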
In an interlaced image the odd-numbered lines (1, 3, 5, ...) are scanned in half of the allotted time (e.g. 20
ms in PAL) and the even-numbered lines (2, 4, 6, ...) are scanned in the remaining half. The image
display must be coordinated with this scanning format. The reason for interlacing the scan lines of a
video image is to reduce the perception of flicker in a displayed image. If one is planning to use
images that have been scanned from an interlaced video source, it is important to know if the two
half-images have been appropriately "shuffled" by the digitization hardware or if that should be
implemented in software. Further, the analysis of moving objects requires special care with interlaced
video to avoid 'Zigzag' edges.
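If the two half-images have to be combined in software, a simple "weave" can be sketched as follows (Python/numpy; the function name and argument layout are assumptions for illustration):

import numpy as np

def weave_fields(odd_field, even_field):
    """Interleave two half-images into one full frame.

    odd_field holds the odd-numbered scan lines (1, 3, 5, ...) and even_field
    the even-numbered lines (2, 4, 6, ...); each has half the vertical resolution.
    """
    rows = odd_field.shape[0] + even_field.shape[0]
    frame = np.empty((rows, odd_field.shape[1]), dtype=odd_field.dtype)
    frame[0::2] = odd_field    # lines 1, 3, 5, ... (rows 0, 2, 4, ... counting from 0)
    frame[1::2] = even_field   # lines 2, 4, 6, ...
    return frame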
Tools
Certain tools are central to the processing of digital images. These include mathematical tools such
as convolution, Fourier analysis, and statistical descriptions, and manipulative tools such as chain
codes and run codes. We will present these tools without any specific motivation. The motivation will
follow in later sections.
2D Convolution
There are several possible notations to indicate the convolution of two (multi-dimensional)
signals to produce an output signal. The most common is c = a ⊛ b.

In 2D discrete space:

c[m, n] = a[m, n] ⊛ b[m, n] = Σ_j Σ_k a[j, k] · b[m - j, n - k]

where the sums run over all integers j and k.

Properties of Convolution
Convolution is associative: a ⊛ (b ⊛ c) = (a ⊛ b) ⊛ c.
Convolution is distributive: a ⊛ (b + c) = (a ⊛ b) + (a ⊛ c).
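A direct (and deliberately unoptimized) implementation of the 2D discrete convolution sum might look like the sketch below; the "full" output size and the Laplacian-style test kernel are my own choices:

import numpy as np

def conv2d(a, b):
    """Direct 2D discrete convolution:
    c[m, n] = sum_j sum_k a[j, k] * b[m - j, n - k]  ('full' output size)."""
    Ma, Na = a.shape
    Mb, Nb = b.shape
    c = np.zeros((Ma + Mb - 1, Na + Nb - 1), dtype=np.result_type(a, b))
    for j in range(Ma):
        for k in range(Na):
            # The sample a[j, k] contributes a shifted, scaled copy of b.
            c[j:j + Mb, k:k + Nb] += a[j, k] * b
    return c

a = np.arange(9.0).reshape(3, 3)
kernel = np.array([[0.0,  1.0, 0.0],
                   [1.0, -4.0, 1.0],
                   [0.0,  1.0, 0.0]])
print(conv2d(a, kernel))       # matches scipy.signal.convolve2d(a, kernel)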
2D Fourier Transforms
Because of Euler's formula, e^(jq) = cos q + j sin q with j² = -1, we can say that the Fourier transform produces a representation of a (2D)
signal as a weighted sum of sines and cosines. The defining formulas for the forward Fourier
and the inverse Fourier transforms are as follows. Given an image a and its Fourier transform
A, the forward transform goes from the spatial domain (either continuous or discrete) to
the frequency domain, which is always continuous.
The inverse Fourier transform goes from the frequency domain back to the spatial domain.
The specific formulas for transforming back and forth between the spatial domain and the
frequency domain are given below.

In 2D continuous space:

Forward:  A(u, v) = ∫∫ a(x, y) e^(-j(ux + vy)) dx dy
Inverse:  a(x, y) = (1/4π²) ∫∫ A(u, v) e^(+j(ux + vy)) du dv

In 2D discrete space:

Forward:  A(Ω, Ψ) = Σ_m Σ_n a[m, n] e^(-j(Ωm + Ψn))
Inverse:  a[m, n] = (1/4π²) ∫∫ A(Ω, Ψ) e^(+j(Ωm + Ψn)) dΩ dΨ   (integrals from -π to +π)
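In practice the discrete transform is computed with an FFT. The following sketch (numpy; the 8 x 8 test image is arbitrary) shows the forward transform, its magnitude and phase, and the round trip through the inverse transform:

import numpy as np

a = np.random.default_rng(2).random((8, 8))

A = np.fft.fft2(a)            # forward transform (samples of A(Omega, Psi))
magnitude = np.abs(A)         # |A|
phase = np.angle(A)           # phi

a_back = np.fft.ifft2(A).real # inverse transform recovers the image
assert np.allclose(a, a_back)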
There are a variety of properties associated with the Fourier transform and the inverse Fourier
transform. The following are some of the most relevant for digital image processing.
* The Fourier transform is, in general, a complex function of the real frequency variables. As such, the
transform can be written in terms of its magnitude and phase: A(u, v) = |A(u, v)| e^(jφ(u, v)).
* A 2D signal can also be complex and thus written in terms of its magnitude and phase.
* If a 2D signal is real, then its Fourier transform has certain symmetries: A(u, v) = A*(-u, -v).
The symbol (*) indicates complex conjugation. For real signals this leads directly to
|A(u, v)| = |A(-u, -v)| and φ(u, v) = -φ(-u, -v).
* If a 2D signal is real and even, then the Fourier transform is real and even
* The Fourier and the inverse Fourier transforms are linear operations:
F{w1·a + w2·b} = w1·F{a} + w2·F{b},
where a and b are 2D signals (images) and w1 and w2 are arbitrary, complex constants.
* The Fourier transform in discrete space, A(Ω, Ψ), is periodic in both Ω and Ψ. Both periods are 2π:
A(Ω + 2πj, Ψ + 2πk) = A(Ω, Ψ), where j and k are integers.
The definition indicates that the Fourier transform of an image can be complex. This is illustrated below in Figure (1.4a-c).
Figure (1.4a) shows the original image a[m, n], Figure (1.4b) the magnitude |A(Ω, Ψ)| in a scaled form,
and Figure (1.4c) the phase φ(Ω, Ψ).
Both the magnitude and the phase functions are necessary for the complete reconstruction of an image from its Fourier
transform. Figure(1. 5a) shows what happens when Figure (1.4a) is restored solely on the basis of the magnitude
information and Figure (1.5b) shows what happens when Figure (1.4a) is restored solely on the basis of the phase
information.
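The effect can be reproduced numerically with the sketch below (numpy; a random array stands in for the image of Figure (1.4a)): the magnitude-only reconstruction discards the phase, and the phase-only reconstruction forces the magnitude to one:

import numpy as np

a = np.random.default_rng(3).random((64, 64))   # stand-in for the image in Figure (1.4a)
A = np.fft.fft2(a)

# Reconstruction from magnitude only: the phase is set to zero.
mag_only = np.fft.ifft2(np.abs(A)).real

# Reconstruction from phase only: the magnitude is set to one everywhere.
phase_only = np.fft.ifft2(np.exp(1j * np.angle(A))).real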
An arbitrary 2D signal a(x, y) can always be written in a polar coordinate system as a(r, θ). When the 2D signal
exhibits circular symmetry this means that:

a(x, y) = a(r, θ) = a(r),

where r² = x² + y² and tan θ = y/x. As a number of physical systems such as lenses exhibit circular symmetry,
it is useful to be able to compute an appropriate Fourier representation.
The Fourier transform A(u, v) can be written in polar coordinates A(q, ξ) and then, for a circularly symmetric
signal, rewritten as a Hankel transform:

A(q) = 2π ∫_0^∞ a(r) J_0(qr) r dr          (1.2)

where q² = u² + v² and J_0(·) is a Bessel function of the first kind of order zero.
The Fourier transform of a circularly symmetric 2D signal is a function of only the radial frequency
q. The dependence on the angular frequency ξ has vanished. Further, if a(r) is real, then it is
automatically even due to the circular symmetry. According to eq. (1.2), A(q) will then be real and even.
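As a check of eq. (1.2), the sketch below (numpy plus scipy.special.j0; the Gaussian test signal, integration limit and step count are my own choices) evaluates the Hankel integral numerically for a circularly symmetric Gaussian, whose transform is known in closed form:

import numpy as np
from scipy.special import j0   # Bessel function of the first kind, order zero

def hankel_transform(a_of_r, q, r_max=20.0, num=4000):
    """Numerically evaluate A(q) = 2*pi * integral_0^inf a(r) J0(q*r) r dr."""
    r = np.linspace(0.0, r_max, num)
    f = a_of_r(r) * j0(q * r) * r
    dr = r[1] - r[0]
    return 2.0 * np.pi * np.sum((f[:-1] + f[1:]) / 2.0) * dr   # trapezoidal rule

# For a(r) = exp(-r^2 / 2) the transform is A(q) = 2*pi * exp(-q^2 / 2).
for q in (0.0, 1.0, 2.0):
    numeric = hankel_transform(lambda r: np.exp(-r ** 2 / 2.0), q)
    exact = 2.0 * np.pi * np.exp(-q ** 2 / 2.0)
    print(f"q={q}: numeric={numeric:.4f}, exact={exact:.4f}")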
Statistics
In image processing it is quite common to use simple statistical descriptions of images and sub-images. The notion
of a statistic is intimately connected to the concept of a probability distribution, generally the distribution of signal
amplitudes. For a given region (which could conceivably be an entire image) we can define the probability distribution
function P(a) of the brightnesses in that region and the probability density function p(a) of the brightnesses in that region. We will
assume in the discussion that follows that we are dealing with a digitized image a[m, n].
The probability distribution function, P(a), is the probability that a brightness chosen from the region is less than or
equal to a given brightness value a. As a increases from -∞ to +∞, P(a) increases from 0 to 1. P(a) is monotonic,
non-decreasing in a, and thus dP/da ≥ 0.
Probability density function of the brightnesses
The probability that a brightness in a region falls between a and a + Δa, given the probability distribution
function P(a), can be expressed as p(a)Δa, where p(a) is the probability density function:

p(a) = dP(a)/da
The brightness probability distribution function for the image is shown in Figure (1.6a). The (unnormalized) brightness
histogram h(a), which is proportional to the estimated brightness probability density function, is shown in Figure (1.6b). The
height in this histogram corresponds to the number of pixels with a given brightness.
Figure (1.6): (a) Brightness distribution function of Figure (1.4a) with minimum, median, and maximum indicated;
(b) brightness histogram of Figure (1.4a).
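For a quantized image both functions are easily estimated from the histogram, as in the sketch below (numpy; the random 8-bit region is a stand-in for a real image region):

import numpy as np

region = np.random.default_rng(4).integers(0, 256, size=(32, 32))   # stand-in 8-bit region

# Unnormalized brightness histogram h(a): number of pixels with brightness a.
h = np.bincount(region.ravel(), minlength=256)

# Estimated density p(a) and distribution function P(a).
p = h / region.size       # p(a) ~ h(a) / N
P = np.cumsum(p)          # P(a) = probability of brightness <= a

assert np.isclose(P[-1], 1.0)   # P(a) rises monotonically from 0 to 1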
Both the distribution function and the histogram, as measured from a region, are a statistical description of that region.
It must be emphasized that both P(a) and p(a) should be viewed as estimates of true distributions when they are
computed from a specific region. That is, we view an image and a specific region as one realization of the various
random processes involved in the formation of that image and that region. In the same context, the statistics defined
below must be viewed as estimates of the underlying parameters.
Average
The average brightness of a region is defined as the sample mean of the pixel brightnesses within that region. The
average, m_a, of the brightness over the N pixels within a region R is given by:

m_a = (1/N) Σ_{(m,n)∈R} a[m, n]

Alternatively, we can use a formulation based upon the (unnormalized) brightness histogram, h(a), with
discrete brightness values a. This gives:

m_a = (1/N) Σ_a a · h(a)

The average brightness m_a is an estimate of the mean brightness, μ_a, of the underlying brightness probability
distribution.
Standard deviation
The unbiased estimate of the standard deviation, s_a, of the brightnesses within a region R with N pixels is called
the sample standard deviation and is given by:

s_a = sqrt( (1/(N-1)) Σ_{(m,n)∈R} (a[m, n] - m_a)² )
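Both estimates are one-liners in practice; the sketch below (numpy; the random region is again a stand-in) computes the sample mean and the unbiased sample standard deviation, and checks the histogram-based formulation of the mean:

import numpy as np

region = np.random.default_rng(5).integers(0, 256, size=(32, 32)).astype(float)
N = region.size

m_a = region.sum() / N                                   # sample mean (== region.mean())
s_a = np.sqrt(((region - m_a) ** 2).sum() / (N - 1))     # unbiased std (== region.std(ddof=1))

# Equivalent mean via the unnormalized histogram h(a).
h = np.bincount(region.astype(int).ravel(), minlength=256)
m_a_hist = (np.arange(256) * h).sum() / N
assert np.isclose(m_a, m_a_hist)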
Coefficient-of-variation
The dimensionless coefficient-of-variation is defined as:

CV = (s_a / m_a) × 100%
Percentiles
The percentile, p%, of an unquantized brightness distribution is defined as that value of the brightness a
such that:

P(a) = p%

or, equivalently,

∫_{-∞}^{a} p(α) dα = p%
Mode
The mode of the distribution is the most frequent brightness value. There is no guarantee that a mode
exists or that it is unique.
Signal-to-noise ratio
The signal-to-noise ratio, SNR, can have several definitions. The noise is characterized by its standard
deviation, s_n. The characterization of the signal can differ. If the signal is known to lie between two
boundaries, a_min ≤ a ≤ a_max, then the SNR is defined as:

Bounded signal:     SNR = 20 log10( (a_max - a_min) / s_n )  dB        (1.3)

If the signal is not bounded but has a statistical distribution, then two other definitions are known:

Stochastic signal:
  S and N inter-dependent:   SNR = 20 log10( m_a / s_n )  dB
  S and N independent:       SNR = 20 log10( s_a / s_n )  dB
The various statistics are given in Table 3 for the image and the region shown in Figure (1.7).
An SNR calculation for the entire image based on eq. (1.3) is not directly available. The variations in
the image brightnesses that lead to the large value of s (= 49.5) are not, in general, due to noise but to
the variation in local information. With the help of the region there is a way to estimate the SNR. We
can use the s_n (= 4.0) and the dynamic range, a_max - a_min, for the image (= 241 - 56) to calculate a
global SNR (= 33.3 dB). The underlying assumptions are that (1) the signal is approximately constant
in that region and the variation in the region is therefore due to noise, and (2) that the noise is
representative of the noise in the rest of the image.
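Using the numbers quoted above (s_n = 4.0 from the flat region and a dynamic range of 241 - 56), the bounded-signal definition reproduces the quoted global SNR; a minimal check:

import numpy as np

s_n = 4.0                      # noise standard deviation estimated from the flat region
a_max, a_min = 241.0, 56.0     # dynamic range of the whole image

snr_db = 20.0 * np.log10((a_max - a_min) / s_n)   # bounded-signal SNR, eq. (1.3)
print(round(snr_db, 1))        # 33.3 dB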