Image Processing

learning@cse.iitk
Kamlesh Tiwari

1 Introduction to Computer Graphics

Computer Graphics is a powerful tool for the rapid and economical production of pictures. A few of the major application areas of CG are listed below.

- Computer-aided design
- Presentation graphics
- Computer art
- Entertainment
- Education and training
- Visualization
- Image processing
- Graphical user interfaces

2 Display Devices

The primary output device in a graphics system is the video monitor. The operation of most video monitors is based on the standard cathode-ray tube (CRT) design, but several other technologies exist and solid-state monitors may eventually predominate.

2.1 CRT

A beam of electrons (cathode rays), emitted by an electron gun, passes through focusing and deflection systems that direct the beam toward specified positions on the phosphor-coated screen. The phosphor then emits a small spot of light at each position contacted by the electron beam. Because the light emitted by the phosphor fades very rapidly, some method is needed for maintaining the screen picture. One way to keep the phosphor glowing is to redraw the picture repeatedly by quickly directing the electron beam back over the same points. This type of display is called a refresh CRT.

2.1.1 Persistence

For the phosphors coated on the screen, persistence is the duration for which they continue to emit light after the CRT beam is removed. Persistence is defined as the time it takes the emitted light from the screen to decay to one-tenth of its original intensity. Lower-persistence phosphors require higher refresh rates to maintain a picture on the screen without flicker. A phosphor with low persistence is useful for animation; a high-persistence phosphor is useful for displaying highly complex, static pictures. Although some phosphors have a persistence greater than 1 second, graphics monitors are usually constructed with a persistence in the range from 10 to 60 microseconds.

2.1.2 Resolution

The maximum number of points that can be displayed without overlap on a CRT is referred to as the resolution. A more precise definition of resolution is the number of points per centimeter that can be plotted horizontally and vertically, although it is often simply stated as the total number of points in each direction. Two adjacent spots will appear distinct as long as their separation is greater than the diameter at which each spot has an intensity of about 60 percent of that at the center of the spot. The resolution of a CRT depends on the type of phosphor, the intensity to be displayed, and the focusing and deflection systems. Typical resolution on high-quality systems is 1280 by 1024; high-resolution systems are often referred to as high-definition systems. The physical size of a graphics monitor is given as the length of the screen diagonal, with sizes varying from about 12 inches to 27 inches or more.

2.1.3 Aspect ratio

This number gives the ratio of vertical points to horizontal points necessary to produce equal-length lines in both directions on the screen. An aspect ratio of 3/4 means that a vertical line plotted with three points has the same length as a horizontal line plotted with four points.

2.1.4 Raster scan

In this system, the electron beam is swept across the screen, one row at a time from top to bottom. As the electron beam moves across each row, the beam intensity is turned on and off to create a pattern of illuminated spots. Picture definition is stored in a memory area called the refresh buffer or frame buffer. The intensity range for pixel positions depends on the capability of the raster system. Refreshing on raster-scan displays is carried out at the rate of 60 to 80 frames per second, although some systems are designed for higher refresh rates. At the end of each scan line, the electron beam returns to the left side of the screen to begin displaying the next scan line.
The return to the left of the screen, after refreshing each scan line, is called the horizontal retrace of the electron beam. And at the end of each frame, the electron beam returns (vertical retrace) to the top left corner of the screen to begin the next frame.
2.1.5 Random scan

In this type of display system the CRT has the electron beam directed only to the parts of the screen where a picture is to be drawn. Random-scan monitors draw a picture one line at a time and for this reason are also referred to as vector displays (or stroke-writing or calligraphic displays). The component lines of a picture can be drawn and refreshed by a random-scan system in any specified order. A pen plotter operates in a similar way and is an example of a random-scan, hard-copy device.

The refresh rate on a random-scan system depends on the number of lines to be displayed. Picture definition is now stored as a set of line-drawing commands in an area of memory referred to as the refresh display file. Sometimes the refresh display file is called the display list, display program, or simply the refresh buffer. To display a specified picture, the system cycles through the set of commands in the display file, drawing each component line in turn. After all line-drawing commands have been processed, the system cycles back to the first line command in the list. Random-scan displays are designed to draw all the component lines of a picture 30 to 60 times each second.

Random-scan systems are designed for line-drawing applications and cannot display realistic shaded scenes. Since picture definition is stored as a set of line-drawing instructions and not as a set of intensity values for all screen points, vector displays generally have higher resolution than raster systems. Also, vector displays produce smooth line drawings because the CRT beam directly follows the line path. A raster system, in contrast, produces jagged lines that are plotted as discrete point sets.
2.1.6 Interlacing

On some raster-scan systems (and in TV sets), each frame is displayed in two passes using an interlaced refresh procedure. In the first pass, the beam sweeps across every other scan line from top to bottom. Then, after the vertical retrace, the beam sweeps out the remaining scan lines.

2.2 Color monitors

A CRT monitor displays color pictures by using a combination of phosphors that emit different-colored light. By combining the emitted light from the different phosphors, a range of colors can be generated. The two basic techniques for producing color displays with a CRT are the beam-penetration method and the shadow-mask method.

2.2.1 Beam penetration

The beam-penetration method for displaying color pictures has been used with random-scan monitors. Two layers of phosphor, usually red and green, are coated onto the inside of the CRT screen, and the displayed color depends on how far the electron beam penetrates into the phosphor layers. A beam of slow electrons excites only the outer red layer. A beam of very fast electrons penetrates through the red layer and excites the inner green layer. At intermediate beam speeds, combinations of red and green light are emitted to show two additional colors, orange and yellow. The speed of the electrons, and hence the screen color at any point, is controlled by the beam-acceleration voltage. Beam penetration has been an inexpensive way to produce color in random-scan monitors, but only four colors are possible, and the quality of pictures is not as good as with other methods.

2.2.2 Shadow masking

Shadow-mask methods are commonly used in raster-scan systems because they produce a much wider range of colors than the beam-penetration method. A shadow-mask CRT has three phosphor color dots at each pixel position: one phosphor dot emits red light, another emits green light, and the third emits blue light. This type of CRT has three electron guns, one for each color dot, and a shadow-mask grid just behind the phosphor-coated screen. The three electron beams are deflected and focused as a group onto the shadow mask, which contains a series of holes aligned with the phosphor-dot patterns. When the three beams pass through a hole in the shadow mask, they activate a dot triangle, which appears as a small color spot on the screen. The phosphor dots in the triangles are arranged so that each electron beam can activate only its corresponding color dot when it passes through the shadow mask.

We obtain color variations in a shadow-mask CRT by varying the intensity levels of the three electron beams. Sophisticated systems can set intermediate intensity levels for the electron beams, allowing several million different colors to be generated.

2.2.3 Flat-panel displays

The term flat-panel display refers to a class of video devices that have reduced volume, weight, and power requirements compared to a CRT. A significant feature of flat-panel displays is that they are thinner than CRTs, and we can hang them on walls or wear them on our wrists. We can separate flat-panel displays into two categories: emissive displays and nonemissive displays.

Emissive displays are devices that convert electrical energy into light. Plasma panels, thin-film electroluminescent displays, and light-emitting diodes are examples of emissive displays. Flat CRTs have also been devised, in which electron beams are accelerated parallel to the screen and then deflected 90 degrees toward the screen.


Nonemissive displays use optical effects to convert sunlight or light from some other source into graphics patterns. The most important example of a nonemissive flat-panel display is the liquid-crystal device.

3 Concept of Visual Information

The ability to see is one of the truly remarkable characteristics of living beings. It enables them to perceive and assimilate, in a short span of time, an incredible amount of knowledge about the world around them. The scope and variety of that which can pass through the eye and be interpreted by the brain is nothing short of astounding. It is thus with some degree of trepidation that we introduce the concept of visual information, because in the broadest sense the overall significance of the term is overwhelming. Instead of taking into account all of the ramifications of visual information, the first restriction we shall impose is that of finite image size. In other words, the viewer receives his or her visual information as if looking through a rectangular window of finite dimensions. This assumption is usually necessary in dealing with real-world systems such as cameras, microscopes and telescopes, for example; they all have finite fields of view and can handle only finite amounts of information.

The second assumption we make is that the viewer is incapable of depth perception on his own. That is, in the scene being viewed he cannot tell how far away objects are by the normal use of binocular vision or by changing the focus of his eyes. This scenario may seem a bit dismal, but in reality this model describes an overwhelming proportion of systems that handle visual information, including television, photographs, x-rays, etc.

In this setup, the visual information is determined completely by the wavelengths and amplitudes of the light that passes through each point of the window and reaches the viewer's eye. If the world outside were removed and a projector installed that reproduced exactly the light distribution on the window, the viewer inside would not be able to tell the difference.

Thus, the problem of numerically representing visual information is reduced to that of representing the distribution of light energy and wavelengths on the finite area of the window. We assume that the image perceived is "monochromatic" and static. It is determined completely by the perceived light energy (a weighted sum of energy at perceivable wavelengths) passing through each point on the window and reaching the viewer's eye. If we impose Cartesian coordinates on the window, we can represent the perceived light energy or "intensity" at point (x, y) by f(x, y). Thus f(x, y) represents the monochromatic visual information or "image" at the instant of time under consideration. As images that occur in real-life situations cannot be exactly specified with a finite amount of numerical data, an approximation of f(x, y) must be made if it is to be dealt with by practical systems. Since number bases can be changed without loss of information, we may assume f(x, y) to be represented by binary digital data. In this form the data is most suitable for several applications, such as transmission via digital communication facilities, storage within digital memory media, or processing by computer.

3.1 Digital Image Definitions

A digital image f[m, n] described in a 2D discrete space is derived from an analog image f(x, y) in a 2D continuous space through a sampling process that is frequently referred to as digitization. The mathematics of that sampling process will be described in subsequent chapters. For now we will look at some basic definitions associated with the digital image. The effect of digitization is shown in figure 1.

The 2D continuous image f(x, y) is divided into N rows and M columns. The intersection of a row and a column is termed a pixel. The value assigned to the integer coordinates [m, n], with m = 0, 1, 2, ..., M − 1 and n = 0, 1, 2, ..., N − 1, is f[m, n]. In fact, in most cases f(x, y), which we might consider to be the physical signal that impinges on the face of a 2D sensor, is actually a function of many variables including depth (z), color (λ) and time (t). Unless otherwise stated, we will consider the case of 2D, monochromatic, static images in this module.

Figure 1: Digitization of a continuous image.

The pixel at coordinates [m = 10, n = 3] has the integer brightness value 110. The image shown in figure 1 has been divided into N = 16 rows and M = 16 columns. The value assigned to every pixel is the average brightness in the pixel, rounded to the nearest integer value. The process of representing the amplitude of the 2D signal at a given coordinate as an integer value with L different gray levels is usually referred to as amplitude quantization, or simply quantization.

3.2 Common values

There are standard values for the various parameters encountered in digital image processing. These values can be dictated by video standards, by algorithmic requirements, or by the desire to keep digital circuitry simple. Table 1 gives some common values.


Parameter    Symbol  Typical values
Rows         N       256, 512, 525, 625, 1024, 1035
Columns      M       256, 512, 768, 1024, 1320
Gray levels  L       2, 64, 256, 1024, 4096, 16384

Table 1: Common values of digital image parameters

Quite frequently we see cases of M = N = 2^k, with k = 8, 9, 10. This can be motivated by digital circuitry or by the use of certain algorithms such as the (fast) Fourier transform.

The number of distinct gray levels is usually a power of 2, that is, L = 2^B, where B is the number of bits in the binary representation of the brightness levels. When B > 1 we speak of a gray-level image; when B = 1 we speak of a binary image. In a binary image there are just two gray levels, which can be referred to, for example, as "black" and "white" or "0" and "1".

Suppose that a continuous image f(x, y) is approximated by equally spaced samples arranged in the form of an N × N array:

f(x, y) = [ f(0, 0)      ...  f(0, N−1)
            ...               ...
            f(N−1, 0)    ...  f(N−1, N−1) ]  (N × N)

Each element of the array, referred to as a "pixel", is a discrete quantity. The array represents a digital image.

The above digitization requires a decision to be made on a value for N as well as on the number of discrete gray levels allowed for each pixel. It is common practice in digital image processing to let N = 2^n and G = number of gray levels = 2^m. It is assumed that the discrete levels are equally spaced between 0 and L in the gray scale.

Therefore the number of bits required to store a digitized image of size N × N is b = N × N × m. In other words, a 128 × 128 image with 256 gray levels (i.e. 8 bits/pixel) requires a storage of approximately 17000 bytes.
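A quick numerical check of this storage bookkeeping (a trivial sketch; the second case anticipates the typical 512 × 512 × 8 image mentioned later in the notes):

```python
# Storage for an N x N image with 2**m gray levels: b = N * N * m bits.
N, m = 128, 8
print(N * N * m)        # 131072 bits
print(N * N * m // 8)   # 16384 bytes, i.e. the "approximately 17000 bytes" above

N, m = 512, 8
print((N * N * m) // 8)  # 262144 bytes (256 KB) for the common 512x512x8 case
```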
The representation given above is an approximation to a continuous image. A reasonable question to ask at this point is: how many samples and gray levels are required for a good approximation? This brings up the question of resolution. The resolution (i.e. the degree of discernible detail) of an image depends strongly on both N and m: the more these parameters are increased, the closer the digitized array approximates the original image. Unfortunately, this leads to large storage requirements, and consequently processing requirements increase rapidly as functions of large N and m.

3.3 Spatial and Gray-level resolution

Sampling is the principal factor determining the spatial resolution of an image. Basically, spatial resolution is the smallest discernible detail in an image.

As an example, suppose we construct a chart with vertical lines of width W, with the space between the lines also having width W. A line pair consists of one such line and its adjacent space. Thus the width of a line pair is 2W, and there are 1/(2W) line pairs per unit distance. A widely used definition of resolution is simply the smallest number of discernible line pairs per unit distance; for example, 100 line pairs/mm.

Gray-level resolution refers to the smallest discernible change in gray level. The measurement of discernible changes in gray level is a highly subjective process.

We have considerable discretion regarding the number of samples used to generate a digital image, but this is not true for the number of gray levels. Due to hardware constraints, the number of gray levels is usually an integer power of two. The most common value is 8 bits; it can vary depending on the application. When an actual measure of physical resolution relating pixels to the level of detail they resolve in the original scene is not necessary, it is not uncommon to refer to an L-level digital image of size M × N as having a spatial resolution of M × N pixels and a gray-level resolution of L levels.

3.4 Characteristics of Image Operations

There is a variety of ways to classify and characterize image operations. The reason for doing so is to understand what type of results we might expect to achieve with a given type of operation, and what the computational burden associated with a given operation might be.

3.4.1 Types of operations

The types of operations that can be applied to digital images to transform an input image a[m, n] into an output image b[m, n] (or another representation) can be classified into three categories, as shown in Table 2.

3.4.2 Types of neighborhoods

Neighborhood operations play a key role in modern digital image processing. It is therefore important to understand how images can be sampled and how that relates to the various neighborhoods that can be used to process an image.

Rectangular sampling - In most cases, images are sampled by laying a rectangular grid over an image, as illustrated in Figure (1.1). This results in the type of sampling shown in Figure (1.3ab).

Hexagonal sampling - An alternative sampling scheme, shown in Figure (1.3c), is termed hexagonal sampling.

Both sampling schemes have been studied extensively and both represent a possible periodic tiling of the continuous image space. However, rectangular sampling, due to hardware and software considerations, remains the method of choice.


Local operations produce an output pixel value based upon the pixel values in the neighborhood. Some of the most common neighborhoods are the 4-connected neighborhood and the 8-connected neighborhood in the case of rectangular sampling, and the 6-connected neighborhood in the case of hexagonal sampling, illustrated in Figure 2.

Operation  Characterization                                         Generic complexity/pixel
Point      The output value at a specific coordinate is dependent   constant
           only on the input value at that same coordinate.
Local      The output value at a specific coordinate is dependent   P^2
           on the input values in the neighborhood of that same
           coordinate.
Global     The output value at a specific coordinate is dependent   N^2
           on all the values in the input image.

Table 2: Types of image operations. Image size = N × N; neighborhood size = P × P. Note that the complexity is specified in operations per pixel.

Figure 2: Types of neighborhoods

3.5 Video Parameters

We do not propose to describe the processing of dynamically changing images in this introduction. It is appropriate, given that many static images are derived from video cameras and frame grabbers, to mention the standards that are associated with the three standard video schemes currently in worldwide use: NTSC, PAL, and SECAM. This information is summarized in Table 3.

Standard property            NTSC    PAL     SECAM
Images / second              29.97   25      25
ms / image                   33.37   40.0    40.0
Lines / image                525     625     625
Aspect ratio (horiz./vert.)  4:3     4:3     4:3
Interlace                    2:1     2:1     2:1
µs / line                    63.56   64.00   64.00

Table 3: Standard video parameters

In an interlaced image the odd-numbered lines (1, 3, 5, ...) are scanned in half of the allotted time (e.g. 20 ms in PAL) and the even-numbered lines (2, 4, 6, ...) are scanned in the remaining half. The image display must be coordinated with this scanning format. The reason for interlacing the scan lines of a video image is to reduce the perception of flicker in a displayed image. If one is planning to use images that have been scanned from an interlaced video source, it is important to know if the two half-images have been appropriately "shuffled" by the digitization hardware or if that should be implemented in software. Further, the analysis of moving objects requires special care with interlaced video to avoid "zigzag" edges.

3.5.1 Tools

Certain tools are central to the processing of digital images. These include mathematical tools, such as convolution, Fourier analysis, and statistical descriptions, and manipulative tools, such as chain codes and run codes. We will present these tools without any specific motivation; the motivation will follow in later sections.

3.6 Convolution

There are several possible notations to indicate the convolution of two (multi-dimensional) signals to produce an output signal. The most common are:

c = a ⊗ b = a ∗ b

We shall use the first form, c = a ⊗ b, with the following formal definitions. In 2D continuous space:

c(x, y) = a(x, y) ⊗ b(x, y) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} a(λ, ζ) b(x − λ, y − ζ) dλ dζ

In 2D discrete space:

c[m, n] = a[m, n] ⊗ b[m, n] = Σ_{j=−∞}^{+∞} Σ_{k=−∞}^{+∞} a[j, k] b[m − j, n − k]

3.6.1 Properties of Convolution

There are a number of important mathematical properties associated with convolution.

- Convolution is commutative: c = a ⊗ b = b ⊗ a

- Convolution is associative: a ⊗ (b ⊗ c) = (a ⊗ b) ⊗ c = a ⊗ b ⊗ c

- Convolution is distributive: a ⊗ (b + c) = (a ⊗ b) + (a ⊗ c)

where a, b, and c are all images, either continuous or discrete.
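The discrete definition above translates directly into NumPy. The following is a minimal sketch (the function name conv2d and the toy arrays are our own; for finite-extent images the infinite sums reduce to the finite overlap region, and an optimized implementation would use FFTs instead):

```python
import numpy as np

def conv2d(a, b):
    """Direct 2D discrete convolution c[m,n] = sum_j sum_k a[j,k] b[m-j,n-k].

    For finite images the output has size (Ma+Mb-1, Na+Nb-1), the "full"
    convolution: each a[j,k] weights a copy of b shifted by (j, k).
    """
    Ma, Na = a.shape
    Mb, Nb = b.shape
    c = np.zeros((Ma + Mb - 1, Na + Nb - 1))
    for j in range(Ma):
        for k in range(Na):
            c[j:j + Mb, k:k + Nb] += a[j, k] * b
    return c

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])
print(conv2d(a, b))
# Commutativity check: a (x) b == b (x) a
assert np.allclose(conv2d(a, b), conv2d(b, a))
```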
3.7 Fourier Transforms

The Fourier transform produces another representation of a signal, specifically a representation as a weighted sum of complex exponentials. Because of Euler's formula,

e^{jq} = cos(q) + j sin(q)

where j² = −1, we can say that the Fourier transform produces a representation of a (2D) signal as a weighted sum of sines and cosines. The defining formulas for the forward and inverse Fourier transforms are as follows. Given an image a and its Fourier transform A, the forward transform goes from the spatial domain (either continuous or discrete) to the frequency domain, which is always continuous:

Forward:  A = F{a}

The inverse Fourier transform goes from the frequency domain back to the spatial domain:

Inverse:  a = F^{−1}{A}

The Fourier transform is a unique and invertible operation.

In 2D continuous space:

Forward:  A(u, v) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} a(x, y) e^{−j(ux+vy)} dx dy

Inverse:  a(x, y) = (1/4π²) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} A(u, v) e^{+j(ux+vy)} du dv

In 2D discrete space:

Forward:  A(Ω, Ψ) = Σ_{m=−∞}^{+∞} Σ_{n=−∞}^{+∞} a[m, n] e^{−j(Ωm+Ψn)}

Inverse:  a[m, n] = (1/4π²) ∫_{−π}^{+π} ∫_{−π}^{+π} A(Ω, Ψ) e^{+j(Ωm+Ψn)} dΩ dΨ

3.8 Properties of Fourier Transforms

There are a variety of properties associated with the Fourier transform and the inverse Fourier transform. The following are some of the most relevant for digital image processing.

- The Fourier transform is, in general, a complex function of the real frequency variables. As such, the transform can be written in terms of its magnitude and phase:
  A(u, v) = |A(u, v)| e^{jφ(u,v)};  A(Ω, Ψ) = |A(Ω, Ψ)| e^{jφ(Ω,Ψ)}

- A 2D signal can also be complex and thus written in terms of its magnitude and phase:
  a(x, y) = |a(x, y)| e^{jθ(x,y)};  a[m, n] = |a[m, n]| e^{jθ(m,n)}

- If a 2D signal is real, then the Fourier transform has certain symmetries (the symbol * indicates complex conjugation):
  A(u, v) = A*(−u, −v);  A(Ω, Ψ) = A*(−Ω, −Ψ)
  For real signals this leads directly to:
  |A(u, v)| = |A(−u, −v)|,  φ(u, v) = −φ(−u, −v)
  |A(Ω, Ψ)| = |A(−Ω, −Ψ)|,  φ(Ω, Ψ) = −φ(−Ω, −Ψ)

- If a 2D signal is real and even, then the Fourier transform is real and even:
  A(u, v) = A(−u, −v);  A(Ω, Ψ) = A(−Ω, −Ψ)

- The Fourier and inverse Fourier transforms are linear operations:
  F(w1 a + w2 b) = F(w1 a) + F(w2 b) = w1 F(a) + w2 F(b)
  F^{−1}(w1 a + w2 b) = F^{−1}(w1 a) + F^{−1}(w2 b) = w1 F^{−1}(a) + w2 F^{−1}(b)
  where a and b are 2D signals (images) and w1 and w2 are arbitrary complex constants.

- The Fourier transform in discrete space, A(Ω, Ψ), is periodic in both Ω and Ψ; both periods are 2π:
  A(Ω + 2πj, Ψ + 2πk) = A(Ω, Ψ),  j, k integers
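On a finite grid, one period of the discrete-space transform is computed in practice with the FFT. A small NumPy sketch verifying two of the properties above (symmetry for a real signal, and linearity) on made-up arrays of our own:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 8))          # a real 2D signal
b = rng.random((8, 8))

A = np.fft.fft2(a)

# Symmetry for real signals: A(u, v) = A*(-u, -v); indices negated modulo N.
A_neg = np.roll(A[::-1, ::-1], (1, 1), axis=(0, 1))   # A_neg[k, l] = A[-k, -l]
assert np.allclose(A, np.conj(A_neg))

# Linearity: F(w1*a + w2*b) = w1*F(a) + w2*F(b) for complex constants w1, w2.
w1, w2 = 2.0 + 1.0j, -0.5j
assert np.allclose(np.fft.fft2(w1 * a + w2 * b),
                   w1 * A + w2 * np.fft.fft2(b))
```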
3.9 Importance of phase and magnitude

The definition indicates that the Fourier transform of an image can be complex. This is illustrated in Figure 3. Figure (1.4a) shows the original image a[m, n], Figure (1.4b) the magnitude in a scaled form as log(|A(Ω, Ψ)|), and Figure (1.4c) the phase φ(Ω, Ψ). Both the magnitude and the phase functions are necessary for the complete reconstruction of an image from its Fourier transform. Figure (1.5a) shows what happens when Figure (1.4a) is restored solely on the basis of the magnitude information, and Figure (1.5b) shows what happens when Figure (1.4a) is restored solely on the basis of the phase information. Neither the magnitude information nor the phase information alone is sufficient to restore the image. The magnitude-only image, Figure (1.5a), is unrecognizable and has severe dynamic range problems. The phase-only image, Figure (1.5b), is barely recognizable, that is, severely degraded in quality.
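The same magnitude-only / phase-only experiment can be sketched numerically with NumPy's FFT (a toy stand-in of our own, not the actual test image of the figures):

```python
import numpy as np

# Any real test image array will do; a random one stands in for Figure (1.4a).
rng = np.random.default_rng(0)
a = rng.random((64, 64))

A = np.fft.fft2(a)
mag, phase = np.abs(A), np.angle(A)

# Restore from magnitude only (phase set to zero) and from phase only
# (magnitude set to one), as in Figures (1.5a) and (1.5b).
mag_only = np.real(np.fft.ifft2(mag))
phase_only = np.real(np.fft.ifft2(np.exp(1j * phase)))

# Full reconstruction needs both factors: |A| * exp(j*phase) recovers a.
restored = np.real(np.fft.ifft2(mag * np.exp(1j * phase)))
assert np.allclose(restored, a)
```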


Figure 3: Importance of phase and magnitude

3.9.1 Circularly symmetric signals

An arbitrary 2D signal a(x, y) can always be written in a polar coordinate system as a(r, θ). When the 2D signal exhibits circular symmetry this means that:

a(x, y) = a(r, θ) = a(r)

where r² = x² + y² and tan θ = y/x. As a number of physical systems such as lenses exhibit circular symmetry, it is useful to be able to compute an appropriate Fourier representation.

The Fourier transform A(u, v) can be written in polar coordinates A(ω_r, ζ) and then, for a circularly symmetric signal, rewritten as a Hankel transform:

A(u, v) = F{a(x, y)} = 2π ∫_0^∞ a(r) J_0(ω_r r) r dr = A(ω_r)   (1)

where ω_r² = u² + v² and tan ζ = v/u, and J_0(·) is a Bessel function of the first kind of order zero.

The inverse Hankel transform is given by:

a(r) = (1/2π) ∫_0^∞ A(ω_r) J_0(ω_r r) ω_r dω_r

The Fourier transform of a circularly symmetric 2D signal is a function of only the radial frequency ω_r; the dependence on the angular frequency ζ has vanished. Further, if a(x, y) = a(r) is real, then it is automatically even due to the circular symmetry. According to equation (1), A(ω_r) will then be real and even.

3.10 Statistics

In image processing it is quite common to use simple statistical descriptions of images and sub-images. The notion of a statistic is intimately connected to the concept of a probability distribution, generally the distribution of signal amplitudes. For a given region, which could conceivably be an entire image, we can define the probability distribution function of the brightnesses in that region and the probability density function of the brightnesses in that region. We will assume in the discussion that follows that we are dealing with a digitized image a[m, n].

Probability distribution function of the brightnesses

The probability distribution function P(a) is the probability that a brightness chosen from the region is less than or equal to a given brightness value a. As a increases from −∞ to +∞, P(a) increases from 0 to 1. P(a) is monotonic and non-decreasing in a, and thus dP/da ≥ 0.

Probability density function of the brightnesses

The probability that a brightness in a region falls between a and a + Δa, given the probability distribution function P(a), can be expressed as p(a)Δa, where p(a) is the probability density function:

p(a)Δa = (dP(a)/da) Δa

Because of the monotonic, non-decreasing character of P(a), we have p(a) ≥ 0 and ∫_{−∞}^{+∞} p(a) da = 1. For an image with quantized (integer) brightness amplitudes, the interpretation of Δa is the width of a brightness interval. We assume constant-width intervals. The brightness probability density function is frequently estimated by counting the number of times that each brightness occurs in the region, generating a histogram h[a]. The histogram can then be normalized so that the total area under the histogram is 1. Said another way, the p[a] for a region is the normalized count of the number of pixels N in the region that have quantized brightness a:

p[a] = (1/N) h[a]   with   N = Σ_a h[a]

The brightness probability distribution function for the image is shown in Figure (1.6a). The (unnormalized) brightness histogram, which is proportional to the estimated brightness probability density function, is shown in Figure (1.6b). The height in this histogram corresponds to the number of pixels with a given brightness.

Both the distribution function and the histogram as measured from a region are a statistical description of that region. It must be emphasized that both P(a) and p(a) should be viewed as estimates of true distributions when they are computed from a specific region. That is, we view an image and a specific region as one realization of the various random processes involved in the formation of that image and that region. In the same context, the statistics defined below must be viewed as estimates of the underlying parameters.
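A minimal NumPy sketch of these estimates on a made-up 8-bit region (names and data are our own):

```python
import numpy as np

# Stand-in 8-bit image region; any uint8-valued array works here.
rng = np.random.default_rng(1)
region = rng.integers(0, 256, size=(32, 32))

# Histogram h[a], estimated density p[a] = h[a]/N, and distribution P(a).
h = np.bincount(region.ravel(), minlength=256)
N = h.sum()
p = h / N          # non-negative, sums to 1
P = np.cumsum(p)   # monotonic, non-decreasing, ends at 1
assert np.isclose(p.sum(), 1.0) and np.isclose(P[-1], 1.0)
```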


Figure 4: (a) Brightness distribution function of Figure (1.4a) with minimum, median, and maximum indicated. (b) Brightness histogram of Figure (1.4a).

3.11 Average

The average brightness of a region is defined as the sample mean of the pixel brightnesses within that region. The average m_a of the brightness over the N pixels within a region R is given by:

m_a = (1/N) Σ_{(m,n)∈R} a[m, n]

Alternatively, we can use a formulation based upon the (unnormalized) brightness histogram h[a], with discrete brightness values a. This gives:

m_a = (1/N) Σ_a a · h[a]

The average brightness m_a is an estimate of the mean brightness μ_a of the underlying brightness probability distribution.

Standard deviation

The unbiased estimate of the standard deviation of the brightnesses within a region with N pixels is called the sample standard deviation and is given by:

s_a = sqrt( (1/(N−1)) Σ_{(m,n)∈R} (a[m, n] − m_a)² )
    = sqrt( (Σ_{(m,n)∈R} a²[m, n] − N m_a²) / (N−1) )

Using the histogram formulation gives:

s_a = sqrt( (Σ_a a² · h[a] − N m_a²) / (N−1) )

The standard deviation s_a is an estimate of σ_a of the underlying brightness probability distribution.

3.12 Coefficient-of-variation

The dimensionless coefficient-of-variation, CV, is defined as:

CV = (s_a / m_a) × 100%

Percentiles

The percentile, p%, of an unquantized brightness distribution is defined as that value of the brightness a such that P(a) = p%, or equivalently:

∫_{−∞}^{a} p(α) dα = p%

Three special cases are frequently used in digital image processing:

- 0%: the minimum value in the region
- 50%: the median value in the region
- 100%: the maximum value in the region

All three of these values can be determined from Figure (1.6a).

Mode

The mode of the distribution is the most frequent brightness value. There is no guarantee that a mode exists or that it is unique.

Signal-to-noise ratio

The signal-to-noise ratio, SNR, can have several definitions. The noise is characterized by its standard deviation s_n. The characterization of the signal can differ. If the signal is known to lie between two boundaries, a_min ≤ a ≤ a_max, then the SNR is defined as:

- Bounded signal:

  SNR = 20 log_10 ((a_max − a_min) / s_n) dB   (2)

- Stochastic signal: if the signal is not bounded but has a statistical distribution, then two other definitions are known:

  S & N interdependent:  SNR = 20 log_10 (m_a / s_n) dB
  S & N independent:     SNR = 20 log_10 (s_a / s_n) dB

where m_a and s_a are defined above.
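A small numerical sketch of these statistics, using made-up data chosen to mimic the ROI of Table 4 below:

```python
import numpy as np

# Hypothetical near-constant region (signal ~220, noise sd ~4), standing in
# for the ROI whose statistics are tabulated below.
rng = np.random.default_rng(2)
a = rng.normal(220.0, 4.0, size=(50, 50))

N = a.size
m_a = a.mean()               # sample mean
s_a = a.std(ddof=1)          # unbiased sample standard deviation
cv = 100.0 * s_a / m_a       # coefficient-of-variation in percent

# Bounded-signal SNR of equation (2), taking the ROI's s_a as the noise
# estimate s_n and the image dynamic range 241 - 56 = 185 quoted below.
snr_db = 20.0 * np.log10((241 - 56) / s_a)
print(m_a, s_a, cv, snr_db)  # snr_db comes out close to the 33.3 dB in Table 4
```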


Statistics from Figure ??

An SNR calculation for the entire image based on equation (2) is not directly available. The variations in the image brightnesses that lead to the large value of s (= 49.5) are not, in general, due to noise but to the variation in local information. With the help of the region there is a way to estimate the SNR: we can use s_R (= 4.0) and the dynamic range, a_max − a_min, for the image (= 241 − 56) to calculate a global SNR (= 33.3 dB). The underlying assumptions are that:

1. the signal is approximately constant in that region, so the variation in the region is therefore due to noise; and

2. the noise is the same over the entire image, with a standard deviation given by s_n = s_R.

Figure 5: Statistics from the region interior to the circle

Statistic            Image   ROI
Average              137.7   219.3
Standard deviation    49.5     4.0
Minimum               56      202
Median               141      220
Maximum              241      226
Mode                  62      220
SNR (dB)             NA       33.3

Table 4: Statistics from Figure ??

4 Perception

Many image processing applications are intended to produce images that are to be viewed by human observers. It is therefore important to understand the characteristics and limitations of the human visual system, the "receiver" of the 2D signals. At the outset it is important to realise that (1) the human visual system (HVS) is not well understood, (2) no objective measure exists for judging the quality of an image that corresponds to human assessment of image quality, and (3) the typical human observer does not exist. Nevertheless, research in perceptual psychology has provided some important insights into the visual system [Stockham].

The first part of the visual system is the eye. This is shown in figure 6. Its form is nearly spherical and its diameter is approximately 20 mm. Its outer cover consists of the "cornea" and "sclera".

The cornea is a tough, transparent tissue in the front part of the eye. The sclera is an opaque membrane, which is continuous with the cornea and covers the remainder of the eye. Directly below the sclera lies the "choroid", which has many blood vessels. At its anterior extreme lies the iris diaphragm. Light enters the eye through the central opening of the iris, whose diameter varies from 2 mm to 8 mm according to the illumination conditions. Behind the iris is the "lens", which consists of concentric layers of fibrous cells and contains up to 60 to 70% water. Its operation is similar to that of man-made optical lenses. It focuses the light on the "retina", which is the innermost membrane of the eye.

Figure 6: Elements of Human Visual Perception.

The retina has two kinds of photoreceptors: cones and rods. The cones are highly sensitive to color. Their number is 6-7 million and they are mainly located at the central part of the retina. Each cone is connected to one nerve end. Cone vision is the photopic or bright-light vision. Rods serve to view the general picture of the visual field. They are sensitive to low levels of illumination and cannot discriminate colors. This is the scotopic or dim-light vision. Their number is 75 to 150 million and they are distributed over the retinal surface. Several rods are connected to a single nerve end. This fact and their large spatial distribution explain their low resolution.

Both cones and rods transform light into an electric stimulus, which is carried through the optic nerve to the human brain for high-level image processing and perception.

4.1 Model of the Human Eye

Based on the anatomy of the eye, a model can be constructed as shown in Figure (2.2). Its first part is a simple optical system consisting of the cornea, the opening of the iris, the lens and the fluids inside the eye. Its second part consists of the retina, which performs the photoelectrical transduction, followed by the visual pathway (nerve), which performs simple image processing operations and carries the information to the brain.

Optical System → Retina → Visual Pathway


Image Formation in the Eye

Image formation in the human eye is not a simple phenomenon. It is only partially understood, and only some of the visual phenomena have been measured and understood. Most of them are proven to have non-linear characteristics. Two examples of visual phenomena are contrast sensitivity and spatial frequency sensitivity.

Contrast sensitivity

Figure 7: The Weber ratio without background

Figure 8: The Weber ratio with background

Let us consider a spot of intensity I + dI in a background having intensity I, as shown in Figure (2.3); dI is increased from 0 until it becomes noticeable. The ratio dI/I, called the Weber ratio, is nearly constant at about 2% over a wide range of illumination levels, except for very low or very high illuminations, as seen in Figure (2.3). The range over which the Weber ratio remains constant is reduced considerably when the experiment of Figure (2.4) is considered. In this case, the background has intensity I0 and two adjacent spots have intensities I and I + dI, respectively. The Weber ratio is plotted as a function of the background intensity in Figure (2.4). The envelope of the lower limits is the same as that of Figure (2.3). The Weber ratio is the derivative of the logarithm of the intensity I:

d[log(I)] = dI / I

Thus equal changes in the logarithm of the intensity result in equally noticeable changes in the intensity over a wide range of intensities. This fact suggests that the human eye performs a pointwise logarithm operation on the input image.

Another characteristic of the HVS is that it tends to "overshoot" around image edges (boundaries of regions having different intensity). As a result, regions of constant intensity which are close to edges appear to have varying intensity. Such an example is shown in Figure (2.5): the stripes appear to have varying intensity along the horizontal dimension, whereas their intensity is constant. This effect is called the Mach band effect. It indicates that the human eye is sensitive to edge information and that it has high-pass characteristics.

Figure 9: The Mach-band effect: (a) Vertical stripes having constant illumination (b) Actual image intensity profile (c) Perceived image intensity profile

Spatial Frequency Sensitivity

If the constant intensity (brightness) I0 is replaced by a sinusoidal grating with increasing spatial frequency (Figure 2.6a), it is possible to determine the spatial frequency sensitivity. The result is shown in Figures (2.6a, 2.6b). To translate these data into common terms, consider an "ideal" computer monitor at a viewing distance of 50 cm. The spatial frequency that gives maximum response is at 10 cycles per degree. One degree at 50 cm translates to 50 tan(1°) = 0.87 cm on the computer screen. Thus the spatial frequency of maximum response is f_max = 10 cycles / 0.87 cm = 11.46 cycles/cm at this viewing distance. Translating this into a general formula gives:

f_max = 10 / (d · tan(1°)) = 572.9 / d  cycles/cm

where d = viewing distance measured in cm.
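As a quick check of this formula (a trivial sketch):

```python
import math

def f_max_cycles_per_cm(d_cm: float) -> float:
    """Peak spatial frequency on the screen for 10 cycles/deg seen from d_cm."""
    return 10.0 / (d_cm * math.tan(math.radians(1.0)))

print(f_max_cycles_per_cm(50))   # ~11.46 cycles/cm, matching the text
print(f_max_cycles_per_cm(100))  # doubling the distance halves it: ~5.73
```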


Figure 10: (b) Sinusoidal test grating; spatial frequency sensitivity

4.2 Fundamentals of Color Images

Light is a form of electromagnetic (em) energy that can be completely specified at a point in the image plane by its wavelength distribution. Not all electromagnetic radiation is visible to the human eye; in fact, the entire visible portion of the radiation lies only within the narrow wavelength band of 380 to 780 nm. Till now, we were concerned mostly with light intensity, i.e. the sensation of brightness produced by the aggregate of wavelengths. However, light of many wavelengths also produces another important visual sensation called "color". Different spectral distributions generally, but not necessarily, have different perceived colors. Thus color is that aspect of visible radiant energy by which an observer may distinguish between different spectral compositions.

A color stimulus is therefore specified by visible radiant energy of a given intensity and spectral composition. Color is generally characterised by attaching names to the different stimuli, e.g. white, gray, black, red, green, blue. Color stimuli are generally more pleasing to the eye than "black and white" stimuli; consequently pictures with color are widespread in TV, photography and printing.

Color is also used in computer graphics to add "spice" to the synthesized pictures. Coloring of black and white pictures by transforming intensities into colors (called pseudo-colors) has been used extensively in pattern recognition. In this module we will be concerned with questions of how to specify color and how to reproduce it. Color specification consists of 3 parts:

1. Color matching

2. Color differences

3. Color appearance or perceived color

We will discuss the first of these questions in this module.

4.3 Representation of color for human vision

Let S(λ) denote the spectral power distribution (in watts/m²/unit wavelength) of the light emanating from a pixel of the image plane, and λ the wavelength. The human retina contains predominantly three different color receptors (called cones) that are sensitive to 3 overlapping areas of the visible spectrum. The sensitivities of the receptors peak at approximately 445 (called blue), 535 (called green) and 570 (called red) nanometers. Each type of receptor integrates the energy in the incident light at the various wavelengths in proportion to its sensitivity to light at each wavelength. The three resulting numbers are primarily responsible for color sensation. This is the basis of the trichromatic theory of color vision, which states that the color of light entering the eye may be specified by only 3 numbers, rather than a complete function of wavelengths over the visible range. This leads to significant economy in color specification and reproduction for human viewing. Much of the credit for this significant work goes to the physicist Thomas Young.

The counterpart to trichromacy of vision is the trichromacy of color mixture. This important principle states that light of any color can be synthesized by an appropriate mixture of 3 properly chosen primary colors.

Maxwell in 1855 showed this using a 3-color projecting system. Several developments have taken place since that time, creating a large body of knowledge referred to as colorimetry. Although trichromacy of color is based on subjective and physiological findings, there are precise measurements that can be made to examine color matches.

4.4 Color matching

Consider a bipartite field subtending an angle of 2° at a viewer's eye. The entire field is viewed against a dark, neutral surround. The field contains the test color on the left and an adjustable mixture of 3 suitably chosen primary colors on the right, as shown in Figure (2.7).

Figure 11: 2° bipartite field at the viewer's eye

It is found that most test colors can be matched by a proper mixture of 3 primary colors as long as the primary colors are independent. The primary colors are usually chosen as red, green & blue, or red, green & violet. The "tristimulus values" of a test color are the amounts of the 3 primary colors required to give a match by additive mixture. They are unique within the accuracy of the experiment. Much of colorimetry is based on experimental results, as well as on rules attributed to Grassman.


Two important rules that are valid over a large range of observing conditions are "linearity" and "additivity". They state that:

1. The color match between any two color stimuli holds even if the intensities of the stimuli are increased or decreased by the same multiplying factor, as long as their relative spectral distributions remain unchanged. As an example, if stimuli s1(λ) and s2(λ) match, and stimuli s3(λ) and s4(λ) also match, then the additive mixtures s1(λ) + s3(λ) and s2(λ) + s4(λ) will also match.

2. Another consequence of the above rules of Grassman trichromacy is that any four colors cannot be linearly independent. This implies that the tristimulus values of one of the 4 colors can be expressed as a linear combination of the tristimulus values of the remaining 3 colors. That is, any color C is specified by its projection on the 3 axes R, G, B corresponding to the chosen set of primaries. This is shown in Figure ??.

Figure 12: The color-matching functions for the 2° Standard Observer, using primaries of wavelengths 700 (red), 546.1 (green), and 435.8 nm (blue), with units such that equal quantities of the three primaries are needed to match the equal-energy white, E.

4.5 Color-Coordinate Systems

4.6 CIE System of Color Specification

4.7 Chromaticity coordinates in CIE-XYZ system

4.8 Color Mixtures

Consider a mixture of two colors S1 and S2, i.e. S = S1 + S2. If S1 is specified by (R_S1, G_S1, B_S1) and S2 is specified by (R_S2, G_S2, B_S2), then S is specified by (R_S1 + R_S2, G_S1 + G_S2, B_S1 + B_S2).

The constraint of the color-matching experiment is that only non-negative amounts of primary colors can be added to match a test color. In practice this is not always sufficient to effect a match. In this case, since negative amounts of a primary cannot be produced, a match is made by simple transposition, i.e. by adding positive amounts of the primary to the test color: a test color S might be matched by S + 3G = 2R + B, or S = 2R − 3G + B. The negative tristimulus values (2, −3, 1) present no special problem.
pling. In particular, we first consider the specific case of
cial problem. rectangular periodic sampling, and then a more general
By convention, tristimulus values are expressed in normal- case of periodic sampling with arbitrary sampling geome-
ized form. This is done by a preliminary color experiment tries.


5.0.1 Two-dimensional rectangular sampling

We discuss 2D rectangular sampling of a still image x_a(t1, t2) in two spatial coordinates. In rectangular sampling, we sample at the locations

t1 = n1 T1
t2 = n2 T2   (3)

where T1 and T2 are the sampling distances in the t1 and t2 directions, respectively. The 2D rectangular sampling grid is depicted in figure 14. The sampled signal can be expressed in terms of the unitless coordinate variables (n1, n2) as:

x(n1, n2) = x_a(n1 T1, n2 T2)   for all (n1, n2) ∈ Z²

Figure 14: 2D rectangular sampling grid

In some cases it is convenient to define an intermediate sampled signal in terms of continuous coordinate variables, given by:

x_p(t1, t2) = x_a(t1, t2) Σ_{n1} Σ_{n2} δ(t1 − n1 T1, t2 − n2 T2)
            = Σ_{n1} Σ_{n2} x_a(n1 T1, n2 T2) δ(t1 − n1 T1, t2 − n2 T2)
            = Σ_{n1} Σ_{n2} x(n1, n2) δ(t1 − n1 T1, t2 − n2 T2)

Note that x_p(t1, t2) is indeed a sampled signal because of the presence of the 2D Dirac delta functions.

5.0.2 Spectrum of the sampled signal

We now relate the Fourier transform X_p(Ω1, Ω2), or X(w1, w2), of the sampled signal to that of the continuous signal x_a(t1, t2). As given earlier, the 2D continuous-space Fourier transform X_a(Ω1, Ω2) of a signal x_a(t1, t2) with continuous variables (t1, t2) is given by:

X_a(Ω1, Ω2) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x_a(t1, t2) exp(−j2π(t1 Ω1 + t2 Ω2)) dt1 dt2

where (Ω1, Ω2) ∈ R², and the inverse Fourier transform is given by:

x_a(t1, t2) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} X_a(Ω1, Ω2) exp(+j2π(t1 Ω1 + t2 Ω2)) dΩ1 dΩ2

Here the spatial frequency variables Ω1, Ω2 have units of cycles/mm and are related to radian frequencies by a scale factor of 2π. In order to evaluate the 2D FT X_p(Ω1, Ω2) of x_p(t1, t2), we substitute the equations above and exchange the order of integration and summation to obtain:

X_p(Ω1, Ω2) = Σ_{n1} Σ_{n2} x_a(n1 T1, n2 T2) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} δ(t1 − n1 T1, t2 − n2 T2) exp(−j2π(Ω1 t1 + Ω2 t2)) dt1 dt2

which simplifies to:

X_p(Ω1, Ω2) = Σ_{n1} Σ_{n2} x_a(n1 T1, n2 T2) exp(−j2π(Ω1 n1 T1 + Ω2 n2 T2))

Note that X_p(Ω1, Ω2) is periodic, with the fundamental period given by the region |Ω1| < 1/(2T1) and |Ω2| < 1/(2T2). Letting w1 = Ω1 T1 and w2 = Ω2 T2 and using the above equations, we obtain the discrete-space Fourier transform relation in terms of the unitless frequency variables w1 and w2 as:

X(w1, w2) = Σ_{n1} Σ_{n2} x(n1, n2) exp(−j2π(w1 n1 + w2 n2))

5.1 Evaluation

- One MidSem exam
- Assignment (Matlab based)
- Term paper: understand the paper, fill up the gaps, demonstration/seminar. Will be done in groups (of two).
- EndSem exam

5.2 Questions

Q1. Why do we process images?

- Picture digitization and coding: for transmission, storage, and printing
- Picture enhancement and restoration
- Picture segmentation and description: for machine vision, image understanding

Q2. What is an image?

- Panchromatic: gray scale, a 2D light intensity function f(x, y)
- Multispectral: color image, f(x, y) is a vector (R, G, B)

Q3. What is a digital image?
Discretize both the spatial coordinates and the intensity function:

g = [ f(1, 1)  ...  f(1, N)
      ...           ...
      f(N, 1)  ...  f(N, N) ]  (N × N)

0 ≤ f(x, y) ≤ G − 1, where G is the number of gray levels and is bounded by G = 2^m.

The number of bits required to represent such an image is N × N × m; typically it is 512 × 512 × 8.
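A small NumPy sketch of rectangular sampling under assumed sampling distances (the test signal and the values of T1, T2 are our own illustration):

```python
import numpy as np

# Rectangular sampling of x_a(t1, t2) = cos(2*pi*(3*t1 + 5*t2)), with t in mm
# and frequencies in cycles/mm, on the grid t1 = n1*T1, t2 = n2*T2.
T1, T2 = 0.05, 0.05                     # sampling distances (assumed values)
n1 = np.arange(64)[:, None]             # column of row indices
n2 = np.arange(64)[None, :]             # row of column indices
x = np.cos(2 * np.pi * (3 * n1 * T1 + 5 * n2 * T2))  # x(n1,n2) = x_a(n1*T1, n2*T2)

# The sampled spectrum repeats with period 1/T1 = 1/T2 = 20 cycles/mm, so the
# 3 and 5 cycles/mm components sit safely below the folding frequency 1/(2T) = 10.
```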


Q4. What is the resolution of an image?
The minute details observable in the image; the checkerboard pattern. Reducing m while keeping N constant is called false contouring.

6 Quantization

6.1 Sensitivity

For color images we use three sets of sensors, as shown below.

Book: Image Processing, The Fundamentals, Maria Petrou, Panagiota Bosdogianni, Publisher: John Wiley & Sons.

Purpose of image processing:

- Picture digitization and coding
- Picture enhancement and restoration
- Picture segmentation and description

Monochrome image: a function f(x, y), which is 2D, having light intensity values². An image point at position (x, y) is called a pixel.

An image of size N × N with 2^m different gray levels needs N × N × m bits.

Checkerboard effect: keeping m constant and decreasing N produces this effect.

False contouring³: keeping N constant and reducing m. For a more detailed picture (like a picture of a crowd) false contouring has less effect.

Resolution expresses how much detail we can see in a picture.

Brightness of a pixel is the light intensity value recorded at the sensor from the corresponding part of the physical object.

Image processing is done using image transformations, which in turn are done using operators. An operator takes an image (called the input image) and produces another image (called the output image).

A linear operator ~ has the following property:

~[a f + b g] = a ~[f] + b ~[g]

The point spread function of an operator is what we get out if we apply the operator to a point source:

~[point source] = point spread function
~[δ(x − α, y − β)] = h(x, α, y, β)

where δ(x − α, y − β) is a point source of brightness 1 centred at point (α, β). In other words, the point spread function h(x, α, y, β) expresses how much the input value at position (x, y) influences the value at point (α, β).

For a linear operator, ~[a δ(x − α, y − β)] = a h(x, α, y, β).

Operators are defined in terms of point spread functions. The effect of an operator characterized by h(x, α, y, β) on an image f(x, y) can be written as:

g(α, β) = Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y) h(x, α, y, β)

Shift-invariant PSF: the influence does not depend on the actual pixel positions, only on their relative positions:

h(x, α, y, β) = h(α − x, β − y)

Convolution: under the assumption of a shift-invariant PSF, the above becomes a convolution:

g(α, β) = Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y) h(α − x, β − y)

Separable PSF: when the columns are influenced independently from the rows of the image, the PSF is called separable:

h(x, α, y, β) ≡ h_c(x, α) h_r(y, β)

For such a case we can write:

g(α, β) = Σ_{x=0}^{N−1} h_c(x, α) Σ_{y=0}^{N−1} f(x, y) h_r(y, β)

When the PSF is both shift invariant and separable, then:

g(α, β) = Σ_{x=0}^{N−1} h_c(α − x) Σ_{y=0}^{N−1} f(x, y) h_r(β − y)

Define an extended source of constant brightness:

δ_n(x, y) ≡ n² rect(nx, ny)

where n is a positive constant and

rect(nx, ny) ≡ 1 inside the rectangle |nx| ≤ 1/2, |ny| ≤ 1/2, and 0 elsewhere.

The total brightness of this source is given by:

∫_{−∞}^{+∞} ∫_{−∞}^{+∞} δ_n(x, y) dx dy = n² ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} rect(nx, ny) dx dy = 1

since the area of the rectangle is 1/n².

Dirac delta function:

δ(x, y) ≠ 0 for x = y = 0, and δ(x, y) = 0 elsewhere

²0 ≤ f(x, y) ≤ G − 1, where G is the maximum possible intensity value. G is generally of the form 2^m.
³Contour: the outline of a figure or body.
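Under the shift-invariant, separable assumption, the double sum factors into two 1D passes. A minimal NumPy sketch (function names and test arrays are our own):

```python
import numpy as np

def conv_separable(f, hc, hr):
    """Convolve image f with the separable PSF h[x, y] = hc[x] * hr[y]
    as two 1D passes: first along rows (y), then along columns (x)."""
    # pass 1: every row convolved with hr
    tmp = np.apply_along_axis(lambda row: np.convolve(row, hr), 1, f)
    # pass 2: every column convolved with hc
    return np.apply_along_axis(lambda col: np.convolve(col, hc), 0, tmp)

f = np.arange(16.0).reshape(4, 4)
hc = np.array([1.0, 2.0, 1.0])
hr = np.array([1.0, 0.0, -1.0])

# Same result as convolving with the full 2D PSF hc[:, None] * hr[None, :]
# (e.g. via the conv2d sketch shown earlier), but in O(N^2 * P) operations
# per pass instead of O(N^2 * P^2) for the full 2D sum.
print(conv_separable(f, hc, hr))
```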


with the property

    ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} δ(x, y) dx dy = 1

It has an interesting property:

    ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} δ(x, y) g(x, y) dx dy = g(0, 0)

Shifting property⁴:

    ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} δn(x − a, y − b) g(x, y) dx dy = g(a, b)

⁴ δn(x − a, y − b) = n² rect[n(x − a), n(y − b)]

Fundamental equation of linear image processing:

    g = H f

where

• H = [h_αβ], the matrix whose row for output position (α, β) is h_αβᵀ

• h_αβᵀ ≡ [h(0, α, 0, β), h(1, α, 0, β), ..., h(N − 1, α, 0, β), h(0, α, 1, β), h(1, α, 1, β), ..., h(N − 1, α, 1, β), ..., h(0, α, N − 1, β), h(1, α, N − 1, β), ..., h(N − 1, α, N − 1, β)]

• Image fᵀ ≡ [f(0, 0), f(1, 0), ..., f(N − 1, 0), f(0, 1), f(1, 1), ..., f(N − 1, 1), ..., f(0, N − 1), f(1, N − 1), ..., f(N − 1, N − 1)]

• Vₙ is an N × 1 column vector having all elements except the nth set to zero.
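A small numerical check of g = Hf (random stand-in values, N = 4, not from the notes), using the stacking order defined above, with x running fastest in both the image vector and the rows h_αβᵀ of H:

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)
f = rng.random((N, N))
h = rng.random((N, N, N, N))       # h[x, a, y, b], arbitrary PSF

# Direct double sum: g(a, b) = sum_x sum_y f(x, y) h(x, a, y, b)
g_direct = np.einsum('xy,xayb->ab', f, h)

# Matrix form: stack so that f_vec[x + N*y] = f(x, y), and the row of H
# for output position (a, b) sits at index a + N*b.
f_vec = np.array([f[x, y] for y in range(N) for x in range(N)])
H = np.array([[h[x, a, y, b] for y in range(N) for x in range(N)]
              for b in range(N) for a in range(N)])
g_vec = H @ f_vec                  # fundamental equation g = H f

g = np.array([[g_vec[a + N * b] for b in range(N)] for a in range(N)])
assert np.allclose(g, g_direct)
```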
Image processing refers to the processing of two-dimensional pictures by a digital computer. Basic classes of image processing applications:

• Image representation and modeling
  Concerns the characterization of the quantity that each picture element represents. An image can represent the luminance of objects in a scene, the absorption characteristics of body tissue, the radar cross-section of a target, the temperature profile of a region, or anything else.

• Image enhancement
  The goal is to accentuate certain image features for subsequent analysis or for image display.

• Image restoration
  Refers to removal or minimization of known degradations in an image. It includes deblurring, noise filtering, geometric distortion correction, etc.

• Image analysis
  Concerns making quantitative measurements from an image to produce its description.

• Image reconstruction
  This is a special class of image restoration problems where a two- (or higher-) dimensional object is reconstructed from several one-dimensional projections.

• Image data compression
  Reduces the amount of storage required for the same visual information.

Notation: j is √−1; z* is the complex conjugate of z.

Separable form: several well-known one-dimensional functions have two-dimensional versions in separable form:

    f(x, y) = f1(x) f2(y)

Dirac Delta:

    δ(x) = 0 for x ≠ 0,    lim_{ε→0} ∫_{−ε}^{+ε} δ(x) dx = 1

Shifting Property:

    ∫_{−∞}^{+∞} f(x′) δ(x − x′) dx′ = f(x)

Scaling Property:

    δ(ax) = δ(x) / |a|

Kronecker Delta:

    δ(n) = { 0  n ≠ 0
           { 1  n = 0

Shifting Property of the Kronecker Delta:

    Σ_{m=−∞}^{+∞} f(m) δ(n − m) = f(n)

Rectangle:

    rect(x) = { 1  |x| ≤ 1/2
              { 0  |x| > 1/2

Signum:

    sgn(x) = {  1  x > 0
             {  0  x = 0
             { −1  x < 0

Sinc:

    sinc(x) = sin(πx) / (πx)

Comb:

    comb(x) = Σ_{n=−∞}^{∞} δ(x − n)
Triangle:

    tri(x) = { 1 − |x|  |x| ≤ 1
             { 0        |x| > 1
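These tabulated functions, plus the Kronecker delta and its shifting property above, translate directly to NumPy; a minimal sketch (my definitions; note that np.sinc already uses the sin(πx)/(πx) convention from the notes):

```python
import numpy as np

def rect(x):
    return np.where(np.abs(x) <= 0.5, 1.0, 0.0)

def tri(x):
    return np.where(np.abs(x) <= 1.0, 1.0 - np.abs(x), 0.0)

sgn = np.sign            # 1, 0, -1 for x > 0, x == 0, x < 0
sinc = np.sinc           # sin(pi x) / (pi x)

def kron_delta(n):
    return np.where(n == 0, 1, 0)

# Shifting property of the Kronecker delta on a finite support:
# sum_m f(m) * delta(n - m) = f(n)
m = np.arange(-10, 11)
f = m ** 2
n = 3
assert np.sum(f * kron_delta(n - m)) == n ** 2
```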
Linear System:

    H[a1 x1(m, n) + a2 x2(m, n)] = a1 H[x1(m, n)] + a2 H[x2(m, n)]

Impulse Response: when the input is a Kronecker delta function at location (m′, n′), the output at location (m, n) is defined as

    h(m, n; m′, n′) = H[δ(m − m′, n − n′)]

PSF: the impulse response is called the Point Spread Function when input and output represent a positive quantity.

Output of any linear system can be obtained from its impulse response and the input by applying the superposition rule:

    y(m, n) = H[x(m, n)]
            = H[ Σ_{m′} Σ_{n′} x(m′, n′) δ(m − m′, n − n′) ]
            = Σ_{m′} Σ_{n′} x(m′, n′) H[δ(m − m′, n − n′)]
            = Σ_{m′} Σ_{n′} x(m′, n′) h(m, n; m′, n′)
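A sketch verifying the superposition rule for a shift-invariant system on toy arrays (assumes SciPy is available for the reference convolution):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)
x = rng.random((5, 5))
h = rng.random((3, 3))             # impulse response, zero outside support

# Superposition: add h, shifted to (m', n') and weighted by x(m', n').
y = np.zeros((x.shape[0] + h.shape[0] - 1, x.shape[1] + h.shape[1] - 1))
for mp in range(x.shape[0]):
    for np_ in range(x.shape[1]):
        y[mp:mp + h.shape[0], np_:np_ + h.shape[1]] += x[mp, np_] * h

assert np.allclose(y, convolve2d(x, h))   # equals the convolution sum
```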

Shift invariant (or spatially invariant) system: translation of the input causes translation of the output.

    H[δ(m, n)] = h(m, n; 0, 0)

By definition

    h(m, n; m′, n′) ≜ H[δ(m − m′, n − n′)]
    h(m, n; m′, n′) = h(m − m′, n − n′)

Convolution: for a shift invariant system the output becomes

    y(m, n) = Σ_{m′=−∞}^{∞} Σ_{n′=−∞}^{∞} h(m − m′, n − n′) x(m′, n′)

7 Fourier Transform

The Fourier transform of a complex function f(x) is defined as

    F(ξ) ≜ F[f(x)] = ∫_{−∞}^{∞} f(x) exp(−j2πξx) dx

The inverse Fourier transform of F(ξ) is defined as

    f(x) ≜ F⁻¹[F(ξ)] = ∫_{−∞}^{∞} F(ξ) exp(j2πξx) dξ

The two-dimensional Fourier transform is defined in a similar way:

    F(ξ1, ξ2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) exp[−j2π(xξ1 + yξ2)] dx dy

Properties of the Fourier Transform:

1. Spatial frequencies
   If f(x, y) is luminance and x, y the spatial coordinates, then ξ1, ξ2 are the spatial frequencies that represent luminance changes with respect to spatial distances.

2. Uniqueness
   For continuous functions, f(x, y) and F(ξ1, ξ2) are uniquely determined by one another.

3. Separability
   By definition the kernel of the FT is separable, so it can be written as a separable transform in x and y:

       F(ξ1, ξ2) = ∫_{−∞}^{∞} exp(−j2πxξ1) [ ∫_{−∞}^{∞} f(x, y) exp(−j2πyξ2) dy ] dx

4. Frequency response and eigenfunctions of shift invariant systems
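Property 3 carries over to the discrete transform; a one-line check with NumPy's FFT (my example, not from the notes): transforming every column and then every row equals the full 2-D transform.

```python
import numpy as np

a = np.random.default_rng(2).random((8, 8))
two_pass = np.fft.fft(np.fft.fft(a, axis=0), axis=1)
assert np.allclose(two_pass, np.fft.fft2(a))
```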
8 Image Perception

Light is the electromagnetic radiation that stimulates our visual response. Light received from an object can be written as

    I(λ) = ρ(λ) L(λ)

where ρ(λ) represents the reflectivity or transmissivity of the object and L(λ) is the incident energy distribution.

Human photoreceptors:

    Property        Rods                  Cones
    Number          100 million           6.5 million
    Vision          scotopic              photopic
    Color           no color              color vision
    Nerves          one per group         one per cone
    Concentration   away from the fovea   near the fovea

Luminance (or intensity) of a spatially distributed object with light distribution I(x, y, λ) is defined as

    f(x, y) = ∫_0^∞ I(x, y, λ) V(λ) dλ

where V(λ) is the relative luminous efficiency function of the visual system. For the human eye V(λ) is a bell-shaped curve.

Brightness is the perceived luminance and depends on the luminance of the surround. Objects with the same luminance can have different brightness.

Simultaneous Contrast: since our perception is sensitive to luminance contrast rather than to absolute values, two squares of the same luminance embedded in surrounds of different darkness will appear to have different luminance.
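A minimal sketch that synthesizes the classic simultaneous-contrast test image (the particular gray values are arbitrary choices, not from the notes):

```python
import numpy as np

def contrast_pair(size=100, patch=40, gray=128):
    """Two identical mid-gray squares, one on a dark surround, one on a
    light surround: same luminance, different apparent brightness."""
    dark = np.full((size, size), 32, dtype=np.uint8)
    light = np.full((size, size), 224, dtype=np.uint8)
    lo = (size - patch) // 2
    for surround in (dark, light):
        surround[lo:lo + patch, lo:lo + patch] = gray
    return np.hstack([dark, light])

img = contrast_pair()
```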

Weber's Law: if the luminance f_o of an object is just noticeably different from the luminance of its surround f_s, then

    |f_s − f_o| / f_o = constant

When f_o = f and f_s = f + Δf we can say

    Δf / f = d(log f) = constant

The value of the constant is found to be about 0.02.

Mach Bands: this effect shows that brightness is not a monotonic function of luminance. Consider any two adjacent bars of different gray levels; the apparent brightness is not uniform along the width of a bar. The gray-level transition at the bar boundary appears brighter on the darker side and darker on the lighter side. The overshoot and undershoot illustrate the Mach band effect. (ref. page 75 of the book)
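A small numeric illustration (assuming the 0.02 Weber fraction): luminances that are exactly one just-noticeable difference apart form a geometric sequence, i.e. equal steps in log f.

```python
import numpy as np

f0, weber = 100.0, 0.02
jnd_levels = f0 * (1 + weber) ** np.arange(5)
# -> [100., 102., 104.04, 106.12, 108.24] (approximately)
assert np.allclose(np.diff(np.log(jnd_levels)), np.log(1.02))
```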

Modulation Transfer Function (MTF), also known as spatial frequency response, is a metric which characterizes the sharpness of a photographic imaging system or of a component of the system (lens, film, image sensor, scanner, enlarging lens, etc.).

9 Image Sampling and Quantization

Bandlimited signal: a signal is said to be bandlimited if all of its frequency components are zero above a certain finite frequency. A function f(x, y) is bandlimited if its Fourier transform F(ξ1, ξ2) is zero outside a bounded region in the frequency plane, that is

    F(ξ1, ξ2) = 0 for |ξ1| > ξx0 and |ξ2| > ξy0

The quantities ξx0 and ξy0 are called the x and y bandwidths of the image.
Fourier transform of an arbitrary sampled function is a scaled, periodic replication of the Fourier transform of the original function.

Ideal image sampling function is a two-dimensional infinite array of Dirac delta functions situated on a rectangular grid with spacing Δx, Δy:

    comb(x, y; Δx, Δy) = Σ_{m,n=−∞}^{∞} δ(x − mΔx, y − nΔy)

Sampled image is defined as

    fs(x, y) = f(x, y) comb(x, y; Δx, Δy)
             = Σ_{m,n=−∞}^{∞} f(mΔx, nΔy) δ(x − mΔx, y − nΔy)        (4)

The Fourier transform of a comb function with spacing Δx, Δy is another comb function with spacing (1/Δx, 1/Δy), namely

    COMB(ξ1, ξ2) = F{comb(x, y; Δx, Δy)}
                 = ξxs ξys Σ_{k,l=−∞}^{∞} δ(ξ1 − kξxs, ξ2 − lξys)
                 = ξxs ξys comb(ξ1, ξ2; ξxs, ξys)                     (5)

where ξxs = 1/Δx and ξys = 1/Δy. Finally, the Fourier transform of the sampled image fs(x, y) is given by the convolution

    Fs(ξ1, ξ2) = F(ξ1, ξ2) ~ COMB(ξ1, ξ2)
               = ξxs ξys Σ_{k,l=−∞}^{∞} F(ξ1, ξ2) ~ δ(ξ1 − kξxs, ξ2 − lξys)
               = ξxs ξys Σ_{k,l=−∞}^{∞} F(ξ1 − kξxs, ξ2 − lξys)      (6)

Therefore the Fourier transform of a sampled image is, within a scalar factor, a periodic replication of the Fourier transform of the input image on a grid whose spacing is (ξxs, ξys).
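Equation (6) is easy to see numerically in one dimension; a sketch with example frequencies of my choosing: a 60 Hz sine sampled at 100 Hz (below 2 · 60 Hz) has a spectral replica land inside the baseband at |60 − 100| = 40 Hz.

```python
import numpy as np

fs, f0, n = 100.0, 60.0, 1024      # sampling rate, signal frequency, samples
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f0 * t)

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n, d=1 / fs)
print(freqs[np.argmax(spectrum)])  # ~40.0 Hz, not 60 Hz: aliasing
```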
Reconstruction of the image from its samples: if the x, y sampling frequencies are greater than twice the bandwidths, that is ξxs > 2ξx0, ξys > 2ξy0, or equivalently Δx < 1/(2ξx0), Δy < 1/(2ξy0), then F(ξ1, ξ2) can be reconstructed by a low-pass filter with frequency response

    H(ξ1, ξ2) = { 1/(ξxs ξys)  if (ξ1, ξ2) ∈ R
                { 0            otherwise

where R is any region whose boundary ∂R is contained within the annular ring between the rectangles R1 and R2.

Bayer filter mosaic is a color filter array for arranging RGB color filters on a square grid of photosensors. The filter pattern is 50% green, 25% red and 25% blue to mimic the physiology of the human eye⁵. Bryce Bayer's patent in 1976 called the green photosensors luminance-sensitive elements and the red and blue ones chrominance-sensitive elements.

⁵ The retina has more rod cells than cone cells and rod cells are most sensitive to green light.

Aliasing: the wraparound error that appears when the sampling frequencies are below these Nyquist rates, so that the spectral replicas in (6) overlap.
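A sketch of the sampling grid described above, assuming one common phase of the tiling (RGGB; other phase choices exist). Each photosite is marked with the color it samples: half green, a quarter red, a quarter blue.

```python
import numpy as np

def bayer_pattern(rows, cols):
    pattern = np.empty((rows, cols), dtype='<U1')
    pattern[0::2, 0::2] = 'R'
    pattern[0::2, 1::2] = 'G'
    pattern[1::2, 0::2] = 'G'
    pattern[1::2, 1::2] = 'B'
    return pattern

print(bayer_pattern(4, 4))
# [['R' 'G' 'R' 'G']
#  ['G' 'B' 'G' 'B']
#  ['R' 'G' 'R' 'G']
#  ['G' 'B' 'G' 'B']]
```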

9.1 References

• A. K. Jain - Fundamentals of Digital Image Processing.
• Gonzalez - Digital Image Processing.
• Netravali & Haskell - Digital Pictures.
• J.-R. Ohm - Multimedia Communication Technology.
• Bovik - Handbook of Image and Video Processing.
• Sayood - Data Compression.
• NPTEL notes.