Image Processing Module 1 Notes
Module 1
Digital Image Fundamentals
1 Digital Image
An image may be defined as a two-dimensional function, f (x, y), where x and y are spatial (plane) coordi-
nates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity of the image at that
point. When x, y, and the intensity values of f are all finite, discrete quantities, we call the image a digital
image.
A digital image is composed of a finite number of elements, each of which has a particular location and
value. These elements are called picture elements, image elements, pels, and pixels.
A pixel is the smallest resolvable unit of a digital image or display.
There are no clear-cut boundaries in the continuum from image processing at one end to computer vision
at the other. However, one useful paradigm is to consider three types of computerized processes in this
continuum: low-, mid-, and high-level processes.
Low-level processes involve primitive operations such as image preprocessing to reduce noise, contrast
enhancement, and image sharpening. A low-level process is characterized by the fact that both its inputs
and outputs are images.
Mid-level processing of images involves tasks such as segmentation (partitioning an image into regions
or objects), description of those objects to reduce them to a form suitable for computer processing, and
classification (recognition) of individual objects. A mid-level process is characterized by the fact that its
inputs generally are images, but its outputs are attributes extracted from those images (e.g., edges, contours, and the identity of individual objects).
High-level processing involves making sense of an ensemble of recognized objects, as in image analysis,
and, at the far end of the continuum, performing the cognitive functions normally associated with human
vision.
On the other hand, there are fields such as computer vision whose ultimate goal is to use computers
to emulate human vision, including learning and being able to make inferences and take actions based on
visual inputs. This area itself is a branch of artificial intelligence (AI) whose objective is to emulate human
intelligence. The field of AI is in its earliest stages of infancy in terms of development, with progress having
been much slower than originally anticipated. The area of image analysis (also called image understanding)
is in between image processing and computer vision.
1. Image acquisition: Acquisition could be as simple as being given an image that is already in digital form. Generally, the image acquisition stage involves preprocessing, such as scaling.
2. Image enhancement is the process of manipulating an image so the result is more suitable than the
original for a specific application. (The word specific is important; for example, a method that is quite
useful for enhancing X-ray images may not be the best approach for enhancing satellite images taken
in the infrared band of the electromagnetic spectrum.)
There is no general theory of image enhancement. When an image is processed for visual interpretation,
the viewer is the ultimate judge of how well a particular method works.
3. Image restoration is an area that also deals with improving the appearance of an image. However, unlike enhancement, which is subjective, image restoration is objective, in the sense that restoration techniques tend to be based on mathematical or probabilistic models of image degradation.
10. Image pattern classification: the process that assigns a label (e.g., vehicle) to an object based on its feature descriptors.
Methods of image pattern classification range from classical approaches, such as minimum-distance, correlation, and Bayes classifiers, to more modern approaches implemented using deep neural networks, such as convolutional neural networks (a minimum-distance sketch follows this list).
11. Knowledge: Prior knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.
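To make the classical minimum-distance approach mentioned under item 10 concrete, here is a minimal Python sketch (not from the text); the class names, prototype vectors, and feature values are invented for illustration. Each feature vector is assigned the label of the nearest class prototype in Euclidean distance.

import numpy as np

# Hypothetical prototype (mean) feature vectors for two classes.
prototypes = {
    "vehicle": np.array([0.8, 0.2]),
    "pedestrian": np.array([0.1, 0.9]),
}

def minimum_distance_classify(features):
    # Assign the label of the closest prototype (Euclidean distance).
    return min(prototypes, key=lambda label: np.linalg.norm(features - prototypes[label]))

print(minimum_distance_classify(np.array([0.7, 0.3])))   # prints "vehicle"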
The trend continues toward miniaturizing and blending of general-purpose small computers with special-
ized image processing hardware and software.
Figure 3 shows the basic components comprising a typical general-purpose system used for digital image
processing.
1. Image Sensor
Two subsystems are required to acquire digital images.
(a) Sensor: a physical device that responds to the energy radiated by the object we wish to image. For instance, in a digital video camera, the sensors (CCD chips) produce an electrical output proportional to light intensity.
(b) Digitizer: a device for converting the output of the physical sensing device into digital form. In the video-camera example, the digitizer converts the sensor outputs to digital data.
2. Specialized image processing hardware
usually consists of the digitizer just mentioned, plus hardware that performs other primitive operations, such as an arithmetic logic unit (ALU). A typical use of the ALU is averaging images as quickly as they are digitized, for the purpose of noise reduction. This type of hardware sometimes is called a front-end subsystem, and its most distinguishing characteristic is speed. In other words, this unit performs functions that require fast data throughputs (e.g., digitizing and averaging video images at 30 frames/s) that the typical main computer cannot handle.
3. The computer
can range from a PC to a supercomputer
• In dedicated applications, sometimes custom computers are used to achieve a required level of
performance
4. Software
for image processing consists of specialized modules that perform specific tasks
• Sophisticated software packages allow the integration of those modules and general-purpose software commands, e.g., OpenCV, Scikit-image, Matplotlib, and more.
5. Mass Storage
An image of size 1024 x 1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires
one megabyte of storage space if the image is not compressed. When dealing with image databases
that contain thousands, or even millions, of images, providing adequate storage in an image processing
system can be a challenge.
Digital storage for image processing applications falls into three principal categories: (1) short-term storage for use during processing; (2) on-line storage for relatively fast recall; and (3) archival storage, characterized by infrequent access (a short storage calculation follows this list).
6. Image Displays
in the form of
(a) Monitors: driven by the outputs of image and graphics display cards that are an integral part
of the computer system
(b) Stereo displays: implemented in the form of headgear containing two small displays embedded
in goggles worn by the user.
7. Hardcopy devices
for recording images
• laser printers, film cameras, heat-sensitive devices, ink-jet units, and digital units, such as optical and CD-ROM disks.
8. Networking and cloud communication
for transmitting images to remote sites
• Because of the large amount of data inherent in image processing applications, the key consider-
ation in image transmission is bandwidth. In dedicated networks, this typically is not a problem
• but communications with remote sites via the internet are not always as efficient.
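As a rough check of the storage figure quoted under item 5 (Mass Storage), the short sketch below computes the uncompressed size of one 1024 × 1024, 8-bit image and of a hypothetical database of one million such images; the database size is an assumption made only for illustration.

M, N, k = 1024, 1024, 8             # rows, columns, bits per pixel
bits_per_image = M * N * k           # b = M * N * k
bytes_per_image = bits_per_image // 8
print(bytes_per_image)               # 1048576 bytes = 1 megabyte (2**20 bytes)

num_images = 1_000_000               # hypothetical database size
print(num_images * bytes_per_image / 2**40, "TiB")   # about 0.95 TiB uncompressed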
The eye is nearly a sphere (with a diameter of about 20 mm) enclosed by three membranes
1. The cornea and sclera outer cover
• The cornea is a tough, transparent tissue that covers the front surface of the eye
• Continuous with the cornea, the sclera is an opaque membrane that encloses the remainder of the
optic globe.
2. The choroid,the ciliary body and the iris along with the lens
• The choroid lies directly below the sclera. This membrane contains a network of blood vessels
that serve as the major source of nutrition to the eye.
• The choroid coat is heavily pigmented, which helps reduce the amount of extraneous light entering the eye and the backscatter within the optic globe.
• At its front end, the choroid is divided into the ciliary body and the iris
• The iris contracts or expands to control the amount of light that enters the eye.
• The lens consists of concentric layers of fibrous cells and is suspended by fibers that attach to the
ciliary body. It is composed of 60% to 70% water
3. The retina is the innermost membrane
Figure 5 shows the density of rods and cones for a cross section of the right eye. The absence of receptors
due to passing of the optic nerve from the eye, causes the blind spot. Except for this region, the distribution
of receptors is radially symmetric about the fovea.
Figure 6: Graphical representation of the eye looking at a palm tree. Point C is the focal center of the lens.
In an ordinary photographic camera, the lens has a fixed focal length. Focusing at various distances is
achieved by varying the distance between the lens and the imaging plane, where the film (or imaging chip
in the case of a digital camera) is located.
In the human eye, the converse is true; the distance between the center of the lens and the imaging sensor
(the retina) is fixed [approximately 17 mm], and the focal length needed to achieve proper focus is obtained
by varying the shape of the lens.
The fibers in the ciliary body accomplish this by flattening or thickening the lens for distant or near
objects, respectively. The range of focal lengths is approximately 14 mm to 17 mm, the latter taking place
when the eye is relaxed and focused at distances greater than about 3 m.
For example, suppose that a person is looking at a tree 15 m high at a distance of 100 m. Letting h
denote the height of that object in the retinal image, the geometry of Fig. 6 yields
15/100 = h/17
h ≈ 2.55 mm
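The similar-triangles relation used above (object height / object distance = retinal image height / 17 mm) can be checked numerically with the small sketch below; the function name is just illustrative.

def retinal_image_height_mm(object_height_m, object_distance_m, lens_to_retina_mm=17.0):
    # Similar triangles: object_height / object_distance = h / (lens-to-retina distance)
    return object_height_m / object_distance_m * lens_to_retina_mm

print(retinal_image_height_mm(15, 100))   # 2.55 mm for a 15 m tree seen from 100 m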
• The ability of the eye to discriminate between changes in light intensity at any specific adaptation level can be determined by a classic experiment of having a subject look at a flat, uniformly illuminated area large enough to occupy the entire field of view.
• This area typically is a diffuser, such as opaque glass, illuminated from behind by a light source of intensity I.
• To this field an increment of illumination, ∆I, is added in the form of a short-duration flash that appears as a circle in the center of the uniformly illuminated field.
• If ∆I is not bright enough, there will be no perceivable change.
Figure 9: An example of digital image acquisition. (a) Illumination (energy) source. (b) A scene. (c) Imaging
system. (d) Projection of the scene onto the image plane. (e) Digitized image.
Since the values of an image generated are proportional to energy radiated by a physical source (e.g.,
electromagnetic waves), f (x, y) must be nonnegative and finite; that is,
0 ≤ f (x, y) < ∞
The function f (x, y) is characterized by two components:
1. The amount of source illumination incident on the scene, the illumination component i(x, y)
2. The amount of illumination reflected by the objects in the scene, the reflectance component r(x, y)
The two components combine as a product, f (x, y) = i(x, y) r(x, y), where 0 < i(x, y) < ∞ and 0 < r(x, y) < 1 (the latter ranging from total absorption to total reflectance).
The intensity of a monochrome image at any coordinates (x, y) is called the gray level, l, of the image at that point:
l = f (x, y) (4)
From these bounds, it is evident that l lies in the range
Lmin ≤ l ≤ Lmax
In practice, Lmin = imin rmin and Lmax = imax rmax.
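A minimal sketch of the illumination-reflectance model, using made-up illumination and reflectance arrays (the value ranges are assumptions for illustration); it verifies that f = i · r stays within [Lmin, Lmax].

import numpy as np

rng = np.random.default_rng(0)
i = rng.uniform(10.0, 1000.0, size=(4, 4))    # illumination i(x, y), 0 < i(x, y) < infinity
r = rng.uniform(0.01, 0.95, size=(4, 4))      # reflectance r(x, y), 0 < r(x, y) < 1
f = i * r                                     # f(x, y) = i(x, y) * r(x, y)

Lmin, Lmax = i.min() * r.min(), i.max() * r.max()   # bounds over this sample
print(f.min() >= Lmin, f.max() <= Lmax)             # True True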
Figure 10: (a) Continuous image. (b) A scan line showing intensity variations along line AB in the continuous
image. (c) Sampling and quantization. (d) Digital scan line. (The black border in (a) is included for clarity.
It is not part of the image).
Figure 10 shows a continuous image f that we want to convert to digital form. An image is continuous
with respect to the x- and y-coordinates, and also in amplitude. To digitize it, we have to sample the function
in both coordinates and also in amplitude.
Digitizing the coordinate values is called sampling.
Digitizing the amplitude values is called quantization.
• However, the values of the samples still span (vertically) a continuous range of intensity values.
• In order to form a digital function, the intensity values also must be converted (quantized) into discrete
quantities.
• The vertical gray bar in Figure 10c depicts the intensity scale divided into eight discrete intervals,
ranging from black to white. The vertical tick marks indicate the specific value assigned to each of the
eight intensity intervals.
• The continuous intensity levels are quantized by assigning one of the eight values to each sample, depending on the vertical proximity of a sample to a vertical tick mark (a small code sketch follows this list).
• The digital samples resulting from both sampling and quantization are shown as white squares in
Figure 10d
• Starting at the top of the continuous image and carrying out this procedure downward, line by line,
produces a two-dimensional digital image.
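The sketch below (referenced in the list above) illustrates sampling and quantization on a made-up one-dimensional intensity profile: the profile is sampled at equally spaced positions and each sample is rounded to the nearest of L = 8 discrete levels, in the spirit of the eight intervals in Figure 10(c).

import numpy as np

# A made-up "continuous" intensity profile along one scan line, values in [0, 1].
x = np.linspace(0.0, 1.0, 1000)
intensity = 0.5 + 0.4 * np.sin(6 * np.pi * x)

samples = intensity[::50]                          # sampling: digitize the coordinate values
L = 8
levels = np.round(samples * (L - 1)).astype(int)   # quantization: digitize the amplitude values
print(levels)                                      # integers in the range [0, L - 1]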
Thus the digital image f (x, y), resulting from sampling and quantization is a 2D matrix of real numbers,
which has M rows and N columns.
For notational clarity and convenience, we use integer values for these discrete coordinates: x = 0, 1,
2,. . . ,M -1 and y = 0, 1, 2,. . . , N -1.
The value of the digital image at the origin is f (0, 0), and its value at the next coordinates along the first row is f (0, 1).
The value of a digital image at any coordinates (x, y) is denoted f (x, y). The set of coordinates spanned by an image is called the spatial domain, with x and y being referred to as spatial variables or spatial coordinates.
Using this notation, the digital image is represented in matrix form as

f (x, y) = [ f (0, 0)        f (0, 1)        . . .   f (0, N − 1)
             f (1, 0)        f (1, 1)        . . .   f (1, N − 1)
             . . .           . . .                   . . .
             f (M − 1, 0)    f (M − 1, 1)    . . .   f (M − 1, N − 1) ]
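In this convention x indexes rows and y indexes columns, which matches the (row, column) indexing of a NumPy array; the tiny sketch below uses a made-up 3 × 4 image.

import numpy as np

f = np.array([[10, 20, 30, 40],
              [50, 60, 70, 80],
              [90, 15, 25, 35]])      # M = 3 rows, N = 4 columns (values are arbitrary)

M, N = f.shape
print(M, N)               # 3 4
print(f[0, 0], f[0, 1])   # value at the origin, and the next value along the first row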
Image digitization requires that decisions be made regarding the values for M, N, and for the number, L,
of discrete intensity levels.
There are no restrictions placed on M and N, other than they have to be positive integers. However, digital
storage and quantizing hardware considerations usually lead to the number of intensity levels, L, being an
integer power of two; that is
L = 2^k (5)
where k is an integer. We assume that the discrete levels are equally spaced and that they are integers in the range [0, L − 1]. Sometimes, the range of values spanned by the gray scale is referred to as the dynamic range.
The number, b, of bits required to store a digital image is
b = M × N × k (6)
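A short sketch of Eqs. (5) and (6): it prints the number of intensity levels L = 2^k for a few values of k, and the number of bits b needed to store an M × N image with k bits per pixel (the image sizes are illustrative).

def num_levels(k):
    return 2 ** k            # Eq. (5): L = 2^k

def bits_required(M, N, k):
    return M * N * k         # Eq. (6): b = M * N * k

for k in (1, 8, 10):
    print(k, num_levels(k))              # 1 -> 2 levels, 8 -> 256, 10 -> 1024

print(bits_required(1024, 1024, 8))      # 8388608 bits, i.e., one megabyte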
Figure 11: (a) Image plotted as a surface. (b) Image displayed as a visual intensity array. (c) Image shown
as a 2-D numerical array. (The numbers 0, .5, and 1 represent black, gray, and white, respectively.)
The representation in Figure 11a is useful when working with grayscale sets whose elements are expressed as triplets of the form (x, y, z), where x and y are spatial coordinates and z is the value of f at coordinates (x, y).
The representation in Figure 11b is more common, and it shows f (x, y) as it would appear on a computer
display or photograph.
Figure 11c shows the third representation: an array (matrix) composed of the numerical values of f (x, y).
These neighbors, together with the 4-neighbors, are called the 8-neighbors of p, denoted by N8(p).
5.2 Adjacency
Let V be the set of intensity values used to define adjacency. For example, if we are dealing with the
adjacency of pixels whose values are in the range 0 to 255, set V could be any subset of these 256 values.
We consider three types of adjacency:
1. 4-adjacency:
Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
2. 8-adjacency:
Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
3. m-adjacency:
(also called mixed adjacency). Two pixels p and q with values from V are m-adjacent if
(a) q is in N4(p), or
(b) q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.
(A small code sketch of these definitions follows this list.)
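The sketch below (referenced in the list above) implements the neighborhood sets and the m-adjacency test in plain Python; the helper names and the small binary test image are made up, with V = {1}.

def n4(p):
    # 4-neighbors of pixel p = (row, col)
    r, c = p
    return {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}

def nd(p):
    # diagonal neighbors of p
    r, c = p
    return {(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)}

def n8(p):
    # 8-neighbors: the 4-neighbors together with the diagonal neighbors
    return n4(p) | nd(p)

def m_adjacent(p, q, in_V):
    # p and q (both with values in V) are m-adjacent if q is in N4(p),
    # or q is in ND(p) and N4(p) ∩ N4(q) contains no pixel with a value in V.
    if not (in_V(p) and in_V(q)):
        return False
    if q in n4(p):
        return True
    return q in nd(p) and not any(in_V(s) for s in n4(p) & n4(q))

# Made-up 3x3 binary image, V = {1}; pixels are addressed as (row, col).
img = [[0, 1, 1],
       [0, 1, 0],
       [0, 0, 1]]
in_V = lambda p: 0 <= p[0] < 3 and 0 <= p[1] < 3 and img[p[0]][p[1]] == 1

print(m_adjacent((0, 1), (1, 1), in_V))   # True: q is a 4-neighbor of p
print(m_adjacent((1, 1), (2, 2), in_V))   # True: diagonal, and no shared 4-neighbor is in V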
5.3 Path
A digital path (or curve) from pixel p with coordinates (x0, y0) to pixel q with coordinates (xn, yn) is a sequence of distinct pixels with coordinates (x0, y0), (x1, y1), . . . , (xn, yn), where points (xi, yi) and (xi−1, yi−1) are adjacent for 1 ≤ i ≤ n. We can define 4-, 8-, or m-paths, depending on the type of adjacency specified.
5.4 Connectivity
Let S represent a subset of pixels in an image. Two pixels p and q are said to be connected in S if there
exists a path between them consisting entirely of pixels in S. For any pixel p in S, the set of pixels that are
connected to it in S is called a connected component of S. If it only has one component, and that component
is connected, then S is called a connected set.
5.5 Region
Let R represent a subset of pixels in an image. We call R a region of the image if R is a connected set.
Two regions, Ri and Rj are said to be adjacent if their union forms a connected set. Regions that are not
adjacent are said to be disjoint.
5.6 Boundary
The boundary (also called the border or contour) of a region R is the set of pixels in R that are adjacent to
pixels in the complement of R. Stated another way, the border of a region is the set of pixels in the region
that have at least one background neighbor.
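A small sketch of the boundary definition, assuming 4-adjacency to the background: a region pixel belongs to the border if at least one of its 4-neighbors lies in the complement of R. The binary test image is made up, and the region is assumed not to touch the array edge so no padding is needed.

import numpy as np

R = np.array([[0, 0, 0, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 1, 1, 1, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 0, 0, 0]], dtype=np.uint8)

border = np.zeros_like(R)
for r in range(1, R.shape[0] - 1):
    for c in range(1, R.shape[1] - 1):
        if R[r, c] == 1:
            # A region pixel with at least one background 4-neighbor is a border pixel.
            if min(R[r - 1, c], R[r + 1, c], R[r, c - 1], R[r, c + 1]) == 0:
                border[r, c] = 1

print(border)   # the 3x3 block of ones minus its center pixel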
5.7 Edge
Edges in an image represent the measure of gray-level discontinuity at a point.
5.8.2 D4 distance
(called the city-block distance) between p = (x, y) and q = (u, v) is defined as
D4(p, q) = |x − u| + |y − v| (9)
In this case, pixels having a D4 distance from (x, y) that is less than or equal to some value d form a diamond centered at (x, y). For example, the pixels with D4 distance ≤ 2 from (x, y) (the center point) form the following contours of constant distance:

        2
      2 1 2
    2 1 0 1 2
      2 1 2
        2
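A tiny sketch of the city-block distance: the function below computes D4 between p = (x, y) and q = (u, v), and the grid of D4 distances from the center of a 5 × 5 array reproduces the diamond-shaped contours shown above.

import numpy as np

def d4(p, q):
    # City-block (D4) distance: |x - u| + |y - v|
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

grid = np.array([[d4((x, y), (2, 2)) for y in range(5)] for x in range(5)])
print(grid)
# [[4 3 2 3 4]
#  [3 2 1 2 3]
#  [2 1 0 1 2]
#  [3 2 1 2 3]
#  [4 3 2 3 4]]
# Keeping only the entries <= 2 gives the diamond of constant-distance contours.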
5.8.3 D8 distance
(called the chessboard distance) between p and q is defined as