Digital Image Processing


 An image may be defined as a two-dimensional function, f(x, y), where x and y are spatial (plane) coordinates, and the
amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point.
 When x, y, and the intensity values of f are all finite, discrete quantities, we can call the image a digital image.
 The field of digital image processing refers to processing digital images by means of a digital computer.
 A digital image is composed of a finite number of elements, each of which has a particular location and value. These
elements are called picture elements, image elements, pels and pixels. Pixels are the basic building blocks of a digital image or
display and are created using geometric coordinates.
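
As a minimal illustration of this definition, the sketch below (Python with NumPy; the array values are made up) stores a small digital image as a 2-D array whose entry at (x, y) is the intensity f(x, y).

```python
import numpy as np

# A tiny 4x4 grayscale digital image: f[x, y] holds the intensity (0-255)
# at spatial coordinates (x, y). All values are finite and discrete, so
# this array is a digital image in the sense defined above.
f = np.array([
    [ 12,  50,  50,  12],
    [ 50, 200, 200,  50],
    [ 50, 200, 200,  50],
    [ 12,  50,  50,  12],
], dtype=np.uint8)

x, y = 1, 2                               # one pixel location (row x, column y)
print("intensity f(1, 2) =", f[x, y])     # -> 200
print("image size:", f.shape, "pixels")   # -> (4, 4)
```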

Classification of DIP techniques

 Low-level processes: These involve primitive operations such as image preprocessing to reduce noise, contrast enhancement,
and image sharpening. Both the inputs and the outputs are images.
 Mid-level processes: These involve tasks such as segmentation (partitioning an image into regions or objects), description of
those objects to reduce them to a form suitable for computer processing, and classification (recognition) of individual objects.
The inputs are generally images, but the outputs are attributes extracted from those images (e.g., edges, contours, and the
identity of individual objects).
 High-level processes: These involve ‘making sense’ of an ensemble of recognized objects, as in image analysis, and, at the far
end of the continuum, performing the cognitive functions normally associated with vision (artificial intelligence).

Fundamental steps in DIP

1. Image Acquisition: The image is captured by a sensor and digitized. If the output of the camera or sensor is not already in
digital form, an analog-to-digital converter (ADC) digitizes it. Acquisition may also involve preprocessing, such as scaling.
Example: capturing an image with a camera.
2. Image Enhancement: brings out detail that is obscured, or simply highlights certain features of interest in an image.
Example: increasing the contrast of an image because ‘it looks better’.
3. Image Restoration: aims to "compensate for" or "undo" defects that degrade an image. Restoration techniques tend to be
based on mathematical or probabilistic models of image degradation. Degradation comes in many forms, such as motion blur,
noise, and camera misfocus.
4. Color Image Processing: uses the color of the image to extract features of interest. This may include color modeling and
processing in the digital domain.
5. Wavelets and Multiresolution Processing: the foundation for representing images in various degrees of resolution. In
particular, this is used for image data compression and for pyramidal representation, in which images are subdivided successively
into smaller regions.
6. Compression: deals with techniques for reducing the storage required to save an image, or the bandwidth required to
transmit it. Image compression is familiar to most computer users in the form of image file extensions, such as the .jpg
extension used in the JPEG image compression standard.
7. Morphological Processing: provides tools for extracting image components that are useful in the representation and
description of shape.
8. Image Segmentation: The computer tries to separate objects from the image background. The image is partitioned into
segments so that more accurate image attributes can be extracted. It is one of the most difficult tasks in DIP. If segmentation
is accurate (no two segments should carry the same information), the subsequent representation and description of the image
will be accurate; erratic or coarse segmentation will not give accurate results.
9. Representation and description: The first decision that must be made is whether the data should be represented as a
boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics,
such as corners and inflections. Regional representation is appropriate when the focus is on internal properties, such as texture
or skeletal shape. Choosing a representation is only part of the solution for transforming raw data into a form suitable for
subsequent computer processing. Description, also called feature selection, deals with extracting attributes that result in some
quantitative information of interest or are basic for differentiating one class of objects from another.
10. Recognition: It is the process that assigns a label (e.g., “vehicle”) to an object based on its descriptors.
11. Knowledge Base: It encodes prior knowledge about the problem domain and guides the operation of, and interaction
between, all of the other processing modules.
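
To make a couple of these steps concrete, here is a minimal sketch in Python/NumPy using a synthetic image in place of a real sensor. Contrast stretching (for step 2, enhancement) and global thresholding (for step 8, segmentation) are just one simple choice for each step, and the function names are illustrative rather than standard.

```python
import numpy as np

def stretch_contrast(img):
    """Step 2 (enhancement): linearly stretch intensities to the full 0-255 range."""
    lo, hi = img.min(), img.max()
    if hi == lo:                       # flat image: nothing to stretch
        return img.copy()
    return ((img.astype(np.float64) - lo) * 255.0 / (hi - lo)).astype(np.uint8)

def threshold_segment(img, t=128):
    """Step 8 (segmentation): separate objects from background with a global threshold."""
    return (img >= t).astype(np.uint8)   # 1 = object, 0 = background

# Stand-in for step 1 (acquisition): a low-contrast synthetic image.
img = np.array([[ 90, 100, 100,  90],
                [100, 140, 140, 100],
                [100, 140, 140, 100],
                [ 90, 100, 100,  90]], dtype=np.uint8)

enhanced = stretch_contrast(img)       # intensities now span 0..255
mask = threshold_segment(enhanced)     # binary object mask
print(enhanced)
print(mask)
```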

Components of Image Processing System

1. Image Sensor: With reference to sensing, two elements are required to acquire digital images. The first is a physical device
that is sensitive to the energy radiated by the object we wish to image. The second, called a digitizer, is a device for converting
the output of the physical sensing device into digital form. For instance, in a digital video camera, the sensors produce an
electrical output proportional to light intensity. The digitizer converts these outputs to digital data.
2. Specialized Image Processing Hardware: It usually consists of the digitizer plus hardware that performs other primitive
operations, such as arithmetic and logic operations (an ALU), e.g., for noise reduction. This type of hardware is sometimes
called a front-end subsystem.
3. Intelligent Processing Machine (Computer): The computer in an image processing system is a general-purpose machine and
can range from a PC to a supercomputer. In such systems, almost any well-equipped PC-type machine is suitable for offline
image processing tasks.
4. Image Processing Software: It consists of specialized modules that perform specific tasks. A well-designed package also
includes the capability for the user to write code that, as a minimum, utilizes the specialized modules.
5. Mass Storage: Mass storage capability is a must in image processing applications. An image of size 1024*1024 pixels, in which
the intensity of each pixel is an 8-bit quantity, requires one megabyte of storage space if the image is not compressed. When
dealing with thousands, or even millions, of images, providing adequate storage in an image processing system can be a
challenge. Digital storage for image processing applications falls into three principal categories:
a. Short-term storage for use during processing
b. On-line storage for relatively fast recall,
c. Archival storage, characterized by infrequent access.
6. Image Displays: The displays in use today are mainly color (preferably flat-screen) TV monitors.
7. Hardcopy: devices for recording images include laser printers, film cameras, heat-sensitive devices, inkjet units, and digital
units, such as optical and CD-ROM disks.
8. Networking: Networking allows a user on one system to access and process images on another system at a different place.
Because image transmission requires high bandwidth, optical fiber and broadband technologies are the better options.
Neighbors of a Pixel

The neighborhood of a pixel is the set of pixels that touch it. Thus, the neighborhood of a pixel can have a maximum of 8
pixels (images are always considered 2D).

1. Any pixel p(x, y) has two vertical and two horizontal neighbors, given by (x+1, y), (x-1, y), (x, y+1), (x, y-1). This set of
pixels is called the 4-neighbors of p, and is denoted by N4(p). Each of these pixels is at unit distance from (x, y). The
4-neighborhood thus consists only of the pixels directly touching p: the pixels above, below, to the left, and to the right.

2. The four diagonal neighbors of p(x, y) are given by (x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1). This set is denoted by
ND(p). This neighborhood consists of the pixels that touch p only at its corners, i.e., the diagonal pixels.

3. The points in ND(p) and N4(p) together are known as the 8-neighbors of the point p, denoted by N8(p). Some of the points
in N4, ND, and N8 may fall outside the image when (x, y) lies on the border of the image.
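
The following sketch (Python) simply enumerates these neighborhoods for a pixel p and, as noted above, discards coordinates that fall outside the image border.

```python
def n4(x, y):
    """4-neighbors N4(p) of pixel p at (x, y)."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    """Diagonal neighbors ND(p) of pixel p."""
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(x, y, rows=None, cols=None):
    """8-neighbors N8(p) = N4(p) + ND(p); drop points outside an image of size rows x cols."""
    pts = n4(x, y) + nd(x, y)
    if rows is not None and cols is not None:
        pts = [(i, j) for i, j in pts if 0 <= i < rows and 0 <= j < cols]
    return pts

# A border pixel keeps only the neighbors that fall inside the image.
print(n8(0, 0, rows=5, cols=5))   # [(1, 0), (0, 1), (1, 1)]
```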

Adjacency and Connectivity

Two pixels (say p and q) are connected if they are adjacent in some sense and their gray levels satisfy some specified
criterion of similarity. In a binary (black and white) image, two neighboring pixels (as defined above) are connected if their values
are the same, i.e., both equal to 0 (black) or 255 (white). In a gray level image, two neighboring pixels are connected if their
values are close to each other, i.e., they both belong to the same subset of similar gray levels: p∈V and q∈V, where V is a subset
of all gray levels in the image.

Let V be the set of intensity values used to define adjacency (e.g., V = {1} in a binary image). Two pixels p and q are said to be:

1. 4-adjacency: two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
2. 8-adjacency: two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
3. m-adjacency (mixed adjacency): two pixels p and q with values from V are m-adjacent if
i. q is in N4(p), or
ii. q is in ND(p) and the set N4(p)∩N4(q) has no pixels whose values are from V.

Mixed adjacency is a modification of 8-adjacency and is used to eliminate the multiple path connections that often
arise when 8-adjacency is used.
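
A sketch of these adjacency tests in Python/NumPy follows; the small n4/nd helpers are repeated from the neighborhood sketch above, and the example image and value set V = {1} are made up.

```python
import numpy as np

def n4(x, y):   # 4-neighbors (repeated for self-containment)
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):   # diagonal neighbors
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def adjacent4(p, q, img, V):
    """4-adjacency: both values in V and q in N4(p)."""
    return img[p] in V and img[q] in V and tuple(q) in n4(*p)

def adjacent8(p, q, img, V):
    """8-adjacency: both values in V and q in N8(p) = N4(p) + ND(p)."""
    return img[p] in V and img[q] in V and tuple(q) in n4(*p) + nd(*p)

def adjacent_m(p, q, img, V):
    """m-adjacency: q in N4(p), or q in ND(p) and N4(p) ∩ N4(q) has no pixel in V."""
    if img[p] not in V or img[q] not in V:
        return False
    if tuple(q) in n4(*p):
        return True
    if tuple(q) in nd(*p):
        common = set(n4(*p)) & set(n4(*q))
        rows, cols = img.shape
        return not any(0 <= i < rows and 0 <= j < cols and img[i, j] in V
                       for i, j in common)
    return False

# Example on a small binary image with V = {1}
img = np.array([[0, 1, 1],
                [1, 1, 0],
                [0, 0, 1]], dtype=np.uint8)
print(adjacent8((0, 1), (1, 0), img, V={1}))   # True: they touch diagonally
print(adjacent_m((0, 1), (1, 0), img, V={1}))  # False: they share 4-neighbor (1, 1) in V
```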

A path (or curve) from pixel p with coordinates (x, y) to pixel q with coordinates (s, t) is a sequence of distinct pixels
(x0, y0), (x1, y1), …, (xn, yn)
where (x0, y0) = (x, y), (xn, yn) = (s, t), and (xi, yi) is adjacent to (xi-1, yi-1) for 1 ≤ i ≤ n.
In this case, n is the length of the path.
If p and q are pixels of an image subset S, then p is connected to q in S if there is a path from p to q consisting entirely
of pixels in S. For any pixel p in S, the set of pixels in S that are connected to p is called a connected component of S. If S
has only one connected component, then S is called a connected set.

Labelling of Connected Components

Refer side notes.
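
Since the side notes are not reproduced here, the block below is only a generic sketch (Python/NumPy) of connected-component labelling by breadth-first flood fill; the value set V and the test image are illustrative.

```python
import numpy as np
from collections import deque

def label_components(img, V=frozenset({1}), conn4=True):
    """Label connected components of pixels whose values are in V.

    Uses breadth-first flood fill; conn4 selects 4- or 8-connectivity.
    Returns an int array where 0 = background and 1, 2, ... are labels.
    """
    rows, cols = img.shape
    labels = np.zeros((rows, cols), dtype=int)
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if not conn4:
        offsets += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    current = 0
    for sx in range(rows):
        for sy in range(cols):
            if img[sx, sy] in V and labels[sx, sy] == 0:
                current += 1                      # start a new component
                queue = deque([(sx, sy)])
                labels[sx, sy] = current
                while queue:
                    x, y = queue.popleft()
                    for dx, dy in offsets:
                        i, j = x + dx, y + dy
                        if (0 <= i < rows and 0 <= j < cols
                                and img[i, j] in V and labels[i, j] == 0):
                            labels[i, j] = current
                            queue.append((i, j))
    return labels

img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1]], dtype=np.uint8)
print(label_components(img))   # two components under 4-connectivity
```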

Distance measures
Given pixels p, q, and z at (x,y), (s,t) and (u,v) respectively, D is a distance function (or metric) if:
(1) D(p,q) ≥ 0 (D(p,q)=0 iff p=q),
(2) D(p,q) = D(q,p), and
(3) D(p,z) ≤ D(p,q) + D(q,z).
• The Euclidean distance between p and q is given by:
De(p, q) = √[(x − s)² + (y − t)²]

The pixels having distance less than or equal to some value r from (x,y) are the points contained in a disk of radius r
centered at (x,y).

• The D4 distance (also called the city block distance) between p and q is given by:
D4(p, q) = |x − s| + |y − t|

The pixels having a D4 distance less than some r from (x,y) form a diamond/rhombus centered at (x,y)
Example: pixels where D4 ≤ 2

        2
      2 1 2
    2 1 0 1 2        (Note: pixels with D4 = 1 are the 4-neighbors of (x, y))
      2 1 2
        2
 The D8 distance (also called the chessboard distance) between p and q is given by:
D8(p, q) = max(|x − s|, |y − t|)

The pixels having a D8 distance less than some r from (x,y) form a square centered at (x,y)

Example: pixels where D8 ≤ 2

2 2 2 2 2
2 1 1 1 2
2 1 0 1 2        (Note: pixels with D8 = 1 are the 8-neighbors of (x, y))
2 1 1 1 2
2 2 2 2 2
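
A short Python sketch of the three distance measures, evaluated for an arbitrary pair of pixel coordinates:

```python
import math

def d_euclidean(p, q):
    """De(p, q) = sqrt((x - s)^2 + (y - t)^2)"""
    (x, y), (s, t) = p, q
    return math.hypot(x - s, y - t)

def d4(p, q):
    """City-block distance: |x - s| + |y - t|"""
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8(p, q):
    """Chessboard distance: max(|x - s|, |y - t|)"""
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

p, q = (2, 3), (5, 7)
print(d_euclidean(p, q))  # 5.0
print(d4(p, q))           # 7
print(d8(p, q))           # 4
```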
Arithmetic/ Logical Operations

 Arithmetic and logic operations on images are used extensively in most image processing applications.
– They may cover the entire image or only a subset of it.

Arithmetic operations between pixels p and q are defined as:

1. Addition: (p+q)
• Used often for image averaging to reduce noise
2. Subtraction: (p-q)
• Used often for static background removal
3. Multiplication: (p*q) (or pq, p×q)
• Used to correct gray-level shading
4. Division: (p÷q) (or p/q)
• Used as in multiplication, e.g., for shading correction

 Logic operations between pixels p and q are defined as:

– AND: p AND q (also p⋅q)


– OR: p OR q (also p+q)
– COMPLEMENT: NOT q (also q’)
Together, these three operations form a functionally complete set.

Logic operations are applicable to binary images, whereas arithmetic operations are applicable to multivalued (gray-level) pixels.
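
The sketch below (Python/NumPy with synthetic data) illustrates the typical uses listed above: averaging to reduce noise, subtraction for background removal, multiplication/division for shading correction, and AND/OR/NOT on binary masks.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.full((4, 4), 100.0)

# Addition: average K noisy observations of the same scene to reduce noise.
frames = [clean + rng.normal(0, 10, clean.shape) for _ in range(16)]
averaged = np.mean(frames, axis=0)          # noise std drops by ~1/sqrt(16)

# Subtraction: remove a static background from the current frame.
background = np.full((4, 4), 40.0)
foreground = np.clip(frames[0] - background, 0, 255)

# Multiplication / division: correct a smooth gray-level shading pattern.
shading = np.linspace(0.5, 1.0, 16).reshape(4, 4)
corrected = frames[0] / shading

# Logic operations apply to binary images (masks).
a = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
b = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(a & b)     # AND
print(a | b)     # OR
print(~a)        # COMPLEMENT (NOT)
```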

Neighborhood-oriented operations
Arithmetic and logical operations may take place on a subset of the image.
– Typically neighborhood oriented
The value of a pixel is determined by the values of its surrounding pixels.
These operations are formulated in the context of mask operations (also called template, window, or filter operations).
Basic concept: let the new value of a pixel be a function of its current gray level and the gray levels of its neighbors (in
some sense).
Consider a 3x3 subset of pixels z1, z2, …, z9 in an image, with z5 at the center.
Suppose we want to filter the image by replacing the value at z5 with the average value of the pixels in the 3x3 region
centered on z5.
Perform an operation of the form:
z = (1/9)(z1 + z2 + … + z9) = (1/9) ∑ zi,  i = 1, …, 9
and assign to z5 the value of z.

In the more general form, the operation may look like:


z = w1 z1 + w2 z2 + … + w9 z9 = ∑ wi zi,  i = 1, …, 9
• This equation is widely used in image processing
• Proper selection of coefficients (weights) allows for operations such as
– noise reduction
– region thinning
– edge detection
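
A small Python/NumPy sketch of this weighted-sum mask operation: it applies a 3x3 averaging mask (all weights wi = 1/9) to the interior pixels of a made-up image; borders are simply left untouched in this sketch.

```python
import numpy as np

def apply_mask(img, w):
    """Replace each interior pixel z5 by sum(w_i * z_i) over its 3x3 neighborhood."""
    rows, cols = img.shape
    out = img.astype(float).copy()
    for x in range(1, rows - 1):              # borders left untouched in this sketch
        for y in range(1, cols - 1):
            region = img[x - 1:x + 2, y - 1:y + 2].astype(float)
            out[x, y] = np.sum(w * region)
    return out

averaging_mask = np.full((3, 3), 1.0 / 9.0)   # all weights 1/9 -> local mean (smoothing)

img = np.array([[10, 10, 10, 10],
                [10, 90, 90, 10],
                [10, 90, 90, 10],
                [10, 10, 10, 10]], dtype=float)
print(apply_mask(img, averaging_mask))

# A different choice of weights gives a different effect, e.g. a Laplacian mask
# (an edge-detecting set of coefficients) instead of smoothing:
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
print(apply_mask(img, laplacian))
```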

Imaging Geometry

1) Translation

2) Scaling
3) Rotation
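
The notes list only the names of these transformations, so the following is a generic sketch (Python/NumPy, illustrative values) of 3-D translation, scaling, and rotation about the z axis expressed as 4x4 matrices in homogeneous coordinates, the same convention used by the perspective transformation below.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def scaling(sx, sy, sz):
    """4x4 homogeneous scaling matrix."""
    return np.diag([sx, sy, sz, 1.0])

def rotation_z(theta):
    """4x4 homogeneous rotation about the z axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

# Apply rotation, then scaling, then translation to a 3-D point (X, Y, Z).
w = np.array([1.0, 2.0, 3.0, 1.0])                 # homogeneous world point
M = translation(5, 0, 0) @ scaling(2, 2, 2) @ rotation_z(np.pi / 2)
print(M @ w)                                       # transformed homogeneous point
```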

Perspective Transformation

A perspective transformation (also called an imaging transformation) projects 3-D points onto a plane. Perspective
transformations play a central role in image processing because they provide an approximation to the manner in which an
image is formed by viewing a 3-D world. They are fundamentally different from the geometric transformations above because
they are nonlinear: they involve division by coordinate values.

The camera coordinate system (x, y, z) has the image plane coincident with the xy plane and the optical axis
(established by the center of the lens) along the z axis. Thus the center of the image plane is at the origin, and the center
of the lens is at coordinates (0, 0, λ). If the camera is in focus for distant objects, λ is the focal length of the lens. Here the
assumption is that the camera coordinate system is aligned with the world coordinate system (X, Y, Z).
W (world) → Wh (world, homogeneous)
Wh → Ch (camera, homogeneous)
Ch → C (camera)
A point in the Cartesian world coordinate system may be expressed in vector form as the column vector
W = [X  Y  Z]^T

and its homogeneous counterpart is
Wh = [kX  kY  kZ  k]^T
where k is an arbitrary, nonzero constant.

we define the perspective transformation matrix as

P = [ 1   0    0     0
      0   1    0     0
      0   0    1     0
      0   0  −1/λ    1 ]
The product PWh yields a vector denoted Ch: Ch = PWh. The Cartesian camera coordinates are recovered by dividing the
first three components of Ch by its fourth component, which gives x = λX/(λ − Z) and y = λY/(λ − Z).

The inverse perspective transformation maps an image point back into 3-D:
Wh = P^(-1) Ch

Suppose that an image point has coordinates (x0, y0, 0), where the 0 in the z location simply indicates that the image plane
is located at z = 0. In homogeneous vector form this point is Ch = [kx0  ky0  0  k]^T, and applying the inverse transformation
gives Wh = P^(-1) Ch = [kx0  ky0  0  k]^T, i.e., the world coordinates (X, Y, Z) = (x0, y0, 0).
This result obviously is unexpected because it gives Z = 0 for any 3-D point. The problem here is caused by mapping a 3-D scene
onto the image plane, which is a many-to-one transformation. The image point (x0, y0) corresponds to the set of collinear 3-D
points that lie on the line passing through (x0, y0, 0) and (0, 0, λ).
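
A small numerical sketch (Python/NumPy; the focal length and world point are made-up values) of the forward transformation: build P from λ, form the homogeneous world point with k = 1, multiply, and divide by the fourth component to recover the camera coordinates x = λX/(λ − Z), y = λY/(λ − Z).

```python
import numpy as np

def perspective_matrix(lam):
    """Perspective transformation matrix P for focal length lam."""
    P = np.eye(4)
    P[3, 2] = -1.0 / lam
    return P

def project(world_point, lam):
    """Map a 3-D world point (X, Y, Z) to image-plane coordinates (x, y)."""
    wh = np.array([*world_point, 1.0])    # homogeneous world point, k = 1
    ch = perspective_matrix(lam) @ wh     # Ch = P Wh
    ch = ch / ch[3]                       # back to Cartesian camera coordinates
    return ch[0], ch[1]                   # x = lam*X/(lam - Z), y = lam*Y/(lam - Z)

lam = 0.05                                # e.g. a 50 mm lens (illustrative)
x, y = project((1.0, 2.0, 10.0), lam)     # a point well in front of the lens (Z > lam)
print(x, y)                               # negative values: the image is inverted
```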

Stereo Imaging
Missing depth information in the case of the perspective transformation can be recovered by using stereoscopic imaging,
which involves obtaining two image views of an object containing the world point w of interest.
We assume the two cameras are identical and their coordinate systems are perfectly aligned, differing only in the location of
their origins (separated by the baseline B). The (x, y) plane of each image is aligned with the (X, Y) plane of the world
coordinate system; hence the Z coordinate of w is the same in both camera coordinate systems.

From the perspective transformation, the world coordinate X of a point is related to its image coordinate x by
X = (x/λ)(λ − Z)

For the first camera:   X1 = (x1/λ)(λ − Z1)
For the second camera:  X2 = (x2/λ)(λ − Z2)

Because the cameras differ only by the baseline B and the point has the same depth in both systems,
X2 = X1 + B   and   Z1 = Z2 = Z

so that
X1 = (x1/λ)(λ − Z)
X1 + B = (x2/λ)(λ − Z)

Subtracting the first of these from the second:
B = ((λ − Z)/λ)(x2 − x1)
λB = (λ − Z)(x2 − x1)
λB / (x2 − x1) = λ − Z

Z = λ − λB / (x2 − x1)
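
A minimal Python sketch of this final relationship; the numbers are illustrative only, with λ, B, and the image coordinates all expressed in the same units (here metres).

```python
def stereo_depth(x1, x2, lam, B):
    """Depth of a world point from two aligned, identical cameras.

    x1, x2 : image-plane x coordinates of the same point in cameras 1 and 2
    lam    : focal length of the (identical) cameras
    B      : baseline, the separation between the two camera origins
    Implements Z = lam - lam*B / (x2 - x1).
    """
    disparity = x2 - x1
    if disparity == 0:
        raise ValueError("zero disparity: the point is effectively at infinity")
    return lam - lam * B / disparity

# Illustrative values: lam = 0.05, B = 0.2, disparity x2 - x1 = -0.001
print(stereo_depth(x1=0.002, x2=0.001, lam=0.05, B=0.2))   # -> 10.05
```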
Visual Perception

 The Human Eye: diameter about 20 mm; three membranes enclose the eye:


– Cornea & sclera
– Choroid
– Retina
 The Choroid: The choroid contains blood vessels for eye nutrition and is heavily pigmented to reduce extraneous light
entering the eye and backscatter. It is divided into the ciliary body and the iris diaphragm, which controls the amount of light
entering the eye (the pupil diameter varies from about 2 mm to 8 mm).
 The Lens: The lens is made up of fibrous cells and is suspended by fibers that attach it to the ciliary body. It is slightly
yellow and absorbs approx. 8% of the visible light spectrum.
 The Retina: The retina lines the entire posterior portion. Discrete light receptors are distributed over the surface of
the retina:
– cones (6-7 million per eye), of three types:
1. Red (R)
2. Green (G)
3. Blue (B)
– rods (75-150 million per eye)
 Cones: Cones are located in the fovea and are sensitive to color. Each one is connected to its own nerve end. Cone
vision is called photopic (or bright-light vision).
 Rods: Rods are giving a general, overall picture of the field of view and are not involved in color vision. Several rods
are connected to a single nerve and are sensitive to low levels of illumination (scotopic or dim-light vision).
 Receptor Distribution: The distribution of receptors is radially symmetric about the fovea. Cones are most dense in the
center of the fovea, while rods increase in density from the center out to approximately 20° off axis and then decrease.
 The Fovea: The fovea is circular (about 1.5 mm in diameter) but can be treated as a square sensor array (1.5 mm x 1.5 mm).
The density of cones is about 150,000 elements/mm², which gives roughly 337,000 cones in the fovea. A CCD imaging chip of
medium resolution needs about 5 mm x 5 mm for this number of elements.
 Image Formation in the Eye: Unlike an ordinary optical lens, the eye's lens is flexible. Its shape is controlled by the fibers of
the ciliary body: to focus on distant objects it becomes flatter, and to focus on nearby objects it becomes thicker. The distance
between the center of the lens and the retina (the focal length) varies from about 17 mm to 14 mm as the refractive power of
the lens goes from its minimum to its maximum. Objects farther than about 3 m are focused with the minimum refractive
power (and vice versa). Perception takes place by the relative excitation of light receptors, which transform radiant energy
into electrical impulses that are ultimately decoded by the brain.
 Brightness Adaptation & Discrimination: The range of light intensity levels to which the HVS (human visual system) can
adapt is on the order of 10^10. Subjective brightness (i.e., intensity as perceived by the HVS) is a logarithmic function of the
light intensity incident on the eye. The HVS cannot operate over such a range simultaneously: for any given set of conditions,
the current sensitivity level of the HVS is called the brightness adaptation level. The eye also discriminates between changes
in brightness at any specific adaptation level.

The Weber ratio ΔIc/I (where ΔIc is the increment of illumination that is just discriminable against a background of
illumination I) measures this ability. Small values of the Weber ratio mean good brightness discrimination (and vice versa).
At low levels of illumination brightness discrimination is poor (rods), and it improves significantly as background illumination
increases (cones).
 Visible Light: a particular type of EM radiation that can be seen and sensed by the human eye.
– Range from approximately 0.43µm (violet) to 0.79 µm (red)
– Violet, blue, green, yellow, orange, red
– Blend smoothly
 A body that reflects light and is relatively balanced in all visible wavelengths
– appears white to the observer.
A body that favors reflectance in a limited range of the visible spectrum
– exhibits some shades of color.
Achromatic or monochromatic light:
– the only attribute is intensity--Gray-level
– Black to Gray to White
 Illuminance is the amount of source light incident on the scene, represented as i(x, y); luminance refers to the light
reflected from the scene. Reflectance is the amount of light reflected by the objects in the scene, represented by r(x, y).
