Digital Image Processing
An image may be defined as a two-dimensional function, f(x, y), where x and y are spatial (plane) coordinates, and the
amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point.
When x, y, and the intensity values of f are all finite, discrete quantities, we can call the image a digital image.
The field of digital image processing refers to processing digital images by means of a digital computer.
A digital image is composed of a finite number of elements, each of which has a particular location and value. These
elements are called picture elements, image elements, pels, or pixels. Pixels are the basic building blocks of a digital image or
display and are addressed using geometric coordinates.
Low-level processes: These involve primitive operations such as image preprocessing to reduce noise, contrast enhancement, and
image sharpening. Both the inputs and the outputs are images.
Mid-level processes: These involve tasks such as segmentation, description of objects to reduce them to a form suitable
for computer processing, and classification (recognition) of individual objects. The inputs are generally images, but the outputs are
attributes extracted from those images (e.g., edges, contours, and the identity of individual objects).
High-level processes: These involve 'making sense' of an ensemble of recognized objects, as in image analysis, and, at the far end
of the continuum, performing the cognitive functions normally associated with vision (artificial intelligence).
1. Image Acquisition: The image is captured by a sensor and digitized. If the output of the camera or sensor is not already in
digital form, an analog-to-digital converter (ADC) digitizes it. This stage may also involve preprocessing, such as scaling. Example:
capturing an image with a camera.
2. Image Enhancement: brings out detail that is obscured, or simply highlights certain features of interest in an image.
Example: increasing the contrast of an image because 'it looks better'.
3. Image Restoration: compensates for, or undoes, defects that degrade an image. Restoration techniques tend to be based on
mathematical or probabilistic models of image degradation. Degradation comes in many forms, such as motion blur, noise, and camera mis-focus.
4. Color Image Processing: uses the color of the image to extract features of interest. This may include color
modeling and processing in a digital domain.
5. Wavelets and Multiresolution Processing: the foundation for representing images in various degrees of resolution. In
particular, this is used for image data compression and for pyramidal representation, in which images are subdivided successively
into smaller regions.
6. Compression: deals with techniques for reducing the storage required to save an image, or the bandwidth required
to transmit it. Image compression is familiar to most computer users in the form of image file extensions, such as the .jpg
extension used in the JPEG image compression standard.
7. Morphological Processing: provides tools for extracting image components that are useful in the representation and description
of shape.
8. Image Segmentation: The computer attempts to separate objects from the image background. It is the process of partitioning an
image into segments so that more accurate image attributes can be extracted. It is one of the most difficult tasks in
DIP. If the segments are properly autonomous (two segments of an image should not contain identical information), then the
representation and description of the image will be accurate; with rugged (coarse) segmentation, the result will not be
accurate.
9. Representation and description: The first decision that must be made is whether the data should be represented as a
boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics,
such as corners and inflections. Regional representation is appropriate when the focus is on internal properties, such as texture
or skeletal shape. Choosing a representation is only part of the solution for transforming raw data into a form suitable for
subsequent computer processing. Description, also called feature selection, deals with extracting attributes that result in some
quantitative information of interest or are basic for differentiating one class of objects from another.
10.Recognition: It is the process that assigns a label (e.g., “vehicle”) to an object based on its descriptors.
11. Knowledge Base: The knowledge base guides the operation of each processing module and synchronizes the interaction between all of the processes.
1. Image Sensor: With reference to sensing, two elements are required to acquire digital images. The first is a physical device
that is sensitive to the energy radiated by the object we wish to image. The second, called a digitizer, is a device for converting
the output of the physical sensing device into digital form. For instance, in a digital video camera, the sensors produce an
electrical output proportional to light intensity. The digitizer converts these outputs to digital data.
2. Specialized Image Processing Hardware: It usually consists of the digitizer plus hardware that performs other primitive
operations, such as an arithmetic logic unit (ALU) for arithmetic and logical operations (e.g., noise reduction). This type of hardware is
sometimes called a front-end subsystem.
3. Intelligent Processing Machine (Computer): The computer in an image processing system is a general-purpose computer and can range from
a PC to a supercomputer. Almost any well-equipped PC-type machine is suitable for offline image processing
tasks.
4. Image Processing Software: It consists of specialized modules that perform specific tasks. A well-designed package also
includes the capability for the user to write code that, as a minimum, utilizes the specialized modules.
5. Mass Storage: Mass storage capability is a must in image processing applications. An image of size 1024 × 1024 pixels, in which
the intensity of each pixel is an 8-bit quantity, requires one megabyte of storage space if the image is not compressed (a short
storage sketch follows this list). When dealing with thousands, or even millions, of images, providing adequate storage in an image
processing system can be a challenge. Digital storage for image processing applications falls into three principal categories:
a. Short-term storage for use during processing
b. On-line storage for relatively fast recall,
c. Archival storage, characterized by infrequent access.
6. Image displays: These in use today are mainly color (preferably flat screen) TV monitors.
7. Hardcopy: devices for recording images include laser printers, film cameras, heat-sensitive devices, inkjet units, and digital
units, such as optical and CD-ROM disks.
8. Networking: Networking allows a user at one location to access and process images on a system at another location. Because images
require high bandwidth, optical fiber and broadband technologies are the better options.
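As referenced under Mass Storage above, a short Python sketch of the uncompressed-storage arithmetic (the 1024 × 1024, 8-bit figures come from the text; the helper name is illustrative):

# Storage needed for an uncompressed image:
# (width x height) pixels, each stored with bits_per_pixel bits.
def storage_bytes(width, height, bits_per_pixel=8):
    return width * height * bits_per_pixel // 8

size = storage_bytes(1024, 1024, 8)
print(size)                   # 1048576 bytes
print(size / (1024 * 1024))   # 1.0 megabyte (taking 1 MB = 2**20 bytes)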
Neighbors of a Pixel
The neighborhood of a pixel is the set of pixels that touch it. Thus, the neighborhood of a pixel can have a maximum of 8
pixels (images are always considered 2D).
1. Any pixel p at coordinates (x, y) has two vertical and two horizontal neighbors, given by (x+1, y), (x-1, y), (x, y+1), (x, y-1). This set of
pixels is called the 4-neighbors of p and is denoted N4(p). Each of these pixels is at unit distance from (x, y). This
neighborhood consists only of the pixels directly touching p, that is, the pixels above, below, to the left, and to the right of it.
2. The four diagonal neighbors of p are given by (x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1). This set is denoted ND(p).
It consists of the pixels that touch p only at its corners, that is, the diagonal pixels.
3. The points in ND(p) and N4(p) together are known as the 8-neighbors of p, denoted N8(p). Some of the points in
N4, ND, and N8 may fall outside the image when (x, y) lies on the image border (a short sketch enumerating these neighbor sets follows this list).
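A minimal sketch in Python (function and parameter names are illustrative; neighbors that fall outside the image are simply dropped, per the note above):

def n4(x, y, width, height):
    # 4-neighbors: right, left, below, above
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(i, j) for (i, j) in candidates if 0 <= i < width and 0 <= j < height]

def nd(x, y, width, height):
    # diagonal neighbors
    candidates = [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]
    return [(i, j) for (i, j) in candidates if 0 <= i < width and 0 <= j < height]

def n8(x, y, width, height):
    # 8-neighbors = 4-neighbors plus diagonal neighbors
    return n4(x, y, width, height) + nd(x, y, width, height)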
Two pixels (say p and q) are connected if they are adjacent in some sense and their gray levels satisfy some specified
criterion of similarity. In a binary (black and white) image, two neighboring pixels (as defined above) are connected if their values
are the same, i.e., both equal to 0 (black) or both equal to 255 (white). In a gray-level image, two neighboring pixels are connected if their
values are close to each other, i.e., they both belong to the same subset of similar gray levels: p∈V and q∈V, where V is a subset
of all gray levels in the image.
1. 4-adjacency: two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
2. 8-adjacency: two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
3. m-adjacency: two pixels p and q with values from V are m-adjacent if
i. q is in N4(p), or
ii. q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.
Mixed adjacency is a modification of 8-adjacency and is used to eliminate the multiple-path connections that often
arise when 8-adjacency is used (a short sketch of the three adjacency tests follows).
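A rough Python sketch of these tests, assuming the neighbor functions sketched above, an image stored as a dict mapping (x, y) to a gray level, and V as the set of admissible gray levels (all names illustrative):

def adjacent_4(p, q, img, V, width, height):
    # 4-adjacency: both values from V and q is a 4-neighbor of p
    return img[p] in V and img[q] in V and q in n4(*p, width, height)

def adjacent_8(p, q, img, V, width, height):
    # 8-adjacency: both values from V and q is an 8-neighbor of p
    return img[p] in V and img[q] in V and q in n8(*p, width, height)

def adjacent_m(p, q, img, V, width, height):
    # m-adjacency: q in N4(p), or q in ND(p) while N4(p) ∩ N4(q)
    # contains no pixel whose value is from V
    if img[p] not in V or img[q] not in V:
        return False
    if q in n4(*p, width, height):
        return True
    if q in nd(*p, width, height):
        common = set(n4(*p, width, height)) & set(n4(*q, width, height))
        return all(img[r] not in V for r in common)
    return False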
A path (curve) from pixel p with coordinates (x,y) to pixel q with coordinates (s,t) is a sequence of distinct pixels:
(x0,y0), (x1,y1), …, (xn,yn)
where (x0,y0)=(x,y),
(xn,yn)=(s,t),
and (xi, yi) is adjacent to (xi−1, yi−1) for 1 ≤ i ≤ n.
In this case, n is the length of the path.
If p and q are pixels of an image subset S, then p is connected to q in S if there is a path from p to q consisting entirely
of pixels in S. The set of pixels in S that are connected to p is called a connected component of S. If S has only one connected
component, then S is called a connected set (a minimal connected-component sketch follows).
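The sketch uses breadth-first search under 4-adjacency; S is assumed to be a set of (x, y) coordinates whose values already satisfy the similarity criterion:

from collections import deque

def connected_component(p, S):
    # All pixels of S reachable from p through chains of 4-adjacent pixels of S.
    component = {p}
    frontier = deque([p])
    while frontier:
        x, y = frontier.popleft()
        for q in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if q in S and q not in component:
                component.add(q)
                frontier.append(q)
    return component

# S is a connected set if connected_component(p, S) == S for any p in S.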
Distance measures
Given pixels p, q, and z at (x,y), (s,t) and (u,v) respectively, D is a distance function (or metric) if:
(1) D(p,q) ≥ 0 (D(p,q)=0 iff p=q),
(2) D(p,q) = D(q,p), and
(3) D(p,z) ≤ D(p,q) + D(q,z).
• The Euclidean distance between p and q is given by:
De(p, q) = √[(x − s)² + (y − t)²]
The pixels having distance less than or equal to some value r from (x,y) are the points contained in a disk of radius r
centered at (x,y).
• The D4 distance (also called the city-block distance) between p and q is given by:
D4(p, q) = |x − s| + |y − t|
The pixels having a D4 distance less than or equal to some value r from (x, y) form a diamond (rhombus) centered at (x, y).
Example: pixels where D4 ≤ 2

        2
    2   1   2
2   1   0   1   2
    2   1   2
        2

Note: pixels with D4 = 1 are the 4-neighbors of (x, y).
The D8 distance (also called the chessboard distance) between p and q is given by:
D8(p, q) = max(|x − s|, |y − t|)
The pixels having a D8 distance less than or equal to some value r from (x, y) form a square centered at (x, y).
2   2   2   2   2
2   1   1   1   2
2   1   0   1   2
2   1   1   1   2
2   2   2   2   2

Note: pixels with D8 = 1 are the 8-neighbors of (x, y).
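The three distance measures side by side, as a small Python sketch:

import math

def d_euclidean(p, q):
    (x, y), (s, t) = p, q
    return math.sqrt((x - s) ** 2 + (y - t) ** 2)

def d4(p, q):
    # city-block distance
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8(p, q):
    # chessboard distance
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

print(d_euclidean((0, 0), (3, 4)))   # 5.0
print(d4((0, 0), (3, 4)))            # 7
print(d8((0, 0), (3, 4)))            # 4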
Arithmetic/ Logical Operations
Arithmetic and logic operations on images are used extensively in most image processing applications.
– They may cover the entire image or only a subset (a short sketch of the two most common uses follows this list).
1. Addition: (p+q)
• Used often for image averaging to reduce noise
2. Subtraction: (p-q)
• Used often for static background removal
3. Multiplication: (p*q) (or pq, p×q)
• Used to correct gray-level shading
4. Division: (p÷q) (or p/q)
• As in multiplication
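A short NumPy sketch of the two most common uses named above, frame averaging and static background subtraction (array shapes and values are illustrative):

import numpy as np

# Addition: average several noisy frames of the same scene to reduce noise.
frames = [np.random.randint(0, 256, (4, 4)).astype(np.float64) for _ in range(10)]
averaged = sum(frames) / len(frames)

# Subtraction: remove a static background from the current frame.
background = frames[0]
current = frames[1]
foreground = np.clip(current - background, 0, 255)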
Neighborhood-oriented operations
Arithmetic and logical operations may take place on a subset of the image.
– Typically neighborhood oriented
The value of one pixel is determined by the values of its surrounding pixels.
Formulated in the context of mask operations (also called template, window or filter operations)
Basic concept: let the value of a pixel be a function of its (current) gray level and the gray level of its neighbors (in
some sense)
Consider a 3×3 subset of pixels in an image, with values z1, z2, …, z9 and z5 at the center.
Suppose we want to filter the image by replacing the value at z5 with the average value of the pixels in the 3×3 region
centered around z5.
Perform an operation of the form:

z = (1/9)(z1 + z2 + … + z9) = (1/9) Σ zi, for i = 1, …, 9,

and assign to z5 the value of z.
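A direct, unoptimized sketch of this 3×3 averaging mask applied at every interior pixel of a 2-D NumPy array (border pixels are left unchanged for simplicity):

import numpy as np

def mean_filter_3x3(img):
    # Replace each interior pixel with the average of its 3x3 neighborhood.
    out = img.astype(np.float64).copy()
    rows, cols = img.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i, j] = img[i - 1:i + 2, j - 1:j + 2].mean()
    return out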
Imaging Geometry
1) Translation
2) Scaling
3) Rotation
Perspective Transformation
The camera coordinate system (x, y, z) has the image plane coincident with the xy plane and the optical axis
(established by the center of the lens) along the z axis. Thus the center of the image plane is at the origin, and the centre
of the lens is at coordinates (0, 0, λ). If the camera is in focus for distant objects, λ is the focal length of the lens. Here the
assumption is that the camera coordinate system is aligned with the world coordinate system (X, Y, Z).
W (world) → Wh (world, homogeneous)
Wh → Ch (camera, homogeneous)
Ch → C (camera)
A point in the Cartesian world coordinate system may be expressed in vector form as

W = [X  Y  Z]^T

and in homogeneous form as

Wh = [kX  kY  kZ  k]^T

where k is an arbitrary, nonzero constant. The perspective transformation matrix is

     | 1   0    0      0 |
P =  | 0   1    0      0 |
     | 0   0    1      0 |
     | 0   0  −1/λ     1 |
The product PWh yields a vector denoted Ch; Ch=PWh
The inverse perspective transformation maps an image point back into 3-D.
Wh = P⁻¹Ch
Suppose that an image point has coordinates (x0, y0, 0), where the 0 in the z location simply indicates that the image
plane is located at z = 0. This point may be expressed in homogeneous vector form as Ch = [kx0  ky0  0  k]^T. Applying the
inverse transformation then gives Wh = P⁻¹Ch = [kx0  ky0  0  k]^T, whose Cartesian coordinates are (x0, y0, 0).
This result obviously is unexpected because it gives Z = 0 for any 3-D point. The problem here is caused by mapping a 3-D scene
onto the image plane, which is a many-to-one transformation. The image point (x0, y0) corresponds to the set of collinear 3-D
points that lie on the line passing through (xo, yo, 0) and (0, 0, λ).
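A small NumPy sketch of the forward transformation Ch = P·Wh, with the image-plane coordinates recovered by dividing by the fourth (homogeneous) component; λ and the world point are illustrative values:

import numpy as np

lam = 0.05                                # focal length (illustrative)
P = np.array([[1, 0, 0,        0],
              [0, 1, 0,        0],
              [0, 0, 1,        0],
              [0, 0, -1 / lam, 1]])

W = np.array([1.0, 2.0, 10.0])            # world point (X, Y, Z)
Wh = np.append(W, 1.0)                    # homogeneous form with k = 1
Ch = P @ Wh                               # camera (homogeneous) coordinates

x, y = Ch[0] / Ch[3], Ch[1] / Ch[3]       # image-plane coordinates
print(x, y)                               # equals lam*X/(lam - Z), lam*Y/(lam - Z)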
Stereo Imaging
Missing depth information in the case of perspective transformation can be obtained by using stereoscopic imaging.
It involves obtaining two image views of an object containing a world point w.
We assume that the cameras are identical and that their coordinate systems are perfectly aligned, differing
only in the location of their origins. The (x, y) plane of each image is aligned with the (X, Y) plane of the world coordinate system.
Hence, the Z coordinate of w is the same for both camera coordinate systems.
From the perspective transformation, the world coordinate X of a point is related to its image coordinate x by

X = (x/λ)(λ − Z)

For the first camera,

X1 = (x1/λ)(λ − Z1)

For the second camera,

X2 = (x2/λ)(λ − Z2)

Because the second camera is displaced from the first by the baseline B along the x axis and the cameras are otherwise aligned,

X2 = X1 + B and Z1 = Z2 = Z

so the two equations become

X1 = (x1/λ)(λ − Z)
X1 + B = (x2/λ)(λ − Z)

Subtracting the first of these from the second gives

B = [(λ − Z)/λ](x2 − x1)

λB / (x2 − x1) = λ − Z

Z = λ − λB / (x2 − x1)
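The final depth expression as a short Python sketch (the baseline, focal length, and image coordinates are illustrative numbers):

def depth_from_disparity(x1, x2, baseline, lam):
    # Z = lambda - lambda * B / (x2 - x1); requires x2 != x1
    return lam - (lam * baseline) / (x2 - x1)

# With lam = 0.05, B = 0.1 and image coordinates taken from a point at Z = 10:
print(depth_from_disparity(x1=-0.0050251, x2=-0.0055276, baseline=0.1, lam=0.05))  # ≈ 10.0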
Visual Perception
Small values of the Weber ratio mean good brightness discrimination (and vice versa). At low levels of illumination,
brightness discrimination is poor (rods), and it improves significantly as background illumination increases (cones).
Visible light: a particular type of EM radiation that can be seen and sensed by the human eye.
– Wavelengths range from approximately 0.43 µm (violet) to 0.79 µm (red)
– Violet, blue, green, yellow, orange, red
– The colors blend smoothly into one another
A body that reflects light and is relatively balanced in all visible wavelengths
– appears white to the observer.
A body that favors reflectance in a limited range of the visible spectrum
– exhibits some shades of color.
Achromatic or monochromatic light:
– its only attribute is intensity (gray level)
– ranging from black through gray to white
Illuminance is the amount of source light incident on the scene, represented as i(x, y); luminance refers to the light
reflected from it. Reflectance is the fraction of the incident light reflected by the objects in the scene, represented as r(x, y).
The observed image intensity can be modeled as the product f(x, y) = i(x, y) · r(x, y).
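A tiny NumPy sketch of this product model (all values illustrative):

import numpy as np

illumination = np.full((4, 4), 90.0)                 # i(x, y): incident source light
reflectance = np.random.uniform(0.0, 1.0, (4, 4))    # r(x, y): 0 = total absorption, 1 = total reflection
f = illumination * reflectance                       # observed intensity f(x, y) = i(x, y) * r(x, y)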