Image Processing All 5 Units

Unit 1: Introduction to Image Processing
What is an Image?
An image is a visual representation of an object or a scene. It is formed by capturing
light reflected or emitted from the object onto a two-dimensional surface, such as a
camera sensor or photographic film. In the context of digital image processing, an image
is represented as a matrix of pixel values, where each pixel value corresponds to the
intensity or color information at a particular point.
Example: A digital photograph taken by a camera is an image, where each pixel
value indicates the brightness and color at that point in the photograph.
Types of Images
Images can be categorized into several types based on their characteristics:
1. Binary Images
Binary images contain only two pixel values, typically 0 (black) and 1 (white). They are
used for representing simple shapes and structures.
Example: A scanned document where text is represented in black on a white back-
ground.
2. Grayscale Images
Grayscale images represent various shades of gray, ranging from black (0) to white (255
in 8-bit images). They contain only intensity information without color.
Example: A black and white photograph.
3. Color Images
Color images use multiple color channels, such as RGB (Red, Green, Blue), to represent
colors at each pixel. Each channel has its own intensity value, and the combination of
these values determines the final color.
Example: A digital photograph taken in color.
4. Indexed Images
Indexed images use a colormap to map pixel values to specific colors. Each pixel value is
an index into a table of colors.
Example: A GIF image with a limited palette of 256 colors.
5. Multispectral Images
Multispectral images capture data across multiple wavelengths of light, such as infrared,
visible, and ultraviolet. They are used in remote sensing and satellite imagery.
Example: Satellite images used for land cover classification.
Fundamental Steps in Digital Image Processing
2. Preprocessing: Enhancing the quality of the image by removing noise and adjusting contrast.
6. Knowledge Base: Utilizing prior information about the problem domain to assist in processing.
Advantages
• Transmission: Digital images can be transmitted over networks with minimal loss
of quality.
• Integration with Other Systems: Digital images can be easily integrated with
other data types, such as text and audio, for multimedia applications.
Disadvantages
• Storage Requirements: High-resolution digital images require significant storage
space.
• Processing Time: Large digital images may require significant processing time
and computational resources for analysis.
Components
Digital Image Processing involves various components that work together to achieve the
desired image analysis.
• Image Sensors: Devices that capture the image, such as CCD or CMOS sensors
in cameras.
• Software: Programs and tools that provide an interface for implementing image
processing techniques.
Elements of Visual Perception
• Light and Color Perception: The way humans perceive colors and brightness, depending on the wavelength of light.
• Depth Perception: The ability to perceive the world in three dimensions and judge the distance of objects.
• Adaptation: The ability of the human visual system to adjust to varying levels of light, ensuring clear vision in different lighting conditions.
Example: The human eye is more sensitive to changes in brightness than to changes in color, which is why grayscale images often reveal more detail than colored images.
Image Sensing and Acquisition
• Image Sensors: Devices such as CCD or CMOS arrays that convert incoming light into electrical signals.
• Image Formation: The process begins with light from the scene entering through the lens and focusing onto the sensor array. The lens plays a crucial role in determining the field of view and the focus of the captured image.
• Digitization: The analog electrical signals from the image sensor are converted
into digital values using an Analog-to-Digital Converter (ADC). This process in-
volves sampling the analog signal at discrete intervals and quantizing the sampled
values into digital numbers, typically represented as a binary code.
• Image Acquisition System: In addition to the sensor and ADC, an image ac-
quisition system may include components like amplifiers, filters, and timing circuits
that ensure accurate signal processing and conversion.
• Image Storage: The digitized image data is stored in memory or transmitted to a
processing unit for further analysis. The format and resolution of the stored image
depend on the application requirements and sensor capabilities.
• Calibration and Correction: Calibration processes like white balance, gamma
correction, and lens distortion correction are applied to the raw image data to ensure
accurate color reproduction and image quality.
Example: In a digital camera, light enters through the lens and strikes the image
sensor, which could be either a CCD or CMOS sensor. The sensor converts the light
into electrical signals, which are then digitized by an ADC. The resulting digital image
is stored in the camera’s memory card, ready for viewing or editing.
Basic Relationships Between Pixels
– Adjacency:
∗ Adjacency describes the relationship between pixels that share a common
side or corner. There are different types of adjacency:
· 4-adjacency: Two pixels are 4-adjacent if they share a common side.
· 8-adjacency: Two pixels are 8-adjacent if they share a common side
or a common corner.
Example: In a binary image, two adjacent pixels with the same value are consid-
ered connected. For instance, if both pixels have a value of 1 and share a common
edge, they are 4-connected. This concept is used in connected component labeling
to identify distinct objects in an image.
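Connected component labeling builds directly on these adjacency definitions. Below is a small sketch using SciPy (an assumption; any labeling routine works), where the structuring element selects 4- or 8-adjacency:

```python
import numpy as np
from scipy import ndimage

binary = np.eye(3, dtype=int)    # a diagonal line of three 1-pixels

four  = ndimage.generate_binary_structure(2, 1)  # neighbors share a side
eight = ndimage.generate_binary_structure(2, 2)  # neighbors share a side or a corner

_, n4 = ndimage.label(binary, structure=four)
_, n8 = ndimage.label(binary, structure=eight)
print(n4, n8)  # 3 1 -> diagonal pixels are connected only under 8-adjacency
```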
– RGB Model:
∗ The RGB model represents colors as combinations of the primary colors
Red, Green, and Blue. Each color is defined by its intensity values of R,
G, and B, ranging from 0 to 255 in an 8-bit representation.
∗ It is widely used in digital displays and imaging devices such as cameras,
monitors, and scanners.
∗ Colors are additive in nature, meaning they are formed by adding the
values of R, G, and B.
– HSI Model:
∗ The HSI model represents colors using three components: Hue (color
type), Saturation (color purity), and Intensity (brightness).
∗ It is more intuitive for human interpretation because it separates color
information (Hue) from brightness (Intensity).
∗ HSI is commonly used in image analysis, computer vision, and color-based
object recognition.
Example: The RGB model is widely used in digital displays and imaging
devices due to its straightforward representation of colors. In contrast, the
HSI model is preferred for image analysis and object recognition because it
separates color information from intensity, making it easier to analyze the
color features of objects independently from their brightness.
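For illustration, one common set of RGB-to-HSI conversion formulas can be coded directly. This is a sketch assuming 8-bit inputs; several HSI variants exist, and the test triple is made up:

```python
import numpy as np

def rgb_to_hsi(r, g, b):
    """One common RGB -> HSI conversion; H in degrees, S and I in [0, 1]."""
    r, g, b = (v / 255.0 for v in (r, g, b))
    i = (r + g + b) / 3.0                          # Intensity: channel average
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i  # Saturation: distance from gray
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    h = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    if b > g:                                      # hue angle measured from red
        h = 360.0 - h
    return h, s, i

print(rgb_to_hsi(255, 0, 0))   # pure red -> (0.0, 1.0, 0.333...)
```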
1 2D Transforms
2D transforms are crucial in image processing, facilitating various appli-
cations such as compression, filtering, and feature extraction. Key trans-
forms include:
• Discrete Fourier Transform (DFT):
– The DFT converts an image from the spatial domain to the frequency domain, allowing for the analysis of frequency components.
– It helps identify periodic patterns and frequencies in images, which is essential for tasks like image filtering and noise reduction.
– The transformation reveals how different frequency components contribute to the overall image, aiding in various processing techniques.
• Discrete Cosine Transform (DCT):
– The DCT decomposes an image into a sum of cosine functions, emphasizing lower frequencies while minimizing high-frequency components.
– It is widely used in JPEG compression, where images are divided into blocks, and the DCT is applied to each block to reduce data storage requirements.
– By concentrating on significant low-frequency information, the DCT allows for effective compression while preserving visual quality.
Example: In JPEG compression, the DCT is applied to each 8x8 block of
pixels. High-frequency components, which typically carry less perceptible
detail, are quantized more coarsely. This enables substantial data reduc-
tion while maintaining acceptable image quality during reconstruction.
Unit 2: Image Enhancement
Syllabus:
• Spatial Domain:
– Gray level transformations
– Histogram processing
1.1 Types of Image Enhancement Techniques
There are two primary domains for image enhancement:
• Spatial Domain Techniques: These techniques directly operate on the pixel
values of an image. Some common spatial domain techniques include:
– Gray Level Transformations: This includes operations like image nega-
tives, contrast stretching, and thresholding, where the pixel values are modified
to enhance the visual quality.
– Histogram Processing: Techniques like histogram equalization and his-
togram matching are used to improve the contrast of the image.
– Spatial Filtering: This includes operations like smoothing and sharpening
using various filters (e.g., mean filter, median filter, and Laplacian filter).
• Frequency Domain Techniques: These techniques modify the Fourier transform of the image to enhance its appearance. Some common frequency domain techniques include:
– Fourier Transform: By transforming an image into the frequency domain,
filtering operations can be applied more effectively to target specific frequency
components.
– Frequency Domain Filtering: This includes operations like low-pass filter-
ing (smoothing) and high-pass filtering (sharpening).
Applications of Image Enhancement
• Object Recognition: Improving image quality for better object detection and
recognition in computer vision applications.
1.4 Conclusion
Image enhancement plays a crucial role in various fields by improving image quality and
visibility of features. The choice of enhancement techniques depends on the nature of the
image and the desired outcome. Proper care must be taken to balance enhancement and
the preservation of essential image details.
1. Spatial Domain
The spatial domain refers to the space in which images are defined in terms of their pixel values. In this domain, image processing techniques operate directly on the pixels of an image. This is in contrast to the frequency domain, where transformations like the Fourier Transform are applied to analyze and modify the frequency components of an image.
Example: Consider a simple 3x3 grayscale image matrix with pixel intensity values ranging from 0 to 255:

12  45  78
34  56  90
78 123 150

Any modification of this matrix directly alters the spatial domain representation of the image. For instance, increasing each pixel value by 10 would brighten the image in the spatial domain.
• Power-Law Transformation: Used for contrast adjustment; also known as gamma correction.
• Image Negative: The negative of an image is obtained by subtracting each pixel value from the maximum intensity value (255).
• Formula: If I is the input image, the negative image I′ is given by:

I′ = 255 − I

• Explanation: For a pixel with an intensity value of 50, its negative would be:

I′ = 255 − 50 = 205

This transformation darkens bright areas and lightens dark areas, resulting in a visually inverted image.
Example: after applying the negative transformation, a 2x2 block might become:

30 100
150 200

Here, dark areas have been lightened and bright areas darkened.
1.2 Histogram Processing
Histogram processing involves the analysis and modification of an image's histogram to enhance its contrast. The histogram of an image represents the distribution of pixel intensities, and by modifying this distribution, we can improve the visibility of image features.
Types of Histogram Processing:
– Histogram Equalization: Remaps intensities using the cumulative distribution:

sk = (L − 1) Σ_{j=0}^{k} pr(rj)

where sk is the new pixel value, L is the total number of intensity levels, and pr(rj) is the probability of occurrence of intensity level rj. After equalization, the pixel values are spread over the entire intensity range, resulting in a higher contrast image.
– Histogram Matching: Transforms an image so that its histogram resembles that of a reference image.
Example: If we have an image with low contrast and we want to match its histogram to that of a high-contrast reference image, histogram matching can achieve this transformation.
– Local Histogram Processing: Applies histogram techniques in small neighborhoods rather than globally.
Example: Adaptive Histogram Equalization (AHE) enhances the contrast locally in small regions, making it effective for images with varying contrast.
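A minimal NumPy sketch of the equalization transformation above (the 3x3 test image is illustrative):

```python
import numpy as np

def equalize_histogram(img, levels=256):
    """Histogram equalization: s_k = (L - 1) * sum_{j<=k} p_r(r_j)."""
    hist = np.bincount(img.ravel(), minlength=levels)    # intensity counts
    p = hist / img.size                                  # p_r(r_j)
    cdf = np.cumsum(p)                                   # running sum up to k
    lut = np.round((levels - 1) * cdf).astype(np.uint8)  # s_k for every level
    return lut[img]                                      # remap each pixel

img = np.array([[52, 55, 61], [59, 79, 61], [85, 76, 62]], dtype=np.uint8)
print(equalize_histogram(img))   # values spread across the full 0..255 range
```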
1.3 Spatial Filtering
• Linear Filters: These filters use a linear combination of pixel values within a neighborhood defined by the filter mask.
– Example: Smoothing with a Box Filter - The box filter smooths an image by averaging the pixel values in the neighborhood defined by the filter size. The formula for a box filter is:

h(x, y) = (1/n²) Σ_{i=−a}^{a} Σ_{j=−b}^{b} f(x + i, y + j)

where n is the size of the filter, and a and b are the half-widths of the filter window (n = 2a + 1).
• Non-Linear Filters: These filters do not use a linear combination of pixel values.
Examples include median filters, which replace each pixel with the median value of
the neighborhood.
• Smoothing Filters: These filters are used to reduce noise and smooth out varia-
tions in an image.
– Example: Gaussian Filter - This filter uses a Gaussian function to give more weight to the central pixel and its neighbors, thereby reducing noise. The formula is:

h(x, y) = (1 / (2πσ²)) e^{−(x² + y²)/(2σ²)}

where σ is the standard deviation, controlling the degree of smoothing.
• Sharpening Filters: These filters highlight fine detail and edges.
– Example: Laplacian Filter - This filter computes the second derivative of the image, highlighting areas of rapid intensity change (edges). The formula is:

h(x, y) = ∂²f/∂x² + ∂²f/∂y²

The Laplacian filter can be applied using a mask like:

 0 −1  0
−1  4 −1
 0 −1  0
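All three spatial filters above can be applied by convolution. A short sketch using SciPy (assumed available; the random test image is illustrative):

```python
import numpy as np
from scipy import ndimage

img = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(float)

# 3x3 box filter: each output pixel is the mean of its neighborhood.
box = np.ones((3, 3)) / 9.0
smoothed = ndimage.convolve(img, box, mode="reflect")

# Gaussian smoothing: sigma controls the degree of blur.
blurred = ndimage.gaussian_filter(img, sigma=1.5)

# Laplacian mask from the text: responds to rapid intensity changes (edges).
laplacian = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
edges = ndimage.convolve(img, laplacian, mode="reflect")
print(smoothed.shape, blurred.shape, edges.shape)
```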
2. Frequency Domain
Frequency domain techniques are based on the idea that an image can be represented as a sum of sinusoidal components with varying frequencies and amplitudes. By transforming an image into the frequency domain, we can easily manipulate these components for applications like filtering, compression, and enhancement.
Fourier Transform:
The Fourier Transform is a mathematical tool used to convert an image from the spatial domain (where each pixel value corresponds to a specific location) to the frequency domain (where each value corresponds to a specific frequency component). It helps to separate the image into its constituent frequencies, making it easier to process high-frequency components (like edges and textures) and low-frequency components (like smooth areas) separately.
Formula:
The 2D Discrete Fourier Transform (DFT) of an M × N image f(x, y) is

F(u, v) = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) e^{−j2π(ux/M + vy/N)}

It transforms the image from its spatial representation to its frequency representation.
Explanation:
• F(u, v): Represents the frequency component of the image at coordinates (u, v).
• f(x, y): The pixel intensity at the spatial coordinates (x, y).
• The exponential term e^{−j2π(ux/M + vy/N)} represents the basis functions that oscillate at different frequencies depending on the values of u and v.
• The double summation sums over all pixel values in the image, weighting them by the complex exponential term to compute the frequency component F(u, v).
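A quick numeric check of this formula, assuming NumPy: np.fft.fft2 evaluates the same double summation, so computing one coefficient by hand should agree with it.

```python
import numpy as np

f = np.array([[1, 2], [3, 4]], dtype=float)
F = np.fft.fft2(f)                 # library DFT of the whole image

# Direct evaluation of the double summation for one coefficient (u, v) = (0, 1).
M, N = f.shape
x, y = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
F01 = np.sum(f * np.exp(-2j * np.pi * (0 * x / M + 1 * y / N)))
print(F[0, 1], F01)                # both are (-2+0j)
```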
• Smoothing Filter:
– Useful for blurring and reducing fine details or texture in an image.
– Examples include Gaussian Low-Pass Filter (GLPF) and Ideal Low-Pass Filter
(ILPF).
– Smoothing can reduce sharp transitions, making the image appear softer.
• Sharpening Filter:
– Useful for highlighting edges and fine detail in an image.
– Examples include the Gaussian High-Pass Filter (GHPF) and the Ideal High-Pass Filter (IHPF).
– Increases the contrast between neighboring pixels, making the image appear sharper.
• Practical Applications:
– Smoothing filters are used in image preprocessing to reduce noise before further
analysis.
– Sharpening filters are used in medical imaging to enhance anatomical structures.
• Ideal Filter:
– An Ideal Filter has a sharp and distinct boundary between the pass band (where frequencies are allowed to pass) and the stop band (where frequencies are blocked). Types include:
∗ Ideal Low-Pass Filter (ILPF): Allows all frequencies below the cutoff frequency to pass while blocking all higher frequencies. It is used for smoothing and blurring the image.
∗ Ideal High-Pass Filter (IHPF): Allows all frequencies above the cutoff frequency and attenuates those below. It is used for enhancing the edges and fine details of the image.
– Limitations: Due to the sharp cutoff, ideal filters can cause ringing artifacts in the spatial domain, known as the Gibbs phenomenon, making them less practical for real-world applications.
• Butterworth Filter:
– A Butterworth Filter provides a gradual transition between the pass band and the stop band, making it more practical and avoiding the harsh cutoffs seen in ideal filters.
– The degree of smoothness of this transition is controlled by the order of the filter. A higher-order filter has a sharper transition, while a lower-order filter has a more gradual roll-off.
– Types of Butterworth filters include:
∗ Butterworth Low-Pass Filter (BLPF): Reduces high-frequency components more gently compared to an ideal low-pass filter, minimizing artifacts and preserving important image features.
∗ Butterworth High-Pass Filter (BHPF): Enhances high-frequency components but with a smooth transition, avoiding the abrupt changes caused by ideal high-pass filters.
– Advantages: The smoother transition reduces ringing artifacts, making Butterworth filters suitable for applications where preserving the integrity of the image is crucial.
• Gaussian Filter:
– A Gaussian Filter has a bell-shaped curve both in the spatial and frequency domains, providing a very smooth transition without any abrupt changes.
– It is defined by the standard deviation parameter, σ, which controls the width of the Gaussian curve. A larger σ results in a wider curve, which in turn results in a stronger smoothing effect.
– Types of Gaussian filters include:
∗ Gaussian Low-Pass Filter (GLPF): Smooths the image by attenuating high frequencies gradually, without ringing.
∗ Gaussian High-Pass Filter (GHPF): Enhances edges and fine details with minimal distortion, avoiding the ringing artifacts and the abrupt changes seen with ideal and Butterworth high-pass filters.
– Advantages: Gaussian filters do not introduce ringing artifacts and are widely used for applications requiring smooth and artifact-free filtering. They are optimal for applications like image blurring, noise reduction, and feature detection.
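For illustration, the low-pass transfer functions H(u, v) of the three filter families can be sketched as follows (D0 is the cutoff distance and n the Butterworth order; applying H assumes a centred spectrum):

```python
import numpy as np

def distance_grid(M, N):
    """D(u, v): distance from the centre of the shifted frequency rectangle."""
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    return np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)

def ideal_lpf(M, N, D0):
    return (distance_grid(M, N) <= D0).astype(float)             # hard cutoff

def butterworth_lpf(M, N, D0, n=2):
    return 1.0 / (1.0 + (distance_grid(M, N) / D0) ** (2 * n))   # gradual roll-off

def gaussian_lpf(M, N, D0):
    D = distance_grid(M, N)
    return np.exp(-(D ** 2) / (2.0 * D0 ** 2))                   # smooth, no ringing

def apply_filter(img, H):
    """Multiply the centred spectrum by H and transform back."""
    F = np.fft.fftshift(np.fft.fft2(img))
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
```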
2.4 Homomorphic Filtering
Homomorphic filtering is a technique that separates and independently manipulates the illumination and reflectance components of an image in the frequency domain to enhance its overall appearance. This method is particularly useful for improving the contrast and brightness of images with non-uniform illumination, such as photographs taken in poor lighting conditions.
• Concept:
– An image can be modeled as the product of two components:
∗ Illumination Component (i): Represents the varying lighting conditions in the image, usually containing low-frequency information.
∗ Reflectance Component (r): Represents the intrinsic properties of the objects in the image, such as texture and color, usually containing high-frequency information.
– The product f(x, y) = i(x, y) · r(x, y) is made additive by taking the logarithm, so the two components can be filtered separately:

log(f(x, y)) = log(i(x, y) · r(x, y)) = log(i(x, y)) + log(r(x, y))
• Advantages:
– Simultaneously compresses the overall dynamic range (illumination) and enhances local contrast (reflectance) in a single operation.
• Applications:
– Medical Imaging: Enhances the visibility of tissues and organs in medical images, improving diagnostic accuracy.
– Document Processing: Improves the legibility of scanned documents by correcting uneven lighting.
– Satellite Imaging: Enhances the contrast and detail in satellite images affected by varying lighting conditions.
• Limitations:
– The choice of filter parameters is crucial and may require experimentation to achieve optimal results.
– Over-enhancement can lead to the amplification of noise and artifacts in the image.
– May not perform well on images with complex lighting conditions, where the illumination and reflectance components cannot be cleanly separated.
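A minimal sketch of the whole homomorphic pipeline (log → FFT → high-emphasis filter → inverse FFT → exp), assuming NumPy; the Gaussian high-emphasis filter and the gains gamma_l, gamma_h are illustrative choices, not prescribed values:

```python
import numpy as np

def homomorphic(img, d0=30.0, gamma_l=0.5, gamma_h=2.0):
    """log -> FFT -> high-emphasis filter -> inverse FFT -> exp."""
    z = np.log1p(img.astype(float))          # additive model: log i + log r
    Z = np.fft.fftshift(np.fft.fft2(z))
    M, N = img.shape
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2
    # Attenuate low frequencies (illumination) toward gamma_l < 1 and
    # boost high frequencies (reflectance) toward gamma_h > 1.
    H = gamma_l + (gamma_h - gamma_l) * (1.0 - np.exp(-D2 / (2.0 * d0 ** 2)))
    s = np.real(np.fft.ifft2(np.fft.ifftshift(Z * H)))
    return np.expm1(s)                       # undo the logarithm
```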
2.5 Color Image Enhancement
Color image enhancement involves adjusting the intensity and color channels of an image to improve its visual appearance, contrast, and color balance. This process is essential for enhancing the perceptual quality of images in various applications such as photography, remote sensing, and medical imaging.
• Concept:
– Unlike grayscale images, color images contain multiple channels, usually represented in the RGB (Red, Green, Blue) color space. Enhancement techniques must consider all channels to avoid color distortion.
– The goal is to improve the visibility of details, correct color imbalances, and enhance the contrast of the image while preserving its natural appearance.
• Techniques:
– Histogram Equalization:
∗ Enhances the contrast of the image by redistributing the intensity values of each color channel.
∗ Can be applied separately to each channel (RGB) or to the intensity component only. Per-channel equalization can lead to unnatural color shifts, so it's often preferred in the HSV or YCbCr color spaces.
– Contrast Stretching:
∗ Involves stretching the range of intensity values in the image to cover the full available range, enhancing the contrast.
∗ Often used to improve the visibility of features in low-contrast images.
∗ Limitation: May cause clipping of bright or dark regions if not carefully adjusted.
– Color Balance Adjustment:
∗ Adjusts the relative intensities of the RGB channels to correct color im-
balances, such as removing color casts due to incorrect white balance.
∗ Often used to make an image appear more natural or to match a desired
aesthetic.
– Saturation Enhancement:
∗ Increases the saturation of colors, making the image appear more vibrant
and lively.
∗ Performed in the HSV or HSL (Hue, Saturation, Lightness) color spaces
to avoid affecting the brightness of the image.
∗ Limitation: Over-enhancement can lead to unnatural and oversaturated
colors.
– Color Space Conversion:
∗ Converts the image from one color space (e.g., RGB) to another (e.g.,
HSV, Lab) to simplify the enhancement process.
∗ Specific enhancements like contrast or saturation adjustments can be more
effectively applied in certain color spaces.
∗ After enhancement, the image is converted back to the original color space.
– Gamma Correction:
∗ Adjusts the brightness of the image by applying a power-law transforma-
tion to the pixel values.
∗ Useful for correcting lighting issues, such as underexposed or overexposed
images.
∗ Limitation: Incorrect gamma settings can lead to loss of details in shad-
ows or highlights.
• Applications:
– Multimedia: Used in video and image editing to achieve desired visual effects and color corrections.
• Challenges:
– Enhancing contrast or saturation without introducing color distortion, since the color channels are correlated and must be adjusted consistently.
Unit 3: Image Restoration
Syllabus:
• Properties
• Noise models
• Mean Filters
• Order Statistics
• Adaptive Filters
• Notch Filters
• Inverse Filtering
• Wiener Filtering
Image Restoration
Image restoration refers to the process of recovering an original image from a degraded
version using mathematical models. The degradation may occur due to various factors
such as noise, motion blur, or any environmental condition.
Properties of image restoration:
1. Noise Reduction: Effective image restoration reduces various types of noise, enhancing clarity and detail in the processed image.
2. Spatial and Frequency Domain: Restoration methods can be applied in both spatial and frequency domains, allowing for versatile approaches based on specific needs.
3. User Control: Restoration techniques often allow user input for adjusting parameters, enabling tailored processing based on individual requirements.
Degradation Model
A degradation model in image processing defines the relationship between the original
image and the degraded (or observed) image. This model is crucial in image restoration,
as it helps in understanding how the image has been altered by external factors such as
noise, blur, or any other distortions.
Mathematical Representation: The degraded image g(x, y) is commonly modeled as

g(x, y) = h(x, y) ∗ f(x, y) + η(x, y)

where f(x, y) is the original image, h(x, y) is the degradation function (∗ denotes convolution), and η(x, y) is additive noise.
• If the camera moves horizontally during the image capture, the degradation func-
tion could be a horizontal line kernel. Each pixel value in the degraded image is
influenced by the neighboring pixel values along the direction of the motion.
To restore the original image, we apply an inverse filtering technique where we estimate
the original image by deconvolving the blurred image with the degradation function.
However, noise η(x, y) complicates this process, as blindly applying inverse filtering can
amplify the noise.
Conclusion:
The degradation model forms the foundation for many image restoration techniques.
By understanding the nature of the degradation, including the type of noise and the
degradation function, we can apply appropriate restoration techniques to recover the
original image as accurately as possible.
Noise Models
Noise models describe the different types of noise that can affect an image, often de-
pending on the acquisition method and environmental conditions. Some common noise
models are:
• Rayleigh Noise: This noise follows a Rayleigh distribution and is usually observed
in situations where the overall signal involves scattering or multiple reflections, such
as radar or ultrasonic imaging. The probability density function (PDF) is given by:
p(z) = (2/b)(z − a) e^{−(z−a)²/b}   for z ≥ a
p(z) = 0                             for z < a

This distribution has a long right tail, which makes it asymmetric; it is therefore useful for modeling noise that is not symmetrically distributed.
Example: Rayleigh noise is common in radar signal processing.
• Gamma (Erlang) Noise: Gamma or Erlang noise is used to model noise where
the data follows a Gamma distribution, commonly in imaging systems where the
variance changes depending on the signal’s amplitude. The PDF for Gamma noise
is:
p(z) = (a^b z^{b−1} / (b − 1)!) e^{−az}   for z ≥ 0
Where a and b control the shape and scale of the distribution. This type of noise
often occurs in signals that involve waiting times or processes that sum several
independent variables.
Example: Gamma noise is used to model noise in telecommunications systems.
• Exponential Noise: Its PDF is p(z) = a e^{−az} for z ≥ 0 (and 0 otherwise). The exponential distribution is suitable for modeling noise where lower intensities are more likely to occur, with the likelihood of higher values dropping off exponentially.
Example: Exponential noise is often modeled in signal detection systems, such as sonar.
These noise models help in selecting appropriate restoration techniques to improve image
quality in various fields of application.
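For experimentation, samples from these noise models can be drawn directly with NumPy's random generator. Note that NumPy's parameterization differs from the a, b used in the PDFs above; the values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
shape = (256, 256)

# Draws from the distributions discussed above.
rayleigh_noise = rng.rayleigh(scale=20.0, size=shape)
gamma_noise    = rng.gamma(shape=2.0, scale=10.0, size=shape)  # Erlang when shape is an integer
exp_noise      = rng.exponential(scale=15.0, size=shape)

flat = np.full(shape, 100.0)                 # a flat test image
noisy = np.clip(flat + exp_noise, 0, 255)    # additive corruption
print(noisy.mean())                          # ~115: mean shifted by the noise
```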
Mean Filters
Mean filters are simple averaging filters used to reduce noise. They operate by averaging
the pixel values in a neighborhood.
f_mean(x, y) = (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} g(x + i, y + j)
Example: A 3 × 3 mean filter can reduce Gaussian noise in an image.
Properties of mean filters:
1. They reduce noise by averaging the pixel values in a local neighborhood, which smooths the image.
2. They are typically applied in a sliding window manner over the image.
3. Mean filters can blur edges as they smooth all pixel values indiscriminately.
Order Statistics Filters
1. Working Principle:
• Order statistics filters work by sorting the pixel values within a defined neighborhood (usually a window around the target pixel).
• For each pixel, the surrounding values are arranged in ascending or descending order.
• The filter then selects a specific pixel value from the sorted list based on the desired statistical measure (median, minimum, maximum, etc.).
• The process of sorting helps identify outliers and central tendencies, which play a crucial role in removing noise.
2. Median Filter:
• The most widely used order statistics filter is the Median Filter.
• It replaces the value of a pixel with the median of the pixel values in a defined
neighborhood.
• The median is the middle value in a sorted list, making this filter particularly
robust to outliers such as salt and pepper noise.
3. Edge Preservation:
• Unlike mean filters that blur edges, median filters are known for their ability
to preserve edges while removing noise.
• This characteristic makes them suitable for applications where edge clarity is
important, such as medical imaging or satellite image processing.
4. Impulse Noise Removal:
• Median filters are highly effective in removing impulse noise, also known as salt and pepper noise, which manifests as random occurrences of black and white pixels.
5. Handling Outliers:
• Median filters handle outliers better than mean filters by focusing on the cen-
tral value in the sorted neighborhood, making them less sensitive to extreme
pixel values.
• This makes them particularly effective in images where a small percentage of
pixels are significantly different from their neighbors.
6. Example:
• When the computational block fully overlaps with the pixels inside the image,
the median is calculated using all the neighborhood pixel values.
• Example: The center pixel and all its neighbors are part of the image, so the
median value is computed normally.
• When part of the filter window extends beyond the image boundary, several strategies can handle the missing values:
– Padding with zeros: Values outside the image are assumed to be zero.
– Padding with edge values: The nearest image boundary values are
repeated to fill the missing pixels.
– Cyclic padding: The image is treated as wrapping around, so missing
values are taken from the opposite side of the image.
• Example: If the computational block overlaps the edge, values outside the
image can be padded with zeros.
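A minimal sketch of a median filter that uses edge-value padding for the boundary case discussed above (switching mode to "constant" would give zero padding instead):

```python
import numpy as np

def median_filter(img, k=3):
    """Median filter with edge-value padding at the image boundary."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")   # repeat the nearest boundary values
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + k, j:j + k]
            out[i, j] = np.median(window)    # middle value of the sorted window
    return out

# A smooth ramp corrupted by salt-and-pepper pixels; the median removes them.
img = np.tile(np.arange(8, dtype=float), (8, 1))
img[2, 3], img[5, 6] = 255.0, 0.0
print(median_filter(img)[2, 3])  # close to its neighbours, not 255
```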
Adaptive Filters
Adaptive filters are dynamic tools that adjust their parameters based on local image statistics, making them particularly effective in varying noise environments.
1. Popular Example: The Wiener Filter is a widely used adaptive filter known for its ability to minimize mean square error, adjusting itself based on local noise statistics.
2. Rapid Changes: They perform particularly well in regions where noise characteristics exhibit rapid changes, ensuring that details are preserved while noise is reduced.
Band Reject Filters
1. Noise Removal: Band reject filters are specifically utilized to eliminate periodic interference or noise patterns, such as hum or buzz, that often occur in captured images.
2. Cutoff Frequencies: The frequencies targeted for removal are bounded by two cutoff frequencies, D1 and D2, defining the band that the filter will suppress.
Band Pass and Notch Filters
1. Band Pass Filters: Allow frequencies within a specific range to pass through while attenuating those outside.
2. They effectively isolate certain image features by enhancing desired frequency components.
3. Both filter types enhance image quality: band pass filters minimize background noise, while notch filters provide precise noise removal.
4. The choice of cutoff frequencies in band pass filters influences feature emphasis, while notch filters focus on eliminating particular interference.
5. Examples: Band pass filters are used in medical imaging to highlight edges, while notch filters are effective in removing repetitive noise patterns from old photographs.
Inverse Filtering
Inverse filtering attempts to recover the original image by applying the inverse of the degradation process:

F(u, v) = G(u, v) / H(u, v)

where G(u, v) is the Fourier transform of the degraded image and H(u, v) is the degradation function. Inverse filtering works well when the noise levels are low; otherwise, dividing by small values of H(u, v) amplifies the noise.
Wiener Filtering
Wiener filtering minimizes the mean square error between the original and the degraded
image, taking into account both the noise and image properties.
1. Wiener filtering is optimal when both the noise and the image signal have known
power spectra.
2. The Wiener filter smoothens the image while retaining important features.
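A common simplification replaces the ratio of power spectra with a single constant K, giving the filter F̂(u, v) = [H*(u, v) / (|H(u, v)|² + K)] G(u, v). A sketch under that assumption (the motion-blur kernel and noise level are illustrative):

```python
import numpy as np

def wiener_deconvolve(g, h, K=0.01):
    """F_hat = conj(H) / (|H|^2 + K) * G; K ~ noise-to-signal power ratio."""
    G = np.fft.fft2(g)
    H = np.fft.fft2(h, s=g.shape)            # pad the blur kernel to image size
    W = np.conj(H) / (np.abs(H) ** 2 + K)    # avoids the blow-up of plain 1/H
    return np.real(np.fft.ifft2(W * G))

# Degrade a random test image with horizontal motion blur plus noise, then restore.
rng = np.random.default_rng(0)
f = rng.random((64, 64))
h = np.zeros_like(f)
h[0, :5] = 1.0 / 5.0                         # h(x, y): 5-pixel horizontal blur
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)))
g += 0.01 * rng.standard_normal(f.shape)     # additive noise eta(x, y)
restored = wiener_deconvolve(g, h)
print(np.mean((restored - f) ** 2) < np.mean((g - f) ** 2))  # expect True
```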
— END OF UNIT 3 —
Unit 4: Image Segmentation
Syllabus
1. Edge detection
2. Edge linking via Hough transform
3. Thresholding
4. Region-based segmentation
5. Region growing
6. Region splitting and merging
7. Morphological processing: erosion and dilation
8. Segmentation by morphological watersheds: basic concepts
9. Dam construction and Watershed segmentation algorithm
Image Segmentation
Image segmentation is the process of dividing an image into distinct regions or segments, where
each segment corresponds to objects or parts of the image. It simplifies the image, making it
easier to analyze by grouping pixels with similar characteristics.
1 Edge Detection
Edge detection is a fundamental tool in image processing and computer vision, primarily used
to identify points in a digital image where the brightness changes sharply or discontinuously.
These sharp changes are known as edges, which typically represent the boundaries between
different objects or regions within the image. Detecting these edges is essential for tasks like
image segmentation, object recognition, and scene understanding.
• Sobel Operator: This operator is used to compute the gradient of image intensity at
each pixel, identifying the direction of the largest possible increase in intensity and the
rate of change in that direction. It works by applying a convolution using two 3x3 kernels
(one for the horizontal and one for the vertical direction).
• Canny Edge Detection: A more advanced multi-step algorithm that provides superior
edge detection by reducing noise and false edges. The steps include Gaussian filtering,
gradient computation, non-maximum suppression, and hysteresis thresholding.
• Prewitt Operator: This is similar to the Sobel operator, but it uses a simpler kernel.
It is less sensitive to noise compared to Sobel and typically used when noise reduction is
less critical.
2 Edge Linking via Hough Transform
The Hough transform links edge points into lines by mapping them to a parameter space. A line in image space can be expressed as

y = mx + c

However, to avoid the difficulty of representing vertical lines, the Hough transform often uses the polar coordinate form of a line:

ρ = x cos(θ) + y sin(θ)
Where ρ is the perpendicular distance from the origin to the line and θ is the angle of the line's normal with the x-axis.
Each point in the image space corresponds to a sinusoidal curve in the parameter space (ρ, θ).
Steps in Hough-transform line detection:
1. Edge Detection: Detect edge pixels first (for example, with the Canny detector).
2. Transform to Hough Space: For each edge pixel (x, y), compute the values of ρ and θ for a range of angles (usually 0° to 180°) and plot the sinusoidal curves in the Hough parameter space. Each point in the image space corresponds to a curve in Hough space.
3. Accumulate Votes: Quantize (ρ, θ) into an accumulator array and increment the cell that each computed pair falls into (see the sketch after this list).
4. Identify Peaks in the Accumulator Array: Once all points are transformed into Hough space, peaks in the accumulator array represent potential lines in the original image. These peaks correspond to the parameters (ρ, θ) of lines that pass through multiple edge points.
5. Line Drawing: Finally, map the detected peaks back into the image space to draw the detected lines.
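A compact sketch of steps 2-4, assuming NumPy; the synthetic vertical line is illustrative:

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Vote in (rho, theta) space for every edge pixel."""
    diag = int(np.ceil(np.hypot(*edges.shape)))      # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))          # 0..179 degrees
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1    # one vote per (rho, theta)
    return acc, thetas, diag

# A vertical line x = 5 should peak at theta = 0 and rho = 5.
edges = np.zeros((20, 20), dtype=bool)
edges[:, 5] = True
acc, thetas, diag = hough_lines(edges)
r, t = np.unravel_index(acc.argmax(), acc.shape)
print(r - diag, np.rad2deg(thetas[t]))  # -> 5 0.0
```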
Advantages:
• Detects multiple shapes: It can be extended to detect different shapes like circles, ellipses, or other parametric curves by adjusting the parameterization.
• Can handle occlusion: Even when parts of the shape are missing or occluded, the Hough transform can still link edges and detect the complete shape.
Limitations:
• Discretization errors: The accuracy of the detected lines depends on the resolution of the parameter space, which may lead to discretization errors.
• Memory usage: The accumulator array can become large, especially when detecting multiple shapes in an image.
Applications:
• Road lane detection: Used in autonomous driving to detect lanes on roads by identi-
fying straight lines in the image.
• Medical image analysis: Helps in detecting structures like blood vessels or bone frac-
tures in medical images.
• Object recognition: The transform is used to detect predefined shapes in images, such
as circular objects or regular geometric patterns.
3 Thresholding
Thresholding is a fundamental technique in image processing used to convert a grayscale image
into a binary image by classifying pixel values into two categories: foreground and background.
The goal of thresholding is to segment an image by selecting a proper threshold value, which
differentiates between the object and its background.
• Global Thresholding: A single threshold value T is chosen for the entire image. This
method works well when the image has uniform lighting and clear contrast between the
object and the background. However, it may fail when the lighting conditions are uneven.
• Adaptive (Local) Thresholding: The threshold is computed per pixel from its local neighborhood, which copes with uneven illumination. Common variants:
– Mean Thresholding: The threshold for each pixel is set to the mean of the intensities in the local neighborhood of that pixel.
– Gaussian Thresholding: The threshold for each pixel is calculated based on a weighted sum of local intensities, giving more importance to the pixels closer to the center.
• Before Thresholding: Suppose a grayscale image has the following intensity values for
a small 3x3 region:
100 150 120
200 90 140
60 180 130
• After Thresholding: If a threshold value T = 128 is used, all pixel values greater than
or equal to 128 will be set to 255 (white), and those less than 128 will be set to 0 (black).
The result is:
0 255 0
255 0 255
0 255 255
In this example, the thresholding operation helps in separating brighter regions (object)
from the darker background.
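The same computation in a few lines of NumPy, reproducing the 3x3 example above:

```python
import numpy as np

region = np.array([[100, 150, 120],
                   [200,  90, 140],
                   [ 60, 180, 130]])
T = 128
binary = np.where(region >= T, 255, 0)   # foreground -> 255, background -> 0
print(binary)
# [[  0 255   0]
#  [255   0 255]
#  [  0 255 255]]
```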
4 Region-Based Segmentation
Region-based segmentation is a technique in image processing that divides an image into regions
based on similarities in pixel properties such as intensity, color, or texture. The goal is to group
together pixels that share similar characteristics, effectively separating different objects or areas
within the image.
• Region Splitting and Merging: This method begins by considering the entire image
as a single region, then splits it into smaller regions. If adjacent regions are found to be
similar, they are merged back together.
Region Growing: the algorithm proceeds as follows:
1. Select Seed Points: Choose one or more starting pixels (seeds) that are representative of the region of interest.
2. Grow the Region: For each seed point, examine its neighboring pixels. If a neighboring pixel has a similar intensity (or another property), it is added to the region.
3. Repeat: The process continues recursively, growing the region by checking the neighbors
of the newly added pixels.
4. Stop Condition: The region-growing process stops when no more neighboring pixels
satisfy the similarity condition.
Criteria for Region Growing:
• Pixel intensity difference: Neighboring pixels are added if their intensity is within a certain
range of the seed point’s intensity.
• Texture: Regions may be grown based on texture similarity rather than intensity.
• Color: For colored images, pixels may be added based on the similarity in RGB or other
color space values.
• Example (MRI tissue segmentation): seed points are first placed in regions of known tissue type. The algorithm then grows regions around these seed points, including neighboring pixels that have similar intensity values to the seed pixel (indicating they belong to the same tissue type).
• This results in a segmented image where different tissues are clearly separated based on
their intensity.
Advantages:
• Effective in Homogeneous Regions: Region growing works well when there is a clear difference between objects and background.
• Good for Smooth Boundaries: Produces regions with smooth boundaries, making it useful for medical image segmentation.
Limitations:
• Noise Sensitivity: If the image is noisy, the region-growing process may include irrelevant pixels, leading to inaccurate segmentation.
• Boundary Handling: The method may struggle with accurately segmenting complex
shapes and boundaries.
6 Morphological Processing

6.1 Erosion
Erosion is a morphological operation that removes pixels on the boundaries of objects. It works
by shrinking the boundaries of the foreground object based on the shape of the structuring
element.
How Erosion Works:
• A structuring element (e.g., a small square or cross) is placed over each pixel in the image.
• If every pixel under the structuring element matches the shape of the structuring element,
the central pixel remains unchanged. Otherwise, it is removed (set to background).
Effect: Erosion causes objects in the image to shrink and can be useful for removing small
noise, separating connected objects, or shrinking object boundaries.
Erosion Example: Consider a binary image where ’1’ represents foreground pixels (object)
and ’0’ represents background:
Original Image:
0 1 1 0
1 1 1 1
0 1 1 0
0 0 1 0
After applying erosion with a 3 × 3 structuring element, the result might look like:
Eroded Image:
0 0 0 0
0 1 1 0
0 1 1 0
0 0 0 0
6.2 Dilation
Dilation is the opposite of erosion. It adds pixels to the boundaries of objects, effectively
enlarging the object. The structuring element defines the shape of this expansion.
How Dilation Works:
• A structuring element is placed over each pixel in the image.
• If at least one pixel under the structuring element is a foreground pixel, the central pixel
is set to the foreground (expanded).
Effect: Dilation increases the size of objects, fills in small holes, and can connect nearby
objects that are close together.
Dilation Example: Given the same binary image:
Original Image:
0 1 1 0
1 1 1 1
0 1 1 0
0 0 1 0
After applying dilation with a 3 × 3 structuring element, the result might be:
Dilated Image:
1 1 1 1
1 1 1 1
1 1 1 1
0 1 1 1
Here, the object has expanded, filling in the gaps and connecting nearby foreground pixels.
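Both operations are available in SciPy (assumed here); exact results on tiny images depend on how the boundary is padded, so they may differ slightly from the illustrative matrices above:

```python
import numpy as np
from scipy import ndimage

img = np.array([[0, 1, 1, 0],
                [1, 1, 1, 1],
                [0, 1, 1, 0],
                [0, 0, 1, 0]], dtype=bool)

se = np.ones((3, 3), dtype=bool)          # 3x3 structuring element

eroded  = ndimage.binary_erosion(img, structure=se)
dilated = ndimage.binary_dilation(img, structure=se)

# Opening (erosion then dilation) and closing (dilation then erosion):
opened = ndimage.binary_dilation(ndimage.binary_erosion(img, se), se)
closed = ndimage.binary_erosion(ndimage.binary_dilation(img, se), se)
print(eroded.astype(int), dilated.astype(int), sep="\n\n")
```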
• Object Separation: Erosion helps separate objects that are connected in a binary
image.
• Hole Filling: Dilation can fill small holes or gaps inside objects.
• Edge Detection: Combining erosion and dilation can be used for edge detection by
subtracting the eroded image from the dilated image.
• Opening: Erosion followed by dilation. This operation removes small objects or noise
while preserving the shape and size of larger objects.
• Closing: Dilation followed by erosion. This operation is useful for closing small holes
inside objects and connecting close objects.
Basic steps of watershed segmentation:
1. Topographic View: The image is treated as a topographic surface in which pixel intensity represents elevation, so local minima form catchment basins.
2. Flooding Process: The image is conceptually "flooded" from the lowest intensity points (basins). As water rises, basins start filling up.
3. Dam Construction: When two basins meet, a dam (ridge) is built to prevent further
merging. These dams represent the segmentation boundaries.
4. Output Segmentation: The final result is the set of ridges or watersheds that separate
different regions (basins) in the image, forming the segmented regions.
7.3 Example
Example of Watershed Algorithm: Consider an image where objects are overlapping or
touching, such as coins or cells. These objects may be difficult to separate using traditional
thresholding or region-growing techniques.
Steps in Watershed Segmentation:
• First, apply a preprocessing step such as noise removal and gradient computation to
emphasize the object boundaries.
• Then, mark the background and foreground regions (e.g., using markers for known ob-
jects).
• The watershed algorithm will flood the valleys, and the boundaries where the water from
different regions meets will be identified as segmentation lines.
• Disadvantages:
– Over-segmentation: noise and small local minima create many spurious basins; preprocessing or marker-based variants are typically used to control this.
• Gradient-Based Watershed: The gradient of the image is computed, and the watershed
algorithm is applied on the gradient, which sharpens the object boundaries.
8 Watershed Algorithm
The watershed algorithm is a region-based segmentation technique that visualizes an image as
a topographic surface. In this topography, pixel intensity values represent the height of the
surface: high-intensity areas correspond to peaks or ridges, and low-intensity areas correspond
to valleys or basins. The goal of the watershed algorithm is to divide the image into distinct
regions by identifying the boundaries between basins.
Steps of the watershed algorithm:
1. Compute the Gradient:
• The gradient of the image is calculated, which highlights the areas of rapid intensity change, often corresponding to edges between objects.
2. Identify Markers:
• Markers are placed in the image to identify foreground objects (the regions of inter-
est) and background regions. These markers help guide the segmentation process.
3. Flooding Process:
• Starting from the markers, the image is conceptually ”flooded” from the lowest
intensity to the highest intensity points. Water rises simultaneously from all basins.
4. Dam Construction:
• When water from two different basins meets, a dam (watershed line) is constructed
to prevent the merging of regions. These watershed lines define the boundaries
between regions.
5. Segmentation Output:
• The final result is the segmented image, where regions are divided by the watershed
lines.
8.4 Example
Consider an image containing overlapping circular objects, such as coins. The watershed algo-
rithm can be used to segment these objects, even when they are touching or overlapping.
Steps:
• Preprocessing: Apply a smoothing filter or a morphological operation (e.g., opening)
to remove noise.
• Gradient Computation: Compute the gradient of the image to highlight object bound-
aries.
• Marker Placement: Mark the foreground (coins) and background (spaces between
coins).
• Apply Watershed: The algorithm floods the regions from the markers, and watershed
lines are formed where the regions meet.
Result: The touching coins are successfully separated into distinct segments, with water-
shed lines marking the boundaries between them.
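A condensed sketch of this marker-based pipeline using SciPy and scikit-image (both assumed available); the two overlapping discs stand in for touching coins:

```python
import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Two overlapping discs stand in for touching coins.
yy, xx = np.mgrid[0:80, 0:80]
coins = ((xx - 30) ** 2 + (yy - 40) ** 2 < 15 ** 2) | \
        ((xx - 52) ** 2 + (yy - 40) ** 2 < 15 ** 2)

# Distance-transform peaks at the disc centres act as markers.
dist = ndimage.distance_transform_edt(coins)
peaks = peak_local_max(dist, min_distance=10)
markers = np.zeros(coins.shape, dtype=int)
markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

# Flood the negated distance map; dams form where the two floods meet.
labels = watershed(-dist, markers, mask=coins)
print(np.unique(labels))  # background label 0 plus (typically) one label per coin
```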
8.7 Applications
• Medical Imaging: Watershed segmentation is used to separate overlapping structures
such as cells, tissues, or anatomical features in medical images.
• Object Detection: Helps in detecting and separating closely connected objects in im-
ages, such as leaves, coins, or industrial components.
Unit 5: Image Compression and Representation
Syllabus:
• Huffman Coding
• Shift Codes
• Arithmetic Coding
• JPEG Standard
• MPEG Standard
• Boundary Representation
• Boundary Description
• Fourier Descriptor
IMAGE COMPRESSION
What is Image Compression?
1. Image compression refers to the process of reducing the amount of data required to represent a digital image.
2. The goal is to minimize the file size of the image while maintaining acceptable image quality.
3. Compression can be classified into lossless and lossy methods. Lossless preserves original data, while lossy sacrifices some data for higher compression rates.
4. Image compression reduces storage space, making it easier to manage and store large image datasets.
5. Compression is often used to lower the costs of storing and transmitting large images in sectors like healthcare, satellites, and multimedia.
6. Efficient image compression is vital for applications like video streaming, where bandwidth and storage costs are critical factors.
What is Data Compression and Why Do We Need It? Data compression involves
encoding information using fewer bits than the original representation. It helps in reducing the
size of data for storage, transmission, and efficient use of bandwidth.
Why Do We Need Data Compression?
• To reduce the time and bandwidth required for data transmission over networks.
• Compression helps in making data more portable and shareable across different systems.
• Reducing data size enables faster processing, reducing both time and cost for large data-
driven applications.
Application Areas:
• Medical Imaging: In medical diagnostics, large images from MRI, CT scans, or X-rays
are compressed to reduce storage and enable quick transmission for remote analysis.
• Multimedia Storage and Streaming: Videos, images, and audio files are compressed
to provide real-time streaming services and reduce storage demands for platforms like
YouTube and Netflix.
• Security and Surveillance: Security cameras generate large amounts of image data
that need to be compressed for storage and monitoring purposes.
Image Compression Algorithms: There are several algorithms used to compress images, and each employs different techniques for efficient compression. Below are some of the most popular approaches:
1. Transform Coding:
• Transform coding converts the image into a transform domain (for example, with the DCT) where most of the signal energy is concentrated in a few coefficients.
• The less significant coefficients are quantized or discarded, which is the main source of compression in lossy schemes such as JPEG.
2. Entropy Coding:
2. Entropy Coding:
• Entropy coding is a lossless data compression technique that compresses data based
on the probability distribution of the symbols.
• It works by assigning shorter codes to more frequent symbols and longer codes to
less frequent ones, optimizing the overall size of the encoded data.
• Huffman coding and arithmetic coding are the most common types of entropy coding.
• This technique is often used in conjunction with other compression methods to op-
timize the final data size, such as after transform coding or predictive coding.
3. Predictive Coding:
• Predictive coding is a technique where the current pixel value is predicted from
neighboring pixel values, and only the difference (error) between the actual and
predicted values is encoded.
• By encoding the prediction errors, which typically have smaller values than the
original pixel intensities, the data can be more efficiently compressed.
• Lossless predictive coding involves exact predictions with no loss of data, whereas
lossy predictive coding allows some inaccuracies to improve compression.
• Predictive coding is widely used in lossless image formats like PNG and lossless
JPEG.
4. Layered Coding:
• Layered (scalable) coding encodes the image as a base layer plus one or more enhancement layers, so reconstruction quality improves progressively as more layers are decoded.
Huffman Coding: Huffman coding is a lossless data compression algorithm based on the
frequency of occurrence of a data item. The principle behind Huffman coding is to use shorter
codes for more frequent items and longer codes for less frequent items, resulting in efficient
compression.
• Huffman coding is based on constructing a binary tree where each leaf node represents a
data symbol.
• The most frequent symbols are given shorter binary codes, and the less frequent symbols
are given longer codes.
• The algorithm first calculates the frequency of each symbol in the dataset, then constructs
the binary tree by merging the two lowest-frequency nodes repeatedly until only one node
remains.
• Once the tree is built, each symbol is assigned a unique prefix-free binary code, which
ensures that no code is a prefix of another, making decoding unambiguous.
• Huffman coding is widely used in image compression algorithms like JPEG and PNG.
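A compact sketch of the tree construction described above, using Python's heapq to repeatedly merge the two lowest-frequency nodes (the sample string is illustrative):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Assign shorter prefix-free codes to more frequent symbols."""
    freq = Counter(data)
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two lowest-frequency nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, nxt, (left, right)))  # ...merged into one
        nxt += 1
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):           # internal node: recurse
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                 # leaf: record the symbol's code
            codes[node] = prefix or "0"
    assign(heap[0][2], "")
    return codes

codes = huffman_codes("AAAABBBCCD")
print(codes)  # e.g. {'A': '0', 'B': '10', 'D': '110', 'C': '111'}
```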
Run-Length Encoding (RLE):
1. RLE works by replacing consecutive repeated values (or runs) with the value and the number of times it is repeated.
2. It is highly efficient for data with long runs of the same value, such as simple images or text files with large blocks of identical characters.
3. The compression ratio depends on the nature of the data—better performance with repetitive data and poor performance with random data.
4. In an image, long horizontal lines of the same color can be significantly compressed using RLE.
5. RLE is simple to implement, making it popular for tasks like image compression in formats like BMP or TIFF.
6. A drawback of RLE is that it can increase file size for data with few runs, such as noisy or highly varied images.
Example: In a binary image with a black background and white text, RLE can efficiently compress the long black sequences.
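A minimal run-length encoder/decoder over a 1-D pixel row (the data is illustrative):

```python
def rle_encode(seq):
    """Replace each run with a (value, run_length) pair."""
    runs = []
    for v in seq:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1                 # extend the current run
        else:
            runs.append([v, 1])              # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

row = [0] * 12 + [1] * 3 + [0] * 9           # a mostly-black scan line
encoded = rle_encode(row)
print(encoded)                               # [(0, 12), (1, 3), (0, 9)]
assert rle_decode(encoded) == row            # lossless round trip
```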
Arithmetic Coding
Arithmetic coding is a lossless coding technique that represents a sequence of symbols as a single number in a continuous range of real numbers between 0 and 1.
1. Unlike Huffman coding, which assigns a fixed number of bits to each symbol, arithmetic coding encodes an entire message as a single fractional number.
2. Arithmetic coding is more efficient in cases where the symbol probabilities are not powers of two, allowing for more precise compression.
3. It works by subdividing the interval between 0 and 1 according to the probabilities of the symbols and then successively narrowing this interval based on the input sequence.
4. The final output is a number within the last narrowed range, representing the entire sequence of symbols.
5. Its main advantage over Huffman coding is its ability to handle sources with fractional probabilities more efficiently.
6. Arithmetic coding is computationally more complex than Huffman coding, but it offers better compression for highly skewed probability distributions.
Applications:
• Speech and audio compression, where symbol probabilities may vary significantly.
JPEG Standard
JPEG (Joint Photographic Experts Group) is a widely used method of lossy compression for
digital images, particularly photographs.
1. JPEG transforms the image into the frequency domain using the Discrete Cosine Trans-
form (DCT).
2. The image is divided into 8x8 blocks of pixels, and each block is transformed into frequency
components.
3. The DCT coefficients are quantized, reducing precision and leading to data loss.
4. High-frequency components, which contribute less to image quality, are more heavily
quantized.
5. The quantized coefficients are encoded using entropy coding such as Huffman or arithmetic
coding.
6. JPEG provides high compression ratios while maintaining acceptable visual quality, mak-
ing it ideal for photographs and web images.
7. One limitation is that multiple compression cycles degrade image quality due to cumula-
tive data loss.
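A toy version of the JPEG block pipeline, assuming SciPy's DCT routines; a single constant quantization step Q stands in for JPEG's quantization tables:

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(1)
block = rng.integers(0, 256, (8, 8)).astype(float) - 128  # level-shifted 8x8 block

coeffs = dctn(block, norm="ortho")         # frequency components of the block

# Coarse quantization: many small (mostly high-frequency) coefficients become 0.
Q = 40.0
quantized = np.round(coeffs / Q)

reconstructed = idctn(quantized * Q, norm="ortho") + 128   # lossy reconstruction
print(np.count_nonzero(quantized), "of 64 coefficients survive")
```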
MPEG Standard
MPEG (Moving Picture Experts Group) is a standard for compressing video files, commonly
used in streaming and storage applications.
1. MPEG exploits both spatial redundancy (within frames) and temporal redundancy (be-
tween frames) to achieve compression.
2. Instead of storing every frame completely, MPEG stores a reference frame and then only
the differences between successive frames.
3. It uses techniques like motion estimation to predict frame differences and reduce the data
required.
4. MPEG compression is divided into three types of frames: I-frames (intra-coded), P-frames
(predictive-coded), and B-frames (bi-directional coded).
5. I-frames are compressed independently, while P and B frames are encoded based on
differences with neighboring frames.
6. The use of temporal redundancy allows MPEG to achieve high compression ratios, making
it ideal for streaming video over the internet.
7. MPEG compression standards include MPEG-1, MPEG-2 (used in DVDs), and MPEG-4
(used in online video streaming platforms like YouTube).
Boundary Representation
Boundary representation involves describing the shape of an object by its edges or boundaries.
This is particularly useful for object recognition in images where the focus is on the shape of
the objects.
1. Minimum Perimeter Polygons: This method approximates the object's boundary using the smallest possible polygon, helping to simplify the shape while preserving essential features.
2. Boundary Segments: The boundary can be segmented into parts, each representing a specific curve or straight-line segment for further analysis.
3. Skeletons: Skeletonization reduces a shape to its central structure, capturing the general form of the object with minimal data.
Boundary Description
Boundary description refers to the detailed outline of the object’s shape, which can be repre-
sented using different mathematical descriptors. Two important descriptors include:
• Eccentricity: This measures how much the shape deviates from being circular. It is
calculated as the ratio of the major axis to the minor axis of the boundary’s fitted ellipse.
• Curvature: Curvature describes how much the boundary deviates from being straight.
It is the rate of change of direction along the boundary.
Fourier Descriptor
Fourier descriptors provide a compact and efficient representation of a boundary by transform-
ing its coordinates into the frequency domain using the Fourier transform.
1. Fourier descriptors are computed by applying the Fourier transform to the boundary
points, treating them as a sequence of complex numbers.
2. The resulting coefficients represent the boundary at different frequency components, with
low-frequency components capturing the general shape and high-frequency components
capturing finer details.
3. Truncating high-frequency components allows for smoothing the boundary, useful for
noise reduction and shape approximation.
4. Fourier descriptors are invariant to translation, rotation, and scaling, making them ideal
for object recognition tasks.
5. The descriptor is highly efficient in representing complex shapes with fewer coefficients
than other methods.
6. Fourier descriptors are widely used in image analysis for pattern recognition and shape
classification.
7. They also allow for shape reconstruction from a limited number of frequency components.
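A short sketch of Fourier descriptors with NumPy: the boundary points are treated as complex numbers, transformed, truncated to low frequencies, and inverted (the sampled ellipse is illustrative):

```python
import numpy as np

# Sample a closed boundary and treat each point (x_k, y_k) as x_k + j*y_k.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
boundary = (10 + 6 * np.cos(t)) + 1j * (10 + 3 * np.sin(t))   # an ellipse

F = np.fft.fft(boundary)                  # the Fourier descriptors
k = np.fft.fftfreq(len(F), d=1 / len(F))  # integer frequency indices
smooth = np.fft.ifft(np.where(np.abs(k) <= 4, F, 0))  # keep low frequencies only

# An ellipse has only three nonzero descriptors (k = 0, +1, -1), so this
# truncated reconstruction is essentially exact here.
print(np.max(np.abs(smooth - boundary)))  # ~1e-14
```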
Topological Features
Topological features describe the structural properties of an image region, independent of geo-
metric distortions. Important topological features include:
1. Euler Number: The Euler number is a topological invariant that measures the difference between the number of connected components and the number of holes in a region.
2. Total Hole Area: The total hole area is the sum of the areas of all the holes inside a connected region, providing insight into the shape and structure of the object.
Approaches for describing a region's texture include:
1. Statistical Approach: This approach uses statistical properties such as mean, variance, and standard deviation of pixel values within the region to describe the texture and appearance.
2. Spectral Approach: This method uses frequency-domain features derived from transformations like the Fourier transform or wavelet transform to capture the texture and spatial frequency components of the region.
Relational Descriptor
A relational descriptor describes the spatial relationship between objects or regions in an image. It captures how different parts of an image are positioned relative to each other, providing additional context beyond individual object properties. Common relational descriptors include the adjacency and relative position of regions.
Example: Consider a binary image with two regions, a top-left and a bottom-right square of "1"s. A boundary around the top-left square could be described by the binary sequence 11001100. The sequence helps describe the pattern of pixel transitions along the boundary.
Binary sequences are also used in run-length encoding, where consecutive pixels of the same
value are compressed by storing the length of each run of identical values. For example, the
binary sequence 11001100 could be encoded as (2, 2, 2, 2), representing two 1’s, two 0’s, two 1’s,
and two 0’s.
1. Probability model: Suppose the alphabet is {A, B, C} with P(A) = 0.5, P(B) = 0.3, P(C) = 0.2, giving the sub-ranges A: [0, 0.5), B: [0.5, 0.8), C: [0.8, 1).
2. Encoding a message: Let's encode the message "ABC". Start with the interval [0, 1). For each symbol, narrow the interval based on its probability range:
• For A, the interval becomes [0, 0.5) (since A occupies [0, 0.5) of the current interval).
• For B, the range within [0, 0.5) is reduced to [0.25, 0.4) (since B occupies [0.5, 0.8) of the current interval).
• For C, the final range within [0.25, 0.4) is [0.37, 0.4) (since C occupies [0.8, 1) of the current interval).
3. Final tag: The tag representing the message "ABC" is any number within the final interval [0.37, 0.4). For example, we could choose 0.375 as the tag.
4. Decoding: To decode the message, start with the tag 0.375 and reverse the process. Find which symbol's range contains the tag:
• 0.375 lies in [0, 0.5), so the first symbol is A.
• Narrow the interval to [0, 0.5) and continue. 0.375 lies in [0.25, 0.4) within this range, so the second symbol is B.
• Narrowing to [0.25, 0.4), the tag lies in the sub-range [0.37, 0.4), which corresponds to C, so the third symbol is C.
• Arithmetic coding is particularly useful when symbol probabilities are not powers of two,
leading to more precise compression.
• The final tag represents the entire sequence, allowing for efficient encoding of long mes-
sages.
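A minimal encoder/decoder for exactly the interval-narrowing procedure in this example (the symbol ranges match the ones used above):

```python
def arithmetic_encode(msg, ranges):
    """Narrow [low, high) by each symbol's sub-range; return a tag inside it."""
    low, high = 0.0, 1.0
    for sym in msg:
        span = high - low
        s_lo, s_hi = ranges[sym]
        low, high = low + span * s_lo, low + span * s_hi
    return (low + high) / 2, (low, high)

def arithmetic_decode(tag, n, ranges):
    """Reverse the process: find which sub-range contains the tag, n times."""
    low, high = 0.0, 1.0
    out = []
    for _ in range(n):
        span = high - low
        for sym, (s_lo, s_hi) in ranges.items():
            if low + span * s_lo <= tag < low + span * s_hi:
                out.append(sym)
                low, high = low + span * s_lo, low + span * s_hi
                break
    return "".join(out)

ranges = {"A": (0.0, 0.5), "B": (0.5, 0.8), "C": (0.8, 1.0)}
tag, interval = arithmetic_encode("ABC", ranges)
print(interval)                             # ~ (0.37, 0.4), as in the example
print(arithmetic_decode(0.375, 3, ranges))  # ABC
```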