Computer Vision Unit 1, 2

The document provides an overview of computer vision and image processing, detailing various techniques, types, and applications. It covers methods such as image enhancement, restoration, segmentation, and filtering, along with their practical uses in fields like medical imaging, remote sensing, and security. Additionally, it discusses thresholding techniques, including simple and adaptive thresholding, highlighting their pros and cons.

Module 1

What is computer vision?


Computer vision is a field of artificial intelligence (AI) that uses machine learning
and neural networks to teach computers and systems to derive meaningful
information from digital images, videos and other visual inputs—and to make
recommendations or take actions when they see defects or issues.
What is Image Processing?
Image processing is a method used to perform operations on an image to enhance it
or to extract useful information from it. It involves various techniques and
algorithms that process images in a digital format. This can include a range of tasks
such as improving the visual quality of images, detecting patterns, segmenting
objects, and transforming images into different formats. Image processing can be
used for both photos and video frames. The process usually involves steps such as
inputting the image, processing the image through various algorithms, and then
outputting the results in a format that is usable or can be further analyzed.
Types of Image Processing
1. Analog Image Processing
Analog image processing refers to techniques used to process images in their
analog form, such as photographs, printed pictures, or images captured on film.
This type of processing involves modifying images through physical or chemical
means. Before the advent of digital technology, all image processing was done
using analog methods. These methods are generally less flexible and more time-
consuming compared to digital techniques, but they have historical significance
and specific applications.
2. Digital Image Processing
Digital image processing involves the use of computer algorithms to perform
operations on digital images. Unlike analog processing, digital techniques offer
more flexibility, precision, and automation. Digital images are composed of pixels,
and processing these images involves manipulating pixel values to achieve the
desired effect. The use of digital processing is widespread due to its efficiency and
the vast array of tools and techniques available.
Image Processing Techniques
1. Image Enhancement
1. Contrast Adjustment
Contrast adjustment is a technique used to improve the visibility of features in an
image by enhancing the difference between the light and dark areas. This can be
achieved through methods like contrast stretching, which adjusts the intensity
values of pixels to span the full range of the histogram.
2. Histogram Equalization
Histogram equalization is a method used to enhance the contrast of an image by
transforming its intensity values so that the histogram of the output image is evenly
distributed. This technique improves the global contrast and is particularly useful
in images with backgrounds and foregrounds that are both bright or both dark.
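As an illustration (added here, not part of the original notes), the OpenCV sketch below applies global histogram equalization and CLAHE (a contrast-limited adaptive variant) to a grayscale image; the file path and CLAHE parameters are placeholder assumptions.

import cv2

# Load a grayscale image (the path is only an illustrative placeholder).
gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Global histogram equalization.
equalized = cv2.equalizeHist(gray)

# CLAHE: contrast-limited adaptive equalization, often better under uneven lighting.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
local_eq = clahe.apply(gray)

cv2.imwrite("equalized.jpg", equalized)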
3. Noise Reduction
Noise reduction techniques are used to remove unwanted random variations in
brightness or color, known as noise, from an image. Common methods include
median filtering, Gaussian smoothing, and bilateral filtering, each of which aims to
smooth the image while preserving important details.
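For reference, here is a minimal OpenCV sketch (an added illustration, with an assumed input path and typical parameter values) of the three noise-reduction filters mentioned above.

import cv2

img = cv2.imread("noisy.jpg")  # illustrative path

median = cv2.medianBlur(img, 5)                  # effective against salt-and-pepper noise
gaussian = cv2.GaussianBlur(img, (5, 5), 0)      # general-purpose smoothing
bilateral = cv2.bilateralFilter(img, 9, 75, 75)  # smooths while preserving edges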
2. Image Restoration
1. Deblurring
Deblurring techniques are used to restore sharpness to an image that has been
blurred due to factors like camera shake or motion. Methods such as inverse
filtering and Wiener filtering are commonly employed to reconstruct the original
image.
2. Inpainting
Inpainting involves reconstructing lost or deteriorated parts of an image. This
technique is often used for restoring old photographs, removing objects, or filling
in missing data. Algorithms for inpainting include patch-based methods and partial
differential equations (PDE) based methods.
3. Denoising
Denoising is the process of removing noise from an image while preserving its
details. Techniques such as wavelet thresholding and non-local means filtering are
used to achieve this, ensuring that the image quality is improved without losing
significant features.
3. Image Segmentation
1. Thresholding
Thresholding is a simple technique for segmenting an image by converting it into a
binary image. This is done by selecting a threshold value, and all pixels with
intensity values above the threshold are turned white, while those below are turned
black.
2. Edge Detection
Edge detection involves identifying the boundaries within an image. Techniques
like the Sobel, Canny, and Prewitt operators are used to detect edges by finding
areas of high intensity gradient.
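A minimal OpenCV sketch of edge-based segmentation (added for illustration; the image path and the two hysteresis thresholds are assumptions):

import cv2

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path

# Canny edge detection with two hysteresis thresholds (values are assumptions).
edges = cv2.Canny(gray, 100, 200)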
3. Region-Based Segmentation
Region-based segmentation divides an image into regions based on predefined
criteria. This can include methods like region growing, where adjacent pixels are
grouped based on similar properties, and watershed segmentation, which treats the
image like a topographic map.
4. Image Compression
1. Lossy Compression
Lossy compression reduces the size of an image file by permanently eliminating
certain information, especially redundant data. Techniques like JPEG compression
are used to significantly reduce file size at the cost of some loss in quality.
2. Lossless Compression
Lossless compression reduces the image file size without any loss of quality.
Methods such as PNG compression ensure that all original data can be perfectly
reconstructed from the compressed file.
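As a hedged illustration of the two compression styles (not from the original notes; the file names are placeholders), OpenCV exposes the codec settings through imwrite flags:

import cv2

img = cv2.imread("photo.png")  # illustrative path

# Lossy: JPEG with an explicit quality factor (0-100; lower = smaller file, more loss).
cv2.imwrite("photo_q60.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 60])

# Lossless: PNG with a compression level (0-9; affects size and speed, never quality).
cv2.imwrite("photo_lossless.png", img, [cv2.IMWRITE_PNG_COMPRESSION, 9])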
5. Image Synthesis
1. Texture Synthesis
Texture synthesis generates large textures from small sample images, ensuring that
the generated texture looks natural and continuous. This technique is widely used
in computer graphics and game design.
2. Image Generation
Image generation involves creating new images from scratch or based on existing
images using techniques such as generative adversarial networks (GANs). This can
be used in applications like creating realistic human faces or artistic images.
6. Feature Extraction
1. Shape and Texture Analysis
Shape and texture analysis techniques are used to identify and quantify the shapes
and textures within an image. Methods like edge detection, contour analysis, and
texture filters help in understanding the geometric and surface properties of objects
in the image.
2. Color Detection
Color detection involves identifying and segmenting objects based on their color
properties. Techniques such as color thresholding and color histograms are used to
analyze the color distribution and extract relevant features.
3. Pattern Recognition
Pattern recognition is the process of classifying input data into objects or classes
based on key features. Techniques such as neural networks, support vector
machines, and template matching are used to recognize patterns and make
classifications.
7. Morphological Processing
1. Dilation and Erosion
Dilation and erosion are basic morphological operations used to process binary
images. Dilation adds pixels to the boundaries of objects, making them larger,
while erosion removes pixels from the boundaries, making objects smaller.
2. Opening and Closing
Opening and closing are compound operations used to remove noise and smooth
images. Opening involves erosion followed by dilation, which removes small
objects and smooths contours. Closing involves dilation followed by erosion,
which fills small holes and gaps.
3. Morphological Filters
Morphological filters are used to process images based on their shapes. These
filters, including hit-or-miss transform and morphological gradient, are used to
extract relevant structures and enhance image features.
Applications of Image Processing
1. Medical Imaging
 MRI and CT Scans: Enhancing the clarity of MRI and CT scans for better
diagnosis and treatment planning.
 X-Ray Imaging: Improving the quality and detail of X-ray images to detect
fractures, tumors, and other anomalies.
 Ultrasound Imaging: Enhancing ultrasound images for more accurate
visualization of internal organs and fetal development.
2. Remote Sensing
 Satellite Imaging: Analyzing satellite images for applications like land use
mapping and resource monitoring.
 Aerial Photography: Using drones and aircraft to capture high-resolution
images for mapping and surveying.
 Environmental Monitoring: Monitoring environmental changes and natural
disasters using image analysis.
3. Industrial Inspection
 Quality Control: Automating the inspection process to ensure product
quality and consistency.
 Defect Detection: Detecting defects in manufacturing processes to maintain
high standards.
 Robotics Vision: Enabling robots to interpret and navigate their environment
using image processing techniques.
4. Security and Surveillance
 Facial Recognition: Identifying individuals by analyzing facial features for
security purposes.
 Object Detection: Detecting and identifying objects in surveillance footage
to enhance security measures.
 Motion Detection: Monitoring and detecting movement in video feeds for
security and surveillance.
5. Automotive Industry
 Autonomous Vehicles: Processing images from sensors to enable
autonomous driving.
 Traffic Sign Recognition: Identifying and interpreting traffic signs to assist
drivers and autonomous systems.
 Driver Assistance Systems: Enhancing driver safety with features like lane
departure warnings and collision avoidance.
6. Entertainment and Multimedia
 Photo and Video Editing: Enhancing and manipulating images and videos
for artistic and practical purposes.
 Virtual Reality and Augmented Reality: Creating immersive experiences by
integrating real-world images with virtual elements.
 Gaming: Enhancing graphics and creating realistic environments in video
games.
7. Document Processing
 OCR (Optical Character Recognition): Converting printed text into digital
text for easy editing and searching.
 Barcode and QR Code Scanning: Reading and interpreting barcodes and QR
codes for quick information retrieval.
 Document Enhancement and Restoration: Improving the quality of scanned
documents and restoring old or damaged documents.

Classical filtering operations


Filtering is a technique for modifying or enhancing an image. For example, you
can filter an image to emphasize certain features or remove other features. Filtering
is a neighborhood operation, in which the value of any given pixel in the output
image is determined by applying some algorithm to the values of the pixels in the
neighborhood of the corresponding input pixel.
Filtering
Filtering is the process of manipulating an image to alter its appearance. This can
be done by applying special effects, such as blurring, sharpening, or color
correction. Filters are used to modify the appearance of an image, making it look
more interesting or realistic. They can also be used to create a certain effect, such
as making an image look brighter, more vivid, or more detailed.
Types of Filters
There are several types of filters that can be used to modify an image. Some of the most common are blur, sharpen, edge detection, color correction, and noise reduction filters.
 Blur: Blur filters are used to soften the edges of an image, creating a more
subtle look. This can be used to make an image look more natural or to
reduce the visibility of distracting details.
 Sharpen: Sharpen filters are used to make an image appear sharper and
clearer. This can be used to make an image look more detailed or to make
the colors more vibrant.
 Edge Detection: Edge detection filters are used to accentuate the outlines of
an image. This can be used to make objects stand out more or to create a
more dramatic effect.
 Color Correction: Color correction filters are used to adjust the hue,
saturation, and brightness of an image. This can be used to make an image
look more realistic, or to create a specific color palette.
 Noise Reduction: Noise reduction filters are used to remove unwanted noise
from an image. This can be used to make an image look more natural or to
reduce the visibility of digital artifacts.
How to Use Filters?
Filters can be applied to an image using a variety of software programs. Many
programs, such as Adobe Photoshop and GIMP, include a range of filters that can
be used to modify an image. Some programs also have specialized filters for
specific tasks, such as noise reduction or color correction. When using filters, it is
important to be mindful of the effects they can have on an image. For example,
blur filters can make an image look softer, but can also reduce the visibility of
important details. Similarly, sharpen filters can make an image look sharper, but
can also make the colors look overly saturated. It is important to experiment with
different filters to find the right balance of effects.
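To make the idea of applying a filter concrete, here is a small, hedged OpenCV sketch (not from the original notes) that applies a sharpening kernel and an averaging (blur) kernel with cv2.filter2D; the image path and kernel values are illustrative.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")  # illustrative path

# A common 3x3 sharpening kernel: boosts the center pixel relative to its neighbours.
sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float32)
sharpened = cv2.filter2D(img, -1, sharpen_kernel)

# A simple 5x5 averaging (blur) kernel.
blur_kernel = np.ones((5, 5), dtype=np.float32) / 25.0
blurred = cv2.filter2D(img, -1, blur_kernel)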
Applications of Filtering
 Image processing: Filtering techniques can be used to enhance the clarity
and detail of an image. For instance, a low-pass filter can be used to reduce
noise, while a high-pass filter can be used to sharpen the image.
 Video processing: Filtering techniques can be used to reduce noise and
enhance the quality of a video. For instance, a low-pass filter can be used to
reduce background noise, while a high-pass filter can be used to sharpen the
video.
 Medical imaging: Filtering techniques can be used to enhance the clarity and
detail of medical images. For instance, a low-pass filter can be used to
reduce noise and a high-pass filter can be used to sharpen the image.
 3D rendering: Filtering techniques can be used to enhance the clarity and
detail of 3D renderings. For instance, a low-pass filter can be used to reduce
noise, while a high-pass filter can be used to sharpen the image.
 Image Enhancement: Filtering is used to enhance the visual appearance of
digital images. It can be used to improve contrast, sharpness, brightness, and
color balance. It can also be used to reduce noise and artifacts.
 Image Restoration: Filtering is also used for image restoration, which is the
process of restoring a degraded image to its original state. This is often done
by removing noise, sharpening the image, and correcting the color balance.
 Image Compression: Filtering can be used to reduce the size of an image
without compromising on its quality. This is achieved by eliminating
redundant information in the image.
 Image Segmentation: Filtering can also be used for image segmentation,
which is the process of separating an image into regions of interest. This is
often done by applying a filter to the image to identify regions with specific
characteristics.
Advantages of Filtering
 Improved clarity and detail: Filtering techniques can be used to improve the
clarity and detail of an image. Filtering can reduce noise and sharpen the
image, resulting in a more detailed and accurate picture.
 Enhanced contrast and definition: Filtering techniques can be used to add
contrast and definition to an image. Edge detectors can be used to detect
edges and enhance them, resulting in a more visually appealing image.
 Reduced file size: Filtering techniques can be used to reduce the size of an
image. By reducing the amount of noise in an image, the file size can be
significantly reduced. This can be beneficial for applications where file size
is a concern.
 Improved Image Quality: Filtering can improve the visual quality of digital
images by enhancing certain features, reducing noise, and eliminating
undesired artifacts.
 Reduced Image Size: Filtering can also be used to reduce the size of an
image without compromising on its quality. This is achieved by eliminating
redundant information in the image.
 Increased Efficiency: Filtering can increase the efficiency of computer
graphics operations, as it can reduce the amount of time and resources
needed to process an image.
 Increased Accuracy: Filtering can also increase the accuracy of computer
graphics operations by eliminating noise and artifacts that can lead to errors.

 3) Edge Detection — Edge detection image filters highlight the borders separating various areas or objects in a picture. In processes like object identification, picture segmentation, and feature extraction, edges must be detected.
 4) Image Enhancement — Filters may bring out particular details or qualities in an image. For instance, sharpening filters can enhance the sharpness and definition of edges, giving the appearance of greater detail in the image. By increasing the visual contrast between various components in the image, contrast enhancement filters can make some parts of the scene stand out more.
 5) Aesthetic Alterations — Image filtering may also be employed for aesthetic reasons, enabling photographers, designers, and artists to alter the look of their photos to create a certain visual aesthetic or emotional state. Filters can be used to produce artistic alterations, replicate film effects, and create retro aesthetics.
Thresholding techniques
Image thresholding is a technique in computer vision that converts a grayscale
image into a binary image by setting each pixel to either black or white based on a
specific threshold value.
Thresholding Techniques in Computer Vision
1. Simple Thresholding
Simple thresholding uses a single threshold value to classify pixel intensities. If a
pixel's intensity is greater than the threshold, it is set to 255 (white); otherwise, it is
set to 0 (black).
g(x, y) = 0 if I(x, y) ≤ T
g(x, y) = 255 if I(x, y) > T
where g(x, y) is the output (binary) pixel value.
In this formula:
 I(x,y) is the intensity of the pixel at coordinates (x, y).
 T is the threshold value.
 If the pixel intensity I(x,y) is less than or equal to the threshold T, the output
pixel value is set to 0 (black).
 If the pixel intensity I(x,y) is greater than the threshold T, the output pixel
value is set to 255 (white).
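A minimal OpenCV sketch of this rule (added for illustration; the image path and the threshold of 127 are assumptions):

import cv2

gray = cv2.imread("coins.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path

# Pixels above 127 become 255 (white); the rest become 0 (black).
T, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)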
Pros of Simple Thresholding
 Simple and easy to implement.
 Computationally efficient.
Cons of Simple Thresholding
 Ineffective for images with varying lighting conditions.
 Requires manual selection of the threshold value.
2. Adaptive Thresholding
Adaptive thresholding is used for images with non-uniform illumination. Instead of
a single global threshold value, it calculates the threshold for small regions of the
image, which allows for better handling of varying lighting conditions.
Types of Adaptive Thresholding
 Mean Thresholding: The threshold value is the mean of the neighborhood area.
o T(x, y) = (1 / |N|) Σ(i,j)∈N I(i, j)
o here,
o N is the neighborhood of (x, y)
o |N| is the number of pixels in the neighborhood
 Gaussian Thresholding: The threshold value is a weighted sum (Gaussian window) of the neighborhood area (both variants are sketched with OpenCV below).
o T(x, y) = Σ(i,j)∈N w(i, j) · I(i, j)
o here,
o w(i, j) are the weights given by the Gaussian window
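A hedged OpenCV sketch of both adaptive variants (added here; the image path, the 11x11 block size, and the offset C = 2 are assumed values):

import cv2

gray = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path

# Threshold = mean of an 11x11 neighbourhood minus the constant C = 2.
mean_thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                    cv2.THRESH_BINARY, 11, 2)

# Threshold = Gaussian-weighted neighbourhood mean minus C.
gauss_thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)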
Pros of Adaptive Thresholding
 Handles varying illumination well.
 More accurate for complex images.
Cons of Adaptive Thresholding
 More computationally intensive.
 Requires careful selection of neighborhood size and method parameters.
3. Otsu's Thresholding
Otsu's method is an automatic thresholding technique that calculates the optimal
threshold value by minimizing the intra-class variance (the variance within the
foreground and background classes).
Steps to perform Otsu's Thresholding
1. Compute the histogram and probabilities of each intensity level.
2. Compute the cumulative sums, means, and variances for all threshold
values.
3. Select the threshold that minimizes the within-class variance; equivalently, it maximizes the between-class variance:
 σb²(T) = ω1(T) · ω2(T) · (μ1(T) − μ2(T))²
 here,
o ω1 and ω2 are the probabilities (weights) of the two classes separated by the threshold T, and μ1 and μ2 are the means of these classes.
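In OpenCV, Otsu's method can be requested as a flag to cv2.threshold; a minimal sketch (the image path is a placeholder) follows.

import cv2

gray = cv2.imread("document.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path

# The threshold argument (0 here) is ignored; Otsu's method computes it automatically.
T, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", T)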
Pros of Otsu's Thresholding
 Automatic selection of the threshold value.
 Effective for bimodal histograms.
Cons of Otsu's Thresholding
 Assumes a bimodal histogram, which may not be suitable for all images.
 Computationally more intensive than simple thresholding.
4. Multilevel Thresholding
Multilevel thresholding extends simple thresholding by using multiple threshold
values to segment the image into more than two regions. This is useful for images
with complex structures and varying intensities.
Approaches of Multilevel Thresholding
 Otsu's Method Extension: Extending Otsu's method to multiple levels.
 Optimization Techniques: Using optimization algorithms to determine
multiple thresholds.
Pros of Multilevel Thresholding
 Can segment images into multiple regions.
 Useful for images with complex intensity distributions.
Cons of Multilevel Thresholding
 More computationally intensive.
 Requires careful selection of the number of thresholds.
5. Color Thresholding
In color images, thresholding can be applied to each color channel (e.g., RGB,
HSV) separately. This method leverages color information to segment objects.
Approaches of Color Thresholding
 Manual Thresholding: Setting thresholds for each color channel manually.
 Automatic Thresholding: Using methods like Otsu's method for each
channel.
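As a hedged illustration of color thresholding in HSV space (added here; the image path and the hue/saturation/value range are placeholder assumptions):

import cv2
import numpy as np

img = cv2.imread("fruit.jpg")  # illustrative path
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Keep pixels whose HSV values fall inside an illustrative "red-ish" range.
lower = np.array([0, 120, 70])
upper = np.array([10, 255, 255])
mask = cv2.inRange(hsv, lower, upper)
segmented = cv2.bitwise_and(img, img, mask=mask)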
Pros of Color Thresholding
 Effective for segmenting objects based on color.
 Can handle images with rich color information.
Cons of Color Thresholding
 More complex than grayscale thresholding.
 Requires careful selection of thresholds for each channel.
6. Local Thresholding
Local thresholding calculates a different threshold for each pixel based on its local
neighborhood. This method is effective for images with non-uniform illumination
or varying textures.
Techniques of Local Thresholding
1. Niblack's Method
 The threshold is the local mean plus a constant k times the local standard deviation; k is commonly chosen negative (around −0.2), so the threshold lies below the local mean. (A minimal NumPy/OpenCV sketch of this method appears after this list.)
 T(x, y) = μ(x, y) + k · σ(x, y)
 Here,
o μ(x, y) is the mean and σ(x, y) is the standard deviation of the local neighborhood
o k is a constant.
2. Sauvola's Method
 An improvement over Niblack's method that adjusts the constant factor
dynamically based on the mean and standard deviation.
 T(x, y) = μ(x, y) · [1 + k(σ(x, y)/R − 1)]
 Here,
o R is the dynamic range of the standard deviation
o k is a constant
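The sketch below (added for illustration) approximates Niblack's local threshold with box filters; the image path, the window size of 25, and k = −0.2 are assumed values, and the per-pixel mean and standard deviation are computed over the window.

import cv2
import numpy as np

gray = cv2.imread("manuscript.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)  # illustrative path

win, k = 25, -0.2  # window size and Niblack constant (assumed values)

# Local mean and standard deviation via box filtering of the image and its square.
mean = cv2.boxFilter(gray, -1, (win, win))
sqmean = cv2.boxFilter(gray * gray, -1, (win, win))
std = np.sqrt(np.maximum(sqmean - mean * mean, 0))

T = mean + k * std                        # per-pixel Niblack threshold
binary = (gray > T).astype(np.uint8) * 255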
Pros of Local Thresholding
 Handles non-uniform illumination well.
 More accurate for textured images.
Cons of Local Thresholding
 Computationally intensive.
 Sensitive to parameter selection.
7. Global Thresholding
Global thresholding uses a single threshold value for the entire image. This
technique is suitable for images with uniform lighting and clear contrast between
the foreground and background.
Pros of Global Thresholding
 Simple and easy to implement.
 Computationally efficient.
Cons of Global Thresholding
 Not suitable for images with varying illumination.
 Requires manual selection of the threshold value
8. Iterative Thresholding
Iterative thresholding starts with an initial guess for the threshold value and
iteratively refines it based on the mean intensity of the pixels above and below the
threshold. The process continues until the threshold value converges.
Steps to perform Iterative Thresholding
1. Choose an initial threshold value T0 (for example, the mean image intensity).
2. Segment the image into two classes C1 and C2 using the current threshold Tk.
3. Compute the mean intensities μ1 and μ2 of C1 and C2.
4. Update the threshold value:
 Tk+1 = (μ1 + μ2) / 2
5. Repeat steps 2-4 until |Tk+1 − Tk| < ε. (A short NumPy sketch of this loop follows below.)
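A minimal NumPy sketch of the iteration above (the image path and the tolerance ε = 0.5 are assumptions):

import cv2
import numpy as np

gray = cv2.imread("cells.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)  # illustrative path

T = gray.mean()   # initial guess T0
eps = 0.5         # convergence tolerance (assumed)

while True:
    mu1 = gray[gray <= T].mean()   # mean of class C1 (pixels at or below the threshold)
    mu2 = gray[gray > T].mean()    # mean of class C2 (pixels above the threshold)
    T_new = (mu1 + mu2) / 2.0
    if abs(T_new - T) < eps:
        break
    T = T_new

binary = (gray > T).astype(np.uint8) * 255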
Pros of Iterative Thresholding
 Provides an automatic way to determine the threshold.
 Suitable for images with a clear distinction between foreground and
background.
Cons of Iterative Thresholding
 May require several iterations to converge.
 Not effective for images with complex intensity distributions.
Applications of Thresholding
Thresholding techniques are used in various applications, including:
1. Document Image Analysis: Thresholding is widely used to binarize text in
scanned documents, making it easier for Optical Character Recognition
(OCR) systems to process the text.
2. Medical Imaging: In medical imaging, thresholding is used to segment
anatomical structures in MRI or CT scans, aiding in diagnosis and treatment
planning.
3. Industrial Inspection: Thresholding is employed in industrial inspection
systems to detect defects in manufactured products, ensuring quality control.
4. Object Detection: In surveillance footage or robotic vision systems,
thresholding is used to identify and track objects, enhancing security and
automation.

Edge Detection Techniques


Edge detection locates the presence and position of edges by finding abrupt changes in the intensity of an image. Several different operators are used in image processing to detect edges. An edge detector responds to variations in grey level, but it also responds quickly to noise. Edge detection is a very important task in image processing: it is a main tool in pattern recognition, image segmentation, and scene analysis. It is a type of filter applied to extract the edge points of an image. Sudden changes in intensity occur where an object contour crosses the image brightness.
In image processing, edges are interpreted as a class of singularity. For a continuous function, a singularity is characterized as a discontinuity at which the gradient approaches infinity. Since image data is in discrete form, the edges of an image are instead defined as the local maxima of the gradient.
Edges mostly exist between objects and other objects, between primitives and other primitives, and between objects and the background; object boundaries appear in the image as discontinuities.
Edge detection methods study how the grey level changes from one pixel to its neighbours. Edge detection is mostly used for the measurement, detection, and location of changes in image grey level. Edges are a basic feature of an image. The clearest parts of an object are its edges and lines, and with their help the structure of an object can be recovered. That is why extracting edges is a very important technique in graphics processing and feature extraction.
The basic idea behind edge detection is as follows:
1. Use an edge enhancement operator to highlight local edges.
2. Define the edge strength and set the edge points.
NOTE: edge detection performs poorly when the image is noisy or blurred.

There are four edge detection operators; they are as follows:

1. Sobel Edge Detection Operator
The Sobel edge detection operator extracts all of the edges in an image, without regard to their direction. The main advantage of the Sobel operator is that it provides both a differencing and a smoothing effect.
The Sobel operator is implemented as the combination of two directional edge responses, and the resulting image shows the edge outline of the original image.
The Sobel operator consists of a pair of 3x3 convolution kernels: Gx, and Gy, which is Gx rotated by 90°.
These kernels are applied separately to the input image so that a separate gradient measurement is produced in each orientation, i.e. Gx and Gy.
The gradient magnitude is given by:
|G| = √(Gx² + Gy²)
Since it is much faster to compute, an approximate magnitude is often used instead:
|G| ≈ |Gx| + |Gy|
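As a rough illustration (added here, not part of the original text), the following OpenCV/NumPy sketch computes the Sobel gradients Gx and Gy and both forms of the magnitude; the image path is a placeholder.

import cv2
import numpy as np

gray = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path

gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient Gx
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient Gy

magnitude = np.sqrt(gx**2 + gy**2)               # |G| = sqrt(Gx^2 + Gy^2)
approx = np.abs(gx) + np.abs(gy)                 # faster approximation |Gx| + |Gy|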
2. Robert's Cross Operator
Robert's cross operator performs a simple and quick-to-compute 2-D spatial gradient measurement on an image. In the output of Robert's cross operator, each pixel value represents the estimated absolute magnitude of the spatial gradient of the input image at that point.
Robert's cross operator consists of a pair of 2x2 convolution kernels: Gx, and Gy, which is Gx rotated by 90°.
The gradient magnitude is given by:
|G| = √(Gx² + Gy²)
Since it is much faster to compute, an approximate magnitude is often used:
|G| ≈ |Gx| + |Gy|

3. Laplacian of Gaussian
The Laplacian of Gaussian is a 2-D isotropic measure of the second spatial derivative of an image. The Laplacian highlights regions of rapid intensity change and is therefore used for edge detection. The Laplacian is applied to an image that has first been smoothed with a Gaussian filter in order to reduce its sensitivity to noise. The operator takes a single grey-level image as input and produces a single grey-level image as output.
The Laplacian L(x, y) of an image with pixel intensity values I(x, y) is:
L(x, y) = ∂²I/∂x² + ∂²I/∂y²
Since the input image is represented as a set of discrete pixels, a discrete convolution kernel that approximates the second derivatives in this definition must be found. Three discrete approximations are commonly used as Laplacian kernels.
The 2-D LoG function centred on zero with Gaussian standard deviation σ is:
LoG(x, y) = −(1/(πσ⁴)) · [1 − (x² + y²)/(2σ²)] · exp(−(x² + y²)/(2σ²))
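The LoG response can be approximated in OpenCV by Gaussian smoothing followed by the Laplacian; the sketch below is an added illustration with assumed kernel size and σ.

import cv2

gray = cv2.imread("text.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path

# Gaussian smoothing first reduces noise sensitivity; the Laplacian then responds
# to rapid intensity changes (zero crossings mark edges).
smoothed = cv2.GaussianBlur(gray, (5, 5), 1.4)
log_response = cv2.Laplacian(smoothed, cv2.CV_64F, ksize=3)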
4. Prewitt operator

The Prewitt operator is a differentiation operator used to calculate an approximation of the gradient of the image intensity function. At each point in the image, the Prewitt operator produces either the corresponding gradient vector or the norm of this vector. The image is convolved in the horizontal and vertical directions with small, separable, integer-valued filters, which makes the operator inexpensive in terms of computation.
Some Real-world Applications of Image Edge Detection:

 medical imaging, study of anatomical structure


 locate an object in satellite images
 automatic traffic controlling systems
 face recognition, and fingerprint recognition
Corner detection
To detect the corners of objects in an image, one can start by detecting edges then
determining where two edges meet. There are however other methods, among
which:
 the Moravec detector [Moravec 1980],
 the Harris detector [Harris & Stephens 1988].
Moravec detector
The principle of this detector is to observe if a sub-image, moved around one pixel
in all directions, changes significantly. If this is the case, then the considered pixel
is a corner.
Fig. 121 Principle of Moravec detector. From left to right : on a flat area, small
shifts in the sub-image (in red) do not cause any change; on a contour, we observe
changes in only one direction; around a corner there are significant changes in all
directions.
Mathematically, the change is characterized in each pixel (m,n) of the image
by Em,n(x,y) which represents the difference between the sub-images for an
offset (x,y):
Em,n(x, y) = Σu,v wm,n(u, v) · [f(u+x, v+y) − f(u, v)]²   for all m, n, x, y
where:
 x and y represent the offsets in the four directions: (x,y)∈{(1,0),(1,1),(0,1),
(−1,1)},
 wm,n is a rectangular window around pixel (m,n),
 f(u+x,v+y)−f(u,v) is the difference between the sub-image f(u,v) and the
offset patch f(u+x,v+y),
In each pixel (m,n), the minimum of Em,n(x,y) in the four directions is kept and
denoted Fm,n. Finally, the detected corners correspond to the local maxima
of Fm,n, that is, at pixels (m,n) where the smallest value of Em,n(x,y) is large.
It turns out that Moravec detector has several limitations. First, w is a binary
window and therefore the detector considers all pixels in the window with the same
weight. When the noise in the image is high, it can lead to false corner
detections. Second, only four directions are considered. Third, the detector
remains very sensitive to edges because only the minimum of E is considered.
For these reasons, Harris has proposed a detector to overcome these limitations.
Harris detector
The Harris corner detector is a corner detection operator that is commonly used in
computer vision algorithms to extract corners and infer features of an image.
Corner Detection : Corners are locations in images where a slight shift in the
location will lead to a large change in intensity in both horizontal (X) and vertical
(Y) axes.
Harris Corner Detector
The Harris Corner Detector algorithm in simple words is as follows:
STEP 1. It determines which windows (small image patches) produce very large
variations in intensity when moved in both X and Y directions (i.e. gradients).
STEP 2. With each such window found, a score R is computed.
STEP 3. After applying a threshold to this score, important corners are selected &
marked.

To avoid a noisy response, the rectangular window w of the Moravec detector is


replaced by a Gaussian window w in the expression of Em,n(x,y).
To extend the Moravec detector to all directions, not limited to the initial four
directions, a Taylor series expansion is performed on the shifted sub-
image f(u+x,v+y):
f(u+x, v+y) ≈ f(u, v) + x·∂xf(u, v) + y·∂yf(u, v).
Therefore:
Em,n(x, y) ≈ Σu,v wm,n(u, v) · [x·∂xf(u, v) + y·∂yf(u, v)]²
This expression can be written in the following matrix form:
Em,n(x, y) ≈ (x y) M (x y)ᵀ
where
M = Σu,v wm,n(u, v) · ( (∂xf)²     ∂xf·∂yf )
                      ( ∂xf·∂yf    (∂yf)²  )
Finally, the last limitation of the Moravec detector can be avoided by considering a new measure of the presence of a corner: more information about the intensity change in the window can be obtained by analyzing the eigenvalues λ1 and λ2 of the matrix M (Fig. 122). Indeed, a corner is present if the derivatives of f are very large; then M has large coefficients and its eigenvalues are also very large.

Fig. 122 Decision to be taken as a function of the eigenvalues.


The calculation of the eigenvalues of M can be difficult, so an alternative is to
calculate:
R = det(M) − k·(trace(M))² = λ1λ2 − k(λ1 + λ2)²
with 0.04 < k < 0.06.
Thus, the values of R are low in a flat region, negative on an edge, and positive on
a corner (Fig. 123).

Fig. 123 Decision to be taken as a function of R.


The Harris detector is illustrated on the example of Fig. 124.

Fig. 124 Harris detector. The binary images represent the negative (contours),
weak (flat areas) and positive (corners) values of the coefficient R.
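A hedged OpenCV sketch of the Harris detector described above (the image path, the block size, the Sobel aperture, k = 0.04, and the 1% response threshold are all assumed values):

import cv2
import numpy as np

img = cv2.imread("chessboard.jpg")  # illustrative path
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# Harris response R per pixel: neighbourhood size 2, Sobel aperture 3, k = 0.04.
R = cv2.cornerHarris(gray, 2, 3, 0.04)

# Mark pixels whose response is large (threshold is illustrative) in red.
img[R > 0.01 * R.max()] = [0, 0, 255]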
Morphological operations
Morphological operations are image-processing techniques used to analyze and
process geometric structures in binary and grayscale images. These operations
focus on the shape and structure of objects within an image. They are particularly
useful in image segmentation, object detection, and noise removal tasks.
Morphological operations aim to probe and transform an image based on its shape,
enhancing features or removing imperfections.
What are Morphological Operations?
Morphological operations are techniques used in image processing that focus on
the structure and form of objects within an image. These operations process images
based on their shapes and are primarily applied to binary images, but can also be
extended to grayscale images. The core idea is to probe an image with a
structuring element and modify the pixel values based on their spatial arrangement
and the shape of the structuring element. Key morphological operations include
erosion, dilation, opening, closing, and others, each serving distinct purposes in
enhancing and analyzing images.
Morphological operations rely on two key elements:
 The Input Image: Usually a binary image, where the objects of interest are
represented by foreground pixels (typically white) and the background by
background pixels (typically black). Grayscale images can also be processed
using morphological operations.
 The Structuring Element: A small matrix or kernel that defines the
neighborhood of pixels over which the operation is performed. The shape
and size of the structuring element can greatly influence the outcome of the
morphological operation.
Different Morphological Operations
1. Erosion
Erosion is a fundamental morphological operation that reduces the size of objects
in a binary image. It works by removing pixels from the boundaries of objects.
 Purpose: To remove small noise, detach connected objects, and erode
boundaries.
 How it Works: The structuring element slides over the image, and for each
position, if all the pixels under the structuring element match the foreground,
the pixel in the output image is set to the foreground. Otherwise, it is set to
the background.
2. Dilation
Dilation is the opposite of erosion and is used to increase the size of objects in an
image.
 Purpose: To join adjacent objects, fill small holes, and enhance features.
 How it Works: The structuring element slides over the image, and for each
position, if any pixel under the structuring element matches the foreground,
the pixel in the output image is set to the foreground.
3. Opening
Opening is a compound operation that involves erosion followed by dilation.
 Purpose: To remove small objects or noise from the image while preserving
the shape and size of larger objects.
 How it Works: First, the image undergoes erosion, which removes small
objects and noise. Then, dilation is applied to restore the size of the
remaining objects to their original dimensions.
4. Closing
Closing is another compound operation that consists of dilation followed by
erosion.
 Purpose: To fill small holes and gaps in objects while preserving their
overall shape.
 How it Works: First, dilation is applied to the image, filling small holes and
gaps. Then, erosion is applied to restore the original size of the objects.
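A short OpenCV sketch of the four operations above (added for illustration; the input path and the 5x5 square structuring element are assumptions):

import cv2
import numpy as np

binary = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # illustrative binary image

kernel = np.ones((5, 5), np.uint8)  # 5x5 square structuring element

eroded = cv2.erode(binary, kernel, iterations=1)
dilated = cv2.dilate(binary, kernel, iterations=1)
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion followed by dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation followed by erosion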
Useful link:
https://www.mathworks.com/help/images/morphological-dilation-and-erosion.html

Texture
What Is Texture Analysis In Computer Vision?
Texture is one of the major characteristics of image data and is used for identifying objects or regions of interest in an image.
In computer vision, we have to deal with different structural characteristics of image and video data, and texture is one of the most important of them. In this article, we will build an understanding of texture and texture analysis, and we will discuss some of the important steps that texture analysis requires. The major points to be discussed in this article are listed below.
Table of Contents
1. What is Texture?
2. What is Texture Analysis?
3. Challenges in Texture Analysis
4. Feature Extraction Method for Categorizing Textures
1. Feature extraction
2. Classification
5. Application of Texture Analysis
Let’s start the discussion by understanding the texture.
What is Texture?
There are two kinds of texture: tactile and optical. We perceive tactile texture by touching a surface, whereas optical (visual) texture refers to the shape and content of the image as it appears to the eye. Humans can easily recognize the texture of an image, but making a machine analyze the texture of an image has its complexities. In the field of image processing, we can consider the spatial variation of the brightness intensity of the pixels as the texture of the image.
In image processing, textural images are those in which a specific pattern of texture is repeated throughout the image. The image below is a representation of a textural image, where part (b) represents the repeated pattern of the texture.
Going through the above points, we can define and understand the
texture. Now the main aim of the article is to understand texture
analysis. In the next section, we will see how we can define and
generalize the texture analysis.
What is Texture Analysis?
Till now we have got an understanding of the texture in the image data.
The aim of the article is to discuss how machines understand the texture
of images so that machines can be capable of performing different tasks
of image processing. Understanding the texture of the images requires
texture analysis and we can consider texture analysis as a whole subject.
In a general view of texture analysis, we can identify the following areas.
Texture analysis can be categorized into four categories: texture segmentation, texture synthesis, texture classification, and texture shape extraction.
Let’s have a small discussion about the categories so that we will have a
proper vision about the areas of image processing where they can be
used.
Texture Segmentation: In image data, we can find differences between image areas in terms of texture. Texture segmentation finds the boundaries between the different textures in an image; we compare different areas of the image and, where their textural characteristics differ, separate them by assigning boundaries.
Texture Synthesis: In texture synthesis, we use methods to generate images that have a texture similar to the input images. This part of texture analysis is used in the creation of computer games and computer graphics.
Texture Shape Extraction: In this area, we try to extract the 3D view and surfaces of an image. Normally these areas are covered with a unique or specific texture. This area is useful for analyzing the shape and structure of objects in the image using the image's textural properties and the spatial relationships of textures with each other.
Texture Classification: We can consider this the most important part of texture analysis; it is responsible for describing the type of image texture. Texture classification is the process of assigning an unknown texture sample from the image to one of a set of predefined texture classes.
Here we have seen a basic introduction to the texture analysis and the
parts of the texture analysis. Now we are required to know the
challenges we may face in the texture analysis.
Challenges in Texture Analysis
Talking about the real world, we can face two major challenges in
texture analysis. These major challenges are as follows:
 Image rotation
 Image noise
These challenges can have various destructive effects on texture analysis and image classification: methods that are not robust to rotation and noise do not hold up, and in practice the performance of the process can drop severely. We therefore want to make the analysis and categorization of images robust and stable while neutralizing the effect of these challenges.
Images can also differ from each other in scale, viewpoint, brightness, or intensity of light, which likewise makes texture classification harder. Various methods have been introduced to reduce the effects of these challenges. We can also simply classify texture using feature extraction; let's take a look at feature extraction for categorizing texture.
Feature Extraction Method for Categorizing Textures
As we have discussed in the above section, the classification of texture is
one of the most important parts of texture analysis and the basic idea
behind it is to provide the labels to samples of any image according to
the class of texture. We can perform the classification using feature
extraction from the images. We can split the process into two parts as
follows:
1. The feature extraction part: In this part, we extract the textural properties of the images; the goal is to build a model for every texture that exists in the training images.
2. The classification part: In this part, we perform the same texture analysis on the test images that we applied to the training images, and then apply a classification algorithm, which can be a statistical or deep learning algorithm.
The images get examined by the feature extractor and then texture
classification is done by the classification algorithm. The basic
representation of the procedure can be given by the following image:
Let’s have a look at a more descriptive definition of these two parts.
Feature extraction
As we have discussed in the above points, the basic idea behind this part
is to extract texture features from the images, and for this procedure, we
are required to have a model for every texture available in the training
images. These features can be discrete histograms, numerical or empirical distributions, and texture features such as contrast, spatial structure, direction, etc. The extracted texture features can then be used to train a classifier. There are various ways to classify texture, and their efficiency depends on the type of texture features extracted. These methods can be divided into the following groups:
1. Statistical methods
2. Structural methods
3. Model-based methods
4. Transform-based methods
We can use any of the methods for extracting features from the images.
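As one concrete example of a statistical method (added for illustration, not from the original notes), the scikit-image sketch below computes gray-level co-occurrence matrix (GLCM) features from a tiny synthetic patch; note that older scikit-image versions spell the functions greycomatrix/greycoprops.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

# A tiny illustrative patch with 4 grey levels; in practice this would be an image region.
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]], dtype=np.uint8)

# Co-occurrence matrix for horizontal pixel pairs at distance 1.
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=4,
                    symmetric=True, normed=True)

contrast = graycoprops(glcm, "contrast")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
energy = graycoprops(glcm, "energy")[0, 0]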
Classification
In the second stage of the process, we classify the extracted texture features using a machine learning classification algorithm. The classification method selects an appropriate class for each texture: the feature vector extracted from a test image is compared with the class characteristics obtained in the training phase, and this comparison determines its class. This step is repeated for every image in the test set. The estimated classes are then compared with the actual classes and the recognition rate is calculated, which shows the efficiency of the implemented algorithm. The commonly used accuracy measure is:
Classification accuracy = (correct matches / number of test images) × 100
Here we have seen how the texture classification can be applied to the
images which is an important part of the texture analysis.
Application of Texture Analysis
In the above sections of the article, we have seen that textures present in
the image are the precious information that can be utilized for various
tasks related to image processing. Some of the tasks and applications
that can be performed using texture analysis are as follows:
1. Face detection
2. Tracking objects in the videos
3. Diagnosis of product quality
4. Medical image analysis
5. Remote sensing
6. Vegetation

Module 2
Transformation: Orthogonal, Euclidean, Affine and Projective.
In geometric transformations, "orthogonal" refers to a transformation that preserves angles and lengths; "Euclidean" encompasses rotations, translations, and reflections while preserving distances and angles; "affine" allows shearing and scaling in addition to Euclidean transformations while still preserving parallel lines; and "projective" is the most general type, allowing perspective distortion and not necessarily preserving parallelism, lengths, or angles, but still maintaining collinearity.

 Orthogonal Transformation: Rotating an object in 3D space without changing its size or shape.
 Euclidean Transformation: Moving and rotating an object on a screen while maintaining its size and shape.
 Affine Transformation: Skewing an image while still keeping parallel lines parallel.
 Projective Transformation: Modeling the perspective distortion seen in a camera image, where parallel lines appear to converge at a vanishing point.

 In computer vision, "orthogonal," "Euclidean," "affine," and "projective" transformations represent different levels of geometric complexity: orthogonal is the most restrictive (rotations and reflections about a fixed point), Euclidean adds translations, affine adds scaling and shearing, and projective encompasses the most general transformations, including perspective distortion, where parallel lines can appear to converge. Each type maintains specific geometric properties, such as angles and lengths, depending on the transformation.

 Orthogonal Transformation:
 Involves rotations (and reflections) about a fixed point, preserving distances and angles exactly.
 Represented by an orthogonal matrix whose determinant is +1 (a rotation) or −1 (a reflection).
 Euclidean Transformation:
 Includes rotations, translations, and reflections, maintaining distances and angles between points.
 Considered a subset of affine transformations.
 Affine Transformation:
 Allows for scaling, shearing, rotation, and translation, preserving parallelism of lines but not necessarily angles or lengths.
 Useful for tasks like image alignment where perspective distortion is not significant.
 Projective Transformation:
 The most general transformation, including perspective projection, which can make parallel lines appear to converge.
 Represented by a 3x3 homogeneous transformation matrix, allowing for mapping between arbitrary quadrilaterals.

Affine Transformations:
Affine transformations are the simplest form of transformation. These
transformations are also linear in the sense that they satisfy the following
properties:
 Lines map to lines
 Points map to points
 Parallel lines stay parallel
Some familiar examples of affine transforms
are translations, dilations, rotations, shearing, and reflections. Furthermore, any
composition of these transformations (like a rotation after a dilation) is another
affine transform.
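To connect these ideas to code, here is a hedged OpenCV sketch (the image path and the chosen point correspondences are purely illustrative) of an affine warp defined by three point pairs and a projective (perspective) warp defined by four point pairs.

import cv2
import numpy as np

img = cv2.imread("scan.jpg")  # illustrative path
h, w = img.shape[:2]

# Affine: three point correspondences define a 2x3 matrix (parallel lines stay parallel).
src3 = np.float32([[0, 0], [w - 1, 0], [0, h - 1]])
dst3 = np.float32([[0, 0], [w - 1, 0], [int(0.2 * w), h - 1]])  # a shear, for illustration
A = cv2.getAffineTransform(src3, dst3)
sheared = cv2.warpAffine(img, A, (w, h))

# Projective (homography): four correspondences define a 3x3 matrix; parallelism may be lost.
src4 = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
dst4 = np.float32([[0, 0], [w - 1, 50], [w - 1, h - 51], [0, h - 1]])
H = cv2.getPerspectiveTransform(src4, dst4)
warped = cv2.warpPerspective(img, H, (w, h))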
Fourier transform
The Fourier transform is a mathematical operation that analyzes the frequency
components of an image in computer vision. It's a useful tool for understanding the
features of an image or signal.
The Fourier Transform is a mathematical tool used to decompose a signal into its
frequency components. In the case of image processing, the Fourier Transform can
be used to analyze the frequency content of an image, which can be useful for tasks
such as image filtering and feature extraction.
In this article, we will discuss how to find the Fourier Transform of an image using
the OpenCV Python library. We will begin by explaining the basics of the Fourier
Transform and its application in image processing, then we will move on to the
steps involved in finding the Fourier Transform of an image using OpenCV.
Basics of the Fourier Transform
The Fourier Transform decomposes a signal into its frequency components by representing it as a sum of sinusoidal functions. For a signal x(t) represented as a function of time t, the Fourier Transform is given by the following equation:
X(f) = ∫ x(t) e^(−j2πft) dt, integrated over all time t
where X(f) is the Fourier Transform of the signal x(t), and f is the frequency in Hertz (Hz). The Fourier Transform can be thought of as a representation of the signal in the frequency domain, rather than the time domain.
In the case of image processing, the Fourier Transform can be used to analyze the
frequency content of an image. This can be useful for tasks such as image filtering,
where we want to remove certain frequency components from the image, or feature
extraction, where we want to identify certain frequency patterns in the image.
Steps to find the Fourier Transform of an image using OpenCV
Step 1: Load the image using the cv2.imread() function. This function takes in the
path to the image file as an argument and returns the image as a NumPy array.
Step 2: Convert the image to grayscale using the cv2.cvtColor() function. This is
optional, but it is generally easier to work with grayscale images when performing
image processing tasks.
Step 3: Use the cv2.dft() function to compute the discrete Fourier Transform of
the image. This function takes in the image as an argument and returns the Fourier
Transform as a NumPy array.
Step 4: Shift the zero-frequency component of the Fourier Transform to the center
of the array using the numpy.fft.fftshift() function. This step is necessary because
the cv2.dft() function returns the Fourier Transform with the zero-frequency
component at the top-left corner of the array.
Step 5: Compute the magnitude of the Fourier Transform using
the numpy.abs() function. This step is optional, but it is generally easier to
visualize the frequency content of an image by looking at the magnitude of the
Fourier Transform rather than the complex values.
Step 6: Scale the magnitude of the Fourier Transform using
the cv2.normalize() function. This step is also optional, but it can be useful for
improving the contrast of the resulting image.
Step 7: Use the cv2.imshow() function to display the magnitude of the Fourier
Transform.
Example 1
Here is the complete example of finding the Fourier Transform of an image
using OpenCV:
Input Image:

import cv2
import numpy as np

# Load the image and convert it to grayscale
image = cv2.imread(r"Dhoni-dive_165121_730x419-m.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Compute the discrete Fourier Transform of the image
fourier = cv2.dft(np.float32(gray), flags=cv2.DFT_COMPLEX_OUTPUT)

# Shift the zero-frequency component to the center of the spectrum
fourier_shift = np.fft.fftshift(fourier)

# Calculate the magnitude of the Fourier Transform
magnitude = 20 * np.log(cv2.magnitude(fourier_shift[:, :, 0], fourier_shift[:, :, 1]))

# Scale the magnitude for display
magnitude = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8UC1)

# Display the magnitude of the Fourier Transform
cv2.imshow('Fourier Transform', magnitude)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output image:
Fourier transform

In this example, we first load the image and convert it to grayscale using the
cv2.imread() and cv2.cvtColor() functions. Then, we compute the discrete Fourier
Transform of the image using the cv2.dft() function and store the result in the
‘fourier’ variable.

Convolution and Filtering


Convolution is one of the most important concepts in signal processing and a
precursor topic to understanding Convolutional Neural Networks. There is a lot of
complex mathematical theory that can be explored. In basic terms, a convolution is
a mathematical operation on two functions that produces a third function. In this
article, we explore an application of convolution in the realm of digital image
processing.
Just to review a bit: digital pictures are represented by pixel values. The typical format for storing a pixel is one byte, or eight bits, which can represent 2⁸ = 256 distinct values. Grayscale images have one channel on a scale from 0 to 255, where 0 is black and 255 is white. Colored images typically have 3 channels stored as 3 bytes: red, green, and blue (RGB) values, each ranging from 0 to 255 depending on intensity. A picture can therefore be represented as a matrix of values. The illustration below reveals the pixels as one zooms into the photo of Mr. Musk.

Greyscale Elon Zoomed up


Image convolution involves a kernel, a matrix that is applied over the input image's pixels to generate an output image. The kernel's size and values determine the effect it has on the image. The dimensions of the kernel should be smaller than or equal to those of the input image. The kernel is typically square and has an odd size, for convenience. How is the kernel applied in a convolution? The first step is that the kernel is flipped both horizontally and vertically; this follows from the definition of a convolution. Using a non-flipped kernel would be a cross-correlation rather than a convolution. For a symmetric kernel, the cross-correlation is equivalent to the convolution. Flipping or not flipping the kernel generally does not have a large visual impact on the resulting image. For the rest of the article, we assume that the kernel visualizations have already been flipped.

Flipped Kernel values Visualized


The animation below visually demonstrates how a 3x3 kernel is applied over a 5x5
input image generating a 3x3 output image. Note that the kernel slides along the
input image.
Convolution Visualized (Kernel already flipped)
The output image pixels are calculated by performing an element-by-element multiplication between the kernel and the covered section of the input image and then summing the products. Given an example kernel and input image, an example of the calculation is shown below for the first pixel. In the example, small numbers are intentionally used for ease of calculation. Additionally, it's important to note that the output pixels of a convolution can take values outside of 0–255. In some cases, it may be useful to normalize the results through histogram equalization or clip the pixels to the nearest valid value (0 or 255).
Mathematically, convolution in two dimensions is defined as follows:

y(n,m) = Σₖ Σₗ K(k,l) · x(n−k, m−l)

Here, k and l are the row and column indices of the kernel K, x(n,m) is the input
image, y(n,m) is the output image, and n, m are the row and column indices of the
input and output images.
Notice that the output image size is smaller than the input image size. A larger
kernel size would further decrease the output image dimensions. One way to fix
this downsizing is to pad the input image. You can populate the padded image by
extending the pixel values at the edge. Extending the edge pixels is one of many
methods of padding. Below shows the input image padded by 1 pixel. The padded
pixels are outlined in blue dotted lines.
Convolution computation illustrated.
The updated illustration with padding is shown below. Now, the output image has
the same dimension as the original input image.

Convolution with Padding.
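
To make the computation concrete, here is a minimal NumPy sketch (not from the original text) of the convolution just described, including the kernel flip and edge-replication padding; the function name convolve2d is purely illustrative.

import numpy as np

def convolve2d(image, kernel):
    # Flip the kernel horizontally and vertically, per the definition of convolution
    kernel = np.flipud(np.fliplr(kernel))
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2

    # Pad by extending the edge pixels so the output keeps the input size
    padded = np.pad(image.astype(np.float64), ((pad_h, pad_h), (pad_w, pad_w)), mode='edge')

    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Element-by-element multiplication with the covered region, then sum
            region = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)

    # Convolution can produce values outside 0-255, so clip before display
    return np.clip(out, 0, 255).astype(np.uint8)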


The values of the kernels have differing effects on the output image. Using an
example image of the dog shown below, here are some resulting images produced
by the following kernels.

Base Image
The following kernel sharpens the image.

Sharpening
One application of convolutional filtering is edge detection. This kernel is a
diagonal edge-detecting kernel: it responds strongly to changes in color along a diagonal.

Diagonal Edge Detector


This is a blurring kernel. The output pixels are determined by a combination of the
central pixel and neighboring pixels.
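
The kernel values above were shown as figures, so as an illustrative sketch here are commonly used kernels of each type applied with OpenCV; the exact values in the original figures may differ, and cv2.filter2D technically performs correlation, which matches convolution once the kernel has been flipped (or is symmetric).

import cv2
import numpy as np

# Placeholder file name for the base image of the dog
image = cv2.imread('dog.jpg')

# A commonly used 3x3 sharpening kernel
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float32)

# A diagonal edge-detecting kernel: responds to intensity changes along one diagonal
diag_edge = np.array([[-1, -1,  2],
                      [-1,  2, -1],
                      [ 2, -1, -1]], dtype=np.float32)

# A 3x3 box blur: each output pixel is the average of its 3x3 neighbourhood
blur = np.ones((3, 3), dtype=np.float32) / 9.0

sharpened = cv2.filter2D(image, -1, sharpen)
edges = cv2.filter2D(image, -1, diag_edge)
blurred = cv2.filter2D(image, -1, blur)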

Histogram Processing
Histogram processing is a fundamental technique in digital image processing that
plays a crucial role in enhancing the visual quality and improving the
interpretability of images. It involves the analysis and manipulation of the
distribution of pixel intensities within an image, as represented by its
histogram. The histogram of an image is a graphical representation that displays
the frequency of each intensity level present in the image, effectively illustrating
the image’s contrast, brightness, and overall tonal characteristics.
Histogram equalization, a specific form of histogram processing, is a method used
to enhance the contrast and dynamic range of images by redistributing pixel
intensities. It achieves this by modifying the image’s histogram in such a way that
the intensities become more uniformly distributed, stretching or compressing the
intensity levels to cover the entire available range. This not only results in
improved image quality but also facilitates better feature extraction and object
detection in computer vision applications.
In this exploration of histogram processing, we will delve into the principles and
techniques involved in histogram manipulation, with a particular focus on
histogram equalization. We will examine the mathematical foundations,
algorithms, and practical applications of these methods, shedding light on their
significance in various fields, such as medical imaging, remote sensing, and
computer vision. By the end of this discussion, you will have a comprehensive
understanding of how histogram processing can be a powerful tool for enhancing
and analyzing digital images.

Histogram
A histogram is a graphical representation or visual display that shows the
distribution of data. It is commonly used in statistics, data analysis, and various
fields to illustrate the frequency or occurrence of different values within a dataset.
Histograms provide a way to understand the shape, central tendency, and spread of
data, making it easier to identify patterns and trends.
In the context of digital image processing, a histogram is a specific representation
that displays the frequency of each intensity level or color value within an image. It
is a fundamental tool for analyzing the tonal characteristics of an image, showing
how many pixels in the image have a particular intensity value. This information
can be used to evaluate image contrast, brightness, and overall visual quality.
A basic histogram typically consists of bars or bins that correspond to different
intensity or value ranges on the x-axis, while the frequency of pixels falling within
each range is displayed on the y-axis. By examining the shape and distribution of
the histogram, one can gain insights into the image’s characteristics, which can be
useful for image enhancement, processing, and analysis.
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load an image
image = cv2.imread('pic.jpeg')

# Convert the image to grayscale (if not already)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Calculate the histogram
hist = cv2.calcHist([gray_image], [0], None, [256], [0, 256])

# Reshape the histogram for plotting
hist = hist.reshape(-1)

plt.figure(figsize=(10, 5))  # Create a figure with a specified size

plt.subplot(1, 2, 1)  # Subplot for the original image
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.title('Original Image')

plt.subplot(1, 2, 2)  # Subplot for the histogram
plt.title('Histogram')
plt.xlabel('Pixel Value')
plt.ylabel('Frequency')
plt.plot(hist, color='black')
plt.xlim([0, 256])
plt.show()

Histogram of an image
Histogram Equalization
Histogram equalization is a technique used in image processing to enhance the
contrast and dynamic range of an image. It works by redistributing the pixel
intensities in such a way that they become more uniformly distributed across the
entire available intensity range. This can result in an image with improved visual
quality and enhanced details, making it particularly useful for various computer
vision and image analysis applications.
Histogram equalization (illustration: Wikipedia)
Histogram equalization applications
1. Medical Imaging: Histogram equalization is used to enhance medical
images, such as X-rays, MRIs, and CT scans, to make subtle details more
visible. This can aid in the detection of anomalies and improve the accuracy
of diagnoses.
2. Satellite and Remote Sensing: In satellite imagery and remote sensing,
histogram equalization can improve the visibility of features on the Earth’s
surface, which is vital for applications like land cover classification and
environmental monitoring.
3. Computer Vision: Histogram equalization is used in computer vision tasks,
such as object detection, recognition, and tracking, to enhance the contrast in
images and make objects and features more distinguishable.
4. Photography and Image Editing: In image editing software, histogram
equalization can be applied to adjust the contrast and brightness of
photographs. This can be especially useful when dealing with underexposed
or overexposed images.
5. Historical Image Restoration: For restoring old and deteriorated images,
histogram equalization can be used to enhance image quality, recover
details, and make historical documents and photos more accessible and
legible.
6. Astronomy: In astronomical image processing, histogram equalization can
bring out details in images of celestial objects, making it easier to study and
analyze distant galaxies, stars, and other astronomical phenomena.
7. Enhancing Low-Light Images: Images taken in low-light conditions can
suffer from poor visibility and high noise. Histogram equalization can help
improve the quality and reveal details in such images.
8. Ultrasound Imaging: In medical ultrasound imaging, histogram equalization
can be applied to enhance the visibility of structures within the body, aiding
in diagnosis.
9. Forensic Analysis: In forensic science, histogram equalization can be used to
enhance surveillance footage and images to better identify individuals and
objects.
10. Document Scanning and OCR: When scanning documents, histogram
equalization can enhance the text and illustrations, making it easier for
optical character recognition (OCR) software to accurately extract text.
11. Enhancing Historical and Cultural Artifacts: For preserving and studying
historical manuscripts, paintings, and artifacts, histogram equalization can
be applied to reveal faded or degraded details.
How to apply Histogram equalization
Having seen the applications of histogram equalization, let us now go through the
calculation step by step.
Calculate the Histogram:

H(k) = Σ₍ₓ,ᵧ₎ δ(I(x,y), k), for k = 0, 1, …, L−1

Here, δ is the Kronecker delta function, which is 1 if I(x,y) = k and 0 otherwise. This
formula computes the histogram H(k) by counting the frequency of each intensity
level k in the image.
Calculate the Cumulative Distribution Function (CDF)

C(k) = (1/N) Σⱼ₌₀ᵏ H(j)

Here, N is the total number of pixels in the image. This formula computes the
CDF C(k) by summing up the relative frequencies of intensity levels from 0 to k.
Histogram Equalization Mapping
The equalization mapping function E(k) maps the original intensity levels to new
levels, and it is given by

E(k) = round( (C(k) − C_min) / (1 − C_min) × (L − 1) )

Here, C_min is the minimum non-zero value of the CDF, and L is the number of
possible intensity levels (256 for an 8-bit image). This function scales the CDF
values to cover the full intensity range (0 to L−1).
Apply Equalization
Finally, the equalized image I_eq(x,y) is created by applying the equalization
mapping to the original image:

I_eq(x,y) = E(I(x,y))
The result is an image with improved contrast due to the redistribution of pixel
intensities, which can enhance the visual quality and reveal details that might be
obscured in the original image. This technique is widely used in image processing
and computer vision for various applications, such as image enhancement and
feature extraction.
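
As a hedged illustration of the four steps above (the helper name equalize_hist_manual is hypothetical), the mapping can be written directly in NumPy; OpenCV's cv2.equalizeHist() used in the snippet below performs the same operation.

import numpy as np

def equalize_hist_manual(image):
    # Hand-rolled histogram equalization for an 8-bit grayscale image
    L = 256
    N = image.size

    # Step 1: histogram H(k) - count of pixels at each intensity level
    hist = np.bincount(image.ravel(), minlength=L)

    # Step 2: normalized cumulative distribution function C(k)
    cdf = np.cumsum(hist) / N

    # Step 3: equalization mapping E(k), scaled to the full 0..L-1 range
    cdf_min = cdf[cdf > 0].min()
    mapping = np.round((cdf - cdf_min) / (1 - cdf_min) * (L - 1))
    mapping = np.clip(mapping, 0, L - 1).astype(np.uint8)

    # Step 4: apply the mapping to every pixel
    return mapping[image]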
import cv2
import matplotlib.pyplot as plt
import numpy as np

# Load a grayscale image
image = cv2.imread('low_brighness.jpg', cv2.IMREAD_GRAYSCALE)

# Apply histogram equalization
equalized_image = cv2.equalizeHist(image)

# Calculate histograms for the original and equalized images
hist_original = cv2.calcHist([image], [0], None, [256], [0, 256])
hist_equalized = cv2.calcHist([equalized_image], [0], None, [256], [0, 256])

# Plot the original and equalized images along with their histograms
plt.figure(figsize=(12, 8))

plt.subplot(2, 2, 1)
plt.title('Original Image')
plt.imshow(image, cmap='gray')

plt.subplot(2, 2, 2)
plt.title('Equalized Image')
plt.imshow(equalized_image, cmap='gray')

plt.subplot(2, 2, 3)
plt.title('Original Histogram')
plt.plot(hist_original, color='black')
plt.xlim([0, 256])

plt.subplot(2, 2, 4)
plt.title('Equalized Histogram')
plt.plot(hist_equalized, color='black')
plt.xlim([0, 256])

plt.show()
Module 3
Basics of Edge
Edges can be defined as the points in an image where the intensity of pixels
changes sharply. These changes often correspond to the physical boundaries of objects
within the scene.
Characteristics of Edges
1. Gradient Magnitude: The edge strength is determined by the gradient
magnitude, which measures the rate of change in intensity.
2. Gradient Direction: The direction of the edge is perpendicular to the
direction of the gradient, indicating the orientation of the boundary (see the
short sketch after this list).
3. Localization: Edges should be well-localized, meaning they should
accurately correspond to the true boundaries in the image.
4. Noise Sensitivity: Edges can be affected by noise, making it essential to use
techniques that can distinguish between actual edges and noise.
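
As a brief, hand-rolled illustration of gradient magnitude and direction (the file name below is a placeholder), the Sobel operator can be used to estimate both quantities:

import cv2
import numpy as np

# Load a grayscale image (placeholder file name)
image = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

# Horizontal and vertical derivatives via the Sobel operator
gx = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude (edge strength) and gradient direction;
# the edge itself runs perpendicular to the gradient direction
magnitude = np.sqrt(gx ** 2 + gy ** 2)
direction = np.arctan2(gy, gx)  # in radians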
Types of Edge Detection
Edge detection techniques can be broadly categorized based on the method they
use to identify edges. Here are the main types:
1. Gradient-Based Methods
 Sobel Operator
 Roberts Cross Operator
 Prewitt Operator
2. Second-Order Derivative Methods
 Laplacian of Gaussian (LoG)
 Difference of Gaussians (DoG)
3. Optimal Edge Detection
 Canny Edge Detector
Laplacian of Gaussian (LoG)
The Laplacian of Gaussian (LoG) is a method used to detect edges in an image. It
involves smoothing the image with a Gaussian filter to reduce noise, followed by
applying the Laplacian operator to highlight regions of rapid intensity change. This
combination allows for effective edge detection while minimizing the impact of noise.
Mathematical Formulation
1. Gaussian Smoothing: The image is first smoothed using a Gaussian filter to
reduce noise. The Gaussian filter is defined as:
G(x,y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²)), where σ is the standard
deviation of the Gaussian.
2. Laplacian Operator: The Laplacian operator is then applied to the
smoothed image. The Laplacian is defined as:
∇²f(x,y) = ∂²f/∂x² + ∂²f/∂y²
3. LoG: The combined LoG operator is the result of convolving the Gaussian-
smoothed image with the Laplacian:
LoG(x,y) = ∇²(G(x,y) ∗ I(x,y))
Advantages
 Reduces noise through Gaussian smoothing before edge detection.
 Effective at detecting edges of various orientations and scales.
Disadvantages
 Computationally intensive due to the convolution operations.
 Sensitive to the choice of σ (the standard deviation of the Gaussian).
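
A minimal OpenCV sketch of this pipeline, assuming placeholder file name and parameter values; cv2.GaussianBlur and cv2.Laplacian stand in for the Gaussian smoothing and Laplacian steps described above.

import cv2

# Load a grayscale image (placeholder file name)
image = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

# Step 1: Gaussian smoothing to suppress noise
smoothed = cv2.GaussianBlur(image, (5, 5), 1.4)

# Step 2: Laplacian of the smoothed image highlights rapid intensity changes
log_edges = cv2.Laplacian(smoothed, cv2.CV_64F, ksize=3)

# Convert back to an 8-bit image for display
log_edges = cv2.convertScaleAbs(log_edges)
cv2.imshow('LoG edges', log_edges)
cv2.waitKey(0)
cv2.destroyAllWindows()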
Difference of Gaussian (DoG)
The Difference of Gaussian (DoG) is an edge detection technique that
approximates the Laplacian of Gaussian by subtracting two Gaussian-blurred versions
of the image with different standard deviations. This method is simpler and faster to
compute than LoG while providing similar edge detection capabilities.
Mathematical Formulation:
1. Gaussian Smoothing: The image is smoothed using two Gaussian filters
with different standard deviations, σ1 and σ2:
G1(x,y) = (1 / (2πσ1²)) · e^(−(x² + y²) / (2σ1²)),
G2(x,y) = (1 / (2πσ2²)) · e^(−(x² + y²) / (2σ2²))
2. Difference of Gaussian: The DoG is computed by subtracting the two
Gaussian-blurred images:
DoG(x,y) = (G1(x,y) − G2(x,y)) ∗ I(x,y)
Advantages
 Computationally more efficient than LoG.
 Provides good edge detection by approximating the Laplacian of Gaussian.
Disadvantages
 Less accurate than LoG due to the approximation.
 Sensitive to the choice of the standard deviations (σ1 and σ2) for the
Gaussian filters.
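
A short sketch of DoG along the same lines, with placeholder file name and sigma values:

import cv2

# Load a grayscale image (placeholder file name)
image = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

# Blur the image with two different standard deviations (kernel size derived from sigma)
blur_small = cv2.GaussianBlur(image, (0, 0), 1.0)
blur_large = cv2.GaussianBlur(image, (0, 0), 2.0)

# The difference of the two blurred images approximates the Laplacian of Gaussian
dog = cv2.subtract(blur_small, blur_large)

cv2.imshow('DoG edges', dog)
cv2.waitKey(0)
cv2.destroyAllWindows()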
Canny Edge Detector
The Canny Edge Detector is a multi-stage algorithm known for its accuracy and
robustness in detecting edges. Introduced by John Canny in 1986, this method aims to
find edges by looking for the local maxima of the gradient of the image. It optimizes the
edge detection process based on three criteria: low error rate, good localization, and
minimal response to noise.
Steps Involved:
1. Smoothing: The first step involves reducing noise in the image using a
Gaussian filter: G(x,y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²)). The image is
convolved with this Gaussian kernel to produce a smoothed image.
2. Finding Gradients: The gradients of the smoothed image are computed
using finite difference approximations, typically with the Sobel operators:
Gx = [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]], Gy = [[−1, −2, −1], [0, 0, 0], [1, 2, 1]]
The gradient magnitude and direction are then computed as:
G = √(Gx² + Gy²), θ = tan⁻¹(Gy / Gx).
3. Non-Maximum Suppression: This step involves thinning the edges by
suppressing non-maximum gradient values. Only the local maxima in the
direction of the gradient are preserved, resulting in a set of thin edges.
4. Double Thresholding: Two thresholds, T_low and T_high, are applied to
classify the gradient magnitudes into strong, weak, and non-relevant pixels:
 Strong edges: G ≥ T_high
 Weak edges: T_low ≤ G < T_high
 Non-relevant pixels: G < T_low
5. Edge Tracking by Hysteresis: Weak edges connected to strong edges are
preserved, while others are discarded. This step ensures continuity and
accuracy in edge detection by linking weak edge pixels that form a
continuous line with strong edges.
Advantages
 High accuracy and robustness to noise.
 Good localization of edges.
 Produces thin, well-defined edges.
Disadvantages
 Computationally intensive due to multiple processing steps.
 Sensitive to the choice of thresholds for double thresholding.
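
A brief usage sketch (placeholder file name and threshold values); the same cv2.Canny call appears again in the Hough transform example later in this module.

import cv2

# Load a grayscale image (placeholder file name)
image = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

# Smoothing is done explicitly; cv2.Canny then handles the gradient computation,
# non-maximum suppression, double thresholding, and hysteresis steps
smoothed = cv2.GaussianBlur(image, (5, 5), 1.4)

# The two numeric arguments play the role of T_low and T_high described above
edges = cv2.Canny(smoothed, 50, 150)

cv2.imshow('Canny edges', edges)
cv2.waitKey(0)
cv2.destroyAllWindows()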
Line detectors (Hough Transform)
The Hough Transform is a widely applied algorithm in computer
vision for feature extraction. In theory, it can detect any kind of
shape, e.g. lines, circles, ellipses, etc.
The Hough transform in its simplest form can be used to detect
straight lines in an image.
Algorithm
A straight line is the simplest boundary we can recognize in an
image, and multiple straight lines can form a much more complex boundary.
We transform the image space into Hough space; by doing this
we convert a line in image space to a point in Hough space.

The equation of a line in image space is of the form y = mx + c, where m is the
slope and c is the y-intercept. This line is transformed to a point of the form
(m, c) in Hough space. But in this representation, m goes to infinity for vertical
lines, so we use polar coordinates instead.
In polar form, the line is represented by ρ, the length of the perpendicular
segment from the origin to the line, and the angle θ that this segment makes
with the x-axis. The line is transformed to a point of the form (ρ, θ) in Hough space.
The Hough transform constructs a histogram array representing
the parameter space (i.e., an M x N matrix, for M different values of
the radius ρ and N different values of the angle θ). For each parameter
combination ρ and θ, we then find the number of non-zero pixels in
the input image that fall close to the corresponding line, and
increment the array at position (ρ, θ) accordingly.
Intuition for line detection
Multiple lines in image space correspond to multiple points in Hough space.
The reverse also holds: lines intersecting at a point (m, c) in Hough space can be
transformed back to the line y = mx + c in image space.
If we have a line made up of many segments, or of points lying close to
the same line equation in image space, it turns into many
intersecting lines in Hough space.
So, consider an edge-detected line in image space that has small
discontinuities. To find the continuous line, we transform this
discontinuous line from image space to Hough space and look for
intersection points there. Such an intersection point in Hough space
represents the continuous line in image space.

Code
import numpy as np
import matplotlib.pyplot as plt
import cv2

%matplotlib inline

# Read in the image
image = cv2.imread('images/phone.jpg')

# Change color to RGB (from BGR)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)

Performing Edge detection
# Convert image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

# Define our parameters for Canny
low_threshold = 50
high_threshold = 100
edges = cv2.Canny(gray, low_threshold, high_threshold)
plt.imshow(edges, cmap='gray')

Find lines using Hough transform


# Define the Hough transform parameters
rho = 1
theta = np.pi/180
threshold = 60
min_line_length = 50
max_line_gap = 5

# Make a blank the same size as our image to draw on
line_image = np.copy(image)  # creating an image copy to draw lines on

# Run Hough on the edge-detected image
lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]),
                        min_line_length, max_line_gap)

# Iterate over the output "lines" and draw lines on the image copy
for line in lines:
    for x1, y1, x2, y2 in line:
        cv2.line(line_image, (x1, y1), (x2, y2), (255, 0, 0), 5)

plt.imshow(line_image)
