Unit 1 CV


UNIT-1 Image Processing Foundations:

Review of image processing techniques – classical filtering operations – thresholding techniques – edge detection techniques – corner and interest point detection – mathematical morphology – texture.

❖ Image processing technique : -



Deep learning has had a tremendous impact on various fields of technology in the last few years. One of the hottest topics buzzing in this industry is computer vision, the ability of computers to understand images and videos on their own. Self-driving cars, biometrics and facial recognition all rely on computer vision to work. At the core of computer vision is image processing.

➔What Is an Image?

Before we jump into image processing, we need to first understand what exactly
constitutes an image. An image is represented by its dimensions (height and width)
based on the number of pixels. For example, if the dimensions of an image are 500 x
400 (width x height), the total number of pixels in the image is 200000.

This pixel is a point on the image that takes on a specific shade, opacity or color. It is
usually represented in one of the following:
● Grayscale - A pixel is an integer with a value between 0 and 255 (0 is completely
black and 255 is completely white).
● RGB - A pixel is made up of 3 integers between 0 and 255 (the integers
represent the intensity of red, green, and blue).
● RGBA - It is an extension of RGB with an added alpha field, which represents
the opacity of the image.

Image processing requires fixed sequences of operations that are performed at each
pixel of an image. The image processor performs the first sequence of operations on
the image, pixel by pixel. Once this is fully done, it will begin to perform the second
operation, and so on. The output value of these operations can be computed at any pixel
of the image.
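As a small illustration (Python with OpenCV; the file name is hypothetical), the shapes of the loaded arrays reflect these pixel representations:

# A tiny sketch showing how image dimensions and pixel representations appear in code.
import cv2

gray = cv2.imread("photo.png", cv2.IMREAD_GRAYSCALE)   # one 0-255 value per pixel
color = cv2.imread("photo.png", cv2.IMREAD_COLOR)      # 3 values per pixel (BGR)
raw = cv2.imread("photo.png", cv2.IMREAD_UNCHANGED)    # may also include an alpha channel

print(gray.shape)    # (height, width), e.g. (400, 500) -> 200000 pixels
print(color.shape)   # (height, width, 3)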

➔What Is Image Processing?

Image processing is the process of transforming an image into a digital form and
performing certain operations to get some useful information from it. The image
processing system usually treats all images as 2D signals when applying certain
predetermined signal processing methods.

➔Types of Image Processing

There are five main types of image processing:

● Visualization - Find objects that are not visible in the image


● Recognition - Distinguish or detect objects in the image
● Sharpening and restoration - Create an enhanced image from the original
image
● Pattern recognition - Measure the various patterns around the objects in the
image
● Retrieval - Browse and search images from a large database of digital images
that are similar to the original image

➔Components of Image Processing

Computer

A general-purpose computer, which may be anything from a PC to a supercomputer, is


used in an image processing system. Sometimes, specifically built computers are
utilized in specialized applications to reach a specified degree of performance.

Hardware for Specialized Image Processing

It comprises the digitizer and hardware that can carry out basic operations, including an
Arithmetic Logic Unit (ALU), which can carry out simultaneous arithmetic and logical
operations on whole pictures.

Mass Storage

Mass storage capability is essential in image processing applications. The three main types of digital storage for image processing applications are: (1) short-term storage for use during processing, (2) online storage for relatively fast recall, and (3) archival storage, characterized by infrequent access.

➔ Camera Sensors

This component refers to sensing. The image sensor's primary function is to collect incoming light, transform it into an electrical signal, measure that signal, and output it to supporting electronics. It consists of a two-dimensional array of light-sensitive elements that convert photons into electrons. Images are captured by devices such as digital cameras using image sensors like CCD and CMOS. Two elements are generally needed to acquire digital images. The first is a physical device (sensor) that is sensitive to the energy radiated by the object we want to turn into an image. The second is a digitizer, which transforms the physical sensing device's output into
digital form.

➔ Software

The image processing software comprises specialized modules that carry out particular
functions.

➔ Hardcopy Equipment

Laser printers, film cameras, heat-sensitive equipment, inkjet printers, and digital
equipment like optical and CDROM discs are just a few examples of the instruments
used to record pictures.
➔ Networking

Networking is a necessary component for transmitting image data between computers. The most important factor in image transmission is bandwidth, since image processing applications involve vast amounts of data.

➔Fundamental Image Processing Steps


1. Image Acquisition

Image acquisition is the first step in image processing. This step is also
known as preprocessing in image processing. It involves retrieving the
image from a source, usually a hardware-based source.

2. Image Enhancement

Image enhancement is the process of bringing out and highlighting certain


features of interest in an image that has been obscured. This can involve
changing the brightness, contrast, etc.

3. Image Restoration

Image restoration is the process of improving the appearance of an image.


However, unlike image enhancement, image restoration is done using
certain mathematical or probabilistic models.
4. Color Image Processing

Color image processing includes a number of color modeling techniques in


a digital domain. This step has gained prominence due to the significant
use of digital images over the internet.

5. Wavelets and Multiresolution Processing

Wavelets are used to represent images in various degrees of resolution.


The images are subdivided into wavelets or smaller regions for data
compression and for pyramidal representation.

6. Compression

Compression is a process used to reduce the storage required to save an


image or the bandwidth required to transmit it. This is done particularly
when the image is for use on the Internet.

7. Morphological Processing

Morphological processing is a set of operations that process images based on their shapes, for example extracting image components that are useful for representation and description.

8. Segmentation
Segmentation is one of the most difficult steps of image processing. It
involves partitioning an image into its constituent parts or objects.

9. Representation and Description

After an image is segmented into regions in the segmentation process,


each region is represented and described in a form suitable for further
computer processing. Representation deals with the image’s
characteristics and regional properties. Description deals with extracting
quantitative information that helps differentiate one class of objects from
the other.

10. Recognition

Recognition assigns a label to an object based on its description.

❖ Classical filtering :-
➔ Introduction

Filtering is a technique for modifying or enhancing an image. For example,

you can filter an image to emphasize certain features or remove other

features. Filtering is a neighborhood operation, in which the value of any

given pixel in the output image is determined by applying some algorithm

to the values of the pixels in the neighborhood of the corresponding input


pixel. A pixel’s neighborhood is some set of pixels, defined by their

locations relative to that pixel.

Linear & Non-Linear


Linear Filtering occurs when the operation performed on each pixel is

a simple mathematical operation with a scalar where the result is

similar for all pixels. For example, if we are multiplying the intensity

of each pixel by 2, then the entire image gets intensified by a factor of

two which means we have effectively multiplied the image matrix by 2.

On the other hand in a non-linear operation, the overall effect on the

image cannot be predicted just by the operation performed on each

pixel. For example, squaring each pixel is not the same as squaring the

image matrix.

➔Linear Filtering
In MATLAB, linear filtering of images is implemented through two-dimensional convolution or correlation. In both, the value of an output pixel is computed as a weighted sum of neighboring pixels, obtained by multiplying elements of two matrices and summing the results. The only difference is that in convolution the kernel is rotated 180 degrees before the computation, whereas in correlation it is not.

One of these matrices represents the image itself, while the other matrix is the filter.

For example, a filter might be:

k = [9 6 4; 5 8 3; 2 4 9];

This filter representation is known as a convolution kernel. The MATLAB function conv2 implements image filtering by applying your convolution kernel to an image matrix. conv2 takes as arguments an input image and a filter and returns an output image. For example, in the call B = conv2(A, k), k is the convolution kernel, A is the input image, and B is the output image.

Another function which can be used is the imfilter function.
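For readers working in Python rather than MATLAB, an analogous sketch (assuming NumPy and SciPy are available) illustrates the same convolution/correlation relationship on a toy array:

# Convolution rotates the kernel by 180 degrees; correlation does not.
import numpy as np
from scipy.signal import convolve2d
from scipy.ndimage import correlate

A = np.arange(25, dtype=float).reshape(5, 5)     # toy "image"
k = np.array([[9, 6, 4],
              [5, 8, 3],
              [2, 4, 9]], dtype=float)           # the kernel from the text

B_conv = convolve2d(A, k, mode='same')           # convolution (kernel flipped)
B_corr = correlate(A, k, mode='constant')        # correlation (kernel not flipped)

# Correlating with the 180-degree-rotated kernel reproduces the convolution.
assert np.allclose(B_conv, correlate(A, np.rot90(k, 2), mode='constant'))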


Non-Linear Filtering

In addition to convolution, there are many other filtering operations you can implement through sliding neighborhoods. Many of these operations are nonlinear in nature. For example, you can implement a sliding neighborhood operation where the value of an output pixel is equal to the standard deviation of the values of the pixels in the input pixel's neighborhood.

You can use the nlfilter function to implement a variety of sliding neighborhood operations. nlfilter takes as input arguments an image, a neighborhood size, and a function that returns a scalar, and returns an image of the same size as the input image. The value of each pixel in the output image is computed by passing the corresponding input pixel's neighborhood to the function. For example, a call such as I2 = nlfilter(I, [3 3], 'std2') computes each output pixel by taking the standard deviation of the values of the input pixel's 3-by-3 neighborhood (that is, the pixel itself and its eight contiguous neighbors).

You can write an M-file to implement a specific function and then use this function with nlfilter; for example, a command such as I2 = nlfilter(I, [2 3], @myfun) processes the matrix I in 2-by-3 neighborhoods with a function called myfun.m. You can also use an inline (anonymous) function; in this case, the function name appears in the nlfilter call without quotation marks. For instance, f = @(x) max(x(:)); I2 = nlfilter(I, [3 3], f); sets each pixel to the maximum value in its 3-by-3 neighborhood.

The result of this last command is quite apparent, as a noise-like grain appears in the output image.
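A rough Python counterpart of these sliding-neighborhood operations, using SciPy's generic_filter (the random array below simply stands in for an image):

import numpy as np
from scipy.ndimage import generic_filter

I = np.random.rand(64, 64)

# Standard deviation of each pixel's 3x3 neighborhood (a nonlinear filter).
I_std = generic_filter(I, np.std, size=3)

# Maximum value in each pixel's 3x3 neighborhood.
I_max = generic_filter(I, np.max, size=3)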

❖ Image Thresholding Techniques :-

Image thresholding is a technique in computer vision that converts a


grayscale image into a binary image by setting each pixel to either black or
white based on a specific threshold value. This section provides a comprehensive overview of various image thresholding techniques used in computer vision, detailing their processes, pros, cons, and applications.

➔What is Image Thresholding?

Image thresholding works on grayscale images, where each pixel has an

intensity value between 0 (black) and 255 (white). The thresholding

process involves converting this grayscale image into a binary image,

where pixels are classified as either foreground (object of interest) or

background based on their intensity values and a predetermined


threshold. Pixels with intensities above the threshold are assigned to the

foreground, while those below are assigned to the background.

Key Points:

● Process: Compare each pixel's intensity to a threshold value.

● Result: Pixels above the threshold are set to white (255), and

those below are set to black (0).

● Purpose: Simplifies the image, making it easier to identify and

analyze regions of interest.

Thresholding Techniques in Computer Vision

1. Simple Thresholding

Simple thresholding uses a single threshold value to classify pixel intensities.

If a pixel's intensity is greater than the threshold, it is set to 255 (white);

otherwise, it is set to 0 (black).

T(x, y) = 0 if I(x, y) ≤ T; 255 if I(x, y) > T


In this formula:

● I(x,y) is the intensity of the pixel at coordinates (x, y).

● T is the threshold value.

● If the pixel intensity I(x,y) is less than or equal to the threshold T, the

output pixel value is set to 0 (black).

● If the pixel intensity I(x,y) is greater than the threshold T, the output

pixel value is set to 255 (white).

Pros of Simple Thresholding

● Simple and easy to implement.

● Computationally efficient.

Cons of Simple Thresholding

● Ineffective for images with varying lighting conditions.

● Requires manual selection of the threshold value.
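A minimal OpenCV sketch of simple thresholding (the file name is hypothetical):

import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
T = 127  # manually chosen threshold

# Pixels > T become 255 (white); pixels <= T become 0 (black).
_, binary = cv2.threshold(img, T, 255, cv2.THRESH_BINARY)
cv2.imwrite("binary.png", binary)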

2. Adaptive Thresholding

Adaptive thresholding is used for images with non-uniform illumination.

Instead of a single global threshold value, it calculates the threshold for

small regions of the image, which allows for better handling of varying

lighting conditions.
Pros of Adaptive Thresholding

● Handles varying illumination well.

● More accurate for complex images.

Cons of Adaptive Thresholding

● More computationally intensive.

● Requires careful selection of neighborhood size and method

parameters.
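A short OpenCV sketch of adaptive thresholding (hypothetical file name; the block size 11 and constant C = 2 are illustrative choices):

import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Threshold computed per pixel from an 11x11 neighborhood mean, minus constant C = 2.
binary_mean = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                    cv2.THRESH_BINARY, 11, 2)

# Gaussian-weighted variant, often more robust to noise.
binary_gauss = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)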

3. Otsu's Thresholding
Otsu's method is an automatic thresholding technique that calculates the

optimal threshold value by minimizing the intra-class variance (the variance

within the foreground and background classes).

Steps to perform Otsu's Thresholding

1. Compute the histogram and probabilities of each intensity level.

2. Compute the cumulative sums, means, and variances for all

threshold values.

3. Select the threshold that minimizes the within-class variance.

Pros of Otsu's Thresholding

● Automatic selection of the threshold value.

● Effective for bimodal histograms.

Cons of Otsu's Thresholding

● Assumes a bimodal histogram, which may not be suitable for all

images.

● Computationally more intensive than simple thresholding.
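A brief OpenCV sketch of Otsu's thresholding (hypothetical file name):

import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# The threshold argument (0) is ignored; Otsu's method selects it automatically.
t_otsu, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", t_otsu)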


4. Multilevel Thresholding

Multilevel thresholding extends simple thresholding by using multiple

threshold values to segment the image into more than two regions. This is

useful for images with complex structures and varying intensities.

Approaches of Multilevel Thresholding

● Otsu's Method Extension: Extending Otsu's method to multiple

levels.

● Optimization Techniques: Using optimization algorithms to

determine multiple thresholds.

Pros of Multilevel Thresholding

● Can segment images into multiple regions.

● Useful for images with complex intensity distributions.

Cons of Multilevel Thresholding

● More computationally intensive.

● Requires careful selection of the number of thresholds.

5. Color Thresholding

In color images, thresholding can be applied to each color channel (e.g., RGB,

HSV) separately. This method leverages color information to segment

objects.
Approaches of Color Thresholding

● Manual Thresholding: Setting thresholds for each color channel

manually.

● Automatic Thresholding: Using methods like Otsu's method for

each channel.

Pros of Color Thresholding

● Effective for segmenting objects based on color.

● Can handle images with rich color information.

Cons of Color Thresholding

● More complex than grayscale thresholding.

● Requires careful selection of thresholds for each channel.

6. Local Thresholding

Local thresholding calculates a different threshold for each pixel based on its

local neighborhood. This method is effective for images with non-uniform

illumination or varying textures.

Techniques of Local Thresholding

1. Niblack's Method

● The threshold is calculated from the mean of the local neighborhood plus a constant times the standard deviation:

● T(x, y) = μ(x, y) + k·σ(x, y)

● Here,

○ μ(x, y) is the mean and σ(x, y) is the standard deviation of the local neighborhood

○ k is a constant (typically negative, e.g. k ≈ −0.2, so that the threshold lies below the local mean).

2. Sauvola's Method

● An improvement over Niblack's method that adjusts the constant

factor dynamically based on the mean and standard deviation.

Pros of Local Thresholding

● Handles non-uniform illumination well.

● More accurate for textured images.


Cons of Local Thresholding

● Computationally intensive.

● Sensitive to parameter selection.
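A brief sketch of Niblack's and Sauvola's methods using scikit-image, assuming it is installed (the file name and window size are illustrative):

from skimage import io
from skimage.filters import threshold_niblack, threshold_sauvola

img = io.imread("input.png", as_gray=True)

t_nib = threshold_niblack(img, window_size=25, k=0.2)   # per-pixel threshold map
t_sau = threshold_sauvola(img, window_size=25)

binary_nib = img > t_nib
binary_sau = img > t_sau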

7. Global Thresholding

Global thresholding uses a single threshold value for the entire image. This

technique is suitable for images with uniform lighting and clear contrast

between the foreground and background.

Pros of Global Thresholding

● Simple and easy to implement.

● Computationally efficient.

Cons of Global Thresholding

● Not suitable for images with varying illumination.

● Requires manual selection of the threshold value

8. Iterative Thresholding

Iterative thresholding starts with an initial guess for the threshold value and

iteratively refines it based on the mean intensity of the pixels above and
below the threshold. The process continues until the threshold value

converges.

Steps to perform Iterative Thresholding

1. Choose an initial threshold value T0.

2. Segment the image into two classes C1 and C2 using Tk.

3. Compute the mean intensities μ1 and μ2 of C1 and C2.

4. Update the threshold value: Tk+1 = (μ1 + μ2) / 2.

5. Repeat steps 2-4 until |Tk+1 − Tk| < ε.

Pros of Iterative Thresholding

● Provides an automatic way to determine the threshold.

● Suitable for images with a clear distinction between foreground and

background.

Cons of Iterative Thresholding

● May require several iterations to converge.

● Not effective for images with complex intensity distributions.
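A compact NumPy sketch of the iterative procedure described above (the function name and stopping tolerance are illustrative):

import numpy as np

def iterative_threshold(img, eps=0.5):
    t = img.mean()                      # initial guess T0
    while True:
        low, high = img[img <= t], img[img > t]
        mu1 = low.mean() if low.size else 0.0
        mu2 = high.mean() if high.size else 255.0
        t_new = (mu1 + mu2) / 2.0       # T_{k+1} = (mu1 + mu2) / 2
        if abs(t_new - t) < eps:        # stop when |T_{k+1} - T_k| < eps
            return t_new
        t = t_new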

Applications of Thresholding
Thresholding techniques are used in various applications, including:

1. Document Image Analysis: Thresholding is widely used to binarize

text in scanned documents, making it easier for Optical Character

Recognition (OCR) systems to process the text.

2. Medical Imaging: In medical imaging, thresholding is used to

segment anatomical structures in MRI or CT scans, aiding in

diagnosis and treatment planning.

3. Industrial Inspection: Thresholding is employed in industrial

inspection systems to detect defects in manufactured products,

ensuring quality control.

4. Object Detection: In surveillance footage or robotic vision systems,

thresholding is used to identify and track objects, enhancing security

and automation.

Conclusion

Thresholding is a crucial technique in computer vision for image

segmentation. The choice of thresholding technique depends on the specific

requirements of the application and the characteristics of the image. Simple

thresholding and global thresholding are suitable for images with uniform

lighting and clear contrast, while adaptive thresholding and local


thresholding are more effective for images with varying illumination and

textures. Techniques like Otsu's method and iterative thresholding provide

automatic ways to determine the optimal threshold value, making them

useful in diverse applications. Understanding these techniques and their

appropriate use cases is essential for effective image segmentation and

analysis in computer vision.

By leveraging the strengths and understanding the limitations of each

thresholding technique, practitioners can choose the most suitable method

for their specific needs, leading to more accurate and efficient image

processing workflows.

❖ Edge detection

Edge detection is a fundamental image processing technique for identifying and locating

the boundaries or edges of objects in an image. It is used to identify and detect the

discontinuities in the image intensity and extract the outlines of objects present in an
image. The edges of any object in an image (e.g. flower) are typically defined as the

regions in an image where there is a sudden change in intensity. The goal of edge

detection is to highlight these regions.

There are various types of edge detection techniques, which include the following:

● Sobel Edge Detection

● Canny Edge Detection

● Laplacian Edge Detection

● Prewitt Edge Detection

● Roberts Cross Edge Detection

● Scharr edge detection

The goal of edge detection algorithms is to identify the most significant edges within an

image or scene. These detected edges should then be connected to form meaningful

lines and boundaries, resulting in a segmented image that contains two or more distinct

regions. The segmented results are subsequently used in various stages of a machine

vision system for tasks such as object counting, measuring, feature extraction, and

classification.

Edge Detection Concepts


Let’s talk through a few of the main concepts you need to know to understand edge

detection and how it works.

Edge Models

Edge models are theoretical constructs used to describe and understand the different

types of edges that can occur in an image. These models help in developing algorithms

for edge detection by categorizing the types of intensity changes that signify edges. The

basic edge models are Step, Ramp and Roof. A step edge represents an abrupt

change in intensity, where the image intensity transitions from one value to another in a

single step. A ramp edge describes a gradual transition in intensity over a certain

distance, rather than an abrupt change. A roof edge represents a peak or ridge in the

intensity profile, where the intensity increases to a maximum and then decreases.

Image Intensity Function


The image intensity function represents the brightness or intensity of each pixel in a

grayscale image. In a color image, the intensity function can be extended to include

multiple channels (e.g., red, green, blue in RGB images).

First and Second Derivative

The first derivative of an image measures the rate of change of pixel intensity. It is

useful for detecting edges because edges are locations in the image where the intensity

changes rapidly. It detects edges by identifying significant changes in intensity. The first

derivative can be approximated using gradient operators like the Sobel, Prewitt, or

Scharr operators.
The second derivative measures the rate of change of the first derivative. It is useful for

detecting edges because zero-crossings (points where the second derivative changes

sign) often correspond to edges. It detects edges by identifying zero-crossings in the

rate of change of intensity. The second derivative can be approximated using the

Laplacian operator.

Edge Detection Approaches


There are several approaches to edge detection. Let's talk about the most common

approaches one by one.

Sobel Edge Detection

Sobel edge detection is a popular technique used in image processing and computer

vision for detecting edges in an image. It is a gradient-based method that uses

convolution operations with specific kernels to calculate the gradient magnitude and

direction at each pixel in the image. Here's a detailed explanation of Sobel edge

detection.

The Sobel operator uses two 3x3 convolution kernels (filters), one for detecting changes

in the x-direction (horizontal edges) and one for detecting changes in the y-direction

(vertical edges). These kernels are used to compute the gradient of the image intensity

at each point, which helps in detecting the edges. Here are the Sobel kernels:

Horizontal Kernel (𝐺𝑥):

This kernel emphasizes the gradient in the x-direction:

Gx =
[ -1   0  +1 ]
[ -2   0  +2 ]
[ -1   0  +1 ]

The 𝐺𝑥 kernel emphasizes changes in intensity in the horizontal direction. The positive values (+1 and +2) on the right side will highlight bright areas, while the negative values (-1 and -2) on the left side will highlight dark areas, effectively detecting intensity changes in the horizontal direction.

Vertical Kernel (𝐺𝑦):

This kernel emphasizes the gradient in the y-direction:

Gy =
[ -1  -2  -1 ]
[  0   0   0 ]
[ +1  +2  +1 ]

The 𝐺𝑦 kernel emphasizes changes in intensity in the vertical direction. Similarly, the positive values (+1 and +2) at the bottom will highlight bright areas, while the negative values (-1 and -2) at the top will highlight dark areas, effectively detecting intensity changes in the vertical direction.

Let's walk through an example of Sobel edge detection using Python and the OpenCV

library. Here’s the Step-by-Step Example:


1. Load and Display the Image: First, we need to load a sample image and display

it to understand what we're working with.

2. Convert to Grayscale: Convert the image to grayscale as the Sobel operator

works on single-channel images.

3. Apply Gaussian Smoothing (Optional): Apply a Gaussian blur to reduce noise

and make edge detection more robust.

4. Apply Sobel Operator: Use the Sobel operator to calculate the gradients in the x

and y directions.

5. Calculate Gradient Magnitude: Compute the gradient magnitude from the

gradients in the x and y directions. A threshold is applied to the gradient

magnitude image to classify pixels as edges or non-edges. Pixels with gradient

magnitude above the threshold are considered edges.

6. Normalization: The gradient magnitude and individual gradients are normalized

to the range 0-255 for better visualization.

7. Display the Resulting Edge Image: Normalize and display the edge-detected

image.

Here, in the following code for sobel operator cv2.CV_64F specifies the desired depth of

the output image. Using a higher depth helps in capturing precise gradient values,

especially when dealing with small or fine details. For 𝐺𝑥 the values (1, 0) means taking

the first derivative in the x-direction and zero derivative in the y-direction. For 𝐺𝑦 the
values (0, 1) means taking the first derivative in the y-direction and zero derivative in the

x-direction. ksize=3 specifies the size of the extended 3x3 Sobel kernel.
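The original code listing is not reproduced here; a minimal Python/OpenCV sketch consistent with the description above (the file name is hypothetical) is:

import cv2
import numpy as np

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3, 3), 0)          # optional smoothing

gx = cv2.Sobel(blur, cv2.CV_64F, 1, 0, ksize=3)   # first derivative in x
gy = cv2.Sobel(blur, cv2.CV_64F, 0, 1, ksize=3)   # first derivative in y

mag = np.sqrt(gx ** 2 + gy ** 2)                  # gradient magnitude
mag = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

_, edges = cv2.threshold(mag, 50, 255, cv2.THRESH_BINARY)  # classify edge pixels
cv2.imwrite("sobel_edges.png", edges)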


Canny Edge Detection

Canny Edge Detection is a multi-stage algorithm to detect a wide range of edges in

images. It was developed by John F. Canny in 1986 and is known for its optimal edge
detection capabilities. The algorithm follows a series of steps to reduce noise, detect

edges, and improve the accuracy of edge detection.

Following are the steps of steps of Canny Edge Detection:

1. Noise Reduction using Gaussian Blurring: The first step in the Canny edge

detection algorithm is to smooth the image using a Gaussian filter. This helps in

reducing noise and unwanted details in the image. The Gaussian filter is applied

to the image by convolving it with a Gaussian kernel. The Gaussian kernel (or Gaussian function) is defined as

G(x, y) = (1 / (2πσ²)) · exp( −(x² + y²) / (2σ²) )

This step helps to remove high-frequency noise, which can cause spurious edge detection.

2. Gradient Calculation:

After noise reduction, the Sobel operator is used to calculate the gradient intensity and

direction of the image. This involves calculating the intensity gradients in the x and y
directions (𝐺𝑥 and 𝐺𝑦). The gradient magnitude and direction are then computed using

these gradients.

3. Non-Maximum Suppression: To thin out the edges and get rid of spurious

responses to edge detection, non-maximum suppression is applied. This step

retains only the local maxima in the gradient direction. The idea is to traverse the

gradient image and suppress any pixel value that is not considered to be an

edge, i.e., any pixel that is not a local maximum along the gradient direction.
For example, consider a point A located on an edge in the vertical direction. The gradient direction is perpendicular to the edge, and points B and C lie along the gradient direction on either side of A. Point A is therefore compared with points B and C to determine whether it represents a local maximum. If it does, point A proceeds to the next stage; otherwise, it is suppressed and set to zero.

4. Double Thresholding: After non-maximum suppression, the edge pixels are

marked using double thresholding. This step classifies the edges into strong,

weak, and non-edges based on two thresholds: high and low. Strong edges are

those pixels with gradient values above the high threshold, while weak edges are

those with gradient values between the low and high thresholds.

Given the gradient magnitude 𝑀 and two thresholds 𝑇high and 𝑇low, the classification can be expressed as: a pixel is a strong edge if M ≥ 𝑇high, a weak edge if 𝑇low ≤ M < 𝑇high, and a non-edge (suppressed) if M < 𝑇low. Weak edge pixels are typically retained only if they are connected to strong edge pixels (edge tracking by hysteresis).
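A short OpenCV sketch (hypothetical file name); cv2.Canny performs the gradient computation, non-maximum suppression, and hysteresis thresholding internally, with the two arguments corresponding to 𝑇low and 𝑇high:

import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(img, (5, 5), 1.4)     # step 1: noise reduction
edges = cv2.Canny(blur, 100, 200)             # steps 2-4 handled internally
cv2.imwrite("canny_edges.png", edges)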


Laplacian Edge Detection

Laplacian Edge Detection is a technique in image processing used to highlight areas of

rapid intensity change, which are often associated with edges in an image. Unlike

gradient-based methods such as Sobel and Canny, which use directional gradients,

Laplacian Edge Detection relies on the second derivative of the image intensity.
Following are the key Concepts of Laplacian Edge Detection:

The Laplacian operator is used to detect edges by calculating the second derivative of the image intensity. Mathematically, the Laplacian of an image 𝑓(𝑥, 𝑦) can be represented as

∇²f = ∂²f/∂x² + ∂²f/∂y²

This can be implemented using convolution with a Laplacian kernel. Common 3x3 kernels for the Laplacian operator include

[ 0   1   0 ]        [ 1   1   1 ]
[ 1  -4   1 ]  and   [ 1  -8   1 ]
[ 0   1   0 ]        [ 1   1   1 ]

cv.Laplacian() is a function provided by the OpenCV library used for performing

Laplacian edge detection on images. This function applies the Laplacian operator to the

input image to compute the second derivative of the image intensity. Following are the

steps for Edge Detection Using Laplacian


1. Convert the Image to Grayscale: Edge detection usually starts with a grayscale

image to simplify computations.

2. Apply Gaussian Blur (Optional): Smoothing the image with a Gaussian blur can

reduce noise and prevent false edge detection.

3. Apply the Laplacian Operator: Convolve the image with a Laplacian kernel to

calculate the second derivative.
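A minimal Laplacian edge-detection sketch with OpenCV (the file name is hypothetical):

import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(img, (3, 3), 0)            # optional smoothing
lap = cv2.Laplacian(blur, cv2.CV_64F, ksize=3)     # second derivative
lap_abs = cv2.convertScaleAbs(lap)                 # back to 8-bit for display
cv2.imwrite("laplacian_edges.png", lap_abs)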

Prewitt Edge Detection

Prewitt edge detection is a technique used for detecting edges in digital images. It

works by computing the gradient magnitude of the image intensity using convolution
with Prewitt kernels. The gradients are then used to identify significant changes in

intensity, which typically correspond to edges.

Prewitt edge detection uses two kernels, one for detecting edges in the horizontal

direction and the other for the vertical direction. These kernels are applied to the image

using convolution.

Horizontal Prewitt Kernel (Gx):

Gx =
[ -1   0  +1 ]
[ -1   0  +1 ]
[ -1   0  +1 ]

Vertical Prewitt Kernel (Gy):

Gy =
[ -1  -1  -1 ]
[  0   0   0 ]
[ +1  +1  +1 ]
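OpenCV has no dedicated Prewitt function, but the kernels above can be applied with cv2.filter2D; a brief sketch (hypothetical file name):

import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32)  # Gx
ky = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=np.float32)  # Gy

gx = cv2.filter2D(img, -1, kx)
gy = cv2.filter2D(img, -1, ky)
mag = np.sqrt(gx ** 2 + gy ** 2)                   # gradient magnitude
cv2.imwrite("prewitt_edges.png", np.clip(mag, 0, 255).astype(np.uint8))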

Roberts Cross Edge Detection

Roberts Cross edge detection is a simple technique used for detecting edges in digital

images. It works by computing the gradient magnitude of the image intensity using

convolution with Roberts Cross kernels. These kernels are small, simple, and efficient

for detecting edges, especially when the edges are thin and prominent. Lawrence

Roberts first introduced it in 1963 as one of the earliest edge detectors.

Roberts Cross edge detection uses two kernels, one for detecting edges in the

horizontal direction and the other for the vertical direction. These kernels are applied to

the image using convolution.

Horizontal Roberts Cross Kernel (Gx):

Gx =
[ +1   0 ]
[  0  -1 ]

Vertical Roberts Cross Kernel (Gy):

Gy =
[  0  +1 ]
[ -1   0 ]

The general procedure is as follows:

1. Convert the Image to Grayscale: Roberts Cross edge detection typically operates

on grayscale images. If the input image is in color, it needs to be converted to a

single channel (grayscale) image.

2. Apply the Horizontal and Vertical Roberts Cross Kernels: Convolve the image

with the horizontal Roberts Cross kernel (Gx) to detect horizontal edges and with

the vertical Roberts Cross kernel (Gy) to detect vertical edges.

3. Compute Gradient Magnitude: Combine the horizontal and vertical edge maps to

compute the gradient magnitude of the image intensity at each pixel. The

gradient magnitude represents the strength of the edge at each pixel.

4. Thresholding (Optional): Apply a threshold to the gradient magnitude image to

highlight significant edges and suppress noise. Thresholding helps in identifying

prominent edges while reducing false detections.
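A compact sketch applying the Roberts Cross kernels with SciPy and OpenCV (hypothetical file name):

import cv2
import numpy as np
from scipy.ndimage import convolve

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

gx = convolve(img, np.array([[1, 0], [0, -1]], dtype=np.float32))   # Gx
gy = convolve(img, np.array([[0, 1], [-1, 0]], dtype=np.float32))   # Gy
mag = np.sqrt(gx ** 2 + gy ** 2)                                    # gradient magnitude

_, edges = cv2.threshold(mag, 30, 255, cv2.THRESH_BINARY)           # optional threshold
cv2.imwrite("roberts_edges.png", edges.astype(np.uint8))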


Scharr Edge Detection

Scharr edge detection is another method used to detect edges in digital images. It is an

improvement over the Sobel operator. The Scharr operator consists of two 3x3

convolution kernels, one for approximating the horizontal gradient and the other for

approximating the vertical gradient. These kernels are applied to the image to compute

the gradient at each pixel, which highlights areas of rapid intensity change or edges.

The horizontal gradient kernel (Gx) is designed to approximate the rate of change of

intensity in the horizontal direction, while the vertical gradient kernel (Gy) approximates

the rate of change of intensity in the vertical direction. The Scharr kernels are as

follows.

Horizontal gradient kernel (Gx):

Gx =
[  -3    0    +3 ]
[ -10    0   +10 ]
[  -3    0    +3 ]

Vertical gradient kernel (Gy):

Gy =
[  -3  -10   -3 ]
[   0    0    0 ]
[  +3  +10   +3 ]
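A minimal sketch using OpenCV's built-in Scharr operator (hypothetical file name):

import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
gx = cv2.Scharr(img, cv2.CV_64F, 1, 0)   # horizontal gradient
gy = cv2.Scharr(img, cv2.CV_64F, 0, 1)   # vertical gradient
mag = cv2.magnitude(gx, gy)              # gradient magnitude
cv2.imwrite("scharr_edges.png", cv2.convertScaleAbs(mag))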

❖ Corner and Interest Point Detection

The algorithms we have studied thus far have required us to provide

two sets of interest points and, by virtue of the indexing of the two sets

of coordinates, correspondences. So far we have done this by manually

clicking on points in the images, but, in general, we will need to

automate this process, and that will be the subject of this and the
following lecture. Figure 1 shows the left and right view from a wide

baseline stereo pair and contains two examples of correspondences.

The circle (intersection on the checkerboard) indicates a somewhat

straightforward correspondence, and the square (corner of the mouth)

is slightly more difficult. The correspondence problem increases in

difficulty with wider baselines. For the human visual system, finding

these correspondences is a trivial task. In this lecture we will discuss

how to find these interest points in an image.


Interest Point Detection We would like interest points to be: • Sparse •

Informative • Reproducible

Terminology In the computer vision community, interest point

detection is often called corner detection, even though not all features

need be corners. What we usually mean by corners are actually L

junctions, but there are also Y junctions, X junctions, Ψ junctions, etc.

each having different levels of informativeness.


➔ An Example of Corner Detection

It is desirable for a corner detector to fire when there exists a real

corner. The area circled in green in Figure 2 is an example of a real

corner in the physical world, and the area circled in red is an example

of a window in the image that is not a real corner. Something like the

latter is not a good interest point because leads to a false

correspondence. This type of junction is called an occlusion junction,

since it is formed by one surface occluding another surface at a

different depth.
Properties of Corners If we think of each point on the image plane as

having a certain brightness, then we can imagine the image to be a 3D

surface with light points being high and dark points being low. This

enables us to use basic calculus on the surface and the simplest thing

we can do is compute its gradient.

Definition 13.1. The gradient of an image I at a point (x, y) is denoted ∇I, points in the direction of greatest change from dark to light, and is given by

∇I = ( ∂I/∂x , ∂I/∂y )ᵀ

Corner features occur where there is a sharp change in the angle of the

gradient. Figure 3 shows what a corner might look like in a window of

10 by 16 pixels. The arrows point in the direction of the gradient.

➔ Finding Corners : We would like to know whether each part of

the image is a corner and if so whether that corner is a good

corner. To do so we can analyze the pixels of the image in a small

neighborhood, N , and assign the location of a corner to the point


with minimum weighted distance to all the tangent lines through

each pixel in N (see Figure 3b). A typical size for N is 5 × 5 pixels,

though in practice it should be set as a percentage of the image

dimensions, and its width should be odd, for reasons of

symmetry.

➔ Consider a neighborhood of an image containing an ideal corner.

For the image in Figure 4(a), the tangent lines intersect exactly at

the location of the corner.
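In practice, corner detectors such as the Harris detector (mentioned again later in these notes) are commonly used rather than the tangent-line construction above; a brief OpenCV sketch, with a hypothetical file name, is:

import cv2
import numpy as np

img = cv2.imread("checkerboard.png")
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# blockSize=2 (neighborhood), ksize=3 (Sobel aperture), k=0.04 (Harris parameter)
response = cv2.cornerHarris(gray, 2, 3, 0.04)

# Mark strong corner responses in red.
img[response > 0.01 * response.max()] = [0, 0, 255]
cv2.imwrite("corners.png", img)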


❖ Mathematical Morphology – Texture

➔ Mathematical Morphology

Mathematical Morphology is a tool for extracting image components

that are useful for representation and description. The technique was

originally developed by Matheron and Serra [3] at the Ecole des Mines

in Paris. It is a set-theoretic method of image analysis providing a

quantitative description of geometrical structures. (At the Ecole des

Mines they were interested in analysing geological data and the

structure of materials). Morphology can provide boundaries of objects,

their skeletons, and their convex hulls. It is also useful for many pre-

and post-processing techniques, especially in edge thinning and

pruning.

Generally speaking most morphological operations are based on

simple expanding and shrinking operations. The primary application


of morphology occurs in binary images, though it is also used on grey

level images. It can also be useful on range images. (A range image is

one where grey levels represent the distance from the sensor to the

objects in the scene rather than the intensity of light reflected from

them).

Set operations

The two basic morphological set transformations are erosion and

dilation

These transformations involve the interaction between an image A

(the object of interest) and a structuring set B, called the structuring

element.

Typically the structuring element B is a circular disc in the plane, but it

can be any shape. The image and structuring element sets need not be

restricted to sets in the 2D plane, but could be defined in 1, 2, 3 (or

higher) dimensions.
Let A and B be subsets of Z². The translation of A by x is denoted Ax and is defined as

Ax = { c : c = a + x, for some a ∈ A }

The reflection of B, denoted B̂, is defined as

B̂ = { x : x = −b, for some b ∈ B }

The complement of A is denoted Ac, and the difference of two sets A and B is denoted A − B.

Dilation

Dilation of the object A by the structuring element B is given by

A ⊕ B = { x : (B̂)x ∩ A ≠ ∅ }

The result is a new set made up of all points x for which the reflection of B about its origin, shifted by x, overlaps A. Consider the example where A is a rectangle and B is a disc centred on the origin. (Note that if B is not centred on the origin we will get a translation of the object as well.) Since B is symmetric, B̂ = B.

Applications of morphological operations

Erosion and dilation can be used in a variety of ways, in parallel and

series, to give other transformations including thickening, thinning,

skeletonisation and many others.

Two very important transformations are opening and closing. Now

intuitively, dilation expands an image object and erosion shrinks it.

Opening generally smooths a contour in an image, breaking narrow

isthmuses and eliminating thin protrusions. Closing tends to narrow

smooth sections of contours, fusing narrow breaks and long thin gulfs,

eliminating small holes, and filling gaps in contours.


The opening of A by B, denoted A ∘ B, is given by the erosion of A by B, followed by the dilation of the result by B, that is

A ∘ B = (A ⊖ B) ⊕ B
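A short OpenCV sketch of erosion, dilation, opening, and closing on a binary image with a disc-like structuring element (the file name is hypothetical):

import cv2

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
B = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))   # disc-like element

eroded  = cv2.erode(binary, B)
dilated = cv2.dilate(binary, B)
opened  = cv2.morphologyEx(binary, cv2.MORPH_OPEN, B)      # erosion then dilation
closed  = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, B)     # dilation then erosion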

❖ Shapes and Regions: Binary Shape Analysis – Connectedness

➔ Binary Shape Analysis

Binary shape analysis refers to the examination and interpretation of


shapes or patterns represented in binary form, typically as a series of 1s and
0s. In this context, binary shapes can be seen as arrangements of pixels in a
binary image, where each pixel can be either "on" (represented by 1) or "off"
(represented by 0). Binary shape analysis plays a significant role in various
fields, including computer vision, image processing, pattern recognition,
and computer graphics. It involves a range of techniques and algorithms
aimed at extracting meaningful information from binary shapes and
understanding their characteristics. Here are a few key aspects of binary
shape analysis.

➔ Shape Representation:
Binary shapes can be represented using various
techniques, such as boundary-based representations (e.g., contour tracing
algorithms), region-based representations (e.g., connected component
labeling), or skeletal representations (e.g., skeletonization algorithms)
➔ Shape Descriptors:
Shape descriptors are numerical or symbolic features that
capture specific characteristics of binary shapes. These descriptors are calculated
based on shape properties, such as area, perimeter, compactness, moments, or
orientation. They provide quantitative measures that can be used for shape
comparison, classification, or recognition.

➔ Geometric Transformations: Binary shape analysis often involves


applying geometric transformations to shapes, such as translation,
rotation, scaling, or reflection. These transformations enable
alignment and normalization of shapes, facilitating shape matching
and analysis
➔ Shape Matching and Recognition: Binary shape analysis techniques
are employed for matching and recognizing shapes in images or
datasets. Shape matching algorithms aim to find similarities or
correspondences between shapes, while shape recognition algorithms
identify specific shapes based on predefined templates or learned
models

➔ Morphological Operations: Morphological operations, such as


erosion, dilation,opening, and closing, are fundamental tools in
binary shape analysis. These operations modify the shape of binary
objects by considering the configuration of neighboring pixels,
allowing for noise removal, shape smoothing, or boundary extraction.
➔ Feature Extraction: Binary shape analysis involves extracting
informative features from binary shapes. These features can be local,
capturing properties of individual regions or pixels, or global,
summarizing the overall shape characteristics. Feature extraction
techniques are crucial for dimensionality reduction and subsequent
analysis tasks.
➔ Segmentation: Binary shape analysis is closely related to image
segmentation, which aims to partition an image into meaningful
regions or objects. Segmentation algorithms based on binary shape
analysis exploit shape properties to distinguish objects from the
background and separate different objects in an image.
➔ Binary shape analysis is a broad and active research area, with
numerous applications in fields like object recognition, medical
imaging, robotics, quality control, and more. It involves a
combination of mathematical, statistical, and computational
techniques to extract knowledge and insights from binary shapes and
utilize them for various purposes.

➔CONNECTEDNESS
➔ Connectedness, in the context of binary shape analysis, refers to the
property of a set of pixels or objects being connected to form a
coherent shape or region. It determines the relationships and
connectivity between the pixels in a binary image, where each pixel
can be either "on" (foreground) or "off" (background).

➔ The concept of connectedness is crucial in image processing,


computer vision, and pattern recognition, as it allows for the
identification and manipulation of objects or regions of interest
within an image. Here are some key aspects related to connectedness

➔ Connected Components: Connected components are sets of pixels


that are connected to each other through their adjacent neighbors. In
a binary image, a connected component consists of foreground pixels
(usually denoted as 1s) that form a continuous region, while the
background pixels (0s) are excluded.Connected components analysis
identifies and labels each distinct region or object in the image.

➔ Connectivity Criteria: The definition of connectivity depends on the


criteria used to determine the adjacency between pixels. The most
common criteria are 4-connectivity and 8-connectivity.
➔ 4-connectivity considers two pixels to be connected if they share a
common edge. In other words, pixels are connected if they are
horizontally or vertically adjacent. 8-connectivity considers two pixels to
be connected if they share a common edge or corner. In this case,
pixels that are diagonally adjacent are also considered connected.
The choice of connectivity criteria depends on the specific application and
the desired level of detail in the analysis.

➔ Connectedness in Object Recognition: Connectedness is often used in


object recognition tasks to distinguish individual objects from the
background or other objects. By identifying and labeling connected
components, it becomes possible to isolate and analyze each object
separately. Connectedness can also help in segmenting an image into
meaningful regions based on the connectivity relationships.
➔ Connectivity Analysis: Connectedness analysis provides valuable
information about the size, shape, and spatial arrangement of objects
or regions in an image. It allows for quantifying various properties,
such as the number of connected components, their areas,
perimeters, centroids, or bounding boxes. These measurements can
be used as features for classification, tracking, or other image analysis
tasks.
➔ Connectivity-Based Operations: Connectedness information is used
to perform operations such as region growing, region merging, hole
filling, or object removal. These operations leverage the connectivity
relationships to modify or manipulate specific regions of interest
within an image.
➔ Graph-Based Representation: Connectedness can be represented
using graph structures, where each pixel or object is represented as a
node, and the connectivity relationships are represented as edges.
Graph-based representations enable efficient analysis and processing
of connected components by utilizing graph algorithms and
techniques. Connectedness plays a fundamental role in binary shape
analysis and facilitates the interpretation and manipulation of shapes
and objects in binary images. By analyzing the connectivity
relationships between pixels, it enables the extraction of meaningful
information and the implementation of various image processing
tasks
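A minimal OpenCV sketch of connected-component labeling with 8-connectivity, reporting per-component area and centroid (the file name is hypothetical):

import cv2

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)

# connectivity=8 uses 8-connectivity; pass 4 for 4-connectivity.
n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)

for i in range(1, n):  # label 0 is the background
    area = stats[i, cv2.CC_STAT_AREA]
    print(f"component {i}: area={area}, centroid={centroids[i]}")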

❖ OBJECT LABELING AND COUNTING
Object labeling and counting refer to the process of identifying and
categorizing objects within an image or a visual scene and determining the
number of instances of each object category present. It is a common task in
computer vision and image processing, often used in various applications
such as object detection, image recognition, and scene understanding.
The general steps involved in object labeling and counting are as follows:
1. Image Acquisition: Obtain the image or visual data that you want to
analyze.
This could be a photograph, a video frame, or even a live video feed.
2. Preprocessing: Preprocess the image to enhance its quality and remove
any noise or unwanted elements. Common preprocessing techniques
include resizing, noise reduction, and image enhancement.

3. Object Detection: Use an object detection algorithm or model to identify and localize objects within the image. There are various object detection methods available, such as region-based convolutional neural networks (RCNN), You Only Look Once (YOLO), and Single Shot MultiBox Detector (SSD).
4. Object Labeling: Once the objects are detected, assign a label or class to each detected object. This involves associating a meaningful name or category with each object, such as "car," "person," "cat," etc.
5. Object Counting: Count the number of instances or occurrences of each labeled object within the image. This can be achieved by iterating through the detected objects and keeping a count for each object category.
6. Visualization or Output: Finally, visualize or present the labeled objects and their counts. This can be done by highlighting the objects in the original image, generating a bounding box around each object, or creating a separate list or table displaying the object categories and their respective counts.

Object labeling and counting can be performed using various software libraries and frameworks in programming languages like Python. Popular libraries for computer vision tasks include OpenCV, TensorFlow, and PyTorch. These libraries provide pre-trained models and functions that simplify the process of object detection, labeling, and counting.

It's worth noting that the accuracy and performance of object labeling and
counting can vary based on the complexity of the objects, the quality of the
images, and the chosen detection algorithm or model. Therefore, it's
important to experiment and evaluate different techniques to achieve the
desired results.
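As a lightweight illustration of the counting step for a simple binary scene (no learned detector involved; the file name is hypothetical), contours can be counted directly:

import cv2

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print("number of objects:", len(contours))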

❖ ACTIVE CONTOURS:
Image segmentation means partitioning the input image by clustering pixel
values. It mainly identifies various surfaces or living or nonliving objects in
an image. For example, if you have the following image as an input, you
can have a tiger, green grass, blue water, and land as various surfaces in
your output image.
Various image segmentation techniques exist, such as Active contours,

split and merge, watershed, region splitting, region merging, graph-based

segmentation, mean shift and model finding, and normalized cut. This section explains one of the most useful image segmentation techniques, active contours.

➔What are Active Contours?

Active contour is a segmentation method that uses energy forces and

constraints to separate the pixels of interest from a picture for further

processing and analysis. An active contour is defined as an active model for

the segmentation process. Contours are the boundaries that define the

region of interest in an image. A contour is a collection of points that have

been interpolated. Depending on how the curve in the image is described,

the interpolation procedure might be linear, splines, or polynomial.

➔Why is Active Contours Needed?


Active contours are mainly used to identify uneven shapes in images. They

are also used to define smooth shapes in images and construct closed

contours for regions. Active contours are used in various medical image

segmentation applications. Active contour models are employed in various

medical applications, particularly for separating desired regions from

medical images. For example, a slice of a brain CT scan is examined for

segmentation using active contour models.

➔How Does Active Contour Work?

Active contours are the technique of obtaining deformable models or

structures in an image with constraints and forces for segmentation.

Contour models define the object borders or other picture features to

generate a parametric curve or contour. The curvature of the models is

determined using several contour techniques that employ external and

internal forces. The energy function is always related to the image’s curve.

External energy is described as the sum of forces caused by the picture


that is specifically used to control the location of the contour onto the

image, and internal energy is used to govern deformable changes.

The contour segmentation constraints for a certain image need to be

determined. The desired shape is obtained by defining the energy function.

A collection of points that locate a contour describes contour deformation.

This shape corresponds to the desired image contour, defined by

minimizing the energy function.

➔Active Contour Segmentation Models

Let us now look at some active contour segmentation models.

➔ . Snake Model

The snake model is a technique that can solve a broad range of

segmentation problems. The model’s primary function is identifying and

outlining the target object for segmentation. It requires prior knowledge of

the target object’s shape, especially for complicated things. Active snake

models, often known as snakes, are generally configured by using a spline focused on minimizing energy, guided by various forces governing the image.

➔ Equation

A simple snake model can be denoted by a set of n points vi, for i = 0, …, n−1, an internal elastic energy term Einternal, and an external edge-based energy term Eexternal. The internal energy term regulates the snake's deformations, while the external energy term controls the contour's fitting onto the image. The external energy is typically a combination of forces caused by the image, Eimage, and constraint forces imposed by the user, Econ. The snake's energy function is the total of its external and internal energy, which can be written as

Esnake = Σ i=0..n−1 [ Einternal(vi) + Eimage(vi) + Econ(vi) ]
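A hedged sketch of snake-based segmentation using scikit-image's active_contour function (the image, circular initialization, and parameter values below are illustrative, not prescriptive):

import numpy as np
from skimage import io, filters
from skimage.segmentation import active_contour

img = io.imread("cell.png", as_gray=True)          # hypothetical input image
smooth = filters.gaussian(img, sigma=3)            # smooth before fitting

# Initialize the snake as a circle around the region of interest (row, col points).
s = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 80 * np.sin(s), 100 + 80 * np.cos(s)])

snake = active_contour(smooth, init, alpha=0.015, beta=10, gamma=0.001)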

Advantage
The applications of the active snake model are expanding rapidly,

particularly in the many imaging domains. In medical imaging, the snake

model is used to segment one portion of an image with unique

characteristics compared to other regions. Traditional snake model

applications in medical imaging include optic disc and cup segmentation to

identify glaucoma, cell image segmentation, vascular region segmentation,

and several other regions segmentation for diagnosing and studying

disorders or anomalies.

Disadvantage

The conventional active snake model approach has various inefficiencies,

such as noise sensitivity and erroneous contour detection in

high-complexity objects, addressed in advanced contour methods.

❖ Shape Models and Shape Recognition
Shape models and shape recognition are critical components in computer

vision, particularly for tasks involving object detection, recognition, and

image analysis. In the context of an MDU (Modular Design Unit) course,

understanding these concepts can help you grasp how computers interpret

and process shapes in images, enabling various applications from simple

pattern recognition to complex object detection systems. Here are some

essential notes on shape models and shape recognition in computer vision.

1. Shape Models

Shape models represent objects' shapes mathematically to facilitate

recognition and analysis. Various models are used depending on the

complexity and requirements of the application. Some common shape

models include:

● Geometric Models: Simple shapes like circles, rectangles, polygons,

and ellipses are represented mathematically. These models work well


when objects have clear, regular boundaries and are often used in

applications with basic shape detection requirements.

● Template Matching: This method involves creating templates or

prototypes of the shapes of interest, then matching these templates

to shapes in an image. Template matching can be either rigid

(invariant to scale, rotation) or flexible (allowing variations in scale or

rotation).

● Statistical Shape Models (SSMs): These models describe shape

variation statistically by analyzing a set of training examples to

capture common patterns in shapes. SSMs are commonly used in

medical imaging and face recognition, where shapes might vary

slightly across samples.

● Active Shape Models (ASM): ASMs are a specific type of SSM

where the shape model is actively fitted to the data by iterating to find

the best match. ASMs adjust to image features (e.g., edges) to


improve accuracy, making them useful for applications like facial

recognition or medical image segmentation.

● Deformable Shape Models: These models allow shapes to be

flexible or "deformable," which is particularly useful for objects that

can change shape. Examples include snakes (active contour models)

and level set methods. These methods adapt shapes to match

boundaries in images, which is useful in complex scenes with varying

shapes and forms.

2. Shape Recognition Techniques

Shape recognition is the process of identifying and categorizing shapes in

images. Common techniques include:

● Edge Detection: Edge detection is often the first step in shape

recognition, as edges define the boundaries of objects in an image.

Common edge detection algorithms include the Sobel, Canny, and

Laplacian operators.
● Corner Detection: Corners are high-information points that help

define the structure of shapes, often detected using algorithms like

the Harris corner detector.

● Contour Detection and Analysis: Contours outline the shape of an

object, and contour-based recognition techniques analyze these

outlines to classify shapes. OpenCV's contour functions are

frequently used for tasks like object segmentation and shape

classification.

● Hough Transform: The Hough Transform is a method for detecting

simple shapes (lines, circles, ellipses) in an image by transforming

the spatial domain into a parameter space. It’s especially useful for

detecting regular shapes in cluttered images.

● Fourier Descriptors: Fourier descriptors represent shapes based on

the frequency components of their contours, making it easier to

compare shapes based on their periodic structure. Fourier descriptors


are useful for recognizing shapes that are similar but slightly

deformed.

● Moment Invariants: Moment invariants, such as Hu Moments, describe shape properties (like area and orientation) that are invariant to translation, scale, and rotation. They are helpful for recognizing shapes in various orientations and sizes (a short sketch follows this list).

● Deep Learning Techniques: Convolutional Neural Networks (CNNs)

have become prominent in shape recognition, particularly in complex

and large-scale applications. CNNs automatically learn shape

features from labeled datasets and can recognize highly varied

shapes in challenging conditions (e.g., partial occlusions or

non-standard orientations).
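As a brief illustration of the moment invariants mentioned above, Hu moments can be computed with OpenCV for the largest contour of a binary image (the file name is hypothetical):

import cv2

binary = cv2.imread("shape.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)

hu = cv2.HuMoments(cv2.moments(largest)).flatten()   # 7 invariant values
print(hu)

# cv2.matchShapes compares two contours using these invariants.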

3. Challenges in Shape Recognition

● Noise and Occlusion: Noise in images or occluded shapes can

affect recognition accuracy. Robust models and preprocessing

techniques like filtering are often needed to mitigate these issues.


● Invariance to Transformations: Shapes may appear at different

scales, rotations, or translations in images. Techniques like moment

invariants and CNNs can help achieve transformation invariance.

● Complex Shapes and Non-Rigid Deformations: Recognizing

complex, deformable, or articulated shapes is challenging and often

requires advanced techniques such as active contour models,

deformable part models, or neural networks.

4. Applications of Shape Models and Recognition

● Medical Imaging: Shape models (e.g., ASMs) are used to identify

anatomical structures in MRI and CT scans, aiding in diagnostics.

● Automotive and Robotics: Shape recognition enables tasks like

autonomous driving and object manipulation, where identifying

objects and their shapes is essential for navigation and interaction.

● Biometrics: Face and fingerprint recognition rely on shape models to

identify individuals based on unique physical traits.


● Industrial Inspection: Shape recognition helps detect defects in

manufactured parts by comparing actual shapes with desired models.

In summary, shape models and recognition techniques form the backbone

of many computer vision applications, where representing, analyzing, and

recognizing shapes is essential.

THE TOPICS BELOW ARE SHARED IN THE GROUP; KINDLY REFER TO THEM:

centroidal profiles – handling occlusion – boundary length measures – boundary descriptors – chain codes – Fourier descriptors – region descriptors – moments.
