Unit 4 INT345

The document discusses feature detection and description in computer vision, focusing on methods to identify and extract key features such as edges, corners, and blobs from images. It covers various algorithms including Harris Corner Detection, SIFT, SURF, and FAST, detailing their processes, advantages, and applications in tasks like object recognition and image alignment. Additionally, it highlights the differences between these algorithms and provides programming examples for implementing them using OpenCV.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views45 pages

Unit 4 Int345

The document discusses feature detection and description in computer vision, focusing on methods to identify and extract key features such as edges, corners, and blobs from images. It covers various algorithms including Harris Corner Detection, SIFT, SURF, and FAST, detailing their processes, advantages, and applications in tasks like object recognition and image alignment. Additionally, it highlights the differences between these algorithms and provides programming examples for implementing them using OpenCV.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 45

Feature Detection and Description

Feature detection
• Feature detection is the process of locating the important features of an image; these features can be edges, corners, ridges, and blobs.
• Feature detection is a fundamental concept in computer vision, which
refers to the process of identifying and extracting distinct and
informative patterns or structures from an image or a set of images.
• These features can represent key aspects of the image, such as edges,
corners, blobs, or even more complex structures like object parts or
shapes. Feature detection is crucial for various computer vision tasks,
including image recognition, object tracking, image stitching, and
more.
Can you match the scene points?
A feature-based pipeline answers this in three stages:
o Detection
o Description
o Matching
The matching should be robust to transformations such as translation, rotation, and scale.


Feature Matching
• Feature matching in computer vision refers to the process of finding
corresponding features or keypoints between two or more images or
frames of a video sequence.
• These features are distinctive and identifiable points, areas, or
descriptors within the images that can be matched across different
views or frames.
• Feature matching is a crucial step in various computer vision tasks,
including object recognition, image stitching, 3D reconstruction, and
image alignment.
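As an illustration of the matching step, the following is a minimal sketch of brute-force matching with OpenCV. It assumes two example images at placeholder paths '/content/1.jpg' and '/content/2.jpg' and uses ORB features (covered later in this unit) with Hamming-distance matching:

import cv2

# Load the two images to be matched (paths are placeholders)
img1 = cv2.imread('/content/1.jpg', 0)
img2 = cv2.imread('/content/2.jpg', 0)

# Detect keypoints and compute binary descriptors with ORB
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance (suitable for binary descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

# Keep the best matches (smallest descriptor distance) and visualize them
matches = sorted(matches, key=lambda m: m.distance)
matched_image = cv2.drawMatches(img1, kp1, img2, kp2, matches[:30], None, flags=2)
cv2.imwrite('matches.jpg', matched_image)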
Different Feature Descriptors
• Harris Corner Detector
• Scale-Invariant Feature Transform (SIFT)
• Speeded Up Robust Features (SURF)
• FAST (Features from Accelerated Segment Test)
• ORB (Oriented FAST and Rotated BRIEF)
• Histogram of Oriented Gradients (HOG)
• Edge Detectors (e.g., Canny Edge Detector)
Harris corner detection
• Harris corner detection is a method that detects corners by sliding a small window over the image, measuring how the intensity changes as the window moves, and applying a threshold to the response; the surviving points are marked as corners.
• This algorithm is mainly used to detect the corners of the image.
Algorithm
• Gradients Calculation:
• Compute the gradients of the image in both the x and y directions using derivative filters
• Structure Tensor Calculation:
• Construct the structure tensor for each pixel in the image. The structure tensor is a 2x2 matrix that
summarizes the gradient information in a local neighborhood around each pixel.
• The structure tensor is defined as:

  M = Σ w(x, y) * [ Ix^2    Ix*Iy ]
                  [ Ix*Iy   Iy^2  ]

• Here, Ix and Iy are the gradients in the x and y directions, and w(x,y) is a weighting function (e.g., a Gaussian window) over the local neighborhood.
• Corner Response Function:
• Compute a corner response function to evaluate the likelihood of a pixel being a corner. Harris and
Stephens introduced the following corner response function:

  R = det(M) - k * (trace(M))^2

• k is a constant typically set between 0.04 and 0.06. The det() function calculates the determinant, and trace() calculates the trace of the structure tensor M.
Sobel Operator for gradient Magnitude
• The Sobel operator uses convolution with two 3x3 kernels, one for detecting vertical edges
(Sobel_x) and the other for horizontal edges (Sobel_y).
• These kernels are used to compute the gradient of the image in the x and y directions,
respectively.
• Let's consider the following 3x3 grayscale image matrix:
| 100 150 200 |
| 120 180 220 |
| 80 110 140 |

• Sobel_x kernel:
-1 0 1
-2 0 2
-1 0 1

• Sobel_y kernel:
-1 -2 -1
0 0 0
1 2 1
Applying each kernel at the center pixel (value 180) by element-wise multiplication and summation:

X_gradient (Gx):
Gx = (-1*100 + 0*150 + 1*200)
   + (-2*120 + 0*180 + 2*220)
   + (-1*80 + 0*110 + 1*140)
   = 100 + 200 + 60 = 360

Y_gradient (Gy):
Gy = (-1*100 - 2*150 - 1*200)
   + (0*120 + 0*180 + 0*220)
   + (1*80 + 2*110 + 1*140)
   = -600 + 0 + 440 = -160

gradient_magnitude = sqrt(Gx^2 + Gy^2) = sqrt(360^2 + 160^2) ≈ 394
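A small sketch to verify this hand computation, assuming OpenCV and NumPy are available (cv2.filter2D applies correlation with the given kernel, so the kernels are used exactly as written above):

import cv2
import numpy as np

# The 3x3 example image and the Sobel kernels from the slides
img = np.array([[100, 150, 200],
                [120, 180, 220],
                [ 80, 110, 140]], dtype=np.float64)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)

# filter2D performs correlation; the center element of the result
# is the kernel response centered on the middle pixel
gx = cv2.filter2D(img, cv2.CV_64F, sobel_x, borderType=cv2.BORDER_REPLICATE)
gy = cv2.filter2D(img, cv2.CV_64F, sobel_y, borderType=cv2.BORDER_REPLICATE)

magnitude = np.sqrt(gx**2 + gy**2)
print(gx[1, 1], gy[1, 1], magnitude[1, 1])  # expected: 360.0 -160.0 ~394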
• Corner Detection:
• Select pixels where R is above a certain threshold, indicating corners or
interest points in the image.
• Optionally, non-maximum suppression can be applied to retain only the local
maxima in the corner response.
• The Harris Corner Detector identifies regions where there are significant variations in intensity in multiple directions, indicating the presence of corners. Note that the corner response is largely invariant to rotation, but the detector is sensitive to noise and to changes in scale, which motivates scale-invariant methods such as SIFT.
Numerical Example
• Suppose we have gradients Ix = 3 and Iy = 5 at a specific pixel, we choose k = 0.04, and the threshold is 0. Identify whether the given pixel is a corner point using the Harris corner operator.
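A worked solution follows; this is a sketch for the single pixel only, ignoring the windowed sum, so the structure tensor M is built from one gradient sample:

# Structure tensor from a single gradient sample (Ix = 3, Iy = 5)
Ix, Iy, k = 3.0, 5.0, 0.04
M = [[Ix * Ix, Ix * Iy],
     [Ix * Iy, Iy * Iy]]                         # [[9, 15], [15, 25]]

det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]    # 9*25 - 15*15 = 0
trace_M = M[0][0] + M[1][1]                      # 9 + 25 = 34
R = det_M - k * trace_M ** 2                     # 0 - 0.04*34^2 = -46.24

# R is below the threshold 0, so this pixel is not a corner;
# a negative R indicates an edge-like rather than corner-like point.
print(R)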
Program Harris Corner Operator
import cv2
import numpy as np

# Load the image in color and build a grayscale copy for corner detection
image = cv2.imread('path_to_your_image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Harris corner detection parameters
block_size = 2  # Neighborhood size for corner detection
ksize = 3       # Aperture parameter for the Sobel operator
k = 0.04        # Harris detector free parameter (typically 0.04 - 0.06)

# Detect corners using Harris corner detection (expects a float32 image)
corners = cv2.cornerHarris(np.float32(gray), block_size, ksize, k)

# Threshold for an optimal value; it may vary with the image and the parameter k
threshold = 0.01 * corners.max()

# Mark detected corners in the image
image[corners > threshold] = [0, 0, 255]  # Mark corners in red (BGR)

# Display the result
cv2.imshow('Harris Corners', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
SIFT (Scale-Invariant Feature Transform)
• While Harris is an algorithm for detecting corners, SIFT detects and describes keypoints in a way that is invariant to the scale and rotation of the image.
• This is very useful when comparing real-world objects against a reference image, since the match does not depend on the viewing angle or scale of the image.
• The method returns the keypoints of the image, which can then be marked and used for matching.
SIFT Algorithm
• Scale-space extrema detection: SIFT begins by constructing a scale-space representation of the image by
applying Gaussian blurring at different scales. This creates a pyramid of blurred images, with each level
representing a different degree of blurring (scale). The algorithm then detects local extrema (maxima and
minima) across the scale-space pyramid to identify potential keypoints.
• Keypoint localization: For each potential keypoint, SIFT performs precise localization by fitting a 3D quadratic
function to the scale-space data to find the accurate location and scale of the keypoint. It ensures that the
detected keypoints are stable under scaling transformations.
• Orientation assignment: SIFT computes the dominant orientation for each keypoint to achieve rotation
invariance. It considers the gradient magnitudes and orientations of the nearby pixels and assigns a primary
orientation to the keypoint based on this information.
• Keypoint descriptor generation: SIFT constructs a descriptor for each keypoint by considering the gradients in
its local neighborhood. The gradients are transformed into a representation that is invariant to changes in
rotation, scale, and illumination. The resulting descriptor is a vector of numerical values that characterizes the
keypoint's appearance.
import cv2
import matplotlib.pyplot as plt

# Load the image in grayscale
image_path = '/content/2.jpg'
image = cv2.imread(image_path, 0)

# Create a SIFT object


sift = cv2.SIFT_create()

# Detect keypoints and compute descriptors


keypoints, descriptors = sift.detectAndCompute(image, None)

# Draw keypoints on the original image


image_with_keypoints = cv2.drawKeypoints(image, keypoints, outImage=None)

# Display the original image with keypoints


plt.figure(figsize=(20, 5))
plt.imshow(cv2.cvtColor(image_with_keypoints, cv2.COLOR_BGR2RGB))
plt.title('Image with SIFT Keypoints')
plt.show()
Speeded Up Robust Features (SURF)
• In computer vision, speeded up robust features (SURF) is a patented
local feature detector and descriptor. It can be used for tasks such
as object recognition, image registration, classification, or 3D
reconstruction.
• It is partly inspired by the scale-invariant feature transform (SIFT)
descriptor.
• The standard version of SURF is several times faster than SIFT and
claimed by its authors to be more robust against different image
transformations than SIFT.
SURF Algorithm
• Integral Images: SURF utilizes integral images, a technique that allows for rapid computation of rectangular
sum regions. This significantly speeds up the computation of box filters, which are essential for
approximating the Gaussian filter.
• Gaussian Approximation using Box Filters: SURF approximates the Gaussian convolution using box filters at
multiple scales. This approximation speeds up the computation of the scale-space representation, making
SURF more efficient compared to SIFT.
• Keypoint Detection: SURF detects keypoints based on the responses to a blob detector, using the
determinant of the Hessian matrix. The Hessian matrix is computed using the approximated second-order
Gaussian derivatives.
• Orientation Assignment: SURF assigns orientation to keypoints to achieve rotation invariance. It uses Haar
wavelet responses in a circular region around each keypoint to determine the dominant orientation.
• Descriptor Calculation: The SURF descriptor is based on Haar wavelet responses in horizontal and vertical
directions within a circular region around each keypoint. These responses are used to compute a compact
64 (or 128) element descriptor that characterizes the local image information.
• Scale and Rotation Invariance: SURF maintains scale and rotation invariance, ensuring that the detected
features can be matched across different scales and orientations.
import cv2

# Load an image
image = cv2.imread('/content/2.jpg', cv2.IMREAD_COLOR)

# Create a SURF object (requires opencv-contrib-python built with the
# non-free modules, since SURF is patented)
surf = cv2.xfeatures2d.SURF_create()

# Detect and compute SURF keypoints and descriptors
keypoints, descriptors = surf.detectAndCompute(image, None)

# Draw the keypoints on the image
image_with_keypoints = cv2.drawKeypoints(image, keypoints, None, (0, 255, 0), 4)

# Display the original image with keypoints
cv2.imshow('Original Image with SURF Keypoints', image_with_keypoints)
cv2.waitKey(0)
cv2.destroyAllWindows()

Difference Between SURF and SIFT
• Algorithm Speed and Efficiency:
• SURF is generally faster than SIFT, making it more suitable for real-time applications or scenarios where
computational efficiency is critical.
• SURF achieves its speed by using integral images and a different approximation of the Gaussian filter
compared to SIFT.
• Gaussian Approximation:
• SIFT uses a Gaussian filter with a fixed window size, which can be computationally expensive at different
scales.
• SURF approximates the Gaussian filter using box filters, which can be computed using integral images,
resulting in faster computation.
• Scale Invariance:
• Both SIFT and SURF are designed to be scale-invariant, meaning they can detect features at different
scales in an image.
• Keypoint Detection and Localization:
• SIFT uses a Difference of Gaussians (DoG) approach for keypoint detection and precise localization.
• SURF uses the determinant of the Hessian matrix for keypoint detection, which provides faster and more efficient localization compared to DoG.
• Descriptor Calculation:
• SIFT computes a 128-dimensional vector (or more) for each keypoint, providing a highly distinctive
descriptor.
• SURF, on the other hand, computes a shorter 64-dimensional vector (or more) for each keypoint,
providing a relatively less distinctive but faster-to-calculate descriptor.
• Robustness to Rotation:
• Both SIFT and SURF are designed to be robust to image rotation, making them suitable for applications
where the object's orientation may vary.
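A small sketch illustrating the descriptor sizes discussed above, assuming an example image at '/content/2.jpg'; the SIFT part works with standard OpenCV, while the SURF part only runs if opencv-contrib with the non-free modules is installed:

import cv2

image = cv2.imread('/content/2.jpg', 0)

# SIFT: 128-dimensional float descriptors
sift = cv2.SIFT_create()
kp_sift, des_sift = sift.detectAndCompute(image, None)
print('SIFT descriptors:', des_sift.shape)       # (num_keypoints, 128)

# SURF: 64-dimensional by default, 128 if extended=True (contrib / non-free build)
try:
    surf = cv2.xfeatures2d.SURF_create(extended=False)
    kp_surf, des_surf = surf.detectAndCompute(image, None)
    print('SURF descriptors:', des_surf.shape)   # (num_keypoints, 64)
except (AttributeError, cv2.error):
    print('SURF is not available in this OpenCV build')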
Hessian operator
• The Hessian operator is a mathematical tool used in image processing
and computer vision for analyzing the second-order spatial derivatives
of an image.
• It is primarily employed for detecting and characterizing local
structures and features within an image, such as edges, corners, and
blobs.
• In the context of feature detection, the Hessian matrix is used to
compute the Hessian determinant, which is a measure of local
structure and can be used for features like blobs or regions of
interest.
1. Eigenvalues: The eigenvalues of the Hessian matrix (λ1 and λ2) can be used to determine the type of structure at that location:
   1. If both eigenvalues are large and have the same sign, the point lies on a blob-like structure (both negative for a bright blob, both positive for a dark blob).
   2. If both eigenvalues are small, the region is flat, with little intensity variation.
   3. If one eigenvalue is large and the other is small (or they have opposite signs), the point lies on an edge-, ridge-, or saddle-like structure.
2. Orientation: The eigenvector associated with the eigenvalue of larger magnitude (λ1) provides the orientation of the feature.
import cv2
import numpy as np

# Load the image
image_path = 'path/to/your/image.jpg'
image = cv2.imread(image_path, 0)

# Define the Hessian operator masks
mask_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])
mask_y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])

# Compute the second-order partial derivatives
Ix = cv2.filter2D(image, cv2.CV_64F, mask_x)
Iy = cv2.filter2D(image, cv2.CV_64F, mask_y)

# Compute elements of the Hessian matrix
Ixx = cv2.filter2D(Ix, cv2.CV_64F, mask_x)
Iyy = cv2.filter2D(Iy, cv2.CV_64F, mask_y)
Ixy = cv2.filter2D(Ix, cv2.CV_64F, mask_y)

# Calculate the determinant of the Hessian matrix
hessian_det = (Ixx * Iyy) - (Ixy**2)

# Display the images or do further processing based on hessian_det
# For visualization (optional)
cv2.imshow('Image', image.astype(np.uint8))
cv2.imshow('Ix', Ix)
cv2.imshow('Iy', Iy)
cv2.imshow('Ixx', Ixx)
cv2.imshow('Iyy', Iyy)
cv2.imshow('Ixy', Ixy)
cv2.imshow('Hessian Determinant', hessian_det.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()
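Building on the listing above, here is a sketch of how the eigenvalues of the 2x2 Hessian at each pixel could be obtained in closed form; this is an illustrative addition, not part of the original listing:

import numpy as np

# For a 2x2 symmetric matrix [[Ixx, Ixy], [Ixy, Iyy]] the eigenvalues are
#   0.5 * (trace +/- sqrt(trace^2 - 4*det))
def hessian_eigenvalues(Ixx, Iyy, Ixy):
    trace = Ixx + Iyy
    det = Ixx * Iyy - Ixy ** 2
    disc = np.sqrt(np.maximum(trace ** 2 - 4.0 * det, 0.0))
    lam1 = 0.5 * (trace + disc)
    lam2 = 0.5 * (trace - disc)
    return lam1, lam2

# lam1, lam2 = hessian_eigenvalues(Ixx, Iyy, Ixy)  # arrays from the code above
# Pixels where both eigenvalue maps are large (same sign) are blob candidates.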
FAST (Features from Accelerated Segment Test)
• SURF is fast compared to SIFT, but still not fast enough for real-time devices such as mobile phones and surveillance cameras.
• The FAST algorithm was therefore introduced, with a very low computing time.
• However, FAST gives us only the keypoints; descriptors must be computed with other algorithms such as SIFT, SURF, or BRIEF.
• The FAST algorithm is used to detect corners.
• https://docs.opencv.org/3.4/df/d0c/tutorial_py_fast.html
Syntax:
fast = cv2.FastFeatureDetector_create()
fast.setNonmaxSuppression(False)
kp = fast.detect(gray_img, None)
import cv2
import matplotlib.pyplot as plt

# Load the image and convert it to grayscale
image = cv2.imread('/content/2.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Create a FAST detector object
fast = cv2.FastFeatureDetector_create()

# Detect FAST corners
keypoints = fast.detect(gray, None)

# Draw the detected corners on the image
result_image = cv2.drawKeypoints(image, keypoints, None, color=(0, 255, 0))

# Display the image with corners


plt.imshow(cv2.cvtColor(result_image, cv2.COLOR_BGR2RGB))
plt.title('FAST Corner Detection')
plt.axis('off')
plt.show()
BRIEF (Binary Robust Independent Elementary Features)
• BRIEF uses binary strings as an efficient feature point descriptor.
• BRIEF is very fast both to build and to match.
• BRIEF easily outperforms other descriptors such as SURF and SIFT in terms of speed and, in many cases, recognition rate.
• https://docs.opencv.org/3.4/dc/d7d/tutorial_py_brief.html
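A minimal sketch combining a fast detector with the BRIEF descriptor; it assumes opencv-contrib-python is installed (BRIEF lives in cv2.xfeatures2d) and uses FAST to supply the keypoints:

import cv2

image = cv2.imread('/content/2.jpg', 0)

# FAST supplies keypoints; BRIEF turns them into binary descriptors
fast = cv2.FastFeatureDetector_create()
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()

keypoints = fast.detect(image, None)
keypoints, descriptors = brief.compute(image, keypoints)

# Each descriptor is 32 bytes (256 bits) by default
print('Number of keypoints:', len(keypoints))
print('Descriptor size (bytes):', brief.descriptorSize())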
ORB (Oriented FAST and Rotated BRIEF)
• ORB is a very effective way of detecting image features when compared to SIFT and SURF.
• ORB typically finds fewer features than SIFT and SURF because it keeps only the most important ones, and it does so in less time; it is nevertheless considered a very effective algorithm compared to other detection algorithms.
• https://docs.opencv.org/3.4/d1/d89/tutorial_py_orb.html
• Syntax:
• orb = cv2.ORB_create(nfeatures=2000)
• kp, des = orb.detectAndCompute(gray_img, None)
import cv2
import matplotlib.pyplot as plt

# Reading the image and converting it to grayscale


image = cv2.imread('/content/2.jpg')
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Applying the ORB detector


orb = cv2.ORB_create(nfeatures=2000)
kp, des = orb.detectAndCompute(gray_image, None)

# Drawing the keypoints


kp_image = cv2.drawKeypoints(image, kp, None, color=(0, 255, 0), flags=0)

# Convert the image from BGR to RGB format for matplotlib


kp_image_rgb = cv2.cvtColor(kp_image, cv2.COLOR_BGR2RGB)

# Display the image using matplotlib


plt.imshow(kp_image_rgb)
plt.axis('off')
plt.title('ORB Keypoints')
plt.show()
Histogram of Oriented Gradients (HOG)
• The histogram of oriented gradients (HOG) is a feature
descriptor used in computer vision and image processing for the
purpose of object detection. The technique counts occurrences
of gradient orientation in localized portions of an image.
• Suppose we have a tiny grayscale image of size 4x4 pixels, and we
want to compute its HOG features. We'll use a simple 2x2 cell size
and a 2x2 block size for this example.
Image:
[[ 10, 20, 5, 15],
[ 15, 30, 10, 25],
[ 5, 10, 2, 5],
[ 10, 20, 5, 15]]
Step 1: Compute the Gradient Magnitude and Direction
We'll use the Sobel operator to compute gradient magnitudes and directions. Let's assume the following
gradient magnitudes (G) and directions (D) are calculated:

Gradient Magnitudes (G):


[[ 5, 10, 5, 10],
[10, 15, 10, 15],
[ 5, 10, 5, 10],
[ 5, 10, 5, 10]]

Gradient Directions (D):


[[135, 90, 135, 90],
[135, 90, 135, 90],
[135, 90, 135, 90],
[135, 90, 135, 90]]
Step 2: Divide the Image into Cells (2x2)
We divide the 4x4 image into non-overlapping 2x2 cells, resulting in four cells. We will compute histograms for
each cell.
Step 3: Create Histograms for Each Cell
For each cell, we create a 9-bin histogram of gradient orientations (0 to 180 degrees, 20 degrees per bin). Let's compute the histograms for two cells:

Cell 1 (top-left):
• Gradient Directions: [135, 90, 135, 90]
• Gradient Magnitudes: [5, 10, 5, 10]
Histogram bins (0-20, 20-40, ..., 160-180 degrees):
• Bin 0-20 degrees: 0 (no gradients in this range)
• Bin 20-40 degrees: 0
• Bin 40-60 degrees: 0
• Bin 60-80 degrees: 0
• Bin 80-100 degrees: 2 (from the 90-degree gradients)
• Bin 100-120 degrees: 0
• Bin 120-140 degrees: 2 (from the 135-degree gradients)
• Bin 140-160 degrees: 0
• Bin 160-180 degrees: 0
So, the histogram for Cell 1 is [0, 0, 0, 0, 2, 0, 2, 0, 0].

Cell 2 (top-right):
• Gradient Directions: [135, 90, 135, 90]
• Gradient Magnitudes: [5, 10, 5, 10]
The histogram for Cell 2 is also [0, 0, 0, 0, 2, 0, 2, 0, 0].
Step 4: Normalize Histograms within Blocks (2x2)
Now, we group cells into 2x2 blocks and normalize the histograms within each block. In this simplified example, there is a single block containing the four cells. Normalization is often done using techniques like L2 normalization.

Step 5: Concatenate the Block Histograms
Since we have only one block in this example, the final HOG feature vector is the concatenation of the normalized histograms of all the cells in the block (for Cell 1 and Cell 2 above, the unnormalized parts are each [0, 0, 0, 0, 2, 0, 2, 0, 0]).
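For completeness, here is a sketch of extracting HOG features in code; it assumes scikit-image is installed and uses its skimage.feature.hog function (the parameter values below are illustrative defaults, not those of the tiny example above):

import cv2
from skimage.feature import hog

# Load an example image in grayscale
image = cv2.imread('/content/2.jpg', 0)

# Compute HOG features: 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks
features, hog_image = hog(image,
                          orientations=9,
                          pixels_per_cell=(8, 8),
                          cells_per_block=(2, 2),
                          block_norm='L2-Hys',
                          visualize=True)

print('HOG feature vector length:', features.shape[0])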
Texture descriptors
• Texture descriptors in computer vision are used to characterize
the visual texture properties of an image or a region within an
image.
• Texture describes the spatial arrangement of color or intensity
patterns within an image and is an important feature for tasks
like image classification, segmentation, and object recognition.
• Texture descriptors aim to capture various texture properties
such as smoothness, roughness, regularity, or randomness.
Some commonly used texture descriptors in
computer vision:
1. Statistical Texture Descriptors:
1. Gray Level Co-occurrence Matrix (GLCM): GLCM computes the co-occurrence of pixel intensity values at different spatial relationships within an image. It is used to capture texture properties related to spatial patterns (see the short sketch after this list).
2. Gray Level Run Length Matrix (GLRLM): GLRLM characterizes texture
based on the length and occurrence of consecutive pixel values along rows
or columns in an image.
3. Gray Level Size Zone Matrix (GLSZM): GLSZM quantifies texture by
analyzing the size and frequency of connected regions with the same pixel
intensity value.
4. Haralick Texture Features: Derived from GLCM, Haralick features are
statistical measures that describe texture properties such as contrast,
energy, and entropy.
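As referenced above, a minimal GLCM sketch; it assumes scikit-image is available (skimage.feature.graycomatrix and graycoprops; older releases spell these greycomatrix/greycoprops) and a placeholder grayscale image path:

import cv2
from skimage.feature import graycomatrix, graycoprops

# Load a grayscale image (path is a placeholder)
image = cv2.imread('/content/2.jpg', 0)

# Co-occurrence matrix for horizontal neighbors (distance 1, angle 0)
glcm = graycomatrix(image, distances=[1], angles=[0],
                    levels=256, symmetric=True, normed=True)

# A few Haralick-style properties derived from the GLCM
for prop in ('contrast', 'energy', 'homogeneity', 'correlation'):
    print(prop, graycoprops(glcm, prop)[0, 0])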

2. Filter-Based Descriptors:
• Gabor Filters: Gabor filters are used to analyze the frequency and
orientation components of textures in an image. They are particularly
useful for texture discrimination.
• Local Binary Patterns (LBP): LBP encodes the relationship between a central pixel and its neighbors by thresholding the neighboring pixel values against the center. It is effective for texture classification (a short sketch follows this list).
• Histogram of Oriented Gradients (HOG): Originally designed for
object detection, HOG can also be used to capture texture
information by analyzing gradient orientations within local image
regions.
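A minimal LBP sketch, again assuming scikit-image (skimage.feature.local_binary_pattern) and a placeholder image path; the histogram of LBP codes is what is typically fed to a classifier:

import cv2
import numpy as np
from skimage.feature import local_binary_pattern

image = cv2.imread('/content/2.jpg', 0)

# Uniform LBP with 8 neighbors on a circle of radius 1
P, R = 8, 1
lbp = local_binary_pattern(image, P, R, method='uniform')

# Histogram of LBP codes: the texture descriptor for the whole image
n_bins = P + 2  # uniform patterns produce P + 2 distinct codes
hist, _ = np.histogram(lbp.ravel(), bins=n_bins, range=(0, n_bins), density=True)
print(hist)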
3. Transform-Based Descriptors:
1. Wavelet Transform: Wavelet transform decomposes an image into different
scales and orientations, allowing the analysis of textures at multiple
resolutions.
2. Discrete Cosine Transform (DCT): DCT is used to represent an image in the
frequency domain. It can capture texture features by analyzing the distribution
of frequency components.
4. Deep Learning-Based Descriptors:
   1. Convolutional Neural Networks (CNNs): CNNs, especially pre-trained models like VGG, ResNet, or Inception, can extract high-level texture features by learning hierarchical representations from images.
   2. Texture-Related CNN Architectures: There are also specialized CNN architectures designed for texture analysis, such as TNet and T-CNN.
5. Texture Moments:
1. Hu Moments: Hu moments are used to describe the shape and texture
properties of objects in an image. They are invariant to translation,
rotation, and scale changes.
6. Local Descriptors:
   1. Local Binary Patterns (LBP) and Local Texture Patterns (LTP): These local descriptors focus on analyzing texture properties within small image regions or patches.
• The choice of texture descriptor depends on the specific application
and the characteristics of the textures you want to capture. In many
cases, a combination of descriptors and machine learning techniques
may be used to achieve robust texture analysis and recognition.
Image descriptors
• In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents of images or videos, or the algorithms and applications that produce such descriptions. They describe elementary characteristics such as shape, color, texture, or motion, among others.
• Descriptors are the first step toward connecting the pixels contained in a digital image with what humans recall after having observed an image, or a group of images, some minutes later.
Visual descriptors are divided in two main groups:
• General information descriptors: contain low level descriptors
which give a description about color, shape, regions, textures and
motion.
• Specific domain information descriptors: give information about
objects and events in the scene. A concrete example would be face
recognition.
Descriptors Applications:
Among all applications, the most important ones are:

• Multimedia documents search engines and classifiers.


• Digital library: visual descriptors allow a very detailed and concrete search of any video or image by means of different search parameters, for instance, searching for films in which a known actor appears, or for videos containing Mount Everest.
• Personalized electronic news service.
• Possibility of an automatic connection to a TV channel broadcasting a soccer match,
for example, whenever a player approaches the goal area.
• Control and filtering of concrete audio-visual contents, like violent or pornographic
material. Also, authorization for some multimedia contents.
Thank you!
