Feature Descriptors
In this topic, we will learn about an image matching algorithm that identifies key features in an image and matches those features to a new image of the same object.
SIFT helps locate the local features in an image, commonly known as the 'key points' of the image. These key points are scale and rotation invariant and can be used for various computer vision applications, such as image matching, object detection, and scene recognition.
Broadly speaking, the entire process can be divided into 4 parts:
1. Constructing a scale space
2. Key point localization
3. Orientation assignment
4. Key point descriptor
Now, we need to ensure that these features are not scale-dependent. This means we will search for these features at multiple scales by creating a 'scale space'. A scale space is a collection of images at different scales, generated from a single image.
Hence, these blurred images are created at multiple scales. To create a new set of images at a different scale, we take the original image and reduce its size by half. For each new image, we create progressively blurred versions.
How many times do we need to resize the image, and how many blurred images should be created for each scaled image? Ideally, the number of octaves is four, and each octave contains five blurred images.
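As an illustration, here is a minimal sketch of how such a scale space could be built with OpenCV; the sigma schedule, the function name, and the input file are assumptions of this sketch, not the exact constants of the original SIFT implementation:

```python
import cv2
import numpy as np

def build_scale_space(image, num_octaves=4, blurs_per_octave=5, base_sigma=1.6):
    """Build 4 octaves, each holding 5 progressively blurred versions of the image."""
    octaves = []
    current = image.astype(np.float32)
    for _ in range(num_octaves):
        blurred = []
        for i in range(blurs_per_octave):
            sigma = base_sigma * (2 ** (i / 2.0))     # blur increases within the octave
            blurred.append(cv2.GaussianBlur(current, (0, 0), sigmaX=sigma))
        octaves.append(blurred)
        # Halve the resolution for the next octave
        current = cv2.resize(current, (current.shape[1] // 2, current.shape[0] // 2))
    return octaves

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
scale_space = build_scale_space(gray)
```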
Next, we will try to enhance the features using a
technique called Difference of Gaussians (DoG).
Difference of Gaussian is a feature enhancement algorithm that involves the subtraction of one blurred
version of an original image from another, less blurred version of the original.
DoG creates another set of images for each octave by subtracting each image in the octave from the next, more blurred image at the same scale. Here is a visual explanation of how DoG is implemented:
Let us create the DoG for the images in scale space. Take a look at the below diagram. On the left, we have 5
images, all from the first octave (thus having the same scale).
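Continuing the sketch above, the DoG stack for each octave can be obtained by a simple element-wise subtraction of consecutive blurred images:

```python
def difference_of_gaussians(scale_space):
    """Subtract consecutive blurred images within each octave to form the DoG stack."""
    dog_octaves = []
    for blurred in scale_space:                 # 5 blurred images per octave ...
        dog = [blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)]
        dog_octaves.append(dog)                 # ... give 4 DoG images per octave
    return dog_octaves

dog_space = difference_of_gaussians(scale_space)
```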
2. Key point Localization
Once the images have been created, the next step is to find the important key points from the
image that can be used for feature matching. The idea is to find the local maxima and
minima for the images. This part is divided into two steps:
1. Find the local maxima and minima
2. Remove low contrast key points (key point selection)
But some of these key points may not be robust to noise. This is why we need to perform a final
check to make sure that we have the most accurate key points to represent the image features.
Hence, we will eliminate the key points that have low contrast, or lie very close to the edge.
To deal with the low contrast key points, a second-order Taylor expansion is computed for
each key point. If the resulting value is less than 0.03 (in magnitude), we reject the key point.
So what do we do about the remaining key points? Well, we perform a check to identify the
poorly located key points. These are the key points that are close to the edge and have a high
edge response but may not be robust to a small amount of noise. A second-order Hessian
matrix is used to identify such key points.
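For illustration, here is a minimal sketch of such an edge check on a single DoG image, using finite differences to build the 2x2 Hessian; the function name, the array indexing, and the threshold r = 10 (the value commonly suggested for SIFT) are assumptions of this sketch:

```python
import numpy as np

def is_on_edge(dog, x, y, r=10.0):
    """Reject key points whose principal curvature ratio in the DoG image is too large."""
    dxx = dog[y, x + 1] + dog[y, x - 1] - 2 * dog[y, x]
    dyy = dog[y + 1, x] + dog[y - 1, x] - 2 * dog[y, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    trace = dxx + dyy                 # sum of the principal curvatures
    det = dxx * dyy - dxy * dxy       # product of the principal curvatures
    if det <= 0:                      # curvatures of opposite sign: not a stable extremum
        return True
    return (trace * trace) / det >= ((r + 1) ** 2) / r
```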
Now that we have performed both the contrast test and the edge test to reject the unstable key points, we will assign an orientation value to each key point to make it rotation invariant.
Orientation Assignment
At this stage, we have a set of stable key points for the images. We will now assign an orientation
to each of these key points so that they are invariant to rotation. We can again divide this step into
two smaller steps:
1. Calculate the magnitude and orientation
2. Create a histogram for magnitude and orientation
Let’s say we want to find the magnitude and orientation for the pixel value in
red. For this, we will calculate the gradients in x and y directions by taking the
difference between 55 & 46 and 56 & 42. This comes out to be Gx = 9 and Gy
= 14 respectively.
Once we have the gradients, we can find the magnitude and orientation using
the following formulas:
Magnitude = √(Gx² + Gy²) = √(9² + 14²) ≈ 16.64
Φ = atan(Gy / Gx) = atan(1.55) ≈ 57.17°
The magnitude represents the strength of the intensity change at the pixel, and the orientation gives the direction of that change.
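As a quick check of the arithmetic above, here is a minimal NumPy sketch (note that the 57.17° in the text comes from rounding Gy/Gx to 1.55; arctan2 on the exact values gives about 57.26°):

```python
import numpy as np

gx, gy = 9.0, 14.0                              # gradients from the worked example
magnitude = np.sqrt(gx ** 2 + gy ** 2)          # ≈ 16.64
orientation = np.degrees(np.arctan2(gy, gx))    # ≈ 57.26 degrees
print(magnitude, orientation)
```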
Creating a Histogram for Magnitude and Orientation
On the x-axis, we will have bins for angle values: 0-9, 10-19, 20-29, and so on up to 359, giving 36 bins in all. Since our angle value is 57, it will fall in the 6th bin. The 6th bin value will be in proportion to the magnitude of the pixel, i.e. 16.64. We will do this for all the pixels around the key point.
•The histogram values are also thresholded to reduce the influence of large gradients.
This will make the information partly immune to local, non-uniform changes in
illumination.
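A minimal sketch of how such a magnitude-weighted orientation histogram could be accumulated (the 10° bin width and the single example pixel are assumptions for illustration; the thresholding of large gradients mentioned above is omitted):

```python
import numpy as np

def orientation_histogram(magnitudes, orientations, num_bins=36):
    """Accumulate gradient magnitudes into 10-degree orientation bins covering 0-360."""
    hist = np.zeros(num_bins)
    bin_width = 360.0 / num_bins
    for mag, ang in zip(magnitudes.ravel(), orientations.ravel()):
        hist[int((ang % 360) // bin_width)] += mag
    return hist

# The example pixel (magnitude 16.64, orientation ~57 degrees) lands in index 5, the 6th bin
hist = orientation_histogram(np.array([16.64]), np.array([57.17]))
```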
SURF (Speeded Up Robust Features) Algorithm
1. SURF is a patented (non-free) and speeded-up version of SIFT. Unlike SIFT, which approximates the Laplacian of Gaussian with a Difference of Gaussians, SURF approximates it with a box filter.
2. SURF relies on the determinant of the Hessian matrix for both location and scale.
3. Rotation invariance is not a requirement in many applications, so skipping the orientation step speeds up the process further (this variant is known as upright SURF, or U-SURF).
4. SURF includes several optimizations that improve speed at every step. It is roughly three times faster than SIFT and handles rotation and blurring well, but it is not as robust to illumination and viewpoint changes.
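A minimal usage sketch with OpenCV (SURF lives in the contrib xfeatures2d module and is only available in builds compiled with the non-free algorithms enabled; the file name and threshold value are assumptions):

```python
import cv2

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)   # determinant-of-Hessian threshold
surf.setUpright(True)             # skip orientation assignment (U-SURF) for extra speed
keypoints, descriptors = surf.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)                    # (N, 64) descriptor matrix
```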
Binary Robust Independent Elementary Features (BRIEF)
Oriented FAST and Rotated BRIEF (ORB)
ORB is widely used for tasks such as one-shot face recognition; for example, apps like Google Photos use this kind of key point matching when grouping images according to the people who appear in them. The algorithm does not require any heavy computation and does not need a GPU. Two algorithms are involved: FAST and BRIEF. It works on key point matching.
ORB is a fusion of the FAST key point detector and the BRIEF descriptor, with some added features to improve performance. FAST (Features from Accelerated Segment Test) is used to detect features in the provided image. ORB also uses an image pyramid to produce multiscale features.
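A minimal OpenCV sketch of ORB-based key point matching (the image file names are placeholders):

```python
import cv2

img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("train.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)          # FAST key point detector + rotated BRIEF descriptor
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# BRIEF-style binary descriptors are compared with the Hamming distance
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
print(len(matches), "matches")
```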
Histogram of Oriented Gradients (HOG)
Some important aspects of HOG that make it different from other feature descriptors:
The HOG descriptor focuses on the structure or the shape of an object. In the case of edge features, we only
identify if the pixel is an edge or not. HOG is able to provide the edge direction as well. This is done by
extracting the gradient and orientation ( magnitude and direction) of the edges.
Additionally, these orientations are calculated in ‘localized’ portions. This means that the complete image is
broken down into smaller regions and for each region, the gradients and orientation are calculated.
Finally, HOG generates a histogram for each of these regions separately. The histograms are created using the gradients and orientations of the pixel values, hence the name 'Histogram of Oriented Gradients'.
Step 2: Calculating Gradients (in the x and y directions). Gradients are the small changes in the x and y directions.
Consider the pixel with value 85. To determine the gradient (or change) in the x-direction, we subtract the value on the left from the pixel value on the right; the gradient in the y-direction is calculated in the same way.
Hence the resultant gradients in the x and y direction for this pixel are:
Change in X direction(G ) = 89 – 78 = 11
x
Change in Y direction(G ) = 68 – 56 = 8
y
The same process is repeated for all the pixels in the image.
The gradients Gx and Gy can be viewed as the two perpendicular sides of a right triangle. For the previous example, we had Gx = 11 and Gy = 8. Applying the Pythagorean theorem gives the total gradient magnitude:
Total Gradient Magnitude = √(Gx² + Gy²) = √(11² + 8²) ≈ 13.6
The orientation is Φ = atan(Gy / Gx) = atan(8 / 11) ≈ 36°. So now, for every pixel value, we have the total gradient (magnitude) and the orientation (direction).
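The same calculation can be done for the whole image at once; here is a minimal OpenCV sketch (the file name and the 64x128 window size are assumptions for this example):

```python
import cv2
import numpy as np

img = cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
img = cv2.resize(img, (64, 128))                     # width x height of the example window

# Gradients in x and y (difference between right/left and bottom/top neighbours)
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=1)
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=1)

# Magnitude and orientation for every pixel; unsigned angles are folded into [0, 180)
magnitude, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
angle = angle % 180
```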
We need to generate the histogram using these gradients and orientations.
Here, we are going to take the angle or orientation on the x-axis and the frequency on the y-axis.
Consider the highlighted pixel (85). Since the orientation for this pixel is 36, we add a count against the angle value 36, denoting the frequency:
The same process is repeated for all the pixel values, and we end up with a frequency table that denotes angles and the
occurrence of these angles in the image.
Note that here the bin width of the histogram is 1 degree. Since unsigned orientations range from 0 to 180 degrees, we get 180 different buckets, each representing one orientation value.
Method 2:
This method is similar to the previous method, except that here we have a bin size of 20. So, the number of buckets we
would get here is 9. Again, for each pixel, we will check the orientation, and store the frequency of the orientation values
in the form of a 9 x 1 matrix. Plotting this would give us the histogram:
Method 3:
The above two methods use only the orientation values to generate histograms and do not take the gradient value into
account. Here is another way in which we can generate the histogram – instead of using the frequency, we can use the
gradient magnitude to fill the values in the matrix. Below is an example of this:
Method 4:
Let's make a small modification to the above method. Here, we will split the contribution of a pixel's gradient between the two bins on either side of the pixel's orientation. The bin whose value is closer to the orientation receives the larger share of the gradient magnitude, as shown in the sketch below.
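A minimal sketch of this interpolation between neighbouring bins (labelling the 9 bins 0, 20, ..., 160 and using the values from the running example; the exact bin labelling is an assumption, as implementations differ):

```python
import numpy as np

def add_to_histogram(hist, orientation, magnitude, bin_width=20):
    """Method 4: split a pixel's gradient magnitude between the two nearest bins;
    the bin whose label is closer to the orientation receives the larger share."""
    num_bins = len(hist)                          # 9 bins labelled 0, 20, ..., 160
    position = orientation / bin_width            # e.g. 36 / 20 = 1.8
    left = int(np.floor(position)) % num_bins
    right = (left + 1) % num_bins
    right_weight = position - np.floor(position)  # 0.8 of the magnitude goes to bin "40"
    hist[left] += magnitude * (1 - right_weight)  # the remaining 0.2 goes to bin "20"
    hist[right] += magnitude * right_weight

hist = np.zeros(9)
add_to_histogram(hist, orientation=36, magnitude=13.6)   # values from the running example
```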
Step 4: Calculate Histogram of Gradients in 8×8 cells (9×1)
The histograms created in the HOG feature descriptor are not generated for the whole image. Instead, the
image is divided into 8×8 cells, and the histogram of oriented gradients is computed for each cell. Why do
you think this happens?
By doing so, we get the features (or histogram) for the smaller patches which in turn represent the whole
image. We can certainly change this value here from 8 x 8 to 16 x 16 or 32 x 32.
If we divide the image into 8×8 cells and generate the histograms, we will get a 9 x 1 matrix for each cell.
This matrix is generated using method 4 that we discussed in the previous section.
Although we already have the HOG features created for the 8×8 cells of the image, the gradients of the
image are sensitive to the overall lighting. This means that for a particular picture, some portion of the
image would be very bright as compared to the other portions.
We cannot completely eliminate this from the image. But we can reduce this lighting variation by
normalizing the gradients by taking 16×16 blocks. Here is an example that can explain how 16×16 blocks
are created:
Here, we will be combining four 8×8 cells to create a 16×16 block. And we already know that each 8×8 cell
has a 9×1 matrix for a histogram. So, we would have four 9×1 matrices or a single 36×1 matrix. To
normalize this matrix, we will divide each of these values by the square root of the sum of squares of the
values. Mathematically, for a given vector V:
V = [a1, a2, a3, …, a36]
We calculate the root of the sum of squares:
k = √(a1² + a2² + a3² + … + a36²)
And divide all the values in the vector V with this value k:
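A minimal sketch of this normalization step (the four cell histograms here are random placeholder values so that the snippet runs on its own):

```python
import numpy as np

def normalize_block(v, eps=1e-6):
    """Divide every value of the block vector by the root of the sum of squares."""
    k = np.sqrt(np.sum(v ** 2)) + eps        # eps guards against division by zero
    return v / k

cell_histograms = [np.random.rand(9) for _ in range(4)]    # four hypothetical 9x1 cell histograms
block = normalize_block(np.concatenate(cell_histograms))   # 36x1 normalized block vector
print(np.linalg.norm(block))                                # ≈ 1 after normalization
```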
Step 6: Features for the complete image
So far, we have created features for the 16×16 blocks of the image. Now, we will combine all of these to get the features for the final image. We first need to find out how many such 16×16 blocks we get for a single 64×128 image: sliding the block with a stride of 8 pixels gives 7 positions horizontally and 15 vertically, i.e. 105 blocks, and hence a final feature vector of 105 × 36 = 3780 values.
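The same pipeline is available off the shelf; here is a minimal sketch using scikit-image's hog function (the file name is a placeholder, and the image is resized to the 64×128 window used throughout this example):

```python
from skimage import io
from skimage.feature import hog
from skimage.transform import resize

image = resize(io.imread("person.jpg", as_gray=True), (128, 64))   # rows x cols = 128 x 64

features = hog(image,
               orientations=9,            # 9 orientation bins per cell
               pixels_per_cell=(8, 8),    # 8x8 cells
               cells_per_block=(2, 2),    # 16x16 blocks made of four cells
               block_norm="L2")           # divide each block by its L2 norm
print(features.shape)                     # (3780,) = 105 blocks x 36 values each
```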
The LBP image descriptor (along with a bit of machine learning) can be used to automatically classify and identify textures and patterns in images (such as the texture/pattern of wrapping paper, cake icing, or candles, for instance).
The first step in constructing the LBP texture descriptor is to convert the image to grayscale. For each
pixel in the grayscale image, we select a neighborhood of size r surrounding the center pixel. A LBP value
is then calculated for this center pixel and stored in the output 2D array with the same width and height as
the input image.
In this example we start at the top-right point and work our way clockwise, accumulating the binary string as we go along. We can then convert this binary string to decimal, yielding a value of 23. This value is stored in the output LBP 2D array, which we can then visualize below.
The last step is to compute a histogram over the output LBP array. Since a 3 x 3 neighborhood has 2 ^ 8 =
256 possible patterns, our LBP 2D array thus has a minimum value of 0 and a maximum value of 255, allowing
us to construct a 256-bin histogram of LBP codes as our final feature vector:
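A minimal sketch using scikit-image's LBP implementation (the file name is a placeholder; the exact bit ordering may differ from the worked example above, but the 256-bin histogram is built the same way):

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

gray = cv2.imread("texture.jpg", cv2.IMREAD_GRAYSCALE)

# 8 neighbours at radius 1 corresponds to the basic 3x3 LBP described above
lbp = local_binary_pattern(gray, P=8, R=1, method="default")

# 256-bin histogram of LBP codes used as the final feature vector
hist, _ = np.histogram(lbp.ravel(), bins=256, range=(0, 256))
hist = hist.astype(np.float32) / (hist.sum() + 1e-6)        # optional normalization
```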
[Figure: simple per-region statistics such as the min/max of pixel intensities; it is often useful to mask regions of interest first (Chen et al., 2017).]
Contrast (the second moment, related to the standard deviation) is a measure of the intensity or gray-level variation between the reference pixel and its neighbor. Large contrast reflects large intensity differences in the GLCM:
Contrast = Σᵢ,ⱼ P(i, j) (i - j)²
Homogeneity measures how close the distribution of elements in the GLCM is to the diagonal of the GLCM. As homogeneity increases, the contrast typically decreases:
Homogeneity = Σᵢ,ⱼ P(i, j) / (1 + (i - j)²)
Entropy is the randomness or the degree of disorder present in the image. The value of entropy is largest when all elements of the co-occurrence matrix are equal and small when the elements are very unequal:
Entropy = -Σᵢ,ⱼ P(i, j) log P(i, j)
Energy is derived from the Angular Second Moment (ASM). The ASM measures the local uniformity of the gray levels; when pixels are very similar, the ASM value will be large:
ASM = Σᵢ,ⱼ P(i, j)²,  Energy = √ASM
Procedure to Generate GLCM Matrix:
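A minimal sketch of the procedure using scikit-image (in older versions these functions are spelled greycomatrix and greycoprops; the tiny 4-level patch is a made-up example so the matrix stays readable):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Hypothetical 4x4 patch with gray levels 0-3, to keep the GLCM small and readable
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]], dtype=np.uint8)

# Count co-occurrences of pixel pairs one step to the right (distance 1, angle 0), normalized
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=4,
                    symmetric=True, normed=True)

contrast    = graycoprops(glcm, "contrast")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
energy      = graycoprops(glcm, "energy")[0, 0]        # square root of the ASM
p = glcm[:, :, 0, 0]
entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))          # entropy computed directly from the matrix
print(contrast, homogeneity, energy, entropy)
```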