
Feature Descriptors

SIFT (Scale Invariant Feature Transform)


We naturally understand that the scale or angle of an image may change while the object remains the same, but machines struggle with this idea. It is a challenge for them to identify an object in an image if we change certain things (like the angle or the scale). The good news is that machines are very flexible, and we can teach them to identify images at an almost human level.

So, in this topic, we will learn about an image-matching algorithm that identifies the key features in an image and is able to match these features to a new image of the same object.

SIFT helps locate the local features in an image, commonly known as the ‘key points‘ of the image. These key points are scale- and rotation-invariant and can be used for various computer vision applications, like image matching, object detection, and scene detection. Broadly speaking, the entire process can be divided into four parts:

 Constructing a Scale Space: To make sure that features are scale-independent.
 Key point Localisation: Identifying the suitable features or key points.
 Orientation Assignment: Ensure the key points are rotation invariant.
 Key point Descriptor: Assign a unique fingerprint to each key point.

Finally, we can use these key points for feature matching, as in the sketch below.
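As a concrete illustration, a minimal sketch of this pipeline with OpenCV might look as follows (this assumes opencv-python 4.4 or later, where SIFT ships in the main module, and uses placeholder image file names):

# Sketch: detect SIFT key points/descriptors and match them between two images
import cv2

img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                        # detector + descriptor
kp1, des1 = sift.detectAndCompute(img1, None)   # key points and 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_L2)                 # brute-force matcher on L2 distance
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]   # ratio test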


1. Constructing the Scale Space
We need to identify the most distinct features in a given image while ignoring any noise. For
every pixel in an image, the Gaussian Blur calculates a value based on its neighboring pixels.
The texture and minor details are removed from the image and only the relevant information like
the shape and edges remain.

 Now, we need to ensure that these features are not scale-dependent. This means we will be searching for these features at multiple scales, by creating a ‘scale space’. A scale space is a collection of images at different scales, generated from a single image.

 Hence, these blurred images are created for multiple scales. To create a new set of images at different scales, we take the original image and reduce the scale by half. For each new image, we create blurred versions.

 How many times do we need to scale the image, and how many subsequent blurred images need to be created for each scaled image? The ideal number of octaves is four, and for each octave, the number of blurred images is five.
Next, we will try to enhance the features using a technique called Difference of Gaussians (DoG).
 Difference of Gaussians is a feature enhancement algorithm that involves the subtraction of one blurred version of an original image from another, less blurred version of the original.

 DoG creates another set of images, for each octave, by subtracting each image from the next, more blurred image in the same octave. In the original diagram, the left column shows the 5 blurred images of the first octave (all having the same scale), and the subtractions between consecutive images yield 4 DoG images per octave.
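To make the construction concrete, here is a rough sketch of one octave and its DoG images, assuming OpenCV and NumPy; the sigma schedule below is illustrative rather than the exact values from the original paper:

# Sketch: five blurred images of one octave and their four DoG images
import cv2
import numpy as np

img = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

sigmas = [1.6 * (2 ** (i / 3.0)) for i in range(5)]           # progressively larger blurs
blurred = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]

# Subtract each image from the next, more blurred one
dog = [blurred[i + 1] - blurred[i] for i in range(4)]

# Next octave: downsample by half and repeat (the full algorithm downsamples one of the blurred images)
next_octave = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_NEAREST)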
2. Key point Localization
Once the images have been created, the next step is to find the important key points from the
image that can be used for feature matching. The idea is to find the local maxima and
minima for the images. This part is divided into two steps:
1.Find the local maxima and minima
2.Remove low contrast key points (key point selection)

Local Maxima and Local Minima

To locate the local maxima and minima, we go through every pixel in the image and compare it with its neighboring pixels. Here, ‘neighboring’ includes not only the eight surrounding pixels in the same image (the one in which the pixel lies), but also the nine pixels each in the previous and next images of the octave. This means that every pixel value is compared with 26 other pixel values to find whether it is a local maximum or minimum, as in the sketch below.
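A sketch of that 26-neighbor comparison, where prev_img, cur_img and next_img stand for three adjacent DoG images of the same octave (the names are illustrative, not part of any library):

# Sketch: is the pixel at (r, c) an extremum of its 3 x 3 x 3 neighborhood?
import numpy as np

def is_extremum(prev_img, cur_img, next_img, r, c):
    val = cur_img[r, c]
    cube = np.stack([prev_img[r-1:r+2, c-1:c+2],
                     cur_img[r-1:r+2, c-1:c+2],
                     next_img[r-1:r+2, c-1:c+2]])   # 27 values including the pixel itself
    return val >= cube.max() or val <= cube.min()   # maximum or minimum of the 26 neighbors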
Key point Selection

 But some of these key points may not be robust to noise. This is why we need to perform a final check to make sure that we have the most accurate key points to represent the image features. Hence, we will eliminate the key points that have low contrast or that lie very close to an edge.

 To deal with the low-contrast key points, a second-order Taylor expansion is computed for each key point. If the resulting value is less than 0.03 (in magnitude), we reject the key point.

 So what do we do about the remaining key points? We perform a check to identify the poorly located ones. These are key points that lie close to an edge and have a high edge response but may not be robust to a small amount of noise. A 2×2 Hessian matrix of second-order derivatives is used to identify such key points.

 Now that we have performed both the contrast test and the edge test to reject the unstable key points, we will assign an orientation value to each remaining key point to make it rotation-invariant.
Orientation Assignment
At this stage, we have a set of stable key points for the images. We will now assign an orientation
to each of these key points so that they are invariant to rotation. We can again divide this step into
two smaller steps:
1.Calculate the magnitude and orientation
2.Create a histogram for magnitude and orientation

Calculate Magnitude and Orientation

Let’s say we want to find the magnitude and orientation for a particular pixel (the one highlighted in red in the original figure). For this, we calculate the gradients in the x and y directions by taking the difference between 55 & 46 and 56 & 42. This comes out to be Gx = 9 and Gy = 14 respectively.
Once we have the gradients, we can find the magnitude and orientation using the following formulas:
Magnitude = √(Gx² + Gy²) = √(9² + 14²) ≈ 16.64
Φ = atan(Gy / Gx) = atan(1.55) ≈ 57.17°
The magnitude represents the intensity of the pixel and the orientation gives the direction for the same.
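Reproducing that arithmetic in code (note that the 57.17° above comes from rounding Gy/Gx to 1.55; the exact angle is about 57.26°):

import math

Gx, Gy = 9, 14
magnitude = math.hypot(Gx, Gy)                  # sqrt(9**2 + 14**2) ≈ 16.64
orientation = math.degrees(math.atan2(Gy, Gx))  # ≈ 57.26 degrees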
Creating a Histogram for Magnitude and Orientation

 On the x-axis, we will have bins for angle values: 0-9, 10-19, 20-29, and so on up to 360 (36 bins of 10 degrees each). Since our angle value is 57, it will fall into the 6th bin. The 6th bin value is incremented in proportion to the magnitude of the pixel, i.e. 16.64. We do this for all the pixels around the key point.

 This histogram will peak at some point. The bin at which we see the peak gives the orientation of the key point. Additionally, if there is another significant peak (within 80-100% of the highest peak), then another key point is generated with the same magnitude and scale as the key point used to generate the histogram, and its orientation is set to the bin containing that second peak.

Effectively, at this point the number of key points can increase slightly. A sketch of this histogram step follows.
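A sketch of this step, assuming magnitudes and orientations are NumPy arrays for the pixels around one key point (illustrative helper, not a library function):

import numpy as np

def orientation_histogram(magnitudes, orientations):
    # 36 bins of 10 degrees, each incremented in proportion to the pixel magnitude
    hist, _ = np.histogram(orientations, bins=36, range=(0, 360), weights=magnitudes)
    peak = int(np.argmax(hist))
    dominant = peak * 10                                    # orientation of the key point
    # bins within 80% of the peak spawn additional key points with that orientation
    extra = [i * 10 for i, v in enumerate(hist) if i != peak and v >= 0.8 * hist[peak]]
    return dominant, extra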
Key point Descriptor
This is the final step of SIFT. Now, we use the neighboring pixels, their orientations, and their magnitudes to generate a unique fingerprint for the key point, called a ‘descriptor’. A SIFT descriptor is a 3-D spatial histogram of the image gradients that characterizes the appearance of a key point.
 Additionally, since we use the surrounding pixels, the descriptors are partially invariant to the illumination or brightness of the images.
 We first take a 16×16 neighborhood around the key point. This 16×16 block is further divided into 4×4 sub-blocks, and for each of these sub-blocks we generate a histogram using magnitude and orientation.

At this stage, the bin size is increased and we take only 8 bins (not 36). In the figure, each arrow represents one of the 8 bins and the length of the arrow represents the magnitude. So, we will have a total of 128 bin values (16 sub-blocks × 8 bins) for every key point.
•The histograms are normalized at the end which means the descriptors will not store the
magnitudes of the gradients, only their relations to each other. This should make the
descriptors invariant against global, uniform illumination changes.

•The histogram values are also thresholded to reduce the influence of large gradients.
This will make the information partly immune to local, non-uniform changes in
illumination.
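The normalization and thresholding just described might look like the sketch below; the 0.2 clipping value is the threshold commonly quoted for SIFT and is stated here as an assumption:

import numpy as np

def normalize_descriptor(desc, clip=0.2):
    desc = desc / (np.linalg.norm(desc) + 1e-7)   # unit length: invariant to uniform illumination
    desc = np.clip(desc, 0, clip)                 # damp the influence of large gradients
    return desc / (np.linalg.norm(desc) + 1e-7)   # renormalize after clipping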
SURF (Speeded Up Robust Features) Algorithm
1. SURF is also patented (non-free functionality) and is a more ‘speeded up’ version of SIFT. Unlike SIFT, SURF approximates the Laplacian of Gaussian with a box filter.
2. SURF relies on the determinant of the Hessian matrix for both location and scale.
3. Rotation invariance is not a requisite in many applications, so skipping the orientation computation speeds up the process.
4. SURF includes several optimizations that improve speed at each step. Roughly three times faster than SIFT, SURF copes well with rotation and blurring, but it is not as robust to illumination and viewpoint changes.
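A hedged usage sketch: SURF lives in the opencv-contrib package and, being patented, typically requires a build with OPENCV_ENABLE_NONFREE; the call below assumes such a build and a placeholder image name.

import cv2

img = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)   # threshold on det(Hessian)
surf.setUpright(True)             # skip orientation assignment for extra speed (U-SURF)
kp, des = surf.detectAndCompute(img, None)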
Binary Robust Independent Elementary Features (BRIEF)
Oriented FAST and Rotated BRIEF (ORB)
ORB is a fast key point matching algorithm that can be used for one-shot recognition tasks. It is used in mobile phones and apps like Google Photos, where images are grouped according to the people who appear in them. The algorithm does not require heavy computation and does not need a GPU. It combines two algorithms, FAST and BRIEF, and works on key point matching.

ORB is a fusion of the FAST key point detector and the BRIEF descriptor, with some added features to improve performance. FAST (Features from Accelerated Segment Test) is used to detect features in the provided image. ORB also uses an image pyramid to produce multiscale features.
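A minimal ORB sketch with OpenCV: FAST key points plus rotated BRIEF descriptors, matched with Hamming distance since the descriptors are binary (image names are placeholders):

import cv2

img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)     # internally builds an image pyramid for multiscale features
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)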
Histogram of Oriented Gradients (HOG)

Some important aspects of HOG that make it different from other feature descriptors:
 The HOG descriptor focuses on the structure or the shape of an object. With plain edge features, we only identify whether a pixel is an edge or not; HOG provides the edge direction as well. This is done by extracting the gradient and orientation (magnitude and direction) of the edges.
 Additionally, these orientations are calculated in ‘localized’ portions. This means that the complete image is broken down into smaller regions and, for each region, the gradients and orientations are calculated.
 Finally, HOG generates a histogram for each of these regions separately. The histograms are created using the gradients and orientations of the pixel values, hence the name ‘Histogram of Oriented Gradients’.

The HOG feature descriptor counts the occurrences of gradient orientation in localized portions of an image.
Step 1: Preprocess the Data (64 x 128). We need to preprocess the image and bring the width-to-height ratio down to 1:2. The image size should preferably be 64 x 128. This is because we will be dividing the image into 8×8 and 16×16 patches to extract the features. Having the specified size (64 x 128) makes all our calculations pretty simple. In fact, this is the exact value used in the original paper.

Step 2: Calculating Gradients (direction x and y). Gradients are the small changes in the x and y directions.

Consider a pixel with value 85. To determine the gradient (or change) in the x-direction, we subtract the value on the left from the value on the right. Similarly, the gradient in the y-direction is the difference of the values above and below the pixel.
Hence the resultant gradients in the x and y directions for this pixel are:
 Change in X direction (Gx) = 89 − 78 = 11
 Change in Y direction (Gy) = 8 (the difference of the neighboring values in the y-direction)

The same process is repeated for all the pixels in the image.

The magnitude would be higher when there is a sharp change in intensity, such as around the edges.

Step 3: Calculate the Magnitude and Orientation

The gradients are basically the base and perpendicular here. So, for the previous example, we had Gx and Gy as 11 and 8.
Let’s apply the Pythagoras theorem to calculate the total gradient magnitude:
Total Gradient Magnitude = √(Gx² + Gy²)
Total Gradient Magnitude = √(11² + 8²) ≈ 13.6
Next, calculate the orientation (or direction) for the same pixel. We know that the tangent of the angle is:
tan(Φ) = Gy / Gx
Hence, the value of the angle would be:
Φ = atan(Gy / Gx)

The orientation comes out to be 36 when we plug in the values. So now, for every pixel value, we have the
total gradient (magnitude) and the orientation (direction).
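Before building the histogram, here is a sketch of Steps 2 and 3 applied to a whole image with OpenCV (ksize=1 gives the simple [-1, 0, 1] derivative filters; the image name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (64, 128)).astype(np.float32)     # width x height = 64 x 128

gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=1)          # gradient in x
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=1)          # gradient in y

magnitude = np.sqrt(gx ** 2 + gy ** 2)
orientation = np.degrees(np.arctan2(gy, gx)) % 180      # unsigned gradients, 0-180 degrees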
We need to generate the histogram using these gradients and orientations.

Here, we are going to take the angle or orientation on the x-axis and the frequency on the y-axis.
Consider the highlighted pixel (value 85). Since the orientation for this pixel is 36, we add to the count against angle value 36, denoting the frequency:

The same process is repeated for all the pixel values, and we end up with a frequency table that denotes angles and the
occurrence of these angles in the image.

Note that here the bin size of the histogram is 1. Hence we get about 180 different buckets, each representing an orientation value.
Method 2:
This method is similar to the previous method, except that here we have a bin size of 20. So, the number of buckets we
would get here is 9. Again, for each pixel, we will check the orientation, and store the frequency of the orientation values
in the form of a 9 x 1 matrix. Plotting this would give us the histogram:
Method 3:
The above two methods use only the orientation values to generate histograms and do not take the gradient value into
account. Here is another way in which we can generate the histogram – instead of using the frequency, we can use the
gradient magnitude to fill the values in the matrix. Below is an example of this:

Method 4:
Let’s make a small modification to the above method. Here, we split the contribution of a pixel’s gradient between the bins on either side of its orientation. Remember, the higher contribution should go to the bin whose value is closer to the orientation, as in the sketch below.
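A sketch of this split between neighboring bins, assuming 9 bins of 20 degrees with centers at 10, 30, ..., 170 (unsigned gradients); the helper name is illustrative:

import numpy as np

def add_to_histogram(hist, angle, magnitude, n_bins=9, bin_width=20):
    pos = angle / bin_width - 0.5            # position relative to the bin centers
    lo = int(np.floor(pos)) % n_bins
    hi = (lo + 1) % n_bins
    frac = pos - np.floor(pos)
    hist[lo] += magnitude * (1 - frac)       # the closer bin receives the larger share
    hist[hi] += magnitude * frac
    return hist

For the running example (angle 36, magnitude 13.6), roughly 70% of the magnitude goes to the 20-40 bin and 30% to the 40-60 bin.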
Step 4: Calculate Histogram of Gradients in 8×8 cells (9×1)
The histograms created in the HOG feature descriptor are not generated for the whole image. Instead, the
image is divided into 8×8 cells, and the histogram of oriented gradients is computed for each cell. Why do
you think this happens?
By doing so, we get the features (or histogram) for the smaller patches which in turn represent the whole
image. We can certainly change this value here from 8 x 8 to 16 x 16 or 32 x 32.
If we divide the image into 8×8 cells and generate the histograms, we will get a 9 x 1 matrix for each cell.
This matrix is generated using method 4 that we discussed in the previous section.

Step 5: Normalize gradients in 16×16 cell (36×1)

Although we already have the HOG features created for the 8×8 cells of the image, the gradients of the
image are sensitive to the overall lighting. This means that for a particular picture, some portion of the
image would be very bright as compared to the other portions.
We cannot completely eliminate this from the image. But we can reduce this lighting variation by
normalizing the gradients by taking 16×16 blocks. Here is an example that can explain how 16×16 blocks
are created:
Here, we will be combining four 8×8 cells to create a 16×16 block. And we already know that each 8×8 cell
has a 9×1 matrix for a histogram. So, we would have four 9×1 matrices or a single 36×1 matrix. To
normalize this matrix, we will divide each of these values by the square root of the sum of squares of the
values. Mathematically, for a given vector V:
V = [a1, a2, a3, …, a36]
We calculate the root of the sum of squares:
k = √(a1² + a2² + a3² + … + a36²)
And divide all the values in the vector V by this value k:
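A sketch of this normalization, where cell_histograms stands for the four 9-D histograms of one 16×16 block (illustrative name):

import numpy as np

def normalize_block(cell_histograms):
    v = np.concatenate(cell_histograms)      # 36 x 1 vector
    k = np.sqrt(np.sum(v ** 2)) + 1e-6       # root of the sum of squares (plus a small epsilon)
    return v / k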
Step 6: Features for the complete image

So far, we have created features for 16×16 blocks of the image. Now, we will combine all these to get the features for the
final image. We would first need to find out how many such 16×16 blocks would we get for a single 64×128 image.

We would have 105 (7×15) blocks of 16×16. Each of these 105 blocks has a 36×1 feature vector. Hence, the total number of features for the image would be 105 × 36 = 3780.
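The whole pipeline is available in scikit-image; the sketch below assumes a color placeholder image and mirrors the configuration above (9 bins, 8×8 cells, 2×2 cells per block), which yields a 3780-dimensional vector for a 64×128 image:

from skimage import io, color, transform
from skimage.feature import hog

img = color.rgb2gray(io.imread("person.jpg"))
img = transform.resize(img, (128, 64))               # height x width = 128 x 64

features = hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
print(features.shape)                                # (3780,)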
Could you tell the difference between two rough surfaces with your fingertips? What are the ways in which you could quantify these textures?

Texture Classification

Example: in a study of 63 patients with chronic hepatitis B/C (after adaptive filtering of speckle and nonlinear attenuation), the cirrhosis stage correlated with texture entropy; the earliest stages were the hardest to detect.
Local Binary Pattern (LBP)

The LBP image descriptor (along with a bit of machine learning) can be used to automatically classify and identify textures and patterns in images (such as the texture/pattern of wrapping paper, cake icing, or candles, for instance).

The first step in constructing the LBP texture descriptor is to convert the image to grayscale. For each pixel in the grayscale image, we select a neighborhood of size r surrounding the center pixel. An LBP value is then calculated for this center pixel and stored in an output 2D array with the same width and height as the input image.

For example, let’s take a look at the original LBP descriptor, which operates on a fixed 3 x 3 neighborhood of pixels, just like this:
In the above figure we take the center pixel (highlighted in red) and threshold it against its neighborhood of 8
pixels. If the intensity of the center pixel is greater-than-or-equal to its neighbor, then we set the value to 1;
otherwise, we set it to 0. With 8 surrounding pixels, we have a total of 2 ^ 8 = 256 possible combinations of
LBP codes.
From there, we need to calculate the LBP value for the center pixel. We can start from any neighboring
pixel and work our way clockwise or counter-clockwise, but our ordering must be kept consistent for all
pixels in our image and all images in our dataset. Given a 3 x 3 neighborhood, we thus have 8 neighbors that
we must perform a binary test on. The results of this binary test are stored in an 8-bit array, which we then
convert to decimal, like this:

In this example we start at the top-right point and work our way clockwise accumulating the binary string
as we go along. We can then convert this binary string to decimal, yielding a value of 23.
This value is stored in the output LBP 2D array, which we can then visualize below
The last step is to compute a histogram over the output LBP array. Since a 3 x 3 neighborhood has 2 ^ 8 =
256 possible patterns, our LBP 2D array thus has a minimum value of 0 and a maximum value of 255, allowing
us to construct a 256-bin histogram of LBP codes as our final feature vector:

A primary benefit of this original LBP implementation is that we can capture extremely fine-grained details in the image. However, being able to capture details at such a small scale is also the biggest drawback of the algorithm: we cannot capture details at varying scales, only at the fixed 3 x 3 scale!
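A sketch of this basic operator with scikit-image (the 'default' method reproduces the fixed 3 x 3, i.e. P=8, R=1, operator described above; the image name is a placeholder):

import numpy as np
from skimage import io, color
from skimage.feature import local_binary_pattern

gray = color.rgb2gray(io.imread("texture.jpg"))
lbp = local_binary_pattern(gray, P=8, R=1, method="default")   # 2D array of codes 0-255

# 256-bin histogram of LBP codes as the final feature vector
hist, _ = np.histogram(lbp, bins=256, range=(0, 256))
hist = hist / hist.sum()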
GLCM
The Gray-Level Co-occurrence Matrix (GLCM) method is a way of extracting second-order statistical texture features. The GLCM functions characterize the texture of an image by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in an image, creating a GLCM, and then extracting statistical measures from this matrix.
 A gray level co-occurrence matrix (GLCM) contains information about the positions of
pixels having similar gray level values.
 The Gray-Level Co-occurrence Matrix (GLCM) method is used for extracting statistical texture parameters, i.e., homogeneity, entropy, moment, contrast, brightness, and correlation.
 The GLCM is computed in the first step, while the texture features
based on the GLCM are calculated in the second step.
First-Order Statistical Texture: Histogram Analysis

Common first-order (histogram-based) features: mean, standard deviation, skewness, kurtosis, entropy, quartiles, min/max. It is often useful to mask regions of interest first.

[Figure: first-order histogram features distinguishing clear cell renal cell carcinoma from renal oncocytoma; Chen et al., 2017.]
Contrast (Moment 2 or standard deviation) is a measure of intensity or gray level variations between the reference
pixel and its neighbor. Large contrast reflects large intensity differences in GLCM:

Homogeneity measures how close the distribution of elements in the GLCM is to the diagonal of
GLCM. As homogeneity increases, the contrast, typically, decreases:

Entropy is the randomness or the degree of disorder present in the image. The value of entropy is largest when all elements of the co-occurrence matrix are equal and small when the elements are unequal.
Energy is derived from the Angular Second Moment (ASM). The ASM measures the local uniformity of the gray levels. When pixels are very similar, the ASM value will be large.

Procedure to Generate the GLCM Matrix:
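A minimal sketch of the two-step procedure with scikit-image (the function names assume scikit-image 0.19+, where they are spelled graycomatrix / graycoprops; the 4×4 image is a toy example):

import numpy as np
from skimage.feature import graycomatrix, graycoprops

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]], dtype=np.uint8)       # quantized to 4 gray levels

# Step 1: GLCM for horizontally adjacent pixel pairs (distance 1, angle 0)
glcm = graycomatrix(img, distances=[1], angles=[0], levels=4, symmetric=True, normed=True)

# Step 2: statistical texture features from the matrix
contrast = graycoprops(glcm, "contrast")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
energy = graycoprops(glcm, "energy")[0, 0]                     # square root of the ASM
entropy = -np.sum(glcm * np.log2(glcm + 1e-12))                # entropy computed directly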
