Classical Computer Vision - Session 2

TEXTURE FEATURES

Texture features are repeated patterns of local variation in image intensity. They give us information about the spatial arrangement of colors or intensities in an image. In other words, texture analysis attempts to quantify intuitive qualities described by terms such as rough, smooth, silky, or bumpy as a function of the spatial variation in pixel intensities. This will be done using binary codes, as we will see in the next slides.
Note: A texture feature can't be defined for a single point (pixel).
Images can have the same intensity distribution but different textures, as shown below:
TEXTURE FEATURES
Applications of texture analysis
● Segmenting an image into regions with the same texture (image segmentation).
● Recognizing objects based on their textures.
● Edge detection based on changes in texture (the parts where the texture changes are mostly edges).

The basic element of texture is called a texel. Texture generally has two components:
● Tone, which is the pixel intensity.
● Structure, which is the spatial relationship between the texels.
LOCAL BINARY PATTERNS (LBP)
Local binary patterns (LBPs) are texture descriptors, introduced in 2002, that work locally on parts of images. This local representation is constructed by comparing each pixel with its surrounding neighborhood of pixels.
Steps of constructing the LBP descriptor
● Convert the image to grayscale.
● Loop over each pixel and compare it with its 8 neighbours (2⁸ = 256 possibilities).
○ If the neighbour has a lower value, put 1.
○ If the neighbour has a higher value, put 0.
LOCAL BINARY PATTERNS (LBP)
● Convert the LBP code you got to a decimal value, reading the bits in a counter-clockwise manner.

● Set the output image at this pixel location to that decimal value (23 in this example), then keep looping over the other pixels and repeat the previous steps.
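As an illustration, here is a minimal NumPy sketch of the basic 3x3 LBP described above (the neighbour ordering, bit order and the grayscale input are assumptions for the example):

import numpy as np

def lbp_3x3(gray):
    # Basic 3x3 LBP: a neighbour darker than the centre contributes a 1 bit, otherwise 0.
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    # The 8 neighbour offsets; the starting point and direction are just one convention.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy, x + dx] < gray[y, x]:  # neighbour has a lower value -> 1
                    code |= 1 << bit
            out[y, x] = code
    return out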
LOCAL BINARY PATTERNS (LBP)
The output will look something like this, which is not very intuitive to us but makes sense for computer vision tasks like segmentation.
LOCAL BINARY PATTERNS (LBP)
The problem with this original LBP descriptor is that it can't capture details at varying scales (it can only sense the variability in a 3x3 neighbourhood). Two parameters are therefore introduced to account for variable neighbourhood sizes, as used in the sketch after this list:
● The number of points (P) to consider in a circular neighbourhood.
● The radius (R), which allows us to deal with different scales.
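If scikit-image is available, this extended (P, R) operator can be computed directly; a usage sketch (the placeholder input and the choice of P, R and method are assumptions):

import numpy as np
from skimage.feature import local_binary_pattern

gray = np.random.rand(128, 128)  # placeholder grayscale image
# P sample points on a circle of radius R around each pixel.
lbp_map = local_binary_pattern(gray, P=8, R=2, method="default")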

How do we get the values of the points g1, g3, g5, and g7 in figure (a)? Bilinear interpolation!
LOCAL BINARY PATTERNS (LBP)
Bilinear interpolation is a popular method for two-dimensional interpolation on a
rectangle. That is, we assume that we know the values of some unknown function at
four points that form a rectangle. Using bilinear interpolation, we can estimate this
function's value at any point (x, y) inside this rectangle. We will denote this unknown
value by P.
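A minimal sketch of the idea on a unit square, where f00, f10, f01 and f11 are the known corner values (the naming is just for this example):

def bilinear(f00, f10, f01, f11, x, y):
    # Interpolate at (x, y) with 0 <= x, y <= 1; f_ab is the known value at corner (x=a, y=b).
    return (f00 * (1 - x) * (1 - y)
            + f10 * x * (1 - y)
            + f01 * (1 - x) * y
            + f11 * x * y)

# Example: halfway along both axes the result is simply the average of the four corners,
# e.g. bilinear(1, 3, 5, 7, 0.5, 0.5) == 4.0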
INTEREST POINTS
Look at this red box and ask yourself: can this part be considered an interest point? Is it distinguishable, and does it seem unique?
INTEREST POINTS (FOR IMAGE MATCHING)
INTEREST POINTS
To find points that can tell us that one image is the same as another, we need features that are unique and distinguishable, so that we can decide whether the two images are the same based on them. These are the interest points: local features (subparts of the image) that are invariant to transformations, as shown below:
INTEREST POINTS
One of the most commonly used types of interest points, giving uniqueness and distinguishability, is corners. WHY?
INTEREST POINTS
What makes a corner a great interest point that we can match or distinguish based on?
INTEREST POINTS
CORNER DETECTION (HARRIS)
How is the detection of corners done mathematically?

(u, v) is the shift in the x and y directions.


The equation is actually measuring the sum of squared differences between the pixel intensities after and before shifting, where w(x, y) is a window function that is either rectangular (weight 0 for all pixels outside the window and 1 for all pixels inside the window we test) or Gaussian (higher weights at the center of the window and lower weights towards its sides).
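For reference, the standard form of this sum of squared differences is:

E(u, v) = \sum_{x, y} w(x, y) \left[ I(x + u, y + v) - I(x, y) \right]^{2}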

In your opinion, what value of E(u, v) indicates a corner?


CORNER DETECTION (HARRIS)
Since I(x+u, y+v) - I(x, y) measures the amount of variation between the original patch (window) and the shifted window, a larger value of E(u, v) means more variation, which means more cornerness. If E(u, v) is 0, the shift of the window doesn't change the intensities, which means it is not a corner: we are either on a flat surface or moving along an edge, as we said before.
CORNER DETECTION (HARRIS)
Using a Taylor series expansion of I(x+u, y+v), we reach the following:

If the shift (u, v) is small, then we can approximate it by removing the higher-order terms:

Note: Ix denotes the derivative of I in the x-direction, which responds to vertical edges. It can easily be computed using a Sobel filter.
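In symbols, the first-order approximation used here is:

I(x + u, y + v) \approx I(x, y) + u \, I_{x}(x, y) + v \, I_{y}(x, y)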
CORNER DETECTION (HARRIS)
This leads to the following equation of E(u, v):
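Substituting the approximation into the sum of squared differences gives the standard expression:

E(u, v) \approx \sum_{x, y} w(x, y) \left( u \, I_{x} + v \, I_{y} \right)^{2}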
CORNER DETECTION (HARRIS)
Finally we can reach our final equation:
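In matrix form, this is the standard Harris formulation:

E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix},
\qquad
M = \sum_{x, y} w(x, y) \begin{bmatrix} I_{x}^{2} & I_{x} I_{y} \\ I_{x} I_{y} & I_{y}^{2} \end{bmatrix}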

Now what we want to know is which directions give the largest or smallest values of E(u, v). Eigenvalues can tell us this: λ1 and λ2 are the two eigenvalues of M.
● If λ1 and λ2 are both small, then E(u, v) is small ⇒ flat surface.
● If λ1 >> λ2 (or vice versa), the change is in one direction only ⇒ edge.
● If λ1 and λ2 are both large and close to each other (λ1 ≈ λ2), then E(u, v) is high in all directions ⇒ corner.
CORNER DETECTION (HARRIS)
Instead of computing the eigenvalues explicitly, we can use the following measure of corner strength: R = det(M) - k · (trace(M))²

Notes:
● trace(matrix) = the sum of its diagonal elements.
● The k value is empirically between 0.04 and 0.06.
● det(M) = λ1 · λ2
● trace(M) = λ1 + λ2
How do we decide whether a point is a corner, an edge or flat? (A code sketch follows this list.)
● If R is above a threshold ⇒ corner (interest point).
● If R is negative ⇒ edge (contour).
● If R is small (around 0) ⇒ flat (uniform).
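A minimal OpenCV sketch of these rules (the image path, block size, Sobel aperture, k and the threshold fraction are assumptions for the example):

import cv2
import numpy as np

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
# R = det(M) - k * trace(M)^2 at every pixel, computed over a 2x2 window with a 3x3 Sobel.
R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = R > 0.01 * R.max()  # corner: R above a threshold
edges = R < 0                 # edge: R negative
flat = ~corners & ~edges      # flat: R small, around 0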
CORNER DETECTION (HARRIS)
R-Value where red means high and blue means low
CORNER DETECTION (HARRIS)
Thresholding if R > threshold value
CORNER DETECTION (HARRIS)
Now we can say that the Harris detector finds interest points in an image, which we can use to define pixels or parts that are unique enough to identify the image.

The output of the Harris detector is rotation invariant: R is invariant to image rotation, because the ellipse rotates but its shape (the eigenvalues) remains the same.
CORNER DETECTION (HARRIS)
Although the Harris detector is rotation invariant, it is not scale invariant. To make it scale invariant we would need to discuss blob detection, but that is not included in this course.
HAAR CASCADES

Haar cascades is an algorithm for object detection regardless of the object's location or scale in the image. It can run in real time, which makes it possible to run on video streams. The original paper focused on faces, but it can be trained on any type of object that needs to be detected.
The algorithm can be summarized in four stages:
● Calculate Haar features
● Create integral images
● Apply AdaBoost
● Implement attention cascade classifiers

Let us now dive deeper to understand each stage, assuming that we are detecting faces, as in the original example images.
HAAR CASCADES

Step1: Calculate Haar features


Before going into the calculations, you need to know that a Haar cascade, like any machine learning algorithm, needs positive and negative samples of the object you train on; it is a machine learning-based algorithm, since it adopts the idea of training as well.
Haar features are calculations performed on adjacent rectangular regions at specific locations in the detection window.
Haar features are similar to the kernels we have gone through before and to those we will see in CNNs (the difference from CNNs is that they are not trainable but manually determined).

In Haar cascades, we have about 180,000 Haar features used to find suitable features representing the objects we search for. But first, what do Haar features look like?
HAAR CASCADES

Haar features are broadly classified into three categories. The first set, of two-rectangle features, is responsible for finding edges in a horizontal or vertical direction (as shown above). The second set, of three-rectangle features, is responsible for finding a lighter region surrounded by darker regions on either side, or vice versa. The third set, of four-rectangle features, is responsible for finding changes of pixel intensity across diagonals. These are generated at different scales and aspect ratios to obtain our 180,000 Haar features.
HAAR CASCADES

These Haar features are applied iteratively, as a sliding window over the image:
HAAR CASCADES

What actually happens is that when Haar features are applied to the image, as shown in the previous slide, each feature results in a single value, calculated by subtracting the average of the pixels under the white rectangle from the average of the pixels under the black rectangle.
HAAR CASCADES

The objective of this step is to find out whether the image has an edge separating dark pixels on the right from light pixels on the left. We say that an edge is detected when the Haar value is close to 1. In the example above there is no edge, as the Haar value (-0.02) is far from 1.
This is just one particular Haar feature, detecting a vertical edge. There are other Haar features as well, which detect edges in other directions and other image structures. To detect an edge anywhere in the image, the Haar feature needs to traverse the whole image.
Sliding the Haar features over an image involves a lot of arithmetic. As we can see, a single rectangle on either side already involves 18 pixel-value additions (for a rectangle enclosing 18 pixels). Imagine doing this over the whole image with all sizes (to be explained later) of Haar features. This would be heavy even for a high-performance machine, and it is the motivation for the second step.
HAAR CASCADES

Step2: Create integral images


The integral image is an image created from the original image where each pixel is
actually the summation of all pixels above and to the left of the current pixel.
HAAR CASCADES

Can you guess the gain from using integral images? See the sketch below.

With the integral image, at most 4 constant-time operations are needed each time, for any feature size (compared with the 18 additions earlier):
Rectangle sum = bottom-right value - value to the left of the bottom-left corner - value above the top-right corner + value diagonally above-left of the top-left corner.
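A small NumPy sketch of the integral image and this four-lookup rectangle sum (the zero row/column padded on top and left is an implementation convenience assumed for this sketch):

import numpy as np

def integral_image(img):
    # Each entry holds the sum of all pixels above and to the left (inclusive);
    # a zero row/column is padded on the top/left to simplify the corner lookups.
    return np.pad(np.cumsum(np.cumsum(img, axis=0), axis=1), ((1, 0), (1, 0)))

def rect_sum(ii, top, left, bottom, right):
    # Sum of img[top:bottom+1, left:right+1] using only 4 lookups.
    return (ii[bottom + 1, right + 1] - ii[top, right + 1]
            - ii[bottom + 1, left] + ii[top, left])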
HAAR CASCADES

Step3: Apply AdaBoost


We said before that we use 180,000 Haar features to find the object, which is far too many to use directly. This is why AdaBoost is used: to reduce the 180,000 features to the top 6,000 (as proposed in the paper, but it can be any number relevant to your task) that best distinguish the object from the non-object. The 180,000 Haar features are called weak learners, as we saw before with AdaBoost (revise tree-based algorithms in ML); from these weak learners we try to build a strong learner.
How does learning take place in AdaBoost?
● Start with positive and negative image examples of the object (cropped faces in our example).
● Loop over each of the 180,000 Haar features, apply it to all our positive and negative images, and for each image classify whether it is a face or not.
● Calculate the error rate of each Haar feature by dividing the number of misclassified images by the total number of images.
HAAR CASCADES

● Choose the Haar feature with the lowest error rate and set it aside as selected (180,000 - 1 features remain). A simplified sketch of this selection step follows the notes below.
● Increase the weights of the misclassified image samples to stress correcting them in the next iteration.
● Loop again with all the remaining Haar features over all the images, repeat the error computation, and choose the next Haar feature to keep, until you reach the number of Haar features you need or the accuracy you are seeking.
Note:
● Don't forget that in AdaBoost each predictor (Haar feature) is given a weight based on its error (performance), so better Haar features have higher weights at inference.
● The weights of the data samples are different from the final weights of the predictors: sample weights affect the next weak learner to be chosen, while predictor weights affect the final output at the inference stage.
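A heavily simplified sketch of one selection round, assuming the Haar feature responses have already been computed into a features-by-images array (the variable names and the fixed per-feature thresholds are assumptions; the paper also optimizes the threshold and polarity of each weak learner):

import numpy as np

def pick_best_feature(feature_values, labels, sample_weights, thresholds):
    # feature_values: (n_features, n_images) array of Haar feature responses.
    # labels: 1 for face, 0 for non-face; sample_weights: one weight per image.
    best_f, best_err = None, np.inf
    for f in range(feature_values.shape[0]):
        preds = (feature_values[f] > thresholds[f]).astype(int)
        err = np.sum(sample_weights * (preds != labels))  # weighted error rate
        if err < best_err:
            best_f, best_err = f, err
    return best_f, best_err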
HAAR CASCADES

Step4: Implement attention cascade classifiers


The subset of 6,000 features is run again at the inference stage to detect whether a facial feature is present or not. The authors use a standard window size of 24x24 within which the feature detection runs, but this is still a very computation- and time-consuming task, and that is where the attentional cascade of classifiers comes in.
The idea behind this is that not all the features need to run on each and every window. If a feature fails on a particular window, we can say that no facial features are present there and move on to the next window, where facial features may be present.
HAAR CASCADES

Haar features are applied to the image in stages, in the following manner:
● The early stages contain simpler features, compared with the features in later stages, which are complex enough to find the nitty-gritty details of the face. If the initial stage doesn't detect anything in the window, the window is discarded from the remaining process and we move on to the next window. This saves a lot of processing time, as the irrelevant windows are not processed by the majority of the stages.
● Second-stage processing starts only when the features in the first stage are detected in the image. The process continues like this: if one stage passes, the window is passed on to the next stage; if it fails, the window is discarded.

Check the visualization in the next slide.


HAAR CASCADES

This is a simplified visualization; in reality there are many more stages than this and many more features in each stage.
HAAR CASCADES

In the paper, the authors propose a total of 38 stages for roughly 6,000 features. The numbers of features in the first five stages are 1, 10, 25, 25, and 50, and they increase in the subsequent stages.
The initial stages, with fewer and simpler features, reject most of the windows that contain no facial features while keeping the false negative ratio low, whereas the later stages, with more numerous and complex features, focus on reducing the false positive ratio.
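For completeness, this whole pipeline is available pre-trained in OpenCV; a usage sketch (the image path and the detection parameters are assumptions):

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.imread("group_photo.jpg", cv2.IMREAD_GRAYSCALE)
# scaleFactor controls the image pyramid step; minNeighbors filters out weak detections.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    print("face at", (x, y), "size", (w, h))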
HAAR CASCADES
FEATURE DESCRIPTOR
A feature descriptor is a representation of an image that simplifies it by extracting only the most useful information and throwing away everything that is not needed.
Let us take an example:
Can you tell me what you see in the two images below?
FEATURE DESCRIPTOR
Let us make it slightly harder. Now can you tell me what you see in the two images below?
FEATURE DESCRIPTOR
What is the difference between the first pair and the second pair of images?
The first pair carries much more information (colors, shapes, background, etc.), while the second pair only carries the edges, corners and shapes. Although the second pair carries less information, we were still able to say what the object in the image is, because it carries the most important information, which is sufficient to recognize the object. This is what feature descriptors are made to do.
Famous feature descriptors:
● HoG (Histogram of Gradients)
● SIFT (Scale Invariant Feature Transform)
● SURF (Speeded-Up Robust Feature)
SURF is not included in the course, but it is a successor of SIFT that uses integral images and convolution with box filters to speed up the computation.
Let us now start with the Histogram of Gradients (HoG).
HISTOGRAM OF GRADIENTS (HOG)
HoG is a feature descriptor that mainly relies on the idea of gradients (magnitudes and directions) to detect an object in an image. It focuses on representing edges in a better way, as we will show now.
Steps:
● Preprocess the image.
● Calculate the gradients.
● Calculate the magnitude and the orientation of gradients.
● Calculate the histogram of the gradients in nxn cells.
● Normalize gradients in 2n x 2n cells.
● Generate the features for the whole image.

Let us now take it step by step and understand it.


HISTOGRAM OF GRADIENTS (HOG)
Step1: Preprocess the image.
The preprocessing needed here is resizing the image to a 1:2 or 2:1 ratio between the width and the height, depending on whether the object you are searching for is landscape or portrait.
The image size would be 64x128 (the size used in the original paper), which makes the calculations in the following steps easier when we divide the image into 8x8 or 16x16 cells.
HISTOGRAM OF GRADIENTS (HOG)
Step2: Calculate the gradients.
Calculate the gradient of each pixel in the image in both the x-direction and the y-direction.

Gradient in the x-direction at the red pixel: Gx = 89 - 78 = 11

Gradient in the y-direction at the red pixel: Gy = 68 - 56 = 8
HISTOGRAM OF GRADIENTS (HOG)
This process gives us two new matrices: one storing the gradients in the x-direction and the other storing the gradients in the y-direction. The magnitude is higher where there is a sharp change in intensity, such as around edges.
Can you tell what this step resembles from what we have covered so far?
It is similar to using a Sobel or Prewitt kernel of size 1x3 or 3x1.

We have calculated the gradients in the x and y directions separately. The same process is repeated for all the pixels in the image.
HISTOGRAM OF GRADIENTS (HOG)
Step3: Calculate the magnitude and the orientation
To calculate the magnitude and orientation at each pixel, we use the Pythagorean theorem and the arctangent:
Gradient Magnitude = √(Gx² + Gy²) ⇒ √(11² + 8²) ≈ 13.6
Gradient Orientation = tan⁻¹(Gy / Gx) ⇒ tan⁻¹(8/11) ≈ 36°
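The same numbers in a short NumPy check (the pixel values are taken from the example above):

import numpy as np

Gx, Gy = 89 - 78, 68 - 56                     # 11 and 8
magnitude = np.hypot(Gx, Gy)                  # sqrt(11^2 + 8^2) ≈ 13.6
orientation = np.degrees(np.arctan2(Gy, Gx))  # ≈ 36 degrees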
Step4: Calculate the histogram of the gradients in nxn cells (5x5 in the example below)
The simplest method for generating a histogram is just counting the occurrences of each orientation value, as shown below:
HISTOGRAM OF GRADIENTS (HOG)
The process is repeated for the orientations and magnitudes of all the pixels, noting that here the bin width of the histogram is 1. Hence we get about 180 different buckets, each representing one orientation value. Another method is to create the histogram with a larger bin width. By using a bin width of 20, we get only 9 buckets, as shown below:

This gives us a 9x1 matrix instead of the 180x1 matrix we got in the previous case.
HISTOGRAM OF GRADIENTS (HOG)
As we can notice, the only value taken into consideration so far is the orientation; where is the contribution of the magnitude? We will build what is called a weighted histogram, sketched below.
Note: the larger share of the contribution goes to the bin whose value is closer to the pixel's orientation.
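A minimal sketch of such a weighted 9-bin histogram, splitting each magnitude between the two nearest bins (the exact splitting rule is an assumption in the spirit of the description above):

import numpy as np

def weighted_histogram(orientations_deg, magnitudes, n_bins=9, bin_width=20):
    hist = np.zeros(n_bins)
    for theta, mag in zip(orientations_deg, magnitudes):
        pos = (theta % 180) / bin_width      # e.g. 36 degrees -> 1.8
        lo = int(np.floor(pos)) % n_bins
        hi = (lo + 1) % n_bins
        frac = pos - np.floor(pos)
        # The bin whose value is closer to the orientation receives the larger share.
        hist[lo] += mag * (1 - frac)
        hist[hi] += mag * frac
    return hist

# Example: a pixel with orientation 36 and magnitude 13.6 contributes
# 13.6 * 0.2 to the bin for 20 and 13.6 * 0.8 to the bin for 40.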
HISTOGRAM OF GRADIENTS (HOG)
The histograms created in the HOG feature descriptor are not generated for the whole image. Instead, the image is divided into 8×8 cells, and the histogram of oriented gradients is computed for each cell. Why do you think this is done?
By doing so, we get the features (or histogram) for smaller patches, which in turn represent the whole image. We can certainly change this value from 8x8 to 16x16 or 32x32.
If we divide the image into 8×8 cells and generate the histograms, we get a 9x1 matrix for each 8x8 cell. The histograms are weighted histograms, as shown in the previous slide.
HISTOGRAM OF GRADIENTS (HOG)
Step5: Normalize gradients in 2n x 2n cells
If the cells are 8x8, we normalize the gradients over 16x16 blocks.
Why do we do this step? The gradients of an image are sensitive to the overall lighting: in a particular picture, some portions may be very bright compared with others. We cannot completely eliminate this from the image, but we can reduce the lighting variation by normalizing the gradients over 16×16 blocks.
HISTOGRAM OF GRADIENTS (HOG)
How do we normalize a vector of numbers?
Each 8×8 cell has a 9×1 histogram. So, for a 16×16 block we have four 9×1 vectors, or a single 36×1 vector. To normalize this vector, we divide each of its values by the square root of the sum of the squared values (a sketch follows).

Remember: the normalization factor is k = √(a₁² + a₂² + a₃² + … + a₃₆²), where aₙ is the nth value in the 36x1 vector of the 16x16 block we normalize over.
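A minimal sketch of this normalization (the small epsilon guarding against division by zero is an assumption):

import numpy as np

def normalize_block(cell_histograms, eps=1e-6):
    # cell_histograms: the four 9-bin histograms of one 16x16 block.
    block = np.concatenate(cell_histograms)             # 36-dimensional vector
    return block / (np.sqrt(np.sum(block ** 2)) + eps)  # divide by k = sqrt(a1^2 + ... + a36^2)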
HISTOGRAM OF GRADIENTS (HOG)
Step6: Generate the features for the whole image
Can you guess the total number of features we will have for the given image, given that the cells are 8x8 (so we normalize over 16x16 blocks) and the image size is 64x128?
We have created features for the 16×16 blocks of the image. Now we combine all of these to get the features for the whole image. We have 105 (7x15) overlapping 16×16 blocks for a single 64×128 image.
Each of these 105 blocks has a 36×1 feature vector.
The total number of features for the image is 105 x 36 = 3780 features.
HISTOGRAM OF GRADIENTS (HOG)
What happens if the initial image size increases while the cell size stays the same?
The total number of features representing the image increases as well, which captures more information about the image but also takes more time.

Now we can say that HoG describes all the edges and their orientations in the image in the form of an important feature vector: the feature descriptor.
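For reference, scikit-image provides this descriptor out of the box; a usage sketch mirroring the setup above (the placeholder input is an assumption):

import numpy as np
from skimage.feature import hog

image = np.random.rand(128, 64)  # placeholder 64x128 (width x height) grayscale image
features = hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2")
print(features.shape)            # (3780,) for a 64x128 image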
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

SIFT is our second feature descriptor; it is widely used in image search, object recognition and tracking. Where HoG focuses on describing edges, SIFT focuses on describing interest points.
Can you tell me a common element in the following pictures?
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Probably you said it is the Eiffel Tower. The keen-eyed among you will also have noticed that each image has a different background, is captured from a different angle, and (in some cases) has different objects in the foreground. It doesn't matter if the image is rotated at a strange angle or zoomed in to show only half of the tower; we naturally understand that the scale or angle of the image may change while the object remains the same.

SIFT helps us locate these local features, the interest points (key points), in different images, and we can use this descriptor as the features of our image for object detection. The major advantage of SIFT features over edge or HoG features is that they are not affected by the size or orientation of the image (they are invariant to scale, rotation and illumination changes, while also being robust to noise).
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Steps:
● Constructing scale space.
● Laplacian of Gaussian approximation (DoG).
● Finding key (interest) points.
● Eliminate edges and low contrast regions.
● Assign an orientation to the key points.
● Generate the SIFT features.

Let us now start understanding each step


SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Step1: Constructing scale space


To create a scale space, we apply Gaussian blur filters to the original image to reduce noise. The texture and minor details are removed from the image, and only the relevant information, like shapes and edges, remains.
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Actually, we don't apply the Gaussian blur once but progressively, with increasing sigmas and on different octaves.
Note: octaves are different image scales. The first octave is the original image, the second octave is half the size of the first, the third octave is half the size of the second, and so on.
Why do we want to resize the image? To make our descriptor scale-invariant. This means we will be searching for these features at multiple scales, by creating a 'scale space'.
A scale space is a collection of images at different scales (different sigmas), generated from a single image.
How many times do we need to resize the image, and how many blurred images need to be created for each resized image? The ideal number of octaves is four, and for each octave the number of blurred images is five.
Check the following slide.
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Step2: Laplacian of Gaussian approximation (DoG).


Difference of Gaussians (DoG) is an approximation of the Laplacian operation (a second-order derivative), which is computationally expensive. DoG, on the other hand, uses the scale space we created in the first step and simply subtracts each pair of consecutive Gaussian-blurred images (within the same octave) from each other; it acts as a feature enhancement step.
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

What does the output (DoG) of two Gaussians really mean?

Since we subtract the higher-sigma (more blurred) image from the lower-sigma (less blurred) image, the DoG is what appears in the less blurred image but has disappeared in the more blurred one, which is equivalent to the Laplacian operation.
Remember: the Gaussian washes out the edges, so subtracting the two images gives us the detail image (revise the sharpening part if you don't remember).
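A small OpenCV sketch of one octave of this process (the number of blur levels and the sigma schedule are assumptions in the spirit of the description above):

import cv2
import numpy as np

def octave_dog(gray, n_levels=5, sigma0=1.6, k=2 ** 0.5):
    g = gray.astype(np.float32)
    # Progressively stronger Gaussian blurs within the same octave.
    blurred = [cv2.GaussianBlur(g, (0, 0), sigma0 * k ** i) for i in range(n_levels)]
    # DoG: each less-blurred image minus the next, more-blurred one.
    return [blurred[i] - blurred[i + 1] for i in range(n_levels - 1)]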
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Step3: Finding key (interest) points.


A key point is a pixel whose value is the highest or lowest among its 26 neighbours. What are the 26 neighbours? The 8 neighbours around the pixel itself, plus the 9 pixels in each of the previous and following DoG images in the same octave (8 + 9 + 9 = 26).
Note: "scale" here means different Gaussians, not different sizes (octaves). We can only do this on the DoG levels that have a level above and a level below them.
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Step4: Eliminate edges and low contrast regions.


The key points generated in the previous step are too many, and some of them lie on edges (not unique, since edges occur a lot in any image) or don't have enough contrast. We eliminate those key points to keep only the more distinctive ones, like corners (revise the Harris detector), which are actually distinguishable and unique for the image.
- Reject points with poor contrast (DoG value smaller than 0.03 in magnitude).
- Reject edges as we did in the Harris detector, by checking whether one eigenvalue is much higher than the other.

TAKE CARE that we do this for all octaves, then take the union of the final key points from all included levels (the 2 middle levels of the 4 DoG levels) across all octaves, and place the key point locations on the original image.
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Step5: Assign an orientation to the key points

Calculate the magnitude and orientation of the pixels in an nxn neighbourhood around each key point, similarly to what we did in the HoG descriptor, using the gradient magnitude √(Gx² + Gy²) and the gradient orientation tan⁻¹(Gy / Gx). After that we build a weighted histogram (a histogram with bin width greater than 1 and weighted votes), as we did before, so we would get something like this:
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

This histogram will peak at some point. The bin at which we see the peak will be the orientation of the keypoint. Also, any peak above 80% of the highest peak is converted into a new keypoint. Each new keypoint has the same location and scale as the original but a different orientation. So orientation can split one keypoint into multiple keypoints.
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Step6: Generate the SIFT features.


So far, we have stable key points that are scale- and rotation-invariant. In this step, we use the neighbouring pixels, their orientations and their magnitudes to generate a unique fingerprint for each keypoint, called a descriptor.
This is done by taking a 16×16 window around the keypoint and breaking it into sixteen 4×4 windows.
SCALE INVARIANT FEATURE TRANSFORMATION (SIFT)

Within each 4×4 window, gradient magnitudes and orientations are calculated and put into an 8-bin histogram. Doing this for all sixteen 4×4 regions gives 16 x 8 = 128 numbers. Once you have all 128 numbers, you normalize them. These 128 numbers form the "feature vector"; the keypoint is uniquely identified by it. We will have one feature vector (descriptor) for each key point in the image.

Note:
The higher gradient magnitudes tend to lie around the key point (this is why it was identified as a key point in the first place), which means that the pixels in each 4x4 window that are closer to the key point contribute more to the histogram (feature vector), which is what we want.
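In practice, the whole SIFT pipeline is available in OpenCV; a usage sketch (the image path is an assumption, and SIFT_create requires a reasonably recent OpenCV build):

import cv2

sift = cv2.SIFT_create()
gray = cv2.imread("eiffel.jpg", cv2.IMREAD_GRAYSCALE)
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(len(keypoints), "keypoints, descriptor matrix shape:", descriptors.shape)  # (N, 128)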

You might also like