SIFT Transform
Let the original image be A. Let the new image be B. You move the mask all over the image
A. Let us suppose the mask is centered at location (i,j) in image A. You compute the point-
wise product between the mask entries and the corresponding entries in A. You store the
sum of these products in B(i,j).
B(i, j) = \sum_{a=-r}^{r} \sum_{b=-r}^{r} A(i+a,\, j+b)\, M(a, b)
Whenever you perform a filtering operation on an image, the resultant image is obtained
by convolving the original image with the filter, and is said to be the response to the filter.
For further details, refer to Section 3.4 of the book on Digital Image Processing by Gonzalez,
or see the animation at: http://en.wikipedia.org/wiki/Convolution
Step 1: Approximate keypoint location
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)
D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)
(Between octaves, the smoothed image is down-sampled by a factor of 2.)
Octave = doubling of σ0. Within an octave, adjacent scales differ by a constant factor k. If an octave contains s + 1 images, then k = 2^(1/s). The first image has scale σ0, the second image has scale kσ0, the third image has scale k^2 σ0, and the last image has scale k^s σ0 = 2σ0. Such a sequence of images convolved with Gaussians of increasing σ constitutes a so-called scale space.
[Figure: the same image smoothed at scales 0, 1, and 4]
http://en.wikipedia.org/wiki/Scale_space
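A hedged sketch of building one octave of this scale space, assuming scipy.ndimage.gaussian_filter for the smoothing; the base scale sigma0 = 1.6 and s = 3 are illustrative defaults, not values fixed by the slide:

# Build one octave: scales sigma0, k*sigma0, ..., k^s*sigma0 = 2*sigma0 with k = 2^(1/s).
import numpy as np
from scipy.ndimage import gaussian_filter

def build_octave(I, sigma0=1.6, s=3):
    k = 2 ** (1.0 / s)                                   # adjacent scales differ by the factor k
    scales = [sigma0 * (k ** p) for p in range(s + 1)]
    L = [gaussian_filter(I.astype(float), sigma) for sigma in scales]
    return np.stack(L), scales

I = np.random.rand(128, 128)
octave, scales = build_octave(I)
# Down-sample by 2 to start the next octave (its base scale is doubled)
I_next = octave[-1][::2, ::2]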
Step 1: Approximate keypoint location
Image taken from D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", IJCV 2004
http://upload.wikimedia.org/wikipedia/commons/4/44/Sift_keypoints_filtering.jpg
Step 2: Refining keypoint location
• The keypoint location and scale are discrete – we can interpolate for greater accuracy.
• For this, we express the DoG function in a small 3D neighborhood around a keypoint (x_i, y_i, σ_i) by a second-order Taylor series:
D(x, y, \sigma) \approx D(x_i, y_i, \sigma_i)
+ \left( \frac{\partial D}{\partial (x, y, \sigma)} \right)^T
\begin{bmatrix} x - x_i \\ y - y_i \\ \sigma - \sigma_i \end{bmatrix}
+ \frac{1}{2}
\begin{bmatrix} x - x_i \\ y - y_i \\ \sigma - \sigma_i \end{bmatrix}^T
\frac{\partial^2 D}{\partial (x, y, \sigma)^2}
\begin{bmatrix} x - x_i \\ y - y_i \\ \sigma - \sigma_i \end{bmatrix}

Here \partial D / \partial (x, y, \sigma) is the gradient vector evaluated digitally at the keypoint, and \partial^2 D / \partial (x, y, \sigma)^2 is the 3 x 3 Hessian matrix, also evaluated digitally at the keypoint.
Step 2: Refining keypoint location
• To find an extremum of the DoG values in this
neighborhood, set the derivative of D(.) to 0. This gives us:
\begin{bmatrix} \hat{x} \\ \hat{y} \\ \hat{\sigma} \end{bmatrix}
= - \left( \frac{\partial^2 D}{\partial (x, y, \sigma)^2} \right)^{-1}
\frac{\partial D}{\partial (x, y, \sigma)}

D_{\text{extremal}} = D(x_i, y_i, \sigma_i)
+ \frac{1}{2} \left( \frac{\partial D}{\partial (x, y, \sigma)} \right)^T
\begin{bmatrix} \hat{x} \\ \hat{y} \\ \hat{\sigma} \end{bmatrix}

where (\hat{x}, \hat{y}, \hat{\sigma}) is the offset of the interpolated extremum from the discrete keypoint (x_i, y_i, \sigma_i).
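The refinement step amounts to estimating the gradient and the 3 x 3 Hessian of D by finite differences and solving a 3 x 3 linear system. The sketch below assumes D is stored as a 3-D NumPy array indexed as D[x, y, scale_index]; the function name refine_keypoint is illustrative:

# Sub-pixel / sub-scale refinement: fit a quadratic to D and solve grad(D) + H * offset = 0.
import numpy as np

def refine_keypoint(D, x, y, s):
    # gradient of D at (x, y, s) by central differences
    g = 0.5 * np.array([D[x+1, y, s] - D[x-1, y, s],
                        D[x, y+1, s] - D[x, y-1, s],
                        D[x, y, s+1] - D[x, y, s-1]])
    # 3x3 Hessian by finite differences
    dxx = D[x+1, y, s] - 2*D[x, y, s] + D[x-1, y, s]
    dyy = D[x, y+1, s] - 2*D[x, y, s] + D[x, y-1, s]
    dss = D[x, y, s+1] - 2*D[x, y, s] + D[x, y, s-1]
    dxy = 0.25 * (D[x+1, y+1, s] - D[x+1, y-1, s] - D[x-1, y+1, s] + D[x-1, y-1, s])
    dxs = 0.25 * (D[x+1, y, s+1] - D[x+1, y, s-1] - D[x-1, y, s+1] + D[x-1, y, s-1])
    dys = 0.25 * (D[x, y+1, s+1] - D[x, y+1, s-1] - D[x, y-1, s+1] + D[x, y-1, s-1])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    offset = -np.linalg.solve(H, g)              # (x_hat, y_hat, sigma_hat) relative to (x, y, s)
    D_extremal = D[x, y, s] + 0.5 * g.dot(offset)
    return offset, D_extremal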
Step 2: Refining keypoint location
• Some keypoints reside on edges, as edges
always give a high response to a DoG filter.
• But edges should not be considered salient
points (why?).
• So we discard points that lie on edges.
• In the case of the KLT tracker, we saw how to detect points lying on edges using the structure tensor.
Step 2: Refining keypoint location
• The SIFT paper uses the 2 x 2 matrix of second derivatives of D in x and y (called the Hessian matrix):
H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}
http://en.wikipedia.org/wiki/Principal_curvature
Step 2: Refining keypoint location
• An edge will have high maximal curvature, but
very low minimal curvature.
• A keypoint which is a corner (not an edge) will
have high maximal and minimal curvature.
• The following can be regarded as an edge-ness measure: the ratio of the maximal to the minimal principal curvature (equivalently, the ratio of the eigenvalues of H). It should be less than a threshold (say 10). The SIFT paper tests the equivalent condition Tr(H)^2 / Det(H) < (r + 1)^2 / r with r = 10, which avoids computing the eigenvalues explicitly.
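A sketch of this edge-ness test, using the trace/determinant form from the SIFT paper so that the eigenvalues never have to be computed; the array layout D[x, y, scale_index] and the function name is_edge_like are assumptions of this sketch:

# Reject keypoints whose ratio of principal curvatures exceeds r_thresh (10 per the slide).
import numpy as np

def is_edge_like(D, x, y, s, r_thresh=10.0):
    dxx = D[x+1, y, s] - 2*D[x, y, s] + D[x-1, y, s]
    dyy = D[x, y+1, s] - 2*D[x, y, s] + D[x, y-1, s]
    dxy = 0.25 * (D[x+1, y+1, s] - D[x+1, y-1, s] - D[x-1, y+1, s] + D[x-1, y-1, s])
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                                  # curvatures of opposite sign: reject
        return True
    # Tr(H)^2 / Det(H) grows with the curvature ratio; keep only if below (r+1)^2 / r
    return tr * tr / det >= (r_thresh + 1) ** 2 / r_thresh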
Step 3: Assigning orientations
• Compute the gradient magnitudes and
orientations in a small window around the
keypoint – at the appropriate scale.
[Figure: histogram of gradient orientations. The bin counts are weighted by gradient magnitudes and a Gaussian weighting function. Usually, 36 bins are chosen for the orientation.]
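A sketch of this 36-bin orientation histogram, with gradients taken by central differences from the Gaussian-smoothed image L at the keypoint's scale; the window radius and the Gaussian weighting width (1.5 x scale) are conventional choices assumed here, not values stated in the slide:

# Magnitude- and Gaussian-weighted orientation histogram around a keypoint.
import numpy as np

def orientation_histogram(L, x, y, scale, n_bins=36, radius=8):
    hist = np.zeros(n_bins)
    for i in range(-radius, radius + 1):
        for j in range(-radius, radius + 1):
            xi, yj = x + i, y + j
            if not (0 < xi < L.shape[0] - 1 and 0 < yj < L.shape[1] - 1):
                continue
            gx = L[xi+1, yj] - L[xi-1, yj]
            gy = L[xi, yj+1] - L[xi, yj-1]
            mag = np.hypot(gx, gy)
            theta = np.arctan2(gy, gx) % (2*np.pi)            # orientation in [0, 2*pi)
            w = np.exp(-(i*i + j*j) / (2 * (1.5*scale)**2))   # Gaussian weighting
            hist[int(theta / (2*np.pi) * n_bins) % n_bins] += w * mag
    return hist

# The dominant orientation is the peak; bins above 0.8 * peak spawn extra descriptors:
# peak_bins = np.where(hist >= 0.8 * hist.max())[0]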
Step 3: Assigning orientations
• Assign the dominant orientation as the
orientation of the keypoint.
• In case of multiple peaks, or histogram entries greater than 0.8 times the peak value, create a separate descriptor for each such orientation (they will all have the same scale and location).
Step 4: Descriptors for each keypoint
• Consider a small region around the keypoint. Divide it into n x n cells (n = 2 in the figure below; the SIFT paper typically uses n = 4). Each cell is of size 4 x 4 pixels.
• Build a gradient orientation histogram in each cell. Each histogram entry is weighted by the gradient magnitude and by a Gaussian weighting function with σ = 0.5 times the window width.
• Shift each gradient orientation histogram so that orientations are measured relative to the dominant orientation of the keypoint (assigned in step 3).
Image taken from D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", IJCV 2004
Step 4: Descriptors for each keypoint
• We now have a descriptor of size r·n^2 if there are r bins in the orientation histogram.
• Typical case used in the SIFT paper: r = 8, n = 4, so the length of each descriptor is 128.
• The descriptor is invariant to rotations because the histograms are expressed relative to the keypoint's dominant orientation.
Image taken from D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", IJCV 2004
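A compact sketch of this descriptor construction under the parameters quoted above (n x n cells of 4 x 4 pixels, r orientation bins, Gaussian σ = 0.5 x window width, orientations measured relative to the dominant orientation). For simplicity the sketch shifts only the gradient orientations and does not rotate the sampling grid itself, so it is an approximation of the procedure rather than the exact one in the SIFT paper:

# n x n cells, r orientation bins per cell, orientations relative to the dominant orientation.
import numpy as np

def sift_descriptor(L, x, y, dominant_theta, n=4, r=8, cell=4):
    half = n * cell // 2                     # half the window width
    sigma = 0.5 * (n * cell)                 # Gaussian weighting over the window (0.5 x width)
    desc = np.zeros((n, n, r))
    for i in range(-half, half):
        for j in range(-half, half):
            xi, yj = x + i, y + j
            if not (0 < xi < L.shape[0] - 1 and 0 < yj < L.shape[1] - 1):
                continue
            gx = L[xi+1, yj] - L[xi-1, yj]
            gy = L[xi, yj+1] - L[xi, yj-1]
            mag = np.hypot(gx, gy)
            theta = (np.arctan2(gy, gx) - dominant_theta) % (2*np.pi)  # relative orientation
            w = np.exp(-(i*i + j*j) / (2 * sigma * sigma))
            ci, cj = (i + half) // cell, (j + half) // cell            # which cell
            b = int(theta / (2*np.pi) * r) % r                         # which orientation bin
            desc[ci, cj, b] += w * mag
    desc = desc.ravel()                      # length r * n^2 (= 128 for r = 8, n = 4)
    return desc / (np.linalg.norm(desc) + 1e-12)   # unit norm for illumination invariance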
Step 4: Descriptors for each keypoint
• For scale-invariance, the size of the window should be adjusted according to the scale of the keypoint. Larger scale = larger window.
http://www.vlfeat.org/overview/sift.html
Step 4: Descriptors for each keypoint
• The SIFT descriptor (so far) is not illumination invariant –
the histogram entries are weighted by gradient
magnitude.
• Hence the descriptor vector is normalized to unit magnitude. This removes the effect of scalar multiplicative intensity changes.
• Scalar additive changes don't matter – gradients are invariant to constant offsets anyway.
• The descriptor remains sensitive to non-linear illumination changes.
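A tiny numerical check of the multiplicative-invariance claim: scaling the image intensities by a constant c scales every gradient magnitude, and hence the raw descriptor, by the same c, which unit-norm normalization removes. The vector d below stands in for any raw descriptor:

import numpy as np

d = np.random.rand(128)          # raw (unnormalized) descriptor
c = 3.7                          # scalar multiplicative intensity change
d1 = d / np.linalg.norm(d)
d2 = (c * d) / np.linalg.norm(c * d)
assert np.allclose(d1, d2)       # normalized descriptors are identical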
Step 1: Keypoint computations?
(More details)
• Why are DoG filters used?
• The Gaussian filter (and its derivatives) is shown to be the only filter that obeys all of the following:
  – Linearity
  – Shift-invariance
  – Structures at coarser scales are related to structures at finer scales in a consistent way (the smoothing process does not produce new structures)
  – Rotational symmetry
  – Semi-group property: G(x, y, \sigma_1) * G(x, y, \sigma_2) = G(x, y, \sqrt{\sigma_1^2 + \sigma_2^2})
  – + some other properties
http://en.wikipedia.org/wiki/Scale_space
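The semi-group property can be checked numerically; the sketch below uses scipy.ndimage.gaussian_filter, and the residual is small but nonzero because of discretization and boundary effects:

# Smoothing with sigma1 then sigma2 equals one smoothing with sqrt(sigma1^2 + sigma2^2).
import numpy as np
from scipy.ndimage import gaussian_filter

I = np.random.rand(128, 128)
sigma1, sigma2 = 1.2, 2.0
twice = gaussian_filter(gaussian_filter(I, sigma1), sigma2)
once = gaussian_filter(I, np.sqrt(sigma1**2 + sigma2**2))
print(np.max(np.abs(twice - once)))   # small (up to discretization/boundary error)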
Step 1: Keypoint computations?
(More details)
• Why difference of Gaussians?
  – The DoG is an approximation to the scale-normalized (σ²-multiplied) Laplacian of Gaussian (LoG) filter – a rotationally invariant filter.
  – The DoG is a good model for how neurons in the retina extract image details to be sent to the brain for processing.
A LoG filter of scale σ produces strong responses for patterns of radius σ * sqrt(2).
http://www.cs.utexas.edu/~grauman/courses/spring2011/slides/lecture14_localfeats.pdf
Keypoint = center of blob
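A sketch of the DoG-vs-LoG relationship stated above: G(x, y, kσ) * I - G(x, y, σ) * I is approximately (k - 1) σ² times the Laplacian of Gaussian response, i.e. the scale-normalized LoG. The image and constants below are illustrative; scipy.ndimage provides both filters:

# Compare the DoG to the scale-normalized LoG on an arbitrary image.
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

I = np.random.rand(128, 128)
sigma, k = 2.0, 2 ** (1.0 / 3)
dog = gaussian_filter(I, k * sigma) - gaussian_filter(I, sigma)
log_norm = (k - 1) * sigma**2 * gaussian_laplace(I, sigma)   # scale-normalized LoG
print(np.corrcoef(dog.ravel(), log_norm.ravel())[0, 1])      # close to 1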