SIFT Detector and Descriptor (Scale Invariant Feature Transform)

1. The SIFT detector and descriptor were developed by David Lowe to detect and describe local features in images that are invariant to changes in scale, rotation, and illumination.
2. SIFT extracts distinctive invariant features from images based on location and scale, orientation, and local image gradients.
3. Keypoints are filtered and described by histograms of local image gradient orientations to provide features that can be matched between different images.

Uploaded by

Nguyen Viet Anh

The SIFT (Scale Invariant Feature Transform) Detector and Descriptor


developed by David Lowe
University of British Columbia
Initial paper ICCV 1999
Newer journal paper IJCV 2004

Review: Matt Brown's Canonical Frames

11/1/2010

Multi-Scale Oriented Patches

Extract oriented patches at multiple scales

[ Brown, Szeliski, Winder CVPR 2005 ]

Application: Image Stitching

[ Microsoft Digital Image Pro version 10 ]

Ideas from Matt's Multi-Scale Oriented Patches

1. Detect an interesting patch with an interest operator. Patches are translation invariant.
2. Determine its dominant orientation.
3. Rotate the patch so that the dominant orientation points upward. This makes the patches rotation invariant.
4. Do this at multiple scales, converting them all to one scale through sampling.
5. Convert to illumination invariant form.

Implementation Concern:
How do you rotate a patch?

Start with an empty patch whose dominant direction is up.
For each pixel in your patch, compute the position in the detected image patch. It will be in floating point and will fall between the image pixels.
Interpolate the values of the 4 closest pixels in the image, to get a value for the pixel in your patch.

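Rotating a Patch

[Figure: a transform T maps each pixel (x, y) of the empty canonical patch to a position (x', y') in the patch detected in the image]

Counterclockwise rotation by angle $\theta$:

$x' = x\cos\theta - y\sin\theta$
$y' = x\sin\theta + y\cos\theta$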

Using Bilinear Interpolation

Use all 4 adjacent samples: I00, I10, I01, I11
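The patch-rotation procedure and the 4-sample interpolation can be sketched in a few lines of Python. This is a minimal illustration, not the slides' implementation: the function names `bilinear` and `extract_rotated_patch` are hypothetical, the image is assumed to be a list of rows of floats (at least 2 x 2), and I10 is assumed to be the neighbour one step in x and I01 the neighbour one step in y.

```python
import math

def bilinear(image, x, y):
    """Sample image[row][col] at a real-valued position (x, y)."""
    h, w = len(image), len(image[0])
    # Clamp the position so the 2 x 2 neighbourhood stays inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0 = min(int(x), w - 2)
    y0 = min(int(y), h - 2)
    a, b = x - x0, y - y0              # fractional offsets in x and y
    i00 = image[y0][x0]
    i10 = image[y0][x0 + 1]            # assumed naming: one step in x
    i01 = image[y0 + 1][x0]            # assumed naming: one step in y
    i11 = image[y0 + 1][x0 + 1]
    return ((1 - a) * (1 - b) * i00 + a * (1 - b) * i10
            + (1 - a) * b * i01 + a * b * i11)

def extract_rotated_patch(image, cx, cy, theta, size):
    """Fill an empty size x size canonical patch centred on (cx, cy).

    Each canonical pixel (x, y) is rotated by theta to find its position
    (x', y') in the image, and the value there is interpolated.
    """
    half = (size - 1) / 2.0
    c, s = math.cos(theta), math.sin(theta)
    patch = []
    for row in range(size):
        out_row = []
        for col in range(size):
            x, y = col - half, row - half
            xr = x * c - y * s         # x' = x cos(theta) - y sin(theta)
            yr = x * s + y * c         # y' = x sin(theta) + y cos(theta)
            out_row.append(bilinear(image, cx + xr, cy + yr))
        patch.append(out_row)
    return patch
```

Iterating over the empty canonical patch and looking up values in the image, rather than pushing image pixels into the patch, guarantees that every patch pixel receives exactly one value.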

SIFT: Motivation

The Harris operator is not invariant to scale, and correlation is not invariant to rotation [1].

For better image matching, Lowe's goal was to develop an interest operator that is invariant to scale and rotation.

Also, Lowe aimed to create a descriptor that was robust to the variations corresponding to typical viewing conditions. The descriptor is the most-used part of SIFT.

[1] But Schmid and Mohr developed a rotation invariant descriptor for it in 1997.

Idea of SIFT
Image content is transformed into local feature
coordinates that are invariant to translation, rotation,
scale, and other imaging parameters

SIFT Features

Claimed Advantages of SIFT

Locality: features are local, so robust to occlusion and clutter (no prior segmentation)

Distinctiveness: individual features can be matched to a large database of objects

Quantity: many features can be generated for even small objects

Efficiency: close to real-time performance

Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness


Overall Procedure at a High Level


1. Scale-space extrema detection
Search over multiple scales and image locations.

2. Keypoint localization
Fit a model to determine location and scale.
Select keypoints based on a measure of stability.

3. Orientation assignment
Compute best orientation(s) for each keypoint region.

4. Keypoint description
Use local image gradients at selected scale and rotation
to describe each keypoint region.

1. Scale-space extrema detection

Goal: Identify locations and scales that can be repeatably assigned under different views of the same scene or object.
Method: search for stable features across multiple scales using a continuous function of scale.
Prior work has shown that under a variety of assumptions, the best function is a Gaussian function.
The scale space of an image is a function $L(x, y, \sigma)$ that is produced from the convolution of a Gaussian kernel (at different scales $\sigma$) with the input image.
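As a rough illustration of the scale space defined above, the sketch below blurs one image with Gaussians of several widths. It uses only the standard library; the function names, the kernel radius of 3 sigma, and the replicated-border handling are assumptions made for this example and are not taken from the slides.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel, truncated at about 3 sigma (assumed)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
                 for k in range(-radius, radius + 1))
             for x in range(width)]
            for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """2D Gaussian blur as two 1D passes (the Gaussian is separable)."""
    kernel = gaussian_kernel(sigma)
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))

def scale_space(image, sigmas):
    """One blurred copy of the image per sigma, i.e. samples of L(x, y, sigma)."""
    return [gaussian_blur(image, sigma) for sigma in sigmas]
```

Calling `scale_space(image, [1.0, 2.0, 4.0])` on a list-of-rows image returns three progressively smoother copies of the same size.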


Aside: Image Pyramids


Bottom level is the original image.
2nd level is derived from the original image according to some function.
3rd level is derived from the 2nd level according to the same function.
And so on.


Aside: Mean Pyramid


Bottom level is the original image.
At 2nd level, each pixel is the mean of 4 pixels in the original image.
At 3rd level, each pixel is the mean of 4 pixels in the 2nd level.
And so on.
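A mean pyramid is short enough to write out in full. The sketch below assumes the image is a list of rows with even width and height at every level that gets reduced; the names `mean_reduce` and `mean_pyramid` are hypothetical.

```python
def mean_reduce(image):
    """Halve each dimension: every output pixel is the mean of a 2 x 2 block."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)]

def mean_pyramid(image):
    """Levels from the original image (bottom) up to the smallest level (top)."""
    levels = [image]
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(mean_reduce(levels[-1]))
    return levels
```

For a 4 x 4 input this gives three levels of size 4 x 4, 2 x 2, and 1 x 1.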


Aside: Gaussian Pyramid


Bottom level is the original image.
At 2nd level, each pixel is the result of applying a Gaussian mask to the first level and then subsampling to reduce the size.
And so on.
At each level, the image is smoothed (a Gaussian filter is applied) and reduced in size.
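The smooth-then-subsample step can be sketched as follows. The 5-tap binomial kernel [1, 4, 6, 4, 1]/16 is used here as a common stand-in for a small Gaussian mask; that choice, the replicated borders, and the function names are assumptions for illustration, since the slides do not specify the mask.

```python
# Binomial approximation to a Gaussian mask (an assumed choice).
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]

def smooth_1d(values):
    """Filter one row or column, replicating the border samples."""
    n = len(values)
    return [sum(KERNEL[k + 2] * values[min(max(i + k, 0), n - 1)]
                for k in range(-2, 3))
            for i in range(n)]

def smooth(image):
    """Separable 2D smoothing: filter the rows, then the columns."""
    rows = [smooth_1d(row) for row in image]
    cols = [smooth_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def reduce_level(image):
    """Smooth first, then keep every second pixel in each direction."""
    smoothed = smooth(image)
    return [row[::2] for row in smoothed[::2]]

def gaussian_pyramid(image, levels):
    """Return `levels` images, starting with the original at the bottom."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid
```

Smoothing before subsampling is what distinguishes this from simply dropping pixels: it removes detail that the smaller image cannot represent.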


Example: Subsampling with Gaussian pre-filtering

[Figure: Gaussian pre-filtered image subsampled to 1/2, 1/4, and 1/8 size]


Lowe's Scale-space Interest Points

Laplacian of Gaussian kernel
Scale normalised (multiplied by $\sigma^2$, the square of the scale)
Proposed by Lindeberg

Scale-space detection
Find local maxima across scale/space
A good blob detector

[ T. Lindeberg IJCV 1998 ]

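Lowe's Scale-space Interest Points: Difference of Gaussians

Gaussian is an ad hoc solution of heat diffusion equation: $\partial G / \partial \sigma = \sigma \nabla^2 G$

Hence the difference of Gaussians approximates the scale-normalised Laplacian of Gaussian: $G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G$

k is not necessarily very small in practice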
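A quick numeric check shows how closely a difference of two Gaussians tracks the scale-normalised Laplacian of Gaussian, $(k - 1)\,\sigma^2 \nabla^2 G$, which is the approximation Lowe's IJCV 2004 paper derives from the heat diffusion equation. The closed-form 2D Gaussian and its Laplacian are standard; the function names are hypothetical and the sample values of k are chosen only for illustration.

```python
import math

def gaussian(x, y, sigma):
    """2D Gaussian G(x, y, sigma) with unit integral."""
    return (math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            / (2 * math.pi * sigma ** 2))

def laplacian_of_gaussian(x, y, sigma):
    """Closed-form Laplacian of the 2D Gaussian."""
    r2 = x * x + y * y
    return (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def difference_of_gaussians(x, y, sigma, k):
    """G at scale k*sigma minus G at scale sigma."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def scaled_log(x, y, sigma, k):
    """(k - 1) * sigma^2 * Laplacian of Gaussian."""
    return (k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)

if __name__ == "__main__":
    for k in (1.01, 2 ** (1 / 3)):
        d = difference_of_gaussians(0.0, 0.0, 1.0, k)
        a = scaled_log(0.0, 0.0, 1.0, k)
        print(f"k={k:.3f}  DoG={d:.5f}  (k-1)*sigma^2*LoG={a:.5f}")
```

With k close to 1 the two values nearly coincide; with a larger k the agreement is looser but the sign and overall shape are preserved, which is why a k that is not very small still works for locating extrema.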

Lowe's Pyramid Scheme

Scale space is separated into octaves:
Octave 1 uses scale $\sigma$
Octave 2 uses scale $2\sigma$
etc.
In each octave, the initial image is repeatedly convolved with Gaussians to produce a set of scale space images.
Adjacent Gaussians are subtracted to produce the DOG.
After each octave, the Gaussian image is down-sampled by a factor of 2 to produce an image 1/4 the size to start the next level.

Lowe's Pyramid Scheme

s+2 filters:
$\sigma_{s+1} = 2^{(s+1)/s}\,\sigma_0$
...
$\sigma_i = 2^{i/s}\,\sigma_0$
...
$\sigma_2 = 2^{2/s}\,\sigma_0$
$\sigma_1 = 2^{1/s}\,\sigma_0$
$\sigma_0$

s+3 images, including the original
s+2 difference images

The parameter s determines the number of images per octave.
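The bookkeeping for one octave follows directly from the formula $\sigma_i = 2^{i/s}\sigma_0$. The sketch below lists the filter widths and the resulting image counts; the function names are hypothetical, and the base scale of 1.6 in the usage note is only an example value, not something stated on the slide.

```python
def octave_sigmas(sigma0, s):
    """Filter widths sigma_i = 2**(i/s) * sigma0 for i = 0 .. s+1 (s+2 filters)."""
    return [sigma0 * 2.0 ** (i / s) for i in range(s + 2)]

def octave_counts(s):
    """Return (images, difference images, planes searched) for one octave."""
    filters = s + 2
    images = filters + 1           # filtered images plus the original
    differences = images - 1       # each adjacent pair is subtracted
    searched = differences - 2     # top and bottom planes lack a neighbour
    return images, differences, searched
```

For example, `octave_sigmas(1.6, 3)` yields five widths, and the entry at index s is exactly twice the base scale, which is where the next octave starts.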

Keypoint localization

Detect maxima and minima of difference-of-Gaussian in scale space.
Each point is compared to its 8 neighbors in the current image and 9 neighbors each in the scales above and below.

s+2 difference images: top and bottom ignored, s planes searched.

[Figure: pyramid construction steps: Blur, Subtract, Resample]

For each max or min found, output is the location and the scale.
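The 26-neighbour comparison can be sketched as below. The difference-of-Gaussian stack is assumed to be a list of equally sized planes indexed as `dog[scale][row][col]`, and the function names are hypothetical. A strict comparison is used, so a point tied with a neighbour is not reported.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is larger or smaller than all 26 neighbours."""
    value = dog[s][y][x]
    neighbours = [dog[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1)
                  for dy in (-1, 0, 1)
                  for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    return value > max(neighbours) or value < min(neighbours)

def find_extrema(dog):
    """Scan the interior planes and pixels; return (x, y, scale index) tuples."""
    found = []
    for s in range(1, len(dog) - 1):              # top and bottom planes ignored
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][0]) - 1):
                if is_extremum(dog, s, y, x):
                    found.append((x, y, s))
    return found
```

With s+2 difference images, the loop over `s` visits exactly s planes, matching the count on the slide.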

Scale-space extrema detection: experimental results over 32 images that were synthetically transformed and noise added.

[Plots: Stability (% detected, % correctly matched) and Expense (average no. detected, average no. matched)]

Sampling in scale for efficiency

How many scales should be used per octave? S=?

More scales evaluated, more keypoints found
S < 3, stable keypoints increased too
S > 3, stable keypoints decreased
S = 3, maximum stable keypoints found

Keypoint localization

Once a keypoint candidate is found, perform a detailed fit to nearby data to determine location, scale, and ratio of principal curvatures.

In initial work keypoints were found at location and scale of a central sample point.
In newer work, they fit a 3D quadratic function to improve interpolation accuracy.
The Hessian matrix was used to eliminate edge responses.
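For reference, the 3D quadratic fit in Lowe's IJCV 2004 paper is a second-order Taylor expansion of the difference-of-Gaussian function $D$ around the sample point, where $\mathbf{x} = (x, y, \sigma)^T$ is the offset from that point. The notation below follows that paper rather than these slides; the first expression is the fitted function, the second is the offset of the interpolated extremum, and the third is the function value there.

```latex
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x}
  + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x},
\qquad
\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1}
  \frac{\partial D}{\partial \mathbf{x}},
\qquad
D(\hat{\mathbf{x}}) = D + \frac{1}{2}\, \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}
```

The derivatives are evaluated at the sample point, in practice from differences of neighbouring samples in the difference-of-Gaussian stack.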


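Eliminating the Edge Response

Reject flats: $|D(\hat{\mathbf{x}})| < 0.03$

Reject edges:
Let $\alpha$ be the eigenvalue of the Hessian with larger magnitude and $\beta$ the smaller.
Let $r = \alpha/\beta$, so $\alpha = r\beta$.
Then $\mathrm{Tr}(H)^2 / \mathrm{Det}(H) = (\alpha + \beta)^2 / (\alpha\beta) = (r\beta + \beta)^2 / (r\beta^2) = (r+1)^2 / r$.
Keep keypoints with $r < 10$.

What does this look like?
$(r+1)^2/r$ is at a min when the 2 eigenvalues are equal.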
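Because $(r+1)^2/r$ grows with r, the edge test can be applied without computing eigenvalues, using only the trace and determinant of the 2 x 2 Hessian. The sketch below estimates the Hessian by finite differences on one difference-of-Gaussian plane; the function names and the finite-difference stencil are assumptions made for this example.

```python
def hessian_2x2(plane, y, x):
    """Finite-difference second derivatives of plane[row][col] at (x, y)."""
    dxx = plane[y][x + 1] - 2 * plane[y][x] + plane[y][x - 1]
    dyy = plane[y + 1][x] - 2 * plane[y][x] + plane[y - 1][x]
    dxy = (plane[y + 1][x + 1] - plane[y + 1][x - 1]
           - plane[y - 1][x + 1] + plane[y - 1][x - 1]) / 4.0
    return dxx, dyy, dxy

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the point if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or one of them zero): not a stable blob.
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

For r = 10 the threshold is 12.1; a round blob with equal curvatures gives a ratio of 4, while an elongated ridge gives a much larger ratio or a non-positive determinant.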

3. Orientation assignment

Create histogram of local gradient directions at selected scale.
Assign canonical orientation at peak of smoothed histogram.
Each key specifies stable 2D coordinates (x, y, scale, orientation).

If 2 major orientations, use both.
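A simplified version of the orientation step might look like the sketch below. Several details are assumptions not stated on the slide: 36 bins of 10 degrees, a [0.25, 0.5, 0.25] circular smoothing filter, and an 80% threshold for accepting a second peak (the bin count and the threshold follow Lowe's IJCV 2004 paper). The Gaussian weighting of samples used in that paper is omitted, and the function names are hypothetical.

```python
import math

def orientation_histogram(patch, num_bins=36):
    """Histogram of gradient directions, each sample weighted by its magnitude."""
    hist = [0.0] * num_bins
    bin_width = 2 * math.pi / num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.atan2(dy, dx) % (2 * math.pi)
            index = int(round(angle / bin_width)) % num_bins
            hist[index] += math.hypot(dx, dy)
    return hist

def smooth_histogram(hist):
    """Circular smoothing with weights 0.25, 0.5, 0.25 (an assumed filter)."""
    n = len(hist)
    return [0.25 * hist[i - 1] + 0.5 * hist[i] + 0.25 * hist[(i + 1) % n]
            for i in range(n)]

def dominant_orientations(hist, peak_ratio=0.8):
    """Angles (radians) of local peaks within peak_ratio of the highest peak."""
    n = len(hist)
    top = max(hist)
    if top <= 0:
        return []
    bin_width = 2 * math.pi / n
    return [i * bin_width
            for i, v in enumerate(hist)
            if v >= peak_ratio * top and v > hist[i - 1] and v > hist[(i + 1) % n]]
```

A keypoint whose histogram has two comparable peaks yields two angles, and in that case one keypoint would be created per angle.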



Keypoint localization with orientation

Example on a 233x189 image:
832 initial keypoints
729 keypoints after gradient threshold
536 keypoints after ratio threshold

4. Keypoint Descriptors

At this point, each keypoint has
location
scale
orientation

Next is to compute a descriptor for the local image region about each keypoint that is
highly distinctive
as invariant as possible to variations such as changes in viewpoint and illumination

Normalization

Rotate the window to standard orientation

Scale the window size based on the scale at which the point was found.


Lowe's Keypoint Descriptor
(shown with 2 X 2 descriptors over 8 X 8)

In experiments, a 4 X 4 array of 8-bin histograms is used, a total of 128 features for one keypoint.

Lowe's Keypoint Descriptor

Use the normalized region about the keypoint.
Compute gradient magnitude and orientation at each point in the region.
Weight them by a Gaussian window overlaid on the circle.
Create an orientation histogram over the 4 X 4 subregions of the window.
4 X 4 descriptors over a 16 X 16 sample array were used in practice. 4 X 4 times 8 directions gives a vector of 128 values.
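The steps above can be sketched as a single function that turns a 16 X 16 array of gradient samples into 128 values. This is a deliberately simplified version: it assumes the gradients have already been computed in the rotated, scaled window, it assigns each sample to exactly one histogram bin (no interpolation between bins or subregions), the Gaussian window width of half the array size follows Lowe's IJCV 2004 paper, and the final unit-length normalisation is also taken from that paper rather than from this slide. The function name is hypothetical.

```python
import math

def sift_like_descriptor(magnitude, orientation, cells=4, bins=8):
    """Build a cells*cells*bins vector from square magnitude/orientation arrays.

    magnitude[y][x] is the gradient magnitude, orientation[y][x] the gradient
    direction in radians, both relative to the keypoint's canonical orientation.
    """
    size = len(magnitude)                 # 16 for a 16 x 16 sample array
    cell_size = size // cells
    centre = (size - 1) / 2.0
    sigma = size / 2.0                    # assumed Gaussian window width
    descriptor = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            dist2 = (x - centre) ** 2 + (y - centre) ** 2
            weight = math.exp(-dist2 / (2 * sigma ** 2))
            angle = orientation[y][x] % (2 * math.pi)
            direction = int(angle / (2 * math.pi) * bins) % bins
            cell = (y // cell_size) * cells + (x // cell_size)
            descriptor[cell * bins + direction] += weight * magnitude[y][x]
    # Unit-length normalisation reduces the effect of contrast changes.
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor
```

With the default arguments and 16 X 16 inputs the result has 4 x 4 x 8 = 128 entries, laid out subregion by subregion with 8 direction bins each.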


Using SIFT for Matching Objects


Uses for SIFT

Feature points are used also for:

Image alignment (homography, fundamental matrix)
3D reconstruction (e.g. Photo Tourism)
Motion tracking
Object recognition
Indexing and database retrieval
Robot navigation
many others

[ Photo Tourism: Snavely et al. SIGGRAPH 2006 ]
