Comparison

The document outlines the SIFT (Scale-Invariant Feature Transform) algorithm, detailing its distinctive features, computation steps, and robustness to various transformations. Key steps include scale-space extrema detection, keypoint localization, orientation assignment, and descriptor generation, which enable effective matching of image features. Applications of SIFT span object recognition, categorization, localization, and image retrieval, with variations like PCA-SIFT and SURF mentioned for enhanced performance.


• Highly distinctive
  – A single feature can be correctly matched with high probability against a large database of features from many images.
• Scale and rotation invariant.
• Partially invariant to 3D camera viewpoint
  – Can tolerate up to about 60 degrees of out-of-plane rotation.
• Partially invariant to changes in illumination.
• Can be computed quickly and efficiently.
Example

http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
SIFT Computation – Steps
(1) Scale-space extrema detection
– Extract scale and rotation invariant interest points (i.e.,
keypoints).
(2) Keypoint localization
– Determine location and scale for each interest point.
– Eliminate “weak” keypoints
(3) Orientation assignment
– Assign one or more orientations to each keypoint.
(4) Keypoint descriptor
– Use local image gradients at the selected scale.
D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal
of Computer Vision, 60(2):91-110, 2004.
1. Scale-space Extrema Detection

• Harris-Laplace: find local maxima of
  – the Harris detector in space (x, y)
  – the LoG in scale
• SIFT: find local maxima of
  – the Hessian in space (x, y)
  – the DoG in scale
1. Scale-space Extrema Detection (cont’d)

• DoG images are grouped by octaves (i.e., doubling of σ0); the image is down-sampled between octaves.
• Fixed number of levels per octave.

  D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)

  where L(x, y, σ) = G(x, y, σ) * I(x, y)

(Figure: pyramid levels at σ0, 2σ0, 2²σ0, with down-sampling between octaves.)
1. Scale-space Extrema Detection (cont’d)

• Images within each octave are separated by a constant factor k.
• If each octave is divided into s intervals: kˢ = 2, i.e., k = 2^(1/s).

(Figure: pyramid levels at k⁰σ0, kσ0, k²σ0, …, kˢσ0 = 2σ0.)
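The relation kˢ = 2 between the number of intervals and the per-level factor can be checked with a short Python sketch (σ0 = 1.6 and s = 3 are illustrative values, not mandated by the slides):

```python
# Sketch: scale levels within one octave, assuming sigma0 = 1.6 and
# s = 3 intervals per octave (illustrative values).
sigma0 = 1.6
s = 3
k = 2 ** (1.0 / s)          # constant factor between adjacent levels
levels = [sigma0 * k ** i for i in range(s + 1)]
# After s steps the scale has doubled: sigma0 * k**s == 2 * sigma0
```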
Choosing SIFT parameters (cont’d)

• Pre-smoothing discards high frequencies.
• Double the size of the input image (i.e., using linear interpolation) prior to building the first level of the DoG pyramid.
• This increases the number of stable keypoints by a factor of 4.

(Figure: pyramid levels at σ0, kσ0, k²σ0, 2σ0.)
1. Scale-space Extrema Detection
(cont’d)
• Extract local extrema (i.e., minima or maxima) in DoG pyramid.
  – Compare each point to its 8 neighbors at the same level, 9 neighbors in the level above, and 9 neighbors in the level below (i.e., 26 total).
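The 26-neighbor comparison can be sketched in Python (the dog[level][y][x] indexing and the helper name are illustrative, not from the slides):

```python
# Sketch: test whether the centre sample of a 3x3x3 DoG neighbourhood
# (level below, same level, level above) is a local extremum, i.e.
# strictly larger or strictly smaller than all 26 neighbours.
def is_extremum(dog, lev, y, x):
    centre = dog[lev][y][x]
    neighbours = [dog[l][j][i]
                  for l in (lev - 1, lev, lev + 1)
                  for j in (y - 1, y, y + 1)
                  for i in (x - 1, x, x + 1)
                  if not (l == lev and j == y and i == x)]
    return all(centre > n for n in neighbours) or \
           all(centre < n for n in neighbours)
```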
2. Keypoint Localization

• Determine the location and scale of keypoints to sub-pixel and sub-scale accuracy by fitting a 3D quadratic polynomial:

  Xi = (xi, yi, σi)                 keypoint location
  ΔX = (x − xi, y − yi, σ − σi)     offset
  X̂i = Xi + ΔX                     estimated (sub-pixel, sub-scale) location

• Substantial improvement to matching and stability!
2. Keypoint Localization (cont’d)

• Use a Taylor expansion to locally approximate D(x, y, σ) (i.e., the DoG function) and estimate ΔX:

  D(ΔX) ≈ D(Xi) + (∂D(Xi)/∂X)ᵀ ΔX + ½ ΔXᵀ (∂²D(Xi)/∂X²) ΔX

• Find the extrema of D(ΔX) by setting its derivative with respect to ΔX to zero:

  ∂D(Xi)/∂X + (∂²D(Xi)/∂X²) ΔX = 0
2. Keypoint Localization (cont’d)

  (∂²D(Xi)/∂X²) ΔX = −∂D(Xi)/∂X   ⇒   ΔX = −(∂²D(Xi)/∂X²)⁻¹ ∂D(Xi)/∂X

• ΔX can be computed by solving a 3x3 linear system:

  [ ∂²D/∂x²   ∂²D/∂x∂y  ∂²D/∂x∂σ ] [Δx]     [ ∂D/∂x ]
  [ ∂²D/∂y∂x  ∂²D/∂y²   ∂²D/∂y∂σ ] [Δy] = − [ ∂D/∂y ]
  [ ∂²D/∂σ∂x  ∂²D/∂σ∂y  ∂²D/∂σ²  ] [Δσ]     [ ∂D/∂σ ]

• Use finite differences, e.g.:

  ∂D/∂x ≈ (D(i+1, j) − D(i−1, j)) / 2
  ∂²D/∂x² ≈ D(i+1, j) − 2 D(i, j) + D(i−1, j)
  ∂²D/∂x∂y ≈ [(D(i+1, j+1) − D(i−1, j+1)) − (D(i+1, j−1) − D(i−1, j−1))] / 4

• If |ΔX| > 0.5 in any dimension, repeat the fit around the updated location.
2. Keypoint Localization (cont’d)
• Reject keypoints having low contrast.
– i.e., sensitive to noise

  If |D(Xi + ΔX)| < 0.03, reject the keypoint.

  – i.e., this assumes that image values have been normalized to [0, 1]
2. Keypoint Localization (cont’d)
• Reject points lying on edges (or being close to edges).

• Harris uses the auto-correlation matrix:

  A_W(x, y) = Σ_{(x,y)∈W} [ fx²    fx fy ]
                          [ fx fy  fy²   ]

  R(A_W) = det(A_W) − α trace²(A_W)
  or R(A_W) = λ1 λ2 − α (λ1 + λ2)²
2. Keypoint Localization (cont’d)
• SIFT uses the Hessian matrix (for efficiency).
  – i.e., the Hessian encodes the principal curvatures
  α: largest eigenvalue (λmax)
  β: smallest eigenvalue (λmin)
  (proportional to principal curvatures)

  Let r = α/β. Reject the keypoint if:

  trace²(H) / det(H) = (α + β)² / (α β) > (r + 1)² / r    (SIFT uses r = 10)
2. Keypoint Localization (cont’d)

(a) 233x189 original image
(b) 832 DoG extrema
(c) 729 keypoints left after the low-contrast threshold
(d) 536 keypoints left after testing the ratio based on the Hessian
3. Orientation Assignment
• Create histogram of gradient directions, within a region
around the keypoint, at selected scale:
  L(x, y, σ) = G(x, y, σ) * I(x, y)

  m(x, y) = √[(L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²]
  θ(x, y) = atan2(L(x, y+1) − L(x, y−1), L(x+1, y) − L(x−1, y))

  36 bins (i.e., 10° per bin) covering [0, 2π)
• Histogram entries are weighted by (i) gradient magnitude and (ii) a
Gaussian function with σ equal to 1.5 times the scale of the keypoint.
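The magnitude/orientation formulas and the 36-bin histogram can be sketched in Python (the L[y][x] indexing and function names are illustrative; the Gaussian weighting of the entries is noted in a comment but omitted for brevity):

```python
import math

# Sketch: gradient magnitude/orientation at one pixel of the smoothed
# image L (indexed L[y][x] here, an assumed layout), plus a 36-bin
# orientation histogram built from weighted samples.
def grad(L, x, y):
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    m = math.hypot(dx, dy)                       # gradient magnitude
    theta = math.atan2(dy, dx) % (2 * math.pi)   # orientation in [0, 2*pi)
    return m, theta

def orientation_histogram(samples, bins=36):
    # samples: (magnitude, orientation) pairs; in full SIFT each
    # magnitude would also be weighted by a Gaussian (sigma = 1.5*scale)
    hist = [0.0] * bins
    for m, theta in samples:
        hist[int(theta / (2 * math.pi) * bins) % bins] += m
    return hist
```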
3. Orientation Assignment (cont’d)
• Assign canonical orientation at peak of smoothed
histogram (fit parabola to better localize peak).

• In case of peaks within 80% of the highest peak, multiple orientations are assigned to the keypoint.
  – About 15% of keypoints have multiple orientations assigned.
  – Significantly improves the stability of matching.
3. Orientation Assignment (cont’d)
• Stability of location, scale, and orientation (within 15
degrees) under noise.
4. Keypoint Descriptor

(Figure: gradient orientation histogram with 8 bins per cell.)
4. Keypoint Descriptor (cont’d)
1. Take a 16x16 window around the detected interest point.
2. Divide it into a 4x4 grid of cells.
3. Compute an orientation histogram (8 bins) in each cell.

16 histograms x 8 orientations = 128 features
4. Keypoint Descriptor (cont’d)
• Each histogram entry is weighted by (i) gradient magnitude
and (ii) a Gaussian function with σ equal to 0.5 times the
width of the descriptor window.
4. Keypoint Descriptor (cont’d)
• Partial voting: distribute histogram entries into adjacent bins (i.e., additional robustness to shifts).
  – Each entry is added to adjacent bins, multiplied by a weight of 1 − d, where d is the distance from the bin it belongs to.
4. Keypoint Descriptor (cont’d)
• The descriptor depends on two main parameters:
  (1) the number of orientations r
  (2) an n x n array of orientation histograms
  This yields r·n² features.

  SIFT: r = 8, n = 4 → 128 features
4. Keypoint Descriptor (cont’d)
• Invariance to linear illumination changes:
  – Normalization of the 128-feature vector to unit length is sufficient.
4. Keypoint Descriptor (cont’d)
• Non-linear illumination changes:
  – Saturation affects gradient magnitudes more than orientations.
  – Threshold the entries of the 128-feature vector to be no larger than 0.2, then renormalize to unit length.
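The two-stage normalization just described can be sketched in Python (the 0.2 threshold follows the text; the function name is illustrative):

```python
import math

# Sketch: normalise the descriptor to unit length, clip entries at 0.2
# to suppress saturation effects, then renormalise to unit length.
def normalize_descriptor(v, clip=0.2):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    v = [min(x / norm, clip) for x in v]        # entries are non-negative
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]
```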
Robustness to viewpoint changes
• Match features after random change in image scale and
orientation, with 2% image noise, and affine distortion.
• Find nearest neighbor in database of 30,000 features.

Additional
robustness can
be achieved using
affine invariant
region detectors.
Distinctiveness
• Vary size of database of features, with 30 degree affine
change, 2% image noise.
• Measure % correct for single nearest neighbor match.
Matching SIFT features
• Given a feature in I1, how to find the best
match in I2?
1. Define distance function that compares two
descriptors.
2. Test all the features in I2, find the one with
min distance.

Matching SIFT features (cont’d)
• Accept a match if SSD(f1,f2) < t
• How do we choose t?
Matching SIFT features (cont’d)
• A better distance measure is the following:
– SSD(f1, f2) / SSD(f1, f2’)
• f2 is best SSD match to f1 in I2
• f2’ is 2nd best SSD match to f1 in I2

Matching SIFT features (cont’d)
• Accept a match if SSD(f1, f2) / SSD(f1, f2’) < t
• t=0.8 has given good results in object
recognition.
– 90% of false matches were eliminated.
– Less than 5% of correct matches were discarded
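The ratio test above can be sketched in Python (function names are illustrative; distances follow the SSD formulation from the slides, with t = 0.8):

```python
# Sketch: nearest-neighbour matching with the ratio test
# SSD(f1, f2) / SSD(f1, f2') < t, where f2 and f2' are the best and
# second-best matches of f1 in the other image.
def ssd(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match(desc1, desc2, t=0.8):
    matches = []
    for i, f1 in enumerate(desc1):
        dists = sorted((ssd(f1, f2), j) for j, f2 in enumerate(desc2))
        best, second = dists[0], dists[1]
        if second[0] > 0 and best[0] / second[0] < t:   # distinctive enough
            matches.append((i, best[1]))
    return matches
```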
Matching SIFT features (cont’d)
• How to evaluate the performance of a feature
matcher?
Matching SIFT features (cont’d)
• The threshold t affects the number of correct/false matches.

(Figure: examples of a true match and a false match.)

• True positives (TP) = # of detected matches that are correct
• False positives (FP) = # of detected matches that are incorrect
Matching SIFT features (cont’d)

• ROC curve
  – Generated by computing (FP rate, TP rate) pairs for different thresholds.
  – Maximize the area under the curve (AUC).

(Figure: ROC curve with TP rate on the y-axis and FP rate on the x-axis.)

http://en.wikipedia.org/wiki/Receiver_operating_characteristic
Applications of SIFT
• Object recognition
• Object categorization
• Location recognition
• Robot localization
• Image retrieval
• Image panoramas
Object Recognition
Object Models
Object Categorization
Location recognition
Robot Localization
Map continuously built over time
Image retrieval – Example 1

• Database of > 5000 images; the query differs by a change in viewing angle.
• 22 correct matches.

Image retrieval – Example 2

• Database of > 5000 images; the query differs by a change in viewing angle plus a scale change.
• 33 correct matches.
Image panoramas from an unordered image set
Variations of SIFT features
• PCA-SIFT

• SURF

• GLOH
SIFT Steps - Review
(1) Scale-space extrema detection
– Extract scale and rotation invariant interest points (i.e.,
keypoints).
(2) Keypoint localization
– Determine location and scale for each interest point.
– Eliminate “weak” keypoints
(3) Orientation assignment
– Assign one or more orientations to each keypoint.
(4) Keypoint descriptor
– Use local image gradients at the selected scale.
D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal
of Computer Vision, 60(2):91-110, 2004.
Cited 9589 times (as of 3/7/2011)
PCA-SIFT
• Steps 1-3 are the same; Step 4 is modified.
• Take a 41 x 41 patch at the given scale,
centered at the keypoint, and normalized to
a canonical direction.

Yan Ke and Rahul Sukthankar, “PCA-SIFT: A More Distinctive Representation for Local
Image Descriptors”, Computer Vision and Pattern Recognition, 2004
PCA-SIFT
• Instead of using weighted histograms,
concatenate the horizontal and vertical
gradients (39 x 39) into a long vector.
• Normalize vector to unit length.

2 x 39 x 39 = 3042 vector
PCA-SIFT
• Reduce the dimensionality of the vector using Principal Component Analysis (PCA)
  – e.g., from 3042 to 36:

  v'(Kx1) = A(KxN) · v(Nx1),  with N = 3042, K = 36

• Sometimes less discriminatory than SIFT.

SURF: Speeded Up Robust Features
• Speed-up computations by fast approximation of
(i) Hessian matrix and (ii) descriptor using “integral
images”.

• What is an “integral image”?

Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, “SURF: Speeded Up Robust Features”,
European Computer Vision Conference (ECCV), 2006.
Integral Image
• The integral image IΣ(x, y) of an image I(x, y) is the sum of all pixels of I in the rectangular region formed by (0, 0) and (x, y):

  IΣ(x, y) = Σ_{i=0..x} Σ_{j=0..y} I(i, j)

• Using integral images, it takes only four array references to calculate the sum of pixels over a rectangular region of any size.
SURF: Speeded Up Robust Features
(cont’d)
• Approximate Lxx, Lyy, and Lxy using box filters.
(box filters shown are 9 x 9 – good approximations for a Gaussian with σ=1.2)

(Figure: Gaussian second-order derivatives and their box-filter approximations.)

• Can be computed very fast using integral


images!
SURF: Speeded Up Robust Features
(cont’d)

• In SIFT, images are repeatedly smoothed with a Gaussian and subsequently sub-sampled in order to achieve a higher level of the pyramid.
SURF: Speeded Up Robust Features
(cont’d)
• Alternatively, we can use filters of larger size on the original image.

• Due to using integral images, filters of any size can be applied at exactly the same speed!

(see Tuytelaars’ paper for details)


SURF: Speeded Up Robust Features (cont’d)
• Approximation of H:

  SIFT (using DoG):

    H_approx = [ Dxx  Dxy ]
               [ Dyx  Dyy ]

  SURF (using box filters):

    H_approx = [ L̂xx  L̂xy ]
               [ L̂yx  L̂yy ]
SURF: Speeded Up Robust Features (cont’d)
• Instead of using a different measure for selecting the location and scale of interest points (e.g., Hessian and DoG in SIFT), SURF uses the determinant of H_approx^SURF to find both.

• The determinant’s terms must be weighted to obtain a good approximation:

  det(H_approx^SURF) = L̂xx L̂yy − (0.9 L̂xy)²
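A one-line sketch of the weighted determinant (variable names are illustrative, standing for the box-filter responses at one location):

```python
# Sketch: SURF's approximated Hessian determinant with the 0.9 weight
# on the mixed term, given box-filter responses lxx, lyy, lxy.
def det_h_approx(lxx, lyy, lxy):
    return lxx * lyy - (0.9 * lxy) ** 2
```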
SURF: Speeded Up Robust Features
(cont’d)
• Once interest points have been localized both in space
and scale, the next steps are:
(1) Orientation assignment
(2) Keypoint descriptor
SURF: Speeded Up Robust Features (cont’d)
• Orientation assignment
  – Compute Haar wavelet x and y responses (dx, dy) in a circular neighborhood of radius 6σ around the interest point (σ = the scale at which the point was detected); wavelet side length = 4σ.
  – Responses are weighted with a Gaussian.
  – The dominant orientation is estimated within a sliding angle window of 60°.
  – Can be computed very fast using integral images!
SURF: Speeded Up Robust Features (cont’d)
• Keypoint descriptor (square region of size 20σ, divided into a 4x4 grid of sub-regions).
• Sum the responses over each sub-region for dx and dy separately.
• To bring in information about the polarity of the intensity changes, also extract the sums of the absolute values of the responses.

  Feature vector size: 4 x 16 = 64
SURF: Speeded Up Robust Features (cont’d)
• SURF-128
  – The sums of dx and |dx| are computed separately for points where dy < 0 and dy ≥ 0.
  – Similarly for the sums of dy and |dy|.
  – More discriminatory!
SURF: Speeded Up Robust Features
• Has been reported to be 3 times faster than SIFT.
• Less robust to illumination and viewpoint changes compared to SIFT.

K. Mikolajczyk and C. Schmid, “A Performance Evaluation of Local Descriptors”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.