
Computer Vision

01CE0612

Unit - 3
Feature Extraction and
Matching

Dr. Anjali Diwan


Department of Computer Engineering
• Feature Detection and Description
• Harris Corner Detection
• Scale Invariant Feature Transform (SIFT)
• Speeded-Up Robust Features (SURF)
• Feature Matching Algorithms (e.g., Brute-Force, FLANN)
• RANSAC for Robust Estimation
• Applications of Feature Matching

Image Matching
Invariant local features

Find features that are invariant to transformations:
– geometric invariance: translation, rotation, scale
– photometric invariance: brightness, exposure, …

Feature Descriptors
Advantages of local features
Locality:
– features are local, so robust to occlusion and clutter
Distinctiveness:
– can differentiate a large database of objects
Quantity:
– hundreds or thousands in a single image
Efficiency:
– real-time performance achievable
Generality:
– exploit different types of features in different situations
More motivation…
Feature points are used for:
– Image alignment (e.g., mosaics)
– 3D reconstruction
– Motion tracking
– Object recognition
– Indexing and database retrieval
– Robot navigation
– … other
What makes a good feature?
Want uniqueness
Look for image regions that are unusual
– Lead to unambiguous matches in other images

How to define “unusual”?


Local measures of uniqueness
Suppose we only consider a small window of pixels:
– What defines whether a feature is a good or bad candidate?
Feature detection
Local measure of feature uniqueness:
– How does the window change when you shift it?
– Shifting the window in any direction causes a big change

• "flat" region: no change in all directions
• "edge": no change along the edge direction
• "corner": significant change in all directions
Origin of Edges

• Edges are caused by a variety of factors:
– surface normal discontinuity
– depth discontinuity
– surface color discontinuity
– illumination discontinuity

How can you tell that a pixel is on an edge?
Edge Types

• Step edges
• Roof edges
• Line edges
• Real edges: noisy and discrete!

We want an edge operator that produces:
– Edge magnitude
– Edge orientation
– High detection rate and good localization
Gradient
• Gradient equation:

  \nabla I = \left( \partial I/\partial x, \; \partial I/\partial y \right)

• Represents the direction of most rapid change in intensity
• Gradient direction:

  \theta = \arctan\!\big( (\partial I/\partial y) / (\partial I/\partial x) \big)

• The edge strength is given by the gradient magnitude:

  \|\nabla I\| = \sqrt{ (\partial I/\partial x)^2 + (\partial I/\partial y)^2 }
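A minimal NumPy sketch of these formulas, assuming a grayscale float image `img` (this uses simple finite differences, not the Sobel masks introduced later):

```python
import numpy as np

def gradient_magnitude_orientation(img):
    """Central-difference gradient, magnitude, and orientation of a 2-D image."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)            # np.gradient returns d/drow, d/dcol
    magnitude = np.hypot(Ix, Iy)         # edge strength
    orientation = np.arctan2(Iy, Ix)     # gradient direction in radians
    return magnitude, orientation
```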


Theory of Edge Detection

Ideal edge: a straight line through the image plane,

  L(x,y):\; x \sin\theta - y \cos\theta + \rho = 0

separating two regions of constant brightness,

  B_1:\; L(x,y) < 0, \qquad B_2:\; L(x,y) > 0

Image intensity (brightness):

  I(x,y) = B_1 + (B_2 - B_1)\, u(x \sin\theta - y \cos\theta + \rho)

Unit step function:

  u(t) = \begin{cases} 1 & t > 0 \\ 1/2 & t = 0 \\ 0 & t < 0 \end{cases}
  \qquad
  u(t) = \int_{-\infty}^{t} \delta(s)\, ds
Theory of Edge Detection

• Image intensity (brightness):

  I(x,y) = B_1 + (B_2 - B_1)\, u(x \sin\theta - y \cos\theta + \rho)

• Partial derivatives (gradients):

  \partial I/\partial x = \sin\theta\, (B_2 - B_1)\, \delta(x \sin\theta - y \cos\theta + \rho)
  \partial I/\partial y = -\cos\theta\, (B_2 - B_1)\, \delta(x \sin\theta - y \cos\theta + \rho)

• Squared gradient:

  s(x,y) = (\partial I/\partial x)^2 + (\partial I/\partial y)^2 = (B_2 - B_1)^2\, \delta^2(x \sin\theta - y \cos\theta + \rho)

• Edge magnitude: \sqrt{s(x,y)}
• Edge orientation: \arctan\!\big( (\partial I/\partial y) / (\partial I/\partial x) \big)  (the normal of the edge)

A rotationally symmetric, non-linear operator.
Theory of Edge Detection

• Laplacian:

  \nabla^2 I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} = (B_2 - B_1)\, \delta'(x \sin\theta - y \cos\theta + \rho)

A rotationally symmetric, linear operator.

(Figure: 1-D profiles of I, \partial I/\partial x, and \partial^2 I/\partial x^2 along x; the edge appears as a peak of the first derivative and a zero-crossing of the second derivative.)
Discrete Edge Operators

• How can we differentiate a discrete image? Finite difference approximations over a 2x2 neighborhood of pixels I_{i,j} (pixel spacing \epsilon):

  \partial I/\partial x \approx \frac{1}{2\epsilon} \left[ (I_{i+1,j+1} - I_{i,j+1}) + (I_{i+1,j} - I_{i,j}) \right]
  \partial I/\partial y \approx \frac{1}{2\epsilon} \left[ (I_{i+1,j+1} - I_{i+1,j}) + (I_{i,j+1} - I_{i,j}) \right]

• Convolution masks:

  \partial I/\partial x: \; \frac{1}{2\epsilon} \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}
  \qquad
  \partial I/\partial y: \; \frac{1}{2\epsilon} \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}
Discrete Edge Operators

• Second order partial derivatives:

  \partial^2 I/\partial x^2 \approx \frac{1}{\epsilon^2} (I_{i+1,j} - 2 I_{i,j} + I_{i-1,j})
  \partial^2 I/\partial y^2 \approx \frac{1}{\epsilon^2} (I_{i,j+1} - 2 I_{i,j} + I_{i,j-1})

• Laplacian:

  \nabla^2 I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}

• Convolution masks:

  \nabla^2 I: \; \frac{1}{\epsilon^2} \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}
  \quad \text{or} \quad
  \frac{1}{6\epsilon^2} \begin{bmatrix} 1 & 4 & 1 \\ 4 & -20 & 4 \\ 1 & 4 & 1 \end{bmatrix} \; \text{(more accurate)}
The Sobel Operators

• Better approximations of the gradients exist
– The Sobel operators below are commonly used:

  S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
  \qquad
  S_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}
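As an illustration, a small sketch applying these two masks with SciPy (the function name and boundary handling here are just one reasonable choice, not prescribed by the slides):

```python
import numpy as np
from scipy.signal import convolve2d

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = np.array([[ 1, 2, 1], [ 0, 0, 0], [-1, -2, -1]], dtype=np.float64)

def sobel_edges(img):
    """Convolve with the Sobel masks and return the gradient magnitude."""
    gx = convolve2d(img, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(img, SOBEL_Y, mode="same", boundary="symm")
    return np.hypot(gx, gy)
```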
Comparing Edge Operators

Small gradient masks: good localization, noise sensitive, poor detection.
Large masks: poor localization, less noise sensitive, good detection.

Roberts (2 x 2):
  \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}

Sobel (3 x 3):
  \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}

Sobel (5 x 5):
  \begin{bmatrix} -1 & -2 & 0 & 2 & 1 \\ -2 & -3 & 0 & 3 & 2 \\ -3 & -5 & 0 & 5 & 3 \\ -2 & -3 & 0 & 3 & 2 \\ -1 & -2 & 0 & 2 & 1 \end{bmatrix}
  \qquad
  \begin{bmatrix} 1 & 2 & 3 & 2 & 1 \\ 2 & 3 & 5 & 3 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ -2 & -3 & -5 & -3 & -2 \\ -1 & -2 & -3 & -2 & -1 \end{bmatrix}
Feature detection: the math
Consider shifting the window W by (u,v):
• how do the pixels in W change?
• compare each pixel before and after by summing up the squared differences (SSD)
• this defines an SSD "error" E(u,v):

  E(u,v) = \sum_{(x,y) \in W} \left[ I(x+u, y+v) - I(x,y) \right]^2

Small motion assumption: Taylor series expansion of I:

  I(x+u, y+v) \approx I(x,y) + \frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v

If the motion (u,v) is small, then the first order approximation is good. Plugging this into the formula above:

  E(u,v) \approx \sum_{(x,y) \in W} \left[ I_x u + I_y v \right]^2
Feature detection: the math
This can be rewritten:

  E(u,v) \approx [u \;\; v] \, H \begin{bmatrix} u \\ v \end{bmatrix},
  \qquad
  H = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

For the example above:
• You can move the center of the green window to anywhere on the blue unit circle
• Which directions will result in the largest and smallest E values?
• We can find these directions by looking at the eigenvectors of H
Quick eigenvalue/eigenvector review
The eigenvectors of a matrix A are the vectors x that satisfy:

  A x = \lambda x

The scalar \lambda is the eigenvalue corresponding to x.

– The eigenvalues are found by solving:

  \det(A - \lambda I) = 0

– In our case, A = H is a 2x2 matrix, so we have:

  \det \begin{bmatrix} h_{11} - \lambda & h_{12} \\ h_{21} & h_{22} - \lambda \end{bmatrix} = 0

– The solution:

  \lambda_{\pm} = \frac{1}{2} \left[ (h_{11} + h_{22}) \pm \sqrt{ 4\, h_{12} h_{21} + (h_{11} - h_{22})^2 } \right]

– Once you know \lambda, you find x by solving (A - \lambda I)\, x = 0.
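A quick numerical check of this review (the values of H below are hypothetical, just for illustration):

```python
import numpy as np

# A hypothetical 2x2 structure matrix H for one window
H = np.array([[10.0, 4.0],
              [ 4.0, 2.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(H)
lambda_minus, lambda_plus = eigenvalues            # smallest, largest
x_minus, x_plus = eigenvectors[:, 0], eigenvectors[:, 1]
print(lambda_minus, lambda_plus)                   # amounts of increase of E
```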


Feature detection: the math
This can be rewritten:

  E(u,v) \approx [u \;\; v] \, H \begin{bmatrix} u \\ v \end{bmatrix}

Eigenvalues and eigenvectors of H:
• define the shifts with the smallest and largest change (E value)
• x_+ = direction of largest increase in E
• \lambda_+ = amount of increase in direction x_+
• x_- = direction of smallest increase in E
• \lambda_- = amount of increase in direction x_-
Feature detection: the math
How are \lambda_+, x_+, \lambda_-, and x_- relevant for feature detection?
• What's our feature scoring function?

Want E(u,v) to be large for small shifts in all directions:
• the minimum of E(u,v) should be large, over all unit vectors [u v]
• this minimum is given by the smaller eigenvalue (\lambda_-) of H
Feature detection summary
Here's what you do:
• Compute the gradient at each point in the image
• Create the H matrix from the entries in the gradient
• Compute the eigenvalues
• Find points with large response (\lambda_- > threshold)
• Choose those points where \lambda_- is a local maximum as features
The Harris operator
\lambda_- is a variant of the "Harris operator" for feature detection:

  f = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2} = \frac{\det(H)}{\mathrm{trace}(H)}

• The trace is the sum of the diagonals, i.e., trace(H) = h_{11} + h_{22}
• Very similar to \lambda_- but less expensive (no square root)
• Called the "Harris Corner Detector" or "Harris Operator"
• Lots of other detectors exist; this is one of the most popular
Harris detector example
• Compute the f value at each pixel (red high, blue low)
• Threshold (f > value)
• Find local maxima of f
• Harris features (in red)
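A from-scratch NumPy sketch of this pipeline, using the det/trace form of f above (the window size and the relative threshold are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import uniform_filter, maximum_filter

def harris_response(img, window=5, eps=1e-8):
    """Harris operator f = det(H) / trace(H) at every pixel."""
    Iy, Ix = np.gradient(img.astype(np.float64))
    # Entries of H, summed over a local window around each pixel
    Sxx = uniform_filter(Ix * Ix, size=window)
    Syy = uniform_filter(Iy * Iy, size=window)
    Sxy = uniform_filter(Ix * Iy, size=window)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det / (trace + eps)

def harris_corners(img, rel_threshold=0.01):
    """Keep points where f is above a threshold and a local maximum."""
    f = harris_response(img)
    is_max = (f == maximum_filter(f, size=5))
    return np.argwhere((f > rel_threshold * f.max()) & is_max)
```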
Invariance
Suppose you rotate the image by some angle
– Will you still pick up the same features?

What if you change the brightness?

Scale?
Scale invariant detection
Suppose you're looking for corners.

Key idea: find the scale that gives a local maximum of f:
– f is a local maximum in both position and scale
– Common definition of f: Laplacian (or the difference between two Gaussian filtered images with different sigmas)
Lindeberg et al., 1996

Slide from Tinne Tuytelaars
Feature descriptors
We know how to detect good points.
Next question: How to match them?

Lots of possibilities (this is a popular research area):
– Simple option: match square windows around the point
– State of the art approach: SIFT
• David Lowe, UBC
http://www.cs.ubc.ca/~lowe/keypoints/
Invariance
Suppose we are comparing two images I1 and I2:
– I2 may be a transformed version of I1
– What kinds of transformations are we likely to encounter in practice?

We'd like to find the same features regardless of the transformation:
– This is called transformational invariance
– Most feature methods are designed to be invariant to
• translation, 2D rotation, scale
– They can usually also handle
• limited 3D rotations (SIFT works up to about 60 degrees)
• limited affine transformations (some are fully affine invariant)
• limited illumination/contrast changes
How to achieve invariance
Need both of the following:
1. Make sure your detector is invariant
– Harris is invariant to translation and rotation
– Scale is trickier
• common approach is to detect features at many scales using a Gaussian pyramid (e.g., MOPS)
• more sophisticated methods find "the best scale" to represent each feature (e.g., SIFT)
2. Design an invariant feature descriptor
– A descriptor captures the information in a region around the detected feature point
– The simplest descriptor: a square window of pixels
• What's this invariant to?
– Let's look at some better approaches…
Rotation invariance for feature descriptors

Find the dominant orientation of the image patch:
– This is given by x_+, the eigenvector of H corresponding to \lambda_+ (the larger eigenvalue)
– Rotate the patch according to this angle

Figure by Matthew Brown
2D Gaussian Edge Operators

(Figure: the Gaussian, the Derivative of Gaussian (DoG), and the Laplacian of Gaussian, also called the Mexican Hat or Sombrero.)

• \nabla^2 is the Laplacian operator:

  \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}
Scale-space extrema detection

• Goal: Identify locations and scales that can be repeatably assigned under different views of the same scene or object.
• Method: search for stable features across multiple scales using a continuous function of scale.
• Prior work has shown that under a variety of assumptions, the best such function is a Gaussian.
• The scale space of an image is a function L(x,y,\sigma) produced by the convolution of a Gaussian kernel (at different scales) with the input image.
Image Pyramids

Bottom level is the original image.

2nd level is derived from the original image according to some function.

3rd level is derived from the 2nd level according to the same function.

And so on.
Mean Pyramid

Bottom level is the original image.

At the 2nd level, each pixel is the mean of 4 pixels in the original image.

At the 3rd level, each pixel is the mean of 4 pixels in the 2nd level.

And so on.
Gaussian Pyramid
At each level, the image is smoothed and reduced in size.

Bottom level is the original image.

At the 2nd level, each pixel is the result of applying a Gaussian mask to the first level and then subsampling to reduce the size.

And so on.
Example: Subsampling with Gaussian pre-filtering

(Figure: a Gaussian pre-filter G is applied before each halving; the image is shown at 1/2, 1/4, and 1/8 size.)
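A short sketch of this filter-then-subsample loop with OpenCV (cv2.pyrDown blurs with a Gaussian and then drops every other row and column):

```python
import cv2

def gaussian_pyramid(img, levels=4):
    """Return [original, 1/2, 1/4, ...] with Gaussian pre-filtering at each step."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = cv2.pyrDown(img)   # Gaussian blur, then subsample by 2
        pyramid.append(img)
    return pyramid
```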
What is SIFT?
It is a technique for detecting salient, stable feature points in an image.

For every such point, it also provides a set of "features" that characterize/describe a small image region around the point. These features are invariant to rotation and scale.
Scale Invariant Feature Transform
When all images are similar in nature (same scale, orientation, etc.), simple corner detectors can work. But when you have images of different scales and rotations, you need to use the Scale Invariant Feature Transform.

SIFT isn't just scale invariant. You can change the following and still get good results:
• Scale (of course)
• Rotation
• Illumination
• Viewpoint
Scale Invariant Feature Transform
Basic idea:
• Take a 16x16 square window around the detected feature
• Compute the edge orientation (angle of the gradient minus 90°) for each pixel
• Throw out weak edges (threshold on gradient magnitude)
• Create a histogram of the surviving edge orientations (angles from 0 to 2\pi)
SIFT Algorithm

• SIFT is quite an involved algorithm. Here's an outline of what happens in SIFT:
• Constructing a scale space: Create internal representations of the original image to ensure scale invariance. This is done by generating a "scale space".
• LoG approximation: The Laplacian of Gaussian is great for finding interesting points (or key points) in an image, but it's computationally expensive, so approximate it.
• Finding keypoints: Now try to find key points. These are maxima and minima in the Difference of Gaussian images calculated in the previous step.
• Getting rid of bad key points: Edges and low-contrast regions make bad keypoints. Eliminating these makes the algorithm efficient and robust.
• Assigning an orientation to the keypoints: An orientation is calculated for each key point. Further calculations are done relative to this orientation. This effectively cancels out the effect of orientation, making it rotation invariant.
• Generating SIFT features: Finally, with scale and rotation invariance in place, one more representation is generated.
Scale spaces in SIFT
1. Take the original image, and generate progressively blurred-out images
2. Then, resize the original image to half size
3. And generate blurred-out images again
• Octaves and scales: The number of octaves and scales depends on the size of the original image. SIFT suggests that 4 octaves and 5 blur levels are ideal for the algorithm.
• The first octave: If the original image is doubled in size and antialiased a bit (by blurring it), then the algorithm produces more keypoints.
• Blurring: Mathematically, "blurring" is referred to as the convolution of the Gaussian operator with the image:

  L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)

• L is the blurred image
• G is the Gaussian blur operator
• I is the image
• x, y are the location coordinates
• \sigma is the "scale" parameter. Think of it as the amount of blur: the greater the value, the greater the blur.
• The * is the convolution operation in x and y. It "applies" the Gaussian blur G onto the image I.
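A sketch of one octave of this construction (the base sigma of 1.6 and the blur multiplier k are conventional choices, not fixed by the slides, so treat them as assumptions):

```python
import cv2
import numpy as np

def octave_scale_space(img, num_levels=5, sigma0=1.6):
    """One octave: progressively blurred images plus their Differences of Gaussians."""
    img = img.astype(np.float32)
    k = 2 ** 0.5  # blur multiplier between levels (an assumption)
    blurred = [cv2.GaussianBlur(img, (0, 0), sigma0 * k**i)
               for i in range(num_levels)]
    # DoG: subtract adjacent blur levels (approximates the scale-normalized LoG)
    dogs = [blurred[i + 1] - blurred[i] for i in range(num_levels - 1)]
    return blurred, dogs
```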
Scale-space extrema detection

• Find the points whose surrounding patches (at some scale) are distinctive.
• An approximation to the scale-normalized Laplacian of Gaussian:

  D(x, y, \sigma) = \big( G(x, y, k\sigma) - G(x, y, \sigma) \big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)

where G is the actual Gaussian blur operator; this Difference of Gaussians (DoG) is the LoG approximation.

• Maxima and minima are found in a 3x3x3 neighborhood (8 neighbors at the current scale, 9 at the scale above, 9 at the scale below).
• Find subpixel maxima/minima by fitting a quadratic around each extremum:

  D(\mathbf{x}) = D + \frac{\partial D^T}{\partial \mathbf{x}} \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x},
  \qquad
  \hat{\mathbf{x}} = - \left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}
Keypoint localization

• There are still a lot of points, and some of them are not good enough.
• The locations of keypoints may not be accurate.
• Eliminate edge points.

Eliminating edge points

• Such a point has a large principal curvature across the edge but a small one in the perpendicular direction.
• The principal curvatures can be calculated from the Hessian of D:

  H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}

• The eigenvalues of H are proportional to the principal curvatures, so the two eigenvalues shouldn't differ too much.
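A tiny sketch of this check using the trace and determinant of the 2x2 Hessian (the curvature-ratio threshold r = 10 follows Lowe's paper; the finite differences are one simple way to estimate Dxx, Dyy, Dxy from a DoG slice D):

```python
import numpy as np

def is_edge_like(D, i, j, r=10.0):
    """Reject a DoG extremum whose principal curvatures differ too much."""
    dxx = D[i, j + 1] - 2 * D[i, j] + D[i, j - 1]
    dyy = D[i + 1, j] - 2 * D[i, j] + D[i - 1, j]
    dxy = (D[i + 1, j + 1] - D[i + 1, j - 1]
           - D[i - 1, j + 1] + D[i - 1, j - 1]) / 4.0
    trace, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:                 # curvatures of opposite sign: not a stable extremum
        return True
    return trace**2 / det >= (r + 1)**2 / r
```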
Orientation assignment
• Assign an orientation to each keypoint; the keypoint descriptor can then be represented relative to this orientation and therefore achieve invariance to image rotation.
• Compute magnitude and orientation on the Gaussian-smoothed images.
• A histogram is formed by quantizing the orientations into 36 bins.
• Peaks in the histogram correspond to the orientations of the patch.
• For the same scale and location, there could be multiple keypoints with different orientations.
SIFT feature descriptor
Full version:
• Divide the 16x16 window into a 4x4 grid of cells (the original figure shows a 2x2 case)
• Compute an orientation histogram for each cell
• 16 cells x 8 orientations = 128-dimensional descriptor

Getting a feature:
• The 16x16 window is broken into sixteen 4x4 windows.
• Within each 4x4 window, gradient magnitudes and orientations are calculated, and the orientations are put into an 8-bin histogram:
– orientations in the range 0-44 degrees add to the first bin,
– 45-89 degrees add to the next bin, and so on.
• The amount added to a bin depends on the magnitude of the gradient.
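In practice this descriptor is rarely coded by hand; a minimal OpenCV usage sketch (SIFT_create ships with recent opencv-python releases, which is an assumption about your install, and the file name is hypothetical):

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # N keypoints, each with a 128-D descriptor
```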
Properties of SIFT

Extraordinarily robust matching technique:
– Can handle changes in viewpoint
• up to about 60 degrees of out-of-plane rotation
– Can handle significant changes in illumination
• sometimes even day vs. night
– Fast and efficient; can run in real time
– Lots of code available
Uses for SIFT
• Feature points are also used for:
– Image alignment (homography, fundamental matrix)
– 3D reconstruction (e.g., Photo Tourism)
– Motion tracking
– Object recognition
– Indexing and database retrieval
– Robot navigation
– … many others
Maximally Stable Extremal Regions
J. Matas et al., "Distinguished Regions for Wide-baseline Stereo", BMVC 2002.

• Maximally Stable Extremal Regions:
– Threshold image intensities (I > thresh) for several increasing values of thresh
– Extract connected components ("extremal regions")
– Find a threshold at which the region is "maximally stable", i.e. a local minimum of the relative growth
– Approximate each region with an ellipse
Variations of SIFT features
• PCA-SIFT

• SURF

• GLOH
SIFT Steps - Review

(1) Scale-space extrema detection
– Extract scale and rotation invariant interest points (i.e., keypoints).
(2) Keypoint localization
– Determine location and scale for each interest point.
– Eliminate "weak" keypoints.
(3) Orientation assignment
– Assign one or more orientations to each keypoint.
(4) Keypoint descriptor
– Use local image gradients at the selected scale.

D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60(2):91-110, 2004. Cited 9589 times (as of 3/7/2011).
PCA-SIFT

• Steps 1-3 are the same as SIFT; step 4 is modified.
• Take a 41 x 41 patch at the given scale, centered at the keypoint, and normalized to a canonical direction.
• Instead of using weighted histograms, concatenate the horizontal and vertical gradients (39 x 39) into a long vector:

  2 x 39 x 39 = 3042-dimensional vector

• Normalize the vector to unit length.
• Reduce the dimensionality of the vector using Principal Component Analysis (PCA), e.g., from N = 3042 to K = 36:

  A_{K \times N} \, I_{N \times 1} = I'_{K \times 1}

• Sometimes less discriminatory than SIFT.

Yan Ke and Rahul Sukthankar, "PCA-SIFT: A More Distinctive Representation for Local Image Descriptors", Computer Vision and Pattern Recognition, 2004.
SURF: Speeded Up Robust Features
• Speed up computations by fast approximation of (i) the Hessian matrix and (ii) the descriptor using "integral images".
• What is an "integral image"?

Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, "SURF: Speeded Up Robust Features", European Conference on Computer Vision (ECCV), 2006.

Integral Image
• The integral image I_\Sigma(x,y) of an image I(x,y) represents the sum of all pixels of I in the rectangular region formed by (0,0) and (x,y):

  I_\Sigma(x, y) = \sum_{i=0}^{i \le x} \; \sum_{j=0}^{j \le y} I(i, j)

• Using integral images, it takes only four array references to calculate the sum of pixels over a rectangular region of any size.
SURF: Speeded Up Robust Features (cont'd)
• Approximate L_xx, L_yy, and L_xy using box filters.
(The box filters shown in the original figure are 9 x 9: good approximations for a Gaussian with σ = 1.2.)
• They can be computed very fast using integral images!
SURF: Speeded Up Robust Features (cont'd)
• In SIFT, images are repeatedly smoothed with a Gaussian and subsequently subsampled in order to achieve a higher level of the pyramid.
• Alternatively, we can use filters of larger size on the original image.
• Due to the use of integral images, filters of any size can be applied at exactly the same speed!
SURF: Speeded Up Robust Features (cont'd)
• Approximation of H:

  SIFT (using DoG):  H_{approx} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{yx} & D_{yy} \end{bmatrix}

  SURF (using box filters):  H_{approx} = \begin{bmatrix} \hat{L}_{xx} & \hat{L}_{xy} \\ \hat{L}_{yx} & \hat{L}_{yy} \end{bmatrix}

• Instead of using a different measure for selecting the location and scale of interest points (e.g., Hessian and DoG in SIFT), SURF uses the determinant of H_{approx} to find both.

• The determinant's elements must be weighted to obtain a good approximation:

  \det(H_{approx}) = \hat{L}_{xx} \hat{L}_{yy} - (0.9\, \hat{L}_{xy})^2
SURF: Speeded Up Robust Features (cont'd)
• Once interest points have been localized both in space and scale, the next steps are:
(1) Orientation assignment
(2) Keypoint descriptor

• Orientation assignment:
– Compute Haar wavelet responses in x and y (side length 4σ) within a circular neighborhood of radius 6σ around the interest point (σ = the scale at which the point was detected); the responses are weighted with a Gaussian.
– Sum the responses (Σ dx, Σ dy) within a sliding orientation window of angle 60°.
– Can be computed very fast using integral images!
SURF: Speeded Up Robust Features (cont'd)
• Keypoint descriptor (square region of size 20σ, split into a 4x4 grid of sub-regions):
– Sum the responses over each sub-region for dx and dy separately.
– To bring in information about the polarity of the intensity changes, extract the sums of the absolute values of the responses too (|dx| and |dy|).

• Feature vector size: 4 values x 16 sub-regions = 64
SURF: Speeded Up Robust Features (cont'd)
• SURF-128:
– The sums of dx and |dx| are computed separately for points where dy < 0 and dy ≥ 0
– Similarly for the sums of dy and |dy|
– More discriminatory!
SURF: Speeded Up Robust Features
• Has been reported to be 3 times faster than SIFT.
• Less robust to illumination and viewpoint changes compared to SIFT.

K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.
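For completeness, a usage sketch (SURF lives in the opencv-contrib xfeatures2d module and typically requires a build with nonfree algorithms enabled, so availability is an assumption; the file name is hypothetical):

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400,
                                   extended=False)   # extended=True gives SURF-128
keypoints, descriptors = surf.detectAndCompute(img, None)
print(descriptors.shape)  # N x 64 (or N x 128 with extended=True)
```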
Gradient location-orientation histogram (GLOH)

• Compute SIFT using a log-polar location grid:
– 3 bins in the radial direction (radius 6, 11, and 15)
– 8 bins in the angular direction
• Gradient orientation quantized into 16 bins.
• Total: (2x8+1) x 16 = 272 bins, then reduced with PCA.

K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.
Shape Context

• A 3D histogram of edge point locations and orientations:
– Edges are extracted by the Canny edge detector.
– Location is quantized into 9 bins (using a log-polar coordinate system).
– Orientation is quantized into 4 bins (horizontal, vertical, and the two diagonals).
• Total number of features: 4 x 9 = 36
Spin image

• A histogram of quantized pixel locations and intensity values:
– A normalized histogram is computed for each of five rings centered on the region.
– The intensity of a normalized patch is quantized into 10 bins.
• Total number of features: 5 x 10 = 50
Differential Invariants
• "Local jets" of derivatives obtained by convolving the image with Gaussian derivatives.
• Derivatives are computed at different orientations by rotating the image patches.

Example: some Gaussian derivatives, up to fourth order; invariants are computed from the vector

  \begin{bmatrix} I(x,y) * G(\sigma) \\ I(x,y) * G_x(\sigma) \\ I(x,y) * G_y(\sigma) \\ I(x,y) * G_{xx}(\sigma) \\ I(x,y) * G_{xy}(\sigma) \\ I(x,y) * G_{yy}(\sigma) \\ \vdots \end{bmatrix}
Bank of Filters

(e.g., Gabor filters)


Moment Invariants
• Moments are computed for derivatives of an image patch using:

  M^a_{pq} = \sum_{x,y} x^p \, y^q \, \left[ I_d(x, y) \right]^a

where p and q are the order, a is the degree, and I_d is the image gradient in direction d.

• Derivatives are computed in the x and y directions.
Bank of Filters: Steerable Filters
Feature matching
Given a feature in I1, how do we find the best match in I2?
1. Define a distance function that compares two descriptors
2. Test all the features in I2, and find the one with the minimum distance
Feature distance
How to define the difference between two features f1, f2?
– Simple approach: SSD(f1, f2), the sum of squared differences between the entries of the two descriptors
• can give good scores to very ambiguous (bad) matches

– Better approach: ratio distance = SSD(f1, f2) / SSD(f1, f2')
• f2 is the best SSD match to f1 in I2
• f2' is the 2nd best SSD match to f1 in I2
• gives large values (close to 1) for ambiguous matches, small values for unambiguous ones
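A matching sketch with OpenCV implementing exactly this ratio test (the 0.75 cutoff is a conventional choice, not fixed by the slides):

```python
import cv2

def match_ratio_test(desc1, desc2, ratio=0.75):
    """Brute-force match descriptors, keeping only unambiguous matches."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)        # SSD-like distance, right for SIFT
    knn = matcher.knnMatch(desc1, desc2, k=2)   # best and 2nd-best match per feature
    return [m for m, n in knn if m.distance < ratio * n.distance]
```

Swapping BFMatcher for cv2.FlannBasedMatcher changes only the matcher construction; the ratio test itself stays the same.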
Evaluating the results
How can we measure the performance of a feature matcher?

(Figure: candidate matches with feature distances 50, 75, and 200; some are true matches and some are false matches.)

True/false positives
The distance threshold affects performance:
– True positives = # of detected matches that are correct
• Suppose we want to maximize these. How do we choose the threshold?
– False positives = # of detected matches that are incorrect
• Suppose we want to minimize these. How do we choose the threshold?
Evaluating the results
How can we measure the performance of a feature matcher?
ROC curve ("Receiver Operator Characteristic"):

  true positive rate = (# true positives) / (# matching features (positives))
  false positive rate = (# false positives) / (# unmatched features (negatives))

(Figure: ROC curve of true positive rate vs. false positive rate; e.g., a threshold giving a 0.1 false positive rate might achieve a 0.7 true positive rate.)
ROC Curves
• Generated by counting # correct/incorrect matches, for different thresholds
• Want to maximize the area under the curve (AUC)
• Useful for comparing different feature matching methods
• For more info: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
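A bare-bones sketch of sweeping the distance threshold to trace the ROC curve (arrays of match distances and ground-truth correctness labels are assumed given):

```python
import numpy as np

def roc_curve(distances, is_correct):
    """Sweep a distance threshold; return (false positive rate, true positive rate)."""
    distances = np.asarray(distances)
    is_correct = np.asarray(is_correct, dtype=bool)
    thresholds = np.sort(distances)
    tpr = [(is_correct & (distances <= t)).sum() / max(is_correct.sum(), 1)
           for t in thresholds]
    fpr = [(~is_correct & (distances <= t)).sum() / max((~is_correct).sum(), 1)
           for t in thresholds]
    return np.array(fpr), np.array(tpr)
```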
Object recognition (David Lowe)

SIFT usage on the Sony Aibo:
• Recognize the charging station
• Communicate with visual cards
• Teach object recognition
RANSAC

• A model fitting method, e.g., for line detection.
• Let's say we have chosen a parametric model for a set of features: for example, a line equation that we want to fit to a set of edge points.
• We can 'search' in parameter space by trying many potential parameter values and seeing which set of parameters 'agrees'/fits with our set of features.

• Three main questions:
– What model represents this set of features best?
– Which of several model instances gets which feature?
– How many model instances are there?
• Computational complexity is important:
– It is infeasible to examine every possible set of parameters and every possible combination of features.
Example: Line Fitting
• Why fit lines? Many objects are characterized by the presence of straight lines.
Difficulty of Line Fitting
• Extra edge points (clutter), multiple models:
– Which points go with which line, if any?
• Only some parts of each line detected, and some parts are missing:
– How to find a line that bridges missing evidence?
• Noise in measured edge points and orientations:
– How to detect the true underlying parameters?
Voting as a fitting technique
• It's not feasible to check all combinations of features by fitting a model to each possible subset. For example, the naive line fitting we saw last time was O(N^2).
• Voting is a general technique where we let the features vote for all models that are compatible with them:
– Cycle through features, cast votes for model parameters.
– Look for model parameters that receive a lot of votes.
• Noise and clutter features will cast votes too, but typically their votes should be inconsistent with the majority of "good" features.
• It is OK if some features are not observed, as the model can span multiple fragments.
RANSAC [Fischler & Bolles 1981]
• RANdom SAmple Consensus
• Approach: we want to avoid the impact of outliers, so let's look for "inliers", and use only those.
• Intuition: if an outlier is chosen to compute the current fit, then the resulting line won't have much support from the rest of the points.
RANSAC [Fischler & Bolles 1981]
RANSAC loop. Repeat for k iterations:
1. Randomly select a seed group of points on which to perform a model estimate (e.g., a group of edge points)
2. Compute model parameters from the seed group
3. Find inliers to this model
4. If the number of inliers is sufficiently large, re-compute the least-squares estimate of the model on all of the inliers

Keep the model with the largest number of inliers. (A sketch implementing this loop for line fitting follows.)
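A minimal NumPy sketch of the loop above for 2-D line fitting (the inlier distance tolerance and iteration count are illustrative assumptions):

```python
import numpy as np

def ransac_line(points, k=100, tol=1.0, rng=None):
    """Fit a line to Nx2 points, RANSAC-style; return centroid, direction, inliers."""
    rng = rng or np.random.default_rng()
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(k):
        p, q = points[rng.choice(len(points), size=2, replace=False)]
        normal = np.array([q[1] - p[1], p[0] - q[0]])   # perpendicular to q - p
        norm = np.linalg.norm(normal)
        if norm == 0:
            continue                                    # degenerate sample
        normal /= norm
        dist = np.abs((points - p) @ normal)            # point-to-line distances
        inliers = dist < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares refit on all inliers (total least squares via SVD)
    pts = points[best_inliers]
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[0], best_inliers
```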
RANSAC Line Fitting Example
• Task: Estimate the best line
– How many points do we need to estimate the line? (Two: a line is defined by two points.)
• Sample two points
• Fit a line to them
• Repeat, until we get a good result
RANSAC: How many iterations "k"?
• How many samples are needed?
– Suppose w is the fraction of inliers (points actually on the line).
– n points are needed to define the hypothesis (2 for lines).
– k samples are chosen.
• Probability that a single sample of n points is all inliers: w^n
• Probability that a single sample of n points fails: 1 - w^n
• Probability that all k samples fail: (1 - w^n)^k
• Probability that at least one of the k samples is correct: 1 - (1 - w^n)^k
⇒ Choose k high enough to keep the failure probability (1 - w^n)^k below the desired failure rate.
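The required k follows directly by solving (1 - w^n)^k ≤ 1 - p for k; a one-line sketch:

```python
import math

def ransac_iterations(p=0.99, w=0.5, n=2):
    """Smallest k with success probability at least p, given inlier fraction w."""
    return math.ceil(math.log(1 - p) / math.log(1 - w**n))

print(ransac_iterations(p=0.99, w=0.5, n=2))  # 17, matching the table below
```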
RANSAC: Computed k (p = 0.99)

Sample size n | Proportion of outliers
              |  5%   10%   20%   25%   30%   40%    50%
      2       |  2     3     5     6     7    11     17
      3       |  3     4     7     9    11    19     35
      4       |  3     5     9    13    17    34     72
      5       |  4     6    12    17    26    57    146
      6       |  4     7    16    24    37    97    293
      7       |  4     8    20    33    54   163    588
      8       |  5     9    26    44    78   272   1177
RANSAC: Refining the estimate
• RANSAC computes its best estimate from a minimal sample of n points, and divides all data points into inliers and outliers using this estimate.
• We can improve this initial estimate by estimating over all inliers (e.g., with standard least-squares minimization).
• But this may change the inliers, so alternate fitting with re-classification as inlier/outlier.

Slide credit: David Lowe
RANSAC: Pros and Cons
• Pros:
– General method suited for a wide range of model fitting problems
– Easy to implement and easy to calculate its failure rate
• Cons:
– Only handles a moderate percentage of outliers without the cost blowing up
– Many real problems have a high rate of outliers (but sometimes a selective choice of random subsets can help)
• A voting strategy, the Hough transform, can handle a high percentage of outliers.
