Unit - 3: Feature Extraction and Matching
01CE0612

Image Matching
Invariant local features
Feature Descriptors
Advantages of local features
Locality
– features are local, so robust to occlusion and clutter
Distinctiveness
– can differentiate a large database of objects
Quantity
– hundreds or thousands in a single image
Efficiency
– real-time performance achievable
Generality
– exploit different types of features in different situations
More motivation…
Feature points are used for:
– Image alignment (e.g., mosaics)
– 3D reconstruction
– Motion tracking
– Object recognition
– Indexing and database retrieval
– Robot navigation
– … other
What makes a good feature?
Want uniqueness
Look for image regions that are unusual
– Lead to unambiguous matches in other images
Edges arise at depth discontinuities and illumination discontinuities.
Step Edges
– Edge magnitude
– Edge orientation
– Want a high detection rate and good localization
– But real images are noisy and discrete!
Gradient
• Gradient equation: ∇I = ( ∂I/∂x , ∂I/∂y )
• Gradient direction: θ = arctan( (∂I/∂y) / (∂I/∂x) )

Ideal edge
An ideal step edge is the line x sinθ − y cosθ + ρ = 0, separating region B1 (where x sinθ − y cosθ + ρ < 0) from region B2 (where x sinθ − y cosθ + ρ > 0).

Image intensity (brightness):

  I(x, y) = B1 + (B2 − B1) · u( x sinθ − y cosθ + ρ )

where u(t) is the unit step function:

  u(t) = ∫₋∞ᵗ δ(s) ds = { 1 for t > 0 ;  1/2 for t = 0 ;  0 for t < 0 }
Theory of Edge Detection
• Edge magnitude: s(x, y) = √( (∂I/∂x)² + (∂I/∂y)² )
• Edge orientation: θ = arctan( (∂I/∂y) / (∂I/∂x) )  (normal of the edge)
• Rotationally symmetric, non-linear operator
• Partial derivatives (gradients) of the ideal edge:

  ∂I/∂x = + sinθ · (B2 − B1) · δ( x sinθ − y cosθ + ρ )
  ∂I/∂y = − cosθ · (B2 − B1) · δ( x sinθ − y cosθ + ρ )
Theory of Edge Detection
• Rotationally symmetric, linear operator: the Laplacian ∇²I = ∂²I/∂x² + ∂²I/∂y²
• (Figure: a step edge in I gives a peak in ∂I/∂x and a zero-crossing in ∂²I/∂x² at the edge location.)
Discrete Edge Operators
• How can we differentiate a discrete image? Use finite differences:

  ∂I/∂x ≈ 1/2 [ ( I(i+1, j+1) − I(i, j+1) ) + ( I(i+1, j) − I(i, j) ) ]
  ∂I/∂y ≈ 1/2 [ ( I(i+1, j+1) − I(i+1, j) ) + ( I(i, j+1) − I(i, j) ) ]

Convolution masks:

  ∂I/∂x :  1/2 [ -1  1 ]        ∂I/∂y :  1/2 [  1   1 ]
               [ -1  1 ]                     [ -1  -1 ]
Discrete Edge Operators
• The Laplacian ∇²I can be approximated with a convolution mask:

  ∇²I ≈ [ 0   1  0 ]     or     1/6 [ 1    4   1 ]
        [ 1  -4  1 ]               [ 4  -20   4 ]
        [ 0   1  0 ]               [ 1    4   1 ]   (more accurate)
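As a sketch, these masks can be applied with ordinary 2-D convolution (scipy assumed; the function name `laplacian` is illustrative). Both masks are symmetric, so kernel flipping in convolution changes nothing:

```python
import numpy as np
from scipy.signal import convolve2d

# the two Laplacian masks from the slide
LAPLACIAN_4 = np.array([[0,  1, 0],
                        [1, -4, 1],
                        [0,  1, 0]], dtype=float)
LAPLACIAN_8 = np.array([[1,   4, 1],
                        [4, -20, 4],
                        [1,   4, 1]], dtype=float) / 6.0

def laplacian(image, mask=LAPLACIAN_4):
    """Approximate the Laplacian of a 2-D image by convolution."""
    return convolve2d(image, mask, mode="same", boundary="symm")
```

On a linear ramp the interior response is zero; on I(i, j) = j² it is the constant second derivative 2.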
The Sobel Operators

  ∂I/∂x :  [ -1  0  1 ]        ∂I/∂y :  [  1   2   1 ]
           [ -2  0  2 ]                 [  0   0   0 ]
           [ -1  0  1 ]                 [ -1  -2  -1 ]
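A minimal sketch of applying the Sobel masks (scipy assumed; note that `convolve2d` flips the kernel, which only negates the responses and leaves the magnitude unchanged):

```python
import numpy as np
from scipy.signal import convolve2d

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]], dtype=float)

def sobel_gradients(image):
    """Return (gx, gy, magnitude) for a 2-D grayscale image."""
    gx = convolve2d(image, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(image, SOBEL_Y, mode="same", boundary="symm")
    return gx, gy, np.hypot(gx, gy)  # magnitude = sqrt(gx^2 + gy^2)
```

A vertical step edge gives a strong x response and zero y response.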
Comparing Edge Operators

Roberts (2 × 2):
  [  0  1 ]   [ 1   0 ]
  [ -1  0 ]   [ 0  -1 ]
  Good localization; noise sensitive; poor detection

Sobel (3 × 3):
  [ -1  0  1 ]   [  1   1   1 ]
  [ -1  0  1 ]   [  0   0   0 ]
  [ -1  0  1 ]   [ -1  -1  -1 ]

Sobel (5 × 5):
  [ -1  -2  0  2  1 ]   [  1   2   3   2   1 ]
  [ -2  -3  0  3  2 ]   [  2   3   5   3   2 ]
  [ -3  -5  0  5  3 ]   [  0   0   0   0   0 ]
  [ -2  -3  0  3  2 ]   [ -2  -3  -5  -3  -2 ]
  [ -1  -2  0  2  1 ]   [ -1  -2  -3  -2  -1 ]
  Less noise sensitive; good detection; poor localization
Feature detection: the math

Consider shifting the window W by (u, v):
• How do the pixels in W change?
• Compare each pixel before and after by summing up the squared differences (SSD).
• This defines an SSD "error" E(u, v):

  E(u, v) = Σ_(x,y)∈W [ I(x + u, y + v) − I(x, y) ]²

Small motion assumption: take the Taylor series expansion of I:

  I(x + u, y + v) ≈ I(x, y) + I_x(x, y) u + I_y(x, y) v

If the motion (u, v) is small, then the first-order approximation is good.
• Define the corner response f = det(H) / trace(H), where H is the second moment matrix of image gradients over W
• The trace is the sum of the diagonals, i.e., trace(H) = h11 + h22
• Very similar to the smaller eigenvalue λ⁻ of H, but less expensive (no square root)
• Called the "Harris Corner Detector" or "Harris Operator"
• Lots of other detectors; this is one of the most popular
Harris detector example:
• Compute f value (red = high, blue = low)
• Threshold (f > value)
• Find local maxima of f
• Harris features (in red)
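The pipeline above can be sketched directly from the definitions (scipy assumed; the Gaussian window size `sigma`, the function name, and the epsilon guard are illustrative choices, not values from the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.0):
    """Harris operator f = det(H) / trace(H) at every pixel (a sketch)."""
    Iy, Ix = np.gradient(image.astype(float))   # image gradients
    # entries of the second moment matrix H, accumulated over a Gaussian window W
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2                  # lambda1 * lambda2
    trace = Sxx + Syy                           # lambda1 + lambda2
    return det / (trace + 1e-12)                # guard against division by zero in flat regions
```

Thresholding this response and keeping local maxima gives the Harris features.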
Invariance
Suppose you rotate the image by some angle
– Will you still pick up the same features?
Scale?
Scale invariant detection
Suppose you're looking for corners…

(Slide from Tinne Tuytelaars)
Feature descriptors
Mean Pyramid
• Repeatedly average and subsample: the full image, then 1/2, 1/4, 1/8 resolution, and so on.

Gaussian Pyramid
• Same idea, but smooth with a Gaussian G before each subsampling: G at 1/2, G at 1/4, G at 1/8, and so on.
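A minimal mean-pyramid sketch (numpy only; names are illustrative, and any odd trailing row/column is cropped before averaging):

```python
import numpy as np

def mean_downsample(image):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = image.shape
    h2, w2 = h // 2, w // 2
    return image[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).mean(axis=(1, 3))

def mean_pyramid(image, levels):
    """Return [full, 1/2, 1/4, ...] resolution copies of the image."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(mean_downsample(pyramid[-1]))
    return pyramid
```

For a Gaussian pyramid, smooth each level with a Gaussian before subsampling instead of block averaging.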
What is SIFT?
It is a technique for detecting salient, stable
feature points in an image.
Scale Invariant Feature Transform
When all images are similar in nature (same scale, orientation, etc.), simple corner detectors can work. But when you have images of different scales and rotations, you need to use the Scale Invariant Feature Transform.
SIFT isn't just scale invariant. You can change the following,
and still get good results:
•Scale (duh)
•Rotation
•Illumination
•Viewpoint
Scale Invariant Feature Transform
Basic idea:
• Take 16x16 square window around detected feature
• Compute edge orientation (angle of the gradient − 90°) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create histogram of surviving edge orientations
(Figure: histogram of edge orientations, with angles binned over 0 to 2π.)
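The histogram step can be sketched for a single patch (numpy only; the magnitude threshold and bin count are illustrative assumptions, and raw gradient angle is used rather than gradient angle − 90°):

```python
import numpy as np

def orientation_histogram(patch, n_bins=8, mag_threshold=0.1):
    """Histogram of gradient orientations over a patch, weak edges discarded."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)   # orientations in [0, 2*pi)
    keep = mag > mag_threshold                 # throw out weak edges
    hist, _ = np.histogram(angle[keep], bins=n_bins,
                           range=(0, 2 * np.pi), weights=mag[keep])
    return hist
```

A patch that only varies horizontally puts all of its mass into the first (angle ≈ 0) bin.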
SIFT Algorithm

  L(x, y, σ) = G(x, y, σ) * I(x, y)

• L is a blurred image
• G is the Gaussian blur operator
• I is an image
• x, y are the location coordinates
• σ is the "scale" parameter; think of it as the amount of blur: the greater the value, the greater the blur
• The * is the convolution operation in x and y; it "applies" Gaussian blur G onto the image I
Scale-space extrema detection

  D(x, y, σ) = ( G(x, y, kσ) − G(x, y, σ) ) * I(x, y)    (2)
             = L(x, y, kσ) − L(x, y, σ)                  (3)
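A sketch of building the difference-of-Gaussian images D = L(kσ) − L(σ) (scipy assumed; σ₀ = 1.6 and k = √2 are conventional choices, not mandated by the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma0=1.6, k=2 ** 0.5, n=4):
    """Difference-of-Gaussian images D(sigma) = L(k*sigma) - L(sigma) over n scales."""
    sigmas = [sigma0 * k ** i for i in range(n + 1)]
    L = [gaussian_filter(image.astype(float), s) for s in sigmas]  # blurred images L
    return [L[i + 1] - L[i] for i in range(n)]
```

Keypoints are then taken at local extrema of D across both space and scale.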
Keypoint localization
• There are still a lot of points; some of them are not good enough (e.g., points lying along an edge).
• Such a point has a large principal curvature across the edge but a small one in the perpendicular direction.
• The principal curvatures can be calculated from a Hessian matrix.
Related descriptors:
• SURF
• GLOH
SIFT Steps - Review
2 × 39 × 39 = 3042-dimensional vector

PCA-SIFT
• PCA projects the mean-subtracted vector down to K dimensions:

  I′(K×1) = A(K×N) · ( I(N×1) − Ī(N×1) )
Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, “SURF: Speeded Up Robust Features”, European Conference on Computer Vision (ECCV), 2006.
Integral Image
• The integral image IΣ(x,y) of an image I(x, y) represents
the sum of all pixels in I(x,y) of a rectangular region
formed by (0,0) and (x,y).
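The definition above can be sketched with cumulative sums; once the integral image is built, the sum over any axis-aligned rectangle needs only four lookups (function names illustrative):

```python
import numpy as np

def integral_image(image):
    """I_sum(x, y) = sum of all pixels in the rectangle from (0, 0) to (x, y)."""
    return image.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of image[r0:r1+1, c0:c1+1] using at most 4 integral-image lookups."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]     # add back the doubly-subtracted corner
    return total
```

This constant-time box sum is what makes SURF's box filters fast at any scale.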
SURF approximates the Hessian:

  H_approx = [ L̂xx  L̂xy ]
             [ L̂yx  L̂yy ]
Using box filters
SURF: Speeded Up Robust Features (cont’d)
• Instead of using a different measure for selecting the location and scale of interest points (e.g., Hessian and DoG in SIFT), SURF uses det(H_approx) to find both.
(Figure: Haar wavelets used to compute the x response and y response; responses weighted with a Gaussian; wavelet side length = 4σ.)
– More discriminatory!
SURF: Speeded Up Robust
Features
• Has been reported to be 3 times faster than SIFT.
Feature distance

How to define the difference between two features f1, f2?
• Simple approach: SSD(f1, f2), the sum of squared differences between the two descriptors
• Better approach: ratio distance = SSD(f1, f2) / SSD(f1, f2′)
  – f2 is the best SSD match to f1 in I2
  – f2′ is the 2nd-best SSD match to f1 in I2
  – gives large values (close to 1) for ambiguous matches
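A sketch of ratio-distance matching for arrays of descriptors (numpy only; the 0.8 acceptance threshold is an assumption, not from the slides):

```python
import numpy as np

def ratio_match(desc1, desc2, max_ratio=0.8):
    """Match rows of desc1 to rows of desc2, keeping only unambiguous matches.

    A match is kept when SSD(f1, f2) / SSD(f1, f2') < max_ratio, where f2 and
    f2' are the best and 2nd-best SSD matches to f1.
    """
    matches = []
    for i, f1 in enumerate(desc1):
        ssd = ((desc2 - f1) ** 2).sum(axis=1)      # SSD(f1, every f2)
        order = np.argsort(ssd)
        best, second = order[0], order[1]
        if ssd[best] < max_ratio * ssd[second]:    # ratio distance test
            matches.append((i, int(best)))
    return matches
```

Ambiguous features (two near-equal candidates) have a ratio near 1 and are rejected.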
Evaluating the results
How can we measure the performance of a feature matcher?
(Figure: candidate matches ranked by feature distance, e.g., 50, 75, 200.)

True/false positives
(Figure: the match at distance 50 is a true match; the match at distance 200 is a false match. A threshold on feature distance, e.g., 0.7, decides which matches are accepted.)
SIFT usage:
• Recognize charging station
• Communicate with visual cards
• Teach object recognition
RANSAC
• We can ‘search’ in parameter space by trying many potential parameter values and seeing which set of parameters ‘agrees’ with / fits our set of features.
Voting as a fitting technique
• It is not feasible to fit a model to every possible subset of features and check each one. For example, the naïve line fitting we saw last time was O(N²).
• Voting is a general technique where we let the features vote for all models that are compatible with them.
  – Cycle through features, cast votes for model parameters.
RANSAC
• RANdom SAmple Consensus
• Approach: we want to avoid the impact of outliers, so let’s look for “inliers” and use only those.
• Intuition: if an outlier is chosen to compute the current fit, then the resulting line won’t have much support from the rest of the points.
RANSAC loop:
Repeat for k iterations:
1. Randomly select a seed group of points on which to perform a model estimate (e.g., a group of edge points)
2. Compute a model estimate from the seed group
3. Find inliers to this model (points close to the model)
4. If the number of inliers is sufficiently large, re-compute the estimate using all inliers
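The loop above can be sketched for line fitting (numpy only; the iteration count, inlier threshold, and names are illustrative choices):

```python
import numpy as np

def ransac_line(points, k=100, threshold=0.1, rng=None):
    """RANSAC line fit: sample minimal sets, keep the line with the most inliers."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(k):
        # 1. randomly select a seed group (minimal sample: 2 points)
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        # 2. compute a model estimate: the line through p and q as a*x + b*y + c = 0
        a, b = q[1] - p[1], p[0] - q[0]
        norm = np.hypot(a, b)
        if norm == 0:
            continue                       # degenerate sample (identical points)
        c = -(a * p[0] + b * p[1])
        # 3. find inliers: points within `threshold` of the line
        dist = np.abs(points @ np.array([a, b]) + c) / norm
        inliers = dist < threshold
        # 4. keep the model with the largest consensus set
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

A final re-fit on the returned inlier set would complete the loop.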
RANSAC Line Fitting Example
• Task: estimate the best line
  – How many points do we need to estimate the line? Two.
• Sample two points
RANSAC Line Fitting Example
• Choose k high enough to keep the probability that every sample contains an outlier below the desired failure rate.
RANSAC: Computed k (p = 0.99)

Sample size   Proportion of outliers
    n         5%   10%   20%   25%   30%   40%    50%
    2          2    3     5     6     7    11     17
    3          3    4     7     9    11    19     35
    4          3    5     9    13    17    34     72
    5          4    6    12    17    26    57    146
    6          4    7    16    24    37    97    293
    7          4    8    20    33    54   163    588
    8          5    9    26    44    78   272   1177
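These entries follow from the standard sample-count formula k = log(1 − p) / log(1 − (1 − ε)ⁿ), where ε is the outlier proportion and a sample of n points succeeds only if it is all inliers; a small sketch (function name illustrative):

```python
import math

def ransac_iterations(n, outlier_ratio, p=0.99):
    """Iterations k so that, with probability p, at least one random
    sample of n points is outlier-free."""
    w = (1.0 - outlier_ratio) ** n          # prob. a single sample is all inliers
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w))
```

For example, n = 2 with 50% outliers gives k = 17, matching the table.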
• RANSAC computes its best estimate from a minimal sample of n points, and divides all data points into inliers and outliers.

(Slide credit: David Lowe)
RANSAC: Pros and Cons