GNR602-Lec15-21 Image Segmentation and Feature Detection
Slot 13
Lecture 15-21 Image Segmentation Techniques
Definition
• Given by Theodosis Pavlidis
• Widely accepted, since it covers the logical
process of image segmentation
• If the image is decomposed into its constituent parts, each part should correspond to some object in the real world, and one or more parts build up the entire object
Formally, the segmented regions $S_i$ together make up the entire image: $\bigcup_{i=1}^{r} S_i$
Obviously this condition does not make sense if the regions are not
spatially adjacent, since the regions representing the same type of
objects can be physically separated.
Cont…
• The goal of segmentation is to simplify and/or
change the representation of an image into
something that is more meaningful and easier to
analyze.
Image thresholding
• The conventional thresholding schemes are:
– Global Thresholding
– Local Thresholding
– Dynamic Thresholding
thresholding
Figure: gray-level histogram (counts ×10⁴ vs. gray level, 0–250) with a single threshold and with multiple thresholds marked.
Global Thresholding
• Identification of a level P such that all pixels in the image with gray level below P are represented by a value B, and all pixels with gray level equal to or above P are represented by a value W.
• g(i,j) = T(f(i,j)) = B if f(i,j) < P
= W otherwise
• T(.) is the thresholding function.
• This operation applies globally to all pixels in the image
irrespective of their position or the gray levels of their
neighbors. This is performed by a simple table-lookup
operation in real time
Example output for T = 5.5:
0 0 1 1 1 1 1 1 0 0
0 0 1 1 0 1 1 1 0 0
0 0 1 0 0 0 1 1 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
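A minimal NumPy sketch of this table-lookup operation (function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def global_threshold(f, P, B=0, W=1):
    """g(i,j) = B if f(i,j) < P, else W, applied to every pixel."""
    return np.where(f < P, B, W)

# Tiny usage example with P = 5.5, as in the grid above
f = np.array([[3.0, 7.2], [5.5, 1.1]])
print(global_threshold(f, P=5.5))   # [[0 1] [1 0]]
```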
Local Thresholding
• Local thresholding varies the threshold based on
the region in which the pixel is located
Figure: the image is divided into blocks B11, B12, B13, …, and each block is thresholded with its own threshold (e.g., T = 4 for one block and T = 7 for another), so the same gray level can map to different binary values in different blocks.
Dynamic Thresholding
• The threshold is allowed to vary from pixel to
pixel in this case.
• The procedure is similar to local thresholding.
The additional step here is the generation of a
threshold surface by interpolating the threshold
values computed for different blocks.
• The threshold surface provides a threshold for
each pixel of the image.
• This overcomes the limitation of discontinuities
at the block boundaries
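A sketch of the threshold-surface idea, assuming SciPy is available; the per-block rule here is simply the block mean, a stand-in for whatever per-block threshold is actually computed:

```python
import numpy as np
from scipy.ndimage import zoom

def dynamic_threshold(f, block=32):
    h, w = f.shape
    nby, nbx = h // block, w // block
    # one threshold per block (block mean, as an illustrative rule)
    T = f[:nby * block, :nbx * block] \
        .reshape(nby, block, nbx, block).mean(axis=(1, 3))
    # interpolate the coarse grid into a smooth threshold surface,
    # avoiding discontinuities at block boundaries
    surface = zoom(T, (h / nby, w / nbx), order=1)
    return (f >= surface).astype(np.uint8)
```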
Figure: foreground and background histograms — where is the optimal threshold? Effect of smoothing: median filtering before thresholding.
Original artwork from the book Digital Image Processing by R.C. Gonzalez and R.E. Woods ©
R.C. Gonzalez and R.E. Woods, reproduced with permission granted to instructors by authors on
the website www.imageprocessingplace.com
Optimal Thresholding
• Histogram shape can be useful in locating the threshold. However, it is not reliable for threshold selection when the peaks are not clearly resolved
• Optimal thresholding: a criterion function is
devised that yields some measure of
separation between regions
• A criterion function is calculated for each
intensity and that which maximizes/minimizes
this function is chosen as the threshold
Otsu’s Method
• Otsu’s thresholding method is based on selecting the lowest point between the two classes (histogram peaks).
• Based on the threshold, the two classes have respective means and standard deviations
Otsu’s Method
• Analysis of variance (variance = standard deviation²). With threshold T, gray levels 0, …, L−1, histogram counts $n_i$, and class pixel counts $N_0$, $N_1$ (total N):

$\mu_0 = \frac{1}{N_0}\sum_{i=0}^{T} n_i\, i \qquad \mu_1 = \frac{1}{N_1}\sum_{i=T+1}^{L-1} n_i\, i \qquad \mu = \frac{1}{N}\sum_{i=0}^{L-1} n_i\, i$

$\sigma^2 = \frac{1}{N}\sum_{i=0}^{L-1} (i - \mu)^2\, n_i$
Otsu’s Method
Between-class variance ($\sigma_b^2$): the variation of the mean values for each class from the overall intensity mean of all pixels:

$\sigma_b^2 = w_0(\mu_0 - \mu)^2 + w_1(\mu_1 - \mu)^2$

where $w_0$ and $w_1$ are the fractions of pixels in the two classes.
Otsu’s Method
• The criterion function is the ratio of the between-class variance to the total variance:

$\eta(T) = \sigma_b^2 / \sigma^2$

• Since $\sigma_b^2$ is a function of the threshold T, $\eta(T)$ is evaluated for all possible thresholds, and the one that maximizes it is chosen as the optimal threshold
Otsu’s Method
• The within cluster variance need not be
separately minimized, since maximizing
between cluster variance automatically
minimizes within cluster variance. This is
because the sum of between cluster variance
and within cluster variance equals the total
variance of the image, which is independent of
the threshold chosen
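An exhaustive-search sketch of Otsu's method in NumPy (illustrative names, 8-bit image assumed):

```python
import numpy as np

def otsu_threshold(img, levels=256):
    """Pick T maximizing sigma_b^2 = w0*(mu0-mu)^2 + w1*(mu1-mu)^2."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()              # normalized histogram
    i = np.arange(levels)
    mu = (p * i).sum()                 # overall mean
    best_T, best_var = 0, -1.0
    for T in range(levels - 1):
        w0 = p[:T + 1].sum()
        w1 = 1.0 - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:T + 1] * i[:T + 1]).sum() / w0
        mu1 = (p[T + 1:] * i[T + 1:]).sum() / w1
        var_b = w0 * (mu0 - mu) ** 2 + w1 * (mu1 - mu) ** 2
        # maximizing sigma_b^2 also minimizes within-class variance,
        # since sigma_b^2 + sigma_w^2 = sigma^2 (a constant)
        if var_b > best_var:
            best_T, best_var = T, var_b
    return best_T
```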
Examples of multilevel thresholding:
• Input image with 2 thresholds → 3 display levels (0, 127, 254)
• Input image with 6 thresholds → 7 display levels (0, 42, 84, 126, 168, 210, 252)
Entropy Method
• Entropy serves as a measure of information content
• A threshold level t separates the whole information into two classes, and the entropies associated with them are:
$H_b = -\sum_{i=0}^{t} p_i \log p_i \qquad H_w = -\sum_{i=t+1}^{255} p_i \log p_i$
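A sketch of the entropy criterion; here the probabilities are renormalized within each class (Kapur's variant), which is an assumption beyond the formula above:

```python
import numpy as np

def entropy_threshold(img, levels=256):
    """Pick t maximizing H_b(t) + H_w(t) over the two classes."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    best_t, best_H = 0, -np.inf
    for t in range(1, levels - 1):
        P0, P1 = p[:t + 1].sum(), p[t + 1:].sum()
        if P0 == 0 or P1 == 0:
            continue
        q0, q1 = p[:t + 1] / P0, p[t + 1:] / P1   # per-class renormalization
        H = -(q0[q0 > 0] * np.log(q0[q0 > 0])).sum() \
            - (q1[q1 > 0] * np.log(q1[q1 > 0])).sum()
        if H > best_H:
            best_t, best_H = t, H
    return best_t
```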
Each class is modeled by a Gaussian distribution:

$p(g \mid i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left[-\frac{(g - \mu_i)^2}{2\sigma_i^2}\right]$
To avoid confusion, we write p(g|i) as p(g|i,T) since classes 1
and 2 are formed according to the threshold T chosen in our
context.
The class means as functions of the threshold T, with histogram h(i):

$\mu_1(T) = \frac{\sum_{i=0}^{T} i\, h(i)}{P_1(T)} \qquad \mu_2(T) = \frac{\sum_{i=T+1}^{n} i\, h(i)}{P_2(T)}$
Minimum Error Thresholding
• The probability of a pixel being mapped correctly
(below threshold or above threshold) is denoted
by P(g,T) and given by
$P(g, T) = \frac{p(g \mid i, T)\, P_i(T)}{p(g)}$

where i = 1 if g ≤ T, and i = 2 if g > T
Substituting the expressions for the individual class
conditional distributions, and taking logarithms, and
multiplying by -2, we can rewrite P(g,T)
Minimum Error Thresholding
$P(g, T) \rightarrow \left(\frac{g - \mu_i(T)}{\sigma_i(T)}\right)^2 + 2\log \sigma_i(T) - 2\log P_i(T)$

where i = 1 if g ≤ T, and i = 2 if g > T (Verify!)
The average performance for the whole image is
given by
$J(T) = \sum_{g} p(g)\, P(g, T)$
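A direct evaluation of J(T) as a sketch (illustrative names; gray levels assumed 0–255):

```python
import numpy as np

def min_error_threshold(img, levels=256):
    """Pick T minimizing J(T) = sum_g p(g) * eps(g, T), where
    eps(g,T) = ((g - mu_i)/sigma_i)^2 + 2 ln sigma_i - 2 ln P_i."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    g = np.arange(levels)
    best_T, best_J = 0, np.inf
    for T in range(1, levels - 2):
        P1, P2 = p[:T + 1].sum(), p[T + 1:].sum()
        if P1 == 0 or P2 == 0:
            continue
        mu1 = (p[:T + 1] * g[:T + 1]).sum() / P1
        mu2 = (p[T + 1:] * g[T + 1:]).sum() / P2
        s1 = np.sqrt((p[:T + 1] * (g[:T + 1] - mu1) ** 2).sum() / P1)
        s2 = np.sqrt((p[T + 1:] * (g[T + 1:] - mu2) ** 2).sum() / P2)
        if s1 == 0 or s2 == 0:
            continue
        e1 = ((g[:T + 1] - mu1) / s1) ** 2 + 2 * np.log(s1) - 2 * np.log(P1)
        e2 = ((g[T + 1:] - mu2) / s2) ** 2 + 2 * np.log(s2) - 2 * np.log(P2)
        J = (p[:T + 1] * e1).sum() + (p[T + 1:] * e2).sum()
        if J < best_J:
            best_T, best_J = T, J
    return best_T
```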
What is an Edge ?
• An edge is a discontinuity in the perceptual property –
brightness / color / texture / surface orientation
• An edge is a set of connected pixels that lie on the
boundary between two regions
• The pixels on an edge are called edge pixels or edgels
• Gray level / color / texture discontinuity across an edge
causes edge perception
• Position & orientation of edge are key properties
Different Edges
Cont…
• The boundaries of an object are often considered to be
analogous to its edges.
• These boundaries are discovered by following a path of
rapid change in image intensity.
• Most edge-detection functions look for places in the
image where the intensity changes rapidly by locating
places where the first derivative of the intensity is larger
in magnitude than some threshold, or finding places
where the second derivative of the intensity has a zero
crossing
Types of Edge
Figure: different kinds of edges — ideal step edges and real edges.
First-order differences averaged over a 2×2 neighborhood:

$\frac{\partial I}{\partial x} \approx \frac{1}{2}\left[(I_{i+1,j} - I_{i,j}) + (I_{i+1,j+1} - I_{i,j+1})\right]$

$\frac{\partial I}{\partial y} \approx \frac{1}{2}\left[(I_{i,j+1} - I_{i,j}) + (I_{i+1,j+1} - I_{i+1,j})\right]$

Convolution masks:

$\frac{\partial I}{\partial x}: \ \frac{1}{2}\begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix} \qquad \frac{\partial I}{\partial y}: \ \frac{1}{2}\begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}$
Convolution masks for the Laplacian:

$\nabla^2 I \approx \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \quad \text{or} \quad \frac{1}{20}\begin{bmatrix} 1 & 4 & 1 \\ 4 & -20 & 4 \\ 1 & 4 & 1 \end{bmatrix} \ \text{(more accurate)}$
Sobel operators
Better approximations of the gradients exist.

Prewitt (3×3):

$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} \qquad G_y = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}$

Sobel (5×5):

$G_x = \begin{bmatrix} -1 & -2 & 0 & 2 & 1 \\ -2 & -3 & 0 & 3 & 2 \\ -3 & -5 & 0 & 5 & 3 \\ -2 & -3 & 0 & 3 & 2 \\ -1 & -2 & 0 & 2 & 1 \end{bmatrix} \qquad G_y = \begin{bmatrix} 1 & 2 & 3 & 2 & 1 \\ 2 & 3 & 5 & 3 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ -2 & -3 & -5 & -3 & -2 \\ -1 & -2 & -3 & -2 & -1 \end{bmatrix}$

Larger masks: poorer localization, less noise sensitivity, better detection.
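A sketch of gradient computation with the standard 3×3 Sobel masks (the 3×3 variant is an assumption here — the slide shows the 5×5 masks), using SciPy for the convolutions:

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_gradient(img):
    """Return gradient magnitude and orientation from 3x3 Sobel masks."""
    Kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)
    Ky = Kx.T                    # vertical-gradient mask
    gx = convolve(img.astype(float), Kx, mode='nearest')
    gy = convolve(img.astype(float), Ky, mode='nearest')
    return np.hypot(gx, gy), np.arctan2(gy, gx)
```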
$\nabla^2 f(x, y) = \frac{\partial^2 f(x, y)}{\partial x^2} + \frac{\partial^2 f(x, y)}{\partial y^2}$
Cont…
Common Laplacian kernels are the 3×3 masks shown above.
Laplacian of Gaussian
• The noise in the input image is reduced by smoothing.
• Among the various smoothing operators, Gaussian filter
has desirable properties in terms of space-frequency
localization.
• The input image is therefore smoothed using the Gaussian-shaped smoothing operator, whose width σ is user-controllable.
• In this approach, the image is first convolved with the Gaussian filter and the Laplacian is then applied:

$g(x, y) = \nabla^2\left[G(x, y, \sigma) * f(x, y)\right]$
Laplacian of Gaussian
Figure: the LoG operator, its zero-crossings, and other 2D edge detection filters.
Cont…
• The order of performing differentiation and convolution
can be interchanged because of the linearity of the
operators involved:
$g(x, y) = \left[\nabla^2 G(x, y)\right] * f(x, y)$
Cont…
The LoG response is:
• zero at a long distance from the edge,
• positive just to one side of the edge,
• negative just to the other side of the edge,
• zero at some point in between, on the edge itself.
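A LoG zero-crossing sketch, assuming SciPy's combined smooth-and-Laplacian operator:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_edges(img, sigma=2.0):
    """Mark zero crossings of the LoG response as edge pixels."""
    r = gaussian_laplace(img.astype(float), sigma)
    edges = np.zeros_like(r, dtype=bool)
    # a sign change between a pixel and its right/bottom neighbor
    edges[:, :-1] |= (r[:, :-1] * r[:, 1:]) < 0
    edges[:-1, :] |= (r[:-1, :] * r[1:, :]) < 0
    return edges
```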
Cont…
Applications of LoG
• Spurious edges may be detected away from any obvious edge; increasing the smoothing of the Gaussian preserves only the strong edges.
• Compute the gradient of the LoG at each zero crossing (i.e., the third derivative of the original image) and keep only zero crossings where this is above a certain threshold. This tends to retain only the stronger edges, but it is sensitive to noise, since the third derivative greatly amplifies any high-frequency noise in the image.
Cont…
• Reduce the current sigma by the step size again
• Repeat the process till the desired scale of sigma is reached
Advantage :
• The degree of detail can be controlled by the choice of
starting sigma and step size.
• Computational load can be reduced by limiting the zero
crossing detection at smaller sigma values to those
locations that are in the neighborhood of those detected
at the previous level
Refer
• F. Bergholm’s paper “Edge Focusing” in IEEE
Trans. Pattern Analysis and Machine
Intelligence, November 1987.
Canny Edge Detection
• There are four steps, following the diagram: smoothing, gradient computation, non-maximal suppression, and hysteresis thresholding.
Smoothing
• For the smoothing step, a Gaussian low-pass filter is used.
• The standard deviation σ determines the width of the filter and hence the amount of smoothing.

$S(x, y) = G(x, y, \sigma) * f(x, y)$

• Let f(x,y) denote the input image. Convolving the image with the Gaussian smoothing filter (using separable filtering) gives an array of smoothed data S(x,y); the spread σ of the Gaussian controls the degree of smoothing.
Cont…
The edge enhancement step simply involves
calculation of the gradient vector at each pixel of
the smoothed image.
$g_x(x, y) = S(x+1, y) - S(x-1, y) \qquad g_y(x, y) = S(x, y+1) - S(x, y-1)$

$\text{Magnitude: } g(x, y) = \sqrt{g_x^2 + g_y^2} \qquad \theta = \tan^{-1}\frac{g_y}{g_x}$
Gradient
• At each point convolve with

$G_x = \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix} \qquad G_y = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}$

• Magnitude and orientation of the gradient are computed as

$M[i, j] = \sqrt{P[i, j]^2 + Q[i, j]^2}$
NMS
• The localization step has two stages: non-maximal
suppression and hysteresis thresholding
Non-Maximal Suppression (NMS) thins the ridges of gradient magnitude in the magnitude image by suppressing all values along the line of the gradient that are not peak values of a ridge. The gradient direction is quantized into sectors (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°):

$s = \mathrm{Sector}[\theta(x, y)]$
NMS
• Search in a 3x3 neighborhood at every point in magnitude
image
• Consider the gradient direction at the centre pixel
• Compare the edge magnitude at centre pixel with its two
neighbors along the gradient direction.
• If the magnitude value at the center point is not greater than both of the neighbor magnitudes along the gradient direction, then g(x,y) is set to zero. The values for the height of the ridge are retained in the NMS magnitude image:

$g_{\text{suppressed}}(x, y) = \mathrm{NMS}[g(x, y), s]$
Non-Maxima Suppression
• After nonmaxima suppression one ends up with
an image which is zero everywhere except the
local maxima points.
• Thin edges by keeping large values of Gradient
Principle of NMS
• Thin the broad ridges in M[i,j] into ridges that are only
one pixel wide
• Find local maxima in M[i,j] by suppressing all values
along the line of the Gradient that are not peak values of
the ridge
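A plain-Python NMS sketch over four quantized gradient sectors (loop-based for clarity, not speed; names are illustrative):

```python
import numpy as np

def non_max_suppress(mag, theta):
    """Zero out any pixel not >= both neighbors along its gradient line."""
    out = np.zeros_like(mag)
    # quantize theta (radians) into sectors 0: 0 deg, 1: 45, 2: 90, 3: 135
    sector = (np.round(theta / (np.pi / 4)) % 4).astype(int)
    offs = {0: (0, 1), 1: (-1, 1), 2: (-1, 0), 3: (-1, -1)}  # (dy, dx)
    h, w = mag.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy, dx = offs[sector[y, x]]
            if mag[y, x] >= mag[y + dy, x + dx] and \
               mag[y, x] >= mag[y - dy, x - dx]:
                out[y, x] = mag[y, x]   # keep ridge-peak values only
    return out
```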
Figure: example magnitude array M[i,j] — false edges on the left, gaps in the contour on the right.
Gradient Orientation
• Reduce angle of Gradient
θ[i,j] to one of the 4 sectors
• Check the 3x3 region of each
M[i,j]
• If the value at the center is
not greater than the 2 values
along the gradient, then M[i,j]
is set to 0
Figure: the same array after suppression — most values are zeroed, but some false edges remain.
Figure: after NMS and double thresholding, pixels are labeled definite edges, definite non-edges, or "to be considered".
Thresholding
• Reduce number of false edges by applying a
threshold T
– all values below T are changed to 0
– selecting a good value for T is difficult
– some false edges will remain if T is too low
– some edges will disappear if T is too high
– some edges will disappear due to softening of
the edge contrast by shadows
Double Thresholding
• Apply two thresholds in the suppressed image
– T2 > T1
– two images in the output
– the image from T2 contains fewer edges but has gaps
in the contours
– the image from T1 has many false edges
– combine the results from T1 and T2
– link the edges of T2 into contours until we reach a gap
– link the edge from T2 with edge pixels from a T1
contour until a T2 edge is found again
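A sketch of the double-threshold linking using connected components — an equivalent formulation of the edge-linking described above (scipy.ndimage.label is 4-connected by default; pass a 3×3 structure for 8-connectivity):

```python
import numpy as np
from scipy.ndimage import label

def hysteresis(mag, T1, T2):
    """Keep every weak (> T1) component that touches a strong (> T2) pixel."""
    weak = mag > T1
    strong = mag > T2
    lab, n = label(weak)                  # connected components of weak map
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(lab[strong])] = True   # components containing strong pixels
    keep[0] = False                       # background label
    return keep[lab]
```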
Input Image
Canny Output s=1, T1=0.2, T2=0.5
Canny Output s=1, T1=0.4, T2=0.7
Input Image
Canny Output s=2, T1=0.2, T2=0.5
Canny Output s=2, T1=0.4, T2=0.7
Input Image
Canny Output s=5, T1=0.2, T2=0.5
Canny Output s=5, T1=0.4, T2=0.7
Figure: spatial vs. spectral detail — IRS-1C LISS3 (23.5 m resolution) and an airborne natural color composite.
Evolution of Segmentation
Techniques
• Pixel based classification using spectral features
(Landsat, IRS 1C/1D, SPOT 1,2,3)
• Pixel based classification using spectral and
textural features (IRS P6, SPOT 5, …)
• Object based classification using spatial
features, spectral and textural features
Step 1
Preprocessor
– Image smoothing
– Suppress noise / eliminate minute detail that
is not of interest
– Adaptive Gaussian / Median
– Very useful to produce proper segmentation
– Optional step
Preprocessing
Mean Filter
Gaussian Filter
$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$
Preprocessing
A (close-open) alternating sequential morphological filter:

$C_p O_p C_{p-1} O_{p-1} \cdots C_1 O_1 (I)$
• $\nabla I = (G_x, G_y)^T$
• The superscript t in $I^t$ stands for the iteration index; $I^0(x, y)$ is the original image.
Algorithm
• Since the weights assigned to neighboring pixels decrease with the magnitude of the gradient, we consider

$w_t(x, y) = e^{-\frac{d_t(x, y)^2}{2k^2}}$

where $d_t(x, y)$ is the gradient magnitude at iteration t, and the weights are normalized over the 3×3 neighborhood by $N_t = \sum_{i=-1}^{1}\sum_{j=-1}^{1} w_t(x+i, y+j)$.
Figure: illustration of the working of edge-preserving smoothing.
Tree structure
Each parent node has four children: the NW, NE, SW, and SE nodes.
Node attributes: node type (root/leaf/intermediate), size, mean, texture.
Splitting of an image — example gray-level array:
4 7 8 8 20 22 22 22
10 8 8 8 21 23 22 22
12 12 10 10 26 25 40 50
12 12 14 16 27 28 45 48
Merging process: similar adjacent blocks of the split image are merged back into larger regions.
$\delta(x) = \left| g(x) - \mathrm{mean}[g(y)] \right|$

is used to determine whether x can be added to region y. A small value of δ(x) means high affinity of x to region y. Over several candidate regions:

$\delta(x) = \min_i \left| g(x) - \mathrm{mean}[g(y_i)] \right|$
Algorithm
• Alternatively, x can be labeled as a
boundary pixel, and we append a pixel z to
region Ai such that
$\delta(z) = \min_x \delta(x)$

• This constitutes one iteration, which is repeated till all pixels in the image are allocated to one region or another.
Example gray-level array for region growing:
5 6 7 22 25
11 10 12 29 28
20 104 20 26 27
Window-averaged intensity change for a shift (u, v):

$E(u, v) = \sum_{x,y} w(x, y)\left[I(x+u, y+v) - I(x, y)\right]^2$

For small shifts this is approximated by a quadratic form:

$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$
Harris Detector: Mathematics
Intensity change in a shifting window — eigenvalue analysis:

$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$, where $\lambda_1, \lambda_2$ are the eigenvalues of M.

The ellipse E(u, v) = const has its axes along the directions of fastest and slowest change, with semi-axis lengths $(\lambda_{\max})^{-1/2}$ and $(\lambda_{\min})^{-1/2}$.
Example to show a
polynomial as an ellipse
Let the polynomial be 4x2 – 8x + y2 + 4y = 8
The standard form of an ellipse with centre
at (p,q) and semi-axes given by a and b is:
(x-p)2 / a2 + (y-q)2 / b2 = 1
The above polynomial can be rewritten as:
4x2 – 8x + 4 + y2 + 4y + 4 = 8 + 4 + 4
4(x – 1)2 + (y + 2)2 = 16 or
(x – 1)2 / 4 + (y + 2)2 / 16 = 1
Harris Detector: Threshold
Classification of image points using the eigenvalues of M:
• $\lambda_1 \gg \lambda_2$ or $\lambda_2 \gg \lambda_1$: "Edge"
• $\lambda_1$ and $\lambda_2$ large, $\lambda_1 \sim \lambda_2$: "Corner" — E increases in all directions
• $\lambda_1$ and $\lambda_2$ small: "Flat" region
Harris Detector: Threshold
Measure of corner response:

$R = \det M - k\,(\mathrm{trace}\, M)^2$

$\det M = \lambda_1 \lambda_2 \qquad \mathrm{trace}\, M = \lambda_1 + \lambda_2$

• R depends only on the eigenvalues of M
• R is large for a corner (R > 0)
• R is negative with large magnitude for an edge (R < 0)
• |R| is small for a flat region
Harris Detector
• The Algorithm:
– Find points with large corner response
function R (R > threshold)
– Take the points of local maxima of R
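A sketch of the algorithm, assuming SciPy; the Gaussian window plays the role of w(x, y), and k = 0.04 is a typical (assumed) choice:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel, maximum_filter

def harris_response(img, sigma=1.5, k=0.04):
    """R = det(M) - k * trace(M)^2, with Gaussian-weighted gradient products."""
    Ix = sobel(img.astype(float), axis=1)
    Iy = sobel(img.astype(float), axis=0)
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def harris_corners(img, thresh=0.01):
    """Threshold R, then keep only local maxima of R (3x3 window)."""
    R = harris_response(img)
    peaks = (R == maximum_filter(R, size=3)) & (R > thresh * R.max())
    return np.argwhere(peaks)   # (row, col) corner coordinates
```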
Harris Detector: Workflow
Harris Detector: Workflow
Compute corner response R
Harris Detector: Workflow
Find points with large corner response: R>threshold
Harris Detector: Workflow
Take only the points of local maxima of R
Harris Detector: Workflow
Harris Detector: Summary
Figure: corner response R, thresholded and reduced to local maxima.
• Geometric invariance:
– Rotation
– Similarity (rotation + uniform scale)
Gradient orientation and magnitude at each pixel:

$\theta(x, y) = \tan^{-1}\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$

$m(x, y) = \sqrt{\left[L(x+1, y) - L(x-1, y)\right]^2 + \left[L(x, y+1) - L(x, y-1)\right]^2}$

※ L is the luminance value of the pixel
HOG feature extraction algorithm (2)
3. Create a histogram of gradient orientations for each cell (5×5 pixels) using the calculated gradient magnitude and orientation.
– The orientation bins are evenly spaced over 0°–180°, with nine bins of 20° each. The histogram is generated by adding the magnitude of the luminance gradient for each orientation. (Example image size: 60×30.)
HOG feature extraction algorithm (3)
4. Normalization and Descriptor Blocks
– Normalization is performed using the following equation:

$v(n) \leftarrow \frac{v(n)}{\sqrt{\sum_{k=1}^{3\times3\times9} v(k)^2 + 1}}$

where v(n) is the magnitude of each direction.
The normalized vector is computed over 9 cells and 9 orientations; for a block, the vector size = 9×9 = 81.
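A sketch of one HOG block matching the slide's parameters (5×5-pixel cells, 9 unsigned orientation bins over 0°–180°, one 3×3-cell block normalized by v/√(Σv²+1)); names are illustrative:

```python
import numpy as np

def hog_block(img, cell=5, bins=9):
    f = img.astype(float)
    gx = np.zeros_like(f); gy = np.zeros_like(f)
    gx[:, 1:-1] = f[:, 2:] - f[:, :-2]    # L(x+1, y) - L(x-1, y)
    gy[1:-1, :] = f[2:, :] - f[:-2, :]    # L(x, y+1) - L(x, y-1)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    ncy, ncx = f.shape[0] // cell, f.shape[1] // cell
    hist = np.zeros((ncy, ncx, bins))
    for cy in range(ncy):
        for cx in range(ncx):
            m = mag[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell].ravel()
            a = ang[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell].ravel()
            b = np.minimum((a // (180.0 / bins)).astype(int), bins - 1)
            np.add.at(hist[cy, cx], b, m)  # magnitude-weighted voting
    v = hist[:3, :3].ravel()               # one 3x3-cell block: 81 values
    return v / np.sqrt((v ** 2).sum() + 1.0)
```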
HOG image descriptor size — example:
• 12×6 cells
• Number of orientations = 9
• Block size = 3×3 = 9 cells
• The block moves 4 steps to the right and 10 steps down
• Descriptor size for the total image = 10 × 4 × 9 × 9 = 3240
Example of using HOG
• HOG can represent the rough shape of an object, so it has been used for general object recognition, such as of people or cars.
• To achieve general object recognition, a classifier (e.g., SVM) is used:
1. Train the classifier with correct (positive) and incorrect (negative) images.
2. Scan the classifier over the image to determine whether there are people in the detection window.
SVM-Based Classification
SVM divides the space into two domains according to a teacher (training) signal.
New examples are predicted to belong to a category based on which side of the gap (margin) they fall.
Comparison with Different
Feature Descriptors
Summary
Steps
Parameters
Multi-Scale Oriented Patches
[ Microsoft Digital Image Pro version 10 ]
Ideas from Matt’s Multi-Scale
Oriented Patches
1. Detect an interesting patch with an interest
operator. Patches are translation invariant.
2. Determine its dominant orientation.
3. Rotate the patch so that the dominant
orientation points upward. This makes the
patches rotation invariant.
4. Do this at multiple scales, converting them all
to one scale through sampling.
5. Convert to illumination “invariant” form
Implementation Concern:
How do you rotate a patch?
• Start with an “empty” patch whose
dominant direction is “up”.
• For each pixel in your patch, compute the
position in the detected image patch. It will
be in floating point and will fall between
the image pixels.
• Interpolate the values of the 4 closest
pixels in the image, to get a value for the
pixel in your patch.
Rotating a Patch
Figure: the transformation T maps patch coordinates (x, y) to image coordinates (x′, y′).
Using Bilinear Interpolation
• Use all 4 adjacent samples
Figure: the four adjacent samples I00, I10, I01, I11 surrounding the point (x, y).
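A sketch of the weighting (assumes (x, y) lies strictly inside the image):

```python
import numpy as np

def bilinear(img, x, y):
    """Sample img at floating-point (x, y) from the 4 surrounding pixels."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    I00, I10 = img[y0, x0],     img[y0, x0 + 1]
    I01, I11 = img[y0 + 1, x0], img[y0 + 1, x0 + 1]
    # weights are the areas of the opposite sub-rectangles
    return (I00 * (1 - dx) * (1 - dy) + I10 * dx * (1 - dy) +
            I01 * (1 - dx) * dy       + I11 * dx * dy)
```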
SIFT: Motivation
• The Harris operator is not invariant to scale, and correlation is not invariant to rotation.
SIFT Features
Claimed Advantages of SIFT
• Locality: features are local, so robust to occlusion
and clutter (no prior segmentation)
• Distinctiveness: individual features can be
matched to a large database of objects
• Quantity: many features can be generated for even
small objects
• Efficiency: close to real-time performance
• Extensibility: can easily be extended to wide range
of differing feature types, with each adding
robustness
Overall Procedure at a High
Level
1. Scale-space extrema detection
Search over multiple scales and image locations.
2. Keypoint localization
Fit a model to determine location and scale.
Select keypoints based on a measure of stability.
3. Orientation assignment
Compute best orientation(s) for each keypoint region.
4. Keypoint description
Use local image gradients at selected scale and rotation
to describe each keypoint region.
1. Scale-space extrema detection
• Goal: Identify locations and scales that can be
repeatably assigned under different views of the
same scene or object.
• Method: search for stable features across multiple
scales using a continuous function of scale.
• Prior work has shown that under a variety of
assumptions, the best function is a Gaussian
function.
• The scale space of an image is a function L(x, y, σ) produced by the convolution of a Gaussian kernel (at different scales) with the input image.
Aside: Image Pyramids
Aside: Mean Pyramid
Aside: Gaussian Pyramid
At each level, image is smoothed and
reduced in size.
Example: Subsampling with Gaussian pre-
filtering
Figure: Gaussian pre-filtered and subsampled images at 1/2, 1/4, and 1/8 resolution.
Lowe’s Scale-space Interest
Points
• Laplacian of Gaussian kernel
– Scale-normalized (multiplied by σ²)
– Proposed by Lindeberg
• Scale-space detection
– Find local maxima across scale/space
– A good “blob” detector
[ T. Lindeberg IJCV 1998 ]
Lowe’s Scale-space Interest Points:
Difference of Gaussians
• The Gaussian is a solution of the heat diffusion equation: $\frac{\partial G}{\partial \sigma} = \sigma\, \nabla^2 G$
• Hence $\sigma\, \nabla^2 G \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$, so the difference of Gaussians approximates the scale-normalized Laplacian:

$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2\, \nabla^2 G$
Lowe’s Pyramid Scheme
• Scale space is separated into octaves:
• Octave 1 uses scale σ
• Octave 2 uses scale 2σ
• etc.
• Within an octave, the filter scales are $\sigma_i = 2^{i/s}\,\sigma_0$ for i = 0, …, s+2, giving s+3 smoothed images whose pairwise differences yield s+2 difference images.
• The parameter s determines the number of images per octave.
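A sketch of one octave of this scheme, assuming SciPy (names are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(img, s=3, sigma0=1.6):
    """s+3 Gaussian images at scales sigma0 * 2^(i/s), differenced
    pairwise into s+2 DoG images."""
    k = 2.0 ** (1.0 / s)
    gauss = [gaussian_filter(img.astype(float), sigma0 * k ** i)
             for i in range(s + 3)]
    dogs = [g2 - g1 for g1, g2 in zip(gauss[:-1], gauss[1:])]
    return gauss, dogs   # next octave: downsample gauss[s] by a factor of 2
```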
Key point localization: of the s+2 difference images, the top and bottom planes are ignored, so s planes are searched for extrema. (Figure: blur, then subtract.)
Stability vs. expense (measured by % of correctly matched keypoints):
• Sampling in scale for efficiency
– How many scales should be used per octave? S = ?
• The more scales evaluated, the more keypoints are found
• For S < 3, the number of stable keypoints still increases with S
• For S > 3, the number of stable keypoints decreases
• S = 3 gives the maximum number of stable keypoints
Keypoint localization
Keypoint localization
• There are still a lot of points, and some of them are not good enough.
• The locations of keypoints may not be accurate.
• Edge points must be eliminated.
Eliminating the Edge Response
• Reject flat areas (in terms of intensity): discard keypoints where the DoG value at the extremum is small, |D| < 0.03
• Reject edges: let α be the eigenvalue (of the Hessian) with larger magnitude and β the smaller; reject keypoints for which the curvature ratio $\mathrm{Tr}(H)^2 / \mathrm{Det}(H) = (\alpha + \beta)^2 / (\alpha\beta)$ exceeds a threshold
Keypoint localization with orientation — example on a 233×189 image: 832 initial keypoints, 729 keypoints after the gradient (contrast) threshold, 536 keypoints after the ratio threshold.
4. Keypoint Descriptors
• At this point, each keypoint has
– location
– scale
– orientation
• Next is to compute a descriptor for the local
image region about each keypoint that is
– highly distinctive
– invariant as possible to variations such as
changes in viewpoint and illumination
Normalization
• Rotate the window to standard orientation
Lowe’s Keypoint Descriptor
(shown with 2 X 2 descriptors
over 8 X 8)
Lowe’s Keypoint Descriptor
• use the normalized region about the keypoint
• compute gradient magnitude and orientation at each
point in the region
• weight them by a Gaussian window overlaid on the
circle
• create an orientation histogram over the 4 X 4
subregions of the window
• 4 X 4 descriptors over 16 X 16 sample array were
used in practice. 4 X 4 times 8 directions gives a
vector of 128 values. ...
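A usage sketch with OpenCV's SIFT implementation (assumes opencv-python ≥ 4.4 and illustrative file names), including Lowe's ratio test for matching — which previews the object-matching examples that follow:

```python
import cv2

img1 = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)    # illustrative paths
img2 = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)
# Lowe's ratio test: keep a match only if it is clearly better than
# the second-best candidate
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```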
Using SIFT for Matching “Objects”
Uses for SIFT
• Feature points are used also for:
– Image alignment (homography, fundamental
matrix)
– 3D reconstruction (e.g. Photo Tourism)
– Motion tracking
– Object recognition
– Indexing and database retrieval
– Robot navigation
– … many others
[ Photo Tourism: Snavely et al. SIGGRAPH 2006 ]
GNR602 Lecture 15-21 B. Krishna Mohan