Feature Extraction
Why Feature Extraction?
Each image contains a huge amount of data (e.g., 640×480×8 bits)
In order to allow real-time image interpretation, it is necessary to
discard most of the data
The aim of feature extraction is to dramatically reduce the amount of
data by
Discarding the redundant information in the images
(e.g., information related to reflectance and lighting conditions)
Preserving the useful information in the images
(e.g., information related to camera pose and scene structure)
[Diagram: images (10 Mbytes/s from a mono CCD) → generic salient features (5 Kbytes/s)]
All subsequent interpretation is then performed on this generic representation, not on the original image
Salient Features
It seems possible to interpret images using a small amount of edge and
corner data
Image Structures
A featureless region
An edge
A corner
Image Structures - Featureless Region
A featureless region is characterized by a smooth variation of intensities
Image Structures - Edge
A patch containing an edge reveals an intensity discontinuity in one
direction
Image Structures - Corner
A patch containing a corner reveals an intensity discontinuity in two
directions
1D Edge Detection
Recall the derivative of a function
$$\frac{dI(x)}{dx} = I'(x) = \lim_{\Delta x \to 0} \frac{I(x + \Delta x) - I(x)}{\Delta x}$$
I’(x) gives the rate of change of intensity with respect to x
An intuitive approach to edge detection is to look for maxima and
minima in I’(x)
This simple strategy is, however, defeated by noise
1D Edge Detection
To overcome the problem caused by noise, the image is usually
smoothed by a Gaussian filter before edge detection
$$g_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{x^2}{2\sigma^2}}$$
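In code, the sampled kernel might look like this (a minimal NumPy sketch; the function name and the three-sigma truncation are choices of this note, not part of the slides):
```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Sampled 1D Gaussian g_sigma(x), truncated at ~3 sigma by default."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return g / g.sum()  # renormalize so the discrete weights sum to 1
```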
1D Edge Detection
Example: edge detection proceeds in two steps: (1) smooth the signal to obtain S(x) = gσ(x) ∗ I(x), then (2) differentiate to obtain S′(x)
[Figure: a worked 1D edge detection example]
1D Edge Detection
The smoothing in step (1) is performed by a 1D convolution
For discrete signals, the differentiation in step (2) is also performed by a
1D convolution with the kernel [1/2 0 -1/2] (or simply [1 0 -1])
Edge detection would therefore appear to require two computationally
expensive convolutions
However, the derivative theorem of convolution says that
$$S'(x) = \frac{d}{dx}\left[g_\sigma(x) * I(x)\right] = g'_\sigma(x) * I(x)$$
Hence, S’(x) can also be computed by convolving only once
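Since g′σ(x) = −x/σ² · gσ(x), the whole detector reduces to one convolution. A minimal NumPy sketch under the same assumptions as the kernel sketch above (the step-edge test signal is made up for illustration):
```python
import numpy as np

def gaussian_deriv_kernel_1d(sigma, radius=None):
    """Sampled derivative of a Gaussian: g'_sigma(x) = -x / sigma^2 * g_sigma(x)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return -x / sigma**2 * g

x = np.arange(200)
I = (x > 100).astype(float) + 0.05 * np.random.randn(200)   # noisy 1D step edge at x = 100
S_prime = np.convolve(I, gaussian_deriv_kernel_1d(2.0), mode='same')
print(np.argmax(np.abs(S_prime)))   # ~100: the extremum of S'(x) marks the edge
```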
1D Edge Detection
Having obtained the derivative S′(x), interpolation can be used to locate any maxima or minima to sub-pixel accuracy
Approximate the function locally by $f(x) = ax^2 + bx + c$
Without loss of generality, let the sample maximum
and its immediate neighbors have coordinates
x = 0, -1 and 1 respectively
This gives
$$f(-1) = a - b + c, \quad f(0) = c, \quad f(1) = a + b + c$$
$$\Rightarrow \quad a = \frac{f(1) + f(-1) - 2f(0)}{2}, \quad b = \frac{f(1) - f(-1)}{2}, \quad c = f(0)$$
Locate the maximum/minimum by solving $f'(x) = 2ax + b = 0$, which gives
$$x_e = -\frac{b}{2a} = -\frac{f(1) - f(-1)}{2\left(f(1) + f(-1) - 2f(0)\right)}$$
Finally, an edge is marked at each maximum or minimum whose magnitude exceeds some threshold
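A sketch of this refinement as a small helper (the function name is hypothetical):
```python
def subpixel_offset(f_minus, f_0, f_plus):
    """Offset x_e of the parabola through f(-1), f(0), f(1), relative to the centre sample."""
    denom = f_plus + f_minus - 2 * f_0
    if denom == 0:
        return 0.0  # degenerate (flat) fit: keep the integer position
    return -(f_plus - f_minus) / (2 * denom)

# e.g. samples of f(x) = -(x - 0.3)**2 recover the true peak offset:
# subpixel_offset(-1.69, -0.09, -0.49) == 0.3
```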
2D Edge Detection
The 1D edge detection scheme can be
extended to work in 2D
The first step is to smooth the image I(x, y) by convolving with a 2D Gaussian kernel Gσ(x, y):
$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
The next steps compute the gradient ∇S of the smoothed image S = Gσ ∗ I and mark edgels (edge elements) at local maxima of ‖∇S‖ along the gradient direction (non-maximal suppression)
2D Edge Detection
The fourth step is to threshold the
edgels, so that only those strong edgels
with ||ÑS|| above a certain value are
retained
In the final step, weak edgels which
have been deleted are revived if they
span the gaps between some strong
edgels in a process known as hysteresis
The edge detection algorithm just described is called Canny edge detection
The output is a list of edgel positions, each with a strength ‖∇S‖ and an orientation ∇S/‖∇S‖
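The smoothing and differentiation steps can be sketched with SciPy's Gaussian derivative filters; non-maximal suppression and hysteresis are omitted here, and the function name and threshold are illustrative (a complete implementation exists as, e.g., skimage.feature.canny):
```python
import numpy as np
from scipy import ndimage

def gradient_edgels(image, sigma=1.0, threshold=0.1):
    """Smooth and differentiate in one pass (order=1 selects the Gaussian derivative)."""
    Sx = ndimage.gaussian_filter(image, sigma, order=(0, 1))  # dS/dx
    Sy = ndimage.gaussian_filter(image, sigma, order=(1, 0))  # dS/dy
    strength = np.hypot(Sx, Sy)        # ||grad S||, the edgel strength
    orientation = np.arctan2(Sy, Sx)   # direction of grad S
    return strength > threshold, strength, orientation
```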
Multi-scale Edge Detection
The variable σ in the Gaussian kernel controls the standard deviation of the Gaussian/normal distribution, and hence the amount of smoothing
The amount of smoothing determines the scale at which the image is being analyzed
There is no right or wrong size for the Gaussian kernel:
A Gaussian kernel with a small σ (i.e., modest smoothing) brings out edges at a fine scale
A Gaussian kernel with a larger σ (i.e., more smoothing) identifies edges at a larger scale, suppressing the finer details
Note that fine-scale edge detection is particularly sensitive to noise
[Figure: an image of a table cover and its edge maps at σ = 0.5 and σ = 2.0]
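Reusing the hypothetical gradient_edgels sketch above, the analysis scale is simply its sigma argument (toy image made up for illustration):
```python
image = np.zeros((64, 64)); image[24:40, 24:40] = 1.0   # toy image: a bright square
fine, _, _ = gradient_edgels(image, sigma=0.5)          # fine scale: sharp detail (and noise)
coarse, _, _ = gradient_edgels(image, sigma=2.0)        # coarse scale: only dominant edges
```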
Aperture Problem
Edges are a powerful intermediate representation for image interpretation
The motion of an edge is, however, rendered ambiguous by the aperture problem: when viewing a moving edge through a small aperture, only the component of motion normal to the edge can be measured
Hence, edges are often insufficient for analyzing image motion
This calls for the use of corner features in motion analysis
Unlike edges, the displacement of a corner is not ambiguous
Corners are most useful for tracking in image sequences and matching in stereo pairs
Corner Detection
A corner is characterized by an intensity
discontinuity in two directions
Such a discontinuity can be detected
using cross correlation
Corner Detection
To handle differences caused by lighting conditions, each image patch is first normalized by subtracting the mean intensity of the patch from each pixel and then scaling the resulting patch to have unit Frobenius norm; this yields the normalized cross correlation
$$NCC(x, y) = \frac{\sum_{u}\sum_{v} \big(P(u, v) - \bar{P}\big)\big(I(x + u, y + v) - \bar{I}\big)}{\sqrt{\sum_{u}\sum_{v} \big(P(u, v) - \bar{P}\big)^2} \, \sqrt{\sum_{u}\sum_{v} \big(I(x + u, y + v) - \bar{I}\big)^2}}$$
Equivalently, stacking the patch and the image window into vectors 𝐏 and 𝐈,
$$NCC(x, y) = \frac{(\mathbf{P} - \bar{\mathbf{P}}) \cdot (\mathbf{I} - \bar{\mathbf{I}})}{\|\mathbf{P} - \bar{\mathbf{P}}\| \, \|\mathbf{I} - \bar{\mathbf{I}}\|} = \cos\theta$$
where θ is the angle between the two mean-subtracted vectors
An output of 1 indicates a perfect match, while an output of −1 indicates a perfect mismatch
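A direct NumPy sketch of NCC for one patch position (names are illustrative):
```python
import numpy as np

def ncc(patch, window):
    """Normalized cross-correlation of two equal-sized grayscale patches; result in [-1, 1]."""
    p = patch - patch.mean()
    w = window - window.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(w)
    return float((p * w).sum() / denom) if denom > 0 else 0.0

# Scoring a patch P of radius n against image position (x, y):
#   score = ncc(P, I[y - n : y + n + 1, x - n : x + n + 1])
```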
Corner Detection
Examples: [Figure: normalized cross correlation of an image with different patches]
Corner Detection
A practical corner detection algorithm needs to do something more
efficient than calculating correlation functions for every pixel
The most commonly used corner detection algorithm, the Harris corner
detector, marks corners by examining the maximum and minimum
changes in intensity at each pixel
Consider the change in intensity in a given direction n
$$I_{\mathbf{n}}(x, y) = \frac{\nabla I(x, y)^\top \mathbf{n}}{\sqrt{\mathbf{n}^\top \mathbf{n}}}, \qquad \text{where } \nabla I = \begin{bmatrix} I_x \\ I_y \end{bmatrix} \text{ and } I_x = \frac{\partial I}{\partial x}, \; I_y = \frac{\partial I}{\partial y}$$
Corner Detection
The squared change in intensity around (x, y) in the direction n is given
by
$$C_{\mathbf{n}}(x, y) = I_{\mathbf{n}}^2 = \frac{\mathbf{n}^\top \nabla I \, \nabla I^\top \mathbf{n}}{\mathbf{n}^\top \mathbf{n}} = \frac{\mathbf{n}^\top \mathbf{A} \mathbf{n}}{\mathbf{n}^\top \mathbf{n}}, \qquad \text{where } \mathbf{A} = \nabla I \, \nabla I^\top$$
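Since Cn is a Rayleigh quotient of A, its extrema over all directions n are exactly the eigenvalues of A, which is what the classification below exploits. A quick numerical check with a made-up gradient:
```python
import numpy as np

grad = np.array([3.0, 1.0])                   # made-up image gradient at one pixel
A = np.outer(grad, grad)                      # A = grad(I) grad(I)^T  (rank one here)
lam, U = np.linalg.eigh(A)                    # eigenvalues in ascending order: [0, 10]

n = np.array([1.0, 2.0])                      # an arbitrary direction
C_n = n @ A @ n / (n @ n)                     # squared intensity change along n
assert lam[0] - 1e-9 <= C_n <= lam[1] + 1e-9  # C_n always lies between the eigenvalues
```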
Corner Detection
Image structure around each pixel can therefore be classified by looking
at the eigenvalues of A:
Case 1: λ1 ≈ λ2 ≈ 0
Smooth variation in intensity (i.e., a featureless region)
Case 2: λ1 ≈ 0, λ2 is large
Intensity discontinuity in one direction (i.e., an edge)
The eigenvector u1 (with corresponding eigenvalue λ1) gives the direction along the edge, while the eigenvector u2 (with corresponding eigenvalue λ2) gives the normal to the edge
Case 3: Both λ1 and λ2 are large and distinct
Intensity discontinuity in two directions (i.e., a corner)
Corner Detection
In the Harris corner detection algorithm, corners are marked at points where the quantity R = λ1λ2 − k(λ1 + λ2)² exceeds some threshold, where k is a parameter set to 0.04 as suggested by Harris
Note that λ1λ2 = det(A) and λ1 + λ2 = trace(A), so R can be computed without an explicit eigendecomposition:
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \det(\mathbf{A}) = a_{11}a_{22} - a_{12}a_{21}, \quad \operatorname{trace}(\mathbf{A}) = a_{11} + a_{22}$$
Corner Detection
To lessen the effect of noise, it might be desirable to smooth the image
with a Gaussian kernel. This might, however, also “smooth away” the
corner features
Smoothing is also done on the images containing the squared image derivatives (i.e., Ix², Iy² and IxIy); otherwise det(A) will always be zero, since ∇I∇I⊤ evaluated at a single pixel is a rank-one matrix
$$\mathbf{A} = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
Corner Detection
Summary of the Harris corner detection algorithm (a code sketch follows this list):
1. Compute Ix and Iy at each pixel I(x, y)
2. Form the images of Ix2, Iy2 and IxIy respectively
3. Smooth the images of squared image derivatives
4. Form an image of the cornerness function R using the smoothed images of squared derivatives (i.e., ⟨Ix²⟩, ⟨Iy²⟩ and ⟨IxIy⟩)
5. Locate local maxima in the image of R as corners
6. Compute the coordinates of the corners up to sub-pixel accuracy by
quadratic approximation using values in the neighborhood
7. Threshold the corners so that only those with a value of R above a certain
value are retained
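A compact NumPy/SciPy sketch of steps 1-5 and 7 (the derivative filter, smoothing scale, threshold value, and 3×3 non-maximum test are assumptions of this sketch; the sub-pixel refinement of step 6 is omitted):
```python
import numpy as np
from scipy import ndimage

def harris_corners(image, sigma=1.0, k=0.04, threshold=1e-4):
    # Step 1: image derivatives Ix, Iy (Sobel used here as the derivative filter)
    Ix = ndimage.sobel(image, axis=1)
    Iy = ndimage.sobel(image, axis=0)
    # Steps 2-3: squared-derivative images, smoothed to give <Ix^2>, <Iy^2>, <IxIy>
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    # Step 4: cornerness R = det(A) - k * trace(A)^2, no eigendecomposition required
    R = (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2
    # Steps 5 and 7: local maxima of R above a threshold (value is image-dependent)
    peaks = (R == ndimage.maximum_filter(R, size=3)) & (R > threshold)
    ys, xs = np.nonzero(peaks)
    return np.stack([xs, ys], axis=1), R   # step 6 (sub-pixel refinement) omitted
```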
Corner Detection
Examples: [Figure: Harris corner detection results on sample images]
Implementation Details
Convolving with a 2D Gaussian kernel is computationally expensive
The 2D convolution can be decomposed into two 1D convolutions as
follows:
$$G_\sigma(x, y) * I(x, y) = g_\sigma(x) * g_\sigma(y) * I(x, y)$$
and the computational saving is a factor of (2n+1)²/(2(2n+1)) = (2n+1)/2, where (2n+1) is the width/height of the kernel
Proof:
$$G_\sigma(x, y) * I(x, y) = \sum_{u=-n}^{n} \sum_{v=-n}^{n} \frac{1}{2\pi\sigma^2} e^{-\frac{u^2 + v^2}{2\sigma^2}} I(x - u, y - v)$$
$$= \sum_{u=-n}^{n} \sum_{v=-n}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{u^2}{2\sigma^2}} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{v^2}{2\sigma^2}} I(x - u, y - v)$$
$$= \sum_{u=-n}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{u^2}{2\sigma^2}} \sum_{v=-n}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{v^2}{2\sigma^2}} I(x - u, y - v)$$
$$= g_\sigma(x) * g_\sigma(y) * I(x, y)$$
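A quick numerical check of the decomposition and its cost (sizes are illustrative; SciPy's default reflect boundary handling is used on both routes so the outputs match):
```python
import numpy as np
from scipy import ndimage

sigma, n = 2.0, 6                                  # kernel radius n, width 2n+1 = 13
x = np.arange(-n, n + 1)
g = np.exp(-x**2 / (2 * sigma**2)); g /= g.sum()   # sampled 1D Gaussian
G = np.outer(g, g)                                 # equivalent (2n+1) x (2n+1) 2D kernel

image = np.random.rand(256, 256)
full_2d = ndimage.convolve(image, G)               # (2n+1)^2 = 169 multiplies per pixel
separable = ndimage.convolve1d(
    ndimage.convolve1d(image, g, axis=0), g, axis=1)  # 2(2n+1) = 26 multiplies per pixel
print(np.allclose(full_2d, separable))             # True: the two routes agree
```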