4.8 Localized feature extraction
Two main areas are covered here. The traditional approaches aim to derive local features by
measuring specific image properties. The main target has been to estimate curvature: peaks
of local curvature are corners, and analysing an image by its corners is especially suited to
images of artificial objects. The second area includes more modern approaches that improve
performance by using region or patch-based analysis. We shall start with the more established
curvature-based operators, before moving to the patch or region-based analysis.
4.8.1 Detecting image curvature (corner extraction)
4.8.1.1 Definition of curvature
Edges are perhaps the low-level image features that are most obvious to human vision. They
preserve significant features, so we can usually recognize what an image contains from its
edge-detected version. However, there are other low-level features that can be used in computer
vision. One important feature is curvature. Intuitively, we can consider curvature as the rate of
change in edge direction. This rate of change characterizes the points in a curve; points where the
edge direction changes rapidly are corners, whereas points where there is little change in edge
direction correspond to straight lines. Such extreme points are very useful for shape description
and matching, since they represent significant information with reduced data.
Curvature is normally defined by considering a parametric form of a planar curve. The parametric contour $v(t) = x(t)U_x + y(t)U_y$ describes the points in a continuous curve as the endpoints of the position vector. Here, the values of $t$ define an arbitrary parameterization, and the unit vectors are again $U_x = [1, 0]$ and $U_y = [0, 1]$. Changes in the position vector are given by the tangent vector function of the curve $v(t)$. That is, $\dot{v}(t) = \dot{x}(t)U_x + \dot{y}(t)U_y$. This vectorial expression has a simple intuitive meaning. If we think of the trace of the curve as the motion of a point, with $t$ related to time, the tangent vector defines the instantaneous motion. At any moment, the point moves with a speed given by $|\dot{v}(t)| = \sqrt{\dot{x}^2(t) + \dot{y}^2(t)}$ in the direction $\varphi(t) = \tan^{-1}(\dot{y}(t)/\dot{x}(t))$. The curvature at a point $v(t)$ describes the changes in the direction $\varphi(t)$ with respect to changes in arc length. That is,

$$\kappa(t) = \frac{d\varphi(t)}{ds} \qquad (4.41)$$

where $s$ is arc length along the edge itself. Here $\varphi$ is the angle of the tangent to the curve. That is, $\varphi = \theta \pm 90^\circ$, where $\theta$ is the gradient direction defined in Equation 4.13. That is, if
we apply an edge detector operator to an image, we have for each pixel a gradient direction
value that represents the normal direction to each point in a curve. The tangent to a curve is
given by an orthogonal vector. Curvature is given with respect to arc length because a curve
parameterized by arc length maintains a constant speed of motion. Thus, curvature represents
changes in direction for constant displacements along the curve. By considering the chain rule,
we have
$$\kappa(t) = \frac{d\varphi(t)}{dt}\frac{dt}{ds} \qquad (4.42)$$
The differential ds/dt defines the change in arc length with respect to the parameter t. If we
again consider the curve as the motion of a point, this differential defines the instantaneous
change in distance with respect to time. That is, the instantaneous speed. Thus,
$$\frac{ds}{dt} = |\dot{v}(t)| = \sqrt{\dot{x}^2(t) + \dot{y}^2(t)} \qquad (4.43)$$
and
$$\frac{dt}{ds} = \frac{1}{\sqrt{\dot{x}^2(t) + \dot{y}^2(t)}} \qquad (4.44)$$
By considering that $\varphi(t) = \tan^{-1}(\dot{y}(t)/\dot{x}(t))$, the curvature at a point $v(t)$ in Equation 4.42 is given by

$$\kappa(t) = \frac{\dot{x}(t)\ddot{y}(t) - \dot{y}(t)\ddot{x}(t)}{\left(\dot{x}^2(t) + \dot{y}^2(t)\right)^{3/2}} \qquad (4.45)$$
This relationship is called the curvature function and it is the standard measure of curvature for
planar curves (Apostol, 1966). An important feature of curvature is that it relates the derivative
of a tangential vector to a normal vector. This can be explained by the simplified Serret–Frenet
equations (Goetz, 1970) as follows. We can express the tangential vector in polar form as

$$\dot{v}(t) = |\dot{v}(t)|\left(\cos(\varphi(t)) + j\sin(\varphi(t))\right) \qquad (4.46)$$

If the curve is parameterized by arc length, then $|\dot{v}(t)|$ is constant. Thus, the derivative of the tangential vector is simply given by

$$\ddot{v}(t) = |\dot{v}(t)|\left(-\sin(\varphi(t)) + j\cos(\varphi(t))\right)\frac{d\varphi(t)}{dt} \qquad (4.47)$$

Since we are using a normal parameterization, $d\varphi(t)/dt = d\varphi(t)/ds$. Thus, the derivative of the tangential vector can be written as

$$\ddot{v}(t) = \kappa(t)\,n(t) \qquad (4.48)$$

where $n(t) = |\dot{v}(t)|\left(-\sin(\varphi(t)) + j\cos(\varphi(t))\right)$ defines the direction of $\ddot{v}(t)$, while the curvature $\kappa(t)$ defines its modulus. The derivative of the normal vector is given by $\dot{n}(t) = |\dot{v}(t)|\left(-\cos(\varphi(t)) - j\sin(\varphi(t))\right)d\varphi(t)/ds$, which can be written as

$$\dot{n}(t) = -\kappa(t)\,\dot{v}(t) \qquad (4.49)$$

Clearly, $n(t)$ is normal to $\dot{v}(t)$. Therefore, for each point in the curve, there is a pair of orthogonal vectors $\dot{v}(t)$ and $n(t)$ whose moduli are proportionally related by the curvature.
In general, the curvature of a parametric curve is computed by evaluating Equation 4.45. For a straight line, for example, the second derivatives $\ddot{x}(t)$ and $\ddot{y}(t)$ are zero, so the curvature function is nil. For a circle of radius $r$ parameterized as $x(t) = r\cos(t)$ and $y(t) = r\sin(t)$, we have that $\dot{x}(t) = -r\sin(t)$, $\dot{y}(t) = r\cos(t)$, $\ddot{x}(t) = -r\cos(t)$ and $\ddot{y}(t) = -r\sin(t)$, so that $\kappa(t) = 1/r$. However, for curves in digital images, the derivatives must be computed from discrete data. This can be done in three main ways. The most obvious approach is to calculate curvature by directly computing the difference in angular direction between successive edge pixels in a curve. A second approach is to derive a measure of curvature from changes in image intensity. Finally, a measure of curvature can be obtained by correlation.
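As a quick numerical check of Equation 4.45, the following Python fragment (an illustrative sketch, not part of the text's code) evaluates the curvature function for a sampled circle; the finite-difference estimate is close to the analytic value $1/r$.

import numpy as np

# Sample a circle of radius r at finely spaced parameter values.
r = 5.0
t = np.linspace(0.0, 2.0 * np.pi, 10000)
x, y = r * np.cos(t), r * np.sin(t)

# Finite-difference estimates of the derivatives in Equation 4.45.
dt = t[1] - t[0]
xd, yd = np.gradient(x, dt), np.gradient(y, dt)
xdd, ydd = np.gradient(xd, dt), np.gradient(yd, dt)

# Curvature function (Equation 4.45); should be close to 1/r = 0.2.
kappa = (xd * ydd - yd * xdd) / (xd**2 + yd**2) ** 1.5
print(kappa[len(kappa) // 2])  # approximately 0.2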
In the first case, curvature can be estimated as the difference in angular direction between successive edge pixels. That is,

$$k(t) = \varphi_{t+1} - \varphi_{t} \qquad (4.50)$$

where the sequence $\varphi_{t-1}, \varphi_t, \varphi_{t+1}, \varphi_{t+2}$ represents the gradient direction of a sequence of pixels defining a curve segment. Gradient direction can be obtained as the angle given by an edge detector operator. Alternatively, it can be computed by considering the position of pixels in the sequence. That is, by defining $\varphi_t = \tan^{-1}\left((y_{t-1} - y_{t+1})/(x_{t-1} - x_{t+1})\right)$, where $(x_t, y_t)$ denotes pixel $t$ in the sequence. Since edge points are only defined at discrete points, this angle can take only eight values, so the computed curvature is very ragged. This can be smoothed out by considering the difference in mean angular direction of $n$ pixels on the leading and trailing curve segments. That is,
$$k_n(t) = \frac{1}{n}\sum_{i=1}^{n}\varphi_{t+i} - \frac{1}{n}\sum_{i=-n}^{-1}\varphi_{t+i} \qquad (4.51)$$
The average also gives some immunity to noise, and it can be replaced by a weighted average if Gaussian smoothing is required. The number of pixels considered, the value of $n$, defines a compromise between accuracy and noise sensitivity. Notice that filtering techniques may also be used to reduce the quantization effect when angles are obtained by an edge detection operator. As we have already discussed, the level of filtering is related to the size of the template (as in Section 3.4.3).
To compute angular differences, we need to determine connected edges. This can easily be
implemented with the code already developed for hysteresis thresholding in the Canny edge
operator. To compute the difference of points in a curve, the connect routine (Code 4.12)
only needs to be arranged to store the difference in edge direction between connected points.
Code 4.16 shows an implementation for curvature detection. First, edges and magnitudes are
determined. Curvature is only detected at edge points. As such, we apply maximal suppression.
The function Cont returns a matrix containing the connected neighbour pixels of each edge.
Each edge pixel is connected to one or two neighbours. The matrix Next stores only the
direction of consecutive pixels in an edge. We use a value of −1 to indicate that there is no
connected neighbour. The function NextPixel obtains the position of a neighbouring pixel
by taking the position of a pixel and the direction of its neighbour. The curvature is computed
as the difference in gradient direction of connected neighbour pixels.
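The following Python fragment sketches the same computation independently of Code 4.16 (the chain of connected edge pixels is assumed to have been extracted already, e.g. by the hysteresis-based connect routine): it estimates the direction $\varphi_t$ from pixel positions and applies the averaged difference of Equation 4.51.

import numpy as np

def chain_curvature(xs, ys, n=3):
    """Estimate curvature along an ordered chain of edge pixels
    using the averaged angular differences of Equation 4.51."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    # Direction at each interior pixel from its two neighbours
    # (the position-based definition of phi_t given above).
    phi = np.arctan2(ys[:-2] - ys[2:], xs[:-2] - xs[2:])
    phi = np.unwrap(phi)  # avoid 2*pi jumps when differencing
    k = np.full(len(phi), np.nan)
    for t in range(n, len(phi) - n):
        lead = phi[t + 1 : t + n + 1].mean()   # mean of n leading angles
        trail = phi[t - n : t].mean()          # mean of n trailing angles
        k[t] = lead - trail
    return k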
The result of applying this form of curvature detection to an image is shown in Figure 4.37. Figure 4.37(a) contains the silhouette of an object; Figure 4.37(b) is the curvature obtained by computing the rate of change of edge direction. In this figure, curvature is defined only at the edge points. By its formulation, the measurement of curvature gives just a thin line of differences in edge direction, which can be seen to track the perimeter points of the shapes (at points where there is measured curvature). The brightest points are those with greatest curvature. To show the results, we have scaled the curvature values to use 256 intensity values. Estimates of corner points could be obtained by a uniformly thresholded version of Figure 4.37(b), well, in theory anyway!
Unfortunately, as can be seen, this approach does not provide reliable results. It is essentially a
reformulation of a first order edge detection process and presupposes that the corner information
lies within the threshold data (and uses no corner structure in detection). One of the major
difficulties with this approach is that measurements of angle can be severely affected by
quantization error and accuracy is limited (Bennett and MacDonald, 1975), a factor which will
return to plague us later when we study methods for describing shapes.
This defines a forward measure of curvature along the edge direction. We can also measure curvature in an alternative direction: we can differentiate backwards (in the direction of $-\varphi(x, y)$), giving $\kappa_{-\varphi}(x, y)$. In this case, we consider that the curve is given by $x(t) = x + t\cos(-\varphi(x, y))$ and $y(t) = y + t\sin(-\varphi(x, y))$. Thus,

$$\kappa_{-\varphi}(x, y) = \frac{1}{\left(M_x^2 + M_y^2\right)^{3/2}}\left(M_y^2\frac{\partial M_x}{\partial x} - M_xM_y\frac{\partial M_y}{\partial x} - M_x^2\frac{\partial M_y}{\partial y} + M_xM_y\frac{\partial M_x}{\partial y}\right) \qquad (4.55)$$
Two further measures can be obtained by considering the forward and a backward differential
along the normal. These differentials cannot be related to the actual definition of curvature, but
can be explained intuitively. If we consider that curves are more than one pixel wide, differentiation along the edge will measure the difference in gradient angle between the interior and exterior borders of a wide curve. In theory, the tangent angle should be the same. However, in discrete images there is a change due to the measurements being made in a window. If the curve is a straight line, then
the interior and exterior borders are the same. Thus, gradient direction normal to the edge does not
change locally. As we bend a straight line, we increase the difference between the curves defining
the interior and exterior borders. Thus, we expect the measure of gradient direction to change.
That is, if we differentiate along the normal direction, we maximize detection of gross curvature.
The value $\kappa_{\perp}(x, y)$ is obtained when $x(t) = x + t\sin(\varphi(x, y))$ and $y(t) = y + t\cos(\varphi(x, y))$. In this case,

$$\kappa_{\perp}(x, y) = \frac{1}{\left(M_x^2 + M_y^2\right)^{3/2}}\left(M_x^2\frac{\partial M_y}{\partial x} - M_xM_y\frac{\partial M_x}{\partial x} - M_xM_y\frac{\partial M_y}{\partial y} + M_y^2\frac{\partial M_x}{\partial y}\right) \qquad (4.56)$$
This was originally used by Kass et al. (1988) as a means to detect line terminations, as
part of a feature extraction scheme called snakes (active contours), which are covered in
Chapter 6. Code 4.17 shows an implementation of the four measures of curvature. The function
Gradient is used to obtain the gradient of the image and to obtain its derivatives. The output
image is obtained by applying the function according to the selection of parameter op.
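For illustration, a minimal Python sketch of one of these measures (Equation 4.56), independent of Code 4.17, with the gradient and its derivatives estimated by finite differences:

import numpy as np

def curvature_perp(image):
    """Sketch of the curvature measure of Equation 4.56, computed
    from finite-difference estimates of the image gradient."""
    P = image.astype(float)
    My, Mx = np.gradient(P)          # gradient components (rows = y)
    Myy, Myx = np.gradient(My)       # derivatives of My (w.r.t. y, x)
    Mxy, Mxx = np.gradient(Mx)       # derivatives of Mx (w.r.t. y, x)
    denom = (Mx**2 + My**2) ** 1.5 + 1e-12   # avoid division by zero
    return (Mx**2 * Myx - Mx * My * Mxx
            - Mx * My * Myy + My**2 * Mxy) / denom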
Let us see how the four functions for estimating curvature from image intensity perform for
the image given in Figure 4.37(a). In general, points where the curvature is large are highlighted
by each function. Different measures of curvature (Figure 4.38) highlight differing points on the
feature boundary. All measures appear to offer better performance than that derived by refor-
mulating hysteresis thresholding (Figure 4.37b), although there is little discernible performance
advantage between the directions of differentiation. As the results in Figure 4.38 suggest, detecting curvature directly from an image is not a totally reliable way of determining curvature, and hence corner information. This is in part due to the higher order of the differentiation process. (Also, scale has not been included within the analysis.)

Figure 4.38: (a) $\kappa_{\varphi}$; (b) $\kappa_{-\varphi}$
This equation approximates the autocorrelation function in the direction $(u, v)$. A measure of curvature is given by the minimum value of $E_{u,v}(x, y)$ obtained by considering the shifts $(u, v)$ in the four main directions: that is, by $(1, 0)$, $(0, -1)$, $(0, 1)$ and $(-1, 0)$. The minimum is chosen because it agrees with the following two observations. First, if the pixel is in an edge defining a straight line, $E_{u,v}(x, y)$ is small for a shift along the edge and large for a shift perpendicular to the edge. In this case, we should choose the small value, since the curvature of the edge is small. Secondly, if the edge defines a corner, then all the shifts produce a large value. Thus, if we also choose the minimum, this value indicates high curvature. The main problem with this approach
is that it considers only a small set of possible shifts. This problem is solved in the Harris
corner detector (Harris and Stephens, 1988) by defining an analytic expression for the autocor-
relation. This expression can be obtained by considering the local approximation of intensity
changes.
We can consider that the points $P_{x+i,y+j}$ and $P_{x+i+u,y+j+v}$ define a vector $(u, v)$ in the image. Thus, in a similar fashion to the development given in Equation 4.58, the increment in the image function between the points can be approximated by the directional derivative $u\,\partial P_{x+i,y+j}/\partial x + v\,\partial P_{x+i,y+j}/\partial y$. Thus, the intensity at $P_{x+i+u,y+j+v}$ can be approximated as

$$P_{x+i+u,y+j+v} = P_{x+i,y+j} + \frac{\partial P_{x+i,y+j}}{\partial x}u + \frac{\partial P_{x+i,y+j}}{\partial y}v \qquad (4.59)$$
This expression corresponds to the first three terms of the Taylor expansion around $P_{x+i,y+j}$ (an expansion to first order). If we consider the approximation in Equation 4.58, we have:

$$E_{u,v}(x, y) = \sum_{i=-w}^{w}\sum_{j=-w}^{w}\left(\frac{\partial P_{x+i,y+j}}{\partial x}u + \frac{\partial P_{x+i,y+j}}{\partial y}v\right)^2 \qquad (4.60)$$

By expansion of the squared term (and since $u$ and $v$ are independent of the summations), we obtain:

$$E_{u,v}(x, y) = u^2A(x, y) + 2uvC(x, y) + v^2B(x, y) \qquad (4.61)$$
where
$$A(x, y) = \sum_{i=-w}^{w}\sum_{j=-w}^{w}\left(\frac{\partial P_{x+i,y+j}}{\partial x}\right)^2 \qquad B(x, y) = \sum_{i=-w}^{w}\sum_{j=-w}^{w}\left(\frac{\partial P_{x+i,y+j}}{\partial y}\right)^2$$

$$C(x, y) = \sum_{i=-w}^{w}\sum_{j=-w}^{w}\frac{\partial P_{x+i,y+j}}{\partial x}\frac{\partial P_{x+i,y+j}}{\partial y} \qquad (4.62)$$
That is, these are the summations of the squared components of the gradient, and of their product, over all the pixels in the window. In practice, this average can be weighted by a Gaussian function to make the measure less sensitive to noise (i.e. by filtering the image data). To measure the curvature at a point $(x, y)$, it is necessary to find the vector $(u, v)$ that minimizes $E_{u,v}(x, y)$ given in Equation 4.61. In a basic approach, we can recall that the minimum is obtained when the window is displaced in the direction of the edge. Thus, we can consider that $u = \cos(\varphi(x, y))$ and $v = \sin(\varphi(x, y))$. These values were defined in Equation 4.53. Accordingly, the minima values that define curvature are given by

$$\kappa_{u,v}(x, y) = A(x, y)\cos^2(\varphi(x, y)) + 2C(x, y)\cos(\varphi(x, y))\sin(\varphi(x, y)) + B(x, y)\sin^2(\varphi(x, y)) \qquad (4.63)$$
In a more sophisticated approach, we can consider the form of the function $E_{u,v}(x, y)$. We can observe that this is a quadratic function, so it has two principal axes. We can rotate the function such that its axes have the same direction as the axes of the coordinate system. That is, we rotate the function $E_{u,v}(x, y)$ to obtain a diagonal form. The measure defined in Equation 4.73 produces more contrast between lines with low and high curvature than $\kappa_{u,v}(x, y)$. The reason is the inclusion of the second term in Equation 4.73. In general, the measure of correlation is not only useful to compute curvature; this technique has much wider application in finding points for matching pairs of images.
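To make the procedure concrete, the following Python sketch (an illustration, not the book's implementation) computes A, B and C as Gaussian-weighted sums of gradient products and combines them into the determinant-minus-scaled-trace response of Harris and Stephens (1988); the smoothing scale and the constant k = 0.04 are conventional choices rather than values prescribed here.

import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.5, k=0.04):
    """Harris corner response from the A, B, C terms of Equation 4.62,
    with the window sum replaced by Gaussian weighting."""
    P = image.astype(float)
    My, Mx = np.gradient(P)                  # image gradient components
    A = gaussian_filter(Mx * Mx, sigma)      # sum of squared x-gradients
    B = gaussian_filter(My * My, sigma)      # sum of squared y-gradients
    C = gaussian_filter(Mx * My, sigma)      # sum of gradient products
    det = A * B - C * C                      # product of the eigenvalues
    trace = A + B                            # sum of the eigenvalues
    return det - k * trace**2                # large at corner points

Corners can then be taken as local maxima of this response above a threshold.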
$$D(x, y, \sigma) = \left(g(x, y, k\sigma) - g(x, y, \sigma)\right) * P = L(x, y, k\sigma) - L(x, y, \sigma) \qquad (4.74)$$
The function $L$ is a scale-space function, which can be used to define smoothed images at different scales. Note again the influence of scale-space in the more modern techniques. Rather than the difficulty of locating zero-crossing points, the features here are the maxima and minima of the function. Candidate keypoints are then determined by comparing each point in the function with its immediate neighbours. The process then proceeds to analysis between the levels of scale, given appropriate sampling of the scale-space. This implies comparing a point with its eight neighbours at that scale and with the nine neighbours in each of the adjacent scales, to determine whether it is a minimum or a maximum, as well as image resampling to ensure comparison between the different scales.
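A minimal sketch of this extremum test, assuming a difference-of-Gaussian stack has already been built (octave construction and resampling, essential in Lowe's full method, are omitted):

import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma=1.6, k=2**0.5, levels=4):
    """Build a small difference-of-Gaussian stack (Equation 4.74)."""
    P = image.astype(float)
    L = [gaussian_filter(P, sigma * k**i) for i in range(levels + 1)]
    return np.stack([L[i + 1] - L[i] for i in range(levels)])

def is_extremum(D, s, y, x):
    """True if D[s, y, x] is a maximum or minimum of its 26 neighbours
    (8 at the same scale, 9 in each adjacent scale).
    The indices s, y, x must be interior to the stack."""
    patch = D[s - 1 : s + 2, y - 1 : y + 2, x - 1 : x + 2]
    centre = D[s, y, x]
    return centre == patch.max() or centre == patch.min()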
To filter the candidate points to reject those which are the result of low local contrast (low
edge strength) or which are poorly localized along an edge, a function is derived by local
curve fitting, which indicates local edge strength and stability as well as location. Uniform
thresholding then removes the keypoints with low contrast. Those that have poor localization,
i.e. their position is likely to be influenced by noise, can be filtered by considering the ratio of
curvature along an edge to that perpendicular to it, in a manner following the Harris operator in
Section 4.8.1.4, by thresholding the ratio of Equations 4.71 and 4.72.
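A sketch of this localization filter, following Lowe's ratio test: the principal curvatures of D are proportional to the eigenvalues of its local 2 × 2 Hessian, so edge-like points have one large and one small eigenvalue. The finite-difference Hessian and the threshold r = 10 follow Lowe (2004); the array layout is an assumption of this sketch.

import numpy as np

def passes_edge_test(D, y, x, r=10.0):
    """Reject keypoints lying along edges: the ratio of principal
    curvatures must satisfy trace^2/det < (r + 1)^2 / r."""
    # Finite-difference Hessian of the DoG image at (y, x).
    dxx = D[y, x + 1] - 2 * D[y, x] + D[y, x - 1]
    dyy = D[y + 1, x] - 2 * D[y, x] + D[y - 1, x]
    dxy = (D[y + 1, x + 1] - D[y + 1, x - 1]
           - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0
    det = dxx * dyy - dxy**2
    if det <= 0:                  # curvatures of opposite sign: reject
        return False
    return (dxx + dyy) ** 2 / det < (r + 1) ** 2 / r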
To characterize the filtered keypoint features at each scale, the gradient magnitude is calculated in exactly the manner of Equations 4.12 and 4.13 as

$$M_{\text{SIFT}}(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2} \qquad (4.75)$$

$$\theta_{\text{SIFT}}(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right) \qquad (4.76)$$
The peak of the histogram of the orientations around a keypoint is then selected as the local
direction of the feature. This can be used to derive a canonical orientation, so that the resulting
descriptors are invariant with rotation. As such, this contributes to the process which aims to
reduce sensitivity to camera viewpoint and to non-linear change in image brightness (linear
changes are removed by the gradient operations) by analysing regions in the locality of the
selected viewpoint. The main description (Lowe, 2004) considers the technique’s basis in much
greater detail, and outlines factors important to its performance, such as the need for sampling
and performance in noise.
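A sketch of the orientation assignment, assuming the magnitude and direction arrays of Equations 4.75 and 4.76 have been computed over a region around the keypoint (the 36-bin histogram follows Lowe's description; the Gaussian weighting of votes is omitted here for brevity):

import numpy as np

def dominant_orientation(mag, theta, bins=36):
    """Peak of the orientation histogram around a keypoint:
    each pixel votes with its gradient magnitude (Equations 4.75-4.76)."""
    hist, edges = np.histogram(theta, bins=bins,
                               range=(-np.pi, np.pi), weights=mag)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])   # bin centre (radians)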
As shown in Figure 4.41, the technique can certainly operate well, and scale is illustrated by
applying the operator to the original image and to one at half the resolution. In all, 601 keypoints
are determined in the original resolution image and 320 keypoints at half the resolution. By
inspection, the major features are retained across scales (a lot of minor regions in the leaves
disappear at lower resolution), as expected. Alternatively, the features can be filtered further
by magnitude, or even direction (if appropriate). If you want more than results to convince
you, implementations are available for Windows and Linux (http://www.cs.ubc.ca/spider/lowe/
research.html): a feast for a developer. These images were derived using siftWin32, version 4.
Figure 4.41 SIFT keypoints: (a) original image; (b) keypoints at full resolution; (c) keypoints at half resolution
4.8.2.2 Saliency
The new saliency operator (Kadir and Brady, 2001) was also motivated by the need to extract
robust and relevant features. In the approach, regions are considered salient if they are simultane-
ously unpredictable both in some feature and scale–space. Unpredictability (rarity) is determined
in a statistical sense, generating a space of saliency values over position and scale, as a basis
for later understanding. The technique aims to be a generic approach to scale and saliency, compared to conventional methods, because both are defined independently of a particular basis morphology, meaning that it is not based on a particular geometric feature like a blob, edge or corner. The technique operates by determining the entropy (a measure of rarity) within patches at scales of interest; the saliency is then a weighted summation of where the entropy peaks. The new method has practical capability in that it can be made invariant to rotation, translation, non-uniform scaling and uniform intensity variations, and robust to small changes in viewpoint.
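The core of the operator can be sketched as follows: for each position and scale, compute the entropy of the grey-level distribution within a patch, and treat positions and scales where entropy peaks as salient. This is a much-simplified illustration of Kadir and Brady's method; the square patch, the histogram binning and the omission of the inter-scale weighting are simplifications made here.

import numpy as np

def patch_entropy(patch, bins=16):
    """Shannon entropy of the grey-level distribution of a patch
    (assumes 8-bit grey levels)."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def saliency_map(image, scales=(3, 5, 7, 9)):
    """Entropy over position and scale; salient regions are where
    the entropy peaks across scales (weighting omitted here)."""
    h, w = image.shape
    smax = max(scales)
    S = np.zeros((len(scales), h, w))
    for si, s in enumerate(scales):
        for y in range(smax, h - smax):
            for x in range(smax, w - smax):
                S[si, y, x] = patch_entropy(
                    image[y - s : y + s + 1, x - s : x + s + 1])
    return S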
An example result of processing the image in Figure 4.42(a) is shown in Figure 4.42(b), where the
200 most salient points are shown circled, and the radius of the circle is indicative of the scale.
Many of the points are around the walking subject and others highlight significant features in the
background, such as the waste bins, the tree or the time index. An example use of saliency was
within an approach to learn and recognize object class models (such as faces, cars or animals)