
Proceedings of the 2016 Fall Conference, Vol. 23, No. 2 (November 2016)

A Review on Image Feature Detection and Description

Mai Thanh Nhat Truong, Sanghoon Kim*


Department of Electrical, Electronic, and Control Engineering, Hankyong National University
* Corresponding author

Abstract
In computer vision and image processing, feature detection and description are essential parts of many applications that require a representation of objects of interest. Applications such as object recognition or motion tracking will not produce highly accurate results without good features. Because of this importance, research on image features has attracted significant attention and several techniques have been introduced. This paper provides a review of well-known image feature detection and description techniques. In addition, two experiments are conducted to evaluate the performance of the reviewed techniques.

1. Introduction

Recently, image feature detectors and descriptors have become important algorithms in computer vision and image processing. They have been applied widely in many vision-based applications, such as image representation [1], image classification [2], object recognition [3], 3D modeling [4], tracking [5], and biometric systems [6]. Such applications require robust features in the image, which are not only representative but also invariant to noise, scale, or illumination. Therefore, detecting and extracting features from images is an essential part of these applications.

In digital images, the concept of a feature in computer vision refers to a piece of information that represents characteristics of the image. This concept is generally the same as that of a feature in machine learning and pattern recognition, even though image data has a very sophisticated collection of features. Features can be considered the interesting parts of an image, and they are used as a starting point for many computer vision algorithms. Since features are the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its generated features. In general, image features can be categorized as edges, corners, blobs, and ridges.

Feature detection is a method that computes abstractions of image information and makes a local decision at every image point about whether an image feature of a given type exists at that point. Feature detection is a low-level image processing operation. That is, it is usually performed as the first operation on an image and examines every pixel to see whether a feature is present at that pixel. If this is part of a larger algorithm, the larger algorithm will typically examine the image only in the regions of the detected features. After detecting keypoints in an image, we need a method to describe the local properties of the image at those points, hence the name feature description. These algorithms extract interesting information from the image data at the detected keypoints. A common practice is to organize the information provided by a feature description algorithm as the elements of a single vector, commonly referred to as a feature vector. The set of all possible feature vectors constitutes a feature space.

In the following sections, we provide a review of well-known image feature detection and feature description techniques. We also conducted two experiments for the purpose of comparing the performance of the mentioned algorithms.

2. Feature detection and description techniques

2.1 SIFT

The SIFT (Scale-Invariant Feature Transform) [7] algorithm was published by David Lowe in 1999. Its applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife, and match moving. The algorithm is patented in the US; the owner is the University of British Columbia. However, SIFT may be used in academic research.

SIFT is both a feature detector and a feature descriptor. SIFT transforms an image into a large collection of local feature vectors, each of which is invariant to image translation, scaling, and rotation, and partially invariant to illumination changes and affine or 3D projection. Previous approaches to local feature generation lacked invariance to scale and were more sensitive to projective distortion and illumination change. The SIFT features share a number of properties in common with the responses of neurons in the inferior temporal (IT) cortex in primate vision. The SIFT author also describes improved approaches to indexing and model verification. The scale-invariant features are efficiently identified using a staged filtering approach. The first stage identifies key locations in scale space by looking for locations that are maxima or minima of a difference-of-Gaussian function. Each such point is then used to generate a feature vector that describes the local image region sampled relative to its scale-space coordinate frame. The features achieve partial invariance to local variations, such as affine or 3D projections, by blurring image gradient locations. This approach is based on a model of the behavior of complex cells in the cerebral cortex of mammalian vision. The resulting feature vectors are called SIFT keys. Due to its complex computation, the execution time of SIFT is usually high. SIFT also struggles with strong affine transformations: its features exhibit the highest matching accuracy for affine transformations of up to about 50 degrees, beyond which the results start to become unreliable.
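To make the detection and description steps concrete, the following is a minimal sketch of running SIFT with the OpenCV library used later in our experiments. It assumes an OpenCV 3.x build with the xfeatures2d contrib module (where SIFT resides in that version); the image file name is a placeholder.

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/xfeatures2d.hpp>  // SIFT lives in the contrib module in OpenCV 3.x
    #include <vector>

    int main() {
        // Load the input image in grayscale; "sample1.png" is a placeholder file name.
        cv::Mat image = cv::imread("sample1.png", cv::IMREAD_GRAYSCALE);

        // Create a SIFT detector/descriptor with default parameters.
        cv::Ptr<cv::xfeatures2d::SIFT> sift = cv::xfeatures2d::SIFT::create();

        // Detect difference-of-Gaussian keypoints and compute their descriptors;
        // each row of 'descriptors' is the 128-dimensional SIFT vector of one keypoint.
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        sift->detectAndCompute(image, cv::noArray(), keypoints, descriptors);
        return 0;
    }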
2.2 SURF

SURF (Speeded Up Robust Features) [8] is partly inspired by the SIFT descriptor. The standard version of SURF is several times faster than SIFT and is claimed by its authors to be more robust against different image transformations than SIFT. SURF was first presented by Herbert Bay et al. at the 2006 European Conference on Computer Vision. An application of the algorithm is patented in the United States.

SURF is a fast and performant scale- and rotation-invariant interest point detector and descriptor. It relies on integral images for image convolutions to reduce computation time, and it builds on the strengths of the leading existing detectors and descriptors (using a fast Hessian matrix-based measure for the detector and a distribution-based descriptor). For feature description, it describes the distribution of Haar wavelet responses within the interest point neighborhood. Integral images are used for speed, and only 64 dimensions are used, reducing the time for feature computation and matching. The indexing step is based on the sign of the Laplacian, which increases the matching speed and the robustness of the descriptor. The important speed gain is due to the use of integral images, which drastically reduce the number of operations for simple box convolutions, independent of the chosen scale. Even without any dedicated optimizations, almost real-time computation without loss in performance is possible, which represents an important advantage for many on-line computer vision applications. Experiments showed that the performance of the Hessian approximation is comparable to, and sometimes even better than, SIFT. The high repeatability is advantageous for camera self-calibration, where accurate interest point detection has a direct impact on the accuracy of the calibration and therefore on the quality of the resulting 3D model. SURF is faster than SIFT in terms of execution time. However, when speed is not critical, SIFT outperforms SURF.
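The speed gain attributed above to integral images can be illustrated directly: once the integral image is built, the sum of any axis-aligned box is obtained from four table lookups, regardless of the box size. The sketch below uses OpenCV's cv::integral; the file name and box coordinates are arbitrary placeholders.

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <iostream>

    int main() {
        cv::Mat image = cv::imread("sample1.png", cv::IMREAD_GRAYSCALE);

        // The integral image has one extra row and column; entry (y, x) holds the
        // sum of all pixels above and to the left of (y, x) in the source image.
        cv::Mat integral;
        cv::integral(image, integral, CV_32S);

        // Sum of the box with top-left (x0, y0) and bottom-right (x1, y1), inclusive,
        // computed with four lookups regardless of the box size.
        int x0 = 10, y0 = 10, x1 = 40, y1 = 40;
        int boxSum = integral.at<int>(y1 + 1, x1 + 1) - integral.at<int>(y0, x1 + 1)
                   - integral.at<int>(y1 + 1, x0) + integral.at<int>(y0, x0);
        std::cout << "Box sum: " << boxSum << std::endl;
        return 0;
    }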

2.3 FAST

FAST (Features from Accelerated Segment Test) [9] is a corner detection method that can be used to extract feature points and later to track and map objects in many computer vision tasks. The FAST corner detector was originally developed by Edward Rosten and Tom Drummond and published in 2006. The most prominent advantage of the FAST corner detector is its computational efficiency. FAST is not a feature descriptor, hence it must be combined with other descriptors in specific applications.

The FAST corner detector uses a circle of 16 pixels (a Bresenham circle of radius 3) to classify whether a candidate point p is actually a corner. Each pixel in the circle is labeled with an integer from 1 to 16, clockwise. If a set of N contiguous pixels in the circle are all brighter than the intensity of the candidate pixel p (denoted by Ip) plus a threshold value t, or all darker than Ip minus t, then p is classified as a corner. FAST needs less execution time than many other well-known feature extraction methods. It is suitable for real-time video processing applications because of its high-speed performance. This feature detector has a special characteristic: it can be improved by using machine learning to select an optimal value for N. Without machine learning, N is usually chosen as 12, which may affect the overall performance. FAST has high levels of repeatability under large aspect changes and for different kinds of features. It was many times faster than other existing corner detectors when it was announced. This is, however, also its weakness: since high speed is achieved by analyzing the fewest pixels possible, the detector's ability to average out noise is reduced.
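The segment test itself is simple to state in code. The following is a minimal sketch of the test for a single candidate pixel, assuming an 8-bit grayscale cv::Mat; it checks only the basic brighter/darker condition on the 16-pixel Bresenham circle and omits the high-speed pre-test, non-maximum suppression, and the machine-learned decision tree of the full detector.

    #include <opencv2/core.hpp>

    // Offsets of the 16 pixels on a Bresenham circle of radius 3, listed clockwise.
    static const int CIRCLE[16][2] = {
        {0,-3},{1,-3},{2,-2},{3,-1},{3,0},{3,1},{2,2},{1,3},
        {0,3},{-1,3},{-2,2},{-3,1},{-3,0},{-3,-1},{-2,-2},{-1,-3}
    };

    // Returns true if N contiguous circle pixels are all brighter than Ip + t
    // or all darker than Ip - t. The caller must ensure the circle fits inside
    // the image (3 <= x < cols - 3 and 3 <= y < rows - 3).
    bool isFastCorner(const cv::Mat& img, int x, int y, int t, int N = 12) {
        int Ip = img.at<uchar>(y, x);
        int state[16];  // +1 brighter, -1 darker, 0 similar
        for (int i = 0; i < 16; ++i) {
            int v = img.at<uchar>(y + CIRCLE[i][1], x + CIRCLE[i][0]);
            state[i] = (v > Ip + t) ? 1 : (v < Ip - t) ? -1 : 0;
        }
        // Look for a run of N equal, non-zero states; scanning 16 + N positions
        // also catches runs that wrap around the circle.
        int run = 1;
        for (int i = 1; i < 16 + N; ++i) {
            if (state[i % 16] != 0 && state[i % 16] == state[(i - 1) % 16]) {
                if (++run >= N) return true;
            } else {
                run = 1;
            }
        }
        return false;
    }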
2.4 BRISK

BRISK (Binary Robust Invariant Scalable Keypoints) [10] is a point-feature detector and descriptor developed by the Autonomous Systems Lab (ETH Zurich, Switzerland). BRISK achieves low computational complexity thanks to the application of a novel scale-space FAST-based detector, in combination with the assembly of a bit-string descriptor from intensity comparisons retrieved by dedicated sampling of each keypoint neighborhood.

In BRISK, points of interest are identified across both the image and scale dimensions using a saliency criterion. In order to boost computational efficiency, keypoints are detected in octave layers of the image pyramid as well as in layers in between. The location and the scale of each keypoint are obtained in the continuous domain via quadratic function fitting. For feature description, a sampling pattern consisting of points lying on appropriately scaled concentric circles is applied at the neighborhood of each keypoint to retrieve gray values: by processing local intensity gradients, the characteristic direction of the feature is determined. Finally, the oriented BRISK sampling pattern is used to obtain pairwise brightness comparison results, which are assembled into the binary BRISK descriptor. Once generated, BRISK keypoints can be matched very efficiently thanks to the binary nature of the descriptor. With a strong focus on computational efficiency, BRISK also exploits the speed savings offered by the SSE instruction set widely supported on today's architectures. BRISK is faster than SIFT and SURF, while using fewer computational resources.
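Because the BRISK descriptor is a binary bit-string, matching reduces to the Hamming distance, which modern CPUs compute very quickly. The following sketch detects and describes keypoints with OpenCV's BRISK and matches them with a brute-force Hamming matcher; the image file names are placeholders and default parameters are assumed.

    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <vector>

    int main() {
        // Hypothetical file names; any pair of images of the same object will do.
        cv::Mat img1 = cv::imread("object.png", cv::IMREAD_GRAYSCALE);
        cv::Mat img2 = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);

        // BRISK is both detector and descriptor; descriptors are binary bit-strings.
        cv::Ptr<cv::BRISK> brisk = cv::BRISK::create();
        std::vector<cv::KeyPoint> kps1, kps2;
        cv::Mat desc1, desc2;
        brisk->detectAndCompute(img1, cv::noArray(), kps1, desc1);
        brisk->detectAndCompute(img2, cv::noArray(), kps2, desc2);

        // Binary descriptors are compared with the Hamming distance, which reduces
        // to XOR followed by a population count and is therefore very fast.
        cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
        std::vector<cv::DMatch> matches;
        matcher.match(desc1, desc2, matches);
        return 0;
    }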
2.5 ORB

SIFT uses a 128-dimensional vector of floating-point numbers for its descriptors, so each descriptor takes roughly 512 bytes. Similarly, SURF takes a minimum of 256 bytes (for the 64-dimensional version). Creating such vectors for thousands of features consumes a large amount of memory, which is not feasible for resource-constrained applications, especially embedded systems. The larger the memory footprint, the longer matching takes. BRIEF (Binary Robust Independent Elementary Features) [11] helps reduce this resource consumption. ORB (Oriented FAST and Rotated BRIEF) [12] is a combination of FAST and BRIEF with some improvements; hence ORB is both a detector and a descriptor. The main goal of ORB is to reduce resource consumption. The contribution of ORB is the addition of a fast and accurate orientation component to FAST, and an efficient computation of oriented BRIEF features. Many keypoint detectors include an orientation operator (SIFT and SURF are two prominent examples), but FAST does not. There are various ways to describe the orientation of a keypoint; many of them involve histograms of gradient computations. ORB, however, uses the intensity centroid technique to compute an orientation for FAST keypoints. The ORB authors also propose a learning method for de-correlating BRIEF features under rotational invariance, leading to better performance in nearest-neighbor applications.
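The orientation component that ORB adds to FAST is based on the intensity centroid: the vector from the patch center to the centroid of the patch intensities defines the keypoint angle. The following is a simplified sketch of that idea on a square grayscale patch; the real ORB implementation uses a circular patch and a precomputed, optimized scheme.

    #include <opencv2/core.hpp>
    #include <cmath>

    // Orientation of a keypoint by the intensity centroid. 'patch' is a small
    // grayscale patch (e.g. 31x31) centered on the keypoint; the angle is
    // returned in radians.
    double intensityCentroidAngle(const cv::Mat& patch) {
        double m01 = 0.0, m10 = 0.0;            // first-order image moments
        int cx = patch.cols / 2, cy = patch.rows / 2;
        for (int y = 0; y < patch.rows; ++y) {
            for (int x = 0; x < patch.cols; ++x) {
                int I = patch.at<uchar>(y, x);
                m10 += (x - cx) * I;            // moments about the patch center
                m01 += (y - cy) * I;
            }
        }
        // The vector from the corner to the intensity centroid gives the orientation.
        return std::atan2(m01, m10);
    }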
2.6 FREAK

FREAK (Fast Retina Keypoint) [13] is inspired by the human visual system, and more precisely by the retina. A cascade of binary strings is computed by efficiently comparing pairs of image intensities over a retinal sampling pattern. The authors claim that FREAK descriptors are in general faster to compute, have a lower memory load, and are more robust than SIFT, SURF, or BRISK. They are competitive alternatives to existing keypoints, in particular for embedded applications. FREAK is not a feature detector; it can only be applied to keypoints that have already been detected by another feature detection algorithm. In this algorithm, selecting pairs so as to reduce the dimensionality of the descriptor yields a highly structured pattern that mimics the saccadic search of the human eye.
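Since FREAK is descriptor-only, it is paired with a separate detector in practice, which is also how it is used in the experiments below (FAST for detection, FREAK for description). A minimal sketch with OpenCV, assuming a 3.x build with the xfeatures2d contrib module, might look as follows; the FAST threshold and file name are placeholder values.

    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/xfeatures2d.hpp>
    #include <vector>

    int main() {
        cv::Mat image = cv::imread("sample1.png", cv::IMREAD_GRAYSCALE);

        // FAST provides the keypoints only; the threshold of 20 is an arbitrary example.
        cv::Ptr<cv::FastFeatureDetector> fast = cv::FastFeatureDetector::create(20);
        std::vector<cv::KeyPoint> keypoints;
        fast->detect(image, keypoints);

        // FREAK then computes a binary descriptor at each detected keypoint.
        cv::Ptr<cv::xfeatures2d::FREAK> freak = cv::xfeatures2d::FREAK::create();
        cv::Mat descriptors;
        freak->compute(image, keypoints, descriptors);
        return 0;
    }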


3. Experiments

In this section, we compare the performance of the feature detection and feature description techniques. The algorithms are implemented in C++ under Microsoft Windows 7, using the OpenCV library to process the image data. The test system has 4 GB of RAM and a quad-core Intel CPU running at 3.0 GHz. The performance of the algorithms is evaluated in two experiments. In the first experiment, five feature detection techniques are used to detect interest points in three images, each containing a single object. The performance of each technique is evaluated by the number of detected features, the goodness of the features, and the execution time. In the second experiment, five feature description techniques are used to locate the three objects from the first experiment, which are now placed in a cluttered scene.
3.1 Feature detection evaluation

In this experiment, five feature detectors are compared. The selected algorithms are SIFT, SURF, FAST, BRISK, and ORB. The selected detectors are applied to three images to locate keypoints, each image containing a single object. The three sample images are shown in Figure 1.

Figure 1. Three sample images, each containing a single object.

The results of this experiment are shown in Figure 2, Table 1, and Table 2. As shown in Figure 2, the keypoints detected by SIFT have the best distribution; in other words, SIFT keypoints cover the important parts of the object with reasonable density. As shown in Table 1, SURF and BRISK detect more keypoints than the other methods, and their keypoint distributions have high density. FAST and ORB detect keypoints mainly in the text regions. Despite giving the best results, SIFT requires the highest execution time. On balance, considering both the detected keypoints and the execution time, SURF produces the most reasonable results.

Figure 2. Feature detection results for the first sample from SIFT, SURF, FAST, BRISK, and ORB (from left to right, top to bottom).
Table 1. Number of detected features
         Sample 1   Sample 2   Sample 3
SIFT     604        442        389
SURF     786        573        245
FAST     409        207        26
BRISK    1662       991        258
ORB      454        483        367

Table 2. Execution time
         Sample 1   Sample 2   Sample 3
SIFT     0.2523s    0.1747s    0.1088s
SURF     0.0420s    0.0398s    0.0191s
FAST     0.0010s    0.0007s    0.0002s
BRISK    0.0690s    0.0426s    0.0165s
ORB      0.0222s    0.0142s    0.0088s
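The execution times in Table 2 were obtained by timing the detection step alone. A minimal sketch of how such a measurement can be made with OpenCV's tick counter follows; the detector, image name, and reported numbers are placeholders and depend entirely on the hardware and build.

    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        cv::Mat image = cv::imread("sample1.png", cv::IMREAD_GRAYSCALE);
        cv::Ptr<cv::Feature2D> detector = cv::ORB::create();  // any detector under test

        // Time only the detection call, in seconds.
        std::vector<cv::KeyPoint> keypoints;
        double t0 = (double)cv::getTickCount();
        detector->detect(image, keypoints);
        double seconds = ((double)cv::getTickCount() - t0) / cv::getTickFrequency();

        std::cout << keypoints.size() << " keypoints in " << seconds << " s" << std::endl;
        return 0;
    }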


3.2 Feature description evaluation

In this experiment, five feature description techniques are used to locate the three objects from the first experiment, now placed in cluttered scenes among other objects with arbitrary positions. The selected algorithms are SIFT, SURF, BRISK, and ORB. A combination of FAST and FREAK is also used in this experiment: because FAST is detector-only and FREAK is descriptor-only, they cannot be used separately for the object localization test.

Keypoints are matched using FLANN (Fast Library for Approximate Nearest Neighbors) [14]. Only good matches, which have low distance, are kept. Table 3 shows the error rates of the five algorithms; the lower, the better. The error rate is calculated as the percentage of mismatched keypoints over the total number of matches. Figure 3 shows a sample matching result from SIFT. As can be seen, there are 2 incorrect matches out of 33 matches, hence the error rate is 6.06%. Figure 4 shows a 100% accuracy result from SURF. However, there are only 5 matches, so the object may not be recognized if we choose a high threshold. Figures 5 and 6 show the results of SIFT and SURF in another test. This time SIFT retains reasonable results while SURF produces results with zero accuracy. In this experiment, SIFT produces the best overall results.

Table 3. Error rate comparison
              Sample 1   Sample 2   Sample 3
SIFT          6.06%      15.78%     21.05%
SURF          100%       0%         64.28%
FAST+FREAK    86.85%     63.39%     63.63%
BRISK         100%       9.37%      30.59%
ORB           100%       93.82%     37.12%

Figure 3. Matching result from SIFT.

Figure 4. Matching result from SURF.

Figure 5. Matching result from SIFT on the second sample.

Figure 6. Bad matching result from SURF.
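As an illustration of the matching and filtering step described above, the sketch below uses OpenCV's FLANN-based matcher with a k-nearest-neighbor search and a ratio test to keep only low-distance matches; the 0.7 threshold is an assumed value, not one taken from our experiments. For binary descriptors (BRISK, ORB, FAST+FREAK), a Hamming-distance brute-force matcher or FLANN's LSH index would be used instead.

    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>
    #include <vector>

    // desc1 and desc2 are float descriptor matrices (e.g. from SIFT or SURF),
    // one row per keypoint.
    std::vector<cv::DMatch> goodMatches(const cv::Mat& desc1, const cv::Mat& desc2) {
        cv::FlannBasedMatcher matcher;               // approximate nearest neighbors
        std::vector<std::vector<cv::DMatch>> knn;
        matcher.knnMatch(desc1, desc2, knn, 2);      // two nearest neighbors per query

        std::vector<cv::DMatch> good;
        const float ratio = 0.7f;                    // assumed ratio-test threshold
        for (const auto& m : knn) {
            // Keep a match only if it is clearly better than the second-best one.
            if (m.size() == 2 && m[0].distance < ratio * m[1].distance)
                good.push_back(m[0]);
        }
        return good;
    }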
4. Conclusions

Recently, image feature detectors and descriptors have become important algorithms in computer vision and image processing. In this study we reviewed several feature detection and feature description techniques. Two experiments were conducted to compare the performance of the algorithms. From the experimental results, SURF produces the best overall performance in feature detection, while SIFT dominates in the object localization tests. In future work, we will try to improve the accuracy of point feature detectors and descriptors for object recognition in various environments.

5. Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2015R1D1A1A01057518).

References

[1] Yap, T., Jiang, X. and Kot, A. C.: Two-dimensional polar harmonic transforms for invariant image representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7):1259-1270, 2010.
[2] Liu, S. and Bai, X.: Discriminative features for image classification and retrieval. Pattern Recognition Letters, 33(6):744-751, 2012.
[3] Andreopoulos, A. and Tsotsos, J.: 50 years of object recognition: directions forward. Computer Vision and Image Understanding, 117(8):827-891, 2013.
[4] Moreels, P. and Perona, P.: Evaluation of features detectors and descriptors based on 3D objects. International Journal of Computer Vision, 73(3):263-284, 2007.
[5] Takacs, G., Chandrasekhar, V., Tsai, S., Chen, D., Grzeszczuk, R. and Girod, B.: Rotation-invariant fast features for large-scale recognition and real-time tracking. Signal Processing: Image Communication, 28(4):334-344, 2013.
[6] Mian, A., Bennamoun, M. and Owens, R.: Keypoint detection and local feature matching for textured 3D face recognition. International Journal of Computer Vision, 79(1):1-12, 2008.
[7] Lowe, D. G.: Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, volume 2, pages 1150-1157, 1999.
[8] Bay, H., Ess, A., Tuytelaars, T. and Van Gool, L.: Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 110(3):346-359, 2008.
[9] Rosten, E. and Drummond, T.: Machine learning for high-speed corner detection. In Proceedings of the 9th European Conference on Computer Vision, volume 2, pages 430-443, 2006.
[10] Leutenegger, S., Chli, M. and Siegwart, R. Y.: BRISK: Binary Robust Invariant Scalable Keypoints. In Proceedings of the 2011 International Conference on Computer Vision, volume 1, pages 2548-2555, 2011.
[11] Calonder, M., Lepetit, V., Strecha, C. and Fua, P.: BRIEF: Binary Robust Independent Elementary Features. In Proceedings of the 11th European Conference on Computer Vision, volume 1, pages 778-792, 2010.
[12] Rublee, E., Rabaud, V., Konolige, K. and Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision, volume 1, pages 2564-2571, 2011.
[13] Ortiz, R.: FREAK: Fast Retina Keypoint. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 510-517, 2012.
[14] Muja, M. and Lowe, D. G.: Fast approximate nearest neighbors with automatic algorithm configuration. In Proceedings of the VISAPP International Conference on Computer Vision Theory and Applications, volume 1, pages 331-340, 2009.
