min_{w,b,ξ}  (1/2) w^T w + C Σ_{i=1}^{l} ξ_i,  with C > 0    (3)

subject to  y_i(w^T φ(x_i) + b) ≥ 1 − ξ_i,  where ξ_i ≥ 0.
Here the training vectors x_i are mapped into a higher (possibly infinite) dimensional space by the function φ. The SVM then finds a linear separating hyperplane with maximal margin in this higher-dimensional space; C is the penalty parameter of the error term. Furthermore, K(x_i, x_j) = φ(x_i)^T φ(x_j) is called the kernel function. In this work a radial basis function (RBF) was used as the kernel function:
K(x_i, x_j) = exp(−γ ||x_i − x_j||²),  γ > 0    (4)
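As a concrete illustration of equation (4), a minimal NumPy sketch; the γ value and sample vectors are arbitrary choices, not from the paper:

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=1.0):
    """RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), gamma > 0."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

x1 = np.array([1.0, 2.0])
x2 = np.array([1.5, 2.5])
print(rbf_kernel(x1, x1))  # identical vectors -> 1.0
print(rbf_kernel(x1, x2))
```

Note that the kernel is symmetric and takes its maximum value 1 when the two vectors coincide.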
The K-nearest neighbor (KNN) classifier is a well-known and very simple statistical classification method that has nevertheless proven very effective in a wide variety of applications. It works by using the distance from the query instance x_j to the training samples (x_i, y_i) to determine the K nearest neighbors. Once the K nearest neighbors are gathered, a simple majority vote among them gives the prediction for the query instance.
In this work we employ a 5-nearest neighbor classifier, and we use a parameter τ as a threshold for considering a good match between a candidate descriptor x_j and the selected most-voted feature class V:

τ = (K / l²) Σ_{j=1}^{l} d(x_j, x)    (5)

where x_j ∈ V, d(x_j, x) is the Euclidean distance and l is the number of votes for the most-voted class V. In this way the average Euclidean distance is used as a threshold, but penalized according to the number of votes received by V with respect to K.
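The 5-NN vote and the threshold τ of equation (5) can be sketched as follows; the function name and the toy data layout (descriptors as rows of a NumPy array with parallel labels) are our assumptions for illustration:

```python
import numpy as np
from collections import Counter

def knn_match(x, descriptors, labels, K=5):
    """Classify query descriptor x by majority vote among its K nearest
    neighbors and return the threshold tau of equation (5)."""
    dists = np.linalg.norm(descriptors - x, axis=1)   # Euclidean distances
    nn = np.argsort(dists)[:K]                        # K nearest neighbors
    votes = Counter(labels[i] for i in nn)
    V, l = votes.most_common(1)[0]                    # most-voted class, vote count l
    in_V = [i for i in nn if labels[i] == V]
    # tau = (K / l^2) * sum of distances to the voters of class V:
    # the average distance, penalized by the vote ratio l/K.
    tau = (K / l**2) * sum(dists[i] for i in in_V)
    return V, tau

descs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
labels = np.array(["A", "A", "A", "B", "B", "B"])
V, tau = knn_match(np.array([0.05, 0.05]), descs, labels, K=5)
```

A class that gathers more of the K votes is penalized less, so a smaller τ results for the same average distance.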
ICAD and PCAD methods
Many ICA and PCA algorithms are available. A
computationally efficient ICA algorithm, called the
FastICA [10] algorithm and the PCA Snapshot Method
[11] have been chosen for this work.
When the KLT (small-baseline tracker) locates a feature (feature i at frame f), a p-by-p pixel window around the feature center is stored as a vector u_fi of length p·p, with a distinctive label; in the following frames the feature is tracked and the above process is repeated, storing each window under the same label. The vectors with the same label (same feature i) are then grouped into a matrix U_i = [u_1i, …, u_ni]^T, where n is the number of frames in which the feature has been tracked.

Then ICA or PCA is applied to each matrix U_i as shown in section 3.2, along with a dimensional reduction that retains only the largest eigenvalue. At the output of the ICA or PCA we obtain a descriptor q_i whose dimension equals the feature window size. The descriptors are stored in a database with a unique label for each feature.
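The PCA branch of this step can be sketched with the snapshot idea (eigendecompose the small n-by-n matrix U Uᵀ rather than the large covariance); mean-centering and unit normalization of the descriptor are our assumptions, not spelled out in the text:

```python
import numpy as np

def pca_descriptor(U):
    """Compute one descriptor from the n tracked window vectors in U
    (shape n x d, one p*p window per row) by keeping only the component
    with the largest eigenvalue (snapshot method: eigendecompose the
    small n x n matrix U U^T instead of the d x d covariance)."""
    U = np.asarray(U, dtype=float)
    Uc = U - U.mean(axis=0)          # center the window vectors
    S = Uc @ Uc.T                    # small n x n matrix
    w, V = np.linalg.eigh(S)         # eigenvalues in ascending order
    a = V[:, -1]                     # coefficients of the top component
    q = Uc.T @ a                     # map back to window space (length d)
    return q / np.linalg.norm(q)     # unit-norm descriptor
```

The snapshot trick keeps the eigendecomposition at size n (number of frames), which is typically far smaller than the window dimension d = p·p.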
In the recognition phase, features are detected, but not tracked, by the KLT in each incoming frame. Then for each detected feature a window is obtained in the same way as in the learning stage and stored in a vector v_i; ICA or PCA is applied directly to this vector, without dimensional reduction, producing a descriptor x_j. A fast k-nearest neighbor algorithm is applied to the database in order to find the two nearest neighbor descriptors q_i1 and q_i2. Let k_1 = d(x_j, q_i1) and k_2 = d(x_j, q_i2) (d is the Euclidean distance), with k_1 ≤ k_2, and let ρ = k_1 / k_2. This factor is used by our algorithm as a threshold for considering a good match between the candidate descriptor x_j and its corresponding nearest descriptors q_i1 and q_i2 in the database. When ρ tends to 0 there is a large separation between the two nearest candidates and, empirically, the results are better.
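This 2-nearest-neighbor ratio check can be sketched as follows; the acceptance cutoff on ρ and the data layout are illustrative assumptions:

```python
import numpy as np

def ratio_match(x, database, max_ratio=0.6):
    """Accept a match for descriptor x only if the ratio rho = k1/k2 of the
    distances to its two nearest database descriptors is small enough,
    i.e. the best match is clearly closer than the second best."""
    dists = np.linalg.norm(database - x, axis=1)
    i1, i2 = np.argsort(dists)[:2]        # indices of the 2 nearest neighbors
    k1, k2 = dists[i1], dists[i2]         # k1 <= k2 by construction
    rho = k1 / k2 if k2 > 0 else 1.0
    return (i1, rho) if rho < max_ratio else (None, rho)
```

A query roughly equidistant from two database descriptors yields ρ close to 1 and is rejected as ambiguous.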
SVM-ICA and SVM-PCA methods
We used LIBSVM [12] for the implementation of the SVM. The method follows exactly the same steps as the feature-class method (section 2). In step 3 (descriptor creation), unlike the ICAD and PCAD methods, the ICA or PCA is applied directly to the vector u_fi obtained from the p-by-p pixel window (step 2), and the result is stored in the database (step 4). At the output of the ICA or PCA we obtain a descriptor with the same dimension as the pixel window.

For step 5 (feature-class creation) we use the descriptor database to train an SVM classifier with a radial basis function (RBF) as the kernel function, equation 4. The penalty parameter C = 8 and the RBF parameter γ = 1 were selected by cross-validation and grid search. In the recognition phase the SVM output model is used to predict the feature class V of the candidate descriptor x_j, as explained in section 2 (recognition phase).
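A minimal sketch of this training step, assuming scikit-learn is available (its SVC class wraps LIBSVM); only the RBF kernel, C = 8 and γ = 1 come from the text, while the toy descriptors and labels are invented:

```python
import numpy as np
from sklearn.svm import SVC

# Toy "descriptor database": two well-separated feature classes (invented data).
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# RBF-kernel SVM with the parameters reported in the text.
clf = SVC(kernel="rbf", C=8, gamma=1)
clf.fit(X_train, y_train)

# Predict the feature class V of a candidate descriptor x_j.
x_j = np.array([[0.05, 0.05]])
V = clf.predict(x_j)[0]
```

In practice the grid search would sweep C and γ over, e.g., powers of two, scoring each pair with k-fold cross-validation on the descriptor database.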
KNN-ICA and KNN-PCA methods
For the implementation of KNN we employ a computationally efficient algorithm called approximate nearest neighbor (ANN). The method follows the same steps as SVM-ICA and SVM-PCA, except that no training model is generated from the descriptor database. Prior to the recognition phase, the whole database is loaded into memory by the ANN algorithm. In the experiments we used 5-NN. In the recognition phase, ANN is applied as explained in section 3.3. A threshold τ is used for considering a good match between a candidate descriptor x_j and the selected most-voted feature class V (equation 5).
IV. EXPERIMENTS
We have implemented a C++ version of the methods that runs on a PC with a 2 GHz Pentium IV processor and 512 MB of RAM. An inexpensive USB webcam with a maximum resolution of 640-by-480 pixels at 30 fps has been used.
We performed a variety of experiments in order to show the performance of the aforementioned methods. For each method, in the learning phase a video sequence of a rigid desktop scene was recorded, moving the camera slowly and continuously in order to obtain a change of some degrees in the 3D point of view and in the rotation of the camera. Then twenty descriptors were created from this video sequence as described in section 3. In Fig. 1 (upper and lower center) the scene used in the learning phase is shown.

Together with these 20 descriptors, the database contains another 1000 descriptors corresponding to other video sequences. The objective of these experiments is to observe the response of the methods when a set of descriptors coming from an online video sequence (close to the learning sequence, as explained below) is matched against the descriptor database. The response was observed in four different situations, which can be seen in Fig. 1:
case a) change in 3D viewpoint with respect to the
position in the learning phase and little change in
illumination,
case b) change in 3D point of view plus change in
rotation and little change in illumination,
case c) change in 3D point of view plus change in
scale (camera zoom) and little change in illumination,
case d) change in 3D point of view plus great change
in illumination.
We define the response of the methods in terms of two measurements:

1) Error of classification: the ratio between the percentage of false positives and the percentage of classification. We define the percentage of classification as the ratio between the total number of features classified in the scene (correctly or incorrectly) and the total that could potentially be matched (we consider only a finite number of possible locations in each frame to be matched, step 1 of the recognition phase). A direct trade-off exists between the threshold and the percentage of classification: the lower the threshold, the lower the number of false positives, but consequently the percentage of classification is also low. Depending on the type of algorithm it is possible to establish a threshold for considering a good match, as in the KNN approach; in the case of SVM this is not possible and it amounts to choosing the best candidate. Since we consider that the distance between the hyperplane and the candidate is not a good quality measure, we take the percentage of classification for SVM to be 100 percent; therefore 38 percent false positives in SVM means a 38 percent classification error. In KNN, the percentage of false positives for a threshold τ = 0.6 could be 16 percent while the percentage of classification is 50 percent; consequently the error of classification is 0.32. The best case is 0 percent false positives with 100 percent classification.
2) Computational cost: we have measured the time per frame (equivalently, the frame rate) that the different methods take to classify 30 possible features against the 1000 descriptors in the database.
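The error-of-classification measurement defined above reduces to a one-line helper; the 16%/50% figures below are the KNN example from the text:

```python
def classification_error(false_positive_pct, classification_pct):
    """Error of classification: ratio between the percentage of false
    positives and the percentage of classification."""
    return false_positive_pct / classification_pct

# KNN example from the text: 16% false positives at 50% classification.
err = classification_error(16.0, 50.0)   # -> 0.32
```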
The results for the experiments and the measurements are shown in Table 1.
Fig. 1. Examples of frames used in the learning phase (upper and lower central images) and frames used in the recognition phase: case a (lower right), case b (upper right), case c (lower left), case d (upper left).
Table 1. Error of classification for each condition case (a, b, c and d) in the recognition phase and their computational cost. CPU* does not include the time to detect features by the KLT tracker.

Case   PCAD      ICAD      SVM-PCA   SVM-ICA   KNN-PCA   KNN-ICA
a      .40       .35       .33       .21       .26       .21
b      .47       .45       .44       .38       .31       .32
c      .47       .33       .41       .28       .50       .35
d      .48       .47       .43       .42       .37       .32
CPU    5.34 Hz   5.05 Hz   2.32 Hz   2.20 Hz   5.34 Hz   3.84 Hz
CPU*   16.30 Hz  10.70 Hz  3.22 Hz   2.94 Hz   21.72 Hz  7.09 Hz
V. CONCLUSIONS
In this work we presented a general feature-class method for wide-baseline image feature matching using statistical classification methods and descriptors, as well as a comparative study of different statistical methods with the objective of observing their relative performance and effectiveness under the feature-class methodology.

In general, ICA-based descriptors show a lower error of classification than PCA-based descriptors, but the computational cost of ICA is greater than that of PCA. We also observe a lower error of classification in the descriptor feature-class methods (SVM-PCA, SVM-ICA, KNN-PCA and KNN-ICA) compared with the descriptor-feature methods (PCAD and ICAD), and a similar computational cost for PCAD and ICAD compared with KNN-PCA and KNN-ICA. Among the descriptor feature-class methods, KNN shows better performance than SVM in both error of classification and computational speed.

The results show the effectiveness of the feature-class method even though the descriptor-feature methods are based on multiple adjacent views.
It is very important to note the principal objective of this work: the feature-class methodology can improve the performance of wide-baseline image feature matching, since invariance to changes such as illumination or point of view is difficult to achieve with a single-view descriptor. The key idea is to learn the variations of image features over time using statistical methods. This is independent of the kind of descriptor used to represent the image feature. For this reason, the experimental results of the different descriptor methods presented in this work have to be interpreted relative to one another.
In future work we want to try a more robust descriptor such as SIFT. Looking at the results of this work, we can expect that using SIFT [4] together with the feature-class method will give better performance than using SIFT alone in applications such as mobile robots, where an incoming video stream is available.
REFERENCES
[1] J. Shi, C. Tomasi, Good features to track, Proc. IEEE CVPR,
1994.
[2] C. Harris, M. Stephens, A combined corner and edge detector,
Alvey Vision Conf., 1988.
[3] K. Mikolajczyk, C. Schmid, An affine invariant interest point
detector, Proc. ECCV, 2002.
[4] D. Lowe, Object recognition from local scale-invariant
features, Proc. ICCV, Corfu, Greece, September 1999.
[5] D. G. Lowe, Distinctive image features from scale-invariant
keypoints, International Journal of Computer Vision, 60 (2):91-
110, 2004.
[6] J. Meltzer, M.-H. Yang, R. Gupta, S. Soatto, Multiple view feature descriptors from image sequences via kernel principal component analysis, Proc. ECCV, 2004.
[7] I. T. Jolliffe. Principal Component Analysis. Springer Verlag,
1986.
[8] P. Comon, Independent component analysis, a new concept?,
Signal Processing, Elsevier, 36(3):287-314, April 1994.
[9] B. E. Boser, I. Guyon, and V. Vapnik, A training algorithm for
optimal margin classifiers, Proc. of the Fifth Annual Workshop
on Computational Learning Theory 5, pp. 144-152, 1992.
[10] A. Hyvarinen, E. Oja. A fast fixed-point algorithm for
independent component analysis, Neural Computation, 9
(7):1483-1492, 1997.
[11] L. Sirovich, Turbulence and the dynamics of coherent structures,
Part 1: Coherent Structure, Quarterly of Applied Mathematics,
45 (3):561-571, October 1987.
[12] Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.