SSVM: A Simple SVM Algorithm
Abstract - We present a fast iterative algorithm for identifying the Support Vectors of a given set of points. Our algorithm works by maintaining a candidate Support Vector set. It uses a greedy approach to pick points for inclusion in the candidate set. When the addition of a point to the candidate set is blocked because of other points already present in the set, we use a backtracking approach to prune away such points. To speed up convergence we initialize our algorithm with the nearest pair of points from opposite classes. We then use an optimization based approach to increment or prune the candidate Support Vector set. The algorithm makes repeated passes over the data to satisfy the KKT constraints. The memory requirements of our algorithm scale as O(|S|^2) in the average case, where |S| is the size of the Support Vector set. We show that the algorithm is extremely competitive as compared to other conventional iterative algorithms like SMO and the NPA. We present results on a variety of real-life datasets to validate our claims.

Corresponding author - M. Narasimha Murty

I. Introduction

Support Vector Machines (SVM) have recently gained prominence in the field of machine learning and pattern classification [8]. Classification is achieved by realizing a linear or non-linear separation surface in the input space.

In Support Vector classification, the separating function can be expressed as a linear combination of kernels associated with the Support Vectors as

f(x) = \sum_{x_j \in S} \alpha_j y_j K(x_j, x) + b

where x_i denotes the training patterns, y_i ∈ {+1, -1} denotes the corresponding class labels and S denotes the set of Support Vectors [8].

The dual formulation yields

\min_{0 \le \alpha_i \le C} W = \frac{1}{2} \sum_{i,j} \alpha_i Q_{ij} \alpha_j - \sum_i \alpha_i + b \sum_i y_i \alpha_i        (1)

where α_i are the corresponding coefficients, b is the offset, Q_ij = y_i y_j K(x_i, x_j) is a symmetric positive definite kernel matrix and C is the parameter used to penalize error points in the inseparable case [8].

The Karush-Kuhn-Tucker (KKT) conditions for the dual can be expressed as

g_i = \frac{\partial W}{\partial \alpha_i} = \sum_j Q_{ij} \alpha_j + y_i b - 1 = y_i f(x_i) - 1        (2)

\frac{\partial W}{\partial b} = \sum_j y_j \alpha_j = 0        (3)

This partitions the training set into S the Support Vector set (0 < α_i < C, g_i = 0), E the error set (α_i = C, g_i < 0) and R the well classified set (α_i = 0, g_i > 0) [2].
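As a small illustration (ours, not code from the paper) of how the quantities g_i of Equation 2 and the resulting S, E and R index sets might be computed, assuming Q, alpha, y and b are NumPy arrays/scalars and the helper names compute_g and partition_training_set are hypothetical:

```python
import numpy as np

def compute_g(Q, alpha, y, b):
    """g_i = sum_j Q_ij alpha_j + y_i b - 1  (Equation 2)."""
    return Q @ alpha + y * b - 1.0

def partition_training_set(Q, alpha, y, b, C, tol=1e-8):
    """Split indices into S (support), E (error) and R (well classified)
    according to the KKT conditions described in the text."""
    g = compute_g(Q, alpha, y, b)
    S = np.where((alpha > tol) & (alpha < C - tol))[0]   # 0 < alpha_i < C, g_i = 0
    E = np.where((alpha >= C - tol) & (g < 0))[0]        # alpha_i = C, g_i < 0
    R = np.where((alpha <= tol) & (g > 0))[0]            # alpha_i = 0, g_i > 0
    return S, E, R
```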
If the points in error are penalized quadratically with a penalty factor C', then it has been shown that the problem reduces to that of a separable case with C = ∞ [3]. The kernel function is modified as

K'(x_i, x_j) = K(x_i, x_j) + \frac{1}{C'} \delta_{ij}

where δ_ij = 1 if i = j and δ_ij = 0 otherwise. The advantage of this formulation is that the SVM problem reduces to that of a linearly separable case [4].

It can be seen that training the SVM involves solving a quadratic optimization problem, which requires the use of optimization routines from numerical libraries. This step is computationally intensive, can be subject to stability problems and is non-trivial to implement [5]. Attractive iterative algorithms like the Sequential Minimal Optimization (SMO), the Nearest Point Algorithm (NPA), etc. have been proposed to overcome this problem [5], [4]. This paper makes another contribution in this direction.
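To make Equation 1 and the quadratic-penalty kernel modification concrete, here is a minimal sketch (our own illustration, not code from the paper). It builds the matrix Q_ij = y_i y_j K'(x_i, x_j) for a Gaussian kernel and evaluates the separating function of the Introduction; the names gaussian_kernel, modified_kernel, build_Q and decision_function are ours, and the Gaussian normalization exp(-||.||^2 / (2 σ^2)) is an assumption.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma2):
    """Gaussian kernel matrix K[i, j] = exp(-||X_i - Z_j||^2 / (2 * sigma2))."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma2))

def modified_kernel(X, C_prime, sigma2):
    """K'(x_i, x_j) = K(x_i, x_j) + delta_ij / C'  (quadratic-penalty modification)."""
    return gaussian_kernel(X, X, sigma2) + np.eye(len(X)) / C_prime

def build_Q(X, y, C_prime, sigma2):
    """Q_ij = y_i y_j K'(x_i, x_j), the matrix appearing in the dual (Equation 1)."""
    return (y[:, None] * y[None, :]) * modified_kernel(X, C_prime, sigma2)

def decision_function(x, X_sv, y_sv, alpha_sv, b, sigma2):
    """f(x) = sum_{j in S} alpha_j y_j K(x_j, x) + b; the delta_ij / C' term only
    affects training points, so the original kernel is used for prediction."""
    k = gaussian_kernel(X_sv, x[None, :], sigma2).ravel()
    return float(np.dot(alpha_sv * y_sv, k) + b)
```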
A. DirectSVM and Geometric SVM

The DirectSVM is an intuitively appealing algorithm which builds the Support Vector set incrementally [6]. Recently it has been proved that the closest pair of points of the opposite class are always Support Vectors [7]. DirectSVM starts off with this pair of points in the candidate Support Vector set (a small sketch of this initialization step is given below). It has been conjectured that the maximum violator during each
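The initialization step mentioned above can be sketched as follows (our illustration; the helper name closest_opposite_pair is hypothetical). It finds the closest pair of points from opposite classes, measuring distance in the kernel-induced feature space via ||φ(x_i) - φ(x_j)||^2 = K_ii + K_jj - 2 K_ij.

```python
import numpy as np

def closest_opposite_pair(K, y):
    """Return indices (i, j) of the closest pair of points with y_i != y_j,
    where K is the full kernel matrix over the training set."""
    pos = np.where(y == 1)[0]
    neg = np.where(y == -1)[0]
    diag = np.diag(K)
    # Pairwise squared feature-space distances between the two classes.
    d2 = diag[pos][:, None] + diag[neg][None, :] - 2.0 * K[np.ix_(pos, neg)]
    i, j = np.unravel_index(np.argmin(d2), d2.shape)
    return pos[i], neg[j]
```

These two indices seed the candidate Support Vector set before the main add/prune passes begin.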
From Equations 2 and 3 we get the change in g_i due to the addition of a new point c as

\Delta g_i = Q_{ic} \Delta\alpha_c + \sum_{j \in S} Q_{ij} \Delta\alpha_j + y_i \Delta b        (4)

and

0 = y_c \Delta\alpha_c + \sum_{j \in S} y_j \Delta\alpha_j        (5)

where Δα_i is the change in the value of α_i and Δb is the change in the value of b. We start off with α_c = 0 and update α_c as we go along.

Because all the vectors in S are Support Vectors we know from Equation 2 that g_i = 0 ∀i. If none of the current Support Vectors blocks the addition of c to S then all the vectors in S continue to remain Support Vectors in S ∪ {c}, and hence we require that Δg_i = 0 for all vectors in S. It is shown in [2] that

\Delta b = \beta \, \Delta\alpha_c        (6)

and

\Delta\alpha_j = \beta_j \, \Delta\alpha_c        (7)

If we define

R = \begin{bmatrix} 0 & y_1 & \cdots & y_S \\ y_1 & Q_{11} & \cdots & Q_{1S} \\ \vdots & \vdots & \ddots & \vdots \\ y_S & Q_{S1} & \cdots & Q_{SS} \end{bmatrix}^{-1}

then

\begin{bmatrix} \beta \\ \beta_1 \\ \vdots \\ \beta_S \end{bmatrix} = -R \begin{bmatrix} y_c \\ Q_{1c} \\ \vdots \\ Q_{Sc} \end{bmatrix}

C. Pruning

In the discussion above we tacitly assumed that (α_p + Δα_p) > 0 ∀ p ∈ S. But this condition may be violated if any point in S blocks c. When we say that a point p ∈ S is blocking the addition of c to S, what we mean is that α_p of that point may become negative due to the addition of c to S. What it physically implies is that p is making a transition from S to the well classified set R. Because of the presence of such points we may not be able to update α_c by the amount specified by Equation 8. In such a case we can prune away p from S by using [2]

R_{ij} \leftarrow R_{ij} - R_{ip} R_{pp}^{-1} R_{pj}, \quad \forall\, i, j \in S

We remove the alpha entry corresponding to p from S so that all the other points in S continue to remain Support Vectors. We now try to add c to this reduced S. We keep pruning points from S till c can actually become a Support Vector.

D. Our Algorithm

Using the ideas we discussed above, an iterative algorithm can be designed which scans through the dataset looking for violators. Using the ideas presented in Section II-B the violator is made a Support Vector. Blocking points are identified and pruned away by using the ideas presented in Section II-C (a sketch of this add/prune bookkeeping is given below).
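The following is a rough sketch, in our own notation, of the bookkeeping behind Sections II-B and II-C: computing β from R (Equations 6-7), growing R when a candidate is accepted, and contracting R when a blocking point is pruned, following the incremental/decremental updates of [2]. The helper names (compute_beta, expand_R, prune_point) are hypothetical, Q is assumed to be the full matrix Q_ij = y_i y_j K'(x_i, x_j), and the step that actually chooses Δα_c (Equation 8) is not reproduced here.

```python
import numpy as np

def compute_beta(R, Q, y, S, c):
    """Equations 6-7: returns [beta, beta_1, ..., beta_|S|] = -R [y_c; Q_Sc].
    The first entry multiplies Delta alpha_c to give Delta b; the remaining
    entries give Delta alpha_j for j in S."""
    v = np.concatenate(([y[c]], Q[S, c]))
    return -R @ v

def expand_R(R, Q, y, S, c):
    """Grow the inverse matrix R when candidate c is accepted into S
    (rank-one bordering, following the incremental update of [2])."""
    v = np.concatenate(([y[c]], Q[S, c]))
    beta = -R @ v
    gamma = Q[c, c] + v @ beta              # Schur complement of the bordered matrix
    n = R.shape[0]
    R_new = np.empty((n + 1, n + 1))
    R_new[:n, :n] = R + np.outer(beta, beta) / gamma
    R_new[:n, n] = beta / gamma
    R_new[n, :n] = beta / gamma
    R_new[n, n] = 1.0 / gamma
    return R_new

def prune_point(R, k):
    """Contract R when the k-th member of S is pruned
    (R_ij <- R_ij - R_ip R_pj / R_pp, the decremental step of [2])."""
    p = k + 1                               # row/column 0 of R belongs to the offset b
    R_new = R - np.outer(R[:, p], R[p, :]) / R[p, p]
    return np.delete(np.delete(R_new, p, axis=0), p, axis=1)

# Outer loop of Section II-D, schematically (R is assumed to be initialized by
# directly inverting the bordered matrix built from the first pair of Support Vectors):
#   repeat until no KKT violators remain:
#       for each point c with g_c < 0:
#           beta = compute_beta(R, Q, y, S, c)
#           while some alpha_p in S would be driven negative:
#               R = prune_point(R, index of p); remove p from S
#           R = expand_R(R, Q, y, S, c); add c to S; update alpha, b via Equations 6-8
```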
with σ² = 0.5 [4]. We vary the value of C' and reproduce our results in Table I.

We used the Adult-1 dataset from the UCI Machine Learning repository [1]. This is a sparse dataset which consists of 1605 data points, each having a dimension of 123. We used the Gaussian kernel with σ² = 10.0 [5]. We vary the value of C' and reproduce our results in Table III.

The Adult-4 dataset is also from the UCI Machine Learning repository [1]. This is a sparse dataset and consists of 4781 data points, each having a dimension of 123. We used the Gaussian kernel with σ² = 10.0 [5]. We vary the value of C' and reproduce our results in Table IV.

TABLE I
NPA vs Simple SVM on the Spirals dataset. Kernel evaluations are ×10^6.

          Simple SVM            NPA
C'        SV    KernelEval      SV    KernelEval
0.03      192   0.038           195   0.104
0.1       192   0.038           195   0.121
0.2       192   0.038           195   0.134
0.3       192   0.038           195   0.143
0.6       192   0.038           195   0.162
1.0       192   0.038           195   0.188
2.0       199   0.038           195   0.238
References
[1] C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.
[2] Gert Cauwenberghs and Tomaso Poggio. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems (NIPS 2000), volume 13. Cambridge, MA: MIT Press, 2001.
[3] T. Friess, N. Cristianini, and C. Campbell. The kernel adatron algorithm: a fast and simple learning procedure for support vector machines. In Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, 1998.