the value of $\gamma$ in $K(x_i, x) = e^{-\gamma \|x_i - x\|^2}$ equal to the optimal one determined via cross-validation. Also the value of C for the soft-margin classifier is optimized via cross-validation. For the incremental techniques we have tested different batch sizes and n_e values. In Tables 1-2 we report the best performances obtained (B is for the batch algorithm). We also report, besides the average classification error rates and standard deviations, the number of support vectors of the resulting classifier, the corresponding size of the condensed set (%), and the number of training cycles the SVM underwent.
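As a concrete illustration of this kind of model selection (not the authors' code), the following Python sketch tunes gamma and C for an RBF soft-margin SVM by grid-search cross-validation with scikit-learn; the candidate grids and the function name select_rbf_svm are hypothetical, since the paper only states that both parameters are chosen by cross-validation.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def select_rbf_svm(X, y):
    # Placeholder grids: the paper does not report which candidate
    # values of C and gamma were tried, only that both are tuned by CV.
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)
    # Best soft-margin RBF classifier and the selected (C, gamma) pair.
    return search.best_estimator_, search.best_params_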
To test the incremental techniques with stream data, we have used the Noisy-crossed-norm dataset (generated as the Large-noisy-crossed-norm dataset described below), and generated streams in batches of size b = 1000, and set w = 3. We have employed the Fixed-partition technique for the incremental updates. At each incremental step, we have tested the performance of the current model using 10 independent test sets of size 1000. We report average classification error rates and classifier sizes over successive steps. For comparison, we have also trained an SVM in batch mode over w = 3 consecutive batches of data over time, and report the average classification error rates obtained at each step.
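The stream experiment just described can be sketched as follows. This is only one plausible reading of the Fixed-partition update (retrain on the support vectors retained from the previous step together with the incoming batch, in the spirit of the incremental scheme of [16]); the function name, the scikit-learn calls, and the fixed C and gamma values are illustrative, not taken from the paper.

import numpy as np
from sklearn.svm import SVC

def fixed_partition_stream(batches, test_sets, C=1.0, gamma=0.1):
    # Sketch of a Fixed-partition incremental SVM over a stream:
    # at each step the model is retrained on the support vectors kept
    # from the previous step plus the new batch; discarded points are
    # never revisited. C and gamma stand in for cross-validated values.
    sv_X, sv_y = None, None
    for t, (X_b, y_b) in enumerate(batches):
        if sv_X is None:
            X_train, y_train = X_b, y_b
        else:
            X_train = np.vstack([sv_X, X_b])
            y_train = np.concatenate([sv_y, y_b])
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        # Keep only the support vectors as the condensed representation.
        sv_X = X_train[clf.support_]
        sv_y = y_train[clf.support_]
        errs = [1.0 - clf.score(X_t, y_t) for X_t, y_t in test_sets]
        print(f"step {t}: avg error {np.mean(errs):.3f}, #SVs {len(sv_X)}")

The batch-mode baseline would instead be retrained from scratch on the most recent w = 3 batches at every step.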
The Problems: Large-noisy-crossed-norm data. This data set consists of n = 20 attributes and J = 2 classes. Each class is drawn from a multivariate normal distribution with unit covariance matrix. One class has mean $2/\sqrt{20}$ along each dimension, and the other has mean $-2/\sqrt{20}$ along each dimension. We have generated 200,000 data points, and performed 5-fold cross-validation with 100,000 training data and 100,000 testing data. Table 1 shows the results; the running times are reported in hours. Experiments were conducted on a 1.3 GHz machine with 1GB of RAM.

Pima Indians Diabetes data. This data set consists of n = 8 attributes, J = 2 classes, and l = 768 instances. Results are shown in Table 2. We performed 10-fold cross-validation with 568 training data and 200 testing data.
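For reference, data with the Large-noisy-crossed-norm distribution described above could be generated along the following lines. This is a sketch based only on the stated description (two unit-covariance Gaussians whose means are +2/sqrt(20) and -2/sqrt(20) along every dimension); the function name and the use of numpy are not from the paper.

import numpy as np

def make_crossed_norm(n_points, n_dims=20, rng=None):
    # Two classes, each a multivariate normal with identity covariance;
    # one class is centered at +mu on every axis, the other at -mu.
    rng = np.random.default_rng(rng)
    mu = 2.0 / np.sqrt(n_dims)
    y = rng.integers(0, 2, size=n_points)           # class labels in {0, 1}
    means = np.where(y[:, None] == 1, mu, -mu)      # per-point mean vector
    X = means + rng.standard_normal((n_points, n_dims))
    return X, y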
Table 1. Results for Large-noisy-crossed-norm data (the first column of values refers to B, the batch algorithm; the remaining columns to the incremental techniques).

batch size     -      500    500    500    500
time (hours)   14     17     20     0.5    22

Table 2. Results for Pima data (columns as in Table 1).

error (%)      31.9   29.3   26.2   27.1   26.4
std dev        0.47   0.02   0.02   0.02   0.02
#SVs           547    291    405    394    399
Cond. set (%)  96     51.2   71.3   69.4   70.2
cycles         -      13     38     34     36
batch size     -      10     10     10     10

Results: Tables 1-2 show that, for both the data sets we have tested, the performance obtained with the incremental techniques comes close to the performance given by the batch algorithm. Moreover, for each problem considered, more than one incremental scheme provides a much smaller condensed set. In particular, the condensation power (1.5%) that the Exceed-margin technique achieves on the Large-noisy-crossed-norm data is quite remarkable, while the technique still performs close to the batch algorithm. The fact that the classifier is kept smaller allows for a much faster computation (30 minutes). The results obtained with the Pima data are also of interest. All four incremental techniques perform better than the batch algorithm and, at the same time, compute a smaller condensed set.

In Figure 1, we plot the results obtained with the stream data for 12 time steps. The average estimator sizes for the incremental and batch techniques are, respectively, 418 and 430. Since the data distribution is stationary, the performance and estimator size remain stable over time. We observe that the incremental technique employed (Fixed-partition) and the batch mode algorithm basically provide the same results, both in terms of performance and size of the model. These results provide clear evidence that, although the incremental techniques allow loss of information, they are capable of achieving accuracy results similar to the batch algorithm, while significantly improving training time.

[Figure 1. Noisy-crossed-norm data: average error rates of the Fixed-partition and batch algorithms for consecutive time steps (x-axis: time steps; y-axis: error rate).]

5. Related Work

The incremental techniques discussed here can be viewed as approximations of the chunking technique employed to train SVMs [13]. The chunking technique is an exact decomposition method that iterates through the training set to select the support vectors.

The incremental methods introduced here, instead, scan the training data only once, and, once discarded, data are not considered anymore. This property also makes the methods suited to be employed within the data stream model. Furthermore, the experiments we have performed show that, although the incremental techniques allow loss of information, they are capable of achieving performance results similar to the batch algorithm.
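To make the contrast with chunking concrete, here is a rough Python sketch of the decomposition idea behind [13]; it uses scikit-learn's SVC and treats misclassified points as a crude stand-in for the KKT/margin violators that true chunking would add to the working set, so it should be read as an illustration of the iteration over the full training set, not as the algorithm of [13] itself.

import numpy as np
from sklearn.svm import SVC

def chunking_svm(X, y, chunk=1000, C=1.0, gamma=0.1, max_iter=50):
    # Working set = current support vectors + points the current model
    # gets wrong; re-solve until the working set stops changing.
    # Unlike the single-pass incremental schemes, every iteration may
    # revisit any point of the full training set.
    work = np.arange(min(len(y), chunk))            # initial chunk
    clf = None
    for _ in range(max_iter):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X[work], y[work])
        sv = work[clf.support_]                     # indices of current SVs
        violators = np.where(clf.predict(X) != y)[0]
        new_work = np.union1d(sv, violators)
        if np.array_equal(new_work, np.sort(work)):
            break                                   # no new violators: done
        work = new_work
    return clf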
6. Conclusions

We have introduced and compared new and existing incremental techniques for constructing SVMs. The experimental results presented show that the incremental techniques are capable of achieving performance results similar to the batch algorithm, while improving the training time. We have extended these approaches to work with stream data, and presented experimental results that show the efficiency and accuracy of the method.

Acknowledgments

This research has been supported by the National Science Foundation under grants NSF CAREER Award 9984729 and NSF IIS-9907477, by the US Department of Defense, and by a research award from AT&T.

References

[1] K. P. Bennett and C. Campbell, "Support Vector Machines: Hype or Hallelujah?", SIGKDD Explorations, Vol. 2, No. 2, 1-13, 2000.

[2] M. Brown, W. Grundy, D. Lin, N. Cristianini, C. Sugnet, T. Furey, M. Ares, and D. Haussler, "Knowledge-based analysis of microarray gene expression data using support vector machines", Tech. Report, University of California in Santa Cruz, 1999.

[3] G. Cauwenberghs and T. Poggio, "Incremental and Decremental Support Vector Machine Learning", Advances in Neural Information Processing Systems, 2000.

[4] P. Domingos and G. Hulten, "Mining high-speed data streams", in Proc. SIGKDD 2000, 71-80, Boston, MA.

[5] V. Ganti, J. Gehrke, and R. Ramakrishnan, "DEMON: Mining and Monitoring Evolving Data", in ICDE 2000, 439-448, San Diego, CA.

[6] S. Guha and N. Koudas, "Data-Streams and Histograms", in Proc. STOC 2001.

[7] S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, "Clustering Data Streams", IEEE Foundations of Computer Science, 2000.

[8] M. R. Henzinger, P. Raghavan, and S. Rajagopalan, "Computing on data streams", SRC Technical Note 1998-011, Digital Systems Research Center, May 26, 1998.

[9] T. Joachims, "Text categorization with support vector machines", Proc. of European Conference on Machine Learning, 1998.

[10] T. Joachims, "Making large-scale SVM learning practical", in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola (eds.), MIT Press, 1999. http://www-ai.cs.uni-dortmund.de/thorsten/svm-light.html

[11] P. Mitra, C. A. Murthy, and S. K. Pal, "Data Condensation in Large Databases by Incremental Learning with Support Vector Machines", International Conference on Pattern Recognition, 2000.

[12] E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: An application to face detection", Proc. of Computer Vision and Pattern Recognition, 1997.

[13] E. Osuna, R. Freund, and F. Girosi, "An improved training algorithm for support vector machines", Proceedings of IEEE NNSP'97, 1997.

[14] J. C. Platt, "Fast Training of Support Vector Machines using Sequential Minimal Optimization", in Advances in Kernel Methods, B. Schölkopf, C. J. C. Burges, and A. J. Smola (eds.), MIT Press, 185-208, 1999.

[15] F. J. Provost and V. Kolluri, "A survey of methods for scaling up inductive learning algorithms", Technical Report ISL-97-3, Intelligent Systems Lab., Department of Computer Science, University of Pittsburgh, 1997.

[16] N. A. Syed, H. Liu, and K. K. Sung, "Incremental Learning with Support Vector Machines", International Joint Conference on Artificial Intelligence (IJCAI), 1999.

[17] V. Vapnik, Statistical Learning Theory, Wiley, 1998.