
Tutorial on Support Vector Machine (SVM)

Vikramaditya Jakkula, School of EECS, Washington State University, Pullman 99164

Abstract: In this tutorial we present a brief introduction to SVM, drawing on published papers, workshop materials, books, and material available online on the World Wide Web. We begin by defining SVM and motivating why SVM, with a brief overview of statistical learning theory. The mathematical formulation of SVM is then presented, and the theory behind its implementation is briefly discussed. Finally, some conclusions on SVM and its application areas are included. Support Vector Machines (SVMs) are competing with Neural Networks as tools for solving pattern recognition problems. This tutorial assumes you are familiar with concepts of linear algebra and real analysis, understand the working of neural networks, and have some background in AI.

Introduction
Machine learning is considered a subfield of artificial intelligence concerned with the development of techniques and methods that enable computers to learn; in simple terms, it is the development of algorithms that enable a machine to learn and perform tasks and activities. Machine learning overlaps with statistics in many ways, and over time many techniques and methodologies have been developed for machine learning tasks [1]. The Support Vector Machine (SVM) was first introduced in 1992 by Boser, Guyon, and Vapnik at COLT-92. Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression [1]. They belong to the family of generalized linear classifiers. In other terms, the SVM is a classification and regression prediction tool that uses machine learning theory to maximize predictive accuracy while automatically avoiding over-fitting to the data. Support vector machines can be defined as systems that use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory. The SVM was initially popular with the NIPS community and is now an active part of machine learning research around the world. SVM became famous when, using pixel maps as input, it gave accuracy comparable to sophisticated neural networks with elaborate features on a handwriting recognition task [2]. It is also being used for many applications, such as handwriting analysis, face analysis and so forth, especially for pattern classification and regression based applications.

The foundations of Support Vector Machines were developed by Vapnik [3] and gained popularity due to many promising features such as better empirical performance. The formulation uses the Structural Risk Minimization (SRM) principle, which has been shown to be superior [4] to the traditional Empirical Risk Minimization (ERM) principle used by conventional neural networks. SRM minimizes an upper bound on the expected risk, whereas ERM minimizes the error on the training data. It is this difference which equips SVM with a greater ability to generalize, which is the goal in statistical learning. SVMs were developed to solve the classification problem, but recently they have been extended to solve regression problems [5].

Statistical Learning Theory
Statistical learning theory provides a framework for studying the problem of gaining knowledge, making predictions, and making decisions from a set of data. In simple terms, it enables choosing the hypothesis space in such a way that it closely represents the underlying function in the target space [6]. In statistical learning theory the problem of supervised learning is formulated as follows. We are given a set of training data {(x1, y1), ..., (xl, yl)} in R^n × R sampled according to an unknown probability distribution P(x, y), and a loss function V(y, f(x)) that measures the error when, for a given x, f(x) is "predicted" instead of the actual value y. The problem consists in finding a function f that minimizes the expectation of the error on new data, that is, finding a function f that minimizes the expected error [6]:

$$\int V(y, f(x)) \, P(x, y) \, dx \, dy$$

In statistical modeling we would choose a model from the hypothesis space which is closest (with respect to some error measure) to the underlying function in the target space. More on statistical learning theory can be found in the introduction to statistical learning theory [7].

Learning and Generalization
Early machine learning algorithms aimed to learn representations of simple functions. Hence, the goal of learning was to output a hypothesis that performed the correct classification of the training data, and early learning algorithms were designed to find such an accurate fit to the data [8]. The ability of a hypothesis to correctly classify data not in the training set is known as its generalization. SVM performs better in terms of not over-generalizing, whereas neural networks may end up over-generalizing easily [11]. Another thing to observe is where to make the best trade-off between complexity and the number of epochs; the illustration below, made from the class notes, brings more information about this to light.

Figure 1: Number of Epochs vs. Complexity. [8][9][11]
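As a small, hedged illustration of the risk formulation above, the sketch below estimates the expected error by the empirical (average) loss over a finite sample; the model f, the loss, and the data are made-up placeholders, not anything prescribed by this tutorial.

```python
import numpy as np

# Empirical risk: the average loss over a finite training sample, used as a
# stand-in for the expected risk integral of V(y, f(x)) P(x, y) dx dy,
# which cannot be evaluated directly because P(x, y) is unknown.
def empirical_risk(f, X, y, loss):
    return np.mean([loss(yi, f(xi)) for xi, yi in zip(X, y)])

squared_loss = lambda y_true, y_pred: (y_true - y_pred) ** 2
f = lambda x: 2.0 * x                      # hypothetical model from the hypothesis space
X = np.array([0.0, 1.0, 2.0, 3.0])         # toy sample, illustrative only
y = np.array([0.1, 1.9, 4.2, 5.8])
print(empirical_risk(f, X, y, squared_loss))
```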


Introduction to SVM: Why SVM?
Working with neural networks for supervised and unsupervised learning has shown good results in many learning applications. MLPs use feed-forward and recurrent networks. Multilayer perceptron (MLP) properties include universal approximation of continuous nonlinear functions, learning with input-output patterns, and advanced network architectures with multiple inputs and outputs [10].

Figure 2: a] Simple Neural Network, b] Multilayer Perceptron. [10][11]. These are simple visualizations, given just to provide an overview of what a neural network looks like.

There are, however, some issues to notice. Neural networks can have many local minima, and determining how many neurons are needed for a task is another problem that decides whether optimality of that NN is reached. Moreover, even when a neural network solution tends to converge, it may not result in a unique solution [11]. Now let us look at an example where we plot the data and try to classify it: we see that there are many hyperplanes that can classify it. But which one is better?

Figure 3: Here we see that there are many hyperplanes that can be fit to classify the data, but which one is the best, i.e. the right or correct, solution? This is where the need for SVM arises. (Taken from Andrew W. Moore, 2003) [2]. Note the legend is not described, as these are sample plots meant only to illustrate the concepts involved.


From the above illustration, there are many linear classifiers (hyperplanes) that separate the data. However, only one of these achieves maximum separation. We need it because, if we use an arbitrary hyperplane to classify, it might end up closer to one set of data points than to the others, which we do not want to happen. Thus the concept of the maximum margin classifier or hyperplane appears as the solution. The next illustration gives a maximum margin classifier example, which provides a solution to the above mentioned problem [8].

Figure 4: Illustration of Linear SVM. (Taken from Andrew W. Moore slides, 2003) [2]. Note the legend is not described, as these are sample plots meant only to illustrate the concepts involved.

The expression for the maximum margin is given as [4][8] (for more information see [4]):

$$\text{margin} \equiv \arg\min_{x \in D} d(x) = \arg\min_{x \in D} \frac{|x \cdot w + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}$$

The above illustration shows the maximum-margin linear classifier; in this context it is an example of a simple linear SVM classifier. Another interesting question is: why maximum margin? There are some good explanations, which include better empirical performance. Another reason is that even if we have made a small error in the location of the boundary, this gives us the least chance of causing a misclassification. A further advantage is avoiding local minima and better classification. Now we try to express the SVM mathematically, and in this tutorial we present a linear SVM. The goals of SVM are to separate the data with a hyperplane and to extend this to non-linear boundaries using the kernel trick [8][11]. For calculating the SVM we see that the goal is to correctly classify all the data. For the mathematical calculations we have:
[a] If yi = +1: w · xi + b ≥ 1
[b] If yi = -1: w · xi + b ≤ -1
[c] For all i: yi (w · xi + b) ≥ 1

In these equations x is a vector point and w is the weight, also a vector. To separate the data, condition [a] (respectively [b]) must hold for every positive (respectively negative) example. Among all possible hyperplanes, SVM selects the one where the distance of the hyperplane from the closest data points is as large as possible: if the training data are good and every test vector is located within radius r of a training vector, then a hyperplane located as far as possible from the data will still classify such test vectors correctly [12]. This desired hyperplane which maximizes the margin also bisects the line between the closest points on the convex hulls of the two datasets. Thus we have [a], [b] & [c]; a small numerical check of these conditions is sketched below.
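The following is a minimal sketch, with made-up points, weights and bias, of what checking constraint [c] looks like in code; the values are purely illustrative.

```python
import numpy as np

# Toy 2-D data with labels y in {-1, +1} and a candidate separating
# hyperplane (w, b); all values here are made up for illustration.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])
b = -1.0

# Condition [c]: every point must satisfy y_i (w . x_i + b) >= 1.
margins = y * (X @ w + b)
print(margins)                 # each entry should be >= 1
print(np.all(margins >= 1))    # True for this toy hyperplane
```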

Figure 5: Representation of the hyperplanes wx + b = +1, wx + b = 0 and wx + b = -1. [9]

The distance of the closest point on the hyperplane to the origin can be found by maximizing x, as x lies on the hyperplane; for the points on the other side we have an analogous scenario. Thus, solving and subtracting the two distances, we get the summed distance from the separating hyperplane to the nearest points, the maximum margin M = 2 / ||w||. Maximizing the margin is then the same as minimizing ||w|| [8]. We now have a quadratic optimization problem and need to solve for w and b, that is, we optimize a quadratic function subject to linear constraints. The solution involves constructing a dual problem, where a Lagrange multiplier αi is associated with each constraint: we need to find w and b such that Φ(w) = ½ wᵀw is minimized, subject to, for all {(xi, yi)}: yi (w · xi + b) ≥ 1. Solving this, we get w = Σ αi yi xi and b = yk - w · xk for any xk such that αk ≠ 0.


Now the classifying function will have the following form: f(x) = Σ αi yi xi · x + b.

Figure 6: Representation of Support Vectors. (Copyright © 2003, Andrew W. Moore) [2]
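As a hedged illustration of this classifying function, the sketch below fits scikit-learn's SVC on a made-up linearly separable set and recomputes f(x) by hand from the support vectors; in scikit-learn, dual_coef_ stores the products αi yi.

```python
import numpy as np
from sklearn.svm import SVC

# Small linearly separable toy set (made up purely for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel='linear', C=1e6)   # a very large C approximates a hard margin
clf.fit(X, y)

# f(x) = sum_i (alpha_i y_i) x_i . x + b, summed over the support vectors.
x_new = np.array([1.0, 0.5])
f_manual = np.dot(clf.dual_coef_[0], clf.support_vectors_ @ x_new) + clf.intercept_[0]
print(f_manual)
print(clf.decision_function([x_new])[0])   # agrees with the manual sum
```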

SVM Representation
In this section we present the QP formulation for SVM classification [4][8][12][13]. This is a simple representation only.

SV classification:

$$\min_{f,\; \xi_i} \; \|f\|_K^2 + C \sum_{i=1}^{l} \xi_i \qquad \text{subject to} \qquad y_i f(x_i) \ge 1 - \xi_i \;\; \text{and} \;\; \xi_i \ge 0, \;\; \text{for all } i$$

SVM classification, dual formulation:

$$\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad \text{subject to} \qquad 0 \le \alpha_i \le C \;\; \text{for all } i, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0$$

The variables ξi are called slack variables and they measure the error made at point (xi, yi). Training an SVM becomes quite challenging when the number of training points is large; a number of methods for fast SVM training have been proposed [4][8][13].

Soft Margin Classifier
In real world problems it is unlikely that a line will exactly separate the data within the space; often a curved decision boundary is needed. A hyperplane that exactly separates the data may also be undesirable if the data contain noise: it is better for a smooth boundary to ignore a few data points than to curve or loop around the outliers. This is handled by introducing slack variables. We now require yk (w · xk + b) ≥ 1 - sk [4][12], which allows a point to lie a small distance sk on the wrong side of the hyperplane without violating the constraint. Of course, huge slack variables would allow any line to "separate" the data, so in such scenarios a Lagrangian formulation is introduced which penalizes large slacks:

$$\min \; L = \frac{1}{2} w^{\mathsf{T}} w - \sum_k \lambda_k \left( y_k (w^{\mathsf{T}} x_k + b) + s_k - 1 \right) + \alpha \sum_k s_k$$

where reducing α allows more of the data to lie on the wrong side of the hyperplane and be treated as outliers, which gives a smoother decision boundary [12]. A small sketch of this trade-off in practice is given below.
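A minimal sketch of that trade-off, assuming scikit-learn's SVC where the slack penalty is exposed as the parameter C (playing the role of the penalty weight above); the data are synthetic and overlapping on purpose.

```python
import numpy as np
from sklearn.svm import SVC

# Noisy, overlapping two-class data (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Small C tolerates large slacks (smoother boundary, more violations);
# large C penalizes slack heavily and fits the training data more tightly.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(C, clf.n_support_, clf.score(X, y))
```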

Kernel Trick
Let us first look at a few definitions: what is a kernel, and what does feature space mean?
Kernel: If the data is linear, a separating hyperplane may be used to divide the data. However, it is often the case that the data is far from linear and the datasets are inseparable. To allow for this, kernels are used to non-linearly map the input data to a high-dimensional space; the new mapping is then linearly separable [1]. A very simple illustration of this is shown below in figure 7 [9][11][20].

Figure 7: Why use kernels? [11][9][20]
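A tiny numerical sketch of the same idea, assuming a made-up 1-D dataset and the simple map φ(x) = (x, x²): data that no single threshold can separate becomes linearly separable after the mapping.

```python
import numpy as np

# 1-D data that no single threshold separates: the negative class sits
# between two positive clusters (values made up for illustration).
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([+1, +1, -1, -1, -1, +1, +1])

# Map each point to 2-D with phi(x) = (x, x^2); in the new feature space the
# horizontal line x^2 = 2 is a linear separator for the two classes.
phi = np.column_stack([x, x ** 2])
print(phi)
print(np.all((phi[:, 1] > 2) == (y == 1)))   # True: linearly separable now
```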

This mapping is defined by the kernel, which replaces the inner product in the input space:

$$x_1 \cdot x_2 \;\longrightarrow\; K(x_1, x_2) = \Phi(x_1) \cdot \Phi(x_2)$$

Feature Space: Transforming the data into feature space makes it possible to define a similarity measure on the basis of the dot product. If the feature space is chosen suitably, pattern recognition can be easy [1].
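As a hedged check of this identity, the sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, whose explicit feature map is φ(x) = (x1², √2·x1x2, x2²); the kernel value matches the dot product of the mapped vectors, so the mapping itself never has to be carried out.

```python
import numpy as np

# Explicit feature map for the degree-2 homogeneous polynomial kernel on 2-D
# inputs: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, z):
    return np.dot(x, z) ** 2

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])

print(poly_kernel(x1, x2))          # (1*3 + 2*(-1))^2 = 1.0
print(np.dot(phi(x1), phi(x2)))     # same value, computed in feature space
```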

Figure 8: Feature Space Representation. [11][9]. Note the legend is not described, as these are sample plots meant only to illustrate the concepts involved.

Now, getting back to the kernel trick: we have seen that when (w, b) is obtained, the problem is solved for the simple linear scenario in which the data is separated by a hyperplane. The kernel trick allows SVMs to form nonlinear boundaries. The steps involved in the kernel trick are given below [12][24].
(a) The algorithm is expressed using only the inner products of the data points. This is also called the dual problem.
(b) The original data are passed through non-linear maps to form new data with respect to new dimensions, by adding a pairwise product of some of the original data dimensions to each data vector.
(c) Rather than computing an inner product on these new, larger vectors, storing them in tables and later doing a table lookup, we can represent the dot product of the data after the non-linear mapping directly. This function is the kernel function. More on kernel functions is given below.

Kernel Trick: Dual Problem
First we convert the optimization problem to the dual form, in which we try to eliminate w, so that the Lagrangian is only a function of αi. There is a full mathematical solution for this, but since this tutorial aims to keep the mathematics to a minimum it is only described here: to solve the problem we maximize LD with respect to αi. The dual form simplifies the optimization, and the major achievement is that the data appear only through dot products [4][8][12].

Kernel Trick: Inner Product Summarization
Here we see that we need to represent the dot products of the data vectors used. The dot product of nonlinearly mapped data can be expensive to compute. The kernel trick just picks a suitable function that corresponds to the dot product of some nonlinear mapping instead [4][8][12]. Some of the most commonly chosen kernel functions are given later in this tutorial. A particular kernel is usually chosen by trial and error on the test set; choosing the right kernel for the problem or application at hand enhances SVM's performance.

Kernel Functions
The idea of the kernel function is to enable operations to be performed in the input space rather than in the potentially high-dimensional feature space, so the inner product does not need to be evaluated in the feature space. We want the function to perform the mapping of the attributes of the input space to the feature space. The kernel function plays a critical role in SVM and its performance. It is based upon reproducing kernel Hilbert spaces [8][14][15][18].
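Because the dual touches the data only through inner products, training can be driven entirely by a precomputed Gram (kernel) matrix. A minimal sketch, assuming scikit-learn's SVC with kernel='precomputed' and a hand-written RBF kernel on synthetic data:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic data (for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

# Gram matrix K[i, j] = k(x_i, x_j): all the optimizer ever needs.
def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

K = np.array([[rbf(a, b) for b in X] for a in X])

clf = SVC(kernel='precomputed').fit(K, y)
print(clf.score(K, y))   # training accuracy, computed from the kernel matrix alone
```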

If K is a symmetric positive definite function which satisfies Mercer's conditions, then the kernel represents a legitimate inner product in feature space. A training set that is not linearly separable in the input space can then be linearly separable in the feature space. This is called the "kernel trick" [8][12]. The different kernel functions are listed below [8]; more explanation of kernel functions can be found in the book [8]. The ones mentioned here are extracted from there and listed for reference only.
1) Polynomial: A polynomial mapping is a popular method for non-linear modeling. Of the two usual variants (with and without an added constant term), the second kernel is usually preferable as it avoids problems with the Hessian becoming zero.

2) Gaussian Radial Basis Function: Radial basis functions are most commonly used with a Gaussian form.

3) Exponential Radial Basis Function: An exponential radial basis function produces a piecewise linear solution, which can be attractive when discontinuities are acceptable.

4) Multi-Layer Perceptron: The long established MLP, with a single hidden layer, also has a valid kernel representation. Sketches of the standard forms of these four kernels are given below.
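The original kernel formulas were given as figures; as a hedged stand-in, the sketch below implements the standard textbook forms usually associated with these four kernels (the parameter names d, sigma, kappa and delta are conventions rather than anything fixed by this tutorial, and the sigmoid form is a valid kernel only for some parameter values).

```python
import numpy as np

def polynomial(x, z, d=2):
    # Inhomogeneous polynomial kernel: (x . z + 1)^d
    return (np.dot(x, z) + 1.0) ** d

def gaussian_rbf(x, z, sigma=1.0):
    # Gaussian radial basis function: exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def exponential_rbf(x, z, sigma=1.0):
    # Exponential radial basis function: exp(-||x - z|| / (2 sigma^2))
    return np.exp(-np.linalg.norm(x - z) / (2.0 * sigma ** 2))

def mlp_sigmoid(x, z, kappa=1.0, delta=-1.0):
    # MLP (sigmoid) kernel: tanh(kappa * x . z + delta)
    return np.tanh(kappa * np.dot(x, z) + delta)

x = np.array([1.0, 0.5])
z = np.array([0.2, -0.3])
print(polynomial(x, z), gaussian_rbf(x, z), exponential_rbf(x, z), mlp_sigmoid(x, z))
```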

There are many more kernels, including Fourier, splines, B-splines, additive kernels and tensor products [8]. If you want to read more on kernel functions, see the book [8].

Controlling Complexity in SVM: Trade-offs
SVM is powerful enough to approximate any training data, and generalizing well on a given dataset depends on controlling that power. The complexity of the kernel affects the performance on new datasets [8]. SVM exposes parameters for controlling this complexity, but SVM itself does not tell us how to set them; we should determine these parameters by cross-validation on the given dataset [2][11]. The diagram given below gives a better illustration.

Figure 9: How to control complexity. [2][9]. Note the legend is not described, as these are sample plots meant only to illustrate the concepts involved.
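A minimal sketch of choosing such parameters by cross-validation, assuming scikit-learn's GridSearchCV and an RBF-kernel SVC on synthetic data; the grid values are arbitrary placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Synthetic data, illustrative only; in practice use the real training set.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)

# Cross-validate over C (slack penalty / complexity) and gamma (RBF width),
# since SVM itself does not tell us how to set these parameters.
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```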

SVM for Classification
SVM is a useful technique for data classification. Even though neural networks are often considered easier to use, they sometimes give unsatisfactory results. A classification task usually involves training and testing data which consist of some data instances [21]. Each instance in the training set contains one target value and several attributes. The goal of SVM is to produce a model which predicts the target value of the data instances in the testing set, given only the attributes [8]. A minimal sketch of this workflow is given below.
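The sketch below, using scikit-learn and synthetic attributes and targets, shows that train-then-predict workflow; the data, kernel and C value are placeholders rather than recommendations.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Each instance: several attributes plus one target value (the class label).
# Data here are synthetic and purely illustrative.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))                       # attributes
y = np.where(X[:, 0] + 2 * X[:, 1] > 0, 1, -1)      # target values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train on the training set, then predict targets for the test instances
# from their attributes alone.
clf = SVC(kernel='linear', C=1.0).fit(X_train, y_train)
print(clf.score(X_test, y_test))                    # held-out accuracy
```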


Classification in SVM is an example of supervised learning. Known labels help indicate whether the system is performing correctly or not. This information points to a desired response, validating the accuracy of the system, or can be used to help the system learn to act correctly. A step in SVM classification involves identification of the features which are intimately connected to the known classes. This is called feature selection or feature extraction. Feature selection and SVM classification together have a use even when prediction of unknown samples is not necessary: they can be used to identify key sets which are involved in whatever processes distinguish the classes [8].

SVM for Regression
SVMs can also be applied to regression problems by the introduction of an alternative loss function [8][17]. The loss function must be modified to include a distance measure. The regression can be linear or non-linear. Linear models mainly consist of the following loss functions: the ε-insensitive loss function, the quadratic loss function and the Huber loss function. As with classification problems, a non-linear model is usually required to adequately model the data. In the same manner as the non-linear SVC approach, a non-linear mapping can be used to map the data into a high-dimensional feature space where linear regression is performed. The kernel approach is again employed to address the curse of dimensionality. In the regression method there are considerations based on prior knowledge of the problem and the distribution of the noise; in the absence of such information, Huber's robust loss function has been shown to be a good alternative [8][16].

Applications of SVM
SVM has been found to be successful when used for pattern classification problems. Applying the support vector approach to a particular practical problem involves resolving a number of questions based on the problem definition and the design involved with it. One of the major challenges is choosing an appropriate kernel for the given application [4]. There are standard choices such as a Gaussian or polynomial kernel that are the default options, but if these prove ineffective or if the inputs are discrete structures, more elaborate kernels will be needed. By implicitly defining a feature space, the kernel provides the description language used by the machine for viewing the data. Once the choice of kernel and optimization criterion has been made, the key components of the system are in place [8]. Let us look at some examples. The task of text categorization is the classification of natural text documents into a fixed number of predefined categories based on their content. Since a document can be assigned to more than one category, this is not a multi-class classification problem, but can be viewed as a series of binary classification problems, one for each category. One of the standard representations of text for the purposes of information retrieval provides an ideal feature mapping for constructing a Mercer kernel [25]. Indeed, kernels incorporate a similarity measure between instances, and it is reasonable to assume that experts working in the specific application domain have already identified valid similarity measures, particularly in areas such as information retrieval and generative models [25][27].

Traditional classification approaches perform poorly when working directly on such data because of its high dimensionality, but Support Vector Machines can avoid the pitfalls of very high dimensional representations [12]. A very similar approach to the techniques described for text categorization can also be used for the task of image classification, and, as in that case, linear hard margin machines are frequently able to generalize well [8]. The first real-world task on which Support Vector Machines were tested was the problem of hand-written character recognition. Furthermore, multi-class SVMs have been tested on these data. It is interesting not only to compare SVMs with other classifiers, but also to compare different SVMs amongst themselves [23]. They turn out to have approximately the same performance, and furthermore to share most of their support vectors, independently of the chosen kernel. The fact that SVM can perform as well as these systems without including any detailed prior knowledge is certainly remarkable [25].

Strengths and Weaknesses of SVM
The major strengths of SVM are that the training is relatively easy; there is no local optimum, unlike in neural networks; it scales relatively well to high-dimensional data; and the trade-off between classifier complexity and error can be controlled explicitly. Its weaknesses include the need for a good kernel function [2][4][8][12][24].

Conclusion
This tutorial presents an overview of SVM together with a summary of the papers collected from the World Wide Web. Some of the important conclusions of this tutorial are summarized as follows. SVMs are based on statistical learning theory and can be used for learning to predict future data [25]. SVMs are trained by solving a constrained quadratic optimization problem. SVM implements a mapping of the inputs onto a high-dimensional space using a set of nonlinear basis functions. SVM can be used to learn a variety of representations, such as neural nets, splines, polynomial estimators, etc., but there is a unique optimal solution for each choice of the SVM parameters [4]. This differs from other learning machines, such as standard neural networks trained using back-propagation [26]. In short, the development of SVM is entirely different from that of normal learning algorithms, and SVM provides a new insight into learning. The four most major features of SVM are duality, kernels, convexity and sparseness [24]. Support Vector Machines are among the best approaches to data modeling. They combine generalization control with a technique to control dimensionality. The kernel mapping provides a common base for most of the commonly employed model architectures, enabling comparisons to be performed [8]. In classification problems, generalization control is obtained by maximizing the margin, which corresponds to minimization of the weight vector in a canonical framework. The solution is obtained as a set of support vectors that can be sparse. The minimization of the weight vector can also be used as a criterion in regression problems, with a modified loss function. Future directions include a technique for choosing the kernel function and additional capacity control, and the development of kernels with invariances. Finally, new directions are mentioned in the new SVM-related learning formulations recently proposed by Vapnik [19].

References:
[1] Wikipedia Online. http://en.wikipedia.org/wiki
[2] Tutorial slides by Andrew Moore. http://www.cs.cmu.edu/~awm
[3] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995. ISBN 0-387-94559-8.
[4] Burges C., "A Tutorial on Support Vector Machines for Pattern Recognition", in "Data Mining and Knowledge Discovery", Kluwer Academic Publishers, Boston, 1998 (Volume 2).
[5] V. Vapnik, S. Golowich, and A. Smola. Support vector method for function approximation, regression estimation, and signal processing. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 281-287, Cambridge, MA, 1997. MIT Press.
[6] Theodoros Evgeniou and Massimiliano Pontil, Statistical Learning Theory: a Primer, 1998.
[7] Olivier Bousquet, Stephane Boucheron, and Gabor Lugosi, "Introduction to Statistical Learning Theory".
[8] Nello Cristianini and John Shawe-Taylor, "An Introduction to Support Vector Machines and Other Kernel-based Learning Methods", Cambridge University Press, 2000.
[9] Image found on the web, searching for learning and generalization in SVM, following links given in the book above.
[10] David M. Skapura, Building Neural Networks, ACM Press, 1996.
[11] Tom Mitchell, Machine Learning, McGraw-Hill Computer Science Series, 1997.
[12] J. P. Lewis, Tutorial on SVM, CGIT Lab, USC, 2004.
[13] Vapnik V., "Statistical Learning Theory", Wiley, New York, 1998.
[14] M. A. Aizerman, E. M. Braverman, and L. I. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821-837, 1964.
[15] N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337-404, 1950.
[16] C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:273-297, 1995.
[17] A. J. Smola. Regression estimation with support vector learning machines. Master's thesis, Technische Universität München, 1996.
[18] N. Heckman. The theory and application of penalized least squares methods, or reproducing kernel Hilbert spaces made easy, 1997.
[19] Vapnik V., Estimation of Dependencies Based on Empirical Data. Empirical Inference Science: Afterword of 2006, Springer, 2006.
[20] http://www.enm.bris.ac.uk/teaching/projects/2004_05/dm1654/kernel.htm
[21] Duda R. and Hart P., "Pattern Classification and Scene Analysis", Wiley, New York, 1973.
[22] E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In J. Principe, L. Gile, N. Morgan, and E. Wilson, editors, Neural Networks for Signal Processing VII - Proceedings of the 1997 IEEE Workshop, pages 276-285, New York, 1997. IEEE.
[23] M. O. Stitson and J. A. E. Weston. Implementational issues of support vector machines. Technical Report CSD-TR-96-18, Computational Intelligence Group, Royal Holloway, University of London, 1996.
[24] Burges C. and Schölkopf B., editors, "Advances in Kernel Methods - Support Vector Learning", MIT Press, 1998.


[25] Osuna E., Freund R., and Girosi F., "Support Vector Machines: Training and Applications", A.I. Memo No. 1602, Artificial Intelligence Laboratory, MIT, 1997.
[26] Trafalis T., "Primal-dual optimization methods in neural networks and support vector machines training", ACAI'99.
[27] Veropoulos K., Cristianini N., and Campbell C., "The Application of Support Vector Machines to Medical Decision Support: A Case Study", ACAI'99.

