M7-Support Vector Machine
Machine Classification
Bioinformatics Lecture 7/2/2003
by
Pierre Dönnes
Outline
• What do we mean by classification, and why is it useful?
• Machine learning- basic concept
• Support Vector Machines (SVM)
– Linear SVM – basic terminology and some formulas
– Non-linear SVM – the Kernel trick
• An example: Predicting protein subcellular
location with SVM
• Performance measurements
Tennis example 2
[Figure: data points plotted on Temperature vs. Humidity axes; one marker = play tennis, the other = do not play tennis]
Linear Support Vector Machines
Data: {⟨x_i, y_i⟩}, i = 1, …, l, where x_i ∈ R^d and y_i ∈ {−1, +1}
[Figure: linearly separable data in the (x1, x2) plane; one class labeled +1, the other −1]
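A minimal sketch of fitting a linear SVM to data of exactly this form — points x_i ∈ R^2 with labels y_i ∈ {−1, +1}. scikit-learn and the toy data set are my assumptions, not part of the lecture:

```python
# Fit a linear SVM to toy 2-D data (scikit-learn assumed; data invented).
from sklearn.svm import SVC

# Toy training set: x_i in R^2, y_i in {-1, +1}
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
     [3.0, 3.0], [4.0, 4.0], [3.0, 4.0]]
y = [-1, -1, -1, +1, +1, +1]

clf = SVC(kernel="linear", C=1.0)  # linear (maximum-margin) SVM
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))  # -> [-1  1]
```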
Linear SVM 2
[Figure: separating hyperplane with the two sides of the margin labeled f(x) = +1 and f(x) = −1]
What we need to see: the input vectors x_i and x_j appear only in the form of dot products – we will soon see why that is important.
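The dot-product observation can be checked numerically: in the dual form, the decision function is f(x) = Σ_i α_i y_i ⟨x_i, x⟩ + b, i.e. new inputs enter only through dot products with the support vectors. The sketch below verifies this against scikit-learn's fitted model (library and data are my assumptions):

```python
# Verify f(x) = sum_i alpha_i * y_i * (x_i . x) + b using only dot products.
# scikit-learn assumed; toy data invented for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 1.], [0., 1.], [3., 3.], [4., 4.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

x_new = np.array([2.0, 2.5])
# dual_coef_ holds alpha_i * y_i for each support vector
f = clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_
print(np.allclose(f, clf.decision_function([x_new])))  # prints: True
```

Because x enters only via ⟨x_i, x⟩, the dot product can later be replaced by a kernel k(x_i, x) — the kernel trick from the outline.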
Problems with linear SVM
[Figure: data sets where the +1 and −1 points cannot be separated by a single hyperplane]
Overtraining/overfitting 2
A measure of the risk of overtraining with SVM (there are also other measures).
It can be shown that the portion, n, of unseen data that will be misclassified is bounded by:
n ≤ (number of support vectors) / (number of training examples)
Ockham's razor principle: simpler systems are better than more complex ones.
In the SVM case: fewer support vectors mean a simpler representation of the hyperplane.
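The bound above is easy to compute for a fitted model. A sketch using scikit-learn's `n_support_` attribute (library and toy data are my assumptions):

```python
# Compute the support-vector fraction, which bounds the expected error
# on unseen data. scikit-learn assumed; toy data invented.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [3, 3], [4, 4], [3, 4]]
y = [-1, -1, -1, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)
n_sv = sum(clf.n_support_)   # support vectors per class, summed
bound = n_sv / len(X)
print(f"{n_sv} support vectors out of {len(X)} examples, bound = {bound:.2f}")
```

A small fraction of support vectors suggests a simple hyperplane, in line with the Ockham's razor argument.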
[Figure: SVM model trained to separate proteins labeled "Nuclear" from "All others"]
Cross-validation
Cross-validation: split the data into n sets, train on n − 1 sets, and test on the set left out of training.
[Figure: the "Nuclear" and "All others" data split into three numbered subsets; in each round one subset is the test set and the remaining two form the training set]
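The split-train-test rounds described above can be sketched as a loop over folds. scikit-learn's `KFold` and the toy data are my assumptions:

```python
# n-fold cross-validation: split into n sets, train on n-1, test on the rest.
# scikit-learn assumed; toy data invented.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0],
              [3, 3], [4, 4], [3, 4], [4, 3]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

accuracies = []
for train_idx, test_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])  # train on n-1 sets
    accuracies.append(clf.score(X[test_idx], y[test_idx]))      # test on held-out set

print(f"mean accuracy over 4 folds: {np.mean(accuracies):.2f}")
```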
Performance measurments
[Figure: test data with true labels +1/−1 is passed through the model; predictions are tallied as true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN)]
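Counting these four outcomes from true labels and predictions is a few lines of code. A minimal sketch with invented labels:

```python
# Tally TP, FP, TN, FN from true labels and model predictions.
# The label vectors here are invented for illustration.
y_true = [+1, +1, +1, -1, -1, -1, -1, +1]
y_pred = [+1, -1, +1, -1, -1, +1, -1, +1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == +1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == +1)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # -> TP=3 FP=1 TN=3 FN=1
```

Standard measures such as sensitivity TP/(TP+FN) and specificity TN/(TN+FP) follow directly from these counts.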
Reason: all the enemy tank photos were taken in the morning, and all photos of their own tanks at dawn. The classifier had merely learned to tell dusk from dawn!
References
http://www.kernel-machines.org/
http://www.support-vector.net/
Papers by Vapnik