M7-Support Vector Machine
Machine Classification
Bioinformatics Lecture 7/2/2003
by
Pierre Dönnes
Outline
• What do we mean by classification, and why is it useful?
• Machine learning- basic concept
• Support Vector Machines (SVM)
– Linear SVM – basic terminology and some formulas
– Non-linear SVM – the Kernel trick
• An example: Predicting protein subcellular
location with SVM
• Performance measurements
Tennis example 2
[Figure: data points plotted on Temperature vs. Humidity axes; one marker = play tennis, the other = do not play tennis]
Linear Support Vector Machines
Data: {⟨x_i, y_i⟩}, i = 1, …, l, where x_i ∈ R^d and y_i ∈ {−1, +1}
[Figure: linearly separable data in the (x1, x2) plane; one class labeled +1, the other −1]
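A minimal sketch of fitting a linear SVM to data of exactly this form — points x_i ∈ R^2 with labels y_i ∈ {−1, +1}. scikit-learn and the toy data set are my assumptions, not part of the lecture:

```python
# Fit a linear SVM to toy 2-D data (scikit-learn assumed; data invented).
from sklearn.svm import SVC

# Toy training set: x_i in R^2, y_i in {-1, +1}
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
     [3.0, 3.0], [4.0, 4.0], [3.0, 4.0]]
y = [-1, -1, -1, +1, +1, +1]

clf = SVC(kernel="linear", C=1.0)  # linear (maximum-margin) SVM
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))  # -> [-1  1]
```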
Linear SVM 2
[Figure: separating hyperplane with the two sides of the margin labeled f(x) = +1 and f(x) = −1]
What we need to see: the input vectors x_i and x_j appear only in the form of dot products – we will soon see why that is important.
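The dot-product observation can be checked numerically: in the dual form, the decision function is f(x) = Σ_i α_i y_i ⟨x_i, x⟩ + b, i.e. new inputs enter only through dot products with the support vectors. The sketch below verifies this against scikit-learn's fitted model (library and data are my assumptions):

```python
# Verify f(x) = sum_i alpha_i * y_i * (x_i . x) + b using only dot products.
# scikit-learn assumed; toy data invented for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 1.], [0., 1.], [3., 3.], [4., 4.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

x_new = np.array([2.0, 2.5])
# dual_coef_ holds alpha_i * y_i for each support vector
f = clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_
print(np.allclose(f, clf.decision_function([x_new])))  # prints: True
```

Because x enters only via ⟨x_i, x⟩, the dot product can later be replaced by a kernel k(x_i, x) — the kernel trick from the outline.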
Problems with linear SVM
[Figure: data sets where the +1 and −1 points cannot be separated by a single hyperplane]
Overtraining/overfitting 2
A measure of the risk of overtraining with SVM (there are also other measures).
It can be shown that the portion, n, of unseen data that will be misclassified is bounded by:
n ≤ (number of support vectors) / (number of training examples)
Ockham's razor principle: simpler systems are better than more complex ones.
In the SVM case: fewer support vectors mean a simpler representation of the hyperplane.
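The bound above is easy to compute for a fitted model. A sketch using scikit-learn's `n_support_` attribute (library and toy data are my assumptions):

```python
# Compute the support-vector fraction, which bounds the expected error
# on unseen data. scikit-learn assumed; toy data invented.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [3, 3], [4, 4], [3, 4]]
y = [-1, -1, -1, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)
n_sv = sum(clf.n_support_)   # support vectors per class, summed
bound = n_sv / len(X)
print(f"{n_sv} support vectors out of {len(X)} examples, bound = {bound:.2f}")
```

A small fraction of support vectors suggests a simple hyperplane, in line with the Ockham's razor argument.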
[Figure: SVM model trained to separate proteins labeled "Nuclear" from "All others"]
Cross-validation
Cross-validation: split the data into n sets, train on n − 1 sets, and test on the set left out of training.
[Figure: the "Nuclear" and "All others" data split into three numbered subsets; in each round one subset is the test set and the remaining two form the training set]
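The split-train-test rounds described above can be sketched as a loop over folds. scikit-learn's `KFold` and the toy data are my assumptions:

```python
# n-fold cross-validation: split into n sets, train on n-1, test on the rest.
# scikit-learn assumed; toy data invented.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0],
              [3, 3], [4, 4], [3, 4], [4, 3]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

accuracies = []
for train_idx, test_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])  # train on n-1 sets
    accuracies.append(clf.score(X[test_idx], y[test_idx]))      # test on held-out set

print(f"mean accuracy over 4 folds: {np.mean(accuracies):.2f}")
```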
Performance measurments
[Figure: test data with true labels +1/−1 is passed through the model; predictions are tallied as true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN)]
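Counting these four outcomes from true labels and predictions is a few lines of code. A minimal sketch with invented labels:

```python
# Tally TP, FP, TN, FN from true labels and model predictions.
# The label vectors here are invented for illustration.
y_true = [+1, +1, +1, -1, -1, -1, -1, +1]
y_pred = [+1, -1, +1, -1, -1, +1, -1, +1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == +1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == +1)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # -> TP=3 FP=1 TN=3 FN=1
```

Standard measures such as sensitivity TP/(TP+FN) and specificity TN/(TN+FP) follow directly from these counts.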
Reason: all the enemy tank photos were taken in the morning, and all photos of their own tanks at dawn. The classifier had merely learned to tell dusk from dawn!
References
http://www.kernel-machines.org/
http://www.support-vector.net/
Papers by Vapnik