SVM Tutorial

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression analysis. It works by finding the hyperplane that separates the data points of different classes with the maximum margin, i.e. the largest distance between the decision boundary and the nearest data points of each class. Using kernel functions, SVMs can also perform nonlinear classification by implicitly mapping their inputs into high-dimensional feature spaces. SVMs have been successfully applied to areas like text categorization, image recognition, and cancer classification.

Copyright: © Attribution Non-Commercial (BY-NC)

Support Vector Machine & Its Applications

A portion (1/3) of the slides are taken from Prof. Andrew Moore's SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials

Mingyue Tan
The University of British Columbia, Nov 26, 2004

Overview

Intro. to Support Vector Machines (SVM)
Properties of SVM
Applications
- Gene Expression Data Classification
- Text Categorization (if time permits)
Discussion

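Linear Classifiers

[Figure: data points denoting +1 and -1, separated by the line $w \cdot x + b = 0$, with $w \cdot x + b > 0$ on one side and $w \cdot x + b < 0$ on the other.]

$x \to f \to y^{est}$, where $f(x,w,b) = \operatorname{sign}(w \cdot x + b)$

How would you classify this data?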
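The decision rule can be written in a few lines. This is a minimal sketch, assuming points are plain Python tuples and that a point lying exactly on the boundary is assigned to +1 (the slide does not say how sign(0) is handled); the boundary w = (1, 1), b = -1 is made up for illustration.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def f(x, w, b):
    """Linear classifier f(x, w, b) = sign(w . x + b).

    Assumption: points exactly on the boundary (w . x + b == 0) map to +1.
    """
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical boundary x1 + x2 - 1 = 0, i.e. w = (1, 1), b = -1
w, b = (1.0, 1.0), -1.0
print(f((2.0, 2.0), w, b), f((-1.0, 0.0), w, b))
```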

Linear Classifiers

$f(x,w,b) = \operatorname{sign}(w \cdot x + b)$

Any of these would be fine... but which is best?

Linear Classifiers

$f(x,w,b) = \operatorname{sign}(w \cdot x + b)$

How would you classify this data?
[Figure: with this boundary, a point is misclassified to the +1 class.]

Classifier Margin

$f(x,w,b) = \operatorname{sign}(w \cdot x + b)$

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
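The margin can be computed directly for a given (w, b). This sketch assumes the band grows symmetrically on both sides of the boundary, so its width is twice the distance from the boundary to the nearest point; the data and the two candidate boundaries are made up for illustration.

```python
import math


def margin(w, b, points, labels):
    """Width of the symmetric band around w . x + b = 0 that touches the nearest point.

    Assumes (w, b) classifies every point correctly; otherwise the result is <= 0.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    # Signed distance of each point to the boundary, positive on the correct side
    dists = [y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
             for x, y in zip(points, labels)]
    return 2 * min(dists)


points = [(2.0, 2.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 0.5)]
labels = [1, 1, -1, -1]
print(margin((1.0, 1.0), -2.0, points, labels))   # boundary x1 + x2 = 2
print(margin((1.0, 0.0), -1.0, points, labels))   # boundary x1 = 1
```

Both hypothetical boundaries separate the data, but the first gives the wider margin (2√2 ≈ 2.83 versus 2.0), which is the sense in which one separating line can be "better" than another.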

Maximum Margin

$f(x,w,b) = \operatorname{sign}(w \cdot x + b)$

The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM: Linear SVM).

Support Vectors are those datapoints that the margin pushes up against.

1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very very well.

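Linear SVM Mathematically

[Figure: the plus-plane $w \cdot x + b = +1$ bounds the "Predict Class = +1" zone and the minus-plane $w \cdot x + b = -1$ bounds the "Predict Class = -1" zone, with the boundary $w \cdot x + b = 0$ between them; $x^+$ and $x^-$ are points on the two planes, and M = Margin Width.]

What we know:
$w \cdot x^+ + b = +1$
$w \cdot x^- + b = -1$
$w \cdot (x^+ - x^-) = 2$

$$M = \frac{(x^+ - x^-) \cdot w}{\|w\|} = \frac{2}{\|w\|}$$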
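A quick numeric check of $M = 2/\|w\|$, using a made-up w = (3, 4) and b = -5: a point on the plane $w \cdot x + b = c$ directly along w is $(c - b)\,w/\|w\|^2$, so placing $x^+$ (c = +1) and $x^-$ (c = -1) this way and measuring their distance should reproduce the formula.

```python
import math

w, b = (3.0, 4.0), -5.0                      # hypothetical hyperplane parameters
norm = math.sqrt(sum(wi * wi for wi in w))   # ||w|| = 5


def point_on_plane(c):
    """Point on w . x + b = c reached by moving along w (closest to the origin)."""
    return tuple((c - b) * wi / norm ** 2 for wi in w)


x_plus, x_minus = point_on_plane(1.0), point_on_plane(-1.0)
M_direct = math.dist(x_plus, x_minus)        # measured width between the planes
M_formula = 2 / norm                          # M = 2 / ||w||
print(M_direct, M_formula)
```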

Linear SVM Mathematically

Goal: 1) Correctly classify all training data:
$w \cdot x_i + b \ge 1$ if $y_i = +1$
$w \cdot x_i + b \le -1$ if $y_i = -1$
i.e. $y_i (w \cdot x_i + b) \ge 1$ for all $i$

2) Maximize the margin $M = \frac{2}{\|w\|}$, which is the same as minimizing $\frac{1}{2} w^T w$.

We can formulate a Quadratic Optimization Problem and solve for w and b:

Minimize $\Phi(w) = \frac{1}{2} w^T w$
subject to $y_i (w \cdot x_i + b) \ge 1 \quad \forall i$

Solving the Optimization Problem

Find w and b such that $\Phi(w) = \frac{1}{2} w^T w$ is minimized and for all $\{(x_i, y_i)\}$: $y_i (w^T x_i + b) \ge 1$.

We need to optimize a quadratic function subject to linear constraints. Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them. The solution involves constructing a dual problem where a Lagrange multiplier $\alpha_i$ is associated with every constraint in the primal problem:

Find $\alpha_1 \ldots \alpha_N$ such that
$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j$$
is maximized and (1) $\sum_i \alpha_i y_i = 0$, (2) $\alpha_i \ge 0$ for all $i$.

The Optimization Problem Solution

The solution has the form:
$w = \sum_i \alpha_i y_i x_i$, $\quad b = y_k - w^T x_k$ for any $x_k$ such that $\alpha_k \neq 0$

Each non-zero $\alpha_i$ indicates that the corresponding $x_i$ is a support vector. The classifying function then has the form
$$f(x) = \sum_i \alpha_i y_i x_i^T x + b$$
Notice that it relies on an inner product between the test point x and the support vectors $x_i$; we will return to this later. Also keep in mind that solving the optimization problem involved computing the inner products $x_i^T x_j$ between all pairs of training points.
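To see the solution formulas in action, here is a toy problem small enough to solve by hand (the data are made up): one point per class, $x_1 = (1, 1)$ with $y_1 = +1$ and $x_2 = (-1, -1)$ with $y_2 = -1$. Constraint (1) forces $\alpha_1 = \alpha_2 = a$, and the dual objective becomes $Q = 2a - 4a^2$, maximized at $a = 1/4$. The sketch then recovers w, b and f(x) exactly as written above.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Toy training set and its hand-solved dual variables
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]
alpha = [0.25, 0.25]   # maximizes Q(a) = 2a - 4a^2 under sum(alpha_i * y_i) = 0

# w = sum_i alpha_i y_i x_i
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2)]

# b = y_k - w . x_k for any support vector x_k (alpha_k != 0)
k = next(i for i, a in enumerate(alpha) if a > 0)
b = y[k] - dot(w, X[k])


def f(x):
    """Decision value computed only from inner products with the support vectors."""
    return sum(a * yi * dot(xi, x) for a, yi, xi in zip(alpha, y, X) if a > 0) + b


print(w, b, f((2.0, 0.0)))
```

Both training points end up as support vectors, and f(x) agrees with $w \cdot x + b$ even though it never forms w explicitly.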

Dataset with noise

Hard Margin: so far we require that all data points be classified correctly
- No training error

What if the training set is noisy?
- Solution 1: use very powerful kernels

OVERFITTING!

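Soft Margin Classification

Slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy examples.

[Figure: the planes $w \cdot x + b = +1$, $w \cdot x + b = 0$ and $w \cdot x + b = -1$, with slacks such as $\xi_2$ and $\xi_{11}$ marking points that fall on the wrong side of their margin.]

What should our quadratic optimization criterion be? Minimize
$$\frac{1}{2} w \cdot w + C \sum_{k=1}^{N} \xi_k$$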
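For a fixed (w, b), the smallest slack that satisfies $y_k(w \cdot x_k + b) \ge 1 - \xi_k$ and $\xi_k \ge 0$ is $\max(0, 1 - y_k(w \cdot x_k + b))$, so the criterion above can be evaluated directly. A minimal sketch with made-up 2-D points, where the last point is a "noisy" -1 example sitting among the +1 points:

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (1/2 w.w + C * sum(xi_k), slacks) for a fixed hyperplane (w, b).

    Each slack is the smallest xi_k >= 0 with y_k (w.x_k + b) >= 1 - xi_k.
    """
    slacks = [max(0.0, 1 - yk * (sum(wi * xi for wi, xi in zip(w, xk)) + b))
              for xk, yk in zip(X, y)]
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks), slacks


X = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0), (1.0, 0.0)]
y = [1, 1, -1, -1]                      # last point is the "noisy" one
obj, xi = soft_margin_objective((1.0, 0.0), 0.0, X, y, C=1.0)
print(xi, obj)
```

A slack between 0 and 1 (the second point, 0.5) means the point is correctly classified but inside the margin; a slack above 1 (the last point, 2.0) means it is misclassified.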

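Hard Margin v.s. Soft Margin

The old formulation:

Find w and b such that $\Phi(w) = \frac{1}{2} w^T w$ is minimized and for all $\{(x_i, y_i)\}$: $y_i (w^T x_i + b) \ge 1$

The new formulation incorporating slack variables:

Find w and b such that $\Phi(w) = \frac{1}{2} w^T w + C \sum_i \xi_i$ is minimized and for all $\{(x_i, y_i)\}$: $y_i (w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$

Parameter C can be viewed as a way to control overfitting: a large C penalizes slack heavily and approaches the hard margin, while a small C tolerates more margin violations in exchange for a wider margin.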

Linear SVMs: Overview

The classifier is a separating hyperplane.
The most important training points are the support vectors; they define the hyperplane.
Quadratic optimization algorithms can identify which training points $x_i$ are support vectors with non-zero Lagrange multipliers $\alpha_i$.
Both in the dual formulation of the problem and in the solution, training points appear only inside dot products:

Find $\alpha_1 \ldots \alpha_N$ such that
$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j$$
is maximized and (1) $\sum_i \alpha_i y_i = 0$, (2) $0 \le \alpha_i \le C$ for all $i$

$$f(x) = \sum_i \alpha_i y_i x_i^T x + b$$
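The slides leave the choice of QP solver open. As one hedged illustration, the sketch below swaps in dual coordinate descent (the method used by LIBLINEAR) rather than a general QP solver: it maximizes Q(α) one $\alpha_i$ at a time and clips each to the box $0 \le \alpha_i \le C$. One assumption changes the problem slightly: the bias is absorbed by appending a constant 1 to every input, so b is regularized too and constraint (1) $\sum_i \alpha_i y_i = 0$ disappears. The data points are made up.

```python
def train_linear_svm(X, y, C=1.0, epochs=200):
    """Soft-margin linear SVM trained by dual coordinate descent (a sketch).

    Assumption: the bias is absorbed by appending a constant 1 to every input,
    so b is regularized too and the constraint sum(alpha_i * y_i) = 0 disappears.
    Each step maximizes Q(alpha) in a single alpha_i, then clips it to [0, C].
    """
    Xa = [list(x) + [1.0] for x in X]          # augmented inputs (x, 1)
    n, d = len(Xa), len(Xa[0])
    alpha = [0.0] * n
    w = [0.0] * d                              # kept equal to sum_i alpha_i y_i x_i
    for _ in range(epochs):
        for i in range(n):
            xi, yi = Xa[i], y[i]
            qii = sum(v * v for v in xi)       # x_i . x_i (y_i^2 = 1)
            grad = yi * sum(wj * v for wj, v in zip(w, xi)) - 1   # equals -dQ/d(alpha_i)
            new = min(max(alpha[i] - grad / qii, 0.0), C)         # box constraint
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                w = [wj + delta * yi * v for wj, v in zip(w, xi)]
    return w[:-1], w[-1], alpha                # (w, b, alpha)


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-1.0, -1.0), (-2.0, -1.0), (-1.0, -2.0)]
y = [1, 1, 1, -1, -1, -1]
w, b, alpha = train_linear_svm(X, y, C=1.0)
print(w, b, alpha)
```

Training points with $\alpha_i > 0$ are the support vectors. On this separable toy set the optimal multipliers stay well below C = 1, so w and b should approach the hard-margin solution (about (1/3, 1/3) and -1/3 for the bias-regularized problem); lowering C would start trading margin violations for a smaller $\|w\|$.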

Non-linear SVMs

Datasets that are linearly separable with some noise work out great:
[Figure: 1-D data points along the x axis, around 0.]

But what are we going to do if the dataset is just too hard? How about mapping data to a higher-dimensional space:
[Figure: the same data plotted against x and $x^2$.]

Non-linear SVMs: Feature spaces

General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

$\Phi: x \to \varphi(x)$

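The Kernel Trick

The linear classifier relies on the dot product between vectors: $K(x_i, x_j) = x_i^T x_j$.
If every data point is mapped into a high-dimensional space via some transformation $\Phi: x \to \varphi(x)$, the dot product becomes $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$.
A kernel function is some function that corresponds to an inner product in some expanded feature space.

Example: 2-dimensional vectors $x = [x_1\ x_2]$; let $K(x_i, x_j) = (1 + x_i^T x_j)^2$. We need to show that $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$:
$$\begin{aligned}
K(x_i, x_j) &= (1 + x_i^T x_j)^2 \\
&= 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2} \\
&= [1,\ x_{i1}^2,\ \sqrt{2}\, x_{i1} x_{i2},\ x_{i2}^2,\ \sqrt{2}\, x_{i1},\ \sqrt{2}\, x_{i2}]^T\, [1,\ x_{j1}^2,\ \sqrt{2}\, x_{j1} x_{j2},\ x_{j2}^2,\ \sqrt{2}\, x_{j1},\ \sqrt{2}\, x_{j2}] \\
&= \varphi(x_i)^T \varphi(x_j),
\end{aligned}$$
where $\varphi(x) = [1,\ x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2,\ \sqrt{2}\, x_1,\ \sqrt{2}\, x_2]$.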
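The identity above can be checked numerically: evaluate the kernel directly in the 2-D input space, then take the ordinary dot product of the explicit 6-D feature vectors, and compare. The two test vectors are arbitrary.

```python
import math


def K(xi, xj):
    """Polynomial kernel (1 + xi . xj)^2 for 2-D vectors, computed in input space."""
    return (1 + xi[0] * xj[0] + xi[1] * xj[1]) ** 2


def phi(x):
    """Explicit 6-D feature map with K(xi, xj) = phi(xi) . phi(xj)."""
    r2 = math.sqrt(2)
    x1, x2 = x
    return [1.0, x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2]


xi, xj = (1.0, 2.0), (3.0, -1.0)
lhs = K(xi, xj)                                       # 2-D work only
rhs = sum(a * b for a, b in zip(phi(xi), phi(xj)))    # 6-D dot product
print(lhs, rhs)
```

The kernel side never builds the 6-D vectors, which is the point of the trick: for higher powers or higher input dimensions the explicit feature space grows quickly while the kernel evaluation stays a single dot product.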

What Functions are Kernels?

For some functions $K(x_i, x_j)$, checking that $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$ can be cumbersome.
Mercer's theorem: every positive semi-definite symmetric function is a kernel.
Positive semi-definite symmetric functions correspond to a positive semi-definite symmetric Gram matrix:
$$K = \begin{pmatrix}
K(x_1,x_1) & K(x_1,x_2) & K(x_1,x_3) & \cdots & K(x_1,x_N) \\
K(x_2,x_1) & K(x_2,x_2) & K(x_2,x_3) & \cdots & K(x_2,x_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
K(x_N,x_1) & K(x_N,x_2) & K(x_N,x_3) & \cdots & K(x_N,x_N)
\end{pmatrix}$$
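A numeric check on one Gram matrix cannot prove Mercer's condition for every set of points; it can only expose a violation. The sketch below builds Gram matrices for a few made-up points and attempts a Cholesky factorization, which succeeds only for (numerically) positive definite matrices. That test is stricter than semi-definiteness, so a failure at a zero pivot is not conclusive in general; for the distance "kernel" used as a counter-example, however, the leading 2x2 block [[0, 1], [1, 0]] has determinant -1, so that matrix is genuinely not positive semi-definite.

```python
import math


def gram(kernel, points):
    """Gram matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def is_positive_definite(K, tol=1e-12):
    """Symmetry check plus a Cholesky factorization K = L L^T.

    True only if K is symmetric and every pivot exceeds tol, i.e. K is
    (numerically) positive definite, which implies positive semi-definite.
    """
    n = len(K)
    if any(abs(K[i][j] - K[j][i]) > tol for i in range(n) for j in range(n)):
        return False
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = K[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            return False
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True


def rbf(p, q, sigma=1.0):
    return math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2))


def euclidean_distance(p, q):
    # Not a kernel: zero diagonal with positive off-diagonal entries
    return math.dist(p, q)


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
print(is_positive_definite(gram(rbf, pts)))
print(is_positive_definite(gram(euclidean_distance, pts)))
```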

Examples of Kernel Functions

Linear: $K(x_i, x_j) = x_i^T x_j$

Polynomial of power p: $K(x_i, x_j) = (1 + x_i^T x_j)^p$

Gaussian (radial-basis function network): $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$

Sigmoid: $K(x_i, x_j) = \tanh(\beta_0 x_i^T x_j + \beta_1)$
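The four kernels above translate directly into code. In this sketch the default parameter values (p = 2, σ = 1, β0 = 1, β1 = 0) are arbitrary choices for illustration, not recommendations from the slides.

```python
import math


def linear(xi, xj):
    return sum(a * b for a, b in zip(xi, xj))


def polynomial(xi, xj, p=2):
    return (1 + linear(xi, xj)) ** p


def gaussian(xi, xj, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))     # ||xi - xj||^2
    return math.exp(-sq / (2 * sigma ** 2))


def sigmoid(xi, xj, beta0=1.0, beta1=0.0):
    # Positive semi-definite (a Mercer kernel) only for some beta0, beta1
    return math.tanh(beta0 * linear(xi, xj) + beta1)


x, z = (1.0, 2.0), (3.0, -1.0)
print(linear(x, z), polynomial(x, z), gaussian(x, z), sigmoid(x, z))
```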

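Non-linear SVMs Mathematically

Dual problem formulation:

Find $\alpha_1 \ldots \alpha_N$ such that
$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
is maximized and (1) $\sum_i \alpha_i y_i = 0$, (2) $\alpha_i \ge 0$ for all $i$

The solution is:
$$f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$$

Optimization techniques for finding the $\alpha_i$'s remain the same!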
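To illustrate that only kernel values are needed, here is a sketch of a kernelized dual solver on the XOR problem, which no line can separate in the input space. It uses coordinate ascent on Q(α) with each $\alpha_i$ clipped to [0, C]; the solver choice, C = 10, and the bias handling are assumptions, not the slides' method. The bias is absorbed by using $K(x_i, x_j) + 1$, which is the kernel of $\varphi(x)$ extended with a constant feature 1, so constraint (1) is dropped.

```python
def train_kernel_svm(X, y, kernel, C=10.0, epochs=100):
    """Kernel SVM dual solved by coordinate ascent (a sketch).

    Assumption: the bias is absorbed by using K(xi, xj) + 1, which removes the
    constraint sum(alpha_i * y_i) = 0. Only kernel values are ever computed.
    """
    n = len(X)
    Kmat = [[kernel(a, b) + 1.0 for b in X] for a in X]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # g = y_i * (decision value at x_i) - 1, the negative gradient of Q
            g = y[i] * sum(alpha[j] * y[j] * Kmat[j][i] for j in range(n)) - 1
            alpha[i] = min(max(alpha[i] - g / Kmat[i][i], 0.0), C)
    return alpha


def decision(x, X, y, alpha, kernel):
    """f(x) = sum_i alpha_i y_i (K(x_i, x) + 1), summed over support vectors only."""
    return sum(a * yi * (kernel(xi, x) + 1.0) for a, yi, xi in zip(alpha, y, X) if a > 0)


def poly2(xi, xj):
    return (1 + sum(a * b for a, b in zip(xi, xj))) ** 2


# XOR: not linearly separable in the input space
X = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
y = [1, 1, -1, -1]
alpha = train_kernel_svm(X, y, poly2)
print(alpha, [1 if decision(x, X, y, alpha, poly2) >= 0 else -1 for x in X])
```

By symmetry all four points are support vectors with $\alpha_i = 1/8$; the $x_1 x_2$ component of the degree-2 feature map is what makes XOR separable.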

Nonlinear SVM - Overview

SVM locates a separating hyperplane in the feature space and classifies points in that space.
It does not need to represent the space explicitly; it simply defines a kernel function.
The kernel function plays the role of the dot product in the feature space.

Properties of SVM

Flexibility in choosing a similarity function
Sparseness of solution when dealing with large data sets
- only support vectors are used to specify the separating hyperplane
Ability to handle large feature spaces
- complexity does not depend on the dimensionality of the feature space
Overfitting can be controlled by the soft margin approach
Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
Feature Selection

SVM Applications

SVM has been used successfully in many real-world problems:
- text (and hypertext) categorization
- image classification
- bioinformatics (protein classification, cancer classification)
- hand-written character recognition

Application 1: Cancer Classification

High dimensional
- p > 1000; n < 100

[Figure: data matrix with patients p-1 ... p-n as rows and genes g-1 ... g-p as columns.]

Imbalanced
- fewer positive samples
- $K[x, x] = k(x, x) + \frac{n^+}{N}$

Many irrelevant features
- FEATURE SELECTION: in the linear case, $w_i^2$ gives the ranking of dimension i

Noisy
- SVM is sensitive to noisy (mis-labeled) data
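Ranking dimensions by $w_i^2$ is a one-liner once a linear SVM has been trained. The weights and gene names below are hypothetical; note that comparing weights this way is only meaningful when the features are on comparable scales.

```python
def rank_features(w, names):
    """Rank input dimensions by w_i^2 from a trained linear SVM, largest first."""
    return sorted(zip(names, w), key=lambda item: item[1] ** 2, reverse=True)


w = [0.1, -2.0, 0.5]             # hypothetical trained weights
names = ["g-1", "g-2", "g-3"]    # hypothetical gene identifiers
print([name for name, _ in rank_features(w, names)])
```

The sign of $w_i$ is ignored: a strongly negative weight (g-2 here) matters as much as a strongly positive one.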

Weakness of SVM

It is sensitive to noise
- A relatively small number of mislabeled examples can dramatically decrease the performance

It only considers two classes
- How to do multi-class classification with SVM?
- Answer:
1) With output arity m, learn m SVMs:
SVM 1 learns Output == 1 vs Output != 1
SVM 2 learns Output == 2 vs Output != 2
:
SVM m learns Output == m vs Output != m
2) To predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.
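The one-vs-rest recipe above can be sketched as follows. Each binary model here is trained with a small dual coordinate descent routine (an assumption; any binary SVM trainer would do), with the bias absorbed as a constant input feature and C = 10. The three-class data set is made up.

```python
def train_binary(X, y, C=10.0, epochs=200):
    """Linear soft-margin SVM via dual coordinate descent (a sketch).

    The bias is absorbed by appending a constant 1 to each input.
    Returns the augmented weight vector [w..., b].
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    alpha = [0.0] * len(Xa)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(Xa, y)):
            grad = yi * sum(a * v for a, v in zip(w, xi)) - 1
            new = min(max(alpha[i] - grad / sum(v * v for v in xi), 0.0), C)
            delta = new - alpha[i]
            alpha[i] = new
            w = [wj + delta * yi * v for wj, v in zip(w, xi)]
    return w


def score(w_aug, x):
    """Signed decision value w . x + b for an augmented weight vector."""
    return sum(a * v for a, v in zip(w_aug, list(x) + [1.0]))


def train_one_vs_rest(X, labels, C=10.0):
    """SVM m learns Output == m vs Output != m, one model per class."""
    return {m: train_binary(X, [1 if lab == m else -1 for lab in labels], C)
            for m in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose SVM puts x furthest into its positive region."""
    return max(models, key=lambda m: score(models[m], x))


X = [(0.0, 0.0), (0.5, 0.5), (5.0, 0.0), (5.5, 0.5), (0.0, 5.0), (0.5, 5.5)]
labels = [1, 1, 2, 2, 3, 3]
models = train_one_vs_rest(X, labels)
print([predict(models, x) for x in X], predict(models, (5.2, 0.2)))
```

Taking the maximum raw score (rather than requiring exactly one positive model) also resolves inputs that several models, or none, claim as positive.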

Application 2: Text Categorization

Task: the classification of natural text (or hypertext) documents into a fixed number of predefined categories based on their content.
- email filtering, web searching, sorting documents by topic, etc.

A document can be assigned to more than one category, so this can be viewed as a series of binary classification problems, one for each category.

Representation of Text

IR's vector space model (aka bag-of-words representation)
A doc is represented by a vector indexed by a pre-fixed set or dictionary of terms
Values of an entry can be binary or weights
Normalization, stop words, word stems
Doc x => $\varphi(x)$
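A minimal bag-of-words sketch of $x \Rightarrow \varphi(x)$, assuming a tiny hand-picked stop-word list, a four-term dictionary, and lowercasing as the only normalization (stemming is omitted). Entries are term counts, or 0/1 in the binary variant.

```python
import re

STOP_WORDS = {"the", "a", "of", "and", "to", "is"}   # tiny illustrative list


def bag_of_words(doc, dictionary, binary=False):
    """Map a document to a vector indexed by a fixed dictionary of terms.

    Normalization: lowercase, keep alphabetic tokens, drop stop words.
    Entries are term counts, or 0/1 when binary=True.
    """
    tokens = [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOP_WORDS]
    counts = [tokens.count(term) for term in dictionary]
    return [min(c, 1) for c in counts] if binary else counts


dictionary = ["svm", "kernel", "margin", "gene"]      # hypothetical dictionary
doc = "The SVM maximizes the margin; a kernel SVM maps the data."
print(bag_of_words(doc, dictionary), bag_of_words(doc, dictionary, binary=True))
```

Words outside the dictionary ("maximizes", "maps", "data") are simply dropped, which is why the dictionary choice matters as much as the weighting scheme.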

Text Categorization using SVM

The similarity between two documents is $\varphi(x) \cdot \varphi(z)$.
$K(x,z) = \varphi(x) \cdot \varphi(z)$ is a valid kernel, so an SVM can be used with K(x,z) for discrimination.
Why SVM?
- High-dimensional input space
- Few irrelevant features (dense concept)
- Sparse document vectors (sparse instances)
- Most text categorization problems are linearly separable
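Because document vectors are sparse, $\varphi(x) \cdot \varphi(z)$ is usually computed over only the terms the two documents share. The sketch stores each document as a {term: weight} dict; the length-normalized (cosine) variant is an added assumption, not part of the slide, and remains a valid kernel because dividing by $\sqrt{K(x,x)K(z,z)}$ preserves positive semi-definiteness.

```python
import math


def sparse_dot(x, z):
    """phi(x) . phi(z) for sparse vectors stored as {term: weight}."""
    if len(x) > len(z):
        x, z = z, x                      # iterate over the smaller document
    return sum(w * z[t] for t, w in x.items() if t in z)


def cosine_kernel(x, z):
    """Length-normalized linear kernel (assumed variant); 0.0 for empty documents."""
    denom = math.sqrt(sparse_dot(x, x) * sparse_dot(z, z))
    return sparse_dot(x, z) / denom if denom else 0.0


d1 = {"svm": 2, "kernel": 1, "margin": 1}   # hypothetical term counts
d2 = {"svm": 1, "gene": 3}
print(sparse_dot(d1, d2), cosine_kernel(d1, d2), cosine_kernel(d1, d1))
```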

Some Issues

Choice of kernel
- Gaussian or polynomial kernel is default
- if ineffective, more elaborate kernels are needed
- domain experts can give assistance in formulating appropriate similarity measures

Choice of kernel parameters
- e.g. in the Gaussian kernel, σ can be set to the distance between the closest points with different classifications
- In the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters.

Optimization criterion: Hard margin v.s. Soft margin
- a lengthy series of experiments in which various parameters are tested
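The σ heuristic mentioned under kernel parameters is easy to compute by brute force over all pairs of points with different labels. The data below are made up; for large data sets this O(n²) search would need something smarter, and the result is only a starting point for the validation-based tuning described above.

```python
import math


def closest_opposite_distance(X, y):
    """Distance between the closest pair of points with different labels.

    One possible starting value for the Gaussian-kernel sigma.
    """
    return min(math.dist(a, b)
               for a, ya in zip(X, y)
               for b, yb in zip(X, y) if ya != yb)


X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
y = [1, 1, -1, -1]
sigma = closest_opposite_distance(X, y)
print(sigma)
```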

Additional Resources

An excellent tutorial on VC-dimension and Support Vector Machines:
C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.

The VC/SRM/SVM Bible:
Statistical Learning Theory by Vladimir Vapnik, Wiley-Interscience, 1998

http://www.kernel-machines.org/

Reference

Support Vector Machine Classification of Microarray Gene Expression Data, Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Sugnet, Manuel Ares, Jr., David Haussler
www.cs.utexas.edu/users/mooney/cs391L/svm.ppt
Text categorization with Support Vector Machines: learning with many relevant features, T. Joachims, ECML-98
