
Practical session: Introduction to SVM in R

Jean-Philippe Vert

November 23, 2015

In this session you will

• Learn how to manipulate a SVM in R with the package kernlab


• Observe the effect of changing the C parameter and the kernel

• Test a SVM classifier for cancer diagnosis from gene expression data

1 Linear SVM
Here we generate a toy dataset in 2D, and learn how to train and test a SVM.

1.1 Generate toy data


First generate a set of positive and negative examples from 2 Gaussians.

n <- 150 #number of data points


p <- 2 # dimension
sigma <- 1 # standard deviation of the distributions
meanpos <- 0 # centre of the distribution of positive examples
meanneg <- 3 # centre of the distribution of negative examples
npos <- round(n / 2) # number of positive examples
nneg <- n - npos # number of negative examples

# Generate the positive and negative examples


xpos <- matrix(rnorm(npos * p, mean = meanpos, sd = sigma), npos, p)
xneg <- matrix(rnorm(nneg * p, mean = meanneg, sd = sigma), nneg, p)
x <- rbind(xpos, xneg)

# Generate the labels


y <- matrix(c(rep(1, npos), rep(-1, nneg)))

# Visualize the data


plot(x, col = ifelse(y > 0, 1, 2))
legend("topleft", c("Positive", "Negative"), col = seq(2), pch = 1, text.col = seq(2))


[Plot of the toy data: x[,1] versus x[,2], positive examples in black, negative examples in red.]

Now we split the data into a training set (80%) and a test set (20%).

# Prepare a training and a test set


ntrain <- round(n * 0.8) # number of training examples
tindex <- sample(n, ntrain) # indices of training samples
xtrain <- x[tindex, ]
xtest <- x[-tindex, ]
ytrain <- y[tindex]
ytest <- y[-tindex]
istrain <- rep(0, n)
istrain[tindex] <- 1

# Visualize
plot(x, col = ifelse(y > 0, 1, 2), pch = ifelse(istrain == 1,1,2))
legend("topleft", c("Positive Train", "Positive Test", "Negative Train", "Negative Test"), col = c(1, 1,


[Plot of the toy data: training points shown as circles and test points as triangles, positive examples in black, negative examples in red.]

1.2 Train a SVM


Now we train a linear SVM with parameter C=100 on the training set.

# load the kernlab package


# install.packages("kernlab")
library(kernlab)

# train the SVM


svp <- ksvm(xtrain, ytrain, type = "C-svc", kernel = "vanilladot", C=100, scaled=c())

#Look and understand what svp contains


# General summary
svp

# Attributes that you can access


attributes(svp)

# For example, the support vectors


alpha(svp)
alphaindex(svp)
b(svp)


# Use the built-in function to pretty-plot the classifier


plot(svp, data = xtrain)

QUESTION1 - Write a function plotlinearsvm=function(svp,xtrain) to plot the points and the decision boundaries of a linear SVM, as in Figure 1. To add a straight line to a plot, you may use the function abline.
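
A possible sketch, assuming the training labels ytrain from above are still in the workspace and using kernlab's coef(), xmatrix() and b() accessors (the SVM must have been trained with scaled = c(), as above, so no rescaling is needed):

plotlinearsvm <- function(svp, xtrain) {
  plot(xtrain, col = ifelse(ytrain > 0, 1, 2))
  # recover w = sum_i alpha_i y_i x_i from the support vectors
  w <- colSums(coef(svp)[[1]] * xmatrix(svp)[[1]])
  b <- b(svp)
  # decision boundary w.x - b = 0, and the two margins w.x - b = +/- 1
  abline(b / w[2], -w[1] / w[2])
  abline((b + 1) / w[2], -w[1] / w[2], lty = 2)
  abline((b - 1) / w[2], -w[1] / w[2], lty = 2)
}

plotlinearsvm(svp, xtrain)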

1.3 Predict with a SVM


Now we can use the trained SVM to predict the label of points in the test set, and analyze the results using various metrics.

# Predict labels on test


ypred <- predict(svp, xtest)
table(ytest, ypred)


# Compute accuracy
sum(ypred == ytest) / length(ytest)

# Compute the prediction scores


ypredscore <- predict(svp, xtest, type = "decision")

# Check that the predicted labels are the signs of the scores
table(ypredscore > 0, ypred)

# Package to compute ROC curve, precision-recall etc...


# install.packages("ROCR")
library(ROCR)

## Loading required package: gplots


##
## Attaching package: ’gplots’
##
## The following object is masked from ’package:stats’:
##
## lowess

pred <- prediction(ypredscore, ytest)

# Plot ROC curve


perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf)
# Plot precision/recall curve
perf <- performance(pred, measure = "prec", x.measure = "rec")
plot(perf)
# Plot accuracy as function of threshold
perf <- performance(pred, measure = "acc")
plot(perf)

1.4 Cross-validation
Instead of fixing a training set and a test set, we can improve the quality of these estimates by running k-fold cross-validation. We split the training set into k groups of approximately equal size, then iteratively train a SVM using k - 1 groups and make predictions on the group that was left aside. When k is equal to the number of training points, we talk of leave-one-out (LOO) cross-validation. To generate a random split of n points in k folds, we can for example create the following function:

cv.folds <- function(y, folds = 3){


## randomly split the n samples into folds
split(sample(length(y)), rep(1:folds, length = length(y)))
}

QUESTION2 - Write a function cv.ksvm = function(x, y, folds = 3,...) which returns a vector ypred of predicted decision scores for all points by k-fold cross-validation.
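
One possible sketch, reusing the cv.folds helper defined above; extra arguments (kernel, C, ...) are passed on to ksvm:

cv.ksvm <- function(x, y, folds = 3, ...) {
  ypred <- numeric(length(y))
  for (test.idx in cv.folds(y, folds)) {
    # train on all folds except the current one, score the held-out fold
    svp <- ksvm(x[-test.idx, ], y[-test.idx], type = "C-svc", scaled = c(), ...)
    ypred[test.idx] <- predict(svp, x[test.idx, ], type = "decision")
  }
  ypred
}

# Example: 5-fold CV decision scores with a linear SVM
ypredscore.cv <- cv.ksvm(x, y, folds = 5, kernel = "vanilladot", C = 100)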

QUESTION3 - Compute the various performance metrics of the SVM by 5-fold cross-validation. Alternatively, the ksvm function can automatically compute the k-fold cross-validation error:


svp <- ksvm(x, y, type = "C-svc", kernel = "vanilladot", C = 100, scaled=c(), cross = 5)
print(cross(svp))
print(error(svp))

QUESTION4 - Compare the 5-fold CV error estimated by your function and by ksvm.

1.5 Effect of C
The C parameter balances the trade-off between having a large margin and separating the positive and negative examples on the training set. It is important to choose it well to have good generalization.

QUESTION5 - Plot the decision functions of SVM trained on the toy examples for different values of C in the range 2^seq(-10, 14). To look at the different plots you can use the function par(ask=T), which will ask you to press a key between successive plots. Alternatively, you can use par(mfrow = c(5,5)) to see all the plots in the same window.
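
A possible starting point, assuming the toy xtrain and ytrain generated above:

cost <- 2^seq(-10, 14)
par(ask = TRUE) # press a key to move to the next plot
for (C in cost) {
  svp <- ksvm(xtrain, ytrain, type = "C-svc", kernel = "vanilladot",
              C = C, scaled = c())
  plot(svp, data = xtrain)
}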

QUESTION6 - Plot the 5-fold cross-validation error as a function of C.
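
One way to do this with ksvm's built-in cross-validation (a sketch, using the full toy data x and y):

cost <- 2^seq(-10, 14)
cverror <- sapply(cost, function(C) {
  cross(ksvm(x, y, type = "C-svc", kernel = "vanilladot",
             C = C, scaled = c(), cross = 5))
})
plot(cost, cverror, type = "b", log = "x",
     xlab = "C", ylab = "5-fold CV error")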

QUESTION7 - Do the same on data with more overlap between the two classes, e.g., regenerate toy data with meanneg being 1.


2 Nonlinear SVM
Sometimes linear SVMs are not enough. For example, generate a toy dataset where the positive and negative examples are each a mixture of two Gaussians, so that the classes are not linearly separable.

QUESTION8 - Make a toy example that looks like Figure 2, and test a linear SVM with
different values of C.
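
One possible construction, an XOR-like layout where each class is drawn from two Gaussians, so that no straight line can separate them:

npts <- 50 # points per Gaussian
xpos <- rbind(matrix(rnorm(2 * npts, mean = 0), npts, 2),
              matrix(rnorm(2 * npts, mean = 4), npts, 2))
xneg <- rbind(cbind(rnorm(npts, mean = 0), rnorm(npts, mean = 4)),
              cbind(rnorm(npts, mean = 4), rnorm(npts, mean = 0)))
x <- rbind(xpos, xneg)
y <- c(rep(1, 2 * npts), rep(-1, 2 * npts))
plot(x, col = ifelse(y > 0, 1, 2))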

To solve this problem, we should instead use a nonlinear SVM. This is obtained by simply changing the
kernel parameter. For example, to use a Gaussian RBF kernel with σ = 1 and C = 1:

# Train a nonlinear SVM


svp <- ksvm(x, y, type = "C-svc", kernel="rbf", kpar = list(sigma = 1), C = 1)

# Visualize it
plot(svp, data = x)

You should obtain something that looks like Figure 3. Much better than the linear SVM, no? The nonlinear SVM now has two parameters: σ and C. Both play a role in the generalization capacity of the SVM.

QUESTION9 - Visualize and compute the 5-fold cross-validation error for different values of
C and σ. Observe their influence.
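
A possible sketch of the grid search (the grids of values below are only an example):

cost <- 2^seq(-2, 6, by = 2)
sigmas <- 2^seq(-2, 2)
# 5-fold CV error; rows indexed by C, columns by sigma
cverror <- sapply(sigmas, function(s) {
  sapply(cost, function(C) {
    cross(ksvm(x, y, type = "C-svc", kernel = "rbfdot",
               kpar = list(sigma = s), C = C, cross = 5))
  })
})
print(cverror)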


A useful heuristic to choose σ is implemented in kernlab. It is based on the quantiles of the distances between the training points.

# Train a nonlinear SVM with automatic selection of sigma by heuristic


svp <- ksvm(x, y, type = "C-svc", kernel = "rbf", C = 1)

# Visualize it
plot(svp, data = x)

QUESTION10 - Train a nonlinear SVM with various values of C, with automatic determination of σ.
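
For example (a sketch; leaving kpar at its default lets ksvm apply the σ heuristic for each value of C):

for (C in 2^seq(-4, 8, by = 2)) {
  svp <- ksvm(x, y, type = "C-svc", kernel = "rbfdot", C = C)
  plot(svp, data = x)
}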


In fact, many other nonlinear kernels are implemented. Check the documentation of kernlab
to see them: ?kernels

QUESTION11 - Test the polynomial, hyperbolic tangent, Laplacian, Bessel and ANOVA kernels on the toy examples.
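
A possible loop over these kernels, using the kernel names listed in ?kernels and their default parameters:

kernels <- c("polydot", "tanhdot", "laplacedot", "besseldot", "anovadot")
for (k in kernels) {
  svp <- ksvm(x, y, type = "C-svc", kernel = k, C = 1)
  plot(svp, data = x)
}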


3 Application: cancer diagnosis from gene expression data


As a real-world application, let us test the ability of SVM to predict the class of a tumour from gene expression data. We use a publicly available dataset of gene expression data for 128 different individuals with acute lymphoblastic leukemia (ALL).

# Load the ALL dataset


library(ALL)

## Loading required package: Biobase


## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: ’BiocGenerics’
##
## The following objects are masked from ’package:parallel’:
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
##
## The following object is masked from ’package:stats’:
##
## xtabs
##
## The following objects are masked from ’package:base’:
##
## anyDuplicated, append, as.data.frame, as.vector, cbind,
## colnames, do.call, duplicated, eval, evalq, Filter, Find, get,
## intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rep.int, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unlist, unsplit
##
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## ’browseVignettes()’. To cite Bioconductor, see
## ’citation("Biobase")’, and for packages ’citation("pkgname")’.

data(ALL)

# Inspect them
?ALL
show(ALL)
print(summary(pData(ALL)))

Here we focus on predicting the type of the disease (B-cell or T-cell). We get the expression data and disease type as follows:


x <- t(exprs(ALL))
y <- substr(ALL$BT,1,1)

QUESTION12 - Test the ability of a SVM to predict the class of the disease from gene expression. Check the influence of the parameters.
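
A possible sketch for a first try (random 80/20 split; the linear kernel and C = 100 are just one choice to start from):

ntrain <- round(0.8 * nrow(x))
tindex <- sample(nrow(x), ntrain)
svp <- ksvm(x[tindex, ], as.factor(y[tindex]), type = "C-svc",
            kernel = "vanilladot", C = 100, scaled = c())
ypred <- predict(svp, x[-tindex, ])
mean(ypred == y[-tindex]) # test accuracy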

Finally, we may want to predict the type and stage of the disease. We are then confronted with a multi-class classification problem, since the variable to predict can take more than two values:

y <- ALL$BT
print(y)

## [1] B2 B2 B4 B1 B2 B1 B1 B1 B2 B2 B3 B3 B3 B2 B3 B B2 B3 B2 B3 B2 B2 B2
## [24] B1 B1 B2 B1 B2 B1 B2 B B B2 B2 B2 B1 B2 B2 B2 B2 B2 B4 B4 B2 B2 B2
## [47] B4 B2 B1 B2 B2 B3 B4 B3 B3 B3 B4 B3 B3 B1 B1 B1 B1 B3 B3 B3 B3 B3 B3
## [70] B3 B3 B1 B3 B1 B4 B2 B2 B1 B3 B4 B4 B2 B2 B3 B4 B4 B4 B1 B2 B2 B2 B1
## [93] B2 B B T T3 T2 T2 T3 T2 T T4 T2 T3 T3 T T2 T3 T2 T2 T2 T1 T4 T
## [116] T2 T3 T2 T2 T2 T2 T3 T3 T3 T2 T3 T2 T
## Levels: B B1 B2 B3 B4 T T1 T2 T3 T4

Fortunately, kernlab automatically implements multi-class SVM by an all-versus-all strategy that combines several binary SVMs.

QUESTION13 - Test the ability of a SVM to predict the class and the stage of the disease
from gene expression.
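
A minimal sketch, again with a linear kernel; the multi-class combination is handled internally by ksvm, and cross = 5 gives an estimate of the error:

svp <- ksvm(x, y, type = "C-svc", kernel = "vanilladot",
            C = 100, scaled = c(), cross = 5)
print(cross(svp))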

