Data Blog
Data Science, Machine Learning and Statistics, implemented in Python

Support Vector Machine: Python implementation using CVXOPT

Xavier Bourret Sicotte
Tue 26 June 2018
Category: Machine Learning

Support Vector Machines
In this second notebook on SVMs we will walk through the implementation of both the hard margin and soft margin SVM algorithms in Python, using the well known CVXOPT library. While the algorithm in its mathematical form is rather straightforward, its implementation in matrix form using the CVXOPT API can be challenging at first. This notebook will show the steps required to derive the appropriate vectorized notation as well as the inputs needed for the API.
Background
This notebook assumes previous knowledge and understanding of the mathematics behind SVMs and the formulation of the
primal / dual optimization problem. For a summary of this topic please have a look at the following post on stats.stackexchange:
https://stats.stackexchange.com/questions/23391/how-does-a-support-vector-machine-svm-work/353605#353605
Libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
from sklearn.svm import SVC
from cvxopt import matrix as cvxopt_matrix
from cvxopt import solvers as cvxopt_solvers
The dataset and inspection
We will use the same dataset as in the previous notebook of this series. In this case the data is linearly separable.
#Data set
x_neg = np.array([[3,4],[1,4],[2,3]])
y_neg = np.array([-1,-1,-1])
x_pos = np.array([[6,-1],[7,-1],[5,-3]])
y_pos = np.array([1,1,1])
x1 = np.linspace(-10,10)
x = np.vstack((np.linspace(-10,10),np.linspace(-10,10)))
#Data for the next section
X = np.vstack((x_pos, x_neg))
y = np.concatenate((y_pos,y_neg))
#Parameters guessed by inspection
w = np.array([1,-1]).reshape(-1,1)
b = -3
#Plot
fig = plt.figure(figsize = (10,10))
plt.scatter(x_neg[:,0], x_neg[:,1], marker = 'x', color = 'r', label = 'Negative -1')
plt.scatter(x_pos[:,0], x_pos[:,1], marker = 'o', color = 'b',label = 'Positive +1')
plt.plot(x1, x1 - 3, color = 'darkblue')
plt.plot(x1, x1 - 7, linestyle = '--', alpha = .3, color = 'b')
plt.plot(x1, x1 + 1, linestyle = '--', alpha = .3, color = 'r')
plt.xlim(0,10)
plt.ylim(-5,5)
plt.xticks(np.arange(0, 10, step=1))
plt.yticks(np.arange(-5, 5, step=1))
#Lines
plt.axvline(0, color = 'black', alpha = .5)
plt.axhline(0,color = 'black', alpha = .5)
plt.plot([2,6],[3,-1], linestyle = '-', color = 'darkblue', alpha = .5 )
plt.plot([4,6],[1,1],[6,6],[1,-1], linestyle = ':', color = 'darkblue', alpha = .5 )
plt.plot([0,1.5],[0,-1.5],[6,6],[1,-1], linestyle = ':', color = 'darkblue', alpha = .5 )
#Annotations
plt.annotate(s = '$A \ (6,-1)$', xy = (5,-1), xytext = (6,-1.5))
plt.annotate(s = '$B \ (2,3)$', xy = (2,3), xytext = (2,3.5))#, arrowprops = {'width':.2, 'headwidth':8})
plt.annotate(s = '$2$', xy = (5,1.2), xytext = (5,1.2) )
plt.annotate(s = '$2$', xy = (6.2,.5), xytext = (6.2,.5))
plt.annotate(s = '$2\sqrt{2}$', xy = (4.5,-.5), xytext = (4.5,-.5))
plt.annotate(s = '$2\sqrt{2}$', xy = (2.5,1.5), xytext = (2.5,1.5))
plt.annotate(s = '$w^Tx + b = 0$', xy = (8,4.5), xytext = (8,4.5))
plt.annotate(s = '$(\\frac{1}{4},-\\frac{1}{4}) \\binom{x_1}{x_2}- \\frac{3}{4} = 0$', xy = (7.5,4), xytext = (7.5,4))
plt.annotate(s = '$\\frac{3}{\sqrt{2}}$', xy = (.5,-1), xytext = (.5,-1))
#Labels and show
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
plt.legend(loc = 'lower right')
plt.show()
Implementing the SVM algorithm (Hard margin)
Case 1) Linearly separable, binary classification
Using the notation and steps provided by Tristan Fletcher (https://static1.squarespace.com/static/58851af9ebbd1a30e98fb283/t/58902fbae4fcb5398aeb7505/1485844411772/SVM+Explained.pdf), the general steps to solve the SVM problem are the following:
- Create $H$, where $H_{i,j} = y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle$
- Solve the dual problem for the $\alpha_i$ (this is the quadratic program we hand to CVXOPT below)
- Calculate $w = \sum_i^m y^{(i)} \alpha_i x^{(i)}$
- Determine the set of support vectors $S$ by finding the indices such that $\alpha_i > 0$
- Calculate the intercept term using $b = y^{(s)} - \sum_{m \in S} \alpha_m y^{(m)} \langle x^{(m)}, x^{(s)} \rangle$
- For each new point $x'$, classify according to $y' = \text{sign}(w^T x' + b)$ (see the short sketch below)
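As a quick illustration of the final classification step, here is a minimal sketch (the helper name predict is mine, for illustration only; w and b are computed further below):

#Minimal sketch of the classification step
def predict(x_new, w, b):
    #y' = sign(w^T x' + b): +1 on the positive side of the hyperplane, -1 otherwise
    return np.sign(np.dot(x_new, w) + b)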
Re-writing the problem in an appropriate format
Since we will solve this optimization problem using the CVXOPT (https://cvxopt.org/userguide/coneprog.html#quadratic-programming) library in Python, we will need to match the solver's API which, according to the documentation, is of the form:
$$\min_x \ \frac{1}{2} x^T P x + q^T x$$
$$\text{s.t.} \quad Gx \leq h$$
$$\qquad \ \ Ax = b$$
With the API:
cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])
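Before mapping the SVM dual onto this API, it may help to see the solver on a small toy problem first. The following sketch (essentially the standard example from the CVXOPT documentation) minimizes $2x_1^2 + x_2^2 + x_1 x_2 + x_1 + x_2$ subject to $x_1 \geq 0$, $x_2 \geq 0$ and $x_1 + x_2 = 1$:

#Toy QP to illustrate the argument order of the solver (not the SVM problem yet)
cvxopt_solvers.options['show_progress'] = False
P_toy = cvxopt_matrix(np.array([[4., 1.], [1., 2.]]))  #1/2 x^T P x recovers the quadratic terms
q_toy = cvxopt_matrix(np.array([1., 1.]))
G_toy = cvxopt_matrix(-np.eye(2))                      #-x <= 0 encodes x1 >= 0, x2 >= 0
h_toy = cvxopt_matrix(np.zeros(2))
A_toy = cvxopt_matrix(np.array([[1., 1.]]))            #x1 + x2 = 1
b_toy = cvxopt_matrix(np.array([1.]))
sol_toy = cvxopt_solvers.qp(P_toy, q_toy, G_toy, h_toy, A_toy, b_toy)
print(np.array(sol_toy['x']).flatten())                #approximately [0.25 0.75]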
Recall that the dual problem is expressed as:

$$\max_\alpha \ \sum_i^m \alpha_i - \frac{1}{2} \sum_{i,j}^m y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle$$

Let $H$ be a matrix such that $H_{i,j} = y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle$, then the optimization becomes:

$$\max_\alpha \ \sum_i^m \alpha_i - \frac{1}{2} \alpha^T H \alpha$$
$$\text{s.t.} \quad \alpha_i \geq 0$$
$$\qquad \ \sum_i^m \alpha_i y^{(i)} = 0$$
We convert the sums into vector form and multiply both the objective and the constraint by $-1$, which turns this into a minimization problem and reverses the inequality:

$$\min_\alpha \ \frac{1}{2} \alpha^T H \alpha - 1^T \alpha$$
$$\text{s.t.} \quad -\alpha_i \leq 0$$
$$\qquad \ y^T \alpha = 0$$
We are now ready to convert our numpy arrays into the cvxopt format. Using the same notation as in the documentation, this gives:

- $P := H$, a matrix of size $m \times m$
- $q := -\vec{1}$, a vector of size $m \times 1$
- $G := -\text{diag}[1]$, a diagonal matrix of $-1$s of size $m \times m$
- $h := \vec{0}$, a vector of zeros of size $m \times 1$
- $A := y^T$, the label vector as a row, of size $1 \times m$
- $b := 0$, a scalar

Note that in the simple example of $m = 2$, the matrix $G$ and vector $h$ which define the constraint are:

$$G = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \quad \text{and} \quad h = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
Computing the matrix H in vectorized form
Consider the simple example with two input samples $\{x^{(1)}, x^{(2)}\} \in \mathbb{R}^2$ which are two dimensional vectors, i.e. $x^{(1)} = (x_1^{(1)}, x_2^{(1)})^T$:

$$X = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} \\ x_1^{(2)} & x_2^{(2)} \end{bmatrix} \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \end{bmatrix}$$
We now create a new matrix $X'$ where each input sample $x$ is multiplied by its corresponding output label $y$. This can be done easily in NumPy using broadcasting:
$$X' = \begin{bmatrix} x_1^{(1)} y^{(1)} & x_2^{(1)} y^{(1)} \\ x_1^{(2)} y^{(2)} & x_2^{(2)} y^{(2)} \end{bmatrix}$$

Finally we take the matrix multiplication of $X'$ and its transpose, giving $H = X' X'^T$:

$$H = X' X'^T = \begin{bmatrix} x_1^{(1)} y^{(1)} & x_2^{(1)} y^{(1)} \\ x_1^{(2)} y^{(2)} & x_2^{(2)} y^{(2)} \end{bmatrix} \begin{bmatrix} x_1^{(1)} y^{(1)} & x_1^{(2)} y^{(2)} \\ x_2^{(1)} y^{(1)} & x_2^{(2)} y^{(2)} \end{bmatrix}$$

$$H = \begin{bmatrix} x_1^{(1)} x_1^{(1)} y^{(1)} y^{(1)} + x_2^{(1)} x_2^{(1)} y^{(1)} y^{(1)} & x_1^{(1)} x_1^{(2)} y^{(1)} y^{(2)} + x_2^{(1)} x_2^{(2)} y^{(1)} y^{(2)} \\ x_1^{(2)} x_1^{(1)} y^{(2)} y^{(1)} + x_2^{(2)} x_2^{(1)} y^{(2)} y^{(1)} & x_1^{(2)} x_1^{(2)} y^{(2)} y^{(2)} + x_2^{(2)} x_2^{(2)} y^{(2)} y^{(2)} \end{bmatrix}$$
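As a quick numerical check (a sketch I'm adding, using the X and y arrays defined above), the vectorized construction can be compared against an explicit double loop over $H_{i,j} = y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle$:

#Verify the vectorized H against an explicit double loop
X_dash_check = y.reshape(-1,1) * X
H_vec = X_dash_check @ X_dash_check.T
H_loop = np.array([[y[i] * y[j] * np.dot(X[i], X[j])
                    for j in range(len(y))] for i in range(len(y))])
print(np.allclose(H_vec, H_loop))  #expect True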
Implementation in Python
CVXOPT solver and resulting α
#Importing with custom names to avoid issues with numpy / sympy matrix
from cvxopt import matrix as cvxopt_matrix
from cvxopt import solvers as cvxopt_solvers
#Initializing values and computing H. Note the 1. to force to float type
m,n = X.shape
y = y.reshape(-1,1) * 1.
X_dash = y * X
H = np.dot(X_dash , X_dash.T) * 1.
#Converting into cvxopt format
P = cvxopt_matrix(H)
q = cvxopt_matrix(-np.ones((m, 1)))
G = cvxopt_matrix(-np.eye(m))
h = cvxopt_matrix(np.zeros(m))
A = cvxopt_matrix(y.reshape(1, -1))
b = cvxopt_matrix(np.zeros(1))
#Setting solver parameters (change default to decrease tolerance)
cvxopt_solvers.options['show_progress'] = False
cvxopt_solvers.options['abstol'] = 1e-10
cvxopt_solvers.options['reltol'] = 1e-10
cvxopt_solvers.options['feastol'] = 1e-10
#Run solver
sol = cvxopt_solvers.qp(P, q, G, h, A, b)
alphas = np.array(sol['x'])
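It is worth confirming (my addition) that the solver actually converged before using these alphas:

#Confirm convergence: CVXOPT reports 'optimal' on success
print('status: ', sol['status'])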
Compute w and b parameters
#w parameter in vectorized form
w = ((y * alphas).T @ X).reshape(-1,1)
#Selecting the set of indices S corresponding to non zero parameters
S = (alphas > 1e-4).flatten()
#Computing b
b = y[S] - np.dot(X[S], w)
#Display results
print('Alphas = ',alphas[alphas > 1e-4])
print('w = ', w.flatten())
print('b = ', b[0])
Alphas = [0.0625 0.06249356]
w = [ 0.24999356 -0.25000644]
b = [-0.74996781]
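As an extra sanity check (my addition, not part of the original walkthrough), sklearn's SVC with a very large C should approximate this hard margin solution:

#Hard margin approximated in sklearn by taking C very large
clf_hard = SVC(C = 1e10, kernel = 'linear')
clf_hard.fit(X, y.ravel())
print('w = ', clf_hard.coef_)        #expect approximately [ 0.25 -0.25]
print('b = ', clf_hard.intercept_)   #expect approximately -0.75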
Implementing the SVM algorithm (Soft Margin)
Adding a positive point in the middle of the negative cluster
The data is no longer linearly separable.
x_neg = np.array([[3,4],[1,4],[2,3]])
y_neg = np.array([-1,-1,-1])
x_pos = np.array([[6,-1],[7,-1],[5,-3],[2,4]])
y_pos = np.array([1,1,1,1])
x1 = np.linspace(-10,10)
x = np.vstack((np.linspace(-10,10),np.linspace(-10,10)))
fig = plt.figure(figsize = (10,10))
plt.scatter(x_neg[:,0], x_neg[:,1], marker = 'x', color = 'r', label = 'Negative -1')
plt.scatter(x_pos[:,0], x_pos[:,1], marker = 'o', color = 'b',label = 'Positive +1')
plt.plot(x1, x1 - 3, color = 'darkblue', alpha = .6, label = 'Previous boundary')
plt.xlim(0,10)
plt.ylim(-5,5)
plt.xticks(np.arange(0, 10, step=1))
plt.yticks(np.arange(-5, 5, step=1))
#Lines
plt.axvline(0, color = 'black', alpha = .5)
plt.axhline(0,color = 'black', alpha = .5)
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
plt.legend(loc = 'lower right')
plt.show()
#New dataset (for later)
X = np.array([[3,4],[1,4],[2,3],[6,-1],[7,-1],[5,-3],[2,4]] )
y = np.array([-1,-1, -1, 1, 1 , 1, 1 ])
Case 2) Not fully linearly separable, binary classification
For the soft margin SVM, recall that the optimization problem can be expressed as:

$$\max_\alpha \ \sum_i^m \alpha_i - \frac{1}{2} \alpha^T H \alpha$$
$$\text{s.t.} \quad 0 \leq \alpha_i \leq C$$
$$\qquad \ \sum_i^m \alpha_i y^{(i)} = 0$$
Which can be written in standard form as:

$$\min_\alpha \ \frac{1}{2} \alpha^T H \alpha - 1^T \alpha$$
$$\text{s.t.} \quad -\alpha_i \leq 0$$
$$\qquad \ \alpha_i \leq C$$
$$\qquad \ y^T \alpha = 0$$
This is almost the same problem as previously, except for the additional inequality constraint on $\alpha$. We translate this new constraint into standard form by stacking a diagonal matrix of 1s of size $m \times m$ below the matrix $G$. Similarly, the value $C$ is appended $m$ times to the vector $h$.
Note that in the simple example of $m = 2$, the matrix $G$ and vector $h$ which define the constraint are:

$$G = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad h = \begin{bmatrix} 0 \\ 0 \\ C \\ C \end{bmatrix}$$
CVXOPT Solver with the new constraint
#Initializing values and computing H. Note the 1. to force to float type
C = 10
m,n = X.shape
y = y.reshape(-1,1) * 1.
X_dash = y * X
H = np.dot(X_dash , X_dash.T) * 1.
#Converting into cvxopt format - as previously
P = cvxopt_matrix(H)
q = cvxopt_matrix(-np.ones((m, 1)))
G = cvxopt_matrix(np.vstack((np.eye(m)*-1,np.eye(m))))
h = cvxopt_matrix(np.hstack((np.zeros(m), np.ones(m) * C)))
A = cvxopt_matrix(y.reshape(1, -1))
b = cvxopt_matrix(np.zeros(1))
#Run solver
sol = cvxopt_solvers.qp(P, q, G, h, A, b)
alphas = np.array(sol['x'])
#==================Computing and printing parameters===============================#
w = ((y * alphas).T @ X).reshape(-1,1)
S = (alphas > 1e-4).flatten()
b = y[S] - np.dot(X[S], w)
#Display results
print('Alphas = ',alphas[alphas > 1e-4])
print('w = ', w.flatten())
print('b = ', b[0])
Alphas = [ 5. 6.3125 1.3125 10. ]
w = [ 0.25 -0.25]
b = [-0.75]
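For a quick visual confirmation (a sketch I'm adding), we can plot the boundary recovered by the solver over the new dataset, reusing the w and b computed above:

#Plot the soft margin boundary w1*x1 + w2*x2 + b = 0, i.e. x2 = -(w1*x1 + b)/w2
w1, w2 = w.flatten()
x1_line = np.linspace(-10, 10)
fig = plt.figure(figsize = (10,10))
plt.scatter(x_neg[:,0], x_neg[:,1], marker = 'x', color = 'r', label = 'Negative -1')
plt.scatter(x_pos[:,0], x_pos[:,1], marker = 'o', color = 'b', label = 'Positive +1')
plt.plot(x1_line, -(w1*x1_line + b[0])/w2, color = 'darkblue', label = 'Soft margin boundary')
plt.xlim(0,10)
plt.ylim(-5,5)
plt.legend(loc = 'lower right')
plt.show()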
Comparing to Sklearn results
clf = SVC(C = 10, kernel = 'linear')
clf.fit(X, y.ravel())
print('w = ',clf.coef_)
print('b = ',clf.intercept_)
print('Indices of support vectors = ', clf.support_)
print('Support vectors = ', clf.support_vectors_)
print('Number of support vectors for each class = ', clf.n_support_)
print('Coefficients of the support vector in the decision function = ', np.abs(clf.dual_coef_))
w = [[ 0.25 -0.25]]
b = [-0.75]
Indices of support vectors = [0 2 3 6]
Support vectors = [[ 3. 4.]
[ 2. 3.]
[ 6. -1.]
[ 2. 4.]]
Number of support vectors for each class = [2 2]
Coefficients of the support vector in the decision function = [[ 5.      6.3125  1.3125 10.    ]]
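Note that sklearn's dual_coef_ stores $y_i \alpha_i$ for each support vector, so its absolute values can be cross-checked directly against the CVXOPT alphas:

#Cross-check: |dual_coef_| should match the non-zero alphas from CVXOPT
print(np.allclose(np.sort(alphas[S].flatten()),
                  np.sort(np.abs(clf.dual_coef_).flatten()), atol = 1e-3))  #expect True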
Sources and further reading
https://pythonprogramming.net/soft-margin-kernel-cvxopt-svm-machine-learning-tutorial/
http://goelhardik.github.io/2016/11/28/svm-cvxopt/
https://cvxopt.org/userguide/coneprog.html#cvxopt.solvers.coneqp
Comments
Sai Durga • 6 months ago
Is CVXOPT a QP solver which is not implemented under the SVM algorithm hood? If yes, then
what QP solver is used in SVM?
Xavier (Mod) → Sai Durga • 3 months ago
SVM requires an optimization algorithm, but not necessarily a QP or any particular type of solver. As long as your solver finds the optimal solutions given the constraints then you are OK.