SVM in Matlab
08/10/2003
Contents

Matrices, Vectors and Constants ..... 4
Copy Data Sets and Programs into Local Directory ..... 4
Classes Represented as ±1 for SVMs ..... 4
Return of the Hyper-plane ..... 4
Why Are Optimal Separating Hyper-planes So Good? ..... 5
The SVM Optimisation Problem (Primal Separable Case) ..... 5
Intuition of the Main Term of SVM Optimisation ..... 5
Intuition of Main Term and Side Constraints ..... 6
The quadprog Function ..... 6
Encoding SVM Optimisation Problems as Matrices and Vectors ..... 6
Concatenation of Matrices ..... 7
Creating Diagonal Matrices ..... 8
Creating Identity Matrices ..... 8
Using quadprog to Solve SVM Optimisation Problems ..... 8
Add the Code to Find the SVM Optimal Separating Hyperplane ..... 9
Plot the Optimal Separating Hyperplane and Margins ..... 10
Plotting New Test Data Points for Classification ..... 11
The Finished Graph Plot ..... 13
Additional Tasks ..... 14
www.david-lindsay.co.uk
Page 2 of 17
Session Plan
This workshop is designed to revise the theory and practice of Support Vector Machines (SVMs) as well as to introduce important functionality in MATLAB. The text is broken down into sections clearly marked using the symbols described below. Read through the exercises and try to answer the questions as best you can amongst yourselves. We have left space at the bottom of each question to write notes/answers.
- This is a key concept that must be understood; if you don't understand it then you must ask!
- This indicates a question or exercise that needs to be completed.
- This indicates a task that you need to do as instructed in the exercise.
- This is a helpful tip or piece of useful information.
Overview:
Today we will continue our analysis of the tennisweather data set that we analysed in the previous MATLAB workshop. We will implement an SVM on the data and demonstrate practically how to classify new examples. I will build upon the MATLAB programming techniques introduced in the previous session and introduce more advanced functionality where needed. This session is designed as an informal practical; you can work in groups and chat amongst yourselves whilst working through the exercises provided. I encourage you to work through the questions during the session so that I may help you with any problems. I have tried to make this session light-hearted and to highlight the key concepts with a symbol. If you need help at any time, please either refer to your notes, the MATLAB help guide, or ask me or Tony! I will hand out the solution code for this exercise at the end of the session.
Learning outcomes
1. Review some of the basic concepts of SVMs.
2. Look at how SVMs are applied to real-life data.
3. Learn more advanced manipulation and creation of matrices in MATLAB, such as concatenation and creating diagonal and identity matrices.
4. Learn how to classify new test data using SVMs.
5. Learn how to formulate the SVM primal separable problem as a quadratic optimisation problem.
6. Learn how to implement SVMs in MATLAB using the quadprog function.
I will try to use consistent notation to distinguish between matrices, vectors and constants throughout this document. I will usually use capital bold notation for matrices (e.g. A, B, C, ...), lower case bold for vectors (e.g. a, b, c, ...) and italics for constants (e.g. a, b, c, ...). It is very important that you can distinguish between these mathematical objects!
As stated in the previous workshop, we have created for you a shared directory of resources under the /CS/courses/CS392/ directory.

Task: Copy the files that you need for this session into your local directory: cp /CS/courses/CS392/ex2SVM/* . You should find the files firstsvm1.m, tennisweather2AttSVMTrain.txt and tennisweather2AttSVMTest.txt in your home directory. The file firstsvm1.m is a slightly modified version of the program that you developed in workshop 1.

Task: Open this file for editing by typing emacs firstsvm1.m &
The two dataset files that you have copied contain the same data as you worked with in the previous workshop on assessing good tennis-playing days. Look at the data; you will notice that I have changed the classes to ±1 so that the data can be analysed by an SVM directly. It will become clearer later why it is useful to encode the classes in this manner. We have also separated the last example from the original data to make separate training and test files.
One of the most important concepts to get your head around is that of a hyper-plane in n-dimensional space [see Class 2 Slide 8]! We consider the attribute vector x_i = (x_i1, ..., x_in) as the n-dimensional vector for the ith example in the data set. We have a training set of l examples (information pairs: attribute vector + label) {(x_1, y_1), ..., (x_l, y_l)} where the labels y_i ∈ {-1, 1}. The equation of a hyperplane H is

H = w1*x1 + w2*x2 + ... + wn*xn + b = (w · x) + b = 0

using the dot product notation. The hyperplane is defined by the weights, which are contained in the n-dimensional vector w, and the constant b.
If the data is separable then there are infinitely many possible separating hyperplanes that would split the data!
To get an understanding of why finding the separating hyper-plane with the largest margin is a good idea, consider the example above. The separating hyper-plane on the right drives a larger wedge between the data than the one on the left. We would hope that this decision rule would give better generalisation than the other. The separating hyperplane (the centre of the wedge) has the equation H = 0, whereas the margin hyperplanes (the upper and lower planes surrounding the wedge) have equations H = +1 and H = -1.
As specified in the notes [Class 2 Page 12], our SVM in the primal separable case can be formulated as the following optimisation problem:

Minimise (1/2)(w · w)
Subject to the constraints y_i(w · x_i + b) >= 1 for 1 <= i <= l
Why do we minimise (1/2)(w · w)? Well, without going into too much detail, we can calculate the distance of the ith example's attribute vector x_i from the hyperplane H like so:

d(x_i, H) = |(w · x_i) + b| / ||w||
Therefore, if we are minimising the norm of the weight vector ||w||, then we are maximising 1/||w||, which in turn maximises the distance of the examples x_i from the hyperplane (thus maximising the margin). Note: we are minimising over a vector w, therefore we are actually minimising over n values!
In brief, the side constraints make sure that the positive examples x_i (which have label y_i = +1) lie above the upper margin, and the negative examples x_i (which have label y_i = -1) lie below the lower margin. For example, y_i = +1 gives w · x_i + b >= 1, which means the example x_i lies on or above the upper margin hyperplane H = +1. Alternatively, if we have a negative example y_i = -1 then we get -(w · x_i + b) >= 1, i.e. w · x_i + b <= -1, therefore the example x_i lies on or below the hyperplane H = -1. Note: we have a side constraint for every example in the data set, therefore we have l side constraints making sure the examples lie on the right sides of the hyperplanes! In short, we are optimising n + 1 variables (the n weights plus the constant b) with l side constraints.
To run an SVM in MATLAB you will have to use the quadprog function to solve the optimisation problem. For the CS392 course we will use this optimisation tool like a black box, and not give consideration to how it finds solutions (this is not the CS349 Computational Optimisation course). You must, however, try to understand the input arguments and the output this function returns [see Class 2a Page 2]. The quadprog function solves generic quadratic programming optimisation problems of the form:

Minimise (1/2) x'Hx + f'x
Subject to the constraints Ax <= c

This minimisation problem solves for the vector x. Do not confuse the notation with that of the SVM (the x above is not the same as the attribute vectors); we could use any letters to denote these matrices and vectors! The quadprog program does not only solve SVM problems; it just happens that SVMs can be formulated as a quadratic programming problem (which quadprog can solve easily). The first step to solving our problem is to encode it using the matrices H, A and vectors f, c, as we shall see in the next section! Once we have created the matrices and vectors (H, A, f, c), the quadprog function can be used like so: x = quadprog(H,f,A,c) which will return the optimal values in the vector x.
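As a warm-up, here is a tiny quadratic program, unrelated to the SVM, solved with quadprog. This is only a sketch: it assumes you have the Optimization Toolbox available, and the numbers are made up purely for illustration.

```matlab
% Minimise (1/2)*(x1^2 + x2^2) subject to x1 + x2 >= 1.
% quadprog wants constraints as A*x <= c, so we negate both sides.
H = eye(2);        % quadratic term
f = zeros(2,1);    % no linear term
A = [-1 -1];       % encodes -x1 - x2 <= -1, i.e. x1 + x2 >= 1
c = -1;
x = quadprog(H, f, A, c)
% By symmetry the solution should be approximately [0.5; 0.5].
```

Notice how the >= constraint had to be flipped into a <= constraint; we will do exactly the same thing when encoding the SVM side constraints.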
In this section we will explain how to encode the SVM optimisation problem using the matrix and vector formulation that quadprog likes as input. We can encode the main SVM optimisation term

min (1/2)(w · w) = (1/2)(w1*w1 + w2*w2 + ... + wn*wn)

like so:
min (1/2) * [w' b] * [ I 0 ; 0 0 ] * [w ; b]

which, written out in full with the (n+1) × (n+1) matrix, is

min (1/2) * [w1 w2 ... wn b] * [ 1 0 ... 0 0 ;
                                 0 1 ... 0 0 ;
                                   ...       ;
                                 0 0 ... 1 0 ;
                                 0 0 ... 0 0 ] * [w1 ; w2 ; ... ; wn ; b]
Similarly, we can encode the l side constraints y_i(w · x_i + b) >= 1 for 1 <= i <= l by expanding the notation [see Class 2a, Page 3]:
[ y1 0  ... 0  ;       [ x11 x12 ... x1n 1 ;       [ w1 ;         [ 1 ;
  0  y2 ... 0  ;    *    x21 x22 ... x2n 1 ;    *    ... ;    >=    ... ;
      ...      ;              ...          ;         wn ;           ... ;
  0  0  ... yl ]         xl1 xl2 ... xln 1 ]         b  ]           1 ]

where xij denotes the jth attribute of the ith example. Checking the dimensions of each factor:

(l × l) * (l × (n+1)) * ((n+1) × 1) >= (l × 1)

Note: You should always check that the dimensions of your matrices and vectors match up so that the multiplications work as you want. You read the dimensions as (number of rows) × (number of columns).
The dimensions of matrices must match in the sense that the number of columns of the first matrix must be the same as the number of rows of the second matrix (e.g. multiplying two matrices of dimensions (l × n) and (n × m) gives a new matrix with dimensions (l × m)).
Task: Use the same technique of checking that the dimensions of matrices match on the main optimisation term min (1/2)*[w' b]*[I 0; 0 0]*[w; b] mentioned earlier. What are the dimensions of the resulting matrix?
Concatenation of matrices
One very useful (but confusing) technique is to concatenate matrices. We use the square brackets [ ] to signify that we are constructing a matrix or vector. If we separate elements in the matrix by a space we signify continuing a row; if we separate elements using a semicolon ; we signify that a new row is about to start. For example, if we had the three arrays A = [1 2 3], B = [4 5; 7 8], C = [6; 9], we create the following matrices in MATLAB:

A = (1 2 3),  B = (4 5 ; 7 8),  C = (6 ; 9)

If we use concatenation to create a 3 × 3 matrix using the notation D = [A; B C] we get

D = (1 2 3 ; 4 5 6 ; 7 8 9)
This can be interpreted as: make A the first row, then place B and C side by side to form the remaining rows of the new matrix D. You must be very careful that your dimensions match (use the size function to check).
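The concatenation example above can be typed straight into the MATLAB Command Window to check that the dimensions really do work out:

```matlab
% Concatenating the three arrays from the text
A = [1 2 3];       % 1 x 3 row vector
B = [4 5; 7 8];    % 2 x 2 matrix
C = [6; 9];        % 2 x 1 column vector
D = [A; B C];      % A forms row 1; [B C] is 2 x 3 and forms rows 2 and 3
size(D)            % returns 3 3
disp(D)
```

Try deliberately breaking the dimensions (e.g. D = [A; B]) to see the error message MATLAB gives.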
David Lindsay 2003, www.david-lindsay.co.uk
To create a square matrix with specific diagonal elements, MATLAB provides the very useful diag function. This function takes a row (1 × d) or column (d × 1) vector as input and places its elements on the diagonal of a d × d matrix. For example, if we defined a vector A = [1 2 3 4] we could create a matrix B = diag(A) like so:

B = (1 0 0 0 ; 0 2 0 0 ; 0 0 3 0 ; 0 0 0 4)

Notice that all the off-diagonal elements are zero, which is very useful when formulating multiplications in matrix form!
One very useful matrix is the identity matrix, a square matrix with diagonal elements equal to 1 and off-diagonal elements equal to 0. We can create an identity matrix of any square dimension d using the function eye(d). Identity matrices have the useful property of having no effect when multiplied by another matrix. For example, typing eye(3) creates the 3 × 3 identity matrix (1 0 0 ; 0 1 0 ; 0 0 1).
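A quick session illustrating diag and eye together; the matrix M below is made up just to demonstrate the identity property:

```matlab
v = [1 2 3 4];
B = diag(v);            % 4 x 4 matrix with 1,2,3,4 on the diagonal
I = eye(3);             % 3 x 3 identity matrix
M = [5 6; 7 8];         % an arbitrary 2 x 2 matrix
isequal(eye(2)*M, M)    % multiplying by the identity changes nothing: returns 1
```

Both functions will appear again in the next section when we build the SVM matrices H and A.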
As mentioned earlier, to use x = quadprog(H,f,A,c) to solve our SVM optimisation problem we must construct the matrices and vectors H, f, A and c like so:

1) The vector x = (w ; b) returned by quadprog is the weight vector that we are seeking, which defines the optimal hyperplane.

2) The main term min (1/2)*[w' b]*[I 0; 0 0]*[w; b] can be formulated with H = [I_n 0 ; 0 0], where I_n is an n × n identity matrix, and f = (0 ; ... ; 0), an (n+1) column vector of zeros. Putting the extra 0 in the bottom-right corner of the identity matrix makes sure we minimise only the weight vector w and not the constant b. The matrix H can be constructed by calling eye(n+1) and then setting its bottom-right element to zero, and f can be constructed using zeros(n+1,1), which gives an (n+1) column vector of zeros.
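Step 2) can be sketched in a few lines of MATLAB (here n stands for the number of attributes; n = 2 is just a placeholder value):

```matlab
n = 2;                 % number of attributes (2 in the tennisweather data)
H = eye(n+1);          % (n+1) x (n+1) identity...
H(n+1, n+1) = 0;       % ...with the bottom-right element zeroed so b is not penalised
f = zeros(n+1, 1);     % the SVM objective has no linear term
```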
3) Finally, our constraints are constructed using the matrix A and the vector c like so:

A = -diag(Y)*[X ones(l,1)] and c = -ones(l,1)

Note: we put an extra minus sign in front of the matrix A and the vector c because quadprog solves optimisation problems of the form Ax <= c, whereas our SVM constraints

diag(y1, ..., yl) * [ x11 ... x1n 1 ; ... ; xl1 ... xln 1 ] * [w ; b] >= (1 ; ... ; 1)

are of the opposite inequality, Ax >= c. Multiplying both sides by -1 flips the inequality into the form quadprog expects.
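Step 3) can be sketched like so. The attribute values and labels below are made up for illustration; in the exercise they come from the tennisweather training file:

```matlab
% Hypothetical training data: l = 4 examples, n = 2 attributes
X = [70 90; 85 85; 80 75; 72 95];   % attribute rows (temperature, humidity)
Y = [1; -1; 1; -1];                 % labels, already encoded as +/-1
l = size(X, 1);
A = -diag(Y) * [X ones(l,1)];       % minus sign flips >= into quadprog's <=
c = -ones(l, 1);
```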
In this exercise you will add the following code to the program file firstsvm1.m, which you should have open in your user area using emacs (or any other editor that you prefer!). Add the code underneath the comment:
%%% Code to find the SVM hyperplane will go here! %%%%
Run the program by opening up MATLAB and typing firstsvm1 in the Command Window.
Task: Create the main SVM optimisation term matrices H and f using the variable numOfAttributes.

Task: Create the SVM side constraint matrices A and c using the labels vector Y, the augmented feature matrix Z and numOfExamples.

Task: Plug all these matrices into quadprog and return the values into a weight vector w.
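Putting the three tasks together, a sketch of the finished section might look like this (the variable names numOfAttributes, numOfExamples, Y and Z follow the exercise; I am assuming Z is the feature matrix already augmented with a column of ones):

```matlab
% Main optimisation term
H = eye(numOfAttributes + 1);
H(end, end) = 0;                    % do not penalise the bias b
f = zeros(numOfAttributes + 1, 1);

% Side constraints (flipped to quadprog's A*x <= c form)
A = -diag(Y) * Z;                   % Z assumed to be [X ones(numOfExamples,1)]
c = -ones(numOfExamples, 1);

% Solve: w(1:end-1) are the weights, w(end) is the constant b
w = quadprog(H, f, A, c);
```

Compare your own version against the solution code handed out at the end of the session.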
In this exercise you will add code to the program file firstsvm1.m that will plot the separating hyperplane and its respective margins on the data. Make sure you insert your code in between the comments provided to keep your code easy to read! Now that quadprog has found the optimal weight vector w, you will need to extract the relevant components to plot the straight line. (Refer to the previous workshop for detail on graph plotting.)
Task: Create variables and extract the weights w1, w2 and b from the vector w.

Task: Rearrange w1*x1 + w2*x2 + b = 0 to get x2 in terms of x1, w1 and w2. This is the equation of the separating hyper-plane (the centre of the wedge, or the sandwich filling). Note: you are trying to get the equation into the form y = mx + c.

Task: Rearrange w1*x1 + w2*x2 + b = 1 and w1*x1 + w2*x2 + b = -1 in the same way. These are the equations of the margin hyper-planes (the edges of the wedge).

Task: Using the previous questions, create a vector Y1 to plot the corresponding vertical values for the separating hyper-plane using the range X1 created earlier.
Task: Create a further two plots in a similar manner for the margins, using vectors YUP and YLOW to define the upper and lower margins. Plot the margins using a magenta dotted line.
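A sketch of the margin plots, rearranging w1*x1 + w2*x2 + b = +1 and = -1 (X1, w1, w2 and b as extracted in the previous exercise):

```matlab
YUP  = ( 1 - w1*X1 - b) / w2;   % upper margin hyperplane
YLOW = (-1 - w1*X1 - b) / w2;   % lower margin hyperplane
plot(X1, YUP,  'm:')            % magenta dotted line
plot(X1, YLOW, 'm:')
```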
Once we have plotted the separating hyperplane, we must remember why we were doing this whole messy process in the first place: to classify new test examples. In this section we will add new test data points to our graph (your plot should look very close to the one on page 13).
Task: Read the data for the new test examples into a matrix TESTDATA from tennisweather2AttSVMTest.txt, which is a comma-delimited file, using the dlmread function.
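A sketch of the read-and-plot step. I am assuming the first two columns of the test file are the two attributes (temperature and humidity); check the file layout before copying this:

```matlab
TESTDATA = dlmread('tennisweather2AttSVMTest.txt', ',');
plot(TESTDATA(:,1), TESTDATA(:,2), 'g*')   % mark the new test points
```

A test point can then be classified by the sign of w1*x1 + w2*x2 + b.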
Task: Edit the legend to label the separating hyperplane, margins and new test data points.
Task: Add the new test examples x = (68, 90), x = (90, 75) and x = (90, 90) to the test file tennisweather2AttSVMTest.txt and re-plot the data.
[Figure: the finished graph plot, showing the tennisweather training data with the SVM separating hyperplane, upper and lower margins, and the new test points.]
Additional Tasks
I cannot overstate how important it is to just mess around with functions in MATLAB. Try writing programs to tackle fun little problems; practice is the best way to learn any programming language. Here is a list of possible tasks that you can do to improve your background knowledge of this subject. Use MATLAB's help to find out about other functions, concentrating on the arguments and return values of each function.
How would you define a hyperplane in 3 dimensions? Rearrange the formula
References
Websites
http://www.math.ohiou.edu/~vnguyen/papers/Introduction%20to%20Support%20Vector%20Machines.pdf
This pdf document gives a tutorial on SVMs; there are many others out there!

http://www.kernel-machines.org/
This website has lots of data, software and papers on kernel-based learning algorithms such as SVMs.

http://www.ics.uci.edu/~mlearn/MLRepository.html
This website contains lots of freely downloadable data sets (including the tennisweather data set you have worked with!) and has a mixture of pattern recognition and regression problems.

http://www.mlnet.org/
This website has many resources for people in machine learning, including datasets, jobs, links to research groups etc.
Books
% Add some nice titles to the graph
title('Days to play tennis?')
xlabel('Temperature')
ylabel('Humidity')
legend('Yes days','No days','SVM Hyperplane','Upper Margin','Lower Margin','New Test Points')
hold off