8 SVM

The document provides an overview of Support Vector Machines (SVM) and their functionality in classifying data using hyperplanes. It explains concepts such as vectors, vector normalization, inner products, and the limitations of perceptrons, leading to the introduction of SVM as a method that maximizes the margin between classes. The objective of SVM is to select a hyperplane that not only separates the data but also maximizes the distance from the nearest data points, known as support vectors.


Support Vector Machines

CSIT375/975 AI for Cybersecurity


Dr Chen Chen, Dr Thanh Le
SCIT University of Wollongong

Disclaimer: The presentation materials come from various sources. For further information, check the references section.
Vectors

• A vector can be visually represented as an arrow from its tail to its head

  [Figure: an arrow with horizontal component 4 and vertical component 3]

  𝑣⃗ = [4, 3]ᵀ, or: 𝑣⃗ = (4, 3)

• A vector can also be represented as an ordered list or a tuple by starting
  from its tail and asking how far away its head is in the:
  • Horizontal direction
  • Vertical direction
Unit Vectors

• Any vector can also be represented as a sum of scaled versions of specific unit vectors:
  • 𝚤̂ = [1, 0]ᵀ goes in the horizontal direction only and has length 1
  • 𝚥̂ = [0, 1]ᵀ goes in the vertical direction only and has length 1

  𝑣⃗ = 4×𝚤̂ + 3×𝚥̂ = 4[1, 0]ᵀ + 3[0, 1]ᵀ = [4, 0]ᵀ + [0, 3]ᵀ = [4, 3]ᵀ
Vector Magnitude

• How can we calculate the length (or magnitude) of a vector?

  Pythagorean theorem (right triangle with legs a = 4, b = 3 and hypotenuse c = ∥𝑣⃗∥):
  a² + b² = c²  ⟹  c = √(a² + b²)

  Magnitude of the vector: ∥𝑣⃗∥ = √(4² + 3²) = √25 = 5

• How can we keep vector 𝑣⃗ pointing in the same direction, but change its magnitude
  to 1 (i.e., turn it into a unit vector)?
Vector Normalization

• How can we construct a unit vector out of a given vector of any length?

  Input: a vector of any length  →  Normalization  →  Output: the vector with the same direction, but with length 1
Vector Normalization

• How can we construct a unit vector out of a given vector of any length?

  Input: 𝒗  →  Normalization: v̂ = 𝒗 / ∥𝒗∥  →  Output: v̂
Vector Normalization

• How can we construct a unit vector out of a given vector of any length?

  𝑣⃗ = [4, 3]ᵀ,  v̂ = 𝑣⃗ / ∥𝑣⃗∥ = [4/5, 3/5]ᵀ

  Let us verify that its length is 1:
  ∥𝑣⃗∥ = √(4² + 3²) = √25 = 5
Vector Normalization

• How can we construct a unit vector out of a given vector of any length?

  𝑣⃗ = [4, 3]ᵀ,  v̂ = 𝑣⃗ / ∥𝑣⃗∥ = [4/5, 3/5]ᵀ

  ∥𝑣⃗∥ = √(4² + 3²) = √25 = 5
  ∥v̂∥ = √((4/5)² + (3/5)²) = √((16 + 9)/25) = 1
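The normalization steps above can be checked numerically. Below is a minimal sketch (assuming NumPy is available; it is not part of the original slides) that computes the magnitude of 𝑣⃗ = (4, 3) and its unit vector.

```python
import numpy as np

v = np.array([4.0, 3.0])

magnitude = np.linalg.norm(v)    # sqrt(4^2 + 3^2) = 5.0
v_hat = v / magnitude            # same direction, length 1

print(magnitude)                 # 5.0
print(v_hat)                     # [0.8 0.6], i.e. [4/5, 3/5]
print(np.linalg.norm(v_hat))     # 1.0
```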
Vector Inner Product

• Assume the following two vectors:

  𝑢 = [u1, u2]ᵀ,  𝑣⃗ = [v1, v2]ᵀ

  [Figure: 𝑣⃗ projected onto 𝑢; p is the length of the projection of 𝑣⃗ onto 𝑢]

• What is the inner product of 𝑢 and 𝑣⃗?

  𝑢ᵀ𝑣⃗ = ?
Vector Inner Product

• Assume the following two vectors:

  𝑢 = [u1, u2]ᵀ,  𝑣⃗ = [v1, v2]ᵀ

  [Figure: 𝑣⃗ projected onto 𝑢; p is the length of the projection of 𝑣⃗ onto 𝑢]

• What is the inner product of 𝑢 and 𝑣⃗?

  𝑢ᵀ𝑣⃗ = p × ∥𝑢∥
  𝑢ᵀ𝑣⃗ = u1×v1 + u2×v2     (p can be signed)
Vector Inner Product

• Assume the following two vectors:

  𝑢 = [u1, u2]ᵀ,  𝑣⃗ = [v1, v2]ᵀ

  [Figure: 𝑣⃗ projected onto 𝑢, with the projection falling on the opposite side of 𝑢; p is negative here]

• What is the inner product of 𝑢 and 𝑣⃗?

  𝑢ᵀ𝑣⃗ = p × ∥𝑢∥
  𝑢ᵀ𝑣⃗ = u1×v1 + u2×v2
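As a small numeric illustration (not from the slides; it assumes NumPy), the snippet below checks the two equivalent views of the inner product shown above: 𝑢ᵀ𝑣⃗ = u1×v1 + u2×v2, and 𝑢ᵀ𝑣⃗ = p × ∥𝑢∥ where p is the signed length of the projection of 𝑣⃗ onto 𝑢. The vectors are arbitrary example values.

```python
import numpy as np

u = np.array([3.0, 0.0])
v = np.array([2.0, 2.0])

inner = u @ v                     # u1*v1 + u2*v2 = 6.0
p = inner / np.linalg.norm(u)     # signed projection of v onto u = 2.0

print(inner)                      # 6.0
print(p * np.linalg.norm(u))      # 6.0, same value: u^T v = p * ||u||
```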
Recall: Perceptron
• Given a training set (x1, y1), (x2, y2), . . . , (xn, yn)
  • {x1, x2, …, xn}: the data set, and yi ∈ {1, -1} the class labels
• Perceptron
  • a binary classifier: maps its input to an output value f(x):

    f(x) = +1 if wTx + b ≥ 0
           −1 if wTx + b < 0

  • parameters: w is a vector of weights, b is the bias
  • a linear classifier: a classification algorithm that makes its predictions based on a linear
    function combining a set of weights with the feature vector
  • finds a hyperplane wTx + b = 0 that minimizes the number of misclassified points by
    varying w and b (a minimal training sketch follows below)
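As referenced above, here is a minimal perceptron training sketch (the slides do not give reference code; this assumes one common update rule). It predicts sign(wTx + b) and nudges w and b on misclassified points, stopping as soon as nothing is misclassified, which is exactly the "stops at the first separating hyperplane" behaviour criticized later.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=1000):
    """Fit a perceptron: y[i] in {+1, -1}, rows of X are examples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi        # nudge the hyperplane toward xi
                b += lr * yi
                errors += 1
        if errors == 0:                  # stops at the first separating hyperplane
            break
    return w, b

# Illustrative, linearly separable data (same points as the worked example later).
X = np.array([[1, 2], [3, 4], [2, 1], [4, 3]], dtype=float)
y = np.array([1, 1, -1, -1])
print(train_perceptron(X, y))
```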
Perceptron: Intuition

• The way a perceptron works is by learning a hyperplane that clearly
  separates examples into two classes

  A perceptron divides a space by a hyperplane, wTx + b = 0, into two half-spaces
Perceptron: Intuition

• The way a perceptron works is by learning a hyperplane that clearly
  separates examples into two classes

  This entails that the space has to be linearly separable (otherwise, the
  perceptron will not be able to correctly classify all examples)
Perceptron: Intuition

• The way a perceptron works is by learning a hyperplane that clearly
  separates examples into two classes

  NOT a linearly separable space; hence, a perceptron will not be effective!
Perceptron: Intuition

• The way a perceptron works is by learning a hyperplane that clearly
  separates examples into two classes

  A linearly separable space and a workable perceptron
Perceptron: Intuition

• The way a perceptron works is by learning a hyperplane that clearly
  separates examples into two classes

  A linearly separable space and another workable perceptron!
Perceptron: Intuition

• The way a perceptron works is by learning a hyperplane that clearly
  separates examples into two classes

  Yet another valid hyperplane that can be learnt by a perceptron!

  If there are many hyperplanes, the perceptron will converge to one of
  them and classify all examples correctly
Perceptron: Intuition

• The way a perceptron works is by learning a hyperplane that clearly
  separates examples into two classes

  Any of these would be fine..

  ..but which is best?
Limitations of Perceptron

• Perceptrons exhibit various limitations in their ability to classify data
  • There could be many hyperplanes, and they are not all equally good

  An acceptable hyperplane; the new example indicated by “?” will be
  classified as a square
Limitations of Perceptron

• Perceptrons exhibit various limitations in their ability to classify data
  • There could be many hyperplanes, and they are not all equally good

  Another acceptable hyperplane, but the new example will now be
  classified as a circle (although it seems closer to the squares!)
Limitations of Perceptron

• Perceptrons exhibit various limitations in their ability to classify data
  • Another problem is that perceptrons usually stop as soon as there are no
    misclassified examples

  This hyperplane just managed to accommodate the two squares
  it touches before stopping
Limitations of Perceptron

• Perceptrons exhibit various limitations in their ability to classify data
  • Another problem is that perceptrons usually stop as soon as there are no
    misclassified examples

  This hyperplane also just managed to accommodate the two circles
  it touches before stopping
Limitations of Perceptron

• Perceptrons exhibit various limitations in their ability to classify data
  • Another problem is that perceptrons usually stop as soon as there are no
    misclassified examples

  If either of these hyperplanes represents the final weight vector,
  the weights will be biased toward one of the classes
Limitations of Perceptron

• Perceptrons exhibit various limitations in their ability to classify data
  • Another problem is that perceptrons usually stop as soon as there are no
    misclassified examples

  For example, if this hyperplane is the one that the perceptron chooses,
  the example indicated by “?” will be classified as a circle
Limitations of Perceptron

• Perceptrons exhibit various limitations in their ability to classify data

  Any of these would be fine..

  ..but which is best?
Support Vector Machines

• Linear Support Vector Machines (Linear SVM)
  • Maximum margin linear classifier

  SVM selects one particular hyperplane (i.e. the decision boundary, the green
  line in the figure) that not only separates the examples into two classes, but
  does so in a way that maximizes the margin 𝛾: the distance between the
  hyperplane and the closest examples of the training set

  Support Vectors: the subset of the data which defines the position of the separator
The Objective of SVM

• The objective of an SVM is to select a hyperplane wTx + b = 0 that
  maximizes the distance, 𝛾, between the hyperplane and the examples in the
  training set

  Intuitively, we are more certain of the class of examples that are far
  from the separating hyperplane than we are of examples near to
  that hyperplane
The Objective of SVM

• The objective of an SVM is to select a hyperplane wTx + b = 0 that
  maximizes the distance, 𝛾, between the hyperplane and the examples in the
  training set

  Thus, it is desirable that all the training examples be as far from
  the hyperplane as possible (but on the correct side of that hyperplane, of course!)
Finding the Decision Boundary

• Given a training set (x1, y1), (x2, y2), . . . , (xn, yn)
  • {x1, x2, …, xn}: our data set
  • yi ∈ {1, -1}: the class label
  • Classifier: f(xi) = sign(wTxi + b)
    • w: weight vector
    • b: bias
• The SVM decision boundary aims to
  • classify all points correctly
  • maximize the margin (by varying the weight vector w and the bias b)
• The decision boundary should be as far away from the data of both
  classes as possible (a fitting sketch follows below)
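As referenced in the list above, the following is a hedged sketch of fitting a maximum-margin linear classifier with scikit-learn (assuming scikit-learn and NumPy are installed; this is not the lecture's own code). SVC with a linear kernel and a very large C approximates the hard-margin SVM described here; the fitted coef_ and intercept_ correspond to w and b, and support_vectors_ are the points that define the boundary.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (the same points used in the worked example later).
X = np.array([[1, 2], [3, 4], [2, 1], [4, 3]], dtype=float)
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # very large C approximates a hard margin
clf.fit(X, y)

print(clf.coef_, clf.intercept_)    # learned w and b (expected roughly [-1, 1] and 0)
print(clf.support_vectors_)         # the support vectors on the margin
```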
Computing the margin width
  [Figure: linearly separable data with the separating hyperplane wTx + b = 0, the
  margin boundaries wTx + b = +1 and wTx + b = -1, the normal vector w, and the
  support vectors lying on the margin boundaries; Margin = 2 / ∥w∥]
Computing the margin width

• Assume that all data is at least distance 1 from the hyperplane; then
  the following two constraints follow:

  wTxi + b ≥ 1 if yi = +1
  wTxi + b ≤ -1 if yi = −1

• For support vectors, the inequality becomes an equality
• The margin is: 2𝛾 = 2 / ∥w∥
• Maximizing the margin is equivalent to minimizing ∥w∥
Computing the margin width

• How do we compute 𝛾 in terms of w and b?
• Claim: The vector w is perpendicular (orthogonal) to the hyperplane wTx + b = 0
  • If P and Q are in the plane with equation wTx + b = 0, then wTP = -b and
    wTQ = -b, so wT(Q - P) = 0.
  • This means that the vector w is orthogonal to any vector PQ between points P
    and Q of the plane.
Computing the margin width

• Consider one of the support vectors (say, x2, which lies on wTx + b = -1) and let
  x1 be the projection of x2 onto the upper hyperplane wTx + b = +1
  • x1 = x2 + λw for some value of λ
  • The line from x2 to x1 is perpendicular to the planes
  • So to get from x2 to x1, travel some distance in direction w
Computing the margin width

  [Figure: x1 on wTx + b = +1 and x2 on wTx + b = -1, separated by 2𝛾 across wTx + b = 0]

  What we know:
  wTx1 + b = +1
  wTx2 + b = -1
  x1 = x2 + λw
  ∥x1 – x2∥ = 2𝛾

  It’s now easy to get 𝛾 in terms of w and b
Computing the margin width

  What we know:
  wTx1 + b = +1
  wTx2 + b = -1
  x1 = x2 + λw
  ∥x1 – x2∥ = 2𝛾

  Since x1 is on the hyperplane defined by wTx + b = +1, we know that
  wTx1 + b = 1. If we substitute for x1:

  wTx1 + b = 1
  => wT(x2 + λw) + b = 1
  => -1 + λ wTw = 1
  => λ = 2 / wTw
Computing the margin width

  What we know:
  wTx1 + b = +1
  wTx2 + b = -1
  x1 = x2 + λw
  ∥x1 – x2∥ = 2𝛾

  2𝛾 = ∥x1 – x2∥
     = ∥λw∥
     = λ∥w∥

  Because λ = 2 / wTw:

  2𝛾 = 2∥w∥ / wTw = 2 / ∥w∥
Computing the margin width

  𝛾 = 1 / ∥w∥

  Thus, maximizing 𝛾 is the same as minimizing ∥w∥
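A quick numeric check of the result above (a sketch assuming NumPy; the value of w is just illustrative): once the hyperplane is scaled so the closest points satisfy |wTx + b| = 1, the half-margin is 𝛾 = 1/∥w∥ and the full margin is 2/∥w∥.

```python
import numpy as np

w = np.array([-1.0, 1.0])            # illustrative weight vector

gamma = 1.0 / np.linalg.norm(w)      # half-margin: distance to the closest points
print(gamma)                         # 0.7071... = 1/sqrt(2)
print(2.0 * gamma)                   # full margin width, 2/||w||
```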
The Objective of SVM

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  wTxi + b ≥ 1 if yi = +1
  wTxi + b ≤ -1 if yi = −1

  But why would this constraint lead to large-margin classification?
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  wTxi + b ≥ 1 if yi = +1
  wTxi + b ≤ -1 if yi = −1

  For illustrative purposes, let us assume only two features (i.e.,
  xi = [xi1, xi2] and w = [w1, w2]) and b = 0
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  wTxi + b ≥ 1 if yi = +1
  wTxi + b ≤ -1 if yi = −1

  What is the inner product of w and xi (i.e., wTxi)?

  [Figure: the vectors w = (w1, w2) and xi = (xi1, xi2) drawn in the plane]
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  wTxi + b ≥ 1 if yi = +1
  wTxi + b ≤ -1 if yi = −1

  What is the inner product of w and xi (i.e., wTxi)?

  Project xi onto w and let pi be the (signed) length of that projection:
  wTxi = pi·∥w∥ = w1·xi1 + w2·xi2
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  where pi is the projection of xi onto w, so that wTxi = pi·∥w∥ = w1·xi1 + w2·xi2
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this green decision boundary (wTx = 0; for simplicity,
  assuming b = 0), will it choose it?

  [Figure: the boundary wTx = 0 with w orthogonal to the decision boundary;
  x1 is a positive example and p1 is the projection of x1 onto w]
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this green decision boundary (wTx = 0), will it choose it?

  p1 is positive and very small; hence, for p1·∥w∥ to be ≥ 1, ∥w∥ has to
  be very large!
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this green decision boundary (wTx = 0), will it choose it?

  But the optimization objective is to minimize ∥w∥; hence, SVM will
  not prefer this decision boundary
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this green decision boundary (wTx = 0), will it choose it?

  p2 is negative and very small (in magnitude); hence, for p2·∥w∥ to be ≤ -1,
  ∥w∥ has to be very large!
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this green decision boundary (wTx = 0), will it choose it?

  But the optimization objective is to minimize ∥w∥; thus, again, SVM
  will not prefer this decision boundary
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this green decision boundary (wTx = 0), will it choose it?

  NO
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this purple decision boundary (wTx = 0), will it choose it?

  p1 is positive and bigger now; hence, for p1·∥w∥ to be ≥ 1, ∥w∥ can be
  smaller, aligning better with the optimization objective
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this purple decision boundary (wTx = 0), will it choose it?

  p2 is negative and bigger (in magnitude) now; hence, for p2·∥w∥ to be ≤ -1,
  ∥w∥ can be smaller, aligning better with the optimization objective
The Objective of SVM - Illustration

• More formally, the goal of SVM can be stated as follows:

  Given a training set (x1, y1), (x2, y2), . . . , (xn, yn), minimize ∥w∥ (by
  varying w and b) subject to the constraint that for all i = 1, 2, . . . , n,

  pi·∥w∥ ≥ 1 if yi = +1
  pi·∥w∥ ≤ -1 if yi = −1

  If SVM encounters this purple decision boundary, will it choose it?

  YES

  [Figure: the purple boundary wTx = 0 with margin boundaries wTx = +1 and wTx = -1]
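A hedged numeric illustration of the argument above (assuming NumPy; the point and the candidate boundary normals are made up for illustration): reading the constraint wTx1 ≥ 1 as p1·∥w∥ ≥ 1, a boundary whose normal gives x1 only a small projection p1 forces ∥w∥ to be at least 1/p1, i.e. very large, while a boundary with a larger p1 can satisfy the constraint with a much smaller ∥w∥.

```python
import numpy as np

x1 = np.array([1.0, 3.0])                       # a positive training example

candidates = {
    "boundary almost through x1": np.array([3.0, -0.9]),   # normal nearly orthogonal to x1
    "boundary far from x1":       np.array([1.0, 3.0]),    # normal roughly aligned with x1
}

for name, direction in candidates.items():
    u = direction / np.linalg.norm(direction)   # unit normal of the candidate boundary
    p1 = u @ x1                                 # signed projection of x1 onto that normal
    print(name, "p1 =", round(p1, 3),
          "smallest ||w|| with p1*||w|| >= 1:", round(1.0 / p1, 3))
```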
Example

• Consider the following training examples, assuming w = [w1, w2]
• How do we solve for w and b?
• Our goal is to maximize the margin subject to the constraints yi(wTxi + b) ≥ 1,
  which can be derived from the training examples:

  x        y    Constraint
  [1, 2]   +1   w1 + 2w2 + b ≥ 1
  [2, 1]   -1   2w1 + w2 + b ≤ −1
  [3, 4]   +1   3w1 + 4w2 + b ≥ 1
  [4, 3]   -1   4w1 + 3w2 + b ≤ −1
Example

• Consider the following training examples, assuming w = [w1, w2]
• It is easy to find that b = 0 and w = [-1, +1] using the geometric interpretation;
  this boundary maximizes the margin (a numeric verification follows below)

  [Figure: the points [1, 2] and [3, 4] (class +1) and [2, 1] and [4, 3] (class -1), with
  the hyperplanes wTx = +1, wTx = 0, and wTx = -1]
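As referenced above, here is a quick numeric verification (a sketch assuming NumPy; not part of the slides) that w = [-1, +1] and b = 0 satisfy all four constraints yi(wTxi + b) ≥ 1 with equality, so every training point is a support vector, and that the resulting margin is 2/∥w∥ = √2.

```python
import numpy as np

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3]], dtype=float)
y = np.array([1, -1, 1, -1])
w = np.array([-1.0, 1.0])
b = 0.0

print(y * (X @ w + b))            # [1. 1. 1. 1.] -> every constraint is tight
print(2.0 / np.linalg.norm(w))    # margin width = 2/sqrt(2) = 1.414...
```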
References
• Support Vector Machines, Andrew W. Moore, CMU School of Computer Science
• Introduction to Information Retrieval: support vector machines

