This document provides an introduction to support vector machines (SVM) for machine learning and data mining. It discusses how SVM finds the optimal hyperplane for linearly separable data that maximizes the margin between the two classes. For non-linear classification, kernel functions are used to transform the data into a higher dimensional space where it may become linearly separable. Learning SVM is formulated as a convex quadratic optimization problem to minimize the weights under margin constraints.


Introduction to

Machine Learning and Data Mining


(Học máy và Khai phá dữ liệu)

Khoat Than
School of Information and Communication Technology
Hanoi University of Science and Technology

2021
2

Contents
• Introduction to Machine Learning & Data Mining
• Unsupervised learning
• Supervised learning
  ◦ Support Vector Machines
• Practical advice
3
Support Vector Machines (1)
• Support Vector Machines (SVM) (máy vectơ hỗ trợ) were proposed by Vapnik and his colleagues in the 1970s, and became famous and popular in the 1990s.
• Originally, SVM is a method for linear classification. It finds a hyperplane (also called a linear classifier) that separates the two classes of data.
• For non-linear classification, where no hyperplane separates the data well, kernel functions (hàm nhân) are used.
  ◦ Kernel functions transform the data into another space, in which the data become linearly separable.
• When no kernel function is used, we speak of linear SVM (in fact, linear SVM uses a linear kernel).
4
Support Vector Machines (2)
• SVM has a strong theory that supports its performance.
• It can work well on very high-dimensional problems.
• It is now one of the most popular and robust methods.
• For text categorization, linear SVM performs very well.
5
1. SVM: the linearly separable case
• Problem representation:
  ◦ Training data $D = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_r, y_r)\}$ with r instances.
  ◦ Each $\mathbf{x}_i$ is a vector in an n-dimensional space, e.g., $\mathbf{x}_i = (x_{i1}, x_{i2}, \dots, x_{in})^{T}$. Each dimension represents an attribute.
  ◦ Bold characters denote vectors.
  ◦ $y_i$ is a class label in {-1, 1}: '1' is the positive class, '-1' is the negative class.
• Linear separability assumption: there exists a hyperplane (a linear function) that separates the two classes well.
  (giả thuyết tồn tại một siêu phẳng mà phân tách 2 lớp được)
6
Linear SVM
• SVM finds a hyperplane of the form:
  $f(\mathbf{x}) = \langle \mathbf{w} \cdot \mathbf{x} \rangle + b$   [Eq.1]
  ◦ w is the weight vector; b is a real number (the bias).
  ◦ $\langle \mathbf{w} \cdot \mathbf{x} \rangle$ and $\langle \mathbf{w}, \mathbf{x} \rangle$ both denote the inner product of the two vectors (tích vô hướng của hai véctơ).
• The label of each $\mathbf{x}_i$ is determined as:
  $y_i = \begin{cases} 1 & \text{if } \langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \ge 0 \\ -1 & \text{if } \langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b < 0 \end{cases}$   [Eq.2]
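A minimal NumPy sketch of this decision rule; the function name `predict` and the weight/bias values are made up for illustration, not taken from the slides:

```python
import numpy as np

# Hypothetical weight vector and bias of an already-learned linear SVM (illustrative values only)
w = np.array([0.8, -0.5])
b = 0.2

def predict(x, w, b):
    """Classify x by the sign of <w, x> + b, following [Eq.1]-[Eq.2]."""
    score = np.dot(w, x) + b
    return 1 if score >= 0 else -1

print(predict(np.array([1.0, 3.0]), w, b))   # -> -1 for these made-up values
```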
7
Separating hyperplane
• The hyperplane (H0) that separates the positive class from the negative class has the form:
  $\langle \mathbf{w} \cdot \mathbf{x} \rangle + b = 0$
• It is also known as the decision boundary/surface.
• But there might be infinitely many separating hyperplanes. Which one should we choose?

[Liu, 2006]
8
Hyperplane with max margin
• SVM selects the hyperplane with the maximum margin. (SVM tìm siêu phẳng tách mà có lề lớn nhất)
• Statistical learning theory shows that the max-margin hyperplane has the smallest bound on the generalization error among all separating hyperplanes.

[Liu, 2006]
9
Marginal hyperplanes
• Assume that the two classes in our data can be separated cleanly by a hyperplane.
• Denote by (x⁺, 1) the positive-class point and by (x⁻, -1) the negative-class point that are closest to the separating hyperplane H0 ($\langle \mathbf{w} \cdot \mathbf{x} \rangle + b = 0$).
• We define two parallel marginal hyperplanes as follows:
  ◦ H+ passes through x⁺ and is parallel to H0: $\langle \mathbf{w} \cdot \mathbf{x}^{+} \rangle + b = 1$
  ◦ H- passes through x⁻ and is parallel to H0: $\langle \mathbf{w} \cdot \mathbf{x}^{-} \rangle + b = -1$
  ◦ No data point lies between these two marginal hyperplanes, so every training point satisfies:
    $\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \ge 1$ if $y_i = 1$, and $\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \le -1$ if $y_i = -1$   [Eq.3]
10
The margin (1)
• The margin (mức lề) is defined as the distance between the two marginal hyperplanes.
  ◦ Denote by d⁺ the distance from H0 to H+.
  ◦ Denote by d⁻ the distance from H0 to H-.
  ◦ The margin is (d⁺ + d⁻).
• Recall that the distance from a point $\mathbf{x}_i$ to the hyperplane H0 ($\langle \mathbf{w} \cdot \mathbf{x} \rangle + b = 0$) is:
  $\dfrac{|\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b|}{\|\mathbf{w}\|}$   [Eq.4]
  ◦ Where:
  $\|\mathbf{w}\| = \sqrt{\langle \mathbf{w} \cdot \mathbf{w} \rangle} = \sqrt{w_1^2 + w_2^2 + \dots + w_n^2}$   [Eq.5]
11
The margin (2)
• The distance d⁺ from x⁺ to H0 is
  $d^{+} = \dfrac{|\langle \mathbf{w} \cdot \mathbf{x}^{+} \rangle + b|}{\|\mathbf{w}\|} = \dfrac{|1|}{\|\mathbf{w}\|} = \dfrac{1}{\|\mathbf{w}\|}$   [Eq.6]
• Similarly, the distance d⁻ from x⁻ to H0 is
  $d^{-} = \dfrac{|\langle \mathbf{w} \cdot \mathbf{x}^{-} \rangle + b|}{\|\mathbf{w}\|} = \dfrac{|-1|}{\|\mathbf{w}\|} = \dfrac{1}{\|\mathbf{w}\|}$   [Eq.7]
• As a result, the margin is:
  $\text{margin} = d^{+} + d^{-} = \dfrac{2}{\|\mathbf{w}\|}$   [Eq.8]
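A small NumPy sketch of the distance formula [Eq.4]-[Eq.5] and the margin [Eq.8]; the vectors and numbers are made-up illustrative values:

```python
import numpy as np

w = np.array([2.0, 1.0])   # hypothetical weight vector
b = -3.0                   # hypothetical bias

def distance_to_hyperplane(x, w, b):
    """Distance from point x to the hyperplane <w, x> + b = 0, as in [Eq.4]-[Eq.5]."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

margin = 2.0 / np.linalg.norm(w)                                 # [Eq.8]
print(distance_to_hyperplane(np.array([2.0, 2.0]), w, b), margin)
```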
12
SVM: learning with max margin (1)
• SVM learns the classifier H0 with maximum margin, i.e., the hyperplane that has the greatest margin among all possible separating hyperplanes.
• This learning principle can be formulated as the following quadratic optimization problem:
  ◦ Find w and b that maximize
    $\text{margin} = \dfrac{2}{\|\mathbf{w}\|}$
  ◦ subject to the following conditions for every training point $\mathbf{x}_i$:
    $\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \ge 1$ if $y_i = 1$, and $\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \le -1$ if $y_i = -1$
13
SVM: learning with max margin (2)
• Learning SVM is equivalent to the following minimization problem:
  ◦ Minimize $\dfrac{\langle \mathbf{w} \cdot \mathbf{w} \rangle}{2}$   [Eq.9]
  ◦ Subject to $\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \ge 1$ if $y_i = 1$, and $\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \le -1$ if $y_i = -1$
• Note that it can be reformulated as:
  ◦ Minimize $\dfrac{\langle \mathbf{w} \cdot \mathbf{w} \rangle}{2}$   [Eq.10] (P)
  ◦ Subject to $y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) \ge 1, \; \forall i = 1..r$
• This is a constrained optimization problem.
14
Constrained optimization (1)
• Consider the problem:
  Minimize f(x) subject to g(x) = 0
• Necessary condition: a solution x0 satisfies
  $\dfrac{\partial}{\partial \mathbf{x}}\big(f(\mathbf{x}) + \alpha g(\mathbf{x})\big)\Big|_{\mathbf{x}=\mathbf{x}_0} = 0$ ;  $g(\mathbf{x}) = 0$
  ◦ Where α is a Lagrange multiplier.
• In the case of many constraints ($g_i(\mathbf{x}) = 0$ for i = 1…r), a solution x0 satisfies:
  $\dfrac{\partial}{\partial \mathbf{x}}\Big(f(\mathbf{x}) + \sum_{i=1}^{r} \alpha_i g_i(\mathbf{x})\Big)\Big|_{\mathbf{x}=\mathbf{x}_0} = 0$ ;  $g_i(\mathbf{x}) = 0$
15
Constrained optimization (2)
• Consider the problem with inequality constraints:
  Minimize f(x) subject to $g_i(\mathbf{x}) \le 0$
• Necessary condition: a solution x0 satisfies
  $\dfrac{\partial}{\partial \mathbf{x}}\Big(f(\mathbf{x}) + \sum_{i=1}^{r} \alpha_i g_i(\mathbf{x})\Big)\Big|_{\mathbf{x}=\mathbf{x}_0} = 0$ ;  $g_i(\mathbf{x}) \le 0$
  ◦ Where each $\alpha_i \ge 0$ is a Lagrange multiplier.
• $L = f(\mathbf{x}) + \sum_{i=1}^{r} \alpha_i g_i(\mathbf{x})$ is known as the Lagrange function.
  ◦ x is called the primal variable (biến gốc).
  ◦ α is called the dual variable (biến đối ngẫu).
16
SVM: learning with max margin (3)
• The Lagrange function for problem [Eq.10] is
  $L(\mathbf{w}, b, \alpha) = \dfrac{1}{2}\langle \mathbf{w} \cdot \mathbf{w} \rangle - \sum_{i=1}^{r} \alpha_i \big[y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1\big]$   [Eq.11a]
  ◦ Where each $\alpha_i \ge 0$ is a Lagrange multiplier.
• Solving [Eq.10] is equivalent to the following minimax problem:
  $\arg\min_{\mathbf{w}, b} \max_{\alpha \ge 0} L(\mathbf{w}, b, \alpha) = \arg\min_{\mathbf{w}, b} \max_{\alpha \ge 0} \Big\{ \dfrac{1}{2}\langle \mathbf{w} \cdot \mathbf{w} \rangle - \sum_{i=1}^{r} \alpha_i \big[y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1\big] \Big\}$   [Eq.11b]
17
SVM: learning with max margin (4)
• The primal problem [Eq.10] can be recovered by solving:
  $\max_{\alpha \ge 0} L(\mathbf{w}, b, \alpha) = \max_{\alpha \ge 0} \Big\{ \dfrac{1}{2}\langle \mathbf{w} \cdot \mathbf{w} \rangle - \sum_{i=1}^{r} \alpha_i \big[y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1\big] \Big\}$
• Its dual problem (đối ngẫu) can be derived by solving:
  $\min_{\mathbf{w}, b} L(\mathbf{w}, b, \alpha) = \min_{\mathbf{w}, b} \Big\{ \dfrac{1}{2}\langle \mathbf{w} \cdot \mathbf{w} \rangle - \sum_{i=1}^{r} \alpha_i \big[y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1\big] \Big\}$
• It is known that the optimal solution to [Eq.10] satisfies a set of conditions called the Karush-Kuhn-Tucker (KKT) conditions.
18
SVM: Karush-Kuhn-Tucker
$\dfrac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{r} \alpha_i y_i \mathbf{x}_i = 0$   [Eq.12]

$\dfrac{\partial L}{\partial b} = -\sum_{i=1}^{r} \alpha_i y_i = 0$   [Eq.13]

$y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1 \ge 0, \; \forall i = 1..r$   [Eq.14]

$\alpha_i \ge 0$   [Eq.15]

$\alpha_i \big( y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1 \big) = 0$   [Eq.16]

• The last equation [Eq.16] (complementary slackness) follows from duality theory.
  ◦ Note: any $\alpha_i > 0$ implies that the associated point $\mathbf{x}_i$ lies on a marginal hyperplane (H+ or H-).
  ◦ Such a boundary point is called a support vector.
  ◦ A non-support vector corresponds to $\alpha_i = 0$.
19
SVM: learning with max margin (5)
• In general, the KKT conditions do not guarantee the optimality of a solution.
• Fortunately, due to the convexity of the primal problem [Eq.10], the KKT conditions are both necessary and sufficient for global optimality. It means that a vector satisfying all KKT conditions provides the globally optimal classifier.
  ◦ Convex optimization is 'easy' in the sense that we can always find a good solution with a provable guarantee.
  ◦ There are many algorithms in the literature, but most are iterative.
• In practice, it is hard to derive an efficient algorithm directly for problem [Eq.10]. Therefore, its dual problem is preferable.
20
SVM: the dual form (1)
• Remember that the dual counterpart of [Eq.10] is
  $\min_{\mathbf{w}, b} L(\mathbf{w}, b, \alpha) = \min_{\mathbf{w}, b} \Big\{ \dfrac{1}{2}\langle \mathbf{w} \cdot \mathbf{w} \rangle - \sum_{i=1}^{r} \alpha_i \big[y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1\big] \Big\}$
• By taking the gradient of $L(\mathbf{w}, b, \alpha)$ with respect to (w, b) and setting it to zero, we obtain the following dual function:
  $L_D(\alpha) = \sum_{i=1}^{r} \alpha_i - \dfrac{1}{2} \sum_{i,j=1}^{r} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i \cdot \mathbf{x}_j \rangle$   [Eq.17]
21
SVM: the dual form (2)
• Solving problem [Eq.10] is equivalent to solving its dual problem below:
  ◦ Maximize $L_D(\alpha) = \sum_{i=1}^{r} \alpha_i - \dfrac{1}{2} \sum_{i,j=1}^{r} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i \cdot \mathbf{x}_j \rangle$   [Eq.18] (D)
  ◦ Such that $\sum_{i=1}^{r} \alpha_i y_i = 0$ and $\alpha_i \ge 0, \; \forall i = 1..r$
• The constraints in (D) are much simpler than those of the primal problem, so deriving an efficient solution method is easier.
  ◦ However, existing algorithms for this problem are iterative and complicated, so we will not discuss any of them in detail. A sketch of a generic solver is given below.
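To make (D) concrete, here is a minimal sketch that hands the dual to a general-purpose solver (SciPy's SLSQP). The toy data, function names, and tolerances are assumptions made for illustration; real SVM packages use specialized algorithms such as SMO instead.

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (made up for illustration)
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, 0.5], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
r = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T        # G[i, j] = y_i * y_j * <x_i, x_j>

def neg_dual(alpha):
    """Negative of L_D(alpha) in [Eq.18], since scipy minimizes."""
    return -(alpha.sum() - 0.5 * alpha @ G @ alpha)

res = minimize(neg_dual, x0=np.zeros(r), method="SLSQP",
               bounds=[(0.0, None)] * r,                              # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i alpha_i y_i = 0
alpha = res.x
print(np.round(alpha, 3))
```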
22
SVM: the optimal classifier
• Once the dual problem is solved for α, we can recover the optimal solution to problem [Eq.10] by using the KKT conditions.
• Let SV be the set of all support vectors.
  ◦ SV is a subset of the training data.
  ◦ $\alpha_i > 0$ indicates that $\mathbf{x}_i$ is a support vector.
• We can compute w* by using [Eq.12]:
  $\mathbf{w}^{*} = \sum_{i=1}^{r} \alpha_i y_i \mathbf{x}_i = \sum_{\mathbf{x}_i \in SV} \alpha_i y_i \mathbf{x}_i$   (since $\alpha_j = 0$ for any $\mathbf{x}_j \notin SV$)
• To find b*, we take any index k such that $\alpha_k > 0$:
  ◦ [Eq.16] then implies $y_k(\langle \mathbf{w}^{*} \cdot \mathbf{x}_k \rangle + b^{*}) - 1 = 0$.
  ◦ Hence, $b^{*} = y_k - \langle \mathbf{w}^{*} \cdot \mathbf{x}_k \rangle$   (since $y_k \in \{-1, 1\}$, we have $1/y_k = y_k$).
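Continuing the sketch above (same hypothetical `X`, `y`, and dual solution `alpha`), w* and b* can be recovered like this; the tolerance `tol` is an assumption used to decide which multipliers count as nonzero:

```python
import numpy as np

def recover_w_b(alpha, X, y, tol=1e-6):
    """Recover w* and b* from the dual solution, following [Eq.12] and [Eq.16]."""
    w = (alpha * y) @ X                    # w* = sum_i alpha_i * y_i * x_i
    sv = np.where(alpha > tol)[0]          # indices of the support vectors (alpha_i > 0)
    k = sv[0]                              # any support vector gives the same b*
    b = y[k] - np.dot(w, X[k])             # b* = y_k - <w*, x_k>
    return w, b, sv
```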
23
SVM: classifying new instances
• The decision boundary is
  $f(\mathbf{x}) = \langle \mathbf{w}^{*} \cdot \mathbf{x} \rangle + b^{*} = \sum_{\mathbf{x}_i \in SV} \alpha_i y_i \langle \mathbf{x}_i \cdot \mathbf{x} \rangle + b^{*} = 0$   [Eq.19]
• For a new instance z, we compute:
  $\mathrm{sign}(\langle \mathbf{w}^{*} \cdot \mathbf{z} \rangle + b^{*}) = \mathrm{sign}\Big( \sum_{\mathbf{x}_i \in SV} \alpha_i y_i \langle \mathbf{x}_i \cdot \mathbf{z} \rangle + b^{*} \Big)$   [Eq.20]
  ◦ If the result is 1, z is assigned to the positive class; otherwise z is assigned to the negative class.
• Note that this classification rule
  ◦ depends only on the support vectors, and
  ◦ needs only a few dot products.
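A short sketch of [Eq.20], reusing the hypothetical `alpha`, `X`, `y`, and `b` from the previous snippets; it only touches the support vectors:

```python
import numpy as np

def svm_predict(z, alpha, X, y, b, tol=1e-6):
    """Classify a new instance z using only the support vectors, as in [Eq.20]."""
    sv = alpha > tol
    score = np.sum(alpha[sv] * y[sv] * (X[sv] @ z)) + b
    return 1 if score >= 0 else -1
```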
24
2. Soft-margin SVM
• What if the two classes are not linearly separable? (Trường hợp 2 lớp không thể phân tách tuyến tính thì sao?)
  ◦ Linear separability is an idealized situation.
  ◦ In practice, data are often noisy or erroneous, which can make the two classes overlap (nhiễu/lỗi có thể làm 2 lớp giao nhau).
• In the linearly separable case we had:
  ◦ Minimize $\dfrac{\langle \mathbf{w} \cdot \mathbf{w} \rangle}{2}$
  ◦ Subject to $y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) \ge 1, \; \forall i = 1..r$
• With noise or overlapping classes, these constraints may never be satisfied simultaneously.
  ◦ It means we cannot solve for w* and b*.
25
Example of inseparability
• Noisy points $\mathbf{x}_a$ and $\mathbf{x}_b$ are mislabeled.
26
Relaxing the constraints
• To handle noise/errors, we relax the margin constraints by introducing slack variables $\xi_i \ge 0$:
  (Ta sẽ mở rộng ràng buộc về lề bằng cách thêm biến bù)
  $\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \ge 1 - \xi_i$ if $y_i = 1$
  $\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b \le -1 + \xi_i$ if $y_i = -1$
  ◦ For a noisy/erroneous (misclassified) point $\mathbf{x}_i$ we have $\xi_i > 1$.
  ◦ Otherwise $\xi_i = 0$.
• Therefore, in the non-separable case the conditions become:
  $y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) \ge 1 - \xi_i$ for all $i = 1…r$
  $\xi_i \ge 0$ for all $i = 1…r$
27
Penalty on noises/errors
• We should encode information about noise/errors into the objective function when learning.
  (ta nên đính thêm thông tin về nhiễu/lỗi vào hàm mục tiêu)
  ◦ Otherwise, the resulting classifier easily overfits the data.
• A penalty term is added, so that learning minimizes
  $\dfrac{\langle \mathbf{w} \cdot \mathbf{w} \rangle}{2} + C \sum_{i=1}^{r} \xi_i^{k}$
  ◦ Where C (> 0) is the penalty constant (hằng số phạt).
  ◦ The greater C, the heavier the penalty on noise/errors.
• k = 1 is often used in practice, because it keeps the optimization problem simple to solve.
28
The new optimization problem
• Minimize $\dfrac{\langle \mathbf{w} \cdot \mathbf{w} \rangle}{2} + C \sum_{i=1}^{r} \xi_i$   [Eq.21]
  ◦ Subject to $y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0, \; \forall i = 1..r$
• This problem is called soft-margin SVM.
• It is equivalent to minimizing the following function (a sketch of this view appears below):
  $\dfrac{1}{r} \sum_{i=1}^{r} \max\big(0, 1 - y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b)\big) + \lambda \|\mathbf{w}\|_2^2$
  ◦ $\max\big(0, 1 - y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b)\big)$ is called the hinge loss.
  ◦ Some popular losses: squared error, cross entropy, hinge.
  ◦ $\lambda > 0$ is a constant.
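A minimal sketch of the hinge-loss view: training a linear soft-margin SVM by subgradient descent on the regularized hinge loss above. The learning rate, λ, number of epochs, and function name are illustrative assumptions, not values from the slides.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize (1/r) * sum_i max(0, 1 - y_i(<w,x_i> + b)) + lam * ||w||^2 by subgradient descent."""
    r, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                                        # points with nonzero hinge loss
        grad_w = 2 * lam * w - (y[active, None] * X[active]).sum(axis=0) / r
        grad_b = -y[active].sum() / r
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```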
29
The new optimization problem
• Its Lagrange function is
  $L = \dfrac{1}{2}\langle \mathbf{w} \cdot \mathbf{w} \rangle + C \sum_{i=1}^{r} \xi_i - \sum_{i=1}^{r} \alpha_i \big[y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1 + \xi_i\big] - \sum_{i=1}^{r} \mu_i \xi_i$   [Eq.22]
  ◦ Where $\alpha_i \ge 0$ and $\mu_i \ge 0$ are Lagrange multipliers.
30
Karush-Kuhn-Tucker conditions (1)

$\dfrac{\partial L_P}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{r} \alpha_i y_i \mathbf{x}_i = 0$   [Eq.23]

$\dfrac{\partial L_P}{\partial b} = -\sum_{i=1}^{r} \alpha_i y_i = 0$   [Eq.24]

$\dfrac{\partial L_P}{\partial \xi_i} = C - \alpha_i - \mu_i = 0, \; \forall i = 1..r$   [Eq.25]
31
Karush-Kuhn-Tucker conditions (2)
$y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1 + \xi_i \ge 0, \; \forall i = 1..r$   [Eq.26]

$\xi_i \ge 0$   [Eq.27]

$\alpha_i \ge 0$   [Eq.28]

$\mu_i \ge 0$   [Eq.29]

$\alpha_i \big( y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) - 1 + \xi_i \big) = 0$   [Eq.30]

$\mu_i \xi_i = 0$   [Eq.31]
32
The dual problem
• Maximize $L_D(\alpha) = \sum_{i=1}^{r} \alpha_i - \dfrac{1}{2} \sum_{i,j=1}^{r} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i \cdot \mathbf{x}_j \rangle$
  ◦ Such that $\sum_{i=1}^{r} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C, \; \forall i = 1..r$   [Eq.32]
• Note that neither $\xi_i$ nor $\mu_i$ appears in the dual problem.
• This problem is almost identical to [Eq.18] for the linearly separable case.
• The only difference is the upper bound on the multipliers: $\alpha_i \le C$.
33
Soft-margin SVM: the optimal classifier
• Once the dual problem is solved for α, we can recover the optimal solution to problem [Eq.21].
• Let SV be the set of all support/noisy vectors.
  ◦ SV is a subset of the training data.
  ◦ $\alpha_i > 0$ indicates that $\mathbf{x}_i$ is a support/noisy vector.
• We can compute w* by using [Eq.23]:
  $\mathbf{w}^{*} = \sum_{i=1}^{r} \alpha_i y_i \mathbf{x}_i = \sum_{\mathbf{x}_i \in SV} \alpha_i y_i \mathbf{x}_i$   (since $\alpha_j = 0$ for any $\mathbf{x}_j \notin SV$)
• To find b*, we take an index k such that $0 < \alpha_k < C$:
  ◦ It means $\xi_k = 0$, due to [Eq.25] and [Eq.31];
  ◦ and $y_k(\langle \mathbf{w}^{*} \cdot \mathbf{x}_k \rangle + b^{*}) - 1 = 0$, due to [Eq.30].
  ◦ Hence, $b^{*} = y_k - \langle \mathbf{w}^{*} \cdot \mathbf{x}_k \rangle$
34
Some notes
• From [Eq.25]-[Eq.31] we conclude that
  If $\alpha_i = 0$ then $y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) \ge 1$ and $\xi_i = 0$
  If $0 < \alpha_i < C$ then $y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) = 1$ and $\xi_i = 0$
  If $\alpha_i = C$ then $y_i(\langle \mathbf{w} \cdot \mathbf{x}_i \rangle + b) < 1$ and $\xi_i > 0$
• The classifier can be expressed as a linear combination of a few training points.
  ◦ Most training points lie outside the margin area: $\alpha_i = 0$
  ◦ The support vectors lie on the marginal hyperplanes: $0 < \alpha_i < C$
  ◦ The noisy/erroneous points are associated with $\alpha_i = C$
• Hence the optimal classifier is a very sparse combination of the training data.
35
Soft-margin SVM: classifying new instances

• The decision boundary is
  $f(\mathbf{x}) = \langle \mathbf{w}^{*} \cdot \mathbf{x} \rangle + b^{*} = \sum_{\mathbf{x}_i \in SV} \alpha_i y_i \langle \mathbf{x}_i \cdot \mathbf{x} \rangle + b^{*} = 0$   [Eq.19]
• For a new instance z, we compute:
  $\mathrm{sign}(\langle \mathbf{w}^{*} \cdot \mathbf{z} \rangle + b^{*}) = \mathrm{sign}\Big( \sum_{\mathbf{x}_i \in SV} \alpha_i y_i \langle \mathbf{x}_i \cdot \mathbf{z} \rangle + b^{*} \Big)$   [Eq.20]
  ◦ If the result is 1, z is assigned to the positive class; otherwise z is assigned to the negative class.
• Note: it is important to choose a good value of C, since it significantly affects the performance of SVM.
  ◦ We often use a validation set to choose C, as sketched below.
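One common way to pick C is a grid search with cross-validation, sketched here with scikit-learn; the candidate values of C and the synthetic dataset are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
search = GridSearchCV(SVC(kernel="linear"),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_)   # the value of C with the best cross-validated accuracy
```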
36
Linear SVM: summary
• Classification is based on a separating hyperplane.
• Such a hyperplane is represented as a combination of some support vectors.
• Finding the support vectors reduces to solving a quadratic programming problem.
• In the dual problem and in the separating hyperplane, only dot products of the training data are needed, never the raw vectors themselves.
  ◦ This is the door to learning a nonlinear classifier.
37
3. Non-linear SVM
• Consider the case in which our data are not linearly separable.
  ◦ This often happens in practice.
• How about using a non-linear function?
• Idea of non-linear SVM:
  ◦ Step 1: transform the input into another space, usually of higher dimension, in which the projected data become linearly separable.
  ◦ Step 2: use linear SVM in the new space.
38
Non-linear SVM
• Input space: the initial representation of the data
• Feature space: the new space after the transformation $\phi(\mathbf{x})$
39
Non-linear SVM: transformation
• The idea is to map the input x to a new representation using a non-linear mapping
  $\phi: X \rightarrow F, \quad \mathbf{x} \mapsto \phi(\mathbf{x})$
• In the feature space, the original training data $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_r, y_r)\}$ are represented by
  $\{(\phi(\mathbf{x}_1), y_1), (\phi(\mathbf{x}_2), y_2), \dots, (\phi(\mathbf{x}_r), y_r)\}$
40
Non-linear SVM: transformation
• Consider a 2-dimensional input space and choose the following map:
  $\phi: X \rightarrow F, \quad (x_1, x_2) \mapsto (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$
• So the instance x = (2, 3) is represented in the feature space as
  $\phi(\mathbf{x}) = (4, 9, 6\sqrt{2}) \approx (4, 9, 8.49)$
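A quick check of this mapping with NumPy:

```python
import numpy as np

def phi(x):
    """Feature map (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2) from the slide."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

print(phi((2, 3)))   # [4.  9.  8.485...] ~ (4, 9, 8.49)
```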
41
Non-linear SVM: learning & prediction
• Training problem:
  Minimize $L_P = \dfrac{\langle \mathbf{w} \cdot \mathbf{w} \rangle}{2} + C \sum_{i=1}^{r} \xi_i$   [Eq.34]
  Such that $y_i(\langle \mathbf{w} \cdot \phi(\mathbf{x}_i) \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0, \; \forall i = 1..r$
• The dual problem:
  Maximize $L_D = \sum_{i=1}^{r} \alpha_i - \dfrac{1}{2} \sum_{i,j=1}^{r} \alpha_i \alpha_j y_i y_j \langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \rangle$   [Eq.35]
  Such that $\sum_{i=1}^{r} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C, \; \forall i = 1..r$
• Classifier:
  $f(\mathbf{z}) = \langle \mathbf{w}^{*} \cdot \phi(\mathbf{z}) \rangle + b^{*} = \sum_{\mathbf{x}_i \in SV} \alpha_i y_i \langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{z}) \rangle + b^{*}$   [Eq.36]
42
Non-linear SVM: difficulties
• How do we find the mapping?
  ◦ In general, an intractable problem.
• The curse of dimensionality:
  ◦ As the dimensionality increases, the volume of the space grows so fast that the available data become sparse.
  ◦ This sparsity is problematic.
  ◦ Increasing the dimensionality requires significantly more training data.
  ◦ (Dữ liệu dù thu thập được lớn đến đâu thì cũng quá nhỏ so với không gian của chúng – however much data we collect, it remains tiny compared to the space it lives in.)
43
Non-linear SVM: Kernel functions
• An explicit form of the transformation is not necessary.
• The dual problem:
  Maximize $L_D = \sum_{i=1}^{r} \alpha_i - \dfrac{1}{2} \sum_{i,j=1}^{r} \alpha_i \alpha_j y_i y_j \langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \rangle$
  Such that $\sum_{i=1}^{r} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C, \; \forall i = 1..r$
• Classifier: $f(\mathbf{z}) = \langle \mathbf{w}^{*} \cdot \phi(\mathbf{z}) \rangle + b^{*} = \sum_{\mathbf{x}_i \in SV} \alpha_i y_i \langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{z}) \rangle + b^{*}$
• Both require only the inner product $\langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle$.
• Kernel trick: non-linear SVM can be obtained by replacing those inner products with evaluations of a kernel function
  $K(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle$   [Eq.37]
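A minimal sketch of the kernelized classifier; the function `kernel_predict`, its arguments, and the example degree-2 polynomial kernel are hypothetical names chosen for illustration:

```python
import numpy as np

def kernel_predict(z, alpha, X_sv, y_sv, b, kernel):
    """Kernelized decision rule: sign(sum_i alpha_i * y_i * K(x_i, z) + b)."""
    score = sum(a * yi * kernel(x, z) for a, yi, x in zip(alpha, y_sv, X_sv)) + b
    return 1 if score >= 0 else -1

# Example kernel: the degree-2 polynomial K(x, z) = <x, z>^2 discussed on the next slide
poly2 = lambda x, z: np.dot(x, z) ** 2
```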
44
Kernel functions: example
• Polynomial kernel:
  $K(\mathbf{x}, \mathbf{z}) = \langle \mathbf{x}, \mathbf{z} \rangle^{d}$
• Consider the polynomial kernel with degree d = 2. For any vectors $\mathbf{x} = (x_1, x_2)$ and $\mathbf{z} = (z_1, z_2)$:
  $\langle \mathbf{x}, \mathbf{z} \rangle^{2} = (x_1 z_1 + x_2 z_2)^2$
  $= x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2$
  $= \langle (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2), (z_1^2, z_2^2, \sqrt{2}\, z_1 z_2) \rangle$
  $= \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle = K(\mathbf{x}, \mathbf{z})$
  ◦ Where $\phi(\mathbf{x}) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$.
• Therefore this polynomial kernel is exactly the inner product of the two mapped vectors $\phi(\mathbf{x})$ and $\phi(\mathbf{z})$.
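A quick numerical check of this identity (the example vectors are arbitrary):

```python
import numpy as np

x, z = np.array([2.0, 3.0]), np.array([1.0, -1.0])
phi = lambda v: np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

print(np.dot(x, z) ** 2)        # K(x, z) = <x, z>^2   -> 1.0
print(np.dot(phi(x), phi(z)))   # <phi(x), phi(z)>     -> 1.0 (the same value)
```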
45
Kernel functions: popular choices
• Polynomial:
  $K(\mathbf{x}, \mathbf{z}) = (\langle \mathbf{x} \cdot \mathbf{z} \rangle + \theta)^{d}$, where $\theta \in \mathbb{R}, \; d \in \mathbb{N}$
• Gaussian radial basis function (RBF):
  $K(\mathbf{x}, \mathbf{z}) = e^{-\frac{\|\mathbf{x} - \mathbf{z}\|^2}{2\sigma^2}}$, where $\sigma > 0$
• Sigmoid:
  $K(\mathbf{x}, \mathbf{z}) = \tanh(\beta \langle \mathbf{x} \cdot \mathbf{z} \rangle - \lambda)$, where $\beta, \lambda \in \mathbb{R}$
• What conditions ensure that a function is a valid kernel? Mercer's theorem.
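Sketches of these three kernels as plain Python functions; the default parameter values are arbitrary illustrative choices:

```python
import numpy as np

def polynomial_kernel(x, z, theta=1.0, d=2):
    """K(x, z) = (<x, z> + theta)^d"""
    return (np.dot(x, z) + theta) ** d

def rbf_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))"""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, z, beta=1.0, lam=1.0):
    """K(x, z) = tanh(beta * <x, z> - lambda)"""
    return np.tanh(beta * np.dot(x, z) - lam)
```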
46
SVM: summary
• SVM works with real-valued attributes.
  ◦ Any nominal attribute needs to be transformed into a real-valued one.
• The learning formulation of SVM handles 2 classes.
  ◦ What about a classification problem with more than 2 classes?
  ◦ One-vs-the-rest, one-vs-one: a multiclass problem can be solved by reducing it to many two-class problems.
• The decision function is simple, but may be hard to interpret.
  ◦ This is even more so when kernel functions are used.
47
SVM: some packages
• LibSVM:
  http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• Linear SVM for large datasets:
  http://www.csie.ntu.edu.tw/~cjlin/liblinear/
  http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html
• Scikit-learn in Python:
  http://scikit-learn.org/stable/modules/svm.html
• SVMlight:
  http://www.cs.cornell.edu/people/tj/svm_light/index.html
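For reference, a minimal scikit-learn usage sketch; the RBF kernel, hyperparameter values, and synthetic dataset are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # soft-margin SVM with an RBF kernel
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test), len(clf.support_))   # test accuracy and number of support vectors
```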
48
References
• B. Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer, 2006.
• C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2): 121-167, 1998.
• C. Cortes and V. Vapnik. Support-Vector Networks. Machine Learning, 20(3): 273-297, 1995.
49
Exercises
• What is the main difference between SVM and KNN?
• How many support vectors are there in the worst case? Why?
• What is the meaning of the constant C in SVM? Compare the role of C in SVM with that of λ in ridge regression.
