
Support Vector Machines

Machine Learning

Manoj Kumar

Youtube

October 22, 2024

Manoj Sir (Youtube) Lecture 1 1 / 22


Outline

1 Introduction

2 Kernels



Vector Dot Product



Calculations Involved in Dot Product



Magic!!
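The content of these dot-product slides appears to be image-only, but the "magic" they build toward is the kernel trick: for a quadratic kernel, squaring a single dot product in input space gives the same number as an explicit dot product in a lifted feature space. A minimal sketch (the feature map ϕ and the sample points are assumptions for illustration, not taken from the slides):

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for 2-D input:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), chosen so that
    phi(x) . phi(z) = (x . z)^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

explicit = np.dot(phi(x), phi(z))  # lift to feature space, then take the dot product
trick = np.dot(x, z) ** 2          # kernel trick: one dot product in input space, then square
```

Both routes give the same value, but the kernel-trick route never materializes the lifted vectors.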



Kernel Functions



Example of Kernel Functions





Validity of Kernel Functions

Method 1: Exhibit a feature map ϕ such that

K(x, x′) = ϕ(x) · ϕ(x′)

Method 2: Mercer's theorem
1 K(x, x′) = K(x′, x) {Symmetric}
2 Given data points x1, x2, x3, the kernel (Gram) matrix K is formed by applying the kernel function to each pair of data points:

K = | K(x1, x1)  K(x1, x2)  K(x1, x3) |
    | K(x2, x1)  K(x2, x2)  K(x2, x3) |
    | K(x3, x1)  K(x3, x2)  K(x3, x3) |

The kernel matrix K must be symmetric and positive semi-definite (all eigenvalues of K non-negative).
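The two Mercer conditions are easy to check numerically. A minimal sketch (the helper name and the sample points are illustrative, not from the slides):

```python
import numpy as np

def is_valid_kernel_matrix(K, tol=1e-9):
    """Check the two Mercer conditions on a kernel (Gram) matrix:
    symmetry and positive semi-definiteness (all eigenvalues >= 0)."""
    if not np.allclose(K, K.T):
        return False
    # eigvalsh is the appropriate eigenvalue routine for symmetric matrices
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

# Gram matrix of the linear kernel K(x, x') = x . x' on three points,
# which is always positive semi-definite by construction
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = X @ X.T
```

The same check applies to any candidate Gram matrix, e.g. one given in an exam question.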







Types of Kernel Function

1 Polynomial Kernel Function

K(x, x′) = (x⊤x′ + c)^d

x and x′ are input vectors.
c is a constant that can be adjusted.
d is the degree of the polynomial.

2 Gaussian Kernel / Radial Basis Function (RBF) Kernel
It maps the input space into an infinite-dimensional feature space.

K(x, x′) = e^(−∥x − x′∥² / 2σ²) = e^(−d² / 2σ²)

d → distance between x and x′
x and x′ are input vectors.
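Both kernels are one-liners in code. A minimal sketch (the function names, default parameter values, and sample points are illustrative choices):

```python
import numpy as np

def polynomial_kernel(x, xp, c=1.0, d=2):
    """K(x, x') = (x^T x' + c)^d"""
    return (np.dot(x, xp) + c) ** d

def rbf_kernel(x, xp, sigma=1.0):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

x, xp = np.array([1.0, 2.0]), np.array([2.0, 0.0])
# polynomial_kernel(x, xp) = (2 + 1)^2 = 9
# rbf_kernel(x, x) = 1, since the distance from a point to itself is 0
```

Note how the RBF kernel always lies in (0, 1], reaching 1 only when x = x′.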



Validity of Kernel Functions

Given that K1 : X × X → R and K2 : X × X → R are two symmetric, positive definite kernel functions, the validity of the following kernel functions is as follows:

Valid Kernel Functions:
K(x, x′) = c · K1(x, x′), where c is a positive constant.
K(x, x′) = K1(x, x′) + K2(x, x′)

Not a Valid Kernel Function:
K(x, x′) = K1(x, x′) + 1/K2(x, x′)
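The two closure rules can be sanity-checked numerically on random data (a spot check on one sample, not a proof; the kernels, sample size, and seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 random points in R^3

K1 = X @ X.T                          # Gram matrix of the linear kernel (PSD)
sq_dists = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K2 = np.exp(-0.5 * sq_dists)          # Gram matrix of an RBF kernel with sigma = 1

def min_eig(K):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(K).min()
```

Scaling by a positive constant and adding two kernels both leave the smallest eigenvalue non-negative, consistent with the closure rules above.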



Question

Consider two data points, x1 = (1, −1) and x2 = (2, 2), in a binary classification task
using an SVM with a custom kernel function K (x, y ). The kernel function is applied to
these points, resulting in the following matrix, referred to as matrix A:

A = | K(x1, x1)  K(x1, x2) |  =  | 1  3 |
    | K(x2, x1)  K(x2, x2) |     | 3  6 |

Which of the following statements is correct regarding matrix A and the kernel function
K (x, y )?
A) K (x, y ) is a valid kernel.
B) K (x, y ) is not a valid kernel.
C) Matrix A is positive semi-definite.
D) Matrix A is not positive semi-definite.
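One way to decide between these options is a quick eigenvalue computation (a numerical check; the matrix entries come from the question above):

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [3.0, 6.0]])

# Eigenvalues of the symmetric matrix A, in ascending order.
# A is positive semi-definite iff both are non-negative; note that
# their product equals det(A) = 1*6 - 3*3 = -3.
eigenvalues = np.linalg.eigvalsh(A)
```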
Question

Given a kernel function K1 : Rn × Rn → R and its corresponding feature map ϕ1 : Rn → Rd, which feature map ϕ : Rn → Rd would correctly produce the kernel cK1(x, z), where c is a positive constant?
a) ϕ(x) = c ϕ1(x)
b) ϕ(x) = √c ϕ1(x)
c) ϕ(x) = c² ϕ1(x)
d) No such feature map exists.
Question

Let K1 : X × X → R and K2 : X × X → R be two symmetric, positive definite kernel functions. Which of the following cannot be a valid kernel function?
(a) K (x, x ′ ) = 5 · K1 (x, x ′ )
(b) K (x, x ′ ) = K1 (x, x ′ ) + K2 (x, x ′ )
(c) K(x, x′) = K1(x, x′) + 1/K2(x, x′)
(d) All three are valid kernels.
Question

Given a kernel function K(x, x′) = f(x)g(x′) + f(x′)g(x), where f and g are real-valued functions (Rd → R), the kernel is not valid. What additional terms would you include in K to make it a valid kernel?
Options:
(A) f(x) + g(x)
(B) f(x)g(x) + f(x′)g(x′)
(C) f(x)f(x′) + g(x)g(x′)
(D) f(x′) + g(x′)
Question

Which of the following are properties that a kernel matrix always has?
□ Invertible
□ At least one negative eigenvalue
□ All the entries are positive
□ Symmetric
Question

Suppose ϕ(x) is an arbitrary feature mapping from input x ∈ X to ϕ(x) ∈ RN and let
K (x, z) = ϕ(x)⊤ ϕ(z). Then K (x, z) will always be a valid kernel function.
Circle one: True False
Question

Suppose ϕ(x) is the feature map induced by a polynomial kernel K (x, z) of degree d,
then ϕ(x) should be a d-dimensional vector.
Circle one: True False
Question

Which of the following are valid kernel functions?


⃝ k(x, z) = exp(−∥x − z∥² / 2σ²)
⃝ k(x, z) = ∥x∥ ∥z∥
⃝ k(x, z) = x⊤ M z, where M = | 727   1 |
                              |  1   42 |
⃝ k(x, z) = x⊤ M z, where M = | −727    1 |
                              |   1   −42 |
Question

= x⊤ rev(y
Let x, y ∈ Rd be two points. Consider the function k(x, y )   )where rev(y )
1 3
reverses the order of the components in y . For example, rev 2 = 2. Show that
  
3 1
k cannot be a valid kernel function.
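A counterexample can be checked directly: any valid kernel must satisfy k(x, x) = ∥ϕ(x)∥² ≥ 0 for every x, but here a suitable x makes k(x, x) negative (the specific point below is one convenient choice):

```python
import numpy as np

def k(x, y):
    """k(x, y) = x^T rev(y), with rev reversing the components of y."""
    return np.dot(x, y[::-1])

# If k were a valid kernel, k(x, x) would equal ||phi(x)||^2 >= 0
# for some feature map phi. This point violates that:
x = np.array([1.0, -1.0])
value = k(x, x)   # 1*(-1) + (-1)*1 = -2 < 0
```

Since k(x, x) < 0 for this x, no feature map ϕ can exist, so k is not a valid kernel.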
Question

Which of the following statements about kernels are true?


⃝ A: The dimension of the lifted feature vectors Φ(·), whose inner products the
kernel function computes, can be infinite.
⃝ B: For any desired lifting Φ(x), we can design a kernel function k(x, z) that
will evaluate Φ(x)⊤ Φ(z) more quickly than explicitly computing Φ(x) and Φ(z).
⃝ C: The kernel trick, when it is applicable, speeds up a learning algorithm if the
number of sample points is substantially less than the dimension of the (lifted)
feature space.
⃝ D: If the raw feature vectors x, y are of dimension 2, then
k(x, y) = x₁²y₁² + x₂²y₂² is a valid kernel.
Question

Suppose we have a feature map Φ and a kernel function k(Xi , Xj ) = Φ(Xi ) · Φ(Xj ).
Select the true statements about kernels.
⃝ A: If there are n sample points of dimension d, it takes O(nd) time to compute
the kernel matrix.
⃝ B: The kernel trick implies we do not compute Φ(Xi ) explicitly for any sample
point Xi .
⃝ C: For every possible feature map Φ : Rd → RD you could imagine, there is a
way to compute k(Xi , Xj ) in O(d) time.
⃝ D: Running times of kernel algorithms do not depend on the dimension D of
the feature space Φ(·).
Thank you!
Questions?

