DSA5102X Lecture 2
Soufiane Hayou
Department of Mathematics
Consultation, Homework, Project
Consultation
We have three TAs for this class:
• Wang Shida
• Jiang Haotian
• Wang Weixi
[Figure: a feature map sending inputs to coordinates $\phi_1, \phi_2, \phi_3$ in feature space]
Another view of feature maps
One can also view feature maps as implicitly defining some sort of similarity measure.
Consider two vectors $u$ and $v$. Then, $u^\top v$ measures how similar they are.
[Figure: pairs of vectors $u$, $v$ at decreasing angles, indicating increasing similarity]
A feature map defines a similarity between two samples by computing the dot product in feature space: $\phi(x)^\top \phi(x')$.
[Figure: feature vectors $\phi(x)$, $\phi(x')$ at decreasing angles, indicating increasing similarity]
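To make "dot product in feature space" concrete, here is a minimal sketch in Python; the quadratic feature map below is a hypothetical choice (not from the slides), picked so that the feature-space dot product equals a degree-2 polynomial kernel.

```python
import numpy as np

# Hypothetical quadratic feature map (illustration only, not from the slides):
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), chosen so that
# phi(x) . phi(x') = (x . x')^2, the degree-2 polynomial kernel.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])

similarity_in_feature_space = phi(x) @ phi(xp)  # dot product after mapping
kernel_value = (x @ xp) ** 2                    # same quantity, computed in input space

print(similarity_in_feature_space, kernel_value)  # both are 2.25
```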
Recall: ridge regression with feature map $\phi$, design matrix $\Phi \in \mathbb{R}^{N \times d}$ (rows $\phi(x_i)^\top$) and targets $y \in \mathbb{R}^N$ has solution
$$\hat{w} = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y .$$
Two observations:
• The dataset is memorized by $\hat{w}$ and is not needed for new predictions
• For each new prediction $\hat{w}^\top \phi(x)$, we have $O(d)$ operations

Reformulation of Ridge Regression
Let us now write the ridge regression solution another way:
$$\hat{w} = \Phi^\top (\Phi \Phi^\top + \lambda I)^{-1} y, \qquad \hat{f}(x) = \phi(x)^\top \hat{w} = \sum_{i=1}^{N} \alpha_i\, \phi(x_i)^\top \phi(x), \quad \alpha = (K + \lambda I)^{-1} y, \ \ K_{ij} = \phi(x_i)^\top \phi(x_j).$$
Two observations:
• The input data participates in the Gram matrix $K$ and in new predictions
• For each new prediction, we have $O(Nd)$ operations

Original Solution: $\hat{w} = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y$
Reformulated Solution: $\alpha = (K + \lambda I)^{-1} y$, $\ \hat{f}(x) = \sum_{i} \alpha_i\, \phi(x_i)^\top \phi(x)$
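A quick numerical check that the two forms agree (a minimal sketch with synthetic data; not from the slides):

```python
import numpy as np

# Verify (Phi^T Phi + lam I)^(-1) Phi^T y  ==  Phi^T (Phi Phi^T + lam I)^(-1) y
rng = np.random.default_rng(0)
N, d, lam = 50, 5, 0.1
Phi = rng.standard_normal((N, d))   # plays the role of the feature/design matrix above
y = rng.standard_normal(N)

w_original = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

alpha = np.linalg.solve(Phi @ Phi.T + lam * np.eye(N), y)  # one coefficient per sample
w_reformulated = Phi.T @ alpha

print(np.allclose(w_original, w_reformulated))  # True
```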
A kernel is a function of two inputs given by a dot product in feature space, $k(x, x') = \phi(x)^\top \phi(x')$. Any such function satisfies:
1. Symmetry: $k(x, x') = k(x', x)$
2. Positive semi-definiteness: $\sum_{i,j} c_i c_j\, k(x_i, x_j) \ge 0$ for all $c_1, \dots, c_N$ and all inputs $x_1, \dots, x_N$
In fact, $k(x, x) = \|\phi(x)\|^2 \ge 0$ (non-negativity), and $\sum_{i,j} c_i c_j\, k(x_i, x_j) = \big\| \sum_i c_i \phi(x_i) \big\|^2 \ge 0$ (positive semi-definiteness).
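A quick numerical sanity check of both properties for a kernel built from an explicit feature map (a sketch with random features, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((20, 7))   # row i plays the role of phi(x_i)
K = Phi @ Phi.T                      # Gram matrix, K_ij = phi(x_i)^T phi(x_j)

print(np.allclose(K, K.T))                    # symmetry
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # all eigenvalues >= 0 (up to round-off), i.e. PSD

c = rng.standard_normal(20)
print(np.isclose(c @ K @ c, np.linalg.norm(Phi.T @ c) ** 2))  # c^T K c = ||sum_i c_i phi(x_i)||^2
```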
Examples of kernels:
• Polynomial kernel: $k(x, x') = (x^\top x' + c)^p$
• RBF (Gaussian) kernel: $k(x, x') = \exp\!\big(-\gamma \|x - x'\|^2\big)$
• Many more…
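These can be evaluated with off-the-shelf routines, e.g. scikit-learn's pairwise kernels (a small illustration; the data and parameters are arbitrary):

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])

K_poly = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)  # (x^T x' + 1)^2
K_rbf = rbf_kernel(X, gamma=0.5)                               # exp(-0.5 * ||x - x'||^2)
print(K_poly.shape, K_rbf.shape)                               # two 3x3 Gram matrices
```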
Flexibility of using kernels
For example, consider the RBF kernel in 1 input dimension:
$$k(x, x') = e^{-\gamma (x - x')^2} = e^{-\gamma x^2}\, e^{-\gamma x'^2} \sum_{k=0}^{\infty} \frac{(2\gamma)^k}{k!}\, x^k (x')^k = \phi(x)^\top \phi(x'),$$
where $\phi(x) = e^{-\gamma x^2} \left( 1,\ \sqrt{\tfrac{2\gamma}{1!}}\, x,\ \sqrt{\tfrac{(2\gamma)^2}{2!}}\, x^2,\ \dots \right)$.
In general: the feature space associated with the RBF kernel is infinite-dimensional. Why is this not a problem? We never compute $\phi(x)$ explicitly; we only ever evaluate $k(x, x')$.
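A quick check of this expansion (a sketch that truncates the series after K terms; the specific numbers are arbitrary):

```python
import numpy as np
from math import factorial

def phi_truncated(x, gamma, K=30):
    # phi_k(x) = exp(-gamma x^2) * sqrt((2 gamma)^k / k!) * x^k,  k = 0, ..., K-1
    return np.array([np.exp(-gamma * x ** 2) * np.sqrt((2 * gamma) ** k / factorial(k)) * x ** k
                     for k in range(K)])

x, xp, gamma = 0.7, -0.3, 1.0
approx = phi_truncated(x, gamma) @ phi_truncated(xp, gamma)
exact = np.exp(-gamma * (x - xp) ** 2)
print(approx, exact)  # the truncated dot product converges quickly to the exact kernel value
```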
Maximum-margin classification: among all separating hyperplanes, pick the one with the largest margin,
$$\max_{w, b} \ \min_{i=1,\dots,N} \frac{1}{\|w\|} \left| w^\top x_i + b \right| \quad \text{subject to } y_i (w^\top x_i + b) > 0 \ \ \forall i .$$
Reformulated as a constrained convex optimization problem:
$$\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{subject to } y_i (w^\top x_i + b) \ge 1 \ \ \forall i .$$
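The step between the two problems is worth spelling out (a standard argument; the slide states only the result). The distance from $x_i$ to the hyperplane $\{x : w^\top x + b = 0\}$ is $|w^\top x_i + b| / \|w\|$. Rescaling $(w, b) \to (c\,w,\ c\,b)$ leaves the classifier unchanged, so we may fix the scale by requiring $\min_i y_i (w^\top x_i + b) = 1$; the margin is then $1/\|w\|$, and
$$\max_{w, b} \ \frac{1}{\|w\|} \quad \Longleftrightarrow \quad \min_{w, b} \ \frac{1}{2} \|w\|^2 .$$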
The Method of Lagrange Multipliers
The minimizer of a function $F(z)$ can be found by solving $\nabla F(z) = 0$. What if there are constraints?
First example: a single linear constraint,
$$\min_z F(z) \quad \text{subject to } z^\top a = 0 .$$
At a constrained minimizer $\hat z$, the gradient $\nabla F(\hat z)$ need not vanish; instead, its component along the constraint set must vanish, so $\nabla F(\hat z)$ is parallel to the normal direction $a$:
$$\nabla F(\hat z) = \lambda\, a \quad \text{for some scalar } \lambda .$$
[Figure: the constraint plane $z^\top a = 0$ with normal $a$; at $\hat z$, $\nabla F(\hat z)$ is aligned with $a$]
What about general equality constraints?
$$\min_z F(z) \quad \text{subject to } G(z) = 0 .$$
The same picture holds with the constraint surface $\{z : G(z) = 0\}$: at a constrained minimizer $\hat z$, $\nabla F(\hat z)$ must be parallel to the normal of the surface,
$$\nabla F(\hat z) = \lambda\, \nabla G(\hat z) \quad \text{for some scalar } \lambda .$$
[Figure: the surface $G(z) = 0$; at $\hat z$, $\nabla F(\hat z)$ and $\nabla G(\hat z)$ are aligned]
What about general inequality constraints?
$$\min_z F(z) \quad \text{subject to } G(z) \le 0 .$$
Two cases at the minimizer $\hat z$:
• Inactive case: $G(\hat z) < 0$. The constraint does not bind, so $\nabla F(\hat z) = 0$ as in the unconstrained problem.
• Active case: $G(\hat z) = 0$. The minimizer lies on the boundary, and $\nabla F(\hat z) = -\mu\, \nabla G(\hat z)$ for some $\mu \ge 0$ (the objective can only decrease by leaving the feasible region).
[Figure: the feasible region $G(z) \le 0$ in the inactive case and the active case, with $\nabla F(\hat z)$ and $\nabla G(\hat z)$ drawn at $\hat z$]
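Both cases can be packaged into one set of conditions; for reference, the standard (KKT) form for $\min_z F(z)$ subject to $G(z) \le 0$ is
$$L(z, \mu) = F(z) + \mu\, G(z), \qquad \nabla_z L(\hat z, \mu) = 0, \quad G(\hat z) \le 0, \quad \mu \ge 0, \quad \mu\, G(\hat z) = 0 .$$
The last condition is complementary slackness: $\mu = 0$ in the inactive case and $G(\hat z) = 0$ in the active case.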
Applying this to the maximum-margin problem:
$$\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{subject to } y_i (w^\top x_i + b) \ge 1 \ \ \forall i ,$$
with one multiplier $\alpha_i \ge 0$ per constraint. Stationarity of the Lagrangian gives $w = \sum_{i=1}^{N} \alpha_i y_i x_i$.
Decision function: $f(x) = \operatorname{sign}\big( w^\top x + b \big) = \operatorname{sign}\Big( \textstyle\sum_{i=1}^{N} \alpha_i y_i\, x_i^\top x + b \Big)$
Complementary slackness: $\alpha_i \big( y_i (w^\top x_i + b) - 1 \big) = 0$ for all $i$
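For completeness, substituting the stationarity conditions back into the Lagrangian gives the dual problem referred to below (standard form):
$$\max_{\alpha \ge 0} \ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i, j = 1}^{N} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j \quad \text{subject to } \sum_{i=1}^{N} \alpha_i y_i = 0 .$$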
Crucial Observations
From the dual formulation, we observe the following:
1. Only the vectors closest to the decision boundary matter in predictions. These are called support vectors.
2. The dual formulation of the problem depends on the inputs only through the dot products $x_i^\top x_j$.
[Figure: separating hyperplane and margin; the support vectors are the points lying on the margin]
Kernel Support Vector Machines
Since the inputs enter only through dot products, we can replace $x_i^\top x$ with a kernel $k(x_i, x) = \phi(x_i)^\top \phi(x)$.
Decision function: $f(x) = \operatorname{sign}\Big( \textstyle\sum_{i=1}^{N} \alpha_i y_i\, k(x_i, x) + b \Big)$
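A minimal working sketch using scikit-learn's SVC with an RBF kernel (toy data; C and gamma are arbitrary choices, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # labels that are not linearly separable

clf = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)

print(clf.support_vectors_.shape)             # only the support vectors are kept for prediction
print(clf.predict([[0.1, 0.2], [2.0, 2.0]]))  # each prediction sums alpha_i * y_i * k(x_i, x) + b
```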