Lecture 03 - Supervised Learning by Computing Distances
For denoting multiple vectors, we will use a subscript with each vector, e.g., N images denoted by N feature vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$, or compactly as $\{\mathbf{x}_n\}_{n=1}^N$
The vector $\mathbf{x}_n \in \mathbb{R}^D$ denotes the $n$-th image
$x_{nd}$ (a scalar) denotes the $d$-th feature of the $n$-th image
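As a concrete illustration of this notation, here is a minimal numpy sketch (the array name X and the shapes are just illustrative assumptions):

```python
import numpy as np

N, D = 5, 3                 # 5 images, each represented by a 3-dimensional feature vector
X = np.random.rand(N, D)    # row n of X is the feature vector x_n of the n-th image

x_2 = X[1]                  # the vector x_2 (numpy uses 0-based indexing)
x_23 = X[1, 2]              # the scalar x_{2,3}: 3rd feature of the 2nd image
```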
Some Basic Operations on Vectors
Addition/subtraction of two vectors gives another vector of the same size
For a vector $\mathbf{a} \in \mathbb{R}^D$, its Euclidean norm is defined via its inner product with itself: $\|\mathbf{a}\|_2 = \sqrt{\mathbf{a}^\top \mathbf{a}} = \sqrt{\sum_{i=1}^D a_i^2}$
This is also the Euclidean distance of $\mathbf{a}$ from the origin
Note: The Euclidean norm is also called the L2 norm
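A small numpy sketch of these basic operations (the example vectors are arbitrary):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

c = a + b                       # addition gives another vector of the same size
norm_a = np.sqrt(a @ a)         # Euclidean (L2) norm via the inner product of a with itself
assert np.isclose(norm_a, np.linalg.norm(a))   # matches numpy's built-in L2 norm (here, 5.0)
```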
Computing Distances
Euclidean (L2 norm) distance between two vectors $\mathbf{a}$ and $\mathbf{b}$:
$d_2(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_2 = \sqrt{\sum_{i=1}^D (a_i - b_i)^2} = \sqrt{(\mathbf{a} - \mathbf{b})^\top (\mathbf{a} - \mathbf{b})} = \sqrt{\mathbf{a}^\top \mathbf{a} + \mathbf{b}^\top \mathbf{b} - 2\,\mathbf{a}^\top \mathbf{b}}$
Note: the third form is the square root of the inner product of the difference vector with itself; the last form is another expression in terms of inner products of the individual vectors
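A quick numpy sketch checking that the two expressions above agree (the vectors are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 1.0])

d_diff   = np.sqrt((a - b) @ (a - b))              # sqrt of inner product of the difference vector
d_expand = np.sqrt(a @ a + b @ b - 2 * (a @ b))    # expanded form using individual inner products
assert np.isclose(d_diff, d_expand)
assert np.isclose(d_diff, np.linalg.norm(a - b))   # same as numpy's built-in Euclidean distance
```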
Weighted Euclidean distance between two vectors $\mathbf{a}$ and $\mathbf{b}$:
$d_{\mathbf{w}}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^D w_i (a_i - b_i)^2} = \sqrt{(\mathbf{a} - \mathbf{b})^\top \mathbf{W} (\mathbf{a} - \mathbf{b})}$
Here $\mathbf{W}$ is a DxD diagonal matrix with the weights $w_i$ on its diagonal. The weights may be known or even learned from data (in ML problems)
Useful tip: Can achieve the effect of feature scaling (recall last lecture) by using weighted Euclidean distances!
Note: If $\mathbf{W}$ is a DxD symmetric matrix then it is called the Mahalanobis distance (more on this later)
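A minimal numpy sketch of the weighted Euclidean distance; the weight values here are made up for illustration (in practice they may be known or learned):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 1.0])
w = np.array([0.5, 1.0, 2.0])     # per-feature weights (fixed here; could be learned from data)
W = np.diag(w)                    # DxD diagonal weight matrix

d_weighted = np.sqrt(np.sum(w * (a - b) ** 2))   # weighted Euclidean distance, summation form
d_matrix   = np.sqrt((a - b) @ W @ (a - b))      # same distance via the matrix form
assert np.isclose(d_weighted, d_matrix)
# with a full symmetric (PSD) W instead of a diagonal one, the matrix form gives the Mahalanobis distance
```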
Absolute (L1 norm) distance between two vectors $\mathbf{a}$ and $\mathbf{b}$:
$d_1(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_1 = \sum_{i=1}^D |a_i - b_i|$
Note: The L1 norm distance is also known as the Manhattan distance or Taxicab norm
Apart from L2 and L1, are there other ways of computing distances?
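A short numpy sketch of the L1 distance on the same arbitrary vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 1.0])

d_l1 = np.sum(np.abs(a - b))                            # L1 (Manhattan / taxicab) distance
assert np.isclose(d_l1, np.linalg.norm(a - b, ord=1))   # matches numpy's built-in L1 norm
```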
Prelude: A Very Primitive Classifier
Consider a binary classification problem – cat vs dog
A simple idea: Predict using the test image's distance from each of the 2 training images (one cat image, one dog image)
d(test image, cat image) < d(test image, dog image)? Predict cat, else dog
The idea also applies to multi-class classification: use one image per class, and predict the label based on the distances of the test image from all such images
Wait. Is this ML? Seems to be just a simple "rule". Where is the "learning" part in this?
Excellent question! Glad you asked! Even this simple model can be learned, e.g., the feature extraction/selection part and/or the distance computation part
Some possibilities: Use a feature learning/selection algorithm to extract the features, and use a Mahalanobis distance where you learn the W matrix (instead of using a predefined W), using "distance metric learning" techniques
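A minimal sketch of this rule in numpy; the feature vectors below are made-up placeholders (in practice they would come from some feature extraction step):

```python
import numpy as np

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return np.sqrt((a - b) @ (a - b))

# one training image per class, represented by hypothetical feature vectors
cat_image = np.array([0.9, 0.1, 0.4])
dog_image = np.array([0.2, 0.8, 0.5])

test_image = np.array([0.7, 0.2, 0.5])
label = "cat" if euclidean(test_image, cat_image) < euclidean(test_image, dog_image) else "dog"
print(label)
```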
Improving Our Primitive Classifier
Just one input per class may not sufficiently capture variations in a class
Both LwP (Learning with Prototypes) and NN (Nearest Neighbors) will use multiple inputs per class, but in different ways
Learning with Prototypes (LwP)
Basic idea: Represent each class by a “prototype” vector
Predict label of each test input based on its distances from the class prototypes
Predicted label will be the class whose prototype is closest to the test input
How we compute distances can have an effect on the accuracy of this model
(may need to try Euclidean, weighted Euclidean, Mahalanobis, or something else)
Pic from: https://www.reddit.com/r/dataisbeautiful/comments/3wgbv9/average_handwritten_digit_oc/
Learning with Prototypes (LwP): An Illustration
Suppose the task is binary classification (two classes assumed pos and neg)
The class prototypes are simply the per-class means of the training inputs:
$\boldsymbol{\mu}_- = \frac{1}{N_-} \sum_{n: y_n = -1} \mathbf{x}_n, \qquad \boldsymbol{\mu}_+ = \frac{1}{N_+} \sum_{n: y_n = +1} \mathbf{x}_n$
(Figure: training inputs of the two classes, their prototypes $\boldsymbol{\mu}_-$ and $\boldsymbol{\mu}_+$, and a test example)
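A minimal numpy sketch of LwP for this binary case; the toy data and labels below are made up for illustration:

```python
import numpy as np

# toy training data: rows of X are feature vectors, y holds labels in {-1, +1}
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],     # three "negative" inputs
              [6.0, 5.0], [7.0, 6.5], [6.5, 5.5]])    # three "positive" inputs
y = np.array([-1, -1, -1, +1, +1, +1])

mu_neg = X[y == -1].mean(axis=0)   # prototype (mean) of the negative class
mu_pos = X[y == +1].mean(axis=0)   # prototype (mean) of the positive class

x_test = np.array([5.5, 5.0])
# predict the class whose prototype is closest (Euclidean distance)
pred = +1 if np.linalg.norm(x_test - mu_pos) < np.linalg.norm(x_test - mu_neg) else -1
print(pred)   # +1 for this test example
```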
LwP: The Prediction Rule, Mathematically
Let’s expand the prediction rule expression a bit more
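A sketch of the expansion, assuming Euclidean distance and the prototypes $\boldsymbol{\mu}_+$, $\boldsymbol{\mu}_-$ defined above. For a test example $\mathbf{x}$, predict +1 if it is closer to $\boldsymbol{\mu}_+$ than to $\boldsymbol{\mu}_-$, i.e., if
$\|\mathbf{x} - \boldsymbol{\mu}_-\|^2 - \|\mathbf{x} - \boldsymbol{\mu}_+\|^2 > 0$
Expanding the squared norms, the $\mathbf{x}^\top \mathbf{x}$ terms cancel and the rule becomes
$2(\boldsymbol{\mu}_+ - \boldsymbol{\mu}_-)^\top \mathbf{x} + \|\boldsymbol{\mu}_-\|^2 - \|\boldsymbol{\mu}_+\|^2 > 0$
i.e., a linear rule $\mathbf{w}^\top \mathbf{x} + b > 0$ with $\mathbf{w} = 2(\boldsymbol{\mu}_+ - \boldsymbol{\mu}_-)$ and $b = \|\boldsymbol{\mu}_-\|^2 - \|\boldsymbol{\mu}_+\|^2$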
In general, if classes are not equisized and spherical, LwP with Euclidean
distance will usually not work well (but improvements possible; will discuss
later)
LwP: Some Key Aspects
Very simple, interpretable, and lightweight model
Just requires computing and storing the class prototype vectors
Works with any number of classes (thus for multi-class classification as well)
With a learned distance function, can work very well even with very few
examples from each class (used in some “few-shot learning” models nowadays
– if interested, please refer to “Prototypical Networks for Few-shot Learning”)
Next Lecture
Fixing LwP
Nearest Neighbors