2223 ML Lecture04
Lecture 4
Non-parametric Learning and kNNs
Stergios Christodoulidis
MICS Laboratory
CentraleSupélec
Université Paris-Saclay
https://stergioc.github.io/
Last Lecture
Linear Regression – Problem Formulation
• Data: Inputs are continuous vectors of length $K$ (dimensions, features). Outputs are continuous scalars:
$\mathcal{D} = \{(\boldsymbol{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$, where $\boldsymbol{x} \in \mathbb{R}^K$ and $y \in \mathbb{R}$
• Prediction: Output is a linear function of the inputs
$\hat{y} = h_\theta(\boldsymbol{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_K x_K$
$\hat{y} = h_\theta(\boldsymbol{x}) = \theta^{\top}\boldsymbol{x}$
• Learning: find the parameters $\theta$ that minimize some objective function
$\theta^* = \operatorname{argmin}_{\theta}\ \mathcal{J}(\boldsymbol{x}, y; \theta)$
Gradient Descent – The Idea (1/3)
$u' = u - \lambda \nabla f(u)$
$\lambda$: learning rate
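To make the update rule concrete, here is a minimal NumPy sketch for linear regression with a mean-squared-error objective (the function name and data layout are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=100):
    """Minimize J(theta) = 1/(2N) * sum_i (theta^T x_i - y_i)^2 by gradient descent."""
    n, k = X.shape
    theta = np.zeros(k)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n  # gradient of the MSE objective
        theta -= lr * grad                # u' = u - lambda * grad f(u)
    return theta
```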
Model Complexity
• A “simple” model is one where $\theta$ is almost uniform
• Few features are significantly more relevant than others
• $\|\theta\|_2$ will be small (L2 norm)
[Figure: two example parameter vectors $\theta^{(2)}$ and $\theta^{(3)}$, each labeled “simple model”]
Ridge Regression
$\mathcal{J}_{RR}(\theta) = \mathcal{J}(\theta) + \lambda\,\|\theta\|_2^2 = \frac{1}{2}\sum_{i=1}^{N}\left(\theta^{\top}\boldsymbol{x}^{(i)} - y^{(i)}\right)^2 + \lambda \sum_{k=1}^{K} \theta_k^2$
• The L2 penalty prefers uniform parameters
• Hyperparameter $\lambda$: how much should we trade off accuracy versus model complexity
LASSO Regression
$\mathcal{J}_{LASSO}(\theta) = \mathcal{J}(\theta) + \lambda\,\|\theta\|_1 = \frac{1}{2}\sum_{i=1}^{N}\left(\theta^{\top}\boldsymbol{x}^{(i)} - y^{(i)}\right)^2 + \lambda \sum_{k=1}^{K} |\theta_k|$
• The L1 penalty yields sparse parameters
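As a quick recap in code, a sketch fitting both penalized models with scikit-learn (alpha plays the role of λ; the toy data is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all coefficients toward 0
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: drives some coefficients to exactly 0
print(ridge.coef_)  # small but mostly non-zero
print(lasso.coef_)  # sparse
```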
Logistic Regression (1/2)
• Model $p(\text{label} \mid \text{data})$ by training a classifier of the form
$y_i = \begin{cases} 1 & \text{if } \theta^{\top}\boldsymbol{x} \geq 0 \\ 0 & \text{otherwise} \end{cases}$
• Intuitively, it does not make sense for $y_i$ to take values larger than 1 or smaller than 0
• Note that the sign() function presented before (classification as a regression task) is not very useful here
Logistic Regression (2/2)
$p(y \mid \boldsymbol{x}) \in [0, 1]$
$\hat{y} = f(\boldsymbol{x}) = \sigma(\theta^{\top}\boldsymbol{x}) = \frac{1}{1 + e^{-\theta^{\top}\boldsymbol{x}}}$
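A minimal sketch of the prediction step (illustrative names, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """Logistic regression prediction: p(y = 1 | x) = sigma(theta^T x)."""
    return sigmoid(theta @ x)
```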
Today’s Lecture
Non Parametric Learning
Classification: Oranges and Lemons
Classification as Induction
• Classification as induction
• Comparison to instances already seen in training
• Non-parametric learning
Non-parametric Learning
• Examples:
• k-nearest neighbors
• Tree-based methods
• Some cases of SVMs (next lecture)
How Would You Color the Blank Circles?
Partitioning the Space
Nearest Neighbors – The Idea
• Learning (training):
• Store all the training examples
• Prediction:
• For a point x: assign the label of the training example closest to it
• Classification
• Majority vote: predict the most frequent label among the k neighbors
• Regression
• Predict the average of the labels of the k neighbors (see the sketch below)
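A minimal sketch of both prediction modes, assuming plain Euclidean distance (names illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5, classification=True):
    """Predict the label of x from its k nearest stored training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every stored example
    nearest = np.argsort(dists)[:k]              # indices of the k closest ones
    labels = y_train[nearest]
    if classification:
        values, counts = np.unique(labels, return_counts=True)
        return values[np.argmax(counts)]         # majority vote
    return labels.mean()                         # average of the neighbors' labels
```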
Instance-based Learning
• Learning
• Store training instances
• Prediction
• Compute the label for a new instance based on its similarity with the stored instances
Computing distances and similarities
Distance Function
Distance Between Instances
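The distance figures from these slides are not reproduced here; as a sketch, two standard choices for real-valued feature vectors (an assumption, since the slide content was an image) are the Euclidean and Manhattan distances:

```python
import numpy as np

def euclidean(x, z):
    """L2 distance: sqrt(sum_j (x_j - z_j)^2)."""
    return np.sqrt(np.sum((x - z) ** 2))

def manhattan(x, z):
    """L1 distance: sum_j |x_j - z_j|."""
    return np.sum(np.abs(x - z))
```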
From Distance to Similarity
• A simple way to turn a distance d into a similarity:
$s = \frac{1}{1 + d}$
• Pearson’s correlation:
$\rho(x, z) = \frac{\sum_{j=1}^{n}(x_j - \bar{x})(z_j - \bar{z})}{\sqrt{\sum_{j=1}^{n}(x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{n}(z_j - \bar{z})^2}}, \qquad \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$
• For centered data ($\bar{x} = \bar{z} = 0$) this simplifies to:
$\rho(x, z) = \frac{\sum_{j=1}^{n} x_j z_j}{\sqrt{\sum_{j=1}^{n} x_j^2}\,\sqrt{\sum_{j=1}^{n} z_j^2}}$
Pearson’s Correlation
$\rho(x, z) = \frac{\sum_{j=1}^{n} x_j z_j}{\sqrt{\sum_{j=1}^{n} x_j^2}\,\sqrt{\sum_{j=1}^{n} z_j^2}} = \frac{\langle x, z \rangle}{\|x\|\,\|z\|} = \cos\theta$
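A sketch of this equivalence in code:

```python
import numpy as np

def pearson(x, z):
    """Pearson's correlation = cosine similarity of the centered vectors."""
    xc, zc = x - x.mean(), z - z.mean()
    return (xc @ zc) / (np.linalg.norm(xc) * np.linalg.norm(zc))
```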
Categorical Features
Binary Representation
0 1 1 0 0 1 0 0 0 1 0 1 0 0 1
Example
• Hamming distance: the number of positions where the two vectors differ
x = 010101001
y = 010011000
The vectors differ in positions 4, 5, and 9, thus d(x, y) = 3
• Jaccard similarity: ignores positions where both vectors are 0
J = (# of 11) / (# of 01 + # of 10 + # of 11) = 2 / (1 + 2 + 2) = 2/5 = 0.4
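A quick check of the example in code (sketch):

```python
def hamming(x, z):
    """Number of positions where the bit strings differ."""
    return sum(a != b for a, b in zip(x, z))

def jaccard(x, z):
    """(# of 11) / (# of 01 + # of 10 + # of 11)."""
    both = sum(a == b == "1" for a, b in zip(x, z))
    either = sum(a == "1" or b == "1" for a, b in zip(x, z))
    return both / either

print(hamming("010101001", "010011000"))  # 3
print(jaccard("010101001", "010011000"))  # 0.4
```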
kNN Classifiers
Nearest Neighbor Algorithm
Algorithm 1
Input: training data $\mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1,\dots,N}$
1. Find the example $(x^*, y^*)$ in the stored training set closest to the test instance $x$, such that:
$x^* = \operatorname{argmin}_{x^{(i)} \in \mathcal{D}} d(x^{(i)}, x)$
2. Then the output is $f(x) = y^*$
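A direct translation of Algorithm 1 (sketch; Euclidean distance assumed):

```python
import numpy as np

def nearest_neighbor(X_train, y_train, x):
    """Return the label of the single closest stored training example."""
    dists = np.linalg.norm(X_train - x, axis=1)  # d(x_i, x) for every stored example
    return y_train[np.argmin(dists)]             # f(x) = y*
```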
k-Nearest Neighbors (kNN) Algorithm
[Figure: decision regions of a 1NN classifier; every example in the blue shaded area will be misclassified as the blue class]
Choice of Parameter k
How to Choose Parameter k
• m: # of training instances
• A common rule of thumb is $k \approx \sqrt{m}$; in practice, pick k by cross-validation and prefer odd values to avoid ties (the figure shows the case k = 7); see the sketch below
Source: https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/
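A selection sketch using scikit-learn's cross-validation helpers (X and y are assumed to be given):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Evaluate odd values of k and keep the one with the best cross-validated accuracy.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 22, 2)}
best_k = max(scores, key=scores.get)
```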
Advantages of kNN
Drawbacks of kNN
• Memory requirements
• Must store all training data
• Prediction can be slow (you will see this for yourself in the lab)
• Complexity of labeling 1 new data point: 𝒪(𝑘𝑛𝑚)
• But kNN works best with lots of samples
• Can we further improve the running time?
• Efficient data structures (e.g., k-D trees)
• Approximate solutions based on hashing
• High dimensional data and the curse of dimensionality
• Computation of the distance in a high dimensional space may become meaningless
• Need more training data
• Dimensionality reduction
kNN – Some More Issues
• Features measured on very different scales can dominate the distance computation
• Simple option: linearly scale the range of each feature to be, e.g., in the range [0, 1] (a sketch follows)
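A minimal sketch (scikit-learn's MinMaxScaler does the same job):

```python
import numpy as np

def min_max_scale(X):
    """Linearly rescale each feature (column) of X to the range [0, 1]."""
    X_min, X_max = X.min(axis=0), X.max(axis=0)
    return (X - X_min) / (X_max - X_min)  # assumes no feature is constant
```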
Decision Boundary of kNN
Voronoi Tessellation
• Voronoi cell of x:
• Set of all points of the space closer to x than to any other point of the training set
• A polyhedron
• Voronoi tessellation (or diagram) of the space:
• Union of all Voronoi cells
Wikipedia: https://en.wikipedia.org/wiki/Voronoi_diagram
kNN Variants
• Weighted kNN
• Weight the vote of each neighbor $x_i$ according to its distance to the test point $x$:
$w_i = \frac{1}{d(x, x_i)^2}$
Source: https://epub.ub.uni-muenchen.de/1769/1/paper_399.pdf
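A sketch of the distance-weighted vote (the epsilon guard against zero distances is an implementation detail, not from the slides):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5):
    """Classify x by a distance-weighted vote among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        w = 1.0 / (dists[i] ** 2 + 1e-12)  # w_i = 1 / d(x, x_i)^2
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)
```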
Applications of kNN
MNIST dataset
Source: http://yann.lecun.com/exdb/mnist/
kNN in scikit-learn
http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
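A minimal usage example of the class linked above (X_train, y_train, X_test assumed given):

```python
from sklearn.neighbors import KNeighborsClassifier

# weights="distance" weights each neighbor's vote by the inverse of its distance
clf = KNeighborsClassifier(n_neighbors=5, weights="distance")
clf.fit(X_train, y_train)     # "training" just stores the data
y_pred = clf.predict(X_test)  # vote among the 5 nearest stored neighbors
```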
Summary
Thank you!