Lecture 2 - Nearest-Neighbors Methods
Sajid Mahmood
CS 444
Slides were created by Chloé-Agathe Azencott
Centre for Computational Biology, Mines ParisTech
[email protected]
LEARNING OBJECTIVES
● Implement the nearest-neighbor and k-nearest-neighbors algorithms.
● Compute distances between real-valued vectors as well as objects represented by categorical features.
● Define the decision boundary of the nearest-neighbor algorithm.
● Explain why kNN might not work well in high dimension.
NEAREST NEIGHBORS
HOW WOULD YOU COLOR THE BLANK CIRCLES?
PARTITIONING THE SPACE
The training data partitions the entire space.
NEAREST NEIGHBOR
● Learning:
– Store all the training examples.
● Prediction:
– For x: the label of the training example closest to it.
K NEAREST NEIGHBORS
● Learning:
– Store all the training examples.
● Prediction (see the sketch below):
– Find the k training examples closest to x.
– Classification: majority vote, i.e. predict the most frequent label among the k neighbors.
– Regression: predict the average of the labels of the k neighbors.
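A minimal sketch of both prediction rules in NumPy; the function name knn_predict and the brute-force Euclidean distances are illustrative choices, not the only way to implement kNN.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5, task="classification"):
    # Brute-force Euclidean distances from x to every stored training example.
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k closest training examples and their labels.
    nn_idx = np.argsort(dists)[:k]
    nn_labels = y_train[nn_idx]
    if task == "classification":
        # Majority vote among the k neighbors.
        return Counter(nn_labels).most_common(1)[0][0]
    # Regression: average of the neighbors' labels.
    return nn_labels.mean()

# Toy usage: the 3 neighbors of (2.5, 2.5) are (2,2), (3,3), (1,1) -> labels 1, 1, 0 -> predict 1.
X_train = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
y_train = np.array([0, 0, 1, 1])
knn_predict(X_train, y_train, np.array([2.5, 2.5]), k=3)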
CHOICE OF K
● Small k: noisy.
The idea behind using more than one neighbor is to average out the noise.
● Large k: computationally intensive.
If k = n, then we predict
– for classification: the majority class,
– for regression: the average value.
● Set k by cross-validation (see the sketch below).
● Heuristic: k ≈ √n.
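One way to set k by cross-validation with scikit-learn (a sketch: GridSearchCV and KNeighborsClassifier are real classes, the breast-cancer dataset is only a convenient stand-in).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Try odd values of k and keep the one with the best cross-validated accuracy.
param_grid = {"n_neighbors": np.arange(1, 32, 2)}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)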
NON-PARAMETRIC LEARNING
Non-parametric learning algorithm:
– the complexity of the decision function grows with the number of data points;
– contrast with linear regression (≈ as many parameters as features);
– usually, the decision function is expressed directly in terms of the training examples.
– Examples:
● kNN (this chapter)
● tree-based methods (Chap. 9)
● SVM (Chap. 10)
INSTANCE-BASED LEARNING
● Learning:
– Storing training instances.
● Predicting:
– Compute the label for a new instance based on its similarity with the stored instances: this is where the magic happens!
● Also called lazy learning.
● Similar to case-based reasoning:
– doctors treating a patient based on how patients with similar symptoms were treated,
– judges ruling court cases based on legal precedent.
COMPUTING DISTANCES & SIMILARITIES
DISTANCES BETWEEN INSTANCES
● Distance: a function d(x, x') that is non-negative, symmetric, equal to zero if and only if x = x', and satisfies the triangle inequality d(x, z) ≤ d(x, y) + d(y, z).
DISTANCES BETWEEN INSTANCES
● Euclidean distance: d(x, x') = √( Σ_j (x_j − x'_j)² )
● Manhattan distance: d(x, x') = Σ_j |x_j − x'_j|
– L1 = Manhattan.
– L2 = Euclidean.
– L∞ = Chebyshev: d(x, x') = max_j |x_j − x'_j|.
(See the sketch below.)
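The same three distances computed with NumPy (a small illustrative sketch; the two vectors are arbitrary):

import numpy as np

x  = np.array([1.0, 2.0, 3.0])
x2 = np.array([2.0, 0.0, 3.5])

euclidean = np.sqrt(np.sum((x - x2) ** 2))  # L2: sqrt(1 + 4 + 0.25) ~ 2.29
manhattan = np.sum(np.abs(x - x2))          # L1: 1 + 2 + 0.5 = 3.5
chebyshev = np.max(np.abs(x - x2))          # L-infinity: max(1, 2, 0.5) = 2.0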
SIMILARITY BETWEEN INSTANCES
● Pearson's correlation:
ρ(x, x') = Σ_j (x_j − x̄)(x'_j − x̄') / ( √(Σ_j (x_j − x̄)²) · √(Σ_j (x'_j − x̄')²) )
● Geometric interpretation: for centered data (x̄ = x̄' = 0), Pearson's correlation is the cosine of the angle between the two vectors.
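A small check of the geometric interpretation in NumPy (the vectors are arbitrary; np.corrcoef is NumPy's built-in Pearson correlation):

import numpy as np

x  = np.array([2.0, 4.0, 6.0, 3.0])
x2 = np.array([1.0, 5.0, 4.0, 2.0])

# Pearson correlation.
pearson = np.corrcoef(x, x2)[0, 1]

# Cosine of the angle between the centered vectors: same value.
xc, x2c = x - x.mean(), x2 - x2.mean()
cosine = xc @ x2c / (np.linalg.norm(xc) * np.linalg.norm(x2c))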
CATEGORICAL FEATURES
● Ex: a feature that can take 5 values:
– Sports
– World
– Culture
– Internet
– Politics
● Naive encoding: x1 in {1, 2, 3, 4, 5}:
– Why is Sports closer to World than Politics?
● One-hot encoding: x1, x2, x3, x4, x5 (see the sketch below):
– Sports: [1, 0, 0, 0, 0]
– Internet: [0, 0, 0, 1, 0]
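A minimal one-hot encoding sketch in NumPy (the function name one_hot and the category list are illustrative):

import numpy as np

categories = ["Sports", "World", "Culture", "Internet", "Politics"]

def one_hot(value, categories):
    # Vector of zeros with a single 1 at the position of the category.
    vec = np.zeros(len(categories), dtype=int)
    vec[categories.index(value)] = 1
    return vec

one_hot("Sports", categories)    # array([1, 0, 0, 0, 0])
one_hot("Internet", categories)  # array([0, 0, 0, 1, 0])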
CATEGORICAL FEATURES
● Represent an object as the list of presence/absence (or counts) of the features that appear in it.
● Example: small molecules; features = atoms and bonds of a certain type
– C, H, S, O, N...
– O-H, O=C, C-N...
BINARY REPRESENTATION
Example: x = [0 1 1 0 0 1 0 0 0 1 0 1 0 0 1]
(1st entry = 0: no occurrence of the 1st feature; 10th entry = 1: one or more occurrences of the 10th feature)
● Hamming distance: number of bits that are different.
Equivalent to the Manhattan (L1) distance on binary vectors.
● Tanimoto/Jaccard similarity: number of shared features (normalized):
s(x, x') = (number of features present in both x and x') / (number of features present in at least one of them).
COUNTS REPRESENTATION
Example: x = [0 1 2 0 0 1 0 0 0 4 0 1 0 0 7]
(1st entry = 0: no occurrence of the 1st feature; 10th entry = 4: number of occurrences of the 10th feature)
● MinMax similarity: number of shared features (normalized):
s(x, x') = Σ_j min(x_j, x'_j) / Σ_j max(x_j, x'_j).
CATEGORICAL FEATURES
● A = 100011010110 (binary) / 300011010120 (counts)
● B = 111011011110 (binary) / 211021011120 (counts)
● C = 111011010100 (binary) / 311011010100 (counts)
● Hamming distance:
d(A, B) = 3    d(A, C) = 3    d(B, C) = 2
● Tanimoto similarity:
s(A, B) = 6/9 = 0.67    s(A, C) = 5/8 = 0.63    s(B, C) = 7/9 = 0.78
● MinMax similarity (see the sketch below):
s(A, B) = 8/13 = 0.62    s(A, C) = 7/11 = 0.64    s(B, C) = 8/13 = 0.62
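The three measures can be checked with a short NumPy sketch (the helper names are illustrative); it reproduces the values above.

import numpy as np

def vec(s):
    # Turn a digit string such as "100011010110" into an integer vector.
    return np.array([int(c) for c in s])

A_bin, A_cnt = vec("100011010110"), vec("300011010120")
B_bin, B_cnt = vec("111011011110"), vec("211021011120")
C_bin, C_cnt = vec("111011010100"), vec("311011010100")

def hamming(x, y):
    # Number of positions where the bits differ.
    return int(np.sum(x != y))

def tanimoto(x, y):
    # Shared features over features present in at least one of the two.
    return np.sum(x & y) / np.sum(x | y)

def minmax(x, y):
    # Sum of elementwise minima over sum of elementwise maxima.
    return np.sum(np.minimum(x, y)) / np.sum(np.maximum(x, y))

hamming(A_bin, B_bin)    # 3
tanimoto(A_bin, B_bin)   # 6/9 ~ 0.67
minmax(A_cnt, B_cnt)     # 8/13 ~ 0.62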
CATEGORICAL FEATURES
● When new data has unknown features: ignore them.
BACK TO NEAREST NEIGHBORS
ADVANTAGES OF KNN
● Training is very fast:
– just store the training examples;
– smart indexing procedures can be used to speed up prediction (at the cost of slower training).
● Keeps the training data:
– useful if we want to do something else with it.
● Rather robust to noisy data (averaging k votes).
● Can learn complex functions.
DRAWBACKS OF KNN
● Memory requirements.
● Prediction can be slow:
– complexity of labeling 1 new data point: O(nd) by brute force, for n training examples in d dimensions;
– but kNN works best with lots of samples...
→ Efficient data structures (k-d trees, ball trees): construction in O(n log n) time, queries in O(log n) on average in low dimension (see the sketch below).
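A sketch of the indexing idea with scikit-learn's KDTree (a real class; the random data is only for illustration):

import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))   # 1000 training points in 3 dimensions

tree = KDTree(X_train)                 # built once, at "training" time
x_new = rng.normal(size=(1, 3))
dist, ind = tree.query(x_new, k=5)     # distances and indices of the 5 nearest neighbors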
● Exercise: draw the Voronoi cell of the blue dot.
VORONOI TESSELLATION
● Voronoi cell of x:
– the set of all points of the space closer to x than to any other point of the training set;
– a polyhedron.
● Voronoi tessellation of the space: the union of all Voronoi cells.
VORONOI TESSELLATION
● The Voronoi tessellation defines the decision boundary of the 1-nearest-neighbor classifier (see the sketch below).
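The tessellation itself can be computed with SciPy (scipy.spatial.Voronoi is a real class; the random points are only for illustration). The 1-NN decision regions are then unions of the cells of same-class training points.

import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(20, 2))    # 20 training points in the plane

vor = Voronoi(X_train)
# vor.vertices: corners of the Voronoi cells
# vor.point_region / vor.regions: which cell belongs to which training point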
SUMMARY
● kNN:
– very simple training;
– prediction can be expensive.
● Relies on a “good” distance/similarity between instances.
● Decision boundary = Voronoi tessellation.
● Curse of dimensionality: hyperspace is very big.
REFERENCES
● A Course in Machine Learning. http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf
– kNN: Chap. 3.2-3.3
– Categorical variables: Chap. 3.1
– Curse of dimensionality: Chap. 3.5
● More on:
– k-d trees:
https://www.ri.cmu.edu/pub_files/pub1/moore_andrew_1991_1/moore_andrew_1991_1.pdf
http://www.alglib.net/other/nearestneighbors.php
– Voronoi tessellation:
http://philogb.github.io/blog/2010/02/12/voronoi-tessellation/
Lab
Even though we use the same scoring strategy, we don't get the same optimum. That's because the cross-validation evaluation strategy is different: scikit-learn computes one AUC per fold and averages them.
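For instance (a sketch; the dataset and model here are only placeholders), scikit-learn's cross_val_score returns one AUC per fold, and the cross-validated score is their average:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

fold_aucs = cross_val_score(model, X, y, cv=5, scoring="roc_auc")  # one AUC per fold
print(fold_aucs, fold_aucs.mean())                                 # their average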
The kNN performs much worse than the linear models. With such a large number of features, this is not unexpected.
Computing nearest neighbors based on correlation works better than based on Minkowski distances. Indeed, this makes it possible to compare the profiles of the gene expressions (which genes have high expression / low expression simultaneously). Still, logistic regression works best.