
CSC 411: Introduction to Machine Learning

Lecture 2: Nearest Neighbours

Mengye Ren and Matthew MacKay

University of Toronto



Introduction

Today (and for the next 5 weeks) we’re focused on supervised learning.
This means we’re given a training set consisting of inputs and corresponding labels.
Machine learning can be seen as learning a program: the labels are the expected outputs of the correct program when given the inputs.

Task                       Inputs            Labels
object recognition         image             object category
image captioning           image             caption
document classification    text              document category
speech-to-text             audio waveform    text
...                        ...               ...

Goal: correctly predict labels for data not in the training set (“in the wild”)
i.e. our ML algorithm must generalize



Input Vectors

Machine learning algorithms need to handle lots of types of data: images, text, audio waveforms, credit card transactions, etc.

Common strategy: represent the input as an input vector in R^d
- Representation = a mapping to another space that’s easy to manipulate
- Vectors are a great representation since we can do linear algebra! (A small sketch of turning an image into a vector follows below.)
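For instance, a small colour image can be flattened into one long vector of pixel intensities. A minimal sketch, assuming NumPy and a 32 × 32 RGB image; the array here is random stand-in data, not a real image:

    import numpy as np

    # Stand-in for a 32 x 32 RGB image with pixel values in [0, 255]
    image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

    # Flatten into an input vector x in R^d, with d = 32 * 32 * 3 = 3072
    x = image.astype(np.float64).reshape(-1)
    print(x.shape)   # (3072,)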



Input Vectors

What an image looks like to the computer:

[Image credit: Andrej Karpathy]



Input Vectors
Can use raw pixels:

Can do much better if you compute a vector of meaningful features.


Input Vectors

Mathematically, our training set consists of a collection of pairs of an input vector x ∈ R^d and its corresponding target, or label, t
- Regression: t is a real number (e.g. stock price)
- Classification: t is an element of a discrete set {1, . . . , C}
- These days, t is often a highly structured object (e.g. image)

Denote the training set {(x^(1), t^(1)), . . . , (x^(N), t^(N))}
- Note: these superscripts have nothing to do with exponentiation!



Nearest Neighbours
Suppose we’re given a novel input vector x we’d like to classify.
The idea: find the nearest input vector to x in the training set and copy its
label.
Can formalize “nearest” in terms of Euclidean distance:

    \|x - y\|_2 = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2}

Algorithm:
1. Find the example (x*, t*) in the stored training set closest to x. That is:

       x^* = \operatorname{argmin}_{x^{(i)} \in \text{training set}} \operatorname{dist}(x^{(i)}, x)

2. Output y = t*

Note: we don’t need to compute the square root. Why?
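A minimal 1-NN sketch along these lines, assuming NumPy; X_train, t_train, and x_query are hypothetical arrays standing in for the stored training set and a query point:

    import numpy as np

    def nearest_neighbour_predict(X_train, t_train, x_query):
        # X_train: (N, D) array of training inputs, t_train: (N,) array of labels
        # Squared distances suffice: dropping the square root never changes the argmin
        sq_dists = np.sum((X_train - x_query) ** 2, axis=1)
        return t_train[np.argmin(sq_dists)]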


Nearest Neighbours: Decision Boundaries

We can visualize the behavior in the classification setting using a Voronoi diagram.



Nearest Neighbours: Decision Boundaries

Decision boundary: the boundary between regions of input space assigned to different categories.



Nearest Neighbours: Decision Boundaries

Example: 3D decision boundary



k-Nearest Neighbours
[Pic by Olga Veksler]

Nearest Neighbours is sensitive to noise or mis-labeled data (“class noise”).
Solution?
Smooth by having the k nearest neighbours vote

Algorithm (kNN):
1. Find the k examples {(x^(r), t^(r))}_{r=1}^{k} closest to the test instance x
2. Classification output is the majority class:

       y = \arg\max_t \sum_{r=1}^{k} \mathbb{I}[t = t^{(r)}]
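A minimal sketch of this voting rule, assuming NumPy and integer class labels 0, ..., C−1; the names are hypothetical and mirror the 1-NN sketch above:

    import numpy as np

    def knn_predict(X_train, t_train, x_query, k=1):
        # Majority vote among the k training points closest to x_query
        sq_dists = np.sum((X_train - x_query) ** 2, axis=1)
        nearest = np.argsort(sq_dists)[:k]       # indices of the k closest training points
        votes = np.bincount(t_train[nearest])    # count each class among the neighbours
        return np.argmax(votes)                  # ties resolve to the smallest class label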



K-Nearest Neighbours
k=1

[Image credit: ”The Elements of Statistical Learning”]


K-Nearest Neighbours
k=15

[Image credit: ”The Elements of Statistical Learning”]


k-Nearest Neighbours

Tradeoffs in choosing k? Remember: the goal is to correctly classify unseen examples.

Small k
- Good at capturing fine-grained patterns
- May overfit, i.e. be sensitive to random idiosyncrasies in the training data

Large k
- Makes stable predictions by averaging over lots of examples
- May underfit, i.e. fail to capture important regularities

Rule of thumb: k < sqrt(n), where n is the number of training examples



Choosing Hyperparameters using a Validation Set

k is an example of a hyperparameter: something we can’t fit as part of the learning algorithm itself, but which controls the behavior of the algorithm.
We want to choose hyperparameters based on how well the algorithm generalizes.
Thus, we separate some of our available data into a validation set, distinct from the training set.
The model’s performance on the validation set indicates how well it generalizes:
- Choose the hyperparameters which lead to the best performance (lowest error) on the validation set, as in the sketch below
- Note: error here means the number of incorrectly classified examples
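A minimal sketch of that selection loop, assuming NumPy, the knn_predict sketch above, and hypothetical X_train/t_train and X_val/t_val splits; the candidate values of k are illustrative:

    import numpy as np

    def validation_error(X_train, t_train, X_val, t_val, k):
        # Fraction of validation examples that kNN misclassifies with this k
        preds = np.array([knn_predict(X_train, t_train, x, k=k) for x in X_val])
        return np.mean(preds != t_val)

    candidate_ks = [1, 3, 5, 7, 15, 31]
    errors = [validation_error(X_train, t_train, X_val, t_val, k) for k in candidate_ks]
    best_k = candidate_ks[int(np.argmin(errors))]   # hyperparameter with the lowest validation error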



Test Set
Now the hyperparameters might have overfit to the validation set! Validation performance is then not a good assessment of the generalization of the final algorithm.
Solution: separate an additional test set from the available data and evaluate on it once the hyperparameters are chosen.
- Available data is partitioned into 3 sets: training, validation, and test (a minimal split sketch follows below)

The test set is used only at the very end, to measure the generalization performance of the final configuration.
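One way to make that three-way partition, as a minimal sketch assuming NumPy and hypothetical arrays X and t holding all N available examples; the 70/15/15 proportions are illustrative, not from the lecture:

    import numpy as np

    rng = np.random.default_rng(0)
    N = len(X)                                  # X, t: hypothetical full dataset of N examples
    perm = rng.permutation(N)                   # shuffle before splitting
    n_train, n_val = int(0.7 * N), int(0.15 * N)

    train_idx = perm[:n_train]
    val_idx = perm[n_train:n_train + n_val]
    test_idx = perm[n_train + n_val:]           # used only at the very end

    X_train, t_train = X[train_idx], t[train_idx]
    X_val, t_val = X[val_idx], t[val_idx]
    X_test, t_test = X[test_idx], t[test_idx]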
K-Nearest Neighbours

[Image credit: ”The Elements of Statistical Learning”]



The Curse of Dimensionality

Low-dimensional visualizations are misleading!
- Given a new point, we want to classify it based on a point only a small distance away
- But in high dimensions, “most” points are far apart.

At least how many points are needed to guarantee the nearest neighbor is closer than ε?
- The volume of a single ball of radius ε is O(ε^d)
- The total volume of [0, 1]^d is 1.
- Therefore O((1/ε)^d) balls are needed to cover the volume.

Assuming the data follows a uniform distribution, the training set size must grow exponentially with the number of dimensions for points to be close by!



The Curse of Dimensionality

Edge length of the hypercube required to occupy a given fraction r of the volume of the unit hypercube [0, 1]^d is r^(1/d)
- If d = 10 and r = 0.1, the edge length required is 0.1^(1/10) ≈ 0.8
- To use 10% of the data to make our decision, we must cover 80% of the range of each dimension! (A quick numeric check follows below.)
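A quick check of the r^(1/d) formula for a few dimensions; a minimal sketch, with the chosen values of d purely illustrative:

    r = 0.1                          # fraction of the unit volume we want to capture
    for d in [1, 2, 3, 10, 100]:
        edge = r ** (1.0 / d)        # edge length of the sub-cube covering fraction r
        print(f"d = {d:3d}: edge length = {edge:.3f}")
    # d = 10 gives about 0.794, i.e. roughly 80% of each dimension's range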

[Image credit: ”The Elements of Statistical Learning”]



The Curse of Dimensionality
In high dimensions, “most” points are approximately the same distance apart (see the small simulation below).

Saving grace: some datasets (e.g. images) may have low intrinsic dimension, i.e. lie on or near a low-dimensional manifold. So nearest neighbours sometimes still works in high dimensions.
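A small simulation of that concentration effect; a minimal sketch assuming NumPy, with illustrative sample sizes and dimensions:

    import numpy as np

    rng = np.random.default_rng(0)
    for d in [2, 10, 100, 1000]:
        X = rng.random((1000, d))                        # 1000 uniform points in [0, 1]^d
        dists = np.linalg.norm(X[0] - X[1:], axis=1)     # distances from one point to the rest
        print(f"d = {d:5d}: min/max distance ratio = {dists.min() / dists.max():.2f}")
    # The ratio climbs toward 1 as d grows: the nearest and farthest points look alike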



Normalization

Nearest Neighbours can be sensitive to the ranges of different features.
Often, the units are arbitrary:

Simple fix: normalize each dimension to be zero mean and unit variance.
I.e., compute the mean µ_j and standard deviation σ_j, and take

    \tilde{x}_j = \frac{x_j - \mu_j}{\sigma_j}

Caution: depending on the problem, the scale might be important!
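A minimal sketch of that standardization, assuming NumPy and the hypothetical X_train/X_val/X_test arrays from earlier; the small constant added to σ is a zero-variance guard, not something from the lecture:

    import numpy as np

    # Per-dimension statistics, computed on the training set only
    mu = X_train.mean(axis=0)                   # shape (D,)
    sigma = X_train.std(axis=0) + 1e-8          # guard against zero-variance features

    X_train_norm = (X_train - mu) / sigma
    X_val_norm = (X_val - mu) / sigma           # reuse the training statistics on other splits
    X_test_norm = (X_test - mu) / sigma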



Computational Cost

Number of computations at training time: 0

Number of computations at test time, per query (naïve algorithm):
- Calculate D-dimensional Euclidean distances to N data points: O(ND)
- Sort the distances: O(N log N)

This must be done for each query, which is very expensive by the standards of a learning algorithm!

Need to store the entire dataset in memory!

Tons of work has gone into algorithms and data structures for efficient nearest neighbours with high dimensions and/or large datasets.
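As one concrete illustration of the naïve per-query cost above, the distance computation can at least be vectorized over a batch of queries; a minimal sketch assuming NumPy, not an optimization discussed in the lecture:

    import numpy as np

    def pairwise_sq_dists(Q, X):
        # Squared Euclidean distances between every query in Q (M, D) and every
        # training point in X (N, D), via ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2
        q_sq = np.sum(Q ** 2, axis=1)[:, None]   # (M, 1)
        x_sq = np.sum(X ** 2, axis=1)[None, :]   # (1, N)
        return q_sq - 2.0 * (Q @ X.T) + x_sq     # (M, N)

    # For kNN, np.argpartition(dists, k, axis=1)[:, :k] finds the k smallest
    # distances per query in O(N) time, avoiding the full O(N log N) sort.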



Example: Digit Classification
Decent performance when there is lots of data



Example: Digit Classification

KNN can perform a lot better with a good similarity measure.

Example: shape contexts for object recognition. In order to achieve invariance to image transformations, they tried to warp one image to match the other image.
- Distance measure: average distance between corresponding points on the warped images
Achieved 0.63% error on MNIST, compared with 3% for Euclidean KNN.
Competitive with conv nets at the time, but required careful engineering.

[Belongie, Malik, and Puzicha, 2002. Shape matching and object recognition using shape
contexts.]
Example: 80 Million Tiny Images

80 Million Tiny Images was the first extremely large image dataset. It consisted of color images scaled down to 32 × 32.
With a large dataset, you can find much better semantic matches, and KNN can do some surprising things.
Note: this required a carefully chosen similarity metric.

[Torralba, Fergus, and Freeman, 2007. 80 Million Tiny Images.]



Example: 80 Million Tiny Images

[Torralba, Fergus, and Freeman, 2007. 80 Million Tiny Images.]


Conclusions

Simple algorithm that does all its work at test time — in a sense, no learning!

Can control the complexity by varying k

Suffers from the Curse of Dimensionality

Next time: decision trees, another approach to regression and classification

