Lec 02
University of Toronto
Today (and for the next 5 weeks) we’re focused on supervised learning.
This means we’re given a training set consisting of inputs and
corresponding labels
Machine learning: learning a program. The labels are the outputs the correct program would produce when given the inputs.
Goal: correctly predict labels for data not in the training set (“in the wild”)
i.e. our ML algorithm must generalize
Algorithm:
1. Find the example (x∗, t∗) in the stored training set closest to x. That is,

   x^* = \operatorname{argmin}_{x^{(i)} \in \text{training set}} \operatorname{dist}(x^{(i)}, x)

2. Output y = t∗
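As a concrete illustration, here is a minimal numpy sketch of the 1-NN rule above (the function and variable names are mine, not from the slides), using Euclidean distance for dist(·, ·):

    import numpy as np

    def nearest_neighbour_predict(X_train, t_train, x):
        # Distance from the query x to every stored training example
        # (Euclidean here, but any dist(., .) could be substituted).
        dists = np.linalg.norm(X_train - x, axis=1)
        # Return the label of the single closest example.
        return t_train[np.argmin(dists)]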
Algorithm (kNN):
1. Find the k examples {(x^{(r)}, t^{(r)})}_{r=1}^{k} closest to the test instance x
2. Classification output is the majority class:

   y = \operatorname{argmax}_{t} \sum_{r=1}^{k} \mathbb{I}[t = t^{(r)}]
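A correspondingly hedged numpy sketch of the kNN rule, again with illustrative names and Euclidean distance:

    import numpy as np

    def knn_predict(X_train, t_train, x, k=3):
        # Distance from the query x to every training example.
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training examples.
        nearest = np.argsort(dists)[:k]
        # Majority vote over their labels (ties broken arbitrarily).
        labels, counts = np.unique(t_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]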
Large k:
- Makes stable predictions by averaging over lots of examples
- May underfit, i.e. fail to capture important regularities
The test set is used only at the very end, to measure the generalization
performance of the final configuration.
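To make this concrete, here is an illustrative sketch (the toy data and split sizes are my own, and it reuses the knn_predict sketch above) of choosing k on a held-out validation set and touching the test set only once at the end:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-class problem, purely illustrative (not from the lecture).
    X = rng.normal(size=(300, 2))
    t = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Train / validation / test split; the test set is used exactly once.
    X_tr, t_tr = X[:200], t[:200]
    X_val, t_val = X[200:250], t[200:250]
    X_te, t_te = X[250:], t[250:]

    def accuracy(X_ref, t_ref, X_query, t_query, k):
        preds = [knn_predict(X_ref, t_ref, x, k) for x in X_query]
        return np.mean(np.array(preds) == t_query)

    # Choose k on the validation set ...
    best_k = max([1, 3, 5, 7, 9, 15],
                 key=lambda k: accuracy(X_tr, t_tr, X_val, t_val, k))
    # ... then measure generalization on the test set once, at the very end.
    test_accuracy = accuracy(X_tr, t_tr, X_te, t_te, best_k)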
K-Nearest Neighbours
Saving grace: some datasets (e.g. images) may have low intrinsic dimension, i.e. lie on or near a low-dimensional manifold. So nearest neighbours sometimes still works in high dimensions.
Simple fix: normalize each dimension to be zero mean and unit variance.
I.e., compute the mean µ_j and standard deviation σ_j, and take

   \tilde{x}_j = \frac{x_j - \mu_j}{\sigma_j}
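A minimal numpy sketch of this normalization; computing µ_j and σ_j on the training set and reusing them for query points is my assumption about the intended usage, not something stated on the slide:

    import numpy as np

    def standardize(X_train, X_query):
        # Per-dimension mean and standard deviation of the training data.
        mu = X_train.mean(axis=0)
        sigma = X_train.std(axis=0)
        # Apply the same shift and scale to training and query points.
        return (X_train - mu) / sigma, (X_query - mu) / sigma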
The nearest-neighbour search must be repeated for each query, which is very expensive by the standards of a learning algorithm!
Tons of work has gone into algorithms and data structures for efficient nearest neighbours in high dimensions and/or on large datasets.
[Belongie, Malik, and Puzicha, 2002. Shape matching and object recognition using shape
contexts.]
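As one illustration (not something the slides prescribe), a k-d tree index such as scikit-learn's KDTree can be built once so that each query avoids a brute-force scan over the whole training set, at least in low to moderate dimension:

    import numpy as np
    from sklearn.neighbors import KDTree   # assumes scikit-learn is available

    X_train = np.random.randn(10000, 8)    # illustrative data, not from the lecture
    tree = KDTree(X_train)                 # build the index once, up front

    x_query = np.random.randn(1, 8)
    dist, ind = tree.query(x_query, k=5)   # distances and indices of the 5 nearest points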
Example: 80 Million Tiny Images
Simple algorithm that does all its work at test time — in a sense, no
learning!