Lazy Learners Unit 2
Lazy Learners (Instance-Based Learners)
Outline
• Introduction
• k-Nearest-Neighbor Classifiers
Introduction
Example Problem: Face Recognition
• We have a database of (say) 1 million face images
• We are given a new image and want to find the most similar images in the database
• Represent faces by (relatively) invariant values, e.g., the ratio of nose width to eye width
• Each image is represented by a large number of numerical features
• Problem: given the features of a new face, find those in the DB that are close in at least ¾ (say) of the features (see the sketch after this list)
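One way to read that matching rule as code is the rough Python sketch below; the feature vectors, the per-feature tolerance, and the helper names are made-up placeholders rather than anything specified on the slides.

# Sketch: find database faces that agree with the query on at least 3/4 of features.
# The feature vectors and the tolerance are illustrative placeholders.

def close_enough(a, b, tol=0.05):
    """A single feature 'matches' if the two values differ by at most tol."""
    return abs(a - b) <= tol

def candidate_matches(database, query, min_fraction=0.75):
    matches = []
    for face_id, features in database.items():
        agree = sum(close_enough(f, q) for f, q in zip(features, query))
        if agree >= min_fraction * len(query):
            matches.append(face_id)
    return matches

# Tiny illustrative database: face id -> feature vector (e.g. nose/eye width ratio, ...)
db = {"face_1": [0.62, 1.10, 0.33, 0.91],
      "face_2": [0.40, 0.95, 0.70, 0.88]}
print(candidate_matches(db, [0.60, 1.08, 0.35, 0.50]))   # -> ['face_1']

A linear scan like this over a million images is slow; the later material in this unit on finding nearest neighbors efficiently (kD-trees) is aimed at exactly that problem.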
Introduction
• Typical approaches
– k-nearest neighbor approach
◆ Instances represented as points in a Euclidean space.
– Case-based reasoning
◆ Uses symbolic representations and knowledge-based inference
k-Nearest-Neighbor Classifiers
• Example:
– We are interested in classifying the type of drug a patient should be prescribed
– Based on the age of the patient and the patient’s sodium/potassium ratio (Na/K)
– Dataset includes 200 patients
Scatter plot
On the scatter plot, light gray points indicate drug Y; medium gray points indicate drug A or X; dark gray points indicate drug B or C.
Close-up of neighbors to new patient 2
• Main questions:
– How many neighbors should we consider? That is, what is k?
– How do we measure distance?
– Should all points be weighted equally, or should some points have more influence than others? (A small sketch illustrating these choices follows this list.)
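To make these questions concrete, here is a minimal k-NN classifier sketched in Python for the drug example above, under assumptions of this rewrite rather than the slides: Euclidean distance over the numeric attributes, a majority vote among the k nearest training points, and optional inverse-distance weighting so that closer neighbors count more. The patient records shown are invented, not rows from the 200-patient dataset.

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3, weighted=False):
    # Sort training records by distance to the query point and keep the k nearest.
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter()
    for features, label in neighbors:
        if weighted:
            # Inverse-distance weighting: closer neighbors get a larger vote.
            d = euclidean(features, query)
            votes[label] += 1.0 / (d * d + 1e-9)
        else:
            votes[label] += 1.0
    return votes.most_common(1)[0][0]

# Hypothetical patients: ((age, Na/K ratio), drug)
train = [((23, 25.3), "Y"), ((47, 10.1), "A"), ((61, 7.8), "B"),
         ((35, 27.8), "Y"), ((52, 9.4), "C"), ((28, 12.5), "X")]
print(knn_classify(train, (40, 11.0), k=3, weighted=True))   # -> A

With weighted=False each of the k neighbors gets an equal vote; with weighted=True each vote is scaled by 1/d², one common weighting scheme. In practice the attributes would usually be normalized first so that age and the Na/K ratio are on comparable scales.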
Categorical Attributes
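The body of this slide did not survive extraction. As a placeholder, the sketch below shows one common convention for categorical attributes in k-NN, assumed here rather than taken from the slides: a categorical attribute contributes 0 to the distance when the two values are identical and 1 when they differ, alongside the usual numeric differences.

import math

def mixed_distance(a, b, categorical):
    """a, b: attribute tuples; categorical: set of indices holding categorical values.
    Numeric attributes are assumed to be normalized to comparable scales."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in categorical:
            total += 0.0 if x == y else 1.0   # "different from": identical -> 0, different -> 1
        else:
            total += (x - y) ** 2             # numeric attributes as usual
    return math.sqrt(total)

# Example: (age, gender, Na/K ratio) with gender categorical (index 1).
print(mixed_distance((40, "M", 11.0), (47, "F", 10.1), categorical={1}))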
Missing Values
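This slide's body is also missing. One simple and common convention, assumed here, is to treat a missing value pessimistically, i.e. as maximally different from whatever it is compared against:

# Sketch: per-attribute difference that treats missing values (None) as maximally
# different. Numeric attributes are assumed to be scaled to [0, 1].

def attribute_difference(x, y, is_categorical):
    if x is None or y is None:
        return 1.0                      # missing: assume the largest possible difference
    if is_categorical:
        return 0.0 if x == y else 1.0
    return abs(x - y)                   # numeric, already scaled to [0, 1]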
Determining a good value for k
• k can be determined experimentally.
• Starting with k = 1, we use a test set to estimate the error rate of the classifier.
• This process is repeated, each time incrementing k to allow for one more neighbor.
• The k value that gives the minimum error rate may be selected.
• In general, the larger the number of training tuples, the larger the value of k will be. (A sketch of this experiment follows the list.)
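A minimal version of that experiment, sketched with scikit-learn; the arrays X (features) and y (class labels), the 70/30 split, and the range of candidate k values are placeholders of this rewrite, not details from the slides.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def choose_k(X, y, max_k=25):
    # Hold out a test set once, then estimate the error rate for k = 1, 2, ..., max_k.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    best_k, best_error = None, float("inf")
    for k in range(1, max_k + 1):
        model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        error = 1.0 - model.score(X_te, y_te)   # misclassification rate on the test set
        if error < best_error:
            best_k, best_error = k, error
    return best_k, best_error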
Finding nearest neighbors efficiently
kD-trees
kD-tree example
Using kD-trees: example
• The target, which is not one of the instances in the tree, is marked by a star.
• The leaf node of the region containing the target is colored black.
• To determine whether a closer neighbor exists, first check whether it is possible for one to lie within the node’s sibling region.
• Then back up to the parent node and check its sibling. (A sketch of this search appears after the list.)
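The search just described can be sketched in a few dozen lines of Python. This is an illustrative 2-D implementation under assumptions of this rewrite (median splits on alternating axes), not the code behind the figure.

import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point   # the splitting instance stored at this node
        self.axis = axis     # 0 or 1: which coordinate this node splits on
        self.left = left
        self.right = right

def build(points, depth=0):
    """Recursively build a kD-tree by splitting on the median along alternating axes."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, target, best=None):
    """Return the stored point closest to target (Euclidean distance)."""
    if node is None:
        return best
    if best is None or dist(node.point, target) < dist(best, target):
        best = node.point
    # Descend first into the half of the space that contains the target ...
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # ... then back up and search the sibling subtree only if the splitting plane
    # is closer to the target than the best neighbor found so far (the check
    # described in the bullets above).
    if abs(diff) < dist(best, target):
        best = nearest(far, target, best)
    return best

# Example usage with made-up 2-D points:
tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (6, 5)))   # -> (5, 4)

The abs(diff) < dist(best, target) test is the pruning step: the sibling region is visited only when it could still contain something closer than the current best.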
More on kD-trees
Building trees incrementally