Lazy Learning
(Or Learning from Your Neighbors)
Lazy Learning vs. Eager Learning
– Eager learning
Given a training set, constructs a classification model before receiving new
(e.g., test) data to classify
e.g., decision tree induction, Bayesian classification, rule-based classification
– Lazy learning
Simply stores the training data (or does only minor processing) and waits until it is given a new
instance to classify
Lazy learners take less time in training but more time in predicting
e.g., k-nearest-neighbor classifiers, case-based reasoning classifiers
Typical approaches of lazy learning:
–k-nearest neighbor approach
Instances represented as points in a Euclidean space.
–Case-based reasoning
Uses symbolic representations and knowledge-based inference
–Locally weighted regression
k-Nearest-Neighbor Method
–First described in the early 1950s.
–Has since been widely used in pattern recognition.
–The training instances are described by n attributes.
–Each instance represents a point in an n-dimensional space.
–A k-nearest-neighbor classifier searches the pattern space for the k training instances that are
closest to the unknown instance.
Requires three things:
o Feature space (the stored training data)
o Distance metric
to compute the distance between records
o The value of k
the number of nearest neighbors to retrieve; the majority class among them is the
prediction
To classify an unknown record:
o Compute its distance to every training record
o Identify the k nearest neighbors
o Use the class labels of the nearest neighbors to determine the class label of the unknown
record (e.g., by majority vote)
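The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name knn_classify and the toy data are hypothetical, Euclidean distance is assumed as the metric, and ties are broken arbitrarily by whichever class Counter happens to list first.

```python
import math
from collections import Counter

def knn_classify(training, query, k):
    """Return the majority class among the k training records nearest to query.

    training: list of (features, label) pairs; features is a tuple of numbers.
    """
    # Step 1: distance from the query to every training record.
    scored = [(math.dist(features, query), label) for features, label in training]
    # Step 2: keep the k records with the smallest distance.
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the neighbors' class labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, not taken from the text.
training = [
    ((1.0, 1.0), "square"),
    ((1.5, 2.0), "square"),
    ((5.0, 5.0), "triangle"),
    ((6.0, 5.5), "triangle"),
    ((5.5, 6.5), "triangle"),
]
print(knn_classify(training, (1.2, 1.4), k=3))
```

Note that nothing is computed until a query arrives: all of the work happens at prediction time, which is what makes the method lazy.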
The unknown record (marked "?" in the figure) is classified differently depending on k:
k = 1:
– Belongs to square class
k = 3:
– Belongs to triangle class
k = 7:
– Belongs to square class
Choosing the value of k:
o If k is too small, the classifier is sensitive to noise points
o If k is too large, the neighborhood may include points from other classes
o For a two-class problem, choose an odd value of k to avoid tied votes
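The point about ties can be illustrated with a small sketch. The helper vote is hypothetical; it returns every class that shares the top vote count, so a result with more than one entry means a tie. The label lists are made up for illustration.

```python
from collections import Counter

def vote(neighbor_labels):
    """Return the classes tied for the most votes (a single entry means no tie)."""
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    return sorted(label for label, n in counts.items() if n == top)

# Even k with two classes: one vote each, so the vote is tied.
print(vote(["square", "triangle"]))
# Odd k with two classes: one class must have more votes.
print(vote(["square", "triangle", "square"]))
# With three or more classes, an odd k can still tie.
print(vote(["square", "triangle", "circle"]))
```

An odd k therefore guarantees a winner only when there are exactly two classes.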
Common Distance Metrics:
o Euclidean distance (continuous attributes)
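For two records x and y with n attributes, the Euclidean distance is the square root of the sum of squared attribute differences. A minimal sketch follows; the function name euclidean_distance is illustrative, and the standard library's math.dist computes the same quantity.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two records with the same number of attributes."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    # Sum the squared per-attribute differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```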
Examples
Name    Acid durability  Strength  Class
Type-1  7                7         Bad
Type-2  7                4         Bad
Type-3  3                4         Good
Type-4  1                4         Good
Test data: acid durability = 3, strength = 7, class = ?
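Distance from the test record (3, 7) to each training record, using Euclidean distance:
Name    Acid durability  Strength  Class  Distance
Type-1  7                7         Bad    sqrt((7-3)^2 + (7-7)^2) = 4
Type-2  7                4         Bad    sqrt((7-3)^2 + (4-7)^2) = 5
Type-3  3                4         Good   sqrt((3-3)^2 + (4-7)^2) = 3
Type-4  1                4         Good   sqrt((1-3)^2 + (4-7)^2) ≈ 3.6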
Name    Acid durability  Strength  Class  Distance  Rank
Type-1  7                7         Bad    4         3
Type-2  7                4         Bad    5         4
Type-3  3                4         Good   3         1
Type-4  1                4         Good   3.6       2
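The distance and rank columns can be reproduced with a short sketch. The data are the four training records and the test record from the example; the variable names are illustrative only.

```python
import math

# Training records from the example: name -> ((acid durability, strength), class).
records = {
    "Type-1": ((7, 7), "Bad"),
    "Type-2": ((7, 4), "Bad"),
    "Type-3": ((3, 4), "Good"),
    "Type-4": ((1, 4), "Good"),
}
query = (3, 7)  # test record: acid durability = 3, strength = 7

# Euclidean distance from the test record to each training record.
distances = {name: math.dist(features, query) for name, (features, _) in records.items()}

# Rank 1 is the nearest record.
ranked = sorted(distances, key=distances.get)
for rank, name in enumerate(ranked, start=1):
    print(f"{name}  {records[name][1]:<4}  {distances[name]:.1f}  {rank}")
```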
k = 1:
The nearest neighbor is Type-3 (Good), so the predicted class is Good.
k = 2:
The two nearest neighbors are Type-3 and Type-4, both Good, so the predicted class is Good.
k = 3:
The three nearest neighbors are Type-3 (Good), Type-4 (Good), and Type-1 (Bad): 2 Good and 1 Bad, so the majority class is Good.
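The votes for k = 1, 2, and 3 can be checked with a small sketch. It assumes the class labels are already listed in rank order (nearest first), as in the ranked table; the helper name predict is hypothetical.

```python
from collections import Counter

# Class labels in rank order (nearest first): Type-3, Type-4, Type-1, Type-2.
ranked_labels = ["Good", "Good", "Bad", "Bad"]

def predict(k):
    """Majority class among the k nearest neighbors."""
    return Counter(ranked_labels[:k]).most_common(1)[0][0]

for k in (1, 2, 3):
    print(k, predict(k))
```

All three values of k give the same prediction for this test record, Good.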