Classification, Part 2
Overview
• Rule based Classifiers
• Nearest-neighbor Classifiers
Basic Definitions
• Coverage of a rule:
– Fraction of instances that satisfy the antecedent of the rule
• Accuracy of a rule:
– Fraction of instances covered by the rule (i.e., satisfying the antecedent) that also satisfy the consequent
• Example: (Marital Status = Married) → No
– Coverage = 40%, Accuracy = 100%
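These two measures can be computed directly. The sketch below uses a small hypothetical 10-record dataset (marital status, defaulted?) chosen so that the example rule (Marital Status = Married) → No has coverage 40% and accuracy 100%; the records themselves are illustrative, not from the original slides.

```python
# Hypothetical 10-record dataset: (Marital Status, Defaulted?)
records = [
    ("Single", "No"), ("Married", "No"), ("Single", "No"),
    ("Married", "No"), ("Divorced", "Yes"), ("Married", "No"),
    ("Divorced", "No"), ("Single", "Yes"), ("Married", "No"),
    ("Single", "Yes"),
]

# Rule: (Marital Status = Married) -> No
# Instances satisfying the antecedent:
covered = [r for r in records if r[0] == "Married"]
# Covered instances whose class also matches the consequent:
correct = [r for r in covered if r[1] == "No"]

coverage = len(covered) / len(records)   # 4 / 10 = 0.4
accuracy = len(correct) / len(covered)   # 4 / 4  = 1.0
print(coverage, accuracy)  # 0.4 1.0
```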
Data Mining Sanjay Ranka Spring 2011
University of Florida CISE department Gator Engineering
Indirect Method:
From Decision Trees to Rules
Example
RIPPER
Eager Learners
• So far we have learnt that classification involves
– An inductive step for constructing classification models
from data
– A deductive step for applying the derived model to
previously unseen instances
• For decision tree induction and rule based
classifiers, the models are constructed
immediately after the training set is provided
• Such techniques are known as eager learners
because they learn the model as soon as the
training data becomes available
Lazy Learners
• An opposite strategy is to delay the process of
generalizing the training data until it is needed
to classify an unseen instance
• Techniques that employ such a strategy are
known as lazy learners
• An example of a lazy learner is the Rote
Classifier, which memorizes the entire training
data and performs classification only if the
attributes of a test instance exactly match one
of the training instances
Nearest-neighbor Classifiers
• One way to make the “Rote Classifier”
approach more flexible is to find all training
instances that are relatively similar to the test
instance. These are called the nearest
neighbors of the test instance
• The test instance can then be classified
according to the class labels of its neighbors
• “If it walks like a duck, quacks like a duck, and
looks like a duck, then it’s probably a duck”
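The idea above can be sketched as a small k-nearest-neighbor classifier: compute the distance from the test instance to every training instance, take the k closest, and classify by majority vote. The four training points and labels below are made up for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, test_point, k=3):
    """Classify test_point by majority vote among its k nearest neighbors.

    train is a list of ((x1, ..., xd), label) pairs.
    """
    # Sort training instances by distance to the test instance
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], test_point))[:k]
    # Majority vote among the k nearest neighbors' labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data
train = [((1.0, 1.0), "duck"), ((1.2, 0.9), "duck"),
         ((5.0, 5.0), "goose"), ((5.1, 4.8), "goose")]

print(knn_predict(train, (1.1, 1.0), k=3))  # duck
```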
1-nearest Neighbor
Distance Metric
• A distance metric is required to compute the
distance between two instances
• A nearest neighbor classifier represents each
instance as a data point embedded in a d-
dimensional space, where d is the number of
continuous attributes
• Euclidean Distance: d(x, y) = √( Σᵢ (xᵢ − yᵢ)² )
• Weighted Distance
– Weight factor, w = 1 / d2
– Weight the vote according to the distance
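Distance-weighted voting can be sketched as follows: each of the k nearest neighbors contributes a vote weighted by w = 1/d², so closer neighbors count more. The small epsilon term is an assumption added here to avoid division by zero when a neighbor coincides with the test point; the training data is hypothetical.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_knn_predict(train, test_point, k=3, eps=1e-9):
    """Classify test_point by distance-weighted vote (w = 1/d^2)."""
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], test_point))[:k]
    scores = defaultdict(float)
    for point, label in neighbors:
        d = euclidean(point, test_point)
        scores[label] += 1.0 / (d ** 2 + eps)  # weight factor w = 1 / d^2
    # Return the label with the largest total weighted vote
    return max(scores, key=scores.get)

# Hypothetical 2-D training data
train = [((1.0, 1.0), "duck"), ((1.2, 0.9), "duck"),
         ((5.0, 5.0), "goose"), ((5.1, 4.8), "goose")]

print(weighted_knn_predict(train, (1.1, 1.0), k=3))  # duck
```

With w = 1/d², even if a distant class holds a numerical majority among the k neighbors, a single very close neighbor can dominate the vote.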