Data Mining
COMP527: Data Mining
M. Sulaiman Khan
([email protected])
Introduction
Rule Sets vs Rule Lists
Constructing Rule-based Classifiers
1R
PRISM
Reduced Error Pruning
RIPPER
Rules with Exceptions
Idea: Learn a set of rules from the data, then apply those rules to
determine the class of a new instance.
For example:
R1. If blood-type=Warm and lay-eggs=True then Bird
R2. If blood-type=Cold and flies=False then Reptile
R3. If blood-type=Warm and lay-eggs=False then Mammal
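For illustration, a minimal Python sketch (not from the lecture) of how rules
like R1-R3 could be represented and applied; the attribute names simply mirror
the rules above:

# Each rule is a (conditions, class) pair; conditions maps attribute -> required value.
rules = [
    ({"blood-type": "Warm", "lay-eggs": True},  "Bird"),     # R1
    ({"blood-type": "Cold", "flies": False},    "Reptile"),  # R2
    ({"blood-type": "Warm", "lay-eggs": False}, "Mammal"),   # R3
]

def classify(instance, rules, default=None):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    return default

print(classify({"blood-type": "Warm", "lay-eggs": True, "flies": True}, rules))  # Bird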
Set:
  The rules make independent predictions; order does not matter.
  Every record is covered by 0..1 rules (hopefully exactly 1!)
List:
  The rules make dependent predictions; a rule applies only if no earlier
  rule fires.
  Every record is covered by 0..* rules (hopefully 1..*!)
If all records are covered by at least one rule, the rule set or list is
considered exhaustive.
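A hedged sketch (reusing the rule representation from the previous sketch) of
how list and set application differ: a list is scanned in order and the first
match wins, while a set gathers every matching rule, making conflicts and
uncovered records visible:

def apply_as_list(instance, rules, default="Unknown"):
    for conditions, label in rules:
        if all(instance.get(a) == v for a, v in conditions.items()):
            return label          # first matching rule decides
    return default                # record not covered by any rule

def apply_as_set(instance, rules):
    labels = {label for conditions, label in rules
              if all(instance.get(a) == v for a, v in conditions.items())}
    # 0 labels: record not covered; 2+ labels: conflicting predictions
    return labels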
Rules generated:

Attribute     Rules                  Errors  Total errors
Outlook       sunny → no             2/5     4/14
              overcast → yes         0/4
              rainy → yes            2/5
Temperature   hot → no (random)      2/4     5/14
              mild → yes             2/6
              cool → yes             1/4
Humidity      high → no              3/7     4/14
              normal → yes           1/7
Windy         false → yes            2/8     5/14
              true → no (random)     3/6
Now choose the attribute with the fewest errors. Outlook and Humidity tie
at 4/14, so decide randomly; say Outlook. 1R will then simply use the
Outlook attribute to predict the class for new instances.
1R Algorithm
foreach attribute,
    foreach value of that attribute,
        find class distribution for attribute=value
        conc = most frequent class
        make rule: attribute=value -> conc
    calculate error rate of this attribute's ruleset
select ruleset with lowest error rate
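A compact Python version of 1R (an illustrative sketch assuming the standard
weather data, not the lecture's own code) that reproduces the error counts in
the table above:

from collections import Counter, defaultdict

weather = [
    # (outlook, temperature, humidity, windy, play)
    ("sunny", "hot", "high", False, "no"),       ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),   ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),   ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"), ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),   ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"), ("rainy", "mild", "high", True, "no"),
]
attributes = ["outlook", "temperature", "humidity", "windy"]

def one_r(data, attributes):
    best = None
    for i, attr in enumerate(attributes):
        # class distribution for each value of this attribute
        dist = defaultdict(Counter)
        for row in data:
            dist[row[i]][row[-1]] += 1
        # one rule per value: predict the most frequent class (ties broken arbitrarily)
        rules = {value: counts.most_common(1)[0][0] for value, counts in dist.items()}
        errors = sum(sum(counts.values()) - max(counts.values()) for counts in dist.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

attr, rules, errors = one_r(weather, attributes)
print(attr, rules, errors)  # outlook {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'} 4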
This covers:

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None
Candidate conditions for the next rule (accuracy p/t for class yes, after
removing the instances covered by outlook = overcast):
sunny (2/5), rainy (3/5), hot (0/2), mild (3/5), cool (2/3),
high (1/5), normal (4/5), false (4/6), true (1/4)
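The fractions above are the p/t accuracies PRISM uses to pick the next test
for a rule. A minimal sketch of that computation (assuming the weather list
and attributes defined in the 1R sketch above; not the lecture's code):

def candidate_accuracies(data, attributes, target_class):
    """Score every attribute=value test by (p, t): t = instances covered,
    p = covered instances that belong to the target class."""
    scores = {}
    for i, attr in enumerate(attributes):
        for value in {row[i] for row in data}:
            covered = [row for row in data if row[i] == value]
            p = sum(1 for row in covered if row[-1] == target_class)
            scores[(attr, value)] = (p, len(covered))
    return scores

# Instances left after the rule "if outlook = overcast then yes":
remaining = [row for row in weather if row[0] != "overcast"]
for (attr, value), (p, t) in sorted(candidate_accuracies(remaining, attributes, "yes").items()):
    print(f"{attr}={value}: {p}/{t}")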
We could split the training set into a growing set and a pruning set:
grow rules using the growing set, then try to cut them back using the
pruning set.
Two strategies:
(p + (N - n)) / T
= (true positives + true negatives) / total number of instances,
where p = positive instances covered by the rule (true positives),
n = negative instances covered, N = total negative instances,
and T = total number of instances.
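A hedged sketch of evaluating this worth measure for one rule on a pruning
set (the rule and instance representation follows the earlier sketches and is
an assumption, not the lecture's code):

def rule_covers(conditions, instance):
    return all(instance.get(a) == v for a, v in conditions.items())

def worth(conditions, target_class, pruning_set):
    """pruning_set: list of (instance_dict, class_label) pairs."""
    T = len(pruning_set)
    N = sum(1 for _, label in pruning_set if label != target_class)
    covered = [label for inst, label in pruning_set if rule_covers(conditions, inst)]
    p = sum(1 for label in covered if label == target_class)  # true positives
    n = len(covered) - p                                      # false positives
    return (p + (N - n)) / T  # (true positives + true negatives) / total instances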
If there are 2 classes, learn rules for one class and make the other the default.
If there are more than 2 classes, take the classes from smallest to largest,
learning rules for each in turn, until only 2 remain (the largest class
becomes the default).
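An illustrative sketch (an assumption about the ordering described above, not
RIPPER's actual implementation): classes are taken from smallest to largest,
rules are learned for each in turn, and the largest class is kept as the default:

from collections import Counter

def class_order(labels):
    """Return (classes to learn rules for, default class): least frequent
    first, with the most frequent class left as the default."""
    ordered = [c for c, _ in Counter(labels).most_common()][::-1]
    return ordered[:-1], ordered[-1]

# e.g. class_order(["no"]*5 + ["yes"]*9) -> (["no"], "yes")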