
COMP527: Data Mining
Classification: Rules
February 10, 2009

M. Sulaiman Khan
([email protected])

Dept. of Computer Science
University of Liverpool


Course Outline

Introduction to the Course
Introduction to Data Mining
Introduction to Text Mining
General Data Mining Issues
Data Warehousing
Classification: Challenges, Basics
Classification: Rules
Classification: Trees
Classification: Trees 2
Classification: Bayes
Classification: Neural Networks
Classification: SVM
Classification: Evaluation
Classification: Evaluation 2
Regression, Prediction
Input Preprocessing
Attribute Selection
Association Rule Mining
ARM: A Priori and Data Structures
ARM: Improvements
ARM: Advanced Techniques
Clustering: Challenges, Basics
Clustering: Improvements
Clustering: Advanced Algorithms
Hybrid Approaches
Graph Mining, Web Mining
Text Mining: Challenges, Basics
Text Mining: Text-as-Data
Text Mining: Text-as-Language
Revision for Exam

Today's Topics

Introduction
Rule Sets vs Rule Lists
Constructing Rules-based Classifiers
1R
PRISM
Reduced Error Pruning
RIPPER
Rules with Exceptions

Rules-Based Classifiers

Idea: Learn a set of rules from the data. Apply those rules to
determine the class of the new instance.

For example:
R1. If blood-type=Warm and lay-eggs=True then Bird
R2. If blood-type=Cold and flies=False then Reptile
R3. If blood-type=Warm and lay-eggs=False then Mammal

Hawk: flies=True, blood-type=Warm, lay-eggs=True, class=???

R1 is True, so the classifier predicts that Hawk = Bird.


Yay!
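To make the application step concrete, here is a minimal Python sketch (illustrative only, not from the slides) of these three rules and the hawk instance, classifying by the first rule whose conditions all match:

# Each rule is (conditions, predicted class); a rule covers an instance
# when every condition matches.
rules = [
    ({"blood-type": "Warm", "lay-eggs": True}, "Bird"),     # R1
    ({"blood-type": "Cold", "flies": False}, "Reptile"),    # R2
    ({"blood-type": "Warm", "lay-eggs": False}, "Mammal"),  # R3
]

hawk = {"flies": True, "blood-type": "Warm", "lay-eggs": True}

def classify(instance, rules):
    for conditions, cls in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return cls
    return None  # no rule covers the instance

print(classify(hawk, rules))  # Bird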

Rules-Based Classifiers

A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule.

The coverage of a rule is the percentage of records that satisfy its condition.

The accuracy of a rule is the percentage of covered records that also satisfy its conclusion (the predicted class).

For example, a rule might cover 10/50 records (coverage 20%), of which 8 are correct (accuracy 80%).
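In code, coverage and accuracy might be computed like this (a sketch; the condition and conclusion are passed as illustrative Python callables):

def coverage_and_accuracy(records, condition, conclusion):
    # coverage: fraction of records satisfying the condition
    # accuracy: fraction of covered records also satisfying the conclusion
    covered = [r for r in records if condition(r)]
    coverage = len(covered) / len(records)
    accuracy = (sum(1 for r in covered if conclusion(r)) / len(covered)) if covered else 0.0
    return coverage, accuracy

# e.g. 50 records, 10 covered, 8 of them correct  ->  (0.2, 0.8)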

Rule Set vs Rule List

Rules can either be grouped as a set or an ordered list.

Set:
The rules make independent predictions.
Every record is covered by 0..1 rules (hopefully 1!)

R1. If flies=True and lays-eggs=True and lives-in-water=False then Bird
R2. If flies=False and lives-in-water=True and lays-eggs=True then Fish
R3. If blood-type=Warm and lays-eggs=False then Mammal
R4. If blood-type=Warm and lays-eggs=True then Reptile

Doesn’t matter which order we evaluate these rules.

Rule Set vs Rule List

List:
The rules make dependent predictions.
Every record is covered by 0..* rules (hopefully 1..*!)

R1. If flies=True and lays-eggs=True then Bird
R2. If blood-type=Warm and lays-eggs=False then Mammal
R3. If lives-in-water=True then Fish
R4. If lays-eggs=True then Reptile

Does matter which order we evaluate these rules.

If all records are covered by at least one rule, then the rule set or list is considered exhaustive.

Constructing Rules-Based Classifiers

Covering approach: At each stage, a rule is found that covers some instances.

“Separate and Conquer” -- Choose a rule that identifies many instances, separate them out, repeat.

But first a very very simple classifier called “1R”.

1R because the rules all test one particular attribute.

1R Classifier

Idea: Construct one rule for each attribute/value combination, predicting the most common class for that combination.

Example Data:
Outlook Temperature Humidity Windy Play?
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no

1R Classifier

Rules generated:
Attribute     Rules               Errors  Total Errors
Outlook       sunny » no          2/5     4/14
              overcast » yes      0/4
              rainy » yes         2/5
Temperature   hot » no (random)   2/4     5/14
              mild » yes          2/6
              cool » yes          1/4
Humidity      high » no           3/7     4/14
              normal » yes        1/7
Windy         false » yes         2/8     5/14
              true » no (random)  3/6

Now choose the attribute with the fewest total errors: Outlook and Humidity tie at 4/14, so we arbitrarily decide on Outlook. 1R will then simply use the outlook attribute to predict the class for new instances.
1R Algorithm

foreach attribute,
    foreach value of that attribute,
        find class distribution for attr/value
        conc = most frequent class
        make rule: attribute=value -> conc
    calculate error rate of ruleset
select ruleset with lowest error rate

Almost not worth wasting a slide on, really!
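Still, a minimal Python sketch of 1R may help, assuming the records arrive as a list of dictionaries (the weather list below is an illustrative subset, not the full table from the previous slides):

from collections import Counter, defaultdict

def one_r(records, class_attr):
    # Returns (best attribute, rules), where rules maps value -> predicted class.
    best = None  # (errors, attribute, rules)
    attributes = [a for a in records[0] if a != class_attr]
    for attr in attributes:
        # class distribution for each value of this attribute
        dist = defaultdict(Counter)
        for rec in records:
            dist[rec[attr]][rec[class_attr]] += 1
        # one rule per value: predict the most frequent class for that value
        rules = {value: counts.most_common(1)[0][0] for value, counts in dist.items()}
        # total errors made by this attribute's rules
        errors = sum(sum(counts.values()) - counts[rules[value]]
                     for value, counts in dist.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best[1], best[2]

# Illustrative subset of the weather data
weather = [
    {"outlook": "sunny",    "windy": "false", "play": "no"},
    {"outlook": "sunny",    "windy": "true",  "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy",    "windy": "false", "play": "yes"},
    {"outlook": "rainy",    "windy": "true",  "play": "no"},
]
attr, rules = one_r(weather, "play")
print(attr, rules)  # e.g. outlook {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'}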

PRISM Classifier

Instead of always looking at the full data set, after constructing each rule we could remove the instances that the rule covers before looking for a new rule.

Start with a high-coverage rule, then increase its accuracy by adding more conditions to it.

We want to maximise the accuracy of each rule: maximise the ratio of positive instances to covered instances (p/t).

We have finished adding conditions when p/t = 1, or there are no more instances to look at.

PRISM Classifier

Following Witten (pg 6, 108+):

If X then recommendation=hard
Find the highest coverage-ratio condition for X:

Age = Young                    2/8
Age = Pre-presbyopic           1/8
Age = Presbyopic               1/8
Prescription = Myope           3/12
Prescription = Hypermetrope    1/12
Astigmatism = no               0/12
Astigmatism = yes              4/12
Tear-Production = Reduced      0/12
Tear-Production = Normal       4/12

Select astigmatism = yes (arbitrarily over Tear-Production = Normal).

PRISM Classifier

This covers:

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None

Now we need to add another condition to make it more accurate.

If astigmatism = yes and X then recommendation = hard

PRISM Classifier

The best condition is Tear-Production = Normal (4/6).

New rule: if astigmatism=yes and tear-production=normal then recommendation=hard

But there is still some inaccuracy...

Age=Young (2/2) and Prescription=Myope (3/3) both have a 100% ratio in the remaining instances. Choose the one with the greater coverage:

If astigmatism=yes and tear-production=normal and prescription=myope then recommendation=hard

Repeat the process, removing the instances covered by this rule. Then repeat for all classes.

PRISM Classifier

Try with the other example data set. If X then play=yes

Outlook Temperature Humidity Windy Play?
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no

Outlook=overcast is (4/4). Already perfect, so remove the covered instances and look again.

PRISM Classifier

With the reduced dataset, if X then play=yes:

Outlook Temperature Humidity Windy Play?
sunny hot high false no
sunny hot high true no
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
rainy mild high true no

sunny (2/5), rainy (3/5), hot (0/2), mild (3/5), cool (2/3), high (1/5), normal (4/5), false (4/6), true (1/4)

Select humidity=normal (4/5) and, as the rule is not yet perfect, look for another condition to add.

PRISM Classifier

If humidity=normal and X then play=yes:

Outlook Temperature Humidity Windy Play?
rainy cool normal false yes
rainy cool normal true no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes

If we could use 'and-not' we could have: and-not (temperature=cool and windy=true).
But instead the candidate conditions are:
rainy (2/3), sunny (2/2), cool (2/3), mild (2/2), false (3/3), true (1/2)
So we select windy=false, the 100%-accurate condition with the greatest coverage t, and add that to the rule.

PRISM Algorithm

for each class C
    initialise E to the complete instance set
    while E contains instances with class C
        create empty rule R: if X then C
        until R is perfect (or no more attributes)
            for each attribute A not in R, and each value v,
                consider adding A=v to R
            select A and v to maximise accuracy p/t
            add A=v to R
        remove instances covered by R from E
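A compact Python sketch of this covering loop (illustrative names; ties on accuracy are broken by coverage, as in the worked example, and instances are dictionaries):

def prism(instances, class_attr):
    # Learn rules (conditions, class) with the PRISM separate-and-conquer strategy.
    rules = []
    for cls in {rec[class_attr] for rec in instances}:
        remaining = list(instances)
        # keep building rules while this class still has uncovered instances
        while any(rec[class_attr] == cls for rec in remaining):
            conditions = {}
            covered = list(remaining)
            attrs = [a for a in instances[0] if a != class_attr]
            # grow the rule until it is perfect or no attributes are left
            while attrs and any(rec[class_attr] != cls for rec in covered):
                best = None  # (accuracy p/t, coverage t, attribute, value)
                for attr in attrs:
                    for value in {rec[attr] for rec in covered}:
                        subset = [rec for rec in covered if rec[attr] == value]
                        p = sum(1 for rec in subset if rec[class_attr] == cls)
                        cand = (p / len(subset), len(subset), attr, value)
                        if best is None or cand[:2] > best[:2]:
                            best = cand
                _, _, attr, value = best
                conditions[attr] = value
                attrs.remove(attr)
                covered = [rec for rec in covered if rec[attr] == value]
            rules.append((conditions, cls))
            # separate: remove the instances the new rule covers
            remaining = [rec for rec in remaining
                         if not all(rec.get(a) == v for a, v in conditions.items())]
    return rules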

Issues with PRISM

Overfitting. As we saw, we had 4/5 and then lost one positive instance in order to lose the one negative instance. But with more examples maybe it was 199/200 and we'd need to lose 40 positives to remove it... that's crazy.

Measure 2: Information Gain
    p * (log(p/t) - log(P/T))
    p/t as before (positives over total covered)
    P and T are the positives and total covered before the new condition.

This emphasises a large number of positive examples. Use it in PRISM in place of maximising p/t.
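A tiny numeric check of why the gain measure behaves differently from p/t (the candidate numbers below are made up purely for illustration; the log base does not matter for the comparison):

from math import log2

def info_gain(p, t, P, T):
    # gain of refining a rule that covered P positives out of T, to p out of t
    return p * (log2(p / t) - log2(P / T))

# Before refinement the rule covers T=1000 instances, P=800 of them positive.
print(info_gain(p=600, t=620, P=800, T=1000))  # large: keeps many positives
print(info_gain(p=2,   t=2,   P=800, T=1000))  # small: perfect but tiny
print(600 / 620, 2 / 2)                        # plain p/t would prefer the tiny rule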

Over-Fitting Avoidance

We could split the training set into a growing set and a pruning set.
Grow rules out using the first, and then try to cut the rules back with
the pruning set.

Two strategies:

Reduced-error pruning: build the full rule set, then prune the rules.

Incremental reduced-error pruning: simplify rules as they are built; the data can be re-split after each rule. Let's look at this one.

Incremental Reduced Error Pruning

initialise E to the instance set
until E is empty:
    split E into Grow and Prune (ratio 2:1)
    for each class C in Grow:
        generate the best rule for C
        using Prune:
            calculate worth(R) and worth(R - finalCondition)
            while worth(R-) > worth(R), prune the rule
    from the rules for the different classes, select the one with the largest worth(R)
    remove the instances covered by that rule
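A minimal sketch of just the pruning step, assuming a rule is a list of (attribute, value) conditions and using the success fraction (p + (N - n)) / T from the next slide as worth (all names illustrative):

def covers(conditions, rec):
    return all(rec.get(a) == v for a, v in conditions)

def worth(conditions, cls, prune_set, class_attr):
    # (true positives + true negatives) / total instances, on the pruning set
    tp = sum(1 for r in prune_set if covers(conditions, r) and r[class_attr] == cls)
    tn = sum(1 for r in prune_set if not covers(conditions, r) and r[class_attr] != cls)
    return (tp + tn) / len(prune_set)

def prune_rule(conditions, cls, prune_set, class_attr):
    # drop final conditions while doing so improves worth on the pruning set
    conditions = list(conditions)
    while len(conditions) > 1 and \
          worth(conditions[:-1], cls, prune_set, class_attr) > worth(conditions, cls, prune_set, class_attr):
        conditions.pop()
    return conditions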

Rule Worth?

How can we generate the worth of the rules? (Witten 203)

Measure 1: (p + (N - n)) / T
    i.e. (true positives + true negatives) / total number of instances,
    where p = positives covered, n = negatives covered (n = t - p), N = total negatives, T = total instances.
    Problem: p=2000, t=3000 gives (1000 + N) / T, while p=1000, t=1001 gives (999 + N) / T,
    so the much less accurate rule scores higher.

Measure 2: p/t
    Same problem as before, the other way round: p=1, t=1 scores higher than p=1000, t=1001.

Simple but intuitive algorithm for worth?
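To see the problem with the first measure concretely, pick some made-up totals, say N = 4000 negatives and T = 10000 instances overall:

def success_fraction(p, t, N, T):
    # (p + (N - n)) / T, with n = t - p negatives covered
    return (p + (N - (t - p))) / T

N, T = 4000, 10000
print(success_fraction(2000, 3000, N, T))  # 0.5000 for the 67%-accurate rule
print(success_fraction(1000, 1001, N, T))  # 0.4999 for the 99.9%-accurate rule
print(2000 / 3000, 1000 / 1001)            # p/t ranks them the other way round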

Issue with Grow/Prune Splitting

Say we have 1000 examples, and we split 2:1 for train/test (666, 334), then 2:1 for grow/prune (444, 222)... we're building our rules on less than half of our data!

Depending on the dataset, classes may be absent from the training set, or the distributions may be very wrong, or there may be any number of other statistical problems with random sampling to this degree.

This is ameliorated in the incremental version, as we re-split often. But we might still want to perform the algorithm several times and pick the best result.

RIPPER

A rules-based classifier from industry.

If there are 2 classes, learn rules for one and make the other the default.
If there are more than 2 classes, start with the smallest class and work upwards until only 2 remain.

Information Gain is used to grow rules.

Measure for pruning: (p - n) / (p + n) (positive/negative examples covered in the pruning set).

Uses a 'Description Length' metric -- Ockham's Razor says that the simplest solution is the best, so here the simplest rule set is the best. (Not going into how to calculate this.)
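The pruning measure itself is a one-liner; a quick illustrative check:

def ripper_prune_metric(p, n):
    # (p - n) / (p + n): positives minus negatives covered, on the pruning set
    return (p - n) / (p + n)

print(ripper_prune_metric(90, 10))  # 0.8
print(ripper_prune_metric(50, 50))  # 0.0 for a rule no better than chance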

RIPPER Algorithm

Repeated Incremental Pruning to Produce Error Reduction

split E into Grow/Prune

BUILD:
    repeat until no examples, or DL of ruleset > minDL(rulesets) + 64, or error > 50%:
        GROW: add conditions until the rule is 100% accurate, by IG
        PRUNE: prune last condition to first while worth metric W increases
for each rule R, for each class C:
    split E into Grow/Prune
    remove all instances from Prune covered by other rules
    GROW and PRUNE two competing rules:
        R1 is a new rule built from scratch
        R2 is generated by adding conditions to R
        prune using worth metric A for the reduced dataset
    replace R by R, R1 or R2, whichever has the smallest DL
if there are uncovered instances of C, return to BUILD to make more rules
calculate DL for the ruleset and for the ruleset with each rule omitted; delete any rule that increases the DL
remove instances covered by the rules generated

DL = Description Length, metric W = (p + 1) / (t + 2), metric A = (p + (N - n)) / T

Rules with Exceptions

If we get more data after a ruleset has been generated, it might be useful to add exceptions to rules:

If X then class1, unless Y then class2

Consider our humidity rule:
if humidity=normal then play=yes
    unless temperature=cool and windy=true, then play=no

Exceptions were developed with the Induct system, and are called 'ripple-down rules'.
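A rule with an exception can be written directly; a minimal sketch of the humidity rule above (attribute names as in the weather data):

def play(rec):
    # if humidity=normal then yes, unless temperature=cool and windy=true then no
    if rec["humidity"] == "normal":
        if rec["temperature"] == "cool" and rec["windy"] == "true":
            return "no"   # the exception overrides the main rule
        return "yes"
    return None           # not covered by this rule

print(play({"humidity": "normal", "temperature": "cool", "windy": "true"}))   # no
print(play({"humidity": "normal", "temperature": "mild", "windy": "false"}))  # yes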

Further Reading

Witten, Sections 3.3, 3.5, 3.6, 4.1, 4.4
Dunham, Section 4.6
Han, Section 6.5
Berry and Browne, Chapter 8
