
Data Mining

Part 5. Prediction

5.4. Rule-Based Classification

Spring 2010

Instructor: Dr. Masoud Yaghini

Outline

• Using IF-THEN Rules for Classification
• Rule Extraction from a Decision Tree
• 1R Algorithm
• Sequential Covering Algorithms
• PRISM Algorithm
• FOIL Algorithm
• References

Using IF-THEN Rules for Classification

Using IF-THEN Rules for Classification

• A rule-based classifier uses a set of IF-THEN rules for classification.
• An IF-THEN rule is an expression of the form:

  IF condition THEN conclusion

  – where
    · Condition (or LHS) is the rule antecedent/precondition
    · Conclusion (or RHS) is the rule consequent

Using IF-THEN rules for classification

• An example is rule R1:

  R1: IF age = youth AND student = yes THEN buys_computer = yes

  – The condition consists of one or more attribute tests that are logically ANDed
    · such as age = youth AND student = yes
  – The rule's consequent contains a class prediction
    · here, we are predicting whether a customer will buy a computer

• R1 can also be written as

  R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)

Assessment of a Rule

• Assessment of a rule:
  – Coverage of a rule:
    · The percentage of instances that satisfy the antecedent of the rule (i.e., whose attribute values hold true for the rule's antecedent)
  – Accuracy of a rule:
    · The percentage of the instances covered by the rule (i.e., satisfying its antecedent) that also satisfy its consequent

Rule Coverage and Accuracy

• Rule coverage and accuracy:

  coverage(R) = ncovers / |D|

  accuracy(R) = ncorrect / ncovers

• where
  – D: the class-labeled data set
  – |D|: the number of instances in D
  – ncovers: the number of instances covered by R
  – ncorrect: the number of instances correctly classified by R
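These measures are straightforward to compute. A minimal Python sketch, assuming (purely for illustration) that instances are dicts and a rule is a (conditions, class) pair:

def coverage_and_accuracy(rule, dataset, class_attr="class"):
    """coverage(R) = ncovers / |D|; accuracy(R) = ncorrect / ncovers."""
    conditions, predicted = rule  # conditions: {attribute: value}, ANDed
    covered = [x for x in dataset
               if all(x.get(a) == v for a, v in conditions.items())]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for x in covered if x[class_attr] == predicted)
    return len(covered) / len(dataset), correct / len(covered)

# R1 on a toy three-instance slice of AllElectronics-style data
data = [{"age": "youth", "student": "yes", "class": "yes"},
        {"age": "youth", "student": "no", "class": "no"},
        {"age": "senior", "student": "yes", "class": "yes"}]
r1 = ({"age": "youth", "student": "yes"}, "yes")
print(coverage_and_accuracy(r1, data))  # (0.333..., 1.0)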

Example: AllElectronics

Coverage and Accuracy

• The rule R1:

  R1: IF age = youth AND student = yes THEN buys_computer = yes

  – R1 covers 2 of the 14 instances
  – It can correctly classify both instances
• Therefore:
  – Coverage(R1) = 2/14 ≈ 14.28%
  – Accuracy(R1) = 2/2 = 100%

Executing a rule set

• Two ways of executing a rule set:
  – Ordered set of rules ("decision list")
    · Order is important for interpretation
  – Unordered set of rules
    · Rules may overlap and lead to different conclusions for the same instance

How We Can Use Rule-Based Classification

• Example: We would like to classify an instance X according to buys_computer
• If a rule is satisfied by X, the rule is said to be triggered
• Potential problems:
  – If more than one rule is satisfied by X
    · Solution: a conflict resolution strategy
  – If no rule is satisfied by X
    · Solution: use a default class
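A minimal sketch of executing an ordered rule set (decision list) with a default class, under the same illustrative encodings as above:

def classify(instance, decision_list, default_class):
    """Return the class of the first triggered rule, else the default."""
    for conditions, predicted in decision_list:
        if all(instance.get(a) == v for a, v in conditions.items()):
            return predicted      # first triggered rule fires
    return default_class          # no rule covers X: use the default rule

rules = [({"age": "youth", "student": "yes"}, "yes"),
         ({"credit_rating": "excellent"}, "yes")]
print(classify({"age": "senior", "credit_rating": "fair"}, rules, "no"))  # no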

Conflict Resolution

• Conflict resolution strategies:
  – Size ordering
  – Rule ordering
    · Class-based ordering
    · Rule-based ordering

• Size ordering (rule antecedent size ordering)
  – Assign the highest priority to the triggering rule with the largest precondition (i.e., the one with the most attribute tests)
  – The rules themselves are unordered

Conflict Resolution

• Class-based ordering:
  – Classes are sorted in decreasing order of frequency
    · That is, all of the rules for the most frequent class come first, the rules for the next most frequent class come next, and so on.
  – Alternatively, decreasing order of misclassification cost per class
  – The most popular strategy

Conflict Resolution

• Rule-based ordering (decision list):
  – Rules are organized into one long priority list, according to some measure of rule quality such as:
    · accuracy
    · coverage
    · an ordering decided by experts

Default Rule

• If no rule is satisfied by X:
  – A default rule can be set up to specify a default class, based on the training set
  – This may be the overall majority class, or the majority class of the instances that were not covered by any rule
  – The default rule is evaluated at the end, if and only if no other rule covers X
  – The condition in the default rule is empty
  – In this way, the rule fires when no other rule is satisfied

Rule Extraction from a Decision Tree

Building Classification Rules

• Direct method: extract rules directly from data
  – 1R algorithm
  – Sequential covering algorithms
    · e.g., PRISM, RIPPER, CN2, FOIL, and AQ

• Indirect method: extract rules from other classification models
  – e.g., decision trees

Rule Extraction from a Decision Tree

• Decision trees can become large and difficult to interpret
  – Rules are easier to understand than large trees
  – One rule is created for each path from the root to a leaf (see the sketch below)
  – Each attribute-value pair along a path forms a precondition; the leaf holds the class prediction
  – The order of the rules does not matter
• The extracted rules are
  – Mutually exclusive: no two rules will be satisfied by the same instance
  – Exhaustive: there is one rule for each possible attribute-value combination
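A minimal sketch of the path-to-rule idea, assuming a nested-dict tree encoding (the encoding is an illustrative assumption; the tree follows the buys_computer example):

def tree_to_rules(node, conditions=()):
    """One rule per root-to-leaf path. Interior nodes are dicts of
    {attribute: {value: subtree}}; leaves are class labels."""
    if not isinstance(node, dict):               # leaf: emit a finished rule
        return [(list(conditions), node)]
    rules = []
    for attribute, branches in node.items():
        for value, subtree in branches.items():
            rules += tree_to_rules(subtree, (*conditions, (attribute, value)))
    return rules

tree = {"age": {"youth": {"student": {"no": "no", "yes": "yes"}},
                "middle_aged": "yes",
                "senior": {"credit_rating": {"fair": "yes",
                                             "excellent": "no"}}}}
for conds, label in tree_to_rules(tree):
    print("IF", " AND ".join(f"{a} = {v}" for a, v in conds),
          "THEN buys_computer =", label)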

Example: AllElectronics

Pruning the Rule Set

• The resulting set of extracted rules can be large and difficult to follow
  – Solution: prune the rule set
• For a given rule, any condition that does not improve the estimated accuracy of the rule can be pruned (i.e., removed)
• C4.5 extracts rules from an unpruned tree, and then prunes the rules using an approach similar to its tree-pruning method

1R Algorithm

1R Algorithm

• An easy way to find very simple classification rules
• 1R: rules that test one particular attribute
• Basic version
  – One branch for each value of the attribute
  – Each branch assigns the most frequent class
  – Error rate: the proportion of instances that don't belong to the majority class of their corresponding branch
  – Choose the attribute with the lowest error rate (assumes nominal attributes)
• "Missing" is treated as a separate attribute value

Pseudocode for 1R Algorithm
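In place of the pseudocode figure, a minimal Python sketch of basic 1R for nominal attributes (all names are illustrative):

from collections import Counter

def one_r(dataset, attributes, class_attr="class"):
    """For each attribute, map each value to its majority class and count
    the errors; keep the attribute whose rule makes the fewest errors."""
    best = None
    for attr in attributes:
        value_to_class, errors = {}, 0
        for value in {x[attr] for x in dataset}:    # one branch per value
            branch = Counter(x[class_attr] for x in dataset
                             if x[attr] == value)
            majority, count = branch.most_common(1)[0]
            value_to_class[value] = majority
            errors += sum(branch.values()) - count  # non-majority instances
        if best is None or errors < best[2]:
            best = (attr, value_to_class, errors)
    return best  # (attribute, {value: class}, total errors)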

Example: The weather problem

Evaluating the weather attributes

The attribute with the smallest number of errors

Dealing with numeric attributes

• Discretize numeric attributes
• Divide each attribute's range into intervals
  – Sort instances according to the attribute's values
  – Place breakpoints where the (majority) class changes
  – This minimizes the total error

Weather data with some numeric attributes

Example: temperature from weather data

• Discretization involves partitioning this sorted sequence of values by placing breakpoints wherever the class changes

The problem of overfitting

• Overfitting is likely to occur whenever an attribute has a large number of possible values
• This procedure is very sensitive to noise
  – One instance with an incorrect class label will probably produce a separate interval
  – The attribute will then appear to have zero errors
• Simple solution: enforce a minimum number of instances in the majority class per interval (see the sketch below)
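A sketch of this discretization with the minimum-majority constraint, using the sorted temperature/play sequence from the Witten & Frank weather data; the tie-breaking and merging details here are one plausible reading, not necessarily the book's exact procedure:

from collections import Counter

def partition_min_majority(pairs, min_majority=3):
    """pairs: (value, class) tuples sorted by value. Grow each interval
    until its majority class has min_majority members, extend it while the
    next instance continues that class, then merge same-class neighbours."""
    intervals, start, i = [], 0, 0
    while i < len(pairs):
        counts = Counter()
        while i < len(pairs):                      # grow the interval
            counts[pairs[i][1]] += 1
            i += 1
            if counts.most_common(1)[0][1] >= min_majority:
                break
        majority = counts.most_common(1)[0][0]
        while i < len(pairs) and pairs[i][1] == majority:
            i += 1                                 # we lose nothing by this
        intervals.append((pairs[start:i], majority))
        start = i
    merged = [intervals[0]]                        # merge same-class runs
    for seg, maj in intervals[1:]:
        if maj == merged[-1][1]:
            merged[-1] = (merged[-1][0] + seg, maj)
        else:
            merged.append((seg, maj))
    return merged

temperature = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no",
        "yes", "yes", "yes", "no", "yes", "yes", "no"]
for seg, maj in partition_min_majority(sorted(zip(temperature, play))):
    print(f"{seg[0][0]}..{seg[-1][0]} -> {maj}")
# 64..75 -> yes, 80..85 -> no, i.e., one breakpoint midway at 77.5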

Minimum is set at 3 for the temperature attribute

• The partitioning process begins by growing the first interval until it holds three instances of the majority class
• Since the next example is also yes, we lose nothing by including it in the first partition
• The final discretization, and the rule set derived from it, then follow

Resulting rule set with overfitting avoidance

Sequential Covering Algorithms

Sequential Covering Algorithms

• A sequential covering algorithm:
  – The rules are learned sequentially (one at a time)
  – Each rule for a given class will ideally cover many of the instances of that class (and hopefully none of the instances of other classes)
  – Each time a rule is learned, the instances covered by the rule are removed, and the process repeats on the remaining instances

Sequential Covering Algorithms

while (enough target instances left)
    generate a rule
    remove positive target instances satisfying this rule

[Figure: the instance space, with the regions successively covered by Rule 1, Rule 2, and Rule 3]
Sequential Covering Algorithms

• Typical sequential covering algorithms:
  – PRISM
  – FOIL
  – AQ
  – CN2
  – RIPPER
• Sequential covering algorithms are the most widely used approach to mining classification rules
• Comparison with decision-tree induction:
  – A decision tree learns a set of rules simultaneously
Basic Sequential Covering Algorithm

• Steps:
  – Rules are learned one at a time
  – Each time a rule is learned, the instances covered by the rule are removed
  – The process repeats on the remaining instances until a termination condition holds
    · e.g., when there are no more training examples, or when the quality of the returned rule is below a user-specified level
• A minimal sketch of this loop is shown below
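The sketch assumes a learn_one_rule helper like the one sketched under "Generating a Rule" below, and the same illustrative encodings as earlier:

def sequential_covering(dataset, target_class, learn_one_rule,
                        min_quality=0.5, class_attr="class"):
    """Learn rules for target_class one at a time, removing the covered
    positive instances after each rule, until no acceptable rule remains."""
    rules, remaining = [], list(dataset)
    while any(x[class_attr] == target_class for x in remaining):
        rule, quality = learn_one_rule(remaining, target_class)
        if rule is None or quality < min_quality:
            break                                  # termination condition
        rules.append(rule)
        conditions, _ = rule
        remaining = [x for x in remaining          # remove covered positives
                     if not (x[class_attr] == target_class and
                             all(x.get(a) == v
                                 for a, v in conditions.items()))]
    return rules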

Generating a Rule

• Typically, rules are grown in a general-to-specific manner
• We start with an empty rule and then gradually keep appending attribute tests to it
• We append by adding the attribute test as a logical conjunct to the existing condition of the rule antecedent

Example: Generating a Rule

• Example:
  – Suppose our training set, D, consists of loan application data.
  – Attributes regarding each applicant include their:
    · age
    · income
    · education level
    · residence
    · credit rating
    · the term of the loan
  – The classifying attribute is loan_decision, which indicates whether a loan is accepted (considered safe) or rejected (considered risky).

Example: Generating a Rule

• To learn a rule for the class "accept," we start off with the most general rule possible, that is, the rule precondition is empty
  – The rule is:

    IF THEN loan_decision = accept

• We then consider each possible attribute test that may be added to the rule

Example: Generating a Rule

• Each time the learner is faced with adding a new attribute test to the current rule, it picks the one that most improves the rule quality, based on the training samples
• The process repeats; at each step we continue to greedily grow the rule until it meets an acceptable quality level
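A minimal sketch of this greedy general-to-specific growth, measuring quality as accuracy on the covered instances (PRISM's p/t and FOIL_Gain, described later, are drop-in alternatives):

def learn_one_rule(dataset, target_class, min_quality=1.0, class_attr="class"):
    """Grow one rule by repeatedly conjoining the attribute test that most
    improves accuracy, starting from the empty antecedent."""
    def quality(conditions):
        covered = [x for x in dataset
                   if all(x.get(a) == v for a, v in conditions.items())]
        if not covered:
            return 0.0
        return sum(x[class_attr] == target_class
                   for x in covered) / len(covered)

    conditions = {}                        # most general rule: IF THEN class
    while quality(conditions) < min_quality:
        candidates = {(a, x[a]) for x in dataset for a in x
                      if a != class_attr and a not in conditions}
        if not candidates:
            break
        attr, value = max(candidates,
                          key=lambda t: quality({**conditions, t[0]: t[1]}))
        if quality({**conditions, attr: value}) <= quality(conditions):
            break                          # no test improves the rule further
        conditions[attr] = value
    return (conditions, target_class), quality(conditions)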

Example: Generating a Rule

• A general-to-specific search through rule space

Example: Generating a Rule

• Possible rule set for class "a":

  if true then class = a

Example: Generating a Rule

• Possible rule set for class "a": [figure showing the successively refined rules]
Decision tree for the same problem

• Corresponding decision tree (it produces exactly the same predictions)

Rules vs. trees

• Both methods might first split the dataset using the x attribute, and would probably end up splitting it at the same place (x = 1.2)
• But: rule sets can be clearer when decision trees suffer from replicated subtrees
• Also: in multiclass situations, a covering algorithm concentrates on one class at a time, whereas a decision tree learner takes all classes into account

PRISM Algorithm

PRISM Algorithm

• The PRISM method generates a rule by adding tests that maximize the rule's accuracy
• Each new test reduces the rule's coverage

Selecting a test

• Goal: maximize accuracy
  – t: total number of instances covered by the rule
  – p: positive examples of the class covered by the rule
  – t − p: number of errors made by the rule
  – Select the test that maximizes the ratio p/t

• We are finished when p/t = 1 or the set of instances can't be split any further
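A minimal sketch of this selection step; the tie-break on coverage matches the choice made in the example below:

def best_test(dataset, conditions, target_class, class_attr="class"):
    """Pick the attribute test, not yet in the rule, that maximizes p/t;
    break ties in favour of greater coverage t."""
    candidates = {(a, x[a]) for x in dataset for a in x
                  if a != class_attr and a not in conditions}
    scored = []
    for attr, value in candidates:
        trial = {**conditions, attr: value}
        covered = [y for y in dataset
                   if all(y.get(a) == v for a, v in trial.items())]
        if not covered:
            continue
        p = sum(y[class_attr] == target_class for y in covered)
        scored.append((p / len(covered), len(covered), (attr, value)))
    return max(scored)[2] if scored else None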

Example: contact lens data

• To begin, we seek a rule:

  If ? then recommendation = hard

• Possible tests:
Create the rule

• Rule with the best test added, and the instances it covers:

Further refinement

• Current state:

• Possible tests:

Modified rule and resulting data

• Rule with the best test added:

• Instances covered by the modified rule:

Further refinement

• Current state:

• Possible tests:

• Tie between the first and the fourth test
  – We choose the one with greater coverage

The result

• Final rule:

• Second rule for recommending "hard lenses":
  – (built from the instances not covered by the first rule)

• These two rules cover all "hard lenses"
  – The process is repeated with the other two classes

Pseudo-code for PRISM
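In place of the pseudocode figure, a minimal sketch of PRISM for one class, reusing best_test from "Selecting a test" above:

def prism_for_class(dataset, target_class, class_attr="class"):
    """Keep building (ideally perfect) rules for target_class until every
    instance of that class is covered, removing covered instances as we go."""
    rules, remaining = [], list(dataset)
    while any(x[class_attr] == target_class for x in remaining):
        conditions, covered = {}, list(remaining)
        # specialize until the rule is perfect or cannot be refined further
        while any(x[class_attr] != target_class for x in covered):
            test = best_test(covered, conditions, target_class, class_attr)
            if test is None:
                break
            conditions[test[0]] = test[1]
            covered = [x for x in covered
                       if all(x.get(a) == v for a, v in conditions.items())]
        rules.append((dict(conditions), target_class))
        remaining = [x for x in remaining          # separate, then conquer
                     if not all(x.get(a) == v
                                for a, v in conditions.items())]
    return rules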

Rules vs. decision lists

• PRISM with the outer loop removed generates a decision list for one class
  – Subsequent rules are designed for instances that are not covered by previous rules
  – But: order doesn't matter, because all the rules predict the same class
• The outer loop considers all classes separately
  – No order dependence is implied

Separate and conquer

• Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms:
  – First, identify a useful rule
  – Then, separate out all the instances it covers
  – Finally, "conquer" the remaining instances

FOIL Algorithm
(First Order Inductive Learner Algorithm)

Coverage or Accuracy?

• Consider two rules:
  – R1: correctly classifies 38 of the 40 instances it covers
  – R2: covers only two instances, which it classifies correctly
• Their accuracies are 95% and 100%, respectively
• R2 has greater accuracy than R1, but it is not the better rule, because of its small coverage
• Accuracy on its own is not a reliable estimate of rule quality
• Coverage on its own is not useful either

Consider Both Coverage and Accuracy

• If our current rule is R:

  IF condition THEN class = c

• We want to see if logically ANDing a given attribute test to condition would result in a better rule
• We call the new condition condition′, giving the potential new rule R′:

  IF condition′ THEN class = c

• In other words, we want to see whether R′ is any better than R

FOIL Information Gain

• FOIL_Gain (used in FOIL and RIPPER) assesses the information gained by extending the condition:

  FOIL_Gain = pos′ × ( log2(pos′ / (pos′ + neg′)) − log2(pos / (pos + neg)) )

• where
  – pos (neg) is the number of positive (negative) instances covered by R
  – pos′ (neg′) is the number of positive (negative) instances covered by R′
• It favors rules that have high accuracy and cover many positive instances
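A direct transcription of the formula (returning 0 when R′ covers no positives, where the gain is undefined):

from math import log2

def foil_gain(pos, neg, pos2, neg2):
    """Gain from extending R (covering pos/neg) to R' (covering pos2/neg2)."""
    if pos2 == 0:
        return 0.0
    return pos2 * (log2(pos2 / (pos2 + neg2)) - log2(pos / (pos + neg)))

# R covers 100 instances (40 positive); R' covers 30 (25 positive)
print(foil_gain(40, 60, 25, 5))  # = 25 × (log2(25/30) − log2(40/100)) ≈ 26.5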
Rule Generation

• To generate a rule:

while (true)
    find the best predicate p
    if FOIL_Gain(p) > threshold then add p to the current rule
    else break

[Figure: a rule grown over the positive and negative examples, specializing from A3 = 1 to A3 = 1 && A1 = 2 to A3 = 1 && A1 = 2 && A8 = 5]

Rule Pruning: FOIL Method

• The assessments of rule quality described above are made with instances from the training data
• Rule pruning is instead based on an independent set of instances, using:

  FOIL_Prune(R) = (pos − neg) / (pos + neg)

  – where pos (neg) is the number of positive (negative) instances of that set covered by R

• If FOIL_Prune is higher for the pruned version of R, prune R
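Its transcription, with an illustrative pruning check (the counts are made-up numbers):

def foil_prune(pos, neg):
    """FOIL_Prune(R), computed on a set held out from training;
    higher is better."""
    return (pos - neg) / (pos + neg)

# Full rule covers 30 pos / 10 neg; dropping its last test gives 36 / 10
full, pruned = foil_prune(30, 10), foil_prune(36, 10)
print(pruned > full)  # True: prefer the pruned rule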

References

References

• J. Han and M. Kamber, Data Mining: Concepts and Techniques, Elsevier Inc., 2006. (Chapter 6)

• I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Elsevier Inc., 2005. (Chapter 6)

The end