
Data Mining

Part 5. Prediction

5.4. Rule-Based Classification

Spring 2010

Instructor: Dr. Masoud Yaghini

Outline

• Using IF-THEN Rules for Classification
• Rule Extraction from a Decision Tree
• 1R Algorithm
• Sequential Covering Algorithms
• PRISM Algorithm
• FOIL Algorithm
• References

Using IF-THEN Rules for Classification

Using IF-THEN Rules for Classification

• A rule-based classifier uses a set of IF-THEN rules for classification.
• An IF-THEN rule is an expression of the form:

  IF condition THEN conclusion

  – where
    · Condition (or LHS) is the rule antecedent/precondition
    · Conclusion (or RHS) is the rule consequent

Using IF-THEN rules for classification

• An example is rule R1:

  R1: IF age = youth AND student = yes THEN buys_computer = yes

  – The condition consists of one or more attribute tests that are logically ANDed
    · such as age = youth AND student = yes
  – The rule's consequent contains a class prediction
    · here, we are predicting whether a customer will buy a computer

• R1 can also be written as

  R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)

Assessment of a Rule

• Assessment of a rule:
  – Coverage of a rule:
    · The percentage of instances that satisfy the antecedent of the rule (i.e., whose attribute values hold true for the rule's antecedent)
  – Accuracy of a rule:
    · The percentage of the instances covered by the rule (i.e., satisfying its antecedent) that also satisfy its consequent

Rule Coverage and Accuracy

• Rule coverage and accuracy:

  coverage(R) = ncovers / |D|

  accuracy(R) = ncorrect / ncovers

• where
  – D: the class-labeled data set
  – |D|: the number of instances in D
  – ncovers: the number of instances covered by R
  – ncorrect: the number of instances correctly classified by R
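These measures are straightforward to compute. A minimal Python sketch, assuming (purely for illustration) that instances are dicts and a rule is a (conditions, class) pair:

def coverage_and_accuracy(rule, dataset, class_attr="class"):
    """coverage(R) = ncovers / |D|; accuracy(R) = ncorrect / ncovers."""
    conditions, predicted = rule  # conditions: {attribute: value}, ANDed
    covered = [x for x in dataset
               if all(x.get(a) == v for a, v in conditions.items())]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for x in covered if x[class_attr] == predicted)
    return len(covered) / len(dataset), correct / len(covered)

# R1 on a toy three-instance slice of AllElectronics-style data
data = [{"age": "youth", "student": "yes", "class": "yes"},
        {"age": "youth", "student": "no", "class": "no"},
        {"age": "senior", "student": "yes", "class": "yes"}]
r1 = ({"age": "youth", "student": "yes"}, "yes")
print(coverage_and_accuracy(r1, data))  # (0.333..., 1.0)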

Example: AllElectronics

Coverage and Accuracy

• The rule R1:

  R1: IF age = youth AND student = yes THEN buys_computer = yes

  – R1 covers 2 of the 14 instances
  – It can correctly classify both instances
• Therefore:
  – Coverage(R1) = 2/14 ≈ 14.28%
  – Accuracy(R1) = 2/2 = 100%

Executing a rule set

• Two ways of executing a rule set:
  – Ordered set of rules ("decision list")
    · Order is important for interpretation
  – Unordered set of rules
    · Rules may overlap and lead to different conclusions for the same instance

How We Can Use Rule-Based Classification

• Example: We would like to classify an instance X according to buys_computer
• If a rule is satisfied by X, the rule is said to be triggered
• Potential problems:
  – If more than one rule is satisfied by X
    · Solution: a conflict resolution strategy
  – If no rule is satisfied by X
    · Solution: use a default class
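A minimal sketch of executing an ordered rule set (decision list) with a default class, under the same illustrative encodings as above:

def classify(instance, decision_list, default_class):
    """Return the class of the first triggered rule, else the default."""
    for conditions, predicted in decision_list:
        if all(instance.get(a) == v for a, v in conditions.items()):
            return predicted      # first triggered rule fires
    return default_class          # no rule covers X: use the default rule

rules = [({"age": "youth", "student": "yes"}, "yes"),
         ({"credit_rating": "excellent"}, "yes")]
print(classify({"age": "senior", "credit_rating": "fair"}, rules, "no"))  # no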

Conflict Resolution

• Conflict resolution strategies:
  – Size ordering
  – Rule ordering
    · Class-based ordering
    · Rule-based ordering

• Size ordering (rule antecedent size ordering)
  – Assign the highest priority to the triggering rule with the largest precondition (i.e., the one with the most attribute tests)
  – The rules themselves are unordered

Conflict Resolution

• Class-based ordering:
  – Classes are sorted in decreasing order of frequency
    · That is, all of the rules for the most frequent class come first, the rules for the next most frequent class come next, and so on.
  – Alternatively, decreasing order of misclassification cost per class
  – The most popular strategy

Conflict Resolution

• Rule-based ordering (decision list):
  – Rules are organized into one long priority list, according to some measure of rule quality such as:
    · accuracy
    · coverage
    · an ordering decided by experts

Default Rule

• If no rule is satisfied by X:
  – A default rule can be set up to specify a default class, based on the training set
  – This may be the overall majority class, or the majority class of the instances that were not covered by any rule
  – The default rule is evaluated at the end, if and only if no other rule covers X
  – The condition in the default rule is empty
  – In this way, the rule fires when no other rule is satisfied

Rule Extraction from a Decision Tree

Building Classification Rules

• Direct method: extract rules directly from data
  – 1R algorithm
  – Sequential covering algorithms
    · e.g., PRISM, RIPPER, CN2, FOIL, and AQ

• Indirect method: extract rules from other classification models
  – e.g., decision trees

Rule Extraction from a Decision Tree

• Decision trees can become large and difficult to interpret
  – Rules are easier to understand than large trees
  – One rule is created for each path from the root to a leaf (see the sketch below)
  – Each attribute-value pair along a path forms a precondition; the leaf holds the class prediction
  – The order of the rules does not matter
• The extracted rules are
  – Mutually exclusive: no two rules will be satisfied by the same instance
  – Exhaustive: there is one rule for each possible attribute-value combination
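A minimal sketch of the path-to-rule idea, assuming a nested-dict tree encoding (the encoding is an illustrative assumption; the tree follows the buys_computer example):

def tree_to_rules(node, conditions=()):
    """One rule per root-to-leaf path. Interior nodes are dicts of
    {attribute: {value: subtree}}; leaves are class labels."""
    if not isinstance(node, dict):               # leaf: emit a finished rule
        return [(list(conditions), node)]
    rules = []
    for attribute, branches in node.items():
        for value, subtree in branches.items():
            rules += tree_to_rules(subtree, (*conditions, (attribute, value)))
    return rules

tree = {"age": {"youth": {"student": {"no": "no", "yes": "yes"}},
                "middle_aged": "yes",
                "senior": {"credit_rating": {"fair": "yes",
                                             "excellent": "no"}}}}
for conds, label in tree_to_rules(tree):
    print("IF", " AND ".join(f"{a} = {v}" for a, v in conds),
          "THEN buys_computer =", label)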

Example: AllElectronics

Pruning the Rule Set

• The resulting set of extracted rules can be large and difficult to follow
  – Solution: prune the rule set
• For a given rule, any condition that does not improve the estimated accuracy of the rule can be pruned (i.e., removed)
• C4.5 extracts rules from an unpruned tree, and then prunes the rules using an approach similar to its tree-pruning method

1R Algorithm

1R Algorithm

• An easy way to find very simple classification rules
• 1R: rules that test one particular attribute
• Basic version
  – One branch for each value of the attribute
  – Each branch assigns the most frequent class
  – Error rate: the proportion of instances that don't belong to the majority class of their corresponding branch
  – Choose the attribute with the lowest error rate (assumes nominal attributes)
• "Missing" is treated as a separate attribute value

Pseudocode for 1R Algorithm
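In place of the pseudocode figure, a minimal Python sketch of basic 1R for nominal attributes (all names are illustrative):

from collections import Counter

def one_r(dataset, attributes, class_attr="class"):
    """For each attribute, map each value to its majority class and count
    the errors; keep the attribute whose rule makes the fewest errors."""
    best = None
    for attr in attributes:
        value_to_class, errors = {}, 0
        for value in {x[attr] for x in dataset}:    # one branch per value
            branch = Counter(x[class_attr] for x in dataset
                             if x[attr] == value)
            majority, count = branch.most_common(1)[0]
            value_to_class[value] = majority
            errors += sum(branch.values()) - count  # non-majority instances
        if best is None or errors < best[2]:
            best = (attr, value_to_class, errors)
    return best  # (attribute, {value: class}, total errors)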

Example: The weather problem

Evaluating the weather attributes

The attribute with the smallest number of errors

Dealing with numeric attributes

• Discretize numeric attributes
• Divide each attribute's range into intervals
  – Sort instances according to the attribute's values
  – Place breakpoints where the (majority) class changes
  – This minimizes the total error

Weather data with some numeric attributes

Example: temperature from weather data

• Discretization involves partitioning this sorted sequence of values by placing breakpoints wherever the class changes

The problem of overfitting

• Overfitting is likely to occur whenever an attribute has a large number of possible values
• This procedure is very sensitive to noise
  – One instance with an incorrect class label will probably produce a separate interval
  – The attribute will then appear to have zero errors
• Simple solution: enforce a minimum number of instances in the majority class per interval (see the sketch below)
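A sketch of this discretization with the minimum-majority constraint, using the sorted temperature/play sequence from the Witten & Frank weather data; the tie-breaking and merging details here are one plausible reading, not necessarily the book's exact procedure:

from collections import Counter

def partition_min_majority(pairs, min_majority=3):
    """pairs: (value, class) tuples sorted by value. Grow each interval
    until its majority class has min_majority members, extend it while the
    next instance continues that class, then merge same-class neighbours."""
    intervals, start, i = [], 0, 0
    while i < len(pairs):
        counts = Counter()
        while i < len(pairs):                      # grow the interval
            counts[pairs[i][1]] += 1
            i += 1
            if counts.most_common(1)[0][1] >= min_majority:
                break
        majority = counts.most_common(1)[0][0]
        while i < len(pairs) and pairs[i][1] == majority:
            i += 1                                 # we lose nothing by this
        intervals.append((pairs[start:i], majority))
        start = i
    merged = [intervals[0]]                        # merge same-class runs
    for seg, maj in intervals[1:]:
        if maj == merged[-1][1]:
            merged[-1] = (merged[-1][0] + seg, maj)
        else:
            merged.append((seg, maj))
    return merged

temperature = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no",
        "yes", "yes", "yes", "no", "yes", "yes", "no"]
for seg, maj in partition_min_majority(sorted(zip(temperature, play))):
    print(f"{seg[0][0]}..{seg[-1][0]} -> {maj}")
# 64..75 -> yes, 80..85 -> no, i.e., one breakpoint midway at 77.5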

Minimum is set at 3 for the temperature attribute

• The partitioning process begins by growing the first interval until it holds three instances of the majority class
• Since the next example is also yes, we lose nothing by including it in the first partition
• The final discretization, and the rule set derived from it, then follow

Resulting rule set with overfitting avoidance

Sequential Covering Algorithms

Sequential Covering Algorithms

• A sequential covering algorithm:
  – The rules are learned sequentially (one at a time)
  – Each rule for a given class will ideally cover many of the instances of that class (and hopefully none of the instances of other classes)
  – Each time a rule is learned, the instances covered by the rule are removed, and the process repeats on the remaining instances

Sequential Covering Algorithms

while (enough target instances left)
    generate a rule
    remove positive target instances satisfying this rule

[Figure: the instance space, with the regions successively covered by Rule 1, Rule 2, and Rule 3]
Sequential Covering Algorithms

• Typical sequential covering algorithms:
  – PRISM
  – FOIL
  – AQ
  – CN2
  – RIPPER
• Sequential covering algorithms are the most widely used approach to mining classification rules
• Comparison with decision-tree induction:
  – A decision tree learns a set of rules simultaneously
Basic Sequential Covering Algorithm

• Steps:
  – Rules are learned one at a time
  – Each time a rule is learned, the instances covered by the rule are removed
  – The process repeats on the remaining instances until a termination condition holds
    · e.g., when there are no more training examples, or when the quality of the returned rule is below a user-specified level
• A minimal sketch of this loop is shown below
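The sketch assumes a learn_one_rule helper like the one sketched under "Generating a Rule" below, and the same illustrative encodings as earlier:

def sequential_covering(dataset, target_class, learn_one_rule,
                        min_quality=0.5, class_attr="class"):
    """Learn rules for target_class one at a time, removing the covered
    positive instances after each rule, until no acceptable rule remains."""
    rules, remaining = [], list(dataset)
    while any(x[class_attr] == target_class for x in remaining):
        rule, quality = learn_one_rule(remaining, target_class)
        if rule is None or quality < min_quality:
            break                                  # termination condition
        rules.append(rule)
        conditions, _ = rule
        remaining = [x for x in remaining          # remove covered positives
                     if not (x[class_attr] == target_class and
                             all(x.get(a) == v
                                 for a, v in conditions.items()))]
    return rules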

Generating a Rule

• Typically, rules are grown in a general-to-specific manner
• We start with an empty rule and then gradually keep appending attribute tests to it
• We append by adding the attribute test as a logical conjunct to the existing condition of the rule antecedent

Example: Generating a Rule

• Example:
  – Suppose our training set, D, consists of loan application data.
  – Attributes regarding each applicant include their:
    · age
    · income
    · education level
    · residence
    · credit rating
    · the term of the loan
  – The classifying attribute is loan_decision, which indicates whether a loan is accepted (considered safe) or rejected (considered risky).

Example: Generating a Rule

• To learn a rule for the class "accept," we start off with the most general rule possible, that is, the rule precondition is empty
  – The rule is:

    IF THEN loan_decision = accept

• We then consider each possible attribute test that may be added to the rule

Example: Generating a Rule

• Each time the learner is faced with adding a new attribute test to the current rule, it picks the one that most improves the rule quality, based on the training samples
• The process repeats; at each step we continue to greedily grow the rule until it meets an acceptable quality level
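A minimal sketch of this greedy general-to-specific growth, measuring quality as accuracy on the covered instances (PRISM's p/t and FOIL_Gain, described later, are drop-in alternatives):

def learn_one_rule(dataset, target_class, min_quality=1.0, class_attr="class"):
    """Grow one rule by repeatedly conjoining the attribute test that most
    improves accuracy, starting from the empty antecedent."""
    def quality(conditions):
        covered = [x for x in dataset
                   if all(x.get(a) == v for a, v in conditions.items())]
        if not covered:
            return 0.0
        return sum(x[class_attr] == target_class
                   for x in covered) / len(covered)

    conditions = {}                        # most general rule: IF THEN class
    while quality(conditions) < min_quality:
        candidates = {(a, x[a]) for x in dataset for a in x
                      if a != class_attr and a not in conditions}
        if not candidates:
            break
        attr, value = max(candidates,
                          key=lambda t: quality({**conditions, t[0]: t[1]}))
        if quality({**conditions, attr: value}) <= quality(conditions):
            break                          # no test improves the rule further
        conditions[attr] = value
    return (conditions, target_class), quality(conditions)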

Example: Generating a Rule

• A general-to-specific search through rule space

Example: Generating a Rule

• Possible rule set for class "a":

  if true then class = a

Example: Generating a Rule

• Possible rule set for class "a": [figure showing the successively refined rules]
Decision tree for the same problem

• Corresponding decision tree (it produces exactly the same predictions)

Rules vs. trees

• Both methods might first split the dataset using the x attribute, and would probably end up splitting it at the same place (x = 1.2)
• But: rule sets can be clearer when decision trees suffer from replicated subtrees
• Also: in multiclass situations, a covering algorithm concentrates on one class at a time, whereas a decision tree learner takes all classes into account

PRISM Algorithm

PRISM Algorithm

• The PRISM method generates a rule by adding tests that maximize the rule's accuracy
• Each new test reduces the rule's coverage

Selecting a test

• Goal: maximize accuracy
  – t: total number of instances covered by the rule
  – p: positive examples of the class covered by the rule
  – t − p: number of errors made by the rule
  – Select the test that maximizes the ratio p/t

• We are finished when p/t = 1 or the set of instances can't be split any further
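A minimal sketch of this selection step; the tie-break on coverage matches the choice made in the example below:

def best_test(dataset, conditions, target_class, class_attr="class"):
    """Pick the attribute test, not yet in the rule, that maximizes p/t;
    break ties in favour of greater coverage t."""
    candidates = {(a, x[a]) for x in dataset for a in x
                  if a != class_attr and a not in conditions}
    scored = []
    for attr, value in candidates:
        trial = {**conditions, attr: value}
        covered = [y for y in dataset
                   if all(y.get(a) == v for a, v in trial.items())]
        if not covered:
            continue
        p = sum(y[class_attr] == target_class for y in covered)
        scored.append((p / len(covered), len(covered), (attr, value)))
    return max(scored)[2] if scored else None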

Example: contact lens data

• To begin, we seek a rule:

  If ? then recommendation = hard

• Possible tests:
Create the rule

• Rule with the best test added, and the instances it covers:

Further refinement

• Current state:

• Possible tests:

Modified rule and resulting data

• Rule with the best test added:

• Instances covered by the modified rule:

Further refinement

• Current state:

• Possible tests:

• Tie between the first and the fourth test
  – We choose the one with greater coverage

The result

• Final rule:

• Second rule for recommending "hard lenses":
  – (built from the instances not covered by the first rule)

• These two rules cover all "hard lenses"
  – The process is repeated with the other two classes

Pseudo-code for PRISM
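In place of the pseudocode figure, a minimal sketch of PRISM for one class, reusing best_test from "Selecting a test" above:

def prism_for_class(dataset, target_class, class_attr="class"):
    """Keep building (ideally perfect) rules for target_class until every
    instance of that class is covered, removing covered instances as we go."""
    rules, remaining = [], list(dataset)
    while any(x[class_attr] == target_class for x in remaining):
        conditions, covered = {}, list(remaining)
        # specialize until the rule is perfect or cannot be refined further
        while any(x[class_attr] != target_class for x in covered):
            test = best_test(covered, conditions, target_class, class_attr)
            if test is None:
                break
            conditions[test[0]] = test[1]
            covered = [x for x in covered
                       if all(x.get(a) == v for a, v in conditions.items())]
        rules.append((dict(conditions), target_class))
        remaining = [x for x in remaining          # separate, then conquer
                     if not all(x.get(a) == v
                                for a, v in conditions.items())]
    return rules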

Rules vs. decision lists

• PRISM with the outer loop removed generates a decision list for one class
  – Subsequent rules are designed for instances that are not covered by previous rules
  – But: order doesn't matter, because all the rules predict the same class
• The outer loop considers all classes separately
  – No order dependence is implied

Separate and conquer

• Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms:
  – First, identify a useful rule
  – Then, separate out all the instances it covers
  – Finally, "conquer" the remaining instances

FOIL Algorithm
(First Order Inductive Learner Algorithm)

Coverage or Accuracy?

• Consider two rules:
  – R1: correctly classifies 38 of the 40 instances it covers
  – R2: covers only two instances, which it classifies correctly
• Their accuracies are 95% and 100%, respectively
• R2 has greater accuracy than R1, but it is not the better rule, because of its small coverage
• Accuracy on its own is not a reliable estimate of rule quality
• Coverage on its own is not useful either

Consider Both Coverage and Accuracy

• If our current rule is R:

  IF condition THEN class = c

• We want to see if logically ANDing a given attribute test to condition would result in a better rule
• We call the new condition condition′, giving the potential new rule R′:

  IF condition′ THEN class = c

• In other words, we want to see whether R′ is any better than R

FOIL Information Gain

• FOIL_Gain (used in FOIL and RIPPER) assesses the information gained by extending the condition:

  FOIL_Gain = pos′ × ( log2(pos′ / (pos′ + neg′)) − log2(pos / (pos + neg)) )

• where
  – pos (neg) is the number of positive (negative) instances covered by R
  – pos′ (neg′) is the number of positive (negative) instances covered by R′
• It favors rules that have high accuracy and cover many positive instances
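A direct transcription of the formula (returning 0 when R′ covers no positives, where the gain is undefined):

from math import log2

def foil_gain(pos, neg, pos2, neg2):
    """Gain from extending R (covering pos/neg) to R' (covering pos2/neg2)."""
    if pos2 == 0:
        return 0.0
    return pos2 * (log2(pos2 / (pos2 + neg2)) - log2(pos / (pos + neg)))

# R covers 100 instances (40 positive); R' covers 30 (25 positive)
print(foil_gain(40, 60, 25, 5))  # = 25 × (log2(25/30) − log2(40/100)) ≈ 26.5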
Rule Generation

• To generate a rule:

while (true)
    find the best predicate p
    if FOIL_Gain(p) > threshold then add p to the current rule
    else break

[Figure: a rule grown over the positive and negative examples, specializing from A3 = 1 to A3 = 1 && A1 = 2 to A3 = 1 && A1 = 2 && A8 = 5]

Rule Pruning: FOIL Method

• The assessments of rule quality described above are made with instances from the training data
• Rule pruning is instead based on an independent set of instances, using:

  FOIL_Prune(R) = (pos − neg) / (pos + neg)

  – where pos (neg) is the number of positive (negative) instances of that set covered by R

• If FOIL_Prune is higher for the pruned version of R, prune R
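Its transcription, with an illustrative pruning check (the counts are made-up numbers):

def foil_prune(pos, neg):
    """FOIL_Prune(R), computed on a set held out from training;
    higher is better."""
    return (pos - neg) / (pos + neg)

# Full rule covers 30 pos / 10 neg; dropping its last test gives 36 / 10
full, pruned = foil_prune(30, 10), foil_prune(36, 10)
print(pruned > full)  # True: prefer the pruned rule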

References

References

• J. Han and M. Kamber, Data Mining: Concepts and Techniques, Elsevier Inc., 2006. (Chapter 6)

• I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Elsevier Inc., 2005. (Chapter 6)

The end