IME672 - Lecture 48


IME 672

Data Mining & Knowledge Discovery

Dr. Faiz Hamid


Department of Industrial & Management Engineering
Indian Institute of Technology Kanpur
Email: [email protected]
Rule-Based Classification
IF-THEN Rules for Classification
• Represent the knowledge in the form of IF-THEN rules

• R: IF age = youth AND student = yes THEN buys_computer = yes


– The “IF” part (or left side) of a rule is known as the rule antecedent or
precondition; conjunction of attribute tests
– The “THEN” part (or right side) is the rule consequent

• Can be generated either from a decision tree or directly from
the training data using a sequential covering algorithm

• If the rule antecedent holds true for a given tuple, we say that
the rule is satisfied and that the rule covers the tuple; the rule
is said to be fired or triggered
IF-THEN Rules for Classification
• Vertebrate Classification Problem
IF-THEN Rules for Classification
• Assessment of a rule R: coverage and accuracy
– ncovers = # of tuples covered by R
– ncorrect = # of tuples correctly classified by R
– Coverage(R) = ncovers / |D| : fraction of records in the data set D that
satisfy the antecedent of the rule
– Accuracy(R) = ncorrect / ncovers : fraction of the records satisfying the
antecedent that also satisfy the consequent of the rule
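As an illustration (not part of the lecture slides), coverage and accuracy can be computed directly from the definitions above; the tiny dataset and its attribute values below are invented for the example:

```python
def coverage_and_accuracy(antecedent, consequent, data):
    """Return (coverage, accuracy) of the rule IF antecedent THEN consequent.

    antecedent: dict of attribute tests, e.g. {"age": "youth", "student": "yes"}
    consequent: (attribute, value) pair predicted by the rule
    data:       list of dicts, each one labelled tuple
    """
    covered = [t for t in data if all(t.get(a) == v for a, v in antecedent.items())]
    n_covers = len(covered)                                   # tuples covered by R
    attr, val = consequent
    n_correct = sum(1 for t in covered if t.get(attr) == val)  # correctly classified
    cov = n_covers / len(data)                # coverage = ncovers / |D|
    acc = n_correct / n_covers if n_covers else 0.0  # accuracy = ncorrect / ncovers
    return cov, acc

# Invented 4-tuple dataset (not the lecture's table)
data = [
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "yes", "buys_computer": "no"},
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "no",  "buys_computer": "yes"},
]
cov, acc = coverage_and_accuracy({"age": "youth", "student": "yes"},
                                 ("buys_computer", "yes"), data)
# rule covers 2 of 4 tuples, classifying 1 of the 2 correctly
```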

• If more than one rule is triggered, need conflict resolution


– Size ordering: assigns the highest priority to the triggering rule that has
the “toughest” requirement (i.e., the most attribute tests)
– Rule ordering: rules prioritized beforehand
• class-based, rule-based
IF-THEN Rules for Classification
• Class-based ordering: classes are sorted in decreasing order of
prevalence or misclassification cost per class
• Rule-based ordering (decision list): rules are organized into
one long priority list, according to some measure of rule
quality or by experts

• What if no rule is satisfied by X?


– Set up a default rule to specify a default class, based on a training set
– May be the class in majority or the majority class of the tuples that
were not covered by any rule
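A minimal sketch of rule-based ordering (a decision list) with a default class, assuming rules are stored as (antecedent, class) pairs in priority order; the rules here are illustrative:

```python
def classify(decision_list, default_class, tuple_):
    """Try rules in priority order; the first rule whose antecedent matches
    fires. If no rule is satisfied, fall back to the default class."""
    for antecedent, label in decision_list:
        if all(tuple_.get(a) == v for a, v in antecedent.items()):
            return label
    return default_class

rules = [
    ({"age": "youth", "student": "yes"}, "yes"),  # more specific rule ranked first
    ({"age": "youth"}, "no"),
]
print(classify(rules, "yes", {"age": "youth", "student": "yes"}))  # first rule fires
print(classify(rules, "yes", {"age": "youth", "student": "no"}))   # second rule fires
print(classify(rules, "yes", {"age": "senior"}))                   # default class
```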
Rule Extraction from a Decision Tree
• Rules are easier to understand than large trees
• One rule is created for each path from the root to a leaf
• Each splitting criterion along a given path is logically ANDed to form the rule
antecedent (“IF” part)
• The leaf node holds the class prediction, forming the rule consequent
(“THEN” part)

• IF age = youth AND student = no THEN buys_computer = no


• IF age = youth AND student = yes THEN buys_computer = yes
• IF age = mid-age THEN buys_computer = yes
• IF age = senior AND credit_rating = fair THEN buys_computer = no
• IF age = senior AND credit_rating = excellent THEN buys_computer = yes
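The path-to-rule extraction described above can be sketched as follows, assuming the tree is stored as a nested dict; the tree below reproduces the slide's buys_computer example:

```python
def extract_rules(node, path=()):
    """Walk a decision tree and emit one IF-THEN rule per root-to-leaf path.
    Splitting criteria along the path are ANDed into the antecedent; the
    leaf's label becomes the consequent."""
    if "label" in node:  # leaf node
        tests = " AND ".join(f"{a} = {v}" for a, v in path)
        return [f"IF {tests} THEN buys_computer = {node['label']}"]
    rules = []
    for value, child in node["branches"].items():
        rules += extract_rules(child, path + ((node["attr"], value),))
    return rules

tree = {"attr": "age", "branches": {
    "youth": {"attr": "student", "branches": {
        "no": {"label": "no"}, "yes": {"label": "yes"}}},
    "mid-age": {"label": "yes"},
    "senior": {"attr": "credit_rating", "branches": {
        "fair": {"label": "no"}, "excellent": {"label": "yes"}}},
}}
rules = extract_rules(tree)   # one rule per leaf: five rules for this tree
for r in rules:
    print(r)
```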
Rule Extraction from a Decision Tree
• Rules extracted are mutually exclusive and exhaustive
• Mutually exclusive:
– no two rules will be triggered for the same tuple
– cannot have rule conflicts
• Exhaustive:
– one rule for each possible attribute–value combination
– each record is covered by at least one rule

• Since one rule is extracted per leaf, the set of rules is not much
simpler than the corresponding decision tree

• Rule pruning required


Rule Induction: Sequential Covering Algorithm
• Extracts rules directly from training data
• Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
• Rules are learned sequentially; each rule for a given class Ci will
cover many tuples of Ci but none (or few) of the tuples of other
classes
• Steps:
i. Rules are learned one at a time
ii. Each time a rule is learned, the tuples covered by the rule are removed;
otherwise, the next rule learned would be identical to the previous rule
iii. Repeat the process on the remaining tuples until the termination condition
• Termination condition
– when no more training examples, or
– when the quality of a rule returned is below a user-specified threshold
Basic Sequential Covering Algorithm

• Tuples of the class for which rules are learned are called positive tuples, while
the remaining tuples are negative
Basic Sequential Covering Algorithm
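A minimal sketch of the basic sequential covering loop, assuming tuples are dicts with a "class" key and that learn_one_rule greedily picks a single best attribute test (a simplification of the general greedy rule growth):

```python
def learn_one_rule(data, target_class):
    """Greedily pick the one attribute test with the highest accuracy on
    the given tuples. A one-conjunct sketch of the general procedure."""
    best, best_acc = None, -1.0
    candidates = {(a, v) for t in data for a, v in t.items() if a != "class"}
    for a, v in candidates:
        covered = [t for t in data if t.get(a) == v]
        acc = sum(t["class"] == target_class for t in covered) / len(covered)
        if acc > best_acc:
            best, best_acc = {a: v}, acc
    return best, best_acc

def sequential_covering(data, target_class, min_quality=0.0):
    """Learn rules for target_class one at a time; remove the covered tuples
    after each rule so the next rule is not identical to the previous one."""
    rules, remaining = [], list(data)
    while any(t["class"] == target_class for t in remaining):  # positives left
        rule, quality = learn_one_rule(remaining, target_class)
        if rule is None or quality < min_quality:              # termination
            break
        rules.append(rule)
        remaining = [t for t in remaining
                     if not all(t.get(a) == v for a, v in rule.items())]
    return rules

# Invented toy data: "pos" tuples all have x = "1"
data = [
    {"x": "1", "y": "a", "class": "pos"},
    {"x": "1", "y": "b", "class": "pos"},
    {"x": "2", "y": "a", "class": "neg"},
    {"x": "2", "y": "b", "class": "neg"},
]
learned = sequential_covering(data, "pos")
```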
How are Rules Learned?
• Start with the most general rule possible:
– IF {} THEN loan_decision = accept
• Add new attributes by adopting a greedy depth-first
strategy
– Pick the one that improves the rule quality most
– E.g. maximize rule’s accuracy
• Similar to situation in decision trees: problem of
selecting an attribute to split on
• The resulting rule should cover relatively more of the
“accept” tuples
Rule Learning

IF {} THEN class = a  →  IF x > 1.2 THEN class = a  →  IF x > 1.2 and y > 2.6 THEN class = a

• Possible rule set for class “b”


– IF x ≤ 1.2 THEN class = b
– IF x > 1.2 and y ≤ 2.6 THEN class = b

• Each new test reduces rule’s coverage
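The rule-growth figure can be recreated on invented 2-D points (class "a" points lie in the upper-right region): each added threshold test shrinks the rule's coverage while raising its accuracy for class "a":

```python
# Invented points loosely matching the figure, as (x, y, class) triples
points = [
    (0.5, 1.0, "b"), (1.0, 3.0, "b"), (1.5, 2.0, "b"),
    (1.5, 3.0, "a"), (2.0, 3.5, "a"), (2.5, 4.0, "a"),
]

def stats(rule):
    """(coverage count, accuracy) of a conjunction of tests for class 'a'."""
    covered = [(x, y, c) for x, y, c in points if all(test(x, y) for test in rule)]
    acc = sum(c == "a" for _, _, c in covered) / len(covered)
    return len(covered), acc

r0 = []                                               # IF {} THEN class = a
r1 = [lambda x, y: x > 1.2]                           # IF x > 1.2 THEN class = a
r2 = [lambda x, y: x > 1.2, lambda x, y: y > 2.6]     # ... and y > 2.6

# coverage drops 6 -> 4 -> 3 while accuracy rises 0.5 -> 0.75 -> 1.0
```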


Rule-Quality Measures
• Rule R1 correctly classifies 38 of the 40 tuples it
covers
• Rule R2 covers only two tuples, which it
correctly classifies
• R2 (100%) has greater accuracy than R1 (95%)
• R2 is not the better rule because of its small coverage
(Figure: rules for the class loan_decision = accept,
showing accept (a) and reject (r) tuples)

• Accuracy on its own is not a reliable estimate of rule quality


• Coverage on its own is not useful either
– for a given class we could have a rule that covers many tuples, most of which belong to
other classes!
• Need to consider rule quality measures which may integrate aspects of
accuracy and coverage
FOIL Gain
• Entropy - prefers rules that cover a large number of tuples of a
single class and few tuples of other classes
• Foil-gain (in FOIL & RIPPER): assesses the information gained by
extending the antecedent of a rule R to obtain a new rule R’

FOIL_Gain = pos’ * ( log2( pos’ / (pos’ + neg’) ) - log2( pos / (pos + neg) ) )

• where pos (neg) is the number of positive (negative) tuples covered by R, and
pos’ (neg’) is the number of positive (negative) tuples covered by R’
• Favors rules that have high accuracy and cover many positive
tuples

IF {} THEN class = a  →  IF x > 1.2 THEN class = a  →  IF x > 1.2 and y > 2.6 THEN class = a
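FOIL gain as defined above fits in a few lines; the counts in the usage example are invented (say, growing a rule over 3 positive and 3 negative tuples down to 3 positive and 1 negative):

```python
import math

def foil_gain(pos, neg, pos2, neg2):
    """Information gained by extending rule R (covering pos positive and
    neg negative tuples) to rule R' (covering pos2/neg2), per FOIL/RIPPER."""
    if pos2 == 0:
        return 0.0
    return pos2 * (math.log2(pos2 / (pos2 + neg2)) -
                   math.log2(pos / (pos + neg)))

gain = foil_gain(3, 3, 3, 1)   # hypothetical counts; positive gain since
                               # the extended rule is more accurate
```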
Rule Pruning
• Pruning = remove a conjunct (attribute test) from the rule
• Prune a rule, R, if the pruned version of R has greater quality, as
assessed on an independent set of tuples

FOIL_Prune(R) = (pos - neg) / (pos + neg)

where pos (neg) is the number of positive (negative) tuples covered by R
• If FOIL_Prune is higher for the pruned version of R, prune R
• These assessments are performed on a pruning set (validation
set), not the training set; otherwise the quality estimate is overfit
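A sketch of the pruning decision, assuming FOIL_Prune = (pos - neg) / (pos + neg) is evaluated on the pruning set; the tuple counts below are hypothetical:

```python
def foil_prune(pos, neg):
    """FOIL_Prune(R) = (pos - neg) / (pos + neg), computed on the pruning set."""
    return (pos - neg) / (pos + neg)

# Hypothetical pruning-set counts: removing one conjunct makes the rule
# cover more tuples (130 pos / 40 neg vs. 90 pos / 30 neg)
full   = foil_prune(90, 30)    # rule with the conjunct
pruned = foil_prune(130, 40)   # same rule with the conjunct removed
should_prune = pruned > full   # prune when the pruned version scores higher
```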
Likelihood Ratio Statistic
• A statistical test of significance
• Determines whether the apparent effect of a rule is attributable to chance
or instead indicates a genuine correlation between attribute values and classes
• The test compares the observed distribution among classes of the tuples
covered by a rule with the expected distribution that would result if the
rule made predictions at random

Likelihood_Ratio = 2 * Σ fi ln( fi / ei ),  summed over the m classes

• m is the number of classes
• For tuples satisfying the rule
– fi is the observed frequency of each class i among the tuples
– ei is the expected frequency of each class i if the rule made random predictions
• The statistic has a chi-square distribution with m-1 degrees of freedom
• The higher the likelihood ratio, the more likely it is that there is a
significant difference in the number of correct predictions made by the rule
compared to a “random guesser”
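The statistic can be computed as follows (natural logarithms, as required for the chi-square approximation); the observed counts are illustrative, e.g. a rule covering 38 tuples of one class and 2 of the other when random prediction would expect a 20/20 split:

```python
import math

def likelihood_ratio(observed, expected):
    """2 * sum_i f_i * ln(f_i / e_i) over the m classes; chi-square
    distributed with m - 1 degrees of freedom under the null hypothesis."""
    return 2 * sum(f * math.log(f / e)
                   for f, e in zip(observed, expected) if f > 0)

lr = likelihood_ratio([38, 2], [20, 20])
# far above 3.84, the 5% chi-square critical value for 1 degree of freedom,
# so the rule's effect is unlikely to be due to chance
```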
Rule-Quality Measures
• Consider the following pair of rules - R1: A → C and R2: A ∧ B → C
• Consider a validation set with 500 +ve examples and 500 -ve examples
• R1 covers 350 +ve examples and 150 -ve examples
• R2 covers 300 +ve examples and 50 -ve examples
• FOIL_Gain
– Rule R1 : pos’ = 350, neg’ = 150, pos = 500, neg = 500
FOIL_Gain = 350 * ( log2(350/500) - log2(500/1000) ) ≈ 169.9
– Rule R2 : pos’ = 300, neg’ = 50, pos = 500, neg = 500
FOIL_Gain = 300 * ( log2(300/350) - log2(500/1000) ) ≈ 233.3
– R2 achieves the higher FOIL_Gain
• FOIL_Prune
– Rule R1 : (350 - 150) / (350 + 150) = 0.40
– Rule R2 : (300 - 50) / (300 + 50) ≈ 0.71
– R2 also scores higher on FOIL_Prune
Rule-Quality Measures
• Likelihood Ratio
– Rule R1 :
• Expected number of +ve examples = 500/1000*(350+150) = 250
• Expected number of -ve examples = 500/1000*(350+150) = 250
• Likelihood ratio = 2 * [ 350 ln(350/250) + 150 ln(150/250) ] ≈ 82.3
– Rule R2 :
• Expected number of +ve examples = 500/1000*(300+50) = 175
• Expected number of -ve examples = 500/1000*(300+50) = 175
• Likelihood ratio = 2 * [ 300 ln(300/175) + 50 ln(50/175) ] ≈ 198.1
– R2 shows the stronger evidence of a genuine correlation
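The worked example above can be checked in code, using the measures as defined on the earlier slides; all three point to R2 as the better rule:

```python
import math

# Validation set: 500 positive and 500 negative examples.
# R1 covers 350 positive / 150 negative; R2 covers 300 positive / 50 negative.

def foil_gain(pos, neg, pos2, neg2):
    return pos2 * (math.log2(pos2 / (pos2 + neg2)) -
                   math.log2(pos / (pos + neg)))

def foil_prune(pos, neg):
    return (pos - neg) / (pos + neg)

def likelihood_ratio(observed, expected):
    return 2 * sum(f * math.log(f / e) for f, e in zip(observed, expected))

g1 = foil_gain(500, 500, 350, 150)              # R1 gain
g2 = foil_gain(500, 500, 300, 50)               # R2 gain
p1 = foil_prune(350, 150)                       # R1 prune score
p2 = foil_prune(300, 50)                        # R2 prune score
lr1 = likelihood_ratio([350, 150], [250, 250])  # R1 vs. random split
lr2 = likelihood_ratio([300, 50], [175, 175])   # R2 vs. random split
```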
Rule-Based Classifiers
• Advantages:
– As highly expressive as decision trees
– Easy to interpret
– Easy to generate
– Can classify new instances rapidly
– Performance comparable to decision trees
– Can easily handle missing values and numeric attributes
