Tutorial
Classification Algorithms
Typical Algorithms:
• Decision trees
• Rule-based induction
• Neural networks
• Memory (case)-based reasoning
• Genetic algorithms
• Bayesian networks
Decision Tree: Example
Day Outlook Temperature Humidity Wind Play Tennis
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
[Decision tree figure: root attribute Outlook]
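The tree for this data splits on Outlook at the root. As an illustrative sketch (not part of the original slides; the encoding and function names are my own), the following Python snippet computes the information gain of each attribute on the Play Tennis table, the usual criterion by which a decision-tree learner such as ID3 would pick Outlook first.

```python
# Minimal sketch (not from the slides): information gain on the Play Tennis data,
# showing why Outlook is chosen as the root of the tree.
from collections import Counter
from math import log2

# (Outlook, Temperature, Humidity, Wind, PlayTennis) for days 1-14
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),      ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),  ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),   ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),("Rain", "Mild", "High", "Strong", "No"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(rows):
    # Entropy of the class label (last element of each row).
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(rows, attr_index):
    # Base entropy minus the weighted entropy of each attribute-value subset.
    total = len(rows)
    gain = entropy(rows)
    for value in {r[attr_index] for r in rows}:
        subset = [r for r in rows if r[attr_index] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

for i, name in enumerate(attrs):
    print(f"{name}: gain = {info_gain(data, i):.3f}")
# Outlook has the highest gain (about 0.247), so it becomes the root node.
```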
• Aim
– A small set of rules as classifier
– All rules satisfy minsup and minconf
• Syntax
– X → Y, where Y is restricted to the class attribute values
Why & How to Integrate
• Both classification rule mining and
association rule mining are
indispensable to practical applications.
• The integration is done by focusing on a
special subset of association rules
whose right-hand side is restricted to
the classification class attribute.
– CARs: class association rules
Associative Classification (AC) Problem
[Diagram: associative classification workflow (training data, AC algorithm, frequent ruleitems, evaluation)]
Rule Generator: Basic Concepts
• Ruleitem
– <condset, y>: condset is a set of items, y is a class label
– Each ruleitem represents a rule: condset → y
• condsupCount
– The number of cases in D that contain condset
• rulesupCount
– The number of cases in D that contain the condset and are labeled with class y
• Support = (rulesupCount / |D|) × 100%
• Confidence = (rulesupCount / condsupCount) × 100%
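As a minimal sketch of these definitions (not from the slides; the data layout and helper name are assumptions), the two counts and the two percentages can be computed directly over a dataset D of (items, class) cases:

```python
# Minimal sketch (names are illustrative): counting condsupCount and rulesupCount
# for one ruleitem <condset, y> over a dataset D of (items, class_label) cases.
def counts(D, condset, y):
    condsup = sum(1 for items, _ in D if condset <= items)
    rulesup = sum(1 for items, label in D if condset <= items and label == y)
    return condsup, rulesup

D = [({"A1", "B1"}, "c1"), ({"A1", "B1"}, "c1"), ({"A1", "B1"}, "c2"),
     ({"A1"}, "c1")] + [({"B2"}, "c2")] * 6          # |D| = 10
condsup, rulesup = counts(D, {"A1", "B1"}, "c1")
support = rulesup / len(D) * 100                      # 20.0 %
confidence = rulesup / condsup * 100                  # 66.7 %
print(support, round(confidence, 1))
```

These are the same numbers as the worked example on the next slide.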
RG: Basic Concepts (Cont.)
• Frequent ruleitems
– A ruleitem is frequent if its support is above
minsup
• Accurate rule
– A rule is accurate if its confidence is above
minconf
• Possible rule
– For all ruleitems that have the same condset, the
ruleitem with the highest confidence is the
possible rule of this set of ruleitems.
• The set of class association rules (CARs)
consists of all the possible rules (PRs) that
are both frequent and accurate.
RG: An Example
• A ruleitem: <{(A,1), (B,1)}, (class,1)>
– assume that
• the support count of the condset (condsupCount) is 3,
• the support count of this ruleitem (rulesupCount) is 2, and
• |D| = 10
– then the rule (A,1), (B,1) → (class,1) has
• support = 20% ((rulesupCount / |D|) × 100%)
• confidence = 66.7% ((rulesupCount / condsupCount) × 100%)
RG: The Algorithm
1 F1 = {large 1-ruleitems};
2 CAR1 = genRules(F1);
3 prCAR1 = pruneRules(CAR1); // count the item and class occurrences to determine the frequent 1-ruleitems, then prune
4 for (k = 2; Fk-1 ≠ Ø; k++) do
5   Ck = candidateGen(Fk-1); // generate the candidate ruleitems Ck using the frequent ruleitems Fk-1
6   for each data case d ∈ D do // scan the database
7     Cd = ruleSubset(Ck, d); // find all the ruleitems in Ck whose condsets are supported by d
8     for each candidate c ∈ Cd do
9       c.condsupCount++;
10      if d.class = c.class then c.rulesupCount++; // update the support counts of the candidates in Ck
11    end
12  end
RG: The Algorithm (cont.)
13  Fk = {c ∈ Ck | c.rulesupCount ≥ minsup}; // select the new frequent ruleitems to form Fk
14  CARk = genRules(Fk); // select the ruleitems that are both frequent and accurate
15  prCARk = pruneRules(CARk);
16 end
17 CARs = ∪k CARk;
18 prCARs = ∪k prCARk;
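The following is a compact, hedged Python sketch of this rule-generator loop. It keeps the Apriori-style structure above but simplifies several details: candidateGen is a plain join without subset pruning, pruneRules and the possible-rule selection are omitted, and the data layout ((items, label) cases with minsup/minconf given as fractions) is my own assumption rather than the paper's.

```python
# Simplified sketch of the CAR generator loop (not the exact candidateGen /
# pruneRules of the paper). Items are hashable values such as (attribute, value).
from itertools import combinations
from collections import defaultdict

def gen_cars(D, minsup, minconf):
    n = len(D)
    cars = []                                  # list of (condset, class, sup, conf)

    def count(condsets):
        # One pass over D: condsupCount and per-class rulesupCount for each condset.
        tbl = {cs: [0, defaultdict(int)] for cs in condsets}
        for items, label in D:
            for cs, rec in tbl.items():
                if cs <= items:
                    rec[0] += 1
                    rec[1][label] += 1
        return tbl

    def collect(tbl):
        # Keep frequent ruleitems; emit the accurate ones as CARs.
        frequent = set()
        for cs, (condsup, by_class) in tbl.items():
            for y, rulesup in by_class.items():
                if rulesup / n >= minsup:
                    frequent.add(cs)
                    if rulesup / condsup >= minconf:
                        cars.append((cs, y, rulesup / n, rulesup / condsup))
        return frequent

    # F1: frequent 1-ruleitems
    singles = {frozenset([i]) for items, _ in D for i in items}
    Fk = collect(count(singles))

    k = 2
    while Fk:
        # candidateGen (simplified): join frequent condsets into k-item condsets.
        Ck = {a | b for a, b in combinations(Fk, 2) if len(a | b) == k}
        Fk = collect(count(Ck))
        k += 1
    return cars
```

Each returned tuple corresponds to one class association rule: its condset, class label, support and confidence.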
Class Builder M1: Basic Concepts
• Given two rules ri and rj, define: ri ≻ rj (ri precedes rj) if
– The confidence of ri is greater than that of rj, or
– Their confidences are the same, but the support of ri is greater than that of rj, or
– Both the confidences and supports are the same, but ri is generated earlier than rj.
• Our classifier is of the following format:
– <r1, r2, …, rn, default_class>,
• where ri ∈ R, and ra ≻ rb if b > a
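A small sketch of this precedence relation as a sort key (the Rule fields and names below are illustrative assumptions, not from the slides):

```python
# Sketch: sorting rules by precedence "≻" (higher confidence first, then higher
# support, then the rule that was generated earlier).
from dataclasses import dataclass, field
from itertools import count

_order = count()

@dataclass
class Rule:
    condset: frozenset
    label: str
    support: float
    confidence: float
    order: int = field(default_factory=lambda: next(_order))  # generation order

def sort_rules(R):
    return sorted(R, key=lambda r: (-r.confidence, -r.support, r.order))
```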
M1: Three Steps
The basic idea is to choose a set of high-precedence
rules in R to cover D.
• Sort the set of generated rules R.
• Select rules for the classifier from R following
the sorted sequence and put them in C.
– Each selected rule has to correctly classify at
least one additional case.
– Also select a default class and compute the errors.
• Discard those rules in C that don’t improve
the accuracy of the classifier.
– Locate the rule with the lowest error rate and
discard the rest of the rules in the sequence.
M1: Algorithm
1 R = sort(R); // Step 1: sort R according to the precedence relation "≻"
2 for each rule r ∈ R in sequence do
3   temp = Ø;
4   for each case d ∈ D do // go through D to find the cases covered by rule r
5     if d satisfies the conditions of r then
6       store d.id in temp and mark r if it correctly classifies d;
7   if r is marked then
8     insert r at the end of C; // r is a potential rule because it correctly classifies at least one case d
9     delete all the cases with the ids in temp from D;
10    select a default class for the current C; // the majority class in the remaining data
11    compute the total number of errors of C;
12  end
13 end // Step 2
14 Find the first rule p in C with the lowest total number of errors and drop all the rules after p in C;
15 Add the default class associated with p to the end of C, and return C (our classifier). // Step 3
M1: Two conditions it satisfies
AC Ruleitem: <Itemset, Class, Support, Confidence>
• Support = 5%, confidence = 40%
• Number of data sets: 12-16 UCI data sets
Algorithms used:
• CBA (AC algorithm)
• MMAC (AC algorithm)
• Decision Tree algorithms (C4.5)
• Covering algorithms (RIPPER)
• Hybrid Classification algorithm (PART)
[Figure: Accuracy (%) for PART, RIPPER, CBA and MMAC on UCI data sets]
Comparison between AC algorithms on 12
UCI data sets
[Figure: comparison (%) of CBA, CMAR, CPAR and MCAR across the 12 data sets]
Potential Applications for
Associative Classification
• Text Categorisation
Hyperheuristics
[Diagram: a hyperheuristic performs heuristic choice over the problem domain]
Benefits of Hyperheuristics
• Low-level heuristics are easy to implement
• Objective measures may be easy to implement; they should be present to improve decision quality
• Rapid prototyping: the time to a first solution is low
Concrete example
• Organising meetings at a sales summit
• Low level heuristics:
– Add meeting, delete meeting, swap meeting,
add delegate, remove delegate, etc.
• Objectives:
– Minimise delegates
– Maximise supplier meetings
Concrete Example
• Hyperheuristic based on the exponential
smoothing forecast of performance, compared to
simple restarting approaches
• Result: 99 delegates reduced to 72 delegates
with improved schedule quality for both
delegates and suppliers
• Compares favourably with a bespoke
metaheuristic (simulated annealing) approach
• Fast to implement and easy to modify
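As an illustration of the idea only (the update rule, the 0.3 smoothing factor and all names below are my assumptions, not the authors' implementation), a hyperheuristic driven by an exponentially smoothed forecast of each low-level heuristic's recent improvement might look like this:

```python
# Illustrative sketch: choose a low-level heuristic (LLH) by the exponentially
# smoothed forecast of the improvement it has delivered so far (minimisation).
def run_hyperheuristic(llhs, solution, evaluate, iterations=1000, alpha=0.3):
    """llhs: list of functions mapping a solution to a new candidate solution."""
    forecast = [1.0] * len(llhs)            # optimistic initial forecast per LLH
    best, best_score = solution, evaluate(solution)
    for _ in range(iterations):
        i = max(range(len(llhs)), key=forecast.__getitem__)  # best forecast wins
        candidate = llhs[i](best)
        cand_score = evaluate(candidate)
        improvement = best_score - cand_score                # > 0 means better
        # Exponential smoothing: blend the latest observation with the history.
        forecast[i] = alpha * improvement + (1 - alpha) * forecast[i]
        if improvement > 0:
            best, best_score = candidate, cand_score
    return best
```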
Other applications
• Timetabling mobile trainers
• Nurse rostering
• Scheduling project meetings
• Examination timetabling
Other Hyperheuristics
• Genetic Algorithms
– Chromosomes represent sequences of low
level heuristics
– Evolutionary ability to cope with changing
environments useful
• Forecasting approaches
• Genetic Programming approaches
• Artificial Neural Network approaches
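A minimal sketch of the GA variant mentioned above, in which a chromosome is simply a sequence of low-level heuristic indices applied in order (population size, operators and all names are illustrative assumptions):

```python
# Sketch: GA hyperheuristic where each chromosome is a sequence of LLH indices
# and fitness is the quality of the solution obtained by applying that sequence.
import random

def ga_hyperheuristic(llhs, initial_solution, evaluate,
                      pop_size=20, chrom_len=15, generations=50):
    def fitness(chrom):
        sol = initial_solution
        for i in chrom:
            sol = llhs[i](sol)            # apply LLHs in chromosome order
        return evaluate(sol)              # lower is better (minimisation)

    pop = [[random.randrange(len(llhs)) for _ in range(chrom_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]     # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, chrom_len)
            child = a[:cut] + b[cut:]     # one-point crossover
            j = random.randrange(chrom_len)
            child[j] = random.randrange(len(llhs))   # point mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)
```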
Role of Data Mining (AC) in Hyperheuristics
• Problem: assume there are 10 LLHs; then at each choice point we have to test all available LLHs to select just a single one to apply.
• Apply a DM (AC) algorithm to past choice-point records to build a classifier (model).
Training data:
LLH1 LLH2 LLH3 class
10    2    2   48
 2   20    2   73
48   48   70    2
10    2    2   48
10    2   70   10
48   48    2   73
 2   20    2   73
New case to classify:
LLH1 LLH2 LLH3 class
10    2    2    ?
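A small sketch of how the learned model would then be used at a choice point (the feature encoding and function name are assumptions): instead of testing all 10 LLHs, the classifier built from past records predicts the class value for the new, unlabeled case directly.

```python
# Sketch (encoding and names assumed): use an ordered rule list, e.g. the C
# produced by the M1 sketch earlier, to label a new choice-point case instead of
# trying every available LLH.
def choose_llh(classifier, default_class, features):
    """classifier: rules sorted by precedence; features: set of (attr, value) items."""
    for rule in classifier:
        if rule.condset <= features:       # first matching rule fires
            return rule.label              # predicted class for this choice point
    return default_class                   # no rule matches: fall back to default

# e.g. the unlabeled row from the table above:
# choose_llh(C, default, {("LLH1", 10), ("LLH2", 2), ("LLH3", 2)})
```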
Text categorisation task
• Multi-label classification