A Generic Framework For Rule-Based Classification
1 Introduction
Many pattern extraction methods have been proposed in the field of constraint-based
data mining. On the one hand, individual patterns are declaratively specified
under a local model using various classes of constraints. Local pattern mining
algorithms, e.g. typical association rule mining algorithms, extract individual
patterns, each of which describes some portion of the underlying database from
which it has been generated. The descriptive properties of the local patterns are
independent of each other. On the other hand, any arbitrary combination of
individual patterns can be regarded as a global pattern [14]. In a classification
task, one must also describe how the global pattern is applied to comply with
the user's request, i.e. class label prediction for unclassified data objects.
Classification is therefore a predictive global modeling task among data mining
problems. Given a set of (labeled) training examples, i.e. a training database,
the classification task constructs a classifier. A classifier is a global model that
not only describes the whole training database with some level of accuracy, but
is also used to predict the class label of unlabeled data objects. Intuitively, the
problem of finding a (minimal) set of patterns that describes the whole training
database while achieving maximum accuracy is intractable; it remains intractable
even for the binary-class classification problem [10] w.r.t. the domain size of the
training data set. Moreover, although local pattern extraction methods can be
adapted and exploited for classifier construction, the problem remains intractable
for general measure functions: when the pattern evaluation measure is arbitrary,
it lacks the useful properties that could be exploited to reduce the search space
of patterns.
2 Formal Framework
In this section, we give the formal definitions of the notions used throughout the
paper to describe the rule-based classification process. This process consists of
using a training set of labeled objects (called a dataset of examples in what follows,
see Section 2.1), from which classification rules are extracted (see Section 2.2) by
means of basic operations (see Section 2.3) in order to build a classifier (see
Section 2.4), which is then used to predict the class label of a given unlabeled
object (see Section 2.5).
Data examples Let Class be a set of class labels. A data example (example for
short) over a given schema A is a pair ⟨o, c⟩ where o ∈ O, A is the schema of o
and c ∈ Class. Given an example e = ⟨o, c⟩, we note cl(e) = c. In what follows,
we consider examples over a fixed schema A, and we denote by E the set of all
examples over A.
A data set E is a subset of E, for which |{c | ∃o ∈ O, ⟨o, c⟩ ∈ E}| is denoted
by NbClass. Given a class label cj ∈ Class and a data set E ⊆ E, we denote by
Ej the set of examples in E of class cj, i.e., Ej = {e ∈ E | cl(e) = cj}.
Rules Given the set of examples E over a schema A, a rule is a pair ⟨o, c⟩, where
o ∈ O, the schema of o is a schema {A1, . . . , An} ⊆ A and c ∈ Class. A rule is
noted o → c, and we use |r| to denote |sch(o)|. We denote by R the set of rules.
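To make these notions concrete, the following is a minimal sketch, in Python, of one possible encoding of examples and rules. The attribute/value representation, the covers test and all names are illustrative assumptions made for the sketch, not part of the framework.

```python
# Illustrative encoding (an assumption, not the paper's): an object over a schema
# is a set of attribute/value pairs; a rule's body ranges over a sub-schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    obj: frozenset   # attribute/value pairs of the object o
    cls: str         # class label cl(e)

@dataclass(frozen=True)
class Rule:
    body: frozenset  # attribute/value pairs over a sub-schema of A
    cls: str         # predicted class label (the rule reads body -> cls)

def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example when its body is contained in the object."""
    return rule.body <= example.obj

def size(rule: Rule) -> int:
    """|r|: the number of attributes in the rule's schema."""
    return len(rule.body)

# Example: the rule {outlook=sunny} -> play covers the object below.
e = Example(frozenset({("outlook", "sunny"), ("windy", "no")}), "play")
r = Rule(frozenset({("outlook", "sunny")}), "play")
assert covers(r, e) and size(r) == 1
```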
We now introduce the two basic operations used in the description of the classification
process. Note that we give a generic definition of these operations, in the sense
that they are given for any language L.
The first operation is the theory computation operation Th, which extracts
from a data set the elements of a given language L satisfying a given selection
predicate.
The second operation, Topk, extracts the k best elements of a language L w.r.t.
a given order on L. This order may depend on a given data set; to this end we
first define the notion of a data dependent order.
Data dependent order (E-order) A data dependent order (or E-order) on a given
set L is a relation α ⊆ L × L × 2^E such that, for a given E ∈ 2^E, the relation
α[E] = {⟨ϕ, ϕ′⟩ | ⟨ϕ, ϕ′, E⟩ ∈ α} is an order on L. In what follows, we note
α(ϕ, ϕ′, E) = true if ⟨ϕ, ϕ′, E⟩ ∈ α.
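As an illustration, the two operations and an E-order can be sketched as follows in Python. The concrete signatures assumed here (a predicate q(phi, E) returning a boolean, and a comparison order(phi1, phi2, E) that holds when the first element is strictly better) are choices made for the sketch only.

```python
# Minimal sketch of Th and Top_k for an arbitrary language L given as an iterable.
from functools import cmp_to_key

def theory(language, q, E):
    """Th(L, q, E): the elements of L satisfying the selection predicate q on E."""
    return [phi for phi in language if q(phi, E)]

def topk(language, order, E, k):
    """Top_k(L, order, E): the k best elements of L w.r.t. the E-order `order`."""
    def cmp(a, b):
        if order(a, b, E):      # a strictly better than b for this data set
            return -1
        if order(b, a, E):
            return 1
        return 0
    return sorted(language, key=cmp_to_key(cmp))[:k]

# Example E-order on integers: larger values come first (E is simply ignored here).
best_two = topk([3, 9, 1, 7], lambda a, b, E: a > b, E=None, k=2)
assert best_two == [9, 7]
```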
Classifier inclusion Let C1 = ⟨R1, <R1⟩ and C2 = ⟨R2, <R2⟩ be two classifiers.
We say that C1 is included in C2, noted C1 ⊑ C2, if R1 ⊆ R2 and <R1 ⊆ <R2.
Concatenation Let C1 = ⟨R1, <R1⟩ and C2 = ⟨R2, <R2⟩ be two classifiers such
that R1 ∩ R2 = ∅. The concatenation of the two classifiers is defined as C1 C2 =
⟨R1 ∪ R2, <⟩, where < = <R1 ∪ <R2 ∪ (R1 × R2). Note that the concatenation
operator does not commute, since × does not commute.
Union Let C1 = ⟨R1, <R1⟩ and C2 = ⟨R2, <R2⟩ be two classifiers such that
R1 ∩ R2 = ∅. The union of the two classifiers is defined as C1 ∪ C2 = ⟨R1 ∪ R2, <∪⟩,
where <∪ = <R1 ∪ <R2.
Difference Let C1 = ⟨R1, <R1⟩ and C2 = ⟨R2, <R2⟩ be two classifiers. The
difference C1 \ C2 is the classifier ⟨R1 \ R2, <R1 \ {⟨r, r′⟩ ∈ <R1 | r ∈ R2 ∨ r′ ∈ R2}⟩.
By abuse of notation we sometimes write C1 \ R2 to denote C1 \ ⟨R2, ∅⟩ and
R2 \ C1 to denote R2 \ R1.
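Under an illustrative encoding of a classifier as a pair of a rule set and a set of order pairs (an assumption made only for the sketch), the three operators can be written directly:

```python
# A classifier is sketched as (rules, order) where `order` is the set of pairs
# (r, r') such that r <_R r'.  All encodings here are illustrative assumptions.

def concat(c1, c2):
    """Concatenation: disjoint rule sets; every rule of C1 precedes every rule of C2."""
    (r1, o1), (r2, o2) = c1, c2
    assert not (r1 & r2), "concatenation is only defined for disjoint rule sets"
    return (r1 | r2, o1 | o2 | {(a, b) for a in r1 for b in r2})

def union(c1, c2):
    """Union: disjoint rule sets; only the two original orders are kept."""
    (r1, o1), (r2, o2) = c1, c2
    assert not (r1 & r2)
    return (r1 | r2, o1 | o2)

def difference(c1, c2):
    """Difference: drop the rules of C2 and every order pair involving them."""
    (r1, o1), (r2, _) = c1, c2
    return (r1 - r2, {(a, b) for (a, b) in o1 if a not in r2 and b not in r2})
```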
2.5 Prediction
Finally, we give the definitions used for describing the prediction process. A
class prediction operator is a mapping from O × C to E. Below are two examples
of class prediction operators.
Best rule prediction Let C = ⟨R, <R⟩ be a classifier such that <R is a total order,
and let o be an object. BestRule(o, C) = ⟨o, cl(rp)⟩ where rp = max<R {r ∈ R}.
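A sketch of best-rule prediction follows, reusing the Rule encoding sketched earlier. It additionally assumes, as in CBA-style classifiers, that only rules covering the object are candidates and that the total order is given as a comparison function; both are assumptions for the sketch.

```python
def best_rule_predict(obj, classifier, covers, better):
    """Return the class of the <_R-maximal rule covering obj (None if no rule covers it).
    `better(r1, r2)` is assumed to return True when r1 is above r2 in the total order."""
    rules, _ = classifier
    candidates = [r for r in rules if covers(r, obj)]
    if not candidates:
        return None     # typically handled by a default rule appended to the classifier
    best = candidates[0]
    for r in candidates[1:]:
        if better(r, best):
            best = r
    return best.cls
```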
– R is a rule language.
– E is a training database.
– K is the number of rules that should be extracted at each iteration of the
algorithm.
– OrdGen is a function used to generate an E-order on R from a classifier
(i.e., mainly a set of rules). For instance, if C is a classifier, and E is the
set of examples not yet covered by the rules of C, OrdGen(C) = α where α
is such that, for every r, r′ ∈ R, α(r, r′, E) = true iff the confidence of r is
greater than the confidence of r′ or, if they are equal, the size of r is smaller
than the size of r′.
– PredGen is a function used to describe a selection predicate on R under
which the extraction of a set of rules takes place. For example, if C is
a classifier and E is the set of examples not yet covered by the rules of C,
PredGen(C) = q where q is such that, for every r ∈ R, we have q(r, E) = true
iff the support of r is greater than a given threshold.
– V is a function used to describe a selection predicate on C, in order to
evaluate the quality (or accuracy) of a classifier w.r.t. the whole training
dataset. For example, if C is a classifier and E a dataset, V(C, E) can be
such that it outputs true iff C covers all examples of E.
– O is an E-order on C. For example, if C1 and C2 are two classifiers and E
is a dataset, O(C1, C2, E) = true iff the number of examples covered by C1
is greater than the number covered by C2.
At each iteration, the classifier built during the previous steps is extended with the
newly extracted best rules along with the order relation on them (Line 7). Finally,
out of all the classifiers constructed during the loop, the best one w.r.t. O is
returned (Lines 9 and 10).
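Since the pseudo-code of ICCA is referred to above only by line numbers, the following Python sketch merely illustrates the overall loop, reusing the helpers sketched earlier (theory, topk, concat); the stopping policy, the empty-order extension and the classifier encoding are assumptions made for the sketch.

```python
def icca(R, E, K, OrdGen, PredGen, V, O, theory, topk, concat):
    """Illustrative loop: repeatedly extract the K best admissible rules, extend the
    classifier, and finally return the best classifier seen w.r.t. the E-order O."""
    current = best = (frozenset(), frozenset())        # empty classifier <{}, {}>
    while not V(current, E):                           # stop once current satisfies V
        q = PredGen(current)                           # selection predicate for this step
        alpha = OrdGen(current)                        # E-order for this step
        candidates = [r for r in theory(R, q, E) if r not in current[0]]
        new_rules = topk(candidates, alpha, E, K)      # the K best rules w.r.t. alpha
        if not new_rules:
            break                                      # nothing admissible is left
        # the order among the newly added rules is omitted here for brevity
        current = concat(current, (frozenset(new_rules), frozenset()))
        if O(current, best, E):                        # keep the best classifier so far
            best = current
    return best
```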
The proposed operators and ICCA uniformly describe, in as declarative a fashion
as possible, various classification approaches and algorithms, each having its
own requirements and properties. To the best of our knowledge, this is the first
time that all the related definitions are formally specified in the classification
context. In order to illustrate the generality of our framework, the next section
shows how our approach integrates and represents the requirements of different
classification methods using ICCA. Optimization aspects are also discussed in
Section 5.
4.1 AQ
4.2 CN2
By comparison with AQ, the CN2 algorithm directly builds one classifier for
multi-class problems. Formally, the CN2 classifier, denoted by CCN2, is defined
by:

CCN2 = C′CN2 Cdefault(E \ covered(E, C′CN2)) with
C′CN2 = ICCA(R, E, 1, OrdGenCN2, PredGenCN2, VCN2, OCN2)
where the functions OrdGenCN2, PredGenCN2, VCN2 and OCN2 are defined for
every classifier C, C1, C2 ∈ C as follows (a sketch in code is given after the list):
– OrdGenCN2(C) = αCN2 where for every r, r′ ∈ R and E ⊆ E, we have
αCN2(r, r′, E) = true iff:
• entropy(r, E′) < entropy(r′, E′), or
• entropy(r, E′) = entropy(r′, E′) and |r| < |r′|
where E′ = E \ covered(E, C).
– PredGenCN2(C) = qCN2 where for every r ∈ R and E ⊆ E, qCN2(r, E) =
true iff F(r, E′) ≥ τ, where F is a significance measure such as χ2 (or the
likelihood ratio), τ is a minimum threshold and E′ = E \ covered(E, C).
– For every E ⊆ E, VCN2(C, E) = true iff E = covered(E, C).
– For every E ⊆ E, OCN2(C1, C2, E) = true iff |C1| > |C2|.
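The two generator functions can be sketched as closures over the current classifier; the helpers entropy(rule, examples), covered(examples, classifier) and the significance measure F are assumed to be provided, and all names are illustrative.

```python
def ordgen_cn2(C, covered, entropy):
    """OrdGen_CN2(C): order rules by increasing entropy on the uncovered examples,
    breaking ties by smaller rule size."""
    def alpha(r1, r2, E):
        uncovered = [e for e in E if e not in covered(E, C)]
        h1, h2 = entropy(r1, uncovered), entropy(r2, uncovered)
        return h1 < h2 or (h1 == h2 and len(r1.body) < len(r2.body))
    return alpha

def predgen_cn2(C, covered, F, tau):
    """PredGen_CN2(C): keep rules whose significance on the uncovered examples
    reaches the threshold tau."""
    def q(r, E):
        uncovered = [e for e in E if e not in covered(E, C)]
        return F(r, uncovered) >= tau
    return q
```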
4.3 CBA
where the functions OrdGenCBA, PredGenCBA, VCBA, and OCBA are defined
for every classifier C, C1, C2 ∈ C by:
As for CN2, since the resulting classifier CCBA is totally ordered, CBA can use
the BestRule prediction operator. For every object o ∈ O, we have:
PredCBA(o, CCBA) = BestRule(o, CCBA).
CMAR and CorClass In this paper, we do not describe CMAR in detail
since CMAR is mainly an extension of CBA. In comparison with CBA, CMAR
selects only positively correlated rules (by a χ2 test). Moreover, instead of
removing an example as soon as it is covered by a rule, it only removes examples
that are covered by more than δ rules, where δ is a parameter of CMAR. Finally,
in the prediction operator used by CMAR, w(r, E) is the weighted χ2 instead of
the confidence; this measure is used to overcome the minority-class favoring
problem.
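The δ-coverage rule selection described above can be sketched as follows; the ordering of the rules by precedence and the covers test are assumptions carried over from the earlier sketches, not a transcription of CMAR itself.

```python
def delta_coverage(rules_by_precedence, examples, covers, delta):
    """Keep a rule if it still covers some remaining example; an example is only
    discarded once it has been covered by more than delta kept rules."""
    counts = {i: 0 for i in range(len(examples))}
    remaining = dict(enumerate(examples))
    kept = []
    for r in rules_by_precedence:
        hit = [i for i, e in remaining.items() if covers(r, e)]
        if hit:
            kept.append(r)
            for i in hit:
                counts[i] += 1
                if counts[i] > delta:
                    del remaining[i]
    return kept
```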
In comparison with CBA and CMAR, CorClass directly extracts the k rules
with the highest significance measure (χ2, information gain, etc.) on the data
set, meaning that ICCA iterates only once. On the other hand, the classifiers
built by CorClass are evaluated using different prediction operators (BestRule
or aggregate prediction with different weighted combinations of rules).
4.4 FOIL
Like the AQ algorithm, the FOIL algorithm has to be applied to each class for
multi-class problems. Moreover, in order to compare rules, FOIL uses a specific
gain measure, defined as follows.
Definition 2. (Foil Gain). Given two rules r1 and r2 such that r2 ≺R r1, the
gain to specialize r2 to r1 w.r.t. a set of examples E is defined by:

gain(r1, r2, E) = |P1| (log(|P1| / (|P1| + |N1|)) − log(|P2| / (|P2| + |N2|)))
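A direct transcription of this formula is sketched below; it assumes that |Pi| and |Ni| count the positive and negative examples of E covered by ri, since their definition is not restated here.

```python
import math

def foil_gain(p1, n1, p2, n2):
    """gain(r1, r2, E) with p_i = |P_i| and n_i = |N_i| (assumed counts of
    positive/negative examples covered by r_i)."""
    if p1 == 0 or p2 == 0:
        return 0.0            # guard: the logarithms are undefined without positives
    return p1 * (math.log(p1 / (p1 + n1)) - math.log(p2 / (p2 + n2)))

# Example: specializing from a rule covering 10+/10- to one covering 6+/1-.
assert foil_gain(6, 1, 10, 10) > 0
```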
Formally, the FOIL classifier, denoted by CFOIL, is specified by
CFOIL = ⋃_{j=1}^{NbClass} CFOIL^j, where for every j ∈ {1, . . . , NbClass}:

CFOIL^j = ICCA(R, E, 1, OrdGenFOIL^j, PredGenFOIL^j, VFOIL^j, OFOIL^j)
where the functions OrdGenFOIL^j, PredGenFOIL^j, VFOIL^j, and OFOIL^j are
defined for any classifiers C, C1, C2 by:
where r1 ∧ r2 is the most specific rule that is more general than r1 and r2
(r1 ∧ r2 = min≺R {r ∈ R | r ≺R r1, r ≺R r2}) and E′ = Ej \ covered(Ej, C).
– PredGenFOIL^j(C) = qFOIL^j where for every r ∈ R and E ⊆ E, we have
qFOIL^j(r, E) = true iff cl(r) = cj and |r| ≤ L where L is a parameter of
FOIL.
– For every E ⊆ E, VFOIL^j(C, E) = true iff Ej = covered(Ej, C).
– For every E ⊆ E, OFOIL^j(C1, C2, E) = true iff |C1| > |C2|.
Given a classifier CFOIL, FOIL can use the prediction operator defined for
every object o ∈ O by: PredictFOIL(o, CFOIL) = AggPred+∞,w,agg(o, CFOIL),
where for every rule r ∈ R and data set E ⊆ E, w(r, E) is the confidence of rule
r in E, and agg is the sum aggregation function.
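A sketch of this aggregated prediction follows. It assumes that AggPred sums, per class, the weight w(r, E) (here the confidence) of every rule of the classifier covering the object and predicts the class with the largest total; the function names are illustrative.

```python
from collections import defaultdict

def agg_predict(obj, rules, covers, confidence, E):
    """Sum the weight w(r, E) (here: confidence) of every covering rule per class
    and return the class with the highest aggregated weight."""
    totals = defaultdict(float)
    for r in rules:
        if covers(r, obj):
            totals[r.cls] += confidence(r, E)
    return max(totals, key=totals.get) if totals else None
```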
4.5 HARMONY
6 Conclusion
References
1. Peter Clark and Tim Niblett. The CN2 induction algorithm. Machine Learning, 3(4):261–
283, 1989.
2. William W. Cohen. Fast effective rule induction. In Armand Prieditis and Stuart
Russell, editors, Proc. of the 12th International Conference on Machine Learning,
pages 115–123, Tahoe City, CA, July 1995. Morgan Kaufmann.
3. Guozhu Dong, Xiuzhen Zhang, Limsoon Wong, and Jinyan Li. CAEP: Classification
by aggregating emerging patterns. In Discovery Science, pages 30–42, 1999.
4. Arno J. Knobbe and Eric K. Y. Ho. Pattern teams. In PKDD, pages 577–584,
2006.
5. Jinyan Li, Guozhu Dong, and Kotagiri Ramamohanarao. Instance-based classifi-
cation by emerging patterns. In PKDD ’00: Proceedings of the 4th European Con-
ference on Principles of Data Mining and Knowledge Discovery, pages 191–200,
London, UK, 2000. Springer-Verlag.
6. Jinyan Li, Guozhu Dong, and Kotagiri Ramamohanarao. Making use of the most
expressive jumping emerging patterns for classification. In Pacific-Asia Conference
on Knowledge Discovery and Data Mining, pages 220–232, 2000.
7. Wenmin Li, Jiawei Han, and Jian Pei. CMAR: Accurate and efficient classification
based on multiple class-association rules. In ICDM '01: Proceedings of the 2001
IEEE International Conference on Data Mining, pages 369–376, Washington, DC,
USA, 2001. IEEE Computer Society.
8. Bing Liu, Wynne Hsu, and Yiming Ma. Integrating classification and association
rule mining. In Knowledge Discovery and Data Mining, pages 80–86, 1998.
9. Ryszard S. Michalski. On the quasi-minimal solution of the general covering prob-
lem. In Proceedings of the V International Symposium on Information Processing
(FCIP 69), Switching Circuits, volume A3, pages 125–128, 1969.
10. Yasuhiko Morimoto, Takeshi Fukuda, Hirofumi Matsuzawa, Takeshi Tokuyama,
and Kunikazu Yoda. Algorithms for mining association rules for binary segmen-
tations of huge categorical databases. In VLDB '98: Proceedings of the 24th In-
ternational Conference on Very Large Data Bases, pages 380–391, San Francisco,
CA, USA, 1998. Morgan Kaufmann Publishers Inc.
11. J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
12. J. Ross Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Pub-
lishers Inc., San Francisco, CA, USA, 1993.
13. J. Ross Quinlan and R. Mike Cameron-Jones. FOIL: A midterm report. In Machine
Learning: ECML-93, European Conference on Machine Learning, Proceedings, vol-
ume 667, pages 3–20. Springer-Verlag, 1993.
14. Luc De Raedt and Albrecht Zimmermann. Constraint-based pattern set mining.
In SDM, 2007.
15. Ryszard S. Michalski, Igor Mozetic, Jiarong Hong, and Nada Lavrac. The AQ15 in-
ductive learning system: An overview and experiments. In Reports of the Intelligent
Systems Group, ISG 86-20, UIUCDCS-R-86-1260, 1986.
16. Jianyong Wang and George Karypis. HARMONY: Efficiently mining the best rules
for classification. In SDM, 2005.
17. Xiaoxin Yin and Jiawei Han. CPAR: Classification based on predictive association
rules. In SDM, 2003.
18. Albrecht Zimmermann and Luc De Raedt. CorClass: Correlated association rule
mining for classification. In Discovery Science, pages 60–72, 2004.