Pruning Techniques for Class Association Rule Mining
Abstract: Association and classification are two important data mining techniques in the knowledge discovery process. Classification based on association rules generates a large set of class association rules for a given observation. Pruning unnecessary class association rules without losing classification accuracy is very important but also very challenging. The experimental results produced by different class association rule mining techniques have shown the need to consider more pruning parameters in order to reduce the size of the class association rule set. In this paper, we survey various strategies for class association rule pruning and study their effects, which enables us to extract an efficient, compact and high-confidence class association rule set.

Keywords: Class Association Rules, Data Mining, Knowledge Discovery Process, Pruning Strategies.
INTRODUCTION

The idea of using association rule mining in classification rule mining was first introduced in 1997 by [2] and [3], and it was named class association rule mining or associative classification. The first classifier based on association rules was CBA [4], given by Liu et al. in 1998. Later, improved classifiers were given by Li et al. (CMAR [5]) in 2001, Yin et al. (CPAR [6]) in 2003, and Thabtah et al. (MCAR [8]) in 2005. More research is going on to design even better classifiers.

The class association rule mining process can be decomposed into three parts. First, we find the frequent itemsets and the frequent class association rules; the provided support threshold value is used to remove the uninteresting elements. Second, we find the strong class association rules; the confidence threshold value helps to accomplish this task and prune the weak rules. Third, only a subset of the selected class association rules is used to design a classifier, and the rest of the class association rules are removed. Various methods [7, 9, 10, 11, 12] are used to perform the selection of class association rules.

Different associative classification techniques use several different approaches to discover, extract, store, rank and prune the redundant class association rules. The objective of this paper is to survey the different pruning methodologies involved in different classifiers in order to produce efficient classifiers.

CLASS ASSOCIATION RULE MINING

Association rule mining discovers associations among items, or attributes, from a transaction database (DT). Let I = {a1, a2, ..., an} be a set of items (database attributes), and T = {T1, T2, ..., Tm} be a set of transactions; DT is described by T, where each Ti ∈ T contains a set of items I' and I' ⊆ I. In association rule mining, two threshold values are usually used to determine the significance of an association rule.

(i) Support: the frequency with which the items occur or co-occur in T. A support threshold min_sup, defined by the user, is used to distinguish frequent itemsets from infrequent ones. A set of items S is called an itemset, where S ⊆ I and all a ∈ S co-occur in T. If the occurrence of some S in T exceeds min_sup, we say that S is a frequent itemset.

(ii) Confidence: represents how "strongly" an itemset X implies another itemset Y, where X, Y ⊆ I and X ∩ Y = ∅. A confidence threshold min_conf, supplied by the user, is used to distinguish high-confidence association rules from low-confidence association rules.

An association rule X ⇒ Y is valid when the support for the co-occurrence of X and Y exceeds min_sup and the confidence of the rule exceeds min_conf. Support is computed as the number of transactions containing X ∪ Y divided by the total number of transactions in DT, and confidence is computed as support(X ∪ Y) / support(X). Informally, X ⇒ Y can be interpreted as "if X exists, it is likely that Y also exists".
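For illustration (this example is not part of the original paper; the item names and transactions below are invented), support and confidence can be computed directly from a list of transactions:

    def support(itemset, transactions):
        # Fraction of transactions that contain every item of the itemset.
        count = sum(1 for t in transactions if itemset <= t)
        return count / len(transactions)

    def confidence(body, head, transactions):
        # support(X U Y) / support(X)
        return support(body | head, transactions) / support(body, transactions)

    # Illustrative transaction database DT (each transaction is a set of items).
    DT = [{"a1", "a2"}, {"a1", "a3"}, {"a1", "a2", "a3"}, {"a2", "a3"}]
    X, Y = {"a1"}, {"a2"}
    print(support(X | Y, DT))    # 0.5  -> support of the rule X => Y
    print(confidence(X, Y, DT))  # 0.666... -> confidence of the rule X => Y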
Association rule mining involves the following two steps:

Step (i): all frequent itemsets, i.e. itemsets whose support is not less than min_sup, are found.

Step (ii): all rules having confidence not less than min_conf are extracted.

The idea of class association rule mining is as follows. We are given a training database where each transaction contains all the features of an object in addition to the class label of that object. We can derive association rules that always have a class label as the consequent, i.e. the problem becomes one of finding a subset of the association rule set of the form X ⇒ C, where X is an association of some or all object features and C is the class label of that object.

Class association rule mining is thus a special case of association rule mining, and associative classification finds class association rules and extracts a subset of the class association rule set to predict the class of previously unseen data (test data) as accurately as possible with minimum effort. This subset of the class association rule set is called an associative classifier, or simply a classifier.

Let us illustrate class association rule mining with the training data shown in Table 1. It consists of three attributes X (X1, X2, X3), Y (Y1, Y2, Y3), Z (Z1, Z2, Z3) and two class labels (C1, C2). We assume min_sup = 30% and min_conf = 70%. Table 2 shows the strong class association rules along with their support and confidence. Table 2 also represents a classifier, as the rules are sorted according to the confidence they hold.
Table 1
Training Database

TID    X     Y     Z     Class
1      X2    Y2    Z1    C1
2      X1    Y2    Z2    C2
3      X1    Y2    Z3    C2
4      X3    Y1    Z2    C1
5      X1    Y1    Z3    C2
6      X2    Y3    Z1    C1
7      X3    Y3    Z2    C1
8      X1    Y1    Z1    C1
9      X2    Y3    Z1    C1
10     X1    Y1    Z1    C2
Table 2
Strong Class Association Rule Set

Antecedent    Consequent    Support    Confidence
X2            C1            3/10       3/3
Y3            C1            3/10       3/3
X2 Z1         C1            3/10       3/3
X1            C2            4/10       4/5
Z1            C1            4/10       4/5
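As a side illustration (not part of the original paper), the generation of strong class association rules from Table 1 can be sketched in Python by enumerating candidate rule bodies and checking the two thresholds; run on the Table 1 data it yields the five rules listed in Table 2:

    from itertools import combinations

    # Training data from Table 1: (set of attribute values, class label).
    records = [
        ({"X2", "Y2", "Z1"}, "C1"), ({"X1", "Y2", "Z2"}, "C2"),
        ({"X1", "Y2", "Z3"}, "C2"), ({"X3", "Y1", "Z2"}, "C1"),
        ({"X1", "Y1", "Z3"}, "C2"), ({"X2", "Y3", "Z1"}, "C1"),
        ({"X3", "Y3", "Z2"}, "C1"), ({"X1", "Y1", "Z1"}, "C1"),
        ({"X2", "Y3", "Z1"}, "C1"), ({"X1", "Y1", "Z1"}, "C2"),
    ]
    MIN_SUP, MIN_CONF = 0.3, 0.7
    items = sorted({v for feats, _ in records for v in feats})
    classes = sorted({c for _, c in records})

    rules = []
    for size in (1, 2, 3):                       # candidate rule bodies
        for body in map(set, combinations(items, size)):
            covered = [cls for feats, cls in records if body <= feats]
            if not covered:
                continue
            for cls in classes:
                hits = sum(1 for c in covered if c == cls)
                sup, conf = hits / len(records), hits / len(covered)
                if sup >= MIN_SUP and conf >= MIN_CONF:
                    rules.append((body, cls, sup, conf))

    for body, cls, sup, conf in sorted(rules, key=lambda r: -r[3]):
        print(sorted(body), "=>", cls, f"sup={sup:.2f}", f"conf={conf:.2f}")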
PRUNING OF CLASS ASSOCIATION RULES

In associative classification, pruning is used to remove infrequent itemsets and infrequent and weak class association rules. Pruning is also used to decide which rules from the strong class association rule set will be included in the final classifier, based on their confidence and coverage capacity. Pruning can be applied at three levels in the overall process of associative classification.

(a) Early Pruning: early pruning is used to avoid generating irrelevant candidate sets and to remove infrequent itemsets and infrequent class association rules. The support threshold value is used at this level.

(b) Intermediate Pruning: intermediate pruning is used to remove weak class association rules and retain only strong class association rules. The confidence threshold value is used at this stage to accomplish the task.

(c) Late Pruning: late pruning is used to extract only a selected subset of strong class association rules to form the final associative classifier. The strength and coverage capacity of the class association rules are used to do this.

In this section we go through the various pruning strategies involved in different associative classification systems at the different levels of the overall process.

Early Pruning Techniques

In early pruning, the support threshold value is used to remove the infrequent itemsets. The following methods can be used to reduce the effort involved in frequent itemset mining.

Handling Mutually Exclusive Items

This technique exploits the fact that the values of an attribute are mutually exclusive, i.e. an instance can contain only one value of each attribute. To implement it, the candidate generator must avoid producing candidates with more than one value for the same attribute. This strategy speeds up the second pass, and it also speeds up candidate generation on subsequent passes, since subsets of candidates pruned by this technique do not have to be explicitly checked by subset-support based pruning.
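A minimal sketch of such a candidate generator (illustrative Python; representing each item as an (attribute, value) pair is an assumption, not the paper's notation):

    from itertools import combinations

    def gen_candidates(frequent_k, k):
        """Join frequent k-itemsets into (k+1)-candidates, skipping any
        candidate that contains two values of the same attribute."""
        candidates = set()
        for a, b in combinations(frequent_k, 2):
            cand = frozenset(a | b)
            if len(cand) != k + 1:
                continue
            attrs = [attr for attr, _ in cand]
            if len(attrs) == len(set(attrs)):      # all attributes distinct
                candidates.add(cand)
        return candidates

    # Items are (attribute, value) pairs, e.g. ("X", "X2") from Table 1.
    f1 = [frozenset({("X", "X2")}), frozenset({("X", "X1")}), frozenset({("Z", "Z1")})]
    print(gen_candidates(f1, 1))
    # {("X","X1"),("X","X2")} is never produced, because X1 and X2 are
    # mutually exclusive values of the same attribute X.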
Use of Infrequent Itemsets

This technique is based on the fact that an itemset cannot be frequent if any of its subsets is infrequent. An infrequent itemset can therefore be used to establish that several candidates in subsequent passes will be infrequent, without calculating their support. This saves much of the effort involved in counting the support of itemsets that can never be frequent; it may be better to refer to this task as "negative itemset mining". A related optimization is to remove from the set of frequent itemsets any set having a subset with equivalent support before forming the next set of candidates.
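The subset check can be sketched as follows (illustrative Python, not from the paper): before counting a candidate's support, every (k-1)-subset is looked up among the frequent itemsets of the previous pass, and the candidate is discarded as soon as one subset is missing:

    from itertools import combinations

    def prune_by_infrequent_subsets(candidates, frequent_prev):
        """Keep only candidates whose every (k-1)-subset was frequent."""
        kept = []
        for cand in candidates:
            subsets = combinations(cand, len(cand) - 1)
            if all(frozenset(s) in frequent_prev for s in subsets):
                kept.append(cand)
        return kept

    frequent_2 = {frozenset({"X2", "Z1"}), frozenset({"X2", "Y3"})}
    candidates_3 = [frozenset({"X2", "Y3", "Z1"})]
    # {"Y3", "Z1"} was not frequent, so the 3-candidate is pruned without counting.
    print(prune_by_infrequent_subsets(candidates_3, frequent_2))   # []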
Other Techniques

Some other techniques can also be used to reduce the candidate sets. The algorithm in [17] reduces the candidate itemsets by removing those itemsets that contain more items than the available number of attributes. Algorithms such as [18] do not generate candidate itemsets at all; they use a different data structure, the FP-tree, to generate the frequent itemsets. Further techniques that prune the infrequent itemsets more efficiently can also be investigated.

Intermediate Pruning Techniques

Intermediate pruning uses the confidence threshold value to remove the weak class association rules. In addition, the following methods can be used to reduce the size of the class association rule set without reducing its classification strength.

Removing Redundant Class Association Rules

The idea behind this technique is to exploit the fact that if a rule R meets the confidence threshold value, then any rule containing R and having confidence lower than R will apply only to instances already covered. Such rules are termed redundant class association rules. Redundant rule pruning has been reported in [7]. It works as follows: let R ⇒ C be a general rule; any rule R' ⇒ C such that R ⊂ R' and R' ⇒ C has lower confidence than R ⇒ C is redundant and is removed from the class association rule set. This method significantly reduces the size of the class association rule set and minimizes rule redundancy. Algorithms including [7, 13] have used the pruning of redundant class association rules; they perform such pruning immediately after a rule is inserted into a compact data structure called the CR-tree.
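A hedged sketch of redundant-rule pruning (illustrative Python, using the common variant in which a specialised rule is dropped when a more general rule of the same class is at least as confident):

    def prune_redundant(rules):
        """rules: list of (body_frozenset, cls, confidence) triples."""
        kept = []
        for body, cls, conf in rules:
            redundant = any(
                g_cls == cls and g_body < body and g_conf >= conf
                for g_body, g_cls, g_conf in rules
            )
            if not redundant:
                kept.append((body, cls, conf))
        return kept

    rules = [
        (frozenset({"X1"}), "C2", 0.80),
        (frozenset({"X1", "Y1"}), "C2", 0.66),  # pruned: X1 => C2 is more general and stronger
        (frozenset({"Z1"}), "C1", 0.80),
    ]
    print(prune_redundant(rules))   # the specialised rule is dropped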
Rule Structure Exploitation

The structure of class association rules can be exploited to reduce the effort involved in class association rule mining. For example, if a class is not frequent, then the class association rules predicting that infrequent class can never be frequent. Therefore we need not calculate the frequency of those itemsets which appear on the left hand side of such infrequent class association rules. This strategy prunes the useless candidates without calculating their supports. The class association rule structure can be exploited in many other ways as well; some datasets will benefit from stronger rule structure exploitation at the cost of other strategies affected by it, and the investigation of class association rule structure remains an interesting topic for research.

Handling Conflicting Class Association Rules

Conflicting class association rules are rules that have similar LHS itemsets but predict different classes in the RHS. For example, given two rules such as R ⇒ C1 and R ⇒ C2, [13] proposed a pruning method that considers these conflicting rules and removes them. The algorithm in [14] instead considers such rules as useful knowledge and combines them into a single rule, called a multi-label class association rule, i.e. R ⇒ C1 ∨ C2.
Correlation Testing Between Rule Body & its Class

This concept is taken from statistics: the correlation between a rule body and its predicted class is measured to determine whether they are correlated or not. Chi-square testing is used for this purpose. If the discovered class association rule is negatively correlated, it is pruned; if the rule body is positively correlated with its class, the rule is stored in the class association rule set. The algorithm in [5] performs the chi-square test in its rule discovery step to retain or remove class association rules.
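For a single rule R ⇒ C the test can be sketched over a 2x2 contingency table (an illustration, not the exact procedure of [5]); the rule is retained only if the body and the class co-occur more often than expected and the chi-square statistic exceeds the chosen critical value:

    def chi_square_keep(n_body_cls, n_body, n_cls, n_total, critical=3.84):
        """2x2 chi-square test for rule body vs. class (critical value 3.84
        corresponds to a 5% significance level with one degree of freedom)."""
        a = n_body_cls                      # body present, class C
        b = n_body - n_body_cls             # body present, other class
        c = n_cls - n_body_cls              # body absent, class C
        d = n_total - n_body - c            # body absent, other class
        expected_a = n_body * n_cls / n_total
        if a <= expected_a:                 # negative (or no) correlation
            return False
        chi2 = n_total * (a * d - b * c) ** 2 / (
            (a + b) * (c + d) * (a + c) * (b + d))
        return chi2 >= critical

    # Hypothetical counts: the body and class are strongly positively correlated.
    print(chi_square_keep(n_body_cls=40, n_body=50, n_cls=60, n_total=100))   # True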
Backward Class Association Rule Pruning

Pruning in decision trees involves pre-pruning and post-pruning. Post-pruning, also known as backward pruning, is frequently used by decision tree algorithms like C4.5 [15]. First a decision tree is constructed, and then it is decided whether each node and its descendants should be replaced by a single leaf. The decision is made on the basis of the estimated error, using the pessimistic error estimation [16] of a node and comparing it with that of its potential replacement leaf. This backward pruning can also be used in class association rule pruning. Algorithms including [4] have used it to effectively reduce the number of extracted class association rules.
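One common form of the pessimistic estimate used in C4.5-style error-based pruning (shown as an illustration; the exact formulation in [16] may differ) is the upper limit of a binomial confidence interval on the observed error rate:

    from math import sqrt

    def pessimistic_error(errors, covered, z=0.6745):
        """Upper confidence limit on the true error rate, given `errors`
        misclassifications out of `covered` cases (z = 0.6745 roughly matches
        the 25% confidence level that C4.5 uses by default)."""
        f = errors / covered
        num = f + z * z / (2 * covered) + z * sqrt(
            f * (1 - f) / covered + z * z / (4 * covered * covered))
        return num / (1 + z * z / covered)

    # A rule (or subtree) covering 20 cases with 2 errors gets a pessimistic
    # error well above the observed 10%, penalising small-coverage rules.
    print(pessimistic_error(2, 20))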
Late Pruning Techniques

Late pruning is involved in the final step of classifier formation. The following techniques may be used for the late pruning of class association rules.

Selection of Most Confident Class Association Rules

This technique is based on the assumption that the testing data will share the same attributes as the training data on which the classifier was built. So if a rule has high confidence on the training data, then it will also show high confidence on the testing data, i.e. the class predicted by that class association rule is the most likely to occur. In this technique we always choose the class association rule with the highest confidence among all the applicable class association rules; if there is a tie, we choose the longest class association rule.
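A small sketch of this selection step (illustrative Python, reusing the (body, class, confidence) rule representation from the earlier examples):

    def classify_most_confident(instance, rules, default="C1"):
        """instance: set of attribute values; rules: (body, cls, confidence) triples."""
        applicable = [r for r in rules if r[0] <= instance]
        if not applicable:
            return default
        # Highest confidence first; longer (more specific) bodies break ties.
        best = max(applicable, key=lambda r: (r[2], len(r[0])))
        return best[1]

    rules = [
        (frozenset({"X2"}), "C1", 1.00),
        (frozenset({"X2", "Z1"}), "C1", 1.00),
        (frozenset({"X1"}), "C2", 0.80),
        (frozenset({"Z1"}), "C1", 0.80),
    ]
    print(classify_most_confident({"X2", "Y3", "Z1"}, rules))   # C1, via X2 Z1 => C1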
Matching Longest Class Association Rules

In this technique we select the class association rules having the longest left hand side that matches a particular case. The longest-match method is based on the conclusion that class association rules with the longest left hand side contain more accurate and richer information for the prediction of a class. We know that the longest match is more specific and accurate, but the problem with this approach is that the support and confidence of a class association rule decrease exponentially as the size of its left hand side increases.
Database Coverage only useful elements at each level. Here pruning becomes
and important task. Pruning infrequent itemsets prevents the
Database coverage is a very popular pruning technique in
generation ofuseless class association rules and pruning of
class association rule mining. The algorithms including [4]
class association rules remove the useless class association
and [5] have successfully exploited this pruning technique
rules, generated anyhow and makes the class association rule
to reduce the size of class association rule set. This method
set compact. Selection of some very very important rules to
J works as follows: first all the rules of class association rule
form an associative classifier further reduces the size of class
set are sorted in descending order according to their
association rule set used for the classification of new
confidence. Then each class association rule is tested against
instances.
the training dataset instances. If a rule correctly classifies
The experimental results of different associative
some instances in the training dataset, all instances covered
classification techniques have shown that the use ofdifferent
by the rule are removed from the training dataset and the
pruning techniques at different levels has given better
rule is marked as candidate rule. If a rule does not correctly
performance. The database coverage and / or pessimistic
covers any instance in training dataset then it is removed
pruning tends to chose general rules and produces simpler
from the class association rule set. Finally we get the class
association rule set, having candidate rules only. classifiers which sometimes are more accurate on test dataset
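The procedure can be sketched as follows (illustrative Python, again using (body, class, confidence) rule triples and (features, class) training pairs):

    def database_coverage(rules, training):
        """Return the pruned rule set (the classifier's candidate rules)."""
        remaining = list(training)
        kept = []
        for body, cls, conf in sorted(rules, key=lambda r: -r[2]):
            covered = [ex for ex in remaining if body <= ex[0]]
            if any(ex[1] == cls for ex in covered):      # correctly classifies something
                kept.append((body, cls, conf))
                remaining = [ex for ex in remaining if ex not in covered]
            # otherwise the rule is dropped
            if not remaining:
                break
        return kept

Because covered instances are removed as soon as a rule claims them, later and weaker rules survive only if they cover something the earlier rules missed.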
Lazy Pruning

Lazy pruning aims to remove from the class association rule set only those rules that incorrectly classify training data set instances. In lazy pruning, each rule of the class association rule set is tested against the training dataset instances, and we delete those rules that either incorrectly classify at least one training instance or do not cover even a single instance of the training data set. Here we do not delete the covered instances from the training data set, as is done in the database coverage method. Lazy pruning considers all class association rules classifying the instances of the training data set, whereas the database coverage method considers only a single rule classifying an instance. In other words, in lazy pruning an instance may be classified by several class association rules, but in the database coverage method an instance is covered by only a single class association rule. Experimental results have shown that lazy pruning produces a large number of potential class association rules and therefore consumes more memory compared to other techniques.
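The contrast with database coverage is easy to see in code (an illustrative sketch of the procedure as described above): no instances are removed, and a rule survives only if it covers at least one training instance and misclassifies none:

    def lazy_pruning(rules, training):
        """Keep a rule iff it covers >= 1 training instance and never misclassifies one."""
        kept = []
        for body, cls, conf in rules:
            covered = [ex_cls for feats, ex_cls in training if body <= feats]
            if covered and all(c == cls for c in covered):
                kept.append((body, cls, conf))
        return kept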
Laplace Accuracy

Laplace accuracy is used by the associative classification algorithm CPAR [6]. It is mainly used in class association rule mining to estimate the expected error of the rules: the expected accuracy is calculated for each class association rule before the rule is applied for the classification of test instances. The CPAR algorithm has shown that exploiting Laplace accuracy produces better results compared to CBA.
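For a rule predicting class c that covers n_total training instances, n_c of which actually have class c, the Laplace expected accuracy is usually computed as (n_c + 1) / (n_total + k), where k is the number of classes. A short sketch (assumed form, consistent with how CPAR uses it):

    def laplace_accuracy(n_c, n_total, k):
        """Laplace-corrected expected accuracy of a rule covering n_total
        instances, n_c of which belong to the predicted class, with k classes."""
        return (n_c + 1) / (n_total + k)

    # Rule X2 => C1 from Table 2 covers 3 instances, all of class C1, with 2 classes:
    print(laplace_accuracy(3, 3, 2))   # 0.8 rather than the raw 3/3 = 1.0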
EFFECT OF THE PRUNING TECHNIQUES ON ASSOCIATIVE CLASSIFICATION

In class association rule mining we need to generate all the frequent itemsets, the strong class association rules, and a selected subset of these strong class association rules to form an associative classifier. Since the number of possible item-class combinations is very high, and tends to be impossible for manual interpretation and human understanding, it becomes very important to extract only the useful elements at each level. Here pruning becomes an important task. Pruning infrequent itemsets prevents the generation of useless class association rules; pruning of class association rules removes the useless class association rules that are generated anyhow and makes the class association rule set compact. Selection of only the most important rules to form an associative classifier further reduces the size of the class association rule set used for the classification of new instances.

The experimental results of different associative classification techniques have shown that the use of different pruning techniques at different levels gives better performance. Database coverage and/or pessimistic pruning tend to choose general rules and produce simpler classifiers, which are sometimes more accurate on the test dataset when compared with lazy pruning methods.

Generally, smaller classifiers are preferred by human experts; however, they suffer from drawbacks like sensitivity to low-quality data and to the coverage of the database. On the other hand, techniques that derive classifiers with larger rule sets have better predictive accuracy but take more time in training and in classifying objects, as they generate a large number of rules and use all of them for classification. They also take more space to store the resulting associative classifiers. Combining different pruning techniques in an associative classifier can resolve the trade-off among the time, space and predictive accuracy of that classifier.
EXPERIMENTAL EVALUATION

We have compared the effect of three pruning techniques in terms of the number of rules they derive. These are CBA [4] (pessimistic error and database coverage), MCAR [8] (database coverage) and lazy pruning [19]. The experiments are done on fourteen datasets available in the UCI machine learning repository [20]. Table 3 gives the number of rules derived by the different pruning techniques.

Table 3
Number of Rules Derived by Different Pruning Techniques

S.N.   Data Set Name   Pessimistic Error &    Database    Lazy
                       Database Coverage      Coverage    Pruning
1      Breast          47                     67          22183
2      Glass           29                     39          11061
3      Heart           43                     80          40069
4      Iris            5                      15          190
5      Labor           17                     16          1967
6      Lymph           15                     52          86917
7      Pima            40                     91          9842
8      Tic-tac         28                     28          41823
9      Wine            11                     51          40715
10     Zoo             5                      9           8092

The last column of Table 3 shows that lazy pruning generates a large number of class association rules compared to the other approaches. One of the reasons for the large number of class association rules generated by lazy pruning may be the storing of those rules (as spare rules) that do not cover any objects. The database coverage methods eliminate these spare rules, which reduces the size of the associative classifiers. The CBA [4] and MCAR [8] algorithms generate associative classifiers of reasonable size compared to the lazy pruning method [19].
CONCLUSION

Class association rule mining is an important method of knowledge discovery in databases. Pruning techniques play a very important role in effective class association rule mining for constructing a high-quality associative classifier. We have discussed different pruning methods in this paper and compared some of them. The comparison has been made on the results obtained by different associative classification algorithms employing a particular pruning technique. It has shown that database coverage with pessimistic error, and database coverage pruning, produce better results compared to lazy pruning and generate compact classifiers that are easy to understand and to use for the classification of new instances.
REFERENCES

[1] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules", In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB-94), Morgan Kaufmann Publishers, Santiago de Chile, Chile, September 1994, pp. 487-499.

[2] R. Bayardo, "Brute-force Mining of High-confidence Classification Rules", In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97), AAAI Press, Newport Beach, CA, United States, August 1997, pp. 123-126.

[3] K. Ali, S. Manganaris, and R. Srikant, "Partial Classification using Association Rules", In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97), AAAI Press, Newport Beach, CA, United States, August 1997, pp. 115-118.

[4] B. Liu, W. Hsu, and Y. Ma, "Integrating Classification and Association Rule Mining", In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98), AAAI Press, New York City, NY, United States, 1998, pp. 80-86.

[5] W. Li, J. Han, and J. Pei, "CMAR: Accurate and Efficient Classification based on Multiple Class-association Rules", In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM-01), IEEE Computer Society, San Jose, CA, United States, 2001, pp. 369-376.

[6] X. Yin and J. Han, "CPAR: Classification based on Predictive Association Rules", In Proceedings of the Third SIAM International Conference on Data Mining (SDM-03), SIAM, San Francisco, CA, United States, 2003, pp. 331-335.

[7] F. Coenen and P. Leng, "An Evaluation of Approaches to Classification Rule Selection", In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM-04), IEEE Computer Society, Brighton, United Kingdom, November 2004, pp. 359-362.

[8] F. Thabtah, P. Cowling, and Y. Peng, "MCAR: Multi-class Classification Based on Association Rule Approach", In Proceedings of the 3rd IEEE International Conference on Computer Systems and Applications, Cairo, Egypt, 2005, pp. 1-7.

[9] A. Veloso and W. Meira, "Rule Generation and Selection Techniques for Cost Sensitive Associative Classification", In 19th Brazilian Symposium on Software Engineering, 2005.

[10] J. Wang and G. Karypis, "On Mining Instance-Centric Classification Rules", IEEE Transactions on Knowledge and Data Engineering, Vol. XX, No. XX, 2006, pp. 1-13.

[11] Y. J. Wang, Q. Xin, and F. Coenen, "A Novel Rule Ordering Approach in Classification Association Rule Mining", In Proceedings of the 5th International Conference on Machine Learning and Data Mining (MLDM-07), Springer-Verlag, Leipzig, Germany, July 2007, pp. 339-348.

[12] Y. J. Wang, Q. Xin, and F. Coenen, "A Novel Rule Weighting Approach in Classification Association Rule Mining", In Proceedings of the 7th IEEE International Conference on Data Mining, October 2007, pp. 271-276.

[13] M. Antonie, O. Zaiane, and A. Coman, "Associative Classifiers for Medical Images", Lecture Notes in Artificial Intelligence 2797, Mining Multimedia and Complex Data, Springer-Verlag, 2003, pp. 68-83.

[14] F. Thabtah, P. Cowling, and Y. Peng, "MMAC: A New Multi-class, Multi-label Associative Classification Approach", In Proceedings of the Fourth IEEE International Conference on Data Mining, Brighton, UK, 2004, pp. 217-221.

[15] J. Quinlan, C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann, 1993.

[16] J. Quinlan, "Simplifying Decision Trees", International Journal of Man-Machine Studies, 27(3), 1987, pp. 221-234.

[17] P. R. Pal and R. C. Jain, "CAARMSAD: Combinatorial Approach of Association Rule Mining for Sparsely Associated Databases", Journal of Computer Science, Coimbatore, Tamil Nadu, India, Vol. 2, No. 5, pp. 717, July 2008.

[18] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation", In Proceedings of the International Conference on Management of Data (ACM SIGMOD '00), Dallas, TX, May 2000, pp. 1-12.

[19] E. Baralis and P. Torino, "A Lazy Approach to Pruning Classification Rules", In Proceedings of the IEEE International Conference on Data Mining (ICDM '02), pp. 35.

[20] C. Merz and P. Murphy, UCI Repository of Machine Learning Databases, Irvine, CA: University of California, Department of Information and Computer Science.