Classification of Pruning Methodologies for Model Development Using Data Mining Techniques
Usually the item-sets tend to have their support close to either the support of one of their subsets or the support of one of their supersets. To prevent these equivalent rules from being generated and from incurring any overhead, the exact-equivalence strategy removes from the set of frequent item-sets any set having a subset with equivalent support, before forming the next set of candidates.
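A minimal sketch of this check is given below; the function name and the data layout (item-sets as frozensets, supports in a dictionary) are illustrative assumptions, not taken from the cited algorithms.

```python
from itertools import combinations

def exact_equivalence_prune(frequent, support):
    # 'frequent' is an iterable of frozensets (the current level's frequent
    # item-sets); 'support' maps frozenset -> support value for this level
    # and all earlier levels.
    kept = []
    for itemset in frequent:
        # drop the item-set if any proper subset has identical support
        has_equivalent_subset = any(
            support.get(frozenset(sub)) == support[itemset]
            for r in range(1, len(itemset))
            for sub in combinations(itemset, r)
        )
        if not has_equivalent_subset:
            kept.append(itemset)
    return kept
```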
Other Techniques

Some other techniques can be found that may optimize the candidate rule sets. The algorithm in [11] reduces the candidate item-sets by removing those item-sets that contain more items than the number of available attributes. The algorithms including [11] do not generate candidate item-sets; instead they use another data structure, the FP-tree, to generate the frequent item-sets. Further techniques may yet be found that prune the infrequent item-sets more efficiently.
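The candidate-reduction idea attributed to [11] amounts to a one-line filter; the sketch below is our own illustration of it, with the names being assumptions.

```python
def drop_oversized_candidates(candidates, n_attributes):
    # An item-set with more items than the dataset has attributes can never
    # occur in any record, so it is safe to discard it outright.
    return [c for c in candidates if len(c) <= n_attributes]
```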
B. Intermediate Pruning Techniques

Intermediate pruning uses a confidence threshold value to remove the weak class association rules. The following methods may further optimize the size of the class association rule set without reducing its classification strength:
Removing Redundant Class Association Rules

The idea behind this technique is to exploit the fact that if a rule R meets the confidence threshold value, then any rule containing R and having lower confidence than R will apply only to instances R already covers. Such rules are termed redundant class association rules. Redundant rule pruning has been reported in [5]. It works as follows: let R → C be a general rule; any rule R' → C such that R ⊂ R' and R' → C has lower confidence than R → C is redundant and is removed from the associative classification rule set. This significantly reduces the size of the class association rule set and minimizes rule redundancy. The algorithms including [5, 7] have used the pruning of redundant class association rules, performing the pruning immediately as a rule is inserted into a data structure called the CR-tree.
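A sketch of this pruning step, under the assumption that rules are (antecedent, class, confidence) triples; testing for confidence "at least as high" rather than strictly higher is a common variant and an assumption here.

```python
def prune_redundant(rules):
    # A rule R' -> C is redundant if a more general rule R -> C exists
    # (R a proper subset of R') whose confidence is at least as high.
    kept = []
    for ante, cls, conf in rules:
        redundant = any(
            g_cls == cls and g_ante < ante and g_conf >= conf
            for g_ante, g_cls, g_conf in rules
        )
        if not redundant:
            kept.append((ante, cls, conf))
    return kept
```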
Handling Conflicting Class Association Rules

Conflicting class association rules are rules that have similar LHS item-sets but predict different classes in the RHS, for example the pair R → C1 and R → C2. [7] proposed a pruning technique that considers these conflicting rules and removes them. The algorithm in [8] instead treats such rules as useful knowledge and combines them into a single rule, called a multi-label class association rule, i.e. R → C1 ∨ C2.
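A sketch of the merging policy of [8], assuming rules are (antecedent, class) pairs; conflicting rules collapse into one multi-label rule.

```python
from collections import defaultdict

def merge_conflicting(rules):
    # Rules sharing the same LHS but predicting different classes are
    # combined into a single multi-label rule R -> {C1, C2, ...}.
    by_lhs = defaultdict(set)
    for ante, cls in rules:
        by_lhs[ante].add(cls)
    return [(ante, classes) for ante, classes in by_lhs.items()]
```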
Correlation Testing Between Rule Body & its Class

This concept is taken from statistics: it measures the correlation between a rule body and its predicted class to determine whether the two are correlated or not. Chi-square testing is used for this purpose. If the discovered class association rule is negatively correlated, it is pruned; if the rule body is positively correlated with its class, the rule is stored in the class association rule set. The algorithm [4] performs the chi-square test in its rule discovery step to retain or remove class association rules.
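A sketch of the chi-square check using SciPy's chi2_contingency; the 2x2 contingency-table layout and the 0.05 significance level are our assumptions, not prescribed by [4].

```python
from scipy.stats import chi2_contingency

def keep_rule(n_body_class, n_body_notclass,
              n_nobody_class, n_nobody_notclass, alpha=0.05):
    # 2x2 contingency table: rule body present/absent vs class present/absent
    table = [[n_body_class, n_body_notclass],
             [n_nobody_class, n_nobody_notclass]]
    chi2, p, _, expected = chi2_contingency(table)
    # positive correlation: body-and-class co-occur more often than expected
    positively_correlated = n_body_class > expected[0][0]
    return p < alpha and positively_correlated
```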
Backward Class Association Rule Pruning

Pruning in decision trees involves pre-pruning and post-pruning. Post-pruning, known as backward pruning, is frequently used by decision tree algorithms like C4.5 [9]. First a decision tree is constructed, and then it is decided whether each node and its descendants should be replaced by a single leaf. The decision is made on the basis of the estimated error, using the pessimistic error estimation [10] of a node and comparing it with that of its potential replacement leaf. This backward pruning can also be used in class association rule pruning. The algorithms including [3] have used it to effectively reduce the number of extracted class association rules.
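One common formulation of the pessimistic estimate is the upper confidence limit on a node's error rate used by C4.5-style post-pruning; the sketch below assumes that form, which is not spelled out in the paper itself.

```python
from math import sqrt

def pessimistic_error(errors, n, z=0.69):
    # Upper confidence limit on a node's observed error rate f = errors/n;
    # z = 0.69 roughly corresponds to C4.5's default 25% confidence level.
    f = errors / n
    return (f + z * z / (2 * n)
            + z * sqrt(f * (1 - f) / n + z * z / (4 * n * n))) / (1 + z * z / n)
```

A subtree is then replaced by a leaf when the leaf's pessimistic estimate is no greater than the weighted combination of the estimates of its children.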
C. Late Pruning Techniques

Late pruning is involved in the final step of classifier formation. A large number of late pruning techniques have been used by different researchers. Some of these techniques are listed here along with their scope and importance:
Matching Longest Associative Classification Rules

In this technique we select the class association rules having the longest left hand side that matches a particular case. The longest-match method is based on the observation that class association rules with a longer left hand side carry more accurate and richer information for the prediction of a class. A longest match is more specific and accurate, but the problem is that the support and confidence of a class association rule decrease exponentially as the size of its left hand side increases.
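A minimal sketch of longest-match selection, assuming rules are (antecedent, class) pairs and a case is a frozenset of attribute-value items.

```python
def longest_match(rules, case):
    # keep only the rules whose antecedent is contained in the case
    matching = [(ante, cls) for ante, cls in rules if ante <= case]
    if not matching:
        return None  # no rule fires for this case
    # prefer the most specific (longest LHS) matching rule
    return max(matching, key=lambda r: len(r[0]))
```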
Database Coverage

Database coverage is a very popular pruning technique in associative classification. The algorithms including [3] and [4] have successfully exploited it to minimize the size of the class association rule set. The method works as follows: first, all the rules of the class association rule set are sorted in descending order of confidence. Then each class association rule is tested against the training dataset instances. If a rule correctly classifies some instances in the training dataset, all instances covered by the rule are removed from the training dataset and the rule is marked as a candidate rule. If a rule does not correctly cover any instance in the training dataset, it is removed from the class association rule set. Finally we obtain a class association rule set containing candidate rules only.
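This procedure translates almost directly into code; the sketch below assumes (antecedent, class, confidence) rule triples and (items, label) training instances.

```python
def database_coverage(rules, training):
    remaining = list(training)
    candidates = []
    # consider rules in descending order of confidence
    for ante, cls, conf in sorted(rules, key=lambda r: -r[2]):
        covered = [(items, label) for items, label in remaining
                   if ante <= items]
        if any(label == cls for _, label in covered):
            # rule classifies at least one instance correctly: keep it and
            # remove every instance it covers, rightly or wrongly classified
            candidates.append((ante, cls, conf))
            remaining = [t for t in remaining if t not in covered]
        # a rule covering nothing correctly is silently discarded
        if not remaining:
            break
    return candidates
```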
Lazy Pruning
Lazy pruning aims to remove from the class association rule set only those rules that incorrectly classify training data set instances. In lazy pruning, each rule of the class association rule set is tested against the training dataset instances, and we delete those rules that either incorrectly classify at least one training instance or do not cover even a single instance of the training data set. Here we do not delete the covered instances from the training data set, as is done in the database coverage method. Lazy pruning considers all class association rules classifying the instances of the training data set, whereas the database coverage method considers only a single rule per instance. In other words, in lazy pruning an instance may be classified by several class association rules, but in the database coverage method an instance is covered by only a single class association rule. Experimental results have shown that lazy pruning produces a large number of potential class association rules and therefore consumes more memory than the other techniques.
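A sketch of lazy pruning as described above, with the same illustrative rule and instance representation as in the database-coverage sketch. (Some lazy-pruning variants keep the zero-coverage rules aside as "spare" rules rather than deleting them; this sketch follows the description above.)

```python
def lazy_prune(rules, training):
    kept = []
    for ante, cls, conf in rules:
        covered = [label for items, label in training if ante <= items]
        # keep a rule only if it covers something and never misclassifies;
        # covered instances are NOT removed, so many rules survive
        if covered and all(label == cls for label in covered):
            kept.append((ante, cls, conf))
    return kept
```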
Laplace Accuracy

The Laplace accuracy is used by the associative classification algorithm [4]. It is mainly used in class association rule mining to calculate the expected error of the rules: it computes an expected accuracy for each class association rule before the rule is applied to classify test instances. The CPAR algorithm has shown that exploiting Laplace accuracy produces better results than CBA.
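The usual Laplace estimate, as used for example by CPAR, is (n_c + 1) / (n_tot + k); a direct sketch follows, with the parameter names being our own.

```python
def laplace_accuracy(n_correct, n_covered, n_classes):
    # n_correct: covered training instances of the rule's class,
    # n_covered: all training instances the rule covers,
    # n_classes: number of classes in the dataset.
    return (n_correct + 1) / (n_covered + n_classes)
```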
IV. PROPOSED METHODOLOGY

Our proposed methodology aims to reduce the number of rules as well as to study the impact of pruning on accuracy. The database coverage method provides heuristics to select a rule subset from the set of rules, but it has one shortfall: when there is no rule to classify an instance, it assigns the largest-frequency class to the remaining unclassified instances. We can hybridize database coverage pruning with rule induction for maximum coverage of the dataset. A rule is induced and then passes through a rule evaluation step via its rank. The proposed method thus includes rule induction, evaluation of the rules, and classification of the test data. Evaluation of a rule helps to determine whether the rule is able to cover a large part of the dataset or not; while evaluating the rule, its rank is constantly revised to reflect the coverage of the rule on test examples. The proposed method tries to capture as many dataset instances as possible within each rule, so fewer rules are derived and no majority-class voting is needed for unclassified instances.
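The paper gives no pseudocode for this hybrid, so the following is only a loose sketch under explicit assumptions: rules carry a numeric rank that is revised during evaluation, and an instance covered by no rule is routed back to the rule-induction step rather than receiving a majority-class label.

```python
def classify(ranked_rules, instance):
    # Try the rules in rank order; the first rule whose antecedent is
    # contained in the instance fires.
    for ante, cls, rank in sorted(ranked_rules, key=lambda r: -r[2]):
        if ante <= instance:
            return cls
    # No rule covers the instance: instead of a majority-class default,
    # signal that rule induction should produce a rule for it.
    return None
```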
V. EXPERIMENTAL EVALUATION

We have compared the effect of three pruning techniques via the number of rules they derive. These are CBA [3] (pessimistic error and database coverage), MCAR [6] (database coverage) and lazy pruning [12]. The experiments are done on the fourteen datasets available in the UCI machine learning repository [13]. Table 3 gives the number of rules derived by the different pruning techniques.
Table 3: Set of Rules Derived by Different Pruning Techniques

S. No. | Data Set | Pessimistic Error & Database Coverage | Proposed Database Coverage | Lazy Pruning
1      | Breast   | 47  | 67  | 22183
2      | Glass    | 29  | 39  | 11061
3      | Heart    | 43  | 80  | 40069
4      | Iris     | 5   | 15  | 190
5      | Labor    | 17  | 16  | 7967
6      | Lymph    | 35  | 52  | 86917
7      | Pima     | 40  | 93  | 9842
8      | Tic-tac  | 28  | 28  | 41823
9      | Wine     | 11  | 51  | 40775
10     | Zoo      | 5   | 9   | 380921

From the above table it is obvious that a huge number of classification rules are generated by the lazy pruning technique. The reason is that lazy pruning stores a large number of class association rules (as spare rules) that do not cover any objects. The proposed database coverage method overcomes this problem and removes these spare rules, which reduces the size of the associative classifiers. The CBA [3] algorithm generates associative classifiers of reasonable size in comparison with the lazy pruning methods [12].
VI. CONCLUSION

Associative classification is a significant knowledge discovery technique in the data mining field. Pruning techniques are the most important part of the process of constructing an effective classifier with a high accuracy standard. Effective class association rule mining yields a classifier that reduces the possibility of error and increases the accuracy rate, and it can be deployed for use in big data analytics and data science. This paper discusses the different pruning methods that have been proposed since the inception of the class association rule mining technique and compares them with the latest ones. The comparison has been made on the results obtained by different associative classification algorithms employing a particular pruning technique. The results show that database coverage with pessimistic error and database coverage pruning have produced better results in comparison with the lazy pruning methods: they generate compact classifiers that are easy to understand, implement and use for classification of new data items.
Learning Databases." Irvine CA,:
University of California.
AUTHORS PROFILE
Dr. Parashu Ram Pal obtained his Ph.D. in Computer Science. He is working as a Professor in the Department of Information Technology, ABES Engineering College, Ghaziabad, India. He has published three books and more than 40 research papers in various international and national journals and conferences. His areas of interest are Data Mining, Computer Architecture, Computer Graphics and Operations Research. He has been devoted to education, research and development for more than twenty years and always tries to create a proper environment for imparting quality education in the spirit of service to humanity. He believes in motivating colleagues and students to achieve excellence in the field of education and research.