Paper Mining and Association
K. Rajeswari, PhD
Dept. of Computer Engineering
Pimpri Chinchwad College of Engineering
Pune, India
An association rule is an implication expression of the form A → B, where A and B are disjoint item sets, i.e., A ∩ B = ∅. There are two important basic measures for association rules, minimum support and confidence. Generally, minimum support and confidence are predefined by the user/analyst so that rules which are not interesting or not useful can be deleted. Support is the total count of transactions in which all items of A and B occur together. Confidence determines how frequently items in B appear in transactions that contain A. The formal definitions of these metrics are given below:

Support(A → B) = σ(A and B)

Confidence(A → B) = σ(A and B) / σ(A)

The Apriori algorithm is used to find the frequent item sets [31].
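As a concrete illustration of these two measures, the following minimal Python sketch computes σ(A and B) and the confidence of a rule A → B from a small list of transactions; the items, transactions and rule are made-up examples, not data from the paper.

# Minimal sketch: support count and confidence of a rule A -> B,
# following the definitions given above. All data is made up.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support_count(itemset, transactions):
    # sigma(itemset): number of transactions containing every item of itemset
    return sum(1 for t in transactions if itemset <= t)

A, B = {"milk"}, {"bread"}
support = support_count(A | B, transactions)            # sigma(A and B) = 2
confidence = support / support_count(A, transactions)   # 2 / 3 = 0.67
print(support, confidence)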
3. DATA SET REDUCTION FOR FEATURE SELECTION
Feature selection is an important preprocessing technique to improve the performance of the association rule mining process, and it improves the accuracy of the classifier. For any given dataset, the features can be analyzed to find their association with the class label using the Apriori algorithm. Rules are generated for item sets that meet the expected minimum support and confidence. If the user is interested in a particular class label, only the tuples with this class label are taken for analyzing the level of association between the attributes and the desired class label. This method of selecting tuples based on the desired class label increases the efficiency of the Apriori algorithm by reducing the number of iterations and the time involved in finding association rules. The subset of features found after reducing the tuples is fed to the classifier to check the classifier's performance.
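One way to carry out such an analysis is sketched below in Python with the third-party mlxtend library; the attribute/value items, the class label name and the support and confidence thresholds are illustrative assumptions, not details taken from the paper.

# Hedged sketch (not the authors' code): mine association rules whose
# consequent is a chosen class label, using mlxtend on one-hot encoded items.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Made-up discretized records: each row is a set of attribute=value items.
records = [
    ["chest_pain=yes", "bp=high", "class=Have risk"],
    ["chest_pain=yes", "bp=normal", "class=Have risk"],
    ["chest_pain=no", "bp=normal", "class=No risk"],
    ["chest_pain=no", "bp=high", "class=Have risk"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(records), columns=encoder.columns_)

frequent = apriori(onehot, min_support=0.25, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)

# Keep only the rules whose consequent is the class label of interest.
target = frozenset({"class=Have risk"})
interesting = rules[rules["consequents"].apply(lambda c: c == target)]
print(interesting[["antecedents", "consequents", "support", "confidence"]])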
For the heart disease data set, the class label takes the values 'Have risk' and 'No risk'. As the user is interested in the class label 'Have risk' only, the 'No risk' tuples can be removed from the data set when finding association rules for the consequence 'Have risk'. This reduces the data set size by at least 40%, thereby improving the performance of the Apriori algorithm. We obtained similar accuracy, sensitivity and specificity values with the reduced data set as with the original data set. Accuracy is tested using the C4.5 decision tree classifier in Weka [32].
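The data set reduction step itself amounts to a simple class-label filter; a brief illustration follows, in which the file name and the "class" column name are assumptions rather than details from the paper.

# Keep only the tuples whose class label is the consequence of interest.
import pandas as pd

heart = pd.read_csv("heart.csv")                  # hypothetical discretized heart data set
reduced = heart[heart["class"] == "Have risk"]    # drop the 'No risk' tuples

print(len(heart), len(reduced))                   # the paper reports at least a 40% reduction
reduced.to_csv("heart_reduced.csv", index=False)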
4. FEATURE SELECTION
In this paper, a novel feature selection method is proposed based on association analysis. It extracts features by analyzing the correlations between features found by association rule mining. Based on the consequence, the data set is first reduced; this reduced data set with the desired consequence is then used for association rule mining. This reduces the memory utilization and the time taken for each iteration of the association rule mining process. After selecting the features, the complete data set is given to a classifier to test the accuracy using 10-fold cross-validation. We found that the results obtained are similar to those of the original data set.
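The accuracy check described here can be sketched as follows; the paper uses the C4.5 classifier in Weka, so scikit-learn's CART-style DecisionTreeClassifier serves only as a stand-in, and the file name and feature names are assumptions.

# Hedged sketch: 10-fold cross-validation of a decision tree on selected features.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("heart.csv")                   # hypothetical discretized data set
selected = ["chest_pain", "bp", "cholesterol"]    # features assumed to be returned by the algorithm

X = pd.get_dummies(data[selected])                # one-hot encode the discretized attributes
y = data["class"]

scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print(scores.mean())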
4.1 Implementation details
Our feature selection algorithm has four steps: dataset reduction, frequent item set generation using Apriori, association rule generation, and feature selection. Algorithm Database_Reduction is used to reduce the original dataset. Steps 4-12 are repeated for each tuple of the original dataset; if a tuple does not contribute to the class label, it is deleted from the dataset and the size of the dataset is updated.

The Apriori algorithm is used for mining frequent item sets for Boolean association rules. It uses prior knowledge of frequent item sets and explores (k+1)-item sets from k-item sets to generate all frequent item sets. It relies on the anti-monotone property: if a set does not pass the test, all of its supersets will also fail the test; that is, if P(I) < min_sup, then P(I ∪ A) < min_sup. A two-step process is followed, namely the Join step and the Prune step. From these steps, frequent item sets are found and association rules are generated.

Let Ck denote the set of candidate k-itemsets and Fk denote the set of frequent k-itemsets. The frequent itemset generation algorithm has two important characteristics:

(1) It is a level-wise algorithm; i.e., mapped to the lattice structure, it traverses the item set lattice one level at a time, from frequent 1-itemsets to the maximum size of frequent item sets;

(2) It uses a generate-and-test strategy for finding frequent item sets. At each iteration, new candidate item sets are generated from the frequent item sets found in the previous iteration. The support for each candidate is then counted and tested against the minimum support threshold [16].
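The level-wise, generate-and-test loop described in (1) and (2) can be illustrated with the following compact Python sketch (not the authors' implementation); the toy transactions and the minimum support count are made-up values.

# Illustrative level-wise Apriori loop: join frequent (k-1)-itemsets into
# candidate k-itemsets Ck, prune them with the anti-monotone property, and
# test each surviving candidate against the minimum support threshold.
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support_count):
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {frozenset([i]) for t in transactions for i in t}
    F = {c for c in items if count(c) >= min_support_count}   # frequent 1-itemsets
    all_frequent, k = set(F), 2
    while F:
        C = {a | b for a in F for b in F if len(a | b) == k}   # join step
        C = {c for c in C                                      # prune step
             if all(frozenset(s) in F for s in combinations(c, k - 1))}
        F = {c for c in C if count(c) >= min_support_count}    # support test
        all_frequent |= F
        k += 1
    return all_frequent

print(apriori_frequent_itemsets([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], 2))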
The second step is to construct association rules that satisfy the user-defined minimum confidence from the frequent itemsets. Suppose one of the frequent itemsets is Fk,

Fk = {i1, i2, i3, …, ik};

association rules with this itemset can be generated in the following way: the first rule is

{i1, i2, …, ik-1} → {ik}.

By checking its confidence, this rule can be determined to be interesting or not.

Algorithm Feature selection based on association rule mining returns all closely related features. Dataset D is discretized in step 2. The output of step 2 is given to step 3, which reduces the size of the discretized dataset to improve the efficiency of association rule mining and thereby the efficiency of this feature selection algorithm. All possible rules with the consequences of interest are generated in step 4. Some of the generated rules may not be useful and are deleted based on their lift value (lift <= 1). If the lift value is equal to one, the antecedent and consequent of the rule r are not related; if the lift value is less than one, the antecedent and consequent of the rule r are negatively related; and if it is greater than one, they are positively related. In steps 12-13, the antecedent attributes of the selected rules are included in the result set, which is returned as the final selected feature set in step 16.
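The lift-based pruning can be made concrete with a short sketch; the rules and support values below are made up for illustration, and lift(A → B) is computed as confidence(A → B) / support(B).

# Drop rules whose lift is <= 1, i.e. whose antecedent and consequent are
# unrelated or negatively related. Support values are fractions of all tuples.
def lift(support_ab, support_a, support_b):
    confidence = support_ab / support_a
    return confidence / support_b

rules = [
    # (antecedent, consequent, support(A and B), support(A), support(B))
    ("chest_pain=yes", "class=Have risk", 0.40, 0.45, 0.60),   # lift ~ 1.48, kept
    ("bp=normal", "class=Have risk", 0.20, 0.40, 0.60),        # lift ~ 0.83, deleted
]
kept = [r for r in rules if lift(r[2], r[3], r[4]) > 1]
print(kept)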
Algorithm 1: Database_Reduction

D: Training set
N: Number of tuples in D
minconf: Minimum confidence threshold

Method:
1. For each frequent k-itemset fk, k ≥ 2 do
2.   H1 = {i | i ∈ fk}
3.   Call apr-genrules(fk, H1)
4. End for

Function apr-genrules(fk, Hm)
1. k = |fk|
2. m = |Hm|
3. If k > m + 1 then
4.   Hm+1 = apriori-gen(Hm)
5.   For each hm+1 ∈ Hm+1 do
6.     Conf = Support(fk) / Support(fk - hm+1)
7.     If Conf ≥ minconf then
8.       Return the rule (fk - hm+1) → hm+1
9.     Else
10.     Delete hm+1 from Hm+1
11.    End if
12.  End for
13.  Call apr-genrules(fk, Hm+1)
14. End if
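A runnable Python transcription of this rule-generation listing is sketched below under a few assumptions: support counts for all frequent itemsets are available in a dict `support` keyed by frozenset, apriori-gen on the consequent sets is approximated by merging pairs of size-m consequents, and the rules with 1-item consequents (left implicit in the listing) are checked before recursing.

# Hedged transcription of the listing above, not the authors' code.
def apriori_gen(H, size):
    # Merge pairs of consequents into candidate consequents of the given size.
    return {a | b for a in H for b in H if len(a | b) == size}

def apr_genrules(fk, H, support, minconf, rules):
    k, m = len(fk), len(next(iter(H)))
    if k > m + 1:
        H_next = apriori_gen(H, m + 1)
        for h in list(H_next):
            conf = support[fk] / support[fk - h]
            if conf >= minconf:
                rules.append((fk - h, h, conf))   # rule (fk - h) -> h
            else:
                H_next.discard(h)                 # delete h from Hm+1
        if H_next:
            apr_genrules(fk, H_next, support, minconf, rules)
    return rules

def generate_rules(frequent_itemsets, support, minconf):
    rules = []
    for fk in (f for f in frequent_itemsets if len(f) >= 2):
        H1 = {frozenset([i]) for i in fk}
        for h in list(H1):                        # 1-item consequents
            conf = support[fk] / support[fk - h]
            if conf >= minconf:
                rules.append((fk - h, h, conf))
            else:
                H1.discard(h)
        if H1:
            apr_genrules(fk, H1, support, minconf, rules)
    return rules

# Toy usage: support counts of the frequent itemsets of a tiny data set.
support = {frozenset("a"): 3, frozenset("b"): 2, frozenset("ab"): 2}
print(generate_rules([frozenset("ab")], support, minconf=0.6))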
5.1.1 Time required for generation of frequent 1-itemsets - O(NI), where I is the average number of items per transaction and N is the total number of transactions.

5.1.2 Time required for candidate generation Ck - It includes the merging cost and the pruning cost.

Figure 1. Execution time of the Apriori algorithm with different minimum support thresholds (run time in seconds versus minimum support from 10 to 50, for Apriori on the original data set and on the reduced data set).

5.1.3 Time required for support counting - The cost of support counting is O(N Σk (w choose k) αk), where w is the maximum transaction width and αk is the cost of updating the support count of a candidate k-itemset.

The computational complexity of the Apriori algorithm on the reduced dataset with N' tuples is much lower than on the original dataset with N tuples (note that N' is at least 40% smaller than N), as shown in Figure 1.
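The effect of the data set reduction on these cost terms can be illustrated with a small back-of-the-envelope computation; the tuple count and average width used below are hypothetical.

# The O(NI) pass over the data scales linearly with the number of tuples,
# so a 40% reduction in tuples cuts this term by 40% as well.
N, I = 300, 14               # assumed tuple count and average items per tuple
N_reduced = int(0.6 * N)     # at least a 40% reduction is reported above
print(N * I, N_reduced * I)  # 4200 vs. 2520 item occurrences to scan per pass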
5.2 Effectiveness
The effectiveness of the proposed algorithm is checked by comparing the classifier results obtained with all attributes and with the selected attributes (Figures 2 and 3).

Figure 2. Result of the C4.5 classifier [28] on the Heart dataset [29] with all attributes.

Figure 3. Result of the C4.5 classifier [28] on the Heart dataset [29] with the attributes selected using the feature selection algorithm.

6. CONCLUSION
In this paper, we have proposed a novel method of feature selection using association rule mining on a data set reduced according to the desired class label attribute. By reducing the dataset, the performance of the Apriori algorithm is improved significantly, thereby improving the association rule mining process. Our results show that this method is effective and efficient for most of the real datasets from the UCI repository, with acceptable classifier accuracy.

7. REFERENCES
[1] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, Elsevier, Morgan Kaufmann Publishers.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: A performance perspective," IEEE Trans. Knowledge Data Eng., vol. 5, Dec. 1993.
[3] Reunanen, J. (2003). "Overfitting in making comparisons between variable selection methods". Journal of Machine Learning Research, 3 (7/8), 1371-1382.
[4] K. Z. Mao, "Fast Orthogonal Forward Selection Algorithm for Feature Subset Selection". IEEE Transactions on Neural Networks, 2002, 13(5): 1218-1224.
[5] J. Jelonek, Jerzy S., "Feature Subset Selection for Classification of Histological Images", Artificial Intelligence in Medicine, 1997, 9: 22-239.
[6] B. Sahiner, H. P. Chan, N. Petrick, R. F. Wagner, and L. Hadjiiski, "Feature Selection and Classifier Performance in Computer-Aided Diagnosis: The Effect of Finite Sample Size", Medical Physics, 2000, 27(7): 1509-1522.
[7] Z. Zhao, H. Liu, "Searching for Interacting Features", IJCAI 2007.
[8] Gatu C. and Kontoghiorghes E. J. (2003). "Parallel Algorithms for Computing all Possible Subset Regression Models Using the QR Decomposition". Parallel Computing, 29, pp. 505-521.
[9] Gatu C. and Kontoghiorghes E. J. (2005). "Efficient Strategies for Deriving the Subset VAR Models". Computational Management Science, 2 (4): 253-278.
[10] Gatu C. and Kontoghiorghes E. J. (2006). "Branch-and-bound Algorithms for Computing the Best-Subset Regression Models". Journal of Computational and Graphical Statistics, 15 (1): 139-156.
[11] I. T. Jolliffe, "Principal Component Analysis", New York: Springer-Verlag, 1986.
[12] K. L. Priddy et al., "Bayesian selection of important features for feed-forward neural networks", Neurocomput., vol. 5, no. 2-3, 1993.
[13] L. M. Belue and K. W. Bauer, "Methods of determining input features for multilayer perceptrons," Neural Comput., vol. 7, no. 2, 1995.
[14] J. M. Steppe, K. W. Bauer Jr., and S. K. Rogers, "Integrated feature and architecture selection," IEEE Trans. Neural Networks, vol. 7, July 1996.
[15] Q. Li and D. W. Tufts, "Principal feature classification," IEEE Trans. Neural Networks, vol. 8, Jan. 1997.
[16] R. Setiono and H. Liu, "Neural network feature selector," IEEE Trans. Neural Networks, vol. 8, May 1997.
[17] R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
[18] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.
[19] Hanchuan Peng, Fuhui Long, Chris Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, August 2005.
[20] Nojun Kwak and Chong-Ho Choi, "Input Feature Selection by Mutual Information Based on Parzen Window", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, December 2002.
[21] Thomas Drugman, Mihai Gurban and Jean-Philippe Thiran, "Feature Selection and Bimodal Integration for Audio-Visual Speech Recognition", School of Engineering-STI Signal Processing Institute.
[22] Georgia D. Tourassi, Erik D. Frederick, Mia K. Markey, Carey E. Floyd, Jr., "Application of the mutual information criterion for feature selection in computer-aided diagnosis", North Carolina, Medical Physics, vol. 28, no. 12, December 2001.
[23] Gang Wang, Frederick H. Lochovsky, Qiang Yang, "Feature Selection with Conditional Mutual Information MaxiMin in Text Categorization", Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong, 2004.
[24] J. J. Liu, G. Cutler, W. Li, Z. Pan, S. Peng, T. Hoey, L. Chen, and X. B. Ling, "Multiclass Cancer Classification and Biomarker Discovery Using GA-Based Algorithms," Bioinformatics, vol. 21, pp. 2691-2697, 2005.
[25] … "Expression Data Using a Genetic Algorithm / k-Nearest Neighbor Method," Combinatorial Chemistry & High Throughput Screening, vol. 4, pp. 727-739, 2001.
[26] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[27] Huizhen Liu, Shangping Dai, Hong Jiang, "Quantitative association rules mining algorithm based on matrix", 978-1-4244-4507-3/09 © 2009 IEEE.
[28] Weka Software, http://www.cs.waikato.ac.nz/ml/weka.
[29] Murphy P. M. and Aha D. W. (1994). "UCI repository of Machine Learning, University of California", Department of Information and Computer Science, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[30] R. Battiti, "Using mutual information for selecting features in supervised neural net learning," IEEE Trans. Neural Networks, vol. 5, July 1994.
[31] N. R. Draper and H. Smith, Applied Regression Analysis, 2nd ed. New York: Wiley, 1981.
[32] P. H. Winston, Artificial Intelligence, MA: Addison-Wesley, 1992.
[33] G. E. P. Peterson et al., "Using Taguchi's method of experimental design to control errors in layered perceptrons," IEEE Trans. Neural Networks, vol. 6, July 1995.
[34] Pang-Ning Tan, Michael Steinbach, Vipin Kumar, "Introduction to Data Mining", Addison Wesley.
Table 1. Features selected and accuracy of the different methods on various UCI datasets [33]

Dataset: Breast Cancer - (features selected, accuracy %) per method: (9, 94.84), (8, 92.8), (9, 94.84), (9, 94.84), (9, 94.84)