
International Journal of Computer Applications (0975 – 8887)

Volume 119 – No.20, June 2015

Feature Selection by Mining Optimized Association Rules based on Apriori Algorithm

K. Rajeswari, PhD
Dept. of Computer Engineering
Pimpri Chinchwad College of Engineering
Pune, India

ABSTRACT
This paper presents a novel feature selection method based on association rule mining using a reduced dataset. The key idea of the proposed work is to find closely related features using the association rule mining method. The Apriori algorithm is used to find closely related attributes using support and confidence measures. From the closely related attributes, a number of association rules are mined. Among these rules, only the few related to the desirable class label are needed for classification. We have implemented a novel technique to reduce the number of rules generated by using a reduced data set, thereby improving the performance of the Association Rule Mining (ARM) algorithm. Experimental results of the proposed algorithm on datasets from the University of California, Irvine (UCI) repository demonstrate that our algorithm is able to classify accurately with a minimal attribute set when compared with other feature selection algorithms.

Keywords
Feature selection, Association Rule Mining (ARM), Apriori, Classification.

1. INTRODUCTION
Many times, data sets for analysis contain hundreds of attributes, which may be irrelevant to the mining task or redundant. Attribute subset selection, or feature selection, is a technique to extract closely related features and remove irrelevant or useless features according to an objective function. The aim of feature selection is to minimize the number of features such that the probability distribution of the resulting data classes is near to the original distribution over all the features [1]. An exhaustive search for the optimal subset of attributes can be prohibitively expensive, especially as the total number of records (n) and the number of data classes increase. Association rule mining, one of the most important and well researched techniques of data mining, was initially introduced in [2]. This technique is utilized in our work with a reduced data set related to the desired class label and with reduced features. There are many reasons for selecting a subset of the features instead of all the features [3]. Measuring a diminished set of features is cheaper and faster, with increased accuracy through the exclusion of irrelevant features. Differentiating relevant and irrelevant features gives a proper insight into the nature of the prediction problem and an understanding of the final classification model.

For feature selection, various heuristic methods are used: stepwise forward selection, stepwise backward elimination, combined forward selection and backward elimination, random generation, and decision tree induction. Stepwise forward selection starts with an empty set of attributes; at each step the best of the remaining original attributes is found and added after a single consideration of its usefulness. The pitfalls of this method include a high susceptibility to getting trapped in local optima and a one-track process that easily discards a feature entirely after a single consideration of its usefulness. Variations of this method are found in [4][5][6]. The stepwise backward elimination procedure starts with the full set of attributes and, at each step, removes the worst attribute. INTERACT is a backward elimination algorithm [7]. Recent references on implicit enumerative techniques for selecting features adapted to regression models are found in [8-10]. The problem of feature selection is also dealt with in [2][11-17]. Principal component analysis (PCA) is one of the most famous methods used [11], but it is disadvantageous because all the data need to be processed again when new data is added. Decision trees are used in [16-18], which uncover relevant attributes one by one iteratively. Mutual information is used as a feature selector in [19]. Stepwise regression [20] uses a statistical F-test technique, and best first search uses greedy hill climbing [21] for feature selection. The Taguchi method is used to find the neural network structure for feature selection [22]. Measures such as information measures, distance measures, dependence measures, accuracy measures and consistency measures are used for evaluating the goodness of features [23-27]. Wrapper methods with genetic algorithms are used for feature selection [28][29]; the drawback of genetic algorithms is overfitting. Although classification algorithms such as decision trees, neural networks and the Bayes classifier classify a given data set, it has been found [30] that feature (attribute) selection plays a major role in improving the efficiency of the classifier.

Our work minimizes the exhaustive search because the data set itself is reduced, as per the class labels desired during classification. The organization of the paper is as follows: Chapter 2 discusses the association rule mining process, Chapter 3 is about data set reduction, Chapter 4 gives an overview of feature selection, Chapter 5 gives details of the implementation, and in Chapter 6 the result analysis is discussed.

2. ASSOCIATION RULE MINING
Let I = {i1, i2, i3, ..., id} be the set of all items in a market basket data and T = {t1, t2, t3, ..., tn} be the set of all transactions. Each transaction ti contains a subset of items chosen from the item set I. A collection of zero or more items is termed an item set. Support count is an important property of an item set; it refers to the number of transactions that contain a particular item set. Mathematically, the support count σ(X) for an item set X can be given as follows:

σ(X) = |{ti | X ⊆ ti, ti ∈ T}|
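As a concrete illustration of the support count, the following is a minimal Python sketch (not taken from the paper); the transactions and the item set X are hypothetical examples used only to show how σ(X) is counted.

```python
# Minimal sketch of the support count sigma(X): the number of
# transactions that contain every item of the item set X.
from typing import List, Set

def support_count(X: Set[str], transactions: List[Set[str]]) -> int:
    """sigma(X) = |{ t in T : X is a subset of t }|"""
    return sum(1 for t in transactions if X <= t)

# Hypothetical market-basket transactions for illustration.
T = [{"bread", "milk"},
     {"bread", "diapers", "beer", "eggs"},
     {"milk", "diapers", "beer", "cola"},
     {"bread", "milk", "diapers", "beer"},
     {"bread", "milk", "diapers", "cola"}]

print(support_count({"milk", "diapers"}, T))  # -> 3
```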

An association rule is an implication expression of the form A → B, where A and B are disjoint item sets, i.e., A ∩ B = ∅. There are two important basic measures for association rules: minimum support and confidence. Generally, minimum support and confidence are predefined by the user/analyst so that rules which are not interesting or not useful can be deleted. Support is the total count of transactions in which all items of A and B appear together. Confidence determines how frequently items in B appear in transactions that contain A. The formal definitions of these metrics are given below:

Support(A → B) = σ(A ∪ B)

Confidence(A → B) = σ(A ∪ B) / σ(A)

The Apriori algorithm is used to find the frequent item sets [31].
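The short Python sketch below evaluates Support(A → B) and Confidence(A → B) exactly as defined above; it is an illustration only (the toy transactions and the rule are assumptions, the same kind of example as in the earlier sketch).

```python
# Sketch: support and confidence of a rule A -> B over a toy transaction set.
from typing import List, Set

def sigma(X: Set[str], transactions: List[Set[str]]) -> int:
    # support count: transactions containing every item of X
    return sum(1 for t in transactions if X <= t)

def support(A: Set[str], B: Set[str], transactions: List[Set[str]]) -> int:
    return sigma(A | B, transactions)

def confidence(A: Set[str], B: Set[str], transactions: List[Set[str]]) -> float:
    return sigma(A | B, transactions) / sigma(A, transactions)

T = [{"bread", "milk"},
     {"bread", "diapers", "beer", "eggs"},
     {"milk", "diapers", "beer", "cola"},
     {"bread", "milk", "diapers", "beer"},
     {"bread", "milk", "diapers", "cola"}]

A, B = {"milk", "diapers"}, {"beer"}
print(support(A, B, T), confidence(A, B, T))  # -> 2 0.666...
```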
3. DATA SET REDUCTION FOR FEATURE SELECTION
Feature selection is an important preprocessing technique to improve the performance of the association rule mining process, and it improves the accuracy of the classifier. For any given dataset, the features can be analyzed to find their association with the class label using the Apriori algorithm. Rules are generated for item sets with the expected minimum support and confidence. If the user is interested in a particular class label, then only the tuples with this particular class label are taken for analyzing the level of association existing between the attributes and the desirable class label. This method of selecting tuples based on the desirable class label increases the efficiency of the Apriori algorithm by reducing the number of iterations and the time involved in finding association rules. The subset of features found after reducing the tuples is fed to the classifier to check the performance of the classifier.

For the heart disease data set, the class label has 'Have risk' and 'No risk' categories. As the user will be more interested in the class label 'Have risk' only, the 'No risk' tuples can be removed from the data set to find the association rules with consequence 'Have risk'. This reduces the data set size by at least 40%, thereby improving the performance of the Apriori algorithm. We have obtained similar accuracy, sensitivity and specificity values with the reduced data set as with the original data set. Accuracy is tested using the C4.5 decision tree classifier in Weka [32].
4. FEATURE SELECTION
In this paper, a novel feature selection method is proposed based on association analysis. It extracts the features by analyzing the correlation between features found by association rule mining. Based on the consequence, the data set is first reduced. This reduced data set with the desired consequence is used for association rule mining, which reduces the memory utilization and the time taken for each iteration of the association rule mining process. After selecting the features, the complete set of data is given to the classifier to test the accuracy using 10-fold cross validation. We found that the results obtained are similar to those of the original data set.

4.1 Implementation details
Our feature selection algorithm has four steps: dataset reduction, frequent item set generation using Apriori, association rule generation and feature selection. Algorithm Database_Reduction is used to reduce the original dataset. Steps 4-12 are repeated for each tuple in the original dataset; if a tuple does not contribute to the class label, it is deleted from the dataset and the size of the dataset is updated.

The Apriori algorithm is used for mining frequent item sets for Boolean association rules. It uses prior knowledge of frequent item sets and explores (k+1)-item sets from the k-item sets to generate all frequent item sets. It follows the antimonotone property, i.e., if a set does not pass the test, all of its supersets will also fail the test: if P(I) < min_sup, then P(I ∪ A) < min_sup. A two-step process is followed, namely the Join step and the Prune step; from these steps, frequent item sets are found and association rules are generated.
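The antimonotone property used in the Prune step can be illustrated with a few lines of Python (a toy example with hypothetical frequent 2-itemsets, not taken from the paper): a candidate k-itemset can be discarded as soon as any of its (k-1)-subsets is known to be infrequent.

```python
# Sketch: prune a candidate itemset if any of its (k-1)-subsets is infrequent,
# which is safe because support can only shrink when items are added.
from itertools import combinations
from typing import FrozenSet, Set

def can_be_frequent(candidate: FrozenSet[str],
                    frequent_prev_level: Set[FrozenSet[str]]) -> bool:
    k = len(candidate)
    return all(frozenset(s) in frequent_prev_level
               for s in combinations(candidate, k - 1))

# Hypothetical frequent 2-itemsets.
F2 = {frozenset({"milk", "bread"}), frozenset({"milk", "diapers"}),
     frozenset({"bread", "diapers"})}
print(can_be_frequent(frozenset({"milk", "bread", "diapers"}), F2))  # True
print(can_be_frequent(frozenset({"milk", "bread", "beer"}), F2))     # False
```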
Let Ck denote the set of candidate k-itemsets and Fk denote the set of frequent k-itemsets. The frequent itemset generation algorithm has two important characteristics:

(1) It is a level-wise algorithm, i.e., mapped to the lattice structure, it traverses the itemset lattice one level at a time, from frequent 1-itemsets to the maximum size of frequent itemsets;

(2) It uses a generate-and-test strategy for finding frequent itemsets. At each iteration, new candidate itemsets are generated from the frequent itemsets found in the previous iteration. The support for each candidate is then counted and tested against the minimum support threshold [16].

The second step is to construct association rules that satisfy the user-defined minimum confidence by using the frequent itemsets. Suppose one of the frequent itemsets is Fk, Fk = {i1, i2, i3, ..., ik}; association rules with this itemset can be generated in the following way: the first rule is {i1, i2, ..., ik-1} → {ik}, and by checking its confidence this rule can be determined as interesting or not.
Algorithm Feature selection based on association rule mining returns all closely related features. Dataset D is discretized in step 2. The output of step 2 is given to step 3, which reduces the size of the discretized dataset to improve the efficiency of association rule mining, thereby improving the efficiency of this feature selection algorithm. All possible rules with the consequences of interest are generated in step 4. Some of the generated rules may not be useful and are deleted based on the lift value (lift <= 1). If the lift value is equal to one, the antecedent and consequent of the rule r are not related; if the lift value is less than one, the antecedent and consequent of the rule r are negatively related; and if it is greater than one, they are positively related. In steps 12-13, the antecedent attributes of the selected rules are included in the result set and are returned as the final selected feature set.
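To make the lift-based pruning concrete, here is a brief Python sketch (a toy illustration, not the authors' code) that computes lift from support counts and discards rules with lift <= 1; the rule representation is an assumption made for the example.

```python
# Sketch: lift(A -> B) = P(A u B) / (P(A) * P(B)).
# A rule is kept only when lift > 1, i.e. antecedent and consequent are
# positively related; lift == 1 means independence, lift < 1 a negative relation.
from typing import List, Set, Tuple

def sigma(X: Set[str], transactions: List[Set[str]]) -> int:
    return sum(1 for t in transactions if X <= t)

def lift(A: Set[str], B: Set[str], transactions: List[Set[str]]) -> float:
    n = len(transactions)
    p_ab = sigma(A | B, transactions) / n
    return p_ab / ((sigma(A, transactions) / n) * (sigma(B, transactions) / n))

def prune_by_lift(rules: List[Tuple[Set[str], Set[str]]],
                  transactions: List[Set[str]]) -> List[Tuple[Set[str], Set[str]]]:
    # keep only rules whose antecedent and consequent are positively related
    return [(A, B) for A, B in rules if lift(A, B, transactions) > 1]
```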


Algorithm 1: Database_Reduction

D : Training set
N : Number of tuples in D
C : Class attribute

Method:
1. Set D' = D
2. Set N' = N
3. Set i = 1
4. Repeat
5.   For each Ti ∈ D do
6.     Set flag = false
7.     For each c ∈ C do
8.       flag = flag ∨ Ti[c]
9.     If (flag = false)
10.      Set D' = D' - Ti
11.      Set N' = N' - 1
12. Until i <> N
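A compact Python rendering of Algorithm 1 is sketched below; it assumes the dataset is a list of dictionaries and that the desired class value is passed explicitly (the parameter names and the sample tuples are illustrative, not from the paper).

```python
# Sketch of Algorithm 1 (Database_Reduction): keep only the tuples whose
# class attribute matches the class label of interest, and track the new size.
from typing import Any, Dict, List, Tuple

def database_reduction(D: List[Dict[str, Any]], class_attr: str,
                       desired_label: Any) -> Tuple[List[Dict[str, Any]], int]:
    D_reduced = [t for t in D if t[class_attr] == desired_label]
    return D_reduced, len(D_reduced)

# Illustrative heart-disease style tuples.
D = [{"age": "high", "chol": "high", "risk": "Have risk"},
     {"age": "low",  "chol": "low",  "risk": "No risk"},
     {"age": "high", "chol": "low",  "risk": "Have risk"}]
D_prime, N_prime = database_reduction(D, "risk", "Have risk")
print(N_prime)  # -> 2
```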
Algorithm 2: Apriori algorithm for frequent itemset generation

minsup : Minimum support threshold
N : Number of tuples in the original data set D

Method:
1. k = 1
2. Fk = {i | i ∈ I ∧ Support({i}) ≥ N × minsup}
3. Repeat
4.   k = k + 1
5.   Ck = candidates generated from Fk-1
6.   For each instance t ∈ T do
7.     Ct = subset(Ck, t)
8.     For each candidate itemset c ∈ Ct do
9.       Support(c) = Support(c) + 1
10.    End for
11.  End for
12.  Fk = {c | c ∈ Ck ∧ Support(c) ≥ N × minsup}
13. Until Fk = Null
14. Result = ∪ Fk
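The following is a small, self-contained Python sketch of the level-wise loop of Algorithm 2. It is a simplified illustration under the assumption of set-valued transactions; it is not the authors' implementation and it omits the subset-function optimization of step 7, counting support directly instead.

```python
# Sketch of level-wise frequent itemset generation (Apriori).
from itertools import combinations
from typing import FrozenSet, List, Set

def apriori(transactions: List[Set[str]], minsup: float) -> Set[FrozenSet[str]]:
    n = len(transactions)
    items = {i for t in transactions for i in t}
    # F1: frequent 1-itemsets
    Fk = {frozenset([i]) for i in items
          if sum(1 for t in transactions if i in t) >= n * minsup}
    result = set(Fk)
    k = 1
    while Fk:
        k += 1
        # Join step: candidates of size k from frequent (k-1)-itemsets;
        # Prune step: drop candidates with an infrequent (k-1)-subset (antimonotone).
        candidates = {a | b for a in Fk for b in Fk if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in Fk for s in combinations(c, k - 1))}
        # Count support and keep the frequent candidates.
        Fk = {c for c in candidates
              if sum(1 for t in transactions if c <= t) >= n * minsup}
        result |= Fk
    return result

T = [{"bread", "milk"}, {"bread", "diapers", "beer"}, {"milk", "diapers", "beer"},
     {"bread", "milk", "diapers"}, {"bread", "milk", "diapers", "beer"}]
print(sorted(map(sorted, apriori(T, minsup=0.6))))
```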
Algorithm 3: Rule generation from the frequent itemsets generated by the Apriori algorithm

minconf : Minimum confidence threshold

Method:
1. For each frequent k-itemset fk, k ≥ 2 do
2.   H1 = {i | i ∈ fk}
3.   call apr-genrules(fk, H1)
4. End for

Function apr-genrules(fk, Hm)
1. k = |fk|
2. m = |Hm|
3. If k > m + 1 then
4.   Hm+1 = candidate (m+1)-item consequents generated from Hm
5.   For each hm+1 ∈ Hm+1 do
6.     Conf = Support(fk) / Support(fk - hm+1)
7.     If Conf ≥ minconf then
8.       Return the rule (fk - hm+1) → hm+1
9.     Else
10.      delete hm+1 from Hm+1
11.    End if
12.  End for
13.  apr-genrules(fk, Hm+1)
14. End if
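As a rough illustration of Algorithm 3 and apr-genrules, the sketch below enumerates candidate consequents of a single frequent itemset and keeps the rules that meet the confidence threshold. It is a simplified, non-recursive variant written for clarity (it tries every non-empty proper subset as consequent rather than growing consequents level-wise), and it is not the authors' code; the transactions are hypothetical.

```python
# Sketch: generate confident rules (antecedent -> consequent) from one
# frequent itemset fk by trying every non-empty proper subset as consequent.
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

def sigma(X: FrozenSet[str], transactions: List[Set[str]]) -> int:
    return sum(1 for t in transactions if X <= t)

def rules_from_itemset(fk: FrozenSet[str], transactions: List[Set[str]],
                       minconf: float) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    rules = []
    for m in range(1, len(fk)):                      # consequent size
        for consequent in map(frozenset, combinations(fk, m)):
            antecedent = fk - consequent
            conf = sigma(fk, transactions) / sigma(antecedent, transactions)
            if conf >= minconf:
                rules.append((antecedent, consequent))
    return rules

T = [{"bread", "milk", "diapers"}, {"bread", "diapers", "beer"},
     {"milk", "diapers", "beer"}, {"bread", "milk", "diapers", "beer"}]
print(rules_from_itemset(frozenset({"diapers", "beer"}), T, minconf=0.7))
```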
Algorithm 4: Feature selection based on association rule mining

D' : Reduced training set
N' : Number of tuples in D'
N : Number of tuples in D
C : Class attribute

Method:
1. Result = Null
2. Discretize(D)
3. Database_Reduction(D, N)
4. Rules = apriori(D', N', N, minsup, minconf)
5. For each rule r ∈ Rules do
6.   If lift(r) < 1
7.     Delete r from Rules
8.   End if
9. End for
10. If Rules = Null then break
11. Else
12.   r = getRule(Rules)
13.   F = Select_Antecedent_Attributes(r)
14. End if
15. Return Result

5. EXPERIMENTAL RESULTS

5.1 Efficiency
The performance of the proposed algorithm depends mainly on the run time of the Apriori algorithm. The computational complexity of the Apriori algorithm can be affected by the support threshold, the number of items, the number of transactions and the average transaction width [34]. The computational complexity of the Apriori algorithm on a dataset having N tuples has the following components:

5.1.1 Time required for generation of frequent 1-itemsets: O(NI), where I is the average number of items and N is the total number of transactions.

5.1.2 Time required for candidate generation C: it includes the merging cost and the pruning cost.

Figure 1. Execution time of the Apriori algorithm with different minimum support thresholds (run time in seconds versus minimum support from 10 to 50), comparing Apriori on the original data set with Apriori on the reduced data set.


5.1.3 Time required for support counting: the cost for support counting is O(N Σk C(w, k) αk), where C(w, k) is the number of k-subsets of a transaction, w is the maximum transaction width, and αk is the cost of updating the support count of a candidate k-itemset.

The computational complexity of the Apriori algorithm on the reduced dataset having N' tuples is very much less than on the original dataset having N tuples (note that N' is at least 40% less than N), as shown in Figure 1.

5.2 Effectiveness
The effectiveness of the proposed algorithm is checked by comparing the classification results obtained with all attributes against those obtained with the selected attributes (Figures 2 and 3, Table 1).

Figure 2. Result of classifier C4.5 [28] on the Heart dataset [29] with all attributes.

Figure 3. Result of classifier C4.5 [28] on the Heart dataset [29] with selected attributes using the feature selection algorithm.

6. CONCLUSION
In this paper, we have proposed a novel method of feature selection using association rule mining on a reduced data set based on the desired class label attribute. By reducing the dataset, the performance of the Apriori algorithm is improved significantly, thereby improving the association rule mining process. Our results show that this method is effective and efficient for most of the real datasets from the UCI repository, with acceptable classifier accuracy.

7. REFERENCES
[1] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, Elsevier, Morgan Kaufmann Publishers.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: A performance perspective," IEEE Trans. Knowledge Data Eng., vol. 5, Dec. 1993.
[3] Reunanen, J. (2003). "Overfitting in making comparisons between variable selection methods". Journal of Machine Learning Research, 3 (7/8), 1371-1382.
[4] K. Z. Mao, "Fast Orthogonal Forward Selection Algorithm for Feature Subset Selection". IEEE Transactions on Neural Networks, 2002. 13(5): 1218-1224.
[5] J. Jelonek and Jerzy S., "Feature Subset Selection for Classification of Histological Images". Artificial Intelligence in Medicine, 1997. 9: 22-239.
[6] B. Sahiner, H. P. Chan, N. Petrick, R. F. Wagner, and L. Hadjiiski, "Feature Selection and Classifier Performance in Computer-Aided Diagnosis: The Effect of Finite Sample Size". Medical Physics, 2000. 27(7): 1509-1522.
[7] Z. Zhao and H. Liu, "Searching for Interacting Features", IJCAI 2007.
[8] Gatu C. and Kontoghiorghes E. J. (2003). "Parallel Algorithms for Computing all Possible Subset Regression Models Using the {QR} Decomposition". Parallel Computing, 29, pp. 505-521.
[9] Gatu C. and Kontoghiorghes E. J. (2005). "Efficient Strategies for Deriving the Subset {VAR} Models". Computational Management Science, 2 (4): 253-278.
[10] Gatu C. and Kontoghiorghes E. J. (2006). "Branch-and-bound Algorithms for Computing the Best-Subset Regression Models". Journal of Computational and Graphical Statistics, 15 (1): 139-156.
[11] I. T. Jolliffe, "Principal Component Analysis", New York: Springer-Verlag, 1986.
[12] K. L. Priddy et al., "Bayesian selection of important features for feed-forward neural networks", Neurocomput., vol. 5, no. 2 and 3, 1993.
[13] L. M. Belue and K. W. Bauer, "Methods of determining input features for multilayer perceptrons," Neural Comput., vol. 7, no. 2, 1995.
[14] J. M. Steppe, K. W. Bauer Jr., and S. K. Rogers, "Integrated feature and architecture selection," IEEE Trans. Neural Networks, vol. 7, July 1996.
[15] Q. Li and D. W. Tufts, "Principal feature classification," IEEE Trans. Neural Networks, vol. 8, Jan. 1997.
[16] R. Setiono and H. Liu, "Neural network feature selector," IEEE Trans. Neural Networks, vol. 8, May 1997.
[17] R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.


[18] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.
[19] Hanchuan Peng, Fuhui Long, and Chris Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, August 2005.
[20] Nojun Kwak and Chong-Ho Choi, "Input Feature Selection by Mutual Information Based on Parzen Window", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, December 2002.
[21] Thomas Drugman, Mihai Gurban and Jean-Philippe Thiran, "Feature Selection and Bimodal Integration for Audio-Visual Speech Recognition", School of Engineering-STI, Signal Processing Institute.
[22] Georgia D. Tourassi, Erik D. Frederick, Mia K. Markey, and Carey E. Floyd, Jr., "Application of the mutual information criterion for feature selection in computer-aided diagnosis", Medical Physics, vol. 28, no. 12, December 2001.
[23] Gang Wang, Frederick H. Lochovsky, and Qiang Yang, "Feature Selection with Conditional Mutual Information MaxiMin in Text Categorization", Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong, 2004.
[24] J. J. Liu, G. Cutler, W. Li, Z. Pan, S. Peng, T. Hoey, L. Chen, and X. B. Ling, "Multiclass Cancer Classification and Biomarker Discovery Using GA-Based Algorithms," Bioinformatics, vol. 21, pp. 2691-2697, 2005.
[25] L. Li, T. A. Darden, C. R. Weinberg, and Levine, "Gene Assessment and Sample Classification for Gene Expression Data Using a Genetic Algorithm / k-Nearest Neighbor Method," Combinatorial Chemistry & High Throughput Screening, vol. 4, pp. 727-739, 2001.
[26] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[27] Huizhen Liu, Shangping Dai, and Hong Jiang, "Quantitative association rules mining algorithm based on matrix", 978-1-4244-4507-3/09 © 2009 IEEE.
[28] Weka Software, http://www.cs.waikato.ac.nz/ml/weka.
[29] Murphy P. M. and Aha D. W. (1994). "UCI repository of Machine Learning, University of California", Department of Information and Computer Science, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[30] R. Battiti, "Using mutual information for selecting features in supervised neural net learning," IEEE Trans. Neural Networks, vol. 5, July 1994.
[31] N. R. Draper and H. Smith, Applied Regression Analysis, 2nd ed. New York: Wiley, 1981.
[32] P. H. Winston, Artificial Intelligence, MA: Addison-Wesley, 1992.
[33] G. E. P. Peterson et al., "Using Taguchi's method of experimental design to control errors in layered perceptrons," IEEE Trans. Neural Networks, vol. 6, July 1995.
[34] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Addison Wesley.

Table 1. Features selected and accuracy of different methods on various UCI datasets [33]. Each cell gives the number of attributes (selected) / classification accuracy (%).

Dataset       | All attributes | Association Mining Method | Genetic Search | Chi-square  | Information Gain
Vote          | 16 / 96.32     | 4 / 95.843                | 7 / 96.32      | 16 / 96.32  | 16 / 96.32
Zoo           | 17 / 92.07     | 12 / 90.67                | 13 / 92.07     | 17 / 92.07  | 17 / 92.07
Breast Cancer | 9 / 94.84      | 8 / 92.8                  | 9 / 94.84      | 9 / 94.84   | 9 / 94.84

