Scientific Programming
Volume 2015, Article ID 910281, 6 pages
http://dx.doi.org/10.1155/2015/910281
Research Article
Research of Improved FP-Growth Algorithm in
Association Rules Mining
Copyright © 2015 Yi Zeng et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Association rules mining is an important technology in data mining. The FP-Growth (frequent-pattern growth) algorithm is a classical
algorithm in association rules mining, but it must scan the database twice, which reduces its efficiency. Through the study of
association rules mining and the FP-Growth algorithm, we worked out two improved algorithms: the Painting-Growth algorithm and the
N (not) Painting-Growth algorithm, which removes the painting steps and achieves the same result in another way. We compared the
two improved algorithms with the FP-Growth algorithm. Experimental results show that the Painting-Growth algorithm performs better
than the FP-Growth algorithm when the data volume is more than 1050 transactions, and that the N Painting-Growth algorithm performs
better than the FP-Growth algorithm when the data volume is less than 10000 transactions.
Definition 2. When the length of the item set 𝑋 is 𝑘 and support(𝑋) ≥ minsup, one calls item set 𝑋 a 𝑘-item frequent
set. If 𝑘 ≥ 3, one can call item set 𝑋 a multi-item frequent set.

Nature. All nonempty subsets of a frequent item set must be frequent.

2.2. FP-Growth Algorithm. The FP-Growth algorithm [10] compresses the database into a frequent pattern tree (FP-tree)
while still maintaining the association information between item sets. The compressed database is then divided into
a set of conditional databases (a special type of projection database), each associated with one frequent item, and
each conditional database is mined separately. The transaction database is given in Table 1 (the minimum support count
is 2); the mining process using the FP-Growth algorithm is shown in Table 1.

Scanning the database for the first time, we obtain the set of frequent items and their support counts. The frequent
items are sorted in decreasing order of support count, and the resulting list is denoted 𝐿. In this way,
we have 𝐿 = [C:4, D:3, E:3, A:2, B:2].

Building FP-Tree. First, the algorithm creates the root node of the tree, tagged “null.” Then it scans the
database for the second time. The items in each transaction are sorted in the order of 𝐿, and a branch is created
for each transaction. For example, the first transaction “001:A, B, C, D, E” contains five items, {C, D, E, A, B}
in the order of 𝐿, and generates the first branch ⟨(C:1), (D:1), (E:1), (A:1), (B:1)⟩ of the FP-tree. The
branch has five nodes: C is a child of the root, D links to C, E links to D, A links to E, and B links to
A. The second transaction “002:B, C, E” contains three items, {C, E, B} in the order of 𝐿, and generates a branch
in which C links to the root, E links to C, and B links to E. This branch shares the prefix ⟨C⟩ with the existing
path of transaction “001.” The algorithm therefore increases the count of node C by 1 and creates two new nodes
⟨(E:1), (B:1)⟩ linked below (C:2). In general, when a branch is added for a transaction, the count of each node
along the common prefix increases by 1, and new nodes are created and linked for the items that follow the prefix.

For convenience of tree traversal, the algorithm creates an item header table. Each item points to its occurrences
in the FP-tree through a node link. After scanning all transactions, we get the FP-tree displayed in Figure 1.

Figure 1: Generating FP-tree.

FP-tree Mining Processing. The algorithm starts from each frequent pattern of length 1 (the initial suffix pattern)
and builds its conditional pattern base (a “subdatabase” consisting of the set of prefix paths that co-occur with the
suffix pattern). Then the algorithm builds a (conditional) FP-tree for the conditional pattern base and mines that
tree recursively. Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated
from the conditional FP-tree. The mining of the FP-tree is summarized in Table 2.

2.3. System Model. Algorithms for frequent pattern mining have been applied in many fields. Studying their system
model facilitates a better understanding of them. Figure 2 is a system model of the improved algorithms in this paper.

The user obtains the needed knowledge, produced by data mining, through the data mining platform. The data mining
platform includes data definition, a mining designer, and a pattern filter. Through the data definition, we can
preprocess the data and make incomplete data usable; through the mining designer, we can use the improved algorithms
to mine the data and get useful patterns (here, frequent item sets); through the pattern filter, we can select
interesting patterns from the obtained patterns.

3. Improved Algorithms Based on the FP-Growth Algorithm

The FP-Growth algorithm requires scanning the database twice, so its efficiency is not high. This paper puts forward
two improved algorithms, the Painting-Growth algorithm and the N Painting-Growth algorithm, which mine using two-item
permutation sets. Both algorithms scan the database only once to obtain the mining results.
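To make the two database scans of FP-Growth concrete, the following minimal Java sketch builds an FP-tree in the way Section 2.2 describes. It is an illustration under stated assumptions, not the authors' implementation: the class and method names (FpTreeSketch, frequentOrder, build, sampleDb) are hypothetical, the alphabetical tie-break between items of equal support is assumed, the item header table and node links are omitted, and transactions 003 and 004 of the sample database are invented so that the item counts agree with 𝐿 = [C:4, D:3, E:3, A:2, B:2]; only transactions 001 and 002 are quoted in the text.

```java
import java.util.*;

class FpTreeSketch {
    /** One FP-tree node: an item, its count, and its children keyed by item. */
    static final class Node {
        final String item;
        int count;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String item) { this.item = item; }
    }

    /** Hypothetical database: 001 and 002 are quoted in the text, 003 and 004 are assumed. */
    static List<List<String>> sampleDb() {
        return Arrays.asList(
            Arrays.asList("A", "B", "C", "D", "E"),
            Arrays.asList("B", "C", "E"),
            Arrays.asList("A", "C", "D"),
            Arrays.asList("C", "D", "E"));
    }

    /** First scan: count item supports, keep the frequent items, sort by decreasing support. */
    static List<String> frequentOrder(List<List<String>> db, int minSup) {
        Map<String, Integer> support = new HashMap<>();
        for (List<String> t : db)
            for (String item : t) support.merge(item, 1, Integer::sum);
        List<String> order = new ArrayList<>();
        for (Map.Entry<String, Integer> e : support.entrySet())
            if (e.getValue() >= minSup) order.add(e.getKey());
        order.sort((a, b) -> {
            int bySupport = Integer.compare(support.get(b), support.get(a));
            return bySupport != 0 ? bySupport : a.compareTo(b); // tie-break is an assumption
        });
        return order;
    }

    /** Second scan: insert each transaction as a branch, sharing common prefixes. */
    static Node build(List<List<String>> db, List<String> order) {
        Node root = new Node(null); // the "null" root
        for (List<String> t : db) {
            Node cur = root;
            for (String item : order) {          // visit items in the order of L
                if (!t.contains(item)) continue; // skip absent or infrequent items
                cur = cur.children.computeIfAbsent(item, Node::new);
                cur.count++;                     // new node starts at 1, shared node grows by 1
            }
        }
        return root;
    }

    public static void main(String[] args) {
        List<List<String>> db = sampleDb();
        List<String> order = frequentOrder(db, 2);
        Node c = build(db, order).children.get("C");
        System.out.println(order + " C:" + c.count + " children=" + c.children.keySet());
    }
}
```

On the assumed database the first scan reproduces the order C, D, E, A, B, and after the second scan node C has count 4 with children D and E, matching the shared-prefix behaviour described for transactions 001 and 002.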
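The single-scan counting of two-item permutations that both improved algorithms rely on can be illustrated with a minimal Java sketch. It covers only the counting and pruning of item associations; deriving multi-item frequent sets is left out because the text does not fully specify how their support counts are computed. Everything named here is an assumption for illustration: the class and method names (PairAssociationSketch, countAssociations, prune, sampleDb) are hypothetical, a "two-item permutation" is taken to mean every ordered pair of distinct items in a transaction, each transaction is assumed to contain no repeated item, and transactions 003 and 004 of the sample database are invented so that the result agrees with the frequent item association sets {C(A:2,B:2,D:3,E:3); D(A:2,C:3,E:2); E(B:2,C:3,D:2)} reported in the paper's example.

```java
import java.util.*;

class PairAssociationSketch {
    /** Hypothetical database: 001 and 002 are quoted in the text, 003 and 004 are assumed. */
    static List<List<String>> sampleDb() {
        return Arrays.asList(
            Arrays.asList("A", "B", "C", "D", "E"),
            Arrays.asList("B", "C", "E"),
            Arrays.asList("A", "C", "D"),
            Arrays.asList("C", "D", "E"));
    }

    /** Single scan: count every ordered pair (a, b), a != b, of each transaction. */
    static Map<String, Map<String, Integer>> countAssociations(List<List<String>> db) {
        Map<String, Map<String, Integer>> assoc = new TreeMap<>();
        for (List<String> t : db) {
            for (String a : t) {
                for (String b : t) {
                    if (a.equals(b)) continue;
                    assoc.computeIfAbsent(a, k -> new TreeMap<>())
                         .merge(b, 1, Integer::sum);
                }
            }
        }
        return assoc;
    }

    /** Drop associations below the minimum support count, then items left with none. */
    static void prune(Map<String, Map<String, Integer>> assoc, int minSup) {
        for (Map<String, Integer> partners : assoc.values())
            partners.values().removeIf(count -> count < minSup);
        assoc.values().removeIf(Map::isEmpty);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> assoc = countAssociations(sampleDb());
        prune(assoc, 2); // minimum support count 2, as in the paper's example
        System.out.println(assoc);
    }
}
```

With the assumed database and a minimum support count of 2, the entries for C, D, and E come out as {A=2, B=2, D=3, E=3}, {A=2, C=3, E=2}, and {B=2, C=3, D=2}, the same associations the paper lists. Sorted maps are used only to make the printed output deterministic.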
Similarly, according to the frequent item association sets
{C(A:2,B:2,D:3,E:3); D(A:2,C:3,E:2); E(B:2,C:3,D:2)}, we get a
three-item frequent set {(C,D,E):2}.

(6) At this point, we get all frequent item sets.

The algorithm pseudocode is as follows.

Algorithm 3 (Painting-Growth).

Input. Transaction database, minimum support count: 2

Output. All frequent item sets

    HashMap<String, Integer> hm0;   // define a HashMap hm0
    List<String> list, list0;       // define the lists list and list0
    List<String> permutation();     // scan the transaction database, form the two-item
                                    // permutations of each transaction, return list
    paint(Graphics g)               // painting method
    String[] s = null, x = null;    // define String[] s, x
    String z, y;
    HashMap<String, HashMap<String, Integer>> hm = null;   // define a HashMap hm
    for (int i = 0; i < list.size(); i++)
    {
        s = list.get(i).split(",");                  // split list.get(i) into a String[]
        drawLine(s[0].x, s[0].y, s[1].x, s[1].y);    // draw a line between s[0] and s[1]
        HashMap<String, HashMap<String, Integer>> count(drawLine());
                                    // count the drawn lines and return the item associations to hm
    }
    Iterator it = hm.keySet().iterator();            // iterator over the key set of hm
    z = it.next();                                   // assign a key of hm to z
    Iterator it0 = hm.get(z).keySet().iterator();    // iterator over the keys of the value map of hm
    y = it0.next();                                  // assign a key of the value map to y
    if (hm.get(z).get(y) < minsup * N)               // if the count is less than the minimum support count
    { it0.remove(); }                                // remove the infrequent item association
    List<String> combination(hm.get(z).keySet());    // combine the keys of the value map for key z of hm,
                                                     // return list0
    for (int j = 0; j < list0.size(); j++)
    {
        x = list0.get(j).split(",");
        if (count(hm.contain(z + "," + list0.get(j)) == 1 + x.length))
                                    // if the count of the item set in hm equals the length of the item set
                                    // (first consider whether the key of hm is in the item set or not)
        { hm0.put(z + "," + list0.get(j), value); }  // save the item set and its support count in hm0
    }
    return hm0;                     // all frequent item sets
    super.paintComponents(g);       // execute the painting method

3.2. N Painting-Growth Algorithm. The idea of the N Painting-Growth algorithm is similar to that of the Painting-Growth
algorithm, but the implementation differs: the N Painting-Growth algorithm removes the painting steps. The
mining process of N Painting-Growth is as follows.

(1) The algorithm scans the database once and gets the two-item permutation sets of all transactions.
(2) Then, the algorithm counts each permutation in the two-item permutation sets, obtaining all item association sets.
(3) Next, the algorithm removes infrequent associations according to the support count and gets the frequent item
association sets.
(4) Finally, it gets all frequent item sets according to the frequent item association sets. Mining ends.

From the above process it can be seen that the N Painting-Growth algorithm is the Painting-Growth algorithm with the
painting steps removed. The implementation methods differ: the Painting-Growth algorithm imports java.awt
and javax.swing and performs the mining by calling super.paintComponents(g); the N Painting-Growth algorithm
only needs to instantiate a class in the main function.

4. Experimental Results Analysis

The biggest advantage of the improved algorithms, Painting-Growth and N Painting-Growth, is that they reduce database
scanning to a single pass. Compared with the two scans of the FP-Growth algorithm, this improves time efficiency.

Another advantage is that the improved algorithms are simple: the whole mining process needs only the two-item
permutation sets of the transactions. The FP-Growth algorithm likewise completes mining once it has the FP-tree, but
the FP-tree is complex to build and incurs a large memory overhead. By comparison, the two-item permutation sets are
easy to obtain.

Of course, the improved algorithms have disadvantages. The Painting-Growth algorithm needs to build the
association picture, which leads to a large memory overhead. The implementation of the N Painting-Growth algorithm
is less vivid than that of the Painting-Growth algorithm. When the two improved algorithms mine multi-item frequent
sets, they scan the frequent item association sets repeatedly to count, which reduces time efficiency.

In order to verify the superiority of the two improved algorithms over the FP-Growth algorithm,
we implemented, in the Java language, in the Eclipse development environment, on a Windows 7 64-bit operating
system, the Painting-Growth algorithm, N Painting-Growth algorithm,
On the other hand, at 1050 transactions, the execution time of the Painting-Growth algorithm is a little more than
that of the FP-Growth algorithm. But as the number of transactions increases, its execution time becomes
significantly less than that of the FP-Growth algorithm. Thus, the comparison of execution time against number of
transactions shows that the Painting-Growth algorithm is more stable and efficient than the FP-Growth algorithm.

In addition, the implementation methods of the Painting-Growth algorithm and the N Painting-Growth algorithm differ,
and so does their performance. Although the N Painting-Growth algorithm omits the painting steps, its execution time
is a little less than that of the Painting-Growth algorithm only from around 1050 to 10500 transactions. Beyond that,
as the number of transactions increases, the Painting-Growth algorithm performs far better than the N Painting-Growth
algorithm. This shows that the implementation method of N Painting-Growth has a large memory consumption, which makes
its execution time grow faster.

Figure 5 compares the increase rate of execution time over different transaction stages for the Painting-Growth
algorithm, the N Painting-Growth algorithm, and the FP-Growth algorithm. There are seven transaction stages:
stage 1, 0–1050 transactions; stage 2, 1050–5250 transactions; stage 3, 5250–10500 transactions; stage 4,
10500–21000 transactions; stage 5, 21000–31500 transactions; stage 6, 31500–42000 transactions; and stage 7,
42000–52500 transactions.

Figure 5: The increase rate of three algorithms in different transaction stages. (Vertical axis: increase rate,
0 to 0.25; horizontal axis: transaction stage, 0 to 8; series: Painting-Growth, N Painting-Growth, FP-Growth.)

From Figure 5, firstly, for the Painting-Growth algorithm, the increase rate of execution time is high at the
initial stage 1. From stage 2 to stage 7, however, the increase rate fluctuates only gently, which indicates stable
performance. And from stage 2 to stage 6, the increase rate of the Painting-Growth algorithm is lower than that of
the FP-Growth algorithm, which indicates superior performance.

Secondly, for the N Painting-Growth algorithm, the increase rate of execution time in the first three stages is lower
than that of the FP-Growth algorithm, so it performs well. But later, the increase rate of the N Painting-Growth
algorithm is almost always higher than those of the FP-Growth algorithm and the Painting-Growth algorithm. This also
explains why the execution time of N Painting-Growth rises rapidly.

Finally, for the FP-Growth algorithm, although the overall trend of its increase rate is similar to that of the
improved algorithms, it changes more markedly than the improved algorithms in stage 2 and stage 5. So the FP-Growth
algorithm is less stable than the improved algorithms.

From the above it can be concluded that our Painting-Growth algorithm achieves an obvious breakthrough in data
analysis. When the data size is suitable, we can consider adopting the improved algorithms to achieve better
performance. Specifically, when there are fewer than 10000 transactions, we can consider the N Painting-Growth
algorithm. In other cases, the Painting-Growth algorithm performs better and we can consider adopting it.

5. Conclusions

In this paper, we put forward two improved algorithms: the Painting-Growth algorithm and the N Painting-Growth
algorithm. Both algorithms get all frequent item sets using only the two-item permutation sets of the transactions;
they are simple in principle, easy to implement, and scan the database only once. So, for an appropriate number of
transactions, we can consider using the improved algorithms. But we also see the problems of the improved algorithms:
on large data, the performance of N Painting-Growth is disappointing. Our future work will consider how to make the
performance of the improved algorithms more stable, make the removal of infrequent item associations efficient, and
make the mining of multi-item frequent sets quick.
Conflict of Interests
The authors declare that there is no conflict of interests
regarding the publication of this paper.
Acknowledgments
This work is supported by the Fundamental Research Funds
for the Central Universities (XDJK2009C027) and Science &
Technology Project (2013001287).
References
[1] P. Yang and Z. Song, “An improvement to FP-growth algorithm,”
Journal of Anhui Institute of Mechanical & Electrical Engineering:
Natural Science, vol. 17, no. 3, pp. 8–13, 2005.
[2] D. Fengyi and L. Zhenyu, “An ameliorating FP-growth algo-
rithm based on patterns-matrix,” Journal of Xiamen University
(Natural Science), vol. 44, no. 5, pp. 629–633, 2005.
[3] Y. Yang and Y. Luo, “Improved algorithm based on FP-Growth,”
Computer Engineering and Design, no. 7, pp. 1506–1509, 2010.
[4] Q. Ruan, Y. Li, and X. Liu, “A hash table and linear based
improved FP-Tree algorithm,” Journal of Yangtze University
(Natural Science Edition): Science & Engineering, vol. 1, pp. 76–
79, 2010.
[5] X. Luo and J. Chen, “An improvement algorithm for FP-growth,”
Journal of Xi’an University of Science and Technology, vol. 29, no.
4, pp. 491–494, 2009.
[6] L. Zhichun and Y. Fengxin, “An improved frequent pattern tree
growth algorithm,” Applied Science and Technology, vol. 35, no.
6, pp. 47–51, 2008.
[7] C. Jun and G. Li, “An improved FP-growth algorithm based on
item head table node,” Information Technology, vol. 12, pp. 34–
35, 2013.
[8] B. Zheng and J. Li, “An improved algorithm based on FP-
growth,” Journal of Pingdingshan Institute of Technology, vol. 17,
no. 4, pp. 9–12, 2008.
[9] N. Xinzheng and S. Kun, “Mining maximal frequent item sets
with improved algorithm of FPMAX,” Computer Science, vol.
40, no. 12, pp. 223–228, 2013.
[10] J. Han and M. Kamber, Data Mining: Concepts and Techniques,
China Machine Press, Beijing, China, 2001, translated by: F.
Ming, M. Xiaofeng.