Mining Association Rules With Systolic Trees: Dept. of Electrical and Computer Engineering Iowa State University Email
[equation garbled in extraction; surviving symbols in order: $\sum_{k=1}^{n}$, $n$, $k$, $C_n^k$].
The latter is, in the worst case, W cycles, which is the
time for the bottom counting node to propagate the item to
the control node. This time is negligible compared with that
of the former case computed above. An arbitrary transactional
database with n frequent items can always be projected into
multiple sub-databases with N frequent items by Parallel
Projection. Since these sub-databases are mined in parallel,
the time required for mining is determined solely by the size
of the systolic tree. If the
size of the tree N is equal to the number of frequent items
n, i.e., K = W = N = n, the number of clock cycles for
mining is
[equation garbled in extraction; surviving symbols in order: $2$, $n$, $\sum_{k=1}^{n}$, $n$, $k$, $C_n^k$].
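The cycle count above is a sum over itemset sizes k weighted by C_n^k, read here as the binomial coefficient, i.e., the number of k-item candidate itemsets over n frequent items. As a minimal sketch of how fast that candidate space grows (it only counts candidates and their items; it is not the cycle model itself, and the function names are illustrative):

```python
from math import comb


def candidate_itemsets(n: int) -> int:
    """Number of non-empty itemsets over n items: sum of C(n, k) for k = 1..n."""
    return sum(comb(n, k) for k in range(1, n + 1))


def total_candidate_items(n: int) -> int:
    """Total items across all candidates: sum of k * C(n, k) for k = 1..n."""
    return sum(k * comb(n, k) for k in range(1, n + 1))


if __name__ == "__main__":
    # 7..13 is the range of frequent-item counts on the horizontal axis of Fig. 7.
    for n in range(7, 14):
        print(n, candidate_itemsets(n), total_candidate_items(n))
```

The two sums have the closed forms 2^n - 1 and n * 2^(n-1), so each additional frequent item roughly doubles the number of candidates.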
Based on the simulation results, we compare the mining
time of the systolic tree with that of the FP-growth algorithm
in Fig. 7. The FP-growth implementation is from [13]. The mining
time of the software algorithm was collected on a PC with a
Pentium D 3 GHz CPU and 2 GB of RAM. The benchmark is chess.dat
from [14], which is prepared from the UCI datasets and PUMSB.
This database has 75 items and 3196 transactions. In our
experiments, we change the support count threshold to get
Table 2. Experimental Results

Design Parameters
K    W    Total PEs    Total Slices       Clock Freq (MHz)
2    3    18           648 (1.25%)        427.533
2    4    35           1267 (2.44%)       425.532
3    3    43           1548 (2.99%)       421.941
3    4    125          4616 (8.90%)       412.031
4    3    88           3169 (6.11%)       402.253
4    4    345          12674 (24.25%)     363.769
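One closed form that reproduces all six "Total PEs" entries of Table 2 is a complete K-ary tree with levels 0 through W plus W additional nodes. This expression is an inference from the table values, not a formula stated in this section, and reading the extra W nodes as one counting node per level is likewise an assumption. A minimal sketch that checks the fit and derives the slices used per PE:

```python
def total_pes(K: int, W: int) -> int:
    """Assumed PE count: complete K-ary tree with levels 0..W, plus W extra nodes."""
    tree_nodes = sum(K ** level for level in range(W + 1))
    extra_nodes = W  # assumption: one additional node per level
    return tree_nodes + extra_nodes


# (K, W) -> (Total PEs, Total Slices), copied from Table 2.
TABLE_2 = {
    (2, 3): (18, 648),
    (2, 4): (35, 1267),
    (3, 3): (43, 1548),
    (3, 4): (125, 4616),
    (4, 3): (88, 3169),
    (4, 4): (345, 12674),
}

if __name__ == "__main__":
    for (K, W), (pes, slices) in TABLE_2.items():
        matches = total_pes(K, W) == pes
        print(f"K={K} W={W} PEs={pes} formula_matches={matches} "
              f"slices_per_PE={slices / pes:.1f}")
```

Under this reading the slice cost stays close to 36 to 37 slices per PE across all six configurations, so area scales almost linearly with the PE count.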
[Fig. 7. Hardware and Software Mining Time Comparison. Horizontal axis: Number of Frequent Items (7 to 13). Vertical axis: Logarithm of time in microseconds (0 to 20). Series: Software (FP-growth), Hardware (Systolic Tree).]
different numbers of frequent items. Note that the run time
of the FP-growth algorithm is closely related to the size of
the FP-tree, while the run time of the systolic tree implementation
is determined only by the number of frequent items.
It can be observed that the size of the systolic tree must be
no more than 11 for it to be faster than the FP-growth
algorithm. When the size of the systolic tree is 10, mining
is 24 times faster than with FP-growth. Our future work
will shrink the size of the systolic tree to shorten the dictation
time, thus increasing the threshold size of the tree.
6. CONCLUSION
In this paper we proposed a systolic tree hardware architecture
for association rule mining. Similar to the FP-growth
algorithm, our architecture requires only two database reads.
Our preliminary experiments show that, with careful selection
of the size of the systolic tree, mining can be greatly
accelerated compared to current software approaches.
7. REFERENCES
[1] R. Agrawal and R. Srikant, Fast algorithms for mining
association rules, in Proceedings of the 1994 International
Conference on Very Large Data Bases (VLDB), 1994.
[2] J. Han, J. Pei, Y. Yin, and R. Mao, Mining frequent
patterns without candidate generation: A frequent-pattern tree
approach, Data Mining and Knowledge Discovery, 2004.
[3] S. Kotsiantis and D. Kanellopoulos, Association rules
mining: A recent overview, in GESTS International
Transactions on Computer Science and Engineering, Vol. 32, 2006.
[4] A. Choudhary, R. Narayanan, B. Ozisikyilmaz, G. Memik,
J. Zambreno, and J. Pisharath, Optimizing data mining
workloads using hardware accelerators, in Proceedings of
the Workshop on Computer Architecture Evaluation using
Commercial Workloads (CAECW), 2007.
[5] R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and
A. Choudhary, Minebench: A benchmark suite for data
mining workloads, in Proceedings of the IEEE International
Symposium on Workload Characterization (IISWC), 2006.
[6] J. Zambreno, B. Ozisikyilmaz, G. Memik, and A. Choudhary,
Performance characterization of data mining applications
using minebench, in Proceedings of the Workshop on
Computer Architecture Evaluation using Commercial Workloads
(CAECW), 2006.
[7] Y. Ye and C.-C. Chiang, A parallel apriori algorithm for
frequent itemsets mining, in Proceedings of the Fourth
International Conference on Software Engineering Research,
Management and Applications, 2006.
[8] I. Pramudiono and M. Kitsuregawa, Parallel FP-growth
on PC cluster, in Proceedings of the Seventh Pacific-Asia
Conference of Knowledge Discovery and Data Mining
(PAKDD03), 2003.
[9] Z. K. Baker and V. K. Prasanna, Efficient hardware data
mining with the Apriori Algorithm on FPGAs, in Proceedings
of the 13th Annual IEEE Symposium on Field-Programmable
Custom Computing Machines (FCCM), 2005.
[10] Z. K. Baker and V. K. Prasanna, An architecture for
efficient hardware data mining using reconfigurable computing
systems, in Proceedings of the 14th Annual IEEE Symposium on
Field-Programmable Custom Computing Machines (FCCM), 2006.
[11] A. Ghoting, G. Buehrer, S. Parthasarathy, D. Kim,
A. Nguyen, Y.-K. Chen, and P. Dubey, Cache-conscious
frequent pattern mining on a modern processor, in
Proceedings of the 31st International Conference on Very Large
Databases, 2005.
[12] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data
Mining. Addison Wesley, 2005.
[13] F. Coenen, The LUCS-KDD implementation of the FP-growth
algorithm, website, 2003,
http://www.csc.liv.ac.uk/~frans/KDD/Software/FPgrowth/fpGrowth.html#downloading.
[14] B. Goethals, Frequent Itemset Mining Dataset Repository,
website, http://fimi.cs.helsinki.fi/data/.