An Efficient Algorithm For Enumerating Closed Patterns in Transaction Databases
1 Introduction
Frequent pattern mining is one of the fundamental problems in data mining and has
many applications such as association rule mining [1, 5, 7] and condensed representation of inductive queries [12]. To handle frequent patterns efficiently, equivalence classes induced by pattern occurrences have been considered; closed patterns are the maximal patterns of their equivalence classes.
This paper addresses the problem of enumerating all frequent closed patterns. Many algorithms have been proposed for this problem [14, 13, 15, 20, 21]. These algorithms are basically based on the enumeration of frequent patterns: they enumerate frequent patterns and output only those that are closed. The enumeration of frequent patterns has been studied well and can be done efficiently [1, 5]. Many computational experiments support that, in practice, these algorithms take very short time per pattern on average.
However, as we will show in a later section, the number of frequent patterns can be exponentially larger than the number of closed patterns, hence the computation time per closed pattern can, on average, be exponential in the size of the dataset. The existing algorithms therefore use heuristic pruning to cut off non-closed frequent patterns. However, the pruning is not complete, so they may still take exponential time per closed pattern.
Moreover, these algorithms have to store previously obtained frequent patterns in memory to avoid duplicates. Some of them further use the stored patterns for
[Footnote: Presently working for Fujitsu Laboratory Ltd., e-mail: [email protected]]
checking the “closedness” of patterns. This consumes much memory, sometimes exponential in both the size of the database and the number of closed patterns. In summary, the existing algorithms may take time and memory exponential in both the database size and the number of frequent closed patterns. This is not only a theoretical observation but is also supported by the results of the computational experiments of FIMI’03 [7]. When the number of frequent patterns is much larger than the number of frequent closed patterns, as in BMS-WebView-1 with small supports, the computation time of the existing algorithms is very large relative to the number of frequent closed patterns.
In this paper, we propose a new algorithm LCM (Linear time Closed pattern Miner)
for enumerating frequent closed patterns. Our algorithm uses a new technique called
prefix preserving extension (ppc-extension), which is an extension of a closed pattern
to another closed pattern. Since this extension generates a new frequent closed pattern from a previously obtained closed pattern, it enables us to completely prune unnecessary non-closed frequent patterns. Our algorithm always finds a new frequent closed pattern in time linear in the size of the database and never takes exponential time; in other words, it always terminates in time linear in the number of closed patterns. Since any closed pattern is generated by this extension from exactly one other closed pattern, we can enumerate frequent closed patterns in a depth-first search manner, hence we need no memory for previously obtained patterns. Thereby, the memory usage of our algorithm depends only on the size of the input database. This is not only a theoretical result; our computational experiments also support the practical efficiency of our algorithm. The techniques used in our algorithm are orthogonal to existing techniques such as FP-trees, look-ahead, and heuristic preprocessing. Moreover, we can add existing memory-saving techniques so that our algorithm can handle huge databases much larger than the memory size.
For practical computation, we propose occurrence deliver, anytime database reduction, and fast ppc-test, which accelerate the enumeration significantly. We examined the performance of our algorithm on real-world and synthetic datasets taken from the FIMI’03 repository, including the datasets used in the KDD-cup [11], and compared it with straightforward implementations. The results show that the combination of these techniques and the prefix-preserving extension performs well on many kinds of datasets.
In summary, our algorithm has the following advantages.
· Linear time enumeration of closed patterns
· No storage space for previously obtained closed patterns
· Generating any closed pattern from another unique closed pattern
· Depth-first generation of closed patterns
· Small practical computational cost of generation of a new pattern
The organization of the paper is as follows. Section 2 prepares notions and definitions. In Section 3, we give an example showing that the number of frequent patterns can be exponential in the number of closed patterns. Section 4 explains the existing schemes for closed pattern mining, then presents the prefix-preserving closure extension and our algorithm. Section 5 describes several improvements for practical use. Section 6 presents experimental results of our algorithm and its improvements on synthetic and real-world datasets. We conclude the paper in Section 7.
2 Preliminaries
We give basic definitions and results on closed pattern mining according to [1, 13, 6].
Let I = {1, . . . , n} be the set of items. A transaction database on I is a set T = {t1, . . . , tm} such that each ti is included in I. Each ti is called a transaction. We denote the total size of T by ||T||, i.e., ||T|| = Σ_{t∈T} |t|. A subset P of I is called a pattern (or itemset). For a pattern P, a transaction including P is called an occurrence of P. The denotation of P, denoted by T(P), is the set of the occurrences of P. |T(P)| is called the frequency of P and denoted by frq(P). For a given constant θ ∈ N, called a minimum support, a pattern P is frequent if frq(P) ≥ θ. For any patterns P and Q, T(P ∪ Q) = T(P) ∩ T(Q) holds, and if P ⊆ Q then T(Q) ⊆ T(P).
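These basic notions can be written down directly. Below is a minimal sketch with a toy database and hypothetical helper names (not the paper's implementation):

```python
# Toy transaction database: items are integers, a transaction is a set.
T = [{1, 2, 5}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def occurrences(db, P):
    """The denotation T(P): all transactions that include pattern P."""
    return [t for t in db if set(P) <= t]

def frq(db, P):
    """The frequency of P: the number of its occurrences."""
    return len(occurrences(db, P))

# T(P ∪ Q) = T(P) ∩ T(Q), and P ⊆ Q implies T(Q) ⊆ T(P):
assert occurrences(T, {2} | {3}) == [t for t in occurrences(T, {2}) if {3} <= t]
assert frq(T, {2, 5}) <= frq(T, {2})
```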
Let T be a database and P be a pattern on I. For a pair of patterns P and Q, we say P and Q are equivalent to each other if T(P) = T(Q). This relationship induces equivalence classes on patterns. A maximal pattern and a minimal pattern of an equivalence class, w.r.t. set inclusion, are called a closed pattern and a key pattern, respectively. We denote by F and C the sets of all frequent patterns and of all frequent closed patterns in T, respectively.
Given a set S ⊆ T of transactions, let I(S) = ∩_{T∈S} T be the set of items common to all transactions in S. Then, we define the closure of a pattern P in T, denoted by Clo(P), by ∩_{T∈T(P)} T. For every pair of patterns P and Q, the following properties hold (Pasquier et al. [13]).
(1) If P ⊆ Q, then Clo(P ) ⊆ Clo(Q).
(2) If T (P ) = T (Q), then Clo(P ) = Clo(Q).
(3) Clo(Clo(P )) = Clo(P ).
(4) Clo(P ) is the unique smallest closed pattern including P .
(5) A pattern P is a closed pattern if and only if Clo(P ) = P .
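As a small illustration of the closure operation and of properties (3) and (5), the intersection over T(P) can be computed directly. This is a sketch over an assumed toy database, not the paper's code:

```python
from functools import reduce

# Toy database; items are integers, transactions are sets.
T = [{1, 2, 5}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def clo(db, P):
    """Clo(P): the intersection of all occurrences of P (T(P) assumed non-empty)."""
    occ = [t for t in db if set(P) <= t]
    return reduce(lambda a, b: a & b, occ)

print(sorted(clo(T, {2})))                 # [2, 5]: every occurrence of {2} contains 5
print(clo(T, clo(T, {2})) == clo(T, {2}))  # property (3): True
print(clo(T, {2, 5}) == {2, 5})            # property (5): {2, 5} is closed, True
```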
Note that a key pattern is not necessarily the unique minimal element of an equivalence class, while the closed pattern is unique. Recall that we denote the set of frequent closed patterns by C, the set of frequent patterns by F, the set of items by I, and the size of the database by ||T||.
For a pattern P and item i ∈ I, let P(i) = P ∩ {1, . . . , i} be the subset of P consisting only of elements no greater than i, called the i-prefix of P. A pattern Q is a closure extension of a pattern P if Q = Clo(P ∪ {i}) for some i ∉ P. If Q is a closure extension of P, then Q ⊃ P and frq(Q) ≤ frq(P).
From the theorem, we can see that algorithms based on frequent pattern mining can take time exponentially larger than the number of closed patterns. Note that such patterns may appear, in part, in real-world data, because some transactions may share a common large pattern.
4 Algorithm for Enumerating Closed Patterns
We will start with the existing schemes for closed pattern enumeration.
Through these lemmas, we can see that all closed patterns can be generated by closure extensions of closed patterns. This leads to the basic version of our algorithm:

Algorithm Closure version
1. D := {⊥}
2. D′ := { Clo(P ∪ {i}) | P ∈ D, i ∈ I \ P } \ D
3. if D′ = ∅ then output D; halt
4. D := D ∪ D′; go to 2

which uses levelwise (breadth-first) search, similar to Apriori-type algorithms [1], with closure extension instead of tail extension. We describe the basic algorithm in Figure 1. Since the algorithm deals with no non-closed patterns, the computational cost depends on |C| but not on |F|. However, we still need much storage space to keep D in memory.
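For concreteness, the levelwise closure version can be sketched as follows. This is a toy transcription over the set representation used above; Clo is recomputed from the whole database each time, which is exactly what makes the basic version slow:

```python
def closed_patterns_levelwise(db, items):
    """Levelwise (breadth-first) enumeration of all closed patterns."""
    def clo(P):
        occ = [t for t in db if P <= t]
        return frozenset.intersection(*map(frozenset, occ)) if occ else frozenset(items)
    D = {clo(frozenset())}                  # the closure of the bottom pattern
    while True:
        new = {clo(P | {i}) for P in D for i in items - P} - D
        if not new:                         # no new closed pattern: halt
            return D
        D |= new

db = [{1, 2, 5}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(len(closed_patterns_levelwise(db, {1, 2, 3, 5})))   # 4 closed patterns
```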
A possible improvement is to use depth-first search instead of Apriori-style lev-
elwise search. For enumerating frequent patterns, Bayardo [5] proposed an algorithm
based on tail extension, which is an extension of a pattern P by an item larger than the
maximum item of P . Since any frequent pattern is a tail extension of another unique
frequent pattern, the algorithm enumerates all frequent patterns without duplications in
a depth-first manner, with no storage space for previously obtained frequent patterns.
This technique is efficient, but cannot directly be applied to closed pattern enumeration,
since a closed pattern is not always a tail-extension of another closed pattern.
We here propose the prefix-preserving closure extension, which satisfies that any closed pattern is an extension of another unique closed pattern, unifying ordinary closure extension and tail extension. This enables depth-first generation with no storage space.
We start with definitions. Let P be a closed pattern. The core index of P, denoted by core_i(P), is the minimum index i such that T(P(i)) = T(P). We let core_i(⊥) = 0.
Here we give the definition of ppc-extension. A pattern Q is called a prefix-preserving closure extension (ppc-extension) of P if
(i) Q = Clo(P ∪ {i}) for some i, that is, Q is obtained by first adding i to P and then taking its closure,
(ii) the item i satisfies i ∉ P and i > core_i(P), and
(iii) P(i − 1) = Q(i − 1), that is, the (i − 1)-prefix of P is preserved.
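A direct way to compute the core index from its definition can be sketched as follows (a naive illustration over the toy set representation; the paper's implementation computes it incrementally during the recursion):

```python
def core_index(db, P):
    """The minimum i with T(P(i)) = T(P); 0 for the empty pattern."""
    occ_P = [t for t in db if P <= t]
    for i in sorted(P):
        prefix = {j for j in P if j <= i}       # the i-prefix P(i)
        if [t for t in db if prefix <= t] == occ_P:
            return i
    return 0                                    # core_i(⊥) = 0

db = [{1, 2, 5}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(core_index(db, {2, 5}))      # 2: already T(P(2)) = T(P)
print(core_index(db, {2, 3, 5}))   # 3: the prefix {2} occurs more often than P
```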
Theorem 2. Let Q ≠ ⊥ be a closed pattern. Then, there is just one closed pattern P such that Q is a ppc-extension of P.
[Fig. 2 illustration omitted: the closed patterns of an example database, linked by ppc-extensions.]
Fig. 2. Example of all closed patterns and their ppc-extensions. Core indices are circled.
Proof. Since i > core_i(P), we have T(P) = T(P(i)). From Lemma 2, Clo(P(i) ∪ {i}) = Clo(P ∪ {i}) = Clo(Q), thus core_i(Q) ≤ i. Since the extension preserves the i-prefix of P, we have P(i − 1) = Q(i − 1). Thus, Clo(Q(i − 1)) = Clo(P(i − 1)) = P ≠ Q. It follows that core_i(Q) > i − 1, and we conclude core_i(Q) = i. □
Let Q be a closed pattern and P(Q) be the set of closed patterns such that Q is
their closure extension. We show that Q is a ppc-extension of a unique closed pattern
of P(Q).
Lemma 4. Let Q ≠ ⊥ be a closed pattern, i = core_i(Q), and P = Clo(Q(core_i(Q) − 1)). Then, Q is a ppc-extension of P.
Proof. Since T(P) = T(Q(core_i(Q) − 1)), we have T(P ∪ {i}) = T(Q(core_i(Q) − 1) ∪ {i}) = T(Q(core_i(Q))). This implies Q = Clo(P ∪ {i}), thus Q satisfies condition (i) of the ppc-extension. Since P = Clo(Q(core_i(Q) − 1)), core_i(P) ≤ i − 1. Thus, Q satisfies condition (ii). Since P ⊂ Q and Q(i − 1) ⊆ P, we have P(i − 1) = Q(i − 1). Thus, Q satisfies condition (iii). □
Proof of Theorem 2: From Lemma 4, there is at least one closed pattern P in P(Q) such that Q is a ppc-extension of P; let P = Clo(Q(core_i(Q) − 1)). Suppose that there is a closed pattern P′ ≠ P such that Q is a ppc-extension of P′. From Lemma 3, Q = Clo(P′ ∪ {i}). Thus, from condition (iii) of the ppc-extension, P′(i − 1) = Q(i − 1) = P(i − 1). This together with T(P) = T(P(i − 1)) implies that T(P) ⊃ T(P′). Thus, we can see T(P′(i − 1)) ≠ T(P′), and core_i(P′) ≥ i. This violates condition (ii) of the ppc-extension, and is a contradiction. □
From this theorem, we obtain our algorithm LCM, described in Figure 3, which generates the ppc-extensions of each frequent closed pattern. Since the algorithm takes O(||T(P)||) time to derive the closure of each P ∪ {i}, we obtain the following theorem.
Algorithm LCM(T: transaction database, θ: minimum support)
1. call EnumClosedPatterns(⊥);
Procedure EnumClosedPatterns(P: frequent closed pattern)
1. if P is not frequent then return;
2. output P;
3. for i := core_i(P) + 1 to |I| do
4.   Q := Clo(P ∪ {i});
5.   if P(i − 1) = Q(i − 1) then // Q is a ppc-extension of P
6.     call EnumClosedPatterns(Q);
7. end for
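The recursion can be sketched end-to-end as follows. This is a toy, unoptimized transcription under the set representation assumed above: Clo and the frequency are recomputed from the whole database at every call, whereas Section 5 replaces these with occurrence deliver and database reduction.

```python
def lcm(db, items, theta):
    """Depth-first enumeration of frequent closed patterns via ppc-extensions."""
    items = sorted(items)
    def clo(P):
        occ = [t for t in db if P <= t]
        return frozenset.intersection(*map(frozenset, occ)) if occ else frozenset(items)
    result = []
    def enum(P, core_i):
        if sum(1 for t in db if P <= t) < theta:    # prune infrequent patterns
            return
        result.append(P)
        for i in items:
            if i <= core_i or i in P:
                continue
            Q = clo(P | {i})
            if {j for j in P if j < i} == {j for j in Q if j < i}:
                enum(Q, i)                          # Q is a ppc-extension of P
    enum(clo(frozenset()), 0)                       # start from Clo(⊥)
    return result

db = [{1, 2, 5}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print([sorted(P) for P in lcm(db, {1, 2, 3, 5}, 2)])  # [[2, 5], [1, 2, 5], [2, 3, 5]]
```

Each closed pattern is reached exactly once, so no set of previously output patterns is kept.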
Theorem 3. Given a database T , the algorithm LCM enumerates all frequent closed
patterns in O(||T (P )|| × |I|) time for each pattern P with O(||T ||) memory space.
The time and space complexities of the existing algorithms [21, 15, 13] are O(||T|| × |F|) and O(||T|| + |C| × |I|), respectively. As we saw in the example in Section 3, the difference between |C| and |F|, and the difference between |C| × |I| and ||T||, can be up to exponential. Compared with our basic algorithm, the ppc-extension-based algorithm exponentially reduces the memory complexity when |C| is exponentially larger than ||T||. In practice, such exponential differences often occur (see the results in Section 6). Thus, the performance of our algorithm is possibly exponentially better than that of the existing algorithms on some instances.
The computation time of LCM described in the previous section is linear in |C|, with a factor of O(||T(P)|| × |I|) for each closed pattern P ∈ C. However, this still takes a long time if implemented in a straightforward way. In this section, we propose some techniques for speeding up the frequency counting and the closure operation. These techniques increase the practical performance of the algorithm and are incorporated into the implementations used in the experiments in Section 6, although they are independent of our main contribution. In Figure 7, we describe the details of LCM with these practical techniques.
Occurrence deliver reduces the construction time for T(P ∪ {i}), which is used for frequency counting and the closure operation. This technique is particularly efficient for sparse datasets, in which |T(P ∪ {i})| is much smaller than |T(P)| on average. In the usual way, T(P ∪ {i}) is obtained as T(P) ∩ T({i}) in O(|T(P)| + |T({i})|) time (this is known as down-project []). Thus, generating all ppc-extensions in this way needs |I| scans of T(P).
Instead, we build the denotations T(P ∪ {i}) for all i = core_i(P) + 1, . . . , |I| simultaneously, by scanning the transactions in T(P). We initialize X[i] := ∅ for all i.
[Fig. 4 illustration: transactions A = {2, 3, 4, 5, 7}, B = {1, 3, 4, 5, 6, 8}, C = {2, 3, 5, 6, 7}; scanning T({5}) delivers each transaction into the buckets X[4], . . . , X[8].]
Fig. 4. Occurrence deliver: build up the denotations by inserting each transaction into the bucket of each of its items.
For each t ∈ T(P) and each i ∈ t with i > core_i(P), we insert t into X[i]. Then, each X[i] is equal to T(P ∪ {i}). See Fig. 4 for an illustration. This correctly computes T(P ∪ {i}) for all i in O(||T(P)||) time. Table 1 shows the results of computational experiments in which the number of item accesses was counted; these numbers are closely related to the performance of frequency counting, which depends heavily on the number of item accesses.
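In code, the delivery is a single bucket-building scan. The sketch below uses the transactions A, B, C of the Fig. 4 example and assumes, for illustration, a core index of 5:

```python
from collections import defaultdict

def occurrence_deliver(occ_P, core_i):
    """One scan of T(P) builds X[i] = T(P ∪ {i}) for every i > core_i."""
    X = defaultdict(list)
    for t in occ_P:                 # each occurrence t of P ...
        for i in t:
            if i > core_i:
                X[i].append(t)      # ... is delivered to every bucket i ∈ t
    return X

A, B, C = [2, 3, 4, 5, 7], [1, 3, 4, 5, 6, 8], [2, 3, 5, 6, 7]
X = occurrence_deliver([A, B, C], 5)
names = {id(A): "A", id(B): "B", id(C): "C"}
print({i: [names[id(t)] for t in X[i]] for i in sorted(X)})
# {6: ['B', 'C'], 7: ['A', 'C'], 8: ['B']}
```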
We also use anytime database reduction for the closure operation. Suppose that we have a closed pattern P with core index i, and let the transactions t1, . . . , tk ∈ T(P) have the same i-suffix, i.e., t1 ∩ {i, . . . , |I|} = t2 ∩ {i, . . . , |I|} = · · · = tk ∩ {i, . . . , |I|}.
Lemma 5. Let j < i be an item, and let j′ be an item such that j′ > i and j′ is included in all of t1, . . . , tk. Then, if j is not included in at least one transaction of t1, . . . , tk, j ∉ Clo(Q) holds for any pattern Q including P ∪ {i}.
Proof. T(Q) includes all of t1, . . . , tk, and j is not included in at least one of them. Hence, j is not included in Clo(Q). □
[Fig. 6 illustration: transactions A = {2, 3, 4, 5, 7}, B = {1, 3, 4, 5, 6, 8}, C = {2, 3, 5, 6, 7}, D = {1, 2, 3, 4, 5, 6, 7, 8}, E = {1, 2, 3, 5, 7, 8}, F = {2, 3, 4, 5, 6, 7, 8}, forming T({5}).]
Fig. 6. Transaction A has the minimum size. The fast ppc test accesses only the circled items, while the closure operation accesses all items.
According to this lemma, we can remove j from t1, . . . , tk if j is not included in at least one of them. By removing all such items, t1, . . . , tk all become ∩_{h=1,...,k} t_h. Thus, we can merge them into one transaction, similarly to the above reduction. This reduces the number of transactions as much as the reduced database for frequency counting does, so the computation time of the closure operation is shortened drastically. The details of anytime database reduction for the closure operation are as follows.
1. Remove the transactions not including P.
2. Remove the items i such that frq(P ∪ {i}) < θ.
3. Remove the items of P.
4. Replace the transactions T1, . . . , Tk having the same i-suffix by their intersection ∩{T1, . . . , Tk}.
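The four steps above can be sketched as one function. This is a hypothetical helper over the toy set representation; the paper applies the reduction inside the recursion rather than rebuilding the database wholesale:

```python
from collections import Counter, defaultdict

def reduce_database(db, P, i, theta):
    """Steps 1-4: shrink the database for the recursion below P with core index i."""
    occ = [frozenset(t) for t in db if P <= t]             # 1. keep only T(P)
    count = Counter(j for t in occ for j in t)
    keep = {j for j, c in count.items() if c >= theta}     # 2. drop infrequent items
    occ = [(t & keep) - P for t in occ]                    # 3. drop the items of P
    groups = defaultdict(list)
    for t in occ:
        groups[frozenset(j for j in t if j >= i)].append(t)  # group equal i-suffixes
    return [frozenset.intersection(*g) for g in groups.values()]  # 4. merge

db = [{1, 2, 5}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(sorted(sorted(t) for t in reduce_database(db, {2}, 3, 2)))  # [[3, 5], [5]]
```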
In fact, the test requires an adjacency matrix (sometimes called a bitmap) representing the inclusion relation between items and transactions. The adjacency matrix requires O(|T| × |I|) memory, which is quite hard to store for large instances. Hence, we keep columns of the adjacency matrix only for transactions larger than |I|/δ, where δ is a constant. In this way, we can check whether j ∈ t in constant time if t is large, and also in short time, by scanning the items of t, if t is small. The algorithm uses O(δ × ||T||) memory, which is linear in the input size.
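The hybrid membership check can be sketched as follows. This is illustrative only: a Python set stands in for a bitmap column, and δ = 2 is an arbitrary choice, not a value from the paper:

```python
def build_columns(db, n_items, delta):
    """Keep a constant-time membership column only for large transactions."""
    return {tid: set(t) for tid, t in enumerate(db) if len(t) > n_items / delta}

def contains(db, columns, tid, j):
    if tid in columns:
        return j in columns[tid]    # O(1) for transactions larger than |I|/δ
    return j in db[tid]             # short linear scan: the transaction is small

db = [[1, 2], [1, 2, 3, 4, 5, 6], [3, 5]]
cols = build_columns(db, n_items=6, delta=2)
print(sorted(cols))                                          # [1]: only the large one
print(contains(db, cols, 1, 4), contains(db, cols, 2, 4))    # True False
```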
6 Computational Experiments
This section shows the results of computational experiments evaluating the practical performance of our algorithms on real-world and synthetic datasets. Fig. 8 lists the datasets, which were taken from the FIMI'03 site (http://fimi.cs.helsinki.fi/): retail, accidents; the IBM Almaden Quest research group website: T10I4D100K; the UCI ML repository (http://www.ics.uci.edu/~mlearn/MLRepository.html): connect, pumsb; click-stream data by Ferenc Bodon: kosarak; and KDD-CUP 2000 [11] (http://www.ecn.purdue.edu/KDDCUP/): BMS-WebView-1, BMS-POS.
To evaluate the efficiency of ppc extension and practical improvements, we imple-
mented several algorithms as follows.
[Fig. 9 omitted: running time (sec) and number of solutions (in thousands) versus minimum support on the datasets BMS-WebView1, kosarak, retail, accidents, connect, and pumsb; each panel compares freqset, straight, occ, occ+dbr, occ+ftest, #freq closed, and #freqset.]
· occ+fchk: LCM with occurrence deliver, anytime database reduction for
frequency counting, and fast ppc test
The figure also displays the numbers of frequent patterns and frequent closed patterns, written as #freqset and #freq closed. The algorithms were implemented in C and compiled with gcc 3.3.1. The experiments were run on a notebook PC with a mobile Pentium III 750MHz and 256MB of memory. Fig. 9 plots the running time with varying minimum supports for the algorithms on the eight datasets.
From Fig. 9, we can observe that LCM with practical optimizations (occ, occ+dbr, occ+fchk) outperforms the frequent pattern enumeration-based algorithm (freqset). The speed-up ratio of the ppc-extension algorithm (straight) over freqset depends entirely on the ratio of #freqset to #freq closed. This ratio is quite large for several real-world and synthetic datasets with small supports, such as BMS-WebView, retail, pumsb, and connect. On such problems, the frequent pattern based algorithm sometimes takes quite a long time while LCM terminates quickly.
Occurrence deliver performs very well on any dataset, especially on sparse datasets such as BMS-WebView, retail, and the IBM datasets. In such sparse datasets, since T({i}) is usually much larger than T(P ∪ {i}), occurrence deliver is efficient.
Anytime database reduction also decreases the computation time well, especially on dense datasets or with large supports. Only when the support is very small, close to zero, is the computation time not shortened, since few items are eliminated and few transactions become identical. In such cases, the fast ppc test performs well; however, it does not accelerate the computation much on dense datasets or with large supports.
For a detailed study of the performance of our algorithm compared with other algorithms, consult the companion paper [18] and the competition report of FIMI'03 [7]. Note that the algorithms we submitted to [18, 7] were old versions, which do not include anytime database reduction; thus they are slower than the algorithm in this paper.
7 Conclusion
We addressed the problem of enumerating all frequent closed patterns in a given trans-
action database, and proposed an efficient algorithm LCM to solve this, which uses
memory linear in the input size, i.e., the algorithm does not store the previously ob-
tained patterns in memory. The main contribution of this paper is that we proposed
prefix-preserving closure extension, which combines tail-extension of [5] and closure
operation of [13] to realize direct enumeration of closed patterns.
We recently studied frequent substructure mining from ordered and unordered trees based on a deterministic tree expansion technique called the rightmost expansion [2–4]. There have also been pioneering works on closed pattern mining in sequences and graphs [17, 19]. It would be an interesting future problem to extend the framework of prefix-preserving closure extension to such tree and graph mining.
Acknowledgment
We would like to thank Professor Ken Satoh of the National Institute of Informatics and Professor Kazuhisa Makino of Osaka University for fruitful discussions and comments on this issue. This research was supported by a group research fund of the National Institute of Informatics, Japan. We are also grateful to Professor Bart Goethals, the people supporting the FIMI'03 Workshop/Repository, and the authors who kindly made their datasets available.
References
1. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A. I. Verkamo, Fast Discovery of Association Rules, In Advances in Knowledge Discovery and Data Mining, MIT Press, 307–328, 1996.
2. T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, S. Arikawa, Efficient Substructure Discovery from Large Semi-structured Data, In Proc. SDM'02, SIAM, 2002.
3. T. Asai, H. Arimura, K. Abe, S. Kawasoe, S. Arikawa, Online Algorithms for Mining Semi-structured Data Stream, In Proc. IEEE ICDM'02, 27–34, 2002.
4. T. Asai, H. Arimura, T. Uno, S. Nakano, Discovering Frequent Substructures in Large Unordered Trees, In Proc. DS'03, 47–61, LNAI 2843, 2003.
5. R. J. Bayardo Jr., Efficiently Mining Long Patterns from Databases, In Proc. SIGMOD'98, 85–93, 1998.
6. Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, L. Lakhal, Mining Frequent Patterns with Counting Inference, SIGKDD Explr., 2(2), 66–75, Dec. 2000.
7. B. Goethals, the FIMI'03 Homepage, http://fimi.cs.helsinki.fi/, 2003.
8. E. Boros, V. Gurvich, L. Khachiyan, K. Makino, On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets, In Proc. STACS 2002, 133–141, 2002.
9. D. Burdick, M. Calimlim, J. Gehrke, MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases, In Proc. ICDE 2001, 443–452, 2001.
10. J. Han, J. Pei, Y. Yin, Mining Frequent Patterns without Candidate Generation, In Proc. SIGMOD'00, 1–12, 2000.
11. R. Kohavi, C. E. Brodley, B. Frasca, L. Mason, Z. Zheng, KDD-Cup 2000 Organizers' Report: Peeling the Onion, SIGKDD Explr., 2(2), 86–98, 2000.
12. H. Mannila, H. Toivonen, Multiple Uses of Frequent Sets and Condensed Representations, In Proc. KDD'96, 189–194, 1996.
13. N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Efficient Mining of Association Rules Using Closed Itemset Lattices, Inform. Syst., 24(1), 25–46, 1999.
14. N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering Frequent Closed Itemsets for Association Rules, In Proc. ICDT'99, 398–416, 1999.
15. J. Pei, J. Han, R. Mao, CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets, In Proc. DMKD'00, 21–30, 2000.
16. R. Rymon, Search Through Systematic Set Enumeration, In Proc. KR-92, 268–275, 1992.
17. P. Tzvetkov, X. Yan, J. Han, TSP: Mining Top-K Closed Sequential Patterns, In Proc. ICDM'03, 2003.
18. T. Uno, T. Asai, Y. Uchida, H. Arimura, LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets, In Proc. IEEE ICDM'03 Workshop FIMI'03, 2003. (Available as CEUR Workshop Proc. series, Vol. 90, http://ceur-ws.org/vol-90)
19. X. Yan, J. Han, CloseGraph: Mining Closed Frequent Graph Patterns, In Proc. KDD'03, ACM, 2003.
20. M. J. Zaki, Scalable Algorithms for Association Mining, Knowledge and Data Engineering, 12(2), 372–390, 2000.
21. M. J. Zaki, C. Hsiao, CHARM: An Efficient Algorithm for Closed Itemset Mining, In Proc. SDM'02, SIAM, 457–473, 2002.