0% found this document useful (0 votes)

16 views19 pages

An Improved Frequent Pattern Tree The Child Struct

This paper presents the Child Structured Frequent Pattern Tree (CSFP-tree), an enhanced version of the FP-tree designed to improve frequent itemset mining efficiency. The CSFP-tree incorporates a child search subtree for each node, which significantly enhances search performance compared to traditional FP-tree structures. Experimental results demonstrate that the CSFP-tree outperforms the FP-tree and its variations across various datasets.

Uploaded by

jvsjoaovictorjvs

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

16 views19 pages

An Improved Frequent Pattern Tree The Child Struct

Uploaded by

jvsjoaovictorjvs

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

Pattern Analysis and Applications (2023) 26:437–454

https://doi.org/10.1007/s10044-022-01111-1

THEORETICAL ADVANCES

An improved frequent pattern tree: the child structured frequent

pattern tree CSFP‑tree
O. Jamsheela1 · G. Raju2

Received: 13 November 2020 / Accepted: 27 August 2022 / Published online: 26 September 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022

Abstract
Frequent itemsets are itemsets that occur frequently in a dataset. Frequent itemset mining extracts specific itemsets with
supports higher than or equal to a minimum support threshold. Many mining methods have been proposed but Apriori and
FP-growth are still regarded as two prominent algorithms. The performance of the frequent itemset mining depends on many
factors; one of them is searching the nodes while constructing the tree. This paper introduces a new prefix-tree structure
called child structured frequent pattern tree (CSFP-tree), an FP-tree attached with a child search subtree to each node. The
experimental results reveal that the CSFP-tree is superior to the FP-tree and its new variations for any kind of datasets.

Keywords FP-tree · CSFP-tree · Frequent itemset mining · Data mining · CSFP-tree mining · Improved FP-tree

1 Introduction are constructed by using paths with the same prefix item.
Using the conditional FP-tree, the algorithm can generate
The frequent itemsets mining algorithm demands an efficient frequent itemsets.
data structure to store frequent itemsets for further process- Most of the recent proposals based on FP-tree, concen-
ing. FP-growth uses a prefix tree to store the frequent item- trate on the improvement of the Mining phase, whereas
sets and mines frequent itemsets without generating can- improvement in FP-tree structure is of great significance,
didate itemsets. It achieves much better performance and as a better tree structure would reduce the runtime as well
efficiency than Apriori-like algorithms. To avoid the costly as memory requirement. Hence, we explored the possibility
candidate generation, FP-growth algorithm uses a frequent of modification in the basic FP-tree structure.
pattern tree (FP-tree) with a header table. FP-growth algo- Consequently, in this paper an improved tree structure
rithm scans the database two times. After the first scan fre- called child structured frequent pattern tree (CSFP-tree) is
quent 1-itemsets are stored in the header table in decreasing proposed. In the proposed algorithm the child list of each
order of their frequencies. FP-tree is a tree-like data structure node is replaced with a child search tree (CST) to improve
constructed during the second scan. After the second scan, the searching.
the transactions in the transaction database are stored in the
FP-tree in a compressed form. The first instance of each item
in the FP-tree is linked with the corresponding item in the 2 Related work
header table. Nodes of FP-tree with similar items are con-
nected by a link. In FP-growth method FP-tree construction Frequent itemset mining finds specific itemsets with sup-
is the first step. In the second phase, frequent itemsets are ports higher than or equal to a minimum support threshold.
mined from the FP-tree. Mining starts from the least fre- Many frequent itemset mining methods have been intro-
quent item to the most frequent item. Conditional FP-trees duced by various authors, but Apriori and FP-growth are
still regarded as the favored algorithms. Apriori is the oldest
frequent itemset mining algorithm [45]. Many algorithms,
* O. Jamsheela such as DP-Apriori [9], AGM [23, 62], Parallel Apriori
[email protected] [60] and YAFIM [46], are based on Apriori algorithm. Apri-
1
EMEA College of Arts and Science, Kondotty, Kerala, India ori first generates candidate itemsets, then scans the database
2 to confirm whether the candidates are frequent or not. This
Christ University Yeshwantpur Campus, Bengaluru, India

13
Vol.:(0123456789)

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

438 Pattern Analysis and Applications (2023) 26:437–454

method scans the database as much as the maximum length in the databases of widely varying items’ frequencies can
among frequent itemsets. The apriori property is used in cause a dilemma known as the rare item problem. To solve
many recent algorithms. The AprioriDP approach requires the problem, the authors proposed a generalized model of a
only one database scan for both frequent candidate 1-itemset pattern-growth algorithm, called GCoMine [29] to discover
and 2-itemset [6]. gpuDCI algorithm [52] is the paralleliza- the itemsets. The approaches of the above-mentioned algo-
tion of DCI algorithm, a sequential algorithm for Frequent rithms are different and to reduce the time complexity they
Itemset Mining [38]. have applied different methods. Table 1 gives a summary
The parallel Apriori algorithm proposed by Bhalodiya of the prominent algorithm described above. In data min-
et al. [6] is implemented in a parallel processing structure. ing different kinds of trees like decision trees, FP-tree, etc.,
The authors have used different nodes to run the algorithm. are proposed. Improvements in the trees are also a major
The database is partitioned into small sections and each par- research area. Frequent itemset mining without any kind of
tition is assigned to different nodes. Here the result of each trees is also proposed and proved efficient [41, 49].
node should be consolidated to get a final output, so each FP-growth algorithm solved the problem of unwanted
node has to be sent its result to a central node. The authors scans with the use of FP-tree, which consists of a tree for
suggested only a revised algorithm of a modified Apriori storing the transaction in a transaction database and a header
algorithm [17]. They did not compare the algorithm with table containing frequent 1-itemsets sorted in descending
another fast mining algorithm like FP-Tree. Another algo- order of the frequency. Each node of the FP-tree contains
rithm [52] also proposed a parallel version to improve the an item name, a support count, a parent pointer, a child
Frequent Itemset Mining process. In this paper, the authors pointer, and a node-link. The node-link is a pointer that
maximize the utilization of GPU to parallelize the bitmap of connects all nodes with the same item to each other. Since
transactions. They have implemented transaction-wise par- the FP-growth algorithm was proposed, various algo-
allelization and candidate-wise parallelization. The authors rithms, such as LP-tree [45], FIUT [56], AFOPT [36],
have compared the algorithm with the sequential DCI algo- BFP-growth [1] FPgrowth* [19], FPmax* [18], Binary
rithms and proved that this algorithm is faster than the DCI. Search Header Three (BSHT) [25] and FPclose [19] were
Another improved version of Parallel Apriori is proposed by developed adopting the FP-tree structure. A survey on FP-
Qiu et al. [46] called YAFIM (Yet Another Frequent Itemset tree based mining methods is conducted and published by
Mining). This is also a parallel Apriori algorithm based on a Jamsheela and Raju [24]. The algorithm CoMine uses the
specially designed in-memory parallel computing model (on FP-tree and FP-growth method to discover the complete set
the Spark RDD framework ) to support iterative algorithms of correlated itemsets in a database [31]. A modified ver-
and interactive data mining. The transaction database is once sion of CoMine called CoMine++ also uses the FP-tree and
loaded into the Spark RDDs (Resilient Distributed Data- FP-growth method for mining. In the GCoMine algorithm
sets), the memory-based data objects in Spark, then during during the frequent itemset mining, the CP-tree is used [48].
the next iteration, the same is used. Here the authors com- GCoMine used the multiple minAllConf threshold values
pared the algorithm with MPApriori[33] and proved that the to avoid the rare item problem. Algorithms, PrePost [10],
algorithm is 25 times faster than the previous one. Another FIN [4, 11, 12] and PrePost+ ] [13] have used both the meth-
mining algorithm was proposed to find correlated itemsets. ods (FP-tree and Apriori) to improve the mining. Pyun et al.
The existing algorithms with a single minAllConf threshold [45] recommended a new tree structure called linear prefix

Table 1 Features of prominent frequent pattern mining algorithms

Algorithms authors year Data structures Concept Apriori or FP-tree Algorithms compared with

AprioriDP Bhalodiya et al. [6] Count-table Require only one database scan for Apriori Apriori
both frequent candidate 1-itemset
and 2-itemset.
gpuDCI Silvestri and Orlando [52] bitmap Parallel conversion of DCI algorithm Apriori DCI
Parallel Ye and Chiang [60] Trie Parallel implementation of a trie- Apriori Trie-base Apriori.
based APRIORI
YAFIM Qiu et al. [46] Hash tree Parallel Apriori algorithm based on Apriori MRApriori.
the Spark RDD framework
GCoMine Rage and Kitsuregawa [48] CP-tree Used multiple minAllConf] to avoid FP-tree CoMine
the rare item problem.
CoMine++ Kiran and Kitsuregawa CP-tree Introduce items’ support intervals to FP-tree CoMine
[29] combine items

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Pattern Analysis and Applications (2023) 26:437–454 439

tree (LP-Tree) to implement an outstanding frequent item- a comparable difference in the number of generated itemsets
set mining technique. An LP-tree is constructed by using and association rules. An improved LBP operator based on
arrays to minimize pointers between nodes. Tsay et al. [56] FP-growth is suggested by Long et al. [37]. The authors have
suggested a novel method, the frequent items ultrametric applied the modified algorithm on a face database for face
trees (FIUT) to enhance the efficiency in obtaining frequent recognition. A full compression frequent pattern tree (FCFP-
itemsets. Tseng et al. [57] introduced an adaptive mecha- Tree) is proposed by Sun et al. [55] to solve the problem of
nism to find a suitable data structure among two pattern large size and rapidly expanding datasets, faced in mining
list structures for mining frequent itemsets. The frequent algorithms. Here the authors have mentioned that to achieve
pattern list (FPL) for sparse databases and the transaction the goal a compromise in memory use should be there. A
pattern list (TPL) for dense databases are the two struc- two-dimensional table is added in another FP-tree-based
tures. Database density is the selection criteria. They sug- algorithm to improve the efficiency of the weighted frequent
gested a method to calculate the database density. Lin et al. itemsets mining proposed by Li and Yin [34]. A modified
[35] proposed an improved frequent pattern (IFP) growth conditional FP-tree (MCFP-tree) and a modified FP-growth
method with the new tree structure to improve the perfor- (MFP-Growth) algorithm are proposed by Ahmed and Nath
mance of mining. IFP-growth needs additional memory to [3] to avoid the creation of conditional FP-trees during the
hold an address table attached to each node. The address frequent itemsets generation from FP-Tree. Many algorithms
table contains the item name and pointer to its child. IFP have been proposed as an improved FP-tree algorithm such
growth does not reduce the size of the tree since it still uses as Caroro et al. [8], Ahmed and Nath [3], Yang et al. [59]
the original FP-tree-based structures with an additional and Zhang et al. [63] . The experimental results show that
address table. Borgelt et al. [7] suggested a new data struc- fast algorithms consume more memory. A detailed study and
ture to find frequent itemsets stating that their algorithm is analysis of the recent FP-tree-based proposals revealed that
the simplest. Racz et al. [47] used arrays to store the nodes more efficient data structures can improve the runtime and
and suggested an alternate method to find frequent item- memory usage of the mining algorithm. The above discussed
sets without rebuilding conditional FP-trees. The recursive algorithms did not consider the searching time during the
mining process is replaced by building new tree structures, tree construction. This leads to the development of a novel
which avoids rebuilding each conditional pattern base. Deng tree structure based on FP-tree.
et al. [10] proposed N-lists and PPC-tree (PrePost-tree) to
find frequent itemsets. In this method, each node contains
its pre-order and post-order sequence numbers. Deng et al.
[12] proposed another method (FIN) by using Nodeset, a 3 CSFP‑tree: child structured frequent
more efficient data structure, for mining frequent itemsets. pattern tree
FIN applied the pruning method suggested by Rymon [50] to
reduce the search space. In PrePost two properties have been The Child Structured Frequent Pattern Tree is a modi-
used but FIN requires only the pre-order (or post-order code) fied frequent pattern tree which contains child sub-trees
of each node. Deng et al. proposed another more efficient with each node. The child sub-trees are a kind of child
method, PrePost+([13]). The authors used the same prun- search tree formed with the children of each node. Details
ing method of FIN and node structure of PrePost to improve of the CSFP-tree construction procedure and related tech-
the performance. A modified FP-tree is used for ontology niques are presented in this section. Table 2 shows the
learning with applications in education [51]. Authors have details of the variables used in the article.
used a regular expression parser approach, deterministic
finite automata (DFA) here for concept extraction. The
same authors have introduced a more efficient method to
frequent itemset mining [11]. Aryabarzan et al. proposed
another efficient data structure called NegNodeset, sets of Table 2 List of variables used
nodes in a prefix tree. Another modification of FP-tree is
Variable name Description
used in the medical data environment [15]. The procedure
is the same as FP-tree, but the authors have removed the TID Transaction id
infrequent items from the transaction by applying a new DB Transaction database
database scan. In another research paper [8], to enhance the dcp Direct child pointer
frequent itemset generation the authors applied to the FP- lsp Left sibling pointer
growth algorithm a modified anti-monotone support con- rsp Right sibling pointer
straint. The modification is applied after constructing the pp Parent pointer
FP-tree. Authors proved that the mining process resulted to

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

440 Pattern Analysis and Applications (2023) 26:437–454

3.1 Background Table 3 Transactional database TID Transactions

FP-tree is an efficient tree structure to store frequent item- 1 C,F

sets. The performance of the mining process with FP-tree 2 B,C,D,
depends on various factors such as the tree construction, 3 A,B,E
searching among the children and traversal. Let k be the 4 F,G
number of items in itemset I. The complexity to add a 5 A,H
transaction T with j items { i 1 , i 2 , . . . , i j } into an FP-tree
can be derived as follows:-
Let the support descending order be i1 ⟶ i 2 ⟶, . . . Table 4 Items with frequency Item Frequency
, i j and the depth of the root node in the FP-tree be 0. In count
the FP-tree, starting at depth 0, FP-growth must check A 2
whether i 1 has existed at depth 1, k times for the worst B 2
case. Similarly, it must verify the existences of i 2 and i 3 C 2
in depths 2 and 3, respectively, (k-1) and (k-2) times for D 1
the worst case. Therefore, the complexity of constructing E 1
a new path in the worst case scenario is (k + (k - 1) + ...+ F 1
(k - (j- 1))) [35]. G 1
The complexity C to construct a path P is as follows H 1

n
∑
(1)
( )
C(P) = k−i
i=0 applied in various algorithms and proved faster than linear
search [21, 39, 40, 53]. Binary tree has also been proved
where, C(P): Complexity to construct path P. k : number of as an efficient tree for searching [32]
items in itemset I. j: number of items in transaction t. n: j − 1 A binary search tree (BST), is a tree like data structure
Each node of the tree has a set of children. In the root where each node has no more than two child nodes. The left
node, each child represents a separate branch of the tree by sub-tree contains only nodes with keys less than the par-
holding unique items. When a new transaction is inserted ent node; the right sub-tree contains only nodes with keys
into the tree, the children of the root have to be searched greater than the parent node. The main advantage of a binary
for the existence of the first item of the transaction. If the search tree is that it remains ordered, which provides a faster
item is found among the children of the root, the frequency search than many other data structures [22].
count of that item is increased by one, otherwise a new child By analyzing the insertion procedure of FP-tree, we can
is added to root and form a new branch with the remaining find that the insertion of each item leads a search among the
items. When a new node is added to the root, the number children to find the exact location. The CSFP-tree is intro-
of child pointers in the root is increased. In worst case, the duced to enable fast searching among the child list of the
children of root include all items. nodes. The concept of a binary search tree is used to con-
FP-tree and most of its recent improvements use a lin- struct the child search trees (CST) of each node.
ear data structure to store the children of each node (Pre- The concept can be illustrated by using an exam-
Post [10], FIN [12], PrePost+ [13]). Only a dynamic data ple. Table 2 lists the variables and their descriptions. Table 3
structure is appropriate to store the children because the is a simple transnational database with 5 transactions. The
number of children cannot be predicted earlier and new minimum support is fixed as one and Table 4 contains fre-
items have to be added dynamically. Dynamic linear data quent items with frequencies of each. Figure 1 is the FP-tree
structures like Linked List prefer a linear search to a binary with the structure of normal FP-tree which is constructed
search. Linear search is the simplest search algorithm. In with the 5 transactions in Table 3. The FP-tree in Fig. 2 is
linear search the first item is compared first, then the sec- also the same FP-tree which is displayed to show its search
ond item, and so on until you find the target item or reach path among the children. Figure 3 shows the CSFP-tree of
the end of the list. If the list grows in size, the number of the 5 transactions in Table 3. The CSFP tree construction
comparisons required to find a target item in both worst steps are illustrated in detail with an example in Sect. 3.4. In
and average cases grows linearly. In FP-tree the number Figs. 2 and 3, the direct children of the root are highlighted
of child pointers are added during the construction pro- with shaded circles and those without shades are normal
cess and hence the comparison becomes time consuming. nodes. The root node is the node with symbol R.
Binary search is faster than linear search in average case
and worst case [5, 28, 39, 43, 53, 54]. Binary search is also

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Pattern Analysis and Applications (2023) 26:437–454 441

– If a new transaction ti:’I J’ needs to be inserted into the

FP-tree, each child ie; A B C and F have to be searched
to find the exact position of item I . Instead, in CSFP-tree
only two nodes C and F need to be searched for that.
Thus the following points can be derived from Figs. 2
and 3:
– To add a new transaction ti to an FP-tree, in the worst
case, all the children of root node (highlighted nodes)
have to be searched to find a perfect match of the first
item of each ti. But in the case of CSFP-tree to add a new
transaction, a binary search can be applied.
– In Fig. 3, children of all nodes of CSFP-tree are arranged
as a binary search tree structure. The children of node A,
ie;B H which are highlighted in the figure are also formed
Fig. 1 FP-tree constructed from DB in Table 3
as a binary tree structure.

3.2 The child search tree

Each node of the CSFP-tree has zero or more children. The

children of each node are arranged as a subtree called child
search tree (CST). The CST is constructed according to the
rule of the Binary Search Tree. A CST is created for each
node with more than one child. All branches of the tree,
except one, start from the nodes of the CST of the root node.

3.3 Structure of the CSFP‑tree

Fig. 2 FP-tree constructed from DB in Table 3 with the search path

CSFP-tree is an improved FP-tree and contains a collection

of CST. Each node of the CSFP-tree contains item name,
support count, left pointer, right pointer, child pointer,
parent pointer and link to similar items.

3.3.1 Node structure

The structure of the proposed CSFP-tree is as follows. A

node in the CSFP-tree has four link fields in addition to the
data field, points to its direct child(dcp), left sibling(lsp),
right sibling(rsp) and parent(pp). Accordingly, a node can
have at most one direct child, one left-sibling and one
right-sibling.
A parent node is linked directly to its first child (direct
child) but no parent-to-child link to other children, which
Fig. 3 CSFP-tree constructed from DB in Table 3 are linked through the direct child using sibling links. But
all children of a node A have their parent link points to the
– The main differences between the trees can be pointed node A. Let X be a node and Y its first (direct) child. Then,
out as follows; the remaining children of X, Z1, Z2.....Z6, are linked through
– The nodes, A B C and F, are the direct children of the sibling chain as illustrated in Fig. 4a. The sibling chain is
root node. The FP-trees in Figs. 1 and 2 are arranged in formed based on the structuring of the binary search tree
such a way that the nodes A B C and F are connected as a enabling fast search among the children. The item name
leaner structure. In Fig. 3, CSFP-tree, the specified nodes is used as the search key. A direct child and the siblings
are arranged as a binary search tree. can have a direct child. The direct child is added in the

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

442 Pattern Analysis and Applications (2023) 26:437–454

descending order of the support. All the nodes have a par- Table 5 The transaction database
ent link point to its respective parent. TID Transactions Sorted transactions after
removing infrequent
3.4 Tree construction items

1 B,D,F F,B,D
The tree is created with the frequent 1-itemsets. The data-
2 B,C,J, P, Q C, B, J
base is scanned for the first time to find the frequent 1-item-
3 A,B,G,H B, A, H
sets as in the case of FP-tree. Thereafter, the header table
4 B,C,D,E,G C,B,D, E
is created with the frequent 1-itemsets sorted in descending
5 C,D,E,F,J C,F, D, E, J
order of the support. Next step is the construction of the
6 B,C,F,J, R, T C,F, B, J
CSFP-tree.
7 A, D, E, S D, E, A
In order to construct the CSFP-tree a second scan of
8 C,F,L, U C,F,L
the database is carried out. Take the first transaction,
9 D,F,H,I, S F,D, H
after removing the infrequent items sort the transaction in
10 C,F, R C,F
descending order of the support. Let X be the first item of
11 K, L,M K, L, M
the transaction. X is the first child of the root and hence
12 A,K,M, Q A,K,M
becomes the direct child of the root. The remaining items
13 E,K, T E, K
of the transaction are formed as a branch of direct children
14 H,L,U H, L
from X. To insert the second transaction compare the first
15 M,N, O M
item with X. If the item is smaller than (or greater than) X,
a new node is created and added as the left sibling (or right
sibling) of node X. If the two items are same, then increase
the support count of node X and compare the next item of Table 6 List of frequent items Item Frequency
the transaction with the direct child of X and so on. The
insertion of the remaining transactions is as follows. C 6
Remove the infrequent items and sort the items of each F 6
transaction. Compare the first item I 1 of each transaction B 5
with ‘X,’ the direct child of the root. If I 1 matches with ‘X,’ D 5
increment the support count of ‘X’ and compare next item E 4
I 2 of the transaction with direct child of ‘X.’ This process A 3
is continued (match, increment, move to direct child) till a H 3
mismatch between an item I n and a node ‘Y’ is found. Then J 3
if I n < Y , continue the process with the left sibling of ‘Y.’ K 3
Otherwise, continue with the right sibling of ‘Y.’ L 3
If match with a node ‘Y’ is successful and Y has no M 3
direct child, then the remaining items, if any, are formed as
a branch of direct children from ‘Y.’
If the search for a match of an item I j ends in a NULL The details are explained with an example in the following
(ie when I j < Y and left sibling of ‘Y’ is NULL or I j > Y section.
and right sibling of Y is NULL), then insert I j as a sibling, The CSFP-tree construction is illustrated with the data
Z (left sibling if I j < Y or right sibling if I j > Y ) of ‘Y.’ The given in Tables 5. The minimum support is set as 3. Table 6
remaining items, if any, are formed as a branch of direct contains the list of frequent items. Column 3 of the table
children from ‘Z.’ contains the sorted transactions of column 2 after removing
After inserting all the transactions, the children of each the infrequent items. ‘F B D’ is the first transaction to be
node form a sub-tree (CST) and the direct child of each node inserted first. ‘F’ is inserted as the direct child of root and
become the root of the CST. The search of an item among the remaining items ‘B’ and ‘D’ are formed as a branch of
the children is very efficient with the CST. The sub-tree is the direct child from F. ‘F’ became the root of the CST con-
called a child search tree because the sub-tree is created by structed by the children of the root. B is the direct child of F
using the children of each node. and the root of the CST of the children of node F. Figure 4b
The CST is a tree structure based on the order property shows the CSFP-tree after inserting the first transaction. To
of the binary search tree where the children of each node insert the second transaction ‘C B J,’ search C on the CST of
are added according to the rule of the binary search tree. the root. ‘F’ is the direct child of root and ‘F’ has no siblings.
Therefore, create a new node with C and compare C with

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Pattern Analysis and Applications (2023) 26:437–454 443

Fig. 4 a Node structure of

CSFP-tree. b CSFP-tree after
inserting transaction-1. c CSFP-
tree after inserting transaction-3

F. Insert C as the left sibling of F because C is less than F. Figure 5 is a CSFP-tree after inserting all the trans-
Set the parent of C with the root node. Add a new branch actions in column 3 of Table 5. All the bold ovals with
from node C with ‘B and J.’ Next transaction to be inserted items F, C, B, A, D, E, K, H and M are the children
is ‘B A H.’ Compare B with the direct child ‘F’ of the root. of the root node. These children form a CST structure.
B is less than F. Move to the left sibling of F. C is the left The parent pointer of each node is linked to its respective
sibling of F. Compare B with C. Add B as the left sibling of parent. This criterion is used because each path has to
C because B is less than C. Now root has three children ie. F be taken separately during the mining process. The other
C and B. F is the direct child of root and set as the root of the oval shaped nodes are also part of a CST but they are the
CST of the root. C and B are siblings of F and added to the children of some internal nodes. The internal node F with
left subtree of F. Figure 4c shows the CSFP-tree after insert- support 4 has 3 children D, B and L. The CST of F is
ing the 3rd transaction. Insert all other transactions according formed with D, B, L. D is the root of the CST. The thick
to the above mentioned criteria. dashed lines denote left and right siblings. The other nodes

Fig. 5 CSFP-tree after inserting

all transactions in Table 5

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

444 Pattern Analysis and Applications (2023) 26:437–454

are normal nodes representing transactions. For the sake as null. lines 4 to 25 are used to create the CSFP-tree. Two
of clarity, the links which connect similar items are not procedures are used here. The procedure ‘childsearch’ is
included in the diagram. used to search the specific item in the partially constructed
CSFP-tree.
The procedure childsearch (Fig. 7) accepts as its param-
4 Frequent itemset mining with CSFP‑tree eter, the current child of the root ie; temp and the item ie; I
to be searched in the child BST. The transaction is already
Frequent itemset mining with CSFP-tree involves two arranged in the order of the frequencies of each item. There-
main algorithms and two sub algorithms. Details of the fore, the BST is formed according to the item name only
algorithms are presented in this section. and not according to the frequency. If the I is less than the
current child temp, it will be moved to the left branch of the
current node and vice versa.
4.1 CSFP‑tree construction algorithm If the search is successful, the procedure returns the node
with the searched item. Otherwise, create a new node nd
Algorithm-1 in Fig. 6 is used to construct the CSFP-tree. with the item, add nd in proper location and return nd. A
Step-1 is used to find all the frequent 1-itemsets. Next step flag is set to know whether the returned node is an exist-
entails the creation of the header list. 3 rd step set the root ing node or a newly created node. To create new branches,
the procedure ‘createBranch’ is invoked (Fig. 6) by sending
the processed transaction as an argument. The output of the
algorithm is the CSFP-tree.

4.2 CSFP‑growth algorithm

The algorithm CSFP-growth mines the frequent itemsets

by traversing the CSFP-tree. Figure 8 presents the growth
algorithm. The mining process is the same as the FP-growth.
The mining starts from the least frequent item X. From the
CSFP-tree find all paths which contain the current item X.
Starting from the current node X retrieve all items from the
path by traversing upward to the root. Set the support of each
retrieved itemsets with the support of X in that particular
path. If there is only one path and satisfies the minimum

Fig. 6 Algorithm-1 to construct CSFP-tree Fig. 7 Procedure to search child

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Pattern Analysis and Applications (2023) 26:437–454 445

complexity of the algorithm here is the same as the FP-

tree. In algorithm 1, steps one and two are included for
the above mentioned steps. The time complexity of the
first step is O(nm). Here all transitions should be pro-
cessed once.

n:number of transactions in DB
m:number of items in each transaction

2 The second step is to construct the CSFPTree. Here we

have implemented a binary tree structure for the children
of each node. To insert a transaction to the FP-tree the
Fig. 8 Algorithm-2 to print frequent itemsets time complexity is depended on the number of frequent
items [42]. To insert an item to the FP-tree, a linear
search is performed to check whether the item is present
support criteria, then the combinations of items in the path in a child list. The number of children of the root, in the
are printed as frequent itemsets. worst case, is the total number of frequent items in DB.
The process is as follows: From the CSFP-tree find all
The number of children, chn, in worst case, of each
paths which include the current item X. If there is only one
node, nd is
path and the support of X in that particular path is greater
than or equal to the minimum support, then the combina- chn(nd) = nf − (p(nd)) (2)
tions of items in the path are retrieved as separate item-
sets T i . Add X with each T i and print as frequent itemsets where nf=number of frequent items, p(nd))=position of
with support of X. Otherwise, starting from current node X, the item in the frequent item table. When a linear search
retrieve the itemsets from each path by traversing upward to is performed the time complexity to insert a transaction
the root. Set the support of each retrieved itemsets with the to the FP-tree is
support of X in that particular path. Call the first algorithm
to construct conditional CSFP-tree of X by considering each O(m*k)
itemset as a separate transaction with support count. Then m=number of frequent items in transaction
all the procedures are recursively called. Next, call the 2 nd k=nf-p(nd)
algorithm to find the frequent itemsets of the least frequent
item in the conditional CSFP-tree of X. If more than one To construct the complete FP-tree the time complex-
path exists in the conditional CSFP-tree, the procedures are ity is
again called recursively. O(n*m*k) (3)
In CSFP-tree a binary search is conducting. The time
5 Time complexity analysis complexity of binary search is

The real challenge, while introducing a new algorithm as an 1 Best case O(1)
improvement of an existing one, is to reduce computational 2 Average case O(log n))
complexity such as space and time complexities. The impor- 3 worst case O(log n) [54]
tant matter to be considered is how the time and space com-
plexity can be addressed. The time complexity of FP-Tree By using the child list as a binary tree, the time com-
is discussed in detail in Kadappa and Nagesh [27], Yin et al. plexity for inserting a transaction to the tree is reduced
[61], WEN-YUAN and LIU [58], Jia and Liu[26], Agapito to,0(m*log n) To construct the CSFP-tree the time com-
et al.[2]. The overall time complexity of the FP-tree is ana- plexity is
lyzed in the above cited papers. The time complexity of FP-
O(n*m*log n) (4)
tree creation is specified in Kosters et al. [30] The CSFP-tree
algorithm has three main steps as mentioned below:- 3 The last step is the mining of frequent itemsets from the
CSFP-tree. The time complexity of this phase is same
1 During the first scan, it extracts all the frequent items as FP-Tree.
from the database and sorts the frequent items in
descending order of the frequency. The time and space

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

446 Pattern Analysis and Applications (2023) 26:437–454

The space complexity of CSFP-tree is the same as FP-Tree but the memory usage of PrePost+ and FIN algorithms are
because the node structure and the number of nodes in the very high while comparing with H-mine and FP-growth. The
tree are the same as FP-tree. The advantage of the CSFP- most efficient algorithm among these seven is the NegFIN
tree is that without increasing memory, it can improve the algorithm, but it uses more memory than other FP-Growth
running time of the algorithm. ans CSFP-tree.
Seven datasets are used in the experiments. The details
of the datasets are shown in Table 7. These datasets are
publicly available in the FIMI repository (http://fimi.ua.ac.
6 Performance evaluation be) [16] and all are real datasets except T10I4D100K. The
Mushroom, Connect and Pumsb are dense datasets. Chess
In this section, the experimental results are presented. The and Retail are sparse datasets. To evaluate the runtime and
proposed algorithm is compared with other seven algorithms memory usage of the proposed algorithm, the datasets have
-FP-growth [20], H-Mine [44], FIN [12], PrePost [10], been used as their original form without losing any data.
PrePost+[13], DFIN[11] and NegFIN [4]. The proposed During the run time test, a huge number of frequent item-
method is implemented in java and the platform is Intel sets could be generated from each dataset. These outputs
CPU-3.3 GHz with 4GB RAM and Windows 7 32bit OS. are too large to compare manually. Hence, to evaluate the
Implementations of all other algorithms are taken from the accuracy of the new algorithm, a subset of data from each
SPMF website (http://www.philippe-fournier-viger.com/ datasets are used with a convenient minimum support and
spmf/index.php?link=license.php) [14]. FP-growth algo- compared the output of every datasets with the output of
rithm is chosen as the baseline algorithm. PrePost+ has been other algorithms. It is observed that the frequent itemsets
proven as the best algorithm among all node-based meth- generated from the proposed algorithm are the same as the
ods. H-mine is included because of its efficiency in memory frequent itemsets generated by the other algorithms.
usage. FIN, PrePost+, DFIN, NegFIN are recent algorithms Figures 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
for finding frequent itemsets. The run time of PrePost+ and and 22 show the experimental results. With all the datasets
FIN are lesser than the run time of H-mine and FP-growth the runtime of the proposed algorithms is better than FP-
growth and H-mine. In the memory usage evaluation, the
proposed algorithm uses less memory than the other algo-
Table 7 Datasets
rithms except FP-growth. Fig. 9 and 10 show the run time
Datasets Transactions Items and memory consumption of all algorithms on the dataset
‘connect.’ CSFP-growth performed well with the lowest
Accidents 340183 468
minimum supports but NegFIN and DFIN are the fastest
Retail 88162 16470
algorithm among the other algorithms. Algorithm H-mine
Connect 67557 129
could not be included in the result because it takes more time
Pumsb 49046 7116
with dataset ‘connect.’
Chess 3196 84
FP-growth consumes less memory than all others. The
Mushroom 8124 119
memory consumption of CSFP-growth is near to the FP-
T10I4D100K 98487 949
growth. PrePost drastically increases its memory usage when

Fig. 9 Running time on the con-

nect dataset

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Pattern Analysis and Applications (2023) 26:437–454 447

Fig. 10 Memory consumption

on the connect dataset

Fig. 11 Running time on the

mushroom dataset

Fig. 12 Memory consumption

on the mushroom dataset

minimum support crosses 80%. PrePost+ consumes more The results on the dataset ‘Mushroom’ are shown in
memory than FP-growth and CSFP-growth. The graph rep- Figs. 11 and 12. Here the run time of all algorithms except
resenting FIN is a consistent line with all minimum support H-Mine are the same up to 10% minimum support. With
values and consumes more memory than others with the the lowest minimum supports, the proposed CSFP-growth
highest minimum support. and NegFIN are the fastest algorithm. CSFP-growth and

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

448 Pattern Analysis and Applications (2023) 26:437–454

Fig. 13 Running time on the

pumsb dataset

Fig. 14 Memory consumption

on the pumsb dataset

Fig. 15 Running time on the

chess dataset

FP-growth consume lesser memory than other algorithms memory usage of H-mine is lesser than PrePost, FIN and
for all minimum supports. With 2% minimum support, PrePost+.
CSFP-growth consumes less memory than FP-growth. The For dataset ‘Pumsb,’ PrePost+, PrePost, FIN, DFIN
and NegFIN have performed almost the same, though

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Pattern Analysis and Applications (2023) 26:437–454 449

Fig. 16 Memory consumption

on the chess dataset

Fig. 17 Running time on the

retail dataset

Fig. 18 Memory consumption

on the retail dataset

PrePost+ and PrePost run a little bit faster than FIN. The memory than the proposed CSFP-growth algorithm. Pre-
performance of CSFP-growth with dataset Pumsb is not Post is one of the fastest algorithms with ‘Pumsb’ but con-
good as with other datasets but performs better than FP- sumes more memory than the other algorithms when the
growth. Recent algorithms (FIN, PrePost, FIN, NegFIN) minimum support value reduces. The results are given in
perform well with dataset ‘Pumsb,’ but consume more Figs. 13, and 14.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

450 Pattern Analysis and Applications (2023) 26:437–454

Fig. 19 Running time on the

accidents dataset

Fig. 20 Memory consumption

on the accidents dataset

Fig. 21 Running time on the

T10I4D100K dataset

Figures 15 and 16 show the runtime and memory usage In memory usage evaluation on dataset ‘chess,’ PrePost
of the algorithms on the dataset ‘Chess.’ Although all the consumes more memory than other algorithms. FP-growth
algorithms except H-mine performed well, CSFP-growth consumes less memory than the other algorithms. The
and FP-growth consumes less memory. H-mine is the slow- memory consumption of CSFP-growth is lesser than other
est algorithm with each and every minimum support value.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Pattern Analysis and Applications (2023) 26:437–454 451

Fig. 22 Memory consumption

on the T10I4D100K dataset

algorithms except for FP-growth. The algorithm DFIN con- The remaining two datasets, Retail and Chess, are sparse
sumes more memory than the algorithm NegFIN. datasets. The time complexity of the algorithms with dataset
Figures 17 and 18 show the experimental results on the Chess is almost the same except H-mine and FP-growth,
dataset ‘Retail.’ The variations in the results are not sig- but when analyzing the memory usage of the algorithms
nificant, but NegFIN and DFIN are a little faster than other with the same dataset, all algorithms consume more memory
algorithms. PrePost and PrePost+ consume more memory except the proposed algorithm and FP-growth. The retail
than the other 3 algorithms. The memory usage of the CSFP- dataset contains the highest number of transactions and the
growth algorithm is near to FP-growth which consumes less highest number of items among the 5 datasets. The runtime
memory than others. of CSFP-growth is the same as the other recent algorithms,
For dataset ‘Accidents,’ CSFP-growth, PrePost+, Pre- PrePost and PrePost+, but the memory usage of the CSFP-
Post, FIN, DFIN and NegFIN have performed almost the growth is less than 1/4 of the memory usage of the recent
same. Recent algorithms (FIN, PrePost, FIN, NegFIN, etc.) algorithms.
perform well with dataset ‘Accidents,’ but consume more The recent algorithms NegFIN and DFIN outperform
memory than the proposed CSFP-growth algorithm. The with the datasets connect and retail but consumes more
results are given in Figs. 19 and 20. memory than others. CSFP-growth performs well with data-
Figures 21 and 22 show the experimental results on the sets T10I4D100K, pumsb, mushroom and accidents and also
dataset ‘T10I4D100K.’ The variations in the results are uses less memory than others.
not significant, but NegFIN and DFIN and FIN are a little The number of items in the dense dataset Pumsb is 7116
slower than other algorithms. NegFIN and DFIN consume and the number of the transactions in the sparse dataset
more memory than the other algorithms. The memory usage Retail is 16470. The number of transactions in Retail is
of the CSFP-growth algorithm is near to FP-growth which double the number of transactions in Pumsb. CSFP-growth
consumes less memory than others. algorithms perform better in Retail but not in Connect. One
Three datasets, Connect, Pumsb and Mushroom, are reason is that the Pumsb is a dense dataset. Another reason
dense datasets. The runtime results of Mushroom and Con- may be the tree structure. Here the CST is not a balanced
nect show that the proposed algorithm is performed better tree. The tree may be formed as a skewed tree. In such situ-
than other algorithms even with the huge number of trans- ations, the searching time is the same as linear search.
actions but with a fewer number of items. The performance In FP-tree and its new variants, tree construction is a
of the other algorithms outperforms when both the number continuous procedure. The main FP-tree is constructed first,
of items and number of the transactions are high. It can be then during the mining process, the conditional FP-trees are
analyzed from the runtime result of Pumsb. Among the three recursively constructed. The CST creation using the child
datasets, Connect contains the highest number of transac- nodes is not an overhead during the tree construction. Dur-
tions. If we analyze the runtime results of the three datasets ing the insertion of each item, child list of each level has to
we can conclude that the proposed algorithm is better when be searched to check whether the item is present or not. Nor-
the number of items is less and the performance will not be mally a linear search is carried out. The CST is constructed
affected when the number of transactions is increased. to implement the binary search and hence the search time
is reduced.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

452 Pattern Analysis and Applications (2023) 26:437–454

The memory consumption of the proposed CSFP-tree to get a balanced CSFP-tree. Therefore, the algorithm would
is the same as FP-tree. In FP-tree and other FP-tree vari- perform better with any kind of dataset, but transaction sort-
ants, each node should be pointed from its parent. Let x is a ing is a time-consuming process. As future work, a new
node with 4 children z1, z2, z3 and z4. Then x contains four transaction sorting technique with lesser time can be intro-
pointers and each pointer points to each child of x. But in duced and in the future, the CSFP-tree algorithm will have
the proposed algorithm each parent points to only one child to be extended with a balanced BST to get better results.
node. Hence, in CSFP-tree x has only one child pointer
which points to the first child z1. z1 contains maximum of
2 sibling pointers to hold z2 and z3. z4 is pointed from z2 Funding No funding is available.
or z3. In the FP-tree structure, the four pointers are stored
Availability of data and material Publicly available data is used.
in x but in CSFP-tree the pointers are arranged in different
nodes. There is no increase in the number of pointers but Code availability Not applicable.
only rearranged. In the proposed algorithm no additional
nodes are created but the existing nodes are re-positioned to Declarations
form a CSFP-tree structure. Therefore no extra memory is
used to construct the proposed CSFP-tree. Conflicts of interest No conflict of interest.

7 Conclusion
References
Association rule mining is the task of finding out interesting
1. Adnan M, Alhajj R (2011) A bounded and adaptive memory-
rules from the databases which would help in many areas based approach to mine frequent patterns from very large
of its applications in different ways. Many improvements databases. IEEE Trans Syst Man Cybernet Part B Cybernet
have been introduced so far in this area. However, the execu- 41(1):154–172
tion time increases significantly with an increase in memory 2. Agapito G, Guzzi PH, Cannataro M (2018) Parallel and distrib-
uted association rule mining in life science: A novel parallel algo-
usage. The improved algorithms have proved its efficiency rithm to mine genomics data. Information Sciences
only in runtime by compromising high memory usage. For 3. Ahmed SA, Nath B (2019) Modified fp-growth: an efficient fre-
huge datasets, optimizing the memory usage is quite impor- quent pattern mining approach from fp-tree. In: International
tant. In this sense, many of the proposed algorithms are not conference on pattern recognition and machine intelligence, pp
47–55. Springer
efficient. A new data structure named CSFP-tree, which is 4. Aryabarzan N, Minaei-Bidgoli B, Teshnehlab M (2018) negfin: an
more efficient than FP-Tree is introduced. The CSFP-tree is efficient algorithm for fast mining frequent itemsets. Expert Syst
used in the new algorithm named CSFT-growth for mining Appl 105:129–143
complete frequent patterns. The modified prefix tree struc- 5. Bae S (2019) Searching and sorting. In: JavaScript data structures
and algorithms, pp 125–149. Springer
ture, CSFP-tree is proposed to speed up frequent itemset 6. Bhalodiya D, Patel K, Patel C (2013) An efficient way to find
mining problems without using extra memory. The order frequent pattern with dynamic programming approach. In: 2013
property of the child search trees is based on the concept of Nirma university international conference on engineering (NUi-
binary search. Hence the structure of the CSFP-tree enables CONE), pp 1–5. IEEE
7. Borgelt C (2010) Simple algorithms for frequent item set mining.
faster find/insert operations. This is achieved without adding In: Advances in machine learning II, pp 351–369. Springer
any extra node in comparison to FP-tree. Experiments car- 8. Caroro RA, Sison AM, Medina RP (2019) Modified anti-mono-
ried out with the different standard datasets established the tone support pruning on fp tree for improved frequent pattern
efficacy of the proposed algorithm in comparison to FP-tree generation. In: Proceedings of the 2nd international conference on
software engineering and information management, pp 138–142
and its new variants. 9. Cheng X, Su S, Xu S, Li Z (2015) Dp-apriori: a differentially
The problem with BST is that, depending on the order private frequent itemset mining algorithm based on transaction
of inserting elements in the tree, the shape of the tree may splitting. Comput Secur 50:74–90
vary. In the worst cases, the tree will look like a linked list 10. Deng Z, Wang Z, Jiang J (2012) A new algorithm for fast mining
frequent item sets using n-lists. Sci China Inf Sci 55(9):2008–2030
in which each node will have only the right child. Therefore, 11. Deng Z-H (2016) Diffnodesets: an efficient structure for fast min-
with a few datasets the proposed CSFP-tree is not performed ing frequent itemsets. Appl Soft Comput 41:214–223
well. As future work, extra sorting can be applied to the 12. Deng Z-H, Lv S-L (2014) Fast mining frequent itemsets using
transaction database. After filtering and sorting the items nodesets. Expert Syst Appl 41(10):4505–4512
13. Deng Z-H, Lv S-L (2015) Prepost+: An efficient n-lists-based
in each transaction, the entire transaction set can be sorted algorithm for mining frequent itemsets via children-parent equiva-
according to the first item of each transaction. The middle lence pruning. Expert Syst Appl 42(13):5424–5432
transaction can be inserted into the tree as the first insertion

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Pattern Analysis and Applications (2023) 26:437–454 453

14. Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C-W, 35. Lin K-C, Liao I-E, Chen Z-S (2011) An improved frequent pat-
Tseng VS (2014) Spmf: a java open-source pattern mining library. tern growth method for mining association rules. Exp Syst Appl
J Mach Learn Res 15(1):3389–3393 38(5):5154–5161
15. Gao X, Xu F-Q, Zhu Z-M (2019) The application of improved 36. Liu G, Lu H, Lou W, Xu Y, Yu JX (2004) Efficient mining of
fp-growth algorithm in disease complications. DEStech Trans frequent patterns using ascending frequency ordered prefix-tree.
Comput Sci Eng (cmso) Data Min Knowl Disc 9(2):249–274
16. Goethals B (2003) Fimi repository website. http://fimi.ua.ac.be/ 37. Long S, Zheng C, Wu C, Qi T, Li X, Zhu Y, Liu J, Li J, Shuai J,
data/, [fimi web site] Xie Z, et al (2019) Face recognition based on fp-growth improved
17. Goethals B, Zaki MJ (2003) A fast apriori implementation. In: lbp operator. In: Proceedings of the 2019 3rd international confer-
Proceedings of the IEEE ICDM workshop on frequent itemset ence on digital signal processing, pp 100–103
mining implementations, pp Vol. 90 of CEUR Workshop Proceed- 38. Lucchese C, Orlando S, Palmerini P, Perego R, Silvestri F (2003)
ings. IEEE kdci: A multi-strategy algorithm for mining frequent sets. In: Pro-
18. Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining ceedings of the IEEE ICDM workshop of frequent itemset mining
frequent itemsets. In FIMI, vol 90 implementations (FIMI), Melbourne, Florida. Citeseer
19. Grahne G, Zhu J (2005) Fast algorithms for frequent itemset min- 39. Marquez A, Leon J, Vazquez S, Franquelo L, Carrasco J, Galvan
ing using fp-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362 E (2016) Binary search based mppt algorithm for high-power pv
20. Han J, Pei J, Yin Y (2000) Mining frequent patterns without systems. In: 2016 10th international conference on compatibility,
candidate generation. In: ACM SIGMOD record, vol 29, pp power electronics and power engineering (CPE-POWERENG),
1–12. ACM pp 168–173. IEEE
21. Heras F, Morgado A, Marques-Silva J (2011) Core-guided 40. Munro JI (2000) On the competitiveness of linear search. In: Euro-
binary search algorithms for maximum satisfiability. In: pean symposium on algorithms, pp 338–345. Springer
Twenty-Fifth AAAI conference on artificial intelligence 41. Nagaraju S, Kashyap M, Bhattachraya M (2017) An effective den-
22. Hibbard TN (1962) Some combinatorial properties of certain sity based approach to detect complex data clusters using notion
trees with applications to searching and sorting. J ACM (JACM) of neighborhood difference. Int J Autom Comput 14(1):57–67
9(1):13–28 42. Panjwani J (2010) Application of FP tree growth algorithm in text
23. Inokuchi A, Washio T, Motoda H (2000) An apriori-based algo- mining. PhD thesis, Citeseer
rithm for mining frequent substructures from graph data. In: 43. Parmar VP, Kumbharana C (2015) Comparing linear search and
Principles of data mining and knowledge discovery, pp 13–23. binary search algorithms to search an element from a linear list
Springer implemented through static array, dynamic array and linked list.
24. Jamsheela O, Raju G (2015) Frequent itemset mining algo- Int J Comput Appl, 121(3)
rithms: a literature survey. In: 2015 IEEE international advance 44. Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2007) H-mine: fast
computing conference (IACC), pp 1099–1104. IEEE and space-preserving frequent pattern mining in large databases.
25. Jamsheela O, Raju G (2015) An adaptive method for mining IIE Trans 39(6):593–605
frequent itemsets efficiently: an improved header tree method. 45. Pyun G, Yun U, Ryu KH (2014) Efficient frequent pattern mining
In: 2015 international conference on advances in computing, based on linear prefix tree. Knowl-Based Syst 55:125–139
communications and informatics (ICACCI), pp 1078–1084. 46. Qiu H, Gu R, Yuan C, Huang Y (2014) Yafim: A parallel fre-
IEEE quent itemset mining algorithm with spark. In: 2014 IEEE Inter-
26. Jia K, Liu H (2017) An improved fp-growth algorithm based on national parallel & distributed processing symposium workshops
som partition. In: International conference of pioneering computer (IPDPSW), pp 1664–1671. IEEE
scientists, engineers and educators, pp 166–178. Springer 47. Rácz B (2004) nonordfp: An fp-growth variation without rebuild-
27. Kadappa V, Nagesh S (2019) Local support-based parti- ing the fp-tree. In: FIMI
tion algorithm for frequent pattern mining. Pattern Anal Appl 48. Rage UK, Kitsuregawa M (2015) Efficient discovery of corre-
22(3):1137–1147 lated patterns using multiple minimum all-confidence thresholds.
28. Karimov E (2020) Binary search tree. In: Data structures and J Intell Inf Syst 45(3):357–377
algorithms in swift, pp 87–100. Springer 49. Reddy H, Raj N, Gala M, Basava A (2020) Text-mining-based
29. Kiran RU, Kitsuregawa M (2012) Efficient discovery of correlated fake news detection using ensemble methods. Int J Autom Com-
patterns in transactional databases using items’ support intervals. put, pp 1–12
In: International conference on database and expert systems appli- 50. Rymon R (1992) Search through systematic set enumeration.
cations, pp 234–248. Springer Technical Reports (CIS), p 297
30. Kosters WA, Pijls W, Popova V (2003) Complexity analysis of 51. Shatnawi S, Gaber MM, Cocea M (2019) A heuristically modified
depth first and fp-growth implementations of apriori. In: Interna- fp-tree for ontology learning with applications in education. arXiv
tional workshop on machine learning and data mining in pattern preprint arXiv:1910.13561
recognition, pp 284–292. Springer 52. Silvestri C, Orlando S (2012) gpudci: exploiting gpus in frequent
31. Lee Y-K, Kim W-Y, Cai YD, Han J (2003) Comine: Efficient min- itemset mining. In: 2012 20th euromicro international confer-
ing of correlated patterns. In ICDM, pp 581–584 ence on parallel, distributed and network-based processing, pp
32. Levy CC, Tarjan RE (2019) Splaying preorders and postorders. 416–425. IEEE
In: Workshop on algorithms and data structures, pp 510–522. 53. Subramanian S, Berzish M, Tripp O, Ganesh V (2017) A solver
Springer for a theory of strings and bit-vectors. In: 2017 IEEE/ACM 39th
33. Li N, Zeng L, He Q, Shi Z (2012). Parallel implementation of apri- international conference on software engineering companion
ori algorithm based on mapreduce. In: 2012 13th ACIS interna- (ICSE-C), pp 124–126. IEEE
tional conference on software engineering, artificial intelligence, 54. Sultana N, Paira S, Chandra S, Alam SS (2017) A brief study and
networking and parallel/distributed computing, pp 236–241. IEEE analysis of different searching algorithms. In: 2017 second inter-
34. Li Y, Yin S (2020) Mining algorithm for weighted fp-tree fre- national conference on electrical, computer and communication
quent item sets based on two-dimensional table. J Phys: Conf Ser technologies (ICECCT), pp 1–4. IEEE
1453:012002 55. Sun J, Xun Y, Zhang J, Li J (2019) Incremental frequent itemsets
mining with fcfp tree. IEEE Access 7:136511–136524

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

454 Pattern Analysis and Applications (2023) 26:437–454

56. Tsay Y-J, Hsu T-J, Fiut J-RY (2009) A new method for mining 62. Zaki MJ (2000) Scalable algorithms for association mining. IEEE
frequent itemsets. Inf Sci 179(11):1724–1737 Trans Knowl Data Eng 12(3):372–390
57. Tseng F-C (2012) An adaptive approach to mining frequent item- 63. Zhang HL, Xue Y, Zhang B, Li X, Lu X (2019) Eeg pattern recog-
sets efficiently. Expert Syst Appl 39(18):13166–13172 nition based on self-adjusting dynamic time dependency method.
58. Wen-Yuan L, Liu S-FF (2003) The study of association agorithm In: International conference on data service, pp 320–328. Springer
BGL based on binary system and oriented graph
59. Yang K, Quan T, Sun Y (2019) Distributed fp-growth with node Publisher's Note Springer Nature remains neutral with regard to
table for large-scale association rule mining, Nov. 26 2019. US jurisdictional claims in published maps and institutional affiliations.
Patent 10,489,363
60. Ye Y, Chiang C-C (2006) A parallel apriori algorithm for frequent Springer Nature or its licensor holds exclusive rights to this article under
itemsets mining. In: Fourth international conference on software a publishing agreement with the author(s) or other rightsholder(s);
engineering research, management and applications, 2006, pp author self-archiving of the accepted manuscript version of this article
87–94. IEEE is solely governed by the terms of such publishing agreement and
61. Yin M, Wang W, Liu Y, Jiang D (2018) An improvement of fp- applicable law.
growth association rule mining algorithm based on adjacency
table. In: MATEC web of conferences, vol189, p 10012. EDP
Sciences

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:

1. use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
2. use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
3. falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
4. use bots or other automated methods to access the content or redirect messages
5. override any security feature or exclusionary protocol; or
6. share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at

[email protected]

What Is Frequent Pattern Analysis?
No ratings yet
What Is Frequent Pattern Analysis?
37 pages
Air Master Catalog
100% (2)
Air Master Catalog
191 pages
Alumimium Bus Bar Calculation 4000A
71% (7)
Alumimium Bus Bar Calculation 4000A
5 pages
Frequent Closed Pattern Mining Algorithm Based On COFI-Tree
No ratings yet
Frequent Closed Pattern Mining Algorithm Based On COFI-Tree
2 pages
An Improvement of FP-Growth Association Rule Minin
No ratings yet
An Improvement of FP-Growth Association Rule Minin
7 pages
Mining Frequent Patterns Without Candidate Generation
No ratings yet
Mining Frequent Patterns Without Candidate Generation
12 pages
A New Parallel Algorithm For Frequent Pattern Mining
No ratings yet
A New Parallel Algorithm For Frequent Pattern Mining
5 pages
Improv Me Net
No ratings yet
Improv Me Net
7 pages
A Frequent Pattern Mining Algorithm Based On Fp-Tree Structure Andapriori Algorithm
No ratings yet
A Frequent Pattern Mining Algorithm Based On Fp-Tree Structure Andapriori Algorithm
3 pages
Efficient Algorithm For Mining Frequent Patterns Java Project
No ratings yet
Efficient Algorithm For Mining Frequent Patterns Java Project
38 pages
U3 - FP Trees - 5th Sem - DS
No ratings yet
U3 - FP Trees - 5th Sem - DS
9 pages
Knowledge-Based Systems: Gwangbum Pyun, Unil Yun, Keun Ho Ryu
No ratings yet
Knowledge-Based Systems: Gwangbum Pyun, Unil Yun, Keun Ho Ryu
15 pages
Lecture 2.3.3 2.3.4
No ratings yet
Lecture 2.3.3 2.3.4
29 pages
F P-Tree F P-Growth
No ratings yet
F P-Tree F P-Growth
7 pages
FP Growth PPT Shabnam
No ratings yet
FP Growth PPT Shabnam
19 pages
Fp-Tree Growth Algorithm
No ratings yet
Fp-Tree Growth Algorithm
11 pages
FP Growth
No ratings yet
FP Growth
21 pages
FP Tree Growth: Frequent Pattern Growth Algorithm
100% (1)
FP Tree Growth: Frequent Pattern Growth Algorithm
2 pages
15-Fp-Tree Problem-10-09-2024
No ratings yet
15-Fp-Tree Problem-10-09-2024
2 pages
177 1496393364 - 02-06-2017 PDF
No ratings yet
177 1496393364 - 02-06-2017 PDF
6 pages
177 1496393364 - 02-06-2017 PDF
No ratings yet
177 1496393364 - 02-06-2017 PDF
6 pages
DM Unit2 - 1 Association Mining 19I504
No ratings yet
DM Unit2 - 1 Association Mining 19I504
86 pages
Itemset Mining Over Large Transactional Tables On The Relational Databases
No ratings yet
Itemset Mining Over Large Transactional Tables On The Relational Databases
6 pages
Lecture 6
No ratings yet
Lecture 6
18 pages
AzqaSaleemKhan (SP22 RCS 003) FPGrowth
No ratings yet
AzqaSaleemKhan (SP22 RCS 003) FPGrowth
19 pages
DM Unit-2
No ratings yet
DM Unit-2
14 pages
Q) FP Growth Algorithm?: This Algorithm Works As Follows
No ratings yet
Q) FP Growth Algorithm?: This Algorithm Works As Follows
3 pages
FP-Tree Growth Algorithm
No ratings yet
FP-Tree Growth Algorithm
15 pages
(18-22) Hybrid Association Rule Mining Using AC Tree
No ratings yet
(18-22) Hybrid Association Rule Mining Using AC Tree
5 pages
Unit4 2 Association Rules FP Growth
No ratings yet
Unit4 2 Association Rules FP Growth
33 pages
FP Growth Algorithm
No ratings yet
FP Growth Algorithm
17 pages
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
No ratings yet
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
5 pages
Fptreehuffman
No ratings yet
Fptreehuffman
4 pages
Mining Frequent Patterns Without Candidate Generation
No ratings yet
Mining Frequent Patterns Without Candidate Generation
44 pages
Mtech Project Seminar1
No ratings yet
Mtech Project Seminar1
36 pages
18-FP-Growth Algorithm-12-02-2025
No ratings yet
18-FP-Growth Algorithm-12-02-2025
24 pages
FP Tree
No ratings yet
FP Tree
37 pages
A Comparative Analysis of NFA and Tree-Based Approach For Infrequent Itemset Mining
No ratings yet
A Comparative Analysis of NFA and Tree-Based Approach For Infrequent Itemset Mining
5 pages
2 Unit DM K Raj Kuamr
No ratings yet
2 Unit DM K Raj Kuamr
26 pages
FP Tree
No ratings yet
FP Tree
54 pages
2007 Jiawei Han FP Mining
No ratings yet
2007 Jiawei Han FP Mining
32 pages
FP-Growth Algorithm
No ratings yet
FP-Growth Algorithm
5 pages
Module 4.2 Association Rule Mining
No ratings yet
Module 4.2 Association Rule Mining
88 pages
Powerpoint Presentation On Somlething
No ratings yet
Powerpoint Presentation On Somlething
181 pages
FPTree 09
No ratings yet
FPTree 09
45 pages
03 Pre Processing
No ratings yet
03 Pre Processing
20 pages
FP Tree
No ratings yet
FP Tree
42 pages
ESE Handouts 4 - FP Growth Algorithm (Fall 2016)
No ratings yet
ESE Handouts 4 - FP Growth Algorithm (Fall 2016)
13 pages
Data Mining - : Dr. Mahmoud Mounir Mahmoud - Mounir@cis - Asu.edu - Eg
No ratings yet
Data Mining - : Dr. Mahmoud Mounir Mahmoud - Mounir@cis - Asu.edu - Eg
23 pages
FP Growth
No ratings yet
FP Growth
16 pages
FPgrowth
No ratings yet
FPgrowth
2 pages
DM Lecture 29
No ratings yet
DM Lecture 29
20 pages
Mining Frequent Patterns Without Candidate Generation
No ratings yet
Mining Frequent Patterns Without Candidate Generation
44 pages
Lecture 5 - Monday, September 3, 2007: 2.1 Example From Paper
No ratings yet
Lecture 5 - Monday, September 3, 2007: 2.1 Example From Paper
6 pages
DWDM Unit-3
100% (1)
DWDM Unit-3
63 pages
Chapter 5
No ratings yet
Chapter 5
26 pages
Fpgrowth
No ratings yet
Fpgrowth
11 pages
What Is Frequent Pattern Analysis?
No ratings yet
What Is Frequent Pattern Analysis?
5 pages
FP Growth
No ratings yet
FP Growth
30 pages
Efficient Algorithms and Structures with Heaps: Definitive Reference for Developers and Engineers
From Everand
Efficient Algorithms and Structures with Heaps: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
An Investigation into the Use of a Neural Tree Classifier for Knowledge Discovery in OLAP Databases
From Everand
An Investigation into the Use of a Neural Tree Classifier for Knowledge Discovery in OLAP Databases
David R Swinburne
No ratings yet
Oscillations - Class 11 Physics NCERT Solutions Free PDF Download
No ratings yet
Oscillations - Class 11 Physics NCERT Solutions Free PDF Download
41 pages
Notes 1
No ratings yet
Notes 1
76 pages
Programming Manual: Advanced Motion Control Software
No ratings yet
Programming Manual: Advanced Motion Control Software
17 pages
Dignaga's Philosophy of Language Dignaga On Anyapoha
No ratings yet
Dignaga's Philosophy of Language Dignaga On Anyapoha
374 pages
Corekit User Manual Emulex
No ratings yet
Corekit User Manual Emulex
63 pages
Oracle Recommended Patches R12.ATG - PF.B
No ratings yet
Oracle Recommended Patches R12.ATG - PF.B
32 pages
CINPD Unit 5
No ratings yet
CINPD Unit 5
16 pages
Image Recognition Using CIFAR 10
100% (1)
Image Recognition Using CIFAR 10
56 pages
Computer Science Class Notes
No ratings yet
Computer Science Class Notes
3 pages
App T Da Pam 73-1 S
No ratings yet
App T Da Pam 73-1 S
4 pages
Employee Benefit Plans 6: Limitations On Contributions and Benefits
No ratings yet
Employee Benefit Plans 6: Limitations On Contributions and Benefits
23 pages
Gear Beam and Wear Strength
No ratings yet
Gear Beam and Wear Strength
46 pages
6 Electric Potential and Relationship Between E and V - Maxwell's Equation
No ratings yet
6 Electric Potential and Relationship Between E and V - Maxwell's Equation
25 pages
High Temperature Scale
No ratings yet
High Temperature Scale
51 pages
Electro Chemistry (MS)
No ratings yet
Electro Chemistry (MS)
208 pages
Maintenance Schedules / Maintenance Parts
100% (1)
Maintenance Schedules / Maintenance Parts
29 pages
BSC Aeronautical
No ratings yet
BSC Aeronautical
144 pages
Silva Et-Al 2013
No ratings yet
Silva Et-Al 2013
8 pages
Why SIMUL8 PDF
No ratings yet
Why SIMUL8 PDF
8 pages
ATS - Daily Trading Plan 27agustus2018
No ratings yet
ATS - Daily Trading Plan 27agustus2018
1 page
ZOOM Software Measurement and Graph Types
No ratings yet
ZOOM Software Measurement and Graph Types
6 pages
Ur Z21rev4
No ratings yet
Ur Z21rev4
16 pages
Santos Training2
No ratings yet
Santos Training2
27 pages
Solvent Deasphalting PPT Final - 1
100% (5)
Solvent Deasphalting PPT Final - 1
30 pages
Design of Rotation Inducing Rocket Fins and Their Analysis For Aerodynamic Stability
No ratings yet
Design of Rotation Inducing Rocket Fins and Their Analysis For Aerodynamic Stability
6 pages
Intelligent Search Algorithms: Forth Year
No ratings yet
Intelligent Search Algorithms: Forth Year
17 pages
MYP Criteria Year 5
No ratings yet
MYP Criteria Year 5
4 pages
W73153 International GCSE Science (Single Award) 4SS0 AN Accessible Version
No ratings yet
W73153 International GCSE Science (Single Award) 4SS0 AN Accessible Version
4 pages

An Improved Frequent Pattern Tree The Child Struct

Uploaded by

An Improved Frequent Pattern Tree The Child Struct

Uploaded by

Pattern Analysis and Applications (2023) 26:437–454

An improved frequent pattern tree: the child structured frequent

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Table 1 Features of prominent frequent pattern mining algorithms

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

3.1 Background Table 3 Transactional database TID Transactions

FP-tree is an efficient tree structure to store frequent item- 1 C,F

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

– If a new transaction ti:’I J’ needs to be inserted into the

3.2 The child search tree

Each node of the CSFP-tree has zero or more children. The

3.3 Structure of the CSFP‑tree

CSFP-tree is an improved FP-tree and contains a collection

The structure of the proposed CSFP-tree is as follows. A

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 4 a Node structure of

Fig. 5 CSFP-tree after inserting

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

The algorithm CSFP-growth mines the frequent itemsets

Fig. 6 Algorithm-1 to construct CSFP-tree Fig. 7 Procedure to search child

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

complexity of the algorithm here is the same as the FP-

2 The second step is to construct the CSFPTree. Here we

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 9 Running time on the con-

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 10 Memory consumption

Fig. 11 Running time on the

Fig. 12 Memory consumption

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 13 Running time on the

Fig. 14 Memory consumption

Fig. 15 Running time on the

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 16 Memory consumption

Fig. 17 Running time on the

Fig. 18 Memory consumption

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 19 Running time on the

Fig. 20 Memory consumption

Fig. 21 Running time on the

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 22 Memory consumption

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

You might also like

3.1 Background Table 3 Transactional database TID Transactions

3.2 The child search tree

3.3 Structure of the CSFP‑tree