2022 Pattern Mining Current Challenges
All content following this page was uploaded by Wensheng Gan on 25 July 2022.
1 Introduction
Nowadays, large amounts of data of various types are stored in databases of various
organizations. Hence, it has become important for many organizations to develop
automatic or semi-automatic tools to analyze data. Pattern mining is a subfield
of data mining that aims at identifying interesting and useful patterns in data.
The aim is to find patterns that are easily interpretable by users, and thus can
help in understanding the data. Patterns can be used to support decision-making
but also to perform other tasks such as classification, clustering and prediction.
Pattern mining research started more than two decades ago. While initial studies
have focused on discovering frequent patterns on data such as shopping data, the
field has rapidly changed to consider other data types and pattern types. Also,
major improvements have been made to algorithms and data structures to improve
efficiency, scalability, and provide more features.
This paper provides an overview of key challenges and opportunities in pattern
mining that deserve more attention. To write this paper, seven researchers
from the field of pattern mining were invited to write about a key challenge of
their choice. Six challenges have been identified:
The rest of this paper is organized as follows. Sections 2 to 7 describe the
six challenges. Then, Sect. 8 draws a conclusion.
[Figure: examples of graph types. Panels recovered from the figure: f) an attributed graph (node attributes such as Age and Money), g) a multi-labeled graph, h) a multi-relational graph (edge relations such as friend and classmate).]
designed for quickly querying frequent itemsets during the operation process.
Both algorithms adopt the minimum confidence and support measurement, and
the improved Itemset-Tree can be updated incrementally with new transac-
tions. For multitude-targeted mining, the guided FP-growth [30] was designed
to determine the frequency of each given itemset based on target Itemset-Tree.
After that, a constraint-based ARM query model [1] was also introduced for
exploratory analysis of diverse clinical databases.
Targeted SPM Algorithms. The sequential ordering of items is commonly
seen in real-life applications. To handle the sequence data that is more complex
than transaction data, Chueh et al. [8] reversed the original sequences to dis-
cover targeted sequential patterns with time intervals. Based on the definition of
targeted SPM, Chand et al. [5] proposed an SPM algorithm that discovers patterns
by checking whether they satisfy the recency and monetary constraints and
are also target-oriented. However, the target pattern in this approach is
defined at the end of each sequence. A goal-oriented algorithm [7] can extract the
transaction activities that precede the loss of a customer. By utilizing TPM, this
algorithm can handle the problem of determining whether a customer is leaving,
with mining directed toward a specific goal.
Utility-Driven TPM Algorithms. Previous TPM algorithms mainly adopt
the measures of frequency and confidence, but they do not involve the concept
of utility [15], which is helpful for discovering more informative patterns and
knowledge. Recently, Miao et al. [26] were the first to introduce a targeted
high-utility itemset querying model (abbreviated as TargetUM). TargetUM
introduced several key definitions and formulated the problem of mining the desired
set of high-utility itemsets containing given target items. A utility-based trie
tree was designed to index and query target itemsets on-the-fly. Considering
sequence data, Zhang et al. [44] introduced the targeted high-utility sequence
querying problem and proposed the TUSQ algorithm. A targeted utility-chain and two
novel upper bounds on the utility measure (namely suffix remain utility and
terminated descendants utility) are proposed in the TUSQ model.
Several open problems and interesting future directions of targeted pattern
mining/search (including but not limited to the following) are highlighted
below. It is important to note that these open problems are also widespread in
other pattern mining tasks.
– What type of data to be mined. There are many types of data in the
real world, such as transaction data, sequences, streaming data,
spatiotemporal data, complex events, time series, text and web data, multimedia,
graphs, social networks, and uncertain data. Designing effective TPM
algorithms to deal with these data types is urgent and challenging.
– What kind of pattern or knowledge to be mined. For example, there
are two categories, descriptive vs. predictive data mining, which are based
on different kinds of knowledge. As reviewed before, itemsets, sequences, rules,
graphs, and events are the different kinds of patterns that are extracted from
various types of data. However, few TPM algorithms can discover these kinds
of patterns.
Pattern Mining: Current Challenges and Opportunities 39
– More effective data structure. According to current studies, indexing
and searching in TPM are more challenging than in traditional pattern
mining. In particular, when dealing with big data, we need more effective
data structures to store rich information from data.
– More powerful strategies. The search space of TPM can grow explosively.
Thus, reducing the search space using powerful pruning
strategies (w.r.t. upper bounds) plays a key role in improving the performance
of a TPM algorithm.
– Different applications. In general, there are many applications of data
mining methods, including discrimination, association analysis, classification,
clustering, trend/deviation analysis, outlier detection, etc. Each
application may require a specialized TPM solution.
– Visualization. It is desirable that the data and mining results be
displayed automatically during the search process. In the future, there are many
opportunities to increase the interpretability of the results, the ease of use of the
model, and the interactivity of the mining process.
Sequential pattern mining (SPM) has been used in keyphrase extraction [42] and
feature selection [41]. The goal of SPM is to discover interesting subsequences
(also called patterns). The most common problem is to mine frequent patterns
whose supports are no less than a user-defined parameter called minsup. The
definitions are as follows.
Example 1. For a sales dataset, suppose there are five products: a, b, c, d, and e,
i.e. σ = {a, b, c, d, e}. Suppose customer 1 first purchased a, b, and c, then
bought a, b, and e, then purchased c, then bought a, b, d, and e, then purchased
a and c, and finally bought a, c, and e. The shopping sequence of customer 1 is
S1 = {s1, s2, s3, s4, s5, s6} = {(a, b, c), (a, b, e), (c), (a, b, d, e), (a, c), (a, c, e)}.
Similarly, we assume that for customer 2, S2 = {s1, s2} = {(a, b, d), (c)}. Thus,
the sequence database is SDB = {S1, S2}.
This kind of sequence format is quite general since the sequence is an ordered
list of itemsets, where each itemset contains one or more items. Thus,
such a sequence is called a sequence with itemsets. But for many applications, the
data is represented as an ordered list of items called a sequence with items,
40 P. Fournier-Viger et al.
which means that each itemset contains only one item, e.g. DNA sequences,
protein sequences, virus sequences, and time series. For example, “attaaagg” is
a segment of the SARS-CoV-2 virus.
Example 2. Pattern P = {(a, b), (c)} occurs in sequences S1 and S2. For example,
<1,3> is an occurrence of pattern P in sequence S1, since p1 = (a, b) ⊆ s1
= (a, b, c) and p2 = (c) ⊆ s3 = (c).
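The occurrence check of Example 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `first_occurrence` and `support` are hypothetical, and the support is assumed here to be the number of sequences of SDB that contain the pattern at least once (repetitions within a sequence are ignored), which is one possible definition among several.

```python
from typing import List, Optional, Set

Sequence = List[Set[str]]  # an ordered list of itemsets


def first_occurrence(pattern: Sequence, sequence: Sequence) -> Optional[List[int]]:
    """Return the 1-based positions of the leftmost occurrence, or None."""
    positions = []
    start = 0
    for itemset in pattern:
        for idx in range(start, len(sequence)):
            if itemset <= sequence[idx]:  # p_k is a subset of s_idx
                positions.append(idx + 1)
                start = idx + 1
                break
        else:
            return None  # this pattern itemset could not be matched
    return positions


def support(pattern: Sequence, sdb: List[Sequence]) -> int:
    """Assumed definition: number of sequences containing the pattern."""
    return sum(1 for s in sdb if first_occurrence(pattern, s) is not None)


# Data of Example 1 and pattern of Example 2.
S1 = [{"a", "b", "c"}, {"a", "b", "e"}, {"c"},
      {"a", "b", "d", "e"}, {"a", "c"}, {"a", "c", "e"}]
S2 = [{"a", "b", "d"}, {"c"}]
SDB = [S1, S2]
P = [{"a", "b"}, {"c"}]

print(first_occurrence(P, S1))  # [1, 3], the occurrence <1,3> of Example 2
print(support(P, SDB))
```

Greedy leftmost matching suffices to decide whether at least one occurrence exists; enumerating all occurrences requires a different procedure.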
(1) Pattern without gap: A pattern without gap is also called a consecutive
subsequence [6], i.e. an occurrence $I = \langle i_1, i_2, \cdots, i_m \rangle$ must satisfy
$i_2 = i_1 + 1$, $i_3 = i_2 + 1$, $\cdots$, $i_m = i_{m-1} + 1$. For example, there are two
occurrences of pattern P = p1 p2 p3 = aba in sequence S: <2,3,4> and <6,7,8>.
The advantage of this method is that it is easy to calculate the support.
However, the restriction is too strict, which will lead to the loss of a lot of
important information.
(2) Pattern with self-adaptive gap [41]: This means that no constraint is placed
on the gaps of an occurrence. For example, <1,7,8> is an occurrence of P = aba in S.
The advantage of this method is that users do not need any prior knowledge
and it is easy to find the characteristics of the sequence database. However,
there are too many occurrences, which will lead to difficulties in analyzing
the results.
(3) Pattern with gap constraint: In this case, users should predefine a gap
= [M, N], and each occurrence must satisfy $M \le i_k - i_{k-1} - 1 \le N$
$(1 < k \le m)$, where M and N are the minimum and maximum numbers of
wildcards. This method can prune some meaningless occurrences. For example,
calculating the supports under different conditions? (2) Given a database with
itemsets, how to design effective mining algorithms for these conditions? (3)
If the dataset is dynamic or a stream database, how to design effective mining
algorithms? (4) A variety of SPM methods were proposed to meet different
requirements, such as closed SPM, maximal SPM, top-k SPM, compressing
SPM, co-occurrence SPM, rare SPM, negative SPM, tri-partition SPM, and
high utility SPM. However, most of them neglect the repetitions and consider
sequence databases with itemsets. If the repetitions cannot be neglected, how to
design effective mining algorithms? (5) For a specific problem, there are often many
candidate approaches, and it is unclear which one is best. For example, for
a sequence classification problem, there are many methods to extract the
features, such as frequent patterns and contrast patterns under the four conditions.
Which one is the best approach?
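To make the gap conditions discussed in this section concrete, the following sketch enumerates the occurrences of a pattern in a sequence with items under a gap = [M, N], where the gap between two consecutive positions is i_k − i_{k−1} − 1. The function name `occurrences` and the sequence S = "aabacaba" are assumptions made for illustration: this S was chosen only because it is consistent with the occurrences quoted in the text, and it is not taken from the paper.

```python
from typing import List, Optional, Tuple


def occurrences(pattern: str, sequence: str,
                min_gap: int = 0,
                max_gap: Optional[int] = None) -> List[Tuple[int, ...]]:
    """Enumerate all occurrences (1-based positions) under gap = [min_gap, max_gap].

    max_gap=None means that the gap is unbounded (self-adaptive gap).
    """
    n = len(sequence)
    results: List[Tuple[int, ...]] = []

    def extend(k: int, prev: Optional[int], acc: List[int]) -> None:
        if k == len(pattern):
            results.append(tuple(acc))
            return
        if prev is None:
            lo, hi = 1, n  # the first item may match anywhere
        else:
            lo = prev + 1 + min_gap
            hi = n if max_gap is None else min(n, prev + 1 + max_gap)
        for i in range(lo, hi + 1):
            if sequence[i - 1] == pattern[k]:
                extend(k + 1, i, acc + [i])

    extend(0, None, [])
    return results


S = "aabacaba"  # hypothetical sequence, chosen to match the quoted occurrences
P = "aba"

no_gap = occurrences(P, S, min_gap=0, max_gap=0)         # condition (1)
self_adaptive = occurrences(P, S)                        # condition (2)
gap_0_1 = occurrences(P, S, min_gap=0, max_gap=1)        # condition (3), gap = [0, 1]

print(no_gap)              # [(2, 3, 4), (6, 7, 8)]
print(len(self_adaptive))  # 10
print(gap_0_1)             # [(1, 3, 4), (2, 3, 4), (6, 7, 8)]
```

On this small example, the self-adaptive condition yields five times as many occurrences as the no-gap condition, which illustrates the trade-off between losing information and producing too many occurrences.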
A key limitation of traditional pattern mining algorithms such as Apriori and
FP-Growth is that they are batch algorithms. This means that if the input database
is updated, the user needs to run the algorithm again to get new results, even
if the database has changed only slightly. Consequently, classical algorithms are
inefficient for various real applications where databases are dynamic. To address this
challenge, various approaches have been adopted, which can be roughly classified
into three categories: (1) Incremental pattern mining algorithms, (2) Stream
pattern mining algorithms and (3) Interactive pattern mining algorithms.
Incremental pattern mining algorithms are designed to update the set of
discovered patterns once the database is updated by inserting or deleting some
transactions. To avoid repetitively scanning the database, a strategy is to use
a buffer that contains the set of almost frequent itemsets in memory [19,23].
Stream pattern mining algorithms are designed to deal with databases
that change in real-time and where new data may arrive at a very high speed.
These algorithms aim to process transactions quickly to return an approximate
set of patterns rather than the complete set. Two representative algorithms for
stream pattern mining are estDec and estDec+ [32]. estDec employs a
lexicographic tree structure called a prefix tree to identify and maintain significant
itemsets from an online data stream. Significant itemsets are itemsets that may
become frequent in the near future. It has been observed that the size of
the prefix tree, which is located in the main memory, becomes very large as
the number of significant itemsets increases. Thus, if the size of the prefix tree
becomes larger than the available memory space, estDec fails to identify new
significant itemsets. As a result, the accuracy of estDec results is degraded [32].
estDec+ and other algorithms have been designed to solve this problem.
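As a rough illustration of how itemset counts can be maintained over a stream without rescanning past transactions, the sketch below keeps exponentially decayed counts in a dictionary. It is not an implementation of estDec or estDec+: there is no prefix tree, no delayed insertion and no pruning, and the class name `DecayedItemsetCounter`, the `decay` factor, the `max_len` limit and the `min_sig` threshold are all assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Itemset = Tuple[str, ...]


class DecayedItemsetCounter:
    """Keeps decayed counts of small itemsets seen in a transaction stream."""

    def __init__(self, decay: float = 0.99, max_len: int = 2) -> None:
        self.decay = decay
        self.max_len = max_len
        self.tid = 0          # number of transactions seen so far
        self.total = 0.0      # decayed number of transactions
        # itemset -> (decayed count, id of the last transaction that updated it)
        self.counts: Dict[Itemset, Tuple[float, int]] = {}

    def add(self, transaction: Iterable[str]) -> None:
        self.tid += 1
        self.total = self.total * self.decay + 1.0
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            for combo in combinations(items, k):
                count, last = self.counts.get(combo, (0.0, self.tid))
                # Decay lazily for the transactions skipped since the last update.
                count = count * self.decay ** (self.tid - last) + 1.0
                self.counts[combo] = (count, self.tid)

    def support(self, itemset: Iterable[str]) -> float:
        key = tuple(sorted(set(itemset)))
        count, last = self.counts.get(key, (0.0, self.tid))
        if self.total == 0.0:
            return 0.0
        return count * self.decay ** (self.tid - last) / self.total

    def significant(self, min_sig: float) -> List[Itemset]:
        return sorted(k for k in self.counts if self.support(k) >= min_sig)


counter = DecayedItemsetCounter(decay=0.5, max_len=2)
for t in (["a", "b"], ["a"], ["b", "c"]):
    counter.add(t)
print(counter.significant(0.4))
```

The memory issue mentioned above is visible even in this toy version: the dictionary grows with the number of tracked itemsets, which is why real stream miners bound or compress their synopsis structure.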
Interactive pattern mining tries to handle dynamic databases differently
by injecting user preferences, user feedback or user-targeted queries into the
mining process [3,4,14,20,22]. In contrast with incremental and stream pattern
mining, where algorithms aim to maintain and update a large set of patterns
that may be uninteresting to users, interactive pattern mining algorithms focus
only on some specific sets of patterns that are needed by the user.
Several approaches have been designed, which can generally be classified into three
categories: (1) Targeted querying based approaches, (2) User feedback based
approaches and (3) Visualization based approaches.
Targeted Querying Based Approaches. These approaches let the user search for
interesting patterns containing specific items by sending targeted queries to the
system. The system then tries to answer these queries quickly
[14,20,22]. See Sect. 3 for more details.
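A targeted query can be illustrated with a naive baseline that only examines the transactions containing the target items. This brute-force sketch is not the Itemset-Tree or any of the cited algorithms; the function name `targeted_frequent_itemsets`, the toy transactions and the threshold are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def targeted_frequent_itemsets(transactions: List[Iterable[str]],
                               targets: Iterable[str],
                               minsup: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset that contains all targets and has support >= minsup."""
    target_set = frozenset(targets)
    # Only transactions containing the targets can support a matching itemset.
    relevant = [frozenset(t) for t in transactions if target_set <= frozenset(t)]
    if len(relevant) < minsup:
        return {}
    other_items = sorted(set().union(*relevant) - target_set)
    result: Dict[FrozenSet[str], int] = {}
    for k in range(len(other_items) + 1):
        for extra in combinations(other_items, k):
            candidate = target_set | frozenset(extra)
            sup = sum(1 for t in relevant if candidate <= t)
            if sup >= minsup:
                result[candidate] = sup
    return result


transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"},
                {"b", "c"}, {"a", "b", "c", "d"}]
answer = targeted_frequent_itemsets(transactions, targets={"a"}, minsup=2)
for itemset, sup in sorted(answer.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The baseline enumerates all combinations of the remaining items, so it is exponential in the worst case; index structures built for targeted querying aim to avoid exactly this cost.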
User Feedback Based Approaches. User feedback based approaches are more
interactive compared with targeted querying based approaches. The key idea
is to progressively address feedback sent by users during the mining process.
Bhuiyan et al. [4] proposed an interactive pattern mining system that is based
on the sampling of frequent patterns from hidden datasets. Hidden datasets
exist in various real applications where the data owner and the data analyst are
not necessarily the same entity. Thus, the data analyst may not have full
access to the data, and the data owner has to maintain the confidentiality of
the data by providing analysts only with samples that are
beneficial to them, without giving them the possibility to reconstruct the entire
dataset from the given samples [4]. The proposed interactive system aims to
continuously update effective sampling distributions using binary feedback from
the users. It works as follows. Using a Markov Chain Monte
Carlo (MCMC) sampling method, the system returns a small set of frequent
patterns (samples) to each analyst (user). Then, each analyst sends feedback
about the samples received. The feedback used in this method is simple:
for each pattern, the user indicates whether the pattern
is interesting or uninteresting. The system defines a scoring function based on
users’ feedback and updates each sampling distribution taking into consideration
the interests of its corresponding user. Following these steps, the proposed system can
progressively address the user preferences while the data remains confidential.
Experiments on itemset and graph mining datasets demonstrate the usefulness
of the proposed system. Based on the same approach, an improved version of
this system was proposed [3]. In that work, the authors adopted a better scoring
function for graph data by using graph topology and new improved feedback
mechanisms, namely, periodic feedback and conditional periodic feedback.
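The feedback loop described above can be sketched as a sampling distribution that is reweighted after each binary response. This is a deliberately simplified illustration and not the system of [4]: it samples directly from an explicit list of patterns instead of using MCMC over a hidden dataset, and the class name `FeedbackSampler`, the multiplicative `boost` factor and the update rule are hypothetical choices.

```python
import random
from typing import Dict, Hashable, List, Sequence


class FeedbackSampler:
    """Samples patterns in proportion to weights learned from binary feedback."""

    def __init__(self, patterns: Sequence[Hashable],
                 boost: float = 2.0, seed: int = 0) -> None:
        self.weights: Dict[Hashable, float] = {p: 1.0 for p in patterns}
        self.boost = boost
        self.rng = random.Random(seed)

    def probabilities(self) -> Dict[Hashable, float]:
        total = sum(self.weights.values())
        return {p: w / total for p, w in self.weights.items()}

    def sample(self, k: int) -> List[Hashable]:
        # Sampling with replacement, proportional to the current weights.
        patterns = list(self.weights)
        weights = [self.weights[p] for p in patterns]
        return self.rng.choices(patterns, weights=weights, k=k)

    def feedback(self, pattern: Hashable, interesting: bool) -> None:
        # Assumed scoring rule: multiply or divide the weight by `boost`.
        factor = self.boost if interesting else 1.0 / self.boost
        self.weights[pattern] *= factor


patterns = [("a",), ("b",), ("a", "b")]
sampler = FeedbackSampler(patterns)
sampler.feedback(("a", "b"), interesting=True)
sampler.feedback(("b",), interesting=False)
print(sampler.probabilities())
print(sampler.sample(5))
```

Each analyst would hold a separate sampler instance, so that the distribution adapts to that analyst's interests while only sampled patterns, never the raw data, are exposed.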
Another common problem in pattern mining that motivates researchers to
design interactive pattern discovery tools is the problem of pattern explosion [11].
More precisely, traditional pattern mining algorithms discover a large number
of patterns, of which many are redundant or similar. As a result, the analyst or
the data expert must invest substantial effort to find the desired patterns,
which is not an easy task. To overcome this limitation, an interactive pattern
discovery framework was proposed [11] for two mining tasks, frequent itemset
mining and subgroup discovery. The proposed framework consists of three steps:
(1) mining patterns, (2) interacting with the user and (3) learning user-specific
pattern interestingness. The user is only asked to rank small sets of patterns,
while a ranking function is inferred from user feedback using preference
learning techniques. The experimental results demonstrated that
the system was able to learn accurate pattern rankings for both mining tasks.
Visualization Based Approaches. Another important aspect of designing a
good interactive pattern mining system is visualization. More precisely,
data visualization techniques play an important role in making the
discovered knowledge understandable and interpretable by humans [17]. In fact, the
output of the implemented algorithms is often presented to the user only in a textual
form, which imposes limitations such as the difficulty of identifying similar
patterns and of understanding the relations between patterns. There
are various visualization techniques for different forms of patterns. For instance,
researchers [2] have used a lattice-based representation based on the Hasse
diagram to visualize the output of frequent itemset mining. All possible itemsets can
be represented in the diagram and the frequent itemsets are highlighted in bold.
Other visualization techniques have been used to efficiently present itemsets to
the user, such as pixel-based visualization and tree-based visualization [17]. As for
itemset mining, various visualization tools were proposed for other pattern
mining problems such as mining association rules, mining sequential patterns
and mining episodes. The reader can refer to [17] for a detailed survey that
presents the visualization techniques designed for each mining task.
(DBs) where interest is measured using functions. One of the first interestingness
functions used in IPM is the support function to mine frequent patterns (FPM)
in binary DBs. A pattern is said to be frequent in a binary DB if the number
of its appearances in transactions of the database (or its support) is no less
than a user-predefined minimum support threshold. For this specific measure, in
order to efficiently cope with the combinatorial explosion in FPM, a nice property
of the support, the Downward Closure or Anti-Monotonicity (AM) property, has been
applied. This property states that if a pattern is not frequent (infrequent), all its
super-patterns are also infrequent, so the whole branch rooted at the infrequent
pattern (on the prefix search tree) can be pruned immediately.
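The pruning enabled by the AM property can be sketched with a small level-wise (Apriori-style) miner: a candidate of size k is counted only if all of its subsets of size k − 1 are frequent. The function name `frequent_itemsets` and the toy binary DB are assumptions for illustration, and the sketch favors clarity over efficiency.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def frequent_itemsets(transactions: List[Iterable[str]],
                      minsup: int) -> Dict[FrozenSet[str], int]:
    """Level-wise mining of itemsets with support >= minsup."""
    db = [frozenset(t) for t in transactions]

    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for t in db if itemset <= t)

    items = sorted({i for t in db for i in t})
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= minsup}

    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets to build candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # AM pruning: drop a candidate if any (k-1)-subset is infrequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        counted = {c: support(c) for c in candidates}
        current = {c: s for c, s in counted.items() if s >= minsup}
    return frequent


db = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}, {"a", "b", "c", "d"}]
result = frequent_itemsets(db, minsup=3)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this example the candidate {a, b, c} is never counted, because its subset {a, b} has support 2 and is therefore infrequent for minsup = 3.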
However, the support measure is not suitable for all applications. Thus, other
interestingness functions have been designed to find important patterns that
may be rare but useful or interesting for many real-life applications. Some of the
most popular kinds are utility functions of patterns in quantitative DBs (QDBs).
Utility functions can be used for example to find the most profitable purchase
patterns in customer transactions. Note that the support can be seen as a special
utility function. A simple QDB is called a quantitative transaction DB (QTDB),
where each (input) transaction is a quantitative itemset (a set of quantitative
items). A more general QDB is a quantitative sequence DB (QSDB), in which each
input quantitative sequence consists of a sequence of quantitative itemsets.
Moreover, a key challenge in the problem of high utility pattern mining
(HUPM) is that such utility functions usually do not satisfy the AM property. To
overcome this challenge, we need to devise upper bounds or weak upper bounds
(on the utilities) that satisfy the AM property or weaker ones (such as the
Anti-Monotone-Like property, AML). In this context, given a utility function u of
patterns, a function
ub is said to be an upper bound (UB) on u if ub(x) ≥ u(x) for any pattern x, and
a function wub is said to be a weak upper bound (WUB) on u if wub(x) ≥ u(y)
for any extension pattern y of x. Usually, the tighter a (W)UB
is, the stronger its pruning ability. However, devising good and
tight (W)UBs often requires considerable effort and time.
For example, in the first problem of high utility itemset mining (HUIM) on a
QTDB D, the utility u(A) of an itemset A is defined as the sum of its utilities
u(A,T) in all transactions T of D containing A, where the utility u(A,T) of A
in T is the sum of the utilities of the items of A appearing in T. Similarly, for the
second problem of high average utility itemset mining (HAUIM) in a QTDB D,
the average utility au(A) of an itemset A is defined as the utility u(A) divided by
its length length(A). After HUIM was first proposed in 2004 [43] (and HAUIM in 2009),
it took more than 8 (respectively 10) years to obtain good, tighter
UBs based on the remaining utility [25] (respectively WUBs based on a vertical
representation of the QTDB [37]). It is worth noting that for the average utility au,
besides UBs, there are many WUBs, which are much tighter than the
UBs. The number of WUBs found so far is about five times that of
UBs, and devising such good WUBs requires much effort and time.
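The definitions of u(A) and au(A) above can be checked on a toy QTDB. In this sketch each transaction is assumed to be a dictionary mapping an item to its utility in that transaction (for instance quantity times unit profit, already multiplied); this encoding, the function names and the data are illustrative assumptions. The function `twu` computes the classical transaction-weighted utilization, which is not discussed in the text and is included only as a simple example of an upper bound that is anti-monotone.

```python
from typing import Dict, Iterable, List

Transaction = Dict[str, int]  # item -> utility of the item in this transaction


def utility(itemset: Iterable[str], db: List[Transaction]) -> int:
    """u(A): sum of u(A, T) over all transactions T containing A."""
    items = set(itemset)
    return sum(sum(t[i] for i in items) for t in db if items <= set(t))


def average_utility(itemset: Iterable[str], db: List[Transaction]) -> float:
    """au(A) = u(A) / length(A)."""
    items = set(itemset)
    return utility(items, db) / len(items)


def twu(itemset: Iterable[str], db: List[Transaction]) -> int:
    """Sum of the total utilities of the transactions containing A."""
    items = set(itemset)
    return sum(sum(t.values()) for t in db if items <= set(t))


D = [
    {"a": 5, "b": 2},
    {"a": 5, "c": 1},
    {"b": 2, "c": 1, "d": 10},
]

print(utility({"b"}, D), utility({"b", "d"}, D))  # 4 12
print(average_utility({"b", "d"}, D))             # 6.0
print(twu({"b"}, D), twu({"b", "d"}, D))          # 20 13
```

With a minimum utility threshold of 10, {b} is a low utility itemset while its superset {b, d} is a high utility one, so u does not satisfy the AM property. In contrast, twu({b}) = 20 bounds the utility of every superset of {b} in this DB, so pruning on this bound is safe, although the bound is loose.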
For the more general problems of high utility sequence mining (HUSM) on
a QSDB D, because each sequence α may appear multiple times in an input
quantitative sequence (IQS) Ψ of D, there are many ways to define the utility
of α in Ψ. There are two popular kinds of such utilities, denoted as umax(α, Ψ)
and umin(α, Ψ), that are respectively defined as the maximum and minimum
values among the utilities of the occurrences of α in Ψ. Then, umax(α) and umin(α) are
respectively the sums of umax(α, Ψ) and umin(α, Ψ) over all IQSs Ψ
containing α. Similarly, there are two other kinds of utilities named aumax(α)
and aumin(α) that are respectively defined as umax(α) and umin(α) divided
by the length of α. For the first utility umax (respectively the third utility aumax),
finding good UBs [16] (respectively WUBs [36]) took about 10 years (respectively 8 years).
Furthermore, devising such UBs (for example on umax) without a strict mathematical
proof may lead to inexactness in the corresponding algorithms [16].
For the new second utility umin (respectively the fourth utility aumin), the time
needed to devise good UBs (WUBs) decreased significantly, to a single paper (e.g. [38] for
aumin). Thus, from the theoretical results presented in that paper, a natural and
useful question arises: how to propose a generic framework for
the IPM problem for any new interestingness function, together with a general
and simple method to design (W)UBs on such functions in weeks instead
of years? In more detail, given a QSDB D, a new interestingness function itr
that may not satisfy AM, and a user-specified minimum interestingness threshold
mi, the corresponding IPM problem is to mine the set {α | itr(α) ≥ mi} of all
highly interesting patterns. The first question is how to quickly devise (W)UBs
on itr that are as tight as possible and have anti-monotone-like
properties. The goal of these requirements is to significantly reduce the search
space. The second question is how to reduce checking the anti-monotone-like
properties of itr over the whole D to simpler checks in each input quantitative
sequence. Moreover, these theoretical results must be rigorously proven. Addressing
these questions would meet the main challenge of significantly reducing
the time for devising good (W)UBs on itr.
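The occurrence-based utilities umax and umin defined in this section can be illustrated by brute-force enumeration. To keep the sketch short, it makes two simplifying assumptions that are not in the text: the pattern α is a sequence of single items, and each quantitative itemset of an input quantitative sequence Ψ is a dictionary mapping an item to its utility. All names and data are hypothetical.

```python
from typing import Dict, List

QItemset = Dict[str, int]   # item -> utility of the item in this itemset
QSequence = List[QItemset]  # an input quantitative sequence


def occurrence_utilities(alpha: List[str], psi: QSequence) -> List[int]:
    """Utilities of all occurrences of alpha in psi (strictly increasing positions)."""
    results: List[int] = []

    def extend(k: int, start: int, total: int) -> None:
        if k == len(alpha):
            results.append(total)
            return
        for i in range(start, len(psi)):
            if alpha[k] in psi[i]:
                extend(k + 1, i + 1, total + psi[i][alpha[k]])

    extend(0, 0, 0)
    return results


def u_max(alpha: List[str], db: List[QSequence]) -> int:
    per_sequence = (occurrence_utilities(alpha, psi) for psi in db)
    return sum(max(us) for us in per_sequence if us)


def u_min(alpha: List[str], db: List[QSequence]) -> int:
    per_sequence = (occurrence_utilities(alpha, psi) for psi in db)
    return sum(min(us) for us in per_sequence if us)


D = [
    [{"a": 2, "b": 1}, {"b": 4}, {"a": 6}, {"b": 3}],
    [{"a": 1}, {"c": 5}, {"b": 2}],
    [{"b": 7}, {"a": 1}],  # contains no occurrence of <a, b>
]
alpha = ["a", "b"]

print(occurrence_utilities(alpha, D[0]))                     # [6, 5, 9]
print(u_max(alpha, D), u_min(alpha, D))                      # 12 8
print(u_max(alpha, D) / len(alpha), u_min(alpha, D) / len(alpha))  # aumax, aumin
```

Enumerating every occurrence is exponential in the worst case, which is one reason why tight (W)UBs that can be computed per input quantitative sequence matter in practice.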
8 Conclusion
The field of pattern mining has been rapidly changing. This paper has provided
an overview of six key challenges, each identified by a researcher from the field.
References
1. Abeysinghe, R., Cui, L.: Query-constraint-based mining of association rules for
exploratory analysis of clinical datasets in the national sleep research resource.
BMC Med. Inform. Decis. Making 18(2), 58 (2018)
2. Alsallakh, B., Micallef, L., Aigner, W., Hauser, H., Miksch, S., Rodgers, P.: The
state-of-the-art of set visualization. In: Computer Graphics Forum, vol. 35, pp.
234–260. Wiley Online Library (2016)
3. Bhuiyan, M., Hasan, M.A.: Interactive knowledge discovery from hidden data
through sampling of frequent patterns. Statist. Anal. Data Mining ASA Data Sci.
J. 9(4), 205–229 (2016)
4. Bhuiyan, M., Mukhopadhyay, S., Hasan, M.A.: Interactive pattern mining on hid-
den data: a sampling-based solution. In: Proceedings of the 21st ACM International
Conference on Information and Knowledge Management, pp. 95–104 (2012)
5. Chand, C., Thakkar, A., Ganatra, A.: Target oriented sequential pattern mining
using recency and monetary constraints. Int. J. Comput. App. 45(10), 12–18 (2012)
6. Chen, M.S., Park, J.S., Yu, P.S.: Efficient data mining for path traversal patterns.
IEEE Trans. Knowl. Data Eng. 10(2), 209–221 (1998)
7. Chiang, D.A., Wang, Y.F., Lee, S.L., Lin, C.J.: Goal-oriented sequential pattern
for network banking churn analysis. Expert Syst. App. 25(3), 293–302 (2003)
8. Chueh, H.E., et al.: Mining target-oriented sequential patterns with time-intervals.
Int. J. Comput. Sci. Inf. Technol. 2(4), 113–123 (2010)
9. Djenouri, Y., Comuzzi, M.: Combining apriori heuristic and bio-inspired algorithms
for solving the frequent itemsets mining problem. Inf. Sci. 420, 1–15 (2017)
10. Djenouri, Y., Djenouri, D., Belhadi, A., Fournier-Viger, P., Lin, J.C.-W.: A new
framework for metaheuristic-based frequent itemset mining. Appl. Intell. 48(12),
4775–4791 (2018). https://doi.org/10.1007/s10489-018-1245-8
11. Dzyuba, V., Leeuwen, M.v., Nijssen, S., De Raedt, L.: Interactive learning of pat-
tern rankings. Int. J. Artif. Intell. Tools 23(06), 1460026 (2014)
12. Fournier-Viger, P., Cheng, C., Cheng, Z., Lin, J.C., Selmaoui-Folcher, N.: Mining
significant trend sequences in dynamic attributed graphs. Knowl. Based Syst. 182,
104797 (2019)
13. Fournier-Viger, P., et al.: A survey of pattern mining in dynamic graphs. Wiley
Interdiscip. Rev. Data Min. Knowl. Discov. 10(6), e1372 (2020)
14. Fournier-Viger, P., Mwamikazi, E., Gueniche, T., Faghihi, U.: MEIT: memory effi-
cient itemset tree for targeted association rule mining. In: Motoda, H., Wu, Z.,
Cao, L., Zaiane, O., Yao, M., Wang, W. (eds.) ADMA 2013. LNCS (LNAI), vol.
8347, pp. 95–106. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-53917-6_9
15. Gan, W., et al.: A survey of utility-oriented pattern mining. IEEE Trans. Knowl.
Data Eng. 33(4), 1306–1327 (2021)
16. Gan, W., et al.: ProUM: projection-based utility mining on sequence data. Inf. Sci.
513, 222–240 (2020)
17. Jentner, W., Keim, D.A.: Visualization and visual analytic techniques for patterns.
In: High-Utility Pattern Mining, pp. 303–337 (2019)
18. Jiang, C., Coenen, F., Zito, M.: A survey of frequent subgraph mining algorithms.
Knowl. Eng. Rev. 28, 75–105 (2013)
19. Koh, J.-L., Shieh, S.-F.: An efficient approach for maintaining association rules
based on adjusting FP-tree structures. In: Lee, Y.J., Li, J., Whang, K.-Y., Lee, D.
(eds.) DASFAA 2004. LNCS, vol. 2973, pp. 417–424. Springer, Heidelberg (2004).
https://doi.org/10.1007/978-3-540-24571-1_38
20. Kubat, M., Hafez, A., Raghavan, V.V., Lekkala, J.R., Chen, W.K.: Itemset trees
for targeted association querying. IEEE Trans. Knowl. Data Eng. 15(6), 1522–1534
(2003)
21. Lam, H.T., Morchen, F., Fradkin, D., Calders, T.: Mining compressing sequential
patterns. Statist. Anal. Data Mining ASA Data Sci. J. 7(1), 34–52 (2014)
22. Li, X., Li, J., Fournier-Viger, P., Nawaz, M.S., Yao, J., Lin, J.C.W.: Mining pro-
ductive itemsets in dynamic databases. IEEE Access 8, 140122–140144 (2020)
23. Lin, C.W., Hong, T.P., Lu, W.H.: The pre-FUFP algorithm for incremental mining.
Expert Syst. App. 36(5), 9498–9505 (2009)
24. Lin, J.C.W., Yang, L., Fournier-Viger, P., Hong, T.P., Voznak, M.: A binary PSO
approach to mine high-utility itemsets. Soft Comput. 21(17), 5103–5121 (2017)
25. Liu, M., Qu, J.: Mining high utility itemsets without candidate generation. In: Pro-
ceedings of the 21st ACM International Conference on Information and Knowledge
Management, pp. 55–64 (2012)
26. Miao, J., Wan, S., Gan, W., Sun, J., Chen, J.: TargetUM: targeted high-utility
itemset querying. arXiv preprint arXiv:2111.00309 (2021)
27. Min, F., Zhang, Z.H., Zhai, W.J., Shen, R.P.: Frequent pattern discovery with
tri-partition alphabets. Inf. Sci. 507, 715–732 (2020)
28. Ouarem, O., Nouioua, F., Fournier-Viger, P.: Mining episode rules from event
sequences under non-overlapping frequency. In: Fujita, H., Selamat, A., Lin, J.C.-
W., Ali, M. (eds.) IEA/AIE 2021. LNCS (LNAI), vol. 12798, pp. 73–85. Springer,
Cham (2021). https://doi.org/10.1007/978-3-030-79457-6_7
29. Qu, W., Yan, D., Guo, G., Wang, X., Zou, L., Zhou, Y.: Parallel mining of frequent
subtree patterns. In: Qin, L., et al. (eds.) SFDI/LSGDA 2020. CCIS, vol. 1281,
pp. 18–32. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61133-0_2
30. Shabtay, L., Yaari, R., Dattner, I.: A guided FP-growth algorithm for multitude-
targeted mining of big data. arXiv preprint arXiv:1803.06632 (2018)
31. Shelokar, P., Quirin, A., Cordón, O.: Three-objective subgraph mining using
multiobjective evolutionary programming. J. Comput. Syst. Sci. 80(1), 16–26 (2014)
32. Shin, S.J., Lee, D.S., Lee, W.S.: CP-tree: an adaptive synopsis structure for com-
pressing frequent itemsets over online data streams. Inf. Sci. 278, 559–576 (2014)
33. Song, W., Huang, C.: Discovering high utility itemsets based on the artificial bee
colony algorithm. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M.,
Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 3–14. Springer,
Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_1
34. Song, W., Huang, C.: Mining high utility itemsets using bio-inspired algorithms: a
diverse optimal value framework. IEEE Access 6, 19568–19582 (2018)
35. Song, W., Zheng, C., Huang, C., Liu, L.: Heuristically mining the top-k high-utility
itemsets with cross-entropy optimization. Appl. Intell. 1–16 (2021).
https://doi.org/10.1007/s10489-021-02576-z
36. Truong, T., Duong, H., Le, B., Fournier-Viger, P.: EHAUSM: an efficient algorithm
for high average utility sequence mining. Inf. Sci. 515, 302–323 (2020)
37. Truong, T., Duong, H., Le, B., Fournier-Viger, P., Yun, U.: Efficient high average-
utility itemset mining using novel vertical weak upper-bounds. Knowl. Based Syst.
183, 104847 (2019)
38. Truong, T., Duong, H., Le, B., Fournier-Viger, P., Yun, U.: Frequent high mini-
mum average utility sequence mining with constraints in dynamic databases using
efficient pruning strategies. Appl. Intell. 52, 1–23 (2021)
39. Wu, Y., Shen, C., Jiang, H., Wu, X.: Strict pattern matching under non-overlapping
condition. Sci. China Inf. Sci. 60(1), 012101 (2017)
40. Wu, Y., Tong, Y., Zhu, X., Wu, X.: NOSEP: nonoverlapping sequence pattern
mining with gap constraints. IEEE Trans. Cybern. 48(10), 2809–2822 (2018)
41. Wu, Y., Wang, Y., Li, Y., Zhu, X., Wu, X.: Self-adaptive nonoverlapping contrast
sequential pattern mining. IEEE Trans. Cybern. (2021)
42. Xie, F., Wu, X., Zhu, X.: Efficient sequential pattern mining with wildcards for
keyphrase extraction. Knowl. Based Syst. 115, 27–39 (2017)
43. Yao, H., Hamilton, H.J., Butz, C.J.: A foundational approach to mining itemset
utilities from databases. In: Proceedings of the 2004 SIAM International Confer-
ence on Data Mining, pp. 482–486. SIAM (2004)
44. Zhang, C., Du, Z., Dai, Q., Gan, W., Weng, J., Yu, P.S.: TUSQ: targeted high-
utility sequence querying. arXiv preprint arXiv:2103.16615 (2021)
45. Zhang, L., Fu, G., Cheng, F., Qiu, J., Su, Y.: A multi-objective evolutionary app-
roach for mining frequent and high utility itemsets. Appl. Soft Comput. 62, 974–
986 (2018)