
Article · December 2020 · DOI: 10.2174/2666255814666201230115148


Recent Advances in Computer Science and Communications, XXXX, XX, 1-7 1

RESEARCH ARTICLE

Predicting Buying Behavior using CPT+: A Case Study of an E-commerce Company

Nguyen Thon Da (a,*), Tan Hanh (b) and Ho Trung Thanh (c)

(a) Faculty of Information Systems, University of Economics and Law, VNU-HCM, Ho Chi Minh City, Vietnam; (b) Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam; (c) Faculty of Information Systems, University of Economics and Law, VNU-HCM, Ho Chi Minh City, Vietnam

ARTICLE HISTORY

Received: May 04, 2020
Revised: September 16, 2020
Accepted: October 14, 2020

DOI: 10.2174/2666255814666201230115148

Abstract: Predicting the buying behaviour of customers on e-commerce websites has recently become a critical issue in business management: it helps merchants understand the tendencies of consumers in choosing and buying products. Predicting buying behaviour on online systems has become increasingly common; although it is a challenging task, it is an exciting and active research topic. This article proposes a predictive model for buying behaviour on online systems. The model can be represented as a two-stage process: first, a sequence database is built from a shopping cart; second, prediction is performed using CPT+, an improved version of CPT (Compact Prediction Tree). The main contribution of this paper is a solution for predicting buying behaviour in the e-commerce context (a case study of an e-commerce company). The core prediction is based on sequence prediction, in particular CPT+.

Keywords: Sequence prediction, customer behaviour prediction, compact prediction tree, CPT, CPT+, e-commerce.

1. INTRODUCTION

Predicting customer behaviour is the key to the business success of an enterprise. Therefore, companies must create analytic tools to forecast customer behaviour, especially the next potential products that customers intend to buy. Doing this could benefit enterprises by selling more products, increase income for employers, and contribute to the sustainable development of companies. One helpful way to solve this problem is data mining.

Data mining is the discovery of structures and patterns in large and complex data sets [1]. There are seven common strategies that companies are interested in: (1) market segmentation, which could help to identify the common characteristics of customers who buy the same products from your company; (2) customer churn, which could help to predict which customers are likely to leave your company and go to a competitor; (3) fraud detection, which could help to identify which transactions are most likely to be fraudulent; (4) direct marketing, which could help to determine which prospects should be included in a mailing list to obtain the highest response rate; (5) interactive marketing, which could help to predict what each visitor accessing a Web site is most likely interested in; (6) market basket analysis, which helps to understand what products or services are commonly purchased together, e.g., beer and diapers; and (7) trend analysis, which could help to reveal the difference between a typical customer this month and last month.

Among the seven mentioned strategies, trend analysis has been a hot topic in recent years. In the scope of this article, a way to predict buying behaviour (e.g., to predict the next products bought by customers from the shopping cart of an e-commerce company) is introduced.

For instance, if a customer has chosen some products Px, Py, Pz, in that order, one may want to predict the next product that will be selected by that user, in order to help merchants improve the performance of their business. Various models have been proposed for making such predictions. However, CPT+ (Compact Prediction Tree Plus) performs better than other methods. It is a model that uses sequential data mining to predict the next items in a sequence database, and it is an improvement of CPT (Compact Prediction Tree) [2].

The rest of this paper is organised as follows. Sections 2, 3, 4, and 5 respectively present the problem definition and related work, the CPT+ algorithm, the experimental results, and the conclusion.

2. RELATED WORK

Many models have been proposed to predict the next item, such as Association Rules [3], Markov models, and so on.
*Address correspondence to this author at the Faculty of Information
Systems, University of Economics and Law, VNU-HCM, Ho Chi Minh,
Vietnam; E-mail: [email protected]

2666-2558/XX $65.00+.00 © XXXX Bentham Science Publishers
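To make the rule-based family of next-item predictors concrete, the support and confidence of a single sequential rule X⇒Y (in the sense recalled in Section 2.1) can be computed with a short sketch. This is an illustrative toy, not the RuleGrowth algorithm itself; the sequence database and all names are invented for the example, and itemsets are simplified to single items.

```python
def rule_support_confidence(db, X, Y):
    """Support and confidence of the sequential rule X => Y:
    all items of X occur in a sequence, and all items of Y occur
    afterwards in the same sequence."""
    def matches(seq, X, Y):
        try:
            # last position at which every item of X has appeared
            cut = max(seq.index(x) for x in X)
        except ValueError:        # some item of X is absent
            return False
        rest = seq[cut + 1:]
        return all(y in rest for y in Y)

    ante = sum(1 for s in db if all(x in s for x in X))   # sequences matching X
    both = sum(1 for s in db if matches(s, X, Y))         # sequences matching the rule
    support = both / len(db)
    confidence = both / ante if ante else 0.0
    return support, confidence

db = [["A", "B", "C"], ["A", "B"], ["A", "B", "D"], ["B", "C"], ["E", "A", "B", "A"]]
print(rule_support_confidence(db, ["A"], ["C"]))   # prints (0.2, 0.25)
```

On this toy database the rule A⇒C holds in one of five sequences (support 0.2) and in one of the four sequences containing A (confidence 0.25).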



The research [4] proposed a product prediction model using a decision tree classifier and a main classification method [5].

A typical paper used Association Rules: the authors of [6] utilised the association rule model, in particular the Apriori algorithm, for product prediction; besides, clustering with the k-means algorithm was used. The data collected for that study consist of 42 products with two main attributes, the time in the week and the required quantity, and a prediction of the range of products in a certain period is proposed. However, there are a few drawbacks to this candidate-and-test approach: many valid candidates could be ignored and, besides, a large number of the generated candidates are of no interest.

According to the authors of [7], models using Markov chains have a significant limitation: although such chains are widely used for sequence prediction, they assume that sequences are Markovian.

All the approaches mentioned above learn lossy models from the training sequences. Therefore, they do not utilise all the information contained in sequence databases to make predictions.

Table 1. Comparison among models for sequence prediction.

  Prediction Models                     Loss of Information   Lossless
  Markov                                X
  Clustering                            X
  Rules                                 X
  CPT                                                         X
  CPT+                                                        X
  Hybrid (Markov, Rules)                X
  Hybrid (Markov, Clustering)           X
  Hybrid (Clustering, Markov, Rules)    X

The two approaches that are more accurate than the others are CPT and CPT+.

2.1. Association Rules Mining

According to RuleGrowth [8], a sequential rule X⇒Y is defined as a relationship between two itemsets X, Y ⊆ I such that X∩Y = Ø and X, Y are not empty. The interpretation of a rule X⇒Y is that if the items of X occur in some itemsets of a sequence, the items of Y will occur in some itemsets afterward in the same sequence. The main problem of mining sequential rules is finding all valid sequential rules in a sequence database, i.e., those whose support and confidence are respectively higher than or equal to some user-defined thresholds. Typical works on this approach have been carried out in [8-10].

2.2. Sequence Prediction

The problem of sequence prediction is to find the next items in a sequence [2]. There are many methods to solve this problem, such as neural networks [11], Prediction by Partial Matching (PPM) [12], Markov-based models such as Dependency Graph (DG) [13], All-k-order Markov (AKOM) [14], Transition Directed Acyclic Graph (TDAG) [15], PST [17] and CTW [18], and compression-based approaches such as LZ78 [16] and Active LeZi [19].

Recently, a state-of-the-art approach called CPT (Compact Prediction Tree) has been proposed and shown to be an effective solution for making predictions compared with many other models. CPT compresses the training sequences without information loss by exploiting similarities between subsequences. It is more accurate than state-of-the-art models such as PPM, DG and AKOM on various real datasets. However, its spatial complexity and its higher prediction time are significant limitations of CPT, as shown in Table 1.

3. MATERIALS AND METHOD

An approach to help merchants understand the trend of customers is proposed. In particular, if a customer has chosen some products Px, Py, Pz, in that order, one may want to predict the next product that will be chosen by that user, in order to help merchants improve the performance of their business. The data analysed in this paper come from the buying history of customers on an e-commerce website. In particular, data from a shopping cart are converted into a sequence database to make predictions.

3.1. Background

3.1.1. CPT (Compact Prediction Tree)

The CPT's indexing mechanism [7] helps to quickly collect the relevant information for making a prediction. Furthermore, CPT provides two strategies that respectively reduce the size of CPT and increase prediction accuracy. The main idea of CPT is as follows:

(1) CPT uses a tree-based structure to store the training sequences without loss of information. In other words, CPT is a tree-based structure for storing the training sequences together with an indexing mechanism, and each sequence is added one after the other to this model.

(2) Predictions are made by measuring the similarity of a sequence to the training sequences.

For example, Fig. 1 illustrates the creation of the three structures by inserting, one by one, the five sequences s1 = <A, B, C>, s2 = <A, B>, s3 = <A, B, D>, s4 = <B, C> and s5 = <E, A, B, A>, where the alphabet Z = {A, B, C, D, E} is used.

To train sequences, CPT comprises three different structures: a Prediction Tree, a Lookup Table and an Inverted Index. Although CPT is useful for making predictions, it still has a few drawbacks. Firstly, its noise avoidance strategy is not yet fully effective; secondly, it takes much time to make predictions. Because of this, a better model was proposed, namely CPT+ [7], which has many useful strategies for prediction.
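The three structures (Prediction Tree, Lookup Table and Inverted Index) and the similarity-based prediction described in Section 3.1.1 can be sketched in a few lines. The following is a simplified illustration of the idea only; the class and method names are ours, and the real CPT/CPT+ implementations (e.g., in the SPMF library) are considerably more refined:

```python
from collections import defaultdict

class _Node:
    def __init__(self, item=None, parent=None):
        self.item, self.parent, self.children = item, parent, {}

class SimpleCPT:
    """Toy Compact Prediction Tree: a prediction tree (trie), a lookup
    table (sequence id -> last node) and an inverted index
    (item -> ids of the training sequences containing it)."""
    def __init__(self):
        self.root = _Node()
        self.lookup = {}
        self.inverted = defaultdict(set)

    def train(self, sid, sequence):
        node = self.root
        for item in sequence:               # insert the sequence into the trie
            node = node.children.setdefault(item, _Node(item, node))
            self.inverted[item].add(sid)
        self.lookup[sid] = node             # remember where the sequence ends

    def _sequence(self, sid):
        items, node = [], self.lookup[sid]
        while node.parent is not None:      # walk leaf -> root to rebuild it
            items.append(node.item)
            node = node.parent
        return items[::-1]

    def predict(self, suffix):
        # similar sequences: training sequences containing every suffix item
        sets = [self.inverted[i] for i in suffix if i in self.inverted]
        if not sets:
            return None
        counts = defaultdict(int)
        for sid in set.intersection(*sets):
            seq, j = self._sequence(sid), 0
            for pos, item in enumerate(seq):    # match the suffix in order
                if j < len(suffix) and item == suffix[j]:
                    j += 1
                    if j == len(suffix):        # count the items that follow
                        for nxt in seq[pos + 1:]:
                            counts[nxt] += 1
                        break
        return max(counts, key=counts.get) if counts else None

cpt = SimpleCPT()
for sid, s in enumerate([["A", "B", "C"], ["A", "B"], ["A", "B", "D"],
                         ["B", "C"], ["E", "A", "B", "A"]]):
    cpt.train(sid, s)
print(cpt.predict(["A"]))   # prints B
```

Training the five example sequences s1-s5 and asking for the item most likely to follow <A> returns B, since B follows A in four of the training sequences.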
Fig. (1). Building of the Compact Prediction Tree [7]. (A higher resolution / colour version of this figure is available in the electronic copy of the article).

3.1.2. CPT+ (Improved Compact Prediction Tree)

CPT+ provides three major effective strategies: two optimisations to reduce the size of the tree and one to improve noise reduction.

The first two strategies are called Compressing Frequent Substrings and Compressing Simple Branches, respectively:

(1) Compressing Frequent Substrings (CFS) is applied during training: it identifies frequent substrings in the training sequences and replaces these substrings with new items. The substrings are discovered with a modified version of the PrefixSpan algorithm. In terms of training time, CFS adds a non-negligible cost for discovering frequent substrings; in terms of prediction time, symbols are uncompressed on the fly in O(1) time. The additional space complexity is O(m), where m is the number of frequent substrings.

(2) Compressing Simple Branches (CSB) is a second optimisation to reduce the size of the tree. A simple branch is a branch in which all nodes have a single child; each simple branch is replaced by a single node representing the whole branch. This step is very fast: after building the tree, we only need to traverse the stems from the bottom using the lookup table.

Fig. 2 shows the prediction tree obtained by applying the CFS and CSB strategies.

Fig. (2). The CFS and CSB strategies [7]. (A higher resolution / colour version of this figure is available in the electronic copy of the article).

The third strategy, Prediction with improved Noise Reduction (PNR), relies on the hypothesis that noise in training sequences consists of low-frequency items, where an item's frequency is defined as the number of training sequences containing it. Therefore, during the prediction process, PNR removes only low-frequency items; this reduction is beneficial because it positively impacts the run time. The three leading contributions brought by PNR are that it requires only a minimum number of updates on the Count Table to perform a prediction, that it defines noise based on the frequency of items, and that it defines noise proportionally to the sequence length [7]. As mentioned, CPT removes items from the sequence to be predicted in order to be more noise-tolerant. The improvement of PNR is that it removes only the less frequent symbols from sequences, assuming that they are more likely to be noise; it considers a minimum number of sequences to make a prediction; and it adds a new parameter, the noise ratio (e.g., 15%), to determine how many items should be removed from sequences (e.g., the 15% least frequent items). The amount of noise is therefore assumed to be proportional to the length of the sequences. Fig. 3 illustrates the PNR strategy.

Fig. (3). The PNR strategy [7].

3.2. Proposed Method: Predicting Buying Behaviour

A possible solution to make better predictions is to use CPT+ on sequence databases. Such databases are

generated by using a procedure. The procedure comprises the following main steps.

Step 1: Collect the data related to the shopping cart from an e-commerce website.

Step 2: Build a procedure to convert the data from the shopping cart into a sequence database.

Information about the IDs of customers and products is given in Table 2.

Table 2. An example of a simple shopping cart.

  CusID   ProdID   Buying_Time
  100     200      4-May-2019, 9:17:37 AM
  101     200      4-May-2019, 11:27:18 AM
  102     201      4-May-2019, 1:27:88 PM
  103     203      4-May-2019, 3:20:28 PM
  103     200      4-May-2019, 5:20:28 PM
  102     203      4-May-2019, 3:48:21 PM
  100     204      4-May-2019, 3:20:28 PM
  103     201      5-May-2019, 8:40:35 PM
  101     201      5-May-2019, 1:18:52 PM
  102     205      6-May-2019, 9:27:19 PM

An expected sequence database containing four sequences will be generated as follows.

Sequence 1 (corresponds to CusID = 100): 200 -1 204 -1 -2
Sequence 2 (corresponds to CusID = 101): 200 -1 201 -1 -2
Sequence 3 (corresponds to CusID = 102): 201 -1 203 -1 205 -1 -2
Sequence 4 (corresponds to CusID = 103): 203 -1 200 -1 201 -1 -2

The value "-1" is used to separate consecutive products, and the value "-2" indicates the end of a specific sequence (it is put at the end of each line) in the sequence database. Fig. 4 shows the main pseudocode of the procedure CONVERT_CART_INTO_SDB.

Fig. (4). The CONVERT_CART_INTO_SDB procedure.

Note that every customer has a group of consecutive buying activities. This means that each customer produces a distinct sequence in the sequence database, so that every sequence contains the consecutive products bought by a particular customer. The desired result of the procedure is a new dataset that is used as the input data for predicting the next products bought by customers.

The primary function of this procedure is to build a sequence database from the data retrieved from a shopping cart database. The procedure takes as input two fields of the table tbCart that contain the information relating to customers and their ordered products (Lines 5 to 6). Next, the code from Line 8 to Line 12 ensures that distinct customers are mapped to distinct sequences; in other words, each line terminated by the item "-2" contains the products bought by one customer. Finally, the block of lines from Line 13 to Line 17 produces the output; the obtained result is a sequence database.

Step 3: Utilise the CPT+ algorithm to predict products on this sequence database.

4. EXPERIMENTAL RESULTS

4.1. Datasets

Using the procedure described in Section 3.2, a sequence database called ShopSDB was generated; it is provided at the link below:

https://sourceforge.net/projects/shopsdb/files/ShopSDB.txt

This sequence database consists of 2984 sequences. It is built from a shopping cart (selling website templates) offered by one of our partners, the Smart Work Corp. (http://smarkwork.vn).

4.2. Evaluation Framework

The framework [20] was used to compare the accuracy of CPT+ against five standard sequence prediction algorithms (All-K-order Markov, DG, LZ78, PPM and TDAG) on the sequence database ShopSDB. The experiments were performed on a laptop computer running Java 8, with an 8-core 4th-generation Intel i7 CPU, 32 GB of RAM and an SSD drive. According to [21], accuracy (Equation 1) is the principal measure to assess the overall performance of a predictor: it is the number of successful predictions divided by the total number of test sequences.

Accuracy = |successes| / |sequences|   (1)

4.3. Experiments

The results illustrated in Fig. (5) indicate that CPT+ yields the highest accuracy among the compared sequence prediction models when trained on the same sequence database ShopSDB. This line graph illustrates the accuracy

of the sequence prediction algorithms. As shown in Fig. (5), on the same dataset (the sequence database ShopSDB), the accuracy of CPT+ is higher than that of all the other algorithms; in particular, it peaks at approximately 97%. The lowest accuracy is obtained by TDAG (about 22%). The second best is DG, with a local accuracy of 94.08%, and the third is Markov1 (approximately 60% accuracy).

Fig. (5). The comparison of sequence prediction models on the same sequence database ShopSDB. (A higher resolution / colour version of this figure is available in the electronic copy of the article).

Thus, CPT+ outperformed state-of-the-art sequence prediction algorithms such as All-K-order Markov, DG, LZ78, PPM and TDAG when running on the same sequence database ShopSDB that we built.

The prediction results produced by CPT+ on the sequence database ShopSDB are shown in Table 3, which depicts the results of product prediction.

For instance, with the sequence <1018, 1010, 945>, the three leading candidates are 947, 949 and 981. In this e-commerce context, when a customer buys the website template with ID 1018 and goes on to buy website templates 1010 and 945, he is predicted to choose website template 947, 949 or 981 next. Among these predicted products, website template 947 is the best possible choice.

Table 3. Next products prediction results.

  Sequence to be Predicted          Number of Results   Top 5 Predicted Products
  <12>                              19                  1011, 355, 398, 981, 919
  <904>                             81                  827, 771, 947, 1011, 919
  <949, 1018>                       127                 947, 941, 919, 904, 50
  <928, 969>                        155                 919, 904, 827, 771, 50
  <928, 969, 772>                   30                  771, 904, 919, 52, 758
  <1018, 1010, 945>                 126                 947, 949, 981, 951, 979
  <1015, 112, 902, 135>             91                  1016, 904, 981, 903, 905
  <3110, 12, 827, 652, 355>         124                 981, 947, 1023, 979, 398
  <9247, 50, 776, 652, 590>         211                 947, 827, 1017, 919, 884
  <1058, 941, 77, 112, 625, 430>    223                 981, 1011, 626, 627, 628
  <735, 736, 737>                   32                  738, 739, 740, 741, 742
  <50, 942, 804>                    62                  1017, 947, 827, 771, 841
  <172, 964, 969>                   127                 947, 50, 842, 1023, 919
  <771, 50, 986>                    158                 928, 861, 855, 950, 827
  <110, 981, 979>                   108                 108, 949, 969, 904, 961
  <1011, 887, 796>                  83                  50, 889, 888, 1018, 827
  <27, 94, 43>                      17                  76, 211, 932, 210, 6927
  <35, 987, 361>                    128                 285, 128, 151, 637, 804
  <55, 627, 804>                    77                  1018, 772, 1011, 1017, 230
  <43, 932, 6927>                   12                  94, 211, 27, 76, 210
  <94, 27, 932>                     23                  43, 76, 211, 210, 6927
  <110, 693, 27>                    59                  59, 981, 969, 949, 961
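The accuracy measure of Equation 1 (Section 4.2) can be wired into a small hold-out evaluation loop. The sketch below is illustrative only: it hides the last item of each test sequence and checks whether a predictor recovers it. A trivial most-frequent-successor model is used here as a stand-in predictor; the actual experiments used the SPMF framework [20] with CPT+, and the toy data below are invented.

```python
from collections import defaultdict

def train_bigram(trainset):
    # toy stand-in predictor: most frequent successor of the last item
    nxt = defaultdict(lambda: defaultdict(int))
    for seq in trainset:
        for a, b in zip(seq, seq[1:]):
            nxt[a][b] += 1
    return nxt

def predict(model, prefix):
    cands = model.get(prefix[-1])
    return max(cands, key=cands.get) if cands else None

def accuracy(model, testset):
    # Equation 1: |successes| / |sequences| -- hide each test sequence's
    # last item and check whether the predictor recovers it
    successes = sum(
        1 for seq in testset
        if len(seq) >= 2 and predict(model, seq[:-1]) == seq[-1]
    )
    return successes / len(testset)

train = [[200, 204], [200, 201], [201, 203, 205], [203, 200, 201]]
test = [[200, 204], [203, 200, 201]]
model = train_bigram(train)
print(accuracy(model, test))   # prints 0.5
```

Here the stand-in predictor recovers the hidden last item in one of the two test sequences, giving an accuracy of 0.5; any predictor exposing the same `predict` interface can be evaluated with the same loop.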



The sequence <3110, 12, 827, 652, 355> shows that five products were bought by a specific customer and, thanks to CPT+, five next products are predicted (981, 947, 1023, 979, 398), where product 981 is the best-predicted product. Following product 981 is 947; the third and fourth predicted ones are 1023 and 979, and finally product 398 is predicted.

Similarly, for the sequence <1058, 941, 77, 112, 625, 430>, the possible website templates that were predicted are 981, 1011, 626, 627 and 628. This means that if a customer buys website templates 1058, 941, 77, 112, 625 and 430, he could next buy website templates such as 981, 1011, and so on.

As shown in Table 3, the products predicted most often are 947 (occurring 6 times in the 22 prediction cases), 981 (5 times), 904 (3 times), 1011 (3 times), 211 (3 times) and 827 (3 times). Thus, customers are likely to choose products such as 947, 981, 904, 1011, 211 and 827 after choosing a sequence of other products bought before. Among the products mentioned above, two are clear favourites: 947 (6 times in the 22 considered cases) and 981 (5 times). In other words, the company should concentrate on investing in products like 947 and 981.

5. DISCUSSION

This work contributed an algorithm to convert a relational database (here, a shopping cart) into a sequence database in order to predict the next products. After a few buying activities of the users, trends in their behaviour can be derived. This is essential for businesses and enterprises that want to build recommender systems aimed at benefiting from product purchases.

However, the data collected in this work came from our partner company. For this reason, the results can only be applied to enterprises whose shopping cart model is similar to that of the Smart Work company.

CONCLUSION

This article proposes an approach for making product predictions in online business. In particular, we built a sequence database from data related to the shopping cart and showed that, on our new dataset (ShopSDB), CPT+ is also useful for making predictions. We presented a procedure to convert the shopping cart into a sequence database and applied CPT+ to next-product prediction in the e-commerce context at an e-commerce company. Experimental results on the same dataset derived from a shopping cart show that CPT+ is, in this case, much better than the other sequence prediction methods (CPT, All-K-Order Markov, DG, LZ78, PPM and TDAG).

CURRENT & FUTURE DEVELOPMENTS

Currently, we are working on improving the execution time of the proposed method. In the future, we are going to carry out another project to further improve the accuracy of CPT+ for buying product prediction by combining various novel approaches. Besides, we also plan to explore product prediction on large-scale systems like Spark [22], Hadoop or cloud-based systems.

LIST OF ABBREVIATIONS

AKOM = All-k-order Markov
CFS = Compressing Frequent Substrings
CPT = Compact Prediction Tree
CPT+ = Compact Prediction Tree Plus
CSB = Compressing Simple Branches
CTW = Context Tree Weighting
CusID = Customer ID
DG = Dependency Graph
LZ78 = J. Ziv and A. Lempel (1978)
PNR = Prediction with improved Noise Reduction
PPM = Prediction by Partial Matching
ProdID = Product ID
PST = Probabilistic Suffix Tree
TDAG = Transition Directed Acyclic Graph
VNU-HCM = Vietnam National University Ho Chi Minh City

CONSENT FOR PUBLICATION

Not applicable.

AVAILABILITY OF DATA AND MATERIALS

Not applicable.

FUNDING

None.

CONFLICT OF INTEREST

The authors declare no conflict of interest, financial or otherwise.

ACKNOWLEDGEMENTS

Declared none.

REFERENCES

[1] P-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Pearson Education: India, 2016.
[2] T. Gueniche, P. Fournier-Viger, and V.S. Tseng, "Compact Prediction Tree: A lossless model for accurate sequence prediction", ADMA, no. 2, pp. 177-188, 2013. http://dx.doi.org/10.1007/978-3-642-53917-6_16
[3] R. Agrawal, and R. Srikant, "Fast algorithms for mining association rules", Proc. 20th Int. Conf. on Very Large Data Bases (VLDB), vol. 1215, 1994, pp. 487-499.
[4] C. Cumby, A. Fano, R. Ghani, and M. Krema, "Predicting customer shopping lists from point-of-sale purchase data",

Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 402-409. http://dx.doi.org/10.1145/1014052.1014098
[5] A. Carlson, C. Cumby, J. Rosen, and D. Roth, "The SNoW learning architecture", Technical Report UIUCDCS, 1999.
[6] R. Ismail, Z. Othman, and A.A. Bakar, "Associative prediction model and clustering for product forecast data", 10th International Conference on Intelligent Systems Design and Applications, 2010, pp. 1459-1464. http://dx.doi.org/10.1109/ISDA.2010.5687116
[7] T. Gueniche, P. Fournier-Viger, R. Raman, and V.S. Tseng, "CPT+: Decreasing the time/space complexity of the Compact Prediction Tree", Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2015, pp. 625-636. http://dx.doi.org/10.1007/978-3-319-18032-8_49
[8] P. Fournier-Viger, R. Nkambou, and V.S-M. Tseng, "RuleGrowth: Mining sequential rules common to several sequences by pattern-growth", Proceedings of the 2011 ACM Symposium on Applied Computing, 2011, pp. 956-961. http://dx.doi.org/10.1145/1982185.1982394
[9] P. Fournier-Viger, U. Faghihi, R. Nkambou, and E.M. Nguifo, "CMRules: Mining sequential rules common to several sequences", Knowl. Base. Syst., vol. 25, no. 1, pp. 63-76, 2012. http://dx.doi.org/10.1016/j.knosys.2011.07.005
[10] P. Fournier-Viger, T. Gueniche, S. Zida, and V.S. Tseng, "ERMiner: Sequential rule mining using equivalence classes", International Symposium on Intelligent Data Analysis, 2014, pp. 108-119. http://dx.doi.org/10.1007/978-3-319-12571-8_10
[11] W. Tian, B. Choi, and V.V. Phoha, "An adaptive web cache access predictor using neural network", International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, 2002, pp. 450-459. http://dx.doi.org/10.1007/3-540-48035-8_44
[12] J. Cleary, and I. Witten, "Data compression using adaptive coding and partial string matching", IEEE Trans. Commun., vol. 32, no. 4, pp. 396-402, 1984. http://dx.doi.org/10.1109/TCOM.1984.1096090
[13] V. Padmanabhan, and J. Mogul, "Using prefetching to improve World Wide Web latency", Comput. Commun., vol. 16, pp. 358-368, 1998.
[14] J. Pitkow, and P. Pirolli, "Mining longest repeating subsequences to predict World Wide Web surfing", Proc. USENIX Symp. on Internet Technologies and Systems, 1999, p. 1.
[15] P. Laird, and R. Saul, "Discrete sequence prediction and its applications", Mach. Learn., vol. 15, no. 1, pp. 43-68, 1994. http://dx.doi.org/10.1007/BF01000408
[16] J. Ziv, and A. Lempel, "Compression of individual sequences via variable-rate coding", IEEE Trans. Inf. Theory, vol. 24, no. 5, pp. 530-536, 1978. http://dx.doi.org/10.1109/TIT.1978.1055934
[17] R. Begleiter, R. El-Yaniv, and G. Yona, "On prediction using variable-order Markov models", J. Artif. Intell. Res., vol. 22, pp. 385-421, 2004. http://dx.doi.org/10.1613/jair.1491
[18] F.M. Willems, Y.M. Shtarkov, and T.J. Tjalkens, "The context-tree weighting method: Basic properties", IEEE Trans. Inf. Theory, vol. 41, no. 3, pp. 653-664, 1995. http://dx.doi.org/10.1109/18.382012
[19] K. Gopalratnam, and D.J. Cook, "Online sequential prediction via incremental parsing: The Active LeZi algorithm", IEEE Intell. Syst., vol. 22, no. 1, pp. 52-58, 2007. http://dx.doi.org/10.1109/MIS.2007.15
[20] P. Fournier-Viger, "The SPMF open-source data mining library version 2", Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2016, pp. 36-40. http://dx.doi.org/10.1007/978-3-319-46131-1_8
[21] T. Gueniche, P. Fournier-Viger, and V.S. Tseng, "Compact Prediction Tree: A lossless model for accurate sequence prediction", International Conference on Advanced Data Mining and Applications, 2013, pp. 177-188. http://dx.doi.org/10.1007/978-3-642-53917-6_16
[22] M. Zaharia, M. Chowdhury, M.J. Franklin, S. Shenker, and I. Stoica, "Spark: Cluster computing with working sets", HotCloud, vol. 10, no. 10, p. 95, 2010.
