Market Basket Analysis in a Multiple Store Environment

Yen-Liang Chen, Kwei Tang, Ren-Jie Shen, Ya-Han Hu
Abstract
Market basket analysis (also known as association-rule mining) is a useful method of discovering customer purchasing
patterns by extracting associations or co-occurrences from stores’ transactional databases. Because the information obtained
from the analysis can be used in forming marketing, sales, service, and operation strategies, it has drawn increased research
interest. The existing methods, however, may fail to discover important purchasing patterns in a multi-store environment,
because of an implicit assumption that products under consideration are on shelf all the time across all stores. In this paper, we
propose a new method to overcome this weakness. Our empirical evaluation shows that the proposed method is computationally
efficient, and that it has an advantage over the traditional method when stores are diverse in size, product mix changes rapidly over
time, and larger numbers of stores and periods are considered.
© 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.dss.2004.04.009
340 Y.-L. Chen et al. / Decision Support Systems 40 (2005) 339–354
and confidence is a measure of the accuracy of the rule, defined as the ratio of the number of transactional records with both X and Y to the number of transactional records with X only. By far, the Apriori algorithm [1] is the best-known algorithm for mining the association rules from a transactional database that satisfy the minimum support and confidence levels specified by users.

Since association rules are useful and easy to understand, there have been many successful business applications, including, for example, finance, telecommunication, marketing, retailing, and web analysis [5]. The method has also attracted increased research interest, and many extensions have been proposed in recent years, including (1) algorithm improvements [6,12,18,21]; (2) fuzzy rules [13,14]; (3) multi-level and generalized rules [7,10]; (4) quantitative rules [20,24,25]; (5) spatial rules [7,15]; (6) inter-transaction rules [19]; (7) interesting rules [4,9]; and (8) temporal association rules [3,16,17]. Brief literature reviews of association rules are given by Chen et al. [8] and Han and Kamber [11].

In today's business world, it is common for a company to have subsidiaries, branches, dealers, or franchises in different geographical locations. For example, Wal-Mart, the largest supermarket chain in the world, has more than 4400 stores worldwide. For a company with multiple stores, discovery of purchasing patterns that may vary over time and exist in all, or in subsets of, stores can be useful in forming marketing, sales, service, and operation strategies at the company, local, and store levels.

There are two main problems in using the existing methods in a multi-store environment. The first is caused by the temporal nature of purchasing patterns. An apparent example is seasonal products. Temporal rules [3,16,17] were developed to overcome the weakness of the static association rules, which either find patterns at a point of time or implicitly assume that the patterns stay the same over time and across stores. A literature review on temporal rules is given by Roddick and Spiliopoulou [22]. In temporal rules, selling periods are considered in computing the support value, where the selling period of a product is defined as the time between its first and last appearances in the transaction records. Furthermore, the common selling period of the products in a product set is used as the base in computing the "temporal support" of the product set. The results of the method may be biased, however, because a product may be on shelf before its first transaction and/or after its last transaction occurs, and a product may also be put on-shelf and taken off-shelf multiple times during the data collection period.

The second problem is associated with finding common association patterns in subsets of stores. Similar to the problem in using existing temporal rules in a multi-store environment, we have to consider the possibility that some products may not be sold in some stores, for example, because of geographical, environmental, or political reasons. This is seemingly related to spatial association rules. However, the focus of spatial rules is on finding association patterns that are related to topological or distance information in, for example, maps, remote sensing or medical imaging data, and VLSI chip layout [23].

To overcome these problems, we develop an Apriori-like algorithm for automatically extracting association rules in a multi-store environment. The format of the rules is similar to that of the traditional rules. However, the rules also contain information on the store (location) and time where the rules hold. The results of the proposed method may contain rules that are applicable to the entire chain without time restriction, or to a subset of stores in specific time intervals. For example, a rule may state: "In the second week of August, customers purchase computers, printers, Internet and wireless phone services jointly in electronics stores near campus." Another example is: "In January, customers purchase cold medicine, humidifiers, coffee, and sunglasses together in supermarkets near skiing resorts." These rules can be used not only for general or localized marketing strategies, but also for product procurement, inventory, and distribution strategies for the entire store chain. Furthermore, we allow an item to have multiple selling time periods; i.e., an item may be put on-shelf and taken off-shelf multiple times. We further assume that different stores can have different product mixes in different time periods. That is, each store can have its own product mix, and the product mix in a store can change dynamically over time.

Because the time and store (location) factors are considered, the rule generation procedure is more complicated than the Apriori algorithm. The simulation results presented in the paper show that the proposed method is computationally efficient and has a significant advantage over the traditional association method when the stores under consideration are diverse in size and have product mixes that change rapidly over time.

The paper is organized as follows. We formally define the problem in Section 2 and in Section 3 propose an algorithm. In Section 4, we compare the results generated from the proposed algorithm and the traditional Apriori algorithm in a simulated multi-store environment. The conclusion is given in Section 5.

2. Problem definition

We consider a market basket database D that contains transactional records from multiple stores over a time period T. Our objective is to extract association rules from the database. For convenience in presentation, the cardinality of a set, say R, is denoted by |R|. Let I = {I1, I2, ..., Ir} be the set of product items included in D, where Ik (1 ≤ k ≤ r) is the identifier for the kth item. Let X be a set of items in I. We refer to X as a k-itemset if |X| = k. Furthermore, a transaction, denoted by s, is a subset of I. We use W(X, D) = {s | s ∈ D ∧ X ⊆ s} to denote the set of transactions in D which contain itemset X.

Definition 1. The support of X, denoted by sup(X, D), is the fraction of transactions containing X in database D; i.e., sup(X, D) = |W(X, D)|/|D|. For a specified support threshold σs, X is a frequent itemset if sup(X, D) ≥ σs.

Note that the definitions of the support and the frequent itemset are those used in the traditional association rules, and, therefore, the store and time information is not considered in determining the support of an itemset.

Let {T1, T2, ..., Tm} be a set of mutually disjoint time intervals (periods) that form a complete partition of T. Furthermore, they are ordered, such that Ti+1 immediately follows Ti for i ≥ 1. Note that the time periods are defined according to the specific needs of the problem, such as 1 h, 6 h, 1 day, 1 week, and so on. Let P = {P1, P2, ..., Pq} be the set of stores, where Pj (1 ≤ j ≤ q) denotes the jth store in the store chain. We assume that each transaction s in D is attached with a timestamp, t, and a store identifier, p, to indicate the store and time that the transaction occurs.

Let Sk ⊆ P and Rk ⊆ T be the sets of the stores and times that item Ik is sold, respectively. We define VIk = Sk × Rk as the context of item Ik; i.e., the set of the combinations of stores and times where item Ik is sold. Furthermore, the context of itemset X, denoted by VX, is the set of the combinations of stores and times that all items in X are sold concurrently. For example, if itemset X consists of two items Ik and Ik', the context of X is given by VX = VIk ∩ VIk'.

Definition 2. Let X be an itemset in I with context VX, and DVX the subset of transactions in D whose timestamps t and store identifiers p satisfy VX. We define the relative support of X with respect to the context VX, denoted by rel_sup(X, DVX), as |W(X, DVX)|/|DVX|. For a given relative support threshold σr, if a frequent itemset X satisfies rel_sup(X, DVX) ≥ σr, we call X a relative-frequent (RF) itemset.

In the last definition, we require that a relative-frequent itemset X be frequent. We add this restriction for two reasons. First, it enables us to preserve the well-known downward-closure property, by which the candidate set of the next phase can be obtained by joining the frequent sets of the preceding phase; this greatly improves the performance of the algorithm. Second, this restriction does not present any real problem to the mining algorithm, because none of the important patterns would be missing if a low σs value is used. Therefore, we prefer using a low σs value. However, it should not be too low, because an itemset that occurs in only a few transactions has no practical significance.

Furthermore, the minimum threshold for the relative support of an itemset is used to determine whether a sufficient percentage of transactions exists in its context to warrant the inclusion of the itemset as a relative-frequent (RF) itemset. Its use and purpose are similar to those of the traditional minimum support threshold. Consequently, we can set its value the same way as we set the traditional minimum support threshold.

Definition 3. Consider two itemsets X and Y. The relative support of X with respect to the context VX∪Y, denoted by rel_sup(X, DVX∪Y), is defined as |W(X, DVX∪Y)|/|DVX∪Y|. The confidence of rule X ⇒ Y,
denoted by conf(X ⇒ Y), is defined as rel_sup(X∪Y, DVX∪Y)/rel_sup(X, DVX∪Y).

The above definition implies that the context of rule X ⇒ Y is VX∪Y; i.e., the base used to compute the confidence of rule X ⇒ Y is the common stores and time periods shared by all the items in X∪Y.

Definition 4. Let Z be an RF itemset, where Z = X∪Y, X ⊆ I, and Y ⊆ I \ X. Given a confidence threshold σc, if conf(X ⇒ Y) ≥ σc, we call X ⇒ Y a store-chain (SC) association rule, and VX∪Y the context of the rule.

Based on Definitions 1 and 4, it is clear that the selection criteria and outputs for the store-chain association rules are different from those of the traditional association rules. For the store-chain rules, the output includes the confidence, the support, and a context indicating the stores and times the rules hold.

It can be shown that the traditional method underestimates the support and the confidence values (a proof is given in Appendix A). Consequently, important purchasing patterns that satisfy the criteria of the SC association rules may not be identified by the traditional association-rule methods.

3. Algorithm

We propose an Apriori-like algorithm for mining the store-chain association rules. The algorithm is outlined in Fig. 1. We first explain the general concept for developing the algorithm and then use five subsections to give detailed information on several key steps of the algorithm.

In describing the algorithm, we use RFk to denote the set of all relative-frequent k-itemsets; Fk, the set of all frequent k-itemsets; and Ck, the set of candidate k-itemsets. Note that, in the traditional Apriori algorithm, a candidate k-itemset must be a combination of frequent (k−1)-itemsets because of the anti-monotone property [1]. Therefore, the Apriori algorithm can generate the candidate itemsets in the kth phase by joining the frequent itemsets of the (k−1)th phase. However, for the SC association rule, a subset of an RF itemset may not be an RF itemset, because the base for calculating the relative support value varies in different phases. Consequently, in the proposed algorithm, we generate candidate itemsets from the frequent itemsets, instead of the RF itemsets. Furthermore, when we use the frequent itemsets to generate the candidate set in the next phase, it still satisfies the anti-monotone property, because we use the same base to compute the supports for all itemsets.

As the first step of the algorithm, we build a table, called the PT table, for each item in I to associate the item with its context (i.e., the stores and times it is sold) and use the table to determine the context of an itemset. The algorithm proceeds in phases, where in the kth phase we generate Fk from Ck and RFk from Fk. In the first phase, we scan the database for the first time and build a two-dimensional table, called the TS table. In this table, the entry at the position corresponding to Ti and Pj, denoted by TS(Ti, Pj), records the number of transactions that occur at store Pj in period Ti. Using this table and the PT table for a given itemset X, we can determine the number of transactions associated with the context of X, i.e., |DVX|. In the kth phase of the algorithm, we first derive Ck and then generate Fk by evaluating the supports, which can be done by scanning the database and removing all infrequent itemsets. Since an RF itemset must be a frequent itemset, we generate RFk from Fk by evaluating the relative supports of the itemsets X in Fk.

In the following subsections, we give detailed descriptions of the key elements of the algorithm, including methods of (1) building the PT table, (2) building the TS table in the first phase, (3) finding RFk, (4) generating candidate itemsets, and (5) generating the store-chain association rules.

3.1. The PT table

The purpose of the PT table is to store efficiently the time and store information for each product item in the database. We use a simple example to illustrate the procedure for constructing the table. Consider the bit matrices in Fig. 2 for items I1, I2, and I3, in which there are six stores and six selling periods, and "1" and "0" indicate, respectively, that the item is or is not for sale in the corresponding store and time. Because an item normally does not switch between on- and off-shelf very frequently in a typical application, we store an item's context information in the PT table instead of the bit matrix in order to conserve storage space. In the PT table, we need only to record
Fig. 4. The method to compute the jth row of the PT table for itemset X.
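Fig. 4 itself is not reproduced here, but the idea it implements, that an itemset's context is the intersection of its items' contexts (VX = VIk ∩ VIk' ∩ ...), can be sketched as follows. The bit matrices and (store, period) values are invented for illustration, and plain sets are used instead of the paper's interval-based PT table:

```python
# Illustrative sketch only: represent an item's context V_Ik as a set of
# (store, period) pairs derived from its bit matrix (Fig. 2 style), and
# compute an itemset's context as the intersection of its items' contexts.

def item_context(bit_matrix):
    """Convert a store-by-period 0/1 shelf matrix into a context set."""
    return {(p, t)
            for p, row in enumerate(bit_matrix, start=1)
            for t, on_shelf in enumerate(row, start=1)
            if on_shelf}

def itemset_context(contexts):
    """Context of an itemset: (store, period) pairs where ALL items are sold."""
    return set.intersection(*contexts)

# Two toy items over 3 stores x 4 periods (made-up data).
I1 = item_context([[1, 1, 1, 0],
                   [0, 1, 1, 1],
                   [0, 0, 0, 0]])
I2 = item_context([[0, 1, 1, 1],
                   [1, 1, 0, 0],
                   [1, 1, 1, 1]])

V_X = itemset_context([I1, I2])
print(sorted(V_X))   # [(1, 2), (1, 3), (2, 2)]
```

The real PT table stores each item's on-shelf intervals per store row, so the intersection is computed interval-by-interval rather than cell-by-cell, but the result is the same context set.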
example of the table is given in Fig. 5. Using the TS and PT tables for itemset X, we can determine the value |DVX| by summing all the values in the entries of the TS table according to the store and time information of the items in X. The process of constructing the table is described in lines 2 through 4 in Fig. 1.

3.3. Relative-frequent itemset

Because an RF itemset must be a frequent itemset, we can generate RFk from Fk by computing the relative supports of the itemsets X in Fk. It is evident that |W(X, DVX)| equals |W(X, D)|, because it is not possible for X to appear in a transaction not in DVX. Further, |DVX| can be obtained from the TS and PT tables of X. As a result, we can find the RF itemsets by first computing the relative supports of all X in Fk and then pruning those itemsets whose relative supports are less than σr.

3.4. Candidate itemsets

As discussed, we generate the candidate itemsets from the frequent itemsets, instead of the RF itemsets, of the last phase. Furthermore, when we use the frequent itemsets to generate the candidate set in the next phase, it still satisfies the anti-monotone property, because we use the same base to compute the supports for all itemsets. We illustrate the computation process by the following example.

Example 1. Suppose there are 15 periods, from T1 to T15, and the numbers of transactions occurring in these 15 periods are 19, 17, 14, 25, 20, 17, 15, 27, 21, 20, 22, 18, 25, 21, and 19, respectively. Assume that the selling periods of product A are from T1 to T10, and that there are 60 transactions containing product A. Furthermore, assume that the selling periods for product B are from T6 to T15, and that 80 transactions include product B. Finally, there are 50 transactions containing both products A and B, and they occur in periods T6 to T10.

In order to compute the supports and the relative supports for itemsets {A}, {B}, and {A, B}, we identify the following values: |W({A}, DV{A})| = |W({A}, D)| = 60, |W({B}, DV{B})| = |W({B}, D)| = 80, and |W({A, B}, DV{A,B})| = |W({A, B}, D)| = 50. Since the base for computing the support is |D| = 300, the supports for the three itemsets are given by sup({A}, D) = 60/300 = 0.2, sup({B}, D) = 80/300 = 0.267, and sup({A, B}, D) = 50/300 = 0.167, respectively. On the other hand, the bases for computing the relative support are |DV{A}| = 195, |DV{B}| = 205, and |DV{A,B}| = 100, respectively, for the three itemsets. As a result, the relative supports are rel_sup({A}, DV{A}) = 60/195 = 0.308, rel_sup({B}, DV{B}) = 80/205 = 0.39, and rel_sup({A, B}, DV{A,B}) = 50/100 = 0.5 for the itemsets.

Suppose we set σs at 0.1 and σr at 0.35. Then, we find that {A}, {B}, and {A, B} are all frequent. Furthermore, {A} is not relative-frequent, but {B} and {A, B} are relative-frequent.

3.5. The store-chain association rules

Having found the RF itemsets, we proceed to calculate the confidence values and to find all the SC association rules. As defined in Definition 3, the confidence value is given by conf(X ⇒ Y) = rel_sup(X∪Y, DVX∪Y)/rel_sup(X, DVX∪Y). If the confidence value exceeds σc, the SC association rule holds.

There is an issue that must be dealt with in computing the confidence value. In the calculation of rel_sup(X∪Y, DVX∪Y)/rel_sup(X, DVX∪Y), we obtain the numerator after the phase of processing X∪Y. But the denominator is still undetermined after that phase, because the length of X is smaller than that of X∪Y, and we process the itemsets of the same length in a single phase. One possible solution to this problem is to add one step after the phase of processing X∪Y. In this new step, we compute the support levels of all subsets of X∪Y under the context VX∪Y; i.e., the support levels of X in database DVX∪Y, where X is a subset of X∪Y.

Fig. 5. An example of the TS table.
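The arithmetic of Example 1 can be checked with a short script; every number below is taken from the example itself, with the context bases |DVX| obtained, as in the paper, by summing per-period transaction counts over each itemset's selling periods:

```python
# Reproducing the arithmetic of Example 1 (all numbers from the text).
counts = [19, 17, 14, 25, 20, 17, 15, 27, 21, 20, 22, 18, 25, 21, 19]  # T1..T15
D = sum(counts)                      # |D| = 300

# Context sizes |D_VX|: sum the TS entries over each itemset's selling periods.
base_A  = sum(counts[0:10])          # A sold in T1..T10  -> 195
base_B  = sum(counts[5:15])          # B sold in T6..T15  -> 205
base_AB = sum(counts[5:10])          # common periods T6..T10 -> 100

sup_A, sup_B, sup_AB = 60 / D, 80 / D, 50 / D
rel_A, rel_B, rel_AB = 60 / base_A, 80 / base_B, 50 / base_AB

print(round(sup_A, 3), round(sup_B, 3), round(sup_AB, 3))   # 0.2 0.267 0.167
print(round(rel_A, 3), round(rel_B, 3), round(rel_AB, 3))   # 0.308 0.39 0.5

# With sigma_s = 0.1 and sigma_r = 0.35: all three itemsets are frequent,
# but only {B} and {A, B} are relative-frequent.
sigma_s, sigma_r = 0.1, 0.35
assert min(sup_A, sup_B, sup_AB) >= sigma_s
assert rel_A < sigma_r and rel_B >= sigma_r and rel_AB >= sigma_r
```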
If the RF itemsets produced in each phase needed another scan to produce the confidence values, the number of scans of the database in this algorithm would be twice that required by the Apriori algorithm. In order to reduce this requirement, we use another method: if Z is an RF itemset found in the kth phase, we compute rel_sup(X, DVZ) in the (k+1)th phase by "hitchhiking," where X is a subset of Z. In other words, in phase k+1, we perform two operations: the first is to find the RF itemsets of length k+1, and the second is to compute the relative supports, such as rel_sup(X, DVZ). All these values are calculated during the same scan of the database. Consequently, the proposed method requires only one more scan than the Apriori algorithm, which is needed to obtain the confidence values after the RF itemsets of the last phase are produced. This process is included as line 11 in the algorithm, and in Fig. 6, we give the process of computing rel_sup(X, DVZ) for all subsets X of Z.

In order to compute rel_sup(X, DVZ) for all subsets X of Z, we must enumerate all the subsets, X, of each RF itemset, Z, of the previous phase; if the length of the RF itemset is k, the number of subsets is 2^k − 2. Because each Z has its own PT table (built in line 8 of Fig. 1), every time a transaction is read in, after computing the supports of the candidates in Ck, we need to check whether the transaction falls within the PT tables of the RF itemsets Z. If not, the transaction does not happen under the context VZ, and it can be ignored. On the other hand, if the answer is positive, the transaction happens under the context VZ, and, as a result, we need to check whether the transaction includes any subset X of Z. This enables us to determine the support levels of all the subsets X of Z under the context VZ.

For example, suppose that two RF itemsets are generated in the third phase: {A, B, C} and {C, D, E}. In the fourth phase, we build the PT tables for all the RF itemsets in RF3. When a transaction is read, we need to check whether it includes any candidates in C4, as well as whether its time and store combination is in the contexts of {A, B, C} or {C, D, E}. If the time and store combination of the transaction does not conform to the context of {A, B, C}, we need to check whether it conforms to that of {C, D, E}. If it does, we proceed to check whether the transaction includes any subsets of {C, D, E}: {C}, {D}, {E}, {C, D}, {C, E}, and {D, E}. If it does, the counters of all the matching subsets are increased by one.

Finally, line 14 in Fig. 1 shows the step for generating the store-chain association rule X ⇒ Y, where X∪Y is in RFk−1. It is not difficult to compute the confidence of the rule, i.e., rel_sup(X∪Y, DVX∪Y)/rel_sup(X, DVX∪Y), because rel_sup(X∪Y, DVX∪Y) has already been found in the previous phase and rel_sup(X, DVX∪Y) is found in the current phase.

3.6. Complexity analysis

In this section, we analyze the time complexity and the memory space complexity of the algorithm. Let m be the number of items, n the number of transactions in the database, and l the number of items in a transaction. Further, let x denote the largest value of |Ck|. Note that, although |Ck| can theoretically be as large as O(m^k), |Ck| is very unlikely to be larger than O(m^2) in practice. This is because, in an Apriori-like algorithm [1,2,6,20], C2 usually has the largest size among all candidate sets. We discuss the time complexities of the steps of the Apriori_TP algorithm separately, as well as the total time complexity of the algorithm, as follows.

1. In step 1, we construct the PT table for each item. To produce the table for an item, its bit matrix with |P| rows and |T| columns needs to be linearly scanned and processed. Thus, the time for step 1 is O(m·|P|·|T|).
2. In steps 2 to 4, two operations are performed: (1) compute the supports of all itemsets in C1, and (2) construct the TS table. Since the first operation requires a linear scan of all the items in every transaction, its time is O(n·l). The second operation requires time O(n), because we examine the attached time and store identifier of each transaction. As a result, the total time for the three steps is O(n·l).
3. Step 5 is for determining F1 by examining the support of every itemset in C1. Since C1 has n itemsets, the time needed for the step is O(n).
4. There is a loop from steps 6 to 18. The time complexities of the steps in iteration k of the loop are discussed as follows.
4.1. In step 7, we generate Ck. Consequently, the required time is O(|Ck|). Because we assume O(|Ck|) ≤ O(x), the time is O(x).
4.2. In step 8, we build a PT table for each itemset z in RFk−1. We need k−2 merging operations for this step, because the k−1 PT tables of the individual items in z need to be merged. Because each merging operation can be done in time O(|P|·|T|), the total time for step 8 is O(|RFk−1|·k·|P|·|T|). Since RFk−1 ⊆ Ck−1, we have O(|RFk−1|) ≤ O(x), and the total time becomes O(x·k·|P|·|T|).
4.3. In steps 9 to 11, there are two tasks: (1) compute the supports of all itemsets in Ck, and (2) compute the supports of all subsets of itemsets in RFk−1. The time required for the first task is O(n·l·|Ck|), because it can be done by first reading every transaction and then adding the counts to the corresponding itemsets. In the second task, we add the counts to all subsets of itemsets in RFk−1 rather than to all itemsets in RFk−1. Therefore, it can be done by first reading every transaction and every itemset in RFk−1, generating all subsets, and finally adding the counts. Performing these operations requires time O(n·l·|RFk−1|·2^(k−1)). Since O(|RFk−1|) ≤ O(|Ck−1|), the time required for this part is O(n·l·x·2^(k−1)).
4.4. Step 12 is used to generate Fk from Ck. Since the support of each itemset in Ck must be checked to be no less than σs, the time is O(|Ck|) = O(x).
4.5. Step 13 is for generating RFk from Fk. Because the relative support of each itemset in Fk must be checked to be no less than σr, the time is O(|Fk|). Since O(|Fk|) ≤ O(|Ck|), we have the total time O(x).
4.6. In steps 14 through 17, we compute the confidence of x ⇒ y, where x∪y is in RFk−1. That means, for each z = x∪y in RFk−1, we need to check all of its subsets. Therefore, there are in total |RFk−1|·2^(k−1) possible combinations. Since each combination needs a simple division, the total time for this part is O(|RFk−1|·2^(k−1)). Furthermore, because O(|RFk−1|) ≤ O(|Ck−1|), the total time required is O(x·2^(k−1)).

From the above analysis, we know that two parts of the algorithm are the most time consuming. The first is step 8, and the second is steps 9 through 11, which require times O(x·k·|P|·|T|) and O(n·l·x·2^(k−1)), respectively. Let K denote the total number of phases in the loop from step 6 to step 17. Then the total time is O(x·K^2·|P|·|T|) + O(n·l·x·K·2^K).

Next, we analyze the memory space required for the algorithm. We perform the analysis by examining the space needed to store the data structures used in the algorithm.

1. Because the space requirement for the PT-Interval table for each item is O(|P|·|T|), the total requirement for all individual items is O(m·|P|·|T|).
2. The requirement for the PT-Interval table for each itemset in RFk−1 is O(|RFk−1|·|P|·|T|).
4.1. Data generation

In the experiment, we randomly generate the synthetic transactional data sets by applying the data generation algorithm proposed by Agrawal and Srikant [1]. The factors considered in the simulation are listed in Table 1. In addition, we generate the time and store information for each transaction in the data sets.

To generate the store sizes, we use two parameters, Su and Sl, to represent the largest and smallest store

Ni = (ISi / Max(ISi)) × r

Note that the products sold in a store may change over time, although Ni is kept the same in all periods. Since the parameter Id is the proportion of products that are replaced in every period, store i replaces Ni × Id products in each period. Furthermore, we follow the method used by Agrawal and Srikant [1] to generate Fd maximum potentially frequent itemsets with an average length of Fl.
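The store-size step can be sketched as below. Note the equation above is our reading of a garbled formula in the source (Ni = ISi / Max(ISi) × r); the draw of ISi between Sl and Su, the store count q, and all parameter values here are illustrative assumptions, not the paper's exact procedure:

```python
import random

# Illustrative sketch of the store-size step, assuming the reconstructed
# equation N_i = IS_i / max(IS_i) * r: each store draws a raw size IS_i
# between the smallest and largest store parameters, and scaling by the
# maximum draw yields its item count N_i out of the r available items.
random.seed(7)
r, q = 1000, 10            # r items (from the paper); q stores (illustrative)
S_l, S_u = 200, 1000       # smallest/largest raw store size (illustrative)

IS = [random.uniform(S_l, S_u) for _ in range(q)]
N = [int(is_i / max(IS) * r) for is_i in IS]
print(max(N) == r)         # True: the biggest store carries all r items

# Each period, store i then replaces N_i * I_d of its products
# (I_d is the replacement proportion from Table 1; 0.1 is illustrative).
I_d = 0.1
replaced = [int(n * I_d) for n in N]
```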
Finally, we generate all the transactions in the data sets. To generate the transactions for store i in period j, we generate Dij from a Poisson distribution with mean L and a series of maximum potentially frequent itemsets. If an itemset generated from the process has some items not sold at store i in period j, we remove these items and repetitively add items into the transaction until we have reached the intended size. If the last itemset exceeds the boundary of the transaction, we remove the part that exceeds the boundary. When adding an itemset to a transaction, we use a "corruption level," c = 0.7, to simulate the phenomenon that all the items in a frequent itemset do not always appear together. Information on how the corruption level affects the procedure of generating items for a transaction is included in the paper by Agrawal and Srikant [1]. To generate the nine types of data sets shown in Table 2, we use the following parameter values: r = 1000, D = 100 K, L = 6, Fl = 4, and Fd = 1000. For each type of the data sets, 10 replications are generated for statistical analysis of the results.

4.2. Performance measures

As discussed in Section 2, the traditional method underestimates the support and the confidence values and, as a result, may fail to identify important purchasing patterns in a multiple-store environment. We define three measures (errors) for empirically assessing the magnitudes of the deviations in support, confidence, and the number of association rules when we use the traditional association rules for the store-chain data.

The type A error measures the relative difference in the support levels of all frequent itemsets generated by the traditional and proposed methods. It is determined by (rel_sup(X, DVX) − sup(X, D))/rel_sup(X, DVX). For example, if the support and relative support for an itemset X are sup(X, D) = 0.02 and rel_sup(X, DVX) = 0.03, respectively, then the type A error rate is (rel_sup(X, DVX) − sup(X, D))/rel_sup(X, DVX) = 33.33%. By averaging the error rates of all frequent itemsets, we obtain the overall type A error rate. Similarly, the type B error is used to compare the difference in the confidence levels of all rules generated by the traditional and proposed methods. It is defined as (conf(X ⇒ Y) − conf′(X ⇒ Y))/conf(X ⇒ Y), where conf′(X ⇒ Y) is the rule confidence computed by the traditional methods. By averaging the type B error rates of all common rules in the two methods, we obtain the overall type B error rate. Finally, the type C error is used to compare the relative difference in the numbers of rules generated by the two methods.

Note that we set σs and σr at the same level when evaluating the type A and type B error rates. This is because the frequent itemsets found by the two algorithms have to be the same in order to have a common base for comparing the results produced by the two algorithms. Furthermore, we set σc at 1% in the comparison based on the type B error. Using this low value, we can include almost all possible rules in the comparison. However, because in a practical situation the minimum confidence threshold could be higher than this value, we also obtain results for selected minimum confidence values ranging from 40% to 60%. Finally, we set σs at 0.5% in the comparison based on the type C error.
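The two relative-deviation measures, including the worked example from the text (sup = 0.02, rel_sup = 0.03), amount to:

```python
def type_a_error(rel_sup, sup):
    """Type A: relative support deviation, (rel_sup - sup) / rel_sup."""
    return (rel_sup - sup) / rel_sup

def type_b_error(conf, conf_traditional):
    """Type B: relative confidence deviation between the two methods."""
    return (conf - conf_traditional) / conf

# The example from the text: sup(X, D) = 0.02, rel_sup(X, D_VX) = 0.03.
print(f"{type_a_error(0.03, 0.02):.2%}")   # 33.33%
```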
Fig. 7. (a) Effects of the numbers of stores and periods on the type A error rate. (b) Effects of the numbers of stores and periods on the type B error rate. (c) Effects of the numbers of stores and periods on the type C error rate.
on Knowledge Discovery in Databases, Edmonton, Canada, 2003, July.
[19] H. Lu, L. Feng, J. Han, Beyond intra-transaction association analysis: mining multi-dimensional inter-transaction association rules, ACM Transactions on Information Systems 18 (4) (2000) 423–454.
[20] J.-S. Park, M.-S. Chen, P.S. Yu, Using a hash-based method with transaction trimming for mining association rules, IEEE Transactions on Knowledge and Data Engineering 9 (1997) 813–825.
[21] R. Rastogi, K. Shim, Mining optimized association rules with categorical and numeric attributes, IEEE Transactions on Knowledge and Data Engineering 14 (2002) 29–50.
[22] J.F. Roddick, M. Spiliopoulou, A survey of temporal knowledge discovery paradigms and methods, IEEE Transactions on Knowledge and Data Engineering 14 (2002) 750–767.
[23] S. Shekhar, S. Chawla, S. Ravadam, A. Fetterer, X. Liu, C. Lu, Spatial databases—accomplishments and needs, IEEE Transactions on Knowledge and Data Engineering 11 (1999) 45–55.
[24] R. Srikant, R. Agrawal, Mining quantitative association rules in large relational tables, Proceedings of the ACM-SIGMOD 1996 Conference on Management of Data, Montreal, Canada, 1996, pp. 1–12, June.
[25] J. Wijsen, R. Meersman, On the complexity of mining quantitative association rules, Data Mining and Knowledge Discovery 2 (1998) 263–281.

Kwei Tang is Professor of Management and the area coordinator of quantitative methods in the Krannert Graduate School of Management at Purdue University. He received a BS from National Chiao Tung University, Taiwan, an MS from Bowling Green State University, and a PhD in Management Science from Purdue University. His current research interests include data mining, supply chain management, and quality management.

Ren-Jie Shen is a system analyst and designer at Data Systems Consulting, a leading commercial software company in Taiwan. He received his MS degree in Information Management from National Central University of Taiwan. His research interests include data mining, information systems, and EC technologies.

Ya-Han Hu is currently a PhD student in the Department of Information Management, National Central University, Taiwan. He received his MS degree in Information Management from National Central University of Taiwan. His research interests include data mining, information systems, and EC technologies.