Market Basket Analysis in A Multiple Store Environment: Yen-Liang Chen, Kwei Tang, Ren-Jie Shen, Ya-Han Hu

1) Market basket analysis is a useful method to discover customer purchasing patterns by analyzing transaction data from stores. However, existing methods have weaknesses when applied to multi-store environments where products and purchasing patterns may vary across stores and over time. 2) The proposed new method overcomes these weaknesses by extracting association rules that include information on the specific stores and time periods where the rules apply. 3) The results show the new method is computationally efficient and has advantages over traditional methods for discovering patterns when analyzing diverse stores with changing products and purchases over larger numbers of stores and time periods.


Decision Support Systems 40 (2005) 339 – 354

www.elsevier.com/locate/dsw

Market basket analysis in a multiple store environment


Yen-Liang Chen (a), Kwei Tang (b,*), Ren-Jie Shen (a), Ya-Han Hu (a)

(a) Department of Information Management, National Central University, Chung-Li, 320 Taiwan, ROC
(b) Krannert Graduate School of Management, Purdue University, West Lafayette, IN 47907, USA
Received 1 December 2003; received in revised form 5 April 2004; accepted 5 April 2004
Available online 2 June 2004

Abstract

Market basket analysis (also known as association-rule mining) is a useful method of discovering customer purchasing patterns by extracting associations or co-occurrences from stores' transactional databases. Because the information obtained from the analysis can be used in forming marketing, sales, service, and operation strategies, it has drawn increased research interest. The existing methods, however, may fail to discover important purchasing patterns in a multi-store environment, because of an implicit assumption that products under consideration are on shelf all the time across all stores. In this paper, we propose a new method to overcome this weakness. Our empirical evaluation shows that the proposed method is computationally efficient, and that it has an advantage over the traditional method when stores are diverse in size, product mix changes rapidly over time, and larger numbers of stores and periods are considered.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Association rules; Data mining; Store chain; Algorithm

1. Introduction

Because of advances in information and communication technologies, corporations can effectively obtain and store transactional and demographic data on individual customers at reasonable costs. One of the challenges for corporations that have invested heavily in customer data collection is how to extract important information from their vast customer databases in order to gain competitive advantage. Market basket analysis (also known as association rule mining) is a method of discovering customer purchasing patterns by extracting associations or co-occurrences from stores' transactional databases. Discovering, for example, that supermarket customers are likely to purchase milk, bread, and cheese together, or that bank customers are likely to use a set of services jointly, can help managers in designing store layout, web sites, product mix and bundling, and other marketing strategies.

The methodology was introduced by Agrawal et al. [2] and can be stated as follows. Given two non-overlapping subsets of product items, X and Y, an association rule in the form of X → Y indicates a purchase pattern that if a customer purchases X then he or she also purchases Y. Two measures, support and confidence, are commonly used to select the association rules. Support is a measure of how often the transactional records in the database contain both X and Y,

* Corresponding author. Tel.: +1-765-494-4464; fax: +1-765-494-9658.
E-mail address: [email protected] (K. Tang).

0167-9236/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.dss.2004.04.009

and confidence is a measure of the accuracy of the rule, defined as the ratio of the number of transactional records with both X and Y to the number of transactional records with X only. By far, the Apriori algorithm [1] is the best-known algorithm for mining, from a transactional database, the association rules that satisfy the minimum support and confidence levels specified by users.

Since association rules are useful and easy to understand, there have been many successful business applications, including, for example, finance, telecommunication, marketing, retailing, and web analysis [5]. The method has also attracted increased research interest, and many extensions have been proposed in recent years, including (1) algorithm improvements [6,12,18,21]; (2) fuzzy rules [13,14]; (3) multi-level and generalized rules [7,10]; (4) quantitative rules [20,24,25]; (5) spatial rules [7,15]; (6) inter-transaction rules [19]; (7) interesting rules [4,9]; and (8) temporal association rules [3,16,17]. Brief literature reviews of association rules are given by Chen et al. [8] and Han and Kamber [11].

In today's business world, it is common for a company to have subsidiaries, branches, dealers, or franchises in different geographical locations. For example, Wal-Mart, the largest supermarket chain in the world, has more than 4400 stores worldwide. For a company with multiple stores, discovery of purchasing patterns that may vary over time and exist in all, or in subsets of, stores can be useful in forming marketing, sales, service, and operation strategies at the company, local, and store levels.

There are two main problems in using the existing methods in a multi-store environment. The first is caused by the temporal nature of purchasing patterns. An apparent example is seasonal products. Temporal rules [3,16,17] were developed to overcome the weakness of static association rules, which either find patterns at a point of time or implicitly assume that the patterns stay the same over time and across stores. A literature review on temporal rules is given by Roddick and Spiliopoulou [22]. In temporal rules, selling periods are considered in computing the support value, where the selling period of a product is defined as the time between its first and last appearances in the transaction records. Furthermore, the common selling period of the products in a product set is used as the base in computing the "temporal support" of the product set. The results of the method may be biased, however, because a product may be on shelf before its first transaction and/or after its last transaction occurs, and a product may also be put on-shelf and taken off-shelf multiple times during the data collection period.

The second problem is associated with finding common association patterns in subsets of stores. Similar to the problem in using existing temporal rules in a multi-store environment, we have to consider the possibility that some products may not be sold in some stores, for example, because of geographical, environmental, or political reasons. This is seemingly related to spatial association rules. However, the focus of spatial rules is on finding association patterns that are related to topological or distance information in, for example, maps, remote sensing or medical imaging data, and VLSI chip layout [23].

To overcome these problems, we develop an Apriori-like algorithm for automatically extracting association rules in a multi-store environment. The format of the rules is similar to that of the traditional rules. However, the rules also contain information on the stores (locations) and times where the rules hold. The results of the proposed method may contain rules that are applicable to the entire chain without time restriction, or to a subset of stores in specific time intervals. For example, a rule may state: "In the second week of August, customers purchase computers, printers, Internet and wireless phone services jointly in electronics stores near campus." Another example is: "In January, customers purchase cold medicine, humidifiers, coffee, and sunglasses together in supermarkets near skiing resorts." These rules can be used not only for general or localized marketing strategies, but also for product procurement, inventory, and distribution strategies for the entire store chain. Furthermore, we allow an item to have multiple selling time periods; i.e., an item may be put on-shelf and taken off-shelf multiple times. We further assume that different stores can have different product mixes in different time periods. That is, each store can have its own product mix, and the product mix in a store can change dynamically over time.

Because the time and store (location) factors are considered, the rule generation procedure is more complicated than the Apriori algorithm. The simulation results presented in the paper show that the proposed method is computationally efficient and has a significant advantage over the traditional association method when the stores under consideration are diverse in size and have product mixes that change rapidly over time.

The paper is organized as follows. We formally define the problem in Section 2 and propose an algorithm in Section 3. In Section 4, we compare the results generated from the proposed algorithm and the traditional Apriori algorithm in a simulated multi-store environment. The conclusion is given in Section 5.

2. Problem definition

We consider a market basket database D that contains transactional records from multiple stores over a time period T. Our objective is to extract the association rules from the database. For convenience in presentation, the cardinality of a set, say R, is denoted by |R|. Let I = {I1, I2, ..., Ir} be the set of product items included in D, where Ik (1 ≤ k ≤ r) is the identifier for the kth item. Let X be a set of items in I. We refer to X as a k-itemset if |X| = k. Furthermore, a transaction, denoted by s, is a subset of I. We use W(X, D) = {s | s ∈ D ∧ X ⊆ s} to denote the set of transactions in D that contain itemset X.

Definition 1. The support of X, denoted by sup(X, D), is the fraction of transactions containing X in database D; i.e., sup(X, D) = |W(X, D)| / |D|. For a specified support threshold σs, X is a frequent itemset if sup(X, D) ≥ σs.

Note that the definitions of the support and the frequent itemset are those used in the traditional association rules, and, therefore, the store and time information is not considered in determining the support of an itemset.

Let {T1, T2, ..., Tm} be a set of mutually disjoint time intervals (periods) that form a complete partition of T. Furthermore, they are ordered, such that Ti+1 immediately follows Ti for i ≥ 1. Note that the time periods are defined according to specific needs of the problem, such as 1 h, 6 h, 1 day, 1 week, and so on. Let P = {P1, P2, ..., Pq} be the set of stores, where Pj (1 ≤ j ≤ q) denotes the jth store in the store chain. We assume that each transaction s in D is attached with a timestamp t and a store identifier p to indicate the store and time at which the transaction occurs.

Let Sk ⊆ P and Rk ⊆ T be the sets of the stores and times in which item Ik is sold, respectively. We define V_{Ik} = Sk × Rk as the context of item Ik; i.e., the set of the combinations of stores and times where item Ik is sold. Furthermore, the context of itemset X, denoted by V_X, is the set of the combinations of stores and times in which all items in X are sold concurrently. For example, if itemset X consists of two items Ik and Ik', the context of X is given by V_X = V_{Ik} ∩ V_{Ik'}.

Definition 2. Let X be an itemset in I with context V_X, and D_{V_X} the subset of transactions in D whose timestamps t and store identifiers p satisfy V_X. We define the relative support of X with respect to the context V_X, denoted by rel_sup(X, D_{V_X}), as |W(X, D_{V_X})| / |D_{V_X}|. For a given relative-support threshold σr, if a frequent itemset X satisfies rel_sup(X, D_{V_X}) ≥ σr, we call X a relative-frequent (RF) itemset.

In the last definition, we require that a relative-frequent itemset X be frequent. We add this restriction for two reasons. First, it enables us to preserve the well-known downward-closure property, by which the candidate set of the next phase can be obtained by joining the frequent sets of the preceding phase; this greatly improves the performance of the algorithm. Second, this restriction does not present any real problem to the mining algorithm, because none of the important patterns would be missed when a low σs value is used. Therefore, we prefer using a low σs value. However, it should not be too low, because an itemset that occurs in only a few transactions has no practical significance.

Furthermore, the minimum threshold for the relative support of an itemset is used to determine whether a sufficient percentage of transactions exists in its context to warrant the inclusion of the itemset as a relative-frequent (RF) itemset. Its use and purpose are similar to those of the traditional minimum support threshold. Consequently, we can set its value the same way as we set the traditional minimum support threshold.

Definition 3. Consider two itemsets X and Y. The relative support of X with respect to the context V_{X∪Y}, denoted by rel_sup(X, D_{V_{X∪Y}}), is defined as |W(X, D_{V_{X∪Y}})| / |D_{V_{X∪Y}}|. The confidence of rule X ⇒ Y,

denoted by conf(X ⇒ Y), is defined as rel_sup(X∪Y, D_{V_{X∪Y}}) / rel_sup(X, D_{V_{X∪Y}}).

The above definition implies that the context of rule X ⇒ Y is V_{X∪Y}; i.e., the base used to compute the confidence of rule X ⇒ Y is the common stores and time periods shared by all the items in X∪Y.

Definition 4. Let Z be an RF itemset, where Z = X∪Y, X ⊆ I, and Y ⊆ I \ X. Given a confidence threshold σc, if conf(X ⇒ Y) ≥ σc, we call X ⇒ Y a store-chain (SC) association rule, and V_{X∪Y} the context of the rule.

Based on Definitions 1 and 4, it is clear that the selection criteria and outputs for the store-chain association rules are different from those of the traditional association rules. For the store-chain rules, the output includes the confidence, the support, and a context indicating the stores and times in which the rules hold.

It can be shown that the traditional method underestimates the support and the confidence values (a proof is given in Appendix A). Consequently, important purchasing patterns that satisfy the criteria of the SC association rules may not be identified by the traditional association-rule methods.

3. Algorithm

We propose an Apriori-like algorithm for mining the store-chain association rules. The algorithm is outlined in Fig. 1. We first explain the general concept for developing the algorithm and then use five subsections to give detailed information on several key steps of the algorithm.

In describing the algorithm, we use RFk to denote the set of all relative-frequent k-itemsets; Fk, the set of all frequent k-itemsets; and Ck, the set of candidate k-itemsets. Note that, in the traditional Apriori algorithm, a candidate k-itemset must be a combination of frequent (k−1)-itemsets because of the anti-monotone property [1]. Therefore, the Apriori algorithm can generate the candidate itemsets in the kth phase by joining the frequent itemsets of the (k−1)th phase. However, for the SC association rule, a subset of an RF itemset may not be an RF itemset, because the base for calculating the relative support value varies in different phases. Consequently, in the proposed algorithm, we generate candidate itemsets from the frequent itemsets instead of the RF itemsets. Furthermore, when we use the frequent itemsets to generate the candidate set of the next phase, the anti-monotone property is still satisfied, because we use the same base to compute the supports for all itemsets.

As the first step of the algorithm, we build a table, called the PT table, for each item in I to associate the item with its context (i.e., the stores and times in which it is sold), and we use the table to determine the context of an itemset. The algorithm proceeds in phases, where in the kth phase we generate Fk from Ck and RFk from Fk. In the first phase, we scan the database for the first time and build a two-dimensional table, called the TS table. In this table, the entry at the position corresponding to Ti and Pj, denoted by TS(Ti, Pj), records the number of transactions that occur at store Pj in period Ti. Using this table and the PT table for a given itemset X, we can determine the number of transactions associated with the context of X, i.e., |D_{V_X}|. In the kth phase of the algorithm, we first derive Ck and then generate Fk by evaluating the supports, which can be done by scanning the database and removing all infrequent itemsets. Since an RF itemset must be a frequent itemset, we generate RFk from Fk by evaluating the relative supports of the itemsets X in Fk.

In the following subsections, we give detailed descriptions of the key elements of the algorithm, including methods of (1) building the PT table, (2) building the TS table in the first phase, (3) finding RFk, (4) generating candidate itemsets, and (5) generating the store-chain association rules.

3.1. The PT table

The purpose of the PT table is to efficiently store the time and store information for each product item in the database. We use a simple example to illustrate the procedure for constructing the table. Consider the bit matrices in Fig. 2 for items I1, I2, and I3, in which there are six stores and six selling periods, and "1" and "0" indicate, respectively, that the item is or is not for sale in the corresponding store and time. Because an item normally does not switch between on- and off-shelf very frequently in a typical application, we store an item's context information in the PT table instead of the bit matrix in order to conserve data storage space. In the PT table, we need only to record

Fig. 1. Algorithm Apriori_TP.
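The listing of Fig. 1 does not survive this text extraction. As a rough, runnable sketch of the phase structure described in Section 3 (frequent itemsets Fk are mined first, relative-frequent itemsets are filtered from them against their contexts, and candidates are joined from Fk rather than from RFk), the following Python toy keeps contexts as plain sets of (store, period) pairs instead of the paper's PT/TS tables and generates rules directly rather than by "hitchhiking". All names and the data layout are illustrative assumptions, not the authors' code.

```python
from itertools import combinations

def mine_sc_rules(txns, on_shelf, sigma_s, sigma_r, sigma_c):
    """Toy store-chain rule miner following Definitions 1-4.
    txns: list of (store, period, frozenset of items).
    on_shelf[item]: set of (store, period) pairs where the item is sold."""
    n = len(txns)

    def ctx(itemset):                   # V_X: stores/periods shared by all items
        return set.intersection(*(on_shelf[i] for i in itemset))

    def sup(itemset):                   # traditional support (Definition 1)
        return sum(itemset <= t for _, _, t in txns) / n

    def rel_sup(itemset, context):      # relative support (Definition 2)
        d_v = [t for s, p, t in txns if (s, p) in context]
        return sum(itemset <= t for t in d_v) / len(d_v) if d_v else 0.0

    F = [frozenset([i]) for i in on_shelf if sup(frozenset([i])) >= sigma_s]
    rules, k = [], 1
    while F:
        for z in F:                     # keep RF itemsets and emit their rules
            v = ctx(z)
            if rel_sup(z, v) >= sigma_r:
                for r in range(1, len(z)):
                    for x in map(frozenset, combinations(z, r)):
                        base = rel_sup(x, v)
                        if base and rel_sup(z, v) / base >= sigma_c:
                            rules.append((x, z - x, v))   # X => Z\X in context v
        k += 1
        C = {a | b for a in F for b in F if len(a | b) == k}   # join F_k, not RF_k
        F = [c for c in C if sup(c) >= sigma_s]
    return rules
```

On a small database in which two items are jointly on shelf only in some periods, this returns rules together with the (store, period) context in which each rule holds, mirroring the output format described above.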

Fig. 2. Bit matrices for I1, I2, and I3.
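The bit matrices of Fig. 2 and the PT tables of Fig. 3 are not legible in this copy. The encoding they illustrate, a PT row listing the times at which an item flips between on-shelf and off-shelf (starting from an assumed on-shelf state), can be sketched as follows. The row [1 2 5] for I1 is taken from the walk-through in Section 3.1; the row [3] used for I2 is an inference from that walk-through (which only says the third through sixth elements are zeroed), not a value read from the figures.

```python
# Sketch of the PT-row encoding from Section 3.1. A PT row stores the times
# at which an item's on-shelf status flips; the initial state is on-shelf.
# I1's second row [1 2 5] is from the text; I2's row [3] is an assumption.

def pt_to_bits(pt_row, m):
    """Expand a PT row into an m-period 0/1 availability array."""
    bits, on = [], True
    flips = iter(list(pt_row) + [m + 1])    # sentinel past the last period
    nxt = next(flips)
    for t in range(1, m + 1):
        if t == nxt:                        # status flips at this period
            on = not on
            nxt = next(flips)
        bits.append(1 if on else 0)
    return bits

def bits_to_pt(bits):
    """Inverse: record the periods at which the status changes."""
    pt, on = [], True
    for t, b in enumerate(bits, start=1):
        if (b == 1) != on:
            pt.append(t)
            on = not on
    return pt

def merge_pt_rows(rows, m):
    """Row of an itemset's PT table: AND together the member items' rows."""
    acc = [1] * m
    for row in rows:
        acc = [a & b for a, b in zip(acc, pt_to_bits(row, m))]
    return bits_to_pt(acc)
```

Here `pt_to_bits([1, 2, 5], 6)` gives [0, 1, 1, 1, 0, 0] and `merge_pt_rows([[1, 2, 5], [3]], 6)` gives [1, 2, 3], matching the second-row walk-through in Section 3.1 under the stated assumption for I2.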



the time at which the item changes its status between on-shelf and off-shelf. (Initially, we assume that the item is on-shelf in all the stores.) For example, in Store P1, item I1 changes its status only at time T4. Therefore, we need only to record "4" in the PT table to reflect the time and store information for item I1 in Store P1. Following this procedure, the information in the original bit matrices for items I1, I2, and I3 is converted into the PT tables shown in Fig. 3.

Fig. 3. The PT tables for I1, I2, and I3.

As mentioned previously, the PT tables for individual items can be used to determine the PT table for a given itemset. The procedure given in Fig. 4 shows how to generate the jth row (store) of the PT table for an itemset X by combining the jth rows of the PT tables of all items in the itemset. We start with an m-dimension bit array, denoted by PTj, for the itemset, with initial values of "1" for each of the m time periods. These initial values are replaced by "0" as indicated at the corresponding positions in the PT tables of all the items in the itemset. Finally, we transform PTj into PT(X, j), the jth row of the PT table for itemset X.

Let us use an itemset consisting of I1 and I2 as an example. In order to generate the second row (store) of the PT table, we start with an initial bit array: [1 1 1 1 1 1]. Since the corresponding row of the PT table for I1 is [1 2 5], the first, fifth, and sixth elements of the bit array are replaced by "0", resulting in a new bit array: [0 1 1 1 0 0]. Following the same method, the third through the sixth elements of the new bit array are replaced by "0" when I2 is considered. As a result, the final bit array is [0 1 0 0 0 0], and the corresponding (second) row of the PT table for the itemset is [1 2 3].

Using the concept described above, we develop the procedure in Fig. 4. In the procedure, PT(k, j) denotes the jth row of the PT table for item k, and its Sth element is PT(k, j, S), where S is an odd number. The elements of PTj are replaced by 0's according to the rule stated in lines 4 through 6 of the algorithm: a segment of PTj is replaced by "0", starting from position PT(k, j, S) and ending at position PT(k, j, S+1) − 1. PT(k, j) is processed in sequence for every item k in itemset X. The process of developing the PT table is included as line 8 in the algorithm given in Fig. 1.

3.2. The TS table

After building the PT table, the first phase of the algorithm is to build the TS table, where each entry at the position corresponding to Ti and Pj is the number of transactions that occur at store Pj in period Ti. This can be done by a single scan through the database. An example of the table is given in Fig. 5. Using the TS and PT tables for itemset X, we can determine the value |D_{V_X}| by summing all the values in the entries of the TS table according to the store and time information of the items in X. The process of constructing the table is described in lines 2 through 4 of Fig. 1.

Fig. 5. An example of the TS table.

3.3. Relative-frequent itemset

Because an RF itemset must be a frequent itemset, we can generate RFk from Fk by computing the relative supports of the itemsets X in Fk. It is evident that |W(X, D_{V_X})| equals |W(X, D)|, because it is not possible for X to appear in a transaction not in D_{V_X}. Further, |D_{V_X}| can be obtained from the TS and PT tables of X. As a result, we can find the RF itemsets by first computing the relative supports of all X in Fk and then pruning those itemsets whose relative supports are less than σr.

3.4. Candidate itemsets

As discussed, we generate the candidate itemsets from the frequent itemsets of the last phase, instead of the RF itemsets. Furthermore, when we use the frequent itemsets to generate the candidate set of the next phase, the anti-monotone property is still satisfied, because we use the same base to compute the supports for all itemsets. We illustrate the computation process with the following example.

Example 1. Suppose there are 15 periods, from T1 to T15, and the numbers of transactions occurring in these 15 periods are 19, 17, 14, 25, 20, 17, 15, 27, 21, 20, 22, 18, 25, 21, and 19, respectively. Assume that the selling periods of product A are from T1 to T10, and that there are 60 transactions containing product A. Furthermore, assume that the selling periods of product B are from T6 to T15, and that 80 transactions include product B. Finally, there are 50 transactions containing both products A and B, and they are sold in the periods from T6 to T10.

In order to compute the supports and the relative supports for itemsets {A}, {B}, and {A, B}, we identify the following values: |W({A}, D_{V{A}})| = |W({A}, D)| = 60, |W({B}, D_{V{B}})| = |W({B}, D)| = 80, and |W({A, B}, D_{V{A,B}})| = |W({A, B}, D)| = 50. Since the base for computing the support is |D| = 300, the supports for the three itemsets are given by sup({A}, D) = 60/300 = 0.2, sup({B}, D) = 80/300 = 0.267, and sup({A, B}, D) = 50/300 = 0.167, respectively. On the other hand, the bases for computing the relative supports are |D_{V{A}}| = 195, |D_{V{B}}| = 205, and |D_{V{A,B}}| = 100, respectively, for the three itemsets. As a result, the relative supports are rel_sup({A}, D_{V{A}}) = 60/195 = 0.308, rel_sup({B}, D_{V{B}}) = 80/205 = 0.39, and rel_sup({A, B}, D_{V{A,B}}) = 50/100 = 0.5 for the itemsets.

Suppose we set σs at 0.1 and σr at 0.35. Then we find that {A}, {B}, and {A, B} are all frequent. Furthermore, {A} is not relative-frequent, but {B} and {A, B} are relative-frequent.

3.5. The store-chain association rules

Having found the RF itemsets, we proceed to calculate the confidence values and to find all the SC association rules. As defined in Definition 3, the confidence value is given by conf(X ⇒ Y) = rel_sup(X∪Y, D_{V_{X∪Y}}) / rel_sup(X, D_{V_{X∪Y}}). If the confidence value exceeds σc, the SC association rule holds.

There is an issue that must be dealt with in computing the confidence value. In the calculation of rel_sup(X∪Y, D_{V_{X∪Y}}) / rel_sup(X, D_{V_{X∪Y}}), we obtain the numerator after the phase of processing X∪Y. But the denominator is still undetermined after the phase of processing X∪Y, because the length of X is smaller than that of X∪Y, and we process the itemsets of the same length in a single phase. One possible solution to this problem is to add one step after the phase of processing X∪Y. In this new step, we compute the support levels of all subsets of X∪Y under the context V_{X∪Y}; i.e., the support levels of X in database D_{V_{X∪Y}}, where X is a subset of X∪Y.
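The arithmetic of Example 1 can be replayed directly from Definitions 1 and 2. In the following sketch the store dimension is collapsed, since Example 1 varies only the time periods, and the per-period transaction counts play the role of one row of the TS table; the helper names are illustrative, not the paper's notation.

```python
# Replaying Example 1: supports use |D| as the base, while relative supports
# use only the transactions inside the itemset's context (the common selling
# periods). Store information is omitted, as in the example.

# Transactions per period T1..T15 (one row of a TS table).
ts = dict(enumerate(
    [19, 17, 14, 25, 20, 17, 15, 27, 21, 20, 22, 18, 25, 21, 19], start=1))

selling = {"A": set(range(1, 11)),   # product A on shelf T1..T10
           "B": set(range(6, 16))}   # product B on shelf T6..T15

def context(itemset):
    """Common selling periods of the itemset (the time part of V_X)."""
    return set.intersection(*(selling[i] for i in itemset))

def rel_sup(count, ctx):
    """|W(X, D_VX)| / |D_VX|, with the base summed from the TS counts."""
    return count / sum(ts[t] for t in ctx)

total = sum(ts.values())                   # |D| = 300
sup_A, sup_B, sup_AB = 60 / total, 80 / total, 50 / total
rs_A = rel_sup(60, context({"A"}))         # 60/195, about 0.308
rs_B = rel_sup(80, context({"B"}))         # 80/205, about 0.390
rs_AB = rel_sup(50, context({"A", "B"}))   # 50/100 = 0.5
```

With σs = 0.1 and σr = 0.35, all three itemsets clear the support threshold, while only {B} and {A, B} clear the relative-support threshold, as stated in the example.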

If the RF itemsets produced in each phase needed another scan to produce the confidence values, the number of database scans in this algorithm would be twice that required by the Apriori algorithm. In order to reduce this requirement, we use another method: if Z is an RF itemset found in the kth phase, we compute rel_sup(X, D_{V_Z}) in the (k+1)th phase by "hitchhiking," where X is a subset of Z. In other words, in phase k+1, we perform two operations: the first is to find the RF itemsets of length k+1, and the second is to compute the relative supports, such as rel_sup(X, D_{V_Z}). All these values are calculated during the same scan of the database. Consequently, the proposed method requires only one more scan than the Apriori algorithm, to obtain the confidence values when the RF itemsets of the last phase are produced. This process is included as line 11 in the algorithm, and in Fig. 6 we give the process of computing rel_sup(X, D_{V_Z}) for all subsets X of Z.

In order to compute rel_sup(X, D_{V_Z}) for all subsets X of Z, we must enumerate all the subsets, X, of each RF itemset, Z, of the previous phase; if the length of the RF itemset is k, the number of subsets is 2^k − 2. Because each Z has its own PT table (built in line 8 of Fig. 1), every time a transaction is read in, after computing the supports of the candidates in Ck, we need to check whether the transaction falls within the PT tables of all RF itemsets Z. If not, the transaction does not occur under the context V_Z, and it can be ignored. On the other hand, if the answer is positive, the transaction occurs under the context V_Z, and, as a result, we need to check whether the transaction includes any subset X of Z. This enables us to determine the support levels of all the subsets X of Z under the context V_Z.

Fig. 6. Compute the support counts of all the subsets X of Z.

For example, suppose that two RF itemsets are generated in the third phase: {A, B, C} and {C, D, E}. In the fourth phase, we build the PT tables for all the RF itemsets in RF3. When a transaction is read, we need to check whether it includes any candidates in C4, as well as whether its time and store combination is in the context of {A, B, C} or {C, D, E}. If the time and store combination of the transaction does not conform to the context of {A, B, C}, we need to check whether it conforms to that of {C, D, E}. If it does, we proceed to check whether it includes any subsets of {C, D, E}: {C}, {D}, {E}, {C, D}, {C, E}, and {D, E}. If it does, the counters of all the matching subsets are increased by one.

Finally, line 14 in Fig. 1 shows the step for generating the store-chain association rule X ⇒ Y, where X∪Y is in RF_{k−1}. It is not difficult to compute the confidence of the rule, i.e., rel_sup(X∪Y, D_{V_{X∪Y}}) / rel_sup(X, D_{V_{X∪Y}}), because rel_sup(X∪Y, D_{V_{X∪Y}}) has already been found in the previous phase and rel_sup(X, D_{V_{X∪Y}}) is found in the current phase.

3.6. Complexity analysis

In this section, we analyze the time complexity and the memory space complexity of the algorithm. Let m be the number of items, n the number of transactions in the database, and l the number of items in a transaction. Further, let x denote the largest value of |Ck|. Note that, although |Ck| can theoretically be as large as O(m^k), |Ck| is very unlikely to be larger than O(m^2) in practice. This is because, in an Apriori-like algorithm [1,2,6,20], C2 usually has the largest size among all candidate sets. We discuss the time complexities of the steps of the Apriori_TP algorithm separately, as well as the total time complexity of the algorithm, as follows.

1. In step 1, we construct the PT table for each item. To produce the table for an item, its bit matrix, with |P| rows and |T| columns, needs to be linearly scanned and processed. Thus, the time for step 1 is O(m · |P| · |T|).
2. In steps 2 to 4, two operations are performed: (1) compute the supports of all itemsets in C1, and (2) construct the TS table. Since the first operation requires a linear scan of all the items in every transaction, its time is O(n · l). The time needed for the second operation is O(n), because we examine only the attached time and store identifier of each transaction. As a result, the total time for the three steps is O(n · l).
3. Step 5 is for determining F1 by examining the support of every itemset in C1. Since C1 has n itemsets, the time needed for the step is O(n).
4. There is a loop from steps 6 to 18. The time complexities of the steps in iteration k of the loop are discussed as follows.
4.1. In step 7, we generate Ck. Consequently, the required time is O(|Ck|). Because we assume O(|Ck|) ≤ O(x), the time is O(x).
4.2. In step 8, we build a PT table for each itemset z in RF_{k−1}. We need k − 2 merging operations for this step, because the k − 1 PT tables of the individual items in z need to be merged. Because each merging operation can be done in time O(|P| · |T|), the total time for step 8 is O(|RF_{k−1}| · k · |P| · |T|). Since RF_{k−1} ⊆ C_{k−1}, we have O(|RF_{k−1}|) ≤ O(x), and the total time becomes O(x · k · |P| · |T|).
4.3. In steps 9 to 11, there are two tasks: (1) compute the supports of all itemsets in Ck, and (2) compute the supports of all subsets of itemsets in RF_{k−1}. The time required for the first task is O(n · l · |Ck|), because it can be done by first reading every transaction and then adding the counts to the corresponding itemsets. In the second task, we add the counts to all subsets of itemsets in RF_{k−1} rather than to all itemsets in RF_{k−1}. Therefore, it can be done by first reading every transaction and every itemset in RF_{k−1}, generating all subsets, and finally adding the counts. Performing these operations requires time O(n · l · |RF_{k−1}| · 2^{k−1}). Since O(|RF_{k−1}|) ≤ O(|C_{k−1}|), the time required for this part is O(n · l · x · 2^{k−1}).
4.4. Step 12 is used to generate Fk from Ck. Since the support of each itemset in Ck must be checked to be no less than σs, the time is O(|Ck|) = O(x).
4.5. Step 13 is for generating RFk from Fk. Because the support of each itemset in Fk must be checked to be no less than σr, the time is O(|Fk|). Since O(|Fk|) ≤ O(|Ck|), the total time is O(x).
4.6. In steps 14 through 17, we compute the confidence of X ⇒ Y, where X∪Y is in RF_{k−1}. That is, for each Z = X∪Y in RF_{k−1}, we need to check all of its subsets. Therefore, there are in total |RF_{k−1}| · 2^{k−1} possible combinations. Since each combination needs a simple division, the total time for this part is O(|RF_{k−1}| · 2^{k−1}). Furthermore, because O(|RF_{k−1}|) ≤ O(|C_{k−1}|), the total time required is O(x · 2^{k−1}).

From the above analysis, we know that two parts of the algorithm are the most time consuming. The first is step 8, and the second is steps 9 through 11, which require times O(x · k · |P| · |T|) and O(n · l · x · 2^{k−1}), respectively. Let K denote the total number of phases in the loop from step 6 to step 17. Then the total time is O(x · K^2 · |P| · |T|) + O(n · l · x · K · 2^K).

Next, we analyze the memory space required by the algorithm. We perform the analysis by examining the space needed to store the data structures used in the algorithm.

1. Because the space requirement for the PT-Interval table for each item is O(|P| · |T|), the total requirement for all individual items is O(m · |P| · |T|).
2. The requirement for the PT-Interval tables of the itemsets in RF_{k−1} is O(|RF_{k−1}| · |P| · |T|).
348 Y.-L. Chen et al. / Decision Support Systems 40 (2005) 339–354

Since O(|RF_{k-1}|) ≤ O(|C_{k-1}|), the total requirement for all itemsets is O(x · |P| · |T|).
3. The requirement for the TS table is O(|P| · |T|), because it is a single table.
4. The space requirements for C_k, RF_k, and F_k are O(x) individually. Note that, because the same space can be shared by different iterations in the loop from steps 6 to 17, we only need one copy of them rather than multiple copies.
5. The space for storing the supports of all subsets of itemsets in RF_{k-1} is O(|RF_{k-1}| · 2^{k-1}). Since O(|RF_{k-1}|) ≤ O(|C_{k-1}|), the required space is O(x · 2^{k-1}).
6. Combining all the space requirements, we find that the total space is O(x · 2^K) + O(x · |P| · |T|).

4. Performance evaluation

In this section, we perform a simulation study to empirically compare the proposed and traditional association-rule mining methods. The main objective of the simulation study is to identify the conditions under which the proposed method significantly outperforms the traditional method in identifying important purchasing patterns in a multi-store environment. Three factors are considered in the study: (1) the numbers of stores and periods, (2) the store size, and (3) the product replacement ratio. In addition, we evaluate the computational efficiency of the proposed Apriori_TP algorithm using the Apriori algorithm [1] as the baseline for comparisons. The proposed algorithm is implemented in the Borland C++ language and tested on a PC with a Celeron 1.8 GHz processor and 768 MB of main memory under the Windows 2000 operating system.

4.1. Data generation

In the experiment, we randomly generate the synthetic transactional data sets by applying the data generation algorithm proposed by Agrawal and Srikant [1]. The factors considered in the simulation are listed in Table 1. In addition, we generate the time and store information for each transaction in the data sets.

Table 1
Parameters used in simulation
D: Number of transactions
q: Number of stores
m: Number of periods
r: Number of items
L: Average length of transactions
Fl: Average length of maximum potentially frequent itemsets
Fd: Number of maximum potentially frequent itemsets
Su, Sl: The maximum and minimum store sizes
Id: Replacement rate of items

To generate the store sizes, we use two parameters, Su and Sl, to represent the largest and smallest store sizes, respectively, and the size of store i for 1 ≤ i ≤ q, denoted by S_i, is generated by a uniform distribution between Su and Sl. We assume that the total number of transactions and the number of products are dependent on a store's size. In addition, we also allow the stores to have different product replacement (turnover) ratios. In the simulation, these relationships are established by generating m random numbers for store i from a Poisson distribution with mean S_i, and we use the jth number, denoted by W_ij, as the weight of store i in period j. Let D_ij denote the number of transactions of store i in period j. The total number of transactions, D, is distributed to store i and period j as determined by:

D_ij = W_ij · D / (Σ_m Σ_n W_mn),

where the double sum runs over all stores m and all periods n.

Furthermore, we assume that the number of products in a store is proportional to the square root of its size. Thus, let IS_i = √S_i for i = 1, 2, ..., q. Then, the number of products in store i, denoted by N_i, is determined by the following formula:

N_i = (r / max(IS_i)) · IS_i

Note that the products sold in a store may change over time, although N_i is kept the same in all periods. Since the parameter Id is the proportion of products that will be replaced in every period, store i replaces N_i · Id products in each period. Furthermore, we follow the method used by Agrawal and Srikant [1] to generate Fd maximum potentially frequent itemsets with an average length of Fl.
Finally, we generate all the transactions in the data sets. To generate the transactions for store i in period j, we generate D_ij transactions, whose lengths are drawn from a Poisson distribution with mean L, from a series of maximum potentially frequent itemsets. If an itemset generated from the process has some items not sold at store i in period j, we remove these items, and repetitively add items into the transaction until we have reached the intended size. If the last itemset exceeds the boundary of this transaction, we remove the part that exceeds the boundary. When adding an itemset to a transaction, we use a "corruption level," c = 0.7, to simulate the phenomenon that all the items in a frequent itemset do not always appear together. Information on how the corruption level affects the procedure of generating items for a transaction is included in the paper by Agrawal and Srikant [1]. To generate the nine types of data sets shown in Table 2, we use the following parameter values: r = 1000, D = 100 K, L = 6, Fl = 4, and Fd = 1000. For each type of data set, 10 replications are generated for statistical analysis of the results.

4.2. Performance measures

As discussed in Section 2, the traditional method underestimates the support and the confidence values and, as a result, may fail to identify important purchasing patterns in a multiple-store environment. We define three measures (errors) for empirically assessing the magnitudes of the deviations in support, confidence, and the number of association rules when we use the traditional association rules for the store-chain data.

The type A error measures the relative difference in the support levels of all frequent itemsets generated by the traditional and proposed methods. It is determined by (rel_sup(X, D′_X) − sup(X, D)) / rel_sup(X, D′_X). For example, if the support and relative support for an itemset X are sup(X, D) = 0.02 and rel_sup(X, D′_X) = 0.03, respectively, then the type A error rate is (rel_sup(X, D′_X) − sup(X, D)) / rel_sup(X, D′_X) = 33.33%. By averaging the error rates of all frequent itemsets, we obtain the overall type A error rate. Similarly, the type B error is used to compare the difference in the confidence levels of all rules generated by the traditional and proposed methods. It is defined as (conf(X ⇒ Y) − conf′(X ⇒ Y)) / conf(X ⇒ Y), where conf′(X ⇒ Y) is the rule confidence computed by the traditional methods. By averaging the type B error rates of all common rules in the two methods, we obtain the overall type B error rate. Finally, the type C error is used to compare the relative difference in the numbers of rules generated by the two methods. Note that we set σ_s and σ_r at the same level when evaluating the type A and B error rates. This is because the frequent itemsets found by the two algorithms have to be the same in order to have a common base for comparing the results produced by the two algorithms. Furthermore, we set σ_c at 1% in the comparison based on the type B error. Using this low value, we can include almost all possible rules in the comparison. However, because in a practical situation the minimum confidence threshold could be higher than this value, we also obtain the results for selected minimum confidence values ranging from 40% to 60%. Finally, we set σ_s at 0.5% in the comparison based on the type C error.
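These three measures can be written down directly. The sketch below uses the worked example from the text (sup = 0.02, rel_sup = 0.03); the type C formula is not stated explicitly here, so the relative-difference form used for it is an assumption.

```python
def type_a_error(rel_sup, sup):
    """Type A: relative support deviation, (rel_sup - sup) / rel_sup."""
    return (rel_sup - sup) / rel_sup

def type_b_error(conf, conf_traditional):
    """Type B: relative confidence deviation, (conf - conf') / conf."""
    return (conf - conf_traditional) / conf

def type_c_error(n_rules, n_rules_traditional):
    """Type C: relative difference in rule counts (assumed form)."""
    return (n_rules - n_rules_traditional) / n_rules

# Worked example from the text: sup(X, D) = 0.02, rel_sup(X, D'_X) = 0.03.
print(f"{type_a_error(0.03, 0.02):.2%}")  # 33.33%
```

Averaging these per-itemset (or per-rule) values over all frequent itemsets or common rules gives the overall error rates used in the comparisons below.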

4.3. Simulation results

Table 2
Data sets
Data set | Number of stores | Number of periods | Range of store sizes | Product replacement rate
1 | 5 | 5 | 50–100 | 0.001
2 | 10 | 10 | 50–100 | 0.001
3 | 50 | 50 | 50–100 | 0.001
4 | 50 | 50 | 10–100 | 0.001
5 | 50 | 50 | 50–100 | 0.001
6 | 50 | 50 | 90–100 | 0.001
7 | 50 | 50 | 50–100 | 0.001
8 | 50 | 50 | 50–100 | 0.005
9 | 50 | 50 | 50–100 | 0.010

The first comparison is carried out based on the first three types of data sets in Table 2. Because these three types of data sets have different numbers of stores and periods, the results show the effects of the size of the store chain and the length of time on the errors associated with using the traditional method. In order to study the effect of σ_s, we also obtain the results for selected minimum support thresholds ranging from 0.3% to 0.6%. The averages of the type A, B, and C errors are shown in parts (a), (b), and (c), respectively, of Fig. 7. The two-factor ANOVA model is used to
analyze the results. We find that all three error rates are significantly larger in the cases involving larger numbers of stores and periods. We also notice that the error rates generally increase as the minimum support decreases. All these results suggest that the traditional method is not suitable for the store-chain data. The result in part (c) of the figure further supports this conclusion, where, in the worst case reported, 40% of the SC rules are not successfully discovered when σ_c is 60%.

Fig. 7. (a) Effects of the numbers of stores and periods on the type A error rate. (b) Effects of the numbers of stores and periods on the type B error rate. (c) Effects of the numbers of stores and periods on the type C error rate.

The second comparison is used to study the effects of the store size on the error rates. The data set types 4, 5, and 6 are used, and the average error rates are shown in Fig. 8. The results of statistical analysis based on the two-factor ANOVA model indicate that the error rates are significantly larger when the store size has a larger variation.

Fig. 8. (a) Effects of store size on the type A error rate. (b) Effects of store size on the type B error rate. (c) Effects of store size on the type C error rate.

As shown in parts (a) and
(b) of the figure, when the variation of the store size is the largest, the type A and B error rates are close to 35% and 23%, respectively. In part (c), we find that more than 50% of the SC rules are not generated by the traditional method when σ_c is 50%, and the error rate reaches almost 70% when σ_c is 60%.

In the third part of the simulation study, we compare the error rates under different replacement rates. We use data set types 7, 8, and 9 for the comparison. The results, shown in Fig. 9, indicate that the error rates associated with larger replacement ratios are significantly higher than those associated with smaller replacement ratios. We also notice that the error rates increase as the minimum support decreases. These observations are supported by the results of our statistical analysis. Consequently, we conclude that the performance of the traditional method deteriorates as the product replacement ratio increases.

Fig. 9. (a) Effects of product replacement ratio on the type A error rate. (b) Effects of product replacement ratio on the type B error rate. (c) Effects of product replacement ratio on the type C error rate.

In the second part of the simulation study, we observe how the type B error rate changes when σ_c is varied from 40% to 60%. In this experiment, we set σ_s at 0.5% and use data set types 2, 4, 5, and 9 for comparison. We use data set type 5 as the baseline; data set type 2, to study the effect of smaller numbers of time periods and stores; and data set types 4 and 9, to study a larger variation in store size and a larger product replacement rate, respectively.

The simulation results are summarized in Fig. 10, where lines 1, 2, 3, and 4 correspond to the results of data sets 5, 2, 4, and 9, respectively. The result indicates that the error rate decreases significantly as we increase σ_c. This is because, when σ_c is higher, only those rules with higher confidence values are used in the comparison, causing the type B error rate to decrease. Furthermore, we found that the effect of the product replacement rate is very similar to that of the numbers of periods and stores, and both factors are stronger than that of the variation in store size.

Fig. 10. The type B error rates vs. minimum confidence thresholds.
Fig. 11. Run times.

To summarize the simulation study, we conclude that the traditional association rules may not be able to extract all important purchasing patterns for a multi-store chain, especially when there are large numbers of stores and periods, a large variation in store sizes, and high product replacement ratios. This finding is significant because many store chains are growing in size to maintain economies of scale and, at the same time, dynamically localize their product-mix strategies. All these trends support the need for the proposed method.

Finally, we evaluate the computational efficiency of the proposed algorithm by comparing it with the Apriori algorithm. We show the result in Fig. 11, where the running time is obtained by averaging the running times over all the data sets in Table 2. From the figure, we find that the proposed algorithm requires larger processing times, but the differences are not substantial. This result is reasonable, because the proposed algorithm requires one more scan of the data than does the Apriori algorithm, and also requires additional basic operations in each phase of the algorithm.

5. Conclusion

Association-rule mining is a useful method of discovering customer purchasing patterns by extracting associations or co-occurrences from stores' transactional databases. Since the method was first proposed by Agrawal et al. [2] in 1993, it has become an established and active research area. The existing methods, however, may fail to discover important purchasing patterns in a multi-store environment, because of an implicit assumption that the products under consideration are on-shelf all the time across all stores.

To overcome the problem, a new method, called store-chain association rules, is proposed specifically for a multi-store environment, where stores may have different product-mix strategies that can be adjusted over time. The format of the rules is similar to that of the traditional rules. However, the rules also contain information on the store (location) and time where the rules hold. The rules extracted by the proposed method may be applicable to the entire chain without time restriction, but may also be store- and time-specific. These rules have a distinct advantage over the traditional ones because they contain store (location) and time information, so that they can be used not only for general or local marketing strategies (depending on the results), but also for product procurement, inventory, and distribution strategies for the entire store chain.

An Apriori-like algorithm is developed for mining store-chain association rules. A simulation is used to empirically compare the proposed and traditional association-rule mining methods. Three factors are considered in generating stores' sales data: (1) the numbers of stores and periods, (2) the store size, and (3) the product replacement ratio. The analysis of the simulation results suggests that the proposed method has advantages over the traditional method, especially when the numbers of stores and periods are large, stores are diverse in size, and the product mix changes rapidly over time. Furthermore, the time complexity of the proposed algorithm is discussed, and the simulation results show that the algorithm is computationally efficient.

Store-chain association rules represent a promising research area in data mining. The results of this paper can be extended by considering time constraints, spatial constraints, quantitative attributes and/or taxonomy, and other kinds of time- or location-related knowledge. Furthermore, it is important to explore strategies for generating store-chain association rules incrementally, in an on-line model, in a distributed environment, or in parallel models.
Acknowledgements

The first author was supported in part by the Ministry of Education (MOE) Program for Promoting Academic Excellence of Universities under Grant No. 91-H-FA07-1-4 and National Science Council Grant No. 91-2416-H-008-003.

Appendix A

The support values and the confidence values obtained by the traditional association mining are underestimated, compared with the true values discussed in this paper. First, it is easy to see that the traditional support value is lower because its base is larger. As to the confidence value, say conf(X ⇒ Y), the traditional approach defines it as follows:

conf(X ⇒ Y) = sup(X ∪ Y, D) / sup(X, D)
            = [|W(X ∪ Y, D)| / |D|] / [|W(X, D)| / |D|]
            = |W(X ∪ Y, D)| / |W(X, D)|.   (A1)

But the correct one should be

conf(X ⇒ Y) = rel_sup(X ∪ Y, D′_{X∪Y}) / rel_sup(X, D′_{X∪Y})
            = [|W(X ∪ Y, D′_{X∪Y})| / |D′_{X∪Y}|] / [|W(X, D′_{X∪Y})| / |D′_{X∪Y}|]
            = |W(X ∪ Y, D′_{X∪Y})| / |W(X, D′_{X∪Y})|.   (A2)

By comparing Eq. (A1) with Eq. (A2), we find that the numerators are the same, because it is not possible that X ∪ Y appears in a transaction not in D′_{X∪Y}, and that the denominator of Eq. (A1) is no less than that of Eq. (A2), because |W(X, D)| ≥ |W(X, D′_{X∪Y})|. Thus, we conclude that the confidence value of Eq. (A1) is no larger than that of Eq. (A2).

References

[1] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 478–499.
[2] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, D.C., 1993, pp. 207–216.
[3] J.M. Ale, G.H. Rossi, An approach to discovering temporal association rules, Proceedings of the 2000 ACM Symposium on Applied Computing (Vol. 1), Villa Olmo, Como, Italy, 2000, pp. 294–300.
[4] R.J. Bayardo Jr., R. Agrawal, Mining the most interesting rules, Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, Aug. 1999, pp. 145–154.
[5] I. Bose, R.K. Mahapatra, Business data mining—a machine learning perspective, Information and Management 39 (2001) 211–225.
[6] S. Brin, R. Motwani, J.D. Ullman, S. Tsur, Dynamic itemset counting and implication rules for market basket data, Proceedings of the 1997 ACM-SIGMOD Conference on Management of Data, Tucson, Arizona, USA, May 1997, pp. 255–264.
[7] E. Clementini, P.D. Felice, K. Koperski, Mining multiple-level spatial association rules for objects with a broad boundary, Data and Knowledge Engineering 34 (3) (2000) 251–270.
[8] M.-S. Chen, J. Han, P.S. Yu, Data mining: an overview from a database perspective, IEEE Transactions on Knowledge and Data Engineering 8 (1996) 866–883.
[9] A. Freitas, On rule interestingness measures, Knowledge-Based Systems 12 (5) (1999) 309–315.
[10] J. Han, Y. Fu, Mining multiple-level association rules in large databases, IEEE Transactions on Knowledge and Data Engineering 11 (5) (1999) 798–805.
[11] J. Han, M. Kamber, Data Mining, Morgan Kaufmann, San Francisco, 2001.
[12] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM-SIGMOD Int. Conf. on Management of Data, Dallas, TX, May 2000.
[13] H. Ishibuchi, T. Nakashima, T. Yamamoto, Fuzzy association rules for handling continuous attributes, Proceedings of the IEEE International Symposium on Industrial Electronics, Pusan, Korea, 2001, pp. 118–121.
[14] C.M. Kuok, A.W. Fu, M.H. Wong, Mining fuzzy association rules in databases, SIGMOD Record 27 (1) (1998) 41–46.
[15] K. Koperski, J. Han, Discovery of spatial association rules in geographic information databases, Proc. 4th International Symposium on Large Spatial Databases (SSD95), Portland, Maine, Aug. 1995, pp. 47–66.
[16] C.H. Lee, C.R. Lin, M.S. Chen, On mining general temporal association rules in a publication database, Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, California, 2001, pp. 337–344.
[17] Y. Li, P. Ning, X.S. Wang, S. Jajodia, Discovering calendar-based temporal association rules, Proceedings of the Eighth International Symposium on Temporal Representation and Reasoning, Cividale Del Friuli, Italy, 2001, pp. 111–118.
[18] J. Liu, Y. Pan, K. Wang, J. Han, Mining frequent item sets by opportunistic projection, Proceedings of the 2002 Int. Conf.
on Knowledge Discovery in Databases, Edmonton, Canada, July 2002.
[19] H. Lu, L. Feng, J. Han, Beyond intra-transaction association analysis: mining multi-dimensional inter-transaction association rules, ACM Transactions on Information Systems 18 (4) (2000) 423–454.
[20] J.-S. Park, M.-S. Chen, P.S. Yu, Using a hash-based method with transaction trimming for mining association rules, IEEE Transactions on Knowledge and Data Engineering 9 (1997) 813–825.
[21] R. Rastogi, K. Shim, Mining optimized association rules with categorical and numeric attributes, IEEE Transactions on Knowledge and Data Engineering 14 (2002) 29–50.
[22] J.F. Roddick, M. Spiliopoulou, A survey of temporal knowledge discovery paradigms and methods, IEEE Transactions on Knowledge and Data Engineering 14 (2002) 750–767.
[23] S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu, C. Lu, Spatial databases—accomplishments and needs, IEEE Transactions on Knowledge and Data Engineering 11 (1999) 45–55.
[24] R. Srikant, R. Agrawal, Mining quantitative association rules in large relational tables, Proceedings of the ACM-SIGMOD 1996 Conference on Management of Data, Montreal, Canada, June 1996, pp. 1–12.
[25] J. Wijsen, R. Meersman, On the complexity of mining quantitative association rules, Data Mining and Knowledge Discovery 2 (1998) 263–281.

Yen-Liang Chen is Professor of Information Management at National Central University of Taiwan. He received his PhD degree in Computer Science from National Tsing Hua University, Hsinchu, Taiwan. His current research interests include data modeling, data mining, data warehousing and operations research. He has published papers in Operations Research, IEEE Transactions on Software Engineering, Computers and OR, European Journal of Operational Research, Information and Management, Information Processing Letters, Information Systems, Journal of the Operational Research Society, and Transportation Research.

Kwei Tang is Professor of Management and the area coordinator of quantitative methods in the Krannert Graduate School of Management at Purdue University. He received a BS from National Chiao Tung University, Taiwan, an MS from Bowling Green State University, and a PhD in Management Science from Purdue University. His current research interests include data mining, supply chain management, and quality management.

Ren-Jie Shen is a system analyst and designer at Data Systems Consulting, a leading commercial software company in Taiwan. He received his MS degree in Information Management from National Central University of Taiwan. His research interests include data mining, information systems and EC technologies.

Ya-Han Hu is currently a PhD student in the Department of Information Management, National Central University, Taiwan. He received his MS degree in Information Management from National Central University of Taiwan. His research interests include data mining, information systems and EC technologies.