A Quantum Swarm Evolutionary Algorithm For Mining Association Rules in Large Databases
ORIGINAL ARTICLE
KEYWORDS
Quantum Evolutionary Algorithm; Swarm intelligence; Association rule mining; Fitness
Abstract Association rule mining aims to extract the correlations or causal structures that exist
among sets of frequent items or attributes in a database. These associations are represented by
means of rules. Association rule mining methods provide a robust but non-linear approach to finding
associations. The search for association rules is an NP-complete problem; the complexity mainly
arises from the huge number of database transactions and items that must be explored. In this article
we propose a new algorithm that extracts the best rules within a reasonable execution time, though
without always guaranteeing optimal solutions. The new algorithm is based on the Quantum Swarm
Evolutionary approach; it gives better results than genetic algorithms.
© 2010 King Saud University. Production and hosting by Elsevier B.V. All rights reserved.
1. Introduction
Data mining methods such as association rule mining (Agrawal et al., 1993a,b) are gaining popularity for their power
and ease of use. Association rule learning methods provide a
robust and non-linear approach to finding associations (correlations) and causal structures among sets of frequent items or
attributes in a database. Association rule algorithms, such as
Apriori (Agrawal et al., 1993a,b), examine a long list of transactions [...]
1319-1578 © 2010 King Saud University. Production and hosting by
Elsevier B.V. All rights reserved.
Peer review under responsibility of King Saud University.
doi:10.1016/j.jksuci.2010.03.001
M. Ykhlef
Association rule mining in large databases is a very complex process, and exact algorithms are very expensive to use.
We believe that evolutionary computing can help considerably
in this area. In this article, we address the use of a
Quantum Swarm Evolutionary Algorithm (QSE) (Wang
et al., 2006) for mining association rules. QSE is a hybridization of the Quantum Evolutionary Algorithm (QEA) (Han and
Kim, 2002) and particle swarm optimization (PSO) (Kennedy
and Eberhart, 1995).
The QEA approach improves on classical evolutionary algorithms such as the genetic algorithm. Instead of using a binary, numeric
or symbolic representation, QEA uses a Q-bit as a probabilistic
representation, defined as the smallest unit of information. A
Q-bit individual is defined by a string of Q-bits called a multiple
Q-bit. The Q-bit individual has the advantage that it can
represent a linear superposition of states (binary solutions) in
the search space probabilistically. Thus, the Q-bit representation
provides better population diversity than the chromosome representation used in genetic algorithms. A Q-gate
is also defined as a variation operator of QEA to drive the individuals toward better solutions and eventually toward a single
state.
QSE (Wang et al., 2006) employs a novel quantum bit
expression mechanism called the quantum angle and adopts
an improved PSO to update the Q-bits of QEA automatically.
Wang et al. (2006) show that QSE performs better
than QEA.
The remainder of this article is organized as follows:
Section 2 presents basics of association rule mining. In Section
3, we give a general description of quantum computing and
particle swarm optimization. In Section 4, we present a new
approach to mine association rules. Section 5 illustrates our
experimental results.
2. Association rule mining
2.1. Problem definition
Association rule mining is formally defined as follows. Let
$I = \{i_1, i_2, \ldots, i_m\}$ be a set of Boolean attributes called items
and $S = \{s_1, s_2, \ldots, s_n\}$ be a multi-set of records representing
data instances or transactions, where each record or data instance $s_i \in S$ is built from non-repeated attributes
of $I$. If a Boolean attribute is present in a data instance
$s_i$, its value is 1; if it is absent, its value is 0.
For example, let $I = \{A, B, C\}$ be a set of Boolean attributes
and let $S = \{\langle A, B \rangle, \langle C \rangle, \langle C \rangle\}$ be a multi-set of data instances;
the multi-set $S$ can be rewritten as follows:
$$S = \{\langle A = 1, B = 1, C = 0 \rangle, \langle A = 0, B = 0, C = 1 \rangle, \langle A = 0, B = 0, C = 1 \rangle\}$$
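The Boolean encoding described above can be illustrated with a minimal Python sketch. The function name `encode` and the use of Python sets for records are assumptions made for illustration; they are not part of the original formulation.

```python
def encode(transactions, items):
    """Return one 0/1 row per transaction, with columns ordered like `items`."""
    return [[1 if item in record else 0 for item in items]
            for record in transactions]

items = ["A", "B", "C"]
# The multi-set S is kept as a list so that the duplicate record <C> survives.
S = [{"A", "B"}, {"C"}, {"C"}]
rows = encode(S, items)
for row in rows:
    print(dict(zip(items, row)))
```

Each row corresponds to one data instance of the example, with 1 marking a present attribute and 0 an absent one.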
For a categorical attribute, instead of having one attribute in I,
we have as many Boolean attributes as the attribute has values.
For example, the more general multi-set of data instances S given by:
$\frac{|C \& P|}{|C|}$.
Quantum computing (QC) is an emerging field that draws on several specialties: physics, engineering, chemistry, computer science and mathematics. QC uses the specific properties of quantum
mechanics to process and transform data stored in
two-state quantum bits, or Q-bits for short. A Q-bit can take
the state value 0, 1, or a superposition of the two states at the same
time. The state of a Q-bit can be represented as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$,
where $\alpha$ and $\beta$ are the amplitudes of $|0\rangle$ and $|1\rangle$, respectively, in this state. When we measure this Q-bit, we see 0 with
probability $|\alpha|^2$ and 1 with probability $|\beta|^2$, such that
$|\alpha|^2 + |\beta|^2 = 1$.
The idea of superposition makes it possible to represent an
exponential number of states with a small number of Q-bits.
Together with quantum laws such as interference and the linearity
of quantum operations, this makes quantum computing potentially more
powerful than classical machines.
In order to exploit the power of quantum computing effectively, it is necessary to create efficient quantum algorithms.
A quantum algorithm consists of applying a succession of
quantum operations to quantum systems. Shor (1994) demonstrated that QC could efficiently solve a problem for which no polynomial-time classical algorithm is known,
by describing a polynomial-time quantum algorithm for factoring numbers (factoring is in NP but is not known to be NP-complete).
One of the best-known algorithms is the Quantum-inspired
Evolutionary Algorithm (QEA) (Han and Kim, 2002), which
is inspired by the concept of quantum computing. This
algorithm was first used to solve the knapsack problem (Han
and Kim, 2002) and was then used to solve other
NP-complete problems such as the Traveling Salesman Problem (Talbi
et al., 2004) and Multiple Sequence Alignment (Layeb et al.,
2006, 2008).
Meanwhile, particle swarm optimization (PSO) has demonstrated good performance in many function and parameter
optimization problems. PSO is a population-based optimization strategy. It is initialized with a group of random particles
and then updates their velocities and positions with the following formulas:
$$v_{t+1} = v_t + c_1 \cdot \mathrm{rand}() \cdot (\mathrm{pbest}_t - \mathrm{present}_t) + c_2 \cdot \mathrm{rand}() \cdot (\mathrm{gbest}_t - \mathrm{present}_t)$$
$$\mathrm{present}_{t+1} = \mathrm{present}_t + v_{t+1}$$
where $v_t$ is the particle velocity and $\mathrm{present}_t$ is the current particle. $\mathrm{pbest}_t$ and $\mathrm{gbest}_t$ are defined as the individual best and
the global best. $\mathrm{rand}()$ is a random number in [0, 1]. $c_1$, $c_2$
are learning factors; usually $c_1 = c_2 = 2$ (Wang et al., 2006).
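The velocity and position update can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the article: the function name `pso_step`, the list-based particle representation, and drawing a fresh random number for every dimension are assumptions.

```python
import random

def pso_step(present, velocity, pbest, gbest, c1=2.0, c2=2.0, rng=random):
    """One PSO update: returns the new position and the new velocity."""
    new_velocity = [
        v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (gb - x)
        for v, x, pb, gb in zip(velocity, present, pbest, gbest)
    ]
    # present_{t+1} = present_t + v_{t+1}
    new_present = [x + v for x, v in zip(present, new_velocity)]
    return new_present, new_velocity

position, speed = pso_step(
    present=[1.0, 2.0], velocity=[0.5, -0.5],
    pbest=[1.0, 2.0], gbest=[1.0, 2.0],
)
print(position, speed)
```

When a particle already sits on both its individual best and the global best, the two attraction terms vanish and the particle simply keeps its current velocity, as in the call above.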
In the next section we will tailor the hybrid Quantum
Swarm Evolutionary Algorithm (QSE) (Wang et al., 2006) to
the problem of mining association rules.
4. The QSE-RM approach
In this section we first present QEA-RM for association rule
mining and then give a PSO version of QEA-RM named
QSE-RM.
In order to show how QEA concepts have been tailored to
the problem of association rule mining, a formulation of the
problem in terms of quantum representation is presented and
a Quantum Swarm Evolutionary Algorithm for association
rule mining, QSE-RM, is derived.
4.1. Quantum representation
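QEA-RM uses a representation based on the concept
of a string of Q-bits, called a multiple Q-bit, defined as follows:
$$Q = \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_m \\ \beta_1 & \beta_2 & \cdots & \beta_m \end{bmatrix}$$
where $|\alpha_t|^2 + |\beta_t|^2 = 1$, $t = 1, \ldots, m$, and $m$ is the number of Q-bits. A Quantum Evolutionary Algorithm with the multiple Q-bit representation has better diversity than a classical genetic
algorithm since it can represent a superposition of states. A single
multiple Q-bit with three Q-bits such as:
$$\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \frac{1}{2} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & \frac{\sqrt{3}}{2} \end{bmatrix}$$
is enough to represent the following system with eight states:
$$\frac{1}{4}|000\rangle + \frac{\sqrt{3}}{4}|001\rangle - \frac{1}{4}|010\rangle - \frac{\sqrt{3}}{4}|011\rangle + \frac{1}{4}|100\rangle + \frac{\sqrt{3}}{4}|101\rangle - \frac{1}{4}|110\rangle - \frac{\sqrt{3}}{4}|111\rangle$$
This means that the probabilities of the states 000,
001, 010, 011, 100, 101, 110, 111 are 1/16, 3/16,
1/16, 3/16, 1/16, 3/16, 1/16, 3/16, respectively; these are the squared amplitudes, and they sum to 1. In a
genetic algorithm, by contrast, eight chromosomes are needed to encode these states.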
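The state probabilities of a multiple Q-bit can be checked numerically with a short Python sketch. The function name `state_probabilities` and the representation of a multiple Q-bit as a list of (alpha, beta) pairs are assumptions made for illustration. Only the magnitudes of the amplitudes are used, since signs do not affect the probabilities.

```python
import math
from itertools import product

def state_probabilities(qbits):
    """Map each binary state to the product of its per-Q-bit probabilities."""
    probs = {}
    for bits in product((0, 1), repeat=len(qbits)):
        p = 1.0
        for (alpha, beta), bit in zip(qbits, bits):
            # A 1 is observed with probability beta^2, a 0 with alpha^2.
            p *= beta ** 2 if bit else alpha ** 2
        probs["".join(map(str, bits))] = p
    return probs

r = 1 / math.sqrt(2)
qbits = [(r, r), (r, r), (0.5, math.sqrt(3) / 2)]
probs = state_probabilities(qbits)
for state, p in probs.items():
    print(state, p)
print("total:", sum(probs.values()))
```

Three Q-bits thus carry a distribution over all eight binary states, and the printed total confirms that the distribution is normalized.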
For the data instances $S$ of Section 2.1, given by
$S = \{\langle A = 1, B = 1, C = 0 \rangle, \langle A = 0, B = 0, C = 1 \rangle, \langle A = 0, B = 0, C = 1 \rangle\}$,
one would have a multiple Q-bit representation made up of 3 Q-bits.
4.2. Measurement
The measurement of a single Q-bit projects the quantum state
onto one of the basis states associated with the measuring device. The process of measurement changes the state to the one
measured. A multiple Q-bit measurement can be treated as
a series of single Q-bit measurements that yields a binary solution
P. In association rules, a 1 in P means that the
corresponding item or attribute value is present in P, whereas a 0 means that the corresponding item or attribute value is
absent from P.
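A measurement of this kind can be sketched in Python as below, using the sampling rule given for QEA-RM (emit 1 when a uniform random number falls below the squared beta amplitude). The names `measure` and `items_present`, and the (alpha, beta) pair representation, are assumptions made for illustration.

```python
import math
import random

def measure(qbits, rng=random):
    """Observe each Q-bit once: 1 with probability beta^2, otherwise 0."""
    return [1 if rng.random() < beta ** 2 else 0 for _, beta in qbits]

def items_present(solution, items):
    """List the items whose bit is 1 in the binary solution P."""
    return [item for item, bit in zip(items, solution) if bit == 1]

amp = 1 / math.sqrt(2)
individual = [(amp, amp)] * 3          # every item is present with probability 1/2
P = measure(individual, random.Random(42))
print(P, items_present(P, ["A", "B", "C"]))
```

Repeated calls on the same Q-bit individual generally return different binary solutions, which is where the population diversity of the representation comes from.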
4.3. Structure of QEA-RM
The Quantum-inspired Evolutionary Algorithm for association rule mining (QEA-RM) is described as follows:
Procedure QEA-RM
begin
  t ← 0
  initialize population of Q-bit individuals Q(t)
  project Q(t) into binary solutions P(t)
  compute fitness of P(t)
  generate association rule from each P(t) if there is any
  store the best solutions among P(t)
  while (not end-condition) do
    t ← t + 1
    project Q(t − 1) into binary solutions P(t)
    compute fitness of P(t)
    generate association rule from each P(t) if there is any
    update Q(t) using Q-gate
    store the best solutions among P(t)
  end
end
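The overall loop can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the fitness function is a toy stand-in (number of 1 bits) for the rule fitness F, the rule-generation step is omitted, and the Q-gate is replaced by a simplified rotation that moves each Q-bit by at most `delta` toward the corresponding bit of the best stored solution. All function names are hypothetical.

```python
import math
import random

def observe(q, rng):
    """Project a Q-bit individual onto a binary solution."""
    return [1 if rng.random() < beta * beta else 0 for _, beta in q]

def rotate_toward(q, best, delta):
    """Simplified gate: turn each Q-bit by at most delta toward best's bit.
    Internal convention of this sketch: alpha = cos(theta), beta = sin(theta)."""
    out = []
    for (alpha, beta), bit in zip(q, best):
        theta = math.atan2(beta, alpha)
        target = math.pi / 2 if bit else 0.0   # |1> at pi/2, |0> at 0
        theta += max(-delta, min(delta, target - theta))
        out.append((math.cos(theta), math.sin(theta)))
    return out

def qea(fitness, m, pop_size=10, generations=100,
        delta=0.01 * math.pi, seed=0):
    rng = random.Random(seed)
    amp = 1 / math.sqrt(2)
    Q = [[(amp, amp)] * m for _ in range(pop_size)]   # initialize Q(0)
    P = [observe(q, rng) for q in Q]                  # project Q(0) into P(0)
    best = max(P, key=fitness)                        # store the best solution
    for _ in range(generations):
        P = [observe(q, rng) for q in Q]              # project Q(t-1) into P(t)
        Q = [rotate_toward(q, best, delta) for q in Q]  # update Q(t)
        challenger = max(P, key=fitness)
        if fitness(challenger) > fitness(best):       # store the best solution
            best = challenger
    return best

def toy_fitness(bits):
    """Assumed stand-in for the rule fitness: counts the 1 bits."""
    return sum(bits)

best = qea(toy_fitness, m=8)
print(best, toy_fitness(best))
```

As the Q-bits rotate toward the stored best solution, the observations become less random and the population gradually converges toward a single state.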
Table 1 Lookup table.
x_i   b_i   f(x) ≥ f(b)   Δθ_i
0     0     False         0
0     0     True          0
0     1     False         0
0     1     True          Delta
1     0     False         Delta
1     0     True          Delta
1     1     False         Delta
1     1     True          Delta
The values of $\alpha_i$ and $\beta_i$ are initialized with $1/\sqrt{2}$. The step "project Q(t) into binary solutions P(t)" generates binary solutions
by observing the states of the population Q(t): for each Q-bit in a multiple Q-bit we generate a random number between 0 and 1; if
random(0, 1) < $|\beta_i|^2$ we generate 1, otherwise we generate 0. In
the step "compute fitness of P(t)", each binary solution in P(t) is
evaluated with the fitness value computed by the formula F of
Section 2.2. The step "update Q(t) using Q-gate" is defined
as follows (Han and Kim, 2002):
Procedure update Q(t)
begin
  i ← 0
  while (i < m) do
    i ← i + 1
    determine Δθ_i with the lookup table
    [α'_i β'_i]^T = U(Δθ_i) [α_i β_i]^T
  end
end
The quantum gate $U(\Delta\theta_i)$ is a variable operator; it can be
chosen according to the problem. We use the quantum gate defined in Han and Kim (2002) as follows:
$$U(\Delta\theta_i) = \begin{bmatrix} \cos(\xi(\Delta\theta_i)) & -\sin(\xi(\Delta\theta_i)) \\ \sin(\xi(\Delta\theta_i)) & \cos(\xi(\Delta\theta_i)) \end{bmatrix}$$
where $\xi(\Delta\theta_i) = s(\alpha_i, \beta_i)\,\Delta\theta_i$; $s(\alpha_i, \beta_i)$ and $\Delta\theta_i$ represent the
rotation direction and angle, respectively. The lookup table is
presented in Table 1. Delta is the step size and should be
designed to suit the application problem. There is so far no theoretical basis for choosing it, although
it is usually set to a small value; many applications set
Delta = 0.01π. The function f(x) (resp. f(b)) is the profit of
the binary solution x (resp. the best solution b). For example, if
the condition f(x) ≥ f(b) is satisfied and $x_i$, $b_i$ are 1 and 0,
respectively, we can set the value of $\Delta\theta_i$ to 0.01π and $s(\alpha_i, \beta_i)$
to +1, −1, or 0 according to the condition on $\alpha_i$, $\beta_i$, so as to
increase the probability of the state $|1\rangle$.
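The Q-gate update for a single Q-bit can be sketched in Python as below. This is a hedged reading of the lookup table rather than a definitive implementation: the function names are hypothetical, the sign rule is one interpretation of the s(α_i, β_i) columns, and the cases where either direction is admissible (an amplitude equal to zero) are resolved by a random choice, which is an assumption.

```python
import math
import random

DELTA = 0.01 * math.pi  # step size quoted in the text

def delta_theta(x_i, b_i, x_better, delta=DELTA):
    """Rotation angle: 0 for the first three table rows, delta otherwise.
    `x_better` stands for the condition f(x) >= f(b)."""
    if x_i == 0 and (b_i == 0 or not x_better):
        return 0.0
    return delta

def direction(x_i, b_i, x_better, alpha, beta, rng=random):
    """Rotation direction s(alpha_i, beta_i), as read from the lookup table."""
    if delta_theta(x_i, b_i, x_better) == 0.0:
        return 0
    # Rows that should raise the probability of observing 1.
    toward_one = x_i == 1 and (b_i == 1 or x_better)
    amplitude_product = alpha * beta
    if amplitude_product > 0:
        return 1 if toward_one else -1
    if amplitude_product < 0:
        return -1 if toward_one else 1
    either = rng.choice((-1, 1))       # assumed tie-break for the +/-1 entries
    if alpha == 0:
        return 0 if toward_one else either
    return either if toward_one else 0  # remaining case: beta == 0

def update_qbit(alpha, beta, x_i, b_i, x_better, rng=random):
    """Apply the rotation gate U with angle s(alpha, beta) * delta_theta."""
    xi = direction(x_i, b_i, x_better, alpha, beta, rng) * delta_theta(x_i, b_i, x_better)
    c, s = math.cos(xi), math.sin(xi)
    return c * alpha - s * beta, s * alpha + c * beta

r = 1 / math.sqrt(2)
new_alpha, new_beta = update_qbit(r, r, x_i=1, b_i=0, x_better=True)
print(new_alpha, new_beta, new_beta ** 2)
```

In the call above, x_i = 1, b_i = 0 and f(x) ≥ f(b) hold, so the rotation raises the beta amplitude and with it the probability of observing 1, while keeping the Q-bit normalized.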
4.4. Structure of QSE-RM
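In order to introduce QSE-RM we first present the quantum angle. A
quantum angle (Wang et al., 2006) is defined as an arbitrary
angle $\theta$, and a Q-bit is presented as $[\theta]$. Then $[\theta]$ is equivalent
to the original Q-bit $\begin{bmatrix} \sin\theta \\ \cos\theta \end{bmatrix}$, which satisfies
$|\sin\theta|^2 + |\cos\theta|^2 = 1$. The multiple Q-bit
$\begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_m \\ \beta_1 & \beta_2 & \cdots & \beta_m \end{bmatrix}$
could be replaced by: $[\theta_1 \; \theta_2 \; \ldots \; \theta_m]$.
The common rotation gate [...]

Table 1 (continued) Rotation direction s(α_i, β_i).
x_i   b_i   f(x) ≥ f(b)   α_i β_i > 0   α_i β_i < 0   α_i = 0   β_i = 0
0     0     False         0             0             0         0
0     0     True          0             0             0         0
0     1     False         0             0             0         0
0     1     True          -1            +1            ±1        0
1     0     False         -1            +1            ±1        0
1     0     True          +1            -1            0         ±1
1     1     False         +1            -1            0         ±1
1     1     True          +1            -1            0         ±1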
Table 2
No.   Attribute        Attribute values
1     Parents          [...]
2     Has_nurs         [...]
3     Form             [...]
4     Children         [...]
5     Housing          [...]
6     Finance          [...]
7     Social           [...]
8     Health           [...]
9     Recommendation   [...]
[...] and available from the UCI machine learning repository (http://www.archive.ics.uci.edu/ml/). The Nursery database was derived
from a hierarchical decision model originally developed to
rank applications for nursery schools (Bohanec and Rajkovic,
1990).
The Nursery database contains 12,960 instances and 9 attributes, all of them categorical. The structure of the Nursery database is given in Table 2.
As in Lopes et al. (1999), we have specified three
goal attributes, namely Recommendation, Social and Finance.
A threshold on execution time is fixed. In all cases, our results
are better than those found by GA-PVMINER.
Table 3
Table 4
1
C&P
Fitness
J-measure
0.33
0.40003
0.00005144
0.40036
0.00059339
1440
0.40010
0.00016954
4320
0.40005
0.00008476
Rule
C&P
Fitness
J-measure
855
0.98
0.40011
0.00017626
0.40038
0.00062905
720
References
Agrawal, R., Imielinski, T., Swami, S., 1993a. Mining association rules
between sets of items in large databases. In: Buneman, P., Jajodia,
S. (Eds.), Proceedings of the 1993 ACM SIGMOD International
Conference on Management of Data, Washington, DC, May 26-28,
pp. 207-216.
Agrawal, R., Imielinski, T., Swami, S., 1993b. Mining association rules
between sets of items in large databases. SIGMOD Record 22 (2),
207-216 (ACM Special Interest Group on Management of Data).
Angiulli, F., Ianni, G., Palopoli, L., 2001. On the complexity of mining
association rules. In: Proc. Nono Convegno Nazionale su Sistemi
Evoluti di Basi di Dati (SEBD), pp. 177-184.
Bohanec, M., Rajkovic, V., 1990. Expert system for decision making.
Sistemica 1 (1), 145-157.
Han, K.H., Kim, J.H., 2002. Quantum-inspired Evolutionary Algorithm for a class of combinatorial optimization. IEEE Transactions
on Evolutionary Computation 6 (6), 580-593.
Kennedy, J., Eberhart, R.C., 1995. Particle swarm optimization. In:
Proceedings of the IEEE International Conference on Neural
Networks, vol. 9, Australia, pp. 2147-2156.
Layeb, A., Meshoul, S., Batouche, M., 2006. Multiple sequence
alignment by quantum genetic algorithm. In: Proceedings of the
IEEE Conference of the International Parallel and Distributed
Processing Symposium (IPDPS2006), Rhodes Island, Greece,
April 25-29.
Layeb, A., Meshoul, S., Batouche, M., 2008. Quantum genetic
algorithm for multiple RNA structural alignment. In: IEEE
Proceedings of the Second Asia International Conference on
Modelling and Simulation (AMS 2008), Kuala Lumpur, Malaysia,
May 13-15.
Lopes, H.S., Araujo, D.L.A., Freitas, A.A., 1999. A parallel genetic
algorithm for rule discovery in large databases. In: IEEE Systems,
Man and Cybernetics Conf., pp. 940-945.
Melab, M., El-Ghazali, T., 2000. A parallel genetic algorithm for rule
mining. In: IPDPS, IEEE Computer Society.
Pei, J., Han, J., Yin, Y., 2000. Mining frequent patterns without
candidate generation. In: ACM SIGMOD Int. Conference on
Management of Data.
Shor, P.W., 1994. Polynomial-time algorithms for prime factorization
and discrete logarithms on a quantum computer. In: Proceedings of
the 35th Annual Symposium on Foundations of Computer Science,
Santa Fe, NM, November 20-22.
Talbi, T., Draa, A., Batouche, M., 2004. A quantum inspired genetic
algorithm for solving the traveling salesman problem. In: Proceedings of the IEEE ICIT 04, Tunisia, December 8-10.
Wang, Y., Feng, X., Huang, Y., Pu, D., Zhou, W., Liang, Y., Zhou,
C., 2006. A novel quantum swarm evolutionary algorithm and its
applications. Neurocomputing 70 (4-6), 633-640.