Balancing Security Overhead and Performance Metrics Using A Novel Multi-Objective Genetic Approach


International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org  Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 3, May-June 2013, ISSN 2278-6856

Wasim Khalil Shalish (1), A. Z. Ghalwash (2), H. M. El-Deeb (3) and K. Badran (4)
(1) Syrian Armed Forces, (2) Helwan University, (3) Modern University, (4) Military Technical College

Abstract: With the fast progress of networking, data mining, and information-sharing techniques, preserving the privacy of sensitive information in a database has become a vital issue. The mission of association rule mining is to discover hidden relationships between items in a database, revealing frequent item sets and strong association rules. Some rules or frequent item sets are called sensitive because they contain critical information that is vital or private for its owner.
In earlier GA-based research, users have tried to combine (or aggregate) multiple objectives into a single scalar function, using different weights for each objective or by adding penalty functions for specific objectives. These methods, however, add more adjustable parameters that require profound domain knowledge, which is usually not available. In addition, the solutions generated are usually very sensitive to small changes in these weights or penalty functions. In this paper, we propose a method that removes those constraints by using multi-objective fitness functions, leaving the user free to minimize or maximize many objectives depending on his or her problem. We investigate the problem using a Multi-Objective Genetic Algorithm to find the optimum state of modification. Finally, we establish several experiments and test our approach on datasets. The experimental results show that the number of sensitive rules in the sanitized data set (hiding failure) is equal to zero. The number of non-sensitive patterns discovered from the original database D and the sanitized database is different: since we hide most of the patterns considered sensitive from the original data set, the miss cost (MC) is equal to 36%. The percentage of the discovered patterns that are artifacts (AP) is 27%. The percentage of dissimilarity (DISS) between the original and the sanitized datasets is 26%. The number of non-sensitive association rules removed as an effect of the sanitization process is four.

Keywords: MOPP, MOGA, DBMS, MST, MCT and
GA.
1-INTRODUCTION
Nowadays, the successful application of data mining techniques has been demonstrated in many areas that benefit commercial, social, and human activities. Along with the success of these techniques, they pose a threat to privacy: one can easily disclose others' sensitive information or knowledge by using them. So, before releasing a database, sensitive information or knowledge must be hidden from unauthorized access. To solve this privacy problem, Privacy-Preserving Data Mining (PPDM) has become a hotspot in the data mining and database security fields.
Recent advances in data mining algorithms have increased the risk of information leakage and the associated confidentiality issues. Because of this progress, a parallel research area has started in order to overcome information leakage risks and immunize the mining environment. Privacy preservation against mining algorithms is a research area that investigates the side-effects of data mining methods that derive from the privacy diffusion of persons and organizations. Minimizing these side-effects can be considered an optimization problem.

Optimization Technique
Optimization techniques are used for problems in which one needs to minimize or maximize a real function by methodically choosing the values of real or integer variables from within an allowed set. They find the "best available" values of some objective function over a defined domain, covering a variety of objective function types and domain types. Many optimization techniques and algorithms are used in various kinds of approaches. In this paper we use the genetic algorithm for minimizing the cost function.

Genetic Algorithm
The genetic algorithm (GA) is an optimization and search technique based on the principles of genetics and natural selection. GA allows a population composed of many individuals to evolve under particular selection rules to a state that maximizes the fitness (i.e., minimizes the cost function).
In GA, a population consists of a group of individuals called chromosomes that represent complete solutions to a certain problem. Each chromosome is a sequence of 0s and 1s. The initial population is a randomly generated set of individuals. A new population is generated by one of two methods: the steady-state genetic algorithm and the generational genetic algorithm. The steady-state genetic algorithm replaces one or two members of the population, whereas the generational genetic algorithm replaces all of them at each generation of progression. In this work a steady-state genetic algorithm is adopted as the
population replacement method. This method tries to
keep a certain number of the best individuals from each
generation and copies them to the new generation.
Each transaction is represented as a chromosome: the occurrence of the i-th item in the transaction is indicated by a 1 and its non-occurrence by a 0 in the i-th bit of the chromosome. The fitness of a chromosome is determined by several methods and different strategies. Each population consists of several chromosomes, and the best chromosomes are used to generate the next population. For the initial population, a large number of random transactions is preferred. Based on survival fitness, the population evolves into the next generation.
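As an illustration of this encoding, the following minimal Python sketch builds bit-vector chromosomes from transactions; the item universe and the sample transactions are hypothetical, not taken from the paper's datasets.

# Minimal sketch of the chromosome encoding described above: bit i of a
# chromosome is 1 if the i-th item occurs in the transaction and 0 otherwise.
items = ["A", "B", "C", "D"]          # ordered universe of distinct items

def encode(transaction, items):
    """Return the bit-vector chromosome of one transaction."""
    return [1 if item in transaction else 0 for item in items]

transactions = [{"A", "B", "D"}, {"B"}, {"A", "C", "D"}]
population = [encode(t, items) for t in transactions]
print(population)                     # [[1, 1, 0, 1], [0, 1, 0, 0], [1, 0, 1, 1]]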

Fitness function
The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. Once the genetic representation and the fitness function are defined, GA proceeds to initialize a population of solutions randomly and then improves it through repeated application of the mutation, crossover, inversion, and selection operators.

Selection
In the selection process, the individuals producing offspring are chosen. The selection step is preceded by fitness assignment, which is based on the objective value. This fitness is then used for the actual selection process.
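The paper does not name a specific selection operator, so the sketch below uses fitness-proportional (roulette-wheel) selection purely as one plausible illustration.

import random

def roulette_select(population, fitnesses):
    """Pick one chromosome with probability proportional to its (positive) fitness."""
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    running = 0.0
    for chromosome, fitness in zip(population, fitnesses):
        running += fitness
        if running >= pick:
            return chromosome
    return population[-1]             # numerical safety net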

Crossover
The main function of the crossover operation in genetic algorithms is to blend two chromosomes to generate new offspring (children) [1]. Crossover occurs only with some probability (the crossover probability); chromosomes not subjected to crossover remain unmodified. The idea behind crossover is the exploration of new solutions and the exploitation of old ones. Chromosomes with better fitness have a higher chance of being selected than inferior ones, so good solutions tend to survive into the next generation. Different crossover operators have been developed for various purposes; single-point and multi-point crossover are the most common. In this paper single-point crossover is applied to create new offspring.
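A minimal sketch of the single-point crossover just described, assuming equal-length bit-vector chromosomes; the crossover probability is an illustrative value, not one reported in the paper.

import random

def single_point_crossover(parent1, parent2, crossover_prob=0.8):
    """Cut both parents at one random point and swap the tails."""
    if random.random() > crossover_prob:        # no crossover: children are copies
        return parent1[:], parent2[:]
    point = random.randint(1, len(parent1) - 1) # cut point between genes
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2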

Mutation
Mutation is a genetic operator that alters one or more
gene values in a chromosome from its initial state. This
can result in entirely new gene values being added to the
gene pool. With these new gene values, the genetic algorithm may be able to arrive at a better solution than was previously possible.
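A corresponding bit-flip mutation sketch; the mutation probability is again an illustrative value.

import random

def mutate(chromosome, mutation_prob=0.01):
    """Flip each bit independently with a small probability."""
    return [bit ^ 1 if random.random() < mutation_prob else bit
            for bit in chromosome]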
First, the sensitive items and the number of modifications required for each sensitive item are initialized. Next, the fitness function is evaluated for each transaction. Based on these fitness values, transaction selection is carried out in the third step. After the selection process, frequent items are updated through the crossover operation. Crossover is the main process of the genetic algorithm, so in this step most of the frequent items become infrequent. The remaining items are modified in the mutation process. Once the conditions are ensured, i.e., all the sensitive items are modified, the process is completed and the execution is terminated. Finally, the Apriori algorithm is applied to the modified database to find the frequent item sets and generate the sensitive rules. Now we have to ensure that all the sensitive rules are hidden, no false rules are generated from the dataset, and the non-sensitive items are not affected.
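This final verification can be expressed directly in terms of support and confidence; the sketch below assumes transactions are sets of items and each rule is an (antecedent, consequent) pair of item sets. The helper names are ours, not the authors'.

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    sup_ant = support(antecedent, transactions)
    return 0.0 if sup_ant == 0 else support(antecedent | consequent, transactions) / sup_ant

def all_sensitive_hidden(sensitive_rules, sanitized, mst, mct):
    """True if no sensitive rule still reaches both thresholds in the sanitized data."""
    return all(support(a | c, sanitized) < mst or confidence(a, c, sanitized) < mct
               for a, c in sensitive_rules)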
The rest of the paper is organized as follows: Section 2 gives a summary of state-of-the-art methodologies and related work on privacy-preserving data mining and association rule hiding with dataset sanitization. Section 3 describes the problem formulation and explains the major concepts upon which we base the proposal for the new privacy-preserving framework. Section 4 introduces our proposed solution for dataset sanitization against association rule mining. Section 5 presents the experiments we performed on large-scale datasets to demonstrate our approach and to prove the effectiveness of our method. Finally, the conclusion is given in Section 6.

2- RELATED WORK
Researchers have proposed several approaches to knowledge hiding in the context of association rule hiding. Chirag et al. [2] introduced two heuristic blocking-based algorithms named ISARC (Increase Support of Common Antecedent of Rule Clusters) and DSCRC (Decrease Support of Common Consequent of Rule Clusters) to preserve privacy for sensitive association rules. The proposed algorithms cluster the sensitive rules based on certain criteria and hide them in a few selected transactions by using unknowns (?). They preserve a certain level of privacy for sensitive rules in the database while maintaining knowledge discovery.
A new multi-objective method for hiding sensitive association rules based on the concept of genetic algorithms was introduced in [3]. The main purpose of this method is to fully support the security of the database while keeping the utility and certainty of the mined rules at the highest level. In their work, the authors use four sanitization strategies: confidence, support, hybrid, and max-min. They introduced the idea of both rule and item set sanitization, which complements the older idea behind data sanitization.
In [4], two algorithms were proposed: ISL (Increase Support of LHS) and DSR (Decrease Support of RHS). Predicting items are given as input to both algorithms to automatically hide sensitive association rules without pre-mining and selection of the hidden rules.
In [5], two algorithms, DCIS (Decrease Confidence by
Increase Support) and DCDS (Decrease Confidence by
Decrease Support) were proposed to automatically hide
collaborative recommendation association rules without
pre-mining and selection of hidden rules. The ISL and DCIS algorithms try to increase the support of the left-hand side of the rule, whereas the DSR and DCDS algorithms try to decrease the support of the right-hand side of the rule. It is observed that ISL requires more running time than DSR, and both algorithms exhibit contrasting side effects. The DSR algorithm shows no hiding failure (0%), few new rules (5%), and some lost rules (11%); the ISL algorithm shows some hiding failure (12.9%), many new rules (33%), and no lost rules (0%). The DCIS algorithm requires more running time than DCDS, and DCIS and DCDS also exhibit contrasting side effects similar to the ISL and DSR algorithms. The DCDS algorithm shows no hiding failure (0%), few new rules (1%), and some lost rules (4%); the DCIS algorithm shows no hiding failure (0%), many new rules (75%), and no lost rules (0%).
In [6], an algorithm DSC (Decrease Support and Confidence) was proposed, in which a pattern-inversion tree is used to store the related information so that only one scan of the database is required. The proposed algorithm can automatically sanitize informative rule sets without pre-mining and selection of a class of rules, using a single database scan. About 4% new rules are generated and about 9% of the rules are lost by the DSC algorithm, and it also shows hiding failure for two predicting items.
A border-based approach was presented in [7-9]. It hides sensitive association rules by modifying the borders in the lattice of the frequent and infrequent itemsets of the original database. The itemsets that lie on the borderline separating the frequent and infrequent itemsets form the borders.
In [10, 11], an exact approach was provided. This approach contains non-heuristic algorithms that formulate the hiding process as a constraint satisfaction problem or an optimization problem, which is solved by integer programming. These algorithms can provide an optimal hiding solution with ideally no side effects.
The related works described above use different performance metrics; most of them use hiding failure, dissimilarity, miss cost, artificial patterns, and side effects.

Performance evaluation measures for the association
rules
The efficiency of the association rule mechanisms can be
characterized by the following measures:
Dissimilarity quantifies the difference between the
original and the sanitized datasets by comparing their
histograms, where the horizontal axis contains the items
in the dataset and the vertical axis corresponds to their
frequencies. It is calculated as follows:
$\mathrm{Diss}(D, D') = \dfrac{1}{\sum_{i=1}^{n} f_D(i)} \times \sum_{i=1}^{n} \left[ f_D(i) - f_{D'}(i) \right]$   (1)

where $f_D(i)$ and $f_{D'}(i)$ represent the frequency of the i-th item in the dataset D and D' respectively, and n is the number of distinct items in the original dataset D.
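A minimal sketch of Equation (1), computing the item-frequency histograms of both datasets over the same item universe; the plain difference is used to match the equation as written, although some formulations use the absolute difference.

def dissimilarity(original, sanitized, items):
    """Diss(D, D') from Equation (1); datasets are lists of item sets."""
    freq_d = {i: sum(1 for t in original if i in t) for i in items}
    freq_dp = {i: sum(1 for t in sanitized if i in t) for i in items}
    total = sum(freq_d.values())
    return sum(freq_d[i] - freq_dp[i] for i in items) / total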
Miss Cost (MC) quantifies the percentage of the
nonrestrictive patterns that are hidden as side-effects of
the sanitization process. It is computed as follows:
$\mathrm{MC} = \dfrac{|\tilde{R}_P(D)| - |\tilde{R}_P(D')|}{|\tilde{R}_P(D)|}$   (2)

where $\tilde{R}_P(D)$ is the set of all non-sensitive rules mined from the original database D and $\tilde{R}_P(D')$ is the set of all non-sensitive rules mined from the sanitized database D'. As one can notice, there is a compromise between the miss cost and the hiding failure: the more sensitive association rules need to be hidden, the more legitimate association rules are expected to be missed.
Similar to the measure of miss cost, Side-Effect Factor
(SEF) is used to quantify the amount of non-sensitive
association rules that are removed as an effect of the
sanitization process. It is defined as follows:
$\mathrm{SEF} = \dfrac{|P| - \left( |P'| + |R_P(D)| \right)}{|P| - |R_P(D)|}$   (3)
Artificial patterns (AP) quantify the percentage of the discovered patterns that are artifacts. They are computed as follows:
$\mathrm{AP} = \dfrac{|P'| - |P \cap P'|}{|P'|}$   (4)
where P is the set of association rules discovered in the original dataset D and P' is the set of association rules discovered in D'.
Hiding Failure (HF) quantifies the percentage of the
sensitive patterns that remain exposed in the sanitized
dataset. It is defined as the fraction of the restrictive
association rules that appear in the sanitized database
divided by the ones that appeared in the original dataset,
formally:
$\mathrm{HF} = \dfrac{|R_P(D')|}{|R_P(D)|}$   (5)

where $R_P(D')$ corresponds to the sensitive rules discovered in the sanitized dataset D', and $R_P(D)$ to the sensitive rules appearing in the original dataset D.
Ideally, the hiding failure should be 0%. The performance
metrics for privacy preserving association rule mining
algorithms are given in [12].
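All of these measures can be computed directly from the mined rule sets; the following is a minimal sketch, assuming each rule is represented as a hashable object (e.g., a string or a frozenset pair) so that Python set operations apply.

def hiding_failure(sensitive_d, sensitive_dp):
    """HF (Eq. 5): sensitive rules still minable from the sanitized dataset."""
    return len(sensitive_dp) / len(sensitive_d)

def miss_cost(nonsensitive_d, nonsensitive_dp):
    """MC (Eq. 2): non-sensitive rules lost by the sanitization."""
    return (len(nonsensitive_d) - len(nonsensitive_dp)) / len(nonsensitive_d)

def side_effect_factor(rules_d, rules_dp, sensitive_d):
    """SEF (Eq. 3): non-sensitive rules removed as a side effect."""
    return (len(rules_d) - (len(rules_dp) + len(sensitive_d))) / (len(rules_d) - len(sensitive_d))

def artificial_patterns(rules_d, rules_dp):
    """AP (Eq. 4): fraction of rules mined from D' that did not exist in D."""
    return (len(rules_dp) - len(rules_d & rules_dp)) / len(rules_dp)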

3. PROBLEM FORMULATION
A sample transaction database D, taken from [13], is shown in Table 1. TID shows the unique transaction number, and the binary-valued items show whether an item is present or absent in that transaction. Suppose MST and MCT are selected to be 50% and 70%, respectively. Table 2 shows the sensitive rules satisfying the MST, generated from the sample database D.

The association rules satisfying the MST and MCT are then generated by the Apriori algorithm [14]. Suppose the rules R1 and R2 of Table 2 are specified as sensitive and should be hidden in the sanitized database.
The problem of privacy preserving in association rule mining (so-called association rule hiding) addressed by this paper can be formulated as follows:
Given a transaction database D, a minimum support threshold (MST), a minimum confidence threshold (MCT), a set of significant association rules R mined from D, and a set of sensitive rules to be hidden, generate a new database D' such that the sensitive rules can no longer be mined from D' under the same MST and MCT, while no normal (non-sensitive) rules in R are falsely hidden (lost rules) and no extra spurious (ghost) rules are mistakenly mined after the rule-hiding process.

Table 1: Sample database D
TID   Items      Items (binary form)
0     0, 1, 3    1101
1     1          0100
2     0, 2, 3    1011
3     0, 1       1100
4     0, 1, 3    1101

Table 2: Sensitive rules
R1
R2
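As a worked check on the sample database of Table 1 (items written as integer identifiers), the sketch below computes the support and confidence of the illustrative rule 0 => 1; the paper's actual sensitive rules R1 and R2 are not spelled out in the extracted text, so this rule is only an example.

# Transactions of Table 1, with items as integer IDs.
transactions = [{0, 1, 3}, {1}, {0, 2, 3}, {0, 1}, {0, 1, 3}]

def support(itemset, db):
    return sum(1 for t in db if itemset <= t) / len(db)

sup = support({0, 1}, transactions)          # 3/5 = 0.60 >= MST (0.50)
conf = sup / support({0}, transactions)      # 0.60 / 0.80 = 0.75 >= MCT (0.70)
print(sup, conf)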


4. PROPOSED SOLUTION
4.1 Security and Association rule Mining Trade
The association rule hiding problem can be considered a variation of the well-known database inference control problem in statistical and multilevel databases. The primary goal in database inference control is to guard access to sensitive information that could be obtained through non-sensitive data and inference rules. In association rule hiding, we consider that it is not the data itself but rather the sensitive association rules that create a breach of privacy.
For simplicity of presentation and without loss of generality, we make the following assumptions in this implementation:
We want to extract all association rules that satisfy the minimum support threshold (MST) and the minimum confidence threshold (MCT). Support is a measure of the frequency of a rule; confidence is a measure of the strength of the relation between the item sets of a rule. Association rule mining algorithms scan the database of transactions and calculate the support and confidence of the candidate rules to determine whether they are significant or not. A rule is significant if its support and confidence are higher than the user-specified minimum support and minimum confidence thresholds. In this way, algorithms do not retrieve all possible association rules that can be derived from a dataset, but only the small subset that satisfies the minimum support and minimum confidence requirements set by the users.
The Apriori association rule mining algorithm works as follows: it finds all the item sets that appear frequently enough to be considered relevant and then derives from them the association rules that are strong enough to be considered interesting. The major goal here is to prevent some of these rules, which we refer to as "sensitive rules", from being revealed. We want to hide association rules in the best way using a multi-objective genetic algorithm. We are also interested in investigating the performance metrics of association rule hiding (hiding failure (HF), dissimilarity (DISS), artificial patterns (AP), side-effect factor (SEF), and miss cost (MC)).
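A compact sketch of the frequent-itemset phase of Apriori just described (rule generation is omitted); transactions are sets of items, min_sup is a fraction, and the function and variable names are ours.

def apriori_frequent_itemsets(transactions, min_sup):
    """Level-wise search for all itemsets with support >= min_sup."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t})
    frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    all_frequent, k = list(frequent), 2
    while frequent:
        # join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = [c for c in candidates if support(c) >= min_sup]
        all_frequent.extend(frequent)
        k += 1
    return all_frequent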
Figure (1) presents the basic architecture of a database
system with the association rule mechanism.



Figure 1 Architecture of a database application with the
association rule procedure
4.2 Security and Association Rule Mining Trade
using Optimization
In this paper we study the privacy breaches incurred from certain types of association rules. In doing so, we suppose that a certain subset of the association rules extracted from a specific dataset is considered sensitive/critical. Our major goal is then the modification of the original data source in such a way that it becomes impossible for an adversary to mine the sensitive rules from the modified data set, while, as a minor goal, all the remaining non-sensitive information and/or knowledge stays as close as possible to that of the original set.
The method developed in this paper uses a binary transactional dataset as input and modifies the original dataset, based on the concept of genetic algorithms for privacy preserving of association rules, to find the best solution for sanitizing the original dataset under multi-objective optimization, in such a way that all of the sensitive rules become hidden and minimum modification is performed
in the original dataset. The most common style of transaction modification is distortion of the original database (i.e., replacing 1s with 0s and vice versa). We adopt this style of modification in our method. Modification of the dataset, however, causes many side-effect problems.
The modification process can affect the original set of rules that can be mined from the original database, either by hiding rules that are not sensitive (lost rules) or by introducing rules into the mining of the modified database that were not supported by the original database (ghost rules). We have tried to minimize these unwanted effects by a minimum and suitable modification of the original dataset. The steps of our work are shown in Figure 2.


Figure 2 Multi objectives privacy preserving (MOPP)

The following steps illustrate the methodology of the
proposed solution:
Step 1: Consider a transactional database with a set of
items and transactions.
Step 2: Write two external files one for original data set
and one for sensitive rules.
Step 3: Convert every chromosome to a double value and store it in the population, then convert that value back to a binary value.
Step 4: Create the input file for the Apriori algorithm.
Step 5: The Apriori algorithm is used to find the frequent item sets based on the minimum support threshold.
Step 6: From the frequent item sets, the set of association rules is generated based on the minimum support and confidence thresholds.
Step 7: Select the sensitive rules from the set of association rules.
Step 8: Read the association rules from the output file and put them in a structure for comparison with the sensitive rules.
Step 9: Compare the association rules with the sensitive rules to calculate the first element of the fitness vector (f1).
Step 10: Compare the chromosome with the original dataset to calculate the second element of the fitness vector (f2).
Step 11: The genetic algorithm is used to modify the items based on the fitness function.
Step 12: Repeat steps 5, 6, and 7 for the modified data set.
Step 13: Verify that (i) all the sensitive rules are hidden, (ii) no non-sensitive rules are hidden, and (iii) no false rules are generated.
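A high-level sketch of the loop formed by steps 5-12 is shown below; the mining, fitness, selection, crossover, and mutation operators are passed in as the functions described earlier, and every name here is a placeholder rather than the authors' code. For brevity the sketch ranks candidates lexicographically by (f1, f2); the actual method keeps the non-dominated Pareto set instead (see Section 4.2).

def mopp_sanitize(original_db, sensitive_rules, mst, mct,
                  mine_rules, fitness_vector, select, crossover, mutate,
                  pop_size=20, max_generations=200):
    """Evolve candidate sanitized datasets until every sensitive rule is hidden."""
    population = [mutate(original_db) for _ in range(pop_size)]   # initial candidates
    best = original_db
    for _ in range(max_generations):
        # steps 5-10: mine each candidate and compute its fitness vector (f1, f2)
        scored = [(fitness_vector(cand, original_db, sensitive_rules,
                                  mine_rules(cand, mst, mct)), cand)
                  for cand in population]
        scored.sort(key=lambda pair: pair[0])
        best = scored[0][1]
        if scored[0][0][0] == 0.0:               # f1 = hiding failure reached zero
            break
        # step 11: selection, crossover, mutation produce the next population
        parents = select(scored)
        population = [mutate(child)
                      for p1, p2 in zip(parents[::2], parents[1::2])
                      for child in crossover(p1, p2)]
    return best                                   # the step-13 checks are run on this result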
To express these activities in mathematical terms, the multi-objective optimization problem can be formulated as follows:
Find the vector $y = [y_1, y_2, \ldots, y_n]^T$ which satisfies the $m$ inequality constraints and the $p$ equality constraints

$g_i(y) \ge 0, \quad i = 1, 2, \ldots, m$   (6)

$h_i(y) = 0, \quad i = 1, 2, \ldots, p$   (7)

and optimizes (here we assume minimization) the vector function

$f(y) = [f_1(y), f_2(y), \ldots, f_k(y)]^T$   (8)

where $y = [y_1, y_2, \ldots, y_n]^T$ is the vector of decision variables, and the constraints given by equations (6) and (7) define the feasible region $F$.

Traditional Technique
Convert the multi-objective optimization problem into a single-objective problem, i.e., find one optimal solution by combining the objectives through weighting:

$F = w_1 f_1(x) + w_2 f_2(x) + \ldots + w_n f_n(x)$

where $w_1 + w_2 + \ldots + w_n = 1$.

Proposed technique
Keep the problem as a multi-objective optimization problem, i.e., find the Pareto-optimal solutions.
We say that a vector of decision variables $y \in F$ is optimal if there is no other $x \in F$ such that $f_i(x) \le f_i(y)$ for all $i = 1, \ldots, k$ and $f_j(x) < f_j(y)$ for at least one $j$.
A vector $u = (u_1, u_2, \ldots, u_k)$ is said to dominate $v = (v_1, v_2, \ldots, v_k)$ (denoted by $u \preceq v$) if and only if $u$ is partially less than $v$, i.e., $\forall i \in \{1, \ldots, k\}: u_i \le v_i \;\wedge\; \exists i \in \{1, \ldots, k\}: u_i < v_i$.
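The dominance test above translates directly into code; together with a non-dominated filter it yields the "menu" of Pareto-optimal solutions mentioned below (minimization of both objectives is assumed).

def dominates(u, v):
    """u dominates v iff u is no worse in every objective and strictly better in at least one."""
    return all(ui <= vi for ui, vi in zip(u, v)) and any(ui < vi for ui, vi in zip(u, v))

def pareto_front(fitness_vectors):
    """Keep only the non-dominated fitness vectors."""
    return [u for u in fitness_vectors
            if not any(dominates(v, u) for v in fitness_vectors if v != u)]

# e.g. with (hiding failure, dissimilarity) pairs:
print(pareto_front([(0.0, 0.30), (0.0, 0.26), (0.1, 0.20)]))   # drops (0.0, 0.30)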
Our fitness vector consists of two elements:

$f_1 = \text{Hiding Failure} = \dfrac{|Sen_R(D')|}{|Sen_R(D)|}$   (9)

$f_2 = \text{Dissimilarity} = \dfrac{1}{\sum_{i=1}^{n} f_D(i)} \times \sum_{i=1}^{n} \left[ f_D(i) - f_{D'}(i) \right]$   (10)

where $Sen_R(D)$ and $Sen_R(D')$ denote the sensitive rules minable from the original and sanitized datasets, $f_D(i)$ and $f_{D'}(i)$ represent the frequency of the i-th item in the datasets D and D' respectively, and n is the number of distinct items in the original dataset D.
Further, we can choose from a menu of Pareto-optimal solutions according to our problem.
The main contributions are focused on three points: first, a new proposed algorithm for hiding sensitive association rules using a multi-objective genetic algorithm and a modification of the old mathematical model; second, achieving a balance between security and performance in the
database; and third, the evaluation of the hiding performance of our work.
5. DISCUSSION AND EXPERIMENTAL
RESULTS
5.1 Experimental Setup
The Congressional Voting Records data set [15] includes the votes of each member of the U.S. House of Representatives on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea); voted against, paired against, and announced against (these three simplified to nay); and voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition). Number of instances: 435 (267 Democrats, 168 Republicans). Number of attributes: 16 + class name = 17 (all Boolean valued).
A sample of the transaction database D taken from [15] is shown in Table 3. TID shows the unique transaction number. Suppose MST and MCT are selected as 25% and 58%, respectively.

Table 3: Sample data set

TID  Class Name  handicapped-infants  water-project-cost-sharing  adoption-of-the-budget-resolution  physician-fee-freeze  el-salvador-aid  religious-groups-in-schools
1    republican  n  y  n  y  y  y
2    republican  n  y  n  y  y  y
3    democrat    ?  y  y  ?  y  y
4    democrat    n  y  y  n  ?  y
5    democrat    y  y  y  n  y  y
6    democrat    n  y  y  n  y  y
7    democrat    n  y  n  y  y  y
8    republican  n  y  n  y  y  y
9    republican  n  y  n  y  y  y
10   democrat    y  y  y  n  n  n

5.2 Association Rules Mining Methodology using
optimization
Table 4 shows the frequent rules satisfying the MST, generated from the sample database D; the number of association rules satisfying both MST and MCT generated by the Apriori algorithm is 20. Suppose the rule (el-salvador-aid=y 212 ==> religious-groups-in-schools=y 197) is specified as sensitive and should be hidden in the sanitized database. The transactions that contain the sensitive items constitute the population, and the fitness function is applied to the chromosomes of this population. After applying the crossover and mutation operations, the sensitive items of the original database are modified on the basis of the fitness function in order to keep the privacy of the database. After the modification, the Apriori algorithm is applied again to verify that all the sensitive rules are hidden under the same support and confidence thresholds. We then evaluate the performance and security metrics (hiding failure, dissimilarity, artificial patterns, side effect, miss cost).

Table 4: Best rules extracted from the original dataset with MCT=0.58 and MST=0.25
No.  Rule
1    adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class Name=democrat
2    adoption-of-the-budget-resolution=y physician-fee-freeze=n aid-to-nicaraguan-contras=y 198 ==> Class Name=democrat
3    physician-fee-freeze=n aid-to-nicaraguan-contras=y 211 ==> Class Name=democrat 210
4    physician-fee-freeze=n education-spending=n 202 ==> Class Name=democrat 201
5    physician-fee-freeze=n 247 ==> Class Name=democrat 245
6    Class Name=democrat el-salvador-aid=n 200 ==> aid-to-nicaraguan-contras=y 197
7    el-salvador-aid=n 208 ==> aid-to-nicaraguan-contras=y 204
8    el-salvador-aid=y 212 ==> religious-groups-in-schools=y 197

Table 5: Association rule evaluation performance results
Parameters Results
HF 0%
MC 36%
AP 27%
DISS 26%
SEF 4

As shown in Table 5 and Figure 3.a, the number of sensitive rules in the sanitized data set is equal to zero; most of the developed privacy-preserving algorithms are designed with the goal of obtaining zero hiding failure. Thus, we hide all the patterns considered sensitive from the original data set. The number of non-sensitive patterns discovered from the original database D and from the sanitized database is different: since we hide most of the patterns considered sensitive from the original data set, the MC is equal to 36%, as shown in Figure 3.b. The percentage of the discovered patterns that are artifacts (AP) is 27%, as shown in Figure 3.c. The percentage of dissimilarity (DISS) between the original and the sanitized datasets is 26%, as shown in Figure 3.d. The number of non-sensitive association rules removed as an effect of the sanitization process is four, as shown in Figure 3.e.



6. Conclusion
The drawbacks of the traditional techniques in [2, 3, 4, 5 and 6] are that the weight values are unknown, so they must be assumed in advance, and there is no guarantee of achieving zero hiding failure together with high performance. These methods also add more adjustable parameters that require profound domain knowledge, which is usually not available. In addition, the solutions generated in [2, 3, 4, 5 and 6] are usually very sensitive to small changes in these weights or penalty functions.
The proposed approach attacks the problem of balancing security and performance metrics in a generic way, since we optimize between hiding failure as the security overhead and AP, DISS, SEF and MC as the database performance metrics.
The approach generates multiple solutions rather than only a single biased solution.
This approach could be used in a tailored fashion, especially in military applications or in civilian applications with dynamic policies, concentrating on either security, performance, or both.
References
[1] D. Whitley, "A genetic algorithm tutorial," Colorado State University, 1994.
[2] M. Chirag, et al., "An Efficient Solution for Privacy Preserving Association Rule Mining," (IJCNS) International Journal of Computer and Network Security, Vol. 2, No. 5, May 2010.
[3] M. Dehkordi, "A Novel Method for Privacy Preserving in Association Rule Mining Based on Genetic Algorithms," Journal of Software (JSW), Vol. 4, No. 6, 2009.
[4] S. Wang, B. Parikh, A. Jafari, "Hiding informative association rule sets," Expert Systems with Applications, Elsevier, pp. 316-323, 2006.
[5] S. Wang, D. Patel, et al., "Hiding collaborative recommendation association rules," Springer Science+Business Media, LLC, 2007.
[6] S. Wang, R. Maskey, et al., "Efficient sanitization of informative association rules," Expert Systems with Applications: An International Journal, Vol. 35, Issue 1-2, July 2008.
[7] G. Moustakides, V. Verykios, "A max-min approach for hiding frequent itemsets," Data and Knowledge Engineering, pp. 75-89, 2008.
[8] G. Moustakides, V. S. Verykios, "A max-min approach for hiding frequent itemsets," in Workshops Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), pp. 502-506, 2006.
[9] X. Sun, P. Yu, "Hiding sensitive frequent itemsets by a border-based approach," Computing Science and Engineering, pp. 74-94, 2007.
[10] A. Divanis, V. Verykios, "An Integer Programming Approach for Frequent Itemset Hiding," in Proc. ACM Conf. on Information and Knowledge Management (CIKM '06), Nov. 2006.
[11] A. Divanis, V. Verykios, "Exact Knowledge Hiding through Database Extension," IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 5, pp. 699-713, May 2009.
[12] C. Aggarwal, P. Yu, "Privacy-Preserving Data Mining: Models and Algorithms," Springer, Heidelberg, pp. 267-286, 2008.
[13] K. Duraiswamy, D. Manjula, "Advanced Approach in Sensitive Rule Hiding," Modern Applied Science, Vol. 3, No. 2, 2009.
[14] C. Clifton, M. Kantarcioglu, J. Vaidya, "Defining Privacy for Data Mining," in Proceedings of the US Nat'l Science Foundation Workshop on Next Generation Data Mining, pp. 126-133, 2002.
[15] J. Schlimmer, "Concept acquisition through representational adjustment," Doctoral dissertation, Department of Information and Computer Science, University of California, Irvine, CA, 1987.
