2012 Ijact
Hemanta Kumar Bhuyan (1), Sanjit Kumar Dash (2), Subrata Roy (3) and Dillip Swain (4)
(1) Dept of CSE, Mahavir Institute of Engg. & Technology, Bhubaneswar, Odisha, India
(2) Department of IT, College of Engineering & Technology, Bhubaneswar, Odisha, India
(3) Dept of CSE, Mahavir Institute of Engg. & Technology, Bhubaneswar, Odisha, India
(4) Department of CSA, Indira Gandhi Institute of Technology, Saranga, Odisha, India
Abstract
Privacy is an important issue in many peer-to-peer data mining applications, and analyzing
privacy-sensitive data in privacy preserving data mining is a challenging task. It is often assumed that
parties in a multiparty environment are well behaved and adhere to predefined protocols. Each party is
responsible for performing its computation, communicating correct values to others and protecting the
privacy of the data, and it should not collude over other parties' data during computation or
communication. This assumption of well-behaved parties does not carry over to real-life applications,
where self-interested parties try to maximize their own benefit, even by colluding. This paper proposes
a scalable privacy preserving algorithm based on secure average computation for decentralized networks.
The proposed methodology deals with distributed computation over data stored at different peers in a
distributed network. The paper focuses on modifying existing privacy preserving distributed data mining
(PPDDM) protocols to incorporate incentives or penalties so that the protocol reaches a desired
equilibrium.
1. Introduction
Privacy preserving data mining is becoming increasingly popular in multi-party application domains where
the data is distributed among many nodes in a network. Analysis of multi-party privacy-sensitive data
plays a vital role in a large number of applications. Privacy preserving distributed data mining (PPDDM)
algorithms make strong assumptions about the behavior of the participating parties. The parties may be
honest or semi-honest, where a semi-honest party may intentionally change its strategy during multiparty
computation. Some may conspire with other parties to disturb the computation or disclose private data.
We therefore use a privacy preserving computation with a penalty algorithm to protect the data of every
party and to penalize parties that conspire over the data or the computation. Semi-trusted mixer based
PPDDM for resource constrained devices is discussed in [1]. Additive noise and multiplicative bias as
disclosure limitation techniques for continuous microdata are discussed in [9]. A hybrid protection
technique that combines privacy preserving data mining with anti-data mining, adding data perturbation
and noisy data, is described in [5]. This paper offers an alternative technique, based on secure average
computation, for recognizing the parties that disturb the computation. We argue that assuming
semi-honest participant behavior is sub-optimal and propose a penalty-based mechanism for a series of
secure average computations.
The rest of this paper is organized as follows. Section 2 discusses the related work. Section 3 describes
an approach for privacy preservation in a decentralized network. Section 4 illustrates secure average
multiparty computation with penalty. Section 5 outlines the penalty for a semi-honest party. Section 6
presents the experimental results. Finally, Section 7 concludes the paper.
2. Related Work
Distributed data mining deals with data analysis in settings where the data, computing nodes,
communication cost, computational tasks and users are distributed. It has been studied in both
centralized and decentralized environments with multiple sites, and has been developed for both
homogeneous and heterogeneous data distributions [12, 13, 17]. Peer-to-peer data mining operates over a
massive network of autonomous participating nodes under decentralized administration, with all of their
activities monitored [18]. It requires highly scalable and communication-efficient algorithms for
distributed data mining, using approximate or exact techniques [14]. Several approaches and models have
been implemented for privacy preservation in distributed data mining with collaborative computation
among parties [11, 15]. Game theory has also been used in privacy preserving distributed data mining
(PPDDM) with multiparty computation [16].
Existing techniques for privacy preserving data mining (PPDM) include micro-aggregation, perturbation,
anonymization, rule hiding, swapping and secure multiparty computation (SMC). SMC tools for PPDM (e.g.,
secure sum, secure set union, secure size of set intersection, scalar product) are discussed in [2].
With SMC, a node can evaluate a function over its own data and the data of other nodes while exchanging
only minimal information.
In many practical problems, game theory is needed to make decisions in situations where two or more
parties have conflicting interests and the action of one party depends on the actions of the others.
Game theory has also been applied to PPDM algorithms in the distributed scenario. Secret sharing and a
randomized practical mechanism for multiparty computation among nodes are considered by Halpern and
Teague [4]. Resilient secret sharing is addressed by Abraham et al. [3] and Shamir [8]. A measure of the
privacy of PPDM was proposed in [10]. Interdependent security, which is closely related to PPDM, was
proposed in [6, 7], where several algorithms are also presented.
Privacy Preservation with Penalty in Decentralized Network using Multiparty Computation
3. Privacy Preservation in Decentralized Network
Let the PPDDM algorithm require the $i$th node to perform a sequence of $j$ tasks $T_{i1}, T_{i2}, \ldots,
T_{ij}$. The messages sent and received by the $i$th node are $S_{i1}, S_{i2}, \ldots, S_{ij}$ and
$R_{i1}, R_{i2}, \ldots, R_{ij}$ respectively. Sometimes a node may decide not to send or receive
messages to or from other nodes, or may try to disturb other nodes during computation. The framework is
built so that nodes are not aware of each other's data, but they may still disturb the computation. As a
result, the communication and computation cost varies during the computation of a task. Each node
participates in the system only to input its data, without knowing the result of another party's
computation. Nodes obtain the result after each round of task computation.
In the current scenario, the framework prescribes the actions for computation, communication and
conspiracy during computation. Node $i$ is characterized by a set of indicators $I = \{I_i^{T}, I_i^{R},
I_i^{S}, I_i^{R-S}, I_i^{C}, I_i^{E}, I_i^{FR}, I_i^{CM}\}$, where $I_i^{T}$ is the number of tasks to be
computed, $I_i^{R}$ and $I_i^{S}$ are indicator variables for receiving and sending messages,
$I_i^{R-S}$ indicates the time period used by node $i$, $I_i^{C}$ the conspiracy time, $I_i^{E}$ the
computation time, $I_i^{FR}$ the receipt of the final result from the initiator, and $I_i^{CM}$ the
communication period between two nodes $(i, i-1)$. The computation period should be the same for all
nodes; if it varies, the node must have conspired. If $I_i^{E} \neq I_i^{R-S}$, node $i$ has violated its
task: it has tried to conspire over the result while computing with its own data. In this framework, no
node has spare time to think about other nodes' data or to analyze the result. When a node receives a
message from another node, it is bound to compute the task; otherwise it must transfer the task to
another node within the stipulated time. The protocol tries to keep the system in an equilibrium state
where no node can gain any advantage by using a different strategy. The initiator, as head of the
system, keeps all information about the nodes. It also records how many bad nodes are involved in the
system and how many times each bad node has tried to become an honest node. The framework is designed so
that each node can participate as an honest node; if it tries to conspire over the task, it has a few
chances to prove itself honest; and if it does not obey the protocol at all, it is terminated from the
system.
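As a rough illustration of the indicator set and the violation test (computation time differing from the time period used by the node), the following Python sketch may help. The class name, field names and the tolerance parameter are hypothetical, and reading the R-S indicator as a receive-to-send time slot is an assumption, not something the framework fixes.

```python
from dataclasses import dataclass


@dataclass
class NodeIndicators:
    """Indicators of one node for one round (all field names are illustrative)."""
    tasks: int            # I^T: number of tasks to be computed
    received: int         # I^R: messages received
    sent: int             # I^S: messages sent
    slot_time: float      # I^{R-S}: time period used by the node (assumed: receive-to-send)
    conspire_time: float  # I^C: time spent conspiring
    compute_time: float   # I^E: time spent on the prescribed computation
    final_result: int     # I^{FR}: 1 if the final result was received from the initiator
    comm_time: float      # I^{CM}: communication period with the neighbouring node


def has_violated(ind: NodeIndicators, tolerance: float = 1e-9) -> bool:
    """Flag a node whose computation time differs from the time period it used.

    The tolerance is an assumption added so that floating-point timings
    can be compared; the framework itself states a plain inequality.
    """
    return abs(ind.compute_time - ind.slot_time) > tolerance
```

A node that spends its whole slot computing is not flagged, while a node whose slot is longer than its computation time (for example because part of it went into conspiring) is.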
The utility function in PPDDM is a linear or non-linear function $f$ of the utilities obtained by the
choice of framework in the respective dimensions of computation, communication, conspiracy and privacy
attack. The parameters required to describe the utility function are the indicators described above.
Mathematically, the utility function can be expressed in terms of the individual utilities:
$$U_i(x) = f\left(I_i^{T}, I_i^{R}, I_i^{S}, I_i^{R-S}, I_i^{C}, I_i^{E}, I_i^{FR}, I_i^{CM}\right) \qquad (1)$$
The utility function assigns a score to each individual parameter and is a weighted linear combination
of all of the above dimensions:
$$U_i(x) = a_i^{T}\sum C_{T}(I_i^{T}) + a_i^{R}\sum C_{R}(I_i^{R}) + a_i^{S}\sum C_{S}(I_i^{S}) + a_i^{R-S}\sum C_{R-S}(I_i^{R-S}) + a_i^{C}\sum C_{C}(I_i^{C}) + a_i^{E}\sum C_{E}(I_i^{E}) + a_i^{FR}\sum C_{FR}(I_i^{FR}) + a_i^{CM}\sum C_{CM}(I_i^{CM}) \qquad (2)$$
where $a_i^{T}, a_i^{R}, a_i^{S}, a_i^{R-S}, a_i^{C}, a_i^{E}, a_i^{FR}$ and $a_i^{CM}$ are the weights for
the corresponding utility factors and $C$ is the cost function associated with each individual utility
factor. In the above utility function, the following utility factors are fixed for all nodes except
conspiring nodes: (1) $a_i^{T}\sum C_{T}(I_i^{T})$ (2) $a_i^{R}\sum C_{R}(I_i^{R})$ (3) $a_i^{S}\sum
C_{S}(I_i^{S})$ (4) $a_i^{R-S}\sum C_{R-S}(I_i^{R-S})$ (5) $a_i^{FR}\sum C_{FR}(I_i^{FR})$ (6)
$a_i^{CM}\sum C_{CM}(I_i^{CM})$.
Let the sum of the above six utility factors be $P$ and the remaining utility be $Q$, where $P$ alone
describes a purely honest node and $Q$ is included when conspiring nodes are present. Equation (2) then
becomes
$$U_i(x) = P + Q \qquad (3)$$
where
$$P = a_i^{T}\sum C_{T}(I_i^{T}) + a_i^{R}\sum C_{R}(I_i^{R}) + a_i^{S}\sum C_{S}(I_i^{S}) + a_i^{R-S}\sum C_{R-S}(I_i^{R-S}) + a_i^{FR}\sum C_{FR}(I_i^{FR}) + a_i^{CM}\sum C_{CM}(I_i^{CM})$$
$$Q = a_i^{C}\sum C_{C}(I_i^{C}) + a_i^{E}\sum C_{E}(I_i^{E})$$
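The split of the weighted utility into an honest part P and a conspiracy part Q can be sketched in Python as follows. This is only an illustration: the dictionary keys, the function name and the choice to sum each cost over a list of per-round observations are assumptions, since the text does not state what the sums range over.

```python
from typing import Callable, Dict, Sequence, Tuple

# Factor labels are illustrative shorthand for the eight indicators.
HONEST_FACTORS = ("T", "R", "S", "R-S", "FR", "CM")  # terms collected in P
CONSPIRACY_FACTORS = ("C", "E")                      # terms collected in Q


def utility(
    indicators: Dict[str, Sequence[float]],
    weights: Dict[str, float],
    costs: Dict[str, Callable[[float], float]],
) -> Tuple[float, float, float]:
    """Return (P, Q, U) with U = P + Q for one node.

    Each term is weight * sum(cost(observation)); the observations are
    assumed here to be one value per round.
    """
    def term(key: str) -> float:
        return weights[key] * sum(costs[key](v) for v in indicators[key])

    p = sum(term(k) for k in HONEST_FACTORS)
    q = sum(term(k) for k in CONSPIRACY_FACTORS)
    return p, q, p + q
```

With zero-valued conspiracy indicators the function returns Q = 0 and U = P, which is the purely honest case.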
Initially, if the system contains only honest nodes, then $Q = 0$ and $U_i(x) = P$. In our proposed
methodology, if the system contains conspiring nodes, the utility $Q$ decreases and converges to zero
towards the last round of the computation. In the last few rounds, some of the conspiring
nodes may be terminated or converted to honest nodes. After several rounds of computation, the number of
semi-honest nodes in the system decreases gradually. At some later time $Q$ converges to zero, and then
$$a_i^{R-S}\sum C_{R-S}(I_i^{R-S}) = a_i^{E}\sum C_{E}(I_i^{E}) \qquad (4)$$
so $a_i^{C}\sum C_{C}(I_i^{C})$ automatically converges to zero. When equation (4) is satisfied in the
system, all nodes in the system have become honest; any node that is still semi-honest must be
terminated or is bound to become an honest node.
4. Secure Average Multiparty Computation with Penalty
Secure multiparty computation here follows the secure average protocol, which computes the average of
the values held by n different nodes without disclosing the individual value of any node. Each node
holds a value V that is used in the computation, and each value is kept within a fixed domain. The
secure average computation protocol has been described in [2].
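A minimal Python sketch of a ring-based secure average, in the style of the secure sum tool from [2], is given below for orientation. It runs all nodes in one process, so it only illustrates the arithmetic and not the message passing. The function name, the modulus parameter and the requirement that the true sum stay below the modulus are assumptions of this sketch.

```python
import random
from typing import Optional, Sequence


def secure_average(values: Sequence[int], modulus: int,
                   rng: Optional[random.Random] = None) -> float:
    """Average the values by passing a masked running sum around a ring.

    values[0] belongs to the initiator. Every intermediate node only sees
    a running sum shifted by the initiator's random mask.
    """
    rng = rng or random.Random()
    if len(values) < 2:
        raise ValueError("at least two nodes are required")
    if any(v < 0 for v in values) or sum(values) >= modulus:
        raise ValueError("values must be non-negative and sum to less than the modulus")

    mask = rng.randrange(modulus)            # known to the initiator only
    running = (values[0] + mask) % modulus   # initiator sends its masked value
    for v in values[1:]:                     # each node adds its own value
        running = (running + v) % modulus
    total = (running - mask) % modulus       # initiator removes the mask
    return total / len(values)
```

For example, `secure_average([10, 20, 30, 40], 1000)` returns 25.0 regardless of which mask is drawn.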
The PPDM algorithm requires a privacy model for measuring the threat to each party's private data. The
Bayes optimal privacy model [3] derives the threat associated with data privacy for a secure computation
with conspiring nodes. It uses prior and posterior distributions to quantify privacy. The prior
probability distribution is determined on the original node value. Once the data mining process has been
executed, the participants may have gained some extra information, which is captured by the posterior
probability distribution and is also available to the adversary at the end of the computation. When the
difference between the two distributions exceeds a threshold value, a privacy breach occurs.
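The prior versus posterior comparison can be illustrated with a small Python sketch. The text does not say how the difference between the two distributions is measured; total variation distance is used here purely as an assumed stand-in, and the function names and the dictionary representation are hypothetical.

```python
from typing import Dict, Hashable


def total_variation(prior: Dict[Hashable, float],
                    posterior: Dict[Hashable, float]) -> float:
    """Total variation distance between two discrete distributions.

    This particular distance is an assumption of the sketch; any other
    measure of difference could be substituted.
    """
    support = set(prior) | set(posterior)
    return 0.5 * sum(abs(prior.get(k, 0.0) - posterior.get(k, 0.0)) for k in support)


def privacy_breach(prior: Dict[Hashable, float],
                   posterior: Dict[Hashable, float],
                   threshold: float) -> bool:
    """Report a breach when the adversary's belief shifts by more than the threshold."""
    return total_variation(prior, posterior) > threshold
```

If the adversary starts from a uniform prior over four candidate values and ends up certain of one of them, the distance is 0.75, which counts as a breach for any threshold below that.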
The secure average computation algorithm assumes that semi-honest parties participate in or are
terminated from the system according to their estimated threat. A party does not know where the data it
receives originated; it only receives the computed result from its neighbor node. When a party gets the
result from another party, it immediately performs the computation by combining the received result with
its own data and sends the result to the next party without knowing the next party's information. When
the initiator receives the result, it estimates the presence of semi-honest parties by examining the
result and analyzing the parties' computation time and data, and takes measures to protect the data
according to the estimated threat.
In Figure 1, node V1 is the initiator, which acts as the coordinator in the ring network. Any node can
be the coordinator. The coordinator keeps its own data, details of the computation, the threats of all
nodes, the registration details of all nodes, the communication report and the variation in computation.
Threat is measured against a certain threshold value. When a node falls under threat, it cannot perform
the computation within the specified time period. At some later time it passes the result to the next
node, and the result subsequently reaches the initiator. When the initiator analyzes the result, it
identifies the semi-honest node by examining the computation details and the computation time taken by
each individual node in the log registry. The initiator tolerates a semi-honest node conspiring over the
result for up to a permissible number of rounds; beyond that, the semi-honest node is penalized
according to the framework policies described in the next section. After the first round of computation,
the initiator has the sum of all values and the number of threats together with the registration numbers
of the nodes. It then computes the average of the participating nodes' values from the sum of the last
round and sends the average value to all parties. As the number of rounds increases, the number of
semi-honest nodes decreases gradually, i.e., they convert to honest nodes in order to avoid the penalty.
The algorithm below summarizes the above mechanism:
…………………………………………………………………………………………………………
Algorithm: Privacy Preserving Computation with Penalty
Input: (1) Size of the network = n (2) Node type = H or SH (honest or semi-honest) (3) xj: data present
at each node, within a specified range (4) Exactly one node with node type = H is designated as initiator
Output: Correct vector average
1. Initialize Threat = 0, Threshold = fixed number, Colluding Indicator I = 0
2. For Round = 1 to xj
3.   If (n > 1)
4.     R = rand() mod X
5.     Initiator adds R to its own data and sends the result to the next node
6.     Next node adds its data to the result sent by the initiator
7.     If (data is within the specified range)
8.       Send the computation to the next node
9.     Else
10.      Node type = SH; the node cannot send its data
11.      I++
12.      If (Threat < Threshold)
13.        Threat++ and go to step 7
14.      Else
15.        Remove the node from the system
16.      Endif
17.    Endif
18.  Endif
19. Continue up to the nth node and send the final computational result to the initiator
20. Initiator evaluates the total result V = (received result - R) mod X
21. Calculate avg = V/n
22. Send this avg value to all nodes
………………………………………………………………………………………………………
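One possible reading of a single round of the above algorithm is sketched in Python below. Several points are assumptions of the sketch rather than part of the algorithm as stated: each violation is counted as one strike per round, a flagged node's data is left out of the sum, the average is taken over the nodes that actually contributed, and the function and parameter names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple


def penalized_round(
    data: Dict[str, int],
    in_range: Callable[[int], bool],
    threat: Dict[str, int],
    threshold: int,
    modulus: int,
    rng: random.Random,
) -> Tuple[float, List[str], List[str]]:
    """Run one ring pass; the first key of `data` is the (honest) initiator.

    Mutates `threat` (strike counts) and deletes removed nodes from `data`.
    Returns (average, flagged nodes, removed nodes).
    """
    nodes = list(data)
    mask = rng.randrange(modulus)                        # step 4: R = rand() mod X
    running = (data[nodes[0]] + mask) % modulus          # step 5: initiator starts
    contributors = 1
    flagged: List[str] = []
    removed: List[str] = []
    for node in nodes[1:]:
        if in_range(data[node]):                         # step 7: range check
            running = (running + data[node]) % modulus   # steps 6 and 8
            contributors += 1
            continue
        flagged.append(node)                             # steps 10-11: mark as SH
        if threat.get(node, 0) < threshold:              # step 12
            threat[node] = threat.get(node, 0) + 1       # step 13: one more chance
        else:
            removed.append(node)                         # step 15: terminate
    for node in removed:
        del data[node]
    total = (running - mask) % modulus                   # step 20: remove the mask
    return total / contributors, flagged, removed        # step 21: average
```

Under this reading, a node whose value stays outside the permitted range is tolerated for `threshold` rounds and removed in the following one, while the average over the remaining nodes is unaffected.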
5. Penalty for Semi-honest Party
The proposed methodology creates an environment in which a semi-honest party cannot profit by conspiring
against other parties with its own optimal strategy, because the system includes a penalty that deters
the semi-honest party from disturbing the computation or distorting the data. In some cases the penalty
is harsh enough that semi-honest parties are terminated from the system after violating the protocol
repeatedly. The following rules can be implemented in the framework to penalize a semi-honest party.
Rule-1: A party that violates the protocol has a threshold number of chances to continue participating
in the system. Under the protocol, the party is bound to change its strategy if it disobeys the rules.
Rule-2: If the party continues to violate the protocol beyond the threshold value, it is terminated from
the system.
6. Experimental Result
We set up a simulation environment consisting of a network of 100 nodes, each randomly designated as
honest or semi-honest. The topology is generated using NS-2, and the experiment uses a ring topology.
Figure 2 shows the result of our experiment: the number of semi-honest nodes decreases as the number of
honest nodes increases. It shows that the number of nodes interested in violating the policy reduces to
zero due to the penalty scheme. After every round, each node measures the penalty due to conspiracy. If
the penalty is high, some of the bad nodes decide not to conspire and become honest nodes for the
subsequent rounds.
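The qualitative behavior described here can be mimicked with a toy Python model. It is not the NS-2 experiment and does not reproduce its numbers. The initial fraction of semi-honest nodes, the per-round probability that a semi-honest node turns honest and all names are hypothetical parameters of this sketch.

```python
import random
from typing import List, Optional, Tuple


def simulate(n_nodes: int = 100, p_semi_honest: float = 0.4, threshold: int = 2,
             p_reform: float = 0.5, rounds: int = 5,
             seed: int = 0) -> List[Tuple[int, int]]:
    """Return (honest, semi-honest) counts after each round.

    A node is stored as None if honest, otherwise as its number of
    violations so far. p_reform is an assumed stand-in for a node's
    decision to stop conspiring once it has weighed the penalty.
    """
    rng = random.Random(seed)
    nodes: List[Optional[int]] = [
        0 if rng.random() < p_semi_honest else None for _ in range(n_nodes)
    ]
    history: List[Tuple[int, int]] = []
    for _ in range(rounds):
        survivors: List[Optional[int]] = []
        for strikes in nodes:
            if strikes is None:
                survivors.append(None)           # honest nodes stay honest
            elif rng.random() < p_reform:
                survivors.append(None)           # decides not to conspire any more
            elif strikes < threshold:
                survivors.append(strikes + 1)    # violates again, uses up a chance
            # otherwise: beyond the threshold, the node is terminated
        nodes = survivors
        honest = sum(s is None for s in nodes)
        history.append((honest, len(nodes) - honest))
    return history
```

In this model the semi-honest count can never grow from one round to the next, since nodes only reform or get terminated, which is the trend the penalty scheme is meant to produce.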
[Figure 2: plot of the number of honest (o) and semi-honest (x) nodes against the number of rounds (1 to 5)]
Figure 2. Change in number of bad nodes in the network over successive rounds
7. Conclusion
The field of distributed data mining has seen considerable research in the last decade. The definition
of privacy is still very much an open question, and researchers and experts hold different views about
the concept of data privacy and how the problem of privacy preservation should be solved. The size of
the distributed network and the presence of heterogeneous nodes make privacy preservation more
complicated. The behavior of the participating nodes usually depends on their own objectives and is
guided by the maximization of their personal benefit. When an honest node attempts to access the private
information of another party for its own profit, it becomes a semi-honest node. To reduce the number of
semi-honest nodes in the network, a penalty needs to be incorporated. In this paper we have presented a
privacy preserving data computation algorithm for decentralized distributed networks with a penalty for
semi-honest parties. The proposed algorithm can be applied to large-scale heterogeneous distributed
networks that require privacy preservation of data. The algorithm is highly scalable due to its constant
communication complexity.
References
[1]. M. G. Kaosar and X. Yi. Semi-trusted mixer based privacy preserving distributed data mining for
resource constrained devices. International Journal of Computer Science and Information Security
(IJCSIS), 8(1), April 2010.
[2]. C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin and M. Zhu. Tools for privacy preserving
distributed data mining. ACM SIGKDD Explorations, 4(2), 2003.
[3]. I. Abraham, D. Dolev, R. Gonen and J. Halpern. Distributed computing meets game theory:
Robust mechanisms for rational secret sharing and multiparty computation. In ACM Symposium on
Principles of Distributed Computing, Denver, Colorado, USA, 2006.
[4]. J. Halpern and V. Teague. Rational secret sharing and multiparty computation: extended abstract.
In Proc. of ACM Symposium on Theory of Computing, pages 623-632, Chicago, IL, USA, 2004.
[5]. T. S. Chen, J. Chen and Y. H. Kao. A novel hybrid protection technique of privacy preserving
data mining and anti-data mining. Information Technology Journal, 9(3), 500-505.
[6]. M. Kearns and L. Ortiz. Algorithms for interdependent security games. In Advances in Neural
Information Processing Systems, 2004.
[7]. H. Kunreuther and G. Heal. Interdependent security. Journal of Risk and Uncertainty, 26(2-3),
231-249, 2003.
[8]. A. Shamir. How to share a secret. Communications of the ACM, 22(11), 612-613, 1979.
[9]. M. Trottini, S. E. Fienberg, U. E. Makov and M. M. Meyer. Additive noise and multiplicative bias
as disclosure limitation techniques for continuous microdata: A simulation study. Journal of
Computational Methods in Science and Engg., 4: 5-16, 2004.
[10]. N. Zhang, W. Zhao and J. Chen. Performance measurements for privacy preserving data mining.
In Advances in Knowledge Discovery and Data Mining, pages 43-49, 2005.
[11]. R. Agrawal and R. Srikant. Privacy preserving data mining. In ACM SIGMOD, pages 439-450,
May 2000.
[12]. H. Kargupta and K. Sivakumar. Existential pleasures of distributed data mining. Pages 1-25.
AAAI/MIT Press, 2004.
[13]. M. J. Zaki. Parallel and distributed association mining: A survey. IEEE Concurrency, 7(4): 14-25,
1999.
[14]. S. Datta, K. Bhaduri, C. Giannella, R. Wolff and H. Kargupta. Distributed data mining in
peer-to-peer networks. IEEE Internet Computing, 10(4): 18-26, 2006.
[15]. S. Jha, L. Kruger and P. McDaniel. Privacy preserving clustering. In ESORICS, pages 397-417,
2005.
[16]. H. Kargupta, K. Das and K. Liu. Multi-party, privacy-preserving distributed data mining using a
game theoretic framework. In 11th European Conference on Principles and Practice of Knowledge
Discovery in Databases (PKDD), pages 523-531, 2007.
[17]. Pingshui Wang. Survey on privacy preserving data mining. JDCTA: International Journal of
Digital Content Technology and its Applications, 4(9), 1-7, 2010.
[18]. Xu Wu. Research on privacy preservation in P2P systems. IJACT: International Journal of
Advancements in Computing Technology, 3(8), 324-330, 2011.