Conference LaTeX Template 10 17 19

The document proposes a clustering-based approach for privacy preservation on social networks. It discusses existing approaches to privacy preservation including clustering and graph-based anonymization. The proposed system performs various clustering techniques like K-Member, Greedy K-Member, c-means, one-pass k-means and k-means clustering with generalization and suppression on a dataset to anonymize social network data. The system is implemented in Python and evaluates the clustering results using the silhouette score metric.

Clustering Based Anonymization for Privacy Preservation on Social Network


1st Rutuja Prakash Jagtap 2nd Mukta Ganpat Warungase
Department of Computer Engineering Department of Computer Engineering
K.K. Wagh Institutes of Engg Education and Research, K.K.Wagh Institutes of Engg Education and Research,
Nashik, India Nashik, India
[email protected] [email protected]

3rd Chaitali Rajendra Narayane 4th Kalpesh Bapurao Ahire


Department of Computer Engineering Department of Computer Engineering
K.K.Wagh Institutes of Engg Education and Research, K.K.Wagh Institutes of Engg Education and Research,
Nashik, India Nashik, India
[email protected] [email protected]

5th Prof. Dhanajay M. Kanade


Department of Computer Engineering
K.K.Wagh Institutes of Engg Education and Research,
Nashik, India
[email protected]

Abstract—The current surge in social network popularity has generated massive amounts of data about social network interaction. Because these data contain many personal characteristics about individuals, anonymization is essential. Anonymization is a practical technique for protecting users' privacy when publishing data. This step is required before the data can be used in research and data mining, because properly anonymized data are no longer personal data. Because online social networks (OSNs) include sensitive information about individual members, network data must be anonymized before being made public. In this paper we propose a novel clustering-based method for privacy preservation. The proposed system is implemented in Python and applied to a dataset of social network users. The system is evaluated using the silhouette score, and the results show that the proposed approach outperforms the alternatives compared.
Index Terms—Clustering, Privacy preserving, Social Networking.

I. INTRODUCTION
Recently, social networks [1,2] have received a lot of attention in research and development, partly because more and more social networks are being built online and partly because of the development of Web 2.0 applications. Social networks model social relationships using graph structures with vertices and edges. Vertices model individual social actors in a network, while edges model relationships between social actors. Social network analysis [3, 1, 4, 5] has become a crucial tool in contemporary sociology, geography, economics, and information science as a result of the explosive rise of social networks. Finding hidden social patterns is the aim of social network analysis. The effectiveness of social network analysis has been demonstrated to be substantially greater than that of conventional approaches that concentrate on evaluating the characteristics of individual social actors.
Privacy preservation is a recent research area that consists of two major categories, as shown in Figure 1. One category is Privacy Preservation in Data Mining (PPDM) and the other is Privacy Preservation in Data Publishing (PPDP). In PPDM, after data mining functionalities are applied, the mined patterns can be hidden from third parties (intruders).

Fig. 1. Categories of Privacy Preservation Mechanism

There has been a lot of research on privacy preservation for relational data. A significant category of privacy attacks on relational data re-identifies people by combining a published table holding sensitive information with external tables that model the attacker's background knowledge. Numerous effective algorithms and significant models have been proposed. However, the majority of current studies can only
work with relational data. Applying those procedures directly to social network data is not possible. Comparatively speaking, anonymizing social network data is substantially more difficult [6].
In this paper we present a clustering-based method for privacy preservation in social networks. The proposed system is built in Python with the help of various libraries. The rest of the paper is organized as follows: Section II reviews state-of-the-art systems, Section III explains the proposed model, Section IV shows the results, and Section V concludes the paper.

II. LITERATURE SURVEY
In this section we survey state-of-the-art systems for privacy preservation in social networks. The paper [7] focuses on the findings of the single-pass k-means anonymization technique and the anonymized view of a data collection. The dataset is made anonymous using generalization and suppression techniques. The researchers in [8] analyze the likely issues in the crucial areas of privacy, background knowledge, and data utility. The work focuses on current anonymization techniques for maintaining privacy when disclosing social network data, and acknowledges the challenges of maintaining secrecy while publishing social network information. Clustering-based and graph-based anonymization methods provide the foundation.
The authors of [9] present a k-anonymity approach that minimizes information loss during the generalization process for anonymized data. Since the clustering-based k-anonymity technique employs separate anonymous sets of data and runs in O(n²/k) time, it is crucial to combine related data types into a single group. The authors make a useful comparison between their technique and other clustering-based k-anonymity techniques. The goal of that study is to gain an anonymized view of data without revealing any personal information about the users or their connections to other users. The researchers in [10] offer examples of privacy protection challenges on social media. The sequential clustering-based anonymization process is presented in two variants, starting with the centralized scenario.
A new method called slicing was designed by Li et al. (2012) [11]; it partitions the data both horizontally and vertically. Slicing preserves data utility better than generalization and can be used for membership disclosure protection. Slicing-based privacy-preserving microdata publishing can handle high-dimensional data. Slicing is also employed for attribute disclosure protection, and an efficient algorithm computes sliced data that satisfies the l-diversity requirement. Slicing preserves data utility better than generalization and bucketization in workloads involving the sensitive attribute. Two new anonymization methods were designed by Ghinita et al. (2011) [12] for sparse high-dimensional data. They depend on Nearest-Neighbor (NN) search in high-dimensional spaces, which uses Locality-Sensitive Hashing (LSH). The slicing process is revisited by Vani and Jayanthi (2013) [13] and is used for attribute disclosure protection.
A Privacy Preserving Framework for Supervisory Control and Data Acquisition (PPFSCADA) is a strategy-based permutation method introduced by Adil Fahad et al. (2014) [14], in which data privacy and data mining utility are managed simultaneously. The technique vertically partitions the original dataset to improve the perturbation results. The framework handles many kinds of network traffic data with numerical, categorical, and hierarchical attributes. It also groups the partitioned sets into a number of clusters based on the designed framework. The perturbation process is realized by replacing each original attribute value with a new value.
The work in [15] assumes that the vertices are partitioned into equivalence classes and that each class is appropriately anonymized using an existing relational data anonymization method. Then, to anonymize the social network more effectively, all of the vertices in an equivalence class are collapsed into a single vertex, and it is examined which edges should be included in the collapsed graph. One practical method is to publish the number of edges of each edge type between the vertices of two equivalence classes. This method is known as cluster-edge anonymization.

III. PROPOSED SYSTEM
In this section we present the architecture of our proposed model and its details. Figure 2 depicts the system architecture.

Fig. 2. System Architecture

The proposed model takes a dataset as input. This data is preprocessed to remove redundant, null, and unwanted data. After preprocessing, the dataset is split into training and testing sets. Our proposed model performs several clustering techniques, including K-Member,
Greedy K-Member, c-means, one-pass k-means, and k-means clustering with generalization and suppression on a dataset.
The dataset is first loaded using Pandas and unnecessary columns are dropped. Categorical variables are then encoded, and numerical variables are scaled using the MinMaxScaler class from sklearn.preprocessing.
The first clustering technique applied is K-Member, which is performed using the KMedoids class from sklearn_extra.cluster. The number of clusters is set to 5, and random_state is set to 0. The Silhouette Coefficient is then calculated using the silhouette_score function from sklearn.metrics.
The second clustering technique applied is Greedy K-Member, which is performed using the SpectralClustering class from sklearn.cluster. The number of clusters is set to 5, the affinity is set to 'nearest_neighbors', the number of neighbors is set to 5, and the labels are assigned using 'discretize'. The Silhouette Coefficient is then calculated using the silhouette_score function.
The third clustering technique applied is c-means clustering using k-medoids, which is performed using the KMedoids class from sklearn_extra.cluster. The number of clusters is set to 5, the metric is set to 'euclidean', the initialization is set to 'k-medoids++', and the maximum number of iterations is set to 300. The Silhouette Coefficient is then calculated using the silhouette_score function.
The fourth clustering technique applied is one-pass k-means clustering, which is performed using the MiniBatchKMeans class from sklearn.cluster. The number of clusters is set to 5, random_state is set to 0, and the maximum number of iterations is set to 100. The Silhouette Coefficient is then calculated using the silhouette_score function.
The final clustering technique applied is k-means clustering with generalization and suppression. This is implemented using a for loop that performs the k-means algorithm multiple times, suppressing sensitive attributes and generalizing quasi-identifiers. The number of clusters is set to 5, the maximum number of iterations is set to 100, the convergence threshold is set to 1e-4, the suppression factor for sensitive attributes is set to 0.5, and the generalization factor for quasi-identifiers is set to 0.5. The sensitive attributes and quasi-identifiers are defined, and centroids are initialized by sampling from the dataset. The Silhouette Coefficient is not calculated in this final step.
Finally, the anonymized dataset is written to a CSV file for each clustering technique. The output files include the cluster label of each record.

IV. RESULTS
A. Dataset
To test the efficiency of our system, we downloaded a dataset of Facebook users from Kaggle with features such as gender, marital status, city, zip code, and country. The dataset consists of 1500 records.
B. Evaluation parameter
Silhouette Coefficient
The Silhouette Coefficient is a metric used to evaluate the quality of clustering results. It measures how close each data point is to the other points in its own cluster (cohesion) compared with how far it is from the points in the nearest neighboring cluster (separation). The Silhouette Coefficient ranges from -1 to 1, where a value of 1 indicates that the data point is well matched to its own cluster and poorly matched to neighboring clusters, a value of 0 indicates that the data point is on the boundary between two clusters, and a value of -1 indicates that the data point is poorly matched to its own cluster and well matched to a neighboring cluster. A higher Silhouette Coefficient value indicates a better clustering result. Figure 3 shows the comparative analysis of the results of our proposed system.
C. Noise

Fig. 3. Comparative analysis

V. CONCLUSION
In this paper we present a clustering-based approach for privacy preservation on social networks. Among the different clustering methods analysed, the centroid-based k-means algorithm is the best because it is straightforward and effective. Applying generalization and suppression to the dataset yields an anonymized view of the data. As social network data is much more complicated than relational data, privacy preservation in social networks is much more challenging and needs serious effort in the near future. In particular, modeling adversarial attacks and developing corresponding privacy preservation strategies are critical.

REFERENCES
[1] J. Scott. Social Network Analysis Handbook. Sage Publications Inc., 2000.
[2] B. Wellman. For a social network analysis of computer networks: a sociological perspective on collaborative work and virtual community. In Proceedings of the 1996 ACM SIGCPR/SIGMIS Conference on Computer Personnel Research (SIGCPR'96), pages 1-11, New York, NY, USA, 1996. ACM Press.
[3] L. C. Freeman, D. R. White, and A. K. Romney. Research Methods in Social Network Analysis. George Mason University Press, Fairfax, VA, 1989.
[4] S. Wasserman and K. Faust. Social network analysis: Methods and applications. Cambridge University Press, 1994.
[5] J. Srivastava, M. A. Ahmad, N. Pathak, and D. K.-W. Hsu. Data mining based social network analysis from online behavior. Tutorial at the 8th SIAM International Conference on Data Mining (SDM'08), 2008.
[6] B. Zhou and J. Pei. Preserving privacy in social networks against neighborhood attacks. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE'08), pages 506-515, Cancun, Mexico, 2008. IEEE Computer Society.
[7] R. B. Ghate and R. Ingle. Clustering based anonymization for privacy preservation. 2013.
[8] T. Tassa and D. J. Cohen. Anonymization of centralized and distributed social networks by sequential clustering. IEEE Transactions on Knowledge and Data Engineering, vol. 25, pages 311-324, February 2013.
[9] J.-L. Lin and M.-C. Wei. An efficient method for k-anonymization. In Proceedings of the International Workshop on Privacy and Anonymity in Information Society, pages 46-50, ACM, 2008.
[10] B. Zhou, J. Pei, and W.-S. Luk. A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM Newsletter Journal, vol. 10, pages 12-22, December 2008.
[11] T. Li, N. Li, J. Zhang, and I. Molloy. Slicing: A new approach for privacy preserving data publishing. IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 3, pages 561-574, 2012.
[12] G. Ghinita, P. Kalnis, and Y. Tao. Anonymous publication of sensitive transactional data. IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 2, pages 161-174, 2011.
[13] B. Vani and D. Jayanthi. Efficient approach for privacy preserving microdata publishing using slicing. International Journal of Research in Computer and Communication Technology, 4, 225, 2013.
[14] A. Haghnegahdar, M. Khabbazian, and V. K. Bhargava. Privacy risks in publishing mobile device trajectories. IEEE Wireless Communications Letters, vol. 3, no. 3, pages 241-244, 2014.
[15] E. Zheleva and L. Getoor. Preserving the privacy of sensitive relationships in graph data. In Proceedings of the 1st ACM SIGKDD Workshop on Privacy, Security, and Trust in KDD (PinKDD'07), 2007.
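
The Silhouette Coefficient used as the evaluation parameter in Section IV can be computed directly from its definition. The following is a minimal pure-Python sketch, not the scikit-learn routine the paper relies on; the function names `euclidean` and `silhouette_coefficient` and the four sample points are illustrative assumptions.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_coefficient(points, labels):
    """Mean Silhouette Coefficient over all points (hypothetical helper).

    For each point:
      a = mean distance to the other points in its own cluster (cohesion)
      b = smallest mean distance to the points of any other cluster (separation)
      s = (b - a) / max(a, b)
    Points in singleton clusters are given s = 0.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        # Guard against coincident points, where both distances are zero.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up): two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # each cluster straddles both groups
```

On this toy data the labeling that follows the two groups scores close to 1, while the labeling that mixes them scores below 0, which matches the interpretation of the range given in Section IV.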

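
Section III describes the final technique as a loop that runs k-means with centroids sampled from the dataset, a maximum of 100 iterations, and a convergence threshold of 1e-4. A minimal sketch of such a loop is given below under those stated parameters. It is an illustration only, not the authors' code: the function name `kmeans`, the `seed` argument, and the six toy points are assumptions, and the toy example uses 2 clusters rather than the 5 used in the paper.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-4, seed=0):
    """Plain k-means on tuples of floats; returns (labels, centroids).

    Centroids start as k distinct records sampled from the data. The loop
    stops when no centroid moves by more than `tol`, or after `max_iter` passes.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids


# Toy data (made up): two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(data, k=2)
```

The resulting `labels` list can then drive the per-cluster generalization and suppression step that the paper applies after clustering.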
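
The paper applies generalization to quasi-identifiers and suppression to sensitive attributes within each cluster, but does not spell out the exact rules or how the 0.5 factors are used. The sketch below therefore shows one common interpretation and leaves the factors out: quasi-identifier values are generalized to the prefix shared by the cluster, and sensitive values are replaced by `*`. The attribute split, the helper names, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict

QUASI_IDENTIFIERS = ("gender", "city", "zip_code")  # assumed choice
SENSITIVE = ("marital_status",)                     # assumed choice


def common_prefix_mask(values):
    """Generalize strings to their shared prefix, masking the rest with '*'."""
    width = max(len(v) for v in values)
    prefix_len = 0
    for chars in zip(*values):
        if len(set(chars)) != 1:
            break
        prefix_len += 1
    return values[0][:prefix_len] + "*" * (width - prefix_len)


def anonymize(records, labels):
    """Return copies of `records` generalized and suppressed per cluster."""
    groups = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[lab].append(rec)

    # One generalized value per cluster and quasi-identifier.
    generalized = {
        lab: {
            q: common_prefix_mask([str(r[q]) for r in recs])
            for q in QUASI_IDENTIFIERS
        }
        for lab, recs in groups.items()
    }

    out = []
    for rec, lab in zip(records, labels):
        new = dict(rec)                # leave the input records untouched
        new.update(generalized[lab])   # all cluster members share these values
        for s in SENSITIVE:
            new[s] = "*"               # suppression: the value is withheld
        new["cluster"] = lab
        out.append(new)
    return out


def smallest_group(rows):
    """Size of the smallest set of rows sharing the same quasi-identifiers."""
    counts = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in rows)
    return min(counts.values())


# Made-up records and cluster labels, for illustration only.
records = [
    {"gender": "F", "city": "Nashik", "zip_code": "422003", "marital_status": "single"},
    {"gender": "F", "city": "Nashik", "zip_code": "422009", "marital_status": "married"},
    {"gender": "M", "city": "Pune", "zip_code": "411001", "marital_status": "single"},
    {"gender": "F", "city": "Pune", "zip_code": "411045", "marital_status": "married"},
]
anonymized = anonymize(records, [0, 0, 1, 1])
```

Because every record in a cluster ends up with identical quasi-identifier values, `smallest_group(anonymized)` equals the size of the smallest cluster, which is the k of the resulting k-anonymous view.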