Conference LaTeX Template 10 17 19

The document proposes a clustering-based approach for privacy preservation on social networks. It discusses existing approaches to privacy preservation including clustering and graph-based anonymization. The proposed system performs various clustering techniques like K-Member, Greedy K-Member, c-means, one-pass k-means and k-means clustering with generalization and suppression on a dataset to anonymize social network data. The system is implemented in Python and evaluates the clustering results using the silhouette score metric.

Uploaded by

Chaitali Narayane

Clustering Based Anonymization for Privacy Preservation on Social Network

1st Rutuja Prakash Jagtap
Department of Computer Engineering
K.K. Wagh Institutes of Engg Education and Research,
Nashik, India
[email protected]

2nd Mukta Ganpat Warungase
Department of Computer Engineering
K.K. Wagh Institutes of Engg Education and Research,
Nashik, India
[email protected]

3rd Chaitali Rajendra Narayane
Department of Computer Engineering
K.K. Wagh Institutes of Engg Education and Research,
Nashik, India
[email protected]

4th Kalpesh Bapurao Ahire
Department of Computer Engineering
K.K. Wagh Institutes of Engg Education and Research,
Nashik, India
[email protected]

5th Prof. Dhanajay M. Kanade
Department of Computer Engineering
K.K. Wagh Institutes of Engg Education and Research,
Nashik, India
[email protected]

Abstract: The recent surge in social network popularity has generated massive amounts of data about social network interaction. Because these data contain many personal characteristics of individuals, anonymization is essential. Anonymization is a practical technique for protecting consumers' privacy when publishing data. This step is required before the data can be used in research and data mining, because once anonymized the data are no longer personal data. Because online social networks (OSNs) include sensitive information about individual members, it is necessary to anonymize network data before making it public. In this paper we propose a novel clustering-based method for privacy preservation. The proposed system is implemented in Python and applied to a dataset of social network users. The system is evaluated using the silhouette score, and the results show that our proposed approach outperforms the alternatives we compared.
Index Terms: Clustering, Privacy preserving, Social Networking.

I. INTRODUCTION
Recently, social networks [1, 2] have received a lot of attention in research and development, partly because more and more social networks are being built online and partly because of the development of Web 2.0 applications. Social networks model social relationships using graph structures with vertices and edges. Vertices model individual social actors in a network, while edges model relationships between social actors. Social network analysis [1, 3, 4, 5] has become a crucial tool in contemporary sociology, geography, economics, and information science as a result of the explosive rise of social networks. The aim of social network analysis is to find hidden social patterns. Its effectiveness has been demonstrated to be substantially greater than that of conventional approaches that concentrate on evaluating the characteristics of individual social actors.
Privacy preservation is a recent research area that consists of two major categories, as shown in Figure 1. One category is Privacy Preservation in Data Mining (PPDM) and the other is Privacy Preservation in Data Publishing (PPDP). In PPDM, after data mining functionalities are applied, the mined patterns can be hidden from third parties (intruders).

Fig. 1. Categories of Privacy Preservation Mechanism

There has been a lot of research on privacy preservation for relational data. A significant category of privacy attacks on relational data is re-identifying people by combining a published table holding sensitive information with external tables that model the attacker's background knowledge. Numerous effective algorithms and significant models have been proposed. However, the majority of current studies can only
work with relational data. Applying those procedures directly to social network data is not possible. Comparatively speaking, anonymizing social network data is substantially more difficult [6].
In this paper we present a clustering-based method for privacy preservation in social networks. The proposed system is built in Python with the help of various libraries. The rest of the paper is organized as follows: Section II reviews state-of-the-art systems, Section III explains the proposed model, Section IV shows the results, and Section V concludes the paper.

II. LITERATURE SURVEY
In this section we present a literature survey of state-of-the-art systems for privacy preservation in social networks. The main topics of [7] are the findings of the single-pass k-means anonymization technique and the anonymized view of a data collection; the dataset is made anonymous using generalization and suppression techniques. The researchers in [8] analyze the likely issues in the crucial areas of privacy, background knowledge, and data utility. The work focuses on current anonymization techniques for maintaining privacy when disclosing social network data, and acknowledges the challenges of maintaining secrecy while publishing social network information. Clustering-based and graph-based anonymization methods provide the foundation.
The authors of [9] present a k-anonymity approach that minimizes information loss during the generalization process for anonymized data. Since the clustering-based k-anonymity technique employs separate anonymous sets of data and runs in O(n^2/k) time, it is crucial to combine related data types into a single group. The authors make a useful comparison between their technique and other clustering-based k-anonymity techniques. The goal of that study is to obtain an anonymized view of data without revealing any personal information about the users or their connections to other users. The researchers in [10] offer examples of privacy protection challenges on social media. The authors present the sequential clustering-based anonymization process in two variants, starting with the centralized scenario.
A method called slicing was designed by Li et al. (2012) [11] that divides the data horizontally and vertically. Slicing preserves data utility better than generalization and is employed for membership disclosure protection. Slicing-based privacy-preserving microdata publishing is used to manage high-dimensional data. Slicing is also employed for attribute disclosure protection, with an efficient algorithm for computing sliced data that satisfies the l-diversity requirement. Slicing maintains better data utility than generalization and bucketization in workloads involving the sensitive attribute. Two anonymization methods were designed by Ghinita et al. (2011) [12] for sparse high-dimensional data. They depend on Nearest-Neighbor (NN) search in high-dimensional spaces, which uses Locality-Sensitive Hashing (LSH). The slicing process is revisited by Vani and Jayanthi (2013) [13] and is utilized for attribute disclosure protection.
A Privacy Preserving Framework for Supervisory Control and Data Acquisition (PPFSCADA) is a strategy-based permutation method introduced by Adil Fahad et al. (2014) [14] in which data privacy and data mining techniques are managed simultaneously. The technique vertically partitions the original dataset to improve the perturbation results. The framework handles network traffic data with arithmetical, definite and hierarchical attributes. It is also used for clustering the partitioned sets into many clusters depending on the designed framework. The perturbation process is realized by replacing each original attribute value with a new value.
The work in [15] assumes that the vertices are partitioned into equivalence classes and that each class is anonymized using an existing relational data anonymization method. Then, to anonymize the social network more effectively, it examines whether edges should be included in the collapsed graph after all of the vertices in an equivalence class are condensed into a single vertex. Publishing the number of edges of each edge type connecting two vertices in an equivalence class is one practical method. This method is known as cluster-edge anonymization.

III. PROPOSED SYSTEM
In this section we present our proposed model architecture and its details. Figure 2 depicts the system architecture diagram.

Fig. 2. System Architecture

Our proposed model takes a dataset as input. This data is preprocessed to remove redundant, null and unwanted data from the dataset. After preprocessing, the dataset is split into training and testing datasets. Our proposed model performs several clustering techniques, including K-Member,
Greedy K-Member, c-means, one-pass k-means and k-means clustering with generalization and suppression on a dataset.
The dataset is first loaded using Pandas and unnecessary columns are dropped. Categorical variables are then encoded, and numerical variables are scaled using the MinMaxScaler class from sklearn.preprocessing.
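As a rough illustration of the preprocessing step, the sketch below encodes a categorical column as integers and rescales a numeric column into [0, 1] using only the Python standard library. It is not the authors' code: the helper names `label_encode` and `min_max_scale` and the four sample records are assumptions made for this example, and the paper itself relies on Pandas and MinMaxScaler for this step.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def min_max_scale(values):
    """Rescale numbers linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical records, loosely modelled on the gender and zip code features
# that the paper lists for its dataset.
genders = ["female", "male", "female", "male"]
zip_codes = [422003, 422001, 422005, 422009]

gender_codes, gender_map = label_encode(genders)
zip_scaled = min_max_scale(zip_codes)
```

After this step every feature is numeric and on a comparable scale, which matters because the distance-based clustering techniques that follow would otherwise be dominated by the column with the largest raw values.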
The first clustering technique applied is K-Member, which is performed using the KMedoids class from sklearn_extra.cluster. The number of clusters is set to 5, and the random state is set to 0. The Silhouette Coefficient is then calculated using the silhouette_score function from sklearn.metrics.
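To make the evaluation metric concrete, the following sketch computes a mean silhouette coefficient by hand. For each point it takes a, the mean distance to the other points in its own cluster, and b, the smallest mean distance to the points of any other cluster, and scores the point as (b - a) / max(a, b). The function names and the four sample points are assumptions for illustration; the paper uses the library function rather than code like this.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_distance(point, others):
    """Average distance from one point to a non-empty list of points."""
    return sum(euclidean(point, o) for o in others) / len(others)


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for i, (point, label) in enumerate(zip(points, labels)):
        own = [p for j, p in enumerate(points) if labels[j] == label and j != i]
        if not own:
            continue  # a singleton cluster contributes a score of 0
        a = mean_distance(point, own)  # cohesion
        b = min(  # separation: nearest other cluster on average
            mean_distance(point, members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Two tight pairs of points that lie far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette(points, [0, 0, 1, 1])  # labels follow the two pairs
bad = silhouette(points, [0, 1, 0, 1])   # labels split each pair
```

Here `good` comes out close to 1 and `bad` is negative, which matches the usual reading of the metric: higher values mean points sit closer to their own cluster than to any other.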
The second clustering technique applied is Greedy K-Member, which is performed using the SpectralClustering class from sklearn.cluster. The number of clusters is set to 5, the affinity is set to 'nearest_neighbors', the number of neighbors is set to 5, and the labels are assigned using 'discretize'. The Silhouette Coefficient is then calculated using the silhouette_score function.
The third clustering technique applied is c-means clustering using k-medoids, which is performed using the KMedoids class from sklearn_extra.cluster. The number of clusters is set to 5, the metric is set to 'euclidean', the initialization is set to 'k-medoids++', and the maximum number of iterations is set to 300. The Silhouette Coefficient is then calculated using the silhouette_score function.
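The idea behind k-medoids can be sketched in a few lines: cluster centers are always actual records, and the algorithm alternates between assigning each point to its nearest medoid and re-picking the member of each cluster with the smallest total distance to the others. The version below is a simplified alternating scheme written for illustration, not the library implementation; it takes the starting medoids as an explicit argument instead of using 'k-medoids++' initialization, and the function names and sample points are assumptions.

```python
import math


def dist(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, init_medoids, max_iter=300):
    """Alternate assignment and medoid update until the medoids stop changing.

    Medoids are stored as indices into `points`.
    """
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(old)  # keep the medoid of an empty cluster
                continue
            # The member with the smallest total distance to its cluster mates.
            best = min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
# Deliberately poor start: both initial medoids come from the first group.
medoids, labels = k_medoids(points, init_medoids=[0, 1])
```

Even from the poor starting choice, the second medoid moves into the far group after one update, so the two groups end up in separate clusters.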
The fourth clustering technique applied is one-pass k-means clustering, which is performed using the MiniBatchKMeans class from sklearn.cluster. The number of clusters is set to 5, the random state is set to 0, and the maximum number of iterations is set to 100. The Silhouette Coefficient is then calculated using the silhouette_score function.
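For intuition about what a single pass over the data can look like, the sketch below implements sequential (online) k-means: each record is visited exactly once, assigned to the nearest centroid, and that centroid is moved to the running mean of its members. This is a different and simpler procedure than MiniBatchKMeans, and it is not claimed to be the one-pass algorithm from the anonymization literature; seeding the centroids with the first k records and the name `one_pass_kmeans` are assumptions made for the example.

```python
def one_pass_kmeans(points, k):
    """Visit every point once, updating the nearest centroid as a running mean."""
    # Assumption: the first k points seed the k centroids.
    centroids = [list(p) for p in points[:k]]
    counts = [1] * k
    labels = list(range(k))

    for p in points[k:]:
        # Nearest centroid by squared Euclidean distance.
        c = min(
            range(k),
            key=lambda j: sum((x - m) ** 2 for x, m in zip(p, centroids[j])),
        )
        counts[c] += 1
        # Incremental mean: shift the centroid toward the new point by 1/n.
        centroids[c] = [m + (x - m) / counts[c] for m, x in zip(centroids[c], p)]
        labels.append(c)
    return centroids, labels


points = [
    (0.0, 0.0), (10.0, 10.0), (0.0, 2.0),
    (10.0, 12.0), (2.0, 0.0), (12.0, 10.0),
]
centroids, labels = one_pass_kmeans(points, k=2)
```

Because earlier assignments are never revisited, the result depends on the order of the records, which is the usual trade-off for making only one pass.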
The final clustering technique applied is k-means clustering with generalization and suppression. This is implemented using a for loop that performs the k-means algorithm multiple times, suppressing sensitive attributes and generalizing quasi-identifiers. The number of clusters is set to 5, the maximum number of iterations is set to 100, the convergence threshold is set to 1e-4, the suppression factor for sensitive attributes is set to 0.5, and the generalization factor for quasi-identifiers is set to 0.5. The sensitive attributes and quasi-identifiers are defined, and centroids are initialized by sampling from the dataset. The Silhouette Coefficient is not calculated in this final step.
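Generalization and suppression can be illustrated independently of how the clusters were found. In the sketch below, every record in a cluster receives the same generalized quasi-identifier (the shared zip code prefix padded with '*'), and the sensitive field is replaced by '*'. This is one simple interpretation, not the authors' implementation: the paper does not define how its suppression and generalization factors are applied, so they are left out, and the choice of `zip_code` as quasi-identifier and `marital_status` as sensitive attribute is an assumption.

```python
def common_prefix_mask(values):
    """Generalize strings to their shared prefix, padding the rest with '*'."""
    width = max(len(v) for v in values)
    shortest = min(len(v) for v in values)
    prefix = 0
    while prefix < shortest and len({v[prefix] for v in values}) == 1:
        prefix += 1
    return values[0][:prefix] + "*" * (width - prefix)


def anonymize(records, labels, quasi_identifiers, sensitive):
    """Generalize quasi-identifiers per cluster and suppress sensitive fields."""
    clusters = {}
    for record, label in zip(records, labels):
        clusters.setdefault(label, []).append(record)

    anonymized = []
    for record, label in zip(records, labels):
        row = dict(record)
        for column in quasi_identifiers:
            row[column] = common_prefix_mask([r[column] for r in clusters[label]])
        for column in sensitive:
            row[column] = "*"  # suppression: the value is withheld entirely
        row["cluster"] = label
        anonymized.append(row)
    return anonymized


# Hypothetical records and cluster labels, for illustration only.
records = [
    {"zip_code": "422003", "city": "Nashik", "marital_status": "single"},
    {"zip_code": "422009", "city": "Nashik", "marital_status": "married"},
    {"zip_code": "411001", "city": "Pune", "marital_status": "single"},
    {"zip_code": "411045", "city": "Pune", "marital_status": "married"},
]
anonymized = anonymize(records, [0, 0, 1, 1], ["zip_code"], ["marital_status"])
```

Since all members of a cluster share the same generalized zip code, a record can no longer be told apart from the others in its cluster on that attribute, which is the property that clustering-based anonymization aims for.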
Finally, the anonymized dataset is output as a CSV file for each clustering technique. The output files include the cluster labels for each record.

IV. RESULTS
A. Dataset
To test the efficiency of our system we downloaded a dataset of Facebook users from Kaggle with features like gender, marital status, city, zip code, and country. The dataset consists of 1500 records.

B. Evaluation parameter
Silhouette Coefficient
The Silhouette Coefficient is a metric used to evaluate the quality of clustering results. It measures how close each data point is to the other data points in its own cluster (cohesion) compared with how far it is from the data points in the neighboring clusters (separation). The Silhouette Coefficient ranges from -1 to 1, where a value of 1 indicates that the data point is well matched to its own cluster and poorly matched to neighboring clusters, a value of 0 indicates that the data point is on the boundary between two clusters, and a value of -1 indicates that the data point is poorly matched to its own cluster and well matched to a neighboring cluster. A higher Silhouette Coefficient value indicates a better clustering result. Figure 3 shows the comparative results analysis of our proposed system.

C. Noise

Fig. 3. Comparative analysis

V. CONCLUSION
In this paper we present a clustering-based approach for privacy preservation on social networks. Having analysed different clustering methods, we find that centroid-based clustering, i.e. the K-Means algorithm, is best because it is straightforward and effective. Applying the generalization and suppression process to the dataset yields the anonymized view of the data. As social network data is much more complicated than relational data, privacy preservation in social networks is much more challenging and needs serious effort in the near future. In particular, modeling adversarial attacks and developing corresponding privacy preservation strategies are critical.

REFERENCES
[1] J. Scott. Social Network Analysis Handbook. Sage Publications Inc., 2000.
[2] B. Wellman. For a social network analysis of computer networks: a sociological perspective on collaborative work and virtual community. In Proceedings of the 1996 ACM SIGCPR/SIGMIS Conference on Computer Personnel Research (SIGCPR'96), pages 1-11, New York, NY, USA, 1996. ACM Press.
[3] L. C. Freeman, D. R. White, and A. K. Romney. Research Methods in Social Network Analysis. George Mason University Press, Fairfax, VA, 1989.
[4] S. Wasserman and K. Faust. Social network analysis: Methods and applications. Cambridge University Press, 1994.
[5] J. Srivastava, M. A. Ahmad, N. Pathak, and D. K.-W. Hsu. Data mining based social network analysis from online behavior. Tutorial at the 8th SIAM International Conference on Data Mining (SDM'08), 2008.
[6] B. Zhou and J. Pei. Preserving privacy in social networks against neighborhood attacks. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE'08), pages 506-515, Cancun, Mexico, 2008. IEEE Computer Society.
[7] Rashmi B. Ghate, Rasika Ingle, "Clustering Based Anonymization for Privacy Preservation," 2013.
[8] Tamir Tassa and Dror J. Cohen, "Anonymization of centralized and distributed social networks by sequential clustering," IEEE Transactions on Knowledge and Data Engineering, Vol. 25, pp. 311-324, Feb 2013.
[9] Jun-Lin Lin, Meng-Cheng Wei, "An Efficient Method for K-anonymization," Journal ACM 08 proceeding of International Workshop on Privacy and Anonymity in Information Society, pp. 46-50, 2008.
[10] Bin Zhou, Jian Pei and Wo-Shun Luk, "A Brief Survey on Anonymization Techniques for Privacy Preserving Publishing of Social Network Data," ACM Newsletter Journal, Vol. 10, pp. 12-22, December 2008.
[11] Li, T, Li, N, Zhang, J & Molloy, I 2012, 'Slicing: A new approach for privacy preserving data publishing', IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 3, pp. 561-574.
[12] Ghinita, G, Kalnis, P & Tao, Y 2011, 'Anonymous publication of sensitive transactional data', IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 2, pp. 161-174.
[13] Vani, B., & Jayanthi, D. 2013, 'Efficient approach for privacy preserving microdata publishing using slicing', International Journal of Research in Computer and Communication Technology, 4, 225.
[14] Haghnegahdar, A, Khabbazian, M & Bhargava, VK 2014, 'Privacy risks in publishing mobile device trajectories', IEEE Wireless Communications Letters, vol. 3, no. 3, pp. 241-244.
[15] E. Zheleva and L. Getoor. Preserving the privacy of sensitive relationships in graph data. In Proceedings of the 1st ACM SIGKDD Workshop on Privacy, Security, and Trust in KDD (PinKDD'07), 2007.
