Authors:
Narges Alipourjeddi and Ali Miri
Affiliation:
Department of Computer Science, Toronto Metropolitan University, Toronto, Canada
Keyword(s):
High-Dimensional Data, Privacy Preservation, Persistent Homology, Differential Privacy, Data Publishing, Topological Data Analysis.
Abstract:
As the era of big data unfolds, high-dimensional datasets with complex structures have become increasingly prevalent in various fields, including healthcare, finance, and social sciences. Extracting valuable insights from such data is essential for scientific discovery and decision-making. However, publishing these datasets raises serious privacy concerns, as they often contain sensitive and personally identifiable information. In this paper, we introduce a novel approach that addresses the delicate balance between data privacy and the exploration of the underlying structure of high-dimensional data. We leverage persistent homology, a topological data analysis method, to unveil hidden patterns and capture the persistent topological features of the data, allowing us to study its shape and structure across different scales. We add noise to the low-dimensional embedding and produce a private persistence diagram under differential privacy, which offers a rigorous and well-established framework for ensuring that the privacy of individuals in the dataset is protected. We then synthetically generate high-dimensional data guided by the differentially private persistence diagrams, so that the published synthesized dataset preserves privacy. We conduct extensive experiments on three real-world datasets, and the experimental results demonstrate that our mechanism can significantly improve the structural fidelity of the published data while satisfying differential privacy.
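As a rough illustration of the noise-addition step mentioned in the abstract, the sketch below perturbs a low-dimensional embedding with the generic Laplace mechanism. The abstract does not specify the noise distribution, the sensitivity analysis, or the embedding method, so everything here is an assumption: the function names `laplace_noise` and `privatize_embedding` are hypothetical, and the `sensitivity` argument is taken as given rather than derived.

```python
import random
from typing import List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws, each with
    mean `scale`, follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_embedding(
    points: List[List[float]],
    sensitivity: float,
    epsilon: float,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Add Laplace noise with scale sensitivity / epsilon to every coordinate.

    ASSUMPTION: `sensitivity` is the L1 sensitivity of the released
    embedding, supplied by the caller. This sketch does not derive it,
    and the paper's own mechanism may differ.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [[x + laplace_noise(scale, rng) for x in point] for point in points]


if __name__ == "__main__":
    # Toy 2-D embedding; real inputs would come from a dimensionality
    # reduction step applied to the high-dimensional data.
    embedding = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    noisy = privatize_embedding(embedding, sensitivity=1.0, epsilon=0.5, seed=42)
    for original, perturbed in zip(embedding, noisy):
        print(original, "->", perturbed)
```

A smaller `epsilon` gives a larger noise scale and hence stronger privacy at the cost of more distortion in the embedding, which is the trade-off the abstract refers to. A persistence diagram computed from the perturbed points would inherit the privacy guarantee by post-processing, provided the sensitivity supplied to the mechanism is valid.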