Preserving and Randomizing Data Responses in Web Application Using Differential Privacy
ISSN No:-2456-2165
Abstract:- Data reflects the people behind it, since most of the data online describes individuals. Privacy for an individual's data has become a prime concern, and securing and preserving that privacy has been a focus for decades. Data analytics, machine learning, and data science analyze users' private data to understand individual users' responses and to present the results publicly. Presenting data online that contains sensitive and confidential personal information became a significant issue, so a group of mathematicians and cryptographers came together to resolve it by introducing the concept of Differential Privacy.

Differential privacy is an assurance of information privacy: it limits privacy risk by adding some amount of random noise to the original dataset, which makes the released data robust against disclosure. Differential privacy is also a practical tool whose algorithms maintain privacy by preserving and randomizing data responses, and the accuracy of the resulting statistics can be measured through analysis. To carry out this process, IBM developed an open-source library called Diffprivlib[1]. Using this library, the project created a front-end web application that performs data analysis with different mechanisms, models, and tools.

This project is an attempt to integrate all the mechanisms, models, and tools in Diffprivlib[1]. The primary purpose of this paper is to showcase work on differential privacy that consists of developing a user-friendly web application that can be open-sourced. The application is built as a Python package and experiments with a dataset to show the impact of differential privacy algorithms on accuracy and privacy for different values of epsilon.

Keywords:- Differential Privacy, Python Programming, Open-Source Library, Data Science, Machine Learning, Data Analytics.

I. INTRODUCTION

As electronic data exposed over the internet has become specific, detailed, and abundant, maintaining individual privacy has become the top priority. Securing the personal private data in datasets is therefore handled by the mathematical computation involved in differential privacy algorithms. Any individual user's data that involves personal private information requires differential privacy to be applied. Differential privacy is the most popular definition of privacy; it entered the data privacy literature as a new mathematical term in 2006. It ensures that publicly visible output changes very little when a single individual's data in the dataset changes. It achieves this by adding random noise to the mechanism at work. Its appeal comes from the robustness added in the form of noise, the preservation of meaningful data patterns, a mathematically rigorous definition of privacy, and a rich class of algorithms that satisfy that definition.[1]

Preserving privacy and security guarantees for data has revolved around compliance processes and around careful collection and curation of data, so that appropriate policies can be applied to users' private data in any form.

Differential Privacy Guarantees:
1. The raw data holding individual responses is not exposed to unauthorized access and does not need to be modified.
2. Maintaining an individual's privacy is valued over mining important details from the data.
3. Resilience to post-processing: further processing of the output of a differentially private algorithm does not weaken the algorithm's differential privacy. In other words, a data analyst who has no additional knowledge about the dataset cannot increase the privacy loss by examining and reasoning about the output of the differentially private algorithm.[2]

As research in machine learning has grown rapidly, in-depth analysis of data privacy standards has converged on differential privacy algorithms within the fields of cryptography and security. IBM has contributed the Differential Privacy Library, "Diffprivlib," a general-purpose, open-source Python library for investigating, experimenting with, and developing differential privacy applications. The library includes a host of mechanisms, the building blocks of differential privacy[1], and several applications for machine learning and other data analytics tasks. This project demonstrates the idea of differential privacy in a web application that any data analyst or accountant can use to perform analysis, in the form of mathematical computations, on a supervised dataset. This application solves
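The informal description of differential privacy above can be stated more precisely. The block below is a sketch of the standard textbook formulation, not notation taken from this paper: the symbols $\mathcal{M}$ (randomized mechanism), $D$ and $D'$ (datasets differing in one individual's record), $S$ (a set of possible outputs), $f$ (the query), and $\Delta f$ (its sensitivity) are introduced here only for illustration.

```latex
% epsilon-differential privacy: for all neighbouring datasets D, D' and all output sets S
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{M}(D') \in S]

% Laplace mechanism: one common way to add random noise that satisfies the definition
\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad
\Delta f = \max_{D, D'} \lvert f(D) - f(D') \rvert
```

A smaller epsilon forces the two probabilities closer together, so it means stronger privacy and, through the larger noise scale $\Delta f / \varepsilon$, lower accuracy.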
V. RESOURCE REQUIREMENT SPECIFICATION

Figure 3 below shows the process used to develop the robust differential privacy data post-processing.

Fig 3: Process flow and modules in Differential Privacy

Fig 4: Data Flow diagram for DP web Application

B. Process Overview Steps

1. Account Setup: Select the account to set up a temporary session for a data analyst or accountant, and set up the budget account.
   1. Select the dataset on which sampling will be performed.
   2. Preview the selected data.
   3. Select the personal private data to be masked.
   4. Choose the epsilon value to apply.
2. Data Summarizer: Display the private and non-private data, perform a comparative analysis, and display the result as a graph with the accuracy checked.
   1. Non-Zero Count: This shows the number of non-zero elements for each column in the dataset. The screenshot in the next chapter shows the change in the data distribution of the selected columns under the differentially private methods.
   2. Mean: This shows the relative change in the mean values of different columns between the non-private (standard) method and its differentially private counterpart.
   3. Variance: This shows the relative change in the variance values of different columns between the non-private (standard) method and its differentially private counterpart.
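The Data Summarizer comparison described in the steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and a hand-written Laplace mechanism; it is not the Diffprivlib API and not the application's actual code. The function names `laplace_noise` and `summarize`, the equal three-way split of the epsilon budget, the replace-one-record neighbouring model, and the variance sensitivity bound are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    if scale == 0:
        return 0.0  # epsilon = infinity: no noise is added
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def summarize(column, lower, upper, total_epsilon, seed=None):
    """Compare non-private and noisy statistics for one numeric column.

    Assumptions (hypothetical, for illustration only):
    - values are clipped to the public bounds [lower, upper];
    - the number of records n is public and neighbours differ in one record;
    - the budget total_epsilon is split equally over the three queries.
    """
    rng = random.Random(seed)
    epsilon = total_epsilon / 3.0
    n = len(column)
    clipped = [min(max(v, lower), upper) for v in column]
    width = upper - lower

    mean = sum(clipped) / n
    exact = {
        "non_zero_count": sum(1 for v in column if v != 0),
        "mean": mean,
        "variance": sum((v - mean) ** 2 for v in clipped) / n,
    }
    # Sensitivity of each query; the variance value is an assumed
    # conservative upper bound, not a tight one.
    sensitivity = {
        "non_zero_count": 1.0,
        "mean": width / n,
        "variance": width ** 2 / n,
    }

    report = {}
    for name, value in exact.items():
        noisy = value + laplace_noise(sensitivity[name] / epsilon, rng)
        if name != "mean":
            noisy = max(0.0, noisy)  # counts and variances cannot be negative
        change = abs(noisy - value) / abs(value) if value != 0 else float("nan")
        report[name] = {
            "non_private": value,
            "private": noisy,
            "relative_change": change,
        }
    return report


if __name__ == "__main__":
    sample = [0, 3, 5, 0, 8, 2, 7, 0, 4, 6]  # made-up column values
    for eps in (0.1, 1.0, 10.0):
        print(eps, summarize(sample, 0, 10, eps, seed=1))
```

Running the script prints, for each epsilon, the non-private value, the noisy value, and the relative change for each of the three statistics, which is the kind of comparison the summarizer plots.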
VII. IMPLEMENTATION

This chapter gives the implementation details, including the list of tools, the installation commands, and the framework design with screenshots of the web application. The next chapter provides in-depth details of testing the business use case.

1. The Home Page is the main page for all functional modules, which are shown in the screenshot below.
3. The data summarizer displays the data accuracy before and after applying the differential privacy algorithm; the values show how the accuracy changes for each column at different epsilon values. This step makes it easy to compare and understand the change in accuracy of the data.

VIII. ANALYSIS AND RESULTS
The project and its analysis show that the web application is open source and built on IBM's Differential Privacy Library, Diffprivlib[15]. The final outcome for the differentially private data is presented in this section as graphs and simple computed values that compare the private and non-private results; the plots show whether any data leakage remains once differential privacy is applied. The computational results of the technical analysis are given below.

For the example of the diabetes dataset, which uses GaussianNB, the results below are reported by epsilon value. Starting from epsilon = float("inf"), epsilon is set to different values and the result is checked at each iteration.

1. High epsilon (i.e., greater than 1) gives better and more consistent accuracy but less privacy. Here the agreement was 26.56%, which shows that data confidentiality is lower when epsilon is high.

Similarly, for the mean, the change relative to the mean of each column indicates how much the accuracy changes after processing, and likewise for the variance.

IX. CONCLUSIONS AND FUTURE SCOPE

A. Conclusion

The primary scope of this work was to develop computational results for a dataset using a differential privacy algorithm from IBM's general-purpose, open-source, Python-based library Diffprivlib[9], for simulation, experimentation, and development of this web application. The library also includes standard tools, mechanisms, and models, with the functionality of version 0.4.1. Classification with GaussianNB and regression with linear and logistic models were implemented on a supervised dataset, with visualizations and histograms of the results for different values of epsilon. The web application was developed on packages such as NumPy[12], Scikit[13], and SciPy[14]. Future work includes improving the efficiency of the model and presenting data in a more user-friendly way while accounting for personal private data.
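The effect of epsilon on histogram results, mentioned in the conclusion, can be illustrated with a short standard-library sketch. It is not the application's code and does not use Diffprivlib: the helper names `laplace_noise`, `histogram`, `dp_histogram`, and `mean_l1_error`, the synthetic age data, and the choice of sensitivity 1 (neighbouring datasets that differ by adding or removing one record) are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    if scale == 0:
        return 0.0  # epsilon = infinity: no noise is added
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def histogram(values, edges):
    """Count values into bins [edges[i], edges[i+1]); the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def dp_histogram(values, edges, epsilon, rng):
    """Add Laplace(1/epsilon) noise to every bin count (assumed sensitivity 1)."""
    return [c + laplace_noise(1.0 / epsilon, rng) for c in histogram(values, edges)]


def mean_l1_error(values, edges, epsilon, trials=300, seed=0):
    """Average total absolute error of the noisy histogram over several runs."""
    rng = random.Random(seed)
    exact = histogram(values, edges)
    total = 0.0
    for _ in range(trials):
        noisy = dp_histogram(values, edges, epsilon, rng)
        total += sum(abs(a - b) for a, b in zip(noisy, exact))
    return total / trials


if __name__ == "__main__":
    data_rng = random.Random(42)
    ages = [data_rng.randint(20, 79) for _ in range(500)]  # synthetic data
    edges = [20, 35, 50, 65, 80]
    for epsilon in (0.01, 0.1, 1.0, 10.0, float("inf")):
        print(f"epsilon={epsilon}: mean L1 error = "
              f"{mean_l1_error(ages, edges, epsilon):.2f}")
```

The printed error shrinks as epsilon grows and reaches zero at epsilon = float("inf"), which is the accuracy-versus-privacy trade-off the application visualizes.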
REFERENCES