Privacy Enhancement Computation

This document discusses privacy-enhancing computation (PEC) and technologies. It begins by explaining that PEC aims to protect private data through various hardware and software solutions while still allowing useful data insights. It then provides details on three forms of PEC according to Gartner: 1) trusted environments for secure processing, 2) privacy-aware machine learning, and 3) data and algorithm transformations using techniques like homomorphic encryption. The document emphasizes that organizations need PEC to avoid privacy risks and comply with regulations. It also examines how technology can enhance privacy through secure access and data tracking. Finally, common PEC technologies like homomorphic encryption, secure multi-party computation, and differential privacy are outlined.

Uploaded by

Theertha S


INTRODUCTION

Data is at the core of every business and the most valuable asset of the current
age. As its value keeps increasing, keeping this asset safe becomes a top priority
for organizations. Data needs to be managed, processed, and analyzed to glean
valuable insights, but because it is sensitive it can also be abused by malicious
attackers. Considering the tremendous volume of data that must be safeguarded,
along with the demands of data governance and technology integration, the task
becomes herculean. Though there are privacy laws such as the GDPR in the EU and
several others around the globe, privacy breaches still occur frequently. One of
the main reasons is that business transactions increasingly flow through third
parties, who may use the data to gain insights, improve their services, or simply
earn additional revenue.

Consumers are becoming increasingly concerned about sharing their personal data
because they find it difficult to track who uses it, how it is used, and who is
responsible for handling it. With cyberattacks on the rise and attackers using
ever more sophisticated techniques to access data, privacy-enhancing computation
(PEC) and privacy-enhancing technologies (PETs) have become a crucial security
measure for organizations. PEC is one of Gartner's leading strategic technology
trends. With the adoption of technologies like AI, organizations can now process
increasingly complex and growing data in a structured, controlled, and protected
manner. Enterprises with a well-defined roadmap for implementing PEC and related
technologies can expect to minimize their exposure to potential attacks and
enable secure data usage.
PRIVACY-ENHANCING COMPUTATION
What is Privacy-enhancing computation?
Though there is no standard definition yet, privacy-enhancing computation can be
described as a group of technologies that together achieve a high level of
protection for private data. These technologies support privacy and data
protection and provide safeguards against violations and attacks. The solutions
can be hardware or software, designed to glean valuable insights from data while
maintaining a robust and secure foundation.
These technologies have been around for some time, but only recently have they
been applied to real-life applications and use cases. Gartner classifies PEC
under its "people centricity" category and, according to the research and
consulting company, PEC takes three forms:
1. The first form provides a trusted environment in which data can be processed
securely. This is enabled by trusted third parties and hardware trusted
execution environments (TEEs).
2. The second form performs processing and analytics through privacy-aware
machine learning, including federated machine learning.
3. The third form transforms data and algorithms before processing. Techniques
include homomorphic encryption to keep the data confidential, secure
multiparty computation, differential privacy, and private set intersection,
among others.
Why do organizations need to implement privacy-enhancing computation (PEC)?

The primary reason organizations should adopt PEC is to avoid exposing their
consumers' privacy to risk. When users enter personal data into an application,
website, account, or any other form, they want assurance that the data is kept
private and used only for its intended purpose. An enterprise lacking a
tried-and-tested process for data protection offers malicious attackers an easy
opportunity to misuse that data. This poses a huge threat to users' privacy and,
in turn, damages the credibility, reputation, and trust people place in the
organization. Companies should therefore ensure they have full control over how
this information is managed.
Moreover, with the rise of data protection laws around the globe such as the
GDPR and CCPA, organizations are obliged to set up processes and take measures
to protect consumer data. Otherwise, they may incur huge financial losses from
data breaches and severe fines as penalties. According to the DLA Piper GDPR
Data Breach Survey 2020 report, GDPR fines incurred by organizations are
estimated at over US$126 million from May 2018 to January 2020, an amount that
can significantly affect any enterprise's financials.
How Does Technology Enhance Privacy?

Technology enhances privacy by allowing secure access to client data. A good
example is AI companies that need protected access to client data to build
machine learning models. Privacy-enhancing technologies (PETs) are the only
secure way to achieve this while simultaneously allowing businesses to utilize
and commercialize the non-sensitive data they accumulate.

Privacy-enhancing technologies not only change the accessibility of information
but work to change privacy standards as well. For consumers, innovative
technology allows everyday users to take swift action and secure personal
information that could otherwise have been sent to third parties. For
businesses, privacy-enhancing technologies allow them to track their data
flows, including records of when data was transferred, to whom, and under what
conditions.
Privacy-Enhancing Computation Technologies
Privacy-enhancing technologies (PETs) are a broad range of technologies
(hardware or software solutions) that are designed to extract data value in
order to unleash its full commercial, scientific and social potential, without
risking the privacy and security of this information.

Some of the common privacy-enhancing technologies are:

Cryptographic algorithms

1. Homomorphic Encryption: Homomorphic encryption is an encryption method
that enables computational operations on encrypted data. It generates an
encrypted result which, when decrypted, matches the result of the operations as
if they had been performed on unencrypted data (i.e. plaintext). This enables
encrypted data to be transferred, analysed and returned to the data owner, who
can decrypt the information and view the results on the original data.
Therefore, companies can share sensitive data with third parties for analysis
purposes. It is also useful in applications that hold encrypted data in cloud
storage. Some common types of homomorphic encryption are:
o Partially homomorphic encryption: can perform one type of operation on
encrypted data, such as only additions or only multiplications, but not
both.
o Somewhat homomorphic encryption: can perform more than one type of
operation (e.g. addition, multiplication) but only a limited number of
operations.
o Fully homomorphic encryption: can perform more than one type of
operation, with no restriction on the number of operations
performed.
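A concrete illustration of partial homomorphism is textbook (unpadded) RSA,
which is multiplicatively homomorphic. The toy Python sketch below uses
deliberately tiny hard-coded primes (an illustrative assumption, never secure in
practice) to show two ciphertexts being multiplied so that decryption yields the
product of the two plaintexts:

```python
# Toy demonstration of the multiplicative homomorphism of unpadded
# ("textbook") RSA: multiplying two ciphertexts gives a ciphertext of the
# product of the two plaintexts. Never use parameters like these in practice.
p, q = 61, 53
n = p * q                            # public modulus (3233)
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

c1, c2 = encrypt(6), encrypt(7)
product_cipher = (c1 * c2) % n       # computed on ciphertexts only
assert decrypt(product_cipher) == 6 * 7
```

A third party can compute `product_cipher` without ever seeing 6 or 7; only the
key holder can decrypt the result.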

2. Secure multi-party computation (SMPC): Secure multi-party computation is
closely related to homomorphic encryption, with one difference: parties are
able to jointly compute values from multiple encrypted data sources. Machine
learning models can therefore be applied to encrypted data, since SMPC scales
to larger volumes of data.
• Secure multiparty computation allows data to be utilized without
compromising privacy.
• It is a subfield of cryptography that helps preserve the privacy of data.
• Emerging technologies like blockchain, mobile computing, IoT, and cloud
computing have led to a rebirth of secure multiparty computation.
• It has become a hot area of research in the last decade due to the rise of
blockchain technology.
• Researchers are now especially interested in implementing secure multiparty
computation in distributed systems.
• Unlike in centralized systems, secure multiparty computation may achieve
better performance in distributed systems.

Example
Suppose we want to compute the average salary of three employees without
revealing any actual salary. Secure multiparty computation can solve such
problems. Mathematically, the problem can be stated as:

F(A, B, C) = Average(A, B, C)

Sam, Bob, and Cassy want to calculate their average salary.
1. Say Sam's salary is $40k. Using additive sharing, $40k is split into three
randomly generated pieces: $44k, $-11k, and $7k.
2. Sam keeps one of these secret pieces and distributes one each to Bob and
Cassy.
3. Bob and Cassy follow the same procedure with their own salaries.
4. Secret sharing keeps the data hidden while in use. The shares are shown
below (each row sums to that person's salary):

          Sam    Bob    Cassy   Salary
Sam        44    -11        7     $40
Bob        -6     32       24     $50
Cassy      20      0       40     $60

Total salary = $150
Average salary = 150/3 = $50

The shared pieces reveal nothing about any individual salary, yet the average
salary can still be calculated.
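The additive-sharing scheme above can be sketched in a few lines of Python.
The share ranges and party names are illustrative assumptions; the point is
that each published column sum mixes shares from all parties, so no single
share or sum reveals an individual salary:

```python
import random

def share(secret, n_parties=3):
    """Split a value into n additive shares that sum back to the value."""
    shares = [random.randint(-100, 100) for _ in range(n_parties - 1)]
    shares.append(secret - sum(shares))
    return shares

salaries = {"Sam": 40, "Bob": 50, "Cassy": 60}   # in $k
all_shares = [share(s) for s in salaries.values()]

# Party i collects the i-th share from every participant and publishes
# only its column sum; no single share reveals anything about a salary.
column_sums = [sum(col) for col in zip(*all_shares)]
average = sum(column_sums) / len(salaries)
print(average)   # 50.0
```

Because the column sums together add up to the total of all salaries, the
average comes out exactly right while every individual input stays hidden.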

3. Differential privacy: Differential privacy protects against leaking
information about individuals. This technique adds a layer of "statistical
noise" to a dataset, which makes it possible to describe patterns of groups
within the dataset while maintaining the privacy of individuals. By carefully
injecting noise, differential privacy effectively anonymizes the data: it
allows data experts to perform useful statistical analyses without identifying
any personal information. Such datasets can contain thousands of individuals'
records, helping to solve public problems while limiting what can be learned
about any one individual.
Differential privacy can be applied to everything from recommendation systems
and social networks to location-based services. For example:

• Apple employs differential privacy to accumulate anonymous usage insights
from devices like iPhone, iPad and Mac.
• Amazon uses differential privacy to learn users' shopping preferences while
concealing sensitive information about their past purchases.
• Facebook uses it to gather behavioural data for targeted advertising
campaigns without violating any nation's privacy policies.
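A standard way to add the "statistical noise" described above is the Laplace
mechanism. The sketch below (with made-up ages and a hypothetical query, for
illustration) protects a counting query, whose sensitivity is 1, with Laplace
noise of scale 1/epsilon:

```python
import random

def dp_count(values, predicate, epsilon=1.0):
    """Counting query protected with the Laplace mechanism.
    A count has sensitivity 1, so Laplace noise of scale 1/epsilon gives
    epsilon-differential privacy. The difference of two exponential
    variables with rate epsilon is distributed Laplace(0, 1/epsilon)."""
    true_count = sum(1 for v in values if predicate(v))
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

ages = [23, 35, 47, 51, 62, 29, 44]
noisy = dp_count(ages, lambda a: a > 40, epsilon=1.0)  # true count is 4
```

Smaller epsilon means more noise and stronger privacy; the analyst sees a
value close to, but deliberately not exactly, the true count.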

4. Zero-knowledge proofs (ZKP): Zero-knowledge proofs use a set of
cryptographic algorithms that allow information to be validated without
revealing the data that proves it. Zero-knowledge protocols are probabilistic
assessments, meaning they don't prove something with as much certainty as
simply revealing the entire information would. Instead they provide unlinkable
pieces of information that together show the validity of the assertion to be
overwhelmingly probable.
Today, a website takes the user's password as input and compares its hash to a
stored hash. Similarly, a bank requires your credit score to grant you a loan,
leaving your privacy and risk of information leakage at the mercy of the host
servers. If ZKP were used instead, the client's password would remain unknown
to the verifier and the login could still be authenticated.
Ali Baba Cave Example
The 'Ali Baba cave' is the most common zero-knowledge proof example and
showcases the logic used in ZKP protocols. In the example, assume two
characters, Tina and Sam. Both are on an adventure and arrive at a cave with
two entrances leading to two distinct paths, A and B. A door inside the cave
connects the two paths.
Sam knows the secret code to open the door and therefore takes the role of
'prover'. Tina wants to purchase the code, and she takes the role of
'verifier': she wants to confirm that Sam actually knows the secret code to
open the door and is not lying. We can thus clearly see the roles of 'prover'
and 'verifier' in this ZKP example.
To achieve their respective goals, Sam must prove to Tina that he knows the
code without actually revealing the contents of the code.

Types of Zero-Knowledge Proof:

1. Interactive Zero-Knowledge Proof –
The verifier repeatedly asks a series of questions about the "knowledge" the
prover possesses. The Ali Baba cave example above is interactive, since the
prover performs a series of actions to convince the verifier of the soundness
of the knowledge.

2. Non-Interactive Zero-Knowledge Proof –
For an interactive protocol to work, both the verifier and the prover need to
be online at the same time, making it difficult to scale in real-world
applications. Non-interactive zero-knowledge proofs do not require an
interactive process, which also avoids the possibility of collusion. Instead
of a challenge chosen by the verifier, a hash function is used to derive the
challenge. In 1986, Fiat and Shamir invented the Fiat–Shamir heuristic, which
successfully converts an interactive zero-knowledge proof into a
non-interactive one.
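The interactive commit-challenge-response pattern can be made concrete with a
toy Schnorr identification protocol. The tiny group parameters below are an
illustrative assumption only; real deployments use groups of cryptographic
size:

```python
import random

# Toy interactive Schnorr protocol: the prover convinces the verifier that
# it knows x with y = g^x mod p, without revealing x.
p = 467          # safe prime: p = 2q + 1
q = 233          # prime order of the subgroup generated by g
g = 4            # generator of the order-q subgroup

x = random.randrange(1, q)   # prover's secret key
y = pow(g, x, p)             # public key

# One round of the protocol:
r = random.randrange(1, q)   # prover picks a random nonce
t = pow(g, r, p)             # ... and sends the commitment t
c = random.randrange(1, q)   # verifier replies with a random challenge
s = (r + c * x) % q          # prover answers; s alone leaks nothing about x

# Verifier accepts iff g^s == t * y^c (mod p).
assert pow(g, s, p) == (t * pow(y, c, p)) % p
```

The Fiat–Shamir heuristic mentioned above replaces the verifier's random
challenge c with a hash of the commitment t, turning this interactive round
into a non-interactive proof.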

Data masking techniques


Some privacy-enhancing technologies are also data masking techniques, used by
businesses to protect sensitive information in their data sets.

1. Obfuscation: A general term for data masking that covers multiple methods
of hiding sensitive information, for example by adding distracting or
misleading data to a log or profile. Obfuscation hides personal or sensitive
data through computer algorithms and masking techniques, and can also add
misleading or distracting data so that it is harder for an attacker to obtain
the data they are after.

For example:

Below is an obfuscated C code:

int i;main(){for(i=0;i["]<i;++i){--i;}"];
read('-'-'-',i+++"hell\
o,world!\n",'/'/'/'));}read(j,i,p){
write(j/p+p,i---j,i/i);}

Here is a de-obfuscated version that a person can understand:

#include <stdio.h>

int main(void)
{
    const char *msg = "hello, world!\n";
    for (int i = 0; msg[i] != '\0'; i++) {
        putchar(msg[i]);
    }
    return 0;
}

How to obfuscate code in apps?

To understand obfuscation in practice, consider how Android and Java
toolchains implement it. There are two main techniques used when obfuscating
code in apps:

• Shrinking: detects and safely removes unused classes, fields, methods, and
attributes from the app's release build.
• Optimization: inspects and rewrites the code to reduce its size. For
example, if the optimizer detects an if-else statement whose else {} branch
is never used, the code for the else branch is removed. Examples of code
shrinkers and optimizers are ProGuard (for both Java and Android) and R8
(for Android).

2. Pseudonymization: Pseudonymization is a method that lets you replace an
original data item (for example, an e-mail address or a name) with an alias or
pseudonym. It is a reversible process that de-identifies data but allows
re-identification later if necessary. It is a well-known data management
technique highly recommended by the General Data Protection Regulation (GDPR)
as a data protection method.

How is pseudonymization used in data protection?

Pseudonymization makes personal data processing safer, reducing the risk of
exposing sensitive data to unauthorized personnel.
Consider, for example, sending Excel sheets containing sensitive data via
e-mail. Although the sender and receiver of the e-mails are authorized to
access that information, the IT support team also has access to those e-mails.
When the data is pseudonymized, there is far less chance of exposing personal
data, since the data records become unidentifiable while remaining suitable
for data processing and analysis.

Why should you opt for pseudonymization?

In the everyday operations of any business, a lot of sensitive data flows
through HR, marketing, and IT departments, and pseudonymization can help lower
the risk and avoid possible data breaches.
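One simple way to implement pseudonymization is keyed hashing. The sketch
below (the key, field names, and e-mail address are hypothetical examples)
maps an identifier to a stable pseudonym with HMAC-SHA256, so records remain
linkable for analysis while the original value cannot be recovered without
the key:

```python
import hmac
import hashlib

# Hypothetical key; it must be stored separately from the data and
# rotated according to policy, since whoever holds it can re-identify.
SECRET_KEY = b"store-me-separately-from-the-data"

def pseudonymize(value: str) -> str:
    """Map an identifier to a stable pseudonym with HMAC-SHA256.
    The same input always yields the same pseudonym, so records stay
    linkable, but without the key the mapping cannot be recomputed."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "jane.doe@example.com", "purchases": 7}
safe_record = {"email": pseudonymize(record["email"]),
               "purchases": record["purchases"]}
```

Note that this is pseudonymization rather than anonymization: holders of the
key (or of a lookup table) can reverse the mapping, which is exactly what
GDPR-style re-identification requires.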

3. Data minimisation: Collecting the minimum amount of personal data that
enables the business to provide the elements of a service. The principle of
data minimization means limiting data collection to only what is required to
fulfill a specific purpose. When an organization applies data minimization,
any processing (the analysis of data to produce meaningful insight) uses only
the least amount of data necessary. The data collected should also not be used
for any other purpose or process without consent from the data subject (the
individual from whom the data was collected). Example: the purpose of
collecting biometric data as part of a fingerprint check at the entrance of a
building is solely to prevent unauthorized persons from entering.

4. Communication anonymizers: Anonymizers replace a user's online identity (IP
address, e-mail address) with a disposable, one-time, untraceable identity. An
anonymizer accesses the Internet on the user's behalf, protecting personal
information by hiding the client computer's identifying information.

Purposes

• There are many reasons for using anonymizers, such as minimizing risk,
preventing identity theft, or protecting search histories from public
disclosure.
• Some countries apply heavy censorship to the internet. Anonymizers can help
provide free access to all internet content, but they cannot protect against
persecution for accessing the anonymizer website itself. Furthermore, since
information about anonymizer websites is itself banned in those countries,
users are wary that they may be falling into a government-set trap.
• Anonymizers are also used by people who wish to receive objective
information despite the growing use of targeted marketing and targeted
information on the internet. For example, large news outlets such as CNN
target viewers by region and give different information to different
populations. Websites such as YouTube record which videos were last viewed on
a computer and propose "recommended" videos accordingly, and much online
targeted marketing shows advertisements according to the viewer's region.
Anonymizers can be used to avoid this kind of targeting and get a more
objective view of information.
• Anonymous proxy signatures are helpful for building reliable anonymous
systems. They can be used in anonymous voting or other authentication
processes that value anonymity.

With the help of AI & ML algorithms

1. Synthetic data generation: Synthetic data is artificially created data,
generated using various algorithms, including ML algorithms. If you are
interested in privacy-enhancing technologies because you need to move your
data into a testing environment that third-party users can access, generating
synthetic data with the same statistical characteristics is often the better
option.

Why is synthetic data important for businesses?

Synthetic data is important for businesses due to three reasons: privacy, product
testing and training machine learning algorithms.

How do businesses generate synthetic data?

Businesses can choose among different methods, such as decision trees, deep
learning techniques, and iterative proportional fitting, to perform data
synthesis. They should choose the method according to their synthetic data
requirements and the level of data utility desired for the specific purpose of
data generation. After data synthesis, they should assess the utility of the
synthetic data by comparing it with the real data. The utility assessment
process has two stages:

• General-purpose comparisons: comparing parameters such as distributions and
correlation coefficients measured from the two datasets
• Workload-aware utility assessment: comparing the accuracy of outputs for
the specific use case by performing the analysis on the synthetic data

Techniques to Generate Synthetic Data


Three common techniques for generating synthetic data are:

➢ Drawing Numbers From a Distribution

In contrast to more advanced, machine learning-based approaches, a popular
technique for generating synthetic data is simply to draw, or sample, numbers
from a distribution. While this approach does not capture the insights of
real-world data, it can create a distribution of data that follows a curve
loosely based on real-world data.
As an example, Python and the NumPy library's numpy.random.randn() function
can be used to create a set of four datasets with a "normal" distribution of
values, each with a slight shift of the centerpoint.
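A minimal sketch of that four-dataset example (the centerpoints and sample
size are illustrative choices):

```python
import numpy as np

np.random.seed(0)  # fixed seed so the illustration is reproducible

# Four synthetic datasets from a standard normal distribution, each shifted
# to a slightly different centerpoint (mean).
centerpoints = [0.0, 0.5, 1.0, 1.5]
datasets = [np.random.randn(1000) + c for c in centerpoints]

for c, d in zip(centerpoints, datasets):
    print(f"target mean {c:+.1f}  ->  sample mean {d.mean():+.3f}")
```

Each sample mean lands close to its target centerpoint, so the synthetic
datasets mimic the intended distributions without containing any real records.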

➢ Agent-Based Modeling

Agent-based modeling (ABM) is a simulation technique in which individual
agents are created that interact with one another. These techniques are
particularly useful for examining interactions between agents (e.g., people,
cells, or even computer programs) in a complex system. Python packages such
as Mesa make it easy to quickly create agent-based models from built-in core
components and to visualize them in a browser-based interface.

➢ Generative Models

Generative modelling is one of the most advanced techniques for creating
synthetic data. It can be described as an unsupervised learning task that
involves automatically discovering and learning the patterns in data in such a
way that the model can output new examples that match the same distribution as
the real-world data it was trained on.

Training generative models usually starts with gathering a large amount of
data in a particular domain (e.g., images, natural language text, tabular
data) and then training a model to generate more data like it. Generative
models come in different architectures, but all are based on neural networks
and fundamentally share the approach of using a number of parameters smaller
than the amount of data they were trained on.

2. Federated learning: Federated learning is a machine learning method that
enables models to gain experience from different data sets located at
different sites (e.g. local data centers, a central server) without sharing
training data. Personal data remains at the local sites, reducing the
possibility of personal data breaches.

Why is it important now?

Accurate machine learning models are valuable to companies, and traditional
centralized machine learning approaches have shortcomings such as the lack of
continual learning on edge devices and the aggregation of private data on
central servers. Federated learning alleviates these problems.

In traditional machine learning, a central ML model is built using all
available training data in a centralized environment. This works without any
issue when a central server can serve the predictions.

However, in mobile computing, users demand fast responses, and the
communication time between a user device and a central server may be too slow
for a good user experience. To overcome this, the model can be placed on the
end-user device, but then continual learning becomes a challenge, since models
are normally trained on a complete data set and the end-user device does not
have access to the complete dataset.

Another challenge with traditional machine learning is that users' data gets
aggregated in a central location for training, which may violate the privacy
laws of certain countries and makes the data more vulnerable to breaches.

Federated learning overcomes these challenges by enabling continual learning
on end-user devices while ensuring that end-user data never leaves those
devices.
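The round-trip just described, in which devices train locally and a server
only averages the resulting models, is the essence of federated averaging
(FedAvg). The sketch below uses a deliberately simple 1-D linear model and
synthetic per-device data (all illustrative assumptions) to show that only
weights, never raw data, ever reach the server:

```python
import random

def local_train(w, data, lr=0.01, epochs=5):
    """SGD on one device's private data; returns only the updated weight."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # gradient of (w*x - y)^2
            w -= lr * grad
    return w

random.seed(1)
true_w = 3.0
# Five devices, each holding 20 noisy private samples of y = true_w * x.
devices = [
    [(x, true_w * x + random.gauss(0, 0.1))
     for x in (random.uniform(-1, 1) for _ in range(20))]
    for _ in range(5)
]

w_global = 0.0
for _ in range(30):                            # communication rounds
    local_ws = [local_train(w_global, d) for d in devices]
    w_global = sum(local_ws) / len(local_ws)   # server averages the updates

print(round(w_global, 2))   # close to true_w
```

After a few dozen rounds the averaged global weight converges close to the
underlying value even though the server never observed a single (x, y) pair.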
What are the benefits of federated learning?
Federated learning is an emerging area of machine learning that already
provides significant benefits over traditional, centralized approaches:

• Data security: the training dataset stays on the devices, so no central
data pool is required for the model.
• Data diversity: challenges other than data security, such as network
unavailability on edge devices, may prevent companies from merging datasets
from different sources. Federated learning facilitates access to
heterogeneous data even when data sources can communicate only at certain
times.
• Real-time continual learning: models are constantly improved using client
data, with no need to aggregate data for continual learning.
• Hardware efficiency: the approach needs less complex hardware, because
federated learning does not require one powerful central server to analyze
all the data.
BENEFITS OF PRIVACY-ENHANCING
COMPUTATION
Here are some of the benefits of enabling privacy-enhancing computation.

1. Harm prevention
Without protection against privacy data breaches, malicious users can gain
easy access to information without permission. This can include data from
social media accounts, cloud storage, bank details, and more. A data breach
can affect users' privacy and harm their lives for a long time. PEC shields
access to sensitive information and ensures that a mandatory set of
permissions must be satisfied before sensitive information can be accessed.

2. Tackling undetermined and unfair conditions

It is difficult to track what third-party providers do with sensitive consumer
data. There are terms and conditions and privacy policies, but there is often
no way to verify that the policy rules are followed. This is where data
protection laws and government regulations help users, since violations can be
challenged.

3. Avoiding misrepresentation

Disclosure of personal data can compromise sensitive information, which
malicious users can exploit to harm individuals. Information can be
misrepresented or altered, for instance published as if it represented another
person. PEC helps ensure that such manipulation does not affect the
authenticity, identity, and interests of the original person, even if the data
is misrepresented or used for other purposes.
4. Avoiding violation of human dignity
A lack of privacy hands users with malicious intent an arsenal for misusing
information, which can distort the views or decisions attributed to a person,
making them appear out of character. This can lead to people being misjudged
in real life, violating their dignity. PEC can help avoid such situations.
USE CASES OF PRIVACY-ENHANCING
COMPUTATION
Use cases of privacy-enhancing computation:

1. HR: In the Human Resources department, PEC can support gender equality
efforts and help decrease the gender pay gap in a company.

2. Fraud prevention: Fraudsters are known to target specific industries and
numerous firms within them. Using PEC, firms can now work together to catch
criminals quickly. Good customers can likewise be recognized when firms unite
to build a pool of authorized customers.

3. Medical analysis: In a pandemic year, it is understandable that medical
associations must draw on large quantities of data, across jurisdictions and
regulations, for analysis. Many restrictions exist to secure patient data, and
PEC makes actionable patient records both private and accessible.

4. Financial transactions: Financial institutions are responsible for
protecting the privacy of their customers, given citizens' freedom to conduct
private deals and transactions with other parties.

5. Facilitating data transfer between multiple parties, including
intermediaries: For businesses that act as a middleman between two parties,
the use of PEC is crucial, since these businesses are responsible for
protecting the privacy of both parties' information.
CONCLUSION
The volume of data processed on the web is huge and growing at an exponential
rate every second. When individuals are asked to fill in a form to download
gated content or register for a service or product, they want to be sure that
this data will not be misrepresented, misused, published, or stolen by people
with malicious intent. Organizations should therefore start leveraging the
wide range of privacy-enhancing computation technologies to protect consumer
data in different ways. Some of these techniques safeguard individual data,
while others protect huge volumes of information; they can also increase the
level of anonymity of users and secure their personal data. When customers
hand over their data for services and goods, they expect companies to secure
that data and let them stay unidentified, while firms want actionable industry
data to help expand their businesses. Fortunately, privacy-enhancing
computation (PEC) provides techniques to use all this customer data in secure
ways.
REFERENCES

➢ Carlisle Adams, Introduction to Privacy Enhancing Technologies: A
Classification-Based Approach to Understanding PETs, 30 October 2021.
➢ https://www.altamira.ai/blog/privacy-enhancing-computation/
➢ https://10xds.com/blog/cyber-security/privacy-enhancing-computation/
➢ https://research.aimultiple.com/synthetic-data-generation/
➢ https://101blockchains.com/zero-knowledge-proof-example/
➢ https://research.aimultiple.com/privacy-enhancing-technologies/
➢ https://www.geeksforgeeks.org/what-is-obfuscation/
➢ https://www.geeksforgeeks.org/what-is-secure-multiparty-computation/
