CSC 303 Data Protection Techniques Notes
There are many stakeholders in data privacy within an organization; these are shown in
Figure 1.1 below. Let us define these stakeholders.
Adversary/data snooper: Data are precious and their theft is very common. An
adversary can be internal or external to the organization. The anonymization design
should be such that it can thwart an adversary’s effort to identify a record owner in
the database.
What constitutes personal information?
Personal information consists of name, identifiers like social security number,
geographic and demographic information, and general sensitive information, for
example, financial status, health issues, shopping patterns, and location data. Loss of
this information means loss of privacy—one’s right to freedom from intrusion by
others.
Let us look at a sample bank customer table and an account table. The customer table
taken by itself contains nothing confidential, as most of the information in it is
also available in public voter databases and on social networking sites like
Facebook. Sensitivity arises when the customer table is combined with the
accounts table. A logical representation of Tables 1.1 and 1.2 is shown in Table 1.3.
Data D in the tables contains four disjoint data sets:
1. Explicit identifiers (EI): Attributes that identify a customer (also called record
owner) directly. These include attributes like social security number (SSN), insurance
ID, and name.
2. Quasi-identifiers (QI): Attributes that include geographic and demographic
information, phone numbers, and e-mail IDs. Quasi-identifiers are also defined as
those attributes whose values are publicly available, for example, in a voter database.
3. Sensitive data (SD): Attributes that contain confidential information about the
record owner, such as health issues, financial status, and salary, which cannot be
compromised at any cost.
4. Nonsensitive data (NSD): Data that are not sensitive for the given context.
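To make the four categories concrete, the sketch below partitions a single record into EI, QI, SD, and NSD. The column names are hypothetical, since Tables 1.1 through 1.3 are not reproduced in these notes; they simply stand in for the kinds of attributes described above.

# Hypothetical attribute names for the bank customer/account data (illustrative only).
EI = {"ssn", "insurance_id", "name"}                              # explicit identifiers
QI = {"zip_code", "date_of_birth", "gender", "phone", "email"}    # quasi-identifiers
SD = {"account_balance", "salary", "health_condition"}            # sensitive data


def classify(record: dict) -> dict:
    """Partition a single record's attributes into the four disjoint data sets."""
    buckets = {"EI": {}, "QI": {}, "SD": {}, "NSD": {}}
    for attribute, value in record.items():
        if attribute in EI:
            buckets["EI"][attribute] = value
        elif attribute in QI:
            buckets["QI"][attribute] = value
        elif attribute in SD:
            buckets["SD"][attribute] = value
        else:                                  # everything else is nonsensitive here
            buckets["NSD"][attribute] = value
    return buckets


row = {"ssn": "123-45-6789", "name": "Alice", "zip_code": "10001",
       "gender": "F", "account_balance": 25000, "preferred_branch": "Downtown"}
print(classify(row))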
The first two data sets, EI and QI, uniquely identify a record owner and, when
combined with the sensitive data, become sensitive or confidential. The data set D is
treated as a matrix of m rows and n columns, in which each row and each column is a
vector. Each of the data sets EI, QI, and SD is a matrix with m rows and i, j, and k
columns, respectively, so that

D = [D_EI | D_QI | D_SD]
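This column-wise partition of D can be expressed directly in code. The sketch below is a minimal illustration using NumPy, with made-up dimensions m, i, j, and k; it splits a stand-in data matrix into the EI, QI, and SD submatrices and checks that concatenating them recovers D.

import numpy as np

m, i, j, k = 6, 2, 3, 1           # made-up dimensions: m rows; i, j, k columns
n = i + j + k                     # total number of columns in D

D = np.arange(m * n).reshape(m, n)   # stand-in data matrix (m x n)

# Column-wise partition D = [D_EI | D_QI | D_SD]
D_EI = D[:, :i]                   # explicit identifiers  (m x i)
D_QI = D[:, i:i + j]              # quasi-identifiers     (m x j)
D_SD = D[:, i + j:]               # sensitive data        (m x k)

# Concatenating the three blocks recovers the original matrix
assert np.array_equal(np.hstack([D_EI, D_QI, D_SD]), D)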
We need to keep an eye on the index j (representing QI), which
plays a major role in keeping the data confidential. Apart from assuring their
customers’ privacy, organizations also have to comply with various regulations in that
region/country, as mentioned earlier. Most countries have strong privacy laws to
protect citizens’ personal data. Organizations that fail to protect the privacy of their
customers or do not comply with the regulations face stiff financial penalties, loss of
reputation, loss of customers, and legal issues. This is the primary reason
organizations pay so much attention to data privacy. They find themselves in a Catch-
22 as they have huge amounts of customer data, and there is a compelling need to
share these data with specialized data analysis companies. Most often, data
protection techniques, such as cryptography and anonymization, are used prior to
sharing data.
Anonymization is a process of logically separating personally identifiable
information (PII) from sensitive data. Referring to Table 1.3, the anonymization
approach ensures that EI and QI are logically separated from SD. As a result, an
adversary cannot easily identify the record owner from the sensitive data. This is
easier said than done.
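As a first intuition of what "logical separation" means, the sketch below splits each record into an identifying part (EI and QI) and a sensitive part (SD), linked only by a random pseudonym. The record values and column names are hypothetical, and this is only an illustration of the idea, not a complete anonymization scheme; it does not, for example, protect against linkage attacks on the quasi-identifiers.

import secrets

# Hypothetical records combining Tables 1.1 and 1.2 (column names are illustrative).
records = [
    {"ssn": "123-45-6789", "name": "Alice", "zip_code": "10001", "balance": 25000},
    {"ssn": "987-65-4321", "name": "Bob",   "zip_code": "94105", "balance": 1200},
]

identity_table = []   # EI and QI, kept under strict access control
sensitive_table = []  # SD, shared for analysis without direct identifiers

for record in records:
    pseudonym = secrets.token_hex(8)  # random link between the two tables
    identity_table.append({"pseudonym": pseudonym,
                           "ssn": record["ssn"],
                           "name": record["name"],
                           "zip_code": record["zip_code"]})
    sensitive_table.append({"pseudonym": pseudonym,
                            "balance": record["balance"]})

# An analyst who sees only sensitive_table cannot directly tell whose balance is whose.
print(sensitive_table)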
How to effectively anonymize the data?