
A NEW PRIVACY MEASURE FOR DATA PUBLISHING

PRESENTED BY
R. MADHURAMYA
Department of Information Technology
Kongu Engineering College
Erode.
OBJECTIVE

• To reduce information loss and increase accuracy in
privacy-preserving data mining in the presence of external
databases.
• To improve the efficiency of privacy-preserving data mining for
transactional databases.

A New Privacy Measure For Data Publishing 06/24/2024


INTRODUCTION

Data mining
• Data mining is the process of analyzing data from different
perspectives and summarizing it into useful information.
• It allows users to analyze data along different dimensions or
angles.
• Data mining (also known as Knowledge Discovery in
Databases) is the nontrivial extraction of implicit, previously
unknown, and potentially useful information from databases.



PRIVACY PRESERVING DATA MINING

• The goal is to allow the mining of data while preventing
the leakage of any private or sensitive information.
WHY PRIVACY?
• Companies are often willing to collaborate with other entities
that conduct similar business, for the mutual benefit of
their businesses.
• Significant knowledge patterns can be derived and shared
among the collaborating partners through the aggregate mining
of their datasets.



ATTRIBUTES IN MICRODATA

• Explicit identifiers: uniquely identify an individual (passport number,
roll number, ID number)
• Quasi-identifiers: do not identify an individual on their own but can
in combination; 5-digit ZIP code, birth date, and gender together
uniquely identify about 87% of the U.S. population
• Sensitive attributes: carry sensitive information (disease, salary)

identifier    quasi-identifiers                 sensitive
Patient ID    Birthdate    Sex       Zipcode    Disease
1A234         21/1/79      male      53715      Flu
3D456         10/1/81      female    55410      Hepatitis
7T567         1/10/44      female    90210      Bronchitis
9G453         21/2/84      male      02174      Sprained Ankle
5K098         19/4/72      female    02237      Cancer

K-ANONYMITY

• The information for each person contained in the released table
cannot be distinguished from that of at least k-1 other individuals
whose information also appears in the release.
◦ Example: you try to identify a man in the released table, but
the only information you have is his birth date and gender.
There are k men in the table with the same birth date and
gender.
• Every combination of quasi-identifier values present in the
released table must appear in at least k records.
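
The definition above can be checked mechanically. The following is a minimal sketch, assuming the released table is a list of dictionaries; the function name `is_k_anonymous` and the column names are illustrative and do not come from the slides.

```python
from collections import Counter

def is_k_anonymous(rows, qi_columns, k):
    """Return True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(n >= k for n in counts.values())

# Hypothetical released table with already generalized quasi-identifiers.
released = [
    {"birthdate": "*/1/79", "sex": "person", "zipcode": "5****"},
    {"birthdate": "*/1/79", "sex": "person", "zipcode": "5****"},
    {"birthdate": "*/*/8*", "sex": "male", "zipcode": "022**"},
    {"birthdate": "*/*/8*", "sex": "male", "zipcode": "022**"},
]
QI = ["birthdate", "sex", "zipcode"]

print(is_k_anonymous(released, QI, 2))
print(is_k_anonymous(released, QI, 3))
```

Each quasi-identifier combination occurs exactly twice here, so the table passes the check for k = 2 and fails it for k = 3.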



HOW TO ACHIEVE K-ANONYMITY

• Generalization
◦ publish more general values, i.e., given a domain hierarchy, roll up
• Suppression
◦ remove tuples, i.e., do not publish outliers
◦ often the number of suppressed tuples is bounded

original microdata               2-anonymous data
Birthdate  Sex     Zipcode       Birthdate  Sex     Zipcode
21/1/79    male    53715         */1/79     person  5****     (group 1)
10/1/79    female  55410         */1/79     person  5****     (group 1)
1/10/44    female  90210         suppressed
21/2/83    male    02274         */*/8*     male    022**     (group 2)
19/4/82    male    02237         */*/8*     male    022**     (group 2)
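
As a rough illustration of generalization, the sketch below rolls up the two records of group 2 until they become indistinguishable. The helper names `mask_suffix` and `generalize_birthdate` are assumptions made for this example, and the roll-up levels are hard-coded rather than read from a domain hierarchy.

```python
def mask_suffix(value, keep):
    """Keep the first `keep` characters and replace the rest with '*'."""
    return value[:keep] + "*" * (len(value) - keep)

def generalize_birthdate(date):
    """Roll a d/m/yy date up to the decade only, e.g. 21/2/83 -> */*/8*."""
    _day, _month, year = date.split("/")
    return "*/*/" + year[0] + "*"

# Group 2 of the original microdata: (birthdate, sex, zipcode).
group2 = [("21/2/83", "male", "02274"), ("19/4/82", "male", "02237")]

generalized = [
    (generalize_birthdate(b), s, mask_suffix(z, 3)) for b, s, z in group2
]
for record in generalized:
    print(record)
```

Both records map to the same generalized tuple, which is what makes the group 2-anonymous.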

METHODS USED IN K-ANONYMITY

MONDRIAN
• It is a multidimensional, local recoding technique
• Recursively splits the d-dimensional space into two partitions
• Terminates when a group contains fewer than 2k records
TOPDOWN
• Starts with the entire data set
• Iteratively splits each group in two, reminiscent of the R-tree
quadratic split
• Continues until left with groups that contain fewer than 2k-1 tuples
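
The Mondrian idea can be sketched in a few lines. This is a simplified version under stated assumptions: records are tuples of numbers, only the dimension with the widest value range is tried, and the cut is made at the median position. It is not the published algorithm in full, and the function name `mondrian` is chosen here for illustration.

```python
def mondrian(records, k):
    """Recursively halve `records` along the widest dimension while both halves keep >= k."""
    dims = range(len(records[0]))
    # Pick the dimension whose values span the widest range in this group.
    widest = max(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
    )
    ordered = sorted(records, key=lambda r: r[widest])
    mid = len(ordered) // 2
    left, right = ordered[:mid], ordered[mid:]
    if len(left) < k or len(right) < k:
        return [records]  # cannot split further without violating k
    return mondrian(left, k) + mondrian(right, k)

# Hypothetical two-dimensional quasi-identifier values.
data = [(i, i % 3) for i in range(10)]
groups = mondrian(data, 2)
print([len(g) for g in groups])
```

With this splitting rule a group is divided only when it holds at least 2k records, so every final group contains between k and 2k-1 records.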



MEASURING GROUP QUALITY
• The discernibility metric (DM) depends only on the cardinality of the
group
◦ it gives no measure of how tight the group is
• A good group is one that contains tuples with similar QI values
• Define a new metric: normalized certainty penalty (NCP)
◦ it measures the perimeter of the group
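
The difference between the two metrics shows up on a small example. The sketch below assumes numeric quasi-identifiers, takes DM as the sum of squared group sizes, and takes NCP as the sum over attributes of the group extent divided by the domain extent; the slides do not give the formulas, so both forms and the sample values are assumptions.

```python
def discernibility(groups):
    """DM: each tuple is charged the size of its group, i.e. the sum of |G|^2."""
    return sum(len(g) ** 2 for g in groups)

def ncp(group, domain_ranges):
    """NCP of one group: sum over attributes of (group extent / domain extent)."""
    total = 0.0
    for d, domain in enumerate(domain_ranges):
        values = [t[d] for t in group]
        total += (max(values) - min(values)) / domain
    return total

# Hypothetical (age, zipcode) tuples and domain extents.
tight = [(30, 53000), (31, 53010)]
loose = [(30, 53000), (60, 90000)]
ranges = (100, 100000)

print(discernibility([tight]), discernibility([loose]))
print(ncp(tight, ranges), ncp(loose, ranges))
```

DM rates the two groups identically because both hold two tuples, while NCP penalizes the loose group for covering a much larger region.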



DRAWBACKS OF K-ANONYMITY

• Information loss can be high
• k-Anonymity does not provide privacy if:
◦ Sensitive values in an equivalence class lack diversity
◦ The attacker has background knowledge
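
The first drawback can be detected by looking for equivalence classes whose records all share one sensitive value. This is a minimal sketch; the function name `homogeneous_classes` and the sample table are hypothetical.

```python
from collections import defaultdict

def homogeneous_classes(rows, qi_columns, sensitive):
    """Return the QI combinations whose rows all carry the same sensitive value."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[c] for c in qi_columns)].add(row[sensitive])
    return [qi for qi, values in seen.items() if len(values) == 1]

# A 2-anonymous table: the first class leaks the disease of both members.
table = [
    {"zipcode": "537**", "disease": "Flu"},
    {"zipcode": "537**", "disease": "Flu"},
    {"zipcode": "022**", "disease": "Cancer"},
    {"zipcode": "022**", "disease": "Hepatitis"},
]
print(homogeneous_classes(table, ["zipcode"], "disease"))
```

Anyone known to be in the `537**` class is revealed to have the flu even though the table satisfies 2-anonymity.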

K-JOIN-ANONYMITY

• KJA requires that each G-box (generalization box) enclose at least
k tuples of the joined table (JT), subject to the constraint that each
G-box encloses at least one tuple of the microdata table (MT).
• To achieve less information loss, it shrinks the G-boxes using
public knowledge about universe tuples.

[Figure: k-anonymity yields 3-anonymous microdata; KJA joins the microdata with public data and yields 3-anonymous joined microdata]



K-JOIN-ANONYMITY (CONTD….)

Methods used in KJA

• DIRECT
It generalizes the entire JT. Among the resulting G-boxes, it
keeps only those that represent some record(s) in MT and
discards the rest.
• REFINEMENT
Refinement involves the following steps:
It applies kAlgorithm on MT. Then, for each
generalization box created, it performs a range query on JT and
invokes kAlgorithm on the retrieved records. This operation
generates new G-boxes. Refinement places into JAT the minimum
bounding boxes (MBBs) of the new G-boxes that contain at least
one MT tuple.
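
The two methods can be compared on a toy example. The sketch below makes strong simplifying assumptions: tuples are distinct one-dimensional numbers, every MT tuple also appears in JT, and `k_algorithm` is a placeholder that sorts the points and cuts them into runs of at least k. None of these names or choices come from the slides.

```python
def k_algorithm(points, k):
    """Placeholder kAlgorithm: sort 1-D points and cut them into runs of at least k."""
    pts = sorted(points)
    groups = [pts[i:i + k] for i in range(0, len(pts), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()        # fold a short final run into the previous one
        groups[-1].extend(tail)
    return groups

def mbb(group):
    """Minimum bounding box of a 1-D group: its (min, max) interval."""
    return (min(group), max(group))

def direct(mt, jt, k):
    """Generalize all of JT, then keep only boxes that contain an MT tuple."""
    mt_set = set(mt)
    return [mbb(g) for g in k_algorithm(jt, k) if any(p in mt_set for p in g)]

def refinement(mt, jt, k):
    """Generalize MT first, then re-generalize the JT tuples inside each box."""
    mt_set = set(mt)
    result = []
    for g in k_algorithm(mt, k):
        lo, hi = mbb(g)
        retrieved = [p for p in jt if lo <= p <= hi]  # range query on JT
        for new_g in k_algorithm(retrieved, k):
            if any(p in mt_set for p in new_g):
                result.append(mbb(new_g))
    return result

jt = list(range(1, 13))   # hypothetical joined table
mt = [2, 3, 11]           # hypothetical microdata tuples
print(direct(mt, jt, 3))
print(refinement(mt, jt, 3))
```

In both outputs every box encloses at least three JT tuples and at least one MT tuple, which is the KJA requirement.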

PROPOSED WORK

• Existing techniques work well for fixed-schema data with low
dimensionality.
• Certain applications require privacy-preserving publishing of
transactional data, which involves hundreds or even thousands
of dimensions; this makes existing methods unusable.
• This problem is solved through methods based on
◦ Local nearest-neighbor (NN) search
◦ Global data reorganization



PROPOSED WORK (CONTD….)

• The objective is to anonymize data consisting of a set of
transactions T = {t1, t2, ..., tn}, n = |T|. Each transaction t ∈ T
contains items from an item set I = {i1, i2, ..., id}, d = |I|.
• The data are represented as a binary matrix A with n rows and d
columns, where A[i][j] = 1 if item ij ∈ ti, and 0 otherwise. Among
the set of items I, some are sensitive.
• A permutation-based approach is employed, which considerably
reduces information loss compared to
generalization-based approaches.
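
The binary matrix representation described above can be built directly from the transactions. This is a minimal sketch; the item names and the function name `to_binary_matrix` are hypothetical.

```python
def to_binary_matrix(transactions, items):
    """Build A with A[i][j] = 1 if items[j] occurs in transactions[i], else 0."""
    index = {item: j for j, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in transactions]
    for i, transaction in enumerate(transactions):
        for item in transaction:
            matrix[i][index[item]] = 1
    return matrix

# Hypothetical item set I (d = 4) and transaction set T (n = 3).
items = ["bread", "milk", "wine", "medicine"]
transactions = [{"bread", "milk"}, {"wine"}, {"milk", "medicine"}]

A = to_binary_matrix(transactions, items)
for row in A:
    print(row)
```

Real transactional data has far more columns than this, and most entries are zero, which is the sparseness that the reorganization methods exploit.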



APPROXIMATE NN SEARCH WITH LSH

• In the NNGroup algorithm, the entire set T is searched for
nearest-neighbor transactions, which is an expensive process. To
address this drawback, approximate NN search based on
locality-sensitive hashing (LSH) is used.
• For transactional data anonymization, an LSH technique is
employed that is specifically tailored for the high-dimensional
Hamming space H^|Q|, where |Q| is the dimensionality of the
quasi-identifier.
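
One well-known LSH family for Hamming space is bit sampling: each hash table keys a vector by its bits at a few randomly chosen positions, so vectors that differ in few positions tend to collide. The sketch below uses that family as an assumption; the slides do not say which LSH construction is used, and the function names and parameters are illustrative.

```python
import random
from collections import defaultdict

def build_lsh_tables(vectors, num_tables, bits_per_key, seed=0):
    """Build bit-sampling hash tables over equal-length 0/1 vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    tables = []
    for _ in range(num_tables):
        positions = rng.sample(range(dim), bits_per_key)
        buckets = defaultdict(list)
        for idx, v in enumerate(vectors):
            buckets[tuple(v[p] for p in positions)].append(idx)
        tables.append((positions, buckets))
    return tables

def candidates(query, tables):
    """Indices of vectors that share a bucket with `query` in any table."""
    found = set()
    for positions, buckets in tables:
        found.update(buckets.get(tuple(query[p] for p in positions), []))
    return found

def hamming(a, b):
    """Number of positions in which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical quasi-identifier bit vectors, |Q| = 8.
vectors = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 1, 1, 1],
]
tables = build_lsh_tables(vectors, num_tables=4, bits_per_key=3)
near = candidates(vectors[0], tables)
print(sorted(near, key=lambda i: hamming(vectors[0], vectors[i])))
```

Only the candidate set is compared exactly against the query, which avoids scanning all of T for every nearest-neighbor lookup.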



DATA REORGANIZATION METHODS

• These methods reorganize the data by placing transactions with
similar quasi-identifiers in close proximity to each other.
• The two data reorganization methods used are
◦ Reduction to band matrix representation
◦ Gray code sorting
• Reduction to band matrix representation
This method transforms the data into a band matrix by
performing permutations of rows and columns in the original
table. This representation takes advantage of data sparseness
and places nonzero entries near the main diagonal. The
advantage is that neighboring rows have high correlation, i.e.,
share a large number of common items.
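
The effect of a band matrix transformation can be seen by measuring the bandwidth before and after permuting rows and columns. In the sketch below the permutation is hand-picked for a tiny matrix; finding a good permutation automatically is the hard part and is not shown. The function names are assumptions for this example.

```python
def bandwidth(matrix):
    """Largest |i - j| over all nonzero entries A[i][j]."""
    return max(
        (abs(i - j)
         for i, row in enumerate(matrix)
         for j, value in enumerate(row) if value),
        default=0,
    )

def permute(matrix, row_order, col_order):
    """Reorder rows and columns of `matrix` according to the given index lists."""
    return [[matrix[r][c] for c in col_order] for r in row_order]

original = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
# Hand-picked orders that bring identical transactions next to each other.
banded = permute(original, row_order=[0, 2, 1, 3], col_order=[0, 3, 1, 2])

print(bandwidth(original), bandwidth(banded))
for row in banded:
    print(row)
```

After the permutation, all nonzero entries sit next to the main diagonal and neighboring rows share their items.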



DATA REORGANIZATION METHODS (CONTD….)

• Gray code sorting
This data transformation technique relies on sorting with respect
to Gray encoding: the QID items in each transaction t are
interpreted as the Gray code of t. The transaction set is then
sorted according to the rank in the Gray sequence. In this
representation, transactions that are consecutive in the
sequence have low Hamming distance; hence, their QIDs are
similar.
• The output of either method is then fed to an efficient
linear-time heuristic that groups together nearby transactions,
thereby reducing the search space of the solution. Since both
data transformations capture correlation well, groups contain
transactions with similar QIDs, leading to increased data utility.
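
Gray code sorting can be sketched by decoding each QID bit vector from reflected Gray code to its position in the Gray sequence and sorting on that position. The function name `gray_rank` and the sample transactions are assumptions; the grouping heuristic that follows the sort is not shown.

```python
def gray_rank(bits):
    """Position of a 0/1 vector in the reflected Gray sequence (Gray-to-binary decode)."""
    rank = 0
    prefix = 0
    for b in bits:
        prefix ^= b                  # running XOR of the Gray bits seen so far
        rank = (rank << 1) | prefix  # append the decoded binary bit
    return rank

def hamming(a, b):
    """Number of positions in which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical QID bit vectors of five transactions.
transactions = [[1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]]

ordered = sorted(transactions, key=gray_rank)
print(ordered)
print([hamming(a, b) for a, b in zip(ordered, ordered[1:])])
```

In this example every pair of consecutive transactions in the sorted order differs in a single item, whereas plain binary ordering would place 011 directly before 100, which differ in all three positions.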



CONCLUSION

• In most practical anonymization scenarios, there exists public
knowledge that can be used by an attacker to breach privacy.
On the other hand, this knowledge can also be exploited to
reduce the information loss in the published data.
• Motivated by this observation, the existing system introduced the
concept of KJA.
• k-Anonymity techniques work well for fixed-schema data with low
dimensionality.
• Certain applications require privacy-preserving publishing of
transactional data (or basket data), which involves hundreds or even
thousands of dimensions and makes existing methods unusable. In the
proposed work, the problem of anonymizing sparse, high-dimensional
transactional data is solved through methods based on local NN
search and global data reorganization.



SCREENSHOTS: K-ANONYMITY

[Five screenshot slides]

SCREENSHOTS: K-JOIN-ANONYMITY

[Three screenshot slides]


REFERENCES

• R.J. Bayardo, Jr. and R. Agrawal, “Data Privacy through
Optimal k-Anonymization,” Proc. IEEE Int’l Conf. Data Eng.
(ICDE), pp. 217-228, 2005.
• D. Sacharidis, K. Mouratidis, and D. Papadias,
“k-Anonymity in the Presence of External Databases,”
IEEE Transactions on Knowledge and Data Engineering,
vol. 22, pp. 392-403, March 2010.
• G. Ghinita, P. Karras, P. Kalnis, and N. Mamoulis, “Fast Data
Anonymization with Low Information Loss,” Proc. Int’l Conf.
Very Large Data Bases (VLDB), pp. 758-769, 2007.
• K. LeFevre, D.J. DeWitt, and R. Ramakrishnan,
“Mondrian Multidimensional k-Anonymity,” Proc. IEEE Int’l Conf.
Data Eng. (ICDE), p. 25, 2006.
• K. LeFevre, D.J. DeWitt, and R. Ramakrishnan,
“Incognito: Efficient Full-Domain k-Anonymity,” Proc. ACM
SIGMOD, pp. 49-60, 2005.
• L. Sweeney, “k-Anonymity: A Model for Protecting
Privacy,” International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.



THANK YOU
