K-anonymity: Difference between revisions

Monkbot (talk | contribs)
m →‎top: Task 16: replaced (0×) / removed (1×) deprecated |dead-url= and |deadurl= with |url-status=;
"Unicity" has a technical meaning in this context.
Line 108:
Because ''k''-anonymization does not include any randomization, attackers can still make inferences about data sets that may harm individuals. For example, if the 19-year-old John from Kerala is known to be in the database above, then it can be reliably said that he has either cancer, a heart-related disease, or a viral infection.
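
The inference described above can be sketched in a few lines of Python. The table below is a hypothetical toy data set, not the article's table, and the helper names <code>k_of</code> and <code>candidate_diagnoses</code> are assumptions made for illustration. The sketch shows that someone who knows a person's quasi-identifiers can narrow the diagnosis to the values in the matching equivalence class, even though the table is ''k''-anonymous.

```python
from collections import Counter

# Hypothetical generalized records: (age_range, state, disease).
# The first two fields are the quasi-identifiers; the third is sensitive.
records = [
    ("<=20", "Kerala", "Cancer"),
    ("<=20", "Kerala", "Heart-related"),
    ("<=20", "Kerala", "Viral infection"),
    ("21-30", "Tamil Nadu", "Tuberculosis"),
    ("21-30", "Tamil Nadu", "Tuberculosis"),
]


def k_of(rows):
    """Return the size of the smallest equivalence class (the k of the table)."""
    counts = Counter((age, state) for age, state, _ in rows)
    return min(counts.values())


def candidate_diagnoses(rows, quasi_identifier):
    """Return every sensitive value found in the matching equivalence class."""
    return {
        disease
        for age, state, disease in rows
        if (age, state) == quasi_identifier
    }


# The toy table is 2-anonymous, yet an attacker who knows the target is
# a Kerala resident aged 20 or under learns the diagnosis is one of three.
print(k_of(records))
print(candidate_diagnoses(records, ("<=20", "Kerala")))
# When every record in a class shares one value, the attacker learns it exactly.
print(candidate_diagnoses(records, ("21-30", "Tamil Nadu")))
```

The second lookup illustrates the extreme case: a class whose sensitive values are all identical discloses that value for every member.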
 
''K''-anonymization is poorly suited to anonymizing high-dimensional datasets.<ref>{{cite conference|last = Aggarwal|first = Charu C.|title = On ''k''-Anonymity and the Curse of Dimensionality|year = 2005|location = Trondheim, Norway|isbn = 1-59593-154-6|book-title = VLDB '05 &ndash; Proceedings of the 31st International Conference on Very Large Data Bases|url = http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.3155&rep=rep1&type=pdf}}</ref> For example, researchers showed that, given only four timestamp-location points, the [[Unicity_(computer_science)|unicity]] of mobile phone timestamp-location datasets (<math>\mathcal{E}_4</math>, ''k''-anonymity when <math>k=1</math>) can be as high as 95%.<ref>{{cite journal|last=de Montjoye|first=Yves-Alexandre|author2=César A. Hidalgo |author3=Michel Verleysen |author4=Vincent D. Blondel |title=Unique in the Crowd: The privacy bounds of human mobility|journal=Scientific Reports|volume=3|pages=1376|date=March 25, 2013|doi=10.1038/srep01376|pmid=23524645|bibcode=2013NatSR...3E1376D|url=http://dspace.mit.edu/bitstream/1721.1/92263/1/Hidalgo_Unique%20in%20the%20crowd.pdf}}</ref>
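
As a rough illustration of what unicity measures, the following Python sketch estimates the fraction of users whose trace is the only one containing ''p'' randomly chosen points from that trace. It is a minimal sketch under stated assumptions, not the procedure of the cited study: the function name <code>estimate_unicity</code>, its parameters, and the toy traces are all hypothetical.

```python
import random


def estimate_unicity(traces, p, trials=100, seed=0):
    """Estimate the share of users singled out by p known points.

    traces maps a user id to a set of (hour, location) points.
    A draw counts as unique when only one trace contains all p points,
    which corresponds to an equivalence class of size k = 1.
    """
    rng = random.Random(seed)
    users = [u for u, points in traces.items() if len(points) >= p]
    unique = 0
    total = 0
    for _ in range(trials):
        for user in users:
            # Points the attacker is assumed to know about this user.
            known = set(rng.sample(sorted(traces[user]), p))
            # Count every trace compatible with the known points.
            matches = sum(1 for points in traces.values() if known <= points)
            unique += matches == 1
            total += 1
    return unique / total if total else 0.0


# Hypothetical traces: pairs of users share single points,
# but no two users share a whole trace.
traces = {
    "u1": {(8, "A"), (9, "B"), (18, "C")},
    "u2": {(8, "A"), (9, "D"), (18, "C")},
    "u3": {(8, "E"), (9, "B"), (18, "F")},
}

print(estimate_unicity(traces, p=1))
print(estimate_unicity(traces, p=3))
```

In this toy data a single known point often matches more than one user, while three known points always identify exactly one, which mirrors how adding dimensions erodes the protection that ''k''-anonymity offers.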
 
It has also been shown that ''k''-anonymity can skew the results of analyses run on a data set if it disproportionately suppresses and generalizes data points with unrepresentative characteristics.<ref>{{cite web|last1=Angiuli|first1=Olivia|author2=Joe Blitzstein |author3=Jim Waldo |authorlink3=Jim Waldo|title=How to De-Identify Your Data|url=http://queue.acm.org/detail.cfm?id=2838930|website=ACM Queue|publisher=ACM}}</ref> The suppression and generalization algorithms used to ''k''-anonymize datasets can be altered, however, so that they do not have such a skewing effect.<ref>{{cite journal|last1=Angiuli|first1=Olivia|author2=Jim Waldo|authorlink2=Jim Waldo|title=Statistical Tradeoffs between Generalization and Suppression in the De-Identification of Large-Scale Data Sets|journal=IEEE Computer Society Intl Conference on Computers, Software, and Applications|date=June 2016|url=https://ieeexplore.ieee.org/abstract/document/7552278/}}</ref>
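
The skewing effect of suppression can be seen in a small Python sketch. The data and the helper <code>suppress_small_classes</code> are hypothetical, and the sketch uses plain suppression of undersized equivalence classes rather than any algorithm from the cited works.

```python
from collections import Counter
from statistics import mean


def suppress_small_classes(rows, k):
    """Drop every row whose quasi-identifier value occurs fewer than k times."""
    counts = Counter(qi for qi, _ in rows)
    return [(qi, value) for qi, value in rows if counts[qi] >= k]


# Hypothetical rows: (quasi-identifier, numeric attribute under study).
rows = [("urban", 50), ("urban", 52), ("urban", 48), ("rural", 90)]

released = suppress_small_classes(rows, k=2)

# The lone rural record is suppressed, so the released mean drifts
# away from the mean of the original data.
print(mean(value for _, value in rows))
print(mean(value for _, value in released))
```

Because the suppressed record is also the one with the atypical value, the released data understate the mean; generalizing the quasi-identifier instead of suppressing the row would keep that value in the release.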