EDBT/ICDT 2009 Joint Conference

Electronic Conference Proceedings

On the Comparison of Microdata Disclosure Control Algorithms

Abstract

Privacy models such as k-anonymity and l-diversity typically offer an aggregate, or scalar, notion of the privacy property that holds collectively over the entire anonymized data set. However, they fail to give an accurate measure of privacy for individual tuples. For example, two anonymizations that achieve the same value of k in the k-anonymity model are considered equally good with respect to privacy protection. Yet it is quite possible that, in one of the anonymizations, a majority of the individual tuples have lower probabilities of privacy breach than their counterparts in the other. We therefore reject the notion that all anonymizations satisfying a particular privacy property, such as k-anonymity, are equally good. The scalar or aggregate value used in privacy models is often biased towards a fraction of the data set, giving some individuals higher privacy and others only minimal privacy. Consequently, to better compare anonymization algorithms, this bias needs to be formalized and measured. Towards this end, we advocate vector-based methods for representing privacy and other measurable properties of an anonymization. We represent the measure of a given property for an anonymized data set using a property vector. Anonymizations are then compared using quality index functions that quantify the effectiveness of the property vectors. We provide a formal analysis of the scope and limitations of these functions. Finally, we present preference-based techniques for comparisons that span multiple properties induced by anonymizations.
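The gap between a scalar k and a per-tuple property vector can be illustrated with a small sketch. It assumes that the per-tuple property is the size of the equivalence class each tuple falls into (so its re-identification probability is 1/size). It also assumes that rows in both anonymizations are aligned by individual. The two comparison functions, Pareto dominance and the fraction of tuples that are strictly better protected, are hypothetical stand-ins chosen for illustration; they are not the quality index functions defined in the paper.

```python
from collections import Counter
from typing import Sequence

# Hypothetical helpers for illustration only; not the paper's quality index functions.

def property_vector(class_labels: Sequence[str]) -> list[int]:
    """Return, for each tuple, the size of its equivalence class.

    class_labels[i] identifies the generalized quasi-identifier group of tuple i.
    A larger size means a lower re-identification probability (1 / size).
    """
    sizes = Counter(class_labels)
    return [sizes[label] for label in class_labels]

def scalar_k(vec: Sequence[int]) -> int:
    """Aggregate k of k-anonymity: the size of the smallest equivalence class."""
    return min(vec)

def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """Pareto dominance: every tuple is at least as private under a, and one is strictly more."""
    if len(a) != len(b):
        raise ValueError("vectors must cover the same tuples")
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)

def better_fraction(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of tuples that are strictly more private under a than under b."""
    return sum(x > y for x, y in zip(a, b)) / len(a)

# Two hypothetical anonymizations of the same six tuples (rows aligned by individual).
anon_a = ["g1", "g1", "g2", "g2", "g3", "g3"]
anon_b = ["g1", "g1", "g2", "g2", "g2", "g2"]

vec_a = property_vector(anon_a)  # [2, 2, 2, 2, 2, 2]
vec_b = property_vector(anon_b)  # [2, 2, 4, 4, 4, 4]

print(scalar_k(vec_a), scalar_k(vec_b))  # 2 2: identical under the scalar model
print(dominates(vec_b, vec_a))           # True
print(better_fraction(vec_b, vec_a))     # 4 of 6 tuples are better protected under anon_b
```

Both anonymizations are 2-anonymous, so the scalar model cannot tell them apart. The property vectors show that the second anonymization protects four of the six individuals more strongly and none less strongly.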

Session

EDBT Research Session 7: Privacy & Security (Tuesday, March 24, 16:00–17:30)