^Roberto J. Bayardo; Rakesh Agrawal (2005). “Data Privacy through Optimal k-anonymization”. ICDE '05 Proceedings of the 21st International Conference on Data Engineering: 217–28. doi:10.1109/ICDE.2005.42. ISBN 0-7695-2285-8. ISSN 1084-4627. https://www.cs.auckland.ac.nz/research/groups/ssg/pastbib/pastpapers/bayardo05data.pdf. "Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k - 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a nontrivial dataset under a general model of the problem."
^Adam Meyerson; Ryan Williams (2004). “On the Complexity of Optimal K-Anonymity”. PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (New York, NY: ACM): 223–8. doi:10.1145/1055558.1055591. ISBN 158113858X. http://www.stanford.edu/~rrwill/kanon-pods04.pdf. "The technique of k-anonymization has been proposed in the literature as an alternative way to release public information, while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database, when k is constant. In particular, it is a O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is a O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice."
^Kenig, Batya; Tassa, Tamir (2012). “A practical approximation algorithm for optimal k-anonymity”. Data Mining and Knowledge Discovery 25: 134–168.
^Angiuli, Olivia; Waldo, Jim (June 2016). “Statistical Tradeoffs between Generalization and Suppression in the De-Identification of Large-Scale Data Sets”. IEEE Computer Society Intl Conference on Computers, Software, and Applications.