Metrics used for data utility and privacy simultaneously
What is k-anonymity?
Key principle: A dataset is k-anonymous if every record shares its combination of identifying attributes (quasi-identifiers such as ZIP code, age, and gender) with at least k-1 other records. In simpler terms, each record is indistinguishable from at least k-1 other records based on the identifying information available.
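A minimal Python sketch of the definition, assuming each record is a dict and the caller chooses which columns count as quasi-identifiers. The function names and the toy columns (`zip`, `age`, `disease`) are hypothetical illustrations, not taken from any particular tool. Records with identical quasi-identifier values form one group, and the table is k-anonymous for every k up to the size of its smallest group.

```python
from collections import Counter


def k_anonymity_level(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous."""
    if not records:
        raise ValueError("empty dataset")
    # Count how many records share each combination of quasi-identifier values.
    class_sizes = Counter(
        tuple(record[qi] for qi in quasi_identifiers) for record in records
    )
    # The smallest group determines the guarantee for the whole table.
    return min(class_sizes.values())


def is_k_anonymous(records, quasi_identifiers, k):
    return k_anonymity_level(records, quasi_identifiers) >= k


# Hypothetical toy table: "zip" and "age" are quasi-identifiers,
# "disease" is the sensitive attribute.
records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "hepatitis"},
]
qis = ["zip", "age"]

print(k_anonymity_level(records, qis))   # 2
print(is_k_anonymous(records, qis, 3))   # False
```

The two groups have sizes 2 and 3, so this table is 2-anonymous but not 3-anonymous.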
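Achieving k-anonymity
Generalization: Replacing specific values with more general ones (e.g., replacing a city with its state, or an exact age with an age range).
Suppression: Removing values altogether, either an identifying attribute or entire outlier records that would otherwise form groups smaller than k.
Perturbation: Adding slight random noise to certain values. Noise alone does not guarantee that each record falls into a group of at least k records, so generalization and suppression are the classic techniques.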
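The sketch below combines generalization and suppression under assumed, hypothetical hierarchies: ZIP codes are truncated to three digits and exact ages are bucketed into decades, and any record whose group is still smaller than k is then suppressed (dropped). Perturbation is not shown. Real anonymization tools search over many generalization levels to minimize information loss; this fixed one-step hierarchy is only for illustration.

```python
from collections import Counter


def generalize(record):
    """Hypothetical hierarchy: 3-digit ZIP prefix, age bucketed into decades."""
    decade = (record["age"] // 10) * 10
    return {
        "zip": record["zip"][:3] + "**",
        "age": f"{decade}-{decade + 9}",
        "disease": record["disease"],  # sensitive attribute is left unchanged
    }


def qi_key(record, quasi_identifiers):
    return tuple(record[qi] for qi in quasi_identifiers)


def anonymize(records, quasi_identifiers, k):
    """Generalize every record, then suppress records in groups smaller than k."""
    generalized = [generalize(r) for r in records]
    sizes = Counter(qi_key(r, quasi_identifiers) for r in generalized)
    kept = [r for r in generalized if sizes[qi_key(r, quasi_identifiers)] >= k]
    return kept, len(generalized) - len(kept)


# Hypothetical raw data with exact ZIP codes and ages.
raw = [
    {"zip": "13053", "age": 28, "disease": "flu"},
    {"zip": "13068", "age": 21, "disease": "cancer"},
    {"zip": "14850", "age": 47, "disease": "flu"},
    {"zip": "14853", "age": 42, "disease": "hepatitis"},
    {"zip": "90210", "age": 35, "disease": "flu"},
]
table, n_suppressed = anonymize(raw, ["zip", "age"], k=2)
print(len(table), n_suppressed)  # 4 1
```

After generalization the record from ZIP 90210 is still alone in its group (902**, 30-39), so it is suppressed and four 2-anonymous records remain.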
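Limitations
K-anonymity is not foolproof: individuals can still be re-identified, or their sensitive values inferred, depending on the attacker's background knowledge and on the specific implementation. For example, if every record in a group has the same sensitive value (a homogeneity attack), knowing that someone belongs to that group reveals the value. Achieving a higher level of k-anonymity (larger k) also requires more generalization and suppression, which often leads to significant information loss and reduced utility, limiting its applications.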
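Variants
l-diversity: Requires each group of records sharing the same identifying attributes to contain at least l different values of a sensitive attribute (in its simplest, "distinct" form).
t-closeness: Requires the distribution of the sensitive attribute within each group to be close to its distribution in the whole dataset: the distance between the two distributions (typically the Earth Mover's Distance) must not exceed a threshold t.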
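A sketch of distinct l-diversity and of t-closeness for a categorical sensitive attribute. For t-closeness it assumes every pair of categories is equally far apart, in which case the Earth Mover's Distance reduces to the total variation distance, 0.5 * sum of |p_i - q_i|. Numeric attributes would need an ordered ground distance instead. Function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifiers, sensitive):
    """Map each quasi-identifier combination to its list of sensitive values."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[qi] for qi in quasi_identifiers)].append(r[sensitive])
    return classes


def distinct_l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    classes = equivalence_classes(records, quasi_identifiers, sensitive)
    return min(len(set(values)) for values in classes.values())


def distribution(values):
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def t_closeness(records, quasi_identifiers, sensitive):
    """Largest distance between any group's distribution and the overall one.

    Assumes equal ground distance between categories, so EMD equals the
    total variation distance.
    """
    overall = distribution([r[sensitive] for r in records])
    classes = equivalence_classes(records, quasi_identifiers, sensitive)
    worst = 0.0
    for values in classes.values():
        local = distribution(values)
        # Every value in a group also occurs overall, so iterating over
        # `overall` covers all categories.
        dist = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, dist)
    return worst


# Hypothetical 2-anonymous table whose second group is homogeneous.
records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
qis = ["zip", "age"]

print(distinct_l_diversity(records, qis, "disease"))  # 1
print(t_closeness(records, qis, "disease"))           # 0.25
```

The table is 2-anonymous, yet the group (148**, >=40) contains only "flu", so its distinct l-diversity is 1: anyone known to fall in that group is revealed to have flu. The table is therefore t-close only for t >= 0.25.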
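Related: k-map
The k-map model takes into account a second dataset in addition to the original one, typically a larger population dataset that an attacker could use for linking. Instead of counting matching records within the released data, it requires each combination of identifying attributes in the release to match at least k individuals in that reference population.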
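A minimal k-map sketch, assuming a hypothetical reference `population` table that has already been generalized the same way as the release. For each released record it counts how many individuals in the population share its quasi-identifier values and returns the smallest count; the release satisfies k-map for any k up to that value.

```python
from collections import Counter


def qi_key(record, quasi_identifiers):
    return tuple(record[qi] for qi in quasi_identifiers)


def k_map_level(released, population, quasi_identifiers):
    """Smallest number of population individuals matching any released record."""
    population_counts = Counter(qi_key(p, quasi_identifiers) for p in population)
    # Counter returns 0 for combinations absent from the population data,
    # so records the population does not cover make the check fail.
    return min(population_counts[qi_key(r, quasi_identifiers)] for r in released)


# Hypothetical release: each record is unique within the release itself.
released = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "148**", "age": "40-49", "disease": "hepatitis"},
]
# Hypothetical reference population (no sensitive attribute needed).
population = (
    [{"zip": "130**", "age": "20-29"}] * 3
    + [{"zip": "148**", "age": "40-49"}] * 2
    + [{"zip": "902**", "age": "30-39"}]
)
qis = ["zip", "age"]

print(k_map_level(released, population, qis))  # 2
```

Within the release each record is unique (only 1-anonymous), but each still matches at least two people in the reference population, so the release satisfies 2-map. This is why k-map can require less generalization than k-anonymity, provided the population data is accurate.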