t-closeness

  1. Attacks on l-diversity

  2. Formal definition

  3. See also

  4. References

{{lowercase title}}

t-closeness is a further refinement of l-diversity group-based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade-off that results in some loss of effectiveness of data management or mining algorithms in order to gain some privacy. The t-closeness model extends the l-diversity model by treating the values of an attribute distinctly, taking into account the distribution of data values for that attribute.

Attacks on l-diversity

Accounting for the distribution of attribute values is useful because, in real data sets, attribute values may be skewed or semantically similar. However, accounting for value distributions may make it harder to create feasible l-diverse representations. The l-diversity technique is useful in that it may hinder an attacker who leverages the global distribution of an attribute's data values to infer information about sensitive data values. Not every value exhibits equal sensitivity; for example, a rare positive indicator for a disease may provide more information than a common negative indicator. Because of examples like this, l-diversity may be difficult and unnecessary to achieve when protecting against attribute disclosure. Alternatively, sensitive information leaks may occur because, while the l-diversity requirement ensures “diversity” of sensitive values in each group, it does not recognize that values may be semantically close: for example, an attacker could deduce that a stomach disease applies to an individual if a sample containing the individual listed only three different stomach diseases, as sketched below.
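
To make this similarity attack concrete, here is a minimal sketch in Python; the records, quasi-identifier values, and disease names are hypothetical illustrations, not data from any real table.

```python
# Minimal sketch of the similarity attack; the records and the
# disease names are hypothetical illustrations.
equivalence_class = [
    {"zip": "476**", "age": "2*", "disease": "gastric ulcer"},
    {"zip": "476**", "age": "2*", "disease": "gastritis"},
    {"zip": "476**", "age": "2*", "disease": "stomach cancer"},
]

# The class is 3-diverse: it contains three distinct sensitive values.
distinct = {record["disease"] for record in equivalence_class}
print(len(distinct))  # 3

# Yet every value is a stomach disease, so linking a target to this
# class reveals "some stomach disease" with certainty.
stomach_diseases = {"gastric ulcer", "gastritis", "stomach cancer"}
print(distinct <= stomach_diseases)  # True -> semantic leak
```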

Formal definition

Given the existence of such attacks, where sensitive attributes may be inferred from the distribution of values in l-diverse data, the t-closeness method was created to extend l-diversity by additionally maintaining the distribution of sensitive fields. The original paper[1] by Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian defines t-closeness as:

{{quote|The t-closeness Principle: An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold t. A table is said to have t-closeness if all equivalence classes have t-closeness.}}
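
The definition leaves the choice of distance open; the original paper measures it with the Earth Mover's Distance (EMD). As a minimal sketch, the following Python checks the principle for a categorical attribute using the total variation distance, which coincides with EMD when all distinct categories are at unit ground distance from one another; the table values, class partition, and threshold here are hypothetical.

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a list of sensitive values."""
    return {v: c / len(values) for v, c in Counter(values).items()}

def variational_distance(p, q):
    """Total variation distance between two categorical distributions;
    with unit ground distance between distinct categories this equals
    the Earth Mover's Distance used in the t-closeness paper."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def has_t_closeness(sensitive_values, equivalence_classes, t):
    """Check the t-closeness principle: every equivalence class's
    distribution must be within distance t of the whole-table
    distribution. `equivalence_classes` partitions `sensitive_values`."""
    global_dist = distribution(sensitive_values)
    return all(
        variational_distance(distribution(cls), global_dist) <= t
        for cls in equivalence_classes
    )

# Hypothetical table: 90% flu, 10% cancer overall, one skewed class.
table = ["flu"] * 9 + ["cancer"]
classes = [["flu", "cancer"], ["flu"] * 8]
print(has_t_closeness(table, classes, t=0.2))  # False: first class is 0.4 away
```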

Charu Aggarwal and Philip S. Yu further state in their book on privacy-preserving data mining[2] that, with this definition, the threshold t gives an upper bound on the difference between the distribution of the sensitive attribute values within an anonymized group and the global distribution of those values. They also state that for numeric attributes, t-closeness anonymization is more effective than many other privacy-preserving data mining methods.
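
For a numeric attribute, the distance in the definition above is naturally taken as the one-dimensional Earth Mover's Distance over the ordered domain. Below is a minimal sketch using SciPy, assuming hypothetical, equally spaced salary values (chosen to mirror the equally spaced example in the original paper); dividing by the attribute's range normalizes the distance into [0, 1], which matches the paper's ordered ground distance exactly when the values are equally spaced.

```python
from scipy.stats import wasserstein_distance

# Hypothetical salaries, equally spaced over the domain.
table_salaries = [3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000]
class_salaries = [3000, 4000, 5000]  # this class holds only the low earners

# One-dimensional Earth Mover's Distance between the empirical distributions.
emd = wasserstein_distance(class_salaries, table_salaries)

# Normalize by the attribute's range so the distance lies in [0, 1].
print(emd / (max(table_salaries) - min(table_salaries)))  # 0.375
```

Even though the class is 3-diverse, its normalized distance of 0.375 from the global salary distribution reveals that its members are low earners, which is exactly the kind of leak a small threshold t rules out.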

See also

  • k-anonymity
  • l-diversity
  • Differential privacy

References

1. ^{{cite journal | url = https://www.cs.purdue.edu/homes/ninghui/papers/t_closeness_icde07.pdf | title = t-Closeness: Privacy beyond k-anonymity and l-diversity | author1 = Ninghui Li | author2 = Tiancheng Li | author3 = Suresh Venkatasubramanian | journal = ICDE | year = 2007 | publisher = Purdue University | doi = 10.1109/ICDE.2007.367856}}
2. ^{{cite book | title = Privacy-Preserving Data Mining – Models and Algorithms | url = http://charuaggarwal.net/generalsurvey.pdf | chapter = A General Survey of Privacy | editor1 = Charu C. Aggarwal | editor2 = Philip S. Yu | publisher = Springer | isbn = 978-0-387-70991-8 | year = 2008}}
