
Anonymity is an illusion: real people can be identified from “anonymized” datasets



Theguardian.com has published the findings of a study by two leading universities, UCLouvain in Belgium and Imperial College London: the researchers confirm that there are many ways to link supposedly anonymous data back to real people.
For example, a dataset with 15 demographic attributes “would render 99.98% of Massachusetts residents unique.” For small populations the task is even easier: if the data covers a small town, “it would not take much to identify the residents of Harwich Port, Massachusetts, which has fewer than 2,000 inhabitants.”

“Anonymized” data underlies many processes, from modern medical research to personalized recommendations and AI systems. Unfortunately, the study finds that in any sufficiently rich dataset it is almost impossible to anonymize the data successfully.

In theory, all personally identifiable information is removed from an anonymized dataset, leaving only the useful core that researchers can work with without fear of violating anyone's privacy. For example, a hospital might strip patients' names, addresses, and dates of birth from a set of medical records, in the hope that researchers can use what remains to uncover hidden relationships between conditions.
In practice, however, such data can be deanonymized in a variety of ways. In 2008, an anonymized Netflix dataset of movie ratings was deanonymized by matching the ratings against public data on the IMDb website. The home addresses of New York taxi drivers were uncovered from an anonymized dataset of individual trips around the city. And de-identified medical billing data released by the Australian Department of Health was re-identified by cross-referencing it with “prosaic facts,” such as the birth years of a mother and child, or of a mother and several children.
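To make the linkage idea concrete, here is a minimal sketch assuming a hypothetical hospital table and a hypothetical public register that happen to share a few demographic columns (all names, columns, and values below are invented for illustration, not taken from the cases above):

```python
import pandas as pd

# "Anonymized" dataset: direct identifiers removed, quasi-identifiers kept.
anonymized = pd.DataFrame({
    "zip": ["02646", "02646", "10001"],
    "birth_year": [1984, 1991, 1984],
    "sex": ["F", "M", "M"],
    "diagnosis": ["asthma", "diabetes", "hypertension"],
})

# Public dataset with names (e.g. a voter roll or a scraped profile dump).
public = pd.DataFrame({
    "name": ["Alice Smith", "Bob Jones"],
    "zip": ["02646", "10001"],
    "birth_year": [1984, 1984],
    "sex": ["F", "M"],
})

# A linkage attack is just a join on the attributes the two tables share.
linked = anonymized.merge(public, on=["zip", "birth_year", "sex"], how="inner")
print(linked[["name", "diagnosis"]])
# Each matched row attaches a real name to a supposedly anonymous medical record.
```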

Researchers from the Catholic University of Louvain (UCLouvain) in Belgium and Imperial College London built a model that estimates how easily any given dataset can be deanonymized; it is this model that yields the Massachusetts and Harwich Port figures quoted above.
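The core intuition can be illustrated with a much cruder calculation than the authors' actual model (they fit a generative model that extrapolates from a sample to the whole population): simply count how many records in a sample are unique on a given combination of attributes. The data and column names below are made up:

```python
import pandas as pd

def uniqueness(df: pd.DataFrame, attributes: list[str]) -> float:
    """Fraction of records that are unique on the given combination of attributes."""
    counts = df.groupby(attributes).size()
    return (counts == 1).sum() / len(df)

# Hypothetical demographic sample.
sample = pd.DataFrame({
    "zip": ["02646", "02646", "10001", "10001", "60601"],
    "birth_year": [1984, 1984, 1975, 1984, 1990],
    "sex": ["F", "M", "F", "F", "M"],
})

# The more attributes an attacker knows, the closer this fraction gets to 1.
print(uniqueness(sample, ["zip"]))                       # coarse attribute: low uniqueness
print(uniqueness(sample, ["zip", "birth_year", "sex"]))  # richer combination: every record unique
```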

Despite this, data brokers such as Experian sell “de-identified” datasets that contain far more information about each person. The researchers point to a dataset sold to the software company Alteryx that includes 248 attributes for each of 120 million US households.

The researchers argue that their results show that current anonymization efforts fall short of legal requirements such as the GDPR (General Data Protection Regulation):
“Our results refute the claim that re-identification is not a practical risk ...”

“They further question whether current de-identification practices satisfy the anonymization standards of modern data protection laws such as the GDPR and the CCPA (California Consumer Privacy Act), and emphasize the need, from a legal and regulatory standpoint, to move beyond the de-identification ‘release-and-forget’ model.”

Other approaches to processing large datasets may come closer to meeting modern data protection criteria. Differential privacy, practiced by companies such as Apple and Uber, deliberately adds noise to each individual record while preserving statistics averaged over the whole dataset, so that the information reported about any single person is technically inaccurate and cannot be used to deanonymize them.
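A minimal sketch of the idea, using the classic Laplace mechanism on a count query (this illustrates differential privacy in general, not Apple's or Uber's actual systems; the epsilon value and the data are arbitrary):

```python
import numpy as np

def dp_count(values: np.ndarray, predicate, epsilon: float = 0.5) -> float:
    """Differentially private count: the true count plus Laplace noise.

    A count query has sensitivity 1 (adding or removing one person changes the
    count by at most 1), so noise drawn from Laplace(0, 1/epsilon) gives
    epsilon-differential privacy for this single query.
    """
    true_count = int(np.sum(predicate(values)))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical ages of survey participants.
ages = np.array([23, 35, 41, 29, 52, 44, 31, 38])

# The analyst sees a noisy answer: no individual's presence can be confirmed,
# but aggregate answers remain roughly correct.
print(dp_count(ages, lambda a: a > 30))
```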

Homomorphic encryption keeps data unreadable while still allowing it to be operated on: computations run directly on the encrypted values, the results come out encrypted as well, and only the data controller can decrypt them. Finally, there are synthetic datasets: an AI model is trained on real, identifiable data and then generates new, artificial records that are statistically similar to the originals but not tied to any specific person.
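As a rough illustration of the homomorphic part, here is a sketch using the Paillier cryptosystem via the third-party python-paillier library (`pip install phe`). Paillier is only additively homomorphic, so this shows sums and scalar multiples rather than arbitrary computation; the salary figures are invented:

```python
from phe import paillier  # python-paillier: Paillier partially homomorphic encryption

# The data controller generates the key pair and shares only the public key.
public_key, private_key = paillier.generate_paillier_keypair()

# A third party receives only encrypted values and can never read them...
enc_salaries = [public_key.encrypt(s) for s in (48_000, 52_500, 61_200)]

# ...yet can still compute on them: sums and scalar multiples stay encrypted.
enc_total = enc_salaries[0] + enc_salaries[1] + enc_salaries[2]
enc_mean = enc_total * (1 / 3)

# Only the key holder (the data controller) can decrypt the result.
print(private_key.decrypt(enc_mean))  # ~53900.0
```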

Source: https://habr.com/ru/post/461381/

