
Anonymization and de-anonymization on the Internet

This topic is addressed primarily not to IT professionals but to their wives, girlfriends, mothers-in-law, parents, and grandparents, who, having tasted the delights of the Internet, most likely do not think about how deeply the World Wide Web is woven into society and that, along with everything useful, the Internet carries real risks to their own security.

The text outlines aspects of Internet use that relate both to registering in various places of online communication (blogs, forums, social networks, product reviews in stores) and to the speech styles used in such places. It shows what risks can arise when offline communication styles (colloquial, official-business, journalistic) are carried over into online communication.

image


Introduction


I will deliberately not touch on the technical side of providing anonymity on the network, since this is an extensive topic that spans cryptography and the implementation of networks, software, and hardware systems. I also do not consider the difference between what guests and registered users can see in a profile, since anyone interested in collecting information will find a way to register even where registration is restricted. To de-anonymize a user, open information and open ways of processing it are almost always enough, above all well-formed search queries (on queries see, for example, the recent topic about Habr users and adding to favorites). Mathematically, de-anonymization can be described as finding new areas of intersection between the sets that reflect a user's traces on the network.
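The set-intersection view mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `candidates`, the trace names, and the people listed are invented and do not come from any real data.

```python
# Hypothetical illustration: every name and value below is invented.
def candidates(*traces):
    """Return the people who are consistent with every observed trace."""
    result = set(traces[0])
    for trace in traces[1:]:
        result &= set(trace)  # each new trace can only shrink the set
    return result


# Each set stands for "everyone who matches one piece of open information".
lives_in_small_town = {"anna", "boris", "pavel", "vera"}
born_on_12_march = {"pavel", "vera", "zoya"}
posts_about_canaries = {"pavel", "oleg"}

two_traces = candidates(lives_in_small_town, born_on_12_march)
three_traces = candidates(lives_in_small_town, born_on_12_march, posts_about_canaries)
print(len(two_traces), len(three_traces))
```

Each additional trace is harmless on its own; it is the intersection that shrinks toward a single person.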

Terminology


To avoid confusion, I will give informal definitions of the terms used:


About user registration


The first thing you should know when you want to associate yourself with an imaginary character, for example on a forum, is that once you have read and accepted the rules (the disclaimer), you start playing by the rules of the resource's administration, which you must trust at least minimally. In other words, you are in one way or another handing your personal data over to the administration (I repeat, I am not touching on the technical means that make it possible to hash even all of a user's data).

The minimum possible registration is a pair of login (account) and password, in the case where the login doubles as the nickname. The profile then consists of the nickname alone (and the user's sequence number). The date of registration may be added on top. You will agree that such minimalism in the set of personal data is rare.

More often the registration form also has fields for the date of birth, the user's location, and the user's occupation.

Date of Birth

The date of birth is sometimes required by law, for example to restrict access to "adult" materials by age. A "hide my age" option is not always present; there have been amusing cases where, even with "hide my age" enabled, the user's nickname appeared in the box "Today is the birthday of: ...". For identifying a user, it makes little difference whether only the birthday (day and month) or the full date of birth is given. A registration field that asks only for the year of birth (without day and month) is rare.

Location

The fields related to the user's location are most often "country" and "time zone". A "city" field is also common. It should be understood that if you live in a city with a population below some N (say, below the thirtieth position in the list), then truthful information about your location increases the number of ways in which you can be identified (the limiting case is the "everyone knows everyone" method).

Registration forms on forums intended for professional communication, support, and exchange of experience contain (mandatory!) fields that reveal such specific details (professional interests, software and equipment used, and even the department where the user works) that anonymity is de facto removed.

E-mail and IM

In most cases the registration form asks for an e-mail address, which is used to confirm registration and to recover forgotten passwords. Most often the address is hidden in the profile by default from everyone except the user, although the reason is not concern for the user's privacy but protection from spam. The address itself sometimes contains the user's first name, surname, or year of birth, or even all of these at once: "ivan.ivanov85@example.com". There are also "disposable mail" services (like 10 Minute Mail), which users who do not want to expose their main mailbox often use to confirm registration. I will not consider the degree of anonymity within e-mail correspondence, since closed channels are an entirely different matter from profiles open to everyone (including search robots). I will only mention that correspondence can be made public accidentally or intentionally, for example when a careless forward sends a third party not one specific fragment of a message but the entire chain of letters.

Many forms also offer to leave instant messaging contacts. All such tools have their own profiles that store the user's personal data, sometimes in a strict form, down to the user's first name, last name, and location.
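The remark about addresses that carry a name and a year of birth can be made concrete with a rough sketch. The function name `hints_from_email` and its heuristics (letter runs read as name-like parts, two- or four-digit runs read as year-like parts) are assumptions chosen for illustration, not a reliable parser.

```python
import re


def hints_from_email(address):
    """Split the part before '@' into name-like and year-like fragments."""
    local = address.split("@", 1)[0]
    # Runs of letters are treated as possible first names or surnames.
    words = re.findall(r"[a-z]+", local.lower())
    # Runs of 2 or 4 digits are treated as a possible year of birth.
    years = [d for d in re.findall(r"\d+", local) if len(d) in (2, 4)]
    return {"name_like": words, "year_like": years}


print(hints_from_email("ivan.ivanov85@example.com"))
```

A guess like this proves nothing by itself, but it supplies one more piece of information to combine with the rest of a profile.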

Multi-nick

Here, multi-nick refers to a user reusing a single nickname across various social Internet resources. Reusing a nickname significantly reduces the user's anonymity: from the point of view of de-anonymization, the profiles from different places can simply be combined. If the user has a set (in the mathematical sense) of friends on at least one of the sites where he is active under a certain name, part of that set can be reconstructed on another site "around" the same nickname.
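How profiles "can simply be combined" is sketched below under simplifying assumptions: each profile is a plain dictionary, nicknames are matched case-insensitively, and the first value seen for a field wins. The function `merge_by_nickname` and all sample data are hypothetical.

```python
from collections import defaultdict


def merge_by_nickname(profiles):
    """Combine profile fields from different sites that share a nickname."""
    merged = defaultdict(dict)
    for profile in profiles:
        key = profile["nickname"].lower()  # assumption: case is ignored
        for field, value in profile.items():
            if field != "nickname" and value is not None:
                merged[key].setdefault(field, value)  # keep the first value seen
    return dict(merged)


# Invented profiles from three imaginary sites.
profiles = [
    {"nickname": "pavel123", "city": "Smalltown"},
    {"nickname": "Pavel123", "birthday": "12 March"},
    {"nickname": "other", "city": "Bigcity"},
]
print(merge_by_nickname(profiles))
```

Two profiles that each reveal little become one profile that holds both the city and the birthday.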

Social networks

The topic "Anonymity versus openness" considers the opposition of these two approaches in poster format. In my topic I describe the situation of a user who sits between those two poles, with neither the impunity of an anonymous poster nor the single open identity of a social network user. Preserving any remnants of anonymity while your first name and surname are public is almost impossible, and social networks strive to capture as much of a person's life as possible and present it to the network.

Speech styles and text content


In general, people use the same speech styles on the network that they use in live communication with friends, colleagues, and bystanders. Some linguists believe that a sufficient volume of a person's texts is all that is needed to identify him. Everyone performs the simplest kind of linguistic examination when reading a text. A person's educational level is immediately visible; it is poorly hidden behind the mask of any speech style and in any kind of communication. Educational level affects the active vocabulary. What matters most is not its size but the presence of professional terms in it and the relative frequency of their use. That is, you will not be given away by writing in language that is rich in synonyms, but if you freely use some terminology in your texts (other than the terminology of the site's own subject), this can point either to your profession or to an advanced level in a certain hobby. A person's occupation always affects his speech. Thus a lawyer's officialese does not disappear even in everyday conversation ("perform a cleaning of the premises" instead of "clean the room").
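The "relative frequency of professional terms" mentioned above can be measured with very little code. The sketch below is a deliberately naive assumption: it counts lowercase English word forms only, and both the function name `term_rates` and the term list are invented for illustration. Real authorship analysis is far more involved.

```python
import re
from collections import Counter


def term_rates(text, terms):
    """Return each term's share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return {term: counts[term] / total for term in terms}


comment = "The plaintiff asked the court to dismiss the claim"
legal_terms = ["plaintiff", "court", "kernel"]  # hypothetical term list
print(term_rates(comment, legal_terms))
```

On a cooking forum, a noticeable rate for legal terms would stand out against the site's usual vocabulary.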

Professional theme

Competent messages and comments in a highly specialized field sharply narrow down who their author can be. Often, however, only an expert in that field can judge how competent such comments are. Professionals always have publications, papers, and articles. Scientific ethics, for example, requires citing and referencing research. It is difficult to abandon that ethic overnight, so an anonymous author of a deep and competent article will most likely link either to his own work, to the works of authors he has cited in his writings, or to works that cite him.

Errors

Everyone makes mistakes. But do you perhaps systematically misspell one particular word? Check your texts with a spell checker.

Crosspost

This topic is very familiar to users of Habrahabr. Identical anonymous texts (or fragments of them) on various sites, followed by comments "from the author", link together the nicknames of the users who published them. By publishing the same text in different places, you are openly shouting "I'm here and there!". Such crossposts differ from low-quality copy-paste (without a reference to the source) in their form and in the author's presence in the discussions and comments.
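Spotting identical texts or fragments on different sites does not require anything sophisticated. The sketch below compares two texts by their shared three-word sequences (word shingles) and reports the Jaccard overlap; this technique is one common choice assumed here for illustration, and the names `shingles` and `overlap` are hypothetical.

```python
def shingles(text, size=3):
    """Return the set of consecutive word sequences of the given length."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(text_a, text_b, size=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0  # texts shorter than one shingle cannot be compared
    return len(a & b) / len(a | b)


post_on_forum = "selling a sewing machine in good condition"
post_on_blog = "selling a sewing machine in perfect condition"
print(overlap(post_on_forum, post_on_blog))
```

A high overlap between posts signed by two different nicknames is a reason to look at whether both accounts also answer comments "as the author".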

Friends

As mentioned above, the presence of "friends" communities actively contributes to de-anonymization, even when the friends are visible only under nicknames. If any of the friends personally knows a user who wants to preserve anonymity, he can accidentally de-anonymize him (the banal "look at Misha's blog" ties the blog's author, one of the speaker's friends, to the name Misha).

Anonymity of public people

Everything is quite simple here: a public person can either be completely de-anonymized, when his articles, messages, and comments are a continuation of his communication policy in real life, or completely anonymized, in which case all of his social activity on the Internet should reveal the minimum possible number of connections with his identity, up to and including deleting his registration on the site. All intermediate states will drift toward the de-anonymization of that public person.

Finally, even the time at which messages are sent can partly de-anonymize. If a person most often sends messages between 3:00 and 7:00 am, he may be a night owl, or he may live in a distant time zone (although the time zone settings on the site may simply be incorrect).
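The observation about sending times can be checked with a few lines. The sketch below is a simplified assumption: timestamps are ISO 8601 strings in the site's own clock, the window is 3:00 to 7:00 as in the text, and the function name `night_share` and the sample data are invented.

```python
from datetime import datetime


def night_share(timestamps, start=3, end=7):
    """Return the fraction of messages sent between start and end o'clock."""
    hours = [datetime.fromisoformat(ts).hour for ts in timestamps]
    if not hours:
        return 0.0
    night = sum(1 for hour in hours if start <= hour < end)
    return night / len(hours)


# Invented message times, as shown by the site.
sent_at = [
    "2012-02-01T03:15:00",
    "2012-02-02T04:40:00",
    "2012-02-03T06:05:00",
    "2012-02-04T14:30:00",
]
print(night_share(sent_at))
```

A high share says only that the author is awake when most local users sleep; it cannot tell a night owl from someone in a distant time zone.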

Examples of risks associated with de-anonymization


Example one

Probably the most common kind of de-anonymization. A person posts an ad on a blog: "Selling a sewing machine. Call ... or send a PM." Providing a phone number significantly reduces the anonymity of the ad's author.

Example two

A person with the nickname pavel123, who lives in a city of a hundred thousand people, posts the topic "who can take a canary while I'm on vacation" on the local forum, asking kind people to look after his beloved bird for the stated dates. If the forum profile contains a date of birth and the person really is called Pavel, the user pavel123 is very likely to be de-anonymized. Attackers then know that his home will be empty during the named period. Where databases are easily available, the name and date of birth allow the user pavel123 to be matched to a specific Pavel with a full name, and the full name in turn to the address where he is registered. If Pavel's actual address coincides with the address in those easily accessible databases, what de-anonymization puts at risk in this case is the security of his house or apartment.

It is difficult to give examples of good anonymization; that is what makes it good. Satoshi Nakamoto comes to mind first: the mere list of attempts to de-anonymize him reads like good fiction (see the topic "Search for the creator of Bitcoin").

Conclusions


At present, users are free to manage their own anonymity in places of communication on the Internet (at least in Russia, the countries of the European Union, the USA, and so on). It is important to know that if you do not want to disclose your identity, you need to follow certain rules of communication and look carefully at exactly what information will appear in your profile (you can first look at the profiles of several arbitrary users of the site in question). It is also highly undesirable to use the same nickname when registering on different sites. If you ever plan to post commercial information (even in a comment on an obscure, unpopular blog), provide as little identifying information as possible when registering. It is advisable to run a simple linguistic analysis of your texts and to use a spell checker.

From a global point of view, it will be interesting to see how the situation develops for the "intermediate anonymity" of forums and blogs: will social networks take over entirely, will anonymity be limited by law, or will everything remain as it is now?

Returning to the picture: on the Internet you can determine what color your tail is and what size of collar you wear.

Literature

habrahabr.ru/blogs/design/134595
E.I. Galyashin Linguistic Security of Speech Communication
ru.wikipedia.org/wiki/Anterological_examination

P.S. OpenID, "likes", and other technologies that simplify a person's social life on the Internet are not covered in this topic: it is impossible to fit everything into a single article.

Source: https://habr.com/ru/post/137416/

