This post is addressed primarily not to IT professionals but to their wives, girlfriends, mothers-in-law, parents, and grandparents, who, having tasted the delights of the Internet, most likely do not think about how deeply the World Wide Web is built into society and that, alongside everything useful, the Internet carries real risks to their own security.
The text covers aspects of Internet use that relate both to registering a user in various online places of communication (blogs, forums, social networks, product reviews in stores) and to the styles of speech used there. It shows what risks arise when offline communication styles (colloquial, official-business, journalistic) are carried over into online communication.

Introduction
I deliberately do not touch on the technical side of anonymization on the network, since it is a vast topic spanning cryptography and the implementation of networks, software, and hardware. Nor do I consider the difference in profile access between guests and registered users, since anyone interested in collecting information will find a way to register even where registration is restricted. To de-anonymize a user, openly available information and openly available ways of processing it are almost always enough, above all well-chosen search queries (on queries see, for example, the recent post about Habr users and adding to favorites). Mathematically, de-anonymization can be described as finding new areas of intersection between the sets that represent a user's traces on the network.
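The set-intersection view can be illustrated with a minimal sketch. The clues, the names, and the `narrow_down` helper are all invented for illustration; the point is only that each additional clue can shrink the set of people a profile could belong to.

```python
# Hypothetical data: each clue seen in a profile maps to the set of
# real people it is compatible with. All names are invented.
candidates_by_clue = {
    "lives in a small town": {"anna", "boris", "pavel", "olga"},
    "born in 1985": {"boris", "pavel", "ivan"},
    "keeps a canary": {"pavel", "olga"},
}

def narrow_down(clues):
    """Intersect the candidate sets of all clues, one clue at a time."""
    remaining = None
    for people in clues.values():
        remaining = set(people) if remaining is None else remaining & people
    return remaining if remaining is not None else set()

anonymity_set = narrow_down(candidates_by_clue)
print(sorted(anonymity_set))
```

None of the three clues identifies anyone on its own, but their intersection here contains a single candidate.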
Terminology
To avoid confusion, here are loose definitions of the terms used:
- User - a person (or group of persons) represented in a given place of communication on the World Wide Web by a nickname, a profile and, sometimes, an avatar. Everywhere below, the user is assumed to be registered.
- Anonymization - the process of removing a user's personal data in order to conceal the link between the user and his real identity
- De-anonymization - the process of partially or completely disclosing the user's identity
- Anonymity - a state that describes how weakly the profile is connected to the user's real identity
- User registration - the process of binding personal data to a profile
- Profile - a set of the user's personal data
- Nickname (nick) - the user's name on the site
- Avatar (userpic) - a graphical representation of the user
About user registration
The first thing to know when you want to associate yourself with an imaginary character, for example on a forum, is that once you have read and accepted the rules (the disclaimer), you start playing by the rules of the site's administration, which you have to trust at least minimally. In other words, you are to some degree handing your personal data over to the administration (again, I am not touching on technical measures, which can go as far as hashing all user data).
The minimum possible registration is a login (account) and password pair where the login doubles as the nickname. The profile then consists of the nickname alone (plus the user's sequence number). The registration date may be thrown in as well. You will agree that such minimalism in personal data is rare.
More often, registration forms include fields for the date of birth, the user's location, and his occupation.
Date of Birth
A date of birth is sometimes required by law, for example to restrict access to “adult” material by age. A “hide my age” option is not always offered, and there have been amusing cases where, even with “hide my age” enabled, the user's nickname appeared in the “Today's birthdays: ...” box. For identifying a user, it makes little difference whether only the birthday or the full date of birth is given. A registration field for the year of birth alone (without day and month) is rare.
Location
The fields related to the user's location are most often “country” and “time zone”, and often “city” as well. Bear in mind that if you live in a city with a population below N (say, below thirtieth place in the list of cities by population) and you state your location truthfully, the number of ways in which you can be identified grows (the limiting case is “everyone knows everyone”).
Registration forms on forums meant for professional communication, support, and exchange of experience contain (mandatory!) fields that reveal such specifics (professional interests, the software and equipment used, even the department where the user works) that anonymity is de facto removed.
E-mail and IM
In most cases, registration asks for an e-mail address, which is used to confirm the registration and to recover forgotten passwords. Most often the address is hidden in the profile by default from everyone except the user, although the reason is protection from spam rather than concern for the user's privacy. The address itself sometimes contains the user's first name, surname, or year of birth, or even all of them at once: “ivan.ivanov85@example.com”. There are also “disposable mail” services (such as 10 Minute Mail), which are often used to confirm registration by users who do not want to use their main mailbox for this purpose. I will not consider the degree of anonymization within e-mail itself, since closed channels behave quite differently from profiles that are open to everyone (including search robots). I will only mention that correspondence can be made public accidentally or intentionally, for example when a careless forward sends a third party the whole chain of messages rather than one specific fragment. Many forms also offer to record instant messaging addresses. All such tools have their own profiles that store personal data, sometimes in strict form, down to the user's first name, last name, and location.
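As a rough illustration of how much an address like the one above gives away, here is a minimal sketch that looks for a name-like pattern and a year-like suffix in the local part. The `email_leaks` function and its two heuristics are assumptions made for this example, not an established method, and they will produce both false positives and misses.

```python
import re

def email_leaks(address):
    """Return hints that the local part of an e-mail address may give away."""
    local = address.split("@", 1)[0].lower()
    hints = []
    # Heuristic: a 4-digit year or a 2-digit number at the very end
    # is often a year of birth.
    year = re.search(r"(19\d{2}|20\d{2}|\d{2})$", local)
    if year:
        hints.append("possible birth year: " + year.group(1))
    # Heuristic: two alphabetic parts joined by ".", "_" or "-"
    # often follow a "first.last" pattern.
    name = re.match(r"([a-z]+)[._-]([a-z]+)", local)
    if name:
        hints.append("possible name: " + name.group(1) + " " + name.group(2))
    return hints

print(email_leaks("ivan.ivanov85@example.com"))
print(email_leaks("x7k2q@example.com"))
```

An address with a random-looking local part yields no hints under these heuristics, which is one reason to register with a separate mailbox.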
Multi-nick
Here “multi-nick” means a user's reuse of one nickname across various social resources on the Internet. Reusing a nickname significantly reduces the user's anonymity: from the point of view of de-anonymization, the profiles from different places can simply be combined. If the user has a set of friends (in the mathematical sense of “set”) on at least one of the sites where he is active under a given name, part of that set can be reconstructed on another site “around” the same nickname.
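Combining profiles that share a nickname takes very little effort. The sketch below is only an illustration: the sites, the field names, and the `merge_by_nick` helper are invented, and matching is done by a simple case-insensitive comparison of nicknames.

```python
from collections import defaultdict

# Hypothetical public profiles; site names and field values are invented.
profiles = [
    {"site": "forum-a", "nick": "pavel123", "city": "Smalltown"},
    {"site": "blog-b", "nick": "Pavel123", "birthday": "12 May"},
    {"site": "shop-c", "nick": "pavel123", "occupation": "welder"},
    {"site": "forum-a", "nick": "nightowl", "city": "Bigcity"},
]

def merge_by_nick(records):
    """Combine the fields of all profiles that share a nickname."""
    merged = defaultdict(dict)
    for record in records:
        key = record["nick"].lower()  # treat "Pavel123" and "pavel123" as one
        for field, value in record.items():
            if field not in ("site", "nick"):
                merged[key][field] = value
        merged[key].setdefault("sites", []).append(record["site"])
    return dict(merged)

combined = merge_by_nick(profiles)
print(combined["pavel123"])
```

Each site on its own reveals one detail; the merged record holds the city, the birthday, and the occupation together.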
Social networks
The post “Anonymity versus openness” considers the “opposition” of these two approaches in poster format. In this post I describe the situation of a user who sits between the two poles, with neither the impunity of an anonymous poster nor the unambiguous identity of a social network user. Preserving any remnant of anonymity under an open first and last name is almost impossible. And social networks strive to capture as much of a person's life as possible and present it to the network.
Speech styles and text content
In general, people use the same speech styles online that they use in live communication with friends, colleagues, and bystanders. Some linguists believe that a certain volume of a person's texts alone is enough to identify him. Every reader performs the simplest linguistic examination while reading a text. A person's educational level is immediately visible. It is poorly hidden behind any style of speech and in any kind of communication. Educational level affects active vocabulary. What matters more is not the size of that vocabulary but the presence of professional terms in it and the relative frequency of their use. That is, writing in language rich in synonyms will not give you away, but if you freely use some terminology in your texts (different from the terminology of the site's own subject), it can point either to your profession or to an advanced level in a certain hobby. A person's occupation always affects his speech. Thus a lawyer's officialese does not disappear even in everyday conversation (“carry out the cleaning of the premises” instead of “clean the room”).
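The relative frequency of professional terms is easy to measure. The following is a minimal sketch, assuming a hand-picked list of legal terms and a made-up sentence; the `term_share` function is hypothetical, and a real examination would use far larger samples and term lists.

```python
import re
from collections import Counter

def term_share(text, terms):
    """Return, for each listed term found in the text, its share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {t: counts[t] / len(words) for t in terms if counts[t] > 0}

# Hypothetical comment and a hypothetical list of legal terms.
comment = "The plaintiff said the tort was minor, and the plaintiff left."
legal_terms = ["plaintiff", "tort", "estoppel"]

print(term_share(comment, legal_terms))
```

A high share of such terms on a site whose subject is unrelated to law is the kind of signal the paragraph above describes.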
Professional theme
Competent posts and comments in a highly specialized field sharply narrow down who the person is. Often, though, only an expert in that field can judge how competent such comments are. Professionals always have publications, papers, and articles. Scientific ethics, for example, requires citing and referencing research. It is hard to abandon that ethic overnight, so an anonymous author of a deep and competent article will most likely link either to his own work, or to the works of authors he has cited in his writings, or to works that cite him.
Errors
Everyone makes mistakes. But do you perhaps systematically misspell some particular word? Check your texts with a spell checker.
Crosspost
This subject is very familiar to users of Habrahabr. Identical anonymous texts (or fragments of them) on different sites, followed by comments “from the author”, link together the nicknames of the users who published them. By publishing the same text in different places, you openly shout “I am here and there!”. Such cross-posts differ from low-quality copy-paste (without a reference to the source) in their form and in the author's presence in the discussion and comments.
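Shared fragments between two texts can be found mechanically. The sketch below compares word triples ("shingles") and reports their Jaccard overlap; this is one common technique chosen for illustration, not a method named in the post, and the sample texts are invented.

```python
def shingles(text, size=3):
    """Return the set of consecutive word tuples of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical posts published under two different nicknames.
post_a = "selling a sewing machine in good condition"
post_b = "selling a sewing machine in perfect condition"

print(round(overlap(post_a, post_b), 2))
```

A value close to 1.0 suggests the same text was published twice; values near 0.0 suggest unrelated texts.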
Friends
As mentioned above, communities of “friends”, even when the friends are visible only under nicknames, actively contribute to de-anonymization. If one of the friends knows personally a user who wants to stay anonymous, he can de-anonymize him by accident (the banal “take a look at Misha's blog” reveals that one of the author's friends is named Misha).
Anonymity of public people
Everything is quite simple here: a public person can either be completely de-anonymized, when his articles, messages, and comments continue the communication policy he follows in life, or completely anonymized, in which case all his social activity on the Internet should reveal as few connections to his identity as possible, up to and including deleting his account on a site. Any intermediate state will drift toward de-anonymization of that public person.
Finally, even the time at which messages are sent can partly de-anonymize. If a person most often posts between 3:00 and 7:00 am, he is either a night owl or lives in a distant time zone (although the time zone settings on the site in question may simply be wrong).
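The timing signal can be sketched as follows: count posts per hour and find the longest quiet stretch, which is a plausible guess at when the author sleeps. The `quietest_window` helper, the 8-hour window, and the sample hours are assumptions for illustration, and the hours are taken as the site displays them, so the caveat about the site's time zone still applies.

```python
from collections import Counter

def quietest_window(hours, length=8):
    """Return (start, end) of the window of `length` hours with the fewest posts.

    Hours wrap around midnight; ties are resolved in favor of the earliest start.
    """
    counts = Counter(h % 24 for h in hours)
    best_start, best_total = 0, None
    for start in range(24):
        total = sum(counts[(start + k) % 24] for k in range(length))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, (best_start + length) % 24

# Hypothetical posting hours (site time) collected from one profile.
post_hours = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 12]

print(quietest_window(post_hours))
```

If the quiet stretch falls in the site's daytime, the author is either a night owl or, more likely with enough data, several time zones away from the site's clock.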
Examples of risks associated with de-anonymization
Example one
Probably the most common kind of de-anonymization. A person posts an ad on a blog: “Selling a sewing machine. Call ... or write by PM.” Providing a phone number significantly reduces the anonymity of the ad's author.
Example two
A person with the nickname pavel123, who lives in a city of a hundred thousand people, starts a thread on the local forum titled “who can take a canary while I am on vacation”, in which he asks kind people to look after his beloved bird for the stated dates. If the forum profile contains a date of birth and the person's name really is Pavel, the user pavel123 is very likely to be de-anonymized. Attackers now know that his home will be empty during the named period. Where databases are easily available, the name and date of birth make it possible to match pavel123 to a particular Pavel with a full name, and then to connect that full name with the address where he is registered. If Pavel's actual place of residence coincides with the address in easily accessible public databases, the risk from de-anonymization in this case is to the safety of his house or apartment.
It is difficult to give examples of good anonymization; that is what makes it good. The first that comes to mind is Satoshi Nakamoto: the mere list of attempts to de-anonymize him reads like good fiction (see the post “Search for the creator of Bitcoin”).
Conclusions
At present, the user has the right to manage his own anonymity in places of communication on the Internet (in Russia, the European Union countries, the USA, and so on). It is important to know that if you do not want to disclose your identity, you need to follow certain rules of communication and look carefully at exactly what information will appear in your profile (you can preview the profiles of a few arbitrary users of the site in question). It is also highly undesirable to use the same nickname when registering on different sites. If you ever plan to post commercial information (even in a comment on a neglected, unpopular blog), provide as little identifying information as possible during registration. It is advisable to run a simple linguistic analysis of your own texts and to use a spell checker.
From a global point of view, it will be interesting to see what happens to the “intermediate anonymity” of forums and blogs: will social networks take over completely, will anonymity be limited by law, or will everything remain as it is now?
Returning to the picture: on the Internet, it is possible to work out what color your tail is and what collar size you wear.
Literature
- habrahabr.ru/blogs/design/134595
- E. I. Galyashina, Linguistic Security of Speech Communication
- ru.wikipedia.org/wiki/Anterological_examination

P.S. OpenID, “likes”, and other technologies that simplify a person's social life on the Internet are not covered in this post; it is impossible to fit everything into a single article.