Finding out a VK user's age, or what else the social graph can tell us
"Tell me who your friends are, and I will tell you who you are." Euripides, 480-406 BC.
For a long time I looked at the VK API like a cat at a washing machine: I was mesmerized by the chance to do some research on one of the largest social networks, one that has penetrated many areas of our lives. And one day a question arose: is it possible to determine a social network user's age from their circle of friends?
For those who want to find out a hidden age, a small hack has long existed. You only need to use the people search, set parameters narrow enough for the target profile to appear in the results, and then pin down the age range with a binary search. Or the contact information may turn out to list the year of graduation. No scripts are needed for any of this. But the specified age and such indirect information may be inaccurate, and this article is not about extracting more personal information anyway. Instead, it proposes to analyze one aspect of the social graph.
One of the first things that comes to mind when looking at a profile's connections is the age of the user's school classmates and university groupmates: the vast majority of them will be within ±1 year of the user. We owe this to universal secondary education. There is only one catch: identifying the classmates. The more time passes since graduation, the more heterogeneous the circles we move in. School friends seem to belong to a past life and are now almost lost among a large number of new acquaintances. Is it possible, for the profiles of older people, to somehow work out which cohort they studied with and, therefore, their approximate age?
So, let's treat the task of determining a user's age as the task of finding the subset of school classmates and university groupmates among their friends. That is, we assume that the user's friends include a certain number of classmates whose age roughly matches the age of the profile. Of course there are exceptions, but they are rare. A person spends 10 years at school, from bell to bell, and many cross connections form during that time. In short, everyone knows everyone, while the age spread in this social tangle is minimal. Later, when a person joins other groups, be it a workplace, a sports team or a hobby club, the age range in them is usually wide. Let us try to use this difference to identify the social groups we need.
For clarity, let us take one VK profile with a large number of friends. We get the user's friends list with the friends.get request, keep only the profiles with a specified year of birth, and place them on a timeline as a histogram by year. There is a small subtlety in how to split the set of friends into yearly intervals: we want classmates to fall into one interval rather than be spread over two adjacent ones. It was established empirically that the year is best split in autumn, and that users born in autumn should be counted in two adjacent intervals at once. That is, we get 15-month intervals, each running from September to November of the following year, with a 12-month step. The X axis is the users' age; the Y axis is the number of users falling into a given interval.
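A minimal Python sketch of the overlapping yearly intervals, assuming VK's `bdate` string format of `D.M.YYYY` (or `D.M` when the year is hidden). The labeling convention (interval y runs from September of y-1 to November of y) and the function names are illustrative assumptions, not the author's code.

```python
from collections import Counter


def parse_bdate(bdate):
    """Return (year, month) for a 'D.M.YYYY' string, or None if the year is hidden."""
    if not bdate:
        return None
    parts = bdate.split(".")
    if len(parts) != 3:
        return None  # 'D.M' means the birth year is hidden
    _, month, year = (int(p) for p in parts)
    return year, month


def interval_labels(year, month):
    """Interval y spans September of y-1 through November of y (15 months),
    so intervals y and y+1 overlap on September-November of year y."""
    if 9 <= month <= 11:
        return [year, year + 1]  # autumn births count in both neighbours
    if month == 12:
        return [year + 1]
    return [year]


def birth_year_histogram(friends):
    """friends: iterable of dicts as returned by friends.get with fields=bdate."""
    hist = Counter()
    for friend in friends:
        parsed = parse_bdate(friend.get("bdate"))
        if parsed is None:
            continue  # skip friends without a visible birth year
        year, month = parsed
        hist.update(interval_labels(year, month))
    return hist
```

For example, `birth_year_histogram([{"bdate": "5.10.1990"}, {"bdate": "1.3.1990"}, {"bdate": "7.4"}])` gives `Counter({1990: 2, 1991: 1})`: the October birthday lands in two intervals and the hidden year is ignored. The age on the X axis is then the current year minus the interval label.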
We see a five-year plateau with the maximum annual number of friends. It is not at all obvious how to pick out a single-year group within this five-year segment. To be fair, such a picture is not typical: more often the birth year of school and university classmates clearly stands out from the others with the largest number of friends. But for a difficult case like this one, let's compute for each friend the ratio of their friendship ties within the annual group to their total number of ties with the other friends of the original user (the one whose age we are determining), and then average this ratio for each year. We call this the normalized connectivity coefficient. The X axis is the users' age; the Y axis is the normalized connectivity coefficient for a given interval.
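The normalized connectivity coefficient can be sketched as follows, assuming we have already fetched each friend's own friends list and know which intervals each friend falls into (from the previous step). Friends with no mutual friends at all are counted as zero; that is an assumption, since the article does not say how they are treated. The function and argument names are hypothetical.

```python
from collections import defaultdict


def normalized_connectivity(labels, graph):
    """labels: friend id -> list of interval labels (membership in F_{0,y}).
    graph: friend id -> set of that friend's own friend ids (F_i).
    Returns interval label -> average, over the friends in that interval, of
    the share of their ties to the target's friends that stay inside it."""
    f0 = set(labels)  # F_0: all friends of the target profile
    members = defaultdict(set)  # y -> F_{0,y}
    for fid, ys in labels.items():
        for y in ys:
            members[y].add(fid)
    result = {}
    for y, group in members.items():
        total = 0.0
        for i in group:
            mutual = graph.get(i, set()) & f0  # F_i ∩ F_0
            if mutual:  # no mutual friends: contributes 0 (assumption)
                total += len(mutual & group) / len(mutual)
        result[y] = total / len(group)
    return result
```

With `labels = {1: [1990], 2: [1990], 3: [1990, 1991], 4: [1985]}` and `graph = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}`, interval 1990 scores (2/3 + 1 + 1) / 3 = 8/9, while 1991 and 1985 score 0, because their only members are tied to friends outside their own interval.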
The picture has changed, and a single year now leads. This interval contains a large share of a tightly knit, age-homogeneous group, so we can reasonably expect that if the user belongs to it, the user is of a similar age. But what if a person plays a special role in that group, for example, a teacher rather than a classmate? Indeed, teachers and coaches may have subgroups with a high density of connections within a narrow age gap. Part of this case can be handled by choosing not the group with the highest connectivity, but the oldest group among those with sufficiently high connectivity. In other words, we rely on the logic that on their life path a person first has to be an ordinary student and only later plays a special role in “collectives with a uniform age”.
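The "oldest sufficiently connected group" rule can be sketched like this. The relative threshold `rel_threshold` (keep intervals scoring at least 70% of the best one) is a hypothetical knob; the article only speaks of "sufficiently large connectivity". Interval labels are birth years, so the oldest group is the one with the smallest label.

```python
def estimate_birth_year(scores, rel_threshold=0.7):
    """scores: interval label (birth year) -> normalized connectivity.

    rel_threshold is a hypothetical knob: an interval counts as
    "sufficiently connected" if it reaches this fraction of the best score.
    """
    if not scores:
        return None
    best = max(scores.values())
    if best <= 0:
        return None  # no interval shows any internal connectivity
    candidates = [year for year, score in scores.items() if score >= rel_threshold * best]
    # Earliest birth year = oldest group: a teacher was a pupil first.
    return min(candidates)
```

With `scores = {1990: 0.9, 1985: 0.7, 1970: 0.3}` the rule returns 1985 rather than the top-scoring 1990, which is exactly the teacher scenario; setting `rel_threshold=1.0` falls back to the plain maximum.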
More detailed description and some formulas
Let us express numerically the phenomenon seen on the chart. Let F_0 be the set of friends of the profile whose age is being calculated, F_i the set of friends of an arbitrary profile i, and F_{i,y} the set of friends of profile i whose specified birth date falls into the annual interval y. Then c_{i,y}, the connectivity of profile i in interval y, is:
$$c_{i,y} = \frac{|F_{i,y} \cap F_0|}{|F_i \cap F_0|}$$
C_y is the non-normalized connectivity coefficient of interval y, summed over all profiles in that interval:
$$C_y = \sum_{i \in F_{0,y}} c_{i,y}$$
And finally, the desired year of birth:
$$y^{*} = \arg\max_{y} \frac{C_y}{|F_{0,y}|}$$
There was also an idea to take into account the type of each relationship. If the relationship is of the school or university type, such friends are counted with increased weight; if it is colleagues, relatives or anything else, such relationships are not counted at all. However, the requests that load this information make the waiting time five times longer. In addition, specifying the type of relationship is not a popular practice, so it was decided to request this information only for profiles with a small number of friends.
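Relation-type weighting could look like the sketch below, applied here to the histogram step as one possible place for it. The relation labels and the weight 2.0 are hypothetical placeholders: the article does not say which request supplied the relationship type or which weights were used. Friends with no stated type keep weight 1, since most users never fill this in.

```python
from collections import defaultdict

# Hypothetical relation labels and weights; the article gives no exact values.
BOOSTED = {"school": 2.0, "university": 2.0}


def relation_weight(relation):
    """Weight of one friendship tie given its (optional) relation type."""
    if relation is None:
        return 1.0  # type not specified: the common case, counted normally
    return BOOSTED.get(relation, 0.0)  # colleagues, relatives, etc. are dropped


def weighted_histogram(friends):
    """friends: iterable of (interval_label, relation_or_None) pairs."""
    hist = defaultdict(float)
    for label, relation in friends:
        hist[label] += relation_weight(relation)
    return dict(hist)
```

For instance, `weighted_histogram([(1990, "school"), (1990, None), (1985, "colleague"), (1985, None)])` gives `{1990: 3.0, 1985: 1.0}`: the schoolmate counts double and the colleague not at all.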
The natural limits of this approach to age estimation follow from the algorithm above. If the user is not nostalgic about their school years and their friends list lacks school and university classmates, another method has to be used.
How about putting this disgrace to work? A tongue-in-cheek service was implemented as the VK group "Fortune Teller Age". There, a friendly bot guesses the age of any open (non-private) VK profile you send it a link to, using the algorithm above.
How the service works
The first link in the fortune teller's chain is the VK group's messaging mechanism. In the group settings, the Callback API is connected to our own server, and "Incoming message" is selected as the event type to send. This way, a message to the group turns into a request to our server. If, like me, you are not on friendly terms with front-end development, this is a great option. Next, the server calls the VK API: users.get for the profile in question and friends.get for the profile's friends with a known date of birth. These calls require a VK application access token. I did not use requests that need the user's permission, so as not to bother people with access requests. Once the age estimate is calculated, a reply to the message sent to the group is generated, and the user sees the fortune teller's answer in their dialogs. Cheap and cheerful.
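A stripped-down sketch of the server side, using only the Python standard library and no networking. It assumes the Callback API's JSON shape for recent API versions (`type` is `confirmation` or `message_new`, with the message under `object.message`), that the server must answer a confirmation event with the group's confirmation string and every other event with `ok`, and API version 5.131. The helper names are hypothetical; sending the actual reply (for example through `messages.send`) and checking the `secret` field are left out.

```python
import json
import re
from urllib.parse import urlencode

API_URL = "https://api.vk.com/method/"
PROFILE_RE = re.compile(r"vk\.com/([A-Za-z0-9_.]+)")


def api_url(method, token, version="5.131", **params):
    """Build a VK API request URL (the default version is an assumption)."""
    query = dict(params, access_token=token, v=version)
    return API_URL + method + "?" + urlencode(query)


def extract_profile(text):
    """Pull a screen name or numeric id out of a vk.com link in the message."""
    match = PROFILE_RE.search(text or "")
    if match is None:
        return None
    name = match.group(1)
    numeric = re.fullmatch(r"id(\d+)", name)  # vk.com/id42 -> "42"
    return numeric.group(1) if numeric else name


def handle_event(raw_body, confirmation_code):
    """Return (HTTP response body, profile to analyse or None)."""
    event = json.loads(raw_body)
    if event.get("type") == "confirmation":
        return confirmation_code, None
    if event.get("type") == "message_new":
        message = event.get("object", {}).get("message", {})
        return "ok", extract_profile(message.get("text"))
    return "ok", None  # acknowledge any other event type
```

Once a profile is found, the server would fetch `api_url("users.get", token, user_ids=profile, fields="bdate")`, then `api_url("friends.get", token, user_id=<numeric id from users.get>, fields="bdate")`, and feed the friends' `bdate` values into the histogram and connectivity steps.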
As for improving the algorithm itself, nothing prevents one from going further: assemble a training dataset from profiles with a specified age and train a regression model on, say, the adjacency matrix of the age graph among the profile's friends. I am sure that with a large enough sample the results will be more accurate than the heuristic. As mentioned above, I was curious to check the basic idea, so I do not plan to develop this direction.
In conclusion, I want to touch on the ethical aspect. In my opinion, the "Fortune Teller Age" is on the border of private life but does not overstep it, because it analyzes only open data. That is exactly why the service does not work for users with a private profile.
There is a feeling that all sorts of "age fortune tellers", like-search engines and SearchFace are only the first signs of a socially transparent world. To some extent, this can be called a return to the roots. For a long time, humans lived in small communities where everyone was in plain view of each other. Open reputation was an integral part of the mechanism of social regulation. Yes, the new tools will gradually make human social interaction transparent again, only now on a global scale. Yes, like any tool, they can be used to do harm. Should they be made accessible to everyone? I do not know. But I am sure that if such tools are available only to a limited circle of people, the balance will certainly not shift toward constructive use.