In this post I present statistics on the data that users of the social network VKontakte entered in their profiles. Below the cut you will also find the answer to a question that interests many people: how many users of this social network are actually still active? And, of course, a few words about how all of this was collected.
Background
It all started when I read the Habr post "We make our own service for monitoring VKontakte users". It was exam session, when, as everyone knows, there is nothing to do, so I decided to learn C# and get acquainted with the VKontakte API.
Statistics collection process
Not all ids are valid: some users have been deleted, and the "pretty" numbers have been blocked. To get a list of valid ids, I decided to use the catalog that VKontakte thoughtfully provides for search engines.
The catalog was parsed with the Html Agility Pack library. The process is fairly trivial, so I see no point in describing it in detail.
As of June 18, 2011, the list of valid ids contains 94,072,230 ids out of 139,132,951 possible, that is, 67.6% are valid.
Distribution of valid ids (how many are valid in each million)
So, we have a list of valid ids and need to fetch the profiles for it. To do this, we send requests to https://api.vkontakte.ru/method/getProfiles (one of the few methods that does not require authorization and is not subject to the limit of three requests per second) with two parameters: uids, a list of 750 ids (the API allows a thousand, but when requesting the profiles of the most recent users the request URL becomes too long), and fields, which lists all available fields (uid, first_name, last_name, nickname, sex, bdate, city, country, timezone, photo, photo_medium, photo_big, photo_rec, contacts, home_phone, mobile_phone, education, university, university_name, faculty, faculty_name, graduation, rate, counters).
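The batching described above can be sketched roughly as follows. This is an illustration in Python, not the C# code used for the post; the helper names `chunked` and `build_request_url` are made up for the example, and the code only builds the request URL without sending anything.

```python
from urllib.parse import urlencode

# Endpoint and batch size as given in the post.
API_URL = "https://api.vkontakte.ru/method/getProfiles"
BATCH_SIZE = 750  # the API allowed 1000, but the URL became too long

# Field names exactly as listed in the post.
FIELDS = [
    "uid", "first_name", "last_name", "nickname", "sex", "bdate", "city",
    "country", "timezone", "photo", "photo_medium", "photo_big", "photo_rec",
    "contacts", "home_phone", "mobile_phone", "education", "university",
    "university_name", "faculty", "faculty_name", "graduation", "rate",
    "counters",
]


def chunked(ids, size=BATCH_SIZE):
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_request_url(ids, fields=FIELDS):
    """Return the GET URL that asks for `fields` of the given user ids."""
    query = urlencode(
        {"uids": ",".join(map(str, ids)), "fields": ",".join(fields)},
        safe=",",  # keep the comma separators readable instead of %2C
    )
    return f"{API_URL}?{query}"
```

Each batch produced by `chunked(valid_ids)` would then be passed to `build_request_url` and fetched with whatever HTTP client is at hand.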
Running the requests in a single thread used only about 100 Kb/s of the channel, so I split the work across 94 threads, each of which was responsible for loading one million profiles.
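A minimal sketch of that split, again in Python rather than the C# used for the post. The post does not say how the id list was divided, so cutting it into consecutive blocks of one million is an assumption, and `split_into_jobs`, `run_jobs` and the `load_profiles` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

IDS_PER_WORKER = 1_000_000  # each thread in the post handled a million profiles


def split_into_jobs(valid_ids, per_worker=IDS_PER_WORKER):
    """Cut the id list into consecutive blocks, one block per worker thread."""
    return [
        valid_ids[start:start + per_worker]
        for start in range(0, len(valid_ids), per_worker)
    ]


def run_jobs(jobs, load_profiles):
    """Call load_profiles(job) for every job, one thread per job.

    `load_profiles` is a placeholder for the code that downloads and stores
    the profiles of one block. Results come back in job order.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        return list(pool.map(load_profiles, jobs))
```

Threads suit this workload because each worker spends most of its time waiting on the network rather than computing.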
At the time, I did not take into account that the VKontakte server does not always return a correct response, so not all profiles were received. However, only about 0.3% of the profiles were lost because of this error, which is insignificant for the statistics.
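One way such losses could have been caught is to compare the ids requested with the ids that actually came back, and retry the difference. The sketch below is in Python and assumes the response is a JSON object with a "response" list of profiles, each carrying a "uid" key; that shape and the function name `missing_ids` are assumptions, not something the post specifies.

```python
import json


def missing_ids(requested_ids, response_text):
    """Return the requested ids that have no profile in the response.

    A body that is not valid JSON, or that lacks the assumed "response"
    list, counts as a complete miss, so the whole batch can be retried.
    """
    try:
        payload = json.loads(response_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return list(requested_ids)

    profiles = payload.get("response") if isinstance(payload, dict) else None
    if not isinstance(profiles, list):
        return list(requested_ids)

    received = {p.get("uid") for p in profiles if isinstance(p, dict)}
    return [uid for uid in requested_ids if uid not in received]
```

A batch would be re-requested until `missing_ids` returns an empty list or a retry limit is reached.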
All the profiles obtained take up 45 GB in JSON. Everything was then deserialized into a table using JSON.NET; the result took 24 GB.
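The post did this step with JSON.NET in C#. A rough Python equivalent of flattening one response into table rows might look like this; the "response" list shape, the column selection and the name `profiles_to_csv` are assumptions made for the example.

```python
import csv
import io
import json

# A subset of the fields from the post, used as table columns.
COLUMNS = ["uid", "first_name", "last_name", "sex", "bdate", "city", "country"]


def profiles_to_csv(response_text, columns=COLUMNS):
    """Convert one JSON response into CSV text with a fixed set of columns.

    Keys outside `columns` are dropped; missing keys become empty cells.
    """
    profiles = json.loads(response_text)["response"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        extrasaction="ignore",  # skip fields that are not table columns
        restval="",             # empty cell when a profile lacks a field
        lineterminator="\n",
    )
    writer.writeheader()
    for profile in profiles:
        writer.writerow(profile)
    return out.getvalue()
```

A tabular layout stores each field name once in the header instead of once per profile, which is one plausible reason the table came out smaller than the raw JSON.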
Statistics over all accounts, including profiles abandoned two years ago and spammers, are of no interest to anyone, so I decided to build a list of active users. To do this, 20 threads (enough to saturate the whole 10 Mbit/s channel) repeatedly send getProfiles requests for 750 users at a time, this time asking only for the online field. Each iteration takes the users who have not yet been seen online in earlier iterations. This stage ran for 17 days (June 21 to July 7; it could not continue longer for technical reasons), which should be enough to capture the vast majority of active users. Because of the limited bandwidth, each user was checked once every one and a half to three hours.
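The polling loop described above can be sketched as follows, in Python and with the network call abstracted away. `fetch_online` is a hypothetical callable that takes a list of ids and returns those currently online; in the post this role was played by batched getProfiles requests for the online field.

```python
def collect_active_users(candidate_ids, fetch_online, rounds):
    """Poll for online users and accumulate everyone seen at least once.

    Returns a pair: the set of ids seen online, and a list with the number
    of newly discovered active users in each round.
    """
    remaining = set(candidate_ids)  # users not yet seen online
    active = set()
    new_per_round = []

    for _ in range(rounds):
        # Only users who have never been seen online are checked again.
        online_now = set(fetch_online(sorted(remaining))) & remaining
        active |= online_now
        remaining -= online_now
        new_per_round.append(len(online_now))

    return active, new_per_round
```

Because users already counted as active drop out of later rounds, each pass gets cheaper, and the per-round counts show how quickly the supply of new active users dries up.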
Increase of new active users by dates
The total number of active users by date
As you can see, the growth in new active users has become small enough to be neglected.
The statistics themselves (for active users)
Active users make up 29.93% of VKontakte users.
Statistics on male names
Name | Bearers | % |
---|---|---|
Alexander (Sasha, Sanya, Alex, Oleksandr) | 1106979 | 8.3% |
Sergey (Seryoga, Sergiy, Seryozha) | 755885 | 5.6% |
Andrey (Andriy, Andryukha) | 622105 | 4.7% |
Alexey (Lyokha, Lekha, Lesha) | 576573 | 4.3% |
Dmitry (Dimon, Dima, Dimka, Diman) | 529432 | 4.0% |
Yevgeny (Zhenya, Zheka) | 417668 | 3.1% |
Maxim (Max) | 384803 | 2.9% |
Vladimir (Vova, Volodya) | 312799 | 2.3% |
Ivan | 288728 | 2.2% |
Denis | 275334 | 2.1% |
Roman | 245177 | 1.8% |
Igor | 238341 | 1.8% |
Mikhail (Misha, Mikha) | 234676 | 1.8% |
Anton | 233756 | 1.8% |
Oleg | 208195 | 1.6% |
Pavel (Pasha) | 198175 | 1.5% |
Artyom (Tyoma) | 194117 | 1.5% |
Nikolay (Kolya, Kolyan) | 180639 | 1.4% |
Yuri (Yura) | 158678 | 1.2% |
Vitaly (Vitalik, Vitalia) | 152539 | 1.1% |
Statistics on female names
Name | Bearers | % |
---|---|---|
Ekaterina (Katya, Katerina, Katyushka, Katyusha, Katyukha) | 658746 | 4.8% |
Elena (Lena, Lenochka, Lenka) | 658212 | 4.8% |
Olga (Olya, Olenka, Olka) | 653994 | 4.7% |
Julia (Julichka) | 631431 | 4.6% |
Natalia (Natasha, Natali, Natal) | 628287 | 4.5% |
Anna (Anya, Annie) | 605341 | 4.4% |
Anastasia (Nastya) | 597008 | 4.3% |
Tatyana (Tanya) | 583525 | 4.2% |
Irina (Ira, Irishka, Irinka, Irisha) | 540894 | 3.9% |
Maria (Masha, Mashulya) | 385851 | 2.8% |
Svetlana (Sveta, Svetik) | 365338 | 2.6% |
Marina (Mariska, Marinka, Marisha) | 329941 | 2.4% |
Victoria (Vika, Vikulya, Viktoria) | 269936 | 2.0% |
Daria (Dasha, Dashulya, Dashulka) | 255681 | 1.8% |
Alyona (Alyonka, Alenka) | 223205 | 1.6% |
Ksenia (Ksyusha, Ksyushka, Ksyuha, Ksyunya) | 201960 | 1.5% |
Oksana | 179259 | 1.3% |
Evgenia (Zhenya) | 177853 | 1.3% |
Alexandra (Sasha) | 175563 | 1.3% |
Nadezhda (Nadya, Nadyushka, Nadyusha) | 168086 | 1.2% |
Statistics by last name (male and female combined)
Surname | Bearers | % |
---|---|---|
Ivanov | 196474 | 0.70% |
Kuznetsov | 94237 | 0.34% |
Smirnov | 92047 | 0.33% |
Petrov | 84133 | 0.30% |
Vasiliev | 77683 | 0.28% |
Popov | 74980 | 0.27% |
Volkov | 53343 | 0.19% |
Mikhailov | 51913 | 0.18% |
Novikov | 51508 | 0.18% |
Sokolov | 50988 | 0.18% |
Pavlov | 50379 | 0.18% |
Andreev | 49646 | 0.18% |
Morozov | 47689 | 0.17% |
Alekseev | 46386 | 0.17% |
Romanov | 44027 | 0.16% |
Makarov | 43505 | 0.15% |
Stepanov | 43161 | 0.15% |
Nikolaev | 43059 | 0.15% |
Yegorov | 42537 | 0.15% |
Zakharov | 40135 | 0.14% |
Kozlov | 40023 | 0.14% |
Sergeev | 39925 | 0.14% |
Nikitin | 39483 | 0.14% |
Yakovlev | 38197 | 0.14% |
Zaitsev | 37744 | 0.13% |
Grigoriev | 36063 | 0.13% |
Lebedev | 36052 | 0.13% |
Orlov | 35822 | 0.13% |
Alexandrov | 33149 | 0.12% |
Kuzmin | 32227 | 0.11% |
Sex distribution
Nickname / patronymic
UPD: From here on, "not specified" can also mean "not visible to all users."
Country availability
Distribution of users by country
Active users in each country
Country | Active | Total | % |
---|---|---|---|
Russia | 6552115 | 32519338 | 20.15% |
Ukraine | 1715898 | 8976390 | 19.12% |
Belarus | 429023 | 1680113 | 25.54% |
Kazakhstan | 152117 | 1088727 | 13.97% |
Moldova | 50815 | 375172 | 13.54% |
USA | 50501 | 416430 | 12.13% |
Germany | 45283 | 286761 | 15.79% |
City availability
Statistics by city
City | Users | % |
---|---|---|
Moscow | 893857 | 10.42% |
St. Petersburg | 497324 | 5.80% |
Kiev | 238863 | 2.79% |
Minsk | 148782 | 1.73% |
Yekaterinburg | 129787 | 1.51% |
Novosibirsk | 116443 | 1.36% |
Kharkov | 105301 | 1.23% |
Samara | 97530 | 1.14% |
Nizhny Novgorod | 94377 | 1.10% |
Omsk | 88284 | 1.03% |
Avatar availability
Availability / Validity of a Mobile Phone
| | % |
---|---|---|
| 120159 | 2.4855% |
| 50500 | 1.0446% |
«» | 28607 | 0.5917% |
| 25535 | 0.5282% |
| 20842 | 0.4311% |
| 19628 | 0.4060% |
() | 18472 | 0.3821% |
| 17521 | 0.3624% |
. | 16791 | 0.3473% |
. | 16226 | 0.3356% |
( 1941 2008)

—
69,23.
— , 01.01.1988, , , / , 69. (, )
. , . , 10- ( ) 8,9 , 41 . — 10^24 , .
P.S.
Microsoft Excel.
UPD2: , , — 24.
UPD4: ( 02.10.11) .7z, — 4.6 .
«direct torrent link».
UPD5: (7 )
(8 ).