
Fig. 3. - Authors from the top-100 rating whose books VKontakte users read
Interactive visualizations of all diagrams in the article are available at graphgrail.com/gg-client/vk_books.html
By 2014, traditional approaches to social process analytics had exhausted their potential for several reasons. The main one is that solutions built within these approaches could not adapt to the changed conditions in which social patterns form: they lack dynamism and cannot process large volumes of incoming data in near real time. But the most serious blow to classical analytics was the explosive growth of unstructured data [1].
In analyzing the social network in this paper, we rely on the concept of “Big Data”: a set of approaches for working with large amounts of data that are difficult or even impossible to manage by conventional means because of their heterogeneous structure and high rate of growth.
Many of these problems are solved within the dedicated technology stack we use, which combines the following technologies in a single interface:
- Graph theory as an innovative component of the technology for processing unstructured data [2]
- Natural language processing
- Information extraction technologies (data mining)
This paper describes the collection and statistical analysis of data on users of the social network VKontakte, using 13 different types of groups, events and cultural communities as an example: theaters, cinemas, museums, festivals, libraries, bikers, night clubs, music groups, philharmonic societies, cultural news, yoga, bars, art cafes and anti-cafes [3]. In total, 899 communities in these categories were collected and processed, with a geographic restriction: only communities in the city of Rostov-on-Don were considered. Data on more than 65,000 participants were collected from these communities. Participant information includes a wide range of personal and socially significant fields: gender, date of birth, education, political views, attitudes towards alcohol and smoking, marital status, interests and a list of favorite books. The data were stored in MongoDB, a NoSQL database [4].
One important indicator of involvement in cultural processes is reading. Members of cultural communities often list the books or authors they love in their profiles. We set out to analyze the participants' book preferences in order to obtain up-to-date data on the cultural trends of modern society. Analyzing the social network gives us the following data:
- The overall picture of the book preferences of the most culturally engaged members of the social network,
- Detailed statistical breakdowns for various categories of groups, by gender, age and other participant data,
- Quantitative analysis of the book preferences of community members, split into works and authors,
- Qualitative analysis of the participants' favorite books, with the possibility of subsequent comparison with the cultural needs and trends of the state and society.
The collected data make it possible, for example, to assess how closely the favorite books of group members match the opinion of Russian book experts, who compiled a list of the top 100 books.
The rating was compiled by a vote of 100bestbooks.ru visitors. Works of fiction of any length and genre, written in any language and in any period, are eligible. The voting system allows votes both "for" and "against". Registration is not required, and voting has no end date. At the moment the list is as follows:
1. Mikhail Bulgakov - The Master and Margarita
2. Leo Tolstoy - War and Peace
3. Fyodor Dostoevsky - Crime and Punishment
4. Fyodor Dostoevsky - The Brothers Karamazov
5. Leo Tolstoy - Anna Karenina
6. Fyodor Dostoevsky - The Idiot
7. Nikolai Gogol - Dead Souls
8. Alexander Pushkin - Eugene Onegin
9. Mikhail Bulgakov - Heart of a Dog
10. Mikhail Lermontov - A Hero of Our Time
11. Anton Chekhov - Stories
12. Victor Hugo - Les Misérables
13. Ilya Ilf, Evgeny Petrov - The Twelve Chairs
14. Erich Maria Remarque - Three Comrades
15. Alexandre Dumas - The Count of Monte Cristo
16. Ivan Turgenev - Fathers and Sons
17. Fyodor Dostoevsky - Demons
18. Arthur Conan Doyle - The Adventures of Sherlock Holmes
19. Nikolai Gogol - Taras Bulba
20. Alexander Griboyedov - Woe from Wit
Listing 1. - Rating of the top 100 books (for the complete and current list, see http://www.100bestbooks.ru/)
Because group members spell the titles and authors of their favorite books in many different ways, the rating was split into two lists: a list of authors and a list of titles. This separation allowed us to obtain detailed breakdowns.
Consider the age composition of all participants in cultural groups (see Fig. 1). Two pronounced peaks are visible in the participants' birth years: more than 8,000 participants were born between 1987 and 1989, and most active users of the groups considered are between 20 and 30 years old. These data correlate directly with the average age of social network users.
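The split into two lists can be illustrated with a minimal sketch. It is not the authors' code: the function names, the surname-based author matching and the substring-based title matching are assumptions chosen for illustration, and the sample profile text is invented.

```python
import re

# A few rating entries in the "Author - Title" form used in Listing 1.
RATING = [
    "Leo Tolstoy - War and Peace",
    "Fyodor Dostoevsky - Crime and Punishment",
    "Mikhail Bulgakov - Heart of a Dog",
    "Fyodor Dostoevsky - Demons",
]

def split_rating(entries):
    """Split 'Author - Title' entries into a de-duplicated author list and a title list."""
    authors, titles = [], []
    for entry in entries:
        author, title = entry.split(" - ", 1)
        if author not in authors:
            authors.append(author)
        titles.append(title)
    return authors, titles

def normalize(text):
    """Lowercase the text and replace punctuation runs with single spaces."""
    return re.sub(r"[^\w]+", " ", text.lower()).strip()

def mentioned_authors(field, authors):
    """Authors whose surname (last word of the name) occurs as a word in the field."""
    words = set(normalize(field).split())
    return [a for a in authors if normalize(a).split()[-1] in words]

def mentioned_titles(field, titles):
    """Titles whose normalized form occurs as a substring of the normalized field."""
    text = normalize(field)
    return [t for t in titles if normalize(t) in text]

authors, titles = split_rating(RATING)
profile = "F.M. Dostoevsky, war and peace; BULGAKOV - Heart of a dog"  # invented sample
print(mentioned_authors(profile, authors))
print(mentioned_titles(profile, titles))
```

Matching authors and titles separately means a profile that names only "Dostoevsky" still counts toward the author list even though no title is given. Substring matching is deliberately crude: a short title can match inside a longer unrelated phrase, so real data would need a stricter rule.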

Fig. 1. - The age structure of all members of cultural groups
Moreover, the age distribution is practically independent of the subject matter of the groups (the exception is the “Cinema” group: the average age of participants is still 20-30, but there is no clear peak, and the maxima of the birth-year distribution are spread relatively evenly between 1985 and 1992).
An analysis of the book preferences of participants in cultural groups showed that M. Bulgakov and his novel The Master and Margarita are the absolute leaders by number of mentions. Dostoevsky, the Strugatsky brothers and Remarque are also at the top. It is worth noting that the list of favorite books spans various genres and includes both classics and books by modern authors. For example, among contemporary authors, V. Pelevin and P. Coelho lead (neither is listed in the 100bestbooks.ru rating), and mystical / esoteric authors are represented by C. Castaneda and R. Bach (see Fig. 2).
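A birth-year distribution of this kind can be computed with a short sketch like the one below. It assumes that dates of birth arrive as "D.M.YYYY" strings and that some users hide the year ("D.M") or leave the field empty; that format and the function names are assumptions, not details stated in the article.

```python
from collections import Counter

def birth_year(bdate):
    """Return the year from a 'D.M.YYYY' string, or None when the year is missing."""
    parts = bdate.split(".")
    if len(parts) == 3 and parts[2].isdigit():
        return int(parts[2])
    return None

def year_histogram(bdates):
    """Count participants per birth year, skipping profiles without a year."""
    years = (birth_year(b) for b in bdates)
    return Counter(y for y in years if y is not None)

def peak_years(hist, n=3):
    """Return the n most frequent birth years, most frequent first."""
    return [year for year, _ in hist.most_common(n)]

# Invented sample values, not data from the study.
sample = ["1.2.1988", "15.7.1988", "3.3.1987", "9.9", "", "4.4.1988", "5.5.1987", "6.6.1991"]
hist = year_histogram(sample)
print(sorted(hist.items()))
print(peak_years(hist, 2))
```

Building one histogram per community category and comparing their peaks is one way to check whether a category, such as the cinema groups, lacks a clear maximum.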

Fig. 2. - Which books do VKontakte users most often list in the "favorite books" field?
Knowing the preferences of the cultural audience, we can compare them with the 100bestbooks.ru rating. Such a comparison shows exactly which authors and works from the rating the participants read. Here Dostoevsky and Tolstoy (in various spellings) turn out to be more common than Bulgakov. Overall, the top ten coincides 90% with the top ten of the rating of the 100 best books (see Fig. 3).
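The counting behind a chart like Fig. 2 can be sketched as follows. The sketch assumes that the "favorite books" field is free text in which users separate items with commas, semicolons or line breaks; the separators, the function names and the sample data are illustrative assumptions.

```python
import re
from collections import Counter

def split_field(field):
    """Split a free-text 'favorite books' field into trimmed, lowercased items."""
    items = re.split(r"[,;\n]+", field)
    return [item.strip().lower() for item in items if item.strip()]

def top_mentions(fields, n=10):
    """Return the n most frequent items, counting each item at most once per user."""
    counts = Counter()
    for field in fields:
        counts.update(set(split_field(field)))
    return counts.most_common(n)

# Invented sample profiles, not data from the study.
fields = ["Bulgakov, Remarque", "bulgakov; Pelevin", "Bulgakov\nRemarque", ""]
print(top_mentions(fields, 2))
```

Raw counting treats every spelling variant as a separate item, which is why a matching step against known author and title lists is useful before the results are compared with a rating.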

Fig. 3. - Authors from the top-100 rating whose books VKontakte users read
The “Bikers” group stands out from the general trend: its first place is occupied by the modern writer Sergey Lukyanenko (who is not in the 100bestbooks.ru rating). In addition, the “Musical groups” category turned out to be the only one that did not express a positive attitude towards reading: the answer “no” takes first place in the histogram of favorite books, the second is “all” (obviously not a sincere answer), and the sixth most popular answer is "I do not like to read."
Similar literary preferences are observed among the members of the Art Cafes, Anti-cafes and Bars groups, while these groups show no similarity in preferences with the Nightclubs group.
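The article does not say how the 90% figure was computed. One plausible reading, shown here only as a sketch, is the share of the rating's top ten that also appears in the audience's top ten, ignoring order. The function name and the placeholder labels are assumptions.

```python
def top_n_overlap(observed, reference, n=10):
    """Share of the first n reference items that also occur among the first n observed items."""
    obs, ref = set(observed[:n]), set(reference[:n])
    if not ref:
        return 0.0
    return len(obs & ref) / len(ref)

# Placeholder labels instead of real names; nine of the ten items coincide.
reference = list("abcdefghij")
observed = list("bacdefghiz")
print(top_n_overlap(observed, reference))
```

This measure ignores position within the top ten, so two lists with the same members in a different order still score 100%.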

Fig. 4. - Comparison of several groups by authors
Let us now consider which works from the rating are mentioned most often by the audience (see Fig. 5). An interesting observation is the success of G. García Márquez's novel “One Hundred Years of Solitude”: 45th in the rating, it ranks second in the preferences of the participants, ahead even of F. Dostoevsky’s “Crime and Punishment”.

Fig. 5. - Which works VKontakte users read
We can also compare different groups in pairs. In the diagram of top-100 authors read by VKontakte users, two groups of communities are compared: bikers and visitors of cultural events. An interesting observation: the communities share a love for Pushkin, Bulgakov and Remarque. But they differ sharply elsewhere: Dostoevsky, Tolstoy and Gogol are not popular with bikers.

Fig. 6. - Comparison of biker communities and cultural events
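A pairwise comparison like this can be sketched by converting each group's mention counts into shares, so that groups of different sizes are comparable, and then ranking authors by the difference. The function names and the counts below are invented for illustration and are not data from the study.

```python
def shares(counts):
    """Convert raw mention counts into shares of the group's total mentions."""
    total = sum(counts.values())
    if not total:
        return {}
    return {name: value / total for name, value in counts.items()}

def compare_groups(counts_a, counts_b):
    """Return (name, share_a - share_b) pairs, largest absolute difference first."""
    a, b = shares(counts_a), shares(counts_b)
    names = sorted(set(a) | set(b))
    diff = {name: a.get(name, 0.0) - b.get(name, 0.0) for name in names}
    return sorted(diff.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Invented counts for two hypothetical groups.
group_a = {"Pushkin": 5, "Bulgakov": 5}
group_b = {"Pushkin": 5, "Bulgakov": 5, "Dostoevsky": 10}
for name, delta in compare_groups(group_a, group_b):
    print(f"{name}: {delta:+.2f}")
```

Authors with a difference near zero are the shared preferences; large positive or negative values mark the authors that separate the two groups.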
Another interesting comparison: how do the preferences of members of bar and cinema groups differ? The figure shows that Crime and Punishment is not among the favorite books of cinema-goers. At the same time, there are some similarities in foreign classics (Three Comrades, Romeo and Juliet).

Fig. 7. - Comparison of bar and cinema communities by works
We can also compare ages: the figure shows that the distributions of birth years for members of theater and night club communities are broadly similar, with only a slight shift towards 1980-1987 among theater communities. This is expected: at the age of 30-35, people are more interested in live theater performances and less attracted to the "special effects" of films.

Fig. 8. - Age distribution of members of VKontakte cultural communities: theaters and night clubs
Consider the basic statistics for the theater communities (theater); see Fig. 9.

Fig. 9. - Theater statistics
In addition to standard information, such as the expected predominance of women in theater communities, we also obtained data on participants' attitudes and bad habits (attitudes toward alcohol and smoking), books and interests. In particular, the gender composition of theater groups is extremely uneven: women make up more than 70%. This is explained by women's clear and consistently high interest in theatrical performances. The statistics for the cinema communities (cinema) look different (see Fig. 10):

Fig. 10. - Cinema statistics
The ratio of men to women in these groups is approximately equal; it is also possible to assess which books [6] they read.
Thus, analyzing data from social networks, in particular VKontakte, makes it possible to quickly obtain a large flow of data on the preferences and interests of community audiences. But the greatest value lies in obtaining data in real time, which makes it possible to track dynamics, analyze cultural trends, assist in shaping state policy on the cultural development of society, quickly identify shortcomings in cultural and moral education, and wage the information struggle for "minds" and values. This, incidentally, is reflected in the new military doctrine of Russia.
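The gender shares quoted for the theater and cinema communities can be reproduced with a small sketch. It assumes each profile's gender has already been mapped to the labels "female" and "male", with anything else treated as unspecified; the labels and the sample values are assumptions.

```python
from collections import Counter

def gender_shares(genders):
    """Return the female and male shares among profiles that specify a gender."""
    counts = Counter(g for g in genders if g in ("female", "male"))
    total = sum(counts.values())
    if not total:
        return {}
    return {g: counts[g] / total for g in ("female", "male")}

# Invented sample: three women, one man, one profile without a gender.
print(gender_shares(["female", "female", "male", "female", None]))
```

Profiles without a gender are left out of the denominator, so the two shares always sum to one.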
You can read more articles like this on our website, http://graphgrail.com/. Tell us in the comments what analytics you would be interested in reading.
Literature
1. Rozin M.D., Svecharev V.P., Kontorovich S.D., Litvinov S.V., Nosko V.I. Problems of monitoring social networks as a platform for social communication of the Runet // Scientific Thought of the Caucasus. Interdisciplinary and special studies, 2011, №2. Pp. 65-77.
2. Nosko V.I. The system of automated construction of the social network graph // Engineering Bulletin of the Don, 2012, №4. URL: ivdon.ru/magazine/archive/n4p2y2012/1428
3. Kontorovich S.D., Litvinov S.V., Nosko V.I. Methods of monitoring and modeling the structure of the politically active segment of social networks // Engineering Bulletin of the Don, 2011, №4. URL: ivdon.ru/ru/magazine/archive/n4y2011/642
4. MongoDB is an open source document database, and the leading NoSQL database. Written in C++. URL: mongodb.org
5. Newman, Mark E. J. "The structure and function of complex networks." SIAM Review 45, no. 2 (2003): pp. 167-256.
6. Bird, Steven. Natural Language Processing with Python. - O'Reilly Media Inc, 2009. - 482 p.