
Social Network Graph Visualization: Analysis of Blogosphere Events Before December 2011

This is a logical continuation of the article "Building a social network graph using Drupal and Feeds".

As part of a group, I collected information from the blogosphere. The task was to assess the tension and intensity of political discussion during the State Duma election campaign. Looking ahead, I will say that the study let us put forward hypotheses that were later confirmed. In particular, the results described below make it possible to understand who will go out onto the square and bring people along and, most importantly, whom people will follow.

In recent years the impact of blogosphere events on political and social processes around the world, including the political life of our country, has grown rapidly. Social networks are a platform for active discussion of all political events in the country, and they shape public opinion, above all among young people, in whose hands the fate of the country will be in 10-15 years. Thus, the need to develop methods and algorithms for studying communication in social media and its influence on current political events is becoming increasingly obvious.

The study of communications in social media was conducted in mid-November 2011. It analyzed the October-November discussions on LiveJournal concerning the upcoming State Duma elections on December 4.
The blog platform LiveJournal (LJ) was chosen as the testing ground for the methodology of monitoring this segment of social media. The choice was made because this network is oriented primarily toward open public discussion: LiveJournal has become one of the main venues for "citizen journalism".
In the course of the study, more than 1,200 user comments were collected, and the number of edges in the directed graph exceeded 950. The data collection period was July-November 2011.

For the analysis we used the open-source program Gephi, into which the graph from the previous article was imported.

Properties of vertices and edges




Figure 1 - Graph after import
Betweenness centrality (intermediateness) is the number of times a vertex lies on the shortest path between any two other vertices. The study showed that very few nodes have high betweenness: only 6, or about 0.5%. This means that the political segment of the Runet has no complex branched network with many large clusters and communities. As a rule, the users who act as conduits of information can pass it along because they take part in 2-4 different circles of political opinion at once. At the same time, these conduits have little influence on the opinion of the communities they belong to, so it is difficult to use them in information campaigns during the pre-election period.
The figure shows the graph with the highest-betweenness users drawn at the largest size and in warm shades (green, orange, and red).


Figure 2 - Graph with high-betweenness vertices highlighted
The distribution of betweenness in the graph is extremely uneven; most vertices have none at all.
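The shortest-path count defined above (usually called betweenness centrality) can be sketched in a few lines. This is only an illustration, not the code Gephi runs: it uses Brandes' algorithm on an unweighted directed graph, and the toy adjacency list (the names "a", "b", "hub", "c", "d") is invented for the example.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for a directed, unweighted graph.

    adj maps every vertex to the list of vertices it points to
    (every vertex must appear as a key, even with an empty list).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and remember predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical toy graph: two users reach two others only through "hub".
toy = {"a": ["hub"], "b": ["hub"], "hub": ["c", "d"], "c": [], "d": []}
toy_scores = betweenness(toy)
```

In the toy graph only "hub" gets a non-zero score, which mirrors the picture in the study: a handful of vertices carry all the betweenness and the rest have none.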


Figure 3 - Graph with high-betweenness vertices highlighted
The table, sorted in descending order, lists the usernames together with their betweenness values. Among reasonably well-known people, the leader is V. Milov (v_milov), one of the leaders of the opposition.


Figure 4 - Users with high betweenness
Eigenvector centrality is a recursive measure of a vertex's importance, derived from the sum of the importance of the vertices connected to it. The study showed that A. Navalny, G. Yavlinsky, and S. Mironov have high eigenvector centrality; among the political communities, only ru_politics does.


Figure 5 - Users with high eigenvector centrality
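To make the recursive definition of eigenvector centrality concrete, here is a minimal sketch using power iteration. It is an assumption-laden illustration rather than Gephi's implementation: the graph is treated as undirected, each vertex keeps its own score on every step (iteration on A + I, which has the same eigenvectors and avoids oscillation), and the toy vertex names are invented.

```python
import math

def eigenvector_centrality(neighbors, iterations=100, tol=1e-9):
    """Power iteration on an undirected graph given as {vertex: [neighbors]}."""
    x = {v: 1.0 for v in neighbors}
    for _ in range(iterations):
        # New score = own score + sum of the neighbors' scores (A + I).
        new = {v: x[v] + sum(x[u] for u in neighbors[v]) for v in neighbors}
        # Rescale to unit length so the values do not grow without bound.
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in x) < tol:
            return new
        x = new
    return x

# Hypothetical toy graph: "center" is linked to everyone, "a" and "b" are
# also linked to each other, "c" is linked only to "center".
toy_neighbors = {
    "center": ["a", "b", "c"],
    "a": ["center", "b"],
    "b": ["center", "a"],
    "c": ["center"],
}
centrality = eigenvector_centrality(toy_neighbors)
```

Here "center" ranks first, "a" and "b" tie, and "c" comes last even though it is attached to the most important vertex, because it has only that one connection.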

Cluster Properties


The clustering coefficient (transitivity) characterizes the increased probability of a link between vertices A and C when the links A-B and B-C exist (a friend of my friend is my friend). A high clustering coefficient may indicate that a vertex is commented on by people who know its owner personally.


Figure 6 - The number of "triangles" in the graph
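The "triangles" in the figure can be counted directly. The sketch below assumes an undirected graph stored as sets of neighbors and computes, for each vertex, the number of triangles through it and the local clustering coefficient (triangles divided by the number of neighbor pairs). The four-vertex graph is a made-up example.

```python
from itertools import combinations

def local_clustering(neighbors):
    """Return {vertex: (triangles, clustering coefficient)}.

    neighbors maps each vertex to the set of vertices it is linked to.
    """
    result = {}
    for v, nbrs in neighbors.items():
        # A triangle through v is a pair of v's neighbors that are linked too.
        triangles = sum(
            1 for a, b in combinations(sorted(nbrs), 2) if b in neighbors[a]
        )
        possible = len(nbrs) * (len(nbrs) - 1) // 2
        result[v] = (triangles, triangles / possible if possible else 0.0)
    return result

# Hypothetical toy graph: A, B, C form a triangle; D hangs off C.
toy_links = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
clustering = local_clustering(toy_links)
```

For this graph A and B each sit in one triangle with a coefficient of 1.0, C sits in the same triangle but only one of its three neighbor pairs is linked, and D has no triangles at all.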

Network properties


The diameter is the longest of the shortest paths between any two vertices (for which such a path exists).
$d = \max_{i,j} L_{ij}$, where $L_{ij}$ is the length of the shortest path from vertex $i$ to vertex $j$
Formula 1 - Determination of the diameter
The diameter of the resulting graph is 2, which indicates that there are no long chains of communication between users.
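As a rough illustration of Formula 1, the diameter of an unweighted directed graph can be found by running a breadth-first search from every vertex and taking the largest distance seen. Unreachable pairs are simply skipped, in line with the caveat in the definition. The adjacency list below is an invented example, not the study's data.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """BFS distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Longest shortest path over all pairs that are connected by a path."""
    return max(max(shortest_path_lengths(adj, s).values()) for s in adj)

# Hypothetical toy graph: two commenters point at "hub", which points at "c".
toy_graph = {"a": ["hub"], "b": ["hub"], "hub": ["c"], "c": []}
toy_diameter = diameter(toy_graph)
```

The longest chain in the toy graph is a -> hub -> c, so its diameter is 2.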
The degree distribution is a plot of vertex degree against the total number of vertices with that degree in the graph. In this study, degrees were calculated from the comments exchanged between users. To identify authoritative users, the incoming degree metric is used. If a vertex has a high incoming degree, then this user is commented on often and a lot, which in turn means that the community takes a strong interest in him. As a rule, such users are opinion leaders and promoters of new ideas that spark active discussion in society. The study showed that the distribution of incoming degrees follows a power law and falls off sharply as the number of commenters grows. The leaders are users who gathered 60, 30, 18, and 15 comments on the given keywords.
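A minimal sketch of how incoming degrees and their distribution can be computed from an edge list. It assumes that each edge points from the commenter to the author being commented on; the article does not state the direction explicitly, and the usernames are invented.

```python
from collections import Counter

def in_degree_distribution(edges):
    """Return (in_degree per vertex, number of vertices per in-degree)."""
    vertices = {v for edge in edges for v in edge}
    # Start every vertex at zero so users nobody comments on are counted too.
    in_degree = Counter({v: 0 for v in vertices})
    in_degree.update(target for _, target in edges)
    distribution = Counter(in_degree.values())
    return in_degree, distribution

# Hypothetical edges: (commenter, author being commented on).
toy_edges = [
    ("u1", "leader"), ("u2", "leader"), ("u3", "leader"), ("u4", "leader"),
    ("u1", "u2"), ("u3", "u2"), ("u4", "u3"),
]
toy_in_degree, toy_distribution = in_degree_distribution(toy_edges)
top_user = toy_in_degree.most_common(1)[0]
```

Sorting `toy_in_degree` in descending order gives a table like the one with the leaders, and `toy_distribution` is the data behind a degree-distribution plot.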


Figure 7 - Users with a high incoming degree


Figure 8 - Distribution of incoming degrees
One of the most prominent leaders is A. Navalny.


Figure 9 - Distribution of incoming degrees
An analysis of degrees in the graph shows that, as a rule, the people who comment on opinion leaders are themselves leaders in the number of comments.


Figure 10 - Distribution of incoming degrees
The average degree for the entire graph is 0.743, but the median is more interesting: it lies in the range of 2-4. The overall distribution of degrees, both incoming and outgoing, is presented in the figure.


Figure 11 - Distribution of incoming degrees
The weighted degree characterizes the distribution of degrees normalized to the range from 1 to 100. The clear leaders are A. Navalny, G. Yavlinsky, and the community ru_politics. The economist Khazin and the Solidarity movement are also on the list. An interesting result is that politicians and public figures such as G. Zyuganov, V. Zhirinovsky, and M. Prokhorov are absent from the list, which can partly be explained by the fact that their supporters hold the main discussions at other venues, in particular on official sites. Prokhorov's absence can also be explained by the fact that he now writes not about politics but, as before, focuses on business.
Another interesting result is that the list contains no regional political communities, such as politics_south (401 readers), Politics in the South of Russia, or gorodgeroev_ru (281 readers), Political life in Volgograd. These regional communities have readers but do not attract active commenters. The party communities ru_cprf (the Communist Party), ru_sps (the Union of Right Forces), and spravedliva_ru (Fair Russia) contain only texts and reposts; there is practically no political activity or discussion in them.
The main conclusion: as a rule, active discussions take place in the journals of political leaders rather than in the communities, which therefore have a somewhat artificial character.


Figure 12 - Weighted Leaders

Modularity makes it possible to identify communities, or groups of users, in the structure of the graph. In the resulting graph, 4-6 small groups can be identified for the selected keywords.
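For reference, the modularity score of a given split into groups can be computed as below. This sketch only evaluates a partition that is already known; it does not search for the best one, which is what a community-detection tool does. It treats the graph as undirected and unweighted, and the two-triangle example is invented.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges is a list of (u, v) pairs; community maps vertex -> group label.
    Q = sum over groups of (edges inside / m) - (total degree / 2m) ** 2.
    """
    m = len(edges)
    inside = {}   # number of edges with both ends in the group
    degree = {}   # sum of the degrees of the group's vertices
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(
        inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items()
    )

# Hypothetical toy graph: two triangles joined by the single edge c-d.
toy_edge_list = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {v: 0 for v in two_groups}
q_split = modularity(toy_edge_list, two_groups)
q_single = modularity(toy_edge_list, one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping everything into one group, which is the sense in which modularity "finds" communities.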


Figure 13 - Groups in the graph


Figure 14 - A. Navalny's community
The sizes of the largest groups range from 10 to 35 users (see the figure).


Figure 15 - Distribution of groups


Figure 16 - Modularity Class
In addition to analyzing the structure, the study makes it possible to read the texts of the commenters' entries right away. The table shows the edges of the graph; each edge corresponds to the title and text of a comment. This makes it possible to pin down the subject of the comments more precisely and to assess the overall tone of the messages.


Figure 17 - Vertices of the graph with the text of comments

Summary: now, a year later, when we know how events developed, it is clear that such a study can predict, with a high degree of accuracy, the real activity of protest leaders from their activity in the blogosphere.
Of course, we collected only a limited amount of data, and the representativeness of the sample is open to debate (only entries matching certain queries built with the Yandex Search constructor were collected). More networks need to be explored, not only LJ. That is for the future.

But for now our research is unique in its analysis of the graph and network structure. As far as I know, studies usually build engagement charts and quantitative characteristics (number of posts, number per user, and so on), audience size, etc. But no one builds the structure of the graph or calculates the metrics as we did. Yet this is what makes it possible to monitor the dynamics of events in the future.

Source: https://habr.com/ru/post/164307/

