
Finding the most influential objects in a subset of a social network

In the modern world, relationships between people exist not only on the social level but also on a digital one. As virtual social networks spread, it became common to keep a personal page with personal data, look for friends by interest, create groups, and so on. At an IT Talk meetup held by DataArt, I met someone who studied the topology of social networks. That day I settled on the topic of my master's thesis, which is reflected in the title of this article. The amount of information in social networks keeps growing, and most of it exists only in raw form, which is of little interest by itself. The idea was to process such data and obtain results that could serve a good cause.

This article discusses the search for the most influential objects. This information can be useful for conducting various virtual marketing campaigns, as well as for identifying users with suspiciously high activity.

Introduction


We represent a social network as a graph whose nodes are people. If two objects are connected in some way (they are friends or exchange messages), there is an arc between them.

Fig. 1
We give an intuitive notion of influence. Consider the following figure.

Fig. 2

Here the object in the center has 6 links. But influence does not come down to the number of links alone. We must also consider the influence of the objects to which the target object is linked. Consider Fig. 3.

Fig. 3

In this figure the object has only 3 connections, but the objects it is connected to have some influence on the network themselves. The concept of influence therefore has to be formalized so that it takes into account both the number of connections and the influence of the objects to which the target object is connected.

Theory


The impetus for research in this direction was article [1]. I made some amendments to the proposed algorithm that seem appropriate to me.
The graph will be represented as an adjacency matrix.
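
As a minimal sketch of this representation, the snippet below builds a 0/1 adjacency matrix from a list of friendship pairs. The function name, the toy edge list, and the assumption that links are mutual (so the matrix is symmetric) are illustrative and not taken from the original work.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for n objects from link pairs."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            continue  # ignore self-links
        a[i][j] = 1
        a[j][i] = 1  # assumption: a link is mutual, so the matrix is symmetric
    return a

# Hypothetical toy network: object 0 is linked to 1, 2 and 3; 1 and 2 are linked too.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
```

For a real sample with many thousands of users a dense matrix becomes large, so a sparse structure (for example a dictionary of friend sets) is likely a better fit in practice.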



We introduce the concept of the iterated force of an object i of order k.



Note that the iterated force of the first order of the object i is the number of connections of this object with others. It still does not take into account the influence of other objects. Starting from the second order, this amount includes the influence of other objects.
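
The formulas in this section were images and are not reproduced here. One definition that is consistent with the description above (first order equals the number of connections, higher orders sum the forces of neighbours) is the following. The symbols a_{ij}, p_i(k) and n are assumed notation, not necessarily the author's.

```latex
A = (a_{ij}), \qquad
a_{ij} =
\begin{cases}
1, & \text{if objects } i \text{ and } j \text{ are connected},\\
0, & \text{otherwise},
\end{cases}

p_i(0) = 1, \qquad
p_i(k) = \sum_{j=1}^{n} a_{ij}\, p_j(k-1), \quad k \ge 1 .
```

Under this reading, p_i(1) is the number of connections of object i, and p_i(2) is the sum of the connection counts of its neighbours.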

Question: up to what order should the vector of iterated forces be computed?
Answer: either 2 or 3.

The iterated force of order k of object i expresses the degree of influence of object i, given that it can spread its influence within a radius of no more than k. This follows from the structure of the sum itself. Remember asking a friend to ask someone else to do something for you? That is influence within a radius of 2, and it is expressed by the second-order iterated force. If we want to take into account chains with one more intermediate participant, we need the vector of third-order iterated forces. Considering higher orders seems impractical because such long chains are very unlikely to occur in real life.
Thus, to calculate the influence of objects on the scale of a large city or above, we count up to the 3rd order. For smaller scales it is advisable to use the second order.

Note that what matters to us is not the numerical value of the iterated force, but how the forces of different objects relate to each other. Therefore, after calculating the vector of iterated forces of the next order, it is advisable to normalize this vector. As the norm we take the maximum absolute value of its elements.
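
The procedure described above can be sketched as follows. This is an illustrative implementation under the assumption that the force of order k is the sum of the order k-1 forces of an object's neighbours, starting from equal weights. The function name and the toy matrix are made up for the example.

```python
def iterated_force(adj, order):
    """Return the max-normalized iterated force vector of the given order.

    adj   -- square 0/1 adjacency matrix as a list of lists
    order -- number of propagation steps (the text suggests 2 or 3)
    """
    n = len(adj)
    force = [1.0] * n  # order 0: every object starts with equal weight
    for _ in range(order):
        # New force of i is the sum of the previous forces of its neighbours.
        force = [sum(adj[i][j] * force[j] for j in range(n)) for i in range(n)]
        # Normalize by the largest absolute value so that the maximum becomes 1.
        peak = max(abs(x) for x in force)
        if peak > 0:
            force = [x / peak for x in force]
    return force

# Hypothetical toy network with links 0-1, 0-2, 0-3, 1-2, 3-4.
A = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
first = iterated_force(A, 1)
second = iterated_force(A, 2)
ranking = sorted(range(len(A)), key=lambda i: second[i], reverse=True)
print(first)
print(second)
print(ranking)
```

In this toy network objects 1 and 3 have the same number of connections, yet object 1 gets a higher second-order force because its neighbours are better connected. Normalizing after every step does not change the ranking; it only keeps the numbers in a convenient range.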

Collecting data


The data was collected from the social network "Vkontakte". An application was written that loads 2 sets of friends: up to the 2nd level and up to the 3rd level. For greater clarity, consider Fig. 4.
Fig. 4
Here, friends of the first level are marked in red, the second in green, and the third in yellow. Because the VK API does not allow more than 3 requests per second, I had to leave the computer running overnight to load the third level.
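
A level-by-level download with a request limit can be sketched as a breadth-first traversal. The code below does not call the real VK API: `fetch_friends` is a hypothetical caller-supplied function that returns the friend ids of a user, and all other names and the toy data are assumptions for illustration.

```python
import time
from collections import deque

def crawl_friends(center, max_level, fetch_friends, min_interval=1 / 3,
                  sleep=time.sleep, clock=time.monotonic):
    """Breadth-first download of friend lists up to max_level from center.

    fetch_friends is a hypothetical stand-in for the real API call: it takes
    a user id and returns a list of friend ids.
    Returns (level_of, edges): the level of every discovered user and the set
    of undirected friendship pairs that were seen.
    """
    level_of = {center: 0}
    edges = set()
    queue = deque([center])
    last_call = None
    while queue:
        user = queue.popleft()
        if level_of[user] >= max_level:
            continue  # users on the last level are recorded but not expanded
        # Throttle: keep at least min_interval seconds between requests.
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        for friend in fetch_friends(user):
            edges.add((min(user, friend), max(user, friend)))
            if friend not in level_of:
                level_of[friend] = level_of[user] + 1
                queue.append(friend)
    return level_of, edges

# Usage with an in-memory stand-in for the API (made-up ids, no real sleeping).
fake_network = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
levels, links = crawl_friends(1, 2, lambda uid: fake_network.get(uid, []),
                              sleep=lambda seconds: None)
print(levels)
print(sorted(links))
```

Note that users on the last level are never expanded, so in the resulting sample each of them only has the links through which it was discovered.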

Fig. 5

Note that if we downloaded 4 levels, we would most likely download 99% of the network's users. There is a theory of six handshakes [2], which states that any two people on Earth are separated by no more than 5 levels of mutual acquaintances. Since VK is used mainly in the CIS, this figure should be lower there.

Analyzing data


So, we have 2 data sets:

I do not use social networks myself, so in both cases the center of the sample was a friend of mine, a student at Voronezh State University.

Let's start with the analysis of the first set.

Fig. 6

Here the sample center ranked first in terms of influence. This can be explained by how the data was loaded: every object on the last level of friends has only one connection, because the friend lists of these objects are empty. Therefore the influence of level 1 friends (which is largely based on level 2 friends) is rated low, and the results are not quite objective. Nevertheless, the data is interesting: about half of VSU's activists appear in the first 30 rows.
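
This truncation effect can be reproduced on a toy sample. The sketch below uses a dictionary of friend sets instead of a matrix and assumes the same neighbour-sum definition of the iterated force. The user names and the shape of the sample are invented for illustration.

```python
def iterated_force_sparse(friends, order):
    """Max-normalized iterated force on an adjacency dict {user: set_of_friends}."""
    force = {u: 1.0 for u in friends}
    for _ in range(order):
        force = {u: sum(force[v] for v in friends[u]) for u in friends}
        peak = max(force.values())
        if peak > 0:
            force = {u: x / peak for u, x in force.items()}
    return force

# Hypothetical truncated sample: a center, 3 first-level friends, and for each
# of them 3 second-level users whose own friend lists were never downloaded.
friends = {"center": set()}
for f in range(3):
    first_level = f"f{f}"
    friends.setdefault(first_level, set())
    friends["center"].add(first_level)
    friends[first_level].add("center")
    for s in range(3):
        leaf = f"f{f}_s{s}"
        friends[leaf] = {first_level}  # only the link back to the parent is known
        friends[first_level].add(leaf)

force = iterated_force_sparse(friends, 2)
top = max(force, key=force.get)
print(top, force["f0"], force["f0_s0"])
```

Each first-level friend has more connections than the center, but the center still comes out on top at the second order, because the friends' own neighbours are mostly last-level users with a single known link.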

Let us turn to the second data set. Here the sample center ranks 3539th.
This is understandable: here influence is measured on the scale of the city.

Fig. 7

The following entries appear at the top of this table:

Some of the people in the first rows of this list really are well-known people in our city (mostly photographers). But the activity of some objects is suspicious: too much influence combined with rather meager information about the person. Network administrators could check whether the information on such profiles matches reality.

P.S. If anyone has an idea how to improve this model, please write in the comments.

Sources:


  1. www.basegroup.ru/library/web_mining//information_flows_in_social_networks
  2. ru.wikipedia.org/wiki/%D0%A2%D0%B5%D0%BE%D1%80%D0%B8%D1%8F_%D1%88%D0%B5%D1%81%D1%82%D0%B8_%D1%80%D1%83%D0%BA%D0%BE%D0%BF%D0%BE%D0%B6%D0%B0%D1%82%D0%B8%D0%B9


Source: https://habr.com/ru/post/183548/

