It is obvious that the fact of the development of social networks eliminates the distance between agents, and also increases the likelihood of accidental communication between two agents - thus, it is easier and simpler to infect agents with information. So, the question of the ability to predict how the infection will spread is becoming topical.
And although initially the need to predict the spread of infections in networks has arisen in biology, this problem is also present in economics. After all, if, say, a company wants to spread some kind of novelty through a social network (this method of diffusion of information is one of the most popular since the beginning of active development of social networks), then it needs to understand how the infection will go over the network with time in order to correctly select ambassadors for minimum costs of distributing information about the product. Thus, network predictive modeling turns out to be in demand in relation to networks of economic agents.
Next, I will show the practical application of models of the spread of infection on the example of the
Flickr network. For this, the two most popular and practical models will be implemented - SI (suspectible-infected) and SIR (suspectible-infected-recovered) [1], [5].
Implementing the SI model on the Flickr network
Suppose that we produce some product that we want to advertise using photo hosting Flickr. To do this, we "infect" a certain number of users by purchasing the right to publish through their profiles photos of our goods with the corresponding hashtag. The transfer of infection from one agent to another will be referred to as a hashtag transmission.
')
Considering the fact that we have “healthy” network members (susceptible), as well as those originally infected with our product (infected), the distribution model of our virus over the network can be described by the SI epidemic spread model (with a certain probability of infection, t. e. a measure of how susceptible a person is to advertising) [3], as well as a SIR model (when a person “had a disease” with the idea of ​​purchasing our product and is no longer susceptible to advertising) [4].
Consider the case of the SI model. For simplicity, we will assume that we can initially infect one member of the network. Then, the task of the manager can be reduced to monitoring the iterative change of the network (i.e., step by step, with the selected time interval, the network state is predicted at the next moment in time, based on current information). Thus, having infected a user in a giant connected component, one can observe step by step how the infection will spread. And by virtue of the particularity of the SI model, any vertex that at its next step joins the giant connected component will sooner or later become infected.
For example, let us build the giant component of the Flickr network graph several months after the start of the network operation (Fig. 1.).
Fig.1. Giant connected network component FlickrObviously, the share of “infected” network agents should be the final indicator of an advertising campaign. For the case of the SI model, this will simply be the fraction of the vertices of the giant connected component in the entire graph. Take a few slices of the Flickr network and see how the proportion of the infected population will change when using the SI model over time (Figure 2.).
Fig.2. The total size of the epidemic (the proportion of infected) with increasing graphWe obtain a natural result: the larger the graph, the greater its connected component, and hence the greater the total share of infected network members. Now consider a more complex model of the spread of infection in the network, namely the SIR model.
Implementing the SIR model on the Flickr network
In order to represent the spread of infection in the Flickr network by the type of the SIR model, we use this method of information dissemination as percolation [2].
The concept of percolation came from physics, where it means the flow or not the flow of electric current through a mixture of materials. As applied to the theory of the spread of infection, percolation refers to the transmission or non-transmission of infection through communication between two agents.
In order to explain the essence of percolation in the theory of epidemics, suppose that some vertex u from graph G has just become infected. This vertex has a connection with the vertex v, which means there is a certain probability p that the vertex v will be infected at the next moment in time. Find out the outcome of the event, we can, for example, throwing a coin, in which the "eagle" falls with probability p. But from the point of view of determining the outcome, it does not matter at all whether we threw a coin at the beginning of the epidemic, or at the moment when the top u became infected and there was a threat of infection of the top v. Then, at the very beginning of the epidemic, nothing prevents us from carrying out the coin flip procedure and determining whether the edge “passes” between the specific two peaks of the infection or not. We call the edge active, if, by the results of coin tossing, it has passed into the category of passing the infection, and passive otherwise. For the case of one initially infected vertex, it becomes possible to formulate the following theorem.
Theorem 1. A vertex v will become infected as a result of an epidemic if and only if it is connected to the initially infected vertex u by a path of active edges.This theory is directly related to the SIR model. Let T (probability of transmission) determine the threshold value of the probability of an edge to become active (a single number is fixed to check all the edges in the graph). We infect some initial vertex at time t = 1. Next time we will see which of the edges leading from this vertex are active, and pass the infection on them, and transfer the vertex itself to the recovered class (those who left the population would mean for of our example that the top was exposed to advertising, and in the next step I managed to pass on the advertising idea to some friends, after which I lost interest in distributing information about the product).
By doing the described procedure iteratively until the infected peaks are completely exhausted, we will come to the conclusion that the total share of infected network participants will be the set of vertices of the class recovered in relation to the number of vertices in the whole graph.
By default, we will assume that when it comes to a single initially infected vertex, it is selected from among the vertices in the giant connected component; We also note that for each further simulation 500 iterations were done and the result was averaged over all.
Let's look at the distribution of the shares obtained as a result of the epidemic (in 500 simulations) with different combinations of the number of initially infected (inf) and T (probability of transmission). In Fig.3. 9 combinations are presented (T = 0.25, 0.55 and 0.85; inf = 1, 50 and 100).
Fig.3. Dependence of the final epidemic coverage on the threshold T and the initial number of infectedWe find that the increase in the number of initially infected leads to the fact that the distribution becomes more and more like a normal one (the law of large numbers seems to work). An increase in the parameter T contributes to the fact that the "tails" become "easier" (because with small T, even a large number of initially infected ones do not guarantee "good" spread of the infection).
Separately, in the case when only 1 node is initially infected, we will look at how the correlation between the “centrality” of the initially infected node and the final share of infected ones changes depending on the T parameter (Fig. 4.).
Fig.4. Correlation between the final coverage and the “centrality” of the source node for different TNote that, starting from a certain moment, a trend is established with an inverse relationship (for degree centrality, closeness centrality and betweenness centrality). This is due to the fact that when T is small and the infection spreads poorly by itself, it becomes extremely important which peak we choose to start infecting the network (that is, there is a strong positive correlation between the “centrality” and the total share of the infected); and as the value of the T parameter increases, the infection spreads better and the importance of the “centrality” of the original node becomes less and less (thus, the correlation decreases, ie, regardless of the “centrality” of the initially infected node, the infection still spreads well by itself) .
Thus, we can model the distribution of our product, having predicted graphs and information about what the threshold T is (relative to all this and the available budget, it becomes possible to build a strategy on how many agents to infect at the initial time, depending on the target indicators of the final share "Infected").
The main conclusions are that for the case of using the epidemic spread model of SI, it can be argued that any vertex of the network graph that falls into a giant connected component will be infected sooner or later.
The total share of infected agents when using the SIR model is always less than in the case of the SI model. However, the SIR concept provides more possibilities for varying such indicators as the initial number of infected vertices, the threshold value of the probability of an edge to become active, etc., depending on what the target values ​​of the proportion of infected and the company's budget for advertising. Thus, having two specified input parameters and interrelations revealed in the work, the task of the manager is reduced to optimization.
Bibliography
1. Bailey, NTJ (1975), The Mathematical Theory of Infectious Diseases and Applications, 2nd ed. (Charlin Grin & Company, London)
2. David Easley and Jon Kleinberg "Networks, Crowds, and Markets: Reasoning about a Highly Connected World", Cambridge University Press, 2010., p. 655
3. Diekmann, O., H. Heesterbeek, and T. Britton (2012), Mathematical Tools for Understanding Infectious Disease Dynamics (Princeton University Press, Princeton, USA)
4. Jan Medlock, Mathematical modeling of epidemics. University of Washington, 22 & 24 Vaf 2002
5. Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem and Alessandro Vespignani "Epidemic processes in complex networks", April 22 2015