📜 ⬆️ ⬇️

Algorithm to identify communities in large networks

Recently, numerous attempts have been made to develop an effective algorithm for identifying communities in social networks from millions of nodes that cannot be visualized or analyzed at the level of individual nodes.

The Belgian developers presented a new algorithm that surpasses all existing analogues in computational speed. As a result, it can be used on bases of unprecedented size: analysis of a typical network of 2 million nodes takes 2 minutes. It was called the Louvain Method, since it was created at the time when all the developers worked in Louvain (Belgium).

The Louvain method proved its accuracy when tested on networks with a known community structure, where it was possible to verify the result. Due to its hierarchical structure, it is suitable for analyzing communities at various scales. In particular, it was also checked for communities in the network of the Belgian mobile operator (2.6 million subscribers) and on the basis of web pages collected by the Stanford WebBase crawler (118 million nodes and over 1 billion links).


The result of the analysis of the subscriber base in the network of the Belgian mobile operator. The communities in French are marked in red, the Dutch in green. The section of “intermediate” communities that communicate in both languages ​​is zoomed. This scale shows groups of people of more than 100 people.
')
The method consists of two stages. The first is the search for "small" communities by optimizing modularity at the local level. At the second stage, the nodes of one community are aggregated and a new network of larger scale is built, after which these stages are repeated until the maximum level of modularity is reached.



Thus, after each stage, the program displays larger and larger communities.

For more information, see Fast unfolding of communities in large networks . For example, it contains a table of the quality and speed of the algorithm compared with the algorithms of Clauset / Newman / Moore, Pons / Latapy and Wakita / Tsurumi. For each task, the level of modularity and the execution time on an Opteron 2.2 GHz machine with 24 GB RAM (dash, if the time exceeds 24 hours) are indicated. Judging by the results, the Louvain method literally has no equal.



C ++ and matlab are free.

Source: https://habr.com/ru/post/112592/


All Articles