Problems of generalizing PageRank

If an authoritative source refers to you, this raises your status more than links ("votes") from many low-authority sources: this was the original idea behind Google's ranking of sites. It found a natural continuation in social network analysis, where the PageRank formula is one kind of centrality measure, i.e. a way of determining which nodes of a social graph are more "central" and on what grounds. I am not an expert in this subject; from a cursory skim, it seemed to me that social network analysis on the Internet is mainly used for social media marketing (SMM), where ranking people is not the main goal. Rather, the goal of SMM is to promote brands more effectively, increase sales, and so on. However, ranking people can be an interesting goal in its own right. I have briefly listed the reasons for that interest here.

Applying the PageRank formula directly to ranking people raises questions. I am not competent enough to answer them myself, so I hope for a response from the knowledgeable community.

1. The classic PageRank of a site has a probabilistic interpretation: it is the probability that a person who follows links at random will end up on a given site. This takes into account the damping factor, i.e. the fact that the user does not keep clicking forever. From a mathematical point of view, the damping factor ensures the uniqueness of the solution to the ranking problem. But when we rank people, the probabilistic interpretation loses its meaning, and it is then unclear how to interpret the damping factor. Perhaps only abstractly, as a regularization of the problem, as Dmitry Shepelyansky suggests. It is also unclear what value would be adequate in this case.

2. Another problem is what counts as a link, or vote, for a person. For sites there is only one type of vote, the hyperlink, whereas for people many different things can be treated as a vote. The most obvious is a friend connection on blogs and social networks. But comments on your post, for example, are also essentially "votes" in your favor, because the post attracted the interest and attention of the audience. The same goes for likes, retweets, views and so on. To summarize: any manifestation of attention to the author or to the author's content is a vote. Further problems follow from this.

3. For example, a karma vote from a casual reader cannot be considered equivalent to a friend relationship. Likewise, a single comment is not equivalent to regular commenting by the same author. Therefore the matrix that encodes the social graph (the adjacency matrix) must contain link weights. As far as I understand, in the Google matrix the links are in effect already weighted: each node distributes its rank in inverse proportion to the number of links going out of it, so connections across the graph differ in their "strength". In other words, there seems to be no problem on the mathematical side; the only question is an adequate definition of the weights (which is not trivial in itself).

4. In the classical PageRank formula, if there is a link to a site, the rank it transmits cannot be negative. The non-negativity of the ranks transmitted between nodes (i.e. of the matrix entries) allows one to apply the Perron-Frobenius theorem on the existence of a solution. But a karma vote can be negative, a comment can be negative, and so on. Allowing negative rank to be transferred between nodes of a social graph apparently requires a mathematical proof that a solution of the ranking problem exists and is unique in that formulation.

5. Classic PageRank is applied to a network of homogeneous objects, i.e. objects of a single type, "site". But as mentioned above, when ranking people, attention can be shown either directly to the author (a karma vote, a friend link, a recommendation on LinkedIn, etc.) or indirectly, through a reaction to the author's content. The latter case is common on the Internet: we judge people by their content more often than through personal acquaintance. Authors and units of content then form a network of heterogeneous objects in which, for example, a highly ranked post can "vote" for its author. Judging by this post, the Witology ranking methodology took this circumstance into account in some way. The rating from the PRUFFI agency is nowhere near as mathematically advanced, but it focuses on another important aspect: when rating people, it makes sense to take into account the rating of the organizations they work for. If "organization" is understood rather abstractly, as any project involving the rated person, we again get a network of heterogeneous objects in which objects of different types transfer rank to each other.

6. In a real communication network, links not only have different weights; these weights also depend on time. Today people are friends, and tomorrow they are at odds.
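Points 1 and 3 can be made a little more concrete with a small sketch. It is not Google's implementation, only a plain power iteration under stated assumptions: the function name `weighted_pagerank`, the `votes` matrix and the damping value 0.85 are illustrative choices, all weights are non-negative (point 4 is deliberately left out), and every node has at least one outgoing vote.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Power iteration for PageRank on a weighted graph.

    weights[i][j] is the non-negative weight of the vote from node i
    to node j (hypothetical input format chosen for this sketch).
    """
    n = len(weights)
    if any(w < 0 for row in weights for w in row):
        raise ValueError("negative weights are outside this sketch (see point 4)")
    out_total = [sum(row) for row in weights]
    if any(total == 0 for total in out_total):
        raise ValueError("every node needs at least one outgoing vote")

    rank = [1.0 / n] * n
    for _ in range(iterations):
        # The (1 - damping) share is spread evenly over all nodes.
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                # Node i passes on its rank in proportion to the vote weights.
                new_rank[j] += damping * rank[i] * weights[i][j] / out_total[i]
        rank = new_rank
    return rank


# Node 0 votes for node 1 three times as strongly as for node 2;
# nodes 1 and 2 each vote only for node 0.
votes = [
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
ranks = weighted_pagerank(votes)
```

In this toy graph node 1 ends up ranked above node 2 purely because of the heavier vote, which is the effect point 3 asks for. Which weights are adequate for real karma votes, comments or friend links remains the open question.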

In view of the above, this set of problems is apparently related to the currently developing field of dynamic network analysis.
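One way to picture time-dependent weights from point 6 is to let each vote fade with age. The sketch below is only an assumption for illustration, not a method taken from dynamic network analysis or from any of the ratings mentioned: the event format, the function name `decayed_weights` and the exponential half-life rule are all hypothetical.

```python
def decayed_weights(events, now, half_life, n):
    """Build a weight matrix in which older votes count for less.

    events is a list of (source, target, value, timestamp) tuples
    (hypothetical format). A vote loses half of its value every
    `half_life` time units; this decay rule is an assumption.
    """
    weights = [[0.0] * n for _ in range(n)]
    for source, target, value, timestamp in events:
        age = now - timestamp
        if age < 0:
            continue  # ignore votes dated after the moment of ranking
        weights[source][target] += value * 0.5 ** (age / half_life)
    return weights


events = [
    (0, 1, 1.0, 0.0),   # an old vote from node 0 for node 1
    (0, 2, 1.0, 10.0),  # a fresh vote from node 0 for node 2
]
w = decayed_weights(events, now=10.0, half_life=10.0, n=3)
```

Here the old vote counts for half of the fresh one. A matrix built this way could be recomputed for each moment of interest and passed to an ordinary weighted ranking.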

I do not want to list this as a seventh point, because it may just be my own misunderstanding, but in my simple test problem of computing PageRank on a network of four nodes, it turns out that if a node has no outgoing links, the ranks of all nodes in the network eventually go to zero. In one place I saw someone simply exclude all such nodes from consideration. That is understandable, but Google assigns a PR value to every site, including those that do not link to other sites. Elsewhere it is written that nodes without outgoing links should be replaced by nodes with outgoing links to all other nodes of the network at once. But it is not very clear why it should be done exactly this way, or how it affects the ranking results.
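The effect is easy to reproduce. The sketch below uses a hypothetical four-node graph (not necessarily the one from my test) and no damping, to keep the arithmetic visible. With `fix_dangling=False` the rank that reaches the node without outgoing links is simply lost at every step; with `fix_dangling=True` that node behaves as if it linked to all other nodes, as in the second recipe above.

```python
def iterate_ranks(links, fix_dangling, steps=200):
    """Undamped PageRank-style iteration.

    links[i] is the list of nodes that node i links to
    (hypothetical input format chosen for this sketch).
    """
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(steps):
        new_rank = [0.0] * n
        for i, targets in enumerate(links):
            if targets:
                share = rank[i] / len(targets)
                for j in targets:
                    new_rank[j] += share
            elif fix_dangling:
                # Treat the dangling node as linking to every other node.
                for j in range(n):
                    if j != i:
                        new_rank[j] += rank[i] / (n - 1)
            # Otherwise the rank held by node i vanishes at this step.
        rank = new_rank
    return rank


# Node 3 has no outgoing links.
links = [[1, 2], [2], [0, 3], []]
leaky = iterate_ranks(links, fix_dangling=False)
fixed = iterate_ranks(links, fix_dangling=True)
```

In the first run the total rank drains away and every value tends to zero; in the second the total stays equal to one. So the replacement at least restores conservation of rank, although this sketch does not settle how strongly the choice affects the final ordering.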

Source: https://habr.com/ru/post/118053/
