
A session of white magic without the exposé, or how I went looking for trolls in LJ



Prologue


For the last year and a half, I have felt in LJ like this taxi driver. I first signed up for LiveJournal more than 10 years ago. Back then even Durov had probably not heard of Facebook, but here you could gather around shared interests, exchange opinions, and write to prominent figures like Lebedev. I built up a friends feed and began to notice that the Russian segment was not that big and that, by and large, everyone knew each other. From about 2011 LiveJournal began to decline, the activity moved to Twitter and Facebook, and I began to notice that the crowd of commenters was changing. At first it was imperceptible, but since last year I have caught myself, on opening the comments to a post, asking that same bearded hermit's question. Articles with interesting content also began to pop up in the press.

But I don't have much faith in journalists, so, armed with Python, BeautifulSoup, psycopg2, matplotlib and PostgreSQL, I decided to run my own mini-investigation and refresh or pick up some skills along the way.

Appearance and behavior


I won't hide it: I used to enjoy trolling now and then myself, and I am still embarrassed about a lot of it, but that invaluable (ha-ha) experience let me formulate the main signs of a troll:

But the ones I was looking for were not exactly trolls.
They were clearly not loners: they seemed to act together, and they had more ways to disguise themselves.
They could write meaningful posts and have a lot of friends, so while my ugly script pulled pages from the mobile version of LiveJournal for the top bloggers I had selected, I puzzled over how to process the data afterwards.

The catch


The script ran through the whole New Year holidays. Now and then familiar names showed up in the console, some of them already gone. Still, 10 years is a long time...
Over a couple of weeks, and at the cost of three bans, I managed to parse about 11 thousand posts and 2.3 million comments left by about 90 thousand users, and that covers far from all the entries of the 7 top bloggers. Not much: probably about 5% of what I originally wanted to parse. A dump of this database can be downloaded here.
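
The post does not say how the script reacted to bans. One common way to crawl more gently is to pause and retry with a growing delay whenever the server refuses a request. The sketch below only illustrates that idea and is not the code from the repository. The `fetch` callable, the function name and the delay values are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], Optional[str]],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) until it returns a page or the attempts run out.

    `fetch` is a hypothetical downloader: it returns the page HTML,
    or None when the server refuses the request (for example, a ban).
    """
    delay = base_delay
    for attempt in range(max_attempts):
        page = fetch(url)
        if page is not None:
            return page
        if attempt < max_attempts - 1:
            sleep(delay)  # wait before the next attempt
            delay *= 2    # double the pause after every refusal
    return None
```

Passing `sleep` in as a parameter keeps the function easy to try out without actually waiting. In a real crawl the default `time.sleep` would be used.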

It was time to analyze the data. First of all, I decided to plot how "murzilka-likeness" (roughly, how much an account looks like a sock puppet) depends on registration date.
Hm, well...

Normalization, attempts to account for the number of a user's own posts, tuning of weights: none of this revealed any anomalies, and it all felt more like fitting the solution to the answer. Next, let's look at external users, those who log in through third-party services.
Hmm...

Well, this jump is easy enough to explain: for example, new services were added through which you can log in.

Almost resigned to having no hint of evidence, I decided as a last attempt to plot a simple relationship between a user's registration date and their current number of friends.
There they are
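
The comparison of registration date against current friend count can be sketched roughly as follows. This is a minimal illustration and not the code from the repository. The function names, the per-day median and the `factor` threshold are my assumptions, and the sample data is made up.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_friends_by_day(users):
    """Map each registration date to the median friend count for that day.

    `users` is an iterable of (registered_on, friends_count) pairs.
    """
    by_day = defaultdict(list)
    for registered_on, friends_count in users:
        by_day[registered_on].append(friends_count)
    return {day: median(counts) for day, counts in sorted(by_day.items())}


def unusual_days(per_day, factor=5.0):
    """Return days whose median is at least `factor` times the median over all days."""
    overall = median(per_day.values())
    return [day for day, value in per_day.items() if value >= factor * overall]


# Made-up sample data, for illustration only.
sample = [
    (date(2014, 3, 1), 10), (date(2014, 3, 1), 14),
    (date(2014, 3, 2), 900), (date(2014, 3, 2), 1100),
    (date(2014, 3, 3), 12), (date(2014, 3, 3), 8),
]
per_day = median_friends_by_day(sample)
print(unusual_days(per_day))
```

The median is used here instead of the mean so that a single very popular account does not make a whole day look anomalous. A day stands out only when most accounts registered on it have many friends.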



Yes, I collected too little data. Yes, this is not really my field: I did many things for the first time and may have made mistakes. Yes, I have already forgotten what Student's coefficient is. And in general, this proves nothing.
Is it possible that users registered on the same day are simply more popular in the blogosphere? Hardly. I suggest we think it over together.

Instead of an epilogue


The funny thing is that the whole chart looks like this

The 2004 anomaly is even larger.


Link to the repository. Don't judge the code too harshly: it was written in a hurry.
I want to give special thanks to my friend a11aud for advice during the investigation.

Source: https://habr.com/ru/post/247929/

