A session of white magic without the exposé, or how I went looking for trolls in LJ
Prologue
For the last year and a half I have felt like this taxi driver in LJ. I first joined LiveJournal more than 10 years ago. Back then probably even Durov had not heard of Facebook, but here you could gather around shared interests, exchange opinions, and write to leading figures like Lebedev. I built up a friends list and began to notice that the Russian segment is not that big and that, by and large, everyone knows everyone. Around 2011 LiveJournal began to fade, the activity moved to Twitter and Facebook, and I began to notice that the crowd of commenters was changing. At first it was imperceptible, but since last year I have caught myself thinking, whenever I open the comments on a post, that I am reminded of that very bearded hermit. Articles with interesting content also began to pop up in the press.
But I don't have much faith in journalists, so, armed with Python, BeautifulSoup, psycopg2, matplotlib and PostgreSQL, I decided to conduct my own mini-investigation and at the same time refresh or acquire some skills.
Appearance and behavior
I won't hide it: I used to enjoy trolling now and then myself, and I am still embarrassed about much of it, but that invaluable (ha-ha) experience let me formulate the main signs of a troll:
Few posts. The troll does not write anything of its own; it feeds in other people's journals and as a result it ...
Few comments received.
Many comments written in other people's journals.
Few "friends". A troll does not start a journal to socialize; often it exists for the sake of a single planted provocation.
But the ones I was looking for were not exactly trolls. They were clearly not loners; they seemed to act together and had more ways to disguise themselves. They could write meaningful posts and have a lot of friends, so while my ugly script was pulling pages from the mobile version of LiveJournal for the top bloggers I had selected, I puzzled over how to process the data afterwards.
The assumption was that a large number of "murzilkas" (fake sock-puppet accounts) would have appeared within a short period, so the registration date was retrieved for each user.
Some time ago LiveJournal introduced the ability to comment via Twitter, Facebook and other services. Seeing how hordes of bots on Twitter drag anything into the trends, I thought this subset of users was promising.
I invented a "murzilka-likeness coefficient": the ratio of comments a user has written to comments the user has received. For the ideal troll this coefficient should tend to infinity, and for a spherical introvert it was expected to equal one. But everything turned out to be a little weird.
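The coefficient described above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository: the function name `murzilka_coefficient` and the handling of accounts with zero received comments are assumptions made here.

```python
import math


def murzilka_coefficient(written: int, received: int) -> float:
    """Ratio of comments written to comments received (hypothetical helper)."""
    if written < 0 or received < 0:
        raise ValueError("comment counts must be non-negative")
    if received == 0:
        # Assumption: an account that writes but never receives comments is
        # the "ideal troll" (infinite ratio); a completely silent account
        # has no defined ratio at all.
        return math.inf if written > 0 else math.nan
    return written / received
```

With this definition, an account that wrote 500 comments and received none gets an infinite coefficient, while one that wrote 40 and received 40 gets exactly one.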
Catch
The script ran through the entire New Year holidays; in the console I sometimes spotted familiar people, some of them already gone. Still, 10 years is a long time ... Over a couple of weeks, picking up three bans along the way, I managed to parse about 11 thousand posts and 2.3 million comments, left by about 90 thousand users on some, but not all, of the entries of 7 top bloggers. That is sparse, probably about 5% of what I originally wanted to parse. A dump of this database can be downloaded here .
It's time to analyze the data. First of all, I decided to plot "murzilka-likeness" against the time of registration.
Hm, yeah ...
Normalization, attempts to factor in the number of the user's own posts, tuning of weights: none of this revealed any anomalies, and it felt more like fitting the solution to the answer. Next, let's look at external users.
Hmm ...
Well, this jump is easy to explain. For example, new services you can log in from may have been added.
Almost resigned to there being no hint of evidence, I decided as a last step to plot a simple relationship between each user's registration date and their current number of friends.
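A plot like this can be produced roughly as follows. This is a hedged sketch, not the code from the repository: it assumes the data has already been loaded as `(registration_date, friend_count)` pairs, and both function names are hypothetical. The first helper also groups users by registration day, which makes it easier to see whether one day stands out.

```python
from collections import defaultdict
from statistics import mean


def friends_by_registration_day(users):
    """Group (registration_date, friend_count) pairs by day.

    Returns a list of (day, number_of_users, mean_friend_count),
    sorted by day. The input layout is an assumption of this sketch.
    """
    buckets = defaultdict(list)
    for registered, friends in users:
        buckets[registered].append(friends)
    return [(day, len(counts), mean(counts))
            for day, counts in sorted(buckets.items())]


def plot_friends_vs_registration(users):
    """Scatter plot: registration date on X, current friend count on Y."""
    # Imported lazily so the grouping helper works without matplotlib.
    import matplotlib.pyplot as plt

    users = list(users)
    dates = [registered for registered, _ in users]
    friends = [count for _, count in users]

    fig, ax = plt.subplots()
    ax.scatter(dates, friends, s=4, alpha=0.5)
    ax.set_xlabel("registration date")
    ax.set_ylabel("current number of friends")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig
```

If accounts were created in bulk and then befriended each other, they would show up as a vertical streak in the scatter plot and as a day with an unusually high mean in the grouped output.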
There they are
Yes, I collected little data. Yes, this is not really my field; I did many things for the first time and could be mistaken. Yes, I have already forgotten what Student's coefficient is. And in general, this proves nothing. Is it possible that users registered on the same day just happen to be more popular in the blogosphere? Hardly. I suggest we think about it together.
Instead of an epilogue
The funny thing is that the full graph looks like this
The 2004 anomaly is even larger.
Link to the repository . Do not judge the code too harshly; it was written in a hurry. Special thanks to my friend a11aud for advice during the investigation.