
Writers and readers - an analysis of the comment structure of the LJ TOP-500, part 1

Start


I continue my series of research posts on the structure of the Russian-language segment of LiveJournal. The first post analyzed the audiences of the top 10 bloggers. While preparing it, I collected a link graph of Russian LJ covering more than 2 million blogs and 58 million links between them. I will come back to that graph in later installments (I have not digested it yet); today is about something else. Namely, about who comments on whom, and how often, in the liveliest corner of LJ: the discussions in the TOP-500 journals.

I took the state of the LiveJournal rating at the beginning of April, cut off the top 500 positions, and collected data as follows. For each blog on the list I requested the 25 most recent posts (available through LJ's standard tools). From each post I extracted the list of commentators (name, comment id, position of the comment in the tree), provided, of course, that comments on the entry were open to outsiders.

LJ's standard tools do not allow extracting comments like this. My attempt at a workaround, pulling RSS feeds from Yandex blog search, ran into very strange and somewhat illogical behavior of its results (this is not a complaint, just a fact), so the comment structure had to be extracted from the journal pages themselves. But it turned out for the better :) And by the way, just in case: the DDoS on LJ was not me :)

As a result, after several days of data collection (the first version of the crawler was not bug-free, and LJ was slow because it was under yet another DDoS at the time), I ended up with the following initial data:

487 journals with at least one commented post;
10546 posts with at least one comment;
809563 comments (excluding anonymous ones), of which 115326 (14.2%) are replies by the journal owners;
114412 commentators, 3884 of whom (3.4%) logged in through external services (Twitter, Facebook, etc.)
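As a quick sanity check, the two percentages quoted in the list can be recomputed from the raw counts. This is only an arithmetic sketch; the helper name `share_percent` is made up for the illustration.

```python
def share_percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)

# Counts quoted in the article.
total_comments = 809563
owner_replies = 115326
commentators = 114412
external_logins = 3884

print(share_percent(owner_replies, total_comments))   # 14.2
print(share_percent(external_logins, commentators))   # 3.4
```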

Next in the program:

1. Statistics on various characteristics of the TOP-500 journals
2. Some non-obvious but curious ratings
3. A search for the answer to "how to become a popular blogger" using cluster and correlation analysis (this, however, will be in the second part of the study)

1 Statistics of journals and publications


Below are the distributions of some statistical characteristics of the journals in the sample. Because of the power-law distribution typical of social networks (the Pareto curve is a special case of it), the histograms have a “long tail”; this tail is collected into the last, extended interval. Along with the arithmetic mean, I will give the median of the series as a more robust estimate of the typical value.
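A minimal sketch of both ideas, binning with an open-ended last interval and comparing mean with median, might look like this. The function name and the sample values are invented for illustration and are not data from the study.

```python
from statistics import mean, median

def histogram_with_tail(values, bin_width, n_bins):
    """Count values into n_bins regular bins plus one open-ended tail bin.

    Bin i covers [i * bin_width, (i + 1) * bin_width); everything at or
    above n_bins * bin_width lands in the final "tail" bin.
    """
    counts = [0] * (n_bins + 1)
    for v in values:
        index = min(int(v // bin_width), n_bins)
        counts[index] += 1
    return counts

# Made-up long-tailed sample (e.g. comments per post), not real data.
sample = [3, 8, 12, 15, 26, 31, 40, 55, 900, 4200]

print(histogram_with_tail(sample, bin_width=20, n_bins=3))  # [4, 2, 2, 2]
print(mean(sample), median(sample))                         # 529 28.5
```

Two outliers are enough to pull the mean to 529 while the median stays at 28.5, which is why the median is the more telling number for this kind of data.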

By the way, an interesting detail. The dependence of the number of friends on a blogger's position in the top is approximated almost perfectly by a power function, with R² = 0.9932. Similar approximations for the number of comments and the number of commentators are much worse: R² = 0.2355 for comments and R² = 0.3074 for commentators.

It would be interesting to look at these numbers again after a while and over more posts. If they tend toward one, that would mean that the blogs with heated comment discussions today are gradually moving to the “head” of the readership top, i.e. that the consolidated rating is “shaking down”.
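The article does not say how the approximation was computed. One common way, assumed here, is ordinary least squares on log-transformed data, with R² measured in log-log space. The function name `fit_power_law` and the synthetic input are illustrative only.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b, r2), where r2 is the coefficient of determination
    of the straight-line fit in log-log space.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                    # slope = exponent of the power law
    a = math.exp(my - b * mx)        # intercept = log of the multiplier
    r2 = (sxy * sxy) / (sxx * syy)   # squared correlation of the logs
    return a, b, r2

# Synthetic check: an exact power law should give R^2 of (almost) 1.
positions = list(range(1, 501))
friends = [50000 * p ** -0.8 for p in positions]
a, b, r2 = fit_power_law(positions, friends)
print(round(a), round(b, 3), round(r2, 4))
```

With real data the same call would take rating positions as `xs` and friend, comment or commentator counts as `ys`; all values must be positive for the logarithm to exist.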

1.1 Publications, comments, commentators

The two histograms below show how publications (by all authors) are distributed by the number of comments and by the number of unique commentators.



In the sample, only 198 posts have between 500 and 1000 comments, and 69 gathered more than 1000. A typical post, even by a top blogger, gets 26 comments (by median).

Of course, posts by the “top of the top” collect more comments; this shows in how the median number of comments changes for different rating cut-offs. The larger the sample, the more these figures are diluted:

TOP-10 211
TOP-30 149
TOP-100 70
TOP-200 44
TOP-500 26
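A table like this can be produced by taking the median over all posts whose journal falls within each cut-off. The sketch below assumes posts are available as (rating position, comment count) pairs; the function name and toy data are invented.

```python
from statistics import median

def median_by_cutoff(posts, cutoffs):
    """posts: iterable of (rating_position, comment_count) pairs.

    Returns {cutoff: median comment count over posts whose journal
    sits at rating_position <= cutoff}, or None for an empty slice.
    """
    posts = list(posts)
    result = {}
    for cutoff in cutoffs:
        counts = [c for pos, c in posts if pos <= cutoff]
        result[cutoff] = median(counts) if counts else None
    return result

# Toy data: (journal's position in the rating, comments on one post).
toy_posts = [(1, 300), (1, 180), (2, 250), (5, 90), (5, 40), (9, 20), (9, 10)]
print(median_by_cutoff(toy_posts, [2, 5, 10]))  # {2: 250, 5: 180, 10: 90}
```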

The same picture for the number of unique commentators in each publication.



A typical entry in the sample has 16 “negotiators”. More than a hundred people gathered on only 725 publications (6.85% of all); of these, 42 entries (0.4%) drew between 500 and 1000 commentators, and as few as 4 entries collected more than 1000 readers with something to say.

1.2 Authors and their admirers - analysis of the discussion audience

It is very likely (and I will try to test this in the second part of the study) that interest in a journal owes a lot to the nature of user activity in its comments: the presence of a permanent audience, the involvement of the journal's author in the discussion, and the existence of actual discussions rather than just "fterku" and "Down" comments.

For example, the activity of a journal's author can be estimated by the share of the author's replies in the total number of comments. The distribution of authors by this measure is shown in the histogram:



So, a reply share of 50% means that the author answered every visitor comment. Accordingly, a 20% share means that the author answered every fourth visitor comment (yes, the fourth, not the fifth: out of every five comments, one is the author's and four are visitors'). The average across all journals is about 16% replies, i.e. the abstract author replies to roughly every fifth comment.
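The "fourth, not fifth" arithmetic follows from the fact that the share is taken over all comments, the author's included. If the author's share is s, there are (1 - s) / s visitor comments per author reply. A small sketch, with a made-up function name:

```python
def visitor_comments_per_reply(owner_share: float) -> float:
    """Visitor comments per one owner reply, given the owner's share
    of ALL comments (owner replies / total comments)."""
    if not 0 < owner_share < 1:
        raise ValueError("owner_share must be strictly between 0 and 1")
    return (1 - owner_share) / owner_share

for share in (0.5, 0.2, 0.16):
    print(share, round(visitor_comments_per_reply(share), 2))
```

For shares of 50%, 20% and 16% this gives roughly 1, 4 and 5.25 visitor comments per reply.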

Commentators

Journals can be ranked by the number of unique commentators, i.e. by the audience that not only reads but also takes part in the discussions.
Number of commentators    Number of journals
0 - 200                   206
200 - 400                 118
400 - 600                 65
600 - 800                 34
800 - 1000                11
more than 1000            53

The average journal from the TOP-500 has about 260 commentators (over its last 25 posts, of course).

To isolate the core of the commentators, we make three additional (and very telling) cuts and give the average values obtained for them:
1. 61% of blog commentators left only one comment in the journal.
2. 29% left 2-4 comments
3. and only 10% of commentators actively take part in the life of a blog, leaving 5 or more comments
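For a single journal, the three groups can be computed from a flat list of comment authors, as in the sketch below; the article's figures are then averages of such shares over all journals. The function name is invented, and whether the owner's own replies were excluded is not stated in the article, so the sketch simply counts whatever names it is given.

```python
from collections import Counter

def activity_shares(comment_authors):
    """comment_authors: one entry per comment (the commentator's name).

    Returns the share of commentators with exactly 1 comment,
    with 2-4 comments, and with 5 or more comments.
    """
    per_user = Counter(comment_authors)
    buckets = {"1": 0, "2-4": 0, "5+": 0}
    for n in per_user.values():
        if n == 1:
            buckets["1"] += 1
        elif n <= 4:
            buckets["2-4"] += 1
        else:
            buckets["5+"] += 1
    total = len(per_user)
    if total == 0:
        return {key: 0.0 for key in buckets}
    return {key: count / total for key, count in buckets.items()}

# Toy journal: two one-off commentators, one with 3 comments, one with 6.
authors = ["a", "b"] + ["c"] * 3 + ["d"] * 6
print(activity_shares(authors))  # {'1': 0.5, '2-4': 0.25, '5+': 0.25}
```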

Discussions

The most interesting part, in my opinion, is measuring how attractive a journal is for discussion. To find a journal for someone who likes to chat, one can devise many different metrics; fortunately, comments form a tree, a tree is a graph, and on graphs you can compute a lot of things.

After brief reflection, I took the following indicator: the average number of comments in a thread. A very clear indicator, but not an illustrative one, because the average will hover around two at best, or even slide down toward one.

Therefore, let us take the number of threads in a journal with more than N comments. For simplicity, N is taken as half the median of the journals' maximum thread lengths. With a median of 22 comments in the longest thread, N = 11.
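The counting could be sketched as follows. Two things here are assumptions rather than facts from the article: a "thread" is taken to be a top-level comment together with all of its descendants, and its length is the number of comments in it. The input format of (comment id, parent id) pairs and both function names are also hypothetical.

```python
from collections import Counter
from statistics import median

def thread_sizes(comments):
    """comments: list of (comment_id, parent_id) pairs; parent_id is None
    for a top-level comment. Returns {root_id: comments in that thread}."""
    parent = dict(comments)

    def root_of(cid):
        # Walk up the reply chain until a top-level comment is reached.
        while parent[cid] is not None:
            cid = parent[cid]
        return cid

    return Counter(root_of(cid) for cid, _ in comments)

def heavy_thread_counts(journals):
    """journals: {name: list of (comment_id, parent_id)}.

    N is half the median of the journals' longest threads. Returns
    (N, {name: number of threads with more than N comments})."""
    sizes = {name: list(thread_sizes(c).values()) for name, c in journals.items()}
    n = median(max(s) for s in sizes.values() if s) / 2
    return n, {name: sum(1 for x in s if x > n) for name, s in sizes.items()}

# Toy journals: longest threads are 4, 2 and 1 comments, so N = 2 / 2 = 1.
journals = {
    "a": [(1, None), (2, 1), (3, 2), (4, 1), (5, None)],
    "b": [(10, None), (11, 10)],
    "c": [(20, None)],
}
n, heavy = heavy_thread_counts(journals)
print(n, heavy)
```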

Number of "heavy" threads    Number of journals
0 - 10                       346
10 - 20                      69
20 - 30                      21
30 - 40                      14
40 - 50                      5
50 - 100                     19
more than 100                13

The average journal has only 4 threads with more than 11 comments.

2 Additional ratings


I will give several additional ratings (the top three positions of each) based on the commenting indicators discussed above.

Number of comments (total)

Journal          Number of comments
nikitabesogon    42752
alexsword        33057
krispotupchik    15465

Commenting audience (total)

Journal       Commentators, total
pesen_net     5989
toster        4626
mzadornov     4184

Number of replies by the journal owner (total and as a share of the number of comments)

Journal           Replies, total    Replies, % of the number of comments
mcheburashkina    4799              40.5%
alexsword         4221              12.8%
kitya             3351              42.5%

The core of the audience (total and as a share of all commentators)

Journal          Commentators    % of total
nikitabesogon    835             23.1%
navalny          827             27.0%
fritzmorgen      610             22.9%

The core of the audience is the number of commentators who left 5 or more comments in the journal.

Pause...


This concludes the first part of the study. In the second part, I will try to put forward a couple of hypotheses and confirm or refute them, and also look for similarities within such a motley crowd of bloggers :) In a week. Stay tuned, as they say!

Cross-post in LiveJournal: infist-xxi.livejournal.com/79250.html

Source: https://habr.com/ru/post/118870/

