Start
I am continuing a series of studies on the structure of the Russian-language segment of LiveJournal. The first publication analyzed the audiences of the top 10 bloggers. While preparing it, I collected a link graph of Russian LJ covering more than 2 million blogs and 58 million links between them. I will return to this graph in the next installment (I have not fully made sense of it yet). Today's topic is something else: who comments on whom, and how often, in the busiest corner of LJ, the discussions in the TOP-500 journals.
I took the LiveJournal rating as of early April, cut off its top 500 positions, and collected data as follows. For each blog on the list I requested its 25 most recent posts (available through LJ's standard means). From each post I extracted the list of commentators (name, comment id, position of the comment in the tree), provided, of course, that comments on the post are open to outsiders.
LiveJournal's standard means do not provide this, and my attempt at a workaround, pulling RSS feeds from Yandex blog search, ran into some very strange and somewhat illogical behavior of the search results (this is not a complaint, just a fact). So the comment structure had to be extracted from the journal pages themselves. It turned out for the better :) By the way, just in case: the DDoS on LJ is not me :)
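The collection step can be sketched as below. This is not the author's crawler (its code is not published): `fetch_recent_posts` and `fetch_comments` are hypothetical callables standing in for page downloading and HTML parsing, and the `Comment` record with a `parent_id` field is an assumed way of storing "the place of the comment in the tree".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# One raw comment as a hypothetical page parser might return it: (comment_id, parent_id, author)
RawComment = Tuple[int, Optional[int], Optional[str]]


@dataclass(frozen=True)
class Comment:
    journal: str               # journal (blog) the post belongs to
    post_id: int
    comment_id: int
    parent_id: Optional[int]   # None for a top-level comment; encodes the place in the tree
    author: str


def collect(
    journals: Iterable[str],
    fetch_recent_posts: Callable[[str, int], List[int]],               # hypothetical helper
    fetch_comments: Callable[[str, int], Optional[List[RawComment]]],  # hypothetical helper
    posts_per_journal: int = 25,
) -> List[Comment]:
    """Walk the rating, request the latest posts of each journal, flatten their comments."""
    records: List[Comment] = []
    for journal in journals:
        for post_id in fetch_recent_posts(journal, posts_per_journal):
            raw = fetch_comments(journal, post_id)
            if raw is None:         # comments closed to outsiders: skip the post
                continue
            for comment_id, parent_id, author in raw:
                if author is None:  # anonymous comments are excluded, as in the article's counts
                    continue
                records.append(Comment(journal, post_id, comment_id, parent_id, author))
    return records
```

Injecting the fetchers as parameters keeps the traversal logic testable offline, independent of how the pages are actually downloaded and parsed.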
As a result, after several days of data collection (the first version of the crawler was not bug-free, and LJ was slow because it was under yet another DDoS at the time), the initial data were as follows:
487 journals with at least one commented post;
10546 posts with at least one comment;
809563 comments (excluding anonymous ones), of which 115326 (14.2%) are replies by the journal owners;
114412 commentators, of whom 3884 (3.4%) logged in via external services (Twitter, Facebook, etc.).
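Totals like these can be derived from flat comment records roughly as follows. This is a sketch under assumptions not stated in the source: an owner reply is recognized by the author name matching the journal name, the journal owner is counted among the commentators, and externally logged-in users are detected by a hypothetical `ext_` name prefix passed in as a replaceable predicate.

```python
from collections import namedtuple

# Minimal record for this sketch; anonymous comments are assumed to be dropped already.
Rec = namedtuple("Rec", "journal post_id author")


def summarize(records, is_external=lambda name: name.startswith("ext_")):
    """Compute the headline counts of the dataset from a list of Rec."""
    journals = {r.journal for r in records}
    posts = {(r.journal, r.post_id) for r in records}
    # Assumption: an owner reply is a comment whose author equals the journal name.
    owner_replies = sum(1 for r in records if r.author == r.journal)
    commenters = {r.author for r in records}  # owners included (an assumption)
    external = {a for a in commenters if is_external(a)}  # hypothetical "ext_" convention
    return {
        "journals": len(journals),
        "posts": len(posts),
        "comments": len(records),
        "owner_replies": owner_replies,
        "owner_share": owner_replies / len(records),
        "commenters": len(commenters),
        "external": len(external),
        "external_share": len(external) / len(commenters),
    }
```

As a cross-check of the reported percentages: 115326 / 809563 ≈ 0.142 and 3884 / 114412 ≈ 0.034.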
The plan:
1. Statistics on various characteristics of the TOP-500 journals.
2. A few implicit but curious ratings.
3. A search for the answer to "how to become a popular blogger" using cluster and correlation analysis (this will come in the second part of the study).
1 Statistics of journals and publications
Below are the distributions of several statistical characteristics of the journals in the sample. Social networks typically show power-law distributions (the Pareto distribution is a special case), whose histograms have a "long tail"; in the histograms here, this tail is collected into a final, wider interval. Alongside the arithmetic mean, I give the median of the series as a more robust estimate of the typical value.
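Collapsing the tail into one open-ended bin, and the gap between mean and median on such data, can be illustrated with a short sketch. The bin width and the last edge are parameters chosen for illustration (the tables later in the article use bins like "0 - 200 ... more than 1000"); the values are non-negative integers.

```python
import statistics


def binned_counts(values, width, last_edge):
    """Equal-width bins [0, width), [width, 2*width), ... up to last_edge,
    with every value >= last_edge collapsed into one final open-ended bin."""
    n_bins = last_edge // width
    counts = [0] * (n_bins + 1)
    for v in values:
        counts[min(v // width, n_bins)] += 1  # long-tail values all land in the last bin
    return counts


vals = [5, 150, 250, 999, 1000, 5000]
print(binned_counts(vals, 200, 1000))                 # [2, 1, 0, 0, 1, 2]
print(statistics.mean(vals), statistics.median(vals))  # 1234 624.5
```

A single large value (5000) drags the mean far above the median, which is why the median is the more robust figure for heavy-tailed data.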
An interesting detail, by the way: the number of friends as a function of a blogger's position in the rating is almost perfectly approximated by a power function, with R² = 0.9932. Similar approximations for the number of comments and commentators are much worse: R² = 0.2355 for comments and R² = 0.3074 for commentators.
It would be interesting to look at these numbers again after a while and over more posts. If they tend toward one, that would mean that blogs with heated discussions today are gradually moving toward the "head" of the readership-based rating, i.e. the consolidated rating is "settling down".
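The article does not say how the power-law fits were made. A common approach, assumed in this sketch, is to fit y = a·x^b by ordinary least squares on (ln x, ln y) and report R² in that log-log space; here x would be the rating position and y the number of friends, comments, or commentators.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y); xs and ys must be positive.
    Returns (a, b, r2), with R^2 computed in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                # slope in log space = exponent of the power law
    ln_a = my - b * mx           # intercept in log space = ln of the prefactor
    ss_res = sum((v - (ln_a + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    return math.exp(ln_a), b, 1 - ss_res / ss_tot


ranks = list(range(1, 11))
a, b, r2 = fit_power_law(ranks, [1000 * r ** -0.8 for r in ranks])
```

On exact power-law data this recovers a ≈ 1000, b ≈ -0.8 and R² ≈ 1; an R² around 0.2 to 0.3, as for comments, means the rank explains little of the variation in log space.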
1.1 Publications, comments, commentators
The two histograms below show the distribution of two characteristics of posts (across all authors): the number of comments and the number of unique commentators.

The sample contains only 198 posts with 500 to 1000 comments, and 69 posts with more than 1000 comments. A typical post, even by a top blogger, gets 26 comments (median).
Naturally, posts from the very top of the rating collect more comments; this is visible in how the median number of comments per post changes for different cut-offs of the rating. The wider the cut-off, the lower the median:
Rating cut-off | Median comments per post |
TOP-10 | 211 |
TOP-30 | 149 |
TOP-100 | 70 |
TOP-200 | 44 |
TOP-500 | 26 |
The number of unique commentators per post shows the same picture.
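The cut-off medians above can be computed with a sketch like this one; the input format, pairs of (journal rank, comments on the post), is an assumption made for illustration.

```python
from statistics import median


def median_by_cutoff(posts, cutoffs=(10, 30, 100, 200, 500)):
    """posts: iterable of (journal_rank, n_comments) pairs, rank 1 = top of the rating.
    Returns {cutoff: median comments over posts of journals ranked <= cutoff}."""
    posts = list(posts)
    result = {}
    for k in cutoffs:
        sample = [c for rank, c in posts if rank <= k]
        if sample:  # skip cut-offs with no posts at all
            result[k] = median(sample)
    return result


print(median_by_cutoff([(1, 300), (5, 100), (20, 50), (150, 10), (400, 4)], (10, 30, 500)))
# {10: 200.0, 30: 100, 500: 50}
```

Because every wider cut-off includes the narrower ones plus many less-commented journals, the median can only be pulled down as the cut-off grows in data shaped like the article's.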

A typical post has 16 "interlocutors" (median). More than a hundred people took part in only 725 posts (6.85% of all); of these, 42 posts (0.4%) had between 500 and 1000 commentators, and as many as 4 posts gathered more than 1000 readers with something to say.
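Counting unique commentators per post, and the share of posts above a threshold, can be sketched as follows. The (journal, post_id, author) record format is an assumption of this sketch.

```python
from collections import defaultdict


def unique_commenters_per_post(records):
    """records: iterable of (journal, post_id, author), one per comment.
    Returns {(journal, post_id): number of distinct authors}."""
    seen = defaultdict(set)
    for journal, post_id, author in records:
        seen[(journal, post_id)].add(author)  # a set ignores repeat comments by one person
    return {post: len(authors) for post, authors in seen.items()}


def share_above(counts, threshold):
    """Number and share of posts whose unique-commenter count exceeds threshold."""
    n = sum(1 for c in counts.values() if c > threshold)
    return n, n / len(counts)
```

With the article's data, `share_above(counts, 100)` would give the number of posts with more than a hundred participants and their share of all posts.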
1.2 Authors and their admirers - analysis of the discussion audience
It is quite likely (and I will try to establish this in the second part of the study) that interest in a journal depends substantially on the nature of user activity in its comments: the presence of a permanent audience, the author's involvement in the discussion, and the existence of actual discussions rather than just throwaway comments like "fterku" and "Down".
For example, the author's activity can be estimated as the share of the author's replies in the total number of comments. The distribution of authors by this measure is shown in the histogram:

So a response share of 50% means that the author replied to every visitor comment. Accordingly, a 20% share means that the author replied to every fourth visitor comment (yes, the fourth, not the fifth), because the author's own replies are counted in the total. The average across all journals is about 16% replies, i.e. the notional average author replies to roughly every fifth comment.
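The "fourth, not fifth" arithmetic follows from the share being taken over all comments, replies included. With a owner replies and v visitor comments, share = a / (a + v), so v / a = (1 - share) / share. A small helper (its name is illustrative) makes the conversion explicit:

```python
def visitor_comments_per_reply(share):
    """Convert the owner's share of all comments (replies included) into
    'one reply per how many visitor comments'.
    With a owner replies and v visitor comments: share = a / (a + v),
    hence v / a = (1 - share) / share."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return (1 - share) / share


for s in (0.5, 0.2, 0.16):
    print(s, round(visitor_comments_per_reply(s), 2))
# 0.5 1.0
# 0.2 4.0
# 0.16 5.25
```

So 16% corresponds to one reply per about 5.25 visitor comments, which matches "roughly every fifth comment".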
Commentators
Journals can be ranked by the number of unique commentators, i.e. by the audience that not only reads but also takes part in discussing what is written.
Number of commentators | Number of journals |
0 - 200 | 206 |
200 - 400 | 118 |
400 - 600 | 65 |
600 - 800 | 34 |
800 - 1000 | 11 |
more than 1000 | 53 |
The average journal in the TOP-500 has about 260 commentators (over its last 25 posts, of course).
To isolate the core of the commentators, I split them into three groups (a very telling breakdown) and give the average values for each:
1. 61% of a blog's commentators left only one comment in the journal;
2. 29% left 2-4 comments;
3. only 10% of commentators take an active part in the life of the blog, leaving 5 or more comments.
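The three-way split can be reproduced per journal and then averaged across journals, as the article reports averaged values. In this sketch the (journal, author) record format and the exclusion of the owner (author equal to the journal name) are assumptions.

```python
from collections import Counter
from statistics import mean


def audience_profile(records, journal):
    """records: list of (journal, author) pairs, one per comment.
    Returns the number of unique commenters of `journal` and the shares of them
    who left 1, 2-4 and 5+ comments, or None if nobody commented."""
    # Assumption: the owner (author == journal name) is not counted as a commenter.
    per_author = Counter(a for j, a in records if j == journal and a != journal)
    total = len(per_author)
    if total == 0:
        return None
    counts = per_author.values()
    return {
        "commenters": total,
        "one": sum(1 for n in counts if n == 1) / total,
        "two_to_four": sum(1 for n in counts if 2 <= n <= 4) / total,
        "core": sum(1 for n in counts if n >= 5) / total,  # the "audience core"
    }


def average_profile(records, journals):
    """Average each share over the journals that have at least one commenter."""
    profiles = [p for p in (audience_profile(records, j) for j in journals) if p]
    return {k: mean(p[k] for p in profiles) for k in ("one", "two_to_four", "core")}
```

Averaging per-journal shares (rather than pooling all commenters) gives each journal equal weight, which is one plausible reading of "average values" in the text.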
Discussions
The most interesting thing, in my opinion, is measuring how attractive a journal is for discussion. To find a journal for someone who loves a good chat, you can come up with many different metrics, since the comments form a tree, a tree is a graph, and a lot of things can be computed on graphs.
After a brief reflection, I chose the following indicator: the average number of comments per thread. It is a very clear indicator, but not a telling one, because the average hovers around two at best and can even slide down to one.
So instead I take the number of threads in the journal with more than N comments. For simplicity, N is taken as half the median of the maximum thread lengths; since that median is 22 comments, N = 11.
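A sketch of the "heavy thread" count follows. The source does not define a thread precisely; here it is assumed to be a top-level comment plus all of its replies, and the maximum thread length is assumed to be taken per journal before computing the median.

```python
from statistics import median


def thread_sizes(comments):
    """comments: list of (comment_id, parent_id) for one post; parent_id is None
    for top-level comments. Returns {root_id: number of comments in that thread}."""
    parent = dict(comments)
    sizes = {}
    for cid in parent:
        root = cid
        # Climb to the top-level ancestor; a parent missing from the data
        # (e.g. a dropped anonymous comment) ends the climb and acts as the root.
        while parent.get(root) is not None:
            root = parent[root]
        sizes[root] = sizes.get(root, 0) + 1
    return sizes


def heavy_thread_counts(journals):
    """journals: {journal: [post_comments, ...]}, each post_comments being the
    (comment_id, parent_id) list of one post. Returns (N, {journal: heavy_count})."""
    per_journal = {j: [s for post in posts for s in thread_sizes(post).values()]
                   for j, posts in journals.items()}
    max_len = [max(sizes) for sizes in per_journal.values() if sizes]
    n = median(max_len) / 2  # in the article: median 22, so N = 11
    counts = {j: sum(1 for s in sizes if s > n) for j, sizes in per_journal.items()}
    return n, counts
```

Tying N to the median of the longest threads makes the threshold adapt to the sample instead of being picked by hand.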
Number of "heavy" threads | Number of journals |
0 - 10 | 346 |
10 - 20 | 69 |
20 - 30 | 21 |
30 - 40 | 14 |
40 - 50 | 5 |
50 - 100 | 19 |
more than 100 | 13 |
The average journal has only 4 threads with more than 11 comments.
2 Additional ratings
Below are several additional ratings (top three positions each), based on the commenting indicators discussed above.
Number of comments (total)
Commenter audience (total)
Number of replies by the journal owner (total, and share of all comments)
Audience core (total, and share of all commentators)
The audience core is the set of commentators who left 5 or more comments in the journal.
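Each of these ratings is a sort of the journals by one metric, keeping the first three places. A minimal sketch, assuming the per-journal metrics have already been computed into a dictionary (the metric names are illustrative):

```python
def top_n(stats, key, n=3):
    """stats: {journal: {metric_name: value}}. Returns the n journals with the
    largest value of `key` as (journal, value) pairs; ties are broken by name."""
    ranked = sorted(stats.items(), key=lambda item: (-item[1][key], item[0]))
    return [(journal, metrics[key]) for journal, metrics in ranked[:n]]


stats = {
    "a": {"comments": 10, "core_share": 0.1},
    "b": {"comments": 30, "core_share": 0.05},
    "c": {"comments": 20, "core_share": 0.2},
    "d": {"comments": 20, "core_share": 0.0},
}
print(top_n(stats, "comments"))       # [('b', 30), ('c', 20), ('d', 20)]
print(top_n(stats, "core_share", 2))  # [('c', 0.2), ('a', 0.1)]
```

Ranking by a total and by a share can give quite different leaders, which is why the ratings above list both.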
Pause...
This concludes the first part of the study. In the second part I will try to put forward a couple of hypotheses, confirm or refute them, and look for similarities within such a motley crowd of bloggers :) See you in a week. Stay tuned, as they say!