
About Habrahabr, statistics and cakes

Lyrical digression


Hello!
One dark winter evening I had nothing to do and was killing time reading my beloved Habrahabr. In the comments, the familiar complaint that Habr is "no longer a cake" (that is, not what it used to be) came up once again.

Statistics, statistics and again statistics


I wondered whether the quality of articles on Habr could be evaluated numerically, and whether such an estimate would show how quality has changed over time, or whether all these comments are nothing more than grumbling that things used to be better and the grass greener. It was evening and there was nothing to do, so I pulled myself together and wrote a simple bot that leisurely crawled nearly 2,800 pages of the Habr main page and collected statistics on the articles that had reached it, from Habr's launch until December 31, 2009.
The traditional attention-grabbing picture, a graph of the number of articles by month:



Theme


The very first thought that came to mind was to check how Habr's thematic content has changed over its existence. As is known, blogs on Habrahabr are divided into categories, which can be found here. To start, I counted the number of articles in each category by year (the monthly statistics were too noisy, so I had to give them up). Unfortunately, not all blogs have a category; for those it is marked as "n/a".

More clearly, the same data can be represented in the form of pie charts:




The positive trend is obvious: the amount of offtopic on Habr has decreased and there is more specialized content. The share of programming has grown considerably. But hardware, which is widely perceived to have become more prominent lately, has in fact hardly grown, although perhaps, thanks to the efforts of the same Bumburum, the quality of hardware articles has nevertheless risen.
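The per-year category counts behind the charts above could be computed roughly as follows. This is only a minimal sketch, not the bot's actual code: the record layout (a year and a category name per article) and the sample values are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (year, category) per article. The category is None
# when the blog has no category assigned. These values are made up.
articles = [
    (2007, "Offtopic"),
    (2007, None),
    (2008, "Programming"),
    (2008, "Programming"),
    (2008, "Hardware"),
]

def count_by_year_and_category(records):
    """Return {year: Counter({category: article_count})}."""
    counts = defaultdict(Counter)
    for year, category in records:
        # Blogs without a category are grouped under "n/a".
        counts[year][category or "n/a"] += 1
    return counts

def shares(counter):
    """Convert absolute counts into fractions of the yearly total."""
    total = sum(counter.values())
    return {category: n / total for category, n in counter.items()}

by_year = count_by_year_and_category(articles)
for year in sorted(by_year):
    print(year, dict(by_year[year]), shares(by_year[year]))
```

The fractions returned by `shares` are what a pie chart for a given year would display.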

Ratings


How has the quality of the spherical-article-in-a-vacuum on Habr's main page changed over its existence? The first thing that comes to mind is to calculate the average rating of such an article. The following chart shows this estimate by month:

The peak we see in August 2008 is none other than the launch of SuperHabr and the introduction of invites.
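For reference, the monthly average rating can be obtained with a simple group-and-average pass. The sketch below assumes each collected article is stored as a publication date plus a rating; the field layout and the sample numbers are illustrative, not taken from the real data set.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample: (publication date, rating) per article.
articles = [
    (date(2008, 7, 3), 12),
    (date(2008, 7, 20), 30),
    (date(2008, 8, 1), 45),
]

def monthly_mean(records):
    """Return {(year, month): mean value} for (date, value) records."""
    totals = defaultdict(lambda: [0, 0])  # (year, month) -> [sum, count]
    for published, value in records:
        bucket = totals[(published.year, published.month)]
        bucket[0] += value
        bucket[1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}

mean_rating = monthly_mean(articles)
print(mean_rating)
```

The helper is not specific to ratings; any per-article number can be averaged the same way.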

Comments


Another interesting indicator is the average number of comments per article:

Everything is predictable: unlike articles, comments can be left by all registered users, so the introduction of invites stopped the growth of this indicator. The average number of comments reflects the size of Habr's active audience well. Oh yes, the peak on the left is the only article from July 2006, which is still being commented on; after all, it is the very first.
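A peak like the July 2006 one is what happens when a month's average rests on a single article. One way to make such months visible is to report the article count next to the mean. The following is a sketch under assumptions: the "YYYY-MM" month keys, the sample numbers, and the `min_articles` threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample: ("YYYY-MM", comment count) per article.
articles = [
    ("2006-07", 400),
    ("2006-08", 10),
    ("2006-08", 14),
    ("2006-08", 6),
]

def comments_per_month(records, min_articles=2):
    """Return {month: (mean comments, article count, enough_articles)}.

    min_articles is an arbitrary illustrative threshold, not a value
    from the original analysis.
    """
    grouped = defaultdict(list)
    for month, comments in records:
        grouped[month].append(comments)
    return {
        month: (mean(values), len(values), len(values) >= min_articles)
        for month, values in sorted(grouped.items())
    }

stats = comments_per_month(articles)
for month, (avg, n, enough) in stats.items():
    note = "" if enough else " (too few articles to trust the average)"
    print(f"{month}: {avg} comments on average over {n} article(s){note}")
```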

Holy wars


One of the most interesting questions I asked myself before starting this article was this: have there really been more controversial topics on Habr lately, ones that stir up a storm of emotions in readers and a desire to beat up their opponents? How can this be measured? After much deliberation, I decided that, with some error, it can be approximated by the ratio of the number of negative votes on an article to the total number of votes. So I called an article "controversial" if its "minuses" make up more than one third of all votes. The following graph shows controversial articles in red, and the blue line shows all articles:

That is hard to see, so let's calculate controversial articles as a share of the total:

Here it is much clearer: the share of controversial articles is growing and has now almost reached the maximum observed before invites were introduced (back then rumors circulated around Habr about a botnet that downvoted articles its creator disliked and upvoted the ones its creator approved of). The introduction of invites and new rules slowed this process down, but not for long. This is probably the only alarm bell I found after analyzing the collected data.
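The "controversial" rule and the monthly share can be written down in a few lines. This is a sketch that assumes plus and minus counts are available for each article; the record layout and the sample votes are invented for illustration.

```python
from collections import defaultdict

def is_controversial(plus, minus):
    """True when minuses are more than one third of all votes."""
    total = plus + minus
    # Integer comparison avoids floating-point issues at the 1/3 boundary.
    return total > 0 and 3 * minus > total

# Hypothetical sample: ("YYYY-MM", plus votes, minus votes) per article.
articles = [
    ("2009-11", 40, 5),
    ("2009-11", 10, 9),
    ("2009-12", 6, 6),
    ("2009-12", 30, 15),  # exactly one third of votes are minuses: not controversial
]

def controversial_share(records):
    """Return {month: fraction of that month's articles that are controversial}."""
    counts = defaultdict(lambda: [0, 0])  # month -> [controversial, all]
    for month, plus, minus in records:
        counts[month][0] += int(is_controversial(plus, minus))
        counts[month][1] += 1
    return {month: c / n for month, (c, n) in sorted(counts.items())}

share = controversial_share(articles)
print(share)
```

Note that the threshold is strict: an article with exactly one third of its votes negative does not count as controversial under this reading of the rule.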

Findings


It is clear that Habr's whole life can be divided into two parts: in August 2008, with the introduction of the new engine and rules, the project matured and stabilized. 2009 was the project's first year of adult life, and it went just fine: the number and quality of articles grew, not to mention the traffic.
However, not everything is so smooth in the Danish kingdom. Something needs to be done about articles that are downvoted simply because they mention a technology some fan dislikes or, on the contrary, upvoted because they praise a fan's sacred cow. The concept of articles hidden for blog subscribers, IMHO, does not justify itself. However, the answers to the questions "who is to blame?" and "what is to be done?" go far beyond the scope of this article, so I will stop here. My only remark is that Habr's new management will have to consider this issue seriously.

Post Scriptum


If readers have any ideas about how else the collected data could be analyzed, write to me; I will be happy to hear them.

Source: https://habr.com/ru/post/80948/

