Analysis of publications on Habrahabr over the past six months. Statistics, useful finds and ratings

No one had collected statistics on Habrahabr posts for a long time, so we at Cloud4Y decided to find out what has changed over the past six months. We were interested in:

And much more…


What did we do?

On April 24, 2017, we collected statistics on all recent publications on Habrahabr. It turned out that in the period from September 20, 2016 to April 22, 2017:

To keep the statistics reliable, we removed from the more than 6.5k publications those whose view counts fall outside the central 99.7% of the normal distribution of views, i.e. beyond three standard deviations from the mean. Such posts can skew the statistics by strongly affecting the means and standard deviations used in further calculations. We kept removing publications until all the remaining ones fit within $\mu + 3\sigma$. In total, 7.4% of the publications (the "super-publications") were removed.
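The cleaning step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Cloud4Y's actual procedure: the function name `trim_above_three_sigma` is hypothetical, only the upper tail is trimmed (the text mentions only $\mu + 3\sigma$), and the population standard deviation is used. Removing an extreme post lowers both the mean and the standard deviation, so the limit is recomputed after every pass until no post exceeds it.

```python
import statistics


def trim_above_three_sigma(views):
    """Repeatedly drop view counts above mean + 3 * stdev until none remain.

    Hypothetical helper. Assumptions: `views` is a list of per-post view
    counts, only the upper tail is trimmed, and the population standard
    deviation (statistics.pstdev) is used.
    """
    kept = list(views)
    while len(kept) > 1:
        mu = statistics.fmean(kept)
        sigma = statistics.pstdev(kept)
        limit = mu + 3 * sigma
        filtered = [v for v in kept if v <= limit]
        if len(filtered) == len(kept):
            break  # every remaining post is within the limit
        kept = filtered  # recompute mu and sigma on the reduced set
    return kept


removed_share = 1 - len(trim_above_three_sigma([100] * 30 + [1000, 100000])) / 32
print(f"removed {removed_share:.1%} of posts")
```

For example, with thirty posts at 100 views plus posts at 1,000 and 100,000 views, the first pass removes only the 100,000-view post (it inflates $\sigma$ enough to hide the 1,000-view one), and the second pass removes the 1,000-view post. This is why a single pass is not enough.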

Normal Density

Control chart of publication views before data cleansing

Control chart of publication views after data cleansing

A useful find # 1

Among the cleaned data set

A useful find # 2

Pairwise correlation of key indicators
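Pairwise correlations like these can be computed with a few lines of Python. This is a sketch under assumptions: the article does not say which coefficient was used, so Pearson's $r$ is assumed here, and the metric names (`views`, `comments`, `favorites`) and the toy numbers are hypothetical placeholders, not the collected data.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def pairwise_correlations(metrics):
    """Map every pair of metric names to the Pearson r between their columns."""
    return {(a, b): pearson(metrics[a], metrics[b]) for a, b in combinations(metrics, 2)}


# Hypothetical toy data: one value per publication for each metric.
metrics = {
    "views": [1200, 3400, 2100, 5600, 800],
    "comments": [5, 30, 12, 41, 2],
    "favorites": [10, 25, 18, 60, 4],
}
for (a, b), r in pairwise_correlations(metrics).items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

Pearson's $r$ measures only linear association; for heavily skewed metrics such as views, a rank correlation could be a reasonable alternative.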

A useful find # 3

The best days of the week

Best time

Taking the day of the week into account, the best time to publish is the night from Friday to Saturday or Sunday evening.
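One way to arrive at such a recommendation is to group posts by publication slot and compare average views. The source does not describe its exact method, so this Python sketch is an assumption: the `(published_at, views)` record layout, the function name `best_slots`, and the hourly granularity are hypothetical. The `min_posts=30` default mirrors the "more than 30 results per section" threshold from the conclusions, so that no slot is ranked on a handful of posts.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def best_slots(posts, top=3, min_posts=30):
    """Rank (weekday, hour) publication slots by mean views.

    posts: iterable of (published_at: datetime, views: int) pairs
    (hypothetical layout). Slots with fewer than `min_posts` posts are
    skipped so that each average rests on a reasonably large sample.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for published_at, views in posts:
        slot = (DAYS[published_at.weekday()], published_at.hour)
        totals[slot] += views
        counts[slot] += 1
    means = {
        slot: totals[slot] / counts[slot]
        for slot in totals
        if counts[slot] >= min_posts
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical usage with made-up records; min_posts=1 only because the sample is tiny.
sample = [
    (datetime(2017, 4, 21, 23, 30), 5400),  # Friday night
    (datetime(2017, 4, 23, 20, 0), 4800),   # Sunday evening
    (datetime(2017, 4, 19, 11, 0), 2100),   # Wednesday morning
]
print(best_slots(sample, top=2, min_posts=1))
```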


About conclusions

By the law of large numbers, for any probability less than 1 fixed in advance, there is a finite number of trials beyond which, with at least that probability, the relative frequency of an event differs from its true probability by as little as desired. Methods that estimate a probability from a finite sample are based on this property. For each section, our sample contained more than 30 empirical trial results, namely actual publication figures.
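In symbols, this is Bernoulli's form of the law of large numbers. Let $m$ be the number of times an event with probability $p$ occurs in $n$ independent trials. Chebyshev's inequality makes the "finite number of trials" explicit, because the variance of $m/n$ is $p(1-p)/n \le 1/(4n)$:

```latex
\begin{gather*}
\lim_{n \to \infty} P\left( \left| \frac{m}{n} - p \right| < \varepsilon \right) = 1
  \quad \text{for every } \varepsilon > 0, \\
P\left( \left| \frac{m}{n} - p \right| \ge \varepsilon \right)
  \le \frac{p(1-p)}{n\varepsilon^2} \le \frac{1}{4n\varepsilon^2}, \\
P\left( \left| \frac{m}{n} - p \right| < \varepsilon \right) \ge \gamma
  \quad \text{whenever } n \ge \frac{1}{4\varepsilon^2 (1-\gamma)}.
\end{gather*}
```

The last line shows that any tolerance $\varepsilon > 0$ and confidence $\gamma < 1$ chosen in advance yield a finite sample size. The Chebyshev bound is conservative and serves only to make that finiteness concrete; it is not necessarily the calculation the authors used.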
For any single article, these patterns may have no significant effect or may be outweighed by other factors. If you plan to publish a series of posts, taking them into account can pay off. In every case, the content of an article has the greatest influence on its popularity.


TOP10 companies and users with the most subscribers

TOP10 with the most views

TOP10 with the most comments

TOP10 with the highest number of favorites


If you are interested in any other relationship between these indicators, leave a comment and, if possible, we will try to calculate and publish it. Write to us and we will send you a link to the Excel file with the publication data we collected.

Source: https://habr.com/ru/post/327352/
