
Audience comparison of Habrahabr, Geektimes and Megamind

Hi, Habr!
A year ago I wrote an article about who subscribes to Habrahabr on the VKontakte social network, and how. In the very first comments to that post, readers asked to see how the subscribers of Geektimes differ from those of Habr itself. Only a year has passed, and I, cherishing my laziness, am finally fulfilling that wish.

In fact, my slowness had objective reasons: in January Megamind was launched, and it became clear that the comparison should cover all three sites. For that, I had to wait at least six months after Habr's final split.

This article will not contain yet another statistical breakdown of which day of the week a Habr post gets the best rating and on which it collects few comments; all of that was covered long before me. Instead, under the cut we will try to understand how the audiences of the Habr-family publics differ in various parameters (from gender to attitudes toward bad habits), and whether there is a connection between users' behavior on VK and on the sites themselves.


Instead of an introduction


To start, let's turn to the subject area. What are these three once-unified sites?
If we recall the creators' explanations, then, briefly and very simply, the specifics of each site are as follows:


How do the audiences of these sites differ? Perhaps only TM employees can answer this question in detail. We, in turn, will look at how the audiences of the same-named VK publics differ.

Briefly, about the data collection method.
Using the VK API, data was collected on all subscribers of the Habrahabr, Geektimes and Megamind publics at the end of October. Around the same date, a self-written parser (alas, there is no access to a Habr API) downloaded all (or almost all) available articles from the same sites.
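The article does not include its collection code. As a rough illustration only, here is a minimal Python sketch of paging through a public's subscribers with the VK API method `groups.getMembers`, which returns at most 1000 users per call. The API version string, the field list, the token handling and the injectable `call` parameter are assumptions made for this sketch, not the author's parser; check parameter names against the current VK API documentation before using it.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# Assumptions: endpoint, version and field names follow the public VK API docs.
API_URL = "https://api.vk.com/method/"
API_VERSION = "5.131"  # hypothetical choice; the study ran on a 2015-era version
FIELDS = "sex,bdate,country,universities,personal,relation"
PAGE_SIZE = 1000  # groups.getMembers returns at most 1000 users per call


def call_vk(method: str, params: Dict[str, object]) -> dict:
    """Perform one VK API call over HTTPS and return the decoded JSON."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{API_URL}{method}?{query}") as resp:
        return json.load(resp)


def fetch_all_members(group_id: str, token: str,
                      call: Callable[[str, Dict[str, object]], dict] = call_vk
                      ) -> List[dict]:
    """Page through groups.getMembers until every subscriber is collected."""
    members: List[dict] = []
    offset = 0
    while True:
        data = call("groups.getMembers", {
            "group_id": group_id, "offset": offset, "count": PAGE_SIZE,
            "fields": FIELDS, "access_token": token, "v": API_VERSION,
        })
        if "error" in data:
            raise RuntimeError(data["error"])
        page = data["response"]
        members.extend(page["items"])
        offset += PAGE_SIZE
        # Stop once the offset has passed the total reported by the API.
        if offset >= page["count"]:
            return members
```

Passing `call` explicitly lets the paging logic be tested offline with a fake response; in real use, `fetch_all_members("habr", token)` would hit the network (the group id `"habr"` is a hypothetical example).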

In some places I refer to the statistical significance or insignificance of differences. Significance was tested with the chi-square test at a significance level of 0.05 (the same level is used for the correlation coefficients).
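For readers who want to reproduce such checks, a chi-square test of independence can be sketched in plain Python as below. The table layout (rows are publics, columns are categories such as gender) is an assumption about how the test was applied, and the counts in the usage line are made up. Unlike `scipy.stats.chi2_contingency`, which by default applies Yates' continuity correction to 2×2 tables, this sketch computes the uncorrected Pearson statistic.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution.

    Uses the closed-form series that exist for integer degrees of freedom.
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over r < df/2 of (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum of x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, acc = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        acc += term
    return math.erfc(root / math.sqrt(2)) + 2 * pdf * acc


def chi_square_test(table: List[List[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence: (statistic, df, p-value)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)
```

Usage with hypothetical counts: `stat, df, p = chi_square_test([[700, 300], [850, 150]])`; the difference would be called significant when `p < 0.05`.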

UPD: I will also repeat a quote from my previous article:

“Also, note that the sample under study is the audience of publics on the VKontakte social network. This means that user data in it may change over time and may be incorrect or inaccurate. So when I say ‘Habr's readers are 146% 91-year-old men from the Isle of Man’, this is not the ultimate truth, just the information users gave in their profiles.” And conclusions drawn from the data on Habr's VK subscribers will, of course, not necessarily hold for all users of the sites themselves.

First, we need to understand how the public audiences overlap. For the solemnity of the occasion, here is a Venn diagram drawn to scale:

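Public audience intersection table (diagonal: all subscribers of the public; below the diagonal: subscribers of both publics)
          | Habrahabr | Geektimes | Megamind
Habrahabr | 517,553   | -         | -
Geektimes | 31,309    | 45,603    | -
Megamind  | 11,162    | 7,034     | 13,470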

Overall intersection (users subscribed to all three publics at once): 6,481

We see a perfectly logical picture. Since GT and MM are offshoots of Habr itself, they cannot yet compete with it either in total audience size or even in the relative number of “unique” subscribers.
By “unique” subscribers I mean users who subscribe to this public and to neither of the other two. In the figure they are highlighted in color, while “non-unique” ones are gray.
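The intersection table gives only totals and pairwise overlaps, but that is enough to recover the number of unique subscribers by inclusion–exclusion: |A \ (B ∪ C)| = |A| − |A ∩ B| − |A ∩ C| + |A ∩ B ∩ C|. The sketch below applies it to the figures from the table; the short keys `habr`, `gt` and `mm` are just labels chosen for illustration.

```python
from typing import Dict, Tuple

# Figures from the intersection table (collected at the end of October).
TOTAL: Dict[str, int] = {"habr": 517_553, "gt": 45_603, "mm": 13_470}
PAIR: Dict[Tuple[str, str], int] = {
    ("habr", "gt"): 31_309,
    ("habr", "mm"): 11_162,
    ("gt", "mm"): 7_034,
}
ALL_THREE = 6_481


def pair(a: str, b: str) -> int:
    """Look up a pairwise intersection regardless of argument order."""
    return PAIR.get((a, b), PAIR.get((b, a), 0))


def unique(public: str) -> int:
    """Subscribers of `public` who follow neither of the other two publics."""
    b, c = [p for p in TOTAL if p != public]
    # Subtract both overlaps, then add back the triple overlap removed twice.
    return TOTAL[public] - pair(public, b) - pair(public, c) + ALL_THREE


print({p: unique(p) for p in TOTAL})
# {'habr': 481563, 'gt': 13741, 'mm': 1755}
```

With the raw ID sets in hand, the same quantity is simply `habr_ids - (gt_ids | mm_ids)`; the arithmetic version is handy when only the aggregate counts are available.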
To highlight the differences between the public audiences as clearly as possible, we will analyze only the “unique” subscribers; that is, the gray areas in the figure are discarded. An example of why this is necessary is given below.
So let's get started.

Gender


Let's not be original and start with the differences by gender:


Interactive version (where possible, I will provide links to interactive charts, since they are more illustrative and easier on the eye).

The share of girls is highest among Megamind subscribers, almost a third. It is lowest at Geektimes (are representatives of the “weaker” sex less common among geeks?), and Habr sits in between. These differences are statistically significant.

Note how the distribution differs between unique and non-unique users: most GT and MM subscribers are also subscribed to Habr, and most Habr subscribers are men. Because of this, the distribution of the trait (in this case, gender) in the other audiences gets distorted. That is why we analyze only unique subscribers.

In general, we saw nothing unexpected: among “techies” there are traditionally more men. Megamind is perhaps the least “technical” of the three projects, which explains its relatively high share of girls.
With gender settled, let's move on to age.

Age


Let's look at the distribution of the relative number of subscribers by year of birth (values before 1975 fluctuate around 0, so that part of the graph is omitted for clarity):


Interactive version
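A minimal sketch of how such a curve could be computed from VK profiles, assuming that “relative number” means each birth year's share among users with a known year, and that `bdate` comes in VK's `D.M.YYYY` form (or `D.M` when the year is hidden). This is an illustration under those assumptions, not the author's script.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def birth_year(bdate: Optional[str]) -> Optional[int]:
    """Year from a 'D.M.YYYY' date string; 'D.M' (year hidden) gives None."""
    if not bdate:
        return None
    parts = bdate.split(".")
    return int(parts[2]) if len(parts) == 3 else None


def year_shares(users: Iterable[dict], min_year: int = 1975) -> Dict[int, float]:
    """Share of users with a known birth year that fall in each year >= min_year."""
    years = [y for y in (birth_year(u.get("bdate")) for u in users) if y is not None]
    if not years:
        return {}
    counts = Counter(years)
    total = len(years)  # normalize over all known years, then trim for display
    return {y: counts[y] / total for y in sorted(counts) if y >= min_year}
```

Normalizing before trimming keeps the curves of publics with very different sizes comparable, while dropping the near-zero tail before 1975 only affects what is plotted.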

Habr and GT have fairly smooth curves. The Megamind line “wobbles” the most, probably because of the relatively small number of respondents. But even so, it is obvious that Habr's peak falls on an older age than that of its “affiliated” sites, even if only by a couple of years. Such differences are probably quite logical, although I personally expected Megamind to have an older audience. But, as you know, my expectations are my own problem.

At the same time, the differences between Habr and GT and between Habr and MM are statistically significant, while those between GT and MM are not (which, in general, can be seen from the figure). Also curious is the surge in the 2000–2001 range, observed primarily for Habr; I did not find an explanation for it. There is no strong surge in VKontakte's overall audience for these birth years. So let's hope that young people are simply growing more interested in IT. Or perhaps it is somehow related to the default birth years offered when registering on the social network.

Geography


This time (unlike in the previous study) we will limit ourselves to the countries of Habr's “big four”: Russia, Ukraine, Belarus and Kazakhstan. We will drop non-CIS countries, because even when the country in a user's profile is genuine (remember that Habr users sometimes put anything at all in the “country” field), the overwhelming majority of users from such countries are emigrants from the post-Soviet space. That leaves the other former USSR countries. We will not take them into account either, because they give no meaningful (and sometimes no) number of unique subscribers for Megamind.

In the end, about 92% of subscribers fall into these four countries, so we will not lose much. Here is how the “normalized” number of subscribers breaks down across them:


Interactive version
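The article does not spell out how the numbers were “normalized”. One plausible reading, sketched below purely as an assumption, is each country's share of a public's big-four subscribers. The numeric ids are VK's country ids as I understand them from `database.getCountries` (1 Russia, 2 Ukraine, 3 Belarus, 4 Kazakhstan); verify them before relying on this.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: VK country ids for the "big four"; verify via database.getCountries.
BIG_FOUR = {1: "Russia", 2: "Ukraine", 3: "Belarus", 4: "Kazakhstan"}


def country_shares(users: Iterable[dict]) -> Dict[str, float]:
    """Share of a public's big-four subscribers that come from each country."""
    counts = Counter(
        BIG_FOUR[u["country"]["id"]]
        for u in users
        if (u.get("country") or {}).get("id") in BIG_FOUR  # other countries are dropped
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in BIG_FOUR.values()}
```

Computing shares per public, rather than raw counts, is what makes a 500-thousand-subscriber public and a 2-thousand-subscriber one comparable on the same chart.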

If you remember, last year Belarus turned out to be the most “Habr-ized” country. It still holds that title, but only for Habrahabr, while the offshoot projects are of interest primarily to users from Russia. Kazakhstan closes the quartet, except in the case of Megamind, where it wrested third place from Ukraine in a bitter struggle. Overall, MM shows the most even distribution.

The sharpest drop in interest in the offshoot sites is among Ukrainian users. Either users in Ukraine are less interested in the topics of these resources, or over the past year users from this country have started subscribing to VK publics less often. Testing the first hypothesis is beyond the scope of this study, but the second is easy to refute: just look at the growth rate of Habrahabr subscribers by country over the past year (since the previous study):


Interactive version

As we can see, all the “big four” countries showed roughly the same growth, except for Kazakhstan, which is the clear leader.
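The growth rate here is presumably the relative change in subscriber counts between the two studies. A minimal sketch, assuming per-country counts from two snapshots are available as dictionaries; the country names and numbers in the usage line are hypothetical.

```python
from typing import Dict


def growth_rates(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Relative growth (after / before - 1) for countries present in both snapshots."""
    return {
        country: after[country] / before[country] - 1
        for country in before
        if country in after and before[country] > 0  # skip new or empty countries
    }
```

For example, `growth_rates({"A": 100, "B": 200}, {"A": 150, "B": 200})` gives 50% growth for `A` and none for `B`. Comparing relative rather than absolute growth keeps large and small countries on the same scale.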

Universities


There will be no statistics on universities this time, sorry. Here is why: as you remember, we only look at unique users, but splitting by university divides subscribers into groups that are too small. So small that even for GT (not to mention MM) there are often no unique users left. Because of this, a university may appear in the list for Habr subscribers but not in the list for GT, which creates the false impression that students and graduates of that university are not interested in Geektimes at all.

A clear example. There is a university, or rather a university faculty, FSF ITMO. From it, 30 people are subscribed to Habr and 5 to Geektimes, and all of those GT subscribers are also subscribed to Habr. As a result, the number of unique GT subscribers is 0. What should we do with such a university? Ignore it? Include it in the statistics with a special mark? Analyze non-unique users instead? In general, there are too many questions, and the value of such a comparison is doubtful. So if anyone is interested in statistics for a particular university, get in touch and I will export them.

Bad habits


Toward smoking and alcohol, subscribers show a surprising indifference; the picture is even a bit boring:


Interactive version


Interactive version

Still, Megamind subscribers are slightly more tolerant of bad habits. Apparently, their work is more stressful. But in fact, these differences are not significant.

Political views


The differences in political views, however, turned out to be significant:


Interactive version

Megamind subscribers turned out to be the most indifferent and the most liberal (but also the most conservative!). The least indifferent and the most moderate were the “geeks” and the Habr users, respectively.

Family status


The differences in romantic relationships are even more interesting.
VKontakte offers several options for a user's relationship status. We will group them a little to make things clearer and more convenient:

Correspondence table of marital status
Status for analysis | Status from VK
Have a partner      | Have a partner
                    | Married
                    | Engaged
                    | In love (yes, one can be in love without it being mutual, but let's not nitpick)
No partner          | No partner
Actively looking    | Actively looking
-                   | It's complicated

The status “It's complicated” is excluded: it is hard to interpret, and only 3.2% of subscribers chose it.
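The grouping above can be expressed as a lookup from VK's numeric `relation` codes. The code values in this sketch (1 single, 2 has a partner, 3 engaged, 4 married, 5 it's complicated, 6 actively searching, 7 in love) and the `sex` codes (1 female, 2 male) are taken from my reading of the VK API user-object documentation and should be double-checked; the split by gender mirrors the next step of the analysis.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumption: numeric codes of the VK `relation` field; verify against the API docs.
RELATION_GROUP = {
    2: "Have a partner",    # has a partner
    3: "Have a partner",    # engaged
    4: "Have a partner",    # married
    7: "Have a partner",    # in love
    1: "No partner",        # single
    6: "Actively looking",  # actively searching
    # 5 ("it's complicated") and 0 (not specified) are deliberately left out
}


def relation_group(user: dict) -> Optional[str]:
    """Analysis status for a user, or None if the status is excluded or missing."""
    return RELATION_GROUP.get(user.get("relation", 0))


def relation_shares_by_sex(users: Iterable[dict]) -> Dict[int, Dict[str, float]]:
    """For each VK `sex` code (1 = female, 2 = male), the share of each status."""
    counts: Dict[int, Counter] = {}
    for u in users:
        group, sex = relation_group(u), u.get("sex")
        if group is None or sex not in (1, 2):
            continue
        counts.setdefault(sex, Counter())[group] += 1
    return {
        sex: {g: n / sum(c.values()) for g, n in c.items()}
        for sex, c in counts.items()
    }
```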
In addition, we split the respondents by gender and get an interesting picture:


Interactive version

First, in all publics girls are more successful than guys at finding a partner (and the difference is statistically significant).

Now look at the share of subscribers without a partner. Taken together, the “No partner” and “Actively looking” statuses give roughly the same results for all publics. At the same time, Habr users are almost twice as “bold” as their colleagues and are actively looking for a soul mate. Any comment on this sounds like a flat joke, even if meant seriously, so I will leave it without comment. As for the girls subscribed to Megamind, they apparently do fine even when single.

The connection between VK and the sites (likes, ratings and all that)


Next, I would like to link users' behavior on VK with their behavior on the sites themselves. Let me say right away that we will consider data for 2015 only. First, because it was at the beginning of this year that the final split into three separate sites took place. Second, I am not sure that Habr's creators would like to see comparisons of indicators such as the number of views, especially broken down by year.

For VK posts, we will consider three main numerical indicators:


For posts on the sites, there are a few more indicators:

Of course, besides these, a number of other factors may affect post performance. Some of them have been described in other articles on the topic (the day of publication, for example); others require a more in-depth analysis, which is beyond the scope of this article, so we will not try to account for them. After all, our task is not to build a regression model; we just want to look at the relationships between the indicators.

But there is at least one more factor we must account for, namely the publication date. Over time the number of subscribers can grow, and this in turn can affect the number of reposts and likes (more subscribers, more likes). In that case we cannot simply compare a post from January 1, 2015 with one from today; we would also have to account for the audience size at the time each post was published.

First, let's determine how the number of subscribers changed over 2015. The good old Web Archive will help here: with it we can find the subscriber count of each public on several different dates. Let's plot these points:



We see that Megamind's audience grows fastest (followed by Geektimes), and Habr's slowest. This is quite logical given the age of the publics: young publics grow faster.

The main good news for us is that the change in the number of subscribers is almost perfectly described by a linear function, so we will not have to suffer much if we want to account for this factor. With the simplest linear regression, we can predict the audience size of any of the publics on any date in the period under study.
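A minimal sketch of that idea: fit an ordinary least-squares line through a few (date, subscriber count) snapshots, such as those taken from the Web Archive, then use it to express a post's likes per thousand predicted subscribers on its publication date. The per-thousand normalization and the function names are illustrative assumptions, not something the article prescribes.

```python
from datetime import date
from typing import Callable, List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def audience_model(snapshots: List[Tuple[date, int]]) -> Callable[[date], float]:
    """Return a function predicting the subscriber count on any date."""
    pts = [(d.toordinal(), float(n)) for d, n in snapshots]  # dates as day numbers
    slope, intercept = fit_line(pts)
    return lambda d: slope * d.toordinal() + intercept


def likes_per_thousand(likes: int, day: date, predict: Callable[[date], float]) -> float:
    """Likes normalized by the predicted audience size on the post's date."""
    return likes / predict(day) * 1000
```

Converting dates with `toordinal()` turns them into plain day numbers, so the slope reads directly as “new subscribers per day”.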

But will we need to account for this factor at all? Apparently not:



Likes are spread fairly evenly throughout the year. It turns out that however much the audience grows, it does not become more generous with likes and reposts.

By the way, note the “notches” at the bottom of the Habr distribution. These are the same weekends that have been mentioned so many times in reviews of Habr articles: apparently, because there are few articles on those days, Habr users become more generous with ratings. To some extent, this pattern has also migrated to the social network, but only for Habr; as the graphs show, it does not apply to the other publics. This is confirmed by the correlation coefficients between “number of posts per day” and “average number of likes”.
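The check described above can be sketched as follows: group VK posts by publication day, count posts and average likes per day, and compute the correlation between the two series. Pearson's coefficient is an assumption here, since the article does not name the coefficient it used, and the input format (a list of `(day, likes)` pairs) is hypothetical.

```python
import math
from collections import defaultdict
from datetime import date
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined for a constant series)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def daily_posts_vs_likes(posts: List[Tuple[date, int]]) -> Tuple[List[int], List[float]]:
    """Per publication day: number of posts and the mean likes of those posts."""
    by_day: Dict[date, List[int]] = defaultdict(list)
    for day, likes in posts:
        by_day[day].append(likes)
    days = sorted(by_day)
    counts = [len(by_day[d]) for d in days]
    means = [sum(by_day[d]) / len(by_day[d]) for d in days]
    return counts, means
```

A clearly negative `pearson(*daily_posts_vs_likes(posts))` would match the weekend effect: fewer posts per day, more likes per post.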


Now that we have dealt with the most obvious dependencies, let's see how things stand with the other indicators. To do this, we build a correlation matrix for each public. Remember, though, that correlation shows how close a relationship is, but in general does not establish cause and effect. For clarity, we display the matrices as follows:



As we can see, the situation is about the same for all publics. Serious differences appear only in how the “favorites” indicator depends on likes and reposts: for Habr the relationship is quite clear, while for the others it is much weaker.
Note also the almost linear relationship between likes and reposts, although that was quite expected.
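For anyone rebuilding such a matrix, a minimal sketch: given one column of values per indicator (likes, reposts, views, rating and so on, one entry per post), compute the pairwise coefficients. Pearson's coefficient and the column names in the usage note are assumptions; the article does not say which coefficient it used.

```python
import math
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined for a constant column)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise correlations between every pair of equally long indicator columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Usage might look like `correlation_matrix({"likes": likes, "reposts": reposts, "views": views})`, with one list per indicator aligned by post. As the text notes, a high coefficient here describes co-movement, not causation.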

Nothing depends on the day of the year (and, consequently, on the number of subscribers). But there is a fairly strong correlation between an article's views and its rating and number of additions to favorites. This is quite logical: a bad article is unlikely to be viewed a lot, and a good article written for a narrow audience will not get very many upvotes.

Likes and reposts on VK are only weakly related to the ratings given on the sites (though for Habr and GT they do correlate, if not strongly, with the number of article views). This is actually one of the main conclusions of the comparison: the audience of the Habr publics on VKontakte and the audience on the websites do not agree very much in how they rate posts.

Interestingly, the number of comments on the sites and the number of comments on VK depend on each other only very weakly, although both serve the same purpose: discussing the article. This is another confirmation that users behave differently on VK and on the portals themselves.

Instead of a conclusion


One can argue for a long time about whether splitting Habr was justified and what it was for, but already now, a little less than a year later, differences between the audiences of the three sites (or at least of their publics) are beginning to appear. To sum up, both Geektimes and Megamind are gradually starting to live their own lives, gaining their own partly unique audiences, although these are not yet comparable in size to the audience of their “parent”. How the split affected the life of Habr itself is another question, beyond the scope of this post.

On this philosophical note, let's wrap up. Until next time, if there is one. And remember that statistics is only the third kind of lie.

P.S. I apologize for also posting this in the VK API hub even though I did not include any code (it is trivial). But as far as I have seen, such articles do appear there sometimes, and I think a post about processing data extracted from VK fits that hub quite well.

Source: https://habr.com/ru/post/273387/

