
Measuring Telegram

“So far, the possibilities for full-fledged channel analytics
are limited primarily by the capabilities of the Telegram Bot API.”
Telegram-marketing channel, June 28, 2016

Everything is good with Telegram channels except for one thing: they are too hard to find. Links can be found almost everywhere, ...

For example:
On the Internet:

1. Via robots that index some channels in search of other channels ( 1.1 , 1.2 )
2. In channel catalogs filled in by the channel owners ( 2.1 , 2.2 )
3. On channel exchanges ( 3.1 , 3.2 )
4. In themed channel roundups (some have appeared here too: 4.1 , 4.2 )
5. In channel lists ( 5.1 )
6. In a Google Doc of channels about channels (taken from @raskruti: 6.1 )
In mobile apps:

7. By downloading an app with a channel catalog (for iOS: TeleBots)

In Telegram itself:

8. On channels about channels ( 8.1 )
9. Using bots for channels ( 9.1 )
10. On channels about channels about channels ( 10.1 )

... but the search process remains far from perfect. Without a single source of data and decent analytical tools, it is difficult not only to find channels but even to answer basic questions:
1. How common are channels in Russia?
2. How popular are the channels and what is “popularity” in numbers?

Such questions need to be answered in numbers. Only scattered data can be found on the Internet. There is something on Rusbase ( here ), in Vedomosti ( here ), on Twitter ( here ), but all this information is unsystematic and difficult to verify.

This article analyzes the market of Russian-language Telegram channels. The work ranged from collecting a combined (not necessarily complete) list of Russian-language channels to crawling their contents and building metrics. Only channels that promote themselves on the wider Internet by publishing links to themselves were analyzed. Such behavior indicates a channel created for a large audience.


Step 1. Collecting the list of channels


First, a single database was assembled using guerrilla analytics methods. Four catalogs served as data sources. Two are online catalogs where users add their channels manually: tlgrm.ru and tchannels.me. Two are self-updating catalogs: tsear.ch ( about the authors ) and inten.to ( about the authors, in the “About us” section at the bottom of the page ); one of these is reindexed manually, the other in real time. On every site, only channels listed as Russian-language were collected. Below is a short table comparing the sources.



What makes it possible to be sure that the channels found are in Russian?

1. tlgrm.ru is built for Russian-language channels by default;
2. tchannels.me consists of channels added manually and explicitly marked as Russian (by design, a channel cannot be assigned two languages there);
3. tsear.ch determines the language of a channel by analyzing its contents with the Yandex.Translate API;
4. inten.to was created by authors who specialize in a unified API for translation services, and that API is what automatically detects the language of channels.

Tools used in the work: Chrome Developer Tools, a cURL-to-Python converter, Python itself, and general erudition.

More than 10,000 channels were found; the results are shown in the diagrams. The main conclusion: the databases of the different catalogs overlap by only 3%.

image
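
The overlap between catalogs can be measured with plain set operations. The snippet below is a minimal sketch and not the author's code: the channel IDs are made up, and it assumes that "overlap" means the share of all known channels that appear in every catalog at once.

```python
from itertools import combinations


def overlap_report(catalogs):
    """Return all channels, channels present in every catalog,
    pairwise overlap counts, and the share present everywhere."""
    all_channels = set().union(*catalogs.values())
    in_every = set.intersection(*catalogs.values())
    pairwise = {
        (a, b): len(catalogs[a] & catalogs[b])
        for a, b in combinations(sorted(catalogs), 2)
    }
    share = len(in_every) / len(all_channels) if all_channels else 0.0
    return all_channels, in_every, pairwise, share


# Toy data: these channel IDs are invented for illustration only.
catalogs = {
    "tlgrm.ru": {"news", "music", "cats"},
    "tchannels.me": {"news", "gifs"},
    "tsear.ch": {"news", "music", "books"},
    "inten.to": {"news", "tech"},
}

all_channels, in_every, pairwise, share = overlap_report(catalogs)
print(len(all_channels), sorted(in_every), f"{share:.0%}")
```

The pairwise counts help show which two catalogs share the most channels, which a single overall percentage hides.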

How the data in the table above were obtained


Manually updated catalogs

1. tlgrm.ru Target section: /channels/. To get the complete list, go through all the relevant categories and copy the channel IDs, which are right in the body of the page.

2. tchannels.me You can select Russian in the catalog settings and go through the 27 categories one after another. You can also use the service's API to your advantage by slightly changing the parameters: tchannels.me/api/channels?list=top&categoryId=&languages=russian&offset=0&count=1000000
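
The query string above can also be built programmatically, which makes it easy to change `offset` and `count`. This is only a sketch: the `https://` scheme and the helper names are assumptions, and the response is not parsed because its format is not described here.

```python
from urllib.parse import urlencode

# The scheme is an assumption; the text gives the URL without one.
BASE_URL = "https://tchannels.me/api/channels"


def build_page_url(offset, count, language="russian"):
    """Build a catalog request URL with the parameters quoted in the text."""
    params = {
        "list": "top",
        "categoryId": "",  # empty value means no category filter, as in the text
        "languages": language,
        "offset": offset,
        "count": count,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def page_urls(total, page_size):
    """Hypothetical helper: split one huge request into smaller pages."""
    return [build_page_url(offset, page_size) for offset in range(0, total, page_size)]


print(build_page_url(0, 1000000))
```

Requesting smaller pages through `page_urls` is gentler on the service than a single request with `count=1000000`, although the single request is what the text describes.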

Automatically updated catalogs

3. tsear.ch Target section: /list/en/. By clicking the Next button over and over, you can record the channels from each page and build a complete list.

4. inten.to Target section: /telegram/channels/russian/. The creators of the catalog are very protective of their database: a channel can be found only by an exact match of the query with the channel ID or by a partial match with the text of its description, and the search returns no more than the first 100 matches, and even those not all at once but 10 at a time. Two approaches were tried: brute force and searching for three-letter combinations. This way only 1722 channels were found. That is definitely not all of them, since a rating compiled manually from this data did not match the original inten.to rating.
To assess the completeness of the inten.to database, the “by contradiction” method was used. First, a list of unique channels from the other three sources was compiled (8283 channels in total), then each of them was looked up in the inten.to database. As a result, 3325 of the 8283 channels (~40%) were found, which puts the size of inten.to's base of Russian-language channels at roughly 3325 + 1783 ≈ 5100.
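
Both steps can be sketched in a few lines. The alphabet used for the three-letter queries is an assumption (lowercase Latin letters; the text does not say which characters were tried), and the figures are copied from the paragraph above exactly as written.

```python
from itertools import product
from string import ascii_lowercase


def three_letter_queries(alphabet=ascii_lowercase):
    """Yield every three-character search string over the given alphabet."""
    for letters in product(alphabet, repeat=3):
        yield "".join(letters)


queries = list(three_letter_queries())
print(len(queries), queries[:3])

# Figures quoted in the text above.
known_elsewhere = 8283    # unique channels from the other three catalogs
also_in_inten_to = 3325   # of those, how many inten.to also knows
second_summand = 1783     # the second term of the sum, as given in the text

coverage = also_in_inten_to / known_elsewhere
estimate = also_in_inten_to + second_summand
print(f"coverage {coverage:.0%}, estimated size {estimate}")
```

With 26 letters there are 17,576 queries, each returning at most 100 matches, which explains why this search alone cannot enumerate the whole database.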

Step 2. Development of a statistics collection tool


No one in their right mind could go through all 10,000 channels by hand: there is simply no time. Besides, the catalogs are constantly updated, so a tool for regular use is needed. The author had some prior experience with “upgrading” the web browser, so it was decided to automate the Telegram web client using a browser extension.

Arguments in favor of this method:

1. Sufficient for the task: an extension can inject JavaScript and gives convenient access to the page code via jQuery
2. Transparency: the script clicks and types text by itself in real time, so it is clear what to tweak
3. Scalability in principle: by installing a crawler agent on five virtual machines and deploying a server that hands out tasks to the agents, you get a botnet (extensions can send requests to external hosts, and distribution can be done via a webhook)
4. Easy and quick installation: the crawler can be installed on any computer with a browser and requires no preliminary setup. The code can be modified for further work.
5. Cross-platform: Google Chrome is available for all common operating systems.
6. Easy and free access to a VPN, thanks to the widespread browser extensions (Hola, friGate, etc.)

In essence, this approach turns crawling into a perverse form of writing GUI autotests (without Selenium). You can watch the crawling process in the video:



Looking ahead, the approach turned out not to be universal, because the Telegram web client leaks memory. When trying to load the entire history of a flood-heavy channel (by scrolling up), all posts were cached and the browser began to slow down terribly. Instead of fixing the web client ( the code before webpack ), running a private instance and working through it, it was decided to simply exclude flood channels from consideration.

Step 3. Scaling the process


To get into Telegram you need to authorize via SMS. Nowadays this would seem not to require a person: you could buy numbers on Twilio and use an SMS API, but this did not work. So a direct solution was used: 15 SIM cards were bought secondhand at the Belorussky railway station. It remained to charge all the old, unneeded phones at hand, spin up a couple of virtual machines on the home computer, and start collecting data.

Step 4. Collecting statistics from inside Telegram and discussing the results


We will look at the main patterns using a sample of channels from the tlgrm.ru website. Arguments for this source:

1. Only 13% of its channels are unique, that is, most of them are also listed somewhere else, which means the channel owners care about promotion;
2. The catalog has categories, which makes the analytics more interesting and accurate;
3. Russian domain + popularity on the Runet = the channels will be Russian in 99% of cases, so there is no need to recheck.

Almost all channel categories were included in the study, except channels with 18+ content and channels in Uzbek: the former were almost entirely flood, and the latter were, on average, more heavily promoted, so their presence would have skewed the statistics upwards.

Next, we split the list of all channels into batches of ~500 (after 500, Telegram bans you for calling the search method too often) and run crawlers in different virtual machines (the author tries to use high-performance computing at every opportunity). This gives statistics for each channel covering at least a week back from its last post. If a channel has had no posts for seven days, we consider it “dead”. If posts appear more often than once every three hours, it is flood. The slides with the results follow, with discussion in FAQ format.
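
The batching and the two classification rules can be sketched as follows. This is an illustration under assumptions, not the crawler's actual code: the function names are hypothetical, and "more often than once every three hours" is interpreted as more than 56 posts in the week before the channel's last post.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)
FLOOD_GAP = timedelta(hours=3)


def batches(items, size=500):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def classify(post_times, now):
    """Label a channel as 'dead', 'flood' or 'normal' from its post timestamps."""
    if not post_times or now - max(post_times) > WEEK:
        return "dead"  # nothing posted during the last seven days
    last = max(post_times)
    recent = [t for t in post_times if last - WEEK <= t <= last]
    max_posts = WEEK / FLOOD_GAP  # 56 posts per week = one post every 3 hours
    return "flood" if len(recent) > max_posts else "normal"


now = datetime(2017, 6, 23)  # the year is an assumption
daily = [now - timedelta(days=d) for d in range(7)]
hourly = [now - timedelta(hours=h) for h in range(72)]
print(classify(daily, now), classify(hourly, now), classify([], now))
print([len(b) for b in batches(list(range(1200)))])
```

Measuring the window back from the last post, rather than from the moment of crawling, matches the text's description of the collected statistics.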

image

What is the size of our sample?
Three independent measurements were taken: May 3, June 4, and June 23. There were 1,889 channels in the categories of interest that were present on tlgrm.ru all three times.

Are new channels being created?
Yes, constantly. Judging by channel creation dates, at least 3-4 channels appear every day.

Are these channels read, and how actively?
Yes, they are read, and quite actively. Over 70% of the channels have grown over the past 2 months; the total increase in subscriptions is almost 900 thousand (suspiciously fast-growing channels are filtered out of this figure, more about them), while only 60 thousand people unsubscribed from channels they had once joined.

How many channels are “normal”?
65% of channels are updated regularly and do not appear to be heavily flooded. In other words, every third randomly chosen channel either has not been updated for a long time or is flood.

image

What is “success” in Telegram?
As the chart above shows, success means a channel growing at a rate of 160 people per day. Even the best channels could not show an average rate higher than this. We advise paying attention to music channels: there are relatively few of them, but they have many subscribers, so this direction is worth considering. At the same time, channels with gifs and funny videos are especially popular: if the statistics are to be believed, in two months they grew at least four times more than any other channel.
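
The growth rate is simply the change in subscribers divided by the number of days between two measurements. The sketch below uses the first and last measurement dates from the text; the year and the subscriber counts are invented for illustration.

```python
from datetime import date


def daily_growth(subs_start, subs_end, start, end):
    """Average number of new subscribers per day between two measurements."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be later than start")
    return (subs_end - subs_start) / days


# Measurement dates from the text; the year is an assumption
# and does not affect the day count.
first, last = date(2017, 5, 3), date(2017, 6, 23)
print((last - first).days)

# Hypothetical channel that grew from 1,000 to 9,160 subscribers.
print(daily_growth(1000, 9160, first, last))
```

At 160 subscribers per day, a channel gains 8,160 subscribers over the 51 days between the first and last measurement.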

Summary


The numbers are shown in the charts; the qualitative conclusions are listed below.

From the channel list collection stage

1. There are more than ten thousand channels in Russian. That is a lot: more than you could read in a lifetime
2. Each catalog has its own small list and does not share it with its “neighbors”, which means the user has to search in many places at once, and that is very inconvenient

From the channel analysis stage

1. Channels as a phenomenon continue to evolve: there are more of them every day;
2. The number of channel subscriptions is growing at a monstrous pace, almost half a million each month;
3. Introducing more complex metrics makes it possible to generate automatic channel selections, which still need human review.

Russian users are already looking for channels, spending dozens of clicks to find content to their liking. Meanwhile, subscriptions to these channels grow by half a million per month in aggregate. If a service appears that lets you search for channels in one place and switch between them with a single click, Telegram's popularity will increase significantly.

Given the above, it is time to put developer resources to use. To improve the usability of their services, we recommend that catalog owners do the following:

1. Update your channel classifiers: web.archive shows that they have hardly changed since launch. It's summer, so you can hire students.
2. Change the site interface by building a channel “flipper” in the spirit of Yandex.Music, so that choosing a new channel takes one click at most.
3. Start watching how people search for channels and measure how many they view per session; give developers bonuses for beating the targets.
4. Contact the creators of inten.to and offer them integration - for them Telegram is not a core business, and their technology can be used to create the best catalog on the market.

Materials used in the article
All results (presentation, tables, code) are openly available.

1. Presentation, spreadsheets, channel lists - archive on Google Drive
2. Extension code - GitHub repository
3. Contact information in Telegram: devrazdev

Thank you.

PS Finally, a short funny story about us and Telegraph in the form of step-by-step instructions:

1. Go to telegra.ph
2. Learn the mechanics: type test into all the fields and press Publish. Mine once came out like this . This means that my post was the sixteenth post titled test that day (June 14th).
3. You have an epiphany.
4. You start traveling in time and peeking at others like you, who also wrote test in the title, just by changing the numbers. You can, for example, go back to January 20 and accidentally meet a kindred spirit .

Source: https://habr.com/ru/post/333344/

