Building a proxy checker worth 100 million in a couple of days

Passions boil every day around Roskomnadzor (RKN), an agency that has become rather well known in IT circles. Recently I stumbled upon an article that is not exactly fresh but not yet forgotten, about the agency's plans for the future. The point is simple: RKN announced a tender for the development of a “system for monitoring the blocking of sites”, hereinafter referred to as the “Auditor”.

So as not to go deep into the details of the task, I will pull a fragment of an interview with the head of the agency, Alexander Zharov, from the article:

“Four companies are participating in the tender. The requirements are simple but quite strict. The winner is whoever offers the lowest price and the best solution. The initial contract price is 100 million rubles.” According to the head of the supervisory authority, the Auditor system will allow real-time monitoring of how effectively prohibited sites are blocked.

The essence of the task itself (quote):

The essence of the probe, or program, is that it acts like a regular user, only faster: it sends a large number of requests across the entire list of prohibited resources and, from the number of responses, can tell the Roskomnadzor inspector whether a site is blocked.

Suppose that we are only interested in the final result: the RKN inspector wants to know whether a site is blocked by each of the providers.
In principle the task is perfectly clear, so let us try to find the “optimal solution”:

Let us hypothetically assume that the so-called “Auditor” is an ordinary proxy checker that fetches the server response through a specific proxy. What we need:

List of all providers
About 10 proxies from each provider
List of banned pages (information from the registry)

With these inputs, we can create a proxy checker that works in 3 stages:

Download the list of prohibited pages from the registry (there are not many of them, about 12,000 according to unofficial data)
Get the server response for each page through each of the providers' IP addresses
Generate and send the report
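
The three stages can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Auditor design: the registry is assumed to arrive as plain text with one URL per line, proxies are assumed to be plain HTTP proxies, and "blocked" is assumed to mean "anything other than HTTP 200", which would miss a provider stub page served with status 200. The names `parse_registry`, `is_blocked_via_proxy` and `run_check` are hypothetical.

```python
import urllib.request
from typing import Callable, Dict, List


def parse_registry(text: str) -> List[str]:
    """Stage 1: turn the registry dump into a list of URLs.

    Assumed format: one URL per line, blank lines ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_blocked_via_proxy(url: str, proxy: str, timeout: float = 10.0) -> bool:
    """Stage 2 for a single (url, proxy) pair.

    Assumption: any outcome other than HTTP 200 counts as "blocked".
    A dead proxy is therefore also reported as "blocked", so proxies
    should be health-checked separately in a real system.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status != 200
    except OSError:  # covers URLError, HTTPError and timeouts
        return True


def run_check(
    urls: List[str],
    proxies_by_provider: Dict[str, List[str]],
    is_blocked: Callable[[str, str], bool] = is_blocked_via_proxy,
) -> List[dict]:
    """Stages 2 and 3: check every URL through every proxy, collect rows."""
    report = []
    for provider, proxies in proxies_by_provider.items():
        for proxy in proxies:
            for url in urls:
                report.append({
                    "provider": provider,
                    "proxy": proxy,
                    "url": url,
                    "blocked": is_blocked(url, proxy),
                })
    return report
```

Passing `is_blocked` as a parameter keeps the loop testable without network access; a real run would simply use the default.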

Suppose we have an up-to-date list of all providers, and each provider undertakes to allocate one proxy in each of its regional data centers. It is then enough to loop over the entire list of prohibited pages from the registry for each of the provider's proxies and get a response for each regional data center. There is little point in filling the report with confirmations that a given page returns the expected response and is therefore blocked; it is the negative results that should go into the report, where you can clearly see which provider and which data center failed to restrict access to a prohibited resource. With such a report in hand, you can let your imagination run. The inspector will obviously see up-to-date information, but there is no need to stop there. If providers are required to supply not only the list of proxies for each data center but also the contact e-mail of the department responsible for processing the registry, each provider can be notified automatically about non-compliance with the technical requirements.
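
A report that keeps only the negative results, plus automatic notifications, might look like the sketch below. The row layout (`provider`, `data_center`, `url`, `blocked`), the function names and the sender address are assumptions made for illustration. The messages are only built here, not sent.

```python
from collections import defaultdict
from email.message import EmailMessage
from typing import Dict, Iterable, List, Tuple


def unblocked_by_provider(rows: Iterable[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Keep only the negative results: pages that were NOT blocked.

    Returns {provider: [(data_center, url), ...]}.
    """
    failures = defaultdict(list)
    for row in rows:
        if not row["blocked"]:
            failures[row["provider"]].append((row["data_center"], row["url"]))
    return dict(failures)


def build_notifications(
    failures: Dict[str, List[Tuple[str, str]]],
    contacts: Dict[str, str],
    sender: str = "auditor@example.org",  # hypothetical sender address
) -> List[EmailMessage]:
    """Compose one e-mail per provider that has unblocked pages."""
    messages = []
    for provider, items in sorted(failures.items()):
        if provider not in contacts:
            continue  # no contact e-mail on file for this provider
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = contacts[provider]
        msg["Subject"] = f"Registry check: {len(items)} page(s) not blocked"
        msg.set_content("\n".join(f"{dc}: {url}" for dc, url in items))
        messages.append(msg)
    return messages
```

Actual delivery could then go through `smtplib.SMTP.send_message`, given a mail server to talk to.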

In addition, most data centers of any given provider use the same registry-processing system. In that case either the provider fully meets Roskomnadzor's requirements, or a failure has occurred somewhere, and then the particular data center does not matter at all: I think the problem will be global, across the provider as a whole.

However, obliging every provider to hand over proxies looks like too sweeping a demand, primitive as it is; perhaps the Auditor's purpose is precisely to avoid that need. But if we are talking about saving labor, a bare list is not much help to the inspector, so in any case the provider has to be informed by means of the report. Accordingly, if up-to-date contact information is maintained for each provider, a signal can be sent to the appropriate department to fix the problem.

This is a laborious but rather primitive solution. There is no way around waiting until the providers compile a list of working proxy servers for RKN. But judging by the fact that “at the initial stage, 700 probes will be installed on the lines of the main telecom operators,” obliging a provider to hand over a proxy is the least of the troubles!

Now let us go back to the task and consider a jury-rigged (“collective farm”) but quick and even more primitive way of getting the desired result. Again, let me recall the task as stated above: “the RKN inspector wants to know whether a site is blocked by each of the providers”.

Let us imagine that we have a list of all available proxies in Russia, say 100,000 of them. We will need 4 stages:

Check each proxy to determine its provider
Build a list of unique providers
Check the response of the registry pages through each of the proxies
Generate the same report

Only the first item may seem unclear, so I will explain it in detail. Given a list of available proxies, we can easily identify the provider from the hostname of each IP address; besides, ready-made solutions for this already exist. For example, if you open the 2ip service, you will see information about your IP address, including the provider.

So determining the provider from an IP address is not difficult. Accordingly, even with a list of currently live proxy servers, two passes are enough: first poll each proxy to find out which provider it belongs to and group the proxies so that all duplicates of the same provider end up together, then check the registry against the largest possible list of unique providers, where each provider has its own group of proxies. The more popular the provider, the larger its proxy group. But all 12,000 prohibited pages have to be run through the proxies of every provider. For popular providers this is not a problem, since a large proxy group makes it quite realistic to spread the requests for all 12,000 pages evenly across it. But what about the providers for which, God forbid, only one live proxy is found? Running the entire list of 12,000 pages through it is not an option.
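
The grouping and the even spreading of pages described above can be sketched like this. Deriving the provider from the last two labels of the reverse-DNS hostname is a crude assumption made for illustration (a real system would more likely use WHOIS or ASN data), and the names `reverse_dns`, `provider_of`, `group_by_provider` and `spread_urls` are hypothetical.

```python
import socket
from collections import defaultdict
from itertools import cycle
from typing import Callable, Dict, List, Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for an IP address, or None if lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def provider_of(hostname: str) -> str:
    """Crude assumption: the last two DNS labels identify the provider."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(
    proxy_ips: List[str],
    resolve: Callable[[str], Optional[str]] = reverse_dns,
) -> Dict[str, List[str]]:
    """Stages 1 and 2: map each proxy to a provider and group duplicates."""
    groups = defaultdict(list)
    for ip in proxy_ips:
        hostname = resolve(ip)
        if hostname:  # proxies without a PTR record are skipped
            groups[provider_of(hostname)].append(ip)
    return dict(groups)


def spread_urls(urls: List[str], proxies: List[str]) -> Dict[str, List[str]]:
    """Distribute the registry pages round-robin over one provider's group."""
    plan = {proxy: [] for proxy in proxies}
    for url, proxy in zip(urls, cycle(proxies)):
        plan[proxy].append(url)
    return plan
```

With 12,000 pages and a group of N proxies, each proxy handles roughly 12,000 / N requests. For a provider with a single live proxy, `spread_urls` still assigns all 12,000 pages to it, which is exactly the weak spot noted above.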

P.S. In the end, using public proxies is quite a jury-rigged method; it only demonstrates that the idea can be implemented. Do not forget that in most cases publicly available proxies are part of someone's botnet, so RKN certainly cannot use such a proxy list either.

In fact, Roskomnadzor has planned a much more expensive and less optimal way of implementing the task. Still, if the question of whether blocking is actually enforced is so acute, it would be more reliable to oblige providers to allocate several proxies in their data centers, so that responses for any resource from the registry of prohibited sites can be checked for real. In addition, registry processing is not tied to a data center, and the registry-processing algorithm is most likely global for the entire provider. This means that failures will also be global, so we can limit ourselves to a list of providers without the extra trouble of covering every data center.

This solution looks fast and reliable enough, and most importantly there is no point in reinventing the wheel and calling it a probe. Besides, the provider will have to take part in implementing either option. It is easier to keep the algorithm up to date on your own side than to control something on the provider's side, and it will not take much to implement.

Source: https://habr.com/ru/post/268799/
