It is no secret that in Russia the automated system "Revizor" ("Auditor") monitors how blocking of the registry of prohibited information is enforced. How it works is well described in this article on Habr; the picture below comes from the same place:

The "Agent Revizor" module is installed directly at the provider:
The “Agent Revizor” module is a structural element of the automated system “Revizor” (AS “Revizor”). This system is designed to monitor whether telecom operators fulfill the access restriction requirements established by Articles 15.1-15.4 of the Federal Law of July 27, 2006 No. 149-FZ “On Information, Information Technologies and Information Protection”.
The main purpose of creating Revizor is to ensure that telecom operators comply with the requirements established by Articles 15.1-15.4 of the Federal Law No. 149-FZ dated July 27, 2006 “On Information, Information Technologies and Information Protection” with respect to detecting access to prohibited information and obtaining supporting materials (data) on violations of the requirement to restrict access to it.
Given that many providers, if not all, have installed this device, the result should be a large network of probes like RIPE Atlas, or even larger, but with closed access. However, a beacon is a beacon: it sends signals in all directions. What if we catch them and see what we have caught, and how much?
Before counting, let's see why this is possible at all.
A bit of theory
Agents check resource availability in several ways, including HTTP(S) requests like this one:
TCP, 14678 > 80, "[SYN] Seq=0"
TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
TCP, 14678 > 80, "[ACK] Seq=1 Ack=1"
HTTP, "GET /somepage HTTP/1.1"
TCP, 80 > 14678, "[ACK] Seq=1 Ack=71"
HTTP, "HTTP/1.1 302 Found"
TCP, 14678 > 80, "[FIN, ACK] Seq=71 Ack=479"
TCP, 80 > 14678, "[FIN, ACK] Seq=479 Ack=72"
TCP, 14678 > 80, "[ACK] Seq=72 Ack=480"
In addition to the payload, the request also includes the connection setup phase (the SYN and SYN-ACK exchange) and the connection teardown phase (FIN-ACK).
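To work with traces like the one above programmatically, each line first has to be turned into a structured packet. Below is a minimal sketch that parses the trace notation used in this article (not a pcap format) and reports which phases of a TCP session are present. The names `parse_line` and `phases` are hypothetical, and the regular expression covers only the line shapes shown here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches trace lines such as:
#   TCP, 14678 > 80, "[SYN, ACK] Seq=0 Ack=1"
#   TTL 50, TCP, 14678 > 80, "[SYN] Seq=0"
#   TCP Retransmission, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
LINE_RE = re.compile(
    r'^(?:TTL (?P<ttl>\d+), )?TCP(?: Retransmission)?, '
    r'(?P<sport>\d+) > (?P<dport>\d+), "\[(?P<flags>[A-Z, ]+)\]'
)


@dataclass
class Packet:
    sport: int
    dport: int
    flags: frozenset
    ttl: Optional[int] = None


def parse_line(line: str) -> Optional[Packet]:
    """Parse one TCP trace line; HTTP summary lines and anything else give None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    flags = frozenset(f.strip() for f in m["flags"].split(","))
    ttl = int(m["ttl"]) if m["ttl"] else None
    return Packet(int(m["sport"]), int(m["dport"]), flags, ttl)


def phases(packets: List[Packet]) -> dict:
    """Report which phases of a TCP session appear among the parsed packets."""
    flagsets = [p.flags for p in packets]
    return {
        # setup: a bare SYN followed (somewhere) by a SYN-ACK
        "setup": frozenset({"SYN"}) in flagsets and frozenset({"SYN", "ACK"}) in flagsets,
        # teardown: at least one FIN was seen
        "teardown": any("FIN" in f for f in flagsets),
        # reset: the session was torn down with RST
        "reset": any("RST" in f for f in flagsets),
    }
```

HTTP summary lines are skipped on purpose: for counting sources, the TCP flags carry most of the signal.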
The registry of prohibited information contains several types of blocking. Obviously, if a resource is blocked by IP address or by domain name, we will not see any requests. These are the most destructive types of blocking: they make unavailable either all resources on one IP address or all information on a domain. There is also blocking "by URL". In this case, the filtering system must parse the HTTP request header to determine exactly what to block. And, as shown above, the connection setup phase has to happen before that header is sent. This phase can be tracked, since the filter will most likely let it through.
To do this, I picked a suitable domain that was free for registration, with blocking type "by URL" and plain HTTP, to make the filtering system's job easier, and preferably one abandoned long ago, to minimize traffic from anyone other than the Agents. This turned out to be not difficult at all: the registry of prohibited information contains plenty of free domains for every taste. So the domain was acquired and pointed at the IP addresses of a VPS running tcpdump, and the counting began.
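The capture setup itself is short. Below is a minimal sketch that only builds a tcpdump command line. The interface name, the address 203.0.113.10 (a documentation address) and the file name are placeholders, and the filter "host … and tcp port 80" is an assumption for an HTTP-only resource. `-n` disables name resolution, `-i` selects the interface and `-w` writes raw packets to a pcap file, so that TTL and flags can be analyzed later.

```python
import shlex
from typing import List


def tcpdump_argv(interface: str, server_ip: str, pcap_path: str) -> List[str]:
    """Build a tcpdump invocation that records HTTP traffic to the VPS.

    All three arguments are placeholders for your own setup.
    """
    bpf_filter = f"host {server_ip} and tcp port 80"
    return [
        "tcpdump",
        "-n",             # do not resolve addresses to names
        "-i", interface,  # capture interface
        "-w", pcap_path,  # save raw packets for offline analysis
        bpf_filter,
    ]


if __name__ == "__main__":
    argv = tcpdump_argv("eth0", "203.0.113.10", "revizor.pcap")
    print(shlex.join(argv))
    # To actually start capturing (requires root):
    # import subprocess; subprocess.run(argv, check=True)
```

Writing to a file rather than printing to the terminal keeps full headers, which the TTL comparisons further on depend on.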
Audit of "Auditors"
I expected to see periodic bursts of requests, which in my opinion would indicate a controlled process. I can't say I didn't see them at all, but there was definitely no clear picture:

This is not surprising: even an unneeded domain on a never-used IP receives a lot of unsolicited traffic, that is the modern Internet. Fortunately, I only needed requests for one specific URL, so all the scanners and password brute-forcers were quickly identified. It was also enough to see where the flood was within the mass of similar requests. Then I counted how often each IP address occurred and went through the top of the list manually, separating out those that had slipped through the previous stages. Additionally, I cut out all sources that sent only a single packet; there were not many of them. The result was this:
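The automatable part of the filtering described above can be sketched as follows. This is an illustration, not the exact procedure: it assumes sessions have already been reduced to (source, requested path, packet count) records, treats any source that requested another path as a scanner, and keeps sessions without a visible GET, since the filter may have cut it. Flood detection and the manual pass over the top of the list are not modeled. `Session` and `candidate_agents` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Session(NamedTuple):
    src: str             # source IP address
    path: Optional[str]  # requested path; None if no GET was seen (e.g. filtered)
    packets: int         # packets seen in this session


def candidate_agents(sessions: Iterable[Session], target_path: str) -> List[Tuple[str, int]]:
    sessions = list(sessions)
    # 1. Any source that asked for a different path is treated as a scanner.
    scanners = {s.src for s in sessions if s.path is not None and s.path != target_path}
    # 2. Sum packets per remaining source and drop sources that sent a single packet.
    packets_per_src = Counter()
    for s in sessions:
        if s.src not in scanners:
            packets_per_src[s.src] += s.packets
    single_packet = {src for src, n in packets_per_src.items() if n == 1}
    # 3. Frequency of occurrence per address, most frequent first, for manual review.
    counts = Counter(
        s.src for s in sessions if s.src not in scanners and s.src not in single_packet
    )
    return counts.most_common()
```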

A small lyrical digression. A little more than a day later, my hosting provider sent a rather vaguely worded letter saying that a resource from the RKN prohibited list was hosted on their facilities and had therefore been blocked. At first I thought my account had been blocked; it had not. Then I thought I was just being warned about what I already knew. But it turned out that the hoster had switched on its own filter in front of my domain, so I ended up under double filtering: by the providers and by the hoster. The filter passed only the tails of the requests, FIN-ACK and RST, cutting off all HTTP to the banned URL. As the graph above shows, after the first day I began to receive less data, but I still received it, which was quite enough for counting the sources of requests.
Now to the point. In my opinion, two bursts are clearly visible every day: the first, smaller one just after midnight Moscow time, and the second closer to 6 am, with a tail until 12 noon. The peaks do not occur at exactly the same time every day. At first I wanted to single out the IP addresses that appeared only during these periods, and each of them in every period, assuming that the Agents run their checks periodically. But on closer inspection I quickly found periodic sources falling into other intervals, with different frequencies, as often as once an hour. Then I thought about time zones and what they might imply, and then that the system might not be globally synchronized at all. In addition, NAT will certainly play its part: the same Agent can make requests from different public IPs.
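The daily bursts and per-source periods discussed above can be checked with two small helpers, sketched below under the assumption that session start times are available as Unix timestamps. Moscow time is modeled as a fixed UTC+3 offset, which matches Moscow time since 2014. Using the median gap between consecutive sessions as a period estimate is my choice here, not necessarily how the graph was made.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from statistics import median
from typing import Iterable, Optional

MSK = timezone(timedelta(hours=3))  # fixed UTC+3, Moscow time since 2014


def hourly_histogram(timestamps: Iterable[float]) -> Counter:
    """Count session starts per hour of day, in Moscow time."""
    return Counter(datetime.fromtimestamp(ts, tz=MSK).hour for ts in timestamps)


def median_interval(timestamps: Iterable[float]) -> Optional[float]:
    """Median gap in seconds between consecutive sessions of one source.

    Returns None when the source was seen fewer than two times.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        return None
    return median([b - a for a, b in zip(ts, ts[1:])])
```

A source polling hourly would show a median interval near 3600 seconds, while a once-a-day schedule would be near 86400.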
Since that was not exactly my original goal, I simply counted all addresses caught over a week and got 2791. The number of TCP sessions established from one address is 4 on average, with a median of 2. Top sessions per address: 464, 231, 149, 83, 77. The 95th percentile is 8 sessions per address. The median is not very high: recall that the graph shows a clear daily periodicity, so one could expect something around 4 to 8 over 7 days. If we discard all addresses seen in only one session, the median becomes exactly 5. But I could not exclude them by any clear criterion. On the contrary, a random check showed that they are related to requests for the prohibited resource.
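The statistics quoted above can be reproduced from a mapping of address to session count. Below is a minimal sketch; it reads "the maximum of 95% of the sample" as a nearest-rank 95th percentile, which is an assumption about how the figure was computed.

```python
from statistics import mean, median
from typing import Dict


def summarize(sessions_per_address: Dict[str, int]) -> dict:
    """Summary statistics over {address: number of TCP sessions}."""
    counts = sorted(sessions_per_address.values())
    if not counts:
        raise ValueError("no addresses")
    n = len(counts)
    # Nearest-rank 95th percentile: the smallest value covering at least 95%
    # of addresses. (95 * n + 99) // 100 is ceil(0.95 * n) in integer arithmetic.
    rank = (95 * n + 99) // 100
    repeated = [c for c in counts if c > 1]  # addresses seen in more than one session
    return {
        "addresses": n,
        "mean": mean(counts),
        "median": median(counts),
        "top5": counts[::-1][:5],
        "p95": counts[rank - 1],
        "median_repeated": median(repeated) if repeated else None,
    }
```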
Addresses are one thing, but the Internet is made of autonomous systems (AS): there are 1510 of them, with an average of 2 addresses per AS and a median of 1. Top addresses per AS: 288, 77, 66, 39, 27. The 95th percentile is 4 addresses per AS. Here the median is expected: one Agent per provider. The top is also expected: it contains the big players. In a large network, Agents should probably be located in each region where the operator is present, and let's not forget about NAT. By country, the largest groups of AS are: 1409 in RU, 42 in UA, 23 in CZ, and 36 from regions outside the RIPE NCC. Requests from outside Russia draw attention. They can probably be explained by geolocation errors or by registrar errors when filling in the data. Or a Russian company may have foreign roots or a foreign office, since it is naturally easier to deal with RIPE NCC, a foreign organization, through a foreign entity. Some part is undoubtedly superfluous, but it is genuinely hard to separate out: the resource is blocked, from the second day it is double-blocked, and most sessions consist only of an exchange of a few service packets. Let's agree that this part is small.
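Grouping addresses by autonomous system needs a prefix-to-AS table, for example one built from a routing table or RIR data. The sketch below assumes such a table is already available as a dictionary (the prefixes and ASNs in the usage are documentation values) and performs a simple longest-prefix match with the standard `ipaddress` module. The country of each AS would come from the same registration data and is not modeled here.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def build_table(prefix_to_asn: Dict[str, int]) -> List[Tuple[ipaddress._BaseNetwork, int]]:
    """Turn {"198.51.100.0/24": 64500, ...} into a list sorted longest prefix first."""
    table = [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_asn.items()]
    table.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return table


def lookup_asn(table, ip: str) -> Optional[int]:
    """Longest-prefix match: the first matching network in the sorted table wins."""
    addr = ipaddress.ip_address(ip)
    for net, asn in table:
        if addr.version == net.version and addr in net:
            return asn
    return None


def addresses_per_as(table, addresses: Iterable[str]) -> Counter:
    """Count caught addresses per AS; addresses without a known AS are skipped."""
    counts = Counter()
    for ip in addresses:
        asn = lookup_asn(table, ip)
        if asn is not None:
            counts[asn] += 1
    return counts
```

A linear scan is fine for a few thousand addresses; a real routing table would call for a radix tree.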
These numbers can already be compared with the number of providers in Russia.
According to the license registry, there are 6387 licenses for data communication services excluding voice, but this is a heavily inflated upper bound: not all of these licenses belong to Internet providers that need to install an Agent. In the RIPE NCC zone, a similar number of AS is registered in Russia, 6230, and not all of them belong to providers.
UserSide made a more rigorous count and got 3940 companies in 2017, and this is also closer to an upper estimate. In any case, the number of AS I observed is two and a half times smaller. But it should be understood that an AS is not strictly equal to a provider. Some providers do not have their own AS, some have more than one. If we assume that Agents are installed everywhere, then some providers filter more strictly than others, so their requests are indistinguishable from garbage, if they arrive at all. But for a rough estimate this is quite acceptable, even if something was lost due to my mistakes.
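For reference, dividing each provider estimate by the 1510 AS actually observed gives the ratio quoted above and the looser ones:

$$\frac{3940}{1510} \approx 2.61, \qquad \frac{6230}{1510} \approx 4.13, \qquad \frac{6387}{1510} \approx 4.23$$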
About DPI
Even though my hosting provider turned on its filter from the second day, the data from the first day suggest that blocking works successfully. Only 4 sources managed to break through and fully complete their HTTP and TCP sessions (as in the example above). Another 460 managed to send GET, but the session was immediately terminated by RST. Pay attention to the TTL:
TTL 50, TCP, 14678 > 80, "[SYN] Seq=0"
TTL 64, TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
TTL 50, TCP, 14678 > 80, "[ACK] Seq=1 Ack=1"
HTTP, "GET /filteredpage HTTP/1.1"
TTL 64, TCP, 80 > 14678, "[ACK] Seq=1 Ack=294"
Variations are possible: fewer RSTs or more retransmissions, which also depends on what the filter sends to the source host. In any case, this is the most reliable pattern, and it shows clearly that the forbidden resource was requested. In addition, every such session contains a reply whose TTL is larger than in the preceding and following packets.
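The TTL observation above suggests a simple heuristic for spotting packets injected by a filter. The sketch assumes that the TTL values of packets arriving from the client address are listed in arrival order and that genuine client packets dominate the session, so the most common TTL serves as the baseline. Anything with a larger TTL is flagged as likely coming from a device closer to the server than the Agent. The function name is hypothetical.

```python
from collections import Counter
from typing import List


def injected_packets(client_ttls: List[int]) -> List[int]:
    """Return indices of packets whose TTL exceeds the session's usual client TTL.

    client_ttls: TTLs of packets received *from the client address*, in order.
    """
    if not client_ttls:
        return []
    # The most frequent TTL is assumed to belong to the genuine client.
    baseline, _ = Counter(client_ttls).most_common(1)[0]
    return [i for i, ttl in enumerate(client_ttls) if ttl > baseline]
```

For example, `[50, 50, 50, 61]` flags index 3: a packet that traveled fewer hops than the Agent's own packets.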
From the rest, GET is not even visible:
TTL 50, TCP, 14678 > 80, "[SYN] Seq=0"
TTL 64, TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
Or like this:
TTL 50, TCP, 14678 > 80, "[SYN] Seq=0"
TTL 64, TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
TTL 50, TCP, 14678 > 80, "[ACK] Seq=1 Ack=1"
If something arrives from the filter, the difference in TTL is always visible. But often nothing arrives at all:
TCP, 14678 > 80, "[SYN] Seq=0"
TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
TCP Retransmission, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
...
Or like this:
TCP, 14678 > 80, "[SYN] Seq=0"
TCP, 80 > 14678, "[SYN, ACK] Seq=0 Ack=1"
TCP, 14678 > 80, "[ACK] Seq=1 Ack=1"
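The session shapes listed above can be sorted into buckets automatically. Below is a minimal sketch, assuming each packet is recorded with its direction, its flags in the same notation as the traces, and the first line of any HTTP payload. The category names are my own labels for the patterns described in the text, and `Pkt` and `classify` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Pkt(NamedTuple):
    from_client: bool              # True if the packet came from the requesting host
    flags: str                     # TCP flags as in the traces, e.g. "SYN, ACK"
    payload: Optional[str] = None  # first line of HTTP payload, if any


def flagset(p: Pkt) -> frozenset:
    return frozenset(f.strip() for f in p.flags.split(","))


def classify(session: List[Pkt]) -> str:
    """Map a captured session onto the patterns described in the text."""
    got_get = any(p.from_client and p.payload and p.payload.startswith("GET ") for p in session)
    has_rst = any("RST" in flagset(p) for p in session)
    client_fin = any(p.from_client and "FIN" in flagset(p) for p in session)
    server_fin = any(not p.from_client and "FIN" in flagset(p) for p in session)
    handshake_acked = any(p.from_client and flagset(p) == {"ACK"} for p in session)
    saw_syn = any(p.from_client and flagset(p) == {"SYN"} for p in session)

    if got_get and has_rst:
        return "GET, then RST"        # the most reliable sign of filtering
    if got_get and client_fin and server_fin:
        return "complete"             # the request got through the filter
    if got_get:
        return "GET, then silence"
    if handshake_acked:
        return "handshake only"       # SYN, SYN-ACK, ACK and nothing else
    if saw_syn:
        return "SYN/SYN-ACK only"     # possibly with SYN-ACK retransmissions
    return "other"
```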
And all of this repeats again and again, not just once but every day, as the graph shows.
About IPv6
Here there is good news. I can say for sure that periodic requests to the forbidden resource come from 5 different IPv6 addresses, exactly the Agent behavior I expected. One of these IPv6 addresses is not filtered, and I see a full session. From two other addresses I saw only one unfinished session each: one was interrupted by an RST from the filter, the other by a timeout. In total, only 7.
Since there are few addresses, I examined all of them in detail, and it turned out that only 3 of them belong to providers; they deserve a standing ovation! Another address is cloud hosting in Russia (it does not filter), and another is a research center in Germany (there is a filter; where does it come from?). Why they check the availability of prohibited resources is a good question. The remaining two made a single request each and are outside Russia, and one of them is filtered (in transit, perhaps?).
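When counting IPv4 and IPv6 sources separately, textual IPv6 forms should be normalized first, otherwise one address written as "2001:db8::1" and "2001:0db8:0:0:0:0:0:1" is counted twice. Below is a minimal sketch using the standard `ipaddress` module, which compares addresses by value. The function name is hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def unique_by_version(addresses: Iterable[str]) -> Counter:
    """Deduplicate addresses (normalizing IPv6 notation) and count them per IP version."""
    unique = {ipaddress.ip_address(a) for a in addresses}
    return Counter(addr.version for addr in unique)
```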
Blocking and Agents are a big brake on IPv6, whose deployment is not moving very fast as it is. That is sad. Those who have solved this task can rightly be proud of themselves.
Finally
Forgive me, I did not aim for 100% accuracy; I hope someone will want to repeat this work more precisely. It was important for me to understand whether such an approach works in principle. The answer is yes. I think the figures obtained are quite reliable as a first approximation.
What else could be done, and what I was too lazy to do, is counting DNS queries. They are not filtered, but they do not give much precision, since they work only at the domain level, not for the full URL. The periodicity should still be visible. Combined with what is seen directly in the requests, this would make it possible to separate out the superfluous and get more information. It might even be possible to identify the DNS resolvers used by providers, and much more.
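One way to combine DNS data with the HTTP capture is sketched below. It is only an illustration, under two assumptions: you run the authoritative DNS server for the domain, so its query log gives timestamps of queries for the domain, and the record's DNS TTL is short, so a check by an Agent shows up as a query shortly before its SYN. A SYN that follows a query within a small window (10 seconds here, chosen arbitrarily) marks its source as more likely to be an Agent. The function name is hypothetical.

```python
import bisect
from typing import Iterable, Set, Tuple


def sources_preceded_by_dns(dns_times: Iterable[float],
                            syns: Iterable[Tuple[float, str]],
                            window: float = 10.0) -> Set[str]:
    """Return source IPs whose SYN arrived within `window` seconds after a DNS query.

    dns_times: timestamps of queries for the domain at the authoritative server.
    syns: (timestamp, source IP) pairs of incoming SYN packets.
    """
    dns_sorted = sorted(dns_times)
    matched = set()
    for t, src in syns:
        # Index of the first query strictly after t; the one before it is
        # the latest query at or before the SYN.
        i = bisect.bisect_right(dns_sorted, t)
        if i > 0 and t - dns_sorted[i - 1] <= window:
            matched.add(src)
    return matched
```

Resolver caching weakens this link, so a source that never matches is not necessarily superfluous.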
I did not expect the hoster to turn on its filter for my VPS as well. Maybe this is common practice: after all, RKN sends the hoster a request to remove the resource. But it did not upset me, and in a way it even turned out to be useful. The filter worked very effectively, cutting off all correct HTTP requests to the forbidden URL, but not the incorrect ones that had already passed through the providers' filters, even if only as endings: FIN-ACK and RST. A minus and a minus make a plus. By the way, the hoster does not filter IPv6. Of course, this affected the quality of the collected material, but it still made it possible to see the periodicity. This turned out to be an important point: when choosing where to host resources, do not forget to ask how the hoster handles the list of prohibited sites and requests from RKN.
At the beginning, I compared AS "Revizor" with RIPE Atlas. The comparison is justified, and a large network of Agents can be useful. For example, to assess how well a resource is reachable from different providers in different parts of the country. You can measure delays, build graphs, analyze all of it, and see changes occurring both locally and globally. This is not the most direct way, but astronomers use "standard candles", so why not use Agents? Knowing (or discovering) their standard behavior, one can detect changes occurring around them and how these affect the quality of the services provided. And there is no need to place probes on the network yourself: Roskomnadzor has already done it.
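Delays are measurable from the server side without the Agents' cooperation. The time between sending a SYN-ACK and receiving the client's ACK approximates one round trip to the Agent. This is the sketch's assumption, and it also holds only if no middlebox answers on the Agent's behalf. The sketch takes per-session (source, SYN-ACK time, ACK time) triples, with the ACK time set to None when the handshake never completed, and returns a median RTT in milliseconds per source. With retransmitted SYN-ACKs, the caller should pass the time of the last one.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Optional, Tuple


def rtt_by_source(handshakes: Iterable[Tuple[str, float, Optional[float]]]) -> Dict[str, float]:
    """Median handshake RTT in milliseconds per source address."""
    samples = defaultdict(list)
    for src, synack_ts, ack_ts in handshakes:
        if ack_ts is None or ack_ts < synack_ts:
            continue  # handshake not completed: nothing to measure
        samples[src].append((ack_ts - synack_ts) * 1000.0)
    return {src: median(values) for src, values in samples.items()}
```

Tracking these medians day by day for each Agent gives the "standard candle" baseline: a sudden shift points to a change in the network around that Agent.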
One more thing I want to touch on: every tool can be a weapon. AS "Revizor" is a closed network, but the Agents give away their insides by sending requests for every resource on the prohibited list. Obtaining such a resource is no problem at all. Altogether, through their Agents, providers unwittingly reveal much more about their networks than they probably would like to: types of DPI and DNS, the location of the Agent (central hub or service network?), network markers of delays and losses, and these are only the most obvious. Just as someone can monitor Agents to improve the availability of their own resources, someone else can do it for other purposes, and nothing prevents this. The result is a double-edged and very versatile tool, as anyone can see.
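As one example of such network markers, the hop distance to a sender can be guessed from the observed TTL. This relies on the common heuristic that operating systems start from one of a few initial TTLs (32, 64, 128 or 255). Applied to the traces above, TTL 50 suggests an initial 64 and about 14 hops to the Agent, while a larger TTL on an injected packet puts the filter correspondingly closer. The heuristic can be wrong whenever a device uses an unusual initial TTL.

```python
from typing import Tuple

# Initial TTLs commonly used by operating systems and network devices.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def estimate_hops(observed_ttl: int) -> Tuple[int, int]:
    """Return (assumed initial TTL, estimated hop count) for an observed TTL.

    Assumes the sender used the smallest common initial TTL not below the observed one.
    """
    if not 0 <= observed_ttl <= 255:
        raise ValueError("TTL must be in 0..255")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise AssertionError("unreachable: 255 covers every valid TTL")
```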