
How we developed a monitoring system for 55,000 RTP video streams

Good day!

Recently I read a very interesting article about processing 50 gigabit/s on a server and remembered that I had a draft, written about a year ago, on how we developed a system for monitoring video streams with a total traffic volume of up to 100 Gbit/s. I re-read it and decided to publish it for fellow developers. The article is devoted more to the analysis of protocols and the search for an architectural solution than to the tuning of various Linux subsystems, because we took the path of distributing the load between the server and network probes connected to the 10 Gigabit Ethernet traffic links.


If you are interested in how we managed to measure the characteristics of network streams from 55 thousand video cameras, read on.



In this article I plan to cover:


What are we monitoring?



We need to monitor several 10G Ethernet transport links carrying tens of thousands of video streams. The first installation has 22 thousand cameras, the second 55 thousand. The average bit rate of a camera is 1 Mbit/s; there are also cameras at 2 Mbit/s and 500 kbit/s.

The video is transmitted using RTP-over-UDP and RTP-over-RTSP-over-TCP, with the connection established via RTSP. From one IP address there can be either one stream (one address, one camera) or several (one address, one encoder, that is, from 1 to 16 streams).

Connection to the Ethernet links is possible only in monitoring mode, using optical taps, in other words in a non-intrusive mode. Such a connection is preferable because the traffic does not pass through our equipment and therefore cannot affect the quality of the services provided in any way (the drop in optical signal level on the splitter is considered negligible). For operators this is an extremely important argument. For developers, an important nuance follows from this kind of connection: we always have to watch the flows "from the side", since packets cannot be transmitted into the network (for example, we cannot send a ping and get a response). This means we have to work with incomplete information.

What are we measuring?



The stream quality assessment is based on the analysis of the RTP transport headers and the h.264 NAL-unit headers. Image quality is not measured. Instead, the transport stream of video frames is analyzed by the following criteria:


RTP can run both over UDP and (mostly, in about 90% of cases) over RTSP/TCP in the "interleaved data" mode. Yes, despite the fact that the RTSP RFC says it is better not to use interleaved mode (see section 10.12 of RFC 2326).

To sum up: the monitoring system is a complex connected in non-intrusive mode to some number of 10-Gigabit Ethernet links, which continuously "watches" all the RTP video streams present in the traffic and takes measurements at a certain interval in order to save them to a database. Based on the data in the database, reports are regularly generated for all cameras.

And what's so complicated?



In the process of searching for a solution, several problems were identified right away:


Looking for a suitable solution...



Naturally, we sought to make the most of our own experience. By the time we made the decision, we already had an implementation of Ethernet packet processing on the FPGA-based Bercut-MX device (MX for short). With the help of Bercut-MX we could extract the fields needed for analysis from the Ethernet packet headers. We had no experience of handling such traffic volumes with "ordinary" servers, so we approached that option with some caution...

It would seem that all that remained was to apply this approach to RTP packets and the golden key would be in our pocket, but MX can only process traffic; it has no facilities for accounting and storing statistics. There is not enough memory in the FPGA to store the discovered connections (IP-IP-port-port combinations), because in the 2x10-gigabit link arriving at each input there can be about 15 thousand video streams, and for each of them counters have to be kept (the number of lost packets, and so on)... Moreover, lookups at that speed and over that amount of data, under the condition of lossless processing, become a non-trivial task.

To find a solution, we had to “dig a little deeper” and figure out what algorithms we would use to measure quality and identify video streams.

What can be measured by the fields of the RTP packet?



The format of the RTP packet is described in RFC 3550.
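To make it concrete, here is a minimal Python sketch (purely illustrative, not the code running on the MX or the server) of parsing the fixed 12-byte RTP header exactly as it is laid out in RFC 3550:

    import struct

    def parse_rtp_header(data: bytes) -> dict:
        # Fixed 12-byte RTP header, RFC 3550 section 5.1.
        if len(data) < 12:
            raise ValueError("too short to be an RTP packet")
        b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
        return {
            "version": b0 >> 6,            # must be 2 for RTP
            "padding": (b0 >> 5) & 0x1,
            "extension": (b0 >> 4) & 0x1,
            "csrc_count": b0 & 0x0F,
            "marker": b1 >> 7,             # M-bit: marks the end of a video frame
            "payload_type": b1 & 0x7F,     # dynamic types occupy 96..127
            "sequence": seq,               # used for loss and reordering
            "timestamp": ts,               # used for jitter and frame rate
            "ssrc": ssrc,                  # identifier of the stream source
        }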



From the description it is clear that, in terms of quality measurements, we are interested in the following fields of the RTP packet:


Obviously, the sequence number allows you to determine the following stream parameters:


The timestamp allows you to measure:


Finally, the M-bit allows you to measure the frame rate. True, SPS/PPS NAL units of the h.264 stream introduce an error, since they are not video frames, but it can be compensated for by using the information from the NAL-unit header, which always follows the RTP header.

Detailed algorithms for measuring the parameters are beyond the scope of this article, so I will not go deeper. If you are interested, RFC 3550 has example loss-calculation code and formulas for calculating jitter. The main conclusion is that only a few fields from the RTP packets and NAL units are enough to measure the basic characteristics of the transport stream. The rest of the information is not involved in the measurements, so it can and should be discarded!
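To give a feel for what those calculations involve, here is a rough Python sketch of the loss counter and the interarrival jitter estimate as defined in RFC 3550; the 90 kHz clock rate is the usual value for video payloads, assumed here for illustration:

    class RtpStreamStats:
        # Per-stream counters following the RFC 3550 definitions (sketch only:
        # sequence-number wraparound and reordering are not handled).
        def __init__(self, clock_rate=90000):      # 90 kHz is typical for video
            self.clock_rate = clock_rate
            self.jitter = 0.0
            self.prev_transit = None
            self.base_seq = None
            self.max_seq = None
            self.received = 0

        def on_packet(self, seq, rtp_timestamp, arrival_time_s):
            # Loss accounting: expected = highest seq - first seq + 1.
            if self.base_seq is None:
                self.base_seq = self.max_seq = seq
            elif seq > self.max_seq:
                self.max_seq = seq
            self.received += 1

            # Interarrival jitter, RFC 3550: J(i) = J(i-1) + (|D(i-1,i)| - J(i-1)) / 16
            transit = arrival_time_s * self.clock_rate - rtp_timestamp
            if self.prev_transit is not None:
                d = abs(transit - self.prev_transit)
                self.jitter += (d - self.jitter) / 16.0
            self.prev_transit = transit

        def lost(self):
            expected = self.max_seq - self.base_seq + 1
            return max(0, expected - self.received)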



How to identify RTP streams?



To keep statistics, information obtained from the RTP header must be “tied” to some camera identifier (video stream). The camera can be uniquely identified by the following parameters:


Interestingly, at first we identified cameras only by source IP and SSRC, relying on the fact that the SSRC should be random, but in practice it turned out that many cameras set the SSRC to a fixed value (say, 256). Apparently, this is done to save resources. As a result, we had to add the ports to the camera identifier as well, and that solved the uniqueness problem completely.
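In code, such a camera identifier is just a composite key; a small sketch (the field names are my own, for illustration) of keeping per-stream statistics against it:

    from collections import namedtuple

    # Composite stream key: SSRC alone turned out not to be unique in practice,
    # so the source IP and the port pair are part of the key as well.
    StreamKey = namedtuple("StreamKey", ["src_ip", "src_port", "dst_port", "ssrc"])

    stats = {}  # StreamKey -> per-stream counters

    def stats_for(src_ip, src_port, dst_port, ssrc):
        key = StreamKey(src_ip, src_port, dst_port, ssrc)
        return stats.setdefault(key, {"received": 0, "lost": 0, "jitter": 0.0})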

How to separate RTP packets from the rest of the traffic?



The question remains: how does Bercut-MX, having received a packet, understand that it is RTP? Unlike IP, the RTP header has no explicit identification; it has no checksum, and it can be transmitted over UDP with port numbers that are chosen dynamically when the connection is established. In our case, most connections were established long ago, and you could wait a very long time for them to be re-established.

To solve this problem, RFC 3550 (Appendix A.1) recommends checking the RTP version bits (two bits) and the Payload Type (PT) field (seven bits), which for dynamic types takes values in a small range. We found in practice that for the variety of cameras we work with, the PT fits in the range from 96 to 100.

There is one more criterion, port parity (RTP ports are normally even), but as practice showed, it is not always respected, so we had to abandon it.

Thus, the Bercut-MX behaves as follows:
  1. receive a packet and parse it into fields;
  2. if the version is 2 and the payload type is within the specified limits, send the headers to the server.


Obviously, this approach yields false positives, since packets other than RTP can also match such simple criteria. But what matters to us is that we definitely will not miss an RTP packet, while the "wrong" packets will be filtered out by the server.

To filter out false positives, the server uses a mechanism that registers a source of video traffic only after several consecutively numbered packets have been received (the packet has a sequence number!). If several packets arrive with consecutive numbers, this is not a coincidence and we start working with that stream. This algorithm turned out to be very reliable.
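Putting both stages together, a simplified Python sketch of the filter might look like this (the confirmation threshold of 5 packets is an arbitrary illustration, not the value we actually use):

    # Stage 1: coarse check over the raw payload (what the FPGA does per packet).
    def looks_like_rtp(payload: bytes) -> bool:
        if len(payload) < 12:
            return False
        version = payload[0] >> 6
        payload_type = payload[1] & 0x7F
        return version == 2 and 96 <= payload_type <= 100

    # Stage 2: the server registers a stream only after several packets with
    # consecutive sequence numbers (threshold is illustrative).
    CONFIRM_AFTER = 5
    candidates = {}    # stream key -> (last_seq, consecutive_count)
    confirmed = set()  # stream keys accepted as real RTP streams

    def on_candidate_packet(key, seq):
        last_seq, count = candidates.get(key, (None, 0))
        count = count + 1 if last_seq is not None and seq == (last_seq + 1) & 0xFFFF else 1
        candidates[key] = (seq, count)
        if count >= CONFIRM_AFTER:
            confirmed.add(key)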

Moving on ...



Realizing that not all the information carried in the packets is needed for measuring quality and identifying streams, we decided to put all the high-load, time-critical work of receiving packets and extracting the RTP fields onto Bercut-MX, that is, onto the FPGA. It "finds" a video stream, parses the packet, keeps only the required fields and sends them to an ordinary server in a UDP tunnel. The server performs the measurements for each camera and saves the results to a database.

As a result, the server deals not with 50-60 Gbit/s, but with at most 5% of it (that is roughly the ratio of the forwarded data to the average packet size). In other words, with 55 Gbit/s at the input of the whole system, the server receives only about 3 Gbit/s!
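A quick back-of-the-envelope check (the per-packet sizes below are assumptions for illustration, not measured values) shows where the 5% comes from:

    # Rough estimate: only the extracted headers are forwarded to the server.
    avg_packet_bytes = 1400   # assumed average size of a video packet
    forwarded_bytes = 70      # assumed size of extracted headers plus tunnel overhead
    input_rate_gbps = 55

    share = forwarded_bytes / avg_packet_bytes      # ~0.05
    server_rate_gbps = input_rate_gbps * share      # ~2.75 Gbit/s
    print(f"{share:.0%} of the traffic -> {server_rate_gbps:.1f} Gbit/s at the server")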

As a result, we got this architecture:



And we got the first result in this configuration just two weeks after the initial technical requirements were set!

What is the server busy with?



So, what does the server do in our architecture? Its tasks:


Given that the total traffic at the server's input is about 3 Gbit/s, the server copes even without any DPDK, working simply through a Linux socket (after increasing the socket buffer size, of course). Moreover, new links and MXs can still be connected, since there is a performance margin left.
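To illustrate what "simply through a Linux socket" means here, a minimal sketch of a receiving process (the port number and buffer size are arbitrary examples, not our actual configuration):

    import socket

    # Plain UDP socket with an enlarged kernel receive buffer; the effective
    # limit also depends on the net.core.rmem_max sysctl.
    PORT = 5000   # illustrative port
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 64 * 1024 * 1024)
    sock.bind(("0.0.0.0", PORT))

    while True:
        data, addr = sock.recvfrom(2048)  # one tunneled bundle of headers from MX
        # ... parse the extracted RTP fields and update per-stream statistics ...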

Here is the output of top on the server (this is top from only one lxc container; reports are generated in another):



It shows that the load of computing quality parameters and statistics is distributed evenly across four processes. We achieved this distribution by using hashing in the FPGA: a hash function is computed over the IP addresses, and the low bits of the resulting hash determine the UDP port number to which the statistics are sent. Accordingly, each process, listening on its own port, receives approximately the same amount of traffic.
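The idea can be shown with a small sketch (the hash function and the port numbers here are illustrative; the FPGA uses its own hash over the IP addresses):

    import zlib

    # Low bits of an IP-address hash select the destination UDP port, so each
    # worker process (bound to one port) gets roughly the same share of streams.
    NUM_WORKERS = 4      # matches the four processes visible in top
    BASE_PORT = 5000     # illustrative base port

    def target_port(src_ip: bytes, dst_ip: bytes) -> int:
        h = zlib.crc32(src_ip + dst_ip)          # stand-in for the FPGA hash
        return BASE_PORT + (h & (NUM_WORKERS - 1))

    # Worker i simply binds to BASE_PORT + i and handles only "its" streams.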

Pros and cons



It is time to boast and admit the shortcomings of the solution.

I'll start with the pros:


To be fair, I will also list the disadvantages:


Summary



In the end, we have a hardware and software complex in which we control both the part that parses packets on the interfaces and the part that keeps the statistics. Full control over all the nodes of the system literally saved us when the cameras began switching to the RTSP/TCP interleaved mode, because in that case the RTP header is no longer located at a fixed offset in the packet: it can be anywhere, even split across the boundary of two packets (the first half in one, the second in the other). Accordingly, the algorithm for extracting the RTP header and its fields underwent dramatic changes: we had to do TCP reassembly on the server for all 50,000 connections, hence the rather high load seen in top.
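For reference, the framing that has to be recovered after TCP reassembly is defined in section 10.12 of RFC 2326: each interleaved block is a '$' byte, a one-byte channel id and a two-byte length, followed by the RTP (or RTCP) packet. A simplified Python extractor over an already reassembled byte stream might look like this (a real implementation would also have to cope with RTSP responses mixed into the same stream):

    import struct

    def extract_interleaved(buffer: bytearray):
        # Yield (channel, payload) frames from a reassembled RTSP/TCP byte stream.
        # RFC 2326, 10.12: 0x24 ('$'), channel, 16-bit big-endian length, payload.
        # Frames can start anywhere and span TCP segment boundaries, which is
        # exactly why reassembly has to happen first.
        while True:
            start = buffer.find(b"$")
            if start < 0 or len(buffer) - start < 4:
                return
            channel, length = struct.unpack_from("!BH", buffer, start + 1)
            if len(buffer) - start - 4 < length:
                return                                   # wait for more data
            payload = bytes(buffer[start + 4 : start + 4 + length])
            del buffer[: start + 4 + length]
            yield channel, payload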

We had never worked in the field of high-load applications before, but we managed to solve the problem thanks to our FPGA skills, and it turned out pretty well. There is even headroom left: for example, another 20-30 thousand streams could be connected to the system with 55,000 cameras.

I left the tuning of Linux subsystems (distributing interrupt queues, increasing receive buffers, pinning specific processes to cores, etc.) outside the article, since this topic is already very well covered.

I have not described everything, and we stepped on plenty of rakes along the way, so do not hesitate to ask questions :)

Many thanks to all who read to the end!

Links



Source: https://habr.com/ru/post/266561/

