
Oh, I have a delay. Part 2

In the previous article, we talked about reducing delay on the video broadcasting side. Having sorted out the sending, let's now talk about the delivery.



The client has several places where video can accumulate and fall behind realtime:

- the network input buffer;
- the demuxer's internal synchronization buffer;
- the network speed compensation buffer;
- the player buffer;
- the decoder buffer;
- hardware delays on the last meter of delivery.

So once again it is buffers, buffers and more buffers. Let's take them in order.


Network input buffer


The client receives data through a network socket. TCP/IP sockets have kernel-side buffers that fill up when the application does not read data fast enough.

If the buffer is too small, the video broadcast will stutter constantly; if it is too large, the user has to wait for it to fill, and how long that takes is determined by the stream bitrate. In the worst case, a buffer of around a megabyte can accumulate 4-8 seconds of video.
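As a rough illustration (not tied to any particular player), here is how the kernel receive buffer can be inspected and capped on a plain TCP socket in Python; the 256 KB cap is an arbitrary example value, not a recommendation:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the kernel how large the receive buffer currently is.
# (Linux reports roughly double the requested value because it
# accounts for internal bookkeeping overhead.)
print("default SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

# Cap the receive buffer before connecting, trading smoothness
# for proximity to realtime. 256 KB is an arbitrary example.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
print("capped SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```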

If the buffer has filled up, realtime can be restored either by fast-forwarding playback or by discarding part of the data. In practice this almost never happens, since player developers keep postponing this functionality to future releases.
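The catch-up logic itself is not complicated. A minimal sketch of the "discard part of the data" variant, with hypothetical names and an example 2-second threshold (a real player would also have to handle audio and lip sync):

```python
from collections import deque

MAX_LAG_SECONDS = 2.0  # tolerated backlog; example value

def catch_up(frames: deque) -> None:
    """Drop buffered frames until the backlog is below MAX_LAG_SECONDS.

    `frames` holds (timestamp_s, is_keyframe, payload) tuples, oldest
    first. After dropping, we also skip ahead to the next keyframe so
    the decoder can resume from a clean reference.
    """
    while len(frames) > 1 and frames[-1][0] - frames[0][0] > MAX_LAG_SECONDS:
        frames.popleft()
    while frames and not frames[0][1]:
        frames.popleft()
```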

From the point of view of proximity to realtime, a large network buffer is evil. From the point of view of smooth playback of a TV channel, it is quite a reasonable thing, because the kernel works fast and well enough to keep its buffers filled.

Inspecting the buffer state on the client is awkward: on Linux you could simply run the ss utility (ss -tm shows Recv-Q and per-socket buffer memory), but digging out the same information on Android or Windows is harder.

Demuxer internal synchronization buffer


Different protocols love to pull audio and video apart in different ways. In MPEG-TS, for example, it is customary to pack 3-5 consecutive audio frames into one packet to reduce traffic overhead, so a frame flow like “VAVAAVAVAA” turns into “VVAAAAAVVAAAAVAAAVVVAAA”.

As a result, the timestamps arrive shuffled, and if you need them sorted downstream (critical, for example, when repackaging back into RTMP), you have to create a buffer that holds frames back. Such a buffer behaves especially “wonderfully” when the audio disappears completely, or disappears on one channel out of five.

In general, a complete loss is almost impossible to distinguish from a very long delay, so during development you end up writing code that has to “feel out” what actually happened.
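A minimal sketch of such a buffer, assuming millisecond timestamps and a made-up 1-second patience threshold; the class and parameter names are illustrative, not from any real demuxer:

```python
import heapq
import itertools

class ReorderBuffer:
    """Holds interleaved audio/video frames and releases them sorted
    by timestamp. A frame is held back until it is older than the
    newest frame seen by more than max_delay_ms; past that point,
    waiting longer cannot fix the ordering, which is also the only
    practical way to handle a track that has gone silent."""

    def __init__(self, max_delay_ms=1000):
        self.max_delay_ms = max_delay_ms
        self._seq = itertools.count()  # tie-breaker for equal timestamps
        self._latest = None
        self._heap = []                # (dts_ms, seq, track, payload)

    def push(self, dts_ms, track, payload):
        heapq.heappush(self._heap, (dts_ms, next(self._seq), track, payload))
        self._latest = dts_ms if self._latest is None else max(self._latest, dts_ms)
        ready = []
        while self._heap and self._latest - self._heap[0][0] > self.max_delay_ms:
            dts, _, trk, data = heapq.heappop(self._heap)
            ready.append((dts, trk, data))
        return ready
```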

Network speed compensation buffer


Here we are talking about the buffer that is built into HLS, or the buffer that used to be explicitly configurable in the RTMP player.

For example, an HLS player is organized roughly like this: segments are downloaded ahead of the playback position and queued up in a buffer.

Ideally 2-3 segments are kept in that buffer, and they can drain away if the network speed starts to sag.
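That scheme boils down to a very small control loop. A sketch under assumed names (fetch_next_segment, the 3-segment target and the 4-second segment duration are example values, not anyone's defaults):

```python
import time

TARGET_SEGMENTS = 3      # segments we try to keep buffered
SEGMENT_DURATION = 4.0   # seconds; in reality comes from the playlist

def buffering_loop(fetch_next_segment, buffer):
    """Keep roughly TARGET_SEGMENTS of media queued ahead of playback.

    `fetch_next_segment` downloads one segment (returning None when
    the playlist has nothing new yet); `buffer` is the list that the
    playback side consumes from the front. All names are illustrative.
    """
    while True:
        if len(buffer) < TARGET_SEGMENTS:
            segment = fetch_next_segment()
            if segment is not None:
                buffer.append(segment)
        else:
            # Buffer full: we are now roughly
            # TARGET_SEGMENTS * SEGMENT_DURATION seconds behind realtime.
            time.sleep(SEGMENT_DURATION / 2)
```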

A buffer to compensate for fluctuating network speed is built into every player. In RTMP this buffer is theoretically 0, in HLS it is theoretically something like 10 seconds, in RTSP players it is usually a genuine 0, and in SIP it is exactly 0.

We say “theoretically” because nothing really works at the boundary values. HLS players, for example, can become extremely unstable on 1-second segments, sometimes because of trivial bugs such as tracking segment duration in whole seconds instead of milliseconds.

And the Flash RTMP player with a zero buffer behaves simply “delightfully”: it accumulates about a second of delay per minute, and after a couple of hours it can die in terrible agony.

Player buffer


The playback mechanism is almost always separated from the mechanism that downloads video over whatever protocol. Frames are unpacked, stripped of their metadata, and handed to the player. Then the “strangeness” begins: both Flash and an MSE-based player may require more than one frame before they start playing.

When this happens, you start lagging behind again. The numbers here are around 2-5 frames, i.e. up to 200 ms at a typical 25 fps.
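The arithmetic behind that figure, as a sketch (the 25 fps rate is an assumption; the frame counts come from the range above):

```python
FPS = 25                          # assumed frame rate
FRAME_DURATION_MS = 1000 / FPS    # 40 ms per frame at 25 fps

def startup_delay_ms(min_start_frames: int) -> float:
    """Extra latency from a player that refuses to start playing
    before it has min_start_frames frames queued up."""
    return min_start_frames * FRAME_DURATION_MS

for n in (2, 5):
    print(n, "frames ->", startup_delay_ms(n), "ms")  # 80 ms .. 200 ms
```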

Decoder buffer


A good realtime decoder decodes the incoming stream immediately and hands frames out as soon as they are ready. Access units enter the buffer continuously, with the fill rate proportional to the bitrate of the encoded stream; they arrive at irregular moments, because coded pictures differ in size. Data is removed from the buffer at regular intervals equal to the frame period of the displayed image, and each access unit is removed whole and instantly.

If, for example, the startup delay of a new stream is much larger than the finishing delay of the old one, then after the last picture of the old stream has been played and removed from the buffer, it will take a long time before the first picture of the new stream is decoded and shown. This leads to the last picture of the old stream freezing on screen and a noticeable splice. If the bitrate of the new stream is much higher than that of the old one, the splice is even more noticeable, since the buffer overflows and part of the data is lost.
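This buffer model is easy to simulate. A toy sketch (all numbers are invented for illustration): bits arrive continuously at the stream bitrate, one whole access unit is removed per frame interval, and a negative level means the frame had not fully arrived in time:

```python
def buffer_levels(bitrate_bps, frame_sizes_bits, fps, initial_bits=0):
    """Toy model of the coded buffer: bits arrive continuously at
    bitrate_bps, and one whole access unit is removed every 1/fps
    seconds. Returns the occupancy after each removal; a negative
    value means underflow."""
    level = initial_bits
    per_frame_fill = bitrate_bps / fps  # bits arriving in one frame period
    history = []
    for size in frame_sizes_bits:
        level += per_frame_fill  # continuous fill over the frame period
        level -= size            # instant removal of the whole access unit
        history.append(level)
    return history

# A 1 Mbit/s stream at 25 fps with an oversized first I-frame: starting
# from an empty buffer, the first removal underflows, which is exactly
# why decoders impose a startup delay.
print(buffer_levels(1_000_000, [200_000, 20_000, 20_000, 20_000], 25))
```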



If you have enabled B-frames when encoding a realtime stream, which is a wonderful idea in itself, you immediately get a delay of N * T, where:

- N is the number of consecutive B-frames;
- T is the duration of one frame.

Thus, out of nowhere we get a delay of at least 160 ms (for example, four consecutive B-frames at 25 fps: 4 × 40 ms).
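The same formula as a one-liner, with the example numbers above plugged in:

```python
def bframe_delay_ms(consecutive_bframes: int, fps: float) -> float:
    """Reordering delay added by B-frames: the decoder has to hold N
    frames before it can emit them in presentation order."""
    return consecutive_bframes * 1000 / fps

print(bframe_delay_ms(4, 25))  # 4 * 40 ms = 160.0 ms
```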

In reality a decoder can also hold back a frame or two of its own, or simply lack a decent event API for promptly signaling that a frame is ready.

Hardware delays on the last meter of video delivery


The huge decoded frame still has to be displayed. If you decoded on the CPU rather than on the video card, the frame has to be copied into video memory before display, and that takes time. Video cards can add a few milliseconds of their own, but things are usually nowhere near as bad here as the thousands of milliseconds in the previous cases.

Summing up


Having gone through all the kinds of delay that arise while delivering and receiving video, we can start measuring it. In the next article we will explain how that can be done.

Source: https://habr.com/ru/post/345010/

