Network speed measurements - what bandwidth meter creators are silent about

This may be useful to some system administrators and network engineers. I needed to measure the characteristics of a loaded channel from our provider, to understand where the problem was and, if it really was in the channel, to have objective data for further conversations with the provider.

The channel is gigabit. Peak load, according to the router, is about 480 Mbit/s and 70,000 packets/s. Users complain that everything is “slow” and that the various online speed meters regularly report alarming results.

I ran a batch of tests with various online bandwidth meters and all sorts of utilities. The first thing that caught my eye was an utterly implausible spread of results. Not only did each tool give its own “unique” numbers, but running the same tool a few minutes apart gave radically different results. The only conclusion these measurements allowed was that they all lie, and not by a little: by hundreds of percent or more, in either direction.
The next step: since the available tools lie, quickly hack together something of my own that can send all sorts of packets and packet combinations between two points (on both sides of the channel) and record their arrival times as precisely as possible, so as to have statistics for analysis.
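A minimal sketch of such a probe in Python, assuming UDP as the transport and a made-up packet layout (a 4-byte sequence number plus an 8-byte send timestamp, padded to the requested size). The names `make_probe` and `loopback_probe` are hypothetical. To keep it self-contained, sender and receiver run in one process over 127.0.0.1; across a real channel they would be two programs on different hosts, and their clocks would not be comparable without synchronisation. Note that `recv_ns` is taken when the program reads the packet, not when the packet actually reached the network card.

```python
from __future__ import annotations

import socket
import struct
import time

# Hypothetical probe layout: 4-byte sequence number + 8-byte send time (ns).
HEADER = struct.Struct("!IQ")


def make_probe(seq: int, size: int) -> bytes:
    """Build one probe packet carrying its sequence number and send time."""
    header = HEADER.pack(seq, time.perf_counter_ns())
    return header + b"\x00" * max(0, size - HEADER.size)


def loopback_probe(count: int = 100, size: int = 64) -> list[tuple[int, int, int]]:
    """Send `count` UDP probes over 127.0.0.1 and return (seq, sent_ns, recv_ns).

    Both ends live in one process only so the sketch runs on its own; in a
    real test they sit on opposite sides of the channel.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))  # let the OS pick a free port
        rx.settimeout(1.0)         # a lost probe raises instead of hanging
        addr = rx.getsockname()
        for seq in range(count):
            tx.sendto(make_probe(seq, size), addr)
        results = []
        for _ in range(count):
            data, _addr = rx.recvfrom(65535)
            # Timestamp of when *this program* read the packet, which may be
            # much later than when the packet actually arrived.
            recv_ns = time.perf_counter_ns()
            seq, sent_ns = HEADER.unpack_from(data)
            results.append((seq, sent_ns, recv_ns))
        return results
    finally:
        rx.close()
        tx.close()
```

`loopback_probe(count=50)` returns 50 `(seq, sent_ns, recv_ns)` tuples; gaps in `seq` would indicate loss, and differences between consecutive `recv_ns` values are the “arrival times” whose reliability is questioned below.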

And this, it seems, is where I found the root of the problem: the operating system's process scheduler. In most operating systems a process cannot use the processor for as long as it likes, because it is not alone in the system and other processes need the CPU too. So processor time is handed out in slices, to each process in turn (simplifying a little), at certain intervals. The more loaded the system, the longer these intervals.
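A toy model of this, assuming a plain round-robin scheduler where every process always wants the CPU and uses its whole time slice. Real schedulers, Linux's included, are considerably more elaborate (priorities, variable slice lengths, preemption on wake-up), so treat this only as an illustration of why more runnable processes mean longer waits. `round_robin_gaps` is a hypothetical helper.

```python
from __future__ import annotations

from collections import deque


def round_robin_gaps(n_procs: int, quantum_ms: float, rounds: int = 3) -> list[float]:
    """Simulate a simplified round-robin scheduler in which every process is
    always runnable and always uses its full quantum.

    Returns the gaps (ms) between the end of one time slice of process 0 and
    the start of its next one, i.e. how long it cannot look at the network.
    """
    queue = deque(range(n_procs))
    clock = 0.0
    starts = []  # start times of process 0's slices
    while len(starts) < rounds + 1:
        pid = queue.popleft()
        if pid == 0:
            starts.append(clock)
        clock += quantum_ms  # the chosen process runs for one full quantum
        queue.append(pid)    # and goes to the back of the queue
    return [b - a - quantum_ms for a, b in zip(starts, starts[1:])]
```

With `n` busy processes and a slice of `q` ms, process 0 waits `(n - 1) * q` ms between its turns, for example `round_robin_gaps(4, 1.0)` gives three gaps of 3 ms.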

As I understood from the documentation for the nanosleep() function on Linux, if the requested interval is under 2 ms and the process runs with sufficient privileges (a real-time scheduling policy such as SCHED_FIFO or SCHED_RR), the delay is implemented as a busy-wait loop inside the call, without handing control back to the system, because otherwise it would not get the CPU back in time. Without those privileges, it asks the system to “wake it up on time”, but you should not count on that: the actual interval is likely to be no shorter, and usually much longer, than 2 ms.
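One way to see this on your own machine is to request a very short sleep many times and time how long it really took. A rough sketch in Python, assuming `time.sleep()` maps onto the OS's sleep call (on Linux, CPython uses `clock_nanosleep()`/`nanosleep()` or `select()` depending on the version); `measure_sleep_overshoot` is a hypothetical helper, and the numbers it prints will depend heavily on the kernel, its timer configuration and the current load.

```python
from __future__ import annotations

import statistics
import time


def measure_sleep_overshoot(request_us: int = 100, samples: int = 200) -> list[float]:
    """Request a sleep of `request_us` microseconds `samples` times and
    return how long each sleep actually lasted, in microseconds."""
    actual = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        time.sleep(request_us / 1_000_000)
        actual.append((time.perf_counter_ns() - t0) / 1000)
    return actual


if __name__ == "__main__":
    durations = measure_sleep_overshoot()
    print(f"requested 100 us: min {min(durations):.0f} us, "
          f"median {statistics.median(durations):.0f} us, "
          f"max {max(durations):.0f} us")
```

Running it once on an idle machine and once under heavy load (for example during a large compile) shows how the gap between requested and actual sleep grows with system load.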

From this, one can assume that an ordinary user-space application, which is not part of the kernel and therefore cannot “stop the world” while it is busy, can measure time intervals no more accurately than about 2 ms.

Now a bit of arithmetic: in 2 ms a 1 Gbit/s link can deliver about 160 full-size packets (1 Gbit/s ÷ 12,000 bits per 1500-byte packet ≈ 83,333 packets/s, or about 83 per millisecond, × 2 ms ≈ 166), or far more if the packets are small.
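The same arithmetic as a small Python helper (`packets_per_window` is a hypothetical name). Like the estimate above, it ignores Ethernet framing overhead; adding the 20 bytes per frame spent on preamble and inter-frame gap lowers the figures somewhat.

```python
def packets_per_window(link_bps: float, packet_bytes: int, window_s: float) -> float:
    """Number of back-to-back packets of `packet_bytes` that fit into
    `window_s` seconds on a link of `link_bps` bits per second.

    Framing overhead (preamble, inter-frame gap) is ignored.
    """
    packet_bits = packet_bytes * 8
    return link_bps / packet_bits * window_s


if __name__ == "__main__":
    for size in (1500, 64):
        n = packets_per_window(1e9, size, 0.002)
        print(f"{size}-byte packets in 2 ms at 1 Gbit/s: {n:.0f}")
```

For 1500-byte packets at 1 Gbit/s and a 2 ms window it gives about 167 (the “about 160” above); for 64-byte packets, about 3906.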

In other words, between the moment the program tracking packet arrivals is interrupted by the system and the moment it gets to run again, about 160 packets can pile up in the buffer, and from the program's point of view they all arrived there SIMULTANEOUSLY.
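A toy simulation of this effect in Python, under stated assumptions: packets arrive back to back at line rate, and the receiving process gets the CPU exactly every 2 ms, stamping every waiting packet with the time it woke up. Both helpers are hypothetical, and integer nanoseconds are used to avoid floating-point rounding at window boundaries.

```python
from __future__ import annotations

from collections import Counter


def observed_timestamps_ns(link_bps: int, packet_bytes: int, n_packets: int,
                           wake_interval_ns: int) -> list[int]:
    """Packets arrive back to back at line rate, but the receiver only runs
    every `wake_interval_ns`. Each packet is stamped with the first wake-up
    at or after its real arrival time."""
    stamps = []
    for i in range(n_packets):
        arrival_ns = i * packet_bytes * 8 * 1_000_000_000 // link_bps
        # Ceiling division: the first wake-up not earlier than the arrival.
        wake_ns = -(-arrival_ns // wake_interval_ns) * wake_interval_ns
        stamps.append(wake_ns)
    return stamps


def batch_sizes(stamps: list[int]) -> list[int]:
    """How many packets share each observed timestamp, in time order."""
    counts = Counter(stamps)
    return [counts[t] for t in sorted(counts)]


if __name__ == "__main__":
    stamps = observed_timestamps_ns(1_000_000_000, 1500, 1000, 2_000_000)
    print(batch_sizes(stamps))
```

For a 1 Gbit/s link, 1500-byte packets, 1000 packets and a 2 ms wake-up interval, `batch_sizes(...)` returns `[1, 166, 167, 167, 166, 167, 166]`: apart from the very first packet, the program sees them in clumps of about 166 with identical timestamps.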

In this situation, measurements based on packet arrival times, or on differences in their transit times, are useless, to put it mildly. They start to mean something only at rates on the order of 1 Mbit/s or less, where full-size packets are about 12 ms apart, several times the ~2 ms timer resolution.
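A quick way to check this threshold, as a Python sketch: compute the average spacing between full-size packets at a given rate and compare it with the ~2 ms resolution. The helper names and the safety margin of 5 timer ticks are assumptions made for illustration; with a margin of 1 the break-even rate for 1500-byte packets would be 6 Mbit/s.

```python
def inter_packet_gap_ms(rate_bps: float, packet_bytes: int) -> float:
    """Average spacing between packets of `packet_bytes` at `rate_bps`, in ms."""
    return packet_bytes * 8 / rate_bps * 1000


def timing_is_meaningful(rate_bps: float, packet_bytes: int,
                         resolution_ms: float = 2.0, margin: float = 5.0) -> bool:
    """True if consecutive packets are at least `margin` timer ticks apart.

    The default margin of 5 ticks is an arbitrary, illustrative choice.
    """
    return inter_packet_gap_ms(rate_bps, packet_bytes) >= margin * resolution_ms
```

With these defaults, 1 Mbit/s (12 ms between 1500-byte packets) passes, while 10 Mbit/s (1.2 ms) and 1 Gbit/s (0.012 ms) do not.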

So at these speeds, unable to reliably time the movement of individual packets, we can measure only two things:
- how quickly a relatively large block of data (a file), large enough to make timing errors negligible, gets through the channel, i.e. the free bandwidth available at that moment;
- what percentage of that data is lost along the way, i.e. the packet loss at that moment.
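Both quantities can be computed from counters collected at the ends of a bulk transfer. A minimal Python sketch, assuming the sender reports how many packets it sent and the receiver counts what arrived; `bulk_estimate` is a hypothetical helper. It also shows why a large transfer makes the ~2 ms timer error negligible: over a 10 s transfer it amounts to about 0.02%.

```python
from __future__ import annotations


def bulk_estimate(bytes_received: int, elapsed_s: float,
                  packets_sent: int, packets_received: int,
                  timer_error_s: float = 0.002) -> dict[str, float]:
    """Estimate free bandwidth and loss from one bulk transfer."""
    throughput_mbps = bytes_received * 8 / elapsed_s / 1e6
    loss_pct = 100.0 * (packets_sent - packets_received) / packets_sent
    # A timing error of about timer_error_s on the total duration shifts the
    # throughput estimate by roughly timer_error_s / elapsed_s (relative).
    relative_timing_error = timer_error_s / elapsed_s
    return {
        "throughput_mbps": throughput_mbps,
        "loss_pct": loss_pct,
        "relative_timing_error": relative_timing_error,
    }
```

For example, 125,000,000 bytes received in 10 s with 99,000 of 100,000 packets arriving gives 100 Mbit/s, 1% loss and a timing error of about 0.02%.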

I don't think either of these can be called an objective assessment of the channel's quality as such if other users are also generating traffic on it at the time: both parameters can be affected too strongly by the load from other users.

For now, I'm digging further. If anyone is interested, I'm happy to share what I find.

Source: https://habr.com/ru/post/87585/
