Terabytes and tebibytes, traffic accounting, the consequences of not understanding the difference, and how to save on an online video project

Not long ago we had an interesting incident that deserves separate coverage and may help others avoid similar misunderstandings in the future. The case is unique in its own way: a client we had worked with successfully almost since our official opening (2009, 7 years of work!) accused us of fraud and of very strange traffic accounting.

The cause lay not only in the client's lack of awareness, but also in that of her administrators, who simply did not notice the difference or did not understand it either. That is why we decided to analyze this case.

Background


Elena came to us in 2009 without the funds for her own dedicated server but with a rather interesting project, so we decided to give it a chance and provided a significant share of the resources of one of our own dedicated servers almost at cost. Over time the project grew and began to generate tens and even hundreds of megabits per second of traffic, requiring several dedicated servers. When traffic reached 2 Gbit/s and a third server was needed, it became clear that there was no better solution for Elena than a server with a 10 Gbit/s unmetered connection. It was advantageous both in price and in software optimization, although it offered less fault tolerance than several separate servers.

For a while the project kept growing, but then the situation changed: the dollar rose, advertising prices fell, online streaming ran into trouble with copyright holders, and the project's popularity declined, as did its need for server capacity. First the project moved to a server with a 2x1 Gbit/s connection, then to 1 Gbit/s, and then it became clear that it was more profitable to rent several servers on special offer with 100 TB of traffic and a 1 Gbit/s connection each than a single server with an unmetered gigabit connection.

For example, we offer a dedicated server in the Netherlands (http://www.ua-hosting.company/servers) with a 100 TB traffic limit and a gigabit connection:

2 x Intel Quad Core Xeon E5504 / 32GB DDR3 / 4 x 240GB SSD / 1Gbps 100TB - $139/month;

those who need more disk space and do not need SSDs can choose the same configuration with an E5620 processor and six 2 TB drives at about the same price.

Such a server is usually 3-4 times cheaper than one server with a 1 Gbit/s connection and unmetered traffic, so for the same budget the offer is more profitable, because:

- the subscriber gets a more fault-tolerant setup, since a hardware failure takes down only part of the computing power and bandwidth;
- there are 3-4 times as many CPUs, as much RAM, and as many SSD or HDD drives;
- it is possible to serve more traffic.

The last point sounds paradoxical, but it is true: 3 servers with 1 Gbit/s connections and a 100 TB traffic limit each will allow more traffic than one server with an unmetered gigabit connection (in almost all cases), since you effectively have access to 3 Gbit/s of channel capacity. A single 1 Gbit/s connection is unlikely to generate 300 TB of traffic: the daily consumption curve is uneven, with much less traffic at night than during the day, and 300 TB is achievable only with nearly 100% channel utilization for nearly the entire month. That figure becomes realistic only in the following cases:

- when the channel is saturated (users could consume more traffic, but the channel does not allow it, so the bandwidth is shared among users and none of them can get full speed any longer; in some cases, for example when saturation causes heavy packet loss, users cannot even get the speed they need, and video keeps buffering and stuttering);

- when a large share of the traffic is incoming (proxying, processing incoming requests, etc.).
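To see why 300 TB is out of reach for a single gigabit link in practice, here is a back-of-the-envelope calculation. It is a sketch under our own assumptions: a 30-day month, decimal terabytes (10^12 bytes), outgoing traffic only, and constant utilization; the helper name `monthly_transfer_tb` is hypothetical.

```python
# Back-of-the-envelope: how much a link can move in a month.
# Assumptions (ours): 30-day month by default, decimal units
# (1 TB = 10**12 bytes), one traffic direction, constant utilization.

SECONDS_PER_DAY = 86_400


def monthly_transfer_tb(link_gbps: float, utilization: float = 1.0,
                        days: int = 30) -> float:
    """Decimal terabytes a link of `link_gbps` moves at `utilization`."""
    bytes_per_second = link_gbps * 1e9 / 8 * utilization
    return bytes_per_second * SECONDS_PER_DAY * days / 1e12


ceiling = monthly_transfer_tb(1.0)  # 1 Gbit/s, 100% busy, around the clock
print(f"1 Gbit/s saturated for 30 days: {ceiling:.0f} TB")
# Three 100 TB servers allow 300 TB; a single gigabit link would need:
print(f"utilization needed for 300 TB: {300 / ceiling:.0%}")
```

A saturated gigabit link tops out at roughly 324 TB in 30 days, so 300 TB means staying above about 90% utilization day and night, with no night-time dip at all.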

Recommendations


Different providers account for traffic differently. Some collect data from the switch port, others only at the border, on the data center's core border router. As a result, not all traffic may be counted, or, when NetFlow is used, traffic that never physically reached the server may be counted (for example, a UDP DDoS that was filtered at the border). Some providers update these statistics every few minutes, others once a day. With large traffic volumes on the border routers or a large amount of data to analyze (hundreds or even thousands of gigabits per second), it is quite difficult and expensive to update statistics more often than once a day, so complete statistics are available, but with a delay.

For this reason, we always recommend that our clients keep 5-15% of their traffic allowance in reserve, depending on the intensity of daily consumption and on the time needed to deliver and configure an additional server and redistribute the load onto it.
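The reserve can be sized with simple arithmetic: whatever is consumed while a new server is being delivered and configured, and while the statistics catch up, must still fit under the limit. A minimal sketch; the function `reserve_fraction` and the example inputs are hypothetical illustrations, not any provider's tooling or the client's real figures.

```python
# Hypothetical helper, not a provider tool: the share of a traffic allowance
# used up while waiting for an extra server to be delivered and configured,
# plus the lag before the statistics show the overrun.


def reserve_fraction(daily_usage: float, lead_time_days: float,
                     stats_delay_days: float, limit: float) -> float:
    """Fraction of `limit` consumed during the waiting period.

    `daily_usage` and `limit` must be in the same unit (e.g. decimal TB).
    """
    return daily_usage * (lead_time_days + stats_delay_days) / limit


# Illustrative inputs: ~3.3 TB/day, 2 days to bring up a server,
# statistics updated once a day, 100 TB allowance.
share = reserve_fraction(3.3, lead_time_days=2, stats_delay_days=1, limit=100)
print(f"keep at least {share:.1%} of the allowance in reserve")
print(f"order a new server once usage passes {(1 - share) * 100:.1f} TB")
```

The result is a floor: a node failure raises the daily load on the surviving servers, which is one more reason to round up into the 5-15% range.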

The reserve is also necessary in case one of the nodes fails and its load shifts to the remaining nodes.

With such a reserve you can save yourself a lot of stress and avoid overruns, including those caused by delayed statistics updates or equipment delivery; you can calmly wait for a server to be delivered and set up, and you can survive the worst-case scenario, a node failure.

Do not skimp on these 5-15% of the budget; the savings are not worth it.

In our case, the situation was more interesting and not covered by anything described above. Still, it arose only because the client did not leave a sufficient reserve: had there been one, the issue would never have come up, and no one would have suspected discrepancies in the traffic data, which in fact did not exist...

What happened


The client wanted to use nearly 100% of the traffic allowance. Three servers were in operation with a good discount, but the budget apparently did not allow renting a 4th server for another $100. That is the client's choice and right. Moreover, we always accommodate our subscribers when it comes to adding a new server: if there are payment difficulties, we let them use the server for up to a month almost free of charge, since we often have equipment in reserve, and when we don't, we order it for the client and grant the necessary payment deferral.

Nevertheless, Elena decided she would stay within the 100 TB limit, even though her statistics already showed almost 84 used with 5 days left until the end of the month, and a simple calculation showed that the allowance would not be enough. She hoped the load was due to the holidays and the traffic would still suffice.
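The "simple calculation" can be sketched as a linear projection, assuming consumption stays at the month-to-date average. The 26 elapsed days are our inference from "5 days left" in a 31-day month (March, per the statistics below), and the function name `project_month_end` is hypothetical.

```python
# Naive linear projection of month-end traffic.
# Assumption: the average daily rate so far holds for the remaining days
# (holidays, growth or a traffic drop would all break this).


def project_month_end(used: float, days_elapsed: int, days_in_month: int) -> float:
    """Projected total for the month, in the same unit as `used`."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return used / days_elapsed * days_in_month


# From the story: almost 84 used with 5 days left; assuming a 31-day
# month, 26 days have elapsed.
projected = project_month_end(84, days_elapsed=26, days_in_month=31)
print(f"projected month-end usage: {projected:.1f} (limit: 100)")
```

Even before any unit conversion the projection lands just above 100, and if those 84 are binary tebibytes, the overrun measured in decimal terabytes is larger still.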

To Elena's surprise, the traffic was almost exhausted the very next day, reaching 87 TiB by her data and 96 TB by ours, and we informed her that it would be a good idea to add a server.
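The two figures are in fact the same amount of traffic expressed in different units, as the explanation further down in this story shows. A quick conversion, assuming the data center counts in decimal terabytes (10^12 bytes) and the client's tool in binary tebibytes (2^40 bytes); the helper names are ours.

```python
# SI (decimal) vs IEC (binary) units.
TB = 10**12   # terabyte
TIB = 2**40   # tebibyte = 1,099,511,627,776 bytes


def tib_to_tb(tib: float) -> float:
    """Convert tebibytes to decimal terabytes."""
    return tib * TIB / TB


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes to tebibytes."""
    return tb * TB / TIB


print(f"87 TiB (client's figure) = {tib_to_tb(87):.2f} TB")
print(f"100 TB (the limit)       = {tb_to_tib(100):.2f} TiB")
```

The client's 87 TiB and the data center's 96 TB agree to within rounding, and the 100 TB limit is only about 91 TiB in the client's units.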

Elena had a logical question: why is our figure higher, and how can that be?

She took this question, along with accusations that the traffic was being miscounted, both to our support and to the data center's support, since the statistics are ultimately provided by the data center.

The engineers took a long time to respond while they refreshed the statistics, but they found no discrepancies in the accounting and could not give the client a clear answer right away. It was also the weekend, when the NOC staff were reachable only by phone and only in truly critical cases, so they promised a detailed analysis of the situation no earlier than Monday.

Our technical department was apparently swamped with work, or simply did not notice a small detail in the message from Elena's administrator showing data from vnstat. The administrator used vnstat to collect statistics and also believed the problem was on our side or the data center's, so that he was not seeing all the traffic. Through us, he sent the data center the following request, stating that his data differed from theirs by 10% and, at the end, listing what he considered possible reasons:

eth2 / monthly

       month        rx      |     tx      |    total    |   avg. rate
    ------------------------+-------------+-------------+---------------
      Jul '15    459.59 GiB |   16.68 TiB |   17.13 TiB |   54.93 Mbit/s
      Aug '15      1.05 TiB |   47.04 TiB |   48.10 TiB |  154.25 Mbit/s
      Sep '15    847.71 GiB |   37.69 TiB |   38.52 TiB |  127.66 Mbit/s
      Oct '15    865.86 GiB |   35.36 TiB |   36.21 TiB |  116.13 Mbit/s
      Nov '15    638.09 GiB |   28.18 TiB |   28.80 TiB |   95.45 Mbit/s
      Dec '15    483.77 GiB |   21.62 TiB |   22.09 TiB |   70.84 Mbit/s
      Jan '16    840.79 GiB |   36.21 TiB |   37.04 TiB |  118.78 Mbit/s
      Feb '16      2.20 TiB |   83.32 TiB |   85.52 TiB |  293.19 Mbit/s
      Mar '16      1.90 TiB |   84.92 TiB |   86.82 TiB |  304.64 Mbit/s
    ------------------------+-------------+-------------+---------------
    estimated      2.08 TiB |   92.91 TiB |   94.99 TiB |

Can you provide traffic for 1 day / 1 month for 7 days and etc?

See it’s your server’s VLAN and multicast for example.

Thank you.
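The administrator's own table already contained the answer: every figure in it is in binary units. A minimal sketch that converts vnstat-style quantities into the decimal terabytes the data center counts in; it assumes only the IEC unit labels that appear in the table (KiB, MiB, GiB, TiB), and the regex and function name are ours.

```python
import re

# Byte multipliers for the IEC unit labels that appear in the table.
IEC_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(KiB|MiB|GiB|TiB)\s*")


def to_decimal_tb(quantity: str) -> float:
    """Convert a vnstat-style quantity such as '94.99 TiB' to decimal TB."""
    match = PATTERN.fullmatch(quantity)
    if match is None:
        raise ValueError(f"unrecognized quantity: {quantity!r}")
    number, unit = match.groups()
    return float(number) * IEC_UNITS[unit] / 10**12


# 'total' column of the administrator's table
for label, total in [("Feb '16", "85.52 TiB"),
                     ("Mar '16", "86.82 TiB"),
                     ("estimated", "94.99 TiB")]:
    print(f"{label:>9}: {total:>10} = {to_decimal_tb(total):6.2f} TB")
```

Converted this way, vnstat's own month-end estimate of 94.99 TiB comes to roughly 104 TB, already above the 100 TB limit.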

Without waiting for a response from the data center engineers, and despite 7 years of successful cooperation, Elena decided to publish a review on the Internet claiming that we count traffic strangely and even keep dishonest records, and wrote publicly on forums: “If you rent a server with limited traffic, you will have many surprises. It turns out they count traffic in some special way, and it always comes out 10% higher. Moreover, they manipulate the data as they see fit.”

She also announced her intention to stop using our services.

The news reached me, and I tried to figure out what was going on. What struck me immediately was that her statistics were in TiB, not TB: tebibytes, not terabytes. That is, the accounting was done in binary rather than decimal units, where a kilobyte, or more precisely a kibibyte, is 1024 bytes rather than 1000.

It should be noted that, to keep this ambiguity from being exploited, for marketing purposes among others, the IEC (International Electrotechnical Commission) long ago introduced separate binary prefixes, where “bi” stands for “binary”: kibibytes, mebibytes, gibibytes, tebibytes. Marketing exploited the difference anyway: drive manufacturers label capacities in decimal bytes, and data centers meter traffic in them too, so a data center providing 100 TB of traffic actually provides less than a client counting in binary units may assume.

It would seem the difference is small, only 24 bytes per 1000, an error of just 2.4%. So why did the client see such a large difference, around 10%? Was some extra traffic really being counted?

And here I made a mistake myself, forgetting that the “error” compounds with each prefix:

there are 1024 bytes in a kibibyte (to use the standard binary prefixes), 1024 * 1024 = 1,048,576 bytes in a mebibyte, 1024 * 1024 * 1024 = 1,073,741,824 in a gibibyte, and 1024 * 1024 * 1024 * 1024 = 1,099,511,627,776 in a tebibyte, compared with 1,000,000,000,000 bytes in a terabyte.

An unexpected twist, isn't it?

It turns out that at the tera level the “error” reaches almost 10% (9.95%), which is exactly what our client observed.
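The compounding is easy to reproduce: each step up the prefix ladder multiplies by another 1.024, so the gap for the n-th prefix is (1024/1000)^n - 1. A short sketch; the list and function names are ours.

```python
# How far each binary (IEC) unit exceeds its decimal (SI) namesake.
PAIRS = ["kibi/kilo", "mebi/mega", "gibi/giga", "tebi/tera", "pebi/peta"]


def excess(power: int) -> float:
    """Relative size difference between 1024**power and 1000**power."""
    return 1024**power / 1000**power - 1


for power, name in enumerate(PAIRS, start=1):
    print(f"{name:<10} {1024**power:>22,} bytes  +{excess(power):.2%}")
```

The gap depends only on which prefix is used, not on how much traffic flows: at the tera level it is just under 10%, whatever the volume.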

The matter was settled: after receiving this information and detailed explanations, the client decided to continue working with us.

Findings


- the difference between decimal and binary bytes can no longer be ignored at large traffic volumes; for clarity, here is how much each binary unit exceeds its decimal counterpart (as tabulated on Wikipedia): kibi/kilo 2.4%, mebi/mega 4.9%, gibi/giga 7.4%, tebi/tera 10.0%, pebi/peta 12.6%;

- data centers have found a legal way to save traffic: even when giving a user a small allowance, they often like to state it in terabytes (this maximizes the saving, about 2.5 percentage points more than stating it in gigabytes), thereby saving an additional 5-10% of traffic. With a total consumption of, say, 2.4 Tbit/s, that adds up to a quarter of a terabit per second of bandwidth saved! For a channel from a Tier 1 backbone operator at about $600 per Gbit/s per month, that is 600 * 256 = $153,600 per month, or over a million dollars a year.
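The arithmetic behind these numbers, as a sketch. The inputs are the article's own figures; the $600 per Gbit/s per month price is our reading of “600 * 256”, not a quoted tariff, and the function name is hypothetical.

```python
# Savings arithmetic from the example above. Inputs are the article's own
# figures: ~2.4 Tbit/s total, "up to a quarter of a terabit" saved (the
# article uses 256 Gbit/s), and 600 USD per Gbit/s per month as the price
# implied by "600 * 256".


def transit_savings(saved_gbps: float,
                    price_per_gbps_month: float) -> tuple[float, float]:
    """Return (monthly, yearly) savings for `saved_gbps` of transit."""
    monthly = saved_gbps * price_per_gbps_month
    return monthly, monthly * 12


total_gbps = 2400
print(f"10% of {total_gbps} Gbit/s = {total_gbps * 0.10:.0f} Gbit/s")

monthly, yearly = transit_savings(256, 600)
print(f"monthly: ${monthly:,.0f}  yearly: ${yearly:,.0f}")
```

A strict 10% of 2.4 Tbit/s is 240 Gbit/s; the article rounds the quarter terabit up to 256 Gbit/s, and either way the yearly figure exceeds a million dollars.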

It is precisely because accounting for the difference between decimal and binary bytes is so important, especially in the era of video streaming and ever-growing traffic consumption, that we decided to highlight this issue on Habr. Many of you probably knew about this difference, but surely not everyone has considered how important it can be and how this knowledge can be used effectively, including for marketing purposes.

We are very interested to know whether this article was useful, so we would like to run a short survey. Please answer honestly.

Source: https://habr.com/ru/post/309768/