Greetings, friends!
While studying for the CCNA Voice exam, I had the idea of writing up some of what I learned as a separate article. I am pursuing two goals: a selfish one, to understand the material better and sort it all out in my head; and an altruistic one, to share the knowledge with anyone who is interested.
In this article I'll cover how voice is encoded, the codecs themselves, and how to calculate the bandwidth that voice needs on IP networks.
About the main digital signal
I don't think I need to explain that, to transmit an analog signal (which is what the human voice is) over IP networks, the signal has to be converted into a sequence of ones and zeros. How this is done was explained perfectly by the user denis_g in his much-loved article. So as not to be taken for a plagiarist or duplicate information, I'll simply point to it here. In short, the gist is this: by Kotelnikov's theorem (the bourgeoisie call it the Nyquist theorem), when pulse-code modulation is used, 64 kbit/s (8,000 samples per second at 8 bits each) is enough to carry a voice signal without noticeable loss of quality.
These 64 kilobits per second are what modern digital telephony calls the main digital signal.
The primary (smallest, simplest) level of the plesiochronous (almost synchronous) digital hierarchy (PDH), the so-called E1 stream (2048 kbps), is built from 32 (30 voice + 2 service) main digital signals. The main digital signal itself is sometimes called the zero level. PDH also has a second (E2), a third (E3) and a fourth (E4) level. Each level is multiplexed from four streams of the previous level plus some service information, for example, E3 = 4 * E2 + service overhead.
For a while (in the 80s), all digital telephony in the world was built on PDH. But it had a number of flaws, the most significant being that a high-level stream had to be demultiplexed step by step to extract lower-level streams. For example, to pull one E1 out of an E4 and route it elsewhere, you first had to split the E4 into four E3s, split an E3 into four E2s, split an E2 into four E1s, redirect the E1 where it needed to go, then reassemble the stream in reverse order and send it on. Tedious all round, and it ate an enormous amount of resources.
PDH was replaced by SDH (synchronous digital hierarchy), which to this day remains the main transport option for cellular operators; the networks of our two backbone providers (TTC, RTK) are still based on SDH as well.
Nevertheless, the primary level (E1) has not gone away, and sometimes it remains the only way to set up a link. For example, all telephone operators in our country interconnect with each other over some number of E1 streams.
Right, I got distracted. Let's get back to IP telephony, that is, to packet switching, and forget about circuit switching.
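The E1 arithmetic can be checked in a few lines. This is only a sketch: the function names are made up for illustration, and the E2 to E4 figures are the nominal PDH line rates, which the article itself does not quote.

```python
# One main digital signal (one timeslot) in kbit/s, as stated in the article.
MAIN_SIGNAL_KBPS = 64

# Nominal PDH line rates in kbit/s. E1 is from the article; E2..E4 are
# standard figures added here as an assumption for illustration.
PDH_RATES_KBPS = {"E1": 2048, "E2": 8448, "E3": 34368, "E4": 139264}


def e1_rate_kbps(voice_slots: int = 30, service_slots: int = 2) -> int:
    """Rate of an E1 stream built from 64 kbit/s timeslots."""
    return (voice_slots + service_slots) * MAIN_SIGNAL_KBPS


def overhead_kbps(lower: str, higher: str) -> int:
    """Service information added when four lower-level streams are multiplexed."""
    return PDH_RATES_KBPS[higher] - 4 * PDH_RATES_KBPS[lower]


if __name__ == "__main__":
    print("E1:", e1_rate_kbps(), "kbit/s")
    for lower, higher in (("E1", "E2"), ("E2", "E3"), ("E3", "E4")):
        print(f"{higher} = 4 * {lower} + {overhead_kbps(lower, higher)} kbit/s")
```

The difference between a level and four streams of the level below is the service information mentioned above.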
About codecs
So, we have the main digital signal, and its embodiment in IP networks is the G.711 codec. This standard has become the de facto most popular one and is used today alongside protocols such as SIP and SCCP. It uses 64 kbps of bandwidth and is probably familiar to everyone who deals with modern IP telephony.
The standard was developed in the 1970s; its patents have since expired, and it is now in the public domain.
The standard describes two coding algorithms: μ-law (used in North America and Japan) and A-law (used in Europe and the rest of the world). Both algorithms are logarithmic, but the later A-law was designed from the start to be easier for a computer to process. (c) Wikipedia
Besides the generally accepted G.711, there are plenty of other standards for encoding and decoding audio. The most popular are G.729, G.729a, G.726 and G.728. Compared by bandwidth, the picture is as follows:
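To give a feel for what "logarithmic" means here, below is a minimal sketch of the continuous μ-law companding curve with μ = 255. It is an illustration only: real G.711 encoders use a piecewise-linear 8-bit approximation of this curve, and the function names are invented for the example.

```python
import math

MU = 255  # companding parameter commonly used for 8-bit mu-law


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] to [-1, 1], stretching quiet sounds."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        compressed = mu_law_compress(sample)
        restored = mu_law_expand(compressed)
        print(f"{sample:5.2f} -> {compressed:.4f} -> {restored:.4f}")
```

Quiet samples get a disproportionately large share of the output range, which is why 8 bits per sample are enough for speech.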
G.729 - 8 kbps
G.729a - 8 kbps
G.726 - 32 kbps
G.728 - 16 kbps
You might ask: if they use less bandwidth, why haven't they become more popular than G.711? The point is that bandwidth is not the most important parameter of a codec. Speed of operation matters too, and with it the load on the DSP (digital signal processor), which is responsible for encoding and decoding the signal in real time.
Another important criterion that determines the success of a particular codec is the so-called MOS (Mean Opinion Score; in Russian-language literature it appears as "averaged subjective assessment"). The idea of MOS is very simple: a specially assembled group of people is given the chance to use the communication system and asked to rate it from 1 (terrible) to 5 (excellent). The average of those ratings is the MOS.
So, for the codecs I listed, the MOS values are as follows:
G.711 - 4.1 (according to some sources, 4.45 for μ-law)
G.729 - 3.92 (it might have taken the crown from G.711, but it eats a lot of CPU time)
G.729a - 3.7 (this codec works much faster than its elder brother, but, as we can see, at the expense of quality)
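As a tiny illustration of that averaging, here is a sketch with made-up ratings; the function name and the sample data are hypothetical and do not come from any real study.

```python
def mean_opinion_score(ratings: list[int]) -> float:
    """Average listener ratings, each expected to be between 1 and 5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 (terrible) and 5 (excellent)")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Hypothetical panel of five listeners.
    print(mean_opinion_score([4, 5, 3, 4, 4]))
```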
G.726 - 3.85
G.728 - 3.61
It is the combination of all these factors (bandwidth, speed, MOS) that determines which codec comes out on top in the realm of digital signal coding.
By the way, all these standards (the ones that start with G.) are the work of the International Telegraph and Telephone Consultative Committee (CCITT, a division of the ITU, the International Telecommunication Union) and are essentially proprietary. These days it is hard to imagine proprietary standards without free alternatives. So in audio coding the iLBC standard (internet Low Bitrate Codec) was born; it uses 15.2 kbps and has a MOS of 4.1. These factors, along with its openness, led to the standard being used in Google Talk, Yahoo Messenger and our beloved Skype.
It is worth noting that popular IP PBXs (Asterisk, Cisco CME) support all these codecs, so you can always decide for yourself what to use in your telephone network.
About bandwidth
Estimated throughput is a parameter you must take into account when planning any data network, so that it scales easily and your users suffer no unnecessary inconvenience. I repeat: any network, VoIP networks included.
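The bandwidth and MOS figures quoted above can be put side by side in a small table. The sketch below only covers those two factors; CPU load is left out because the article gives no numbers for it, and the helper names are invented for the example.

```python
# (bit rate in kbit/s, MOS) exactly as quoted in this article.
CODECS = {
    "G.711": (64, 4.1),
    "G.729": (8, 3.92),
    "G.729a": (8, 3.7),
    "G.726": (32, 3.85),
    "G.728": (16, 3.61),
}


def by_quality(codecs: dict) -> list:
    """Codec names sorted from highest to lowest MOS."""
    return sorted(codecs, key=lambda name: codecs[name][1], reverse=True)


def within_budget(codecs: dict, max_kbps: int) -> list:
    """Codecs whose bit rate fits the budget, best MOS first."""
    return [name for name in by_quality(codecs) if codecs[name][0] <= max_kbps]


if __name__ == "__main__":
    for name in by_quality(CODECS):
        kbps, mos = CODECS[name]
        print(f"{name:7} {kbps:3} kbit/s  MOS {mos}")
    print("Fits into 16 kbit/s:", within_budget(CODECS, 16))
```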
An important parameter in this case is the sample size (measured in milliseconds). The sample size determines the “amount” of voice carried in one IP packet: for example, you can cram one syllable or two into a single packet. The larger the sample size, the more economically you use your bandwidth, but the greater the delay in the conversation (the digital signal processor has to collect and encode the whole sample before the packet can be sent).
I don't know how it is in Asterisk (I hope someone will tell me), but in Cisco CME (Cisco's IP telephony solution) the codec settings unfortunately have no sample-size parameter as such; instead there is a parameter that sets the number of bytes in the sample. The two are linked by a simple linear formula and are easily expressed through each other. Here is the formula:
BVS = PC * PPK / 8, where BVS is the number of bytes in the sample, PC is the sample size in seconds, and PPK is the codec bit rate in bits/s. So if we want one G.711 packet to carry, say, 20 milliseconds of conversation, we need to set BVS = 0.02 * 64000 / 8 = 160.
So we need to put 160 bytes of payload into our UDP datagram. OK, let's move on.
Suppose we use a classic IP network with Ethernet as the link-layer protocol, and we also want to run it all through an encrypted VPN. Then 18 bytes of Ethernet overhead are added to our 160 bytes. Add the network and transport layers: the IP, UDP and RTP headers (20 + 8 + 12 bytes). And we wrap the whole thing in IPsec, which is another 50 bytes. The result is a packet of 160 + 18 + 40 + 50 = 268 bytes.
To calculate the total bandwidth, we multiply the size of this packet by the number of packets per second. Since our sample size is 20 ms, there are 50 such samples in one second. Multiplying 50 by 268, we see that we need to push 13,400 bytes per second, or 107,200 bits per second, that is, 107.2 kbit/s. That is almost 1.7 times the original 64 kilobits! This is the number to start from when planning your network.
Be careful! May the force be with you!
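The formula works in both directions, which is handy when the PBX asks for bytes and you think in milliseconds. A minimal sketch follows; the function names are invented for illustration and are not Cisco CME parameters.

```python
def payload_bytes(sample_ms: int, codec_bps: int) -> int:
    """Bytes of voice per packet: BVS = PC * PPK / 8, with PC given in ms."""
    return sample_ms * codec_bps // 8000


def sample_ms(payload: int, codec_bps: int) -> int:
    """Inverse: milliseconds of voice carried by a payload of the given size."""
    return payload * 8000 // codec_bps


if __name__ == "__main__":
    print(payload_bytes(20, 64_000))  # G.711, 20 ms of voice
    print(payload_bytes(20, 8_000))   # G.729, 20 ms of voice
    print(sample_ms(160, 64_000))     # back from bytes to milliseconds
```

Integer arithmetic is used so that typical values such as 20 ms at 64,000 bit/s give exact results.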
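The whole calculation fits in one small function. The sketch below uses the overhead figures from this article (18 bytes Ethernet, 40 bytes IP/UDP/RTP, 50 bytes IPsec); real overhead depends on your encapsulation, and the names are invented for the example.

```python
ETHERNET_OVERHEAD = 18            # bytes, as assumed in the article
IP_UDP_RTP_HEADERS = 20 + 8 + 12  # bytes
IPSEC_OVERHEAD = 50               # bytes, as assumed in the article


def voip_bandwidth_bps(payload_bytes: int, sample_ms: int,
                       overhead_bytes: int) -> float:
    """Bandwidth of one voice stream in one direction, in bits per second."""
    packet_bytes = payload_bytes + overhead_bytes
    packets_per_second = 1000 / sample_ms
    return packet_bytes * 8 * packets_per_second


if __name__ == "__main__":
    overhead = ETHERNET_OVERHEAD + IP_UDP_RTP_HEADERS + IPSEC_OVERHEAD
    # G.711 with 20 ms samples: 160 bytes of payload per packet.
    print(voip_bandwidth_bps(160, 20, overhead) / 1000, "kbit/s")
    # G.729 with 20 ms samples: 20 bytes of payload per packet.
    print(voip_bandwidth_bps(20, 20, overhead) / 1000, "kbit/s")
```

With the same overhead, an 8 kbit/s codec ends up needing several times its nominal rate, because the headers do not shrink with the payload.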
Sources used:
CCNA Voice Video Course
en.wikipedia.org
www.deltann.ru
P.S. I look forward to your additions.