Basics of IP telephony, basic principles, terms and protocols

Good afternoon, dear Habr readers. In this article I will try to cover the basic principles of IP telephony, describe the most frequently used protocols, outline how voice is encoded and decoded, and analyze some typical problems.

IP telephony is voice communication carried over data networks, in particular over IP (Internet Protocol) networks. Today IP telephony is increasingly replacing traditional telephone networks thanks to ease of deployment and configuration, low call cost, high call quality and comparatively good connection security. This article follows the OSI (Open Systems Interconnection) reference model and covers the subject bottom-up, starting with the physical and data link layers and ending with the data layers.
OSI Model and Data Encapsulation

Principles of IP telephony

During a call, the voice signal is converted into compressed data packets (this process is discussed in more detail in the sections “Pulse Code Modulation” and “Codecs”). The packets are then sent over packet-switched networks, in particular IP networks. When the packets reach the receiver, they are decoded back into a voice signal. All of this is made possible by a large number of auxiliary protocols, some of which are discussed below.
In this context, a data transfer protocol is a kind of language that lets two subscribers understand each other and ensures high-quality data transfer between two points.

Difference from traditional telephony

In traditional telephony, a connection is established through a telephone exchange solely for the purpose of the conversation, and voice signals travel over telephone lines through a dedicated circuit. In IP telephony, compressed data packets enter a global or local network carrying a destination address and are forwarded based on that address. IP addressing is used, with all its inherent features (such as routing).

At the same time, IP telephony is a cheaper solution for both the operator and the subscriber, because:

Traditional telephone networks are over-provisioned: each call holds a full dedicated channel, whereas IP telephony compresses voice packets and makes fuller use of line capacity.
Almost everyone now has access to the global network, which reduces connection costs or eliminates them entirely.
Calls within a local network can be handled by an internal server, without involving an external PBX.

In addition, IP telephony can improve call quality. Again, there are three main factors:

Telephony servers are constantly being improved, and their algorithms are becoming more resistant to delays and other IP network problems.
In private networks, the owners have full control and can tune parameters such as bandwidth and the number of subscribers per line, and therefore the amount of delay.
Packet-switched networks keep evolving, and new protocols and technologies that improve call quality appear every year (for example, the RSVP bandwidth reservation protocol).

IP telephony also solves the busy-line problem quite elegantly, since call forwarding or call waiting can be set up with a few directives in the PBX configuration file.

Physical Layer

At the physical layer, a bit stream is transmitted over the physical medium through the appropriate interface. IP telephony relies almost entirely on existing network infrastructure. The transmission medium is usually Category 5 unshielded twisted pair (UTP Cat 5), single-mode or multimode optical fiber, or coaxial cable. In this way the principle of converged telecommunication networks is fully realized.

PoE

PoE (Power over Ethernet) technology, defined in the IEEE 802.3af-2003 and IEEE 802.3at-2009 standards, is worth a closer look. Its essence is the ability to power devices over standard twisted pair. Most modern IP phones, in particular the Cisco Unified IP Phone 7900 Series, support PoE. Under the 2009 standard, a device can draw up to 25.5 W.

With 100BASE-TX cabling only two twisted pairs are used for power; however, some manufacturers use all four, reaching up to 51 W. Note that the technology does not require any modification of existing cabling, including Cat 5 cables.

To determine whether the connected device is a powered device (PD), a voltage of 2.8 to 10 V is applied to the cable and the resistance of the connected device is measured. If the resistance lies in the range 19 to 26.5 kΩ, the process moves on to the next stage. If not, the check is repeated at intervals of at least 2 ms.

Next, the power class of the powered device is determined by applying a higher voltage and measuring the current in the line. After that, the 48 V supply voltage is applied to the line. The line is also continuously monitored for overload.
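The detection stage described above can be sketched in a few lines. This is only an illustration built on the ranges quoted in the text; the function names are hypothetical, and the resistance is simply taken as V/I, which is a simplification of what real power sourcing equipment does.

```python
# Ranges quoted in the text for the PoE detection stage.
DETECT_MIN_VOLTS = 2.8
DETECT_MAX_VOLTS = 10.0
SIGNATURE_MIN_OHMS = 19_000.0
SIGNATURE_MAX_OHMS = 26_500.0


def is_powered_device(probe_volts, measured_amps):
    """Return True if the measured resistance looks like a PD signature.

    Simplification (assumption): resistance is computed as V / I from a
    single probe measurement.
    """
    if not DETECT_MIN_VOLTS <= probe_volts <= DETECT_MAX_VOLTS:
        raise ValueError("probe voltage outside the detection range")
    if measured_amps <= 0:
        return False  # open line: nothing is connected
    resistance = probe_volts / measured_amps
    return SIGNATURE_MIN_OHMS <= resistance <= SIGNATURE_MAX_OHMS


if __name__ == "__main__":
    print(is_powered_device(5.0, 0.0002))  # about 25 kOhm
    print(is_powered_device(5.0, 0.005))   # about 1 kOhm
```

Only when this check passes would the power sourcing side move on to classification and then apply the supply voltage.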

Data Link Layer

According to the IEEE 802 specification, the data link layer is divided into two sublayers:

MAC (Media Access Control) - provides interaction with the physical layer;
LLC (Logical Link Control) - serves the network layer.

Switches operate at the data link layer: they connect several nodes of a computer network and forward frames between hosts based on physical (MAC) addresses.

Virtual local area networks (VLANs) also deserve a mention. This technology lets you create a logical network topology that is independent of the physical one. This is achieved by tagging traffic, as described in detail in the IEEE 802.1Q standard.

Frame format

In the context of IP telephony, the Voice VLAN is worth noting: it is widely used to isolate voice traffic generated by IP phones from other data. Using it is advisable for two reasons:

Security. Creating a separate voice VLAN reduces the likelihood of interception and analysis of voice packets.
Improving the quality of transmission. The VLAN mechanism allows you to set an increased priority for voice packets, and, as a result, improve the quality of communication.
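Both the VLAN number and the priority mentioned above travel in the 4-byte IEEE 802.1Q tag: a 16-bit TPID of 0x8100 followed by a 3-bit priority, a 1-bit drop-eligible flag and a 12-bit VLAN ID. A minimal sketch of packing and unpacking that tag follows; the VLAN ID 100 and priority 5 are hypothetical example values, not taken from the article.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Pack TPID + TCI (priority, DEI, VLAN ID) into 4 bytes."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | ((dei & 1) << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag):
    """Unpack a 4-byte 802.1Q tag into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0xFFF}


if __name__ == "__main__":
    tag = build_vlan_tag(100, priority=5)  # hypothetical voice VLAN
    print(tag.hex(), parse_vlan_tag(tag))
```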

Network Layer

Routing takes place at the network layer, so the main network-layer devices are routers. This is where the path by which data reaches a recipient with a given IP address is determined.

The main routed protocol is IP (Internet Protocol), on which both IP telephony and the worldwide Internet are built. There are also many dynamic routing protocols, the most popular of which is OSPF (Open Shortest Path First), an interior protocol based on the current state of the links.

Today there are dedicated VoIP gateways (Voice over IP gateways) that connect ordinary analog phones to an IP network. As a rule, they also have a built-in router that can account for traffic, authorize users, assign IP addresses automatically, and manage bandwidth.

Standard features of VoIP gateways include:

Security features (access lists, authorization);
Fax support;
Voice mail support;
Support for the H.323 and SIP (Session Initiation Protocol) protocols.

To combat possible delays, IP transmission has to be supplemented with additional mechanisms such as queuing disciplines (so that voice data does not compete with ordinary data).
For this purpose, routers typically use low-latency queuing (LLQ) or class-based weighted fair queuing (CBWFQ).
In addition, priority marking schemes are needed so that voice data is treated as the most important traffic.
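One common marking scheme is DSCP, where voice is usually marked Expedited Forwarding (DSCP 46). The sketch below shows how an application might request that marking on a UDP socket. It is an illustration only: the function names are hypothetical, `IP_TOS` is available on typical Unix-like systems, and whether routers honor the mark depends entirely on how queuing is configured in the network.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, the class commonly used for voice


def dscp_to_tos(dscp):
    """DSCP occupies the upper 6 bits of the IP TOS / Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    print(hex(dscp_to_tos(DSCP_EF)))
```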

Transport Layer

The transport layer is responsible for:

Segmentation of upper-layer application data;
Providing an end-to-end connection;
Ensuring data reliability.

The main transport layer protocols are TCP (Transmission Control Protocol), UDP (User Datagram Protocol) and RTP (Real-time Transport Protocol). IP telephony itself uses UDP and RTP, whose main difference from TCP is that they do not guarantee reliable data delivery. This suits telephony better than TCP's delivery control, because telephony is highly sensitive to transmission delay but less sensitive to packet loss.

UDP

UDP runs on top of the IP network protocol and provides transport services to application processes. Its main difference from TCP is that delivery is not guaranteed: no acknowledgment is requested when data is sent or received. In addition, no logical connection has to be established between the UDP modules (source and receiver) before sending.

RTP

Although RTP is considered a transport layer protocol, it usually runs over UDP. RTP provides payload type identification, timestamping, delivery monitoring and packet sequence numbering.

The basic function of RTP is to assign a timestamp to each outgoing packet, which is then processed on the receiving side. This makes it possible to play data out in the proper order, reduces the effect of uneven packet transit times across the network, and restores synchronization between audio and video.
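The fields mentioned above (payload type, sequence number, timestamp) all sit in the fixed 12-byte RTP header. A minimal parsing sketch, assuming a packet without CSRC entries or header extension; the function name and the sample values are hypothetical:

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,  # e.g. 0 = PCMU, 8 = PCMA, 18 = G.729
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Hypothetical packet: version 2, payload type 0, 160 bytes of audio.
    sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + bytes(160)
    print(parse_rtp_header(sample))
```

The receiver uses the sequence number to detect loss and reordering, and the timestamp to schedule playout.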

Data Layers

The last three layers of the OSI model are treated together here. This is reasonable, since the processes at these layers are closely interrelated, and it is more logical to describe them without distinguishing between the individual layers.

H.323

We start with the H.323 protocol stack, developed in 1996. This standard describes the equipment, network services and terminal devices intended for audio and video communication over packet-switched networks (such as the Internet). Support for voice exchange is mandatory for every H.323 device.

The H.323 recommendations provide for:

Platform independence.
Standards for encoding analog data.
Bandwidth management.
Flexibility and compatibility.

One very important point: the recommendations do not define the physical transmission medium, the transport protocol or the network interface. This means that devices supporting H.323 can work over any existing packet-switched network.

According to H.323, the four main components of a VoIP connection are:

Terminal;
Gateway;
Gatekeeper (zone controller);
Multipoint control unit (MCU).

An example of a block diagram of a network in IP telephony

Excerpt from the document describing the H.323 protocol stack.

1. Call control and signaling:
1.a. H.225.0: call signaling and media stream packetization (uses a subset of the Q.931 signaling protocol).
1.b. H.225.0/RAS: registration, admission and status procedures.
1.c. H.245: control protocol for multimedia communication.
2. Audio processing:
2.a. G.711: pulse code modulation of voice frequencies.
2.b. G.722: 7 kHz audio coding within 64 kbit/s.
2.c. G.723.1: dual-rate speech coder for multimedia communication at 5.3 and 6.3 kbit/s.
2.d. G.728: 16 kbit/s speech coding using low-delay code-excited linear prediction.
2.e. G.729: 8 kbit/s speech coding using conjugate-structure algebraic-code-excited linear prediction.
3. Video processing:
3.a. H.261: video codec for audiovisual services at p × 64 kbit/s.
3.b. H.263: video coding for low bit rate communication.
4. Data conferencing:
4.a. T.120: protocol stack (includes T.123, T.124, T.125) for data transfer between endpoints.
5. Media transport:
5.a. RTP: Real-time Transport Protocol.
5.b. RTCP: RTP Control Protocol.
6. Security:
6.a. H.235: security and encryption for H.323 multimedia terminals.
7. Supplementary services:
7.a. H.450.1: generic functions for controlling supplementary services in H.323.
7.b. H.450.2: call transfer to a third party's number.
7.c. H.450.3: call diversion (forwarding).
7.d. H.450.4: call hold.
7.e. H.450.5: call park and call pickup.
7.f. H.450.6: call waiting (notification of an incoming call during a call).
7.g. H.450.7: message waiting indication.
7.h. H.450.8: name identification service.
7.i. H.450.9: call completion service for H.323 networks.

Connection setup scenario using the H.323 protocol

SIP (Session Initiation Protocol)

SIP is a signaling protocol designed to set up, modify and terminate communication sessions. SIP is independent of the transport technology, although UDP is preferred for connection setup. RTP is recommended for carrying the voice and video itself, but other protocols are not ruled out.

SIP defines two types of signaling messages: requests and responses. There are also six basic methods:

INVITE (invitation) - invites the user to take part in a communication session (used to establish a new connection; may carry parameters to be negotiated);
BYE (disconnect) - terminates the connection between two users;
OPTIONS (options) - used to exchange information about supported capabilities (directly between two user agents or through a SIP server);
ACK (confirmation) - confirms receipt of the final response to an INVITE request;
CANCEL (cancel) - cancels a pending request, for example stops the search for a user;
REGISTER (registration) - passes information about the user's location to the SIP server, which can forward it to the location server.
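To make the request format concrete, here is a minimal sketch that assembles the text of an INVITE request without a body. All addresses, tags and identifiers are hypothetical placeholders, the helper name is an assumption, and a real user agent would also attach an SDP session description and generate unique values for each call.

```python
def build_invite(caller, callee, local_host, call_id, branch, tag):
    """Assemble a bare-bones SIP INVITE request (no SDP body)."""
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_host}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_invite(
        caller="alice@example.org",      # hypothetical addresses
        callee="bob@example.com",
        local_host="pc1.example.org",
        call_id="a84b4c76e66710",
        branch="z9hG4bK776asdhds",       # branch values start with z9hG4bK
        tag="1928301774",
    ))
```

The callee would typically answer with provisional responses (100 Trying, 180 Ringing) and then 200 OK, which the caller confirms with ACK.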

SIP Session Scenario

Codecs

An audio codec is a program or algorithm that compresses or decompresses digital audio data, reducing the bandwidth required of the transmission channel. In IP telephony today, the most common codecs are G.729 and G.711, the latter with A-law (alaw) or μ-law (ulaw) companding.

G.729

G.729 is a lossy codec. Its main idea is to transmit not the digitized signal itself but its parameters (spectral characteristics, number of zero crossings), which are sufficient to synthesize the signal on the receiving side. The main characteristics of the voice, such as amplitude and timbre, are preserved.

The codec is designed for a bit rate of 8 kbit/s. G.729 processes frames 10 ms long at a sampling frequency of 8 kHz. For each frame, the parameters of a mathematical model are determined and then transmitted over the channel as codes.

G.729 coding introduces a delay of 15 ms, of which 5 ms is spent on look-ahead buffering. Note also that G.729 is quite demanding of processor resources.

G.711

G.711 is a voice codec that applies no compression other than companding, a method of reducing the effects of channels with limited dynamic range. It works by using fewer quantization levels for loud signals while preserving sound quality. The two companding schemes widely used in telephony are alaw and ulaw.

The codec produces a 64 kbit/s stream: 8000 samples per second at 8 bits per sample. Voice quality is subjectively better than with G.729.
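The codec bit rate is not the same as the bandwidth a call occupies on the IP network, because every packet also carries IP, UDP and RTP headers. A rough estimate can be sketched as follows. The 20 ms packetization interval is an assumption (a common default, not stated in the article), the function name is hypothetical, and link-layer framing is ignored.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers, without options


def call_bandwidth_kbps(codec_kbps, packet_ms=20):
    """One-way IP bandwidth of a voice stream, in kbit/s."""
    payload_bytes = codec_kbps * packet_ms / 8   # kbit/s * ms = bits
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000


if __name__ == "__main__":
    print("G.711:", call_bandwidth_kbps(64))  # 160-byte payload per packet
    print("G.729:", call_bandwidth_kbps(8))   # 20-byte payload per packet
```

Under these assumptions the headers are a small overhead for G.711 but outweigh the payload for G.729, which is why the saving from a low bit rate codec is smaller on the wire than the raw codec rates suggest.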

alaw

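alaw, or A-law, is a lossy audio companding algorithm. It is used mainly in Europe and Russia.

For an input signal x, the A-law transform is:

$$F(x) = \operatorname{sgn}(x) \begin{cases} \dfrac{A|x|}{1 + \ln A}, & |x| < \dfrac{1}{A} \\ \dfrac{1 + \ln(A|x|)}{1 + \ln A}, & \dfrac{1}{A} \le |x| \le 1 \end{cases}$$

where A is the compression parameter (usually taken to be 87.6).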

ulaw

ulaw, or μ-law, is a lossy audio companding algorithm. It is used mainly in Japan and North America.

For an input signal x, the μ-law transform is:

$$F(x) = \operatorname{sgn}(x) \, \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}, \qquad -1 \le x \le 1$$

where μ is taken to be 255 (8 bits) in the North American and Japanese standards.
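The two companding curves are easy to try out numerically. The sketch below uses the continuous textbook forms of A-law and μ-law on samples normalized to [-1, 1]; actual G.711 implementations use piecewise-linear approximations of these curves, and the default A = 87.6 is the value commonly quoted for G.711.

```python
import math


def mu_law(x, mu=255.0):
    """Continuous mu-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def a_law(x, a=87.6):
    """Continuous A-law compression of a sample in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        print(sample, round(mu_law(sample), 3), round(a_law(sample), 3))
```

Quiet samples are boosted far more than loud ones, so after uniform 8-bit quantization most of the levels end up in the low-amplitude region where speech spends most of its time.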

Pulse Code Modulation (PCM)

Pulse code modulation is the representation of a continuous function as a series of consecutive pulses.

To obtain the modulated signal at the input of the communication channel, an ADC measures the instantaneous value of the analog signal at a fixed period. The number of digitized values per second (the sampling frequency) must be at least twice the maximum frequency in the spectrum of the analog signal.

The measured values are then rounded to one of a set of predefined levels. Note that the number of levels should be a power of two. The number of levels determines how many bits are used to encode each sample.

Signal quantization

The figure shows coding with four bits (that is, every intermediate value of the analog signal is rounded to one of 16 preset levels). For example, at time zero the signal is represented as 0111.

During demodulation, the demodulator converts the sequence of zeros and ones into pulses, using the same quantization levels as the modulator. The DAC then reconstructs the signal from these pulses, and a smoothing filter removes the remaining inaccuracies.

In modern telephony at least 100 quantization levels are required, so the minimum number of bits per sample is 7 (2^7 = 128).
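The quantization step and the bit-count arithmetic can be illustrated with a short sketch. It assumes a uniform quantizer over samples normalized to [-1, 1]; the level numbering is an arbitrary choice for the example and is not meant to reproduce the figure.

```python
def min_bits(levels):
    """Smallest number of bits that can distinguish `levels` values."""
    return (levels - 1).bit_length()


def quantize(sample, bits):
    """Map a sample in [-1, 1] to a `bits`-wide binary code word."""
    levels = 1 << bits
    index = int((sample + 1.0) / 2.0 * levels)
    index = max(0, min(levels - 1, index))  # clamp the top edge
    return format(index, f"0{bits}b")


if __name__ == "__main__":
    print(min_bits(100))                       # bits needed for 100 levels
    print([quantize(s, 4) for s in (-1.0, -0.3, 0.0, 0.7, 1.0)])
```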

Quality of Service (QoS) in IP telephony

Networks based on the TCP/IP stack do not by default provide high-quality service for delay-sensitive traffic. TCP guarantees reliable delivery of information, but transfer may take place with unpredictable delays. UDP keeps delays to a minimum, but gives no guarantee that packets are delivered correctly.

At the same time, the quality of voice traffic depends heavily on the quality of transmission, and in a network with no mechanisms to guarantee consistent quality, an IP telephony deployment may not meet users' requirements.

The main quality-of-service indicators are network bandwidth and transmission delay. Delay here is defined as the time from the moment a packet is sent to the moment it is received.

There are also characteristics such as network availability and reliability (assessed from long-term monitoring of the service level, or from utilization).

The following mechanisms are used to improve call quality:

Rerouting. If one of the links is overloaded, traffic can be delivered over backup routes.
Reservation of link resources at connection setup.
Traffic prioritization. Packets can be marked according to their importance and serviced according to the marks.

As mentioned earlier, voice traffic is extremely sensitive to transmission delay. The maximum delay should not exceed 400 ms (this includes processing time at the end stations). There are two main types of delay:

- Delay from coding in voice gateways or terminal equipment. It is reduced by improving the voice processing and conversion algorithms.
- Delay introduced by the transmission network. It is reduced by improving the network infrastructure, in particular by reducing the number of routers and using high-speed links.

Sources of Delay in IP Telephony

Jitter

Another phenomenon characteristic of IP telephony is jitter, that is, random variation in packet propagation delay.

Jitter is caused by three factors:

Limited bandwidth or incorrect operation of active network devices;
High signal propagation delay;
Thermal noise.

The most common way to deal with jitter is a jitter buffer, which stores a certain number of packets.

The buffer length is usually adjusted dynamically over the lifetime of the connection, with heuristic algorithms used to choose the best length.

Jitter buffer

To compensate for the uneven packet arrival rate at the receiving side, a temporary packet storage, or so-called jitter buffer, is created. Its task is to collect incoming packets in the correct order in accordance with the timestamps and give them to the codec at the correct intervals and in the correct order.
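The reordering part of this task can be sketched with a small priority queue keyed by RTP timestamp. This is a deliberately simplified illustration with a fixed depth counted in packets; the class name and parameters are hypothetical, and real implementations size the buffer in milliseconds, adapt it during the call and conceal missing packets.

```python
import heapq


class JitterBuffer:
    """Fixed-depth reorder buffer keyed by RTP timestamp (sketch)."""

    def __init__(self, depth=3):
        self.depth = depth          # packets to hold before playout starts
        self._heap = []
        self._last_played = None    # timestamp of the last packet released

    def push(self, timestamp, payload):
        """Store a packet; return False if it arrived too late to play."""
        if self._last_played is not None and timestamp <= self._last_played:
            return False
        heapq.heappush(self._heap, (timestamp, payload))
        return True

    def pop(self):
        """Release the oldest packet once enough packets are buffered."""
        if len(self._heap) < self.depth:
            return None
        timestamp, payload = heapq.heappop(self._heap)
        self._last_played = timestamp
        return payload


if __name__ == "__main__":
    buf = JitterBuffer(depth=3)
    for ts, data in [(160, b"a"), (480, b"c"), (320, b"b")]:  # out of order
        buf.push(ts, data)
    print(buf.pop())  # the packet with the smallest timestamp comes out first
```

A packet that arrives after its slot has already been played is rejected by `push`, which is exactly the kind of loss the receiving device sees even when the network delivered every packet.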

Jitter buffer

The receiving VoIP device either calculates the buffer size while it is running, or the size is set explicitly in the settings. On the one hand, the buffer cannot be too large, so as not to increase the end-to-end delay. On the other hand, a small buffer causes packet loss when the delay in the IP network varies.

This leads to one of the main disagreements between Internet providers and IP telephony users. From the provider's point of view, all packets are delivered to the subscriber, so there are no losses. From the point of view of the VoIP device, the time between packet arrivals significantly exceeds the jitter buffer, so in practice there are losses. A loss of more than 1% already causes some discomfort. At 2%, conversation is difficult. Above 4%, conversation is almost impossible.

Jitter buffer size

The random propagation delay J_i for the i-th packet can be determined by the formula:

$$J_i = J_{i-1} + \frac{|D_i| - J_{i-1}}{16}$$

where D_i is the deviation from the expected arrival time of the i-th packet, determined by the formula:

$$D_i = (R_i - R_{i-1}) - (S_i - S_{i-1})$$

where:
R_i is the arrival time of the i-th packet, in RTP timestamp units,
S_i is the RTP timestamp taken from the i-th packet.

As an example, let us calculate the expected random propagation delay of the 5th packet from the values for it and the preceding packet.

Let J4 = 10 ms, R5 = 10, R4 = 11, S5 = 6, S4 = 5, with R and S expressed in the same units as J. Then D5 = (10 - 11) - (6 - 5) = -2, and the jitter formula gives J5 = 10 + (|-2| - 10)/16 = 9.5 ms.
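The running jitter estimate is straightforward to express in code. The sketch below follows the interarrival jitter estimator from the RTP specification, with its smoothing factor of 1/16; the function name and argument order are assumptions made for this example.

```python
def update_jitter(prev_jitter, r_prev, r_cur, s_prev, s_cur):
    """Return the new jitter estimate after one more packet arrives.

    r_* are arrival times and s_* are RTP timestamps of the previous and
    current packet, all expressed in the same units as prev_jitter.
    """
    deviation = (r_cur - r_prev) - (s_cur - s_prev)
    return prev_jitter + (abs(deviation) - prev_jitter) / 16


if __name__ == "__main__":
    # Previous estimate 10, arrival times 11 then 10, timestamps 5 then 6.
    print(update_jitter(10, r_prev=11, r_cur=10, s_prev=5, s_cur=6))
```

Because each new deviation contributes only one sixteenth of its size, a single late packet moves the estimate slightly, while a sustained change in network conditions shifts it over a few dozen packets.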

On average, the random propagation delay per packet in this example is about 10 ms (it can be calculated more precisely with the formula given above). For no packet to be dropped, the jitter buffer must therefore be at least 10 ms.

To express the required jitter buffer size as an amount of data, multiply this value by the average network bandwidth, taken here as 100 Mbit/s: 10 • 10^-3 s • 100 Mbit/s = 1 Mbit, that is, about 125 kB (128 KB if a megabit is counted as 2^20 bits).

The jitter buffer must be larger than the variation in transit time across the network. For example, if the transit time of 10 packets ranges from 5 to 10 ms, the buffer should be at least 8 ms so that no packet is lost. A larger buffer, for example 12 ms, is better still, since it leaves time for the mechanism that re-requests lost packets to work.

Telephony Deployment Solutions

Asterisk

Asterisk is a software PBX that can switch both VoIP calls and calls between IP phones and the traditional public telephone network.

Supported protocols: IAX, SIP, H.323, Skinny, UNIStim.
Supported codecs: G.711 (ulaw and alaw), G.722, G.723, G.729, GSM, iLBC, LPC-10, Speex.

Asterisk is actively developed open source software that can be installed without buying licenses. This makes it attractive for small and medium-sized businesses. The number of subscribers in the network can reach 2000 and is limited only by server capacity.

Another advantage of Asterisk is its flexible configuration. All the necessary functionality is either already implemented or can be written in-house without significant time or money. This is helped by the principle of one task, one software module.

Compared with solutions from vendors such as Cisco or Avaya, Asterisk is also attractive in terms of deployment cost. In effect, the only costs are the telephone sets and a server able to handle the required network load. The software itself is completely free.

Cisco Unified Communication Manager (CallManager)

CallManager is aimed more at large networks of up to 30,000 subscribers. This hardware and software system provides reliable operation and lets you configure many features, such as call forwarding or a voice menu. There is also a “lightweight” Express version intended for small offices.

Among the benefits of Cisco CallManager, the first to mention is Cisco's well-known technical support. With a service contract of the appropriate level, any problem, from configuration questions to failed equipment, is resolved almost immediately. Cisco CallManager therefore suits companies that are willing to pay a lot of money in return for the highest quality of service.

Avaya IP Office

An IP Office system can be a good choice for a medium-sized telephone network. The number of subscribers is limited not only by server capacity but also by the number of licenses purchased. Almost everything has to be licensed (expansion cards, applications and so on), which can be inconvenient.

Configuration can be done with a number of programs, the most popular and easiest to use being Avaya IP Office Manager. Console management via the Avaya Terminal Emulator is also possible.

Avaya's product range is not limited to IP Office. Avaya, which in 2009 acquired the enterprise business of another well-known manufacturer, Nortel, is a recognized leader in the IP telephony equipment market.

Further reading on the topic:

Wendell Odom - all of his books are good.
“IP Telephony in Computer Networks”. I. V. Baskakov, A. V. Proletarsky, S. A. Melnikov, R. A. Fedotov.
“IP Telephony”. B. S. Goldstein, A. V. Pinchuk, A. L. Sukhovitsky.
“Asterisk: The Future of Telephony”. Jim Van Meggelen, Leif Madsen, Jared Smith.
“The SIP Protocol. A Reference Guide”. B. S. Goldstein, A. A. Zarubin, V. V. Samorezov.

Source: https://habr.com/ru/post/183152/
