IPsec vs TLS / SRTP for VoIP Security

Foreword

Below is a translation of the original article, which I had to prepare for the candidate-minimum exam in English during my master's studies. I chose this text because I already knew its contents well from writing my thesis. About a year has passed since then, and only now have I decided to publish it. Notably, during this time, while working on IP telephony security, I had the opportunity to work with both TLS/SRTP and IPsec. I hope it will be useful to someone (as it once was to me), or at least an interesting read. Feel free to share your opinion about this material.

P.S. Because the text is fairly long, I deliberately omitted some parts; omissions are marked with an ellipsis (...). The term Information Assurance was left untranslated, as I have never come across an equivalent in Russian.

1. Introduction

...
Since the Internet Engineering Task Force (IETF) requires IPsec to be included in every IPv6 implementation, it is reasonable to consider IPsec a suitable protocol for securing VoIP. However, the current preference is to use TLS (Transport Layer Security) to secure SIP (Session Initiation Protocol) and SRTP (Secure Real-time Transport Protocol) to protect RTP.

At present (translator's note: at the time the original was written, 2007), no comparison of these two approaches exists. This paper provides such a comparison and discusses the advantages and disadvantages of each approach. Based on it, IA (Information Assurance) implementers and designers will be able to make informed decisions.

SIP is becoming the dominant VoIP session protocol. RTP is the dominant protocol for packetizing voice data and transporting it between endpoints over IP networks. TLS, SRTP, and IPsec are the protocols used to secure SIP and RTP sessions; they provide authentication, integrity, and confidentiality for VoIP-related IP packets. ... The figure below shows where the TLS, IPsec, SRTP, SIP, and RTP protocols sit within the OSI model.

The figure below shows the general flow of a call using SIP and RTP.

...

2. Review of protocols

SIP

SIP is described in RFC 3261 as an application-layer control protocol for creating, modifying, and terminating sessions with one or more participants. VoIP service providers are investing heavily in SIP for VoIP signaling. The call-flow figure above shows the overall sequence of messages involved in establishing a SIP voice session.
...

RTP

RTP is described in RFC 3550 as a protocol that provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, over multicast or unicast network services. It is currently the only protocol suitable for carrying VoIP voice traffic. Each SIP client initiates an RTP session upon receipt of an ACK or OK message, as shown in the figure above.

IPsec

IPsec is described in RFC 4301 as a set of security services at the IP layer that allows a system to select the required security protocols, determine the algorithms to use, and put in place any cryptographic keys required to provide the requested services. The IPsec SA (Security Association) is established before the SIP and RTP sessions are initiated; once it is in place, IPsec is automatically applied to SIP and RTP packets as they pass through the network layer of the OSI model within the IP stack.

TLS

TLS is described in RFC 4346 as a protocol that provides communications security over the Internet. It allows client-server applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery. TLS version 1.0 is also known as SSL (Secure Sockets Layer) version 3.1. A secure TLS connection is established before a SIP session is initiated, and TLS secures SIP packets as they pass through the transport layer of the OSI model within the IP stack.

SRTP

SRTP is described in RFC 3711 as an RTP profile that can provide confidentiality, message authentication, and replay protection for RTP traffic and for its control traffic, the Real-time Transport Control Protocol (RTCP). The general message flow for SRTP and SIP is the same as in the figure above. SRTP secures RTP packets as they pass through the transport layer of the OSI model within the IP stack; it relies on SIP messages for key exchange and on TLS to authenticate SIP clients.

3. Comparison

...

Complexity of implementation and standards coverage

In terms of implementation, TLS is easier than IPsec to integrate with SIP. RFC 4346 contains about 200 requirements for implementing TLS, whereas IPsec has more than 500 implementation requirements spread across about 11 RFCs.

The IETF has published several documents describing how SIP, TLS, and SRTP can be integrated. In addition, the Department of Defense has developed interoperability requirements applicable to securing SIP with TLS and to integrating SIP/TLS with SRTP. The authors are not aware of any IETF documents describing how IPsec can be integrated with SIP or RTP, and this issue is not well understood in the communications industry. Nevertheless, some research-oriented implementations have appeared, and they show that such integration is more difficult because it requires access to the operating system kernel. VoIP providers typically install their VoIP applications on existing operating systems such as Windows, Linux, or UNIX, and usually have limited or no access to the operating system kernel.

Implementing both approaches on VoIP devices may not be possible, depending on the vendor and device type. For example, some endpoints have limited memory, storage, and processing power and may not be able to support TLS, SRTP, and IPsec implementations at the same time.

Hierarchical signaling support

The main selling point of IPsec is that it provides end-to-end encryption, which most data applications require. However, commercial voice offerings are based on a hierarchical signaling model, in which the end device notifies the LCC (Local Call Controller) using its own signaling protocol in order to establish a session. The LCC, as a rule, notifies the provider's SS (softswitch) via SIP to reach the external network; the SS can then notify another SS or LCC to complete session establishment with the remote end device, as shown in the figure below (left). A proprietary signaling protocol is used between the end device and the LCC so that the service provider can offer the user unique value-added features, which would not be possible with a standardized protocol.

In a hierarchical model, each hop in the hierarchy must be able to decrypt the signaling packet, process it, and re-encrypt it before forwarding it. This runs counter to the end-to-end security model. Both IPsec and TLS can be used within a hierarchical model; however, VoIP providers currently consider TLS better suited to it.

The end-to-end security model is used for the bearer (voice) channel and can be implemented with either SRTP or IPsec. The IETF has published a method for exchanging SRTP keys within SIP messages, so that the secure media session can be established as soon as signaling completes. A similar approach could be developed for IPsec, but this has not yet been done.

Bandwidth efficiency

Comparing bandwidth efficiency makes sense only for the voice (bearer) channel, since signaling packets have a negligible effect on bandwidth compared with voice packets.

Comparing the size of IPsec packets with that of SRTP packets is difficult, because it depends on the mode used (transport or tunnel), the number of padding bytes, and the authentication and integrity algorithms used. Assuming that IPsec uses ESP (Encapsulating Security Payload) in transport mode with minimal padding and a small packet size, SRTP is about 6% more efficient than IPsec for IPv6 packets. If integrity protection of the IP header is required, IPsec can use AH (Authentication Header), which adds further overhead.

Under the same assumptions, IPsec requires 2 bytes more than TLS to protect SIP.

The impact of RTP header compression is not included in this estimate; in environments where such compression is used, SRTP is 10 bytes more efficient than IPsec. The table below summarizes these figures.

Protocol	Packet size, bytes	Bandwidth, kbit/s
SRTP	254	101.6
RTP/IPsec	270	108.0
SIP/TLS	1280	N/A
SIP/IPsec	1282	N/A
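The bandwidth column can be reproduced with simple arithmetic if one assumes 50 packets per second, i.e. 20 ms packetization (an assumption, since the article does not state the packet rate). Under that assumption the figures come out in kilobits per second, and the same sketch gives the roughly 6% difference between RTP/IPsec and SRTP mentioned above.

```python
# Assumption (not stated in the article): 20 ms packetization, i.e. 50 packets/s.
PACKETS_PER_SECOND = 50


def bandwidth_kbps(packet_size_bytes: int, pps: int = PACKETS_PER_SECOND) -> float:
    """One-direction bandwidth in kbit/s for a constant-rate voice stream."""
    return packet_size_bytes * 8 * pps / 1000


def overhead_ratio(larger: int, smaller: int) -> float:
    """Relative extra size of `larger` compared with `smaller`."""
    return (larger - smaller) / smaller


if __name__ == "__main__":
    for name, size in [("SRTP", 254), ("RTP/IPsec", 270)]:
        print(f"{name}: {bandwidth_kbps(size):.1f} kbit/s")
    print(f"RTP/IPsec overhead vs SRTP: {overhead_ratio(270, 254):.1%}")
```

With a different packetization interval (for example 30 ms, about 33 packets per second) the absolute numbers change, but the relative difference between the two protocols stays the same.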

Commercial use

Commercial VoIP service providers invest heavily in TLS and SRTP to secure VoIP. IPsec was also considered for this task, but TLS and SRTP were judged the better solution. There is currently no commercial IPsec implementation designed to secure SIP-based VoIP. Vendors that use legacy H.323 signaling for their voice solutions are more likely to choose IPsec to protect them. However, most H.323 vendors currently use unencrypted solutions and are migrating to SIP-based ones.

Information Assurance

The most common argument for using IPsec is that it provides end-to-end encryption. However, this advantage does not apply to VoIP signaling, since, as mentioned earlier, most implementations are based on a hierarchical signaling model, for which TLS is better suited. ...

The advantage of IPsec is that it protects data at the IP (network) layer, which is lower in the protocol stack than the transport layer, where TLS provides protection. ...

Another difference between IPsec and SRTP is that IPsec encrypts the RTP header, while SRTP does not. The advantage of IPsec here is that it hides useful information from a potential attacker. The disadvantage is that it limits the ability of firewalls and SBCs (Session Border Controllers) to open pinholes on specific ports. This is especially critical for firewalls and SBCs acting as network address translation (NAT) devices for multiple LCCs with overlapping address spaces. Since all arriving VoIP packets are addressed to the IP address of the firewall or SBC, the only distinguishing feature the firewall or SBC can use to determine the appropriate target LCC is the port number.
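The port-based demultiplexing problem can be illustrated with a minimal sketch. It assumes plain IPv6 packets with no extension headers, and the port-to-LCC table and LCC names are hypothetical. With unencrypted transport headers (as with SRTP, which leaves UDP and RTP headers in the clear) the SBC can read the destination port; when the next header is ESP (IP protocol 50), the UDP header lies inside the encrypted payload and the port is simply not available.

```python
import struct
from typing import Optional

IPPROTO_UDP = 17     # IANA protocol number for UDP
IPPROTO_ESP = 50     # IANA protocol number for IPsec ESP
IPV6_HEADER_LEN = 40  # fixed IPv6 header; extension headers are not handled here

# Hypothetical mapping of public UDP ports on the SBC to internal LCCs.
PORT_TO_LCC = {40000: "LCC-A", 40002: "LCC-B"}


def destination_port(packet: bytes) -> Optional[int]:
    """Return the UDP destination port of an IPv6 packet, or None if hidden."""
    next_header = packet[6]  # Next Header field of the fixed IPv6 header
    if next_header == IPPROTO_UDP:
        # UDP header: source port (2 bytes), then destination port (2 bytes).
        (dst_port,) = struct.unpack_from("!H", packet, IPV6_HEADER_LEN + 2)
        return dst_port
    # For ESP (or anything else) the UDP header is not readable here.
    return None


def route_to_lcc(packet: bytes) -> Optional[str]:
    """Pick the target LCC from the destination port, if it is visible."""
    port = destination_port(packet)
    return PORT_TO_LCC.get(port) if port is not None else None
```

A real SBC would also track SIP/SDP state to learn which ports belong to which call; the fixed table here only stands in for that state.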

Both protocols use similar encryption, authentication, and integrity mechanisms. For example, both support public-key encryption, AES symmetric encryption, and HMAC-SHA1 message authentication. Thus, from this standpoint there is no difference in security.
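To make the shared primitive concrete, here is a small sketch computing HMAC-SHA1 and truncating it to the tag lengths commonly used by each protocol: 80 bits (10 bytes) for the default SRTP authentication tag and 96 bits (12 bytes) for HMAC-SHA1-96 with ESP. It deliberately ignores which bytes each protocol actually authenticates (SRTP, for instance, also includes its rollover counter), so treat it as an illustration of the algorithm rather than of either packet format.

```python
import hashlib
import hmac

SRTP_TAG_LEN = 10  # HMAC-SHA1-80, the default SRTP authentication tag
ESP_ICV_LEN = 12   # HMAC-SHA1-96 as used with ESP


def hmac_sha1_tag(key: bytes, message: bytes, tag_len: int) -> bytes:
    """HMAC-SHA1 over `message`, truncated to the leftmost `tag_len` bytes."""
    return hmac.new(key, message, hashlib.sha1).digest()[:tag_len]


# Demo values only: never use a fixed all-zero key in practice.
key = bytes(20)
packet = b"example protected packet"
srtp_tag = hmac_sha1_tag(key, packet, SRTP_TAG_LEN)
esp_icv = hmac_sha1_tag(key, packet, ESP_ICV_LEN)
```

Because both tags are truncations of the same HMAC output, the shorter one is a prefix of the longer one when key and input match; the security difference comes from the tag length, not from the algorithm.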

Connection setup, rekeying, and connection recovery time

To avoid excessively long session setup times and clipping (packet loss at the start of a voice session), it is essential that the bearer channel's encryption key be distributed as part of the signaling process. The IETF has defined how SRTP keys are distributed within SIP signaling by placing the key in the SDP (Session Description Protocol) body of SIP messages. The IETF has not yet defined a mechanism for distributing IPsec keys in SDP. Moreover, the offer/answer model of RFC 3264 may prevent IPsec key information from being included in SIP signaling messages.
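One well-known way of carrying SRTP keys in the SDP body is SDP Security Descriptions (RFC 4568), which adds an `a=crypto` attribute to the media description. The article does not name a specific mechanism, so the sketch below is only an example of the idea: it generates a random 128-bit master key and 112-bit master salt for the `AES_CM_128_HMAC_SHA1_80` crypto suite and encodes them inline. The port number and payload type are illustrative. Since the key travels in clear inside SDP, the SIP messages carrying it must themselves be protected, for example with TLS.

```python
import base64
import secrets


def sdes_crypto_line(tag: int = 1) -> str:
    """Build an SDP a=crypto attribute with a fresh SRTP master key and salt."""
    master_key = secrets.token_bytes(16)   # 128-bit AES master key
    master_salt = secrets.token_bytes(14)  # 112-bit master salt
    key_params = base64.b64encode(master_key + master_salt).decode("ascii")
    return f"a=crypto:{tag} AES_CM_128_HMAC_SHA1_80 inline:{key_params}"


def sdp_audio_section(port: int) -> str:
    """Audio media description using the SRTP profile (RTP/SAVP), PCMU codec."""
    lines = [f"m=audio {port} RTP/SAVP 0", sdes_crypto_line()]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(sdp_audio_section(49170))
```

Because the key rides inside the offer/answer exchange, the media can be encrypted from the first RTP packet, which is exactly the property the paragraph above asks for.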

Another issue is the delay associated with rekeying. Recent studies comparing TLS and IPsec session rekeying times have shown that IPsec takes about 20 times longer to rekey than TLS (26 ms vs. 1.3 ms). This is not a long time for a single rekey, but it can become a problem if thousands of end devices try to rekey at the same time.
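A back-of-the-envelope calculation shows why simultaneous rekeying matters. The sketch assumes a hypothetical worst case in which one server handles all rekey requests strictly one after another, using the per-rekey times quoted above; real servers process requests in parallel, so these totals are an upper bound rather than a prediction. Times are kept in integer microseconds to avoid floating-point rounding.

```python
# Per-rekey times quoted in the text, in microseconds.
IPSEC_REKEY_US = 26_000  # 26 ms
TLS_REKEY_US = 1_300     # 1.3 ms


def serial_rekey_seconds(devices: int, per_rekey_us: int) -> float:
    """Total time if a single server processes every rekey sequentially."""
    return devices * per_rekey_us / 1_000_000


if __name__ == "__main__":
    devices = 10_000  # hypothetical number of end devices rekeying at once
    print("IPsec:", serial_rekey_seconds(devices, IPSEC_REKEY_US), "s")
    print("TLS:  ", serial_rekey_seconds(devices, TLS_REKEY_US), "s")
```

For 10,000 devices this gives 260 s for IPsec versus 13 s for TLS, the same 20x ratio as a single rekey, but now on a scale that users could notice.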

The last issue is the delay associated with re-establishing a secure connection. SIP over TLS requires a minimum of 6 message exchanges. Re-establishing a SIP connection protected by IPsec depends mainly on the Internet Key Exchange (IKE) implementation and on whether main mode or aggressive mode is used in phase 1. Assuming main mode is used, IPsec requires 9 message exchanges (translator's note: the original refers to IKEv1, which has since been replaced by IKEv2; IKEv2 needs only 4 messages, in two exchanges, if EAP is not used).
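The message counts can be tallied per IKE step. The sketch assumes that the figure of 9 corresponds to IKEv1 main mode (six messages) followed by quick mode (three messages) to create the IPsec SA; the per-step counts are the standard ones for IKEv1 and for IKEv2 without EAP.

```python
# Messages per IKE step (without EAP for IKEv2).
IKE_MESSAGES = {
    "IKEv1 main mode": 6,        # phase 1
    "IKEv1 aggressive mode": 3,  # phase 1, fewer messages, less identity protection
    "IKEv1 quick mode": 3,       # phase 2, creates the IPsec SA
    "IKEv2 IKE_SA_INIT": 2,
    "IKEv2 IKE_AUTH": 2,         # also creates the first child (IPsec) SA
}


def setup_messages(*steps: str) -> int:
    """Total number of messages for a sequence of IKE steps."""
    return sum(IKE_MESSAGES[step] for step in steps)


if __name__ == "__main__":
    print("IKEv1 main + quick:", setup_messages("IKEv1 main mode", "IKEv1 quick mode"))
    print("IKEv1 aggressive + quick:",
          setup_messages("IKEv1 aggressive mode", "IKEv1 quick mode"))
    print("IKEv2:", setup_messages("IKEv2 IKE_SA_INIT", "IKEv2 IKE_AUTH"))
```

Counting messages ignores round-trip latency and cryptographic cost, so it is only a rough proxy for recovery time.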

...

Network management

The main advantage of SRTP over IPsec is that UDP and RTP packet headers remain visible to network operations staff, who can use this information to locate and fix network problems. IPsec encrypts these headers and thus hides this information. The same consideration applies when comparing IPsec with TLS.
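As an example of what the visible headers offer, the sketch below parses the fixed 12-byte RTP header defined in RFC 3550 (which SRTP leaves unencrypted) and counts lost packets from gaps in the 16-bit sequence numbers. It is a simplified troubleshooting aid, not a production tool: it ignores CSRC lists and header extensions, and the loss counter assumes packets arrive in order without duplicates.

```python
import struct
from typing import NamedTuple, Sequence


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed 12-byte RTP header; SRTP leaves it in the clear."""
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    return RtpHeader(version=b0 >> 6, payload_type=b1 & 0x7F,
                     sequence=seq, timestamp=ts, ssrc=ssrc)


def lost_packets(sequences: Sequence[int]) -> int:
    """Count missing sequence numbers, handling 16-bit wraparound.

    Assumes in-order delivery with no duplicates.
    """
    lost = 0
    for prev, cur in zip(sequences, sequences[1:]):
        lost += (cur - prev - 1) % 65536
    return lost
```

With ESP, the same bytes are part of the encrypted payload, so an operator capturing traffic would see only the outer IP and ESP headers.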

Topology Hiding

IPsec has an advantage over TLS and SRTP in hiding the network topology, since in tunnel mode it can encapsulate the original IP header inside the encrypted payload. TLS and SRTP lack this capability and must rely on an external NAT device to provide it. However, most VoIP implementations use IPsec in transport mode rather than tunnel mode, and transport mode does not provide this capability either.
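The difference between the two ESP modes can be shown with a toy model of header layering. This is a simplified sketch: headers are represented as labels, the ESP trailer and integrity check value are omitted, and the gateway addresses are hypothetical. In transport mode the original IP header (and therefore the endpoint addresses) stays on the wire; in tunnel mode it moves inside the encrypted part behind a new gateway-to-gateway header.

```python
from typing import List, Tuple


def esp_layout(mode: str, original: List[str]) -> Tuple[List[str], List[str]]:
    """Return (headers visible on the wire, headers hidden inside ESP).

    `original` lists the packet before IPsec, outermost first, e.g.
    ["IP(10.1.1.5 -> 10.2.2.7)", "UDP", "RTP", "voice"].
    """
    ip_header, upper_layers = original[0], original[1:]
    if mode == "transport":
        # Original IP header stays in the clear; only upper layers are encrypted.
        return [ip_header, "ESP"], upper_layers
    if mode == "tunnel":
        # Whole original packet, including its IP header, is encrypted and a
        # new outer IP header between the (hypothetical) gateways is added.
        return ["IP(gateway A -> gateway B)", "ESP"], [ip_header] + upper_layers
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    pkt = ["IP(10.1.1.5 -> 10.2.2.7)", "UDP", "RTP", "voice"]
    for mode in ("transport", "tunnel"):
        visible, hidden = esp_layout(mode, pkt)
        print(f"{mode}: visible={visible} hidden={hidden}")
```

Only the tunnel-mode output keeps the internal addresses out of the visible headers, which is why transport-mode deployments gain no topology hiding.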

4. Conclusion

Based on our preliminary comparison of IPsec and the TLS + SRTP combination for securing VoIP, we recommend that developers use TLS and SRTP. This approach is easier to implement and maintain, and it uses bandwidth more efficiently than IPsec. Using IPsec offers no significant security advantage over TLS and SRTP.

These conclusions are based on an analysis of existing standards, current TLS and SRTP implementations from VoIP providers, research-oriented IPsec implementations, and previously published comparisons. However, published studies comparing IPsec and TLS/SRTP implementations for securing voice sessions under operational conditions either do not exist or are limited and provide only a basis for further work. Future research could aim at efficient mechanisms for carrying IPsec key information in SIP messages and at comparing the security and performance of each approach.

Source: https://habr.com/ru/post/346862/
