Excerpt: Securing VoIP Networks: Threats, Vulnerabilities and Countermeasures

Authors Peter Thermos and Ari Takanen discuss the strengths and weaknesses of SRTP.

The following is an excerpt from the book Securing VoIP Networks: Threats, Vulnerabilities and Countermeasures. In this section of Chapter 6: Media Protection Mechanisms (.pdf), authors Peter Thermos and Ari Takanen discuss the strengths and weaknesses of SRTP.


Any multimedia application-such as video, voice, or gaming-uses a distinct set of protocols to set up sessions between end points (for example, SIP, H.323) and a distinct protocol to transmit the media streams. The standard protocol used to exchange media streams is RTP1 (Real Time Protocol), which is defined in RFC 3550. As discussed in Chapter 3, "Threats and Attacks," RTP streams can be intercepted and manipulated in order to perform various attacks. Although IPSec can be used to protect RTP, its limitations require a more scalable and versatile solution that alleviates the NAT traversal issue, dynamic allocation of sessions,2 and the need for a PKI. This has led to the development of SRTP3 (Secure Real Time Protocol). The use of SRTP requires a mechanism to exchange cryptographic keys before sending any media. Therefore, key management protocols such as MIKEY and SDescriptions4 have been proposed to provide the necessary keying material and management mechanisms to maintain the security of multimedia sessions. Currently, there is not a single key-exchange mechanism considered to be the industry standard because each has strengths and weaknesses. The most logical approach: to combine SRTP with the appropriate key-exchange mechanism is to identify the requirements that need to be supported by the environment and evaluate the applicability of each of the existing key management mechanisms. Alternatives to using SRTP include DTLS (Datagram Transport Layer Security) and IPSec, which were discussed in Chapter 5, "Signaling Protection Mechanisms." The following sections describe SRTP and discuss its strengths and limitations.


The Secure Real Time Protocol (SRTP) is a profile for the Real Time Protocol (RTP, IETF RFC 3550) to provide confidentiality, integrity, and authentication to media streams and is defined in the IETF RFC 3711. Although there are several signaling protocols (for example, SIP, H.323, Skinny) and several key-exchange mechanisms (for example, MIKEY, SDESCRIPTIONS, ZRTP), SRTP is considered one of the standard mechanism for protecting real-time media (voice and video) in multimedia applications. In addition to protecting the RTP packets, it provides protection for the RTCP (Real-time Transport Control Protocol) messages. RTCP is used primarily to provide QoS feedback (for example, round-trip delay, jitter, bytes and packets sent) to the participating end points of a session. The RTCP messages are transmitted separately from the RTP messages, and separate ports are used for each of the protocols. Therefore, both RTP and RTCP need to be protected during a multimedia session. If RTCP is left unprotected, an attacker can manipulate the RTCP messages between participants and cause service disruption or perform traffic analysis.

The designers of SRTP focused on developing a protocol that can provide adequate protection for media streams but also maintain key properties to support wired and wireless networks in which bandwidth or underlying transport limitations may exist. Some of the highlighted properties are as follows:

  • The ability to incorporate new cryptographic transforms.
  • Maintain low bandwidth and computational cost.
  • Conservative in the size of implementation code. This is useful for devices with limited memory (for example, cell phones).
  • Underlying transport independence, including network and physical layers that may be used, and perhaps prone to reordering and packet loss.

These properties make the implementation of SRTP feasible even for mobile devices that have limited memory and processing capabilities. Similar design properties are found in MIKEY (Multimedia Internet KEYing). Therefore, the use of MIKEY for key exchange and SRTP for media protection is one combination of mechanisms to provide adequate security for Internet multimedia applications, including VoIP, video, and conferencing.

The application that implements SRTP has to convert RTP packets to SRTP packets before sending them across the network. The same process is used in reverse to decrypt SRTP packets and convert them to RTP packets.

To see how the conversion process works and to learn more about key management defense measures, download the rest of Chapter 6: Media Protection Mechanisms (.pdf).

1 H. Schulzrinne, et al. "RTP: A Transport Protocol for Real-Time Applications," IETF RFC 3550, July 2003.

2 P. Thermos, T. Bowen, J. Haluska, and Steve Ungar. Using IPSec and Intrusion Detection to protect SIP implanted IP telephony. IEEE GlobeCom, 2004.

3 M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman. "The Secure Real-time Transport Protocol (SRTP)," IETF RFC 3711, March 2004.

4 F. Andreasen, M. Baugher, and D. Wing. Session Description Protocol Security Descriptions for Media Streams, IETF draft draft-ietf-mmusic-sdescriptions-12.txt, 2005.

Reproduced from the book Securing VoIP Networks: Threats, Vulnerabilities and Countermeasures Copyright [2007], Addison Wesley Professional. Reproduced by permission of Pearson Education, Inc., 800 East 96th Street, Indianapolis, IN 46240. Written permission from Pearson Education, Inc. is required for all other users.

Read more on Voice networking and VoIP