An introduction to the SIP protocol, part 1

SIP is the primary protocol that's used by most VoIP and UC products. This tip will explain how the SIP protocol works.

The Session Initiation Protocol (SIP) is the primary protocol that's used by most VoIP and unified communications (UC) products, so I wanted to take the opportunity to introduce you to this protocol. In this series of three articles, I will explain how the SIP protocol works.

Before I begin

One of my goals in writing this series is to make the information that I am presenting both practical and easy to understand. That being the case, I am going to avoid a bit-level description of the protocol and focus instead on its functions, because bit-level descriptions are typically useful to only a small group of people. If you need more detail than I am providing here, you can consult RFC 3261, the specification that defines SIP.

SIP's five main functions

The SIP protocol's primary job is to control user sessions. As such, the SIP protocol contains five primary functions that allow it to perform various session-related tasks.

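The first of these functions is the user location function. As I'm sure you know, UC deployments often involve multiple networks, each containing multiple types of devices, and a single user may be reachable on several of them. As such, the SIP protocol has to be able to determine which end system (or systems) should be used to reach the user for a given session. Note that "location" here refers to where the user can be reached on the network, not to the user's geographic position.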
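To make the user location idea a little more concrete, here is a minimal sketch in Python. It is not a real SIP stack: the LocationService class, its method names, and the example addresses are all hypothetical, and I am assuming a simple in-memory table that maps a user's public address (what SIP calls an address of record) to the device addresses where that user can currently be reached.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LocationService:
    """Toy in-memory location table (hypothetical, for illustration only)."""

    # Maps a public address (address of record) to the device contact
    # addresses currently registered for it.
    bindings: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, aor: str, contact: str) -> None:
        """Record that `aor` can currently be reached at `contact`."""
        contacts = self.bindings.setdefault(aor, [])
        if contact not in contacts:
            contacts.append(contact)

    def lookup(self, aor: str) -> List[str]:
        """Return a copy of every contact known for `aor` (may be empty)."""
        return list(self.bindings.get(aor, []))


service = LocationService()
# The same user is reachable on a desk phone and on a softphone.
service.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060")
service.register("sip:alice@example.com", "sip:alice@198.51.100.7:5060")

print(service.lookup("sip:alice@example.com"))
# ['sip:alice@192.0.2.10:5060', 'sip:alice@198.51.100.7:5060']
```

A lookup for a user with no registered devices simply returns an empty list, which is roughly the situation in which a real system would report that the user cannot be found.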

The second function is user availability, which determines whether the person being called is willing to take part in the session. This function is best known for the way that it is used in providing presence information. End users can tell the system that they are available to talk or that they are busy and do not wish to be disturbed.

The third function is the user capabilities function. The basic idea behind this function is that different devices have different capabilities. For example, there are many things that a computer is capable of doing that a phone is not. The user capabilities function allows SIP to determine which media will be used and which parameters are associated with each media type. For example, will the user be communicating using voice, video or something else?
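The capabilities idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the negotiate function and the dictionaries below are hypothetical, and a real deployment would exchange this information as session descriptions rather than as Python dictionaries. The sketch keeps only the media types and codecs that both sides support, in the caller's order of preference.

```python
def negotiate(offered, supported):
    """Return the media types and codecs that both sides can handle.

    `offered` and `supported` map a media type (such as "audio") to a list
    of codec names. The result preserves the caller's preference order.
    """
    agreed = {}
    for media, codecs in offered.items():
        common = [codec for codec in codecs if codec in supported.get(media, [])]
        if common:  # drop media types with no codec in common
            agreed[media] = common
    return agreed


# A computer offers voice and video; a desk phone can only handle voice.
computer_offer = {"audio": ["opus", "PCMU", "PCMA"], "video": ["H264", "VP8"]}
desk_phone = {"audio": ["PCMU", "PCMA"]}

agreed = negotiate(computer_offer, desk_phone)
print(agreed)  # {'audio': ['PCMU', 'PCMA']}
```

In this example the video stream is dropped entirely because the phone supports no video codec, so the session would proceed as a voice-only call.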

The fourth function is the session setup. This is the function that is responsible for connecting a call. It establishes session parameters for both the caller and the recipient of the call.

The fifth of the primary SIP functions is the session management function. This is the function that allows users to end a call, transfer a call to someone else, or make modifications to the session parameters.

The protocol stack

Now that I have shown you the five basic functions associated with the SIP protocol, I want to take a step back and talk about how SIP fits in with the rest of the IP-based communications that are taking place on a network.

The most important thing you need to know about SIP is that it is designed only to create, manage and terminate sessions. SIP does not provide any services by itself; rather, it lays the groundwork for services to be provided by other protocols.

In Figure A, you can see that SIP is an application-layer protocol and is designed to work parallel to other multimedia protocols, such as the Real Time Streaming Protocol (RTSP) and the Real-time Transport Protocol (RTP). Of course, this diagram is just an example. There are numerous other protocols that can be used to create a full-blown multimedia experience. For example, the Media Gateway Control protocol (Megaco, also standardized as H.248) is commonly used to control the gateways that connect to the public switched telephone network (PSTN).

Figure A

The SIP protocol resides at the application layer of the OSI model.

In this diagram, the Session Description Protocol (SDP) provides SIP with a session description. This allows SIP to be aware of the type of session that needs to be established. SIP is then able to communicate through the IP network and use the session description to establish a session with the requested host.
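To give you a feel for what a session description looks like in practice, here is a small Python sketch that builds a one-stream audio description and places it in the body of a SIP INVITE request. Treat it as a simplified illustration rather than a working SIP client: the function names, addresses, tags, and identifiers are all made-up example values, and a real implementation would generate unique identifiers and handle many more headers.

```python
def build_sdp(user, host, audio_port):
    """Build a minimal session description with one audio stream."""
    lines = [
        "v=0",                                           # protocol version
        f"o={user} 2890844526 2890844526 IN IP4 {host}",  # origin (example IDs)
        "s=-",                                           # session name (unused)
        f"c=IN IP4 {host}",                              # where to send media
        "t=0 0",                                         # no fixed start/stop time
        f"m=audio {audio_port} RTP/AVP 0",               # audio over RTP, payload 0
        "a=rtpmap:0 PCMU/8000",                          # payload 0 is G.711 mu-law
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(user, domain, callee, host, sdp):
    """Wrap a session description in a simplified INVITE request."""
    body_length = len(sdp.encode("utf-8"))
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch=z9hG4bK776asdhds",  # example branch
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{user}@{domain}>;tag=1928301774",       # example tag
        f"Call-ID: a84b4c76e66710@{host}",                   # example call ID
        "CSeq: 314159 INVITE",
        f"Contact: <sip:{user}@{host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {body_length}",
    ]
    # A blank line separates the headers from the session description.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


sdp = build_sdp("alice", "192.0.2.10", 49170)
message = build_invite("alice", "example.com", "bob@example.com", "192.0.2.10", sdp)
print(message)
```

The point to notice is the division of labor: the SIP headers say who is calling whom, while the body describes what kind of media session is being proposed.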

I have also shown in this diagram how RTSP and RTP can be used alongside SIP. Neither of these protocols is an absolute requirement; they are just in the diagram for demonstration purposes. RTSP controls the delivery of streaming audio or video. RTP carries the real-time audio and video data itself and, through its companion RTP Control Protocol (RTCP), provides QoS feedback about the session. RTP does not reserve bandwidth on its own.


In this article, I have talked about the SIP protocol's five primary functions and about where SIP fits in the protocol stack. In the next article in this series, I will discuss the various verbs (request methods) associated with the SIP protocol, and what they are used for.
