SIP school: A to Z on SIP

Whether you're an end user or an IP network engineer, this primer will help you clear a strategic path to SIP. You'll uncover some of the relevant services and solutions that SIP can provide for the enterprise. Additionally, you'll learn what IT staffs will find similar, and what they will find different, about SIP as compared to other IP-based protocols. Since SIP will impact many job functions within an IT organization, from the telephony team to the security team, we've made it easy to find out what you need to know about SIP as it relates to your responsibilities.

If you've found yourself wondering what SIP is and how it works, but don't want to read the latest book on SIP or surf the Web ad nauseam to get a handle on it, we have just the guide for you. Whether you're an IT executive, an IP telephony architect, a unified messaging specialist or an end user, SIP School: A to Z on SIP will help you understand how SIP will impact your job and the way you approach it. In this complete guide to SIP, you'll learn how this signaling protocol's modularity and openness simplify communications, improve productivity and set the stage for multimedia networks. School is in session!

   SIP architecture overview
   What end users need to know about SIP
   What IT executives need to know about SIP
   What IP telephony architects need to know
   What unified messaging specialists should know
   What IP network engineers should know
   What everyone should know about SIP and security
   What developers and visionaries can do with SIP

Introduction

Service providers, startups, and even the FCC have made VoIP a household term recently. Behind the scenes, SIP is creating as much buzz in the technical community. The Session Initiation Protocol is expected to deliver services to date unthinkable in the multimedia communications space. "SIP is without doubt the leading protocol for VoIP, presence, instant messaging (IM) and conferencing collaboration," said Henry Sinnreich, an industry visionary and accomplished SIP expert. Third-generation wireless networks are being built on it, major IT software vendors have endorsed it and SIP is a part of most RFPs from enterprises and carriers.

Since its inception in the mid-90s, SIP has had much of its early success in carrier networks. Carriers have used VoIP and SIP to efficiently set up long-distance calls over their IP backbones. High-tech consumers have embraced cheap phone service over cable and DSL, so startups have leveraged SIP as a low-cost way to enter the business. Some cellular carriers use SIP to connect push-to-talk cell phones for reasonable per-month fees. Since the FCC recently ruled that individual states cannot regulate voice on the public Internet, VoIP and SIP's momentum will likely continue to grow in the consumer and cellular markets.

Less focus has been paid to what SIP offers an enterprise. Promising much more than commoditized phones and cheap long-distance service, SIP will link disparate communication systems and applications that to date were islands unto themselves.

SIP architecture overview

Enterprise environments are typically made up of multiple SIP clients, called User Agents (UAs), and one or more SIP servers. Depending on the implementation, the server may handle user registration, proxy, location, redirection, and/or presence services. Oftentimes, multiple services are implemented on a single piece of hardware. Additionally, the SIP server can serve as a proxy to external SIP servers, like those in carrier networks that enable hosted services or VoIP trunk connectivity.

User agents are software entities that send and receive SIP messages on behalf of the client. UAs may be embedded in phones, video devices, IM clients and other devices. Figure 1 illustrates the connectivity between main components of SIP's architecture.

SIP uses simple text messages between UAs and their local proxy servers in a format derived from HTTP and SMTP. Proxy servers forward invitations across proxies if necessary, and eventually, a session and media stream are set up directly between UAs. After the setup of the session, the proxy is out of the loop unless called upon by a UA or purposely configured as stateful (for call accounting, for instance). SIP is thoroughly defined in the IETF's RFC 3261.
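To make the "simple text messages" point concrete, here is a minimal sketch that builds and parses a bare-bones INVITE. The addresses, tag, branch and Call-ID are invented example values, and the helper names are hypothetical. A real user agent would add more headers and an SDP body.

```python
# Minimal sketch: build and parse a SIP request as plain text.
# All addresses, tags and the Call-ID below are made-up example values.

def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Return a bare-bones INVITE; a real UA adds more headers and a body."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    # As in HTTP, header lines end in CRLF and a blank line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request(raw: str):
    """Split a SIP request into (method, request_uri, headers)."""
    head, _, _body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    method, request_uri, _version = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, headers


message = build_invite("alice@example.com", "bob@example.org", "a84b4c76e66710")
method, uri, headers = parse_request(message)
print(method, uri, headers["cseq"])
```

Because the message is ordinary text, the same few lines of string handling that work for HTTP headers are enough to read it.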

A valuable property of SIP is its modularity and openness to future development. SIP is a part of the IETF toolkit and is extensible, allowing additional or updated modules to be added without impacting the core protocol. As an example of SIP's extensibility, a standard instant messaging protocol, SIMPLE (Session Initiation Protocol (SIP) for Instant Messaging and Presence Leveraging Extensions) was created in 2001. SIMPLE provides an architecture for the implementation of a traditional buddy list-based instant messaging and presence application with SIP.

SIP's original draft was published in February 1996, and SIP is now considered a very mature protocol, yet development continues. In the SIP working group, there are fewer than seven items that are not in the very last stages of standardization, according to Henning Schulzrinne, Professor of Computer Science and Electrical Engineering at Columbia University. "SIMPLE has a longer list," added Schulzrinne, an active SIP leader since its inception.

What end users need to know about SIP

Most users won't notice that SIP is working behind the scenes to unify their multiple media devices, but at some point, they will happily realize that something is starting to simplify communications like never before.

SIP is NOT a VoIP protocol

SIP specifies how to set up a session and not what will be in the session itself. As a result, many types of sessions (e.g., voice, video, IM, games) can be set up using the same core protocol. SIP is therefore not a VoIP protocol, but a session-setup protocol that is used for much more.

The Session Description Protocol (SDP) is used to define what kind of session is being started and what parameters need to be negotiated, such as which codecs will be used. SIP servers, therefore, can be a single point for managing video, voice and IM services.
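As a rough illustration of what a session description carries, the sketch below reads the media lines and codec names out of a small SDP body. The SDP text is an invented example, and `parse_media` is a hypothetical helper, not part of any SIP library.

```python
# Sketch: pull the offered media types and codecs out of an SDP body.
# The SDP text is an invented example.

SDP_OFFER = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 client.example.com",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "m=video 51372 RTP/AVP 31",
    "a=rtpmap:31 H261/90000",
])


def parse_media(sdp: str):
    """Return one dict per m= line with its port and codec names."""
    streams = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            media, port, _proto, *payload_types = line[2:].split()
            streams.append({"media": media, "port": int(port),
                            "payload_types": payload_types, "codecs": []})
        elif line.startswith("a=rtpmap:") and streams:
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            if payload_type in streams[-1]["payload_types"]:
                # Keep only the codec name, dropping the clock rate.
                streams[-1]["codecs"].append(encoding.split("/")[0])
    return streams


for stream in parse_media(SDP_OFFER):
    print(stream["media"], stream["port"], stream["codecs"])
```

The signaling stays the same whether the body offers audio, video or both, which is why one SIP server can front several kinds of service.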

While SIP doesn't define the specific media gateways needed to source/sink communication streams (e.g., video bridges and TDM-to-IP voice codecs), gateways will still be needed until SIP and ubiquitous IP networks replace traditional PSTN circuitry.

"SIP is going to allow communications between devices, applications and users that were not previously possible or very expensive to do," said Bob Hafner, Gartner Group's chief of research, communications sector. "The ability to establish video sessions as easy as phoning; grabbing v-mail from e-mail queue; talking or listening to your business applications, these are all possible."

Even if the applications are desirable, today's users have concerns with privacy and want to be able to control interruptions. SIP provides options to advertise presence and preferences, similar to a buddy list. The difference is that instead of being "idle" or "out-to-lunch" based on only the status of an IM client, SIP is multi-modal. The presentity can now be a desktop or a mobile phone, or another PC application. With a personal profile manager, users will be able to control presence settings for all their devices from a Web portal. They can set preferences to allow/deny others to see their presence based on time of day, the device they're currently using, and other factors. Such control will help to address big-brother type privacy concerns.
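The allow/deny preferences described above could be modeled along these lines. This is only a sketch: the rule format, the `RULES` list and `may_see_presence` are hypothetical and do not come from any SIP or SIMPLE specification.

```python
from datetime import time

# Hypothetical rule format: which watcher may see presence, and during
# which hours. "*" stands for any watcher.
RULES = [
    {"watcher": "sip:boss@example.com", "start": time(0, 0), "end": time(23, 59)},
    {"watcher": "*", "start": time(9, 0), "end": time(17, 0)},
]


def may_see_presence(watcher: str, now: time, rules=RULES) -> bool:
    """Return True if any rule lets this watcher see presence at this time."""
    for rule in rules:
        applies = rule["watcher"] in (watcher, "*")
        if applies and rule["start"] <= now <= rule["end"]:
            return True
    return False


print(may_see_presence("sip:peer@example.com", time(10, 30)))
print(may_see_presence("sip:peer@example.com", time(20, 0)))
print(may_see_presence("sip:boss@example.com", time(20, 0)))
```

A real profile manager would likely add per-device rules as well, but the decision stays the same shape: match the watcher, then check the conditions.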

SIP will improve the productivity of the business person by enabling presence across multiple communications devices. Today, most users view the presence status of a buddy as it pertains to their desktop status only, resulting in inefficiencies in the way they communicate. For instance, they waste time playing voice/email tag and by instant messaging someone just to ask, "Can u talk?"

When SIP/SIMPLE IM is integrated with an IP-PBX, IM presence is combined with the on/off-hook status of the buddy's phone, all in the same buddy list. So a caller can see if their buddy is on the phone or not, eliminating the need for such a request. Combining presence from multiple SIP devices will inform the caller whether the user is present or not, but the caller will not need to know which number to call to reach their buddy. They simply launch a message to the buddy's address, and the SIP server will send the invitation to the correct device at the correct time, as illustrated in Figure 2 below.

SIP provides a single contact alias for all devices, called a Uniform Resource Identifier (URI). SIP URIs resemble email addresses, with a user part and a domain part. "The biggest early advantage will be greater use of one 'number' reach me applications that hide all the contacts for a user," said David Chavez, chief architect of Avaya's infrastructure products. A single number means contact lists with fewer constantly changing cell or email addresses. As an example, in Figure 2, the caller need not know the IP address or phone number of each individual device. Each device will register with the server, which in turn provides the correlation to the URI. Thus, callers simply send an invitation to the URI, and the user receives the invitation on whatever device is active.
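The registration idea can be sketched as a small lookup table that maps one public URI to several device contacts. `LocationService` and its methods are hypothetical names, and the URIs and IP addresses are invented. Real registrars also track expiry times and priorities, which are left out here.

```python
# Sketch of the registrar idea: one public URI, several registered devices.
from dataclasses import dataclass, field


@dataclass
class LocationService:
    # Maps the public URI to {device contact: is_active}.
    bindings: dict = field(default_factory=dict)

    def register(self, uri: str, contact: str, active: bool = False) -> None:
        self.bindings.setdefault(uri, {})[contact] = active

    def set_active(self, uri: str, contact: str) -> None:
        # Mark one device active and every other device for this URI idle.
        for known in self.bindings[uri]:
            self.bindings[uri][known] = (known == contact)

    def route(self, uri: str):
        """Return the contact an invitation should go to, or None."""
        for contact, active in self.bindings.get(uri, {}).items():
            if active:
                return contact
        return None


location = LocationService()
location.register("sip:pat@example.com", "sip:pat@192.0.2.21:5060")    # desk phone
location.register("sip:pat@example.com", "sip:pat@198.51.100.7:5060")  # mobile
location.set_active("sip:pat@example.com", "sip:pat@198.51.100.7:5060")
print(location.route("sip:pat@example.com"))
```

The caller only ever uses the public URI; the choice of device is made on the server side.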

Another vision Sinnreich and others have for SIP is to embed presence into business applications. Instead of just people and IM clients in a buddy list, business applications can be buddies in the list. Then, presence can be used for applications to alert users. For instance, notifications about stock updates, inventory issues, or money wire transfers can be sent to users based on their presence and preference (e.g., as text in an email or IM, or as a live call).

SIP features:
As of this writing, SIP has nineteen features that are interoperable across vendor platforms, according to the IETF's SIPPING Work Group. While for the POTS consumer and basic office users, features like hold/transfer/conference are adequate, enterprises with more advanced needs may not yet be satisfied with SIP. Some vendors add extensions to the protocol (as was commonly done with H.323 and its variants) to provide additional enterprise features. There are different approaches to doing so, and care should be taken to ensure the extensions don't go so far as to render the core SIPPING features inoperable across vendor platforms.

One approach to providing additional features to standards-based SIP systems is to tie a SIP proxy server to a "feature server." This allows a SIP server to extend some of the H.323/TDM features to SIP endpoints. This is done via a back-to-back user agent (B2BUA) embedded within the feature server. A B2BUA is a logical entity that can receive and process INVITE messages as a SIP user agent server (UAS) and can generate requests as a SIP user agent client (UAC). This approach allows standard SIP endpoints to connect to the SIP server for native SIP functions (like presence), but get advanced features (like whisper page) extended from the feature server. Now, standard SIP endpoints can use an expanded feature set from the H.323 server, as opposed to only the base SIP feature set. For users, this means more of the features that they're accustomed to, on phones that are cost-effective for IT to purchase. But read on, as SIP offers so much more than cheap IP phones.

What IT executives need to know about SIP

SIP allows for components of the multimedia infrastructure to be strategically outsourced. With TDM environments, it was basically an all-or-nothing decision when it came to outsourcing telephony – enterprises either owned or leased their PBX/voicemail OR they contracted for Centrex. The reason was simple -- voicemail and application servers were typically connected with a hard wire to a PBX, making proximity a prerequisite. But since IP brings "death to distance," pieces of the infrastructure become more geographically flexible.

SIP will likely enable even more options for hosted services than H.323-based services did. Subtle differences between H.323 implementations and security issues made it difficult to locate a component offsite and connect it via IP to an onsite PBX.

With SIP, interoperability between systems promises to be much simpler, provided they're all following the RFCs. This will allow a service provider to host the "commodities" – the PBX and voicemail system – while the enterprise might host "profit center" components, like contact center intelligence and CTI servers. Carriers and hosted providers will offer value-add services like presence, IM, and collaboration tools, which may not fit the traditional sweet spot of an enterprise staff's core competencies.

Outsourcing options are changing, with communications vendors and small startups beginning to join traditional ASPs and ISPs.

The challenge for all hosted providers will be:

  • Meeting service levels for sensitive real-time applications, especially over networks that they don't manage
  • Meeting the needs of customers spread across broad geographical distances
  • Security across the enterprise/service provider border

The technical challenges will be countered by the benefits of:

  • Cost savings of a single IP connection to a service provider instead of both IP and PSTN connections
  • Mitigating the risk of an enterprise implementing new technology without past experience
  • Enterprises using fresh technology without the need for recurring capital investment
  • Bundled billing

Outsourcing is expected to be targeted mainly at small and midsized businesses until the challenges have been met and large companies begin to buy in. When asked about the benefits of SIP communications to enterprises, Sinnreich said: "The distinction has to be made between internal communications and external communications." Internally, moving to a SIP PBX from another PBX "is a cost item," or a non-starter. But the real power of SIP will be in linking internal communications to external applications: "The Internet using SIP. This is the critical capability for any business to communicate with customers and partners worldwide," he said.

What IP telephony architects need to know

Because of presence and multi-modal access, SIP can fundamentally change the way calls are handled. By extension, SIP will alter the architecture and scalability planning of telephony systems. Today's PBXs are often designed for maximum busy-hour call completions, and are designed to handle situations where a caller blindly dials and hopes the callee is there. The PBX in turn blindly switches the incoming call to ring the callee's phone, resulting in many ring-no-answer situations before the call is sent to voicemail.

"The integration of presence will optimize communication sessions," said Avaya's Chavez. "The concept of RoNA (ring no answer) will begin to disappear." Since callers know the presence status of their buddies, they will no longer blindly attempt calls without completion in SIP environments. As in Figure 2, since the SIP registrar knows on which device the user is currently active, the session will be set up with the proper device immediately.

Today, significant system resources are allocated to account for worst-case situations, including ring-no-answer situations. SIP call servers will require fewer resources, since ring-no-answer rates will decrease. "Given that TDM networks have previously been engineered assuming a large portion of call attempts would end this way, eventually the engineering of all communication networks will have to adopt to different calling patterns than occurred historically," said Chavez.

As a result, SIP servers supporting general telephony will become more scalable than TDM or H.323 servers. But since SIP can do so much more than voice, "it remains to be seen if it can scale better when a large complex feature set is involved," said Chavez. The challenge to vendors is to design servers that harness the processing power freed up by fewer ring-no-answer calls to meet the additional demands of maintaining integrated presence, IM and other stateful services.

In some ways, SIP and IP telephony in general promise to make the moves/adds/changes (MAC) functions of the telephony administrator's job simpler. Since open APIs and IP applications will allow users to program their own settings, call handling, preferences, etc., the administrator won't need to be involved in each request for a change. What will still be critical is the upfront station (user needs) review process and creating rules of who can change what within the systems. A key component of this transition will be the integration of the telephony system with corporate databases (via LDAP, etc.), which will help manage the ever-changing user population and allow pushing policy to multiple systems from one source, like Active Directory. Once the rule sets and policy are created, telephony administrators can step out of the MAC loop.

Is SIP a multi-vendor panacea?

SIP provides enterprises with more choices for user devices. "SIP offers a standards-based approach," said Gartner's Hafner. "This will provide end users with a far larger library of options, from a variety of competitors." SIP is more open to multi-vendor interoperability than H.323 or TDM, and although it is much younger, the SIPPING group has nineteen vendor-interoperable features, more than the senior H.323.

SIP allows enterprises to select the right device for the unique needs of individual users. For instance, high-tech executives can be provided a top-of-the-line mobile PDA phone from one vendor, while another vendor's high-quality video phone may be best for a field technician to collaborate with product developers about client product defects. More generally, SIP opens up some exciting telephony services that IT developers can begin to customize for their enterprise.

"SIP allows for systems to be interconnected more creatively than previous technologies allowed," said Chavez. "This could allow for different high availability architectures, or for greater seamless feature integration. The whole point is that the flexibility opens the imagination and innovation."

While SIP allows basic functionality between some vendors' proxies and other vendors' phones, for instance, it is not yet the silver bullet for all interoperability challenges. "Practically all VoIP vendors claim to support it, though their product maturity and standards compliance varies widely," said MCI's Sinnreich.

There is a greater likelihood of interoperability among proxies, phones, and session border controllers, and a lower probability among firewalls, IVRs, conferencing bridges and gateways. "Basic call setup to such devices is likely to work, but more complicated scenarios, such as third-party call control, or transfer (REFER) are less universally deployed," said Schulzrinne. "More difficult still are media control and conferencing, where standards are, at best, still evolving and where there are multiple solutions."

In conferencing systems and IVRs, however, the application-level logic is much more complex and proprietary than simply setting up a session. "This is generally not a SIP issue, but more the need for supporting protocols, such as those being developed within the XCON working group for centralized conferencing," concluded Schulzrinne.

SIP in contact centers:
SIP can improve customer relations with the enterprise in the contact center. There are two types of communications where IM/SIP can benefit both parties: internal communications and external (customer-to-enterprise) communications.

Internally, enabling the contact center staff with presence will allow agents to see the onhook/offhook and IM status of their peers, supervisors, and subject matter experts. This should lower the time to resolve the call and improve the likelihood of doing so on the first call. As an example:

  1. An irate customer greets a tier one agent by immediately asking to speak with a supervisor.
  2. Instead of raising their hand and wondering when the supervisor will get off their current call, the agent glances at their buddy list and sees that the supervisor is on the phone but available to begin an IM session.
  3. After sharing basic information about the issue with their supervisor via IM, the agent sees that the supervisor's presence has changed to on-hook, so they click-to-call the supervisor and begin the voice conversation.
  4. The caller is transferred to the supervisor, who already knows the issue and can resolve it more quickly.

Multi-tiered help desks, technical support lines, and sales contact centers can easily leverage this technology to enhance the bottom line of the enterprise. Figure 3 illustrates a similar case, in which a manufacturer's rep starts an IM with a technical expert before adding them to a customer call.

Step 1: Customer calls sales representative.
Step 2: Rep needs an answer from a technical expert.
Step 3: Rep uses presence to "peek over the cubicle" to see if an expert is available.
Step 4: Rep sees that the expert is present but is busy on the phone.
Step 5: Rep IMs the expert and begins to get answers.
Step 6: Rep notices the expert has ended the conversation, and clicks-to-conference the expert to join the call with the client.
Step 7: Expert immediately answers the question and continues to IM with the rep in the background, suggesting an upsell.

Contact center technologists are considering how to build this powerful functionality into their existing contact center business processes. Tools like personal profile managers can allow agents to set preferences on who can view their presence, so that individuals aren't overwhelmed by too many IM sessions. Presence and preference can allow enterprises to leverage the expertise of subject matter experts that aren't part of the contact center routing functionality itself, without over-burdening their regular work.

SIP for carrier connectivity:
To date, enterprises that have implemented "VoIP trunking" have done so between their own VoIP systems over their own WANs. This has enabled toll bypass and a flattened enterprise dial plan, but a PSTN trunk was still needed at each site for connectivity to the outside world. With SIP, enterprises can realize the vision of a single pipe to the carrier cloud for all of their voice/data/video traffic.

Carriers are now offering voice trunking services to enterprises over an IP connection. This single pipe to the cloud will carry Internet traffic as it does today, but will also carry voice calls that are destined for the PSTN.

To enable SIP trunks, enterprises create a logical connection over IP between their own SIP proxy and their service provider's SIP proxy. Calls are made much as they are today, with an outside access code preceding the phone number. The dialed digits are sent within a SIP message to the carrier's SIP proxy, where the carrier reads the digits and sends the setup message to the appropriate PSTN gateway. The SS7 network then takes over, ringing the called party's phone over their land line. When the call is answered, the voice stream is carried over IP from the enterprise IP phone (or gateway), over the IP connection to the carrier, to the PSTN gateway.
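The first step of that flow, turning dialed digits into a request aimed at the carrier's proxy, might look something like the sketch below. The access code "9", the carrier host name and the function name are assumptions made for illustration; the actual dial plan and URI format depend on the carrier and the IP-PBX.

```python
# Sketch: strip the outside access code and address the call to the
# carrier's SIP proxy. Host name and access code are invented examples.

def trunk_request_uri(dialed: str, access_code: str, carrier_proxy: str):
    """Return a Request-URI for the SIP trunk, or None for internal calls."""
    if not dialed.startswith(access_code):
        return None  # no access code, so the call stays inside the enterprise
    number = dialed[len(access_code):]
    # user=phone marks the user part of the URI as a telephone number.
    return f"sip:{number}@{carrier_proxy};user=phone"


print(trunk_request_uri("912125550123", "9", "sip.carrier.example"))
print(trunk_request_uri("4017", "9", "sip.carrier.example"))
```

From there, the carrier's proxy is responsible for choosing the PSTN gateway; the enterprise side only needs to know where the trunk is.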

SIP trunks to carriers offer both hard-dollar savings and productivity improvements, including:

  • Elimination of the need for and cost of separate PSTN and data circuits at the enterprise
  • Reduced cost of communications between an enterprise and its partners and suppliers
  • Support for multiple forms of real-time communication including voice, video, and instant messaging, along with the benefits of user-based presence
  • Reduced toll charges from SIP origination/termination services to the PSTN for external long-distance and local access calls
  • Reduced hardware costs

While carriers and vendors may say they support SIP trunks, successful interoperability may vary. There are so many implementation options for the carrier and the telephony vendor to choose from that the enterprise needs to choose partners carefully. In fact, a SIP Forum Workgroup called the IP PBX/Service Provider Interconnect Task Group was formed to get agreement on best practices for implementation.

SIP and emergency services:
IP phones can usually be picked up and moved with no intervention from an administrator. While IP's mobility reduces move/add/change work, it makes meeting e911 requirements a challenge.

SIP introduces nothing better or worse than H.323 to address the e911 problem. "There is no major difference between the two protocols," said Schulzrinne, whose work in the SIPPING work group includes a draft defining a Universal Emergency Address. "The problem is easy for enterprise deployments where users are not also using IP phones at home and where each enterprise has a single PSTN gateway on premise. Things become more difficult if you have mobile or nomadic users," he continued. "Identifying the location, via switch ports in the enterprise or via LLDP-MED in the mid-term, is one part of the puzzle. If you have mobile users, you still need to provide the location to the 911 routing infrastructure."

In today's IPT deployments, methods like LLDP allow the user's location to be coordinated with the LAN switch port or wall jack to which their device is connected. Enterprise users can pick up and move their phones and be located by the IP-PBX wherever they plug in based on the MAC address of the LAN switch port they connect to.

"In the next year, the so-called I2 solution will allow interfacing with the legacy PSAP infrastructure, but it will be primarily of interest to large consumer VoIP providers," Schulzrinne said. "The longer-term solution, called I3, will be more suitable for enterprises, but will require changes in the public safety infrastructure."

What UM specialists should know

Just because SIP supports presence doesn't mean users will want to be contacted anywhere at any time, so messaging servers will still have a home in the enterprise.

Messaging servers will not likely be presence managers themselves. Instead, they may be enabled with SIP UAs that communicate with an enterprise presence server. This will change the means by which a subscriber gets notified about their waiting messages. For instance, the presence server could inform the voicemail server that the called party is available on their IM client, but not on the desktop phone that was being called. As the caller leaves a voicemail message, instead of simply dropping the message in an inbox, an IM may be sent telling the called party that a message has arrived. Further down the line, a speech-to-text tool may convert the voicemail message to a textual instant message. Coupling presence servers with application servers will add much more decision-making intelligence about which device the message should be delivered to, improving the speed at which users respond and communicate.

SIP will likely improve the efficiency of voicemail systems. Typical voicemail systems are sized based on ports, with a port being used for each active user sending or retrieving messages. Since SIP provides presence information about the called party, it's likely that fewer people will call users that they know are not present at their telephone. For instance, if the caller sees that the called party is not present at their telephone, but still requires real-time and immediate communication, they'll try the mode which they know the called party is using, like IM or their cell phone. Since fewer calls will be "blindly" going to voicemail, it's likely that smaller messaging systems can support the same number of users in the future.
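One way to put rough numbers on the port-sizing argument is the Erlang B formula, a standard telephony sizing model that the article itself does not name. The sketch below uses it to compare port counts before and after a drop in blind calls. The traffic figures and the 1% blocking target are invented for illustration.

```python
# Sketch: size voicemail ports with the Erlang B formula (a swapped-in,
# standard traffic model). All traffic figures below are invented.

def erlang_b(traffic_erlangs: float, ports: int) -> float:
    """Blocking probability for the offered traffic on a number of ports."""
    blocking = 1.0  # with zero ports, every call is blocked
    for n in range(1, ports + 1):
        # Standard recurrence: B(n) = A*B(n-1) / (n + A*B(n-1))
        blocking = (traffic_erlangs * blocking) / (n + traffic_erlangs * blocking)
    return blocking


def ports_needed(traffic_erlangs: float, max_blocking: float = 0.01) -> int:
    """Smallest port count that keeps blocking at or below the target."""
    ports = 1
    while erlang_b(traffic_erlangs, ports) > max_blocking:
        ports += 1
    return ports


busy_hour_today = 10.0          # erlangs offered to voicemail (assumed)
busy_hour_with_presence = 7.0   # assumes 30% fewer blind calls (assumed)
print(ports_needed(busy_hour_today), ports_needed(busy_hour_with_presence))
```

Whatever the real figures turn out to be, the relationship is the useful part: less offered traffic means fewer ports for the same grade of service.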

SIP will change the functionality of today's find-me/follow-me features. Such features allow callers to opt out of leaving a message, and instead ask the system to find the callee at one or more alternate phone numbers (e.g., cell, home number) programmed by the user. Since this is a somewhat "blind" transfer, it's only effective when the user is available and interested in taking a call on the alternate device. If the callee's cell phone were off for the weekend, it would be inefficient for the messaging system to offer that option to the caller. When SIP is prevalent in multiple devices that all communicate with a presence server, the messaging system would only offer such a service to the caller if it will truly get them connected to their party.
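A presence-aware find-me/follow-me menu could be as simple as filtering the user's alternates by what the presence server reports. The status strings, the contact URIs and the function name below are hypothetical.

```python
# Sketch: only offer alternates that the presence server says are reachable.

def follow_me_menu(alternates, presence):
    """Return the labels worth offering to the caller.

    alternates: list of (label, contact) pairs programmed by the user.
    presence:   dict mapping contact -> status string (assumed format).
    """
    return [label for label, contact in alternates
            if presence.get(contact) == "available"]


alternates = [("cell", "sip:pat-cell@example.com"),
              ("home", "sip:pat-home@example.com")]
presence = {"sip:pat-cell@example.com": "offline",
            "sip:pat-home@example.com": "available"}
print(follow_me_menu(alternates, presence))
```

If the list comes back empty, the messaging system can skip the follow-me prompt entirely and go straight to taking a message.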

What IP network engineers should know

SIP is similar to, but simpler than, H.323, and as such, network engineers need not reinvent the wheel to support the protocol.

Quality of Service (QoS) policies, always critical for multimedia, needn't change much for SIP. SIP uses the same protocols (RTP/UDP/IP) to deliver the media stream as does H.323. Therefore, classifying, policing, and prioritizing voice and video calls set up by SIP can be done in the same way as for H.323. The same concepts of delivering signaling packets without loss and media packets without delay or jitter apply in both SIP and H.323 systems.

Since SIP can set up different multimedia streams using the same signaling protocol, the various media need to be classified and prioritized differently. To properly classify the QoS requirements for multimedia applications, the ITU created Recommendation G.1010. Rather than simply having one class for voice/video, one for signaling, and one for data, G.1010 specifies eight different categories based on user expectations for QoS. These categories are then mapped to router and switch queues according to their requirements for delay, packet loss, and jitter (delay variation).

In short, as next-generation networks begin carrying more media services across the enterprise, QoS classifications are becoming much more granular. For example, in the ITU model, audio communication isn't a single category, but instead, has separate sub-classes for conversational voice, voice messaging, and streaming audio. Table 1 illustrates the dramatic difference in the performance requirements of these voice services.

As G.1010 implies, it would not be optimal to handle streaming audio in the same router queue as conversational voice. The "hard" real-time requirements for voice (and video) are most stringent for delay, but streaming may be best served by another queue, since a bit more delay is worth assuring accurate delivery without loss or bit errors. Applying this to today's queuing terms, real-time voice would remain in the expedited forwarding (EF) queue while streaming audio would go into an assured forwarding (AF) queue. While its name implies instantaneous delivery, instant messaging is actually a "soft" real-time application, and is acceptable to users even when there are several seconds of delay.
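A classification policy along these lines can be written down as a small table. In the sketch below, the DSCP code points are the standard DiffServ values, but the assignment of each media type to a class is an assumption for illustration, not something G.1010 or this article prescribes.

```python
# Sketch: map media types to DiffServ classes and compute the IP TOS byte.
# DSCP values are the standard code points; the media-to-class mapping
# is an illustrative assumption.

DSCP = {"EF": 46, "AF41": 34, "AF31": 26, "AF21": 18, "BE": 0}

CLASS_FOR_MEDIA = {
    "conversational_voice": "EF",    # hard real time, delay sensitive
    "video_conferencing": "AF41",
    "streaming_audio": "AF31",       # tolerates delay, not loss
    "instant_messaging": "AF21",     # soft real time
    "bulk_data": "BE",
}


def tos_byte(media: str) -> int:
    """Return the TOS/Traffic Class byte; DSCP occupies the top six bits."""
    return DSCP[CLASS_FOR_MEDIA[media]] << 2


for media in CLASS_FOR_MEDIA:
    print(f"{media:22s} {CLASS_FOR_MEDIA[media]:5s} 0x{tos_byte(media):02X}")
```

An endpoint or gateway would write this byte into outgoing packets, and routers along the path would then place each stream in the matching queue.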

More types of media mean a more complex QoS policy and, possibly, some conservative network engineering. If high-priority queues share different types of multimedia traffic, then the bandwidth should be carefully provisioned so that there is no contention in that queue. An example of where this could occur is when conversational voice and video conferencing share a single high-priority queue sent into typical MPLS clouds. In addition, the extra processing power needed by a network device to inspect, police, and prioritize incoming multimedia traffic will tax non-optimized routers and switches. Network over-engineering or congestion control algorithms can reduce such risks.

Protocol analyzers, probes, and other tools can analyze SIP messages and their clear-text format with ease, so troubleshooting SIP's text-based messages with sniffers should be simpler than troubleshooting H.323's binary messages. If protecting the payload is desired, SIP messages can be encrypted using the Transport Layer Security (TLS) protocol. Since TLS encrypts the payload, and not the IP header, network engineers need no new tools to troubleshoot the network layer when SIP is secured.

What security professionals should know

Current SIP standards provide more than one way to protect signaling, including Transport Layer Security (TLS). SIP is just one of the many application layer protocols that can be encrypted using TLS, which is based on the Secure Sockets Layer specification to secure client-server connections. TLS is an interoperable and extensible protocol (RFC 2246) for servers and clients to authenticate the identity of one another, preventing masquerading attacks. TLS runs beneath the application layer but above the transport layer and is application layer independent.

Depending on the implementation, TLS may carry SIP between proxies, from UAs to proxies, and between proxies and other devices, like SIP-aware firewalls.

SIP natively runs over port 5060, while SIP-TLS runs over port 5061. As with H.323, this port is used from the server to the client, while the client to the server uses an algorithmically assigned port. IM media streams are carried over the same port.
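
The port convention and the mutual-authentication role of TLS can be sketched with Python's standard `ssl` module. The helper names are hypothetical, and the sketch only builds the client-side TLS settings; it does not open a connection or speak SIP.

```python
import ssl

# Well-known SIP ports: 5060 for plain SIP, 5061 for SIP over TLS.
DEFAULT_PORTS = {"udp": 5060, "tcp": 5060, "tls": 5061}


def default_sip_port(transport: str) -> int:
    """Return the well-known port for a SIP transport (hypothetical helper)."""
    return DEFAULT_PORTS[transport.lower()]


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings that verify the server's identity.

    create_default_context() turns on certificate validation and
    hostname checking, which is what blocks a masquerading server.
    """
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
```

A UA or proxy would wrap its TCP socket with this context before sending SIP over port 5061. For the server to authenticate the client as well, the client would additionally load its own certificate into the context.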

Note that TLS does not encrypt the media (i.e. voice/video) stream. Media can be encrypted by other means, such as Secure RTP (SRTP), which uses the Advanced Encryption Standard (AES) as its default cipher. Also, TLS does not ensure end-to-end encryption between UAs, because it protects each hop separately. For end-to-end signaling encryption, there exists a SIP standard for S/MIME key exchange.

A perceived flaw in the H.323 standard is its requirement to open a broad range of UDP ports for media streams. In its base implementation, SIP is no different -- RTP still transports media over a broad range of UDP ports. Fortunately, in both the SIP and H.323 cases, network devices can allow for appropriately sized pinholes to be poked in enterprise firewalls to allow media through on a per-session basis. These H.323 and SIP-aware firewalls provide NAT and recognize when sessions are starting, opening one inbound UDP port and one outbound port for a full-duplex media stream. When the media exchange completes, the ports are dynamically closed and the firewall once again sealed.
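
The per-session pinhole behavior can be modeled in a few lines. This is a toy model under stated assumptions: the class and method names are hypothetical, and a real firewall would learn the endpoints by inspecting the signaling rather than being told them.

```python
from dataclasses import dataclass, field


@dataclass
class PinholeFirewall:
    """Toy model of a SIP-aware firewall's dynamic media pinholes."""

    # call_id -> set of (ip, udp_port) endpoints currently allowed
    open_pinholes: dict = field(default_factory=dict)

    def on_session_start(self, call_id, inside, outside):
        """Open one pinhole per direction for a full-duplex media stream."""
        self.open_pinholes[call_id] = {inside, outside}

    def on_session_end(self, call_id):
        """Close the pinholes when the media exchange completes."""
        self.open_pinholes.pop(call_id, None)

    def allows(self, endpoint) -> bool:
        """True if the (ip, port) endpoint belongs to an active session."""
        return any(endpoint in pair for pair in self.open_pinholes.values())


fw = PinholeFirewall()
fw.on_session_start("call-1", ("10.0.0.5", 49170), ("203.0.113.9", 30000))
during_call = fw.allows(("203.0.113.9", 30000))
fw.on_session_end("call-1")
after_call = fw.allows(("203.0.113.9", 30000))
```

The point of the design is that the allowed set is empty whenever no call is up, so no standing UDP port range has to stay open.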

Securing the enterprise border:
Another DMZ component filling a security gap for VoIP is the Application Layer Gateway. ALGs are in the same category as SIP-aware firewalls and session border controllers. They have come about mainly since existing corporate data firewalls/NATs do not know how to process SIP signaling. There are many reasons why VoIP in general poses problems for firewalls and NATs:

  • Signaling and media are on separate ports
  • SIP embeds signaling and media IP addresses and ports within message headers and SDP bodies, and when packets destined for a UA hit the outside of a firewall, they're often dropped since that port isn't typically opened by the firewall.
  • Allowing media streams through a firewall typically meant leaving large port ranges open for media traffic.
  • Finally, standard firewalls may adjust NAT bindings indeterminately, which will cause intermittent issues.
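
The embedded-address problem in the list above is easiest to see in an SDP body. The sketch below rewrites the connection address and audio port the way a SIP-aware device might when it maps a private endpoint to a public one. The function name, addresses, and ports are illustrative assumptions, and a production device would handle more cases (other media types, the `o=` line, IPv6).

```python
# Illustrative SDP offer from a UA on a private network.
PRIVATE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 10.0.0.5\r\n"
    "s=-\r\n"
    "c=IN IP4 10.0.0.5\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)


def rewrite_sdp(sdp: str, public_ip: str, public_port: int) -> str:
    """Replace the private media address and audio port with public ones."""
    out = []
    for line in sdp.split("\r\n"):
        if line.startswith("c=IN IP4 "):
            line = "c=IN IP4 " + public_ip
        elif line.startswith("m=audio "):
            fields = line.split(" ")
            fields[1] = str(public_port)   # second field of m= is the port
            line = " ".join(fields)
        out.append(line)
    return "\r\n".join(out)


public_sdp = rewrite_sdp(PRIVATE_SDP, "203.0.113.9", 30000)
```

A plain NAT rewrites only the IP and UDP headers, so without this step the far end would try to send media to 10.0.0.5, an address it cannot reach.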

ALGs solve firewall/NAT traversal problems and can be deployed to replace or augment existing firewalls. They support TLS upstream to the Internet and downstream to enterprise SIP servers. They can detect intrusions, protect against denial-of-service attacks, and support SNMP alarming, logging/auditing, QoS tagging, and other required services. They enable outsourcing of strategic SIP components (e.g., voicemail), SIP trunks to VoIP service providers, multi-site connectivity over public WANs, and connections between enterprises.

"ALGs are an acceptable near-term enterprise solution for NAT/FW traversal," said Sinnreich. Yet the ALG industry is faced with a number of challenges, like how to handle VPNs and scalability. They frequently will need to be updated for changing SIP standards. As such, there is a current debate on the future of ALGs, with the future to be decided by enterprise demand.

Standards work is underway to solve the NAT issue without the use of gateways, since after all, SIP is designed to support universal peer-to-peer sessions. "The long-term solution is to have ICE/TURN in the UAs and STUN/TURN servers on the Internet, provided by ISPs or enterprises themselves," concluded Sinnreich.

Interactive Connectivity Establishment (ICE), Simple Traversal of UDP through NAT (STUN) and Traversal Using Relay NAT (TURN) together represent a key part of a standards-based solution. The basic address-discovery step, which ICE builds on, is performed by STUN: a UA behind a NAT sends a request to a public STUN server, asking it "what was the public IP address (from the outside of the NAT) from which my packet came?" The server responds by telling the UA "use this outside public address" and the UA inserts that public IP address into its signaling messages, instead of the private IP address. Figure 5 illustrates the basics of the system, but details of the ICE implementation are documented in the current "Best Practices for NAT Traversal" in the SIPPING work group's "draft-ietf-sipping-nat-scenarios."
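
The "what public address did my packet come from?" exchange can be sketched at the byte level. The example below builds a STUN binding request and decodes the address from a binding response. It assumes the message layout of the later STUN specification (RFC 5389, with its magic cookie and XOR-MAPPED-ADDRESS attribute) rather than whatever draft was current when this article was written, handles IPv4 only, and leaves out the UDP exchange itself.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None) -> bytes:
    """20-byte header: type, length (no attributes), cookie, 96-bit ID."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_xor_mapped_address(response: bytes):
    """Return (public_ip, public_port) from a binding response, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # IPv4
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return addr, port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None
```

The UA would send the request over the same UDP socket it intends to use for media, then place the returned address and port in its SDP in place of the private ones.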

Spim and other privacy concerns:
SIP faces a new challenge, IM spam (spim), or unsolicited commercial messages sent via instant messaging. Spim affects all IM systems, not just those using SIP/SIMPLE, so it is a consideration for any IM user. According to a study by the Pew Internet & American Life Project, of the 52 million people who use IM, nearly one-third have been "spimmed." Spim systems use software bots that randomly generate aliases or surf Internet chat rooms and web sites for IM handles, then send commercial messages to the aliases from bogus accounts. In some cases, active code within the IM causes that message to be forwarded on to handles in the recipient's buddy list, spreading the message much like an email virus.

"I don't think it's an issue for closed, enterprise-only networks," said Columbia's Schulzrinne. In networks where enterprise-class IM systems are linked to external IM systems via a gateway, "a good protection is to only accept SIP messages from the outside via TLS and to make sure that the originating proxy name corresponds to the caller/sender domain," he said. "Longer term, authenticated identity bodies might be useful, but they are mostly of interest in closed communities in the absence of a certificate infrastructure," continued Schulzrinne. SIP/IM proxies in the demilitarized zone of the corporate network often act as the first hop of the TLS session into the enterprise.

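The border check Schulzrinne describes, accepting outside messages only over TLS and only when the originating proxy's name matches the sender's domain, might be sketched as follows. The function names are hypothetical, and the proxy name is assumed to have already been taken from a validated TLS certificate.

```python
def sip_domain(uri: str) -> str:
    """Extract the host from a sip:/sips: URI such as sip:alice@example.org."""
    hostpart = uri.split(":", 1)[1].split("@")[-1]
    return hostpart.split(";")[0].split(":")[0].lower()


def accept_external_message(transport: str, peer_cert_name: str, from_uri: str) -> bool:
    """Accept only TLS traffic whose proxy name matches the sender domain.

    peer_cert_name is assumed to come from the TLS peer's verified
    certificate; a subdomain of the sender domain also counts as a match.
    """
    if transport.lower() != "tls":
        return False
    domain = sip_domain(from_uri)
    peer = peer_cert_name.lower()
    return peer == domain or peer.endswith("." + domain)
```

Under this policy a message claiming to be from sip:alice@example.org is accepted when it arrives over TLS from proxy.example.org, and rejected when it arrives over UDP or from a proxy in an unrelated domain.
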
For developers and visionaries

An exciting property of SIP is its modularity and openness to future development. SIP is a part of the IETF toolkit and is modular in nature, allowing additional or updated modules to be added without impacting the core protocol. This will allow quicker integration of new authentication or codec algorithms, for instance. Since SIP builds on well-understood protocols like HTTP, enterprise programmers needn't learn an all-new programming language. Instead, they can address the needs of the particular business to build new services, without worrying about learning a new, complex, binary protocol. SIP's openness is exemplified by the fact that "3G wireless networks use the IP Multimedia Subsystem (IMS) platform based on SIP," according to Sinnreich.

Combined with Web services and XML-based applications, SIP will enable presence within business applications. Desktop programs that contain references to business contacts will be able to show the presence of those contacts on the screen, within that application. There will be no need to toggle to an IM client to view the presence of the contact. For instance, an assembly line worker viewing the parts needed for an upcoming order in an inventory application notes that there is a shortage. He can immediately view the presence of the line manager and procurement manager of his own company, and possibly even that of the parts supplier, within the inventory management application itself.

Taking the example a step further, SIP will allow the line worker to click-to-conference with all contacts that are present, firing up a collaborative conference call to discuss the inventory problem. Open web services will even allow the inventory application itself to check the presence of all relevant contacts, and interact with a SIP-enabled audio bridge to proactively out-call to the parties, removing the need for a human to start the process.

Conclusion

SIP is gaining mindshare as the platform for multimedia networks of the future. Compared to H.323, SIP offers several advantages, such as:

  • User presence within multiple devices allowing informed communication decisions
  • User preferences allowing users to control who has access to them and when
  • Carrier-to-enterprise trunking to reduce costs and optimize networks
  • Easier creation of new services
  • Greater interoperability and choice of endpoints

While each of the above is at a different stage of development, the reality is that they all can be leveraged now. To turn SIP into a competitive advantage, build intellectual capital now with general VoIP or specific SIP trials. Schulzrinne advises: "I think the cheapest way to learn is to get an open-source proxy, some SIP phones and a few SIP soft clients. Even simpler, they can get a soft or hard client and subscribe to free services. These are basically SIP 'IP-Centrex' solutions."

Gartner's Hafner agreed: "Trials and testing are the only way," adding a recommendation about who should be involved in the trial. "Picking a base of power users - people that are eager to accept change and use new technologies and processes should be a key part of the target group."

SIP is certain to enable businesses to build customized applications that will provide them with competitive advantages. SIP will integrate devices and applications that have long been islands unto themselves, thus optimizing the way businesses communicate using voice, conferencing and IM.

About the author

Christian Stegh is the IP telephony practice leader for Avaya's North American region. His experiences include deploying and maintaining IP networks for an enterprise IT staff, designing converged networks, and integrating Avaya IP Telephony solutions into hundreds of client networks. His current responsibilities include providing Avaya's product developers with requirements, direction, and input from clients. He sits on Avaya's SIP core product development team, leads an internal team of SIP subject matter experts, and speaks regularly at IEEE and other industry events.

If there are specific areas of the article about which you'd like more information, he'd be glad to take your input for future articles on the topic. Contact Christian Stegh.
