Common SIP Terms Explained

Developers who are new to Voice Elements can be overwhelmed by the terminology they hear. At times it can seem like a new language.

This short guide explains terms that are commonly used when setting up SIP and troubleshooting SIP issues.


SIP

SIP stands for Session Initiation Protocol. It is a defined protocol for making and receiving calls using standard computer networks, instead of traditional telephone equipment. SIP maintains control of the call (e.g. when a call is received, when it is answered, etc.) and sets up the media, or audio, for the call.

HMP Elements can be considered a SIP stack, but it also has all of the features of a media stack, in that it manages both SIP and common VoIP media codecs such as G711 and G729.


RTP

RTP stands for Real-Time Transport Protocol. It is the protocol used for transmitting the audio in a call. RTP is very sensitive to timing: if packets arrive out of order or are dropped, you may notice an impact on the voice quality of a call.
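To make the timing point concrete, here is a minimal Python sketch that decodes the 12-byte fixed RTP header defined in RFC 3550. The sequence number and timestamp it extracts are the fields a receiver uses to spot dropped or reordered packets. The function name and the sample packet are illustrative assumptions, not part of HMP Elements.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,            # top two bits of the first byte
        "marker": bool(second >> 7),      # top bit of the second byte
        "payload_type": second & 0x7F,    # e.g. 0 = G.711 mu-law
        "sequence": seq,                  # increments by one per packet
        "timestamp": timestamp,           # in sampling-clock units
        "ssrc": ssrc,                     # identifies the stream source
    }

# Hypothetical packet: version 2, payload type 0, sequence 1000, 160 audio bytes.
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160, 0x1234ABCD) + b"\xff" * 160
header = parse_rtp_header(sample)
```

A gap between consecutive sequence numbers suggests a dropped packet, and a lower number arriving after a higher one suggests reordering.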


G711

G711 is the most commonly used codec for sending and receiving audio during a SIP session. It is an open standard and can therefore be implemented without additional licensing fees. The codec itself uses 64 kbps per port, but once packet overhead is included you should plan for about 80 kbps per port in real-world use. At that rate, 24 concurrent calls need about 1,920 kbps in each direction, so a 2 Mbps connection should be enough.

G711 is the default codec used with HMP Elements.
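The bandwidth figures can be checked with a short calculation. The sketch below assumes 20 ms packets and counts only IPv4, UDP and RTP headers (no Ethernet framing); the function names are illustrative.

```python
# Per-packet header overhead at the IP layer (layer-2 framing not counted).
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP

def g711_call_kbps(packet_ms: int = 20) -> float:
    """One-way bandwidth of a single G711 call, headers included."""
    payload_bytes = 64_000 / 8 * packet_ms / 1000  # 64 kbps of audio per packet
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000

def max_concurrent_calls(link_kbps: float, per_call_kbps: float) -> int:
    """How many calls fit on a link, ignoring all other traffic."""
    return int(link_kbps // per_call_kbps)

per_call = g711_call_kbps()                       # 80.0 kbps with 20 ms packets
needed_for_24 = 24 * per_call                     # 1920.0 kbps
capacity = max_concurrent_calls(2000, per_call)   # 25 calls on a 2 Mbps link
```

Under these assumptions a 2 Mbps link has room for 25 calls, which leaves very little headroom above 24 if the same link carries other traffic.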


G729

G729 is a commonly used codec for sending and receiving audio during a SIP session. It is not an open standard, and requires additional licensing fees. Its nominal bandwidth usage is around 8 kbps, but in real-world use, once packet overhead is included, a G729 session usually consumes 25-50% of the bandwidth of an equivalent G711 session.

G729 is supported by HMP Elements, but additional licensing fees do apply.
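The 25-50% range follows from fixed packet headers, which weigh more heavily on small G729 payloads. The sketch below is a rough estimate that assumes 40 bytes of IPv4, UDP and RTP headers per packet and ignores layer-2 framing; the function name is illustrative.

```python
HEADER_BYTES = 40  # IPv4 (20) + UDP (8) + RTP (12); layer-2 framing not counted

def kbps_on_wire(payload_bytes: int, packet_ms: int) -> float:
    """Bits per millisecond equals kilobits per second."""
    return (payload_bytes + HEADER_BYTES) * 8 / packet_ms

g711_20ms = kbps_on_wire(160, 20)  # G711: 160 audio bytes every 20 ms
g729_20ms = kbps_on_wire(20, 20)   # G729: 20 audio bytes every 20 ms
g729_10ms = kbps_on_wire(10, 10)   # G729: 10 audio bytes every 10 ms

ratio_20ms = g729_20ms / g711_20ms  # about 0.30
ratio_10ms = g729_10ms / g711_20ms  # about 0.50
```

With these assumptions a G729 call uses about 24 kbps at 20 ms packets and about 40 kbps at 10 ms packets, compared with 80 kbps for G711.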

RFC 2833 (Out of Band DTMF)

RFC 2833 is the standard developed for transmitting DTMF digits. The digits are sent as dedicated event packets within the RTP stream, separate from the encoded audio. This can be considered more robust and reliable than Inband DTMF. When given the option between RFC 2833 and Inband DTMF, we suggest using RFC 2833.
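As an illustration, the following sketch decodes the 4-byte telephone-event payload that RFC 2833 defines: an event code, an end flag, a volume, and a duration. The function name and sample bytes are assumptions made for this example.

```python
import struct

# Event codes 0-15 map to the 16 DTMF keys (RFC 2833, section 3.10).
EVENT_TO_DIGIT = "0123456789*#ABCD"

def parse_telephone_event(payload: bytes) -> dict:
    """Decode one RFC 2833 telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_DIGIT):
        raise ValueError(f"event {event} is not a DTMF digit")
    return {
        "digit": EVENT_TO_DIGIT[event],
        "end": bool(flags & 0x80),   # E bit: set on the final packet of a digit
        "volume": flags & 0x3F,      # power level in -dBm0
        "duration": duration,        # in timestamp units (8000 per second)
    }

# Hypothetical payload: digit 5, end flag set, volume 10, duration 800 (100 ms).
event = parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20]))
```

Because the digit arrives as a number rather than as audio, the receiver does not have to detect tones in the decoded sound.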

Inband DTMF

Traditionally, DTMF digits are sent in the audio of a phone call as pairs of tones at specific frequencies for specific durations. In the SIP world, this same concept is known as Inband DTMF, meaning that the DTMF digits become part of the audio stream. This can be considered less reliable than RFC 2833 because jitter and delays can prevent digits from being detected properly.
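To show what "part of the audio stream" means in practice, the sketch below generates the two-tone signal for a digit and then recovers the digit with the Goertzel algorithm, a common technique for inband detection. It assumes clean 8 kHz linear samples; real detectors add thresholds and timing checks, and nothing here reflects how HMP Elements implements detection.

```python
import math

SAMPLE_RATE = 8000
LOW_FREQS = (697, 770, 852, 941)        # one per keypad row
HIGH_FREQS = (1209, 1336, 1477, 1633)   # one per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")
DIGIT_FREQS = {
    key: (LOW_FREQS[r], HIGH_FREQS[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}

def dtmf_tone(digit: str, duration_ms: int = 50) -> list:
    """Generate the sum of the two sine waves that encode a digit."""
    low, high = DIGIT_FREQS[digit]
    count = SAMPLE_RATE * duration_ms // 1000
    return [
        0.5 * math.sin(2 * math.pi * low * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * high * n / SAMPLE_RATE)
        for n in range(count)
    ]

def goertzel_power(samples: list, freq: int) -> float:
    """Signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_digit(samples: list) -> str:
    """Pick the strongest row and column frequency."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW_FREQS[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH_FREQS[i]))
    return KEYPAD[row][col]

detected = detect_digit(dtmf_tone("7"))
```

Detection depends on the tones surviving the trip intact, which is why lost or delayed audio packets can cause missed digits.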


Jitter

Jitter is the variation in the arrival timing of RTP packets. The term is commonly used to describe any loss in quality that is caused by RTP packets being dropped or delayed by either the sender or the receiver.

Common causes of jitter include network issues, a slow internet connection, carrier issues, or trying to run too many ports in HMP Elements.
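One common way to put a number on jitter is the running interarrival estimate from RFC 3550, section 6.4.1. The sketch below follows that formula but works in milliseconds for readability (the RFC uses timestamp units); the function name and inputs are assumptions for illustration.

```python
def interarrival_jitter(arrivals_ms, rtp_timestamps, clock_rate=8000):
    """Running jitter estimate in milliseconds, after RFC 3550 section 6.4.1.

    arrivals_ms    -- local arrival time of each packet, in milliseconds
    rtp_timestamps -- RTP timestamp carried by each packet
    """
    jitter = 0.0
    for i in range(1, len(arrivals_ms)):
        arrival_delta = arrivals_ms[i] - arrivals_ms[i - 1]
        send_delta = (rtp_timestamps[i] - rtp_timestamps[i - 1]) * 1000 / clock_rate
        deviation = abs(arrival_delta - send_delta)
        jitter += (deviation - jitter) / 16  # smoothing gain of 1/16 from the RFC
    return jitter

# Packets sent every 20 ms (160 timestamp units at 8 kHz).
steady = interarrival_jitter([0, 20, 40, 60], [0, 160, 320, 480])
bursty = interarrival_jitter([0, 20, 50, 60], [0, 160, 320, 480])
```

In this example the evenly spaced stream yields zero, while the stream whose third packet arrives 10 ms late yields a non-zero estimate.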

SIP Endpoint

A SIP Endpoint can be considered anything that either originates or terminates a SIP session. Common SIP Endpoints include SIP phones, softphones, and SIP software (such as HMP Elements).

SIP Carrier

A SIP Carrier allows a SIP endpoint to make and receive calls via SIP to traditional telephone equipment.


INVITE

A SIP INVITE packet is an invitation to accept a call. Calls are initiated via SIP INVITE packets. When making a call, you will send an INVITE packet. When receiving a call, you will receive an INVITE packet.
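For orientation, this sketch assembles the text of a minimal INVITE with the headers RFC 3261 requires. All addresses, tags and identifiers are hypothetical, and a real INVITE would normally also carry an SDP body describing the audio; a SIP stack such as HMP Elements builds these messages for you.

```python
def build_invite(caller, callee, local_host, call_id, branch, tag):
    """Return a minimal SIP INVITE as text (no SDP body, for illustration)."""
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_host}>",
        "Content-Length: 0",
    ]
    # SIP lines end with CRLF; a blank line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"

invite = build_invite(
    caller="alice@example.com",
    callee="bob@example.com",
    local_host="192.0.2.10:5060",
    call_id="a84b4c76e66710@192.0.2.10",
    branch="z9hG4bK776asdhds",  # branch values must start with z9hG4bK
    tag="1928301774",
)
```

When reading a SIP trace, the first line of the message is the quickest way to tell an INVITE from other requests and responses.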


OK

An OK message in SIP (response code 200) confirms that a previous SIP request has been accepted. For example, a SIP OK message is sent when you answer a call. A SIP OK message is also sent when you successfully register with a carrier.
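When reading traces, responses can be recognised by their status line. The sketch below parses that line and checks for the 2xx success class; the function names are illustrative.

```python
def parse_status_line(line: str):
    """Split a SIP response status line into (code, reason phrase)."""
    version, code, reason = line.strip().split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP response: {line!r}")
    return int(code), reason

def is_success(code: int) -> bool:
    """2xx responses, such as 200 OK, mean the request was accepted."""
    return 200 <= code < 300

ok = parse_status_line("SIP/2.0 200 OK")
ringing = parse_status_line("SIP/2.0 180 Ringing")
```

Here `ok` is a success response, while `ringing` is a provisional response that only reports progress.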


ACK

An ACK message confirms receipt of a final response to an INVITE. For example, when you have an inbound call, a carrier will send you an INVITE. Your SIP service is expected to send back a few messages to set up the media (audio), and then an OK to accept the call.

The carrier is then expected to send an ACK message to confirm the OK message that you sent. If you see multiple OK messages without a corresponding ACK, you may have configuration issues that you need to resolve.
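The check described above can be sketched as a small function over a simplified trace. The trace format (a list of direction and message pairs for a single inbound call) is a hypothetical simplification for this example, not a format produced by HMP Elements.

```python
def unacknowledged_oks(trace):
    """Count 200 OK messages sent since the last ACK was received.

    trace -- list of (direction, message) tuples for one inbound INVITE,
             where direction is "sent" or "received".
    """
    pending = 0
    for direction, message in trace:
        if direction == "sent" and message == "200 OK":
            pending += 1
        elif direction == "received" and message == "ACK":
            pending = 0
    return pending

healthy = [
    ("received", "INVITE"),
    ("sent", "100 Trying"),
    ("sent", "180 Ringing"),
    ("sent", "200 OK"),
    ("received", "ACK"),
]
broken = healthy[:4] + [("sent", "200 OK"), ("sent", "200 OK")]

healthy_count = unacknowledged_oks(healthy)
broken_count = unacknowledged_oks(broken)
```

A non-zero count, as in the second trace, matches the symptom of repeated OK messages with no ACK.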

IP Based Authentication

Many SIP Carriers will authenticate SIP calls based on the IP address and port combination of the packets that they receive. They may also send calls to you based on the IP address and port combination that you set up with your carrier.

SIP Registration

SIP Registration is a form of authentication with your carrier so that you are able to make or receive calls. These packets contain information about the network location of your endpoint, along with authentication information (such as a username and password). Once you authenticate, your carrier uses that IP address and Port to accept calls from you, and to decide where to send calls.
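The username and password are usually not sent in clear text. Assuming your carrier uses HTTP Digest authentication, the scheme that the SIP specification (RFC 3261) adopts, the client answers the carrier's challenge with an MD5-based response. The sketch below shows that calculation; the function name and all credential values are hypothetical.

```python
import hashlib

def _md5(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce,
                    nc=None, cnonce=None, qop=None):
    """Compute a Digest authentication response (RFC 2617, MD5 algorithm)."""
    ha1 = _md5(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = _md5(f"{method}:{uri}")                 # identifies the request
    if qop:
        return _md5(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return _md5(f"{ha1}:{nonce}:{ha2}")

# Hypothetical REGISTER challenge values.
response = digest_response(
    username="alice",
    realm="sip.example.com",
    password="secret",
    method="REGISTER",
    uri="sip:sip.example.com",
    nonce="84a4cc6f3082121f32b42a2187831a9e",
)
```

The carrier performs the same calculation with its stored credentials and accepts the registration if the two values match.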

Continuous Speech Processing (CSP)

CSP is the process by which you can listen to and/or capture the speech of the caller at the same time you are playing something to the caller. It is used in processes such as speech recognition.

CSP Voice Formats

Depending on the architecture, CSP supports the following encodings for streaming. This information is useful when selecting the correct format to use for speech recognition (for example, when using the Microsoft or LumenVox speech recognition engines).

DM3 Formats

On DM3, you don’t have to play and stream the same format.

Streaming Formats

  • G.711 mu-law PCM, 8 kHz sampling rate, 8-bit resolution (64 kbps)
  • G.711 A-law PCM, 8 kHz sampling rate, 8-bit resolution (64 kbps)
  • Linear PCM, 8 kHz sampling rate, 16-bit resolution, little-endian or big-endian format (128 kbps)

Springware Formats

On Springware, you must play and stream the same format.

Streaming And Playing Formats

  • G.711 mu-law PCM, 8 kHz sampling rate, 8-bit resolution (64 kbps)
  • G.711 A-law PCM, 8 kHz sampling rate, 8-bit resolution (64 kbps)
  • Linear PCM, 8 kHz sampling rate, 8-bit resolution (64 kbps)
  • OKI ADPCM, 8 kHz sampling rate, 4-bit resolution (32 kbps)
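The bit rates in these lists follow directly from the sampling rate multiplied by the resolution. A minimal sketch of that arithmetic, with illustrative function names:

```python
def bitrate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of a single-channel audio format."""
    return sample_rate_hz * bits_per_sample / 1000

def bytes_for_duration(sample_rate_hz: int, bits_per_sample: int, seconds: float) -> int:
    """Approximate storage needed to capture a stream of the given length."""
    return int(sample_rate_hz * bits_per_sample * seconds / 8)

g711 = bitrate_kbps(8000, 8)        # 64.0 kbps (mu-law or A-law)
linear16 = bitrate_kbps(8000, 16)   # 128.0 kbps
oki_adpcm = bitrate_kbps(8000, 4)   # 32.0 kbps
ten_seconds = bytes_for_duration(8000, 16, 10)  # 160000 bytes
```

This can help when estimating how much data a speech recognition engine will receive for a given format.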