
VoIP QoS Metrics Explained

VoIP QoS is affected by several types of network errors and characteristics. Focusing on RTP QoS, or voice quality, these can result in either an increase in end-to-end delay or a loss of audio clarity.


End-to-end delay is present in all communications systems and its primary effect in VoIP is to cause speaker/listener discomfort and conversational difficulty. Delays below 100ms are acceptable whereas delays above 100ms will result in increasingly difficult conversation.


The following are just a few of the delay sources that are critical to VoIP QoS.

Propagation Delay

This is the time it takes for data to traverse a network connection. Typically the propagation delay on copper wire is ~1ms per 100 miles. However, the actual delay varies with congestion and routing equipment along the connection.

Processing Delay

This is the time it takes for a codec to encode or decode a packet of data. It will normally be insignificant in a correctly scaled system. However, systems are scaled for cost-effectiveness and are designed using traffic models, so if real-world traffic is abnormally high it can overload the codec resources (typically a pool of DSPs), resulting in an increase in processing delay.

Packetization Delay

In addition to the network delays, the codec introduces algorithmic and packetization delays. Because codecs work on blocks of data, they cannot encode and transmit until they have received an appropriate amount of audio data. This packetization delay is a multiple of the frame size, e.g. G.723 = 30ms, G.711 = 20ms.

Algorithmic Delay

The algorithms used by low-bit-rate codecs like G.723.1 often use audio data in the adjacent packets. This is required for their advanced prediction models; however, it introduces an additional delay known as the algorithmic delay, e.g. G.723.1 = 7.5ms, G.711 (simple companded PCM with no look-ahead) = 0ms.

It is clear that the sum of all these delays can quite easily exceed 100ms and therefore impact QoS.
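As a rough illustration of how these sources add up, the sketch below sums them for one direction of a hypothetical call. The function name, the 3,000-mile distance, the 5ms processing figure and the choice of two frames per packet are assumptions made for the example; the per-codec figures and the ~1ms per 100 miles rule are the ones quoted above.

```python
# Illustrative one-way delay budget. All call parameters are assumptions.
DELAY_BUDGET_MS = 100.0  # threshold quoted in the text


def end_to_end_delay_ms(distance_miles, frame_ms, frames_per_packet,
                        algorithmic_ms, processing_ms=0.0):
    """Sum the delay sources described above for one direction of a call."""
    propagation_ms = distance_miles / 100.0          # ~1ms per 100 miles
    packetization_ms = frame_ms * frames_per_packet  # multiple of frame size
    return propagation_ms + packetization_ms + algorithmic_ms + processing_ms


# Hypothetical G.723.1 call: 3,000 miles, two 30ms frames per packet.
total = end_to_end_delay_ms(3000, frame_ms=30.0, frames_per_packet=2,
                            algorithmic_ms=7.5, processing_ms=5.0)
print(f"{total:.1f}ms, over budget: {total > DELAY_BUDGET_MS}")
```

Any dejitter buffering at the receiver would be added on top of this figure.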


Jitter in VoIP is a measurement of the variation of per-packet delivery around the ideal delivery time. For example, G.711 packets should be delivered every 20ms, but even under good conditions actual delivery may vary by a millisecond or so. This level of jitter does not cause a problem.
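One common way to turn this variation into a single number is the running interarrival jitter estimate defined for RTP in RFC 3550, which smooths the absolute deviation with a gain of 1/16. The sketch below is a simplified version of that estimator, not necessarily the measurement any particular product uses. It assumes the sender emits packets on a perfect 20ms schedule, so only arrival times are needed; the function name is hypothetical.

```python
def interarrival_jitter_ms(arrivals_ms, interval_ms=20.0):
    """RFC 3550 style running jitter estimate from arrival times in ms.

    Assumes packets were sent exactly interval_ms apart and are listed
    in sequence order.
    """
    jitter = 0.0
    for prev, cur in zip(arrivals_ms, arrivals_ms[1:]):
        deviation = (cur - prev) - interval_ms    # gap vs. the ideal gap
        jitter += (abs(deviation) - jitter) / 16.0
    return jitter


steady = interarrival_jitter_ms([0.0, 20.0, 40.0, 60.0])
wobbly = interarrival_jitter_ms([0.0, 21.0, 40.0, 61.0])
print(steady, wobbly)
```

A perfectly paced stream yields zero, and the estimate grows as arrivals drift away from the 20ms schedule.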


However, jitter can reach tens of milliseconds, and it can affect individual packets or groups of packets in unpredictable ways. VoIP systems implement dejitter buffers that hold the incoming packets before passing them to the codec, smoothing out the delays to remove gaps in the audio. If jitter increases, individual packet delay increases, so to compensate the buffer size can be increased either automatically or manually. This in turn increases delay and conversational difficulty.


However, this buffer can only get so long before the imposed delay becomes high enough that call quality suffers. Therefore an upper threshold is used: packets that have still not arrived when it is reached are treated as missing, the audio is played without them, and they are discarded if they arrive later. These "discarded" packets are treated the same as packets lost on the network and may be successfully concealed under many circumstances.
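The trade-off between buffer depth and discards can be sketched with a fixed-depth playout schedule. This is a minimal model, assuming a fixed buffer, 20ms packets and arrival times in milliseconds; real dejitter buffers are usually adaptive, and the function and variable names here are hypothetical.

```python
def playout(packets, buffer_ms, interval_ms=20.0):
    """Split packets into played and discarded sequence numbers.

    packets: list of (seq, arrival_ms) in arrival order. The first packet
    is held for buffer_ms; each later packet must arrive before its own
    slot in the resulting playout schedule or it is discarded as late.
    """
    if not packets:
        return [], []
    first_seq, first_arrival = packets[0]
    played, discarded = [], []
    for seq, arrival in packets:
        deadline = first_arrival + buffer_ms + (seq - first_seq) * interval_ms
        (played if arrival <= deadline else discarded).append(seq)
    return sorted(played), sorted(discarded)


# Packet 2 is delayed by 35ms and arrives after packet 3.
stream = [(0, 0.0), (1, 22.0), (3, 61.0), (2, 75.0)]
print(playout(stream, buffer_ms=20.0))  # shallow buffer: packet 2 is too late
print(playout(stream, buffer_ms=40.0))  # deeper buffer: nothing is discarded
```

The deeper buffer saves the late packet, but at the cost of an extra 20ms of delay on every packet in the call.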

Out of Order

Out of order packets are caused by routing problems. Individual packets may take different paths across the network, and some routes may be slower, so packets arrive at the receiver in the wrong order. This increases end-to-end delay because the receiver has to wait for the late packets before it can decode, which lengthens the decode buffer. The effect is the same as that of jitter.
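A receiver can spot reordering from the 16-bit sequence number carried in each RTP header. The sketch below counts packets that arrive behind the highest sequence number seen so far, and handles the wrap from 65535 back to 0. The function names are hypothetical, and the approach is one simple way to count reordering rather than a standard algorithm.

```python
def seq_delta(a, b):
    """Signed distance from b to a in 16-bit RTP sequence space."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


def count_out_of_order(seqs):
    """Count packets that arrive behind the highest sequence seen so far."""
    if not seqs:
        return 0
    highest, late = seqs[0], 0
    for seq in seqs[1:]:
        delta = seq_delta(seq, highest)
        if delta > 0:
            highest = seq   # in order (possibly after a gap)
        elif delta < 0:
            late += 1       # arrived after a newer packet
        # delta == 0 is a duplicate and is ignored here
    return late


print(count_out_of_order([1, 2, 4, 3, 5]))
print(count_out_of_order([65534, 65535, 1, 0, 2]))  # wraps past 65535
```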


Packet loss can be caused by a variety of problems, including link failure and congestion. Loss falls into two categories, burst and random. It is important to distinguish between these types because the total number of lost packets for a call may seem low, but if those packets were lost in sequence they could represent clearly perceptible audio loss, far worse than if they were distributed over the entire call.
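To see why the total alone is misleading, the sketch below breaks the losses in a call into runs of consecutive missing packets. The two 100-packet calls are made-up data with the same 5% loss; the function name and the 20ms packet duration are assumptions for the example.

```python
def loss_runs(received, first_seq, last_seq):
    """Return the length of each run of consecutive missing packets."""
    got, runs, run = set(received), [], 0
    for seq in range(first_seq, last_seq + 1):
        if seq in got:
            if run:
                runs.append(run)
            run = 0
        else:
            run += 1
    if run:
        runs.append(run)
    return runs


PACKET_MS = 20  # assumed audio per packet
scattered = [s for s in range(100) if s not in {10, 30, 50, 70, 90}]
bursty = [s for s in range(100) if s not in {40, 41, 42, 43, 44}]

for name, call in (("scattered", scattered), ("bursty", bursty)):
    runs = loss_runs(call, 0, 99)
    print(name, "lost:", sum(runs), "longest gap:", max(runs) * PACKET_MS, "ms")
```

Both calls lose five packets, but the longest gap is 20ms in the first and 100ms in the second.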

Burst Loss


Burst loss is very destructive even if aggressive action such as redundant audio is used. If there is burst loss, there will be significant audio loss.

Discrete Loss


Individual packet loss can be effectively masked using PLC (Packet Loss Concealment). This can be as simple as the receiver repeating the last packet of audio to fill the void, or producing some audio that contains similar harmonics to the previous packet. Redundant audio is also an effective scheme, but it requires additional processing power and bandwidth and increases application complexity. Nevertheless, with the growth of HD Voice such schemes have become more popular as quality expectations have taken a leap.
If the percentage of lost packets becomes too high, speech becomes completely unintelligible.
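The simplest concealment described above, repeating the last packet of audio, can be sketched as follows. Frames are modelled as lists of PCM samples with None marking a lost packet; this representation and the function name are assumptions for the example, and the sketch does not attempt the harmonic synthesis that more capable concealment schemes use.

```python
def conceal_by_repetition(frames, frame_len=160):
    """Fill each lost frame (None) with a copy of the last good frame.

    If the very first frame is lost there is nothing to repeat, so
    silence (all-zero samples) is used instead.
    """
    last_good = [0] * frame_len
    output = []
    for frame in frames:
        if frame is None:
            output.append(list(last_good))  # repeat previous audio
        else:
            output.append(frame)
            last_good = frame
    return output


# Tiny 2-sample frames keep the example readable.
print(conceal_by_repetition([[1, 2], None, [3, 4]], frame_len=2))
```

Repetition works for an isolated loss, but repeating the same frame across a long burst quickly becomes audible, which is consistent with burst loss being the more destructive case.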


Duplicate packets are caused by router errors, so they should be rare. Duplication is easily countered, so it should have no effect on QoS unless it is so severe that the volume of repeated packets becomes a significant bandwidth hog. Its detection can provide clues that there may be critical problems within the network itself.
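Countering duplicated packets at the receiver amounts to remembering which sequence numbers have already been seen. The sketch below drops repeats and reports how many there were, so the count can be used as a network health indicator. The function name is hypothetical, and a production receiver would typically track a sliding window rather than every sequence number in the call.

```python
def drop_duplicates(seqs):
    """Return (unique sequence numbers in arrival order, duplicate count)."""
    seen, unique, duplicates = set(), [], 0
    for seq in seqs:
        if seq in seen:
            duplicates += 1   # already delivered, so discard this copy
        else:
            seen.add(seq)
            unique.append(seq)
    return unique, duplicates


print(drop_duplicates([1, 2, 2, 3, 1]))
```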

Valid8 Protocol Engine can help assess network, device and application performance by providing detailed metrics for audio, video and signaling, returning a CDR report similar to the one shown here:


For more on performance, scale, and conformance testing for telecommunications networks, visit www.valid8.com