Practical Troubleshooting of G729 Codec in A VoIP Network

Journal of Telecommunications, ISSN 2042-8839, Volume 27, Issue 2, October 2014 www.journaloftelecommunications.co.uk

JOURNAL OF TELECOMMUNICATIONS, VOLUME 27, ISSUE 2, OCTOBER 2014 12

Practical Troubleshooting of G729 Codec in a VoIP Network
Yuri Ritvin

Abstract: VoIP networks have become ubiquitous in the modern telecom arena, and their share keeps growing as they
bring more advanced services to end users. While the advantages of VoIP systems are undisputed, their deployment
poses multiple challenges and requires a high level of technical expertise from the personnel involved in such
projects. This paper discusses aspects of proper G729 codec deployment, describing the pertinent problems and the
troubleshooting approach the author has used successfully to solve them.
Index Terms: G729, codec, VoIP, RFC2833, troubleshooting, sniffer, QoS



1 INTRODUCTION
VoIP (Voice over Internet Protocol) communications are common today, replacing
legacy telephony services everywhere as part of a global technological shift
towards an "All IP network", in which packet-switched networks will eventually
supersede all circuit-switched networks. IP (Internet Protocol) technology,
however, poses multiple challenges to achieving good voice quality. These
challenges are intrinsic to the nature of IP networks and stem from a few
major factors:
- packet delay (also known as latency)
- jitter (variation between the delays of different packets in the same voice stream)
- packet loss (packets from a sender that are never received by the other party to the conversation).
The following threshold values of these factors are commonly accepted for
sound VoIP call quality:
- delay: at most 150 ms (one way, per ITU-T Recommendation G.114, see [1])
- jitter: at most 30 ms
- packet loss: at most 1 %
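As a minimal sketch, the three thresholds above can be expressed as a simple check. The function name and the sample measurements are hypothetical and serve only as an illustration; the limits are the ones quoted in the text.

```python
# Upper bounds for acceptable call quality, as quoted in the text.
MAX_ONE_WAY_DELAY_MS = 150.0    # ITU-T G.114, one way
MAX_JITTER_MS = 30.0
MAX_PACKET_LOSS_PERCENT = 1.0


def failed_thresholds(delay_ms, jitter_ms, loss_percent):
    """Return the names of the factors that exceed their threshold."""
    measured = {
        "delay": (delay_ms, MAX_ONE_WAY_DELAY_MS),
        "jitter": (jitter_ms, MAX_JITTER_MS),
        "packet loss": (loss_percent, MAX_PACKET_LOSS_PERCENT),
    }
    return [name for name, (value, limit) in measured.items() if value > limit]


if __name__ == "__main__":
    # Hypothetical measurements for one call: only jitter is out of range.
    print(failed_thresholds(delay_ms=120, jitter_ms=45, loss_percent=0.4))
```

A value equal to its limit is treated as acceptable here, which is one possible reading of "at most".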
Additional influencing factors include out-of-sequence packets (when the
network is affected by insufficient QoS), jitter buffer misconfiguration
(which differs between VoIP appliances) and echo.
To achieve the desired voice quality, all these factors have to be taken into
account during the planning phase of each VoIP project, because the
technological decisions made at that phase have a critical impact on the
results of the deployment. Along with the technological considerations, it is
important to work with experienced professionals on such deployments: even if
plenty of bandwidth is reserved in the network for VoIP, voice quality can
still be degraded by improper setup of the pertinent hardware, buggy software
versions and non-optimal configurations.
One of the key questions to address during the planning phase is which codec
to use in the VoIP system. The answer has an essential impact on the scope of
the effort needed to ensure the desired voice quality.
2 VOICE CODEC SELECTION
There are many codecs that can be used in VoIP systems (see Appendix A for
the list), but the most common one is G711 (described in [2]), the default
codec in the majority of VoIP implementations.
There are 2 versions of this codec, G711a and G711u, and both have the same
bit rate of 64 Kbit/s. G711u is used in the USA, while G711a is used mostly
in Europe. The advantages of this codec are high voice quality, with a Mean
Opinion Score (MOS) of 4.1 (on a scale from 1 to 5), simplicity of
deployment, ubiquity and availability on every vendor platform. But the
quality comes at the expense of high bandwidth utilization per call: a G711
call consumes at least 87.2 Kbit/s. This number consists of the voice
payload, which is 64 Kbit/s as produced by the DSP (Digital Signal Processor)
that converts analog voice into digital, and 23.2 Kbit/s of network overhead,
the "vehicle" that transports the G711 voice payload over the network.

Yuri Ritvin is the founder of YRI CORP, a professional services company in
the fields of telecommunication, VoIP, security, databases and systems. He is
an internationally recognized telecommunication expert with over 20 years of
experience and the author of a few patents, among them the patent-pending
CareOneCall project, which will drastically change the future of the 911
system by bringing together modern telecom technologies, the health care
industry and public safety systems. He has an M.Sc. degree in electrical
engineering and holds numerous professional certifications, including Cisco
CCNA, CCNP, Unix Administrator, Oracle and MySQL DBA.

The calculation is based on the G711 sampling rate of 8,000 samples per
second with a sample size of 8 bits. This comes to 64,000 bits per second, or
8,000 bytes per second, since a byte is equal to 8 bits. The standard
packetization rate of the G711 codec is 50 pps (packets per second), meaning
that each packet carries 20 ms of voice and the voice payload per packet is
8,000 bytes / 50 = 160 bytes. This payload is encapsulated into a network
packet that adds the overhead mentioned above, consisting of the "envelopes"
of Layer 2 (Ethernet in most cases), Layer 3 (IP), Layer 4 (UDP) and a
service protocol (RTP, Real Time Transport Protocol) that provides time
stamps and sequence numbers for the packets. The entire packet size is
dissected in Table 1 below, with a calculation of the total bandwidth
consumption at 50 pps (packets per second).

Sizes are in bytes per packet; the last column is in bits per second.
Ethernet header | IP header | UDP header | RTP header | Total overhead per packet | Voice payload | Total VoIP packet | Total bandwidth consumption per second
18 | 20 | 8 | 12 | 58 | 160 | 218 | 218 x 8 x 50 = 87,200
Table 1

Along with the voice payload (aka "media"), VoIP traffic includes signaling,
which is responsible for call session establishment, media parameter
negotiation, call session maintenance and teardown. The most popular
signaling protocol today is SIP, and SIP traffic is on average about 5% of
the total media traffic. Some additional bandwidth is consumed by the RTCP
protocol, "a companion" of RTP that is responsible for monitoring and
reporting network conditions during the established voice session,
essentially providing feedback on the quality of service (QoS). RTCP traffic
volume is also around 5% of the media traffic (according to [4], section
6.2). So, in a network with 100 Mbit/s of available bandwidth dedicated to
VoIP, it is possible to place 1042 simultaneous G711 calls
(100,000,000 / (87,200 x 1.1) = 1042).
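The arithmetic behind Table 1 and the capacity figure can be reproduced with a short sketch. The function names are hypothetical, and the 1.1 factor simply restates the assumption made above that SIP and RTCP each add about 5% on top of the media traffic.

```python
# Header sizes in bytes per packet, as listed in Table 1.
ETHERNET, IP, UDP, RTP = 18, 20, 8, 12
OVERHEAD_BYTES = ETHERNET + IP + UDP + RTP   # 58 bytes


def call_bandwidth_bps(codec_bitrate_bps, packets_per_second=50):
    """Bits per second consumed by one voice stream, headers included."""
    payload_bytes = codec_bitrate_bps // 8 // packets_per_second
    return (OVERHEAD_BYTES + payload_bytes) * 8 * packets_per_second


def max_calls(link_bps, per_call_bps, signaling_factor=1.1):
    """Whole calls that fit into the link once signaling and RTCP are added."""
    return int(link_bps / (per_call_bps * signaling_factor))


if __name__ == "__main__":
    g711 = call_bandwidth_bps(64_000)
    print(g711)                           # 87200
    print(max_calls(100_000_000, g711))   # 1042
```

Passing a different codec bit rate to `call_bandwidth_bps` gives the corresponding figure for other codecs, under the same 58-byte overhead and 50 pps assumptions.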
For a VoIP provider it is just half of that number (521) if all media,
whether voice or fax, passes through the provider's premises, since each
end-to-end call consists of 2 call legs (as shown in Figure 1): one, aka leg
A, from a subscriber (the caller) to the VoIP provider's softswitch and
another, aka leg B, from that softswitch to the actual call destination (the
callee).
Bandwidth is in most cases a pricey asset, so voice codecs that consume much
less bandwidth than G711 were introduced to reduce its exhaustion or to limit
the need for its expansion. The list contains many (see Appendix A), some
more popular, some less. Many factors have influenced codec adoption in the
industry, such as complexity of implementation, availability of source code
and licensing fees. One of the bandwidth-saving codecs that gained wide
popularity is G729, with a bit rate of 8 Kbit/s, which is 8 times lower than
the 64 Kbit/s of G711.




Figure 1
3 G729 SPECIFICS
The G729 codec is a good choice for networks where bandwidth cannot be easily
increased, whatever the reason: limited network hardware capabilities, leased
line availability, prohibitive cost, etc. The low bit rate is achieved with a
patented audio data compression algorithm that compresses digital voice in
frames of 10 ms duration by default. It is officially described as "Coding of
speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear
prediction (CS-ACELP)" in [3].
The G729 voice quality score (MOS) is 3.92, which is slightly lower than that
of G711 but is still considered good enough (aka toll quality). The bandwidth
saving, however, is significant: G729 consumes only 31.2 Kbit/s (see Table 2
below), compared with the minimum of 87.2 Kbit/s taken by G711 (as shown in
Table 1 above).
Capacity-wise, the same bandwidth of 100 Mbit/s accommodates 2913
simultaneous G729 calls (100,000,000 / (31,200 x 1.1) = 2913), which is 2.8
times more than with G711.
Although the G729 codec frame is 10 ms long, a packet may carry more than one
frame, so the packet duration depends on the particular implementation: 10
ms, 20 ms or 30 ms. Table 2 shows the case of a 20 ms packet, which
corresponds to 50 pps (packets per second).

Sizes are in bytes per packet; the last column is in bits per second.
Ethernet header | IP header | UDP header | RTP header | Total overhead per packet | Voice payload | Total VoIP packet | Total bandwidth consumption per second
18 | 20 | 8 | 12 | 58 | 20 | 78 | 78 x 8 x 50 = 31,200
Table 2
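
To illustrate how the packet duration changes the result of Table 2, the sketch below repeats the calculation for 10, 20 and 30 ms packets. It assumes the same 58-byte overhead per packet and a payload that grows linearly with packet duration (10 bytes per 10 ms at 8 Kbit/s); the function name is hypothetical.

```python
OVERHEAD_BYTES = 18 + 20 + 8 + 12   # Ethernet + IP + UDP + RTP, as in Table 2
G729_BITRATE_BPS = 8_000            # codec bit rate from the text


def g729_bandwidth_bps(ptime_ms):
    """Per-stream bandwidth in bits per second for a given packet duration."""
    payload_bytes = G729_BITRATE_BPS * ptime_ms // 1000 // 8
    packet_bits = (OVERHEAD_BYTES + payload_bytes) * 8
    return packet_bits * 1000 / ptime_ms   # packets per second = 1000 / ptime


if __name__ == "__main__":
    for ptime in (10, 20, 30):
        print(ptime, "ms:", round(g729_bandwidth_bps(ptime)), "bit/s")
```

Under these assumptions the 20 ms case reproduces the 31,200 bit/s of Table 2, while shorter packets cost more bandwidth because the fixed header overhead is paid more often.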

There are a few variations of G729:
1) G729 original.
2) G729A or annex A: a simplification of G729 that is compatible with it. The
algorithm is less complex, but produces lower voice quality.
3) G729B or annex B: provides silence suppression and is not compatible with
the previous ones.
4) G729AB: essentially G729A with silence suppression; it is compatible only
with G729B.
There are also G729 versions with bit rates of 6.4 Kbit/s (annex D) and
11.8 Kbit/s (annex E).
When 2 VoIP peers negotiate media capabilities during the call establishment
phase, they have to agree on the codec version to use in order to communicate
with good quality. When the SIP protocol is used for signaling, this
negotiation is carried in the SDP portion of the SIP message body (SDP stands
for Session Description Protocol). Each codec is specified there by a number
corresponding to an RTP payload type in the media description and media
attribute fields; for the G729 codec this number is 18. This number, however,
is the same for all versions of the G729 codec regardless of annex. The
difference is indicated in another media attribute: annexb=no (for
G729/G729A) or annexb=yes (for G729B).
Figure 2 shows these media attributes in an actual call trace. The absence of
the annexb attribute in the SDP part of a SIP message body is interpreted by
some vendors as a declaration of G729A and by others as a declaration of
G729B, so it is important to include this attribute to indicate the desired
version explicitly.
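
As an illustration of the check described above, the following sketch looks for the annexb parameter of payload type 18 in an SDP body. The sample SDP and the function name are hypothetical; the function returns None when the attribute is absent, leaving the interpretation of that case to the caller, since vendors differ on it.

```python
import re

# Hypothetical SDP media section offering G729 without Annex B.
SAMPLE_SDP = """\
m=audio 49170 RTP/AVP 18 101
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:101 telephone-event/8000
a=ptime:20
"""


def g729_annexb(sdp):
    """Return 'yes', 'no', or None when payload type 18 has no annexb parameter."""
    for line in sdp.splitlines():
        match = re.match(r"a=fmtp:18\s+(.*)", line.strip())
        if match:
            # Format parameters are separated by semicolons.
            for param in match.group(1).split(";"):
                name, _, value = param.strip().partition("=")
                if name.lower() == "annexb":
                    return value.strip().lower()
    return None


if __name__ == "__main__":
    print(g729_annexb(SAMPLE_SDP))   # no
```

Running the same function on the SDP of both peers in a captured call shows quickly whether they declared the same G729 version.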

Figure 2
4 LICENSING CONSIDERATION
G729 includes patents from several companies and is licensed by Sipro Lab
Telecom, the authorized Intellectual Property Licensing Administrator for
G729 technology. OEM vendors sell G729 licenses at different prices depending
on the number of channels requested by end users. The retail price of a
single channel license is $10. Wholesale purchases are discounted: the more
licenses requested, the less each costs. For licensing purposes, a single
channel is any connection to a softswitch that activates codec processing:
this can be a call session that requires transcoding between G729 and any
other codec, or an IVR (Interactive Voice Response) session. In the latter
case 2 licenses are required for the call, the first for the initial caller's
channel and the second for the IVR channel. When both call parties, caller
and callee, use the same codec, no codec license is activated on the
softswitch; this case is known as a pass-through call.
5 PROBLEMS
Problems of G729 codec deployment stem from the many codec variations
described earlier and from differences in how vendors interpret the codec
implementation, even when both peers participating in a call session declare
the same version. When all configurations look good but the actual voice
quality is bad (voice distortion or garbling, very low volume, choppy or
breaking-up voice), the reason is not always easy to identify. In addition to
voice, special attention should be given to the choice of DTMF method, since
the voice stream compression applied by the G729 codec distorts inband DTMF
tones. Two DTMF methods are compatible with G729: RFC2833 (RTP events,
described in [5]) and SIP INFO (described in [6]). Some vendors prefer
RFC2833 (RFC2833 is the common name for the RTP events method, even though
RFC2833 itself has been superseded by RFC4733,
http://tools.ietf.org/html/rfc4733), while others recommend using SIP INFO
with their equipment. The proper method is chosen during actual
interoperability tests in the field and, in case of problems, after
conducting the necessary troubleshooting activities.
In some cases a problem appears without apparent reason in a system that has
worked properly for a long time. If such a problem persists or can be
reproduced, it is a "good" situation for troubleshooting, since the "culprit"
can be caught during the troubleshooting session.
6 TROUBLESHOOTING
Undoubtedly, the best sources of information for troubleshooting VoIP systems
are packet traces collected with network sniffers. The traces have to include
both media and signaling. Some softswitches allow easy trace collection from
a command line; others require attaching an external network sniffer. Where
feasible, a trace can also be taken at some point on the network path, such
as a firewall or an access switch with port mirroring. Additional sources of
valuable information are application and system logs. When a softswitch has
rich logging capabilities, the log should be set to maximum verbosity in
order to catch errors that can shed light on codec interoperability problems.
For effective troubleshooting the traces should be taken simultaneously on
both sides of the call channel, which of course requires the cooperation of
the parties on both sides of the SIP trunk. Analysis of the collected traces
shows the actual call processing.
Figure 3
A good choice for such analysis is the Wireshark network sniffer. When the
trace is opened in Wireshark, the VoIP calls are extracted by selecting the
"VoIP Calls" option from the "Telephony" menu (see Figure 3). From the list
of VoIP calls, the call flow diagram is displayed by highlighting a
particular call and then selecting the "Flow" button at the bottom (see
Figure 4).

Figure 4
Figure 5
In the call flow diagram the entire call session is presented in graphical
form (see Figure 5), where the signaling part and the media part are labeled
SIP and RTP respectively, and DTMFs are marked explicitly. Reviewing the call
flow diagram is a good starting point for finding the cause of the problem.
Clicking on any arrow in the call flow diagram opens the corresponding packet
in the details pane of the main Wireshark screen (see Figure 6) and allows a
deep analysis of the packet content.
In particular, the Synchronization Source identifier (SSRC) makes it possible
to correlate the corresponding voice stream with the right direction at the
next troubleshooting step, RTP stream analysis.
To get to that step, first choose the "Telephony" option from the Wireshark
menu, then the "RTP" option and from there "Stream Analysis" (see Figure 7).
The Stream Analysis window (see Figure 8) has 2 tabs: Forward direction and
Reversed direction.




Figure 6
Figure 7
In each direction the first thing to analyze is the "Delta" time between
packets, in ms. It should be as close as possible to the packet duration
given by the negotiated packetization time of the codec. The latter is shown
in the SDP portion of the SIP message body in the media attribute called
ptime, for instance ptime=20 (see Figure 2).
Meticulous attention should be paid to deviations of the delta, starting from
the packet that has the maximum delta (Max delta), as indicated in the voice
stream summary at the bottom of the Stream Analysis window (see Figure 8).
The output in the Stream Analysis window can be sorted in ascending or
descending order by any column. Sorting it in descending order by the Delta
column makes it possible to assess how stable the voice stream was, whether
there were packets with an abnormal delta and whether the number of such
packets was considerable. In problematic cases the abnormality is noticeable
(see Figure 9): the expected RTP packet duration (packetization time, ptime)
is 20 ms, but the analysis shows many packets with a delta close to 40 ms,
twice the expected value, meaning that parts of the conversation were sent at
a packetization rate of 25 pps instead of the expected 50 pps.
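
The same delta check can be sketched outside Wireshark. The example below is only an illustration: it assumes packet arrival times in milliseconds have already been exported from a trace, and both the function name and the 50% tolerance are hypothetical choices.

```python
def abnormal_deltas(arrival_times_ms, ptime_ms=20.0, tolerance=0.5):
    """Return (packet index, delta) pairs whose delta deviates from ptime.

    tolerance is a fraction of ptime; 0.5 flags deltas outside 10..30 ms
    when ptime is 20 ms.
    """
    flagged = []
    for i in range(1, len(arrival_times_ms)):
        delta = arrival_times_ms[i] - arrival_times_ms[i - 1]
        if abs(delta - ptime_ms) > tolerance * ptime_ms:
            flagged.append((i, delta))
    return flagged


if __name__ == "__main__":
    # Hypothetical arrival times in ms: two gaps are close to 40 ms.
    times = [0.0, 20.1, 39.9, 80.0, 100.2, 140.1, 160.0]
    for index, delta in abnormal_deltas(times):
        print(f"packet {index}: delta {delta:.1f} ms")
```

A large share of flagged packets with deltas near twice the ptime points to the situation described above, where the sender does not keep the negotiated packetization time.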
A doubled delta means that the ptime negotiated between the call parties
(ptime=20) is not maintained by the source softswitch, which causes voice
quality deterioration on the destination softswitch or softphone. This
finding should lead to a review of the softswitch configuration, in
particular to checking the manufacturer's recommendations for such a case in
the configuration and operation manual or wiki.
For instance, for a popular softswitch such as FreeSWITCH, the recommendation
is to add the following statement to the configuration file vars.xml:
<X-PRE-PROCESS cmd="set"
data="rtp_manual_rtp_bugs=IGNORE_MARK_BIT"/>

Figure 8
After making the change, the service should be restarted, for example with
"service freeswitch restart".
A discrepancy in the packetization rate between source and destination, as in
the depicted case, is one of the major reasons for bad voice quality but, as
mentioned before, other factors have an impact too, for instance the size of
a jitter buffer.
Usually a jitter buffer is applied on the receiving side, but sometimes this
is overlooked during the integration phase and a jitter buffer is set on a
softswitch in order to eliminate choppy voice. For the G729 codec this
introduces extra latency beyond the acceptable limit, because G729 has
built-in compression delays of 10 ms on each side of the call channel, during
encoding and during decoding. So the delay budget of the G729 codec is 20 ms
smaller than that of G711, for instance, and this should be taken into
consideration. When the jitter buffer size is misconfigured, many packets are
dropped by the jitter buffer and the call is perceived as breaking up and
garbled.
Another cause of problematic voice quality is out-of-sequence delivery of RTP
packets. In general, each packet in the network can take its own path; this
is one of the tenets of packet switching, as opposed to circuit switching,
where all communication related to a particular call goes via a strictly
predefined path. Packets of the same voice stream may be sent via different
paths (different routers) because of network congestion, network convergence
or multi-path load balancing on the routers between the call parties. In any
case, packets that arrive out of sequence are dropped by the receiving party
during voice stream reconstruction, which negatively impacts voice quality.
The solution in such cases is to secure a network QoS agreement with the VoIP
provider by establishing an MPLS circuit or installing a dedicated leased
line (such as a fiber circuit).
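
Out-of-sequence delivery can be spotted from the RTP sequence numbers in a trace. The sketch below is a simplified illustration, not the algorithm of any particular receiver: it assumes the sequence numbers are listed in arrival order, and the function name is hypothetical.

```python
def out_of_sequence(seq_numbers):
    """Return indexes of packets that arrive behind the highest RTP
    sequence number seen so far, allowing for 16-bit wraparound."""
    if not seq_numbers:
        return []
    late = []
    highest = seq_numbers[0]
    for i, seq in enumerate(seq_numbers[1:], start=1):
        # Distance ahead of the highest number, modulo 2**16; a value of
        # 0x8000 or more means the packet is behind the stream.
        diff = (seq - highest) & 0xFFFF
        if diff == 0 or diff >= 0x8000:
            late.append(i)      # duplicate or older packet
        else:
            highest = seq       # packet advances the stream
    return late


if __name__ == "__main__":
    # Hypothetical arrival order: packet 103 arrives after 104.
    print(out_of_sequence([101, 102, 104, 103, 105]))   # [3]
    # The counter wraps from 65535 to 0 without being flagged.
    print(out_of_sequence([65534, 65535, 0, 1]))        # []
```

Comparing the result for traces taken on both sides of the SIP trunk helps to tell whether reordering happens in the network between the parties.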
Figure 9
ABBREVIATIONS AND ACRONYMS
Acronym | Description
VoIP | Voice over Internet Protocol
MOS | Mean Opinion Score (measure of the quality of human speech at the destination end of the circuit; a value of 5 is excellent, a value of 1 is bad)
DSP | Digital Signal Processor
IP | Internet Protocol
UDP | User Datagram Protocol
RTP | Real Time Transport Protocol
RTCP | RTP Control Protocol
ITP | Internet Telephony Provider
SIP | Session Initiation Protocol
SIP trunk | VoIP communication line of specific capacity as defined by agreement with ITP
QoS | Quality of Service
pps | packets per second
CS-ACELP | Conjugate-Structure Algebraic Code Excited Linear Prediction
SDP | Session Description Protocol
OEM | Original Equipment Manufacturer
IVR | Interactive Voice Response
DTMF | Dual Tone Multi Frequency

APPENDIX A. VOICE CODECS LIST
Number | Standard by | Description | Bit rate (kb/s) | Sampling rate (kHz) | Frame size (ms) | MOS (Mean Opinion Score)
G.711 | ITU-T | Pulse code modulation (PCM) | 64 | 8 | Sampling | 4.1
G.722.1 | ITU-T | Coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss | 24/32 | 16 | 20 |
G.722.2 AMR-WB | ITU-T | Adaptive Multi-Rate Wideband Codec (AMR-WB) | 23.85/23.05/19.85/18.25/15.85/14.25/12.65/8.85/6.6 | 16 | 20 |
G.723.1 | ITU-T | Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s | 5.3/6.3 | 8 | 30 | 3.8-3.9
G.726 | ITU-T | 40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM) | 16/24/32/40 | 8 | Sampling | 3.85
G.727 | ITU-T | 5-, 4-, 3- and 2-bit/sample embedded (ADPCM) | var. | | Sampling |
G.728 | ITU-T | Coding of speech at 16 kbit/s using low-delay CELP | 16 | 8 | 2.5 | 3.61
G.729 | ITU-T | Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP) | 8 | 8 | 10 | 3.92
G.729.1 | ITU-T | Coding of speech at 8 kbit/s using CS-ACELP | 8/12/14/16/18/20/22/24/26/28/30/32 | 8 | 10 |
GSM 06.10 | ETSI | Regular Pulse Excitation Long-Term Predictor (RPE-LTP) | 13 | 8 | 22.5 |
LPC10 | USA Government | Linear-predictive codec | 2.4 | 8 | 22.5 |
Speex | | | 2.15-24.6 (NB), 4-44.2 (WB) | 8, 16, 32 | 30 (NB), 34 (WB) |
iLBC | | | 13.3 | 8 | 30 |
DoD CELP | Department of Defense (DoD) USA Government | | 4.8 | | 30 |
EVRC | 3GPP2 | Enhanced Variable Rate CODEC | 9.6/4.8/1.2 | 8 | 20 |
DVI | Interactive Multimedia Association (IMA) | DVI4 uses an adaptive delta pulse code modulation (ADPCM) | 32 | Variable | Sampling |
L16 | | Uncompressed audio data samples | 128 | Variable | Sampling |
SILK | Skype | | From 6 to 40 | Variable | 20 |



REFERENCES

[1] ITU-T Recommendation G.114, http://www.itu.int/rec/T-REC-G.114-200305-I
[2] ITU-T Recommendation G.711, http://www.itu.int/rec/T-REC-G.711-198811-I/en
[3] ITU-T Recommendation G.729, http://www.itu.int/rec/T-REC-G.729-201206-I/en
[4] RTP: A Transport Protocol for Real-Time Applications, https://www.ietf.org/rfc/rfc3550.txt
[5] RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals, http://tools.ietf.org/html/rfc2833, http://tools.ietf.org/html/rfc4733
[6] The SIP INFO Method, http://www.ietf.org/rfc/rfc2976.txt
