Speech Coding Techniques

The document discusses various speech coding techniques, emphasizing their importance for VoIP and the balance between quality and bandwidth. It covers concepts such as voice quality measurement using the Mean Opinion Score (MOS), speech production mechanisms, and different quantization methods. Additionally, it highlights specific codecs like G.711 and CELP, detailing their functionalities and performance metrics.

Uploaded by Bala Murugan
Copyright © All Rights Reserved

Introduction
• Efficient speech-coding techniques
  • Advantages for VoIP
  • Digital streams of ones and zeros
  • The lower the bandwidth, the lower the quality
  • RTP payload types
• Processing power
  • Better quality (for a given bandwidth) requires a more complex algorithm
  • A balance between quality and cost
Voice Quality
• Bandwidth is easily quantified
• Voice quality is subjective
• MOS, Mean Opinion Score
  • ITU-T Recommendation P.800
  • Excellent – 5
  • Good – 4
  • Fair – 3
  • Poor – 2
  • Bad – 1
  • A minimum of 30 people
  • Listeners rate voice samples or take part in conversations
  • P.800 recommendations cover
    • The selection of participants
    • The test environment
    • Explanations to listeners
    • Analysis of results
• Toll quality
  • A MOS of 4.0 or higher
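The MOS is the arithmetic mean of the panel's ratings on the five-point scale above. The sketch below is minimal and assumes ratings arrive as integers from 1 to 5. The function names and sample ratings are hypothetical. A real P.800 test also controls participant selection, environment, and instructions, and none of that is modelled here.

```python
from statistics import mean

MIN_LISTENERS = 30        # minimum panel size quoted on the slide
TOLL_QUALITY_MOS = 4.0    # toll-quality threshold quoted on the slide

# Five-point opinion scale from the slide
LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Average a list of 1..5 listener ratings into a MOS (hypothetical helper)."""
    if len(ratings) < MIN_LISTENERS:
        raise ValueError(f"need at least {MIN_LISTENERS} ratings, got {len(ratings)}")
    if any(r not in LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 (Bad) to 5 (Excellent)")
    return mean(ratings)


def is_toll_quality(mos):
    """Toll quality means a MOS of 4.0 or higher."""
    return mos >= TOLL_QUALITY_MOS


# Hypothetical panel: 30 listeners rating one voice sample.
ratings = [5] * 6 + [4] * 18 + [3] * 6
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}, toll quality: {is_toll_quality(mos)}")
```

Rejecting panels smaller than 30 mirrors the slide's minimum. A fuller tool would also report a confidence interval around the mean.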
About Speech
• Speech
  • Air pushed from the lungs past the vocal cords and along the vocal tract
  • The basic vibrations come from the vocal cords
  • The sound is altered by the configuration of the vocal tract (tongue and mouth)
• Model the vocal tract as a filter
  • Its shape changes relatively slowly
• The vibrations at the vocal cords
  • The excitation signal
Speech sounds
• Voiced sounds
  • The vocal cords repeatedly open and close
  • Quasi-periodic pulses of air
  • The rate of opening and closing determines the pitch
• Unvoiced sounds
  • Produced by forcing air at high velocity through a constriction
  • Noise-like turbulence
  • Little long-term periodicity
  • Short-term correlations are still present
• Plosive sounds
  • A complete closure in the vocal tract
  • Air pressure is built up and released suddenly
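Voice Sampling
• Discrete-time LTI systems: the convolution sum
$$x[n] = \sum_{k=-\infty}^{+\infty} x[k]\,\delta[n-k] \qquad y[n] = \sum_{k=-\infty}^{+\infty} x[k]\,h[n-k]$$
Any input is a weighted sum of shifted unit impulses. By linearity and time invariance, the output is the same weighted sum of shifted impulse responses h[n−k].
[Figure: example input x[n], impulse response h[n], and resulting output y[n]]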
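The convolution sum can be evaluated directly for finite-length sequences. The sketch below assumes both sequences start at n = 0 and are zero elsewhere. The input values were chosen to match the amplitudes that survive in the garbled figure (0.5, 2, 2.5). They are an assumption, not data read reliably from the slide.

```python
def convolve(x, h):
    """Direct evaluation of y[n] = sum_k x[k] h[n-k] for finite sequences
    starting at n = 0 (samples outside the lists are treated as zero)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):        # each input sample x[k] ...
        for m, hm in enumerate(h):    # ... contributes x[k] * h[m] at n = k + m
            y[k + m] += xk * hm
    return y


# Illustrative (assumed) values:
x = [0.5, 2.0]          # x[n] = 0.5 delta[n] + 2 delta[n-1]
h = [1.0, 1.0, 1.0]     # h[n] = 1 for n = 0, 1, 2
print(convolve(x, h))   # [0.5, 2.5, 2.5, 2.0]
```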
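• Nyquist sampling theorem
$$s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$$
$$x_s(t) = x_c(t)\,s(t) = x_c(t) \sum_{n=-\infty}^{\infty} \delta(t - nT)$$
$$S(j\Omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_S), \qquad \Omega_S = \frac{2\pi}{T}$$
$$X_s(j\Omega) = \frac{1}{T} \sum_{k=-\infty}^{\infty} X_c\big(j(\Omega - k\Omega_S)\big)$$
If X_c(jΩ) is band-limited to |Ω| ≤ Ω_N, the spectral replicas centred at multiples of Ω_S do not overlap as long as Ω_S − Ω_N > Ω_N, i.e. Ω_S > 2Ω_N.
[Figure: X_c(jΩ) band-limited to ±Ω_N, and the sampled spectrum with replicas at multiples of Ω_S separated by the gap Ω_S − Ω_N]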
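Aliasing shows what goes wrong when the condition Ω_S > 2Ω_N is violated. The sketch assumes the 8 k samples/s telephony rate that also appears in the G.711 figures. At that rate, any component above 4 kHz folds back into the 0 to 4 kHz band. This is why telephone speech is low-pass filtered before sampling. The helper names are hypothetical.

```python
import math

FS = 8000.0  # telephony sampling rate (8 k samples/s)


def sample_cosine(f_hz, fs_hz, count):
    """Samples x[n] = cos(2*pi*f*n/fs) of a continuous-time cosine."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(count)]


def apparent_frequency(f_hz, fs_hz):
    """Frequency in [0, fs/2] that a sampled cosine of f_hz appears to have."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)


# 3 kHz is below fs/2 = 4 kHz and is preserved by sampling;
# 5 kHz violates the theorem and aliases onto 3 kHz.
low = sample_cosine(3000.0, FS, 16)
high = sample_cosine(5000.0, FS, 16)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(low, high)))  # True
print(apparent_frequency(5000.0, FS))  # 3000.0
```

The two sample sequences are identical, so no receiver can tell which tone was sent.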
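Quantization (Scalar Quantization)
[Figure: amplitude axis from m_0 = −A to m_L = A divided by boundaries m_1, m_2, …, m_{L−1}, with reconstruction values v_1, …, v_L]
• Assume |x[n]| ≤ A and divide the range [−A, A] into L quantization intervals {J_1, J_2, …, J_k, …, J_L}
$$J_k = [m_{k-1}, m_k], \quad k = 1, \dots, L, \qquad m_0 = -A,\; m_L = A$$
• With R bits per sample:
$$L = 2^R$$
• Each quantization interval J_k is represented by a value v_k
$$S = \bigcup_{k=1}^{L} J_k, \qquad V = \{v_1, v_2, \dots, v_k, \dots, v_L\}$$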
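Here is a uniform instance of this scalar quantizer. All intervals J_k have the same width 2A/L, and each index fits in R bits. The choice of interval midpoints as the reconstruction values v_k is an assumption; the slide only says each J_k is represented by some v_k. Points exactly on a boundary m_k are assigned to the upper interval, and inputs outside [−A, A] are clamped to the end intervals.

```python
import math


def make_uniform_quantizer(A, R):
    """Boundaries m_0..m_L and reconstruction values v_1..v_L for a uniform
    quantizer with L = 2**R intervals covering [-A, A]."""
    L = 2 ** R
    step = 2 * A / L
    m = [-A + k * step for k in range(L + 1)]        # m_0 = -A, ..., m_L = A
    v = [(m[k] + m[k + 1]) / 2 for k in range(L)]    # midpoint of each J_k (assumed)
    return m, v


def quantize_index(x, A, R):
    """0-based index k of the interval [m_k, m_{k+1}] containing x (an R-bit code)."""
    L = 2 ** R
    step = 2 * A / L
    k = math.floor((x + A) / step)
    return min(max(k, 0), L - 1)   # clamp |x| > A into the end intervals


def quantize(x, A, R):
    """Map x to its reconstruction value v_k."""
    _, v = make_uniform_quantizer(A, R)
    return v[quantize_index(x, A, R)]


m, v = make_uniform_quantizer(A=1.0, R=2)   # L = 4 intervals
print(m)                          # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(v)                          # [-0.75, -0.25, 0.25, 0.75]
print(quantize(0.3, A=1.0, R=2))  # 0.25
```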
Non-Uniform Quantization
[Figure: boundaries m_0 = −A, m_1, m_2, …, 0, …, m_L = A, spaced more closely near 0]
• Concept: small quantization steps for small |x|, large quantization steps for large |x|
• Goal: a constant signal-to-quantization-noise ratio SNR_Q for all x
• Companding
  x[n] → F(x) (compressor) → uniform quantization → …1101… → uniform decoder → F⁻¹(x) (expander) → x̂[n]
  • Compressor + expander = compander
  • F(x) specifies the non-uniform quantization characteristics
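Why a uniform quantizer alone misses the constant-SNR_Q goal: its noise power is set by the step size, not by the signal. SNR_Q therefore falls by roughly 20 dB for every tenfold drop in signal level. The sketch below measures this for sine waves at three levels. The 8-bit resolution, test frequency, and signal length are arbitrary assumptions.

```python
import math


def uniform_quantize(x, A, R):
    """Midpoint uniform quantizer with 2**R levels on [-A, A]."""
    L = 2 ** R
    step = 2 * A / L
    k = min(max(math.floor((x + A) / step), 0), L - 1)
    return -A + (k + 0.5) * step


def snr_db(signal, reconstructed):
    """SNR_Q = 10 log10(signal power / quantization-noise power)."""
    p_sig = sum(s * s for s in signal)
    p_err = sum((s - r) ** 2 for s, r in zip(signal, reconstructed))
    return 10 * math.log10(p_sig / p_err)


A, R = 1.0, 8
results = {}
for amplitude in (1.0, 0.1, 0.01):
    x = [amplitude * math.sin(2 * math.pi * 0.0123 * n) for n in range(8000)]
    x_hat = [uniform_quantize(s, A, R) for s in x]
    results[amplitude] = snr_db(x, x_hat)
    print(f"amplitude {amplitude:5}: SNR_Q = {results[amplitude]:5.1f} dB")
```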
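Non-Uniform Quantization
• μ-law
$$|F(x)| = \frac{\ln(1 + \mu|x|)}{\ln(1 + \mu)}, \qquad 0 \le |x| \le 1$$
• A-law
$$|F(x)| = \begin{cases} \dfrac{A|x|}{1 + \ln A}, & 0 \le |x| \le \dfrac{1}{A} \\[2ex] \dfrac{1 + \ln(A|x|)}{1 + \ln A}, & \dfrac{1}{A} \le |x| \le 1 \end{cases}$$
• Typical values in practice
  • μ = 255, A = 87.6
Speech codecs fall into three families: waveform codecs, source codecs (also known as vocoders), and hybrid codecs.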
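The sketch below applies the μ-law and A-law characteristics as the compressor F(x) in the compressor → uniform quantizer → expander chain. It then compares the result with plain uniform quantization. It uses the continuous formulas above with μ = 255 and A = 87.6. It is not the bit-exact segmented G.711 encoder. The 8-bit quantizer and sine test signals are assumptions, and the A-law expander is omitted for brevity.

```python
import math

MU = 255.0     # typical mu-law value quoted on the slide
A_LAW = 87.6   # typical A-law value quoted on the slide


def mu_law_compress(x, mu=MU):
    """|F(x)| = ln(1 + mu|x|) / ln(1 + mu), sign preserved, for |x| <= 1."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress: |x| = ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def a_law_compress(x, A=A_LAW):
    """Piecewise A-law characteristic, sign preserved, for |x| <= 1."""
    ax = abs(x)
    if ax <= 1 / A:
        y = A * ax / (1 + math.log(A))
    else:
        y = (1 + math.log(A * ax)) / (1 + math.log(A))
    return math.copysign(y, x)


def uniform_quantize(x, R=8):
    """Midpoint uniform quantizer with 2**R levels on [-1, 1]."""
    L = 2 ** R
    step = 2.0 / L
    k = min(max(math.floor((x + 1.0) / step), 0), L - 1)
    return -1.0 + (k + 0.5) * step


def snr_db(x, x_hat):
    p_sig = sum(s * s for s in x)
    p_err = sum((s - r) ** 2 for s, r in zip(x, x_hat))
    return 10 * math.log10(p_sig / p_err)


# Plain uniform quantization vs. compress -> quantize -> expand.
results = {}
for amplitude in (1.0, 0.1, 0.01):
    x = [amplitude * math.sin(2 * math.pi * 0.0123 * n) for n in range(8000)]
    plain = [uniform_quantize(s) for s in x]
    companded = [mu_law_expand(uniform_quantize(mu_law_compress(s))) for s in x]
    results[amplitude] = (snr_db(x, plain), snr_db(x, companded))
    print(f"amplitude {amplitude:5}: uniform {results[amplitude][0]:5.1f} dB, "
          f"mu-law {results[amplitude][1]:5.1f} dB")
```

With companding, SNR_Q varies far less across input levels than with uniform quantization. This is the "constant SNR_Q" goal, achieved only approximately.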
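Speech Source Model and Source Coding
[Figure: speech production model. A periodic pulse train generator (voiced, period N) or a random sequence generator (unvoiced), selected by the v/u switch and scaled by the gain G, produces the excitation u[n]. The excitation drives the vocal tract model G(z), with frequency response G(ω) and impulse response g[n], to give x[n].]
$$G(z) = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}}$$
• Excitation parameters
  • v/u: voiced/unvoiced
  • N: pitch period for voiced sounds
  • G: signal gain
  • Excitation signal u[n]
• Vocal tract parameters
  • {a_k}: LPC coefficients
  • Capture the formant structure of speech signals
• A good approximation, though not precise enough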
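The source-filter model can be sketched as a synthesizer. Choose a pulse train or noise as u[n], scale it by G, and run it through the all-pole recursion x[n] = G·u[n] + Σ a_k x[n−k], which realises G(z) above. All frame parameters here are hypothetical. They are an 8 kHz rate, a 100 Hz pitch (N = 80), and a single second-order resonance near 500 Hz standing in for one formant. Real LPC coders use P ≈ 10 coefficients estimated from the speech.

```python
import math
import random


def excitation(voiced, N, length, rng):
    """u[n]: unit pulse train with period N (voiced) or white noise (unvoiced)."""
    if voiced:
        return [1.0 if n % N == 0 else 0.0 for n in range(length)]
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def synthesize(a, G, u):
    """All-pole filter 1 / (1 - sum_k a_k z^-k) driven by G*u[n]:
    x[n] = G*u[n] + sum_{k=1}^{P} a_k x[n-k]."""
    P = len(a)
    x = []
    for n, un in enumerate(u):
        acc = G * un
        for k in range(1, P + 1):
            if n - k >= 0:
                acc += a[k - 1] * x[n - k]
        x.append(acc)
    return x


# Hypothetical parameters: one resonance ("formant") near 500 Hz at fs = 8 kHz.
fs, f_formant, radius = 8000, 500.0, 0.95
theta = 2 * math.pi * f_formant / fs
a = [2 * radius * math.cos(theta), -radius ** 2]   # poles at radius * e^{+-j theta}

rng = random.Random(0)
voiced_frame = synthesize(a, G=1.0, u=excitation(True, 80, 240, rng))
unvoiced_frame = synthesize(a, G=0.1, u=excitation(False, 80, 240, rng))
```

The same filter {a_k} is used in both frames. Only the excitation changes between voiced and unvoiced, which is exactly the split the model makes.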
LPC Vocoder (Voice Coder)
• Transmitter: x[n] → LPC analysis → {a_k}, N, G, v/u → encoder → …11011…
  • N by pitch detection
  • v/u by voicing detection
• Receiver: …11011… → decoder → {a_k}, N, G, v/u → excitation generator → G(z), g[n] → x[n]
• {a_k} can be non-uniformly quantized or vector quantized to reduce the bit rate further
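The sketch below covers the encoder's analysis steps. It assumes the autocorrelation method with the Levinson-Durbin recursion for {a_k}, which is one common choice but not necessarily the one these slides assume. The sign convention matches G(z) = 1/(1 − Σ a_k z^{−k}). Pitch and voicing detection use a deliberately simple autocorrelation heuristic. The function `pitch_and_voicing`, its lag range, and its 0.3 threshold are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[l] = sum_n x[n] x[n+l] for l = 0..max_lag."""
    return [sum(x[n] * x[n + l] for n in range(len(x) - l)) for l in range(max_lag + 1)]


def levinson_durbin(r, P):
    """Solve the normal equations for a_1..a_P of 1 / (1 - sum_k a_k z^-k).
    Returns (a, final prediction-error energy)."""
    a = [0.0] * (P + 1)   # a[0] unused; a[k] holds a_k
    E = r[0]
    for i in range(1, P + 1):
        # reflection coefficient of stage i
        k_i = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / E
        new_a = a[:]
        new_a[i] = k_i
        for j in range(1, i):
            new_a[j] = a[j] - k_i * a[i - j]
        a = new_a
        E *= 1 - k_i * k_i
    return a[1:], E


def pitch_and_voicing(x, min_lag, max_lag, threshold=0.3):
    """Hypothetical detector: the lag with the largest autocorrelation in
    [min_lag, max_lag] is taken as the pitch period N; the frame is called
    voiced when r[N] / r[0] exceeds `threshold`."""
    r = autocorrelation(x, max_lag)
    if r[0] == 0:
        return None, False
    N = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return N, r[N] / r[0] > threshold


# A pulse train with period 80 (100 Hz at 8 kHz); lags 20..200 cover 40-400 Hz.
pulses = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]
print(pitch_and_voicing(pulses, 20, 200))   # (80, True)
```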
G.711
• The most commonplace codec
• Used in the circuit-switched telephone network
• PCM, Pulse-Code Modulation
  • With uniform quantization: 12 bits × 8 k samples/s = 96 kbps
  • With non-uniform quantization: 8 bits × 8 k samples/s = 64 kbps, the DS0 rate
• North America: μ-law
• Other countries: A-law, a little friendlier to lower signal levels
• An MOS of about 4.3
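The G.711 bit rates follow from fixed-length PCM: bit rate = bits per sample × sample rate. Here is a worked check at the 8 k samples/s rate; the helper name is hypothetical. Companding saves 4 bits per sample while keeping toll quality, which cuts the rate by a third.

```python
SAMPLE_RATE = 8000   # samples per second


def pcm_bit_rate(bits_per_sample, sample_rate=SAMPLE_RATE):
    """PCM bit rate in bit/s: each sample is sent as a fixed-length codeword."""
    return bits_per_sample * sample_rate


uniform = pcm_bit_rate(12)    # 12-bit uniform quantization
companded = pcm_bit_rate(8)   # 8-bit mu-law / A-law (G.711)
print(uniform, companded)     # 96000 64000
print(uniform / companded)    # 1.5
```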


G.728 LD-CELP
• 1024 vectors in the codebook
  • 10-bit pointer (index)
  • 16 kbps
• CELP encoder
  • Minimizes a frequency-weighted mean-square error
• LD-CELP decoder
• An MOS of about 3.9
• One-quarter of G.711 bandwidth
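The codebook numbers fit together as follows. G.728 sends one 10-bit index per 5-sample excitation vector, so 8000 / 5 × 10 = 16 kbps, one quarter of G.711's 64 kbps. The core encoder step is a search for the codebook entry and gain that best match a target vector, sketched below in heavily simplified form. The random Gaussian codebook and the function `search_codebook` are hypothetical. The sketch compares vectors with an unweighted mean-square error and does not pass candidates through the synthesis and perceptual weighting filters a real CELP encoder uses. The gain is left unquantized, although a real coder must encode it too.

```python
import random

SAMPLE_RATE = 8000   # samples/s
VECTOR_DIM = 5       # G.728 excitation vector length (samples)
INDEX_BITS = 10      # 1024-entry codebook -> 10-bit index


def search_codebook(target, codebook):
    """Return (index, gain, error) minimising sum_n (target[n] - g*c[n])**2
    over all codebook vectors c, using the optimal gain g = <t,c> / <c,c>."""
    best = None
    for i, c in enumerate(codebook):
        energy = sum(v * v for v in c)
        if energy == 0:
            continue
        g = sum(t * v for t, v in zip(target, c)) / energy
        err = sum((t - g * v) ** 2 for t, v in zip(target, c))
        if best is None or err < best[2]:
            best = (i, g, err)
    return best


rng = random.Random(1)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(VECTOR_DIM)]
            for _ in range(2 ** INDEX_BITS)]
target = [0.7 * v for v in codebook[123]]     # a scaled copy of entry 123
index, gain, err = search_codebook(target, codebook)
print(index, round(gain, 3))   # 123 0.7

bit_rate = SAMPLE_RATE // VECTOR_DIM * INDEX_BITS
print(bit_rate)                # 16000
```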
