Speech Coding Techniques
Introduction
Efficient speech-coding techniques
Advantages for VoIP
Digital streams of ones and zeros
The lower the bandwidth, the lower the quality
RTP payload types
Processing power
Better quality (for a given bandwidth) requires a more complex algorithm
A balance between quality and cost
Voice Quality
Bandwidth is easily quantified
Voice quality is subjective
MOS, Mean Opinion Score
ITU-T Recommendation P.800
Excellent – 5
Good – 4
Fair – 3
Poor – 2
Bad – 1
A minimum of 30 people listen to voice samples or take part in conversations
P.800 recommendations
The selection of participants
The test environment
Explanations to listeners
Analysis of results
Toll quality
A MOS of 4.0 or higher
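The scoring rule above can be sketched in a few lines. This is only an illustration: the function names, the panel-size check, and the ratings are made up for the example, and P.800 itself covers far more (participant selection, test environment, analysis) than a plain average.

```python
from statistics import mean

# P.800 five-point opinion scale, as listed on the slide.
LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}

def mean_opinion_score(ratings, min_listeners=30):
    """Average the listener ratings; reject panels below the minimum size."""
    if len(ratings) < min_listeners:
        raise ValueError("need at least %d listeners" % min_listeners)
    if any(r not in LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)

def is_toll_quality(mos, threshold=4.0):
    """Toll quality: a MOS of 4.0 or higher."""
    return mos >= threshold

# Hypothetical panel of 30 listeners (made-up scores, not measured data).
ratings = [5] * 8 + [4] * 16 + [3] * 6
mos = mean_opinion_score(ratings)
print(round(mos, 2), is_toll_quality(mos))  # 4.07 True
```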
About Speech
Speech
Air pushed from the lungs past the vocal cords and along the vocal tract
The basic vibrations – vocal cords
The sound is altered by the disposition of the vocal tract (tongue and mouth)
Model the vocal tract as a filter
The shape changes relatively slowly
The vibrations at the vocal cords
The excitation signal
Speech sounds
Voiced sound
The vocal cords vibrate open and close
Quasi-periodic pulses of air
The rate of the opening and closing – the pitch
Unvoiced sounds
Forcing air at high velocities through a constriction
Noise-like turbulence
Show little long-term periodicity
Short-term correlations still present
Plosive sounds
A complete closure in the vocal tract
Air pressure is built up and released suddenly
Voice Sampling
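Discrete-Time LTI Systems: The Convolution Sum
$$x[n] = \sum_{k=-\infty}^{+\infty} x[k]\,\delta[n-k] \qquad y[n] = \sum_{k=-\infty}^{+\infty} x[k]\,h[n-k]$$
[Figure: h[n] with height 1 at n = 0, 1, 2; x[n] with samples 0.5 and 2 at n = 0, 1; y[n] plotted for n = 0 to 3 with values between 0.5 and 2.5]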
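The convolution sum can be checked numerically for finite-length sequences. The sketch below assumes the figure values read as x[n] = {0.5, 2} and h[n] = {1, 1, 1}, both starting at n = 0; the function name `convolve` is only illustrative.

```python
def convolve(x, h):
    """Finite-length convolution sum y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            # The term x[k] * h[m] contributes to output index n = k + m.
            y[k + m] += xk * hm
    return y

x = [0.5, 2.0]        # assumed reading of x[n] from the figure
h = [1.0, 1.0, 1.0]   # assumed reading of h[n] from the figure
y = convolve(x, h)
print(y)  # [0.5, 2.5, 2.5, 2.0]
```

Each input sample launches a scaled, shifted copy of h[n], and the output is the sum of those copies.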
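Nyquist sampling theorem
$$s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$$
$$x_s(t) = x_c(t)\,s(t) = x_c(t) \sum_{n=-\infty}^{\infty} \delta(t - nT)$$
$$S(j\Omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_s)$$
[Figure: spectrum $X_c(j\Omega)$, band-limited to $-\Omega_N \le \Omega \le \Omega_N$; after sampling, copies of $X_c(j\Omega)$ appear around $0$ and $\pm\Omega_s$, with the copy at $\Omega_s$ starting at $\Omega_s - \Omega_N$]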
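The spectral copies stay separate only when the sampling frequency exceeds twice the highest signal frequency; otherwise they overlap and aliasing occurs. A minimal sketch of that effect, working in hertz rather than rad/s and assuming an 8 kHz sampling rate (the helper names are hypothetical):

```python
import math

def sample_cosine(freq_hz, fs_hz, count):
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]

def alias_frequency(freq_hz, fs_hz):
    """Frequency in [0, fs/2] at which a sampled real tone appears."""
    f = freq_hz % fs_hz
    return min(f, fs_hz - f)

fs = 8000.0                    # assumed sampling rate
tone = 5000.0                  # above fs / 2, so it violates the Nyquist condition
alias = alias_frequency(tone, fs)

above = sample_cosine(tone, fs, 16)
folded = sample_cosine(alias, fs, 16)
max_diff = max(abs(a - b) for a, b in zip(above, folded))
print(alias)  # 3000.0
```

The 5 kHz tone and the 3 kHz tone produce the same samples up to rounding error, so after sampling they cannot be told apart.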
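Quantization (Scalar Quantization)
[Figure: quantizer axis marked with values $v_1, v_2, \ldots, v_{k+1}, \ldots, v_L$]
[Block diagram: $x[n]$ → $F(x)$ → uniform quantization → uniform decoder → $F^{-1}(x)$ → $\hat{x}[n]$]
$$|F(x)| = \begin{cases} \dfrac{A\,|x|}{1 + \ln A}, & 0 \le |x| \le \dfrac{1}{A} \\[2ex] \dfrac{1 + \ln\left[A\,|x|\right]}{1 + \ln A}, & \dfrac{1}{A} \le |x| \le 1 \end{cases}$$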
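The compress, quantize uniformly, then expand chain can be tried out directly. The sketch below assumes A = 87.6 (the value commonly used with A-law; it is not given on the slide) and a simple 8-bit midrise uniform quantizer on [-1, 1]. All function names are illustrative.

```python
import math

A = 87.6  # assumed compression parameter, not stated in the source

def a_law_compress(x, a=A):
    """F(x): linear for |x| <= 1/A, logarithmic for 1/A <= |x| <= 1."""
    ax = abs(x)
    if ax > 1.0:
        raise ValueError("input must lie in [-1, 1]")
    if ax <= 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)

def a_law_expand(y, a=A):
    """Inverse of a_law_compress."""
    ay = abs(y)
    k = 1.0 + math.log(a)
    if ay <= 1.0 / k:
        x = ay * k / a
    else:
        x = math.exp(ay * k - 1.0) / a
    return math.copysign(x, y)

def uniform_quantize(v, bits=8):
    """Midrise uniform quantizer on [-1, 1]; returns the reconstruction level."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((v + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step

x = 0.01  # a low-level sample, where uniform quantization does poorly
error_uniform = abs(uniform_quantize(x) - x)
error_companded = abs(a_law_expand(uniform_quantize(a_law_compress(x))) - x)
print(error_companded < error_uniform)  # True
```

For this low-level input the companded path lands much closer to the original than plain uniform quantization with the same number of bits, which is the point of non-uniform quantization for speech.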
Formant structure of speech signals
A good approximation, though not precise enough
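LPC Vocoder (Voice Coder)
[Block diagram, transmitter: x[n] → LPC analysis → { a_k }, N, G, v/u → encoder → bit stream …11011…]
N by pitch detection
v/u by voicing detection
[Block diagram, receiver: bit stream …11011… → decoder → { a_k }, N, G, v/u → excitation (Ex) → g[n], G(z) → x[n]]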
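The receiver side can be sketched as an excitation generator driving an all-pole filter. This is a toy illustration under stated assumptions, not the coder from the slides: the recursion x[n] = G e[n] + sum of a_k x[n-k] is one common sign convention, the voiced excitation is a unit impulse train with period N, the unvoiced excitation is Gaussian noise, and every name and parameter value is hypothetical.

```python
import random

def excitation(voiced, pitch_period, length, seed=0):
    """Impulse train with period N for voiced frames, white noise for unvoiced."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def lpc_synthesize(a, gain, voiced, pitch_period, length):
    """All-pole synthesis: x[n] = gain * e[n] + sum_k a[k] * x[n - k]."""
    e = excitation(voiced, pitch_period, length)
    x = []
    for n in range(length):
        acc = gain * e[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        x.append(acc)
    return x

# One-pole toy filter; real coders use more coefficients per frame.
frame = lpc_synthesize(a=[0.5], gain=2.0, voiced=True, pitch_period=4, length=8)
print(frame)  # [2.0, 1.0, 0.5, 0.25, 2.125, 1.0625, 0.53125, 0.265625]
```

Only { a_k }, N, G and the v/u flag need to be transmitted per frame, which is why a vocoder needs far fewer bits than sending the waveform samples.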
With uniform quantization
12 bits * 8 k samples/sec = 96 kbps
Non-uniform quantization
8 bits * 8 k samples/sec = 64 kbps (DS0 rate)
Mu-law
North America
A-law
Other countries, a little friendlier to lower signal levels
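The bit-rate arithmetic is simply bits per sample times samples per second. A small check, assuming 8000 samples per second and taking 8 bits per sample for the companded case (the figure behind the 64 kbps DS0 channel); the function name is illustrative:

```python
def bit_rate_kbps(bits_per_sample, sample_rate_hz=8000):
    """Bit rate of a sample stream in kilobits per second."""
    return bits_per_sample * sample_rate_hz / 1000

uniform_rate = bit_rate_kbps(12)    # uniform quantization, 12 bits per sample
companded_rate = bit_rate_kbps(8)   # non-uniform quantization, 8 bits per sample
print(uniform_rate, companded_rate)  # 96.0 64.0
```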