New Speech Coding Techniques: Mr. L.Ramesh Ap/Ece

The document discusses new speech coding techniques that provide efficient digital encoding of voice signals. It covers various coding methods like ADPCM, LPC vocoding, analysis-by-synthesis codecs, and standardized codecs including G.711, G.728, G.723.1, and G.729. These techniques aim to balance voice quality and bandwidth usage by modeling the vocal tract and compressing excitation signal parameters.

Uploaded by

Ramesh L

New Speech Coding Techniques

Mr. L.Ramesh
AP/ECE
Introduction

Efficient speech-coding techniques


Advantages for VoIP
Digital streams of ones and zeros
The lower the bandwidth, the lower the quality
RTP payload types
Processing power
Better quality (for a given bandwidth) requires a more complex algorithm
A balance between quality and cost
Voice Quality

Bandwidth is easily quantified


Voice quality is subjective
MOS, Mean Opinion Score
ITU-T Recommendation P.800
• Excellent – 5
• Good – 4
• Fair – 3
• Poor – 2
• Bad – 1
A minimum of 30 people
Listen to voice samples or take part in conversations
• P.800 recommendations
  • The selection of participants
  • The test environment
  • Explanations to listeners
  • Analysis of results
• Toll quality
  • A MOS of 4.0 or higher
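The MOS procedure above amounts to averaging listener ratings on the 1 to 5 scale and comparing the result with the toll-quality threshold. A minimal Python sketch, assuming the ratings arrive as plain integers; the function name `mean_opinion_score` and the sample panel are illustrative and not taken from P.800:

```python
from statistics import mean

MIN_LISTENERS = 30       # minimum panel size given above
TOLL_QUALITY_MOS = 4.0   # toll-quality threshold given above

def mean_opinion_score(ratings):
    """Average listener ratings on the 1 (Bad) to 5 (Excellent) scale."""
    if len(ratings) < MIN_LISTENERS:
        raise ValueError("need at least 30 listeners")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)

# Hypothetical panel: 12 listeners answer Excellent, 18 answer Good.
ratings = [5] * 12 + [4] * 18
mos = mean_opinion_score(ratings)
is_toll_quality = mos >= TOLL_QUALITY_MOS
```

The sketch covers only the scoring step; participant selection, the test environment, and the instructions to listeners are procedural requirements that code cannot capture.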
About Speech

Speech
Air pushed from the lungs past the vocal cords and along the vocal tract
The basic vibrations – vocal cords
The sound is altered by the disposition of the vocal tract (tongue and mouth)
Model the vocal tract as a filter
The shape changes relatively slowly
The vibrations at the vocal cords
The excitation signal
Speech sounds
Voiced sounds
• The vocal cords vibrate open and closed
• Quasi-periodic pulses of air
• The rate of the opening and closing – the pitch
Unvoiced sounds
• Forcing air at high velocities through a constriction
• Noise-like turbulence
• Show little long-term periodicity
• Short-term correlations still present
Plosive sounds
• A complete closure in the vocal tract
• Air pressure is built up and released suddenly
Voice Sampling
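Discrete-Time LTI Systems: The Convolution Sum

$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n-k]$$

$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]$$

[Figure: stem plots of h[n] (value 1 at n = 0, 1, 2), x[n] (values 0.5 and 2 at n = 0, 1), and the output y[n] (values 0.5, 2.5, and 2 over n = 0 to 3)]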
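The convolution sum can be checked numerically for short sequences. The sketch below is a direct, unoptimized Python implementation for finite sequences that start at n = 0. The sample values are an assumption read off the figure residue (h[n] = 1 for n = 0, 1, 2; x[0] = 0.5, x[1] = 2), and the function name `convolve` is illustrative:

```python
def convolve(x, h):
    """Convolution sum y[n] = sum_k x[k] * h[n - k] for finite sequences starting at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, x_k in enumerate(x):
        for m, h_m in enumerate(h):
            # m plays the role of n - k, so this term contributes to y[k + m].
            y[k + m] += x_k * h_m
    return y

x = [0.5, 2.0]       # assumed input samples at n = 0, 1
h = [1.0, 1.0, 1.0]  # assumed impulse response at n = 0, 1, 2
y = convolve(x, h)   # [0.5, 2.5, 2.5, 2.0]
```

Each input sample x[k] launches a scaled, shifted copy of h[n], and the output is the sum of those copies, which is what the second equation states.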
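• Nyquist sampling theorem

$$s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$$

$$x_s(t) = x_c(t)\,s(t) = x_c(t) \sum_{n=-\infty}^{\infty} \delta(t - nT)$$

$$S(j\Omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_s)$$

[Figure: the spectrum $X_c(j\Omega)$, band-limited to $\pm\Omega_N$, and its copies centered at multiples of $\Omega_s$ after sampling; the first copy starts at $\Omega_s - \Omega_N$]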
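The spectral copies stay separate only when the sampling frequency exceeds twice the highest signal frequency; otherwise a tone folds back to a lower frequency. A small Python sketch of that folding, assuming a single real tone and the 8 kHz sampling rate used for telephone speech; the function name `apparent_frequency` is illustrative:

```python
def apparent_frequency(f_hz, fs_hz):
    """Frequency in [0, fs/2] that a sampled real tone of f_hz cannot be told apart from."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

FS = 8000  # samples per second

in_band = apparent_frequency(3000, FS)  # below FS/2, so it is unchanged
aliased = apparent_frequency(5000, FS)  # above FS/2, so it folds back to 3000 Hz
```

A 5 kHz tone sampled at 8 kHz yields exactly the same samples as a 3 kHz tone, which is why the input is band-limited before sampling.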
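Quantization (Scalar Quantization)

[Figure: the range from $m_0 = -A$ to $m_L = A$ divided at the boundaries $m_1, m_2, \ldots, m_{L-1}$; each interval $J_k$ carries a representative value $v_k$]

• Assume $|x[n]| \le A$
• Divide the range $[-A, A]$ into $L$ quantization levels $\{J_1, J_2, \ldots, J_k, \ldots, J_L\}$, with $J_k : [m_{k-1}, m_k]$ and $L = 2^R$
• Each quantization level $J_k$ is represented by a value $v_k$
• $S = \bigcup_k J_k$, $V = \{v_1, v_2, \ldots, v_k, \ldots, v_L\}$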
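A uniform quantizer is the simplest instance of this definition. The Python sketch below assumes equally spaced boundaries and takes the midpoint of each interval as its representative value; the slides fix neither choice, and the function name `uniform_quantize` is illustrative:

```python
def uniform_quantize(x, A=1.0, R=3):
    """Map x in [-A, A] to (k, v_k): the interval index k in 1..L and its representative value."""
    L = 2 ** R                     # number of quantization levels
    step = 2 * A / L               # width of every interval J_k
    x = max(-A, min(A, x))         # enforce |x| <= A
    k = min(int((x + A) / step), L - 1) + 1
    v_k = -A + (k - 0.5) * step    # midpoint of [m_{k-1}, m_k]
    return k, v_k

k, v = uniform_quantize(0.3, A=1.0, R=2)  # four levels of width 0.5
```

With midpoint representatives the quantization error never exceeds half a step, regardless of how small the input is, which motivates the non-uniform schemes that follow.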
Non-Uniform Quantization

[Figure: boundaries $m_0 = -A, m_1, m_2, \ldots, m_L = A$ on an axis through 0]

Concept: small quantization levels for small x, large quantization levels for large x

Goal: constant $\mathrm{SNR}_Q$ for all x


Companding

x[n] → F(x) (Compressor) → Uniform Quantization → …1101…1101… → Uniform Decoder → F⁻¹(x) (Expander) → x̂[n]

Compressor + Expander → Compander

F(x) specifies the non-uniform quantization characteristics
Non-Uniform Quantization
• μ-law

$$F(x) = \frac{\log(1 + \mu x)}{\log(1 + \mu)}, \quad 0 \le x \le 1$$

• A-law

$$F(x) = \begin{cases} \dfrac{A x}{1 + \ln A}, & 0 \le x \le \dfrac{1}{A} \\[2ex] \dfrac{1 + \ln(A x)}{1 + \ln A}, & \dfrac{1}{A} \le x \le 1 \end{cases}$$

• Typical values in practice: μ = 255, A = 87.6
Types of Speech Codecs
Waveform codecs, source codecs (also known as vocoders), and hybrid codecs.
Speech Source Model and Source Coding

[Figure: source-filter model. A random sequence generator (unvoiced) or a periodic pulse train generator with pitch period N (voiced) is selected by the v/u switch and scaled by the gain G to form the excitation u[n], which drives the vocal tract model G(z), G(ω), g[n] to produce x[n]]

$$G(z) = \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}}$$

Excitation parameters (excitation signal u[n])
• v/u: voiced/unvoiced
• N: pitch for voiced
• G: signal gain
Vocal tract parameters
• {a_k}: LPC coefficients
• Formant structure of speech signals
A good approximation, though not precise enough
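The source model can be run as a tiny synthesizer: pick an excitation, scale it by the gain, and pass it through the all-pole vocal tract filter. The Python sketch below assumes the sign convention G(z) = 1 / (1 − Σ a_k z^-k), so that x[n] = G·u[n] + Σ a_k x[n−k]. The unit-amplitude pulse train, the Gaussian noise source, and the function names are illustrative choices, not details from the slides.

```python
import random

def excitation(voiced, pitch_period, n_samples, seed=0):
    """Pulse train with period N for voiced frames, random sequence for unvoiced frames."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def synthesize(a, gain, voiced, pitch_period, n_samples):
    """All-pole synthesis: x[n] = gain * u[n] + sum_k a[k] * x[n - k]."""
    u = excitation(voiced, pitch_period, n_samples)
    x = []
    for n in range(n_samples):
        past = sum(a_k * x[n - k] for k, a_k in enumerate(a, start=1) if n - k >= 0)
        x.append(gain * u[n] + past)
    return x

# One-pole toy "vocal tract" with a pitch period of 4 samples.
voiced_frame = synthesize(a=[0.5], gain=1.0, voiced=True, pitch_period=4, n_samples=5)
```

Only v/u, N, G, and the coefficients {a_k} are needed to regenerate the frame, which is why a source codec can run at a very low bit rate.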
LPC Vocoder (Voice Coder)

[Figure: transmitter. x[n] → LPC Analysis → {a_k}, N, G, v/u → Encoder → …11011…]

N by pitch detection
v/u by voicing detection

[Figure: receiver. …11011… → Decoder → {a_k}, N, G, v/u → excitation (Ex) → G(z), g[n] → x[n]]

{a_k} can be non-uniformly or vector quantized to reduce the bit rate further
G.711

• The most commonplace codec
• Used in the circuit-switched telephone network
• PCM, Pulse-Code Modulation
• If uniform quantization
  • 12 bits × 8 k samples/sec = 96 kbps
• Non-uniform quantization
  • 64 kbps DS0 rate
  • μ-law: North America
  • A-law: other countries, a little friendlier to lower signal levels
• An MOS of about 4.3
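The PCM bit rates follow from bits per sample times samples per second. A short Python check, assuming the 8 kHz sampling rate quoted above and the 8 bits per sample that G.711 companded PCM uses; the function name is illustrative:

```python
def pcm_bit_rate_kbps(bits_per_sample, sample_rate_hz=8000):
    """Bit rate of sample-by-sample PCM in kbps."""
    return bits_per_sample * sample_rate_hz / 1000

uniform_rate = pcm_bit_rate_kbps(12)  # uniform quantization at comparable quality
g711_rate = pcm_bit_rate_kbps(8)      # companded PCM, the DS0 rate
```

Companding therefore saves a third of the bandwidth (96 kbps down to 64 kbps) compared with 12-bit uniform quantization.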
ADPCM (Adaptive Differential PCM)
DPCM and ADPCM
ADPCM: adaptive prediction in DPCM plus adaptive quantization
Adaptive Quantization
• Quantization level Δ varies with the local signal level
• $\Delta[n] = a\,\sigma_x[n]$
• $\sigma_x[n]$: locally estimated standard deviation of x[n]

G.721: ADPCM-coded speech at 32 kbps

G.726 (A-law or μ-law)
• 16, 24, 32, 40 kbps
• MOS 4.0 at 32 kbps
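Adaptive quantization scales the step size with a local estimate of the signal's standard deviation. The Python sketch below estimates that deviation over a sliding window of recent input samples. The window length, the constant `a`, the floor `min_step`, and the function name are illustrative assumptions; this is not the adaptation rule specified in G.721 or G.726, which adapt from previously quantized values.

```python
from statistics import pstdev

def adaptive_step_sizes(x, window=8, a=0.5, min_step=1e-4):
    """Step size per sample: a times the standard deviation of the most recent samples."""
    steps = []
    for n in range(len(x)):
        recent = x[max(0, n - window + 1): n + 1]
        sigma = pstdev(recent)               # local standard deviation estimate
        steps.append(max(a * sigma, min_step))
    return steps

# A quiet passage followed by a loud one.
signal = [0.01, -0.01] * 8 + [0.5, -0.5] * 8
steps = adaptive_step_sizes(signal)
quiet_step = steps[7]    # window covers only quiet samples
loud_step = steps[-1]    # window covers only loud samples
```

The step grows with the signal level, so quiet and loud passages see a similar ratio of signal to quantization noise.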
Analysis-by-Synthesis (AbS) Codecs
• Hybrid codec
  • Fill the gap between waveform and source codecs
  • The most successful and commonly used
• Time-domain AbS codecs
  • Not a simple two-state, voiced/unvoiced model
  • Different excitation signals are attempted
  • The one closest to the original waveform is selected
  • MPE, Multi-Pulse Excited
  • RPE, Regular-Pulse Excited
  • CELP, Code-Excited Linear Predictive
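The analysis-by-synthesis idea is a search loop: synthesize speech from each candidate excitation and keep the candidate whose output is closest to the original. The Python sketch below uses a plain squared error, a fixed all-pole filter, and no gain term. Those simplifications, the toy candidate set, and the function names are assumptions for illustration; practical codecs use a perceptually weighted error.

```python
def synthesize(excitation, a):
    """All-pole synthesis filter: x[n] = u[n] + sum_k a[k] * x[n - k]."""
    x = []
    for n, u_n in enumerate(excitation):
        past = sum(a_k * x[n - k] for k, a_k in enumerate(a, start=1) if n - k >= 0)
        x.append(u_n + past)
    return x

def abs_search(target, candidates, a):
    """Return (index, squared_error) of the candidate whose synthesis is closest to target."""
    best_index, best_error = -1, float("inf")
    for i, candidate in enumerate(candidates):
        synthesized = synthesize(candidate, a)
        error = sum((t - s) ** 2 for t, s in zip(target, synthesized))
        if error < best_error:
            best_index, best_error = i, error
    return best_index, best_error

a = [0.5]  # toy filter coefficients
candidates = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0]]
target = synthesize([1, -1, 0, 0], a)  # stand-in for a frame of original speech
best_index, best_error = abs_search(target, candidates, a)
```

The encoder runs a decoder inside its search loop, so it knows exactly what the receiver will reconstruct for each choice.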
G.728 LD-CELP
• CELP codecs
  • A filter; its characteristics change over time
  • A codebook of acoustic vectors
  • A vector = a set of elements representing various characteristics of the excitation
• Transmit
  • Filter coefficients, gain, a pointer to the vector chosen
• Low-Delay CELP
  • Backward-adaptive coder
  • Use previous samples to determine filter coefficients
  • Operates on five samples at a time
  • Delay < 1 ms
  • Only the pointer is transmitted
• 1024 vectors in the codebook
  • 10-bit pointer (index)
• 16 kbps
• LD-CELP encoder
  • Minimize a frequency-weighted mean-square error
• LD-CELP decoder
• An MOS of about 3.9
• One-quarter of G.711 bandwidth
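The G.728 figures are consistent with one another, which a few lines of arithmetic confirm. The Python check below assumes an 8 kHz sampling rate, which this section does not state explicitly; the variable names are illustrative:

```python
SAMPLE_RATE_HZ = 8000   # assumed sampling rate
VECTOR_SAMPLES = 5      # samples coded per codebook index
CODEBOOK_SIZE = 1024    # vectors in the codebook

index_bits = (CODEBOOK_SIZE - 1).bit_length()                # bits needed to address the codebook
vectors_per_second = SAMPLE_RATE_HZ / VECTOR_SAMPLES
bit_rate_kbps = index_bits * vectors_per_second / 1000
vector_duration_ms = 1000 * VECTOR_SAMPLES / SAMPLE_RATE_HZ  # buffering for one vector
```

Ten bits sent 1600 times per second give 16 kbps, and buffering five samples takes 0.625 ms, in line with the sub-millisecond delay quoted above.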
G.723.1 ACELP
• 6.3 or 5.3 kbps
  • Both mandatory
  • Can change from one to the other during a conversation
• The coder
  • A band-limited input speech signal
  • Sampled at 8 kHz, 16-bit uniform PCM quantization
  • Operates on blocks of 240 samples at a time
  • A look-ahead of 7.5 ms
  • A total algorithmic delay of 37.5 ms + other delays
  • A high-pass filter to remove any DC component
• G.723.1 Annex A
  • Silence Insertion Descriptor (SID) frames of size four octets
  • The two least significant bits of the first octet give the frame type:

    Bits   Frame type   Size
    00     6.3 kbps     24 octets/frame
    01     5.3 kbps     20 octets/frame
    10     SID frame    4 octets/frame

• An MOS of about 3.8
• At least 37.5 ms delay
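A receiver can read the frame type and size from the two least significant bits of the first octet, and the delay figure follows from the frame length plus the look-ahead. A Python sketch of both, using only the values listed above; the names `FRAME_TYPES` and `classify_frame` are illustrative, and the bit pattern 11 is left unmapped because it is not described here:

```python
# Two least significant bits of the first octet -> (frame type, frame size in octets)
FRAME_TYPES = {
    0b00: ("6.3 kbps", 24),
    0b01: ("5.3 kbps", 20),
    0b10: ("SID", 4),
}

def classify_frame(first_octet):
    """Return (frame type, size in octets), or None for the pattern not listed above."""
    return FRAME_TYPES.get(first_octet & 0b11)

FRAME_SAMPLES = 240
SAMPLE_RATE_HZ = 8000
LOOK_AHEAD_MS = 7.5

frame_ms = 1000 * FRAME_SAMPLES / SAMPLE_RATE_HZ  # duration of one 240-sample block
algorithmic_delay_ms = frame_ms + LOOK_AHEAD_MS   # block length plus look-ahead
```

Because every frame announces its own type, the sender can switch between the two rates or insert SID frames without any out-of-band signalling.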
G.729
• 8 kbps
• Input frames of 10 ms, 80 samples at an 8 kHz sampling rate
• 5 ms look-ahead
• Algorithmic delay of 15 ms
• An 80-bit frame for 10 ms of speech
• A complex codec
• G.729A (Annex A), a number of simplifications
  • Same frame structure
  • Encoder and decoder are interoperable between G.729 and G.729A
  • Slightly lower quality