2720 Slides7
David Tipper
Associate Professor
Graduate Program of Telecommunications and
Networking
University of Pittsburgh
Telcom 2720
Slides 7
http://www.sis.pitt.edu/~dtipper/tipper.html
Telcom 2720 2
Digital Speech Processing
[Figure: Source -> Source Encoder -> Channel Encoder -> Modulator -> Channel]
Characteristics of Speech
• Bandwidth
– Most of the energy lies between 20 Hz and about 7 KHz
– Human ear is sensitive to energy between 50 Hz and 4 KHz
• Time Signal
– High correlation
– Short term stationary
• Classified into four categories
– Voiced: created by air passed through vocal cords (e.g., ah, v)
– Unvoiced: created by air through mouth and lips (e.g., s, f)
– Mixed or transitional
– Silence
Characteristics of Speech
[Figure: waveforms of typical voiced speech and typical unvoiced speech]
Digital Speech
• Speech Coder: device that converts speech to digital form
• Types of speech coders
– Waveform coders
• Convert any analog signal to digital form
– Vocoders (Parametric coders)
• Try to exploit special properties of speech signal to reduce bit rate
• Build model of speech – transmit parameters of model
– Hybrid Coders
• Combine features of waveform and vocoders
• Mean Opinion Score (MOS) is a subjective measure of quality
• Tradeoff in quality vs. data rate vs. complexity
Waveform Coders (e.g.,PCM)
• Waveform Coders
• Convert any analog signal to digital - basically A/D converter
• Analog signal sampled at more than twice its highest frequency, then quantized into n-bit samples
• Uniform quantization
• Example: Pulse Code Modulation (PCM)
• band limit speech < 4000 Hz
• pass speech through μ-law compander
• sample at 8000 Hz, 8-bit samples
• 64 Kbps DS0 rate
• Characteristics
• Quality – High
• Complexity – Low
• Bit rate – High
• Delay - Low
• Robustness - High
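The PCM numbers above can be checked with a short sketch. The function names and the equal-step quantizer layout over [-1, 1) are illustrative assumptions, not something the slides specify.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate in bits per second of a fixed-rate waveform coder."""
    return sample_rate_hz * bits_per_sample


def uniform_quantize(x, n_bits):
    """Map x in [-1.0, 1.0) to an integer code in 0 .. 2**n_bits - 1.

    Assumption: the input range is split into 2**n_bits equal steps.
    """
    levels = 2 ** n_bits
    step = 2.0 / levels
    code = int((x + 1.0) / step)          # index of the step containing x
    return max(0, min(levels - 1, code))  # clamp values at or beyond the range


def uniform_dequantize(code, n_bits):
    """Return the midpoint of the quantization step identified by code."""
    step = 2.0 / (2 ** n_bits)
    return -1.0 + (code + 0.5) * step


print(pcm_bit_rate(8000, 8))  # 64000 bits per second, the DS0 rate
```

With 8-bit samples the reconstruction error of this quantizer is at most half a step, i.e. 1/256 of full scale.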
[Figure: PCM transmitter: analog input -> bandpass filter -> sample-and-hold circuit -> PAM -> analog compressor (μ law compander) -> analog-to-digital converter -> PCM -> transmission medium. PCM receiver: digital-to-analog converter -> PAM -> hold circuit -> analog expander (μ law expander) -> bandpass filter -> analog output]
Pulse code modulation (PCM) system with analog companding then digital
conversion
– ITU G.711 standard: basis for speech coding in the PSTN in the 1960s
Companding
[Figure: μ-law companding curves, output versus input (both 0 to 1), for μ = 0 (no compression), 5, 15, 40, 100 and 255]
• Analog compander emphasizes small values and de-emphasizes large values in order to equalize SNR across samples
• Reverse the mapping at the receiver with an expander
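A minimal sketch of the companding curve, assuming the standard μ-law characteristic y = sgn(x) ln(1 + μ|x|) / ln(1 + μ) for inputs normalized to [-1, 1]. The function names are illustrative.

```python
import math


def mu_law_compress(x, mu=255.0):
    """Compress x in [-1, 1] with the mu-law characteristic."""
    if mu == 0:
        return x  # mu = 0 means no compression
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=255.0):
    """Invert mu_law_compress at the receiver (the expander)."""
    if mu == 0:
        return y
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

For μ = 255 an input of 0.1 maps to roughly 0.59, which is the "emphasize small values" behaviour of the curves; expanding the compressed value returns the original sample.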
[Figure: PCM transmitter and PCM receiver connected by a transmission medium. Receiver blocks: PCM, digital expander, linear PCM, digital-to-analog converter, PAM, hold circuit, bandpass filter, analog output]
DPCM Speech Coding
[Figure: DPCM transmitter: analog input -> low-pass filter -> differentiator (summer) -> analog-to-digital converter -> encoded difference samples; a feedback path through a digital-to-analog converter and an integrator supplies the accumulated signal level to the summer. DPCM receiver blocks: DPCM input, digital-to-analog converter, integrator, hold circuit, low-pass filter, analog output]
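The transmitter and receiver loops can be sketched in a few lines. This is a simplified all-digital version under assumed parameters (a fixed step size and 4-bit difference codes); the slides describe the analog block diagram only.

```python
def dpcm_encode(samples, step=0.05, n_bits=4):
    """Return quantized difference codes for a list of samples."""
    max_code = 2 ** (n_bits - 1) - 1
    min_code = -(2 ** (n_bits - 1))
    accumulated = 0.0  # level the receiver will have reconstructed so far
    codes = []
    for s in samples:
        diff = s - accumulated                 # summer output
        code = max(min_code, min(max_code, round(diff / step)))
        codes.append(code)
        accumulated += code * step             # integrator in the feedback path
    return codes


def dpcm_decode(codes, step=0.05):
    """Integrate the received difference codes back into signal levels."""
    accumulated = 0.0
    out = []
    for code in codes:
        accumulated += code * step
        out.append(accumulated)
    return out
```

Because the encoder tracks the same accumulated level as the decoder, quantization errors do not build up: as long as no difference exceeds the code range, each reconstructed sample stays within half a step of the input.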
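Subband Coding
• Partition signal into non-overlapping frequency bands; use a different A/D quantizer for each band
• Example: 3 subbands

band   Range           encoding
----------------------------------------
1      50-700 Hz       4 bits
2      700-2000 Hz     3 bits
3      2000-3400 Hz    2 bits

• Sampling each band at twice its upper frequency: 5600 + 12000 + 13600 = 31.2 Kbps
[Figure: bank of bandpass filters (Bandpass Filter 1 ... Bandpass Filter 3), each followed by its own A/D converter (A/D 1 ... A/D 3)]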
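The 31.2 Kbps total follows if each band is sampled at twice its upper edge frequency and coded with the listed number of bits. That sampling rule is inferred from the per-band figures (for example 2 x 700 x 4 = 5600); the names below are illustrative.

```python
# (lower edge Hz, upper edge Hz, bits per sample) for each subband
SUBBANDS = [(50, 700, 4), (700, 2000, 3), (2000, 3400, 2)]


def band_bit_rates(bands):
    """Per-band bit rate, assuming sampling at twice the upper band edge."""
    return [2 * upper * bits for _lower, upper, bits in bands]


def subband_bit_rate(bands):
    """Total bit rate in bits per second over all subbands."""
    return sum(band_bit_rates(bands))


print(band_bit_rates(SUBBANDS))    # [5600, 12000, 13600]
print(subband_bit_rate(SUBBANDS))  # 31200
```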
Vocoders
• Vocoders (Parametric Coders)
• Models the vocalization of speech
• Speech sampled and broken into frames (~25 msec)
• Instead of transmitting digitized speech
1. Build model of speech
2. Transmit parameters of model
3. Synthesize approximation of speech
• Linear Predictive Coders (LPC): basic vocoder model
• Models vocal tract as a filter
• Filter excitation
• periodic pulse (voiced speech) or noise (unvoiced speech)
• Transmitted parameters:
• gain, voiced/unvoiced decision, pitch (if voiced), LPC parameters
Vocoders
• Linear Predictive Coders (LPC)
• Excitation
• periodic pulse (voiced speech) or noise (unvoiced speech)
• Transmitted parameters: gain, voiced/unvoiced decision, pitch (if voiced), LPC parameters
Vocoders
• Example: Tenth Order Linear Predictive Coder
• Samples voice at 8000 Hz – buffer 240 samples => 30 msec
• Filter Model
• (M = 10 is the order, G is the gain, z^{-1} is a unit delay, b_k are the filter coefficients)
$$H(z) = \frac{G}{1 + \sum_{k=1}^{M} b_k z^{-k}}$$
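In the time domain the filter model corresponds to the difference equation s[n] = G e[n] - sum_{k=1..M} b_k s[n-k], where e[n] is the excitation. The sketch below implements that recursion with the two excitation types named on the slides. The helper names, the unit-amplitude pulse and the uniform noise source are assumptions made for illustration.

```python
import random


def lpc_synthesize(excitation, b, gain):
    """Run excitation through H(z) = G / (1 + sum_k b[k-1] z^-k)."""
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, bk in enumerate(b, start=1):
            if n - k >= 0:
                acc -= bk * out[n - k]  # feedback from past output samples
        out.append(acc)
    return out


def pulse_train(length, pitch_period):
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Unvoiced excitation: uniform noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]
```

A decoder for the tenth order example would call `lpc_synthesize` once per 240-sample frame, with `b` holding the 10 received coefficients and the excitation chosen by the voiced/unvoiced decision.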
Telcom 2720 18
Vocoders
• LPC coders can achieve low bit rates 1.2 – 4.8 Kbps
• Characteristics of LPC
• Quality – Low
• Complexity – Moderate
• Bit Rate – Low
• Delay – Moderate
• Robustness – Low
Vocoders
• Hybrid Coders
• Combine Vocoder and Waveform Coder concepts
• Residual excited LPC (RELP)
• Codebook excited LPC (CELP)
RELP Vocoder
• Residual Excited LPC
• improve quality of LPC by transmitting error (residue) along with LPC parameters
[Figure: RELP encoder: s(n) -> buffer/window -> ST-LP analysis and LPC synthesis; the encoded output carries the residue, LP parameters, v/u decision, gain and pitch]
GSM Speech Coding
[Figure: analog speech -> low-pass filter -> A/D (8000 samples/s, 13 bits/sample, 104 kbps) -> RPE-LTP speech encoder (13 kbps) -> channel encoder]
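The two rates in the figure are related by simple arithmetic, sketched below. The constant names are illustrative.

```python
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 13
CODED_RATE_BPS = 13_000  # RPE-LTP speech encoder output rate from the figure

pcm_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE      # rate at the A/D output
compression_ratio = pcm_rate_bps / CODED_RATE_BPS    # reduction achieved by the coder

print(pcm_rate_bps)       # 104000
print(compression_ratio)  # 8.0
```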
GSM Speech Coding (cont)
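[Figure: channel encoder. Each 20 ms speech frame of 260 bits (13 kb/s) is split into 50 class 1a bits, 132 class 1b bits and 78 class 2 bits. A 3-bit CRC extends the class 1a bits to 53 bits; these, the class 1b bits and 4 tail bits pass through the (2,1,5) convolutional coder; the coded bits and the uncoded class 2 bits go to the bit interleaver, giving 456 bits per 20 ms = 22.8 kb/s]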
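The bit budget of the channel encoder can be tallied as a check. This sketch assumes 132 class 1b bits, the value for which the three classes sum to the 260-bit frame and the output comes to 456 bits, and it reads (2,1,5) as a rate 1/2 code that emits 2 bits per input bit. The names are illustrative.

```python
CLASS_1A = 50    # most important bits, protected by the CRC
CLASS_1B = 132   # assumption: chosen so that 50 + 132 + 78 = 260
CLASS_2 = 78     # sent without convolutional coding
CRC_BITS = 3
TAIL_BITS = 4
FRAME_MS = 20


def coded_frame_bits():
    """Bits per 20 ms frame after channel coding."""
    into_coder = CLASS_1A + CRC_BITS + CLASS_1B + TAIL_BITS  # 189 bits
    coded = 2 * into_coder                                   # rate 1/2 -> 378 bits
    return coded + CLASS_2


def gross_rate_bps():
    """Channel bit rate in bits per second."""
    return coded_frame_bits() * 1000 / FRAME_MS


print(coded_frame_bits())  # 456
print(gross_rate_bps())    # 22800.0
```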
Hybrid Vocoders
• Codebook Excited LPC (CELP)
• Problem with simple LPC: the U/V decision and pitch estimation do not model transitional speech well and are not always accurate
Typical CELP Encoder
CELP Speech Coders
[Figure: CELP coder with Code Book #1 (index i) and Code Book #3 (index h), gains γ1, γ2 and β, and filter coefficients]
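The core idea of a codebook excited coder is analysis by synthesis: try each stored excitation vector through the synthesis filter and transmit only the index and gain of the best match. The sketch below shows that search for a single fixed codebook. It is a simplified illustration under stated assumptions (plain squared error, no perceptual weighting, no adaptive or pitch codebook) and does not reproduce the multi-codebook structure of the figure.

```python
def synthesize(excitation, b):
    """All-pole synthesis filter 1 / (1 + sum_k b[k-1] z^-k), zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, bk in enumerate(b, start=1):
            if n - k >= 0:
                acc -= bk * out[n - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, b):
    """Return (index, gain) of the entry whose synthesized output best matches target."""
    best_index, best_gain, best_error = None, 0.0, float("inf")
    for index, code_vector in enumerate(codebook):
        y = synthesize(code_vector, b)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero entry cannot match anything
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain
```

The decoder holds the same codebook, so the index, the gain and the filter coefficients are enough to regenerate the excitation and resynthesize the frame.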
Evaluating Speech Coders
MOS for clear channel environment – no errors
Results vary a little with language and speaker gender
Evaluating Speech Coders
Codec Selection
• For cellular, need to consider Quality, Complexity, Delay, Compression Rate
3G Standards
• Two competing 3G standards
• Both standards use multi-mode CELP vocoders
1. 3GPP2/cdma2000 (SMV – Multimode rate set 1)
2. 3GPP/UMTS (AMR-NB Multi-rate)
Silence Compression
• Much of a conversation is silence (~40%), so there is no need to transmit it
• Voice Activity Detector (VAD)
– Hardware to detect silence periods quickly
• Variable Bit Rate coders – reduce bit rate during silence
• Discontinuous transmission (DTX)
– Stop transmitting frames
• Send minimal # of frames to keep connection up
• Comfort Noise Generator (CNG)
– Synthesize background noise to avoid: “Did you hang up?”
• Random noise or reproduce speaker’s ambient background
• For example, the GSM codec and the popular VoIP G.723.1 codec have VAD/DTX/CNG
• cdmaOne and cdma2000 codecs use the variable bit rate approach
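How VAD and DTX fit together can be sketched with a toy frame classifier. It assumes a fixed energy threshold as the voice activity detector and sends one keep-alive frame at the start of, and periodically during, each silent run. The threshold, the keep-alive interval and the labels "speech", "sid" and "skip" are hypothetical choices for illustration; real codecs use more robust detectors.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def vad_dtx(frames, threshold=1e-3, keepalive_every=8):
    """Label each frame 'speech', 'sid' (keep-alive / noise update) or 'skip'.

    'skip' frames are not transmitted; the receiver would fill them with
    comfort noise.
    """
    decisions = []
    silent_run = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            silent_run = 0
            decisions.append("speech")
        else:
            send_update = silent_run % keepalive_every == 0
            decisions.append("sid" if send_update else "skip")
            silent_run += 1
    return decisions
```

With roughly 40% of a conversation being silence, most frames in those periods fall into the "skip" case and consume no channel capacity.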
Silence Compression
Voice Coding
• Basic Voice Coding Approaches
– Waveform
– Vocoders
– Hybrid Vocoders
• Evaluation of Vocoder Quality
• Codebook based vocoders are used in new technology
• 3GPP and ITU recently standardized an AMR wideband CELP codec
– input 50-7000 Hz rather than the 300-3400 Hz of current systems
– more natural quality speech – slightly higher bit rate