
Digital Speech Processing

David Tipper
Associate Professor
Graduate Program of Telecommunications and
Networking
University of Pittsburgh
Telcom 2720
Slides 7
http://www.sis.pitt.edu/~dtipper/tipper.html

Digital Speech Coding


• Digital Speech
– Convert analog speech to digital form and transmit digitally
• Applications
– Telephony (cellular, wired, and Internet/VoIP)
– Speech storage (automated call centers)
– High-fidelity recordings/voice
– Text-to-speech (machine-generated speech)
• Issues
– Efficient use of bandwidth
• Compress to a lower bit rate per user => more users
– Speech quality
• Want toll-grade or better quality in a specific transmission environment
• Environment (BER, packet loss, out-of-order packets, delay, etc.)
– Hardware complexity
• Speed (coding/decoding delay), computation requirement, and power consumption

Digital Speech Processing

• Speech coding in wireless systems


– All 1G systems have analog speech transmission
– 2G and 3G systems have digital speech
– Type of source coding

• Motivation for digital speech


– Increase system capacity
• Compression possible
• Quality/bandwidth tradeoffs can be made
– Improve quality of speech
• Error control coding possible, equalization, etc.
– Improve security as encryption possible for privacy
– Reduce cost, including operations and maintenance (OAM)


Typical Wireless Communication System

Transmit path: Source → Source Encoder → Channel Encoder → Modulator → Channel
Receive path: Channel → Demodulator → Channel Decoder → Source Decoder → Destination

Characteristics of Speech
• Bandwidth
– Most of the energy lies between 20 Hz and about 7 kHz
– Human ear is sensitive to energy between 50 Hz and 4 kHz
• Time Signal
– High correlation
– Short-term stationary
• Classified into four categories
– Voiced: created by air passed through the vocal cords (e.g., ah, v)
– Unvoiced: created by air through the mouth and lips (e.g., s, f)
– Mixed or transitional
– Silence


Characteristics of Speech

[Figure: typical voiced speech and typical unvoiced speech]

Digital Speech
• Speech Coder: device that converts speech to digital
• Types of speech coders
– Waveform coders
• Convert any analog signal to digital form
– Vocoders (Parametric coders)
• Try to exploit special properties of speech signal to reduce bit rate
• Build model of speech – transmit parameters of model
– Hybrid Coders
• Combine features of waveform and vocoders


Speech Quality of Various Coders

[Figure: speech quality of various coders]
• Mean Opinion Score is a subjective measure of quality
• Tradeoff in quality vs. data rate vs. complexity

Waveform Coders (e.g.,PCM)
• Waveform Coders
• Convert any analog signal to digital form - basically an A/D converter
• Analog signal is sampled at more than twice its highest frequency, then quantized into n-bit samples
• Uniform quantization
• Example: Pulse Code Modulation
• band-limit speech to < 4000 Hz
• pass speech through μ-law compander
• sample at 8000 Hz, 8-bit samples
• 64 Kbps DS0 rate
• Characteristics
• Quality – High
• Complexity – Low
• Bit rate – High
• Delay – Low
• Robustness – High
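
The sampling and quantization steps above can be sketched in a few lines of Python. This is only an illustration: the function names and the mid-rise quantizer layout are assumptions made here, not part of the slides. It reproduces the 64 Kbps DS0 figure and shows that an n-bit uniform quantizer keeps the error within half a step.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of a PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample


def uniform_quantize(x: float, n_bits: int) -> int:
    """Map x in [-1, 1] to one of 2**n_bits equally wide intervals."""
    levels = 2 ** n_bits
    x = max(-1.0, min(1.0, x))             # clip to the quantizer range
    index = int((x + 1.0) / 2.0 * levels)  # interval number, 0 .. levels
    return min(index, levels - 1)          # x == 1.0 falls in the top interval


def dequantize(index: int, n_bits: int) -> float:
    """Return the midpoint of the interval selected by index."""
    step = 2.0 / (2 ** n_bits)
    return -1.0 + (index + 0.5) * step


print(pcm_bit_rate(8000, 8))               # 8000 samples/s * 8 bits
print(dequantize(uniform_quantize(0.3, 8), 8))
```

With 8 bits over the range [-1, 1] the step is 2/256, so the reconstruction error for any in-range sample is at most 1/256.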

PCM Speech Coding

Transmitter: Analog input → Bandpass filter → Analog compressor (μ-law compander) → Sample-and-hold circuit → PAM → Analog-to-digital converter → PCM → Transmission medium
Receiver: Transmission medium → Digital-to-analog converter → PAM → Hold circuit → Analog expander (μ-law expander) → Bandpass filter → Analog output

Pulse code modulation (PCM) system with analog companding then digital conversion
– ITU G.700 standard basis for speech coding in PSTN in the 1960s
Companding
μ-Law companding
[Figure: compander characteristic, output vs. input (both axes 0 to 1), with curves for μ = 0 (no compression), 5, 15, 40, 100, and 255]
Analog compander emphasizes small values and de-emphasizes large values in order to equalize SNR across samples.
Reverse the mapping at the receiver with an expander.
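
The companding curve can be written down directly. The sketch below uses the standard continuous μ-law formula, y = sgn(x) · ln(1 + μ|x|) / ln(1 + μ), on samples normalized to [-1, 1]. The function names are chosen here for illustration and μ is assumed to be greater than 0 (μ = 0 is the limiting no-compression case and would divide by zero).

```python
import math

MU = 255.0  # value used in the μ = 255 curve


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Compress a sample in [-1, 1]; small magnitudes are boosted."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse mapping applied at the receiver."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(f"{x:5.2f} -> {y:.3f} -> {mu_law_expand(y):.3f}")
```

An input of 0.1 maps to roughly 0.59, so small amplitudes occupy a much larger share of the quantizer range than they would with uniform quantization alone.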


PCM Speech Coding


• Digitally companded PCM system – ITU G.711 standard
• better quality speech than analog companding

PCM transmitter: Analog input → Bandpass filter → Sample-and-hold circuit → PAM → Analog-to-digital converter → Linear PCM → Digital compressor → Compressed PCM → Transmission medium
PCM receiver: Transmission medium → Digital expander → Linear PCM → Digital-to-analog converter → PAM → Hold circuit → Bandpass filter → Analog output

• Differential PCM (DPCM): reduces bit rate from 64 Kbps to 32 Kbps
• since the change between successive samples is small, transmit one full sample
• then transmit only the difference between samples, using 4 bits to quantize it
• adaptively adjusting the range of the quantizer improves quality (ADPCM, ITU G.726)
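
A minimal sketch of the DPCM idea, assuming a fixed (non-adaptive) quantizer: the first sample is sent in full and every later sample is sent as a 4-bit quantized difference from the decoder's running reconstruction. The step size of 0.05 and the function names are illustrative choices, not values from G.726.

```python
import math


def dpcm_encode(samples, step=0.05, bits=4):
    """Return the first sample plus a list of signed difference codes."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1   # -8 .. 7 for 4 bits
    first = samples[0]
    reconstructed = first          # what the decoder will have so far
    codes = []
    for s in samples[1:]:
        code = round((s - reconstructed) / step)
        code = max(lo, min(hi, code))                  # clip to 4-bit range
        codes.append(code)
        reconstructed += code * step                   # mirror the decoder
    return first, codes


def dpcm_decode(first, codes, step=0.05):
    """Accumulate the differences to rebuild the signal."""
    out = [first]
    for code in codes:
        out.append(out[-1] + code * step)
    return out


# 200 Hz tone sampled at 8000 Hz: neighboring samples differ only slightly.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
first, codes = dpcm_encode(tone)
decoded = dpcm_decode(first, codes)
print(max(abs(a - b) for a, b in zip(tone, decoded)))
```

Because the encoder tracks the decoder's reconstruction rather than the true previous sample, quantization errors do not accumulate. At 8000 samples/s and 4 bits per difference the rate is 32 Kbps.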

DPCM Speech Coding
DPCM transmitter: Analog input → Low-pass filter → Differentiator (summer) → Analog-to-digital converter → Encoded difference samples
Feedback path: Encoded difference samples → Digital-to-analog converter → Integrator → Accumulated signal level → Differentiator (summer)

DPCM receiver: DPCM input → Digital-to-analog converter → Integrator → Hold circuit → Low-pass filter → Analog output


Subband Speech Coding

Analog speech → Bandpass Filters 1-3 (in parallel) → A/D 1-3 → Mux → Channel encoder

Partition signal into non-overlapping frequency bands; use a different A/D quantizer for each band
Example: 3 subbands

Band   Range          Encoding
----------------------------------------
1      50-700 Hz      4 bits
2      700-2000 Hz    3 bits
3      2000-3400 Hz   2 bits

5600 + 12000 + 13600 = 31200 bps = 31.2 Kbps
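
The 31.2 Kbps total can be reproduced with a short calculation. The sketch assumes each band is sampled at twice its upper edge frequency, which is the reading that matches the three terms in the sum above. The function name is illustrative.

```python
# (low edge Hz, high edge Hz, bits per sample) for each subband
BANDS = [(50, 700, 4), (700, 2000, 3), (2000, 3400, 2)]


def subband_bit_rate(bands):
    """Sum of per-band rates, sampling each band at 2 * its upper edge."""
    return sum(2 * high * bits for _low, high, bits in bands)


per_band = [2 * high * bits for _low, high, bits in BANDS]
print(per_band, subband_bit_rate(BANDS))
```

Giving the low band the most bits reflects the table: more precision is spent where most of the speech energy lies.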

Vocoders
• Vocoders (Parametric Coders)
• Models the vocalization of speech
• Speech sampled and broken into frames (~25 msec)
• Instead of transmitting digitized speech
1. Build model of speech
2. Transmit parameters of model
3. Synthesize approximation of speech
• Linear Predictive Coders (LPC) basic Vocoder model
• Models vocal tract as a filter
• Filter excitation
• periodic pulse (voiced speech) or noise (unvoiced speech)
• Transmitted parameters:
• gain, voiced/unvoiced decision, pitch (if voiced), LPC parameters

Vocoders
• Linear Predictive Coders (LPC)
• Excitation
• periodic pulse (voiced speech) or noise (unvoiced speech)
• Transmitted parameters: gain, voiced/unvoiced decision, pitch (if voiced), LPC parameters

Vocoders
• Example: Tenth-Order Linear Predictive Coder
• Samples voice at 8000 Hz – buffers 240 samples => 30 msec
• Filter model
• (M = 10 is the order, G is the gain, z^{-1} is a unit delay, b_k are the filter coefficients)

$$H(z) = \frac{G}{1 + \sum_{k=1}^{M} b_k z^{-k}}$$

• G = 5 bits, b_k = 8 bits each, voiced/unvoiced decision = 1 bit, pitch = 6 bits => 92 bits/30 msec = 3067 bps
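
As a rough illustration of this example, the sketch below computes the per-frame bit budget and runs an all-pole synthesis filter matching H(z), i.e. s[n] = G·e[n] − Σ b_k·s[n−k]. The function names and the one-coefficient test filter are assumptions for demonstration only. A real coder would use ten coefficients estimated from each frame.

```python
def lpc_bits_per_frame(order=10, gain_bits=5, coeff_bits=8,
                       vuv_bits=1, pitch_bits=6):
    """Bits needed to send one frame of LPC parameters."""
    return gain_bits + order * coeff_bits + vuv_bits + pitch_bits


def lpc_synthesize(excitation, b, gain):
    """All-pole filter: s[n] = gain*e[n] - sum_k b[k-1]*s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, bk in enumerate(b, start=1):
            if n - k >= 0:
                acc -= bk * out[n - k]
        out.append(acc)
    return out


bits = lpc_bits_per_frame()          # 5 + 10*8 + 1 + 6
rate_bps = bits * 8000 / 240         # one frame per 240 samples at 8000 Hz
print(bits, round(rate_bps))

# Single pulse through a first-order filter (b_1 = -0.5, G = 2)
print(lpc_synthesize([1.0, 0.0, 0.0], [-0.5], 2.0))
```

For voiced speech the excitation would be a pulse train at the pitch period. For unvoiced speech it would be noise.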


Vocoders
• LPC coders can achieve low bit rates 1.2 – 4.8 Kbps

• Characteristics of LPC
• Quality – Low
• Complexity – Moderate
• Bit Rate – Low
• Delay – Moderate
• Robustness – Low

• Quality of a pure LPC vocoder is too low for cellular telephony - try to improve quality by using hybrid coders

• Try to improve the quality by
• refining the model of speech
• improving the accuracy of the model
• improving the input to the speech coder

Vocoders
• Hybrid Coders
• Combine vocoder and waveform coder concepts
• Residual excited LPC (RELP)
• Codebook excited LPC (CELP)


RELP Vocoder
• Residual Excited LPC
• improve quality of LPC by transmitting error (residue) along with LPC parameters

[Figure: block diagram of a RELP encoder. Labeled blocks: buffer/window on the input s(n), ST-LP analysis, LPC synthesis, and encoder. Labeled signals: LP parameters, v/u decision, gain, pitch, residue, and encoded output.]

GSM Speech Coding
Analog speech → Low-pass filter → A/D (8000 samples/s, 13 bits/sample; 104 kbps) → RPE-LTP speech encoder (13 kbps) → Channel encoder

• GSM uses a Regular Pulse Excited - Linear Predictive Coder (RPE-LPC) for speech
– Basically combines the DPCM concept with LPC
– Information from previous samples is used to predict the current sample
– The LPC coefficients, plus an encoded form of the residual (predicted - actual sample = error), represent the signal


GSM Speech Coding (cont)


Regular pulse excited - long term prediction (RPE-LTP) speech encoder (RELP speech coder)

Input: 160 samples/20 ms from A/D (= 2080 bits) → RPE-LTP speech encoder
Output: 260 bits/20 ms to channel encoder, made up of
– 36 LPC bits/20 ms
– 9 LTP bits/5 ms
– 47 RPE bits/5 ms

LPC: linear prediction coding filter
LTP: long term prediction – pitch + input
RPE: regular pulse excitation – encodes the residual prediction error
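
The frame figures above fit together as simple arithmetic. The sketch below assumes that the LTP and RPE bits are produced once per 5 ms subframe, giving four subframes per 20 ms frame. The variable names are illustrative.

```python
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 13
FRAME_MS = 20
SUBFRAME_MS = 5

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000       # 160 samples
input_bits = samples_per_frame * BITS_PER_SAMPLE            # bits from A/D
input_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE           # A/D output rate

subframes = FRAME_MS // SUBFRAME_MS                         # 4 per frame
lpc_bits, ltp_bits, rpe_bits = 36, 9, 47
output_bits = lpc_bits + subframes * (ltp_bits + rpe_bits)  # bits per frame
output_rate_bps = output_bits * 1000 // FRAME_MS

print(input_bits, input_rate_bps, output_bits, output_rate_bps)
```

The encoder therefore compresses each 2080-bit input frame to 260 bits, a ratio of 8 to 1.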

GSM Speech Coding (cont)
Channel encoder
Input: 260 bits/20 ms = 13 kb/s, split into 50 class 1a bits, 132 class 1b bits, and 78 class 2 bits
50 class 1a bits → 3-bit CRC → 53 bits
53 bits + 132 class 1b bits + 4 tail bits* → (2,1,5) convolutional coder → 378 bits
378 bits + 78 class 2 bits → Bit interleaver → 456 bits/20 ms = 22.8 kb/s

Class 1a: CRC (3-bit error detection) and convolutional coding (error correction)
Class 1b: convolutional coding
Class 2: no error protection
*tail bits to periodically reset convolutional coder
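
The bit counts through the channel encoder can be checked with a short calculation. The sketch assumes the GSM full-rate split of 50 class 1a, 132 class 1b, and 78 class 2 bits (which sums to the 260-bit input) and a rate-1/2 convolutional coder. It only counts bits and does not implement the CRC or the coder itself.

```python
CLASS_1A, CLASS_1B, CLASS_2 = 50, 132, 78   # assumed split of the 260 bits
CRC_BITS, TAIL_BITS = 3, 4
CODE_RATE_INVERSE = 2                        # (2,1,5): 2 output bits per input bit
FRAME_MS = 20

speech_bits = CLASS_1A + CLASS_1B + CLASS_2                  # encoder input
coder_input = CLASS_1A + CRC_BITS + CLASS_1B + TAIL_BITS     # protected bits
coded_bits = CODE_RATE_INVERSE * coder_input                 # after coding
frame_bits = coded_bits + CLASS_2                            # class 2 sent as-is
channel_rate_bps = frame_bits * 1000 // FRAME_MS

print(speech_bits, coder_input, coded_bits, frame_bits, channel_rate_bps)
```

Only the perceptually most important bits pay the rate-1/2 overhead, which is how the coded frame stays at 456 bits rather than doubling to 520.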

Hybrid Vocoders
• Codebook Excited LPC
• Problem with simple LPC is that the U/V decision and pitch estimation do not model transitional speech well and are not always accurate
• Codebook approach – pass speech through an analyzer to find the closest match in a set of possible excitations (codebook)
• Transmit codebook pointer + LPC parameters
• Used in the NA-TDMA standard, IS-95, 3G, and the ITU G.729 standard
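
The codebook search can be sketched as analysis-by-synthesis: each candidate excitation is passed through the LPC synthesis filter, and the entry whose scaled output is closest to the target speech is selected. This is a simplified illustration with a tiny hand-made codebook. Real CELP coders add perceptual weighting and an adaptive (pitch) codebook, which are omitted here, and all names are hypothetical.

```python
def synthesize(excitation, b):
    """All-pole LPC synthesis: s[n] = e[n] - sum_k b[k-1]*s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, bk in enumerate(b, start=1):
            if n - k >= 0:
                acc -= bk * out[n - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, b):
    """Return (index, gain, error) of the best-matching codebook entry."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(code, b)
        energy = sum(v * v for v in synth)
        if energy == 0.0:
            continue
        # Least-squares optimal gain for this entry
        gain = sum(t * v for t, v in zip(target, synth)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1]]
b = [-0.5]                                   # toy first-order filter
target = [2.0 * v for v in synthesize(codebook[2], b)]
index, gain, error = search_codebook(target, codebook, b)
print(index, gain, error)
```

Only the index and gain (the "codebook pointer") plus the LPC parameters need to be transmitted. The decoder holds the same codebook and regenerates the excitation.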

Typical CELP Encoder


CELP Speech Coders


• General CELP architecture

CELP Speech Coders
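[Figure: three codebooks feeding an LPC synthesis filter. Code Book #1 (labeled i, gain γ1), Code Book #2 (labeled l, gain γ2), and Code Book #3 (labeled h, gain β) drive the LPC synthesis filter, which takes the filter coefficients and produces the output.]

Block diagram of the NA-TDMA (IS-54) speech coder – subband codebook approach – termed vector sum excited LPC (VSELPC)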

Evaluating Speech Coders


• Qualitative Comparison
– based on subjective procedures in ITU-T Rec. P.830
• Major Procedures
• Absolute Category Rating
– Subjects listen to samples and rank them on an absolute scale - result is a mean opinion score (MOS)
• Comparison Category Rating
– Subjects listen to coded samples and the original uncoded sample (PCM or analog); the two are compared on a relative scale – result is a comparison mean opinion score (CMOS)

Mean Opinion Score (MOS)
-------------------------------------
Excellent          5
Good               4
Fair               3
Poor               2
Bad                1

Comparison MOS (CMOS)
-------------------------------------
Much Better        3
Better             2
Slightly Better    1
About the Same     0
Slightly Worse    -1
Worse             -2
Much Worse        -3
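
Both scores are plain averages of listener ratings mapped onto the scales above. A minimal sketch follows. The rating lists are made-up sample data for illustration, not results from any test.

```python
from statistics import mean

ACR_SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}
CCR_SCALE = {"Much Better": 3, "Better": 2, "Slightly Better": 1,
             "About the Same": 0, "Slightly Worse": -1, "Worse": -2,
             "Much Worse": -3}


def mean_opinion_score(ratings):
    """Average of Absolute Category Rating labels."""
    return mean(ACR_SCALE[r] for r in ratings)


def comparison_mos(ratings):
    """Average of Comparison Category Rating labels."""
    return mean(CCR_SCALE[r] for r in ratings)


# Hypothetical listener responses
print(mean_opinion_score(["Good", "Good", "Fair", "Excellent"]))
print(comparison_mos(["Slightly Worse", "About the Same", "Worse",
                      "Slightly Worse"]))
```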
Evaluating Speech Coders
MOS for clear channel environment – no errors
Results vary a little with language and speaker gender

Standard    Speech coder     Bit rate     MOS
--------------------------------------------------
PCM         Waveform         64 Kbps      4.3
CT2         ADPCM            32 Kbps      4.1
DECT        ADPCM            32 Kbps      4.1
NA-TDMA     Hybrid VSELPC    8 Kbps       3.0
GSM         Hybrid RELPC     13 Kbps      3.54
QCELP       Hybrid CELP      14.4 Kbps    3.4 – 4.0
QCELP       Hybrid CELP      9.6 Kbps     3.4
LPC         Vocoder          2.4 Kbps     2.5
ITU G.729   Hybrid CELP      8 Kbps       3.9


Evaluating Speech Coders


• Types of environments recommended for testing coder quality
– Clean channel: no background noise
– Vehicle: emulate car background noise
– Street: emulate pedestrian environment
– Hoth: emulate background noise in office environment (voice band interference)
• Consider environments above for cases of
– Perfect channel – no transmission errors
– Random channel errors
– Bursty channel errors
• May consider repeated encoding/decoding (e.g., mobile to mobile call)

Evaluating Speech Coders

• Background noise and errors degrade quality
• Repeated coding degrades quality


Codec Selection
• For cellular, need to consider quality, complexity, delay, and compression rate

ITU Coder    Bit Rate        Coding Delay   Decoding Delay   Complexity
------------------------------------------------------------------------
G.711        64 Kbps         0              0                Low
G.729        8 Kbps          15 ms          7.5 ms           Medium
G.723.a,b    6.4/5.3 Kbps    35.5 ms        18.75 ms         High

3G Standards
• Two competing 3G standards
• Both standards use multi-mode CELP vocoders

1. 3GPP2/cdma2000 (SMV – multimode, rate set 1)
– Variable bit rate vocoder
– Source control of bit rate
– Channel coding treats all bits equally

2. 3GPP/UMTS (AMR-NB, multi-rate)
– Fixed rate vocoder
– Voice Activity Detection, Discontinuous Transmission, network control of coder rate
– Tailors channel coding to speech coder


Silence Compression
• Much of a conversation is silence (~40%): no need to transmit
• Voice Activity Detector (VAD)
– Hardware to detect silence periods quickly
• Variable Bit Rate coders – reduce bit rate during silence
• Discontinuous transmission (DTX)
– Stop transmitting frames
• Send minimal # of frames to keep connection up
• Comfort Noise Generator (CNG)
– Synthesize background noise; avoids “Did you hang up?”
• Random noise or reproduce speaker’s ambient background
• For example, the GSM codec and the popular VoIP G.723.1 codec have VAD/DTX/CNG
• cdmaOne and CDMA2000 codecs use the variable bit rate approach
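
A rough sketch of how VAD and DTX interact, assuming a simple energy-threshold detector. Real codecs use more elaborate detectors, and the threshold, frame length, and keep-alive interval below are arbitrary illustrative values. Frames judged silent are suppressed except for an occasional keep-alive frame.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_voice(samples, frame_len=160, threshold=1e-3):
    """Return one True/False voice-activity decision per full frame."""
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        decisions.append(frame_energy(frame) > threshold)
    return decisions


def dtx_schedule(decisions, keepalive_every=8):
    """Decide what to send per frame: speech, a keep-alive, or nothing."""
    schedule = []
    silent_run = 0
    for active in decisions:
        if active:
            schedule.append("SPEECH")
            silent_run = 0
        else:
            send = silent_run % keepalive_every == 0
            schedule.append("KEEPALIVE" if send else "NONE")
            silent_run += 1
    return schedule


# One loud frame, three silent frames, one loud frame (160 samples each)
signal = [0.5] * 160 + [0.0] * 480 + [0.5] * 160
decisions = detect_voice(signal)
print(decisions)
print(dtx_schedule(decisions, keepalive_every=2))
```

At the receiver, a comfort noise generator would fill the "NONE" frames so the line does not sound dead.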

Silence Compression


Voice Coding
• Basic Voice Coding Approaches
– Waveform
– Vocoders
– Hybrid Vocoders
• Evaluation of Vocoder Quality
• Codebook-based vocoders used in new technology
• 3GPP and ITU recently standardized an AMR wideband CELP coder
– input 50-7000 Hz rather than the 300-3400 Hz of current systems
– more natural quality speech – slightly higher bit rate