
Speech compression techniques – Formant and CELP Vocoders


Introduction
• An earlier approach towards lossy compression is to model the source output and send the model parameters to the receiver instead of estimates of the source output.
• The receiver tries to synthesize the source output based on the received model parameters.
• Speech can be analyzed in terms of a model, and the model parameters can be extracted and transmitted to the receiver.
• At the receiver, the speech can be synthesized using the model.
• This analysis/synthesis approach was first employed by Homer Dudley at Bell Laboratories, who developed what is known as the channel vocoder.
• He developed a "speaking machine" in which the vocal tract was modeled by a flexible tube whose shape could be modified by an operator. Sound was produced by forcing air through this tube using bellows.
• Unlike speech, images are generated in a variety of different ways; therefore, the analysis/synthesis approach does not seem very useful for image or video compression.

Speech production mechanism
• Speech is produced by forcing air first through an elastic opening, the vocal cords, and then through the laryngeal, oral, nasal, and pharynx passages, and finally through the mouth and the nasal cavity.
• Everything past the vocal cords is generally referred to as the vocal tract.
• The first action generates the sound, which is then modulated into speech as it traverses the vocal tract.

Simplified model of speech synthesis
[Block diagram: an excitation source (corresponding to the sound generation) drives a vocal tract filter (which models the vocal tract).]
• At the transmitter, the speech is divided into segments. Each segment is analyzed to determine an excitation signal and the parameters of the vocal tract filter.
• In some of the schemes, a model for the excitation signal is transmitted to the receiver. The excitation signal is then synthesized at the receiver and used to drive the vocal tract filter.
• In other schemes, the excitation signal itself is obtained using an analysis-by-synthesis approach. This signal is then used by the vocal tract filter to generate the speech signal.
Channel Vocoder
• each segment of input speech is analyzed using a bank of
band-pass filters called the analysis filters.
• The energy at the output of each filter is estimated at fixed
intervals and transmitted to the receiver.
• In a digital implementation, the energy estimate may be
the average squared value of the filter output.
• In analog implementations, this is the sampled output of an
envelope detector.
• Generally, an estimate is generated 50 times every second.
• Along with the estimate of the filter output, a decision is
made as to whether the speech in that segment is voiced,
as in the case of the sounds /a/ /e/ /o/, or unvoiced, as in
the case for the sounds /s/ /f/.

The sound /e/ in test (male voice saying the word test)
• Voiced sounds tend to have a pseudoperiodic structure.
• The period of the fundamental harmonic is called the pitch period.
• The transmitter also forms an estimate of the pitch period, which is transmitted to the receiver.
The sound /s/ in test
• Unvoiced sounds tend to have a noiselike structure, e.g., the /s/ sound in the word test.

The Channel Vocoder (analyzer block diagram):
[Block diagram: the input s(n) drives a bank of bandpass filters; each branch is rectified, lowpass filtered, and A/D converted, and the resulting channel outputs, together with the outputs of a voicing detector and a pitch detector, are encoded and sent over the channel.]
The channel vocoder receiver
[Block diagram of the channel vocoder receiver.]
The Channel Vocoder (synthesizer):
• At the receiver, the vocal tract filter is implemented by a
bank of band-pass filters. The bank of filters at the
receiver, known as the synthesis filters, is identical to the
bank of analysis filters.
• Based on whether the speech segment was deemed to
be voiced or unvoiced, either a pseudonoise source or a
periodic pulse generator is used as the input to the
synthesis filter bank.
• The period of the pulse input is determined by the pitch estimate obtained at the transmitter for the segment being synthesized. The input is scaled by the energy estimate at the output of the analysis filters.
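As a rough sketch of this input stage (Python/NumPy assumed; the 8 kHz sampling rate is an assumption, while the 50 parameter updates per second follow the text above):

```python
import numpy as np

FS = 8000            # sampling rate (assumed for this sketch)
FRAME = FS // 50     # 50 parameter updates per second -> 160 samples

def excitation(voiced: bool, pitch_period: int) -> np.ndarray:
    """Generate one frame of excitation for the synthesis filter bank."""
    if voiced:
        e = np.zeros(FRAME)
        e[::pitch_period] = 1.0      # periodic pulses at the pitch period
    else:
        e = np.random.randn(FRAME)   # pseudonoise source
    return e
```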

Channel vocoder
• The channel vocoder employs a bank of bandpass filters,
– each having a bandwidth between 100 Hz and 300 Hz;
– typically, 16-20 linear-phase FIR filters are used.
• The output of each filter is rectified and lowpass filtered.
– The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract.
• In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the speech analysis.
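A minimal sketch of the analysis side, assuming NumPy/SciPy are available; the sampling rate, channel count, and band edges below are illustrative choices within the ranges just quoted, not values from any particular standard:

```python
import numpy as np
from scipy.signal import firwin, lfilter

FS = 8000                                  # assumed sampling rate (Hz)
N_CH = 16                                  # 16 channels (within 16-20)
EDGES = np.linspace(200, 3400, N_CH + 1)   # band edges -> 200 Hz channels

def analyze_frame(frame: np.ndarray) -> np.ndarray:
    """Return one energy estimate per bandpass channel."""
    energies = np.empty(N_CH)
    for k in range(N_CH):
        bp = firwin(101, [EDGES[k], EDGES[k + 1]], pass_zero=False, fs=FS)
        y = lfilter(bp, 1.0, frame)            # linear-phase FIR bandpass
        energies[k] = np.mean(np.abs(y) ** 2)  # rectify + average -> energy
    return energies
```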

The Phase Vocoder
• The phase vocoder is similar to the channel vocoder.
• However, instead of estimating the pitch, the phase vocoder estimates the phase derivative at the output of each filter.
• By coding and transmitting the phase derivative, this vocoder preserves the phase information.
The Phase Vocoder (analyzer block diagram):
[Block diagram, kth channel: s(n) is multiplied by cos(ωk n) and sin(ωk n); each product is lowpass filtered and decimated to give ak(n) and bk(n), from which the short-term magnitude and, via differentiators, the short-term phase derivative are computed and encoded for transmission.]
The Phase Vocoder (synthesizer block diagram, kth channel):
[Block diagram: the decoded short-term amplitude and phase derivative are interpolated; the phase derivative is integrated to recover the phase, whose cosine and sine are scaled by the amplitude and summed to form the channel output.]
The Phase Vocoder
• LPF bandwidth: 50 Hz
• Demodulation separation: 100 Hz
• Number of filters: 25-30
• Sampling rate of spectrum magnitude and phase derivative: 50-60 samples per second
• Spectral magnitude is coded using PCM or DPCM
• Phase derivative is coded linearly using 2-3 bits
• The resulting bit rate is 7200 bps
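One allocation consistent with this figure, though the slide does not spell out the split, is 30 filters × 60 samples per second × (2 bits for the magnitude + 2 bits for the phase derivative) = 7200 bps.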

Formant Vocoder
• As the vocal tract is a tube of nonuniform cross section, it resonates at a number of different frequencies. These frequencies are known as formants.
• The formant values change with different sounds; however, we can identify ranges in which they occur.
• For example, the first formant occurs in the range 200-800 Hz for a male speaker, and in the range 250-1000 Hz for a female speaker.
• The formant vocoder transmits an estimate of the formant values (usually four formants are considered sufficient) and an estimate of the bandwidth of each formant.
• At the receiver the excitation signal is passed through tunable filters that are tuned to the formant frequency and bandwidth.

Formant Vocoder
• The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech.
• It is this information, plus the pitch period, that is encoded and transmitted to the receiver.

Formant Vocoder
• The speech can be represented as the output of a linear time-varying system whose properties vary slowly with time.
• The digital model of speech production represents voiced speech by the pitch period, the amplitude, and the lowest three formant frequencies, and unvoiced speech simply by the amplitude and a single zero and pole.
• All these parameters vary with time.
Formant Vocoder (analyzer block diagram):
[Block diagram: the input speech is analyzed to produce the formant frequencies F1, F2, F3 and their bandwidths B1, B2, B3, together with the voiced/unvoiced (V/U) decision and the pitch F0, all of which are encoded for transmission.]
Fk: the frequency of the kth formant
Bk: the bandwidth of the kth formant
Formant Vocoder (synthesizer block diagram):
[Block diagram: the excitation signal, controlled by the V/U decision and the pitch F0, drives tunable resonators set to (F1, B1), (F2, B2), and (F3, B3); their outputs are summed to produce the synthesized speech.]
Digital model for speech production
[Figure: digital model for speech production.]

Linear Predictive Coder
• The vocal tract is modeled as a single linear filter whose output $y_n$ is related to the input $\xi_n$ by
$$y_n = \sum_{i=1}^{M} a_i\, y_{n-i} + G\,\xi_n$$
where $G$ is the gain and $M$ is the filter order.
• The input to the vocal tract filter is either the output of a random noise generator or a periodic pulse generator.
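A minimal sketch of this all-pole synthesis filter, assuming SciPy; the coefficients a and the gain G would come from the analysis stage described later:

```python
import numpy as np
from scipy.signal import lfilter

def synthesize(excitation: np.ndarray, a: np.ndarray, G: float) -> np.ndarray:
    """All-pole filtering: y_n = sum_i a_i * y_{n-i} + G * xi_n."""
    denom = np.concatenate(([1.0], -np.asarray(a)))  # A(z) = 1 - sum a_i z^-i
    return lfilter([G], denom, excitation)
```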

Speech synthesis model
• The input speech is generally sampled at 8000 samples per second.
• In the LPC-10 standard, the speech is broken into 180-sample segments, corresponding to 22.5 milliseconds of speech per segment.
• The samples of the voiced speech have larger amplitude; that is, there is more energy in the voiced speech.
• Also, the unvoiced speech contains higher frequencies.
• As both speech segments have average values close to zero, this means that the unvoiced speech waveform crosses the x = 0 line more often than the voiced speech sample.
Linear Predictive Coding (LPC-10)
• In the LPC-10 algorithm, the speech segment is first lowpass filtered using a filter with a bandwidth of 1 kHz.
• The energy at the output relative to the background noise is used to obtain a tentative decision about whether the signal in the segment should be declared voiced or unvoiced.
• The estimate of the background noise is basically the energy in the unvoiced speech segments. This tentative decision is further refined by counting the number of zero crossings and checking the magnitude of the coefficients of the vocal tract filter.
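A minimal sketch of such a voiced/unvoiced decision, assuming SciPy; the 1 kHz bandwidth follows the text, while the thresholds are invented for illustration, and the refinement based on the vocal tract filter coefficients is omitted:

```python
import numpy as np
from scipy.signal import firwin, lfilter

FS = 8000   # assumed sampling rate

def is_voiced(segment: np.ndarray, noise_energy: float) -> bool:
    lp = firwin(101, 1000, fs=FS)        # ~1 kHz lowpass, as in the text
    y = lfilter(lp, 1.0, segment)
    energy = np.mean(y ** 2)
    # Fraction of adjacent sample pairs whose sign changes (zero-crossing rate).
    zcr = np.mean(np.sign(segment[1:]) != np.sign(segment[:-1]))
    # Tentative decision on energy vs. background noise, refined by the
    # zero-crossing count; both thresholds are illustrative only.
    return energy > 4.0 * noise_energy and zcr < 0.1
```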

Principle
• A speech sample can be approximated as a linear
combination of past speech samples.
• By minimizing the sum of the squared differences between
the actual speech samples and the linearly predicted ones, a
unique set of predictor coefficients can be determined.
• Linear prediction provides a robust, reliable and accurate
method for estimating the parameters that characterize the
linear, time-varying system.
Steps
1. Voiced/unvoiced decision (based on the energy in the segment)
2. Pitch period estimation (using the autocorrelation function)
3. Obtaining the vocal tract filter: a linear filter whose coefficients are chosen in a minimum mean squared error sense
Autocorrelation function
• The autocorrelation of a periodic function, Rxx(k), will have a maximum when k is equal to the pitch period.
• Coupled with the fact that the estimation of the autocorrelation function generally leads to a smoothing out of the noise, this makes the autocorrelation function a useful tool for obtaining the pitch period.
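A minimal sketch of autocorrelation-based pitch estimation; the 8 kHz sampling rate and the 50-400 Hz pitch search range are assumptions, not values from the text:

```python
import numpy as np

FS = 8000   # assumed sampling rate

def pitch_by_autocorr(segment: np.ndarray) -> int:
    x = segment - segment.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # R_xx(k) for k >= 0
    lo, hi = FS // 400, FS // 50                      # candidate pitch lags
    return lo + int(np.argmax(r[lo:hi]))              # lag of the peak
```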

Problems with the use of the autocorrelation
• Voiced speech is not exactly periodic, which makes the maximum lower than we would expect from a periodic signal.
• Generally, a maximum is detected by checking the autocorrelation value against a threshold; if the value is greater than the threshold, a maximum is declared to have occurred.
• When there is uncertainty about the magnitude of the maximum value, it is difficult to select a value for the threshold.
• Another problem occurs because of the interference due to other resonances in the vocal tract.

Average magnitude difference function (AMDF)
• If a sequence $y_n$ is periodic with period $P_0$, samples that are $P_0$ apart in the $y_n$ sequence will have values close to each other, and therefore the AMDF
$$\mathrm{AMDF}(P) = \frac{1}{N}\sum_{n}\left|y_n - y_{n-P}\right|$$
will have a minimum at $P_0$.
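A minimal sketch of AMDF-based pitch estimation, under the same sampling-rate and search-range assumptions as the autocorrelation sketch above:

```python
import numpy as np

FS = 8000   # assumed sampling rate

def pitch_by_amdf(segment: np.ndarray) -> int:
    lo, hi = FS // 400, FS // 50    # candidate pitch periods
    amdf = [np.mean(np.abs(segment[p:] - segment[:-p])) for p in range(lo, hi)]
    return lo + int(np.argmin(amdf))  # AMDF has a minimum at the pitch period
```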

[Figure: AMDF function for the sound /e/ in test.]
[Figure: AMDF function for the sound /s/ in test.]
Obtaining the Vocal Tract Filter
• If $y_n$ are the speech samples in that particular segment, then we want to choose the coefficients $a_i$ to minimize the average value of $e_n^2$, where the prediction error is
$$e_n = y_n - \sum_{i=1}^{M} a_i\, y_{n-i}.$$

Autocorrelation approach
• We assume that the $y_n$ sequence is zero outside the segment for which we are calculating the filter parameters.
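A minimal sketch of this approach; the Levinson-Durbin recursion is the classical solver for the resulting Toeplitz normal equations, and scipy.linalg.solve_toeplitz plays that role here:

```python
import numpy as np
from scipy.linalg import solve_toeplitz   # exploits the Toeplitz structure

def lpc_autocorr(segment: np.ndarray, order: int = 10) -> np.ndarray:
    """Predictor coefficients a_1..a_M from the autocorrelation method."""
    x = segment * np.hamming(len(segment))  # window: zero outside the segment
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Normal equations R a = r, with R built from R(0)..R(M-1).
    return solve_toeplitz(r[:order], r[1:order + 1])
```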

Contd.
• Solving for the predictor coefficients (e.g., via the Levinson-Durbin recursion) also yields the reflection coefficients, or partial correlation (PARCOR) coefficients.

Problem
• In order to get an effective reconstruction of the voiced
segment, the order of the vocal tract filter needs to be
sufficiently high.
• Generally, the order of the filter is 10 or more.
• Because the filter is an IIR filter, error in the coefficients can
lead to instability, especially for the high orders necessary in
linear predictive coding.
• As the filter coefficients are to be transmitted to the receiver,
they need to be quantized. This means that quantization error
is introduced into the value of the coefficients, and that can
lead to instability.

Covariance method
• Discarding the assumption of stationarity made in the autocorrelation approach, the equations used to obtain the filter coefficients change.
• Defining $c_{ij} = E[y_{n-i}\, y_{n-j}]$ as a function of both $i$ and $j$, the coefficients are obtained by solving
$$\sum_{i=1}^{M} a_i\, c_{ij} = c_{0j}, \qquad j = 1, \ldots, M.$$
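A minimal sketch of the covariance method, building the $c_{ij}$ terms directly from the segment with no zero extension outside it:

```python
import numpy as np

def lpc_covariance(segment: np.ndarray, order: int = 10) -> np.ndarray:
    """Predictor coefficients from the covariance method (no zero extension)."""
    y, M = np.asarray(segment), order
    n = np.arange(M, len(y))                    # indices with a full history
    C = np.array([[np.dot(y[n - i], y[n - j])   # c_ij = sum_n y_{n-i} y_{n-j}
                   for j in range(1, M + 1)]
                  for i in range(1, M + 1)])
    c0 = np.array([np.dot(y[n], y[n - j]) for j in range(1, M + 1)])
    return np.linalg.solve(C, c0)               # sum_i a_i c_ij = c_0j
```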

Code excited linear prediction (CELP)
• In CELP, each trial waveform is synthesized by passing it through a two-part cascade synthesis filter.
• The first part, termed the pitch synthesis filter, inserts pitch periodicities into the reconstructed speech.
• The second filter is the formant synthesis filter, which introduces a frequency shaping related to the formant resonances produced by the human vocal tract.
• Both filters are all-pole structures, using an FIR filter in a feedback configuration.
CELP analyzer
• The formant filter F(z) removes sample-to-sample correlation.
• The predictive filter P(z) removes periodicities due to the pitch-excited nature of speech.
CELP synthesizer
• P(z), the pitch synthesis filter, inserts pitch periodicities into the reconstructed speech.
• F(z), the formant synthesis filter, introduces a frequency shaping related to the formant resonances produced by the human vocal tract.
CELP
• In CELP, the excitation waveform is chosen from a dictionary of waveforms.
• Conceptually, each waveform in the dictionary is passed through the synthesis filters to determine which waveform "best" matches the input speech.
• The optimality criterion is based on the same type of frequency-weighted mean-square error criterion used in multi-pulse coding.
• The index of the "best" waveform is transmitted to the decoder. In addition, both the formant and pitch filters are transmitted periodically.
• The parameters of these filters are sent to the decoder as side information to allow it to form the appropriate synthesis filters.
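A minimal sketch of the codebook search, assuming SciPy; the function and parameter names are hypothetical, and the frequency-weighted error of a real coder is simplified here to a plain mean-squared error:

```python
import numpy as np
from scipy.signal import lfilter

def search_codebook(target: np.ndarray, codebook: np.ndarray,
                    pitch_den: np.ndarray, formant_den: np.ndarray) -> int:
    """Index of the codeword whose synthesized output best matches target.

    pitch_den and formant_den are the denominator polynomials of the
    all-pole pitch synthesis filter P(z) and formant synthesis filter F(z).
    """
    errors = []
    for codeword in codebook:
        y = lfilter([1.0], pitch_den, codeword)   # insert pitch periodicity
        y = lfilter([1.0], formant_den, y)        # formant spectral shaping
        errors.append(np.mean((target - y) ** 2))
    return int(np.argmin(errors))
```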
