
Ajay Kumar Garg Engineering College

Wireless & Mobile Communication


(KEC-076)
UNIT 2
Vocoders
Lecture-1
By

Mr. Naveen Kumar Saini


Assistant Professor
Department of Electronics & Communication Engineering
VOCODERS
• A vocoder (short for "voice coder") is an audio processor used to transmit speech or voice signals in the form of digital data. Vocoders are used for digital coding of speech and for voice simulation. Available narrowband vocoders operate at bit rates from 1.2 to 64 kbps.
• A vocoder operates on the principle of formants. Formants are the meaningful components of speech generated by the human voice.
• When a speech signal is transmitted, it is not necessary to transmit the precise waveform; we can simply transmit the information from which that waveform can be reconstructed. The waveform reconstructed at the receiver must be perceptually similar to the waveform actually transmitted, though not necessarily identical to it.

• A vocoder works by first capturing the characteristic elements of the signal; these characteristic parameters are then used to shape other audio signals or to resynthesize the speech.
• Vocoders are a class of speech coding systems that analyze the voice signal at the
transmitter, transmit parameters derived from the analysis, and then synthesize the voice at
the receiver using those parameters.
• The pitch frequency for most speakers is below 300 Hz, and extracting this information
from the signal is very difficult.
• The pole frequencies correspond to the resonant frequencies of the vocal tract and are
often called the formants of the speech signal.
• For adult speakers, the formants are centered around 500 Hz, 1500 Hz, 2500 Hz, and
3500 Hz.

HUMAN SPEECH PRODUCTION SYSTEM

[Figure] Speech production: (a) human speech production modelling; (b) equivalent synthetic speech production blocks.
• A voice model is used to simulate voice. As speech contains a sequence of voiced
and unvoiced sounds, this is the basis for the operation of a voice model.
• Before proceeding further, it is better to first understand what voiced and
unvoiced sounds are.
• Voiced sounds are the sounds generated by vibrations of the vocal cords.
• By contrast, the sounds produced when pronouncing letters such as 's', 'p' or 'f' are known as unvoiced sounds. Unvoiced sounds are generated by expelling air through the lips and teeth.
• Voiced sounds are simulated by an impulse generator whose frequency equals the fundamental frequency of the vocal cords. The noise source in the circuit is used to simulate the unvoiced sounds.
• The position of the switch helps in determining whether the sound is voiced or
unvoiced.
• The selected signal is then passed through a filter that simulates the effect of the mouth, throat and nasal passage of the speaker. The filter shapes the input so that the required sound is produced. Thus we obtain a synthesized approximation of the speech waveform.
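The voiced/unvoiced switch, impulse generator, noise source and vocal-tract filter described above can be sketched in a few lines of Python. The pitch, frame length and one-pole filter coefficient below are illustrative choices, not values from any standard:

```python
import numpy as np

def synthesize(voiced, pitch_hz=120, fs=8000, n=800):
    """Toy voice model: impulse train (voiced) or noise (unvoiced),
    shaped by a simple one-pole 'vocal tract' filter."""
    if voiced:
        excitation = np.zeros(n)
        period = int(fs / pitch_hz)        # samples per pitch period
        excitation[::period] = 1.0         # impulse generator (vocal cords)
    else:
        excitation = np.random.randn(n)    # noise source (unvoiced sounds)
    # Stand-in for the mouth/throat/nasal filter: y[m] = e[m] + 0.9*y[m-1]
    a = 0.9
    out = np.zeros(n)
    out[0] = excitation[0]
    for m in range(1, n):
        out[m] = excitation[m] + a * out[m - 1]
    return out

s = synthesize(voiced=True)    # synthesized approximation of a voiced sound
```

Switching `voiced` between True and False plays the role of the voiced/unvoiced switch in the block diagram.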
VOICE ENCODER

VOICE DECODER

TYPES OF VOCODERS:
• Channel Vocoders
• Formant Vocoders
• Cepstrum Vocoders
• Voice-Excited Vocoders
• LPC-10 which uses linear predictive coding
• Code-excited linear prediction (CELP)
• Mixed-excitation linear prediction (MELP)
• Adaptive Differential Pulse Code Modulation (ADPCM)

Channel Vocoders
• The sound-generating mechanism forms the source and is linearly separable from the intelligence-modulating vocal-tract filter, which forms the system.

• The speech signal is assumed to be of two types: voiced and unvoiced

• Voiced sounds result from quasiperiodic vibrations of the vocal cords; unvoiced sounds are fricatives produced by turbulent air flow through a constriction.

• The pitch frequency for most speakers is below 300 Hz.

• The pole frequencies correspond to the resonant frequencies of the vocal tract and are often called the formants of the speech signal.
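A channel vocoder's analysis stage measures the energy in a bank of contiguous frequency bands. The sketch below approximates such a filter bank with FFT bins; the four 1 kHz-wide bands and the two-tone test signal are invented for illustration (real channel vocoders use on the order of 16 or more narrower bands):

```python
import numpy as np

fs, n = 8000, 256
t = np.arange(n) / fs
# Test frame: a strong 500 Hz tone (near a formant) plus a weak 2500 Hz tone.
x = np.sin(2 * np.pi * 500 * t) + 0.3 * np.sin(2 * np.pi * 2500 * t)

power = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(n, d=1 / fs)

# Four illustrative analysis bands; the vocoder would transmit one
# energy value per band instead of the waveform itself.
bands = [(0, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]
energies = [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]
```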
Formant Vocoders
• The formant vocoder is similar in concept to the channel vocoder.
• The formant vocoder can operate at lower bit rates than the channel vocoder
because it uses fewer control signals
• The formant vocoder attempts to transmit the positions of the peaks (formants) of
the spectral envelope, instead of sending samples
• A formant vocoder must be able to identify at least three formants for representing
the speech sounds
• It must control the intensities of the formants.
• Formant vocoder can reproduce speech at bit rates lower than 1200 bits/s.
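Since the formant vocoder transmits peak positions rather than band samples, its core analysis step is peak picking on the spectral envelope. Below, a synthetic envelope with bumps at the typical adult formant centres (500, 1500, 2500 Hz) stands in for a real LPC- or filter-bank-derived envelope:

```python
import numpy as np

# Made-up smooth spectral envelope with peaks at 500, 1500 and 2500 Hz.
freqs = np.arange(0.0, 4000.0, 10.0)
envelope = sum(np.exp(-(((freqs - f0) / 150.0) ** 2)) for f0 in (500, 1500, 2500))

# Minimal peak picking: a sample higher than both neighbours is a formant.
formants = [int(freqs[i]) for i in range(1, len(freqs) - 1)
            if envelope[i] > envelope[i - 1] and envelope[i] > envelope[i + 1]]
# The vocoder would transmit these peak positions (plus intensities) only.
```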

Cepstrum Vocoders:
• The cepstrum vocoder separates the excitation and the vocal tract spectrum by inverse Fourier transforming the log magnitude spectrum to produce the cepstrum of the signal.
• The low-quefrency coefficients in the cepstrum correspond to the vocal tract spectral envelope.
• The high-quefrency excitation coefficients form a periodic pulse train at multiples of the pitch period.
• Linear filtering (liftering) is performed to separate the vocal tract cepstral coefficients from the excitation coefficients.
• In the receiver, the vocal tract cepstral coefficients are Fourier transformed to produce the vocal tract impulse response.
• By convolving this impulse response with a synthetic excitation signal (random noise or a periodic pulse train), the original speech is reconstructed.
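The analysis chain above (FFT → log magnitude → inverse FFT → liftering) can be demonstrated on a synthetic frame. The pulse period, frame length and the cut at 30 low-quefrency bins are arbitrary illustrative choices:

```python
import numpy as np

fs, n = 8000, 1024
# Synthetic voiced frame: pulse train (pitch period = 100 samples)
# convolved with a decaying 'vocal tract' impulse response.
x = np.zeros(n)
x[::100] = 1.0
x = np.convolve(x, 0.95 ** np.arange(50), mode="same")

spectrum = np.fft.fft(x)
log_mag = np.log(np.abs(spectrum) + 1e-12)   # log magnitude spectrum
cepstrum = np.fft.ifft(log_mag).real         # inverse FFT -> real cepstrum

# Liftering: low-quefrency bins describe the vocal-tract envelope;
# the excitation shows up as peaks near multiples of the pitch period.
envelope_part = cepstrum[:30]
excitation_part = cepstrum[30:n // 2]
```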
Voice-Excited Vocoder:
• Voice-excited vocoders eliminate the need for pitch extraction and voicing
detection operations.

• This system uses a hybrid combination of PCM transmission for the low
frequency band of speech, combined with channel vocoding of higher frequency
bands.

• A pitch signal is generated at the synthesizer by rectifying, band pass filtering, and
clipping the baseband signal.

• Voice-excited vocoders have been designed for operation at 7200 bits/s to 9600 bits/s.
LPC-10 (which uses linear predictive coding)
• Linear predictive coding (LPC) is a method used mostly in audio signal
processing and speech processing for representing the spectral envelope of
a digital signal of speech in compressed form, using the information of
a linear predictive model.
• Most signals, such as speech, music and video signals, are partially predictable
and partially random.
• These signals can be modelled as the output of a filter excited by an uncorrelated
input.
• The random input models the unpredictable part of the signal, whereas the filter
models the predictable structure of the signal.
• The aim of linear prediction is to model the mechanism that introduces the
correlation in a signal.

• Speech is generated by inhaling air and then exhaling it through the glottis and the vocal tract. The noise-like air flow from the lungs is modulated and shaped by the vibrations of the glottal cords and the resonances of the vocal tract.
• Figure illustrates a source-filter model of speech. The source models the lung, and
emits a random input excitation signal which is filtered by a pitch filter.

The pitch filter (long-term predictor) models the correlation of each sample with the samples a pitch period away.
The vocal tract (short-term predictor) models the correlation of each sample with the few preceding samples.
• A linear predictor model forecasts the amplitude of a signal at time m, x(m), using a linearly weighted combination of P past samples [x(m−1), x(m−2), ..., x(m−P)]:

x̂(m) = Σ aₖ x(m−k),  summed over k = 1, 2, ..., P

where the integer variable m is the discrete time index, x̂(m) is the prediction of x(m), and the aₖ are the predictor coefficients.
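The prediction formula can be checked numerically. The order P = 2 and the coefficients below are invented so that the test signal follows the model exactly, making the prediction error (numerically) zero:

```python
import numpy as np

a = np.array([1.5, -0.81])   # example predictor coefficients a_1, a_2 (P = 2)

# Build a signal that obeys x(m) = a_1*x(m-1) + a_2*x(m-2) exactly.
x = np.zeros(50)
x[0], x[1] = 1.0, 1.5
for m in range(2, 50):
    x[m] = a[0] * x[m - 1] + a[1] * x[m - 2]

def predict(x, m, a):
    """x_hat(m) = sum over k = 1..P of a_k * x(m - k)"""
    return sum(a[k] * x[m - 1 - k] for k in range(len(a)))

errors = [x[m] - predict(x, m, a) for m in range(2, 50)]
```

In a real LPC-10 coder the aₖ are re-estimated every frame (for example with the Levinson–Durbin recursion) rather than fixed in advance.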

Code-excited linear prediction (CELP)
One of the main principles behind CELP is called analysis-by-synthesis (AbS), meaning that the encoding (analysis) is performed by perceptually optimising the decoded (synthesis) signal in a closed loop.

The CELP technique is based on three ideas:
1. The use of a linear prediction (LP) model to model the vocal tract
2. The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
3. A search performed in closed loop in a "perceptually weighted domain"
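The analysis-by-synthesis loop can be shown in miniature: synthesize every codebook entry through the LP filter and keep the index whose output best matches the target frame. The 16-entry random codebook, the one-pole "LP filter" and the unweighted squared error below are drastic simplifications of a real CELP coder:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 40))   # 16 candidate excitation vectors

def lp_synth(excitation, a=0.8):
    """Toy one-pole LP synthesis filter: y[m] = e[m] + a*y[m-1]."""
    y = np.zeros_like(excitation)
    y[0] = excitation[0]
    for m in range(1, len(excitation)):
        y[m] = excitation[m] + a * y[m - 1]
    return y

target = lp_synth(codebook[7])             # pretend this is the input frame

# Closed-loop (analysis-by-synthesis) search over the codebook:
errors = [np.sum((target - lp_synth(c)) ** 2) for c in codebook]
best = int(np.argmin(errors))              # only this index is transmitted
```

A real coder would weight the error with a perceptual filter and search an adaptive (pitch) codebook as well as the fixed one.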
Mixed-excitation linear prediction (MELP)
• The MELP vocoder evolved from improvements and modifications to the traditional linear predictive coder known as LPC-10.
• The MELP coder uses a mixed-excitation model that can produce more natural sounding
speech because it can represent a richer ensemble of possible speech characteristics.
• MELP encoding is robust in difficult acoustic environments with significant background
noise and reverberation such as those frequently encountered in commercial and military
communication systems.

• The Mixed Excitation Linear Prediction coder is based on the traditional Linear
Prediction Coder (LPC) parametric model, but also includes five additional features. They
are:
1. Mixed excitation
2. Aperiodic pulses
3. Adaptive spectral enhancement
4. Pulse dispersion
5. Fourier magnitude modeling.
• When the input speech is voiced, the MELP coder can synthesize using either periodic or
aperiodic pulses.
• Aperiodic pulses are used most often during transition regions between voiced and
unvoiced segments of the speech signal. This feature enables the decoder to reproduce
erratic glottal pulses without introducing tonal sounds.

Adaptive differential pulse code modulation
(ADPCM)
• Adaptive differential pulse code modulation (ADPCM) is a very efficient technique for the digital coding of waveforms.

• The principle of ADPCM is to use knowledge of the signal's past to predict its future values; the signal actually encoded is the error of this prediction.

• PCM is performed before ADPCM: each sample passes through a PCM stage before being transformed into an ADPCM sample, which decreases the number of bits needed for coding.

• ADPCM Encoder:
• Subsequent to the conversion of the A-law or µ-law PCM input signal to uniform PCM, a difference signal is obtained by subtracting an estimate of the input signal from the input signal itself.
• An adaptive 31-, 15-, 7-, or 4-level quantizer is used to assign five, four, three, or two
binary digits, respectively, to the value of the difference signal for transmission to the
decoder.

• ADPCM Encoder:
• An inverse quantizer produces a quantized difference signal from these same five, four,
three or two binary digits, respectively.
• The signal estimate is added to this quantized difference signal to produce the
reconstructed version of the input signal.
• Both the reconstructed signal and the quantized difference signal are operated upon by an
adaptive predictor, which produces the estimate of the input signal, thereby completing
the feedback loop.
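The encoder feedback loop described above (difference signal, quantizer, inverse quantizer, signal estimate) can be sketched with a fixed 4-level (2-bit) quantizer and a trivial "previous reconstructed value" predictor. Real G.726 ADPCM adapts both the quantizer step size and the predictor; this sketch does neither:

```python
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])   # fixed 4-level (2-bit) quantizer

def encode(samples):
    codes, estimate = [], 0.0
    for s in samples:
        d = s - estimate                     # difference signal
        code = int(np.argmin(np.abs(LEVELS - d)))
        codes.append(code)                   # 2 bits per sample transmitted
        estimate += LEVELS[code]             # feedback: mimic the decoder
    return codes

def decode(codes):
    out, estimate = [], 0.0
    for code in codes:
        estimate += LEVELS[code]             # inverse quantizer + predictor
        out.append(float(estimate))
    return out

reconstructed = decode(encode([0.0, 1.2, 2.5, 3.9, 3.0]))
```

Because the encoder's estimate is built from the quantized differences (not the true input), encoder and decoder stay in lockstep, which is the point of the feedback loop.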

• ADPCM Decoder:
• The decoder includes a structure identical to the feedback portion of the encoder, together with a uniform PCM to A-law or µ-law conversion and a synchronous coding adjustment.
• The synchronous coding adjustment prevents cumulative distortion occurring on
synchronous tandem coding (ADPCM, PCM, ADPCM, etc., digital connections) under
certain conditions.
• The synchronous coding adjustment is achieved by adjusting the PCM output codes in a
manner which attempts to eliminate quantizing distortion in the next ADPCM encoding
stage.

THANK YOU!!

