CSPL 392
X[n] = -( a_1 X[n-1] + a_2 X[n-2] + ... + a_L X[n-L] ) + e[n]        (1.1)
Each sample is represented as a linear combination of the previous L samples plus a white-noise term. The weighting coefficients a_1, a_2, ..., a_L are called Linear Prediction Coefficients (LPCs). We now describe how CELP uses this model to encode speech.
The samples of the input speech are divided into blocks of N samples each, called frames. Each frame is typically 10-20 ms long (this corresponds to N = 80-160). Each frame is divided into smaller blocks of k samples each (k being the dimension of the VQ), called subframes. For each frame, we choose a_1, a_2, ..., a_L so that the spectrum of {X_1, X_2, ..., X_N}, generated using the above model, closely matches the spectrum of the input speech frame. This is a standard spectral estimation problem, and the LPCs a_1, a_2, ..., a_L can be computed using the Levinson-Durbin algorithm.
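The Levinson-Durbin recursion can be sketched as follows. This is a minimal NumPy version, assuming the sign convention of Eq. (1.2), A(z) = 1 + a_1 z^-1 + ... + a_L z^-L; the random frame below is only a stand-in for real speech samples.

```python
import numpy as np

def levinson_durbin(r, L):
    """From autocorrelations r[0..L], return the coefficients
    [1, a_1, ..., a_L] of A(z) via the Levinson-Durbin recursion."""
    a = np.zeros(L + 1)
    a[0] = 1.0
    E = r[0]                               # prediction-error energy
    for i in range(1, L + 1):
        acc = np.dot(a[:i], r[i:0:-1])     # a_0*r_i + a_1*r_{i-1} + ... + a_{i-1}*r_1
        k = -acc / E                       # reflection coefficient
        a[1:i + 1] += k * a[i - 1::-1]     # order update of a_1..a_i
        E *= (1.0 - k * k)                 # updated error energy
    return a

# autocorrelation of one N = 80 frame, then a 10th-order fit
frame = np.random.randn(80)                # stand-in for a speech frame
r = np.array([frame[:80 - m] @ frame[m:] for m in range(11)])
lpcs = levinson_durbin(r, 10)              # lpcs[0] == 1, lpcs[1:] are a_1..a_10
```

For an exact AR-1 autocorrelation sequence r = [1, rho, rho^2], the recursion returns a_1 = -rho and a_2 = 0, which is a convenient sanity check.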
Fig. 1: Basic CELP scheme: the error is minimized by selecting the best codebook entry.
Writing Eq.(1.1) in z-domain, we obtain
) (
1
) .... ( 1
1
) (
) (
2
2
1
1
z A z a z a z a z E
z X
L
L
=
+ + +
=
(1.2)
From Eqs. (1.1) and (1.2), we see that if we pass a white sequence e[n] through the filter 1/A(z), we can generate X(z), a close reproduction of the input speech.
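As a quick illustration, passing white noise through 1/A(z) is a one-liner with SciPy. This is a first-order toy example; the coefficient value is illustrative, not estimated from speech.

```python
import numpy as np
from scipy.signal import lfilter

A = np.array([1.0, -0.9])      # toy A(z) = 1 - 0.9 z^-1 (illustrative a_1)
e = np.random.randn(160)       # white excitation e[n]
x = lfilter([1.0], A, e)       # x[n] = e[n] + 0.9*x[n-1]
```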
The block diagram of a CELP encoder is shown in Fig. 1. There is a codebook of size M and dimension k, available to both the encoder and the decoder. The codevectors have components that are all independently chosen from an N(0,1) distribution, so that each codevector has an approximately white spectrum. For each subframe of input speech (k samples), the processing is done as follows: each of the codevectors is filtered through the two filters (labeled 1/A(z) and 1/B(z)), and the output y_k is compared to the speech samples. The codevector whose output best matches the input speech (least MSE) is chosen to represent the subframe.
The first of the filters, 1/A(z), is described by Eq. (1.2). It shapes the white spectrum of the codevector to resemble the spectrum of the input speech. Equivalently, in the time domain, the filter introduces short-term correlations (correlation with the L previous samples) into the white sequence. Besides these short-term correlations, it is known that regions of voiced speech exhibit long-term
periodicity. This period, known as pitch, is introduced into the synthesized spectrum by the pitch
filter 1/B(z). The time-domain behavior of this filter can be expressed as

y[n] = x[n] + y[n - P],        (1.3)

where x[n] is the input, y[n] is the output, and P is the pitch.
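A direct time-domain implementation of Eq. (1.3) is a one-tap long-term predictor. The sketch below is a minimal version; the optional `state` argument (an assumption here, not part of the report) carries the last P outputs across subframes.

```python
import numpy as np

def pitch_filter(x, P, state=None):
    """One-tap pitch filter of Eq. (1.3): y[n] = x[n] + y[n - P]."""
    past = np.zeros(P) if state is None else np.asarray(state, float)
    y = np.concatenate((past, np.zeros(len(x))))
    for n in range(len(x)):
        y[P + n] = x[n] + y[n]     # output depends on the output P samples back
    return y[P:]                   # drop the carried-in state

y = pitch_filter([1.0, 2.0, 3.0, 4.0], P=2)
# y = [1, 2, 4, 6]: the first two outputs recur into the last two
```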
The speech synthesized by the filtering is scaled by an appropriate gain to make the energy
equal to the energy of the input speech. To summarize: for every frame of speech (N samples), we compute the LPCs and pitch and update the filters. For every subframe of speech (k samples),
the codevector that produces the best filtered output is chosen to represent the subframe.
The decoder receives the index of the chosen codevectors and the quantized value of gain for
each subframe. The LPCs and the pitch values also have to be quantized and sent every frame for
reconstructing the filters at the decoder. The speech signal is reconstructed at the decoder by
passing the chosen codevectors through the filters.
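The per-subframe search described above can be sketched as follows. This is a simplified version under stated assumptions: zero filter state, illustrative values for A(z), the pitch P, and a pitch gain g; a real coder carries filter state across subframes and subtracts the zero-input response first.

```python
import numpy as np
from scipy.signal import lfilter

def encode_subframe(target, codebook, A, P, g):
    """Exhaustive codebook search for one subframe: filter each codevector
    through 1/A(z) and the pitch filter, scale the gain to match energies,
    and keep the index with the least MSE."""
    best_i, best_gain, best_err = 0, 0.0, np.inf
    for i, d in enumerate(codebook):
        y = lfilter([1.0], A, d)                      # short-term filter 1/A(z)
        for n in range(P, len(y)):                    # pitch filter 1/B(z)
            y[n] += g * y[n - P]
        gain = np.sqrt((target @ target) / (y @ y))   # match energies
        err = np.sum((target - gain * y) ** 2)        # MSE criterion
        if err < best_err:
            best_i, best_gain, best_err = i, gain, err
    return best_i, best_gain

rng = np.random.default_rng(0)
M, k = 64, 5
codebook = rng.standard_normal((M, k))                # N(0,1) codevectors
idx, gain = encode_subframe(rng.standard_normal(k), codebook,
                            A=[1.0, -0.9], P=3, g=0.5)
```

The decoder only needs `idx` and the quantized `gain` to rebuild the subframe, since it holds the same codebook and filter parameters.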
An interesting interpretation of the CELP encoder is that of a forward adaptive VQ. The filters are updated every N samples, so we have a new set of codevectors y_k every frame. Thus, the dashed block in Fig. 1 can be considered a forward adaptive codebook, because it is designed according to the current frame of speech.
In our design, we have used MSE as the criterion for choosing the best codevector. However, it is the perceptual quality of the synthesized speech that we should seek to optimize. Does a lower MSE always guarantee better-sounding speech? Generally, but not always; for this reason, many practical CELP coders use a perceptually weighted MSE as the fidelity criterion. In our design, we found plain MSE to be a reasonable fidelity criterion. In a later section, we examine the correlation between MSE and the perceptual quality of the synthesized speech.
Rate
The rate of the CELP coder is determined by two factors:
1. The rate of the VQ: R_VQ = (log2 M) / k bits/sample.
2. Overhead bits needed to send the quantized values of gain for every subframe and the
LPC coefficients for each frame.
The rate of the coder in bits per second is given by

R = (R_VQ + #overhead bits/sample) * 8000
In this work, we consider only the rate of the VQ in all our experiments. In practical systems, the numbers of bits allocated to the VQ and to the overhead values are approximately equal, so the rate of the VQ is roughly half the rate of the coder.
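Under the assumption above that the VQ and overhead bit budgets are roughly equal, the numbers work out as follows for one illustrative configuration (M = 1024, k = 10, 8000 samples/s):

```python
import math

M, k = 1024, 10
R_vq = math.log2(M) / k          # VQ rate in bits/sample
overhead = R_vq                  # assume roughly equal overhead
R = (R_vq + overhead) * 8000     # coder rate in bits/second
# R_vq = 1.0 bit/sample, R = 16000 bits/second
```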
2 Design and Performance Analysis of a CELP Coder
We designed a basic CELP coder in MATLAB with the following parameters:
- Frame size N = 80
- L = 10 (10th-order AR model)
- Fidelity criterion: MSE
- Dimension k = 10 or 5
Our input speech for all experiments comprised two sentences each of male and female speech. The SNR of the reconstructed speech was determined at different rates for dimensions k = 5 and k = 10. The SNR vs. rate curves obtained for k = 5 and k = 10 are shown in Fig. 2. We ran experiments with codebooks of size M = 32, 64, 128, 256, 512, and 1024. This corresponds to a rate R_VQ varying from 0.5 to 1 when k = 10 and from 1 to 2 when k = 5.
The SNRs predicted by Zador's formula are also shown in the graph. The dotted line is the Zador SNR assuming an AR-1 model for speech. This was computed by estimating the AR-1 parameter (the correlation coefficient ρ) from the input speech signal. The line above the dotted line represents the Zador SNR assuming a k-th order AR model. The Zador factor ζ_k for this model is given by

ζ_k = 2π ( (k+2)/k )^((k+2)/2) |K|^(1/k)
The k autocorrelation values required for computing the determinant were estimated from the
input speech signal. While it may seem surprising that the SNR of the CELP coder is higher than the Zador SNR, it should be noted that CELP is significantly different from an ordinary VQ (to which we have applied Zador's formula). CELP has an adaptive codebook that changes every frame according to the input speech, whereas the SNR predicted by Zador's formula is applicable to a fixed-codebook VQ for a source characterized by the estimated covariance matrix K. Thus, we cannot expect Zador's formula (in the form applied here) to give a reasonable estimate of the SNR. An improved estimate could be obtained by calculating the Zador SNR for every frame, based on a covariance matrix estimated for that frame.
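The per-frame variant suggested above could be sketched like this. It is an assumption-laden illustration: it computes only the |K|^(1/k) term entering the Zador factor, from a Toeplitz covariance estimated on a single frame.

```python
import numpy as np
from scipy.linalg import toeplitz

def det_term(frame, k):
    """|K|^(1/k) for the k x k Toeplitz covariance estimated from one
    frame's first k autocorrelation values."""
    N = len(frame)
    r = np.array([frame[:N - m] @ frame[m:] for m in range(k)]) / N
    K = toeplitz(r)                       # symmetric Toeplitz covariance
    sign, logdet = np.linalg.slogdet(K)   # numerically stable determinant
    return np.exp(logdet / k)

frame = np.random.default_rng(1).standard_normal(80)
z = det_term(frame, 10)                   # one scalar per frame
```

The biased autocorrelation estimate used here keeps K positive semidefinite, so the determinant term is well defined frame by frame.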
We also observe that for k = 10 (lower rates), the rate vs. SNR slope is clearly more than 6 dB/bit; for k = 5 (higher rates) the slope approaches 6 dB/bit. This is consistent with our expectation that SNR increases at approximately 6 dB/bit at high rates.
Fig. 2: SNR vs. rate characteristic of our CELP coder: (a) k = 5, (b) k = 10.
3 Complexity of the CELP encoder
Complexity is an important consideration while designing any VQ. The CELP algorithm gives
good output speech quality at low bit rates. However, this quality is obtained at the cost of very
high complexity. Since CELP coders are used for real-time applications such as the transmission
of speech over networks, long delays in encoding are undesirable. In this section, we analyze
the complexity of the encoder and derive expressions for the number of computations and the storage
required. We then discuss methods to reduce the complexity of the algorithm without significant
degradation in performance.
Fig. 3: Computational complexity of CELP.
To analyze complexity, it is convenient to look at CELP as a forward adaptive VQ. The following
analysis will serve as a baseline for comparison, though the actual operations may be performed differently in real-time applications. From Fig. 3, it is clear that the encoding process involves
two stages:
1. The filtering operation: Convolution of each codevector with the impulse response of the
filter. This has to be done every k samples of speech.
2. Choosing the best codevector from the filtered codebook. This part can be considered a
VQ and hence involves 3M ops/sample.
The number of operations in Step 1 can be derived as follows:
While filtering the codevectors in a subframe, we have to take care of the zero-input response
(ZIR) at the output of the filters. This is produced due to filtering that occurred in the previous
subframe. At the beginning of each subframe, we can subtract the ZIR from the speech. Note that
the subtraction needs to be done only once in a subframe, since the ZIR is the same for all
codevectors. We can then represent the convolution of each codevector with the impulse response
of the LPC filter by a matrix multiplication (i.e., as if we had zero initial state):

s = H d,        (3.1)

where d is the codevector, s is the output of the LPC filter, and H is the k x k lower triangular Toeplitz matrix

    H = [ a_0      0        0     ...  0
          a_1      a_0      0     ...  0
          a_2      a_1      a_0   ...  0
          ...                     ...
          a_{k-1}  a_{k-2}  ...   ...  a_0 ],

whose entries a_0 (= 1), a_1, ..., a_{k-1} are the first k samples of the impulse response of 1/A(z).
For each codevector (dimension k), the matrix multiplication requires k(k-1)/2 multiplications and k(k-1)/2 additions.
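The matrix form of Eq. (3.1) can be checked against direct filtering. In this sketch the A(z) coefficients are illustrative, and h denotes the impulse response of 1/A(z) (the entries of the first column of H):

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import lfilter

k = 5
A = np.array([1.0, -0.9, 0.2])                       # illustrative A(z)
h = lfilter([1.0], A, np.r_[1.0, np.zeros(k - 1)])   # impulse response of 1/A(z)
H = toeplitz(h, np.r_[h[0], np.zeros(k - 1)])        # lower triangular Toeplitz
d = np.random.randn(k)                               # a codevector
s = H @ d                                            # Eq. (3.1), zero initial state
# s matches lfilter([1.0], A, d) sample for sample
```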
The pitch filter equation can be written as

y_n = s_n + g y_{n-p},        (3.2)

where g is the pitch gain, p is the pitch period, and y_n is the filtered codevector. This requires 2k
operations per codevector. Thus the total number of operations per codevector is
op_n / codevector = k(k-1)/2 + k(k-1)/2 + 2k = k(k+1)
Since we have M codevectors and k samples/codevector, the filtering in Step 1 will require M(k+1) operations/sample. The VQ part of the CELP coder will require 3M operations/sample. Thus the overall computation for the encoding process is

op_n = M(k+4) ops/sample        (3.3)
We have to store the M codevectors, so we need storage corresponding to Mk floating-point numbers.
To get an idea of the complexity involved: for k = 10, encoding requires 14M ops/sample. Of this, 11M is due to the filtering part, while the VQ part requires 3M operations/sample.
Clearly, we would like to reduce the complexity of filtering without sacrificing performance. The codebook that we use consists of codevectors with random components drawn from an N(0,1) distribution. In the following subsection, we consider special types of codebooks [2] which reduce the filtering complexity, and we analyze the performance vs. complexity tradeoffs obtained by using them.
4 Special Codebooks for CELP
4.1 The Binary Codebook
Binary codebooks contain codevectors with only binary components, i.e., zeros and ones. Clearly, the filtering operation requires no multiplications. Filtering a codevector involves only additions, and then only where a component of the codevector is a one. The number of ops/sample is computed using a method similar to that of the previous section.
For the LPC filter, the number of multiplications is zero, while the number of additions is half the number of additions in the previous case (the binary codebook contains 50% zeros and 50% ones). The long-term filter will still involve 2k operations per codevector. Thus the filtering part alone will require

op_b / codevector = k(k-1)/4 + 2k
Since we have M codevectors and k samples/codevector, the filtering will require

M(k-1)/4 + 2M