
Sp'module 4.pdf'

Uploaded by

Manoj Naik

Module 4: The Cepstrum and Homomorphic Speech Processing


• For discrete-time signals, a definition that captures the essential
features of the original definition is that the cepstrum of a signal is
the inverse discrete-time Fourier transform (IDTFT) of the logarithm
of the magnitude of the discrete-time Fourier transform (DTFT) of
the signal.
That is,
c[n] = (1/2π) ∫_{−π}^{π} log|X(e^{jω})| e^{jωn} dω,
where X(e^{jω}) is the DTFT of the signal x[n].
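As a rough illustration of this definition, the sketch below approximates the cepstrum by sampling the DTFT with a DFT. The function name real_cepstrum, the DFT length, and the echo test signal (a unit sample plus a half-amplitude echo 20 samples later) are assumptions chosen for the example, not part of the original notes.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Approximate c[n]: inverse DFT of the log magnitude of the DFT."""
    spectrum = np.fft.fft(x, n_fft)
    log_magnitude = np.log(np.abs(spectrum))  # assumes no exact spectral zeros
    return np.fft.ifft(log_magnitude).real

# Hypothetical test signal: unit sample plus an echo of gain 0.5 at n = 20.
echo_delay, echo_gain = 20, 0.5
x = np.zeros(64)
x[0] = 1.0
x[echo_delay] = echo_gain

c = real_cepstrum(x, n_fft=512)
# Search positive quefrencies only; the largest value should sit at the echo delay.
peak_quefrency = int(np.argmax(c[1:256])) + 1
print(peak_quefrency, c[peak_quefrency])
```

The echo shows up as an isolated peak at its delay, which is the kind of structure the cepstrum was originally intended to expose.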
HOMOMORPHIC SYSTEMS FOR CONVOLUTION

• Oppenheim developed a new theory of systems that
was based on the mathematical theory of linear vector
spaces.
• The essence of this theory was that certain operations
of signal combination (convolution and multiplication
in particular) satisfy the same postulates as does
addition in the theory of linear vector spaces.
• From this observation, Oppenheim showed that
classes of non-linear systems could be defined on the
basis of a generalized principle of superposition. He
termed such systems homomorphic systems.
• Of particular importance for our present discussion is
the class of homomorphic systems for which the input
and output are combined by convolution.
• A homomorphic filter is simply a homomorphic system having the
property that one component (the desired component) passes
through the system essentially unaltered, while the other
component (the undesired component) is removed.
• In Eq. (8.5), for example, if x2[n] were the undesirable component,
we would require that the output corresponding to x2[n] be a unit
sample, y2[n] = δ[n], while the output corresponding to x1[n] would
closely approximate x1[n], so that the output of the homomorphic
filter would be y[n] = x1[n] ∗ δ[n] = x1[n].
• This is entirely analogous to when a conventional linear system is
used to separate (filter) a desired signal from an additive
combination of the desired signal and noise. In this case the
desired result is that the output due to the noise is zero.
• Thus, the sequence δ[n] plays the same role for convolution as is
played by the zero signal for additive combinations. Homomorphic
filters are of interest to us because our goal in speech processing
is to separate the convolved excitation and vocal tract
components of the speech model.
• An important aspect of the theory of homomorphic systems is that
any homomorphic system can be represented as a cascade of
three homomorphic systems, as depicted in Figure 8.3 for the case
of homomorphic systems for convolution.
The first system takes inputs combined by convolution and transforms them
into an additive combination of corresponding outputs.
The second system is a conventional linear system obeying the principle of
superposition as given in Eq. (8.3a). The third system is the inverse of the first
system; i.e., it transforms signals combined by addition back into signals
combined by convolution.
The importance of the existence of such a canonic form for homomorphic
systems is that the design of such systems reduces to the problem of the
design of the central linear system in Figure 8.3.
The system D∗{·} is called the characteristic system for convolution, and it is
fixed in the canonic form of Figure 8.3. Likewise, its inverse, called the inverse
characteristic system for convolution and denoted D∗^{−1}{·}, is also a fixed
system.
• The characteristic system for convolution also obeys a
generalized principle of superposition where the input operation
is convolution and the output operation is ordinary addition.
The properties of the characteristic system are defined as
D∗{x1[n] ∗ x2[n]} = D∗{x1[n]} + D∗{x2[n]} = x̂1[n] + x̂2[n].
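The generalized superposition property can be checked numerically. The sketch below uses the real cepstrum (log magnitude only) as a simplified stand-in for the characteristic system, so it ignores phase; the two short sequences and the DFT length are hypothetical values chosen for the example.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Inverse DFT of the log magnitude of the n_fft-point DFT of x."""
    log_magnitude = np.log(np.abs(np.fft.fft(x, n_fft)))
    return np.fft.ifft(log_magnitude).real

# Two hypothetical sequences with no zeros on the unit circle.
x1 = np.array([1.0, 0.5])
x2 = np.array([1.0, -0.3, 0.2])
x = np.convolve(x1, x2)            # inputs combined by convolution

n_fft = 64                         # long enough that the DFT of x is the product X1 * X2
c_of_convolution = real_cepstrum(x, n_fft)
sum_of_cepstra = real_cepstrum(x1, n_fft) + real_cepstrum(x2, n_fft)

# The two results should agree to within rounding error.
max_error = float(np.max(np.abs(c_of_convolution - sum_of_cepstra)))
print(max_error)
```

The agreement follows because the logarithm turns the product of the two spectra into a sum before the inverse transform is taken.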
Representation by DTFTs
An appropriate definition of the complex logarithm is
X̂(e^{jω}) = log|X(e^{jω})| + j arg{X(e^{jω})}.
Figure 8.7 illustrates the problem that arises when one tries to
properly define the phase angle of the DTFT. The principal value phase
has discontinuities of size 2π because an angle in the complex
plane is always ambiguous to within an integer multiple of 2π. This
poses no problem for the complex exponential, since
e^{j(θ + 2πk)} = e^{jθ} for any integer k.
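The phase ambiguity can be seen in a small numerical sketch. The delayed unit sample, the delay of 3 samples, and the DFT length below are assumptions chosen for the example; np.unwrap is used here as one simple way of removing the 2π jumps, not necessarily the method of the original notes.

```python
import numpy as np

n_fft = 64
delay = 3                                   # hypothetical delay in samples
x = np.zeros(n_fft)
x[delay] = 1.0                              # x[n] = delta[n - 3], so arg X = -3 * omega

X = np.fft.fft(x)
omega = 2 * np.pi * np.arange(n_fft) / n_fft

principal_phase = np.angle(X)               # principal value, wrapped to [-pi, pi]
unwrapped_phase = np.unwrap(principal_phase)  # 2*pi jumps removed sample by sample

# Locations where the principal value jumps by roughly 2*pi.
jumps = np.abs(np.diff(principal_phase)) > np.pi
print(int(jumps.sum()), unwrapped_phase[-1])
```

The wrapped and unwrapped phases give the same complex exponential, but only the unwrapped curve is a continuous function of frequency, which is what the complex logarithm requires.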
Minimum- and Maximum-Phase Signals
Homomorphic Analysis of the Speech Model
• Since the excitation and impulse response of a linear
time-invariant system are combined by convolution, the
problem of speech analysis can also be viewed as a
problem in separating the components of a convolution,
and therefore, homomorphic systems and the cepstrum
are useful tools for speech analysis.
• In the model of Figure 8.12, the pressure signal at the
lips, s[n], for a voiced section of speech is represented
as the convolution
s[n] = p[n] ∗ hV[n],   (8.38a)
where p[n] is the quasi-periodic voiced excitation signal and
hV[n] represents the combined effect of the vocal tract
impulse response v[n], the glottal pulse g[n], the radiation
load response at the lips, r[n], and the voiced gain, AV.
• The effective impulse response, hV[n], is itself the convolution
of g[n], v[n], and r[n], including scaling by the voiced section
gain control, AV; i.e.,
hV[n] = AV · g[n] ∗ v[n] ∗ r[n].   (8.38b)
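Equations (8.38a) and (8.38b) can be sketched with synthetic sequences. Every numerical choice below is an assumption made for illustration: the pitch period, the Hann-shaped glottal pulse, the single-resonance vocal tract response, the first-difference radiation model, and the gain value are not taken from the original notes.

```python
import numpy as np

pitch_period = 80                       # hypothetical pitch period in samples
n_samples = 400

# p[n]: periodic impulse train standing in for the quasi-periodic excitation.
p = np.zeros(n_samples)
p[::pitch_period] = 1.0

n = np.arange(40)
g = np.hanning(25)                          # assumed smooth glottal pulse g[n]
v = 0.95 ** n * np.cos(0.25 * np.pi * n)    # assumed one-resonance vocal tract v[n]
r = np.array([1.0, -1.0])                   # assumed first-difference radiation load r[n]
A_V = 0.8                                   # assumed voiced gain

# Eq. (8.38b): hV[n] = AV * g[n] * v[n] * r[n] (convolutions).
h_V = A_V * np.convolve(np.convolve(g, v), r)

# Eq. (8.38a): s[n] = p[n] * hV[n].
s = np.convolve(p, h_V)
print(len(h_V), len(s))
```

Because the assumed hV[n] is shorter than the pitch period, each period of s[n] here is simply a copy of hV[n]; with a longer response the copies would overlap and add.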
Homomorphic Analysis of the Model for Voiced Speech
Homomorphic Analysis of the Model for Unvoiced Speech
COMPUTING THE SHORT-TIME CEPSTRUM AND COMPLEX CEPSTRUM OF SPEECH
The inverse characteristic system for convolution is needed for homomorphic
filtering of speech. Following our approach above, we obtain this system from Figure
8.6 by simply replacing the DTFT operators by their corresponding DFT
computations.
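A minimal sketch of the DFT-based characteristic system and its inverse is given below. The function names are hypothetical, phase unwrapping is done with np.unwrap, and the example input is a short minimum-phase sequence with no linear-phase term, which is an assumption that keeps the sketch simple; a general implementation would need more careful phase handling.

```python
import numpy as np

def complex_cepstrum(x, n_fft):
    """DFT approximation of the characteristic system: DFT, complex log, inverse DFT."""
    X = np.fft.fft(x, n_fft)
    log_X = np.log(np.abs(X)) + 1j * np.unwrap(np.angle(X))
    # Keeping the real part assumes a real input whose unwrapped phase is odd
    # and has no linear-phase component (true for the example below).
    return np.fft.ifft(log_X).real

def inverse_complex_cepstrum(x_hat):
    """DFT approximation of the inverse system: DFT, complex exponential, inverse DFT."""
    X_hat = np.fft.fft(x_hat)
    return np.fft.ifft(np.exp(X_hat)).real

# Hypothetical minimum-phase input with a single zero at z = -0.5.
x = np.array([1.0, 0.5])
x_hat = complex_cepstrum(x, n_fft=256)
x_back = inverse_complex_cepstrum(x_hat)
print(x_hat[:4], x_back[:3])
```

For this input the power series of log(1 + 0.5 z^{-1}) gives x̂[1] = 0.5, x̂[2] = −0.125, and x̂[3] = 1/24, which the DFT-based result should match closely because the time aliasing is negligible at this DFT length.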

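Recall that the complex cepstrum involves the use of the complex logarithm, whereas the
cepstrum, as it has traditionally been defined, involves only the logarithm of
the magnitude of the Fourier transform; that is, the short-time cepstrum, c[n],
is given by
c[n] = (1/2π) ∫_{−π}^{π} log|X(e^{jω})| e^{jωn} dω,
where X(e^{jω}) is the DTFT of the windowed speech segment.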
Computation Based on the z-Transform
HOMOMORPHIC FILTERING OF NATURAL SPEECH
• We are now in a position to apply the concepts of the
cepstrum and homomorphic filtering to a natural
speech signal.
• Recall that the model for speech production, as shown
in Figure 8.12, consists of a slowly time-varying linear
system excited by either a quasi-periodic impulse train
or by random noise.
• Thus, it is appropriate to think of a short segment of
voiced speech as having been taken from the steady-
state output of a linear time-invariant system excited
by a periodic impulse train.
• Similarly, a short segment of unvoiced speech can be
thought of as resulting from the excitation of a linear
time-invariant system by random noise.
• The purpose of this section is to demonstrate that
similar behavior results if short-time homomorphic
analysis methods are employed with natural speech
inputs.
A Model for Short-Time Cepstral Analysis of Speech
• Over the length (L) of the window, the speech signal s[n] is assumed to
satisfy the convolution equation
s[n] = p[n] ∗ h[n],   0 ≤ n ≤ L − 1.
For unvoiced speech, no such periodicity occurs in the logarithm of the DTFT of the
windowed unvoiced signal, and therefore no cepstral peaks occur.
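The contrast between voiced and unvoiced frames can be sketched with synthetic signals. Everything numerical below is an assumption for illustration: the resonance used as the impulse response, the pitch period of 75 samples, the 401-point Hamming window, the seeded white-noise excitation, and the quefrency search range of 30 to 200 samples.

```python
import numpy as np

def short_time_cepstrum(frame, n_fft=1024):
    """Hamming-window the frame, then take the inverse DFT of the log magnitude."""
    windowed = frame * np.hamming(len(frame))
    log_magnitude = np.log(np.abs(np.fft.fft(windowed, n_fft)) + 1e-12)
    return np.fft.ifft(log_magnitude).real

def cepstral_peak(c, lo=30, hi=200):
    """Largest cepstral value and its quefrency within an assumed pitch range."""
    lag = lo + int(np.argmax(c[lo:hi + 1]))
    return lag, float(c[lag])

n = np.arange(120)
h = 0.95 ** n * np.cos(0.25 * np.pi * n)      # assumed impulse response

# Voiced frame: periodic impulse train excitation, taken from the steady state.
pitch_period = 75
p = np.zeros(1000)
p[::pitch_period] = 1.0
voiced = np.convolve(p, h)[300:701]

# Unvoiced frame: seeded white-noise excitation through the same system.
rng = np.random.default_rng(0)
unvoiced = np.convolve(rng.standard_normal(1000), h)[300:701]

voiced_lag, voiced_peak = cepstral_peak(short_time_cepstrum(voiced))
unvoiced_lag, unvoiced_peak = cepstral_peak(short_time_cepstrum(unvoiced))
print(voiced_lag, voiced_peak, unvoiced_peak)
```

In a sketch like this, the voiced frame would be expected to show a clear cepstral peak near the pitch period, while the largest value for the noise-excited frame stays comparatively small; a threshold on that peak is one simple basis for a voiced/unvoiced decision.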
Voiced Speech Analysis Using the DFT
• Figure 8.31 shows a segment of speech selected by the window,
w[n], whose complex cepstrum is then computed. The desired component
of the input is selected by what might be termed a “cepstrum window,”
denoted l[n]. This type of filtering is appropriately called
“frequency-invariant linear filtering,” since multiplying the complex
cepstrum by l[n] corresponds to convolving its DTFT, L(e^{jω}),
with the complex logarithm, X̂(e^{jω}).
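The effect of a cepstrum window can be sketched as follows. For simplicity the sketch lifters the real cepstrum rather than the complex cepstrum, so only the log magnitude is smoothed; the function name, the symmetric low-time window, the cutoff of 30 samples, and the synthetic voiced frame are all assumptions made for the example.

```python
import numpy as np

def liftered_log_spectrum(x, n_fft, cutoff):
    """Multiply the real cepstrum by a low-time window l[n] and return to frequency."""
    log_magnitude = np.log(np.abs(np.fft.fft(x, n_fft)) + 1e-12)
    c = np.fft.ifft(log_magnitude).real
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0                   # quefrencies 0 .. cutoff-1
    lifter[n_fft - cutoff + 1:] = 1.0       # matching negative quefrencies
    smoothed = np.fft.fft(c * lifter).real
    return smoothed, log_magnitude

# Hypothetical voiced frame: impulse train through an assumed resonance, Hamming windowed.
n = np.arange(120)
h = 0.95 ** n * np.cos(0.25 * np.pi * n)
p = np.zeros(1000)
p[::75] = 1.0
x = np.convolve(p, h)[300:701] * np.hamming(401)

# Keeping only low quefrencies removes the rapid pitch ripple from the log spectrum.
envelope, log_magnitude = liftered_log_spectrum(x, n_fft=1024, cutoff=30)
print(envelope[:3], log_magnitude[:3])
```

Choosing the cutoff below the pitch period keeps the slowly varying part of the log spectrum (associated with the vocal tract) and discards the periodic fine structure (associated with the excitation).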
Unvoiced Speech Analysis Using the DFT

• To complete the illustration of homomorphic analysis of
natural speech, consider the example of unvoiced
speech given in Figure 8.35. Figure 8.35a shows a
waveform segment of the fricative /SH/ multiplied by a
401-point Hamming window. The rapidly varying curve
plotted with the thin line in Figure 8.35b is the
corresponding log magnitude function log |X(e^{jω})|.
Figure 8.35c shows the corresponding cepstrum c[n].
CEPSTRUM ANALYSIS OF ALL-POLE MODELS
CEPSTRUM DISTANCE MEASURES
Mel-Frequency Cepstrum Coefficients
