
748 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-33, NO. 3, JUNE 1985

A General Method of Minimum Cross-Entropy Spectral Estimation

MICHAEL A. TZANNES, DIMITRIS POLITIS, AND N. S. TZANNES

Abstract: A new, general approach to estimating power spectra is presented, based on minimizing a cross-entropy functional involving spectral densities. Examples are worked out using this approach as well as Shore's [5] minimum cross-entropy spectral analysis (MCESA) and Burg's [7] maximum entropy spectral analysis (MESA). The results show that, aside from its wider applicability, ease of implementation, and pedagogical advantage, the method performs better than MESA and equally well or better than MCESA.

Manuscript received December 2, 1983; revised July 26, 1984. A first version of this paper was presented at the Measurement and Control Conference (MECO), August 29-September 2, 1983, Athens, Greece.
M. A. Tzannes is with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI.
D. Politis is with the Department of Electrical Engineering, Rensselaer Polytechnic Institute, Troy, NY.
N. S. Tzannes is with the Department of Electrical Engineering, University of Virginia, Charlottesville, VA 22901.
0096-3518/85/0600-0748$01.00 © 1985 IEEE

INTRODUCTION

The minimization of the cross-entropy functional

$$ H(q, p) = \int q(x) \log \frac{q(x)}{p(x)} \, dx \tag{1} $$

has been used repeatedly as a method for estimating an unknown probability density function (pdf) $q(x)$. It applies when some guess $p(x)$ can be made about this pdf and constraints on the unknown pdf are available in expected-value form. Kullback [1] first proposed it as the principle of minimum directed divergence, or minimum discrimination information, and Shore and Johnson [2] later reinforced it. It has been shown to be a generalization of the principle of maximum entropy (MEP) [3], [4]. When the "guess" or prior density is uniform, cross-entropy minimization reduces to entropy maximization.

MEP and (1) have been applied to spectral estimation by Burg [7] and Shore [5], [6], respectively. These methods are assumed known and will not be reviewed here.

THE PROPOSED METHOD

We now present a new approach to the estimation of power spectra using the minimum cross-entropy principle. This approach does not assume any specific pdf structure for the random process involved. What it does require is that spectral densities be treatable as pdf's, and we take this issue up immediately below.

The idea has been circulating for some time [9], [10] that the nonnegative power spectrum can be normalized to have an area of one and can therefore be visualized as the pdf of a random variable $\omega$ (frequency). In [11] it is established that the Fourier transform pair $R(\tau) \leftrightarrow S(\omega)$ has the same properties as the pair characteristic function $\leftrightarrow$ pdf of a random variable (r.v.), differing only by a scaling factor.

Further justification for this interpretation is provided by the following. A known problem in signal theory [11] is that of forming the real process

$$ X(t) = a \cos(\omega t + \varphi) \tag{2} $$

where $\varphi$ is a given r.v. whose characteristic function $\Phi_\varphi$ obeys $\Phi_\varphi(1) = \Phi_\varphi(2) = 0$, and $\omega$ is an r.v. independent of $\varphi$ with an even density function $f(\omega)$. The problem is to prove that $X(t)$ is wide sense stationary with power spectrum $S(\omega) = \pi a^2 f(\omega)$. The crucial point here is that the power spectrum is identical to an underlying pdf of the random process, multiplied by a constant.

In our spectral estimation problem, the above example is of theoretical interest only. It imposes the severe restriction that the random process must be of the form (2), which is obviously not the general case.

We now extend the above to the general case. We show that the normalized spectral density of a wide sense stationary random process is equivalent to the pdf of its instantaneous frequency, multiplied by a constant.

We argue that every bounded stochastic process (which is the practical case) can be regarded as the phase modulation of another fully determinable process. This can be understood as follows. Every bounded deterministic signal $x(t)$ is related in a one-to-one manner to another bounded signal $\varphi(t)$ through the transformation

$$ x(t) = A \sin \varphi(t), \qquad |x(t)| \le A, \quad |\varphi(t)| \le \pi/2, \quad \forall t. \tag{3} $$
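The random-phase example (2) is easy to check numerically. The Python sketch below is not from the paper. It assumes $\varphi$ uniform on $[0, 2\pi)$, so that its characteristic function vanishes at 1 and 2, and a zero-mean Gaussian $\omega$ with standard deviation `sigma`. It then compares a Monte Carlo estimate of $R(\tau) = E[X(t+\tau)X(t)]$ with $(a^2/2)E[\cos \omega\tau]$. The Fourier transform of that $R(\tau)$ is $\pi a^2 f(\omega)$, so the spectrum is the density of $\omega$ up to a constant. All function names are hypothetical.

```python
import math
import random


def simulate_autocorr(a, sigma, tau, t=0.0, n=50_000, seed=1):
    """Monte Carlo estimate of E[X(t + tau) X(t)] for X(t) = a cos(w t + phi), as in (2).

    Illustrative assumptions (not from the paper): phi ~ Uniform[0, 2*pi), whose
    characteristic function vanishes at 1 and 2, and w ~ Normal(0, sigma**2),
    an even density independent of phi.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, sigma)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        acc += (a * math.cos(w * (t + tau) + phi)) * (a * math.cos(w * t + phi))
    return acc / n


def analytic_autocorr(a, sigma, tau):
    """R(tau) = (a**2 / 2) E[cos(w tau)]; for Gaussian w this is (a**2 / 2) exp(-(sigma tau)**2 / 2)."""
    return 0.5 * a * a * math.exp(-0.5 * (sigma * tau) ** 2)


def spectrum(a, sigma, w):
    """S(w) = pi a**2 f(w), with f the (assumed) Gaussian density of the frequency r.v."""
    f = math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return math.pi * a * a * f
```

If you repeat the estimate at another time origin, for example `simulate_autocorr(1.0, 1.5, 0.5, t=3.0)`, you get the same value up to Monte Carlo error. This is the wide sense stationarity claimed for (2).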



Correspondingly, the stochastic process $X(t)$, constituted of many realizations $x_1(t), x_2(t), \ldots, x_N(t)$, has a one-to-one relation with the stochastic process $\Phi(t)$, which is made up of $\varphi_1(t), \varphi_2(t), \ldots, \varphi_N(t)$, where

$$ x_1(t) = A \sin \varphi_1(t), \quad x_2(t) = A \sin \varphi_2(t), \quad \ldots, \quad x_N(t) = A \sin \varphi_N(t). \tag{4} $$
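The one-to-one correspondence behind (3) and (5) is, sample by sample, the principal branch of the arcsine. The minimal sketch below assumes sampled realizations. It also assumes a forward-difference approximation to the instantaneous frequency $\Omega(t) = d\Phi(t)/dt$, which the quasi-static approximation (6) relies on. The helper names are illustrative only.

```python
import math


def phase_from_signal(x, A):
    """Invert x = A sin(phi) sample by sample, as in (3): phi = arcsin(x / A) in [-pi/2, pi/2]."""
    if any(abs(v) > A for v in x):
        raise ValueError("every sample must satisfy |x| <= A")
    return [math.asin(v / A) for v in x]


def signal_from_phase(phi, A):
    """Forward map x = A sin(phi), applied to one realization."""
    return [A * math.sin(p) for p in phi]


def instantaneous_frequency(phi, dt):
    """Omega = d phi / dt, approximated here (an assumption) by forward differences."""
    return [(b - a) / dt for a, b in zip(phi, phi[1:])]


# One bounded realization (illustrative), mapped to its phase and back.
x = [0.3 * math.sin(0.7 * k) + 0.2 * math.cos(1.9 * k) for k in range(50)]
phi = phase_from_signal(x, 1.0)
x_back = signal_from_phase(phi, 1.0)
```

The phase is restricted to $[-\pi/2, \pi/2]$ so that the inverse is unique. That restriction is why the bound $|x(t)| \le A$ is required.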

Due to the one-to-one correspondence of all their members, the two random processes $X(t)$ and $\Phi(t)$ correspond one to one through the transformation

$$ X(t) = A \sin \Phi(t). \tag{5} $$

Now that (5) has been established, we can make use of a theorem known as the quasi-static approximation [15] or Woodward's theorem [16]. This theorem is usually encountered in the literature as a tool for FM analysis. It therefore usually involves a carrier frequency $\omega_0$, a modulation index $k$, and a random initial angle $\alpha$. The case dealt with here uses $\omega_0 = 0$, $k = 1$, $\alpha = 0$.

The quasi-static approximation states the following. Consider the phase-modulated random signal $X(t) = A \sin \Phi(t)$, its instantaneous frequency $\Omega(t) = (d/dt)\Phi(t)$, and the first-order pdf $P_\Omega(\Omega)$ of $\Omega(t)$, which it is convenient to assume time invariant and even in $\Omega$. Then the power spectral density of $X(t)$ is

$$ S_X(\omega) = \pi A^2 P_\Omega(\omega). \tag{6} $$

Equation (6) is the result we need. It shows that the normalized power spectrum of any wide sense stationary random process is equivalent to the first-order pdf of an underlying random process, namely, the instantaneous frequency process of $X(t)$.

We have therefore established a theoretical and physical basis for interpreting normalized spectral power densities as pdf's. We maintain that they can be treated as such in spectral analysis problems. Note also that they have already been treated this way in spectral estimation. Many investigators (see [9], [12]-[14]) have suggested an alternate MESA approach (called SMEM or MEM2) under which the functional

$$ H = -\int S(f) \log S(f) \, df \tag{7} $$

is maximized, rather than Burg's integral of $\log S(f)$. This functional makes sense only in light of the present discussion.

We are now ready to present a new MCESA algorithm. Let $T(\omega)$ be the desired spectrum and $S(\omega)$ a prior guess (estimate) of $T(\omega)$. To find the optimum $T(\omega)$ given a prior guess $S(\omega)$, normalize $S(\omega)$ to have an area of one and minimize the cross-entropy functional

$$ H(T, S) = \int T(\omega) \log \frac{T(\omega)}{S(\omega)} \, d\omega \tag{8} $$

This is exactly expression (1) with $T(\omega)$ and $S(\omega)$ treated as the pdf's $q(x)$ and $p(x)$, respectively.

The minimization constraints are

$$ \int_{-\pi}^{\pi} T(\omega) e^{j\omega\tau} \, d\omega = \frac{R(\tau)}{R(0)}, \qquad \tau = 0, 1, 2, \ldots, N \tag{9} $$

which in noncomplex form are

$$ \frac{R(\tau)}{2R(0)} = \int_{0}^{\pi} T(\omega) \cos(\omega\tau) \, d\omega, \qquad \tau = 0, 1, 2, 3, \ldots, N. \tag{10} $$

This is just the autocorrelation function-power spectrum Fourier transform pair, with the factor $2R(0)$ serving normalization purposes.

To find the $T(\omega)$ that minimizes (8) while satisfying (10), we use standard techniques from the calculus of variations. Introducing the Lagrangian multipliers $\lambda_k$ for the constraints (9), the minimization leads to the solution

$$ T(\omega) = S(\omega) \exp\Big[-1 - \sum_{k=0}^{N} \lambda_k \cos \omega k\Big]. \tag{11} $$

The values of the Lagrangian multipliers are determined by the known autocorrelation function values in (10).

Note that when the prior spectral estimate $S(\omega)$ is uniform (white noise), the method reverts to maximizing (7). It therefore becomes the SMEM or MEM2 approach to spectral estimation. Furthermore, the normalizing condition imposed above, which makes $S(\omega)$ and $T(\omega)$ integrate to unity, is not essential. All that is needed is that both integrate to the same value.

EXAMPLES

In this section, we present three examples using the proposed method, MESA, and MCESA. The first two examples are similar to two of the examples in [6]. The third example is new, and it provides a comparison with the fourth entropy method, SMEM or MEM2.

Example 1

A known true spectrum $T_k$ is assumed, as shown in Fig. 1. It is made up of a background noise term that approximates $1/f$ noise and a fixed sinusoid, which is the signal. Thus, the noise is

$$ N_k = \frac{0.01}{f_k}, \qquad k = 1, \ldots, 50 \tag{12} $$

for 50 equally spaced frequencies $f_k = (0.005, 0.015, \ldots, 0.495)$ between 0 and 0.5. Here 0.5 is the Nyquist frequency, since we take the spacing between autocorrelation lags to be unity.

Fig. 1. Original spectrum for the first example.

The signal term is

[Equation (13) is not legible in the scanned original.]

The autocorrelation function values for lags $\tau = 0, 1, 2, 3, 4, 5$ are: $R(0) = 15.7511$, $R(1) = 11.6149$, $R(2) = 7.8699$, $R(3) = 4.5411$, $R(4) = 2.0145$, and $R(5) = 1.1413$.

As a prior spectrum estimate for the two cross-entropy methods we took

[Equation (14) is not legible in the scanned original.]

This obviously provides us with a guess that has the same shape as the background noise term.

Using a Newton-Raphson algorithm, we arrive at the posterior estimates given by the three methods, which are shown in Figs. 2, 3, and 4. Note the different values on the vertical axis, which are due to the normalization involved in the proposed method. Both cross-entropy methods obviously perform better, but this is to be expected given the prior guess $S_k$. It is interesting how much better the proposed method performs: it gives a very keen indication of the existence of the sinusoid.

Example 2

In this example, we consider that the spectral powers are in the same frequency range as before, and that the background noise term
is white noise plus a single fixed sinusoid at $f_k = 0.215$:

$$ N_k = \begin{cases} 1.05 & (f_k = 0.215) \\ 0.05 & \text{elsewhere.} \end{cases} \tag{15} $$

The signal term is another fixed sinusoid, of value 1 at its frequency and 0 elsewhere (16). [The frequency value in (16) is not legible in the scanned original.]

The original spectrum is shown in Fig. 5. For the same lags as in the previous example, the autocorrelations are: $R(0) = 9.000$, $R(1) = 1.4544$, $R(2) = -2.7732$, $R(3) = -3.2248$, $R(4) = 0.2032$, and $R(5) = 2.6900$.

Fig. 2. MESA posterior estimate for the first example.
Fig. 3. MCESA posterior estimate for the first example.
Fig. 4. Posterior estimate obtained using the proposed method (first example).
Fig. 5. Original spectrum for the second example.

This time, as a prior spectrum estimate we took

$$ S_k = \begin{cases} 1.07 & (f_k = 0.215) \\ 0.07 & \text{elsewhere} \end{cases} \tag{17} $$

which again has the same shape as $N_k$. The posterior spectrum estimates produced by the three methods are shown in Figs. 6, 7, and 8. Again, the cross-entropy methods perform obviously better than the classical MESA. MCESA and the proposed method seem to lead to similar results.

Fig. 6. MESA posterior estimate for the second example.
Fig. 7. MCESA posterior estimate for the second example.

Example 3

In this example, we take as background noise [expression not legible in the scanned original]. The signal term consists of two sinusoids at fixed frequencies [expression not legible in the scanned original].
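Example 3 is the one that brings SMEM/MEM2 into the comparison. Two remarks made after (11) justify that comparison. First, with a uniform prior, minimizing the cross-entropy is the same as maximizing (7). Second, the normalization to unit area only requires $T$ and $S$ to share the same total. On a discrete 50-point grid like the one in the examples, both remarks reduce to identities, which the sketch below checks. The test spectrum (a $1/f$ floor plus one spike) is an illustrative assumption, not the paper's data.

```python
import math


def cross_entropy(t, s):
    """Discrete analogue of (1): sum_k t_k log(t_k / s_k); bins with t_k = 0 contribute 0."""
    return sum(tk * math.log(tk / sk) for tk, sk in zip(t, s) if tk > 0)


def entropy(t):
    """Discrete analogue of (7): -sum_k t_k log t_k."""
    return -sum(tk * math.log(tk) for tk in t if tk > 0)


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


# Assumed illustrative grid and spectrum: 50 bins at f_k = 0.005, ..., 0.495,
# a 1/f-shaped floor plus one spike standing in for a sinusoid.
freqs = [(k + 0.5) / 100 for k in range(50)]
t = normalize([0.01 / f + (2.0 if k == 10 else 0.0) for k, f in enumerate(freqs)])
uniform = [1.0 / len(freqs)] * len(freqs)
s = normalize([0.02 / f for f in freqs])  # a hypothetical non-uniform prior

# Uniform prior: cross-entropy = log(n) - entropy, so minimizing it maximizes (7).
print(cross_entropy(t, uniform), math.log(len(freqs)) - entropy(t))
# Common scaling by c multiplies the functional by c, so the minimizer is unchanged.
print(cross_entropy([3 * x for x in t], [3 * x for x in s]), 3 * cross_entropy(t, s))
```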
Fig. 8. Posterior estimate using the proposed method (second example).
Fig. 9. Original spectrum for the third example.
Fig. 10. MESA posterior estimate for the third example.
Fig. 11. MCESA posterior estimate for the third example.
Fig. 12. Posterior estimate using the proposed method (third example).

The spectrum of the sum is shown in Fig. 9. The autocorrelation values for the same lags as before are: $R(0) = 19.7511$, $R(1) = 9.3745$, $R(2) = 1.5708$, $R(3) = 5.1321$, $R(4) = 5.0796$, and $R(5) = 2.9573$.
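The autocorrelation values listed for the three examples can be reproduced from line spectra on the 50-point grid. Several parts of this sketch are assumptions rather than statements from the paper:

- the convention $R(\tau) = 2\sum_k T_k \cos(2\pi f_k \tau)$;
- the signal terms whose expressions are not legible in the scan: for Example 1, a value of 2 at $f = 0.105$; for Example 2, a value of 1 at $f = 0.165$; for Example 3, values of 2 at $f = 0.165$ and $0.295$;
- a $1/f$ background for Example 3 that is the same as in Example 1.

These choices were made because, together with the noise terms that are printed, they reproduce every listed $R(\tau)$ to within $10^{-3}$.

```python
import math

# Grid of the examples: 50 frequencies f_k = 0.005, 0.015, ..., 0.495 (cycles per lag).
FREQS = [(k + 0.5) / 100 for k in range(50)]


def autocorr(spectrum, lags=6):
    """R(tau) = 2 * sum_k T_k cos(2 pi f_k tau) for a one-sided line spectrum T_k.

    The factor 2 (one-sided spectrum with an even extension) is an assumption,
    chosen because it reproduces the printed values.
    """
    return [2.0 * sum(tk * math.cos(2.0 * math.pi * f * tau) for tk, f in zip(spectrum, FREQS))
            for tau in range(lags)]


def line(index, value):
    """A fixed sinusoid: `value` at grid index `index`, 0 elsewhere."""
    return [value if k == index else 0.0 for k in range(len(FREQS))]


def add(*spectra):
    return [sum(vals) for vals in zip(*spectra)]


one_over_f = [0.01 / f for f in FREQS]  # printed 1/f noise of Example 1

# Example 1: 1/f noise + inferred sinusoid of value 2 at f = 0.105 (index 10).
example1 = add(one_over_f, line(10, 2.0))
# Example 2: printed background (0.05 white + 1 at f = 0.215, index 21)
# + inferred signal of value 1 at f = 0.165 (index 16).
example2 = add([0.05] * len(FREQS), line(21, 1.0), line(16, 1.0))
# Example 3: assumed 1/f background + inferred sinusoids of value 2 at 0.165 and 0.295.
example3 = add(one_over_f, line(16, 2.0), line(29, 2.0))

PRINTED = {
    "example1": [15.7511, 11.6149, 7.8699, 4.5411, 2.0145, 1.1413],
    "example2": [9.000, 1.4544, -2.7732, -3.2248, 0.2032, 2.6900],
    "example3": [19.7511, 9.3745, 1.5708, 5.1321, 5.0796, 2.9573],
}

for name, spec in (("example1", example1), ("example2", example2), ("example3", example3)):
    print(name, [round(v, 4) for v in autocorr(spec)])
```

The white part of the Example 2 background adds only to $R(0)$. The cosines of the 50 grid frequencies sum to zero at every nonzero integer lag.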
Taking the same prior spectrum estimate as in (14), we arrive at the posterior estimates shown in Figs. 10, 11, and 12. Fig. 13 shows the performance of the SMEM or MEM2 method. Here the superiority of the proposed method over all the existing methods is obvious: none of the other methods is able to resolve the two sinusoids.

Fig. 13. SMEM or MEM2 posterior estimate for the third example.

CONCLUSIONS

Since both MCESA and our method appear to perform better than MESA, we limit our conclusions to comparisons between them. The proposed method has no restrictions on its applicability. It also enjoys a pedagogical advantage, as it is a simple extension of the original minimum cross-entropy principle to spectral densities. Its solution is the same as that for pdf's, and no new algorithms are needed for its implementation. It appears to perform as well as Shore's method in white noise, and definitely better in the presence of $1/f$ noise.

MCESA, as proposed by Shore, suffers from restrictions on the nature of the process [5, equation (7)] and possible dependence on the form of the assumed prior pdf [5, equation (8)]. These restrictions can be removed if MCESA is viewed as a subcase of Itakura-Saito distortion measure minimization [5], but it then loses the meaning of a cross-entropy method.

Both methods need a prior guess at the spectrum for their implementation. If this spectrum is assumed white, Shore's method leads to Burg's MESA, and ours leads to the second form of maximum entropy spectral analysis [based on (7)], referred to as MEM2. Of course, MCESA does so only under restrictions, unless it is viewed in terms of the Itakura-Saito distortion measure mentioned above.
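The conclusions note that the proposed method needs no new algorithms. Its solution has the exponential form (11), and the examples obtain the multipliers with a Newton-Raphson algorithm. The sketch below shows one way to do that on the discrete 50-point grid of the examples. It is not the authors' code, and it makes these assumptions:

- the constraints are written as cosine moments normalized by $R(0)$, a discrete analogue of (10);
- the constant $-1$ of (11) is folded into $\lambda_0$;
- Newton-Raphson is applied to the convex dual, with step halving added for robustness.

`solve_linear` and `mce_spectrum` are hypothetical names.

```python
import math


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mce_spectrum(prior, freqs, R, tol=1e-10, max_iter=200):
    """Discrete sketch of the minimum cross-entropy posterior, unit lag spacing.

    Returns (T, lam) with T_k = s_k exp(-sum_tau lam_tau cos(2 pi f_k tau)), where
    s is the prior normalized to unit sum and sum_k T_k cos(2 pi f_k tau) = R(tau)/R(0)
    for tau = 0..N. lam[0] absorbs the constant -1 of (11).
    """
    total = sum(prior)
    s = [v / total for v in prior]
    r = [v / R[0] for v in R]
    lags = range(len(R))
    c = [[math.cos(2.0 * math.pi * f * tau) for tau in lags] for f in freqs]

    def posterior(lam):
        expo = [-sum(l * ck for l, ck in zip(lam, row)) for row in c]
        if max(expo) > 700.0:  # math.exp would overflow
            return None
        return [sk * math.exp(e) for sk, e in zip(s, expo)]

    def dual(lam):
        # Convex dual of the cross-entropy problem; its gradient is r minus the moments of T.
        p = posterior(lam)
        return math.inf if p is None else sum(p) + sum(l * rt for l, rt in zip(lam, r))

    lam = [0.0] * len(R)
    for _ in range(max_iter):
        p = posterior(lam)
        resid = [sum(pk * row[t] for pk, row in zip(p, c)) - r[t] for t in lags]
        if max(abs(v) for v in resid) < tol:
            break
        hess = [[sum(pk * row[i] * row[j] for pk, row in zip(p, c)) for j in lags] for i in lags]
        step = solve_linear(hess, resid)  # Newton-Raphson step on the dual
        g0, t = dual(lam), 1.0
        while True:  # halve the step until the dual does not increase
            trial = [l + t * d for l, d in zip(lam, step)]
            if dual(trial) <= g0 + 1e-12 * abs(g0) or t < 1e-12:
                break
            t *= 0.5
        lam = trial
    return posterior(lam), lam


if __name__ == "__main__":
    freqs = [(k + 0.5) / 100 for k in range(50)]
    R = [9.000, 1.4544, -2.7732, -3.2248, 0.2032, 2.6900]   # Example 2 autocorrelations
    prior = [1.07 if k == 21 else 0.07 for k in range(50)]  # prior (17); index 21 is f = 0.215
    T, lam = mce_spectrum(prior, freqs, R)
    print(max(T), freqs[T.index(max(T))])
```

With `prior = [1.0] * 50`, the same call returns the MEM2 estimate. This matches the remark after (11) that a uniform prior turns the method into SMEM/MEM2. The step halving is a design choice made here for safety. The paper states only that a Newton-Raphson algorithm was used.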
REFERENCES

[1] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[2] J. E. Shore and R. W. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Trans. Inform. Theory, vol. IT-26, pp. 26-37, Jan. 1980.
[3] E. T. Jaynes, "Information theory and statistical mechanics I," Phys. Rev., vol. 106, pp. 620-630, 1957.
[4] E. T. Jaynes, "Prior probabilities," IEEE Trans. Syst. Sci. Cybern., vol. SSC-4, pp. 227-241, 1968.
[5] J. E. Shore, "Minimum cross-entropy spectral analysis," Naval Res. Lab., Washington, DC, NRL Memo. Rep. 3921, Jan. 1979.
[6] J. E. Shore, "Minimum cross-entropy spectral analysis," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 230-237, Apr. 1981.
[7] J. P. Burg, "Maximum entropy spectral analysis," presented at the 37th Annu. Meet. Soc. Exploration Geophys., Oklahoma City, OK, 1967.
[8] J. P. Burg, "Maximum entropy spectral analysis," Ph.D. dissertation, Stanford Univ., Stanford, CA, 1978 (Univ. Microfilms No. 72-25, 499).
[9] N. S. Tzannes and T. G. Avgeris, "A new approach to the estimation of continuous spectra," Kybernetes, vol. 10, pp. 123-133, 1981.
[10] D. Middleton, An Introduction to Statistical Communication Theory. New York: McGraw-Hill, 1960, sect. 3.2-3.
[11] A. Papoulis, Probability, Random Variables and Stochastic Processes. New York: McGraw-Hill, 1965, ch. 10.
[12] B. R. Frieden, "Restoring with maximum likelihood and maximum entropy," J. Opt. Soc. Amer., vol. 62, pp. 511-518, Apr. 1972.
[13] S. F. Gull and G. J. Daniell, "Image reconstruction from incomplete and noisy data," Nature, vol. 272, pp. 686-690, 1978.
[14] N. L. Wu, "An explicit solution and data extension in the maximum entropy method," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 486-491, Apr. 1983.
[15] P. Z. Peebles, Jr., Communication System Principles. Reading, MA: Addison-Wesley, 1976, sect. 6.6.
[16] A. Papoulis, "Random modulation: A review," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 96-105, Feb. 1983.

Human Speaker Recognition Performance of LPC Voice Processors

Z. UZDY

Abstract: Immediate identification of speakers' voices can be highly important to efficient communication in certain applications. This correspondence describes an experimental investigation to determine the human speaker recognition performance of LPC voice processors. A small group of coworkers was used as the test subjects. The test results indicate the importance of high-frequency data bandwidth for speaker recognition.

Manuscript received March 10, 1982; revised October 31, 1983, September 10, 1984, and November 10, 1984.
The author is with the Aerospace Corporation, El Segundo, CA 90245.
0096-3518/85/0600-0752$01.00 © 1985 IEEE

INTRODUCTION

An experimental investigation was carried out to determine the speech performance of currently available commercial voice processors using linear predictive coding (LPC). The areas investigated were speech intelligibility and human speaker recognition. Two voice processors were available for testing, designated A and B. Their specifications relevant to speech performance are shown in Table I.

The well-established diagnostic rhyme test (DRT) [1] was used to evaluate intelligibility performance under various operating conditions. No statistically significant differences in the intelligibility scores were found between the two voice processors. The situation was different, however, for human speaker recognition. The test procedure developed to measure speaker recognition performance, and the results obtained, form the subject of this correspondence.

SPEAKER RECOGNITION TESTING

A considerable number of laboratory studies have investigated different aspects of human speaker recognition. One such study [2] attempted to determine the number and nature of the basic ways in which a typical listener perceives voices to differ from each other. Compton [3] investigated the effects of filtering and vocal duration on the identification of speakers. Bricker and Pruzansky [4] determined the influence of the duration and content of speech samples on talker identification. However, test procedures giving a quantitative measure of the speaker recognition performance of voice processors were not readily available. A simple experimental procedure was therefore developed in-house; it requires no equipment besides the voice processors to be tested.

The test setup involved connecting two voice processors of the same type back to back, enabling them to communicate with each other in the two-way mode. Because of the digital data format and the relatively high bit-error-rate tolerance of LPC voice processors, the length of the test transmission path is not critical. Back-to-back testing is therefore generally representative of long-distance communication. Voice processors A and B were each equipped with their own telephone handsets incorporating high-quality dynamic microphones. The handsets were used for communicating between speaker and listener during the tests, simulating actual use.

Most potential users of voice processors are accustomed to telephone-quality voice, either through the telephone system itself or through high data rate voice systems. The low voice-processor data rate (2400 bits/s), however, does not provide full telephone fidelity. One aim of our work was to determine to what extent existing speaker recognition skills can be adapted to the LPC voice-processor system. Another aim was to determine whether there was a significant difference between the speaker recognition performance of the two units.

When conducting speaker recognition tests, it is desirable to use a team of co-workers as the test subjects, since their existing telephone speaker recognition skills form a well-developed baseline. The test team consisted of seven subjects, divided into one listener and a speaker pool of six. Each subject assumed the role of listener on a rotating basis, so a test session comprised seven individual tests. The tests were carried out with the listener isolated from the speakers.

The test material consisted of six conversational utterances. The first utterance was a single word, and each succeeding phrase was one word longer than the preceding one. Presenting test phrases in this manner provided speech material of increasing duration. Using the same six phrases repeatedly avoided introducing semantic variables. Since identification is aided by increasing speech duration, the phrases were applied in order from shortest to longest. To avoid learning bias, the listening crew was trained on voice processor A until further practice produced no apparent improvement, and only then were comparisons made. This procedure is consistent with the one described by Egan [5] on the importance of training inexperienced listeners. The number of test sessions was limited to two per day to minimize fatigue and boredom, which can introduce bias by lowering scores. In addition, the 9 A.M. and 2 P.M. schedule ensured that both processors were subject to the same regular test regimen.

DISCUSSION OF RESULTS

The speaker recognition test scores with the six phrases as the stimulus are shown in Fig. 1. The average scores for the test team are shown by the bars; they are expressed as the probability of recognizing randomly selected speakers (from a pool of six members)
