A General Method of Minimum Cross-Entropy Spectral Estimation
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-33, NO. 3, JUNE 1985
MICHAEL A. TZANNES, DIMITRIS POLITIS, AND N. S. TZANNES

Abstract-A new, general approach to estimating power spectra is presented, based on minimizing a cross-entropy functional involving

Manuscript received December 2, 1983; revised July 26, 1984. A first version of this paper was presented at the Measurement and Control Conference (MECO), August 29-September 2, 1983, Athens, Greece.
M. A. Tzannes is with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI.
D. Politis is with the Department of Electrical Engineering, Rensselaer Polytechnic Institute, Troy, NY.
N. S. Tzannes is with the Department of Electrical Engineering, University of Virginia, Charlottesville, VA 22901.

INTRODUCTION
The minimization of the cross-entropy functional

THE PROPOSED METHOD
We now present a new approach to the estimation of power spectra using the minimum cross-entropy principle. This approach does not assume any specific pdf structure for the random process involved. What it does require is that spectral densities must be treatable as pdf's, and we take this issue up immediately below.
The idea that the nonnegative power spectrum can be normalized to have an area of one, and that it can therefore be visualized as the pdf of a random variable $\omega$ (frequency), has been circulating for some time [9], [10]. In [11] it is established that the Fourier transform pair $R(\tau) \leftrightarrow S(\omega)$ has the same properties as those of the pair characteristic function $\leftrightarrow$ pdf of a random variable (r.v.), differing only by a scaling factor.
Further justification for the above issue is provided by the following.
A known problem in signal theory [11] is that of forming the real process

$$X(t) = a \cos(\omega t + \varphi) \tag{2}$$

where $\varphi$ is a given r.v. whose characteristic function obeys $\Phi(1) = \Phi(2) = 0$ and $\omega$ an r.v. independent of $\varphi$ with an even density function $f(\omega)$, and proving that $X(t)$ is wide sense stationary and has power spectrum $S(\omega) = \pi a^2 f(\omega)$. The crucial point to be made here is that the power spectrum is identical to an underlying pdf of the random process (multiplied by a constant).
The above example is of theoretical interest only in our spectral estimation problem because of the severe restriction that the random process must be of the form (2), obviously not the general case.
We now extend the above to include the general case by showing that the normalized spectral density of a wide sense stationary random process is equivalent to the pdf of its instantaneous frequency (multiplied by a constant).
We support that every bounded stochastic process (which is the practical case) can be regarded as the phase modulation of another fully determinable process. This can be understood as follows. Every bounded deterministic signal $x(t)$ is related in a one-to-one manner to another bounded signal $\varphi(t)$ through the transformation

$$x(t) = A \sin \varphi(t) \qquad (|x(t)| \le A,\ |\varphi(t)| \le \pi/2,\ \forall t). \tag{3}$$
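The one-to-one character of transformation (3) can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names `signal_to_phase` and `phase_to_signal`, the bound `A = 1.0`, and the test signal are all assumptions chosen for illustration.

```python
import numpy as np

def signal_to_phase(x, A):
    """Map a bounded signal (|x| <= A) to the phase phi with x = A*sin(phi).

    The returned phase lies in [-pi/2, pi/2], where sin is invertible.
    """
    x = np.asarray(x, dtype=float)
    if np.any(np.abs(x) > A):
        raise ValueError("signal exceeds the bound A")
    return np.arcsin(x / A)

def phase_to_signal(phi, A):
    """Inverse map: recover x(t) = A*sin(phi(t))."""
    return A * np.sin(phi)

# Hypothetical bounded test signal (two sinusoids, peak magnitude <= 0.9).
t = np.linspace(0.0, 1.0, 500)
x = 0.7 * np.cos(2 * np.pi * 5 * t) + 0.2 * np.sin(2 * np.pi * 12 * t)
A = 1.0

phi = signal_to_phase(x, A)
x_back = phase_to_signal(phi, A)
```

Because the phase is restricted to $[-\pi/2, \pi/2]$, the arcsine is single valued, which is what makes the correspondence between $x(t)$ and $\varphi(t)$ one-to-one in this sketch.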
This is just the autocorrelation function-power spectrum Fourier transform pair, with the $2R(0)$ factor serving normalization purposes.
To find $T(\omega)$ so that it is minimized and (10) is satisfied, standard techniques from the calculus of variations are used. We introduce the Lagrangian multipliers $\lambda_k$ to the constraints (9) and the min-

forms, giving a very keen indication of the existence of the sinusoid.

Example 2
In this example, we consider that the spectral powers are at the same frequency range as before, and that the background noise term
Fig. 3. MCESA posterior estimate for the first example.
Fig. 4. Posterior estimate obtained using the proposed method (first example).
Fig. 5. Original spectrum for the second example.
Fig. 6. MESA posterior estimate for the second example.
Fig. 7. MCESA posterior estimate for the second example.
is white noise plus a single fixed sinusoid at $f_k = 0.215$:

$$\begin{cases} 1.05, & f_k = 0.215 \\ 0.05, & \text{elsewhere.} \end{cases}$$

The signal term is another fixed sinusoid

$$S_k = \begin{cases} 1.07, & f_k = 0.215 \\ 0.07, & \text{elsewhere} \end{cases}$$

which is again of the same shape as $G_k$.
The posterior spectrum estimates are shown in Figs. 6, 7, and 8. Again, the cross-entropy methods perform obviously better than the classical MESA. MCESA and the proposed method seem to lead to similar results.

Example 3
In this example, we take as background noise
Fig. 8. Posterior estimate using the proposed method (second example).
Fig. 9. Original spectrum for the third example.
Fig. 10. MESA posterior estimate for the third example.
Fig. 11. MCESA posterior estimate for the third example.
Fig. 12. Posterior estimate using the proposed method (third example).
The spectrum of the sum is shown in Fig. 9. The autocorrelation values for the same lags as before are: R(0) = 19.7511, R(1) = 9.3745, R(2) = 1.5708, R(3) = 5.1321, R(4) = 5.0796, and R(5) = 2.9573.
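As a point of reference for the MESA baseline, the maximum entropy spectrum for a set of known autocorrelation lags coincides with the autoregressive spectrum obtained from the Yule-Walker equations. The sketch below applies that standard construction to the six values listed above. It is an illustration under stated assumptions, not the authors' code: the function names `yule_walker` and `ar_spectrum` are hypothetical, frequencies are taken in cycles per sample on [0, 0.5], and no claim is made that the result reproduces Fig. 10.

```python
import numpy as np

def yule_walker(r):
    """Fit an AR(p) model to autocorrelation lags r[0..p].

    Convention (assumed here): x[n] + a[0]*x[n-1] + ... + a[p-1]*x[n-p] = e[n].
    Returns the coefficients a and the prediction-error variance sigma2.
    """
    r = np.asarray(r, dtype=float)
    p = len(r) - 1
    # Toeplitz autocorrelation matrix with entries R[i, j] = r[|i - j|].
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    R = r[idx]
    a = np.linalg.solve(R, -r[1:])
    sigma2 = r[0] + a @ r[1:]
    return a, sigma2

def ar_spectrum(a, sigma2, freqs):
    """Evaluate sigma2 / |1 + sum_k a_k exp(-j 2 pi f k)|^2 on the given grid."""
    k = np.arange(1, len(a) + 1)
    denom = 1.0 + np.exp(-2j * np.pi * np.outer(freqs, k)) @ a
    return sigma2 / np.abs(denom) ** 2

# Autocorrelation values quoted in the text for the third example.
r = [19.7511, 9.3745, 1.5708, 5.1321, 5.0796, 2.9573]
a, sigma2 = yule_walker(r)

freqs = np.linspace(0.0, 0.5, 256)  # cycles per sample (assumed units)
P = ar_spectrum(a, sigma2, freqs)
```

A direct linear solve is used instead of the Levinson recursion only to keep the sketch short; for six lags the difference in cost is immaterial.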
Taking the same prior spectrum estimate as in (14) we arrive at the posterior estimates shown in Figs. 10, 11, and 12. Fig. 13 shows us the performance of the SMEM or MEM2 method. Here the superiority of the proposed method over all the existing methods is obvious. None of the other methods are able to resolve the two sinusoids.

Fig. 13. SMEM or MEM2 posterior estimate for the third example.

CONCLUSIONS
Since both MCESA and our method appear to perform better than MESA, we limit our conclusions to comparisons between them.
The proposed method has no restrictions on its applicability and enjoys a pedagogical advantage as it is a simple extension of the original minimum cross-entropy principle to spectral densities. Its solution is the same as that for pdf's, and no new algorithms are needed for its implementation. It appears to perform as well as Shore's in white noise, and definitely better in the presence of 1/f noise.
MCESA, as proposed by Shore, suffers from restrictions on the nature of the process [5, equation (7)] and possible dependence on the form of the assumed prior pdf [5, equation (8)]. These can be removed if it is viewed as a subcase of Itakura-Saito distortion measure minimization [5], but then it loses the meaning of a cross-entropy method.
Both methods need a prior guess at the spectrum for their implementation. If this spectrum is assumed white, Shore's method leads to Burg's MESA, and ours to the second form of maximum entropy spectral analysis [based on (7)] referred to as MEM2. Of course, MCESA does so under restrictions unless it is viewed in terms of the Itakura-Saito distortion measure mentioned above.
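The remark that the solution is the same as that for pdf's can be illustrated with the standard discrete minimum cross-entropy problem: given a prior $p$ and expectation constraints $\sum_i q_i g_k(i) = c_k$, the minimizer has the exponential form $q_i \propto p_i \exp(-\sum_k \lambda_k g_k(i))$, and the multipliers can be found by minimizing a convex dual. The sketch below is a hedged illustration only. The constraint functions $g_k(f) = \cos(2\pi f k)$, the frequency grid, the prior shape (borrowed from the noise values of Example 2), the extra line at $f = 0.30$, and the name `min_cross_entropy` are all assumptions; the paper's own equations (8)-(10) are not reproduced in this excerpt and may differ.

```python
import numpy as np

def min_cross_entropy(prior, G, c, iters=200, tol=1e-10):
    """Minimize sum(q*log(q/p)) subject to G @ q = c and sum(q) = 1.

    prior: positive weights, shape (N,); G: constraint functions, shape (M, N);
    c: target expectations, shape (M,). Uses damped Newton steps on the dual.
    """
    p = prior / prior.sum()
    lam = np.zeros(G.shape[0])

    def posterior(lam):
        z = -(lam @ G)
        w = p * np.exp(z - z.max())      # shift by max for numerical stability
        return w / w.sum()

    def dual(lam):
        z = -(lam @ G)
        m = z.max()
        return m + np.log(np.sum(p * np.exp(z - m))) + lam @ c

    for _ in range(iters):
        q = posterior(lam)
        mean = G @ q
        grad = c - mean                   # gradient of the dual
        if np.max(np.abs(grad)) < tol:
            break
        cov = (G * q) @ G.T - np.outer(mean, mean)   # Hessian of the dual
        step = np.linalg.solve(cov + 1e-12 * np.eye(len(c)), -grad)
        t, f0 = 1.0, dual(lam)
        # Backtracking (Armijo) line search keeps the iteration stable.
        while dual(lam + t * step) > f0 + 1e-4 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
        lam = lam + t * step
    return posterior(lam), lam

# Hypothetical setup on a frequency grid in cycles per sample.
f = np.linspace(0.0, 0.5, 101)
prior = np.full_like(f, 0.05)
prior[np.argmin(np.abs(f - 0.215))] = 1.05           # prior line at f = 0.215
truth = prior.copy()
truth[np.argmin(np.abs(f - 0.30))] += 1.0            # assumed extra sinusoid
truth /= truth.sum()

lags = np.arange(1, 6)
G = np.cos(2 * np.pi * np.outer(lags, f))            # g_k(f) = cos(2*pi*f*k)
c = G @ truth                                        # normalized "autocorrelations"

q, lam = min_cross_entropy(prior, G, c)
```

The posterior `q` matches the five constraint values while staying as close as possible, in the cross-entropy sense, to the normalized prior; no algorithm beyond the one used for ordinary pdf's is needed in this sketch.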
REFERENCES
[1] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[2] J. E. Shore and R. W. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Trans. Inform. Theory, vol. IT-26, pp. 26-37, Jan. 1980.
[3] E. T. Jaynes, "Information theory and statistical mechanics I," Phys. Rev., vol. 106, pp. 620-630, 1957.
[4] E. T. Jaynes, "Prior probabilities," IEEE Trans. Syst. Sci. Cybern., vol. SSC-4, pp. 227-241, 1968.
[5] J. E. Shore, "Minimum cross-entropy spectral analysis," Naval Res. Lab., Washington, DC, NRL Memo. Rep. 3921, Jan. 1979.
[6] J. E. Shore, "Minimum cross-entropy spectral analysis," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 230-237, Apr. 1981.
[7] J. P. Burg, "Maximum entropy spectral analysis," presented at the 37th Annu. Meet. Soc. Exploration Geophys., Oklahoma City, OK, 1967.
[8] J. P. Burg, "Maximum entropy spectral analysis," Ph.D. dissertation, Stanford Univ., Stanford, CA, 1978 (Univ. Microfilms No. 72-25, 499).
[9] N. S. Tzannes and T. G. Avgeris, "A new approach to the estimation of continuous spectra," Kybernetes, vol. 10, pp. 123-133, 1981.
[10] D. Middleton, An Introduction to Statistical Communications Theory. New York: McGraw-Hill, 1960, sect. 3.2-3.
[11] A. Papoulis, Probability, Random Variables and Stochastic Processes. New York: McGraw-Hill, 1965, ch. 10.
[12] B. R. Frieden, "Restoring with maximum likelihood and maximum entropy," J. Opt. Soc. Amer., vol. 62, pp. 511-518, Apr. 1972.
[13] S. F. Gull and C. J. Daniell, "Image reconstruction from incomplete and noisy data," Nature, vol. 272, pp. 686-690, 1978.
[14] N. L. Wu, "An explicit solution and data extension in the maximum entropy method," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 486-491, Apr. 1983.
[15] P. Z. Peebles, Jr., Communication System Principles. Reading, MA: Addison-Wesley, 1976, sect. 6.6.
[16] A. Papoulis, "Random modulation: A review," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 96-105, Feb. 1983.

Human Speaker Recognition Performance of LPC Voice Processors

Z. UZDY

Abstract-Immediate identification of speakers' voices can be highly important to efficient communication in certain applications. This correspondence describes an experimental investigation to determine the human speaker recognition performance of LPC voice processors. A small group of coworkers were used as the test subjects. The test results indicate the importance of high-frequency data bandwidth for speaker recognition.

Manuscript received March 10, 1982; revised October 31, 1983, September 10, 1984, and November 10, 1984.
The author is with the Aerospace Corporation, El Segundo, CA 90245.

INTRODUCTION
An experimental investigation was carried out to determine the speech performance of currently available commercial voice processors using linear predictive coding (LPC). The areas investigated were speech intelligibility and human speaker recognition. Two voice processors were available for testing, designated by A and B, and their specifications relevant to speech performance are shown in Table I.
The well established diagnostic rhyme test (DRT) [1] was used to evaluate the intelligibility performance under various operating conditions. No statistically significant differences in the intelligibility scores were found between the two voice processors; however, the situation was different for human speaker recognition. The test procedure developed to measure speaker recognition performance, and the results obtained, form the subject of this correspondence.

SPEAKER RECOGNITION TESTING
There are a considerable number of laboratory studies that have investigated different aspects of human speaker recognition. One such study [2] attempted to determine the number and nature of the basic ways in which voices are perceived to differ from each other by a typical listener. Compton [3] investigated the effects of filtering and vocal duration on the identification of speakers. Bricker and Pruzansky [4] determined the influence of the duration and content of speech samples on talker identification. However, test procedures giving some quantitative measure of speaker recognition performance of voice processors were not readily available. A simple experimental procedure was developed in-house requiring no additional equipment besides the voice processors to be tested.
The test setup involved connecting two voice processors of the same type back to back, enabling them to communicate with each other in the two-way mode. (Because of the digital data format and the relatively high bit-error rate tolerance of LPC voice processors, the length of the test transmission path is not critical; back to back testing is generally representative of long distance communication.) Both voice processors A and B were equipped with their own telephone handsets incorporating high-quality dynamic microphones. The handsets were used for communicating between speaker and listener during the tests, simulating actual use.
Most potential users of voice processors are accustomed to telephone quality voice, either through the telephone system itself, or high data rate voice systems. The low voice-processor data rate (2400 bits/s), however, does not provide full telephone fidelity. One aim of our work was to determine to what extent existing speaker recognition skills can be adapted to the LPC voice-processor system; another aim was to determine if there was a significant difference between the speaker recognition performance of the two units.
When conducting speaker recognition tests, it is desirable to use a team of co-workers as the test subjects, since their existing telephone speaker recognition skills form a well developed baseline. The test team consisted of seven subjects, divided into one listener and a speaker pool of six. On a rotating basis, each subject assumed the role of listener so that seven individual tests comprised a test session. The tests were carried out with the listener isolated from the speakers.
The test material consisted of six conversational utterances. The first utterance consisted of a single word, and each succeeding phrase was longer by one word than the preceding. Presenting test phrases in this manner provided speech material of increasing duration, and using the same six phrases repeatedly avoided introducing semantic variables. Since identification is aided by increasing speech duration, the phrases are applied in order of shortest to longest. To avoid learning bias, the listening crew was trained on voice processor A until no improvement was apparent from further practice before comparisons were made. This procedure concurs with that elaborated by Egan [5] on the importance of training inexperienced listeners. The number of test sessions was limited to two per day to minimize fatigue and boredom (which can introduce bias by lowering scores); in addition, the 9 A.M. and 2 P.M. schedule ensured that both processors were subject to the same regular test regimen.

DISCUSSION OF RESULTS
The speaker recognition test scores with the six phrases as the stimulus are shown in Fig. 1. The average scores for the test team are shown by the bars; they are expressed as the probability of recognizing randomly selected speakers (from a pool of six members)