Pitch Extraction
For each value of delay, the computation is made over an integrating window of N samples. To generate the entire range of delays, the window is "cross-differenced" with the full analysis interval. An advantage of this method is that the relative sizes of the nulls tend to remain constant as a function of delay, because there is always full overlap of data between the two segments being cross-differenced.

Also, to reduce the effects of the formant structure on the detailed shape of the short-time autocorrelation function, nonlinear processing is usually applied in pitch tracking. The effect of center-clipping and infinite-peak-clipping is clearly shown in Fig. 1 (a, b, c). From Fig. 1 (b), after center-clipping the autocorrelation leaves only a few pulses, which shows the reduction of the confusing secondary peaks. From Fig. 1 (c), the first peak is very clear, and the secondary peak value is reduced. All of this shows that center-clipping and infinite-clipping are effective in reducing the effects of the formant structure [2,3,5].
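As an illustration of this nonlinear preprocessing, the following sketch (Python/NumPy; the synthetic test signal and the rule CL = 60% of the peak level are assumptions for the example, not the paper's exact settings) applies center-clipping and infinite-peak-clipping, then picks the pitch lag from the autocorrelation of the clipped signal:

```python
import numpy as np

def center_clip(x, cl):
    """Center-clipping: zero out samples with |x| <= CL,
    shift the remaining samples toward zero by CL."""
    y = np.zeros_like(x, dtype=float)
    y[x > cl] = x[x > cl] - cl
    y[x < -cl] = x[x < -cl] + cl
    return y

def infinite_peak_clip(x, cl):
    """Infinite-peak-clipping (3-level clipping): output is -1, 0, or +1."""
    return np.sign(center_clip(x, cl))

fs = 10000                                  # 10 kHz sampling rate, as in the text
n = np.arange(400)
# 125 Hz fundamental (period = 80 samples) plus a 5th-harmonic "formant" ripple
x = np.sin(np.pi * n / 40) + 0.3 * np.sin(np.pi * n / 8)
cl = 0.6 * np.max(np.abs(x))                # CL = 60% of peak level (assumed rule)
y = infinite_peak_clip(x, cl)

# Autocorrelation of the clipped signal: the pitch peak survives clipping.
r = np.correlate(y, y, mode="full")[len(y) - 1:]
lag = 20 + int(np.argmax(r[20:141]))        # search lags of 20-140 samples
# lag == 80 samples -> F0 = fs / lag = 125 Hz
```

For this synthetic signal, the clipped waveform keeps only short +1/-1 pulses around the major peaks, yet its autocorrelation still peaks at a lag of 80 samples, i.e. fs/80 = 125 Hz, illustrating the suppression of formant-induced secondary peaks discussed above.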
The pitch contour is fitted with discrete orthonormal polynomials; the second- and third-degree polynomials are

$$\Phi_2\left(\frac{i}{N}\right)=\sqrt{\frac{180N^3}{(N-1)(N+2)(N+3)}}\left[\left(\frac{i}{N}\right)^2-\frac{i}{N}+\frac{N-1}{6N}\right]$$

$$\Phi_3\left(\frac{i}{N}\right)=\sqrt{\frac{2800N^5}{(N-1)(N-2)(N+2)(N+3)(N+4)}}\left[\left(\frac{i}{N}\right)^3-\frac{3}{2}\left(\frac{i}{N}\right)^2+\frac{6N^2-3N+2}{10N^2}\cdot\frac{i}{N}-\frac{(N-1)(N-2)}{20N^2}\right]$$

The pitch contour f(i/N) is approximated as in (13), where the fitting coefficients are

$$a_j=\frac{1}{N+1}\sum_{i=0}^{N}f\left(\frac{i}{N}\right)\Phi_j\left(\frac{i}{N}\right)$$

Fig. 2. Block diagram of the pitch detection algorithm using the modified autocorrelation method (blocks: calculate CL; center-clipping (C) and infinite-peak-clipping (V); pitch value).
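To make the fit concrete, the following numerical sketch (NumPy; N = 40 and the contour f are hypothetical choices for the example) implements Φ2 and Φ3 as written above, checks that they are orthonormal under the 1/(N+1) inner product implied by the coefficient formula, and computes one fitting coefficient:

```python
import numpy as np

def phi2(i, N):
    """Degree-2 discrete orthonormal polynomial evaluated at i/N."""
    x = i / N
    c = np.sqrt(180.0 * N**3 / ((N - 1) * (N + 2) * (N + 3)))
    return c * (x**2 - x + (N - 1) / (6.0 * N))

def phi3(i, N):
    """Degree-3 discrete orthonormal polynomial evaluated at i/N."""
    x = i / N
    c = np.sqrt(2800.0 * N**5 /
                ((N - 1) * (N - 2) * (N + 2) * (N + 3) * (N + 4)))
    return c * (x**3 - 1.5 * x**2
                + (6.0 * N**2 - 3 * N + 2) / (10.0 * N**2) * x
                - (N - 1) * (N - 2) / (20.0 * N**2))

N = 40                        # hypothetical: N+1 pitch samples per segment
i = np.arange(N + 1)

# Orthonormal under the 1/(N+1) inner product used by the a_j formula:
assert abs(np.sum(phi2(i, N)**2) / (N + 1) - 1) < 1e-8
assert abs(np.sum(phi3(i, N)**2) / (N + 1) - 1) < 1e-8
assert abs(np.sum(phi2(i, N) * phi3(i, N)) / (N + 1)) < 1e-8

# Fitting coefficient a_2 for a hypothetical pitch contour f(i/N):
f = 120.0 + 30.0 * (i / N) - 25.0 * (i / N)**2
a2 = np.sum(f * phi2(i, N)) / (N + 1)
```

Because the quadratic term of this f is -25 (i/N)^2 and Φ2 is the only basis function containing (i/N)^2, a2 comes out to -25 divided by Φ2's normalization constant (about -1.95 here), which is the orthogonal-projection property that makes the coefficients independent of one another.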
The reconstructed pitch contour will not lose much information, since orthogonal polynomials up to degree three are used to fit it [7,8].

3.2 AMDF

We only implement a coarse quantization; we leave the voiced/unvoiced detection and the decision logic as future work. Fig. 3 shows a block diagram of the AMDF pitch detector. The speech signal is initially sampled at 10 kHz. The signal then passes through a low-pass filter (0-900 Hz), and the first 20 samples are set to zero. The clipping threshold is then calculated, and center-clipping is applied to the signal. The average magnitude difference function is then computed on the center-clipped speech signal at lags of 20-140 samples, through the signal from samples 20 to 160. The pitch period is identified as the value of the lag at which the minimum of the AMDF occurs. Thus a fairly coarse quantization is obtained for the pitch period.

Fig. 3. Block diagram of the AMDF pitch detector (blocks: segment; LPF (0-900 Hz); clipping threshold CL; AMDF; pitch period).

4.2 Autocorrelation and AMDF on Continuous Speech

To observe the difference between the AMDF and autocorrelation methods, we test both of them on the Thai continuous digit string "07229", which is shown in Fig. 5. The pitch is shown in Fig. 7. From the figure, the pitch information mainly lies in the voiced parts of the speech signal. In the silence parts the pitch shows large variation, while in the voiced parts the pitch track is continuous and smooth. The voiced/unvoiced decision is thus proven to be a very important part of pitch detection. Also, although the pitch track shown in Fig. 6 can describe the trend of the pitch, there are still some error points that need further processing, that is, smoothing. Fig. 7 shows the results for both the autocorrelation method and the AMDF; we can see that both methods give acceptable results.
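The AMDF computation of Sec. 3.2 can be sketched as follows (Python/NumPy; the 125 Hz sinusoid stands in for real center-clipped speech, while the lag range of 20-140 samples and the analysis span of samples 20-160 follow the text):

```python
import numpy as np

def amdf_pitch(x, lag_min=20, lag_max=140, n0=20, n1=160):
    """Average magnitude difference function D(k) = mean |x[n] - x[n+k]|
    over the analysis span; the pitch period is the lag of the minimum."""
    lags = np.arange(lag_min, lag_max + 1)
    d = np.array([np.mean(np.abs(x[n0:n1] - x[n0 + k:n1 + k])) for k in lags])
    return lags[np.argmin(d)], d

fs = 10000                                # 10 kHz sampling rate
n = np.arange(400)
x = np.sin(2 * np.pi * 125 * n / fs)      # 125 Hz -> period of 80 samples
period, d = amdf_pitch(x)
f0 = fs / period                          # 125.0 Hz
```

Unlike the autocorrelation, the AMDF produces a deep null (rather than a peak) at the pitch period, which is why the minimum, not the maximum, is selected.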
From Table 1, we can see that the loss of accuracy mainly lies in the confusion between the 1st tone and the 4th tone. The reason for this result may lie in the contours of the five Thai tones, which are shown in Fig. 14, and in the effects of continuous speech.

Fig. 14. Average F0 contours of the five Thai tones produced in isolation (adapted from [9]).

From here, we can see that the initial levels of tone 1 and tone 4 are similar. Also, because of the continuity effects of speech, the tone contour cannot reach the final level for tone 4, which lets tone 1 end at a higher level than in the isolated case. Moreover, only 4 features are used in the classification, so it is possible that the accuracy would be improved if more features were added.

5. CONCLUSIONS

The work described here covers two pitch detection algorithms and the related techniques, including preprocessing, post-processing, and extraction of the pitch pattern. From our observation of the experiments, we found that both the autocorrelation method and the AMDF algorithm can provide acceptable results. Through the observation of preprocessing

6. REFERENCES

[1]. L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, and C. A. McGonegal, "A comparative performance study of several pitch detection algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 399-417, 1976.
[2]. M. M. Sondhi, "New methods of pitch extraction," IEEE Trans. Audio Electroacoust., vol. AU-16, pp. 262-266, June 1968.
[3]. Yi Kechu, Tian Fu, and Fu Qiang, Yu Yin Xin Hao Chu Li (Speech Signal Processing), China Machine Press, Beijing, 2000.
[4]. M. J. Ross, H. L. Shaffer, A. Cohen, R. Freudberg, and H. J. Manley, "Average magnitude difference function pitch extractor," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-22, pp. 353-362, Oct. 1974.
[5]. L. R. Rabiner, "On the use of autocorrelation analysis for pitch detection," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, no. 1, 1977.
[6]. X. Huang, A. Acero, and H. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall, 2001.
[7]. S.-H. Chen and Y.-R. Wang, "Vector quantization of pitch information in Mandarin speech," IEEE Trans. Communications, vol. 38, no. 9, pp. 1317-1320, 1990.
[8]. C. Wang, "Prosodic Modeling for Improved Speech Recognition and Understanding," Ph.D. dissertation, MIT, June 2001.
[9]. S. Potisuk, M. P. Harper, and J. Gandour, "Classification of Thai tone sequences in syllable-segmented speech using the analysis-by-synthesis method," IEEE Trans. Speech and Audio Processing, vol. 7, no. 1, pp. 95-102, 1999.