Pitch Tracking - ACF - BABU ARUN KR.

PITCH TRACKING USING AUTOCORRELATION (A.C.F.)
Name Of College-Manoharbhai Patel Institute Of Eng. & Tech.
Babu Arun Kumar,8th Sem, e-mail:[email protected],(9960084971)
Deepak Shukla ,8th Sem, e-mail : [email protected],(9270859348)

Abstract
Pitch estimation is the process of determining the fundamental frequency present in a signal. It is inherently related to the detection and estimation of sinusoids. A variety of methods for tracking the fundamental frequency of harmonic sounds are available in the literature. Some primarily use time-domain analysis, some primarily use frequency-domain analysis, and others use a combination of both.
In this paper we present a widely used pitch tracking method based on the autocorrelation function (ACF). This time-domain method divides the musical signal into frames and calculates the ACF for each frame. The time lag corresponding to the highest ACF peak (other than the peak at zero lag) is reported as the estimated pitch period for that frame.
1. Introduction
Fundamental frequency (f0) estimation, also referred to as pitch tracking, has been a popular research topic for many years, and is still being investigated today. At the 2002 IEEE International Conference on Acoustics, Speech and Signal Processing, there was a full session on f0 estimation. The basic problem is to extract the fundamental frequency (f0) from a sound signal. The f0 is usually the lowest frequency component, or partial, that relates well to most of the other partials. In a periodic waveform, most partials are harmonically related, meaning that the frequencies of most of the partials are related to the frequency of the lowest partial by a small whole-number ratio. Most research in this area goes under the name of pitch tracking, although what is being done is actually f0 estimation. Because the psychological relationship between f0 and pitch is well known, it is not an important distinction to make, although a true pitch tracker should take perceptual models into account and produce a result on a pitch scale rather than a frequency scale.
1.1 Pitch Features
Pitch tracking is one of the most important topics in speech processing. The fundamental tone is the pitch we hear in a melody, and the harmonics give the timbre that distinguishes different instruments. The final tune we hear is the combination of the fundamental tone and its harmonics. Hence, the goal of pitch tracking is to find the fundamental frequency, or F0. In the following, we cover representative methods of pitch tracking in each category.[1] [2]
1.2 Physical Pitch Character
The sensation of frequency is commonly referred to as the pitch of a sound. A high-pitched sound corresponds to a high frequency and a low-pitched sound corresponds to a low frequency. Many people are capable of detecting a difference in frequency between two separate sounds which is as little as 2 Hz. When two sounds with a frequency difference of greater
than 7 Hz are played simultaneously, most people are capable of detecting the presence of a complex wave pattern resulting from the interference and superposition of the two sound waves.
2. Pitch Tracking Algorithms
There are two major categories of pitch tracking algorithms.
2.1 Time Domain
The most basic approach to the problem of f0 estimation is to look at the waveform that represents the change in air pressure over time, and attempt to detect the f0 from that waveform.
2.1.1 Time-Event Rate Detection
There is a family of related time-domain f0 estimation methods which seek to discover how often the waveform fully repeats itself. The theory behind these methods is that if a waveform is periodic, then there are extractable time-repeating events that can be counted, and the number of these events that happen in a second is directly related to the frequency (equivalently, the time between events is inversely related to it). Each of these methods is useful for particular kinds of waveforms. If there is a specific time-event that is known to exist once per period in the waveform, such as a discontinuity in slope or amplitude, it may be identified and counted in the same way as in the other methods.
2.1.2 Zero-crossing rate (ZCR)
Since it was made popular in [20], the utility of the zero-crossing rate has often been in doubt, but lately it has been revived. Put simply, the ZCR is a measure of how often the waveform crosses zero per unit time. The idea is that the ZCR gives information about the spectral content of the waveform. One of the first things that researchers used the ZCR for was f0 estimation. The thought was that the ZCR should be directly related to the number of times the waveform repeats per unit time. It soon became clear that there are problems with this measure of f0. If the spectral power of the waveform is concentrated around f0, then it will cross the zero line twice per cycle, as in Figure 5a. However, if the waveform contains higher-frequency spectral components, as in Figure 5b, then it might cross the zero line more than twice per cycle. A ZCR f0 detector could be developed with initial filtering to remove the higher partials that contaminate the measurement, but the cutoff frequency needs to be chosen carefully so as not to remove the f0 partial while removing as much high-frequency information as possible. Another possibility for the ZCR f0 detector would be to detect patterns in the zero-crossings, and hypothesize a value for f0 based on these patterns.
Figure 2.1: Influence of higher harmonics on zero crossing rate.
It has since been shown that the ZCR is an informative feature in and of itself, unrelated to how well it tracks f0. Many researchers have examined statistical features of the ZCR. The ZCR has been used in the context
of f0 estimation, where the mean and the variance of the zero-crossing rate were calculated to increase the robustness of a feature extractor. The feature is used to track the constancy of the f0 across time frames. If the waveform is steady-state or slowly varying, as is the case in most pseudo-periodic musical signals, the mean and variance of the ZCR will be consistent over the course of a note, and thus this feature can be used to detect note boundaries, glissando and frequency modulation effects.[3][7]
2.1.3 Peak rate
This method counts the number of positive peaks per second in the waveform. In theory, the waveform will have a maximum value and a minimum value each cycle, and one needs only to count these maximum values (or minimum values) to determine the frequency of the waveform. In practice, a local peak detector must be used to find where the waveform is locally largest, and the number of these local maxima in one second is the frequency of the waveform, unless each period of the waveform contains more than one local maximum. The same alternatives are available for this method as for the zero-crossing rate detector: for example, the distance between the local maxima gives the period (wavelength), which is inversely proportional to the frequency.
Slope event rate. If a waveform is periodic, the slope of the waveform will also be periodic, and peaks or zeros in the slope can be extracted in the same way as the ZCR. In some cases, zeros or peaks in the slope might be more informative than zeros or peaks in the original waveform, or the detection of these events might be more robust, depending on the domain of the signal.
[Figure 1]
ACF (Autocorrelation Function)
AMDF (Average Magnitude Difference Function)
SIFT (Simplified Inverse Filter Tracking)
2.2 Frequency Domain
There is much information in the frequency domain that can be related to the f0 of the signal. Pitched signals tend to be composed of a series of harmonically related partials, which can be identified and used to extract the f0. Many attempts have been made to extract and follow the f0 of a signal in this manner.
2.2.1 Cepstrum Methods
The cepstrum method is a form of spectral analysis where the output is the Fourier transform of the log of the magnitude spectrum of the input waveform. This procedure was developed in an attempt to make a non-linear system more linear. Naturally occurring partials in a frequency spectrum are often slightly inharmonic, and the cepstrum attempts to mediate this effect by using the log spectrum. The name cepstrum comes from reversing the first four letters in the word "spectrum", indicating a modified spectrum. The independent variable related to the cepstrum transform has been called "quefrency", and since this variable is very closely related to time, it is acceptable to refer to it as time. The theory behind this method relies on the fact that the Fourier transform of a pitched signal usually has a number of regularly spaced peaks, representing the harmonic spectrum of the signal. When the log magnitude of a spectrum is taken, these peaks are reduced, their amplitude brought into a usable scale,
and the result is a periodic waveform in the frequency domain, the period of which (the distance between the peaks) is related to the fundamental frequency of the original signal. The Fourier transform of this waveform has a peak at the period of the original waveform. The cepstrum method assumes that the signal has regularly spaced frequency partials. If this is not the case, such as with the inharmonic spectrum of a bell or the single-partial spectrum of a sinusoid, the method will provide erroneous results. As with most other f0 estimation methods, this method is well suited to specific types of signals. It was originally developed for use with speech signals, which are spectrally rich and have evenly spaced partials.
3. Autocorrelation
The correlation between two waveforms is a measure of their similarity. The waveforms are compared at different time intervals, and their "sameness" is calculated at each interval. The result of a correlation is a measure of similarity as a function of the time lag between the beginnings of the two waveforms. The autocorrelation function is the correlation of a waveform with itself. One would expect exact similarity at a time lag of zero, with increasing dissimilarity as the time lag increases. The mathematical definition of the autocorrelation function is shown in Equation 1, for an infinite discrete function x[n], and Equation 2 shows the mathematical definition of the autocorrelation of a finite discrete function x_0[n] of size N, where \tau is the time lag in samples.
$$R_x(\tau) = \sum_{n=-\infty}^{\infty} x[n]\, x[n+\tau] \qquad (1)$$
$$R_{x_0}(\tau) = \sum_{n=0}^{N-1-\tau} x_0[n]\, x_0[n+\tau] \qquad (2)$$
The cross-correlation between two functions x[n] and y[n] is calculated using Equation 3:
$$R_{xy}(\tau) = \sum_{n=-\infty}^{\infty} x[n]\, y[n+\tau] \qquad (3)$$
Periodic waveforms exhibit an interesting autocorrelation characteristic: the autocorrelation function itself is periodic. As the time lag increases to half of the period of the waveform, the correlation decreases to a minimum. This is because the waveform is out of phase with its time-delayed copy. As the time lag increases again to the length of one period, the autocorrelation again increases back to a maximum, because the waveform and its time-delayed copy are in phase. The first peak in the autocorrelation indicates the period of the waveform. Problems with this method arise when the autocorrelation of a harmonically complex, pseudo-periodic waveform is taken. One can imagine the output of an autocorrelation applied to the waveform in Figure 5b. The first peak would not be at the period of the full waveform, but at the period of the 20th harmonic overtone. The first "large" peak would
indeed occur at the fundamental period of the waveform, but having the algorithm try to distinguish between "large" and "small" peaks reduces the robustness and increases the computational complexity.[1][4][8]
3.1 Short-term Autocorrelation
After sampling, we set the frame size to 256 points, with an overlap of 128 points between neighboring frames. Figure 5 shows the short-term autocorrelation based on a frame size of 256 points, a 128-point overlap, and a rectangular window ending at 300.
Figure 5. Short-term autocorrelation before center clipping
3.2 Center Clipping
After computing the short-term autocorrelation, we need to find the local maximum peaks in order to determine the pitch frequency. Before computing the pitch period, we can apply center clipping to remove local maxima that are not high enough. The center clipping function that we used in CBMR is defined as
[Equation: center clipping function]
In the above equation, α and β are set to ψ% and -ψ% of the absolute maximum of s(n), and ψ is an adaptive threshold that can be adjusted by the user. In our CBMR system, ψ is set to 0.3. Figure 6 shows the result of center clipping after autocorrelation.
Figure 6. Center clipping after short-term autocorrelation
3.3 Calculate Each Frame's Frequency
A pitched sound is periodic. In our approach, the pitch period is defined as the average distance between the local maximum peaks. In symbols, the fundamental frequency of a speech or singing signal frame can be derived from the following equation:
$$f_0 = \frac{\text{sample\_rate}}{\text{pitch\_period}}$$
where sample_rate is the sampling rate used when sampling the wave data and pitch_period is the pitch period in samples.
4. Discussion
f0 estimation algorithms tend to be based on a number of fairly strict assumptions:
1. The input waveform consists of a single pitched signal, segmented into frames, and the waveform is homogeneous throughout the time frame being considered.
2. The input is limited to a specific audio domain, for which the algorithm is designed.
3. f0 estimation is the same thing as pitch detection.
These assumptions are acceptable for initial development, and many successful algorithms
have been developed using these assumptions. Indeed, without severely limiting the domain at the beginning of research, it would be impossible to achieve anything at all. Many researchers who accept that assumption 3 is theoretically incorrect continue to cite their work as pitch detectors rather than f0 estimators. Assumption 2 is another necessity for the introductory design of an algorithm. As the algorithms become more robust and more accurate, the domain for which the algorithm is useful will expand until the assumption is no longer needed. Work on stream separation is proceeding, but it would perhaps be more fruitful if the f0 estimation community would work with the stream separation community, and vice versa. Clearly, each has much to learn from the other.
5. References
[1] Albert Bregman. Auditory Scene Analysis. MIT Press, Cambridge, 1990.
[2] Stanley Coren, Lawrence M. Ward, and James T. Enns. Sensation and Perception. Harcourt Brace
[3] Curtis Roads. The Computer Music Tutorial. MIT Press, Cambridge, 1996.
[4] Erkan Dorken and S. Hamid Nawab. Improved musical pitch tracking using principal decomposition analysis. In International Conference on Acoustics, Speech and Signal Processing, volume II, pages 217–220. IEEE, 1994.
[5] Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111(4), 2002.
[6] David Gerhard. Audio visualization in phase space. In Bridges: Mathematical Connections in Art, Music and Science, pages 137–144, August 1999.
[7] Eric Scheirer and Malcolm Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. In International Conference on Acoustics, Speech and Signal Processing.
[8] Boris Doval and Xavier Rodet. Fundamental frequency estimation and tracking using maximum likelihood harmonic matching and HMMs. In International Conference on Acoustics, Speech and Signal Processing, volume I, pages 221–224. IEEE, 1993.
[9] James L. Flanagan. Speech Analysis, Synthesis and Perception. Springer-Verlag, New York, 1965.
[10] Edouard Geoffrois. The multi-lag-window method for robust extended-range f0 determination. In Fourth International Conference on Spoken Language Processing, volume 4, pages 2239–2243, 1996.
[11] David Gerhard. Audio visualization in phase space. In Bridges: Mathematical Connections in Art, Music and Science, pages 137–144.
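As an illustration of the procedure described in Sections 3.1 to 3.3 (framing, short-term autocorrelation, center clipping, and conversion of the average peak spacing to a frequency), a minimal Python sketch is given here. It is not the authors' CBMR implementation. The function names are hypothetical, NumPy is an assumed dependency, and the form of the center clipper (values between β and α set to zero, values outside shifted toward zero) is an assumption, since the paper's clipping equation is not reproduced in the text. The threshold ψ = 0.3 is interpreted here as a fraction of the absolute maximum, which is also an assumption.

```python
import numpy as np


def frame_signal(x, frame_size=256, hop=128):
    """Split x into frames of frame_size samples; hop=128 gives a 128-point overlap.

    Assumes len(x) >= frame_size.
    """
    n_frames = 1 + (len(x) - frame_size) // hop
    return np.stack([x[i * hop:i * hop + frame_size] for i in range(n_frames)])


def autocorrelation(frame):
    """Finite-length autocorrelation for lags 0..N-1 (the form of Equation 2)."""
    n = len(frame)
    # mode="full" returns lags -(N-1)..(N-1); index N-1 is lag zero.
    return np.correlate(frame, frame, mode="full")[n - 1:]


def center_clip(r, psi=0.3):
    """Center clipper (assumed form): zero between beta and alpha, shifted outside."""
    alpha = psi * np.max(np.abs(r))  # psi treated as a fraction (assumption)
    beta = -alpha
    out = np.zeros_like(r, dtype=float)
    out[r > alpha] = r[r > alpha] - alpha
    out[r < beta] = r[r < beta] - beta
    return out


def pitch_period(clipped):
    """Average distance between local maxima, counting lag 0 as the first peak."""
    inner = clipped[1:-1]
    is_peak = (inner > clipped[:-2]) & (inner >= clipped[2:]) & (inner > 0)
    peaks = np.concatenate(([0], np.flatnonzero(is_peak) + 1))
    if len(peaks) < 2:
        return None  # no periodicity found (e.g. silence)
    return float(np.mean(np.diff(peaks)))


def track_pitch(x, sample_rate, frame_size=256, hop=128, psi=0.3):
    """Return one f0 estimate per frame in Hz; 0.0 marks frames with no pitch."""
    f0 = []
    for frame in frame_signal(x, frame_size, hop):
        period = pitch_period(center_clip(autocorrelation(frame), psi))
        f0.append(sample_rate / period if period else 0.0)
    return np.array(f0)


if __name__ == "__main__":
    sr = 8000
    tone = np.sin(2 * np.pi * 200 * np.arange(sr) / sr)  # 200 Hz test tone
    print(track_pitch(tone, sr)[:5])
```

For a clean sinusoid the estimates should sit close to the true frequency. Because the peak positions are whole sample lags, the resolution is limited by the sampling rate, and harmonically rich signals may need a different threshold or an extra peak-selection step.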
