Seewave Analysis
Jérôme Sueur
Muséum national d'Histoire naturelle, CNRS UMR 7205 OSEB, Paris, France
http://sueur.jerome.perso.neuf.fr
January 27, 2014
This document is a very brief introduction to sound analysis principles. It is mainly written for students starting with bioacoustics. The content should be updated regularly. Demonstrations are based on the package seewave.
Contents
1 Digitization
  1.1 Sampling
  1.2 Quantisation
  1.3 File format
2 Amplitude envelope
3 Discrete-time Fourier Transform (DTFT)
  3.1 Definitions and principle
  3.2 Complete sound
  3.3 Sound section
    3.3.1 Window shape
4 Short-time Fourier Transform (STFT)
  4.1 Principle
  4.2 Spectrogram
    4.2.1 3D in a 2D plot
    4.2.2 Overlap
    4.2.3 Values
  4.3 Mean spectrum
5 Instantaneous frequency
  5.1 Zero-crossing
  5.2 Hilbert transform
  5.3 Cepstral transform
J. Sueur
Figure 1: Digital sound is a discrete process along a time scale: the same sound sampled at two different rates, 44.1 kHz (top) and 22.05 kHz (bottom). [Axes: Time (s), Amplitude.]
1 Digitization
1.1 Sampling
Digital recording is not a continuous but a discrete process of data acquisition. Sound is recorded through regular samples. These samples are taken at a specified rate, named the sampling frequency or sampling rate $f_s$, given in Hz or kHz. The most common rate is 44,100 Hz = 44.1 kHz, but a lower rate can be used for low-frequency sound (e.g. 22.05 kHz) and a higher rate for high-frequency sound (up to 192 kHz). Figure 1 shows 5 ms of a pure tone (440 Hz) sampled at 44.1 kHz and 22.05 kHz respectively. The discrete nature of digitization should not be underestimated: a sampling rate that is too low leads to frequency artefacts (aliasing), because only frequencies below half the sampling rate (the Nyquist frequency) can be represented correctly.
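The effect of the sampling rate can be sketched in base R. No seewave function is used here, and sample_tone() and peak_freq() are hypothetical helper names introduced only for this illustration. The first part mimics the setting of figure 1; the second part shows the kind of frequency artefact (aliasing) that appears when a tone lies above half the sampling rate.

```r
# Hypothetical helper: sample a pure tone of frequency `freq` (Hz)
# during `dur` seconds at the sampling rate `fs` (Hz).
sample_tone <- function(freq, fs, dur) {
  n <- floor(dur * fs)            # number of samples
  t <- (0:(n - 1)) / fs           # sampling instants (s)
  sin(2 * pi * freq * t)
}

# 5 ms of a 440 Hz tone: halving the rate halves the number of samples.
n_high <- length(sample_tone(440, 44100, 0.005))
n_low  <- length(sample_tone(440, 22050, 0.005))

# Hypothetical helper: frequency (Hz) of the highest spectral peak.
peak_freq <- function(x, fs) {
  N <- length(x)
  mag <- Mod(fft(x))[1:(N %/% 2)]   # frequencies below fs / 2 only
  (which.max(mag) - 1) * fs / N
}

ok      <- peak_freq(sample_tone(440,   22050, 1), 22050)   # tone below fs / 2
aliased <- peak_freq(sample_tone(15000, 22050, 1), 22050)   # tone above fs / 2
```

In this sketch the 440 Hz tone is recovered at 440 Hz, whereas the 15 kHz tone, which lies above 22050 / 2 = 11025 Hz, shows up at 22050 - 15000 = 7050 Hz.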
1.2 Quantisation
Another important parameter of digitization is the process of quantisation, which consists in assigning a numerical value to each sample according to its amplitude. These numerical values are attributed according to a bit scale. A quantisation of 8 bit will assign amplitude values along a scale of $2^8 = 256$ states around 0 (zero). Most recording systems use a 16 bit system, i.e. $2^{16} = 65536$ states. Quantisation can be seen as a rounding process. A high bit quantisation will produce values close to reality, i.e. values rounded to a high number of significant digits, whereas a low bit quantisation will produce values far from reality, i.e. values rounded to a low number of significant digits. Low quantisation can lead to a signal of impaired quality.
Figure 2: Digital sound is a discrete process along the amplitude scale: a 3 bit ($2^3 = 8$ states) quantisation (grey bars) gives a rough representation of a continuous sine wave (red line).
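The rounding process can be illustrated with a minimal base R sketch. The function quantise() is a hypothetical helper, and the uniform "mid-rise" scheme on an amplitude scale from -1 to 1 is an assumption made for the example, not a description of a particular recording system.

```r
# Hypothetical helper: round amplitudes in [-1, 1] to 2^bits states.
quantise <- function(x, bits) {
  levels <- 2^bits                         # number of available states
  step <- 2 / levels                       # width of one state
  q <- floor(x / step) * step + step / 2   # centre of the state containing x
  pmin(pmax(q, -1 + step / 2), 1 - step / 2)   # keep the extreme values in range
}

x <- sin(2 * pi * 440 * (0:999) / 44100)   # "continuous" reference wave

q3  <- quantise(x, 3)     # rough: 8 states
q16 <- quantise(x, 16)    # fine: 65536 states

n_states_3 <- length(unique(q3))
err_3  <- max(abs(x - q3))    # at most half a state, i.e. 0.125
err_16 <- max(abs(x - q16))   # at most 2^-16
```

The rounding error is bounded by half the width of a state, so each additional bit halves the largest possible error.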
1.3 File format
All these formats generate binary files, sound being encoded into a succession of 0s and 1s. When these formats are imported into R through tuneR, the data are transformed into a decimal format. This implies an important increase in data size.
2 Amplitude envelope
The amplitude envelope, or amplitude contour, is the profile of sound energy over time. The envelope can be expressed along a relative or an absolute energy scale. There are two ways to obtain a relative amplitude envelope (see Figure 3):

- by computing the absolute value of the waveform,
- by computing the Hilbert transform of the waveform.

An example of the two envelope types is shown in figure 3 for the song of the bird Zonotrichia capensis (Figure 4) included in the tico data.
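The two ways of computing an envelope can be sketched in base R on a synthetic signal. This is only an illustration under stated assumptions: analytic_signal(), env_absolute() and env_hilbert() are hypothetical helpers, the analytic signal is obtained with an FFT-based method restricted to signals of even length, and the test signal is an amplitude-modulated tone rather than the tico recording.

```r
# Hypothetical helper: analytic signal z = x + i * H(x), where H is the
# Hilbert transform, computed with the FFT (even length only).
analytic_signal <- function(x) {
  N <- length(x)
  stopifnot(N %% 2 == 0, N >= 4)
  h <- c(1, rep(2, N / 2 - 1), 1, rep(0, N / 2 - 1))  # keep positive frequencies
  fft(fft(x) * h, inverse = TRUE) / N
}

env_absolute <- function(x) abs(x)                    # absolute value of the waveform
env_hilbert  <- function(x) Mod(analytic_signal(x))   # modulus of the analytic signal

# Test signal: a 1 kHz carrier whose amplitude is modulated at 5 Hz.
fs <- 8000
t  <- (0:(fs - 1)) / fs
am <- 1 + 0.5 * cos(2 * pi * 5 * t)    # true envelope
x  <- am * sin(2 * pi * 1000 * t)

e_abs <- env_absolute(x)
e_hil <- env_hilbert(x)
```

On this signal the Hilbert envelope follows the true modulation `am` closely, while the absolute envelope still oscillates at the carrier rate below it, which is why the absolute envelope is usually smoothed before being used.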
Figure 3: Two ways to compute the amplitude envelope of a sound: the absolute value or the Hilbert transform of the time wave. [Legend: Absolute, Hilbert; y-axis: Amplitude.]
Figure 4: The rufous-collared sparrow Zonotrichia capensis also named tico-tico in Portuguese. Picture by Ladislav Nagy, Wikimedia Commons.
Figure 5: Use of the amplitude envelope to automatically measure the temporal pattern of a sound (threshold at 5 %; y-axis: Amplitude). Command: timer(tico,f=22050,threshold=5,msmooth=c(50,0))

The envelope can then be used to measure the duration of the different temporal parts of the sound, as shown in figure 5 using the function timer(), or to analyse the amplitude modulation rates with the function ama().
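The idea of measuring durations from an envelope and a threshold can be sketched as follows. This is not the implementation of timer(): signal_durations() is a hypothetical helper written in base R, and it assumes an envelope that has already been computed and smoothed.

```r
# Hypothetical helper: durations (s) of the parts of an envelope that
# exceed `threshold`, expressed in % of the envelope maximum.
signal_durations <- function(envelope, fs, threshold = 5) {
  above <- envelope > (threshold / 100) * max(envelope)
  runs <- rle(above)                 # consecutive runs of TRUE / FALSE
  runs$lengths[runs$values] / fs     # keep the runs above the threshold
}

# Toy envelope sampled at 1000 Hz: two "notes" of 0.2 s and 0.3 s
# separated by silences.
toy_env <- c(rep(0, 100), rep(1, 200), rep(0, 50), rep(1, 300), rep(0, 100))
durations <- signal_durations(toy_env, fs = 1000)
```

The same run-length approach gives the pause durations when the runs below the threshold are kept instead.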
3 Discrete-time Fourier Transform (DTFT)
3.1 Definitions and principle
Let us start with some terminology:

- Fourier transform (FT): a reversible mathematical transform named after the French mathematician Joseph Fourier (1768-1830) (Figure 6). The transform decomposes a time series into a finite sum of sine and cosine functions.
- Fast Fourier Transform (FFT): an algorithm to compute the FT quickly.
- Discrete-time Fourier transform (DTFT): a specific form of the FT applied to a time wave, typically a sound.

Figure 6: Joseph Fourier around 1823. Engraving by Jules Boilly (Public Domain).

Each sine / cosine function has a specified frequency and a relative amplitude. These two parameters are used to build the frequency spectrum of the original time wave. The DTFT is thus a way to switch from the time domain to the frequency domain. The signal s depicted in figure 7 was made by the addition of three waves with three different carrier frequencies $\omega_i$: $\omega_1 = 1$ kHz, $\omega_2 = 2$ kHz, and $\omega_3 = 3$ kHz. The waves were added in phase ($\varphi = 0$) but with three different relative amplitudes: $a_1 = 1$, $a_2 = 0.5$, and $a_3 = 0.25$. The carrier frequency $\omega_i$ and the relative amplitude $a_i$ of each sine function can be plotted in an X-Y graph as shown in figure 8. This graph is a frequency spectrum.
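The switch from the time domain to the frequency domain can be checked with base R's fft(). The sketch below rebuilds a signal from three sine waves at 1, 2 and 3 kHz with relative amplitudes 1, 0.5 and 0.25, as described above; the sampling rate of 22050 Hz and the duration of 1 s are assumptions chosen so that each frequency falls exactly on one value of the frequency axis.

```r
fs <- 22050                      # sampling rate (Hz), assumed for the example
N  <- 22050                      # 1 s of signal, i.e. a 1 Hz frequency step
t  <- (0:(N - 1)) / fs

freqs <- c(1000, 2000, 3000)     # carrier frequencies (Hz)
amps  <- c(1, 0.5, 0.25)         # relative amplitudes

# Sum of the three sine waves, added in phase.
s <- rowSums(sapply(seq_along(freqs),
                    function(i) amps[i] * sin(2 * pi * freqs[i] * t)))

# Amplitude spectrum: modulus of the FFT, scaled back to wave amplitudes.
mag <- 2 * Mod(fft(s))[1:(N / 2)] / N
freq_axis <- (0:(N / 2 - 1)) * fs / N

peaks     <- freq_axis[mag > 0.1]        # frequencies carrying energy
peak_amps <- mag[freqs + 1]              # amplitude found at 1, 2 and 3 kHz
```

Plotting `mag` against `freq_axis` gives the X-Y graph described above, with three peaks whose heights are the original relative amplitudes.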
3.2 Complete sound
The number of sine functions $n$ is determined by the number of samples $N$ of the original time wave, following $n = 0.5 \times N$. If the DTFT is computed on the tico data, which include 39,578 samples, the DTFT will decompose the sound into $0.5 \times 39578 = 19789$ sine functions. The first sine function will have a frequency $\omega_1 = f_s / N = 22050 / 39578 = 0.557$ Hz (Figure 10). This is equivalent to the frequency resolution $\Delta f$ of the decomposition.
3.3 Sound section
Such a high frequency resolution is often not required, if not irrelevant. In addition, computing the FFT of the whole sound might not be appropriate if there is frequency modulation along time, i.e. if the frequency of the sound is not constant along the time scale. A first solution is to compute the DTFT locally, on a specific sound section. The size of this section, or window, can be set in seconds or in number of samples, the latter being more accurate. We can, for instance, compute the DTFT in the middle of the third note produced by the tico bird, that is at 1.1 s (Figure 10). The length of the FFT is controlled with the argument wl, for window length. If we choose a window size of 512 samples, we will end up with a decomposition into $0.5 \times 512 = 256$ sine functions with a frequency resolution $\Delta f = 22050 / 512 = 43.07$ Hz. Increasing the window size will increase the frequency resolution, but the decomposition will be less accurate in terms of time as more signal will be selected. Conversely, reducing the window size will be more specific in terms of time (position), but the frequency resolution will decrease. This trade-off is an example of the uncertainty, or Heisenberg, principle, which stipulates that there is a limit to the precision of pairs of parameters, here the time and frequency parameters.
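The arithmetic of this trade-off is easy to reproduce. The helper below, dtft_resolution(), is hypothetical and only restates the relations given in the text for a window of wl samples at a sampling rate fs.

```r
# Hypothetical helper: consequences of the window length `wl` (in samples)
# at the sampling rate `fs` (Hz).
dtft_resolution <- function(wl, fs) {
  c(n_sines = wl / 2,      # number of sine functions
    df_hz   = fs / wl,     # frequency resolution (Hz)
    dt_s    = wl / fs)     # duration of the analysed section (s)
}

whole   <- dtft_resolution(39578, 22050)   # complete tico sound
section <- dtft_resolution(512, 22050)     # 512-sample window
longer  <- dtft_resolution(2048, 22050)    # a longer window, for comparison
```

Going from 512 to 2048 samples divides the frequency step by four, but the analysed section becomes four times longer, which is the trade-off described above.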
Figure 8: Decomposition of the time wave s into three sine functions (panels: 1 kHz, 2 kHz, 3 kHz; axes: Time (s), Amplitude). See figure 7.
Figure 10: Three categories of frequency spectra computed on tico: (1) the spectrum of the complete sound, (2) the spectrum computed at 1.1 s with a 512-sample window, and (3) the mean spectrum computed with the STFT (see section 4). [Legend: DTFT on complete sound, DTFT on a sound section, Mean spectrum (STFT); axes: Frequency (kHz), Amplitude.]
Figure 11: Three different Fourier window shapes. Try example(ftwindow) for other shapes. [Axes: Index, Amplitude.]

3.3.1 Window shape
When computing the DTFT, the shape of the analysis window is by default a rectangle. However, this shape is not always appropriate as it induces artefacts such as side frequency lobes. A way to avoid this is to multiply the original window with a function of the same length with a particular shape. This shape can be rectangular (in that case nothing is changed in the original signal), triangular (Bartlett window), or sinusoidal (Blackman, Hamming, flat top, and Hanning windows) (see figure 11 for three examples of window shapes and figure 12 for a test on a simple signal). The default window shape used in seewave is the Hanning window, but other windows could be more appropriate depending on the main features of the signal.
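The effect of the window shape can be sketched in base R. The three functions below use common textbook definitions of the Hanning, Hamming and Bartlett windows; they are written for this illustration and may differ in detail from what ftwindow() returns. The helper leakage() is also hypothetical: it measures how much energy a 1 kHz tone spreads far away from its peak.

```r
# Textbook window definitions (assumed here), for a window of n samples.
hanning  <- function(n) 0.5  - 0.5  * cos(2 * pi * (0:(n - 1)) / (n - 1))
hamming  <- function(n) 0.54 - 0.46 * cos(2 * pi * (0:(n - 1)) / (n - 1))
bartlett <- function(n) 1 - abs(2 * (0:(n - 1)) / (n - 1) - 1)

fs <- 22050
wl <- 512
x  <- sin(2 * pi * 1000 * (0:(wl - 1)) / fs)   # 1 kHz tone, 512 samples

# Hypothetical helper: highest spectral value above about 6.4 kHz,
# relative to the main peak, after multiplying the section by window `w`.
leakage <- function(w) {
  mag <- Mod(fft(x * w))[1:(wl / 2)]
  max(mag[150:(wl / 2)]) / max(mag)
}

leak_rect <- leakage(rep(1, wl))     # rectangular window: signal unchanged
leak_hann <- leakage(hanning(wl))    # Hanning window
```

In this sketch the side lobes far from the 1 kHz peak are much lower with the Hanning window than with the rectangular one, at the cost of a slightly wider main peak.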
4 Short-time Fourier Transform (STFT)
4.1 Principle
Computing the FFT on the whole sound or on a single section might not be informative enough. An intuitive solution is to compute the DTFT on successive sections along the signal. A window is then slid along the signal and a DTFT is computed at each slide or jump. This is what the short-time Fourier transform (STFT) does. A good way to understand how it works is to use the function dynspec(). The successive DTFT can be tracked when moving along the signal with a sliding cursor. Here is an example with a DTFT window of 1024 samples:
> dynspec(tico, wl=1024, osc=TRUE)
Basically, the STFT returns a matrix of values where columns are the successive spectra along time. This can be summarized as an $n \times p$ matrix, with $a_{ij}$ the Fourier coefficients, $n$ the number of rows (frequencies) and $p$ the number of columns (successive spectra).
This matrix is nice, but a plot of it would be even better. There are three ways to visualize this matrix in a friendly way:

- a waterfall plot, see the function wf(),
- a density plot, or spectrogram, see the function spectro(),
- a 3D plot, or 3D-spectrogram, see the function spectro3D().
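The sliding-window principle can be sketched in a few lines of base R. The function stft() below is a hypothetical, simplified helper and not the code used by spectro(): it assumes a Hanning window, returns raw amplitude values, and labels each spectrum with the start time of its window.

```r
# Hypothetical helper: short-time Fourier transform of `x` sampled at `fs` (Hz),
# with a window of `wl` samples and an overlap `ovlp` in %.
stft <- function(x, fs, wl = 512, ovlp = 0) {
  stopifnot(ovlp >= 0, ovlp < 100, length(x) >= wl)
  step   <- round(wl * (1 - ovlp / 100))            # jump between windows (samples)
  starts <- seq(1, length(x) - wl + 1, by = step)   # first sample of each window
  w <- 0.5 - 0.5 * cos(2 * pi * (0:(wl - 1)) / (wl - 1))   # Hanning window
  # One column per window: modulus of the FFT of the windowed section.
  amp <- sapply(starts, function(i) Mod(fft(x[i:(i + wl - 1)] * w))[1:(wl / 2)])
  list(time      = (starts - 1) / fs,                   # s
       frequency = (0:(wl / 2 - 1)) * fs / wl / 1000,   # kHz
       amp       = amp)
}

# Synthetic 2 kHz tone with as many samples as the tico data.
fs <- 22050
x  <- sin(2 * pi * 2000 * (0:39577) / fs)
res <- stft(x, fs, wl = 512, ovlp = 0)

dims     <- dim(res$amp)                           # rows: frequencies, columns: spectra
peak_khz <- res$frequency[which.max(res$amp[, 1])] # dominant frequency, first window
```

With 39578 samples and a 512-sample window without overlap, the matrix has 256 rows and 77 columns, and each column peaks close to 2 kHz.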
4.2 Spectrogram
4.2.1 3D in a 2D plot
The density plot option, or spectrogram, is the most popular representation used in bioacoustics. It has the main advantage of not being based on a 3D representation, which is not appropriate for inspection by the human eye. The principle is quite simple: the successive DTFT are plotted against time, the amplitude being coded by a colour (density) scale.
4.2.2 Overlap

The temporal and the frequency resolutions of the spectrogram are linked, with $\Delta f = 1 / \Delta t$, where $\Delta f$ is the frequency resolution and $\Delta t$ the time resolution. With a window of 512 samples, the frequency resolution is 22050/512 = 43.07 Hz and the time resolution is 512/22050 = 0.0232 s. As mentioned above, increasing the size of the window will increase the frequency resolution but decrease the time resolution. However, there is a trick to counteract this two-dimensional precision limit. In the first example, the DTFT window was coarsely jumping from one position to another, but we can make the jump smaller. The solution is simply to allow an overlap between successive windows. This overlap is usually set as a percentage: the default value is 0% as in figure 13. A percentage of 50% will double the number of DTFTs, hence increasing the time resolution by a factor of 2 (now 153 FFTs), while the frequency resolution is not reduced. The overlap parameter is set with the argument ovlp of the function spectro() (see figure 14). A value of 100% is of course nonsense as the sliding window would stay on the spot. Increasing the overlap increases computing time as more FFTs are computed. We advise keeping reasonable values for computing efficiency.
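The number of FFTs obtained for a given overlap can be worked out with a short calculation. The helper n_windows() is hypothetical and assumes that only complete windows are analysed; the exact count returned by spectro() may differ slightly depending on how the last window is handled.

```r
# Hypothetical helper: number of complete windows of `wl` samples that fit
# in `n_samples` when successive windows overlap by `ovlp` %.
n_windows <- function(n_samples, wl, ovlp = 0) {
  stopifnot(ovlp >= 0, ovlp < 100)           # 100 % would never move forward
  step <- round(wl * (1 - ovlp / 100))       # jump between windows (samples)
  floor((n_samples - wl) / step) + 1
}

n0  <- n_windows(39578, 512, ovlp = 0)    # tico data, no overlap
n50 <- n_windows(39578, 512, ovlp = 50)   # 50 % overlap
n75 <- n_windows(39578, 512, ovlp = 75)   # 75 % overlap

# Time step between successive spectra (s) for 0 % and 50 % overlap.
dt0  <- 512 / 22050
dt50 <- 256 / 22050
```

Under these assumptions the count goes from 77 to 153 windows at 50% overlap, and the time step between spectra is halved while each spectrum still uses 512 samples.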
4.2.3 Values

The spectrogram is a graphical function but the values along the three scales can be saved. The value of spectro() is a list containing three components:

- $time or [[1]] returns the values of the time axis,
- $frequency or [[2]] returns the values of the frequency axis,
- $amp or [[3]] returns the amplitude values of the successive FFT decompositions or spectra.

These components can be used to plot the spectrogram manually (Figure 15). The successive spectra computed by the successive DTFT can also be picked up and plotted as the function spec() would do (Figure 16).
4.3 Mean spectrum
The columns of the STFT matrix can be averaged, giving the so-called mean or average spectrum, as shown in figure 10. The frequency resolution and the shape of the mean spectrum will of course change when changing the window size (wl) and overlap (ovlp) arguments. The function to compute the mean spectrum is meanspec().
Figure 15: Redrawing the graphical output of the function spectro() with the native filled.contour() function, with the command: filled.contour(x=spectro[[1]], y=spectro[[2]], z=t(spectro[[3]])).

Figure 16: One of the dB spectra computed by the STFT. Command: plot(x=spectro[[2]], y=spectro[[3]][,15], type="o", xlab="Frequency (kHz)", ylab="Relative amplitude (dB)")
Figure 17: Zero-crossing: principle and interpolation to reduce the inaccuracy of measurement. The upper panel shows a 440 Hz signal sampled at 8000 Hz. The sampling rate is too low to measure properly the period between successive cycles. The lower panel plots the same wave with an interpolation factor of 10. As new samples are added, it is now possible to measure the periodicity of the wave. [Axes: Time (s), Amplitude.]
5 Instantaneous frequency
5.1 Zero-crossing
The zero-crossing is a rather simple technique which consists in measuring the successive time intervals at which the wave crosses the zero amplitude line. This gives a measure of the period $T$ of a full cycle, and the instantaneous frequency is obtained by simply computing $f = 1 / T$. The signal has to be quite periodic to make the method reliable. The main problem with the zero-crossing procedure is linked to the discrete process of sound sampling. The signal to be analysed might not always have values equal or very close to zero. This makes the zero-crossing results quite approximate. An example of this issue is illustrated in the upper panel of figure 17. It is therefore sometimes necessary to oversample the signal by interpolation. This process adds values closer to zero and thus increases the accuracy of the measure, as shown in the lower panel of figure 17. An example of such a measure on the usual tico song is shown in figure 18.
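The gain brought by interpolation can be reproduced on the 440 Hz tone sampled at 8000 Hz mentioned for figure 17. The sketch below is written in base R; zc_freq() is a hypothetical helper that assumes linear interpolation (with approx()) and counts only the crossings from negative to positive values, so that the interval between two crossings is one full period.

```r
# Hypothetical helper: instantaneous frequency (Hz) from zero-crossings,
# with an optional oversampling factor `interpol`.
zc_freq <- function(x, fs, interpol = 1) {
  if (interpol > 1) {
    n <- length(x)
    # Linear interpolation: `interpol` times more points on the same time span.
    x  <- approx(seq_len(n), x, n = (n - 1) * interpol + 1)$y
    fs <- fs * interpol
  }
  s <- sign(x)
  up_cross <- which(s[-length(s)] < 0 & s[-1] >= 0)  # negative-to-positive crossings
  fs / diff(up_cross)                                # f = 1 / T for each cycle
}

fs <- 8000
x  <- sin(2 * pi * 440 * (0:799) / fs)     # 0.1 s of a 440 Hz tone

f_raw <- zc_freq(x, fs)                    # no interpolation
f_int <- zc_freq(x, fs, interpol = 10)     # interpolation factor of 10
```

Without interpolation a period can only be 18 or 19 samples long, so the estimates jump between about 444 Hz and 421 Hz; with an interpolation factor of 10 they stay within a few Hz of 440 Hz.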
5.2 Hilbert transform
The Hilbert transform can be used to decompose a signal $x(t)$ into its amplitude envelope and its instantaneous frequency. More specifically, the amplitude envelope is the modulus of the analytic signal, defined as $z(t) = x(t) + i\,y(t)$ where $y(t)$ is the Hilbert transform of $x(t)$, and the instantaneous frequency is the derivative of the phase of $z(t)$ with respect to time. The Hilbert transform can thus be used to track both amplitude and frequency modulations.
Figure 18: Zero-crossing: measuring the instantaneous frequency of a note of the tico song without (upper panel) and with interpolation (bottom panel).

The amplitude envelope is obtained with the function env() and the instantaneous frequency is obtained with the function ifreq().
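The link between the analytic signal and the instantaneous frequency can be sketched in base R. This is not the code of ifreq(): analytic_signal() and inst_freq() are hypothetical helpers, the analytic signal is computed with an FFT-based method for signals of even length, and the phase derivative is approximated by the phase difference between successive samples.

```r
# Hypothetical helper: analytic signal z = x + i * y, with y the Hilbert
# transform of x, computed with the FFT (even length only).
analytic_signal <- function(x) {
  N <- length(x)
  stopifnot(N %% 2 == 0, N >= 4)
  h <- c(1, rep(2, N / 2 - 1), 1, rep(0, N / 2 - 1))  # keep positive frequencies
  fft(fft(x) * h, inverse = TRUE) / N
}

# Hypothetical helper: instantaneous frequency (Hz), one value per sample interval.
inst_freq <- function(x, fs) {
  z <- analytic_signal(x)
  dphi <- Arg(z[-1] * Conj(z[-length(z)]))   # phase increment between samples
  dphi * fs / (2 * pi)
}

fs <- 8000
x  <- sin(2 * pi * 1000 * (0:(fs - 1)) / fs)   # 1 s of a 1 kHz tone

f_inst   <- inst_freq(x, fs)          # should stay at 1000 Hz
envelope <- Mod(analytic_signal(x))   # should stay at 1
```

For a pure tone both outputs are constant; with a frequency- or amplitude-modulated signal they follow the modulation, apart from possible artefacts at the two ends of the signal.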
5.3 Cepstral transform
The cepstral transform is the inverse Fourier transform of the logarithm of the spectrum. The real cepstrum (an anagram of spectrum) is the real part of the cepstral transform. The scale of the independent variable (usually the x-axis) of the cepstrum is named quefrency. The quefrency scale is not intuitive but can be transformed into frequency (Hz). The cepstrum is useful for detecting the fundamental frequency of a harmonic series: it corresponds to the first peak of the cepstrum, as shown in figure 19. Note that this detection will work properly with harmonic signals only. The function cepstro() is a short-term version of the cepstral function: successive cepstra are computed along the signal with a sliding window, in a similar way as the STFT (see section 4).
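The detection of a fundamental frequency with the cepstrum can be sketched in base R on a synthetic harmonic signal. Several choices below are assumptions made for the example and not part of seewave: the signal is a sum of 19 harmonics of 200 Hz, a small constant is added to the spectrum to avoid taking the logarithm of zero on this noise-free signal, and the first peak is searched only between 150 and 1000 Hz.

```r
fs <- 8000      # sampling rate (Hz)
N  <- 1000      # number of samples
f0 <- 200       # fundamental frequency to recover (Hz)
t  <- (0:(N - 1)) / fs

# Harmonic series: harmonics 1 to 19 of f0 with decreasing amplitudes.
x <- rowSums(sapply(1:19, function(k) sin(2 * pi * k * f0 * t) / k))

# Real cepstrum: inverse FFT of the logarithm of the amplitude spectrum.
log_spec <- log(Mod(fft(x)) + 1e-6)
cepstrum <- Re(fft(log_spec, inverse = TRUE)) / N
quef     <- (0:(N - 1)) / fs          # quefrency scale (s)

# First peak within the quefrencies corresponding to 150-1000 Hz.
search <- which(quef >= 1 / 1000 & quef <= 1 / 150)
q_peak <- quef[search[which.max(cepstrum[search])]]
f0_est <- 1 / q_peak                  # quefrency converted to frequency (Hz)
```

The peak is found at a quefrency of 0.005 s, which converts back to the 200 Hz fundamental. The search range matters: other peaks appear at multiples of this quefrency and would correspond to 100 Hz, 66.7 Hz and so on.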
6 Other transforms
There are several other options to analyse a signal. Among others, we could list the following ones that are not included in seewave:
- Mel-frequency cepstral transform, see the function melfcc() of the package tuneR [not tested].
- Wavelet transform, see the packages biwavelet, rwt, wavelets, waveslim, wavethresh and wmtsa [not tested].
- Gabor transform, not yet implemented in R.
7 References
7.1 Books
Au WWL, Hastings MC (2008) Principles of marine bioacoustics, Springer.
Bradbury JW, Vehrencamp SL (1998) Principles of animal communication, Sinauer Associates.
Fletcher NH (1992) Acoustic systems in biology, Oxford University Press.
Gerhardt HC, Huber F (2002) Acoustic communication in insects and anurans, University of Chicago Press.
Hopp SL, Owren MJ, Evans CS (1998) Animal acoustic communication, Springer.
Marler P, Slabbekoorn H (2004) Nature's Music. The Science of Birdsong, Academic Press, Elsevier.
Rossing TD (2007) Handbook of acoustics, Springer.
Rumsey F, McCormick T (2002) Sound and recording - an introduction, Elsevier.
Speaks CE (1999) Introduction to sound, Singular.
7.2 Dedicated journals
Animal Behaviour http://www.journals.elsevier.com/animal-behaviour/
Bioacoustics http://www.tandfonline.com/toc/tbio20/current
Journal of the Acoustical Society of America http://asadl.org/jasa/