S H Li Speech Analysis
Analysis of speech sounds taking into consideration their method of production.
The level of processing between the digitised acoustic waveform and the acoustic feature vectors.
The extraction of "interesting" information as an acoustic vector.
Speech Waveforms
A waveform is a two-dimensional representation of a sound. The two dimensions in a waveform display are time and intensity: the vertical dimension is intensity and the horizontal dimension is time. Waveforms are known as time-domain representations of sound because they display changes in intensity over time. The intensity dimension actually displays sound pressure, a measure of the tiny variations in air pressure that we are able to perceive as sound. Intensity in these waveforms is a simple linear scaling of sound pressure (not dB).
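The distinction between a linear pressure scale and a dB scale can be made concrete with a small sketch. It assumes the usual reference pressure of 20 micropascals for dB SPL in air, and the sine-wave amplitudes are hypothetical values chosen for illustration only; none of these numbers come from the slides.

```python
import math

# Assumption: 20 micropascals, the conventional reference for dB SPL in air.
P_REF = 20e-6


def rms(samples):
    """Root-mean-square value of a sequence of pressure samples (pascals)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pressure_to_db_spl(p_rms, p_ref=P_REF):
    """Convert an RMS sound pressure in pascals to a level in dB SPL."""
    return 20.0 * math.log10(p_rms / p_ref)


def sine_wave(amplitude_pa, freq_hz, sample_rate, duration_s):
    """Sampled sine wave whose values are linear sound pressure in pascals."""
    n = int(sample_rate * duration_s)
    return [amplitude_pa * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


# Two hypothetical waveforms: the second has ten times the pressure amplitude.
quiet = sine_wave(0.02, 100.0, 8000, 0.1)
loud = sine_wave(0.2, 100.0, 8000, 0.1)

quiet_db = pressure_to_db_spl(rms(quiet))
loud_db = pressure_to_db_spl(rms(loud))

# A tenfold increase on the linear pressure axis is a 20 dB increase in level.
level_difference = loud_db - quiet_db
```

On a waveform display the second signal would simply be drawn ten times taller, whereas on a dB scale the two differ by a fixed 20 dB.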
Resonances and Formants
Resonances are vibratory characteristics of a resonating body. In the case of an air-filled tube, the resonance characteristics exist even when no sound is being produced. When we produce vowel sounds, the resonances of the vocal tract selectively enhance sound vibrations close to the resonance frequencies and selectively attenuate sound vibrations remote from the resonance frequencies. This results in peaks in the acoustic spectrum of the resulting speech sound. These acoustic spectral peaks are called formants, particularly when they occur in vowels and vowel-like consonants.
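As a rough illustration of tube resonances, the sketch below uses the standard quarter-wavelength model of a uniform tube closed at one end and open at the other, where the n-th resonance is (2n - 1) c / (4 L). The tube length of 17.5 cm and the speed of sound of 350 m/s are textbook-style assumptions rather than values from the slides, and a real vocal tract is not a uniform tube.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonance frequencies (Hz) of a uniform tube closed at one end.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    length_m and speed_of_sound are illustrative assumptions.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_resonances + 1)]


# Hypothetical 17.5 cm tube: resonances near 500, 1500 and 2500 Hz.
neutral_tube = tube_resonances(0.175)

# A shorter tube has proportionally higher resonances.
short_tube = tube_resonances(0.14)
```

The resonances depend only on the tube's geometry and the speed of sound, which is why they exist whether or not a sound source is exciting the tube.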
Figure: waveform and broadband spectrogram of the word "heard"
Figure: a narrowband spectrogram of the word "heard"
Figure: segmentation tiers with labels g1, g2, aag, aa1, aa2, aam, m1, m2; horizontal axis: Time (s), 0 to 0.491
Figure: segmentation of the word "aayvu" with segment durations in seconds
Words           aayvu    g        aa       ay       y        yv       v        vu       u
Duration (s)    0.77     0.19     0.2      0.1      0.07     0.04     0.07     0.06     0.2
Intensity (dB)  80.4     62.4     81.3     84.0     80.5     78.7     73.4     78.2     77.8
Pitch (Hz)      160.2    128      137.1    171.1    179      174.5    162.2    166.5    167.2
F1 (Hz)         540.7    900.78   810.4    654.07   362.1    349.3    348.7    3636.0   387.36
F2 (Hz)         1484.6   1853.0   1181.6   1755.3   2275.9   1928.6   1154.98  1147.2   1488.5
F3 (Hz)         3750.3   2899.3   2865.5   2599.9   2570.3   2365.0   2418.4   2570.8   2611.5
F4 (Hz)         3750.2   4078.2   3792.2   3753.5   3878.4   3876.5   3636.0   3568.2   3693.2
Figure: LPC spectra of the segments of "aayvu"; vertical axis: Sound pressure level (dB/Hz), frequency axis up to 5500 Hz. Labelled peaks:
LPC of aa in aayvu: 886.4, 1212.5, 2916.7, 3754.0, 4813.6
LPC of ay in aayvu: 671.6, 1694.1, 2272.1, 3679.9
LPC of y in aayvu: no labelled peaks
LPC of v in aayvu: no labelled peaks
LPC of vu in aayvu: 360.3, 1108.7, 2612.9, 3583.6
LPC of u in aayvu: 397.4, 1486.3, 2590.7, 3583.6
An LPC analysis separates the analysis of the resonant characteristics of a speech sound from the source characteristics of that sound.
The resulting LPC spectrum is a smoothed spectrum whose peaks represent the formants (resulting from the vocal tract resonances) of a vowel or vowel-like consonant.
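How formant peaks fall out of an LPC spectrum can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a minimal illustration, not the analysis used to produce the figures: the test signal is the impulse response of a toy filter with two assumed resonances (500 Hz and 1500 Hz, 100 Hz bandwidth, 8 kHz sampling), no analysis window is applied, and all function names are hypothetical.

```python
import cmath
import math


def resonator(x, freq_hz, bandwidth_hz, sample_rate):
    """Pass x through a two-pole resonator (one toy 'formant')."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * freq_hz / sample_rate
    b1, b2 = 2.0 * r * math.cos(theta), -r * r
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn + b1 * y1 + b2 * y2)
    return y


def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err): coefficients of A(z) = 1 + a1 z^-1 + ... and residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc_spectrum_db(a, gain, sample_rate, n_points=512):
    """Smoothed LPC spectrum 20*log10(sqrt(gain)/|A(e^jw)|) from 0 Hz to Nyquist."""
    freqs, mags = [], []
    for m in range(n_points):
        f = m * (sample_rate / 2.0) / n_points
        w = 2.0 * math.pi * f / sample_rate
        denom = sum(a[k] * cmath.exp(-1j * w * k) for k in range(len(a)))
        freqs.append(f)
        mags.append(20.0 * math.log10(math.sqrt(gain) / abs(denom)))
    return freqs, mags


def peak_frequencies(freqs, mags):
    """Frequencies of the local maxima of the spectrum (candidate formants)."""
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]


SAMPLE_RATE = 8000                      # assumed sampling rate in Hz
impulse = [1.0] + [0.0] * 399           # single excitation pulse (toy source)
signal = resonator(resonator(impulse, 500.0, 100.0, SAMPLE_RATE),
                   1500.0, 100.0, SAMPLE_RATE)

ORDER = 4                               # two resonances need two pole pairs
coeffs, residual = levinson_durbin(autocorrelation(signal, ORDER), ORDER)
freqs, mags = lpc_spectrum_db(coeffs, residual, SAMPLE_RATE)
formant_peaks = peak_frequencies(freqs, mags)
```

The all-pole filter 1/A(z) models the resonances, while the residual energy stands in for the source; the peaks of the smoothed spectrum land close to the two resonances built into the toy signal. Real speech analysis would typically add pre-emphasis, a tapered window and a higher model order.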
Figure: Three long vowels in an /h_d/ context.
Figure: Three English voiceless oral stops in CV context.
Figure: Three English voiced oral stops in CV context.
Figure: The two English affricates in CV context.
Figure 9: Waveforms of two of the English voiceless fricatives in CV context.