
S H Li Speech Analysis

The document discusses various techniques for analyzing speech sounds, including waveform analysis, spectrograms, and linear prediction coding (LPC). Waveform analysis looks at changes in sound intensity over time. Spectrograms examine dynamic changes in a speech spectrum and are useful for segmenting phonemes. LPC separates resonant vocal tract characteristics from sound source characteristics to identify formant peaks representing resonances. Examples of applying these techniques to analyze vowels, stops, affricates and fricatives are shown through various figures.

Uploaded by ilasundaram
Copyright: © Attribution Non-Commercial (BY-NC)

Speech Analysis

What is Speech Analysis?

Analysis of speech sounds, taking into consideration their method of production.

The level of processing between the digitised acoustic waveform and the acoustic feature vectors.

The extraction of "interesting" information as an acoustic vector.

Speech Waveforms
A waveform is a two-dimensional representation of a sound. The two dimensions in a waveform display are time and intensity. The vertical dimension is intensity and the horizontal dimension is time. Waveforms are known as time domain representations of sound as they display changes in intensity over time. The intensity dimension actually displays sound pressure. Sound pressure is a measure of the tiny variations in air pressure that we are able to perceive as sound. Intensity in these waveforms is a simple linear scaling of sound pressure (not dB).
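The difference between a linear pressure scale and a dB scale can be made concrete with a small sketch. It assumes the samples are sound pressures in pascals and uses the conventional 20 micropascal reference for dB SPL; the function names and the 0.2 Pa test tone are illustrative choices, not taken from the slides.

```python
import math

P_REF = 20e-6  # assumed reference pressure in pascals (conventional for dB SPL in air)

def rms(samples):
    """Root-mean-square value of a sequence of pressure samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def pressure_to_db_spl(p_rms):
    """Convert an RMS sound pressure in pascals to a level in dB SPL."""
    return 20.0 * math.log10(p_rms / P_REF)

# One second of a 100 Hz sine with 0.2 Pa peak pressure, sampled at 8 kHz.
fs = 8000
wave = [0.2 * math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]

# The waveform display plots `wave` directly (linear pressure);
# a dB value only appears after taking the RMS and a logarithm.
level = pressure_to_db_spl(rms(wave))
```

With these assumed values the tone comes out at roughly 77 dB SPL, and doubling the pressure raises the level by about 6 dB, which is why a linear waveform display and a dB intensity contour look so different for the same signal.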

Resonances and Formants
Resonances are vibratory characteristics of a resonating body. In the case of an air-filled tube the resonance characteristics exist even when there is no sound being produced. When we produce vowel sounds the resonances of the vocal tract selectively enhance sound vibrations close to the resonance frequencies and selectively attenuate sound vibrations remote from the resonance frequencies. This results in peaks in the acoustic spectrum of the resulting speech sound. These acoustic spectral peaks are called formants, particularly when they occur in vowels and vowel-like consonants.
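As a rough illustration of tube resonances, the sketch below uses the textbook quarter-wavelength idealisation: a uniform tube closed at one end and open at the other resonates at odd multiples of c / (4L). This model is not stated in the slides; the tube length of 17.5 cm and the speed of sound of 350 m/s are assumed, round values often used for a neutral adult vocal tract.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonances (Hz) of an idealised uniform tube, closed at one end, open at the other.

    Uses F_k = (2k - 1) * c / (4 * L) for k = 1, 2, 3, ...
    """
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, n_resonances + 1)]

# Assumed values: 17.5 cm tube, c = 350 m/s.
resonances = tube_resonances(0.175)
```

Under these assumptions the first three resonances fall near 500, 1500 and 2500 Hz. Real vowels depart from this pattern because the vocal tract is not a uniform tube, which is what makes the measured formant values informative.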

Spectrograms

Spectrograms permit the examination of the dynamic changes in a speech spectrum.

This is particularly useful for the examination of rapidly changing consonants (e.g. stop bursts) and also for vowel transitions (between vowels and consonants and between the targets in diphthongs).

Spectrograms, usually in conjunction with waveforms, are essential during the segmenting and labelling of speech.

Spectrograms usually provide the clearest visual cues to the boundaries between phonemes.

Spectrograms do not, however, provide accurate measurements of vowel formants, as broadband spectrograms have a poor frequency resolution (about 300 Hz), so there is a high degree of intrinsic error in formant measurements taken visually from spectrograms.

That is why we tend to use FFTs and LPCs for the accurate measurement of formant frequencies.
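The frequency resolution limit of a broadband spectrogram comes from its short analysis window. The sketch below is a minimal illustration, not the analysis used for the figures: it builds two sine tones 100 Hz apart, applies a Hann window of 4 ms (broadband-like) or 40 ms (narrowband-like), and counts how many distinct strong spectral peaks appear. The tone frequencies, window lengths, sampling rate and helper names are all assumptions chosen for the example.

```python
import cmath
import math

def hann(n_samples):
    """Symmetric Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def magnitude_at(frame, freq_hz, fs):
    """Magnitude of the Fourier transform of `frame` at a single frequency."""
    w = -2j * math.pi * freq_hz / fs
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(frame)))

def count_strong_peaks(frame, fs, f_lo, f_hi, step_hz=5):
    """Count local maxima in [f_lo, f_hi] that reach at least half the largest magnitude."""
    mags = [magnitude_at(frame, f, fs) for f in range(f_lo, f_hi + 1, step_hz)]
    floor = 0.5 * max(mags)
    return sum(1 for i in range(1, len(mags) - 1)
               if mags[i] > mags[i - 1] and mags[i] > mags[i + 1] and mags[i] >= floor)

def two_tone_frame(duration_s, fs=8000, f1=1000, f2=1100):
    """Hann-windowed frame holding two equal-amplitude sine tones."""
    n_samples = int(round(duration_s * fs))
    win = hann(n_samples)
    return [win[n] * (math.sin(2 * math.pi * f1 * n / fs) +
                      math.sin(2 * math.pi * f2 * n / fs))
            for n in range(n_samples)]

fs = 8000
short_peaks = count_strong_peaks(two_tone_frame(0.004), fs, 900, 1200)  # 4 ms window
long_peaks = count_strong_peaks(two_tone_frame(0.040), fs, 900, 1200)   # 40 ms window
```

With the short window the two tones merge into a single broad peak, while the long window separates them. The same trade-off explains why formant values read by eye from a broadband spectrogram carry a large intrinsic error, even though the short window gives the good time resolution needed for stop bursts and transitions.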

Figure: A waveform and broadband spectrogram of the word "heard".

Figure: A narrowband spectrogram of the word "heard".

Figure: This is a broadband spectrogram of the word "hide" with the formant tracks for formants 1 to 5 superimposed over it.

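Figure: Waveform labelled "1_aam" (values 0.0143017892 and 0.490396511 shown; time axis 0 to 0.491 s), with segment labels g1, g2, aag, aa1, aa2, aam, m1 and m2.

Figure: Waveform of the word "aayvu" (amplitude -1 to 1; time axis 0 to 0.8455 s), with segment labels g, aa, ay, y, yv, v, vu and u. Marked durations (s): 0.18, 0.2, 0.1, 0.07, 0.04, 0.07, 0.07, 0.19.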

Table: Duration, intensity, pitch and formants of the word "aayvu" and its segments.

Segment  Duration (s)  Intensity (dB)  Pitch (Hz)  F1 (Hz)  F2 (Hz)  F3 (Hz)  F4 (Hz)
aayvu    0.77          80.4            160.2       540.7    1484.6   3750.3   3750.2
g        0.19          62.4            128         900.78   1853.0   2899.3   4078.2
aa       0.2           81.3            137.1       810.4    1181.6   2865.5   3792.2
ay       0.1           84.0            171.1       654.07   1755.3   2599.9   3753.5
y        0.07          80.5            179         362.1    2275.9   2570.3   3878.4
yv       0.04          78.7            174.5       349.3    1928.6   2365.0   3876.5
v        0.07          73.4            162.2       348.7    1154.98  2418.4   3636.0
vu       0.06          78.2            166.5       3636.0   1147.2   2570.8   3568.2
u        0.2           77.8            167.2       387.36   1488.5   2611.5   3693.2

Figures: LPC spectra of the segments of "aayvu". In each plot the horizontal axis is frequency (0 to 5500 Hz) and the vertical axis is sound pressure level (dB/Hz). The labelled spectral peaks are:

LPC of aa in aayvu: 886.4, 1212.5, 2916.7, 3754.0 and 4813.6 Hz.

LPC of ay in aayvu: 671.6, 1694.1, 2272.1 and 3679.9 Hz.

LPC of y in aayvu: 352.9, 2323.9, 3939.3 and 4939.6 Hz.

LPC of v in aayvu: 323.3, 1190.2, 2346.2 and 3613.2 Hz.

LPC of vu in aayvu: 360.3, 1108.7, 2612.9 and 3583.6 Hz.

LPC of u in aayvu: 397.4, 1486.3, 2590.7 and 3583.6 Hz.

Linear Prediction Coefficient (LPC)


Linear Prediction Coefficient (LPC) analysis attempts to predict the poles (related to resonances or formants) that, when combined with the speech source spectrum (the "residual" in LPC analysis), would result in the original waveform.

An LPC analysis separates the analysis of the resonant characteristics of a speech sound from the source characteristics of that sound.

The resulting LPC spectrum is a smoothed spectrum with the peaks representing the formants (resulting from the vocal tract resonances) of the spectrum of a vowel or vowel-like consonant.
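One common way to carry out an LPC analysis is the autocorrelation method solved with the Levinson-Durbin recursion; the slides do not say which method produced their figures, so the sketch below is only an illustration of the idea. It synthesises a toy "vowel" as the impulse response of two assumed resonances (500 Hz and 1500 Hz, 80 Hz bandwidth, 8 kHz sampling), fits a 4th-order LPC model, and reads formant candidates off the peaks of the smoothed LPC envelope. All names and parameter values are hypothetical, and no analysis window or pre-emphasis is applied.

```python
import cmath
import math

def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., ap] of A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a

def envelope_peaks(a, fs, step_hz=10):
    """Frequencies (Hz) where the LPC envelope 1/|A| has a local maximum."""
    freqs = list(range(step_hz, fs // 2, step_hz))
    env = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        env.append(-20.0 * math.log10(abs(sum(c * cmath.exp(-1j * w * i)
                                              for i, c in enumerate(a)))))
    return [freqs[i] for i in range(1, len(env) - 1)
            if env[i] > env[i - 1] and env[i] > env[i + 1]]

def resonator(freq_hz, bandwidth_hz, fs):
    """Denominator coefficients of a two-pole resonance."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    return [1.0, -2.0 * r * math.cos(2 * math.pi * freq_hz / fs), r * r]

def poly_mul(p, q):
    """Multiply two polynomials (cascade two filters)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def impulse_response(a, n_samples):
    """Impulse response of the all-pole filter 1/A(z)."""
    y = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y.append(x - sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0))
    return y

fs = 8000
true_a = poly_mul(resonator(500, 80, fs), resonator(1500, 80, fs))
signal = impulse_response(true_a, 400)             # toy "vowel": filter only, impulse source
a_hat = levinson_durbin(autocorrelation(signal, 4), 4)
formants = envelope_peaks(a_hat, fs)               # peaks of the smoothed LPC spectrum
```

In this toy case the envelope peaks land close to the two assumed resonances. Real speech would normally be windowed frame by frame and analysed with a higher model order, and the peak-picking step is what yields formant labels of the kind shown on the LPC plots.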

Figure: This is an LPC analysis of the vowel in "heard". Note the smooth spectrum clearly showing the positions of the main spectral peaks (formants) of this vowel.

Figure: White noise used as a simplified model of a fricative sound source. Note the random pattern of both the waveform (bottom) and the spectrum (top). Also note that the spectral envelope (LPC spectrum in red) is approximately flat.

Identification of Speech Waveforms

Figure: Three long vowels in an /h_d/ context.

Figure: Three English voiceless oral stops in CV context.

Figure: Three English voiced oral stops in CV context.

Figure: The two English affricates in CV context.

Figure 9: Waveforms of two of the English voiceless fricatives in CV context.
