Acoustic Theory of Speech Production MIT PDF

The document discusses the acoustic theory of speech production. It describes how speech sounds are produced by the interaction of the vocal tract and vocal folds. The vocal tract can be modeled as a series of acoustic tubes. The natural resonance frequencies of these tubes, called formants, determine the characteristics of different speech sounds. The document provides examples of estimating formant frequencies from simplified tube models of the vocal tract for different vowels.

Uploaded by

Marina Lacerda
Acoustic Theory of Speech Production

Overview
Sound sources
Vocal tract transfer function
Wave equations
Sound propagation in a uniform acoustic tube
Representing the vocal tract with simple acoustic tubes
Estimating natural frequencies from area functions
Representing the vocal tract with multiple uniform tubes
6.345 Automatic Speech Recognition: Acoustic Theory of Speech Production
Lecture # 2
Session 2003
Anatomical Structures for Speech Production
Phonemes in American English
PHONEME  EXAMPLE    PHONEME  EXAMPLE    PHONEME  EXAMPLE
/i/      beat       /s/      see        /w/      wet
/ɪ/      bit        /š/      she        /r/      red
/e/      bait       /f/      fee        /l/      let
/ɛ/      bet        /θ/      thief      /y/      yet
/æ/      bat        /z/      z          /m/      meet
/a/      Bob        /ž/      Gigi       /n/      neat
/ɔ/      bought     /v/      v          /ŋ/      sing
/ʌ/      but        /ð/      thee       /č/      church
/o/      boat       /p/      pea        /ǰ/      judge
/ʊ/      book       /t/      tea        /h/      heat
/u/      boot       /k/      key
/ɝ/      Burt       /b/      bee
/ay/     bite       /d/      Dee
/ɔy/     Boyd       /g/      geese
/aw/     bout
/ə/      about
Places of Articulation for Speech Sounds
Palato-Alveolar
Velar
Alveolar
Labial
Uvular
Dental
Palatal
Speech Waveform: An Example
Two plus seven is less than ten
A Wideband Spectrogram
Two plus seven is less than ten
Acoustic Theory of Speech Production
The acoustic characteristics of speech are usually modelled as a sequence of source, vocal tract filter, and radiation characteristics
[Figure: source-filter diagram with labels $U_G$, $U_L$, $r$, and $P_r$]
$$P_r(j\Omega) = S(j\Omega)\, T(j\Omega)\, R(j\Omega)$$
For vowel production:
$$S(j\Omega) = U_G(j\Omega)$$
$$T(j\Omega) = U_L(j\Omega) / U_G(j\Omega)$$
$$R(j\Omega) = P_r(j\Omega) / U_L(j\Omega)$$
Sound Source: Vocal Fold Vibration
Modelled as a volume velocity source at glottis, $U_G(j\Omega)$
[Figure: radiated pressure $P_r(t)$ and glottal volume velocity $U_G(t)$ versus time $t$, with period $T_o = 1/F_o$; glottal spectrum $U_G(f)$ versus $f$, falling off as $1/f^2$]

            F0 ave (Hz)   F0 min (Hz)   F0 max (Hz)
Men         125           80            200
Women       225           150           350
Children    300           200           500
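As a small illustration of the relation $T_o = 1/F_o$ from the glottal source slide, the following sketch converts the tabulated fundamental frequencies into glottal periods. The dictionary layout and the function name `pitch_period_ms` are choices made here for illustration, not part of the lecture.

```python
# Fundamental-frequency values in Hz, copied from the lecture's F0 table.
F0_TABLE = {
    "Men": {"ave": 125.0, "min": 80.0, "max": 200.0},
    "Women": {"ave": 225.0, "min": 150.0, "max": 350.0},
    "Children": {"ave": 300.0, "min": 200.0, "max": 500.0},
}


def pitch_period_ms(f0_hz):
    """Return the glottal period T_o = 1 / F_o, expressed in milliseconds."""
    return 1000.0 / f0_hz


for group, f0 in F0_TABLE.items():
    # The longest period corresponds to the lowest F0, and vice versa.
    print(group,
          "average period:", round(pitch_period_ms(f0["ave"]), 2), "ms,",
          "range:", round(pitch_period_ms(f0["max"]), 2), "to",
          round(pitch_period_ms(f0["min"]), 2), "ms")
```

For the average male value of 125 Hz this gives a period of 8 ms.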

Sound Source: Turbulence Noise


Turbulence noise is produced at a constriction in the vocal tract
Aspiration noise is produced at glottis
Frication noise is produced above the glottis
Modelled as series pressure source at constriction, $P_S(j\Omega)$
[Figure: noise source spectrum $P_s(f)$ versus $f$, with the frequency $0.2\,V/D$ marked on the axis]
V: Velocity at constriction    D: Critical dimension $= \sqrt{4A/\pi} \approx \sqrt{A}$


Vocal Tract Wave Equations
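Define:
$u(x, t)$ = particle velocity
$U(x, t)$ = volume velocity ($U = uA$)
$p(x, t)$ = sound pressure variation ($P = P_O + p$)
$\rho$ = density of air
$c$ = velocity of sound
Assuming plane wave propagation (for a cross dimension $\ll \lambda$), and a one-dimensional wave motion, it can be shown that
$$-\frac{\partial p}{\partial x} = \rho \frac{\partial u}{\partial t} \qquad -\frac{\partial u}{\partial x} = \frac{1}{\rho c^2}\frac{\partial p}{\partial t} \qquad \frac{\partial^2 u}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}$$
Time and frequency domain solutions are of the form
$$u(x, t) = u^+(t - \tfrac{x}{c}) - u^-(t + \tfrac{x}{c}) \qquad u(x, s) = \frac{1}{\rho c}\left[P^+ e^{-sx/c} - P^- e^{sx/c}\right]$$
$$p(x, t) = \rho c\left[u^+(t - \tfrac{x}{c}) + u^-(t + \tfrac{x}{c})\right] \qquad p(x, s) = P^+ e^{-sx/c} + P^- e^{sx/c}$$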
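As a quick consistency check (a sketch added here, not taken from the slides), the forward-travelling part of the time-domain solution can be substituted into the two first-order equations. A dot denotes the derivative of $u^+$ with respect to its argument $t - x/c$.

```latex
% Forward-travelling wave only: u = u^+(t - x/c), p = \rho c\, u^+(t - x/c)
-\frac{\partial p}{\partial x} = -\rho c \left(-\frac{1}{c}\right)\dot{u}^+ = \rho\,\dot{u}^+ = \rho\,\frac{\partial u}{\partial t}
\qquad
-\frac{\partial u}{\partial x} = \frac{1}{c}\,\dot{u}^+ = \frac{1}{\rho c^2}\left(\rho c\,\dot{u}^+\right) = \frac{1}{\rho c^2}\frac{\partial p}{\partial t}
```

The backward-travelling term, with $u = -u^-(t + x/c)$ and $p = \rho c\, u^-(t + x/c)$, passes the same check with both sides of the first equation equal to $-\rho\,\dot{u}^-$.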
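Propagation of Sound in a Uniform Tube
[Figure: uniform tube of area A driven by $U_G$, with the two ends marked x = -l and x = 0]
The vocal tract transfer function of volume velocities is
$$T(j\Omega) = \frac{U_L(j\Omega)}{U_G(j\Omega)} = \frac{U(\ell, j\Omega)}{U(0, j\Omega)}$$
Using the boundary conditions $U(0, s) = U_G(s)$ and $P(\ell, s) = 0$
$$T(s) = \frac{2}{e^{s\ell/c} + e^{-s\ell/c}} \qquad T(j\Omega) = \frac{1}{\cos(\Omega\ell/c)}$$
The poles of the transfer function $T(j\Omega)$ are where $\cos(\Omega\ell/c) = 0$
$$\frac{(2\pi f_n)\,\ell}{c} = \frac{(2n-1)\pi}{2} \qquad f_n = \frac{c}{4\ell}(2n-1) \qquad \lambda_n = \frac{4\ell}{2n-1} \qquad n = 1, 2, \ldots$$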
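The step from the boundary conditions to the uniform-tube transfer function can be sketched as follows. It uses the frequency-domain solution forms from the wave equation slide with $U = uA$, and takes the glottis at $x = 0$ and the open end at $x = \ell$ as in the stated boundary conditions. The intermediate lines are a reconstruction, not copied from the slides.

```latex
% Volume velocity and pressure inside the tube
U(x,s) = \frac{A}{\rho c}\left[P^+ e^{-sx/c} - P^- e^{sx/c}\right], \qquad
P(x,s) = P^+ e^{-sx/c} + P^- e^{sx/c}
% Open end: P(\ell,s) = 0
P^- = -P^+ e^{-2s\ell/c}
% Evaluate U at both ends
U(0,s) = \frac{A}{\rho c}\,P^+\left[1 + e^{-2s\ell/c}\right], \qquad
U(\ell,s) = \frac{A}{\rho c}\,2P^+ e^{-s\ell/c}
% Ratio of output to input volume velocity
T(s) = \frac{U(\ell,s)}{U(0,s)} = \frac{2e^{-s\ell/c}}{1 + e^{-2s\ell/c}} = \frac{2}{e^{s\ell/c} + e^{-s\ell/c}}
% On the imaginary axis, s = j\Omega
T(j\Omega) = \frac{1}{\cos(\Omega\ell/c)}
```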
Propagation of Sound in a Uniform Tube (cont)
For $c = 34{,}000$ cm/sec, $\ell = 17$ cm, the natural frequencies (also called the formants) are at 500 Hz, 1500 Hz, 2500 Hz, ...
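The figures quoted for the 17 cm tube follow directly from $f_n = \frac{c}{4\ell}(2n-1)$. A minimal sketch, with the function name `uniform_tube_formants` chosen here for illustration:

```python
def uniform_tube_formants(length_cm, c_cm_per_s=34000.0, count=3):
    """Natural frequencies f_n = (2n - 1) * c / (4 * l), n = 1..count, of a
    lossless uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, count + 1)]


# Values from the lecture: c = 34,000 cm/sec and l = 17 cm.
print(uniform_tube_formants(17.0))  # [500.0, 1500.0, 2500.0]
```

Because $f_n$ scales as $1/\ell$, a shorter tract shifts every formant upward by the same factor.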
[Figure: pole locations (x) along the $j\Omega$ axis, and a plot of $20\log_{10}|T(j\Omega)|$ (0 to 40 dB) versus frequency (0 to 5 kHz)]
The transfer function of a tube with no side branches, excited at one end and response measured at another, only has poles
The formant frequencies will have finite bandwidth when vocal tract losses are considered (e.g., radiation, walls, viscosity, heat)
The length of the vocal tract, $\ell$, corresponds to $\frac{1}{4}\lambda_1, \frac{3}{4}\lambda_2, \frac{5}{4}\lambda_3, \ldots$, where $\lambda_i$ is the wavelength of the $i$th natural frequency
Standing Wave Patterns in a Uniform Tube
A uniform tube closed at one end and open at the other is often referred to as a quarter wavelength resonator
[Figure: standing wave patterns (SWP) of $|U(x)|$ for F1, F2, and F3, with x running from glottis to lips; positions marked at 2/3 of the tube length for F2 and at 2/5 and 4/5 for F3]
Natural Frequencies of Simple Acoustic Tubes
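[Figure: two tubes of area A, each spanning x = -l to x = 0, with the admittance $Y_{-\ell}$ taken at x = -l]

Quarter wavelength resonator:
$$P(x, j\Omega) = 2P^+ \cos\frac{\Omega x}{c} \qquad U(x, j\Omega) = -j\frac{A}{\rho c}\, 2P^+ \sin\frac{\Omega x}{c}$$
$$Y_{-\ell} = j\frac{A}{\rho c}\tan\frac{\Omega\ell}{c} \approx j\Omega\frac{A\ell}{\rho c^2} = j\Omega C_A \quad \text{for } \Omega\ell/c \ll 1$$
$C_A = A\ell/\rho c^2$ = acoustic compliance
$$f_n = \frac{c}{4\ell}(2n-1) \qquad n = 1, 2, \ldots$$

Half-wavelength resonator:
$$P(x, j\Omega) = -j\,2P^+ \sin\frac{\Omega x}{c} \qquad U(x, j\Omega) = \frac{A}{\rho c}\, 2P^+ \cos\frac{\Omega x}{c}$$
$$Y_{-\ell} = -j\frac{A}{\rho c}\cot\frac{\Omega\ell}{c} \approx \frac{1}{j\Omega\,\rho\ell/A} = \frac{1}{j\Omega M_A} \quad \text{for } \Omega\ell/c \ll 1$$
$M_A = \rho\ell/A$ = acoustic mass
$$f_n = \frac{c}{2\ell}\, n \qquad n = 0, 1, 2, \ldots$$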
Approximating Vocal Tract Shapes
[Figure: vocal tract shapes for [ i ], [ a ], and [ u ], each approximated by two tubes with areas $A_1$, $A_2$ and lengths $\ell_1$, $\ell_2$]
Estimating Natural Resonance Frequencies
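Resonance frequencies occur where impedance (or admittance) function equals natural (e.g., open circuit) boundary conditions
[Figure: two-tube model driven by $U_G$ with output $U_L$, areas $A_1$, $A_2$ and lengths $\ell_1$, $\ell_2$, annotated with $Y_1 + Y_2 = 0$]
For a two tube approximation it is easiest to solve for $Y_1 + Y_2 = 0$
$$j\frac{A_1}{\rho c}\tan\frac{\Omega\ell_1}{c} - j\frac{A_2}{\rho c}\cot\frac{\Omega\ell_2}{c} = 0$$
$$\sin\frac{\Omega\ell_1}{c}\sin\frac{\Omega\ell_2}{c} - \frac{A_2}{A_1}\cos\frac{\Omega\ell_1}{c}\cos\frac{\Omega\ell_2}{c} = 0$$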
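The two-tube condition has no closed-form solution in general, but its roots are easy to find numerically. The sketch below scans the sine and cosine form of the equation for sign changes and refines each one by bisection. The tube dimensions (8 cm² by 9 cm followed by 1 cm² by 6 cm) are the ones used in the lecture's vowel production example; the value c = 35,000 cm/s, the scan step, and the function names are assumptions made here.

```python
import math


def two_tube_residual(freq_hz, a1, l1, a2, l2, c):
    """Left-hand side of sin*sin - (A2/A1)*cos*cos = 0 at one frequency."""
    t1 = 2.0 * math.pi * freq_hz * l1 / c
    t2 = 2.0 * math.pi * freq_hz * l2 / c
    return (math.sin(t1) * math.sin(t2)
            - (a2 / a1) * math.cos(t1) * math.cos(t2))


def two_tube_resonances(a1, l1, a2, l2, c=35000.0, f_max=3500.0, step=10.0):
    """Scan upward from 1 Hz and refine every sign change by bisection."""
    def g(f):
        return two_tube_residual(f, a1, l1, a2, l2, c)

    roots = []
    f_lo, g_lo = 1.0, g(1.0)
    while f_lo < f_max:
        f_hi = f_lo + step
        g_hi = g(f_hi)
        if g_lo * g_hi < 0.0:  # a root lies inside (f_lo, f_hi)
            lo, hi, g_left = f_lo, f_hi, g_lo
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if g_left * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_left = mid, g_mid
            roots.append(0.5 * (lo + hi))
        f_lo, g_lo = f_hi, g_hi
    return roots


# Areas in cm^2, lengths in cm (illustrative values, see the lead-in).
roots = two_tube_resonances(a1=8.0, l1=9.0, a2=1.0, l2=6.0)
print([round(f) for f in roots])
```

With these inputs the scan finds three roots below 3.5 kHz. The two lowest fall somewhat below the decoupled estimates of 268 Hz and 1944 Hz, which is the expected effect of coupling between the tubes.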
Decoupling Simple Tube Approximations
If $A_1 \gg A_2$, or $A_1 \ll A_2$, the tubes can be decoupled and natural frequencies of each tube can be computed independently
For the vowel /i/, the formant frequencies are obtained from:
[Figure: two-tube model with areas $A_1$, $A_2$ and lengths $\ell_1$, $\ell_2$]
$$f_n = \frac{c}{2\ell_1}\, n \quad \text{plus} \quad f_n = \frac{c}{2\ell_2}\, n$$
At low frequencies:
$$f = \frac{c}{2\pi}\left(\frac{A_2}{A_1 \ell_1 \ell_2}\right)^{1/2} = \frac{1}{2\pi}\left(\frac{1}{C_{A_1} M_{A_2}}\right)^{1/2}$$
This low resonance frequency is called the Helmholtz resonance
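The Helmholtz expression can be recovered from the condition $Y_1 + Y_2 = 0$ by replacing each admittance with its low-frequency form from the simple-tube slide: the wide back tube acts as an acoustic compliance $C_{A_1}$ and the narrow front tube as an acoustic mass $M_{A_2}$. The steps below are a sketch under that assumption.

```latex
% Low-frequency form of Y_1 + Y_2 = 0
j\Omega C_{A_1} + \frac{1}{j\Omega M_{A_2}} = 0
\;\Longrightarrow\;
\Omega^2 = \frac{1}{C_{A_1} M_{A_2}}
% Substitute C_{A_1} = A_1 \ell_1 / (\rho c^2) and M_{A_2} = \rho \ell_2 / A_2
\Omega^2 = \frac{\rho c^2}{A_1 \ell_1}\cdot\frac{A_2}{\rho \ell_2} = \frac{c^2 A_2}{A_1 \ell_1 \ell_2}
\;\Longrightarrow\;
f = \frac{\Omega}{2\pi} = \frac{c}{2\pi}\left(\frac{A_2}{A_1 \ell_1 \ell_2}\right)^{1/2}
```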
Vowel Production Example
[Figure: two two-tube vocal tract models. Left: sections of 1 cm² (9 cm long) and 7 cm² (8 cm long). Right: sections of 8 cm² (9 cm long) and 1 cm² (6 cm long). Estimated natural frequencies of the decoupled sections: 972, 2917, ... plus 1093, ... (left); 268 plus 1944, ... plus 2917, ... (right)]

Left model                          Right model
Formant  Actual  Estimated          Formant  Actual  Estimated
F1       789     972                F1       256     268
F2       1276    1093               F2       1905    1944
F3       2808    2917               F3       2917    2917
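The "Estimated" columns of the vowel example can be reproduced from the decoupled formulas on the preceding slides. The sketch below assumes c = 35,000 cm/s, which is the value that matches the tabulated numbers, and it assumes which formula belongs to which section: quarter-wavelength resonances for both sections of the left model, and a Helmholtz resonance plus half-wavelength resonances for the right model. The helper names are chosen here for illustration.

```python
import math

C = 35000.0  # speed of sound in cm/s; assumed, chosen to match the table


def quarter_wave(length_cm, n):
    """f_n = (2n - 1) * c / (4 * l), n = 1, 2, ..."""
    return (2 * n - 1) * C / (4.0 * length_cm)


def half_wave(length_cm, n):
    """f_n = n * c / (2 * l), n = 0, 1, 2, ..."""
    return n * C / (2.0 * length_cm)


def helmholtz(a_back, l_back, a_front, l_front):
    """f = (c / 2 pi) * sqrt(A2 / (A1 * l1 * l2)) for a wide back tube
    (compliance) feeding a narrow front tube (mass)."""
    return C / (2.0 * math.pi) * math.sqrt(a_front / (a_back * l_back * l_front))


# Left model: 1 cm^2 x 9 cm and 7 cm^2 x 8 cm, both treated as quarter-wave.
left = sorted([quarter_wave(9.0, 1), quarter_wave(9.0, 2), quarter_wave(8.0, 1)])

# Right model: 8 cm^2 x 9 cm and 1 cm^2 x 6 cm.
right = sorted([helmholtz(8.0, 9.0, 1.0, 6.0), half_wave(9.0, 1), half_wave(6.0, 1)])

print("left :", [round(f, 1) for f in left])
print("right:", [round(f, 1) for f in right])
```

The results agree with the tabulated estimates to within 1 Hz; the small differences come from rounding in the table.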
Example of Vowel Spectrograms
[Figure: for each of /bit/ and /bat/, a wide band spectrogram (0 to 8 kHz) with zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), and waveform panels, plotted against time (0.0 to 0.7 seconds)]
Estimating Anti-Resonance Frequencies (Zeros)
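Zeros occur at frequencies where there is no measurable output
[Figure: nasal consonant model with source $U_G$, output $U_N$, sections of area $A_p$, $A_o$, $A_n$ and length $\ell_p$, $\ell_o$, $\ell_n$, and admittances $Y_p$, $Y_o$, $Y_n$; fricative or stop model with pressure source $P_s$, output $U_L$, and sections of area $A_b$, $A_c$, $A_f$ and length $\ell_b$, $\ell_c$, $\ell_f$]
For nasal consonants, zeros in $U_N$ occur where $Y_o = \infty$
For fricatives or stop consonants, zeros in $U_L$ occur where the impedance behind source is infinite (i.e., a hard wall at source)
[Figure: admittance conditions labelled $Y_1 = 0$ and $Y_3 + Y_4 = 0$]
Zeros occur when measurements are made in vocal tract interior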
Consonant Production
[Figure: three-section model with back cavity ($A_b$, $\ell_b$), constriction ($A_c$, $\ell_c$), and front cavity ($A_f$, $\ell_f$), excited by pressure source $P_s$; the sections contributing POLES and ZEROS are indicated]

        A_b    A_c    A_f    l_b    l_c    l_f
[g]     5      0.2    4      9      3      5
[s]     5      0.5    4      11     3      2.5

        [g]                 [s]
        poles    zeros      poles    zeros
        215      0          306      0
        1750     1944       1590     1590
        1944     2916       3180     2916
        3888     3888       3500     3180
        ...      ...        ...      ...
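The tabulated poles and zeros for [g] and [s] are consistent with simple decoupled estimates, which the sketch below reproduces. The pairing of each value with a formula is inferred here from the numbers rather than stated on the slide: poles from the Helmholtz resonance of the back cavity and constriction, the half-wavelength resonances of the back cavity, and the quarter-wavelength resonances of the front cavity; zeros from the half-wavelength resonances of the back cavity (including n = 0) and the quarter-wavelength resonances of the constriction. The speed of sound c = 35,000 cm/s, the centimetre units, and all function names are likewise assumptions.

```python
import math

C = 35000.0  # speed of sound in cm/s; assumed


def helmholtz(a_back, l_back, a_neck, l_neck):
    """Low-frequency resonance of a cavity (compliance) and a neck (mass)."""
    return C / (2.0 * math.pi) * math.sqrt(a_neck / (a_back * l_back * l_neck))


def half_wave(length, f_max, start=1):
    """f_n = n * c / (2 * l) for n = start, start + 1, ... up to f_max."""
    out, n = [], start
    while n * C / (2.0 * length) <= f_max:
        out.append(n * C / (2.0 * length))
        n += 1
    return out


def quarter_wave(length, f_max):
    """f_n = (2n - 1) * c / (4 * l) for n = 1, 2, ... up to f_max."""
    out, n = [], 1
    while (2 * n - 1) * C / (4.0 * length) <= f_max:
        out.append((2 * n - 1) * C / (4.0 * length))
        n += 1
    return out


def poles_and_zeros(a_b, a_c, l_b, l_c, l_f, f_max=4000.0):
    """Decoupled pole and zero estimates for the three-section model."""
    poles = sorted([helmholtz(a_b, l_b, a_c, l_c)]
                   + half_wave(l_b, f_max)
                   + quarter_wave(l_f, f_max))
    zeros = sorted(half_wave(l_b, f_max, start=0) + quarter_wave(l_c, f_max))
    return poles, zeros


g_poles, g_zeros = poles_and_zeros(a_b=5.0, a_c=0.2, l_b=9.0, l_c=3.0, l_f=5.0)
s_poles, s_zeros = poles_and_zeros(a_b=5.0, a_c=0.5, l_b=11.0, l_c=3.0, l_f=2.5)
print("[g] poles:", [round(f) for f in g_poles], "zeros:", [round(f) for f in g_zeros])
print("[s] poles:", [round(f) for f in s_poles], "zeros:", [round(f) for f in s_zeros])
```

In this decoupled approximation the front-cavity area $A_f$ does not enter any of the estimates, so it is not a parameter of the sketch.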
Example of Consonant Spectrograms
[Figure: for each of /kip/ and /si/, a wide band spectrogram (0 to 8 kHz) with zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), and waveform panels, plotted against time (0.0 to 0.7 seconds for /kip/, 0.0 to 0.8 seconds for /si/)]
Perturbation Theory
[Figure: uniform tube of area A and length $\ell$ along x, with a short section at the open end whose area is changed; the admittance Y of the short section is given by an approximation valid for small $\ell$]
Consider a uniform tube, closed at one end and open at the other
Reducing the area of a small piece of the tube near the opening (where U is max) has the same effect as keeping the area fixed and lengthening the tube
Since lengthening the tube lowers the resonant frequencies, narrowing the tube near points where U(x) is maximum in the standing wave pattern for a given formant decreases the value of that formant
Perturbation Theory (cont'd)
[Figure: uniform tube of area A along x, with a short section of length $\ell$ at the closed end whose area is changed]
$$Y \approx j\Omega\frac{A\ell}{\rho c^2} \quad \text{for small } \ell$$
Reducing the area of a small piece of the tube near the closure (where p is max) has the same effect as keeping the area fixed and shortening the tube
Since shortening the tube will increase the values of the formants, narrowing the tube near points where p(x) is maximum in the standing wave pattern for a given formant will increase the value of that formant
Summary of Perturbation Theory Results
[Figure: standing wave patterns (SWP) of $|U(x)|$ for F1, F2, and F3, with x running from glottis to lips and positions marked at 2/3 of the tube length for F2 and at 2/5 and 4/5 for F3; below, for each of F1, F2, and F3, the regions along the tube are marked with the sign of the formant change (as a consequence of decreasing A)]
Illustration of Perturbation Theory
Illustration of Perturbation Theory
The ship was torn apart on the sharp (reef)
Illustration of Perturbation Theory
(The ship was torn apart on the sh)arp reef

Multi-Tube Approximation of the Vocal Tract


We can represent the vocal tract as a concatenation of N lossless tubes with constant area $\{A_k\}$ and equal length $\Delta x = \ell/N$
The wave propagation time through each tube is $\tau = \frac{\Delta x}{c} = \frac{\ell}{Nc}$
[Figure: vocal tract drawn as seven concatenated tubes, with areas labelled up to $A_7$ and each section of length $\Delta x$]
Wave Equations for Individual Tube
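The wave equations for the $k$th tube have the form
$$p_k(x, t) = \frac{\rho c}{A_k}\left[U_k^+(t - \tfrac{x}{c}) + U_k^-(t + \tfrac{x}{c})\right]$$
$$U_k(x, t) = U_k^+(t - \tfrac{x}{c}) - U_k^-(t + \tfrac{x}{c})$$
where x is measured from the left-hand side ($0 \le x \le \Delta x$)
[Figure: tubes k and k+1, with areas $A_k$, $A_{k+1}$ and length $\Delta x$ each; forward waves $U_k^+(t)$, $U_k^+(t - \tau)$, $U_{k+1}^+(t)$, $U_{k+1}^+(t - \tau)$ and backward waves $U_k^-(t)$, $U_k^-(t + \tau)$, $U_{k+1}^-(t)$, $U_{k+1}^-(t + \tau)$ at the two ends of each tube]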
Update Expression at Tube Boundaries
We can solve update expressions using continuity constraints at tube boundaries e.g., $p_k(\Delta x, t) = p_{k+1}(0, t)$, and $U_k(\Delta x, t) = U_{k+1}(0, t)$
[Figure: signal flow graph of the junction between the kth tube and the (k+1)st tube, with DELAY $\tau$ elements in each tube, branch gains $1 + r_k$, $1 - r_k$, $r_k$, and $-r_k$, and signals $U_k^+(t)$, $U_k^-(t)$, $U_{k+1}^+(t - \tau)$, $U_{k+1}^-(t + \tau)$ at the outer ends]
$$U_{k+1}^+(t) = (1 + r_k)\, U_k^+(t - \tau) + r_k\, U_{k+1}^-(t)$$
$$U_k^-(t + \tau) = -r_k\, U_k^+(t - \tau) + (1 - r_k)\, U_{k+1}^-(t)$$
$$r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k} \qquad \text{note } |r_k| \le 1$$
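The junction relations can be exercised with a few lines of code. The sketch below computes $r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k)$ for an area function and applies the two update expressions at a single junction. The three-tube area list and the wave amplitudes are hypothetical values chosen only for illustration, as are the function names.

```python
def reflection_coefficients(areas):
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each adjacent tube pair."""
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]


def junction_update(r_k, u_plus_k_delayed, u_minus_k1):
    """Apply the update expressions at the junction of tubes k and k+1.

    u_plus_k_delayed is U_k^+(t - tau), the forward wave arriving from tube k.
    u_minus_k1 is U_{k+1}^-(t), the backward wave arriving from tube k+1.
    Returns (U_{k+1}^+(t), U_k^-(t + tau)).
    """
    u_plus_k1 = (1.0 + r_k) * u_plus_k_delayed + r_k * u_minus_k1
    u_minus_k_advanced = -r_k * u_plus_k_delayed + (1.0 - r_k) * u_minus_k1
    return u_plus_k1, u_minus_k_advanced


# Hypothetical area function in cm^2: two equal sections, then a narrowing.
areas = [8.0, 8.0, 1.0]
r = reflection_coefficients(areas)
print(r)  # the first entry is 0.0 because the first two areas are equal

# Volume velocity must be continuous across the junction:
# U_k^+(t - tau) - U_k^-(t + tau) == U_{k+1}^+(t) - U_{k+1}^-(t)
a, d = 1.0, 0.5
c_out, b_out = junction_update(r[1], a, d)
print(abs((a - b_out) - (c_out - d)) < 1e-12)
```

A junction between equal areas has $r_k = 0$ and passes both waves through unchanged, which is a convenient sanity check on any implementation.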
Digital Model of Multi-Tube Vocal Tract
Updates at tube boundaries occur synchronously every $2\tau$
If excitation is band-limited, inputs can be sampled every $T = 2\tau$
Each tube section has a delay of $z^{-1/2}$
[Figure: z-domain signal flow graph of one junction, with $z^{-1/2}$ delays in the forward and backward paths, branch gains $1 + r_k$, $1 - r_k$, $r_k$, and $-r_k$, and signals $U_k^+(z)$, $U_{k+1}^+(z)$, $U_k^-(z)$, $U_{k+1}^-(z)$]
The choice of N depends on the sampling period T
$$T = 2\tau = \frac{2\ell}{Nc} \;\Longrightarrow\; N = \frac{2\ell}{cT}$$
Series and shunt losses can also be introduced at tube junctions
Bandwidths are proportional to energy loss to storage ratio
Stored energy is proportional to tube length
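To make the relation $N = 2\ell/(cT)$ concrete, the sketch below evaluates it for the 17 cm tract and c = 34,000 cm/sec used earlier in the lecture. The 10 kHz sampling rate is a hypothetical value chosen for illustration, and `num_tube_sections` is a name introduced here.

```python
def num_tube_sections(tract_length_cm, sample_rate_hz, c_cm_per_s=34000.0):
    """N = 2 * l / (c * T), with sampling period T = 1 / sample_rate_hz."""
    return 2.0 * tract_length_cm * sample_rate_hz / c_cm_per_s


# 17 cm tract sampled at a hypothetical 10 kHz.
print(num_tube_sections(17.0, 10000.0))  # 10.0
```

Doubling the sampling rate doubles N, since each section must then be half as long for its round-trip delay $2\tau$ to equal one sample period.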
Assignment 1
