User Guide vtl3d
Rémi Blandin
August 31, 2022
Contents
1 Introduction
1.1 What are 3D acoustic simulations?
1.2 What can VocalTractLab3D do?
1.3 What can VocalTractLab3D not do?
1.4 Download, installation and requirements
1.5 How to cite VocalTractLab3D?
1.6 Troubleshooting
2 Interface
2.1 Overview
2.2 Vocal tract geometry
2.2.1 Defining a vocal tract geometry
2.2.2 Visualizing the vocal tract geometry
2.3 Transverse modes
2.3.1 Computation
2.3.2 Visualization
2.4 Transfer functions
2.4.1 Computation
2.4.2 Visualization
2.5 Acoustic field
2.5.1 Computation
2.5.2 Visualization
2.6 Default parameters
2.7 Phoneme synthesis
3 Log file
4 Acknowledgements
1 Introduction
VocalTractLab3D is a special version of the articulatory synthesizer VocalTractLab 2.3 [1] (www.vocaltractlab.de)
that integrates a module for 3D acoustic simulations. The other modules are, apart from a few minor
differences, the same as in the original VocalTractLab 2.3, and the reader is referred to the VocalTractLab 2.3 manual
to learn how to use them. The 3D acoustic simulations are performed with a frequency-domain multimodal method
designed to be particularly fast and accurate. The details of this simulation method are provided
in Blandin et al. [4].
[Figure 1: panels a), b) and c); the transfer function plot shows Magnitude (dB) versus f (kHz) for the 1D and 3D simulations.]
Figure 1: Transfer function and acoustic fields computed for the vowel /u/. The transfer function has been computed
with a 1D and a 3D simulation. The acoustic fields have been computed at frequencies shown on the transfer function
with arrows. They are shown in the sagittal plane and some selected transverse planes indicated with dashed lines.
functions, as an example, between the acoustic volume flow created at the vocal folds and the acoustic pressure
radiated in front of the lips. It can also be used to compute the acoustic field, which describes the variations of the
acoustic pressure and the particle velocity over space.
What is specific to vocal tract acoustics?
The vocal tract has an elongated shape in which the acoustic waves are guided to travel mainly along its length.
From the point of view of wave propagation, the vocal tract can be called a waveguide. This specificity of the vocal
tract makes it easy to approximate the propagation of acoustic waves using a single value of the acoustic pressure
varying along its length, thus neglecting transverse variations of the acoustic field. This has led to 1D simulation
methods and electrical analogies, which are very widely used to simulate vocal tract acoustics [9].
What is 3D vocal tract acoustics?
Even though they are not very important below 4-5 kHz, the acoustic field has transverse variations and thus varies in
all three dimensions of space. At low frequencies, these variations appear as a curvature of the acoustic field related
to variations of the cross-sectional dimensions (see Fig. 1a). At higher frequencies, the 3D nature of the acoustic field is
more obvious, as transverse resonances can be observed (see Fig. 1b).
What does accounting for 3D acoustics change compared with a 1D simplifying assumption?
The impact of accounting for the 3D nature of the acoustic field inside the vocal tract is rather limited up to about
3 kHz. It consists mainly of small changes in the resonance properties (frequency, amplitude and bandwidth). From
3 kHz on, the changes in the resonance properties can be more substantial. Above 4-5 kHz, the transverse resonances
can induce zeros and additional peaks in the transfer function (see Fig. 1c).
What is the multimodal method?
It is a simulation method which relies on the projection of the acoustic field onto a basis of eigenfunctions. Such an
approach is very efficient at reducing computation times and memory requirements. In the case of the vocal tract, the
3D geometry is cut into multiple segments in which the local transverse eigenmodes are computed and on which the
acoustic field is decomposed. The method implemented in VocalTractLab3D is described in detail in Blandin et al. [4].
• wxWidgets 3.1.5 (https://www.wxwidgets.org) is used for the graphical interface.
• boost 1.71.0 (https://www.boost.org/) is used for Bessel functions and Gauss integration.
• The Computational Geometry Algorithms Library (CGAL 5.0) (https://www.cgal.org) is used for the
generation of meshes and other geometry problems.
• Eigen 3.3.9 (http://eigen.tuxfamily.org) is used for solving linear algebra problems, in particular the eigenvalue
decomposition for the computation of the transverse modes.
Note that VocalTractLab3D comes with no warranty of any kind.
1.6 Troubleshooting
For any issue, bug report or question related to VocalTractLab3D, please write to [email protected] or
[email protected].
2 Interface
2.1 Overview
The "3d acoustic simulation" page is divided into 4 panels (see Fig. 2):
1. The left panel contains buttons to manage the geometries, launch the simulations and synthesize phonemes.
2. The middle panel shows a sagittal cut of the simulated geometry.
3. The right panel shows a transverse cut of the geometry corresponding to a specific segment.
4. The bottom panel shows the computed transfer functions and input impedance.
First line  | Centerline x | Normal x | Input scaling  | Contour point 1 y | Contour point 2 y | ... | Contour point N y
Second line | Centerline y | Normal y | Output scaling | Contour point 1 z | Contour point 2 z | ... | Contour point N z
One segment is defined on two lines: the first one gives the first coordinates (x or y) and the input scaling
factor, the second one the second coordinates (y or z) and the output scaling factor. This is summarized in
Tab. 1. The columns must be separated by semicolons ";". An example of such a .csv file encoding a simple
waveguide geometry is provided in Tab. 2. Note that in this example the normal of the second segment is not
normalized. In this case, the normalization is done when the file is imported; otherwise, this would be equivalent to
applying a scaling factor. The length and curvature of a segment are defined by its centerline point and normal
together with the centerline point and normal of the following segment. Thus, at least two segments must
be provided. The second-to-last and the last segments are defined by computing an intermediate centerline point and
normal between the last and second-to-last centerline points and normals provided.
Table 2: Example of .csv file which can be imported to generate a waveguide geometry.
Examples of such files generated for vowel geometries measured on magnetic resonance images (MRI) are provided
in the archive in the folder "geometries_from_MRI". They have been generated with VocalTractTransferFunction
from the MRI data provided in the Dresden Vocal Tract Dataset [2]. The software VocalTractTransferFunction can be
downloaded at www.vocaltractlab.de and used to load a surface mesh and save it in the .csv format specified
above (not yet available, though).
The geometry can also be exported in the same format by selecting "Export geometry in a csv file" in the context
menu which appears with a right click on the central panel.
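As an illustration of the two-lines-per-segment layout described above, the following Python sketch writes a straight waveguide made of two segments with a square contour. The function names (`segment_rows`, `write_geometry_csv`), the dictionary keys and all numerical values are assumptions chosen for the example; they are not part of VocalTractLab3D.

```python
import csv

def segment_rows(center, normal, scale_in, scale_out, contour):
    """Return the two CSV rows encoding one segment.

    center and normal are (x, y) pairs in the sagittal plane,
    contour is a list of (y, z) points in the transverse plane.
    """
    row_1 = [center[0], normal[0], scale_in] + [p[0] for p in contour]
    row_2 = [center[1], normal[1], scale_out] + [p[1] for p in contour]
    return [row_1, row_2]

def write_geometry_csv(path, segments):
    """Write the segments to path, with columns separated by ';'."""
    if len(segments) < 2:
        raise ValueError("at least two segments are required")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        for seg in segments:
            writer.writerows(segment_rows(seg["center"], seg["normal"],
                                          seg["scale_in"], seg["scale_out"],
                                          seg["contour"]))

# Hypothetical square contour, given as (y, z) points
square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]

# Hypothetical straight tube along x: two centerline points, normals along y
segments = [
    {"center": (0.0, 0.0), "normal": (0.0, 1.0),
     "scale_in": 1.0, "scale_out": 1.0, "contour": square},
    {"center": (5.0, 0.0), "normal": (0.0, 1.0),
     "scale_in": 1.0, "scale_out": 1.0, "contour": square},
]
```

Calling `write_geometry_csv("tube.csv", segments)` would produce a four-line file (two lines per segment) that follows the layout of Tab. 1.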
Geometry options:
When a geometry is loaded, one can choose whether or not to take into account the curvature and the area variations in
the segments. This is done using the simulation parameters dialog displayed by clicking on the button "Simulation
parameters" in the left panel. These options are found in the "Geometry options" section. When "Varying area" is
checked, the variation of area is taken into account through the scaling factor, which is set to vary linearly from the
entrance to the exit of the segments. The entrance and exit scaling factors are displayed in the information text of
the right panel. The variations of the scaling factor can be computed in two different ways, or directly provided by
the user when the geometry is loaded as a .csv file. These options can be selected in the "Geometry options" as
well. One scaling factor computation method, "Area", consists simply in linearly interpolating the cross-sectional
area. The other one, "Bounding box", interpolates the largest dimension of the bounding box of the contour of the
segments, provided that the resulting scaled contour does not exceed the area of the following contour, in which
case it is set to interpolate the area. Finally, one can specify that the scaling factors provided in the input .csv file
must be used by selecting "From file".
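The linear variation of the scaling factor along a segment can be sketched as follows. This is only an illustration under the assumption that the scaling is applied uniformly in both transverse directions, so that the area varies with the square of the scaling factor; the function names and numerical values are hypothetical.

```python
def scaling_factor(s_in, s_out, t):
    """Scaling factor at relative position t (0 = entrance, 1 = exit)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between 0 and 1")
    return s_in + (s_out - s_in) * t

def scaled_area(reference_area, s):
    """Area of a contour scaled uniformly by s (assumed isotropic scaling)."""
    return reference_area * s ** 2

# Hypothetical segment: reference contour area of 4.0, scaling from 1.0 to 0.5
for t in (0.0, 0.5, 1.0):
    s = scaling_factor(1.0, 0.5, t)
    print(t, s, scaled_area(4.0, s))
```

Under this assumption, a scaling factor that varies linearly gives an area that varies quadratically along the segment.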
[Figure 3 labels: Epiglottis, Openings on the side of the lips, Teeth, Uvula, Tongue, Lips]
Figure 3: Correspondence between the colors of the contour and the anatomical parts to which the walls correspond.
The coordinates of the points of the contour can also be exported into a text file by selecting "Export contour in
text file" in the context menu of the right panel, obtained with a right click. Note that the exported contour has the
scaling with which it is displayed: if the scaling at the entrance is 0.5 and the entrance contour is displayed, the
coordinates of the exported contour will be those of the original contour multiplied by 0.5. The exported contour
can easily be plotted using other software in the same way as the segment picture.
2.3 Transverse modes
2.3.1 Computation
The transverse modes can be computed by clicking the button "Compute modes" of the left panel. This can be
useful if one is interested in analyzing the transverse modes without computing the acoustic field or the transfer
functions.
The computation of the transverse modes is parametrized by:
• The density of the mesh which is used to solve with 2D FEM the eigenvalue problem giving the transverse
modes and their associated cutoff frequencies. This is related to the average side length of the elements
through the relationship $\text{average side length} = \sqrt{\text{cross-sectional area}} / \text{mesh density}$.
Thus, the mesh density is an estimate of the number of elements per characteristic length.
• The maximal cutoff frequency. It is an upper limit to the cutoff frequency of the transverse modes included
in the simulations: for a given segment, only the modes having a cutoff frequency lower than this value are
kept. Thus, segments having a small cross-section have fewer transverse modes than those having a larger
one. This is done to increase the efficiency of the simulations.
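The two parameters above can be illustrated with a short Python sketch. It only reproduces the relationship between mesh density and element size and the selection of modes below the maximal cutoff frequency; the function names and the numerical values are assumptions made for the example.

```python
import math

def average_side_length(cross_sectional_area, mesh_density):
    """Average element side length for a given area and mesh density."""
    return math.sqrt(cross_sectional_area) / mesh_density

def keep_modes(cutoff_frequencies, max_cutoff_frequency):
    """Keep only the modes whose cutoff frequency is below the limit."""
    return [f for f in cutoff_frequencies if f < max_cutoff_frequency]

# Hypothetical cross-section of 4 cm^2 meshed with a density of 10
print(average_side_length(4.0, 10.0))

# Hypothetical cutoff frequencies (Hz) for a narrow and a wide segment
narrow = [0.0, 21000.0, 35000.0]
wide = [0.0, 6000.0, 9500.0, 14000.0, 21000.0]
print(len(keep_modes(narrow, 20000.0)), len(keep_modes(wide, 20000.0)))
```

With these made-up values, the narrow segment keeps a single mode while the wide one keeps four, which shows why the number of modes differs from one segment to another.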
The cutoff frequency is related to the sound speed, which itself is related to the temperature. Both the sound
speed and the temperature can be set in the "Physical constants" section of the "Simulation parameters" dialog.
Since both quantities are related, they cannot be modified independently: changing the temperature will change
the sound speed and conversely.
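The coupling between temperature and sound speed can be illustrated with the textbook ideal-gas relation c = sqrt(γ R T). This is an assumption: the exact formula and constants used by VocalTractLab3D are not stated in this guide, so the values below may differ slightly from those shown in the program.

```python
import math

GAMMA = 1.4             # adiabatic index of air (assumed)
R_SPECIFIC = 287.05     # specific gas constant of dry air in J/(kg K) (assumed)
KELVIN_OFFSET = 273.15

def sound_speed_cm_per_s(temperature_celsius):
    """Sound speed in cm/s from the temperature, ideal-gas assumption."""
    kelvin = temperature_celsius + KELVIN_OFFSET
    return 100.0 * math.sqrt(GAMMA * R_SPECIFIC * kelvin)

def temperature_celsius(sound_speed_cm_s):
    """Inverse relation: temperature in Celsius from the sound speed in cm/s."""
    c = sound_speed_cm_s / 100.0
    return c ** 2 / (GAMMA * R_SPECIFIC) - KELVIN_OFFSET

print(sound_speed_cm_per_s(20.0))
print(temperature_celsius(35000.0))
```

Because each quantity determines the other, a dialog based on such a relation only needs one of the two values as input.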
2.3.2 Visualization
The mesh used to compute the transverse modes can be visualized in the right panel by selecting "Mesh" at the
bottom. The transverse modes can be visualized by selecting "Modes" at the bottom. One can browse the different
modes using the arrows "<" and ">". The amplitude variation of the modes is displayed as a color scale, and their
cutoff frequency is given in the text information.
• "Constant wall admittance" includes a frequency independent wall admittance whose real and imaginary parts
can be set by the user.
The index of the segment in which the noise source is integrated can be set either in the "Transfer functions
options" section of the "Simulation parameters" dialog, or through the context menu of the central panel by selecting
"Define current segment as noise source location". The noise source segment is highlighted in blue (or green when
selected) in the central panel. When the segment has a non-zero length, the noise source is implemented at the end
of the segment which is closer to the mouth exit. The noise source implemented is uniform over the cross-sectional
surface, which is equivalent to exciting the vocal tract with a plane wave at the specified location. The transfer
function computed for the noise source is a pressure-pressure transfer function, unlike the glottal transfer
function, which is a velocity-pressure transfer function.
The upper frequency limit for the transfer function computation can be set in the "Transfer functions options"
section of the "Simulation parameters" dialog. The frequency step size can be selected in a list. The proposed
values correspond to divisions of the sampling frequency by powers of 2, which makes the synthesis that can be
done afterward faster.
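As a small illustration of frequency steps obtained by dividing a sampling frequency by powers of 2, consider the Python sketch below. The sampling frequency of 44100 Hz and the range of exponents are assumptions made for the example; the values actually proposed in the dialog are not listed in this guide.

```python
def frequency_steps(sampling_frequency, min_exponent, max_exponent):
    """Frequency steps fs / 2**n for n from min_exponent to max_exponent."""
    return [sampling_frequency / 2 ** n
            for n in range(min_exponent, max_exponent + 1)]

def number_of_frequencies(step, max_frequency):
    """Number of multiples of step between step and max_frequency (inclusive)."""
    return int(max_frequency // step)

# Hypothetical sampling frequency of 44100 Hz, exponents 9 to 12
for step in frequency_steps(44100.0, 9, 12):
    print(round(step, 3), number_of_frequencies(step, 10000.0))
```

With such steps, the computed frequencies presumably coincide with the bins of a power-of-2 length FFT, so that no interpolation is needed before synthesis.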
The coordinates of the reception point of the transfer functions can also be set in the "Transfer functions options"
section of the "Simulation parameters" dialog. One can choose either to use a single point whose coordinates can
be set directly, or to use several points whose coordinates are loaded from a .csv file. The origin of the coordinate
system in which the reception points are expressed is the center of the mouth exit. The ny unit vector
of this coordinate system is the normal to the centerline at the mouth exit. The reception points can be placed anywhere.
However, if a point is located in the half-space behind the mouth exit and not inside the vocal tract, the returned value
will be "nan".
When point coordinates are loaded from a .csv file, they must be given in 3 columns corresponding to the x,
y and z coordinates. This functionality can be useful to compute the directivity patterns of the radiated sound, or
the acoustic field at multiple frequencies.
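A file of reception points for a directivity measurement could be generated as in the following Python sketch. Several choices here are assumptions: the points are placed on an arc in the x-z plane with x taken as pointing away from the mouth exit, the columns are separated by semicolons as in the geometry files, and the radius, number of points and function names are arbitrary.

```python
import csv
import math

def arc_points(radius, n_points, max_angle_deg=80.0):
    """Points (x, y, z) on an arc in the x-z plane, centered on the origin."""
    if n_points < 2:
        raise ValueError("at least two points are required")
    points = []
    for i in range(n_points):
        angle_deg = -max_angle_deg + 2.0 * max_angle_deg * i / (n_points - 1)
        angle = math.radians(angle_deg)
        points.append((radius * math.cos(angle), 0.0, radius * math.sin(angle)))
    return points

def write_points_csv(path, points, delimiter=";"):
    """Write one point per line in 3 columns: x, y and z."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(points)

# Hypothetical arc of 17 points at a distance of 30 from the mouth exit
points = arc_points(30.0, 17)
```

Calling `write_points_csv("reception_points.csv", points)` would write the 17 points. The default maximal angle of 80 degrees keeps all points in front of the mouth exit, where a value is returned.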
2.4.2 Visualization
The transfer function points are visualized as "+" on the middle panel. It is possible to hide them by unchecking
"Show TF points" at the bottom of the panel. This can be useful if many points are used and their visualization
disturbs the visualization of the other elements. Note that if a point is located outside of the area of the sagittal
cut displayed, it will not be visible.
The transfer function and the input impedance computed are displayed in the bottom panel. The glottal transfer
function, the noise source transfer function and the input impedance are plotted in black, blue and green respectively.
It is possible to show or hide each of them by checking or unchecking "Glottal transfer function", "Noise transfer
function" or "Input impedance" on the right of the bottom panel.
In case several points are used, the transfer functions corresponding to the different reception points can be
visualized by clicking on the "<" and ">" buttons on the right of the bottom panel. The coordinates of the point
corresponding to the transfer function plotted appear above these buttons, and the corresponding point is displayed
as a red "+" in the middle panel. Note that the input impedance does not depend on a reception point location,
and hence it will be the same for each point.
The transfer functions and the input impedance can be exported using the context menu which is displayed by
a right click on the bottom panel. They are saved in a text file in which the first column gives the frequency, the
second and third the magnitude and phase of the first point, and the following columns the magnitude and phase
of the other points, if other points have been included. Such text files can easily be loaded in other software such
as Matlab to plot and analyse the data, for example with the following code:
    % Load the exported file into a matrix named transfer_function
    load transfer_function.txt
    figure
    % Magnitude in dB of the first reception point (column 2)
    subplot 211
    plot(transfer_function(:,1), 20*log10(transfer_function(:,2)))
    xlabel('f (Hz)')
    ylabel('Magnitude (dB)')
    % Phase of the first reception point (column 3)
    subplot 212
    plot(transfer_function(:,1), transfer_function(:,3))
    xlabel('f (Hz)')
    ylabel('Phase (rad)')
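When several reception points have been exported, the column layout described above (frequency, then one magnitude and phase pair per point) can be unpacked as in the following Python sketch. It assumes that the columns are separated by whitespace, commas or semicolons and that the file has no header line; the function name and the sample values are hypothetical.

```python
import math

def parse_transfer_functions(lines):
    """Return (frequencies, points) from the lines of an exported file.

    points[k] is a list of (magnitude_dB, phase) pairs, one per frequency,
    for the k-th reception point.
    """
    frequencies, points = [], []
    for line in lines:
        values = [float(v) for v in
                  line.replace(";", " ").replace(",", " ").split()]
        if not values:
            continue  # skip empty lines
        n_points = (len(values) - 1) // 2
        if not points:
            points = [[] for _ in range(n_points)]
        frequencies.append(values[0])
        for k in range(n_points):
            magnitude, phase = values[1 + 2 * k], values[2 + 2 * k]
            if magnitude > 0:
                magnitude_db = 20.0 * math.log10(magnitude)
            else:
                magnitude_db = float("-inf")
            points[k].append((magnitude_db, phase))
    return frequencies, points

# Hypothetical content with two reception points
sample = ["100 1.0 0.0 10.0 0.5", "200 0.1 -1.0 100.0 1.5"]
frequencies, points = parse_transfer_functions(sample)
print(frequencies, len(points))
```

For a real file, the lines can be obtained with `open("transfer_function.txt").read().splitlines()`.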
2.5 Acoustic field
2.5.1 Computation
The acoustic pressure field can be computed in the sagittal plane and the transverse planes by clicking the button
"Compute acoustic field". It corresponds to a sound source located at the glottis. The frequency at which it is
computed can be set by moving the vertical dashed line in the transfer function plot in the bottom panel. A precise
frequency can also be set in the section "Acoustic field options" of the "Simulation parameters" dialog.
In the sagittal plane, the acoustic field is computed in a rectangular area displayed as a gray rectangle in the
middle panel. By default, this rectangle is the bounding box of the geometry outline. The dimensions of this
rectangle can be modified by clicking "Define bounding box lower corner" or "Define bounding box upper corner"
in the context menu which is displayed by right clicking on the middle panel. In this case, the location of the
right click is assigned to the lower left corner or the upper right corner of the rectangular area, respectively. This
functionality is useful for looking at a specific area in more detail. Alternatively, the dimensions of this rectangular
area can be set manually in the "Acoustic field options" of the "Simulation parameters" dialog. This can be useful
if one wants to visualize the radiated field as well. In this case, the maximal value of x can be increased to extend
the area to the radiated field. The original dimensions of the acoustic field area can be restored by double clicking
on the middle panel, or by clicking "Reset bounding box" in the context menu of the middle panel.
The resolution of the grid of points used to compute the sagittal plane acoustic field can be set in the "Acoustic
field options". In the transverse plane the resolution of the field corresponds to the resolution of the image displayed
on the screen.
The computation of the radiated field takes a bit more time than the internal field, so it is possible to avoid
computing the radiated field by unchecking the option "Compute radiated field" in the "Acoustic field options".
2.5.2 Visualization
Once computed, the acoustic field is displayed as a logarithmic color scale in the middle and right panels. For
a better visualization, the segments and/or the transfer function points can be hidden by unchecking the options
"Show segments" and "Show TF points" in the middle panel. Alternatively, the acoustic field can also be hidden
by unchecking the option "Show field". This can be useful if the acoustic field disturbs the visualization of the
segments and/or transfer function points. In the transverse plane the acoustic field corresponds to the exit plane
of the segments.
The acoustic field can be exported in text files and easily loaded in other software such as Matlab for further
analysis or different visualization. This can be done by clicking "Export acoustic field as text file" in the context
menu shown by right clicking on the middle and right panels. The acoustic field can then be loaded and plotted
with Matlab with the following code:
    % Load the exported acoustic field into a matrix
    acoustic_field = load('acoustic_field.txt');
    figure
    % Display the field in dB
    imagesc(20*log10(acoustic_field));
    axis xy
    axis equal
the bottom panel. The glottal pulses are generated with a Liljencrants-Fant model [7] whose parameters can be
defined using the dialog shown by clicking on the button "LF glottal flow pulse" in the left panel.
The noise synthesis can be useful to synthesize fricative consonants. A synthetic noise signal corresponding to a
white noise filtered with a first order low-pass filter having a cutoff frequency of 5 kHz is convolved with the impulse
response of the noise source transfer function displayed in the bottom panel.
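The principle of this noise synthesis can be sketched in Python as follows. This is not the implementation used in VocalTractLab3D: the one-pole discretization of the first-order low-pass filter, the sampling rate of 44100 Hz and the short decaying impulse response are all assumptions made for the illustration.

```python
import math
import random

def low_pass_noise(n_samples, cutoff_hz, sampling_rate_hz, seed=0):
    """White Gaussian noise filtered by a one-pole (first order) low-pass."""
    rng = random.Random(seed)
    # Smoothing coefficient of the one-pole filter (assumed discretization)
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sampling_rate_hz)
    y, out = 0.0, []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        y += alpha * (x - y)
        out.append(y)
    return out

def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Hypothetical impulse response: a short decaying oscillation
impulse_response = [math.exp(-n / 20.0) * math.cos(0.6 * n) for n in range(100)]
noise = low_pass_noise(1000, 5000.0, 44100.0)
synthetic = convolve(noise, impulse_response)
print(len(synthetic))
```

In the program, the impulse response would come from the computed noise source transfer function rather than from a made-up formula, and a faster FFT-based convolution would normally be preferred for long signals.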
The synthetic sound generated corresponds to the point at which the displayed transfer functions have been
computed. Thus, it is possible to listen to the synthetic vowel generated at various locations. This can be useful
to study directivity effects. However, note that phenomena important for directivity, such as the head and torso
diffraction, are not simulated. Thus, the directivity effects which can be studied are only due to the mouth opening
dimension and the influence of the vocal tract on the acoustic field at the mouth exit.
The synthesized sounds can be visualized and analyzed in the "Signal" page. It is also possible to export them
as .wav files by clicking "Save WAV" or "Save WAV as TXT" in the "File" menu of the main window.
3 Log file
The parameters used for each simulation and information regarding the progress of the simulation process are
given in a log file. An example of such a file is given below:
    Wed Jul 20 15:06:09 2022

    PHYSICAL PARAMETERS:
    Temperature 31.4266 C
    Volumic mass: 0.00115771 g/cm^3
    Sound speed: 35000 cm/s

    BOUNDARY CONDITIONS:
    Percentage losses 100 %
    Visco-thermal losses included
    viscous boundary specific admittance (2.02984e-05,2.02984e-05) g.cm^-2.s^-1
    thermal boundary specific admittance (4.84832e-05,4.84832e-05) g.cm^-2.s^-1
    Wall losse included
    glottis boundary condition: IFINITE_WAVGUIDE
    mouth boundary condition: ZERO_PRESSURE
This file, named log.txt, is generated and modified automatically in the working directory of VocalTractLab3D.
It can be useful to check which parameters have been used for a specific simulation, or to follow the simulation
process in more detail. During a simulation, one can follow its updates in real time with appropriate software. This
can be done with Notepad++ by selecting the option "Monitoring" in "View". In Linux, this can be done in the
command line with
    tail -f log.txt
A copy of the log file can be saved to keep track of the parameters used for a specific simulation.
4 Acknowledgements
The development of VocalTractLab3D was supported by the German Research Foundation (DFG) under Grant BI
1639/7-1.
I am very grateful for the support of Peter Birkholz for the development of this special version of VocalTractLab.
Without his work on articulatory synthesis, which led to the development of VocalTractLab, this software could
not exist. Throughout this project, he was very helpful and supportive through insightful discussions which helped
in designing the software and solving problems.
We acknowledge the contribution of Jingyan Geng for the creation of the geometry files from MRI data. We
thank all the members of the Chair of Speech Technologies and Cognitive Systems of the TU-Dresden and Mario
Fleischer for helping with testing VTL3D and spotting bugs.
References
[1] P Birkholz. “Modeling consonant-vowel coarticulation for articulatory speech synthesis”. In: PloS one 8.4 (2013),
e60603. doi: 10.1371/journal.pone.0060603.
[2] P Birkholz et al. “Printable 3D vocal tract shapes from MRI data and their acoustic and aerodynamic
properties”. In: Scientific data 7.1 (2020), pp. 1–16.
[3] R Blandin et al. “Effects of higher order propagation modes in vocal tract like geometries”. In: J. Acoust. Soc.
Am. 137.2 (2015), pp. 832–843. doi: 10.1121/1.4906166.
[4] R Blandin et al. “Efficient 3D acoustic simulation of the vocal tract by combining the multimodal method and
finite elements”. In: IEEE Access (2022), pp. 69922–69938. doi: 10.1109/ACCESS.2022.3187424.
[5] R Blandin et al. “Multimodal radiation impedance of a waveguide with arbitrary cross-sectional shape
terminated in an infinite baffle”. In: J. Acoust. Soc. Am. 145.4 (2019), pp. 2561–2564. doi: 10.1121/1.5099262.
[6] AM Bruneau et al. “Boundary layer attenuation of higher order modes in waveguides”. In: J. Sound Vib. 119.1
(1987), pp. 15–27. doi: 10.1016/0022-460X(87)90186-6.
[7] G Fant, J Liljencrants, QG Lin, et al. “A four-parameter model of glottal flow”. In: STL-QPSR 4.1985 (1985),
pp. 1–13.
[8] V. Pagneux. “Multimodal admittance method in waveguides and singularity behavior at high frequencies”.
In: J. Comput. Appl. Math. 234.6 (2010). Eighth International Conference on Mathematical and Numerical
Aspects of Waves (Waves 2007), pp. 1834–1841. issn: 0377-0427. doi: 10.1016/j.cam.2009.08.034.
[9] M Sondhi and J Schroeter. “A hybrid time-frequency domain articulatory speech synthesizer”. In: IEEE Trans.
Acoust. Speech Signal Process. 35.7 (1987), pp. 955–967. doi: 10.1109/TASSP.1987.1165240.