Game Audio Programming 4: Principles and Practices
With such a wide variety of topics, game audio programmers of all levels will find
something for them in this book. The techniques presented in this book have all
been used to ship games, including some large AAA titles, so they are all practical
and many will find their way into your audio engines. There are chapters about
timed ADSRs, data-driven music systems, background sounds, and more.
This book collects a wealth of advanced knowledge and wisdom about game audio
programming. Whether you are new to game audio programming or a seasoned veteran,
or even if you’ve just been assigned the task and are trying to figure out what it’s all
about, this book is for you!
Guy Somberg has been programming audio engines for his entire career. From
humble beginnings writing a low-level audio mixer for slot machines, he quickly
transitioned to writing game audio engines for all manner of games. He has writ-
ten audio engines that shipped AAA games like Hellgate: London, Bioshock 2, The
Sims 4, and Torchlight 3, as well as smaller titles like Minion Master, Tales from the
Borderlands, and Game of Thrones. Guy has also given several talks at the Game
Developers Conference, the Audio Developer Conference, and CppCon. When he's
not programming or writing game audio programming books, he can be found at
home reading, playing video games, and playing the flute.
Game Audio
Programming 4
Principles and Practices
Edited by
Guy Somberg
Designed cover image: Shutterstock
First edition published 2024
by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton, FL 33431
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2024 selection and editorial matter, Guy Somberg; individual chapters, the contributors
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. For works that are not available on CCC please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
DOI: 10.1201/9781003330936
Acknowledgments
Every book in this series accumulates more and more people to whom I
am indebted for its creation. This page in each volume is my opportunity
to name them all again and again, which I do with relish.*
My contributors are the lifeblood of these books, and this volume is no
exception. I always learn so much from all of you – thank you all!
Throughout my career, folks have taken me under their wing, taken
chances on me, and allowed me to grow and flourish. Tyler Thompson,
David Brevik, and Brian Fitzgerald I have mentioned in previous volumes,
and I am still indebted to you all. New to this volume is Justin Miller, who
has been a great boss.
It’s entirely possible that Thomas Buckeyne will never see these
books and will never know that he is acknowledged in every single
one, but without him I would never have discovered my joy of audio
programming.
David Steinwedel hasn’t been a sound designer for a while, but he still
gets acknowledgment here for working with me on my first big game,
Hellgate: London.
I have learned so much from the sound designers I’ve worked
closely with over the years: David Steinwedel, Jordan Stock, Andy
Martin, Pam Aranoff, Michael Kamper, Michael Csurics, and Erika
Escamez.
Another new name in this volume is Fiach O’Donnell, the first audio
programmer I’ve ever had the opportunity to work alongside. Thanks for
joining me, Fiach!
Thanks to Rick Adams from CRC Press, who got me started writing
the first volume of this series, and thanks to Will Bateman, Simran Kaur,
and the rest of the team at CRC Press, who are working on this volume.
NOTE
* And mustard and catsup! (Finishing this book has made me loopy, I think.)
Foreword
INTRODUCTION
Welcome to the fourth volume of Game Audio Programming: Principles and
Practices! Every one of these volumes opens with this phrase: a greeting to all
who have opened this book, whether by turning its physical pages or by selecting this volume in
their e-reader application. These words are in some ways ceremonial, but I also
mean them with all of the warmth and enthusiasm that I can muster in black
and white text. Game audio programming is a passion that I and the contribu-
tors to this book all share, and we are all excited to share our knowledge with
you, the reader, in its pages. As always, everybody who contributed to this
work has done a fantastic job of laying out their knowledge and skills.
When the first volume of this series was published, little to none of the
combined wisdom of the game audio programming community was doc-
umented or written down anywhere. It was with great joy that I found myself
having to coordinate with several of the contributors to this volume to make sure
that their chapters didn't overlap with similar topics from
previous volumes. The fact that we have this much collective wisdom writ-
ten down means that we can all stand on our colleagues’ giant shoulders
and see further for future games.
This Book
As with the previous volumes, there are two broad categories of chapters in
this book. The first are high-level overviews of topics, the thinking behind
them, and some techniques to approach certain aspects of them. The sec-
ond are deep dives into code and specific approaches to solving problems.
Here are brief summaries of all of the chapters in this book:
Parting Thoughts
At a recent Game Developers Conference, I had several people come up to
me and tell me how valuable this book series has been to them and their
careers. These stories make me happy and really inspire me to continue
doing the work and putting in the effort to make them. I always learn so
much and get so much inspiration from my fellow game audio program-
mers; hopefully, you do as well!
Contributors
Robert Bantin has been writing audio code for an ever-expanding length
of time. While at school, they were an active member of the Amiga demo
scene. At Salford University, they studied acoustics and brought their
coding experience to their studies in the form of DSP and audio-focused
applications. Upon graduating, they were recruited by Philips ASA
Labs in Eindhoven in order to join the MPEG technology programme.
Robert Bantin has since worked on several AAA games for Activision,
Codemasters, and Ubisoft. When they’re not programming, they can be
found at home cooking, attempting to shred on guitar, and playing video
games when the rest of their family are asleep. And the rabbit thinks they
are one of the cats.
Michael Filion has been developing video games for his career of more
than 15 years with Ubisoft Québec, with the majority in the world of
audio. When explaining his work and passion to friends and family, he
often oversimplifies by stating that he is “responsible for ensuring the
bleeps and bloops work in the game.” He has had the opportunity to
work with many talented people from around the world on games such
as Assassin’s Creed, Tom Clancy’s The Division, and Immortals: Fenyx
Rising. In between delivering amazing titles, he is known to friends
and family for his multitude of different projects, from traveling the
world with his daughter to earning his pilot’s license to renovating his
“haunted” house.
Jorge Garcia started his career in audio in 2001. His passion and curios-
ity about the potential of audio and music technology to improve people’s
lives have led him to study sound and music computing and work as an
audio engineer, software engineer, and programmer for professional audio
brands and game studios on franchises such as FIFA, Skate, and Guitar
Hero. Jorge is also experienced in all parts of the interactive audio technol-
ogy stack across various platforms (desktop, mobile, and console) using
proprietary technologies and middleware. He also enjoys working with
creatives and thinking on ways of improving their workflows and making
their process more efficient and enjoyable. In his spare time, and on top of
learning about disparate topics, he likes hiking, reading books, spending
time with his family and friends, traveling, and relaxing.
Colin Walder has been developing audio technology for games since
2006, with a focus on AAA games. His credits include GTA V, Red Dead
Redemption 2, The Witcher 3: Wild Hunt, and Cyberpunk 2077, where he
was involved in a wide range of audio topics from acoustics and ambi-
sonics to performance and streaming. He currently works as Engineering
Director, Management and Audio at CD Projekt RED, where in addition
to audio topics, he provides direction for management across the technol-
ogy department.
I
Game Integration
Chapter 1
Audio Object
Management
Techniques
Christian Tronhjem
before it can be used and its position updated. We can think of this as our
general audio object: some functions and data that we can use throughout
our game to emit audio. It might not be clear to other programmers that
the object needs to be registered in our middleware, so having a Play()
method in the class for your audio object or a method that takes the audio
object as an argument would signal that this is what is needed for correct
behavior. Using these affordances enforces the use of your audio object structure
rather than just a normal game entity and gives you more control.
The details of every implementation will be different, but your basic
audio object would include ways to at least register and send a play event
to your audio object with your middleware and have an update method
to take care of any behavior that would have to run while the audio is
playing.
class AudioObject
{
private:
    // Could also just be the position vector, but you would
    // most likely want to know the rotation as well
    Transform m_Position;

    void Register();
    void DeRegister();

public:
    AudioObject() {}

    int Play(int eventId);
    void Update();
    void SetPosition(const Transform& transform);
};
on your game engine, but this is what this chapter refers to when talking
about updates at slower rates.
1.2 POOLS
We need ways of supplying our game with audio objects. We want to
request these when needed, and not just have game audio emitters assigned
to all objects in the game, as this could quickly add up to a lot of redundant
objects. Having to allocate and deallocate these in memory every time we
want to play audio can be quite costly and quite unnecessary.
Since we most likely won’t have 3,000 audio objects active at the same
time, we can reuse some pre-allocated objects and shuffle them around when
needed. This is where a pool of objects can come in handy. By using the same
objects allocated once, we save both memory and CPU cycles. We can simply
request a new emitter from the pool, set the position of the emitter at the
place we want to play audio from, and return it to the pool when we are done.
We need to select a quantity to start with – let’s go with 400 objects for
this example. When we want to emit audio, we request a new object from
the pool, set the position, and use its ID for playing audio on the object.
class PooledAudioEmitter
{
public:
    AudioObject* Object = nullptr;
    PooledAudioEmitter* Next = nullptr;
};

class AudioEmitterPool
{
public:
    PooledAudioEmitter* GetFreeObject()
    {
        assert(m_First != nullptr);
        PooledAudioEmitter* emitter = m_First;
        m_First = m_First->Next;
        return emitter;
    }

    void ReturnObject(PooledAudioEmitter* emitter)
    {
        emitter->Next = m_First;
        m_First = emitter;
    }

private:
    // The constructor (omitted here) links every m_Pool entry into the
    // free list headed by m_First.
    PooledAudioEmitter m_Pool[MAX_POOL_SIZE];
    PooledAudioEmitter* m_First = nullptr;
};
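As a quick illustration of that flow (the pool instance, transform, and event ID here are hypothetical, and the pool class name follows the sketch above), usage might look like this:

// Request an emitter, position it, and play an event on its audio object.
PooledAudioEmitter* emitter = pool.GetFreeObject();
emitter->Object->SetPosition(explosionTransform);
int playingId = emitter->Object->Play(explosionEventId);

// ... later, when this emitter is no longer needed:
pool.ReturnObject(emitter);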
It doesn’t matter how many objects are in the pool if they are never
put back in the pool.1 Every game will have different requirements that
will inform when to return the objects to the pool. One option could be
to return objects to the pool when the audio finishes by registering our
ReturnObject() function to a callback when the audio is done playing.
However, if audio is emitted from a behavior frequently, such as a foot-
step, it would be wasteful to request and send back an audio object for
every audio event. On top of that, there could be context data that has to
be set, such as switches or other game parameters, that would need to be
reset every time, which would waste processor cycles. In these cases, it
might be worth tying the emitter’s lifetime to the lifetime of the object.
Alternatively, objects that exhibit this behavior could have a permanent
reference to an emitter assigned, rather than request one from the pool.
Another option is to return an audio object after it has not been used by
its current game object for a certain amount of time. This solution works,
but it comes with the overhead of keeping track of time and when to return
the objects. Also, because the audio objects now live for longer on their
game objects, the total count of audio objects will need to go up.
One downside to a pool is that the data associated with the specific
object must be set every time we want to play audio from that object again,
and likewise potentially cleaned up upon returning it, if the objects are not de-
registered from the middleware. Features such as switches in Wwise would
have to be set every time an emitter is requested from the pool and could
add overhead to your “play audio” code. You will need to decide if the cost
of memory versus CPU cycles is worthwhile to keep all your emitters in
the pool registered with the middleware.
Now that we have allocation and objects out of the way, let us have a
look at managing the position updating.
listener than if they were 2 meters in front of the listener moving 2 meters
to the left. With that information, we can save more updates by updating at
a lower frequency – say every 3rd frame or even every 10th frame.
It could be set up in the following way. Implement a routine that runs if
the audio object is active which performs a distance calculation between
the object and the listener. You could then categorize an object’s distance
into buckets of thresholds defined as close, medium, and far, and change
the update rate based on the defined thresholds for these categories. For
example, lower than 20 meters is close, and the object should update every
frame. Between 20 and 40 meters, it could update the position every 5th
frame or 3 times a second, and so on. These thresholds could also be defined
relative to each object's maximum attenuation distance so that the behavior is
individual to each audio event.
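As a sketch of the idea (the function and thresholds below are illustrative only, using the example numbers above rather than any engine API), the bucketing might look like this:

int GetUpdateIntervalInFrames(float distanceToListener)
{
    if (distanceToListener < 20.0f)
        return 1;   // close: update the position every frame
    if (distanceToListener < 40.0f)
        return 5;   // medium: update every 5th frame
    return 10;      // far: update every 10th frame
}

An active object's update routine can then skip the middleware position update on any frame where the frame number is not a multiple of the interval.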
It is possible that handling the buckets and calculating distance will
end up adding more overhead than they end up saving. As with any per-
formance improvement, you should profile to determine whether or not it
is worthwhile.
Another option to consider is to completely cull the event if the object
is outside of the max distance for the attenuation. This can create issues with
looping events: if the object later comes into range, the loop was never
started, so there is nothing to hear. For short-lived
audio events, this can be okay, though, as newly triggered sounds will be
audible if they are in range. Make sure to communicate this feature to the
sound designers in order to avoid confusion – culled sounds will not show
up in the middleware profiling tools, which can be surprising if sound
designers are not expecting it.
Different rates of updates can act as an effective way to save on API
calls. These are suggestions on actions you can take to try and optimize
your audio code, especially if the budget is tight. It comes at the cost of
having to maintain either more audio object types (and for audio design-
ers to know and understand them) or a more complex audio source that
can handle more of these cases. However, implementing this culling can
improve your overall performance and give you more milliseconds for
other actions.
1.4 GROUPING
Let us build on the previous example of the NPCs in the marketplace. For
certain configurations in our game, we know that a group of behaviors will
always be close together and never separate. NPCs are a good example of
this. Considering the distance calculation to the listener, an approximation
throughout the game. We are also saving update cycles for all these objects
as we just learned we do not need to update positions all at once.
However, we are now facing another problem. There might be over a
thousand torches in the dungeon. We want to be able to decide for our-
selves when the static source should play and stop and not rely on visual
rendering for it to play audio.2
Luckily, we do not need to set the position more than once, so one
approach could be to just have the sources start playing at the beginning
and continue throughout their lifetime. However, when started just in the
beginning, we end up with sources that play audio that is not audible to the
player. Middleware saves CPU by not processing these audio files usually
by making them “virtual,” but we can optimize this further since there
is no good reason to have thousands of virtual voices playing. To solve
this, we can calculate the distance to each object and determine if they are
in range. This could mean a lot of redundant checking, as some objects
are far away and unlikely to be in range any time soon. Even with slower
update rates, evaluating hundreds or even thousands of objects becomes unneces-
sarily heavy. So, in short, we need a way to find the objects that are close
by without having to check all the objects at our disposal.
This is where dividing the space up can help us, also called spatial
partitioning.
1.6.1 Grid
Since we know the sources will never move, we can develop a method for
knowing what is near to us and what is far away. If we divide the space
into chunks, we know that the chunk next to the one the player is cur-
rently in is more important to check than a chunk closer to the edge of the
map. So, let us consider dividing our game space up into a grid that has a
predefined number of cells, where each cell has a fixed size. Each cell has a
list of objects that are placed inside that cell. For now, we will assume that
the audio objects all have the same size, meaning we will not take different
attenuations into account – we will look at this a little later.
With this approach, we now have all sources belonging to some cell
in the grid, as well as the listener being in one of these cells. This way
we can easily query the cells immediately adjacent to the listener (the
eight surrounding cells), to get a list of the objects we should be check-
ing now for starting and stopping the audio. This leaves us with far fewer
objects on which to perform distance calculations. One good approach is
to remove the audio objects from the grid when we query the grid and
add them to an updating list, then start the audio playing on the object.
When a source gets too far from the listener and is no longer
in an adjacent cell, we remove it again from updating, stop the audio,
and insert it back into the grid. Below is sample code of how a simple grid
could look.
class Grid
{
public:
    Grid() {}

    void Add(AudioObject* audioObject);
    void Remove(AudioObject* audioObject);
    void Query(const Vector2& position,
               std::vector<AudioObject*>& results);

private:
    std::array<std::array<std::vector<AudioObject*>,
        NUMBER_OF_CELLS>, NUMBER_OF_CELLS> grid;
};
First, we have methods for adding and removing objects, plus a method for
querying the cells around a position. The cell size (SIZE_OF_CELLS) could
either be part of the class or, as here, defined elsewhere. In this example, the
AudioObject is an object that also holds a 2D vector position.
void Grid::Add(AudioObject* audioObject)
{
    int xIndex =
        static_cast<int>(audioObject->position.x / SIZE_OF_CELLS);
    int yIndex =
        static_cast<int>(audioObject->position.y / SIZE_OF_CELLS);
    grid[xIndex][yIndex].push_back(audioObject);
}
When adding, we can find the x and y indices by dividing by the size
of the cell and casting to an integer. This example code does not check if
the index is within the size of the array and just uses a std::vector as the
backing store. Error checking and choice of data structure will be up to
your game mechanics.
For removing from a std::vector, we will use std::erase(), which
requires C++20. If you are using an older version of C++, the code to use
the erase() function with the “erase-remove” idiom is commented out
just below.
void Grid::Remove(AudioObject* audioObject)
{
    int xIndex =
        static_cast<int>(audioObject->position.x / SIZE_OF_CELLS);
    int yIndex =
        static_cast<int>(audioObject->position.y / SIZE_OF_CELLS);
    std::erase(grid[xIndex][yIndex], audioObject);

    // If using an earlier version of C++
    // auto& gridEntry = grid[xIndex][yIndex];
    // gridEntry.erase(
    //     std::remove(gridEntry.begin(), gridEntry.end(), audioObject),
    //     gridEntry.end());
}
When we want to query from a position and get the adjacent grids, we
pass in a list to put the objects into, as well as a position from where we
want to query. We can loop through a 3×3 grid where the cell of the player
would be in the middle and get all the objects that were added to those
cells. It is up to you what type of container to pass in to get the objects
if you want to remove them from the grid if they are playing. When the
AudioObject starts playing, you will need to check at some sort of interval
if this source is still within range to stop it again.
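A minimal sketch of that query (assuming a Vector2 position type, and leaving it up to you whether to also remove the returned objects from the grid) might look like this:

void Grid::Query(const Vector2& position,
                 std::vector<AudioObject*>& results)
{
    int xCenter = static_cast<int>(position.x / SIZE_OF_CELLS);
    int yCenter = static_cast<int>(position.y / SIZE_OF_CELLS);
    for (int x = xCenter - 1; x <= xCenter + 1; ++x)
    {
        for (int y = yCenter - 1; y <= yCenter + 1; ++y)
        {
            // Skip cells that fall outside the grid.
            if (x < 0 || y < 0 ||
                x >= NUMBER_OF_CELLS || y >= NUMBER_OF_CELLS)
                continue;
            auto& cell = grid[x][y];
            results.insert(results.end(), cell.begin(), cell.end());
        }
    }
}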
This should give you an idea of how a basic grid could be implemented.
Here, we just allocate everything up front, which could end up with a lot
of unnecessary memory spent on empty cells. Looking into implementing
a sparse grid could help you save memory by only allocating memory for
cells that contain an object. However, search times can become worse over
time as we add more objects to it.
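As an illustration of the idea (a sketch only, with an assumed hash over integer cell coordinates), a sparse grid could store only the occupied cells in a hash map:

struct CellKey
{
    int x, y;
    bool operator==(const CellKey& other) const
    {
        return x == other.x && y == other.y;
    }
};

struct CellKeyHash
{
    size_t operator()(const CellKey& key) const
    {
        return std::hash<int>()(key.x) ^ (std::hash<int>()(key.y) << 1);
    }
};

class SparseGrid
{
public:
    void Add(AudioObject* audioObject)
    {
        // Only cells that contain at least one object allocate storage.
        m_Cells[KeyFor(audioObject)].push_back(audioObject);
    }

private:
    static CellKey KeyFor(const AudioObject* audioObject)
    {
        return {
            static_cast<int>(audioObject->position.x / SIZE_OF_CELLS),
            static_cast<int>(audioObject->position.y / SIZE_OF_CELLS) };
    }

    std::unordered_map<CellKey, std::vector<AudioObject*>, CellKeyHash>
        m_Cells;
};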
Another issue is that, for now, we have not considered that audio sources
are rarely just a point in space. They are of varied sizes as we have differ-
ent attenuations and, therefore, also take up space around the object that
determines when we want to activate it. In that sense, the grid is a crude
solution. To handle this property, we can add a way to add an AudioObject
as a box that also checks intersections with adjacent cells and adds itself to
those as well. That way, we are able to have a source spanning over multiple
cells, but the cost is that we also have to manage multiple cells for querying
and removing as well in the implementation. So long as the attenuation
range is around the size or not bigger than two cells, we would not run into
many problems as we always query the adjacent cells to the one the player
is in. Of course, these queries would span over multiple cells, so it would
be worth tweaking the cell size and the number of cells.
1.6.2 Quadtree
Another approach to handling objects, and potentially better handling of
varied sizes of objects, is dividing the space using a tree structure called a
quadtree for 2D or an octree for 3D. As in the example before, we will go
over this in 2D, but it can easily be expanded into 3D. We will therefore
only be looking at the quadtree here. The octree is based on the same prin-
ciples, except that it divides into eight octants instead of four quadrants.
The idea of a quadtree is to subdivide spaces into quadrants to find the
best possible fit for an object. Instead of dividing our space into equal size
grids as looked at previously, we have a more granular defined grid of cells
that fit around the objects that are inserted. The algorithm is quite simple.
We define a cell size that is our whole game world first. This is our root.
Each tree node, including the root, will have four equally sized children,
each a quarter of the size of the parent.
When we want to insert an object, we look through the children to
see which one can fully contain the object. When we find one, we go down one
level and check again whether the object fits entirely inside the bounds of one
of that child's children; if not, we add it to the level we are at. The process
of inserting a square into
a quadtree is illustrated in Figure 1.1. In the example, we check if the
object is contained within one of the children, and we can see that it is
contained in the northwest quadrant (Figure 1.1a), so we can go one level
deeper. We subdivide the northwest quadrant into four (Figure 1.1b) and
check again if the object can fit fully inside one of the quadrants. We
can still fit it, this time in the southeast quadrant. When we go one level
deeper, however, it no longer fits fully inside the quadrant (Figure 1.1c).
The object overlaps two of the quadrants, and therefore we consider the
item to have the best fit two levels down in the southeast quadrant and
we can add it to the list.
FIGURE 1.1 (a) Quadtree insertion, first step. The object fits into the northwest
quadrant. (b) Quadtree insertion, second step. The northwest quadrant is subdi-
vided, and the object fits into the southeast quadrant. (c) Quadtree insertion, third
step. The southeast quadrant of the northwest quadrant is subdivided. The object
overlaps with two of the subdivisions, so the algorithm stops.
This setup gives the benefit when querying our tree that we can check
for overlaps, rather than adjacent cells like we had with the grid. For que-
rying, we check which quadrants overlap with our input box, and poten-
tial items that intersect with the query field get returned. In the example
(Figure 1.2), we can see that we check all the first big children at level one.
Since they have no children in the northeast, southeast, and southwest
quadrants, we do not continue into those quadrants. We check each of the
four children in the northwest and find that we intersect with the child in
FIGURE 1.2 Quadtree query. The light gray box is passed through the quadtree
to find that it intersects the previously inserted object.
the southeast. Here we have an item, and we can see that it also intersects,
so we return it.
With this data structure, we have a much more granular approach to que-
rying, and the varying size of objects is handled much more dynamically
than in the grid example earlier. Imagine we had objects populated in the
other quadrants next to our object here. We would only get the ones that
intersect with our box, and it is closer to the objects we are interested in.
Below is some example code to give you an idea of how this could be
implemented.
constexpr int NUM_CHILDREN = 4;
constexpr int MAX_DEPTH = 8; // Maximum subdivision depth (an example value).

class QuadTree
{
public:
    QuadTree(const Bounds& nodeBounds, int nDepth = 0);

    void Insert(const QuadTreeNode& audioObject);
    void GetObjects(const Bounds& searchBounds,
                    std::vector<QuadTreeNode>& listItems);

private:
    void AddItemAndChildren(std::vector<QuadTreeNode>& listItems);

    int m_Depth = 0;
    Bounds m_Bounds;
    std::array<Bounds, NUM_CHILDREN> m_ChildBounds;
    std::array<std::unique_ptr<QuadTree>, NUM_CHILDREN> m_ChildTreeNodes;
    std::vector<QuadTreeNode> m_Items;
};
QuadTree::QuadTree(const Bounds& nodeBounds, int nDepth)
    : m_Depth(nDepth), m_Bounds(nodeBounds)
{
    // Assumes Bounds stores its top-left corner and size as 2D vectors.
    Vector2 halfSize = m_Bounds.size * 0.5f;
    Vector2 topLeft = m_Bounds.position;
    Vector2 topRight = topLeft + Vector2(halfSize.x, 0.0f);
    Vector2 bottomLeft = topLeft + Vector2(0.0f, halfSize.y);
    Vector2 bottomRight = topLeft + halfSize;

    m_ChildBounds =
    {
        Bounds(topLeft, halfSize),
        Bounds(topRight, halfSize),
        Bounds(bottomLeft, halfSize),
        Bounds(bottomRight, halfSize)
    };
}
We subdivide the root bounds and set the positions of the new children,
so each child is now a quadrant. If you are implementing an octree, this is
the main point where you would subdivide into eight cubes instead of four
quadrants. The rest of the functions would be largely the same, apart from
looping over more children.
To insert in the tree, we loop through all children and check whether the
new object can fit inside. Many game engines provide this overlap function-
ality built-in, but if you are implementing it yourself, you will need to check
if the x and y coordinates of the object to insert are within the coordinates
of the box. If the object is fully contained, we also make sure that the level
we reached is not bigger than the max defined depth. We then call Insert()
recursively to get to the best fit and return.
void QuadTree::Insert(const QuadTreeNode& audioObject)
{
    for (int i = 0; i < NUM_CHILDREN; i++)
    {
        if (m_ChildBounds[i].contains(audioObject.Bounds) &&
            m_Depth + 1 < MAX_DEPTH)
        {
            if (!m_ChildTreeNodes[i])
            {
                m_ChildTreeNodes[i] = std::make_unique<QuadTree>(
                    m_ChildBounds[i], m_Depth + 1);
            }
            m_ChildTreeNodes[i]->Insert(audioObject);
            return;
        }
    }

    m_Items.push_back(audioObject);
}
To search the tree for objects, we pass in a Bounds for the area to
search as well as a list to insert items into, here just std::vector as an
example.
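A minimal sketch of that search, assuming Bounds offers an overlaps() test in addition to the contains() used by Insert(), might look like the following:

void QuadTree::GetObjects(const Bounds& searchBounds,
                          std::vector<QuadTreeNode>& listItems)
{
    // Items stored at this level might intersect the search area.
    for (const QuadTreeNode& item : m_Items)
    {
        if (searchBounds.overlaps(item.Bounds))
            listItems.push_back(item);
    }

    for (int i = 0; i < NUM_CHILDREN; i++)
    {
        if (!m_ChildTreeNodes[i])
            continue;

        if (searchBounds.contains(m_ChildBounds[i]))
        {
            // The whole child fits inside the search area: take everything.
            m_ChildTreeNodes[i]->AddItemAndChildren(listItems);
        }
        else if (searchBounds.overlaps(m_ChildBounds[i]))
        {
            // Partial overlap: recurse and test that child's items.
            m_ChildTreeNodes[i]->GetObjects(searchBounds, listItems);
        }
    }
}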
FIGURE 1.3 Left shows the radius defined as an extent. Right shows twice the
radius as width and height.
aid you in managing a lot of objects. You will find that this is a deep topic
and there is much room for improvement, and this is just the beginning of
techniques to help you solve these problems.
NOTES
1 “A cache with a bad invalidation policy is another name for a memory
leak.” This statement is from Raymond Chen, who is quoting Rico Mariani.
https://devblogs.microsoft.com/oldnewthing/20060502-07/?p=31333 – Ed.
2 One might also argue that all these torches or even just the sources should
not be loaded in all at the same time, but that is a discussion for another
time. For the sake of the example, let us assume that they are all loaded at
once.
Chapter 2
State-Based
Dynamic Mixing
Colin Walder
Dynamic mixing is one of the key topics that separates game audio from
audio in other industries. Audio mixing techniques are well known from
a long tradition in linear media: how to change the sound volume, pro-
cessing, or even selection of specific sounds to support the narrative. In
interactive media, these techniques are made more challenging by the
nonlinear nature of the medium. Typically, we will have a base mix for the
game, sometimes called the static mix, that will shape the overall audio
aesthetic, and then elements of dynamic mixing that will adjust or super-
impose upon that mix as the situation changes in the game. Even in a game
with a linear narrative structure, the exact timing of events will be flexible
and can change based on the decisions of the player. In some games, the
narrative itself might change dramatically based on those decisions and
actions. When we know the course of events in the game ahead of time,
changes can be described using existing systems such as scripting, visual
scripting, quest graphs, etc. If the game is nonlinear or requires narrative
changes tied to moment-to-moment gameplay, then these techniques can
be impossible or impractical to implement. Not to mention that the people
who are best positioned to be making the creative decisions often don’t
have technical skills, experience, or perhaps inclination toward scripting
and similar implementation methods.
We can’t ship a sound designer or mixing engineer with each copy of the
game to stand behind the player and make creative decisions as the player
makes their way through the game, but we still desire and aspire to be able
to creatively use sound in the same ways that are available to our linear
peers. Indeed, we could argue that in games there is potential for even more
creative impact from sound! Leveraging such techniques from linear media
to evoke emotions in a player or direct their attention and focus could affect
their decisions and actions, which in turn results in changes in the action
and narrative in the game, bringing about a positive cycle.
In order to be able to apply such techniques in an interactive context, we
can build a system that is responsible for directing the dynamic audio mix.
Such a system (sometimes called a Director System) needs to have a broad
overview of - and ability to - affect the game audio, as well as knowledge
of the context of the game It then needs a way to bridge that context and
capability with decisions about what should be done with the audio in a
specific situation. There are different approaches that could be taken with
this: a common and effective one is to use a state-based decision machine.
In the case of dynamic mixing, each state is a mixing state representing the
creative choice (i.e., effect) or choices that are being made, and the transi-
tions between those states represent the act of making that choice. While
this chapter will focus mainly on the state-based system, we will also touch
on other approaches and see in our examples that the Director System
can be viewed also as a collection of interconnected systems. Conceptually
(and to some extent practically), we can approach our Director System(s)
by dividing it into three concerns: selection, decision, and action.
For each of the three elements of the dynamic mixing system, a key
factor will be the user experience. Such a system can easily become highly
complex, and it doesn’t matter how sophisticated the technology is; if it’s
difficult to use, it won’t be used! At each stage of building the system, we
need to pay special attention to making it convenient to use. It is also worth
keeping in mind that while we consider the elements separately here, in
practice the boundaries may not be so clear-cut. Some elements will be
better to implement separately in order to provide flexibility of combina-
tions, and some will make sense to implement as a combined solution.
There can be a temptation to try to adhere to ideals that appear elegant in
theory, but this should always be tempered by pragmatism, considering
the specific environment and context.
The exact implementation and even specific features will vary a
lot depending on the engine and game that you are working in. I will
give examples of features we implemented on Cyberpunk 2077 using
REDengine 4 with Wwise as an audio middleware. Some things may fit
your own context, some may not – but hopefully they will still act as an
inspiration for finding and designing your own features.
2.1 SELECTION
For selection, we consider a variety of Action Types, where each Action
Type describes a selection condition that we use to decide if an action
should be applied or not. Some Action Types have a local context that
allows for finely targeted mixing, while others have a global context.
2.1.3 VO Context
In Cyberpunk, we had three categories of voiceovers (VO): quest, gameplay,
and community. These categories were used primarily for mixing and
selecting different attenuations and also gave us some valuable informa-
tion and context about the line being played. A community line is usually
unimportant: something that can almost be considered part of the ambi-
ence. A gameplay line is usually something combat related, with medium
importance to be heard. A quest line is likely to be an important VO that
is part of the narrative scenes that we definitely want to hear. As with the
sound tags, this information can be applied to both the emitter and the
entity so that we can use the context for VOs as well as other sounds. From
the VO context, we can also know if an entity is just in the crowd, is rel-
evant to gameplay (usually combat), or is an active participant in the quest.
2.1.5 Global
In addition to a selection that is specific to an entity in the game, it can
also be convenient to be able to apply some global actions from states.
For example, we have the option to set a global parameter (i.e., a globally
scoped RTPC in Wwise) or disable combat VO. These things don’t follow
the paradigm of the system perfectly, since there is no specific selection. It
is possible to implement these actions outside of the state system, but since
the states themselves provide relevant mixing context, it’s good to take
advantage of that.
2.2 ACTION
In addition to specifying the selection conditions, the Action Type is also
associated with one or more Mixing Actions that define the actual mixing
to be done. Where possible, these Actions can be combined freely with
different selection conditions. However, some actions (such as globally
applied actions) may have no selection condition or only specific selection
conditions that make sense.
will pull the sound directly toward the listener. This requires maintaining
two positions for the sound: the true position, which is updated by the
game, and the virtual position, which is computed from the true position
before passing it to the middleware.
Although this feature is not as simple to set up as the Actions above,
you may already have access to the functionality. In Cyberpunk, we
already had a repositioning pipeline in place as part of our acoustics
system, so we were able to piggyback on the repositioning logic from
that feature. Listener handling in third person games also often uses this
technique, so if you have such a system in place, check if you can also
use it for dynamic mixing. By manipulating the 3D position, we make
use of all the distance-based functionality that we have already set up,
including distance attenuation curves, low-pass filtering, and reverb. If
we consider the situation where we want a specific sound to fade into
the background, it becomes clear how much more effective using the
distance rolloff factor can be than adjusting volume alone. The fact that
it’s using the existing distance setup also means that the adjustment has
a natural and organic sound, in common with the base audio experience
of the game.
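As a sketch of the underlying idea (the vector type and function name here are assumptions for illustration, not the shipped code), the virtual position can be derived by scaling the listener-to-source offset:

// A distanceScale below 1 pulls the sound toward the listener; a value
// above 1 pushes it further away, engaging the existing distance
// attenuation, filtering, and reverb behavior.
Vector3 ComputeVirtualPosition(const Vector3& truePosition,
                               const Vector3& listenerPosition,
                               float distanceScale)
{
    Vector3 offset = truePosition - listenerPosition;
    return listenerPosition + offset * distanceScale;
}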
2.2.10 Interpolation
It is the responsibility of each action to provide interpolation functional-
ity for being added and removed. While there are a variety of curves that
can be used to interpolate between different Actions, in practice we found
for us that it was enough to use either linear interpolation or immediate
transitions depending on the Action.
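As an illustration only (the weight-based blend and names are assumptions rather than the shipped implementation), a per-update linear fade of an Action's influence might look like this:

// Move an action's weight toward its target over transitionTime seconds.
// A transition time of zero gives an immediate transition.
float UpdateActionWeight(float currentWeight, float targetWeight,
                         float transitionTime, float deltaTime)
{
    if (transitionTime <= 0.0f)
        return targetWeight;
    float maxStep = deltaTime / transitionTime;
    float delta = targetWeight - currentWeight;
    if (delta > maxStep)
        return currentWeight + maxStep;
    if (delta < -maxStep)
        return currentWeight - maxStep;
    return targetWeight;
}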
2.3 DECISION
In Cyberpunk 2077, there are multiple elements that are responsible for
making the decision of when and which Mixing Actions to apply. These
are key to realizing the “dynamic” part of dynamic mixing.
FIGURE 2.1 A Mixing State (tier_3) containing three separate mixing actions.
2.3.1.2 Transitions
Another section of the State displays the Transitions priority window. Each
state contains a list of Exit Transitions which connect the State to potential
next States. The list is sorted by priority so that if there are multiple viable
Transitions, we will select the highest priority transition. We can also see
in Figure 2.1 the write variable actions array. Rather than a Mixing Action,
the write variables provide the possibility to set or modify a value on a
variable internal to the Scene which can then be used as a condition on
transitions to control the flow. Next, let’s see what happens in the tier_3
to q004_lizzie_private_room Transition shown in Figure 2.2.
Inside the Transition, we can set a transition time to control the time taken
to interpolate between the first and second scenes, as well as the conditions
which will define the dynamic decisions in our system. First, we need to
decide if we want the Transition to require all conditions in order to be con-
sidered viable or if it can occur with any condition being met.

FIGURE 2.2 The Transition from the tier_3 Mixing State to the q004_lizzie_private_room Mixing State.

Next, we have
the built-in conditions which are common across all transitions: we can use
the exit time to have States transition after some time or to prevent a transi-
tion from happening before that time has elapsed. The exit signal, also known
as a Signpost, is the most common way to control Scenes with State logic that
follows the progress of a quest; signals are received from Signpost nodes in the
quest graph or by Signpost signals resolved from the quest dialog. Using quest
dialog as a trigger is convenient because it doesn’t require any additional setup
on the quest side, and very often we will have some quest dialog in a good
moment for us to make a mixing transition. To manage this, we maintain a
map of dialog line IDs to mixing Signposts. After the built-in conditions, we
have the option to create a condition based on internal Scene variables; in this
case, the variable that we set as a write action in the previous State. Finally, we
can assign optional conditions based on a variety of contexts:
point the Transition will be activated and the Scene will switch between
the tier_3 and q004_lizzie_private_room, interpolating between the
Actions over a period of 2 seconds.
Something that we found helpful with the setup of Scenes, especially
for the more complex ones, was to introduce a special any_state case for
Transitions. This allows Transitions to be authored without requiring a
specific starting State and instead can be used as a transition from (as the
name suggests) any State. This can clean up the graph a lot as well as save
on duplicated work when setting up more complex Scenes.
FIGURE 2.3 The Scene used to drive the dynamic music in a combat encounter.
While the State defines the Mixing Actions, the actions triggered when the
State is activated will be applied on the Scene. The Scene is then
responsible for managing the lifetime and updates of the actions, and
interpolating and removing them when a Transition occurs.
The combination of Scenes, States, and Transitions allows us to craft
powerful dynamic mixing Decisions, but it comes with a certain level of
complexity. In order to help the sound designers use it to its fullest potential,
it is worth spending time and energy making a polished and expressive user
interface to make it as easy as possible to set up the dynamic mixing data.
PriorityParametersArray newParams;
for( const auto& triggerVolume : triggerVolumes )
{
    for( const auto& parameter : triggerVolume.parameters )
    {
        PriorityParameter* existingParameter =
            newParams.Get( parameter );
        if( existingParameter == nullptr )
        {
            newParams.Append( parameter );
        }
        else if( parameter.priority > existingParameter->priority )
        {
            *existingParameter = parameter;
        }
    }
}
Trigger volumes may also be useful for other forms of dynamic mixing,
for example, to control event replacement. You may find many other appli-
cations since player location is such a useful selection mechanic.
2.4 CONCLUSION
We’ve seen how to build a state-based dynamic mixing system based
around the concepts of Action, Selection, and Decision, as well as other
elements of dynamic mixing that operate in tandem with the state-based
system. In examples from the mixing systems developed for Cyberpunk
2077, we’ve seen both features that could be applied generically to different
types of games, as well as features that are more useful to specific contexts
in this game, and that we should consider closely the context of the game
that we are working on in order to build the system best suited to our
needs. While we can gain a lot of value from a powerful bespoke mixing
system, we should reuse and adapt existing systems and look for ways to
make the system more convenient to use for designers.
As a final word, I want to say a big thank you to the members of our
Audio Code team who brought these systems to life in Cyberpunk 2077:
Giuseppe Marano, Marek Bielawski, and Mateusz Ptasinki, as well as the
awesome Sound Design and Music teams at CD Projekt Red who continue
to use these tools to add dynamic mixing and music into the game.
Chapter 3

Timed ADSRs for One-Shot Sounds
3.1 INTRODUCTION
Consider a long looping sound – say, the crackle of an electric beam weapon
or the mechanical motor of a drill. These sounds are usually important and
triggered by the player and should therefore be loud, exciting, and dynamic.
But what happens if the player holds down the button that fires the electron
gun or that triggers the drill for a long time? These sounds can’t be the most
important thing in the game the entire time that they’re playing.
Enter the AHDSR envelope – short for its components of Attack, Hold,
Decay, Sustain, and Release. The Hold parameter is a more modern addition
to the envelope and is not always present, so the curve is typically abbreviated
as ADSR.1 In this chapter, we’ll examine the ADSR curve and its component
pieces, look at its applications and where it breaks down, and then discuss a
system that we call Timed ADSRs to apply ADSRs to one-shot sounds.
• Initial – The starting point of the curve. This is the value at time zero.
• Attack – The time that it takes the curve to transition from the Initial
value to the Peak value.
• Peak – The value that will be reached after the Attack time has passed
and which is maintained for the duration of the Hold time.
• Hold – Once the value has reached the Peak, it is held for this length
of time.
• Decay – The time that it takes the curve to transition from the Peak
value to the Sustain value.
• Sustain – The value that is maintained while the sound is playing,
after the Attack, Hold, and Decay times have passed.
• Release – Once a stop has been triggered, how long it takes to reach
the Final value.
• Final – The final value at the end of the curve, after the stop has been
triggered and the Release time has elapsed.
FIGURE 3.3 An ADSR curve with nonlinear interpolation between the various
segments.
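To make those parameters concrete, here is a minimal sketch of evaluating such an envelope with straight-line segments (the struct and function below are illustrative only; middleware implementations also support curved segments like the ones in Figure 3.3):

struct AHDSR
{
    float Initial, Peak, Sustain, Final; // values (e.g., volume in dB)
    float Attack, Hold, Decay, Release;  // segment durations in seconds
};

// For simplicity, the Release segment interpolates from Sustain to Final,
// ignoring where the curve actually was when the stop was triggered.
float Evaluate(const AHDSR& curve, float time,
               bool stopTriggered, float timeSinceStop)
{
    if (stopTriggered)
    {
        if (curve.Release <= 0.0f || timeSinceStop >= curve.Release)
            return curve.Final;
        float t = timeSinceStop / curve.Release;
        return curve.Sustain + (curve.Final - curve.Sustain) * t;
    }
    if (time < curve.Attack)
        return curve.Initial +
               (curve.Peak - curve.Initial) * (time / curve.Attack);
    time -= curve.Attack;
    if (time < curve.Hold)
        return curve.Peak;
    time -= curve.Hold;
    if (time < curve.Decay)
        return curve.Peak +
               (curve.Sustain - curve.Peak) * (time / curve.Decay);
    return curve.Sustain;
}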
• Fadeout – This is the simplest use of the curve since it only involves
one of the features. To implement a fadeout, we set our Attack and
Decay times to zero, then set the Initial, Hold, and Sustain values to
0 dB. The Final value gets set to silence, and we set the Release time
to our desired fadeout time (see Figure 3.4a).
• Crossfade – Given two sounds that we want to cross fade, we can set
both of them to have both a fade in and a fade out. This is accom-
plished by setting the Attack and Release times to the desired crossfade
duration and the Decay time to zero. Initial and Final are both set to
silence, and then Hold and Sustain are set to 0 dB (see Figure 3.4b).
• Drill/Beam Weapon – In these examples, the sound should be initially
loud and occupy much of the aural focus, then fade down to a quieter vol-
ume after a while, and fade out entirely when the weapon is untriggered.
Here we take advantage of the entire suite of parameters. Attack time is
zero and Initial is 0 dB in this case – there is no fade in since the sound
should start immediately. Similarly, Release is set to a short duration and
FIGURE 3.4 (a) A simple fadeout implemented with an ADSR curve. (b) A cross-
fade implemented with an ADSR curve. (c) A more complex ADSR curve with an
instantaneous Attack, followed by a Hold, Decay, Sustain, and Release to silence.
3.3 LIMITATIONS
So far, this chapter has been an in-depth discussion of a feature that is
likely already a part of the middleware that you use. In FMOD Studio,
for example, nearly every property can be modulated over time with an
AHDSR curve. Despite their ubiquity within the editor and runtime, there
is one context in which their functionality is unavailable: repeated one-
shot sounds. All of the examples that we have discussed to this point in this
chapter involve long sounds that have an opportunity to fade in and out,
but how can we apply this same technique to sounds that are short-lived?
Let us consider the humble footstep. When a character goes from standing
around idle to moving, individual footstep sounds start to play, likely trig-
gered by some code that is interacting with the animation system. The foot-
step sounds at this point are new and novel and therefore should be louder.
However, after a few seconds of walking, the footsteps become common-
place – a part of the ambiance of the environment. At that point, they can be
played more quietly until the player stops moving for a while. Importantly, it
should take a little while to reset the volume, since the player may pause in
their movement for just a few moments, and we don’t want to constantly reset
the volume every time there’s a small hiccough in movement. Although the
example is about footsteps, we would want this same functionality for other
similar impulses such as repeated magic spell effects or gunshots.
The behavior that we have described here is similar to an ADSR curve,
but the existing functionality in our middleware won’t work to implement
it. This is because the ADSR curves must work exclusively within a sin-
gle Event and have no concept of maintaining state across Event triggers.
Instead, we will need to introduce a new concept.
• Peak and Hold – These values will be our starting point for our
repeated one-shots. We’ll start at the Peak value and hold it there for
the Hold duration.
• Decay and Sustain – Once again, these values map directly to one-
shot sounds in the same way that they apply to looped sounds. After
the Hold time, we fade down to the Sustain value over the Decay time
and then hold it there.
• Release and Final – Similar to Initial and Attack, these values have
no meaning for one-shot sounds. They’re used to implement a fadeout,
but one-shot sounds have nothing to fade.
This leaves us with just Peak, Hold, Decay, and Sustain as the values left
to construct our curve, which results in a curve that looks like Figure 3.5.
However, there is one last piece of the puzzle that remains to be imple-
mented: we need some way of knowing when to reset the curve back to
its starting point. In our example, this value would map to how long the
player should be standing still in one spot before the footsteps get loud
again. We’ll refer to this value as the Reset time.
FIGURE 3.5 The Timed ADSR curve used for repeated one-shot sounds. The
markers on the curve match the entries in Table 3.1.
TABLE 3.1 Example Sequence of Timed ADSR Values at Various Times. The entries match the markers in Figure 3.5.

Trigger Time | Figure 3.5 Marker | Notes
0 s | A | Our initial playback at time zero starts at the Peak value.
2.3 s | B | We are still within the Hold time, so remain at Peak.
4.5 s | C | We have passed our Hold time, and we're 0.5 seconds into our Decay time. We start fading from Peak to Sustain.
8.8 s | D | Same story as 4.5 seconds, except that we're now 4.8 seconds into our 5 second decay time. Our fade continues, except that we are now near the end of it and the value nears Sustain.
9 s | E | We've reached the Sustain portion of the curve, where the value stays until it resets.
12 s | F | Same as above.
17 s | A | Our Reset time has elapsed since the last trigger, so the value is reset to the Peak.
19 s | B | Still on the Hold time again.
21.5 s | C | Starting into our Decay time again.
25 s | A | Even though we haven't finished the Hold and Decay times, we haven't played since the Reset time, so the parameter's value is reset to Peak.
volume (Sustain) until the end. The sustain section will be implemented
with a looping region. That implements the curve itself, but now we must
have some way to trigger the sound. We’ll add a discrete parameter called
Trigger to trigger the sound whenever its value goes from zero to one. In
order to make sure that the value resets automatically, we’ll give it a negative
velocity. Figure 3.6 shows how all of the parameters and timeline are set up.
FIGURE 3.6 (a) Event timeline for a first-pass attempt at implementing the
desired behavior. (b) The Trigger parameter sheet. (c) Velocity setup for the
Trigger parameter.
That wasn’t too bad – just a few straightforward items to set up. Let’s
take a step back now and see how much of our Timed ADSR setup we’ve
managed to implement with this Event so far. The Event starts loud then
fades out over time into a sustained value, and each trigger of the sound
respects this volume curve – there’s our Peak, Hold, Decay, and Sustain.
Great! But this still leaves us with no automatic way to guarantee that the
volume starts at full volume immediately when the first sound triggers,
and no way to reset it once it triggers.
One way to implement this reset is to do it by hand. Keep the sound
stopped until we’re ready to play it and immediately set the Trigger param-
eter to a value of 1 on start. We track how long it’s been since we last played
and then stop the sound once it’s been long enough. All of this must hap-
pen in native code, but we’re looking to minimize the amount of code that
we have to write for this solution. Is there any way to implement Reset
without having to leave the tool?
• The timeline must stay at the beginning of the Event until the first
time the Trigger parameter is hit.
• Every time a sound is triggered during the sustain interval, the time-
line should reset to a point that defines the delay time.
FIGURE 3.7 (a) Destination Marker and Transition at the beginning of the time-
line. (b) Transition condition setup.
Trigger parameter is off (zero). With this setup in place, the timeline cursor
is freed as soon as we play our first sound. Figure 3.7 shows this setup.
FIGURE 3.9 (a) Event Timeline showing the sustain region. (b) Transition region
condition setup.
FIGURE 3.10 Curve driving the TimedADSR parameter. The markers corre-
spond to Table 3.2.
since the Peak and Sustain values are effectively hard-coded to zero and one,
respectively. The shape of this curve is shown in Figure 3.10.
Note that Figure 3.10 represents the value of the parameter rather
than the parameter curve itself, and the parameter can then adjust any
property that can be automated. Figure 3.11 shows an example of an
3.6.2 Example
Let’s revisit our example from Section 3.4, except that this time we’ll make
the values more concrete. Our TimedADSR parameter curve has hard-
coded the Peak and Sustain values at 0 and 1, respectively, which lets us fill
specific values into our table. Table 3.2 shows the value of the TimedADSR
parameter at different times. (The Notes are duplicated from Table 3.1 for
convenience, with the specific values included.)
It is worth noting that Figure 3.10 and Table 3.2 are exactly the same as
Figure 3.5 and Table 3.1, except with the specific values of 0 and 1 for Peak
and Sustain. With this in mind, we can now dive into our implementation.
1. We must track our Timed ADSRs per Event for each actor in the
game. Consider two player characters in a multiplayer game cast-
ing the same spell at the same time – each one will need to track the
Timed ADSR status independently.
2. Most Events will not be set up with Timed ADSRs, so our design must
not pay a memory or CPU cost for Events that do not use the system.
3. For those Events that do use Timed ADSRs, we must avoid any kind
of operation happening on a tick for performance reasons.
When we play an Event, first we’ll check that it has all of the required
parameters, and then we’ll look up the correct parameter value. In
order to find the value, we look up the TimedADSRContext in the map-
ping based on the Event that we’re playing and then do a linear search
through the array of TimedADSRs to find the one that matches the actor
that we’re attached to.
Let’s put this all together:
struct TimedADSR;

struct TimedADSRContext
{
    TimedADSRContext(float HoldTime, float DecayTime, float ResetTime);

    float HoldTime;
    float DecayTime;
    float ResetTime;
    std::vector<TimedADSR> TimedADSRs;
};

struct TimedADSR
{
    TimedADSR(float CurrentTime);

    float GetValue(const TimedADSRContext& Context,
                   float CurrentTime) const;
    bool IsExpired(const TimedADSRContext& Context,
                   float CurrentTime) const;
    void Reset(float CurrentTime);

    std::weak_ptr<Actor> Instigator;
    float StartTime;
    float LastTriggerTime;
};

class AudioEngine
{
    // ...
    std::unordered_map<FMOD::Studio::ID, TimedADSRContext> TimedADSRs;
};
A quick note on the types used in these examples: for the purposes of
this book, we are using the C++ standard library, but the types may need to
be changed based on your game engine. For example, when using Unreal
engine, the instigator would need to be a TWeakObjectPtr<AActor>,
and the TimedADSRs map would use a TMap<>. Either way, the types
used will have some requirements about how they are implemented. For
example, to properly use FMOD::Studio::ID in a std::unordered_map,
there will need to be a specialization of std::hash and operator== for
FMOD::Studio::ID. For the purposes of this book, we will assume that
these operations exist and are implemented properly.
Now that we have the structures in place, we can start to put together
functionality. Let’s start with the code that actually evaluates the curve:
// We're past the Hold time, so adjust the time so that the
// zero value is at the beginning of the Fade time.
TimeSinceStart -= Context.HoldTime;
// While we are still within the Decay time, ramp the value from 0 up
// to 1. (This linear ramp is an inferred step; the surrounding lines
// imply it but do not show it.)
if (TimeSinceStart < Context.DecayTime)
  return TimeSinceStart / Context.DecayTime;
// We've passed the Hold and Decay times, so the current value
// is 1.
return 1.0f;
}
bool TimedADSR::IsExpired(
const TimedADSRContext& Context,
float CurrentTime) const
{
// How long has it been since the last time we played.
auto TimeSinceLastPlay = CurrentTime - LastTriggerTime;
// If our last play time is past the reset time, then this
// TimedADSR has expired.
return TimeSinceLastPlay >= Context.ResetTime;
}
These two functions describe the entire operation of the Timed ADSR
curve described in Section 3.4, and they meet all of the technical require-
ments outlined in Section 3.6.3. The rest of the work that we have to do is
to hook up these new structs. The API that we will expose from the audio
engine is a single function that returns the current value for the Timed
ADSR given an Event ID and instigator. This function will find or add a
TimedADSRContext, find or add a TimedADSR for the given instigator, reset
the TimedADSR if it has expired, and then return the current value.
In order to implement that function, we’ll need a few straightforward
helper functions:
TimedADSRContext::TimedADSRContext(
float HoldTime, float DecayTime, float ResetTime) :
HoldTime(HoldTime), DecayTime(DecayTime), ResetTime(ResetTime)
{}
TimedADSR::TimedADSR(float CurrentTime)
{
Reset(CurrentTime);
}
float AudioEngine::GetTimedADSRValue(
const FMOD::Studio::ID& EventId, const Actor& Instigator,
float HoldTime, float DecayTime, float ResetTime)
{
// First, either find the existing TimedADSRContext
// for the given Event or add a new one.
auto [FoundContext, NewlyAdded] =
TimedADSRs.try_emplace(EventId, HoldTime, DecayTime, ResetTime);
Now that we’ve written this code, the final piece of the puzzle is to actu-
ally use the AudioEngine::GetTimedADSRValue() when starting to trigger
our Event. We will write a function that can be called on Event playback,
in the ToPlay state8:
void PlayingEvent::SetupTimedADSR()
{
// Get the user properties for the Hold, Decay, and Reset times.
FMOD_STUDIO_USER_PROPERTY HoldTimeProperty;
FMOD_STUDIO_USER_PROPERTY DecayTimeProperty;
FMOD_STUDIO_USER_PROPERTY ResetTimeProperty;
auto HoldTimeResult =
EventDescription->getUserProperty(
"HoldTime", &HoldTimeProperty);
auto DecayTimeResult =
EventDescription->getUserProperty(
"DecayTime", &DecayTimeProperty);
auto ResetTimeResult =
EventDescription->getUserProperty(
"ResetTime", &ResetTimeProperty);
// Make sure that all three values have been read properly
if (HoldTimeResult != FMOD_OK
|| DecayTimeResult != FMOD_OK
|| ResetTimeResult != FMOD_OK)
return;
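  // (A sketch of the remaining steps; the Instigator and EventInstance
  // members and the AudioEngine::Get() accessor are assumed names.)
  // Look up the current Timed ADSR value for this Event and instigator...
  FMOD::Studio::ID EventId;
  EventDescription->getID(&EventId);
  const float TimedADSRValue =
    AudioEngine::Get().GetTimedADSRValue(
      EventId, *Instigator,
      HoldTimeProperty.floatvalue,
      DecayTimeProperty.floatvalue,
      ResetTimeProperty.floatvalue);
  // ...and drive the TimedADSR event parameter with it.
  EventInstance->setParameterByName("TimedADSR", TimedADSRValue);
}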
We have avoided any sort of ticking code, but now we have a different
problem: old TimedADSRContexts stick around indefinitely. We have a few
options:
• We can accept the memory leak under the expectation that it will be
small and the memory cost is manageable.
• We can add a ticking function that runs at some slow rate to clean up
any expired TimedADSRs. Despite our determination to do otherwise,
this will end up being a minor cost that we may be willing to accept.
• We can select a specific moment in time (such as loading a new game
map) to clean up the expired TimedADSRs.
• We can schedule a timer to run only when we expect that there will
be at least one TimedADSR to clean up. There are a few options for
selecting when to trigger the timer:
• Schedule one timer for each TimedADSR at the moment of its expi-
ration. The benefit is that there will never be any wasted space, and
each TimedADSR will be cleaned up immediately when it expires.
However, we do have to keep track of potentially many timers.
• Schedule one timer only, for the time of the nearest TimedADSR
expiration. This is a lot more efficient in space and time (fewer
timers, fewer timer handles, fewer callbacks), but it can still have
a lot of callbacks, as we will need to hit each TimedADSR. Also, the
function to calculate this value is more complex, since it needs to
iterate over all of the TimedADSRs.
• Schedule one timer only, for the time of the latest TimedADSR
expiration. This has all of the same benefits and drawbacks as
the previous option, but it collects several TimedADSR expirations
into one callback trigger. So long as we’re okay holding onto the
TimedADSRs for a few seconds, this is my preferred option.
Which option we pick depends on the game requirements, but the code
for doing the actual cleanup is straightforward.
void TimedADSRContext::CleanupExpiredTimedADSRs(float CurrentTime)
{
  // Remove every TimedADSR that has expired. (The function signature and
  // the use of std::erase_if are inferred from the surrounding code.)
  std::erase_if(
    TimedADSRs,
    [=](const TimedADSR& Entry)
    {
      return Entry.IsExpired(*this, CurrentTime);
    });
}
void AudioEngine::CleanupExpiredTimedADSRs()
{
// Clean up any expired TimedADSRs from each context
for (auto& [ID, Context] : TimedADSRs)
{
Context.CleanupExpiredTimedADSRs(GetCurrentTime());
  }
}
3.7 CONCLUSION
ADSRs are a massively useful tool in the hands of sound designers, but
they are limited to looped sounds. The Timed ADSR concept introduced
in Section 3.4 extends the ADSR to one-shot sounds. In this chapter, we
have examined two different approaches for implementing the Timed
ADSR: a tool-only approach and a code-driven approach. Although both
approaches function well, the code-driven approach has fewer caveats and
a simpler setup (once the code has been written, of course).
NOTES
1 Both ADSR and AHDSR are the sort of unfortunate acronyms that aren’t
actually pronounceable. That is, you don’t typically hear people refer to
them as “add-ser” curves, but rather as “aiy dee ess ar” curves.
2 Cirocco, Phil. “The Novachord Restoration Project,” CMS 2006, http://
www.discretesynthesizers.com/nova/intro.htm
3 “Envelope (music),” Wikipedia, Wikimedia Foundation, June 3, 2022,
https://en.wikipedia.org/wiki/Envelope_(music)
4 Valid, but, y’know, weird… However, we’re coming into this with
the assumption that the sound designer knows what they want, and that
this is it.
5 This example uses volume, but can be applied to any parameter that can be
automated.
6 This is in contrast to the pinning that we did in Section 3.5.2.1, where the
transition was set when the value is 0.
7 See Game Audio Programming: Principles and Practices Volume 2, Chapter 8
“Advanced FMOD Studio Techniques”, section 8.3.2 by Guy Somberg, pub-
lished by CRC Press.
8 See Game Audio Programming: Principles and Practices Volume 1,
Chapter 3 “Sound Engine State Machine” by Guy Somberg, published by
CRC Press.
Chapter 4
Systemic Approaches to Random Sound Effects
Michael Filion
4.1 INTRODUCTION
Large open-world games, where the world is one of the fundamental ele-
ments of the player experience, need to feel alive, vibrant, and detailed so
that players do not get bored of seeing the same blade of grass or the same
tree for the 100th time. Systemic implementation of random sound effects
(RFX) in video games adds a level of realism and unpredictability to what
can easily become a repetitive backdrop. A systemic approach can allow
for a greater diversity of sound with less repetition and give the impression
to the player of a full-fledged dynamic world simulation without requiring
huge computational resources and development time.
This chapter will discuss the fundamentals of RFX and walk through
the implementation of a complex and flexible system that allows for greater
creative control.
the densely packed forest – even when they are not easily visible. While in
reality each sound in this scene is produced not at random but as the reaction
to some specific event, to a simple observer in the environment these sounds
are indistinguishable from random background noise. This perceived randomness
is what systemic RFX systems attempt to recreate in the virtual game
environment.
For the purposes of this chapter, we will define RFX as any random
sound effect not triggered as part of player or NPC actions. This means
that any sound triggered for any animations, gameplay events from inter-
action (such as opening a door or a chest), looping ambience sounds, etc.
will not be included as part of the discussion. This category of sound will
often attempt to imitate the randomness of the natural world.
#include <random>
#include <iostream>
int main()
{
std::random_device rd;
std::mt19937 gen { rd() };
std::uniform_real_distribution<> dis { 0.0f, 1.0f };
// Print three uniformly distributed random values, one per line.
std::cout << dis(gen) << '\n';
std::cout << dis(gen) << '\n';
std::cout << dis(gen) << '\n';
return 0;
}
Those three lines will output different values each time the code is run.
This code uses a random_device to select a seed for the mt19937 generator
and then feeds that generator to a uniform_real_distribution to select
uniformly distributed random values between 0 and 1.
Extending this functionality to a useful game context is outside the scope
of this chapter, but it can be encapsulated without too much effort into
a function with an interface similar to float GetRandomNumber(float
Min, float Max), which is what we will use in this chapter as a proxy for a
fuller random number interface.
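As a sketch of what that proxy might look like (the thread_local generator is one reasonable choice, not the only one; your engine may already provide its own random number service):

#include <random>

float GetRandomNumber(float Min, float Max)
{
  // One generator per thread, seeded once from the random device.
  thread_local std::mt19937 Gen { std::random_device{}() };
  std::uniform_real_distribution<float> Dis { Min, Max };
  return Dis(Gen);
}

The later snippets in this chapter also call GetRandomNumber() with integer arguments, so in practice you would likely provide an integer overload as well.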
FIGURE 4.1 A basic data-driven example using Wwise. The Sequence Container
is set to continuous, and the Silence is randomized between 1 and 5 seconds.
play silence with a randomizable duration, and the ability to loop between
these two methods.
While the basic example shown in Figure 4.1 does not provide any
interesting results, we can use it as a baseline for discussions around a
more complex system. For example, we can randomize the position of the
sounds to create some more variety.
We will start by defining a simple class to represent the data for our RFX.
struct RandomSoundEffect
{
bool IsPlaying() const;
Time GetNextTriggerTime() const;
SoundEffect soundEffect;
float minDistance;
float maxDistance;
Time lastTriggerTime;
Duration triggerDelay;
};
class RandomSoundEffectPlayer
{
public:
void Update();
private:
void PlayRandomSoundEffect(
const RandomSoundEffect& rfx, const Vector3& pos) const;
Vector3 GetRfxPosition(const RandomSoundEffect& rfx) const;
private:
std::vector<RandomSoundEffect> rfxs;
};
void RandomSoundEffectPlayer::Update()
{
const Time now = GetCurrentTime();
for(auto& rfx : rfxs)
{
//Has the RFX finished playing?
//Has enough time passed since the last trigger?
Vector3 RandomSoundEffectPlayer::GetRfxPosition(
const RandomSoundEffect& rfx) const
{
const Vector3 playerPos = GetPlayerPosition();
FIGURE 4.2 The shaded area where random sound effects will play.
distance the player can traverse given the length of a random sound effect
to avoid any discontinuity issues.
struct RandomSoundEffect
{
bool IsPlaying() const;
SoundEffect soundEffect;
float maxDistance;
};
struct PlayingRandomSoundEffect
{
Time GetNextTriggerTime() const;
Time lastTriggerTime;
RandomSoundEffect rfx;
};
struct RandomSoundEffectCategory
{
//Configuration Data
unsigned int maxConcurrentRfxs;
Duration triggerDelay;
std::vector<RandomSoundEffect> rfx;
//Runtime Data
std::vector<PlayingRandomSoundEffect> playingRFXs;
std::vector<std::pair<OwnerID, Vector3>> positions;
};
Vector3 pos;
do
{
const unsigned int idx = GetRandomNumber(0, maxCatPos);
pos = category.positions[idx].second;
} while((playerPos - pos).Length() > rfx.maxDistance);
return pos;
}
The function now requires both the RFX and the category it is associ-
ated with. Because each RFX can have a different maximum distance,
the function enters a loop to continuously select a random position,
ensure that it is close enough to the player, and repeats if that is not the
case. One assumption that this algorithm makes is that there are enough
positions that eventually a valid position will be found. This function
can quickly become a performance bottleneck if there are too many posi-
tions provided that are not close enough to the player’s position. If that
is the case, then there are other algorithms that will find appropriate
positions quickly.
The next function that is required is an update per category. With this
function, the algorithm needs to iterate on all the playing RFX, validate
if they are currently playing, and play a new instance if enough time has
elapsed after the previous one has finished.
void RandomSoundEffectPlayer::Update(
RandomSoundEffectCategory& category) const
{
const Time now = GetCurrentTime();
const unsigned int maxCatRFXs = category.rfx.size();
for(auto& playingRFX : category.playingRFXs)
{
//Has the RFX finished playing?
//Has enough time passed since the last trigger?
if(!playingRFX.rfx.IsPlaying() &&
now > playingRFX.GetNextTriggerTime())
{
//Determine the new RFX to play & assign it
const unsigned int idx = GetRandomNumber(0, maxCatRFXs);
playingRFX.rfx = category.rfx[idx];
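      //Record the trigger time and play the new RFX at a randomly
      //selected position. (This is a sketch of the remaining lines;
      //the two-argument GetRfxPosition() is described in the text below.)
      playingRFX.lastTriggerTime = now;
      const Vector3 pos = GetRfxPosition(playingRFX.rfx, category);
      PlayRandomSoundEffect(playingRFX.rfx, pos);
    }
  }
}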
Looking at the algorithm for the RFX category, we notice that it shares the
same basic structure as the original implementation. The differences here are
that the algorithm now assigns the RFX in PlayingRandomSoundEffect and
stores information in that structure rather than the RandomSoundEffect.
The number of playing RFX can be increased or decreased globally or based
upon a given game context.
We have left off a few implementation details in this chapter, such as ini-
tialization of the arrays and any runtime resizing given a specific context.
We also haven’t shown some game-specific details, such as updating, add-
ing, and removing positions as appropriate for the specific game engine
used. Finally, the global RandomSoundEffectPlayer::Update() function
needs to be modified to iterate on the categories and call the Update()
function with the category as a parameter.
The level of control that sound designers have has increased substantially
with these added features, and the time spent adding them is more than repaid
by the improvements.
4.9 CONCLUSION
Helping a game world come alive with an added level of sound that mimics
what we know from the real world is an effective tool for player immersion.
Both the simplest and more complicated implementations can be memory
and CPU efficient while respecting an artistic vision. Starting with the
most basic implementations can provide a level of realism to the world
with very little investment. With the basic version in place, iterating on the
basic algorithm and modifying the behavior to be more appropriate for the
given game context can be done over time. The resulting implementations
will often still be quite simple, allowing for easy debugging and testing in
a multitude of scenarios while also imitating the real world with great
results.
NOTES
1 At least, finally for this chapter. The failings of rand() could fill many, many
more pages of text. It’s an awful function and you shouldn’t use it. - Ed
2 https://mathworld.wolfram.com/SpherePointPicking.html
Chapter 5
The Jostle System
Alex Pappas
5.1 INTRODUCTION
5.1.1 Establishing the Problem
Early on in the development of Back 4 Blood, our sound design team was
struggling to implement a system for shrubbery sounds. Their solution at
the time was to hand-draw and place volumes throughout every gameplay
map at locations where shrubbery had reached a critical density. Each time
a character would move while inside this shrubbery volume, it would trig-
ger a traversal sound determined by parameters assigned per volume. The
end result would simulate the sounds of leaves and branches pulling at the
clothes of players, NPCs, and enemies alike as they passed through over-
grown foliage and brambles.
Most of the game’s foliage relied heavily on static meshes, a type of
mesh that is capable of conveying a great amount of detail, though it
can’t be deformed through vertex animations. Thus, these shrubbery
volumes added a great deal of life to the game, even though the meshes
themselves remained stationary and unmoving. We could heighten
immersion for players as they crashed through the undergrowth or
reward a keen ear with the subtle rustlings of a “Ridden” as it ambled
through a cornfield.
It soon became clear that the shrubbery system was highly impracti-
cal. At this stage in the project, every gameplay map was undergoing
iteration, and there were more maps, level designers, and environment
artists than any single sound designer could keep up with. Often shrub-
bery volumes drawn by a sound designer on Monday would need to be
entirely reworked before the end of the week because maps were chang-
ing with such velocity. Also, organic foliage is seldom so neatly placed
that a single volume will properly capture its shape. The sound design-
ers would often grapple with fields of tall grass that would dissolve into
sparse clumps at the edges. Deciding to draw a volume only over the
densest parts of the grass would risk having inconsistent behavior; the
sparser clumps not included within the volume wouldn’t trigger the
expected traversal sounds. On the other hand, drawing a volume to
encompass everything would result in erroneous foliage traversal sounds
in locations where there was no tall grass.
This was not the only system that suffered from these issues. Two
other systems – one designed to handle sounds for bumping into
objects and another to trigger rattle sounds in response to explosions –
also required the team to draw volumes around ever-changing chain
link fences and alleyways full of debris. The maintenance work was
endless.
strain your physics engine. Additionally, if all the logic for identifying and
playing these sounds is distributed across disparate actors or components,
the run-time cost of your system is fragmented and difficult to control as
several events could stack up on the same frame.
1. Provide the sound designer with a tool that they can use to dictate a
static mesh’s sonic behavior when acted upon by nearby characters
and/or explosions.
2. Develop a process for associating these behaviors with placed static
mesh actors without changing the nature of, or adding on to, the
meshes themselves.
3. Generate and store this data such that it can gracefully handle map
iteration.
4. Identify valid character and explosion interactions at run-time with-
out relying on overlap or trigger events.
5. Create a run-time manager to process all the incoming interaction
requests during gameplay, house any performance optimizations,
and play the appropriate sounds.
FIGURE 5.1 An overview of our Jostle Pipeline, outlining the primary responsi-
bilities and functions of each step.
the appropriate sonic behavior together. These are saved to an array for
each gameplay map. At run-time, this packaged data will be loaded in with
the map and then interpreted by the Jostle System’s Run-Time Manager.
The manager uses this data to determine when an external stimulus should
trigger a jostle sound based on the stimulus’s nature, world position, and
the unique sonic behavior of the nearby static mesh actors.
struct JostleBehavior
{
EJostleShape OverlapShape = EJostleShape::Sphere;
float JostleSphereRadius = 0.0f;
Vector3 JostleBoxExtent = Vector3::ZeroVector;
Vector3 Offset = Vector3::ZeroVector;
};
TABLE 5.1 Summary of the Three Basic Jostle Types and How They Respond to
Collision and Explosion Events
• Shrub. Properties: ShrubSound. Collision behavior: always play ShrubSound.
Explosion behavior: NA.
• Bump. Properties: BumpSound, Probability, SpeedThreshold. Collision
behavior: play BumpSound if the source of the excitation is moving faster
than SpeedThreshold, with a likelihood of Probability. Explosion behavior:
NA.
• Rattle. Properties: RattleSound. Collision behavior: same as Bumps.
Explosion behavior: always play RattleSound.
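One way to express the three types from Table 5.1 in code is to extend the base JostleBehavior struct, as in the sketch below. The member names mirror the table, while SoundEvent is a stand-in for whatever sound reference type your engine uses.

struct ShrubBehavior : public JostleBehavior
{
  SoundEvent ShrubSound;
};

struct BumpBehavior : public JostleBehavior
{
  SoundEvent BumpSound;
  float Probability = 1.0f;     // 0..1 chance of playing on a valid bump
  float SpeedThreshold = 0.0f;  // Minimum excitation speed to trigger
};

struct RattleBehavior : public JostleBehavior
{
  // Collisions reuse the Bump-style logic described in Table 5.1;
  // explosions always play RattleSound.
  SoundEvent RattleSound;
};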
FIGURE 5.2 An alleyway full of trash has been decorated with Bump Behaviors
(visualized by the dotted white outlines around various meshes). By including a
speed threshold, we may reward cautious players with more stealthy navigation,
while less aware players may noisily knock over discarded bottles as they sprint
through.
class JostleBehaviorsAsset
{
public:
ShrubBehavior* GetShrubBehavior(const StaticMeshAsset* Mesh) const;
BumpBehavior* GetBumpBehavior(const StaticMeshAsset* Mesh) const;
RattleBehavior* GetRattleBehavior(
const StaticMeshAsset* Mesh) const;
private:
std::map<StaticMeshAsset*, ShrubBehavior> ShrubBehaviors;
std::map<StaticMeshAsset*, BumpBehavior> BumpBehaviors;
std::map<StaticMeshAsset*, RattleBehavior> RattleBehaviors;
};
// ...
}
First, we discard any old jostle data that may have been gener-
ated from previous builds. Calling GenerateForSMActors() and
GenerateForInstancedFoliage() should repopulate this data as they
iterate over all static mesh actors and instanced foliage actors, respec-
tively. If your team is using any custom systems for placing static meshes
in maps, you will want to expand GenerateJostlesForWorld() to
accommodate those systems as well. With access to the world, it is trivial
to iterate over all placed static mesh actors to check if any of their static
mesh assets are referenced by the Jostle Behaviors in our Jostle Behaviors
Asset.
void JostleBuilder::GenerateForSMActors(
const JostleBehaviorsAsset* Behaviors)
{
// Iterate over all static mesh actors in the world
for (const auto& SMActor : GetAllStaticMeshActorsForWorld())
{
// Get the static mesh asset used by the static mesh actor
StaticMeshAsset* Mesh = SMActor->GetStaticMeshAsset();
// Check the found mesh against the list of shrub, bump, and
// rattle behaviors stored on the Jostle Behaviors Asset
AddPotentialShrubBehavior(Behaviors, Mesh, ActorTransform);
AddPotentialBumpBehavior(Behaviors, Mesh, ActorTransform);
AddPotentialRattleBehavior(Behaviors, Mesh, ActorTransform);
}
}
void JostleBuilder::AddPotentialShrubBehavior(
const JostleBehaviorsAsset* Behaviors,
const StaticMeshAsset* Mesh,
const Transform& ActorTransform)
{
// Retrieve the Shrub Behavior in our Jostle Behaviors Asset, if
// an association has been made
const ShrubBehavior* ShrubPtr = Behaviors->GetShrubBehavior(Mesh);
if (!ShrubPtr)
return;
// Create a Shrub Emitter Node for the placed static mesh actor
// and Shrub Behavior
ShrubEmitterNode* ShrubNode =
CreateAndInitializeNewShrubNode(ShrubPtr, ActorTransform);
class JostleEmitterNode
{
public:
// World position of the jostle including actor
// transform and any offset present in the Jostle Behavior
Vector3 Position = Vector3::ZeroVector;
protected:
Transform ReferencedActorTransform = Transform();
private:
// Utility function for detecting sphere overlaps
bool IsWithinRangeSphere(
const float Radius, const Vector3& ExcitationLocation);
Our Init() function should be called right after the Emitter Node’s creation
during the Build Process (AddPotentialShrubBehavior() in Section 5.5.1).
Looking at the internals of this function, we see how the static mesh actor’s
transform is used to properly set the Jostle Emitter Node’s world posi-
tion. TryExciteNodeCollision() and TryExciteNodeExplosion() will
handle our run-time behavior in response to different types of excitation.
For now, these functions both return false. You may have also noticed
that JostleEmitterNode is agnostic to its Jostle Behavior. Much like we
created Shrub, Bump, and Rattle versions of our Jostle Behavior, we will
also create Shrub, Bump, and Rattle child classes of JostleEmitterNode
and override each of our virtual functions to reproduce the unique sonic
behavior of each jostle type.
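A sketch of what that Init() function might look like follows, where the exact signature and the TransformPosition() helper on the Transform type are assumptions:

void JostleEmitterNode::Init(
  const JostleBehavior* Behavior, const Transform& ActorTransform)
{
  ReferencedActorTransform = ActorTransform;

  // Place the node at the referenced actor's location, shifted by the
  // behavior's offset (interpreted in the actor's local space).
  Position = ActorTransform.TransformPosition(Behavior->Offset);
}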
bool ShrubEmitterNode::TryExciteNodeCollision(
const Vector3& ExcitationLocation, const Vector3& Velocity)
{
if (!IsWithinRange(&ShrubInfo, ExcitationLocation))
return false;
PlaySound(ShrubInfo.ShrubSound, ExcitationLocation);
return true;
}
bool BumpEmitterNode::TryExciteNodeCollision(
  const Vector3& ExcitationLocation, const Vector3& Velocity)
{
  if (!IsWithinRange(&BumpInfo, ExcitationLocation))
    return false;
  // Only trigger for excitations moving faster than SpeedThreshold, and
  // then only with a likelihood of Probability. (Velocity.Length() and
  // RandomFloat01() stand in for your engine's vector and random helpers.)
  if (Velocity.Length() < BumpInfo.SpeedThreshold ||
      RandomFloat01() > BumpInfo.Probability)
    return false;
  PlaySound(BumpInfo.BumpSound, ExcitationLocation);
  return true;
}
Both the Shrub and the Bump first check to make sure the
ExcitationLocation falls within the range of the jostle’s collision
shape. ShrubEmitterNode goes on to play its jostle sound immediately
while BumpEmitterNode performs additional work using the excitation’s
Velocity and the Jostle Behavior’s Probability. Only after an incoming
excitation exceeds the SpeedThreshold and passes the probability test will
the sound be triggered.
PlaySound(RattleInfo.RattleSound, ExcitationLocation);
return true;
}
bool JostleEmitterNode::IsWithinRangeBox(
const Vector3& BoxOrigin,
const Vector3& BoxExtent,
const Vector3& ExcitationLocation)
{
// Convert ExcitationLocation from world space to local.
const Vector3 TransformedExcitationLocation =
JostleMathUtils::WorldToLocal(
ReferencedActorTransform, ExcitationLocation);
// Now we can use our jostle's original box origin and extent
// to create an axis-aligned bounding box and check if the
// excitation occurred within our jostle's area of effect.
const AABBox JostleBox = AABBox(BoxOrigin, BoxExtent);
return JostleBox.Contains(TransformedExcitationLocation);
}
If the request is within the range of the Jostle Emitter Node and passes the
criteria specified by its Jostle Behavior, the excited Jostle Emitter Node will
play its jostle sound.
class JostleSystem
{
public:
// Register our list of Jostle Emitter Nodes, and insert them into
// our EmitterGrid when the gameplay map loads
void RegisterEmitters(
const std::vector<JostleEmitterNode>& Emitters);
// ...
private:
// 2D Grid which contains all currently active Jostle Emitter Nodes
TGridContainer<JostleEmitterNode*> EmitterGrid;
// ...
};
FIGURE 5.3 (a) We partition our gameplay map into four quadrants, each cor-
responding to a cell in our (incredibly coarse) grid container. (b) When a charac-
ter moves through Quadrant D, we only need to evaluate the four Jostle Emitter
Nodes in the corresponding Grid Cell D for overlaps, rather than evaluating all
Jostle Emitter Nodes for the entire map.
void JostleSystem::ExciteJostleSystemCollision(
const Vector3& CollisionLocation, const Vector3& Velocity)
{
// Create a JostleCollisionRequest and add it to CollisionRequests
CollisionRequests.push_back(
JostleCollisionRequest(CollisionLocation, Velocity));
}
void JostleSystem::ExciteJostleSystemExplosion(
const Vector3& ExplosionLocation, const float ExplosionRadius)
{
// Create a JostleExplosionRequest and add it to ExplosionRequests
ExplosionRequests.push_back(
JostleExplosionRequest(ExplosionLocation, ExplosionRadius));
}
void JostleSystem::EvaluateCollisionRequest(
const JostleCollisionRequest& Collision,
const Vector3& ListenerPosition)
{
std::vector<JostleEmitterNode*> CandidateEmitters;
// Get all Jostle Emitter Nodes for all grid cells that overlap
// with our search box
EmitterGrid.GetElements(&SearchBox, CandidateEmitters);
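  // (A sketch of the remaining steps: try to excite each candidate node
  //  near the collision. Collision.Location and Collision.Velocity are
  //  assumed member names for the request's stored data.)
  for (JostleEmitterNode* Emitter : CandidateEmitters)
  {
    Emitter->TryExciteNodeCollision(Collision.Location, Collision.Velocity);
  }
}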
Chapter 6
Background Sounds
Guy Somberg
6.1 INTRODUCTION
Creating the feeling of being in a realistic environment is one of the
challenges of game audio. There are many individual aural components
that make a player feel as though they are in a space. The room tone –
the sound of the space itself – is just the first component of this sound-
scape. Once the room tone is established, the sound designers will want
to layer on more elements, including both loops and randomized one-
shot sounds. In a space like the tunnels of a subway, there will be a light
breeze, the creaks of the tunnel walls settling, and maybe a ghost train
moving around in the distance. In a forest, we’ll hear wind, trees
rustling, birds chirping, and a stream burbling in the distance. One
sound designer I worked with immediately wanted to add the sound of
children crying in the distance.
All of these layers could be added to a single looped sound or middle-
ware event, but without any sort of systemic randomness, players will
quickly notice the repetition inherent in the sounds. I call the collection of
these sounds and layers as well as the system that plays them “Background
Sounds,” but they go by other names in different games: ambiences, envi-
ronmental effects, etc. Depending on how much fine-grain control we
want to provide the sound designers, we can either implement a system
that allows the sound designers to create these soundscapes or use the
tools that the middleware provides to do similar work. Either way, first
we’ll need to break the sounds down into their component parts and fig-
ure out how we want to handle them.
shown in Figure 6.2. However, if we just adjust the volume of the track
or the instrument using this modulator, we will end up with potentially
undesirable effects. If we use a square wave, then the volume changes
abruptly, and we have no way to change it over time.3 If we use a sine
FIGURE 6.2 (a) Volume control modulated by a square wave. (b) Volume control
modulated by a sine wave.
FIGURE 6.3 Track volume automation curve for the BGSound Track 1 Volume
event parameter.
wave, then the volume will change smoothly, but it won’t hold at a par-
ticular volume over time.
Instead, we will solve this problem by adding an event parameter, which
we will call BGSound Track 1 Volume. First, we’ll add an automation curve
to the parameter for the track volume and give it a nice smooth curve,4 as
shown in Figure 6.3. With that curve in place, we can give the parameter a
seek speed and then apply a square wave LFO modulation to the parameter
value (Figure 6.4). Using a square wave in this way does not provide a ran-
dom time for the on-off cycle, but with a slow enough LFO, it shouldn’t be
noticeable. Although we won’t show it in this chapter, you can use a similar
scheme to adjust event parameters.
Now that we have the duration set up, we must handle the volume. We’d
like the volume to be randomized over time as the sound itself is playing,
FIGURE 6.4 Event parameter configuration showing parameter seek speed and
LFO modulation.
FIGURE 6.5 Instrument volume with Noise (Ramped) LFO modulator applied.
FIGURE 6.7 Scatter configuration showing the minimum and maximum dis-
tances configured for nonspatial sounds.
FIGURE 6.8 Scatterer instrument playlist showing nested and referenced events.
FIGURE 6.9 (a) Event parameter curve controlling event panning. (b) Random
modulator on the event parameter value.
based on those rolls. In this section, we’ll build a data- and event-driven
system to implement the desired behavior.
TABLE 6.1 Possibilities for Dice Rolls after the First One
• Choose to Play, currently playing: Select new playback parameters, then
fade to those new values. Play duration selected from "Play Time" setting.
• Choose to Play, currently not playing: Select new playback parameters.
Pitch and event parameters set instantly, and volume fades up from silence.
Play duration selected from "Play Time" setting.
• Choose Not to Play, currently playing: Volume fades down to silence.
Silent duration selected from "Silent Time" setting.
• Choose Not to Play, currently not playing: Remain silent. Silent duration
selected from "Silent Time" setting.
select the volume, pitch, 3D position, and event parameters, then dec-
rement the group count and select a time for the next roll of the dice.
The next roll time is selected by the group delay if we have any more
sounds left in the group or by the trigger delay if this is the last sound
in the group. 5
The only other item of note here is that we want to force the first roll of
the dice to fail. If we allowed the initial playback to trigger, then we would
occasionally get a cacophony of sounds immediately as the player enters a
new environment.
That is it! These algorithms should be fairly straightforward to imple-
ment. Let’s go through the process.
• Play Chance – Every time we roll the dice, this is the chance that
we’ll choose to play a sound.
• Play Time and Silent Time – If we choose to play, then we’ll pick
a random duration within the Play Time range before rolling the
dice again. If we choose not to play, then we pick a random duration
within the Silent Time range before rolling the dice again.
• Fade In Time, Fade Out Time, Fade To Time – When fading in
from silence, we’ll fade the volume over a random value in the Fade
In Time range. When fading out to silence, we’ll fade the volume
over a random value in the Fade Out Time range. And, when we’re
already playing the sound and choose to continue playing the sound,
but at a different volume, then we’ll fade the volume over a random
value in the Fade To Time range.
• Volume Control – There are two ways that we can configure vol-
ume. We can allow the sound designers to provide decibel values
and adjust the event volume explicitly. Alternatively, if the sound
designers would like a custom curve, then we can have them set up
an event parameter with a known name that the code will drive. Our
example code will support both options.
• Pitch Control – Very simple: the pitch is adjusted over time in the
same way that volume is controlled. We allow the sound designers to
configure the pitch range in cents.
• Event Parameters – Sound designers can configure event param-
eters that the system controls. We will select a value in the range
0..1 in order to keep this simple, but the ranges can be whatever
you would like, or even read the range of the value from the event
description and select from that range at runtime. Because the
range is fixed, we only need to know the name of the parameter to
control.
namespace BackgroundSoundData
{
enum class VolumeControlType
{
Decibels,
Parameter,
};
struct LoopingSoundData
{
EventDescription Event;
float PlayChance;
FloatRange PlayTime;
FloatRange SilentTime;
FloatRange FadeInTime;
FloatRange FadeOutTime;
FloatRange FadeToTime;
VolumeControlType VolumeControl;
FloatRange Volume;
FloatRange PitchCents;
std::vector<std::string> Parameters;
};
}
EventDescription Event;
float PlayChance;
IntRange GroupCount;
FloatRange GroupDelay;
FloatRange TriggerDelay;
VolumeControlType VolumeControl;
FloatRange Volume;
FloatRange PitchCents;
std::vector<std::string> Parameters;
PanStyle Panning;
};
}
There are a couple of useful helper functions that are used in the
implementation that we will present here. These functions are used
when setting the volume in order to support the dual-mode volume
control.
namespace BackgroundSoundData
{
inline float GetSilenceValue(VolumeControlType VolumeControl)
{
if (VolumeControl == VolumeControlType::Decibels)
{
return SILENCE_DB;
}
else
{
return 0.0f;
}
}
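A companion helper then applies a volume value in whichever mode the data specifies. The sketch below is illustrative: EventInstance, SetVolumeDb(), and SetParameter() stand in for your engine's playing-event API, and the parameter name is whatever known name the sound designers agree on.

inline void ApplyVolume(
  EventInstance& Event, VolumeControlType VolumeControl, float Value)
{
  if (VolumeControl == VolumeControlType::Decibels)
  {
    // Adjust the event volume directly in decibels.
    Event.SetVolumeDb(Value);
  }
  else
  {
    // Drive the sound designer's custom volume curve through an event
    // parameter with a known name.
    Event.SetParameter("BGSound Volume", Value);
  }
}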
private:
// Context for playing a single looping sound
class LoopingSoundInstance
{
public:
LoopingSoundInstance(
BackgroundSoundContext& Owner,
const BackgroundSoundData::LoopingSoundData& Data);
~LoopingSoundInstance();
private:
// Roll the dice and pick what to do
void Roll();
// Fade contexts
LerpContext VolumeFade;
LerpContext PitchFade;
std::vector<LerpContext> ParameterFades;
bool IsPlaying = false;
};
// Configuration data
const BackgroundSoundData::Ambiance& Data;
OneshotSounds.reserve(Data.OneShotSounds.size());
for (const auto& OneshotSoundData : Data.OneShotSounds)
{
OneshotSounds.emplace_back(OneshotSoundData);
}
}
float TimeToNextRoll;
ParameterFades[i].Start(FloatRange::ZeroOne);
}
}
// Our time to the next roll of the dice is selected from the
// PlayTime value.
TimeToNextRoll = Data.PlayTime.RandomValue();
}
else
{
if (IsPlaying)
{
// We have decided to stop playing, but we are currently
// playing.
// Our time to the next roll of the dice is selected from the
// SilentTime value.
TimeToNextRoll = Data.SilentTime.RandomValue();
}
There are two extra details of note here. The first is that we call different
functions on the LerpContexts for the pitch and event parameters if the
sound is starting play (Set()) than if it is already playing (Start()). This
difference is because the pitch and other parameters will start with the cor-
rect values immediately when the sound is starting from nothing – there is
no need to fade those values from a previous value. The other item of note is
that we store this object on a list of fading looping sounds when the volume
is changing value over time so that we know which ones we need to tick.
Let’s take a look at how that works:
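Here is a sketch of that bookkeeping; the FadingLoopingSounds container, the StartVolumeFade() helper, and IsDoneFading() are illustrative names rather than names taken from the original implementation:

void LoopingSoundInstance::StartVolumeFade(float FadeTime)
{
  TotalFadeTime = FadeTime;
  RemainingFadeTime = FadeTime;

  // Register with the owning context so that it knows to tick this sound
  // while its volume is changing.
  Owner.FadingLoopingSounds.insert(this);
}

void BackgroundSoundContext::Tick(float DeltaTime)
{
  for (auto It = FadingLoopingSounds.begin();
       It != FadingLoopingSounds.end();)
  {
    LoopingSoundInstance* Sound = *It;
    Sound->Tick(DeltaTime);

    // Once a sound has finished fading, we no longer need to tick it.
    if (Sound->IsDoneFading())
      It = FadingLoopingSounds.erase(It);
    else
      ++It;
  }
}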
The only other details that we need to cover are the Tick() func-
tion to set the parameters of the playing sound, and the constructor and
destructor:
LoopingSoundInstance::LoopingSoundInstance(
BackgroundSoundContext& Owner,
const BackgroundSoundData::LoopingSoundData& Data) :
Owner(Owner), Data(Data)
{
// Get the parameter fades to the correct size.
ParameterFades.resize(Data.Parameters.size());
// If the roll indicated that we're playing, then force the sound
// to start playing immediately with no fade in.
if (IsPlaying)
{
TotalFadeTime = 0.0f;
RemainingFadeTime = 0.0f;
Tick(0.0f);
}
}
LoopingSoundInstance::~LoopingSoundInstance()
{
Game.CancelTimer(NextRollTimer);
}
// Figure out our percentage of the way through the fade, making
// sure to mind an invalid fade time.
float TimeValue =
(TotalFadeTime <= 0.0f)? 1.0f :
(TotalFadeTime - RemainingFadeTime) / TotalFadeTime;
Most of the Tick() function is given over to applying the volume, pitch,
and event parameters, although it does also stop the sound when it’s done
fading out to silence. The constructor initializes some data, rolls the dice,
and then immediately forces the fade to end if the sound was played.
OneshotSoundInstance::OneshotSoundInstance(
const BackgroundSoundData::OneshotSoundData& Data) : Data(Data)
{
RollHelper(true);
}
OneshotSoundInstance::~OneshotSoundInstance()
{
Game.CancelTimer(NextRollTimer);
}
void OneshotSoundInstance::Roll()
{
RollHelper(false);
}
// Track the playing event, and reduce the remaining group count.
PlayingEvents.push_back(std::move(NewEvent));
--RemainingGroupCount;
}
// Select the time to next roll of the dice based on whether we're
// currently in a group.
float TimeToNextRoll;
if (RemainingGroupCount > 0)
{
TimeToNextRoll = Data.GroupDelay.RandomValue();
}
else
{
TimeToNextRoll = Data.TriggerDelay.RandomValue();
}
NextRollTimer =
Game.ScheduleTimer([this]() { Roll(); }, TimeToNextRoll);
}
the panning is Fixed mode, this value can be passed directly to the
event parameter, and if the panning is Surround mode, then it will
need to be scaled to an angle in the range 0..2π.
6.5 CONCLUSION
A good background sound engine can bring your game environments
to life in subtle and wonderful ways. This chapter discussed two meth-
ods for building them: using built-in tools to create an event that can
be played as normal and creating a bespoke system to implement the
feature. Ultimately, which method you choose to use will be a conversa-
tion with your sound designers, but they are both effective and powerful
techniques.
NOTES
1 Early versions of this technology that I built referred to these as “intraset
delays” and “interset delays,” respectively, which I thought was very clever.
My sound designers eventually rebelled at this nomenclature, and I had to
change the names. This chapter will refer to them as group delays and trig-
ger delays, respectively.
2 Version 2.02.13, which is the latest as of this writing. Doubtless, a newer ver-
sion will be available by the time it goes to print.
3 Future versions of FMOD (to be released after this book is written – see
previous footnote) are supposed to include per-property seek speeds, which
will solve this issue and simplify the entire setup. When they do materialize,
it will be possible to remove the event property and just use the square wave
LFO, assuming that the fade characteristics are amenable to your sound
designer.
4 This curve is what would be missing from using the property seek speeds
mentioned in the previous footnotes.
5 I find it interesting that despite the fact that the one-shot sounds have more
configuration complexity, the playback complexity is actually much simpler
than the looping sounds.
Chapter 7
Data-Driven Music Systems for Open Worlds
Michelle Auyoung
7.1 INTRODUCTION
For vast open world games, there is a lot of space to fill both visually
and aurally. To help fill the sonic space, a data-driven and responsive
music system can provide a flexible tool to convey evolving gameplay
elements of the world by connecting to the emotional experience of the
music. With this responsive approach between gameplay and music, the
player experience becomes more immersive. The fundamental idea is
to parameterize gameplay data to map to music states and other real-
time parameter controls. Gameplay systems like combat with varying
enemy difficulty and progression of time in the game world can hook
into the music system to respond to these events, driving music state
changes throughout these encounters. To add variability to the music
system, other randomized techniques such as music data controls and
other overrides may help. In scenarios with narrative or scripted events,
this dynamic music system can be disabled. There is no one music system
to handle every gameplay situation, but a data-driven approach can help
with faster and more flexible music implementation for large levels with
various encounters.
Music systems with multiple layers and looping clips driven by game-
play parameters are a common approach to implementing game music,
but this chapter will explore more details on how the data can be driven
to help create a more dynamic musical soundscape. Some common game
engine features and gameplay systems will be discussed to explain how
they can influence music cues. Music data can contain overrides, ran-
domized subsets of higher level music state categories, and variable delay
thresholds.
A simple music manager class can have functions to get music data,
play and stop the music, and set music states, as well as member variables
to store the current music data and state that should be playing. Other
gameplay systems can then drive the music playback by feeding gameplay
data through parameters to control music transitions and fading between
different instrument layers. Some of the music manager functions may be
exposed so designers can script certain music behaviors when the data-
driven dynamic music system is disabled. Here is a basic music manager class:
class MusicManager
{
public:
void Initialize();
void Deinitialize();
void PlayMusic();
void StopMusic();
private:
MusicSettings* MusicSettings;
MusicData* AreaMusicData;
float CurrentThreat;
float CurrentHour;
MusicThreshold* GetThreatMusicData();
MusicThreshold* GetTimeOfDayMusicData();
};
events at crucial story points. With the dynamic music system enabled,
separate overrides allow for smaller spaces within a larger level to trigger
unique music when inside the bounds of that area. The goal is to provide
enough flexibility for controlling music changes without overcomplicating
the data and allowing natural gameplay progressions to influence organic
music transitions during non-scripted moments.
struct AmbientStateCycle
{
FloatInterval DelayRange;
Array<MusicState*> AmbientMusicStates;
};
struct MusicThreshold
{
FloatInterval Threshold;
MusicState* MusicState;
AmbientStateCycle StateCycling;
bool EnableAmbientStateCycling;
};
class MusicData
{
AudioEvent* PlayEvent;
AudioEvent* StopEvent;
MusicState* LevelTransitionState;
Array<MusicThreshold> ThreatMusicStateThresholds;
Array<MusicThreshold> TimeOfDayMusicStateThresholds;
};
gameplay systems can map to. In each container of music data thresholds,
an optional nested subset of ambient music states for each threshold range
can be randomly cycled through to add more variation to each thresh-
old. To add another layer of variability, a range of delays can be applied to
adjust the timing of the music state changes and reduce the abruptness of
the transitions during the random cycling.
to reduce the repetition of staying in the same state and looping the same
music layers too often. For less common threat values such as combat
experiences with more than ten enemies, this threshold may not need to
use ambient state cycling since a level would most likely only have one or
two encounters with that intensity.
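The threshold lookup itself can be quite small. Here is a sketch of what GetThreatMusicData() might do, assuming that the Array type is iterable, that FloatInterval exposes a Contains() check, and that MusicManager has access to the MusicData members (all assumptions; the original text does not show this function):

MusicThreshold* MusicManager::GetThreatMusicData()
{
  if (!AreaMusicData)
    return nullptr;

  // Return the first threshold whose range contains the current threat
  // value that the combat system has fed into the music system.
  for (MusicThreshold& Threshold : AreaMusicData->ThreatMusicStateThresholds)
  {
    if (Threshold.Threshold.Contains(CurrentThreat))
      return &Threshold;
  }
  return nullptr;
}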
For music specific to certain areas or regions on the map, we can assign
music override states to ambient volume actors. When the player enters a
volume, the music state can be overridden with the state assigned for that
volume to create location-based music. We keep track of the current state
in the music manager so that we can resume the music when leaving the
volume back to what it was before the override was applied. Many volumes
are primarily used to play ambient sound beds, but any volume can trig-
ger a music override state, which would not need to be defined in the level
music data itself to function. Therefore, not all volumes need to have an
override music state assigned. If an ambient volume does have an override
music state assigned, the music system will set or clear that override state
when the player enters or exits that volume.
example, the character can usually gain experience points throughout the
game, slide between good and evil dispositions, and have different party
dynamics with diverse companions. The music data should be generic
enough that any of these features from the game could utilize it through
the music system.
7.8 SUMMARY
To summarize, this is just one of many ways to implement a dynamic music
system, but we found this to work well with a simple framework that still
offers flexibility. The main goal was to fill open world spaces with music that
would automatically evolve with the gameplay, driven by the different game
systems. This music system is not fully procedural or scripted, but we were
able to organize and map the music to game data, so the different gameplay
systems could organically orchestrate the music. It should provide a good
middle ground for implementing music in large open world games.
I would like to thank Raison Varner for designing and providing the
musical framework for me to build the dynamic music system on the game
engine side. Also, I am grateful for the opportunity and support from
many of my peers at Obsidian Entertainment.
II
Low-Level Topics
Chapter 8
Finding the Intersection of a Box and a Circle
Guy Somberg
8.1 INTRODUCTION
In the second volume of this series, Game Audio Programming: Principles
and Practices Volume 2, Chapter 12 “Approximate Position of Ambient
Sounds of Multiple Sources” by Nic Taylor shows how to determine the
contribution of a sound with multiple sources to a single event’s direction,
spread, and magnitude. It shows two solutions: one using a collection of
rectangles and the other using a uniform grid or set of points. For the solu-
tion using the collection of rectangles, we need to find the intersection of
a circle with a rectangle. Figure 8.1 is a duplicate of Figure 12.11 from that
volume, which shows how the subdivision would work. Section 12.11 of
that volume describes it thus:
The other detail is how to clip a rectangle with a sphere. Doing this
adds many different cases to solve. Instead, a rectangle not fully
inside or outside the sphere can be subdivided and each subdivi-
sion either subdivided further or solved as an independent rect-
angle. The process can be repeated within some error tolerance.
That chapter doesn’t go into greater detail about how to solve this prob-
lem, which is what we will do here.
FIGURE 8.1 Subdividing rectangles. This is the same as Figure 12.11 from Game
Audio Programming: Principles and Practices Volume 2, Chapter 12 by Nic Taylor.
8.2 PSEUDOCODE
We can build the pseudocode for the solution to this problem with just a cou-
ple of observations. First, if the box is fully outside of the circle, then it cannot
be part of the output set. Next, if the box is fully within the circle, then it is
for sure part of the output set, so it gets added. Finally, we split the box into
its four quadrants and recurse into each of them. The only other thing is that
we need a way to break the recursion. So, if the box is at a minimum size and
intersects with the circle, then we’ll add it to the output set without recursion.
Written out, it looks like this:
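(A sketch following the description above.)

SubdivideBox(Box, Circle, BoxList):
  If Box does not intersect Circle:
    Return                           (fully outside: contributes nothing)
  If Box is fully inside Circle:
    Add Box to BoxList and return    (fully inside: always included)
  If Box is at the minimum size:
    Add Box to BoxList and return    (intersects: include it without recursing)
  Split Box into its four quadrants
  For each Quadrant:
    SubdivideBox(Quadrant, Circle, BoxList)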
Now that we have those two helper functions out of the way, we can write
the actual function, which exactly follows the shape of our pseudocode:
{
// If the box is outside of the circle, then we're done.
if (!DoesBoxIntersectCircle(Box, Circle))
return;
Figure 8.2 shows an example output from this algorithm. The non-
shaded rectangles are the resulting output, and the shaded rectangles
are outside of the output. We could stop here and run the resulting 49
rectangles through the volumetric sound code, but that’s a lot of boxes.
We should see whether there is anything that we can do to reduce that
count.
FIGURE 8.3 Circle-box intersection from Figure 8.2, with rectangles labeled A,
B, and C depending on how many of the subdivided rectangles are included in
the output.
with all four of the quadrants included in the output, B for rectangles with
two adjacent quadrants included in the output, and C for rectangles with
three quadrants included in the output.
As we implement optimizations, we must be cognizant of diminishing
returns. We must balance the complexity of the code, the complexity of the
analysis, and the amount of gain that we get. There exists some optimally
minimal number of boxes that we can reach, but for our use case we are
willing to accept a best effort that gets us close.
FIGURE 8.4 The result of recursively replacing boxes subdivided into four with
the larger box.
by applying the optimization at every level. That is, the smallest boxes
get collapsed into a single box, which leaves the next level up with four
equal-size boxes all in the output, which get collapsed together until
we get to a single large box. After applying this optimization, we are
left with just 25 boxes, just over half of what we started with, which is
a great success!
In order to implement this optimization, we will make two changes to
our code:
return false;
}
Rather than erasing all four entries and then immediately adding one
back, we can just erase the last three and then overwrite the last one:
BoxList.erase(BoxList.end() - 3, BoxList.end());
BoxList.back() = Box;
TABLE 8.1 The different possibilities at the end of the output vector and
the actions that we will take.
Index   Configuration   Action          Merge Offset   Erase Offset
0       XY Z            Merge X and Y   3              2
1       X YZ            Merge Y and Z   2              1
2       XY Z            Merge X and Z   3              1
3       XY              Merge X and Y   2              1
TABLE 8.2 The quadrants included in the output for each index value, and
the corresponding merge and erase actions.
Index   Quadrants       Merge           Erase
1       BL              None            None
2       BR              None            None
3       BL, BR          BL, BR          1
4       TR              None            None
6       BR, TR          BR, TR          1
8       TL              None            None
9       BL, TL          BL, TL          1
12      TR, TL          TR, TL          1
{
// If the box is outside of the circle, then we're done.
if (!DoesBoxIntersectCircle(Box, Circle))
return false;
Subdivisions[SubdivisionCorner::BottomLeft] =
{ Center - Extents, Center };
Subdivisions[SubdivisionCorner::TopRight] =
{ Center, Center + Extents };
Subdivisions[SubdivisionCorner::TopLeft] =
Subdivisions[SubdivisionCorner::BottomLeft].ShiftBy(
Vector2D{ 0.0f, Extents.Y });
Subdivisions[SubdivisionCorner::BottomRight] =
Subdivisions[SubdivisionCorner::TopRight].ShiftBy(
Vector2D{ 0.0f, -Extents.Y });
{
SubdivisionResults[i] =
SubdivideBox(Subdivisions[i], Circle, BoxList);
}
// These are the first three entries from Table 8-1, which tell
// us what to do in each configuration.
static constexpr
std::array<SubdivisionAction, 3> SubdivisionActions =
{ {
{ 3, 2 },
{ 2, 1 },
{ 3, 1 },
} };
// This function returns true if the input box was added in its
// entirety, which we never do in this case.
return false;
}
}
uint8_t SubdivisionSum =
SubdivisionResults[SubdivisionCorner::BottomLeft] * 8 +
SubdivisionResults[SubdivisionCorner::BottomRight] * 4 +
SubdivisionResults[SubdivisionCorner::TopRight] * 2 +
SubdivisionResults[SubdivisionCorner::TopLeft] * 1;
With the index calculated, we special-case the value 15 where all four
quadrants are included. The remainder of the function encodes the func-
tionality described in Tables 8.1 and 8.2, and much of the code length is
taken up with compile-time tables. The actual code looks up the correct
action to take from the table, merges the two boxes together, and then
erases one of the boxes.
8.6 RESULTS
After applying the code from Section 8.5, we get the results from
Figure 8.5, which reduces the total box count from 25 down to 16 - a
reduction of roughly 1/3 from the previous step, and less than 1/3 of the
original count, which is pretty good! This count is not fully optimal –
with some manipulation we can get down to about nine rectangles,
but doing that is well into diminishing returns. 3 We’ll stop here with a
drastic reduction in count that we can now run through our volumetric
sound code.
FIGURE 8.5 The resulting boxes after running through the code from Section 8.5.
NOTES
1 Applying the results of a Karnaugh Map to a function is an interesting exer-
cise, but attempting to use the grouping rules for this example will result in
a solution that is not much simpler than simply writing out the entire table
as a sequence of if() statements.
2 I was about to type in “without loss of generality” until I realized that you do
lose generality. Expanding the formula by hand will require minor changes if
you expand this function to operate on spheres and 3D boxes. Nevertheless,
if you do not expect to make this change, then the non-algorithm code may
end up being more readable – the generated assembly is nearly identical.
3 Plus, I haven’t actually figured out the algorithm to get the optimal count!
Chapter 9
Building a Pitch Tracker: Fundamentals
David Su
Begin simultaneously with the others. Sing any pitch. The maxi-
mum length of the pitch is determined by the breath. Listen to the
group. Locate the center of the group sound spectrum. Sing your
pitch again and make a tiny adjustment upward or downward,
but tuning toward the center of the sound spectrum. Continue
to tune slowly, in tiny increments toward the center of the
spectrum …
PAULINE OLIVEROS, SONIC MEDITATIONS XVI
9.1 INTRODUCTION
How are our ears able to distinguish between two musical notes and can
we build a digital audio system that can make that same distinction? More
generally, how can we create audio systems that listen to sounds and pro-
vide us with useful information about their properties? In addition, what
can we do to ensure that such systems run reliably in a real-time interac-
tive environment such as a video game? This is the bread and butter of
pitch tracking in games, which we’ll be exploring in this chapter as well as
in Chapter 10.
Both chapters are inspired in large part by the work I did as the audio
programmer on Bad Dream Games’ One Hand Clapping, a game in which
the player sings into their microphone to solve musical puzzles. The spe-
cific approaches and implementation details might differ, but the pitch
tracking in that game is built upon many of the same principles that we’ll
explore in these chapters.
9.1.2 Motivation
Probably the most common use case for monophonic pitch tracking is in
games that involve live musical input, especially karaoke-based games in
which players sing along to existing music – examples include the games
SingStar, Karaoke Revolution, and Rock Band. In such games, the audio
system must estimate the player’s input pitch so that the game can then
measure that input pitch’s distance to a target pitch (i.e., the current note
in a melody) and thus judge how accurate the player’s singing is.
Our plugin will also contain a set of parameters (or properties) that
allow the user to control various aspects of the analysis process (the user in
this case might be an audio programmer, sound designer, or even the player
themselves!). As long as we define these parameters in the implementation
of the AK::IAkPluginParam interface (which in this case we’ll implement
via a GapTunerFXParams struct), we’ll have access to the parameters via
our GapTunerFX class’ m_PluginParams member variable, which we set in
our Init() method.
This is what our GapTunerFX class might look like to begin with:
// ----------------------------------------------------------------
// GapTunerFX.h
#include "GapTunerFXParams.h"
public:
// Constructor
GapTunerFX() = default;
// Initialize plugin
AKRESULT Init(AK::IAkPluginMemAlloc* InAllocator,
AK::IAkEffectPluginContext* InContext,
AK::IAkPluginParam* InParams,
AkAudioFormat& InFormat) override;
// Terminate plugin
AKRESULT Term(AK::IAkPluginMemAlloc* InAllocator) override;
private:
};
// ----------------------------------------------------------------
// GapTunerFX.cpp
#include "GapTunerFX.h"
return AK_Success;
}
AkRtpcID OutputPitchParamId =
m_PluginParams->NonRTPC.OutputPitchParameterId;
AK::IAkGlobalPluginContext* GlobalContext =
m_PluginContext->GlobalContext();
AKRESULT Result =
GlobalContext->SetRTPCValue(OutputPitchParamId,
OutputPitchParamValue);
}
Figure 9.1 shows what the UI for our plugin looks like with this output
pitch parameter ID set to the ID of a Pitch RTPC in a Wwise project.
FIGURE 9.1 GapTuner plugin UI, with the output pitch parameter ID hooked
up to an RTPC.
// ------------------------------------------------------------
// Additional convenience methods (not in original implementation)
// Get the sample at a specific index offset from the read index
SampleType At(uint32_t InIndex) const
{
const uint32_t ReadIndex = ReadCounter.load();
const uint32_t SampleIndex = (ReadIndex + InIndex) % Capacity;
const SampleType SampleValue = InternalBuffer[SampleIndex];
return SampleValue;
}
// ----------------------------------------------------------------
// GapTunerFX.h
// ...
private:
// ...
CircularAudioBuffer<float> m_AnalysisWindow { };
uint32_t m_AnalysisWindowSamplesWritten { 0 };
};
// ----------------------------------------------------------------
// GapTunerFX.cpp
// ----
// Allocate memory for analysis window
const uint32_t WindowSize = m_PluginParams->NonRTPC.WindowSize;
return AK_Success;
}
We can then fill the analysis window with samples each time Execute()
is called (i.e., during each audio block). We’ll also keep a separate count of
the samples written so far via m_AnalysisWindowSamplesWritten:
// ----------------------------------------------------------------
// GapTunerFX.cpp
m_AnalysisWindowSamplesWritten += NumSamplesPushed;
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.h
#include <AK/SoundEngine/Common/IAkPlugin.h>
#include "CircularAudioBuffer/CircularAudioBuffer.h"
namespace GapTunerAnalysis
{
// Fill an analysis window with samples from an input audio buffer
uint32_t FillAnalysisWindow(
AkAudioBuffer* InBuffer,
CircularAudioBuffer<float>& InOutWindow);
}
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
#include "GapTunerAnalysis.h"
namespace GapTunerAnalysis
{
uint32_t FillAnalysisWindow(AkAudioBuffer* InBuffer,
CircularAudioBuffer<float>& InOutWindow)
{
const uint32_t NumChannels = InBuffer->NumChannels();
const uint32_t NumSamples = InBuffer->uValidFrames;
uint32_t NumSamplesPushed = 0;
SampleValue += pBuf[SampleIdx];
}
// Set analysis window read index to new write index (now that
// we've pushed samples), so that we're always reading an entire
// window's worth of samples
InOutWindow.AlignReadWriteIndices();
return NumSamplesPushed;
}
}
correlation(x, y) = Σ_{n=0}^{N−1} x_n · y_n
9.3.2 Autocorrelation
Putting together our definitions of correlation and lag, the autocorrela-
tion function (often shortened to ACF) for an input signal x containing N
samples can be defined as follows, outputting a series of real numbers that
each correspond to the autocorrelation for a given lag t:
ACF(x) = correlation(x, lag(x, t)) : t ∈ {0, …, N − 1}

       = Σ_{n=0}^{N−1} x_n · x_{n−t} : t ∈ {0, …, N − 1}, where any term with n − t < 0 is treated as 0 (since lag(x, t) is zero-padded)
The last step we’ll take is to normalize our ACF output so that all the
values end up between –1 and 1. To do this, we divide all our output cor-
relations by the correlation of the first lag:
ACF_normalized(x) = correlation(x, lag(x, t)) / correlation(x, lag(x, 0)) : t ∈ {0, …, N − 1}

                  = ( Σ_{n=0}^{N−1} x_n · x_{n−t} ) / ( Σ_{n=0}^{N−1} x_n² ), again treating terms with n − t < 0 as 0
// ----------------------------------------------------------------
// GapTunerAnalysis.h
// ...
#include <vector>
namespace GapTunerAnalysis
{
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
// ...
#include <assert.h>
namespace GapTunerAnalysis
{
// ...
void CalculateAcf(
const CircularAudioBuffer<float>& InAnalysisWindow,
std::vector<float>& OutAutocorrelations)
{
assert(
InAnalysisWindow.GetCapacity() == OutAutocorrelations.size());
// Normalize
const float FirstCorrelation = OutAutocorrelations[0];
const float NormalizeMultiplier = FirstCorrelation != 0.f
? 1.f / FirstCorrelation
: 1.f;
float CalculateAcfForLag(
const CircularAudioBuffer<float>& InSamples,
const uint32_t InLag)
{
const size_t WindowSize = InSamples.GetCapacity();
float Sum = 0.f;
return Sum;
}
}
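For reference, the elided body of CalculateAcfForLag() might look something like this brute-force version, which directly implements the ACF sum above by starting the sum at n = t so that out-of-range lagged samples contribute nothing (a sketch, not the original implementation):

float CalculateAcfForLag(
    const CircularAudioBuffer<float>& InSamples,
    const uint32_t InLag)
{
    const size_t WindowSize = InSamples.GetCapacity();
    float Sum = 0.f;

    // Correlate the window with a copy of itself delayed by InLag samples.
    // At() reads relative to the aligned read index, so indices
    // 0..WindowSize-1 span exactly one analysis window.
    for (size_t SampleIdx = InLag; SampleIdx < WindowSize; ++SampleIdx)
    {
        Sum += InSamples.At(static_cast<uint32_t>(SampleIdx)) *
               InSamples.At(static_cast<uint32_t>(SampleIdx - InLag));
    }

    return Sum;
}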
FIGURE 9.2 Plot of autocorrelation coefficients for a sine wave playing at 440 Hz,
with a window size of 2,048 samples and sample rate of 48,000 Hz.
// ----------------------------------------------------------------
// GapTunerAnalysis.h
// ...
namespace GapTunerAnalysis
{
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
// ...
namespace GapTunerAnalysis
{
// ...
}
// ----------------------------------------------------------------
// GapTunerAnalysis.h
// ...
namespace GapTunerAnalysis
{
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
// ...
namespace GapTunerAnalysis
{
// ...
uint32_t FindAcfPeakLag(
const std::vector<float>& InAutocorrelations)
{
const size_t WindowSize = InAutocorrelations.size();
uint32_t PeakLag = 0;
float PeakCorr = 0.f;
bool bReachedFirstZeroCrossing = false;
return PeakLag;
}
}
We can then take that peak lag, convert it to Hz, and set the output pitch
parameter. This is what our GapTunerFX class looks like with all that in place,
with the bulk of the work being done in the Execute() method:
// ----------------------------------------------------------------
// GapTunerFX.h
#include <AK/SoundEngine/Common/IAkPlugin.h>
#include "CircularAudioBuffer/CircularAudioBuffer.h"
#include "GapTunerFXParams.h"
public:
// Constructor
GapTunerFX() = default;
// Initialize plugin
AKRESULT Init(AK::IAkPluginMemAlloc* InAllocator,
AK::IAkEffectPluginContext* InContext,
AK::IAkPluginParam* InParams,
AkAudioFormat& InFormat) override;
// Terminate plugin
AKRESULT Term(AK::IAkPluginMemAlloc* InAllocator) override;
private:
// -------
// Analysis members
};
// ----------------------------------------------------------------
// GapTunerFX.cpp
#include "GapTunerFX.h"
#include <AK/AkWwiseSDKVersion.h>
#include "GapTunerAnalysis.h"
#include "../GapTunerConfig.h"
// ----
// Allocate memory for analysis window
const uint32_t WindowSize = m_PluginParams->NonRTPC.WindowSize;
m_AutocorrelationCoefficients.resize(WindowSize);
return AK_Success;
}
m_AnalysisWindowSamplesWritten += NumSamplesPushed;
// ----
// Perform analysis
// Autocorrelation
GapTunerAnalysis::CalculateAcf(m_AnalysisWindow,
m_AutocorrelationCoefficients);
// ----
// Peak picking
const uint32_t PeakLag =
GapTunerAnalysis::FindAcfPeakLag(m_AutocorrelationCoefficients);
// ----
// Conversion
const float PeakFrequency =
GapTunerAnalysis::ConvertSamplesToHz(static_cast<float>(PeakLag),
m_SampleRate);
// ----
// Set output parameters
AkRtpcID OutputPitchParamID =
m_PluginParams->NonRTPC.OutputPitchParameterId;
FIGURE 9.4 Wwise Game Sync monitor showing the tracked pitch of the 440 Hz
sine wave.
// Set RTPC
AK::IAkGlobalPluginContext* GlobalContext =
m_PluginContext->GlobalContext();
AKRESULT Result =
GlobalContext->SetRTPCValue(OutputPitchParamID,
OutputPitchParamValue);
}
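The ConvertSamplesToHz() helper in the GapTunerAnalysis namespace is not shown in the excerpt; a minimal sketch consistent with how it is called above (the sample rate parameter type is an assumption) might be:

float ConvertSamplesToHz(const float InNumSamples, const uint32_t InSampleRate)
{
    // A period of InNumSamples samples at InSampleRate corresponds to a
    // frequency of InSampleRate / InNumSamples Hz.
    return InNumSamples > 0.f
        ? static_cast<float>(InSampleRate) / InNumSamples
        : 0.f;
}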
Figure 9.4 shows the output pitch in the Wwise Game Sync Monitor
(again with a sine wave at 440 Hz). We can see that the chosen peak does
indeed correspond to 440.367 Hz.
9.4 CONCLUSION
We’ve successfully built a pitch tracker! Figure 9.5 shows the UI for our
Wwise plugin, with its window size and output pitch parameters. However,
our current implementation isn’t something that can really be used in a real game just yet – we’ll address its efficiency and accuracy in Chapter 10.
NOTES
1 For more information on this topic, see Christopher Dobrian’s “Frequency
and Pitch” lesson from his Physics of Sound educational materials [3].
2 “Gap,” short for “Game Audio Programming” – this book – rather than the
word “gap” – Ed.
3 Note that we index starting from 0 instead of 1 in order to maintain consis-
tency with the corresponding implementation in code.
4 Note that the preferred method for storing audio buffers in Wwise
plugins is either via the AkArray class or via raw arrays in conjunction
with AK_PLUGIN_ALLOC(). In our case we’ll stick with std::vectors for
the sake of generality and compatibility.
5 Technically, there are ways to get more precision out of our data – for more
on that topic, check out Section 10.2.3 of this volume.
REFERENCES
[1] Audio plug-ins. In Wwise SDK 2022.1.2. Audiokinetic. https://www.
audiokinetic.com/en/library/edge/?source=SDK&id=effectplugin.html
[2] Geller, E. (2020). Building the patch cable. In Game Audio Programming:
Principles and Practices Volume 3 (pp. 93–118). CRC Press. https://www.
routledge.com/Game-Audio-Programming-3-Principles-and-Practices/
Somberg/p/book/9780367348045
[3] Dobrian, C. (2019). Frequency and pitch. In Physics and Sound. https://
dobrian.github.io/cmp/topics/physics-of-sound/1.frequency-and-pitch.
html
Chapter 10
Building a Pitch Tracker: Practical Techniques
David Su
… Each time sing a long tone with a complete breath until the whole
group is singing the same pitch. Continue to drone on that center
pitch for about the same length of time it took to reach the unison.
Then begin adjusting or tuning away from the center pitch as the
original beginning was.
PAULINE OLIVEROS, SONIC MEDITATIONS XVI
10.1 INTRODUCTION
In Chapter 9, we wrote a pitch detector that works, but isn’t efficient and
has accuracy issues. In this chapter, we’ll continue that development to
make a tool that we can use in a real game.
As a quick refresher, our GapTuner plugin currently computes the auto-
correlation function for all lag values in an analysis window, picks the lag
value with the highest (“peak”) autocorrelation, converts that lag value to
frequency, and finally outputs that frequency as the estimated pitch. This
all happens in the plugin’s Execute() function, which is called every audio
frame.
FIGURE 10.1 (a) Tracked pitch for the first four measures of Prélude à l’après-midi
d’un faune. (b) Musical notation for the first four measures of Prélude à l’après-midi
d’un faune.
FIGURE 10.2 Zoomed in portion of the tracked pitch from Figure 10.1a, showing
octave errors in the first C# note.
// ----------------------------------------------------------------
// GapTunerAnalysis.h
#include <vector>
#include <AK/SoundEngine/Common/IAkPlugin.h>
#include "CircularAudioBuffer/CircularAudioBuffer.h"
namespace GapTunerAnalysis
{
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
#include "GapTunerAnalysis.h"
#include <assert.h>
namespace GapTunerAnalysis
{
// ...
// Reset
MaximaLag = 0;
MaximaCorr = 0.f;
}
else if (PrevCorr > 0.f && NextCorr < 0.f) // Negative slope
{
bReachedNextPositiveZeroCrossing = false;
}
}
Our job is then to pick the most appropriate peak out of those key
maxima. To do so, we’ll define a threshold multiplier and multiply it by
the highest valued maxima – this is our threshold. We can then take the
first key maxima whose correlation value is above the threshold. This
gives us a chance to pick the fundamental frequency even in cases where
subharmonics might have a higher correlation. In general, lower harmonics
tend to be a bigger issue for pitch detectors than upper harmonics – if you
look again at Figure 10.2, you’ll notice that all the octave errors result from
downward jumps rather than upward ones. Figure 10.3 shows a plot of the
key maxima for a single analysis window for a sine wave at 440 Hz. Here’s
what this peak-picking process looks like in code.
// ----------------------------------------------------------------
// GapTunerAnalysis.h
// ...
namespace GapTunerAnalysis
{
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
// ...
namespace GapTunerAnalysis
{
// ...
uint32_t PickBestMaxima(
const std::vector<float>& InKeyMaximaLags,
const std::vector<float>& InKeyMaximaCorrelations,
const uint32_t InNumKeyMaxima,
const float InThresholdMultiplier)
{
// This is the index in the array of maxima, not the actual lag
uint32_t HighestMaximaIdx = 0;
float HighestMaximaCorr = 0.f;
{
const float MaximaCorr = InKeyMaximaCorrelations[MaximaIdx];
return BestMaximaIdx;
}
}
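For reference, here is one way the elided body of PickBestMaxima() might be completed, following the threshold procedure just described (the loop structure and any local names beyond those shown are assumptions, not the original implementation):

uint32_t PickBestMaxima(
    const std::vector<float>& InKeyMaximaLags,
    const std::vector<float>& InKeyMaximaCorrelations,
    const uint32_t InNumKeyMaxima,
    const float InThresholdMultiplier)
{
    // First pass: find the highest-valued maxima
    uint32_t HighestMaximaIdx = 0;
    float HighestMaximaCorr = 0.f;
    for (uint32_t MaximaIdx = 0; MaximaIdx < InNumKeyMaxima; ++MaximaIdx)
    {
        const float MaximaCorr = InKeyMaximaCorrelations[MaximaIdx];
        if (MaximaCorr > HighestMaximaCorr)
        {
            HighestMaximaCorr = MaximaCorr;
            HighestMaximaIdx = MaximaIdx;
        }
    }

    // Second pass: accept the first maxima whose correlation clears the
    // threshold, which biases the pick away from subharmonics
    const float Threshold = HighestMaximaCorr * InThresholdMultiplier;
    uint32_t BestMaximaIdx = HighestMaximaIdx;
    for (uint32_t MaximaIdx = 0; MaximaIdx < InNumKeyMaxima; ++MaximaIdx)
    {
        if (InKeyMaximaCorrelations[MaximaIdx] >= Threshold)
        {
            BestMaximaIdx = MaximaIdx;
            break;
        }
    }

    // InKeyMaximaLags is not needed for the selection itself; the caller
    // uses the returned index to look up the corresponding lag
    (void)InKeyMaximaLags;
    return BestMaximaIdx;
}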
We can now plug this new peak-picking process into our Execute()
method:
// ----------------------------------------------------------------
// GapTunerFX.h
// ...
// ...
private:
// -------
// Analysis members
// ...
// ----------------------------------------------------------------
// GapTunerFX.cpp
// ----
// Perform analysis (same as before)
// ...
// ----
// Peak picking
const uint32_t MaxNumKeyMaxima =
m_PluginParams->NonRTPC.MaxNumKeyMaxima;
// ----
// Set output parameters
Figure 10.4a shows how the tracked pitch looks for those same flute
notes, with a threshold multiplier of 0.95. This looks a little better, espe-
cially on the static notes, but we can still see some discontinuities. Perhaps
we need to decrease our threshold a bit more? Figure 10.4b shows the
tracked pitch with a threshold multiplier of 0.8. The static notes are even
smoother, which is good, but now we have occasional upward octave
jumps! This indicates that our thresholding is biasing too strongly against
lower frequencies, leading us to pick upper harmonics instead of the fun-
damental frequency.
It’s worth playing around with the threshold value a bit to find what
works best for your exact use case – it can be a delicate balancing act.
Figure 10.4c shows the tracked pitch with a threshold multiplier of 0.88
(close to midway between 0.8 and 0.95). It’s smoother than Figure 10.4a
(notice especially the second C#, which has no octave errors at all now)
and has many fewer upward octave jumps than Figure 10.4b (only one as
opposed to seven) and so feels like a nice compromise between upper and
lower octave errors. Still, can we do better?
FIGURE 10.4 (a) Tracked pitch for first four measures of Prélude à l’après-midi
d’un faune, with a key maxima threshold multiplier of 0.95. (b) Tracked pitch
for first four measures of Prélude à l’après-midi d’un faune, with a key maxima
threshold multiplier of 0.8. (c) Tracked pitch for first four measures of Prélude à
l’après-midi d’un faune, with a key maxima threshold multiplier of 0.88.
example, when we hear a flute playing a note, we can identify a clear fun-
damental frequency, which is why we think of it as pitched (and can say
something like “the flute is playing a C#”). On the other hand, when the
flute player takes a breath, the sound is much more noisy, and it’s generally
hard for us to pinpoint a specific frequency that’s stronger than the others.
As an exercise, try identifying the pitch of your own breathing – it’s tough!
We can use the actual correlation value of our chosen peaks as a rough
estimate of how “pitched” a sound is. In other words, how strongly does
our input signal actually match the peak pitch that we’ve chosen? How
confident are we in our pitch estimate? The MPM calls this the clarity
measure, and we can use this measure to define a clarity threshold. If
the peak correlation value is above this threshold (i.e., we’re confident
enough in our pitch estimate), then we’ll accept the estimated pitch and
update our output parameter – otherwise, we’ll discard that value and
wait for the next frame. We can accomplish this by defining a bSetRtpc
condition:
// ----------------------------------------------------------------
// GapTunerFX.cpp
// ...
// Set RTPC
AK::IAkGlobalPluginContext* GlobalContext =
m_PluginContext->GlobalContext();
AKRESULT Result =
GlobalContext->SetRTPCValue(OutputPitchParamId,
OutputPitchParamValue);
}
}
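The listing above elides how bSetRtpc itself is computed. A minimal sketch, assuming a clarity threshold parameter named ClarityThreshold and a BestMaximaIdx local returned from PickBestMaxima() (both names are assumptions), might be:

// Clarity check: how strongly does the chosen peak actually correlate?
const float ClarityThreshold = m_PluginParams->NonRTPC.ClarityThreshold;
const float PeakClarity = m_KeyMaximaCorrelations[BestMaximaIdx];
const bool bSetRtpc = PeakClarity >= ClarityThreshold;

if (bSetRtpc)
{
    // ... set the output pitch RTPC as in the listing above ...
}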
Figure 10.5 shows how the tracked pitch now looks, with a clarity
threshold of 0.88 along with a maxima threshold of 0.88 – definitely a lot
better than before!
FIGURE 10.5 Tracked pitch for first four measures of Prélude à l’après-midi
d’un faune, with a key maxima threshold multiplier of 0.88 and clarity thresh-
old of 0.88.
control over the sample rate (and even if we did, we likely wouldn’t want to
change it just for the sake of pitch detection!).
Luckily, we can use successive parabolic interpolation to arrive at
peaks “in between” samples, which allows us to improve precision with-
out messing with the sample rate. For each key maximum, we can also
take its left and right neighbors, fit a parabolic curve to those three val-
ues, and use the maximum point of that parabola as our estimate. In
code, this looks like:
// ----------------------------------------------------------------
// GapTunerAnalysis.h
// ...
namespace GapTunerAnalysis
{
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
// ...
namespace GapTunerAnalysis
{
// ...
float FindInterpolatedMaximaLag(
const uint32_t InMaximaLag,
const std::vector<float>& InAutocorrelations)
{
const size_t WindowSize = InAutocorrelations.size();
auto InterpolatedLag = static_cast<float>(InMaximaLag);
// ----
// Perform interpolation calculation
return InterpolatedLag;
}
// ...
MaximaCorr = Corr;
}
}
// ...
}
}
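The interpolation calculation itself is elided above; a minimal sketch of that step, using the standard three-point quadratic (parabolic) peak formula and assuming InMaximaLag has a valid neighbor on each side, might be:

// Neighboring correlation values around the integer peak lag
const float Left   = InAutocorrelations[InMaximaLag - 1];
const float Center = InAutocorrelations[InMaximaLag];
const float Right  = InAutocorrelations[InMaximaLag + 1];

const float Denominator = Left - 2.f * Center + Right;
if (Denominator != 0.f)
{
    // Vertex of the parabola through the three points, expressed as an
    // offset in the range (-0.5, 0.5) relative to the center lag
    const float Offset = 0.5f * (Left - Right) / Denominator;
    InterpolatedLag += Offset;
}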
Figure 10.6 shows our pitch tracker’s output with parabolic interpola-
tion incorporated. At a glance, it doesn’t look much different from before.
However, if we zoom in a bit, we can see that the tracked pitch is indeed
more smooth and continuous than before. Figure 10.7a shows a zoomed in
portion of the pitch tracker output without parabolic interpolation – notice
FIGURE 10.6 Tracked pitch for first four measures of Prélude à l’après-midi d’un
faune, with parabolic interpolation applied to key maxima.
FIGURE 10.7 (a) Tracked pitch for first measure of Prélude à l’après-midi d’un
faune, without parabolic interpolation. (b) Tracked pitch for first measure of
Prélude à l’après-midi d’un faune, with parabolic interpolation.
the minor jumps that manifest as sharp corners. Figure 10.7b shows the
same portion with parabolic interpolation – you can see that the jumps are
much more fine-grained owing to the increased precision.
You can also see that the overall tracked pitch values themselves differ
slightly – in the frame chosen, the tracked pitch without parabolic interpo-
lation is 564.706 (which corresponds to a lag of 85 samples at 48,000 Hz),
whereas the tracked pitch with parabolic interpolation is 563.089 (which
corresponds to a lag of 85.244 samples2 at 48,000 Hz).
// ----------------------------------------------------------------
// dj_fft.h
// ...
namespace dj {
// ...
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.h
// ...
#include <complex>
#include "dj_fft/dj_fft.h"
namespace GapTunerAnalysis
{
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
// ...
namespace GapTunerAnalysis
{
// ...
void CalculateFft(
const std::vector<std::complex<double>>& InFftSequence,
std::vector<std::complex<double>>& OutFftSequence,
const dj::fft_dir InFftDirection)
{
dj::fft1d(InFftSequence, OutFftSequence, InFftDirection);
}
}
1. Fill the input array with the contents of the analysis window, then
zero-pad it so that it’s twice the window size.
2. Take the FFT of the zero-padded input.
3. Compute the squared magnitude of each coefficient in the FFT out-
put to get the power spectral density.
4. Take the IFFT (inverse FFT) of the array of squared magnitudes.
5. Take the real part of each value in the IFFT output and divide it by
the DC component (first element) – the result gives the correlation
coefficient between –1 and 1.
And in code:
// ----------------------------------------------------------------
// GapTunerAnalysis.h
// ...
namespace GapTunerAnalysis
{
// ...
// ----------------------------------------------------------------
// GapTunerAnalysis.cpp
// ...
namespace GapTunerAnalysis
{
// ...
void CalculateAcf_Fft(
const CircularAudioBuffer<float>& InAnalysisWindow,
std::vector<std::complex<double>>& OutFftInput,
std::vector<std::complex<double>>& OutFftOutput,
std::vector<float>& OutAutocorrelations)
{
// 1. Fill the input array with the contents of the analysis
// window, then zero-pad it so that it's twice the window size
assert(
InAnalysisWindow.GetCapacity() == OutAutocorrelations.size());
OutFftInput[CoeffIdx] = SquaredMagnitude;
}
// 5. Take the real part of each value in the IFFT output and
// divide by the DC component (first element) -- the result
// gives the correlation coefficient between -1 and 1
const auto IfftDcComponent =
static_cast<float>(OutFftOutput[0].real());
OutAutocorrelations[CoeffIdx] = CoefficientRealComponent /
IfftDcComponent;
}
}
}
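Putting all five steps together, the body of CalculateAcf_Fft() might be filled in as in the following sketch, which reuses the CalculateFft() wrapper shown above (dj_fft’s fft_dir::DIR_FWD/DIR_BWD enumerators and the exact loop bounds are assumptions, not the original implementation):

void CalculateAcf_Fft(
    const CircularAudioBuffer<float>& InAnalysisWindow,
    std::vector<std::complex<double>>& OutFftInput,
    std::vector<std::complex<double>>& OutFftOutput,
    std::vector<float>& OutAutocorrelations)
{
    const uint32_t WindowSize = InAnalysisWindow.GetCapacity();

    // 1. Fill and zero-pad the FFT input (OutFftInput is 2x the window size)
    for (size_t SampleIdx = 0; SampleIdx < OutFftInput.size(); ++SampleIdx)
    {
        OutFftInput[SampleIdx] = SampleIdx < WindowSize
            ? std::complex<double>(
                  InAnalysisWindow.At(static_cast<uint32_t>(SampleIdx)), 0.0)
            : std::complex<double>(0.0, 0.0);
    }

    // 2. Forward FFT of the zero-padded input
    CalculateFft(OutFftInput, OutFftOutput, dj::fft_dir::DIR_FWD);

    // 3. Power spectral density: squared magnitude of each coefficient
    for (size_t CoeffIdx = 0; CoeffIdx < OutFftOutput.size(); ++CoeffIdx)
    {
        OutFftInput[CoeffIdx] = std::norm(OutFftOutput[CoeffIdx]);
    }

    // 4. Inverse FFT of the squared magnitudes
    CalculateFft(OutFftInput, OutFftOutput, dj::fft_dir::DIR_BWD);

    // 5. Real part divided by the DC component gives coefficients in [-1, 1]
    const auto IfftDcComponent =
        static_cast<float>(OutFftOutput[0].real());
    for (size_t CoeffIdx = 0; CoeffIdx < WindowSize; ++CoeffIdx)
    {
        OutAutocorrelations[CoeffIdx] =
            static_cast<float>(OutFftOutput[CoeffIdx].real()) /
            IfftDcComponent;
    }
}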
When we run our pitch tracker with this FFT-based approach, the out-
put pitch parameter ends up looking identical to Figure 10.6, at a fraction
of the CPU cost – thanks to the FFT, we’re able to track pitches more effi-
ciently without sacrificing accuracy at all!
10.3.2 Downsampling
Another way we can make our pitch tracker more efficient is by downsam-
pling our analysis buffer. By decreasing the number of samples we operate
on, we can further reduce our pitch tracker’s computational load as well as
its memory footprint.
To implement downsampling in our pitch tracker, we can add a downs-
ampling factor parameter (accessible via m_PluginParams->NonRTPC.
DownsamplingFactor) and divide our window size by this factor. Since this
will affect the entirety of our analysis process, we’ll have to make a few changes:
#pragma once
#include <complex>
#include <AK/SoundEngine/Common/IAkPlugin.h>
#include "CircularAudioBuffer/CircularAudioBuffer.h"
#include "GapTunerFXParams.h"
public:
// Constructor
GapTunerFX() = default;
// Initialize plugin
AKRESULT Init(AK::IAkPluginMemAlloc* InAllocator,
AK::IAkEffectPluginContext* InContext,
AK::IAkPluginParam* InParams,
AkAudioFormat& InFormat) override;
// Terminate plugin
AKRESULT Term(AK::IAkPluginMemAlloc* InAllocator) override;
private:
// -------
// Analysis members
std::vector<std::complex<double>> m_FftIn { };
std::vector<std::complex<double>> m_FftOut { };
};
// ----------------------------------------------------------------
// GapTunerFX.cpp
#include "GapTunerFX.h"
#include <AK/AkWwiseSDKVersion.h>
#include "GapTunerAnalysis.h"
#include "../GapTunerConfig.h"
// ----
// Allocate memory for analysis window
const uint32_t WindowSize = GetWindowSize();
m_AutocorrelationCoefficients.resize(WindowSize);
// ----
// Allocate memory for key maxima
const uint32_t MaxNumKeyMaxima =
m_PluginParams->NonRTPC.MaxNumKeyMaxima;
m_KeyMaximaLags.resize(MaxNumKeyMaxima);
m_KeyMaximaCorrelations.resize(MaxNumKeyMaxima);
// ----
// Allocate memory for FFT
const uint32_t FftWindowSize = WindowSize * 2;
m_FftIn.resize(FftWindowSize);
m_FftOut.resize(FftWindowSize);
return AK_Success;
}
m_AnalysisWindowSamplesWritten += NumSamplesPushed;
// ----
// Perform analysis
GapTunerAnalysis::CalculateAcf_Fft(
m_AnalysisWindow,
m_FftIn,
m_FftOut,
m_AutocorrelationCoefficients);
// ----
// Peak picking
const uint32_t MaxNumKeyMaxima =
m_PluginParams->NonRTPC.MaxNumKeyMaxima;
// ----
// Conversion
const uint32_t AnalysisSampleRate =
m_SampleRate / m_PluginParams->NonRTPC.DownsamplingFactor;
// ----
// Set output parameters
// Set RTPC
AK::IAkGlobalPluginContext* GlobalContext =
m_PluginContext->GlobalContext();
AKRESULT Result =
GlobalContext->SetRTPCValue(OutputPitchParamId,
OutputPitchParamValue);
}
}
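The GetWindowSize() helper used in Init() above is not shown in the excerpt. A minimal sketch, assuming it simply divides the configured window size by the downsampling factor, might be:

uint32_t GapTunerFX::GetWindowSize() const
{
    // Downsampling by a factor of N means we only need 1/N as many
    // samples per analysis window (integer division assumed).
    return m_PluginParams->NonRTPC.WindowSize /
           m_PluginParams->NonRTPC.DownsamplingFactor;
}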
10.6.1 Calibration
One game-side step that can help with the quality of analysis is calibration.
This can take many forms, from having the player be silent for a period of
time to having them input a specific note (or series of notes).
if (bSetRtpc)
{
// Set RTPC with interpolation
AK::IAkGlobalPluginContext* GlobalContext =
m_PluginContext->GlobalContext();
AKRESULT Result =
GlobalContext->SetRTPCValue(OutputPitchParamId,
OutputPitchParamValue,
AK_INVALID_GAME_OBJECT,
SmoothingRateMs,
SmoothingCurve,
false);
}
}
results (as in the case of pitch correction) since we’re already on the audio
thread. Indeed, the current implementation of our pitch tracker as a plugin
means that we can place it anywhere in our signal chain – for example,
we might want to apply a compressor to the input, perform pitch estima-
tion, and then add some distortion. Similarly, we can easily have multiple
instances of the plugin running at once (e.g., if we have a local multiplayer
game with several players singing into separate microphones).
On the other hand, performing our analysis on the game thread frees up
audio thread resources that can then be devoted to actually generating or
processing audio that the player can hear. With that in mind, if you know
that you don’t need sample-accurate pitch tracking to begin with, and a
slower update rate is satisfactory for your use case, then the game thread or
a worker thread may be the better place for you to do your analysis.
10.7 CONCLUSION
As with many such chapters, we’ve only really scratched the surface of
pitch tracking – we haven’t even touched on topics such as timbral fea-
tures, noise cancellation, input monitoring, windowing functions, offline
analysis, alternate correlation measures, managing audio input devices,
multi-microphone setups, or voice-specific considerations.
That being said, it’s pretty cool to see that by applying some DSP prin-
ciples we can actually build a fully functional pitch tracker! Figure 10.8
FIGURE 10.8 UI for the final version of our GapTuner Wwise plugin.
shows the final UI of our GapTuner plugin, encompassing all the param-
eters that we’ve covered. The complete source code for this plugin (which
includes GapTunerFX, GapTunerAnalysis, and our modified implemen-
tations of CircularAudioBuffer and dj::fft1d()) can be found at
https://github.com/usdivad/GapTuner.
My hope is that, along with Chapter 9, this chapter serves to whet your appe-
tite for the world of pitch tracking and that it sparks your imagination with ideas
for how real-time musical analysis might be applied in a game audio setting.
NOTES
1 More specifically, the MPM uses the normalized square difference function
(NSDF) as input to its peak-picking process – the NSDF, in turn, uses auto-
correlation as one of its components. For our plugin, we’ll stick with direct
autocorrelation for simplicity.
2 Incidentally, this is why our definition of ConvertSamplesToHz() in
Section 9.3.2 of this volume declares InNumSamples as a float rather than
an int – this way, we can convert the “in-between” interpolated samples as
well as the regular integer-indexed samples.
3 Probably even an entire book! – Ed.
4 “Coyote time” is a term that refers to old cartoons – in particular, the road-
runner and coyote cartoons – where a character (usually the coyote) hangs
in the air before falling (usually only after looking down.) – Ed.
REFERENCES
[1] McLeod, P., & Wyvill, G. (2005, September). A smarter way to find pitch.
In Proceedings of the 2005 International Computer Music Conference.
https://www.cs.otago.ac.nz/students/postgrads/tartini/papers/A_Smarter_
Way_to_Find_Pitch.pdf
[2] Downey, A. (2016). Filtering and convolution. In Think DSP: Digital Signal
Processing in Python (pp. 89–100). O’Reilly Media. https://greenteapress.com/
wp/think-dsp/
[3] Frigo, M., & Johnson, S. G. (2005). FFTW: Fastest Fourier transform in the
West [Computer software]. https://www.fftw.org/
[4] FFT functions (2022). In Developer Reference for Intel® oneAPI Math Kernel
Library - C. Intel. https://www.intel.com/content/www/us/en/develop/
documentation/onemkl-developer-reference-c/top/fourier-transform-
functions/fft-functions.html
[5] Pommier, J. (2022). PFFFT: Pretty fast FFT [Computer software]. https://
bitbucket.org/jpommier/pffft/
[6] Dupuy, J. (2019). dj_fft: Header-only FFT library [Computer software].
https://github.com/jdupuy/dj_fft
[7] De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental fre-
quency estimator for speech and music. In The Journal of the Acoustical
Society of America, 111(4), 1917–1930. https://asa.scitation.org/doi/abs/
10.1121/1.1458024
[8] Lee, B. S. (2012). Noise robust pitch tracking by subband autocorrelation clas-
sification. Columbia University. https://academiccommons.columbia.edu/
doi/10.7916/D8BV7PS0/download
[9] Mauch, M., & Dixon, S. (2014, May). pYIN: A fundamental frequency esti-
mator using probabilistic threshold distributions. In 2014 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 659–663).
IEEE. https://ieeexplore.ieee.org/abstract/document/6853678/
[10] Noll, A. M. (1967). Cepstrum pitch determination. In The Journal of the
Acoustical Society of America, 41(2), 293–309. https://asa.scitation.org/doi/
abs/10.1121/1.1910339
[11] Schroeder, M. R. (1968). Period histogram and product spectrum: New
methods for fundamental-frequency measurement. In The Journal of the
Acoustical Society of America, 43(4), 829–834. https://asa.scitation.org/doi/
abs/10.1121/1.1910902
[12] Wise, J. C. J. D., Caprio, J., & Parks, T. (1976). Maximum likelihood pitch
estimation. In IEEE Transactions on Acoustics, Speech, and Signal Processing,
24(5), 418–423. https://ieeexplore.ieee.org/abstract/document/1162852/
[13] Brown, J. C., & Puckette, M. S. (1993). A high resolution fundamental fre-
quency determination based on phase changes of the Fourier transform. In
The Journal of the Acoustical Society of America, 94(2), 662–667. https://asa.
scitation.org/doi/abs/10.1121/1.406883
[14] Zahorian, S. A., & Hu, H. (2008). A spectral/temporal method for
robust fundamental frequency tracking. In The Journal of the Acoustical
Society of America, 123(6), 4559–4571. https://asa.scitation.org/doi/abs/
10.1121/1.2916590
[15] Kim, J. W., Salamon, J., Li, P., & Bello, J. P. (2018, April). CREPE: A con-
volutional representation for pitch estimation. In 2018 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 161–165).
IEEE. https://ieeexplore.ieee.org/abstract/document/8461329
[16] Gfeller, B., Frank, C., Roblek, D., Sharifi, M., Tagliasacchi, M., &
Velimirović, M. (2020). SPICE: Self-supervised pitch estimation. In IEEE/
ACM Transactions on Audio, Speech, and Language Processing, 28, 1118–1128.
https://ieeexplore.ieee.org/abstract/document/9043478
Chapter 11
Flexible Delay Lines
Robert Bantin
11.1 INTRODUCTION
What is a delay line? Whenever a digital audio system transmits an
audio signal from one point in a mix graph to another, irrespective of
whether it transmits a single sample frame or a collection of frames
in a buffer, that transmission usually represents what is “now” in the
timeline. If, for any reason, you require a transmission that represents
“the past” in the timeline, whatever signal data that represents “now”
for that transmission needs to be stored somewhere so that when it is
later retrieved, it is “the past” in relative terms. Since mix graphs are
continuous data carrying systems, this store-for-later mechanism also
needs to be built as a continuous data carrying system for the sake of
compatibility. Put simply, then, a delay line is a store-for-later mech-
anism that’s compatible with digital audio signal data being carried
through a mix graph.
What is the use of an audio signal from the past? When we begin to
visualize how sound propagates through volumes of air while bouncing
off reflective surfaces, it becomes apparent that the direct-line-of-sight
signal coming from a sound source to the player’s listener position (i.e.,
“now”) would only represent a portion of the total audible sound field
of that source in a real-world scenario. The rest is from echoes of “the past.”
class FlexibleDelayLine
{
public:
void Process(float* anIObuffer, int numSampleFrames);
private:
float myCurrentDelayTime;
int myWritePos;
float myBuffer[MAX_BUFFER_LEN];
float myFloatSamplerate;
};
void FlexibleDelayLine::Process(
float* anIObuffer, int numSampleFrames)
{
float* readPtr = anIObuffer;
float* writePtr = anIObuffer;
int samplesDelayed =
static_cast<int>(myCurrentDelayTime * myFloatSamplerate);
So, for a buffer that is 2N sample frames long, what is the maximum
delay time feasible?
Place the “write” position at index (0). The minimum delay would be 1
sample frame, achieved when the “read” position is one index behind, in
this case (–1). But our buffer is circular, so index (–1) becomes index (2N – 1)
automatically. We can keep pushing the “read” position backward, and the
delay time gets longer as we do. However, we can’t allow the “read” index to
collide with the “write” index, so the last feasible index would be index (1). If
you count that index and the remaining ones until we get back to the write
position, the maximum delay possible is then 2N – 1 sample frames.
where f_s is the sample rate and c_air is the speed of sound in air. As an example,
a 10 m propagation distance works out to about 1,399 sample frames when f_s
is 48,000 Hz and c_air is 343 m/s. Since f_s will be constant, and c_air can be
assumed constant, this gives us a tidy scale factor of f_s / c_air as a samples-per-
metre coefficient that we can initialize at a convenient time, leaving us with
a simple multiply to scale the distance d into a delay (expressed as T_frames, or
“samples delayed,” when the moment is right).
class FlexibleDelayLine
{
public:
void Initialize(int aSamplerate);
void SetDelayFromDistance(float aDistance);
void Process(float* anIObuffer, int numSampleFrames);
private:
float myCurrentDelayTime;
int myWritePos;
float myBuffer[MAX_BUFFER_LEN];
float myFloatSamplerate;
float mySamplesPerMetre;
};
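As a sketch of how Initialize() and SetDelayFromDistance() might be implemented from the description above (the 343 m/s constant and the decision to store the delay in seconds are assumptions, chosen to be consistent with how Process() multiplies myCurrentDelayTime by the sample rate):

void FlexibleDelayLine::Initialize(int aSamplerate)
{
    myFloatSamplerate = static_cast<float>(aSamplerate);
    mySamplesPerMetre = myFloatSamplerate / 343.0f; // f_s / c_air
    myCurrentDelayTime = 0.0f;
    myWritePos = 0;

    // Start from silence
    for (int i = 0; i < MAX_BUFFER_LEN; ++i)
        myBuffer[i] = 0.0f;
}

void FlexibleDelayLine::SetDelayFromDistance(float aDistance)
{
    // metres * samples-per-metre = delay in sample frames; dividing by
    // the sample rate stores it as seconds, which Process() scales back
    // up by myFloatSamplerate.
    myCurrentDelayTime = (aDistance * mySamplesPerMetre) / myFloatSamplerate;
}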
11.4.2 Movement
This is a little trickier, but not by much. Irrespective of how you decide to
track movement in your game, you must ultimately determine a current
distance and a new distance and then calculate the gradient between the
two over the time it takes to make that transition. When you write your
DSP code, it should be working in even sized buffers, and so as your flex-
ible delay line is being read out, the read position can be modulated across
the length of the buffer using a sample-normalized version of that gradi-
ent. Remember, we already know what that sample-normalized formula
looks like: it’s the same as the one for distance above scaled for the number
of sample frames in the buffer.
class FlexibleDelayLine
{
public:
void Initialize(int aSamplerate);
void SetDelayFromDistance(float aDistance);
void Process(float* anIObuffer, int numSampleFrames);
private:
float myCurrentDelayTime;
float myLastDelayTime;
int myWritePos;
float myBuffer[MAX_BUFFER_LEN];
float myFloatSamplerate;
float mySamplesPerMetre;
};
void FlexibleDelayLine::Process(
float* anIObuffer, int numSampleFrames)
{
float timeGradient =
(myCurrentDelayTime - myLastDelayTime) /
static_cast<float>(numSampleFrames);
float currDelayTime = myLastDelayTime;
int samplesDelayed =
static_cast<int>(currDelayTime * myFloatSamplerate);
currDelayTime += timeGradient;
The key here is to accept that you cannot create delay times of less than
one sample frame using this technique, so just quantize the delay times at
the start and end of the buffer. The only moment you need to worry about
sub-sample delays is when calculating all the other delay times in between,
because the sample-normalized gradient will usually produce fractional
values. Rounding off every one of those in-between delay times will give us audible quantization artefacts, so instead we interpolate between neighboring sample values:
float FlexibleDelayLine::Interpolate(
const float inputA, const float inputB, const float ratio)
{
return inputA * (1.0f - ratio) + inputB * ratio;
}
void FlexibleDelayLine::Process(
float* anIObuffer, int numSampleFrames)
{
float timeGradient =
(myCurrentDelayTime - myLastDelayTime) /
static_cast<float>(numSampleFrames);
float currDelayTime = myLastDelayTime;
currDelayTime += timeGradient;
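// A sketch of how the rest of Process() might continue from here, with
// the gradient stepping moved inside a per-sample-frame loop and the
// fractional part of the delay handled by Interpolate(). Everything in
// this block is an assumption about the elided body, not the original
// code; it also assumes the delay never exceeds MAX_BUFFER_LEN frames.
for (int frame = 0; frame < numSampleFrames; ++frame)
{
    const float samplesDelayed = currDelayTime * myFloatSamplerate;
    const int wholeSamples = static_cast<int>(samplesDelayed);
    const float fraction = samplesDelayed - static_cast<float>(wholeSamples);

    // Write the incoming sample, then blend between the two delayed
    // samples that straddle the fractional read position
    myBuffer[myWritePos] = anIObuffer[frame];

    const int readPosA =
        (myWritePos - wholeSamples + MAX_BUFFER_LEN) % MAX_BUFFER_LEN;
    const int readPosB = (readPosA - 1 + MAX_BUFFER_LEN) % MAX_BUFFER_LEN;

    anIObuffer[frame] =
        Interpolate(myBuffer[readPosA], myBuffer[readPosB], fraction);

    myWritePos = (myWritePos + 1) % MAX_BUFFER_LEN;
    currDelayTime += timeGradient;
}
myLastDelayTime = myCurrentDelayTime;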
class PowercomplimentaryInterpolator
{
public:
PowercomplimentaryInterpolator()
: myTableData()
, myTableLength(ourStaticInterpolationDepth)
, myTableScale(static_cast<float>(myTableLength - 1))
{
float nIdx = 0.0f;
++nIdx;
private:
float myTableData[ourStaticInterpolationDepth];
int myTableLength;
float myTableScale;
};
The advantage it has over the linear approach is that it’s better at dealing
with noisy sample values in the delay line.
11.4.2.4 Oversampling
If you up-sample the input signal (e.g., 4×, 8×, 16×) and then store those val-
ues in a delay line extended in length by the same scale factor, you can pick
the oversampled values in between the base-rate sample values. The higher
the oversampling factor, the less quantization artefacts you will introduce.
This oversampling technique does, however, mean that your delay line
will allocate much more memory, and it doesn’t come cheap either. This
technique is usually implemented by inserting the input buffer into a new
(larger) buffer at the new sample rate with zeros padded in between the
input sample values and then performing a low-pass filter over the new
buffer with the cut-off frequency set just below the Nyquist limit of the
original sample rate. Sinc function-based interpolation can work well
enough for that, but also consider a poly-phase FIR approach that avoids
performing multiplies on the zero pad values [2].
11.4.3 Doppler
Dealing with movement correctly will force you to build a system that
resamples the signal captured in the delay line because the effect of modu-
lating the delay time is to compress and expand time. If things are running
well, you should get Doppler for free.
about getting behind these splines. Figure 11.2 shows an example plan
view of a vehicle between the edges of a racetrack with splines marked out,
engine/motor emitter, and reflection points.
Getting the shortest distance to the spline should be computationally
cheap, although it means we won’t account for the angle of incidence off
the reflective surface. So, this is a quasi-2D reflection system living in the
azimuth plane of a 3D world. However, the effect will be absolutely worth
the trouble once it’s fully implemented, as the player will hear themselves
driving through tunnels and under bridges. The delay time from the sides
of the car to these reflective surfaces will be based on twice the distance
from the sound emitter to the spline (the sound must propagate there-
and-back), and the delay effect needs to flip the phase of the delayed signal
so that it interferes with the outgoing sound in a similar way to real life.
This is a huge simplification, but since we’re not going to model acoustic
absorption, we may as well assume that all these reflective surfaces are
made of flat concrete and then our simplification here holds up.
FIGURE 11.2 A vehicle between edges of a racetrack with splines marked out,
engine/motor emitter, and reflection points.
With any procedural system for emitter placement, some criteria need
to be met to make sure we don’t propagate any weird results. These mis-
reads need to be culled. The typical reasons for culling are:
1. When the system re-raycasts directly to a candidate position as a
sanity check, it gets a wildly different result from the original test
ray-cast. Clearly, the test ray hit a surface that didn’t continue to the
spot where the actual reflection would occur from.
2. Some candidate positions are so close that they “flam” or interfere
with one another, so we consider them near-duplicates and cull all
but one of them.
3. Perhaps we need lots of test ray-casts to get good coverage, but we
don’t necessarily want all of them to resolve to reflection points as
this might sound too chaotic. We could then consider having more
test ray-casts than our desired reflection budget would want to han-
dle. If our implementation manages to collect more reflection points
than it has budget for – even after the previous culling steps have
been taken – further culling can be performed based on (for example)
distance until the total is within the budget.
Since the ray-cast hit data should contain a normal, the angle of the
reflected surface can be considered, making this a fully 3D system if
desired. Even if we only choose to ray-cast along the azimuth plane instead
of in all directions, the hit positions and normals will still be represented
in 3D space. Therefore, we can potentially place reflection emitters below
and above the player’s vertical position if surfaces are tilted up or down.
The decision as to how to position the test rays is then a matter of where
you want the most detail. Humans have better spatial hearing accuracy in
the azimuth plane than they do in the zenith plane, so typically we would
focus our ray-cast budget accordingly.
Finally, for the reflection emitters to pipe their sound through with the
correct time delay, we need to calculate their distance to the player posi-
tion, which will naturally take into account the full transit time there-
and-back (since the reflection emitter is acting as a phantom image). We
just need to adjust the phase so that the reflected sounds interact with the
direct sound in a natural manner.
in some kind of pool. Pooling the delay lines will limit the total memory
usage and allocations, and allow the delay lines to be reused frequently
without any setup cost.
11.5.3.1 Budgets
In the examples from the previous sections, Scenario A will require two
delay lines, and Scenario B will require six delay lines. For each delay line,
the total memory they will need to allocate must be greater than or equal
to the maximum transit time possible.
Since we know:
T_max = D_max / c_air
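To make that budget concrete, here is an illustrative back-of-the-envelope calculation (the maximum path length and sample rate are assumed values for the sake of the example, not numbers from the text):

// Illustrative only: rough per-delay-line budget
constexpr float MaxDistanceMetres = 100.0f;   // assumed worst-case path
constexpr float SampleRate        = 48000.0f;
constexpr float SpeedOfSound      = 343.0f;   // c_air in m/s

// ~13,994 sample frames of delay, or roughly 56 KB of floats per line
constexpr int MaxDelayFrames =
    static_cast<int>(MaxDistanceMetres * SampleRate / SpeedOfSound);
constexpr int BytesPerDelayLine =
    MaxDelayFrames * static_cast<int>(sizeof(float));

At those numbers, Scenario B’s six delay lines would budget on the order of 330 KB before any oversampling is applied.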
d = p_spline − p_source
If we add this direction vector to the source vector twice (i.e., doubling
the distance), we get a reflection point that is as close to a mirrored phan-
tom position as we can get with the information we have.
p_reflection = p_source + 2d
The actual distance is also critical as we will use this with the flexible
delay line to create the propagation delay. We can obtain that from either
square rooting the sum of squares of the direction vector d we initially got
and then multiplying the value by two:
d′ = p_reflection − p_source
FIGURE 11.4 Rays being cast to hit various surfaces, with a hit mask ignoring
certain collisions.
Figure 11.4 shows a plan view of ray-cast with appropriate hit mask. You
can also extend this if you want to include physics/collision materials in
the hit data or if you want to apply some material-based EQ to the delay
lines for added realism. With these three requirements in place, the math-
ematics for determining the phantom image position from a ray-cast hit is
reasonably straightforward.
Let’s declare a ray-cast starting at a 3D vector p_source that extends out to
a 3D vector p_destination. If the ray-cast hits a collidable mesh, we will also
get p_hit. First, take the ray-cast hit position p_hit and subtract the source
position p_source:

d = p_hit − p_source
Now we have the direction vector d from p_source to p_hit. Think of this as
the hit position described relative to the source position; that is, the
source position p_source has become a local origin. From that we can calculate
the phantom position based on the “normalized” normal vector n̂ we also
got from the same ray-cast hit:

d_phantom = 2 (d ⋅ n̂) n̂

The ⋅ operator here is a dot product. Now we must translate the phantom
position back to a global position by re-adding the source position.
This is where we want to place the reflection point p_reflection:

p_reflection = p_source + d_phantom
Also needed is the scalar distance between the source position p_source and
the reflected position p_reflection, as this will help us to prescribe the propagation
delay we will use with the delay line. Subtract the source position p_source from
the reflection position p_reflection and find the length of the resulting vector:

D_propagation = | p_reflection − p_source |

T_propagation = D_propagation / c_air
FIGURE 11.5 A player character walking through a maze with six ray-casts,
normals, and derived reflection points.
With those technical points met one way or the other, the system should
be fairly robust in both channel-based and object-based approaches since
the flexible delay lines are doing the bulk of the work. When we talk about
the mix engine’s ability to surround-pan these 3D emitters as they play
out, all they need to do is work out the speaker distribution. In the case
of object-panning, the same applies except that the speaker distribution is
deferred to later in the pipeline.
However, what you also gain from accurate propagation delays
between the emitters is something called interaural time difference or
ITD [3]. When spatialization only deals with level difference (as per the
norm), the primary spatial hearing mechanism is interaural level differ-
ence, or ILD [4]. In humans, this effect can be properly detected only in
11.7 CONCLUSIONS
From a very simple concept of delaying sound in a flexible yet reason-
ably transparent way, we can build some very rich and complex effects.
The two scenarios described in this chapter are useful for basic early
reflections.
Flexible delay lines are broadly useful in plenty of other scenarios, too.
For example, they can allow the designer to delve into ITD spatialization,
even when just spatializing direct sounds.
Special consideration should be made for memory allocation, since real-
istic acoustic propagation may require several seconds of delay. Therefore,
achievable budgets will inform how many reflections can be supported by
a given system. Finally, we should state clearly that the final implementa-
tion here is incredibly basic compared to what a commercially available
DAW plugin might do, but what is presented here will get you up and run-
ning and enable you to apply them to your games.
FURTHER READING
Flexible delay lines are useful in other types of effects, and indeed they are com-
monly the basis for “flanger” and “chorus” type effects. If this is also inter-
esting to you, DAFX has a list of delay-based audio effects that you may
want to try out [8].
There are also more sophisticated forms of delay line than the one described in
this chapter that can offer greater fidelity such as PSOLA [9] at a higher
computational cost.
REFERENCES
1. “Polynomial interpolation” – https://en.wikipedia.org/wiki/Polynomial_
interpolation
2. “Polyphase FIR interpolation” – https://www.mathworks.com/help/dsp/ug/
polyphase-implementation-of-fir-interpolation-block.html
3. “Interaural time difference (ITD)” – https://en.wikipedia.org/wiki/Interaural_
time_difference
4. “Interaural level difference (ILD)” – https://en.wikipedia.org/wiki/Sound_
localization
Chapter 12
Thread-Safe Command Buffer
Guy Somberg
12.1 INTRODUCTION
Audio engines are distinctive (although not unique) in their hard real-time
requirements. There are parts of our codebases where we cannot perform
any operations that can block or perform any sort of kernel system call.
Most often, the code that has these requirements lives in the mixer thread – we
must perform all of our calculations and completely fill our buffer within
a very small number of milliseconds. Failure to do so results in a buffer
underrun and an unpleasant pop for the player.
Much of the code that has these strict requirements is already written for us
in our audio middleware,1 so – while we cannot waste cycles willy-nilly – the
vast majority of our code is free to follow the normal rules of programming.
Every once in a while, though, we are faced with a need to interact directly
with the mixer thread. Maybe we’re manipulating the DSP graph, or offload-
ing some calculation onto the mixer thread, or we’re writing a fancy modern
multi-threaded audio engine and the mixer thread is the lowest latency solu-
tion for what the “audio thread” should be. In these situations, we are faced
with a need to send commands to another thread in a fully lock-free fashion.
Enter a data structure that I call the Thread-Safe Command Buffer or
the Atomic-Powered Triple Buffer.2
commands, then send them to the worker thread in a batch. The worker
thread reads the messages and does its work, then comes back for more.
Because we need these operations to be lock-free, we must use operations
on std::atomic types.
Atomics are the razor blades of programming: they are the right tool for
a certain type of job, but you must handle them with care or you will cut
yourself badly. I believe that the code presented in this chapter is correct
and bug-free, but you should not just take my word for it.3 If you use the
code here, I encourage you to take time to read it, understand it, and con-
vince yourself of its correctness.4
private:
std::vector<Command> Commands;
std::atomic<bool> CommandExchange = false;
};
This data structure will work, and it does solve the problem5: lock-free
exchange of data. However, it has one fundamental problem that will end
up causing all sorts of trouble: the sender cannot write a new buffer of
messages while the receiver is still processing messages.
It is highly unlikely that the two threads are operating in lockstep. One or
the other will arrive first, and if it is the wrong one, then you can end up in a
situation where you’re missing updates and adding latency to your messages.
12.2.2 Second Attempt: Double Buffer
Let’s try adding another buffer and see how it changes things.
class Buffer
{
public:
std::vector<Command>& GetSendBuffer()
{
if (CommandExchange)
return Commands[0];
return Commands[1];
}
const std::vector<Command>& GetReceiveBuffer() const
{
if (CommandExchange)
return Commands[1];
return Commands[0];
}
bool Send()
{
if (ReceiverWorking)
return false;
CommandExchange = !CommandExchange;
ReceiverWorking = true;
return true;
}
void CommandsProcessed()
{
ReceiverWorking = false;
}
private:
std::array<std::vector<Command>, 2> Commands;
std::atomic<bool> CommandExchange = false;
std::atomic<bool> ReceiverWorking = false;
};
This is better! We can now read and write at the same time, at the cost
of an extra bool and a second vector of commands. But we have a prob-
lem with threads that are not running in sync, similar to the single-buffer
approach: after we have sent the buffer to the other thread, we cannot write
again to the send buffer until the worker thread has finished processing its
buffer. This will be a problem if the worker thread is taking a long time
and the sender thread wants to send more messages.
What we need is a solution that will always allow the sender to write to
a buffer and the receiver to read from a buffer.
In this initial state, the sending thread can write freely to Buffers[0]
and the receiving thread can read freely from Buffers[2]. Buffers[1] is
unused in this initial state. In order to be able to send, X (current value: 1)
must be equal to ES (current value: 1), which means that we can send. And,
in order to be able to receive, X (current value: 1) must be equal to ER (current
value: 0), which means that we cannot receive.
At some point, the sender has filled its buffer and does a send. We per-
form a send by swapping the values of S and X. We also need to update the
value of ES with the current value of R – currently 2. Those operations will
leave us with the values given in Table 12.2.
After the send operation, the sending thread can write freely to
Buffers[1] and the receiving thread can read freely from Buffers[2].
Buffers[0] contains the messages that the sender wrote and is ready
to be read by the receiver. In order to be able to send, X (current value: 0)
must be equal to ES (current value: 2), which means that we cannot send.
And, in order to be able to receive, X (current value: 0) must be equal to ER
(current value: 0), which means that we can receive.
Finally, the receiver thread has finished its processing and does a receive
operation, which is performed similarly to the send. We swap the values of
R and X, then update ER to the current value of S. After that, we end up with
the values given in Table 12.3.
After the receive operation, the sending thread can write freely to
Buffers[1] and the receiving thread can read freely from Buffers[0].
Buffers[2] is unused – after the first send/receive pair, it will contain the
messages that the receiver has just finished processing (although in this
first send/receive pairing, it will be empty). In order to be able to send,
X (current value: 2) must be equal to ES (current value: 2), which means
that we can send. And, in order to be able to receive, X (current value: 2)
must be equal to ER (current value: 1), which means that we cannot receive.
Having gone through one send/receive cycle, we are now back where we
started, except with the values shuffled around from where they were.
void Send();
bool Receive();
private:
bool DoCompareExchange(uint8_t& Expected, uint8_t& Current);
We’ve got const and mutable accessors for the send and receive buffers,
a Send() function, and a Receive() function. We’ve also declared a
private helper function that will do the heavy lifting. The implementations
of Send() and Receive() are below:
if (DoCompareExchange(ExpectedBufferForSender, Sender)) {
// We must re-read the send buffer because the exchange
// operation has switched which buffer is which.
GetSendBuffer().clear();
}
}
bool Buffer::Receive() {
return DoCompareExchange(ExpectedBufferForReceiver, Receiver);
}
return false;
}
this operation is baked into the atomic library as a pair of member functions:
std::atomic<T>::compare_exchange_weak() and ::compare_exchange_
strong(). The difference between these is that compare_exchange_weak() is
allowed to have spurious failures (that is, the exchange can fail if the two val-
ues are not equal) whereas compare_exchange_strong() is not. In general,
the guidance is to use compare_exchange_weak() if you’re retrying in a loop
and compare_exchange_strong() if you’re just doing it once. Regardless,
ignoring the spurious failures, the algorithm encoded by these functions
is the atomic equivalent of this:
// Note: This is pseudocode only! All of this happens atomically -
// often in a single CPU instruction.
bool atomic<T>::compare_exchange_strong(T& Expected, const T Desired)
{
    if (this->value == Expected) {
        this->value = Desired;
        return true;
    } else {
        Expected = this->value;
        return false;
    }
}
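As a quick illustration of that guidance, here is a small stand-alone example (not taken from the chapter's class) showing the two functions in their natural habitats: the weak form inside a retry loop, and the strong form for a one-shot attempt.

#include <atomic>
#include <cstdint>

std::atomic<uint8_t> Value{0};

// Retry loop: a spurious failure just means another trip around the
// loop, so compare_exchange_weak() is fine here.
void IncrementWithRetry() {
    uint8_t Expected = Value.load();
    while (!Value.compare_exchange_weak(
        Expected, static_cast<uint8_t>(Expected + 1))) {
        // On failure, Expected has been updated to the current value.
    }
}

// One-shot attempt: we do not want to report failure spuriously, so use
// compare_exchange_strong().
bool TrySetOnce(uint8_t Expected, uint8_t Desired) {
    return Value.compare_exchange_strong(Expected, Desired);
}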
This is it! The entire class occupies less than 60 lines of code. In this
book, we have the luxury of writing the commentary independently of the
code. In shipping code, we really ought to have about a 4 to 1 ratio of com-
ments to actual code, since this is conceptually complex code, even if the
implementation is short.
12.4 IMPROVEMENTS
Code similar to Section 12.3 was used to ship Torchlight 3 – it’s real stuff!
But we can do better. Let’s run through our algorithm a few times and
examine the values of our buffer indices in Table 12.4.
ES = ( S + 1) % 3
ER = ( R + 1) % 3
TABLE 12.6 Row Indices and Send/Receive Added to the Data from Table 12.5
Op Index Can Send Can Receive S X R
Initial 0 ☑️ × 0 1 2
Send 1 × ☑️ 1 0 2
Receive 2 ☑️ × 1 2 0
Send 3 × ☑️ 2 1 0
Receive 4 ☑️ × 2 0 1
Send 5 × ☑️ 0 2 1
Receive 0 ☑️ × 0 1 2
Notice that we can derive the entire table given just the index of the table row, and this is the funda-
mental premise of our next iteration. Let’s add the index and an indicator
of whether we can send or receive in Table 12.6.
If we are going to divine the contents of the table from the index, then
we need formulas for several things:
• Can Send and Can Receive – These are easy: we can use the parity
of the index.
• S, X, and R – These formulas are more complicated, but not too bad.
S = ((Index + 1) / 2) % 3
R = ((Index + 4) / 2) % 3
X = (7 − Index) % 3

(where the divisions are integer divisions)
It turns out that we never actually need the value of X, so while it is inter-
esting that we have a formula to calculate its value, it’s not something that
we’ll be using. With the formulas for S and R, we now have a choice: we
can either encode these formulas in our class or just store the values in a
lookup table. In practice, this is the difference between an add, a divide,
and a modulo7 versus 12 bytes of static memory. We’ll start with the lookup
table version for simplicity, but see Section 12.7 for a more in-depth dis-
cussion on the subject.
bool Send();
bool Receive();
private:
enum class Operation : uint8_t {
Send = 0,
Receive = 1,
};
uint8_t GetSenderIndex() const;
uint8_t GetReceiverIndex() const;
static constexpr bool CanDoOperation(
Operation DesiredOperation, uint8_t CurrentState);
bool TryIncrementState(Operation DesiredOperation);
The interface is more or less the same, although the details are quite
different. The implementation of the index functions is simply an array
lookup:
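The listing itself falls in the portion of the chapter not reproduced here, but based on the relaxed-memory-order versions shown in Section 12.5.3.2 and the S and R columns of Table 12.6, it presumably looks something like this sketch (the array contents are read straight out of the table):

// Lookup tables derived from Table 12.6: entry i is the send (S) or
// receive (R) buffer index when the state is i.
static constexpr uint8_t SendBufferIndexes[6]    = { 0, 1, 1, 2, 2, 0 };
static constexpr uint8_t ReceiveBufferIndexes[6] = { 2, 2, 0, 0, 1, 1 };

uint8_t Buffer::GetSenderIndex() const {
    return SendBufferIndexes[State];    // implicit atomic load
}

uint8_t Buffer::GetReceiverIndex() const {
    return ReceiveBufferIndexes[State]; // implicit atomic load
}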
It’s worth noting that these functions are our first instance of atomic
operations in this implementation. The State variable is of type
std::atomic, which provides convenience accessors and conversions.
However, if we want to be explicit to the readers of the code that these
are atomic operations, then we can replace the references with calls to
State.load().
All that’s left now is the actual send and receive operations along with
their implementation details. Because both send and receive follow exactly
the same pattern, we have abstracted it into a single function.
bool Buffer::Send() {
    if (GetSendBuffer().empty())
        return false;

    if (TryIncrementState(Operation::Send)) {
        GetSendBuffer().clear();
        return true;
    }
    return false;
}

bool Buffer::Receive() {
    return TryIncrementState(Operation::Receive);
}

bool Buffer::CanDoOperation(
    Operation DesiredOperation, uint8_t CurrentState) {
    return (CurrentState % 2) == static_cast<uint8_t>(DesiredOperation);
}
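The TryIncrementState() listing for this version falls in the omitted portion of the text; given the acquire/release variant shown later in Section 12.5, the default sequentially consistent version is presumably something very close to this sketch:

// Sketch: identical to the later acquire/release version, except that
// the load and store go through the atomic's implicit conversions,
// i.e. with the default sequentially consistent memory order.
bool Buffer::TryIncrementState(Operation DesiredOperation) {
    uint8_t CurrentState = State;
    if (!CanDoOperation(DesiredOperation, CurrentState))
        return false;

    State = (CurrentState + 1) % NumStates;
    return true;
}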
template<typename T>
class Buffer {
    std::vector<T>& GetSendBuffer() {
        return CommandValues[GetSenderIndex()];
    }
    std::vector<T>& GetReceiveBuffer() {
        return CommandValues[GetReceiverIndex()];
    }

template<typename T>
class ThreadSafeCommandValue
{
public:
    // Renaming these from Buffer to Value
    T& GetSendValue() {
        return CommandValues[GetSenderIndex()];
    }
    T& GetReceiveValue() {
        return CommandValues[GetReceiverIndex()];
    }

    // ...

    // Implementation of Send() is now simpler and mirrors Receive()
    bool Send() {
        return TryIncrementState(Operation::Send);
    }

    // ...

private:
    // ...
    std::array<T, 3> CommandValues;
    // ...
};

template<typename T>
class ThreadSafeCommandBuffer {
public:
    std::vector<T>& GetSendBuffer() {
        return Value.GetSendValue();
    }
    std::vector<T>& GetReceiveBuffer() {
        return Value.GetReceiveValue();
    }

    void Send() {
        if (GetSendBuffer().empty())
            return;
        if (Value.Send())
            GetSendBuffer().clear();
    }
    bool Receive() {
        return Value.Receive();
    }

private:
    ThreadSafeCommandValue<std::vector<T>> Value;
};
The only changes we had to make to our original class are renaming a few
things, simplifying the implementation of Send(), and changing from an
array of vectors to an array of values. The ThreadSafeCommandBuffer<T>
class is mostly a wrapper for a ThreadSafeCommandValue<std::vector<T>>,
with the vector clearing logic in Send() added back in.
Warning: This is an extremely complex topic, and this chapter will only
skim the very edges of the surface. I believe that my logic here is sound and
that the changes that we will make in this section are valid. Nevertheless,
this data structure will work just fine with the default sequential consistent
memory order. Convince yourself that the changes are valid, but if you
don’t feel confident in their correctness, then leave them off when you
apply this data structure in your own code. For further details, I recom-
mend the talk “atomic<> Weapons: The C++ Memory Model and Modern
Hardware” by Herb Sutter from the C++ and Beyond 2012 conference.8
To see what effect the explicit memory orders have, we can examine the generated assembly. Let’s first take a look at the code, and then we’ll examine the actual instructions to see what the compiler has done with the change.
template<typename T>
bool ThreadSafeCommandValue<T>::TryIncrementState(
    Operation DesiredOperation) {
    uint8_t CurrentState = State.load(std::memory_order_acquire);
    if (!CanDoOperation(DesiredOperation, CurrentState))
        return false;

    State.store(
        (CurrentState + 1) % NumStates, std::memory_order_release);
    return true;
}
The only changes to this code from the previous version are that we call
the load() and store() methods on the atomic state in order to be able to
pass in the desired memory orders. This does make the code more complex
in some ways, so let’s take a look at what the compiler does with this. Of
course, every compiler is different and will output different assembly for the
same input. The instructions in Table 12.7 and the rest of the assembly code
in this chapter were generated using msvc 19.35.322159 compiling for x64
in Release mode.10
The only difference between the two versions is the last line (marked with
a triangle ⚠). The default memory order does an xchg instruction, whereas
the acquire/release does a mov. The end result here is the same: memory
address [rbx+48h] contains the value of the cl register. The difference is in the
locking semantics: with xchg, the processor’s locking protocol is automatically
implemented for the duration of the exchange operation.11 The mov instruc-
tion, therefore, is slightly cheaper since it does not engage this same lock.
Note that there is no difference between the two for the load operation. This
means that we could move that one back to sequentially consistent, at least for
this platform/compiler combination. However, it is odd to see a release with-
out a matching acquire, so we will leave it with the acquire in place.
12.5.3.2 Relaxed Semantics for Reads
We’ve modified the state-increment function and gotten a small improvement.
The other atomic operation is the load for the sender/receiver indices when
calling GetSenderIndex() or GetReceiverIndex(). How can we reduce the
restrictions on this value? It’s a bare load, so there’s no acquire/release pattern
to buy into. We’re not going to dive into consume semantics: down that way
lies madness. The only other option is relaxed: can we use that?
It turns out that we can: there are no consequences for reordering above
or below this function, and nothing that the other thread does could affect
the index that we get from this function. We can change our functions thus:
template<typename T>
uint8_t ThreadSafeCommandValue<T>::GetSenderIndex() const {
    return SendBufferIndexes[State.load(std::memory_order_relaxed)];
}

template<typename T>
uint8_t ThreadSafeCommandValue<T>::GetReceiverIndex() const {
    return ReceiveBufferIndexes[State.load(std::memory_order_relaxed)];
}
This should be safe to do. Let’s see if there is any benefit from the com-
piler. Table 12.8 is the output for one of these functions from the same
compiler that was used to generate Table 12.7.
12.6 A NON-IMPROVEMENT
Let us imagine a situation in which the sender writes some commands
to the buffer – say, a collection of “move this event instance to this world
location” commands. The receiver thread is still either working or waiting
and the sender is running at a particularly high framerate, so the receiver
still hasn’t gotten around to receiving its last batch of messages when the
sender comes back around. Can we add an extra layer of optimization to
this system by “un-sending” the last batch of messages and updating the
values in place? That way when the receiver does come in, it only has to
process each entry once. This will have the benefit of reducing latency as
well since only the latest set of commands need to be handled. There are
two questions here: can we do this operation, and should we do it?
The answer to the first question is, yes, we can do it. We can calcu-
late or store what the old value that we sent previously would be and then
do a compare-exchange operation to implement the un-sending. We then
update the list of commands with updated values and then do a regular
Send() to give it back to the receiver. Mechanically, it works!
However, we should not implement this operation. The reason is that it
can result in a denial of service where the receiver thread never receives any-
thing at all. Let’s take a look at one potential set of timings in Table 12.9 to
see why this is.
In Table 12.9, we have a set of timings where the sender has found that
the receiver hasn’t yet received the last messages it sent, so it un-sends it,
writes updated commands, and then re-sends. However, just after the
sender thread un-sends, the receiver thread gets scheduled and tries to do
a receive. The receive fails because the sender has just un-sent and so there
is nothing waiting for it. Table 12.9 shows this continuing three times in a
row, but if this timing continues, the receiver thread will never get any of
the sender’s messages!
The generated assembly is listed in Table 12.10 for both the lookup table
version and the formula version of GetReceiverIndex().
The lookup table is much shorter since we can just look up the answer
directly. Contrariwise, the formula version – despite being six instructions
longer – does not do anything more complex than a register-register mul-
tiply and may end up being faster due to the aforementioned data cache
issue. My gut says that the formula version will be faster,12 so we will use
that one in the code listing in Section 12.8.
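For reference, the formula version being compared here is the same relaxed load fed through the arithmetic from Section 12.4 instead of a table – something along these lines (a sketch, not the chapter's exact listing):

template<typename T>
uint8_t ThreadSafeCommandValue<T>::GetReceiverIndex() const {
    // R = ((Index + 4) / 2) % 3, using integer division.
    const uint8_t Index = State.load(std::memory_order_relaxed);
    return static_cast<uint8_t>(((Index + 4) / 2) % 3);
}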
    bool Send() {
        return TryIncrementState(Operation::Send);
    }

    bool Receive() {
        return TryIncrementState(Operation::Receive);
    }

private:
    enum class Operation : uint8_t {
        Send = 0,
        Receive = 1,
    };
State.store(
(CurrentState + 1) % NumStates, std::memory_order_release);
return true;
}
return
(CurrentState % 2) == static_cast<uint8_t>(DesiredOperation);
}
template<typename T>
class ThreadSafeCommandBuffer {
public:
std::vector<T>& GetSendBuffer() {
return Value.GetSendValue();
}
std::vector<T>& GetReceiveBuffer() {
return Value.GetReceiveValue();
}
void Send() {
if (GetSendBuffer().empty())
return;
if (Value.Send())
GetSendBuffer().clear();
}
bool Receive() {
return Value.Receive();
}
private:
ThreadSafeCommandValue<std::vector<T>> Value;
};
12.9 CONCLUSION
One common pattern for audio engines is to put the minimum possible onto
the main thread and move much of the management and coordination of the
audio engine to a worker thread – often the event thread, but sometimes the
mixer thread. In either case, we need an efficient and performant solution,
but particularly when interacting with the mixer thread, all communication
must be fully lock-free. The Thread Safe Command Buffer implementation
presented in this chapter is a great tool for these use cases.
NOTES
1 For those of us who use middleware, that is.
2 Sometimes we can get carried away with naming things. It is, after all, one
of the two hardest problems in computer science.
3 I will admit that I have not tested or run the code for the single buffer and
the double buffer – the code for those is mostly to demonstrate the short-
comings of those strategies, rather than practical code that I’m encouraging
people to use. Nevertheless, the code for those sections is as good as I can
make it, and I do believe it to be correct. The code for the triple buffer has
been thoroughly tested and shipped a real game.
4 This is especially true when it comes to the non-default memory ordering
that will come later in the chapter.
5 The code for the single and double buffer solutions is purposefully
incomplete – it shows just the exchange mechanism and is missing accessors,
convenience functions, and other small details.
6 I hope that you will bear with me as we express very pedantically the opera-
tions allowed and repeat very similar patterns three times. It is important
to understand these relationships, so a bit of pedantry will hopefully make
them all very clear.
7 In practice, compilers are smarter than this and will transform the divide
and modulo operations into cheaper instructions.
8 https://herbsutter.com/2013/02/11/atomic-weapons-the-c-memory-model-
and-modern-hardware/ (The website contains links to videos of the two-part
talk and the talk slides).
9 This corresponds to Visual Studio 2022 v17.5.1, which was the latest stable
build as of this writing.
10 Release mode is what it’s called in the Visual Studio IDE. The relevant
command line options are /O2 /Oi.
11 https://www.felixcloutier.com/x86/xchg
12 If performance matters, of course, then the correct solution is to measure.
III
Tools
Chapter 13
Optimizing Audio
Designer Workflows
Matias Lizana García
13.1 INTRODUCTION
Audio designers are the creative people behind audio in a game, and it
is tempting to think that they spend all their time playing with DAWs,
synthesizers, pedals, or doing weird recordings. But the truth is, as they
become more technical, they are mostly dragged into the game engine, to
set up audio components, systems, and pipelines in addition to bug fix-
ing (maybe they forgot to load a bank or an audio event reference was
suspiciously removed). Audio programmers help audio designers by pro-
viding them with the best possible pipelines and tools to work with, so
here’s where we ask ourselves: where do audio designers spend most of
their time?
The motivation of this chapter is to take a trip into many aspects of the
audio pipeline in a game and examine some of the issues we can face in
big productions. We will discuss and define how we should improve all the
processes and make them performant and easy to use in order to save as
much time as possible for audio designers on their daily workflow.
These conversations can happen with all departments in a game, but we will find some examples that make it more
specific. Starting with an easy “Is there anything I can help you with?”, we
can formulate questions that can help improve any project.
it is important to invest time in finding every small thing that can make
it break and provide code checks or validation tools to address the issue.
13.3.2 Validators
We often have features that are applied to thousands of assets, such as col-
lisions with objects or footsteps. Each object needs a proper setup, with
some tag that defines the type or some other properties. It is important
to have a tool that can validate all those assets and tell if they are set up
properly, then generate a warning to spot the error and fix them manually,
or even automating the process to fix the issues. These validation tools can
run in daily smoke tests so we can get an output list of, for example, how
many interactable objects are missing sound in today’s build. Validators
work particularly well with assets that need a fixed configuration so we can
check the desired setup. If we know a character should contain different
audio objects to play on each part of the body, we can validate those values
with a tool to tell us if some character is missing the proper setup.
13.3.3 Configurations
There are many parameters that the audio designers can use to configure
the audio engine. It’s important to spend time organizing these settings
and making them easy to configure. Even simple things like organizing
the settings into categories can make a big difference in usability and dis-
coverability. As new systems get built, expose their configuration param-
eters so that the audio designers can tweak to get exactly the sound they’re
looking for. A few examples of the sorts of settings that you might expose
to your audio designers are:
• Audio culling distance – How far away sounds can be before they’re culled by the game.
• Debug tool visibility – Which widgets and gizmos and other debug
information are visible by default.
• Importance setup – Number of importance levels, effect configura-
tion for each level, max objects at each level, etc.
One example is a gym level that lays out every terrain material, where we can walk around trying and profiling sound on all the different types of terrain. Another example is a gym containing every interactable object,
to test grabbing, collisions, and throwing items around. This allows audio
designers to go faster on trying every possible sound instead of trying to
find them on different levels, with the time consumption attached to it
(waiting for the level to load, finding the specific place where the feature
needs to be tested, etc.). There can be gyms to test player actions, anima-
tions, music states, ambiences, and dialogs – basically any feature you need
to test in the game.
13.3.7 Animations
Marking up animations with sounds is one of the most manual parts of
the audio designer’s job. Frequently, this work ends up being largely plac-
ing footstep events on animations when a character’s feet hit the ground
or placing foley on the clothes when walking. It’s all very tedious for the
audio designers.
The baseline functionality for this tool is to show a timeline of events,
and have the events trigger when the animation is previewed. For com-
mon sounds like footsteps, it’s a good idea to set up a generic trigger with a
data-driven setup. Allow the audio designers to define an event configured
by some parameters such as clothes and velocity of the feet. We’ll want to
put all of this configuration onto the properties on the character itself so
that we can read the properties in a systematic way. If there are multiple
animations (such as walk, run, and jump) that all want to share the same
event (like footsteps), we can create a template tool so that audio designers
can just drag an existing template onto the data.
Finally, we can work with the technical artist to find a way to tag the
animations in the 3D animation tool, which is then exported to the game.
The tech artists and animators can create tools to tag the animations with
the information that we need, which the game can import and use with no
input from the audio designers.
For more about automated footstep setup, see Chapter 17 of this book –
“Automatic Manual Foley and Footsteps” by Pablo Schwilden Diaz.
13.3.8 Cinematics
There is also a lot of work on setting up audio in timelines for cutscenes.
Imagine we have a cinematic with multiple characters and they need to
trigger foley and voice. We need to manually set up an event for each ani-
mation for the cinematic every time the characters trigger a sound or talk.
Make sure that the audio designers can play cutscenes in the editor with-
out playing the game, allowing them to scrub forward and backward on
the timeline and adjust details as needed to fit the action. Rendering the
audio waveform in the audio clips can help the audio designers to find
where to place voice lines so that they line up with the action.
Recording and editing the sound foley and voice as an entire cinematic
track can help sync the visual content into the whole audio track. If we need
to post audio from different emitters, we just need to export each emitter’s audio into a separate track – for instance, each character’s audio in its own track. If we place each audio track at the beginning of the cinematic
for each of the characters, they will play together in sync, as if we recorded
foley for a film. Sometimes we also rely on cinematic voice tracks coming
from a motion capture session, so they will fit perfectly into the action.
Even the simple act of assigning each audio track to each cinematic can
be improved and automated by using naming conventions. By naming
cinematics and audio tracks with the same prefixes, we can make a tool
that automatically populates audio on every cinematic in-game by adding
audio tracks matching the name prefix.
13.3.10 Statistics
There is a lot of information to control and monitor how the audio systems
are performing. Even with our audio middleware profiler, it is also nice to
provide more information inside the engine about gameplay or game states
(for example, how many times the player or NPC did a particular action, or
how frequently we changed from exploration mode to combat). These stats
not only help game designers improve the game but can also help audio
designers to make decisions regarding audio triggers. For example, if there
is a certain audio stinger that triggers when entering combat, having an
understanding of how frequently it plays can inform the audio designers
of whether it should play every time or whether it needs more variations.
It can also be valuable to monitor the performance of our systems,
beyond what we receive in our middleware. It is interesting to get statistics,
for instance, about the number of emitters being handled in our engine,
so we can reduce calls to our middleware and virtual voices. We can also
glean a lot of statistics from our levels, so we can get location-based sta-
tistics instead of relying just on time-based statistics. One useful tool is to
implement automation for making the player walk through the level, so
we render the audio as if it was normal gameplay, then we can build a heat
map of the number of resources used on every part of the map. That will
also help audio designers to figure out if there is too much audio in one
place, or if our systems collapse in specific situations.
Hooking gameplay code up to audio events is one of the places where, depending on how the architecture is made, audio
designers will spend most of their time setting up, so it’s worth diving in.
Let’s start from the beginning and see how far we can improve it.
void Jump()
{
    ...
    AudioEngine.PostEvent(eventID, gameObjectID)
}
AudioManager.PostEvent(eventID, gameObjectID)
Our jumping code, attached to the player, will post the event into an
audio object that also lives on the player itself (sometimes the object has its
behavior defined in scripts attached to it). Most of the time, however, we
will need an external reference to trigger the audio on another object (for
example, the script can be in the player top hierarchy as a behavior, but we
want to post the jump event on the feet).
Audio objects need to be registered and initialized with the middle-
ware, so game engines provide a component that we can place in a game
object to register the audio object automatically to this specific entity. We
do the same thing with the eventID, as we create a component to handle
the reference of the event (player_jump in this case) and define it manu-
ally on the jump script to be triggered in the code. Triggering explicitly
this way is useful for fast prototyping and small games, but it scales badly.
Imagine we have thousands of assets and each of them contains an object
reference and an event to be set up. Sometimes game assets change, or
some event gets lost because the audio structure changes as well, or maybe
we add more than one audio object component because we forgot there
was already one somewhere in the same prefab.
It quickly becomes hard to track where all of these references are, and in big productions, the number of assets in-game can
be so big that it becomes unmanageable.
One solution is to create a file that holds all event references in the game
(we call it here AudioEvents) and we use our AudioManager singleton/
controller to access this public list:
AudioManager.PostEvent(AudioEvents.Jump, gameObjectID)
void Jump()
{
    ...
    EventManager.PostEvent(GameEvent.Jump, gameObjectID)
}
In our AudioManager, we will just receive the event and post what we
want. We can also modify our events list to automatically trigger depend-
ing on the type we receive from the game event:
class AudioManager
{
    OnGameEvent(GameEvent type, int gameObjectID)
    {
        AudioEvents[type].Post(gameObjectID);
    }
}
What we need to do from the audio programming side is to make sure all tags are registered
into the audio object before posting any event, so we need to make a layer
that transforms the tag system into a middleware parameter.
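To make the shape of that layer concrete, here is a small sketch in the same pseudo-code style as the earlier examples – the function and parameter names are illustrative, not a specific middleware API:

void OnCollision(GameObject obj, float impactSpeed)
{
    // Translate the object's gameplay tags into middleware parameters
    // before posting the event, so the audio designer only works with
    // switches and parameters inside the middleware.
    AudioEngine.SetSwitch("material", obj.GetTag("material"), obj.audioObjectID);
    AudioEngine.SetParameter("impact_speed", impactSpeed, obj.audioObjectID);
    AudioEngine.PostEvent(AudioEvents.Collision, obj.audioObjectID);
}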
In a perfect scenario, if we do that, audio designers will play the game
and profile it with the middleware, and they will start receiving collision
events from the objects, with non-defined parameters that they will need
to fill. For instance, now there is a small new jar made from wood, so
they will just need to fill this instance with a proper sound. There is also
room here for more automation since it is possible to read all tags from our
engine and populate those values with a tool into our middleware every
time we add more values into the tags set, so we always have our param-
eters updated.
Other examples of this sort of setup are materials for the terrain to pro-
vide the correct footstep material, or character clothes where we want to
define the type of shoes, pants, or shirt, to change the foley sound in the
character. Even if it is a type of chest that we want to open or interact with,
it should always be event-based for the action and then send all data pos-
sible as parameters. Relying on this architecture, it is mostly automatic for
any audio designer to receive data from the engine and just work in the
middleware, without having to touch the game engine.
13.5 CONCLUSION
As we can see, there are a lot of ways for a programmer to improve an
audio designer’s workflow. It starts with creating a good communication culture and finding where we all spend most of our time; whether a task is long, or short but repeated many times, it deserves time to investigate how to make it faster. Focusing on tools is also an important part,
as we are going to save so much time and make development more user-
friendly. And finally, having a solid and data-driven architecture for our
game engine will drastically save manual workload time.
Satisfaction with creating all those tools and seeing people happy with
their workflows is the best outcome you can get for working on a team
with audio designers, and if you invest enough time in all those improve-
ments, you will even manage to get some free time for coding those cool DSP effects that were promised in the interview.
Chapter 14
An Introduction to “An
Introduction to Audio
Tools Development”
Simon N. Goodwin
This means that audio tools must be aware of all platforms, even if they
only run on a few of the biggest. They must deal with much larger sets of
assets than any one release of the game – typically uncompressed audio at
the highest sample rate designers can record at, with variants, back-ups, ver-
sion control, and the other book-keeping necessities of development. All of
this multifarious data is only swept away when the gold master is pressed
and the team moves on – hopefully taking some of the tools with them.
If your game is mainly or exclusively played online, there may be no “gold
master” milestone, but a drawn-out sequence of betas or soft-launches, con-
tent packs and refreshes, crossplay extensions and localizations; these con-
tinue until the servers shut down, hopefully a decade or more later. In that
case you’ve traded crunch for a treadmill, so the chance to condense or ratio-
nalize audio assets may never come. Readily-available backups and developer-
managed version control are especially vital in the toolchains of online titles.
In the simplest, most traditional sense, audio tools marshal the raw
assets recorded or created by sound designers, then convert them to the
formats most suited to the target hardware.1 They are not part of the game,
though they may be incorporated into it in some models of game develop-
ment. More broadly, audio tools exist to reconcile static state – the pre-
authored context of the game or a specific level or scenario – with dynamic
state – the emergent consequences of the players’ behavior and their inter-
actions with the entire game world.
Beyond custom-baked audio samples – be those loops, streams, or
one-shots; foley, music, speech, or impulse responses – audio tools cap-
ture categories and context in the form of metadata which the game audio
code, much later, will integrate with gameplay so that the most appropri-
ate sounds play at the right time. Some audio tools also provide means
to audition samples in the context of other sounds and predictable states,
such as layers of ambience, weather, or crowds.
There is a complex trade-off between doing this auditioning in the
game and in the tool.
The audio tools programmer must work closely with the game pro-
grammers and designers, not just the audio specialists on either side, to
make sure the tool encapsulates the knowledge and intent of the sound
designer, to minimize the need for manual markup later, and to thrill the
player with sounds which are appropriate to the action in the game and the
imagination of the sound designers.
• The target audiences of audio tools are the sound designers and
sometimes game designers – not the game player, producer, or QA
department. This affects how the tools communicate with the user.
Audio jargon – decibel, filter, codec – is fine and helps with precision,
but programmer jargon is no more appropriate here than in the game
or installer. That is, it is best avoided, especially if you’d rather be
developing than explaining it to your less-technical clients.
• Audio tools do not need to run on low-power target platforms, as
long as they’re aware of the differences in terms of codecs, file man-
agement, and memory management.
• Audio tools and the tech underlying them can be carried from one
game to the next more readily than game code, even if the genre or
presentation changes.
NOTES
1 Proprietary DSP and codecs invariably outperform “portable” ones and
often have access to platform resources denied to cross-platform code and
data.
2 We don’t often get to see this tech except indirectly in interviews, as it’s
proprietary and riddled (for better or worse) with trade secrets.
3 This is the Unix philosophy, epitomized in the audio domain by Sound
Exchange (AKA SoX), a cross-platform open-source command aptly
described as “the Swiss Army knife of sound processing” - Ed.
4 Localization is one area where audio tools generally do a lot more work than
other parts of the game toolchain.
Chapter 15
An Introduction
to Audio Tools
Development
Jorge Garcia
15.1 INTRODUCTION
There are already books that will teach you how to build an application or
a tool using various technologies, so in this chapter I share part of my jour-
ney and discoveries designing and developing tools for sound designers and
content creators, explaining what worked and what didn’t work. Coming
from a background of years writing systems, audio-related code, and frame-
works, I started thinking about tools development and asked myself: how
hard can it be? Having already written in-game debugging tools and simple
user interfaces, at some point I thought that tools and UI development were
trivial, a well-solved problem in game audio. I was far from reality.
There are some (non-command line) audio tools where the most com-
plicated and laborious part is the UI and the UX side of things. Think of
your favorite DAW or audio middleware authoring app: imagine all the
features they deliver and how they are presented. Now think of the years
of iteration, maintenance, and fixes it took them to reach a mature state.
Designing and developing audio tools is hard in part because they need to
serve specialized users, our beloved sound designers and technical sound
designers. Sometimes tools are developed for more generalist game devel-
opers outside of the audio team, which brings different challenges.
But why do we need to develop audio tools at all? If you are an audio
programmer working in a game studio, you probably already have access
to the audio middleware tools of choice and the integration with a third-
party or proprietary game engine. Most middleware products come with
default tools that you can use “out of the box,” so you might think that you
are good to go for production. But the reality is that there are always ways
of improving the process of your designers and helping them be produc-
tive. The majority of game projects benefit from having dedicated tools
developers because the production needs for a mid- and large-size game
can easily fall beyond what your audio middleware provides. As an audio
programmer, it’s your responsibility to help make the designers and con-
tent creators who work on a game more productive and to ensure that the
tools they use don’t suck. Or, at least, that they don’t suck too much [1]!
Designers are very talented individuals who will author incredible con-
tent for a game even if the tools aren’t very good, but for us tool program-
mers and designers, that shouldn’t be acceptable. Time savings in iteration
time will help make a better game and a better experience for the players.
Moreover, if the designers enjoy using your tools, the process will be more
pleasant for them, which again will lead to having a better game. Great
audio tools are the ones designed and developed in close collaboration
with the users. These tools understand the user’s needs and aims.
This chapter gives an overview of an audio tool development process
and some technicalities, in case this is the first time that you are tasked
with writing a tool. If you are an experienced tools developer already, you
will still find the information presented here useful to improve your tools
and development process.
If you read any specialized book on tools development and user expe-
rience design (“Designing the User Experience of Game Development
Tools” by David Lightbown [2] is a good one), you will see that good tools
usually outlive the projects they are built for. A tool that works well may
be employed in the development of not just one game, but an entire game
franchise, which is economical, as game development timelines are usually
tight and need to be optimized as much as possible. This is particularly
true in larger projects or in game franchises that release a new title every
few years. Tools that are meant to be used by content creators determine
the velocity at which a team can create quality content.
Good tools support the adoption of new audio techniques. You may
incorporate, for instance, a novel algorithm or an old approach that is now
affordable to solve a certain production problem with current hardware
In that poll, workflow won the majority of the votes with 76.8%. I believe the results of the poll reflect the
realities of game development, as having a larger number of features won’t
make your tool great if the workflow is clunky, or if it takes a long time for
users to perform trivial tasks or iterate on their work.
Understanding your users’ problems and desires takes time; it’s a learn-
ing process from both sides – your users also need to understand how the
tool could work and get back to you with improvements or preferences in
the way they want it to function. This leads to iteration on the original tool
idea and workflows – sometimes it won’t be possible to know if the solu-
tion that the tool provides is a good one until it reaches the hands of your
users and they start using it.
One common way to structure a tool is a Model-View-Controller style separation (a small sketch follows this list):

1. Model – Represents user and app data and potentially does some
business logic with the data. This could be a database connection,
files on disk, or data structures in memory.
2. Controller – Accepts input and converts it to commands for the
Model or the View.
3. View – Carries out the heavy lifting of rendering or representation of
information and is responsible for displaying the data.
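A minimal sketch of that separation, with purely illustrative names (a real tool would lean on its UI framework's own binding and event mechanisms):

#include <string>
#include <vector>

// Model: the data being edited.
struct SoundBankModel {
    std::vector<std::string> eventNames;
};

// View: displays the data; the actual drawing calls belong to whatever
// UI framework the tool uses.
class SoundBankView {
public:
    void Render(const SoundBankModel& model) {
        for (const std::string& name : model.eventNames) {
            // draw one row per event here
            (void)name;
        }
    }
};

// Controller: turns user input into changes on the model.
class SoundBankController {
public:
    explicit SoundBankController(SoundBankModel& model) : model(model) {}
    void OnAddEventClicked(const std::string& name) {
        model.eventNames.push_back(name);
    }
private:
    SoundBankModel& model;
};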
• Storing the entire contents of the data after each edit. This takes up
more memory while the application is running, but it is a simple
approach that may work well with the application framework. It
could also potentially be faster as it won’t need to recalculate data
transformations or use alternative data caches.
• Only store the actions carried out by the user. This way we don’t need
to store any intermediate data that might be quite big. The downside is
that the tool architecture becomes more complex, needing to handle
things like the history of changes and the dependencies across them
(e.g., the order of transformations in data when, for instance, pro-
cessing is applied to audio). This is the preferred approach for tools
that are expected to grow, as it also helps us to manage complexity (a minimal sketch follows this list).
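Here is what the action-based approach can look like using the command pattern [17]; the class and member names are illustrative, and a real tool would also add a redo stack and merging of adjacent edits:

#include <memory>
#include <stack>
#include <vector>

// Each edit knows how to apply and revert itself, so the undo history
// only stores actions, not copies of the data.
class IEditAction {
public:
    virtual ~IEditAction() = default;
    virtual void Apply() = 0;
    virtual void Revert() = 0;
};

class UndoStack {
public:
    void Do(std::unique_ptr<IEditAction> action) {
        action->Apply();
        history.push(std::move(action));
    }
    void Undo() {
        if (history.empty()) return;
        history.top()->Revert();
        history.pop();
    }
private:
    std::stack<std::unique_ptr<IEditAction>> history;
};

// Example action: a reversible gain change on a buffer of samples.
// (In practice an exactly reversible representation, or a cached copy
// of the affected samples, avoids floating-point drift.)
class ApplyGainAction : public IEditAction {
public:
    ApplyGainAction(std::vector<float>& samples, float gain)
        : samples(samples), gain(gain) {}
    void Apply() override { for (float& s : samples) s *= gain; }
    void Revert() override { for (float& s : samples) s /= gain; }
private:
    std::vector<float>& samples;
    float gain;
};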
It’s also good practice to find out what tech the game and non-audio
tools use and use the same technology if you can. This could simplify the
development process later as you can leverage the knowledge and skills of
other teams.
ACKNOWLEDGMENTS
I would like to thank Simon N Goodwin for providing feedback, correc-
tions, and comments on the early drafts of this chapter.
NOTE
1 Hopefully automated steps!
REFERENCES
1. https://www.gamedeveloper.com/production/the-6-reasons-your-game-
development-tools-suck
2. Designing the user experience of game development tools (David Lightbown,
CRC Press)
3. https://twitter.com/markkilborn/status/1499838812477919234
4. Modern software engineering (David Farley, Addison-Wesley Professional)
5. Software Engineering at Google: Lessons learned from programming over
time (Titus Winters, Tom Manshreck & Hyrum Wright, O’Reilly)
6. https://en.wikipedia.org/wiki/SOLID
7. https://en.wikipedia.org/wiki/Model-view-viewmodel
8. Fundamentals of Software Architecture (Mark Richards and Neal Ford,
O’Reilly Media)
9. https://developers.google.com/protocol-buffers
10. https://uscilab.github.io/cereal/
11. https://en.wikipedia.org/wiki/Windows_Presentation_Foundation
12. https://github.com/naudio/NAudio
13. https://juce.com/
14. https://en.wikipedia.org/wiki/Qt_(software)
15. https://github.com/ocornut/imgui
16. https://en.wikipedia.org/wiki/Observer_pattern
17. https://en.wikipedia.org/wiki/Command_pattern
18. https://balsamiq.com/
19. https://www.figma.com/
Chapter 16
Audio Debugging
Tools and Techniques
Stéphane Beauchemin
16.1 INTRODUCTION
Can you hear the difference between a sound playing at –4.3 dB and –3.9 dB?
Are you able to notice that a sound emitter is panned at 47 degrees from
the center, but it should be panned at 54 degrees? Are there three or four
instances of the same audio event playing? Let’s face it, it is hard to answer
those questions relying strictly on our ears. It is much easier to debug
audio by augmenting our hearing sense with graphical data. This fact is
not new to any audio programmer, but how do we go about implementing
these visualizations?
In this chapter, I would like to bring the spotlight on audio debugging
tools and techniques. First, I will go over the importance of creating and
using debugging tools as early as possible in the development cycle. Then
we will look at the different debugging tools that are available in Unreal
5 and the Wwise integration. Using UE5 and Wwise, we will show a few
code examples of debug utilities that can be created.
After that, we will jump into using the Dear ImGui library with Unreal
5 and Wwise. In that section, we will go through an overview of ImGui,
then we will briefly look at the ecosystem of tools that were built by the
community around ImGui. Finally, we will show some fine and concise
code examples of ImGui and show why it is such a great library for game
audio development. This chapter is not solely intended for people using
UE5, Wwise, and ImGui. The concepts that are presented here can be
easily ported to another game engine or audio engine. If you don’t have
ImGui, know that it can easily be integrated into any game engine!
In the game, you can input console commands from the console input
in Unreal by typing the backtick (or grave accent) character: `. At the bot-
tom of the Output Log window in the editor, you will find an input box
where you can type console commands. They can also be passed as com-
mand executable arguments using the following syntax:
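(The original syntax listing is not reproduced here; the example below is a reconstruction that assumes Unreal's standard -ExecCmds switch and a console variable name inferred from the usage that follows.)

MyGame.exe -ExecCmds="Audio.ShowReflectionDebug 1"

The console variable itself is declared as a static variable somewhere in the code; a plausible declaration, with an illustrative default and help string, is:

static TAutoConsoleVariable<int32> CVarAudioShowReflectionDebug(
    TEXT("Audio.ShowReflectionDebug"),
    0,
    TEXT("Set to 1 to draw 3D debug information for audio reflections."));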
Then, the static variable can be used in the source code to condition-
ally show the debug code.
if (CVarAudioShowReflectionDebug.GetValueOnGameThread())
{
// 3d debug draw code here
}
We can combine this with a second console variable – a string filter on the component name (CVarAkCompNameFilter, declared the same way but holding an FString) – so that the debug drawing only runs for components whose names match the filter:

if (CVarAudioShowReflectionDebug.GetValueOnGameThread() &&
    (CVarAkCompNameFilter.GetValueOnGameThread().Len() == 0 ||
     GetName().Contains(CVarAkCompNameFilter.GetValueOnGameThread())))
{
    // 3d debug draw code here
}
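Another handy command toggles a Wwise output capture. The chapter's listing is not reproduced here; the sketch below assumes the AK::SoundEngine::StartOutputCapture() and StopOutputCapture() APIs and an illustrative command name, and the string handling may need adjusting for your platform's AkOSChar type.

static FAutoConsoleCommand CCWwiseOutputCapture(
    TEXT("Wwise.OutputCapture"),
    TEXT("Pass a wave file name to start an output capture. ")
    TEXT("Pass no parameter to stop the capture."),
    FConsoleCommandWithArgsDelegate::CreateLambda(
        [](const TArray<FString>& Args)
        {
            if (Args.Num() > 0)
            {
                // AkOSChar is wchar_t on Windows, so *Args[0] can be
                // passed directly there; other platforms may need a
                // conversion helper.
                AK::SoundEngine::StartOutputCapture(*Args[0]);
            }
            else
            {
                AK::SoundEngine::StopOutputCapture();
            }
        }
    ));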
In this example, the audio capture will be started when you pass a parameter.
The parameter is the name of the wave file that will be written to disk. When
there are no parameters that are passed, the output capture will be stopped.
With this little snippet of code, you have a practical debug tool that will create
a wave file on disk containing the data that is passed to the soundcard.
16.3.4 Breakpoint on Audio Events
Another cool trick to do with console commands is to add a breakpoint
when a particular audio event is being called:
static uint32 sBreakEventID = AK_INVALID_UNIQUE_ID;

static FAutoConsoleCommand CCWwiseBreakOnEventName(
    TEXT("Wwise.BreakOnEvent"),
    TEXT("Pass the event name to break on. ")
    TEXT("Pass no parameter to reset"),
    FConsoleCommandWithArgsDelegate::CreateLambda(
        [](const TArray<FString>& Args)
        {
            if (Args.Num() > 0)
            {
                IWwiseSoundEngineAPI* AudioEngine =
                    IWwiseSoundEngineAPI::Get();
                check(AudioEngine);
                sBreakEventID = AudioEngine->GetIDFromString(*Args[0]);
            }
            else
            {
                sBreakEventID = AK_INVALID_UNIQUE_ID;
            }
        }
    ));
Then, in the function through which all audio events are posted – UAkAudioEvent::PostEvent() in this example – we check the ID and break:

if (sBreakEventID == GetShortID())
{
    UE_DEBUG_BREAK();
}
In order for this technique to work, all post-audio event calls must be
routed through the same function – UAkAudioEvent::PostEvent() in this
example. If this is not the case, it is a good idea to spend some time refac-
toring the code so that you can use this and similar techniques. Although
this example code is specific to Unreal and Wwise, the concept of setting
a breakpoint programmatically on an audio event can be implemented on
any game engine or sound engine.
This example uses the UE_DEBUG_BREAK() macro, which is available if
you are using Unreal. If you are not using Unreal, then you will need to
replace that with a different mechanism. Each platform has a different
function that needs to be called in order to trigger a breakpoint in the
code. For example, on Windows using Microsoft C++, the function to call
is debugbreak().3
Hopefully, this section has given you ideas on how to take advantage of
console commands. They are great for programmers – in just a few lines
of code, we create nifty little debug tools. However, as programmers, we’re
used to interacting with text and commands. For non-programmers, con-
sole commands can be inconvenient and intimidating. In the next section,
we will look at the friendliest possible debug tool: ImGui.
ImGui has a wide range of available widgets: text input, button, check
box, combo box, slider, list view, plot widget, etc. It is easy to create win-
dows, and it handles mouse input, keyboard input, and even gamepad
input. It has enough functionality that it would almost be possible to use
ImGui for the UI of an operating system. The library has a tiny footprint:
without the examples and documentation, the library is composed of less
than a dozen of source files. With the fact that there are no dependencies,
the library is very easy to integrate into any game engine.
FIGURE 16.1 ImGui audio component debug display, filtering for “ambient”
objects and showing 3D positions.
#include <imgui.h>
void Draw()
{
ImGui::Begin("Audio Component Debug");
ImGui::BeginChild(
"Child AkComponents",
ImVec2(ImGui::GetContentRegionAvail().x,
ImGui::GetContentRegionAvail().y - 20.f),
false, WindowFlags);
FString ComponentName;
It->GetAkGameObjectName(ComponentName);
auto Name = StringCast<ANSICHAR>(*ComponentName);
if (Filter.PassFilter(Name.Get()))
{
ImGui::Text("%s", Name.Get());
DrawDebugCone(It->GetWorld(), Origin,
Direction, 45.0f, 0.20f, 0.3f, 8,
FColor::Blue, false, -1.f, 255);
DrawDebugString(It->GetWorld(), Origin,
ComponentName, nullptr,
FColor::Blue, 0.00001f);
}
}
ImGui::EndChild();
}
ImGui::End();
}

FIGURE 16.2 Contextual menu for the audio component debug window.
The code is straightforward and accomplishes the job. This example has
some simplifications that real code will have to handle in a more robust
fashion. For example, it uses GWorld, which does not support multiplayer
configurations. A full-blown audio component debugger would require
more code – but not that much!
if (ImGui::BeginPopupContextItem(Name.Get()))
{
    if (ImGui::Selectable("Game object ID to clipboard"))
    {
        FString ClipBoard =
            FString::Printf(TEXT("%llu"), It->GetAkGameObjectID());
        ImGui::SetClipboardText(StringCast<ANSICHAR>(*ClipBoard).Get());
    }
    if (ImGui::Selectable("Ak Component ptr to clipboard"))
    {
        FString ClipBoard = FString::Printf(TEXT("(UObject*)0x%p"), *It);
        ImGui::SetClipboardText(StringCast<ANSICHAR>(*ClipBoard).Get());
    }
    ImGui::EndPopup();
}
When the user right-clicks on a component name, they will see two
options: “Game object ID to clipboard” and “Ak Component ptr to clip-
board.” Copying the game object to the clipboard will be useful in our next
example. Copying the game object as a pointer is a great trick for the pro-
grammer: pause the execution of the game in your debugger and paste the
string in your variable watch window. This is a huge time saver: now you
can inspect variables that are not shown by the debug UI.
#include <imgui.h>

void DrawRTPCWindow()
{
    ImGui::Begin("Audio RTPC Debug");

    AkRtpcValue RtpcValue;
    AK::SoundEngine::Query::RTPCValue_type ValueType =
        AK::SoundEngine::Query::RTPCValue_GameObject;
    FAkAudioDevice::Get()->GetRTPCValue(
        StringCast<WIDECHAR>(RtpcName).Get(),
        GameObjID,
        AK_INVALID_PLAYING_ID,
        RtpcValue,
        ValueType);
    RtpcValues[Idx] = RtpcValue;

    char Overlay[32];
    sprintf_s(Overlay, "Current value %0.1f", RtpcValue);

    Idx++;
    Idx %= IM_ARRAYSIZE(RtpcValues);

    ImGui::End();
}
In this window, the RTPC name and the Game Object ID need to be
manually entered (or pasted from the clipboard if you have implemented
the previous audio component window). The reason behind this choice
is to decouple the two IMGui examples. However, in a more full-featured
game audio debug tool, the inputs to the RTPC plot widget could be inte-
grated directly with the rest of the debug user interface.
16.5 CONCLUSION
When it comes to audio debugging, using the right tools is one of the most
important details. With the right tools, you now know that it is easy to add
debugging functionality to the systems you create, so you will spend very
little time creating debug tools because it is easy and fast. Since it is easy
and fast to create debug tools, you do it from the beginning of the devel-
opment cycle. Consequently, you can build stable audio systems in a way
that is very efficient without losing time to debug. Since you are now a very
efficient audio programmer, you now have a lot of time to focus on what
you really enjoy doing: audio programming!
NOTES
1 https://docs.unrealengine.com/5.2/en-US/console-varaibles-cplusplus-in-
unreal-engine/ (Note that the typo “varaibles” is the correct URL. - Ed)
2 As of this writing, the most recent Wwise integration version is
2022.1.7.8290.2779.
3 There is a proposal working its way through the ISO C++ committee to add
debugging support functions to the standard library. You can read the lat-
est version of the proposal document (which is currently targeting inclu-
sion in C++26) at https://wg21.link/p2546. If this proposal does make it into
the C++ standard, then you will be able to trigger a breakpoint by calling
std::breakpoint(). – Ed.
4 https://github.com/ocornut/imgui
5 https://github.com/segross/UnrealImGui
6 https://github.com/sammyfreg/netImgui
Chapter 17
Automatic Manual
Foley and Footsteps
Pablo Schwilden Diaz
17.1 INTRODUCTION
When we think about audio for an animated game, be it 2D or 3D, one
of the first things that comes to mind is the movement of characters and
their audio representation: foley and footsteps. Unless playing from a very
far perspective, these sounds will tend to be ever-present. There are two
usual ways of implementing those sound triggers. The first one is to manu-
ally place tags, markers, or events on each animation timeline to trigger
sounds on certain frames of the animation. The second one is to analyze
movement at run-time while the game is playing and detect when a foot
hits the ground or a limb moves quickly to trigger a sound at that moment.
If we are willing to burn through an army of juniors and interns, then there is technically “no
problem,” but this isn’t really a satisfying or sustainable approach to the
challenge at hand.
The second solution, analyzing movement at run-time and triggering
sound on the fly, would solve the issue of manual labor. However, here it
implies completely giving artistic control to the analysis and creating a tool
that would work in the myriad of different scenarios that are involved.
There’s always going to be that one scene where the movement is unusual
or complex and the analysis will throw sounds at the wrong moment or
intensity, breaking the scene. Creating a tool that models real-life move-
ment and impact sounds is theoretically achievable but would again
require an enormous amount of effort from the audio programmers to get
“just right.”
One alternative solution is to go back to the cinema way of doing things:
recording all the foley and footsteps in the studio, in sync with the image.
However, not only would this require a lot of work, it would also break
anytime we bring interactivity back into the game. This isn’t a satisfying
solution either.
The first solution, manually placing tags, entrusts us with the full 100% of the work.
The second traditional solution, triggering sounds at run-time, takes away
all control and we end up with 0% of the creative work, pouring all our
energy into making a system that gets it right 100% of the time. Seeing it
laid out with these numbers, what we want becomes clearer: a system that
takes care of the boring 80% of the work without taking away the 20% we
really like.
This really starts to sound like the work of an intern who makes a first
pass on everything before the senior sound designer comes to check and
correct mistakes. So, with this in mind let’s see if we can create a system
that “simulates an intern.”
If we know the name of the bone representing the feet, we can tell the
computerized intern that it’s the object we want to track. For that, we will
create an array that holds the names of each bone that the tool needs to
follow.
/.../
/.../
/.../
//We crawl bones up the hierarchy until we reach the root, which has
//no parent, composing each parent's local transform as we go.
//(Adjust the multiplication order to your engine's conventions.)
while(currentBone->parentBone != nullptr)
{
    currentBone = currentBone->parentBone;
    result = result * anim->GetBoneTransform(currentBone, time);
}

return result;
This transform now represents the foot bone in space relative to the root
of the animation skeleton at every frame in the animation.
/.../
//We assume the animation always starts with the feet on the
//ground.
bool planted = true;
planted = isPlanted;
}
}
With this code, we should now be able to detect every time the feet touch
the ground. Obviously, some footsteps might not be detected correctly
depending on the values you use for the ground height and the deplant
height, so you might need to tweak those to get good results. However, let’s
not forget that we are simulating an intern here, which means it’s not cata-
strophic if some very specific or weird footsteps aren’t detected. We aren’t
creating the perfect tool that will detect all footsteps; we are creating a tool
that will relieve us from the biggest, most obvious chunk of the work.
Now that we can detect footsteps, we must do something with them. For
ease of code, we will just store them in an array for later use. We will use a struct
to store the info of each footstep. Our struct includes the footstep strength
here, which we do not yet have, but we will get that information shortly.
How do we get the strength of the footstep? The easiest way is to get the
speed at which the foot was traveling just before hitting the ground. The
faster it is going, the harder the footstep. As we already have the location
of the feet at any given time, deducing its speed is not complex. We could
calculate the speed on the last frame before the foot touched the ground,
but we found out with practice that it was closer to human perception to
take the average speed over a longer time period, so in our example we are
going to take the speed over the past 100 milliseconds.
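In code, that can be as simple as sampling the foot's root-relative position at the current time and 100 milliseconds earlier, then dividing the distance by the window. The names below are illustrative, and Length() stands in for whatever vector-length helper your math library provides:

//Average foot speed over the 100 ms leading up to this frame.
float speedWindow = 0.1f;
Transform pastTransform = GetRootToBoneTransform(
    animationFile, footBone, time - speedWindow);
Transform currentTransform = GetRootToBoneTransform(
    animationFile, footBone, time);
float speed = (currentTransform.GetLocation() -
    pastTransform.GetLocation()).Length() / speedWindow;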
Putting it all together, our first part of the code looks like this:
struct FootstepsData
{
    string boneName; //Name of the bone that did this footstep
    float time;      //Time of the footstep in the animation
    float strength;  //How hard the foot hit the ground
};
//We assume the animation always starts with the feet on the
//ground.
bool planted = true;
{
Transform footTransform = GetRootToBoneTransform(
animationFile, footBone, time);
newFootstep.boneName = footBone.name;
newFootstep.time = time;
newFootstep.strength = speed;
footstepsData.Add(newFootstep);
planted = isPlanted;
}
}
//This method gets the position of a bone relative to the root of the
//animation. It is pseudo-code inspired from an Unreal implementation.
Transform GetRootToBoneTransform(
AnimationFile* anim, Bone* bone, float time)
{
//This transform is relative to the parent bone
Transform result = anim->GetBoneTransform(bone, time);
Bone* currentBone = bone;
//We crawl bones up the hierarchy until we reach the root, which has
//no parent, composing each parent's local transform as we go.
//(Adjust the multiplication order to your engine's conventions.)
while(currentBone->parentBone != nullptr)
{
    currentBone = currentBone->parentBone;
    result = result * anim->GetBoneTransform(currentBone, time);
}

return result;
We could also compute some overall measure of “body movement” and derive a foley sound to accompany it. All of these
would be very interesting, but let’s remember we are simulating our
model intern’s editing, not real life. We only have to make it work 80%
of the time.
The approach we took when faced with this challenge was a very blunt,
simple, and naïve one. As the footsteps were defined by the movement of
the lower part of the character, we thought that what needed the most to
have foley was the upper part of the character: the arms in particular. The
basic movements we wanted to be able to highlight were people waving
at each other, shaking hands, or grabbing objects – all of which mostly
involve movement of the arms.
To implement this functionality, we analyzed the individual move-
ment of the wrists and triggered foley based on it. More precisely,
instead of looking at the location of each wrist on a given frame, we
calculated its movement speed at every frame. Whenever this speed
would go above a manually set threshold, our “intern” tool would place
a marker.
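In pseudocode, the wrist analysis looks much like the footstep detection,
only simpler. The threshold value and the AddFoleyMarker call below are
placeholders for whatever your marker system provides, and triggering only
on the upward crossing of the threshold (so one gesture yields one marker)
is a choice made for this sketch:
//Measure how far the wrist moved since the previous frame, convert
//that to a speed, and drop a foley marker whenever the speed crosses
//the threshold.
Transform previousWrist = GetRootToBoneTransform(
    animationFile, wristBone, 0.0f);
bool aboveThreshold = false;
for(float time = frameTime; time < animationLength; time += frameTime)
{
    Transform currentWrist = GetRootToBoneTransform(
        animationFile, wristBone, time);
    float speed = Length(currentWrist.position -
        previousWrist.position) / frameTime;
    if(speed > foleySpeedThreshold && !aboveThreshold)
    {
        AddFoleyMarker(wristBone.name, time, speed);
    }
    aboveThreshold = speed > foleySpeedThreshold;
    previousWrist = currentWrist;
}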
It wasn’t perfect, but again we weren’t targeting perfection. These
markers worked for most scenes, where the movements weren't too specific;
for the remaining scenes, we simply added extra markers by hand.
Putting the detection and the notify creation together, the core of our
Unreal implementation looks like this (supporting code is omitted and
marked with comments):
void UAudioAnimationTools_Widget::AutoGenerateFootstepEvents(
    UAnimSequence* anim,
    TMap<UAnimNotify*, FootstepsData>& createdAnimNotifiesMap)
{
    createdAnimNotifiesMap.Empty();
    if(anim == nullptr)
    {
        UE_LOG(LogTemp, Log, TEXT(
            "The animation file passed to the AutoGen doesn't exist"));
        return;
    }
    //skeleton is retrieved from the animation in code omitted here
    if(skeleton == nullptr)
    {
        UE_LOG(LogTemp, Log, TEXT(
            "The animation file's skeleton doesn't exist"));
        return;
    }

    TArray<FootstepsData> footstepsDataArray;
    footstepsDataArray.Empty();

    //For each foot bone... (loop header and per-bone setup omitted;
    //footName, planted, and plantedSinceTime come from that code)
    {
        //For each sampled frame... (loop header and the sampling of
        //pastFootTransform, speed, and time omitted)
        {
            FTransform pastRootTransform;
            AnimationUtilities::GetWorldToRootTransform(
                /* arguments omitted */);
            //Bring the foot location out of bone space by applying the
            //root transform, so root motion is accounted for.
            FVector pastFootLocation =
                pastRootTransform.TransformPosition(
                    pastFootTransform.GetLocation());

            //If the foot has just planted... (condition omitted)
            {
                FootstepsData& footstep =
                    footstepsDataArray.AddDefaulted_GetRef();
                footstep.boneName = footName;
                footstep.strength = speed;
                footstep.time = time;
            }
            else if (!planted)
            {
                plantedSinceTime = -1.0f;
            }
        }
    }

    //For each footstep in footstepsDataArray... (loop header and the
    //creation of the UAnimNotify object notify omitted)
    {
        FAnimNotifyEvent notifyEvent;
        notifyEvent.NotifyName = "AutoGen_Footstep";
        anim->Notifies.Add(notifyEvent);
        createdAnimNotifiesMap.Add(notify, footstep);
    }

    anim->MarkRawDataAsModified();
    anim->Modify(true);
    anim->RefreshCacheData();
}
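How this function gets invoked depends on how you build your editor UI. As
one hypothetical sketch (the handler name and the SelectedAnimation member
are assumptions, not part of the tool shown above), a native button handler
on the same widget could call it and report what was generated:
void UAudioAnimationTools_Widget::OnGenerateFootstepsClicked()
{
    TMap<UAnimNotify*, FootstepsData> createdNotifies;
    AutoGenerateFootstepEvents(SelectedAnimation, createdNotifies);
    //The returned map lets the widget report, inspect, or further edit
    //the notifies it just created.
    UE_LOG(LogTemp, Log, TEXT("Generated %d footstep notifies"),
        createdNotifies.Num());
}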
17.12 CONCLUSION
With this tool, we can now handle a lot of animation files without compro-
mising our sound designers’ happiness or the quality of the work. Sound
designers can now concentrate on what they do best: using their ears
to create the best sounding game ever, rather than mindlessly scrolling
through long animation files frame by frame.
We have also touched on the idea of creating imperfect but useful tools.
By thinking of the tool as an “automatic intern,” we lessened the burden
of both the audio programmers creating the tool and the sound designers
using the tool. We managed to create a tool that does exactly what a sound
designer would expect, is easily maintainable, and sets reasonable
expectations. This is a concept that we now try to apply to all of our audio
tools development: "How would an intern do it?"
NOTE
1 Well… 99% the speed of light. We can’t actually reach the speed of
light. – Ed.