Game Audio Programming 3: Principles and Practices
Edited by
Guy Somberg
First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of
their use. The authors and publishers have attempted to trace the copyright holders of all material
reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write
and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted,
reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means,
now known or hereafter invented, including photocopying, microfilming, and recording, or in
any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access
www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please
contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Typeset in Minion
by codeMantra
Preface, xi
Acknowledgments, xv
Editor, xvii
Contributors, xix
INDEX, 251
Preface
INTRODUCTION
Welcome to the third volume of Game Audio Programming: Principles and
Practices! It is always exciting to me when these books come together. So
many audio programmers pour their expertise onto the page, and I get the
privilege of collecting that knowledge into one place that can be shared
with the world. As an added bonus, I get to be the first one to read the
chapters and to learn from them. As with all of the books in this series,
all of the contributors worked very hard, and this volume would not exist
without their hard work and dedication.
Game audio programming is a job that requires many layers of exper-
tise. We have to be tools programmers, pipeline managers, build engi-
neers, experts in using our chosen middleware (both tools and APIs), and
more. We need to have an understanding of audio design and the tools,
techniques, and terminology that they use. There are many fundamental
techniques that apply globally, but each game genre has its own set of spe-
cific challenges that require their own distinctive solutions. Small wonder,
then, that there is always more to write about and always more to learn.
THIS BOOK
The chapters in this book touch on only some of these techniques that are
critical to our jobs. Some of them dive deep into a topic and provide spe-
cific solutions to individual problems, while others are broad overviews of
a subject and provide a concept of the scope of the problem and the kinds
of questions you may need to ask.
Here are brief summaries of all of the chapters in this book:
• Sound Effect Categories by Florian Füsslin—This chapter describes the
categories that sound effects live in and the sorts of features that will
be necessary in order to feed the content. It is intended as a starting
point and a common frame of reference in creating a dialog between
the audio designers and the audio programmers.
• Complex Numbers: A Primer for DSP Programming by Robert
Bantin—Complex numbers show up a lot in DSP programming,
but they can feel a little bit mystical without the proper background.
This chapter uses an alternating current as a practical demonstra-
tion of how complex numbers are useful in solving real-world prob-
lems. With this primer, the math of DSPs will be more approachable.
• Building Dynamic Analog-Style Filters: Bi-Quadratic Cascades
vs Digital Integrator Cascades by Robert Bantin—The ubiqui-
tous biquad filter and the “cookbook” formulas by Robert Bristow-
Johnson have been implemented in innumerable audio engines,
and they work well for the most part. However, they exhibit some
undesirable properties when their parameters are adjusted rapidly.
Digital Integrator Cascades, a technique by Hal Chamberlin, are an
alternative to the biquad filter that have some better properties for
this particular purpose.
• Modeling Atmospheric Absorption with a Low-Pass Filter by Nic
Taylor—Attenuation settings are often implemented in games as a
set of parameters, including a low-pass filter to model atmospheric
absorption. This chapter explores using the atmospheric features of
air temperature and humidity to provide a systematic way of setting
a low-pass filter cutoff for sound propagation.
• Software Engineering Principles of Voice Pipelines by Michael
Filion—Managing and delivering voice lines can be among the most
complex processes in a game project. This chapter takes a tour and
overview of the sorts of challenges that you are likely to run into and
some of the questions that you will need to ask.
• A Stimulus-Driven Server Authoritative Voice System by Tomas
Neumann—Spoken words are often at the core of why players con-
nect and relate to the characters within a video game. This chapter
presents some techniques for creating a voice system based on an
authoritative server that directs which lines are chosen, picks who
says something, and which client should play these lines.
PARTING THOUGHTS
My first job out of college was writing a software mixer for a slot machine
operating system, and that is where I got my first taste of (and love for)
audio programming. As I transitioned to working on video games, I
learned some good ways of solving game audio problems and some bad
ways, but I never had a resource like this book or the previous two vol-
umes. I wish I had it at the time, but I am excited that it exists now. I hope
that it is inspiring, educational, and valuable.
Guy Somberg
Acknowledgments
Thanks to my contributors. Books like this don’t exist without your hard
work and expertise, and your determination to write it all down.
Thanks to Brian Fitzgerald, David Brevik, and Tyler Thompson, who
all gave me chances to prove myself and from whom I have learned a lot.
Thanks again to Thomas Buckeyne, who started me on my audio pro-
gramming journey.
Thanks to David Steinwedel, who was with me on my first big game
title, and whose partnership and friendship were both instrumental in
cementing my love of game audio programming.
Thanks once again to David Steinwedel, Jordan Stock, Andy Martin,
Pam Aranoff, Michael Kamper, Michael Csurics, and Erika Escamez—the
sound designers who have accompanied me on my audio programming
journey.
Thanks to Rick Adams, Jessica Vega, and the rest of the team at CRC
Press. I appreciate all of your hard work on my behalf in making this book
a reality.
And thanks to my wife Emily who is always helpful and supportive of
my work on this book.
Editor
Guy Somberg has been programming audio engines for his entire
career. From humble beginnings writing a low-level audio mixer for slot
machines, he quickly transitioned to writing game audio engines for all
manner of games. He has written audio engines that shipped AAA games
like Hellgate: London, Bioshock 2, The Sims 4, and Torchlight 3, as well
as smaller titles like Minion Master, Tales from the Borderlands, and
Game of Thrones. Guy has also given several talks at the Game Developer
Conference, the Audio Developer Conference, and CppCon.
When he’s not programming or writing game audio programming
books, he can be found at home reading, playing video games, and play-
ing the flute.
Contributors
Florian Füsslin had a 10-year music background when entering the game
industry with Crytek in 2006. During the past 14 years, he has contributed
to the audio pipeline of CRYENGINE and shipped all major Crytek titles
on multiple platforms, including the Crysis Franchise, Ryse: Son of Rome,
the VR titles The Climb and Robinson, and HUNT: Showdown. Being a ded-
icated gamer and living the passion for game audio, he is leading the audio
team in the role of an Audio Director. He is lecturing at the Hochschule
Darmstadt (h_da) and School of Audio Engineering in Frankfurt (SAE),
and has given talks at multiple international conferences.
Jon Mitchell has worked as an audio programmer for United Front Games,
Radical Entertainment, and Codemasters, and is currently working with
the wonderfully talented and friendly people at Blackbird Interactive on
Homeworld 3. He lives in Vancouver with his partner, two destructive
cats, and the World’s Cutest Baby.
Michael Filion has been developing video games for his entire career of
more than 10 years with Ubisoft Québec, with the majority in the world
of audio. When explaining his work and passion to friends and family,
he often oversimplifies by stating that he is “responsible for ensuring the
bleeps and bloops are functional in video games.” He has had the opportu-
nity to work with many talented people on games such as Assassin’s Creed,
Child of Light, and Tom Clancy’s The Division. In between delivering great
titles, he enjoys traveling with his daughter and searching out different
craft brews from around the world.
Robert Bantin has been writing audio code for rather a long time. While
at school, he was an active member of the Amiga demo scene. At Salford
University, he studied acoustics and brought his coding experience to his
studies in the form of DSP and audio-focused applications. Upon graduat-
ing, he was recruited by Philips ASA Labs in Eindhoven in order to join
the MPEG technology program. After returning to the UK, he worked
at Thorn-EMI and brushed with their spin-off game audio middle-
ware: Sensaura GameCODA. He also worked at Yamaha and FXpansion
on several well-known DAW plug-ins, as well as writing some of Auro
Technologies’ first shippable code. Robert has since worked on a num-
ber of AAA games such as Guitar Hero Live, Dirt 4, and Tom Clancy’s
The Division 2. When he’s not programming, he can be found at home
building flying models with his son, attempting to shred on guitar, and
playing video games when no one is looking.
Robert Gay has been working in games ever since he graduated from
the University of Washington with a Bachelor of Science in Electrical
Engineering and a Bachelor of Fine Arts in Digital Arts & Experimental
Media. Starting as a Sound Designer in 2010 and then moving to being a
Technical Sound Designer, he finally moved to audio programming full
time while working at ArenaNet. Since then, he has worked at Amazon
as a Lead Game Audio Programmer and is currently a Senior Audio
Programmer working at Epic Games on the Unreal Engine.
Chapter 1
Sound Effect Categories
Florian Füsslin
Crytek GmbH
CONTENTS
1.1 Preamble 2
1.1.1 Interactive Media 2
1.1.2 The Big Three 3
1.2 The World 4
1.2.1 Environment 4
1.2.2 Weather 5
1.2.3 Particle Effects 5
1.2.4 Physics 6
1.3 Characters 7
1.3.1 Movement 7
1.3.2 Interactions 8
1.4 Feedback 9
1.4.1 Menu 10
1.4.2 Interface 10
1.4.3 Experience 10
1.5 Wrap-Up 11
1.5.1 Sound Effects Category Check List 11
1.6 Conclusion 12
1.1 PREAMBLE
In the last two years, I have lectured on the subject of Game Audio at
various universities and audio schools. While giving those talks, I real-
ized that I ended up explaining terminology in greater detail than I had
anticipated. The general split of game audio production into dialog, music,
and sound effects makes sense to everyone. When I tried to break it into
smaller pieces, however, there were many follow-up questions on sound
effects in particular. Some people had either never heard of some sound
effect categories or didn’t associate anything with them. I consulted with
colleagues in the game industry about this observation, and it turns out
that even within this group of audio specialists, definitions and terminol-
ogies of sound effect categories vary. This chapter felt like a great oppor-
tunity to tackle the topic, and provide an overview of a potential project
structure, common complexity, and possible challenges. It can function
as a basis for a nomenclature for the project naming convention, and can
build the foundation for communication and collaboration between audio
designers and audio programmers. The goal is to enable you to handle all
sound effect requirements and requests coming your way in a structured
fashion.
1.1.1 Interactive Media
Most of the terminology around sound effects has been adopted from film
and audio post production, thinking in scenes and stems. Atmospheres
set the mood and background of a scene; foley effects support on-screen
sounds to add details and enhance drama; designed special effect sounds
create emotional reactions and support the actions. We have full control
of audio in this kind of linear media environment, so all of our sound
effects will play back exactly once and can be perfectly designed, timed,
balanced, and mixed to fit that particular scene.
But because games are interactive with player input, we have a lot less
control. Therefore, we have to think in sources (where is the sound emit-
ter), situations (when is it playing), and conditions (what states it is in). We
need a lot more assets to cover all potential scenarios and multiple varia-
tions to avoid repetition. In theory, every sound could play at any time
and be the most important sound playing at that moment, which requires
a constant shift in priorities and adjustment of the mix in real time. With
this complexity, game audio needs to develop new sub-categories within
the sound effects group.
It’s easy to see how these categories can map to the sounds for games like
an FPS or an MMORPG, but they are also applicable to other genres.
For example, in a soccer game, the world is the stadium, the character is
the ball, and feedback is the situational crowd reaction. In an RTS game,
the world is the battlefield, the units are the characters, and feedback is the
information about mission status, resources, and production.
These “big three” main categories can function as a starting point for
how we structure and manage our audio data in the project or audio mid-
dleware. In a soccer game, for example, we would need specific groups
and folders for each stadium, but we could treat and structure the crowd
globally. This line of thinking works for other categories as well. If our
game will always be in sunny daylight for all levels, then we don’t need any
weather effects, and we can treat our ambiences globally with no real-time
conditions.
In another scenario, we have an open world with different environments
ranging from a dense jungle to vast deserts, extreme weather conditions,
and a complete 24-hour day/night cycle featuring all four seasons. In this
case, we will probably design and structure per environment, including
dawn, day, dusk, and night layers. We will repeat this procedure per sea-
son, and support extreme weather conditions like seasonal types of rain in
the jungle and various sandstorms in the desert. All of this must be driven
by parameters so that our environments and conditions can adapt in real
time.
There are always exceptions to the rules, and each project has dif-
ferent requirements, which is why it is important to ask the following
questions:
1.2 THE WORLD
The world represents the game environment and consists of the following
sub-categories:
1.2.1 Environment
Unless you are in space, there is always some noise. It can be a subtle
room tone, a rustling forest, or a cold mountain wind. This is called
ambience, sound bed, or atmosphere. Even if it is very subtle, it grounds
the player in the world and functions as the noise floor and threshold
from which we can build our audio mix and dynamic range. This base
layer is usually designed as a static loop or a granular loop which is
rather sparse and steady to hide its repetition. To reduce the potential
monotony, we can add details which don’t need visual support such as
blooms or falling dust.
Once we have the base loop, we can build on it. Wind gusts can help
to make the ambience feel more dynamic, ideally driven by a parameter
like wind_intensity. If our project supports a full day and night cycle, we
will want to consider sweeteners for dawn, day, dusk, and night and drive
them via a time_of_day parameter. If our game ranges across all seasons,
we might use a season parameter to provide variants for spring, summer,
autumn, and winter.
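As a minimal illustration of how these values might be fed to the audio engine each frame (the AmbienceState struct and the SetAmbienceParameter call below are hypothetical stand-ins for whatever RTPC or game-parameter API your middleware provides):

#include <cstdio>

// Stand-in for the middleware's RTPC/game-parameter call.
static void SetAmbienceParameter(const char* name, float value)
{
    std::printf("%s = %f\n", name, value);
}

struct AmbienceState
{
    float windIntensity; // 0.0 (calm) to 1.0 (storm)
    float timeOfDay;     // 0.0 to 24.0 hours
    float season;        // 0 = spring, 1 = summer, 2 = autumn, 3 = winter
};

// Push the environment state to the ambience system once per frame.
void UpdateAmbience(const AmbienceState& state)
{
    SetAmbienceParameter("wind_intensity", state.windIntensity);
    SetAmbienceParameter("time_of_day", state.timeOfDay);
    SetAmbienceParameter("season", state.season);
}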
In addition to the ambiences, we can use spot effects for positional
details to the environment. This can be a constant emitter like a waterfall,
a generator, or a windmill, or it can be randomly occurring like an insect
flyby, wind gust, or distant rumble. Ideally, there is a corresponding visual element in the world for each of these spot effects.
1.2.2 Weather
Weather can be a big part of the perceived environment and ambience.
Due to its complex nature and the strong visual component, it makes
sense to treat it as a separate sub-category. Rain, for example, can range
from a sprinkle to a thunderstorm to a full-blown hurricane and will need
to blend between these via a parameter like rain_intensity. Wind can go
from a gentle breeze to a full storm all the way to a tornado, again driven
by a wind_intensity value.
If our weather simulation is dynamic, we also have to consider the time
before and the time after the effect. For example, a thunderstorm usu-
ally starts with gusts of wind which increase in intensity. Then they sud-
denly stop before the rain starts to fall with a couple of big drips before
the shower begins. Eventually the wind increases again during the rainfall
before it quiets down again. Finally, there is the aftermath. The ground is
wet with puddles, rivulets, and small streams. It is dripping from the roofs
and trees, and gurgling in the downspouts.
Even less noisy weather can have a strong impact on the audio. With
fog, for example, we may want to make everything sound more muted as
the fog gets thicker. Snow has a very similar effect. Falling snowflakes are
not very noisy, but they swallow all reflections, which reduces audibility
over distance. The acoustic difference is prominent.
1.2.3 Particle Effects
Similar to weather, particle effects also have a strong and very dynamic
visual component which can range in scale and size drastically. Because
they are often reused across the whole game, it makes sense to treat them
in their own environment sub-category. For example, a fire effect can
range from a small match all the way to a firestorm. The visual effect is often just copied and pasted, treating a bigger fire as more of the same scaled up or down.
For audio, playing the sound of a burning match one thousand times
still won’t make it a firestorm, and attempting to do so will not help our
performance.
Given this disparity, it makes sense to create assets for a fixed range
of scale and size such as small, medium, and large. In addition, we can
drastically reduce the load on the audio engine and help to build a con-
vincing, manageable, and flexible toolkit by including parameters such as
size to drive a pitch effect or amount to trigger additional sweeteners like
an additional rumble or sizzle, or to switch the asset to a plural version of
the individual sound.
1.2.4 Physics
Physics describes everything that can collide, roll, slide, bend, or break in
our game world. While this is often tied to the actions the characters can
perform, it makes sense to keep it a global system and therefore tied to the
world and environment.
With physics, small design requests can quickly result in a very com-
plex system with a large number of assets needed. For example, maybe the
player can throw a stone to distract enemies. For audio, this feature means
that we need multiple stone impacts for all possible surface types in the
game like wood, metal, stone, and water. If the player can hold the throw
input to throw harder, we add an intensity from soft to hard. If the sizes of the throwable rocks also vary, we also need to cater for everything from pebble to brick for all surface types. It is easy to end up with a couple hundred assets just
for the collision of one thing.
The closer we get to a real-world simulation, the more complex and
difficult it becomes to create believable outcomes. Just as with particles,
audio doesn’t scale with size: many small stone impacts don’t sound like
a big rock collision. To make it manageable to create these assets, we take
shortcuts by generating groups such as size (small, medium, large), throw-
ing intensity (soft, regular, hard), and use real-time parameters such as
mass, velocity, speed, and amount to drive real-time effects such as pitch
or volume. Also, we can design multi-impact sweeteners that trigger once
a certain threshold of “impacts per time” is reached.
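As a sketch of how that grouping might look in code (the thresholds and names here are illustrative, not values from the chapter):

enum class ImpactSize { Small, Medium, Large };
enum class ImpactIntensity { Soft, Regular, Hard };

// Map a continuous mass onto the small/medium/large asset groups.
ImpactSize SizeFromMass(float massKg)
{
    if (massKg < 1.0f)  return ImpactSize::Small;   // pebble
    if (massKg < 20.0f) return ImpactSize::Medium;  // rock
    return ImpactSize::Large;                       // boulder
}

// Map impact velocity onto the soft/regular/hard throwing intensities.
ImpactIntensity IntensityFromVelocity(float metersPerSecond)
{
    if (metersPerSecond < 2.0f) return ImpactIntensity::Soft;
    if (metersPerSecond < 8.0f) return ImpactIntensity::Regular;
    return ImpactIntensity::Hard;
}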
While grouping can get us a fair distance, there will always be excep-
tions where we have to use original assets. A barrel is a good example.
While its surface is made of sheet metal, its internal resonance gives it
a distinct sound which will require a bespoke asset. A similar rule also
applies to breakable objects. A tree might consist of multiple sizes of wood
which splinter when the tree is destroyed, but simply playing the wood
splintering sound is unlikely to be convincing when the whole tree breaks
and comes down, for both the trunk and branches with leaves and foli-
age. Once again, we will need to design a bespoke asset in order to sound
realistic.
1.3 CHARACTERS
The category of characters includes everything that is related to player and
non-player assets with the following sub-categories:
1.3.1 Movement
In movies, the term “foley” describes the sounds that are added and
replaced during post production, either because they were not possible to
record properly on set or require special sound design. This includes foot-
steps on surfaces and sounds of cloth movement, as well as specific prop
sounds like a squeaking door for example. These sounds are often exag-
gerated to increase the drama and intensity of the scene. As video games
are silent from the start, we end up with a large amount of assets required
to cover our bases like basic movement of the main character. If our game
supports character customization or user-generated content with a wide
range of possible clothing styles, this can scale out of control, so we share
and reuse assets where possible.
For clothing sounds like cloth rustles and movement, we can group our
assets by speed (slow, medium, fast) or intensity (soft, regular, hard). For
different clothing styles, we can split our design into multiple layers, such as fabric (the soft cloth-on-cloth movement), leather (the typical leather-jacket crunch), and jingle (zippers, buttons, and chains). If we tie param-
eters to each layer and give clothing a specific layer value, we can create a
solid wardrobe of very-different-sounding clothes. There might be some
exceptions which need additional sweeteners because they have a specific
and unique-sounding element, like backpacks or gun belts. Together with
real-time effects and asset blending, a matrix like this should hold up for
the majority of our character movements.
1.3.2 Interactions
Most of the sounds required for a game are based on the actions of the
player and the control inputs. This can range from a simple activity such
as opening a door or operating a rifle all the way to complex mechanics
like driving a car.
Most of the time, we can split an interaction into five consecutive
steps:
1.4.1 Menu
The menu is the first contact between the player and our product. The
game begins with the main menu, where players start their onboarding
by creating an avatar; setting difficulty, graphic, and sound options; or
adjusting the controls to their needs. Going in or out of a submenu, slider
movement, and button presses usually have sound attached to support
the physical interaction of the input device. Ideally these sounds should be
themed to our product. For example, in an ancient Roman action game, we might want to use sword whooshes for moving between the menu pages, a shield smash to cancel, and a sword hit to confirm. The main menu also
offers an option to give a first glimpse of the game world by playing an
ambience or moody soundscape. In our example, this could be the sound
of a distant battle or marching soldiers.
1.4.2 Interface
The interface plays a major role in giving the player helpful informa-
tion about the status, progress, and events of the game. These events
might be acoustic support for banners and tutorial hints, flashing icons
that highlight the controls, notifications of mission success, or warnings
that focus the player’s attention to a certain area of the screen or to an
important event which is about to happen. Like the menu, these sounds
should be themed to our product. We want to give these events a strong
audible identity while maximizing player readability. Using our Roman
action game example, we can support the banners with a sword pull-
ing from the sheath on appearance and holstering it when it disappears,
highlighting controls with a drum roll to underline the haptic nature,
or playing battle horn sounds to make the player aware that an attack is
underway.
1.4.3 Experience
The user experience is deeply connected to the emotional aspects of play-
ing a game. Audio can play a big role in achieving a memorable gaming
experience. HUD sounds are part of that, giving the player vital informa-
tion regarding the game status, critical events, or important information.
This can be an alert sound when your base is under attack in a real-time
strategy game, the heartbeat sound when running out of health in an
action game, or the ticking of the timer when falling behind in a racing
game.
Often, these sounds are unrealistic and designed to enhance the drama
or provide satisfaction. A good example is the successful hit feedback,
which is designed and exaggerated to celebrate the victorious moment, or
low-tone sub-bass rumbles which increase tension and build up a sense of
fear and danger long before the actual game event.
1.5 WRAP-UP
While this structure has been proven to work for a solid range of titles,
there will always be elements which might not fit into an existing category
and require a different structure based on the game you are making. The
list in Section 1.5.1 is meant to be a first check to give you a starting point
regarding your asset requirements, technical implementation, and project
structure.
1.5.1 Sound Effects Category Check List
• World
• Environment (e.g. ambiences, spot FX)
• Weather (e.g. rain, snow)
• Particle (e.g. fire, spark)
• Physics (e.g. collision, destruction)
• Character
• Movement (e.g. clothes, footsteps)
• Interaction (e.g. abilities, features)
• Feedback
• Menu (e.g. options, buttons)
• Interface (e.g. mini-map, events)
• Experience (e.g. hit feedback, health indicator)
• Questions to answer:
• Do I need this (sub) category in my project?
• Do I see this category used globally or specifically?
• Do I have to react to real-time conditions, and if yes, what are
they?
1.6 CONCLUSION
It is important for us to talk about the sound effect requirements and
potential challenges early in production. The common use of audio mid-
dleware and the high standard of audio implementation with visual script-
ing in game engines allow the audio designers to build complex audio
behavior with minimum help from audio programmers. However, with
great power comes great responsibility: we want to enable audio designers
to build complex game audio with maximum flexibility, while keeping
maintenance, performance, and costs in consideration. I hope the sound
effect categories help you to avoid some pitfalls, master the challenges,
and strengthen the communication and collaboration between audio
programmers and audio designers.
I
DSP
Chapter 2
Complex Numbers
A Primer for DSP Programming
Robert Bantin
Massive Entertainment — an Ubisoft Studio
CONTENTS
2.1 Introduction 15
2.2 Implementing Incremental Phase 16
2.2.1 Resistance Is Real; Reaction is Imaginary 18
2.2.2 The Voltage Before the Load 19
2.2.3 The Voltage After the Load 21
2.3 Implementing Geometric Growth 23
2.4 Combining Incremental Phase with Geometric Growth 25
2.5 Notation Used by DSP Programmers 26
2.6 Conclusion 28
2.1 INTRODUCTION
Although it may not be intuitively obvious, the concept of a “complex
number” (i.e. a compound value containing a real and an imaginary com-
ponent) can be very powerful when you are modeling something that has
both magnitude and phase. Consider the model of a spiral in Figure 2.1.
When you compare each point along the graph with the next, two proper-
ties can be observed:
1. The angle of each point relative to the origin advances by a constant increment.
2. The distance of each point from the origin grows by a constant factor.
We can say that the angle here is our phase, while the distance is our
magnitude. Modeling this graph with mathematics is precisely the sort of
thing that complex numbers are for.
FIGURE 2.4 Circuit of a voltage generator attached to an electric load for the
power cable (marked with a capital Z).
below a certain frequency and act like a closed circuit (i.e. very low resis-
tance) above a certain frequency. In this case, how much power dissipa-
tion actually occurs with this type of reactance in parallel to the resistance
could depend greatly on the oscillation frequency of our generator, as well
as the length of the cable (because as the cable gets longer, the resistance
and reactance increase at different rates). This is why power engineers are
so careful about cable selection when designing power lines.
St = [at, bt]

where a and b are the real and imaginary voltage states of the generator at time interval t. Just think of t as a discrete time index: t = [0, 1, 2, 3, etc.]. If we were to attempt to progress the phase within S using vector math, we could define a constant rotation vector as

Q = [cos θ, sin θ]
You could then apply this incremental rotation in our simulation by reap-
plying the same vector operation to each time interval of S to get the next
time interval of S. You can thereby model the alternator turning over at an
angular rate of θ per time interval t.
For implementing this in code, the above equations would all work just
fine. However, in an era when a computer was a person, the mathemati-
cians of the day came up with a different approach that fit the tool they
had at the time: Algebra.
St = at + ibt
Q = cos θ + i sin θ
The value i is a special coefficient that cannot be resolved into a real num-
ber until it is squared, at which point it becomes −1. The i is therefore used
to signify that the value on its own is “imaginary”—opposite to the nor-
mal “real” numbers that always produce a positive result when multiplied
by themselves. The value i is often described as “the square root of −1.”
While this is technically true, it is not a terribly helpful definition, so this
chapter will stick to the definition i² = −1.
Within the context of our circle model, what we are saying here is that
when an Im axis component is multiplied with another Im axis component,
the result is transformed into an inverted Re axis component. Conversely,
when a Re axis component is multiplied with an Im axis component, the
result is transformed into an Im axis component.
So, let’s apply this to S and Q using an algebraic product:
St = at + ibt
Q = cos θ + i sin θ
And then re-factorize the four terms into real and imaginary chunks:
FIGURE 2.5 The circle graph from Figure 2.2 achieved with complex algebra.
You may begin to see that this little algebraic trick with i² = −1 is what's
doing the work of that vector operation for us, albeit with a somewhat
different style of notation. That’s really all it is.
In any case, any complex number can be plotted on this Re/Im graph (known as an Argand diagram after its creator, Jean-Robert Argand) by treating its real and imaginary components as Cartesian-style coordinates.
Now that we’ve got our phase progression correctly worked out using
complex numbers, we can reproduce the circle graph that we built in
Figure 2.2 by using complex numbers (Figure 2.5). Note that Figures 2.2
and 2.5 are identical, even though they were constructed using different
mechanisms.
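To make the recurrence concrete, here is a small C++ sketch (not from the original text) that steps S forward with a constant rotation Q using the i² = −1 multiplication rule; the type and function names are illustrative:

#include <cmath>
#include <cstdio>

// Minimal two-element complex type.
struct Complex
{
    float re;
    float im;
};

// (a + ib)(c + id) = (ac - bd) + i(ad + bc), using i^2 = -1.
Complex Multiply(const Complex& lhs, const Complex& rhs)
{
    return { lhs.re * rhs.re - lhs.im * rhs.im,
             lhs.re * rhs.im + lhs.im * rhs.re };
}

int main()
{
    const float theta = 3.14159265f / 8.0f;             // phase increment per step
    const Complex Q = { std::cos(theta), std::sin(theta) };
    Complex S = { 1.0f, 0.0f };                          // magnitude 1, angle 0

    // S(t+1) = S(t) * Q traces the unit circle, one step of theta at a time.
    for (int t = 0; t < 16; ++t)
    {
        std::printf("t=%d re=% f im=% f\n", t, S.re, S.im);
        S = Multiply(S, Q);
    }
    return 0;
}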
P = S²(R − iX) / (R² + X²)

Now the denominator is purely real. We just need to open the brackets of the numerator:

P = (S²R − iS²X) / (R² + X²)

To be clear then, the real and imaginary components of the power drop are

Preal = S²R / (R² + X²),    Pimag = −iS²X / (R² + X²)
Hopefully you can now see that if we ignored the cable capacitance of the power line, our simulation of the power drop across the load would only be S²/R. That estimate would make our prediction way off with even the slightest amount of reactance—even though we said the reactance of the load was purely imaginary.
To summarize thus far, what we are saying is that since our A.C. voltage
generator simulation needs to predict the effect of a load that can be both
resistive and reactive, we can pretend that there is a second, imaginary
voltage working orthogonally to the real one, and this helps us correctly
understand how the power would drop over that kind of combined load.
Q = A cos θ + iB sin θ
where A and B are our respective real and imaginary growth components,
and θ is the phase increment between each time interval t.
Based on the formula we've been using for generating S along time interval t,

St+1 = St Q = St (A cos θ + iB sin θ)

Setting both growth components to

B = A = C²/2

gives

St+1 = St Q = (C²/2) ((at cos θ − bt sin θ) + i (at sin θ + bt cos θ))

FIGURE 2.6 The spiral graph from Figure 2.1 achieved with complex algebra.
Let’s pick some values now and graph them out. For growth factor C, we’re
3
going to pick the first golden ratio = , and for the rotational increment,
π 2
we’re going to pick θ = . The resulting graph is in Figure 2.6. Now that we
8
have our magnitude and phase properties worked out using complex num-
bers, we can see that the spiral graph in Figure 2.6 is identical to Figure 2.1.
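Reusing the hypothetical Complex type and Multiply() function from the earlier sketch, the spiral only needs the rotation constant scaled by the shared growth component (A = B); the helper below is illustrative:

// Build the spiral rotation constant: growth > 1 spirals outward,
// growth < 1 spirals inward. Iterating S = Multiply(S, Q) then advances
// the angle by theta and scales the magnitude by growth each step.
Complex MakeSpiralRotation(float growth, float theta)
{
    return { growth * std::cos(theta), growth * std::sin(theta) };
}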
2.6 CONCLUSION
A complex number consists of a real and an imaginary component. Using
complex numbers, we can represent both amplitude and phase together.
Complex numbers work just like normal algebra, except that when an
imaginary component is multiplied with an imaginary component, the
result is an inverted real component. When a real component is multiplied
with an imaginary component, the result is an imaginary component,
so multiplying complex numbers needs to use the trick i² = −1 to get to
the right answer. Conversely, dividing by a complex number requires
multiplying the numerator and denominator of the fraction by the com-
plex conjugate (the same complex number with an inverted imaginary
component), thereby eliminating the imaginary component in the denom-
inator. (Note that electrical engineers often use j instead of i to avoid con-
fusing the imaginary component with the standard symbol for alternating
current.)
Finally, Euler’s formula allows us to use the e^(ix) shorthand in cases where the
real and imaginary components are cosine and sine functions, respectively.
This means that in code, we don’t usually implement the exponen-
tial function with an imaginary power and instead prefer to implement
a two-element container type that gets multiplied by cosine and sine
accordingly.
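As a small aside (not from the chapter), the C++ standard library already ships such a two-element container, std::complex, whose operator* applies the i² = −1 rule for you:

#include <cmath>
#include <complex>

// One phase step using the standard two-element complex type.
std::complex<float> Step(const std::complex<float>& S, float theta)
{
    const std::complex<float> Q(std::cos(theta), std::sin(theta));
    return S * Q; // complex multiply; equivalent to the rotation above
}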
Chapter 3
Building Dynamic
Analog-Style Filters
Bi-Quadratic Cascades vs
Digital Integrator Cascades
Robert Bantin
Massive Entertainment — an Ubisoft Studio
CONTENTS
3.1 Introduction 30
3.2 The Infinite Impulse Response (IIR) Filter 30
3.2.1 Pole-Zero Maps, the Z-Plane, the Unit Circle, and the
Inverse-Z Transform 32
3.2.2 Example: Math to Create a Notch Filter from Two
Poles and Two Zeros 34
3.2.3 Going Beyond a Bi-Quadratic Filter 38
3.2.4 Robert Bristow-Johnson’s Cookbook 39
3.2.4.1 Digital Butterworth Low-Pass Bi-Quadratic
Coefficients Derived from Analog
Butterworth Control Parameters 39
3.3 Digital Implementation of a Resistor-Capacitor (RC) Network 41
3.3.1 The Digital Integrator (DI) Filter 42
3.3.2 Example Code 43
3.3.3 A Fast e^x Implementation 44
3.4 Building Standard Butterworth Filter Shapes with DI
Networks 45
3.4.1 Butterworth Low-Pass Filter 45
3.4.2 Butterworth High-Pass Filter 45
3.1 INTRODUCTION
It’s a fairly regular occurrence that DSP coders get asked by sound design-
ers about dynamic filters (i.e. filters that can move their corner frequency
as they process) and why a particular dynamic filter sounds “harsh” or
“weird.” If it’s a simple low-pass or high-pass filter, the most likely reason
is that it was implemented using the ubiquitous “bi-quadratic filter” (or
“bi-quad”) structure. While this approach is generally excellent, it seems
to fall apart when modulating its cut-off or center frequency rapidly.
Prior to the bi-quad being widely adopted across the audio industry,
Hal Chamberlin [1] had been developing his own ideas in isolation and
came up with a vastly different approach to the one that later appeared in popular research papers and textbooks. Neither approach is necessarily
superior to the other—there are pros and cons to both, and some will be
outlined in this chapter. However, Chamberlin’s approach really shines
in two important respects: Firstly, in its natural ability to emulate classic
Butterworth-type filter alignments (meaning those with a flat-as-possible
pass-band) and secondly, in the structure’s stability—particularly when
modulating the cut-off or center frequency rapidly. It is for these two
aspects that Chamberlin’s approach is measurably superior at building
dynamic analog-style filters. However, before diving headlong into some
of Hal Chamberlin’s work, let’s have a quick look at a more traditional
digital filter structure.
yk = Σ(n=0 to N−1) bn xk−n − Σ(m=1 to M−1) am yk−m    (3.1)
[Figure: signal-flow diagram of the IIR filter in Equation 3.1, with input coefficients b0 … bN−1 feeding forward and output coefficients −a1 … −aM−1 feeding back. Key: T = a single unit delay, and the summing nodes are additions.]
3.2.1 Pole-Zero Maps, the Z-Plane, the Unit Circle, and the
Inverse-Z Transform
To apply the Z-plane design method, we first create a “pole-zero map”: a
2D graph that plots complex numbers as coordinates along a real axis and
an imaginary axis as shown in Figure 3.2. It's an adaptation of the Argand diagram that treats complex numbers as 2D vectors. For more informa-
tion about complex numbers and algebra, refer to Chapter 2 “Complex
Numbers: A Primer for DSP Programming.”
The Z-plane is a plane along these two axes, so for simplicity, think of
the page that Figure 3.2 is on as the Z-plane. We place a “unit circle” on the
origin of the graph with a radius of 1.0. We can then plot a vector called
Z as a point that follows this unit circle based on its angle to the real axis,
ω T in radians-per-sample. This means that for any place along the unit
circle, Z is describing a frequency in radians that is normalized with the
sample rate in the range {0 ≤ ω T ≤ 2π}, or equally {−π ≤ ω T ≤ π}, as Z will
wrap around the circle indefinitely making these two ranges equivalent.
The Z-plane design method allows the designer to place two other types
of data, one called a “pole” and the other called a “zero.” This is not the
numerical value zero, rather the shape of zero as you might describe the
O’s in tic-tac-toe. Using the strategic placement of poles and zeros inside
the unit circle, a transfer function of a digital filter can be designed with
respect to the angle (i.e. frequency) of Z:
H(Z) = Π(q=0 to Q−1) (Z − zq) / Π(r=0 to R−1) (Z − pr)    (3.2)
where H is the transfer function of the filter with respect to the frequency
described by Z, Q is the number of zero points zq (of which q is the index),
and R is the number of pole points pr (of which r is the index).
In other words, the transfer function H(Z) is equal to the product of
all distances between Z and each zero, divided by the product of all dis-
tances between Z and each pole. This gives importance to the unit circle
as it presents the designer with a strict limit—one that ensures that these
data points’ gain effect is between 0 and unity. There is sometimes reason
to place a zero point outside the unit circle as the product of all the zero
points controls the DC gain of the filter. However, putting a pole point
outside of the unit circle will make the filter unstable.
The next step is to apply the Inverse-Z transform, which turns this map
of poles and zeros into a discrete time function of current and previous
inputs, and previous outputs, that you can implement as a discrete-time
algorithm in code.
Crucially, though, you should think of the angle ω T at π or −π radians
per sample as the frequency at the Nyquist limit (i.e. the sample rate divided
by two). This implies that we are encouraged to design filters with poles
and zeros beyond the Nyquist limit or with negative frequency. While this
might seem counterintuitive, it is in fact necessary. If you were to place a
single pole or zero in the upper half of the unit circle, the time-domain fil-
ter that came out of the Inverse-Z transform would have a complex num-
ber output. Since we usually want a filter that takes a purely real input and
generates a purely real output, we can mirror the poles and zeros of the
upper half of the unit circle in the lower half of the unit circle (doubling
the filter order). Each mirror point is a complex conjugate of the original,
so when they multiply together in the transfer function H(Z), the imagi-
nary components will cancel each other out. There’s only one case where
this isn’t necessary: when the poles and/or zeros are lying on the real axis,
implying that the data point is only working around 0 Hz (also known as
D.C.) and/or the Nyquist limit (the maximum frequency expressible at a
given sample rate). Most of the time you will want to design filters that
work in between those two limits, so as a natural consequence, the mini-
mum number of poles or zeros is usually two each and increases in steps
of two. When resolving a pole-zero map with just two poles and two zeros,
the transfer function H(Z) is then a ratio of two quadratic polynomials - in
other words, a bi-quadratic.
H(Z) = (b0 Z² + b1 Z + b2) / (Z² + a1 Z + a2)    (3.3)
where H is the transfer function of the filter with respect to the fre-
quency described by Z, using input coefficients [b 0 b1 b2] and output
coefficients [a1 a2].
Equation 3.3 is the transfer function H(Z) as a bi-quadratic in one its
standardized forms. Note that the a0 coefficient is missing because this
term will eventually become the output parameter y k in Equation 3.1.
The challenge, then, is to fit the transfer function H(Z) as described
in Equation 3.2 into the function H(Z) described in Equation 3.3. Once
this has been done, the Inverse-Z transform converts all the terms as
either current and/or past inputs or past outputs. This is decided by H(Z)
as any terms with which it is multiplied become output terms, while the
rest become input terms. Every multiple of Z shifts a term’s sample time
index one unit into the future, such that a term like b0 Z² becomes b0 xk+2. Likewise, dividing by Z shifts a term's sample time index one unit into the past, such that b0 Z^−1 becomes b0 xk−1.
For the latter side of that ratio, we could leave it at 1 and have no poles
at all. However, you’d end up with a very wide notch with lots of ripple on
either side of it, so let’s instead place a pole along the same angle ω T but
slightly closer to the origin. Let’s call that distance D, ensuring that it’s
in the range {0 ≤ D < 1} to keep the filter stable. Let’s also mirror the pole
on the other side of the real axis to ensure a purely real output from these
poles when their terms are multiplied together.
If we then trace Z around the unit circle, we can see that the ratio of dis-
tances from Z to any pole and zero is almost 1.0 all the way around until
we get close to the angle ω T or −ω T, whereupon the ratio (and therefore
the gain of the filter) falls to nothing because the distance to one of the
zero points is nothing (Figure 3.3).
Since this pole-zero map is in fact an Argand diagram, the posi-
tion of either pole or zero point above the real axis can be described as
Equation 3.4, and the position of either pole or zero point below the real
axis can be described as Equation 3.5. This is all thanks to Euler’s theorem.
FIGURE 3.3 A pole-zero map with two poles and zeros configured to make a
notch filter.
e^(jωT) = cos ωT + j sin ωT    (3.4)

e^(−jωT) = cos ωT − j sin ωT    (3.5)
where ωT is the angle from the real axis and j is the engineers' equivalent of the imaginary number i (such that j² = −1).
The transfer function H(Z) for this specific pole-zero map then looks
like this:
H(Z) = ((Z − e^(jωT))(Z − e^(−jωT))) / ((Z − De^(jωT))(Z − De^(−jωT)))    (3.6)
Multiplying out the brackets and working through the terms gives us

H(Z) = (Z² − Ze^(jωT) − Ze^(−jωT) + e⁰) / (Z² − ZDe^(jωT) − ZDe^(−jωT) + D²e⁰)    (3.7)

H(Z) = (Z² − Z(e^(jωT) + e^(−jωT)) + 1) / (Z² − ZD(e^(jωT) + e^(−jωT)) + D²)    (3.8)

Applying Euler's theorem, e^(jωT) + e^(−jωT) = 2 cos ωT, so

H(Z) = (Z² − 2Z cos ωT + 1) / (Z² − 2ZD cos ωT + D²)    (3.9)
If you then compare the bi-quadratic equation with its standardized form in Equation 3.3, the coefficients we are looking for are

b0 = 1
b1 = −2 cos ωT
b2 = 1
a1 = −2D cos ωT
a2 = D²
If you implement this filter, you will get a notch that is centered around the normalized frequency ωT, so to modulate it, you simply need to update the values of b1 and a1. Just remember that ωT = 2πf / fs, where f in this case is your center frequency in Hz and fs is your sample rate. Setting D close to 1 (say 0.95) will get you a tight notch with minimal ripple, but
the attenuation of the notch will be small. Setting D to a lower value like
0.75 will not only increase the attenuation of the notch band, but will also
widen the notch band and increase the ripple on either side of it. Figure 3.4
shows the difference between a tight notch and a wide notch.
The only thing left to do is apply the Inverse-Z transform so we can see
how this transfer function becomes something implementable in the time
domain. This can be done providing you take the formula in Equation 3.3
and multiply both sides by the denominator in that ratio. This should give
you Equation 3.10.
H(Z) (Z² + a1Z + a2) = b0Z² + b1Z + b2    (3.10)

yk+2 + a1 yk+1 + a2 yk = b0 xk+2 + b1 xk+1 + b2 xk    (3.11)
Since we can’t practically work in the future, the only way to really imple-
ment this time domain equation is by shuffling all the sample time indexes
k two units into the past:
yk + a1 yk−1 + a2 yk−2 = b0 xk + b1 xk−1 + b2 xk−2    (3.12)
FIGURE 3.4 Tight notch (D = 0.95) vs wide notch (D = 0.75) at 1,000 Hz.
This is the same as Equation 3.1, just described in the more specific
bi-quadratic form of three input terms and two previous output terms.
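As an illustration (a sketch, not code from the chapter), the notch filter above can be implemented directly from Equation 3.12 rearranged for yk; the class and member names are made up here:

#include <cmath>

// Two-pole, two-zero notch using the coefficients derived above:
// b0 = 1, b1 = -2*cos(wT), b2 = 1, a1 = -2*D*cos(wT), a2 = D*D.
class NotchBiquad
{
public:
    void Configure(float centerFrequencyHz, float sampleRateHz, float D)
    {
        const float wT =
            2.0f * 3.14159265f * centerFrequencyHz / sampleRateHz;
        myB1 = -2.0f * std::cos(wT);
        myA1 = -2.0f * D * std::cos(wT);
        myA2 = D * D;
    }

    // y[k] = x[k] + b1*x[k-1] + x[k-2] - a1*y[k-1] - a2*y[k-2]
    float Process(float x)
    {
        const float y = x + myB1 * myX1 + myX2 - myA1 * myY1 - myA2 * myY2;
        myX2 = myX1; myX1 = x;
        myY2 = myY1; myY1 = y;
        return y;
    }

private:
    float myB1 = 0.0f, myA1 = 0.0f, myA2 = 0.0f;
    float myX1 = 0.0f, myX2 = 0.0f, myY1 = 0.0f, myY2 = 0.0f;
};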
[Figure: the direct-form structure extended to a fourth-order filter, with input coefficients b0 through b4 and output feedback coefficients −a1 through −a4.]
[Figure: an equivalent fourth-order filter built as a cascade of two bi-quadratic sections, with coefficients b/a in the first section and c/d in the second. Key: T = a single unit delay.]
yk = (b0 xk + b1 xk−1 + b2 xk−2 − a1 yk−1 − a2 yk−2) / a0    (3.14)
In practice, though, you would probably work out the reciprocal of a0 and
multiply or normalize each coefficient by a0. Also introduced is the inter-
mediary parameter α , which takes care of the Q-factor for our filter:
α = sin ωT / (2Q)    (3.15)
b0 = (1 − cos ωT) / 2
b1 = 1 − cos ωT
b2 = (1 − cos ωT) / 2
a0 = 1 + α
a1 = −2 cos ωT
a2 = 1 − α
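A sketch of those formulas in code (names are illustrative; the coefficients are pre-divided by a0 so that Equation 3.14 needs no division per sample):

#include <cmath>

struct BiquadCoefficients
{
    float b0, b1, b2; // feed-forward (input) coefficients
    float a1, a2;     // feedback (output) coefficients, already divided by a0
};

// Butterworth-style low-pass coefficients from the cookbook formulas above.
BiquadCoefficients MakeLowPass(float cutoffHz, float sampleRateHz, float Q)
{
    const float wT = 2.0f * 3.14159265f * cutoffHz / sampleRateHz;
    const float cosWT = std::cos(wT);
    const float alpha = std::sin(wT) / (2.0f * Q);
    const float a0Reciprocal = 1.0f / (1.0f + alpha);

    BiquadCoefficients c;
    c.b0 = ((1.0f - cosWT) * 0.5f) * a0Reciprocal;
    c.b1 = (1.0f - cosWT) * a0Reciprocal;
    c.b2 = c.b0;
    c.a1 = (-2.0f * cosWT) * a0Reciprocal;
    c.a2 = (1.0f - alpha) * a0Reciprocal;
    return c;
}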
K = 1 − e^(−ωT)    (3.16)
[Figure: the analog resistor-capacitor (RC) network and its digital integrator (DI) equivalent, in which the input is scaled by the attenuator K and accumulated through a unit delay T inside a feedback loop.]
3.3.2 Example Code
#ifndef _DIGITAL_INTEGRATOR_INCLUDED_
#define _DIGITAL_INTEGRATOR_INCLUDED_
#include <math.h>

class DigitalIntegrator
{
public:
    DigitalIntegrator()
        : myZmin1(0.0f)
        , myAttenuatorK(1.0f)
    {}

    // K = 1 - e^(-wT), per Equation 3.16.
    void SetAttenuatorK(float inK) { myAttenuatorK = inK; }

    // One DI step (see Figure 3.7): y[k] = y[k-1] + K * (x[k] - y[k-1]).
    float Process(float inSample)
    {
        const float output = myZmin1 + myAttenuatorK * (inSample - myZmin1);
        myZmin1 = output;
        return output;
    }

private:
    float myZmin1;
    float myAttenuatorK;
};
#endif // defined(_DIGITAL_INTEGRATOR_INCLUDED_)
[Figures: Butterworth low-pass and high-pass shapes built from cascaded first- and second-stage DI filters with feedback.]
3.5.1 The Concept
Analog filter circuits will achieve resonance using some kind of con-
trolled feedback, typically by tapping off the immediate output of the fil-
ter into a separate amplifier such that the feedback path is buffered and
doesn’t interact with the output being passed on to the next stage of the
circuit. The resonance control parameter will then be used to vary how
much signal this separate amplifier feeds back into the input of the fil-
ter. It doesn’t take a lot of feedback to get the filter to self-oscillate (a few
percent), so the resonance amplifier will always attenuate to a varying
degree. If such a filter circuit is a cascade of multiple first-order filters,
then the feedback path is typically sent around the entire cascade. This is
crucial for resonance, as each first-order filter stage adds a 90° phase shift
at its corner frequency, and you need 180° of phase shift for the feedback
to cause resonance. This is why the resonant analog filters common to
musical applications are either second order or fourth order with a nega-
tive feedback path.
[Figure: Moog ladder filter schematic: four DI stages in series with a negative feedback path (gain −R) around the entire cascade.]
FIGURE 3.14 Moog ladder filter schematic with added feedback delay of 2.5
samples.
3.5.2.3 Example Code
#ifndef _FRACTIONAL_DELAY_INCLUDED_
#define _FRACTIONAL_DELAY_INCLUDED_

#include <cassert>
#include <cstring>

// Power-of-two buffer length (value chosen here as an example) so the
// cursors can wrap with a simple bit mask.
#define FRACTIONAL_DELAY_LENGTH 16
#define FRACTIONAL_DELAY_MASK (FRACTIONAL_DELAY_LENGTH - 1)

class FractionalDelay
{
public:
    explicit FractionalDelay(float inFractionalDelay)
        : myFractionalDelay(inFractionalDelay)
        , myWholeNumberedDelay(static_cast<int>(inFractionalDelay))
        , myLERP(myFractionalDelay -
                 static_cast<float>(myWholeNumberedDelay))
        , myWriteCursor(0)
        , myReadCursorA((FRACTIONAL_DELAY_LENGTH - myWholeNumberedDelay) &
                        FRACTIONAL_DELAY_MASK)
        , myReadCursorB((FRACTIONAL_DELAY_LENGTH - myWholeNumberedDelay - 1) &
                        FRACTIONAL_DELAY_MASK)
    {
        assert(inFractionalDelay >= 1.0f &&
               inFractionalDelay < static_cast<float>(FRACTIONAL_DELAY_LENGTH - 1));
        memset(mySampleBuffer, 0, sizeof(mySampleBuffer));
    }

    float Process(float inSample)
    {
        // Write the newest sample, read the two samples straddling the
        // fractional delay, and linearly interpolate between them.
        mySampleBuffer[myWriteCursor] = inSample;
        const float sampleA = mySampleBuffer[myReadCursorA];
        const float sampleB = mySampleBuffer[myReadCursorB];
        const float output = sampleA + myLERP * (sampleB - sampleA);

        ++myWriteCursor;
        myWriteCursor &= FRACTIONAL_DELAY_MASK;
        ++myReadCursorA;
        myReadCursorA &= FRACTIONAL_DELAY_MASK;
        ++myReadCursorB;
        myReadCursorB &= FRACTIONAL_DELAY_MASK;

        return output;
    }

private:
    FractionalDelay() = delete;

    float myFractionalDelay;
    int myWholeNumberedDelay;
    float myLERP;
    int myWriteCursor;
    int myReadCursorA;
    int myReadCursorB;
    float mySampleBuffer[FRACTIONAL_DELAY_LENGTH];
};

#endif // defined(_FRACTIONAL_DELAY_INCLUDED_)
Despite this lo-fi approach, the digital artefacts it produces are minimal,
so most DSP coders are happy to live with it.
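To tie the pieces together, here is a hedged sketch (not from the chapter) of the arrangement in Figure 3.14: four DI stages in series with the resonance feedback taken through the fractional delay. It reuses the DigitalIntegrator and FractionalDelay classes from the listings above, and the SetAttenuatorK name comes from the reconstructed DI listing:

// Moog-style ladder: four cascaded DI stages with negative feedback
// routed through a fractional delay, following Figure 3.14.
class LadderLowPass
{
public:
    LadderLowPass()
        // 1.5 samples here, plus the one-sample feedback latency of this
        // implementation, approximates the 2.5-sample delay in Figure 3.14.
        : myFeedbackDelay(1.5f)
    {}

    void SetCoefficients(float inK, float inResonanceR)
    {
        for (DigitalIntegrator& stage : myStages)
            stage.SetAttenuatorK(inK);
        myResonanceR = inResonanceR;
    }

    float Process(float inSample)
    {
        // Subtract the delayed, scaled output from the input...
        float signal = inSample - myResonanceR * myDelayedOutput;
        // ...then run it through the four first-order DI stages.
        for (DigitalIntegrator& stage : myStages)
            signal = stage.Process(signal);
        // Remember the delayed output for the next call's feedback.
        myDelayedOutput = myFeedbackDelay.Process(signal);
        return signal;
    }

private:
    DigitalIntegrator myStages[4];
    FractionalDelay myFeedbackDelay;
    float myResonanceR = 0.0f;
    float myDelayedOutput = 0.0f;
};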
3.6 CONCLUSIONS
While the bi-quadratic filter is flexible, it doesn’t perform well when emu-
lating analog Butterworth-type filters, both in a steady state and while
being modulated. The unpredictable results account for that “harshness” or
“weirdness” that a sound designer might notice. The digital integrator is less
flexible in that it only works with Butterworth-type filters, but when that’s
what the sound designer actually wants, it’s the ideal choice. Special consid-
eration has to be given when introducing resonance, but a fractional delay
in the feedback path can overcome this without the need to oversample.
REFERENCES
1. H. Chamberlin. Musical Applications of Microprocessors, p. 488, Second
Edition, Hayden Books, Indianapolis, 1985.
2. R. Bristow-Johnson. Cookbook formulae for audio EQ biquad filter
coefficients. http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt.
3. T. Stilson and J. Smith. Analyzing the Moog VCF with considerations for
digital implementation. In “Proceedings of the International Computer
Music Conference,” pp. 398–401, Hong Kong, China, August 1996.
4. A. Houvilainen. Non-linear digital implementation of the Moog ladder
filter. In “Proceedings of the International Conference on Digital Audio
Effects,” pp. 61–64, Naples, Italy, October 2004.
Chapter 4
Modeling Atmospheric
Absorption with a
Low-Pass Filter
Nic Taylor
CONTENTS
4.1 Introduction 52
4.2 Motivations 52
4.3 Review 53
4.4 Extreme Ranges 53
4.5 A Look at the Low-Pass Filter 54
4.6 Maths and Code 58
4.6.1 Extra Vocabulary 58
4.6.2 Math 59
4.6.3 API 60
4.6.4 Helper Functions 61
4.6.5 Implementation 62
4.7 Integration 63
4.8 Future Work 64
Appendix A: Absorption Coefficient 65
Appendix B: Root Finding 66
4.B.1 Newton’s Method 66
4.B.2 Trigonometric Solver 67
References 68
4.1 INTRODUCTION
Air temperature, humidity, and atmospheric pressure (not to be confused
with acoustic pressure) change how sound is absorbed over distance.
This atmospheric absorption has the strongest effect on high frequencies
and so is often modeled as a low-pass filter as one component of the overall
attenuation settings for a sound instance. The attenuation settings, includ-
ing the low-pass filter, are exposed to the sound designer as a function of
distance and for the most part are then static.
This chapter explores using the atmospheric features of air tempera-
ture and humidity to provide a systematic way of setting a low-pass filter
cutoff for sound propagation. The cutoff frequency can audibly change
based on temperature and humidity. By plotting the frequency response
of atmospheric absorption, we can see that the low-pass filter is a good
approximation of the real-world values.
4.2 MOTIVATIONS
I was motivated to explore using atmospheric features to adjust attenua-
tion settings for two primary reasons:
1. While working on open world games where the player could travel
from extreme environments such as Arctic-like zones to dense jungle
or desert, there was a desire to find subtle ways to influence sound so
that the environment would feel different without relying on ambient
sounds. Similar to how air temperature changes drastically between
day and night, the goal also included having a subtle change in percep-
tion in the same zone at different times of day.
2. One common issue I had observed working on games was incon-
sistencies in attenuation settings resulting in bugs typically caught
toward the end of production. More than once, these inconsisten-
cies required a large refactoring of attenuation settings across the
entire game, so it seemed worthwhile to find a systematic way to
address attenuation. When it comes to the low-pass filter, using
well-understood atmospheric features like temperature and humid-
ity can set a decent starting point even if the attenuation settings
were still static in game.
4.3 REVIEW
Attenuation of a point source as a function of distance, r, can be modeled
by the following equation [1]:
4.4 EXTREME RANGES
Observing environments on Earth, temperature and humidity change the
atmospheric absorption coefficient the most. Atmospheric pressure even
at high elevations is almost negligible, so we can treat atmospheric pres-
sure as a constant.
The extreme ranges of temperature and humidity give some intuition as to
how variable the cutoff will be. Using Figure 4.1 as a guide, sounds that are all
near field or within 25 meters will have an effect that is perhaps not audible.
Past 250 meters, the change in cutoff from a hot, dry environment to a cold,
dry environment can be in entirely different frequency bands. This difference
is potentially significant enough to impact the mixdown of the game.
1 Some audio engines also include a high-pass filter which sound designers use to remove distant
low-frequency content so that closer sounds have better low-end clarity.
2 A non-uniform temperature, where the ground temperature is different from that of the air above,
has interesting effects on sound propagation but is outside the scope of this chapter. See [1] for
more details.
FIGURE 4.1 Cutoff frequencies by distance: (A) 70% humidity, 65°F; (B) 5%
humidity, 100°F; and (C) 5% humidity, −5°F.
Notice that before the cutoff frequency, the first-order filter is almost
identical with the atmospheric absorption. After the cutoff, the second-
order filter follows more closely. This is the same across combinations of
temperature and humidity.
4.6.2 Math
The equation for the attenuation coefficient is [1, 3, 5]:
b1 = 0.1068 e^(−3352/T) frN / (frN² + f²)    (4.3)

b2 = 0.01275 e^(−2239.1/T) frO / (frO² + f²)    (4.4)
where b1 and b2 are terms dependent on frN and frO , the relaxation frequen-
cies in Hz of nitrogen and oxygen. τ r is the ratio of the given temperature
in Kelvin and the reference air temperature. Similarly, Pr is the ratio of the
ambient atmospheric pressure in kilopascals and the reference ambient
atmospheric pressure [5].
To find the filter cutoff frequency requires solving for frequency f given the coefficient α. To begin to solve for f, Equation 4.2 must be expanded. Because of the large number of constants, some placeholders are introduced: a1, a2, a3 for combined coefficients, N for nitrogen (or frN), O for oxygen (or frO), and finally F = f² to avoid confusion with exponents. a4, which is negative α, is used for consistency. Substituting these in Equation 4.2 yields

0 = a1 F + a2 N F / (N² + F) + a3 O F / (O² + F) + a4    (4.5)
Creating common denominators and expanding out Equation 4.5 puts it in a form that can be solved as a cubic equation in $F$:

$$0 = \frac{a_1 F (N^2 + F)(O^2 + F) + a_2 N F (O^2 + F) + a_3 O F (N^2 + F) + a_4 (N^2 + F)(O^2 + F)}{(N^2 + F)(O^2 + F)} \qquad (4.6)$$

$$0 = a F^3 + b F^2 + c F + d$$
$$a = a_1$$
$$b = a_1 (N^2 + O^2) + a_2 N + a_3 O + a_4 \qquad (4.7)$$
$$c = a_1 N^2 O^2 + a_2 O^2 N + a_3 N^2 O + a_4 (N^2 + O^2)$$
$$d = a_4 N^2 O^2$$
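As a small illustration, the coefficients of Equation 4.7 translate directly into code (the names here are illustrative rather than part of the chapter's API):

struct Cubic { double a, b, c, d; };

// Assemble the cubic coefficients of Equation 4.7 from the combined
// coefficients a1..a4 and the substituted relaxation frequencies N and O.
Cubic MakeCubic(double a1, double a2, double a3, double a4,
                double N, double O) {
  const double N2 = N * N;
  const double O2 = O * O;
  Cubic out;
  out.a = a1;
  out.b = a1 * (N2 + O2) + a2 * N + a3 * O + a4;
  out.c = a1 * N2 * O2 + a2 * O2 * N + a3 * N2 * O + a4 * (N2 + O2);
  out.d = a4 * N2 * O2;
  return out;
}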
4.6.3 API
For the implementation, a class FilterCutoffSolver will encapsulate an
environment’s atmosphere and expose a function, Solve(), which returns
the cutoff frequency given a distance.
Because several factors in Equation 4.2 are independent of distance,
these factors are computed once and stored in variables based on
Equation 4.6.
class FilterCutoffSolver {
public:
  FilterCutoffSolver(const double humidity_percent,
                     const double temperature_farenheit,
                     const double pressure_pascals = kPressureSeaLevelPascals);

  // Returns the filter cutoff frequency in Hz for a given distance in meters.
  double Solve(const double distance) const;

  // (Cached distance-independent coefficients are stored as private members.)
};
4.6.4 Helper Functions
These functions do not need to be exposed by the API, but extracting them from the FilterCutoffSolver class makes them easier to unit test. For example, one helper computes the oxygen relaxation frequency (the function head here is a reconstruction; take the names as illustrative):

// Computes the oxygen relaxation frequency in Hz.
double OxygenRelaxationFrequency(const double pressure_normalized,
                                 const double humidity_concentration) {
  const double oxygen_relax_factor =
      24.0 + 4.04e4 * humidity_concentration *
          (0.02 + humidity_concentration) /
          (0.391 + humidity_concentration);
  // An approximate expected value is 25,000 Hz.
  return pressure_normalized * oxygen_relax_factor;
}
4.6.5 Implementation
The constructor performs the substitutions from Equation 4.6 to be stored
and cached for repeated calls to Solve().
FilterCutoffSolver::FilterCutoffSolver(
const double humidity_percent,
const double temperature_farenheit,
const double pressure_pascals)
{
const double temperature_kelvin =
FarenheitToKelvin(temperature_farenheit);
const double temp_normalized =
temperature_kelvin / kReferenceAirTemperature;
const double pressure_normalized =
pressure_pascals / kPressureSeaLevelPascals;
4.7 INTEGRATION
From a performance viewpoint, it should be fine to compute the filter
cutoff per game object per frame. Alternatively, a table of cutoffs at differ-
ent distances could be computed once—for example, when loading into a
zone. Using a pre-computed table, game objects could linearly interpolate
the cutoff value between the two nearest keys. Having a table in mem-
ory also has the benefit of being able to visually inspect what the cutoff
frequency is across distances.
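A sketch of that table approach, with the key spacing and construction details left as placeholders, might look like this:

#include <algorithm>
#include <cstddef>
#include <vector>

// Cutoff frequencies precomputed at fixed distance steps for the current
// atmosphere profile, built once (for example, on zone load).
struct CutoffTable {
  double step_meters = 10.0;       // spacing between keys (illustrative)
  std::vector<double> cutoffs_hz;  // cutoffs_hz[i] is the cutoff at i * step_meters

  // Linearly interpolate the cutoff between the two nearest keys.
  double Lookup(double distance_meters) const {
    if (cutoffs_hz.empty()) return 0.0;
    const double pos = std::max(0.0, distance_meters / step_meters);
    const std::size_t i = static_cast<std::size_t>(pos);
    if (i + 1 >= cutoffs_hz.size()) return cutoffs_hz.back();
    const double t = pos - static_cast<double>(i);
    return cutoffs_hz[i] + t * (cutoffs_hz[i + 1] - cutoffs_hz[i]);
  }
};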
For fast-moving objects, the filter cutoff may need to be interpolated or smoothed over time to avoid sudden jumps in the filter cutoff.4
Sound designers sometimes rely on an attenuation test world where
they can test a sound at different intervals and perform mix balancing.
The addition of humidity and temperature parameters would complicate
this work. As a result, instead of allowing a near infinite set of poten-
tial humidity and temperature combinations, the team can decide on a
few “atmosphere profiles” that the game will use. This should include an
easy mechanism to switch between these profiles so that testing can be
done without changing the game world or zones. Keep in mind that sound
designers may also want a way to override the atmospheric absorption
cutoff value.
4.8 FUTURE WORK
I acknowledge that the math and the number of constants involved in the computation of the frequency cutoff are heavy-handed. Future work would be to determine whether the entire model could be simplified to a linear equation. Such an approximation may not be too large a simplification, as the equations used here are only accurate to ±10% in their ideal range in the first place [5].
This model assumes an ideal atmosphere without wind. Wind is a very
large component of sound propagation and one that I personally find very
interesting. However, wind’s impact on sound may not “sound” correct
without other cues to convey it. As an example, I have tried to add a “speed
of sound” feature to a couple of game engines but always ended up remov-
ing it. At a typical speed of sound of 340 m/s, it seemed like the delay should come across clearly for sounds just a hundred meters away. In practice, however, the implementation felt as if the sound engine was unresponsive or lagging. Perhaps the reason is that other subtle cues the brain requires for processing images and audio are missing—cues that might become available with continued improvements in areas like VR and 3D sound.
4 Wwise supports this behavior with Interpolation Mode Slew Rate from the Game Parameter
properties [6].
FIGURE 4.9 Typical cubic to be solved. Recall that the root is the frequency
cutoff squared.
5 The span of cubic equations was found by evaluating many combinations of temperature,
humidity, and distance.
// Newton-Raphson iteration on the cubic from Equation 4.7. a, b, c, and d
// are the cubic's coefficients; a_prime, b_prime, and c_prime are the
// coefficients of its derivative (3a, 2b, and c). start is the initial
// guess and epsilon is the convergence tolerance. (Signature illustrative.)
double SolveCubic(const double a, const double b,
                  const double c, const double d,
                  const double a_prime, const double b_prime,
                  const double c_prime,
                  const double start, const double epsilon)
{
  double x = start;
  double delta = epsilon;
  while (fabs(delta) >= epsilon) {
    const double cubic = ((a * x + b) * x + c) * x + d;
    const double quadratic = (a_prime * x + b_prime) * x + c_prime;
    delta = cubic / quadratic;
    x = x - delta;
  }
  return x;
}
REFERENCES
II
Voice
Chapter 5
Software Engineering Principles of Voice Pipelines
Michael Filion
Ubisoft
CONTENTS
5.1 Introduction 71
5.2 Definitions 72
5.3 Defining Requirements 73
5.4 Design 74
5.4.1 Expecting the Unexpected 74
5.4.2 Platform Agnostic 75
5.4.3 Automation 75
5.4.4 Disaster Recovery/Revision Control 76
5.4.5 Integrating Third-Party Tools 77
5.5 Implementation 77
5.5.1 Deployment 78
5.5.2 Error Handling for Non-technical People 78
5.5.3 Nothing More Permanent than a Temporary Fix 80
5.6 Conclusion 80
5.1 INTRODUCTION
As games continue to grow larger, the number of spoken lines continues
to increase. As a result, the challenges for managing and delivering the
lines have increased as well. Designing and implementing a flexible and
robust voice pipeline becomes important in order to deliver the highest
quality voice in-game. There are many challenges, both human and tech-
nological, related to any voice pipeline. Providing the necessary tools and
removing repetitive tasks allows the people responsible for voice to focus
on quality. Additionally, providing the proper debugging tools and infor-
mation allows programmers to empower less technical members of the
team to address problems as they arise in an efficient manner without
programmer intervention.
While this chapter will specifically discuss the design and implemen-
tation of voice pipelines for large-scale game productions (i.e. anything
related to a spoken line in the game on the audio side), the important points
can be adapted to a variety of different pipelines and contain reminders of
many solid software design principles.
5.2 DEFINITIONS
This chapter uses a number of terms that we will define here for clarity:
FIGURE 5.1 An example of a basic pipeline for taking written text and produc-
ing and implementing an audible voice line in-game.
5.3 DEFINING REQUIREMENTS
Before implementing or beginning any cursory design of the required
voice pipeline, it is important to list all the requirements for your project.
Even between two different projects of comparable size and complexity,
small details can make all the difference.
What is the expected line count? Will it be 100 or 100,000 lines? This
means the difference between developing tools and automated processes to
deal with the massive quantity of files, and simply dragging and dropping
the files from one location to the next. This number will allow a proper
evaluation of the time invested versus the potential time savings. There is
no sense in spending 10 hours developing some tools where it would take
only 1 hour to manually treat all of the lines with an existing toolset.
In what manner will the text be delivered to the department for record-
ing? Will text be sent as it is written, or will it only be sent once approved
by the parties responsible? How will you track which lines have been
approved and/or already recorded? A spreadsheet will help, but what is the
potential for human error in this flow? Will adding any sort of tools for
validation and approval into this allow for greater quality control, or will
one person become a bottleneck?
Once the lines are recorded, how will they be sent from the recording
studio to the development team? There are well-known transfer methods
that can easily be automated such that the files are moved with existing
third-party tools (such as FTP or transfer to a NAS). Other solutions might
require more or less manual intervention depending on how those tools/
protocols were built.
Where will all of the files be stored before being integrated into the
game engine? Will they be stored in a revision control system such
as Perforce or Plastic SCM, or will they simply be stored on a NAS
somewhere accessible to all members responsible for working with
these files?
What type of processing will need to be performed on the files before
being integrated into the engine? Will you need to enforce loudness stan-
dards, projection levels, file formats, sample rates, etc.?
Are you using middleware for your audio engine or is it a custom-
developed solution? This is the difference between doing a search online
to see if there are any third-party tools that can already perform many
(if not all) of the tasks that you need when treating your sound data and
knowing that you need to develop everything yourself.
How will these files be stored once integrated? If these files will reside
alongside other game assets, there isn’t any extra work necessary. However,
the audio middleware might store them as raw WAV files which need to be
converted before being used in-game.
While the answers to these questions will reveal many details about
your requirements, there are still many more that have not been listed
here. It is important to review all of the requirements in conjunction with
the rest of the team and to bear in mind that something that worked for the
first release of SuperAwesome Game might not work for SuperAwesome
Game 2.
5.4 DESIGN
With the list of requirements in hand, now the time has come to define the
pipeline. The first version will almost never look like the final version used
at the end of production. It is important to iterate on the pipeline design
continually throughout the implementation process as new requirements
and technical challenges arise.
Starting with the individual steps in the pipeline, determining what resources are required at each step will help flesh out the overall design. Sometimes the resources may seem obvious, but it is important to document them for a new member of your team who joins several months after the pipeline design has been completed and implementation has started. Documenting the design has the additional advantage of letting someone else look at it and point out errors or areas of concern.
5.4.2 Platform Agnostic
This ties into the previous section’s principle in that you should not rely
on any particular piece of software when possible. Today your company’s
servers are running Windows, but will they be running Linux next year?
Your production is currently using Jenkins, but perhaps they will switch
to something like TeamCity because of the reduced manpower cost or the
budget suddenly opening up to purchase a commercial license. Tightly
integrating with any specific software, internal or third-party, may prove
detrimental in the future when someone makes decisions without fully
realizing their impact.
Some design choices will have little to no impact in any future soft-
ware migrations; others could render your pipeline absolutely useless or at
the very least require hours of additional work for migration. Having an
understanding from the outset of these potential hurdles will help make
informed decisions with other members of the production when consider-
ing changes.
5.4.3 Automation
Most game productions have already mastered continuous integration for
code and most types of data, including sound. However, automation can
be a real time saver when dealing with the large amounts of voice files with
different statuses that are being moved through the pipeline. Amazingly, people often don't think to ask for automation or are too afraid to request it, even when the same group of people executes the same repetitive tasks time and time again.
There are many different options for how to implement your automa-
tion. Continuous Integration systems (such as Jenkins, JetBrains TeamCity,
or Atlassian Bamboo) are easy options, especially if they’re already pres-
ent in the wider game development pipeline. Many of these Continuous
Integration systems allow for easy integration using web UIs and don’t
always require the help of a build system or automation engineer. As an
additional bonus, they provide easy and graphical scheduling capabilities.
Another route is to use the Windows Task Scheduler, cron, or an equiv-
alent tool that is available on each system to schedule the process locally.
While this is definitely the least advantageous for a number of reasons
(what happens if there is a power outage, the workstation is turned off by
someone else, or a flood destroys your workstation), it will do in a pinch.
Anything that doesn’t require someone to click a button on a regular basis
is a win.
place for any data stored on these servers (such as daily backups, replica-
tion, etc.). Whatever the choice for a revision control system, it is impor-
tant to choose one that is well adapted to the type of data that you will be
manipulating, as not all systems handle binary data the same.
Of course, not everyone has the benefit of having a revision control
system available to them, and not everyone has the expertise to configure,
deploy, and manage these types of systems in addition to all of the other
tasks that are required of them. The barebones method is to have a shared
folder where the different people responsible can put recorded files to be
used in the pipeline. These could be stored on any cloud storage provider’s
platform, easily accessible to everyone (and easily deletable as well).
No matter how the data is stored, make sure it is accessible to those who
need access and that you develop a plan for a worst-case scenario (some-
one accidentally drops a mug full of coffee on the external hard drive,
destroying it) and a method to recover from it.
5.5 IMPLEMENTATION
With the long list of requirements in hand and your rough design done,
the next challenge is implementing everything. Most of the following sec-
tions discuss general software design principles and examples of their
application in the context of a voice pipeline.
5.5.1 Deployment
Issues will happen, requirements will change, and new challenges will be
introduced into the pipeline many times throughout development. It’s
important to try to prevent that which is preventable, but the ability to make changes and tweaks and have the result available immediately is also critical. Many production teams already have a plan in place to distrib-
ute new code and data, but it is not always appropriate for always-running
processes that may be hard or impossible to test before deploying changes
to them. Regardless of how the game editor or a game build is created, the
needs for pipelines don’t always align with these deployment methods.
There are several different strategies for deployment, ranging from continuous deployment, where a change is ready for use as soon as the code or data is submitted and the resulting executable or build is available, all the way to long-term planning and infrequent releases (think middleware or game engines releasing a polished version only a few times a year). Obviously, if a code fix is needed for one tool used in the voice pipeline, then waiting a week for it to be deployed is going to be a bottleneck.
try
{
FetchAudioFiles();
}
catch(Exception ex)
{
Console.Error.WriteLine(ex.Message);
}
To empower users to address issues, even ones that they may have
caused themselves, they need actionable messages. Consider the following
(contrived) example:
class ProgramA
{
static void Main(string[] args)
{
using (StreamReader reader =
new StreamReader(
File.Open(@"C:\RandomFile.wav",
FileMode.Open,
FileAccess.Read, FileShare.None)))
{
Thread.Sleep(100000);
}
}
}
class ProgramB
{
static void Main(string[] args)
{
try
{
File.Delete(@"C:\RandomFile.wav");
}
catch (UnauthorizedAccessException ex)
{
Console.Error.WriteLine(
  "{0} Please ensure this file isn't open in any other program.",
  ex.Message);
}
}
}
Running ProgramA will ensure that no other program (or user) can
delete the file. Executing these together will most likely result in the
exception being thrown for ProgramB. If the error message were sim-
ply the message text of the exception (Access to the path 'C:\
RandomFile.wav' is denied), it would not be clear to the user why
that is or what they can do to fix it. Adding a simple message such as
Please ensure this file isn't open in any other program will
help users (adding the program name goes a step further, making it
even easier).
5.6 CONCLUSION
Most of the topics that were discussed were basic software engineering
principles. Their importance in relation to a voice pipeline is in applying
these principles well and consistently. Implementing good error logging only some of the time, building failure handling so rigid that it fails to consider edge cases that haven't happened yet, or settling for monolithic designs will result in a voice pipeline that is fragile, hard to use, and eats up debugging time.
Keep in mind that the purpose of the pipeline is to be able to handle data
easily and flexibly, with the ultimate goal of adding high-quality data that
is important to the finished game.
While most game productions will have a limited lifetime, voice pipe-
lines often extend past this time into future projects. Forgetting this fact
can provide a source of frustration in the future because of rigid design
choices that limit the ability of developers to refactor and improve the
pipeline.
Chapter 6
A Stimulus-Driven Server Authoritative Voice System
Tomas Neumann
Blizzard Entertainment
CONTENTS
6.1 Introduction 81
6.2 Clarifying Terminology 82
6.3 The Purpose of a Server Authoritative Voice System 83
6.3.1 Playing in a Multiverse 83
6.4 Server Workflow 84
6.4.1 Collecting and Rating Stimuli 84
6.5 Client Workflow 85
6.6 Line Selection 86
6.7 Network Considerations 86
6.7.1 Prediction and Client-Only VO 87
6.7.2 Network Reliability 87
6.8 Voice Line Triggered Gameplay and Multi-Locale
Client Connections 88
6.9 Conclusion 89
6.1 INTRODUCTION
Spoken words are often at the core of why players connect and relate to
the characters within a video game. Voices can be used for tutorials, to tell
the story, to create drama, or to convey gameplay information. Enemies
in the original Wolfenstein 3D were yelling “Achtung!” and “Mein Leben!”
to telegraph their AI states; in The Witcher, the voice lines drive the cam-
era cuts in most of the in-game cinematics; Overwatch’s heroes warn each
other with a “Behind you!”; and in The Last of Us, we can hear the heart-
wrenching death cries of a young girl.
In a single-player offline game, the client makes all of the decisions
about which voice lines to play, but multiplayer games are more complex
because it may be necessary that all players hear the same variation of
a line. By playing the same line on all clients, all of the connected play-
ers can experience the world through a shared experience. And if some
funny voice lines have a rare probability to play, all players will share their
surprise and this moment with each other when they do play.
In this chapter, I present some techniques which can be used to create a
voice system which is based on an authoritative server. The server decides which lines are chosen, who says them, and which clients should play them.
6.2 CLARIFYING TERMINOLOGY
Game voice over (VO) is often called “dialog” or “dialogue.” Historically,
hardware channels on a soundcard were also called voices, often in the
context of a voice limit. However, these days the term “voice” is generally
used to describe spoken words in the field of game audio. In this chapter, the terms voice, dialogue, and VO all refer to the spoken words of game characters, and they are used interchangeably.
Some games like The Sims use an artificial language for all of their char-
acter dialog,1 but a vast majority of games need to translate and localize
their voice lines for each supported language or locale. A locale describes a
cultural set of words out of a language and country: for instance, es-ES for
Spanish spoken in Spain (sometimes referred to as “Castilian Spanish”)
or es-MX for Spanish in Mexico. Localization is the process of translating
a voice line in a manner culturally appropriate for a given locale, casting
voice talents appropriate to the preferences of that region, recording the
audio assets, and importing the data. Many multiplayer games allow play-
ers to connect to the same server or play directly with each other even if
their game clients are set to different locales.
A stimulus is an event that invokes a specific reaction; in the case of
this voice system, it can be as simple as what a game character should talk
about.
make a dramatic difference in the capability of the voice system and the
perceived quality of VO in the game.
6.4 SERVER WORKFLOW
All connected clients send their player inputs to the server with very
different bandwidth and latency times. The server receives these inputs,
simulates the world, and executes what the game characters might do and
say. A server-based voice system collects all requests over the length of the
server frame and then figures out which stimuli to send to which clients.
The next three stimuli, in which the characters all use their abilities, have the same priority of 4. Ana is alive and can talk, so all clients are informed about her voice request. Brigitte is dead, but more importantly, she has already been requested to say a higher-priority line within the same server frame; her additional voice request can be dropped, and no client ever receives it. Cain is also alive, so the server sends his request to all clients.
6.5 CLIENT WORKFLOW
Each client receives a unique set of VO information and commands, and
they have some flexibility to follow the server’s directives. One example
of what the received stimuli might be is shown in Table 6.2. The damage-
witnessed warning from frame 10 is only sent to Client A to make Brigitte
say the warning to the player controlling Ana. When Client A receives the
request to play a death line for Brigitte on the next frame, it needs to han-
dle the request by interrupting her previous warning line. Player A might
hear something like “Behin … Aaargh….” Clients B and C can just play
the death line directly, because they never received the warning request.
In the end, all clients hear Brigitte’s death line.
For the two remaining requests from frame 11 (Ana and Cain using their abilities), all clients received the same information, but each client can determine which of the two requests makes more sense to its player. For
example, there may be a mechanism in place that limits how many voice
lines of a certain category or with the same stimulus priority can play at
the same time.
Let’s say there are already two characters who are currently saying prior-
ity 4 lines and the game has a rule to only ever play three priority 4 lines on
a client. Which client should play which of Ana’s and Cain’s lines? Imagine
now Ana and Brigitte are nearby, while Cain is on the other side of the
map. Depending on the game type, it could make more sense for Clients A
and B to play Ana’s line, while Client C chooses to play Cain’s line.
While most lines are requested by the server and played consistently
on all the three clients, each client also makes specific decisions based on
some game rules in order to improve the clarity and understanding of the
game world for each player.
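One possible client-side rule of that kind, sketched in code (the limit and bookkeeping are illustrative):

#include <vector>

// Count how many currently playing lines share the new line's priority and
// refuse the new line past a per-priority limit (e.g., three priority 4 lines).
bool CanPlayLine(const std::vector<int>& playing_line_priorities,
                 int new_line_priority, int max_lines_per_priority = 3) {
  int count = 0;
  for (int p : playing_line_priorities) {
    if (p == new_line_priority) ++count;
  }
  return count < max_lines_per_priority;
}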
6.6 LINE SELECTION
Until now, we have ignored the topic of line selection in order to focus
on stimulus handling. Once a server has selected which stimuli to send
to which clients, it must now decide which specific voice line should be
played on those clients. Voice line variants can have different probabilities
or extra criteria. Depending on the state of the game, the server might
pick, for instance, friendly or hostile versions of the same stimulus or spe-
cial lines depending on what map is active or which team is in the lead.
Once it makes the selection, the server can then send unique messages to
specific clients with different voice line IDs. From the client’s perspective,
it just receives a line ID and executes the line according to its playback
rules.
In Overwatch, when opponents of the hero McCree hear the line “It’s
high noon!” they learn quickly to take cover to avoid fatal hits. But all
members of his team hear him say the less threatening friendly variant
“Step right up.” What variant should the player controlling McCree hear?
Maybe the friendly version? After all, he is a member of his own team and
cannot harm himself. But having him say the enemy line sells the fantasy
of being a hero better and teaches the player in return to be very cautious
if they ever hear this line from another player. Table 6.3 shows how the
server would pick line variants and format packets to send to the clients
accordingly. Players M, N, and O are on one team, and Players P, Q, and R
are their opponents.
6.7 NETWORK CONSIDERATIONS
Sending information over the wire will always introduce issues to consider and weigh against each other. Speech for a character is not something that requires sending updates every frame; we can mostly get away with telling the clients which character should say which line, identified by whatever identification scheme your system uses.
6.7.2 Network Reliability
A chat with your friendly colleague who is in charge of network messag-
ing will quickly reveal that it is a deep and complex topic. A server voice
system can contribute to a smoother gameplay experience if the data to
be sent over the wire is small and if the message reliability is chosen cor-
rectly. A reliable packet will be resent by the server if it does not receive a
confirmation that the packet was accepted by the client. Contrariwise, an
unreliable packet will be sent just once and never resent. The sender will
never know if it was accepted.
In order to reduce network usage, some “chatter” voice lines—lines
which are not meaningful to gameplay but which provide some immersive
quality to the game world—can be sent in an unreliable fashion. A player
with high packet-loss may experience that some characters do not say
their chatter lines because the packet to inform their client to play the line
never was received. Voice lines which are important to the gameplay must
be sent reliably because it is more important for the player to hear the line
at all, even if there is a substantial delay.
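In code, this choice can be as simple as mapping a line's category to a reliability flag when the packet is built (the categories here are placeholders):

enum class VoiceCategory { Chatter, Gameplay, Story };

// Gameplay-relevant and story lines are sent reliably; ambient chatter can
// be dropped without hurting the player's understanding of the game.
bool ShouldSendReliably(VoiceCategory category) {
  switch (category) {
    case VoiceCategory::Chatter:  return false;
    case VoiceCategory::Gameplay: return true;
    case VoiceCategory::Story:    return true;
  }
  return true;  // default to reliable for anything unexpected
}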
the game event the game designer wants to invoke after the line, and the
player with en-US will wait 0.2 seconds after their line has finished play-
ing. If the same en-US player would play with someone in the es-ES locale,
then they would not wait any extra time, but the es-ES player would need
to wait 0.1 seconds.
6.9 CONCLUSION
For a multiplayer game, a server authoritative voice system allows for very
interesting gameplay features, dramatically higher clarity and quality in
VO for individual players, and hardening of anti-cheat efforts. But this
feature comes with the additional cost of dealing with edge cases when
looking through the player-specific lens of the game world. There are
logistical hurdles to resolve and network issues and delays to be com-
pensated for. I hope I was able to introduce you to some techniques and ideas that give you a kick start when you approach this field and consider developing a voice system yourself.
III
Audio Engines
Chapter 7
Building the Patch Cable
Ethan Geller
Epic Games
CONTENTS
7.1 On Patch Cables 93
7.2 C++ Audio Abstractions 95
7.3 First-Pass Abstractions 95
7.4 The Patch Cable Abstraction 97
7.5 Patch Inputs and Outputs 99
7.5.1 Circular Buffer 99
7.5.2 Patch Output 102
7.5.3 Ownership Semantics 105
7.5.4 Patch Input 105
7.6 Patch Cable Mixer 108
7.7 Patch Splitter 111
7.8 Patch Mixer Splitter 114
7.9 Patch Cable Abstraction Applications 116
7.10 Conclusion 117
References 118
into an amp or heard your own voice amplified through a monitor. There’s
a satisfying click, and with it, you are audible; a circuit is closed, and you
are in it. So often we consider electricity a utility (a means to power your
refrigerator) or a luxury (a means to power your television), but analog
audio signals give us the chance to be complicit in the systems we use.
When you plug an instrument into an amp or a PA, you are engaging in
a century-old ritual in which your kinesics drive a current used to push
demoniac amounts of air.
This is what got me into audio in the first place. There’s no visual corre-
late for being amplified, nor is there one for resonating in a physical space.
These are experiences that are unique to sound: to have every motion of
your fingers along a fretboard interact with every surface of a room or to
glissando upward and find sudden resonances along the way. In many
cathedrals, a single sound at a single point in time will have seconds of
consequences.
Imagine my disappointment when I realized that very little of this
magic is reproducible in game audio programming. Granted, there is
plenty of software that lets you design and iterate on arbitrary signal flows:
MaxMSP/Pure Data, Supercollider, Reaktor Blocks, and Reason are all
brilliant tools for iterating on audio systems. But these are all sandboxes:
once we build the topology we want, it can’t be extracted into a component
of a larger piece of shipped software. Patches built in PureData must stay
in PureData—you can record the results of your patch, but you can’t take
it with you.1 Faust is the closest thing to what I’d like: a way to experiment
with routing signals within a larger piece of compiled software. However,
at the end of the day, there is still a distinct barrier between the systems I
use in Faust and the systems I use in my larger C++ codebase.
What makes this such a shame is that the act of playing a game is very
similar to playing an instrument. Compare the experience of playing gui-
tar through an amplifier with the experience of playing any action game.
You apply pressure to the left thumb stick; your avatar begins to run.
You press A while applying that same pressure to the thumb stick, and
your avatar jumps across a precipice, narrowly escaping death. You keep
the right trigger held down and tap the left trigger at just the right time,
and the tail of your car lurches out from behind you: you are drifting,
and it is badass. Watch any participant’s hands during a fighting game
1 This is only partially correct: Enzien did create and later open-source a service called Heavy,
which transcompiles PureData patches into C++.
tournament, and you will know that the best fighting games are as idiom-
atic as any Chopin étude.
class AudioInputInterface
{
public:
  AudioInputInterface();
  virtual ~AudioInputInterface();
  virtual void GenerateAudio(float* OutAudio, int32_t NumSamples) = 0;
};

class AudioOutputInterface
{
public:
  AudioOutputInterface();
  virtual ~AudioOutputInterface();
  virtual void ReceiveAudio(const float* InAudio, int32_t NumSamples) = 0;
};
class AudioEngine
{
private:
std::vector<AudioInputInterface*> Inputs;
mutable std::mutex InputListMutationLock;
std::vector<AudioOutputInterface*> Outputs;
mutable std::mutex OutputListMutationLock;
std::vector<float> ScratchAudioBuffer;
std::vector<float> MixedAudioBuffer;
public:
void RegisterInput(AudioInputInterface* InInput)
{
std::lock_guard<std::mutex> ScopeLock(InputListMutationLock);
Inputs.push_back(InInput);
}
  void ProcessAudio()
  {
    const int32_t NumSamples = 1024;
    MixedAudioBuffer.resize(NumSamples);
    ScratchAudioBuffer.resize(NumSamples);
    // Clear the mix buffer before accumulating this block.
    std::fill(MixedAudioBuffer.begin(), MixedAudioBuffer.end(), 0.0f);

    // Poll inputs:
    std::lock_guard<std::mutex> InputScopeLock(InputListMutationLock);
    for(AudioInputInterface* Input : Inputs)
    {
      Input->GenerateAudio(ScratchAudioBuffer.data(), NumSamples);
      // Mix it in:
      for(int32_t Index = 0; Index < NumSamples; Index++)
        MixedAudioBuffer[Index] += ScratchAudioBuffer[Index];
    }

    // Push outputs:
    std::lock_guard<std::mutex> OutputScopeLock(OutputListMutationLock);
    for(AudioOutputInterface* Output : Outputs)
    {
      Output->ReceiveAudio(MixedAudioBuffer.data(), NumSamples);
    }
  }
};
What if there’s a codec we’re trying to use that only takes 20 millisec-
onds of audio at a time and the number of samples per each callback of
ProcessAudio() is not exactly 20 milliseconds of audio? What if any
given call to GenerateAudio() or ReceiveAudio() takes a prohibitively
long amount of time?
With this framework, we’ve created a good set of interfaces to build a
single-threaded topology for audio signal processing. However, if anyone
else wants to try patching audio from our subsystem to theirs, they will have
to debug and understand our audio engine, rather than focus on theirs.
When I build an API that will allow you to send or receive an audio
signal, I don’t want to give you a buffer or a callback. I want to give you
one end of a patch cable, and I want you to be able to plug it into anything.
Consider the following API:
class AudioEngine
{
//...
public:
PatchInput ConnectNewInput(uint32_t MaxLatencyInSamples);
PatchOutput ConnectNewOutput(uint32_t MaxLatencyInSamples);
}
AudioEngine DefaultAudioEngine;
//...
PatchInput MySynthSend = DefaultAudioEngine.ConnectNewInput(4096);
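With that handle, the synthesizer can push audio from its own thread without knowing anything about the engine's internals. A usage sketch (MySynth and the block size are placeholders, not part of the API above):

// On the synthesizer's processing thread:
std::vector<float> SynthBlock(512);
// Fill SynthBlock with freshly rendered samples (MySynth is hypothetical):
MySynth.Render(SynthBlock.data(), static_cast<int32_t>(SynthBlock.size()));

// Push the block down the cable. The return value is how many samples
// actually fit; anything that did not fit was truncated.
int32_t NumPushed = MySynthSend.PushAudio(
    SynthBlock.data(), static_cast<int32_t>(SynthBlock.size()));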
public:
CircularAudioBuffer()
{
SetCapacity(0);
}
CircularAudioBuffer(uint32_t InCapacity)
{
SetCapacity(InCapacity);
}
memcpy(
&DestBuffer[0],
&InBuffer[NumToWrite],
(NumToCopy - NumToWrite) * sizeof(SampleType));
return NumToCopy;
}
memcpy(
&OutBuffer[NumRead],
&SrcBuffer[0],
(NumToCopy - NumRead) * sizeof(SampleType));
return NumToCopy;
}
ReadCounter.store(
(ReadCounter.load() + NumSamplesRead) % Capacity);
return NumSamplesRead;
}
ReadCounter.store(ReadCounterNum);
}
}
// Get the number of samples that can be popped off of the buffer.
uint32_t Num() const
{
const uint32_t ReadIndex = ReadCounter.load();
const uint32_t WriteIndex = WriteCounter.load();
This structure is safe for SPSC (single-producer, single-consumer) situations. Notice how we explicitly load
our read and write counters at the beginning of Peek(), Pop(), and Push()
but only increment them at the very end of Pop() and Push(). We then
truncate the amount of audio we push to the buffer based on our poten-
tially stale read counter or truncate the amount of audio we peek/pop
based on our potentially stale write counter. In short, if one thread is in
the middle of calling Push while another thread is calling Pop, the worst
thing that can happen is that we truncate the push and pop calls but we
never lock either call. If the buffer is suitably large enough, we won’t need
to worry about the push and pop calls fighting each other.
struct PatchOutput
{
private:
// Internal buffer.
CircularAudioBuffer<float> InternalBuffer;
std::atomic<int32_t> NumAliveInputs;
public:
PatchOutput(uint32_t MaxCapacity, float InGain = 1.0f)
: InternalBuffer(MaxCapacity)
, TargetGain(InGain)
, NumAliveInputs(0)
{}
if (bUseLatestAudio
&& InternalBuffer.Num() > NumSamples)
{
InternalBuffer.SetNum(NumSamples);
}
return PopResult;
}
MixingBuffer.SetNumUninitialized(NumSamples, false);
int32_t PopResult = 0;
if (bUseLatestAudio
&& InternalBuffer.Num() > NumSamples)
{
InternalBuffer.SetNum(NumSamples);
PopResult = InternalBuffer.Peek(
MixingBuffer.GetData(), NumSamples);
}
else
{
PopResult = InternalBuffer.Pop(
MixingBuffer.GetData(), NumSamples);
}
return PopResult;
}
// Returns true if the input for this patch has been destroyed.
bool IsInputStale() const { return NumAliveInputs == 0; }
I’ve added MixInAudio() for use with the PatchMixer class that we will
build later in this chapter. The MixInBuffer() function that it uses takes
an existing buffer and sums it into a different one2:
2 For dynamic gain values like these, we will need to interpolate from one gain value to the next in
order to avoid significant discontinuities.
void MixInBuffer(
const float* InBuffer, float* BufferToSumTo,
uint32_t NumSamples, float Gain)
{
for(uint32_t Index = 0; Index < NumSamples; Index++)
{
BufferToSumTo[Index] += InBuffer[Index] * Gain;
}
}
We can have the PatchInput class own a strong pointer to its corre-
sponding PatchOutput in order to guarantee that it is not deleted until
the PatchInput instance is deleted as well. Alternatively, we can have the
PatchInput class own a weak pointer to the PatchOutput instance, and
any time we want to query or push audio to the PatchOutput instance, we
would attempt to lock the weak pointer, converting it to a strong pointer for
the scope of our work. Using a strong pointer has the advantage of avoid-
ing the overhead of incrementing and decrementing an atomic reference
count during every audio callback. Using a weak pointer has the advan-
tage of ensuring the circular buffer is deleted as soon as the PatchOutput
is deleted.
I’ve decided on the weak pointer, in order to ensure correctness.
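For reference, the PatchOutputStrongPtr and PatchOutputWeakPtr types used in the rest of the chapter can be plain standard-library aliases (a minimal sketch, assuming std::shared_ptr-based ownership):

#include <memory>

class PatchOutput;

// The mixer holds the output end through shared ownership; the input end
// holds a weak reference and locks it only for the duration of each call.
using PatchOutputStrongPtr = std::shared_ptr<PatchOutput>;
using PatchOutputWeakPtr = std::weak_ptr<PatchOutput>;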
7.5.4 Patch Input
Let’s take a look at the other end of our cable, the PatchInput class:
class PatchInput
{
private:
// Weak pointer to our destination buffer.
PatchOutputWeakPtr OutputHandle;
public:
// Valid PatchInputs can only be created from explicit outputs.
PatchInput(const PatchOutputStrongPtr& InOutput)
: OutputHandle(InOutput)
, PushCallsCounter(0)
{
if (InOutput)
{
InOutput->NumAliveInputs++;
}
}
return *this;
}
~PatchInput()
{
if (auto StrongOutputPtr = OutputHandle.lock())
{
StrongOutputPtr->NumAliveInputs--;
}
if (!StrongOutput)
{
return -1;
}
int32_t SamplesPushed =
StrongOutput->InternalBuffer.Push(InBuffer, NumSamples);
return SamplesPushed;
}
if (!StrongOutput)
{
return;
}
And just like that, we’ve built a thread safe SPSC cable. There are two
ways we could be passing shared pointers around. The first option is to
encapsulate all of the state and APIs that the PatchInput will need to
use in a struct that is private to the PatchOutput class and instead give
class PatchMixer
{
private:
// New taps are added here in AddNewPatch, and then are moved
// to CurrentPatches in ConnectNewPatches.
std::vector<PatchOutputStrongPtr> PendingNewInputs;
PendingNewInputs.clear();
}
public:
PatchMixer() {}
PendingNewInputs.emplace_back(
new PatchOutput(MaxLatencyInSamples, InGain));
return PatchInput(PendingNewInputs.back());
}
if (MaxPoppedSamples < 0)
{
// If MixInAudio returns -1, the PatchInput has been
// destroyed.
CurrentInputs.erase(CurrentInputs.begin() + Index);
}
else
{
MaxPoppedSamples =
std::max(NumPoppedSamples, MaxPoppedSamples);
}
return MaxPoppedSamples;
}
// Iterate through our inputs, and see which input has the
// least audio buffered.
uint32_t SmallestNumSamplesBuffered =
    std::numeric_limits<uint32_t>::max();
if (SmallestNumSamplesBuffered ==
    std::numeric_limits<uint32_t>::max())
{
return -1;
}
else
{
// If this check is hit, we need to either change this
// function to return an int64_t or find a different way
// to notify the caller that all outputs have been
// disconnected.
assert(SmallestNumSamplesBuffered <=
((uint32_t)std::numeric_limits<int32_t>::max()));
return SmallestNumSamplesBuffered;
}
}
};
7.7 PATCH SPLITTER
This implementation of PatchMixer is surprisingly simple, and now that
we have that in place, we’ll also want a splitter. The PatchSplitter will
have one PatchInput and distribute it to multiple PatchOutputs (also
potentially on different threads). Once again, the implementation using
the patch cable abstraction is straightforward.
class PatchSplitter
{
private:
std::vector<PatchInput> PendingOutputs;
mutable std::mutex PendingOutputsCriticalSection;
std::vector<PatchInput> ConnectedOutputs;
mutable std::mutex ConnectedOutputsCriticalSection;
public:
PatchSplitter() {}
{
std::lock_guard ScopeLock(PendingOutputsCriticalSection);
PendingOutputs.push_back(StrongOutputPtr);
}
return StrongOutputPtr;
}
std::lock_guard<std::mutex> ScopeLock(
ConnectedOutputsCriticalSection);
int32_t MinimumSamplesPushed =
std::numeric_limits<int32_t>::max();
if (MinimumSamplesPushed == std::numeric_limits<int32_t>::max())
{
MinimumSamplesPushed = -1;
}
return MinimumSamplesPushed;
}
if (SmallestRemainder == std::numeric_limits<uint32_t>::max())
{
return -1;
}
else
{
// If we hit this check, we need to either return an int64_t
// or use some other method to notify the caller that all
// outputs are disconnected.
assert(SmallestRemainder <=
((uint32_t)std::numeric_limits<int32_t>::max()));
return SmallestRemainder;
}
}
};
PatchMixer follows from the first, and PatchSplitter follows from the
second. Given this, we can effectively create an MPMC data structure by
connecting an MPSC structure to an SPMC structure, as long as there is
some worker thread or fiber that can consume from the MPSC structure
and produce to the SPMC structure. That MPMC structure for us will
be PatchMixerSplitter: a class that will mix down inputs from multiple
threads and send the result to outputs on multiple threads.
class PatchMixerSplitter
{
private:
PatchMixer Mixer;
PatchSplitter Splitter;
// This buffer is used to pop audio from our Mixer and push it to
// our splitter.
std::vector<float> IntermediateBuffer;
protected:
// This class can be subclassed with OnProcessAudio overridden.
virtual void OnProcessAudio(std::span<float> InAudio) {}
public:
PatchMixerSplitter() {}
if (NumSamplesToForward <= 0)
{
// Likely there are either no inputs or no outputs connected,
// or one of the inputs has not pushed any audio yet.
return;
}
IntermediateBuffer.clear();
IntermediateBuffer.insert(
IntermediateBuffer.begin(), NumSamplesToForward, 0);
OnProcessAudio(
std::span<float>(IntermediateBuffer.data(),
IntermediateBuffer.size()));
int32_t PushResult =
Splitter.PushAudio(
IntermediateBuffer.data(), NumSamplesToForward);
assert(PushResult == NumSamplesToForward);
}
};
7.10 CONCLUSION
Recently, a programmer reached out to me because they wanted to be able
to send Unreal’s native VOIP output to any arbitrary playback device on
Linux. While I did not have a Linux machine handy, I exposed an API
from our VOIP engine class:
Audio::FPatchOutputStrongPtr GetMicrophoneOutput();
Audio::FPatchOutputStrongPtr GetRemoteTalkerOutput();
veneer of a frail plastic fan. And yet our work does have a material impact
in one important place: labor. When you write difficult code, it has conse-
quences for the people that need to finish using it before they can go home
and have dinner with their families.
I’ve been the victim of this in some cases and the perpetrator in others.
These are consequences much worse to me than any bug I could try and
introduce in a codebase. There’s one very effective cure I’ve found for this,
and it is this: build useful abstractions and APIs, make sure they are read-
able, document them, test them, and share them. Build something that
will let someone route VOIP audio to an external device within an hour
rather than within a 6-hour, energy-drink-fueled panic. At the very least,
take this abstraction, use it, and share it. The next time you see someone
panicking over the specifics of multithreaded audio, give them a patch
cable. The rest is intuition.
REFERENCES
Murray, Dan. “Multithreading for Game Audio.” Game Audio Programming
Principles and Practices Volume 2, edited by Guy Somberg. CRC Press, 2019,
pp. 33–62.
Chapter 8
Split Screen and Audio Engines
CONTENTS
8.1 Introduction 120
8.2 3D Geometry 121
8.2.1 Frames of Reference 121
8.2.2 The Math of Transforms 123
8.2.3 Reversibility 123
8.2.4 Changing Frames of Reference Using Transforms 125
8.3 Listener Geometry 125
8.4 Listeners as a Frame of Reference 126
8.5 Multiple Listeners 127
8.6 Counterintuition: Playing Once 128
8.6.1 Multiple Triggering 128
8.6.2 Clipping and Phasing 128
8.6.3 Significantly Extra CPU Costs 129
8.7 Drawbacks and Edge Cases 129
8.7.1 Boundary Flipping 129
8.7.2 Singleton Systems 129
8.7.3 CPU Costs 130
8.7.4 Competitive Multiplayer 130
8.8 Additional Audio Considerations 130
8.8.1 Music 130
8.8.2 Local-Player-Only Audio 131
8.8.3 User Interface Audio 131
8.9 Rendering Twice: Dual Output 131
8.10 Conclusion 131
References 132
8.1 INTRODUCTION
Split screen is a technique whereby a game engine provides multiple views
into the same game instance with separate controls given to multiple local
(non-networked) players. Each local player can control their own view,
and each view is independent. Exactly where and how the splits are dis-
played on the screen is up to the game engine and often provided as player
settings preference. For example, a player may choose to split a screen
between top and bottom or between left and right. The number of splits
supported is also up to the game and the game engine. Most games which
support split screen usually limit it to two screens, but there are many
notable examples that support up to four splits. Figure 8.1 shows some of
the possible arrangements.
In the early days of video gaming, when the Internet was less com-
mon, split screen was a commonly supported feature. For multiplayer
games without a network connection, it was a requirement. As networked
multiplayer became more widely adopted in the early 2000s by gaming
consoles, split screen began to fall out of favor. However, it has seen
somewhat of a resurgence in recent years, especially with local split screen
in combination with networked multiplayer. In other words, multiple
players can play on one game console client while also playing along with
other players connected to the same game on remote clients.
Split screen support is fundamentally challenging from a CPU and
GPU resource point of view, as displaying multiple views requires render-
ing and processing more objects. Furthermore, many optimization tech-
niques that depend on frustum culling or distance-based culling are less
effective when multiple views in multiple locations can be rendered.
While graphical quality is reduced and rendering multiple views for
split screen can be confusing for players, the audio experience of split
FIGURE 8.1 From left to right, the most common split screen arrangements.
Vertical split, horizontal split, and four-way split.
8.2 3D GEOMETRY
To understand the details of split screen for audio, it’s important to first
review the basic mathematics of 3D geometry.
FIGURE 8.2 Simple X–Y plot of a point relative to an origin (0, 0).
FIGURE 8.3 The original point, in the frame of reference of G, could also be
considered relative to a different arbitrary frame of reference, G′.
M = TRS
p′ = Mp
p′ = TRSp
$$p' = T(R(Sp))$$
Note that the application of the transforms follows right to left. First,
S, then R, then T are applied to the point, p. The order in which these
transformations are applied does change the outcome, as shown in
Figure 8.4.
Combining these matrices in any order results in a technically valid
transformation matrix. However, the standard convention is to first apply
scale, then rotation, then translation. This convention is used primarily
because it’s easier to conceptualize the results of these operations in this
order than other orders.
8.2.3 Reversibility
One important property of the linear transformations we use in 3D game
engines is that they are reversible. To undo the operation of a scale (S), we
multiply by its inverse ($S^{-1}$):

$$I = S^{-1}S$$
FIGURE 8.4 The order in which rotation and translation transformations are
applied in a given coordinate system (frame of reference) has an effect on the
resulting output.
where I is the identity matrix—a matrix with ones in the diagonals and
zeroes everywhere else. A 3 × 3 identity matrix looks like this:
1 0 0
0 1 0
0 0 1
The inverse of the combined transform is then

$$M^{-1} = (TRS)^{-1} = S^{-1}R^{-1}T^{-1}$$

The inverse transformations are applied in the reverse of the order in which the originals were applied: first inverse translation, then inverse rotation, then inverse scale. This makes sense if you imagine the steps needed to precisely undo scaling, rotating, and translating an object in a 3D scene. You'd need to first move it back to the origin (undoing the translation), reverse the rotation, and then multiply the scale by its inverse.
8.3 LISTENER GEOMETRY
Audio engines typically have an object which contains properties which
represent information about a virtual listener. You can think of a virtual
listener as a pair of ears (or, more generally, a microphone) in a game
world. Audio is rendered from the perspective of this listener, and many
significant CPU optimizations are made based on the location and ori-
entation of this listener relative to sound sources. For VR games and
first-person games, listeners are almost invariably hooked up to head-
tracking mechanisms and represent the orientation of the player’s head
in the game.
Much like a camera and its transform, listeners usually have their 3D
orientation represented by a matrix transform of translation, orientation,
and scale—although the scale transform is almost always ignored. The
listener transform is often set by the same code which sets up the camera
transform but not always. There are many cases where you may want to
render audio relative to a virtual listener even though there is no cam-
era transform available (e.g. tools which preview spatialization or game
features which allow listener traversal without changing a game’s camera
position).
3D sounds in games are panned and distance-attenuated relative to
this listener transform, though there are some notable exceptions. For
example, third-person games often have a hybrid listener setup where an
optional position vector, rather than the translation of the listener transform, is used to determine distance attenuation.1 This decoupling is usually to com-
pensate for the fact that game cameras are often far above a controllable
game character, and attenuating from the camera position would result
in nearly all audio sounding far away, which is likely not the desired
effect.
1 For more details about how third-person camera attenuation works, see Somberg (2017).
Note that the order of operations here does matter, so to keep it simple, multiply both sides by the inverse listener transform from the left only. On the right is a transform that can be computed with any 3D math library which deals with affine transforms. (You can also work it out by hand to prove it.) On the left, the inverse listener transform and the listener transform cancel out, creating an identity transform (i.e., essentially multiplying by 1.0). This leaves the desired answer for the $Sound_{listener}$ transform, since an identity matrix multiplied by any other matrix is just the matrix itself:

$$Sound_{listener} = Listener^{-1} \cdot Sound_{world}$$
If the audio engine always uses listener-relative transforms for sounds, the
audio renderer (the low-level DSP mixing code required to actually gener-
ate audio from the parameters derived from higher level features) only ever
needs to deal with listener-relative sound spatialization and attenuation.
In fact, no representation of the listener is required to accurately render
audio; many details are consequently simplified, and there is a significant
reduction in code complexity.
8.5 MULTIPLE LISTENERS
From the perspective of the audio engine, the key difference with split
screen games is the addition of extra listeners. Each split screen view has
its own camera transform and a corresponding listener transform.
On first impulse, you might expect that an audio engine would have to
deal with many additional complexities and details to render audio from
multiple perspectives. However, if the audio engine uses listener-space
sound transforms, it becomes surprisingly easy to support any number
of additional listeners. The only additional step required before comput-
ing a given sound’s listener-relative transform is to determine that sound’s
closest listener.
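A sketch of that step, using stand-in math types with the operations described earlier (names are illustrative):

#include <cstddef>
#include <limits>
#include <vector>

// Minimal stand-ins for the engine's math types; a real engine would use
// its own transform class providing these operations.
struct Vec3 { float x, y, z; };
struct Transform {
  Vec3 translation;  // rotation and scale omitted from this sketch
  Transform Inverse() const;                    // provided by the math library
  Transform operator*(const Transform&) const;  // compose transforms
};

static float DistSq(const Vec3& a, const Vec3& b) {
  const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Convert a world-space sound transform into the space of its closest
// listener. Assumes at least one listener exists.
Transform ListenerRelativeTransform(const Transform& sound_world,
                                    const std::vector<Transform>& listeners) {
  std::size_t closest = 0;
  float best = std::numeric_limits<float>::max();
  for (std::size_t i = 0; i < listeners.size(); ++i) {
    const float d = DistSq(listeners[i].translation, sound_world.translation);
    if (d < best) { best = d; closest = i; }
  }
  // Sound_listener = Listener^(-1) * Sound_world
  return listeners[closest].Inverse() * sound_world;
}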
8.6.1 Multiple Triggering
Playing a sound multiple times for each listener in range would result in a
large percentage of 3D sounds in a game getting triggered multiple times.
Every gun shot, every footstep, every line of 3D dialogue…—everything
would be played multiple times. It would sound confusing in a two-player
split screen. In a four-player split screen, it would be total sonic chaos.
Players will not be able to distinguish which camera/listener any given
sound is coming from.
8.7.2 Singleton Systems
One drawback of the technique of playing each sound only on the closest
listener (and, in fact, split screen in general) for audio engines is that a
variety of audio engine features are often implemented assuming a single-
ton listener.
For example, the majority of audio engines traditionally implement a
listener-based reverb. In other words, what reverb settings to use (or which
convolution impulse response chosen) are based on where the listener is.
Running multiple reverbs, one for each listener, would be computation-
ally expensive as reverb is often one of the more expensive DSP effects in
games. For systems that assume a singleton listener (e.g. ambient zones,
dynamic ambient systems, global effects processing like underwater DSP
8.7.3 CPU Costs
Although rendering a 3D sound once relative to the closest listener is
computationally less expensive than rendering it for each listener, there
is still an additional CPU cost for split screen audio rendering. The
additional cost is usually a result of playing more sounds than would
otherwise be played—with multiple listener locations, more audio is
simply in range. Thus, comparing CPU performance between split-screen and non-split-screen play will usually show an additional cost of somewhere between one and two times that of the single-screen case.
This factor may not seem like that much for an optimized audio ren-
derer, but split screen usually has CPU overhead across the board. To
compensate, it’s possible to reduce audio rendering quality or disable
certain features when in split screen mode in order to maintain a more
constant CPU profile.
8.7.4 Competitive Multiplayer
For competitive multiplayer games, rendering audio only from the per-
spective of the closest listener may have gameplay implications. A player
who can hear all audio from their perspective (even if far away) will have
a significant competitive advantage over a player who is always hearing
audio that is far from them but close to their split screen partner. This
may seem like a good argument for rendering the audio multiple times,
but that will not resolve this issue. Split screen audio (and graphics) fundamentally gives each player inferior information compared to playing full screen, so competitive parity is not a reasonable constraint for split screen audio.
8.8.1 Music
Music in split screen should always be a singleton—otherwise it will be
cacophony. It is up to the music system designer as to how each player in
a split screen interacts with the music. For example, an interactive music
system which plays different music states based off stealth modes or action
modes will need to take into account the actions and state of all players. Such
a system should largely operate identically between single screen and split
screen modes.
8.8.2 Local-Player-Only Audio
Often in multiplayer games, there is a portion of audio that is intended
to only play on the local player and not be heard by other players, such as
quest queues or health warnings. However, there are multiple local players
for split screen audio, and—in most cases—local-player-only sounds will
need to be played for each split screen player, even though many gameplay
systems are often written to assume there is only one listener.
8.10 CONCLUSION
If reasonable compromises are accepted, split screen support for audio
engines is surprisingly straightforward to implement. The key insight is
to convert world-space sound transforms to the listener-space transform
of the sound’s closest listener. Usually, the experience of playing the same
game with your friend (or enemy) on the couch locally is compelling
enough to overcome any of the side effects of the technique. In general, if
players are playing a split-screen game, the overall experience should be
optimized for the social experience.
REFERENCES
Somberg, Guy. “Listeners for Third-Person Cameras.” Game Audio Programming
Principles and Practices, edited by Guy Somberg. CRC Press, 2017,
pp. 197–208.
Chapter 9
Voice Management
and Virtualization
Robert Gay
Epic Games
CONTENTS
9.1 The Need for Voice Management 133
9.2 Sonifying a Forest 134
9.3 The Single Cap Trap 135
9.4 Real Voice Pools 135
9.5 Virtual Voice Pools 136
9.6 Reviving the Dead 137
9.7 Real Trees in a Virtual Forest 137
9.8 Rule Building 138
9.9 Virtual Pool Rules 139
9.9.1 Time-Based Rules 139
9.9.2 Distance-Based Rules 139
9.9.3 Volume-Based Rules 140
9.9.4 Voice Stealing 140
9.9.5 Realization 140
9.10 Runtime Asset Caching 141
9.11 Dynamic Pool Allocation 142
9.12 Conclusion 142
systems attempts to play more sounds than the hardware can or should
sonify. Voice management encompasses multiple features and design pat-
terns which can be implemented to some level or another as your engine
or game requires. Regardless of scale and complexity, every project needs
to consider voice management—preferably as early in development as
possible.
Building an audio engine that caters to various types of games with
sonically complex scenarios requires complex systems of voice man-
agement. From this perspective, it is easier to scale features and func-
tionality down than up. Therefore, it is preferable to put voice
management features in the hands of the sound designers, where they can be
enabled or disabled at will. Even if only the programmer will be fiddling
with voice management parameters, providing such a toolset remains useful
for flexibility and iterative development. Breaking
out voices into well-organized data structures that are configurable in
editor or development builds allows for rapid iteration and scalability,
which is paramount in dialing in the final game’s experience. This is
crucial, particularly when time becomes scarce and the development
cycle is nearing completion.
This chapter does not intend to indoctrinate a particular methodology
to manage voices, nor does it aim to hyperfocus on a particular aspect.
An entire textbook could be written on this topic in much further detail.
Rather, this chapter serves as an introduction to what voice management
is, an overview of the typical feature set it entails, and how a programmer
can begin building a system from the ground up that is readily extensible
and scalable.
require similar behavior to load and unload chunks of audio from disk.
Another type of pool may be made for sounds that are compressed using a
certain format that may have hardware restrictions on the number avail-
able to be decompressed at one time. Yet another could be a reserved num-
ber of real voices that are allowed to stop gracefully, fading over a quick
time period to avoid pops in audio. If a maximum single cap has been
instituted, best practice is to ensure the maximum number of voices for
each type of real voice pool is at or below the maximum single cap.
Because real voice pools deal with hardware-level rendering, they are
typically managed on a dedicated audio render thread. Their limits may be
software imposed, to keep performance overages from stressing other critical
engine systems, or they may be hardware imposed.
The real voice pool is performance-critical, so when a real voice pool
reaches its maximum, we assume that resources are starved. This situation
may manifest as undesired audible behavior such as sounds immediately
stopping or starting delayed. We can mitigate abrupt interruptions, avoid-
ing pops by applying fast transitions at the buffer level. Regardless, these
pools are generally a last line of defense for avoiding resource starvation
and constraining your audio system’s runtime performance and memory
characteristics. Therefore, it is recommended to stress test your game or
audio engine in many voice-heavy scenarios to determine what these pool
limits should be set at in order to avoid sudden, perceptible interruption.
When a real voice pool reaches its limits, it is crucial to include adequate
logging and other debug information in order to be able to determine how
best to tune the voice pools and their respective maximum distributions.
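As an illustrative sketch (the names here are placeholders rather than any particular engine's API), a real voice pool may be little more than a hard cap plus the logging described above:

// Minimal sketch of a real voice pool: a hard cap plus a log entry when the
// cap is hit so the limits can be tuned from real gameplay data.
struct RealVoicePool
{
    std::string Name;     // e.g. "streaming", "compressed", "fade-out reserve"
    int MaxVoices = 32;
    int ActiveVoices = 0;

    bool TryAcquire()
    {
        if (ActiveVoices >= MaxVoices)
        {
            // LogWarning is a placeholder for your engine's logging facility.
            LogWarning("Real voice pool '%s' saturated (%d voices)",
                       Name.c_str(), MaxVoices);
            return false;
        }
        ++ActiveVoices;
        return true;
    }

    void Release() { ActiveVoices = std::max(0, ActiveVoices - 1); }
};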
playing more can lead to sonic confusion, while conveniently culling
sounds that would otherwise stress the real voice pool system. Virtual
pool voices may be tagged as a member of a single pool or multiple pools
within the engine’s content management system. By tuning the virtual
pool limits, sound designers have the indirect power to avoid real voice
pool saturation and sonic artifacts therein.
a real voice can then be recreated and played, a process called realization.
A real voice is typically comprised of an active object that is rendering
output to the hardware. A virtual voice is, ideally, a minimal set of data
required to determine whether or not a real voice should be created. In the
event that there is not enough space in one of the pools that a virtual voice
subscribes to when playback is requested, it should be virtualized imme-
diately without ever initiating a request to start a real voice.
In order to determine whether a virtual voice is eligible to play back
as a real voice when its subscribed virtual pools are saturated, this mini-
mal set of data is evaluated against a ruleset. This virtualization data is
a combination of runtime and static information about the sound asset.
Static virtualization data can be extracted when the assets are serialized or
compressed. If a sound’s virtualization ruleset requires volume data, one
common technique is to store a coarse array of root mean square volumes
and a corresponding seek table in order to allow for seeking and resuming
based on the elapsed time of the virtual voice.
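A hypothetical shape for that minimal per-voice data might look like the following sketch (the field and type names are illustrative):

// Sketch of the minimal data a virtual voice carries so that rules can be
// evaluated without touching the audio renderer.
struct VirtualVoice
{
    SoundAssetId Asset;                // which asset to realize, if chosen
    Vector3 Position;                  // runtime: last known emitter position
    float ElapsedTimeSeconds = 0.f;    // runtime: how long it has been "playing"
    float CurrentAudibility = 0.f;     // runtime: estimated output volume
    float DurationSeconds = 0.f;       // static: total length of the asset
    float MaxDistance = 0.f;           // static: attenuation max distance
    float DesignerPriority = 0.f;      // static: priority set by the sound designer
    // Static: coarse RMS volumes sampled at a fixed interval, paired with a
    // seek table so a realized voice can resume at the correct offset.
    std::vector<float> CoarseRMSVolumes;
    std::vector<uint32_t> SeekTableByteOffsets;
};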
As the game starts and stops sounds, space will be used up and freed,
respectively, in the virtual voice pools, requiring a mechanism to evaluate
virtualization logic. Typically, this logic happens on an audio logic thread,
which can run at a slower rate than the audio rendering thread. Depending
on the platform architecture, the audio logic can be evaluated on the main
game thread or on a dedicated thread. In situations where there is a dedi-
cated thread that needs to poll the gameplay state, the update rate of the
audio logic thread should be both configurable and independent of the
gameplay update rate.
9.8 RULE BUILDING
Each virtual pool has a set of rules that are evaluated when the pool
becomes saturated by a new playback request. These rules
determine whether a voice should continue playing as a real and virtual
voice (an active voice), as just a virtual voice (said to be virtualized), or
be stopped entirely. Designing and applying these rules allows sound design-
ers to strike a balance between performance and aesthetics and to avoid
flooding virtual voice pools (and by extension real voice pools). A good
set of rules will have a minimal impact on the listening experience while
keeping the actual performance cost under control.
The most basic rule is a Boolean value that determines whether a vir-
tual pool is active or not, which is a useful tool for both debugging and
isolation. This rule can help determine if sounds are stopping or cutting
out due to this system or another independent gameplay mechanism.
Using this basic rule effectively disables the pool: all playback requests
create a virtual voice in a single pool unrestricted by any rules or voice
limit.
Rule design should be as flexible as possible, since the rulesets will
undergo many iterations throughout the development lifecycle. Different
classes of sounds will need different rules, and the rules may even change
based on gameplay state. It is best to design the rule system in a way that
can be parameterized easily.
The first stage of evaluating individual voice rules falls into three basic
classes: time-based rules, distance-based rules, and volume-based rules.
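Before looking at each class, here is one possible way (a sketch, not a prescribed design) to keep such rules parameterizable: express each rule as a small object that votes on a voice's fate, so that a pool can own an ordered, designer-tunable list of them.

// Sketch: each rule inspects a virtual voice and votes on what should happen
// to it; the pool combines the verdicts of its enabled rules.
enum class VoiceVerdict { KeepReal, Virtualize, Stop, NoOpinion };

struct IVoiceRule
{
    virtual ~IVoiceRule() = default;
    virtual VoiceVerdict Evaluate(const VirtualVoice& Voice,
                                  const Vector3& ListenerPosition) const = 0;
};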
9.9.1 Time-Based Rules
Time-based rules compare how much time remains for a voice and a pre-
scribed value. For instance, a gunshot sound has a quiet tail, and its virtual
pool is saturated. The sound has been playing as a real voice for 5 seconds,
while only having a remaining playtime of 0.2 seconds. A time-based rule
may declare that beyond this limit, it is no longer a candidate for virtual-
ization and should be killed. Correspondingly, if the same sound has been
virtualized for 0.2 seconds and has 5 seconds of playtime remaining when
a slot then becomes available in the pool permitting it to play, it may or
may not make more sense to eject the sound from the virtual pool and no
longer process it as a virtual voice. In this case, it defers to the realization
settings (see Section 9.9.5) in order to determine whether to realize or kill
the voice.
9.9.2 Distance-Based Rules
Distance-based rules control whether a voice should be real based on the
sound source’s max distance (the distance at which its attenuation curve
reaches a terminally zero volume) plus an optional distance buffer. This
buffer avoids abrupt stopping and starting of the sound if the listener is
moving back and forth near that boundary.
9.9.3 Volume-Based Rules
Volume-based rules evaluate whether a sound is a candidate for virtual-
ization or eviction based on whether its final output volume is below a
certain threshold. For example, a sound could be considered for virtual-
ization when its volume drops below a threshold of −40 dB. Each volume
rule may be evaluated pre attenuation or post attenuation. Post attenu-
ation, rules combine the distance attenuation and volume attenuation
together, which can simplify the rulesets. However, if the sound designers
desire more fine-grained control, they can select pre-attenuation and then
add a separate distance rule if required.
9.9.4 Voice Stealing
After all of the rules have been applied to all of the sounds in a virtual
pool, there may still be more real sounds than the virtual pool has allotted.
In such a case, the virtual pool needs to virtualize or stop a playing sound,
ideally one that is low priority and which will not adversely affect the mix
if it is stopped.
The most common technique to determine which sounds to virtualize
is to sort the list of currently playing sounds by one or more predicates
and virtualize or stop the lowest-priority sounds until the voice pool limit
is reached. The sorting predicates are usually fairly simple: how long the
sound has been playing, its current audibility, or a priority value that the
sound designers can set.
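A sketch of that sort-and-cull step, building on the hypothetical VirtualVoice data above (VirtualizeOrStop is a placeholder for the pool's policy):

// Sketch: sort the pool's voices so the least important come last, then
// virtualize or stop everything past the pool's budget.
void CullToBudget(std::vector<VirtualVoice*>& Voices, size_t MaxRealVoices)
{
    std::sort(Voices.begin(), Voices.end(),
        [](const VirtualVoice* A, const VirtualVoice* B)
        {
            if (A->DesignerPriority != B->DesignerPriority)
                return A->DesignerPriority > B->DesignerPriority;
            return A->CurrentAudibility > B->CurrentAudibility;
        });

    for (size_t i = MaxRealVoices; i < Voices.size(); ++i)
        VirtualizeOrStop(*Voices[i]);  // placeholder for the pool's policy
}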
9.9.5 Realization
When a virtual voice is no longer able to play due to its pool being satu-
rated, it can be either stopped or virtualized. Correspondingly, when a
virtual voice is realized, there are a few potential ways in which it can
continue playback. Table 9.1 summarizes the various realization settings
that sound designers can choose from.
9.12 CONCLUSION
Approaching voice management as a collection of real and virtual voices
assigned to real and virtual pools allows the rendered sound to be split
from the voice’s gameplay lifetime. Splitting the logic into individual voice
rules, voice stealing rules, and a voice’s virtualization rule makes the
whole process of determining playback priority manageable and scalable.
Chapter 10
Screen-Space Distance
Attenuation
Guy Somberg
Echtra Games
CONTENTS
10.1 Introduction 144
10.2 Distance Attenuation Review 144
10.3 The Problem with Action RPGs 145
10.4 The Meaning of Distance 146
10.5 Converting to Screen-Space 147
10.6 Screen-Space Distance Algorithm 148
10.6.1 Pixels Are Not Meaningful 148
10.6.2 The Range Is Too Small 149
10.6.3 Using the Wrong Camera 151
10.7 Next Steps 153
10.8 Rectangular Distances 154
10.9 On-Screen Debug Visualization 156
10.9.1 Describing the Shape 156
10.9.1.1 Circular Shape 156
10.9.1.2 Rectangular Shape 156
10.9.2 Debug Rendering Algorithm 158
10.9.3 Example Code for Unreal Engine 160
10.10 Conclusion 165
References 165
10.1 INTRODUCTION
Action RPGs like Torchlight, Diablo, and Path of Exile have many distinc-
tive challenges in their audio. One of the most fundamental challenges is
that of panning and attenuation, which are shared with many other games
that have a third-person perspective on the action. The problem is that the
distance to the camera is not a meaningful measurement at all in these
games—rather, it is the distance to the player that matters.
The problem of using the distance to the player for attenuation has
already been solved, and the solution is taken as a given in this chapter.
The thing that we’re going to do is take a step back and ask what we mean
by “distance.”
TABLE 10.1 A Selection of Max Distances and Whether They Succeed in Matching the
Sound Designer’s Intention

Max Distance   Audible Onscreen                              Inaudible Offscreen
10 meters      True at the bottom of the screen, but only    True everywhere except at the
               covers half the screen or less on the sides   bottom of the screen
               and the top
15 meters      True for roughly the bottom half of the       True for roughly the top half
               screen                                        of the screen
20 meters      True everywhere except the upper corners      Only true in the upper corners
25 meters      True everywhere                               False everywhere
useless for mixing and tuning. This is particularly true in ARPGs because
in general, the things on the screen are the ones that matter.
in UGameplayStatics::ProjectWorldToScreen(). In Unity, it is
Camera.WorldToScreenPoint(). As handy as these functions are, we can-
not use them with their default inputs (that is, the player’s active camera)
and in their default form—at least not directly. We’ll take a look at why
this is true in the next section.
std::optional<Vector2> WorldSpaceToClipSpace(
    const Vector3& Point,
    const Matrix4& View, const Matrix4& Projection)
{
    // (Body reconstructed from the WorldSpaceToAudioSpace() function below.)
    auto ClipSpacePosition =
        Projection * View * Vector4(Point, 1.0f);
    if (ClipSpacePosition.W == 0.0f)
        return std::nullopt;
    ClipSpacePosition /= ClipSpacePosition.W;
    return Vector2{ ClipSpacePosition.X, ClipSpacePosition.Y };
}
Figure 10.4 shows how this code has affected the coordinate space.
FIGURE 10.4 Screen-space coordinate system centered at the center of the screen.
FIGURE 10.5 FMOD Studio Spatializer with minimum distance of 0.6 and
maximum distance of 1.0.
std::optional<Vector2> WorldSpaceToAudioSpace(
    const Vector3& Point,
    const Matrix4& View, const Matrix4& Projection)
{
    auto ClipSpacePosition =
        Projection * View * Vector4(Point, 1.0f);
    if (ClipSpacePosition.W == 0.0f)
        return std::nullopt;
    ClipSpacePosition /= ClipSpacePosition.W;
    return AudioSpaceScale * Vector2{
        ClipSpacePosition.X, ClipSpacePosition.Y };
}
FIGURE 10.6 Screen-space coordinate system centered at the center of the screen
and scaled to 20 units.
FIGURE 10.7 Player character in a test level with the camera zoomed in.
that the fully zoomed-out view from Figure 10.2 is what the player should
be hearing, no matter how far in they have zoomed in their camera.
The precise details of how to accomplish this are very game-specific.
They depend on how the camera is placed in the world, whether it is a
right- or left-handed coordinate system, which axis is “up,” and various
other details. Let’s start with some pseudocode to show the shape of
the code:
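One plausible shape, with every name below being a placeholder rather than a real API, is the following:

// Sketch: build the view-projection matrix from a fixed, fully zoomed-out
// "audio camera" rather than from the player's current camera.
Matrix4 GetAudioViewProjectionMatrix()
{
    Camera AudioCamera = GetFullyZoomedOutGameplayCamera();
    Matrix4 View = MakeViewMatrix(AudioCamera.Position, AudioCamera.Rotation);
    Matrix4 Projection =
        MakePerspectiveMatrix(AudioCamera.FieldOfView, AudioCamera.AspectRatio);
    return Projection * View;
}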
Real code is rarely so pithy. If you’re using Unreal, then the code will look
something like this (with error checking and some game-specific code
elided for brevity):
FMatrix GetAudioViewProjectionMatrix(
    const TOptional<FVector>& OverridePosition,
    FIntRect* OutViewRect)
{
    // Find the camera component. Details are game-specific, so
    // this is a fakey placeholder.
    UCameraComponent* CameraComponent = GetCameraComponent();

    // Grab the camera view info from the camera. We will presume
    // that the camera’s view info does not
    // change meaningfully as the camera moves around.
    constexpr float UnusedDeltaTime = 0.0f;
    FMinimalViewInfo ViewInfo;
    CameraComponent->GetCameraView(UnusedDeltaTime, ViewInfo);

    FSceneViewProjectionData ProjectionData;
    // UnconstrainedRectangle is a game-specific view rectangle (elided).
    ProjectionData.SetViewRectangle(UnconstrainedRectangle);
    // (Assumed) Use the override position, if provided, so the debug
    // visualization can project from an arbitrary location.
    ProjectionData.ViewOrigin = OverridePosition.IsSet() ?
        OverridePosition.GetValue() : ViewInfo.Location;
    ProjectionData.ViewRotationMatrix =
        FInverseRotationMatrix{ ViewInfo.Rotation } * FMatrix{
            FPlane{0, 0, 1, 0},
            FPlane{1, 0, 0, 0},
            FPlane{0, 1, 0, 0},
            FPlane{0, 0, 0, 1} };
    FMinimalViewInfo::CalculateProjectionMatrixGivenView(
        ViewInfo, AspectRatio_MajorAxisFOV, nullptr, ProjectionData);

    // (Assumed) Exfiltrate the view rectangle for the debug visualization.
    if (OutViewRect != nullptr)
        *OutViewRect = ProjectionData.GetViewRect();

    return ProjectionData.ComputeViewProjectionMatrix();
}
We have sneaked ahead and added a couple of features to this function that
we will need later on: an override position and an output parameter that fills
in the view rectangle. We will be using these for debug visualization later on.
10.7 NEXT STEPS
With all of these details taken into account, our algorithm is now subtly
but importantly modified:
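Roughly, the modification is this: project both the sound's position and the attenuation position into audio space using the fixed audio camera, and feed the distance between those projected points into the usual attenuation and panning logic. As a sketch (Length and the vector types are the same assumed helpers used above):

// Sketch of the modified distance computation. Points behind the camera are
// treated as out of range here; a real game would pick its own policy.
float GetScreenSpaceDistance(const Vector3& SoundPosition,
                             const Vector3& AttenuationPosition,
                             const Matrix4& AudioView,
                             const Matrix4& AudioProjection)
{
    auto ProjectedSound =
        WorldSpaceToAudioSpace(SoundPosition, AudioView, AudioProjection);
    auto ProjectedAttenuation =
        WorldSpaceToAudioSpace(AttenuationPosition, AudioView, AudioProjection);
    if (!ProjectedSound || !ProjectedAttenuation)
        return FLT_MAX;
    return Length(*ProjectedSound - *ProjectedAttenuation);
}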
With this algorithm hooked up, sound designers can now start to assign
minimum and maximum distances to their sounds that are in screen
space. If there is already a pre-existing set of sounds, then they will have
to go through all of their existing assets and rebalance their ranges.
But even this algorithm is a little bit off. By taking the 2D distance
between the points, we are describing a circle in screen space around
which our sound is audible, rather than actually describing whether or not
the sound source is on the screen. We will need one more tweak in order
to fully describe our screen-space distance attenuation.
10.8 RECTANGULAR DISTANCES
Our coordinate system has a value of 20 units at the edges. If the listener
position is in the center of the screen, then the positions at the centers
along the edges of the screen will all have a distance of 20 units, and the
corners will all be 20√2 units away. In order for a sound to be audible
while it’s on the screen, the sound designers will have to set maximum
distances of over 28, which is far larger than intended.
What we actually want is a setup such that every point along the edge of
the screen is 20 units away from the attenuation position, no matter where
it is. Note that this is subtly different from having it be 20 units away from
the center of the screen—we still want to take the attenuation position
into account. Figure 10.8 shows how the distances need to work: we break
the screen into quadrants, and each quadrant’s axes are scaled to 20 units
away from the attenuation position.
In order to create this projection, we project our point and the attenu-
ation position into screen space, scale each axis of the projected point by
the size of the quadrant, and then return the maximum of the x and y
coordinates. We can express this in code thus:
FIGURE 10.8 The screen broken into quadrants around the attenuation position, with
each quadrant’s axes scaled to 20 units from the attenuation position to the screen edge.
float GetDistanceSquared(
    const Vector2& ProjectedPosition,
    const Vector2& ProjectedAttenuationPosition)
{
    auto XDistance =
        ProjectedPosition.X - ProjectedAttenuationPosition.X;
    auto YDistance =
        ProjectedPosition.Y - ProjectedAttenuationPosition.Y;
    // (Reconstructed ending; the per-quadrant axis scaling described in the
    //  text is assumed to have been applied to the projected points already.)
    auto MaxAxisDistance = std::max(std::abs(XDistance), std::abs(YDistance));
    return MaxAxisDistance * MaxAxisDistance;
}
But whether we use circular or rectangular distances, we are once again
faced with a dilemma, because while the sound designers have a mean-
ingful value that they can understand for any given sound, they have no
way to visualize it in the world.
10.9.1.1 Circular Shape
If the sound is at the center of the screen, then the shape that it will draw
on the screen is a circle centered about the origin, stretched out to an oval
at the aspect ratio of the rendered viewport. Figure 10.9 shows how we
build our shape in screen space. The nice thing is that, because our coor-
dinate system is resolution-independent, we can operate on the circle from
Figure 10.9b, and it will come out looking like an oval. We want to take
this oval and project it into the world such that the shape will still look like
an oval when projected back into screen space.
10.9.1.2 Rectangular Shape
If the sound is at the center of the screen, then the shape that it will draw
on the screen is a square centered about the origin, stretched out to a
rectangle at the aspect ratio of the rendered viewport. Figure 10.10 shows
how we build our shape in screen space. As with the circular shape, we
FIGURE 10.9 Progression of circular shape. (a) Starting circle. (b) Circle scaled
to the desired radius (10 units = half the screen for this example). (c) Circle
stretched out to screen space.
FIGURE 10.10 Progression of rectangular shape. (a) Square scaled to the desired distance.
(b) Square stretched out to screen space.
can operate on the rectangle from Figure 10.10a, and our shape will be
rendered correctly.
void DrawDebug()
{
    // [Detail 1]
    std::vector<Vector3> Points;
    Plane PlayerPlane{ AttenuationPosition, UpVector };
    // [Detail 2]
    // [Detail 3]
    // Draw one last line segment connecting the end of the circle
    // to the beginning
    DrawLine(Points.back(), Points.front(), Color);
}
There are three extra details that we have not yet covered that are marked
in comments in the pseudocode. Let’s fill those pieces in.
First, all of this so far has been ignoring the sound’s position and
just using the origin. In order to determine where to center our circle,
we must take our sound’s source position and project it into screen
space—but at the elevation of the attenuation position. If we do not adjust
the elevation, then the circle that we are drawing will not look correct
if the player is not at the same elevation as the sound. We will use our
GetAudioViewProjectionMatrix() function from Section 10.6.3, with
the overridden position. This allows us to fill in our [Detail 1] with the
following:
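Something along these lines works (a sketch using the names from the surrounding code, not a prescribed snippet):

// [Detail 1] sketch: project using the audio camera placed at the sound's
// position on the ground, and remember the view rectangle for later.
FIntRect ViewRect;
const FMatrix AudioViewProjection =
    GetAudioViewProjectionMatrix(SoundLocationOnGround, &ViewRect);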
Let us skip the second detail for a moment and jump to the third detail.
If you implement this debug drawing function unmodified, your debug
draw circles will likely be invisible because they will be at the same Z posi-
tion as the ground. They will either be “Z fighting” and end up flickering,
or they will be completely invisible. In order to avoid this, we push the
debug draw circle up slightly in [Detail 3]:
Point.Z += VisibilityAdjustment;
Finally, we can get to the second detail. Now that we can actually see our
shape, it is close to but doesn’t actually match up with where the sound
is audible. The reason for the discrepancy is that this point is where the
sound would have to be if the attenuation position were at the sound’s
location, but what we want is where the attenuation position should be
given the sound’s location—the exact opposite. Fortunately, this is easy to
rectify by flipping the X and Y axes around the sound’s position. We can
now finish our algorithm by filling in [Detail 2]:
Point.X = SoundLocationOnGround.X -
    (TargetLocation.X - SoundLocationOnGround.X);
Point.Y = SoundLocationOnGround.Y -
    (TargetLocation.Y - SoundLocationOnGround.Y);
Finally, with all of these details filled in, we can draw our shape. When
using circular distance, the shape looks like an oval when viewed from the
camera’s default angle (Figure 10.11) but appears to be a strange, oblong,
off-center egg shape when viewed from above (Figure 10.12). When using
rectangular distance, the shape looks like a trapezoid when viewed from
the camera’s default angle (Figure 10.13) and is a slightly oblong trapezoid
when viewed from above (Figure 10.14).
FIGURE 10.11 Circular min and max distance debug display from the game camera.
FIGURE 10.12 Circular min and max distance debug display from above.
FIGURE 10.13 Rectangular min and max distance debug display from the game
camera.
FIGURE 10.14 Rectangular min and max distance debug display from above.
Engine. Note that we take advantage of the extra parameters that we added
into GetAudioViewProjectionMatrix() earlier in order to override the
position and exfiltrate the view rectangle.
void DrawDebug(bool bUseCircleShape)
{
// Get the sound’s position on the ground
auto OriginalSoundLocation = GetLocation();
auto SoundLocationOnGround = OriginalSoundLocation;
SoundLocationOnGround.Z = AttenuationPosition.Z;
int PointCount;
if (bUseCircleShape)
{
PointCount = 16;
const float RadiansPerPoint =
2.0f * PI / static_cast<float>(PointCount);
Points.Reserve(PointCount);
for (int i = 0; i < PointCount; i++)
{
auto Angle = static_cast<float>(i) * RadiansPerPoint;
// Start with a circle in -1..+1 space
AddPoint(FVector2D{ FMath::Cos(Angle), FMath::Sin(Angle) });
}
}
else
{
PointCount = 4;
Points.Reserve(PointCount);
AddPoint(FVector2D{ -1.0f, -1.0f });
AddPoint(FVector2D{ -1.0f, 1.0f });
AddPoint(FVector2D{ 1.0f, 1.0f });
AddPoint(FVector2D{ 1.0f, -1.0f });
}
// Close the circle by connecting the last point to the first one.
DrawDebugLine(World, Points.Last(), Points[0], Color);
};
DrawShape(MinDistance, MinDistanceColor);
DrawShape(MaxDistance, MaxDistanceColor);
}
10.10 CONCLUSION
Attenuating sounds by screen-space distance is a powerful and effective
technique. At a fundamental level, the algorithm for calculating panning
and attenuation is unchanged: we calculate the distance and reposition
the sound and the appropriate panning position based on the distance
to the attenuation position. What we have done in this chapter is take a
step back and redefine the concept of distance to be calculated in screen
space. By calculating the distance in screen space, we are able to provide
sound designers with a way to understand when a sound will be audible
that translates particularly well for ARPGs, so long as our debug display
is robust.
REFERENCES
Somberg, Guy. “Listeners for Third-Person Cameras.” Game Audio Programming
Principles and Practices, edited by Guy Somberg. CRC Press, 2017,
pp. 197–208.
Chapter 11
Using Influence Maps for Audio
Jon Mitchell
Blackbird Interactive
CONTENTS
11.1 Introduction 168
11.2 How Are IMs Useful for Audio? 169
11.3 Storing Influence Maps 169
11.3.1 Grid 169
11.3.2 Sparse Grids 171
11.3.3 “Infinite” Influence Maps 171
11.3.4 Combining Different Representations 172
11.4 Building the Maps 172
11.4.1 Adding Points 172
11.4.2 Adding Points across Cell Boundaries 172
11.4.3 Adding Radii 172
11.4.4 GPU Accelerated IMs 173
11.5 Updating 173
11.5.1 Event-Based 173
11.5.2 Continuous 174
11.5.3 Static 174
11.6 Querying 175
11.7 Debugging and Visualizing 175
11.8 Feature Case Study: Grid Activity Report (GAR) 175
11.1 INTRODUCTION
Influence maps (IMs) are a well-established game AI technique, origi-
nating in RTS games. RTS AI needs to make high-level strategic deci-
sions about its goals, as well as low-level tactical decisions about how
individual units should react and navigate the game’s landscape. AI
players and units can’t see the game in the way the human player does, so
all their knowledge comes from inspecting the state of the game directly.
An AI unit may need to make hundreds of checks per second, such as
the following:
IMs can make the code to answer questions like this simpler. The game
map is divided into a grid, and grid cells are populated using an IM func-
tion representing a feature of the game data. For example:
Rather than complex code which needs to make direct queries of game
entity data, we can make simple mathematical checks:
bool isUnitInDanger =
EnemyThreatMap.SumValuesInRadius(position, radius) <
PlayerThreatMap.SumValuesInRadius(position,radius);
Figure 11.1 shows an overview of the stages of creating and using an influ-
ence map.
Just like AI decisions, these queries require that the game gather
context-specific state from the game objects, evaluate conditions on that
state, and choose an appropriate response. Gameplay audio code already
frequently makes use of game AI techniques like finite state machines,
behavior trees, blackboards, and stimulus-response systems, and IMs
work extremely well in tandem with them.
can use a significant amount of memory, especially if you have many IMs
for different game contexts. Also, when the grid is sparse (that is, it has
very few active entries relative to its size), then not only are large sections
of memory sitting idle, but any queries that operate on large areas of the
grid will be needlessly performing operations as they traverse the mostly
empty array.
11.3.2 Sparse Grids
Rather than directly allocating the whole grid, a sparse grid keeps track
of which cells are active, and all cell accesses are indirect. With an empty
grid, the memory overhead is only the size of the tracking structures, and
summing the values of a mostly empty map will be fast. The downsides
come as the density of the grid increases. The cost of the accesses can add
up, and by the time the grid is full, the memory and CPU consumption are
much worse than a plain 2D array.
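As an illustrative sketch (not any particular engine's implementation), a sparse grid can be as simple as a hash map keyed on cell coordinates:

// Sketch of a sparse influence grid: only touched cells consume memory, and
// iteration for area sums only visits active cells.
struct CellCoord
{
    int X = 0;
    int Y = 0;
    bool operator==(const CellCoord& Other) const
    {
        return X == Other.X && Y == Other.Y;
    }
};

struct CellCoordHash
{
    size_t operator()(const CellCoord& C) const
    {
        return std::hash<int>()(C.X) ^ (std::hash<int>()(C.Y) * 2654435761u);
    }
};

class SparseInfluenceGrid
{
public:
    void Add(CellCoord Cell, float Value) { Cells[Cell] += Value; }
    float Get(CellCoord Cell) const
    {
        auto It = Cells.find(Cell);
        return It != Cells.end() ? It->second : 0.0f;
    }

private:
    std::unordered_map<CellCoord, float, CellCoordHash> Cells;
};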
11.4.3 Adding Radii
Adding a point of influence with a center, a radius, and a falloff function
is essentially the same problem as rasterizing a filled, shaded 2D circle.
Especially for larger radii or high-resolution maps, computing these val-
ues every time can be expensive. In many cases, the radius and falloff
functions are known ahead of time and remain constant for each game
entity, so these can be computed once and then cached. Continuing the
2D graphics analogy even further, using these pre-computed values is
basically the same as software sprite rendering.
11.5 UPDATING
11.5.1 Event-Based
Game events like impact damage, deaths, and collisions are transient,
and unless we track them somehow, this knowledge can’t be used by our
audio systems. We use IMs to track deaths, damage, and targeting changes,
updating the map every time an event happens. Maps have a fade-out value
which is subtracted from each active cell every N seconds, essentially giv-
ing the game a windowed history of where events have recently occurred.
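A minimal sketch of that windowed decay, reusing the CellCoord types from the sparse-grid sketch above (the interval and fade-out value are stand-in tuning numbers):

// Sketch: every DecayIntervalSeconds, subtract a fade-out value from each
// active cell so the map holds only a sliding window of recent events.
void DecayEventMap(std::unordered_map<CellCoord, float, CellCoordHash>& Cells,
                   float& TimeSinceDecay, float DeltaTime)
{
    const float DecayIntervalSeconds = 1.0f;  // the "N seconds" from the text
    const float FadeOutValue = 0.25f;         // illustrative tuning value

    TimeSinceDecay += DeltaTime;
    if (TimeSinceDecay < DecayIntervalSeconds)
        return;
    TimeSinceDecay -= DecayIntervalSeconds;

    for (auto It = Cells.begin(); It != Cells.end(); )
    {
        It->second -= FadeOutValue;
        if (It->second <= 0.0f)
            It = Cells.erase(It);  // fully faded cells become inactive
        else
            ++It;
    }
}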
11.5.2 Continuous
If the IM function relies on the location of an object or set of objects, and
the client code needs this function to be spatially and temporally accu-
rate, you may have no choice but to build your IM every frame. This can
be a worst-case scenario for large IMs, involving gathering a large set of
game objects to retrieve the location from, clearing out a large grid, and
propagating the falloff values to many cells. This may cause performance
problems, but it’s easy to get the system up and running.
If performance becomes an issue, you can replace the underlying data
structures and implementation easily when optimizing. Since building
maps based on continuously updating values has the distinct stages of
gathering game state and writing the cells, this makes them good candi-
dates for implementation via job systems or other parallel approaches. If
the map is also double-buffered, then the current version of the map can
be used for querying while the next version is built over as many frames as
needed to maintain performance.
11.5.3 Static
IMs aren’t only useful for monitoring the history of game events or watch-
ing dynamic game state—there are lots of potential uses for a map that
is built once when the game starts, and is only queried (never updated)
subsequently:
Static maps of this sort can also be built offline from a game’s splat maps
or other level data or hand authored.
11.6 QUERYING
Our maps implement a simple querying interface, allowing us to
either retrieve a single cell value or return the aggregate value of a
given area. This interface can be backed by any of the described storage
structures.
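That interface might look something like the following sketch, mirroring the SumValuesInRadius() call used earlier (the types and exact method names are placeholders):

// Sketch of a storage-agnostic query interface for influence maps.
class IInfluenceMapQuery
{
public:
    virtual ~IInfluenceMapQuery() = default;

    // Value of the single cell containing the given world position.
    virtual float GetValueAt(const Vector2& WorldPosition) const = 0;

    // Aggregate (here, sum) of all cell values within a radius.
    virtual float SumValuesInRadius(const Vector2& WorldPosition,
                                    float Radius) const = 0;
};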
FIGURE 11.3 Screenshot of IM visualizer showing enemy threat values and the
clusters extracted from those values.
11.8.2 Feature Implementation
To help implement these contexts, we created maps for the following:
11.8.2.1 Region Thresholds
The values in our maps are constantly changing and are monitored to
determine when it’s most appropriate to play the speech events. The first
approach we tried was tracking the sum of the values in the GAR sectors.
This worked well for triggering our combat_casualties events. We speci-
fied different thresholds for low, medium, and heavy casualties and queued
the appropriate speech requests whenever the thresholds were crossed. It
didn’t work nearly as well for determining when to play combat_start
events, largely because the unit locations were continuously moving. We’d
hear dialog like the following:
Enemy engaged! Small force in northeast sector! Enemy Engaged!
Small force in north sector!
What was really happening, of course, is that a single, medium-sized
group of units was attacking from two sectors simultaneously. What we
needed was a way to recognize that group as a single cluster rather than as
independent per-sector totals.
11.8.2.2 Cluster Analysis
Finding clusters, at least in a grid of values, isn’t too hard—for all active
cells in a grid, a cluster is the set of cells that can reach each other via
a path through the other active cells. This is essentially the problem of
finding connected components in a graph. For each active cell in our map, we can recursively check the
active neighbor cells to see if they’re already used in a cluster, adding if
not. This essentially flood-fills the grid of active cells, stopping when we
hit inactive cells.
// (The enclosing method and loop headers are reconstructed here; helper
//  names such as GetNeighbours are assumptions.)
List<Cluster> FindClusters(IMap map)
{
    var clusters = new List<Cluster>();
    mAssignedCells.Clear();
    foreach (Cell cell in map.ActiveCells)
    {
        if (!IsActiveAndFree(map, cell))
        {
            continue;
        }
        var newCluster = new Cluster();
        newCluster.Add(cell);
        mAssignedCells.Add(cell);
        SearchNeighbours(cell, map, newCluster);
        clusters.Add(newCluster);
    }
    return clusters;
}

bool IsActiveAndFree(IMap map, Cell cell)
{
    if (mAssignedCells.Contains(cell))
    {
        return false;
    }
    if (!map.ActiveCells.Contains(cell))
    {
        return false;
    }
    return true;
}

void SearchNeighbours(Cell cell, IMap map, Cluster cluster)
{
    foreach (Cell curCell in map.GetNeighbours(cell))
    {
        if (!IsActiveAndFree(map, curCell))
        {
            continue;
        }
        cluster.Add(curCell);
        mAssignedCells.Add(curCell);
        SearchNeighbours(curCell, map, cluster);
    }
}
11.9 CONCLUSION
IMs are a feature from the game AI world, and it’s not immediately
obvious that they will be useful in a game audio context. However, by
connecting an AI feature to audio, we are able to implement features
that would not otherwise have been possible. Hopefully this gives you
some ways to think about implementing features you may not have con-
sidered—it’s always worth looking at techniques in use in other areas
of programming and thinking about how they can be applied to audio
features.
REFERENCES
Lewis, Mike. “Escaping the Grid: Infinite-Resolution Influence Mapping.” Game
AI Pro 2, edited by Steve Rabin. CRC Press, 2015, pp. 327–342. http://www.
gameaipro.com/GameAIPro2/GameAIPro2_Chapter29_Escaping_the_
Grid_Infinite-Resolution_Influence_Mapping.pdf.
Mitchell, Jon. “Techniques for Improving Data Drivability of Gameplay Audio
Code.” Game Audio Programming Principles and Practices Volume 2, edited
by Guy Somberg. CRC Press, 2019, pp. 227–236.
Chapter 12
An Importance-Based
Mixing System
Guy Somberg
Echtra Games
CONTENTS
12.1 Managing the Chaos 182
12.2 The Importance of Context 182
12.3 Importance System Algorithm 183
12.3.1 Assign Each Object an Importance Score 184
12.3.2 Sort All Objects by Score 185
12.3.3 Place Sorted Objects into Importance Buckets 185
12.3.4 Apply Effects to Sounds by Bucket 187
12.3.5 Importance Changes over Time 188
12.4 Example Implementation 188
12.4.1 Calculating Importance Scores 188
12.4.2 Data Setup 189
12.4.3 Importance Bucket Assignment 189
12.4.4 Querying the Importance Bucket 192
12.4.5 Importance State 193
12.4.6 Applying Filters Based on Importance 197
12.4.7 Assigning Importance Buckets 199
12.4.8 Debug Display 200
12.5 Conclusion 202
References 203
The answer to all of these questions very much depends upon the con-
text of what’s going on in the game. A goblin brute attacking another
player is probably less important than the goblin warrior that is attacking
my player because of who it is targeting. The other player’s skills are prob-
ably less important than my player’s skills, but if it’s a buff that is targeting
my player, perhaps it is equally important.
Every game will have a different set of rules for what properties of the
context will make a sound more or less important than another. The first
step in creating an Importance system, therefore, is to determine what
makes a particular game entity more or less important in your game’s
context.
I = \sum_{c} s_c w_c
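In code, computing that weighted sum might look like the following sketch, where each contribution's score and weight come from whatever context your game tracks (the names are placeholders):

// Sketch: the importance score is the weighted sum of each contribution's
// score, I = sum over c of s_c * w_c.
struct ImportanceContribution
{
    float Score = 0.0f;   // s_c: how strongly this context applies
    float Weight = 0.0f;  // w_c: designer-tuned weight for this contribution
};

float CalculateImportanceScore(
    const std::vector<ImportanceContribution>& Contributions)
{
    float Importance = 0.0f;
    for (const ImportanceContribution& Contribution : Contributions)
        Importance += Contribution.Score * Contribution.Weight;
    return Importance;
}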
FIGURE 12.2 Gameplay from Torchlight 3 with importance scores assigned for
relevant entities.
TABLE 12.1 Sorted List of Entities with Their Matching Importance Scores
Entity Importance
Player 3.0
Goblin Brute 2.27
Goblin Brute 2.13
Goblin Shaman 1.99
Goblin Warrior 1.72
Player Pet Eagle 1.70
Player Train Caboose Car 1.63
Player Train Middle Car 1.58
Goblin Warrior 1.55
Goblin Warrior 1.54
Player Train Lead Car 1.53
Other Player 0.88
Torch 0.84
Hammer 0.71
Minecart 0.65
Other Player Pet Cat 0.64
12.4 EXAMPLE IMPLEMENTATION
This example implementation will use FMOD Studio as a back end. We
will hook into the playback of Events and attach some DSPs and then
manipulate the DSPs as the Event changes importance over time. Other
audio middleware may have different mechanisms describing this data,
but the principles are the same.
In this example code, we will be presuming a straw-man entity compo-
nent system containing Actors and Components, with reasonable acces-
sors, iterators, and weak pointers to them. This system does not actually
exist in this form and would need to be adapted to whatever game systems
you have. Furthermore, we will be using C++ standard library compo-
nents and algorithms.
12.4.2 Data Setup
First, we need somewhere in static configuration the number of impor-
tance buckets, their max counts, and their respective audio parameters.
This might look something like this:
struct AudioImportanceBucketParameters
{
    // How many sounds are allowed in this bucket
    int MaxCount;
    // When displaying on-screen debug information, what color to use
    Color DebugDisplayColor;
    // (Reconstructed) The audio parameters applied to sounds in this bucket;
    // these are the fields referenced by the fader code later on.
    float HighShelfFilterGainDecibels;
    float HighShelfFilterFrequencyHz;
};
class AudioEngine
{
public:
    //...
private:
    // other stuff...
    void CalculateImportance();
    int GetImportanceBucket(const Actor& Actor);
};
// (Parts of this function are elided; the comments mark the missing steps.)
void AudioEngine::CalculateImportance()
{
    auto& Settings = GameSettings::Get();
    if (Settings.ImportanceBuckets.empty())
        return;

    // Gather the importance component from every relevant actor...
    //     ImportanceComponents.push_back(&ImportanceComponent);

    // ...then reset the buckets before dealing the sorted components into
    // them, respecting each bucket's MaxCount.
    ImportanceBuckets.clear();
    ImportanceBuckets.resize(Settings.ImportanceBuckets.Num());
}
One important item to note (which is not reflected in the above code)
is that sometimes actors can have relationships that would affect their
importance. For example, if one actor is attached to another actor, then
the attached actor should probably be getting its importance score from
the actor that it is attached to.
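A sketch of that special case (GetAttachParent() and GetImportanceContributions() are placeholders for however your entity system exposes this data):

// Sketch: an attached actor inherits its importance score from the actor it
// is attached to, so (for example) a weapon scores like its wielder.
float GetEffectiveImportanceScore(const Actor& InActor)
{
    if (const Actor* AttachParent = InActor.GetAttachParent())
        return GetEffectiveImportanceScore(*AttachParent);
    return CalculateImportanceScore(InActor.GetImportanceContributions());
}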
12.4.5 Importance State
As we are tracking the lifetime of our playing sound in a state machine [2],
we can store the state of which importance bucket this particular sound
is in, as well as any parameters useful for fading. Note that in this code,
we are using an Initialize()/Shutdown() pattern so that we can include
the object directly into the memory of our playing sound, but that could
be replaced with a constructor/destructor pair if we’re willing to place the
tracking information into the heap or use some other mechanism for late
initialization such as std::optional.
struct ImportanceDSPFader
{
    // The actual DSPs that we will be attaching to the DSP graph
    FMOD::DSP* MultiBandEQ = nullptr;
    FMOD::DSP* Fader = nullptr;
    // (Reconstructed members and methods, based on how they are used below)
    int CurrentBucket = -1;
    float RemainingFadeTimeSeconds = 0.0f;
    AudioImportanceBucketParameters From, To, Current;
    bool Initialize(FMOD::System* FMODSystem);
    void Shutdown();
    void Tick(float DeltaTime);
private:
    void SetDSPParameters();
    void UpdateImportanceBucket(int NewBucket);
};
Now we just need one of those for each playing Event. The implementa-
tion of the various functions of this structure is fairly straightforward, so
let’s start with initialization and shutdown, where we will be creating and
destroying the DSPs:
bool ImportanceDSPFader::Initialize(FMOD::System* FMODSystem)
{
    // (Reconstructed: the MultiBandEQ is created the same way as the Fader.)
    FMODSystem->createDSPByType(FMOD_DSP_TYPE_MULTIBAND_EQ, &MultiBandEQ);
    if (MultiBandEQ == nullptr)
        return false;
    FMODSystem->createDSPByType(FMOD_DSP_TYPE_FADER, &Fader);
    if (Fader == nullptr)
        return false;
    return true;
}
void ImportanceDSPFader::Shutdown()
{
    if (MultiBandEQ != nullptr)
    {
        MultiBandEQ->release();
        MultiBandEQ = nullptr;
    }
    if (Fader != nullptr)
    {
        Fader->release();
        Fader = nullptr;
    }
    RemainingFadeTimeSeconds = 0.0f;
}
During the tick or update function, we will need to fade across buckets.
In this example, we will be fading parameters by hand, but middleware or
game engine libraries may provide either an automated way to perform
these fades or a different metaphor for implementing the effects.
// Figure out how far through our Lerp these parameters are
RemainingFadeTimeSeconds -= DeltaTime;
RemainingFadeTimeSeconds = max(RemainingFadeTimeSeconds, 0.0f);
// (Reconstructed) LerpAmount is 0 at the start of the fade, 1 at the end.
const float LerpAmount = 1.0f - RemainingFadeTimeSeconds /
    GameSettings::Get().GetImportanceFadeTimeSeconds();
Current.HighShelfFilterGainDecibels =
    Lerp(From.HighShelfFilterGainDecibels,
         To.HighShelfFilterGainDecibels,
         LerpAmount);
Current.HighShelfFilterFrequencyHz =
    Lerp(From.HighShelfFilterFrequencyHz,
         To.HighShelfFilterFrequencyHz,
         LerpAmount);
void ImportanceDSPFader::SetDSPParameters()
{
    if (Fader == nullptr || MultiBandEQ == nullptr)
        return;
    // ... (apply Current's gain and filter values to the two DSPs; elided) ...
}

// (The signature and the if/else condition are reconstructed around the
//  original fragments; parts of both branches are elided.)
void ImportanceDSPFader::UpdateImportanceBucket(int NewBucket)
{
    auto& Settings = GameSettings::Get();
    if (NewBucket >= 0)
    {
        // If we have a valid bucket, then start a fade to
        // the destination bucket
        To = Settings.ImportanceBuckets[NewBucket];
        // ...
    }
    else
    {
        // We have an invalid bucket, which likely means that this
        // is a sound that is attached to an actor that is not
        // participating in the importance system, or that this
        // sound was played before the importance score and bucket
        // were calculated for the attached actor.
        From = Current;
        RemainingFadeTimeSeconds =
            Settings.GetImportanceFadeTimeSeconds();
    }
}
Finally, the only part of the public interface that isn’t related to lifetime or
update is a function to assign the current importance bucket:
for as long as the event is playing, we need to hook into the Event’s call-
backs that trigger when the event has actually started and stopped playing:
switch (type)
{
case FMOD_STUDIO_EVENT_CALLBACK_STARTED:
{
    // Helper lambda for initializing the fader context
    // and attaching the effects to the DSP chain. We use
    // a helper lambda in order to keep the tabs under control.
    auto CreateDSPEffects = [&]()
    {
        // Get the master channel group for the event. We will be
        // attaching our DSPs to its head
        FMOD::ChannelGroup* EventInstanceChannelGroup = nullptr;
        EventInstance->getChannelGroup(&EventInstanceChannelGroup);
        if (EventInstanceChannelGroup == nullptr)
            return;
        // ... (attach the MultiBandEQ and Fader DSPs to the head of the
        //      channel group; elided) ...
    };

    // Call our helper lambda. You can avoid giving this lambda
    // a name and calling it by using an immediately invoked
    // expression: [](){}(). For clarity, this code prefers to
    // give it a name and call it.
    CreateDSPEffects();
    return FMOD_OK;
}
ImportanceFader.CurrentBucket =
AudioEngine.GetImportanceBucket(Instigator);
Next, during the Playing, Virtualizing, and Stopping states (that is, all of
the states where the sound is playing), we need to update the sound’s bucket
but only for looped sounds. One-shot sounds will be finishing soon anyway,
so there is no need to update their importance bucket as they are playing.
ImportanceFader.Tick(DeltaTime);
}
12.4.8 Debug Display
With this system in place, it is important for the sound designers to be
able to visualize which sounds are important and why. There are three
variations that are useful for this debug display:
FIGURE 12.4 Per-actor overlay debug display showing the actor name and its
importance score.
FIGURE 12.5 Per-actor overlay debug display showing the actor name, its impor-
tance score, and the individual score contributions.
• A detailed tag on each actor that shows the component scores that
add up to the final score, as shown in Figure 12.5. This debug dis-
play will be useful in helping the sound designers figure out why a
particular actor is more or less important than they were expecting.
12.5 CONCLUSION
With a little bit of data management, DSP wrangling, and a very simple
algorithm, an importance system can revolutionize how your mix works.
Importance is so fundamental, in fact, that it is now a part of my own
REFERENCES
Chapter 13
Voxel-Based Emitters
Approximating the Position
of Ambient Sounds
Nic Taylor
CONTENTS
13.1 Introduction 205
13.2 Preliminary 206
13.3 Voxel Emitter Implementation 209
13.4 The Iterator 211
13.5 Attenuation Range and Voxel Size 213
13.6 Close to Zero 214
13.7 Near Field and Spread 216
13.8 Debugging 219
13.9 Weight Functions 220
13.10 Support Beyond Stereo: Z Axis 222
13.11 Support Beyond Stereo: 5.1 and More 225
13.12 Final Notes 232
References 233
13.1 INTRODUCTION
In Game Audio Programming Principles and Practices Volume 2, Chapter 12,
“Approximate Position of Ambient Sounds of Multiple Sources” [1], I
discussed how ambient sounds are designed to represent an area or vol-
ume as opposed to a point emitter. For example, a loop of a riverbed does
not represent a single point in space but a volume or area representing the
cumulative sounds of a section of river. From there, this chapter covered
different approaches to approximate the position for an ambient sound
using a point emitter that would move in real-time relative to the listener’s
position. In particular, the focus was on methods that computed the emit-
ter’s properties of direction, magnitude, and spread separately.
The method in Section 12.15—verbosely titled “Average Direction and
Spread Using a Uniform Grid or Set of Points”—has two useful applications:
approximating the position of ambient beds as the listener approaches and
visual effects with area such as a beam weapon or a wall of fire. But this
chapter left several implementation details for the reader to work out on
their own. This chapter will revisit the implementation in more detail, add
advice on debugging, explain edge cases, and discuss some extensions to
the basic algorithm.
The encapsulation of the position, spread, and algorithm will be more
succinctly called a “voxel emitter” or “grid emitter” for 2D.
13.2 PRELIMINARY
Before getting to the revised implementation, here is a quick review of
the theory used by the algorithm. The voxel emitter takes the listener
(or receiver) position r̂ and a collection of points, the voxel centers, and
returns a direction and spread value to approximate the position relative
to receiver.1
The direction is the sum of each voxel relative to the receiver scaled by
a weight function. Voxels farther from the receiver have less influence on
the final emitter position and behave as if each voxel were its own emitter
where the distance attenuates the volume or gain. The sum of all voxel
directions is called the total attenuated direction or σ̂ .
Let total attenuated direction be defined as

\hat{\sigma} = \sum_{i \in V} \frac{W(\hat{v}_i)}{\|\hat{v}_i\|}\, \hat{v}_i \qquad (13.1)

where V is the set of voxel center positions relative to the receiver position
r̂, v̂i is each voxel center position in V, ‖v̂i‖ is the magnitude or Euclidean
distance from the receiver position to v̂i, and W is the weight function.
Figure 13.1 shows the voxel emitter components for an example V.
The weight function should be zero outside of the attenuation range
of the sound. Inside the attenuation range, the weight function can be a
simple linear falloff such as
1 The hat symbol ˆ⋅ is used to differentiate vector variables from scalar variables.
FIGURE 13.1 Voxel emitter V represented by (a) the attenuation range α from
the receiver r̂, (b) the closest voxel used to compute the magnitude m, (c) the
position p̂ along total attenuated direction σ̂ , and (d) an arc showing the spread
amount µ which is about 0.25 in this example.
W(\hat{v}_i) = \begin{cases} 1 - \dfrac{\|\hat{v}_i\|}{\alpha}, & \text{if } \|\hat{v}_i\| < \alpha \\ 0, & \text{if } \|\hat{v}_i\| \ge \alpha \end{cases} \qquad (13.2)
m = \min_{i \in V} \|\hat{v}_i\| \qquad (13.3)
\hat{p} = \frac{\hat{\sigma}}{\|\hat{\sigma}\|}\, m + \hat{r} \qquad (13.4)
Notice the closest distance is not the closest voxel boundary but the closest
voxel center. This was a somewhat arbitrary simplification. Section 13.7
(“Near Field and Spread”) below will cover handling positions close to the
voxel boundary.
Spread is the diffusion of the sound across speakers. Zero spread means
no diffusion and occurs when all of the voxels are in one direction rela-
tive to the receiver. Max spread, mapped to the value 1.0, represents full
diffusion and occurs when the receiver position is either surrounded
by or inside a voxel. Max spread also occurs if voxels are symmetrically
spaced around the receiver which will be discussed more in Section 13.6.
To compute spread, let the total weight be
w = \sum_{i \in V} W(\hat{v}_i) \qquad (13.5)
and spread
\mu = 1 - \mu_\theta = 1 - \frac{\|\hat{\sigma}\|}{w} \qquad (13.6)
Spread, µθ , is based on the cosine of the angle between each position v̂i and
the total attenuated direction σ̂ scaled by the weight function and nor-
malized by the total weight w. A larger angle formed from a voxel center
and σ̂ results in larger spread.2
The purpose of spread is to hide discontinuous jumps. As the receiver
moves around the voxel emitter, spread and magnitude should update
continuously relative to the movement. The total direction σ̂ can flip/
invert directions or change rapidly when σ̂ approaches zero. These dis-
continuities should not be audible if spread is at the maximum value.
2 Refer to Taylor [1] to see how the sum of weighted cosines reduces to ‖σ̂‖.
• voxel extent—The half dimension of the voxel (or grid cell) size.
The output is the position p̂ from Equation 13.4 and spread µ as well as
additional state information such as true/false if the sound was audible
and optional debugging information. The closest voxel center which
is used to compute the distance m from Equation 13.3 is tracked for
debugging.
struct AttenuatedPosition {
bool audible;
Vector position;
float spread;
int voxels_processed; // for debugging
Vector closest_voxel_center; // for debugging
};
template<typename VoxelContainer>
AttenuatedPosition VoxelsToAttenuatedPosition(
    const VoxelContainer& voxels,
    const Sphere& receiver, const float voxel_extent) {
  // (Reconstructed declarations: the attenuation range is assumed to be the
  //  receiver sphere's radius.)
  const float attenuation_range = receiver.radius;
  Vector total_direction = { 0.f, 0.f, 0.f };
  float total_weight = 0.f;
  float closest_distance = attenuation_range;
  Vector closest_voxel_direction = { 0.f, 0.f, 0.f };
  Vector closest_voxel_center = { 0.f, 0.f, 0.f };
  int voxels_processed = 0;
  for (const Vector voxel_center : voxels) {
    const Vector direction = voxel_center - receiver.center;
    // Early out if receiver is inside a voxel.
    if (PointInsideVoxel(direction, voxel_extent)) {
      total_direction = { 0.f, 0.f, 0.f };
      closest_distance = 0.f;
      closest_voxel_center = voxel_center;
      ++voxels_processed;
      break;
    }
    const float distance = Length(direction);
    if (distance < attenuation_range) {
      if (distance < closest_distance) {
        closest_distance = distance;
        closest_voxel_direction = direction;
        closest_voxel_center = voxel_center;
      }
      const float weight = attenuation_range - distance;
      total_direction += (weight / distance) * direction;
      total_weight += weight;
      ++voxels_processed;
    }
  }
  // (Reconstructed: assemble the result from Equations 13.4 and 13.6.)
  AttenuatedPosition result;
  result.audible = voxels_processed > 0;
  result.voxels_processed = voxels_processed;
  result.closest_voxel_center = closest_voxel_center;
  result.position = receiver.center;
  result.spread = 1.f;
  const float direction_length = Length(total_direction);
  if (direction_length > FLT_EPSILON && total_weight > 0.f) {
    result.position +=
        (closest_distance / direction_length) * total_direction;
    result.spread = 1.f - direction_length / total_weight;
  }
  return result;
}
3 I am using a coordinate system where x, y, and z correspond to left, forward, and up.
FIGURE 13.2 Game world broken into chunks c1, …, cn with overlapping attenuation
range for a receiver r̂.
2D and the game is 3D, the z axis position might need to be computed in
real-time.4
Pseudocode for the iterator might be something like:
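One plausible shape (all names here are placeholders):

// Sketch: walk only the chunks overlapping the receiver's attenuation range
// and visit each active voxel center that is actually in range.
template<typename Visitor>
void ForEachVoxelInRange(const Vector& receiver_position,
                         float attenuation_range, Visitor visit)
{
  for (const Chunk& chunk :
       ChunksOverlappingSphere(receiver_position, attenuation_range))
  {
    for (const Vector& voxel_center : chunk.ActiveVoxelCenters())
    {
      if (Length(voxel_center - receiver_position) < attenuation_range)
        visit(voxel_center);
    }
  }
}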
This is just one example of how the iterator might work. Another option
is to use a flood fill algorithm starting from the receiver position. Other
4 Trade-offs between doing the voxel look-up in real-time and baking the data into a custom data
representation stored with the world data requires experimenting. For example, the aforemen-
tioned 2D grid might store per voxel 1 bit for active/inactive state and use a few bits to specify a
discrete value to approximate the z axis offset. If your world is large, the storage size for a single
voxel emitter can become non-trivial in this uncompressed format.
data structures for space partitioning can also improve performance and
size requirements. A full discussion of all of these options depends on
the context of the game and goes beyond the scope of this chapter. But
I can describe the approach I use to make algorithm decisions for voxel
emitters.
When evaluating a voxel emitter in context, I start with the simple
implementation, which is a nested loop like the above pseudocode, com-
puting positions in real-time. Storage in the world or game data is also
handled naively. This allows for rapid iteration on tuning variables such
as voxel size and attenuation with the sound designer.
Once the sound designer and others agree, the voxel emitter sounds
decent; then the storage requirements can be estimated from the tuned
variables. By making a worst-case example in the test world, CPU usage
can be estimated deterministically and used to find the specific code that
is expensive via profiling.5 With an understanding of the memory and
CPU, decisions about the effort and complexity of optimizations can be
made objectively.
5 Changes to the algorithm can change the worst-case scenario setup. For example, switching from
a row by column search at the attenuation boundary to a flood fill from the receiver’s position
would early out on the first voxel if the receiver is inside.
engine assumes the sound should be active in this ring, every other frame
will attempt to turn the sound on.6
One solution for both cases is to attempt to use the authored attenu-
ation range clamped to some maximum range. Then communicate this
with the sound designer or have a notification as part of the data pipeline
if the sound designer commits an attenuation range that is larger.
The voxel size also has a major impact on performance. Smaller voxel
sizes give better granularity, but at a certain size, the difference is not
audible (especially if the sound designer added their own spread). Small
voxel sizes require more memory and CPU both for the running game
and serialized data. Voxels which are too large cause the emitter direc-
tion to not be as precise at closer distances. Larger voxels are suitable if
the listener cannot get close to the source. A balance can be found by fac-
toring in the sound’s attenuation range and how close the receiver can
get. What I found is 1.0–1.5 meters seems to work well (or a voxel extent
between 0.5 and 0.75) for voxel emitters where the receiver can approach
such as an ocean or a river.
6 If the game engine expects the voxel emitter sound to be active and the sound is not active in the
audio engine, it might be worth logging a warning and stopping the voxel emitter calculation.
7 To get the magnitude of total direction to sum to a value less than FLT_EPSILON in game is quite
rare, but I have caught it happening naturally a couple of times.
A similar situation that can occur is that one or two axes cancel and
the third but most minor axis becomes dominant as the direction of p̂.
For example, if Figure 13.3 represents a 3D world looking at the x and y
axes and the two voxels were slightly above or below the receiver’s z axis
position, σ̂ will be greater than zero. The resulting vector will be length m
from the closest voxel but pointing above or below the receiver. In debug-
ging, this looks odd, even like a bug. As long as there are no other posi-
tional or ambisonic effects, this result is fine as spread will still be close to
the max value. Section 13.11 will propose two hypothetical solutions if the
z axis must be constrained in some way.
if (PointInsideVoxel(direction, kVoxelExtent)) {
  total_direction = { 0.0, 0.0, 0.0 };
  closest_distance = 0.f;
  closest_voxel_center = voxel_center;
  break;
}
When the receiver is inside a voxel, the effect should be that the listener is
“surrounded” by the sound. Spread will be at the maximum value of one.8
Because the algorithm aggregates distance using the voxel centers, and
ignoring that the voxels have volume, this causes a discrete jump of spread
at the voxel boundary as seen in Figure 13.4a.
One could use the corners of each voxel instead of the centers, but this
increases the amount of computation per voxel and can still lead to bound-
aries where spread changes rapidly. Another approach is to smooth the
transition with a linear interpolation starting some distance away from
the voxel boundary.
8 Note that closest voxel center is set to the voxel center which will still have some distance. Using
closest voxel center keeps debugging consistent. Ideally the authored sound’s attenuation does
not change within this distance. Otherwise a small vector in the forward direction of the receiver
would also work. Using a forward vector has the advantage that in stereo configurations, the
sound should be guaranteed to be spread to both speakers due to the audio engine’s pan rules.
Voxel-Based Emitters ◾ 217
FIGURE 13.4 (a) The spread of a group of 1 m voxels. (b) The spread of a group
of voxels using a 1.5 m near field.
This near field9 value ranges from zero to one and is used to interpolate
the remaining spread available after removing the computed spread, µ
(Equation 13.6), from the total available spread value:
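Expressed as a sketch, where near_field_amount is assumed to go from 0 at the near-field distance to 1 at the voxel boundary:

// Sketch: blend toward full spread as the receiver approaches the voxel
// boundary, interpolating only over the spread that is still available.
const float spread_with_near_field =
    spread + (1.f - spread) * near_field_amount;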
The discussion so far has assumed that spread is the entire range from
zero to one. It is more likely that the sound designer will want to control
9 I call this the near-field range after the region of a sound field close to the emitter where the
relationship between distance and sound level does not observe the inverse square law.
FIGURE 13.5 Representation of the spread priorities in the range zero to one for
the authored µa, attenuated position µp, and near-field µn spread values.
spread from the authoring tool too. The priority of spreads is shown in
Figure 13.5. In the same way, the near-field interpolation amount is scaled
by the remaining spread after subtracting the attenuated position spread;
this sum is then scaled by the remaining spread after subtracting the
authored spread. Let the attenuated position spread be µp, the near-field
spread be µn, and the authored spread be µa:10
final spread = µa + (1 − µa)(µp + (1 − µp) µn)    (13.7)
10 In my case, the game object data was guaranteed to never move in memory for the lifetime of the
sound, so I wrapped the spread value as an atomic.
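Equation 13.7 is small enough to translate directly into code. This is a minimal sketch with hypothetical parameter names, assuming each spread value has already been clamped to the zero-to-one range.

// Equation 13.7: each lower-priority spread can only fill whatever range the
// higher-priority spreads have left over.
float CombineSpread(float authored_spread,      // µa
                    float position_spread,      // µp
                    float near_field_spread) {  // µn
  return authored_spread +
         (1.0f - authored_spread) *
             (position_spread + (1.0f - position_spread) * near_field_spread);
}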
13.8 DEBUGGING
Various components of the voxel emitter are helpful to visualize, both to
identify bugs and to make the system transparent to the sound designer.
The voxel emitter can be broken into these debugging components: the set
of voxels, the attenuated position used as the emitter for the audio engine,
and the spread.
The set of voxels can be visualized in the game world by rendering a
circle or dot at the center of each voxel.11 It is useful to see both voxels
actively contributing to a playing sound and the inactive voxels. Also use-
ful is knowing which voxel is being used as the closest voxel center. The
voxel state can be visualized by color coding. For example, gray, yellow,
and green for inactive, active, and closest.
When there are multiple active voxel emitters, it is not easy to differ-
entiate which debug circles in game correspond to which voxel emitter.
It is also likely that whoever is debugging is interested in only one voxel
emitter sound at a time. Instead of connecting the in-game voxel debug to
the global in-game audio debug or a single toggle, it is recommended to
create a separate toggle per voxel emitter instance.
The AttenuatedPosition of the voxel emitter is a single emitter posi-
tion to be passed into the audio engine. The corresponding sound event
should work with existing debugging both in-game and in the sound
debug window (or list). The sound debug window, which typically includes
the sound event name, distance to sound, and maybe virtual/active state,
can be customized for the voxel emitter sound. The spread and number of
voxels processed, voxels_processed, are useful for debugging.12
Seeing the numeric value of spread may not be meaningful enough on
its own. It can be difficult to distinguish from headphone listening how
“spread” the sound is and if the spread value is changing rapidly. If your
debug UI supports plots, capturing the history of spread can help catch
hard-to-identify value changes or verify if spread is increasing as expected
near voxel boundaries.
Lastly the debug images rendered for this book chapter, such as
Figure 13.4b or Figure 13.6, provide detailed offline debugging. Given a posi-
tion in the world, in a brute force fashion, x and y coordinates are iterated
11 I found that drawing the voxels in-game using other options such as drawing lines or projecting a
transparent overlay became too visually noisy.
12 Using a UI toolkit like Dear ImGui, the sound debug window can be modified to hide the extra
metrics of the voxel emitter in a collapsed dropdown. This is also where I added buttons to toggle
on/off the in-game voxel debug.
FIGURE 13.6 Sparse voxel emitter using a “distant only” weight function.
(a) Estimated loudness at each point where white regions are louder. (b) Spread at
each point where dark regions are higher spread.
over, computing the attenuated position result and writing the spread or
estimated loudness to a PNG file. Estimated loudness can be computed
from the sound’s distance attenuation at the resulting attenuated position.
Rendering an image is helpful because anyone can file a bug with the world
position, and you can visually inspect the situation first without having to
guess or listen for a possibly difficult-to-reproduce scenario.
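As one possible way to produce such images, the sketch below writes a grayscale PGM instead of a PNG to avoid any image library dependency. WriteSpreadImage(), ComputeSpreadAt(), and the sampling bounds are illustrative stand-ins for whatever the real offline pipeline calls.

#include <cstdio>

// Brute-force debug image: sample a square region of the world around a
// center point and write one grayscale value per sample (e.g. spread scaled
// to 0-255). ComputeSpreadAt() is a hypothetical hook into the voxel emitter.
bool WriteSpreadImage(const char* path, float center_x, float center_y,
                      float half_size, int resolution,
                      float (*ComputeSpreadAt)(float x, float y)) {
  FILE* file = std::fopen(path, "wb");
  if (!file) return false;
  std::fprintf(file, "P5\n%d %d\n255\n", resolution, resolution);
  for (int row = 0; row < resolution; ++row) {
    for (int col = 0; col < resolution; ++col) {
      const float x = center_x - half_size + (2.f * half_size * col) / resolution;
      const float y = center_y - half_size + (2.f * half_size * row) / resolution;
      const unsigned char value =
          (unsigned char)(ComputeSpreadAt(x, y) * 255.f);
      std::fwrite(&value, 1, 1, file);
    }
  }
  std::fclose(file);
  return true;
}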
13 Actually, the voxel emitter was the discretized version of an earlier algorithm to compute the
attenuated position along a spline. Because the spline version was based on integrals, the choices
in weight function were limited to analytical solutions. See Taylor [1] for more details.
FIGURE 13.7 Voxel emitter with one close active voxel and four more distant
voxels. (a) The direction of attenuated position is pointing away from the closest
voxel using a linear weight. (b) The direction of attenuated position is pointing
more in the direction of the closest voxel using a squared weight (and the spread
has slightly increased).
Another type of weight function can model a voxel emitter that only rep-
resents the distant layer of the sound, and the “close” sound is triggered by
a separate point emitter. Each voxel would represent an independent point
emitter with a short attenuation range, and the set of all voxels represents
the distant sound with a much larger attenuation range as a voxel emitter.
As the receiver approaches a single voxel, the close sound would activate,
but the distant sound should move away.
This “distant only” weight function can be implemented by introduc-
ing an inner attenuation range. As the receiver enters the inner range,
the weight function will decrease for the corresponding voxel instead of
increasing. Outside the inner range, the weight function behaves as nor-
mal. One way to integrate this into VoxelsToAttenuatedPosition() is to
change the distance to the voxel prior to testing closest distance and com-
puting the weight:
...
float distance = Length(direction);
// Assign a distance farther away inside inner_attenuation_range.
if (distance < inner_attenuation_range) {
  distance = Max(distance, attenuation_range *
                 (1 - distance / inner_attenuation_range));
}
if (distance < attenuation_range) {
...
σz = ∑i∈V (W(v̂i) / |v̂i|) · vi,z = v0,z w0 + v1,z w1 + ⋯ + vN,z wN    (13.8)

where wi is the weight divided by the magnitude of the voxel center v̂i. The
weighted average z axis coordinate, σz, is then normalized so that the set
of all weights Ω sums to one:

σz = σz / ∑i∈Ω wi    (13.9)
The next step is to project the attenuated position p̂ onto the plane at σz.
Let ẑ = {0, 0, σz} be a vector from the receiver and p̂′ be the new projected
position. That is, p̂′ = {p′x, p′y, σz} for some new p′x and p′y.
The magnitude of p̂′ is the same as the magnitude of p̂, which is m.
Let p̂″ be the 2D vector going in the x, y direction of p̂. This will be
in the same x, y direction as p̂′. As shown in Figure 13.8, p̂″ can be used
to form a right triangle. Then we can find the projected position p̂′ by
normalizing p̂″ and adding ẑ:
p̂′ = (p̂″ / |p̂″|) · √(m² − σz²) + ẑ    (13.10)
Because the projected vector has the same magnitude as the original vector,
spread will be the same as if the attenuated position were unaltered. The
difference is that the original vector’s position would update smoothly,
but p̂′ has the potential to make large discontinuous jumps. We can use
a method from earlier where we add one more level of spread. Let this
fourth spread interpolation be µz. The interpolation can be the ratio of
the original z component pz to the new z magnitude σz. Therefore, when
there is minimal projection (pz ≈ σz), the interpolation will be close to
zero. The projected spread is
µz = (pz − σz) / pz    (13.11)
FIGURE 13.8 Side view of the projection of the attenuated position p̂ onto the
plane of the average z coordinate σz to form p̂′. The magnitude of p̂ is equal to
the magnitude of p̂′.
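The projection and the extra spread term translate into a short routine. The following is a sketch only, assuming the Vector type (with x/y/z members) and the Length() helper from the earlier listings, and guarding the degenerate cases (a purely vertical p̂, or a zero z component) that the equations do not spell out.

#include <algorithm>
#include <cfloat>
#include <cmath>

struct ZProjectionResult {
  Vector position;  // p̂′: same magnitude as p̂, constrained to the σz plane
  float spread;     // µz from Equation 13.11
};

ZProjectionResult ProjectToAverageZ(const Vector& p, float sigma_z) {
  const float m = Length(p);
  const Vector p_xy = { p.x, p.y, 0.f };  // p̂″
  const float xy_length = Length(p_xy);
  const float xy_scale =
      std::sqrt(std::max(m * m - sigma_z * sigma_z, 0.f));

  ZProjectionResult result;
  if (xy_length > FLT_EPSILON) {
    // Equation 13.10: normalize p̂″, rescale it, and add ẑ.
    result.position = { (p_xy.x / xy_length) * xy_scale,
                        (p_xy.y / xy_length) * xy_scale,
                        sigma_z };
  } else {
    result.position = { 0.f, 0.f, sigma_z };  // p̂ was purely vertical
  }
  // Equation 13.11, guarding against a zero z component.
  result.spread =
      (std::fabs(p.z) > FLT_EPSILON) ? (p.z - sigma_z) / p.z : 0.f;
  return result;
}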
FIGURE 13.9 Voxel emitter in which voxels are on either side of the receiver.
(a) Using the attenuated position algorithm with a single emitter.
14 It is my opinion that the example in Figure 13.9 is somewhat contrived, and it may be unlikely
that a listener would notice a difference between the single emitter solution and the more complex
algorithm below. However, there could be cases that I did not think of, and the solution does have
some interesting properties.
FIGURE 13.9 (c) Typical 5.1 arrangement mapped from virtual speakers.
struct VirtualSpeakerSet {
  struct SpeakerData {
    float angle = 0.f;
    float total_weight = 0.f;
  };
  std::array<SpeakerData, kNumVirtualSpeakers> speakers;
  Vector closest_voxel_center = { 0.f, 0.f, 0.f };

  VirtualSpeakerSet() {
    const float angle_dist = 2 * (float)M_PI / kNumVirtualSpeakers;
    int speaker_id = 0;
    // Initialize speaker angles evenly around a circle.
    for (auto& speaker : speakers) {
      speaker.angle = angle_dist * speaker_id++;
      speaker.total_weight = 0.f;
    }
  }
};
template<typename VoxelContainer>
VirtualSpeakerSet VoxelsToVirtualSpeakers(
    const VoxelContainer& voxels, const Sphere& receiver,
    const float voxel_extent) {
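  // What follows is not the book's listing but a hedged sketch of one possible
  // body, under these assumptions: the container holds voxel-center Vectors,
  // Sphere exposes a center and a radius used here as the attenuation range,
  // and Length()/PointInsideVoxel() from the earlier listings are available.
  // The linear weight is a stand-in for the chapter's weight function.
  // (Requires <cmath> for fmodf/atan2f and <cfloat> for FLT_MAX.)
  VirtualSpeakerSet result;
  float closest_distance = FLT_MAX;
  const float two_pi = 2.f * (float)M_PI;
  const float angle_per_speaker = two_pi / kNumVirtualSpeakers;

  for (const Vector& voxel_center : voxels) {
    Vector direction = voxel_center - receiver.center;
    float distance = Length(direction);
    if (distance >= receiver.radius)
      continue;  // outside the attenuation range

    if (PointInsideVoxel(direction, voxel_extent)) {
      // Receiver is inside a voxel: surround the listener by weighting every
      // virtual speaker equally, mirroring the max-spread case earlier.
      for (auto& speaker : result.speakers)
        speaker.total_weight = 1.f;
      result.closest_voxel_center = voxel_center;
      break;
    }

    if (distance < closest_distance) {
      closest_distance = distance;
      result.closest_voxel_center = voxel_center;
    }

    // Stand-in linear falloff; substitute the chapter's weight function.
    float weight = 1.f - (distance / receiver.radius);

    // Angle of the voxel around the receiver in the horizontal plane,
    // accumulated onto the nearest virtual speaker.
    float angle = fmodf(atan2f(direction.y, direction.x) + two_pi, two_pi);
    int index = (int)(angle / angle_per_speaker + 0.5f) % kNumVirtualSpeakers;
    result.speakers[index].total_weight += weight;
  }
  return result;
}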
On either the game engine side or the audio engine mixer, the virtual
speakers need to be mapped to the current speaker configuration of the
game. VirtualSpeakerSetToSpeakerArrangement() is similar to
VoxelsToAttenuatedPosition() with a couple of changes. First, the
speaker arrangement is unlikely to have a speaker directly at 0°, and so a bit
of extra care is required to wrap around the circle. The speaker angles are
assumed to have already been translated to match the receiver’s forward
direction. Secondly, after the weights are accumulated, they must be
normalized to the desired gain values.

Wwise implementation: To trigger playback and get information about the
authored sound’s gain, the closest voxel center can be used as the emitter’s
position sent to the audio engine. Registering the callback
AK_SpeakerVolumeMatrix can be used to alter the per-speaker gain values.
template<int N>
std::array<float, N> VirtualSpeakerSetToSpeakerArrangement(
    const VirtualSpeakerSet& speaker_set,
    const std::array<float, N>& speaker_angles,
    const float gain_rms) {
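  // What follows is a hedged sketch rather than the book's listing: distribute
  // each virtual speaker's weight onto the pair of real speakers whose angles
  // bracket it (wrapping around the circle), then normalize so the combined
  // channel power matches gain_rms. The constant-power pan between the two
  // bracketing speakers is an assumption, not necessarily the author's choice.
  // Assumes speaker_angles are in circular order within [0, 2*pi).
  // (Requires <cmath> for fmodf/cosf/sinf/sqrtf.)
  std::array<float, N> gains{};
  const float two_pi = 2.f * (float)M_PI;

  for (const auto& vs : speaker_set.speakers) {
    if (vs.total_weight <= 0.f)
      continue;
    for (int i = 0; i < N; ++i) {
      const int j = (i + 1) % N;
      // Arc from speaker i to speaker j, and this virtual speaker's offset
      // into that arc, both wrapped into [0, 2*pi).
      const float span =
          fmodf(speaker_angles[j] - speaker_angles[i] + two_pi, two_pi);
      const float offset = fmodf(vs.angle - speaker_angles[i] + two_pi, two_pi);
      if (span > 0.f && offset <= span) {
        const float t = offset / span;  // 0 at speaker i, 1 at speaker j
        gains[i] += vs.total_weight * cosf(t * (float)M_PI * 0.5f);
        gains[j] += vs.total_weight * sinf(t * (float)M_PI * 0.5f);
        break;
      }
    }
  }

  // Normalize so the channel gains combine (in power) to the desired value.
  float power = 0.f;
  for (float gain : gains)
    power += gain * gain;
  if (power > 0.f) {
    const float scale = gain_rms / sqrtf(power);
    for (float& gain : gains)
      gain *= scale;
  }
  return gains;
}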
The accumulated weights become per-channel gain values cn, where n is
the number of channels; in this case, for 5.1, n = 5.
The speaker angles in the example were based on the ITU-R BS.775-3
reference loudspeaker arrangement as shown in Table 13.1. This repre-
sentation may differ depending on the sound engine. Figure 13.10 shows
the per-speaker gain from the example used in Figure 13.9. For simplicity,
near field has been left out, as has the sound’s authored spread, which
would also need to be applied to the normalized weights. This
combination of managing the attenuation and the speaker arrangements
may heavily overlap with the audio engine’s functionality. Some care
should be taken to not end up rewriting entire systems from the audio
engine.
One improvement over the single emitter approach used in
VoxelsToAttenuatedPosition() is that each virtual speaker could have
its own occlusion value (or entirely separate DSP/signal processing chain
for that matter). This extra control could model a setup like a river bend
that goes behind a wall but only on the right-hand side relative to the
receiver.
TABLE 13.1 Speaker Angles for a 5.1 Setup Based on the ITU-R
BS.775-3 Reference Loudspeaker Arrangement

Channel           Degrees from Center    Coordinate
Left              30                     π/3
Center            0                      π/2
Right             30                     2π/3
Right Surround    120                    7π/6
Left Surround     120                    11π/6
FIGURE 13.10 Gain plots of the five 5.1 speakers. The brighter areas are louder.
(Notice there is no near field applied.)
13.12 FINAL NOTES
This chapter extended the idea of using a grid or set of voxels to approxi-
mate the position of a sound that represents an area or volume of space.
Integration of these approaches can involve a large commitment of devel-
opment time as well as CPU and memory resources. As the feature’s
REFERENCES
Chapter 14
Improvisational Music
Charlie Huguenard
derelict.computer
CONTENTS
14.1 All That Jazz 235
14.2 Music System Foundations, Lightning Round 236
14.2.1 Sound Generator 236
14.2.2 Clock 237
14.2.3 Sequencers 238
14.3 Musician Recipes 239
14.3.1 Designing the Conductor 239
14.3.2 Musician Design Considerations 240
14.3.3 Funky Drummer 241
14.3.4 All About That Bass 243
14.3.5 Spacey Chimes 245
14.3.6 The Soloist 246
14.4 Wrapping Up 248
References 249
14.2.1 Sound Generator
In order to make music, you’ll have to make some sound. We use all kinds
of sound generators—horns, bells, drums, synthesizers, and a huge variety
of software instruments.
There are many kinds of samplers with myriad settings. To demon-
strate this system, all you need is what I like to call the “one-shot” sampler.
A one-shot sampler takes a single audio file and plays the file at different
speeds based on incoming musical notes.
Let’s assume we always create tonal audio files at middle C (261.626 Hz,
MIDI note 60). If we wanted our sampler to play the file one octave above
middle C (523.251 Hz, MIDI note 72), we would tell it to play the file two
times as fast. Similarly, if we wanted to play one octave below middle C
(130.813 Hz, MIDI note 48), we would tell the sampler to play the file half
as fast. The formula for determining the playback speed based on a MIDI
note is
speed = 2^((midiNote − 60) / 12)
Sampler:
    SoundFile file
    FilePlayer player
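The playback-speed formula is easy to verify in code. This is a minimal C++ sketch, not the chapter’s Sampler, with a hypothetical SpeedForMidiNote() helper and the assumption that files are recorded at middle C (MIDI note 60).

#include <cmath>

// Playback-rate multiplier for a one-shot sample recorded at middle C.
float SpeedForMidiNote(int midiNote) {
  return std::pow(2.0f, (midiNote - 60) / 12.0f);
}

// SpeedForMidiNote(72) == 2.0f (an octave up),
// SpeedForMidiNote(48) == 0.5f (an octave down),
// SpeedForMidiNote(67) ≈ 1.498f (a perfect fifth up).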
14.2.2 Clock
Most interactive music systems require something to tell time or to send
a signal when a musical interval is encountered. We generally call this a
clock or metronome (timer tends to imply something that’s not precise
enough for audio or musical timing). These musical clocks can be either
discrete or continuous—a decision which affects the design of the music
system.
For example, a traditional DAW timeline is typically continuous.
This requires plugins such as beat-synced effects to poll the timeline to
determine when beats are going to happen. Effects are required to detect
“edges” to demarcate musical events like a quarter note.
A pulse-based clock like one you might see in a modular synthesizer is
an example of a discrete musical clock. It sends out a “pulse” periodically,
and pieces of the system use that pulse to trigger or otherwise manipulate
sound. Many times, using a discrete clock is a subtractive process. If the
initial pulse is at 16th note intervals, a clock divider in the chain might
take every fourth pulse, creating a quarter note pattern. An additional
clock divider could take every other quarter note, playing just the first
and third quarter notes in a measure or shifting to the second and fourth.
By chaining clock dividers and other logical modules, you can create all
kinds of musical patterns. Even with a discrete clock, though, you are by
no means limited to dividing the initial pulse (look up “clock multipliers”
for examples of this).
For this example, we’ll use a continuous clock, which looks something
like this:
Clock:
    float tempo      // in quarter notes per minute
    float startTime  // in seconds using the audio engine clock
    bool playing

    function Play():
        startTime = currentEngineTime
        playing = true

    function Stop():
        playing = false
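The musicians below ask the clock for positions in bars, so it is worth seeing how those conversions fall out of the fields above. This is a hedged C++ restatement rather than the chapter’s listing; currentEngineTime is assumed to come from the audio engine, and 4/4 time is assumed as in the rest of the examples.

struct Clock {
  float tempo = 120.f;     // quarter notes per minute
  double startTime = 0.0;  // seconds, on the audio engine clock
  bool playing = false;

  void Play(double currentEngineTime) {
    startTime = currentEngineTime;
    playing = true;
  }
  void Stop() { playing = false; }

  // Elapsed position in quarter notes since Play().
  double TimeBeats(double currentEngineTime) const {
    return playing ? (currentEngineTime - startTime) * (tempo / 60.0) : 0.0;
  }
  // Elapsed position in bars, assuming 4/4.
  double TimeBars(double currentEngineTime) const {
    return TimeBeats(currentEngineTime) / 4.0;
  }
};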
14.2.3 Sequencers
“Sequencer” is an overloaded term, even when you narrow it down to
musical uses. We could be talking about a MIDI sequencer tool like those
found in a DAW or a groove box step sequencer. And there are several
smaller kinds of sequencers that transform pitches, select pulses, and rese-
quence breakbeat samples. The general definition I like to use is that a
sequencer is something that processes control signals in a musical system,
much like how an effect processes an audio signal. A control signal could
be a pulse, a MIDI note, a knob, a chord, a sensor, or anything else that
could eventually manipulate a sound.
14.3 MUSICIAN RECIPES
Now that we have a concept of the underlying systems that enable real-
time composition, let’s think about how to build the whole thing. I like to
think of real-time music systems in terms of three layers:
• The Conductor determines the overall shape of the music and gen-
erates control signals.
• The Musicians process the control signals from the conductor.
• The Instruments turn the control signals from the musicians into
sound.
We can use that in our musician logic for selecting notes to play. If we
describe a note like so:
Note:
    int noteNumber   // MIDI note number
    float strength   // 0-1, how “comfortable” or “strong” is this note?

Chord:
    float posBars
    Note[] chordNotes
    Note[] scaleNotes
Using this information, we can have the conductor cycle through chord
changes and notify the musicians:
Conductor:
    Clock clock
    Chord[] chords          // chords, sorted by position
    float chordLengthBars   // at what point do we loop?
    Musician[] musicians    // we’ll get to this in a minute

    function Update():
        if (!clock.playing):
            return
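One way the rest of that Update() might look is sketched below in C++. It is an assumption-heavy illustration, not the chapter’s listing: it reuses the hypothetical Clock::TimeBars() from the earlier sketch, and assumes chords are sorted by posBars and that each musician exposes currentChord and an UpdateNotes() call, as the bass example later suggests.

#include <cmath>
#include <vector>

struct ConductorSketch {
  Clock clock;                       // the C++ Clock sketch above
  std::vector<Chord> chords;         // sorted by posBars
  float chordLengthBars = 4.f;       // loop point
  std::vector<Musician*> musicians;

  void Update(double currentEngineTime) {
    if (!clock.playing || chords.empty())
      return;

    // Current position in bars, wrapped to the loop length.
    const double posBars =
        std::fmod(clock.TimeBars(currentEngineTime), (double)chordLengthBars);

    // The active chord is the last one whose start is at or before posBars.
    const Chord* current = &chords.front();
    for (const Chord& chord : chords) {
      if (chord.posBars <= posBars) current = &chord;
      else break;
    }

    // Hand the chord and the musical position to every musician.
    for (Musician* musician : musicians) {
      musician->currentChord = *current;
      musician->UpdateNotes((float)posBars);
    }
  }
};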
Musician:
    Clock clock         // hang on to a reference so we can do conversions
    Chord currentChord
    float lastTimeBars
    float intensity     // 0.0-1.0, the amount of musical movement
We’ll build on this for each of the individual musicians, and some addi-
tional “homework” for extending each musician will be found at the end
of each section.
14.3.3 Funky Drummer
Many interactive compositions call for a rhythm section, so let’s make a
drumbeat generator. This “drummer” will output a beat based on a couple
of inputs—tempo and intensity. For the purposes of this example, we will
assume 4/4 timing.
A drumbeat can be conceptually broken down into the following:
A very simple drum kit might include a kick drum, snare drum, and hi-
hat. The kick anchors the beat, so its most prominent hits usually end
up on strong beats, such as the downbeat of every measure. The snare
complements the kick by providing a “back beat,” typically landing on the
weak beats of the measure (beats 2 and 4 in 4/4 time). The hi-hat fills
in space and, depending on the feel, can land pretty much anywhere in
the measure. Figure 14.1 shows one example of a regular-time drum beat
in 4/4.
We could just play this beat back over and over, but that wouldn’t be
very interesting. One way to vary a beat is to add and remove notes to
make it feel more or less “intense” or “busy,” as shown in Figure 14.2.
As the intensity input changes, we add or remove these embellishments.
Building on the musician interface above, a drummer that plays an embel-
lished beat could look like this:
Drummer:
    Sampler kick, snare, hat
    Note[] kickNotes, snareNotes, hatNotes
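A sketch of how those fields might drive an embellished beat follows, in C++ rather than the chapter’s pseudocode. The DrumStep type, the Play() call on the Sampler, and the strength-versus-intensity gate are illustrative assumptions, not the book’s listing.

#include <cmath>
#include <vector>

// Hypothetical pattern entry: a 16th-note slot in the bar plus a strength.
struct DrumStep { int step16; float strength; };

struct DrummerSketch {
  Sampler kick, snare, hat;
  std::vector<DrumStep> kickSteps, snareSteps, hatSteps;
  float intensity = 0.5f;   // 0-1, supplied by the conductor
  float lastTimeBars = 0.f;

  void UpdateNotes(float timeBars) {
    const int last16th = (int)std::floor(lastTimeBars * 16.f);
    const int this16th = (int)std::floor(timeBars * 16.f);
    if (this16th != last16th) {
      const int step = this16th % 16;  // slot within the bar, assuming 4/4
      Trigger(kick, kickSteps, step);
      Trigger(snare, snareSteps, step);
      Trigger(hat, hatSteps, step);
    }
    lastTimeBars = timeBars;
  }

  void Trigger(Sampler& sampler, const std::vector<DrumStep>& steps, int step) {
    for (const DrumStep& s : steps) {
      // Strong hits always play; weak embellishments need higher intensity.
      if (s.step16 == step && s.strength >= 1.f - intensity)
        sampler.Play();  // assumed one-shot trigger on the Sampler
    }
  }
};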
Homework
• Add more instruments.
• Add fills that could play every few bars.
• Omit or add embellishments using randomness to add variation.
• Switch between half, normal, and double time feels based on intensity.
TonalMusician:
    Sampler sampler   // we just need one for these
Real bassists play around in the key quite a bit. But for the purpose of this
example, let’s assume the bass line should stick within the chord tones and
primarily the root note of the chord. The bass musician might look like this:
Bass:
    function UpdateNotes(float timeBars):
        // find out the positions in 16th notes
        int last16th = floor(lastTimeBars * 16) % 16
        int this16th = floor(timeBars * 16) % 16
        lastTimeBars = timeBars
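Continuing that 16th-note detection, a hedged C++ sketch of the note-selection half might look like the following. The Random01()/RandomIndex() helpers, the Play() call, the two-octave drop, and the assumption that chordNotes is a std::vector<Note> with the root first are all illustrative rather than the chapter’s code.

#include <cmath>
#include <vector>

struct BassSketch {
  Sampler sampler;
  Chord currentChord;
  float intensity = 0.5f;
  float lastTimeBars = 0.f;

  void UpdateNotes(float timeBars) {
    const int last16th = (int)std::floor(lastTimeBars * 16.f) % 16;
    const int this16th = (int)std::floor(timeBars * 16.f) % 16;
    if (this16th != last16th && !currentChord.chordNotes.empty()) {
      const bool downbeat = (this16th % 4) == 0;  // quarter-note positions
      // Between downbeats, higher intensity means more embellishment.
      const bool embellish = !downbeat && Random01() < intensity;
      if (downbeat || embellish) {
        const Note& note = downbeat
            ? currentChord.chordNotes.front()  // lean on the root
            : currentChord.chordNotes[RandomIndex(currentChord.chordNotes.size())];
        sampler.Play(note.noteNumber - 24);    // drop two octaves into bass range
      }
    }
    lastTimeBars = timeBars;
  }
};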
Homework
• Use scale tones for embellishments, instead of just chord tones.
• Add “memory” to provide a sense of repetition.
14.3.5 Spacey Chimes
Texture provides harmonic context and movement. While many times
you’ll hear a guitar or piano play full chords, another approach to add tex-
ture is to arpeggiate the chords. Simply put, an arpeggio is a sequence of
notes that move around in a chord. If you spent any time in music lessons,
you probably practiced running scales and arpeggios. If not, the typical
arpeggio goes up, then down a scale, skipping every other note, as shown
in the figure below.
So, let’s look at how to make an arpeggiator. Much of the logic from the
bass generator can be reused to create a chord arpeggiator like so:
Chimes:
    Mode mode   // Up, Down, Random
    int sequenceIdx = 0
    ...
            stepSize = 8
        else:
            stepSize = 16
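A possible step function for the arpeggiator, again as a hedged C++ sketch: whenever a new step boundary is crossed (with stepSize of 8 or 16 steps per bar, per the fragment above), advance sequenceIdx according to the mode and play that chord tone. The Play() call and the RandomIndex() helper are assumptions.

#include <cmath>

struct ChimesSketch {
  enum class Mode { Up, Down, Random };
  Sampler sampler;
  Chord currentChord;
  Mode mode = Mode::Up;
  int sequenceIdx = 0;
  int stepSize = 16;        // 8 or 16 steps per bar
  float lastTimeBars = 0.f;

  void UpdateNotes(float timeBars) {
    const int lastStep = (int)std::floor(lastTimeBars * stepSize);
    const int thisStep = (int)std::floor(timeBars * stepSize);
    if (thisStep != lastStep && !currentChord.chordNotes.empty()) {
      const int count = (int)currentChord.chordNotes.size();
      switch (mode) {
        case Mode::Up:     sequenceIdx = (sequenceIdx + 1) % count;         break;
        case Mode::Down:   sequenceIdx = (sequenceIdx + count - 1) % count; break;
        case Mode::Random: sequenceIdx = RandomIndex(count);                break;
      }
      sampler.Play(currentChord.chordNotes[sequenceIdx].noteNumber);
    }
    lastTimeBars = timeBars;
  }
};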
Homework
• Add more arpeggio modes, such as an up/down mode.
• Add optional scale runs in addition to chord arpeggios.
• Tie the mode to the intensity.
14.3.6 The Soloist
Now that we have a rhythmic and harmonic base, a melody will complete
our “band.” In jazz music, there are often pre-composed melodies in the
Solo:
    int recordBars = 4   // the length of the recording
    int repeats = 1      // how many times we want to repeat the recording
    int[] recordedMelody
    float repeatStartBar
Homework
• Modify the sampler and the solo musician to play sustained notes.
• Keep track of two recorded melodies to play an AABA sequence.
14.4 WRAPPING UP
There are so many ways to design interactive music that it can become
overwhelming, but the process becomes a little simpler by looking at how
humans make music on the spot. Jazz music provides a useful framework
for autonomous music, thanks to its focus on improvisation and well-
defined theory. Leaning on the composer, musician, instrument meta-
phor further grounds the concept such that we can visualize how the code
components come together. Hopefully, these concepts and examples help
you decide how to make your own improvised music systems.
REFERENCES
Index
Interactions
    player and control inputs, 8
    steps
        attempt, 8
        condition, 8
        execution, 8
        reaction, 9
        result, 9
Interactive media, 2
Inverse-Z transform, 33–34, 37
ITU-R BS.775-3 reference loudspeaker arrangement, 231

J
Jazz music, 235–236, 249

K
Kosma, J., 236

L
Lackey, P., 182
Line, 72
Line selection, 86
Listeners
    frame of reference, 126–127
    geometry, 125–126
    multiple, 127–128
Localization, 82, 88
Local-player-only audio, 131
Local space transform, 123

M
McLeran, A., 119–132
Metronome, 237
Mitchell, J., 167–179
Molar concentration of water, 58
Monolithic designs, 80
Moog ladder filter
    coding, 48–49
    constant Q, 47
    digital recreation, 47
    discrete sample rate, 47
    feedback delay fix, 48
    schematic, 47
Movement
    clothing sounds, 7
    footsteps, 8
Multi-locale client connections, 88–89
Multiverse, 83–84
Music, 130–131
Musician recipes
    bass, 243–245
    design considerations, 240–241
    designing the conductor, 239–240
    drummer, 241–243
    layers, 239
    the soloist, 246–248
    spacey chimes, 245–246
Music improvisation
    foundations, lightning round
        clock, 237–238
        sequencers, 238–239
        sound generator, 236–237
    jazz, 235–236, 249
    musician recipes, 239–248

N
Network considerations
    prediction and client-only VO, 87
    reliability, 87–88
Network reliability, 87–88
Neumann, T., 81–89, 182
Newton, I., 122
Newton’s method, 66–67
Non-linear music, 236
Nyquist limit, 33

O
Object space transform, 123
Object transform, 123
One-shot sampler, 236
On-screen debug visualization
    circular min and max, 160, 161
    debug rendering algorithm, 158–160
    rectangular min and max, 160, 161, 162
    shape
        circular, 156, 157
        rectangular, 156, 158
    unreal engine, 160–164
Ownership semantics, 105
P
Panning position, 144
Particle effects, 5–6
Patch cable
    abstraction (see Abstractions)
    amplified/resonating, 94
    C++ audio abstractions, 95
    complicit, 94
    first-pass abstractions, 95–97
    inputs and outputs, 99–108
    mixer, 108–111
    mixer splitter, 114–116
    sandboxes, 94
    splitter, 111–114
Patch input, 105–108
Patch output, 102–105
Physics, 6–7
Pipeline, 72
Pole-zero map, 32–35
Predictive models, 233
Pythagoras theorem, 23

R
RC network, see Resistor capacitor (RC) network
Realization, 137–138, 140–141
Real voice, 135
Real voice pools, 135–136
Rectangular distances, 154–156
Region thresholds, 177–178
Relaxation frequency, 58
Resistor capacitor (RC) network
    coding, 43–44
    DI, 42, 43
    fast ex implementation, 44–45
    low-pass into analog buffer, 42
Resonance
    concept, 46
    Moog ladder filter, 47–49
Robert Bristow-Johnson’s cookbook
    digital Butterworth filters, 40
    inverse-Z transform, 40
    low-pass magnitude plot vs. analog cascade, 41
    parametric equalizer band, 41
Root finding
    Newton’s method, 66–67
    trigonometric solver, 67

S
Sandboxes, 94
Saturation vapor pressure, 58
Screen-space coordinate system, 147–149, 151
Screen-space distance algorithm
    pixels, 148–149
    range, 149–151
    using wrong camera, 151–153
Screen-space distance attenuation
    action RPGs, 145–146
    algorithm, 148–153
    challenges, 144
    converting to, 147–148
    meaning of, 146–147
    on-screen debug visualization, 156–164
    rectangular distances, 154–156
    review, 144–145
    sound designer’s intention, 146
    steps, 153–154
    third-person camera setup, 144, 145
Self-interpolating lookup table, 45
Sequencers, 238–239
Server-based voice system, 84
Server workflow, 84–85
Signal processing, 27
Single emitter approach, 231
Singleton systems, 129–130
Smith, J., 48
Software development, 80
Somberg, G., 143–165, 181–203
Sonification, 134–135
Sound bed, 4
Sound effect categories
    audio designers and programmers, 2
    big three, 3–4
    characters
        interactions, 8–9
        movement, 7–8
    feedback, 9–11